| 1 1 1 1 1 1 5 5 5 5 5 5 5 5 4 5 5 1 5 4 4 3 3 1 1 1 7 1 6 5 4 3 3 1 1 1 1 1 1 1 1 7 7 1 1 1 7 7 2 2 1 1 2 1 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 | // SPDX-License-Identifier: GPL-2.0-or-later /* RxRPC key management * * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved. * Written by David Howells (dhowells@redhat.com) * * RxRPC keys should have a description of describing their purpose: * "afs@example.com" */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include <crypto/skcipher.h> #include <linux/module.h> #include <linux/net.h> #include <linux/skbuff.h> #include <linux/key-type.h> #include <linux/ctype.h> #include <linux/slab.h> #include <net/sock.h> #include <net/af_rxrpc.h> #include <keys/rxrpc-type.h> #include <keys/user-type.h> #include "ar-internal.h" static int rxrpc_preparse(struct key_preparsed_payload *); static void rxrpc_free_preparse(struct key_preparsed_payload *); static void rxrpc_destroy(struct key *); static void rxrpc_describe(const struct key *, struct seq_file *); static long rxrpc_read(const struct key *, char *, size_t); /* * rxrpc defined keys take an arbitrary string as the description and an * arbitrary blob of data as the payload */ struct key_type key_type_rxrpc = { .name = "rxrpc", .flags = KEY_TYPE_NET_DOMAIN, .preparse = rxrpc_preparse, .free_preparse = rxrpc_free_preparse, .instantiate = generic_key_instantiate, .destroy = rxrpc_destroy, .describe = rxrpc_describe, .read = rxrpc_read, }; EXPORT_SYMBOL(key_type_rxrpc); /* * parse an RxKAD type XDR format token * - the caller guarantees we have at least 4 words */ static int rxrpc_preparse_xdr_rxkad(struct key_preparsed_payload *prep, size_t datalen, const __be32 *xdr, unsigned int toklen) { struct rxrpc_key_token *token, **pptoken; time64_t expiry; size_t plen; u32 tktlen; _enter(",{%x,%x,%x,%x},%u", ntohl(xdr[0]), ntohl(xdr[1]), ntohl(xdr[2]), ntohl(xdr[3]), toklen); if (toklen <= 8 * 4) return -EKEYREJECTED; tktlen = ntohl(xdr[7]); _debug("tktlen: %x", tktlen); if (tktlen > AFSTOKEN_RK_TIX_MAX) return -EKEYREJECTED; if (toklen < 8 * 4 + tktlen) return -EKEYREJECTED; plen = sizeof(*token) + sizeof(*token->kad) + tktlen; prep->quotalen = datalen + plen; plen -= sizeof(*token); token = kzalloc_obj(*token); if (!token) return -ENOMEM; token->kad = kzalloc(plen, GFP_KERNEL); if (!token->kad) { kfree(token); return -ENOMEM; } token->security_index = RXRPC_SECURITY_RXKAD; token->kad->ticket_len = tktlen; token->kad->vice_id = ntohl(xdr[0]); token->kad->kvno = ntohl(xdr[1]); token->kad->start = ntohl(xdr[4]); token->kad->expiry = ntohl(xdr[5]); token->kad->primary_flag = ntohl(xdr[6]); memcpy(&token->kad->session_key, &xdr[2], 8); memcpy(&token->kad->ticket, &xdr[8], tktlen); _debug("SCIX: %u", token->security_index); _debug("TLEN: %u", token->kad->ticket_len); _debug("EXPY: %x", token->kad->expiry); _debug("KVNO: %u", token->kad->kvno); _debug("PRIM: %u", token->kad->primary_flag); _debug("SKEY: %02x%02x%02x%02x%02x%02x%02x%02x", token->kad->session_key[0], token->kad->session_key[1], token->kad->session_key[2], token->kad->session_key[3], token->kad->session_key[4], token->kad->session_key[5], token->kad->session_key[6], token->kad->session_key[7]); if (token->kad->ticket_len >= 8) _debug("TCKT: %02x%02x%02x%02x%02x%02x%02x%02x", token->kad->ticket[0], token->kad->ticket[1], token->kad->ticket[2], token->kad->ticket[3], token->kad->ticket[4], token->kad->ticket[5], token->kad->ticket[6], token->kad->ticket[7]); /* count the number of tokens attached */ prep->payload.data[1] = (void *)((unsigned long)prep->payload.data[1] + 1); /* attach the data */ for (pptoken = (struct rxrpc_key_token **)&prep->payload.data[0]; *pptoken; pptoken = &(*pptoken)->next) continue; *pptoken = token; expiry = rxrpc_u32_to_time64(token->kad->expiry); if (expiry < prep->expiry) prep->expiry = expiry; _leave(" = 0"); return 0; } static u64 xdr_dec64(const __be32 *xdr) { return (u64)ntohl(xdr[0]) << 32 | (u64)ntohl(xdr[1]); } static time64_t rxrpc_s64_to_time64(s64 time_in_100ns) { bool neg = false; u64 tmp = time_in_100ns; if (time_in_100ns < 0) { tmp = -time_in_100ns; neg = true; } do_div(tmp, 10000000); return neg ? -tmp : tmp; } /* * Parse a YFS-RxGK type XDR format token * - the caller guarantees we have at least 4 words * * struct token_rxgk { * opr_time begintime; * opr_time endtime; * afs_int64 level; * afs_int64 lifetime; * afs_int64 bytelife; * afs_int64 enctype; * opaque key<>; * opaque ticket<>; * }; */ static int rxrpc_preparse_xdr_yfs_rxgk(struct key_preparsed_payload *prep, size_t datalen, const __be32 *xdr, unsigned int toklen) { struct rxrpc_key_token *token, **pptoken; time64_t expiry; size_t plen; const __be32 *ticket, *key; s64 tmp; u32 tktlen, keylen; _enter(",{%x,%x,%x,%x},%x", ntohl(xdr[0]), ntohl(xdr[1]), ntohl(xdr[2]), ntohl(xdr[3]), toklen); if (6 * 2 + 2 > toklen / 4) goto reject; key = xdr + (6 * 2 + 1); keylen = ntohl(key[-1]); _debug("keylen: %x", keylen); keylen = round_up(keylen, 4); if ((6 * 2 + 2) * 4 + keylen > toklen) goto reject; ticket = xdr + (6 * 2 + 1 + (keylen / 4) + 1); tktlen = ntohl(ticket[-1]); _debug("tktlen: %x", tktlen); tktlen = round_up(tktlen, 4); if ((6 * 2 + 2) * 4 + keylen + tktlen != toklen) { kleave(" = -EKEYREJECTED [%x!=%x, %x,%x]", (6 * 2 + 2) * 4 + keylen + tktlen, toklen, keylen, tktlen); goto reject; } plen = sizeof(*token) + sizeof(*token->rxgk) + tktlen + keylen; prep->quotalen = datalen + plen; plen -= sizeof(*token); token = kzalloc_obj(*token); if (!token) goto nomem; token->rxgk = kzalloc(sizeof(*token->rxgk) + keylen, GFP_KERNEL); if (!token->rxgk) goto nomem_token; token->security_index = RXRPC_SECURITY_YFS_RXGK; token->rxgk->begintime = xdr_dec64(xdr + 0 * 2); token->rxgk->endtime = xdr_dec64(xdr + 1 * 2); token->rxgk->level = tmp = xdr_dec64(xdr + 2 * 2); if (tmp < -1LL || tmp > RXRPC_SECURITY_ENCRYPT) goto reject_token; token->rxgk->lifetime = xdr_dec64(xdr + 3 * 2); token->rxgk->bytelife = xdr_dec64(xdr + 4 * 2); token->rxgk->enctype = tmp = xdr_dec64(xdr + 5 * 2); if (tmp < 0 || tmp > UINT_MAX) goto reject_token; token->rxgk->key.len = ntohl(key[-1]); token->rxgk->key.data = token->rxgk->_key; token->rxgk->ticket.len = ntohl(ticket[-1]); if (token->rxgk->endtime != 0) { expiry = rxrpc_s64_to_time64(token->rxgk->endtime); if (expiry < 0) goto expired; if (expiry < prep->expiry) prep->expiry = expiry; } memcpy(token->rxgk->key.data, key, token->rxgk->key.len); /* Pad the ticket so that we can use it directly in XDR */ token->rxgk->ticket.data = kzalloc(round_up(token->rxgk->ticket.len, 4), GFP_KERNEL); if (!token->rxgk->ticket.data) goto nomem_yrxgk; memcpy(token->rxgk->ticket.data, ticket, token->rxgk->ticket.len); _debug("SCIX: %u", token->security_index); _debug("EXPY: %llx", token->rxgk->endtime); _debug("LIFE: %llx", token->rxgk->lifetime); _debug("BYTE: %llx", token->rxgk->bytelife); _debug("ENC : %u", token->rxgk->enctype); _debug("LEVL: %u", token->rxgk->level); _debug("KLEN: %u", token->rxgk->key.len); _debug("TLEN: %u", token->rxgk->ticket.len); _debug("KEY0: %*phN", token->rxgk->key.len, token->rxgk->key.data); _debug("TICK: %*phN", min_t(u32, token->rxgk->ticket.len, 32), token->rxgk->ticket.data); /* count the number of tokens attached */ prep->payload.data[1] = (void *)((unsigned long)prep->payload.data[1] + 1); /* attach the data */ for (pptoken = (struct rxrpc_key_token **)&prep->payload.data[0]; *pptoken; pptoken = &(*pptoken)->next) continue; *pptoken = token; _leave(" = 0"); return 0; nomem_yrxgk: kfree(token->rxgk); nomem_token: kfree(token); nomem: return -ENOMEM; reject_token: kfree(token); reject: return -EKEYREJECTED; expired: kfree(token->rxgk); kfree(token); return -EKEYEXPIRED; } /* * attempt to parse the data as the XDR format * - the caller guarantees we have more than 7 words */ static int rxrpc_preparse_xdr(struct key_preparsed_payload *prep) { const __be32 *xdr = prep->data, *token, *p; const char *cp; unsigned int len, paddedlen, loop, ntoken, toklen, sec_ix; size_t datalen = prep->datalen; int ret, ret2; _enter(",{%x,%x,%x,%x},%zu", ntohl(xdr[0]), ntohl(xdr[1]), ntohl(xdr[2]), ntohl(xdr[3]), prep->datalen); if (datalen > AFSTOKEN_LENGTH_MAX) goto not_xdr; /* XDR is an array of __be32's */ if (datalen & 3) goto not_xdr; /* the flags should be 0 (the setpag bit must be handled by * userspace) */ if (ntohl(*xdr++) != 0) goto not_xdr; datalen -= 4; /* check the cell name */ len = ntohl(*xdr++); if (len < 1 || len > AFSTOKEN_CELL_MAX) goto not_xdr; datalen -= 4; paddedlen = (len + 3) & ~3; if (paddedlen > datalen) goto not_xdr; cp = (const char *) xdr; for (loop = 0; loop < len; loop++) if (!isprint(cp[loop])) goto not_xdr; for (; loop < paddedlen; loop++) if (cp[loop]) goto not_xdr; _debug("cellname: [%u/%u] '%*.*s'", len, paddedlen, len, len, (const char *) xdr); datalen -= paddedlen; xdr += paddedlen >> 2; /* get the token count */ if (datalen < 12) goto not_xdr; ntoken = ntohl(*xdr++); datalen -= 4; _debug("ntoken: %x", ntoken); if (ntoken < 1 || ntoken > AFSTOKEN_MAX) goto not_xdr; /* check each token wrapper */ p = xdr; loop = ntoken; do { if (datalen < 8) goto not_xdr; toklen = ntohl(*p++); sec_ix = ntohl(*p); datalen -= 4; _debug("token: [%x/%zx] %x", toklen, datalen, sec_ix); paddedlen = (toklen + 3) & ~3; if (toklen < 20 || toklen > datalen || paddedlen > datalen) goto not_xdr; datalen -= paddedlen; p += paddedlen >> 2; } while (--loop > 0); _debug("remainder: %zu", datalen); if (datalen != 0) goto not_xdr; /* okay: we're going to assume it's valid XDR format * - we ignore the cellname, relying on the key to be correctly named */ ret = -EPROTONOSUPPORT; do { toklen = ntohl(*xdr++); token = xdr; xdr += (toklen + 3) / 4; sec_ix = ntohl(*token++); toklen -= 4; _debug("TOKEN type=%x len=%x", sec_ix, toklen); switch (sec_ix) { case RXRPC_SECURITY_RXKAD: ret2 = rxrpc_preparse_xdr_rxkad(prep, datalen, token, toklen); break; case RXRPC_SECURITY_YFS_RXGK: ret2 = rxrpc_preparse_xdr_yfs_rxgk(prep, datalen, token, toklen); break; default: ret2 = -EPROTONOSUPPORT; break; } switch (ret2) { case 0: ret = 0; break; case -EPROTONOSUPPORT: break; case -ENOPKG: if (ret != 0) ret = -ENOPKG; break; default: ret = ret2; goto error; } } while (--ntoken > 0); error: _leave(" = %d", ret); return ret; not_xdr: _leave(" = -EPROTO"); return -EPROTO; } /* * Preparse an rxrpc defined key. * * Data should be of the form: * OFFSET LEN CONTENT * 0 4 key interface version number * 4 2 security index (type) * 6 2 ticket length * 8 4 key expiry time (time_t) * 12 4 kvno * 16 8 session key * 24 [len] ticket * * if no data is provided, then a no-security key is made */ static int rxrpc_preparse(struct key_preparsed_payload *prep) { const struct rxrpc_key_data_v1 *v1; struct rxrpc_key_token *token, **pp; time64_t expiry; size_t plen; u32 kver; int ret; _enter("%zu", prep->datalen); /* handle a no-security key */ if (!prep->data && prep->datalen == 0) return 0; /* determine if the XDR payload format is being used */ if (prep->datalen > 7 * 4) { ret = rxrpc_preparse_xdr(prep); if (ret != -EPROTO) return ret; } /* get the key interface version number */ ret = -EINVAL; if (prep->datalen <= 4 || !prep->data) goto error; memcpy(&kver, prep->data, sizeof(kver)); prep->data += sizeof(kver); prep->datalen -= sizeof(kver); _debug("KEY I/F VERSION: %u", kver); ret = -EKEYREJECTED; if (kver != 1) goto error; /* deal with a version 1 key */ ret = -EINVAL; if (prep->datalen < sizeof(*v1)) goto error; v1 = prep->data; if (prep->datalen != sizeof(*v1) + v1->ticket_length) goto error; _debug("SCIX: %u", v1->security_index); _debug("TLEN: %u", v1->ticket_length); _debug("EXPY: %x", v1->expiry); _debug("KVNO: %u", v1->kvno); _debug("SKEY: %02x%02x%02x%02x%02x%02x%02x%02x", v1->session_key[0], v1->session_key[1], v1->session_key[2], v1->session_key[3], v1->session_key[4], v1->session_key[5], v1->session_key[6], v1->session_key[7]); if (v1->ticket_length >= 8) _debug("TCKT: %02x%02x%02x%02x%02x%02x%02x%02x", v1->ticket[0], v1->ticket[1], v1->ticket[2], v1->ticket[3], v1->ticket[4], v1->ticket[5], v1->ticket[6], v1->ticket[7]); ret = -EPROTONOSUPPORT; if (v1->security_index != RXRPC_SECURITY_RXKAD) goto error; plen = sizeof(*token->kad) + v1->ticket_length; prep->quotalen = plen + sizeof(*token); ret = -ENOMEM; token = kzalloc_obj(*token); if (!token) goto error; token->kad = kzalloc(plen, GFP_KERNEL); if (!token->kad) goto error_free; token->security_index = RXRPC_SECURITY_RXKAD; token->kad->ticket_len = v1->ticket_length; token->kad->expiry = v1->expiry; token->kad->kvno = v1->kvno; memcpy(&token->kad->session_key, &v1->session_key, 8); memcpy(&token->kad->ticket, v1->ticket, v1->ticket_length); /* count the number of tokens attached */ prep->payload.data[1] = (void *)((unsigned long)prep->payload.data[1] + 1); /* attach the data */ pp = (struct rxrpc_key_token **)&prep->payload.data[0]; while (*pp) pp = &(*pp)->next; *pp = token; expiry = rxrpc_u32_to_time64(token->kad->expiry); if (expiry < prep->expiry) prep->expiry = expiry; token = NULL; ret = 0; error_free: kfree(token); error: return ret; } /* * Free token list. */ static void rxrpc_free_token_list(struct rxrpc_key_token *token) { struct rxrpc_key_token *next; for (; token; token = next) { next = token->next; switch (token->security_index) { case RXRPC_SECURITY_RXKAD: kfree(token->kad); break; case RXRPC_SECURITY_YFS_RXGK: kfree(token->rxgk->ticket.data); kfree(token->rxgk); break; default: pr_err("Unknown token type %x on rxrpc key\n", token->security_index); BUG(); } kfree(token); } } /* * Clean up preparse data. */ static void rxrpc_free_preparse(struct key_preparsed_payload *prep) { rxrpc_free_token_list(prep->payload.data[0]); } /* * dispose of the data dangling from the corpse of a rxrpc key */ static void rxrpc_destroy(struct key *key) { rxrpc_free_token_list(key->payload.data[0]); } /* * describe the rxrpc key */ static void rxrpc_describe(const struct key *key, struct seq_file *m) { const struct rxrpc_key_token *token; const char *sep = ": "; seq_puts(m, key->description); for (token = key->payload.data[0]; token; token = token->next) { seq_puts(m, sep); switch (token->security_index) { case RXRPC_SECURITY_RXKAD: seq_puts(m, "ka"); break; case RXRPC_SECURITY_YFS_RXGK: seq_puts(m, "ygk"); break; default: /* we have a ticket we can't encode */ seq_printf(m, "%u", token->security_index); break; } sep = " "; } } /* * grab the security key for a socket */ int rxrpc_request_key(struct rxrpc_sock *rx, sockptr_t optval, int optlen) { struct key *key; char *description; _enter(""); if (optlen <= 0 || optlen > PAGE_SIZE - 1 || rx->securities) return -EINVAL; description = memdup_sockptr_nul(optval, optlen); if (IS_ERR(description)) return PTR_ERR(description); key = request_key_net(&key_type_rxrpc, description, sock_net(&rx->sk), NULL); if (IS_ERR(key)) { kfree(description); _leave(" = %ld", PTR_ERR(key)); return PTR_ERR(key); } rx->key = key; kfree(description); _leave(" = 0 [key %x]", key->serial); return 0; } /* * generate a server data key */ int rxrpc_get_server_data_key(struct rxrpc_connection *conn, const void *session_key, time64_t expiry, u32 kvno) { const struct cred *cred = current_cred(); struct key *key; int ret; struct { u32 kver; struct rxrpc_key_data_v1 v1; } data; _enter(""); key = key_alloc(&key_type_rxrpc, "x", GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, cred, 0, KEY_ALLOC_NOT_IN_QUOTA, NULL); if (IS_ERR(key)) { _leave(" = -ENOMEM [alloc %ld]", PTR_ERR(key)); return -ENOMEM; } _debug("key %d", key_serial(key)); data.kver = 1; data.v1.security_index = RXRPC_SECURITY_RXKAD; data.v1.ticket_length = 0; data.v1.expiry = rxrpc_time64_to_u32(expiry); data.v1.kvno = 0; memcpy(&data.v1.session_key, session_key, sizeof(data.v1.session_key)); ret = key_instantiate_and_link(key, &data, sizeof(data), NULL, NULL); if (ret < 0) goto error; conn->key = key; _leave(" = 0 [%d]", key_serial(key)); return 0; error: key_revoke(key); key_put(key); _leave(" = -ENOMEM [ins %d]", ret); return -ENOMEM; } EXPORT_SYMBOL(rxrpc_get_server_data_key); /** * rxrpc_get_null_key - Generate a null RxRPC key * @keyname: The name to give the key. * * Generate a null RxRPC key that can be used to indicate anonymous security is * required for a particular domain. * * Return: The new key or a negative error code. */ struct key *rxrpc_get_null_key(const char *keyname) { const struct cred *cred = current_cred(); struct key *key; int ret; key = key_alloc(&key_type_rxrpc, keyname, GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, cred, KEY_POS_SEARCH, KEY_ALLOC_NOT_IN_QUOTA, NULL); if (IS_ERR(key)) return key; ret = key_instantiate_and_link(key, NULL, 0, NULL, NULL); if (ret < 0) { key_revoke(key); key_put(key); return ERR_PTR(ret); } return key; } EXPORT_SYMBOL(rxrpc_get_null_key); /* * read the contents of an rxrpc key * - this returns the result in XDR form */ static long rxrpc_read(const struct key *key, char *buffer, size_t buflen) { const struct rxrpc_key_token *token; size_t size; __be32 *xdr, *oldxdr; u32 cnlen, toksize, ntoks, tok, zero; u16 toksizes[AFSTOKEN_MAX]; _enter(""); /* we don't know what form we should return non-AFS keys in */ if (memcmp(key->description, "afs@", 4) != 0) return -EOPNOTSUPP; cnlen = strlen(key->description + 4); #define RND(X) (((X) + 3) & ~3) /* AFS keys we return in XDR form, so we need to work out the size of * the XDR */ size = 2 * 4; /* flags, cellname len */ size += RND(cnlen); /* cellname */ size += 1 * 4; /* token count */ ntoks = 0; for (token = key->payload.data[0]; token; token = token->next) { toksize = 4; /* sec index */ switch (token->security_index) { case RXRPC_SECURITY_RXKAD: toksize += 8 * 4; /* viceid, kvno, key*2, begin, * end, primary, tktlen */ if (!token->no_leak_key) toksize += RND(token->kad->ticket_len); break; case RXRPC_SECURITY_YFS_RXGK: toksize += 6 * 8 + 2 * 4; if (!token->no_leak_key) toksize += RND(token->rxgk->key.len); toksize += RND(token->rxgk->ticket.len); break; default: /* we have a ticket we can't encode */ pr_err("Unsupported key token type (%u)\n", token->security_index); return -ENOPKG; } _debug("token[%u]: toksize=%u", ntoks, toksize); if (WARN_ON(toksize > AFSTOKEN_LENGTH_MAX)) return -EIO; toksizes[ntoks++] = toksize; size += toksize + 4; /* each token has a length word */ } #undef RND if (!buffer || buflen < size) return size; xdr = (__be32 *)buffer; zero = 0; #define ENCODE(x) \ do { \ *xdr++ = htonl(x); \ } while(0) #define ENCODE_DATA(l, s) \ do { \ u32 _l = (l); \ ENCODE(l); \ memcpy(xdr, (s), _l); \ if (_l & 3) \ memcpy((u8 *)xdr + _l, &zero, 4 - (_l & 3)); \ xdr += (_l + 3) >> 2; \ } while(0) #define ENCODE_BYTES(l, s) \ do { \ u32 _l = (l); \ memcpy(xdr, (s), _l); \ if (_l & 3) \ memcpy((u8 *)xdr + _l, &zero, 4 - (_l & 3)); \ xdr += (_l + 3) >> 2; \ } while(0) #define ENCODE64(x) \ do { \ __be64 y = cpu_to_be64(x); \ memcpy(xdr, &y, 8); \ xdr += 8 >> 2; \ } while(0) #define ENCODE_STR(s) \ do { \ const char *_s = (s); \ ENCODE_DATA(strlen(_s), _s); \ } while(0) ENCODE(0); /* flags */ ENCODE_DATA(cnlen, key->description + 4); /* cellname */ ENCODE(ntoks); tok = 0; for (token = key->payload.data[0]; token; token = token->next) { toksize = toksizes[tok++]; ENCODE(toksize); oldxdr = xdr; ENCODE(token->security_index); switch (token->security_index) { case RXRPC_SECURITY_RXKAD: ENCODE(token->kad->vice_id); ENCODE(token->kad->kvno); ENCODE_BYTES(8, token->kad->session_key); ENCODE(token->kad->start); ENCODE(token->kad->expiry); ENCODE(token->kad->primary_flag); if (token->no_leak_key) ENCODE(0); else ENCODE_DATA(token->kad->ticket_len, token->kad->ticket); break; case RXRPC_SECURITY_YFS_RXGK: ENCODE64(token->rxgk->begintime); ENCODE64(token->rxgk->endtime); ENCODE64(token->rxgk->level); ENCODE64(token->rxgk->lifetime); ENCODE64(token->rxgk->bytelife); ENCODE64(token->rxgk->enctype); if (token->no_leak_key) ENCODE(0); else ENCODE_DATA(token->rxgk->key.len, token->rxgk->key.data); ENCODE_DATA(token->rxgk->ticket.len, token->rxgk->ticket.data); break; default: pr_err("Unsupported key token type (%u)\n", token->security_index); return -ENOPKG; } if (WARN_ON((unsigned long)xdr - (unsigned long)oldxdr != toksize)) return -EIO; } #undef ENCODE_STR #undef ENCODE_DATA #undef ENCODE64 #undef ENCODE if (WARN_ON(tok != ntoks)) return -EIO; if (WARN_ON((unsigned long)xdr - (unsigned long)buffer != size)) return -EIO; _leave(" = %zu", size); return size; } |
| 7 2 7 2 11 11 11 11 10 10 11 11 11 3 11 11 10 4 4 4 4 4 4 2 2 2 2 2 2 2 7 7 7 7 7 11 4 4 4 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 | /* * Non-physical true random number generator based on timing jitter -- * Linux Kernel Crypto API specific code * * Copyright Stephan Mueller <smueller@chronox.de>, 2015 - 2023 * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, and the entire permission notice in its entirety, * including the disclaimer of warranties. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. The name of the author may not be used to endorse or promote * products derived from this software without specific prior * written permission. * * ALTERNATIVELY, this product may be distributed under the terms of * the GNU General Public License, in which case the provisions of the GPL2 are * required INSTEAD OF the above restrictions. (This clause is * necessary due to a potential bad interaction between the GPL and * the restrictions contained in a BSD-style copyright.) * * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, ALL OF * WHICH ARE HEREBY DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT * OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE * USE OF THIS SOFTWARE, EVEN IF NOT ADVISED OF THE POSSIBILITY OF SUCH * DAMAGE. */ #include <crypto/hash.h> #include <crypto/sha3.h> #include <linux/fips.h> #include <linux/kernel.h> #include <linux/module.h> #include <linux/slab.h> #include <linux/time.h> #include <crypto/internal/rng.h> #include "jitterentropy.h" #define JENT_CONDITIONING_HASH "sha3-256" /*************************************************************************** * Helper function ***************************************************************************/ void *jent_kvzalloc(unsigned int len) { return kvzalloc(len, GFP_KERNEL); } void jent_kvzfree(void *ptr, unsigned int len) { kvfree_sensitive(ptr, len); } void *jent_zalloc(unsigned int len) { return kzalloc(len, GFP_KERNEL); } void jent_zfree(void *ptr) { kfree_sensitive(ptr); } /* * Obtain a high-resolution time stamp value. The time stamp is used to measure * the execution time of a given code path and its variations. Hence, the time * stamp must have a sufficiently high resolution. * * Note, if the function returns zero because a given architecture does not * implement a high-resolution time stamp, the RNG code's runtime test * will detect it and will not produce output. */ void jent_get_nstime(__u64 *out) { __u64 tmp = 0; tmp = random_get_entropy(); /* * If random_get_entropy does not return a value, i.e. it is not * implemented for a given architecture, use a clock source. * hoping that there are timers we can work with. */ if (tmp == 0) tmp = ktime_get_ns(); *out = tmp; jent_raw_hires_entropy_store(tmp); } int jent_hash_time(void *hash_state, __u64 time, u8 *addtl, unsigned int addtl_len, __u64 hash_loop_cnt, unsigned int stuck) { struct shash_desc *hash_state_desc = (struct shash_desc *)hash_state; SHASH_DESC_ON_STACK(desc, hash_state_desc->tfm); u8 intermediary[SHA3_256_DIGEST_SIZE]; __u64 j = 0; int ret; desc->tfm = hash_state_desc->tfm; if (sizeof(intermediary) != crypto_shash_digestsize(desc->tfm)) { pr_warn_ratelimited("Unexpected digest size\n"); return -EINVAL; } kmsan_unpoison_memory(intermediary, sizeof(intermediary)); /* * This loop fills a buffer which is injected into the entropy pool. * The main reason for this loop is to execute something over which we * can perform a timing measurement. The injection of the resulting * data into the pool is performed to ensure the result is used and * the compiler cannot optimize the loop away in case the result is not * used at all. Yet that data is considered "additional information" * considering the terminology from SP800-90A without any entropy. * * Note, it does not matter which or how much data you inject, we are * interested in one Keccack1600 compression operation performed with * the crypto_shash_final. */ for (j = 0; j < hash_loop_cnt; j++) { ret = crypto_shash_init(desc) ?: crypto_shash_update(desc, intermediary, sizeof(intermediary)) ?: crypto_shash_finup(desc, addtl, addtl_len, intermediary); if (ret) goto err; } /* * Inject the data from the previous loop into the pool. This data is * not considered to contain any entropy, but it stirs the pool a bit. */ ret = crypto_shash_update(hash_state_desc, intermediary, sizeof(intermediary)); if (ret) goto err; /* * Insert the time stamp into the hash context representing the pool. * * If the time stamp is stuck, do not finally insert the value into the * entropy pool. Although this operation should not do any harm even * when the time stamp has no entropy, SP800-90B requires that any * conditioning operation to have an identical amount of input data * according to section 3.1.5. */ if (stuck) { time = 0; } ret = crypto_shash_update(hash_state_desc, (u8 *)&time, sizeof(__u64)); err: shash_desc_zero(desc); memzero_explicit(intermediary, sizeof(intermediary)); return ret; } int jent_read_random_block(void *hash_state, char *dst, unsigned int dst_len) { struct shash_desc *hash_state_desc = (struct shash_desc *)hash_state; u8 jent_block[SHA3_256_DIGEST_SIZE]; /* Obtain data from entropy pool and re-initialize it */ int ret = crypto_shash_final(hash_state_desc, jent_block) ?: crypto_shash_init(hash_state_desc) ?: crypto_shash_update(hash_state_desc, jent_block, sizeof(jent_block)); if (!ret && dst_len) memcpy(dst, jent_block, dst_len); memzero_explicit(jent_block, sizeof(jent_block)); return ret; } /*************************************************************************** * Kernel crypto API interface ***************************************************************************/ struct jitterentropy { spinlock_t jent_lock; struct rand_data *entropy_collector; struct crypto_shash *tfm; struct shash_desc *sdesc; }; static void jent_kcapi_cleanup(struct crypto_tfm *tfm) { struct jitterentropy *rng = crypto_tfm_ctx(tfm); spin_lock(&rng->jent_lock); if (rng->sdesc) { shash_desc_zero(rng->sdesc); kfree(rng->sdesc); } rng->sdesc = NULL; if (rng->tfm) crypto_free_shash(rng->tfm); rng->tfm = NULL; if (rng->entropy_collector) jent_entropy_collector_free(rng->entropy_collector); rng->entropy_collector = NULL; spin_unlock(&rng->jent_lock); } static int jent_kcapi_init(struct crypto_tfm *tfm) { struct jitterentropy *rng = crypto_tfm_ctx(tfm); struct crypto_shash *hash; struct shash_desc *sdesc; int size, ret = 0; spin_lock_init(&rng->jent_lock); /* Use SHA3-256 as conditioner */ hash = crypto_alloc_shash(JENT_CONDITIONING_HASH, 0, 0); if (IS_ERR(hash)) { pr_err("Cannot allocate conditioning digest\n"); return PTR_ERR(hash); } rng->tfm = hash; size = sizeof(struct shash_desc) + crypto_shash_descsize(hash); sdesc = kmalloc(size, GFP_KERNEL); if (!sdesc) { ret = -ENOMEM; goto err; } sdesc->tfm = hash; crypto_shash_init(sdesc); rng->sdesc = sdesc; rng->entropy_collector = jent_entropy_collector_alloc(CONFIG_CRYPTO_JITTERENTROPY_OSR, 0, sdesc); if (!rng->entropy_collector) { ret = -ENOMEM; goto err; } spin_lock_init(&rng->jent_lock); return 0; err: jent_kcapi_cleanup(tfm); return ret; } static int jent_kcapi_random(struct crypto_rng *tfm, const u8 *src, unsigned int slen, u8 *rdata, unsigned int dlen) { struct jitterentropy *rng = crypto_rng_ctx(tfm); int ret = 0; spin_lock(&rng->jent_lock); ret = jent_read_entropy(rng->entropy_collector, rdata, dlen); if (ret == -3) { /* Handle permanent health test error */ /* * If the kernel was booted with fips=1, it implies that * the entire kernel acts as a FIPS 140 module. In this case * an SP800-90B permanent health test error is treated as * a FIPS module error. */ if (fips_enabled) panic("Jitter RNG permanent health test failure\n"); pr_err("Jitter RNG permanent health test failure\n"); ret = -EFAULT; } else if (ret == -2) { /* Handle intermittent health test error */ pr_warn_ratelimited("Reset Jitter RNG due to intermittent health test failure\n"); ret = -EAGAIN; } else if (ret == -1) { /* Handle other errors */ ret = -EINVAL; } spin_unlock(&rng->jent_lock); return ret; } static int jent_kcapi_reset(struct crypto_rng *tfm, const u8 *seed, unsigned int slen) { return 0; } static struct rng_alg jent_alg = { .generate = jent_kcapi_random, .seed = jent_kcapi_reset, .seedsize = 0, .base = { .cra_name = "jitterentropy_rng", .cra_driver_name = "jitterentropy_rng", .cra_priority = 100, .cra_ctxsize = sizeof(struct jitterentropy), .cra_module = THIS_MODULE, .cra_init = jent_kcapi_init, .cra_exit = jent_kcapi_cleanup, } }; static int __init jent_mod_init(void) { SHASH_DESC_ON_STACK(desc, tfm); struct crypto_shash *tfm; int ret = 0; jent_testing_init(); tfm = crypto_alloc_shash(JENT_CONDITIONING_HASH, 0, 0); if (IS_ERR(tfm)) { jent_testing_exit(); return PTR_ERR(tfm); } desc->tfm = tfm; crypto_shash_init(desc); ret = jent_entropy_init(CONFIG_CRYPTO_JITTERENTROPY_OSR, 0, desc, NULL); shash_desc_zero(desc); crypto_free_shash(tfm); if (ret) { /* Handle permanent health test error */ if (fips_enabled) panic("jitterentropy: Initialization failed with host not compliant with requirements: %d\n", ret); jent_testing_exit(); pr_info("jitterentropy: Initialization failed with host not compliant with requirements: %d\n", ret); return -EFAULT; } return crypto_register_rng(&jent_alg); } static void __exit jent_mod_exit(void) { jent_testing_exit(); crypto_unregister_rng(&jent_alg); } module_init(jent_mod_init); module_exit(jent_mod_exit); MODULE_LICENSE("Dual BSD/GPL"); MODULE_AUTHOR("Stephan Mueller <smueller@chronox.de>"); MODULE_DESCRIPTION("Non-physical True Random Number Generator based on CPU Jitter"); MODULE_ALIAS_CRYPTO("jitterentropy_rng"); |
| 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 | /* SPDX-License-Identifier: MIT */ /* * The VGA aribiter manages VGA space routing and VGA resource decode to * allow multiple VGA devices to be used in a system in a safe way. * * (C) Copyright 2005 Benjamin Herrenschmidt <benh@kernel.crashing.org> * (C) Copyright 2007 Paulo R. Zanoni <przanoni@gmail.com> * (C) Copyright 2007, 2009 Tiago Vignatti <vignatti@freedesktop.org> */ #ifndef LINUX_VGA_H #define LINUX_VGA_H #include <video/vga.h> struct pci_dev; /* Legacy VGA regions */ #define VGA_RSRC_NONE 0x00 #define VGA_RSRC_LEGACY_IO 0x01 #define VGA_RSRC_LEGACY_MEM 0x02 #define VGA_RSRC_LEGACY_MASK (VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM) /* Non-legacy access */ #define VGA_RSRC_NORMAL_IO 0x04 #define VGA_RSRC_NORMAL_MEM 0x08 #ifdef CONFIG_VGA_ARB void vga_set_legacy_decoding(struct pci_dev *pdev, unsigned int decodes); int vga_get(struct pci_dev *pdev, unsigned int rsrc, int interruptible); void vga_put(struct pci_dev *pdev, unsigned int rsrc); struct pci_dev *vga_default_device(void); void vga_set_default_device(struct pci_dev *pdev); int vga_remove_vgacon(struct pci_dev *pdev); int vga_client_register(struct pci_dev *pdev, unsigned int (*set_decode)(struct pci_dev *pdev, bool state)); #else /* CONFIG_VGA_ARB */ static inline void vga_set_legacy_decoding(struct pci_dev *pdev, unsigned int decodes) { }; static inline int vga_get(struct pci_dev *pdev, unsigned int rsrc, int interruptible) { return 0; } static inline void vga_put(struct pci_dev *pdev, unsigned int rsrc) { } static inline struct pci_dev *vga_default_device(void) { return NULL; } static inline void vga_set_default_device(struct pci_dev *pdev) { } static inline int vga_remove_vgacon(struct pci_dev *pdev) { return 0; } static inline int vga_client_register(struct pci_dev *pdev, unsigned int (*set_decode)(struct pci_dev *pdev, bool state)) { return 0; } #endif /* CONFIG_VGA_ARB */ /** * vga_get_interruptible * @pdev: pci device of the VGA card or NULL for the system default * @rsrc: bit mask of resources to acquire and lock * * Shortcut to vga_get with interruptible set to true. * * On success, release the VGA resource again with vga_put(). */ static inline int vga_get_interruptible(struct pci_dev *pdev, unsigned int rsrc) { return vga_get(pdev, rsrc, 1); } /** * vga_get_uninterruptible - shortcut to vga_get() * @pdev: pci device of the VGA card or NULL for the system default * @rsrc: bit mask of resources to acquire and lock * * Shortcut to vga_get with interruptible set to false. * * On success, release the VGA resource again with vga_put(). */ static inline int vga_get_uninterruptible(struct pci_dev *pdev, unsigned int rsrc) { return vga_get(pdev, rsrc, 0); } static inline void vga_client_unregister(struct pci_dev *pdev) { vga_client_register(pdev, NULL); } #endif /* LINUX_VGA_H */ |
| 5 1 34 35 34 1 34 5 5 5 5 8 1 404 405 403 405 404 404 405 424 405 427 232 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 | // SPDX-License-Identifier: GPL-2.0-only /* -*- linux-c -*- * sysctl_net.c: sysctl interface to net subsystem. * * Begun April 1, 1996, Mike Shaver. * Added /proc/sys/net directories for each protocol family. [MS] * * Revision 1.2 1996/05/08 20:24:40 shaver * Added bits for NET_BRIDGE and the NET_IPV4_ARP stuff and * NET_IPV4_IP_FORWARD. * * */ #include <linux/mm.h> #include <linux/export.h> #include <linux/sysctl.h> #include <linux/nsproxy.h> #include <net/sock.h> #ifdef CONFIG_INET #include <net/ip.h> #endif #ifdef CONFIG_NET #include <linux/if_ether.h> #endif static struct ctl_table_set * net_ctl_header_lookup(struct ctl_table_root *root) { return ¤t->nsproxy->net_ns->sysctls; } static int is_seen(struct ctl_table_set *set) { return ¤t->nsproxy->net_ns->sysctls == set; } /* Return standard mode bits for table entry. */ static int net_ctl_permissions(struct ctl_table_header *head, const struct ctl_table *table) { struct net *net = container_of(head->set, struct net, sysctls); /* Allow network administrator to have same access as root. */ if (ns_capable_noaudit(net->user_ns, CAP_NET_ADMIN)) { int mode = (table->mode >> 6) & 7; return (mode << 6) | (mode << 3) | mode; } return table->mode; } static void net_ctl_set_ownership(struct ctl_table_header *head, kuid_t *uid, kgid_t *gid) { struct net *net = container_of(head->set, struct net, sysctls); kuid_t ns_root_uid; kgid_t ns_root_gid; ns_root_uid = make_kuid(net->user_ns, 0); if (uid_valid(ns_root_uid)) *uid = ns_root_uid; ns_root_gid = make_kgid(net->user_ns, 0); if (gid_valid(ns_root_gid)) *gid = ns_root_gid; } static struct ctl_table_root net_sysctl_root = { .lookup = net_ctl_header_lookup, .permissions = net_ctl_permissions, .set_ownership = net_ctl_set_ownership, }; static int __net_init sysctl_net_init(struct net *net) { setup_sysctl_set(&net->sysctls, &net_sysctl_root, is_seen); return 0; } static void __net_exit sysctl_net_exit(struct net *net) { retire_sysctl_set(&net->sysctls); } static struct pernet_operations sysctl_pernet_ops = { .init = sysctl_net_init, .exit = sysctl_net_exit, }; static struct ctl_table_header *net_header; __init int net_sysctl_init(void) { static struct ctl_table empty[1]; int ret = -ENOMEM; /* Avoid limitations in the sysctl implementation by * registering "/proc/sys/net" as an empty directory not in a * network namespace. */ net_header = register_sysctl_sz("net", empty, 0); if (!net_header) goto out; ret = register_pernet_subsys(&sysctl_pernet_ops); if (ret) goto out1; out: return ret; out1: unregister_sysctl_table(net_header); net_header = NULL; goto out; } /* Verify that sysctls for non-init netns are safe by either: * 1) being read-only, or * 2) having a data pointer which points outside of the global kernel/module * data segment, and rather into the heap where a per-net object was * allocated. */ static void ensure_safe_net_sysctl(struct net *net, const char *path, struct ctl_table *table, size_t table_size) { struct ctl_table *ent; pr_debug("Registering net sysctl (net %p): %s\n", net, path); ent = table; for (size_t i = 0; i < table_size; ent++, i++) { unsigned long addr; const char *where; pr_debug(" procname=%s mode=%o proc_handler=%ps data=%p\n", ent->procname, ent->mode, ent->proc_handler, ent->data); /* If it's not writable inside the netns, then it can't hurt. */ if ((ent->mode & 0222) == 0) { pr_debug(" Not writable by anyone\n"); continue; } /* Where does data point? */ addr = (unsigned long)ent->data; if (is_module_address(addr)) where = "module"; else if (is_kernel_core_data(addr)) where = "kernel"; else continue; /* If it is writable and points to kernel/module global * data, then it's probably a netns leak. */ WARN(1, "sysctl %s/%s: data points to %s global data: %ps\n", path, ent->procname, where, ent->data); /* Make it "safe" by dropping writable perms */ ent->mode &= ~0222; } } struct ctl_table_header *register_net_sysctl_sz(struct net *net, const char *path, struct ctl_table *table, size_t table_size) { if (!net_eq(net, &init_net)) ensure_safe_net_sysctl(net, path, table, table_size); return __register_sysctl_table(&net->sysctls, path, table, table_size); } EXPORT_SYMBOL_GPL(register_net_sysctl_sz); void unregister_net_sysctl_table(struct ctl_table_header *header) { unregister_sysctl_table(header); } EXPORT_SYMBOL_GPL(unregister_net_sysctl_table); |
| 4 4 4 4 4 4 4 4 4 1 4 4 1 1 1 1 4 4 4 4 4 4 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 | // SPDX-License-Identifier: GPL-2.0-only #include "netlink.h" #include "common.h" #include <linux/phy.h> #include <linux/phylib_stubs.h> struct linkstate_req_info { struct ethnl_req_info base; }; struct linkstate_reply_data { struct ethnl_reply_data base; int link; int sqi; int sqi_max; struct ethtool_link_ext_stats link_stats; bool link_ext_state_provided; struct ethtool_link_ext_state_info ethtool_link_ext_state_info; }; #define LINKSTATE_REPDATA(__reply_base) \ container_of(__reply_base, struct linkstate_reply_data, base) const struct nla_policy ethnl_linkstate_get_policy[] = { [ETHTOOL_A_LINKSTATE_HEADER] = NLA_POLICY_NESTED(ethnl_header_policy_stats), }; static int linkstate_get_sqi(struct phy_device *phydev) { int ret; if (!phydev) return -EOPNOTSUPP; mutex_lock(&phydev->lock); if (!phydev->drv || !phydev->drv->get_sqi) ret = -EOPNOTSUPP; else if (!phydev->link) ret = -ENETDOWN; else ret = phydev->drv->get_sqi(phydev); mutex_unlock(&phydev->lock); return ret; } static int linkstate_get_sqi_max(struct phy_device *phydev) { int ret; if (!phydev) return -EOPNOTSUPP; mutex_lock(&phydev->lock); if (!phydev->drv || !phydev->drv->get_sqi_max) ret = -EOPNOTSUPP; else if (!phydev->link) ret = -ENETDOWN; else ret = phydev->drv->get_sqi_max(phydev); mutex_unlock(&phydev->lock); return ret; }; static bool linkstate_sqi_critical_error(int sqi) { return sqi < 0 && sqi != -EOPNOTSUPP && sqi != -ENETDOWN; } static bool linkstate_sqi_valid(struct linkstate_reply_data *data) { return data->sqi >= 0 && data->sqi_max >= 0 && data->sqi <= data->sqi_max; } static int linkstate_get_link_ext_state(struct net_device *dev, struct linkstate_reply_data *data) { int err; if (!dev->ethtool_ops->get_link_ext_state) return -EOPNOTSUPP; err = dev->ethtool_ops->get_link_ext_state(dev, &data->ethtool_link_ext_state_info); if (err) return err; data->link_ext_state_provided = true; return 0; } static int linkstate_prepare_data(const struct ethnl_req_info *req_base, struct ethnl_reply_data *reply_base, const struct genl_info *info) { struct linkstate_reply_data *data = LINKSTATE_REPDATA(reply_base); struct net_device *dev = reply_base->dev; struct nlattr **tb = info->attrs; struct phy_device *phydev; int ret; phydev = ethnl_req_get_phydev(req_base, tb, ETHTOOL_A_LINKSTATE_HEADER, info->extack); if (IS_ERR(phydev)) { ret = PTR_ERR(phydev); goto out; } ret = ethnl_ops_begin(dev); if (ret < 0) return ret; data->link = __ethtool_get_link(dev); ret = linkstate_get_sqi(phydev); if (linkstate_sqi_critical_error(ret)) goto out; data->sqi = ret; ret = linkstate_get_sqi_max(phydev); if (linkstate_sqi_critical_error(ret)) goto out; data->sqi_max = ret; if (dev->flags & IFF_UP) { ret = linkstate_get_link_ext_state(dev, data); if (ret < 0 && ret != -EOPNOTSUPP && ret != -ENODATA) goto out; } ethtool_stats_init((u64 *)&data->link_stats, sizeof(data->link_stats) / 8); if (req_base->flags & ETHTOOL_FLAG_STATS) { if (phydev) phy_ethtool_get_link_ext_stats(phydev, &data->link_stats); if (dev->ethtool_ops->get_link_ext_stats) dev->ethtool_ops->get_link_ext_stats(dev, &data->link_stats); } ret = 0; out: ethnl_ops_complete(dev); return ret; } static int linkstate_reply_size(const struct ethnl_req_info *req_base, const struct ethnl_reply_data *reply_base) { struct linkstate_reply_data *data = LINKSTATE_REPDATA(reply_base); int len; len = nla_total_size(sizeof(u8)) /* LINKSTATE_LINK */ + 0; if (linkstate_sqi_valid(data)) { len += nla_total_size(sizeof(u32)); /* LINKSTATE_SQI */ len += nla_total_size(sizeof(u32)); /* LINKSTATE_SQI_MAX */ } if (data->link_ext_state_provided) len += nla_total_size(sizeof(u8)); /* LINKSTATE_EXT_STATE */ if (data->ethtool_link_ext_state_info.__link_ext_substate) len += nla_total_size(sizeof(u8)); /* LINKSTATE_EXT_SUBSTATE */ if (data->link_stats.link_down_events != ETHTOOL_STAT_NOT_SET) len += nla_total_size(sizeof(u32)); return len; } static int linkstate_fill_reply(struct sk_buff *skb, const struct ethnl_req_info *req_base, const struct ethnl_reply_data *reply_base) { struct linkstate_reply_data *data = LINKSTATE_REPDATA(reply_base); if (data->link >= 0 && nla_put_u8(skb, ETHTOOL_A_LINKSTATE_LINK, !!data->link)) return -EMSGSIZE; if (linkstate_sqi_valid(data)) { if (nla_put_u32(skb, ETHTOOL_A_LINKSTATE_SQI, data->sqi)) return -EMSGSIZE; if (nla_put_u32(skb, ETHTOOL_A_LINKSTATE_SQI_MAX, data->sqi_max)) return -EMSGSIZE; } if (data->link_ext_state_provided) { if (nla_put_u8(skb, ETHTOOL_A_LINKSTATE_EXT_STATE, data->ethtool_link_ext_state_info.link_ext_state)) return -EMSGSIZE; if (data->ethtool_link_ext_state_info.__link_ext_substate && nla_put_u8(skb, ETHTOOL_A_LINKSTATE_EXT_SUBSTATE, data->ethtool_link_ext_state_info.__link_ext_substate)) return -EMSGSIZE; } if (data->link_stats.link_down_events != ETHTOOL_STAT_NOT_SET) if (nla_put_u32(skb, ETHTOOL_A_LINKSTATE_EXT_DOWN_CNT, data->link_stats.link_down_events)) return -EMSGSIZE; return 0; } const struct ethnl_request_ops ethnl_linkstate_request_ops = { .request_cmd = ETHTOOL_MSG_LINKSTATE_GET, .reply_cmd = ETHTOOL_MSG_LINKSTATE_GET_REPLY, .hdr_attr = ETHTOOL_A_LINKSTATE_HEADER, .req_info_size = sizeof(struct linkstate_req_info), .reply_data_size = sizeof(struct linkstate_reply_data), .prepare_data = linkstate_prepare_data, .reply_size = linkstate_reply_size, .fill_reply = linkstate_fill_reply, }; |
| 10 10 10 10 6 6 10 2 2 2 2 2 8 8 8 8 8 10 10 10 10 10 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 | // SPDX-License-Identifier: GPL-2.0+ /* * Copyright (C) 2012 by Alan Stern */ /* This file is part of ehci-hcd.c */ /*-------------------------------------------------------------------------*/ /* Set a bit in the USBCMD register */ static void ehci_set_command_bit(struct ehci_hcd *ehci, u32 bit) { ehci->command |= bit; ehci_writel(ehci, ehci->command, &ehci->regs->command); /* unblock posted write */ ehci_readl(ehci, &ehci->regs->command); } /* Clear a bit in the USBCMD register */ static void ehci_clear_command_bit(struct ehci_hcd *ehci, u32 bit) { ehci->command &= ~bit; ehci_writel(ehci, ehci->command, &ehci->regs->command); /* unblock posted write */ ehci_readl(ehci, &ehci->regs->command); } /*-------------------------------------------------------------------------*/ /* * EHCI timer support... Now using hrtimers. * * Lots of different events are triggered from ehci->hrtimer. Whenever * the timer routine runs, it checks each possible event; events that are * currently enabled and whose expiration time has passed get handled. * The set of enabled events is stored as a collection of bitflags in * ehci->enabled_hrtimer_events, and they are numbered in order of * increasing delay values (ranging between 1 ms and 100 ms). * * Rather than implementing a sorted list or tree of all pending events, * we keep track only of the lowest-numbered pending event, in * ehci->next_hrtimer_event. Whenever ehci->hrtimer gets restarted, its * expiration time is set to the timeout value for this event. * * As a result, events might not get handled right away; the actual delay * could be anywhere up to twice the requested delay. This doesn't * matter, because none of the events are especially time-critical. The * ones that matter most all have a delay of 1 ms, so they will be * handled after 2 ms at most, which is okay. In addition to this, we * allow for an expiration range of 1 ms. */ /* * Delay lengths for the hrtimer event types. * Keep this list sorted by delay length, in the same order as * the event types indexed by enum ehci_hrtimer_event in ehci.h. */ static unsigned event_delays_ns[] = { 1 * NSEC_PER_MSEC, /* EHCI_HRTIMER_POLL_ASS */ 1 * NSEC_PER_MSEC, /* EHCI_HRTIMER_POLL_PSS */ 1 * NSEC_PER_MSEC, /* EHCI_HRTIMER_POLL_DEAD */ 1125 * NSEC_PER_USEC, /* EHCI_HRTIMER_UNLINK_INTR */ 2 * NSEC_PER_MSEC, /* EHCI_HRTIMER_FREE_ITDS */ 2 * NSEC_PER_MSEC, /* EHCI_HRTIMER_ACTIVE_UNLINK */ 5 * NSEC_PER_MSEC, /* EHCI_HRTIMER_START_UNLINK_INTR */ 6 * NSEC_PER_MSEC, /* EHCI_HRTIMER_ASYNC_UNLINKS */ 10 * NSEC_PER_MSEC, /* EHCI_HRTIMER_IAA_WATCHDOG */ 10 * NSEC_PER_MSEC, /* EHCI_HRTIMER_DISABLE_PERIODIC */ 15 * NSEC_PER_MSEC, /* EHCI_HRTIMER_DISABLE_ASYNC */ 100 * NSEC_PER_MSEC, /* EHCI_HRTIMER_IO_WATCHDOG */ }; /* Enable a pending hrtimer event */ static void ehci_enable_event(struct ehci_hcd *ehci, unsigned event, bool resched) { ktime_t *timeout = &ehci->hr_timeouts[event]; if (resched) *timeout = ktime_add(ktime_get(), event_delays_ns[event]); ehci->enabled_hrtimer_events |= (1 << event); /* Track only the lowest-numbered pending event */ if (event < ehci->next_hrtimer_event) { ehci->next_hrtimer_event = event; hrtimer_start_range_ns(&ehci->hrtimer, *timeout, NSEC_PER_MSEC, HRTIMER_MODE_ABS); } } /* Poll the STS_ASS status bit; see when it agrees with CMD_ASE */ static void ehci_poll_ASS(struct ehci_hcd *ehci) { unsigned actual, want; /* Don't enable anything if the controller isn't running (e.g., died) */ if (ehci->rh_state != EHCI_RH_RUNNING) return; want = (ehci->command & CMD_ASE) ? STS_ASS : 0; actual = ehci_readl(ehci, &ehci->regs->status) & STS_ASS; if (want != actual) { /* Poll again later, but give up after about 2-4 ms */ if (ehci->ASS_poll_count++ < 2) { ehci_enable_event(ehci, EHCI_HRTIMER_POLL_ASS, true); return; } ehci_dbg(ehci, "Waited too long for the async schedule status (%x/%x), giving up\n", want, actual); } ehci->ASS_poll_count = 0; /* The status is up-to-date; restart or stop the schedule as needed */ if (want == 0) { /* Stopped */ if (ehci->async_count > 0) ehci_set_command_bit(ehci, CMD_ASE); } else { /* Running */ if (ehci->async_count == 0) { /* Turn off the schedule after a while */ ehci_enable_event(ehci, EHCI_HRTIMER_DISABLE_ASYNC, true); } } } /* Turn off the async schedule after a brief delay */ static void ehci_disable_ASE(struct ehci_hcd *ehci) { ehci_clear_command_bit(ehci, CMD_ASE); } /* Poll the STS_PSS status bit; see when it agrees with CMD_PSE */ static void ehci_poll_PSS(struct ehci_hcd *ehci) { unsigned actual, want; /* Don't do anything if the controller isn't running (e.g., died) */ if (ehci->rh_state != EHCI_RH_RUNNING) return; want = (ehci->command & CMD_PSE) ? STS_PSS : 0; actual = ehci_readl(ehci, &ehci->regs->status) & STS_PSS; if (want != actual) { /* Poll again later, but give up after about 2-4 ms */ if (ehci->PSS_poll_count++ < 2) { ehci_enable_event(ehci, EHCI_HRTIMER_POLL_PSS, true); return; } ehci_dbg(ehci, "Waited too long for the periodic schedule status (%x/%x), giving up\n", want, actual); } ehci->PSS_poll_count = 0; /* The status is up-to-date; restart or stop the schedule as needed */ if (want == 0) { /* Stopped */ if (ehci->periodic_count > 0) ehci_set_command_bit(ehci, CMD_PSE); } else { /* Running */ if (ehci->periodic_count == 0) { /* Turn off the schedule after a while */ ehci_enable_event(ehci, EHCI_HRTIMER_DISABLE_PERIODIC, true); } } } /* Turn off the periodic schedule after a brief delay */ static void ehci_disable_PSE(struct ehci_hcd *ehci) { ehci_clear_command_bit(ehci, CMD_PSE); } /* Poll the STS_HALT status bit; see when a dead controller stops */ static void ehci_handle_controller_death(struct ehci_hcd *ehci) { if (!(ehci_readl(ehci, &ehci->regs->status) & STS_HALT)) { /* Give up after a few milliseconds */ if (ehci->died_poll_count++ < 5) { /* Try again later */ ehci_enable_event(ehci, EHCI_HRTIMER_POLL_DEAD, true); return; } ehci_warn(ehci, "Waited too long for the controller to stop, giving up\n"); } /* Clean up the mess */ ehci->rh_state = EHCI_RH_HALTED; ehci_writel(ehci, 0, &ehci->regs->configured_flag); ehci_writel(ehci, 0, &ehci->regs->intr_enable); ehci_work(ehci); end_unlink_async(ehci); /* Not in process context, so don't try to reset the controller */ } /* start to unlink interrupt QHs */ static void ehci_handle_start_intr_unlinks(struct ehci_hcd *ehci) { bool stopped = (ehci->rh_state < EHCI_RH_RUNNING); /* * Process all the QHs on the intr_unlink list that were added * before the current unlink cycle began. The list is in * temporal order, so stop when we reach the first entry in the * current cycle. But if the root hub isn't running then * process all the QHs on the list. */ while (!list_empty(&ehci->intr_unlink_wait)) { struct ehci_qh *qh; qh = list_first_entry(&ehci->intr_unlink_wait, struct ehci_qh, unlink_node); if (!stopped && (qh->unlink_cycle == ehci->intr_unlink_wait_cycle)) break; list_del_init(&qh->unlink_node); qh->unlink_reason |= QH_UNLINK_QUEUE_EMPTY; start_unlink_intr(ehci, qh); } /* Handle remaining entries later */ if (!list_empty(&ehci->intr_unlink_wait)) { ehci_enable_event(ehci, EHCI_HRTIMER_START_UNLINK_INTR, true); ++ehci->intr_unlink_wait_cycle; } } /* Handle unlinked interrupt QHs once they are gone from the hardware */ static void ehci_handle_intr_unlinks(struct ehci_hcd *ehci) { bool stopped = (ehci->rh_state < EHCI_RH_RUNNING); /* * Process all the QHs on the intr_unlink list that were added * before the current unlink cycle began. The list is in * temporal order, so stop when we reach the first entry in the * current cycle. But if the root hub isn't running then * process all the QHs on the list. */ ehci->intr_unlinking = true; while (!list_empty(&ehci->intr_unlink)) { struct ehci_qh *qh; qh = list_first_entry(&ehci->intr_unlink, struct ehci_qh, unlink_node); if (!stopped && qh->unlink_cycle == ehci->intr_unlink_cycle) break; list_del_init(&qh->unlink_node); end_unlink_intr(ehci, qh); } /* Handle remaining entries later */ if (!list_empty(&ehci->intr_unlink)) { ehci_enable_event(ehci, EHCI_HRTIMER_UNLINK_INTR, true); ++ehci->intr_unlink_cycle; } ehci->intr_unlinking = false; } /* Start another free-iTDs/siTDs cycle */ static void start_free_itds(struct ehci_hcd *ehci) { if (!(ehci->enabled_hrtimer_events & BIT(EHCI_HRTIMER_FREE_ITDS))) { ehci->last_itd_to_free = list_entry( ehci->cached_itd_list.prev, struct ehci_itd, itd_list); ehci->last_sitd_to_free = list_entry( ehci->cached_sitd_list.prev, struct ehci_sitd, sitd_list); ehci_enable_event(ehci, EHCI_HRTIMER_FREE_ITDS, true); } } /* Wait for controller to stop using old iTDs and siTDs */ static void end_free_itds(struct ehci_hcd *ehci) { struct ehci_itd *itd, *n; struct ehci_sitd *sitd, *sn; if (ehci->rh_state < EHCI_RH_RUNNING) { ehci->last_itd_to_free = NULL; ehci->last_sitd_to_free = NULL; } list_for_each_entry_safe(itd, n, &ehci->cached_itd_list, itd_list) { list_del(&itd->itd_list); dma_pool_free(ehci->itd_pool, itd, itd->itd_dma); if (itd == ehci->last_itd_to_free) break; } list_for_each_entry_safe(sitd, sn, &ehci->cached_sitd_list, sitd_list) { list_del(&sitd->sitd_list); dma_pool_free(ehci->sitd_pool, sitd, sitd->sitd_dma); if (sitd == ehci->last_sitd_to_free) break; } if (!list_empty(&ehci->cached_itd_list) || !list_empty(&ehci->cached_sitd_list)) start_free_itds(ehci); } /* Handle lost (or very late) IAA interrupts */ static void ehci_iaa_watchdog(struct ehci_hcd *ehci) { u32 cmd, status; /* * Lost IAA irqs wedge things badly; seen first with a vt8235. * So we need this watchdog, but must protect it against both * (a) SMP races against real IAA firing and retriggering, and * (b) clean HC shutdown, when IAA watchdog was pending. */ if (!ehci->iaa_in_progress || ehci->rh_state != EHCI_RH_RUNNING) return; /* If we get here, IAA is *REALLY* late. It's barely * conceivable that the system is so busy that CMD_IAAD * is still legitimately set, so let's be sure it's * clear before we read STS_IAA. (The HC should clear * CMD_IAAD when it sets STS_IAA.) */ cmd = ehci_readl(ehci, &ehci->regs->command); /* * If IAA is set here it either legitimately triggered * after the watchdog timer expired (_way_ late, so we'll * still count it as lost) ... or a silicon erratum: * - VIA seems to set IAA without triggering the IRQ; * - IAAD potentially cleared without setting IAA. */ status = ehci_readl(ehci, &ehci->regs->status); if ((status & STS_IAA) || !(cmd & CMD_IAAD)) { INCR(ehci->stats.lost_iaa); ehci_writel(ehci, STS_IAA, &ehci->regs->status); } ehci_dbg(ehci, "IAA watchdog: status %x cmd %x\n", status, cmd); end_iaa_cycle(ehci); } /* Enable the I/O watchdog, if appropriate */ static void turn_on_io_watchdog(struct ehci_hcd *ehci) { /* Not needed if the controller isn't running or it's already enabled */ if (ehci->rh_state != EHCI_RH_RUNNING || (ehci->enabled_hrtimer_events & BIT(EHCI_HRTIMER_IO_WATCHDOG))) return; /* * Isochronous transfers always need the watchdog. * For other sorts we use it only if the flag is set. */ if (ehci->isoc_count > 0 || (ehci->need_io_watchdog && ehci->async_count + ehci->intr_count > 0)) ehci_enable_event(ehci, EHCI_HRTIMER_IO_WATCHDOG, true); } /* * Handler functions for the hrtimer event types. * Keep this array in the same order as the event types indexed by * enum ehci_hrtimer_event in ehci.h. */ static void (*event_handlers[])(struct ehci_hcd *) = { ehci_poll_ASS, /* EHCI_HRTIMER_POLL_ASS */ ehci_poll_PSS, /* EHCI_HRTIMER_POLL_PSS */ ehci_handle_controller_death, /* EHCI_HRTIMER_POLL_DEAD */ ehci_handle_intr_unlinks, /* EHCI_HRTIMER_UNLINK_INTR */ end_free_itds, /* EHCI_HRTIMER_FREE_ITDS */ end_unlink_async, /* EHCI_HRTIMER_ACTIVE_UNLINK */ ehci_handle_start_intr_unlinks, /* EHCI_HRTIMER_START_UNLINK_INTR */ unlink_empty_async, /* EHCI_HRTIMER_ASYNC_UNLINKS */ ehci_iaa_watchdog, /* EHCI_HRTIMER_IAA_WATCHDOG */ ehci_disable_PSE, /* EHCI_HRTIMER_DISABLE_PERIODIC */ ehci_disable_ASE, /* EHCI_HRTIMER_DISABLE_ASYNC */ ehci_work, /* EHCI_HRTIMER_IO_WATCHDOG */ }; static enum hrtimer_restart ehci_hrtimer_func(struct hrtimer *t) { struct ehci_hcd *ehci = container_of(t, struct ehci_hcd, hrtimer); ktime_t now; unsigned long events; unsigned long flags; unsigned e; spin_lock_irqsave(&ehci->lock, flags); events = ehci->enabled_hrtimer_events; ehci->enabled_hrtimer_events = 0; ehci->next_hrtimer_event = EHCI_HRTIMER_NO_EVENT; /* * Check each pending event. If its time has expired, handle * the event; otherwise re-enable it. */ now = ktime_get(); for_each_set_bit(e, &events, EHCI_HRTIMER_NUM_EVENTS) { if (ktime_compare(now, ehci->hr_timeouts[e]) >= 0) event_handlers[e](ehci); else ehci_enable_event(ehci, e, false); } spin_unlock_irqrestore(&ehci->lock, flags); return HRTIMER_NORESTART; } |
| 4 2 5 22 22 21 24 2 1 21 22 24 22 3 22 22 1 22 12 12 46 61 15 15 15 15 21 17 4 17 17 10 14 34 33 1 33 6 6 27 33 11 37 37 37 1 2 3 2 2 49 46 3 3 2 1 45 49 6 4 6 22 22 3 3 3 3 16 2 16 2 2 17 2 17 2 2 2 35 36 2 2 2 14 14 2 2 13 13 2 2 2 1 2 2 2 1 2 39 38 4 2 1 1 3 3 3 39 3 3 3 3 38 4 37 2 2 2 18 13 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 | // SPDX-License-Identifier: GPL-2.0-or-later /* * Asynchronous Cryptographic Hash operations. * * This is the implementation of the ahash (asynchronous hash) API. It differs * from shash (synchronous hash) in that ahash supports asynchronous operations, * and it hashes data from scatterlists instead of virtually addressed buffers. * * The ahash API provides access to both ahash and shash algorithms. The shash * API only provides access to shash algorithms. * * Copyright (c) 2008 Loc Ho <lho@amcc.com> */ #include <crypto/scatterwalk.h> #include <linux/cryptouser.h> #include <linux/err.h> #include <linux/kernel.h> #include <linux/mm.h> #include <linux/module.h> #include <linux/scatterlist.h> #include <linux/slab.h> #include <linux/seq_file.h> #include <linux/string.h> #include <linux/string_choices.h> #include <net/netlink.h> #include "hash.h" #define CRYPTO_ALG_TYPE_AHASH_MASK 0x0000000e static int ahash_def_finup(struct ahash_request *req); static inline bool crypto_ahash_block_only(struct crypto_ahash *tfm) { return crypto_ahash_alg(tfm)->halg.base.cra_flags & CRYPTO_AHASH_ALG_BLOCK_ONLY; } static inline bool crypto_ahash_final_nonzero(struct crypto_ahash *tfm) { return crypto_ahash_alg(tfm)->halg.base.cra_flags & CRYPTO_AHASH_ALG_FINAL_NONZERO; } static inline bool crypto_ahash_need_fallback(struct crypto_ahash *tfm) { return crypto_ahash_alg(tfm)->halg.base.cra_flags & CRYPTO_ALG_NEED_FALLBACK; } static inline void ahash_op_done(void *data, int err, int (*finish)(struct ahash_request *, int)) { struct ahash_request *areq = data; crypto_completion_t compl; compl = areq->saved_complete; data = areq->saved_data; if (err == -EINPROGRESS) goto out; areq->base.flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP; err = finish(areq, err); if (err == -EINPROGRESS || err == -EBUSY) return; out: compl(data, err); } static int hash_walk_next(struct crypto_hash_walk *walk) { unsigned int offset = walk->offset; unsigned int nbytes = min(walk->entrylen, ((unsigned int)(PAGE_SIZE)) - offset); walk->data = kmap_local_page(walk->pg); walk->data += offset; walk->entrylen -= nbytes; return nbytes; } static int hash_walk_new_entry(struct crypto_hash_walk *walk) { struct scatterlist *sg; sg = walk->sg; walk->offset = sg->offset; walk->pg = sg_page(walk->sg) + (walk->offset >> PAGE_SHIFT); walk->offset = offset_in_page(walk->offset); walk->entrylen = sg->length; if (walk->entrylen > walk->total) walk->entrylen = walk->total; walk->total -= walk->entrylen; return hash_walk_next(walk); } int crypto_hash_walk_first(struct ahash_request *req, struct crypto_hash_walk *walk) { walk->total = req->nbytes; walk->entrylen = 0; if (!walk->total) return 0; walk->flags = req->base.flags; if (ahash_request_isvirt(req)) { walk->data = req->svirt; walk->total = 0; return req->nbytes; } walk->sg = req->src; return hash_walk_new_entry(walk); } EXPORT_SYMBOL_GPL(crypto_hash_walk_first); int crypto_hash_walk_done(struct crypto_hash_walk *walk, int err) { if ((walk->flags & CRYPTO_AHASH_REQ_VIRT)) return err; walk->data -= walk->offset; kunmap_local(walk->data); crypto_yield(walk->flags); if (err) return err; if (walk->entrylen) { walk->offset = 0; walk->pg++; return hash_walk_next(walk); } if (!walk->total) return 0; walk->sg = sg_next(walk->sg); return hash_walk_new_entry(walk); } EXPORT_SYMBOL_GPL(crypto_hash_walk_done); /* * For an ahash tfm that is using an shash algorithm (instead of an ahash * algorithm), this returns the underlying shash tfm. */ static inline struct crypto_shash *ahash_to_shash(struct crypto_ahash *tfm) { return *(struct crypto_shash **)crypto_ahash_ctx(tfm); } static inline struct shash_desc *prepare_shash_desc(struct ahash_request *req, struct crypto_ahash *tfm) { struct shash_desc *desc = ahash_request_ctx(req); desc->tfm = ahash_to_shash(tfm); return desc; } int shash_ahash_update(struct ahash_request *req, struct shash_desc *desc) { struct crypto_hash_walk walk; int nbytes; for (nbytes = crypto_hash_walk_first(req, &walk); nbytes > 0; nbytes = crypto_hash_walk_done(&walk, nbytes)) nbytes = crypto_shash_update(desc, walk.data, nbytes); return nbytes; } EXPORT_SYMBOL_GPL(shash_ahash_update); int shash_ahash_finup(struct ahash_request *req, struct shash_desc *desc) { struct crypto_hash_walk walk; int nbytes; nbytes = crypto_hash_walk_first(req, &walk); if (!nbytes) return crypto_shash_final(desc, req->result); do { nbytes = crypto_hash_walk_last(&walk) ? crypto_shash_finup(desc, walk.data, nbytes, req->result) : crypto_shash_update(desc, walk.data, nbytes); nbytes = crypto_hash_walk_done(&walk, nbytes); } while (nbytes > 0); return nbytes; } EXPORT_SYMBOL_GPL(shash_ahash_finup); int shash_ahash_digest(struct ahash_request *req, struct shash_desc *desc) { unsigned int nbytes = req->nbytes; struct scatterlist *sg; unsigned int offset; struct page *page; const u8 *data; int err; data = req->svirt; if (!nbytes || ahash_request_isvirt(req)) return crypto_shash_digest(desc, data, nbytes, req->result); sg = req->src; if (nbytes > sg->length) return crypto_shash_init(desc) ?: shash_ahash_finup(req, desc); page = sg_page(sg); offset = sg->offset; data = lowmem_page_address(page) + offset; if (!IS_ENABLED(CONFIG_HIGHMEM)) return crypto_shash_digest(desc, data, nbytes, req->result); page += offset >> PAGE_SHIFT; offset = offset_in_page(offset); if (nbytes > (unsigned int)PAGE_SIZE - offset) return crypto_shash_init(desc) ?: shash_ahash_finup(req, desc); data = kmap_local_page(page); err = crypto_shash_digest(desc, data + offset, nbytes, req->result); kunmap_local(data); return err; } EXPORT_SYMBOL_GPL(shash_ahash_digest); static void crypto_exit_ahash_using_shash(struct crypto_tfm *tfm) { struct crypto_shash **ctx = crypto_tfm_ctx(tfm); crypto_free_shash(*ctx); } static int crypto_init_ahash_using_shash(struct crypto_tfm *tfm) { struct crypto_alg *calg = tfm->__crt_alg; struct crypto_ahash *crt = __crypto_ahash_cast(tfm); struct crypto_shash **ctx = crypto_tfm_ctx(tfm); struct crypto_shash *shash; if (!crypto_mod_get(calg)) return -EAGAIN; shash = crypto_create_tfm(calg, &crypto_shash_type); if (IS_ERR(shash)) { crypto_mod_put(calg); return PTR_ERR(shash); } crt->using_shash = true; *ctx = shash; tfm->exit = crypto_exit_ahash_using_shash; crypto_ahash_set_flags(crt, crypto_shash_get_flags(shash) & CRYPTO_TFM_NEED_KEY); return 0; } static int ahash_nosetkey(struct crypto_ahash *tfm, const u8 *key, unsigned int keylen) { return -ENOSYS; } static void ahash_set_needkey(struct crypto_ahash *tfm, struct ahash_alg *alg) { if (alg->setkey != ahash_nosetkey && !(alg->halg.base.cra_flags & CRYPTO_ALG_OPTIONAL_KEY)) crypto_ahash_set_flags(tfm, CRYPTO_TFM_NEED_KEY); } int crypto_ahash_setkey(struct crypto_ahash *tfm, const u8 *key, unsigned int keylen) { if (likely(tfm->using_shash)) { struct crypto_shash *shash = ahash_to_shash(tfm); int err; err = crypto_shash_setkey(shash, key, keylen); if (unlikely(err)) { crypto_ahash_set_flags(tfm, crypto_shash_get_flags(shash) & CRYPTO_TFM_NEED_KEY); return err; } } else { struct ahash_alg *alg = crypto_ahash_alg(tfm); int err; err = alg->setkey(tfm, key, keylen); if (!err && crypto_ahash_need_fallback(tfm)) err = crypto_ahash_setkey(crypto_ahash_fb(tfm), key, keylen); if (unlikely(err)) { ahash_set_needkey(tfm, alg); return err; } } crypto_ahash_clear_flags(tfm, CRYPTO_TFM_NEED_KEY); return 0; } EXPORT_SYMBOL_GPL(crypto_ahash_setkey); static int ahash_do_req_chain(struct ahash_request *req, int (*const *op)(struct ahash_request *req)) { struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); int err; if (crypto_ahash_req_virt(tfm) || !ahash_request_isvirt(req)) return (*op)(req); if (crypto_ahash_statesize(tfm) > HASH_MAX_STATESIZE) return -ENOSYS; if (!crypto_ahash_need_fallback(tfm)) return -ENOSYS; if (crypto_hash_no_export_core(tfm)) return -ENOSYS; { u8 state[HASH_MAX_STATESIZE]; if (op == &crypto_ahash_alg(tfm)->digest) { ahash_request_set_tfm(req, crypto_ahash_fb(tfm)); err = crypto_ahash_digest(req); goto out_no_state; } err = crypto_ahash_export(req, state); ahash_request_set_tfm(req, crypto_ahash_fb(tfm)); err = err ?: crypto_ahash_import(req, state); if (op == &crypto_ahash_alg(tfm)->finup) { err = err ?: crypto_ahash_finup(req); goto out_no_state; } err = err ?: crypto_ahash_update(req) ?: crypto_ahash_export(req, state); ahash_request_set_tfm(req, tfm); return err ?: crypto_ahash_import(req, state); out_no_state: ahash_request_set_tfm(req, tfm); return err; } } int crypto_ahash_init(struct ahash_request *req) { struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); if (likely(tfm->using_shash)) return crypto_shash_init(prepare_shash_desc(req, tfm)); if (crypto_ahash_get_flags(tfm) & CRYPTO_TFM_NEED_KEY) return -ENOKEY; if (ahash_req_on_stack(req) && ahash_is_async(tfm)) return -EAGAIN; if (crypto_ahash_block_only(tfm)) { u8 *buf = ahash_request_ctx(req); buf += crypto_ahash_reqsize(tfm) - 1; *buf = 0; } return crypto_ahash_alg(tfm)->init(req); } EXPORT_SYMBOL_GPL(crypto_ahash_init); static void ahash_save_req(struct ahash_request *req, crypto_completion_t cplt) { req->saved_complete = req->base.complete; req->saved_data = req->base.data; req->base.complete = cplt; req->base.data = req; } static void ahash_restore_req(struct ahash_request *req) { req->base.complete = req->saved_complete; req->base.data = req->saved_data; } static int ahash_update_finish(struct ahash_request *req, int err) { struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); bool nonzero = crypto_ahash_final_nonzero(tfm); int bs = crypto_ahash_blocksize(tfm); u8 *blenp = ahash_request_ctx(req); int blen; u8 *buf; blenp += crypto_ahash_reqsize(tfm) - 1; blen = *blenp; buf = blenp - bs; if (blen) { req->src = req->sg_head + 1; if (sg_is_chain(req->src)) req->src = sg_chain_ptr(req->src); } req->nbytes += nonzero - blen; blen = 0; if (err >= 0) { blen = err + nonzero; err = 0; } if (ahash_request_isvirt(req)) memcpy(buf, req->svirt + req->nbytes - blen, blen); else memcpy_from_sglist(buf, req->src, req->nbytes - blen, blen); *blenp = blen; ahash_restore_req(req); return err; } static void ahash_update_done(void *data, int err) { ahash_op_done(data, err, ahash_update_finish); } int crypto_ahash_update(struct ahash_request *req) { struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); bool nonzero = crypto_ahash_final_nonzero(tfm); int bs = crypto_ahash_blocksize(tfm); u8 *blenp = ahash_request_ctx(req); int blen, err; u8 *buf; if (likely(tfm->using_shash)) return shash_ahash_update(req, ahash_request_ctx(req)); if (ahash_req_on_stack(req) && ahash_is_async(tfm)) return -EAGAIN; if (!crypto_ahash_block_only(tfm)) return ahash_do_req_chain(req, &crypto_ahash_alg(tfm)->update); blenp += crypto_ahash_reqsize(tfm) - 1; blen = *blenp; buf = blenp - bs; if (blen + req->nbytes < bs + nonzero) { if (ahash_request_isvirt(req)) memcpy(buf + blen, req->svirt, req->nbytes); else memcpy_from_sglist(buf + blen, req->src, 0, req->nbytes); *blenp += req->nbytes; return 0; } if (blen) { memset(req->sg_head, 0, sizeof(req->sg_head[0])); sg_set_buf(req->sg_head, buf, blen); if (req->src != req->sg_head + 1) sg_chain(req->sg_head, 2, req->src); req->src = req->sg_head; req->nbytes += blen; } req->nbytes -= nonzero; ahash_save_req(req, ahash_update_done); err = ahash_do_req_chain(req, &crypto_ahash_alg(tfm)->update); if (err == -EINPROGRESS || err == -EBUSY) return err; return ahash_update_finish(req, err); } EXPORT_SYMBOL_GPL(crypto_ahash_update); static int ahash_finup_finish(struct ahash_request *req, int err) { struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); u8 *blenp = ahash_request_ctx(req); int blen; blenp += crypto_ahash_reqsize(tfm) - 1; blen = *blenp; if (blen) { if (sg_is_last(req->src)) req->src = NULL; else { req->src = req->sg_head + 1; if (sg_is_chain(req->src)) req->src = sg_chain_ptr(req->src); } req->nbytes -= blen; } ahash_restore_req(req); return err; } static void ahash_finup_done(void *data, int err) { ahash_op_done(data, err, ahash_finup_finish); } int crypto_ahash_finup(struct ahash_request *req) { struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); int bs = crypto_ahash_blocksize(tfm); u8 *blenp = ahash_request_ctx(req); int blen, err; u8 *buf; if (likely(tfm->using_shash)) return shash_ahash_finup(req, ahash_request_ctx(req)); if (ahash_req_on_stack(req) && ahash_is_async(tfm)) return -EAGAIN; if (!crypto_ahash_alg(tfm)->finup) return ahash_def_finup(req); if (!crypto_ahash_block_only(tfm)) return ahash_do_req_chain(req, &crypto_ahash_alg(tfm)->finup); blenp += crypto_ahash_reqsize(tfm) - 1; blen = *blenp; buf = blenp - bs; if (blen) { memset(req->sg_head, 0, sizeof(req->sg_head[0])); sg_set_buf(req->sg_head, buf, blen); if (!req->src) sg_mark_end(req->sg_head); else if (req->src != req->sg_head + 1) sg_chain(req->sg_head, 2, req->src); req->src = req->sg_head; req->nbytes += blen; } ahash_save_req(req, ahash_finup_done); err = ahash_do_req_chain(req, &crypto_ahash_alg(tfm)->finup); if (err == -EINPROGRESS || err == -EBUSY) return err; return ahash_finup_finish(req, err); } EXPORT_SYMBOL_GPL(crypto_ahash_finup); int crypto_ahash_digest(struct ahash_request *req) { struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); if (likely(tfm->using_shash)) return shash_ahash_digest(req, prepare_shash_desc(req, tfm)); if (ahash_req_on_stack(req) && ahash_is_async(tfm)) return -EAGAIN; if (crypto_ahash_get_flags(tfm) & CRYPTO_TFM_NEED_KEY) return -ENOKEY; return ahash_do_req_chain(req, &crypto_ahash_alg(tfm)->digest); } EXPORT_SYMBOL_GPL(crypto_ahash_digest); static void ahash_def_finup_done2(void *data, int err) { struct ahash_request *areq = data; if (err == -EINPROGRESS) return; ahash_restore_req(areq); ahash_request_complete(areq, err); } static int ahash_def_finup_finish1(struct ahash_request *req, int err) { struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); if (err) goto out; req->base.complete = ahash_def_finup_done2; err = crypto_ahash_alg(tfm)->final(req); if (err == -EINPROGRESS || err == -EBUSY) return err; out: ahash_restore_req(req); return err; } static void ahash_def_finup_done1(void *data, int err) { ahash_op_done(data, err, ahash_def_finup_finish1); } static int ahash_def_finup(struct ahash_request *req) { int err; ahash_save_req(req, ahash_def_finup_done1); err = crypto_ahash_update(req); if (err == -EINPROGRESS || err == -EBUSY) return err; return ahash_def_finup_finish1(req, err); } int crypto_ahash_export_core(struct ahash_request *req, void *out) { struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); if (likely(tfm->using_shash)) return crypto_shash_export_core(ahash_request_ctx(req), out); return crypto_ahash_alg(tfm)->export_core(req, out); } EXPORT_SYMBOL_GPL(crypto_ahash_export_core); int crypto_ahash_export(struct ahash_request *req, void *out) { struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); if (likely(tfm->using_shash)) return crypto_shash_export(ahash_request_ctx(req), out); if (crypto_ahash_block_only(tfm)) { unsigned int plen = crypto_ahash_blocksize(tfm) + 1; unsigned int reqsize = crypto_ahash_reqsize(tfm); unsigned int ss = crypto_ahash_statesize(tfm); u8 *buf = ahash_request_ctx(req); memcpy(out + ss - plen, buf + reqsize - plen, plen); } return crypto_ahash_alg(tfm)->export(req, out); } EXPORT_SYMBOL_GPL(crypto_ahash_export); int crypto_ahash_import_core(struct ahash_request *req, const void *in) { struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); if (likely(tfm->using_shash)) return crypto_shash_import_core(prepare_shash_desc(req, tfm), in); if (crypto_ahash_get_flags(tfm) & CRYPTO_TFM_NEED_KEY) return -ENOKEY; if (crypto_ahash_block_only(tfm)) { unsigned int reqsize = crypto_ahash_reqsize(tfm); u8 *buf = ahash_request_ctx(req); buf[reqsize - 1] = 0; } return crypto_ahash_alg(tfm)->import_core(req, in); } EXPORT_SYMBOL_GPL(crypto_ahash_import_core); int crypto_ahash_import(struct ahash_request *req, const void *in) { struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); if (likely(tfm->using_shash)) return crypto_shash_import(prepare_shash_desc(req, tfm), in); if (crypto_ahash_get_flags(tfm) & CRYPTO_TFM_NEED_KEY) return -ENOKEY; if (crypto_ahash_block_only(tfm)) { unsigned int plen = crypto_ahash_blocksize(tfm) + 1; unsigned int reqsize = crypto_ahash_reqsize(tfm); unsigned int ss = crypto_ahash_statesize(tfm); u8 *buf = ahash_request_ctx(req); memcpy(buf + reqsize - plen, in + ss - plen, plen); if (buf[reqsize - 1] >= plen) return -EOVERFLOW; } return crypto_ahash_alg(tfm)->import(req, in); } EXPORT_SYMBOL_GPL(crypto_ahash_import); static void crypto_ahash_exit_tfm(struct crypto_tfm *tfm) { struct crypto_ahash *hash = __crypto_ahash_cast(tfm); struct ahash_alg *alg = crypto_ahash_alg(hash); if (alg->exit_tfm) alg->exit_tfm(hash); else if (tfm->__crt_alg->cra_exit) tfm->__crt_alg->cra_exit(tfm); if (crypto_ahash_need_fallback(hash)) crypto_free_ahash(crypto_ahash_fb(hash)); } static int crypto_ahash_init_tfm(struct crypto_tfm *tfm) { struct crypto_ahash *hash = __crypto_ahash_cast(tfm); struct ahash_alg *alg = crypto_ahash_alg(hash); struct crypto_ahash *fb = NULL; int err; crypto_ahash_set_statesize(hash, alg->halg.statesize); crypto_ahash_set_reqsize(hash, crypto_tfm_alg_reqsize(tfm)); if (tfm->__crt_alg->cra_type == &crypto_shash_type) return crypto_init_ahash_using_shash(tfm); if (crypto_ahash_need_fallback(hash)) { fb = crypto_alloc_ahash(crypto_ahash_alg_name(hash), CRYPTO_ALG_REQ_VIRT, CRYPTO_ALG_ASYNC | CRYPTO_ALG_REQ_VIRT | CRYPTO_AHASH_ALG_NO_EXPORT_CORE); if (IS_ERR(fb)) return PTR_ERR(fb); tfm->fb = crypto_ahash_tfm(fb); } ahash_set_needkey(hash, alg); tfm->exit = crypto_ahash_exit_tfm; if (alg->init_tfm) err = alg->init_tfm(hash); else if (tfm->__crt_alg->cra_init) err = tfm->__crt_alg->cra_init(tfm); else return 0; if (err) goto out_free_sync_hash; if (!ahash_is_async(hash) && crypto_ahash_reqsize(hash) > MAX_SYNC_HASH_REQSIZE) goto out_exit_tfm; BUILD_BUG_ON(HASH_MAX_DESCSIZE > MAX_SYNC_HASH_REQSIZE); if (crypto_ahash_reqsize(hash) < HASH_MAX_DESCSIZE) crypto_ahash_set_reqsize(hash, HASH_MAX_DESCSIZE); return 0; out_exit_tfm: if (alg->exit_tfm) alg->exit_tfm(hash); else if (tfm->__crt_alg->cra_exit) tfm->__crt_alg->cra_exit(tfm); err = -EINVAL; out_free_sync_hash: crypto_free_ahash(fb); return err; } static unsigned int crypto_ahash_extsize(struct crypto_alg *alg) { if (alg->cra_type == &crypto_shash_type) return sizeof(struct crypto_shash *); return crypto_alg_extsize(alg); } static void crypto_ahash_free_instance(struct crypto_instance *inst) { struct ahash_instance *ahash = ahash_instance(inst); ahash->free(ahash); } static int __maybe_unused crypto_ahash_report( struct sk_buff *skb, struct crypto_alg *alg) { struct crypto_report_hash rhash; memset(&rhash, 0, sizeof(rhash)); strscpy(rhash.type, "ahash", sizeof(rhash.type)); rhash.blocksize = alg->cra_blocksize; rhash.digestsize = __crypto_hash_alg_common(alg)->digestsize; return nla_put(skb, CRYPTOCFGA_REPORT_HASH, sizeof(rhash), &rhash); } static void __maybe_unused crypto_ahash_show(struct seq_file *m, struct crypto_alg *alg) { seq_printf(m, "type : ahash\n"); seq_printf(m, "async : %s\n", str_yes_no(alg->cra_flags & CRYPTO_ALG_ASYNC)); seq_printf(m, "blocksize : %u\n", alg->cra_blocksize); seq_printf(m, "digestsize : %u\n", __crypto_hash_alg_common(alg)->digestsize); } static const struct crypto_type crypto_ahash_type = { .extsize = crypto_ahash_extsize, .init_tfm = crypto_ahash_init_tfm, .free = crypto_ahash_free_instance, #ifdef CONFIG_PROC_FS .show = crypto_ahash_show, #endif #if IS_ENABLED(CONFIG_CRYPTO_USER) .report = crypto_ahash_report, #endif .maskclear = ~CRYPTO_ALG_TYPE_MASK, .maskset = CRYPTO_ALG_TYPE_AHASH_MASK, .type = CRYPTO_ALG_TYPE_AHASH, .tfmsize = offsetof(struct crypto_ahash, base), .algsize = offsetof(struct ahash_alg, halg.base), }; int crypto_grab_ahash(struct crypto_ahash_spawn *spawn, struct crypto_instance *inst, const char *name, u32 type, u32 mask) { spawn->base.frontend = &crypto_ahash_type; return crypto_grab_spawn(&spawn->base, inst, name, type, mask); } EXPORT_SYMBOL_GPL(crypto_grab_ahash); struct crypto_ahash *crypto_alloc_ahash(const char *alg_name, u32 type, u32 mask) { return crypto_alloc_tfm(alg_name, &crypto_ahash_type, type, mask); } EXPORT_SYMBOL_GPL(crypto_alloc_ahash); int crypto_has_ahash(const char *alg_name, u32 type, u32 mask) { return crypto_type_has_alg(alg_name, &crypto_ahash_type, type, mask); } EXPORT_SYMBOL_GPL(crypto_has_ahash); bool crypto_hash_alg_has_setkey(struct hash_alg_common *halg) { struct crypto_alg *alg = &halg->base; if (alg->cra_type == &crypto_shash_type) return crypto_shash_alg_has_setkey(__crypto_shash_alg(alg)); return __crypto_ahash_alg(alg)->setkey != ahash_nosetkey; } EXPORT_SYMBOL_GPL(crypto_hash_alg_has_setkey); struct crypto_ahash *crypto_clone_ahash(struct crypto_ahash *hash) { struct hash_alg_common *halg = crypto_hash_alg_common(hash); struct crypto_tfm *tfm = crypto_ahash_tfm(hash); struct crypto_ahash *fb = NULL; struct crypto_ahash *nhash; struct ahash_alg *alg; int err; if (!crypto_hash_alg_has_setkey(halg)) { tfm = crypto_tfm_get(tfm); if (IS_ERR(tfm)) return ERR_CAST(tfm); return hash; } nhash = crypto_clone_tfm(&crypto_ahash_type, tfm); if (IS_ERR(nhash)) return nhash; nhash->reqsize = hash->reqsize; nhash->statesize = hash->statesize; if (likely(hash->using_shash)) { struct crypto_shash **nctx = crypto_ahash_ctx(nhash); struct crypto_shash *shash; shash = crypto_clone_shash(ahash_to_shash(hash)); if (IS_ERR(shash)) { err = PTR_ERR(shash); goto out_free_nhash; } crypto_ahash_tfm(nhash)->exit = crypto_exit_ahash_using_shash; nhash->using_shash = true; *nctx = shash; return nhash; } if (crypto_ahash_need_fallback(hash)) { fb = crypto_clone_ahash(crypto_ahash_fb(hash)); err = PTR_ERR(fb); if (IS_ERR(fb)) goto out_free_nhash; crypto_ahash_tfm(nhash)->fb = crypto_ahash_tfm(fb); } err = -ENOSYS; alg = crypto_ahash_alg(hash); if (!alg->clone_tfm) goto out_free_fb; err = alg->clone_tfm(nhash, hash); if (err) goto out_free_fb; crypto_ahash_tfm(nhash)->exit = crypto_ahash_exit_tfm; return nhash; out_free_fb: crypto_free_ahash(fb); out_free_nhash: crypto_free_ahash(nhash); return ERR_PTR(err); } EXPORT_SYMBOL_GPL(crypto_clone_ahash); static int ahash_default_export_core(struct ahash_request *req, void *out) { return -ENOSYS; } static int ahash_default_import_core(struct ahash_request *req, const void *in) { return -ENOSYS; } static int ahash_prepare_alg(struct ahash_alg *alg) { struct crypto_alg *base = &alg->halg.base; int err; if (alg->halg.statesize == 0) return -EINVAL; if (base->cra_reqsize && base->cra_reqsize < alg->halg.statesize) return -EINVAL; if (!(base->cra_flags & CRYPTO_ALG_ASYNC) && base->cra_reqsize > MAX_SYNC_HASH_REQSIZE) return -EINVAL; if (base->cra_flags & CRYPTO_ALG_NEED_FALLBACK && base->cra_flags & CRYPTO_ALG_NO_FALLBACK) return -EINVAL; err = hash_prepare_alg(&alg->halg); if (err) return err; base->cra_type = &crypto_ahash_type; base->cra_flags |= CRYPTO_ALG_TYPE_AHASH; if ((base->cra_flags ^ CRYPTO_ALG_REQ_VIRT) & (CRYPTO_ALG_ASYNC | CRYPTO_ALG_REQ_VIRT) && !(base->cra_flags & CRYPTO_ALG_NO_FALLBACK)) base->cra_flags |= CRYPTO_ALG_NEED_FALLBACK; if (!alg->setkey) alg->setkey = ahash_nosetkey; if (base->cra_flags & CRYPTO_AHASH_ALG_BLOCK_ONLY) { BUILD_BUG_ON(MAX_ALGAPI_BLOCKSIZE >= 256); if (!alg->finup) return -EINVAL; base->cra_reqsize += base->cra_blocksize + 1; alg->halg.statesize += base->cra_blocksize + 1; alg->export_core = alg->export; alg->import_core = alg->import; } else if (!alg->export_core || !alg->import_core) { alg->export_core = ahash_default_export_core; alg->import_core = ahash_default_import_core; base->cra_flags |= CRYPTO_AHASH_ALG_NO_EXPORT_CORE; } return 0; } int crypto_register_ahash(struct ahash_alg *alg) { struct crypto_alg *base = &alg->halg.base; int err; err = ahash_prepare_alg(alg); if (err) return err; return crypto_register_alg(base); } EXPORT_SYMBOL_GPL(crypto_register_ahash); void crypto_unregister_ahash(struct ahash_alg *alg) { crypto_unregister_alg(&alg->halg.base); } EXPORT_SYMBOL_GPL(crypto_unregister_ahash); int crypto_register_ahashes(struct ahash_alg *algs, int count) { int i, ret; for (i = 0; i < count; i++) { ret = crypto_register_ahash(&algs[i]); if (ret) { crypto_unregister_ahashes(algs, i); return ret; } } return 0; } EXPORT_SYMBOL_GPL(crypto_register_ahashes); void crypto_unregister_ahashes(struct ahash_alg *algs, int count) { int i; for (i = count - 1; i >= 0; --i) crypto_unregister_ahash(&algs[i]); } EXPORT_SYMBOL_GPL(crypto_unregister_ahashes); int ahash_register_instance(struct crypto_template *tmpl, struct ahash_instance *inst) { int err; if (WARN_ON(!inst->free)) return -EINVAL; err = ahash_prepare_alg(&inst->alg); if (err) return err; return crypto_register_instance(tmpl, ahash_crypto_instance(inst)); } EXPORT_SYMBOL_GPL(ahash_register_instance); void ahash_request_free(struct ahash_request *req) { if (unlikely(!req)) return; if (!ahash_req_on_stack(req)) { kfree(req); return; } ahash_request_zero(req); } EXPORT_SYMBOL_GPL(ahash_request_free); int crypto_hash_digest(struct crypto_ahash *tfm, const u8 *data, unsigned int len, u8 *out) { HASH_REQUEST_ON_STACK(req, crypto_ahash_fb(tfm)); int err; ahash_request_set_callback(req, 0, NULL, NULL); ahash_request_set_virt(req, data, out, len); err = crypto_ahash_digest(req); ahash_request_zero(req); return err; } EXPORT_SYMBOL_GPL(crypto_hash_digest); void ahash_free_singlespawn_instance(struct ahash_instance *inst) { crypto_drop_spawn(ahash_instance_ctx(inst)); kfree(inst); } EXPORT_SYMBOL_GPL(ahash_free_singlespawn_instance); MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("Asynchronous cryptographic hash type"); |
| 41 36 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 | /* SPDX-License-Identifier: GPL-2.0 * * FUSE: Filesystem in Userspace * Copyright (C) 2001-2008 Miklos Szeredi <miklos@szeredi.hu> */ #ifndef _FS_FUSE_DEV_I_H #define _FS_FUSE_DEV_I_H #include <linux/types.h> /* Ordinary requests have even IDs, while interrupts IDs are odd */ #define FUSE_INT_REQ_BIT (1ULL << 0) #define FUSE_REQ_ID_STEP (1ULL << 1) extern struct wait_queue_head fuse_dev_waitq; struct fuse_arg; struct fuse_args; struct fuse_pqueue; struct fuse_req; struct fuse_iqueue; struct fuse_forget_link; struct fuse_copy_state { struct fuse_req *req; struct iov_iter *iter; struct pipe_buffer *pipebufs; struct pipe_buffer *currbuf; struct pipe_inode_info *pipe; unsigned long nr_segs; struct page *pg; unsigned int len; unsigned int offset; bool write:1; bool move_folios:1; bool is_uring:1; struct { unsigned int copied_sz; /* copied size into the user buffer */ } ring; }; #define FUSE_DEV_SYNC_INIT ((struct fuse_dev *) 1) #define FUSE_DEV_PTR_MASK (~1UL) static inline struct fuse_dev *__fuse_get_dev(struct file *file) { /* * Lockless access is OK, because file->private data is set * once during mount and is valid until the file is released. */ struct fuse_dev *fud = READ_ONCE(file->private_data); return (typeof(fud)) ((unsigned long) fud & FUSE_DEV_PTR_MASK); } struct fuse_dev *fuse_get_dev(struct file *file); unsigned int fuse_req_hash(u64 unique); struct fuse_req *fuse_request_find(struct fuse_pqueue *fpq, u64 unique); void fuse_dev_end_requests(struct list_head *head); void fuse_copy_init(struct fuse_copy_state *cs, bool write, struct iov_iter *iter); void fuse_copy_finish(struct fuse_copy_state *cs); int fuse_copy_args(struct fuse_copy_state *cs, unsigned int numargs, unsigned int argpages, struct fuse_arg *args, int zeroing); int fuse_copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args, unsigned int nbytes); void fuse_dev_queue_forget(struct fuse_iqueue *fiq, struct fuse_forget_link *forget); void fuse_dev_queue_interrupt(struct fuse_iqueue *fiq, struct fuse_req *req); bool fuse_remove_pending_req(struct fuse_req *req, spinlock_t *lock); bool fuse_request_expired(struct fuse_conn *fc, struct list_head *list); #endif |
| 2 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 | /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _LINUX_STATFS_H #define _LINUX_STATFS_H #include <linux/types.h> #include <asm/statfs.h> #include <asm/byteorder.h> struct kstatfs { long f_type; long f_bsize; u64 f_blocks; u64 f_bfree; u64 f_bavail; u64 f_files; u64 f_ffree; __kernel_fsid_t f_fsid; long f_namelen; long f_frsize; long f_flags; long f_spare[4]; }; /* * Definitions for the flag in f_flag. * * Generally these flags are equivalent to the MS_ flags used in the mount * ABI. The exception is ST_VALID which has the same value as MS_REMOUNT * which doesn't make any sense for statfs. */ #define ST_RDONLY 0x0001 /* mount read-only */ #define ST_NOSUID 0x0002 /* ignore suid and sgid bits */ #define ST_NODEV 0x0004 /* disallow access to device special files */ #define ST_NOEXEC 0x0008 /* disallow program execution */ #define ST_SYNCHRONOUS 0x0010 /* writes are synced at once */ #define ST_VALID 0x0020 /* f_flags support is implemented */ #define ST_MANDLOCK 0x0040 /* allow mandatory locks on an FS */ /* 0x0080 used for ST_WRITE in glibc */ /* 0x0100 used for ST_APPEND in glibc */ /* 0x0200 used for ST_IMMUTABLE in glibc */ #define ST_NOATIME 0x0400 /* do not update access times */ #define ST_NODIRATIME 0x0800 /* do not update directory access times */ #define ST_RELATIME 0x1000 /* update atime relative to mtime/ctime */ #define ST_NOSYMFOLLOW 0x2000 /* do not follow symlinks */ struct dentry; extern int vfs_get_fsid(struct dentry *dentry, __kernel_fsid_t *fsid); static inline __kernel_fsid_t u64_to_fsid(u64 v) { return (__kernel_fsid_t){.val = {(u32)v, (u32)(v>>32)}}; } /* Fold 16 bytes uuid to 64 bit fsid */ static inline __kernel_fsid_t uuid_to_fsid(__u8 *uuid) { return u64_to_fsid(le64_to_cpup((void *)uuid) ^ le64_to_cpup((void *)(uuid + sizeof(u64)))); } #endif |
| 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 1 1 2 3 3 2 3 3 3 3 3 3 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 | // SPDX-License-Identifier: GPL-2.0-or-later /* * vimc-scaler.c Virtual Media Controller Driver * * Copyright (C) 2015-2017 Helen Koike <helen.fornazier@gmail.com> */ #include <linux/moduleparam.h> #include <linux/string.h> #include <linux/vmalloc.h> #include <linux/v4l2-mediabus.h> #include <media/v4l2-rect.h> #include <media/v4l2-subdev.h> #include "vimc-common.h" /* Pad identifier */ enum vimc_scaler_pad { VIMC_SCALER_SINK = 0, VIMC_SCALER_SRC = 1, }; #define VIMC_SCALER_FMT_WIDTH_DEFAULT 640 #define VIMC_SCALER_FMT_HEIGHT_DEFAULT 480 struct vimc_scaler_device { struct vimc_ent_device ved; struct v4l2_subdev sd; struct media_pad pads[2]; u8 *src_frame; /* * Virtual "hardware" configuration, filled when the stream starts or * when controls are set. */ struct { struct v4l2_mbus_framefmt sink_fmt; struct v4l2_mbus_framefmt src_fmt; struct v4l2_rect sink_crop; unsigned int bpp; } hw; }; static const struct v4l2_mbus_framefmt fmt_default = { .width = VIMC_SCALER_FMT_WIDTH_DEFAULT, .height = VIMC_SCALER_FMT_HEIGHT_DEFAULT, .code = MEDIA_BUS_FMT_RGB888_1X24, .field = V4L2_FIELD_NONE, .colorspace = V4L2_COLORSPACE_SRGB, }; static const struct v4l2_rect crop_rect_default = { .width = VIMC_SCALER_FMT_WIDTH_DEFAULT, .height = VIMC_SCALER_FMT_HEIGHT_DEFAULT, .top = 0, .left = 0, }; static const struct v4l2_rect crop_rect_min = { .width = VIMC_FRAME_MIN_WIDTH, .height = VIMC_FRAME_MIN_HEIGHT, .top = 0, .left = 0, }; static struct v4l2_rect vimc_scaler_get_crop_bound_sink(const struct v4l2_mbus_framefmt *sink_fmt) { /* Get the crop bounds to clamp the crop rectangle correctly */ struct v4l2_rect r = { .left = 0, .top = 0, .width = sink_fmt->width, .height = sink_fmt->height, }; return r; } static int vimc_scaler_init_state(struct v4l2_subdev *sd, struct v4l2_subdev_state *sd_state) { struct v4l2_mbus_framefmt *mf; struct v4l2_rect *r; unsigned int i; for (i = 0; i < sd->entity.num_pads; i++) { mf = v4l2_subdev_state_get_format(sd_state, i); *mf = fmt_default; } r = v4l2_subdev_state_get_crop(sd_state, VIMC_SCALER_SINK); *r = crop_rect_default; return 0; } static int vimc_scaler_enum_mbus_code(struct v4l2_subdev *sd, struct v4l2_subdev_state *sd_state, struct v4l2_subdev_mbus_code_enum *code) { u32 mbus_code = vimc_mbus_code_by_index(code->index); const struct vimc_pix_map *vpix; if (!mbus_code) return -EINVAL; vpix = vimc_pix_map_by_code(mbus_code); /* We don't support bayer format */ if (!vpix || vpix->bayer) return -EINVAL; code->code = mbus_code; return 0; } static int vimc_scaler_enum_frame_size(struct v4l2_subdev *sd, struct v4l2_subdev_state *sd_state, struct v4l2_subdev_frame_size_enum *fse) { const struct vimc_pix_map *vpix; if (fse->index) return -EINVAL; /* Only accept code in the pix map table in non bayer format */ vpix = vimc_pix_map_by_code(fse->code); if (!vpix || vpix->bayer) return -EINVAL; fse->min_width = VIMC_FRAME_MIN_WIDTH; fse->min_height = VIMC_FRAME_MIN_HEIGHT; fse->max_width = VIMC_FRAME_MAX_WIDTH; fse->max_height = VIMC_FRAME_MAX_HEIGHT; return 0; } static int vimc_scaler_set_fmt(struct v4l2_subdev *sd, struct v4l2_subdev_state *sd_state, struct v4l2_subdev_format *format) { struct vimc_scaler_device *vscaler = v4l2_get_subdevdata(sd); struct v4l2_mbus_framefmt *fmt; /* Do not change the active format while stream is on */ if (format->which == V4L2_SUBDEV_FORMAT_ACTIVE && vscaler->src_frame) return -EBUSY; fmt = v4l2_subdev_state_get_format(sd_state, format->pad); /* * The media bus code and colorspace can only be changed on the sink * pad, the source pad only follows. */ if (format->pad == VIMC_SCALER_SINK) { const struct vimc_pix_map *vpix; /* Only accept code in the pix map table in non bayer format. */ vpix = vimc_pix_map_by_code(format->format.code); if (vpix && !vpix->bayer) fmt->code = format->format.code; else fmt->code = fmt_default.code; /* Clamp the colorspace to valid values. */ fmt->colorspace = format->format.colorspace; fmt->ycbcr_enc = format->format.ycbcr_enc; fmt->quantization = format->format.quantization; fmt->xfer_func = format->format.xfer_func; vimc_colorimetry_clamp(fmt); } /* Clamp and align the width and height */ fmt->width = clamp_t(u32, format->format.width, VIMC_FRAME_MIN_WIDTH, VIMC_FRAME_MAX_WIDTH) & ~1; fmt->height = clamp_t(u32, format->format.height, VIMC_FRAME_MIN_HEIGHT, VIMC_FRAME_MAX_HEIGHT) & ~1; /* * Propagate the sink pad format to the crop rectangle and the source * pad. */ if (format->pad == VIMC_SCALER_SINK) { struct v4l2_mbus_framefmt *src_fmt; struct v4l2_rect *crop; crop = v4l2_subdev_state_get_crop(sd_state, VIMC_SCALER_SINK); crop->width = fmt->width; crop->height = fmt->height; crop->top = 0; crop->left = 0; src_fmt = v4l2_subdev_state_get_format(sd_state, VIMC_SCALER_SRC); *src_fmt = *fmt; } format->format = *fmt; return 0; } static int vimc_scaler_get_selection(struct v4l2_subdev *sd, struct v4l2_subdev_state *sd_state, struct v4l2_subdev_selection *sel) { struct v4l2_mbus_framefmt *sink_fmt; if (VIMC_IS_SRC(sel->pad)) return -EINVAL; switch (sel->target) { case V4L2_SEL_TGT_CROP: sel->r = *v4l2_subdev_state_get_crop(sd_state, VIMC_SCALER_SINK); break; case V4L2_SEL_TGT_CROP_BOUNDS: sink_fmt = v4l2_subdev_state_get_format(sd_state, VIMC_SCALER_SINK); sel->r = vimc_scaler_get_crop_bound_sink(sink_fmt); break; default: return -EINVAL; } return 0; } static void vimc_scaler_adjust_sink_crop(struct v4l2_rect *r, const struct v4l2_mbus_framefmt *sink_fmt) { const struct v4l2_rect sink_rect = vimc_scaler_get_crop_bound_sink(sink_fmt); /* Disallow rectangles smaller than the minimal one. */ v4l2_rect_set_min_size(r, &crop_rect_min); v4l2_rect_map_inside(r, &sink_rect); } static int vimc_scaler_set_selection(struct v4l2_subdev *sd, struct v4l2_subdev_state *sd_state, struct v4l2_subdev_selection *sel) { struct vimc_scaler_device *vscaler = v4l2_get_subdevdata(sd); struct v4l2_mbus_framefmt *sink_fmt; struct v4l2_rect *crop_rect; /* Only support setting the crop of the sink pad */ if (VIMC_IS_SRC(sel->pad) || sel->target != V4L2_SEL_TGT_CROP) return -EINVAL; if (sel->which == V4L2_SUBDEV_FORMAT_ACTIVE && vscaler->src_frame) return -EBUSY; crop_rect = v4l2_subdev_state_get_crop(sd_state, VIMC_SCALER_SINK); sink_fmt = v4l2_subdev_state_get_format(sd_state, VIMC_SCALER_SINK); vimc_scaler_adjust_sink_crop(&sel->r, sink_fmt); *crop_rect = sel->r; return 0; } static const struct v4l2_subdev_pad_ops vimc_scaler_pad_ops = { .enum_mbus_code = vimc_scaler_enum_mbus_code, .enum_frame_size = vimc_scaler_enum_frame_size, .get_fmt = v4l2_subdev_get_fmt, .set_fmt = vimc_scaler_set_fmt, .get_selection = vimc_scaler_get_selection, .set_selection = vimc_scaler_set_selection, }; static int vimc_scaler_s_stream(struct v4l2_subdev *sd, int enable) { struct vimc_scaler_device *vscaler = v4l2_get_subdevdata(sd); if (enable) { struct v4l2_subdev_state *state; const struct v4l2_mbus_framefmt *format; const struct v4l2_rect *rect; unsigned int frame_size; if (vscaler->src_frame) return 0; state = v4l2_subdev_lock_and_get_active_state(sd); /* Save the bytes per pixel of the sink. */ format = v4l2_subdev_state_get_format(state, VIMC_SCALER_SINK); vscaler->hw.sink_fmt = *format; vscaler->hw.bpp = vimc_pix_map_by_code(format->code)->bpp; /* Calculate the frame size of the source pad. */ format = v4l2_subdev_state_get_format(state, VIMC_SCALER_SRC); vscaler->hw.src_fmt = *format; frame_size = format->width * format->height * vscaler->hw.bpp; rect = v4l2_subdev_state_get_crop(state, VIMC_SCALER_SINK); vscaler->hw.sink_crop = *rect; v4l2_subdev_unlock_state(state); /* * Allocate the frame buffer. Use vmalloc to be able to allocate * a large amount of memory. */ vscaler->src_frame = vmalloc(frame_size); if (!vscaler->src_frame) return -ENOMEM; } else { if (!vscaler->src_frame) return 0; vfree(vscaler->src_frame); vscaler->src_frame = NULL; } return 0; } static const struct v4l2_subdev_video_ops vimc_scaler_video_ops = { .s_stream = vimc_scaler_s_stream, }; static const struct v4l2_subdev_ops vimc_scaler_ops = { .pad = &vimc_scaler_pad_ops, .video = &vimc_scaler_video_ops, }; static const struct v4l2_subdev_internal_ops vimc_scaler_internal_ops = { .init_state = vimc_scaler_init_state, }; static void vimc_scaler_fill_src_frame(const struct vimc_scaler_device *const vscaler, const u8 *const sink_frame) { const struct v4l2_mbus_framefmt *sink_fmt = &vscaler->hw.sink_fmt; const struct v4l2_mbus_framefmt *src_fmt = &vscaler->hw.src_fmt; const struct v4l2_rect *r = &vscaler->hw.sink_crop; unsigned int src_x, src_y; u8 *walker = vscaler->src_frame; /* Set each pixel at the src_frame to its sink_frame equivalent */ for (src_y = 0; src_y < src_fmt->height; src_y++) { unsigned int snk_y, y_offset; snk_y = (src_y * r->height) / src_fmt->height + r->top; y_offset = snk_y * sink_fmt->width * vscaler->hw.bpp; for (src_x = 0; src_x < src_fmt->width; src_x++) { unsigned int snk_x, x_offset, index; snk_x = (src_x * r->width) / src_fmt->width + r->left; x_offset = snk_x * vscaler->hw.bpp; index = y_offset + x_offset; memcpy(walker, &sink_frame[index], vscaler->hw.bpp); walker += vscaler->hw.bpp; } } } static void *vimc_scaler_process_frame(struct vimc_ent_device *ved, const void *sink_frame) { struct vimc_scaler_device *vscaler = container_of(ved, struct vimc_scaler_device, ved); /* If the stream in this node is not active, just return */ if (!vscaler->src_frame) return ERR_PTR(-EINVAL); vimc_scaler_fill_src_frame(vscaler, sink_frame); return vscaler->src_frame; }; static void vimc_scaler_release(struct vimc_ent_device *ved) { struct vimc_scaler_device *vscaler = container_of(ved, struct vimc_scaler_device, ved); v4l2_subdev_cleanup(&vscaler->sd); media_entity_cleanup(vscaler->ved.ent); kfree(vscaler); } static struct vimc_ent_device *vimc_scaler_add(struct vimc_device *vimc, const char *vcfg_name) { struct v4l2_device *v4l2_dev = &vimc->v4l2_dev; struct vimc_scaler_device *vscaler; int ret; /* Allocate the vscaler struct */ vscaler = kzalloc_obj(*vscaler); if (!vscaler) return ERR_PTR(-ENOMEM); /* Initialize ved and sd */ vscaler->pads[VIMC_SCALER_SINK].flags = MEDIA_PAD_FL_SINK; vscaler->pads[VIMC_SCALER_SRC].flags = MEDIA_PAD_FL_SOURCE; ret = vimc_ent_sd_register(&vscaler->ved, &vscaler->sd, v4l2_dev, vcfg_name, MEDIA_ENT_F_PROC_VIDEO_SCALER, 2, vscaler->pads, &vimc_scaler_internal_ops, &vimc_scaler_ops); if (ret) { kfree(vscaler); return ERR_PTR(ret); } vscaler->ved.process_frame = vimc_scaler_process_frame; vscaler->ved.dev = vimc->mdev.dev; return &vscaler->ved; } const struct vimc_ent_type vimc_scaler_type = { .add = vimc_scaler_add, .release = vimc_scaler_release }; |
| 6 6 2 2 2 1 19 19 1 1 2 1 2 2 19 16 19 2 17 12 23 23 23 23 21 19 34 19 34 2 33 8 34 26 23 23 23 1 1 1 23 2 2 2 22 23 23 23 23 9 9 14 23 17 13 6 17 12 1 1 1 12 7 13 8 13 13 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 | // SPDX-License-Identifier: GPL-2.0 /* * Waiting for completion events */ #include <linux/kernel.h> #include <linux/sched/signal.h> #include <linux/io_uring.h> #include <trace/events/io_uring.h> #include <uapi/linux/io_uring.h> #include "io_uring.h" #include "napi.h" #include "wait.h" static int io_wake_function(struct wait_queue_entry *curr, unsigned int mode, int wake_flags, void *key) { struct io_wait_queue *iowq = container_of(curr, struct io_wait_queue, wq); /* * Cannot safely flush overflowed CQEs from here, ensure we wake up * the task, and the next invocation will do it. */ if (io_should_wake(iowq) || io_has_work(iowq->ctx)) return autoremove_wake_function(curr, mode, wake_flags, key); return -1; } int io_run_task_work_sig(struct io_ring_ctx *ctx) { if (io_local_work_pending(ctx)) { __set_current_state(TASK_RUNNING); if (io_run_local_work(ctx, INT_MAX, IO_LOCAL_TW_DEFAULT_MAX) > 0) return 0; } if (io_run_task_work() > 0) return 0; if (task_sigpending(current)) return -EINTR; return 0; } static bool current_pending_io(void) { struct io_uring_task *tctx = current->io_uring; if (!tctx) return false; return percpu_counter_read_positive(&tctx->inflight); } static enum hrtimer_restart io_cqring_timer_wakeup(struct hrtimer *timer) { struct io_wait_queue *iowq = container_of(timer, struct io_wait_queue, t); WRITE_ONCE(iowq->hit_timeout, 1); iowq->min_timeout = 0; wake_up_process(iowq->wq.private); return HRTIMER_NORESTART; } /* * Doing min_timeout portion. If we saw any timeouts, events, or have work, * wake up. If not, and we have a normal timeout, switch to that and keep * sleeping. */ static enum hrtimer_restart io_cqring_min_timer_wakeup(struct hrtimer *timer) { struct io_wait_queue *iowq = container_of(timer, struct io_wait_queue, t); struct io_ring_ctx *ctx = iowq->ctx; /* no general timeout, or shorter (or equal), we are done */ if (iowq->timeout == KTIME_MAX || ktime_compare(iowq->min_timeout, iowq->timeout) >= 0) goto out_wake; /* work we may need to run, wake function will see if we need to wake */ if (io_has_work(ctx)) goto out_wake; /* got events since we started waiting, min timeout is done */ if (iowq->cq_min_tail != READ_ONCE(ctx->rings->cq.tail)) goto out_wake; /* if we have any events and min timeout expired, we're done */ if (io_cqring_events(ctx)) goto out_wake; /* * If using deferred task_work running and application is waiting on * more than one request, ensure we reset it now where we are switching * to normal sleeps. Any request completion post min_wait should wake * the task and return. */ if (ctx->flags & IORING_SETUP_DEFER_TASKRUN) { atomic_set(&ctx->cq_wait_nr, 1); smp_mb(); if (!llist_empty(&ctx->work_llist)) goto out_wake; } /* any generated CQE posted past this time should wake us up */ iowq->cq_tail = iowq->cq_min_tail; hrtimer_update_function(&iowq->t, io_cqring_timer_wakeup); hrtimer_set_expires(timer, iowq->timeout); return HRTIMER_RESTART; out_wake: return io_cqring_timer_wakeup(timer); } static int io_cqring_schedule_timeout(struct io_wait_queue *iowq, clockid_t clock_id, ktime_t start_time) { ktime_t timeout; if (iowq->min_timeout) { timeout = ktime_add_ns(iowq->min_timeout, start_time); hrtimer_setup_on_stack(&iowq->t, io_cqring_min_timer_wakeup, clock_id, HRTIMER_MODE_ABS); } else { timeout = iowq->timeout; hrtimer_setup_on_stack(&iowq->t, io_cqring_timer_wakeup, clock_id, HRTIMER_MODE_ABS); } hrtimer_set_expires_range_ns(&iowq->t, timeout, 0); hrtimer_start_expires(&iowq->t, HRTIMER_MODE_ABS); if (!READ_ONCE(iowq->hit_timeout)) schedule(); hrtimer_cancel(&iowq->t); destroy_hrtimer_on_stack(&iowq->t); __set_current_state(TASK_RUNNING); return READ_ONCE(iowq->hit_timeout) ? -ETIME : 0; } static int __io_cqring_wait_schedule(struct io_ring_ctx *ctx, struct io_wait_queue *iowq, struct ext_arg *ext_arg, ktime_t start_time) { int ret = 0; /* * Mark us as being in io_wait if we have pending requests, so cpufreq * can take into account that the task is waiting for IO - turns out * to be important for low QD IO. */ if (ext_arg->iowait && current_pending_io()) current->in_iowait = 1; if (iowq->timeout != KTIME_MAX || iowq->min_timeout) ret = io_cqring_schedule_timeout(iowq, ctx->clockid, start_time); else schedule(); current->in_iowait = 0; return ret; } /* If this returns > 0, the caller should retry */ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx, struct io_wait_queue *iowq, struct ext_arg *ext_arg, ktime_t start_time) { if (unlikely(READ_ONCE(ctx->check_cq))) return 1; if (unlikely(io_local_work_pending(ctx))) return 1; if (unlikely(task_work_pending(current))) return 1; if (unlikely(task_sigpending(current))) return -EINTR; if (unlikely(io_should_wake(iowq))) return 0; return __io_cqring_wait_schedule(ctx, iowq, ext_arg, start_time); } /* * Wait until events become available, if we don't already have some. The * application must reap them itself, as they reside on the shared cq ring. */ int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, u32 flags, struct ext_arg *ext_arg) { struct io_wait_queue iowq; struct io_rings *rings = ctx->rings; ktime_t start_time; int ret; min_events = min_t(int, min_events, ctx->cq_entries); if (!io_allowed_run_tw(ctx)) return -EEXIST; if (io_local_work_pending(ctx)) io_run_local_work(ctx, min_events, max(IO_LOCAL_TW_DEFAULT_MAX, min_events)); io_run_task_work(); if (unlikely(test_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq))) io_cqring_do_overflow_flush(ctx); if (__io_cqring_events_user(ctx) >= min_events) return 0; init_waitqueue_func_entry(&iowq.wq, io_wake_function); iowq.wq.private = current; INIT_LIST_HEAD(&iowq.wq.entry); iowq.ctx = ctx; iowq.cq_tail = READ_ONCE(ctx->rings->cq.head) + min_events; iowq.cq_min_tail = READ_ONCE(ctx->rings->cq.tail); iowq.nr_timeouts = atomic_read(&ctx->cq_timeouts); iowq.hit_timeout = 0; iowq.min_timeout = ext_arg->min_time; iowq.timeout = KTIME_MAX; start_time = io_get_time(ctx); if (ext_arg->ts_set) { iowq.timeout = timespec64_to_ktime(ext_arg->ts); if (!(flags & IORING_ENTER_ABS_TIMER)) iowq.timeout = ktime_add(iowq.timeout, start_time); } if (ext_arg->sig) { #ifdef CONFIG_COMPAT if (in_compat_syscall()) ret = set_compat_user_sigmask((const compat_sigset_t __user *)ext_arg->sig, ext_arg->argsz); else #endif ret = set_user_sigmask(ext_arg->sig, ext_arg->argsz); if (ret) return ret; } io_napi_busy_loop(ctx, &iowq); trace_io_uring_cqring_wait(ctx, min_events); do { unsigned long check_cq; int nr_wait; /* if min timeout has been hit, don't reset wait count */ if (!iowq.hit_timeout) nr_wait = (int) iowq.cq_tail - READ_ONCE(ctx->rings->cq.tail); else nr_wait = 1; if (ctx->flags & IORING_SETUP_DEFER_TASKRUN) { atomic_set(&ctx->cq_wait_nr, nr_wait); set_current_state(TASK_INTERRUPTIBLE); } else { prepare_to_wait_exclusive(&ctx->cq_wait, &iowq.wq, TASK_INTERRUPTIBLE); } ret = io_cqring_wait_schedule(ctx, &iowq, ext_arg, start_time); __set_current_state(TASK_RUNNING); atomic_set(&ctx->cq_wait_nr, IO_CQ_WAKE_INIT); /* * Run task_work after scheduling and before io_should_wake(). * If we got woken because of task_work being processed, run it * now rather than let the caller do another wait loop. */ if (io_local_work_pending(ctx)) io_run_local_work(ctx, nr_wait, nr_wait); io_run_task_work(); /* * Non-local task_work will be run on exit to userspace, but * if we're using DEFER_TASKRUN, then we could have waited * with a timeout for a number of requests. If the timeout * hits, we could have some requests ready to process. Ensure * this break is _after_ we have run task_work, to avoid * deferring running potentially pending requests until the * next time we wait for events. */ if (ret < 0) break; check_cq = READ_ONCE(ctx->check_cq); if (unlikely(check_cq)) { /* let the caller flush overflows, retry */ if (check_cq & BIT(IO_CHECK_CQ_OVERFLOW_BIT)) io_cqring_do_overflow_flush(ctx); if (check_cq & BIT(IO_CHECK_CQ_DROPPED_BIT)) { ret = -EBADR; break; } } if (io_should_wake(&iowq)) { ret = 0; break; } cond_resched(); } while (1); if (!(ctx->flags & IORING_SETUP_DEFER_TASKRUN)) finish_wait(&ctx->cq_wait, &iowq.wq); restore_saved_sigmask_unless(ret == -EINTR); return READ_ONCE(rings->cq.head) == READ_ONCE(rings->cq.tail) ? ret : 0; } |
| 14 2 2 14 14 14 14 14 14 14 14 23 23 23 24 24 24 14 14 43 43 7 7 7 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 | // SPDX-License-Identifier: GPL-2.0-or-later /* * Asynchronous Compression operations * * Copyright (c) 2016, Intel Corporation * Authors: Weigang Li <weigang.li@intel.com> * Giovanni Cabiddu <giovanni.cabiddu@intel.com> */ #include <crypto/internal/acompress.h> #include <crypto/scatterwalk.h> #include <linux/cryptouser.h> #include <linux/cpumask.h> #include <linux/err.h> #include <linux/kernel.h> #include <linux/module.h> #include <linux/percpu.h> #include <linux/scatterlist.h> #include <linux/sched.h> #include <linux/seq_file.h> #include <linux/smp.h> #include <linux/spinlock.h> #include <linux/string.h> #include <linux/workqueue.h> #include <net/netlink.h> #include "compress.h" struct crypto_scomp; enum { ACOMP_WALK_SLEEP = 1 << 0, ACOMP_WALK_SRC_LINEAR = 1 << 1, ACOMP_WALK_DST_LINEAR = 1 << 2, }; static const struct crypto_type crypto_acomp_type; static void acomp_reqchain_done(void *data, int err); static inline struct acomp_alg *__crypto_acomp_alg(struct crypto_alg *alg) { return container_of(alg, struct acomp_alg, calg.base); } static inline struct acomp_alg *crypto_acomp_alg(struct crypto_acomp *tfm) { return __crypto_acomp_alg(crypto_acomp_tfm(tfm)->__crt_alg); } static int __maybe_unused crypto_acomp_report( struct sk_buff *skb, struct crypto_alg *alg) { struct crypto_report_acomp racomp; memset(&racomp, 0, sizeof(racomp)); strscpy(racomp.type, "acomp", sizeof(racomp.type)); return nla_put(skb, CRYPTOCFGA_REPORT_ACOMP, sizeof(racomp), &racomp); } static void __maybe_unused crypto_acomp_show(struct seq_file *m, struct crypto_alg *alg) { seq_puts(m, "type : acomp\n"); } static void crypto_acomp_exit_tfm(struct crypto_tfm *tfm) { struct crypto_acomp *acomp = __crypto_acomp_tfm(tfm); struct acomp_alg *alg = crypto_acomp_alg(acomp); if (alg->exit) alg->exit(acomp); if (acomp_is_async(acomp)) crypto_free_acomp(crypto_acomp_fb(acomp)); } static int crypto_acomp_init_tfm(struct crypto_tfm *tfm) { struct crypto_acomp *acomp = __crypto_acomp_tfm(tfm); struct acomp_alg *alg = crypto_acomp_alg(acomp); struct crypto_acomp *fb = NULL; int err; if (tfm->__crt_alg->cra_type != &crypto_acomp_type) return crypto_init_scomp_ops_async(tfm); if (acomp_is_async(acomp)) { fb = crypto_alloc_acomp(crypto_acomp_alg_name(acomp), 0, CRYPTO_ALG_ASYNC); if (IS_ERR(fb)) return PTR_ERR(fb); err = -EINVAL; if (crypto_acomp_reqsize(fb) > MAX_SYNC_COMP_REQSIZE) goto out_free_fb; tfm->fb = crypto_acomp_tfm(fb); } acomp->compress = alg->compress; acomp->decompress = alg->decompress; acomp->reqsize = alg->base.cra_reqsize; acomp->base.exit = crypto_acomp_exit_tfm; if (!alg->init) return 0; err = alg->init(acomp); if (err) goto out_free_fb; return 0; out_free_fb: crypto_free_acomp(fb); return err; } static unsigned int crypto_acomp_extsize(struct crypto_alg *alg) { int extsize = crypto_alg_extsize(alg); if (alg->cra_type != &crypto_acomp_type) extsize += sizeof(struct crypto_scomp *); return extsize; } static const struct crypto_type crypto_acomp_type = { .extsize = crypto_acomp_extsize, .init_tfm = crypto_acomp_init_tfm, #ifdef CONFIG_PROC_FS .show = crypto_acomp_show, #endif #if IS_ENABLED(CONFIG_CRYPTO_USER) .report = crypto_acomp_report, #endif .maskclear = ~CRYPTO_ALG_TYPE_MASK, .maskset = CRYPTO_ALG_TYPE_ACOMPRESS_MASK, .type = CRYPTO_ALG_TYPE_ACOMPRESS, .tfmsize = offsetof(struct crypto_acomp, base), .algsize = offsetof(struct acomp_alg, base), }; struct crypto_acomp *crypto_alloc_acomp(const char *alg_name, u32 type, u32 mask) { return crypto_alloc_tfm(alg_name, &crypto_acomp_type, type, mask); } EXPORT_SYMBOL_GPL(crypto_alloc_acomp); struct crypto_acomp *crypto_alloc_acomp_node(const char *alg_name, u32 type, u32 mask, int node) { return crypto_alloc_tfm_node(alg_name, &crypto_acomp_type, type, mask, node); } EXPORT_SYMBOL_GPL(crypto_alloc_acomp_node); static void acomp_save_req(struct acomp_req *req, crypto_completion_t cplt) { struct acomp_req_chain *state = &req->chain; state->compl = req->base.complete; state->data = req->base.data; req->base.complete = cplt; req->base.data = state; } static void acomp_restore_req(struct acomp_req *req) { struct acomp_req_chain *state = req->base.data; req->base.complete = state->compl; req->base.data = state->data; } static void acomp_reqchain_virt(struct acomp_req *req) { struct acomp_req_chain *state = &req->chain; unsigned int slen = req->slen; unsigned int dlen = req->dlen; if (state->flags & CRYPTO_ACOMP_REQ_SRC_VIRT) acomp_request_set_src_dma(req, state->src, slen); if (state->flags & CRYPTO_ACOMP_REQ_DST_VIRT) acomp_request_set_dst_dma(req, state->dst, dlen); } static void acomp_virt_to_sg(struct acomp_req *req) { struct acomp_req_chain *state = &req->chain; state->flags = req->base.flags & (CRYPTO_ACOMP_REQ_SRC_VIRT | CRYPTO_ACOMP_REQ_DST_VIRT); if (acomp_request_src_isvirt(req)) { unsigned int slen = req->slen; const u8 *svirt = req->svirt; state->src = svirt; sg_init_one(&state->ssg, svirt, slen); acomp_request_set_src_sg(req, &state->ssg, slen); } if (acomp_request_dst_isvirt(req)) { unsigned int dlen = req->dlen; u8 *dvirt = req->dvirt; state->dst = dvirt; sg_init_one(&state->dsg, dvirt, dlen); acomp_request_set_dst_sg(req, &state->dsg, dlen); } } static int acomp_do_nondma(struct acomp_req *req, bool comp) { ACOMP_FBREQ_ON_STACK(fbreq, req); int err; if (comp) err = crypto_acomp_compress(fbreq); else err = crypto_acomp_decompress(fbreq); req->dlen = fbreq->dlen; return err; } static int acomp_do_one_req(struct acomp_req *req, bool comp) { if (acomp_request_isnondma(req)) return acomp_do_nondma(req, comp); acomp_virt_to_sg(req); return comp ? crypto_acomp_reqtfm(req)->compress(req) : crypto_acomp_reqtfm(req)->decompress(req); } static int acomp_reqchain_finish(struct acomp_req *req, int err) { acomp_reqchain_virt(req); acomp_restore_req(req); return err; } static void acomp_reqchain_done(void *data, int err) { struct acomp_req *req = data; crypto_completion_t compl; compl = req->chain.compl; data = req->chain.data; if (err == -EINPROGRESS) goto notify; err = acomp_reqchain_finish(req, err); notify: compl(data, err); } static int acomp_do_req_chain(struct acomp_req *req, bool comp) { int err; acomp_save_req(req, acomp_reqchain_done); err = acomp_do_one_req(req, comp); if (err == -EBUSY || err == -EINPROGRESS) return err; return acomp_reqchain_finish(req, err); } int crypto_acomp_compress(struct acomp_req *req) { struct crypto_acomp *tfm = crypto_acomp_reqtfm(req); if (acomp_req_on_stack(req) && acomp_is_async(tfm)) return -EAGAIN; if (crypto_acomp_req_virt(tfm) || acomp_request_issg(req)) return crypto_acomp_reqtfm(req)->compress(req); return acomp_do_req_chain(req, true); } EXPORT_SYMBOL_GPL(crypto_acomp_compress); int crypto_acomp_decompress(struct acomp_req *req) { struct crypto_acomp *tfm = crypto_acomp_reqtfm(req); if (acomp_req_on_stack(req) && acomp_is_async(tfm)) return -EAGAIN; if (crypto_acomp_req_virt(tfm) || acomp_request_issg(req)) return crypto_acomp_reqtfm(req)->decompress(req); return acomp_do_req_chain(req, false); } EXPORT_SYMBOL_GPL(crypto_acomp_decompress); void comp_prepare_alg(struct comp_alg_common *alg) { struct crypto_alg *base = &alg->base; base->cra_flags &= ~CRYPTO_ALG_TYPE_MASK; } int crypto_register_acomp(struct acomp_alg *alg) { struct crypto_alg *base = &alg->calg.base; comp_prepare_alg(&alg->calg); base->cra_type = &crypto_acomp_type; base->cra_flags |= CRYPTO_ALG_TYPE_ACOMPRESS; return crypto_register_alg(base); } EXPORT_SYMBOL_GPL(crypto_register_acomp); void crypto_unregister_acomp(struct acomp_alg *alg) { crypto_unregister_alg(&alg->base); } EXPORT_SYMBOL_GPL(crypto_unregister_acomp); int crypto_register_acomps(struct acomp_alg *algs, int count) { int i, ret; for (i = 0; i < count; i++) { ret = crypto_register_acomp(&algs[i]); if (ret) { crypto_unregister_acomps(algs, i); return ret; } } return 0; } EXPORT_SYMBOL_GPL(crypto_register_acomps); void crypto_unregister_acomps(struct acomp_alg *algs, int count) { int i; for (i = count - 1; i >= 0; --i) crypto_unregister_acomp(&algs[i]); } EXPORT_SYMBOL_GPL(crypto_unregister_acomps); static void acomp_stream_workfn(struct work_struct *work) { struct crypto_acomp_streams *s = container_of(work, struct crypto_acomp_streams, stream_work); struct crypto_acomp_stream __percpu *streams = s->streams; int cpu; for_each_cpu(cpu, &s->stream_want) { struct crypto_acomp_stream *ps; void *ctx; ps = per_cpu_ptr(streams, cpu); if (ps->ctx) continue; ctx = s->alloc_ctx(); if (IS_ERR(ctx)) break; spin_lock_bh(&ps->lock); ps->ctx = ctx; spin_unlock_bh(&ps->lock); cpumask_clear_cpu(cpu, &s->stream_want); } } void crypto_acomp_free_streams(struct crypto_acomp_streams *s) { struct crypto_acomp_stream __percpu *streams = s->streams; void (*free_ctx)(void *); int i; s->streams = NULL; if (!streams) return; cancel_work_sync(&s->stream_work); free_ctx = s->free_ctx; for_each_possible_cpu(i) { struct crypto_acomp_stream *ps = per_cpu_ptr(streams, i); if (!ps->ctx) continue; free_ctx(ps->ctx); } free_percpu(streams); } EXPORT_SYMBOL_GPL(crypto_acomp_free_streams); int crypto_acomp_alloc_streams(struct crypto_acomp_streams *s) { struct crypto_acomp_stream __percpu *streams; struct crypto_acomp_stream *ps; unsigned int i; void *ctx; if (s->streams) return 0; streams = alloc_percpu(struct crypto_acomp_stream); if (!streams) return -ENOMEM; ctx = s->alloc_ctx(); if (IS_ERR(ctx)) { free_percpu(streams); return PTR_ERR(ctx); } i = cpumask_first(cpu_possible_mask); ps = per_cpu_ptr(streams, i); ps->ctx = ctx; for_each_possible_cpu(i) { ps = per_cpu_ptr(streams, i); spin_lock_init(&ps->lock); } s->streams = streams; INIT_WORK(&s->stream_work, acomp_stream_workfn); return 0; } EXPORT_SYMBOL_GPL(crypto_acomp_alloc_streams); struct crypto_acomp_stream *_crypto_acomp_lock_stream_bh( struct crypto_acomp_streams *s) { struct crypto_acomp_stream __percpu *streams = s->streams; int cpu = raw_smp_processor_id(); struct crypto_acomp_stream *ps; ps = per_cpu_ptr(streams, cpu); spin_lock_bh(&ps->lock); if (likely(ps->ctx)) return ps; spin_unlock(&ps->lock); cpumask_set_cpu(cpu, &s->stream_want); schedule_work(&s->stream_work); ps = per_cpu_ptr(streams, cpumask_first(cpu_possible_mask)); spin_lock(&ps->lock); return ps; } EXPORT_SYMBOL_GPL(_crypto_acomp_lock_stream_bh); void acomp_walk_done_src(struct acomp_walk *walk, int used) { walk->slen -= used; if ((walk->flags & ACOMP_WALK_SRC_LINEAR)) scatterwalk_advance(&walk->in, used); else scatterwalk_done_src(&walk->in, used); if ((walk->flags & ACOMP_WALK_SLEEP)) cond_resched(); } EXPORT_SYMBOL_GPL(acomp_walk_done_src); void acomp_walk_done_dst(struct acomp_walk *walk, int used) { walk->dlen -= used; if ((walk->flags & ACOMP_WALK_DST_LINEAR)) scatterwalk_advance(&walk->out, used); else scatterwalk_done_dst(&walk->out, used); if ((walk->flags & ACOMP_WALK_SLEEP)) cond_resched(); } EXPORT_SYMBOL_GPL(acomp_walk_done_dst); int acomp_walk_next_src(struct acomp_walk *walk) { unsigned int slen = walk->slen; unsigned int max = UINT_MAX; if (!preempt_model_preemptible() && (walk->flags & ACOMP_WALK_SLEEP)) max = PAGE_SIZE; if ((walk->flags & ACOMP_WALK_SRC_LINEAR)) { walk->in.__addr = (void *)(((u8 *)walk->in.sg) + walk->in.offset); return min(slen, max); } return slen ? scatterwalk_next(&walk->in, slen) : 0; } EXPORT_SYMBOL_GPL(acomp_walk_next_src); int acomp_walk_next_dst(struct acomp_walk *walk) { unsigned int dlen = walk->dlen; unsigned int max = UINT_MAX; if (!preempt_model_preemptible() && (walk->flags & ACOMP_WALK_SLEEP)) max = PAGE_SIZE; if ((walk->flags & ACOMP_WALK_DST_LINEAR)) { walk->out.__addr = (void *)(((u8 *)walk->out.sg) + walk->out.offset); return min(dlen, max); } return dlen ? scatterwalk_next(&walk->out, dlen) : 0; } EXPORT_SYMBOL_GPL(acomp_walk_next_dst); int acomp_walk_virt(struct acomp_walk *__restrict walk, struct acomp_req *__restrict req, bool atomic) { struct scatterlist *src = req->src; struct scatterlist *dst = req->dst; walk->slen = req->slen; walk->dlen = req->dlen; if (!walk->slen || !walk->dlen) return -EINVAL; walk->flags = 0; if ((req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP) && !atomic) walk->flags |= ACOMP_WALK_SLEEP; if ((req->base.flags & CRYPTO_ACOMP_REQ_SRC_VIRT)) walk->flags |= ACOMP_WALK_SRC_LINEAR; if ((req->base.flags & CRYPTO_ACOMP_REQ_DST_VIRT)) walk->flags |= ACOMP_WALK_DST_LINEAR; if ((walk->flags & ACOMP_WALK_SRC_LINEAR)) { walk->in.sg = (void *)req->svirt; walk->in.offset = 0; } else scatterwalk_start(&walk->in, src); if ((walk->flags & ACOMP_WALK_DST_LINEAR)) { walk->out.sg = (void *)req->dvirt; walk->out.offset = 0; } else scatterwalk_start(&walk->out, dst); return 0; } EXPORT_SYMBOL_GPL(acomp_walk_virt); struct acomp_req *acomp_request_clone(struct acomp_req *req, size_t total, gfp_t gfp) { struct acomp_req *nreq; nreq = container_of(crypto_request_clone(&req->base, total, gfp), struct acomp_req, base); if (nreq == req) return req; if (req->src == &req->chain.ssg) nreq->src = &nreq->chain.ssg; if (req->dst == &req->chain.dsg) nreq->dst = &nreq->chain.dsg; return nreq; } EXPORT_SYMBOL_GPL(acomp_request_clone); MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("Asynchronous compression type"); |
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 | /* SPDX-License-Identifier: GPL-2.0 */ /* * Shared Memory Communications over RDMA (SMC-R) and RoCE * * Manage send buffer * * Copyright IBM Corp. 2016 * * Author(s): Ursula Braun <ubraun@linux.vnet.ibm.com> */ #ifndef SMC_TX_H #define SMC_TX_H #include <linux/socket.h> #include <linux/types.h> #include "smc.h" #include "smc_cdc.h" static inline int smc_tx_prepared_sends(struct smc_connection *conn) { union smc_host_cursor sent, prep; smc_curs_copy(&sent, &conn->tx_curs_sent, conn); smc_curs_copy(&prep, &conn->tx_curs_prep, conn); return smc_curs_diff(conn->sndbuf_desc->len, &sent, &prep); } void smc_tx_pending(struct smc_connection *conn); void smc_tx_work(struct work_struct *work); void smc_tx_init(struct smc_sock *smc); int smc_tx_sendmsg(struct smc_sock *smc, struct msghdr *msg, size_t len); int smc_tx_sndbuf_nonempty(struct smc_connection *conn); void smc_tx_sndbuf_nonfull(struct smc_sock *smc); void smc_tx_consumer_update(struct smc_connection *conn, bool force); int smcd_tx_ism_write(struct smc_connection *conn, void *data, size_t len, u32 offset, int signal); #endif /* SMC_TX_H */ |
| 17 2 1 14 16 2 2 2 2 2 2 2 2 1 1 1 1 2 2 2 2 2 2 1 2 2 2 2 1 2 3 2 2 2 1 2 1 3 3 3 2 3 7 7 3 7 4 7 6 7 3 7 4 7 6 7 7 7 7 18 18 17 17 17 1 1 2 2 3 6 1 1 18 1 17 5 5 2 2 1 2 7 7 2 2 3 2 3 4 2 1 6 19 18 18 18 17 1 17 15 2 16 2 14 18 17 18 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 | // SPDX-License-Identifier: GPL-2.0 /* * Quota code necessary even when VFS quota support is not compiled * into the kernel. The interesting stuff is over in dquot.c, here * we have symbols for initial quotactl(2) handling, the sysctl(2) * variables, etc - things needed even when quota support disabled. */ #include <linux/fs.h> #include <linux/namei.h> #include <linux/slab.h> #include <asm/current.h> #include <linux/blkdev.h> #include <linux/uaccess.h> #include <linux/kernel.h> #include <linux/security.h> #include <linux/syscalls.h> #include <linux/capability.h> #include <linux/quotaops.h> #include <linux/types.h> #include <linux/mount.h> #include <linux/writeback.h> #include <linux/nospec.h> #include "compat.h" #include "../internal.h" static int check_quotactl_permission(struct super_block *sb, int type, int cmd, qid_t id) { switch (cmd) { /* these commands do not require any special privilegues */ case Q_GETFMT: case Q_SYNC: case Q_GETINFO: case Q_XGETQSTAT: case Q_XGETQSTATV: case Q_XQUOTASYNC: break; /* allow to query information for dquots we "own" */ case Q_GETQUOTA: case Q_XGETQUOTA: if ((type == USRQUOTA && uid_eq(current_euid(), make_kuid(current_user_ns(), id))) || (type == GRPQUOTA && in_egroup_p(make_kgid(current_user_ns(), id)))) break; fallthrough; default: if (!capable(CAP_SYS_ADMIN)) return -EPERM; } return security_quotactl(cmd, type, id, sb); } static void quota_sync_one(struct super_block *sb, void *arg) { int type = *(int *)arg; if (sb->s_qcop && sb->s_qcop->quota_sync && (sb->s_quota_types & (1 << type))) sb->s_qcop->quota_sync(sb, type); } static int quota_sync_all(int type) { int ret; ret = security_quotactl(Q_SYNC, type, 0, NULL); if (!ret) iterate_supers(quota_sync_one, &type); return ret; } unsigned int qtype_enforce_flag(int type) { switch (type) { case USRQUOTA: return FS_QUOTA_UDQ_ENFD; case GRPQUOTA: return FS_QUOTA_GDQ_ENFD; case PRJQUOTA: return FS_QUOTA_PDQ_ENFD; } return 0; } static int quota_quotaon(struct super_block *sb, int type, qid_t id, const struct path *path) { if (!sb->s_qcop->quota_on && !sb->s_qcop->quota_enable) return -ENOSYS; if (sb->s_qcop->quota_enable) return sb->s_qcop->quota_enable(sb, qtype_enforce_flag(type)); if (IS_ERR(path)) return PTR_ERR(path); return sb->s_qcop->quota_on(sb, type, id, path); } static int quota_quotaoff(struct super_block *sb, int type) { if (!sb->s_qcop->quota_off && !sb->s_qcop->quota_disable) return -ENOSYS; if (sb->s_qcop->quota_disable) return sb->s_qcop->quota_disable(sb, qtype_enforce_flag(type)); return sb->s_qcop->quota_off(sb, type); } static int quota_getfmt(struct super_block *sb, int type, void __user *addr) { __u32 fmt; if (!sb_has_quota_active(sb, type)) return -ESRCH; fmt = sb_dqopt(sb)->info[type].dqi_format->qf_fmt_id; if (copy_to_user(addr, &fmt, sizeof(fmt))) return -EFAULT; return 0; } static int quota_getinfo(struct super_block *sb, int type, void __user *addr) { struct qc_state state; struct qc_type_state *tstate; struct if_dqinfo uinfo; int ret; if (!sb->s_qcop->get_state) return -ENOSYS; ret = sb->s_qcop->get_state(sb, &state); if (ret) return ret; tstate = state.s_state + type; if (!(tstate->flags & QCI_ACCT_ENABLED)) return -ESRCH; memset(&uinfo, 0, sizeof(uinfo)); uinfo.dqi_bgrace = tstate->spc_timelimit; uinfo.dqi_igrace = tstate->ino_timelimit; if (tstate->flags & QCI_SYSFILE) uinfo.dqi_flags |= DQF_SYS_FILE; if (tstate->flags & QCI_ROOT_SQUASH) uinfo.dqi_flags |= DQF_ROOT_SQUASH; uinfo.dqi_valid = IIF_ALL; if (copy_to_user(addr, &uinfo, sizeof(uinfo))) return -EFAULT; return 0; } static int quota_setinfo(struct super_block *sb, int type, void __user *addr) { struct if_dqinfo info; struct qc_info qinfo; if (copy_from_user(&info, addr, sizeof(info))) return -EFAULT; if (!sb->s_qcop->set_info) return -ENOSYS; if (info.dqi_valid & ~(IIF_FLAGS | IIF_BGRACE | IIF_IGRACE)) return -EINVAL; memset(&qinfo, 0, sizeof(qinfo)); if (info.dqi_valid & IIF_FLAGS) { if (info.dqi_flags & ~DQF_SETINFO_MASK) return -EINVAL; if (info.dqi_flags & DQF_ROOT_SQUASH) qinfo.i_flags |= QCI_ROOT_SQUASH; qinfo.i_fieldmask |= QC_FLAGS; } if (info.dqi_valid & IIF_BGRACE) { qinfo.i_spc_timelimit = info.dqi_bgrace; qinfo.i_fieldmask |= QC_SPC_TIMER; } if (info.dqi_valid & IIF_IGRACE) { qinfo.i_ino_timelimit = info.dqi_igrace; qinfo.i_fieldmask |= QC_INO_TIMER; } return sb->s_qcop->set_info(sb, type, &qinfo); } static inline qsize_t qbtos(qsize_t blocks) { return blocks << QIF_DQBLKSIZE_BITS; } static inline qsize_t stoqb(qsize_t space) { return (space + QIF_DQBLKSIZE - 1) >> QIF_DQBLKSIZE_BITS; } static void copy_to_if_dqblk(struct if_dqblk *dst, struct qc_dqblk *src) { memset(dst, 0, sizeof(*dst)); dst->dqb_bhardlimit = stoqb(src->d_spc_hardlimit); dst->dqb_bsoftlimit = stoqb(src->d_spc_softlimit); dst->dqb_curspace = src->d_space; dst->dqb_ihardlimit = src->d_ino_hardlimit; dst->dqb_isoftlimit = src->d_ino_softlimit; dst->dqb_curinodes = src->d_ino_count; dst->dqb_btime = src->d_spc_timer; dst->dqb_itime = src->d_ino_timer; dst->dqb_valid = QIF_ALL; } static int quota_getquota(struct super_block *sb, int type, qid_t id, void __user *addr) { struct kqid qid; struct qc_dqblk fdq; struct if_dqblk idq; int ret; if (!sb->s_qcop->get_dqblk) return -ENOSYS; qid = make_kqid(current_user_ns(), type, id); if (!qid_has_mapping(sb->s_user_ns, qid)) return -EINVAL; ret = sb->s_qcop->get_dqblk(sb, qid, &fdq); if (ret) return ret; copy_to_if_dqblk(&idq, &fdq); if (compat_need_64bit_alignment_fixup()) { struct compat_if_dqblk __user *compat_dqblk = addr; if (copy_to_user(compat_dqblk, &idq, sizeof(*compat_dqblk))) return -EFAULT; if (put_user(idq.dqb_valid, &compat_dqblk->dqb_valid)) return -EFAULT; } else { if (copy_to_user(addr, &idq, sizeof(idq))) return -EFAULT; } return 0; } /* * Return quota for next active quota >= this id, if any exists, * otherwise return -ENOENT via ->get_nextdqblk */ static int quota_getnextquota(struct super_block *sb, int type, qid_t id, void __user *addr) { struct kqid qid; struct qc_dqblk fdq; struct if_nextdqblk idq; int ret; if (!sb->s_qcop->get_nextdqblk) return -ENOSYS; qid = make_kqid(current_user_ns(), type, id); if (!qid_has_mapping(sb->s_user_ns, qid)) return -EINVAL; ret = sb->s_qcop->get_nextdqblk(sb, &qid, &fdq); if (ret) return ret; /* struct if_nextdqblk is a superset of struct if_dqblk */ copy_to_if_dqblk((struct if_dqblk *)&idq, &fdq); idq.dqb_id = from_kqid(current_user_ns(), qid); if (copy_to_user(addr, &idq, sizeof(idq))) return -EFAULT; return 0; } static void copy_from_if_dqblk(struct qc_dqblk *dst, struct if_dqblk *src) { dst->d_spc_hardlimit = qbtos(src->dqb_bhardlimit); dst->d_spc_softlimit = qbtos(src->dqb_bsoftlimit); dst->d_space = src->dqb_curspace; dst->d_ino_hardlimit = src->dqb_ihardlimit; dst->d_ino_softlimit = src->dqb_isoftlimit; dst->d_ino_count = src->dqb_curinodes; dst->d_spc_timer = src->dqb_btime; dst->d_ino_timer = src->dqb_itime; dst->d_fieldmask = 0; if (src->dqb_valid & QIF_BLIMITS) dst->d_fieldmask |= QC_SPC_SOFT | QC_SPC_HARD; if (src->dqb_valid & QIF_SPACE) dst->d_fieldmask |= QC_SPACE; if (src->dqb_valid & QIF_ILIMITS) dst->d_fieldmask |= QC_INO_SOFT | QC_INO_HARD; if (src->dqb_valid & QIF_INODES) dst->d_fieldmask |= QC_INO_COUNT; if (src->dqb_valid & QIF_BTIME) dst->d_fieldmask |= QC_SPC_TIMER; if (src->dqb_valid & QIF_ITIME) dst->d_fieldmask |= QC_INO_TIMER; } static int quota_setquota(struct super_block *sb, int type, qid_t id, void __user *addr) { struct qc_dqblk fdq; struct if_dqblk idq; struct kqid qid; if (compat_need_64bit_alignment_fixup()) { struct compat_if_dqblk __user *compat_dqblk = addr; if (copy_from_user(&idq, compat_dqblk, sizeof(*compat_dqblk)) || get_user(idq.dqb_valid, &compat_dqblk->dqb_valid)) return -EFAULT; } else { if (copy_from_user(&idq, addr, sizeof(idq))) return -EFAULT; } if (!sb->s_qcop->set_dqblk) return -ENOSYS; qid = make_kqid(current_user_ns(), type, id); if (!qid_has_mapping(sb->s_user_ns, qid)) return -EINVAL; copy_from_if_dqblk(&fdq, &idq); return sb->s_qcop->set_dqblk(sb, qid, &fdq); } static int quota_enable(struct super_block *sb, void __user *addr) { __u32 flags; if (copy_from_user(&flags, addr, sizeof(flags))) return -EFAULT; if (!sb->s_qcop->quota_enable) return -ENOSYS; return sb->s_qcop->quota_enable(sb, flags); } static int quota_disable(struct super_block *sb, void __user *addr) { __u32 flags; if (copy_from_user(&flags, addr, sizeof(flags))) return -EFAULT; if (!sb->s_qcop->quota_disable) return -ENOSYS; return sb->s_qcop->quota_disable(sb, flags); } static int quota_state_to_flags(struct qc_state *state) { int flags = 0; if (state->s_state[USRQUOTA].flags & QCI_ACCT_ENABLED) flags |= FS_QUOTA_UDQ_ACCT; if (state->s_state[USRQUOTA].flags & QCI_LIMITS_ENFORCED) flags |= FS_QUOTA_UDQ_ENFD; if (state->s_state[GRPQUOTA].flags & QCI_ACCT_ENABLED) flags |= FS_QUOTA_GDQ_ACCT; if (state->s_state[GRPQUOTA].flags & QCI_LIMITS_ENFORCED) flags |= FS_QUOTA_GDQ_ENFD; if (state->s_state[PRJQUOTA].flags & QCI_ACCT_ENABLED) flags |= FS_QUOTA_PDQ_ACCT; if (state->s_state[PRJQUOTA].flags & QCI_LIMITS_ENFORCED) flags |= FS_QUOTA_PDQ_ENFD; return flags; } static int quota_getstate(struct super_block *sb, int type, struct fs_quota_stat *fqs) { struct qc_state state; int ret; memset(&state, 0, sizeof (struct qc_state)); ret = sb->s_qcop->get_state(sb, &state); if (ret < 0) return ret; memset(fqs, 0, sizeof(*fqs)); fqs->qs_version = FS_QSTAT_VERSION; fqs->qs_flags = quota_state_to_flags(&state); /* No quota enabled? */ if (!fqs->qs_flags) return -ENOSYS; fqs->qs_incoredqs = state.s_incoredqs; fqs->qs_btimelimit = state.s_state[type].spc_timelimit; fqs->qs_itimelimit = state.s_state[type].ino_timelimit; fqs->qs_rtbtimelimit = state.s_state[type].rt_spc_timelimit; fqs->qs_bwarnlimit = state.s_state[type].spc_warnlimit; fqs->qs_iwarnlimit = state.s_state[type].ino_warnlimit; /* Inodes may be allocated even if inactive; copy out if present */ if (state.s_state[USRQUOTA].ino) { fqs->qs_uquota.qfs_ino = state.s_state[USRQUOTA].ino; fqs->qs_uquota.qfs_nblks = state.s_state[USRQUOTA].blocks; fqs->qs_uquota.qfs_nextents = state.s_state[USRQUOTA].nextents; } if (state.s_state[GRPQUOTA].ino) { fqs->qs_gquota.qfs_ino = state.s_state[GRPQUOTA].ino; fqs->qs_gquota.qfs_nblks = state.s_state[GRPQUOTA].blocks; fqs->qs_gquota.qfs_nextents = state.s_state[GRPQUOTA].nextents; } if (state.s_state[PRJQUOTA].ino) { /* * Q_XGETQSTAT doesn't have room for both group and project * quotas. So, allow the project quota values to be copied out * only if there is no group quota information available. */ if (!(state.s_state[GRPQUOTA].flags & QCI_ACCT_ENABLED)) { fqs->qs_gquota.qfs_ino = state.s_state[PRJQUOTA].ino; fqs->qs_gquota.qfs_nblks = state.s_state[PRJQUOTA].blocks; fqs->qs_gquota.qfs_nextents = state.s_state[PRJQUOTA].nextents; } } return 0; } static int compat_copy_fs_qfilestat(struct compat_fs_qfilestat __user *to, struct fs_qfilestat *from) { if (copy_to_user(to, from, sizeof(*to)) || put_user(from->qfs_nextents, &to->qfs_nextents)) return -EFAULT; return 0; } static int compat_copy_fs_quota_stat(struct compat_fs_quota_stat __user *to, struct fs_quota_stat *from) { if (put_user(from->qs_version, &to->qs_version) || put_user(from->qs_flags, &to->qs_flags) || put_user(from->qs_pad, &to->qs_pad) || compat_copy_fs_qfilestat(&to->qs_uquota, &from->qs_uquota) || compat_copy_fs_qfilestat(&to->qs_gquota, &from->qs_gquota) || put_user(from->qs_incoredqs, &to->qs_incoredqs) || put_user(from->qs_btimelimit, &to->qs_btimelimit) || put_user(from->qs_itimelimit, &to->qs_itimelimit) || put_user(from->qs_rtbtimelimit, &to->qs_rtbtimelimit) || put_user(from->qs_bwarnlimit, &to->qs_bwarnlimit) || put_user(from->qs_iwarnlimit, &to->qs_iwarnlimit)) return -EFAULT; return 0; } static int quota_getxstate(struct super_block *sb, int type, void __user *addr) { struct fs_quota_stat fqs; int ret; if (!sb->s_qcop->get_state) return -ENOSYS; ret = quota_getstate(sb, type, &fqs); if (ret) return ret; if (compat_need_64bit_alignment_fixup()) return compat_copy_fs_quota_stat(addr, &fqs); if (copy_to_user(addr, &fqs, sizeof(fqs))) return -EFAULT; return 0; } static int quota_getstatev(struct super_block *sb, int type, struct fs_quota_statv *fqs) { struct qc_state state; int ret; memset(&state, 0, sizeof (struct qc_state)); ret = sb->s_qcop->get_state(sb, &state); if (ret < 0) return ret; memset(fqs, 0, sizeof(*fqs)); fqs->qs_version = FS_QSTAT_VERSION; fqs->qs_flags = quota_state_to_flags(&state); /* No quota enabled? */ if (!fqs->qs_flags) return -ENOSYS; fqs->qs_incoredqs = state.s_incoredqs; fqs->qs_btimelimit = state.s_state[type].spc_timelimit; fqs->qs_itimelimit = state.s_state[type].ino_timelimit; fqs->qs_rtbtimelimit = state.s_state[type].rt_spc_timelimit; fqs->qs_bwarnlimit = state.s_state[type].spc_warnlimit; fqs->qs_iwarnlimit = state.s_state[type].ino_warnlimit; fqs->qs_rtbwarnlimit = state.s_state[type].rt_spc_warnlimit; /* Inodes may be allocated even if inactive; copy out if present */ if (state.s_state[USRQUOTA].ino) { fqs->qs_uquota.qfs_ino = state.s_state[USRQUOTA].ino; fqs->qs_uquota.qfs_nblks = state.s_state[USRQUOTA].blocks; fqs->qs_uquota.qfs_nextents = state.s_state[USRQUOTA].nextents; } if (state.s_state[GRPQUOTA].ino) { fqs->qs_gquota.qfs_ino = state.s_state[GRPQUOTA].ino; fqs->qs_gquota.qfs_nblks = state.s_state[GRPQUOTA].blocks; fqs->qs_gquota.qfs_nextents = state.s_state[GRPQUOTA].nextents; } if (state.s_state[PRJQUOTA].ino) { fqs->qs_pquota.qfs_ino = state.s_state[PRJQUOTA].ino; fqs->qs_pquota.qfs_nblks = state.s_state[PRJQUOTA].blocks; fqs->qs_pquota.qfs_nextents = state.s_state[PRJQUOTA].nextents; } return 0; } static int quota_getxstatev(struct super_block *sb, int type, void __user *addr) { struct fs_quota_statv fqs; int ret; if (!sb->s_qcop->get_state) return -ENOSYS; memset(&fqs, 0, sizeof(fqs)); if (copy_from_user(&fqs, addr, 1)) /* Just read qs_version */ return -EFAULT; /* If this kernel doesn't support user specified version, fail */ switch (fqs.qs_version) { case FS_QSTATV_VERSION1: break; default: return -EINVAL; } ret = quota_getstatev(sb, type, &fqs); if (!ret && copy_to_user(addr, &fqs, sizeof(fqs))) return -EFAULT; return ret; } /* * XFS defines BBTOB and BTOBB macros inside fs/xfs/ and we cannot move them * out of there as xfsprogs rely on definitions being in that header file. So * just define same functions here for quota purposes. */ #define XFS_BB_SHIFT 9 static inline u64 quota_bbtob(u64 blocks) { return blocks << XFS_BB_SHIFT; } static inline u64 quota_btobb(u64 bytes) { return (bytes + (1 << XFS_BB_SHIFT) - 1) >> XFS_BB_SHIFT; } static inline s64 copy_from_xfs_dqblk_ts(const struct fs_disk_quota *d, __s32 timer, __s8 timer_hi) { if (d->d_fieldmask & FS_DQ_BIGTIME) return (u32)timer | (s64)timer_hi << 32; return timer; } static void copy_from_xfs_dqblk(struct qc_dqblk *dst, struct fs_disk_quota *src) { dst->d_spc_hardlimit = quota_bbtob(src->d_blk_hardlimit); dst->d_spc_softlimit = quota_bbtob(src->d_blk_softlimit); dst->d_ino_hardlimit = src->d_ino_hardlimit; dst->d_ino_softlimit = src->d_ino_softlimit; dst->d_space = quota_bbtob(src->d_bcount); dst->d_ino_count = src->d_icount; dst->d_ino_timer = copy_from_xfs_dqblk_ts(src, src->d_itimer, src->d_itimer_hi); dst->d_spc_timer = copy_from_xfs_dqblk_ts(src, src->d_btimer, src->d_btimer_hi); dst->d_ino_warns = src->d_iwarns; dst->d_spc_warns = src->d_bwarns; dst->d_rt_spc_hardlimit = quota_bbtob(src->d_rtb_hardlimit); dst->d_rt_spc_softlimit = quota_bbtob(src->d_rtb_softlimit); dst->d_rt_space = quota_bbtob(src->d_rtbcount); dst->d_rt_spc_timer = copy_from_xfs_dqblk_ts(src, src->d_rtbtimer, src->d_rtbtimer_hi); dst->d_rt_spc_warns = src->d_rtbwarns; dst->d_fieldmask = 0; if (src->d_fieldmask & FS_DQ_ISOFT) dst->d_fieldmask |= QC_INO_SOFT; if (src->d_fieldmask & FS_DQ_IHARD) dst->d_fieldmask |= QC_INO_HARD; if (src->d_fieldmask & FS_DQ_BSOFT) dst->d_fieldmask |= QC_SPC_SOFT; if (src->d_fieldmask & FS_DQ_BHARD) dst->d_fieldmask |= QC_SPC_HARD; if (src->d_fieldmask & FS_DQ_RTBSOFT) dst->d_fieldmask |= QC_RT_SPC_SOFT; if (src->d_fieldmask & FS_DQ_RTBHARD) dst->d_fieldmask |= QC_RT_SPC_HARD; if (src->d_fieldmask & FS_DQ_BTIMER) dst->d_fieldmask |= QC_SPC_TIMER; if (src->d_fieldmask & FS_DQ_ITIMER) dst->d_fieldmask |= QC_INO_TIMER; if (src->d_fieldmask & FS_DQ_RTBTIMER) dst->d_fieldmask |= QC_RT_SPC_TIMER; if (src->d_fieldmask & FS_DQ_BWARNS) dst->d_fieldmask |= QC_SPC_WARNS; if (src->d_fieldmask & FS_DQ_IWARNS) dst->d_fieldmask |= QC_INO_WARNS; if (src->d_fieldmask & FS_DQ_RTBWARNS) dst->d_fieldmask |= QC_RT_SPC_WARNS; if (src->d_fieldmask & FS_DQ_BCOUNT) dst->d_fieldmask |= QC_SPACE; if (src->d_fieldmask & FS_DQ_ICOUNT) dst->d_fieldmask |= QC_INO_COUNT; if (src->d_fieldmask & FS_DQ_RTBCOUNT) dst->d_fieldmask |= QC_RT_SPACE; } static void copy_qcinfo_from_xfs_dqblk(struct qc_info *dst, struct fs_disk_quota *src) { memset(dst, 0, sizeof(*dst)); dst->i_spc_timelimit = src->d_btimer; dst->i_ino_timelimit = src->d_itimer; dst->i_rt_spc_timelimit = src->d_rtbtimer; dst->i_ino_warnlimit = src->d_iwarns; dst->i_spc_warnlimit = src->d_bwarns; dst->i_rt_spc_warnlimit = src->d_rtbwarns; if (src->d_fieldmask & FS_DQ_BWARNS) dst->i_fieldmask |= QC_SPC_WARNS; if (src->d_fieldmask & FS_DQ_IWARNS) dst->i_fieldmask |= QC_INO_WARNS; if (src->d_fieldmask & FS_DQ_RTBWARNS) dst->i_fieldmask |= QC_RT_SPC_WARNS; if (src->d_fieldmask & FS_DQ_BTIMER) dst->i_fieldmask |= QC_SPC_TIMER; if (src->d_fieldmask & FS_DQ_ITIMER) dst->i_fieldmask |= QC_INO_TIMER; if (src->d_fieldmask & FS_DQ_RTBTIMER) dst->i_fieldmask |= QC_RT_SPC_TIMER; } static int quota_setxquota(struct super_block *sb, int type, qid_t id, void __user *addr) { struct fs_disk_quota fdq; struct qc_dqblk qdq; struct kqid qid; if (copy_from_user(&fdq, addr, sizeof(fdq))) return -EFAULT; if (!sb->s_qcop->set_dqblk) return -ENOSYS; qid = make_kqid(current_user_ns(), type, id); if (!qid_has_mapping(sb->s_user_ns, qid)) return -EINVAL; /* Are we actually setting timer / warning limits for all users? */ if (from_kqid(sb->s_user_ns, qid) == 0 && fdq.d_fieldmask & (FS_DQ_WARNS_MASK | FS_DQ_TIMER_MASK)) { struct qc_info qinfo; int ret; if (!sb->s_qcop->set_info) return -EINVAL; copy_qcinfo_from_xfs_dqblk(&qinfo, &fdq); ret = sb->s_qcop->set_info(sb, type, &qinfo); if (ret) return ret; /* These are already done */ fdq.d_fieldmask &= ~(FS_DQ_WARNS_MASK | FS_DQ_TIMER_MASK); } copy_from_xfs_dqblk(&qdq, &fdq); return sb->s_qcop->set_dqblk(sb, qid, &qdq); } static inline void copy_to_xfs_dqblk_ts(const struct fs_disk_quota *d, __s32 *timer_lo, __s8 *timer_hi, s64 timer) { *timer_lo = timer; if (d->d_fieldmask & FS_DQ_BIGTIME) *timer_hi = timer >> 32; } static inline bool want_bigtime(s64 timer) { return timer > S32_MAX || timer < S32_MIN; } static void copy_to_xfs_dqblk(struct fs_disk_quota *dst, struct qc_dqblk *src, int type, qid_t id) { memset(dst, 0, sizeof(*dst)); if (want_bigtime(src->d_ino_timer) || want_bigtime(src->d_spc_timer) || want_bigtime(src->d_rt_spc_timer)) dst->d_fieldmask |= FS_DQ_BIGTIME; dst->d_version = FS_DQUOT_VERSION; dst->d_id = id; if (type == USRQUOTA) dst->d_flags = FS_USER_QUOTA; else if (type == PRJQUOTA) dst->d_flags = FS_PROJ_QUOTA; else dst->d_flags = FS_GROUP_QUOTA; dst->d_blk_hardlimit = quota_btobb(src->d_spc_hardlimit); dst->d_blk_softlimit = quota_btobb(src->d_spc_softlimit); dst->d_ino_hardlimit = src->d_ino_hardlimit; dst->d_ino_softlimit = src->d_ino_softlimit; dst->d_bcount = quota_btobb(src->d_space); dst->d_icount = src->d_ino_count; copy_to_xfs_dqblk_ts(dst, &dst->d_itimer, &dst->d_itimer_hi, src->d_ino_timer); copy_to_xfs_dqblk_ts(dst, &dst->d_btimer, &dst->d_btimer_hi, src->d_spc_timer); dst->d_iwarns = src->d_ino_warns; dst->d_bwarns = src->d_spc_warns; dst->d_rtb_hardlimit = quota_btobb(src->d_rt_spc_hardlimit); dst->d_rtb_softlimit = quota_btobb(src->d_rt_spc_softlimit); dst->d_rtbcount = quota_btobb(src->d_rt_space); copy_to_xfs_dqblk_ts(dst, &dst->d_rtbtimer, &dst->d_rtbtimer_hi, src->d_rt_spc_timer); dst->d_rtbwarns = src->d_rt_spc_warns; } static int quota_getxquota(struct super_block *sb, int type, qid_t id, void __user *addr) { struct fs_disk_quota fdq; struct qc_dqblk qdq; struct kqid qid; int ret; if (!sb->s_qcop->get_dqblk) return -ENOSYS; qid = make_kqid(current_user_ns(), type, id); if (!qid_has_mapping(sb->s_user_ns, qid)) return -EINVAL; ret = sb->s_qcop->get_dqblk(sb, qid, &qdq); if (ret) return ret; copy_to_xfs_dqblk(&fdq, &qdq, type, id); if (copy_to_user(addr, &fdq, sizeof(fdq))) return -EFAULT; return ret; } /* * Return quota for next active quota >= this id, if any exists, * otherwise return -ENOENT via ->get_nextdqblk. */ static int quota_getnextxquota(struct super_block *sb, int type, qid_t id, void __user *addr) { struct fs_disk_quota fdq; struct qc_dqblk qdq; struct kqid qid; qid_t id_out; int ret; if (!sb->s_qcop->get_nextdqblk) return -ENOSYS; qid = make_kqid(current_user_ns(), type, id); if (!qid_has_mapping(sb->s_user_ns, qid)) return -EINVAL; ret = sb->s_qcop->get_nextdqblk(sb, &qid, &qdq); if (ret) return ret; id_out = from_kqid(current_user_ns(), qid); copy_to_xfs_dqblk(&fdq, &qdq, type, id_out); if (copy_to_user(addr, &fdq, sizeof(fdq))) return -EFAULT; return ret; } static int quota_rmxquota(struct super_block *sb, void __user *addr) { __u32 flags; if (copy_from_user(&flags, addr, sizeof(flags))) return -EFAULT; if (!sb->s_qcop->rm_xquota) return -ENOSYS; return sb->s_qcop->rm_xquota(sb, flags); } /* Copy parameters and call proper function */ static int do_quotactl(struct super_block *sb, int type, int cmd, qid_t id, void __user *addr, const struct path *path) { int ret; type = array_index_nospec(type, MAXQUOTAS); /* * Quota not supported on this fs? Check this before s_quota_types * since they needn't be set if quota is not supported at all. */ if (!sb->s_qcop) return -ENOSYS; if (!(sb->s_quota_types & (1 << type))) return -EINVAL; ret = check_quotactl_permission(sb, type, cmd, id); if (ret < 0) return ret; switch (cmd) { case Q_QUOTAON: return quota_quotaon(sb, type, id, path); case Q_QUOTAOFF: return quota_quotaoff(sb, type); case Q_GETFMT: return quota_getfmt(sb, type, addr); case Q_GETINFO: return quota_getinfo(sb, type, addr); case Q_SETINFO: return quota_setinfo(sb, type, addr); case Q_GETQUOTA: return quota_getquota(sb, type, id, addr); case Q_GETNEXTQUOTA: return quota_getnextquota(sb, type, id, addr); case Q_SETQUOTA: return quota_setquota(sb, type, id, addr); case Q_SYNC: if (!sb->s_qcop->quota_sync) return -ENOSYS; return sb->s_qcop->quota_sync(sb, type); case Q_XQUOTAON: return quota_enable(sb, addr); case Q_XQUOTAOFF: return quota_disable(sb, addr); case Q_XQUOTARM: return quota_rmxquota(sb, addr); case Q_XGETQSTAT: return quota_getxstate(sb, type, addr); case Q_XGETQSTATV: return quota_getxstatev(sb, type, addr); case Q_XSETQLIM: return quota_setxquota(sb, type, id, addr); case Q_XGETQUOTA: return quota_getxquota(sb, type, id, addr); case Q_XGETNEXTQUOTA: return quota_getnextxquota(sb, type, id, addr); case Q_XQUOTASYNC: if (sb_rdonly(sb)) return -EROFS; /* XFS quotas are fully coherent now, making this call a noop */ return 0; default: return -EINVAL; } } /* Return 1 if 'cmd' will block on frozen filesystem */ static int quotactl_cmd_write(int cmd) { /* * We cannot allow Q_GETQUOTA and Q_GETNEXTQUOTA without write access * as dquot_acquire() may allocate space for new structure and OCFS2 * needs to increment on-disk use count. */ switch (cmd) { case Q_GETFMT: case Q_GETINFO: case Q_SYNC: case Q_XGETQSTAT: case Q_XGETQSTATV: case Q_XGETQUOTA: case Q_XGETNEXTQUOTA: case Q_XQUOTASYNC: return 0; } return 1; } /* Return true if quotactl command is manipulating quota on/off state */ static bool quotactl_cmd_onoff(int cmd) { return (cmd == Q_QUOTAON) || (cmd == Q_QUOTAOFF) || (cmd == Q_XQUOTAON) || (cmd == Q_XQUOTAOFF); } /* * look up a superblock on which quota ops will be performed * - use the name of a block device to find the superblock thereon */ static struct super_block *quotactl_block(const char __user *special, int cmd) { #ifdef CONFIG_BLOCK struct super_block *sb; CLASS(filename, tmp)(special); bool excl = false, thawed = false; int error; dev_t dev; if (IS_ERR(tmp)) return ERR_CAST(tmp); error = lookup_bdev(tmp->name, &dev); if (error) return ERR_PTR(error); if (quotactl_cmd_onoff(cmd)) { excl = true; thawed = true; } else if (quotactl_cmd_write(cmd)) { thawed = true; } retry: sb = user_get_super(dev, excl); if (!sb) return ERR_PTR(-ENODEV); if (thawed && sb->s_writers.frozen != SB_UNFROZEN) { if (excl) up_write(&sb->s_umount); else up_read(&sb->s_umount); /* Wait for sb to unfreeze */ sb_start_write(sb); sb_end_write(sb); put_super(sb); cond_resched(); goto retry; } return sb; #else return ERR_PTR(-ENODEV); #endif } /* * This is the system call interface. This communicates with * the user-level programs. Currently this only supports diskquota * calls. Maybe we need to add the process quotas etc. in the future, * but we probably should use rlimits for that. */ SYSCALL_DEFINE4(quotactl, unsigned int, cmd, const char __user *, special, qid_t, id, void __user *, addr) { uint cmds, type; struct super_block *sb = NULL; struct path path, *pathp = NULL; int ret; cmds = cmd >> SUBCMDSHIFT; type = cmd & SUBCMDMASK; if (type >= MAXQUOTAS) return -EINVAL; /* * As a special case Q_SYNC can be called without a specific device. * It will iterate all superblocks that have quota enabled and call * the sync action on each of them. */ if (!special) { if (cmds == Q_SYNC) return quota_sync_all(type); return -ENODEV; } /* * Path for quotaon has to be resolved before grabbing superblock * because that gets s_umount sem which is also possibly needed by path * resolution (think about autofs) and thus deadlocks could arise. */ if (cmds == Q_QUOTAON) { ret = user_path_at(AT_FDCWD, addr, LOOKUP_FOLLOW|LOOKUP_AUTOMOUNT, &path); if (ret) pathp = ERR_PTR(ret); else pathp = &path; } sb = quotactl_block(special, cmds); if (IS_ERR(sb)) { ret = PTR_ERR(sb); goto out; } ret = do_quotactl(sb, type, cmds, id, addr, pathp); if (!quotactl_cmd_onoff(cmds)) drop_super(sb); else drop_super_exclusive(sb); out: if (pathp && !IS_ERR(pathp)) path_put(pathp); return ret; } SYSCALL_DEFINE4(quotactl_fd, unsigned int, fd, unsigned int, cmd, qid_t, id, void __user *, addr) { struct super_block *sb; unsigned int cmds = cmd >> SUBCMDSHIFT; unsigned int type = cmd & SUBCMDMASK; CLASS(fd_raw, f)(fd); int ret; if (fd_empty(f)) return -EBADF; if (type >= MAXQUOTAS) return -EINVAL; if (quotactl_cmd_write(cmds)) { ret = mnt_want_write(fd_file(f)->f_path.mnt); if (ret) return ret; } sb = fd_file(f)->f_path.mnt->mnt_sb; if (quotactl_cmd_onoff(cmds)) down_write(&sb->s_umount); else down_read(&sb->s_umount); ret = do_quotactl(sb, type, cmds, id, addr, ERR_PTR(-EINVAL)); if (quotactl_cmd_onoff(cmds)) up_write(&sb->s_umount); else up_read(&sb->s_umount); if (quotactl_cmd_write(cmds)) mnt_drop_write(fd_file(f)->f_path.mnt); return ret; } |
| 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 | // SPDX-License-Identifier: GPL-2.0-only /* Copyright (c) 2026 Christian Brauner <brauner@kernel.org> */ #include <linux/fs/super_types.h> #include <linux/fs_context.h> #include <linux/magic.h> static const struct super_operations nullfs_super_operations = { .statfs = simple_statfs, }; static int nullfs_fs_fill_super(struct super_block *s, struct fs_context *fc) { struct inode *inode; s->s_maxbytes = MAX_LFS_FILESIZE; s->s_blocksize = PAGE_SIZE; s->s_blocksize_bits = PAGE_SHIFT; s->s_magic = NULL_FS_MAGIC; s->s_op = &nullfs_super_operations; s->s_export_op = NULL; s->s_xattr = NULL; s->s_time_gran = 1; s->s_d_flags = 0; inode = new_inode(s); if (!inode) return -ENOMEM; /* nullfs is permanently empty... */ make_empty_dir_inode(inode); simple_inode_init_ts(inode); inode->i_ino = 1; /* ... and immutable. */ inode->i_flags |= S_IMMUTABLE; s->s_root = d_make_root(inode); if (!s->s_root) return -ENOMEM; return 0; } /* * For now this is a single global instance. If needed we can make it * mountable by userspace at which point we will need to make it * multi-instance. */ static int nullfs_fs_get_tree(struct fs_context *fc) { return get_tree_single(fc, nullfs_fs_fill_super); } static const struct fs_context_operations nullfs_fs_context_ops = { .get_tree = nullfs_fs_get_tree, }; static int nullfs_init_fs_context(struct fs_context *fc) { fc->ops = &nullfs_fs_context_ops; fc->global = true; fc->sb_flags = SB_NOUSER; fc->s_iflags = SB_I_NOEXEC | SB_I_NODEV; return 0; } struct file_system_type nullfs_fs_type = { .name = "nullfs", .init_fs_context = nullfs_init_fs_context, .kill_sb = kill_anon_super, }; |
| 30 87 87 30 106 119 130 130 78 78 77 12 60 384 111 247 14 12 98 1 98 1 2 129 93 92 129 102 90 3 3 5 96 7 7 5 95 4 4 4 102 2 98 87 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 | /* SPDX-License-Identifier: GPL-2.0-or-later */ /* SCTP kernel implementation * (C) Copyright IBM Corp. 2001, 2004 * Copyright (c) 1999-2000 Cisco, Inc. * Copyright (c) 1999-2001 Motorola, Inc. * Copyright (c) 2001-2003 Intel Corp. * * This file is part of the SCTP kernel implementation * * The base lksctp header. * * Please send any bug reports or fixes you make to the * email address(es): * lksctp developers <linux-sctp@vger.kernel.org> * * Written or modified by: * La Monte H.P. Yarroll <piggy@acm.org> * Xingang Guo <xingang.guo@intel.com> * Jon Grimm <jgrimm@us.ibm.com> * Daisy Chang <daisyc@us.ibm.com> * Sridhar Samudrala <sri@us.ibm.com> * Ardelle Fan <ardelle.fan@intel.com> * Ryan Layer <rmlayer@us.ibm.com> * Kevin Gao <kevin.gao@intel.com> */ #ifndef __net_sctp_h__ #define __net_sctp_h__ /* Header Strategy. * Start getting some control over the header file dependencies: * includes * constants * structs * prototypes * macros, externs, and inlines * * Move test_frame specific items out of the kernel headers * and into the test frame headers. This is not perfect in any sense * and will continue to evolve. */ #include <linux/types.h> #include <linux/slab.h> #include <linux/in.h> #include <linux/tty.h> #include <linux/proc_fs.h> #include <linux/spinlock.h> #include <linux/jiffies.h> #include <linux/idr.h> #if IS_ENABLED(CONFIG_IPV6) #include <net/ipv6.h> #include <net/ip6_route.h> #endif #include <linux/uaccess.h> #include <asm/page.h> #include <net/sock.h> #include <net/snmp.h> #include <net/sctp/structs.h> #include <net/sctp/constants.h> #ifdef CONFIG_IP_SCTP_MODULE #define SCTP_PROTOSW_FLAG 0 #else /* static! */ #define SCTP_PROTOSW_FLAG INET_PROTOSW_PERMANENT #endif /* * Function declarations. */ /* * sctp/protocol.c */ int sctp_copy_local_addr_list(struct net *net, struct sctp_bind_addr *addr, enum sctp_scope, gfp_t gfp, int flags); struct sctp_pf *sctp_get_pf_specific(sa_family_t family); int sctp_register_pf(struct sctp_pf *, sa_family_t); void sctp_addr_wq_mgmt(struct net *, struct sctp_sockaddr_entry *, int); int sctp_udp_sock_start(struct net *net); void sctp_udp_sock_stop(struct net *net); /* * sctp/socket.c */ int sctp_inet_connect(struct socket *sock, struct sockaddr_unsized *uaddr, int addr_len, int flags); int sctp_backlog_rcv(struct sock *sk, struct sk_buff *skb); int sctp_inet_listen(struct socket *sock, int backlog); void sctp_write_space(struct sock *sk); void sctp_data_ready(struct sock *sk); __poll_t sctp_poll(struct file *file, struct socket *sock, poll_table *wait); void sctp_sock_rfree(struct sk_buff *skb); extern struct percpu_counter sctp_sockets_allocated; int sctp_asconf_mgmt(struct sctp_sock *, struct sctp_sockaddr_entry *); struct sk_buff *sctp_skb_recv_datagram(struct sock *, int, int *); typedef int (*sctp_callback_t)(struct sctp_endpoint *, struct sctp_transport *, void *); void sctp_transport_walk_start(struct rhashtable_iter *iter); void sctp_transport_walk_stop(struct rhashtable_iter *iter); struct sctp_transport *sctp_transport_get_next(struct net *net, struct rhashtable_iter *iter); struct sctp_transport *sctp_transport_get_idx(struct net *net, struct rhashtable_iter *iter, int pos); int sctp_transport_lookup_process(sctp_callback_t cb, struct net *net, const union sctp_addr *laddr, const union sctp_addr *paddr, void *p, int dif); int sctp_transport_traverse_process(sctp_callback_t cb, sctp_callback_t cb_done, struct net *net, int *pos, void *p); int sctp_for_each_endpoint(int (*cb)(struct sctp_endpoint *, void *), void *p); int sctp_get_sctp_info(struct sock *sk, struct sctp_association *asoc, struct sctp_info *info); /* * sctp/primitive.c */ int sctp_primitive_ASSOCIATE(struct net *, struct sctp_association *, void *arg); int sctp_primitive_SHUTDOWN(struct net *, struct sctp_association *, void *arg); int sctp_primitive_ABORT(struct net *, struct sctp_association *, void *arg); int sctp_primitive_SEND(struct net *, struct sctp_association *, void *arg); int sctp_primitive_REQUESTHEARTBEAT(struct net *, struct sctp_association *, void *arg); int sctp_primitive_ASCONF(struct net *, struct sctp_association *, void *arg); int sctp_primitive_RECONF(struct net *net, struct sctp_association *asoc, void *arg); /* * sctp/input.c */ int sctp_rcv(struct sk_buff *skb); int sctp_v4_err(struct sk_buff *skb, u32 info); int sctp_hash_endpoint(struct sctp_endpoint *ep); void sctp_unhash_endpoint(struct sctp_endpoint *); struct sock *sctp_err_lookup(struct net *net, int family, struct sk_buff *, struct sctphdr *, struct sctp_association **, struct sctp_transport **); void sctp_err_finish(struct sock *, struct sctp_transport *); int sctp_udp_v4_err(struct sock *sk, struct sk_buff *skb); int sctp_udp_v6_err(struct sock *sk, struct sk_buff *skb); void sctp_icmp_frag_needed(struct sock *, struct sctp_association *, struct sctp_transport *t, __u32 pmtu); void sctp_icmp_redirect(struct sock *, struct sctp_transport *, struct sk_buff *); void sctp_icmp_proto_unreachable(struct sock *sk, struct sctp_association *asoc, struct sctp_transport *t); int sctp_transport_hashtable_init(void); void sctp_transport_hashtable_destroy(void); int sctp_hash_transport(struct sctp_transport *t); void sctp_unhash_transport(struct sctp_transport *t); struct sctp_transport *sctp_addrs_lookup_transport( struct net *net, const union sctp_addr *laddr, const union sctp_addr *paddr, int dif, int sdif); struct sctp_transport *sctp_epaddr_lookup_transport( const struct sctp_endpoint *ep, const union sctp_addr *paddr); bool sctp_sk_bound_dev_eq(struct net *net, int bound_dev_if, int dif, int sdif); /* * sctp/proc.c */ int __net_init sctp_proc_init(struct net *net); /* * sctp/offload.c */ int sctp_offload_init(void); /* * sctp/stream_sched.c */ void sctp_sched_ops_init(void); /* * sctp/stream.c */ int sctp_send_reset_streams(struct sctp_association *asoc, struct sctp_reset_streams *params); int sctp_send_reset_assoc(struct sctp_association *asoc); int sctp_send_add_streams(struct sctp_association *asoc, struct sctp_add_streams *params); /* * Module global variables */ /* * sctp/protocol.c */ extern struct kmem_cache *sctp_chunk_cachep __read_mostly; extern struct kmem_cache *sctp_bucket_cachep __read_mostly; extern long sysctl_sctp_mem[3]; extern int sysctl_sctp_rmem[3]; extern int sysctl_sctp_wmem[3]; /* * Section: Macros, externs, and inlines */ /* SCTP SNMP MIB stats handlers */ #define SCTP_INC_STATS(net, field) SNMP_INC_STATS((net)->sctp.sctp_statistics, field) #define __SCTP_INC_STATS(net, field) __SNMP_INC_STATS((net)->sctp.sctp_statistics, field) #define SCTP_DEC_STATS(net, field) SNMP_DEC_STATS((net)->sctp.sctp_statistics, field) /* sctp mib definitions */ enum { SCTP_MIB_NUM = 0, SCTP_MIB_CURRESTAB, /* CurrEstab */ SCTP_MIB_ACTIVEESTABS, /* ActiveEstabs */ SCTP_MIB_PASSIVEESTABS, /* PassiveEstabs */ SCTP_MIB_ABORTEDS, /* Aborteds */ SCTP_MIB_SHUTDOWNS, /* Shutdowns */ SCTP_MIB_OUTOFBLUES, /* OutOfBlues */ SCTP_MIB_CHECKSUMERRORS, /* ChecksumErrors */ SCTP_MIB_OUTCTRLCHUNKS, /* OutCtrlChunks */ SCTP_MIB_OUTORDERCHUNKS, /* OutOrderChunks */ SCTP_MIB_OUTUNORDERCHUNKS, /* OutUnorderChunks */ SCTP_MIB_INCTRLCHUNKS, /* InCtrlChunks */ SCTP_MIB_INORDERCHUNKS, /* InOrderChunks */ SCTP_MIB_INUNORDERCHUNKS, /* InUnorderChunks */ SCTP_MIB_FRAGUSRMSGS, /* FragUsrMsgs */ SCTP_MIB_REASMUSRMSGS, /* ReasmUsrMsgs */ SCTP_MIB_OUTSCTPPACKS, /* OutSCTPPacks */ SCTP_MIB_INSCTPPACKS, /* InSCTPPacks */ SCTP_MIB_T1_INIT_EXPIREDS, SCTP_MIB_T1_COOKIE_EXPIREDS, SCTP_MIB_T2_SHUTDOWN_EXPIREDS, SCTP_MIB_T3_RTX_EXPIREDS, SCTP_MIB_T4_RTO_EXPIREDS, SCTP_MIB_T5_SHUTDOWN_GUARD_EXPIREDS, SCTP_MIB_DELAY_SACK_EXPIREDS, SCTP_MIB_AUTOCLOSE_EXPIREDS, SCTP_MIB_T1_RETRANSMITS, SCTP_MIB_T3_RETRANSMITS, SCTP_MIB_PMTUD_RETRANSMITS, SCTP_MIB_FAST_RETRANSMITS, SCTP_MIB_IN_PKT_SOFTIRQ, SCTP_MIB_IN_PKT_BACKLOG, SCTP_MIB_IN_PKT_DISCARDS, SCTP_MIB_IN_DATA_CHUNK_DISCARDS, __SCTP_MIB_MAX }; #define SCTP_MIB_MAX __SCTP_MIB_MAX struct sctp_mib { unsigned long mibs[SCTP_MIB_MAX]; }; /* helper function to track stats about max rto and related transport */ static inline void sctp_max_rto(struct sctp_association *asoc, struct sctp_transport *trans) { if (asoc->stats.max_obs_rto < (__u64)trans->rto) { asoc->stats.max_obs_rto = trans->rto; memset(&asoc->stats.obs_rto_ipaddr, 0, sizeof(struct sockaddr_storage)); memcpy(&asoc->stats.obs_rto_ipaddr, &trans->ipaddr, trans->af_specific->sockaddr_len); } } /* * Macros for keeping a global reference of object allocations. */ #ifdef CONFIG_SCTP_DBG_OBJCNT extern atomic_t sctp_dbg_objcnt_sock; extern atomic_t sctp_dbg_objcnt_ep; extern atomic_t sctp_dbg_objcnt_assoc; extern atomic_t sctp_dbg_objcnt_transport; extern atomic_t sctp_dbg_objcnt_chunk; extern atomic_t sctp_dbg_objcnt_bind_addr; extern atomic_t sctp_dbg_objcnt_bind_bucket; extern atomic_t sctp_dbg_objcnt_addr; extern atomic_t sctp_dbg_objcnt_datamsg; extern atomic_t sctp_dbg_objcnt_keys; /* Macros to atomically increment/decrement objcnt counters. */ #define SCTP_DBG_OBJCNT_INC(name) \ atomic_inc(&sctp_dbg_objcnt_## name) #define SCTP_DBG_OBJCNT_DEC(name) \ atomic_dec(&sctp_dbg_objcnt_## name) #define SCTP_DBG_OBJCNT(name) \ atomic_t sctp_dbg_objcnt_## name = ATOMIC_INIT(0) /* Macro to help create new entries in the global array of * objcnt counters. */ #define SCTP_DBG_OBJCNT_ENTRY(name) \ {.label= #name, .counter= &sctp_dbg_objcnt_## name} void sctp_dbg_objcnt_init(struct net *); #else #define SCTP_DBG_OBJCNT_INC(name) #define SCTP_DBG_OBJCNT_DEC(name) static inline void sctp_dbg_objcnt_init(struct net *net) { return; } #endif /* CONFIG_SCTP_DBG_OBJCOUNT */ #if defined CONFIG_SYSCTL void sctp_sysctl_register(void); void sctp_sysctl_unregister(void); int sctp_sysctl_net_register(struct net *net); void sctp_sysctl_net_unregister(struct net *net); #else static inline void sctp_sysctl_register(void) { return; } static inline void sctp_sysctl_unregister(void) { return; } static inline int sctp_sysctl_net_register(struct net *net) { return 0; } static inline void sctp_sysctl_net_unregister(struct net *net) { return; } #endif /* Size of Supported Address Parameter for 'x' address types. */ #define SCTP_SAT_LEN(x) (sizeof(struct sctp_paramhdr) + (x) * sizeof(__u16)) #if IS_ENABLED(CONFIG_IPV6) void sctp_v6_pf_init(void); void sctp_v6_pf_exit(void); int sctp_v6_protosw_init(void); void sctp_v6_protosw_exit(void); int sctp_v6_add_protocol(void); void sctp_v6_del_protocol(void); #else /* #ifdef defined(CONFIG_IPV6) */ static inline void sctp_v6_pf_init(void) { return; } static inline void sctp_v6_pf_exit(void) { return; } static inline int sctp_v6_protosw_init(void) { return 0; } static inline void sctp_v6_protosw_exit(void) { return; } static inline int sctp_v6_add_protocol(void) { return 0; } static inline void sctp_v6_del_protocol(void) { return; } #endif /* #if defined(CONFIG_IPV6) */ /* Map an association to an assoc_id. */ static inline sctp_assoc_t sctp_assoc2id(const struct sctp_association *asoc) { return asoc ? asoc->assoc_id : 0; } static inline enum sctp_sstat_state sctp_assoc_to_state(const struct sctp_association *asoc) { /* SCTP's uapi always had SCTP_EMPTY(=0) as a dummy state, but we * got rid of it in kernel space. Therefore SCTP_CLOSED et al * start at =1 in user space, but actually as =0 in kernel space. * Now that we can not break user space and SCTP_EMPTY is exposed * there, we need to fix it up with an ugly offset not to break * applications. :( */ return asoc->state + 1; } /* Look up the association by its id. */ struct sctp_association *sctp_id2assoc(struct sock *sk, sctp_assoc_t id); /* A macro to walk a list of skbs. */ #define sctp_skb_for_each(pos, head, tmp) \ skb_queue_walk_safe(head, pos, tmp) /** * sctp_list_dequeue - remove from the head of the queue * @list: list to dequeue from * * Remove the head of the list. The head item is * returned or %NULL if the list is empty. */ static inline struct list_head *sctp_list_dequeue(struct list_head *list) { struct list_head *result = NULL; if (!list_empty(list)) { result = list->next; list_del_init(result); } return result; } /* SCTP version of skb_set_owner_r. We need this one because * of the way we have to do receive buffer accounting on bundled * chunks. */ static inline void sctp_skb_set_owner_r(struct sk_buff *skb, struct sock *sk) { struct sctp_ulpevent *event = sctp_skb2event(skb); skb_orphan(skb); skb->sk = sk; skb->destructor = sctp_sock_rfree; atomic_add(event->rmem_len, &sk->sk_rmem_alloc); /* * This mimics the behavior of skb_set_owner_r */ sk_mem_charge(sk, event->rmem_len); } /* Tests if the list has one and only one entry. */ static inline int sctp_list_single_entry(struct list_head *head) { return list_is_singular(head); } static inline bool sctp_chunk_pending(const struct sctp_chunk *chunk) { return !list_empty(&chunk->list); } /* Walk through a list of TLV parameters. Don't trust the * individual parameter lengths and instead depend on * the chunk length to indicate when to stop. Make sure * there is room for a param header too. */ #define sctp_walk_params(pos, chunk)\ _sctp_walk_params((pos), (chunk), ntohs((chunk)->chunk_hdr.length)) #define _sctp_walk_params(pos, chunk, end)\ for (pos.v = (u8 *)(chunk + 1);\ (pos.v + offsetof(struct sctp_paramhdr, length) + sizeof(pos.p->length) <=\ (void *)chunk + end) &&\ pos.v <= (void *)chunk + end - ntohs(pos.p->length) &&\ ntohs(pos.p->length) >= sizeof(struct sctp_paramhdr);\ pos.v += SCTP_PAD4(ntohs(pos.p->length))) #define sctp_walk_errors(err, chunk_hdr)\ _sctp_walk_errors((err), (chunk_hdr), ntohs((chunk_hdr)->length)) #define _sctp_walk_errors(err, chunk_hdr, end)\ for (err = (struct sctp_errhdr *)((void *)chunk_hdr + \ sizeof(struct sctp_chunkhdr));\ ((void *)err + offsetof(struct sctp_errhdr, length) + sizeof(err->length) <=\ (void *)chunk_hdr + end) &&\ (void *)err <= (void *)chunk_hdr + end - ntohs(err->length) &&\ ntohs(err->length) >= sizeof(struct sctp_errhdr); \ err = (struct sctp_errhdr *)((void *)err + SCTP_PAD4(ntohs(err->length)))) #define sctp_walk_fwdtsn(pos, chunk)\ _sctp_walk_fwdtsn((pos), (chunk), ntohs((chunk)->chunk_hdr->length) - sizeof(struct sctp_fwdtsn_chunk)) #define _sctp_walk_fwdtsn(pos, chunk, end)\ for (pos = (void *)(chunk->subh.fwdtsn_hdr + 1);\ (void *)pos <= (void *)(chunk->subh.fwdtsn_hdr + 1) + end - sizeof(struct sctp_fwdtsn_skip);\ pos++) /* External references. */ extern struct proto sctp_prot; extern struct proto sctpv6_prot; void sctp_put_port(struct sock *sk); extern struct idr sctp_assocs_id; extern spinlock_t sctp_assocs_id_lock; /* Static inline functions. */ /* Convert from an IP version number to an Address Family symbol. */ static inline int ipver2af(__u8 ipver) { switch (ipver) { case 4: return AF_INET; case 6: return AF_INET6; default: return 0; } } /* Convert from an address parameter type to an address family. */ static inline int param_type2af(__be16 type) { switch (type) { case SCTP_PARAM_IPV4_ADDRESS: return AF_INET; case SCTP_PARAM_IPV6_ADDRESS: return AF_INET6; default: return 0; } } /* Warning: The following hash functions assume a power of two 'size'. */ /* This is the hash function for the SCTP port hash table. */ static inline int sctp_phashfn(struct net *net, __u16 lport) { return (net_hash_mix(net) + lport) & (sctp_port_hashsize - 1); } /* This is the hash function for the endpoint hash table. */ static inline int sctp_ep_hashfn(struct net *net, __u16 lport) { return (net_hash_mix(net) + lport) & (sctp_ep_hashsize - 1); } #define sctp_for_each_hentry(ep, head) \ hlist_for_each_entry(ep, head, node) /* Is a socket of this style? */ #define sctp_style(sk, style) __sctp_style((sk), (SCTP_SOCKET_##style)) static inline int __sctp_style(const struct sock *sk, enum sctp_socket_type style) { return sctp_sk(sk)->type == style; } /* Is the association in this state? */ #define sctp_state(asoc, state) __sctp_state((asoc), (SCTP_STATE_##state)) static inline int __sctp_state(const struct sctp_association *asoc, enum sctp_state state) { return asoc->state == state; } /* Is the socket in this state? */ #define sctp_sstate(sk, state) __sctp_sstate((sk), (SCTP_SS_##state)) static inline int __sctp_sstate(const struct sock *sk, enum sctp_sock_state state) { return sk->sk_state == state; } /* Map v4-mapped v6 address back to v4 address */ static inline void sctp_v6_map_v4(union sctp_addr *addr) { addr->v4.sin_family = AF_INET; addr->v4.sin_port = addr->v6.sin6_port; addr->v4.sin_addr.s_addr = addr->v6.sin6_addr.s6_addr32[3]; } /* Map v4 address to v4-mapped v6 address */ static inline void sctp_v4_map_v6(union sctp_addr *addr) { __be16 port; port = addr->v4.sin_port; addr->v6.sin6_addr.s6_addr32[3] = addr->v4.sin_addr.s_addr; addr->v6.sin6_port = port; addr->v6.sin6_family = AF_INET6; addr->v6.sin6_flowinfo = 0; addr->v6.sin6_scope_id = 0; addr->v6.sin6_addr.s6_addr32[0] = 0; addr->v6.sin6_addr.s6_addr32[1] = 0; addr->v6.sin6_addr.s6_addr32[2] = htonl(0x0000ffff); } /* The cookie is always 0 since this is how it's used in the * pmtu code. */ static inline struct dst_entry *sctp_transport_dst_check(struct sctp_transport *t) { if (t->dst && !dst_check(t->dst, t->dst_cookie)) sctp_transport_dst_release(t); return t->dst; } /* Calculate max payload size given a MTU, or the total overhead if * given MTU is zero */ static inline __u32 __sctp_mtu_payload(const struct sctp_sock *sp, const struct sctp_transport *t, __u32 mtu, __u32 extra) { __u32 overhead = sizeof(struct sctphdr) + extra; if (sp) { overhead += sp->pf->af->net_header_len; if (sp->udp_port && (!t || t->encap_port)) overhead += sizeof(struct udphdr); } else { overhead += sizeof(struct ipv6hdr); } if (WARN_ON_ONCE(mtu && mtu <= overhead)) mtu = overhead; return mtu ? mtu - overhead : overhead; } static inline __u32 sctp_mtu_payload(const struct sctp_sock *sp, __u32 mtu, __u32 extra) { return __sctp_mtu_payload(sp, NULL, mtu, extra); } static inline __u32 sctp_dst_mtu(const struct dst_entry *dst) { return SCTP_TRUNC4(max_t(__u32, dst_mtu(dst), SCTP_DEFAULT_MINSEGMENT)); } static inline bool sctp_transport_pmtu_check(struct sctp_transport *t) { __u32 pmtu = sctp_dst_mtu(t->dst); if (t->pathmtu == pmtu) return true; t->pathmtu = pmtu; return false; } static inline __u32 sctp_min_frag_point(struct sctp_sock *sp, __u16 datasize) { return sctp_mtu_payload(sp, SCTP_DEFAULT_MINSEGMENT, datasize); } static inline int sctp_transport_pl_hlen(struct sctp_transport *t) { return __sctp_mtu_payload(sctp_sk(t->asoc->base.sk), t, 0, 0) - sizeof(struct sctphdr); } static inline void sctp_transport_pl_reset(struct sctp_transport *t) { if (t->probe_interval && (t->param_flags & SPP_PMTUD_ENABLE) && (t->state == SCTP_ACTIVE || t->state == SCTP_UNKNOWN)) { if (t->pl.state == SCTP_PL_DISABLED) { t->pl.state = SCTP_PL_BASE; t->pl.pmtu = SCTP_BASE_PLPMTU; t->pl.probe_size = SCTP_BASE_PLPMTU; sctp_transport_reset_probe_timer(t); } } else { if (t->pl.state != SCTP_PL_DISABLED) { if (timer_delete(&t->probe_timer)) sctp_transport_put(t); t->pl.state = SCTP_PL_DISABLED; } } } static inline void sctp_transport_pl_update(struct sctp_transport *t) { if (t->pl.state == SCTP_PL_DISABLED) return; t->pl.state = SCTP_PL_BASE; t->pl.pmtu = SCTP_BASE_PLPMTU; t->pl.probe_size = SCTP_BASE_PLPMTU; sctp_transport_reset_probe_timer(t); } static inline bool sctp_transport_pl_enabled(struct sctp_transport *t) { return t->pl.state != SCTP_PL_DISABLED; } static inline bool sctp_newsk_ready(const struct sock *sk) { return sock_flag(sk, SOCK_DEAD) || sk->sk_socket; } static inline void sctp_sock_set_nodelay(struct sock *sk) { lock_sock(sk); sctp_sk(sk)->nodelay = true; release_sock(sk); } #endif /* __net_sctp_h__ */ |
| 1 1 1 44 45 45 45 1 2 2 7 7 4 4 5 7 1 4 1 4 4 4 1 5 5 5 5 5 1 5 2 5 1 1 1 31 5 31 4 4 4 3 1 24 23 23 13 13 2 1 1 1 49 49 49 1 1 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 | // SPDX-License-Identifier: GPL-2.0-only /* * Landlock - Ptrace and scope hooks * * Copyright © 2017-2020 Mickaël Salaün <mic@digikod.net> * Copyright © 2019-2020 ANSSI * Copyright © 2024-2025 Microsoft Corporation */ #include <asm/current.h> #include <linux/cleanup.h> #include <linux/cred.h> #include <linux/errno.h> #include <linux/kernel.h> #include <linux/lsm_audit.h> #include <linux/lsm_hooks.h> #include <linux/rcupdate.h> #include <linux/sched.h> #include <linux/sched/signal.h> #include <net/af_unix.h> #include <net/sock.h> #include "audit.h" #include "common.h" #include "cred.h" #include "domain.h" #include "fs.h" #include "ruleset.h" #include "setup.h" #include "task.h" /** * domain_scope_le - Checks domain ordering for scoped ptrace * * @parent: Parent domain. * @child: Potential child of @parent. * * Checks if the @parent domain is less or equal to (i.e. an ancestor, which * means a subset of) the @child domain. */ static bool domain_scope_le(const struct landlock_ruleset *const parent, const struct landlock_ruleset *const child) { const struct landlock_hierarchy *walker; /* Quick return for non-landlocked tasks. */ if (!parent) return true; if (!child) return false; for (walker = child->hierarchy; walker; walker = walker->parent) { if (walker == parent->hierarchy) /* @parent is in the scoped hierarchy of @child. */ return true; } /* There is no relationship between @parent and @child. */ return false; } static int domain_ptrace(const struct landlock_ruleset *const parent, const struct landlock_ruleset *const child) { if (domain_scope_le(parent, child)) return 0; return -EPERM; } /** * hook_ptrace_access_check - Determines whether the current process may access * another * * @child: Process to be accessed. * @mode: Mode of attachment. * * If the current task has Landlock rules, then the child must have at least * the same rules. Else denied. * * Determines whether a process may access another, returning 0 if permission * granted, -errno if denied. */ static int hook_ptrace_access_check(struct task_struct *const child, const unsigned int mode) { const struct landlock_cred_security *parent_subject; int err; /* Quick return for non-landlocked tasks. */ parent_subject = landlock_cred(current_cred()); if (!parent_subject) return 0; scoped_guard(rcu) { const struct landlock_ruleset *const child_dom = landlock_get_task_domain(child); err = domain_ptrace(parent_subject->domain, child_dom); } if (!err) return 0; /* * For the ptrace_access_check case, we log the current/parent domain * and the child task. */ if (!(mode & PTRACE_MODE_NOAUDIT)) landlock_log_denial(parent_subject, &(struct landlock_request) { .type = LANDLOCK_REQUEST_PTRACE, .audit = { .type = LSM_AUDIT_DATA_TASK, .u.tsk = child, }, .layer_plus_one = parent_subject->domain->num_layers, }); return err; } /** * hook_ptrace_traceme - Determines whether another process may trace the * current one * * @parent: Task proposed to be the tracer. * * If the parent has Landlock rules, then the current task must have the same * or more rules. Else denied. * * Determines whether the nominated task is permitted to trace the current * process, returning 0 if permission is granted, -errno if denied. */ static int hook_ptrace_traceme(struct task_struct *const parent) { const struct landlock_cred_security *parent_subject; const struct landlock_ruleset *child_dom; int err; child_dom = landlock_get_current_domain(); guard(rcu)(); parent_subject = landlock_cred(__task_cred(parent)); err = domain_ptrace(parent_subject->domain, child_dom); if (!err) return 0; /* * For the ptrace_traceme case, we log the domain which is the cause of * the denial, which means the parent domain instead of the current * domain. This may look unusual because the ptrace_traceme action is a * request to be traced, but the semantic is consistent with * hook_ptrace_access_check(). */ landlock_log_denial(parent_subject, &(struct landlock_request) { .type = LANDLOCK_REQUEST_PTRACE, .audit = { .type = LSM_AUDIT_DATA_TASK, .u.tsk = current, }, .layer_plus_one = parent_subject->domain->num_layers, }); return err; } /** * domain_is_scoped - Check if an interaction from a client/sender to a * server/receiver should be restricted based on scope controls. * * @client: IPC sender domain. * @server: IPC receiver domain. * @scope: The scope restriction criteria. * * Returns: True if @server is in a different domain from @client, and @client * is scoped to access @server (i.e. access should be denied). */ static bool domain_is_scoped(const struct landlock_ruleset *const client, const struct landlock_ruleset *const server, access_mask_t scope) { int client_layer, server_layer; const struct landlock_hierarchy *client_walker, *server_walker; /* Quick return if client has no domain */ if (WARN_ON_ONCE(!client)) return false; client_layer = client->num_layers - 1; client_walker = client->hierarchy; /* * client_layer must be a signed integer with greater capacity * than client->num_layers to ensure the following loop stops. */ BUILD_BUG_ON(sizeof(client_layer) > sizeof(client->num_layers)); server_layer = server ? (server->num_layers - 1) : -1; server_walker = server ? server->hierarchy : NULL; /* * Walks client's parent domains down to the same hierarchy level * as the server's domain, and checks that none of these client's * parent domains are scoped. */ for (; client_layer > server_layer; client_layer--) { if (landlock_get_scope_mask(client, client_layer) & scope) return true; client_walker = client_walker->parent; } /* * Walks server's parent domains down to the same hierarchy level as * the client's domain. */ for (; server_layer > client_layer; server_layer--) server_walker = server_walker->parent; for (; client_layer >= 0; client_layer--) { if (landlock_get_scope_mask(client, client_layer) & scope) { /* * Client and server are at the same level in the * hierarchy. If the client is scoped, the request is * only allowed if this domain is also a server's * ancestor. */ return server_walker != client_walker; } client_walker = client_walker->parent; server_walker = server_walker->parent; } return false; } static bool sock_is_scoped(struct sock *const other, const struct landlock_ruleset *const domain) { const struct landlock_ruleset *dom_other; /* The credentials will not change. */ lockdep_assert_held(&unix_sk(other)->lock); dom_other = landlock_cred(other->sk_socket->file->f_cred)->domain; return domain_is_scoped(domain, dom_other, LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET); } static bool is_abstract_socket(struct sock *const sock) { struct unix_address *addr = unix_sk(sock)->addr; if (!addr) return false; if (addr->len >= offsetof(struct sockaddr_un, sun_path) + 1 && addr->name->sun_path[0] == '\0') return true; return false; } static const struct access_masks unix_scope = { .scope = LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET, }; static int hook_unix_stream_connect(struct sock *const sock, struct sock *const other, struct sock *const newsk) { size_t handle_layer; const struct landlock_cred_security *const subject = landlock_get_applicable_subject(current_cred(), unix_scope, &handle_layer); /* Quick return for non-landlocked tasks. */ if (!subject) return 0; if (!is_abstract_socket(other)) return 0; if (!sock_is_scoped(other, subject->domain)) return 0; landlock_log_denial(subject, &(struct landlock_request) { .type = LANDLOCK_REQUEST_SCOPE_ABSTRACT_UNIX_SOCKET, .audit = { .type = LSM_AUDIT_DATA_NET, .u.net = &(struct lsm_network_audit) { .sk = other, }, }, .layer_plus_one = handle_layer + 1, }); return -EPERM; } static int hook_unix_may_send(struct socket *const sock, struct socket *const other) { size_t handle_layer; const struct landlock_cred_security *const subject = landlock_get_applicable_subject(current_cred(), unix_scope, &handle_layer); if (!subject) return 0; /* * Checks if this datagram socket was already allowed to be connected * to other. */ if (unix_peer(sock->sk) == other->sk) return 0; if (!is_abstract_socket(other->sk)) return 0; if (!sock_is_scoped(other->sk, subject->domain)) return 0; landlock_log_denial(subject, &(struct landlock_request) { .type = LANDLOCK_REQUEST_SCOPE_ABSTRACT_UNIX_SOCKET, .audit = { .type = LSM_AUDIT_DATA_NET, .u.net = &(struct lsm_network_audit) { .sk = other->sk, }, }, .layer_plus_one = handle_layer + 1, }); return -EPERM; } static const struct access_masks signal_scope = { .scope = LANDLOCK_SCOPE_SIGNAL, }; static int hook_task_kill(struct task_struct *const p, struct kernel_siginfo *const info, const int sig, const struct cred *cred) { bool is_scoped; size_t handle_layer; const struct landlock_cred_security *subject; if (!cred) { /* * Always allow sending signals between threads of the same process. * This is required for process credential changes by the Native POSIX * Threads Library and implemented by the set*id(2) wrappers and * libcap(3) with tgkill(2). See nptl(7) and libpsx(3). * * This exception is similar to the __ptrace_may_access() one. */ if (same_thread_group(p, current)) return 0; /* Not dealing with USB IO. */ cred = current_cred(); } subject = landlock_get_applicable_subject(cred, signal_scope, &handle_layer); /* Quick return for non-landlocked tasks. */ if (!subject) return 0; scoped_guard(rcu) { is_scoped = domain_is_scoped(subject->domain, landlock_get_task_domain(p), signal_scope.scope); } if (!is_scoped) return 0; landlock_log_denial(subject, &(struct landlock_request) { .type = LANDLOCK_REQUEST_SCOPE_SIGNAL, .audit = { .type = LSM_AUDIT_DATA_TASK, .u.tsk = p, }, .layer_plus_one = handle_layer + 1, }); return -EPERM; } static int hook_file_send_sigiotask(struct task_struct *tsk, struct fown_struct *fown, int signum) { const struct landlock_cred_security *subject; bool is_scoped = false; /* Lock already held by send_sigio() and send_sigurg(). */ lockdep_assert_held(&fown->lock); subject = &landlock_file(fown->file)->fown_subject; /* * Quick return for unowned socket. * * subject->domain has already been filtered when saved by * hook_file_set_fowner(), so there is no need to call * landlock_get_applicable_subject() here. */ if (!subject->domain) return 0; scoped_guard(rcu) { is_scoped = domain_is_scoped(subject->domain, landlock_get_task_domain(tsk), signal_scope.scope); } if (!is_scoped) return 0; landlock_log_denial(subject, &(struct landlock_request) { .type = LANDLOCK_REQUEST_SCOPE_SIGNAL, .audit = { .type = LSM_AUDIT_DATA_TASK, .u.tsk = tsk, }, #ifdef CONFIG_AUDIT .layer_plus_one = landlock_file(fown->file)->fown_layer + 1, #endif /* CONFIG_AUDIT */ }); return -EPERM; } static struct security_hook_list landlock_hooks[] __ro_after_init = { LSM_HOOK_INIT(ptrace_access_check, hook_ptrace_access_check), LSM_HOOK_INIT(ptrace_traceme, hook_ptrace_traceme), LSM_HOOK_INIT(unix_stream_connect, hook_unix_stream_connect), LSM_HOOK_INIT(unix_may_send, hook_unix_may_send), LSM_HOOK_INIT(task_kill, hook_task_kill), LSM_HOOK_INIT(file_send_sigiotask, hook_file_send_sigiotask), }; __init void landlock_add_task_hooks(void) { security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks), &landlock_lsmid); } |
| 5 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 | /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _LINUX_PAGE_COUNTER_H #define _LINUX_PAGE_COUNTER_H #include <linux/atomic.h> #include <linux/cache.h> #include <linux/limits.h> #include <asm/page.h> struct page_counter { /* * Make sure 'usage' does not share cacheline with any other field in * v2. The memcg->memory.usage is a hot member of struct mem_cgroup. */ atomic_long_t usage; unsigned long failcnt; /* v1-only field */ CACHELINE_PADDING(_pad1_); /* effective memory.min and memory.min usage tracking */ unsigned long emin; atomic_long_t min_usage; atomic_long_t children_min_usage; /* effective memory.low and memory.low usage tracking */ unsigned long elow; atomic_long_t low_usage; atomic_long_t children_low_usage; unsigned long watermark; /* Latest cg2 reset watermark */ unsigned long local_watermark; /* Keep all the read most fields in a separete cacheline. */ CACHELINE_PADDING(_pad2_); bool protection_support; bool track_failcnt; unsigned long min; unsigned long low; unsigned long high; unsigned long max; struct page_counter *parent; } ____cacheline_internodealigned_in_smp; #if BITS_PER_LONG == 32 #define PAGE_COUNTER_MAX LONG_MAX #else #define PAGE_COUNTER_MAX (LONG_MAX / PAGE_SIZE) #endif /* * Protection is supported only for the first counter (with id 0). */ static inline void page_counter_init(struct page_counter *counter, struct page_counter *parent, bool protection_support) { counter->usage = (atomic_long_t)ATOMIC_LONG_INIT(0); counter->max = PAGE_COUNTER_MAX; counter->parent = parent; counter->protection_support = protection_support; counter->track_failcnt = false; } static inline unsigned long page_counter_read(struct page_counter *counter) { return atomic_long_read(&counter->usage); } void page_counter_cancel(struct page_counter *counter, unsigned long nr_pages); void page_counter_charge(struct page_counter *counter, unsigned long nr_pages); bool page_counter_try_charge(struct page_counter *counter, unsigned long nr_pages, struct page_counter **fail); void page_counter_uncharge(struct page_counter *counter, unsigned long nr_pages); void page_counter_set_min(struct page_counter *counter, unsigned long nr_pages); void page_counter_set_low(struct page_counter *counter, unsigned long nr_pages); static inline void page_counter_set_high(struct page_counter *counter, unsigned long nr_pages) { WRITE_ONCE(counter->high, nr_pages); } int page_counter_set_max(struct page_counter *counter, unsigned long nr_pages); int page_counter_memparse(const char *buf, const char *max, unsigned long *nr_pages); static inline void page_counter_reset_watermark(struct page_counter *counter) { unsigned long usage = page_counter_read(counter); /* * Update local_watermark first, so it's always <= watermark * (modulo CPU/compiler re-ordering) */ counter->local_watermark = usage; counter->watermark = usage; } #if IS_ENABLED(CONFIG_MEMCG) || IS_ENABLED(CONFIG_CGROUP_DMEM) void page_counter_calculate_protection(struct page_counter *root, struct page_counter *counter, bool recursive_protection); #else static inline void page_counter_calculate_protection(struct page_counter *root, struct page_counter *counter, bool recursive_protection) {} #endif #endif /* _LINUX_PAGE_COUNTER_H */ |
| 1 1 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 | // SPDX-License-Identifier: GPL-2.0-only #include <linux/kernel.h> #include <linux/init.h> #include <linux/module.h> #include <linux/netlink.h> #include <linux/netfilter.h> #include <linux/netfilter/nf_tables.h> #include <net/netfilter/nf_tables_core.h> #include <net/netfilter/nf_tables.h> #include <net/netfilter/nft_fib.h> static void nft_fib_inet_eval(const struct nft_expr *expr, struct nft_regs *regs, const struct nft_pktinfo *pkt) { const struct nft_fib *priv = nft_expr_priv(expr); switch (nft_pf(pkt)) { case NFPROTO_IPV4: switch (priv->result) { case NFT_FIB_RESULT_OIF: case NFT_FIB_RESULT_OIFNAME: return nft_fib4_eval(expr, regs, pkt); case NFT_FIB_RESULT_ADDRTYPE: return nft_fib4_eval_type(expr, regs, pkt); } break; case NFPROTO_IPV6: switch (priv->result) { case NFT_FIB_RESULT_OIF: case NFT_FIB_RESULT_OIFNAME: return nft_fib6_eval(expr, regs, pkt); case NFT_FIB_RESULT_ADDRTYPE: return nft_fib6_eval_type(expr, regs, pkt); } break; } regs->verdict.code = NF_DROP; } static struct nft_expr_type nft_fib_inet_type; static const struct nft_expr_ops nft_fib_inet_ops = { .type = &nft_fib_inet_type, .size = NFT_EXPR_SIZE(sizeof(struct nft_fib)), .eval = nft_fib_inet_eval, .init = nft_fib_init, .dump = nft_fib_dump, .validate = nft_fib_validate, .reduce = nft_fib_reduce, }; static struct nft_expr_type nft_fib_inet_type __read_mostly = { .family = NFPROTO_INET, .name = "fib", .ops = &nft_fib_inet_ops, .policy = nft_fib_policy, .maxattr = NFTA_FIB_MAX, .owner = THIS_MODULE, }; static int __init nft_fib_inet_module_init(void) { return nft_register_expr(&nft_fib_inet_type); } static void __exit nft_fib_inet_module_exit(void) { nft_unregister_expr(&nft_fib_inet_type); } module_init(nft_fib_inet_module_init); module_exit(nft_fib_inet_module_exit); MODULE_LICENSE("GPL"); MODULE_AUTHOR("Florian Westphal <fw@strlen.de>"); MODULE_ALIAS_NFT_AF_EXPR(1, "fib"); MODULE_DESCRIPTION("nftables fib inet support"); |
| 2 16 2 2 2 2 2 2 6 5 2 2 2 1 1 9 3 1 3 2 1 1 2 9 5 2 2 5 2 2 1 2 2 2 2 3 2 4 4 6 5 6 2 6 6 2 4 4 3 4 1 2 5 6 2 2 1 1 2 1 2 1 2 3 3 3 2 1 1 1 1 2 2 2 1 2 3 11 9 11 2 2 1 3 3 2 4 11 4 2 11 6 2 2 3 2 4 4 4 2 2 2 8 8 8 8 1 7 7 7 2 6 6 1 6 6 6 2 2 2 2 2 2 2 1 1 5 4 4 3 3 6 7 2 8 8 8 8 8 6 2 2 4 6 5 6 3 3 2 10 6 6 4 1 3 2 2 2 12 12 12 4 4 1 8 11 11 1 1 1 10 3 10 6 10 5 5 5 1 4 4 3 5 4 4 4 4 4 4 4 3 2 2 2 2 9 3 3 6 11 12 15 15 15 14 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 | // SPDX-License-Identifier: GPL-2.0 /* * linux/ipc/msg.c * Copyright (C) 1992 Krishna Balasubramanian * * Removed all the remaining kerneld mess * Catch the -EFAULT stuff properly * Use GFP_KERNEL for messages as in 1.2 * Fixed up the unchecked user space derefs * Copyright (C) 1998 Alan Cox & Andi Kleen * * /proc/sysvipc/msg support (c) 1999 Dragos Acostachioaie <dragos@iname.com> * * mostly rewritten, threaded and wake-one semantics added * MSGMAX limit removed, sysctl's added * (c) 1999 Manfred Spraul <manfred@colorfullife.com> * * support for audit of ipc object properties and permission changes * Dustin Kirkland <dustin.kirkland@us.ibm.com> * * namespaces support * OpenVZ, SWsoft Inc. * Pavel Emelianov <xemul@openvz.org> */ #include <linux/capability.h> #include <linux/msg.h> #include <linux/spinlock.h> #include <linux/init.h> #include <linux/mm.h> #include <linux/proc_fs.h> #include <linux/list.h> #include <linux/security.h> #include <linux/sched/wake_q.h> #include <linux/syscalls.h> #include <linux/audit.h> #include <linux/seq_file.h> #include <linux/rwsem.h> #include <linux/nsproxy.h> #include <linux/ipc_namespace.h> #include <linux/rhashtable.h> #include <linux/percpu_counter.h> #include <asm/current.h> #include <linux/uaccess.h> #include "util.h" /* one msq_queue structure for each present queue on the system */ struct msg_queue { struct kern_ipc_perm q_perm; time64_t q_stime; /* last msgsnd time */ time64_t q_rtime; /* last msgrcv time */ time64_t q_ctime; /* last change time */ unsigned long q_cbytes; /* current number of bytes on queue */ unsigned long q_qnum; /* number of messages in queue */ unsigned long q_qbytes; /* max number of bytes on queue */ struct pid *q_lspid; /* pid of last msgsnd */ struct pid *q_lrpid; /* last receive pid */ struct list_head q_messages; struct list_head q_receivers; struct list_head q_senders; } __randomize_layout; /* * MSG_BARRIER Locking: * * Similar to the optimization used in ipc/mqueue.c, one syscall return path * does not acquire any locks when it sees that a message exists in * msg_receiver.r_msg. Therefore r_msg is set using smp_store_release() * and accessed using READ_ONCE()+smp_acquire__after_ctrl_dep(). In addition, * wake_q_add_safe() is used. See ipc/mqueue.c for more details */ /* one msg_receiver structure for each sleeping receiver */ struct msg_receiver { struct list_head r_list; struct task_struct *r_tsk; int r_mode; long r_msgtype; long r_maxsize; struct msg_msg *r_msg; }; /* one msg_sender for each sleeping sender */ struct msg_sender { struct list_head list; struct task_struct *tsk; size_t msgsz; }; #define SEARCH_ANY 1 #define SEARCH_EQUAL 2 #define SEARCH_NOTEQUAL 3 #define SEARCH_LESSEQUAL 4 #define SEARCH_NUMBER 5 #define msg_ids(ns) ((ns)->ids[IPC_MSG_IDS]) static inline struct msg_queue *msq_obtain_object(struct ipc_namespace *ns, int id) { struct kern_ipc_perm *ipcp = ipc_obtain_object_idr(&msg_ids(ns), id); if (IS_ERR(ipcp)) return ERR_CAST(ipcp); return container_of(ipcp, struct msg_queue, q_perm); } static inline struct msg_queue *msq_obtain_object_check(struct ipc_namespace *ns, int id) { struct kern_ipc_perm *ipcp = ipc_obtain_object_check(&msg_ids(ns), id); if (IS_ERR(ipcp)) return ERR_CAST(ipcp); return container_of(ipcp, struct msg_queue, q_perm); } static inline void msg_rmid(struct ipc_namespace *ns, struct msg_queue *s) { ipc_rmid(&msg_ids(ns), &s->q_perm); } static void msg_rcu_free(struct rcu_head *head) { struct kern_ipc_perm *p = container_of(head, struct kern_ipc_perm, rcu); struct msg_queue *msq = container_of(p, struct msg_queue, q_perm); security_msg_queue_free(&msq->q_perm); kfree(msq); } /** * newque - Create a new msg queue * @ns: namespace * @params: ptr to the structure that contains the key and msgflg * * Called with msg_ids.rwsem held (writer) */ static int newque(struct ipc_namespace *ns, struct ipc_params *params) { struct msg_queue *msq; int retval; key_t key = params->key; int msgflg = params->flg; msq = kmalloc_obj(*msq, GFP_KERNEL_ACCOUNT); if (unlikely(!msq)) return -ENOMEM; msq->q_perm.mode = msgflg & S_IRWXUGO; msq->q_perm.key = key; msq->q_perm.security = NULL; retval = security_msg_queue_alloc(&msq->q_perm); if (retval) { kfree(msq); return retval; } msq->q_stime = msq->q_rtime = 0; msq->q_ctime = ktime_get_real_seconds(); msq->q_cbytes = msq->q_qnum = 0; msq->q_qbytes = ns->msg_ctlmnb; msq->q_lspid = msq->q_lrpid = NULL; INIT_LIST_HEAD(&msq->q_messages); INIT_LIST_HEAD(&msq->q_receivers); INIT_LIST_HEAD(&msq->q_senders); /* ipc_addid() locks msq upon success. */ retval = ipc_addid(&msg_ids(ns), &msq->q_perm, ns->msg_ctlmni); if (retval < 0) { ipc_rcu_putref(&msq->q_perm, msg_rcu_free); return retval; } ipc_unlock_object(&msq->q_perm); rcu_read_unlock(); return msq->q_perm.id; } static inline bool msg_fits_inqueue(struct msg_queue *msq, size_t msgsz) { return msgsz + msq->q_cbytes <= msq->q_qbytes && 1 + msq->q_qnum <= msq->q_qbytes; } static inline void ss_add(struct msg_queue *msq, struct msg_sender *mss, size_t msgsz) { mss->tsk = current; mss->msgsz = msgsz; /* * No memory barrier required: we did ipc_lock_object(), * and the waker obtains that lock before calling wake_q_add(). */ __set_current_state(TASK_INTERRUPTIBLE); list_add_tail(&mss->list, &msq->q_senders); } static inline void ss_del(struct msg_sender *mss) { if (mss->list.next) list_del(&mss->list); } static void ss_wakeup(struct msg_queue *msq, struct wake_q_head *wake_q, bool kill) { struct msg_sender *mss, *t; struct task_struct *stop_tsk = NULL; struct list_head *h = &msq->q_senders; list_for_each_entry_safe(mss, t, h, list) { if (kill) mss->list.next = NULL; /* * Stop at the first task we don't wakeup, * we've already iterated the original * sender queue. */ else if (stop_tsk == mss->tsk) break; /* * We are not in an EIDRM scenario here, therefore * verify that we really need to wakeup the task. * To maintain current semantics and wakeup order, * move the sender to the tail on behalf of the * blocked task. */ else if (!msg_fits_inqueue(msq, mss->msgsz)) { if (!stop_tsk) stop_tsk = mss->tsk; list_move_tail(&mss->list, &msq->q_senders); continue; } wake_q_add(wake_q, mss->tsk); } } static void expunge_all(struct msg_queue *msq, int res, struct wake_q_head *wake_q) { struct msg_receiver *msr, *t; list_for_each_entry_safe(msr, t, &msq->q_receivers, r_list) { struct task_struct *r_tsk; r_tsk = get_task_struct(msr->r_tsk); /* see MSG_BARRIER for purpose/pairing */ smp_store_release(&msr->r_msg, ERR_PTR(res)); wake_q_add_safe(wake_q, r_tsk); } } /* * freeque() wakes up waiters on the sender and receiver waiting queue, * removes the message queue from message queue ID IDR, and cleans up all the * messages associated with this queue. * * msg_ids.rwsem (writer) and the spinlock for this message queue are held * before freeque() is called. msg_ids.rwsem remains locked on exit. */ static void freeque(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp) __releases(RCU) __releases(&msq->q_perm) { struct msg_msg *msg, *t; struct msg_queue *msq = container_of(ipcp, struct msg_queue, q_perm); DEFINE_WAKE_Q(wake_q); expunge_all(msq, -EIDRM, &wake_q); ss_wakeup(msq, &wake_q, true); msg_rmid(ns, msq); ipc_unlock_object(&msq->q_perm); wake_up_q(&wake_q); rcu_read_unlock(); list_for_each_entry_safe(msg, t, &msq->q_messages, m_list) { percpu_counter_sub_local(&ns->percpu_msg_hdrs, 1); free_msg(msg); } percpu_counter_sub_local(&ns->percpu_msg_bytes, msq->q_cbytes); ipc_update_pid(&msq->q_lspid, NULL); ipc_update_pid(&msq->q_lrpid, NULL); ipc_rcu_putref(&msq->q_perm, msg_rcu_free); } long ksys_msgget(key_t key, int msgflg) { struct ipc_namespace *ns; static const struct ipc_ops msg_ops = { .getnew = newque, .associate = security_msg_queue_associate, }; struct ipc_params msg_params; ns = current->nsproxy->ipc_ns; msg_params.key = key; msg_params.flg = msgflg; return ipcget(ns, &msg_ids(ns), &msg_ops, &msg_params); } SYSCALL_DEFINE2(msgget, key_t, key, int, msgflg) { return ksys_msgget(key, msgflg); } static inline unsigned long copy_msqid_to_user(void __user *buf, struct msqid64_ds *in, int version) { switch (version) { case IPC_64: return copy_to_user(buf, in, sizeof(*in)); case IPC_OLD: { struct msqid_ds out; memset(&out, 0, sizeof(out)); ipc64_perm_to_ipc_perm(&in->msg_perm, &out.msg_perm); out.msg_stime = in->msg_stime; out.msg_rtime = in->msg_rtime; out.msg_ctime = in->msg_ctime; if (in->msg_cbytes > USHRT_MAX) out.msg_cbytes = USHRT_MAX; else out.msg_cbytes = in->msg_cbytes; out.msg_lcbytes = in->msg_cbytes; if (in->msg_qnum > USHRT_MAX) out.msg_qnum = USHRT_MAX; else out.msg_qnum = in->msg_qnum; if (in->msg_qbytes > USHRT_MAX) out.msg_qbytes = USHRT_MAX; else out.msg_qbytes = in->msg_qbytes; out.msg_lqbytes = in->msg_qbytes; out.msg_lspid = in->msg_lspid; out.msg_lrpid = in->msg_lrpid; return copy_to_user(buf, &out, sizeof(out)); } default: return -EINVAL; } } static inline unsigned long copy_msqid_from_user(struct msqid64_ds *out, void __user *buf, int version) { switch (version) { case IPC_64: if (copy_from_user(out, buf, sizeof(*out))) return -EFAULT; return 0; case IPC_OLD: { struct msqid_ds tbuf_old; if (copy_from_user(&tbuf_old, buf, sizeof(tbuf_old))) return -EFAULT; out->msg_perm.uid = tbuf_old.msg_perm.uid; out->msg_perm.gid = tbuf_old.msg_perm.gid; out->msg_perm.mode = tbuf_old.msg_perm.mode; if (tbuf_old.msg_qbytes == 0) out->msg_qbytes = tbuf_old.msg_lqbytes; else out->msg_qbytes = tbuf_old.msg_qbytes; return 0; } default: return -EINVAL; } } /* * This function handles some msgctl commands which require the rwsem * to be held in write mode. * NOTE: no locks must be held, the rwsem is taken inside this function. */ static int msgctl_down(struct ipc_namespace *ns, int msqid, int cmd, struct ipc64_perm *perm, int msg_qbytes) { struct kern_ipc_perm *ipcp; struct msg_queue *msq; int err; down_write(&msg_ids(ns).rwsem); rcu_read_lock(); ipcp = ipcctl_obtain_check(ns, &msg_ids(ns), msqid, cmd, perm, msg_qbytes); if (IS_ERR(ipcp)) { err = PTR_ERR(ipcp); goto out_unlock1; } msq = container_of(ipcp, struct msg_queue, q_perm); err = security_msg_queue_msgctl(&msq->q_perm, cmd); if (err) goto out_unlock1; switch (cmd) { case IPC_RMID: ipc_lock_object(&msq->q_perm); /* freeque unlocks the ipc object and rcu */ freeque(ns, ipcp); goto out_up; case IPC_SET: { DEFINE_WAKE_Q(wake_q); if (msg_qbytes > ns->msg_ctlmnb && !capable(CAP_SYS_RESOURCE)) { err = -EPERM; goto out_unlock1; } ipc_lock_object(&msq->q_perm); err = ipc_update_perm(perm, ipcp); if (err) goto out_unlock0; msq->q_qbytes = msg_qbytes; msq->q_ctime = ktime_get_real_seconds(); /* * Sleeping receivers might be excluded by * stricter permissions. */ expunge_all(msq, -EAGAIN, &wake_q); /* * Sleeping senders might be able to send * due to a larger queue size. */ ss_wakeup(msq, &wake_q, false); ipc_unlock_object(&msq->q_perm); wake_up_q(&wake_q); goto out_unlock1; } default: err = -EINVAL; goto out_unlock1; } out_unlock0: ipc_unlock_object(&msq->q_perm); out_unlock1: rcu_read_unlock(); out_up: up_write(&msg_ids(ns).rwsem); return err; } static int msgctl_info(struct ipc_namespace *ns, int msqid, int cmd, struct msginfo *msginfo) { int err; int max_idx; /* * We must not return kernel stack data. * due to padding, it's not enough * to set all member fields. */ err = security_msg_queue_msgctl(NULL, cmd); if (err) return err; memset(msginfo, 0, sizeof(*msginfo)); msginfo->msgmni = ns->msg_ctlmni; msginfo->msgmax = ns->msg_ctlmax; msginfo->msgmnb = ns->msg_ctlmnb; msginfo->msgssz = MSGSSZ; msginfo->msgseg = MSGSEG; down_read(&msg_ids(ns).rwsem); if (cmd == MSG_INFO) msginfo->msgpool = msg_ids(ns).in_use; max_idx = ipc_get_maxidx(&msg_ids(ns)); up_read(&msg_ids(ns).rwsem); if (cmd == MSG_INFO) { msginfo->msgmap = min_t(int, percpu_counter_sum(&ns->percpu_msg_hdrs), INT_MAX); msginfo->msgtql = min_t(int, percpu_counter_sum(&ns->percpu_msg_bytes), INT_MAX); } else { msginfo->msgmap = MSGMAP; msginfo->msgpool = MSGPOOL; msginfo->msgtql = MSGTQL; } return (max_idx < 0) ? 0 : max_idx; } static int msgctl_stat(struct ipc_namespace *ns, int msqid, int cmd, struct msqid64_ds *p) { struct msg_queue *msq; int err; memset(p, 0, sizeof(*p)); rcu_read_lock(); if (cmd == MSG_STAT || cmd == MSG_STAT_ANY) { msq = msq_obtain_object(ns, msqid); if (IS_ERR(msq)) { err = PTR_ERR(msq); goto out_unlock; } } else { /* IPC_STAT */ msq = msq_obtain_object_check(ns, msqid); if (IS_ERR(msq)) { err = PTR_ERR(msq); goto out_unlock; } } /* see comment for SHM_STAT_ANY */ if (cmd == MSG_STAT_ANY) audit_ipc_obj(&msq->q_perm); else { err = -EACCES; if (ipcperms(ns, &msq->q_perm, S_IRUGO)) goto out_unlock; } err = security_msg_queue_msgctl(&msq->q_perm, cmd); if (err) goto out_unlock; ipc_lock_object(&msq->q_perm); if (!ipc_valid_object(&msq->q_perm)) { ipc_unlock_object(&msq->q_perm); err = -EIDRM; goto out_unlock; } kernel_to_ipc64_perm(&msq->q_perm, &p->msg_perm); p->msg_stime = msq->q_stime; p->msg_rtime = msq->q_rtime; p->msg_ctime = msq->q_ctime; #ifndef CONFIG_64BIT p->msg_stime_high = msq->q_stime >> 32; p->msg_rtime_high = msq->q_rtime >> 32; p->msg_ctime_high = msq->q_ctime >> 32; #endif p->msg_cbytes = msq->q_cbytes; p->msg_qnum = msq->q_qnum; p->msg_qbytes = msq->q_qbytes; p->msg_lspid = pid_vnr(msq->q_lspid); p->msg_lrpid = pid_vnr(msq->q_lrpid); if (cmd == IPC_STAT) { /* * As defined in SUS: * Return 0 on success */ err = 0; } else { /* * MSG_STAT and MSG_STAT_ANY (both Linux specific) * Return the full id, including the sequence number */ err = msq->q_perm.id; } ipc_unlock_object(&msq->q_perm); out_unlock: rcu_read_unlock(); return err; } static long ksys_msgctl(int msqid, int cmd, struct msqid_ds __user *buf, int version) { struct ipc_namespace *ns; struct msqid64_ds msqid64; int err; if (msqid < 0 || cmd < 0) return -EINVAL; ns = current->nsproxy->ipc_ns; switch (cmd) { case IPC_INFO: case MSG_INFO: { struct msginfo msginfo; err = msgctl_info(ns, msqid, cmd, &msginfo); if (err < 0) return err; if (copy_to_user(buf, &msginfo, sizeof(struct msginfo))) err = -EFAULT; return err; } case MSG_STAT: /* msqid is an index rather than a msg queue id */ case MSG_STAT_ANY: case IPC_STAT: err = msgctl_stat(ns, msqid, cmd, &msqid64); if (err < 0) return err; if (copy_msqid_to_user(buf, &msqid64, version)) err = -EFAULT; return err; case IPC_SET: if (copy_msqid_from_user(&msqid64, buf, version)) return -EFAULT; return msgctl_down(ns, msqid, cmd, &msqid64.msg_perm, msqid64.msg_qbytes); case IPC_RMID: return msgctl_down(ns, msqid, cmd, NULL, 0); default: return -EINVAL; } } SYSCALL_DEFINE3(msgctl, int, msqid, int, cmd, struct msqid_ds __user *, buf) { return ksys_msgctl(msqid, cmd, buf, IPC_64); } #ifdef CONFIG_ARCH_WANT_IPC_PARSE_VERSION long ksys_old_msgctl(int msqid, int cmd, struct msqid_ds __user *buf) { int version = ipc_parse_version(&cmd); return ksys_msgctl(msqid, cmd, buf, version); } SYSCALL_DEFINE3(old_msgctl, int, msqid, int, cmd, struct msqid_ds __user *, buf) { return ksys_old_msgctl(msqid, cmd, buf); } #endif #ifdef CONFIG_COMPAT struct compat_msqid_ds { struct compat_ipc_perm msg_perm; compat_uptr_t msg_first; compat_uptr_t msg_last; old_time32_t msg_stime; old_time32_t msg_rtime; old_time32_t msg_ctime; compat_ulong_t msg_lcbytes; compat_ulong_t msg_lqbytes; unsigned short msg_cbytes; unsigned short msg_qnum; unsigned short msg_qbytes; compat_ipc_pid_t msg_lspid; compat_ipc_pid_t msg_lrpid; }; static int copy_compat_msqid_from_user(struct msqid64_ds *out, void __user *buf, int version) { memset(out, 0, sizeof(*out)); if (version == IPC_64) { struct compat_msqid64_ds __user *p = buf; if (get_compat_ipc64_perm(&out->msg_perm, &p->msg_perm)) return -EFAULT; if (get_user(out->msg_qbytes, &p->msg_qbytes)) return -EFAULT; } else { struct compat_msqid_ds __user *p = buf; if (get_compat_ipc_perm(&out->msg_perm, &p->msg_perm)) return -EFAULT; if (get_user(out->msg_qbytes, &p->msg_qbytes)) return -EFAULT; } return 0; } static int copy_compat_msqid_to_user(void __user *buf, struct msqid64_ds *in, int version) { if (version == IPC_64) { struct compat_msqid64_ds v; memset(&v, 0, sizeof(v)); to_compat_ipc64_perm(&v.msg_perm, &in->msg_perm); v.msg_stime = lower_32_bits(in->msg_stime); v.msg_stime_high = upper_32_bits(in->msg_stime); v.msg_rtime = lower_32_bits(in->msg_rtime); v.msg_rtime_high = upper_32_bits(in->msg_rtime); v.msg_ctime = lower_32_bits(in->msg_ctime); v.msg_ctime_high = upper_32_bits(in->msg_ctime); v.msg_cbytes = in->msg_cbytes; v.msg_qnum = in->msg_qnum; v.msg_qbytes = in->msg_qbytes; v.msg_lspid = in->msg_lspid; v.msg_lrpid = in->msg_lrpid; return copy_to_user(buf, &v, sizeof(v)); } else { struct compat_msqid_ds v; memset(&v, 0, sizeof(v)); to_compat_ipc_perm(&v.msg_perm, &in->msg_perm); v.msg_stime = in->msg_stime; v.msg_rtime = in->msg_rtime; v.msg_ctime = in->msg_ctime; v.msg_cbytes = in->msg_cbytes; v.msg_qnum = in->msg_qnum; v.msg_qbytes = in->msg_qbytes; v.msg_lspid = in->msg_lspid; v.msg_lrpid = in->msg_lrpid; return copy_to_user(buf, &v, sizeof(v)); } } static long compat_ksys_msgctl(int msqid, int cmd, void __user *uptr, int version) { struct ipc_namespace *ns; int err; struct msqid64_ds msqid64; ns = current->nsproxy->ipc_ns; if (msqid < 0 || cmd < 0) return -EINVAL; switch (cmd & (~IPC_64)) { case IPC_INFO: case MSG_INFO: { struct msginfo msginfo; err = msgctl_info(ns, msqid, cmd, &msginfo); if (err < 0) return err; if (copy_to_user(uptr, &msginfo, sizeof(struct msginfo))) err = -EFAULT; return err; } case IPC_STAT: case MSG_STAT: case MSG_STAT_ANY: err = msgctl_stat(ns, msqid, cmd, &msqid64); if (err < 0) return err; if (copy_compat_msqid_to_user(uptr, &msqid64, version)) err = -EFAULT; return err; case IPC_SET: if (copy_compat_msqid_from_user(&msqid64, uptr, version)) return -EFAULT; return msgctl_down(ns, msqid, cmd, &msqid64.msg_perm, msqid64.msg_qbytes); case IPC_RMID: return msgctl_down(ns, msqid, cmd, NULL, 0); default: return -EINVAL; } } COMPAT_SYSCALL_DEFINE3(msgctl, int, msqid, int, cmd, void __user *, uptr) { return compat_ksys_msgctl(msqid, cmd, uptr, IPC_64); } #ifdef CONFIG_ARCH_WANT_COMPAT_IPC_PARSE_VERSION long compat_ksys_old_msgctl(int msqid, int cmd, void __user *uptr) { int version = compat_ipc_parse_version(&cmd); return compat_ksys_msgctl(msqid, cmd, uptr, version); } COMPAT_SYSCALL_DEFINE3(old_msgctl, int, msqid, int, cmd, void __user *, uptr) { return compat_ksys_old_msgctl(msqid, cmd, uptr); } #endif #endif static int testmsg(struct msg_msg *msg, long type, int mode) { switch (mode) { case SEARCH_ANY: case SEARCH_NUMBER: return 1; case SEARCH_LESSEQUAL: if (msg->m_type <= type) return 1; break; case SEARCH_EQUAL: if (msg->m_type == type) return 1; break; case SEARCH_NOTEQUAL: if (msg->m_type != type) return 1; break; } return 0; } static inline int pipelined_send(struct msg_queue *msq, struct msg_msg *msg, struct wake_q_head *wake_q) { struct msg_receiver *msr, *t; list_for_each_entry_safe(msr, t, &msq->q_receivers, r_list) { if (testmsg(msg, msr->r_msgtype, msr->r_mode) && !security_msg_queue_msgrcv(&msq->q_perm, msg, msr->r_tsk, msr->r_msgtype, msr->r_mode)) { list_del(&msr->r_list); if (msr->r_maxsize < msg->m_ts) { wake_q_add(wake_q, msr->r_tsk); /* See expunge_all regarding memory barrier */ smp_store_release(&msr->r_msg, ERR_PTR(-E2BIG)); } else { ipc_update_pid(&msq->q_lrpid, task_pid(msr->r_tsk)); msq->q_rtime = ktime_get_real_seconds(); wake_q_add(wake_q, msr->r_tsk); /* See expunge_all regarding memory barrier */ smp_store_release(&msr->r_msg, msg); return 1; } } } return 0; } static long do_msgsnd(int msqid, long mtype, void __user *mtext, size_t msgsz, int msgflg) { struct msg_queue *msq; struct msg_msg *msg; int err; struct ipc_namespace *ns; DEFINE_WAKE_Q(wake_q); ns = current->nsproxy->ipc_ns; if (msgsz > ns->msg_ctlmax || (long) msgsz < 0 || msqid < 0) return -EINVAL; if (mtype < 1) return -EINVAL; msg = load_msg(mtext, msgsz); if (IS_ERR(msg)) return PTR_ERR(msg); msg->m_type = mtype; msg->m_ts = msgsz; rcu_read_lock(); msq = msq_obtain_object_check(ns, msqid); if (IS_ERR(msq)) { err = PTR_ERR(msq); goto out_unlock1; } ipc_lock_object(&msq->q_perm); for (;;) { struct msg_sender s; err = -EACCES; if (ipcperms(ns, &msq->q_perm, S_IWUGO)) goto out_unlock0; /* raced with RMID? */ if (!ipc_valid_object(&msq->q_perm)) { err = -EIDRM; goto out_unlock0; } err = security_msg_queue_msgsnd(&msq->q_perm, msg, msgflg); if (err) goto out_unlock0; if (msg_fits_inqueue(msq, msgsz)) break; /* queue full, wait: */ if (msgflg & IPC_NOWAIT) { err = -EAGAIN; goto out_unlock0; } /* enqueue the sender and prepare to block */ ss_add(msq, &s, msgsz); if (!ipc_rcu_getref(&msq->q_perm)) { err = -EIDRM; goto out_unlock0; } ipc_unlock_object(&msq->q_perm); rcu_read_unlock(); schedule(); rcu_read_lock(); ipc_lock_object(&msq->q_perm); ipc_rcu_putref(&msq->q_perm, msg_rcu_free); /* raced with RMID? */ if (!ipc_valid_object(&msq->q_perm)) { err = -EIDRM; goto out_unlock0; } ss_del(&s); if (signal_pending(current)) { err = -ERESTARTNOHAND; goto out_unlock0; } } ipc_update_pid(&msq->q_lspid, task_tgid(current)); msq->q_stime = ktime_get_real_seconds(); if (!pipelined_send(msq, msg, &wake_q)) { /* no one is waiting for this message, enqueue it */ list_add_tail(&msg->m_list, &msq->q_messages); msq->q_cbytes += msgsz; msq->q_qnum++; percpu_counter_add_local(&ns->percpu_msg_bytes, msgsz); percpu_counter_add_local(&ns->percpu_msg_hdrs, 1); } err = 0; msg = NULL; out_unlock0: ipc_unlock_object(&msq->q_perm); wake_up_q(&wake_q); out_unlock1: rcu_read_unlock(); if (msg != NULL) free_msg(msg); return err; } long ksys_msgsnd(int msqid, struct msgbuf __user *msgp, size_t msgsz, int msgflg) { long mtype; if (get_user(mtype, &msgp->mtype)) return -EFAULT; return do_msgsnd(msqid, mtype, msgp->mtext, msgsz, msgflg); } SYSCALL_DEFINE4(msgsnd, int, msqid, struct msgbuf __user *, msgp, size_t, msgsz, int, msgflg) { return ksys_msgsnd(msqid, msgp, msgsz, msgflg); } #ifdef CONFIG_COMPAT struct compat_msgbuf { compat_long_t mtype; char mtext[]; }; long compat_ksys_msgsnd(int msqid, compat_uptr_t msgp, compat_ssize_t msgsz, int msgflg) { struct compat_msgbuf __user *up = compat_ptr(msgp); compat_long_t mtype; if (get_user(mtype, &up->mtype)) return -EFAULT; return do_msgsnd(msqid, mtype, up->mtext, (ssize_t)msgsz, msgflg); } COMPAT_SYSCALL_DEFINE4(msgsnd, int, msqid, compat_uptr_t, msgp, compat_ssize_t, msgsz, int, msgflg) { return compat_ksys_msgsnd(msqid, msgp, msgsz, msgflg); } #endif static inline int convert_mode(long *msgtyp, int msgflg) { if (msgflg & MSG_COPY) return SEARCH_NUMBER; /* * find message of correct type. * msgtyp = 0 => get first. * msgtyp > 0 => get first message of matching type. * msgtyp < 0 => get message with least type must be < abs(msgtype). */ if (*msgtyp == 0) return SEARCH_ANY; if (*msgtyp < 0) { if (*msgtyp == LONG_MIN) /* -LONG_MIN is undefined */ *msgtyp = LONG_MAX; else *msgtyp = -*msgtyp; return SEARCH_LESSEQUAL; } if (msgflg & MSG_EXCEPT) return SEARCH_NOTEQUAL; return SEARCH_EQUAL; } static long do_msg_fill(void __user *dest, struct msg_msg *msg, size_t bufsz) { struct msgbuf __user *msgp = dest; size_t msgsz; if (put_user(msg->m_type, &msgp->mtype)) return -EFAULT; msgsz = (bufsz > msg->m_ts) ? msg->m_ts : bufsz; if (store_msg(msgp->mtext, msg, msgsz)) return -EFAULT; return msgsz; } #ifdef CONFIG_CHECKPOINT_RESTORE /* * This function creates new kernel message structure, large enough to store * bufsz message bytes. */ static inline struct msg_msg *prepare_copy(void __user *buf, size_t bufsz) { struct msg_msg *copy; /* * Create dummy message to copy real message to. */ copy = load_msg(buf, bufsz); if (!IS_ERR(copy)) copy->m_ts = bufsz; return copy; } static inline void free_copy(struct msg_msg *copy) { if (copy) free_msg(copy); } #else static inline struct msg_msg *prepare_copy(void __user *buf, size_t bufsz) { return ERR_PTR(-ENOSYS); } static inline void free_copy(struct msg_msg *copy) { } #endif static struct msg_msg *find_msg(struct msg_queue *msq, long *msgtyp, int mode) { struct msg_msg *msg, *found = NULL; long count = 0; list_for_each_entry(msg, &msq->q_messages, m_list) { if (testmsg(msg, *msgtyp, mode) && !security_msg_queue_msgrcv(&msq->q_perm, msg, current, *msgtyp, mode)) { if (mode == SEARCH_LESSEQUAL && msg->m_type != 1) { *msgtyp = msg->m_type - 1; found = msg; } else if (mode == SEARCH_NUMBER) { if (*msgtyp == count) return msg; } else return msg; count++; } } return found ?: ERR_PTR(-EAGAIN); } static long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, int msgflg, long (*msg_handler)(void __user *, struct msg_msg *, size_t)) { int mode; struct msg_queue *msq; struct ipc_namespace *ns; struct msg_msg *msg, *copy = NULL; DEFINE_WAKE_Q(wake_q); ns = current->nsproxy->ipc_ns; if (msqid < 0 || (long) bufsz < 0) return -EINVAL; if (msgflg & MSG_COPY) { if ((msgflg & MSG_EXCEPT) || !(msgflg & IPC_NOWAIT)) return -EINVAL; copy = prepare_copy(buf, min_t(size_t, bufsz, ns->msg_ctlmax)); if (IS_ERR(copy)) return PTR_ERR(copy); } mode = convert_mode(&msgtyp, msgflg); rcu_read_lock(); msq = msq_obtain_object_check(ns, msqid); if (IS_ERR(msq)) { rcu_read_unlock(); free_copy(copy); return PTR_ERR(msq); } for (;;) { struct msg_receiver msr_d; msg = ERR_PTR(-EACCES); if (ipcperms(ns, &msq->q_perm, S_IRUGO)) goto out_unlock1; ipc_lock_object(&msq->q_perm); /* raced with RMID? */ if (!ipc_valid_object(&msq->q_perm)) { msg = ERR_PTR(-EIDRM); goto out_unlock0; } msg = find_msg(msq, &msgtyp, mode); if (!IS_ERR(msg)) { /* * Found a suitable message. * Unlink it from the queue. */ if ((bufsz < msg->m_ts) && !(msgflg & MSG_NOERROR)) { msg = ERR_PTR(-E2BIG); goto out_unlock0; } /* * If we are copying, then do not unlink message and do * not update queue parameters. */ if (msgflg & MSG_COPY) { msg = copy_msg(msg, copy); goto out_unlock0; } list_del(&msg->m_list); msq->q_qnum--; msq->q_rtime = ktime_get_real_seconds(); ipc_update_pid(&msq->q_lrpid, task_tgid(current)); msq->q_cbytes -= msg->m_ts; percpu_counter_sub_local(&ns->percpu_msg_bytes, msg->m_ts); percpu_counter_sub_local(&ns->percpu_msg_hdrs, 1); ss_wakeup(msq, &wake_q, false); goto out_unlock0; } /* No message waiting. Wait for a message */ if (msgflg & IPC_NOWAIT) { msg = ERR_PTR(-ENOMSG); goto out_unlock0; } list_add_tail(&msr_d.r_list, &msq->q_receivers); msr_d.r_tsk = current; msr_d.r_msgtype = msgtyp; msr_d.r_mode = mode; if (msgflg & MSG_NOERROR) msr_d.r_maxsize = INT_MAX; else msr_d.r_maxsize = bufsz; /* memory barrier not require due to ipc_lock_object() */ WRITE_ONCE(msr_d.r_msg, ERR_PTR(-EAGAIN)); /* memory barrier not required, we own ipc_lock_object() */ __set_current_state(TASK_INTERRUPTIBLE); ipc_unlock_object(&msq->q_perm); rcu_read_unlock(); schedule(); /* * Lockless receive, part 1: * We don't hold a reference to the queue and getting a * reference would defeat the idea of a lockless operation, * thus the code relies on rcu to guarantee the existence of * msq: * Prior to destruction, expunge_all(-EIRDM) changes r_msg. * Thus if r_msg is -EAGAIN, then the queue not yet destroyed. */ rcu_read_lock(); /* * Lockless receive, part 2: * The work in pipelined_send() and expunge_all(): * - Set pointer to message * - Queue the receiver task for later wakeup * - Wake up the process after the lock is dropped. * * Should the process wake up before this wakeup (due to a * signal) it will either see the message and continue ... */ msg = READ_ONCE(msr_d.r_msg); if (msg != ERR_PTR(-EAGAIN)) { /* see MSG_BARRIER for purpose/pairing */ smp_acquire__after_ctrl_dep(); goto out_unlock1; } /* * ... or see -EAGAIN, acquire the lock to check the message * again. */ ipc_lock_object(&msq->q_perm); msg = READ_ONCE(msr_d.r_msg); if (msg != ERR_PTR(-EAGAIN)) goto out_unlock0; list_del(&msr_d.r_list); if (signal_pending(current)) { msg = ERR_PTR(-ERESTARTNOHAND); goto out_unlock0; } ipc_unlock_object(&msq->q_perm); } out_unlock0: ipc_unlock_object(&msq->q_perm); wake_up_q(&wake_q); out_unlock1: rcu_read_unlock(); if (IS_ERR(msg)) { free_copy(copy); return PTR_ERR(msg); } bufsz = msg_handler(buf, msg, bufsz); free_msg(msg); return bufsz; } long ksys_msgrcv(int msqid, struct msgbuf __user *msgp, size_t msgsz, long msgtyp, int msgflg) { return do_msgrcv(msqid, msgp, msgsz, msgtyp, msgflg, do_msg_fill); } SYSCALL_DEFINE5(msgrcv, int, msqid, struct msgbuf __user *, msgp, size_t, msgsz, long, msgtyp, int, msgflg) { return ksys_msgrcv(msqid, msgp, msgsz, msgtyp, msgflg); } #ifdef CONFIG_COMPAT static long compat_do_msg_fill(void __user *dest, struct msg_msg *msg, size_t bufsz) { struct compat_msgbuf __user *msgp = dest; size_t msgsz; if (put_user(msg->m_type, &msgp->mtype)) return -EFAULT; msgsz = (bufsz > msg->m_ts) ? msg->m_ts : bufsz; if (store_msg(msgp->mtext, msg, msgsz)) return -EFAULT; return msgsz; } long compat_ksys_msgrcv(int msqid, compat_uptr_t msgp, compat_ssize_t msgsz, compat_long_t msgtyp, int msgflg) { return do_msgrcv(msqid, compat_ptr(msgp), (ssize_t)msgsz, (long)msgtyp, msgflg, compat_do_msg_fill); } COMPAT_SYSCALL_DEFINE5(msgrcv, int, msqid, compat_uptr_t, msgp, compat_ssize_t, msgsz, compat_long_t, msgtyp, int, msgflg) { return compat_ksys_msgrcv(msqid, msgp, msgsz, msgtyp, msgflg); } #endif int msg_init_ns(struct ipc_namespace *ns) { int ret; ns->msg_ctlmax = MSGMAX; ns->msg_ctlmnb = MSGMNB; ns->msg_ctlmni = MSGMNI; ret = percpu_counter_init(&ns->percpu_msg_bytes, 0, GFP_KERNEL); if (ret) goto fail_msg_bytes; ret = percpu_counter_init(&ns->percpu_msg_hdrs, 0, GFP_KERNEL); if (ret) goto fail_msg_hdrs; ipc_init_ids(&ns->ids[IPC_MSG_IDS]); return 0; fail_msg_hdrs: percpu_counter_destroy(&ns->percpu_msg_bytes); fail_msg_bytes: return ret; } #ifdef CONFIG_IPC_NS void msg_exit_ns(struct ipc_namespace *ns) { free_ipcs(ns, &msg_ids(ns), freeque); idr_destroy(&ns->ids[IPC_MSG_IDS].ipcs_idr); rhashtable_destroy(&ns->ids[IPC_MSG_IDS].key_ht); percpu_counter_destroy(&ns->percpu_msg_bytes); percpu_counter_destroy(&ns->percpu_msg_hdrs); } #endif #ifdef CONFIG_PROC_FS static int sysvipc_msg_proc_show(struct seq_file *s, void *it) { struct pid_namespace *pid_ns = ipc_seq_pid_ns(s); struct user_namespace *user_ns = seq_user_ns(s); struct kern_ipc_perm *ipcp = it; struct msg_queue *msq = container_of(ipcp, struct msg_queue, q_perm); seq_printf(s, "%10d %10d %4o %10lu %10lu %5u %5u %5u %5u %5u %5u %10llu %10llu %10llu\n", msq->q_perm.key, msq->q_perm.id, msq->q_perm.mode, msq->q_cbytes, msq->q_qnum, pid_nr_ns(msq->q_lspid, pid_ns), pid_nr_ns(msq->q_lrpid, pid_ns), from_kuid_munged(user_ns, msq->q_perm.uid), from_kgid_munged(user_ns, msq->q_perm.gid), from_kuid_munged(user_ns, msq->q_perm.cuid), from_kgid_munged(user_ns, msq->q_perm.cgid), msq->q_stime, msq->q_rtime, msq->q_ctime); return 0; } #endif void __init msg_init(void) { msg_init_ns(&init_ipc_ns); ipc_init_proc_interface("sysvipc/msg", " key msqid perms cbytes qnum lspid lrpid uid gid cuid cgid stime rtime ctime\n", IPC_MSG_IDS, sysvipc_msg_proc_show); } |
| 1 1 53 53 53 53 23 23 23 28 28 26 22 19 8 18 19 18 19 9 9 4 4 9 24 9 20 12 9 9 6 6 6 18 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 | // SPDX-License-Identifier: GPL-2.0-or-later /* * Cryptographic API. * * Cipher operations. * * Copyright (c) 2002 James Morris <jmorris@intercode.com.au> * 2002 Adam J. Richter <adam@yggdrasil.com> * 2004 Jean-Luc Cooke <jlcooke@certainkey.com> */ #include <crypto/scatterwalk.h> #include <linux/kernel.h> #include <linux/mm.h> #include <linux/module.h> #include <linux/scatterlist.h> void scatterwalk_skip(struct scatter_walk *walk, unsigned int nbytes) { struct scatterlist *sg = walk->sg; nbytes += walk->offset - sg->offset; while (nbytes > sg->length) { nbytes -= sg->length; sg = sg_next(sg); } walk->sg = sg; walk->offset = sg->offset + nbytes; } EXPORT_SYMBOL_GPL(scatterwalk_skip); inline void memcpy_from_scatterwalk(void *buf, struct scatter_walk *walk, unsigned int nbytes) { do { unsigned int to_copy; to_copy = scatterwalk_next(walk, nbytes); memcpy(buf, walk->addr, to_copy); scatterwalk_done_src(walk, to_copy); buf += to_copy; nbytes -= to_copy; } while (nbytes); } EXPORT_SYMBOL_GPL(memcpy_from_scatterwalk); inline void memcpy_to_scatterwalk(struct scatter_walk *walk, const void *buf, unsigned int nbytes) { do { unsigned int to_copy; to_copy = scatterwalk_next(walk, nbytes); memcpy(walk->addr, buf, to_copy); scatterwalk_done_dst(walk, to_copy); buf += to_copy; nbytes -= to_copy; } while (nbytes); } EXPORT_SYMBOL_GPL(memcpy_to_scatterwalk); void memcpy_from_sglist(void *buf, struct scatterlist *sg, unsigned int start, unsigned int nbytes) { struct scatter_walk walk; if (unlikely(nbytes == 0)) /* in case sg == NULL */ return; scatterwalk_start_at_pos(&walk, sg, start); memcpy_from_scatterwalk(buf, &walk, nbytes); } EXPORT_SYMBOL_GPL(memcpy_from_sglist); void memcpy_to_sglist(struct scatterlist *sg, unsigned int start, const void *buf, unsigned int nbytes) { struct scatter_walk walk; if (unlikely(nbytes == 0)) /* in case sg == NULL */ return; scatterwalk_start_at_pos(&walk, sg, start); memcpy_to_scatterwalk(&walk, buf, nbytes); } EXPORT_SYMBOL_GPL(memcpy_to_sglist); /** * memcpy_sglist() - Copy data from one scatterlist to another * @dst: The destination scatterlist. Can be NULL if @nbytes == 0. * @src: The source scatterlist. Can be NULL if @nbytes == 0. * @nbytes: Number of bytes to copy * * The scatterlists can describe exactly the same memory, in which case this * function is a no-op. No other overlaps are supported. * * Context: Any context */ void memcpy_sglist(struct scatterlist *dst, struct scatterlist *src, unsigned int nbytes) { unsigned int src_offset, dst_offset; if (unlikely(nbytes == 0)) /* in case src and/or dst is NULL */ return; src_offset = src->offset; dst_offset = dst->offset; for (;;) { /* Compute the length to copy this step. */ unsigned int len = min3(src->offset + src->length - src_offset, dst->offset + dst->length - dst_offset, nbytes); struct page *src_page = sg_page(src); struct page *dst_page = sg_page(dst); const void *src_virt; void *dst_virt; if (IS_ENABLED(CONFIG_HIGHMEM)) { /* HIGHMEM: we may have to actually map the pages. */ const unsigned int src_oip = offset_in_page(src_offset); const unsigned int dst_oip = offset_in_page(dst_offset); const unsigned int limit = PAGE_SIZE; /* Further limit len to not cross a page boundary. */ len = min3(len, limit - src_oip, limit - dst_oip); /* Compute the source and destination pages. */ src_page += src_offset / PAGE_SIZE; dst_page += dst_offset / PAGE_SIZE; if (src_page != dst_page) { /* Copy between different pages. */ memcpy_page(dst_page, dst_oip, src_page, src_oip, len); flush_dcache_page(dst_page); } else if (src_oip != dst_oip) { /* Copy between different parts of same page. */ dst_virt = kmap_local_page(dst_page); memcpy(dst_virt + dst_oip, dst_virt + src_oip, len); kunmap_local(dst_virt); flush_dcache_page(dst_page); } /* Else, it's the same memory. No action needed. */ } else { /* * !HIGHMEM: no mapping needed. Just work in the linear * buffer of each sg entry. Note that we can cross page * boundaries, as they are not significant in this case. */ src_virt = page_address(src_page) + src_offset; dst_virt = page_address(dst_page) + dst_offset; if (src_virt != dst_virt) { memcpy(dst_virt, src_virt, len); if (ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE) __scatterwalk_flush_dcache_pages( dst_page, dst_offset, len); } /* Else, it's the same memory. No action needed. */ } nbytes -= len; if (nbytes == 0) /* No more to copy? */ break; /* * There's more to copy. Advance the offsets by the length * copied this step, and advance the sg entries as needed. */ src_offset += len; if (src_offset >= src->offset + src->length) { src = sg_next(src); src_offset = src->offset; } dst_offset += len; if (dst_offset >= dst->offset + dst->length) { dst = sg_next(dst); dst_offset = dst->offset; } } } EXPORT_SYMBOL_GPL(memcpy_sglist); struct scatterlist *scatterwalk_ffwd(struct scatterlist dst[2], struct scatterlist *src, unsigned int len) { for (;;) { if (!len) return src; if (src->length > len) break; len -= src->length; src = sg_next(src); } sg_init_table(dst, 2); sg_set_page(dst, sg_page(src), src->length - len, src->offset + len); scatterwalk_crypto_chain(dst, sg_next(src), 2); return dst; } EXPORT_SYMBOL_GPL(scatterwalk_ffwd); |
| 16 16 16 16 15 16 16 16 16 15 16 15 16 16 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 | // SPDX-License-Identifier: GPL-2.0-or-later /* * acpi_osl.c - OS-dependent functions ($Revision: 83 $) * * Copyright (C) 2000 Andrew Henroid * Copyright (C) 2001, 2002 Andy Grover <andrew.grover@intel.com> * Copyright (C) 2001, 2002 Paul Diefenbaugh <paul.s.diefenbaugh@intel.com> * Copyright (c) 2008 Intel Corporation * Author: Matthew Wilcox <willy@linux.intel.com> */ #define pr_fmt(fmt) "ACPI: OSL: " fmt #include <linux/module.h> #include <linux/kernel.h> #include <linux/slab.h> #include <linux/mm.h> #include <linux/highmem.h> #include <linux/lockdep.h> #include <linux/pci.h> #include <linux/interrupt.h> #include <linux/kmod.h> #include <linux/delay.h> #include <linux/workqueue.h> #include <linux/nmi.h> #include <linux/acpi.h> #include <linux/efi.h> #include <linux/ioport.h> #include <linux/list.h> #include <linux/jiffies.h> #include <linux/semaphore.h> #include <linux/security.h> #include <asm/io.h> #include <linux/uaccess.h> #include <linux/io-64-nonatomic-lo-hi.h> #include "acpica/accommon.h" #include "internal.h" /* Definitions for ACPI_DEBUG_PRINT() */ #define _COMPONENT ACPI_OS_SERVICES ACPI_MODULE_NAME("osl"); struct acpi_os_dpc { acpi_osd_exec_callback function; void *context; struct work_struct work; }; #ifdef ENABLE_DEBUGGER #include <linux/kdb.h> /* stuff for debugger support */ int acpi_in_debugger; EXPORT_SYMBOL(acpi_in_debugger); #endif /*ENABLE_DEBUGGER */ static int (*__acpi_os_prepare_sleep)(u8 sleep_state, u32 pm1a_ctrl, u32 pm1b_ctrl); static int (*__acpi_os_prepare_extended_sleep)(u8 sleep_state, u32 val_a, u32 val_b); static acpi_osd_handler acpi_irq_handler; static void *acpi_irq_context; static struct workqueue_struct *kacpid_wq; static struct workqueue_struct *kacpi_notify_wq; static struct workqueue_struct *kacpi_hotplug_wq; static bool acpi_os_initialized; unsigned int acpi_sci_irq = INVALID_ACPI_IRQ; bool acpi_permanent_mmap = false; /* * This list of permanent mappings is for memory that may be accessed from * interrupt context, where we can't do the ioremap(). */ struct acpi_ioremap { struct list_head list; void __iomem *virt; acpi_physical_address phys; acpi_size size; union { unsigned long refcount; struct rcu_work rwork; } track; }; static LIST_HEAD(acpi_ioremaps); static DEFINE_MUTEX(acpi_ioremap_lock); #define acpi_ioremap_lock_held() lock_is_held(&acpi_ioremap_lock.dep_map) static void __init acpi_request_region (struct acpi_generic_address *gas, unsigned int length, char *desc) { u64 addr; /* Handle possible alignment issues */ memcpy(&addr, &gas->address, sizeof(addr)); if (!addr || !length) return; /* Resources are never freed */ if (gas->space_id == ACPI_ADR_SPACE_SYSTEM_IO) request_region(addr, length, desc); else if (gas->space_id == ACPI_ADR_SPACE_SYSTEM_MEMORY) request_mem_region(addr, length, desc); } static int __init acpi_reserve_resources(void) { acpi_request_region(&acpi_gbl_FADT.xpm1a_event_block, acpi_gbl_FADT.pm1_event_length, "ACPI PM1a_EVT_BLK"); acpi_request_region(&acpi_gbl_FADT.xpm1b_event_block, acpi_gbl_FADT.pm1_event_length, "ACPI PM1b_EVT_BLK"); acpi_request_region(&acpi_gbl_FADT.xpm1a_control_block, acpi_gbl_FADT.pm1_control_length, "ACPI PM1a_CNT_BLK"); acpi_request_region(&acpi_gbl_FADT.xpm1b_control_block, acpi_gbl_FADT.pm1_control_length, "ACPI PM1b_CNT_BLK"); if (acpi_gbl_FADT.pm_timer_length == 4) acpi_request_region(&acpi_gbl_FADT.xpm_timer_block, 4, "ACPI PM_TMR"); acpi_request_region(&acpi_gbl_FADT.xpm2_control_block, acpi_gbl_FADT.pm2_control_length, "ACPI PM2_CNT_BLK"); /* Length of GPE blocks must be a non-negative multiple of 2 */ if (!(acpi_gbl_FADT.gpe0_block_length & 0x1)) acpi_request_region(&acpi_gbl_FADT.xgpe0_block, acpi_gbl_FADT.gpe0_block_length, "ACPI GPE0_BLK"); if (!(acpi_gbl_FADT.gpe1_block_length & 0x1)) acpi_request_region(&acpi_gbl_FADT.xgpe1_block, acpi_gbl_FADT.gpe1_block_length, "ACPI GPE1_BLK"); return 0; } fs_initcall_sync(acpi_reserve_resources); void acpi_os_printf(const char *fmt, ...) { va_list args; va_start(args, fmt); acpi_os_vprintf(fmt, args); va_end(args); } EXPORT_SYMBOL(acpi_os_printf); void __printf(1, 0) acpi_os_vprintf(const char *fmt, va_list args) { static char buffer[512]; vsprintf(buffer, fmt, args); #ifdef ENABLE_DEBUGGER if (acpi_in_debugger) { kdb_printf("%s", buffer); } else { if (printk_get_level(buffer)) printk("%s", buffer); else printk(KERN_CONT "%s", buffer); } #else if (acpi_debugger_write_log(buffer) < 0) { if (printk_get_level(buffer)) printk("%s", buffer); else printk(KERN_CONT "%s", buffer); } #endif } #ifdef CONFIG_KEXEC static unsigned long acpi_rsdp; static int __init setup_acpi_rsdp(char *arg) { return kstrtoul(arg, 16, &acpi_rsdp); } early_param("acpi_rsdp", setup_acpi_rsdp); #endif acpi_physical_address __init acpi_os_get_root_pointer(void) { acpi_physical_address pa; #ifdef CONFIG_KEXEC /* * We may have been provided with an RSDP on the command line, * but if a malicious user has done so they may be pointing us * at modified ACPI tables that could alter kernel behaviour - * so, we check the lockdown status before making use of * it. If we trust it then also stash it in an architecture * specific location (if appropriate) so it can be carried * over further kexec()s. */ if (acpi_rsdp && !security_locked_down(LOCKDOWN_ACPI_TABLES)) { acpi_arch_set_root_pointer(acpi_rsdp); return acpi_rsdp; } #endif pa = acpi_arch_get_root_pointer(); if (pa) return pa; if (efi_enabled(EFI_CONFIG_TABLES)) { if (efi.acpi20 != EFI_INVALID_TABLE_ADDR) return efi.acpi20; if (efi.acpi != EFI_INVALID_TABLE_ADDR) return efi.acpi; pr_err("System description tables not found\n"); } else if (IS_ENABLED(CONFIG_ACPI_LEGACY_TABLES_LOOKUP)) { acpi_find_root_pointer(&pa); } return pa; } /* Must be called with 'acpi_ioremap_lock' or RCU read lock held. */ static struct acpi_ioremap * acpi_map_lookup(acpi_physical_address phys, acpi_size size) { struct acpi_ioremap *map; list_for_each_entry_rcu(map, &acpi_ioremaps, list, acpi_ioremap_lock_held()) if (map->phys <= phys && phys + size <= map->phys + map->size) return map; return NULL; } /* Must be called with 'acpi_ioremap_lock' or RCU read lock held. */ static void __iomem * acpi_map_vaddr_lookup(acpi_physical_address phys, unsigned int size) { struct acpi_ioremap *map; map = acpi_map_lookup(phys, size); if (map) return map->virt + (phys - map->phys); return NULL; } void __iomem *acpi_os_get_iomem(acpi_physical_address phys, unsigned int size) { struct acpi_ioremap *map; void __iomem *virt = NULL; mutex_lock(&acpi_ioremap_lock); map = acpi_map_lookup(phys, size); if (map) { virt = map->virt + (phys - map->phys); map->track.refcount++; } mutex_unlock(&acpi_ioremap_lock); return virt; } EXPORT_SYMBOL_GPL(acpi_os_get_iomem); /* Must be called with 'acpi_ioremap_lock' or RCU read lock held. */ static struct acpi_ioremap * acpi_map_lookup_virt(void __iomem *virt, acpi_size size) { struct acpi_ioremap *map; list_for_each_entry_rcu(map, &acpi_ioremaps, list, acpi_ioremap_lock_held()) if (map->virt <= virt && virt + size <= map->virt + map->size) return map; return NULL; } #if defined(CONFIG_ARM64) || defined(CONFIG_RISCV) /* ioremap will take care of cache attributes */ #define should_use_kmap(pfn) 0 #else #define should_use_kmap(pfn) page_is_ram(pfn) #endif static void __iomem *acpi_map(acpi_physical_address pg_off, unsigned long pg_sz) { unsigned long pfn; pfn = pg_off >> PAGE_SHIFT; if (should_use_kmap(pfn)) { if (pg_sz > PAGE_SIZE) return NULL; return (void __iomem __force *)kmap(pfn_to_page(pfn)); } else return acpi_os_ioremap(pg_off, pg_sz); } static void acpi_unmap(acpi_physical_address pg_off, void __iomem *vaddr) { unsigned long pfn; pfn = pg_off >> PAGE_SHIFT; if (should_use_kmap(pfn)) kunmap(pfn_to_page(pfn)); else iounmap(vaddr); } /** * acpi_os_map_iomem - Get a virtual address for a given physical address range. * @phys: Start of the physical address range to map. * @size: Size of the physical address range to map. * * Look up the given physical address range in the list of existing ACPI memory * mappings. If found, get a reference to it and return a pointer to it (its * virtual address). If not found, map it, add it to that list and return a * pointer to it. * * During early init (when acpi_permanent_mmap has not been set yet) this * routine simply calls __acpi_map_table() to get the job done. */ void __iomem __ref *acpi_os_map_iomem(acpi_physical_address phys, acpi_size size) { struct acpi_ioremap *map; void __iomem *virt; acpi_physical_address pg_off; acpi_size pg_sz; if (phys > ULONG_MAX) { pr_err("Cannot map memory that high: 0x%llx\n", phys); return NULL; } if (!acpi_permanent_mmap) return __acpi_map_table((unsigned long)phys, size); mutex_lock(&acpi_ioremap_lock); /* Check if there's a suitable mapping already. */ map = acpi_map_lookup(phys, size); if (map) { map->track.refcount++; goto out; } map = kzalloc_obj(*map); if (!map) { mutex_unlock(&acpi_ioremap_lock); return NULL; } pg_off = round_down(phys, PAGE_SIZE); pg_sz = round_up(phys + size, PAGE_SIZE) - pg_off; virt = acpi_map(phys, size); if (!virt) { mutex_unlock(&acpi_ioremap_lock); kfree(map); return NULL; } INIT_LIST_HEAD(&map->list); map->virt = (void __iomem __force *)((unsigned long)virt & PAGE_MASK); map->phys = pg_off; map->size = pg_sz; map->track.refcount = 1; list_add_tail_rcu(&map->list, &acpi_ioremaps); out: mutex_unlock(&acpi_ioremap_lock); return map->virt + (phys - map->phys); } EXPORT_SYMBOL_GPL(acpi_os_map_iomem); void *__ref acpi_os_map_memory(acpi_physical_address phys, acpi_size size) { return (void *)acpi_os_map_iomem(phys, size); } EXPORT_SYMBOL_GPL(acpi_os_map_memory); static void acpi_os_map_remove(struct work_struct *work) { struct acpi_ioremap *map = container_of(to_rcu_work(work), struct acpi_ioremap, track.rwork); acpi_unmap(map->phys, map->virt); kfree(map); } /* Must be called with mutex_lock(&acpi_ioremap_lock) */ static void acpi_os_drop_map_ref(struct acpi_ioremap *map) { if (--map->track.refcount) return; list_del_rcu(&map->list); INIT_RCU_WORK(&map->track.rwork, acpi_os_map_remove); queue_rcu_work(system_percpu_wq, &map->track.rwork); } /** * acpi_os_unmap_iomem - Drop a memory mapping reference. * @virt: Start of the address range to drop a reference to. * @size: Size of the address range to drop a reference to. * * Look up the given virtual address range in the list of existing ACPI memory * mappings, drop a reference to it and if there are no more active references * to it, queue it up for later removal. * * During early init (when acpi_permanent_mmap has not been set yet) this * routine simply calls __acpi_unmap_table() to get the job done. Since * __acpi_unmap_table() is an __init function, the __ref annotation is needed * here. */ void __ref acpi_os_unmap_iomem(void __iomem *virt, acpi_size size) { struct acpi_ioremap *map; if (!acpi_permanent_mmap) { __acpi_unmap_table(virt, size); return; } mutex_lock(&acpi_ioremap_lock); map = acpi_map_lookup_virt(virt, size); if (!map) { mutex_unlock(&acpi_ioremap_lock); WARN(true, "ACPI: %s: bad address %p\n", __func__, virt); return; } acpi_os_drop_map_ref(map); mutex_unlock(&acpi_ioremap_lock); } EXPORT_SYMBOL_GPL(acpi_os_unmap_iomem); /** * acpi_os_unmap_memory - Drop a memory mapping reference. * @virt: Start of the address range to drop a reference to. * @size: Size of the address range to drop a reference to. */ void __ref acpi_os_unmap_memory(void *virt, acpi_size size) { acpi_os_unmap_iomem((void __iomem *)virt, size); } EXPORT_SYMBOL_GPL(acpi_os_unmap_memory); void __iomem *acpi_os_map_generic_address(struct acpi_generic_address *gas) { u64 addr; if (gas->space_id != ACPI_ADR_SPACE_SYSTEM_MEMORY) return NULL; /* Handle possible alignment issues */ memcpy(&addr, &gas->address, sizeof(addr)); if (!addr || !gas->bit_width) return NULL; return acpi_os_map_iomem(addr, gas->bit_width / 8); } EXPORT_SYMBOL(acpi_os_map_generic_address); void acpi_os_unmap_generic_address(struct acpi_generic_address *gas) { u64 addr; struct acpi_ioremap *map; if (gas->space_id != ACPI_ADR_SPACE_SYSTEM_MEMORY) return; /* Handle possible alignment issues */ memcpy(&addr, &gas->address, sizeof(addr)); if (!addr || !gas->bit_width) return; mutex_lock(&acpi_ioremap_lock); map = acpi_map_lookup(addr, gas->bit_width / 8); if (!map) { mutex_unlock(&acpi_ioremap_lock); return; } acpi_os_drop_map_ref(map); mutex_unlock(&acpi_ioremap_lock); } EXPORT_SYMBOL(acpi_os_unmap_generic_address); #ifdef ACPI_FUTURE_USAGE acpi_status acpi_os_get_physical_address(void *virt, acpi_physical_address *phys) { if (!phys || !virt) return AE_BAD_PARAMETER; *phys = virt_to_phys(virt); return AE_OK; } #endif #ifdef CONFIG_ACPI_REV_OVERRIDE_POSSIBLE static bool acpi_rev_override; int __init acpi_rev_override_setup(char *str) { acpi_rev_override = true; return 1; } __setup("acpi_rev_override", acpi_rev_override_setup); #else #define acpi_rev_override false #endif #define ACPI_MAX_OVERRIDE_LEN 100 static char acpi_os_name[ACPI_MAX_OVERRIDE_LEN]; acpi_status acpi_os_predefined_override(const struct acpi_predefined_names *init_val, acpi_string *new_val) { if (!init_val || !new_val) return AE_BAD_PARAMETER; *new_val = NULL; if (!memcmp(init_val->name, "_OS_", 4) && strlen(acpi_os_name)) { pr_info("Overriding _OS definition to '%s'\n", acpi_os_name); *new_val = acpi_os_name; } if (!memcmp(init_val->name, "_REV", 4) && acpi_rev_override) { pr_info("Overriding _REV return value to 5\n"); *new_val = (char *)5; } return AE_OK; } static irqreturn_t acpi_irq(int irq, void *dev_id) { if ((*acpi_irq_handler)(acpi_irq_context)) { acpi_irq_handled++; return IRQ_HANDLED; } else { acpi_irq_not_handled++; return IRQ_NONE; } } acpi_status acpi_os_install_interrupt_handler(u32 gsi, acpi_osd_handler handler, void *context) { unsigned int irq; acpi_irq_stats_init(); /* * ACPI interrupts different from the SCI in our copy of the FADT are * not supported. */ if (gsi != acpi_gbl_FADT.sci_interrupt) return AE_BAD_PARAMETER; if (acpi_irq_handler) return AE_ALREADY_ACQUIRED; if (acpi_gsi_to_irq(gsi, &irq) < 0) { pr_err("SCI (ACPI GSI %d) not registered\n", gsi); return AE_OK; } acpi_irq_handler = handler; acpi_irq_context = context; if (request_threaded_irq(irq, NULL, acpi_irq, IRQF_SHARED | IRQF_ONESHOT, "acpi", acpi_irq)) { pr_err("SCI (IRQ%d) allocation failed\n", irq); acpi_irq_handler = NULL; return AE_NOT_ACQUIRED; } acpi_sci_irq = irq; return AE_OK; } acpi_status acpi_os_remove_interrupt_handler(u32 gsi, acpi_osd_handler handler) { if (gsi != acpi_gbl_FADT.sci_interrupt || !acpi_sci_irq_valid()) return AE_BAD_PARAMETER; free_irq(acpi_sci_irq, acpi_irq); acpi_irq_handler = NULL; acpi_sci_irq = INVALID_ACPI_IRQ; return AE_OK; } /* * Running in interpreter thread context, safe to sleep */ void acpi_os_sleep(u64 ms) { u64 usec = ms * USEC_PER_MSEC, delta_us = 50; /* * Use a hrtimer because the timer wheel timers are optimized for * cancelation before they expire and this timer is not going to be * canceled. * * Set the delta between the requested sleep time and the effective * deadline to at least 50 us in case there is an opportunity for timer * coalescing. * * Moreover, longer sleeps can be assumed to need somewhat less timer * precision, so sacrifice some of it for making the timer a more likely * candidate for coalescing by setting the delta to 1% of the sleep time * if it is above 5 ms (this value is chosen so that the delta is a * continuous function of the sleep time). */ if (ms > 5) delta_us = (USEC_PER_MSEC / 100) * ms; usleep_range(usec, usec + delta_us); } void acpi_os_stall(u32 us) { while (us) { u32 delay = 1000; if (delay > us) delay = us; udelay(delay); touch_nmi_watchdog(); us -= delay; } } /* * Support ACPI 3.0 AML Timer operand. Returns a 64-bit free-running, * monotonically increasing timer with 100ns granularity. Do not use * ktime_get() to implement this function because this function may get * called after timekeeping has been suspended. Note: calling this function * after timekeeping has been suspended may lead to unexpected results * because when timekeeping is suspended the jiffies counter is not * incremented. See also timekeeping_suspend(). */ u64 acpi_os_get_timer(void) { return (get_jiffies_64() - INITIAL_JIFFIES) * (ACPI_100NSEC_PER_SEC / HZ); } acpi_status acpi_os_read_port(acpi_io_address port, u32 *value, u32 width) { u32 dummy; if (!IS_ENABLED(CONFIG_HAS_IOPORT)) { /* * set all-1 result as if reading from non-existing * I/O port */ *value = GENMASK(width, 0); return AE_NOT_IMPLEMENTED; } if (value) *value = 0; else value = &dummy; if (width <= 8) { *value = inb(port); } else if (width <= 16) { *value = inw(port); } else if (width <= 32) { *value = inl(port); } else { pr_debug("%s: Access width %d not supported\n", __func__, width); return AE_BAD_PARAMETER; } return AE_OK; } EXPORT_SYMBOL(acpi_os_read_port); acpi_status acpi_os_write_port(acpi_io_address port, u32 value, u32 width) { if (!IS_ENABLED(CONFIG_HAS_IOPORT)) return AE_NOT_IMPLEMENTED; if (width <= 8) { outb(value, port); } else if (width <= 16) { outw(value, port); } else if (width <= 32) { outl(value, port); } else { pr_debug("%s: Access width %d not supported\n", __func__, width); return AE_BAD_PARAMETER; } return AE_OK; } EXPORT_SYMBOL(acpi_os_write_port); int acpi_os_read_iomem(void __iomem *virt_addr, u64 *value, u32 width) { switch (width) { case 8: *(u8 *) value = readb(virt_addr); break; case 16: *(u16 *) value = readw(virt_addr); break; case 32: *(u32 *) value = readl(virt_addr); break; case 64: *(u64 *) value = readq(virt_addr); break; default: return -EINVAL; } return 0; } acpi_status acpi_os_read_memory(acpi_physical_address phys_addr, u64 *value, u32 width) { void __iomem *virt_addr; unsigned int size = width / 8; bool unmap = false; u64 dummy; int error; rcu_read_lock(); virt_addr = acpi_map_vaddr_lookup(phys_addr, size); if (!virt_addr) { rcu_read_unlock(); virt_addr = acpi_os_ioremap(phys_addr, size); if (!virt_addr) return AE_BAD_ADDRESS; unmap = true; } if (!value) value = &dummy; error = acpi_os_read_iomem(virt_addr, value, width); BUG_ON(error); if (unmap) iounmap(virt_addr); else rcu_read_unlock(); return AE_OK; } acpi_status acpi_os_write_memory(acpi_physical_address phys_addr, u64 value, u32 width) { void __iomem *virt_addr; unsigned int size = width / 8; bool unmap = false; rcu_read_lock(); virt_addr = acpi_map_vaddr_lookup(phys_addr, size); if (!virt_addr) { rcu_read_unlock(); virt_addr = acpi_os_ioremap(phys_addr, size); if (!virt_addr) return AE_BAD_ADDRESS; unmap = true; } switch (width) { case 8: writeb(value, virt_addr); break; case 16: writew(value, virt_addr); break; case 32: writel(value, virt_addr); break; case 64: writeq(value, virt_addr); break; default: BUG(); } if (unmap) iounmap(virt_addr); else rcu_read_unlock(); return AE_OK; } #ifdef CONFIG_PCI acpi_status acpi_os_read_pci_configuration(struct acpi_pci_id *pci_id, u32 reg, u64 *value, u32 width) { int result, size; u32 value32; if (!value) return AE_BAD_PARAMETER; switch (width) { case 8: size = 1; break; case 16: size = 2; break; case 32: size = 4; break; default: return AE_ERROR; } result = raw_pci_read(pci_id->segment, pci_id->bus, PCI_DEVFN(pci_id->device, pci_id->function), reg, size, &value32); *value = value32; return (result ? AE_ERROR : AE_OK); } acpi_status acpi_os_write_pci_configuration(struct acpi_pci_id *pci_id, u32 reg, u64 value, u32 width) { int result, size; switch (width) { case 8: size = 1; break; case 16: size = 2; break; case 32: size = 4; break; default: return AE_ERROR; } result = raw_pci_write(pci_id->segment, pci_id->bus, PCI_DEVFN(pci_id->device, pci_id->function), reg, size, value); return (result ? AE_ERROR : AE_OK); } #endif static void acpi_os_execute_deferred(struct work_struct *work) { struct acpi_os_dpc *dpc = container_of(work, struct acpi_os_dpc, work); dpc->function(dpc->context); kfree(dpc); } #ifdef CONFIG_ACPI_DEBUGGER static struct acpi_debugger acpi_debugger; static bool acpi_debugger_initialized; int acpi_register_debugger(struct module *owner, const struct acpi_debugger_ops *ops) { int ret = 0; mutex_lock(&acpi_debugger.lock); if (acpi_debugger.ops) { ret = -EBUSY; goto err_lock; } acpi_debugger.owner = owner; acpi_debugger.ops = ops; err_lock: mutex_unlock(&acpi_debugger.lock); return ret; } EXPORT_SYMBOL(acpi_register_debugger); void acpi_unregister_debugger(const struct acpi_debugger_ops *ops) { mutex_lock(&acpi_debugger.lock); if (ops == acpi_debugger.ops) { acpi_debugger.ops = NULL; acpi_debugger.owner = NULL; } mutex_unlock(&acpi_debugger.lock); } EXPORT_SYMBOL(acpi_unregister_debugger); int acpi_debugger_create_thread(acpi_osd_exec_callback function, void *context) { int ret; int (*func)(acpi_osd_exec_callback, void *); struct module *owner; if (!acpi_debugger_initialized) return -ENODEV; mutex_lock(&acpi_debugger.lock); if (!acpi_debugger.ops) { ret = -ENODEV; goto err_lock; } if (!try_module_get(acpi_debugger.owner)) { ret = -ENODEV; goto err_lock; } func = acpi_debugger.ops->create_thread; owner = acpi_debugger.owner; mutex_unlock(&acpi_debugger.lock); ret = func(function, context); mutex_lock(&acpi_debugger.lock); module_put(owner); err_lock: mutex_unlock(&acpi_debugger.lock); return ret; } ssize_t acpi_debugger_write_log(const char *msg) { ssize_t ret; ssize_t (*func)(const char *); struct module *owner; if (!acpi_debugger_initialized) return -ENODEV; mutex_lock(&acpi_debugger.lock); if (!acpi_debugger.ops) { ret = -ENODEV; goto err_lock; } if (!try_module_get(acpi_debugger.owner)) { ret = -ENODEV; goto err_lock; } func = acpi_debugger.ops->write_log; owner = acpi_debugger.owner; mutex_unlock(&acpi_debugger.lock); ret = func(msg); mutex_lock(&acpi_debugger.lock); module_put(owner); err_lock: mutex_unlock(&acpi_debugger.lock); return ret; } ssize_t acpi_debugger_read_cmd(char *buffer, size_t buffer_length) { ssize_t ret; ssize_t (*func)(char *, size_t); struct module *owner; if (!acpi_debugger_initialized) return -ENODEV; mutex_lock(&acpi_debugger.lock); if (!acpi_debugger.ops) { ret = -ENODEV; goto err_lock; } if (!try_module_get(acpi_debugger.owner)) { ret = -ENODEV; goto err_lock; } func = acpi_debugger.ops->read_cmd; owner = acpi_debugger.owner; mutex_unlock(&acpi_debugger.lock); ret = func(buffer, buffer_length); mutex_lock(&acpi_debugger.lock); module_put(owner); err_lock: mutex_unlock(&acpi_debugger.lock); return ret; } int acpi_debugger_wait_command_ready(void) { int ret; int (*func)(bool, char *, size_t); struct module *owner; if (!acpi_debugger_initialized) return -ENODEV; mutex_lock(&acpi_debugger.lock); if (!acpi_debugger.ops) { ret = -ENODEV; goto err_lock; } if (!try_module_get(acpi_debugger.owner)) { ret = -ENODEV; goto err_lock; } func = acpi_debugger.ops->wait_command_ready; owner = acpi_debugger.owner; mutex_unlock(&acpi_debugger.lock); ret = func(acpi_gbl_method_executing, acpi_gbl_db_line_buf, ACPI_DB_LINE_BUFFER_SIZE); mutex_lock(&acpi_debugger.lock); module_put(owner); err_lock: mutex_unlock(&acpi_debugger.lock); return ret; } int acpi_debugger_notify_command_complete(void) { int ret; int (*func)(void); struct module *owner; if (!acpi_debugger_initialized) return -ENODEV; mutex_lock(&acpi_debugger.lock); if (!acpi_debugger.ops) { ret = -ENODEV; goto err_lock; } if (!try_module_get(acpi_debugger.owner)) { ret = -ENODEV; goto err_lock; } func = acpi_debugger.ops->notify_command_complete; owner = acpi_debugger.owner; mutex_unlock(&acpi_debugger.lock); ret = func(); mutex_lock(&acpi_debugger.lock); module_put(owner); err_lock: mutex_unlock(&acpi_debugger.lock); return ret; } int __init acpi_debugger_init(void) { mutex_init(&acpi_debugger.lock); acpi_debugger_initialized = true; return 0; } #endif /******************************************************************************* * * FUNCTION: acpi_os_execute * * PARAMETERS: Type - Type of the callback * Function - Function to be executed * Context - Function parameters * * RETURN: Status * * DESCRIPTION: Depending on type, either queues function for deferred execution or * immediately executes function on a separate thread. * ******************************************************************************/ acpi_status acpi_os_execute(acpi_execute_type type, acpi_osd_exec_callback function, void *context) { struct acpi_os_dpc *dpc; int ret; ACPI_DEBUG_PRINT((ACPI_DB_EXEC, "Scheduling function [%p(%p)] for deferred execution.\n", function, context)); if (type == OSL_DEBUGGER_MAIN_THREAD) { ret = acpi_debugger_create_thread(function, context); if (ret) { pr_err("Kernel thread creation failed\n"); return AE_ERROR; } return AE_OK; } /* * Allocate/initialize DPC structure. Note that this memory will be * freed by the callee. The kernel handles the work_struct list in a * way that allows us to also free its memory inside the callee. * Because we may want to schedule several tasks with different * parameters we can't use the approach some kernel code uses of * having a static work_struct. */ dpc = kzalloc_obj(struct acpi_os_dpc, GFP_ATOMIC); if (!dpc) return AE_NO_MEMORY; dpc->function = function; dpc->context = context; INIT_WORK(&dpc->work, acpi_os_execute_deferred); /* * To prevent lockdep from complaining unnecessarily, make sure that * there is a different static lockdep key for each workqueue by using * INIT_WORK() for each of them separately. */ switch (type) { case OSL_NOTIFY_HANDLER: ret = queue_work(kacpi_notify_wq, &dpc->work); break; case OSL_GPE_HANDLER: /* * On some machines, a software-initiated SMI causes corruption * unless the SMI runs on CPU 0. An SMI can be initiated by * any AML, but typically it's done in GPE-related methods that * are run via workqueues, so we can avoid the known corruption * cases by always queueing on CPU 0. */ ret = queue_work_on(0, kacpid_wq, &dpc->work); break; default: pr_err("Unsupported os_execute type %d.\n", type); goto err; } if (!ret) { pr_err("Unable to queue work\n"); goto err; } return AE_OK; err: kfree(dpc); return AE_ERROR; } EXPORT_SYMBOL(acpi_os_execute); void acpi_os_wait_events_complete(void) { /* * Make sure the GPE handler or the fixed event handler is not used * on another CPU after removal. */ if (acpi_sci_irq_valid()) synchronize_hardirq(acpi_sci_irq); flush_workqueue(kacpid_wq); flush_workqueue(kacpi_notify_wq); } EXPORT_SYMBOL(acpi_os_wait_events_complete); struct acpi_hp_work { struct work_struct work; struct acpi_device *adev; u32 src; }; static void acpi_hotplug_work_fn(struct work_struct *work) { struct acpi_hp_work *hpw = container_of(work, struct acpi_hp_work, work); acpi_os_wait_events_complete(); acpi_device_hotplug(hpw->adev, hpw->src); kfree(hpw); } acpi_status acpi_hotplug_schedule(struct acpi_device *adev, u32 src) { struct acpi_hp_work *hpw; acpi_handle_debug(adev->handle, "Scheduling hotplug event %u for deferred handling\n", src); hpw = kmalloc_obj(*hpw); if (!hpw) return AE_NO_MEMORY; INIT_WORK(&hpw->work, acpi_hotplug_work_fn); hpw->adev = adev; hpw->src = src; /* * We can't run hotplug code in kacpid_wq/kacpid_notify_wq etc., because * the hotplug code may call driver .remove() functions, which may * invoke flush_scheduled_work()/acpi_os_wait_events_complete() to flush * these workqueues. */ if (!queue_work(kacpi_hotplug_wq, &hpw->work)) { kfree(hpw); return AE_ERROR; } return AE_OK; } bool acpi_queue_hotplug_work(struct work_struct *work) { return queue_work(kacpi_hotplug_wq, work); } acpi_status acpi_os_create_semaphore(u32 max_units, u32 initial_units, acpi_handle *handle) { struct semaphore *sem = NULL; sem = acpi_os_allocate_zeroed(sizeof(struct semaphore)); if (!sem) return AE_NO_MEMORY; sema_init(sem, initial_units); *handle = (acpi_handle *) sem; ACPI_DEBUG_PRINT((ACPI_DB_MUTEX, "Creating semaphore[%p|%d].\n", *handle, initial_units)); return AE_OK; } /* * TODO: A better way to delete semaphores? Linux doesn't have a * 'delete_semaphore()' function -- may result in an invalid * pointer dereference for non-synchronized consumers. Should * we at least check for blocked threads and signal/cancel them? */ acpi_status acpi_os_delete_semaphore(acpi_handle handle) { struct semaphore *sem = (struct semaphore *)handle; if (!sem) return AE_BAD_PARAMETER; ACPI_DEBUG_PRINT((ACPI_DB_MUTEX, "Deleting semaphore[%p].\n", handle)); BUG_ON(!list_empty(&sem->wait_list)); kfree(sem); sem = NULL; return AE_OK; } /* * TODO: Support for units > 1? */ acpi_status acpi_os_wait_semaphore(acpi_handle handle, u32 units, u16 timeout) { acpi_status status = AE_OK; struct semaphore *sem = (struct semaphore *)handle; long jiffies; int ret = 0; if (!acpi_os_initialized) return AE_OK; if (!sem || (units < 1)) return AE_BAD_PARAMETER; if (units > 1) return AE_SUPPORT; ACPI_DEBUG_PRINT((ACPI_DB_MUTEX, "Waiting for semaphore[%p|%d|%d]\n", handle, units, timeout)); if (timeout == ACPI_WAIT_FOREVER) jiffies = MAX_SCHEDULE_TIMEOUT; else jiffies = msecs_to_jiffies(timeout); ret = down_timeout(sem, jiffies); if (ret) status = AE_TIME; if (ACPI_FAILURE(status)) { ACPI_DEBUG_PRINT((ACPI_DB_MUTEX, "Failed to acquire semaphore[%p|%d|%d], %s", handle, units, timeout, acpi_format_exception(status))); } else { ACPI_DEBUG_PRINT((ACPI_DB_MUTEX, "Acquired semaphore[%p|%d|%d]", handle, units, timeout)); } return status; } /* * TODO: Support for units > 1? */ acpi_status acpi_os_signal_semaphore(acpi_handle handle, u32 units) { struct semaphore *sem = (struct semaphore *)handle; if (!acpi_os_initialized) return AE_OK; if (!sem || (units < 1)) return AE_BAD_PARAMETER; if (units > 1) return AE_SUPPORT; ACPI_DEBUG_PRINT((ACPI_DB_MUTEX, "Signaling semaphore[%p|%d]\n", handle, units)); up(sem); return AE_OK; } acpi_status acpi_os_get_line(char *buffer, u32 buffer_length, u32 *bytes_read) { #ifdef ENABLE_DEBUGGER if (acpi_in_debugger) { u32 chars; kdb_read(buffer, buffer_length); /* remove the CR kdb includes */ chars = strlen(buffer) - 1; buffer[chars] = '\0'; } #else int ret; ret = acpi_debugger_read_cmd(buffer, buffer_length); if (ret < 0) return AE_ERROR; if (bytes_read) *bytes_read = ret; #endif return AE_OK; } EXPORT_SYMBOL(acpi_os_get_line); acpi_status acpi_os_wait_command_ready(void) { int ret; ret = acpi_debugger_wait_command_ready(); if (ret < 0) return AE_ERROR; return AE_OK; } acpi_status acpi_os_notify_command_complete(void) { int ret; ret = acpi_debugger_notify_command_complete(); if (ret < 0) return AE_ERROR; return AE_OK; } acpi_status acpi_os_signal(u32 function, void *info) { switch (function) { case ACPI_SIGNAL_FATAL: pr_err("Fatal opcode executed\n"); break; case ACPI_SIGNAL_BREAKPOINT: /* * AML Breakpoint * ACPI spec. says to treat it as a NOP unless * you are debugging. So if/when we integrate * AML debugger into the kernel debugger its * hook will go here. But until then it is * not useful to print anything on breakpoints. */ break; default: break; } return AE_OK; } static int __init acpi_os_name_setup(char *str) { char *p = acpi_os_name; int count = ACPI_MAX_OVERRIDE_LEN - 1; if (!str || !*str) return 0; for (; count-- && *str; str++) { if (isalnum(*str) || *str == ' ' || *str == ':') *p++ = *str; else if (*str == '\'' || *str == '"') continue; else break; } *p = 0; return 1; } __setup("acpi_os_name=", acpi_os_name_setup); /* * Disable the auto-serialization of named objects creation methods. * * This feature is enabled by default. It marks the AML control methods * that contain the opcodes to create named objects as "Serialized". */ static int __init acpi_no_auto_serialize_setup(char *str) { acpi_gbl_auto_serialize_methods = FALSE; pr_info("Auto-serialization disabled\n"); return 1; } __setup("acpi_no_auto_serialize", acpi_no_auto_serialize_setup); /* Check of resource interference between native drivers and ACPI * OperationRegions (SystemIO and System Memory only). * IO ports and memory declared in ACPI might be used by the ACPI subsystem * in arbitrary AML code and can interfere with legacy drivers. * acpi_enforce_resources= can be set to: * * - strict (default) (2) * -> further driver trying to access the resources will not load * - lax (1) * -> further driver trying to access the resources will load, but you * get a system message that something might go wrong... * * - no (0) * -> ACPI Operation Region resources will not be registered * */ #define ENFORCE_RESOURCES_STRICT 2 #define ENFORCE_RESOURCES_LAX 1 #define ENFORCE_RESOURCES_NO 0 static unsigned int acpi_enforce_resources = ENFORCE_RESOURCES_STRICT; static int __init acpi_enforce_resources_setup(char *str) { if (str == NULL || *str == '\0') return 0; if (!strcmp("strict", str)) acpi_enforce_resources = ENFORCE_RESOURCES_STRICT; else if (!strcmp("lax", str)) acpi_enforce_resources = ENFORCE_RESOURCES_LAX; else if (!strcmp("no", str)) acpi_enforce_resources = ENFORCE_RESOURCES_NO; return 1; } __setup("acpi_enforce_resources=", acpi_enforce_resources_setup); /* Check for resource conflicts between ACPI OperationRegions and native * drivers */ int acpi_check_resource_conflict(const struct resource *res) { acpi_adr_space_type space_id; if (acpi_enforce_resources == ENFORCE_RESOURCES_NO) return 0; if (res->flags & IORESOURCE_IO) space_id = ACPI_ADR_SPACE_SYSTEM_IO; else if (res->flags & IORESOURCE_MEM) space_id = ACPI_ADR_SPACE_SYSTEM_MEMORY; else return 0; if (!acpi_check_address_range(space_id, res->start, resource_size(res), 1)) return 0; pr_info("Resource conflict; ACPI support missing from driver?\n"); if (acpi_enforce_resources == ENFORCE_RESOURCES_STRICT) return -EBUSY; if (acpi_enforce_resources == ENFORCE_RESOURCES_LAX) pr_notice("Resource conflict: System may be unstable or behave erratically\n"); return 0; } EXPORT_SYMBOL(acpi_check_resource_conflict); int acpi_check_region(resource_size_t start, resource_size_t n, const char *name) { struct resource res = DEFINE_RES_IO_NAMED(start, n, name); return acpi_check_resource_conflict(&res); } EXPORT_SYMBOL(acpi_check_region); /* * Let drivers know whether the resource checks are effective */ int acpi_resources_are_enforced(void) { return acpi_enforce_resources == ENFORCE_RESOURCES_STRICT; } EXPORT_SYMBOL(acpi_resources_are_enforced); /* * Deallocate the memory for a spinlock. */ void acpi_os_delete_lock(acpi_spinlock handle) { ACPI_FREE(handle); } /* * Acquire a spinlock. * * handle is a pointer to the spinlock_t. */ acpi_cpu_flags acpi_os_acquire_lock(acpi_spinlock lockp) __acquires(lockp) { spin_lock(lockp); return 0; } /* * Release a spinlock. See above. */ void acpi_os_release_lock(acpi_spinlock lockp, acpi_cpu_flags not_used) __releases(lockp) { spin_unlock(lockp); } #ifndef ACPI_USE_LOCAL_CACHE /******************************************************************************* * * FUNCTION: acpi_os_create_cache * * PARAMETERS: name - Ascii name for the cache * size - Size of each cached object * depth - Maximum depth of the cache (in objects) <ignored> * cache - Where the new cache object is returned * * RETURN: status * * DESCRIPTION: Create a cache object * ******************************************************************************/ acpi_status acpi_os_create_cache(char *name, u16 size, u16 depth, acpi_cache_t **cache) { *cache = kmem_cache_create(name, size, 0, 0, NULL); if (*cache == NULL) return AE_ERROR; else return AE_OK; } /******************************************************************************* * * FUNCTION: acpi_os_purge_cache * * PARAMETERS: Cache - Handle to cache object * * RETURN: Status * * DESCRIPTION: Free all objects within the requested cache. * ******************************************************************************/ acpi_status acpi_os_purge_cache(acpi_cache_t *cache) { kmem_cache_shrink(cache); return AE_OK; } /******************************************************************************* * * FUNCTION: acpi_os_delete_cache * * PARAMETERS: Cache - Handle to cache object * * RETURN: Status * * DESCRIPTION: Free all objects within the requested cache and delete the * cache object. * ******************************************************************************/ acpi_status acpi_os_delete_cache(acpi_cache_t *cache) { kmem_cache_destroy(cache); return AE_OK; } /******************************************************************************* * * FUNCTION: acpi_os_release_object * * PARAMETERS: Cache - Handle to cache object * Object - The object to be released * * RETURN: None * * DESCRIPTION: Release an object to the specified cache. If cache is full, * the object is deleted. * ******************************************************************************/ acpi_status acpi_os_release_object(acpi_cache_t *cache, void *object) { kmem_cache_free(cache, object); return AE_OK; } #endif static int __init acpi_no_static_ssdt_setup(char *s) { acpi_gbl_disable_ssdt_table_install = TRUE; pr_info("Static SSDT installation disabled\n"); return 0; } early_param("acpi_no_static_ssdt", acpi_no_static_ssdt_setup); static int __init acpi_disable_return_repair(char *s) { pr_notice("Predefined validation mechanism disabled\n"); acpi_gbl_disable_auto_repair = TRUE; return 1; } __setup("acpica_no_return_repair", acpi_disable_return_repair); acpi_status __init acpi_os_initialize(void) { acpi_os_map_generic_address(&acpi_gbl_FADT.xpm1a_event_block); acpi_os_map_generic_address(&acpi_gbl_FADT.xpm1b_event_block); acpi_gbl_xgpe0_block_logical_address = (unsigned long)acpi_os_map_generic_address(&acpi_gbl_FADT.xgpe0_block); acpi_gbl_xgpe1_block_logical_address = (unsigned long)acpi_os_map_generic_address(&acpi_gbl_FADT.xgpe1_block); if (acpi_gbl_FADT.flags & ACPI_FADT_RESET_REGISTER) { /* * Use acpi_os_map_generic_address to pre-map the reset * register if it's in system memory. */ void *rv; rv = acpi_os_map_generic_address(&acpi_gbl_FADT.reset_register); pr_debug("%s: Reset register mapping %s\n", __func__, rv ? "successful" : "failed"); } acpi_os_initialized = true; return AE_OK; } acpi_status __init acpi_os_initialize1(void) { kacpid_wq = alloc_workqueue("kacpid", WQ_PERCPU, 1); kacpi_notify_wq = alloc_workqueue("kacpi_notify", WQ_PERCPU, 0); kacpi_hotplug_wq = alloc_ordered_workqueue("kacpi_hotplug", 0); BUG_ON(!kacpid_wq); BUG_ON(!kacpi_notify_wq); BUG_ON(!kacpi_hotplug_wq); acpi_osi_init(); return AE_OK; } acpi_status acpi_os_terminate(void) { if (acpi_irq_handler) { acpi_os_remove_interrupt_handler(acpi_gbl_FADT.sci_interrupt, acpi_irq_handler); } acpi_os_unmap_generic_address(&acpi_gbl_FADT.xgpe1_block); acpi_os_unmap_generic_address(&acpi_gbl_FADT.xgpe0_block); acpi_gbl_xgpe0_block_logical_address = 0UL; acpi_gbl_xgpe1_block_logical_address = 0UL; acpi_os_unmap_generic_address(&acpi_gbl_FADT.xpm1b_event_block); acpi_os_unmap_generic_address(&acpi_gbl_FADT.xpm1a_event_block); if (acpi_gbl_FADT.flags & ACPI_FADT_RESET_REGISTER) acpi_os_unmap_generic_address(&acpi_gbl_FADT.reset_register); destroy_workqueue(kacpid_wq); destroy_workqueue(kacpi_notify_wq); destroy_workqueue(kacpi_hotplug_wq); return AE_OK; } acpi_status acpi_os_prepare_sleep(u8 sleep_state, u32 pm1a_control, u32 pm1b_control) { int rc = 0; if (__acpi_os_prepare_sleep) rc = __acpi_os_prepare_sleep(sleep_state, pm1a_control, pm1b_control); if (rc < 0) return AE_ERROR; else if (rc > 0) return AE_CTRL_TERMINATE; return AE_OK; } void acpi_os_set_prepare_sleep(int (*func)(u8 sleep_state, u32 pm1a_ctrl, u32 pm1b_ctrl)) { __acpi_os_prepare_sleep = func; } #if (ACPI_REDUCED_HARDWARE) acpi_status acpi_os_prepare_extended_sleep(u8 sleep_state, u32 val_a, u32 val_b) { int rc = 0; if (__acpi_os_prepare_extended_sleep) rc = __acpi_os_prepare_extended_sleep(sleep_state, val_a, val_b); if (rc < 0) return AE_ERROR; else if (rc > 0) return AE_CTRL_TERMINATE; return AE_OK; } #else acpi_status acpi_os_prepare_extended_sleep(u8 sleep_state, u32 val_a, u32 val_b) { return AE_OK; } #endif void acpi_os_set_prepare_extended_sleep(int (*func)(u8 sleep_state, u32 val_a, u32 val_b)) { __acpi_os_prepare_extended_sleep = func; } acpi_status acpi_os_enter_sleep(u8 sleep_state, u32 reg_a_value, u32 reg_b_value) { acpi_status status; if (acpi_gbl_reduced_hardware) status = acpi_os_prepare_extended_sleep(sleep_state, reg_a_value, reg_b_value); else status = acpi_os_prepare_sleep(sleep_state, reg_a_value, reg_b_value); return status; } |
| 16 1 1 1 30 30 30 27 3 1 1 1 27 1 1 1 1 1 1 1 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 | // SPDX-License-Identifier: GPL-2.0-only /* * kvm asynchronous fault support * * Copyright 2010 Red Hat, Inc. * * Author: * Gleb Natapov <gleb@redhat.com> */ #include <linux/kvm_host.h> #include <linux/slab.h> #include <linux/module.h> #include <linux/mmu_context.h> #include <linux/sched/mm.h> #include "async_pf.h" #include <trace/events/kvm.h> static struct kmem_cache *async_pf_cache; int kvm_async_pf_init(void) { async_pf_cache = KMEM_CACHE(kvm_async_pf, 0); if (!async_pf_cache) return -ENOMEM; return 0; } void kvm_async_pf_deinit(void) { kmem_cache_destroy(async_pf_cache); async_pf_cache = NULL; } void kvm_async_pf_vcpu_init(struct kvm_vcpu *vcpu) { INIT_LIST_HEAD(&vcpu->async_pf.done); INIT_LIST_HEAD(&vcpu->async_pf.queue); spin_lock_init(&vcpu->async_pf.lock); } static void async_pf_execute(struct work_struct *work) { struct kvm_async_pf *apf = container_of(work, struct kvm_async_pf, work); struct kvm_vcpu *vcpu = apf->vcpu; struct mm_struct *mm = vcpu->kvm->mm; unsigned long addr = apf->addr; gpa_t cr2_or_gpa = apf->cr2_or_gpa; int locked = 1; bool first; might_sleep(); /* * Attempt to pin the VM's host address space, and simply skip gup() if * acquiring a pin fail, i.e. if the process is exiting. Note, KVM * holds a reference to its associated mm_struct until the very end of * kvm_destroy_vm(), i.e. the struct itself won't be freed before this * work item is fully processed. */ if (mmget_not_zero(mm)) { mmap_read_lock(mm); get_user_pages_remote(mm, addr, 1, FOLL_WRITE, NULL, &locked); if (locked) mmap_read_unlock(mm); mmput(mm); } /* * Notify and kick the vCPU even if faulting in the page failed, e.g. * so that the vCPU can retry the fault synchronously. */ if (IS_ENABLED(CONFIG_KVM_ASYNC_PF_SYNC)) kvm_arch_async_page_present(vcpu, apf); spin_lock(&vcpu->async_pf.lock); first = list_empty(&vcpu->async_pf.done); list_add_tail(&apf->link, &vcpu->async_pf.done); spin_unlock(&vcpu->async_pf.lock); /* * The apf struct may be freed by kvm_check_async_pf_completion() as * soon as the lock is dropped. Nullify it to prevent improper usage. */ apf = NULL; if (!IS_ENABLED(CONFIG_KVM_ASYNC_PF_SYNC) && first) kvm_arch_async_page_present_queued(vcpu); trace_kvm_async_pf_completed(addr, cr2_or_gpa); __kvm_vcpu_wake_up(vcpu); } static void kvm_flush_and_free_async_pf_work(struct kvm_async_pf *work) { /* * The async #PF is "done", but KVM must wait for the work item itself, * i.e. async_pf_execute(), to run to completion. If KVM is a module, * KVM must ensure *no* code owned by the KVM (the module) can be run * after the last call to module_put(). Note, flushing the work item * is always required when the item is taken off the completion queue. * E.g. even if the vCPU handles the item in the "normal" path, the VM * could be terminated before async_pf_execute() completes. * * Wake all events skip the queue and go straight done, i.e. don't * need to be flushed (but sanity check that the work wasn't queued). */ if (work->wakeup_all) WARN_ON_ONCE(work->work.func); else flush_work(&work->work); kmem_cache_free(async_pf_cache, work); } void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu) { /* cancel outstanding work queue item */ while (!list_empty(&vcpu->async_pf.queue)) { struct kvm_async_pf *work = list_first_entry(&vcpu->async_pf.queue, typeof(*work), queue); list_del(&work->queue); #ifdef CONFIG_KVM_ASYNC_PF_SYNC flush_work(&work->work); #else if (cancel_work_sync(&work->work)) kmem_cache_free(async_pf_cache, work); #endif } spin_lock(&vcpu->async_pf.lock); while (!list_empty(&vcpu->async_pf.done)) { struct kvm_async_pf *work = list_first_entry(&vcpu->async_pf.done, typeof(*work), link); list_del(&work->link); spin_unlock(&vcpu->async_pf.lock); kvm_flush_and_free_async_pf_work(work); spin_lock(&vcpu->async_pf.lock); } spin_unlock(&vcpu->async_pf.lock); vcpu->async_pf.queued = 0; } void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu) { struct kvm_async_pf *work; while (!list_empty_careful(&vcpu->async_pf.done) && kvm_arch_can_dequeue_async_page_present(vcpu)) { spin_lock(&vcpu->async_pf.lock); work = list_first_entry(&vcpu->async_pf.done, typeof(*work), link); list_del(&work->link); spin_unlock(&vcpu->async_pf.lock); kvm_arch_async_page_ready(vcpu, work); if (!IS_ENABLED(CONFIG_KVM_ASYNC_PF_SYNC)) kvm_arch_async_page_present(vcpu, work); list_del(&work->queue); vcpu->async_pf.queued--; kvm_flush_and_free_async_pf_work(work); } } /* * Try to schedule a job to handle page fault asynchronously. Returns 'true' on * success, 'false' on failure (page fault has to be handled synchronously). */ bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, unsigned long hva, struct kvm_arch_async_pf *arch) { struct kvm_async_pf *work; if (vcpu->async_pf.queued >= ASYNC_PF_PER_VCPU) return false; /* Arch specific code should not do async PF in this case */ if (unlikely(kvm_is_error_hva(hva))) return false; /* * do alloc nowait since if we are going to sleep anyway we * may as well sleep faulting in page */ work = kmem_cache_zalloc(async_pf_cache, GFP_NOWAIT); if (!work) return false; work->wakeup_all = false; work->vcpu = vcpu; work->cr2_or_gpa = cr2_or_gpa; work->addr = hva; work->arch = *arch; INIT_WORK(&work->work, async_pf_execute); list_add_tail(&work->queue, &vcpu->async_pf.queue); vcpu->async_pf.queued++; work->notpresent_injected = kvm_arch_async_page_not_present(vcpu, work); schedule_work(&work->work); return true; } int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu) { struct kvm_async_pf *work; bool first; if (!list_empty_careful(&vcpu->async_pf.done)) return 0; work = kmem_cache_zalloc(async_pf_cache, GFP_ATOMIC); if (!work) return -ENOMEM; work->wakeup_all = true; INIT_LIST_HEAD(&work->queue); /* for list_del to work */ spin_lock(&vcpu->async_pf.lock); first = list_empty(&vcpu->async_pf.done); list_add_tail(&work->link, &vcpu->async_pf.done); spin_unlock(&vcpu->async_pf.lock); if (!IS_ENABLED(CONFIG_KVM_ASYNC_PF_SYNC) && first) kvm_arch_async_page_present_queued(vcpu); vcpu->async_pf.queued++; return 0; } |
| 20 14 37 37 22 14 7 15 2 11 2 9 2 7 7 3 3 7 11 10 11 1 12 9 3 15 11 5 4 5 11 5 6 11 10 5 6 7 4 3 3 3 3 3 18 17 2 7 2 17 18 18 18 18 18 17 16 18 17 1 1 1 16 17 16 1 1 17 16 16 10 15 15 15 15 14 14 15 3 12 3 15 11 3 3 3 1 1 1 1 1 3 17 47 47 27 7 7 33 33 17 17 6 14 12 4 3 1 2 1 2 1 22 19 5 5 5 4 5 5 5 5 5 5 5 6 4 3 1 3 5 5 4 5 6 5 2 5 9 2 5 2 3 4 2 2 2 6 2 4 4 9 9 4 5 5 9 9 10 1 10 9 9 9 10 7 12 7 2 6 6 5 2 4 4 11 2 2 1 2 4 2 7 3 4 11 12 1 16 23 5 5 4 5 5 5 5 4 4 2 4 4 4 2 4 4 2 3 9 4 5 4 5 5 5 5 5 4 4 4 4 5 5 1 4 4 5 15 11 8 8 7 8 8 8 5 1 7 2 6 8 8 1 8 7 6 6 5 6 6 6 4 4 6 5 6 6 4 7 4 4 4 4 4 4 4 4 4 2 2 1 2 1 2 2 1 2 1 1 1 2 2 2 8 1 7 7 8 1 7 1 1 1 1 4 4 4 3 2 2 2 2 1 1 1 3 4 4 3 4 1 1 1 1 1 11 1 5 11 11 9 8 9 9 9 8 5 5 9 9 9 8 9 7 9 7 7 7 2 2 2 2 7 1 1 1 1 8 7 7 4 4 4 1 4 3 1 2 3 3 7 4 7 7 7 4 1 3 9 8 9 9 9 7 4 4 6 6 2 8 2 1 1 2 1 1 1 1 6 2 6 4 1 1 2 1 2 1 1 2 2 6 1 1 1 2 2 2 1 1 1 2 2 1 2 1 1 1 1 1 1 1 1 1 11 11 6 4 11 3 3 1 1 1 2 2 2 1 2 2 6 2 4 4 4 6 3 3 1 3 2 1 1 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 2447 2448 2449 2450 2451 2452 2453 2454 2455 2456 2457 2458 2459 2460 2461 2462 2463 2464 2465 2466 2467 2468 2469 2470 2471 2472 2473 2474 2475 2476 2477 2478 2479 2480 2481 2482 2483 2484 2485 2486 2487 2488 2489 2490 2491 2492 2493 2494 2495 2496 2497 2498 2499 2500 2501 2502 2503 2504 2505 2506 2507 2508 2509 2510 2511 2512 2513 2514 2515 2516 2517 2518 2519 2520 2521 2522 2523 2524 2525 2526 2527 2528 2529 2530 2531 2532 2533 2534 2535 2536 2537 2538 2539 2540 2541 2542 2543 2544 2545 2546 2547 2548 2549 2550 2551 2552 2553 2554 2555 2556 2557 2558 2559 2560 2561 2562 2563 2564 2565 2566 2567 2568 2569 2570 2571 2572 2573 2574 2575 2576 2577 2578 2579 2580 2581 2582 2583 2584 2585 2586 2587 2588 2589 2590 2591 2592 2593 2594 2595 2596 2597 2598 2599 2600 2601 2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658 2659 2660 2661 2662 2663 2664 2665 2666 2667 2668 2669 2670 2671 2672 2673 2674 2675 2676 2677 2678 2679 2680 2681 2682 2683 2684 2685 2686 2687 2688 2689 2690 2691 2692 2693 2694 2695 2696 2697 2698 2699 2700 2701 2702 2703 2704 2705 2706 2707 2708 2709 2710 2711 2712 2713 2714 2715 2716 2717 2718 2719 2720 2721 2722 2723 2724 2725 2726 2727 2728 2729 2730 2731 2732 2733 2734 2735 2736 2737 | // SPDX-License-Identifier: GPL-2.0-only /* Copyright (c) 2011-2014 PLUMgrid, http://plumgrid.com * Copyright (c) 2016 Facebook */ #include <linux/bpf.h> #include <linux/btf.h> #include <linux/jhash.h> #include <linux/filter.h> #include <linux/rculist_nulls.h> #include <linux/rcupdate_wait.h> #include <linux/random.h> #include <uapi/linux/btf.h> #include <linux/rcupdate_trace.h> #include <linux/btf_ids.h> #include "percpu_freelist.h" #include "bpf_lru_list.h" #include "map_in_map.h" #include <linux/bpf_mem_alloc.h> #include <asm/rqspinlock.h> #define HTAB_CREATE_FLAG_MASK \ (BPF_F_NO_PREALLOC | BPF_F_NO_COMMON_LRU | BPF_F_NUMA_NODE | \ BPF_F_ACCESS_MASK | BPF_F_ZERO_SEED) #define BATCH_OPS(_name) \ .map_lookup_batch = \ _name##_map_lookup_batch, \ .map_lookup_and_delete_batch = \ _name##_map_lookup_and_delete_batch, \ .map_update_batch = \ generic_map_update_batch, \ .map_delete_batch = \ generic_map_delete_batch /* * The bucket lock has two protection scopes: * * 1) Serializing concurrent operations from BPF programs on different * CPUs * * 2) Serializing concurrent operations from BPF programs and sys_bpf() * * BPF programs can execute in any context including perf, kprobes and * tracing. As there are almost no limits where perf, kprobes and tracing * can be invoked from the lock operations need to be protected against * deadlocks. Deadlocks can be caused by recursion and by an invocation in * the lock held section when functions which acquire this lock are invoked * from sys_bpf(). BPF recursion is prevented by incrementing the per CPU * variable bpf_prog_active, which prevents BPF programs attached to perf * events, kprobes and tracing to be invoked before the prior invocation * from one of these contexts completed. sys_bpf() uses the same mechanism * by pinning the task to the current CPU and incrementing the recursion * protection across the map operation. * * This has subtle implications on PREEMPT_RT. PREEMPT_RT forbids certain * operations like memory allocations (even with GFP_ATOMIC) from atomic * contexts. This is required because even with GFP_ATOMIC the memory * allocator calls into code paths which acquire locks with long held lock * sections. To ensure the deterministic behaviour these locks are regular * spinlocks, which are converted to 'sleepable' spinlocks on RT. The only * true atomic contexts on an RT kernel are the low level hardware * handling, scheduling, low level interrupt handling, NMIs etc. None of * these contexts should ever do memory allocations. * * As regular device interrupt handlers and soft interrupts are forced into * thread context, the existing code which does * spin_lock*(); alloc(GFP_ATOMIC); spin_unlock*(); * just works. * * In theory the BPF locks could be converted to regular spinlocks as well, * but the bucket locks and percpu_freelist locks can be taken from * arbitrary contexts (perf, kprobes, tracepoints) which are required to be * atomic contexts even on RT. Before the introduction of bpf_mem_alloc, * it is only safe to use raw spinlock for preallocated hash map on a RT kernel, * because there is no memory allocation within the lock held sections. However * after hash map was fully converted to use bpf_mem_alloc, there will be * non-synchronous memory allocation for non-preallocated hash map, so it is * safe to always use raw spinlock for bucket lock. */ struct bucket { struct hlist_nulls_head head; rqspinlock_t raw_lock; }; struct bpf_htab { struct bpf_map map; struct bpf_mem_alloc ma; struct bpf_mem_alloc pcpu_ma; struct bucket *buckets; void *elems; union { struct pcpu_freelist freelist; struct bpf_lru lru; }; struct htab_elem *__percpu *extra_elems; /* number of elements in non-preallocated hashtable are kept * in either pcount or count */ struct percpu_counter pcount; atomic_t count; bool use_percpu_counter; u32 n_buckets; /* number of hash buckets */ u32 elem_size; /* size of each element in bytes */ u32 hashrnd; }; /* each htab element is struct htab_elem + key + value */ struct htab_elem { union { struct hlist_nulls_node hash_node; struct { void *padding; union { struct pcpu_freelist_node fnode; struct htab_elem *batch_flink; }; }; }; union { /* pointer to per-cpu pointer */ void *ptr_to_pptr; struct bpf_lru_node lru_node; }; u32 hash; char key[] __aligned(8); }; struct htab_btf_record { struct btf_record *record; u32 key_size; }; static inline bool htab_is_prealloc(const struct bpf_htab *htab) { return !(htab->map.map_flags & BPF_F_NO_PREALLOC); } static void htab_init_buckets(struct bpf_htab *htab) { unsigned int i; for (i = 0; i < htab->n_buckets; i++) { INIT_HLIST_NULLS_HEAD(&htab->buckets[i].head, i); raw_res_spin_lock_init(&htab->buckets[i].raw_lock); cond_resched(); } } static inline int htab_lock_bucket(struct bucket *b, unsigned long *pflags) { unsigned long flags; int ret; ret = raw_res_spin_lock_irqsave(&b->raw_lock, flags); if (ret) return ret; *pflags = flags; return 0; } static inline void htab_unlock_bucket(struct bucket *b, unsigned long flags) { raw_res_spin_unlock_irqrestore(&b->raw_lock, flags); } static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node); static bool htab_is_lru(const struct bpf_htab *htab) { return htab->map.map_type == BPF_MAP_TYPE_LRU_HASH || htab->map.map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH; } static bool htab_is_percpu(const struct bpf_htab *htab) { return htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH || htab->map.map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH; } static inline bool is_fd_htab(const struct bpf_htab *htab) { return htab->map.map_type == BPF_MAP_TYPE_HASH_OF_MAPS; } static inline void *htab_elem_value(struct htab_elem *l, u32 key_size) { return l->key + round_up(key_size, 8); } static inline void htab_elem_set_ptr(struct htab_elem *l, u32 key_size, void __percpu *pptr) { *(void __percpu **)htab_elem_value(l, key_size) = pptr; } static inline void __percpu *htab_elem_get_ptr(struct htab_elem *l, u32 key_size) { return *(void __percpu **)htab_elem_value(l, key_size); } static void *fd_htab_map_get_ptr(const struct bpf_map *map, struct htab_elem *l) { return *(void **)htab_elem_value(l, map->key_size); } static struct htab_elem *get_htab_elem(struct bpf_htab *htab, int i) { return (struct htab_elem *) (htab->elems + i * (u64)htab->elem_size); } /* Both percpu and fd htab support in-place update, so no need for * extra elem. LRU itself can remove the least used element, so * there is no need for an extra elem during map_update. */ static bool htab_has_extra_elems(struct bpf_htab *htab) { return !htab_is_percpu(htab) && !htab_is_lru(htab) && !is_fd_htab(htab); } static void htab_free_prealloced_internal_structs(struct bpf_htab *htab) { u32 num_entries = htab->map.max_entries; int i; if (htab_has_extra_elems(htab)) num_entries += num_possible_cpus(); for (i = 0; i < num_entries; i++) { struct htab_elem *elem; elem = get_htab_elem(htab, i); bpf_map_free_internal_structs(&htab->map, htab_elem_value(elem, htab->map.key_size)); cond_resched(); } } static void htab_free_prealloced_fields(struct bpf_htab *htab) { u32 num_entries = htab->map.max_entries; int i; if (IS_ERR_OR_NULL(htab->map.record)) return; if (htab_has_extra_elems(htab)) num_entries += num_possible_cpus(); for (i = 0; i < num_entries; i++) { struct htab_elem *elem; elem = get_htab_elem(htab, i); if (htab_is_percpu(htab)) { void __percpu *pptr = htab_elem_get_ptr(elem, htab->map.key_size); int cpu; for_each_possible_cpu(cpu) { bpf_obj_free_fields(htab->map.record, per_cpu_ptr(pptr, cpu)); cond_resched(); } } else { bpf_obj_free_fields(htab->map.record, htab_elem_value(elem, htab->map.key_size)); cond_resched(); } cond_resched(); } } static void htab_free_elems(struct bpf_htab *htab) { int i; if (!htab_is_percpu(htab)) goto free_elems; for (i = 0; i < htab->map.max_entries; i++) { void __percpu *pptr; pptr = htab_elem_get_ptr(get_htab_elem(htab, i), htab->map.key_size); free_percpu(pptr); cond_resched(); } free_elems: bpf_map_area_free(htab->elems); } /* The LRU list has a lock (lru_lock). Each htab bucket has a lock * (bucket_lock). If both locks need to be acquired together, the lock * order is always lru_lock -> bucket_lock and this only happens in * bpf_lru_list.c logic. For example, certain code path of * bpf_lru_pop_free(), which is called by function prealloc_lru_pop(), * will acquire lru_lock first followed by acquiring bucket_lock. * * In hashtab.c, to avoid deadlock, lock acquisition of * bucket_lock followed by lru_lock is not allowed. In such cases, * bucket_lock needs to be released first before acquiring lru_lock. */ static struct htab_elem *prealloc_lru_pop(struct bpf_htab *htab, void *key, u32 hash) { struct bpf_lru_node *node = bpf_lru_pop_free(&htab->lru, hash); struct htab_elem *l; if (node) { bpf_map_inc_elem_count(&htab->map); l = container_of(node, struct htab_elem, lru_node); memcpy(l->key, key, htab->map.key_size); return l; } return NULL; } static int prealloc_init(struct bpf_htab *htab) { u32 num_entries = htab->map.max_entries; int err = -ENOMEM, i; if (htab_has_extra_elems(htab)) num_entries += num_possible_cpus(); htab->elems = bpf_map_area_alloc((u64)htab->elem_size * num_entries, htab->map.numa_node); if (!htab->elems) return -ENOMEM; if (!htab_is_percpu(htab)) goto skip_percpu_elems; for (i = 0; i < num_entries; i++) { u32 size = round_up(htab->map.value_size, 8); void __percpu *pptr; pptr = bpf_map_alloc_percpu(&htab->map, size, 8, GFP_USER | __GFP_NOWARN); if (!pptr) goto free_elems; htab_elem_set_ptr(get_htab_elem(htab, i), htab->map.key_size, pptr); cond_resched(); } skip_percpu_elems: if (htab_is_lru(htab)) err = bpf_lru_init(&htab->lru, htab->map.map_flags & BPF_F_NO_COMMON_LRU, offsetof(struct htab_elem, hash) - offsetof(struct htab_elem, lru_node), htab_lru_map_delete_node, htab); else err = pcpu_freelist_init(&htab->freelist); if (err) goto free_elems; if (htab_is_lru(htab)) bpf_lru_populate(&htab->lru, htab->elems, offsetof(struct htab_elem, lru_node), htab->elem_size, num_entries); else pcpu_freelist_populate(&htab->freelist, htab->elems + offsetof(struct htab_elem, fnode), htab->elem_size, num_entries); return 0; free_elems: htab_free_elems(htab); return err; } static void prealloc_destroy(struct bpf_htab *htab) { htab_free_elems(htab); if (htab_is_lru(htab)) bpf_lru_destroy(&htab->lru); else pcpu_freelist_destroy(&htab->freelist); } static int alloc_extra_elems(struct bpf_htab *htab) { struct htab_elem *__percpu *pptr, *l_new; struct pcpu_freelist_node *l; int cpu; pptr = bpf_map_alloc_percpu(&htab->map, sizeof(struct htab_elem *), 8, GFP_USER | __GFP_NOWARN); if (!pptr) return -ENOMEM; for_each_possible_cpu(cpu) { l = pcpu_freelist_pop(&htab->freelist); /* pop will succeed, since prealloc_init() * preallocated extra num_possible_cpus elements */ l_new = container_of(l, struct htab_elem, fnode); *per_cpu_ptr(pptr, cpu) = l_new; } htab->extra_elems = pptr; return 0; } /* Called from syscall */ static int htab_map_alloc_check(union bpf_attr *attr) { bool percpu = (attr->map_type == BPF_MAP_TYPE_PERCPU_HASH || attr->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH); bool lru = (attr->map_type == BPF_MAP_TYPE_LRU_HASH || attr->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH); /* percpu_lru means each cpu has its own LRU list. * it is different from BPF_MAP_TYPE_PERCPU_HASH where * the map's value itself is percpu. percpu_lru has * nothing to do with the map's value. */ bool percpu_lru = (attr->map_flags & BPF_F_NO_COMMON_LRU); bool prealloc = !(attr->map_flags & BPF_F_NO_PREALLOC); bool zero_seed = (attr->map_flags & BPF_F_ZERO_SEED); int numa_node = bpf_map_attr_numa_node(attr); BUILD_BUG_ON(offsetof(struct htab_elem, fnode.next) != offsetof(struct htab_elem, hash_node.pprev)); if (zero_seed && !capable(CAP_SYS_ADMIN)) /* Guard against local DoS, and discourage production use. */ return -EPERM; if (attr->map_flags & ~HTAB_CREATE_FLAG_MASK || !bpf_map_flags_access_ok(attr->map_flags)) return -EINVAL; if (!lru && percpu_lru) return -EINVAL; if (lru && !prealloc) return -ENOTSUPP; if (numa_node != NUMA_NO_NODE && (percpu || percpu_lru)) return -EINVAL; /* check sanity of attributes. * value_size == 0 may be allowed in the future to use map as a set */ if (attr->max_entries == 0 || attr->key_size == 0 || attr->value_size == 0) return -EINVAL; if ((u64)attr->key_size + attr->value_size >= KMALLOC_MAX_SIZE - sizeof(struct htab_elem)) /* if key_size + value_size is bigger, the user space won't be * able to access the elements via bpf syscall. This check * also makes sure that the elem_size doesn't overflow and it's * kmalloc-able later in htab_map_update_elem() */ return -E2BIG; /* percpu map value size is bound by PCPU_MIN_UNIT_SIZE */ if (percpu && round_up(attr->value_size, 8) > PCPU_MIN_UNIT_SIZE) return -E2BIG; return 0; } static void htab_mem_dtor(void *obj, void *ctx) { struct htab_btf_record *hrec = ctx; struct htab_elem *elem = obj; void *map_value; if (IS_ERR_OR_NULL(hrec->record)) return; map_value = htab_elem_value(elem, hrec->key_size); bpf_obj_free_fields(hrec->record, map_value); } static void htab_pcpu_mem_dtor(void *obj, void *ctx) { void __percpu *pptr = *(void __percpu **)obj; struct htab_btf_record *hrec = ctx; int cpu; if (IS_ERR_OR_NULL(hrec->record)) return; for_each_possible_cpu(cpu) bpf_obj_free_fields(hrec->record, per_cpu_ptr(pptr, cpu)); } static void htab_dtor_ctx_free(void *ctx) { struct htab_btf_record *hrec = ctx; btf_record_free(hrec->record); kfree(ctx); } static int htab_set_dtor(struct bpf_htab *htab, void (*dtor)(void *, void *)) { u32 key_size = htab->map.key_size; struct bpf_mem_alloc *ma; struct htab_btf_record *hrec; int err; /* No need for dtors. */ if (IS_ERR_OR_NULL(htab->map.record)) return 0; hrec = kzalloc(sizeof(*hrec), GFP_KERNEL); if (!hrec) return -ENOMEM; hrec->key_size = key_size; hrec->record = btf_record_dup(htab->map.record); if (IS_ERR(hrec->record)) { err = PTR_ERR(hrec->record); kfree(hrec); return err; } ma = htab_is_percpu(htab) ? &htab->pcpu_ma : &htab->ma; bpf_mem_alloc_set_dtor(ma, dtor, htab_dtor_ctx_free, hrec); return 0; } static int htab_map_check_btf(struct bpf_map *map, const struct btf *btf, const struct btf_type *key_type, const struct btf_type *value_type) { struct bpf_htab *htab = container_of(map, struct bpf_htab, map); if (htab_is_prealloc(htab)) return 0; /* * We must set the dtor using this callback, as map's BTF record is not * populated in htab_map_alloc(), so it will always appear as NULL. */ if (htab_is_percpu(htab)) return htab_set_dtor(htab, htab_pcpu_mem_dtor); else return htab_set_dtor(htab, htab_mem_dtor); } static struct bpf_map *htab_map_alloc(union bpf_attr *attr) { bool percpu = (attr->map_type == BPF_MAP_TYPE_PERCPU_HASH || attr->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH); /* percpu_lru means each cpu has its own LRU list. * it is different from BPF_MAP_TYPE_PERCPU_HASH where * the map's value itself is percpu. percpu_lru has * nothing to do with the map's value. */ bool percpu_lru = (attr->map_flags & BPF_F_NO_COMMON_LRU); bool prealloc = !(attr->map_flags & BPF_F_NO_PREALLOC); struct bpf_htab *htab; int err; htab = bpf_map_area_alloc(sizeof(*htab), NUMA_NO_NODE); if (!htab) return ERR_PTR(-ENOMEM); bpf_map_init_from_attr(&htab->map, attr); if (percpu_lru) { /* ensure each CPU's lru list has >=1 elements. * since we are at it, make each lru list has the same * number of elements. */ htab->map.max_entries = roundup(attr->max_entries, num_possible_cpus()); if (htab->map.max_entries < attr->max_entries) htab->map.max_entries = rounddown(attr->max_entries, num_possible_cpus()); } /* hash table size must be power of 2; roundup_pow_of_two() can overflow * into UB on 32-bit arches, so check that first */ err = -E2BIG; if (htab->map.max_entries > 1UL << 31) goto free_htab; htab->n_buckets = roundup_pow_of_two(htab->map.max_entries); htab->elem_size = sizeof(struct htab_elem) + round_up(htab->map.key_size, 8); if (percpu) htab->elem_size += sizeof(void *); else htab->elem_size += round_up(htab->map.value_size, 8); /* check for u32 overflow */ if (htab->n_buckets > U32_MAX / sizeof(struct bucket)) goto free_htab; err = bpf_map_init_elem_count(&htab->map); if (err) goto free_htab; err = -ENOMEM; htab->buckets = bpf_map_area_alloc(htab->n_buckets * sizeof(struct bucket), htab->map.numa_node); if (!htab->buckets) goto free_elem_count; if (htab->map.map_flags & BPF_F_ZERO_SEED) htab->hashrnd = 0; else htab->hashrnd = get_random_u32(); htab_init_buckets(htab); /* compute_batch_value() computes batch value as num_online_cpus() * 2 * and __percpu_counter_compare() needs * htab->max_entries - cur_number_of_elems to be more than batch * num_online_cpus() * for percpu_counter to be faster than atomic_t. In practice the average bpf * hash map size is 10k, which means that a system with 64 cpus will fill * hashmap to 20% of 10k before percpu_counter becomes ineffective. Therefore * define our own batch count as 32 then 10k hash map can be filled up to 80%: * 10k - 8k > 32 _batch_ * 64 _cpus_ * and __percpu_counter_compare() will still be fast. At that point hash map * collisions will dominate its performance anyway. Assume that hash map filled * to 50+% isn't going to be O(1) and use the following formula to choose * between percpu_counter and atomic_t. */ #define PERCPU_COUNTER_BATCH 32 if (attr->max_entries / 2 > num_online_cpus() * PERCPU_COUNTER_BATCH) htab->use_percpu_counter = true; if (htab->use_percpu_counter) { err = percpu_counter_init(&htab->pcount, 0, GFP_KERNEL); if (err) goto free_map_locked; } if (prealloc) { err = prealloc_init(htab); if (err) goto free_map_locked; if (htab_has_extra_elems(htab)) { err = alloc_extra_elems(htab); if (err) goto free_prealloc; } } else { err = bpf_mem_alloc_init(&htab->ma, htab->elem_size, false); if (err) goto free_map_locked; if (percpu) { err = bpf_mem_alloc_init(&htab->pcpu_ma, round_up(htab->map.value_size, 8), true); if (err) goto free_map_locked; } } return &htab->map; free_prealloc: prealloc_destroy(htab); free_map_locked: if (htab->use_percpu_counter) percpu_counter_destroy(&htab->pcount); bpf_map_area_free(htab->buckets); bpf_mem_alloc_destroy(&htab->pcpu_ma); bpf_mem_alloc_destroy(&htab->ma); free_elem_count: bpf_map_free_elem_count(&htab->map); free_htab: bpf_map_area_free(htab); return ERR_PTR(err); } static inline u32 htab_map_hash(const void *key, u32 key_len, u32 hashrnd) { if (likely(key_len % 4 == 0)) return jhash2(key, key_len / 4, hashrnd); return jhash(key, key_len, hashrnd); } static inline struct bucket *__select_bucket(struct bpf_htab *htab, u32 hash) { return &htab->buckets[hash & (htab->n_buckets - 1)]; } static inline struct hlist_nulls_head *select_bucket(struct bpf_htab *htab, u32 hash) { return &__select_bucket(htab, hash)->head; } /* this lookup function can only be called with bucket lock taken */ static struct htab_elem *lookup_elem_raw(struct hlist_nulls_head *head, u32 hash, void *key, u32 key_size) { struct hlist_nulls_node *n; struct htab_elem *l; hlist_nulls_for_each_entry_rcu(l, n, head, hash_node) if (l->hash == hash && !memcmp(&l->key, key, key_size)) return l; return NULL; } /* can be called without bucket lock. it will repeat the loop in * the unlikely event when elements moved from one bucket into another * while link list is being walked */ static struct htab_elem *lookup_nulls_elem_raw(struct hlist_nulls_head *head, u32 hash, void *key, u32 key_size, u32 n_buckets) { struct hlist_nulls_node *n; struct htab_elem *l; again: hlist_nulls_for_each_entry_rcu(l, n, head, hash_node) if (l->hash == hash && !memcmp(&l->key, key, key_size)) return l; if (unlikely(get_nulls_value(n) != (hash & (n_buckets - 1)))) goto again; return NULL; } /* Called from syscall or from eBPF program directly, so * arguments have to match bpf_map_lookup_elem() exactly. * The return value is adjusted by BPF instructions * in htab_map_gen_lookup(). */ static void *__htab_map_lookup_elem(struct bpf_map *map, void *key) { struct bpf_htab *htab = container_of(map, struct bpf_htab, map); struct hlist_nulls_head *head; struct htab_elem *l; u32 hash, key_size; WARN_ON_ONCE(!bpf_rcu_lock_held()); key_size = map->key_size; hash = htab_map_hash(key, key_size, htab->hashrnd); head = select_bucket(htab, hash); l = lookup_nulls_elem_raw(head, hash, key, key_size, htab->n_buckets); return l; } static void *htab_map_lookup_elem(struct bpf_map *map, void *key) { struct htab_elem *l = __htab_map_lookup_elem(map, key); if (l) return htab_elem_value(l, map->key_size); return NULL; } /* inline bpf_map_lookup_elem() call. * Instead of: * bpf_prog * bpf_map_lookup_elem * map->ops->map_lookup_elem * htab_map_lookup_elem * __htab_map_lookup_elem * do: * bpf_prog * __htab_map_lookup_elem */ static int htab_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf) { struct bpf_insn *insn = insn_buf; const int ret = BPF_REG_0; BUILD_BUG_ON(!__same_type(&__htab_map_lookup_elem, (void *(*)(struct bpf_map *map, void *key))NULL)); *insn++ = BPF_EMIT_CALL(__htab_map_lookup_elem); *insn++ = BPF_JMP_IMM(BPF_JEQ, ret, 0, 1); *insn++ = BPF_ALU64_IMM(BPF_ADD, ret, offsetof(struct htab_elem, key) + round_up(map->key_size, 8)); return insn - insn_buf; } static __always_inline void *__htab_lru_map_lookup_elem(struct bpf_map *map, void *key, const bool mark) { struct htab_elem *l = __htab_map_lookup_elem(map, key); if (l) { if (mark) bpf_lru_node_set_ref(&l->lru_node); return htab_elem_value(l, map->key_size); } return NULL; } static void *htab_lru_map_lookup_elem(struct bpf_map *map, void *key) { return __htab_lru_map_lookup_elem(map, key, true); } static void *htab_lru_map_lookup_elem_sys(struct bpf_map *map, void *key) { return __htab_lru_map_lookup_elem(map, key, false); } static int htab_lru_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf) { struct bpf_insn *insn = insn_buf; const int ret = BPF_REG_0; const int ref_reg = BPF_REG_1; BUILD_BUG_ON(!__same_type(&__htab_map_lookup_elem, (void *(*)(struct bpf_map *map, void *key))NULL)); *insn++ = BPF_EMIT_CALL(__htab_map_lookup_elem); *insn++ = BPF_JMP_IMM(BPF_JEQ, ret, 0, 4); *insn++ = BPF_LDX_MEM(BPF_B, ref_reg, ret, offsetof(struct htab_elem, lru_node) + offsetof(struct bpf_lru_node, ref)); *insn++ = BPF_JMP_IMM(BPF_JNE, ref_reg, 0, 1); *insn++ = BPF_ST_MEM(BPF_B, ret, offsetof(struct htab_elem, lru_node) + offsetof(struct bpf_lru_node, ref), 1); *insn++ = BPF_ALU64_IMM(BPF_ADD, ret, offsetof(struct htab_elem, key) + round_up(map->key_size, 8)); return insn - insn_buf; } static void check_and_free_fields(struct bpf_htab *htab, struct htab_elem *elem) { if (IS_ERR_OR_NULL(htab->map.record)) return; if (htab_is_percpu(htab)) { void __percpu *pptr = htab_elem_get_ptr(elem, htab->map.key_size); int cpu; for_each_possible_cpu(cpu) bpf_obj_free_fields(htab->map.record, per_cpu_ptr(pptr, cpu)); } else { void *map_value = htab_elem_value(elem, htab->map.key_size); bpf_obj_free_fields(htab->map.record, map_value); } } /* It is called from the bpf_lru_list when the LRU needs to delete * older elements from the htab. */ static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node) { struct bpf_htab *htab = arg; struct htab_elem *l = NULL, *tgt_l; struct hlist_nulls_head *head; struct hlist_nulls_node *n; unsigned long flags; struct bucket *b; int ret; tgt_l = container_of(node, struct htab_elem, lru_node); b = __select_bucket(htab, tgt_l->hash); head = &b->head; ret = htab_lock_bucket(b, &flags); if (ret) return false; hlist_nulls_for_each_entry_rcu(l, n, head, hash_node) if (l == tgt_l) { hlist_nulls_del_rcu(&l->hash_node); bpf_map_dec_elem_count(&htab->map); break; } htab_unlock_bucket(b, flags); if (l == tgt_l) check_and_free_fields(htab, l); return l == tgt_l; } /* Called from syscall */ static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key) { struct bpf_htab *htab = container_of(map, struct bpf_htab, map); struct hlist_nulls_head *head; struct htab_elem *l, *next_l; u32 hash, key_size; int i = 0; WARN_ON_ONCE(!rcu_read_lock_held()); key_size = map->key_size; if (!key) goto find_first_elem; hash = htab_map_hash(key, key_size, htab->hashrnd); head = select_bucket(htab, hash); /* lookup the key */ l = lookup_nulls_elem_raw(head, hash, key, key_size, htab->n_buckets); if (!l) goto find_first_elem; /* key was found, get next key in the same bucket */ next_l = hlist_nulls_entry_safe(rcu_dereference_raw(hlist_nulls_next_rcu(&l->hash_node)), struct htab_elem, hash_node); if (next_l) { /* if next elem in this hash list is non-zero, just return it */ memcpy(next_key, next_l->key, key_size); return 0; } /* no more elements in this hash list, go to the next bucket */ i = hash & (htab->n_buckets - 1); i++; find_first_elem: /* iterate over buckets */ for (; i < htab->n_buckets; i++) { head = select_bucket(htab, i); /* pick first element in the bucket */ next_l = hlist_nulls_entry_safe(rcu_dereference_raw(hlist_nulls_first_rcu(head)), struct htab_elem, hash_node); if (next_l) { /* if it's not empty, just return it */ memcpy(next_key, next_l->key, key_size); return 0; } } /* iterated over all buckets and all elements */ return -ENOENT; } static void htab_elem_free(struct bpf_htab *htab, struct htab_elem *l) { check_and_free_fields(htab, l); if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH) bpf_mem_cache_free(&htab->pcpu_ma, l->ptr_to_pptr); bpf_mem_cache_free(&htab->ma, l); } static void htab_put_fd_value(struct bpf_htab *htab, struct htab_elem *l) { struct bpf_map *map = &htab->map; void *ptr; if (map->ops->map_fd_put_ptr) { ptr = fd_htab_map_get_ptr(map, l); map->ops->map_fd_put_ptr(map, ptr, true); } } static bool is_map_full(struct bpf_htab *htab) { if (htab->use_percpu_counter) return __percpu_counter_compare(&htab->pcount, htab->map.max_entries, PERCPU_COUNTER_BATCH) >= 0; return atomic_read(&htab->count) >= htab->map.max_entries; } static void inc_elem_count(struct bpf_htab *htab) { bpf_map_inc_elem_count(&htab->map); if (htab->use_percpu_counter) percpu_counter_add_batch(&htab->pcount, 1, PERCPU_COUNTER_BATCH); else atomic_inc(&htab->count); } static void dec_elem_count(struct bpf_htab *htab) { bpf_map_dec_elem_count(&htab->map); if (htab->use_percpu_counter) percpu_counter_add_batch(&htab->pcount, -1, PERCPU_COUNTER_BATCH); else atomic_dec(&htab->count); } static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l) { htab_put_fd_value(htab, l); if (htab_is_prealloc(htab)) { bpf_map_dec_elem_count(&htab->map); check_and_free_fields(htab, l); pcpu_freelist_push(&htab->freelist, &l->fnode); } else { dec_elem_count(htab); htab_elem_free(htab, l); } } static void pcpu_copy_value(struct bpf_htab *htab, void __percpu *pptr, void *value, bool onallcpus, u64 map_flags) { void *ptr; if (!onallcpus) { /* copy true value_size bytes */ ptr = this_cpu_ptr(pptr); copy_map_value(&htab->map, ptr, value); bpf_obj_free_fields(htab->map.record, ptr); } else { u32 size = round_up(htab->map.value_size, 8); void *val; int cpu; if (map_flags & BPF_F_CPU) { cpu = map_flags >> 32; ptr = per_cpu_ptr(pptr, cpu); copy_map_value(&htab->map, ptr, value); bpf_obj_free_fields(htab->map.record, ptr); return; } for_each_possible_cpu(cpu) { ptr = per_cpu_ptr(pptr, cpu); val = (map_flags & BPF_F_ALL_CPUS) ? value : value + size * cpu; copy_map_value(&htab->map, ptr, val); bpf_obj_free_fields(htab->map.record, ptr); } } } static void pcpu_init_value(struct bpf_htab *htab, void __percpu *pptr, void *value, bool onallcpus, u64 map_flags) { /* When not setting the initial value on all cpus, zero-fill element * values for other cpus. Otherwise, bpf program has no way to ensure * known initial values for cpus other than current one * (onallcpus=false always when coming from bpf prog). */ if (!onallcpus) { int current_cpu = raw_smp_processor_id(); int cpu; for_each_possible_cpu(cpu) { if (cpu == current_cpu) copy_map_value_long(&htab->map, per_cpu_ptr(pptr, cpu), value); else /* Since elem is preallocated, we cannot touch special fields */ zero_map_value(&htab->map, per_cpu_ptr(pptr, cpu)); } } else { pcpu_copy_value(htab, pptr, value, onallcpus, map_flags); } } static bool fd_htab_map_needs_adjust(const struct bpf_htab *htab) { return is_fd_htab(htab) && BITS_PER_LONG == 64; } static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key, void *value, u32 key_size, u32 hash, bool percpu, bool onallcpus, struct htab_elem *old_elem, u64 map_flags) { u32 size = htab->map.value_size; bool prealloc = htab_is_prealloc(htab); struct htab_elem *l_new, **pl_new; void __percpu *pptr; if (prealloc) { if (old_elem) { /* if we're updating the existing element, * use per-cpu extra elems to avoid freelist_pop/push */ pl_new = this_cpu_ptr(htab->extra_elems); l_new = *pl_new; *pl_new = old_elem; } else { struct pcpu_freelist_node *l; l = __pcpu_freelist_pop(&htab->freelist); if (!l) return ERR_PTR(-E2BIG); l_new = container_of(l, struct htab_elem, fnode); bpf_map_inc_elem_count(&htab->map); } } else { if (is_map_full(htab)) if (!old_elem) /* when map is full and update() is replacing * old element, it's ok to allocate, since * old element will be freed immediately. * Otherwise return an error */ return ERR_PTR(-E2BIG); inc_elem_count(htab); l_new = bpf_mem_cache_alloc(&htab->ma); if (!l_new) { l_new = ERR_PTR(-ENOMEM); goto dec_count; } } memcpy(l_new->key, key, key_size); if (percpu) { if (prealloc) { pptr = htab_elem_get_ptr(l_new, key_size); } else { /* alloc_percpu zero-fills */ void *ptr = bpf_mem_cache_alloc(&htab->pcpu_ma); if (!ptr) { bpf_mem_cache_free(&htab->ma, l_new); l_new = ERR_PTR(-ENOMEM); goto dec_count; } l_new->ptr_to_pptr = ptr; pptr = *(void __percpu **)ptr; } pcpu_init_value(htab, pptr, value, onallcpus, map_flags); if (!prealloc) htab_elem_set_ptr(l_new, key_size, pptr); } else if (fd_htab_map_needs_adjust(htab)) { size = round_up(size, 8); memcpy(htab_elem_value(l_new, key_size), value, size); } else { copy_map_value(&htab->map, htab_elem_value(l_new, key_size), value); } l_new->hash = hash; return l_new; dec_count: dec_elem_count(htab); return l_new; } static int check_flags(struct bpf_htab *htab, struct htab_elem *l_old, u64 map_flags) { if (l_old && (map_flags & ~BPF_F_LOCK) == BPF_NOEXIST) /* elem already exists */ return -EEXIST; if (!l_old && (map_flags & ~BPF_F_LOCK) == BPF_EXIST) /* elem doesn't exist, cannot update it */ return -ENOENT; return 0; } /* Called from syscall or from eBPF program */ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value, u64 map_flags) { struct bpf_htab *htab = container_of(map, struct bpf_htab, map); struct htab_elem *l_new, *l_old; struct hlist_nulls_head *head; unsigned long flags; struct bucket *b; u32 key_size, hash; int ret; if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST)) /* unknown flags */ return -EINVAL; WARN_ON_ONCE(!bpf_rcu_lock_held()); key_size = map->key_size; hash = htab_map_hash(key, key_size, htab->hashrnd); b = __select_bucket(htab, hash); head = &b->head; if (unlikely(map_flags & BPF_F_LOCK)) { if (unlikely(!btf_record_has_field(map->record, BPF_SPIN_LOCK))) return -EINVAL; /* find an element without taking the bucket lock */ l_old = lookup_nulls_elem_raw(head, hash, key, key_size, htab->n_buckets); ret = check_flags(htab, l_old, map_flags); if (ret) return ret; if (l_old) { /* grab the element lock and update value in place */ copy_map_value_locked(map, htab_elem_value(l_old, key_size), value, false); return 0; } /* fall through, grab the bucket lock and lookup again. * 99.9% chance that the element won't be found, * but second lookup under lock has to be done. */ } ret = htab_lock_bucket(b, &flags); if (ret) return ret; l_old = lookup_elem_raw(head, hash, key, key_size); ret = check_flags(htab, l_old, map_flags); if (ret) goto err; if (unlikely(l_old && (map_flags & BPF_F_LOCK))) { /* first lookup without the bucket lock didn't find the element, * but second lookup with the bucket lock found it. * This case is highly unlikely, but has to be dealt with: * grab the element lock in addition to the bucket lock * and update element in place */ copy_map_value_locked(map, htab_elem_value(l_old, key_size), value, false); ret = 0; goto err; } l_new = alloc_htab_elem(htab, key, value, key_size, hash, false, false, l_old, map_flags); if (IS_ERR(l_new)) { /* all pre-allocated elements are in use or memory exhausted */ ret = PTR_ERR(l_new); goto err; } /* add new element to the head of the list, so that * concurrent search will find it before old elem */ hlist_nulls_add_head_rcu(&l_new->hash_node, head); if (l_old) { hlist_nulls_del_rcu(&l_old->hash_node); /* l_old has already been stashed in htab->extra_elems, free * its special fields before it is available for reuse. */ if (htab_is_prealloc(htab)) check_and_free_fields(htab, l_old); } htab_unlock_bucket(b, flags); if (l_old && !htab_is_prealloc(htab)) free_htab_elem(htab, l_old); return 0; err: htab_unlock_bucket(b, flags); return ret; } static void htab_lru_push_free(struct bpf_htab *htab, struct htab_elem *elem) { check_and_free_fields(htab, elem); bpf_map_dec_elem_count(&htab->map); bpf_lru_push_free(&htab->lru, &elem->lru_node); } static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value, u64 map_flags) { struct bpf_htab *htab = container_of(map, struct bpf_htab, map); struct htab_elem *l_new, *l_old = NULL; struct hlist_nulls_head *head; unsigned long flags; struct bucket *b; u32 key_size, hash; int ret; if (unlikely(map_flags > BPF_EXIST)) /* unknown flags */ return -EINVAL; WARN_ON_ONCE(!bpf_rcu_lock_held()); key_size = map->key_size; hash = htab_map_hash(key, key_size, htab->hashrnd); b = __select_bucket(htab, hash); head = &b->head; /* For LRU, we need to alloc before taking bucket's * spinlock because getting free nodes from LRU may need * to remove older elements from htab and this removal * operation will need a bucket lock. */ l_new = prealloc_lru_pop(htab, key, hash); if (!l_new) return -ENOMEM; copy_map_value(&htab->map, htab_elem_value(l_new, map->key_size), value); ret = htab_lock_bucket(b, &flags); if (ret) goto err_lock_bucket; l_old = lookup_elem_raw(head, hash, key, key_size); ret = check_flags(htab, l_old, map_flags); if (ret) goto err; /* add new element to the head of the list, so that * concurrent search will find it before old elem */ hlist_nulls_add_head_rcu(&l_new->hash_node, head); if (l_old) { bpf_lru_node_set_ref(&l_new->lru_node); hlist_nulls_del_rcu(&l_old->hash_node); } ret = 0; err: htab_unlock_bucket(b, flags); err_lock_bucket: if (ret) htab_lru_push_free(htab, l_new); else if (l_old) htab_lru_push_free(htab, l_old); return ret; } static int htab_map_check_update_flags(bool onallcpus, u64 map_flags) { if (unlikely(!onallcpus && map_flags > BPF_EXIST)) return -EINVAL; if (unlikely(onallcpus && ((map_flags & BPF_F_LOCK) || (u32)map_flags > BPF_F_ALL_CPUS))) return -EINVAL; return 0; } static long htab_map_update_elem_in_place(struct bpf_map *map, void *key, void *value, u64 map_flags, bool percpu, bool onallcpus) { struct bpf_htab *htab = container_of(map, struct bpf_htab, map); struct htab_elem *l_new, *l_old; struct hlist_nulls_head *head; void *old_map_ptr = NULL; unsigned long flags; struct bucket *b; u32 key_size, hash; int ret; ret = htab_map_check_update_flags(onallcpus, map_flags); if (unlikely(ret)) return ret; WARN_ON_ONCE(!bpf_rcu_lock_held()); key_size = map->key_size; hash = htab_map_hash(key, key_size, htab->hashrnd); b = __select_bucket(htab, hash); head = &b->head; ret = htab_lock_bucket(b, &flags); if (ret) return ret; l_old = lookup_elem_raw(head, hash, key, key_size); ret = check_flags(htab, l_old, map_flags); if (ret) goto err; if (l_old) { /* Update value in-place */ if (percpu) { pcpu_copy_value(htab, htab_elem_get_ptr(l_old, key_size), value, onallcpus, map_flags); } else { void **inner_map_pptr = htab_elem_value(l_old, key_size); old_map_ptr = *inner_map_pptr; WRITE_ONCE(*inner_map_pptr, *(void **)value); } } else { l_new = alloc_htab_elem(htab, key, value, key_size, hash, percpu, onallcpus, NULL, map_flags); if (IS_ERR(l_new)) { ret = PTR_ERR(l_new); goto err; } hlist_nulls_add_head_rcu(&l_new->hash_node, head); } err: htab_unlock_bucket(b, flags); if (old_map_ptr) map->ops->map_fd_put_ptr(map, old_map_ptr, true); return ret; } static long __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key, void *value, u64 map_flags, bool onallcpus) { struct bpf_htab *htab = container_of(map, struct bpf_htab, map); struct htab_elem *l_new = NULL, *l_old; struct hlist_nulls_head *head; unsigned long flags; struct bucket *b; u32 key_size, hash; int ret; ret = htab_map_check_update_flags(onallcpus, map_flags); if (unlikely(ret)) return ret; WARN_ON_ONCE(!bpf_rcu_lock_held()); key_size = map->key_size; hash = htab_map_hash(key, key_size, htab->hashrnd); b = __select_bucket(htab, hash); head = &b->head; /* For LRU, we need to alloc before taking bucket's * spinlock because LRU's elem alloc may need * to remove older elem from htab and this removal * operation will need a bucket lock. */ if (map_flags != BPF_EXIST) { l_new = prealloc_lru_pop(htab, key, hash); if (!l_new) return -ENOMEM; } ret = htab_lock_bucket(b, &flags); if (ret) goto err_lock_bucket; l_old = lookup_elem_raw(head, hash, key, key_size); ret = check_flags(htab, l_old, map_flags); if (ret) goto err; if (l_old) { bpf_lru_node_set_ref(&l_old->lru_node); /* per-cpu hash map can update value in-place */ pcpu_copy_value(htab, htab_elem_get_ptr(l_old, key_size), value, onallcpus, map_flags); } else { pcpu_init_value(htab, htab_elem_get_ptr(l_new, key_size), value, onallcpus, map_flags); hlist_nulls_add_head_rcu(&l_new->hash_node, head); l_new = NULL; } ret = 0; err: htab_unlock_bucket(b, flags); err_lock_bucket: if (l_new) { bpf_map_dec_elem_count(&htab->map); bpf_lru_push_free(&htab->lru, &l_new->lru_node); } return ret; } static long htab_percpu_map_update_elem(struct bpf_map *map, void *key, void *value, u64 map_flags) { return htab_map_update_elem_in_place(map, key, value, map_flags, true, false); } static long htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key, void *value, u64 map_flags) { return __htab_lru_percpu_map_update_elem(map, key, value, map_flags, false); } /* Called from syscall or from eBPF program */ static long htab_map_delete_elem(struct bpf_map *map, void *key) { struct bpf_htab *htab = container_of(map, struct bpf_htab, map); struct hlist_nulls_head *head; struct bucket *b; struct htab_elem *l; unsigned long flags; u32 hash, key_size; int ret; WARN_ON_ONCE(!bpf_rcu_lock_held()); key_size = map->key_size; hash = htab_map_hash(key, key_size, htab->hashrnd); b = __select_bucket(htab, hash); head = &b->head; ret = htab_lock_bucket(b, &flags); if (ret) return ret; l = lookup_elem_raw(head, hash, key, key_size); if (l) hlist_nulls_del_rcu(&l->hash_node); else ret = -ENOENT; htab_unlock_bucket(b, flags); if (l) free_htab_elem(htab, l); return ret; } static long htab_lru_map_delete_elem(struct bpf_map *map, void *key) { struct bpf_htab *htab = container_of(map, struct bpf_htab, map); struct hlist_nulls_head *head; struct bucket *b; struct htab_elem *l; unsigned long flags; u32 hash, key_size; int ret; WARN_ON_ONCE(!bpf_rcu_lock_held()); key_size = map->key_size; hash = htab_map_hash(key, key_size, htab->hashrnd); b = __select_bucket(htab, hash); head = &b->head; ret = htab_lock_bucket(b, &flags); if (ret) return ret; l = lookup_elem_raw(head, hash, key, key_size); if (l) hlist_nulls_del_rcu(&l->hash_node); else ret = -ENOENT; htab_unlock_bucket(b, flags); if (l) htab_lru_push_free(htab, l); return ret; } static void delete_all_elements(struct bpf_htab *htab) { int i; /* It's called from a worker thread and migration has been disabled, * therefore, it is OK to invoke bpf_mem_cache_free() directly. */ for (i = 0; i < htab->n_buckets; i++) { struct hlist_nulls_head *head = select_bucket(htab, i); struct hlist_nulls_node *n; struct htab_elem *l; hlist_nulls_for_each_entry_safe(l, n, head, hash_node) { hlist_nulls_del_rcu(&l->hash_node); htab_elem_free(htab, l); } cond_resched(); } } static void htab_free_malloced_internal_structs(struct bpf_htab *htab) { int i; rcu_read_lock(); for (i = 0; i < htab->n_buckets; i++) { struct hlist_nulls_head *head = select_bucket(htab, i); struct hlist_nulls_node *n; struct htab_elem *l; hlist_nulls_for_each_entry(l, n, head, hash_node) { /* We only free internal structs on uref dropping to zero */ bpf_map_free_internal_structs(&htab->map, htab_elem_value(l, htab->map.key_size)); } cond_resched_rcu(); } rcu_read_unlock(); } static void htab_map_free_internal_structs(struct bpf_map *map) { struct bpf_htab *htab = container_of(map, struct bpf_htab, map); /* We only free internal structs on uref dropping to zero */ if (!bpf_map_has_internal_structs(map)) return; if (htab_is_prealloc(htab)) htab_free_prealloced_internal_structs(htab); else htab_free_malloced_internal_structs(htab); } /* Called when map->refcnt goes to zero, either from workqueue or from syscall */ static void htab_map_free(struct bpf_map *map) { struct bpf_htab *htab = container_of(map, struct bpf_htab, map); /* bpf_free_used_maps() or close(map_fd) will trigger this map_free callback. * bpf_free_used_maps() is called after bpf prog is no longer executing. * There is no need to synchronize_rcu() here to protect map elements. */ /* htab no longer uses call_rcu() directly. bpf_mem_alloc does it * underneath and is responsible for waiting for callbacks to finish * during bpf_mem_alloc_destroy(). */ if (!htab_is_prealloc(htab)) { delete_all_elements(htab); } else { htab_free_prealloced_fields(htab); prealloc_destroy(htab); } bpf_map_free_elem_count(map); free_percpu(htab->extra_elems); bpf_map_area_free(htab->buckets); bpf_mem_alloc_destroy(&htab->pcpu_ma); bpf_mem_alloc_destroy(&htab->ma); if (htab->use_percpu_counter) percpu_counter_destroy(&htab->pcount); bpf_map_area_free(htab); } static void htab_map_seq_show_elem(struct bpf_map *map, void *key, struct seq_file *m) { void *value; rcu_read_lock(); value = htab_map_lookup_elem(map, key); if (!value) { rcu_read_unlock(); return; } btf_type_seq_show(map->btf, map->btf_key_type_id, key, m); seq_puts(m, ": "); btf_type_seq_show(map->btf, map->btf_value_type_id, value, m); seq_putc(m, '\n'); rcu_read_unlock(); } static int __htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key, void *value, bool is_lru_map, bool is_percpu, u64 flags) { struct bpf_htab *htab = container_of(map, struct bpf_htab, map); struct hlist_nulls_head *head; unsigned long bflags; struct htab_elem *l; u32 hash, key_size; struct bucket *b; int ret; key_size = map->key_size; hash = htab_map_hash(key, key_size, htab->hashrnd); b = __select_bucket(htab, hash); head = &b->head; ret = htab_lock_bucket(b, &bflags); if (ret) return ret; l = lookup_elem_raw(head, hash, key, key_size); if (!l) { ret = -ENOENT; goto out_unlock; } if (is_percpu) { u32 roundup_value_size = round_up(map->value_size, 8); void __percpu *pptr; int off = 0, cpu; pptr = htab_elem_get_ptr(l, key_size); for_each_possible_cpu(cpu) { copy_map_value_long(&htab->map, value + off, per_cpu_ptr(pptr, cpu)); check_and_init_map_value(&htab->map, value + off); off += roundup_value_size; } } else { void *src = htab_elem_value(l, map->key_size); if (flags & BPF_F_LOCK) copy_map_value_locked(map, value, src, true); else copy_map_value(map, value, src); /* Zeroing special fields in the temp buffer */ check_and_init_map_value(map, value); } hlist_nulls_del_rcu(&l->hash_node); out_unlock: htab_unlock_bucket(b, bflags); if (l) { if (is_lru_map) htab_lru_push_free(htab, l); else free_htab_elem(htab, l); } return ret; } static int htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key, void *value, u64 flags) { return __htab_map_lookup_and_delete_elem(map, key, value, false, false, flags); } static int htab_percpu_map_lookup_and_delete_elem(struct bpf_map *map, void *key, void *value, u64 flags) { return __htab_map_lookup_and_delete_elem(map, key, value, false, true, flags); } static int htab_lru_map_lookup_and_delete_elem(struct bpf_map *map, void *key, void *value, u64 flags) { return __htab_map_lookup_and_delete_elem(map, key, value, true, false, flags); } static int htab_lru_percpu_map_lookup_and_delete_elem(struct bpf_map *map, void *key, void *value, u64 flags) { return __htab_map_lookup_and_delete_elem(map, key, value, true, true, flags); } static int __htab_map_lookup_and_delete_batch(struct bpf_map *map, const union bpf_attr *attr, union bpf_attr __user *uattr, bool do_delete, bool is_lru_map, bool is_percpu) { struct bpf_htab *htab = container_of(map, struct bpf_htab, map); void *keys = NULL, *values = NULL, *value, *dst_key, *dst_val; void __user *uvalues = u64_to_user_ptr(attr->batch.values); void __user *ukeys = u64_to_user_ptr(attr->batch.keys); void __user *ubatch = u64_to_user_ptr(attr->batch.in_batch); u32 batch, max_count, size, bucket_size, map_id; u64 elem_map_flags, map_flags, allowed_flags; u32 bucket_cnt, total, key_size, value_size; struct htab_elem *node_to_free = NULL; struct hlist_nulls_head *head; struct hlist_nulls_node *n; unsigned long flags = 0; bool locked = false; struct htab_elem *l; struct bucket *b; int ret = 0; elem_map_flags = attr->batch.elem_flags; allowed_flags = BPF_F_LOCK; if (!do_delete && is_percpu) allowed_flags |= BPF_F_CPU; ret = bpf_map_check_op_flags(map, elem_map_flags, allowed_flags); if (ret) return ret; map_flags = attr->batch.flags; if (map_flags) return -EINVAL; max_count = attr->batch.count; if (!max_count) return 0; if (put_user(0, &uattr->batch.count)) return -EFAULT; batch = 0; if (ubatch && copy_from_user(&batch, ubatch, sizeof(batch))) return -EFAULT; if (batch >= htab->n_buckets) return -ENOENT; key_size = htab->map.key_size; value_size = htab->map.value_size; size = round_up(value_size, 8); if (is_percpu && !(elem_map_flags & BPF_F_CPU)) value_size = size * num_possible_cpus(); total = 0; /* while experimenting with hash tables with sizes ranging from 10 to * 1000, it was observed that a bucket can have up to 5 entries. */ bucket_size = 5; alloc: /* We cannot do copy_from_user or copy_to_user inside * the rcu_read_lock. Allocate enough space here. */ keys = kvmalloc_array(key_size, bucket_size, GFP_USER | __GFP_NOWARN); values = kvmalloc_array(value_size, bucket_size, GFP_USER | __GFP_NOWARN); if (!keys || !values) { ret = -ENOMEM; goto after_loop; } again: bpf_disable_instrumentation(); rcu_read_lock(); again_nocopy: dst_key = keys; dst_val = values; b = &htab->buckets[batch]; head = &b->head; /* do not grab the lock unless need it (bucket_cnt > 0). */ if (locked) { ret = htab_lock_bucket(b, &flags); if (ret) { rcu_read_unlock(); bpf_enable_instrumentation(); goto after_loop; } } bucket_cnt = 0; hlist_nulls_for_each_entry_rcu(l, n, head, hash_node) bucket_cnt++; if (bucket_cnt && !locked) { locked = true; goto again_nocopy; } if (bucket_cnt > (max_count - total)) { if (total == 0) ret = -ENOSPC; /* Note that since bucket_cnt > 0 here, it is implicit * that the locked was grabbed, so release it. */ htab_unlock_bucket(b, flags); rcu_read_unlock(); bpf_enable_instrumentation(); goto after_loop; } if (bucket_cnt > bucket_size) { bucket_size = bucket_cnt; /* Note that since bucket_cnt > 0 here, it is implicit * that the locked was grabbed, so release it. */ htab_unlock_bucket(b, flags); rcu_read_unlock(); bpf_enable_instrumentation(); kvfree(keys); kvfree(values); goto alloc; } /* Next block is only safe to run if you have grabbed the lock */ if (!locked) goto next_batch; hlist_nulls_for_each_entry_safe(l, n, head, hash_node) { memcpy(dst_key, l->key, key_size); if (is_percpu) { int off = 0, cpu; void __percpu *pptr; pptr = htab_elem_get_ptr(l, map->key_size); if (elem_map_flags & BPF_F_CPU) { cpu = elem_map_flags >> 32; copy_map_value(&htab->map, dst_val, per_cpu_ptr(pptr, cpu)); check_and_init_map_value(&htab->map, dst_val); } else { for_each_possible_cpu(cpu) { copy_map_value_long(&htab->map, dst_val + off, per_cpu_ptr(pptr, cpu)); check_and_init_map_value(&htab->map, dst_val + off); off += size; } } } else { value = htab_elem_value(l, key_size); if (is_fd_htab(htab)) { struct bpf_map **inner_map = value; /* Actual value is the id of the inner map */ map_id = map->ops->map_fd_sys_lookup_elem(*inner_map); value = &map_id; } if (elem_map_flags & BPF_F_LOCK) copy_map_value_locked(map, dst_val, value, true); else copy_map_value(map, dst_val, value); /* Zeroing special fields in the temp buffer */ check_and_init_map_value(map, dst_val); } if (do_delete) { hlist_nulls_del_rcu(&l->hash_node); /* bpf_lru_push_free() will acquire lru_lock, which * may cause deadlock. See comments in function * prealloc_lru_pop(). Let us do bpf_lru_push_free() * after releasing the bucket lock. * * For htab of maps, htab_put_fd_value() in * free_htab_elem() may acquire a spinlock with bucket * lock being held and it violates the lock rule, so * invoke free_htab_elem() after unlock as well. */ l->batch_flink = node_to_free; node_to_free = l; } dst_key += key_size; dst_val += value_size; } htab_unlock_bucket(b, flags); locked = false; while (node_to_free) { l = node_to_free; node_to_free = node_to_free->batch_flink; if (is_lru_map) htab_lru_push_free(htab, l); else free_htab_elem(htab, l); } next_batch: /* If we are not copying data, we can go to next bucket and avoid * unlocking the rcu. */ if (!bucket_cnt && (batch + 1 < htab->n_buckets)) { batch++; goto again_nocopy; } rcu_read_unlock(); bpf_enable_instrumentation(); if (bucket_cnt && (copy_to_user(ukeys + total * key_size, keys, key_size * bucket_cnt) || copy_to_user(uvalues + total * value_size, values, value_size * bucket_cnt))) { ret = -EFAULT; goto after_loop; } total += bucket_cnt; batch++; if (batch >= htab->n_buckets) { ret = -ENOENT; goto after_loop; } goto again; after_loop: if (ret == -EFAULT) goto out; /* copy # of entries and next batch */ ubatch = u64_to_user_ptr(attr->batch.out_batch); if (copy_to_user(ubatch, &batch, sizeof(batch)) || put_user(total, &uattr->batch.count)) ret = -EFAULT; out: kvfree(keys); kvfree(values); return ret; } static int htab_percpu_map_lookup_batch(struct bpf_map *map, const union bpf_attr *attr, union bpf_attr __user *uattr) { return __htab_map_lookup_and_delete_batch(map, attr, uattr, false, false, true); } static int htab_percpu_map_lookup_and_delete_batch(struct bpf_map *map, const union bpf_attr *attr, union bpf_attr __user *uattr) { return __htab_map_lookup_and_delete_batch(map, attr, uattr, true, false, true); } static int htab_map_lookup_batch(struct bpf_map *map, const union bpf_attr *attr, union bpf_attr __user *uattr) { return __htab_map_lookup_and_delete_batch(map, attr, uattr, false, false, false); } static int htab_map_lookup_and_delete_batch(struct bpf_map *map, const union bpf_attr *attr, union bpf_attr __user *uattr) { return __htab_map_lookup_and_delete_batch(map, attr, uattr, true, false, false); } static int htab_lru_percpu_map_lookup_batch(struct bpf_map *map, const union bpf_attr *attr, union bpf_attr __user *uattr) { return __htab_map_lookup_and_delete_batch(map, attr, uattr, false, true, true); } static int htab_lru_percpu_map_lookup_and_delete_batch(struct bpf_map *map, const union bpf_attr *attr, union bpf_attr __user *uattr) { return __htab_map_lookup_and_delete_batch(map, attr, uattr, true, true, true); } static int htab_lru_map_lookup_batch(struct bpf_map *map, const union bpf_attr *attr, union bpf_attr __user *uattr) { return __htab_map_lookup_and_delete_batch(map, attr, uattr, false, true, false); } static int htab_lru_map_lookup_and_delete_batch(struct bpf_map *map, const union bpf_attr *attr, union bpf_attr __user *uattr) { return __htab_map_lookup_and_delete_batch(map, attr, uattr, true, true, false); } struct bpf_iter_seq_hash_map_info { struct bpf_map *map; struct bpf_htab *htab; void *percpu_value_buf; // non-zero means percpu hash u32 bucket_id; u32 skip_elems; }; static struct htab_elem * bpf_hash_map_seq_find_next(struct bpf_iter_seq_hash_map_info *info, struct htab_elem *prev_elem) { const struct bpf_htab *htab = info->htab; u32 skip_elems = info->skip_elems; u32 bucket_id = info->bucket_id; struct hlist_nulls_head *head; struct hlist_nulls_node *n; struct htab_elem *elem; struct bucket *b; u32 i, count; if (bucket_id >= htab->n_buckets) return NULL; /* try to find next elem in the same bucket */ if (prev_elem) { /* no update/deletion on this bucket, prev_elem should be still valid * and we won't skip elements. */ n = rcu_dereference_raw(hlist_nulls_next_rcu(&prev_elem->hash_node)); elem = hlist_nulls_entry_safe(n, struct htab_elem, hash_node); if (elem) return elem; /* not found, unlock and go to the next bucket */ b = &htab->buckets[bucket_id++]; rcu_read_unlock(); skip_elems = 0; } for (i = bucket_id; i < htab->n_buckets; i++) { b = &htab->buckets[i]; rcu_read_lock(); count = 0; head = &b->head; hlist_nulls_for_each_entry_rcu(elem, n, head, hash_node) { if (count >= skip_elems) { info->bucket_id = i; info->skip_elems = count; return elem; } count++; } rcu_read_unlock(); skip_elems = 0; } info->bucket_id = i; info->skip_elems = 0; return NULL; } static void *bpf_hash_map_seq_start(struct seq_file *seq, loff_t *pos) { struct bpf_iter_seq_hash_map_info *info = seq->private; struct htab_elem *elem; elem = bpf_hash_map_seq_find_next(info, NULL); if (!elem) return NULL; if (*pos == 0) ++*pos; return elem; } static void *bpf_hash_map_seq_next(struct seq_file *seq, void *v, loff_t *pos) { struct bpf_iter_seq_hash_map_info *info = seq->private; ++*pos; ++info->skip_elems; return bpf_hash_map_seq_find_next(info, v); } static int __bpf_hash_map_seq_show(struct seq_file *seq, struct htab_elem *elem) { struct bpf_iter_seq_hash_map_info *info = seq->private; struct bpf_iter__bpf_map_elem ctx = {}; struct bpf_map *map = info->map; struct bpf_iter_meta meta; int ret = 0, off = 0, cpu; u32 roundup_value_size; struct bpf_prog *prog; void __percpu *pptr; meta.seq = seq; prog = bpf_iter_get_info(&meta, elem == NULL); if (prog) { ctx.meta = &meta; ctx.map = info->map; if (elem) { ctx.key = elem->key; if (!info->percpu_value_buf) { ctx.value = htab_elem_value(elem, map->key_size); } else { roundup_value_size = round_up(map->value_size, 8); pptr = htab_elem_get_ptr(elem, map->key_size); for_each_possible_cpu(cpu) { copy_map_value_long(map, info->percpu_value_buf + off, per_cpu_ptr(pptr, cpu)); check_and_init_map_value(map, info->percpu_value_buf + off); off += roundup_value_size; } ctx.value = info->percpu_value_buf; } } ret = bpf_iter_run_prog(prog, &ctx); } return ret; } static int bpf_hash_map_seq_show(struct seq_file *seq, void *v) { return __bpf_hash_map_seq_show(seq, v); } static void bpf_hash_map_seq_stop(struct seq_file *seq, void *v) { if (!v) (void)__bpf_hash_map_seq_show(seq, NULL); else rcu_read_unlock(); } static int bpf_iter_init_hash_map(void *priv_data, struct bpf_iter_aux_info *aux) { struct bpf_iter_seq_hash_map_info *seq_info = priv_data; struct bpf_map *map = aux->map; void *value_buf; u32 buf_size; if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH || map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH) { buf_size = round_up(map->value_size, 8) * num_possible_cpus(); value_buf = kmalloc(buf_size, GFP_USER | __GFP_NOWARN); if (!value_buf) return -ENOMEM; seq_info->percpu_value_buf = value_buf; } bpf_map_inc_with_uref(map); seq_info->map = map; seq_info->htab = container_of(map, struct bpf_htab, map); return 0; } static void bpf_iter_fini_hash_map(void *priv_data) { struct bpf_iter_seq_hash_map_info *seq_info = priv_data; bpf_map_put_with_uref(seq_info->map); kfree(seq_info->percpu_value_buf); } static const struct seq_operations bpf_hash_map_seq_ops = { .start = bpf_hash_map_seq_start, .next = bpf_hash_map_seq_next, .stop = bpf_hash_map_seq_stop, .show = bpf_hash_map_seq_show, }; static const struct bpf_iter_seq_info iter_seq_info = { .seq_ops = &bpf_hash_map_seq_ops, .init_seq_private = bpf_iter_init_hash_map, .fini_seq_private = bpf_iter_fini_hash_map, .seq_priv_size = sizeof(struct bpf_iter_seq_hash_map_info), }; static long bpf_for_each_hash_elem(struct bpf_map *map, bpf_callback_t callback_fn, void *callback_ctx, u64 flags) { struct bpf_htab *htab = container_of(map, struct bpf_htab, map); struct hlist_nulls_head *head; struct hlist_nulls_node *n; struct htab_elem *elem; int i, num_elems = 0; void __percpu *pptr; struct bucket *b; void *key, *val; bool is_percpu; u64 ret = 0; cant_migrate(); if (flags != 0) return -EINVAL; is_percpu = htab_is_percpu(htab); /* migration has been disabled, so percpu value prepared here will be * the same as the one seen by the bpf program with * bpf_map_lookup_elem(). */ for (i = 0; i < htab->n_buckets; i++) { b = &htab->buckets[i]; rcu_read_lock(); head = &b->head; hlist_nulls_for_each_entry_safe(elem, n, head, hash_node) { key = elem->key; if (is_percpu) { /* current cpu value for percpu map */ pptr = htab_elem_get_ptr(elem, map->key_size); val = this_cpu_ptr(pptr); } else { val = htab_elem_value(elem, map->key_size); } num_elems++; ret = callback_fn((u64)(long)map, (u64)(long)key, (u64)(long)val, (u64)(long)callback_ctx, 0); /* return value: 0 - continue, 1 - stop and return */ if (ret) { rcu_read_unlock(); goto out; } } rcu_read_unlock(); } out: return num_elems; } static u64 htab_map_mem_usage(const struct bpf_map *map) { struct bpf_htab *htab = container_of(map, struct bpf_htab, map); u32 value_size = round_up(htab->map.value_size, 8); bool prealloc = htab_is_prealloc(htab); bool percpu = htab_is_percpu(htab); bool lru = htab_is_lru(htab); u64 num_entries, usage; usage = sizeof(struct bpf_htab) + sizeof(struct bucket) * htab->n_buckets; if (prealloc) { num_entries = map->max_entries; if (htab_has_extra_elems(htab)) num_entries += num_possible_cpus(); usage += htab->elem_size * num_entries; if (percpu) usage += value_size * num_possible_cpus() * num_entries; else if (!lru) usage += sizeof(struct htab_elem *) * num_possible_cpus(); } else { #define LLIST_NODE_SZ sizeof(struct llist_node) num_entries = htab->use_percpu_counter ? percpu_counter_sum(&htab->pcount) : atomic_read(&htab->count); usage += (htab->elem_size + LLIST_NODE_SZ) * num_entries; if (percpu) { usage += (LLIST_NODE_SZ + sizeof(void *)) * num_entries; usage += value_size * num_possible_cpus() * num_entries; } } return usage; } BTF_ID_LIST_SINGLE(htab_map_btf_ids, struct, bpf_htab) const struct bpf_map_ops htab_map_ops = { .map_meta_equal = bpf_map_meta_equal, .map_alloc_check = htab_map_alloc_check, .map_alloc = htab_map_alloc, .map_free = htab_map_free, .map_get_next_key = htab_map_get_next_key, .map_release_uref = htab_map_free_internal_structs, .map_lookup_elem = htab_map_lookup_elem, .map_lookup_and_delete_elem = htab_map_lookup_and_delete_elem, .map_update_elem = htab_map_update_elem, .map_delete_elem = htab_map_delete_elem, .map_gen_lookup = htab_map_gen_lookup, .map_seq_show_elem = htab_map_seq_show_elem, .map_set_for_each_callback_args = map_set_for_each_callback_args, .map_for_each_callback = bpf_for_each_hash_elem, .map_check_btf = htab_map_check_btf, .map_mem_usage = htab_map_mem_usage, BATCH_OPS(htab), .map_btf_id = &htab_map_btf_ids[0], .iter_seq_info = &iter_seq_info, }; const struct bpf_map_ops htab_lru_map_ops = { .map_meta_equal = bpf_map_meta_equal, .map_alloc_check = htab_map_alloc_check, .map_alloc = htab_map_alloc, .map_free = htab_map_free, .map_get_next_key = htab_map_get_next_key, .map_release_uref = htab_map_free_internal_structs, .map_lookup_elem = htab_lru_map_lookup_elem, .map_lookup_and_delete_elem = htab_lru_map_lookup_and_delete_elem, .map_lookup_elem_sys_only = htab_lru_map_lookup_elem_sys, .map_update_elem = htab_lru_map_update_elem, .map_delete_elem = htab_lru_map_delete_elem, .map_gen_lookup = htab_lru_map_gen_lookup, .map_seq_show_elem = htab_map_seq_show_elem, .map_set_for_each_callback_args = map_set_for_each_callback_args, .map_for_each_callback = bpf_for_each_hash_elem, .map_check_btf = htab_map_check_btf, .map_mem_usage = htab_map_mem_usage, BATCH_OPS(htab_lru), .map_btf_id = &htab_map_btf_ids[0], .iter_seq_info = &iter_seq_info, }; /* Called from eBPF program */ static void *htab_percpu_map_lookup_elem(struct bpf_map *map, void *key) { struct htab_elem *l = __htab_map_lookup_elem(map, key); if (l) return this_cpu_ptr(htab_elem_get_ptr(l, map->key_size)); else return NULL; } /* inline bpf_map_lookup_elem() call for per-CPU hashmap */ static int htab_percpu_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf) { struct bpf_insn *insn = insn_buf; if (!bpf_jit_supports_percpu_insn()) return -EOPNOTSUPP; BUILD_BUG_ON(!__same_type(&__htab_map_lookup_elem, (void *(*)(struct bpf_map *map, void *key))NULL)); *insn++ = BPF_EMIT_CALL(__htab_map_lookup_elem); *insn++ = BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3); *insn++ = BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, offsetof(struct htab_elem, key) + roundup(map->key_size, 8)); *insn++ = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0); *insn++ = BPF_MOV64_PERCPU_REG(BPF_REG_0, BPF_REG_0); return insn - insn_buf; } static void *htab_percpu_map_lookup_percpu_elem(struct bpf_map *map, void *key, u32 cpu) { struct htab_elem *l; if (cpu >= nr_cpu_ids) return NULL; l = __htab_map_lookup_elem(map, key); if (l) return per_cpu_ptr(htab_elem_get_ptr(l, map->key_size), cpu); else return NULL; } static void *htab_lru_percpu_map_lookup_elem(struct bpf_map *map, void *key) { struct htab_elem *l = __htab_map_lookup_elem(map, key); if (l) { bpf_lru_node_set_ref(&l->lru_node); return this_cpu_ptr(htab_elem_get_ptr(l, map->key_size)); } return NULL; } static void *htab_lru_percpu_map_lookup_percpu_elem(struct bpf_map *map, void *key, u32 cpu) { struct htab_elem *l; if (cpu >= nr_cpu_ids) return NULL; l = __htab_map_lookup_elem(map, key); if (l) { bpf_lru_node_set_ref(&l->lru_node); return per_cpu_ptr(htab_elem_get_ptr(l, map->key_size), cpu); } return NULL; } int bpf_percpu_hash_copy(struct bpf_map *map, void *key, void *value, u64 map_flags) { struct htab_elem *l; void __percpu *pptr; int ret = -ENOENT; int cpu, off = 0; u32 size; /* per_cpu areas are zero-filled and bpf programs can only * access 'value_size' of them, so copying rounded areas * will not leak any kernel data */ size = round_up(map->value_size, 8); rcu_read_lock(); l = __htab_map_lookup_elem(map, key); if (!l) goto out; ret = 0; /* We do not mark LRU map element here in order to not mess up * eviction heuristics when user space does a map walk. */ pptr = htab_elem_get_ptr(l, map->key_size); if (map_flags & BPF_F_CPU) { cpu = map_flags >> 32; copy_map_value(map, value, per_cpu_ptr(pptr, cpu)); check_and_init_map_value(map, value); goto out; } for_each_possible_cpu(cpu) { copy_map_value_long(map, value + off, per_cpu_ptr(pptr, cpu)); check_and_init_map_value(map, value + off); off += size; } out: rcu_read_unlock(); return ret; } int bpf_percpu_hash_update(struct bpf_map *map, void *key, void *value, u64 map_flags) { struct bpf_htab *htab = container_of(map, struct bpf_htab, map); int ret; rcu_read_lock(); if (htab_is_lru(htab)) ret = __htab_lru_percpu_map_update_elem(map, key, value, map_flags, true); else ret = htab_map_update_elem_in_place(map, key, value, map_flags, true, true); rcu_read_unlock(); return ret; } static void htab_percpu_map_seq_show_elem(struct bpf_map *map, void *key, struct seq_file *m) { struct htab_elem *l; void __percpu *pptr; int cpu; rcu_read_lock(); l = __htab_map_lookup_elem(map, key); if (!l) { rcu_read_unlock(); return; } btf_type_seq_show(map->btf, map->btf_key_type_id, key, m); seq_puts(m, ": {\n"); pptr = htab_elem_get_ptr(l, map->key_size); for_each_possible_cpu(cpu) { seq_printf(m, "\tcpu%d: ", cpu); btf_type_seq_show(map->btf, map->btf_value_type_id, per_cpu_ptr(pptr, cpu), m); seq_putc(m, '\n'); } seq_puts(m, "}\n"); rcu_read_unlock(); } const struct bpf_map_ops htab_percpu_map_ops = { .map_meta_equal = bpf_map_meta_equal, .map_alloc_check = htab_map_alloc_check, .map_alloc = htab_map_alloc, .map_free = htab_map_free, .map_get_next_key = htab_map_get_next_key, .map_lookup_elem = htab_percpu_map_lookup_elem, .map_gen_lookup = htab_percpu_map_gen_lookup, .map_lookup_and_delete_elem = htab_percpu_map_lookup_and_delete_elem, .map_update_elem = htab_percpu_map_update_elem, .map_delete_elem = htab_map_delete_elem, .map_lookup_percpu_elem = htab_percpu_map_lookup_percpu_elem, .map_seq_show_elem = htab_percpu_map_seq_show_elem, .map_set_for_each_callback_args = map_set_for_each_callback_args, .map_for_each_callback = bpf_for_each_hash_elem, .map_check_btf = htab_map_check_btf, .map_mem_usage = htab_map_mem_usage, BATCH_OPS(htab_percpu), .map_btf_id = &htab_map_btf_ids[0], .iter_seq_info = &iter_seq_info, }; const struct bpf_map_ops htab_lru_percpu_map_ops = { .map_meta_equal = bpf_map_meta_equal, .map_alloc_check = htab_map_alloc_check, .map_alloc = htab_map_alloc, .map_free = htab_map_free, .map_get_next_key = htab_map_get_next_key, .map_lookup_elem = htab_lru_percpu_map_lookup_elem, .map_lookup_and_delete_elem = htab_lru_percpu_map_lookup_and_delete_elem, .map_update_elem = htab_lru_percpu_map_update_elem, .map_delete_elem = htab_lru_map_delete_elem, .map_lookup_percpu_elem = htab_lru_percpu_map_lookup_percpu_elem, .map_seq_show_elem = htab_percpu_map_seq_show_elem, .map_set_for_each_callback_args = map_set_for_each_callback_args, .map_for_each_callback = bpf_for_each_hash_elem, .map_check_btf = htab_map_check_btf, .map_mem_usage = htab_map_mem_usage, BATCH_OPS(htab_lru_percpu), .map_btf_id = &htab_map_btf_ids[0], .iter_seq_info = &iter_seq_info, }; static int fd_htab_map_alloc_check(union bpf_attr *attr) { if (attr->value_size != sizeof(u32)) return -EINVAL; return htab_map_alloc_check(attr); } static void fd_htab_map_free(struct bpf_map *map) { struct bpf_htab *htab = container_of(map, struct bpf_htab, map); struct hlist_nulls_node *n; struct hlist_nulls_head *head; struct htab_elem *l; int i; for (i = 0; i < htab->n_buckets; i++) { head = select_bucket(htab, i); hlist_nulls_for_each_entry_safe(l, n, head, hash_node) { void *ptr = fd_htab_map_get_ptr(map, l); map->ops->map_fd_put_ptr(map, ptr, false); } } htab_map_free(map); } /* only called from syscall */ int bpf_fd_htab_map_lookup_elem(struct bpf_map *map, void *key, u32 *value) { void **ptr; int ret = 0; if (!map->ops->map_fd_sys_lookup_elem) return -ENOTSUPP; rcu_read_lock(); ptr = htab_map_lookup_elem(map, key); if (ptr) *value = map->ops->map_fd_sys_lookup_elem(READ_ONCE(*ptr)); else ret = -ENOENT; rcu_read_unlock(); return ret; } /* Only called from syscall */ int bpf_fd_htab_map_update_elem(struct bpf_map *map, struct file *map_file, void *key, void *value, u64 map_flags) { void *ptr; int ret; ptr = map->ops->map_fd_get_ptr(map, map_file, *(int *)value); if (IS_ERR(ptr)) return PTR_ERR(ptr); /* The htab bucket lock is always held during update operations in fd * htab map, and the following rcu_read_lock() is only used to avoid * the WARN_ON_ONCE in htab_map_update_elem_in_place(). */ rcu_read_lock(); ret = htab_map_update_elem_in_place(map, key, &ptr, map_flags, false, false); rcu_read_unlock(); if (ret) map->ops->map_fd_put_ptr(map, ptr, false); return ret; } static struct bpf_map *htab_of_map_alloc(union bpf_attr *attr) { struct bpf_map *map, *inner_map_meta; inner_map_meta = bpf_map_meta_alloc(attr->inner_map_fd); if (IS_ERR(inner_map_meta)) return inner_map_meta; map = htab_map_alloc(attr); if (IS_ERR(map)) { bpf_map_meta_free(inner_map_meta); return map; } map->inner_map_meta = inner_map_meta; return map; } static void *htab_of_map_lookup_elem(struct bpf_map *map, void *key) { struct bpf_map **inner_map = htab_map_lookup_elem(map, key); if (!inner_map) return NULL; return READ_ONCE(*inner_map); } static int htab_of_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf) { struct bpf_insn *insn = insn_buf; const int ret = BPF_REG_0; BUILD_BUG_ON(!__same_type(&__htab_map_lookup_elem, (void *(*)(struct bpf_map *map, void *key))NULL)); *insn++ = BPF_EMIT_CALL(__htab_map_lookup_elem); *insn++ = BPF_JMP_IMM(BPF_JEQ, ret, 0, 2); *insn++ = BPF_ALU64_IMM(BPF_ADD, ret, offsetof(struct htab_elem, key) + round_up(map->key_size, 8)); *insn++ = BPF_LDX_MEM(BPF_DW, ret, ret, 0); return insn - insn_buf; } static void htab_of_map_free(struct bpf_map *map) { bpf_map_meta_free(map->inner_map_meta); fd_htab_map_free(map); } const struct bpf_map_ops htab_of_maps_map_ops = { .map_alloc_check = fd_htab_map_alloc_check, .map_alloc = htab_of_map_alloc, .map_free = htab_of_map_free, .map_get_next_key = htab_map_get_next_key, .map_lookup_elem = htab_of_map_lookup_elem, .map_delete_elem = htab_map_delete_elem, .map_fd_get_ptr = bpf_map_fd_get_ptr, .map_fd_put_ptr = bpf_map_fd_put_ptr, .map_fd_sys_lookup_elem = bpf_map_fd_sys_lookup_elem, .map_gen_lookup = htab_of_map_gen_lookup, .map_check_btf = map_check_no_btf, .map_mem_usage = htab_map_mem_usage, BATCH_OPS(htab), .map_btf_id = &htab_map_btf_ids[0], }; |
| 9 9 14 1 14 2 2 1 14 12 7 7 5 12 6 28 27 28 28 28 5 5 1 4 2 5 5 4 5 5 6 6 6 5 1 6 3 1 1 1 1 1 3 3 18 18 2 2 2 2 1 2 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 | // SPDX-License-Identifier: GPL-2.0-or-later /* * Spanning tree protocol; interface code * Linux ethernet bridge * * Authors: * Lennert Buytenhek <buytenh@gnu.org> */ #include <linux/kernel.h> #include <linux/kmod.h> #include <linux/etherdevice.h> #include <linux/rtnetlink.h> #include <net/switchdev.h> #include "br_private.h" #include "br_private_stp.h" /* Port id is composed of priority and port number. * NB: some bits of priority are dropped to * make room for more ports. */ static inline port_id br_make_port_id(__u8 priority, __u16 port_no) { return ((u16)priority << BR_PORT_BITS) | (port_no & ((1<<BR_PORT_BITS)-1)); } #define BR_MAX_PORT_PRIORITY ((u16)~0 >> BR_PORT_BITS) /* called under bridge lock */ void br_init_port(struct net_bridge_port *p) { int err; p->port_id = br_make_port_id(p->priority, p->port_no); br_become_designated_port(p); br_set_state(p, BR_STATE_BLOCKING); p->topology_change_ack = 0; p->config_pending = 0; err = __set_ageing_time(p->dev, p->br->ageing_time); if (err) netdev_err(p->dev, "failed to offload ageing time\n"); } /* NO locks held */ void br_stp_enable_bridge(struct net_bridge *br) { struct net_bridge_port *p; spin_lock_bh(&br->lock); if (br->stp_enabled == BR_KERNEL_STP) mod_timer(&br->hello_timer, jiffies + br->hello_time); mod_delayed_work(system_long_wq, &br->gc_work, HZ / 10); br_config_bpdu_generation(br); list_for_each_entry(p, &br->port_list, list) { if (netif_running(p->dev) && netif_oper_up(p->dev)) br_stp_enable_port(p); } spin_unlock_bh(&br->lock); } /* NO locks held */ void br_stp_disable_bridge(struct net_bridge *br) { struct net_bridge_port *p; spin_lock_bh(&br->lock); list_for_each_entry(p, &br->port_list, list) { if (p->state != BR_STATE_DISABLED) br_stp_disable_port(p); } __br_set_topology_change(br, 0); br->topology_change_detected = 0; spin_unlock_bh(&br->lock); timer_delete_sync(&br->hello_timer); timer_delete_sync(&br->topology_change_timer); timer_delete_sync(&br->tcn_timer); cancel_delayed_work_sync(&br->gc_work); } /* called under bridge lock */ void br_stp_enable_port(struct net_bridge_port *p) { br_init_port(p); br_port_state_selection(p->br); br_ifinfo_notify(RTM_NEWLINK, NULL, p); } /* called under bridge lock */ void br_stp_disable_port(struct net_bridge_port *p) { struct net_bridge *br = p->br; int wasroot; wasroot = br_is_root_bridge(br); br_become_designated_port(p); br_set_state(p, BR_STATE_DISABLED); p->topology_change_ack = 0; p->config_pending = 0; br_ifinfo_notify(RTM_NEWLINK, NULL, p); timer_delete(&p->message_age_timer); timer_delete(&p->forward_delay_timer); timer_delete(&p->hold_timer); if (!rcu_access_pointer(p->backup_port)) br_fdb_delete_by_port(br, p, 0, 0); br_multicast_disable_port(p); br_configuration_update(br); br_port_state_selection(br); if (br_is_root_bridge(br) && !wasroot) br_become_root_bridge(br); } static int br_stp_call_user(struct net_bridge *br, char *arg) { char *argv[] = { BR_STP_PROG, br->dev->name, arg, NULL }; char *envp[] = { NULL }; int rc; /* call userspace STP and report program errors */ rc = call_usermodehelper(BR_STP_PROG, argv, envp, UMH_WAIT_PROC); if (rc > 0) { if (rc & 0xff) br_debug(br, BR_STP_PROG " received signal %d\n", rc & 0x7f); else br_debug(br, BR_STP_PROG " exited with code %d\n", (rc >> 8) & 0xff); } return rc; } static void br_stp_start(struct net_bridge *br) { int err = -ENOENT; if (net_eq(dev_net(br->dev), &init_net)) err = br_stp_call_user(br, "start"); if (err && err != -ENOENT) br_err(br, "failed to start userspace STP (%d)\n", err); spin_lock_bh(&br->lock); if (br->bridge_forward_delay < BR_MIN_FORWARD_DELAY) __br_set_forward_delay(br, BR_MIN_FORWARD_DELAY); else if (br->bridge_forward_delay > BR_MAX_FORWARD_DELAY) __br_set_forward_delay(br, BR_MAX_FORWARD_DELAY); if (!err) { br->stp_enabled = BR_USER_STP; br_debug(br, "userspace STP started\n"); } else { br->stp_enabled = BR_KERNEL_STP; br_debug(br, "using kernel STP\n"); /* To start timers on any ports left in blocking */ if (br->dev->flags & IFF_UP) mod_timer(&br->hello_timer, jiffies + br->hello_time); br_port_state_selection(br); } spin_unlock_bh(&br->lock); } static void br_stp_stop(struct net_bridge *br) { int err; if (br->stp_enabled == BR_USER_STP) { err = br_stp_call_user(br, "stop"); if (err) br_err(br, "failed to stop userspace STP (%d)\n", err); /* To start timers on any ports left in blocking */ spin_lock_bh(&br->lock); br_port_state_selection(br); spin_unlock_bh(&br->lock); } br->stp_enabled = BR_NO_STP; } int br_stp_set_enabled(struct net_bridge *br, unsigned long val, struct netlink_ext_ack *extack) { ASSERT_RTNL(); if (br_mrp_enabled(br)) { NL_SET_ERR_MSG_MOD(extack, "STP can't be enabled if MRP is already enabled"); return -EINVAL; } if (val) { if (br->stp_enabled == BR_NO_STP) br_stp_start(br); } else { if (br->stp_enabled != BR_NO_STP) br_stp_stop(br); } return 0; } /* called under bridge lock */ void br_stp_change_bridge_id(struct net_bridge *br, const unsigned char *addr) { /* should be aligned on 2 bytes for ether_addr_equal() */ unsigned short oldaddr_aligned[ETH_ALEN >> 1]; unsigned char *oldaddr = (unsigned char *)oldaddr_aligned; struct net_bridge_port *p; int wasroot; wasroot = br_is_root_bridge(br); br_fdb_change_mac_address(br, addr); memcpy(oldaddr, br->bridge_id.addr, ETH_ALEN); memcpy(br->bridge_id.addr, addr, ETH_ALEN); eth_hw_addr_set(br->dev, addr); list_for_each_entry(p, &br->port_list, list) { if (ether_addr_equal(p->designated_bridge.addr, oldaddr)) memcpy(p->designated_bridge.addr, addr, ETH_ALEN); if (ether_addr_equal(p->designated_root.addr, oldaddr)) memcpy(p->designated_root.addr, addr, ETH_ALEN); } br_configuration_update(br); br_port_state_selection(br); if (br_is_root_bridge(br) && !wasroot) br_become_root_bridge(br); } /* should be aligned on 2 bytes for ether_addr_equal() */ static const unsigned short br_mac_zero_aligned[ETH_ALEN >> 1]; /* called under bridge lock */ bool br_stp_recalculate_bridge_id(struct net_bridge *br) { const unsigned char *br_mac_zero = (const unsigned char *)br_mac_zero_aligned; const unsigned char *addr = br_mac_zero; struct net_bridge_port *p; /* user has chosen a value so keep it */ if (br->dev->addr_assign_type == NET_ADDR_SET) return false; list_for_each_entry(p, &br->port_list, list) { if (addr == br_mac_zero || memcmp(p->dev->dev_addr, addr, ETH_ALEN) < 0) addr = p->dev->dev_addr; } if (ether_addr_equal(br->bridge_id.addr, addr)) return false; /* no change */ br_stp_change_bridge_id(br, addr); return true; } /* Acquires and releases bridge lock */ void br_stp_set_bridge_priority(struct net_bridge *br, u16 newprio) { struct net_bridge_port *p; int wasroot; spin_lock_bh(&br->lock); wasroot = br_is_root_bridge(br); list_for_each_entry(p, &br->port_list, list) { if (p->state != BR_STATE_DISABLED && br_is_designated_port(p)) { p->designated_bridge.prio[0] = (newprio >> 8) & 0xFF; p->designated_bridge.prio[1] = newprio & 0xFF; } } br->bridge_id.prio[0] = (newprio >> 8) & 0xFF; br->bridge_id.prio[1] = newprio & 0xFF; br_configuration_update(br); br_port_state_selection(br); if (br_is_root_bridge(br) && !wasroot) br_become_root_bridge(br); spin_unlock_bh(&br->lock); } /* called under bridge lock */ int br_stp_set_port_priority(struct net_bridge_port *p, unsigned long newprio) { port_id new_port_id; if (newprio > BR_MAX_PORT_PRIORITY) return -ERANGE; new_port_id = br_make_port_id(newprio, p->port_no); if (br_is_designated_port(p)) p->designated_port = new_port_id; p->port_id = new_port_id; p->priority = newprio; if (!memcmp(&p->br->bridge_id, &p->designated_bridge, 8) && p->port_id < p->designated_port) { br_become_designated_port(p); br_port_state_selection(p->br); } return 0; } /* called under bridge lock */ int br_stp_set_path_cost(struct net_bridge_port *p, unsigned long path_cost) { if (path_cost < BR_MIN_PATH_COST || path_cost > BR_MAX_PATH_COST) return -ERANGE; p->flags |= BR_ADMIN_COST; p->path_cost = path_cost; br_configuration_update(p->br); br_port_state_selection(p->br); return 0; } ssize_t br_show_bridge_id(char *buf, const struct bridge_id *id) { return sysfs_emit(buf, "%.2x%.2x.%.2x%.2x%.2x%.2x%.2x%.2x\n", id->prio[0], id->prio[1], id->addr[0], id->addr[1], id->addr[2], id->addr[3], id->addr[4], id->addr[5]); } |
| 3 3 7 7 3 2 7 1 7 1 7 7 1 7 19 3 2 1 1 1 19 4 4 4 4 4 4 4 4 1 4 4 2 2 2 4 4 2 2 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 7 7 7 19 19 19 3 2 19 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 | // SPDX-License-Identifier: GPL-2.0 #include <linux/jhash.h> #include <linux/netfilter.h> #include <linux/rcupdate.h> #include <linux/rhashtable.h> #include <linux/vmalloc.h> #include <net/genetlink.h> #include <net/netns/generic.h> #include <uapi/linux/genetlink.h> #include "ila.h" struct ila_xlat_params { struct ila_params ip; int ifindex; }; struct ila_map { struct ila_xlat_params xp; struct rhash_head node; struct ila_map __rcu *next; struct rcu_head rcu; }; #define MAX_LOCKS 1024 #define LOCKS_PER_CPU 10 static int alloc_ila_locks(struct ila_net *ilan) { return alloc_bucket_spinlocks(&ilan->xlat.locks, &ilan->xlat.locks_mask, MAX_LOCKS, LOCKS_PER_CPU, GFP_KERNEL); } static u32 hashrnd __read_mostly; static __always_inline void __ila_hash_secret_init(void) { net_get_random_once(&hashrnd, sizeof(hashrnd)); } static inline u32 ila_locator_hash(struct ila_locator loc) { u32 *v = (u32 *)loc.v32; __ila_hash_secret_init(); return jhash_2words(v[0], v[1], hashrnd); } static inline spinlock_t *ila_get_lock(struct ila_net *ilan, struct ila_locator loc) { return &ilan->xlat.locks[ila_locator_hash(loc) & ilan->xlat.locks_mask]; } static inline int ila_cmp_wildcards(struct ila_map *ila, struct ila_addr *iaddr, int ifindex) { return (ila->xp.ifindex && ila->xp.ifindex != ifindex); } static inline int ila_cmp_params(struct ila_map *ila, struct ila_xlat_params *xp) { return (ila->xp.ifindex != xp->ifindex); } static int ila_cmpfn(struct rhashtable_compare_arg *arg, const void *obj) { const struct ila_map *ila = obj; return (ila->xp.ip.locator_match.v64 != *(__be64 *)arg->key); } static inline int ila_order(struct ila_map *ila) { int score = 0; if (ila->xp.ifindex) score += 1 << 1; return score; } static const struct rhashtable_params rht_params = { .nelem_hint = 1024, .head_offset = offsetof(struct ila_map, node), .key_offset = offsetof(struct ila_map, xp.ip.locator_match), .key_len = sizeof(u64), /* identifier */ .max_size = 1048576, .min_size = 256, .automatic_shrinking = true, .obj_cmpfn = ila_cmpfn, }; static int parse_nl_config(struct genl_info *info, struct ila_xlat_params *xp) { memset(xp, 0, sizeof(*xp)); if (info->attrs[ILA_ATTR_LOCATOR]) xp->ip.locator.v64 = (__force __be64)nla_get_u64( info->attrs[ILA_ATTR_LOCATOR]); if (info->attrs[ILA_ATTR_LOCATOR_MATCH]) xp->ip.locator_match.v64 = (__force __be64)nla_get_u64( info->attrs[ILA_ATTR_LOCATOR_MATCH]); xp->ip.csum_mode = nla_get_u8_default(info->attrs[ILA_ATTR_CSUM_MODE], ILA_CSUM_NO_ACTION); xp->ip.ident_type = nla_get_u8_default(info->attrs[ILA_ATTR_IDENT_TYPE], ILA_ATYPE_USE_FORMAT); if (info->attrs[ILA_ATTR_IFINDEX]) xp->ifindex = nla_get_s32(info->attrs[ILA_ATTR_IFINDEX]); return 0; } /* Must be called with rcu readlock */ static inline struct ila_map *ila_lookup_wildcards(struct ila_addr *iaddr, int ifindex, struct ila_net *ilan) { struct ila_map *ila; ila = rhashtable_lookup_fast(&ilan->xlat.rhash_table, &iaddr->loc, rht_params); while (ila) { if (!ila_cmp_wildcards(ila, iaddr, ifindex)) return ila; ila = rcu_access_pointer(ila->next); } return NULL; } /* Must be called with rcu readlock */ static inline struct ila_map *ila_lookup_by_params(struct ila_xlat_params *xp, struct ila_net *ilan) { struct ila_map *ila; ila = rhashtable_lookup_fast(&ilan->xlat.rhash_table, &xp->ip.locator_match, rht_params); while (ila) { if (!ila_cmp_params(ila, xp)) return ila; ila = rcu_access_pointer(ila->next); } return NULL; } static inline void ila_release(struct ila_map *ila) { kfree_rcu(ila, rcu); } static void ila_free_node(struct ila_map *ila) { struct ila_map *next; /* Assume rcu_readlock held */ while (ila) { next = rcu_access_pointer(ila->next); ila_release(ila); ila = next; } } static void ila_free_cb(void *ptr, void *arg) { ila_free_node((struct ila_map *)ptr); } static int ila_xlat_addr(struct sk_buff *skb, bool sir2ila); static unsigned int ila_nf_input(void *priv, struct sk_buff *skb, const struct nf_hook_state *state) { ila_xlat_addr(skb, false); return NF_ACCEPT; } static const struct nf_hook_ops ila_nf_hook_ops[] = { { .hook = ila_nf_input, .pf = NFPROTO_IPV6, .hooknum = NF_INET_PRE_ROUTING, .priority = -1, }, }; static DEFINE_MUTEX(ila_mutex); static int ila_add_mapping(struct net *net, struct ila_xlat_params *xp) { struct ila_net *ilan = net_generic(net, ila_net_id); struct ila_map *ila, *head; spinlock_t *lock = ila_get_lock(ilan, xp->ip.locator_match); int err = 0, order; if (!READ_ONCE(ilan->xlat.hooks_registered)) { /* We defer registering net hooks in the namespace until the * first mapping is added. */ mutex_lock(&ila_mutex); if (!ilan->xlat.hooks_registered) { err = nf_register_net_hooks(net, ila_nf_hook_ops, ARRAY_SIZE(ila_nf_hook_ops)); if (!err) WRITE_ONCE(ilan->xlat.hooks_registered, true); } mutex_unlock(&ila_mutex); if (err) return err; } ila = kzalloc_obj(*ila); if (!ila) return -ENOMEM; ila_init_saved_csum(&xp->ip); ila->xp = *xp; order = ila_order(ila); spin_lock(lock); head = rhashtable_lookup_fast(&ilan->xlat.rhash_table, &xp->ip.locator_match, rht_params); if (!head) { /* New entry for the rhash_table */ err = rhashtable_lookup_insert_fast(&ilan->xlat.rhash_table, &ila->node, rht_params); } else { struct ila_map *tila = head, *prev = NULL; do { if (!ila_cmp_params(tila, xp)) { err = -EEXIST; goto out; } if (order > ila_order(tila)) break; prev = tila; tila = rcu_dereference_protected(tila->next, lockdep_is_held(lock)); } while (tila); if (prev) { /* Insert in sub list of head */ RCU_INIT_POINTER(ila->next, tila); rcu_assign_pointer(prev->next, ila); } else { /* Make this ila new head */ RCU_INIT_POINTER(ila->next, head); err = rhashtable_replace_fast(&ilan->xlat.rhash_table, &head->node, &ila->node, rht_params); if (err) goto out; } } out: spin_unlock(lock); if (err) kfree(ila); return err; } static int ila_del_mapping(struct net *net, struct ila_xlat_params *xp) { struct ila_net *ilan = net_generic(net, ila_net_id); struct ila_map *ila, *head, *prev; spinlock_t *lock = ila_get_lock(ilan, xp->ip.locator_match); int err = -ENOENT; spin_lock(lock); head = rhashtable_lookup_fast(&ilan->xlat.rhash_table, &xp->ip.locator_match, rht_params); ila = head; prev = NULL; while (ila) { if (ila_cmp_params(ila, xp)) { prev = ila; ila = rcu_dereference_protected(ila->next, lockdep_is_held(lock)); continue; } err = 0; if (prev) { /* Not head, just delete from list */ rcu_assign_pointer(prev->next, ila->next); } else { /* It is the head. If there is something in the * sublist we need to make a new head. */ head = rcu_dereference_protected(ila->next, lockdep_is_held(lock)); if (head) { /* Put first entry in the sublist into the * table */ err = rhashtable_replace_fast( &ilan->xlat.rhash_table, &ila->node, &head->node, rht_params); if (err) goto out; } else { /* Entry no longer used */ err = rhashtable_remove_fast( &ilan->xlat.rhash_table, &ila->node, rht_params); } } ila_release(ila); break; } out: spin_unlock(lock); return err; } int ila_xlat_nl_cmd_add_mapping(struct sk_buff *skb, struct genl_info *info) { struct net *net = genl_info_net(info); struct ila_xlat_params p; int err; err = parse_nl_config(info, &p); if (err) return err; return ila_add_mapping(net, &p); } int ila_xlat_nl_cmd_del_mapping(struct sk_buff *skb, struct genl_info *info) { struct net *net = genl_info_net(info); struct ila_xlat_params xp; int err; err = parse_nl_config(info, &xp); if (err) return err; ila_del_mapping(net, &xp); return 0; } static inline spinlock_t *lock_from_ila_map(struct ila_net *ilan, struct ila_map *ila) { return ila_get_lock(ilan, ila->xp.ip.locator_match); } int ila_xlat_nl_cmd_flush(struct sk_buff *skb, struct genl_info *info) { struct net *net = genl_info_net(info); struct ila_net *ilan = net_generic(net, ila_net_id); struct rhashtable_iter iter; struct ila_map *ila; spinlock_t *lock; int ret = 0; rhashtable_walk_enter(&ilan->xlat.rhash_table, &iter); rhashtable_walk_start(&iter); for (;;) { ila = rhashtable_walk_next(&iter); if (IS_ERR(ila)) { if (PTR_ERR(ila) == -EAGAIN) continue; ret = PTR_ERR(ila); goto done; } else if (!ila) { break; } lock = lock_from_ila_map(ilan, ila); spin_lock(lock); ret = rhashtable_remove_fast(&ilan->xlat.rhash_table, &ila->node, rht_params); if (!ret) ila_free_node(ila); spin_unlock(lock); if (ret) break; } done: rhashtable_walk_stop(&iter); rhashtable_walk_exit(&iter); return ret; } static int ila_fill_info(struct ila_map *ila, struct sk_buff *msg) { if (nla_put_u64_64bit(msg, ILA_ATTR_LOCATOR, (__force u64)ila->xp.ip.locator.v64, ILA_ATTR_PAD) || nla_put_u64_64bit(msg, ILA_ATTR_LOCATOR_MATCH, (__force u64)ila->xp.ip.locator_match.v64, ILA_ATTR_PAD) || nla_put_s32(msg, ILA_ATTR_IFINDEX, ila->xp.ifindex) || nla_put_u8(msg, ILA_ATTR_CSUM_MODE, ila->xp.ip.csum_mode) || nla_put_u8(msg, ILA_ATTR_IDENT_TYPE, ila->xp.ip.ident_type)) return -1; return 0; } static int ila_dump_info(struct ila_map *ila, u32 portid, u32 seq, u32 flags, struct sk_buff *skb, u8 cmd) { void *hdr; hdr = genlmsg_put(skb, portid, seq, &ila_nl_family, flags, cmd); if (!hdr) return -ENOMEM; if (ila_fill_info(ila, skb) < 0) goto nla_put_failure; genlmsg_end(skb, hdr); return 0; nla_put_failure: genlmsg_cancel(skb, hdr); return -EMSGSIZE; } int ila_xlat_nl_cmd_get_mapping(struct sk_buff *skb, struct genl_info *info) { struct net *net = genl_info_net(info); struct ila_net *ilan = net_generic(net, ila_net_id); struct sk_buff *msg; struct ila_xlat_params xp; struct ila_map *ila; int ret; ret = parse_nl_config(info, &xp); if (ret) return ret; msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); if (!msg) return -ENOMEM; rcu_read_lock(); ret = -ESRCH; ila = ila_lookup_by_params(&xp, ilan); if (ila) { ret = ila_dump_info(ila, info->snd_portid, info->snd_seq, 0, msg, info->genlhdr->cmd); } rcu_read_unlock(); if (ret < 0) goto out_free; return genlmsg_reply(msg, info); out_free: nlmsg_free(msg); return ret; } struct ila_dump_iter { struct rhashtable_iter rhiter; int skip; }; int ila_xlat_nl_dump_start(struct netlink_callback *cb) { struct net *net = sock_net(cb->skb->sk); struct ila_net *ilan = net_generic(net, ila_net_id); struct ila_dump_iter *iter; iter = kmalloc_obj(*iter); if (!iter) return -ENOMEM; rhashtable_walk_enter(&ilan->xlat.rhash_table, &iter->rhiter); iter->skip = 0; cb->args[0] = (long)iter; return 0; } int ila_xlat_nl_dump_done(struct netlink_callback *cb) { struct ila_dump_iter *iter = (struct ila_dump_iter *)cb->args[0]; rhashtable_walk_exit(&iter->rhiter); kfree(iter); return 0; } int ila_xlat_nl_dump(struct sk_buff *skb, struct netlink_callback *cb) { struct ila_dump_iter *iter = (struct ila_dump_iter *)cb->args[0]; struct rhashtable_iter *rhiter = &iter->rhiter; int skip = iter->skip; struct ila_map *ila; int ret; rhashtable_walk_start(rhiter); /* Get first entry */ ila = rhashtable_walk_peek(rhiter); if (ila && !IS_ERR(ila) && skip) { /* Skip over visited entries */ while (ila && skip) { /* Skip over any ila entries in this list that we * have already dumped. */ ila = rcu_access_pointer(ila->next); skip--; } } skip = 0; for (;;) { if (IS_ERR(ila)) { ret = PTR_ERR(ila); if (ret == -EAGAIN) { /* Table has changed and iter has reset. Return * -EAGAIN to the application even if we have * written data to the skb. The application * needs to deal with this. */ goto out_ret; } else { break; } } else if (!ila) { ret = 0; break; } while (ila) { ret = ila_dump_info(ila, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, NLM_F_MULTI, skb, ILA_CMD_GET); if (ret) goto out; skip++; ila = rcu_access_pointer(ila->next); } skip = 0; ila = rhashtable_walk_next(rhiter); } out: iter->skip = skip; ret = (skb->len ? : ret); out_ret: rhashtable_walk_stop(rhiter); return ret; } int ila_xlat_init_net(struct net *net) { struct ila_net *ilan = net_generic(net, ila_net_id); int err; err = alloc_ila_locks(ilan); if (err) return err; err = rhashtable_init(&ilan->xlat.rhash_table, &rht_params); if (err) { free_bucket_spinlocks(ilan->xlat.locks); return err; } return 0; } void ila_xlat_pre_exit_net(struct net *net) { struct ila_net *ilan = net_generic(net, ila_net_id); if (ilan->xlat.hooks_registered) nf_unregister_net_hooks(net, ila_nf_hook_ops, ARRAY_SIZE(ila_nf_hook_ops)); } void ila_xlat_exit_net(struct net *net) { struct ila_net *ilan = net_generic(net, ila_net_id); rhashtable_free_and_destroy(&ilan->xlat.rhash_table, ila_free_cb, NULL); free_bucket_spinlocks(ilan->xlat.locks); } static int ila_xlat_addr(struct sk_buff *skb, bool sir2ila) { struct ila_map *ila; struct ipv6hdr *ip6h = ipv6_hdr(skb); struct net *net = dev_net(skb->dev); struct ila_net *ilan = net_generic(net, ila_net_id); struct ila_addr *iaddr = ila_a2i(&ip6h->daddr); /* Assumes skb contains a valid IPv6 header that is pulled */ /* No check here that ILA type in the mapping matches what is in the * address. We assume that whatever sender gaves us can be translated. * The checksum mode however is relevant. */ rcu_read_lock(); ila = ila_lookup_wildcards(iaddr, skb->dev->ifindex, ilan); if (ila) ila_update_ipv6_locator(skb, &ila->xp.ip, sir2ila); rcu_read_unlock(); return 0; } |
| 50 50 50 50 50 49 50 50 50 50 50 1 50 15 50 4 4 2 50 50 50 50 57 6 6 5 53 39 15 8 58 56 3 51 7 6 7 56 3 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 | // SPDX-License-Identifier: GPL-2.0-or-later #include <linux/skbuff.h> #include <linux/sctp.h> #include <net/gso.h> #include <net/gro.h> /** * skb_eth_gso_segment - segmentation handler for ethernet protocols. * @skb: buffer to segment * @features: features for the output path (see dev->features) * @type: Ethernet Protocol ID */ struct sk_buff *skb_eth_gso_segment(struct sk_buff *skb, netdev_features_t features, __be16 type) { struct sk_buff *segs = ERR_PTR(-EPROTONOSUPPORT); struct packet_offload *ptype; rcu_read_lock(); list_for_each_entry_rcu(ptype, &net_hotdata.offload_base, list) { if (ptype->type == type && ptype->callbacks.gso_segment) { segs = ptype->callbacks.gso_segment(skb, features); break; } } rcu_read_unlock(); return segs; } EXPORT_SYMBOL(skb_eth_gso_segment); /** * skb_mac_gso_segment - mac layer segmentation handler. * @skb: buffer to segment * @features: features for the output path (see dev->features) */ struct sk_buff *skb_mac_gso_segment(struct sk_buff *skb, netdev_features_t features) { struct sk_buff *segs = ERR_PTR(-EPROTONOSUPPORT); struct packet_offload *ptype; int vlan_depth = skb->mac_len; __be16 type = skb_network_protocol(skb, &vlan_depth); if (unlikely(!type)) return ERR_PTR(-EINVAL); __skb_pull(skb, vlan_depth); rcu_read_lock(); list_for_each_entry_rcu(ptype, &net_hotdata.offload_base, list) { if (ptype->type == type && ptype->callbacks.gso_segment) { segs = ptype->callbacks.gso_segment(skb, features); break; } } rcu_read_unlock(); __skb_push(skb, skb->data - skb_mac_header(skb)); return segs; } EXPORT_SYMBOL(skb_mac_gso_segment); /* openvswitch calls this on rx path, so we need a different check. */ static bool skb_needs_check(const struct sk_buff *skb, bool tx_path) { if (tx_path) return skb->ip_summed != CHECKSUM_PARTIAL && skb->ip_summed != CHECKSUM_UNNECESSARY; return skb->ip_summed == CHECKSUM_NONE; } /** * __skb_gso_segment - Perform segmentation on skb. * @skb: buffer to segment * @features: features for the output path (see dev->features) * @tx_path: whether it is called in TX path * * This function segments the given skb and returns a list of segments. * * It may return NULL if the skb requires no segmentation. This is * only possible when GSO is used for verifying header integrity. * * Segmentation preserves SKB_GSO_CB_OFFSET bytes of previous skb cb. */ struct sk_buff *__skb_gso_segment(struct sk_buff *skb, netdev_features_t features, bool tx_path) { struct sk_buff *segs; if (unlikely(skb_needs_check(skb, tx_path))) { int err; /* We're going to init ->check field in TCP or UDP header */ err = skb_cow_head(skb, 0); if (err < 0) return ERR_PTR(err); } /* Only report GSO partial support if it will enable us to * support segmentation on this frame without needing additional * work. */ if (features & NETIF_F_GSO_PARTIAL) { netdev_features_t partial_features = NETIF_F_GSO_ROBUST; struct net_device *dev = skb->dev; partial_features |= dev->features & dev->gso_partial_features; if (!skb_gso_ok(skb, features | partial_features)) features &= ~NETIF_F_GSO_PARTIAL; } BUILD_BUG_ON(SKB_GSO_CB_OFFSET + sizeof(*SKB_GSO_CB(skb)) > sizeof(skb->cb)); SKB_GSO_CB(skb)->mac_offset = skb_headroom(skb); SKB_GSO_CB(skb)->encap_level = 0; skb_reset_mac_header(skb); skb_reset_mac_len(skb); segs = skb_mac_gso_segment(skb, features); if (segs != skb && unlikely(skb_needs_check(skb, tx_path) && !IS_ERR(segs))) skb_warn_bad_offload(skb); return segs; } EXPORT_SYMBOL(__skb_gso_segment); /** * skb_gso_transport_seglen - Return length of individual segments of a gso packet * * @skb: GSO skb * * skb_gso_transport_seglen is used to determine the real size of the * individual segments, including Layer4 headers (TCP/UDP). * * The MAC/L2 or network (IP, IPv6) headers are not accounted for. */ static unsigned int skb_gso_transport_seglen(const struct sk_buff *skb) { const struct skb_shared_info *shinfo = skb_shinfo(skb); unsigned int thlen = 0; if (skb->encapsulation) { thlen = skb_inner_transport_header(skb) - skb_transport_header(skb); if (likely(shinfo->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6))) thlen += inner_tcp_hdrlen(skb); } else if (likely(shinfo->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6))) { thlen = tcp_hdrlen(skb); } else if (unlikely(skb_is_gso_sctp(skb))) { thlen = sizeof(struct sctphdr); } else if (shinfo->gso_type & SKB_GSO_UDP_L4) { thlen = sizeof(struct udphdr); } /* UFO sets gso_size to the size of the fragmentation * payload, i.e. the size of the L4 (UDP) header is already * accounted for. */ return thlen + shinfo->gso_size; } /** * skb_gso_network_seglen - Return length of individual segments of a gso packet * * @skb: GSO skb * * skb_gso_network_seglen is used to determine the real size of the * individual segments, including Layer3 (IP, IPv6) and L4 headers (TCP/UDP). * * The MAC/L2 header is not accounted for. */ static unsigned int skb_gso_network_seglen(const struct sk_buff *skb) { unsigned int hdr_len = skb_transport_header(skb) - skb_network_header(skb); return hdr_len + skb_gso_transport_seglen(skb); } /** * skb_gso_mac_seglen - Return length of individual segments of a gso packet * * @skb: GSO skb * * skb_gso_mac_seglen is used to determine the real size of the * individual segments, including MAC/L2, Layer3 (IP, IPv6) and L4 * headers (TCP/UDP). */ static unsigned int skb_gso_mac_seglen(const struct sk_buff *skb) { unsigned int hdr_len = skb_transport_header(skb) - skb_mac_header(skb); return hdr_len + skb_gso_transport_seglen(skb); } /** * skb_gso_size_check - check the skb size, considering GSO_BY_FRAGS * * There are a couple of instances where we have a GSO skb, and we * want to determine what size it would be after it is segmented. * * We might want to check: * - L3+L4+payload size (e.g. IP forwarding) * - L2+L3+L4+payload size (e.g. sanity check before passing to driver) * * This is a helper to do that correctly considering GSO_BY_FRAGS. * * @skb: GSO skb * * @seg_len: The segmented length (from skb_gso_*_seglen). In the * GSO_BY_FRAGS case this will be [header sizes + GSO_BY_FRAGS]. * * @max_len: The maximum permissible length. * * Returns true if the segmented length <= max length. */ static inline bool skb_gso_size_check(const struct sk_buff *skb, unsigned int seg_len, unsigned int max_len) { const struct skb_shared_info *shinfo = skb_shinfo(skb); const struct sk_buff *iter; if (shinfo->gso_size != GSO_BY_FRAGS) return seg_len <= max_len; /* Undo this so we can re-use header sizes */ seg_len -= GSO_BY_FRAGS; skb_walk_frags(skb, iter) { if (seg_len + skb_headlen(iter) > max_len) return false; } return true; } /** * skb_gso_validate_network_len - Will a split GSO skb fit into a given MTU? * * @skb: GSO skb * @mtu: MTU to validate against * * skb_gso_validate_network_len validates if a given skb will fit a * wanted MTU once split. It considers L3 headers, L4 headers, and the * payload. */ bool skb_gso_validate_network_len(const struct sk_buff *skb, unsigned int mtu) { return skb_gso_size_check(skb, skb_gso_network_seglen(skb), mtu); } EXPORT_SYMBOL_GPL(skb_gso_validate_network_len); /** * skb_gso_validate_mac_len - Will a split GSO skb fit in a given length? * * @skb: GSO skb * @len: length to validate against * * skb_gso_validate_mac_len validates if a given skb will fit a wanted * length once split, including L2, L3 and L4 headers and the payload. */ bool skb_gso_validate_mac_len(const struct sk_buff *skb, unsigned int len) { return skb_gso_size_check(skb, skb_gso_mac_seglen(skb), len); } EXPORT_SYMBOL_GPL(skb_gso_validate_mac_len); |
| 176 219 32 307 29 3 5 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 | /* SPDX-License-Identifier: GPL-2.0 */ #undef TRACE_SYSTEM #define TRACE_SYSTEM filemap #if !defined(_TRACE_FILEMAP_H) || defined(TRACE_HEADER_MULTI_READ) #define _TRACE_FILEMAP_H #include <linux/types.h> #include <linux/tracepoint.h> #include <linux/mm.h> #include <linux/memcontrol.h> #include <linux/device.h> #include <linux/kdev_t.h> #include <linux/errseq.h> DECLARE_EVENT_CLASS(mm_filemap_op_page_cache, TP_PROTO(struct folio *folio), TP_ARGS(folio), TP_STRUCT__entry( __field(unsigned long, pfn) __field(unsigned long, i_ino) __field(unsigned long, index) __field(dev_t, s_dev) __field(unsigned char, order) ), TP_fast_assign( __entry->pfn = folio_pfn(folio); __entry->i_ino = folio->mapping->host->i_ino; __entry->index = folio->index; if (folio->mapping->host->i_sb) __entry->s_dev = folio->mapping->host->i_sb->s_dev; else __entry->s_dev = folio->mapping->host->i_rdev; __entry->order = folio_order(folio); ), TP_printk("dev %d:%d ino %lx pfn=0x%lx ofs=%lu order=%u", MAJOR(__entry->s_dev), MINOR(__entry->s_dev), __entry->i_ino, __entry->pfn, __entry->index << PAGE_SHIFT, __entry->order) ); DEFINE_EVENT(mm_filemap_op_page_cache, mm_filemap_delete_from_page_cache, TP_PROTO(struct folio *folio), TP_ARGS(folio) ); DEFINE_EVENT(mm_filemap_op_page_cache, mm_filemap_add_to_page_cache, TP_PROTO(struct folio *folio), TP_ARGS(folio) ); DECLARE_EVENT_CLASS(mm_filemap_op_page_cache_range, TP_PROTO( struct address_space *mapping, pgoff_t index, pgoff_t last_index ), TP_ARGS(mapping, index, last_index), TP_STRUCT__entry( __field(unsigned long, i_ino) __field(dev_t, s_dev) __field(unsigned long, index) __field(unsigned long, last_index) ), TP_fast_assign( __entry->i_ino = mapping->host->i_ino; if (mapping->host->i_sb) __entry->s_dev = mapping->host->i_sb->s_dev; else __entry->s_dev = mapping->host->i_rdev; __entry->index = index; __entry->last_index = last_index; ), TP_printk( "dev=%d:%d ino=%lx ofs=%lld-%lld", MAJOR(__entry->s_dev), MINOR(__entry->s_dev), __entry->i_ino, ((loff_t)__entry->index) << PAGE_SHIFT, ((((loff_t)__entry->last_index + 1) << PAGE_SHIFT) - 1) ) ); DEFINE_EVENT(mm_filemap_op_page_cache_range, mm_filemap_get_pages, TP_PROTO( struct address_space *mapping, pgoff_t index, pgoff_t last_index ), TP_ARGS(mapping, index, last_index) ); DEFINE_EVENT(mm_filemap_op_page_cache_range, mm_filemap_map_pages, TP_PROTO( struct address_space *mapping, pgoff_t index, pgoff_t last_index ), TP_ARGS(mapping, index, last_index) ); TRACE_EVENT(mm_filemap_fault, TP_PROTO(struct address_space *mapping, pgoff_t index), TP_ARGS(mapping, index), TP_STRUCT__entry( __field(unsigned long, i_ino) __field(dev_t, s_dev) __field(unsigned long, index) ), TP_fast_assign( __entry->i_ino = mapping->host->i_ino; if (mapping->host->i_sb) __entry->s_dev = mapping->host->i_sb->s_dev; else __entry->s_dev = mapping->host->i_rdev; __entry->index = index; ), TP_printk( "dev=%d:%d ino=%lx ofs=%lld", MAJOR(__entry->s_dev), MINOR(__entry->s_dev), __entry->i_ino, ((loff_t)__entry->index) << PAGE_SHIFT ) ); TRACE_EVENT(filemap_set_wb_err, TP_PROTO(struct address_space *mapping, errseq_t eseq), TP_ARGS(mapping, eseq), TP_STRUCT__entry( __field(unsigned long, i_ino) __field(dev_t, s_dev) __field(errseq_t, errseq) ), TP_fast_assign( __entry->i_ino = mapping->host->i_ino; __entry->errseq = eseq; if (mapping->host->i_sb) __entry->s_dev = mapping->host->i_sb->s_dev; else __entry->s_dev = mapping->host->i_rdev; ), TP_printk("dev=%d:%d ino=0x%lx errseq=0x%x", MAJOR(__entry->s_dev), MINOR(__entry->s_dev), __entry->i_ino, __entry->errseq) ); TRACE_EVENT(file_check_and_advance_wb_err, TP_PROTO(struct file *file, errseq_t old), TP_ARGS(file, old), TP_STRUCT__entry( __field(struct file *, file) __field(unsigned long, i_ino) __field(dev_t, s_dev) __field(errseq_t, old) __field(errseq_t, new) ), TP_fast_assign( __entry->file = file; __entry->i_ino = file->f_mapping->host->i_ino; if (file->f_mapping->host->i_sb) __entry->s_dev = file->f_mapping->host->i_sb->s_dev; else __entry->s_dev = file->f_mapping->host->i_rdev; __entry->old = old; __entry->new = file->f_wb_err; ), TP_printk("file=%p dev=%d:%d ino=0x%lx old=0x%x new=0x%x", __entry->file, MAJOR(__entry->s_dev), MINOR(__entry->s_dev), __entry->i_ino, __entry->old, __entry->new) ); #endif /* _TRACE_FILEMAP_H */ /* This part must be outside protection */ #include <trace/define_trace.h> |
| 66 78 58 57 58 58 39 58 2 58 78 75 74 75 36 29 2 29 29 6 2 2 24 2 2 1 2 24 44 6 6 44 44 44 44 34 33 33 33 33 31 6 29 40 40 40 10 3 58 120 60 58 48 122 95 96 96 95 96 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 | // SPDX-License-Identifier: GPL-2.0 /* * SUCS NET3: * * Generic stream handling routines. These are generic for most * protocols. Even IP. Tonight 8-). * This is used because TCP, LLC (others too) layer all have mostly * identical sendmsg() and recvmsg() code. * So we (will) share it here. * * Authors: Arnaldo Carvalho de Melo <acme@conectiva.com.br> * (from old tcp.c code) * Alan Cox <alan@lxorguk.ukuu.org.uk> (Borrowed comments 8-)) */ #include <linux/module.h> #include <linux/sched/signal.h> #include <linux/net.h> #include <linux/signal.h> #include <linux/tcp.h> #include <linux/wait.h> #include <net/sock.h> /** * sk_stream_write_space - stream socket write_space callback. * @sk: pointer to the socket structure * * This function is invoked when there's space available in the socket's * send buffer for writing. It first checks if the socket is writable, * clears the SOCK_NOSPACE flag indicating that memory for writing * is now available, wakes up any processes waiting for write operations * and sends asynchronous notifications if needed. */ void sk_stream_write_space(struct sock *sk) { struct socket *sock = sk->sk_socket; struct socket_wq *wq; if (__sk_stream_is_writeable(sk, 1) && sock) { clear_bit(SOCK_NOSPACE, &sock->flags); rcu_read_lock(); wq = rcu_dereference(sk->sk_wq); if (skwq_has_sleeper(wq)) wake_up_interruptible_poll(&wq->wait, EPOLLOUT | EPOLLWRNORM | EPOLLWRBAND); if (wq && wq->fasync_list && !(sk->sk_shutdown & SEND_SHUTDOWN)) sock_wake_async(wq, SOCK_WAKE_SPACE, POLL_OUT); rcu_read_unlock(); } } /** * sk_stream_wait_connect - Wait for a socket to get into the connected state * @sk: sock to wait on * @timeo_p: for how long to wait * * Must be called with the socket locked. */ int sk_stream_wait_connect(struct sock *sk, long *timeo_p) { DEFINE_WAIT_FUNC(wait, woken_wake_function); struct task_struct *tsk = current; int done; do { int err = sock_error(sk); if (err) return err; if ((1 << sk->sk_state) & ~(TCPF_SYN_SENT | TCPF_SYN_RECV)) return -EPIPE; if (!*timeo_p) return -EAGAIN; if (signal_pending(tsk)) return sock_intr_errno(*timeo_p); add_wait_queue(sk_sleep(sk), &wait); sk->sk_write_pending++; done = sk_wait_event(sk, timeo_p, !READ_ONCE(sk->sk_err) && !((1 << READ_ONCE(sk->sk_state)) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)), &wait); remove_wait_queue(sk_sleep(sk), &wait); sk->sk_write_pending--; } while (!done); return done < 0 ? done : 0; } EXPORT_SYMBOL(sk_stream_wait_connect); /** * sk_stream_closing - Return 1 if we still have things to send in our buffers. * @sk: socket to verify */ static int sk_stream_closing(const struct sock *sk) { return (1 << READ_ONCE(sk->sk_state)) & (TCPF_FIN_WAIT1 | TCPF_CLOSING | TCPF_LAST_ACK); } void sk_stream_wait_close(struct sock *sk, long timeout) { if (timeout) { DEFINE_WAIT_FUNC(wait, woken_wake_function); add_wait_queue(sk_sleep(sk), &wait); do { if (sk_wait_event(sk, &timeout, !sk_stream_closing(sk), &wait)) break; } while (!signal_pending(current) && timeout); remove_wait_queue(sk_sleep(sk), &wait); } } EXPORT_SYMBOL(sk_stream_wait_close); /** * sk_stream_wait_memory - Wait for more memory for a socket * @sk: socket to wait for memory * @timeo_p: for how long */ int sk_stream_wait_memory(struct sock *sk, long *timeo_p) { int ret, err = 0; long vm_wait = 0; long current_timeo = *timeo_p; DEFINE_WAIT_FUNC(wait, woken_wake_function); if (sk_stream_memory_free(sk)) current_timeo = vm_wait = get_random_u32_below(HZ / 5) + 2; add_wait_queue(sk_sleep(sk), &wait); while (1) { sk_set_bit(SOCKWQ_ASYNC_NOSPACE, sk); if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN)) goto do_error; if (!*timeo_p) goto do_eagain; if (signal_pending(current)) goto do_interrupted; sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk); if (sk_stream_memory_free(sk) && !vm_wait) break; set_bit(SOCK_NOSPACE, &sk->sk_socket->flags); sk->sk_write_pending++; ret = sk_wait_event(sk, ¤t_timeo, READ_ONCE(sk->sk_err) || (READ_ONCE(sk->sk_shutdown) & SEND_SHUTDOWN) || (sk_stream_memory_free(sk) && !vm_wait), &wait); sk->sk_write_pending--; if (ret < 0) goto do_error; if (vm_wait) { vm_wait -= current_timeo; current_timeo = *timeo_p; if (current_timeo != MAX_SCHEDULE_TIMEOUT && (current_timeo -= vm_wait) < 0) current_timeo = 0; vm_wait = 0; } *timeo_p = current_timeo; } out: if (!sock_flag(sk, SOCK_DEAD)) remove_wait_queue(sk_sleep(sk), &wait); return err; do_error: err = -EPIPE; goto out; do_eagain: /* Make sure that whenever EAGAIN is returned, EPOLLOUT event can * be generated later. * When TCP receives ACK packets that make room, tcp_check_space() * only calls tcp_new_space() if SOCK_NOSPACE is set. */ set_bit(SOCK_NOSPACE, &sk->sk_socket->flags); err = -EAGAIN; goto out; do_interrupted: err = sock_intr_errno(*timeo_p); goto out; } EXPORT_SYMBOL(sk_stream_wait_memory); int sk_stream_error(struct sock *sk, int flags, int err) { if (err == -EPIPE) err = sock_error(sk) ? : -EPIPE; if (err == -EPIPE && !(flags & MSG_NOSIGNAL)) send_sig(SIGPIPE, current, 0); return err; } EXPORT_SYMBOL(sk_stream_error); void sk_stream_kill_queues(struct sock *sk) { /* First the read buffer. */ __skb_queue_purge(&sk->sk_receive_queue); /* Next, the error queue. * We need to use queue lock, because other threads might * add packets to the queue without socket lock being held. */ skb_queue_purge(&sk->sk_error_queue); /* Next, the write queue. */ WARN_ON_ONCE(!skb_queue_empty(&sk->sk_write_queue)); /* Account for returned memory. */ sk_mem_reclaim_final(sk); WARN_ON_ONCE(sk->sk_wmem_queued); /* It is _impossible_ for the backlog to contain anything * when we get here. All user references to this socket * have gone away, only the net layer knows can touch it. */ } EXPORT_SYMBOL(sk_stream_kill_queues); |
| 172 42 171 120 173 19 92 10 82 144 64 218 16 3 5 12 61 60 59 58 46 46 45 6 8 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 | /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _LINUX_SWAPOPS_H #define _LINUX_SWAPOPS_H #include <linux/radix-tree.h> #include <linux/bug.h> #include <linux/mm_types.h> #ifdef CONFIG_MMU #ifdef CONFIG_SWAP #include <linux/swapfile.h> #endif /* CONFIG_SWAP */ /* * swapcache pages are stored in the swapper_space radix tree. We want to * get good packing density in that tree, so the index should be dense in * the low-order bits. * * We arrange the `type' and `offset' fields so that `type' is at the six * high-order bits of the swp_entry_t and `offset' is right-aligned in the * remaining bits. Although `type' itself needs only five bits, we allow for * shmem/tmpfs to shift it all up a further one bit: see swp_to_radix_entry(). * * swp_entry_t's are *never* stored anywhere in their arch-dependent format. */ #define SWP_TYPE_SHIFT (BITS_PER_XA_VALUE - MAX_SWAPFILES_SHIFT) #define SWP_OFFSET_MASK ((1UL << SWP_TYPE_SHIFT) - 1) /* * Definitions only for PFN swap entries (see leafeant_has_pfn()). To * store PFN, we only need SWP_PFN_BITS bits. Each of the pfn swap entries * can use the extra bits to store other information besides PFN. */ #ifdef MAX_PHYSMEM_BITS #define SWP_PFN_BITS (MAX_PHYSMEM_BITS - PAGE_SHIFT) #else /* MAX_PHYSMEM_BITS */ #define SWP_PFN_BITS min_t(int, \ sizeof(phys_addr_t) * 8 - PAGE_SHIFT, \ SWP_TYPE_SHIFT) #endif /* MAX_PHYSMEM_BITS */ #define SWP_PFN_MASK (BIT(SWP_PFN_BITS) - 1) /** * Migration swap entry specific bitfield definitions. Layout: * * |----------+--------------------| * | swp_type | swp_offset | * |----------+--------+-+-+-------| * | | resv |D|A| PFN | * |----------+--------+-+-+-------| * * @SWP_MIG_YOUNG_BIT: Whether the page used to have young bit set (bit A) * @SWP_MIG_DIRTY_BIT: Whether the page used to have dirty bit set (bit D) * * Note: A/D bits will be stored in migration entries iff there're enough * free bits in arch specific swp offset. By default we'll ignore A/D bits * when migrating a page. Please refer to migration_entry_supports_ad() * for more information. If there're more bits besides PFN and A/D bits, * they should be reserved and always be zeros. */ #define SWP_MIG_YOUNG_BIT (SWP_PFN_BITS) #define SWP_MIG_DIRTY_BIT (SWP_PFN_BITS + 1) #define SWP_MIG_TOTAL_BITS (SWP_PFN_BITS + 2) #define SWP_MIG_YOUNG BIT(SWP_MIG_YOUNG_BIT) #define SWP_MIG_DIRTY BIT(SWP_MIG_DIRTY_BIT) /* Clear all flags but only keep swp_entry_t related information */ static inline pte_t pte_swp_clear_flags(pte_t pte) { if (pte_swp_exclusive(pte)) pte = pte_swp_clear_exclusive(pte); if (pte_swp_soft_dirty(pte)) pte = pte_swp_clear_soft_dirty(pte); if (pte_swp_uffd_wp(pte)) pte = pte_swp_clear_uffd_wp(pte); return pte; } /* * Store a type+offset into a swp_entry_t in an arch-independent format */ static inline swp_entry_t swp_entry(unsigned long type, pgoff_t offset) { swp_entry_t ret; ret.val = (type << SWP_TYPE_SHIFT) | (offset & SWP_OFFSET_MASK); return ret; } /* * Extract the `type' field from a swp_entry_t. The swp_entry_t is in * arch-independent format */ static inline unsigned swp_type(swp_entry_t entry) { return (entry.val >> SWP_TYPE_SHIFT); } /* * Extract the `offset' field from a swp_entry_t. The swp_entry_t is in * arch-independent format */ static inline pgoff_t swp_offset(swp_entry_t entry) { return entry.val & SWP_OFFSET_MASK; } /* * Convert the arch-independent representation of a swp_entry_t into the * arch-dependent pte representation. */ static inline pte_t swp_entry_to_pte(swp_entry_t entry) { swp_entry_t arch_entry; arch_entry = __swp_entry(swp_type(entry), swp_offset(entry)); return __swp_entry_to_pte(arch_entry); } static inline swp_entry_t radix_to_swp_entry(void *arg) { swp_entry_t entry; entry.val = xa_to_value(arg); return entry; } static inline void *swp_to_radix_entry(swp_entry_t entry) { return xa_mk_value(entry.val); } #if IS_ENABLED(CONFIG_DEVICE_PRIVATE) static inline swp_entry_t make_readable_device_private_entry(pgoff_t offset) { return swp_entry(SWP_DEVICE_READ, offset); } static inline swp_entry_t make_writable_device_private_entry(pgoff_t offset) { return swp_entry(SWP_DEVICE_WRITE, offset); } static inline swp_entry_t make_device_exclusive_entry(pgoff_t offset) { return swp_entry(SWP_DEVICE_EXCLUSIVE, offset); } #else /* CONFIG_DEVICE_PRIVATE */ static inline swp_entry_t make_readable_device_private_entry(pgoff_t offset) { return swp_entry(0, 0); } static inline swp_entry_t make_writable_device_private_entry(pgoff_t offset) { return swp_entry(0, 0); } static inline swp_entry_t make_device_exclusive_entry(pgoff_t offset) { return swp_entry(0, 0); } #endif /* CONFIG_DEVICE_PRIVATE */ #ifdef CONFIG_MIGRATION static inline swp_entry_t make_readable_migration_entry(pgoff_t offset) { return swp_entry(SWP_MIGRATION_READ, offset); } static inline swp_entry_t make_readable_exclusive_migration_entry(pgoff_t offset) { return swp_entry(SWP_MIGRATION_READ_EXCLUSIVE, offset); } static inline swp_entry_t make_writable_migration_entry(pgoff_t offset) { return swp_entry(SWP_MIGRATION_WRITE, offset); } /* * Returns whether the host has large enough swap offset field to support * carrying over pgtable A/D bits for page migrations. The result is * pretty much arch specific. */ static inline bool migration_entry_supports_ad(void) { #ifdef CONFIG_SWAP return swap_migration_ad_supported; #else /* CONFIG_SWAP */ return false; #endif /* CONFIG_SWAP */ } static inline swp_entry_t make_migration_entry_young(swp_entry_t entry) { if (migration_entry_supports_ad()) return swp_entry(swp_type(entry), swp_offset(entry) | SWP_MIG_YOUNG); return entry; } static inline swp_entry_t make_migration_entry_dirty(swp_entry_t entry) { if (migration_entry_supports_ad()) return swp_entry(swp_type(entry), swp_offset(entry) | SWP_MIG_DIRTY); return entry; } extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd, unsigned long address); extern void migration_entry_wait_huge(struct vm_area_struct *vma, unsigned long addr, pte_t *pte); #else /* CONFIG_MIGRATION */ static inline swp_entry_t make_readable_migration_entry(pgoff_t offset) { return swp_entry(0, 0); } static inline swp_entry_t make_readable_exclusive_migration_entry(pgoff_t offset) { return swp_entry(0, 0); } static inline swp_entry_t make_writable_migration_entry(pgoff_t offset) { return swp_entry(0, 0); } static inline void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd, unsigned long address) { } static inline void migration_entry_wait_huge(struct vm_area_struct *vma, unsigned long addr, pte_t *pte) { } static inline swp_entry_t make_migration_entry_young(swp_entry_t entry) { return entry; } static inline swp_entry_t make_migration_entry_dirty(swp_entry_t entry) { return entry; } #endif /* CONFIG_MIGRATION */ #ifdef CONFIG_MEMORY_FAILURE /* * Support for hardware poisoned pages */ static inline swp_entry_t make_hwpoison_entry(struct page *page) { BUG_ON(!PageLocked(page)); return swp_entry(SWP_HWPOISON, page_to_pfn(page)); } static inline int is_hwpoison_entry(swp_entry_t entry) { return swp_type(entry) == SWP_HWPOISON; } #else static inline swp_entry_t make_hwpoison_entry(struct page *page) { return swp_entry(0, 0); } static inline int is_hwpoison_entry(swp_entry_t swp) { return 0; } #endif typedef unsigned long pte_marker; #define PTE_MARKER_UFFD_WP BIT(0) /* * "Poisoned" here is meant in the very general sense of "future accesses are * invalid", instead of referring very specifically to hardware memory errors. * This marker is meant to represent any of various different causes of this. * * Note that, when encountered by the faulting logic, PTEs with this marker will * result in VM_FAULT_HWPOISON and thus regardless trigger hardware memory error * logic. */ #define PTE_MARKER_POISONED BIT(1) /* * Indicates that, on fault, this PTE will case a SIGSEGV signal to be * sent. This means guard markers behave in effect as if the region were mapped * PROT_NONE, rather than if they were a memory hole or equivalent. */ #define PTE_MARKER_GUARD BIT(2) #define PTE_MARKER_MASK (BIT(3) - 1) static inline swp_entry_t make_pte_marker_entry(pte_marker marker) { return swp_entry(SWP_PTE_MARKER, marker); } static inline pte_t make_pte_marker(pte_marker marker) { return swp_entry_to_pte(make_pte_marker_entry(marker)); } static inline swp_entry_t make_poisoned_swp_entry(void) { return make_pte_marker_entry(PTE_MARKER_POISONED); } static inline swp_entry_t make_guard_swp_entry(void) { return make_pte_marker_entry(PTE_MARKER_GUARD); } struct page_vma_mapped_walk; #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION extern int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw, struct page *page); extern void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new); extern void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd); static inline pmd_t swp_entry_to_pmd(swp_entry_t entry) { swp_entry_t arch_entry; arch_entry = __swp_entry(swp_type(entry), swp_offset(entry)); return __swp_entry_to_pmd(arch_entry); } #else /* CONFIG_ARCH_ENABLE_THP_MIGRATION */ static inline int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw, struct page *page) { BUILD_BUG(); } static inline void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new) { BUILD_BUG(); } static inline void pmd_migration_entry_wait(struct mm_struct *m, pmd_t *p) { } static inline pmd_t swp_entry_to_pmd(swp_entry_t entry) { return __pmd(0); } #endif /* CONFIG_ARCH_ENABLE_THP_MIGRATION */ #endif /* CONFIG_MMU */ #endif /* _LINUX_SWAPOPS_H */ |
| 1 1 1 1 1 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 | // SPDX-License-Identifier: GPL-2.0-only /****************************************************************************** ******************************************************************************* ** ** Copyright (C) Sistina Software, Inc. 1997-2003 All rights reserved. ** Copyright (C) 2004-2009 Red Hat, Inc. All rights reserved. ** ** ******************************************************************************* ******************************************************************************/ /* * lowcomms.c * * This is the "low-level" comms layer. * * It is responsible for sending/receiving messages * from other nodes in the cluster. * * Cluster nodes are referred to by their nodeids. nodeids are * simply 32 bit numbers to the locking module - if they need to * be expanded for the cluster infrastructure then that is its * responsibility. It is this layer's * responsibility to resolve these into IP address or * whatever it needs for inter-node communication. * * The comms level is two kernel threads that deal mainly with * the receiving of messages from other nodes and passing them * up to the mid-level comms layer (which understands the * message format) for execution by the locking core, and * a send thread which does all the setting up of connections * to remote nodes and the sending of data. Threads are not allowed * to send their own data because it may cause them to wait in times * of high load. Also, this way, the sending thread can collect together * messages bound for one node and send them in one block. * * lowcomms will choose to use either TCP or SCTP as its transport layer * depending on the configuration variable 'protocol'. This should be set * to 0 (default) for TCP or 1 for SCTP. It should be configured using a * cluster-wide mechanism as it must be the same on all nodes of the cluster * for the DLM to function. * */ #include <asm/ioctls.h> #include <net/sock.h> #include <net/tcp.h> #include <linux/pagemap.h> #include <linux/file.h> #include <linux/mutex.h> #include <linux/sctp.h> #include <linux/slab.h> #include <net/sctp/sctp.h> #include <net/ipv6.h> #include <trace/events/dlm.h> #include <trace/events/sock.h> #include "dlm_internal.h" #include "lowcomms.h" #include "midcomms.h" #include "memory.h" #include "config.h" #define DLM_SHUTDOWN_WAIT_TIMEOUT msecs_to_jiffies(5000) #define DLM_MAX_PROCESS_BUFFERS 24 #define NEEDED_RMEM (4*1024*1024) struct connection { struct socket *sock; /* NULL if not connected */ uint32_t nodeid; /* So we know who we are in the list */ /* this semaphore is used to allow parallel recv/send in read * lock mode. When we release a sock we need to held the write lock. * * However this is locking code and not nice. When we remove the * othercon handling we can look into other mechanism to synchronize * io handling to call sock_release() at the right time. */ struct rw_semaphore sock_lock; unsigned long flags; #define CF_APP_LIMITED 0 #define CF_RECV_PENDING 1 #define CF_SEND_PENDING 2 #define CF_RECV_INTR 3 #define CF_IO_STOP 4 #define CF_IS_OTHERCON 5 struct list_head writequeue; /* List of outgoing writequeue_entries */ spinlock_t writequeue_lock; int retries; struct hlist_node list; /* due some connect()/accept() races we currently have this cross over * connection attempt second connection for one node. * * There is a solution to avoid the race by introducing a connect * rule as e.g. our_nodeid > nodeid_to_connect who is allowed to * connect. Otherside can connect but will only be considered that * the other side wants to have a reconnect. * * However changing to this behaviour will break backwards compatible. * In a DLM protocol major version upgrade we should remove this! */ struct connection *othercon; struct work_struct rwork; /* receive worker */ struct work_struct swork; /* send worker */ wait_queue_head_t shutdown_wait; unsigned char rx_leftover_buf[DLM_MAX_SOCKET_BUFSIZE]; int rx_leftover; int mark; int addr_count; int curr_addr_index; struct sockaddr_storage addr[DLM_MAX_ADDR_COUNT]; spinlock_t addrs_lock; struct rcu_head rcu; }; #define sock2con(x) ((struct connection *)(x)->sk_user_data) struct listen_connection { struct socket *sock; struct work_struct rwork; }; #define DLM_WQ_REMAIN_BYTES(e) (PAGE_SIZE - e->end) #define DLM_WQ_LENGTH_BYTES(e) (e->end - e->offset) /* An entry waiting to be sent */ struct writequeue_entry { struct list_head list; struct page *page; int offset; int len; int end; int users; bool dirty; struct connection *con; struct list_head msgs; struct kref ref; }; struct dlm_msg { struct writequeue_entry *entry; struct dlm_msg *orig_msg; bool retransmit; void *ppc; int len; int idx; /* new()/commit() idx exchange */ struct list_head list; struct kref ref; }; struct processqueue_entry { unsigned char *buf; int nodeid; int buflen; struct list_head list; }; struct dlm_proto_ops { bool try_new_addr; const char *name; int proto; int how; void (*sockopts)(struct socket *sock); int (*bind)(struct socket *sock); int (*listen_validate)(void); void (*listen_sockopts)(struct socket *sock); int (*listen_bind)(struct socket *sock); }; static struct listen_sock_callbacks { void (*sk_error_report)(struct sock *); void (*sk_data_ready)(struct sock *); void (*sk_state_change)(struct sock *); void (*sk_write_space)(struct sock *); } listen_sock; static struct listen_connection listen_con; static struct sockaddr_storage dlm_local_addr[DLM_MAX_ADDR_COUNT]; static int dlm_local_count; /* Work queues */ static struct workqueue_struct *io_workqueue; static struct workqueue_struct *process_workqueue; static struct hlist_head connection_hash[CONN_HASH_SIZE]; static DEFINE_SPINLOCK(connections_lock); DEFINE_STATIC_SRCU(connections_srcu); static const struct dlm_proto_ops *dlm_proto_ops; #define DLM_IO_SUCCESS 0 #define DLM_IO_END 1 #define DLM_IO_EOF 2 #define DLM_IO_RESCHED 3 #define DLM_IO_FLUSH 4 static void process_recv_sockets(struct work_struct *work); static void process_send_sockets(struct work_struct *work); static void process_dlm_messages(struct work_struct *work); static DECLARE_WORK(process_work, process_dlm_messages); static DEFINE_SPINLOCK(processqueue_lock); static bool process_dlm_messages_pending; static DECLARE_WAIT_QUEUE_HEAD(processqueue_wq); static atomic_t processqueue_count; static LIST_HEAD(processqueue); bool dlm_lowcomms_is_running(void) { return !!listen_con.sock; } static void lowcomms_queue_swork(struct connection *con) { assert_spin_locked(&con->writequeue_lock); if (!test_bit(CF_IO_STOP, &con->flags) && !test_bit(CF_APP_LIMITED, &con->flags) && !test_and_set_bit(CF_SEND_PENDING, &con->flags)) queue_work(io_workqueue, &con->swork); } static void lowcomms_queue_rwork(struct connection *con) { #ifdef CONFIG_LOCKDEP WARN_ON_ONCE(!lockdep_sock_is_held(con->sock->sk)); #endif if (!test_bit(CF_IO_STOP, &con->flags) && !test_and_set_bit(CF_RECV_PENDING, &con->flags)) queue_work(io_workqueue, &con->rwork); } static void writequeue_entry_ctor(void *data) { struct writequeue_entry *entry = data; INIT_LIST_HEAD(&entry->msgs); } struct kmem_cache *dlm_lowcomms_writequeue_cache_create(void) { return kmem_cache_create("dlm_writequeue", sizeof(struct writequeue_entry), 0, 0, writequeue_entry_ctor); } struct kmem_cache *dlm_lowcomms_msg_cache_create(void) { return KMEM_CACHE(dlm_msg, 0); } /* need to held writequeue_lock */ static struct writequeue_entry *con_next_wq(struct connection *con) { struct writequeue_entry *e; e = list_first_entry_or_null(&con->writequeue, struct writequeue_entry, list); /* if len is zero nothing is to send, if there are users filling * buffers we wait until the users are done so we can send more. */ if (!e || e->users || e->len == 0) return NULL; return e; } static struct connection *__find_con(int nodeid, int r) { struct connection *con; hlist_for_each_entry_rcu(con, &connection_hash[r], list) { if (con->nodeid == nodeid) return con; } return NULL; } static void dlm_con_init(struct connection *con, int nodeid) { con->nodeid = nodeid; init_rwsem(&con->sock_lock); INIT_LIST_HEAD(&con->writequeue); spin_lock_init(&con->writequeue_lock); INIT_WORK(&con->swork, process_send_sockets); INIT_WORK(&con->rwork, process_recv_sockets); spin_lock_init(&con->addrs_lock); init_waitqueue_head(&con->shutdown_wait); } /* * If 'allocation' is zero then we don't attempt to create a new * connection structure for this node. */ static struct connection *nodeid2con(int nodeid, gfp_t alloc) { struct connection *con, *tmp; int r; r = nodeid_hash(nodeid); con = __find_con(nodeid, r); if (con || !alloc) return con; con = kzalloc_obj(*con, alloc); if (!con) return NULL; dlm_con_init(con, nodeid); spin_lock(&connections_lock); /* Because multiple workqueues/threads calls this function it can * race on multiple cpu's. Instead of locking hot path __find_con() * we just check in rare cases of recently added nodes again * under protection of connections_lock. If this is the case we * abort our connection creation and return the existing connection. */ tmp = __find_con(nodeid, r); if (tmp) { spin_unlock(&connections_lock); kfree(con); return tmp; } hlist_add_head_rcu(&con->list, &connection_hash[r]); spin_unlock(&connections_lock); return con; } static int addr_compare(const struct sockaddr_storage *x, const struct sockaddr_storage *y) { switch (x->ss_family) { case AF_INET: { struct sockaddr_in *sinx = (struct sockaddr_in *)x; struct sockaddr_in *siny = (struct sockaddr_in *)y; if (sinx->sin_addr.s_addr != siny->sin_addr.s_addr) return 0; if (sinx->sin_port != siny->sin_port) return 0; break; } case AF_INET6: { struct sockaddr_in6 *sinx = (struct sockaddr_in6 *)x; struct sockaddr_in6 *siny = (struct sockaddr_in6 *)y; if (!ipv6_addr_equal(&sinx->sin6_addr, &siny->sin6_addr)) return 0; if (sinx->sin6_port != siny->sin6_port) return 0; break; } default: return 0; } return 1; } static int nodeid_to_addr(int nodeid, struct sockaddr_storage *sas_out, struct sockaddr *sa_out, bool try_new_addr, unsigned int *mark) { struct sockaddr_storage sas; struct connection *con; int idx; if (!dlm_local_count) return -1; idx = srcu_read_lock(&connections_srcu); con = nodeid2con(nodeid, 0); if (!con) { srcu_read_unlock(&connections_srcu, idx); return -ENOENT; } spin_lock(&con->addrs_lock); if (!con->addr_count) { spin_unlock(&con->addrs_lock); srcu_read_unlock(&connections_srcu, idx); return -ENOENT; } memcpy(&sas, &con->addr[con->curr_addr_index], sizeof(struct sockaddr_storage)); if (try_new_addr) { con->curr_addr_index++; if (con->curr_addr_index == con->addr_count) con->curr_addr_index = 0; } *mark = con->mark; spin_unlock(&con->addrs_lock); if (sas_out) memcpy(sas_out, &sas, sizeof(struct sockaddr_storage)); if (!sa_out) { srcu_read_unlock(&connections_srcu, idx); return 0; } if (dlm_local_addr[0].ss_family == AF_INET) { struct sockaddr_in *in4 = (struct sockaddr_in *) &sas; struct sockaddr_in *ret4 = (struct sockaddr_in *) sa_out; ret4->sin_addr.s_addr = in4->sin_addr.s_addr; } else { struct sockaddr_in6 *in6 = (struct sockaddr_in6 *) &sas; struct sockaddr_in6 *ret6 = (struct sockaddr_in6 *) sa_out; ret6->sin6_addr = in6->sin6_addr; } srcu_read_unlock(&connections_srcu, idx); return 0; } static int addr_to_nodeid(struct sockaddr_storage *addr, int *nodeid, unsigned int *mark) { struct connection *con; int i, idx, addr_i; idx = srcu_read_lock(&connections_srcu); for (i = 0; i < CONN_HASH_SIZE; i++) { hlist_for_each_entry_rcu(con, &connection_hash[i], list) { WARN_ON_ONCE(!con->addr_count); spin_lock(&con->addrs_lock); for (addr_i = 0; addr_i < con->addr_count; addr_i++) { if (addr_compare(&con->addr[addr_i], addr)) { *nodeid = con->nodeid; *mark = con->mark; spin_unlock(&con->addrs_lock); srcu_read_unlock(&connections_srcu, idx); return 0; } } spin_unlock(&con->addrs_lock); } } srcu_read_unlock(&connections_srcu, idx); return -ENOENT; } static bool dlm_lowcomms_con_has_addr(const struct connection *con, const struct sockaddr_storage *addr) { int i; for (i = 0; i < con->addr_count; i++) { if (addr_compare(&con->addr[i], addr)) return true; } return false; } int dlm_lowcomms_addr(int nodeid, struct sockaddr_storage *addr) { struct connection *con; bool ret; int idx; idx = srcu_read_lock(&connections_srcu); con = nodeid2con(nodeid, GFP_NOFS); if (!con) { srcu_read_unlock(&connections_srcu, idx); return -ENOMEM; } spin_lock(&con->addrs_lock); if (!con->addr_count) { memcpy(&con->addr[0], addr, sizeof(*addr)); con->addr_count = 1; con->mark = dlm_config.ci_mark; spin_unlock(&con->addrs_lock); srcu_read_unlock(&connections_srcu, idx); return 0; } ret = dlm_lowcomms_con_has_addr(con, addr); if (ret) { spin_unlock(&con->addrs_lock); srcu_read_unlock(&connections_srcu, idx); return -EEXIST; } if (con->addr_count >= DLM_MAX_ADDR_COUNT) { spin_unlock(&con->addrs_lock); srcu_read_unlock(&connections_srcu, idx); return -ENOSPC; } memcpy(&con->addr[con->addr_count++], addr, sizeof(*addr)); srcu_read_unlock(&connections_srcu, idx); spin_unlock(&con->addrs_lock); return 0; } /* Data available on socket or listen socket received a connect */ static void lowcomms_data_ready(struct sock *sk) { struct connection *con = sock2con(sk); trace_sk_data_ready(sk); set_bit(CF_RECV_INTR, &con->flags); lowcomms_queue_rwork(con); } static void lowcomms_write_space(struct sock *sk) { struct connection *con = sock2con(sk); clear_bit(SOCK_NOSPACE, &con->sock->flags); spin_lock_bh(&con->writequeue_lock); if (test_and_clear_bit(CF_APP_LIMITED, &con->flags)) { con->sock->sk->sk_write_pending--; clear_bit(SOCKWQ_ASYNC_NOSPACE, &con->sock->flags); } lowcomms_queue_swork(con); spin_unlock_bh(&con->writequeue_lock); } static void lowcomms_state_change(struct sock *sk) { /* SCTP layer is not calling sk_data_ready when the connection * is done, so we catch the signal through here. */ if (sk->sk_shutdown & RCV_SHUTDOWN) lowcomms_data_ready(sk); } static void lowcomms_listen_data_ready(struct sock *sk) { trace_sk_data_ready(sk); queue_work(io_workqueue, &listen_con.rwork); } int dlm_lowcomms_connect_node(int nodeid) { struct connection *con; int idx; idx = srcu_read_lock(&connections_srcu); con = nodeid2con(nodeid, 0); if (WARN_ON_ONCE(!con)) { srcu_read_unlock(&connections_srcu, idx); return -ENOENT; } down_read(&con->sock_lock); if (!con->sock) { spin_lock_bh(&con->writequeue_lock); lowcomms_queue_swork(con); spin_unlock_bh(&con->writequeue_lock); } up_read(&con->sock_lock); srcu_read_unlock(&connections_srcu, idx); cond_resched(); return 0; } int dlm_lowcomms_nodes_set_mark(int nodeid, unsigned int mark) { struct connection *con; int idx; idx = srcu_read_lock(&connections_srcu); con = nodeid2con(nodeid, 0); if (!con) { srcu_read_unlock(&connections_srcu, idx); return -ENOENT; } spin_lock(&con->addrs_lock); con->mark = mark; spin_unlock(&con->addrs_lock); srcu_read_unlock(&connections_srcu, idx); return 0; } static void lowcomms_error_report(struct sock *sk) { struct connection *con = sock2con(sk); struct inet_sock *inet; inet = inet_sk(sk); switch (sk->sk_family) { case AF_INET: printk_ratelimited(KERN_ERR "dlm: node %d: socket error " "sending to node %d at %pI4, dport %d, " "sk_err=%d/%d\n", dlm_our_nodeid(), con->nodeid, &inet->inet_daddr, ntohs(inet->inet_dport), sk->sk_err, READ_ONCE(sk->sk_err_soft)); break; #if IS_ENABLED(CONFIG_IPV6) case AF_INET6: printk_ratelimited(KERN_ERR "dlm: node %d: socket error " "sending to node %d at %pI6c, " "dport %d, sk_err=%d/%d\n", dlm_our_nodeid(), con->nodeid, &sk->sk_v6_daddr, ntohs(inet->inet_dport), sk->sk_err, READ_ONCE(sk->sk_err_soft)); break; #endif default: printk_ratelimited(KERN_ERR "dlm: node %d: socket error " "invalid socket family %d set, " "sk_err=%d/%d\n", dlm_our_nodeid(), sk->sk_family, sk->sk_err, READ_ONCE(sk->sk_err_soft)); break; } dlm_midcomms_unack_msg_resend(con->nodeid); listen_sock.sk_error_report(sk); } static void restore_callbacks(struct sock *sk) { #ifdef CONFIG_LOCKDEP WARN_ON_ONCE(!lockdep_sock_is_held(sk)); #endif sk->sk_user_data = NULL; sk->sk_data_ready = listen_sock.sk_data_ready; sk->sk_state_change = listen_sock.sk_state_change; sk->sk_write_space = listen_sock.sk_write_space; sk->sk_error_report = listen_sock.sk_error_report; } /* Make a socket active */ static void add_sock(struct socket *sock, struct connection *con) { struct sock *sk = sock->sk; lock_sock(sk); con->sock = sock; sk->sk_user_data = con; sk->sk_data_ready = lowcomms_data_ready; sk->sk_write_space = lowcomms_write_space; if (dlm_config.ci_protocol == DLM_PROTO_SCTP) sk->sk_state_change = lowcomms_state_change; sk->sk_allocation = GFP_NOFS; sk->sk_use_task_frag = false; sk->sk_error_report = lowcomms_error_report; release_sock(sk); } /* Add the port number to an IPv6 or 4 sockaddr and return the address length */ static void make_sockaddr(struct sockaddr_storage *saddr, __be16 port, int *addr_len) { saddr->ss_family = dlm_local_addr[0].ss_family; if (saddr->ss_family == AF_INET) { struct sockaddr_in *in4_addr = (struct sockaddr_in *)saddr; in4_addr->sin_port = port; *addr_len = sizeof(struct sockaddr_in); memset(&in4_addr->sin_zero, 0, sizeof(in4_addr->sin_zero)); } else { struct sockaddr_in6 *in6_addr = (struct sockaddr_in6 *)saddr; in6_addr->sin6_port = port; *addr_len = sizeof(struct sockaddr_in6); } memset((char *)saddr + *addr_len, 0, sizeof(struct sockaddr_storage) - *addr_len); } static void dlm_page_release(struct kref *kref) { struct writequeue_entry *e = container_of(kref, struct writequeue_entry, ref); __free_page(e->page); dlm_free_writequeue(e); } static void dlm_msg_release(struct kref *kref) { struct dlm_msg *msg = container_of(kref, struct dlm_msg, ref); kref_put(&msg->entry->ref, dlm_page_release); dlm_free_msg(msg); } static void free_entry(struct writequeue_entry *e) { struct dlm_msg *msg, *tmp; list_for_each_entry_safe(msg, tmp, &e->msgs, list) { if (msg->orig_msg) { msg->orig_msg->retransmit = false; kref_put(&msg->orig_msg->ref, dlm_msg_release); } list_del(&msg->list); kref_put(&msg->ref, dlm_msg_release); } list_del(&e->list); kref_put(&e->ref, dlm_page_release); } static void dlm_close_sock(struct socket **sock) { lock_sock((*sock)->sk); restore_callbacks((*sock)->sk); release_sock((*sock)->sk); sock_release(*sock); *sock = NULL; } static void allow_connection_io(struct connection *con) { if (con->othercon) clear_bit(CF_IO_STOP, &con->othercon->flags); clear_bit(CF_IO_STOP, &con->flags); } static void stop_connection_io(struct connection *con) { if (con->othercon) stop_connection_io(con->othercon); spin_lock_bh(&con->writequeue_lock); set_bit(CF_IO_STOP, &con->flags); spin_unlock_bh(&con->writequeue_lock); down_write(&con->sock_lock); if (con->sock) { lock_sock(con->sock->sk); restore_callbacks(con->sock->sk); release_sock(con->sock->sk); } up_write(&con->sock_lock); cancel_work_sync(&con->swork); cancel_work_sync(&con->rwork); } /* Close a remote connection and tidy up */ static void close_connection(struct connection *con, bool and_other) { struct writequeue_entry *e; if (con->othercon && and_other) close_connection(con->othercon, false); down_write(&con->sock_lock); if (!con->sock) { up_write(&con->sock_lock); return; } dlm_close_sock(&con->sock); /* if we send a writequeue entry only a half way, we drop the * whole entry because reconnection and that we not start of the * middle of a msg which will confuse the other end. * * we can always drop messages because retransmits, but what we * cannot allow is to transmit half messages which may be processed * at the other side. * * our policy is to start on a clean state when disconnects, we don't * know what's send/received on transport layer in this case. */ spin_lock_bh(&con->writequeue_lock); if (!list_empty(&con->writequeue)) { e = list_first_entry(&con->writequeue, struct writequeue_entry, list); if (e->dirty) free_entry(e); } spin_unlock_bh(&con->writequeue_lock); con->rx_leftover = 0; con->retries = 0; clear_bit(CF_APP_LIMITED, &con->flags); clear_bit(CF_RECV_PENDING, &con->flags); clear_bit(CF_SEND_PENDING, &con->flags); up_write(&con->sock_lock); } static void shutdown_connection(struct connection *con, bool and_other) { int ret; if (con->othercon && and_other) shutdown_connection(con->othercon, false); flush_workqueue(io_workqueue); down_read(&con->sock_lock); /* nothing to shutdown */ if (!con->sock) { up_read(&con->sock_lock); return; } ret = kernel_sock_shutdown(con->sock, dlm_proto_ops->how); up_read(&con->sock_lock); if (ret) { log_print("Connection %p failed to shutdown: %d will force close", con, ret); goto force_close; } else { ret = wait_event_timeout(con->shutdown_wait, !con->sock, DLM_SHUTDOWN_WAIT_TIMEOUT); if (ret == 0) { log_print("Connection %p shutdown timed out, will force close", con); goto force_close; } } return; force_close: close_connection(con, false); } static struct processqueue_entry *new_processqueue_entry(int nodeid, int buflen) { struct processqueue_entry *pentry; pentry = kmalloc_obj(*pentry, GFP_NOFS); if (!pentry) return NULL; pentry->buf = kmalloc(buflen, GFP_NOFS); if (!pentry->buf) { kfree(pentry); return NULL; } pentry->nodeid = nodeid; return pentry; } static void free_processqueue_entry(struct processqueue_entry *pentry) { kfree(pentry->buf); kfree(pentry); } static void process_dlm_messages(struct work_struct *work) { struct processqueue_entry *pentry; spin_lock_bh(&processqueue_lock); pentry = list_first_entry_or_null(&processqueue, struct processqueue_entry, list); if (WARN_ON_ONCE(!pentry)) { process_dlm_messages_pending = false; spin_unlock_bh(&processqueue_lock); return; } list_del(&pentry->list); if (atomic_dec_and_test(&processqueue_count)) wake_up(&processqueue_wq); spin_unlock_bh(&processqueue_lock); for (;;) { dlm_process_incoming_buffer(pentry->nodeid, pentry->buf, pentry->buflen); free_processqueue_entry(pentry); spin_lock_bh(&processqueue_lock); pentry = list_first_entry_or_null(&processqueue, struct processqueue_entry, list); if (!pentry) { process_dlm_messages_pending = false; spin_unlock_bh(&processqueue_lock); break; } list_del(&pentry->list); if (atomic_dec_and_test(&processqueue_count)) wake_up(&processqueue_wq); spin_unlock_bh(&processqueue_lock); } } /* Data received from remote end */ static int receive_from_sock(struct connection *con, int buflen) { struct processqueue_entry *pentry; int ret, buflen_real; struct msghdr msg; struct kvec iov; pentry = new_processqueue_entry(con->nodeid, buflen); if (!pentry) return DLM_IO_RESCHED; memcpy(pentry->buf, con->rx_leftover_buf, con->rx_leftover); /* calculate new buffer parameter regarding last receive and * possible leftover bytes */ iov.iov_base = pentry->buf + con->rx_leftover; iov.iov_len = buflen - con->rx_leftover; memset(&msg, 0, sizeof(msg)); msg.msg_flags = MSG_DONTWAIT | MSG_NOSIGNAL; clear_bit(CF_RECV_INTR, &con->flags); again: ret = kernel_recvmsg(con->sock, &msg, &iov, 1, iov.iov_len, msg.msg_flags); trace_dlm_recv(con->nodeid, ret); if (ret == -EAGAIN) { lock_sock(con->sock->sk); if (test_and_clear_bit(CF_RECV_INTR, &con->flags)) { release_sock(con->sock->sk); goto again; } clear_bit(CF_RECV_PENDING, &con->flags); release_sock(con->sock->sk); free_processqueue_entry(pentry); return DLM_IO_END; } else if (ret == 0) { /* close will clear CF_RECV_PENDING */ free_processqueue_entry(pentry); return DLM_IO_EOF; } else if (ret < 0) { free_processqueue_entry(pentry); return ret; } /* new buflen according readed bytes and leftover from last receive */ buflen_real = ret + con->rx_leftover; ret = dlm_validate_incoming_buffer(con->nodeid, pentry->buf, buflen_real); if (ret < 0) { free_processqueue_entry(pentry); return ret; } pentry->buflen = ret; /* calculate leftover bytes from process and put it into begin of * the receive buffer, so next receive we have the full message * at the start address of the receive buffer. */ con->rx_leftover = buflen_real - ret; memmove(con->rx_leftover_buf, pentry->buf + ret, con->rx_leftover); spin_lock_bh(&processqueue_lock); ret = atomic_inc_return(&processqueue_count); list_add_tail(&pentry->list, &processqueue); if (!process_dlm_messages_pending) { process_dlm_messages_pending = true; queue_work(process_workqueue, &process_work); } spin_unlock_bh(&processqueue_lock); if (ret > DLM_MAX_PROCESS_BUFFERS) return DLM_IO_FLUSH; return DLM_IO_SUCCESS; } /* Listening socket is busy, accept a connection */ static int accept_from_sock(void) { struct sockaddr_storage peeraddr; int len, idx, result, nodeid; struct connection *newcon; struct socket *newsock; unsigned int mark; result = kernel_accept(listen_con.sock, &newsock, O_NONBLOCK); if (result == -EAGAIN) return DLM_IO_END; else if (result < 0) goto accept_err; /* Get the connected socket's peer */ memset(&peeraddr, 0, sizeof(peeraddr)); len = newsock->ops->getname(newsock, (struct sockaddr *)&peeraddr, 2); if (len < 0) { result = -ECONNABORTED; goto accept_err; } /* Get the new node's NODEID */ make_sockaddr(&peeraddr, 0, &len); if (addr_to_nodeid(&peeraddr, &nodeid, &mark)) { switch (peeraddr.ss_family) { case AF_INET: { struct sockaddr_in *sin = (struct sockaddr_in *)&peeraddr; log_print("connect from non cluster IPv4 node %pI4", &sin->sin_addr); break; } #if IS_ENABLED(CONFIG_IPV6) case AF_INET6: { struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)&peeraddr; log_print("connect from non cluster IPv6 node %pI6c", &sin6->sin6_addr); break; } #endif default: log_print("invalid family from non cluster node"); break; } sock_release(newsock); return -1; } log_print("got connection from %d", nodeid); /* Check to see if we already have a connection to this node. This * could happen if the two nodes initiate a connection at roughly * the same time and the connections cross on the wire. * In this case we store the incoming one in "othercon" */ idx = srcu_read_lock(&connections_srcu); newcon = nodeid2con(nodeid, 0); if (WARN_ON_ONCE(!newcon)) { srcu_read_unlock(&connections_srcu, idx); result = -ENOENT; goto accept_err; } sock_set_mark(newsock->sk, mark); down_write(&newcon->sock_lock); if (newcon->sock) { struct connection *othercon = newcon->othercon; if (!othercon) { othercon = kzalloc_obj(*othercon, GFP_NOFS); if (!othercon) { log_print("failed to allocate incoming socket"); up_write(&newcon->sock_lock); srcu_read_unlock(&connections_srcu, idx); result = -ENOMEM; goto accept_err; } dlm_con_init(othercon, nodeid); lockdep_set_subclass(&othercon->sock_lock, 1); newcon->othercon = othercon; set_bit(CF_IS_OTHERCON, &othercon->flags); } else { /* close other sock con if we have something new */ close_connection(othercon, false); } down_write(&othercon->sock_lock); add_sock(newsock, othercon); /* check if we receved something while adding */ lock_sock(othercon->sock->sk); lowcomms_queue_rwork(othercon); release_sock(othercon->sock->sk); up_write(&othercon->sock_lock); } else { /* accept copies the sk after we've saved the callbacks, so we don't want to save them a second time or comm errors will result in calling sk_error_report recursively. */ add_sock(newsock, newcon); /* check if we receved something while adding */ lock_sock(newcon->sock->sk); lowcomms_queue_rwork(newcon); release_sock(newcon->sock->sk); } up_write(&newcon->sock_lock); srcu_read_unlock(&connections_srcu, idx); return DLM_IO_SUCCESS; accept_err: if (newsock) sock_release(newsock); return result; } /* * writequeue_entry_complete - try to delete and free write queue entry * @e: write queue entry to try to delete * @completed: bytes completed * * writequeue_lock must be held. */ static void writequeue_entry_complete(struct writequeue_entry *e, int completed) { e->offset += completed; e->len -= completed; /* signal that page was half way transmitted */ e->dirty = true; if (e->len == 0 && e->users == 0) free_entry(e); } /* * sctp_bind_addrs - bind a SCTP socket to all our addresses */ static int sctp_bind_addrs(struct socket *sock, __be16 port) { struct sockaddr_storage localaddr; struct sockaddr_unsized *addr = (struct sockaddr_unsized *)&localaddr; int i, addr_len, result = 0; for (i = 0; i < dlm_local_count; i++) { memcpy(&localaddr, &dlm_local_addr[i], sizeof(localaddr)); make_sockaddr(&localaddr, port, &addr_len); if (!i) result = kernel_bind(sock, addr, addr_len); else result = sock_bind_add(sock->sk, addr, addr_len); if (result < 0) { log_print("Can't bind to %d addr number %d, %d.\n", port, i + 1, result); break; } } return result; } /* Get local addresses */ static void init_local(void) { struct sockaddr_storage sas; int i; dlm_local_count = 0; for (i = 0; i < DLM_MAX_ADDR_COUNT; i++) { if (dlm_our_addr(&sas, i)) break; memcpy(&dlm_local_addr[dlm_local_count++], &sas, sizeof(sas)); } } static struct writequeue_entry *new_writequeue_entry(struct connection *con) { struct writequeue_entry *entry; entry = dlm_allocate_writequeue(); if (!entry) return NULL; entry->page = alloc_page(GFP_ATOMIC | __GFP_ZERO); if (!entry->page) { dlm_free_writequeue(entry); return NULL; } entry->offset = 0; entry->len = 0; entry->end = 0; entry->dirty = false; entry->con = con; entry->users = 1; kref_init(&entry->ref); return entry; } static struct writequeue_entry *new_wq_entry(struct connection *con, int len, char **ppc, void (*cb)(void *data), void *data) { struct writequeue_entry *e; spin_lock_bh(&con->writequeue_lock); if (!list_empty(&con->writequeue)) { e = list_last_entry(&con->writequeue, struct writequeue_entry, list); if (DLM_WQ_REMAIN_BYTES(e) >= len) { kref_get(&e->ref); *ppc = page_address(e->page) + e->end; if (cb) cb(data); e->end += len; e->users++; goto out; } } e = new_writequeue_entry(con); if (!e) goto out; kref_get(&e->ref); *ppc = page_address(e->page); e->end += len; if (cb) cb(data); list_add_tail(&e->list, &con->writequeue); out: spin_unlock_bh(&con->writequeue_lock); return e; }; static struct dlm_msg *dlm_lowcomms_new_msg_con(struct connection *con, int len, char **ppc, void (*cb)(void *data), void *data) { struct writequeue_entry *e; struct dlm_msg *msg; msg = dlm_allocate_msg(); if (!msg) return NULL; kref_init(&msg->ref); e = new_wq_entry(con, len, ppc, cb, data); if (!e) { dlm_free_msg(msg); return NULL; } msg->retransmit = false; msg->orig_msg = NULL; msg->ppc = *ppc; msg->len = len; msg->entry = e; return msg; } /* avoid false positive for nodes_srcu, unlock happens in * dlm_lowcomms_commit_msg which is a must call if success */ #ifndef __CHECKER__ struct dlm_msg *dlm_lowcomms_new_msg(int nodeid, int len, char **ppc, void (*cb)(void *data), void *data) { struct connection *con; struct dlm_msg *msg; int idx; if (len > DLM_MAX_SOCKET_BUFSIZE || len < sizeof(struct dlm_header)) { BUILD_BUG_ON(PAGE_SIZE < DLM_MAX_SOCKET_BUFSIZE); log_print("failed to allocate a buffer of size %d", len); WARN_ON_ONCE(1); return NULL; } idx = srcu_read_lock(&connections_srcu); con = nodeid2con(nodeid, 0); if (WARN_ON_ONCE(!con)) { srcu_read_unlock(&connections_srcu, idx); return NULL; } msg = dlm_lowcomms_new_msg_con(con, len, ppc, cb, data); if (!msg) { srcu_read_unlock(&connections_srcu, idx); return NULL; } /* for dlm_lowcomms_commit_msg() */ kref_get(&msg->ref); /* we assume if successful commit must called */ msg->idx = idx; return msg; } #endif static void _dlm_lowcomms_commit_msg(struct dlm_msg *msg) { struct writequeue_entry *e = msg->entry; struct connection *con = e->con; int users; spin_lock_bh(&con->writequeue_lock); kref_get(&msg->ref); list_add(&msg->list, &e->msgs); users = --e->users; if (users) goto out; e->len = DLM_WQ_LENGTH_BYTES(e); lowcomms_queue_swork(con); out: spin_unlock_bh(&con->writequeue_lock); return; } /* avoid false positive for nodes_srcu, lock was happen in * dlm_lowcomms_new_msg */ #ifndef __CHECKER__ void dlm_lowcomms_commit_msg(struct dlm_msg *msg) { _dlm_lowcomms_commit_msg(msg); srcu_read_unlock(&connections_srcu, msg->idx); /* because dlm_lowcomms_new_msg() */ kref_put(&msg->ref, dlm_msg_release); } #endif void dlm_lowcomms_put_msg(struct dlm_msg *msg) { kref_put(&msg->ref, dlm_msg_release); } /* does not held connections_srcu, usage lowcomms_error_report only */ int dlm_lowcomms_resend_msg(struct dlm_msg *msg) { struct dlm_msg *msg_resend; char *ppc; if (msg->retransmit) return 1; msg_resend = dlm_lowcomms_new_msg_con(msg->entry->con, msg->len, &ppc, NULL, NULL); if (!msg_resend) return -ENOMEM; msg->retransmit = true; kref_get(&msg->ref); msg_resend->orig_msg = msg; memcpy(ppc, msg->ppc, msg->len); _dlm_lowcomms_commit_msg(msg_resend); dlm_lowcomms_put_msg(msg_resend); return 0; } /* Send a message */ static int send_to_sock(struct connection *con) { struct writequeue_entry *e; struct bio_vec bvec; struct msghdr msg = { .msg_flags = MSG_SPLICE_PAGES | MSG_DONTWAIT | MSG_NOSIGNAL, }; int len, offset, ret; spin_lock_bh(&con->writequeue_lock); e = con_next_wq(con); if (!e) { clear_bit(CF_SEND_PENDING, &con->flags); spin_unlock_bh(&con->writequeue_lock); return DLM_IO_END; } len = e->len; offset = e->offset; WARN_ON_ONCE(len == 0 && e->users == 0); spin_unlock_bh(&con->writequeue_lock); bvec_set_page(&bvec, e->page, len, offset); iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, len); ret = sock_sendmsg(con->sock, &msg); trace_dlm_send(con->nodeid, ret); if (ret == -EAGAIN || ret == 0) { lock_sock(con->sock->sk); spin_lock_bh(&con->writequeue_lock); if (test_bit(SOCKWQ_ASYNC_NOSPACE, &con->sock->flags) && !test_and_set_bit(CF_APP_LIMITED, &con->flags)) { /* Notify TCP that we're limited by the * application window size. */ set_bit(SOCK_NOSPACE, &con->sock->sk->sk_socket->flags); con->sock->sk->sk_write_pending++; clear_bit(CF_SEND_PENDING, &con->flags); spin_unlock_bh(&con->writequeue_lock); release_sock(con->sock->sk); /* wait for write_space() event */ return DLM_IO_END; } spin_unlock_bh(&con->writequeue_lock); release_sock(con->sock->sk); return DLM_IO_RESCHED; } else if (ret < 0) { return ret; } spin_lock_bh(&con->writequeue_lock); writequeue_entry_complete(e, ret); spin_unlock_bh(&con->writequeue_lock); return DLM_IO_SUCCESS; } static void clean_one_writequeue(struct connection *con) { struct writequeue_entry *e, *safe; spin_lock_bh(&con->writequeue_lock); list_for_each_entry_safe(e, safe, &con->writequeue, list) { free_entry(e); } spin_unlock_bh(&con->writequeue_lock); } static void connection_release(struct rcu_head *rcu) { struct connection *con = container_of(rcu, struct connection, rcu); WARN_ON_ONCE(!list_empty(&con->writequeue)); WARN_ON_ONCE(con->sock); kfree(con); } /* Called from recovery when it knows that a node has left the cluster */ int dlm_lowcomms_close(int nodeid) { struct connection *con; int idx; log_print("closing connection to node %d", nodeid); idx = srcu_read_lock(&connections_srcu); con = nodeid2con(nodeid, 0); if (WARN_ON_ONCE(!con)) { srcu_read_unlock(&connections_srcu, idx); return -ENOENT; } stop_connection_io(con); log_print("io handling for node: %d stopped", nodeid); close_connection(con, true); spin_lock(&connections_lock); hlist_del_rcu(&con->list); spin_unlock(&connections_lock); clean_one_writequeue(con); call_srcu(&connections_srcu, &con->rcu, connection_release); if (con->othercon) { clean_one_writequeue(con->othercon); call_srcu(&connections_srcu, &con->othercon->rcu, connection_release); } srcu_read_unlock(&connections_srcu, idx); /* for debugging we print when we are done to compare with other * messages in between. This function need to be correctly synchronized * with io handling */ log_print("closing connection to node %d done", nodeid); return 0; } /* Receive worker function */ static void process_recv_sockets(struct work_struct *work) { struct connection *con = container_of(work, struct connection, rwork); int ret, buflen; down_read(&con->sock_lock); if (!con->sock) { up_read(&con->sock_lock); return; } buflen = READ_ONCE(dlm_config.ci_buffer_size); do { ret = receive_from_sock(con, buflen); } while (ret == DLM_IO_SUCCESS); up_read(&con->sock_lock); switch (ret) { case DLM_IO_END: /* CF_RECV_PENDING cleared */ break; case DLM_IO_EOF: close_connection(con, false); wake_up(&con->shutdown_wait); /* CF_RECV_PENDING cleared */ break; case DLM_IO_FLUSH: /* we can't flush the process_workqueue here because a * WQ_MEM_RECLAIM workequeue can occurr a deadlock for a non * WQ_MEM_RECLAIM workqueue such as process_workqueue. Instead * we have a waitqueue to wait until all messages are * processed. * * This handling is only necessary to backoff the sender and * not queue all messages from the socket layer into DLM * processqueue. When DLM is capable to parse multiple messages * on an e.g. per socket basis this handling can might be * removed. Especially in a message burst we are too slow to * process messages and the queue will fill up memory. */ wait_event(processqueue_wq, !atomic_read(&processqueue_count)); fallthrough; case DLM_IO_RESCHED: cond_resched(); queue_work(io_workqueue, &con->rwork); /* CF_RECV_PENDING not cleared */ break; default: if (ret < 0) { if (test_bit(CF_IS_OTHERCON, &con->flags)) { close_connection(con, false); } else { spin_lock_bh(&con->writequeue_lock); lowcomms_queue_swork(con); spin_unlock_bh(&con->writequeue_lock); } /* CF_RECV_PENDING cleared for othercon * we trigger send queue if not already done * and process_send_sockets will handle it */ break; } WARN_ON_ONCE(1); break; } } static void process_listen_recv_socket(struct work_struct *work) { int ret; if (WARN_ON_ONCE(!listen_con.sock)) return; do { ret = accept_from_sock(); } while (ret == DLM_IO_SUCCESS); if (ret < 0) log_print("critical error accepting connection: %d", ret); } static int dlm_connect(struct connection *con) { struct sockaddr_storage addr; int result, addr_len; struct socket *sock; unsigned int mark; memset(&addr, 0, sizeof(addr)); result = nodeid_to_addr(con->nodeid, &addr, NULL, dlm_proto_ops->try_new_addr, &mark); if (result < 0) { log_print("no address for nodeid %d", con->nodeid); return result; } /* Create a socket to communicate with */ result = sock_create_kern(&init_net, dlm_local_addr[0].ss_family, SOCK_STREAM, dlm_proto_ops->proto, &sock); if (result < 0) return result; sock_set_mark(sock->sk, mark); dlm_proto_ops->sockopts(sock); result = dlm_proto_ops->bind(sock); if (result < 0) { sock_release(sock); return result; } add_sock(sock, con); log_print_ratelimited("connecting to %d", con->nodeid); make_sockaddr(&addr, dlm_config.ci_tcp_port, &addr_len); result = kernel_connect(sock, (struct sockaddr_unsized *)&addr, addr_len, 0); switch (result) { case -EINPROGRESS: /* not an error */ fallthrough; case 0: break; default: if (result < 0) dlm_close_sock(&con->sock); break; } return result; } /* Send worker function */ static void process_send_sockets(struct work_struct *work) { struct connection *con = container_of(work, struct connection, swork); int ret; WARN_ON_ONCE(test_bit(CF_IS_OTHERCON, &con->flags)); down_read(&con->sock_lock); if (!con->sock) { up_read(&con->sock_lock); down_write(&con->sock_lock); if (!con->sock) { ret = dlm_connect(con); switch (ret) { case 0: break; default: /* CF_SEND_PENDING not cleared */ up_write(&con->sock_lock); log_print("connect to node %d try %d error %d", con->nodeid, con->retries++, ret); msleep(1000); /* For now we try forever to reconnect. In * future we should send a event to cluster * manager to fence itself after certain amount * of retries. */ queue_work(io_workqueue, &con->swork); return; } } downgrade_write(&con->sock_lock); } do { ret = send_to_sock(con); } while (ret == DLM_IO_SUCCESS); up_read(&con->sock_lock); switch (ret) { case DLM_IO_END: /* CF_SEND_PENDING cleared */ break; case DLM_IO_RESCHED: /* CF_SEND_PENDING not cleared */ cond_resched(); queue_work(io_workqueue, &con->swork); break; default: if (ret < 0) { close_connection(con, false); /* CF_SEND_PENDING cleared */ spin_lock_bh(&con->writequeue_lock); lowcomms_queue_swork(con); spin_unlock_bh(&con->writequeue_lock); break; } WARN_ON_ONCE(1); break; } } static void work_stop(void) { if (io_workqueue) { destroy_workqueue(io_workqueue); io_workqueue = NULL; } if (process_workqueue) { destroy_workqueue(process_workqueue); process_workqueue = NULL; } } static int work_start(void) { io_workqueue = alloc_workqueue("dlm_io", WQ_HIGHPRI | WQ_MEM_RECLAIM | WQ_UNBOUND, 0); if (!io_workqueue) { log_print("can't start dlm_io"); return -ENOMEM; } process_workqueue = alloc_workqueue("dlm_process", WQ_HIGHPRI | WQ_BH | WQ_PERCPU, 0); if (!process_workqueue) { log_print("can't start dlm_process"); destroy_workqueue(io_workqueue); io_workqueue = NULL; return -ENOMEM; } return 0; } void dlm_lowcomms_shutdown(void) { struct connection *con; int i, idx; /* stop lowcomms_listen_data_ready calls */ lock_sock(listen_con.sock->sk); listen_con.sock->sk->sk_data_ready = listen_sock.sk_data_ready; release_sock(listen_con.sock->sk); cancel_work_sync(&listen_con.rwork); dlm_close_sock(&listen_con.sock); idx = srcu_read_lock(&connections_srcu); for (i = 0; i < CONN_HASH_SIZE; i++) { hlist_for_each_entry_rcu(con, &connection_hash[i], list) { shutdown_connection(con, true); stop_connection_io(con); flush_workqueue(process_workqueue); close_connection(con, true); clean_one_writequeue(con); if (con->othercon) clean_one_writequeue(con->othercon); allow_connection_io(con); } } srcu_read_unlock(&connections_srcu, idx); } void dlm_lowcomms_stop(void) { work_stop(); dlm_proto_ops = NULL; } static int dlm_listen_for_all(void) { struct socket *sock; int result; log_print("Using %s for communications", dlm_proto_ops->name); result = dlm_proto_ops->listen_validate(); if (result < 0) return result; result = sock_create_kern(&init_net, dlm_local_addr[0].ss_family, SOCK_STREAM, dlm_proto_ops->proto, &sock); if (result < 0) { log_print("Can't create comms socket: %d", result); return result; } sock_set_mark(sock->sk, dlm_config.ci_mark); dlm_proto_ops->listen_sockopts(sock); result = dlm_proto_ops->listen_bind(sock); if (result < 0) goto out; lock_sock(sock->sk); listen_sock.sk_data_ready = sock->sk->sk_data_ready; listen_sock.sk_write_space = sock->sk->sk_write_space; listen_sock.sk_error_report = sock->sk->sk_error_report; listen_sock.sk_state_change = sock->sk->sk_state_change; listen_con.sock = sock; sock->sk->sk_allocation = GFP_NOFS; sock->sk->sk_use_task_frag = false; sock->sk->sk_data_ready = lowcomms_listen_data_ready; release_sock(sock->sk); result = sock->ops->listen(sock, 128); if (result < 0) { dlm_close_sock(&listen_con.sock); return result; } return 0; out: sock_release(sock); return result; } static int dlm_tcp_bind(struct socket *sock) { struct sockaddr_storage src_addr; int result, addr_len; /* Bind to our cluster-known address connecting to avoid * routing problems. */ memcpy(&src_addr, &dlm_local_addr[0], sizeof(src_addr)); make_sockaddr(&src_addr, 0, &addr_len); result = kernel_bind(sock, (struct sockaddr_unsized *)&src_addr, addr_len); if (result < 0) { /* This *may* not indicate a critical error */ log_print("could not bind for connect: %d", result); } return 0; } static int dlm_tcp_listen_validate(void) { /* We don't support multi-homed hosts */ if (dlm_local_count > 1) { log_print("Detect multi-homed hosts but use only the first IP address."); log_print("Try SCTP, if you want to enable multi-link."); } return 0; } static void dlm_tcp_sockopts(struct socket *sock) { /* Turn off Nagle's algorithm */ tcp_sock_set_nodelay(sock->sk); } static void dlm_tcp_listen_sockopts(struct socket *sock) { dlm_tcp_sockopts(sock); sock_set_reuseaddr(sock->sk); } static int dlm_tcp_listen_bind(struct socket *sock) { int addr_len; /* Bind to our port */ make_sockaddr(&dlm_local_addr[0], dlm_config.ci_tcp_port, &addr_len); return kernel_bind(sock, (struct sockaddr_unsized *)&dlm_local_addr[0], addr_len); } static const struct dlm_proto_ops dlm_tcp_ops = { .name = "TCP", .proto = IPPROTO_TCP, .how = SHUT_WR, .sockopts = dlm_tcp_sockopts, .bind = dlm_tcp_bind, .listen_validate = dlm_tcp_listen_validate, .listen_sockopts = dlm_tcp_listen_sockopts, .listen_bind = dlm_tcp_listen_bind, }; static int dlm_sctp_bind(struct socket *sock) { return sctp_bind_addrs(sock, 0); } static int dlm_sctp_listen_validate(void) { if (!IS_ENABLED(CONFIG_IP_SCTP)) { log_print("SCTP is not enabled by this kernel"); return -EOPNOTSUPP; } request_module("sctp"); return 0; } static int dlm_sctp_bind_listen(struct socket *sock) { return sctp_bind_addrs(sock, dlm_config.ci_tcp_port); } static void dlm_sctp_sockopts(struct socket *sock) { /* Turn off Nagle's algorithm */ sctp_sock_set_nodelay(sock->sk); sock_set_rcvbuf(sock->sk, NEEDED_RMEM); } static const struct dlm_proto_ops dlm_sctp_ops = { .name = "SCTP", .proto = IPPROTO_SCTP, .how = SHUT_RDWR, .try_new_addr = true, .sockopts = dlm_sctp_sockopts, .bind = dlm_sctp_bind, .listen_validate = dlm_sctp_listen_validate, .listen_sockopts = dlm_sctp_sockopts, .listen_bind = dlm_sctp_bind_listen, }; int dlm_lowcomms_start(void) { int error; init_local(); if (!dlm_local_count) { error = -ENOTCONN; log_print("no local IP address has been set"); goto fail; } error = work_start(); if (error) goto fail; /* Start listening */ switch (dlm_config.ci_protocol) { case DLM_PROTO_TCP: dlm_proto_ops = &dlm_tcp_ops; break; case DLM_PROTO_SCTP: dlm_proto_ops = &dlm_sctp_ops; break; default: log_print("Invalid protocol identifier %d set", dlm_config.ci_protocol); error = -EINVAL; goto fail_proto_ops; } error = dlm_listen_for_all(); if (error) goto fail_listen; return 0; fail_listen: dlm_proto_ops = NULL; fail_proto_ops: work_stop(); fail: return error; } void dlm_lowcomms_init(void) { int i; for (i = 0; i < CONN_HASH_SIZE; i++) INIT_HLIST_HEAD(&connection_hash[i]); INIT_WORK(&listen_con.rwork, process_listen_recv_socket); } void dlm_lowcomms_exit(void) { struct connection *con; int i, idx; idx = srcu_read_lock(&connections_srcu); for (i = 0; i < CONN_HASH_SIZE; i++) { hlist_for_each_entry_rcu(con, &connection_hash[i], list) { spin_lock(&connections_lock); hlist_del_rcu(&con->list); spin_unlock(&connections_lock); if (con->othercon) call_srcu(&connections_srcu, &con->othercon->rcu, connection_release); call_srcu(&connections_srcu, &con->rcu, connection_release); } } srcu_read_unlock(&connections_srcu, idx); } |
| 6 7 5 1 1 1 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 2 1 2 2 2 2 10 5 5 5 3 5 5 5 5 5 4 5 5 5 5 5 5 5 5 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 10 2 2 2 2 10 10 7 7 7 1 7 7 7 7 7 6 7 7 7 1 7 7 5 5 5 4 4 5 4 5 4 7 7 7 7 8 8 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 | // SPDX-License-Identifier: GPL-2.0-only /* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */ #include <linux/mm.h> #include <linux/llist.h> #include <linux/bpf.h> #include <linux/irq_work.h> #include <linux/bpf_mem_alloc.h> #include <linux/memcontrol.h> #include <asm/local.h> /* Any context (including NMI) BPF specific memory allocator. * * Tracing BPF programs can attach to kprobe and fentry. Hence they * run in unknown context where calling plain kmalloc() might not be safe. * * Front-end kmalloc() with per-cpu per-bucket cache of free elements. * Refill this cache asynchronously from irq_work. * * CPU_0 buckets * 16 32 64 96 128 196 256 512 1024 2048 4096 * ... * CPU_N buckets * 16 32 64 96 128 196 256 512 1024 2048 4096 * * The buckets are prefilled at the start. * BPF programs always run with migration disabled. * It's safe to allocate from cache of the current cpu with irqs disabled. * Free-ing is always done into bucket of the current cpu as well. * irq_work trims extra free elements from buckets with kfree * and refills them with kmalloc, so global kmalloc logic takes care * of freeing objects allocated by one cpu and freed on another. * * Every allocated objected is padded with extra 8 bytes that contains * struct llist_node. */ #define LLIST_NODE_SZ sizeof(struct llist_node) #define BPF_MEM_ALLOC_SIZE_MAX 4096 /* similar to kmalloc, but sizeof == 8 bucket is gone */ static u8 size_index[24] __ro_after_init = { 3, /* 8 */ 3, /* 16 */ 4, /* 24 */ 4, /* 32 */ 5, /* 40 */ 5, /* 48 */ 5, /* 56 */ 5, /* 64 */ 1, /* 72 */ 1, /* 80 */ 1, /* 88 */ 1, /* 96 */ 6, /* 104 */ 6, /* 112 */ 6, /* 120 */ 6, /* 128 */ 2, /* 136 */ 2, /* 144 */ 2, /* 152 */ 2, /* 160 */ 2, /* 168 */ 2, /* 176 */ 2, /* 184 */ 2 /* 192 */ }; static int bpf_mem_cache_idx(size_t size) { if (!size || size > BPF_MEM_ALLOC_SIZE_MAX) return -1; if (size <= 192) return size_index[(size - 1) / 8] - 1; return fls(size - 1) - 2; } #define NUM_CACHES 11 struct bpf_mem_cache { /* per-cpu list of free objects of size 'unit_size'. * All accesses are done with interrupts disabled and 'active' counter * protection with __llist_add() and __llist_del_first(). */ struct llist_head free_llist; local_t active; /* Operations on the free_list from unit_alloc/unit_free/bpf_mem_refill * are sequenced by per-cpu 'active' counter. But unit_free() cannot * fail. When 'active' is busy the unit_free() will add an object to * free_llist_extra. */ struct llist_head free_llist_extra; struct irq_work refill_work; struct obj_cgroup *objcg; int unit_size; /* count of objects in free_llist */ int free_cnt; int low_watermark, high_watermark, batch; int percpu_size; bool draining; struct bpf_mem_cache *tgt; void (*dtor)(void *obj, void *ctx); void *dtor_ctx; /* list of objects to be freed after RCU GP */ struct llist_head free_by_rcu; struct llist_node *free_by_rcu_tail; struct llist_head waiting_for_gp; struct llist_node *waiting_for_gp_tail; struct rcu_head rcu; atomic_t call_rcu_in_progress; struct llist_head free_llist_extra_rcu; /* list of objects to be freed after RCU tasks trace GP */ struct llist_head free_by_rcu_ttrace; struct llist_head waiting_for_gp_ttrace; struct rcu_head rcu_ttrace; atomic_t call_rcu_ttrace_in_progress; }; struct bpf_mem_caches { struct bpf_mem_cache cache[NUM_CACHES]; }; static const u16 sizes[NUM_CACHES] = {96, 192, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096}; static struct llist_node notrace *__llist_del_first(struct llist_head *head) { struct llist_node *entry, *next; entry = head->first; if (!entry) return NULL; next = entry->next; head->first = next; return entry; } static void *__alloc(struct bpf_mem_cache *c, int node, gfp_t flags) { if (c->percpu_size) { void __percpu **obj = kmalloc_node(c->percpu_size, flags, node); void __percpu *pptr = __alloc_percpu_gfp(c->unit_size, 8, flags); if (!obj || !pptr) { free_percpu(pptr); kfree(obj); return NULL; } obj[1] = pptr; return obj; } return kmalloc_node(c->unit_size, flags | __GFP_ZERO, node); } static struct mem_cgroup *get_memcg(const struct bpf_mem_cache *c) { #ifdef CONFIG_MEMCG if (c->objcg) return get_mem_cgroup_from_objcg(c->objcg); return root_mem_cgroup; #else return NULL; #endif } static void inc_active(struct bpf_mem_cache *c, unsigned long *flags) { if (IS_ENABLED(CONFIG_PREEMPT_RT)) /* In RT irq_work runs in per-cpu kthread, so disable * interrupts to avoid preemption and interrupts and * reduce the chance of bpf prog executing on this cpu * when active counter is busy. */ local_irq_save(*flags); /* alloc_bulk runs from irq_work which will not preempt a bpf * program that does unit_alloc/unit_free since IRQs are * disabled there. There is no race to increment 'active' * counter. It protects free_llist from corruption in case NMI * bpf prog preempted this loop. */ WARN_ON_ONCE(local_inc_return(&c->active) != 1); } static void dec_active(struct bpf_mem_cache *c, unsigned long *flags) { local_dec(&c->active); if (IS_ENABLED(CONFIG_PREEMPT_RT)) local_irq_restore(*flags); } static void add_obj_to_free_list(struct bpf_mem_cache *c, void *obj) { unsigned long flags; inc_active(c, &flags); __llist_add(obj, &c->free_llist); c->free_cnt++; dec_active(c, &flags); } /* Mostly runs from irq_work except __init phase. */ static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node, bool atomic) { struct mem_cgroup *memcg = NULL, *old_memcg; gfp_t gfp; void *obj; int i; gfp = __GFP_NOWARN | __GFP_ACCOUNT; gfp |= atomic ? GFP_NOWAIT : GFP_KERNEL; for (i = 0; i < cnt; i++) { /* * For every 'c' llist_del_first(&c->free_by_rcu_ttrace); is * done only by one CPU == current CPU. Other CPUs might * llist_add() and llist_del_all() in parallel. */ obj = llist_del_first(&c->free_by_rcu_ttrace); if (!obj) break; add_obj_to_free_list(c, obj); } if (i >= cnt) return; for (; i < cnt; i++) { obj = llist_del_first(&c->waiting_for_gp_ttrace); if (!obj) break; add_obj_to_free_list(c, obj); } if (i >= cnt) return; memcg = get_memcg(c); old_memcg = set_active_memcg(memcg); for (; i < cnt; i++) { /* Allocate, but don't deplete atomic reserves that typical * GFP_ATOMIC would do. irq_work runs on this cpu and kmalloc * will allocate from the current numa node which is what we * want here. */ obj = __alloc(c, node, gfp); if (!obj) break; add_obj_to_free_list(c, obj); } set_active_memcg(old_memcg); mem_cgroup_put(memcg); } static void free_one(void *obj, bool percpu) { if (percpu) free_percpu(((void __percpu **)obj)[1]); kfree(obj); } static int free_all(struct bpf_mem_cache *c, struct llist_node *llnode, bool percpu) { struct llist_node *pos, *t; int cnt = 0; llist_for_each_safe(pos, t, llnode) { if (c->dtor) c->dtor((void *)pos + LLIST_NODE_SZ, c->dtor_ctx); free_one(pos, percpu); cnt++; } return cnt; } static void __free_rcu(struct rcu_head *head) { struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu_ttrace); free_all(c, llist_del_all(&c->waiting_for_gp_ttrace), !!c->percpu_size); atomic_set(&c->call_rcu_ttrace_in_progress, 0); } static void __free_rcu_tasks_trace(struct rcu_head *head) { /* If RCU Tasks Trace grace period implies RCU grace period, * there is no need to invoke call_rcu(). */ if (rcu_trace_implies_rcu_gp()) __free_rcu(head); else call_rcu(head, __free_rcu); } static void enque_to_free(struct bpf_mem_cache *c, void *obj) { struct llist_node *llnode = obj; /* bpf_mem_cache is a per-cpu object. Freeing happens in irq_work. * Nothing races to add to free_by_rcu_ttrace list. */ llist_add(llnode, &c->free_by_rcu_ttrace); } static void do_call_rcu_ttrace(struct bpf_mem_cache *c) { struct llist_node *llnode, *t; if (atomic_xchg(&c->call_rcu_ttrace_in_progress, 1)) { if (unlikely(READ_ONCE(c->draining))) { llnode = llist_del_all(&c->free_by_rcu_ttrace); free_all(c, llnode, !!c->percpu_size); } return; } WARN_ON_ONCE(!llist_empty(&c->waiting_for_gp_ttrace)); llist_for_each_safe(llnode, t, llist_del_all(&c->free_by_rcu_ttrace)) llist_add(llnode, &c->waiting_for_gp_ttrace); if (unlikely(READ_ONCE(c->draining))) { __free_rcu(&c->rcu_ttrace); return; } /* Use call_rcu_tasks_trace() to wait for sleepable progs to finish. * If RCU Tasks Trace grace period implies RCU grace period, free * these elements directly, else use call_rcu() to wait for normal * progs to finish and finally do free_one() on each element. */ call_rcu_tasks_trace(&c->rcu_ttrace, __free_rcu_tasks_trace); } static void free_bulk(struct bpf_mem_cache *c) { struct bpf_mem_cache *tgt = c->tgt; struct llist_node *llnode, *t; unsigned long flags; int cnt; WARN_ON_ONCE(tgt->unit_size != c->unit_size); WARN_ON_ONCE(tgt->percpu_size != c->percpu_size); do { inc_active(c, &flags); llnode = __llist_del_first(&c->free_llist); if (llnode) cnt = --c->free_cnt; else cnt = 0; dec_active(c, &flags); if (llnode) enque_to_free(tgt, llnode); } while (cnt > (c->high_watermark + c->low_watermark) / 2); /* and drain free_llist_extra */ llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist_extra)) enque_to_free(tgt, llnode); do_call_rcu_ttrace(tgt); } static void __free_by_rcu(struct rcu_head *head) { struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu); struct bpf_mem_cache *tgt = c->tgt; struct llist_node *llnode; WARN_ON_ONCE(tgt->unit_size != c->unit_size); WARN_ON_ONCE(tgt->percpu_size != c->percpu_size); llnode = llist_del_all(&c->waiting_for_gp); if (!llnode) goto out; llist_add_batch(llnode, c->waiting_for_gp_tail, &tgt->free_by_rcu_ttrace); /* Objects went through regular RCU GP. Send them to RCU tasks trace */ do_call_rcu_ttrace(tgt); out: atomic_set(&c->call_rcu_in_progress, 0); } static void check_free_by_rcu(struct bpf_mem_cache *c) { struct llist_node *llnode, *t; unsigned long flags; /* drain free_llist_extra_rcu */ if (unlikely(!llist_empty(&c->free_llist_extra_rcu))) { inc_active(c, &flags); llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist_extra_rcu)) if (__llist_add(llnode, &c->free_by_rcu)) c->free_by_rcu_tail = llnode; dec_active(c, &flags); } if (llist_empty(&c->free_by_rcu)) return; if (atomic_xchg(&c->call_rcu_in_progress, 1)) { /* * Instead of kmalloc-ing new rcu_head and triggering 10k * call_rcu() to hit rcutree.qhimark and force RCU to notice * the overload just ask RCU to hurry up. There could be many * objects in free_by_rcu list. * This hint reduces memory consumption for an artificial * benchmark from 2 Gbyte to 150 Mbyte. */ rcu_request_urgent_qs_task(current); return; } WARN_ON_ONCE(!llist_empty(&c->waiting_for_gp)); inc_active(c, &flags); WRITE_ONCE(c->waiting_for_gp.first, __llist_del_all(&c->free_by_rcu)); c->waiting_for_gp_tail = c->free_by_rcu_tail; dec_active(c, &flags); if (unlikely(READ_ONCE(c->draining))) { free_all(c, llist_del_all(&c->waiting_for_gp), !!c->percpu_size); atomic_set(&c->call_rcu_in_progress, 0); } else { call_rcu_hurry(&c->rcu, __free_by_rcu); } } static void bpf_mem_refill(struct irq_work *work) { struct bpf_mem_cache *c = container_of(work, struct bpf_mem_cache, refill_work); int cnt; /* Racy access to free_cnt. It doesn't need to be 100% accurate */ cnt = c->free_cnt; if (cnt < c->low_watermark) /* irq_work runs on this cpu and kmalloc will allocate * from the current numa node which is what we want here. */ alloc_bulk(c, c->batch, NUMA_NO_NODE, true); else if (cnt > c->high_watermark) free_bulk(c); check_free_by_rcu(c); } static void notrace irq_work_raise(struct bpf_mem_cache *c) { irq_work_queue(&c->refill_work); } /* For typical bpf map case that uses bpf_mem_cache_alloc and single bucket * the freelist cache will be elem_size * 64 (or less) on each cpu. * * For bpf programs that don't have statically known allocation sizes and * assuming (low_mark + high_mark) / 2 as an average number of elements per * bucket and all buckets are used the total amount of memory in freelists * on each cpu will be: * 64*16 + 64*32 + 64*64 + 64*96 + 64*128 + 64*196 + 64*256 + 32*512 + 16*1024 + 8*2048 + 4*4096 * == ~ 116 Kbyte using below heuristic. * Initialized, but unused bpf allocator (not bpf map specific one) will * consume ~ 11 Kbyte per cpu. * Typical case will be between 11K and 116K closer to 11K. * bpf progs can and should share bpf_mem_cache when possible. * * Percpu allocation is typically rare. To avoid potential unnecessary large * memory consumption, set low_mark = 1 and high_mark = 3, resulting in c->batch = 1. */ static void init_refill_work(struct bpf_mem_cache *c) { init_irq_work(&c->refill_work, bpf_mem_refill); if (c->percpu_size) { c->low_watermark = 1; c->high_watermark = 3; } else if (c->unit_size <= 256) { c->low_watermark = 32; c->high_watermark = 96; } else { /* When page_size == 4k, order-0 cache will have low_mark == 2 * and high_mark == 6 with batch alloc of 3 individual pages at * a time. * 8k allocs and above low == 1, high == 3, batch == 1. */ c->low_watermark = max(32 * 256 / c->unit_size, 1); c->high_watermark = max(96 * 256 / c->unit_size, 3); } c->batch = max((c->high_watermark - c->low_watermark) / 4 * 3, 1); } static void prefill_mem_cache(struct bpf_mem_cache *c, int cpu) { int cnt = 1; /* To avoid consuming memory, for non-percpu allocation, assume that * 1st run of bpf prog won't be doing more than 4 map_update_elem from * irq disabled region if unit size is less than or equal to 256. * For all other cases, let us just do one allocation. */ if (!c->percpu_size && c->unit_size <= 256) cnt = 4; alloc_bulk(c, cnt, cpu_to_node(cpu), false); } /* When size != 0 bpf_mem_cache for each cpu. * This is typical bpf hash map use case when all elements have equal size. * * When size == 0 allocate 11 bpf_mem_cache-s for each cpu, then rely on * kmalloc/kfree. Max allocation size is 4096 in this case. * This is bpf_dynptr and bpf_kptr use case. */ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu) { struct bpf_mem_caches *cc; struct bpf_mem_caches __percpu *pcc; struct bpf_mem_cache *c; struct bpf_mem_cache __percpu *pc; struct obj_cgroup *objcg = NULL; int cpu, i, unit_size, percpu_size = 0; if (percpu && size == 0) return -EINVAL; /* room for llist_node and per-cpu pointer */ if (percpu) percpu_size = LLIST_NODE_SZ + sizeof(void *); ma->percpu = percpu; if (size) { pc = __alloc_percpu_gfp(sizeof(*pc), 8, GFP_KERNEL); if (!pc) return -ENOMEM; if (!percpu) size += LLIST_NODE_SZ; /* room for llist_node */ unit_size = size; #ifdef CONFIG_MEMCG if (memcg_bpf_enabled()) objcg = get_obj_cgroup_from_current(); #endif ma->objcg = objcg; for_each_possible_cpu(cpu) { c = per_cpu_ptr(pc, cpu); c->unit_size = unit_size; c->objcg = objcg; c->percpu_size = percpu_size; c->tgt = c; init_refill_work(c); prefill_mem_cache(c, cpu); } ma->cache = pc; return 0; } pcc = __alloc_percpu_gfp(sizeof(*cc), 8, GFP_KERNEL); if (!pcc) return -ENOMEM; #ifdef CONFIG_MEMCG objcg = get_obj_cgroup_from_current(); #endif ma->objcg = objcg; for_each_possible_cpu(cpu) { cc = per_cpu_ptr(pcc, cpu); for (i = 0; i < NUM_CACHES; i++) { c = &cc->cache[i]; c->unit_size = sizes[i]; c->objcg = objcg; c->percpu_size = percpu_size; c->tgt = c; init_refill_work(c); prefill_mem_cache(c, cpu); } } ma->caches = pcc; return 0; } int bpf_mem_alloc_percpu_init(struct bpf_mem_alloc *ma, struct obj_cgroup *objcg) { struct bpf_mem_caches __percpu *pcc; pcc = __alloc_percpu_gfp(sizeof(struct bpf_mem_caches), 8, GFP_KERNEL); if (!pcc) return -ENOMEM; ma->caches = pcc; ma->objcg = objcg; ma->percpu = true; return 0; } int bpf_mem_alloc_percpu_unit_init(struct bpf_mem_alloc *ma, int size) { struct bpf_mem_caches *cc; struct bpf_mem_caches __percpu *pcc; int cpu, i, unit_size, percpu_size; struct obj_cgroup *objcg; struct bpf_mem_cache *c; i = bpf_mem_cache_idx(size); if (i < 0) return -EINVAL; /* room for llist_node and per-cpu pointer */ percpu_size = LLIST_NODE_SZ + sizeof(void *); unit_size = sizes[i]; objcg = ma->objcg; pcc = ma->caches; for_each_possible_cpu(cpu) { cc = per_cpu_ptr(pcc, cpu); c = &cc->cache[i]; if (c->unit_size) break; c->unit_size = unit_size; c->objcg = objcg; c->percpu_size = percpu_size; c->tgt = c; init_refill_work(c); prefill_mem_cache(c, cpu); } return 0; } static void drain_mem_cache(struct bpf_mem_cache *c) { bool percpu = !!c->percpu_size; /* No progs are using this bpf_mem_cache, but htab_map_free() called * bpf_mem_cache_free() for all remaining elements and they can be in * free_by_rcu_ttrace or in waiting_for_gp_ttrace lists, so drain those lists now. * * Except for waiting_for_gp_ttrace list, there are no concurrent operations * on these lists, so it is safe to use __llist_del_all(). */ free_all(c, llist_del_all(&c->free_by_rcu_ttrace), percpu); free_all(c, llist_del_all(&c->waiting_for_gp_ttrace), percpu); free_all(c, __llist_del_all(&c->free_llist), percpu); free_all(c, __llist_del_all(&c->free_llist_extra), percpu); free_all(c, __llist_del_all(&c->free_by_rcu), percpu); free_all(c, __llist_del_all(&c->free_llist_extra_rcu), percpu); free_all(c, llist_del_all(&c->waiting_for_gp), percpu); } static void check_mem_cache(struct bpf_mem_cache *c) { WARN_ON_ONCE(!llist_empty(&c->free_by_rcu_ttrace)); WARN_ON_ONCE(!llist_empty(&c->waiting_for_gp_ttrace)); WARN_ON_ONCE(!llist_empty(&c->free_llist)); WARN_ON_ONCE(!llist_empty(&c->free_llist_extra)); WARN_ON_ONCE(!llist_empty(&c->free_by_rcu)); WARN_ON_ONCE(!llist_empty(&c->free_llist_extra_rcu)); WARN_ON_ONCE(!llist_empty(&c->waiting_for_gp)); } static void check_leaked_objs(struct bpf_mem_alloc *ma) { struct bpf_mem_caches *cc; struct bpf_mem_cache *c; int cpu, i; if (ma->cache) { for_each_possible_cpu(cpu) { c = per_cpu_ptr(ma->cache, cpu); check_mem_cache(c); } } if (ma->caches) { for_each_possible_cpu(cpu) { cc = per_cpu_ptr(ma->caches, cpu); for (i = 0; i < NUM_CACHES; i++) { c = &cc->cache[i]; check_mem_cache(c); } } } } static void free_mem_alloc_no_barrier(struct bpf_mem_alloc *ma) { /* We can free dtor ctx only once all callbacks are done using it. */ if (ma->dtor_ctx_free) ma->dtor_ctx_free(ma->dtor_ctx); check_leaked_objs(ma); free_percpu(ma->cache); free_percpu(ma->caches); ma->cache = NULL; ma->caches = NULL; } static void free_mem_alloc(struct bpf_mem_alloc *ma) { /* waiting_for_gp[_ttrace] lists were drained, but RCU callbacks * might still execute. Wait for them. * * rcu_barrier_tasks_trace() doesn't imply synchronize_rcu_tasks_trace(), * but rcu_barrier_tasks_trace() and rcu_barrier() below are only used * to wait for the pending __free_rcu_tasks_trace() and __free_rcu(), * so if call_rcu(head, __free_rcu) is skipped due to * rcu_trace_implies_rcu_gp(), it will be OK to skip rcu_barrier() by * using rcu_trace_implies_rcu_gp() as well. */ rcu_barrier(); /* wait for __free_by_rcu */ rcu_barrier_tasks_trace(); /* wait for __free_rcu */ if (!rcu_trace_implies_rcu_gp()) rcu_barrier(); free_mem_alloc_no_barrier(ma); } static void free_mem_alloc_deferred(struct work_struct *work) { struct bpf_mem_alloc *ma = container_of(work, struct bpf_mem_alloc, work); free_mem_alloc(ma); kfree(ma); } static void destroy_mem_alloc(struct bpf_mem_alloc *ma, int rcu_in_progress) { struct bpf_mem_alloc *copy; if (!rcu_in_progress) { /* Fast path. No callbacks are pending, hence no need to do * rcu_barrier-s. */ free_mem_alloc_no_barrier(ma); return; } copy = kmemdup(ma, sizeof(*ma), GFP_KERNEL); if (!copy) { /* Slow path with inline barrier-s */ free_mem_alloc(ma); return; } /* Defer barriers into worker to let the rest of map memory to be freed */ memset(ma, 0, sizeof(*ma)); INIT_WORK(©->work, free_mem_alloc_deferred); queue_work(system_dfl_wq, ©->work); } void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma) { struct bpf_mem_caches *cc; struct bpf_mem_cache *c; int cpu, i, rcu_in_progress; if (ma->cache) { rcu_in_progress = 0; for_each_possible_cpu(cpu) { c = per_cpu_ptr(ma->cache, cpu); WRITE_ONCE(c->draining, true); irq_work_sync(&c->refill_work); drain_mem_cache(c); rcu_in_progress += atomic_read(&c->call_rcu_ttrace_in_progress); rcu_in_progress += atomic_read(&c->call_rcu_in_progress); } obj_cgroup_put(ma->objcg); destroy_mem_alloc(ma, rcu_in_progress); } if (ma->caches) { rcu_in_progress = 0; for_each_possible_cpu(cpu) { cc = per_cpu_ptr(ma->caches, cpu); for (i = 0; i < NUM_CACHES; i++) { c = &cc->cache[i]; WRITE_ONCE(c->draining, true); irq_work_sync(&c->refill_work); drain_mem_cache(c); rcu_in_progress += atomic_read(&c->call_rcu_ttrace_in_progress); rcu_in_progress += atomic_read(&c->call_rcu_in_progress); } } obj_cgroup_put(ma->objcg); destroy_mem_alloc(ma, rcu_in_progress); } } /* notrace is necessary here and in other functions to make sure * bpf programs cannot attach to them and cause llist corruptions. */ static void notrace *unit_alloc(struct bpf_mem_cache *c) { struct llist_node *llnode = NULL; unsigned long flags; int cnt = 0; /* Disable irqs to prevent the following race for majority of prog types: * prog_A * bpf_mem_alloc * preemption or irq -> prog_B * bpf_mem_alloc * * but prog_B could be a perf_event NMI prog. * Use per-cpu 'active' counter to order free_list access between * unit_alloc/unit_free/bpf_mem_refill. */ local_irq_save(flags); if (local_inc_return(&c->active) == 1) { llnode = __llist_del_first(&c->free_llist); if (llnode) { cnt = --c->free_cnt; *(struct bpf_mem_cache **)llnode = c; } } local_dec(&c->active); WARN_ON(cnt < 0); if (cnt < c->low_watermark) irq_work_raise(c); /* Enable IRQ after the enqueue of irq work completes, so irq work * will run after IRQ is enabled and free_llist may be refilled by * irq work before other task preempts current task. */ local_irq_restore(flags); return llnode; } /* Though 'ptr' object could have been allocated on a different cpu * add it to the free_llist of the current cpu. * Let kfree() logic deal with it when it's later called from irq_work. */ static void notrace unit_free(struct bpf_mem_cache *c, void *ptr) { struct llist_node *llnode = ptr - LLIST_NODE_SZ; unsigned long flags; int cnt = 0; BUILD_BUG_ON(LLIST_NODE_SZ > 8); /* * Remember bpf_mem_cache that allocated this object. * The hint is not accurate. */ c->tgt = *(struct bpf_mem_cache **)llnode; local_irq_save(flags); if (local_inc_return(&c->active) == 1) { __llist_add(llnode, &c->free_llist); cnt = ++c->free_cnt; } else { /* unit_free() cannot fail. Therefore add an object to atomic * llist. free_bulk() will drain it. Though free_llist_extra is * a per-cpu list we have to use atomic llist_add here, since * it also can be interrupted by bpf nmi prog that does another * unit_free() into the same free_llist_extra. */ llist_add(llnode, &c->free_llist_extra); } local_dec(&c->active); if (cnt > c->high_watermark) /* free few objects from current cpu into global kmalloc pool */ irq_work_raise(c); /* Enable IRQ after irq_work_raise() completes, otherwise when current * task is preempted by task which does unit_alloc(), unit_alloc() may * return NULL unexpectedly because irq work is already pending but can * not been triggered and free_llist can not be refilled timely. */ local_irq_restore(flags); } static void notrace unit_free_rcu(struct bpf_mem_cache *c, void *ptr) { struct llist_node *llnode = ptr - LLIST_NODE_SZ; unsigned long flags; c->tgt = *(struct bpf_mem_cache **)llnode; local_irq_save(flags); if (local_inc_return(&c->active) == 1) { if (__llist_add(llnode, &c->free_by_rcu)) c->free_by_rcu_tail = llnode; } else { llist_add(llnode, &c->free_llist_extra_rcu); } local_dec(&c->active); if (!atomic_read(&c->call_rcu_in_progress)) irq_work_raise(c); local_irq_restore(flags); } /* Called from BPF program or from sys_bpf syscall. * In both cases migration is disabled. */ void notrace *bpf_mem_alloc(struct bpf_mem_alloc *ma, size_t size) { int idx; void *ret; if (!size) return NULL; if (!ma->percpu) size += LLIST_NODE_SZ; idx = bpf_mem_cache_idx(size); if (idx < 0) return NULL; ret = unit_alloc(this_cpu_ptr(ma->caches)->cache + idx); return !ret ? NULL : ret + LLIST_NODE_SZ; } void notrace bpf_mem_free(struct bpf_mem_alloc *ma, void *ptr) { struct bpf_mem_cache *c; int idx; if (!ptr) return; c = *(void **)(ptr - LLIST_NODE_SZ); idx = bpf_mem_cache_idx(c->unit_size); if (WARN_ON_ONCE(idx < 0)) return; unit_free(this_cpu_ptr(ma->caches)->cache + idx, ptr); } void notrace bpf_mem_free_rcu(struct bpf_mem_alloc *ma, void *ptr) { struct bpf_mem_cache *c; int idx; if (!ptr) return; c = *(void **)(ptr - LLIST_NODE_SZ); idx = bpf_mem_cache_idx(c->unit_size); if (WARN_ON_ONCE(idx < 0)) return; unit_free_rcu(this_cpu_ptr(ma->caches)->cache + idx, ptr); } void notrace *bpf_mem_cache_alloc(struct bpf_mem_alloc *ma) { void *ret; ret = unit_alloc(this_cpu_ptr(ma->cache)); return !ret ? NULL : ret + LLIST_NODE_SZ; } void notrace bpf_mem_cache_free(struct bpf_mem_alloc *ma, void *ptr) { if (!ptr) return; unit_free(this_cpu_ptr(ma->cache), ptr); } void notrace bpf_mem_cache_free_rcu(struct bpf_mem_alloc *ma, void *ptr) { if (!ptr) return; unit_free_rcu(this_cpu_ptr(ma->cache), ptr); } /* Directly does a kfree() without putting 'ptr' back to the free_llist * for reuse and without waiting for a rcu_tasks_trace gp. * The caller must first go through the rcu_tasks_trace gp for 'ptr' * before calling bpf_mem_cache_raw_free(). * It could be used when the rcu_tasks_trace callback does not have * a hold on the original bpf_mem_alloc object that allocated the * 'ptr'. This should only be used in the uncommon code path. * Otherwise, the bpf_mem_alloc's free_llist cannot be refilled * and may affect performance. */ void bpf_mem_cache_raw_free(void *ptr) { if (!ptr) return; kfree(ptr - LLIST_NODE_SZ); } /* When flags == GFP_KERNEL, it signals that the caller will not cause * deadlock when using kmalloc. bpf_mem_cache_alloc_flags() will use * kmalloc if the free_llist is empty. */ void notrace *bpf_mem_cache_alloc_flags(struct bpf_mem_alloc *ma, gfp_t flags) { struct bpf_mem_cache *c; void *ret; c = this_cpu_ptr(ma->cache); ret = unit_alloc(c); if (!ret && flags == GFP_KERNEL) { struct mem_cgroup *memcg, *old_memcg; memcg = get_memcg(c); old_memcg = set_active_memcg(memcg); ret = __alloc(c, NUMA_NO_NODE, GFP_KERNEL | __GFP_NOWARN | __GFP_ACCOUNT); if (ret) *(struct bpf_mem_cache **)ret = c; set_active_memcg(old_memcg); mem_cgroup_put(memcg); } return !ret ? NULL : ret + LLIST_NODE_SZ; } int bpf_mem_alloc_check_size(bool percpu, size_t size) { /* The size of percpu allocation doesn't have LLIST_NODE_SZ overhead */ if ((percpu && size > BPF_MEM_ALLOC_SIZE_MAX) || (!percpu && size > BPF_MEM_ALLOC_SIZE_MAX - LLIST_NODE_SZ)) return -E2BIG; return 0; } void bpf_mem_alloc_set_dtor(struct bpf_mem_alloc *ma, void (*dtor)(void *obj, void *ctx), void (*dtor_ctx_free)(void *ctx), void *ctx) { struct bpf_mem_caches *cc; struct bpf_mem_cache *c; int cpu, i; ma->dtor_ctx_free = dtor_ctx_free; ma->dtor_ctx = ctx; if (ma->cache) { for_each_possible_cpu(cpu) { c = per_cpu_ptr(ma->cache, cpu); c->dtor = dtor; c->dtor_ctx = ctx; } } if (ma->caches) { for_each_possible_cpu(cpu) { cc = per_cpu_ptr(ma->caches, cpu); for (i = 0; i < NUM_CACHES; i++) { c = &cc->cache[i]; c->dtor = dtor; c->dtor_ctx = ctx; } } } } |
| 1 5 3 2 2 3 5 1 1 6 4 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 | #undef TRACE_SYSTEM #define TRACE_SYSTEM rtc #if !defined(_TRACE_RTC_H) || defined(TRACE_HEADER_MULTI_READ) #define _TRACE_RTC_H #include <linux/rtc.h> #include <linux/tracepoint.h> DECLARE_EVENT_CLASS(rtc_time_alarm_class, TP_PROTO(time64_t secs, int err), TP_ARGS(secs, err), TP_STRUCT__entry( __field(time64_t, secs) __field(int, err) ), TP_fast_assign( __entry->secs = secs; __entry->err = err; ), TP_printk("UTC (%lld) (%d)", __entry->secs, __entry->err ) ); DEFINE_EVENT(rtc_time_alarm_class, rtc_set_time, TP_PROTO(time64_t secs, int err), TP_ARGS(secs, err) ); DEFINE_EVENT(rtc_time_alarm_class, rtc_read_time, TP_PROTO(time64_t secs, int err), TP_ARGS(secs, err) ); DEFINE_EVENT(rtc_time_alarm_class, rtc_set_alarm, TP_PROTO(time64_t secs, int err), TP_ARGS(secs, err) ); DEFINE_EVENT(rtc_time_alarm_class, rtc_read_alarm, TP_PROTO(time64_t secs, int err), TP_ARGS(secs, err) ); TRACE_EVENT(rtc_irq_set_freq, TP_PROTO(int freq, int err), TP_ARGS(freq, err), TP_STRUCT__entry( __field(int, freq) __field(int, err) ), TP_fast_assign( __entry->freq = freq; __entry->err = err; ), TP_printk("set RTC periodic IRQ frequency:%u (%d)", __entry->freq, __entry->err ) ); TRACE_EVENT(rtc_irq_set_state, TP_PROTO(int enabled, int err), TP_ARGS(enabled, err), TP_STRUCT__entry( __field(int, enabled) __field(int, err) ), TP_fast_assign( __entry->enabled = enabled; __entry->err = err; ), TP_printk("%s RTC 2^N Hz periodic IRQs (%d)", __entry->enabled ? "enable" : "disable", __entry->err ) ); TRACE_EVENT(rtc_alarm_irq_enable, TP_PROTO(unsigned int enabled, int err), TP_ARGS(enabled, err), TP_STRUCT__entry( __field(unsigned int, enabled) __field(int, err) ), TP_fast_assign( __entry->enabled = enabled; __entry->err = err; ), TP_printk("%s RTC alarm IRQ (%d)", __entry->enabled ? "enable" : "disable", __entry->err ) ); DECLARE_EVENT_CLASS(rtc_offset_class, TP_PROTO(long offset, int err), TP_ARGS(offset, err), TP_STRUCT__entry( __field(long, offset) __field(int, err) ), TP_fast_assign( __entry->offset = offset; __entry->err = err; ), TP_printk("RTC offset: %ld (%d)", __entry->offset, __entry->err ) ); DEFINE_EVENT(rtc_offset_class, rtc_set_offset, TP_PROTO(long offset, int err), TP_ARGS(offset, err) ); DEFINE_EVENT(rtc_offset_class, rtc_read_offset, TP_PROTO(long offset, int err), TP_ARGS(offset, err) ); DECLARE_EVENT_CLASS(rtc_timer_class, TP_PROTO(struct rtc_timer *timer), TP_ARGS(timer), TP_STRUCT__entry( __field(struct rtc_timer *, timer) __field(ktime_t, expires) __field(ktime_t, period) ), TP_fast_assign( __entry->timer = timer; __entry->expires = timer->node.expires; __entry->period = timer->period; ), TP_printk("RTC timer:(%p) expires:%lld period:%lld", __entry->timer, __entry->expires, __entry->period ) ); DEFINE_EVENT(rtc_timer_class, rtc_timer_enqueue, TP_PROTO(struct rtc_timer *timer), TP_ARGS(timer) ); DEFINE_EVENT(rtc_timer_class, rtc_timer_dequeue, TP_PROTO(struct rtc_timer *timer), TP_ARGS(timer) ); DEFINE_EVENT(rtc_timer_class, rtc_timer_fired, TP_PROTO(struct rtc_timer *timer), TP_ARGS(timer) ); #endif /* _TRACE_RTC_H */ /* This part must be outside protection */ #include <trace/define_trace.h> |
| 1 1 1 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 | /* SPDX-License-Identifier: GPL-2.0-or-later */ /* * Cryptographic API. * * ARIA Cipher Algorithm. * * Documentation of ARIA can be found in RFC 5794. * Copyright (c) 2022 Taehee Yoo <ap420073@gmail.com> * Copyright (c) 2022 Taehee Yoo <ap420073@gmail.com> * * Information for ARIA * http://210.104.33.10/ARIA/index-e.html (English) * http://seed.kisa.or.kr/ (Korean) * * Public domain version is distributed above. */ #ifndef _CRYPTO_ARIA_H #define _CRYPTO_ARIA_H #include <crypto/algapi.h> #include <linux/module.h> #include <linux/init.h> #include <linux/types.h> #include <linux/errno.h> #include <asm/byteorder.h> #define ARIA_MIN_KEY_SIZE 16 #define ARIA_MAX_KEY_SIZE 32 #define ARIA_BLOCK_SIZE 16 #define ARIA_MAX_RD_KEYS 17 #define ARIA_RD_KEY_WORDS (ARIA_BLOCK_SIZE / sizeof(u32)) struct aria_ctx { u32 enc_key[ARIA_MAX_RD_KEYS][ARIA_RD_KEY_WORDS]; u32 dec_key[ARIA_MAX_RD_KEYS][ARIA_RD_KEY_WORDS]; int rounds; int key_length; }; static const u32 s1[256] = { 0x00636363, 0x007c7c7c, 0x00777777, 0x007b7b7b, 0x00f2f2f2, 0x006b6b6b, 0x006f6f6f, 0x00c5c5c5, 0x00303030, 0x00010101, 0x00676767, 0x002b2b2b, 0x00fefefe, 0x00d7d7d7, 0x00ababab, 0x00767676, 0x00cacaca, 0x00828282, 0x00c9c9c9, 0x007d7d7d, 0x00fafafa, 0x00595959, 0x00474747, 0x00f0f0f0, 0x00adadad, 0x00d4d4d4, 0x00a2a2a2, 0x00afafaf, 0x009c9c9c, 0x00a4a4a4, 0x00727272, 0x00c0c0c0, 0x00b7b7b7, 0x00fdfdfd, 0x00939393, 0x00262626, 0x00363636, 0x003f3f3f, 0x00f7f7f7, 0x00cccccc, 0x00343434, 0x00a5a5a5, 0x00e5e5e5, 0x00f1f1f1, 0x00717171, 0x00d8d8d8, 0x00313131, 0x00151515, 0x00040404, 0x00c7c7c7, 0x00232323, 0x00c3c3c3, 0x00181818, 0x00969696, 0x00050505, 0x009a9a9a, 0x00070707, 0x00121212, 0x00808080, 0x00e2e2e2, 0x00ebebeb, 0x00272727, 0x00b2b2b2, 0x00757575, 0x00090909, 0x00838383, 0x002c2c2c, 0x001a1a1a, 0x001b1b1b, 0x006e6e6e, 0x005a5a5a, 0x00a0a0a0, 0x00525252, 0x003b3b3b, 0x00d6d6d6, 0x00b3b3b3, 0x00292929, 0x00e3e3e3, 0x002f2f2f, 0x00848484, 0x00535353, 0x00d1d1d1, 0x00000000, 0x00ededed, 0x00202020, 0x00fcfcfc, 0x00b1b1b1, 0x005b5b5b, 0x006a6a6a, 0x00cbcbcb, 0x00bebebe, 0x00393939, 0x004a4a4a, 0x004c4c4c, 0x00585858, 0x00cfcfcf, 0x00d0d0d0, 0x00efefef, 0x00aaaaaa, 0x00fbfbfb, 0x00434343, 0x004d4d4d, 0x00333333, 0x00858585, 0x00454545, 0x00f9f9f9, 0x00020202, 0x007f7f7f, 0x00505050, 0x003c3c3c, 0x009f9f9f, 0x00a8a8a8, 0x00515151, 0x00a3a3a3, 0x00404040, 0x008f8f8f, 0x00929292, 0x009d9d9d, 0x00383838, 0x00f5f5f5, 0x00bcbcbc, 0x00b6b6b6, 0x00dadada, 0x00212121, 0x00101010, 0x00ffffff, 0x00f3f3f3, 0x00d2d2d2, 0x00cdcdcd, 0x000c0c0c, 0x00131313, 0x00ececec, 0x005f5f5f, 0x00979797, 0x00444444, 0x00171717, 0x00c4c4c4, 0x00a7a7a7, 0x007e7e7e, 0x003d3d3d, 0x00646464, 0x005d5d5d, 0x00191919, 0x00737373, 0x00606060, 0x00818181, 0x004f4f4f, 0x00dcdcdc, 0x00222222, 0x002a2a2a, 0x00909090, 0x00888888, 0x00464646, 0x00eeeeee, 0x00b8b8b8, 0x00141414, 0x00dedede, 0x005e5e5e, 0x000b0b0b, 0x00dbdbdb, 0x00e0e0e0, 0x00323232, 0x003a3a3a, 0x000a0a0a, 0x00494949, 0x00060606, 0x00242424, 0x005c5c5c, 0x00c2c2c2, 0x00d3d3d3, 0x00acacac, 0x00626262, 0x00919191, 0x00959595, 0x00e4e4e4, 0x00797979, 0x00e7e7e7, 0x00c8c8c8, 0x00373737, 0x006d6d6d, 0x008d8d8d, 0x00d5d5d5, 0x004e4e4e, 0x00a9a9a9, 0x006c6c6c, 0x00565656, 0x00f4f4f4, 0x00eaeaea, 0x00656565, 0x007a7a7a, 0x00aeaeae, 0x00080808, 0x00bababa, 0x00787878, 0x00252525, 0x002e2e2e, 0x001c1c1c, 0x00a6a6a6, 0x00b4b4b4, 0x00c6c6c6, 0x00e8e8e8, 0x00dddddd, 0x00747474, 0x001f1f1f, 0x004b4b4b, 0x00bdbdbd, 0x008b8b8b, 0x008a8a8a, 0x00707070, 0x003e3e3e, 0x00b5b5b5, 0x00666666, 0x00484848, 0x00030303, 0x00f6f6f6, 0x000e0e0e, 0x00616161, 0x00353535, 0x00575757, 0x00b9b9b9, 0x00868686, 0x00c1c1c1, 0x001d1d1d, 0x009e9e9e, 0x00e1e1e1, 0x00f8f8f8, 0x00989898, 0x00111111, 0x00696969, 0x00d9d9d9, 0x008e8e8e, 0x00949494, 0x009b9b9b, 0x001e1e1e, 0x00878787, 0x00e9e9e9, 0x00cecece, 0x00555555, 0x00282828, 0x00dfdfdf, 0x008c8c8c, 0x00a1a1a1, 0x00898989, 0x000d0d0d, 0x00bfbfbf, 0x00e6e6e6, 0x00424242, 0x00686868, 0x00414141, 0x00999999, 0x002d2d2d, 0x000f0f0f, 0x00b0b0b0, 0x00545454, 0x00bbbbbb, 0x00161616 }; static const u32 s2[256] = { 0xe200e2e2, 0x4e004e4e, 0x54005454, 0xfc00fcfc, 0x94009494, 0xc200c2c2, 0x4a004a4a, 0xcc00cccc, 0x62006262, 0x0d000d0d, 0x6a006a6a, 0x46004646, 0x3c003c3c, 0x4d004d4d, 0x8b008b8b, 0xd100d1d1, 0x5e005e5e, 0xfa00fafa, 0x64006464, 0xcb00cbcb, 0xb400b4b4, 0x97009797, 0xbe00bebe, 0x2b002b2b, 0xbc00bcbc, 0x77007777, 0x2e002e2e, 0x03000303, 0xd300d3d3, 0x19001919, 0x59005959, 0xc100c1c1, 0x1d001d1d, 0x06000606, 0x41004141, 0x6b006b6b, 0x55005555, 0xf000f0f0, 0x99009999, 0x69006969, 0xea00eaea, 0x9c009c9c, 0x18001818, 0xae00aeae, 0x63006363, 0xdf00dfdf, 0xe700e7e7, 0xbb00bbbb, 0x00000000, 0x73007373, 0x66006666, 0xfb00fbfb, 0x96009696, 0x4c004c4c, 0x85008585, 0xe400e4e4, 0x3a003a3a, 0x09000909, 0x45004545, 0xaa00aaaa, 0x0f000f0f, 0xee00eeee, 0x10001010, 0xeb00ebeb, 0x2d002d2d, 0x7f007f7f, 0xf400f4f4, 0x29002929, 0xac00acac, 0xcf00cfcf, 0xad00adad, 0x91009191, 0x8d008d8d, 0x78007878, 0xc800c8c8, 0x95009595, 0xf900f9f9, 0x2f002f2f, 0xce00cece, 0xcd00cdcd, 0x08000808, 0x7a007a7a, 0x88008888, 0x38003838, 0x5c005c5c, 0x83008383, 0x2a002a2a, 0x28002828, 0x47004747, 0xdb00dbdb, 0xb800b8b8, 0xc700c7c7, 0x93009393, 0xa400a4a4, 0x12001212, 0x53005353, 0xff00ffff, 0x87008787, 0x0e000e0e, 0x31003131, 0x36003636, 0x21002121, 0x58005858, 0x48004848, 0x01000101, 0x8e008e8e, 0x37003737, 0x74007474, 0x32003232, 0xca00caca, 0xe900e9e9, 0xb100b1b1, 0xb700b7b7, 0xab00abab, 0x0c000c0c, 0xd700d7d7, 0xc400c4c4, 0x56005656, 0x42004242, 0x26002626, 0x07000707, 0x98009898, 0x60006060, 0xd900d9d9, 0xb600b6b6, 0xb900b9b9, 0x11001111, 0x40004040, 0xec00ecec, 0x20002020, 0x8c008c8c, 0xbd00bdbd, 0xa000a0a0, 0xc900c9c9, 0x84008484, 0x04000404, 0x49004949, 0x23002323, 0xf100f1f1, 0x4f004f4f, 0x50005050, 0x1f001f1f, 0x13001313, 0xdc00dcdc, 0xd800d8d8, 0xc000c0c0, 0x9e009e9e, 0x57005757, 0xe300e3e3, 0xc300c3c3, 0x7b007b7b, 0x65006565, 0x3b003b3b, 0x02000202, 0x8f008f8f, 0x3e003e3e, 0xe800e8e8, 0x25002525, 0x92009292, 0xe500e5e5, 0x15001515, 0xdd00dddd, 0xfd00fdfd, 0x17001717, 0xa900a9a9, 0xbf00bfbf, 0xd400d4d4, 0x9a009a9a, 0x7e007e7e, 0xc500c5c5, 0x39003939, 0x67006767, 0xfe00fefe, 0x76007676, 0x9d009d9d, 0x43004343, 0xa700a7a7, 0xe100e1e1, 0xd000d0d0, 0xf500f5f5, 0x68006868, 0xf200f2f2, 0x1b001b1b, 0x34003434, 0x70007070, 0x05000505, 0xa300a3a3, 0x8a008a8a, 0xd500d5d5, 0x79007979, 0x86008686, 0xa800a8a8, 0x30003030, 0xc600c6c6, 0x51005151, 0x4b004b4b, 0x1e001e1e, 0xa600a6a6, 0x27002727, 0xf600f6f6, 0x35003535, 0xd200d2d2, 0x6e006e6e, 0x24002424, 0x16001616, 0x82008282, 0x5f005f5f, 0xda00dada, 0xe600e6e6, 0x75007575, 0xa200a2a2, 0xef00efef, 0x2c002c2c, 0xb200b2b2, 0x1c001c1c, 0x9f009f9f, 0x5d005d5d, 0x6f006f6f, 0x80008080, 0x0a000a0a, 0x72007272, 0x44004444, 0x9b009b9b, 0x6c006c6c, 0x90009090, 0x0b000b0b, 0x5b005b5b, 0x33003333, 0x7d007d7d, 0x5a005a5a, 0x52005252, 0xf300f3f3, 0x61006161, 0xa100a1a1, 0xf700f7f7, 0xb000b0b0, 0xd600d6d6, 0x3f003f3f, 0x7c007c7c, 0x6d006d6d, 0xed00eded, 0x14001414, 0xe000e0e0, 0xa500a5a5, 0x3d003d3d, 0x22002222, 0xb300b3b3, 0xf800f8f8, 0x89008989, 0xde00dede, 0x71007171, 0x1a001a1a, 0xaf00afaf, 0xba00baba, 0xb500b5b5, 0x81008181 }; static const u32 x1[256] = { 0x52520052, 0x09090009, 0x6a6a006a, 0xd5d500d5, 0x30300030, 0x36360036, 0xa5a500a5, 0x38380038, 0xbfbf00bf, 0x40400040, 0xa3a300a3, 0x9e9e009e, 0x81810081, 0xf3f300f3, 0xd7d700d7, 0xfbfb00fb, 0x7c7c007c, 0xe3e300e3, 0x39390039, 0x82820082, 0x9b9b009b, 0x2f2f002f, 0xffff00ff, 0x87870087, 0x34340034, 0x8e8e008e, 0x43430043, 0x44440044, 0xc4c400c4, 0xdede00de, 0xe9e900e9, 0xcbcb00cb, 0x54540054, 0x7b7b007b, 0x94940094, 0x32320032, 0xa6a600a6, 0xc2c200c2, 0x23230023, 0x3d3d003d, 0xeeee00ee, 0x4c4c004c, 0x95950095, 0x0b0b000b, 0x42420042, 0xfafa00fa, 0xc3c300c3, 0x4e4e004e, 0x08080008, 0x2e2e002e, 0xa1a100a1, 0x66660066, 0x28280028, 0xd9d900d9, 0x24240024, 0xb2b200b2, 0x76760076, 0x5b5b005b, 0xa2a200a2, 0x49490049, 0x6d6d006d, 0x8b8b008b, 0xd1d100d1, 0x25250025, 0x72720072, 0xf8f800f8, 0xf6f600f6, 0x64640064, 0x86860086, 0x68680068, 0x98980098, 0x16160016, 0xd4d400d4, 0xa4a400a4, 0x5c5c005c, 0xcccc00cc, 0x5d5d005d, 0x65650065, 0xb6b600b6, 0x92920092, 0x6c6c006c, 0x70700070, 0x48480048, 0x50500050, 0xfdfd00fd, 0xeded00ed, 0xb9b900b9, 0xdada00da, 0x5e5e005e, 0x15150015, 0x46460046, 0x57570057, 0xa7a700a7, 0x8d8d008d, 0x9d9d009d, 0x84840084, 0x90900090, 0xd8d800d8, 0xabab00ab, 0x00000000, 0x8c8c008c, 0xbcbc00bc, 0xd3d300d3, 0x0a0a000a, 0xf7f700f7, 0xe4e400e4, 0x58580058, 0x05050005, 0xb8b800b8, 0xb3b300b3, 0x45450045, 0x06060006, 0xd0d000d0, 0x2c2c002c, 0x1e1e001e, 0x8f8f008f, 0xcaca00ca, 0x3f3f003f, 0x0f0f000f, 0x02020002, 0xc1c100c1, 0xafaf00af, 0xbdbd00bd, 0x03030003, 0x01010001, 0x13130013, 0x8a8a008a, 0x6b6b006b, 0x3a3a003a, 0x91910091, 0x11110011, 0x41410041, 0x4f4f004f, 0x67670067, 0xdcdc00dc, 0xeaea00ea, 0x97970097, 0xf2f200f2, 0xcfcf00cf, 0xcece00ce, 0xf0f000f0, 0xb4b400b4, 0xe6e600e6, 0x73730073, 0x96960096, 0xacac00ac, 0x74740074, 0x22220022, 0xe7e700e7, 0xadad00ad, 0x35350035, 0x85850085, 0xe2e200e2, 0xf9f900f9, 0x37370037, 0xe8e800e8, 0x1c1c001c, 0x75750075, 0xdfdf00df, 0x6e6e006e, 0x47470047, 0xf1f100f1, 0x1a1a001a, 0x71710071, 0x1d1d001d, 0x29290029, 0xc5c500c5, 0x89890089, 0x6f6f006f, 0xb7b700b7, 0x62620062, 0x0e0e000e, 0xaaaa00aa, 0x18180018, 0xbebe00be, 0x1b1b001b, 0xfcfc00fc, 0x56560056, 0x3e3e003e, 0x4b4b004b, 0xc6c600c6, 0xd2d200d2, 0x79790079, 0x20200020, 0x9a9a009a, 0xdbdb00db, 0xc0c000c0, 0xfefe00fe, 0x78780078, 0xcdcd00cd, 0x5a5a005a, 0xf4f400f4, 0x1f1f001f, 0xdddd00dd, 0xa8a800a8, 0x33330033, 0x88880088, 0x07070007, 0xc7c700c7, 0x31310031, 0xb1b100b1, 0x12120012, 0x10100010, 0x59590059, 0x27270027, 0x80800080, 0xecec00ec, 0x5f5f005f, 0x60600060, 0x51510051, 0x7f7f007f, 0xa9a900a9, 0x19190019, 0xb5b500b5, 0x4a4a004a, 0x0d0d000d, 0x2d2d002d, 0xe5e500e5, 0x7a7a007a, 0x9f9f009f, 0x93930093, 0xc9c900c9, 0x9c9c009c, 0xefef00ef, 0xa0a000a0, 0xe0e000e0, 0x3b3b003b, 0x4d4d004d, 0xaeae00ae, 0x2a2a002a, 0xf5f500f5, 0xb0b000b0, 0xc8c800c8, 0xebeb00eb, 0xbbbb00bb, 0x3c3c003c, 0x83830083, 0x53530053, 0x99990099, 0x61610061, 0x17170017, 0x2b2b002b, 0x04040004, 0x7e7e007e, 0xbaba00ba, 0x77770077, 0xd6d600d6, 0x26260026, 0xe1e100e1, 0x69690069, 0x14140014, 0x63630063, 0x55550055, 0x21210021, 0x0c0c000c, 0x7d7d007d }; static const u32 x2[256] = { 0x30303000, 0x68686800, 0x99999900, 0x1b1b1b00, 0x87878700, 0xb9b9b900, 0x21212100, 0x78787800, 0x50505000, 0x39393900, 0xdbdbdb00, 0xe1e1e100, 0x72727200, 0x09090900, 0x62626200, 0x3c3c3c00, 0x3e3e3e00, 0x7e7e7e00, 0x5e5e5e00, 0x8e8e8e00, 0xf1f1f100, 0xa0a0a000, 0xcccccc00, 0xa3a3a300, 0x2a2a2a00, 0x1d1d1d00, 0xfbfbfb00, 0xb6b6b600, 0xd6d6d600, 0x20202000, 0xc4c4c400, 0x8d8d8d00, 0x81818100, 0x65656500, 0xf5f5f500, 0x89898900, 0xcbcbcb00, 0x9d9d9d00, 0x77777700, 0xc6c6c600, 0x57575700, 0x43434300, 0x56565600, 0x17171700, 0xd4d4d400, 0x40404000, 0x1a1a1a00, 0x4d4d4d00, 0xc0c0c000, 0x63636300, 0x6c6c6c00, 0xe3e3e300, 0xb7b7b700, 0xc8c8c800, 0x64646400, 0x6a6a6a00, 0x53535300, 0xaaaaaa00, 0x38383800, 0x98989800, 0x0c0c0c00, 0xf4f4f400, 0x9b9b9b00, 0xededed00, 0x7f7f7f00, 0x22222200, 0x76767600, 0xafafaf00, 0xdddddd00, 0x3a3a3a00, 0x0b0b0b00, 0x58585800, 0x67676700, 0x88888800, 0x06060600, 0xc3c3c300, 0x35353500, 0x0d0d0d00, 0x01010100, 0x8b8b8b00, 0x8c8c8c00, 0xc2c2c200, 0xe6e6e600, 0x5f5f5f00, 0x02020200, 0x24242400, 0x75757500, 0x93939300, 0x66666600, 0x1e1e1e00, 0xe5e5e500, 0xe2e2e200, 0x54545400, 0xd8d8d800, 0x10101000, 0xcecece00, 0x7a7a7a00, 0xe8e8e800, 0x08080800, 0x2c2c2c00, 0x12121200, 0x97979700, 0x32323200, 0xababab00, 0xb4b4b400, 0x27272700, 0x0a0a0a00, 0x23232300, 0xdfdfdf00, 0xefefef00, 0xcacaca00, 0xd9d9d900, 0xb8b8b800, 0xfafafa00, 0xdcdcdc00, 0x31313100, 0x6b6b6b00, 0xd1d1d100, 0xadadad00, 0x19191900, 0x49494900, 0xbdbdbd00, 0x51515100, 0x96969600, 0xeeeeee00, 0xe4e4e400, 0xa8a8a800, 0x41414100, 0xdadada00, 0xffffff00, 0xcdcdcd00, 0x55555500, 0x86868600, 0x36363600, 0xbebebe00, 0x61616100, 0x52525200, 0xf8f8f800, 0xbbbbbb00, 0x0e0e0e00, 0x82828200, 0x48484800, 0x69696900, 0x9a9a9a00, 0xe0e0e000, 0x47474700, 0x9e9e9e00, 0x5c5c5c00, 0x04040400, 0x4b4b4b00, 0x34343400, 0x15151500, 0x79797900, 0x26262600, 0xa7a7a700, 0xdedede00, 0x29292900, 0xaeaeae00, 0x92929200, 0xd7d7d700, 0x84848400, 0xe9e9e900, 0xd2d2d200, 0xbababa00, 0x5d5d5d00, 0xf3f3f300, 0xc5c5c500, 0xb0b0b000, 0xbfbfbf00, 0xa4a4a400, 0x3b3b3b00, 0x71717100, 0x44444400, 0x46464600, 0x2b2b2b00, 0xfcfcfc00, 0xebebeb00, 0x6f6f6f00, 0xd5d5d500, 0xf6f6f600, 0x14141400, 0xfefefe00, 0x7c7c7c00, 0x70707000, 0x5a5a5a00, 0x7d7d7d00, 0xfdfdfd00, 0x2f2f2f00, 0x18181800, 0x83838300, 0x16161600, 0xa5a5a500, 0x91919100, 0x1f1f1f00, 0x05050500, 0x95959500, 0x74747400, 0xa9a9a900, 0xc1c1c100, 0x5b5b5b00, 0x4a4a4a00, 0x85858500, 0x6d6d6d00, 0x13131300, 0x07070700, 0x4f4f4f00, 0x4e4e4e00, 0x45454500, 0xb2b2b200, 0x0f0f0f00, 0xc9c9c900, 0x1c1c1c00, 0xa6a6a600, 0xbcbcbc00, 0xececec00, 0x73737300, 0x90909000, 0x7b7b7b00, 0xcfcfcf00, 0x59595900, 0x8f8f8f00, 0xa1a1a100, 0xf9f9f900, 0x2d2d2d00, 0xf2f2f200, 0xb1b1b100, 0x00000000, 0x94949400, 0x37373700, 0x9f9f9f00, 0xd0d0d000, 0x2e2e2e00, 0x9c9c9c00, 0x6e6e6e00, 0x28282800, 0x3f3f3f00, 0x80808000, 0xf0f0f000, 0x3d3d3d00, 0xd3d3d300, 0x25252500, 0x8a8a8a00, 0xb5b5b500, 0xe7e7e700, 0x42424200, 0xb3b3b300, 0xc7c7c700, 0xeaeaea00, 0xf7f7f700, 0x4c4c4c00, 0x11111100, 0x33333300, 0x03030300, 0xa2a2a200, 0xacacac00, 0x60606000 }; static inline u32 rotl32(u32 v, u32 r) { return ((v << r) | (v >> (32 - r))); } static inline u32 rotr32(u32 v, u32 r) { return ((v >> r) | (v << (32 - r))); } static inline u32 bswap32(u32 v) { return ((v << 24) ^ (v >> 24) ^ ((v & 0x0000ff00) << 8) ^ ((v & 0x00ff0000) >> 8)); } static inline u8 get_u8(u32 x, u32 y) { return (x >> ((3 - y) * 8)); } static inline u32 make_u32(u8 v0, u8 v1, u8 v2, u8 v3) { return ((u32)v0 << 24) | ((u32)v1 << 16) | ((u32)v2 << 8) | ((u32)v3); } static inline u32 aria_m(u32 t0) { return rotr32(t0, 8) ^ rotr32(t0 ^ rotr32(t0, 8), 16); } /* S-Box Layer 1 + M */ static inline void aria_sbox_layer1_with_pre_diff(u32 *t0, u32 *t1, u32 *t2, u32 *t3) { *t0 = s1[get_u8(*t0, 0)] ^ s2[get_u8(*t0, 1)] ^ x1[get_u8(*t0, 2)] ^ x2[get_u8(*t0, 3)]; *t1 = s1[get_u8(*t1, 0)] ^ s2[get_u8(*t1, 1)] ^ x1[get_u8(*t1, 2)] ^ x2[get_u8(*t1, 3)]; *t2 = s1[get_u8(*t2, 0)] ^ s2[get_u8(*t2, 1)] ^ x1[get_u8(*t2, 2)] ^ x2[get_u8(*t2, 3)]; *t3 = s1[get_u8(*t3, 0)] ^ s2[get_u8(*t3, 1)] ^ x1[get_u8(*t3, 2)] ^ x2[get_u8(*t3, 3)]; } /* S-Box Layer 2 + M */ static inline void aria_sbox_layer2_with_pre_diff(u32 *t0, u32 *t1, u32 *t2, u32 *t3) { *t0 = x1[get_u8(*t0, 0)] ^ x2[get_u8(*t0, 1)] ^ s1[get_u8(*t0, 2)] ^ s2[get_u8(*t0, 3)]; *t1 = x1[get_u8(*t1, 0)] ^ x2[get_u8(*t1, 1)] ^ s1[get_u8(*t1, 2)] ^ s2[get_u8(*t1, 3)]; *t2 = x1[get_u8(*t2, 0)] ^ x2[get_u8(*t2, 1)] ^ s1[get_u8(*t2, 2)] ^ s2[get_u8(*t2, 3)]; *t3 = x1[get_u8(*t3, 0)] ^ x2[get_u8(*t3, 1)] ^ s1[get_u8(*t3, 2)] ^ s2[get_u8(*t3, 3)]; } /* Word-level diffusion */ static inline void aria_diff_word(u32 *t0, u32 *t1, u32 *t2, u32 *t3) { *t1 ^= *t2; *t2 ^= *t3; *t0 ^= *t1; *t3 ^= *t1; *t2 ^= *t0; *t1 ^= *t2; } /* Byte-level diffusion */ static inline void aria_diff_byte(u32 *t1, u32 *t2, u32 *t3) { *t1 = ((*t1 << 8) & 0xff00ff00) ^ ((*t1 >> 8) & 0x00ff00ff); *t2 = rotr32(*t2, 16); *t3 = bswap32(*t3); } /* Key XOR Layer */ static inline void aria_add_round_key(u32 *rk, u32 *t0, u32 *t1, u32 *t2, u32 *t3) { *t0 ^= rk[0]; *t1 ^= rk[1]; *t2 ^= rk[2]; *t3 ^= rk[3]; } /* Odd round Substitution & Diffusion */ static inline void aria_subst_diff_odd(u32 *t0, u32 *t1, u32 *t2, u32 *t3) { aria_sbox_layer1_with_pre_diff(t0, t1, t2, t3); aria_diff_word(t0, t1, t2, t3); aria_diff_byte(t1, t2, t3); aria_diff_word(t0, t1, t2, t3); } /* Even round Substitution & Diffusion */ static inline void aria_subst_diff_even(u32 *t0, u32 *t1, u32 *t2, u32 *t3) { aria_sbox_layer2_with_pre_diff(t0, t1, t2, t3); aria_diff_word(t0, t1, t2, t3); aria_diff_byte(t3, t0, t1); aria_diff_word(t0, t1, t2, t3); } /* Q, R Macro expanded ARIA GSRK */ static inline void aria_gsrk(u32 *rk, u32 *x, u32 *y, u32 n) { int q = 4 - (n / 32); int r = n % 32; rk[0] = (x[0]) ^ ((y[q % 4]) >> r) ^ ((y[(q + 3) % 4]) << (32 - r)); rk[1] = (x[1]) ^ ((y[(q + 1) % 4]) >> r) ^ ((y[q % 4]) << (32 - r)); rk[2] = (x[2]) ^ ((y[(q + 2) % 4]) >> r) ^ ((y[(q + 1) % 4]) << (32 - r)); rk[3] = (x[3]) ^ ((y[(q + 3) % 4]) >> r) ^ ((y[(q + 2) % 4]) << (32 - r)); } void aria_encrypt(void *ctx, u8 *out, const u8 *in); void aria_decrypt(void *ctx, u8 *out, const u8 *in); int aria_set_key(struct crypto_tfm *tfm, const u8 *in_key, unsigned int key_len); #endif |
| 75 31 37 35 35 35 11 11 27 27 26 27 27 27 35 35 35 35 35 34 35 35 35 27 27 17 17 12 11 26 21 14 11 11 17 11 11 11 11 67 69 68 37 55 35 68 69 68 69 470 43 470 462 123 37 37 35 34 37 37 23 37 37 36 36 35 35 36 22 36 35 27 27 27 20 27 26 3 18 18 18 17 3 1 21 21 4 4 4 4 4 3 1 21 21 20 18 18 18 17 14 14 1 18 21 21 21 40 40 39 39 40 46 41 40 39 39 2 39 40 2 2 40 46 41 40 40 41 41 41 16 41 40 40 41 41 2 2 2 2 1 2 1 1 2 41 40 1 41 41 2 2 2 40 41 33 40 7 32 10 10 10 1 1 1 1 8 4 3 1 4 1 6 6 6 8 5 5 5 5 5 5 3 5 5 5 3 3 3 3 3 4 4 4 3 1 4 3 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 | // SPDX-License-Identifier: GPL-2.0 /* * linux/mm/mlock.c * * (C) Copyright 1995 Linus Torvalds * (C) Copyright 2002 Christoph Hellwig */ #include <linux/capability.h> #include <linux/mman.h> #include <linux/mm.h> #include <linux/sched/user.h> #include <linux/swap.h> #include <linux/swapops.h> #include <linux/pagemap.h> #include <linux/pagevec.h> #include <linux/pagewalk.h> #include <linux/mempolicy.h> #include <linux/syscalls.h> #include <linux/sched.h> #include <linux/export.h> #include <linux/rmap.h> #include <linux/mmzone.h> #include <linux/hugetlb.h> #include <linux/memcontrol.h> #include <linux/mm_inline.h> #include <linux/secretmem.h> #include "internal.h" struct mlock_fbatch { local_lock_t lock; struct folio_batch fbatch; }; static DEFINE_PER_CPU(struct mlock_fbatch, mlock_fbatch) = { .lock = INIT_LOCAL_LOCK(lock), }; bool can_do_mlock(void) { if (rlimit(RLIMIT_MEMLOCK) != 0) return true; if (capable(CAP_IPC_LOCK)) return true; return false; } EXPORT_SYMBOL(can_do_mlock); /* * Mlocked folios are marked with the PG_mlocked flag for efficient testing * in vmscan and, possibly, the fault path; and to support semi-accurate * statistics. * * An mlocked folio [folio_test_mlocked(folio)] is unevictable. As such, it * will be ostensibly placed on the LRU "unevictable" list (actually no such * list exists), rather than the [in]active lists. PG_unevictable is set to * indicate the unevictable state. */ static struct lruvec *__mlock_folio(struct folio *folio, struct lruvec *lruvec) { /* There is nothing more we can do while it's off LRU */ if (!folio_test_clear_lru(folio)) return lruvec; lruvec = folio_lruvec_relock_irq(folio, lruvec); if (unlikely(folio_evictable(folio))) { /* * This is a little surprising, but quite possible: PG_mlocked * must have got cleared already by another CPU. Could this * folio be unevictable? I'm not sure, but move it now if so. */ if (folio_test_unevictable(folio)) { lruvec_del_folio(lruvec, folio); folio_clear_unevictable(folio); lruvec_add_folio(lruvec, folio); __count_vm_events(UNEVICTABLE_PGRESCUED, folio_nr_pages(folio)); } goto out; } if (folio_test_unevictable(folio)) { if (folio_test_mlocked(folio)) folio->mlock_count++; goto out; } lruvec_del_folio(lruvec, folio); folio_clear_active(folio); folio_set_unevictable(folio); folio->mlock_count = !!folio_test_mlocked(folio); lruvec_add_folio(lruvec, folio); __count_vm_events(UNEVICTABLE_PGCULLED, folio_nr_pages(folio)); out: folio_set_lru(folio); return lruvec; } static struct lruvec *__mlock_new_folio(struct folio *folio, struct lruvec *lruvec) { VM_BUG_ON_FOLIO(folio_test_lru(folio), folio); lruvec = folio_lruvec_relock_irq(folio, lruvec); /* As above, this is a little surprising, but possible */ if (unlikely(folio_evictable(folio))) goto out; folio_set_unevictable(folio); folio->mlock_count = !!folio_test_mlocked(folio); __count_vm_events(UNEVICTABLE_PGCULLED, folio_nr_pages(folio)); out: lruvec_add_folio(lruvec, folio); folio_set_lru(folio); return lruvec; } static struct lruvec *__munlock_folio(struct folio *folio, struct lruvec *lruvec) { int nr_pages = folio_nr_pages(folio); bool isolated = false; if (!folio_test_clear_lru(folio)) goto munlock; isolated = true; lruvec = folio_lruvec_relock_irq(folio, lruvec); if (folio_test_unevictable(folio)) { /* Then mlock_count is maintained, but might undercount */ if (folio->mlock_count) folio->mlock_count--; if (folio->mlock_count) goto out; } /* else assume that was the last mlock: reclaim will fix it if not */ munlock: if (folio_test_clear_mlocked(folio)) { __zone_stat_mod_folio(folio, NR_MLOCK, -nr_pages); if (isolated || !folio_test_unevictable(folio)) __count_vm_events(UNEVICTABLE_PGMUNLOCKED, nr_pages); else __count_vm_events(UNEVICTABLE_PGSTRANDED, nr_pages); } /* folio_evictable() has to be checked *after* clearing Mlocked */ if (isolated && folio_test_unevictable(folio) && folio_evictable(folio)) { lruvec_del_folio(lruvec, folio); folio_clear_unevictable(folio); lruvec_add_folio(lruvec, folio); __count_vm_events(UNEVICTABLE_PGRESCUED, nr_pages); } out: if (isolated) folio_set_lru(folio); return lruvec; } /* * Flags held in the low bits of a struct folio pointer on the mlock_fbatch. */ #define LRU_FOLIO 0x1 #define NEW_FOLIO 0x2 static inline struct folio *mlock_lru(struct folio *folio) { return (struct folio *)((unsigned long)folio + LRU_FOLIO); } static inline struct folio *mlock_new(struct folio *folio) { return (struct folio *)((unsigned long)folio + NEW_FOLIO); } /* * mlock_folio_batch() is derived from folio_batch_move_lru(): perhaps that can * make use of such folio pointer flags in future, but for now just keep it for * mlock. We could use three separate folio batches instead, but one feels * better (munlocking a full folio batch does not need to drain mlocking folio * batches first). */ static void mlock_folio_batch(struct folio_batch *fbatch) { struct lruvec *lruvec = NULL; unsigned long mlock; struct folio *folio; int i; for (i = 0; i < folio_batch_count(fbatch); i++) { folio = fbatch->folios[i]; mlock = (unsigned long)folio & (LRU_FOLIO | NEW_FOLIO); folio = (struct folio *)((unsigned long)folio - mlock); fbatch->folios[i] = folio; if (mlock & LRU_FOLIO) lruvec = __mlock_folio(folio, lruvec); else if (mlock & NEW_FOLIO) lruvec = __mlock_new_folio(folio, lruvec); else lruvec = __munlock_folio(folio, lruvec); } if (lruvec) unlock_page_lruvec_irq(lruvec); folios_put(fbatch); } void mlock_drain_local(void) { struct folio_batch *fbatch; local_lock(&mlock_fbatch.lock); fbatch = this_cpu_ptr(&mlock_fbatch.fbatch); if (folio_batch_count(fbatch)) mlock_folio_batch(fbatch); local_unlock(&mlock_fbatch.lock); } void mlock_drain_remote(int cpu) { struct folio_batch *fbatch; WARN_ON_ONCE(cpu_online(cpu)); fbatch = &per_cpu(mlock_fbatch.fbatch, cpu); if (folio_batch_count(fbatch)) mlock_folio_batch(fbatch); } bool need_mlock_drain(int cpu) { return folio_batch_count(&per_cpu(mlock_fbatch.fbatch, cpu)); } /** * mlock_folio - mlock a folio already on (or temporarily off) LRU * @folio: folio to be mlocked. */ void mlock_folio(struct folio *folio) { struct folio_batch *fbatch; local_lock(&mlock_fbatch.lock); fbatch = this_cpu_ptr(&mlock_fbatch.fbatch); if (!folio_test_set_mlocked(folio)) { int nr_pages = folio_nr_pages(folio); zone_stat_mod_folio(folio, NR_MLOCK, nr_pages); __count_vm_events(UNEVICTABLE_PGMLOCKED, nr_pages); } folio_get(folio); if (!folio_batch_add(fbatch, mlock_lru(folio)) || !folio_may_be_lru_cached(folio) || lru_cache_disabled()) mlock_folio_batch(fbatch); local_unlock(&mlock_fbatch.lock); } /** * mlock_new_folio - mlock a newly allocated folio not yet on LRU * @folio: folio to be mlocked, either normal or a THP head. */ void mlock_new_folio(struct folio *folio) { struct folio_batch *fbatch; int nr_pages = folio_nr_pages(folio); local_lock(&mlock_fbatch.lock); fbatch = this_cpu_ptr(&mlock_fbatch.fbatch); folio_set_mlocked(folio); zone_stat_mod_folio(folio, NR_MLOCK, nr_pages); __count_vm_events(UNEVICTABLE_PGMLOCKED, nr_pages); folio_get(folio); if (!folio_batch_add(fbatch, mlock_new(folio)) || !folio_may_be_lru_cached(folio) || lru_cache_disabled()) mlock_folio_batch(fbatch); local_unlock(&mlock_fbatch.lock); } /** * munlock_folio - munlock a folio * @folio: folio to be munlocked, either normal or a THP head. */ void munlock_folio(struct folio *folio) { struct folio_batch *fbatch; local_lock(&mlock_fbatch.lock); fbatch = this_cpu_ptr(&mlock_fbatch.fbatch); /* * folio_test_clear_mlocked(folio) must be left to __munlock_folio(), * which will check whether the folio is multiply mlocked. */ folio_get(folio); if (!folio_batch_add(fbatch, folio) || !folio_may_be_lru_cached(folio) || lru_cache_disabled()) mlock_folio_batch(fbatch); local_unlock(&mlock_fbatch.lock); } static inline unsigned int folio_mlock_step(struct folio *folio, pte_t *pte, unsigned long addr, unsigned long end) { unsigned int count = (end - addr) >> PAGE_SHIFT; pte_t ptent = ptep_get(pte); if (!folio_test_large(folio)) return 1; return folio_pte_batch(folio, pte, ptent, count); } static inline bool allow_mlock_munlock(struct folio *folio, struct vm_area_struct *vma, unsigned long start, unsigned long end, unsigned int step) { /* * For unlock, allow munlock large folio which is partially * mapped to VMA. As it's possible that large folio is * mlocked and VMA is split later. * * During memory pressure, such kind of large folio can * be split. And the pages are not in VM_LOCKed VMA * can be reclaimed. */ if (!(vma->vm_flags & VM_LOCKED)) return true; /* folio_within_range() cannot take KSM, but any small folio is OK */ if (!folio_test_large(folio)) return true; /* folio not in range [start, end), skip mlock */ if (!folio_within_range(folio, vma, start, end)) return false; /* folio is not fully mapped, skip mlock */ if (step != folio_nr_pages(folio)) return false; return true; } static int mlock_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, struct mm_walk *walk) { struct vm_area_struct *vma = walk->vma; spinlock_t *ptl; pte_t *start_pte, *pte; pte_t ptent; struct folio *folio; unsigned int step = 1; unsigned long start = addr; ptl = pmd_trans_huge_lock(pmd, vma); if (ptl) { if (!pmd_present(*pmd)) goto out; if (is_huge_zero_pmd(*pmd)) goto out; folio = pmd_folio(*pmd); if (folio_is_zone_device(folio)) goto out; if (vma->vm_flags & VM_LOCKED) mlock_folio(folio); else munlock_folio(folio); goto out; } start_pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); if (!start_pte) { walk->action = ACTION_AGAIN; return 0; } for (pte = start_pte; addr != end; pte++, addr += PAGE_SIZE) { ptent = ptep_get(pte); if (!pte_present(ptent)) continue; folio = vm_normal_folio(vma, addr, ptent); if (!folio || folio_is_zone_device(folio)) continue; step = folio_mlock_step(folio, pte, addr, end); if (!allow_mlock_munlock(folio, vma, start, end, step)) goto next_entry; if (vma->vm_flags & VM_LOCKED) mlock_folio(folio); else munlock_folio(folio); next_entry: pte += step - 1; addr += (step - 1) << PAGE_SHIFT; } pte_unmap(start_pte); out: spin_unlock(ptl); cond_resched(); return 0; } /* * mlock_vma_pages_range() - mlock any pages already in the range, * or munlock all pages in the range. * @vma - vma containing range to be mlock()ed or munlock()ed * @start - start address in @vma of the range * @end - end of range in @vma * @newflags - the new set of flags for @vma. * * Called for mlock(), mlock2() and mlockall(), to set @vma VM_LOCKED; * called for munlock() and munlockall(), to clear VM_LOCKED from @vma. */ static void mlock_vma_pages_range(struct vm_area_struct *vma, unsigned long start, unsigned long end, vm_flags_t newflags) { static const struct mm_walk_ops mlock_walk_ops = { .pmd_entry = mlock_pte_range, .walk_lock = PGWALK_WRLOCK_VERIFY, }; /* * There is a slight chance that concurrent page migration, * or page reclaim finding a page of this now-VM_LOCKED vma, * will call mlock_vma_folio() and raise page's mlock_count: * double counting, leaving the page unevictable indefinitely. * Communicate this danger to mlock_vma_folio() with VM_IO, * which is a VM_SPECIAL flag not allowed on VM_LOCKED vmas. * mmap_lock is held in write mode here, so this weird * combination should not be visible to other mmap_lock users; * but WRITE_ONCE so rmap walkers must see VM_IO if VM_LOCKED. */ if (newflags & VM_LOCKED) newflags |= VM_IO; vma_start_write(vma); vm_flags_reset_once(vma, newflags); lru_add_drain(); walk_page_range(vma->vm_mm, start, end, &mlock_walk_ops, NULL); lru_add_drain(); if (newflags & VM_IO) { newflags &= ~VM_IO; vm_flags_reset_once(vma, newflags); } } /* * mlock_fixup - handle mlock[all]/munlock[all] requests. * * Filters out "special" vmas -- VM_LOCKED never gets set for these, and * munlock is a no-op. However, for some special vmas, we go ahead and * populate the ptes. * * For vmas that pass the filters, merge/split as appropriate. */ static int mlock_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma, struct vm_area_struct **prev, unsigned long start, unsigned long end, vm_flags_t newflags) { struct mm_struct *mm = vma->vm_mm; int nr_pages; int ret = 0; vm_flags_t oldflags = vma->vm_flags; if (newflags == oldflags || (oldflags & VM_SPECIAL) || is_vm_hugetlb_page(vma) || vma == get_gate_vma(current->mm) || vma_is_dax(vma) || vma_is_secretmem(vma) || (oldflags & VM_DROPPABLE)) /* don't set VM_LOCKED or VM_LOCKONFAULT and don't count */ goto out; vma = vma_modify_flags(vmi, *prev, vma, start, end, &newflags); if (IS_ERR(vma)) { ret = PTR_ERR(vma); goto out; } /* * Keep track of amount of locked VM. */ nr_pages = (end - start) >> PAGE_SHIFT; if (!(newflags & VM_LOCKED)) nr_pages = -nr_pages; else if (oldflags & VM_LOCKED) nr_pages = 0; mm->locked_vm += nr_pages; /* * vm_flags is protected by the mmap_lock held in write mode. * It's okay if try_to_unmap_one unmaps a page just after we * set VM_LOCKED, populate_vma_page_range will bring it back. */ if ((newflags & VM_LOCKED) && (oldflags & VM_LOCKED)) { /* No work to do, and mlocking twice would be wrong */ vma_start_write(vma); vm_flags_reset(vma, newflags); } else { mlock_vma_pages_range(vma, start, end, newflags); } out: *prev = vma; return ret; } static int apply_vma_lock_flags(unsigned long start, size_t len, vm_flags_t flags) { unsigned long nstart, end, tmp; struct vm_area_struct *vma, *prev; VMA_ITERATOR(vmi, current->mm, start); VM_BUG_ON(offset_in_page(start)); VM_BUG_ON(len != PAGE_ALIGN(len)); end = start + len; if (end < start) return -EINVAL; if (end == start) return 0; vma = vma_iter_load(&vmi); if (!vma) return -ENOMEM; prev = vma_prev(&vmi); if (start > vma->vm_start) prev = vma; nstart = start; tmp = vma->vm_start; for_each_vma_range(vmi, vma, end) { int error; vm_flags_t newflags; if (vma->vm_start != tmp) return -ENOMEM; newflags = vma->vm_flags & ~VM_LOCKED_MASK; newflags |= flags; /* Here we know that vma->vm_start <= nstart < vma->vm_end. */ tmp = vma->vm_end; if (tmp > end) tmp = end; error = mlock_fixup(&vmi, vma, &prev, nstart, tmp, newflags); if (error) return error; tmp = vma_iter_end(&vmi); nstart = tmp; } if (tmp < end) return -ENOMEM; return 0; } /* * Go through vma areas and sum size of mlocked * vma pages, as return value. * Note deferred memory locking case(mlock2(,,MLOCK_ONFAULT) * is also counted. * Return value: previously mlocked page counts */ static unsigned long count_mm_mlocked_page_nr(struct mm_struct *mm, unsigned long start, size_t len) { struct vm_area_struct *vma; unsigned long count = 0; unsigned long end; VMA_ITERATOR(vmi, mm, start); /* Don't overflow past ULONG_MAX */ if (unlikely(ULONG_MAX - len < start)) end = ULONG_MAX; else end = start + len; for_each_vma_range(vmi, vma, end) { if (vma->vm_flags & VM_LOCKED) { if (start > vma->vm_start) count -= (start - vma->vm_start); if (end < vma->vm_end) { count += end - vma->vm_start; break; } count += vma->vm_end - vma->vm_start; } } return count >> PAGE_SHIFT; } /* * convert get_user_pages() return value to posix mlock() error */ static int __mlock_posix_error_return(long retval) { if (retval == -EFAULT) retval = -ENOMEM; else if (retval == -ENOMEM) retval = -EAGAIN; return retval; } static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t flags) { unsigned long locked; unsigned long lock_limit; int error = -ENOMEM; start = untagged_addr(start); if (!can_do_mlock()) return -EPERM; len = PAGE_ALIGN(len + (offset_in_page(start))); start &= PAGE_MASK; lock_limit = rlimit(RLIMIT_MEMLOCK); lock_limit >>= PAGE_SHIFT; locked = len >> PAGE_SHIFT; if (mmap_write_lock_killable(current->mm)) return -EINTR; locked += current->mm->locked_vm; if ((locked > lock_limit) && (!capable(CAP_IPC_LOCK))) { /* * It is possible that the regions requested intersect with * previously mlocked areas, that part area in "mm->locked_vm" * should not be counted to new mlock increment count. So check * and adjust locked count if necessary. */ locked -= count_mm_mlocked_page_nr(current->mm, start, len); } /* check against resource limits */ if ((locked <= lock_limit) || capable(CAP_IPC_LOCK)) error = apply_vma_lock_flags(start, len, flags); mmap_write_unlock(current->mm); if (error) return error; error = __mm_populate(start, len, 0); if (error) return __mlock_posix_error_return(error); return 0; } SYSCALL_DEFINE2(mlock, unsigned long, start, size_t, len) { return do_mlock(start, len, VM_LOCKED); } SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags) { vm_flags_t vm_flags = VM_LOCKED; if (flags & ~MLOCK_ONFAULT) return -EINVAL; if (flags & MLOCK_ONFAULT) vm_flags |= VM_LOCKONFAULT; return do_mlock(start, len, vm_flags); } SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len) { int ret; start = untagged_addr(start); len = PAGE_ALIGN(len + (offset_in_page(start))); start &= PAGE_MASK; if (mmap_write_lock_killable(current->mm)) return -EINTR; ret = apply_vma_lock_flags(start, len, 0); mmap_write_unlock(current->mm); return ret; } /* * Take the MCL_* flags passed into mlockall (or 0 if called from munlockall) * and translate into the appropriate modifications to mm->def_flags and/or the * flags for all current VMAs. * * There are a couple of subtleties with this. If mlockall() is called multiple * times with different flags, the values do not necessarily stack. If mlockall * is called once including the MCL_FUTURE flag and then a second time without * it, VM_LOCKED and VM_LOCKONFAULT will be cleared from mm->def_flags. */ static int apply_mlockall_flags(int flags) { VMA_ITERATOR(vmi, current->mm, 0); struct vm_area_struct *vma, *prev = NULL; vm_flags_t to_add = 0; current->mm->def_flags &= ~VM_LOCKED_MASK; if (flags & MCL_FUTURE) { current->mm->def_flags |= VM_LOCKED; if (flags & MCL_ONFAULT) current->mm->def_flags |= VM_LOCKONFAULT; if (!(flags & MCL_CURRENT)) goto out; } if (flags & MCL_CURRENT) { to_add |= VM_LOCKED; if (flags & MCL_ONFAULT) to_add |= VM_LOCKONFAULT; } for_each_vma(vmi, vma) { int error; vm_flags_t newflags; newflags = vma->vm_flags & ~VM_LOCKED_MASK; newflags |= to_add; error = mlock_fixup(&vmi, vma, &prev, vma->vm_start, vma->vm_end, newflags); /* Ignore errors, but prev needs fixing up. */ if (error) prev = vma; cond_resched(); } out: return 0; } SYSCALL_DEFINE1(mlockall, int, flags) { unsigned long lock_limit; int ret; if (!flags || (flags & ~(MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT)) || flags == MCL_ONFAULT) return -EINVAL; if (!can_do_mlock()) return -EPERM; lock_limit = rlimit(RLIMIT_MEMLOCK); lock_limit >>= PAGE_SHIFT; if (mmap_write_lock_killable(current->mm)) return -EINTR; ret = -ENOMEM; if (!(flags & MCL_CURRENT) || (current->mm->total_vm <= lock_limit) || capable(CAP_IPC_LOCK)) ret = apply_mlockall_flags(flags); mmap_write_unlock(current->mm); if (!ret && (flags & MCL_CURRENT)) mm_populate(0, TASK_SIZE); return ret; } SYSCALL_DEFINE0(munlockall) { int ret; if (mmap_write_lock_killable(current->mm)) return -EINTR; ret = apply_mlockall_flags(0); mmap_write_unlock(current->mm); return ret; } /* * Objects with different lifetime than processes (SHM_LOCK and SHM_HUGETLB * shm segments) get accounted against the user_struct instead. */ static DEFINE_SPINLOCK(shmlock_user_lock); int user_shm_lock(size_t size, struct ucounts *ucounts) { unsigned long lock_limit, locked; long memlock; int allowed = 0; locked = (size + PAGE_SIZE - 1) >> PAGE_SHIFT; lock_limit = rlimit(RLIMIT_MEMLOCK); if (lock_limit != RLIM_INFINITY) lock_limit >>= PAGE_SHIFT; spin_lock(&shmlock_user_lock); memlock = inc_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, locked); if ((memlock == LONG_MAX || memlock > lock_limit) && !capable(CAP_IPC_LOCK)) { dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, locked); goto out; } if (!get_ucounts(ucounts)) { dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, locked); allowed = 0; goto out; } allowed = 1; out: spin_unlock(&shmlock_user_lock); return allowed; } void user_shm_unlock(size_t size, struct ucounts *ucounts) { spin_lock(&shmlock_user_lock); dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, (size + PAGE_SIZE - 1) >> PAGE_SHIFT); spin_unlock(&shmlock_user_lock); put_ucounts(ucounts); } |
| 5 4 4 4 6 6 2 2 2 2 2 14 13 11 14 13 14 14 2 1 1 1 7 2 2 2 18 18 18 18 14 18 8 11 18 18 18 9 5 5 5 2 5 1 1 6 6 2 5 4 4 1 5 6 4 4 4 3 1 3 2 3 2 3 16 16 16 16 16 13 5 3 2 2 3 3 3 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 | // SPDX-License-Identifier: GPL-2.0-only /* * Fence mechanism for dma-buf and to allow for asynchronous dma access * * Copyright (C) 2012 Canonical Ltd * Copyright (C) 2012 Texas Instruments * * Authors: * Rob Clark <robdclark@gmail.com> * Maarten Lankhorst <maarten.lankhorst@canonical.com> */ #include <linux/slab.h> #include <linux/export.h> #include <linux/atomic.h> #include <linux/dma-fence.h> #include <linux/sched/signal.h> #include <linux/seq_file.h> #define CREATE_TRACE_POINTS #include <trace/events/dma_fence.h> EXPORT_TRACEPOINT_SYMBOL(dma_fence_emit); EXPORT_TRACEPOINT_SYMBOL(dma_fence_enable_signal); EXPORT_TRACEPOINT_SYMBOL(dma_fence_signaled); static DEFINE_SPINLOCK(dma_fence_stub_lock); static struct dma_fence dma_fence_stub; /* * fence context counter: each execution context should have its own * fence context, this allows checking if fences belong to the same * context or not. One device can have multiple separate contexts, * and they're used if some engine can run independently of another. */ static atomic64_t dma_fence_context_counter = ATOMIC64_INIT(1); /** * DOC: DMA fences overview * * DMA fences, represented by &struct dma_fence, are the kernel internal * synchronization primitive for DMA operations like GPU rendering, video * encoding/decoding, or displaying buffers on a screen. * * A fence is initialized using dma_fence_init() and completed using * dma_fence_signal(). Fences are associated with a context, allocated through * dma_fence_context_alloc(), and all fences on the same context are * fully ordered. * * Since the purposes of fences is to facilitate cross-device and * cross-application synchronization, there's multiple ways to use one: * * - Individual fences can be exposed as a &sync_file, accessed as a file * descriptor from userspace, created by calling sync_file_create(). This is * called explicit fencing, since userspace passes around explicit * synchronization points. * * - Some subsystems also have their own explicit fencing primitives, like * &drm_syncobj. Compared to &sync_file, a &drm_syncobj allows the underlying * fence to be updated. * * - Then there's also implicit fencing, where the synchronization points are * implicitly passed around as part of shared &dma_buf instances. Such * implicit fences are stored in &struct dma_resv through the * &dma_buf.resv pointer. */ /** * DOC: fence cross-driver contract * * Since &dma_fence provide a cross driver contract, all drivers must follow the * same rules: * * * Fences must complete in a reasonable time. Fences which represent kernels * and shaders submitted by userspace, which could run forever, must be backed * up by timeout and gpu hang recovery code. Minimally that code must prevent * further command submission and force complete all in-flight fences, e.g. * when the driver or hardware do not support gpu reset, or if the gpu reset * failed for some reason. Ideally the driver supports gpu recovery which only * affects the offending userspace context, and no other userspace * submissions. * * * Drivers may have different ideas of what completion within a reasonable * time means. Some hang recovery code uses a fixed timeout, others a mix * between observing forward progress and increasingly strict timeouts. * Drivers should not try to second guess timeout handling of fences from * other drivers. * * * To ensure there's no deadlocks of dma_fence_wait() against other locks * drivers should annotate all code required to reach dma_fence_signal(), * which completes the fences, with dma_fence_begin_signalling() and * dma_fence_end_signalling(). * * * Drivers are allowed to call dma_fence_wait() while holding dma_resv_lock(). * This means any code required for fence completion cannot acquire a * &dma_resv lock. Note that this also pulls in the entire established * locking hierarchy around dma_resv_lock() and dma_resv_unlock(). * * * Drivers are allowed to call dma_fence_wait() from their &shrinker * callbacks. This means any code required for fence completion cannot * allocate memory with GFP_KERNEL. * * * Drivers are allowed to call dma_fence_wait() from their &mmu_notifier * respectively &mmu_interval_notifier callbacks. This means any code required * for fence completion cannot allocate memory with GFP_NOFS or GFP_NOIO. * Only GFP_ATOMIC is permissible, which might fail. * * Note that only GPU drivers have a reasonable excuse for both requiring * &mmu_interval_notifier and &shrinker callbacks at the same time as having to * track asynchronous compute work using &dma_fence. No driver outside of * drivers/gpu should ever call dma_fence_wait() in such contexts. */ static const char *dma_fence_stub_get_name(struct dma_fence *fence) { return "stub"; } static const struct dma_fence_ops dma_fence_stub_ops = { .get_driver_name = dma_fence_stub_get_name, .get_timeline_name = dma_fence_stub_get_name, }; static int __init dma_fence_init_stub(void) { dma_fence_init(&dma_fence_stub, &dma_fence_stub_ops, &dma_fence_stub_lock, 0, 0); set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &dma_fence_stub.flags); dma_fence_signal(&dma_fence_stub); return 0; } subsys_initcall(dma_fence_init_stub); /** * dma_fence_get_stub - return a signaled fence * * Return a stub fence which is already signaled. The fence's timestamp * corresponds to the initialisation time of the linux kernel. */ struct dma_fence *dma_fence_get_stub(void) { return dma_fence_get(&dma_fence_stub); } EXPORT_SYMBOL(dma_fence_get_stub); /** * dma_fence_allocate_private_stub - return a private, signaled fence * @timestamp: timestamp when the fence was signaled * * Return a newly allocated and signaled stub fence. */ struct dma_fence *dma_fence_allocate_private_stub(ktime_t timestamp) { struct dma_fence *fence; fence = kzalloc_obj(*fence); if (fence == NULL) return NULL; dma_fence_init(fence, &dma_fence_stub_ops, &dma_fence_stub_lock, 0, 0); set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags); dma_fence_signal_timestamp(fence, timestamp); return fence; } EXPORT_SYMBOL(dma_fence_allocate_private_stub); /** * dma_fence_context_alloc - allocate an array of fence contexts * @num: amount of contexts to allocate * * This function will return the first index of the number of fence contexts * allocated. The fence context is used for setting &dma_fence.context to a * unique number by passing the context to dma_fence_init(). */ u64 dma_fence_context_alloc(unsigned num) { WARN_ON(!num); return atomic64_fetch_add(num, &dma_fence_context_counter); } EXPORT_SYMBOL(dma_fence_context_alloc); /** * DOC: fence signalling annotation * * Proving correctness of all the kernel code around &dma_fence through code * review and testing is tricky for a few reasons: * * * It is a cross-driver contract, and therefore all drivers must follow the * same rules for lock nesting order, calling contexts for various functions * and anything else significant for in-kernel interfaces. But it is also * impossible to test all drivers in a single machine, hence brute-force N vs. * N testing of all combinations is impossible. Even just limiting to the * possible combinations is infeasible. * * * There is an enormous amount of driver code involved. For render drivers * there's the tail of command submission, after fences are published, * scheduler code, interrupt and workers to process job completion, * and timeout, gpu reset and gpu hang recovery code. Plus for integration * with core mm with have &mmu_notifier, respectively &mmu_interval_notifier, * and &shrinker. For modesetting drivers there's the commit tail functions * between when fences for an atomic modeset are published, and when the * corresponding vblank completes, including any interrupt processing and * related workers. Auditing all that code, across all drivers, is not * feasible. * * * Due to how many other subsystems are involved and the locking hierarchies * this pulls in there is extremely thin wiggle-room for driver-specific * differences. &dma_fence interacts with almost all of the core memory * handling through page fault handlers via &dma_resv, dma_resv_lock() and * dma_resv_unlock(). On the other side it also interacts through all * allocation sites through &mmu_notifier and &shrinker. * * Furthermore lockdep does not handle cross-release dependencies, which means * any deadlocks between dma_fence_wait() and dma_fence_signal() can't be caught * at runtime with some quick testing. The simplest example is one thread * waiting on a &dma_fence while holding a lock:: * * lock(A); * dma_fence_wait(B); * unlock(A); * * while the other thread is stuck trying to acquire the same lock, which * prevents it from signalling the fence the previous thread is stuck waiting * on:: * * lock(A); * unlock(A); * dma_fence_signal(B); * * By manually annotating all code relevant to signalling a &dma_fence we can * teach lockdep about these dependencies, which also helps with the validation * headache since now lockdep can check all the rules for us:: * * cookie = dma_fence_begin_signalling(); * lock(A); * unlock(A); * dma_fence_signal(B); * dma_fence_end_signalling(cookie); * * For using dma_fence_begin_signalling() and dma_fence_end_signalling() to * annotate critical sections the following rules need to be observed: * * * All code necessary to complete a &dma_fence must be annotated, from the * point where a fence is accessible to other threads, to the point where * dma_fence_signal() is called. Un-annotated code can contain deadlock issues, * and due to the very strict rules and many corner cases it is infeasible to * catch these just with review or normal stress testing. * * * &struct dma_resv deserves a special note, since the readers are only * protected by rcu. This means the signalling critical section starts as soon * as the new fences are installed, even before dma_resv_unlock() is called. * * * The only exception are fast paths and opportunistic signalling code, which * calls dma_fence_signal() purely as an optimization, but is not required to * guarantee completion of a &dma_fence. The usual example is a wait IOCTL * which calls dma_fence_signal(), while the mandatory completion path goes * through a hardware interrupt and possible job completion worker. * * * To aid composability of code, the annotations can be freely nested, as long * as the overall locking hierarchy is consistent. The annotations also work * both in interrupt and process context. Due to implementation details this * requires that callers pass an opaque cookie from * dma_fence_begin_signalling() to dma_fence_end_signalling(). * * * Validation against the cross driver contract is implemented by priming * lockdep with the relevant hierarchy at boot-up. This means even just * testing with a single device is enough to validate a driver, at least as * far as deadlocks with dma_fence_wait() against dma_fence_signal() are * concerned. */ #ifdef CONFIG_LOCKDEP static struct lockdep_map dma_fence_lockdep_map = { .name = "dma_fence_map" }; /** * dma_fence_begin_signalling - begin a critical DMA fence signalling section * * Drivers should use this to annotate the beginning of any code section * required to eventually complete &dma_fence by calling dma_fence_signal(). * * The end of these critical sections are annotated with * dma_fence_end_signalling(). * * Returns: * * Opaque cookie needed by the implementation, which needs to be passed to * dma_fence_end_signalling(). */ bool dma_fence_begin_signalling(void) { /* explicitly nesting ... */ if (lock_is_held_type(&dma_fence_lockdep_map, 1)) return true; /* rely on might_sleep check for soft/hardirq locks */ if (in_atomic()) return true; /* ... and non-recursive successful read_trylock */ lock_acquire(&dma_fence_lockdep_map, 0, 1, 1, 1, NULL, _RET_IP_); return false; } EXPORT_SYMBOL(dma_fence_begin_signalling); /** * dma_fence_end_signalling - end a critical DMA fence signalling section * @cookie: opaque cookie from dma_fence_begin_signalling() * * Closes a critical section annotation opened by dma_fence_begin_signalling(). */ void dma_fence_end_signalling(bool cookie) { if (cookie) return; lock_release(&dma_fence_lockdep_map, _RET_IP_); } EXPORT_SYMBOL(dma_fence_end_signalling); void __dma_fence_might_wait(void) { bool tmp; tmp = lock_is_held_type(&dma_fence_lockdep_map, 1); if (tmp) lock_release(&dma_fence_lockdep_map, _THIS_IP_); lock_map_acquire(&dma_fence_lockdep_map); lock_map_release(&dma_fence_lockdep_map); if (tmp) lock_acquire(&dma_fence_lockdep_map, 0, 1, 1, 1, NULL, _THIS_IP_); } #endif /** * dma_fence_signal_timestamp_locked - signal completion of a fence * @fence: the fence to signal * @timestamp: fence signal timestamp in kernel's CLOCK_MONOTONIC time domain * * Signal completion for software callbacks on a fence, this will unblock * dma_fence_wait() calls and run all the callbacks added with * dma_fence_add_callback(). Can be called multiple times, but since a fence * can only go from the unsignaled to the signaled state and not back, it will * only be effective the first time. Set the timestamp provided as the fence * signal timestamp. * * Unlike dma_fence_signal_timestamp(), this function must be called with * &dma_fence.lock held. */ void dma_fence_signal_timestamp_locked(struct dma_fence *fence, ktime_t timestamp) { struct dma_fence_cb *cur, *tmp; struct list_head cb_list; lockdep_assert_held(fence->lock); if (unlikely(test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))) return; /* Stash the cb_list before replacing it with the timestamp */ list_replace(&fence->cb_list, &cb_list); fence->timestamp = timestamp; set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags); trace_dma_fence_signaled(fence); list_for_each_entry_safe(cur, tmp, &cb_list, node) { INIT_LIST_HEAD(&cur->node); cur->func(fence, cur); } } EXPORT_SYMBOL(dma_fence_signal_timestamp_locked); /** * dma_fence_signal_timestamp - signal completion of a fence * @fence: the fence to signal * @timestamp: fence signal timestamp in kernel's CLOCK_MONOTONIC time domain * * Signal completion for software callbacks on a fence, this will unblock * dma_fence_wait() calls and run all the callbacks added with * dma_fence_add_callback(). Can be called multiple times, but since a fence * can only go from the unsignaled to the signaled state and not back, it will * only be effective the first time. Set the timestamp provided as the fence * signal timestamp. */ void dma_fence_signal_timestamp(struct dma_fence *fence, ktime_t timestamp) { unsigned long flags; if (WARN_ON(!fence)) return; spin_lock_irqsave(fence->lock, flags); dma_fence_signal_timestamp_locked(fence, timestamp); spin_unlock_irqrestore(fence->lock, flags); } EXPORT_SYMBOL(dma_fence_signal_timestamp); /** * dma_fence_signal_locked - signal completion of a fence * @fence: the fence to signal * * Signal completion for software callbacks on a fence, this will unblock * dma_fence_wait() calls and run all the callbacks added with * dma_fence_add_callback(). Can be called multiple times, but since a fence * can only go from the unsignaled to the signaled state and not back, it will * only be effective the first time. * * Unlike dma_fence_signal(), this function must be called with &dma_fence.lock * held. */ void dma_fence_signal_locked(struct dma_fence *fence) { dma_fence_signal_timestamp_locked(fence, ktime_get()); } EXPORT_SYMBOL(dma_fence_signal_locked); /** * dma_fence_check_and_signal_locked - signal the fence if it's not yet signaled * @fence: the fence to check and signal * * Checks whether a fence was signaled and signals it if it was not yet signaled. * * Unlike dma_fence_check_and_signal(), this function must be called with * &struct dma_fence.lock being held. * * Return: true if fence has been signaled already, false otherwise. */ bool dma_fence_check_and_signal_locked(struct dma_fence *fence) { bool ret; ret = dma_fence_test_signaled_flag(fence); dma_fence_signal_locked(fence); return ret; } EXPORT_SYMBOL(dma_fence_check_and_signal_locked); /** * dma_fence_check_and_signal - signal the fence if it's not yet signaled * @fence: the fence to check and signal * * Checks whether a fence was signaled and signals it if it was not yet signaled. * All this is done in a race-free manner. * * Return: true if fence has been signaled already, false otherwise. */ bool dma_fence_check_and_signal(struct dma_fence *fence) { unsigned long flags; bool ret; spin_lock_irqsave(fence->lock, flags); ret = dma_fence_check_and_signal_locked(fence); spin_unlock_irqrestore(fence->lock, flags); return ret; } EXPORT_SYMBOL(dma_fence_check_and_signal); /** * dma_fence_signal - signal completion of a fence * @fence: the fence to signal * * Signal completion for software callbacks on a fence, this will unblock * dma_fence_wait() calls and run all the callbacks added with * dma_fence_add_callback(). Can be called multiple times, but since a fence * can only go from the unsignaled to the signaled state and not back, it will * only be effective the first time. */ void dma_fence_signal(struct dma_fence *fence) { unsigned long flags; bool tmp; if (WARN_ON(!fence)) return; tmp = dma_fence_begin_signalling(); spin_lock_irqsave(fence->lock, flags); dma_fence_signal_timestamp_locked(fence, ktime_get()); spin_unlock_irqrestore(fence->lock, flags); dma_fence_end_signalling(tmp); } EXPORT_SYMBOL(dma_fence_signal); /** * dma_fence_wait_timeout - sleep until the fence gets signaled * or until timeout elapses * @fence: the fence to wait on * @intr: if true, do an interruptible wait * @timeout: timeout value in jiffies, or MAX_SCHEDULE_TIMEOUT * * Returns -ERESTARTSYS if interrupted, 0 if the wait timed out, or the * remaining timeout in jiffies on success. Other error values may be * returned on custom implementations. * * Performs a synchronous wait on this fence. It is assumed the caller * directly or indirectly (buf-mgr between reservation and committing) * holds a reference to the fence, otherwise the fence might be * freed before return, resulting in undefined behavior. * * See also dma_fence_wait() and dma_fence_wait_any_timeout(). */ signed long dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout) { signed long ret; if (WARN_ON(timeout < 0)) return -EINVAL; might_sleep(); __dma_fence_might_wait(); dma_fence_enable_sw_signaling(fence); if (trace_dma_fence_wait_start_enabled()) { rcu_read_lock(); trace_dma_fence_wait_start(fence); rcu_read_unlock(); } if (fence->ops->wait) ret = fence->ops->wait(fence, intr, timeout); else ret = dma_fence_default_wait(fence, intr, timeout); if (trace_dma_fence_wait_end_enabled()) { rcu_read_lock(); trace_dma_fence_wait_end(fence); rcu_read_unlock(); } return ret; } EXPORT_SYMBOL(dma_fence_wait_timeout); /** * dma_fence_release - default release function for fences * @kref: &dma_fence.recfount * * This is the default release functions for &dma_fence. Drivers shouldn't call * this directly, but instead call dma_fence_put(). */ void dma_fence_release(struct kref *kref) { struct dma_fence *fence = container_of(kref, struct dma_fence, refcount); rcu_read_lock(); trace_dma_fence_destroy(fence); if (!list_empty(&fence->cb_list) && !dma_fence_test_signaled_flag(fence)) { const char __rcu *timeline; const char __rcu *driver; unsigned long flags; driver = dma_fence_driver_name(fence); timeline = dma_fence_timeline_name(fence); WARN(1, "Fence %s:%s:%llx:%llx released with pending signals!\n", rcu_dereference(driver), rcu_dereference(timeline), fence->context, fence->seqno); /* * Failed to signal before release, likely a refcounting issue. * * This should never happen, but if it does make sure that we * don't leave chains dangling. We set the error flag first * so that the callbacks know this signal is due to an error. */ spin_lock_irqsave(fence->lock, flags); fence->error = -EDEADLK; dma_fence_signal_locked(fence); spin_unlock_irqrestore(fence->lock, flags); } rcu_read_unlock(); if (fence->ops->release) fence->ops->release(fence); else dma_fence_free(fence); } EXPORT_SYMBOL(dma_fence_release); /** * dma_fence_free - default release function for &dma_fence. * @fence: fence to release * * This is the default implementation for &dma_fence_ops.release. It calls * kfree_rcu() on @fence. */ void dma_fence_free(struct dma_fence *fence) { kfree_rcu(fence, rcu); } EXPORT_SYMBOL(dma_fence_free); static bool __dma_fence_enable_signaling(struct dma_fence *fence) { bool was_set; lockdep_assert_held(fence->lock); was_set = test_and_set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags); if (dma_fence_test_signaled_flag(fence)) return false; if (!was_set && fence->ops->enable_signaling) { trace_dma_fence_enable_signal(fence); if (!fence->ops->enable_signaling(fence)) { dma_fence_signal_locked(fence); return false; } } return true; } /** * dma_fence_enable_sw_signaling - enable signaling on fence * @fence: the fence to enable * * This will request for sw signaling to be enabled, to make the fence * complete as soon as possible. This calls &dma_fence_ops.enable_signaling * internally. */ void dma_fence_enable_sw_signaling(struct dma_fence *fence) { unsigned long flags; spin_lock_irqsave(fence->lock, flags); __dma_fence_enable_signaling(fence); spin_unlock_irqrestore(fence->lock, flags); } EXPORT_SYMBOL(dma_fence_enable_sw_signaling); /** * dma_fence_add_callback - add a callback to be called when the fence * is signaled * @fence: the fence to wait on * @cb: the callback to register * @func: the function to call * * Add a software callback to the fence. The caller should keep a reference to * the fence. * * @cb will be initialized by dma_fence_add_callback(), no initialization * by the caller is required. Any number of callbacks can be registered * to a fence, but a callback can only be registered to one fence at a time. * * If fence is already signaled, this function will return -ENOENT (and * *not* call the callback). * * Note that the callback can be called from an atomic context or irq context. * * Returns 0 in case of success, -ENOENT if the fence is already signaled * and -EINVAL in case of error. */ int dma_fence_add_callback(struct dma_fence *fence, struct dma_fence_cb *cb, dma_fence_func_t func) { unsigned long flags; int ret = 0; if (WARN_ON(!fence || !func)) return -EINVAL; if (dma_fence_test_signaled_flag(fence)) { INIT_LIST_HEAD(&cb->node); return -ENOENT; } spin_lock_irqsave(fence->lock, flags); if (__dma_fence_enable_signaling(fence)) { cb->func = func; list_add_tail(&cb->node, &fence->cb_list); } else { INIT_LIST_HEAD(&cb->node); ret = -ENOENT; } spin_unlock_irqrestore(fence->lock, flags); return ret; } EXPORT_SYMBOL(dma_fence_add_callback); /** * dma_fence_get_status - returns the status upon completion * @fence: the dma_fence to query * * This wraps dma_fence_get_status_locked() to return the error status * condition on a signaled fence. See dma_fence_get_status_locked() for more * details. * * Returns 0 if the fence has not yet been signaled, 1 if the fence has * been signaled without an error condition, or a negative error code * if the fence has been completed in err. */ int dma_fence_get_status(struct dma_fence *fence) { unsigned long flags; int status; spin_lock_irqsave(fence->lock, flags); status = dma_fence_get_status_locked(fence); spin_unlock_irqrestore(fence->lock, flags); return status; } EXPORT_SYMBOL(dma_fence_get_status); /** * dma_fence_remove_callback - remove a callback from the signaling list * @fence: the fence to wait on * @cb: the callback to remove * * Remove a previously queued callback from the fence. This function returns * true if the callback is successfully removed, or false if the fence has * already been signaled. * * *WARNING*: * Cancelling a callback should only be done if you really know what you're * doing, since deadlocks and race conditions could occur all too easily. For * this reason, it should only ever be done on hardware lockup recovery, * with a reference held to the fence. * * Behaviour is undefined if @cb has not been added to @fence using * dma_fence_add_callback() beforehand. */ bool dma_fence_remove_callback(struct dma_fence *fence, struct dma_fence_cb *cb) { unsigned long flags; bool ret; spin_lock_irqsave(fence->lock, flags); ret = !list_empty(&cb->node); if (ret) list_del_init(&cb->node); spin_unlock_irqrestore(fence->lock, flags); return ret; } EXPORT_SYMBOL(dma_fence_remove_callback); struct default_wait_cb { struct dma_fence_cb base; struct task_struct *task; }; static void dma_fence_default_wait_cb(struct dma_fence *fence, struct dma_fence_cb *cb) { struct default_wait_cb *wait = container_of(cb, struct default_wait_cb, base); wake_up_state(wait->task, TASK_NORMAL); } /** * dma_fence_default_wait - default sleep until the fence gets signaled * or until timeout elapses * @fence: the fence to wait on * @intr: if true, do an interruptible wait * @timeout: timeout value in jiffies, or MAX_SCHEDULE_TIMEOUT * * Returns -ERESTARTSYS if interrupted, 0 if the wait timed out, or the * remaining timeout in jiffies on success. If timeout is zero the value one is * returned if the fence is already signaled for consistency with other * functions taking a jiffies timeout. */ signed long dma_fence_default_wait(struct dma_fence *fence, bool intr, signed long timeout) { struct default_wait_cb cb; unsigned long flags; signed long ret = timeout ? timeout : 1; spin_lock_irqsave(fence->lock, flags); if (dma_fence_test_signaled_flag(fence)) goto out; if (intr && signal_pending(current)) { ret = -ERESTARTSYS; goto out; } if (!timeout) { ret = 0; goto out; } cb.base.func = dma_fence_default_wait_cb; cb.task = current; list_add(&cb.base.node, &fence->cb_list); while (!dma_fence_test_signaled_flag(fence) && ret > 0) { if (intr) __set_current_state(TASK_INTERRUPTIBLE); else __set_current_state(TASK_UNINTERRUPTIBLE); spin_unlock_irqrestore(fence->lock, flags); ret = schedule_timeout(ret); spin_lock_irqsave(fence->lock, flags); if (ret > 0 && intr && signal_pending(current)) ret = -ERESTARTSYS; } if (!list_empty(&cb.base.node)) list_del(&cb.base.node); __set_current_state(TASK_RUNNING); out: spin_unlock_irqrestore(fence->lock, flags); return ret; } EXPORT_SYMBOL(dma_fence_default_wait); static bool dma_fence_test_signaled_any(struct dma_fence **fences, uint32_t count, uint32_t *idx) { int i; for (i = 0; i < count; ++i) { struct dma_fence *fence = fences[i]; if (dma_fence_test_signaled_flag(fence)) { if (idx) *idx = i; return true; } } return false; } /** * dma_fence_wait_any_timeout - sleep until any fence gets signaled * or until timeout elapses * @fences: array of fences to wait on * @count: number of fences to wait on * @intr: if true, do an interruptible wait * @timeout: timeout value in jiffies, or MAX_SCHEDULE_TIMEOUT * @idx: used to store the first signaled fence index, meaningful only on * positive return * * Returns -EINVAL on custom fence wait implementation, -ERESTARTSYS if * interrupted, 0 if the wait timed out, or the remaining timeout in jiffies * on success. * * Synchronous waits for the first fence in the array to be signaled. The * caller needs to hold a reference to all fences in the array, otherwise a * fence might be freed before return, resulting in undefined behavior. * * See also dma_fence_wait() and dma_fence_wait_timeout(). */ signed long dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count, bool intr, signed long timeout, uint32_t *idx) { struct default_wait_cb *cb; signed long ret = timeout; unsigned i; if (WARN_ON(!fences || !count || timeout < 0)) return -EINVAL; if (timeout == 0) { for (i = 0; i < count; ++i) if (dma_fence_is_signaled(fences[i])) { if (idx) *idx = i; return 1; } return 0; } cb = kzalloc_objs(struct default_wait_cb, count); if (cb == NULL) { ret = -ENOMEM; goto err_free_cb; } for (i = 0; i < count; ++i) { struct dma_fence *fence = fences[i]; cb[i].task = current; if (dma_fence_add_callback(fence, &cb[i].base, dma_fence_default_wait_cb)) { /* This fence is already signaled */ if (idx) *idx = i; goto fence_rm_cb; } } while (ret > 0) { if (intr) set_current_state(TASK_INTERRUPTIBLE); else set_current_state(TASK_UNINTERRUPTIBLE); if (dma_fence_test_signaled_any(fences, count, idx)) break; ret = schedule_timeout(ret); if (ret > 0 && intr && signal_pending(current)) ret = -ERESTARTSYS; } __set_current_state(TASK_RUNNING); fence_rm_cb: while (i-- > 0) dma_fence_remove_callback(fences[i], &cb[i].base); err_free_cb: kfree(cb); return ret; } EXPORT_SYMBOL(dma_fence_wait_any_timeout); /** * DOC: deadline hints * * In an ideal world, it would be possible to pipeline a workload sufficiently * that a utilization based device frequency governor could arrive at a minimum * frequency that meets the requirements of the use-case, in order to minimize * power consumption. But in the real world there are many workloads which * defy this ideal. For example, but not limited to: * * * Workloads that ping-pong between device and CPU, with alternating periods * of CPU waiting for device, and device waiting on CPU. This can result in * devfreq and cpufreq seeing idle time in their respective domains and in * result reduce frequency. * * * Workloads that interact with a periodic time based deadline, such as double * buffered GPU rendering vs vblank sync'd page flipping. In this scenario, * missing a vblank deadline results in an *increase* in idle time on the GPU * (since it has to wait an additional vblank period), sending a signal to * the GPU's devfreq to reduce frequency, when in fact the opposite is what is * needed. * * To this end, deadline hint(s) can be set on a &dma_fence via &dma_fence_set_deadline * (or indirectly via userspace facing ioctls like &sync_set_deadline). * The deadline hint provides a way for the waiting driver, or userspace, to * convey an appropriate sense of urgency to the signaling driver. * * A deadline hint is given in absolute ktime (CLOCK_MONOTONIC for userspace * facing APIs). The time could either be some point in the future (such as * the vblank based deadline for page-flipping, or the start of a compositor's * composition cycle), or the current time to indicate an immediate deadline * hint (Ie. forward progress cannot be made until this fence is signaled). * * Multiple deadlines may be set on a given fence, even in parallel. See the * documentation for &dma_fence_ops.set_deadline. * * The deadline hint is just that, a hint. The driver that created the fence * may react by increasing frequency, making different scheduling choices, etc. * Or doing nothing at all. */ /** * dma_fence_set_deadline - set desired fence-wait deadline hint * @fence: the fence that is to be waited on * @deadline: the time by which the waiter hopes for the fence to be * signaled * * Give the fence signaler a hint about an upcoming deadline, such as * vblank, by which point the waiter would prefer the fence to be * signaled by. This is intended to give feedback to the fence signaler * to aid in power management decisions, such as boosting GPU frequency * if a periodic vblank deadline is approaching but the fence is not * yet signaled.. */ void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline) { if (fence->ops->set_deadline && !dma_fence_is_signaled(fence)) fence->ops->set_deadline(fence, deadline); } EXPORT_SYMBOL(dma_fence_set_deadline); /** * dma_fence_describe - Dump fence description into seq_file * @fence: the fence to describe * @seq: the seq_file to put the textual description into * * Dump a textual description of the fence and it's state into the seq_file. */ void dma_fence_describe(struct dma_fence *fence, struct seq_file *seq) { const char __rcu *timeline = ""; const char __rcu *driver = ""; const char *signaled = ""; rcu_read_lock(); if (!dma_fence_is_signaled(fence)) { timeline = dma_fence_timeline_name(fence); driver = dma_fence_driver_name(fence); signaled = "un"; } seq_printf(seq, "%llu:%llu %s %s %ssignalled\n", fence->context, fence->seqno, timeline, driver, signaled); rcu_read_unlock(); } EXPORT_SYMBOL(dma_fence_describe); static void __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops, spinlock_t *lock, u64 context, u64 seqno, unsigned long flags) { BUG_ON(!lock); BUG_ON(!ops || !ops->get_driver_name || !ops->get_timeline_name); kref_init(&fence->refcount); fence->ops = ops; INIT_LIST_HEAD(&fence->cb_list); fence->lock = lock; fence->context = context; fence->seqno = seqno; fence->flags = flags; fence->error = 0; trace_dma_fence_init(fence); } /** * dma_fence_init - Initialize a custom fence. * @fence: the fence to initialize * @ops: the dma_fence_ops for operations on this fence * @lock: the irqsafe spinlock to use for locking this fence * @context: the execution context this fence is run on * @seqno: a linear increasing sequence number for this context * * Initializes an allocated fence, the caller doesn't have to keep its * refcount after committing with this fence, but it will need to hold a * refcount again if &dma_fence_ops.enable_signaling gets called. * * context and seqno are used for easy comparison between fences, allowing * to check which fence is later by simply using dma_fence_later(). */ void dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops, spinlock_t *lock, u64 context, u64 seqno) { __dma_fence_init(fence, ops, lock, context, seqno, 0UL); } EXPORT_SYMBOL(dma_fence_init); /** * dma_fence_init64 - Initialize a custom fence with 64-bit seqno support. * @fence: the fence to initialize * @ops: the dma_fence_ops for operations on this fence * @lock: the irqsafe spinlock to use for locking this fence * @context: the execution context this fence is run on * @seqno: a linear increasing sequence number for this context * * Initializes an allocated fence, the caller doesn't have to keep its * refcount after committing with this fence, but it will need to hold a * refcount again if &dma_fence_ops.enable_signaling gets called. * * Context and seqno are used for easy comparison between fences, allowing * to check which fence is later by simply using dma_fence_later(). */ void dma_fence_init64(struct dma_fence *fence, const struct dma_fence_ops *ops, spinlock_t *lock, u64 context, u64 seqno) { __dma_fence_init(fence, ops, lock, context, seqno, BIT(DMA_FENCE_FLAG_SEQNO64_BIT)); } EXPORT_SYMBOL(dma_fence_init64); /** * dma_fence_driver_name - Access the driver name * @fence: the fence to query * * Returns a driver name backing the dma-fence implementation. * * IMPORTANT CONSIDERATION: * Dma-fence contract stipulates that access to driver provided data (data not * directly embedded into the object itself), such as the &dma_fence.lock and * memory potentially accessed by the &dma_fence.ops functions, is forbidden * after the fence has been signalled. Drivers are allowed to free that data, * and some do. * * To allow safe access drivers are mandated to guarantee a RCU grace period * between signalling the fence and freeing said data. * * As such access to the driver name is only valid inside a RCU locked section. * The pointer MUST be both queried and USED ONLY WITHIN a SINGLE block guarded * by the &rcu_read_lock and &rcu_read_unlock pair. */ const char __rcu *dma_fence_driver_name(struct dma_fence *fence) { RCU_LOCKDEP_WARN(!rcu_read_lock_held(), "RCU protection is required for safe access to returned string"); if (!dma_fence_test_signaled_flag(fence)) return fence->ops->get_driver_name(fence); else return "detached-driver"; } EXPORT_SYMBOL(dma_fence_driver_name); /** * dma_fence_timeline_name - Access the timeline name * @fence: the fence to query * * Returns a timeline name provided by the dma-fence implementation. * * IMPORTANT CONSIDERATION: * Dma-fence contract stipulates that access to driver provided data (data not * directly embedded into the object itself), such as the &dma_fence.lock and * memory potentially accessed by the &dma_fence.ops functions, is forbidden * after the fence has been signalled. Drivers are allowed to free that data, * and some do. * * To allow safe access drivers are mandated to guarantee a RCU grace period * between signalling the fence and freeing said data. * * As such access to the driver name is only valid inside a RCU locked section. * The pointer MUST be both queried and USED ONLY WITHIN a SINGLE block guarded * by the &rcu_read_lock and &rcu_read_unlock pair. */ const char __rcu *dma_fence_timeline_name(struct dma_fence *fence) { RCU_LOCKDEP_WARN(!rcu_read_lock_held(), "RCU protection is required for safe access to returned string"); if (!dma_fence_test_signaled_flag(fence)) return fence->ops->get_timeline_name(fence); else return "signaled-timeline"; } EXPORT_SYMBOL(dma_fence_timeline_name); |
| 12 12 12 10 10 1 2 18 18 48 47 44 48 4 44 3 3 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 | // SPDX-License-Identifier: GPL-2.0-or-later /* * OSS compatible sequencer driver * * registration of device and proc * * Copyright (C) 1998,99 Takashi Iwai <tiwai@suse.de> */ #include <linux/init.h> #include <linux/module.h> #include <linux/mutex.h> #include <linux/compat.h> #include <sound/core.h> #include <sound/minors.h> #include <sound/initval.h> #include "seq_oss_device.h" #include "seq_oss_synth.h" /* * module option */ MODULE_AUTHOR("Takashi Iwai <tiwai@suse.de>"); MODULE_DESCRIPTION("OSS-compatible sequencer module"); MODULE_LICENSE("GPL"); /* Takashi says this is really only for sound-service-0-, but this is OK. */ MODULE_ALIAS_SNDRV_MINOR(SNDRV_MINOR_OSS_SEQUENCER); MODULE_ALIAS_SNDRV_MINOR(SNDRV_MINOR_OSS_MUSIC); /* * prototypes */ static int register_device(void); static void unregister_device(void); #ifdef CONFIG_SND_PROC_FS static int register_proc(void); static void unregister_proc(void); #else static inline int register_proc(void) { return 0; } static inline void unregister_proc(void) {} #endif static int odev_open(struct inode *inode, struct file *file); static int odev_release(struct inode *inode, struct file *file); static ssize_t odev_read(struct file *file, char __user *buf, size_t count, loff_t *offset); static ssize_t odev_write(struct file *file, const char __user *buf, size_t count, loff_t *offset); static long odev_ioctl(struct file *file, unsigned int cmd, unsigned long arg); static __poll_t odev_poll(struct file *file, poll_table * wait); /* * module interface */ static struct snd_seq_driver seq_oss_synth_driver = { .probe = snd_seq_oss_synth_probe, .remove = snd_seq_oss_synth_remove, .driver = { .name = KBUILD_MODNAME, }, .id = SNDRV_SEQ_DEV_ID_OSS, .argsize = sizeof(struct snd_seq_oss_reg), }; static int __init alsa_seq_oss_init(void) { int rc; rc = register_device(); if (rc < 0) goto error; rc = register_proc(); if (rc < 0) { unregister_device(); goto error; } rc = snd_seq_oss_create_client(); if (rc < 0) { unregister_proc(); unregister_device(); goto error; } rc = snd_seq_driver_register(&seq_oss_synth_driver); if (rc < 0) { snd_seq_oss_delete_client(); unregister_proc(); unregister_device(); goto error; } /* success */ snd_seq_oss_synth_init(); error: return rc; } static void __exit alsa_seq_oss_exit(void) { snd_seq_driver_unregister(&seq_oss_synth_driver); snd_seq_oss_delete_client(); unregister_proc(); unregister_device(); } module_init(alsa_seq_oss_init) module_exit(alsa_seq_oss_exit) /* * ALSA minor device interface */ static DEFINE_MUTEX(register_mutex); static int odev_open(struct inode *inode, struct file *file) { int level; if (iminor(inode) == SNDRV_MINOR_OSS_MUSIC) level = SNDRV_SEQ_OSS_MODE_MUSIC; else level = SNDRV_SEQ_OSS_MODE_SYNTH; guard(mutex)(®ister_mutex); return snd_seq_oss_open(file, level); } static int odev_release(struct inode *inode, struct file *file) { struct seq_oss_devinfo *dp; dp = file->private_data; if (!dp) return 0; guard(mutex)(®ister_mutex); snd_seq_oss_release(dp); return 0; } static ssize_t odev_read(struct file *file, char __user *buf, size_t count, loff_t *offset) { struct seq_oss_devinfo *dp; dp = file->private_data; if (snd_BUG_ON(!dp)) return -ENXIO; return snd_seq_oss_read(dp, buf, count); } static ssize_t odev_write(struct file *file, const char __user *buf, size_t count, loff_t *offset) { struct seq_oss_devinfo *dp; dp = file->private_data; if (snd_BUG_ON(!dp)) return -ENXIO; return snd_seq_oss_write(dp, buf, count, file); } static long odev_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { struct seq_oss_devinfo *dp; long rc; dp = file->private_data; if (snd_BUG_ON(!dp)) return -ENXIO; if (cmd != SNDCTL_SEQ_SYNC && mutex_lock_interruptible(®ister_mutex)) return -ERESTARTSYS; rc = snd_seq_oss_ioctl(dp, cmd, arg); if (cmd != SNDCTL_SEQ_SYNC) mutex_unlock(®ister_mutex); return rc; } #ifdef CONFIG_COMPAT static long odev_ioctl_compat(struct file *file, unsigned int cmd, unsigned long arg) { return odev_ioctl(file, cmd, (unsigned long)compat_ptr(arg)); } #else #define odev_ioctl_compat NULL #endif static __poll_t odev_poll(struct file *file, poll_table * wait) { struct seq_oss_devinfo *dp; dp = file->private_data; if (snd_BUG_ON(!dp)) return EPOLLERR; return snd_seq_oss_poll(dp, file, wait); } /* * registration of sequencer minor device */ static const struct file_operations seq_oss_f_ops = { .owner = THIS_MODULE, .read = odev_read, .write = odev_write, .open = odev_open, .release = odev_release, .poll = odev_poll, .unlocked_ioctl = odev_ioctl, .compat_ioctl = odev_ioctl_compat, .llseek = noop_llseek, }; static int __init register_device(void) { int rc; guard(mutex)(®ister_mutex); rc = snd_register_oss_device(SNDRV_OSS_DEVICE_TYPE_SEQUENCER, NULL, 0, &seq_oss_f_ops, NULL); if (rc < 0) { pr_err("ALSA: seq_oss: can't register device seq\n"); return rc; } rc = snd_register_oss_device(SNDRV_OSS_DEVICE_TYPE_MUSIC, NULL, 0, &seq_oss_f_ops, NULL); if (rc < 0) { pr_err("ALSA: seq_oss: can't register device music\n"); snd_unregister_oss_device(SNDRV_OSS_DEVICE_TYPE_SEQUENCER, NULL, 0); return rc; } return 0; } static void unregister_device(void) { guard(mutex)(®ister_mutex); if (snd_unregister_oss_device(SNDRV_OSS_DEVICE_TYPE_MUSIC, NULL, 0) < 0) pr_err("ALSA: seq_oss: error unregister device music\n"); if (snd_unregister_oss_device(SNDRV_OSS_DEVICE_TYPE_SEQUENCER, NULL, 0) < 0) pr_err("ALSA: seq_oss: error unregister device seq\n"); } /* * /proc interface */ #ifdef CONFIG_SND_PROC_FS static struct snd_info_entry *info_entry; static void info_read(struct snd_info_entry *entry, struct snd_info_buffer *buf) { guard(mutex)(®ister_mutex); snd_iprintf(buf, "OSS sequencer emulation version %s\n", SNDRV_SEQ_OSS_VERSION_STR); snd_seq_oss_system_info_read(buf); snd_seq_oss_synth_info_read(buf); snd_seq_oss_midi_info_read(buf); } static int __init register_proc(void) { struct snd_info_entry *entry; entry = snd_info_create_module_entry(THIS_MODULE, SNDRV_SEQ_OSS_PROCNAME, snd_seq_root); if (entry == NULL) return -ENOMEM; entry->content = SNDRV_INFO_CONTENT_TEXT; entry->private_data = NULL; entry->c.text.read = info_read; if (snd_info_register(entry) < 0) { snd_info_free_entry(entry); return -ENOMEM; } info_entry = entry; return 0; } static void unregister_proc(void) { snd_info_free_entry(info_entry); info_entry = NULL; } #endif /* CONFIG_SND_PROC_FS */ |
| 5 3 8 8 8 7 5 8 8 5 6 8 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 | // SPDX-License-Identifier: GPL-2.0-only /* (C) 1999-2001 Paul `Rusty' Russell * (C) 2002-2004 Netfilter Core Team <coreteam@netfilter.org> */ #include <linux/module.h> #include <net/ipv6.h> #include <net/ip6_route.h> #include <net/ip6_fib.h> #include <net/ip6_checksum.h> #include <net/netfilter/ipv6/nf_reject.h> #include <linux/netfilter_ipv6.h> #include <linux/netfilter_bridge.h> static struct ipv6hdr * nf_reject_ip6hdr_put(struct sk_buff *nskb, const struct sk_buff *oldskb, __u8 protocol, int hoplimit); static void nf_reject_ip6_tcphdr_put(struct sk_buff *nskb, const struct sk_buff *oldskb, const struct tcphdr *oth, unsigned int otcplen); static const struct tcphdr * nf_reject_ip6_tcphdr_get(struct sk_buff *oldskb, struct tcphdr *otcph, unsigned int *otcplen, int hook); static bool nf_reject_v6_csum_ok(struct sk_buff *skb, int hook) { const struct ipv6hdr *ip6h = ipv6_hdr(skb); int thoff; __be16 fo; u8 proto = ip6h->nexthdr; if (skb_csum_unnecessary(skb)) return true; if (ip6h->payload_len && pskb_trim_rcsum(skb, ntohs(ip6h->payload_len) + sizeof(*ip6h))) return false; ip6h = ipv6_hdr(skb); thoff = ipv6_skip_exthdr(skb, ((u8*)(ip6h+1) - skb->data), &proto, &fo); if (thoff < 0 || thoff >= skb->len || (fo & htons(~0x7)) != 0) return false; if (!nf_reject_verify_csum(skb, thoff, proto)) return true; return nf_ip6_checksum(skb, hook, thoff, proto) == 0; } static int nf_reject_ip6hdr_validate(struct sk_buff *skb) { struct ipv6hdr *hdr; u32 pkt_len; if (!pskb_may_pull(skb, sizeof(struct ipv6hdr))) return 0; hdr = ipv6_hdr(skb); if (hdr->version != 6) return 0; pkt_len = ntohs(hdr->payload_len); if (pkt_len + sizeof(struct ipv6hdr) > skb->len) return 0; return 1; } struct sk_buff *nf_reject_skb_v6_tcp_reset(struct net *net, struct sk_buff *oldskb, const struct net_device *dev, int hook) { struct sk_buff *nskb; const struct tcphdr *oth; struct tcphdr _oth; unsigned int otcplen; struct ipv6hdr *nip6h; if (!nf_reject_ip6hdr_validate(oldskb)) return NULL; oth = nf_reject_ip6_tcphdr_get(oldskb, &_oth, &otcplen, hook); if (!oth) return NULL; nskb = alloc_skb(sizeof(struct ipv6hdr) + sizeof(struct tcphdr) + LL_MAX_HEADER, GFP_ATOMIC); if (!nskb) return NULL; nskb->dev = (struct net_device *)dev; skb_reserve(nskb, LL_MAX_HEADER); nip6h = nf_reject_ip6hdr_put(nskb, oldskb, IPPROTO_TCP, READ_ONCE(net->ipv6.devconf_all->hop_limit)); nf_reject_ip6_tcphdr_put(nskb, oldskb, oth, otcplen); nip6h->payload_len = htons(nskb->len - sizeof(struct ipv6hdr)); return nskb; } EXPORT_SYMBOL_GPL(nf_reject_skb_v6_tcp_reset); static bool nf_skb_is_icmp6_unreach(const struct sk_buff *skb) { const struct ipv6hdr *ip6h = ipv6_hdr(skb); u8 proto = ip6h->nexthdr; u8 _type, *tp; int thoff; __be16 fo; thoff = ipv6_skip_exthdr(skb, ((u8 *)(ip6h + 1) - skb->data), &proto, &fo); if (thoff < 0 || thoff >= skb->len || fo != 0) return false; if (proto != IPPROTO_ICMPV6) return false; tp = skb_header_pointer(skb, thoff + offsetof(struct icmp6hdr, icmp6_type), sizeof(_type), &_type); if (!tp) return false; return *tp == ICMPV6_DEST_UNREACH; } struct sk_buff *nf_reject_skb_v6_unreach(struct net *net, struct sk_buff *oldskb, const struct net_device *dev, int hook, u8 code) { struct sk_buff *nskb; struct ipv6hdr *nip6h; struct icmp6hdr *icmp6h; unsigned int len; if (!nf_reject_ip6hdr_validate(oldskb)) return NULL; /* Don't reply to ICMPV6_DEST_UNREACH with ICMPV6_DEST_UNREACH */ if (nf_skb_is_icmp6_unreach(oldskb)) return NULL; /* Include "As much of invoking packet as possible without the ICMPv6 * packet exceeding the minimum IPv6 MTU" in the ICMP payload. */ len = min_t(unsigned int, 1220, oldskb->len); if (!pskb_may_pull(oldskb, len)) return NULL; if (!nf_reject_v6_csum_ok(oldskb, hook)) return NULL; nskb = alloc_skb(sizeof(struct ipv6hdr) + sizeof(struct icmp6hdr) + LL_MAX_HEADER + len, GFP_ATOMIC); if (!nskb) return NULL; nskb->dev = (struct net_device *)dev; skb_reserve(nskb, LL_MAX_HEADER); nip6h = nf_reject_ip6hdr_put(nskb, oldskb, IPPROTO_ICMPV6, READ_ONCE(net->ipv6.devconf_all->hop_limit)); skb_reset_transport_header(nskb); icmp6h = skb_put_zero(nskb, sizeof(struct icmp6hdr)); icmp6h->icmp6_type = ICMPV6_DEST_UNREACH; icmp6h->icmp6_code = code; skb_put_data(nskb, skb_network_header(oldskb), len); nip6h->payload_len = htons(nskb->len - sizeof(struct ipv6hdr)); icmp6h->icmp6_cksum = csum_ipv6_magic(&nip6h->saddr, &nip6h->daddr, nskb->len - sizeof(struct ipv6hdr), IPPROTO_ICMPV6, csum_partial(icmp6h, nskb->len - sizeof(struct ipv6hdr), 0)); return nskb; } EXPORT_SYMBOL_GPL(nf_reject_skb_v6_unreach); static const struct tcphdr * nf_reject_ip6_tcphdr_get(struct sk_buff *oldskb, struct tcphdr *otcph, unsigned int *otcplen, int hook) { const struct ipv6hdr *oip6h = ipv6_hdr(oldskb); u8 proto; __be16 frag_off; int tcphoff; proto = oip6h->nexthdr; tcphoff = ipv6_skip_exthdr(oldskb, ((u8 *)(oip6h + 1) - oldskb->data), &proto, &frag_off); if ((tcphoff < 0) || (tcphoff > oldskb->len)) { pr_debug("Cannot get TCP header.\n"); return NULL; } *otcplen = oldskb->len - tcphoff; /* IP header checks: fragment, too short. */ if (proto != IPPROTO_TCP || *otcplen < sizeof(struct tcphdr)) { pr_debug("proto(%d) != IPPROTO_TCP or too short (len = %d)\n", proto, *otcplen); return NULL; } otcph = skb_header_pointer(oldskb, tcphoff, sizeof(struct tcphdr), otcph); if (otcph == NULL) return NULL; /* No RST for RST. */ if (otcph->rst) { pr_debug("RST is set\n"); return NULL; } /* Check checksum. */ if (nf_ip6_checksum(oldskb, hook, tcphoff, IPPROTO_TCP)) { pr_debug("TCP checksum is invalid\n"); return NULL; } return otcph; } static struct ipv6hdr * nf_reject_ip6hdr_put(struct sk_buff *nskb, const struct sk_buff *oldskb, __u8 protocol, int hoplimit) { struct ipv6hdr *ip6h; const struct ipv6hdr *oip6h = ipv6_hdr(oldskb); #define DEFAULT_TOS_VALUE 0x0U const __u8 tclass = DEFAULT_TOS_VALUE; skb_put(nskb, sizeof(struct ipv6hdr)); skb_reset_network_header(nskb); ip6h = ipv6_hdr(nskb); ip6_flow_hdr(ip6h, tclass, 0); ip6h->hop_limit = hoplimit; ip6h->nexthdr = protocol; ip6h->saddr = oip6h->daddr; ip6h->daddr = oip6h->saddr; nskb->protocol = htons(ETH_P_IPV6); return ip6h; } static void nf_reject_ip6_tcphdr_put(struct sk_buff *nskb, const struct sk_buff *oldskb, const struct tcphdr *oth, unsigned int otcplen) { struct tcphdr *tcph; skb_reset_transport_header(nskb); tcph = skb_put_zero(nskb, sizeof(struct tcphdr)); /* Truncate to length (no data) */ tcph->doff = sizeof(struct tcphdr)/4; tcph->source = oth->dest; tcph->dest = oth->source; if (oth->ack) { tcph->seq = oth->ack_seq; } else { tcph->ack_seq = htonl(ntohl(oth->seq) + oth->syn + oth->fin + otcplen - (oth->doff<<2)); tcph->ack = 1; } tcph->rst = 1; /* Adjust TCP checksum */ tcph->check = csum_ipv6_magic(&ipv6_hdr(nskb)->saddr, &ipv6_hdr(nskb)->daddr, sizeof(struct tcphdr), IPPROTO_TCP, csum_partial(tcph, sizeof(struct tcphdr), 0)); } static int nf_reject6_fill_skb_dst(struct sk_buff *skb_in) { struct dst_entry *dst = NULL; struct flowi fl; memset(&fl, 0, sizeof(struct flowi)); fl.u.ip6.daddr = ipv6_hdr(skb_in)->saddr; nf_ip6_route(dev_net(skb_in->dev), &dst, &fl, false); if (!dst) return -1; skb_dst_set(skb_in, dst); return 0; } void nf_send_reset6(struct net *net, struct sock *sk, struct sk_buff *oldskb, int hook) { const struct ipv6hdr *oip6h = ipv6_hdr(oldskb); struct dst_entry *dst = NULL; const struct tcphdr *otcph; struct sk_buff *nskb; struct tcphdr _otcph; unsigned int otcplen; struct flowi6 fl6; if ((!(ipv6_addr_type(&oip6h->saddr) & IPV6_ADDR_UNICAST)) || (!(ipv6_addr_type(&oip6h->daddr) & IPV6_ADDR_UNICAST))) { pr_debug("addr is not unicast.\n"); return; } otcph = nf_reject_ip6_tcphdr_get(oldskb, &_otcph, &otcplen, hook); if (!otcph) return; memset(&fl6, 0, sizeof(fl6)); fl6.flowi6_proto = IPPROTO_TCP; fl6.saddr = oip6h->daddr; fl6.daddr = oip6h->saddr; fl6.fl6_sport = otcph->dest; fl6.fl6_dport = otcph->source; if (!skb_dst(oldskb)) { nf_ip6_route(net, &dst, flowi6_to_flowi(&fl6), false); if (!dst) return; skb_dst_set(oldskb, dst); } fl6.flowi6_oif = l3mdev_master_ifindex(skb_dst_dev(oldskb)); fl6.flowi6_mark = IP6_REPLY_MARK(net, oldskb->mark); security_skb_classify_flow(oldskb, flowi6_to_flowi_common(&fl6)); dst = ip6_route_output(net, NULL, &fl6); if (dst->error) { dst_release(dst); return; } dst = xfrm_lookup(net, dst, flowi6_to_flowi(&fl6), NULL, 0); if (IS_ERR(dst)) return; nskb = alloc_skb(LL_MAX_HEADER + sizeof(struct ipv6hdr) + sizeof(struct tcphdr) + dst->trailer_len, GFP_ATOMIC); if (!nskb) { net_dbg_ratelimited("cannot alloc skb\n"); dst_release(dst); return; } skb_dst_set(nskb, dst); nskb->mark = fl6.flowi6_mark; skb_reserve(nskb, LL_MAX_HEADER); nf_reject_ip6hdr_put(nskb, oldskb, IPPROTO_TCP, ip6_dst_hoplimit(dst)); nf_reject_ip6_tcphdr_put(nskb, oldskb, otcph, otcplen); nf_ct_attach(nskb, oldskb); nf_ct_set_closing(skb_nfct(oldskb)); #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER) /* If we use ip6_local_out for bridged traffic, the MAC source on * the RST will be ours, instead of the destination's. This confuses * some routers/firewalls, and they drop the packet. So we need to * build the eth header using the original destination's MAC as the * source, and send the RST packet directly. */ if (nf_bridge_info_exists(oldskb)) { struct ethhdr *oeth = eth_hdr(oldskb); struct ipv6hdr *ip6h = ipv6_hdr(nskb); struct net_device *br_indev; br_indev = nf_bridge_get_physindev(oldskb, net); if (!br_indev) { kfree_skb(nskb); return; } nskb->dev = br_indev; nskb->protocol = htons(ETH_P_IPV6); ip6h->payload_len = htons(sizeof(struct tcphdr)); if (dev_hard_header(nskb, nskb->dev, ntohs(nskb->protocol), oeth->h_source, oeth->h_dest, nskb->len) < 0) { kfree_skb(nskb); return; } dev_queue_xmit(nskb); } else #endif ip6_local_out(net, sk, nskb); } EXPORT_SYMBOL_GPL(nf_send_reset6); static bool reject6_csum_ok(struct sk_buff *skb, int hook) { const struct ipv6hdr *ip6h = ipv6_hdr(skb); int thoff; __be16 fo; u8 proto; if (skb_csum_unnecessary(skb)) return true; proto = ip6h->nexthdr; thoff = ipv6_skip_exthdr(skb, ((u8 *)(ip6h + 1) - skb->data), &proto, &fo); if (thoff < 0 || thoff >= skb->len || (fo & htons(~0x7)) != 0) return false; if (!nf_reject_verify_csum(skb, thoff, proto)) return true; return nf_ip6_checksum(skb, hook, thoff, proto) == 0; } void nf_send_unreach6(struct net *net, struct sk_buff *skb_in, unsigned char code, unsigned int hooknum) { if (!reject6_csum_ok(skb_in, hooknum)) return; if (hooknum == NF_INET_LOCAL_OUT && skb_in->dev == NULL) skb_in->dev = net->loopback_dev; if (!skb_dst(skb_in) && nf_reject6_fill_skb_dst(skb_in) < 0) return; icmpv6_send(skb_in, ICMPV6_DEST_UNREACH, code, 0); } EXPORT_SYMBOL_GPL(nf_send_unreach6); MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("IPv6 packet rejection core"); |
| 14 14 14 86 6 87 87 87 87 86 87 85 85 85 85 6 6 6 310 89 89 87 88 88 89 68 89 89 1 88 87 84 86 85 71 17 84 79 7 7 87 84 85 86 86 85 82 82 86 14 85 86 38 61 38 87 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 | // SPDX-License-Identifier: GPL-2.0-only /* * Copyright (C) 2008 IBM Corporation * * Author: Mimi Zohar <zohar@us.ibm.com> * * File: ima_api.c * Implements must_appraise_or_measure, collect_measurement, * appraise_measurement, store_measurement and store_template. */ #include <linux/slab.h> #include <linux/file.h> #include <linux/fs.h> #include <linux/hex.h> #include <linux/xattr.h> #include <linux/evm.h> #include <linux/fsverity.h> #include "ima.h" /* * ima_free_template_entry - free an existing template entry */ void ima_free_template_entry(struct ima_template_entry *entry) { int i; for (i = 0; i < entry->template_desc->num_fields; i++) kfree(entry->template_data[i].data); kfree(entry->digests); kfree(entry); } /* * ima_alloc_init_template - create and initialize a new template entry */ int ima_alloc_init_template(struct ima_event_data *event_data, struct ima_template_entry **entry, struct ima_template_desc *desc) { struct ima_template_desc *template_desc; struct tpm_digest *digests; int i, result = 0; if (desc) template_desc = desc; else template_desc = ima_template_desc_current(); *entry = kzalloc_flex(**entry, template_data, template_desc->num_fields, GFP_NOFS); if (!*entry) return -ENOMEM; digests = kzalloc_objs(*digests, NR_BANKS(ima_tpm_chip) + ima_extra_slots, GFP_NOFS); if (!digests) { kfree(*entry); *entry = NULL; return -ENOMEM; } (*entry)->digests = digests; (*entry)->template_desc = template_desc; for (i = 0; i < template_desc->num_fields; i++) { const struct ima_template_field *field = template_desc->fields[i]; u32 len; result = field->field_init(event_data, &((*entry)->template_data[i])); if (result != 0) goto out; len = (*entry)->template_data[i].len; (*entry)->template_data_len += sizeof(len); (*entry)->template_data_len += len; } return 0; out: ima_free_template_entry(*entry); *entry = NULL; return result; } /* * ima_store_template - store ima template measurements * * Calculate the hash of a template entry, add the template entry * to an ordered list of measurement entries maintained inside the kernel, * and also update the aggregate integrity value (maintained inside the * configured TPM PCR) over the hashes of the current list of measurement * entries. * * Applications retrieve the current kernel-held measurement list through * the securityfs entries in /sys/kernel/security/ima. The signed aggregate * TPM PCR (called quote) can be retrieved using a TPM user space library * and is used to validate the measurement list. * * Returns 0 on success, error code otherwise */ int ima_store_template(struct ima_template_entry *entry, int violation, struct inode *inode, const unsigned char *filename, int pcr) { static const char op[] = "add_template_measure"; static const char audit_cause[] = "hashing_error"; char *template_name = entry->template_desc->name; int result; if (!violation) { result = ima_calc_field_array_hash(&entry->template_data[0], entry); if (result < 0) { integrity_audit_msg(AUDIT_INTEGRITY_PCR, inode, template_name, op, audit_cause, result, 0); return result; } } entry->pcr = pcr; result = ima_add_template_entry(entry, violation, op, inode, filename); return result; } /* * ima_add_violation - add violation to measurement list. * * Violations are flagged in the measurement list with zero hash values. * By extending the PCR with 0xFF's instead of with zeroes, the PCR * value is invalidated. */ void ima_add_violation(struct file *file, const unsigned char *filename, struct ima_iint_cache *iint, const char *op, const char *cause) { struct ima_template_entry *entry; struct inode *inode = file_inode(file); struct ima_event_data event_data = { .iint = iint, .file = file, .filename = filename, .violation = cause }; int violation = 1; int result; /* can overflow, only indicator */ atomic_long_inc(&ima_htable.violations); result = ima_alloc_init_template(&event_data, &entry, NULL); if (result < 0) { result = -ENOMEM; goto err_out; } result = ima_store_template(entry, violation, inode, filename, CONFIG_IMA_MEASURE_PCR_IDX); if (result < 0) ima_free_template_entry(entry); err_out: integrity_audit_msg(AUDIT_INTEGRITY_PCR, inode, filename, op, cause, result, 0); } /** * ima_get_action - appraise & measure decision based on policy. * @idmap: idmap of the mount the inode was found from * @inode: pointer to the inode associated with the object being validated * @cred: pointer to credentials structure to validate * @prop: properties of the task being validated * @mask: contains the permission mask (MAY_READ, MAY_WRITE, MAY_EXEC, * MAY_APPEND) * @func: caller identifier * @pcr: pointer filled in if matched measure policy sets pcr= * @template_desc: pointer filled in if matched measure policy sets template= * @func_data: func specific data, may be NULL * @allowed_algos: allowlist of hash algorithms for the IMA xattr * * The policy is defined in terms of keypairs: * subj=, obj=, type=, func=, mask=, fsmagic= * subj,obj, and type: are LSM specific. * func: FILE_CHECK | BPRM_CHECK | CREDS_CHECK | MMAP_CHECK | MODULE_CHECK * | KEXEC_CMDLINE | KEY_CHECK | CRITICAL_DATA | SETXATTR_CHECK * | MMAP_CHECK_REQPROT * mask: contains the permission mask * fsmagic: hex value * * Returns IMA_MEASURE, IMA_APPRAISE mask. * */ int ima_get_action(struct mnt_idmap *idmap, struct inode *inode, const struct cred *cred, struct lsm_prop *prop, int mask, enum ima_hooks func, int *pcr, struct ima_template_desc **template_desc, const char *func_data, unsigned int *allowed_algos) { int flags = IMA_MEASURE | IMA_AUDIT | IMA_APPRAISE | IMA_HASH; flags &= ima_policy_flag; return ima_match_policy(idmap, inode, cred, prop, func, mask, flags, pcr, template_desc, func_data, allowed_algos); } static bool ima_get_verity_digest(struct ima_iint_cache *iint, struct inode *inode, struct ima_max_digest_data *hash) { enum hash_algo alg; int digest_len; /* * On failure, 'measure' policy rules will result in a file data * hash containing 0's. */ digest_len = fsverity_get_digest(inode, hash->digest, NULL, &alg); if (digest_len == 0) return false; /* * Unlike in the case of actually calculating the file hash, in * the fsverity case regardless of the hash algorithm, return * the verity digest to be included in the measurement list. A * mismatch between the verity algorithm and the xattr signature * algorithm, if one exists, will be detected later. */ hash->hdr.algo = alg; hash->hdr.length = digest_len; return true; } /* * ima_collect_measurement - collect file measurement * * Calculate the file hash, if it doesn't already exist, * storing the measurement and i_version in the iint. * * Must be called with iint->mutex held. * * Return 0 on success, error code otherwise */ int ima_collect_measurement(struct ima_iint_cache *iint, struct file *file, void *buf, loff_t size, enum hash_algo algo, struct modsig *modsig) { const char *audit_cause = "failed"; struct inode *inode = file_inode(file); struct inode *real_inode = d_real_inode(file_dentry(file)); struct ima_max_digest_data hash; struct ima_digest_data *hash_hdr = container_of(&hash.hdr, struct ima_digest_data, hdr); struct name_snapshot filename; struct kstat stat; int result = 0; int length; void *tmpbuf; u64 i_version = 0; /* * Always collect the modsig, because IMA might have already collected * the file digest without collecting the modsig in a previous * measurement rule. */ if (modsig) ima_collect_modsig(modsig, buf, size); if (iint->flags & IMA_COLLECTED) goto out; /* * Detecting file change is based on i_version. On filesystems * which do not support i_version, support was originally limited * to an initial measurement/appraisal/audit, but was modified to * assume the file changed. */ result = vfs_getattr_nosec(&file->f_path, &stat, STATX_CHANGE_COOKIE, AT_STATX_SYNC_AS_STAT); if (!result && (stat.result_mask & STATX_CHANGE_COOKIE)) i_version = stat.change_cookie; hash.hdr.algo = algo; hash.hdr.length = hash_digest_size[algo]; /* Initialize hash digest to 0's in case of failure */ memset(&hash.digest, 0, sizeof(hash.digest)); if (iint->flags & IMA_VERITY_REQUIRED) { if (!ima_get_verity_digest(iint, inode, &hash)) { audit_cause = "no-verity-digest"; result = -ENODATA; } } else if (buf) { result = ima_calc_buffer_hash(buf, size, hash_hdr); } else { result = ima_calc_file_hash(file, hash_hdr); } if (result && result != -EBADF && result != -EINVAL) goto out; length = sizeof(hash.hdr) + hash.hdr.length; tmpbuf = krealloc(iint->ima_hash, length, GFP_NOFS); if (!tmpbuf) { result = -ENOMEM; goto out; } iint->ima_hash = tmpbuf; memcpy(iint->ima_hash, &hash, length); if (real_inode == inode) iint->real_inode.version = i_version; else integrity_inode_attrs_store(&iint->real_inode, i_version, real_inode); /* Possibly temporary failure due to type of read (eg. O_DIRECT) */ if (!result) iint->flags |= IMA_COLLECTED; out: if (result) { if (file->f_flags & O_DIRECT) audit_cause = "failed(directio)"; take_dentry_name_snapshot(&filename, file->f_path.dentry); integrity_audit_msg(AUDIT_INTEGRITY_DATA, inode, filename.name.name, "collect_data", audit_cause, result, 0); release_dentry_name_snapshot(&filename); } return result; } /* * ima_store_measurement - store file measurement * * Create an "ima" template and then store the template by calling * ima_store_template. * * We only get here if the inode has not already been measured, * but the measurement could already exist: * - multiple copies of the same file on either the same or * different filesystems. * - the inode was previously flushed as well as the iint info, * containing the hashing info. * * Must be called with iint->mutex held. */ void ima_store_measurement(struct ima_iint_cache *iint, struct file *file, const unsigned char *filename, struct evm_ima_xattr_data *xattr_value, int xattr_len, const struct modsig *modsig, int pcr, struct ima_template_desc *template_desc) { static const char op[] = "add_template_measure"; static const char audit_cause[] = "ENOMEM"; int result = -ENOMEM; struct inode *inode = file_inode(file); struct ima_template_entry *entry; struct ima_event_data event_data = { .iint = iint, .file = file, .filename = filename, .xattr_value = xattr_value, .xattr_len = xattr_len, .modsig = modsig }; int violation = 0; /* * We still need to store the measurement in the case of MODSIG because * we only have its contents to put in the list at the time of * appraisal, but a file measurement from earlier might already exist in * the measurement list. */ if (iint->measured_pcrs & (0x1 << pcr) && !modsig) return; result = ima_alloc_init_template(&event_data, &entry, template_desc); if (result < 0) { integrity_audit_msg(AUDIT_INTEGRITY_PCR, inode, filename, op, audit_cause, result, 0); return; } result = ima_store_template(entry, violation, inode, filename, pcr); if ((!result || result == -EEXIST) && !(file->f_flags & O_DIRECT)) { iint->flags |= IMA_MEASURED; iint->measured_pcrs |= (0x1 << pcr); } if (result < 0) ima_free_template_entry(entry); } void ima_audit_measurement(struct ima_iint_cache *iint, const unsigned char *filename) { struct audit_buffer *ab; char *hash; const char *algo_name = hash_algo_name[iint->ima_hash->algo]; int i; if (iint->flags & IMA_AUDITED) return; hash = kzalloc((iint->ima_hash->length * 2) + 1, GFP_KERNEL); if (!hash) return; for (i = 0; i < iint->ima_hash->length; i++) hex_byte_pack(hash + (i * 2), iint->ima_hash->digest[i]); hash[i * 2] = '\0'; ab = audit_log_start(audit_context(), GFP_KERNEL, AUDIT_INTEGRITY_RULE); if (!ab) goto out; audit_log_format(ab, "file="); audit_log_untrustedstring(ab, filename); audit_log_format(ab, " hash=\"%s:%s\"", algo_name, hash); audit_log_task_info(ab); audit_log_end(ab); iint->flags |= IMA_AUDITED; out: kfree(hash); return; } /* * ima_d_path - return a pointer to the full pathname * * Attempt to return a pointer to the full pathname for use in the * IMA measurement list, IMA audit records, and auditing logs. * * On failure, return a pointer to a copy of the filename, not dname. * Returning a pointer to dname, could result in using the pointer * after the memory has been freed. */ const char *ima_d_path(const struct path *path, char **pathbuf, char *namebuf) { struct name_snapshot filename; char *pathname = NULL; *pathbuf = __getname(); if (*pathbuf) { pathname = d_absolute_path(path, *pathbuf, PATH_MAX); if (IS_ERR(pathname)) { __putname(*pathbuf); *pathbuf = NULL; pathname = NULL; } } if (!pathname) { take_dentry_name_snapshot(&filename, path->dentry); strscpy(namebuf, filename.name.name, NAME_MAX); release_dentry_name_snapshot(&filename); pathname = namebuf; } return pathname; } |
| 29 29 28 27 26 29 15 2 15 15 15 8 8 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 | // SPDX-License-Identifier: GPL-2.0 #include <linux/kernel.h> #include <linux/ip.h> #include <linux/sctp.h> #include <net/ip.h> #include <net/ip6_checksum.h> #include <linux/netfilter.h> #include <linux/netfilter_ipv4.h> #include <net/sctp/checksum.h> #include <net/ip_vs.h> static int sctp_csum_check(int af, struct sk_buff *skb, struct ip_vs_protocol *pp, unsigned int sctphoff); static int sctp_conn_schedule(struct netns_ipvs *ipvs, int af, struct sk_buff *skb, struct ip_vs_proto_data *pd, int *verdict, struct ip_vs_conn **cpp, struct ip_vs_iphdr *iph) { struct ip_vs_service *svc; struct sctp_chunkhdr _schunkh, *sch; struct sctphdr *sh, _sctph; __be16 _ports[2], *ports = NULL; if (likely(!ip_vs_iph_icmp(iph))) { sh = skb_header_pointer(skb, iph->len, sizeof(_sctph), &_sctph); if (sh) { sch = skb_header_pointer(skb, iph->len + sizeof(_sctph), sizeof(_schunkh), &_schunkh); if (sch) { if (sch->type == SCTP_CID_ABORT || !(sysctl_sloppy_sctp(ipvs) || sch->type == SCTP_CID_INIT)) return 1; ports = &sh->source; } } } else { ports = skb_header_pointer( skb, iph->len, sizeof(_ports), &_ports); } if (!ports) { *verdict = NF_DROP; return 0; } if (likely(!ip_vs_iph_inverse(iph))) svc = ip_vs_service_find(ipvs, af, skb->mark, iph->protocol, &iph->daddr, ports[1]); else svc = ip_vs_service_find(ipvs, af, skb->mark, iph->protocol, &iph->saddr, ports[0]); if (svc) { int ignored; if (ip_vs_todrop(ipvs)) { /* * It seems that we are very loaded. * We have to drop this packet :( */ *verdict = NF_DROP; return 0; } /* * Let the virtual server select a real server for the * incoming connection, and create a connection entry. */ *cpp = ip_vs_schedule(svc, skb, pd, &ignored, iph); if (!*cpp && ignored <= 0) { if (!ignored) *verdict = ip_vs_leave(svc, skb, pd, iph); else *verdict = NF_DROP; return 0; } } /* NF_ACCEPT */ return 1; } static void sctp_nat_csum(struct sk_buff *skb, struct sctphdr *sctph, unsigned int sctphoff) { sctph->checksum = sctp_compute_cksum(skb, sctphoff); skb->ip_summed = CHECKSUM_UNNECESSARY; } static int sctp_snat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp, struct ip_vs_conn *cp, struct ip_vs_iphdr *iph) { struct sctphdr *sctph; unsigned int sctphoff = iph->len; bool payload_csum = false; #ifdef CONFIG_IP_VS_IPV6 if (cp->af == AF_INET6 && iph->fragoffs) return 1; #endif /* csum_check requires unshared skb */ if (skb_ensure_writable(skb, sctphoff + sizeof(*sctph))) return 0; if (unlikely(cp->app != NULL)) { int ret; /* Some checks before mangling */ if (!sctp_csum_check(cp->af, skb, pp, sctphoff)) return 0; /* Call application helper if needed */ ret = ip_vs_app_pkt_out(cp, skb, iph); if (ret == 0) return 0; /* ret=2: csum update is needed after payload mangling */ if (ret == 2) payload_csum = true; } sctph = (void *) skb_network_header(skb) + sctphoff; /* Only update csum if we really have to */ if (sctph->source != cp->vport || payload_csum || skb->ip_summed == CHECKSUM_PARTIAL) { sctph->source = cp->vport; if (!skb_is_gso(skb)) sctp_nat_csum(skb, sctph, sctphoff); } else { skb->ip_summed = CHECKSUM_UNNECESSARY; } return 1; } static int sctp_dnat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp, struct ip_vs_conn *cp, struct ip_vs_iphdr *iph) { struct sctphdr *sctph; unsigned int sctphoff = iph->len; bool payload_csum = false; #ifdef CONFIG_IP_VS_IPV6 if (cp->af == AF_INET6 && iph->fragoffs) return 1; #endif /* csum_check requires unshared skb */ if (skb_ensure_writable(skb, sctphoff + sizeof(*sctph))) return 0; if (unlikely(cp->app != NULL)) { int ret; /* Some checks before mangling */ if (!sctp_csum_check(cp->af, skb, pp, sctphoff)) return 0; /* Call application helper if needed */ ret = ip_vs_app_pkt_in(cp, skb, iph); if (ret == 0) return 0; /* ret=2: csum update is needed after payload mangling */ if (ret == 2) payload_csum = true; } sctph = (void *) skb_network_header(skb) + sctphoff; /* Only update csum if we really have to */ if (sctph->dest != cp->dport || payload_csum || (skb->ip_summed == CHECKSUM_PARTIAL && !(skb_dst(skb)->dev->features & NETIF_F_SCTP_CRC))) { sctph->dest = cp->dport; if (!skb_is_gso(skb)) sctp_nat_csum(skb, sctph, sctphoff); } else if (skb->ip_summed != CHECKSUM_PARTIAL) { skb->ip_summed = CHECKSUM_UNNECESSARY; } return 1; } static int sctp_csum_check(int af, struct sk_buff *skb, struct ip_vs_protocol *pp, unsigned int sctphoff) { struct sctphdr *sh; __le32 cmp, val; sh = (struct sctphdr *)(skb->data + sctphoff); cmp = sh->checksum; val = sctp_compute_cksum(skb, sctphoff); if (val != cmp) { /* CRC failure, dump it. */ IP_VS_DBG_RL_PKT(0, af, pp, skb, 0, "Failed checksum for"); return 0; } return 1; } enum ipvs_sctp_event_t { IP_VS_SCTP_DATA = 0, /* DATA, SACK, HEARTBEATs */ IP_VS_SCTP_INIT, IP_VS_SCTP_INIT_ACK, IP_VS_SCTP_COOKIE_ECHO, IP_VS_SCTP_COOKIE_ACK, IP_VS_SCTP_SHUTDOWN, IP_VS_SCTP_SHUTDOWN_ACK, IP_VS_SCTP_SHUTDOWN_COMPLETE, IP_VS_SCTP_ERROR, IP_VS_SCTP_ABORT, IP_VS_SCTP_EVENT_LAST }; /* RFC 2960, 3.2 Chunk Field Descriptions */ static __u8 sctp_events[] = { [SCTP_CID_DATA] = IP_VS_SCTP_DATA, [SCTP_CID_INIT] = IP_VS_SCTP_INIT, [SCTP_CID_INIT_ACK] = IP_VS_SCTP_INIT_ACK, [SCTP_CID_SACK] = IP_VS_SCTP_DATA, [SCTP_CID_HEARTBEAT] = IP_VS_SCTP_DATA, [SCTP_CID_HEARTBEAT_ACK] = IP_VS_SCTP_DATA, [SCTP_CID_ABORT] = IP_VS_SCTP_ABORT, [SCTP_CID_SHUTDOWN] = IP_VS_SCTP_SHUTDOWN, [SCTP_CID_SHUTDOWN_ACK] = IP_VS_SCTP_SHUTDOWN_ACK, [SCTP_CID_ERROR] = IP_VS_SCTP_ERROR, [SCTP_CID_COOKIE_ECHO] = IP_VS_SCTP_COOKIE_ECHO, [SCTP_CID_COOKIE_ACK] = IP_VS_SCTP_COOKIE_ACK, [SCTP_CID_ECN_ECNE] = IP_VS_SCTP_DATA, [SCTP_CID_ECN_CWR] = IP_VS_SCTP_DATA, [SCTP_CID_SHUTDOWN_COMPLETE] = IP_VS_SCTP_SHUTDOWN_COMPLETE, }; /* SCTP States: * See RFC 2960, 4. SCTP Association State Diagram * * New states (not in diagram): * - INIT1 state: use shorter timeout for dropped INIT packets * - REJECTED state: use shorter timeout if INIT is rejected with ABORT * - INIT, COOKIE_SENT, COOKIE_REPLIED, COOKIE states: for better debugging * * The states are as seen in real server. In the diagram, INIT1, INIT, * COOKIE_SENT and COOKIE_REPLIED processing happens in CLOSED state. * * States as per packets from client (C) and server (S): * * Setup of client connection: * IP_VS_SCTP_S_INIT1: First C:INIT sent, wait for S:INIT-ACK * IP_VS_SCTP_S_INIT: Next C:INIT sent, wait for S:INIT-ACK * IP_VS_SCTP_S_COOKIE_SENT: S:INIT-ACK sent, wait for C:COOKIE-ECHO * IP_VS_SCTP_S_COOKIE_REPLIED: C:COOKIE-ECHO sent, wait for S:COOKIE-ACK * * Setup of server connection: * IP_VS_SCTP_S_COOKIE_WAIT: S:INIT sent, wait for C:INIT-ACK * IP_VS_SCTP_S_COOKIE: C:INIT-ACK sent, wait for S:COOKIE-ECHO * IP_VS_SCTP_S_COOKIE_ECHOED: S:COOKIE-ECHO sent, wait for C:COOKIE-ACK */ #define sNO IP_VS_SCTP_S_NONE #define sI1 IP_VS_SCTP_S_INIT1 #define sIN IP_VS_SCTP_S_INIT #define sCS IP_VS_SCTP_S_COOKIE_SENT #define sCR IP_VS_SCTP_S_COOKIE_REPLIED #define sCW IP_VS_SCTP_S_COOKIE_WAIT #define sCO IP_VS_SCTP_S_COOKIE #define sCE IP_VS_SCTP_S_COOKIE_ECHOED #define sES IP_VS_SCTP_S_ESTABLISHED #define sSS IP_VS_SCTP_S_SHUTDOWN_SENT #define sSR IP_VS_SCTP_S_SHUTDOWN_RECEIVED #define sSA IP_VS_SCTP_S_SHUTDOWN_ACK_SENT #define sRJ IP_VS_SCTP_S_REJECTED #define sCL IP_VS_SCTP_S_CLOSED static const __u8 sctp_states [IP_VS_DIR_LAST][IP_VS_SCTP_EVENT_LAST][IP_VS_SCTP_S_LAST] = { { /* INPUT */ /* sNO, sI1, sIN, sCS, sCR, sCW, sCO, sCE, sES, sSS, sSR, sSA, sRJ, sCL*/ /* d */{sES, sI1, sIN, sCS, sCR, sCW, sCO, sCE, sES, sSS, sSR, sSA, sRJ, sCL}, /* i */{sI1, sIN, sIN, sCS, sCR, sCW, sCO, sCE, sES, sSS, sSR, sSA, sIN, sIN}, /* i_a */{sCW, sCW, sCW, sCS, sCR, sCO, sCO, sCE, sES, sSS, sSR, sSA, sRJ, sCL}, /* c_e */{sCR, sIN, sIN, sCR, sCR, sCW, sCO, sCE, sES, sSS, sSR, sSA, sRJ, sCL}, /* c_a */{sES, sI1, sIN, sCS, sCR, sCW, sCO, sES, sES, sSS, sSR, sSA, sRJ, sCL}, /* s */{sSR, sI1, sIN, sCS, sCR, sCW, sCO, sCE, sSR, sSS, sSR, sSA, sRJ, sCL}, /* s_a */{sCL, sIN, sIN, sCS, sCR, sCW, sCO, sCE, sES, sCL, sSR, sCL, sRJ, sCL}, /* s_c */{sCL, sCL, sCL, sCS, sCR, sCW, sCO, sCE, sES, sSS, sSR, sCL, sRJ, sCL}, /* err */{sCL, sI1, sIN, sCS, sCR, sCW, sCO, sCL, sES, sSS, sSR, sSA, sRJ, sCL}, /* ab */{sCL, sCL, sCL, sCL, sCL, sRJ, sCL, sCL, sCL, sCL, sCL, sCL, sCL, sCL}, }, { /* OUTPUT */ /* sNO, sI1, sIN, sCS, sCR, sCW, sCO, sCE, sES, sSS, sSR, sSA, sRJ, sCL*/ /* d */{sES, sI1, sIN, sCS, sCR, sCW, sCO, sCE, sES, sSS, sSR, sSA, sRJ, sCL}, /* i */{sCW, sCW, sCW, sCW, sCW, sCW, sCW, sCW, sES, sCW, sCW, sCW, sCW, sCW}, /* i_a */{sCS, sCS, sCS, sCS, sCR, sCW, sCO, sCE, sES, sSS, sSR, sSA, sRJ, sCL}, /* c_e */{sCE, sCE, sCE, sCE, sCE, sCE, sCE, sCE, sES, sSS, sSR, sSA, sRJ, sCL}, /* c_a */{sES, sES, sES, sES, sES, sES, sES, sES, sES, sSS, sSR, sSA, sRJ, sCL}, /* s */{sSS, sSS, sSS, sSS, sSS, sSS, sSS, sSS, sSS, sSS, sSR, sSA, sRJ, sCL}, /* s_a */{sSA, sSA, sSA, sSA, sSA, sCW, sCO, sCE, sES, sSA, sSA, sSA, sRJ, sCL}, /* s_c */{sCL, sI1, sIN, sCS, sCR, sCW, sCO, sCE, sES, sSS, sSR, sSA, sRJ, sCL}, /* err */{sCL, sCL, sCL, sCL, sCL, sCW, sCO, sCE, sES, sSS, sSR, sSA, sRJ, sCL}, /* ab */{sCL, sRJ, sCL, sCL, sCL, sCL, sCL, sCL, sCL, sCL, sCL, sCL, sCL, sCL}, }, { /* INPUT-ONLY */ /* sNO, sI1, sIN, sCS, sCR, sCW, sCO, sCE, sES, sSS, sSR, sSA, sRJ, sCL*/ /* d */{sES, sI1, sIN, sCS, sCR, sES, sCO, sCE, sES, sSS, sSR, sSA, sRJ, sCL}, /* i */{sI1, sIN, sIN, sIN, sIN, sIN, sCO, sCE, sES, sSS, sSR, sSA, sIN, sIN}, /* i_a */{sCE, sCE, sCE, sCE, sCE, sCE, sCO, sCE, sES, sSS, sSR, sSA, sRJ, sCL}, /* c_e */{sES, sES, sES, sES, sES, sES, sCO, sCE, sES, sSS, sSR, sSA, sRJ, sCL}, /* c_a */{sES, sI1, sIN, sES, sES, sCW, sES, sES, sES, sSS, sSR, sSA, sRJ, sCL}, /* s */{sSR, sI1, sIN, sCS, sCR, sCW, sCO, sCE, sSR, sSS, sSR, sSA, sRJ, sCL}, /* s_a */{sCL, sIN, sIN, sCS, sCR, sCW, sCO, sCE, sCL, sCL, sSR, sCL, sRJ, sCL}, /* s_c */{sCL, sCL, sCL, sCL, sCL, sCW, sCO, sCE, sES, sSS, sCL, sCL, sRJ, sCL}, /* err */{sCL, sI1, sIN, sCS, sCR, sCW, sCO, sCE, sES, sSS, sSR, sSA, sRJ, sCL}, /* ab */{sCL, sCL, sCL, sCL, sCL, sRJ, sCL, sCL, sCL, sCL, sCL, sCL, sCL, sCL}, }, }; #define IP_VS_SCTP_MAX_RTO ((60 + 1) * HZ) /* Timeout table[state] */ static const int sctp_timeouts[IP_VS_SCTP_S_LAST + 1] = { [IP_VS_SCTP_S_NONE] = 2 * HZ, [IP_VS_SCTP_S_INIT1] = (0 + 3 + 1) * HZ, [IP_VS_SCTP_S_INIT] = IP_VS_SCTP_MAX_RTO, [IP_VS_SCTP_S_COOKIE_SENT] = IP_VS_SCTP_MAX_RTO, [IP_VS_SCTP_S_COOKIE_REPLIED] = IP_VS_SCTP_MAX_RTO, [IP_VS_SCTP_S_COOKIE_WAIT] = IP_VS_SCTP_MAX_RTO, [IP_VS_SCTP_S_COOKIE] = IP_VS_SCTP_MAX_RTO, [IP_VS_SCTP_S_COOKIE_ECHOED] = IP_VS_SCTP_MAX_RTO, [IP_VS_SCTP_S_ESTABLISHED] = 15 * 60 * HZ, [IP_VS_SCTP_S_SHUTDOWN_SENT] = IP_VS_SCTP_MAX_RTO, [IP_VS_SCTP_S_SHUTDOWN_RECEIVED] = IP_VS_SCTP_MAX_RTO, [IP_VS_SCTP_S_SHUTDOWN_ACK_SENT] = IP_VS_SCTP_MAX_RTO, [IP_VS_SCTP_S_REJECTED] = (0 + 3 + 1) * HZ, [IP_VS_SCTP_S_CLOSED] = IP_VS_SCTP_MAX_RTO, [IP_VS_SCTP_S_LAST] = 2 * HZ, }; static const char *sctp_state_name_table[IP_VS_SCTP_S_LAST + 1] = { [IP_VS_SCTP_S_NONE] = "NONE", [IP_VS_SCTP_S_INIT1] = "INIT1", [IP_VS_SCTP_S_INIT] = "INIT", [IP_VS_SCTP_S_COOKIE_SENT] = "C-SENT", [IP_VS_SCTP_S_COOKIE_REPLIED] = "C-REPLIED", [IP_VS_SCTP_S_COOKIE_WAIT] = "C-WAIT", [IP_VS_SCTP_S_COOKIE] = "COOKIE", [IP_VS_SCTP_S_COOKIE_ECHOED] = "C-ECHOED", [IP_VS_SCTP_S_ESTABLISHED] = "ESTABLISHED", [IP_VS_SCTP_S_SHUTDOWN_SENT] = "S-SENT", [IP_VS_SCTP_S_SHUTDOWN_RECEIVED] = "S-RECEIVED", [IP_VS_SCTP_S_SHUTDOWN_ACK_SENT] = "S-ACK-SENT", [IP_VS_SCTP_S_REJECTED] = "REJECTED", [IP_VS_SCTP_S_CLOSED] = "CLOSED", [IP_VS_SCTP_S_LAST] = "BUG!", }; static const char *sctp_state_name(int state) { if (state >= IP_VS_SCTP_S_LAST) return "ERR!"; if (sctp_state_name_table[state]) return sctp_state_name_table[state]; return "?"; } static inline void set_sctp_state(struct ip_vs_proto_data *pd, struct ip_vs_conn *cp, int direction, const struct sk_buff *skb) { struct sctp_chunkhdr _sctpch, *sch; unsigned char chunk_type; int event, next_state; int ihl, cofs; #ifdef CONFIG_IP_VS_IPV6 ihl = cp->af == AF_INET ? ip_hdrlen(skb) : sizeof(struct ipv6hdr); #else ihl = ip_hdrlen(skb); #endif cofs = ihl + sizeof(struct sctphdr); sch = skb_header_pointer(skb, cofs, sizeof(_sctpch), &_sctpch); if (sch == NULL) return; chunk_type = sch->type; /* * Section 3: Multiple chunks can be bundled into one SCTP packet * up to the MTU size, except for the INIT, INIT ACK, and * SHUTDOWN COMPLETE chunks. These chunks MUST NOT be bundled with * any other chunk in a packet. * * Section 3.3.7: DATA chunks MUST NOT be bundled with ABORT. Control * chunks (except for INIT, INIT ACK, and SHUTDOWN COMPLETE) MAY be * bundled with an ABORT, but they MUST be placed before the ABORT * in the SCTP packet or they will be ignored by the receiver. */ if ((sch->type == SCTP_CID_COOKIE_ECHO) || (sch->type == SCTP_CID_COOKIE_ACK)) { int clen = ntohs(sch->length); if (clen >= sizeof(_sctpch)) { sch = skb_header_pointer(skb, cofs + ALIGN(clen, 4), sizeof(_sctpch), &_sctpch); if (sch && sch->type == SCTP_CID_ABORT) chunk_type = sch->type; } } event = (chunk_type < sizeof(sctp_events)) ? sctp_events[chunk_type] : IP_VS_SCTP_DATA; /* Update direction to INPUT_ONLY if necessary * or delete NO_OUTPUT flag if output packet detected */ if (cp->flags & IP_VS_CONN_F_NOOUTPUT) { if (direction == IP_VS_DIR_OUTPUT) cp->flags &= ~IP_VS_CONN_F_NOOUTPUT; else direction = IP_VS_DIR_INPUT_ONLY; } next_state = sctp_states[direction][event][cp->state]; if (next_state != cp->state) { struct ip_vs_dest *dest = cp->dest; IP_VS_DBG_BUF(8, "%s %s %s:%d->" "%s:%d state: %s->%s conn->refcnt:%d\n", pd->pp->name, ((direction == IP_VS_DIR_OUTPUT) ? "output " : "input "), IP_VS_DBG_ADDR(cp->daf, &cp->daddr), ntohs(cp->dport), IP_VS_DBG_ADDR(cp->af, &cp->caddr), ntohs(cp->cport), sctp_state_name(cp->state), sctp_state_name(next_state), refcount_read(&cp->refcnt)); if (dest) { if (!(cp->flags & IP_VS_CONN_F_INACTIVE) && (next_state != IP_VS_SCTP_S_ESTABLISHED)) { atomic_dec(&dest->activeconns); atomic_inc(&dest->inactconns); cp->flags |= IP_VS_CONN_F_INACTIVE; } else if ((cp->flags & IP_VS_CONN_F_INACTIVE) && (next_state == IP_VS_SCTP_S_ESTABLISHED)) { atomic_inc(&dest->activeconns); atomic_dec(&dest->inactconns); cp->flags &= ~IP_VS_CONN_F_INACTIVE; } } if (next_state == IP_VS_SCTP_S_ESTABLISHED) ip_vs_control_assure_ct(cp); } if (likely(pd)) cp->timeout = pd->timeout_table[cp->state = next_state]; else /* What to do ? */ cp->timeout = sctp_timeouts[cp->state = next_state]; } static void sctp_state_transition(struct ip_vs_conn *cp, int direction, const struct sk_buff *skb, struct ip_vs_proto_data *pd) { spin_lock_bh(&cp->lock); set_sctp_state(pd, cp, direction, skb); spin_unlock_bh(&cp->lock); } static inline __u16 sctp_app_hashkey(__be16 port) { return (((__force u16)port >> SCTP_APP_TAB_BITS) ^ (__force u16)port) & SCTP_APP_TAB_MASK; } static int sctp_register_app(struct netns_ipvs *ipvs, struct ip_vs_app *inc) { struct ip_vs_app *i; __u16 hash; __be16 port = inc->port; int ret = 0; struct ip_vs_proto_data *pd = ip_vs_proto_data_get(ipvs, IPPROTO_SCTP); hash = sctp_app_hashkey(port); list_for_each_entry(i, &ipvs->sctp_apps[hash], p_list) { if (i->port == port) { ret = -EEXIST; goto out; } } list_add_rcu(&inc->p_list, &ipvs->sctp_apps[hash]); atomic_inc(&pd->appcnt); out: return ret; } static void sctp_unregister_app(struct netns_ipvs *ipvs, struct ip_vs_app *inc) { struct ip_vs_proto_data *pd = ip_vs_proto_data_get(ipvs, IPPROTO_SCTP); atomic_dec(&pd->appcnt); list_del_rcu(&inc->p_list); } static int sctp_app_conn_bind(struct ip_vs_conn *cp) { struct netns_ipvs *ipvs = cp->ipvs; int hash; struct ip_vs_app *inc; int result = 0; /* Default binding: bind app only for NAT */ if (IP_VS_FWD_METHOD(cp) != IP_VS_CONN_F_MASQ) return 0; /* Lookup application incarnations and bind the right one */ hash = sctp_app_hashkey(cp->vport); list_for_each_entry_rcu(inc, &ipvs->sctp_apps[hash], p_list) { if (inc->port == cp->vport) { if (unlikely(!ip_vs_app_inc_get(inc))) break; IP_VS_DBG_BUF(9, "%s: Binding conn %s:%u->" "%s:%u to app %s on port %u\n", __func__, IP_VS_DBG_ADDR(cp->af, &cp->caddr), ntohs(cp->cport), IP_VS_DBG_ADDR(cp->af, &cp->vaddr), ntohs(cp->vport), inc->name, ntohs(inc->port)); cp->app = inc; if (inc->init_conn) result = inc->init_conn(inc, cp); break; } } return result; } /* --------------------------------------------- * timeouts is netns related now. * --------------------------------------------- */ static int __ip_vs_sctp_init(struct netns_ipvs *ipvs, struct ip_vs_proto_data *pd) { ip_vs_init_hash_table(ipvs->sctp_apps, SCTP_APP_TAB_SIZE); pd->timeout_table = ip_vs_create_timeout_table((int *)sctp_timeouts, sizeof(sctp_timeouts)); if (!pd->timeout_table) return -ENOMEM; return 0; } static void __ip_vs_sctp_exit(struct netns_ipvs *ipvs, struct ip_vs_proto_data *pd) { kfree(pd->timeout_table); } struct ip_vs_protocol ip_vs_protocol_sctp = { .name = "SCTP", .protocol = IPPROTO_SCTP, .num_states = IP_VS_SCTP_S_LAST, .dont_defrag = 0, .init = NULL, .exit = NULL, .init_netns = __ip_vs_sctp_init, .exit_netns = __ip_vs_sctp_exit, .register_app = sctp_register_app, .unregister_app = sctp_unregister_app, .conn_schedule = sctp_conn_schedule, .conn_in_get = ip_vs_conn_in_get_proto, .conn_out_get = ip_vs_conn_out_get_proto, .snat_handler = sctp_snat_handler, .dnat_handler = sctp_dnat_handler, .state_name = sctp_state_name, .state_transition = sctp_state_transition, .app_conn_bind = sctp_app_conn_bind, .debug_packet = ip_vs_tcpudp_debug_packet, .timeout_change = NULL, }; |
| 5 5 5 5 5 5 5 632 100 100 98 19 18 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 | /* SPDX-License-Identifier: GPL-2.0 */ /* * A hash table (hashtab) maintains associations between * key values and datum values. The type of the key values * and the type of the datum values is arbitrary. The * functions for hash computation and key comparison are * provided by the creator of the table. * * Author : Stephen Smalley, <stephen.smalley.work@gmail.com> */ #ifndef _SS_HASHTAB_H_ #define _SS_HASHTAB_H_ #include <linux/types.h> #include <linux/errno.h> #include <linux/sched.h> #define HASHTAB_MAX_NODES U32_MAX struct hashtab_key_params { u32 (*hash)(const void *key); /* hash func */ int (*cmp)(const void *key1, const void *key2); /* comparison func */ }; struct hashtab_node { void *key; void *datum; struct hashtab_node *next; }; struct hashtab { struct hashtab_node **htable; /* hash table */ u32 size; /* number of slots in hash table */ u32 nel; /* number of elements in hash table */ }; struct hashtab_info { u32 slots_used; u32 max_chain_len; u64 chain2_len_sum; }; /* * Initializes a new hash table with the specified characteristics. * * Returns -ENOMEM if insufficient space is available or 0 otherwise. */ int hashtab_init(struct hashtab *h, u32 nel_hint); int __hashtab_insert(struct hashtab *h, struct hashtab_node **dst, void *key, void *datum); /* * Inserts the specified (key, datum) pair into the specified hash table. * * Returns -ENOMEM on memory allocation error, * -EEXIST if there is already an entry with the same key, * -EINVAL for general errors or 0 otherwise. */ static inline int hashtab_insert(struct hashtab *h, void *key, void *datum, struct hashtab_key_params key_params) { u32 hvalue; struct hashtab_node *prev, *cur; cond_resched(); if (!h->size || h->nel == HASHTAB_MAX_NODES) return -EINVAL; hvalue = key_params.hash(key) & (h->size - 1); prev = NULL; cur = h->htable[hvalue]; while (cur) { int cmp = key_params.cmp(key, cur->key); if (cmp == 0) return -EEXIST; if (cmp < 0) break; prev = cur; cur = cur->next; } return __hashtab_insert(h, prev ? &prev->next : &h->htable[hvalue], key, datum); } /* * Searches for the entry with the specified key in the hash table. * * Returns NULL if no entry has the specified key or * the datum of the entry otherwise. */ static inline void *hashtab_search(struct hashtab *h, const void *key, struct hashtab_key_params key_params) { u32 hvalue; struct hashtab_node *cur; if (!h->size) return NULL; hvalue = key_params.hash(key) & (h->size - 1); cur = h->htable[hvalue]; while (cur) { int cmp = key_params.cmp(key, cur->key); if (cmp == 0) return cur->datum; if (cmp < 0) break; cur = cur->next; } return NULL; } /* * Destroys the specified hash table. */ void hashtab_destroy(struct hashtab *h); /* * Applies the specified apply function to (key,datum,args) * for each entry in the specified hash table. * * The order in which the function is applied to the entries * is dependent upon the internal structure of the hash table. * * If apply returns a non-zero status, then hashtab_map will cease * iterating through the hash table and will propagate the error * return to its caller. */ int hashtab_map(struct hashtab *h, int (*apply)(void *k, void *d, void *args), void *args); int hashtab_duplicate(struct hashtab *new, const struct hashtab *orig, int (*copy)(struct hashtab_node *new, const struct hashtab_node *orig, void *args), int (*destroy)(void *k, void *d, void *args), void *args); #ifdef CONFIG_SECURITY_SELINUX_DEBUG /* Fill info with some hash table statistics */ void hashtab_stat(struct hashtab *h, struct hashtab_info *info); #else static inline void hashtab_stat(struct hashtab *h, struct hashtab_info *info) { return; } #endif #endif /* _SS_HASHTAB_H */ |
| 2 2 4 1 4 1 3 2 2 2 2 2 4 4 4 1 4 5 5 5 2 2 5 4 5 3 5 4 5 4 5 4 5 5 5 5 5 5 5 5 5 5 5 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 | // SPDX-License-Identifier: GPL-2.0 /* XDP sockets monitoring support * * Copyright(c) 2019 Intel Corporation. * * Author: Björn Töpel <bjorn.topel@intel.com> */ #include <linux/module.h> #include <net/xdp_sock.h> #include <linux/xdp_diag.h> #include <linux/sock_diag.h> #include "xsk_queue.h" #include "xsk.h" static int xsk_diag_put_info(const struct xdp_sock *xs, struct sk_buff *nlskb) { struct xdp_diag_info di = {}; di.ifindex = xs->dev ? xs->dev->ifindex : 0; di.queue_id = xs->queue_id; return nla_put(nlskb, XDP_DIAG_INFO, sizeof(di), &di); } static int xsk_diag_put_ring(const struct xsk_queue *queue, int nl_type, struct sk_buff *nlskb) { struct xdp_diag_ring dr = {}; dr.entries = queue->nentries; return nla_put(nlskb, nl_type, sizeof(dr), &dr); } static int xsk_diag_put_rings_cfg(const struct xdp_sock *xs, struct sk_buff *nlskb) { int err = 0; if (xs->rx) err = xsk_diag_put_ring(xs->rx, XDP_DIAG_RX_RING, nlskb); if (!err && xs->tx) err = xsk_diag_put_ring(xs->tx, XDP_DIAG_TX_RING, nlskb); return err; } static int xsk_diag_put_umem(const struct xdp_sock *xs, struct sk_buff *nlskb) { struct xsk_buff_pool *pool = xs->pool; struct xdp_umem *umem = xs->umem; struct xdp_diag_umem du = {}; int err; if (!umem) return 0; du.id = umem->id; du.size = umem->size; du.num_pages = umem->npgs; du.chunk_size = umem->chunk_size; du.headroom = umem->headroom; du.ifindex = (pool && pool->netdev) ? pool->netdev->ifindex : 0; du.queue_id = pool ? pool->queue_id : 0; du.flags = 0; if (umem->zc) du.flags |= XDP_DU_F_ZEROCOPY; du.refs = refcount_read(&umem->users); err = nla_put(nlskb, XDP_DIAG_UMEM, sizeof(du), &du); if (!err && pool && pool->fq) err = xsk_diag_put_ring(pool->fq, XDP_DIAG_UMEM_FILL_RING, nlskb); if (!err && pool && pool->cq) err = xsk_diag_put_ring(pool->cq, XDP_DIAG_UMEM_COMPLETION_RING, nlskb); return err; } static int xsk_diag_put_stats(const struct xdp_sock *xs, struct sk_buff *nlskb) { struct xdp_diag_stats du = {}; du.n_rx_dropped = xs->rx_dropped; du.n_rx_invalid = xskq_nb_invalid_descs(xs->rx); du.n_rx_full = xs->rx_queue_full; du.n_fill_ring_empty = xs->pool ? xskq_nb_queue_empty_descs(xs->pool->fq) : 0; du.n_tx_invalid = xskq_nb_invalid_descs(xs->tx); du.n_tx_ring_empty = xskq_nb_queue_empty_descs(xs->tx); return nla_put(nlskb, XDP_DIAG_STATS, sizeof(du), &du); } static int xsk_diag_fill(struct sock *sk, struct sk_buff *nlskb, struct xdp_diag_req *req, struct user_namespace *user_ns, u32 portid, u32 seq, u32 flags, int sk_ino) { struct xdp_sock *xs = xdp_sk(sk); struct xdp_diag_msg *msg; struct nlmsghdr *nlh; nlh = nlmsg_put(nlskb, portid, seq, SOCK_DIAG_BY_FAMILY, sizeof(*msg), flags); if (!nlh) return -EMSGSIZE; msg = nlmsg_data(nlh); memset(msg, 0, sizeof(*msg)); msg->xdiag_family = AF_XDP; msg->xdiag_type = sk->sk_type; msg->xdiag_ino = sk_ino; sock_diag_save_cookie(sk, msg->xdiag_cookie); mutex_lock(&xs->mutex); if (READ_ONCE(xs->state) == XSK_UNBOUND) goto out_nlmsg_trim; if ((req->xdiag_show & XDP_SHOW_INFO) && xsk_diag_put_info(xs, nlskb)) goto out_nlmsg_trim; if ((req->xdiag_show & XDP_SHOW_INFO) && nla_put_u32(nlskb, XDP_DIAG_UID, from_kuid_munged(user_ns, sk_uid(sk)))) goto out_nlmsg_trim; if ((req->xdiag_show & XDP_SHOW_RING_CFG) && xsk_diag_put_rings_cfg(xs, nlskb)) goto out_nlmsg_trim; if ((req->xdiag_show & XDP_SHOW_UMEM) && xsk_diag_put_umem(xs, nlskb)) goto out_nlmsg_trim; if ((req->xdiag_show & XDP_SHOW_MEMINFO) && sock_diag_put_meminfo(sk, nlskb, XDP_DIAG_MEMINFO)) goto out_nlmsg_trim; if ((req->xdiag_show & XDP_SHOW_STATS) && xsk_diag_put_stats(xs, nlskb)) goto out_nlmsg_trim; mutex_unlock(&xs->mutex); nlmsg_end(nlskb, nlh); return 0; out_nlmsg_trim: mutex_unlock(&xs->mutex); nlmsg_cancel(nlskb, nlh); return -EMSGSIZE; } static int xsk_diag_dump(struct sk_buff *nlskb, struct netlink_callback *cb) { struct xdp_diag_req *req = nlmsg_data(cb->nlh); struct net *net = sock_net(nlskb->sk); int num = 0, s_num = cb->args[0]; struct sock *sk; mutex_lock(&net->xdp.lock); sk_for_each(sk, &net->xdp.list) { if (!net_eq(sock_net(sk), net)) continue; if (num++ < s_num) continue; if (xsk_diag_fill(sk, nlskb, req, sk_user_ns(NETLINK_CB(cb->skb).sk), NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, NLM_F_MULTI, sock_i_ino(sk)) < 0) { num--; break; } } mutex_unlock(&net->xdp.lock); cb->args[0] = num; return nlskb->len; } static int xsk_diag_handler_dump(struct sk_buff *nlskb, struct nlmsghdr *hdr) { struct netlink_dump_control c = { .dump = xsk_diag_dump }; int hdrlen = sizeof(struct xdp_diag_req); struct net *net = sock_net(nlskb->sk); if (nlmsg_len(hdr) < hdrlen) return -EINVAL; if (!(hdr->nlmsg_flags & NLM_F_DUMP)) return -EOPNOTSUPP; return netlink_dump_start(net->diag_nlsk, nlskb, hdr, &c); } static const struct sock_diag_handler xsk_diag_handler = { .owner = THIS_MODULE, .family = AF_XDP, .dump = xsk_diag_handler_dump, }; static int __init xsk_diag_init(void) { return sock_diag_register(&xsk_diag_handler); } static void __exit xsk_diag_exit(void) { sock_diag_unregister(&xsk_diag_handler); } module_init(xsk_diag_init); module_exit(xsk_diag_exit); MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("XDP socket monitoring via SOCK_DIAG"); MODULE_ALIAS_NET_PF_PROTO_TYPE(PF_NETLINK, NETLINK_SOCK_DIAG, AF_XDP); |
| 7 7 7 7 7 3 6 7 3 7 3 3 3 3 7 7 7 7 7 4 3 3 7 7 7 6 6 5 2 5 3 3 3 3 3 3 5 2 5 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 | // SPDX-License-Identifier: GPL-2.0-or-later /* PKCS#7 parser * * Copyright (C) 2012 Red Hat, Inc. All Rights Reserved. * Written by David Howells (dhowells@redhat.com) */ #define pr_fmt(fmt) "PKCS7: "fmt #include <linux/kernel.h> #include <linux/module.h> #include <linux/export.h> #include <linux/slab.h> #include <linux/err.h> #include <linux/oid_registry.h> #include <crypto/public_key.h> #include "pkcs7_parser.h" #include "pkcs7.asn1.h" MODULE_DESCRIPTION("PKCS#7 parser"); MODULE_AUTHOR("Red Hat, Inc."); MODULE_LICENSE("GPL"); struct pkcs7_parse_context { struct pkcs7_message *msg; /* Message being constructed */ struct pkcs7_signed_info *sinfo; /* SignedInfo being constructed */ struct pkcs7_signed_info **ppsinfo; struct x509_certificate *certs; /* Certificate cache */ struct x509_certificate **ppcerts; unsigned long data; /* Start of data */ enum OID last_oid; /* Last OID encountered */ unsigned x509_index; unsigned sinfo_index; const void *raw_serial; unsigned raw_serial_size; unsigned raw_issuer_size; const void *raw_issuer; const void *raw_skid; unsigned raw_skid_size; bool expect_skid; }; /* * Free a signed information block. */ static void pkcs7_free_signed_info(struct pkcs7_signed_info *sinfo) { if (sinfo) { public_key_signature_free(sinfo->sig); kfree(sinfo); } } /** * pkcs7_free_message - Free a PKCS#7 message * @pkcs7: The PKCS#7 message to free */ void pkcs7_free_message(struct pkcs7_message *pkcs7) { struct x509_certificate *cert; struct pkcs7_signed_info *sinfo; if (pkcs7) { while (pkcs7->certs) { cert = pkcs7->certs; pkcs7->certs = cert->next; x509_free_certificate(cert); } while (pkcs7->crl) { cert = pkcs7->crl; pkcs7->crl = cert->next; x509_free_certificate(cert); } while (pkcs7->signed_infos) { sinfo = pkcs7->signed_infos; pkcs7->signed_infos = sinfo->next; pkcs7_free_signed_info(sinfo); } kfree(pkcs7); } } EXPORT_SYMBOL_GPL(pkcs7_free_message); /* * Check authenticatedAttributes are provided or not provided consistently. */ static int pkcs7_check_authattrs(struct pkcs7_message *msg) { struct pkcs7_signed_info *sinfo; bool want = false; sinfo = msg->signed_infos; if (!sinfo) goto inconsistent; #ifdef CONFIG_PKCS7_WAIVE_AUTHATTRS_REJECTION_FOR_MLDSA msg->authattrs_rej_waivable = true; #endif if (sinfo->authattrs) { want = true; msg->have_authattrs = true; #ifdef CONFIG_PKCS7_WAIVE_AUTHATTRS_REJECTION_FOR_MLDSA if (strncmp(sinfo->sig->pkey_algo, "mldsa", 5) != 0) msg->authattrs_rej_waivable = false; #endif } else if (sinfo->sig->algo_takes_data) { sinfo->sig->hash_algo = "none"; } for (sinfo = sinfo->next; sinfo; sinfo = sinfo->next) { if (!!sinfo->authattrs != want) goto inconsistent; if (!sinfo->authattrs && sinfo->sig->algo_takes_data) sinfo->sig->hash_algo = "none"; } return 0; inconsistent: pr_warn("Inconsistently supplied authAttrs\n"); return -EINVAL; } /** * pkcs7_parse_message - Parse a PKCS#7 message * @data: The raw binary ASN.1 encoded message to be parsed * @datalen: The size of the encoded message */ struct pkcs7_message *pkcs7_parse_message(const void *data, size_t datalen) { struct pkcs7_parse_context *ctx; struct pkcs7_message *msg = ERR_PTR(-ENOMEM); int ret; ctx = kzalloc_obj(struct pkcs7_parse_context); if (!ctx) goto out_no_ctx; ctx->msg = kzalloc_obj(struct pkcs7_message); if (!ctx->msg) goto out_no_msg; ctx->sinfo = kzalloc_obj(struct pkcs7_signed_info); if (!ctx->sinfo) goto out_no_sinfo; ctx->sinfo->sig = kzalloc_obj(struct public_key_signature); if (!ctx->sinfo->sig) goto out_no_sig; ctx->data = (unsigned long)data; ctx->ppcerts = &ctx->certs; ctx->ppsinfo = &ctx->msg->signed_infos; /* Attempt to decode the signature */ ret = asn1_ber_decoder(&pkcs7_decoder, ctx, data, datalen); if (ret < 0) { msg = ERR_PTR(ret); goto out; } ret = pkcs7_check_authattrs(ctx->msg); if (ret < 0) { msg = ERR_PTR(ret); goto out; } msg = ctx->msg; ctx->msg = NULL; out: while (ctx->certs) { struct x509_certificate *cert = ctx->certs; ctx->certs = cert->next; x509_free_certificate(cert); } out_no_sig: pkcs7_free_signed_info(ctx->sinfo); out_no_sinfo: pkcs7_free_message(ctx->msg); out_no_msg: kfree(ctx); out_no_ctx: return msg; } EXPORT_SYMBOL_GPL(pkcs7_parse_message); /** * pkcs7_get_content_data - Get access to the PKCS#7 content * @pkcs7: The preparsed PKCS#7 message to access * @_data: Place to return a pointer to the data * @_data_len: Place to return the data length * @_headerlen: Size of ASN.1 header not included in _data * * Get access to the data content of the PKCS#7 message. The size of the * header of the ASN.1 object that contains it is also provided and can be used * to adjust *_data and *_data_len to get the entire object. * * Returns -ENODATA if the data object was missing from the message. */ int pkcs7_get_content_data(const struct pkcs7_message *pkcs7, const void **_data, size_t *_data_len, size_t *_headerlen) { if (!pkcs7->data) return -ENODATA; *_data = pkcs7->data; *_data_len = pkcs7->data_len; if (_headerlen) *_headerlen = pkcs7->data_hdrlen; return 0; } EXPORT_SYMBOL_GPL(pkcs7_get_content_data); /* * Note an OID when we find one for later processing when we know how * to interpret it. */ int pkcs7_note_OID(void *context, size_t hdrlen, unsigned char tag, const void *value, size_t vlen) { struct pkcs7_parse_context *ctx = context; ctx->last_oid = look_up_OID(value, vlen); if (ctx->last_oid == OID__NR) { char buffer[50]; sprint_oid(value, vlen, buffer, sizeof(buffer)); printk("PKCS7: Unknown OID: [%lu] %s\n", (unsigned long)value - ctx->data, buffer); } return 0; } /* * Note the digest algorithm for the signature. */ int pkcs7_sig_note_digest_algo(void *context, size_t hdrlen, unsigned char tag, const void *value, size_t vlen) { struct pkcs7_parse_context *ctx = context; switch (ctx->last_oid) { case OID_sha1: ctx->sinfo->sig->hash_algo = "sha1"; break; case OID_sha256: ctx->sinfo->sig->hash_algo = "sha256"; break; case OID_sha384: ctx->sinfo->sig->hash_algo = "sha384"; break; case OID_sha512: ctx->sinfo->sig->hash_algo = "sha512"; break; case OID_sha224: ctx->sinfo->sig->hash_algo = "sha224"; break; case OID_sm3: ctx->sinfo->sig->hash_algo = "sm3"; break; case OID_gost2012Digest256: ctx->sinfo->sig->hash_algo = "streebog256"; break; case OID_gost2012Digest512: ctx->sinfo->sig->hash_algo = "streebog512"; break; case OID_sha3_256: ctx->sinfo->sig->hash_algo = "sha3-256"; break; case OID_sha3_384: ctx->sinfo->sig->hash_algo = "sha3-384"; break; case OID_sha3_512: ctx->sinfo->sig->hash_algo = "sha3-512"; break; default: printk("Unsupported digest algo: %u\n", ctx->last_oid); return -ENOPKG; } return 0; } /* * Note the public key algorithm for the signature. */ int pkcs7_sig_note_pkey_algo(void *context, size_t hdrlen, unsigned char tag, const void *value, size_t vlen) { struct pkcs7_parse_context *ctx = context; switch (ctx->last_oid) { case OID_rsaEncryption: ctx->sinfo->sig->pkey_algo = "rsa"; ctx->sinfo->sig->encoding = "pkcs1"; break; case OID_id_ecdsa_with_sha1: case OID_id_ecdsa_with_sha224: case OID_id_ecdsa_with_sha256: case OID_id_ecdsa_with_sha384: case OID_id_ecdsa_with_sha512: case OID_id_ecdsa_with_sha3_256: case OID_id_ecdsa_with_sha3_384: case OID_id_ecdsa_with_sha3_512: ctx->sinfo->sig->pkey_algo = "ecdsa"; ctx->sinfo->sig->encoding = "x962"; break; case OID_gost2012PKey256: case OID_gost2012PKey512: ctx->sinfo->sig->pkey_algo = "ecrdsa"; ctx->sinfo->sig->encoding = "raw"; break; case OID_id_ml_dsa_44: ctx->sinfo->sig->pkey_algo = "mldsa44"; ctx->sinfo->sig->encoding = "raw"; ctx->sinfo->sig->algo_takes_data = true; break; case OID_id_ml_dsa_65: ctx->sinfo->sig->pkey_algo = "mldsa65"; ctx->sinfo->sig->encoding = "raw"; ctx->sinfo->sig->algo_takes_data = true; break; case OID_id_ml_dsa_87: ctx->sinfo->sig->pkey_algo = "mldsa87"; ctx->sinfo->sig->encoding = "raw"; ctx->sinfo->sig->algo_takes_data = true; break; default: printk("Unsupported pkey algo: %u\n", ctx->last_oid); return -ENOPKG; } return 0; } /* * We only support signed data [RFC2315 sec 9]. */ int pkcs7_check_content_type(void *context, size_t hdrlen, unsigned char tag, const void *value, size_t vlen) { struct pkcs7_parse_context *ctx = context; if (ctx->last_oid != OID_signed_data) { pr_warn("Only support pkcs7_signedData type\n"); return -EINVAL; } return 0; } /* * Note the SignedData version */ int pkcs7_note_signeddata_version(void *context, size_t hdrlen, unsigned char tag, const void *value, size_t vlen) { struct pkcs7_parse_context *ctx = context; unsigned version; if (vlen != 1) goto unsupported; ctx->msg->version = version = *(const u8 *)value; switch (version) { case 1: /* PKCS#7 SignedData [RFC2315 sec 9.1] * CMS ver 1 SignedData [RFC5652 sec 5.1] */ break; case 3: /* CMS ver 3 SignedData [RFC2315 sec 5.1] */ break; default: goto unsupported; } return 0; unsupported: pr_warn("Unsupported SignedData version\n"); return -EINVAL; } /* * Note the SignerInfo version */ int pkcs7_note_signerinfo_version(void *context, size_t hdrlen, unsigned char tag, const void *value, size_t vlen) { struct pkcs7_parse_context *ctx = context; unsigned version; if (vlen != 1) goto unsupported; version = *(const u8 *)value; switch (version) { case 1: /* PKCS#7 SignerInfo [RFC2315 sec 9.2] * CMS ver 1 SignerInfo [RFC5652 sec 5.3] */ if (ctx->msg->version != 1) goto version_mismatch; ctx->expect_skid = false; break; case 3: /* CMS ver 3 SignerInfo [RFC2315 sec 5.3] */ if (ctx->msg->version == 1) goto version_mismatch; ctx->expect_skid = true; break; default: goto unsupported; } return 0; unsupported: pr_warn("Unsupported SignerInfo version\n"); return -EINVAL; version_mismatch: pr_warn("SignedData-SignerInfo version mismatch\n"); return -EBADMSG; } /* * Extract a certificate and store it in the context. */ int pkcs7_extract_cert(void *context, size_t hdrlen, unsigned char tag, const void *value, size_t vlen) { struct pkcs7_parse_context *ctx = context; struct x509_certificate *x509; if (tag != ((ASN1_UNIV << 6) | ASN1_CONS_BIT | ASN1_SEQ)) { pr_debug("Cert began with tag %02x at %lu\n", tag, (unsigned long)ctx - ctx->data); return -EBADMSG; } /* We have to correct for the header so that the X.509 parser can start * from the beginning. Note that since X.509 stipulates DER, there * probably shouldn't be an EOC trailer - but it is in PKCS#7 (which * stipulates BER). */ value -= hdrlen; vlen += hdrlen; if (((u8*)value)[1] == 0x80) vlen += 2; /* Indefinite length - there should be an EOC */ x509 = x509_cert_parse(value, vlen); if (IS_ERR(x509)) return PTR_ERR(x509); x509->index = ++ctx->x509_index; pr_debug("Got cert %u for %s\n", x509->index, x509->subject); pr_debug("- fingerprint %*phN\n", x509->id->len, x509->id->data); *ctx->ppcerts = x509; ctx->ppcerts = &x509->next; return 0; } /* * Save the certificate list */ int pkcs7_note_certificate_list(void *context, size_t hdrlen, unsigned char tag, const void *value, size_t vlen) { struct pkcs7_parse_context *ctx = context; pr_devel("Got cert list (%02x)\n", tag); *ctx->ppcerts = ctx->msg->certs; ctx->msg->certs = ctx->certs; ctx->certs = NULL; ctx->ppcerts = &ctx->certs; return 0; } /* * Note the content type. */ int pkcs7_note_content(void *context, size_t hdrlen, unsigned char tag, const void *value, size_t vlen) { struct pkcs7_parse_context *ctx = context; if (ctx->last_oid != OID_data && ctx->last_oid != OID_msIndirectData) { pr_warn("Unsupported data type %d\n", ctx->last_oid); return -EINVAL; } ctx->msg->data_type = ctx->last_oid; return 0; } /* * Extract the data from the message and store that and its content type OID in * the context. */ int pkcs7_note_data(void *context, size_t hdrlen, unsigned char tag, const void *value, size_t vlen) { struct pkcs7_parse_context *ctx = context; pr_debug("Got data\n"); ctx->msg->data = value; ctx->msg->data_len = vlen; ctx->msg->data_hdrlen = hdrlen; return 0; } /* * Parse authenticated attributes. */ int pkcs7_sig_note_authenticated_attr(void *context, size_t hdrlen, unsigned char tag, const void *value, size_t vlen) { struct pkcs7_parse_context *ctx = context; struct pkcs7_signed_info *sinfo = ctx->sinfo; enum OID content_type; pr_devel("AuthAttr: %02x %zu [%*ph]\n", tag, vlen, (unsigned)vlen, value); switch (ctx->last_oid) { case OID_contentType: if (__test_and_set_bit(sinfo_has_content_type, &sinfo->aa_set)) goto repeated; content_type = look_up_OID(value, vlen); if (content_type != ctx->msg->data_type) { pr_warn("Mismatch between global data type (%d) and sinfo %u (%d)\n", ctx->msg->data_type, sinfo->index, content_type); return -EBADMSG; } return 0; case OID_signingTime: if (__test_and_set_bit(sinfo_has_signing_time, &sinfo->aa_set)) goto repeated; /* Should we check that the signing time is consistent * with the signer's X.509 cert? */ return x509_decode_time(&sinfo->signing_time, hdrlen, tag, value, vlen); case OID_messageDigest: if (__test_and_set_bit(sinfo_has_message_digest, &sinfo->aa_set)) goto repeated; if (tag != ASN1_OTS) return -EBADMSG; sinfo->msgdigest = value; sinfo->msgdigest_len = vlen; return 0; case OID_smimeCapabilites: if (__test_and_set_bit(sinfo_has_smime_caps, &sinfo->aa_set)) goto repeated; if (ctx->msg->data_type != OID_msIndirectData) { pr_warn("S/MIME Caps only allowed with Authenticode\n"); return -EKEYREJECTED; } return 0; /* Microsoft SpOpusInfo seems to be contain cont[0] 16-bit BE * char URLs and cont[1] 8-bit char URLs. * * Microsoft StatementType seems to contain a list of OIDs that * are also used as extendedKeyUsage types in X.509 certs. */ case OID_msSpOpusInfo: if (__test_and_set_bit(sinfo_has_ms_opus_info, &sinfo->aa_set)) goto repeated; goto authenticode_check; case OID_msStatementType: if (__test_and_set_bit(sinfo_has_ms_statement_type, &sinfo->aa_set)) goto repeated; authenticode_check: if (ctx->msg->data_type != OID_msIndirectData) { pr_warn("Authenticode AuthAttrs only allowed with Authenticode\n"); return -EKEYREJECTED; } /* I'm not sure how to validate these */ return 0; default: return 0; } repeated: /* We permit max one item per AuthenticatedAttribute and no repeats */ pr_warn("Repeated/multivalue AuthAttrs not permitted\n"); return -EKEYREJECTED; } /* * Note the set of auth attributes for digestion purposes [RFC2315 sec 9.3] */ int pkcs7_sig_note_set_of_authattrs(void *context, size_t hdrlen, unsigned char tag, const void *value, size_t vlen) { struct pkcs7_parse_context *ctx = context; struct pkcs7_signed_info *sinfo = ctx->sinfo; if (!test_bit(sinfo_has_content_type, &sinfo->aa_set) || !test_bit(sinfo_has_message_digest, &sinfo->aa_set)) { pr_warn("Missing required AuthAttr\n"); return -EBADMSG; } if (ctx->msg->data_type != OID_msIndirectData && test_bit(sinfo_has_ms_opus_info, &sinfo->aa_set)) { pr_warn("Unexpected Authenticode AuthAttr\n"); return -EBADMSG; } /* We need to switch the 'CONT 0' to a 'SET OF' when we digest */ sinfo->authattrs = value - hdrlen; sinfo->authattrs_len = vlen + hdrlen; return 0; } /* * Note the issuing certificate serial number */ int pkcs7_sig_note_serial(void *context, size_t hdrlen, unsigned char tag, const void *value, size_t vlen) { struct pkcs7_parse_context *ctx = context; ctx->raw_serial = value; ctx->raw_serial_size = vlen; return 0; } /* * Note the issuer's name */ int pkcs7_sig_note_issuer(void *context, size_t hdrlen, unsigned char tag, const void *value, size_t vlen) { struct pkcs7_parse_context *ctx = context; ctx->raw_issuer = value; ctx->raw_issuer_size = vlen; return 0; } /* * Note the issuing cert's subjectKeyIdentifier */ int pkcs7_sig_note_skid(void *context, size_t hdrlen, unsigned char tag, const void *value, size_t vlen) { struct pkcs7_parse_context *ctx = context; pr_devel("SKID: %02x %zu [%*ph]\n", tag, vlen, (unsigned)vlen, value); ctx->raw_skid = value; ctx->raw_skid_size = vlen; return 0; } /* * Note the signature data */ int pkcs7_sig_note_signature(void *context, size_t hdrlen, unsigned char tag, const void *value, size_t vlen) { struct pkcs7_parse_context *ctx = context; ctx->sinfo->sig->s = kmemdup(value, vlen, GFP_KERNEL); if (!ctx->sinfo->sig->s) return -ENOMEM; ctx->sinfo->sig->s_size = vlen; return 0; } /* * Note a signature information block */ int pkcs7_note_signed_info(void *context, size_t hdrlen, unsigned char tag, const void *value, size_t vlen) { struct pkcs7_parse_context *ctx = context; struct pkcs7_signed_info *sinfo = ctx->sinfo; struct asymmetric_key_id *kid; if (ctx->msg->data_type == OID_msIndirectData && !sinfo->authattrs) { pr_warn("Authenticode requires AuthAttrs\n"); return -EBADMSG; } /* Generate cert issuer + serial number key ID */ if (!ctx->expect_skid) { kid = asymmetric_key_generate_id(ctx->raw_serial, ctx->raw_serial_size, ctx->raw_issuer, ctx->raw_issuer_size); } else { kid = asymmetric_key_generate_id(ctx->raw_skid, ctx->raw_skid_size, "", 0); } if (IS_ERR(kid)) return PTR_ERR(kid); pr_devel("SINFO KID: %u [%*phN]\n", kid->len, kid->len, kid->data); sinfo->sig->auth_ids[0] = kid; sinfo->index = ++ctx->sinfo_index; *ctx->ppsinfo = sinfo; ctx->ppsinfo = &sinfo->next; ctx->sinfo = kzalloc_obj(struct pkcs7_signed_info); if (!ctx->sinfo) return -ENOMEM; ctx->sinfo->sig = kzalloc_obj(struct public_key_signature); if (!ctx->sinfo->sig) return -ENOMEM; return 0; } |
| 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 | /* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */ /* * cec - HDMI Consumer Electronics Control public header * * Copyright 2016 Cisco Systems, Inc. and/or its affiliates. All rights reserved. */ #ifndef _CEC_UAPI_H #define _CEC_UAPI_H #include <linux/types.h> #include <linux/string.h> #define CEC_MAX_MSG_SIZE 16 /** * struct cec_msg - CEC message structure. * @tx_ts: Timestamp in nanoseconds using CLOCK_MONOTONIC. Set by the * driver when the message transmission has finished. * @rx_ts: Timestamp in nanoseconds using CLOCK_MONOTONIC. Set by the * driver when the message was received. * @len: Length in bytes of the message. * @timeout: The timeout (in ms) that is used to timeout CEC_RECEIVE. * Set to 0 if you want to wait forever. This timeout can also be * used with CEC_TRANSMIT as the timeout for waiting for a reply. * If 0, then it will use a 1 second timeout instead of waiting * forever as is done with CEC_RECEIVE. * @sequence: The framework assigns a sequence number to messages that are * sent. This can be used to track replies to previously sent * messages. * @flags: Set to 0. * @msg: The message payload. * @reply: This field is ignored with CEC_RECEIVE and is only used by * CEC_TRANSMIT. If non-zero, then wait for a reply with this * opcode. Set to CEC_MSG_FEATURE_ABORT if you want to wait for * a possible ABORT reply. If there was an error when sending the * msg or FeatureAbort was returned, then reply is set to 0. * If reply is non-zero upon return, then len/msg are set to * the received message. * If reply is zero upon return and status has the * CEC_TX_STATUS_FEATURE_ABORT bit set, then len/msg are set to * the received feature abort message. * If reply is zero upon return and status has the * CEC_TX_STATUS_MAX_RETRIES bit set, then no reply was seen at * all. If reply is non-zero for CEC_TRANSMIT and the message is a * broadcast, then -EINVAL is returned. * if reply is non-zero, then timeout is set to 1000 (the required * maximum response time). * @rx_status: The message receive status bits. Set by the driver. * @tx_status: The message transmit status bits. Set by the driver. * @tx_arb_lost_cnt: The number of 'Arbitration Lost' events. Set by the driver. * @tx_nack_cnt: The number of 'Not Acknowledged' events. Set by the driver. * @tx_low_drive_cnt: The number of 'Low Drive Detected' events. Set by the * driver. * @tx_error_cnt: The number of 'Error' events. Set by the driver. */ struct cec_msg { __u64 tx_ts; __u64 rx_ts; __u32 len; __u32 timeout; __u32 sequence; __u32 flags; __u8 msg[CEC_MAX_MSG_SIZE]; __u8 reply; __u8 rx_status; __u8 tx_status; __u8 tx_arb_lost_cnt; __u8 tx_nack_cnt; __u8 tx_low_drive_cnt; __u8 tx_error_cnt; }; /** * cec_msg_initiator - return the initiator's logical address. * @msg: the message structure */ static inline __u8 cec_msg_initiator(const struct cec_msg *msg) { return msg->msg[0] >> 4; } /** * cec_msg_destination - return the destination's logical address. * @msg: the message structure */ static inline __u8 cec_msg_destination(const struct cec_msg *msg) { return msg->msg[0] & 0xf; } /** * cec_msg_opcode - return the opcode of the message, -1 for poll * @msg: the message structure */ static inline int cec_msg_opcode(const struct cec_msg *msg) { return msg->len > 1 ? msg->msg[1] : -1; } /** * cec_msg_is_broadcast - return true if this is a broadcast message. * @msg: the message structure */ static inline int cec_msg_is_broadcast(const struct cec_msg *msg) { return (msg->msg[0] & 0xf) == 0xf; } /** * cec_msg_init - initialize the message structure. * @msg: the message structure * @initiator: the logical address of the initiator * @destination:the logical address of the destination (0xf for broadcast) * * The whole structure is zeroed, the len field is set to 1 (i.e. a poll * message) and the initiator and destination are filled in. */ static inline void cec_msg_init(struct cec_msg *msg, __u8 initiator, __u8 destination) { memset(msg, 0, sizeof(*msg)); msg->msg[0] = (initiator << 4) | destination; msg->len = 1; } /** * cec_msg_set_reply_to - fill in destination/initiator in a reply message. * @msg: the message structure for the reply * @orig: the original message structure * * Set the msg destination to the orig initiator and the msg initiator to the * orig destination. Note that msg and orig may be the same pointer, in which * case the change is done in place. * * It also zeroes the reply, timeout and flags fields. */ static inline void cec_msg_set_reply_to(struct cec_msg *msg, struct cec_msg *orig) { /* The destination becomes the initiator and vice versa */ msg->msg[0] = (cec_msg_destination(orig) << 4) | cec_msg_initiator(orig); msg->reply = 0; msg->timeout = 0; msg->flags = 0; } /** * cec_msg_recv_is_tx_result - return true if this message contains the * result of an earlier non-blocking transmit * @msg: the message structure from CEC_RECEIVE */ static inline int cec_msg_recv_is_tx_result(const struct cec_msg *msg) { return msg->sequence && msg->tx_status && !msg->rx_status; } /** * cec_msg_recv_is_rx_result - return true if this message contains the * reply of an earlier non-blocking transmit * @msg: the message structure from CEC_RECEIVE */ static inline int cec_msg_recv_is_rx_result(const struct cec_msg *msg) { return msg->sequence && !msg->tx_status && msg->rx_status; } /* cec_msg flags field */ #define CEC_MSG_FL_REPLY_TO_FOLLOWERS (1 << 0) #define CEC_MSG_FL_RAW (1 << 1) #define CEC_MSG_FL_REPLY_VENDOR_ID (1 << 2) /* cec_msg tx/rx_status field */ #define CEC_TX_STATUS_OK (1 << 0) #define CEC_TX_STATUS_ARB_LOST (1 << 1) #define CEC_TX_STATUS_NACK (1 << 2) #define CEC_TX_STATUS_LOW_DRIVE (1 << 3) #define CEC_TX_STATUS_ERROR (1 << 4) #define CEC_TX_STATUS_MAX_RETRIES (1 << 5) #define CEC_TX_STATUS_ABORTED (1 << 6) #define CEC_TX_STATUS_TIMEOUT (1 << 7) #define CEC_RX_STATUS_OK (1 << 0) #define CEC_RX_STATUS_TIMEOUT (1 << 1) #define CEC_RX_STATUS_FEATURE_ABORT (1 << 2) #define CEC_RX_STATUS_ABORTED (1 << 3) static inline int cec_msg_status_is_ok(const struct cec_msg *msg) { if (msg->tx_status && !(msg->tx_status & CEC_TX_STATUS_OK)) return 0; if (msg->rx_status && !(msg->rx_status & CEC_RX_STATUS_OK)) return 0; if (!msg->tx_status && !msg->rx_status) return 0; return !(msg->rx_status & CEC_RX_STATUS_FEATURE_ABORT); } #define CEC_LOG_ADDR_INVALID 0xff #define CEC_PHYS_ADDR_INVALID 0xffff /* * The maximum number of logical addresses one device can be assigned to. * The CEC 2.0 spec allows for only 2 logical addresses at the moment. The * Analog Devices CEC hardware supports 3. So let's go wild and go for 4. */ #define CEC_MAX_LOG_ADDRS 4 /* The logical addresses defined by CEC 2.0 */ #define CEC_LOG_ADDR_TV 0 #define CEC_LOG_ADDR_RECORD_1 1 #define CEC_LOG_ADDR_RECORD_2 2 #define CEC_LOG_ADDR_TUNER_1 3 #define CEC_LOG_ADDR_PLAYBACK_1 4 #define CEC_LOG_ADDR_AUDIOSYSTEM 5 #define CEC_LOG_ADDR_TUNER_2 6 #define CEC_LOG_ADDR_TUNER_3 7 #define CEC_LOG_ADDR_PLAYBACK_2 8 #define CEC_LOG_ADDR_RECORD_3 9 #define CEC_LOG_ADDR_TUNER_4 10 #define CEC_LOG_ADDR_PLAYBACK_3 11 #define CEC_LOG_ADDR_BACKUP_1 12 #define CEC_LOG_ADDR_BACKUP_2 13 #define CEC_LOG_ADDR_SPECIFIC 14 #define CEC_LOG_ADDR_UNREGISTERED 15 /* as initiator address */ #define CEC_LOG_ADDR_BROADCAST 15 /* as destination address */ /* The logical address types that the CEC device wants to claim */ #define CEC_LOG_ADDR_TYPE_TV 0 #define CEC_LOG_ADDR_TYPE_RECORD 1 #define CEC_LOG_ADDR_TYPE_TUNER 2 #define CEC_LOG_ADDR_TYPE_PLAYBACK 3 #define CEC_LOG_ADDR_TYPE_AUDIOSYSTEM 4 #define CEC_LOG_ADDR_TYPE_SPECIFIC 5 #define CEC_LOG_ADDR_TYPE_UNREGISTERED 6 /* * Switches should use UNREGISTERED. * Processors should use SPECIFIC. */ #define CEC_LOG_ADDR_MASK_TV (1 << CEC_LOG_ADDR_TV) #define CEC_LOG_ADDR_MASK_RECORD ((1 << CEC_LOG_ADDR_RECORD_1) | \ (1 << CEC_LOG_ADDR_RECORD_2) | \ (1 << CEC_LOG_ADDR_RECORD_3)) #define CEC_LOG_ADDR_MASK_TUNER ((1 << CEC_LOG_ADDR_TUNER_1) | \ (1 << CEC_LOG_ADDR_TUNER_2) | \ (1 << CEC_LOG_ADDR_TUNER_3) | \ (1 << CEC_LOG_ADDR_TUNER_4)) #define CEC_LOG_ADDR_MASK_PLAYBACK ((1 << CEC_LOG_ADDR_PLAYBACK_1) | \ (1 << CEC_LOG_ADDR_PLAYBACK_2) | \ (1 << CEC_LOG_ADDR_PLAYBACK_3)) #define CEC_LOG_ADDR_MASK_AUDIOSYSTEM (1 << CEC_LOG_ADDR_AUDIOSYSTEM) #define CEC_LOG_ADDR_MASK_BACKUP ((1 << CEC_LOG_ADDR_BACKUP_1) | \ (1 << CEC_LOG_ADDR_BACKUP_2)) #define CEC_LOG_ADDR_MASK_SPECIFIC (1 << CEC_LOG_ADDR_SPECIFIC) #define CEC_LOG_ADDR_MASK_UNREGISTERED (1 << CEC_LOG_ADDR_UNREGISTERED) static inline int cec_has_tv(__u16 log_addr_mask) { return log_addr_mask & CEC_LOG_ADDR_MASK_TV; } static inline int cec_has_record(__u16 log_addr_mask) { return log_addr_mask & CEC_LOG_ADDR_MASK_RECORD; } static inline int cec_has_tuner(__u16 log_addr_mask) { return log_addr_mask & CEC_LOG_ADDR_MASK_TUNER; } static inline int cec_has_playback(__u16 log_addr_mask) { return log_addr_mask & CEC_LOG_ADDR_MASK_PLAYBACK; } static inline int cec_has_audiosystem(__u16 log_addr_mask) { return log_addr_mask & CEC_LOG_ADDR_MASK_AUDIOSYSTEM; } static inline int cec_has_backup(__u16 log_addr_mask) { return log_addr_mask & CEC_LOG_ADDR_MASK_BACKUP; } static inline int cec_has_specific(__u16 log_addr_mask) { return log_addr_mask & CEC_LOG_ADDR_MASK_SPECIFIC; } static inline int cec_is_unregistered(__u16 log_addr_mask) { return log_addr_mask & CEC_LOG_ADDR_MASK_UNREGISTERED; } static inline int cec_is_unconfigured(__u16 log_addr_mask) { return log_addr_mask == 0; } /* * Use this if there is no vendor ID (CEC_G_VENDOR_ID) or if the vendor ID * should be disabled (CEC_S_VENDOR_ID) */ #define CEC_VENDOR_ID_NONE 0xffffffff /* The message handling modes */ /* Modes for initiator */ #define CEC_MODE_NO_INITIATOR (0x0 << 0) #define CEC_MODE_INITIATOR (0x1 << 0) #define CEC_MODE_EXCL_INITIATOR (0x2 << 0) #define CEC_MODE_INITIATOR_MSK 0x0f /* Modes for follower */ #define CEC_MODE_NO_FOLLOWER (0x0 << 4) #define CEC_MODE_FOLLOWER (0x1 << 4) #define CEC_MODE_EXCL_FOLLOWER (0x2 << 4) #define CEC_MODE_EXCL_FOLLOWER_PASSTHRU (0x3 << 4) #define CEC_MODE_MONITOR_PIN (0xd << 4) #define CEC_MODE_MONITOR (0xe << 4) #define CEC_MODE_MONITOR_ALL (0xf << 4) #define CEC_MODE_FOLLOWER_MSK 0xf0 /* Userspace has to configure the physical address */ #define CEC_CAP_PHYS_ADDR (1 << 0) /* Userspace has to configure the logical addresses */ #define CEC_CAP_LOG_ADDRS (1 << 1) /* Userspace can transmit messages (and thus become follower as well) */ #define CEC_CAP_TRANSMIT (1 << 2) /* * Passthrough all messages instead of processing them. */ #define CEC_CAP_PASSTHROUGH (1 << 3) /* Supports remote control */ #define CEC_CAP_RC (1 << 4) /* Hardware can monitor all messages, not just directed and broadcast. */ #define CEC_CAP_MONITOR_ALL (1 << 5) /* Hardware can use CEC only if the HDMI HPD pin is high. */ #define CEC_CAP_NEEDS_HPD (1 << 6) /* Hardware can monitor CEC pin transitions */ #define CEC_CAP_MONITOR_PIN (1 << 7) /* CEC_ADAP_G_CONNECTOR_INFO is available */ #define CEC_CAP_CONNECTOR_INFO (1 << 8) /* CEC_MSG_FL_REPLY_VENDOR_ID is available */ #define CEC_CAP_REPLY_VENDOR_ID (1 << 9) /** * struct cec_caps - CEC capabilities structure. * @driver: name of the CEC device driver. * @name: name of the CEC device. @driver + @name must be unique. * @available_log_addrs: number of available logical addresses. * @capabilities: capabilities of the CEC adapter. * @version: version of the CEC adapter framework. */ struct cec_caps { char driver[32]; char name[32]; __u32 available_log_addrs; __u32 capabilities; __u32 version; }; /** * struct cec_log_addrs - CEC logical addresses structure. * @log_addr: the claimed logical addresses. Set by the driver. * @log_addr_mask: current logical address mask. Set by the driver. * @cec_version: the CEC version that the adapter should implement. Set by the * caller. * @num_log_addrs: how many logical addresses should be claimed. Set by the * caller. * @vendor_id: the vendor ID of the device. Set by the caller. * @flags: flags. * @osd_name: the OSD name of the device. Set by the caller. * @primary_device_type: the primary device type for each logical address. * Set by the caller. * @log_addr_type: the logical address types. Set by the caller. * @all_device_types: CEC 2.0: all device types represented by the logical * address. Set by the caller. * @features: CEC 2.0: The logical address features. Set by the caller. */ struct cec_log_addrs { __u8 log_addr[CEC_MAX_LOG_ADDRS]; __u16 log_addr_mask; __u8 cec_version; __u8 num_log_addrs; __u32 vendor_id; __u32 flags; char osd_name[15]; __u8 primary_device_type[CEC_MAX_LOG_ADDRS]; __u8 log_addr_type[CEC_MAX_LOG_ADDRS]; /* CEC 2.0 */ __u8 all_device_types[CEC_MAX_LOG_ADDRS]; __u8 features[CEC_MAX_LOG_ADDRS][12]; }; /* Allow a fallback to unregistered */ #define CEC_LOG_ADDRS_FL_ALLOW_UNREG_FALLBACK (1 << 0) /* Passthrough RC messages to the input subsystem */ #define CEC_LOG_ADDRS_FL_ALLOW_RC_PASSTHRU (1 << 1) /* CDC-Only device: supports only CDC messages */ #define CEC_LOG_ADDRS_FL_CDC_ONLY (1 << 2) /** * struct cec_drm_connector_info - tells which drm connector is * associated with the CEC adapter. * @card_no: drm card number * @connector_id: drm connector ID */ struct cec_drm_connector_info { __u32 card_no; __u32 connector_id; }; #define CEC_CONNECTOR_TYPE_NO_CONNECTOR 0 #define CEC_CONNECTOR_TYPE_DRM 1 /** * struct cec_connector_info - tells if and which connector is * associated with the CEC adapter. * @type: connector type (if any) * @drm: drm connector info * @raw: array to pad the union */ struct cec_connector_info { __u32 type; union { struct cec_drm_connector_info drm; __u32 raw[16]; }; }; /* Events */ /* Event that occurs when the adapter state changes */ #define CEC_EVENT_STATE_CHANGE 1 /* * This event is sent when messages are lost because the application * didn't empty the message queue in time */ #define CEC_EVENT_LOST_MSGS 2 #define CEC_EVENT_PIN_CEC_LOW 3 #define CEC_EVENT_PIN_CEC_HIGH 4 #define CEC_EVENT_PIN_HPD_LOW 5 #define CEC_EVENT_PIN_HPD_HIGH 6 #define CEC_EVENT_PIN_5V_LOW 7 #define CEC_EVENT_PIN_5V_HIGH 8 #define CEC_EVENT_FL_INITIAL_STATE (1 << 0) #define CEC_EVENT_FL_DROPPED_EVENTS (1 << 1) /** * struct cec_event_state_change - used when the CEC adapter changes state. * @phys_addr: the current physical address * @log_addr_mask: the current logical address mask * @have_conn_info: if non-zero, then HDMI connector information is available. * This field is only valid if CEC_CAP_CONNECTOR_INFO is set. If that * capability is set and @have_conn_info is zero, then that indicates * that the HDMI connector device is not instantiated, either because * the HDMI driver is still configuring the device or because the HDMI * device was unbound. */ struct cec_event_state_change { __u16 phys_addr; __u16 log_addr_mask; __u16 have_conn_info; }; /** * struct cec_event_lost_msgs - tells you how many messages were lost. * @lost_msgs: how many messages were lost. */ struct cec_event_lost_msgs { __u32 lost_msgs; }; /** * struct cec_event - CEC event structure * @ts: the timestamp of when the event was sent. * @event: the event. * @flags: event flags. * @state_change: the event payload for CEC_EVENT_STATE_CHANGE. * @lost_msgs: the event payload for CEC_EVENT_LOST_MSGS. * @raw: array to pad the union. */ struct cec_event { __u64 ts; __u32 event; __u32 flags; union { struct cec_event_state_change state_change; struct cec_event_lost_msgs lost_msgs; __u32 raw[16]; }; }; /* ioctls */ /* Adapter capabilities */ #define CEC_ADAP_G_CAPS _IOWR('a', 0, struct cec_caps) /* * phys_addr is either 0 (if this is the CEC root device) * or a valid physical address obtained from the sink's EDID * as read by this CEC device (if this is a source device) * or a physical address obtained and modified from a sink * EDID and used for a sink CEC device. * If nothing is connected, then phys_addr is 0xffff. * See HDMI 1.4b, section 8.7 (Physical Address). * * The CEC_ADAP_S_PHYS_ADDR ioctl may not be available if that is handled * internally. */ #define CEC_ADAP_G_PHYS_ADDR _IOR('a', 1, __u16) #define CEC_ADAP_S_PHYS_ADDR _IOW('a', 2, __u16) /* * Configure the CEC adapter. It sets the device type and which * logical types it will try to claim. It will return which * logical addresses it could actually claim. * An error is returned if the adapter is disabled or if there * is no physical address assigned. */ #define CEC_ADAP_G_LOG_ADDRS _IOR('a', 3, struct cec_log_addrs) #define CEC_ADAP_S_LOG_ADDRS _IOWR('a', 4, struct cec_log_addrs) /* Transmit/receive a CEC command */ #define CEC_TRANSMIT _IOWR('a', 5, struct cec_msg) #define CEC_RECEIVE _IOWR('a', 6, struct cec_msg) /* Dequeue CEC events */ #define CEC_DQEVENT _IOWR('a', 7, struct cec_event) /* * Get and set the message handling mode for this filehandle. */ #define CEC_G_MODE _IOR('a', 8, __u32) #define CEC_S_MODE _IOW('a', 9, __u32) /* Get the connector info */ #define CEC_ADAP_G_CONNECTOR_INFO _IOR('a', 10, struct cec_connector_info) /* * The remainder of this header defines all CEC messages and operands. * The format matters since it the cec-ctl utility parses it to generate * code for implementing all these messages. * * Comments ending with 'Feature' group messages for each feature. * If messages are part of multiple features, then the "Has also" * comment is used to list the previously defined messages that are * supported by the feature. * * Before operands are defined a comment is added that gives the * name of the operand and in brackets the variable name of the * corresponding argument in the cec-funcs.h function. */ /* Messages */ /* One Touch Play Feature */ #define CEC_MSG_ACTIVE_SOURCE 0x82 #define CEC_MSG_IMAGE_VIEW_ON 0x04 #define CEC_MSG_TEXT_VIEW_ON 0x0d /* Routing Control Feature */ /* * Has also: * CEC_MSG_ACTIVE_SOURCE */ #define CEC_MSG_INACTIVE_SOURCE 0x9d #define CEC_MSG_REQUEST_ACTIVE_SOURCE 0x85 #define CEC_MSG_ROUTING_CHANGE 0x80 #define CEC_MSG_ROUTING_INFORMATION 0x81 #define CEC_MSG_SET_STREAM_PATH 0x86 /* Standby Feature */ #define CEC_MSG_STANDBY 0x36 /* One Touch Record Feature */ #define CEC_MSG_RECORD_OFF 0x0b #define CEC_MSG_RECORD_ON 0x09 /* Record Source Type Operand (rec_src_type) */ #define CEC_OP_RECORD_SRC_OWN 1 #define CEC_OP_RECORD_SRC_DIGITAL 2 #define CEC_OP_RECORD_SRC_ANALOG 3 #define CEC_OP_RECORD_SRC_EXT_PLUG 4 #define CEC_OP_RECORD_SRC_EXT_PHYS_ADDR 5 /* Service Identification Method Operand (service_id_method) */ #define CEC_OP_SERVICE_ID_METHOD_BY_DIG_ID 0 #define CEC_OP_SERVICE_ID_METHOD_BY_CHANNEL 1 /* Digital Service Broadcast System Operand (dig_bcast_system) */ #define CEC_OP_DIG_SERVICE_BCAST_SYSTEM_ARIB_GEN 0x00 #define CEC_OP_DIG_SERVICE_BCAST_SYSTEM_ATSC_GEN 0x01 #define CEC_OP_DIG_SERVICE_BCAST_SYSTEM_DVB_GEN 0x02 #define CEC_OP_DIG_SERVICE_BCAST_SYSTEM_ARIB_BS 0x08 #define CEC_OP_DIG_SERVICE_BCAST_SYSTEM_ARIB_CS 0x09 #define CEC_OP_DIG_SERVICE_BCAST_SYSTEM_ARIB_T 0x0a #define CEC_OP_DIG_SERVICE_BCAST_SYSTEM_ATSC_CABLE 0x10 #define CEC_OP_DIG_SERVICE_BCAST_SYSTEM_ATSC_SAT 0x11 #define CEC_OP_DIG_SERVICE_BCAST_SYSTEM_ATSC_T 0x12 #define CEC_OP_DIG_SERVICE_BCAST_SYSTEM_DVB_C 0x18 #define CEC_OP_DIG_SERVICE_BCAST_SYSTEM_DVB_S 0x19 #define CEC_OP_DIG_SERVICE_BCAST_SYSTEM_DVB_S2 0x1a #define CEC_OP_DIG_SERVICE_BCAST_SYSTEM_DVB_T 0x1b /* Analogue Broadcast Type Operand (ana_bcast_type) */ #define CEC_OP_ANA_BCAST_TYPE_CABLE 0 #define CEC_OP_ANA_BCAST_TYPE_SATELLITE 1 #define CEC_OP_ANA_BCAST_TYPE_TERRESTRIAL 2 /* Broadcast System Operand (bcast_system) */ #define CEC_OP_BCAST_SYSTEM_PAL_BG 0x00 #define CEC_OP_BCAST_SYSTEM_SECAM_LQ 0x01 /* SECAM L' */ #define CEC_OP_BCAST_SYSTEM_PAL_M 0x02 #define CEC_OP_BCAST_SYSTEM_NTSC_M 0x03 #define CEC_OP_BCAST_SYSTEM_PAL_I 0x04 #define CEC_OP_BCAST_SYSTEM_SECAM_DK 0x05 #define CEC_OP_BCAST_SYSTEM_SECAM_BG 0x06 #define CEC_OP_BCAST_SYSTEM_SECAM_L 0x07 #define CEC_OP_BCAST_SYSTEM_PAL_DK 0x08 #define CEC_OP_BCAST_SYSTEM_OTHER 0x1f /* Channel Number Format Operand (channel_number_fmt) */ #define CEC_OP_CHANNEL_NUMBER_FMT_1_PART 0x01 #define CEC_OP_CHANNEL_NUMBER_FMT_2_PART 0x02 #define CEC_MSG_RECORD_STATUS 0x0a /* Record Status Operand (rec_status) */ #define CEC_OP_RECORD_STATUS_CUR_SRC 0x01 #define CEC_OP_RECORD_STATUS_DIG_SERVICE 0x02 #define CEC_OP_RECORD_STATUS_ANA_SERVICE 0x03 #define CEC_OP_RECORD_STATUS_EXT_INPUT 0x04 #define CEC_OP_RECORD_STATUS_NO_DIG_SERVICE 0x05 #define CEC_OP_RECORD_STATUS_NO_ANA_SERVICE 0x06 #define CEC_OP_RECORD_STATUS_NO_SERVICE 0x07 #define CEC_OP_RECORD_STATUS_INVALID_EXT_PLUG 0x09 #define CEC_OP_RECORD_STATUS_INVALID_EXT_PHYS_ADDR 0x0a #define CEC_OP_RECORD_STATUS_UNSUP_CA 0x0b #define CEC_OP_RECORD_STATUS_NO_CA_ENTITLEMENTS 0x0c #define CEC_OP_RECORD_STATUS_CANT_COPY_SRC 0x0d #define CEC_OP_RECORD_STATUS_NO_MORE_COPIES 0x0e #define CEC_OP_RECORD_STATUS_NO_MEDIA 0x10 #define CEC_OP_RECORD_STATUS_PLAYING 0x11 #define CEC_OP_RECORD_STATUS_ALREADY_RECORDING 0x12 #define CEC_OP_RECORD_STATUS_MEDIA_PROT 0x13 #define CEC_OP_RECORD_STATUS_NO_SIGNAL 0x14 #define CEC_OP_RECORD_STATUS_MEDIA_PROBLEM 0x15 #define CEC_OP_RECORD_STATUS_NO_SPACE 0x16 #define CEC_OP_RECORD_STATUS_PARENTAL_LOCK 0x17 #define CEC_OP_RECORD_STATUS_TERMINATED_OK 0x1a #define CEC_OP_RECORD_STATUS_ALREADY_TERM 0x1b #define CEC_OP_RECORD_STATUS_OTHER 0x1f #define CEC_MSG_RECORD_TV_SCREEN 0x0f /* Timer Programming Feature */ #define CEC_MSG_CLEAR_ANALOGUE_TIMER 0x33 /* Recording Sequence Operand (recording_seq) */ #define CEC_OP_REC_SEQ_SUNDAY 0x01 #define CEC_OP_REC_SEQ_MONDAY 0x02 #define CEC_OP_REC_SEQ_TUESDAY 0x04 #define CEC_OP_REC_SEQ_WEDNESDAY 0x08 #define CEC_OP_REC_SEQ_THURSDAY 0x10 #define CEC_OP_REC_SEQ_FRIDAY 0x20 #define CEC_OP_REC_SEQ_SATURDAY 0x40 #define CEC_OP_REC_SEQ_ONCE_ONLY 0x00 #define CEC_MSG_CLEAR_DIGITAL_TIMER 0x99 #define CEC_MSG_CLEAR_EXT_TIMER 0xa1 /* External Source Specifier Operand (ext_src_spec) */ #define CEC_OP_EXT_SRC_PLUG 0x04 #define CEC_OP_EXT_SRC_PHYS_ADDR 0x05 #define CEC_MSG_SET_ANALOGUE_TIMER 0x34 #define CEC_MSG_SET_DIGITAL_TIMER 0x97 #define CEC_MSG_SET_EXT_TIMER 0xa2 #define CEC_MSG_SET_TIMER_PROGRAM_TITLE 0x67 #define CEC_MSG_TIMER_CLEARED_STATUS 0x43 /* Timer Cleared Status Data Operand (timer_cleared_status) */ #define CEC_OP_TIMER_CLR_STAT_RECORDING 0x00 #define CEC_OP_TIMER_CLR_STAT_NO_MATCHING 0x01 #define CEC_OP_TIMER_CLR_STAT_NO_INFO 0x02 #define CEC_OP_TIMER_CLR_STAT_CLEARED 0x80 #define CEC_MSG_TIMER_STATUS 0x35 /* Timer Overlap Warning Operand (timer_overlap_warning) */ #define CEC_OP_TIMER_OVERLAP_WARNING_NO_OVERLAP 0 #define CEC_OP_TIMER_OVERLAP_WARNING_OVERLAP 1 /* Media Info Operand (media_info) */ #define CEC_OP_MEDIA_INFO_UNPROT_MEDIA 0 #define CEC_OP_MEDIA_INFO_PROT_MEDIA 1 #define CEC_OP_MEDIA_INFO_NO_MEDIA 2 /* Programmed Indicator Operand (prog_indicator) */ #define CEC_OP_PROG_IND_NOT_PROGRAMMED 0 #define CEC_OP_PROG_IND_PROGRAMMED 1 /* Programmed Info Operand (prog_info) */ #define CEC_OP_PROG_INFO_ENOUGH_SPACE 0x08 #define CEC_OP_PROG_INFO_NOT_ENOUGH_SPACE 0x09 #define CEC_OP_PROG_INFO_MIGHT_NOT_BE_ENOUGH_SPACE 0x0b #define CEC_OP_PROG_INFO_NONE_AVAILABLE 0x0a /* Not Programmed Error Info Operand (prog_error) */ #define CEC_OP_PROG_ERROR_NO_FREE_TIMER 0x01 #define CEC_OP_PROG_ERROR_DATE_OUT_OF_RANGE 0x02 #define CEC_OP_PROG_ERROR_REC_SEQ_ERROR 0x03 #define CEC_OP_PROG_ERROR_INV_EXT_PLUG 0x04 #define CEC_OP_PROG_ERROR_INV_EXT_PHYS_ADDR 0x05 #define CEC_OP_PROG_ERROR_CA_UNSUPP 0x06 #define CEC_OP_PROG_ERROR_INSUF_CA_ENTITLEMENTS 0x07 #define CEC_OP_PROG_ERROR_RESOLUTION_UNSUPP 0x08 #define CEC_OP_PROG_ERROR_PARENTAL_LOCK 0x09 #define CEC_OP_PROG_ERROR_CLOCK_FAILURE 0x0a #define CEC_OP_PROG_ERROR_DUPLICATE 0x0e /* System Information Feature */ #define CEC_MSG_CEC_VERSION 0x9e /* CEC Version Operand (cec_version) */ #define CEC_OP_CEC_VERSION_1_3A 4 #define CEC_OP_CEC_VERSION_1_4 5 #define CEC_OP_CEC_VERSION_2_0 6 #define CEC_MSG_GET_CEC_VERSION 0x9f #define CEC_MSG_GIVE_PHYSICAL_ADDR 0x83 #define CEC_MSG_GET_MENU_LANGUAGE 0x91 #define CEC_MSG_REPORT_PHYSICAL_ADDR 0x84 /* Primary Device Type Operand (prim_devtype) */ #define CEC_OP_PRIM_DEVTYPE_TV 0 #define CEC_OP_PRIM_DEVTYPE_RECORD 1 #define CEC_OP_PRIM_DEVTYPE_TUNER 3 #define CEC_OP_PRIM_DEVTYPE_PLAYBACK 4 #define CEC_OP_PRIM_DEVTYPE_AUDIOSYSTEM 5 #define CEC_OP_PRIM_DEVTYPE_SWITCH 6 #define CEC_OP_PRIM_DEVTYPE_PROCESSOR 7 #define CEC_MSG_SET_MENU_LANGUAGE 0x32 #define CEC_MSG_REPORT_FEATURES 0xa6 /* HDMI 2.0 */ /* All Device Types Operand (all_device_types) */ #define CEC_OP_ALL_DEVTYPE_TV 0x80 #define CEC_OP_ALL_DEVTYPE_RECORD 0x40 #define CEC_OP_ALL_DEVTYPE_TUNER 0x20 #define CEC_OP_ALL_DEVTYPE_PLAYBACK 0x10 #define CEC_OP_ALL_DEVTYPE_AUDIOSYSTEM 0x08 #define CEC_OP_ALL_DEVTYPE_SWITCH 0x04 /* * And if you wondering what happened to PROCESSOR devices: those should * be mapped to a SWITCH. */ /* Valid for RC Profile and Device Feature operands */ #define CEC_OP_FEAT_EXT 0x80 /* Extension bit */ /* RC Profile Operand (rc_profile) */ #define CEC_OP_FEAT_RC_TV_PROFILE_NONE 0x00 #define CEC_OP_FEAT_RC_TV_PROFILE_1 0x02 #define CEC_OP_FEAT_RC_TV_PROFILE_2 0x06 #define CEC_OP_FEAT_RC_TV_PROFILE_3 0x0a #define CEC_OP_FEAT_RC_TV_PROFILE_4 0x0e #define CEC_OP_FEAT_RC_SRC_HAS_DEV_ROOT_MENU 0x50 #define CEC_OP_FEAT_RC_SRC_HAS_DEV_SETUP_MENU 0x48 #define CEC_OP_FEAT_RC_SRC_HAS_CONTENTS_MENU 0x44 #define CEC_OP_FEAT_RC_SRC_HAS_MEDIA_TOP_MENU 0x42 #define CEC_OP_FEAT_RC_SRC_HAS_MEDIA_CONTEXT_MENU 0x41 /* Device Feature Operand (dev_features) */ #define CEC_OP_FEAT_DEV_HAS_RECORD_TV_SCREEN 0x40 #define CEC_OP_FEAT_DEV_HAS_SET_OSD_STRING 0x20 #define CEC_OP_FEAT_DEV_HAS_DECK_CONTROL 0x10 #define CEC_OP_FEAT_DEV_HAS_SET_AUDIO_RATE 0x08 #define CEC_OP_FEAT_DEV_SINK_HAS_ARC_TX 0x04 #define CEC_OP_FEAT_DEV_SOURCE_HAS_ARC_RX 0x02 #define CEC_OP_FEAT_DEV_HAS_SET_AUDIO_VOLUME_LEVEL 0x01 #define CEC_MSG_GIVE_FEATURES 0xa5 /* HDMI 2.0 */ /* Deck Control Feature */ #define CEC_MSG_DECK_CONTROL 0x42 /* Deck Control Mode Operand (deck_control_mode) */ #define CEC_OP_DECK_CTL_MODE_SKIP_FWD 1 #define CEC_OP_DECK_CTL_MODE_SKIP_REV 2 #define CEC_OP_DECK_CTL_MODE_STOP 3 #define CEC_OP_DECK_CTL_MODE_EJECT 4 #define CEC_MSG_DECK_STATUS 0x1b /* Deck Info Operand (deck_info) */ #define CEC_OP_DECK_INFO_PLAY 0x11 #define CEC_OP_DECK_INFO_RECORD 0x12 #define CEC_OP_DECK_INFO_PLAY_REV 0x13 #define CEC_OP_DECK_INFO_STILL 0x14 #define CEC_OP_DECK_INFO_SLOW 0x15 #define CEC_OP_DECK_INFO_SLOW_REV 0x16 #define CEC_OP_DECK_INFO_FAST_FWD 0x17 #define CEC_OP_DECK_INFO_FAST_REV 0x18 #define CEC_OP_DECK_INFO_NO_MEDIA 0x19 #define CEC_OP_DECK_INFO_STOP 0x1a #define CEC_OP_DECK_INFO_SKIP_FWD 0x1b #define CEC_OP_DECK_INFO_SKIP_REV 0x1c #define CEC_OP_DECK_INFO_INDEX_SEARCH_FWD 0x1d #define CEC_OP_DECK_INFO_INDEX_SEARCH_REV 0x1e #define CEC_OP_DECK_INFO_OTHER 0x1f #define CEC_MSG_GIVE_DECK_STATUS 0x1a /* Status Request Operand (status_req) */ #define CEC_OP_STATUS_REQ_ON 1 #define CEC_OP_STATUS_REQ_OFF 2 #define CEC_OP_STATUS_REQ_ONCE 3 #define CEC_MSG_PLAY 0x41 /* Play Mode Operand (play_mode) */ #define CEC_OP_PLAY_MODE_PLAY_FWD 0x24 #define CEC_OP_PLAY_MODE_PLAY_REV 0x20 #define CEC_OP_PLAY_MODE_PLAY_STILL 0x25 #define CEC_OP_PLAY_MODE_PLAY_FAST_FWD_MIN 0x05 #define CEC_OP_PLAY_MODE_PLAY_FAST_FWD_MED 0x06 #define CEC_OP_PLAY_MODE_PLAY_FAST_FWD_MAX 0x07 #define CEC_OP_PLAY_MODE_PLAY_FAST_REV_MIN 0x09 #define CEC_OP_PLAY_MODE_PLAY_FAST_REV_MED 0x0a #define CEC_OP_PLAY_MODE_PLAY_FAST_REV_MAX 0x0b #define CEC_OP_PLAY_MODE_PLAY_SLOW_FWD_MIN 0x15 #define CEC_OP_PLAY_MODE_PLAY_SLOW_FWD_MED 0x16 #define CEC_OP_PLAY_MODE_PLAY_SLOW_FWD_MAX 0x17 #define CEC_OP_PLAY_MODE_PLAY_SLOW_REV_MIN 0x19 #define CEC_OP_PLAY_MODE_PLAY_SLOW_REV_MED 0x1a #define CEC_OP_PLAY_MODE_PLAY_SLOW_REV_MAX 0x1b /* Tuner Control Feature */ #define CEC_MSG_GIVE_TUNER_DEVICE_STATUS 0x08 #define CEC_MSG_SELECT_ANALOGUE_SERVICE 0x92 #define CEC_MSG_SELECT_DIGITAL_SERVICE 0x93 #define CEC_MSG_TUNER_DEVICE_STATUS 0x07 /* Recording Flag Operand (rec_flag) */ #define CEC_OP_REC_FLAG_NOT_USED 0 #define CEC_OP_REC_FLAG_USED 1 /* Tuner Display Info Operand (tuner_display_info) */ #define CEC_OP_TUNER_DISPLAY_INFO_DIGITAL 0 #define CEC_OP_TUNER_DISPLAY_INFO_NONE 1 #define CEC_OP_TUNER_DISPLAY_INFO_ANALOGUE 2 #define CEC_MSG_TUNER_STEP_DECREMENT 0x06 #define CEC_MSG_TUNER_STEP_INCREMENT 0x05 /* Vendor Specific Commands Feature */ /* * Has also: * CEC_MSG_CEC_VERSION * CEC_MSG_GET_CEC_VERSION */ #define CEC_MSG_DEVICE_VENDOR_ID 0x87 #define CEC_MSG_GIVE_DEVICE_VENDOR_ID 0x8c #define CEC_MSG_VENDOR_COMMAND 0x89 #define CEC_MSG_VENDOR_COMMAND_WITH_ID 0xa0 #define CEC_MSG_VENDOR_REMOTE_BUTTON_DOWN 0x8a #define CEC_MSG_VENDOR_REMOTE_BUTTON_UP 0x8b /* OSD Display Feature */ #define CEC_MSG_SET_OSD_STRING 0x64 /* Display Control Operand (disp_ctl) */ #define CEC_OP_DISP_CTL_DEFAULT 0x00 #define CEC_OP_DISP_CTL_UNTIL_CLEARED 0x40 #define CEC_OP_DISP_CTL_CLEAR 0x80 /* Device OSD Transfer Feature */ #define CEC_MSG_GIVE_OSD_NAME 0x46 #define CEC_MSG_SET_OSD_NAME 0x47 /* Device Menu Control Feature */ #define CEC_MSG_MENU_REQUEST 0x8d /* Menu Request Type Operand (menu_req) */ #define CEC_OP_MENU_REQUEST_ACTIVATE 0x00 #define CEC_OP_MENU_REQUEST_DEACTIVATE 0x01 #define CEC_OP_MENU_REQUEST_QUERY 0x02 #define CEC_MSG_MENU_STATUS 0x8e /* Menu State Operand (menu_state) */ #define CEC_OP_MENU_STATE_ACTIVATED 0x00 #define CEC_OP_MENU_STATE_DEACTIVATED 0x01 #define CEC_MSG_USER_CONTROL_PRESSED 0x44 /* UI Command Operand (ui_cmd) */ #define CEC_OP_UI_CMD_SELECT 0x00 #define CEC_OP_UI_CMD_UP 0x01 #define CEC_OP_UI_CMD_DOWN 0x02 #define CEC_OP_UI_CMD_LEFT 0x03 #define CEC_OP_UI_CMD_RIGHT 0x04 #define CEC_OP_UI_CMD_RIGHT_UP 0x05 #define CEC_OP_UI_CMD_RIGHT_DOWN 0x06 #define CEC_OP_UI_CMD_LEFT_UP 0x07 #define CEC_OP_UI_CMD_LEFT_DOWN 0x08 #define CEC_OP_UI_CMD_DEVICE_ROOT_MENU 0x09 #define CEC_OP_UI_CMD_DEVICE_SETUP_MENU 0x0a #define CEC_OP_UI_CMD_CONTENTS_MENU 0x0b #define CEC_OP_UI_CMD_FAVORITE_MENU 0x0c #define CEC_OP_UI_CMD_BACK 0x0d #define CEC_OP_UI_CMD_MEDIA_TOP_MENU 0x10 #define CEC_OP_UI_CMD_MEDIA_CONTEXT_SENSITIVE_MENU 0x11 #define CEC_OP_UI_CMD_NUMBER_ENTRY_MODE 0x1d #define CEC_OP_UI_CMD_NUMBER_11 0x1e #define CEC_OP_UI_CMD_NUMBER_12 0x1f #define CEC_OP_UI_CMD_NUMBER_0_OR_NUMBER_10 0x20 #define CEC_OP_UI_CMD_NUMBER_1 0x21 #define CEC_OP_UI_CMD_NUMBER_2 0x22 #define CEC_OP_UI_CMD_NUMBER_3 0x23 #define CEC_OP_UI_CMD_NUMBER_4 0x24 #define CEC_OP_UI_CMD_NUMBER_5 0x25 #define CEC_OP_UI_CMD_NUMBER_6 0x26 #define CEC_OP_UI_CMD_NUMBER_7 0x27 #define CEC_OP_UI_CMD_NUMBER_8 0x28 #define CEC_OP_UI_CMD_NUMBER_9 0x29 #define CEC_OP_UI_CMD_DOT 0x2a #define CEC_OP_UI_CMD_ENTER 0x2b #define CEC_OP_UI_CMD_CLEAR 0x2c #define CEC_OP_UI_CMD_NEXT_FAVORITE 0x2f #define CEC_OP_UI_CMD_CHANNEL_UP 0x30 #define CEC_OP_UI_CMD_CHANNEL_DOWN 0x31 #define CEC_OP_UI_CMD_PREVIOUS_CHANNEL 0x32 #define CEC_OP_UI_CMD_SOUND_SELECT 0x33 #define CEC_OP_UI_CMD_INPUT_SELECT 0x34 #define CEC_OP_UI_CMD_DISPLAY_INFORMATION 0x35 #define CEC_OP_UI_CMD_HELP 0x36 #define CEC_OP_UI_CMD_PAGE_UP 0x37 #define CEC_OP_UI_CMD_PAGE_DOWN 0x38 #define CEC_OP_UI_CMD_POWER 0x40 #define CEC_OP_UI_CMD_VOLUME_UP 0x41 #define CEC_OP_UI_CMD_VOLUME_DOWN 0x42 #define CEC_OP_UI_CMD_MUTE 0x43 #define CEC_OP_UI_CMD_PLAY 0x44 #define CEC_OP_UI_CMD_STOP 0x45 #define CEC_OP_UI_CMD_PAUSE 0x46 #define CEC_OP_UI_CMD_RECORD 0x47 #define CEC_OP_UI_CMD_REWIND 0x48 #define CEC_OP_UI_CMD_FAST_FORWARD 0x49 #define CEC_OP_UI_CMD_EJECT 0x4a #define CEC_OP_UI_CMD_SKIP_FORWARD 0x4b #define CEC_OP_UI_CMD_SKIP_BACKWARD 0x4c #define CEC_OP_UI_CMD_STOP_RECORD 0x4d #define CEC_OP_UI_CMD_PAUSE_RECORD 0x4e #define CEC_OP_UI_CMD_ANGLE 0x50 #define CEC_OP_UI_CMD_SUB_PICTURE 0x51 #define CEC_OP_UI_CMD_VIDEO_ON_DEMAND 0x52 #define CEC_OP_UI_CMD_ELECTRONIC_PROGRAM_GUIDE 0x53 #define CEC_OP_UI_CMD_TIMER_PROGRAMMING 0x54 #define CEC_OP_UI_CMD_INITIAL_CONFIGURATION 0x55 #define CEC_OP_UI_CMD_SELECT_BROADCAST_TYPE 0x56 #define CEC_OP_UI_CMD_SELECT_SOUND_PRESENTATION 0x57 #define CEC_OP_UI_CMD_AUDIO_DESCRIPTION 0x58 #define CEC_OP_UI_CMD_INTERNET 0x59 #define CEC_OP_UI_CMD_3D_MODE 0x5a #define CEC_OP_UI_CMD_PLAY_FUNCTION 0x60 #define CEC_OP_UI_CMD_PAUSE_PLAY_FUNCTION 0x61 #define CEC_OP_UI_CMD_RECORD_FUNCTION 0x62 #define CEC_OP_UI_CMD_PAUSE_RECORD_FUNCTION 0x63 #define CEC_OP_UI_CMD_STOP_FUNCTION 0x64 #define CEC_OP_UI_CMD_MUTE_FUNCTION 0x65 #define CEC_OP_UI_CMD_RESTORE_VOLUME_FUNCTION 0x66 #define CEC_OP_UI_CMD_TUNE_FUNCTION 0x67 #define CEC_OP_UI_CMD_SELECT_MEDIA_FUNCTION 0x68 #define CEC_OP_UI_CMD_SELECT_AV_INPUT_FUNCTION 0x69 #define CEC_OP_UI_CMD_SELECT_AUDIO_INPUT_FUNCTION 0x6a #define CEC_OP_UI_CMD_POWER_TOGGLE_FUNCTION 0x6b #define CEC_OP_UI_CMD_POWER_OFF_FUNCTION 0x6c #define CEC_OP_UI_CMD_POWER_ON_FUNCTION 0x6d #define CEC_OP_UI_CMD_F1_BLUE 0x71 #define CEC_OP_UI_CMD_F2_RED 0x72 #define CEC_OP_UI_CMD_F3_GREEN 0x73 #define CEC_OP_UI_CMD_F4_YELLOW 0x74 #define CEC_OP_UI_CMD_F5 0x75 #define CEC_OP_UI_CMD_DATA 0x76 /* UI Broadcast Type Operand (ui_bcast_type) */ #define CEC_OP_UI_BCAST_TYPE_TOGGLE_ALL 0x00 #define CEC_OP_UI_BCAST_TYPE_TOGGLE_DIG_ANA 0x01 #define CEC_OP_UI_BCAST_TYPE_ANALOGUE 0x10 #define CEC_OP_UI_BCAST_TYPE_ANALOGUE_T 0x20 #define CEC_OP_UI_BCAST_TYPE_ANALOGUE_CABLE 0x30 #define CEC_OP_UI_BCAST_TYPE_ANALOGUE_SAT 0x40 #define CEC_OP_UI_BCAST_TYPE_DIGITAL 0x50 #define CEC_OP_UI_BCAST_TYPE_DIGITAL_T 0x60 #define CEC_OP_UI_BCAST_TYPE_DIGITAL_CABLE 0x70 #define CEC_OP_UI_BCAST_TYPE_DIGITAL_SAT 0x80 #define CEC_OP_UI_BCAST_TYPE_DIGITAL_COM_SAT 0x90 #define CEC_OP_UI_BCAST_TYPE_DIGITAL_COM_SAT2 0x91 #define CEC_OP_UI_BCAST_TYPE_IP 0xa0 /* UI Sound Presentation Control Operand (ui_snd_pres_ctl) */ #define CEC_OP_UI_SND_PRES_CTL_DUAL_MONO 0x10 #define CEC_OP_UI_SND_PRES_CTL_KARAOKE 0x20 #define CEC_OP_UI_SND_PRES_CTL_DOWNMIX 0x80 #define CEC_OP_UI_SND_PRES_CTL_REVERB 0x90 #define CEC_OP_UI_SND_PRES_CTL_EQUALIZER 0xa0 #define CEC_OP_UI_SND_PRES_CTL_BASS_UP 0xb1 #define CEC_OP_UI_SND_PRES_CTL_BASS_NEUTRAL 0xb2 #define CEC_OP_UI_SND_PRES_CTL_BASS_DOWN 0xb3 #define CEC_OP_UI_SND_PRES_CTL_TREBLE_UP 0xc1 #define CEC_OP_UI_SND_PRES_CTL_TREBLE_NEUTRAL 0xc2 #define CEC_OP_UI_SND_PRES_CTL_TREBLE_DOWN 0xc3 #define CEC_MSG_USER_CONTROL_RELEASED 0x45 /* Remote Control Passthrough Feature */ /* * Has also: * CEC_MSG_USER_CONTROL_PRESSED * CEC_MSG_USER_CONTROL_RELEASED */ /* Power Status Feature */ #define CEC_MSG_GIVE_DEVICE_POWER_STATUS 0x8f #define CEC_MSG_REPORT_POWER_STATUS 0x90 /* Power Status Operand (pwr_state) */ #define CEC_OP_POWER_STATUS_ON 0 #define CEC_OP_POWER_STATUS_STANDBY 1 #define CEC_OP_POWER_STATUS_TO_ON 2 #define CEC_OP_POWER_STATUS_TO_STANDBY 3 /* General Protocol Messages */ #define CEC_MSG_FEATURE_ABORT 0x00 /* Abort Reason Operand (reason) */ #define CEC_OP_ABORT_UNRECOGNIZED_OP 0 #define CEC_OP_ABORT_INCORRECT_MODE 1 #define CEC_OP_ABORT_NO_SOURCE 2 #define CEC_OP_ABORT_INVALID_OP 3 #define CEC_OP_ABORT_REFUSED 4 #define CEC_OP_ABORT_UNDETERMINED 5 #define CEC_MSG_ABORT 0xff /* System Audio Control Feature */ /* * Has also: * CEC_MSG_USER_CONTROL_PRESSED * CEC_MSG_USER_CONTROL_RELEASED */ #define CEC_MSG_GIVE_AUDIO_STATUS 0x71 #define CEC_MSG_GIVE_SYSTEM_AUDIO_MODE_STATUS 0x7d #define CEC_MSG_REPORT_AUDIO_STATUS 0x7a /* Audio Mute Status Operand (aud_mute_status) */ #define CEC_OP_AUD_MUTE_STATUS_OFF 0 #define CEC_OP_AUD_MUTE_STATUS_ON 1 #define CEC_MSG_REPORT_SHORT_AUDIO_DESCRIPTOR 0xa3 #define CEC_MSG_REQUEST_SHORT_AUDIO_DESCRIPTOR 0xa4 #define CEC_MSG_SET_SYSTEM_AUDIO_MODE 0x72 /* System Audio Status Operand (sys_aud_status) */ #define CEC_OP_SYS_AUD_STATUS_OFF 0 #define CEC_OP_SYS_AUD_STATUS_ON 1 #define CEC_MSG_SYSTEM_AUDIO_MODE_REQUEST 0x70 #define CEC_MSG_SYSTEM_AUDIO_MODE_STATUS 0x7e /* Audio Format ID Operand (audio_format_id) */ #define CEC_OP_AUD_FMT_ID_CEA861 0 #define CEC_OP_AUD_FMT_ID_CEA861_CXT 1 #define CEC_MSG_SET_AUDIO_VOLUME_LEVEL 0x73 /* Audio Rate Control Feature */ #define CEC_MSG_SET_AUDIO_RATE 0x9a /* Audio Rate Operand (audio_rate) */ #define CEC_OP_AUD_RATE_OFF 0 #define CEC_OP_AUD_RATE_WIDE_STD 1 #define CEC_OP_AUD_RATE_WIDE_FAST 2 #define CEC_OP_AUD_RATE_WIDE_SLOW 3 #define CEC_OP_AUD_RATE_NARROW_STD 4 #define CEC_OP_AUD_RATE_NARROW_FAST 5 #define CEC_OP_AUD_RATE_NARROW_SLOW 6 /* Audio Return Channel Control Feature */ #define CEC_MSG_INITIATE_ARC 0xc0 #define CEC_MSG_REPORT_ARC_INITIATED 0xc1 #define CEC_MSG_REPORT_ARC_TERMINATED 0xc2 #define CEC_MSG_REQUEST_ARC_INITIATION 0xc3 #define CEC_MSG_REQUEST_ARC_TERMINATION 0xc4 #define CEC_MSG_TERMINATE_ARC 0xc5 /* Dynamic Audio Lipsync Feature */ /* Only for CEC 2.0 and up */ #define CEC_MSG_REQUEST_CURRENT_LATENCY 0xa7 #define CEC_MSG_REPORT_CURRENT_LATENCY 0xa8 /* Low Latency Mode Operand (low_latency_mode) */ #define CEC_OP_LOW_LATENCY_MODE_OFF 0 #define CEC_OP_LOW_LATENCY_MODE_ON 1 /* Audio Output Compensated Operand (audio_out_compensated) */ #define CEC_OP_AUD_OUT_COMPENSATED_NA 0 #define CEC_OP_AUD_OUT_COMPENSATED_DELAY 1 #define CEC_OP_AUD_OUT_COMPENSATED_NO_DELAY 2 #define CEC_OP_AUD_OUT_COMPENSATED_PARTIAL_DELAY 3 /* Capability Discovery and Control Feature */ #define CEC_MSG_CDC_MESSAGE 0xf8 /* Ethernet-over-HDMI: nobody ever does this... */ #define CEC_MSG_CDC_HEC_INQUIRE_STATE 0x00 #define CEC_MSG_CDC_HEC_REPORT_STATE 0x01 /* HEC Functionality State Operand (hec_func_state) */ #define CEC_OP_HEC_FUNC_STATE_NOT_SUPPORTED 0 #define CEC_OP_HEC_FUNC_STATE_INACTIVE 1 #define CEC_OP_HEC_FUNC_STATE_ACTIVE 2 #define CEC_OP_HEC_FUNC_STATE_ACTIVATION_FIELD 3 /* Host Functionality State Operand (host_func_state) */ #define CEC_OP_HOST_FUNC_STATE_NOT_SUPPORTED 0 #define CEC_OP_HOST_FUNC_STATE_INACTIVE 1 #define CEC_OP_HOST_FUNC_STATE_ACTIVE 2 /* ENC Functionality State Operand (enc_func_state) */ #define CEC_OP_ENC_FUNC_STATE_EXT_CON_NOT_SUPPORTED 0 #define CEC_OP_ENC_FUNC_STATE_EXT_CON_INACTIVE 1 #define CEC_OP_ENC_FUNC_STATE_EXT_CON_ACTIVE 2 /* CDC Error Code Operand (cdc_errcode) */ #define CEC_OP_CDC_ERROR_CODE_NONE 0 #define CEC_OP_CDC_ERROR_CODE_CAP_UNSUPPORTED 1 #define CEC_OP_CDC_ERROR_CODE_WRONG_STATE 2 #define CEC_OP_CDC_ERROR_CODE_OTHER 3 /* HEC Support Operand (hec_support) */ #define CEC_OP_HEC_SUPPORT_NO 0 #define CEC_OP_HEC_SUPPORT_YES 1 /* HEC Activation Operand (hec_activation) */ #define CEC_OP_HEC_ACTIVATION_ON 0 #define CEC_OP_HEC_ACTIVATION_OFF 1 #define CEC_MSG_CDC_HEC_SET_STATE_ADJACENT 0x02 #define CEC_MSG_CDC_HEC_SET_STATE 0x03 /* HEC Set State Operand (hec_set_state) */ #define CEC_OP_HEC_SET_STATE_DEACTIVATE 0 #define CEC_OP_HEC_SET_STATE_ACTIVATE 1 #define CEC_MSG_CDC_HEC_REQUEST_DEACTIVATION 0x04 #define CEC_MSG_CDC_HEC_NOTIFY_ALIVE 0x05 #define CEC_MSG_CDC_HEC_DISCOVER 0x06 /* Hotplug Detect messages */ #define CEC_MSG_CDC_HPD_SET_STATE 0x10 /* HPD State Operand (hpd_state) */ #define CEC_OP_HPD_STATE_CP_EDID_DISABLE 0 #define CEC_OP_HPD_STATE_CP_EDID_ENABLE 1 #define CEC_OP_HPD_STATE_CP_EDID_DISABLE_ENABLE 2 #define CEC_OP_HPD_STATE_EDID_DISABLE 3 #define CEC_OP_HPD_STATE_EDID_ENABLE 4 #define CEC_OP_HPD_STATE_EDID_DISABLE_ENABLE 5 #define CEC_MSG_CDC_HPD_REPORT_STATE 0x11 /* HPD Error Code Operand (hpd_error) */ #define CEC_OP_HPD_ERROR_NONE 0 #define CEC_OP_HPD_ERROR_INITIATOR_NOT_CAPABLE 1 #define CEC_OP_HPD_ERROR_INITIATOR_WRONG_STATE 2 #define CEC_OP_HPD_ERROR_OTHER 3 #define CEC_OP_HPD_ERROR_NONE_NO_VIDEO 4 /* End of Messages */ /* Helper functions to identify the 'special' CEC devices */ static inline int cec_is_2nd_tv(const struct cec_log_addrs *las) { /* * It is a second TV if the logical address is 14 or 15 and the * primary device type is a TV. */ return las->num_log_addrs && las->log_addr[0] >= CEC_LOG_ADDR_SPECIFIC && las->primary_device_type[0] == CEC_OP_PRIM_DEVTYPE_TV; } static inline int cec_is_processor(const struct cec_log_addrs *las) { /* * It is a processor if the logical address is 12-15 and the * primary device type is a Processor. */ return las->num_log_addrs && las->log_addr[0] >= CEC_LOG_ADDR_BACKUP_1 && las->primary_device_type[0] == CEC_OP_PRIM_DEVTYPE_PROCESSOR; } static inline int cec_is_switch(const struct cec_log_addrs *las) { /* * It is a switch if the logical address is 15 and the * primary device type is a Switch and the CDC-Only flag is not set. */ return las->num_log_addrs == 1 && las->log_addr[0] == CEC_LOG_ADDR_UNREGISTERED && las->primary_device_type[0] == CEC_OP_PRIM_DEVTYPE_SWITCH && !(las->flags & CEC_LOG_ADDRS_FL_CDC_ONLY); } static inline int cec_is_cdc_only(const struct cec_log_addrs *las) { /* * It is a CDC-only device if the logical address is 15 and the * primary device type is a Switch and the CDC-Only flag is set. */ return las->num_log_addrs == 1 && las->log_addr[0] == CEC_LOG_ADDR_UNREGISTERED && las->primary_device_type[0] == CEC_OP_PRIM_DEVTYPE_SWITCH && (las->flags & CEC_LOG_ADDRS_FL_CDC_ONLY); } #endif |
| 5 5 5 5 5 5 5 5 42 42 7 7 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 | // SPDX-License-Identifier: GPL-2.0+ /* * Procedures for creating, accessing and interpreting the device tree. * * Paul Mackerras August 1996. * Copyright (C) 1996-2005 Paul Mackerras. * * Adapted for 64bit PowerPC by Dave Engebretsen and Peter Bergner. * {engebret|bergner}@us.ibm.com * * Adapted for sparc and sparc64 by David S. Miller davem@davemloft.net * * Reconsolidated from arch/x/kernel/prom.c by Stephen Rothwell and * Grant Likely. */ #define pr_fmt(fmt) "OF: " fmt #include <linux/cleanup.h> #include <linux/console.h> #include <linux/ctype.h> #include <linux/cpu.h> #include <linux/module.h> #include <linux/of.h> #include <linux/of_device.h> #include <linux/of_graph.h> #include <linux/spinlock.h> #include <linux/slab.h> #include <linux/string.h> #include <linux/proc_fs.h> #include "of_private.h" LIST_HEAD(aliases_lookup); struct device_node *of_root; EXPORT_SYMBOL(of_root); struct device_node *of_chosen; EXPORT_SYMBOL(of_chosen); struct device_node *of_aliases; struct device_node *of_stdout; static const char *of_stdout_options; struct kset *of_kset; /* * Used to protect the of_aliases, to hold off addition of nodes to sysfs. * This mutex must be held whenever modifications are being made to the * device tree. The of_{attach,detach}_node() and * of_{add,remove,update}_property() helpers make sure this happens. */ DEFINE_MUTEX(of_mutex); /* use when traversing tree through the child, sibling, * or parent members of struct device_node. */ DEFINE_RAW_SPINLOCK(devtree_lock); bool of_node_name_eq(const struct device_node *np, const char *name) { const char *node_name; size_t len; if (!np) return false; node_name = kbasename(np->full_name); len = strchrnul(node_name, '@') - node_name; return (strlen(name) == len) && (strncmp(node_name, name, len) == 0); } EXPORT_SYMBOL(of_node_name_eq); bool of_node_name_prefix(const struct device_node *np, const char *prefix) { if (!np) return false; return strncmp(kbasename(np->full_name), prefix, strlen(prefix)) == 0; } EXPORT_SYMBOL(of_node_name_prefix); static bool __of_node_is_type(const struct device_node *np, const char *type) { const char *match = __of_get_property(np, "device_type", NULL); return np && match && type && !strcmp(match, type); } #define EXCLUDED_DEFAULT_CELLS_PLATFORMS ( \ IS_ENABLED(CONFIG_SPARC) || \ of_find_compatible_node(NULL, NULL, "coreboot") \ ) int of_bus_n_addr_cells(struct device_node *np) { u32 cells; for (; np; np = np->parent) { if (!of_property_read_u32(np, "#address-cells", &cells)) return cells; /* * Default root value and walking parent nodes for "#address-cells" * is deprecated. Any platforms which hit this warning should * be added to the excluded list. */ WARN_ONCE(!EXCLUDED_DEFAULT_CELLS_PLATFORMS, "Missing '#address-cells' in %pOF\n", np); } return OF_ROOT_NODE_ADDR_CELLS_DEFAULT; } int of_n_addr_cells(struct device_node *np) { if (np->parent) np = np->parent; return of_bus_n_addr_cells(np); } EXPORT_SYMBOL(of_n_addr_cells); int of_bus_n_size_cells(struct device_node *np) { u32 cells; for (; np; np = np->parent) { if (!of_property_read_u32(np, "#size-cells", &cells)) return cells; /* * Default root value and walking parent nodes for "#size-cells" * is deprecated. Any platforms which hit this warning should * be added to the excluded list. */ WARN_ONCE(!EXCLUDED_DEFAULT_CELLS_PLATFORMS, "Missing '#size-cells' in %pOF\n", np); } return OF_ROOT_NODE_SIZE_CELLS_DEFAULT; } int of_n_size_cells(struct device_node *np) { if (np->parent) np = np->parent; return of_bus_n_size_cells(np); } EXPORT_SYMBOL(of_n_size_cells); #ifdef CONFIG_NUMA int __weak of_node_to_nid(struct device_node *np) { return NUMA_NO_NODE; } #endif #define OF_PHANDLE_CACHE_BITS 7 #define OF_PHANDLE_CACHE_SZ BIT(OF_PHANDLE_CACHE_BITS) static struct device_node *phandle_cache[OF_PHANDLE_CACHE_SZ]; static u32 of_phandle_cache_hash(phandle handle) { return hash_32(handle, OF_PHANDLE_CACHE_BITS); } /* * Caller must hold devtree_lock. */ void __of_phandle_cache_inv_entry(phandle handle) { u32 handle_hash; struct device_node *np; if (!handle) return; handle_hash = of_phandle_cache_hash(handle); np = phandle_cache[handle_hash]; if (np && handle == np->phandle) phandle_cache[handle_hash] = NULL; } void __init of_core_init(void) { struct device_node *np; of_platform_register_reconfig_notifier(); /* Create the kset, and register existing nodes */ mutex_lock(&of_mutex); of_kset = kset_create_and_add("devicetree", NULL, firmware_kobj); if (!of_kset) { mutex_unlock(&of_mutex); pr_err("failed to register existing nodes\n"); return; } for_each_of_allnodes(np) { __of_attach_node_sysfs(np); if (np->phandle && !phandle_cache[of_phandle_cache_hash(np->phandle)]) phandle_cache[of_phandle_cache_hash(np->phandle)] = np; } mutex_unlock(&of_mutex); /* Symlink in /proc as required by userspace ABI */ if (of_root) proc_symlink("device-tree", NULL, "/sys/firmware/devicetree/base"); } static struct property *__of_find_property(const struct device_node *np, const char *name, int *lenp) { struct property *pp; if (!np) return NULL; for (pp = np->properties; pp; pp = pp->next) { if (of_prop_cmp(pp->name, name) == 0) { if (lenp) *lenp = pp->length; break; } } return pp; } struct property *of_find_property(const struct device_node *np, const char *name, int *lenp) { struct property *pp; unsigned long flags; raw_spin_lock_irqsave(&devtree_lock, flags); pp = __of_find_property(np, name, lenp); raw_spin_unlock_irqrestore(&devtree_lock, flags); return pp; } EXPORT_SYMBOL(of_find_property); struct device_node *__of_find_all_nodes(struct device_node *prev) { struct device_node *np; if (!prev) { np = of_root; } else if (prev->child) { np = prev->child; } else { /* Walk back up looking for a sibling, or the end of the structure */ np = prev; while (np->parent && !np->sibling) np = np->parent; np = np->sibling; /* Might be null at the end of the tree */ } return np; } /** * of_find_all_nodes - Get next node in global list * @prev: Previous node or NULL to start iteration * of_node_put() will be called on it * * Return: A node pointer with refcount incremented, use * of_node_put() on it when done. */ struct device_node *of_find_all_nodes(struct device_node *prev) { struct device_node *np; unsigned long flags; raw_spin_lock_irqsave(&devtree_lock, flags); np = __of_find_all_nodes(prev); of_node_get(np); of_node_put(prev); raw_spin_unlock_irqrestore(&devtree_lock, flags); return np; } EXPORT_SYMBOL(of_find_all_nodes); /* * Find a property with a given name for a given node * and return the value. */ const void *__of_get_property(const struct device_node *np, const char *name, int *lenp) { const struct property *pp = __of_find_property(np, name, lenp); return pp ? pp->value : NULL; } /* * Find a property with a given name for a given node * and return the value. */ const void *of_get_property(const struct device_node *np, const char *name, int *lenp) { const struct property *pp = of_find_property(np, name, lenp); return pp ? pp->value : NULL; } EXPORT_SYMBOL(of_get_property); /** * __of_device_is_compatible() - Check if the node matches given constraints * @device: pointer to node * @compat: required compatible string, NULL or "" for any match * @type: required device_type value, NULL or "" for any match * @name: required node name, NULL or "" for any match * * Checks if the given @compat, @type and @name strings match the * properties of the given @device. A constraints can be skipped by * passing NULL or an empty string as the constraint. * * Returns 0 for no match, and a positive integer on match. The return * value is a relative score with larger values indicating better * matches. The score is weighted for the most specific compatible value * to get the highest score. Matching type is next, followed by matching * name. Practically speaking, this results in the following priority * order for matches: * * 1. specific compatible && type && name * 2. specific compatible && type * 3. specific compatible && name * 4. specific compatible * 5. general compatible && type && name * 6. general compatible && type * 7. general compatible && name * 8. general compatible * 9. type && name * 10. type * 11. name */ static int __of_device_is_compatible(const struct device_node *device, const char *compat, const char *type, const char *name) { const struct property *prop; const char *cp; int index = 0, score = 0; /* Compatible match has highest priority */ if (compat && compat[0]) { prop = __of_find_property(device, "compatible", NULL); for (cp = of_prop_next_string(prop, NULL); cp; cp = of_prop_next_string(prop, cp), index++) { if (of_compat_cmp(cp, compat, strlen(compat)) == 0) { score = INT_MAX/2 - (index << 2); break; } } if (!score) return 0; } /* Matching type is better than matching name */ if (type && type[0]) { if (!__of_node_is_type(device, type)) return 0; score += 2; } /* Matching name is a bit better than not */ if (name && name[0]) { if (!of_node_name_eq(device, name)) return 0; score++; } return score; } /** Checks if the given "compat" string matches one of the strings in * the device's "compatible" property */ int of_device_is_compatible(const struct device_node *device, const char *compat) { unsigned long flags; int res; raw_spin_lock_irqsave(&devtree_lock, flags); res = __of_device_is_compatible(device, compat, NULL, NULL); raw_spin_unlock_irqrestore(&devtree_lock, flags); return res; } EXPORT_SYMBOL(of_device_is_compatible); /** Checks if the device is compatible with any of the entries in * a NULL terminated array of strings. Returns the best match * score or 0. */ int of_device_compatible_match(const struct device_node *device, const char *const *compat) { unsigned int tmp, score = 0; if (!compat) return 0; while (*compat) { tmp = of_device_is_compatible(device, *compat); if (tmp > score) score = tmp; compat++; } return score; } EXPORT_SYMBOL_GPL(of_device_compatible_match); /** * of_machine_compatible_match - Test root of device tree against a compatible array * @compats: NULL terminated array of compatible strings to look for in root node's compatible property. * * Returns true if the root node has any of the given compatible values in its * compatible property. */ bool of_machine_compatible_match(const char *const *compats) { struct device_node *root; int rc = 0; root = of_find_node_by_path("/"); if (root) { rc = of_device_compatible_match(root, compats); of_node_put(root); } return rc != 0; } EXPORT_SYMBOL(of_machine_compatible_match); /** * of_machine_device_match - Test root of device tree against a of_device_id array * @matches: NULL terminated array of of_device_id match structures to search in * * Returns true if the root node has any of the given compatible values in its * compatible property. */ bool of_machine_device_match(const struct of_device_id *matches) { struct device_node *root; const struct of_device_id *match = NULL; root = of_find_node_by_path("/"); if (root) { match = of_match_node(matches, root); of_node_put(root); } return match != NULL; } EXPORT_SYMBOL(of_machine_device_match); /** * of_machine_get_match_data - Tell if root of device tree has a matching of_match structure * @matches: NULL terminated array of of_device_id match structures to search in * * Returns data associated with matched entry or NULL */ const void *of_machine_get_match_data(const struct of_device_id *matches) { const struct of_device_id *match; struct device_node *root; root = of_find_node_by_path("/"); if (!root) return NULL; match = of_match_node(matches, root); of_node_put(root); if (!match) return NULL; return match->data; } EXPORT_SYMBOL(of_machine_get_match_data); static bool __of_device_is_status(const struct device_node *device, const char * const*strings) { const char *status; int statlen; if (!device) return false; status = __of_get_property(device, "status", &statlen); if (status == NULL) return false; if (statlen > 0) { while (*strings) { unsigned int len = strlen(*strings); if ((*strings)[len - 1] == '-') { if (!strncmp(status, *strings, len)) return true; } else { if (!strcmp(status, *strings)) return true; } strings++; } } return false; } /** * __of_device_is_available - check if a device is available for use * * @device: Node to check for availability, with locks already held * * Return: True if the status property is absent or set to "okay" or "ok", * false otherwise */ static bool __of_device_is_available(const struct device_node *device) { static const char * const ok[] = {"okay", "ok", NULL}; if (!device) return false; return !__of_get_property(device, "status", NULL) || __of_device_is_status(device, ok); } /** * __of_device_is_reserved - check if a device is reserved * * @device: Node to check for availability, with locks already held * * Return: True if the status property is set to "reserved", false otherwise */ static bool __of_device_is_reserved(const struct device_node *device) { static const char * const reserved[] = {"reserved", NULL}; return __of_device_is_status(device, reserved); } /** * of_device_is_available - check if a device is available for use * * @device: Node to check for availability * * Return: True if the status property is absent or set to "okay" or "ok", * false otherwise */ bool of_device_is_available(const struct device_node *device) { unsigned long flags; bool res; raw_spin_lock_irqsave(&devtree_lock, flags); res = __of_device_is_available(device); raw_spin_unlock_irqrestore(&devtree_lock, flags); return res; } EXPORT_SYMBOL(of_device_is_available); /** * __of_device_is_fail - check if a device has status "fail" or "fail-..." * * @device: Node to check status for, with locks already held * * Return: True if the status property is set to "fail" or "fail-..." (for any * error code suffix), false otherwise */ static bool __of_device_is_fail(const struct device_node *device) { static const char * const fail[] = {"fail", "fail-", NULL}; return __of_device_is_status(device, fail); } /** * of_device_is_big_endian - check if a device has BE registers * * @device: Node to check for endianness * * Return: True if the device has a "big-endian" property, or if the kernel * was compiled for BE *and* the device has a "native-endian" property. * Returns false otherwise. * * Callers would nominally use ioread32be/iowrite32be if * of_device_is_big_endian() == true, or readl/writel otherwise. */ bool of_device_is_big_endian(const struct device_node *device) { if (of_property_read_bool(device, "big-endian")) return true; if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN) && of_property_read_bool(device, "native-endian")) return true; return false; } EXPORT_SYMBOL(of_device_is_big_endian); /** * of_get_parent - Get a node's parent if any * @node: Node to get parent * * Return: A node pointer with refcount incremented, use * of_node_put() on it when done. */ struct device_node *of_get_parent(const struct device_node *node) { struct device_node *np; unsigned long flags; if (!node) return NULL; raw_spin_lock_irqsave(&devtree_lock, flags); np = of_node_get(node->parent); raw_spin_unlock_irqrestore(&devtree_lock, flags); return np; } EXPORT_SYMBOL(of_get_parent); /** * of_get_next_parent - Iterate to a node's parent * @node: Node to get parent of * * This is like of_get_parent() except that it drops the * refcount on the passed node, making it suitable for iterating * through a node's parents. * * Return: A node pointer with refcount incremented, use * of_node_put() on it when done. */ struct device_node *of_get_next_parent(struct device_node *node) { struct device_node *parent; unsigned long flags; if (!node) return NULL; raw_spin_lock_irqsave(&devtree_lock, flags); parent = of_node_get(node->parent); of_node_put(node); raw_spin_unlock_irqrestore(&devtree_lock, flags); return parent; } EXPORT_SYMBOL(of_get_next_parent); static struct device_node *__of_get_next_child(const struct device_node *node, struct device_node *prev) { struct device_node *next; if (!node) return NULL; next = prev ? prev->sibling : node->child; of_node_get(next); of_node_put(prev); return next; } #define __for_each_child_of_node(parent, child) \ for (child = __of_get_next_child(parent, NULL); child != NULL; \ child = __of_get_next_child(parent, child)) /** * of_get_next_child - Iterate a node childs * @node: parent node * @prev: previous child of the parent node, or NULL to get first * * Return: A node pointer with refcount incremented, use of_node_put() on * it when done. Returns NULL when prev is the last child. Decrements the * refcount of prev. */ struct device_node *of_get_next_child(const struct device_node *node, struct device_node *prev) { struct device_node *next; unsigned long flags; raw_spin_lock_irqsave(&devtree_lock, flags); next = __of_get_next_child(node, prev); raw_spin_unlock_irqrestore(&devtree_lock, flags); return next; } EXPORT_SYMBOL(of_get_next_child); /** * of_get_next_child_with_prefix - Find the next child node with prefix * @node: parent node * @prev: previous child of the parent node, or NULL to get first * @prefix: prefix that the node name should have * * This function is like of_get_next_child(), except that it automatically * skips any nodes whose name doesn't have the given prefix. * * Return: A node pointer with refcount incremented, use * of_node_put() on it when done. */ struct device_node *of_get_next_child_with_prefix(const struct device_node *node, struct device_node *prev, const char *prefix) { struct device_node *next; unsigned long flags; if (!node) return NULL; raw_spin_lock_irqsave(&devtree_lock, flags); next = prev ? prev->sibling : node->child; for (; next; next = next->sibling) { if (!of_node_name_prefix(next, prefix)) continue; if (of_node_get(next)) break; } of_node_put(prev); raw_spin_unlock_irqrestore(&devtree_lock, flags); return next; } EXPORT_SYMBOL(of_get_next_child_with_prefix); static struct device_node *of_get_next_status_child(const struct device_node *node, struct device_node *prev, bool (*checker)(const struct device_node *)) { struct device_node *next; unsigned long flags; if (!node) return NULL; raw_spin_lock_irqsave(&devtree_lock, flags); next = prev ? prev->sibling : node->child; for (; next; next = next->sibling) { if (!checker(next)) continue; if (of_node_get(next)) break; } of_node_put(prev); raw_spin_unlock_irqrestore(&devtree_lock, flags); return next; } /** * of_get_next_available_child - Find the next available child node * @node: parent node * @prev: previous child of the parent node, or NULL to get first * * This function is like of_get_next_child(), except that it * automatically skips any disabled nodes (i.e. status = "disabled"). */ struct device_node *of_get_next_available_child(const struct device_node *node, struct device_node *prev) { return of_get_next_status_child(node, prev, __of_device_is_available); } EXPORT_SYMBOL(of_get_next_available_child); /** * of_get_next_reserved_child - Find the next reserved child node * @node: parent node * @prev: previous child of the parent node, or NULL to get first * * This function is like of_get_next_child(), except that it * automatically skips any disabled nodes (i.e. status = "disabled"). */ struct device_node *of_get_next_reserved_child(const struct device_node *node, struct device_node *prev) { return of_get_next_status_child(node, prev, __of_device_is_reserved); } EXPORT_SYMBOL(of_get_next_reserved_child); /** * of_get_next_cpu_node - Iterate on cpu nodes * @prev: previous child of the /cpus node, or NULL to get first * * Unusable CPUs (those with the status property set to "fail" or "fail-...") * will be skipped. * * Return: A cpu node pointer with refcount incremented, use of_node_put() * on it when done. Returns NULL when prev is the last child. Decrements * the refcount of prev. */ struct device_node *of_get_next_cpu_node(struct device_node *prev) { struct device_node *next = NULL; unsigned long flags; struct device_node *node; if (!prev) node = of_find_node_by_path("/cpus"); raw_spin_lock_irqsave(&devtree_lock, flags); if (prev) next = prev->sibling; else if (node) { next = node->child; of_node_put(node); } for (; next; next = next->sibling) { if (__of_device_is_fail(next)) continue; if (!(of_node_name_eq(next, "cpu") || __of_node_is_type(next, "cpu"))) continue; if (of_node_get(next)) break; } of_node_put(prev); raw_spin_unlock_irqrestore(&devtree_lock, flags); return next; } EXPORT_SYMBOL(of_get_next_cpu_node); /** * of_get_compatible_child - Find compatible child node * @parent: parent node * @compatible: compatible string * * Lookup child node whose compatible property contains the given compatible * string. * * Return: a node pointer with refcount incremented, use of_node_put() on it * when done; or NULL if not found. */ struct device_node *of_get_compatible_child(const struct device_node *parent, const char *compatible) { struct device_node *child; for_each_child_of_node(parent, child) { if (of_device_is_compatible(child, compatible)) break; } return child; } EXPORT_SYMBOL(of_get_compatible_child); /** * of_get_child_by_name - Find the child node by name for a given parent * @node: parent node * @name: child name to look for. * * This function looks for child node for given matching name * * Return: A node pointer if found, with refcount incremented, use * of_node_put() on it when done. * Returns NULL if node is not found. */ struct device_node *of_get_child_by_name(const struct device_node *node, const char *name) { struct device_node *child; for_each_child_of_node(node, child) if (of_node_name_eq(child, name)) break; return child; } EXPORT_SYMBOL(of_get_child_by_name); /** * of_get_available_child_by_name - Find the available child node by name for a given parent * @node: parent node * @name: child name to look for. * * This function looks for child node for given matching name and checks the * device's availability for use. * * Return: A node pointer if found, with refcount incremented, use * of_node_put() on it when done. * Returns NULL if node is not found. */ struct device_node *of_get_available_child_by_name(const struct device_node *node, const char *name) { struct device_node *child; child = of_get_child_by_name(node, name); if (child && !of_device_is_available(child)) { of_node_put(child); return NULL; } return child; } EXPORT_SYMBOL(of_get_available_child_by_name); struct device_node *__of_find_node_by_path(const struct device_node *parent, const char *path) { struct device_node *child; int len; len = strcspn(path, "/:"); if (!len) return NULL; __for_each_child_of_node(parent, child) { const char *name = kbasename(child->full_name); if (strncmp(path, name, len) == 0 && (strlen(name) == len)) return child; } return NULL; } struct device_node *__of_find_node_by_full_path(struct device_node *node, const char *path) { const char *separator = strchr(path, ':'); while (node && *path == '/') { struct device_node *tmp = node; path++; /* Increment past '/' delimiter */ node = __of_find_node_by_path(node, path); of_node_put(tmp); path = strchrnul(path, '/'); if (separator && separator < path) break; } return node; } /** * of_find_node_opts_by_path - Find a node matching a full OF path * @path: Either the full path to match, or if the path does not * start with '/', the name of a property of the /aliases * node (an alias). In the case of an alias, the node * matching the alias' value will be returned. * @opts: Address of a pointer into which to store the start of * an options string appended to the end of the path with * a ':' separator. * * Valid paths: * * /foo/bar Full path * * foo Valid alias * * foo/bar Valid alias + relative path * * Return: A node pointer with refcount incremented, use * of_node_put() on it when done. */ struct device_node *of_find_node_opts_by_path(const char *path, const char **opts) { struct device_node *np = NULL; const struct property *pp; unsigned long flags; const char *separator = strchr(path, ':'); if (opts) *opts = separator ? separator + 1 : NULL; if (strcmp(path, "/") == 0) return of_node_get(of_root); /* The path could begin with an alias */ if (*path != '/') { int len; const char *p = strchrnul(path, '/'); if (separator && separator < p) p = separator; len = p - path; /* of_aliases must not be NULL */ if (!of_aliases) return NULL; for_each_property_of_node(of_aliases, pp) { if (strlen(pp->name) == len && !strncmp(pp->name, path, len)) { np = of_find_node_by_path(pp->value); break; } } if (!np) return NULL; path = p; } /* Step down the tree matching path components */ raw_spin_lock_irqsave(&devtree_lock, flags); if (!np) np = of_node_get(of_root); np = __of_find_node_by_full_path(np, path); raw_spin_unlock_irqrestore(&devtree_lock, flags); return np; } EXPORT_SYMBOL(of_find_node_opts_by_path); /** * of_find_node_by_name - Find a node by its "name" property * @from: The node to start searching from or NULL; the node * you pass will not be searched, only the next one * will. Typically, you pass what the previous call * returned. of_node_put() will be called on @from. * @name: The name string to match against * * Return: A node pointer with refcount incremented, use * of_node_put() on it when done. */ struct device_node *of_find_node_by_name(struct device_node *from, const char *name) { struct device_node *np; unsigned long flags; raw_spin_lock_irqsave(&devtree_lock, flags); for_each_of_allnodes_from(from, np) if (of_node_name_eq(np, name) && of_node_get(np)) break; of_node_put(from); raw_spin_unlock_irqrestore(&devtree_lock, flags); return np; } EXPORT_SYMBOL(of_find_node_by_name); /** * of_find_node_by_type - Find a node by its "device_type" property * @from: The node to start searching from, or NULL to start searching * the entire device tree. The node you pass will not be * searched, only the next one will; typically, you pass * what the previous call returned. of_node_put() will be * called on from for you. * @type: The type string to match against * * Return: A node pointer with refcount incremented, use * of_node_put() on it when done. */ struct device_node *of_find_node_by_type(struct device_node *from, const char *type) { struct device_node *np; unsigned long flags; raw_spin_lock_irqsave(&devtree_lock, flags); for_each_of_allnodes_from(from, np) if (__of_node_is_type(np, type) && of_node_get(np)) break; of_node_put(from); raw_spin_unlock_irqrestore(&devtree_lock, flags); return np; } EXPORT_SYMBOL(of_find_node_by_type); /** * of_find_compatible_node - Find a node based on type and one of the * tokens in its "compatible" property * @from: The node to start searching from or NULL, the node * you pass will not be searched, only the next one * will; typically, you pass what the previous call * returned. of_node_put() will be called on it * @type: The type string to match "device_type" or NULL to ignore * @compatible: The string to match to one of the tokens in the device * "compatible" list. * * Return: A node pointer with refcount incremented, use * of_node_put() on it when done. */ struct device_node *of_find_compatible_node(struct device_node *from, const char *type, const char *compatible) { struct device_node *np; unsigned long flags; raw_spin_lock_irqsave(&devtree_lock, flags); for_each_of_allnodes_from(from, np) if (__of_device_is_compatible(np, compatible, type, NULL) && of_node_get(np)) break; of_node_put(from); raw_spin_unlock_irqrestore(&devtree_lock, flags); return np; } EXPORT_SYMBOL(of_find_compatible_node); /** * of_find_node_with_property - Find a node which has a property with * the given name. * @from: The node to start searching from or NULL, the node * you pass will not be searched, only the next one * will; typically, you pass what the previous call * returned. of_node_put() will be called on it * @prop_name: The name of the property to look for. * * Return: A node pointer with refcount incremented, use * of_node_put() on it when done. */ struct device_node *of_find_node_with_property(struct device_node *from, const char *prop_name) { struct device_node *np; unsigned long flags; raw_spin_lock_irqsave(&devtree_lock, flags); for_each_of_allnodes_from(from, np) { if (__of_find_property(np, prop_name, NULL)) { of_node_get(np); break; } } of_node_put(from); raw_spin_unlock_irqrestore(&devtree_lock, flags); return np; } EXPORT_SYMBOL(of_find_node_with_property); static const struct of_device_id *__of_match_node(const struct of_device_id *matches, const struct device_node *node) { const struct of_device_id *best_match = NULL; int score, best_score = 0; if (!matches) return NULL; for (; matches->name[0] || matches->type[0] || matches->compatible[0]; matches++) { score = __of_device_is_compatible(node, matches->compatible, matches->type, matches->name); if (score > best_score) { best_match = matches; best_score = score; } } return best_match; } /** * of_match_node - Tell if a device_node has a matching of_match structure * @matches: array of of device match structures to search in * @node: the of device structure to match against * * Low level utility function used by device matching. */ const struct of_device_id *of_match_node(const struct of_device_id *matches, const struct device_node *node) { const struct of_device_id *match; unsigned long flags; raw_spin_lock_irqsave(&devtree_lock, flags); match = __of_match_node(matches, node); raw_spin_unlock_irqrestore(&devtree_lock, flags); return match; } EXPORT_SYMBOL(of_match_node); /** * of_find_matching_node_and_match - Find a node based on an of_device_id * match table. * @from: The node to start searching from or NULL, the node * you pass will not be searched, only the next one * will; typically, you pass what the previous call * returned. of_node_put() will be called on it * @matches: array of of device match structures to search in * @match: Updated to point at the matches entry which matched * * Return: A node pointer with refcount incremented, use * of_node_put() on it when done. */ struct device_node *of_find_matching_node_and_match(struct device_node *from, const struct of_device_id *matches, const struct of_device_id **match) { struct device_node *np; const struct of_device_id *m; unsigned long flags; if (match) *match = NULL; raw_spin_lock_irqsave(&devtree_lock, flags); for_each_of_allnodes_from(from, np) { m = __of_match_node(matches, np); if (m && of_node_get(np)) { if (match) *match = m; break; } } of_node_put(from); raw_spin_unlock_irqrestore(&devtree_lock, flags); return np; } EXPORT_SYMBOL(of_find_matching_node_and_match); /** * of_alias_from_compatible - Lookup appropriate alias for a device node * depending on compatible * @node: pointer to a device tree node * @alias: Pointer to buffer that alias value will be copied into * @len: Length of alias value * * Based on the value of the compatible property, this routine will attempt * to choose an appropriate alias value for a particular device tree node. * It does this by stripping the manufacturer prefix (as delimited by a ',') * from the first entry in the compatible list property. * * Note: The matching on just the "product" side of the compatible is a relic * from I2C and SPI. Please do not add any new user. * * Return: This routine returns 0 on success, <0 on failure. */ int of_alias_from_compatible(const struct device_node *node, char *alias, int len) { const char *compatible, *p; int cplen; compatible = of_get_property(node, "compatible", &cplen); if (!compatible || strlen(compatible) > cplen) return -ENODEV; p = strchr(compatible, ','); strscpy(alias, p ? p + 1 : compatible, len); return 0; } EXPORT_SYMBOL_GPL(of_alias_from_compatible); /** * of_find_node_by_phandle - Find a node given a phandle * @handle: phandle of the node to find * * Return: A node pointer with refcount incremented, use * of_node_put() on it when done. */ struct device_node *of_find_node_by_phandle(phandle handle) { struct device_node *np = NULL; unsigned long flags; u32 handle_hash; if (!handle) return NULL; handle_hash = of_phandle_cache_hash(handle); raw_spin_lock_irqsave(&devtree_lock, flags); if (phandle_cache[handle_hash] && handle == phandle_cache[handle_hash]->phandle) np = phandle_cache[handle_hash]; if (!np) { for_each_of_allnodes(np) if (np->phandle == handle && !of_node_check_flag(np, OF_DETACHED)) { phandle_cache[handle_hash] = np; break; } } of_node_get(np); raw_spin_unlock_irqrestore(&devtree_lock, flags); return np; } EXPORT_SYMBOL(of_find_node_by_phandle); void of_print_phandle_args(const char *msg, const struct of_phandle_args *args) { int i; printk("%s %pOF", msg, args->np); for (i = 0; i < args->args_count; i++) { const char delim = i ? ',' : ':'; pr_cont("%c%08x", delim, args->args[i]); } pr_cont("\n"); } int of_phandle_iterator_init(struct of_phandle_iterator *it, const struct device_node *np, const char *list_name, const char *cells_name, int cell_count) { const __be32 *list; int size; memset(it, 0, sizeof(*it)); /* * one of cell_count or cells_name must be provided to determine the * argument length. */ if (cell_count < 0 && !cells_name) return -EINVAL; list = of_get_property(np, list_name, &size); if (!list) return -ENOENT; it->cells_name = cells_name; it->cell_count = cell_count; it->parent = np; it->list_end = list + size / sizeof(*list); it->phandle_end = list; it->cur = list; return 0; } EXPORT_SYMBOL_GPL(of_phandle_iterator_init); int of_phandle_iterator_next(struct of_phandle_iterator *it) { uint32_t count = 0; if (it->node) { of_node_put(it->node); it->node = NULL; } if (!it->cur || it->phandle_end >= it->list_end) return -ENOENT; it->cur = it->phandle_end; /* If phandle is 0, then it is an empty entry with no arguments. */ it->phandle = be32_to_cpup(it->cur++); if (it->phandle) { /* * Find the provider node and parse the #*-cells property to * determine the argument length. */ it->node = of_find_node_by_phandle(it->phandle); if (it->cells_name) { if (!it->node) { pr_err("%pOF: could not find phandle %d\n", it->parent, it->phandle); goto err; } if (of_property_read_u32(it->node, it->cells_name, &count)) { /* * If both cell_count and cells_name is given, * fall back to cell_count in absence * of the cells_name property */ if (it->cell_count >= 0) { count = it->cell_count; } else { pr_err("%pOF: could not get %s for %pOF\n", it->parent, it->cells_name, it->node); goto err; } } } else { count = it->cell_count; } /* * Make sure that the arguments actually fit in the remaining * property data length */ if (it->cur + count > it->list_end) { if (it->cells_name) pr_err("%pOF: %s = %d found %td\n", it->parent, it->cells_name, count, it->list_end - it->cur); else pr_err("%pOF: phandle %s needs %d, found %td\n", it->parent, of_node_full_name(it->node), count, it->list_end - it->cur); goto err; } } it->phandle_end = it->cur + count; it->cur_count = count; return 0; err: if (it->node) { of_node_put(it->node); it->node = NULL; } return -EINVAL; } EXPORT_SYMBOL_GPL(of_phandle_iterator_next); int of_phandle_iterator_args(struct of_phandle_iterator *it, uint32_t *args, int size) { int i, count; count = it->cur_count; if (WARN_ON(size < count)) count = size; for (i = 0; i < count; i++) args[i] = be32_to_cpup(it->cur++); return count; } int __of_parse_phandle_with_args(const struct device_node *np, const char *list_name, const char *cells_name, int cell_count, int index, struct of_phandle_args *out_args) { struct of_phandle_iterator it; int rc, cur_index = 0; if (index < 0) return -EINVAL; /* Loop over the phandles until all the requested entry is found */ of_for_each_phandle(&it, rc, np, list_name, cells_name, cell_count) { /* * All of the error cases bail out of the loop, so at * this point, the parsing is successful. If the requested * index matches, then fill the out_args structure and return, * or return -ENOENT for an empty entry. */ rc = -ENOENT; if (cur_index == index) { if (!it.phandle) goto err; if (out_args) { int c; c = of_phandle_iterator_args(&it, out_args->args, MAX_PHANDLE_ARGS); out_args->np = it.node; out_args->args_count = c; } else { of_node_put(it.node); } /* Found it! return success */ return 0; } cur_index++; } /* * Unlock node before returning result; will be one of: * -ENOENT : index is for empty phandle * -EINVAL : parsing error on data */ err: of_node_put(it.node); return rc; } EXPORT_SYMBOL(__of_parse_phandle_with_args); /** * of_parse_phandle_with_args_map() - Find a node pointed by phandle in a list and remap it * @np: pointer to a device tree node containing a list * @list_name: property name that contains a list * @stem_name: stem of property names that specify phandles' arguments count * @index: index of a phandle to parse out * @out_args: optional pointer to output arguments structure (will be filled) * * This function is useful to parse lists of phandles and their arguments. * Returns 0 on success and fills out_args, on error returns appropriate errno * value. The difference between this function and of_parse_phandle_with_args() * is that this API remaps a phandle if the node the phandle points to has * a <@stem_name>-map property. * * Caller is responsible to call of_node_put() on the returned out_args->np * pointer. * * Example:: * * phandle1: node1 { * #list-cells = <2>; * }; * * phandle2: node2 { * #list-cells = <1>; * }; * * phandle3: node3 { * #list-cells = <1>; * list-map = <0 &phandle2 3>, * <1 &phandle2 2>, * <2 &phandle1 5 1>; * list-map-mask = <0x3>; * }; * * node4 { * list = <&phandle1 1 2 &phandle3 0>; * }; * * To get a device_node of the ``node2`` node you may call this: * of_parse_phandle_with_args(node4, "list", "list", 1, &args); */ int of_parse_phandle_with_args_map(const struct device_node *np, const char *list_name, const char *stem_name, int index, struct of_phandle_args *out_args) { char *cells_name __free(kfree) = kasprintf(GFP_KERNEL, "#%s-cells", stem_name); char *map_name __free(kfree) = kasprintf(GFP_KERNEL, "%s-map", stem_name); char *mask_name __free(kfree) = kasprintf(GFP_KERNEL, "%s-map-mask", stem_name); char *pass_name __free(kfree) = kasprintf(GFP_KERNEL, "%s-map-pass-thru", stem_name); struct device_node *cur, *new = NULL; const __be32 *map, *mask, *pass; static const __be32 dummy_mask[] = { [0 ... (MAX_PHANDLE_ARGS - 1)] = cpu_to_be32(~0) }; static const __be32 dummy_pass[] = { [0 ... (MAX_PHANDLE_ARGS - 1)] = cpu_to_be32(0) }; __be32 initial_match_array[MAX_PHANDLE_ARGS]; const __be32 *match_array = initial_match_array; int i, ret, map_len, match; u32 list_size, new_size; if (index < 0) return -EINVAL; if (!cells_name || !map_name || !mask_name || !pass_name) return -ENOMEM; ret = __of_parse_phandle_with_args(np, list_name, cells_name, -1, index, out_args); if (ret) return ret; /* Get the #<list>-cells property */ cur = out_args->np; ret = of_property_read_u32(cur, cells_name, &list_size); if (ret < 0) goto put; /* Precalculate the match array - this simplifies match loop */ for (i = 0; i < list_size; i++) initial_match_array[i] = cpu_to_be32(out_args->args[i]); ret = -EINVAL; while (cur) { /* Get the <list>-map property */ map = of_get_property(cur, map_name, &map_len); if (!map) { return 0; } map_len /= sizeof(u32); /* Get the <list>-map-mask property (optional) */ mask = of_get_property(cur, mask_name, NULL); if (!mask) mask = dummy_mask; /* Iterate through <list>-map property */ match = 0; while (map_len > (list_size + 1) && !match) { /* Compare specifiers */ match = 1; for (i = 0; i < list_size; i++, map_len--) match &= !((match_array[i] ^ *map++) & mask[i]); of_node_put(new); new = of_find_node_by_phandle(be32_to_cpup(map)); map++; map_len--; /* Check if not found */ if (!new) { ret = -EINVAL; goto put; } if (!of_device_is_available(new)) match = 0; ret = of_property_read_u32(new, cells_name, &new_size); if (ret) goto put; /* Check for malformed properties */ if (WARN_ON(new_size > MAX_PHANDLE_ARGS) || map_len < new_size) { ret = -EINVAL; goto put; } /* Move forward by new node's #<list>-cells amount */ map += new_size; map_len -= new_size; } if (!match) { ret = -ENOENT; goto put; } /* Get the <list>-map-pass-thru property (optional) */ pass = of_get_property(cur, pass_name, NULL); if (!pass) pass = dummy_pass; /* * Successfully parsed a <list>-map translation; copy new * specifier into the out_args structure, keeping the * bits specified in <list>-map-pass-thru. */ for (i = 0; i < new_size; i++) { __be32 val = *(map - new_size + i); if (i < list_size) { val &= ~pass[i]; val |= cpu_to_be32(out_args->args[i]) & pass[i]; } initial_match_array[i] = val; out_args->args[i] = be32_to_cpu(val); } out_args->args_count = list_size = new_size; /* Iterate again with new provider */ out_args->np = new; of_node_put(cur); cur = new; new = NULL; } put: of_node_put(cur); of_node_put(new); return ret; } EXPORT_SYMBOL(of_parse_phandle_with_args_map); /** * of_count_phandle_with_args() - Find the number of phandles references in a property * @np: pointer to a device tree node containing a list * @list_name: property name that contains a list * @cells_name: property name that specifies phandles' arguments count * * Return: The number of phandle + argument tuples within a property. It * is a typical pattern to encode a list of phandle and variable * arguments into a single property. The number of arguments is encoded * by a property in the phandle-target node. For example, a gpios * property would contain a list of GPIO specifies consisting of a * phandle and 1 or more arguments. The number of arguments are * determined by the #gpio-cells property in the node pointed to by the * phandle. */ int of_count_phandle_with_args(const struct device_node *np, const char *list_name, const char *cells_name) { struct of_phandle_iterator it; int rc, cur_index = 0; /* * If cells_name is NULL we assume a cell count of 0. This makes * counting the phandles trivial as each 32bit word in the list is a * phandle and no arguments are to consider. So we don't iterate through * the list but just use the length to determine the phandle count. */ if (!cells_name) { const __be32 *list; int size; list = of_get_property(np, list_name, &size); if (!list) return -ENOENT; return size / sizeof(*list); } rc = of_phandle_iterator_init(&it, np, list_name, cells_name, -1); if (rc) return rc; while ((rc = of_phandle_iterator_next(&it)) == 0) cur_index += 1; if (rc != -ENOENT) return rc; return cur_index; } EXPORT_SYMBOL(of_count_phandle_with_args); static struct property *__of_remove_property_from_list(struct property **list, struct property *prop) { struct property **next; for (next = list; *next; next = &(*next)->next) { if (*next == prop) { *next = prop->next; prop->next = NULL; return prop; } } return NULL; } /** * __of_add_property - Add a property to a node without lock operations * @np: Caller's Device Node * @prop: Property to add */ int __of_add_property(struct device_node *np, struct property *prop) { int rc = 0; unsigned long flags; struct property **next; raw_spin_lock_irqsave(&devtree_lock, flags); __of_remove_property_from_list(&np->deadprops, prop); prop->next = NULL; next = &np->properties; while (*next) { if (of_prop_cmp(prop->name, (*next)->name) == 0) { /* duplicate ! don't insert it */ rc = -EEXIST; goto out_unlock; } next = &(*next)->next; } *next = prop; out_unlock: raw_spin_unlock_irqrestore(&devtree_lock, flags); if (rc) return rc; __of_add_property_sysfs(np, prop); return 0; } /** * of_add_property - Add a property to a node * @np: Caller's Device Node * @prop: Property to add */ int of_add_property(struct device_node *np, struct property *prop) { int rc; mutex_lock(&of_mutex); rc = __of_add_property(np, prop); mutex_unlock(&of_mutex); if (!rc) of_property_notify(OF_RECONFIG_ADD_PROPERTY, np, prop, NULL); return rc; } EXPORT_SYMBOL_GPL(of_add_property); int __of_remove_property(struct device_node *np, struct property *prop) { unsigned long flags; int rc = -ENODEV; raw_spin_lock_irqsave(&devtree_lock, flags); if (__of_remove_property_from_list(&np->properties, prop)) { /* Found the property, add it to deadprops list */ prop->next = np->deadprops; np->deadprops = prop; rc = 0; } raw_spin_unlock_irqrestore(&devtree_lock, flags); if (rc) return rc; __of_remove_property_sysfs(np, prop); return 0; } /** * of_remove_property - Remove a property from a node. * @np: Caller's Device Node * @prop: Property to remove * * Note that we don't actually remove it, since we have given out * who-knows-how-many pointers to the data using get-property. * Instead we just move the property to the "dead properties" * list, so it won't be found any more. */ int of_remove_property(struct device_node *np, struct property *prop) { int rc; if (!prop) return -ENODEV; mutex_lock(&of_mutex); rc = __of_remove_property(np, prop); mutex_unlock(&of_mutex); if (!rc) of_property_notify(OF_RECONFIG_REMOVE_PROPERTY, np, prop, NULL); return rc; } EXPORT_SYMBOL_GPL(of_remove_property); int __of_update_property(struct device_node *np, struct property *newprop, struct property **oldpropp) { struct property **next, *oldprop; unsigned long flags; raw_spin_lock_irqsave(&devtree_lock, flags); __of_remove_property_from_list(&np->deadprops, newprop); for (next = &np->properties; *next; next = &(*next)->next) { if (of_prop_cmp((*next)->name, newprop->name) == 0) break; } *oldpropp = oldprop = *next; if (oldprop) { /* replace the node */ newprop->next = oldprop->next; *next = newprop; oldprop->next = np->deadprops; np->deadprops = oldprop; } else { /* new node */ newprop->next = NULL; *next = newprop; } raw_spin_unlock_irqrestore(&devtree_lock, flags); __of_update_property_sysfs(np, newprop, oldprop); return 0; } /* * of_update_property - Update a property in a node, if the property does * not exist, add it. * * Note that we don't actually remove it, since we have given out * who-knows-how-many pointers to the data using get-property. * Instead we just move the property to the "dead properties" list, * and add the new property to the property list */ int of_update_property(struct device_node *np, struct property *newprop) { struct property *oldprop; int rc; if (!newprop->name) return -EINVAL; mutex_lock(&of_mutex); rc = __of_update_property(np, newprop, &oldprop); mutex_unlock(&of_mutex); if (!rc) of_property_notify(OF_RECONFIG_UPDATE_PROPERTY, np, newprop, oldprop); return rc; } static void of_alias_add(struct alias_prop *ap, struct device_node *np, int id, const char *stem, int stem_len) { ap->np = np; ap->id = id; strscpy(ap->stem, stem, stem_len + 1); list_add_tail(&ap->link, &aliases_lookup); pr_debug("adding DT alias:%s: stem=%s id=%i node=%pOF\n", ap->alias, ap->stem, ap->id, np); } /** * of_alias_scan - Scan all properties of the 'aliases' node * @dt_alloc: An allocator that provides a virtual address to memory * for storing the resulting tree * * The function scans all the properties of the 'aliases' node and populates * the global lookup table with the properties. */ void of_alias_scan(void * (*dt_alloc)(u64 size, u64 align)) { const struct property *pp; of_aliases = of_find_node_by_path("/aliases"); of_chosen = of_find_node_by_path("/chosen"); if (of_chosen == NULL) of_chosen = of_find_node_by_path("/chosen@0"); if (of_chosen) { /* linux,stdout-path and /aliases/stdout are for legacy compatibility */ const char *name = NULL; if (of_property_read_string(of_chosen, "stdout-path", &name)) of_property_read_string(of_chosen, "linux,stdout-path", &name); if (IS_ENABLED(CONFIG_PPC) && !name) of_property_read_string(of_aliases, "stdout", &name); if (name) of_stdout = of_find_node_opts_by_path(name, &of_stdout_options); if (of_stdout) of_stdout->fwnode.flags |= FWNODE_FLAG_BEST_EFFORT; } if (!of_aliases) return; for_each_property_of_node(of_aliases, pp) { const char *start = pp->name; const char *end = start + strlen(start); struct device_node *np; struct alias_prop *ap; int id, len; /* Skip those we do not want to proceed */ if (is_pseudo_property(pp->name)) continue; np = of_find_node_by_path(pp->value); if (!np) continue; /* walk the alias backwards to extract the id and work out * the 'stem' string */ while (isdigit(*(end-1)) && end > start) end--; len = end - start; if (kstrtoint(end, 10, &id) < 0) { of_node_put(np); continue; } /* Allocate an alias_prop with enough space for the stem */ ap = dt_alloc(sizeof(*ap) + len + 1, __alignof__(*ap)); if (!ap) { of_node_put(np); continue; } memset(ap, 0, sizeof(*ap) + len + 1); ap->alias = start; of_alias_add(ap, np, id, start, len); } } /** * of_alias_get_id - Get alias id for the given device_node * @np: Pointer to the given device_node * @stem: Alias stem of the given device_node * * The function travels the lookup table to get the alias id for the given * device_node and alias stem. * * Return: The alias id if found. */ int of_alias_get_id(const struct device_node *np, const char *stem) { struct alias_prop *app; int id = -ENODEV; mutex_lock(&of_mutex); list_for_each_entry(app, &aliases_lookup, link) { if (strcmp(app->stem, stem) != 0) continue; if (np == app->np) { id = app->id; break; } } mutex_unlock(&of_mutex); return id; } EXPORT_SYMBOL_GPL(of_alias_get_id); /** * of_alias_get_highest_id - Get highest alias id for the given stem * @stem: Alias stem to be examined * * The function travels the lookup table to get the highest alias id for the * given alias stem. It returns the alias id if found. */ int of_alias_get_highest_id(const char *stem) { struct alias_prop *app; int id = -ENODEV; mutex_lock(&of_mutex); list_for_each_entry(app, &aliases_lookup, link) { if (strcmp(app->stem, stem) != 0) continue; if (app->id > id) id = app->id; } mutex_unlock(&of_mutex); return id; } EXPORT_SYMBOL_GPL(of_alias_get_highest_id); /** * of_console_check() - Test and setup console for DT setup * @dn: Pointer to device node * @name: Name to use for preferred console without index. ex. "ttyS" * @index: Index to use for preferred console. * * Check if the given device node matches the stdout-path property in the * /chosen node. If it does then register it as the preferred console. * * Return: TRUE if console successfully setup. Otherwise return FALSE. */ bool of_console_check(const struct device_node *dn, char *name, int index) { if (!dn || dn != of_stdout || console_set_on_cmdline) return false; /* * XXX: cast `options' to char pointer to suppress complication * warnings: printk, UART and console drivers expect char pointer. */ return !add_preferred_console(name, index, (char *)of_stdout_options); } EXPORT_SYMBOL_GPL(of_console_check); /** * of_find_next_cache_node - Find a node's subsidiary cache * @np: node of type "cpu" or "cache" * * Return: A node pointer with refcount incremented, use * of_node_put() on it when done. Caller should hold a reference * to np. */ struct device_node *of_find_next_cache_node(const struct device_node *np) { struct device_node *child, *cache_node; cache_node = of_parse_phandle(np, "l2-cache", 0); if (!cache_node) cache_node = of_parse_phandle(np, "next-level-cache", 0); if (cache_node) return cache_node; /* OF on pmac has nodes instead of properties named "l2-cache" * beneath CPU nodes. */ if (IS_ENABLED(CONFIG_PPC_PMAC) && of_node_is_type(np, "cpu")) for_each_child_of_node(np, child) if (of_node_is_type(child, "cache")) return child; return NULL; } /** * of_find_last_cache_level - Find the level at which the last cache is * present for the given logical cpu * * @cpu: cpu number(logical index) for which the last cache level is needed * * Return: The level at which the last cache is present. It is exactly * same as the total number of cache levels for the given logical cpu. */ int of_find_last_cache_level(unsigned int cpu) { u32 cache_level = 0; struct device_node *prev = NULL, *np = of_cpu_device_node_get(cpu); while (np) { of_node_put(prev); prev = np; np = of_find_next_cache_node(np); } of_property_read_u32(prev, "cache-level", &cache_level); of_node_put(prev); return cache_level; } /** * of_map_id - Translate an ID through a downstream mapping. * @np: root complex device node. * @id: device ID to map. * @map_name: property name of the map to use. * @map_mask_name: optional property name of the mask to use. * @target: optional pointer to a target device node. * @id_out: optional pointer to receive the translated ID. * * Given a device ID, look up the appropriate implementation-defined * platform ID and/or the target device which receives transactions on that * ID, as per the "iommu-map" and "msi-map" bindings. Either of @target or * @id_out may be NULL if only the other is required. If @target points to * a non-NULL device node pointer, only entries targeting that node will be * matched; if it points to a NULL value, it will receive the device node of * the first matching target phandle, with a reference held. * * Return: 0 on success or a standard error code on failure. */ int of_map_id(const struct device_node *np, u32 id, const char *map_name, const char *map_mask_name, struct device_node **target, u32 *id_out) { u32 map_mask, masked_id; int map_len; const __be32 *map = NULL; if (!np || !map_name || (!target && !id_out)) return -EINVAL; map = of_get_property(np, map_name, &map_len); if (!map) { if (target) return -ENODEV; /* Otherwise, no map implies no translation */ *id_out = id; return 0; } if (!map_len || map_len % (4 * sizeof(*map))) { pr_err("%pOF: Error: Bad %s length: %d\n", np, map_name, map_len); return -EINVAL; } /* The default is to select all bits. */ map_mask = 0xffffffff; /* * Can be overridden by "{iommu,msi}-map-mask" property. * If of_property_read_u32() fails, the default is used. */ if (map_mask_name) of_property_read_u32(np, map_mask_name, &map_mask); masked_id = map_mask & id; for ( ; map_len > 0; map_len -= 4 * sizeof(*map), map += 4) { struct device_node *phandle_node; u32 id_base = be32_to_cpup(map + 0); u32 phandle = be32_to_cpup(map + 1); u32 out_base = be32_to_cpup(map + 2); u32 id_len = be32_to_cpup(map + 3); if (id_base & ~map_mask) { pr_err("%pOF: Invalid %s translation - %s-mask (0x%x) ignores id-base (0x%x)\n", np, map_name, map_name, map_mask, id_base); return -EFAULT; } if (masked_id < id_base || masked_id >= id_base + id_len) continue; phandle_node = of_find_node_by_phandle(phandle); if (!phandle_node) return -ENODEV; if (target) { if (*target) of_node_put(phandle_node); else *target = phandle_node; if (*target != phandle_node) continue; } if (id_out) *id_out = masked_id - id_base + out_base; pr_debug("%pOF: %s, using mask %08x, id-base: %08x, out-base: %08x, length: %08x, id: %08x -> %08x\n", np, map_name, map_mask, id_base, out_base, id_len, id, masked_id - id_base + out_base); return 0; } pr_info("%pOF: no %s translation for id 0x%x on %pOF\n", np, map_name, id, target && *target ? *target : NULL); /* Bypasses translation */ if (id_out) *id_out = id; return 0; } EXPORT_SYMBOL_GPL(of_map_id); |
| 36 5 5 5 3 1 1 3 3 3 3 3 3 5 5 17 5 17 10 10 10 10 11 11 21 11 21 18 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 | // SPDX-License-Identifier: GPL-2.0-only /* Copyright (C) 2020 Red Hat, Inc. * Author: Jason Wang <jasowang@redhat.com> * * IOTLB implementation for vhost. */ #include <linux/slab.h> #include <linux/vhost_iotlb.h> #include <linux/module.h> #define MOD_VERSION "0.1" #define MOD_DESC "VHOST IOTLB" #define MOD_AUTHOR "Jason Wang <jasowang@redhat.com>" #define MOD_LICENSE "GPL v2" #define START(map) ((map)->start) #define LAST(map) ((map)->last) INTERVAL_TREE_DEFINE(struct vhost_iotlb_map, rb, __u64, __subtree_last, START, LAST, static inline, vhost_iotlb_itree); /** * vhost_iotlb_map_free - remove a map node and free it * @iotlb: the IOTLB * @map: the map that want to be remove and freed */ void vhost_iotlb_map_free(struct vhost_iotlb *iotlb, struct vhost_iotlb_map *map) { vhost_iotlb_itree_remove(map, &iotlb->root); list_del(&map->link); kfree(map); iotlb->nmaps--; } EXPORT_SYMBOL_GPL(vhost_iotlb_map_free); /** * vhost_iotlb_add_range_ctx - add a new range to vhost IOTLB * @iotlb: the IOTLB * @start: start of the IOVA range * @last: last of IOVA range * @addr: the address that is mapped to @start * @perm: access permission of this range * @opaque: the opaque pointer for the new mapping * * Returns an error last is smaller than start or memory allocation * fails */ int vhost_iotlb_add_range_ctx(struct vhost_iotlb *iotlb, u64 start, u64 last, u64 addr, unsigned int perm, void *opaque) { struct vhost_iotlb_map *map; if (last < start) return -EFAULT; /* If the range being mapped is [0, ULONG_MAX], split it into two entries * otherwise its size would overflow u64. */ if (start == 0 && last == ULONG_MAX) { u64 mid = last / 2; int err = vhost_iotlb_add_range_ctx(iotlb, start, mid, addr, perm, opaque); if (err) return err; addr += mid + 1; start = mid + 1; } if (iotlb->limit && iotlb->nmaps == iotlb->limit && iotlb->flags & VHOST_IOTLB_FLAG_RETIRE) { map = list_first_entry(&iotlb->list, typeof(*map), link); vhost_iotlb_map_free(iotlb, map); } map = kmalloc_obj(*map, GFP_ATOMIC); if (!map) return -ENOMEM; map->start = start; map->size = last - start + 1; map->last = last; map->addr = addr; map->perm = perm; map->opaque = opaque; iotlb->nmaps++; vhost_iotlb_itree_insert(map, &iotlb->root); INIT_LIST_HEAD(&map->link); list_add_tail(&map->link, &iotlb->list); return 0; } EXPORT_SYMBOL_GPL(vhost_iotlb_add_range_ctx); int vhost_iotlb_add_range(struct vhost_iotlb *iotlb, u64 start, u64 last, u64 addr, unsigned int perm) { return vhost_iotlb_add_range_ctx(iotlb, start, last, addr, perm, NULL); } EXPORT_SYMBOL_GPL(vhost_iotlb_add_range); /** * vhost_iotlb_del_range - delete overlapped ranges from vhost IOTLB * @iotlb: the IOTLB * @start: start of the IOVA range * @last: last of IOVA range */ void vhost_iotlb_del_range(struct vhost_iotlb *iotlb, u64 start, u64 last) { struct vhost_iotlb_map *map; while ((map = vhost_iotlb_itree_iter_first(&iotlb->root, start, last))) vhost_iotlb_map_free(iotlb, map); } EXPORT_SYMBOL_GPL(vhost_iotlb_del_range); /** * vhost_iotlb_init - initialize a vhost IOTLB * @iotlb: the IOTLB that needs to be initialized * @limit: maximum number of IOTLB entries * @flags: VHOST_IOTLB_FLAG_XXX */ void vhost_iotlb_init(struct vhost_iotlb *iotlb, unsigned int limit, unsigned int flags) { iotlb->root = RB_ROOT_CACHED; iotlb->limit = limit; iotlb->nmaps = 0; iotlb->flags = flags; INIT_LIST_HEAD(&iotlb->list); } EXPORT_SYMBOL_GPL(vhost_iotlb_init); /** * vhost_iotlb_alloc - add a new vhost IOTLB * @limit: maximum number of IOTLB entries * @flags: VHOST_IOTLB_FLAG_XXX * * Returns an error is memory allocation fails */ struct vhost_iotlb *vhost_iotlb_alloc(unsigned int limit, unsigned int flags) { struct vhost_iotlb *iotlb = kzalloc_obj(*iotlb); if (!iotlb) return NULL; vhost_iotlb_init(iotlb, limit, flags); return iotlb; } EXPORT_SYMBOL_GPL(vhost_iotlb_alloc); /** * vhost_iotlb_reset - reset vhost IOTLB (free all IOTLB entries) * @iotlb: the IOTLB to be reset */ void vhost_iotlb_reset(struct vhost_iotlb *iotlb) { vhost_iotlb_del_range(iotlb, 0ULL, 0ULL - 1); } EXPORT_SYMBOL_GPL(vhost_iotlb_reset); /** * vhost_iotlb_free - reset and free vhost IOTLB * @iotlb: the IOTLB to be freed */ void vhost_iotlb_free(struct vhost_iotlb *iotlb) { if (iotlb) { vhost_iotlb_reset(iotlb); kfree(iotlb); } } EXPORT_SYMBOL_GPL(vhost_iotlb_free); /** * vhost_iotlb_itree_first - return the first overlapped range * @iotlb: the IOTLB * @start: start of IOVA range * @last: last byte in IOVA range */ struct vhost_iotlb_map * vhost_iotlb_itree_first(struct vhost_iotlb *iotlb, u64 start, u64 last) { return vhost_iotlb_itree_iter_first(&iotlb->root, start, last); } EXPORT_SYMBOL_GPL(vhost_iotlb_itree_first); /** * vhost_iotlb_itree_next - return the next overlapped range * @map: the starting map node * @start: start of IOVA range * @last: last byte IOVA range */ struct vhost_iotlb_map * vhost_iotlb_itree_next(struct vhost_iotlb_map *map, u64 start, u64 last) { return vhost_iotlb_itree_iter_next(map, start, last); } EXPORT_SYMBOL_GPL(vhost_iotlb_itree_next); MODULE_VERSION(MOD_VERSION); MODULE_DESCRIPTION(MOD_DESC); MODULE_AUTHOR(MOD_AUTHOR); MODULE_LICENSE(MOD_LICENSE); |
| 45 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 | /* SPDX-License-Identifier: GPL-2.0 */ /* * ioport.h Definitions of routines for detecting, reserving and * allocating system resources. * * Authors: Linus Torvalds */ #ifndef _LINUX_IOPORT_H #define _LINUX_IOPORT_H #ifndef __ASSEMBLY__ #include <linux/args.h> #include <linux/bits.h> #include <linux/compiler.h> #include <linux/minmax.h> #include <linux/types.h> /* * Resources are tree-like, allowing * nesting etc.. */ struct resource { resource_size_t start; resource_size_t end; const char *name; unsigned long flags; unsigned long desc; struct resource *parent, *sibling, *child; }; /* * IO resources have these defined flags. * * PCI devices expose these flags to userspace in the "resource" sysfs file, * so don't move them. */ #define IORESOURCE_BITS 0x000000ff /* Bus-specific bits */ #define IORESOURCE_TYPE_BITS 0x00001f00 /* Resource type */ #define IORESOURCE_IO 0x00000100 /* PCI/ISA I/O ports */ #define IORESOURCE_MEM 0x00000200 #define IORESOURCE_REG 0x00000300 /* Register offsets */ #define IORESOURCE_IRQ 0x00000400 #define IORESOURCE_DMA 0x00000800 #define IORESOURCE_BUS 0x00001000 #define IORESOURCE_PREFETCH 0x00002000 /* No side effects */ #define IORESOURCE_READONLY 0x00004000 #define IORESOURCE_CACHEABLE 0x00008000 #define IORESOURCE_RANGELENGTH 0x00010000 #define IORESOURCE_SHADOWABLE 0x00020000 #define IORESOURCE_SIZEALIGN 0x00040000 /* size indicates alignment */ #define IORESOURCE_STARTALIGN 0x00080000 /* start field is alignment */ #define IORESOURCE_MEM_64 0x00100000 #define IORESOURCE_WINDOW 0x00200000 /* forwarded by bridge */ #define IORESOURCE_MUXED 0x00400000 /* Resource is software muxed */ #define IORESOURCE_EXT_TYPE_BITS 0x01000000 /* Resource extended types */ #define IORESOURCE_SYSRAM 0x01000000 /* System RAM (modifier) */ /* IORESOURCE_SYSRAM specific bits. */ #define IORESOURCE_SYSRAM_DRIVER_MANAGED 0x02000000 /* Always detected via a driver. */ #define IORESOURCE_SYSRAM_MERGEABLE 0x04000000 /* Resource can be merged. */ #define IORESOURCE_EXCLUSIVE 0x08000000 /* Userland may not map this resource */ #define IORESOURCE_DISABLED 0x10000000 #define IORESOURCE_UNSET 0x20000000 /* No address assigned yet */ #define IORESOURCE_AUTO 0x40000000 #define IORESOURCE_BUSY 0x80000000 /* Driver has marked this resource busy */ /* I/O resource extended types */ #define IORESOURCE_SYSTEM_RAM (IORESOURCE_MEM|IORESOURCE_SYSRAM) /* PnP IRQ specific bits (IORESOURCE_BITS) */ #define IORESOURCE_IRQ_HIGHEDGE (1<<0) #define IORESOURCE_IRQ_LOWEDGE (1<<1) #define IORESOURCE_IRQ_HIGHLEVEL (1<<2) #define IORESOURCE_IRQ_LOWLEVEL (1<<3) #define IORESOURCE_IRQ_SHAREABLE (1<<4) #define IORESOURCE_IRQ_OPTIONAL (1<<5) #define IORESOURCE_IRQ_WAKECAPABLE (1<<6) /* PnP DMA specific bits (IORESOURCE_BITS) */ #define IORESOURCE_DMA_TYPE_MASK (3<<0) #define IORESOURCE_DMA_8BIT (0<<0) #define IORESOURCE_DMA_8AND16BIT (1<<0) #define IORESOURCE_DMA_16BIT (2<<0) #define IORESOURCE_DMA_MASTER (1<<2) #define IORESOURCE_DMA_BYTE (1<<3) #define IORESOURCE_DMA_WORD (1<<4) #define IORESOURCE_DMA_SPEED_MASK (3<<6) #define IORESOURCE_DMA_COMPATIBLE (0<<6) #define IORESOURCE_DMA_TYPEA (1<<6) #define IORESOURCE_DMA_TYPEB (2<<6) #define IORESOURCE_DMA_TYPEF (3<<6) /* PnP memory I/O specific bits (IORESOURCE_BITS) */ #define IORESOURCE_MEM_WRITEABLE (1<<0) /* dup: IORESOURCE_READONLY */ #define IORESOURCE_MEM_CACHEABLE (1<<1) /* dup: IORESOURCE_CACHEABLE */ #define IORESOURCE_MEM_RANGELENGTH (1<<2) /* dup: IORESOURCE_RANGELENGTH */ #define IORESOURCE_MEM_TYPE_MASK (3<<3) #define IORESOURCE_MEM_8BIT (0<<3) #define IORESOURCE_MEM_16BIT (1<<3) #define IORESOURCE_MEM_8AND16BIT (2<<3) #define IORESOURCE_MEM_32BIT (3<<3) #define IORESOURCE_MEM_SHADOWABLE (1<<5) /* dup: IORESOURCE_SHADOWABLE */ #define IORESOURCE_MEM_EXPANSIONROM (1<<6) #define IORESOURCE_MEM_NONPOSTED (1<<7) /* PnP I/O specific bits (IORESOURCE_BITS) */ #define IORESOURCE_IO_16BIT_ADDR (1<<0) #define IORESOURCE_IO_FIXED (1<<1) #define IORESOURCE_IO_SPARSE (1<<2) /* PCI ROM control bits (IORESOURCE_BITS) */ #define IORESOURCE_ROM_ENABLE (1<<0) /* ROM is enabled, same as PCI_ROM_ADDRESS_ENABLE */ #define IORESOURCE_ROM_SHADOW (1<<1) /* Use RAM image, not ROM BAR */ /* PCI control bits. Shares IORESOURCE_BITS with above PCI ROM. */ #define IORESOURCE_PCI_FIXED (1<<4) /* Do not move resource */ #define IORESOURCE_PCI_EA_BEI (1<<5) /* BAR Equivalent Indicator */ /* * I/O Resource Descriptors * * Descriptors are used by walk_iomem_res_desc() and region_intersects() * for searching a specific resource range in the iomem table. Assign * a new descriptor when a resource range supports the search interfaces. * Otherwise, resource.desc must be set to IORES_DESC_NONE (0). */ enum { IORES_DESC_NONE = 0, IORES_DESC_CRASH_KERNEL = 1, IORES_DESC_ACPI_TABLES = 2, IORES_DESC_ACPI_NV_STORAGE = 3, IORES_DESC_PERSISTENT_MEMORY = 4, IORES_DESC_PERSISTENT_MEMORY_LEGACY = 5, IORES_DESC_DEVICE_PRIVATE_MEMORY = 6, IORES_DESC_RESERVED = 7, IORES_DESC_SOFT_RESERVED = 8, IORES_DESC_CXL = 9, }; /* * Flags controlling ioremap() behavior. */ enum { IORES_MAP_SYSTEM_RAM = BIT(0), IORES_MAP_ENCRYPTED = BIT(1), }; /* helpers to define resources */ #define DEFINE_RES_NAMED_DESC(_start, _size, _name, _flags, _desc) \ (struct resource) { \ .start = (_start), \ .end = (_start) + (_size) - 1, \ .name = (_name), \ .flags = (_flags), \ .desc = (_desc), \ } #define DEFINE_RES_NAMED(_start, _size, _name, _flags) \ DEFINE_RES_NAMED_DESC(_start, _size, _name, _flags, IORES_DESC_NONE) #define __DEFINE_RES0() \ DEFINE_RES_NAMED(0, 0, NULL, IORESOURCE_UNSET) #define __DEFINE_RES3(_start, _size, _flags) \ DEFINE_RES_NAMED(_start, _size, NULL, _flags) #define DEFINE_RES(...) \ CONCATENATE(__DEFINE_RES, COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__) #define DEFINE_RES_IO_NAMED(_start, _size, _name) \ DEFINE_RES_NAMED((_start), (_size), (_name), IORESOURCE_IO) #define DEFINE_RES_IO(_start, _size) \ DEFINE_RES_IO_NAMED((_start), (_size), NULL) #define DEFINE_RES_MEM_NAMED(_start, _size, _name) \ DEFINE_RES_NAMED((_start), (_size), (_name), IORESOURCE_MEM) #define DEFINE_RES_MEM(_start, _size) \ DEFINE_RES_MEM_NAMED((_start), (_size), NULL) #define DEFINE_RES_REG_NAMED(_start, _size, _name) \ DEFINE_RES_NAMED((_start), (_size), (_name), IORESOURCE_REG) #define DEFINE_RES_REG(_start, _size) \ DEFINE_RES_REG_NAMED((_start), (_size), NULL) #define DEFINE_RES_IRQ_NAMED(_irq, _name) \ DEFINE_RES_NAMED((_irq), 1, (_name), IORESOURCE_IRQ) #define DEFINE_RES_IRQ(_irq) \ DEFINE_RES_IRQ_NAMED((_irq), NULL) #define DEFINE_RES_DMA_NAMED(_dma, _name) \ DEFINE_RES_NAMED((_dma), 1, (_name), IORESOURCE_DMA) #define DEFINE_RES_DMA(_dma) \ DEFINE_RES_DMA_NAMED((_dma), NULL) /** * typedef resource_alignf - Resource alignment callback * @data: Private data used by the callback * @res: Resource candidate range (an empty resource space) * @size: The minimum size of the empty space * @align: Alignment from the constraints * * Callback allows calculating resource placement and alignment beyond min, * max, and align fields in the struct resource_constraint. * * Return: Start address for the resource. */ typedef resource_size_t (*resource_alignf)(void *data, const struct resource *res, resource_size_t size, resource_size_t align); /** * struct resource_constraint - constraints to be met while searching empty * resource space * @min: The minimum address for the memory range * @max: The maximum address for the memory range * @align: Alignment for the start address of the empty space * @alignf: Additional alignment constraints callback * @alignf_data: Data provided for @alignf callback * * Contains the range and alignment constraints that have to be met during * find_resource_space(). @alignf can be NULL indicating no alignment beyond * @align is necessary. */ struct resource_constraint { resource_size_t min, max, align; resource_alignf alignf; void *alignf_data; }; /* PC/ISA/whatever - the normal PC address spaces: IO and memory */ extern struct resource ioport_resource; extern struct resource iomem_resource; extern struct resource soft_reserve_resource; extern struct resource *request_resource_conflict(struct resource *root, struct resource *new); extern int request_resource(struct resource *root, struct resource *new); extern int release_resource(struct resource *new); void release_child_resources(struct resource *new); extern void reserve_region_with_split(struct resource *root, resource_size_t start, resource_size_t end, const char *name); extern struct resource *insert_resource_conflict(struct resource *parent, struct resource *new); extern int insert_resource(struct resource *parent, struct resource *new); extern void insert_resource_expand_to_fit(struct resource *root, struct resource *new); extern int remove_resource(struct resource *old); extern void arch_remove_reservations(struct resource *avail); extern int allocate_resource(struct resource *root, struct resource *new, resource_size_t size, resource_size_t min, resource_size_t max, resource_size_t align, resource_alignf alignf, void *alignf_data); struct resource *lookup_resource(struct resource *root, resource_size_t start); int adjust_resource(struct resource *res, resource_size_t start, resource_size_t size); resource_size_t resource_alignment(struct resource *res); /** * resource_set_size - Calculate resource end address from size and start * @res: Resource descriptor * @size: Size of the resource * * Calculate the end address for @res based on @size. * * Note: The start address of @res must be set when calling this function. * Prefer resource_set_range() if setting both the start address and @size. */ static inline void resource_set_size(struct resource *res, resource_size_t size) { res->end = res->start + size - 1; } /** * resource_set_range - Set resource start and end addresses * @res: Resource descriptor * @start: Start address for the resource * @size: Size of the resource * * Set @res start address and calculate the end address based on @size. */ static inline void resource_set_range(struct resource *res, resource_size_t start, resource_size_t size) { res->start = start; resource_set_size(res, size); } static inline resource_size_t resource_size(const struct resource *res) { return res->end - res->start + 1; } static inline unsigned long resource_type(const struct resource *res) { return res->flags & IORESOURCE_TYPE_BITS; } static inline unsigned long resource_ext_type(const struct resource *res) { return res->flags & IORESOURCE_EXT_TYPE_BITS; } /* True iff r1 completely contains r2 */ static inline bool resource_contains(const struct resource *r1, const struct resource *r2) { if (resource_type(r1) != resource_type(r2)) return false; if (r1->flags & IORESOURCE_UNSET || r2->flags & IORESOURCE_UNSET) return false; return r1->start <= r2->start && r1->end >= r2->end; } /* True if any part of r1 overlaps r2 */ static inline bool resource_overlaps(const struct resource *r1, const struct resource *r2) { return r1->start <= r2->end && r1->end >= r2->start; } static inline bool resource_intersection(const struct resource *r1, const struct resource *r2, struct resource *r) { if (!resource_overlaps(r1, r2)) return false; r->start = max(r1->start, r2->start); r->end = min(r1->end, r2->end); return true; } static inline bool resource_union(const struct resource *r1, const struct resource *r2, struct resource *r) { if (!resource_overlaps(r1, r2)) return false; r->start = min(r1->start, r2->start); r->end = max(r1->end, r2->end); return true; } /* * Check if this resource is added to a resource tree or detached. Caller is * responsible for not racing assignment. */ static inline bool resource_assigned(const struct resource *res) { return res->parent; } int find_resource_space(struct resource *root, struct resource *new, resource_size_t size, struct resource_constraint *constraint); /* Convenience shorthand with allocation */ #define request_region(start,n,name) __request_region(&ioport_resource, (start), (n), (name), 0) #define request_muxed_region(start,n,name) __request_region(&ioport_resource, (start), (n), (name), IORESOURCE_MUXED) #define __request_mem_region(start,n,name, excl) __request_region(&iomem_resource, (start), (n), (name), excl) #define request_mem_region(start,n,name) __request_region(&iomem_resource, (start), (n), (name), 0) #define request_mem_region_muxed(start, n, name) \ __request_region(&iomem_resource, (start), (n), (name), IORESOURCE_MUXED) #define request_mem_region_exclusive(start,n,name) \ __request_region(&iomem_resource, (start), (n), (name), IORESOURCE_EXCLUSIVE) #define rename_region(region, newname) do { (region)->name = (newname); } while (0) extern struct resource * __request_region(struct resource *, resource_size_t start, resource_size_t n, const char *name, int flags); /* Compatibility cruft */ #define release_region(start,n) __release_region(&ioport_resource, (start), (n)) #define release_mem_region(start,n) __release_region(&iomem_resource, (start), (n)) extern void __release_region(struct resource *, resource_size_t, resource_size_t); #ifdef CONFIG_MEMORY_HOTREMOVE extern void release_mem_region_adjustable(resource_size_t, resource_size_t); #endif #ifdef CONFIG_MEMORY_HOTPLUG extern void merge_system_ram_resource(struct resource *res); #endif /* Wrappers for managed devices */ struct device; extern int devm_request_resource(struct device *dev, struct resource *root, struct resource *new); extern void devm_release_resource(struct device *dev, struct resource *new); #define devm_request_region(dev,start,n,name) \ __devm_request_region(dev, &ioport_resource, (start), (n), (name)) #define devm_request_mem_region(dev,start,n,name) \ __devm_request_region(dev, &iomem_resource, (start), (n), (name)) extern struct resource * __devm_request_region(struct device *dev, struct resource *parent, resource_size_t start, resource_size_t n, const char *name); #define devm_release_region(dev, start, n) \ __devm_release_region(dev, &ioport_resource, (start), (n)) #define devm_release_mem_region(dev, start, n) \ __devm_release_region(dev, &iomem_resource, (start), (n)) extern void __devm_release_region(struct device *dev, struct resource *parent, resource_size_t start, resource_size_t n); extern int iomem_map_sanity_check(resource_size_t addr, unsigned long size); extern bool iomem_is_exclusive(u64 addr); extern bool resource_is_exclusive(struct resource *resource, u64 addr, resource_size_t size); extern int walk_system_ram_range(unsigned long start_pfn, unsigned long nr_pages, void *arg, int (*func)(unsigned long, unsigned long, void *)); extern int walk_mem_res(u64 start, u64 end, void *arg, int (*func)(struct resource *, void *)); extern int walk_system_ram_res(u64 start, u64 end, void *arg, int (*func)(struct resource *, void *)); extern int walk_system_ram_res_rev(u64 start, u64 end, void *arg, int (*func)(struct resource *, void *)); extern int walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start, u64 end, void *arg, int (*func)(struct resource *, void *)); extern int walk_soft_reserve_res(u64 start, u64 end, void *arg, int (*func)(struct resource *, void *)); extern int region_intersects_soft_reserve(resource_size_t start, size_t size); struct resource *devm_request_free_mem_region(struct device *dev, struct resource *base, unsigned long size); struct resource *request_free_mem_region(struct resource *base, unsigned long size, const char *name); struct resource *alloc_free_mem_region(struct resource *base, unsigned long size, unsigned long align, const char *name); static inline void irqresource_disabled(struct resource *res, u32 irq) { res->start = irq; res->end = irq; res->flags |= IORESOURCE_IRQ | IORESOURCE_DISABLED | IORESOURCE_UNSET; } extern struct address_space *iomem_get_mapping(void); #endif /* __ASSEMBLY__ */ #endif /* _LINUX_IOPORT_H */ |
| 36 37 47 4 11 46 11 47 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 | /* * Constant-time equality testing of memory regions. * * Authors: * * James Yonan <james@openvpn.net> * Daniel Borkmann <dborkman@redhat.com> * * This file is provided under a dual BSD/GPLv2 license. When using or * redistributing this file, you may do so under either license. * * GPL LICENSE SUMMARY * * Copyright(c) 2013 OpenVPN Technologies, Inc. All rights reserved. * * This program is free software; you can redistribute it and/or modify * it under the terms of version 2 of the GNU General Public License as * published by the Free Software Foundation. * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA. * The full GNU General Public License is included in this distribution * in the file called LICENSE.GPL. * * BSD LICENSE * * Copyright(c) 2013 OpenVPN Technologies, Inc. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * * Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * * Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in * the documentation and/or other materials provided with the * distribution. * * Neither the name of OpenVPN Technologies nor the names of its * contributors may be used to endorse or promote products derived * from this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ #include <crypto/algapi.h> #include <linux/export.h> #include <linux/module.h> #include <linux/unaligned.h> /* Generic path for arbitrary size */ static inline unsigned long __crypto_memneq_generic(const void *a, const void *b, size_t size) { unsigned long neq = 0; #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) while (size >= sizeof(unsigned long)) { neq |= get_unaligned((unsigned long *)a) ^ get_unaligned((unsigned long *)b); OPTIMIZER_HIDE_VAR(neq); a += sizeof(unsigned long); b += sizeof(unsigned long); size -= sizeof(unsigned long); } #endif /* CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS */ while (size > 0) { neq |= *(unsigned char *)a ^ *(unsigned char *)b; OPTIMIZER_HIDE_VAR(neq); a += 1; b += 1; size -= 1; } return neq; } /* Loop-free fast-path for frequently used 16-byte size */ static inline unsigned long __crypto_memneq_16(const void *a, const void *b) { unsigned long neq = 0; #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS if (sizeof(unsigned long) == 8) { neq |= get_unaligned((unsigned long *)a) ^ get_unaligned((unsigned long *)b); OPTIMIZER_HIDE_VAR(neq); neq |= get_unaligned((unsigned long *)(a + 8)) ^ get_unaligned((unsigned long *)(b + 8)); OPTIMIZER_HIDE_VAR(neq); } else if (sizeof(unsigned int) == 4) { neq |= get_unaligned((unsigned int *)a) ^ get_unaligned((unsigned int *)b); OPTIMIZER_HIDE_VAR(neq); neq |= get_unaligned((unsigned int *)(a + 4)) ^ get_unaligned((unsigned int *)(b + 4)); OPTIMIZER_HIDE_VAR(neq); neq |= get_unaligned((unsigned int *)(a + 8)) ^ get_unaligned((unsigned int *)(b + 8)); OPTIMIZER_HIDE_VAR(neq); neq |= get_unaligned((unsigned int *)(a + 12)) ^ get_unaligned((unsigned int *)(b + 12)); OPTIMIZER_HIDE_VAR(neq); } else #endif /* CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS */ { neq |= *(unsigned char *)(a) ^ *(unsigned char *)(b); OPTIMIZER_HIDE_VAR(neq); neq |= *(unsigned char *)(a+1) ^ *(unsigned char *)(b+1); OPTIMIZER_HIDE_VAR(neq); neq |= *(unsigned char *)(a+2) ^ *(unsigned char *)(b+2); OPTIMIZER_HIDE_VAR(neq); neq |= *(unsigned char *)(a+3) ^ *(unsigned char *)(b+3); OPTIMIZER_HIDE_VAR(neq); neq |= *(unsigned char *)(a+4) ^ *(unsigned char *)(b+4); OPTIMIZER_HIDE_VAR(neq); neq |= *(unsigned char *)(a+5) ^ *(unsigned char *)(b+5); OPTIMIZER_HIDE_VAR(neq); neq |= *(unsigned char *)(a+6) ^ *(unsigned char *)(b+6); OPTIMIZER_HIDE_VAR(neq); neq |= *(unsigned char *)(a+7) ^ *(unsigned char *)(b+7); OPTIMIZER_HIDE_VAR(neq); neq |= *(unsigned char *)(a+8) ^ *(unsigned char *)(b+8); OPTIMIZER_HIDE_VAR(neq); neq |= *(unsigned char *)(a+9) ^ *(unsigned char *)(b+9); OPTIMIZER_HIDE_VAR(neq); neq |= *(unsigned char *)(a+10) ^ *(unsigned char *)(b+10); OPTIMIZER_HIDE_VAR(neq); neq |= *(unsigned char *)(a+11) ^ *(unsigned char *)(b+11); OPTIMIZER_HIDE_VAR(neq); neq |= *(unsigned char *)(a+12) ^ *(unsigned char *)(b+12); OPTIMIZER_HIDE_VAR(neq); neq |= *(unsigned char *)(a+13) ^ *(unsigned char *)(b+13); OPTIMIZER_HIDE_VAR(neq); neq |= *(unsigned char *)(a+14) ^ *(unsigned char *)(b+14); OPTIMIZER_HIDE_VAR(neq); neq |= *(unsigned char *)(a+15) ^ *(unsigned char *)(b+15); OPTIMIZER_HIDE_VAR(neq); } return neq; } /* Compare two areas of memory without leaking timing information, * and with special optimizations for common sizes. Users should * not call this function directly, but should instead use * crypto_memneq defined in crypto/algapi.h. */ noinline unsigned long __crypto_memneq(const void *a, const void *b, size_t size) { switch (size) { case 16: return __crypto_memneq_16(a, b); default: return __crypto_memneq_generic(a, b, size); } } EXPORT_SYMBOL(__crypto_memneq); |
| 101 101 101 101 100 101 101 101 101 101 101 5 3 3 3 3 3 3 19 19 17 17 17 17 5 2 3 2 3 3 3 3 2 17 17 17 17 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 | // SPDX-License-Identifier: GPL-2.0-only #include <net/netdev_lock.h> #include "netlink.h" #include "common.h" #include "bitset.h" struct features_req_info { struct ethnl_req_info base; }; struct features_reply_data { struct ethnl_reply_data base; u32 hw[ETHTOOL_DEV_FEATURE_WORDS]; u32 wanted[ETHTOOL_DEV_FEATURE_WORDS]; u32 active[ETHTOOL_DEV_FEATURE_WORDS]; u32 nochange[ETHTOOL_DEV_FEATURE_WORDS]; u32 all[ETHTOOL_DEV_FEATURE_WORDS]; }; #define FEATURES_REPDATA(__reply_base) \ container_of(__reply_base, struct features_reply_data, base) const struct nla_policy ethnl_features_get_policy[] = { [ETHTOOL_A_FEATURES_HEADER] = NLA_POLICY_NESTED(ethnl_header_policy), }; static void ethnl_features_to_bitmap32(u32 *dest, netdev_features_t src) { unsigned int i; for (i = 0; i < ETHTOOL_DEV_FEATURE_WORDS; i++) dest[i] = src >> (32 * i); } static int features_prepare_data(const struct ethnl_req_info *req_base, struct ethnl_reply_data *reply_base, const struct genl_info *info) { struct features_reply_data *data = FEATURES_REPDATA(reply_base); struct net_device *dev = reply_base->dev; netdev_features_t all_features; ethnl_features_to_bitmap32(data->hw, dev->hw_features); ethnl_features_to_bitmap32(data->wanted, dev->wanted_features); ethnl_features_to_bitmap32(data->active, dev->features); ethnl_features_to_bitmap32(data->nochange, NETIF_F_NEVER_CHANGE); all_features = GENMASK_ULL(NETDEV_FEATURE_COUNT - 1, 0); ethnl_features_to_bitmap32(data->all, all_features); return 0; } static int features_reply_size(const struct ethnl_req_info *req_base, const struct ethnl_reply_data *reply_base) { const struct features_reply_data *data = FEATURES_REPDATA(reply_base); bool compact = req_base->flags & ETHTOOL_FLAG_COMPACT_BITSETS; unsigned int len = 0; int ret; ret = ethnl_bitset32_size(data->hw, data->all, NETDEV_FEATURE_COUNT, netdev_features_strings, compact); if (ret < 0) return ret; len += ret; ret = ethnl_bitset32_size(data->wanted, NULL, NETDEV_FEATURE_COUNT, netdev_features_strings, compact); if (ret < 0) return ret; len += ret; ret = ethnl_bitset32_size(data->active, NULL, NETDEV_FEATURE_COUNT, netdev_features_strings, compact); if (ret < 0) return ret; len += ret; ret = ethnl_bitset32_size(data->nochange, NULL, NETDEV_FEATURE_COUNT, netdev_features_strings, compact); if (ret < 0) return ret; len += ret; return len; } static int features_fill_reply(struct sk_buff *skb, const struct ethnl_req_info *req_base, const struct ethnl_reply_data *reply_base) { const struct features_reply_data *data = FEATURES_REPDATA(reply_base); bool compact = req_base->flags & ETHTOOL_FLAG_COMPACT_BITSETS; int ret; ret = ethnl_put_bitset32(skb, ETHTOOL_A_FEATURES_HW, data->hw, data->all, NETDEV_FEATURE_COUNT, netdev_features_strings, compact); if (ret < 0) return ret; ret = ethnl_put_bitset32(skb, ETHTOOL_A_FEATURES_WANTED, data->wanted, NULL, NETDEV_FEATURE_COUNT, netdev_features_strings, compact); if (ret < 0) return ret; ret = ethnl_put_bitset32(skb, ETHTOOL_A_FEATURES_ACTIVE, data->active, NULL, NETDEV_FEATURE_COUNT, netdev_features_strings, compact); if (ret < 0) return ret; return ethnl_put_bitset32(skb, ETHTOOL_A_FEATURES_NOCHANGE, data->nochange, NULL, NETDEV_FEATURE_COUNT, netdev_features_strings, compact); } const struct ethnl_request_ops ethnl_features_request_ops = { .request_cmd = ETHTOOL_MSG_FEATURES_GET, .reply_cmd = ETHTOOL_MSG_FEATURES_GET_REPLY, .hdr_attr = ETHTOOL_A_FEATURES_HEADER, .req_info_size = sizeof(struct features_req_info), .reply_data_size = sizeof(struct features_reply_data), .prepare_data = features_prepare_data, .reply_size = features_reply_size, .fill_reply = features_fill_reply, }; /* FEATURES_SET */ const struct nla_policy ethnl_features_set_policy[] = { [ETHTOOL_A_FEATURES_HEADER] = NLA_POLICY_NESTED(ethnl_header_policy), [ETHTOOL_A_FEATURES_WANTED] = { .type = NLA_NESTED }, }; static void ethnl_features_to_bitmap(unsigned long *dest, netdev_features_t val) { const unsigned int words = BITS_TO_LONGS(NETDEV_FEATURE_COUNT); unsigned int i; for (i = 0; i < words; i++) dest[i] = (unsigned long)(val >> (i * BITS_PER_LONG)); } static netdev_features_t ethnl_bitmap_to_features(unsigned long *src) { const unsigned int nft_bits = sizeof(netdev_features_t) * BITS_PER_BYTE; const unsigned int words = BITS_TO_LONGS(NETDEV_FEATURE_COUNT); netdev_features_t ret = 0; unsigned int i; for (i = 0; i < words; i++) ret |= (netdev_features_t)(src[i]) << (i * BITS_PER_LONG); ret &= ~(netdev_features_t)0 >> (nft_bits - NETDEV_FEATURE_COUNT); return ret; } static int features_send_reply(struct net_device *dev, struct genl_info *info, const unsigned long *wanted, const unsigned long *wanted_mask, const unsigned long *active, const unsigned long *active_mask, bool compact) { struct sk_buff *rskb; void *reply_payload; int reply_len = 0; int ret; reply_len = ethnl_reply_header_size(); ret = ethnl_bitset_size(wanted, wanted_mask, NETDEV_FEATURE_COUNT, netdev_features_strings, compact); if (ret < 0) goto err; reply_len += ret; ret = ethnl_bitset_size(active, active_mask, NETDEV_FEATURE_COUNT, netdev_features_strings, compact); if (ret < 0) goto err; reply_len += ret; ret = -ENOMEM; rskb = ethnl_reply_init(reply_len, dev, ETHTOOL_MSG_FEATURES_SET_REPLY, ETHTOOL_A_FEATURES_HEADER, info, &reply_payload); if (!rskb) goto err; ret = ethnl_put_bitset(rskb, ETHTOOL_A_FEATURES_WANTED, wanted, wanted_mask, NETDEV_FEATURE_COUNT, netdev_features_strings, compact); if (ret < 0) goto nla_put_failure; ret = ethnl_put_bitset(rskb, ETHTOOL_A_FEATURES_ACTIVE, active, active_mask, NETDEV_FEATURE_COUNT, netdev_features_strings, compact); if (ret < 0) goto nla_put_failure; genlmsg_end(rskb, reply_payload); ret = genlmsg_reply(rskb, info); return ret; nla_put_failure: nlmsg_free(rskb); WARN_ONCE(1, "calculated message payload length (%d) not sufficient\n", reply_len); err: GENL_SET_ERR_MSG(info, "failed to send reply message"); return ret; } int ethnl_set_features(struct sk_buff *skb, struct genl_info *info) { DECLARE_BITMAP(wanted_diff_mask, NETDEV_FEATURE_COUNT); DECLARE_BITMAP(active_diff_mask, NETDEV_FEATURE_COUNT); DECLARE_BITMAP(old_active, NETDEV_FEATURE_COUNT); DECLARE_BITMAP(old_wanted, NETDEV_FEATURE_COUNT); DECLARE_BITMAP(new_active, NETDEV_FEATURE_COUNT); DECLARE_BITMAP(new_wanted, NETDEV_FEATURE_COUNT); DECLARE_BITMAP(req_wanted, NETDEV_FEATURE_COUNT); DECLARE_BITMAP(req_mask, NETDEV_FEATURE_COUNT); struct ethnl_req_info req_info = {}; struct nlattr **tb = info->attrs; struct net_device *dev; bool mod; int ret; if (!tb[ETHTOOL_A_FEATURES_WANTED]) return -EINVAL; ret = ethnl_parse_header_dev_get(&req_info, tb[ETHTOOL_A_FEATURES_HEADER], genl_info_net(info), info->extack, true); if (ret < 0) return ret; dev = req_info.dev; rtnl_lock(); netdev_lock_ops(dev); ret = ethnl_ops_begin(dev); if (ret < 0) goto out_unlock; ethnl_features_to_bitmap(old_active, dev->features); ethnl_features_to_bitmap(old_wanted, dev->wanted_features); ret = ethnl_parse_bitset(req_wanted, req_mask, NETDEV_FEATURE_COUNT, tb[ETHTOOL_A_FEATURES_WANTED], netdev_features_strings, info->extack); if (ret < 0) goto out_ops; if (ethnl_bitmap_to_features(req_mask) & ~NETIF_F_ETHTOOL_BITS) { GENL_SET_ERR_MSG(info, "attempt to change non-ethtool features"); ret = -EINVAL; goto out_ops; } /* set req_wanted bits not in req_mask from old_wanted */ bitmap_and(req_wanted, req_wanted, req_mask, NETDEV_FEATURE_COUNT); bitmap_andnot(new_wanted, old_wanted, req_mask, NETDEV_FEATURE_COUNT); bitmap_or(req_wanted, new_wanted, req_wanted, NETDEV_FEATURE_COUNT); if (!bitmap_equal(req_wanted, old_wanted, NETDEV_FEATURE_COUNT)) { dev->wanted_features &= ~dev->hw_features; dev->wanted_features |= ethnl_bitmap_to_features(req_wanted) & dev->hw_features; __netdev_update_features(dev); } ethnl_features_to_bitmap(new_active, dev->features); mod = !bitmap_equal(old_active, new_active, NETDEV_FEATURE_COUNT); ret = 0; if (!(req_info.flags & ETHTOOL_FLAG_OMIT_REPLY)) { bool compact = req_info.flags & ETHTOOL_FLAG_COMPACT_BITSETS; bitmap_xor(wanted_diff_mask, req_wanted, new_active, NETDEV_FEATURE_COUNT); bitmap_xor(active_diff_mask, old_active, new_active, NETDEV_FEATURE_COUNT); bitmap_and(wanted_diff_mask, wanted_diff_mask, req_mask, NETDEV_FEATURE_COUNT); bitmap_and(req_wanted, req_wanted, wanted_diff_mask, NETDEV_FEATURE_COUNT); bitmap_and(new_active, new_active, active_diff_mask, NETDEV_FEATURE_COUNT); ret = features_send_reply(dev, info, req_wanted, wanted_diff_mask, new_active, active_diff_mask, compact); } if (mod) netdev_features_change(dev); out_ops: ethnl_ops_complete(dev); out_unlock: netdev_unlock_ops(dev); rtnl_unlock(); ethnl_parse_header_dev_put(&req_info); return ret; } |
| 30 30 22 27 14 14 14 14 14 14 14 23 23 23 29 30 14 23 23 23 30 30 32 32 32 11 11 11 11 11 32 32 32 32 32 32 1 1 32 32 11 25 32 1 1 32 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 | // SPDX-License-Identifier: GPL-2.0 /* * linux/mm/page_io.c * * Copyright (C) 1991, 1992, 1993, 1994 Linus Torvalds * * Swap reorganised 29.12.95, * Asynchronous swapping added 30.12.95. Stephen Tweedie * Removed race in async swapping. 14.4.1996. Bruno Haible * Add swap of shared pages through the page cache. 20.2.1998. Stephen Tweedie * Always use brw_page, life becomes simpler. 12 May 1998 Eric Biederman */ #include <linux/mm.h> #include <linux/kernel_stat.h> #include <linux/gfp.h> #include <linux/pagemap.h> #include <linux/swap.h> #include <linux/bio.h> #include <linux/swapops.h> #include <linux/writeback.h> #include <linux/blkdev.h> #include <linux/psi.h> #include <linux/uio.h> #include <linux/sched/task.h> #include <linux/delayacct.h> #include <linux/zswap.h> #include "swap.h" static void __end_swap_bio_write(struct bio *bio) { struct folio *folio = bio_first_folio_all(bio); if (bio->bi_status) { /* * We failed to write the page out to swap-space. * Re-dirty the page in order to avoid it being reclaimed. * Also print a dire warning that things will go BAD (tm) * very quickly. * * Also clear PG_reclaim to avoid folio_rotate_reclaimable() */ folio_mark_dirty(folio); pr_alert_ratelimited("Write-error on swap-device (%u:%u:%llu)\n", MAJOR(bio_dev(bio)), MINOR(bio_dev(bio)), (unsigned long long)bio->bi_iter.bi_sector); folio_clear_reclaim(folio); } folio_end_writeback(folio); } static void end_swap_bio_write(struct bio *bio) { __end_swap_bio_write(bio); bio_put(bio); } static void __end_swap_bio_read(struct bio *bio) { struct folio *folio = bio_first_folio_all(bio); if (bio->bi_status) { pr_alert_ratelimited("Read-error on swap-device (%u:%u:%llu)\n", MAJOR(bio_dev(bio)), MINOR(bio_dev(bio)), (unsigned long long)bio->bi_iter.bi_sector); } else { folio_mark_uptodate(folio); } folio_unlock(folio); } static void end_swap_bio_read(struct bio *bio) { __end_swap_bio_read(bio); bio_put(bio); } int generic_swapfile_activate(struct swap_info_struct *sis, struct file *swap_file, sector_t *span) { struct address_space *mapping = swap_file->f_mapping; struct inode *inode = mapping->host; unsigned blocks_per_page; unsigned long page_no; unsigned blkbits; sector_t probe_block; sector_t last_block; sector_t lowest_block = -1; sector_t highest_block = 0; int nr_extents = 0; int ret; blkbits = inode->i_blkbits; blocks_per_page = PAGE_SIZE >> blkbits; /* * Map all the blocks into the extent tree. This code doesn't try * to be very smart. */ probe_block = 0; page_no = 0; last_block = i_size_read(inode) >> blkbits; while ((probe_block + blocks_per_page) <= last_block && page_no < sis->max) { unsigned block_in_page; sector_t first_block; cond_resched(); first_block = probe_block; ret = bmap(inode, &first_block); if (ret || !first_block) goto bad_bmap; /* * It must be PAGE_SIZE aligned on-disk */ if (first_block & (blocks_per_page - 1)) { probe_block++; goto reprobe; } for (block_in_page = 1; block_in_page < blocks_per_page; block_in_page++) { sector_t block; block = probe_block + block_in_page; ret = bmap(inode, &block); if (ret || !block) goto bad_bmap; if (block != first_block + block_in_page) { /* Discontiguity */ probe_block++; goto reprobe; } } first_block >>= (PAGE_SHIFT - blkbits); if (page_no) { /* exclude the header page */ if (first_block < lowest_block) lowest_block = first_block; if (first_block > highest_block) highest_block = first_block; } /* * We found a PAGE_SIZE-length, PAGE_SIZE-aligned run of blocks */ ret = add_swap_extent(sis, page_no, 1, first_block); if (ret < 0) goto out; nr_extents += ret; page_no++; probe_block += blocks_per_page; reprobe: continue; } ret = nr_extents; *span = 1 + highest_block - lowest_block; if (page_no == 0) page_no = 1; /* force Empty message */ sis->max = page_no; sis->pages = page_no - 1; out: return ret; bad_bmap: pr_err("swapon: swapfile has holes\n"); ret = -EINVAL; goto out; } static bool is_folio_zero_filled(struct folio *folio) { unsigned int pos, last_pos; unsigned long *data; unsigned int i; last_pos = PAGE_SIZE / sizeof(*data) - 1; for (i = 0; i < folio_nr_pages(folio); i++) { data = kmap_local_folio(folio, i * PAGE_SIZE); /* * Check last word first, incase the page is zero-filled at * the start and has non-zero data at the end, which is common * in real-world workloads. */ if (data[last_pos]) { kunmap_local(data); return false; } for (pos = 0; pos < last_pos; pos++) { if (data[pos]) { kunmap_local(data); return false; } } kunmap_local(data); } return true; } static void swap_zeromap_folio_set(struct folio *folio) { struct obj_cgroup *objcg = get_obj_cgroup_from_folio(folio); struct swap_info_struct *sis = __swap_entry_to_info(folio->swap); int nr_pages = folio_nr_pages(folio); swp_entry_t entry; unsigned int i; for (i = 0; i < folio_nr_pages(folio); i++) { entry = page_swap_entry(folio_page(folio, i)); set_bit(swp_offset(entry), sis->zeromap); } count_vm_events(SWPOUT_ZERO, nr_pages); if (objcg) { count_objcg_events(objcg, SWPOUT_ZERO, nr_pages); obj_cgroup_put(objcg); } } static void swap_zeromap_folio_clear(struct folio *folio) { struct swap_info_struct *sis = __swap_entry_to_info(folio->swap); swp_entry_t entry; unsigned int i; for (i = 0; i < folio_nr_pages(folio); i++) { entry = page_swap_entry(folio_page(folio, i)); clear_bit(swp_offset(entry), sis->zeromap); } } /* * We may have stale swap cache pages in memory: notice * them here and get rid of the unnecessary final write. */ int swap_writeout(struct folio *folio, struct swap_iocb **swap_plug) { int ret = 0; if (folio_free_swap(folio)) goto out_unlock; /* * Arch code may have to preserve more data than just the page * contents, e.g. memory tags. */ ret = arch_prepare_to_swap(folio); if (ret) { folio_mark_dirty(folio); goto out_unlock; } /* * Use a bitmap (zeromap) to avoid doing IO for zero-filled pages. * The bits in zeromap are protected by the locked swapcache folio * and atomic updates are used to protect against read-modify-write * corruption due to other zero swap entries seeing concurrent updates. */ if (is_folio_zero_filled(folio)) { swap_zeromap_folio_set(folio); goto out_unlock; } /* * Clear bits this folio occupies in the zeromap to prevent zero data * being read in from any previous zero writes that occupied the same * swap entries. */ swap_zeromap_folio_clear(folio); if (zswap_store(folio)) { count_mthp_stat(folio_order(folio), MTHP_STAT_ZSWPOUT); goto out_unlock; } if (!mem_cgroup_zswap_writeback_enabled(folio_memcg(folio))) { folio_mark_dirty(folio); return AOP_WRITEPAGE_ACTIVATE; } __swap_writepage(folio, swap_plug); return 0; out_unlock: folio_unlock(folio); return ret; } static inline void count_swpout_vm_event(struct folio *folio) { #ifdef CONFIG_TRANSPARENT_HUGEPAGE if (unlikely(folio_test_pmd_mappable(folio))) { count_memcg_folio_events(folio, THP_SWPOUT, 1); count_vm_event(THP_SWPOUT); } #endif count_mthp_stat(folio_order(folio), MTHP_STAT_SWPOUT); count_memcg_folio_events(folio, PSWPOUT, folio_nr_pages(folio)); count_vm_events(PSWPOUT, folio_nr_pages(folio)); } #if defined(CONFIG_MEMCG) && defined(CONFIG_BLK_CGROUP) static void bio_associate_blkg_from_page(struct bio *bio, struct folio *folio) { struct cgroup_subsys_state *css; struct mem_cgroup *memcg; memcg = folio_memcg(folio); if (!memcg) return; rcu_read_lock(); css = cgroup_e_css(memcg->css.cgroup, &io_cgrp_subsys); bio_associate_blkg_from_css(bio, css); rcu_read_unlock(); } #else #define bio_associate_blkg_from_page(bio, folio) do { } while (0) #endif /* CONFIG_MEMCG && CONFIG_BLK_CGROUP */ struct swap_iocb { struct kiocb iocb; struct bio_vec bvec[SWAP_CLUSTER_MAX]; int pages; int len; }; static mempool_t *sio_pool; int sio_pool_init(void) { if (!sio_pool) { mempool_t *pool = mempool_create_kmalloc_pool( SWAP_CLUSTER_MAX, sizeof(struct swap_iocb)); if (cmpxchg(&sio_pool, NULL, pool)) mempool_destroy(pool); } if (!sio_pool) return -ENOMEM; return 0; } static void sio_write_complete(struct kiocb *iocb, long ret) { struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb); struct page *page = sio->bvec[0].bv_page; int p; if (ret != sio->len) { /* * In the case of swap-over-nfs, this can be a * temporary failure if the system has limited * memory for allocating transmit buffers. * Mark the page dirty and avoid * folio_rotate_reclaimable but rate-limit the * messages. */ pr_err_ratelimited("Write error %ld on dio swapfile (%llu)\n", ret, swap_dev_pos(page_swap_entry(page))); for (p = 0; p < sio->pages; p++) { page = sio->bvec[p].bv_page; set_page_dirty(page); ClearPageReclaim(page); } } for (p = 0; p < sio->pages; p++) end_page_writeback(sio->bvec[p].bv_page); mempool_free(sio, sio_pool); } static void swap_writepage_fs(struct folio *folio, struct swap_iocb **swap_plug) { struct swap_iocb *sio = swap_plug ? *swap_plug : NULL; struct swap_info_struct *sis = __swap_entry_to_info(folio->swap); struct file *swap_file = sis->swap_file; loff_t pos = swap_dev_pos(folio->swap); count_swpout_vm_event(folio); folio_start_writeback(folio); folio_unlock(folio); if (sio) { if (sio->iocb.ki_filp != swap_file || sio->iocb.ki_pos + sio->len != pos) { swap_write_unplug(sio); sio = NULL; } } if (!sio) { sio = mempool_alloc(sio_pool, GFP_NOIO); init_sync_kiocb(&sio->iocb, swap_file); sio->iocb.ki_complete = sio_write_complete; sio->iocb.ki_pos = pos; sio->pages = 0; sio->len = 0; } bvec_set_folio(&sio->bvec[sio->pages], folio, folio_size(folio), 0); sio->len += folio_size(folio); sio->pages += 1; if (sio->pages == ARRAY_SIZE(sio->bvec) || !swap_plug) { swap_write_unplug(sio); sio = NULL; } if (swap_plug) *swap_plug = sio; } static void swap_writepage_bdev_sync(struct folio *folio, struct swap_info_struct *sis) { struct bio_vec bv; struct bio bio; bio_init(&bio, sis->bdev, &bv, 1, REQ_OP_WRITE | REQ_SWAP); bio.bi_iter.bi_sector = swap_folio_sector(folio); bio_add_folio_nofail(&bio, folio, folio_size(folio), 0); bio_associate_blkg_from_page(&bio, folio); count_swpout_vm_event(folio); folio_start_writeback(folio); folio_unlock(folio); submit_bio_wait(&bio); __end_swap_bio_write(&bio); } static void swap_writepage_bdev_async(struct folio *folio, struct swap_info_struct *sis) { struct bio *bio; bio = bio_alloc(sis->bdev, 1, REQ_OP_WRITE | REQ_SWAP, GFP_NOIO); bio->bi_iter.bi_sector = swap_folio_sector(folio); bio->bi_end_io = end_swap_bio_write; bio_add_folio_nofail(bio, folio, folio_size(folio), 0); bio_associate_blkg_from_page(bio, folio); count_swpout_vm_event(folio); folio_start_writeback(folio); folio_unlock(folio); submit_bio(bio); } void __swap_writepage(struct folio *folio, struct swap_iocb **swap_plug) { struct swap_info_struct *sis = __swap_entry_to_info(folio->swap); VM_BUG_ON_FOLIO(!folio_test_swapcache(folio), folio); /* * ->flags can be updated non-atomically (scan_swap_map_slots), * but that will never affect SWP_FS_OPS, so the data_race * is safe. */ if (data_race(sis->flags & SWP_FS_OPS)) swap_writepage_fs(folio, swap_plug); /* * ->flags can be updated non-atomically (scan_swap_map_slots), * but that will never affect SWP_SYNCHRONOUS_IO, so the data_race * is safe. */ else if (data_race(sis->flags & SWP_SYNCHRONOUS_IO)) swap_writepage_bdev_sync(folio, sis); else swap_writepage_bdev_async(folio, sis); } void swap_write_unplug(struct swap_iocb *sio) { struct iov_iter from; struct address_space *mapping = sio->iocb.ki_filp->f_mapping; int ret; iov_iter_bvec(&from, ITER_SOURCE, sio->bvec, sio->pages, sio->len); ret = mapping->a_ops->swap_rw(&sio->iocb, &from); if (ret != -EIOCBQUEUED) sio_write_complete(&sio->iocb, ret); } static void sio_read_complete(struct kiocb *iocb, long ret) { struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb); int p; if (ret == sio->len) { for (p = 0; p < sio->pages; p++) { struct folio *folio = page_folio(sio->bvec[p].bv_page); count_mthp_stat(folio_order(folio), MTHP_STAT_SWPIN); count_memcg_folio_events(folio, PSWPIN, folio_nr_pages(folio)); folio_mark_uptodate(folio); folio_unlock(folio); } count_vm_events(PSWPIN, sio->pages); } else { for (p = 0; p < sio->pages; p++) { struct folio *folio = page_folio(sio->bvec[p].bv_page); folio_unlock(folio); } pr_alert_ratelimited("Read-error on swap-device\n"); } mempool_free(sio, sio_pool); } static bool swap_read_folio_zeromap(struct folio *folio) { int nr_pages = folio_nr_pages(folio); struct obj_cgroup *objcg; bool is_zeromap; /* * Swapping in a large folio that is partially in the zeromap is not * currently handled. Return true without marking the folio uptodate so * that an IO error is emitted (e.g. do_swap_page() will sigbus). */ if (WARN_ON_ONCE(swap_zeromap_batch(folio->swap, nr_pages, &is_zeromap) != nr_pages)) return true; if (!is_zeromap) return false; objcg = get_obj_cgroup_from_folio(folio); count_vm_events(SWPIN_ZERO, nr_pages); if (objcg) { count_objcg_events(objcg, SWPIN_ZERO, nr_pages); obj_cgroup_put(objcg); } folio_zero_range(folio, 0, folio_size(folio)); folio_mark_uptodate(folio); return true; } static void swap_read_folio_fs(struct folio *folio, struct swap_iocb **plug) { struct swap_info_struct *sis = __swap_entry_to_info(folio->swap); struct swap_iocb *sio = NULL; loff_t pos = swap_dev_pos(folio->swap); if (plug) sio = *plug; if (sio) { if (sio->iocb.ki_filp != sis->swap_file || sio->iocb.ki_pos + sio->len != pos) { swap_read_unplug(sio); sio = NULL; } } if (!sio) { sio = mempool_alloc(sio_pool, GFP_KERNEL); init_sync_kiocb(&sio->iocb, sis->swap_file); sio->iocb.ki_pos = pos; sio->iocb.ki_complete = sio_read_complete; sio->pages = 0; sio->len = 0; } bvec_set_folio(&sio->bvec[sio->pages], folio, folio_size(folio), 0); sio->len += folio_size(folio); sio->pages += 1; if (sio->pages == ARRAY_SIZE(sio->bvec) || !plug) { swap_read_unplug(sio); sio = NULL; } if (plug) *plug = sio; } static void swap_read_folio_bdev_sync(struct folio *folio, struct swap_info_struct *sis) { struct bio_vec bv; struct bio bio; bio_init(&bio, sis->bdev, &bv, 1, REQ_OP_READ); bio.bi_iter.bi_sector = swap_folio_sector(folio); bio_add_folio_nofail(&bio, folio, folio_size(folio), 0); /* * Keep this task valid during swap readpage because the oom killer may * attempt to access it in the page fault retry time check. */ get_task_struct(current); count_mthp_stat(folio_order(folio), MTHP_STAT_SWPIN); count_memcg_folio_events(folio, PSWPIN, folio_nr_pages(folio)); count_vm_events(PSWPIN, folio_nr_pages(folio)); submit_bio_wait(&bio); __end_swap_bio_read(&bio); put_task_struct(current); } static void swap_read_folio_bdev_async(struct folio *folio, struct swap_info_struct *sis) { struct bio *bio; bio = bio_alloc(sis->bdev, 1, REQ_OP_READ, GFP_KERNEL); bio->bi_iter.bi_sector = swap_folio_sector(folio); bio->bi_end_io = end_swap_bio_read; bio_add_folio_nofail(bio, folio, folio_size(folio), 0); count_mthp_stat(folio_order(folio), MTHP_STAT_SWPIN); count_memcg_folio_events(folio, PSWPIN, folio_nr_pages(folio)); count_vm_events(PSWPIN, folio_nr_pages(folio)); submit_bio(bio); } void swap_read_folio(struct folio *folio, struct swap_iocb **plug) { struct swap_info_struct *sis = __swap_entry_to_info(folio->swap); bool synchronous = sis->flags & SWP_SYNCHRONOUS_IO; bool workingset = folio_test_workingset(folio); unsigned long pflags; bool in_thrashing; VM_BUG_ON_FOLIO(!folio_test_swapcache(folio) && !synchronous, folio); VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio); VM_BUG_ON_FOLIO(folio_test_uptodate(folio), folio); /* * Count submission time as memory stall and delay. When the device * is congested, or the submitting cgroup IO-throttled, submission * can be a significant part of overall IO time. */ if (workingset) { delayacct_thrashing_start(&in_thrashing); psi_memstall_enter(&pflags); } delayacct_swapin_start(); if (swap_read_folio_zeromap(folio)) { folio_unlock(folio); goto finish; } if (zswap_load(folio) != -ENOENT) goto finish; /* We have to read from slower devices. Increase zswap protection. */ zswap_folio_swapin(folio); if (data_race(sis->flags & SWP_FS_OPS)) { swap_read_folio_fs(folio, plug); } else if (synchronous) { swap_read_folio_bdev_sync(folio, sis); } else { swap_read_folio_bdev_async(folio, sis); } finish: if (workingset) { delayacct_thrashing_end(&in_thrashing); psi_memstall_leave(&pflags); } delayacct_swapin_end(); } void __swap_read_unplug(struct swap_iocb *sio) { struct iov_iter from; struct address_space *mapping = sio->iocb.ki_filp->f_mapping; int ret; iov_iter_bvec(&from, ITER_DEST, sio->bvec, sio->pages, sio->len); ret = mapping->a_ops->swap_rw(&sio->iocb, &from); if (ret != -EIOCBQUEUED) sio_read_complete(&sio->iocb, ret); } |
| 1 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 | // SPDX-License-Identifier: GPL-2.0-only /* * 'raw' table, which is the very first hooked in at PRE_ROUTING and LOCAL_OUT . * * Copyright (C) 2003 Jozsef Kadlecsik <kadlec@netfilter.org> */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include <linux/module.h> #include <linux/netfilter_ipv4/ip_tables.h> #include <linux/slab.h> #include <net/ip.h> #define RAW_VALID_HOOKS ((1 << NF_INET_PRE_ROUTING) | (1 << NF_INET_LOCAL_OUT)) static bool raw_before_defrag __read_mostly; MODULE_PARM_DESC(raw_before_defrag, "Enable raw table before defrag"); module_param(raw_before_defrag, bool, 0000); static const struct xt_table packet_raw = { .name = "raw", .valid_hooks = RAW_VALID_HOOKS, .me = THIS_MODULE, .af = NFPROTO_IPV4, .priority = NF_IP_PRI_RAW, }; static const struct xt_table packet_raw_before_defrag = { .name = "raw", .valid_hooks = RAW_VALID_HOOKS, .me = THIS_MODULE, .af = NFPROTO_IPV4, .priority = NF_IP_PRI_RAW_BEFORE_DEFRAG, }; static struct nf_hook_ops *rawtable_ops __read_mostly; static int iptable_raw_table_init(struct net *net) { struct ipt_replace *repl; const struct xt_table *table = &packet_raw; int ret; if (raw_before_defrag) table = &packet_raw_before_defrag; repl = ipt_alloc_initial_table(table); if (repl == NULL) return -ENOMEM; ret = ipt_register_table(net, table, repl, rawtable_ops); kfree(repl); return ret; } static void __net_exit iptable_raw_net_pre_exit(struct net *net) { ipt_unregister_table_pre_exit(net, "raw"); } static void __net_exit iptable_raw_net_exit(struct net *net) { ipt_unregister_table_exit(net, "raw"); } static struct pernet_operations iptable_raw_net_ops = { .pre_exit = iptable_raw_net_pre_exit, .exit = iptable_raw_net_exit, }; static int __init iptable_raw_init(void) { int ret; const struct xt_table *table = &packet_raw; if (raw_before_defrag) { table = &packet_raw_before_defrag; pr_info("Enabling raw table before defrag\n"); } ret = xt_register_template(table, iptable_raw_table_init); if (ret < 0) return ret; rawtable_ops = xt_hook_ops_alloc(table, ipt_do_table); if (IS_ERR(rawtable_ops)) { xt_unregister_template(table); return PTR_ERR(rawtable_ops); } ret = register_pernet_subsys(&iptable_raw_net_ops); if (ret < 0) { xt_unregister_template(table); kfree(rawtable_ops); return ret; } return ret; } static void __exit iptable_raw_fini(void) { unregister_pernet_subsys(&iptable_raw_net_ops); kfree(rawtable_ops); xt_unregister_template(&packet_raw); } module_init(iptable_raw_init); module_exit(iptable_raw_fini); MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("iptables legacy raw table"); |
| 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 8 10 4 5 7 4 6 10 10 10 10 10 10 10 10 10 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 | // SPDX-License-Identifier: LGPL-2.0+ /* * Linear conversion Plug-In * Copyright (c) 1999 by Jaroslav Kysela <perex@perex.cz>, * Abramo Bagnara <abramo@alsa-project.org> */ #include <linux/time.h> #include <sound/core.h> #include <sound/pcm.h> #include "pcm_plugin.h" /* * Basic linear conversion plugin */ struct linear_priv { int cvt_endian; /* need endian conversion? */ unsigned int src_ofs; /* byte offset in source format */ unsigned int dst_ofs; /* byte soffset in destination format */ unsigned int copy_ofs; /* byte offset in temporary u32 data */ unsigned int dst_bytes; /* byte size of destination format */ unsigned int copy_bytes; /* bytes to copy per conversion */ unsigned int flip; /* MSB flip for signeness, done after endian conv */ }; static inline void do_convert(struct linear_priv *data, unsigned char *dst, unsigned char *src) { unsigned int tmp = 0; unsigned char *p = (unsigned char *)&tmp; memcpy(p + data->copy_ofs, src + data->src_ofs, data->copy_bytes); if (data->cvt_endian) tmp = swab32(tmp); tmp ^= data->flip; memcpy(dst, p + data->dst_ofs, data->dst_bytes); } static void convert(struct snd_pcm_plugin *plugin, const struct snd_pcm_plugin_channel *src_channels, struct snd_pcm_plugin_channel *dst_channels, snd_pcm_uframes_t frames) { struct linear_priv *data = (struct linear_priv *)plugin->extra_data; int channel; int nchannels = plugin->src_format.channels; for (channel = 0; channel < nchannels; ++channel) { char *src; char *dst; int src_step, dst_step; snd_pcm_uframes_t frames1; if (!src_channels[channel].enabled) { if (dst_channels[channel].wanted) snd_pcm_area_silence(&dst_channels[channel].area, 0, frames, plugin->dst_format.format); dst_channels[channel].enabled = 0; continue; } dst_channels[channel].enabled = 1; src = src_channels[channel].area.addr + src_channels[channel].area.first / 8; dst = dst_channels[channel].area.addr + dst_channels[channel].area.first / 8; src_step = src_channels[channel].area.step / 8; dst_step = dst_channels[channel].area.step / 8; frames1 = frames; while (frames1-- > 0) { do_convert(data, dst, src); src += src_step; dst += dst_step; } } } static snd_pcm_sframes_t linear_transfer(struct snd_pcm_plugin *plugin, const struct snd_pcm_plugin_channel *src_channels, struct snd_pcm_plugin_channel *dst_channels, snd_pcm_uframes_t frames) { if (snd_BUG_ON(!plugin || !src_channels || !dst_channels)) return -ENXIO; if (frames == 0) return 0; #ifdef CONFIG_SND_DEBUG { unsigned int channel; for (channel = 0; channel < plugin->src_format.channels; channel++) { if (snd_BUG_ON(src_channels[channel].area.first % 8 || src_channels[channel].area.step % 8)) return -ENXIO; if (snd_BUG_ON(dst_channels[channel].area.first % 8 || dst_channels[channel].area.step % 8)) return -ENXIO; } } #endif if (frames > dst_channels[0].frames) frames = dst_channels[0].frames; convert(plugin, src_channels, dst_channels, frames); return frames; } static void init_data(struct linear_priv *data, snd_pcm_format_t src_format, snd_pcm_format_t dst_format) { int src_le, dst_le, src_bytes, dst_bytes; src_bytes = snd_pcm_format_width(src_format) / 8; dst_bytes = snd_pcm_format_width(dst_format) / 8; src_le = snd_pcm_format_little_endian(src_format) > 0; dst_le = snd_pcm_format_little_endian(dst_format) > 0; data->dst_bytes = dst_bytes; data->cvt_endian = src_le != dst_le; data->copy_bytes = src_bytes < dst_bytes ? src_bytes : dst_bytes; if (src_le) { data->copy_ofs = 4 - data->copy_bytes; data->src_ofs = src_bytes - data->copy_bytes; } else data->src_ofs = snd_pcm_format_physical_width(src_format) / 8 - src_bytes; if (dst_le) data->dst_ofs = 4 - data->dst_bytes; else data->dst_ofs = snd_pcm_format_physical_width(dst_format) / 8 - dst_bytes; if (snd_pcm_format_signed(src_format) != snd_pcm_format_signed(dst_format)) { if (dst_le) data->flip = (__force u32)cpu_to_le32(0x80000000); else data->flip = (__force u32)cpu_to_be32(0x80000000); } } int snd_pcm_plugin_build_linear(struct snd_pcm_substream *plug, struct snd_pcm_plugin_format *src_format, struct snd_pcm_plugin_format *dst_format, struct snd_pcm_plugin **r_plugin) { int err; struct linear_priv *data; struct snd_pcm_plugin *plugin; if (snd_BUG_ON(!r_plugin)) return -ENXIO; *r_plugin = NULL; if (snd_BUG_ON(src_format->rate != dst_format->rate)) return -ENXIO; if (snd_BUG_ON(src_format->channels != dst_format->channels)) return -ENXIO; if (snd_BUG_ON(!snd_pcm_format_linear(src_format->format) || !snd_pcm_format_linear(dst_format->format))) return -ENXIO; err = snd_pcm_plugin_build(plug, "linear format conversion", src_format, dst_format, sizeof(struct linear_priv), &plugin); if (err < 0) return err; data = (struct linear_priv *)plugin->extra_data; init_data(data, src_format->format, dst_format->format); plugin->transfer = linear_transfer; *r_plugin = plugin; return 0; } |
| 181 181 135 27 196 179 192 196 28 28 27 28 28 27 28 28 195 28 27 194 196 193 196 28 194 29 181 181 179 192 180 194 28 179 196 196 191 196 135 26 27 109 110 109 1 134 28 6 6 6 2 2 2 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 | // SPDX-License-Identifier: GPL-2.0 /* * blk-mq scheduling framework * * Copyright (C) 2016 Jens Axboe */ #include <linux/kernel.h> #include <linux/module.h> #include <linux/list_sort.h> #include <trace/events/block.h> #include "blk.h" #include "blk-mq.h" #include "blk-mq-debugfs.h" #include "blk-mq-sched.h" #include "blk-wbt.h" /* * Mark a hardware queue as needing a restart. */ void blk_mq_sched_mark_restart_hctx(struct blk_mq_hw_ctx *hctx) { if (test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state)) return; set_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state); } EXPORT_SYMBOL_GPL(blk_mq_sched_mark_restart_hctx); void __blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx) { clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state); /* * Order clearing SCHED_RESTART and list_empty_careful(&hctx->dispatch) * in blk_mq_run_hw_queue(). Its pair is the barrier in * blk_mq_dispatch_rq_list(). So dispatch code won't see SCHED_RESTART, * meantime new request added to hctx->dispatch is missed to check in * blk_mq_run_hw_queue(). */ smp_mb(); blk_mq_run_hw_queue(hctx, true); } static int sched_rq_cmp(void *priv, const struct list_head *a, const struct list_head *b) { struct request *rqa = container_of(a, struct request, queuelist); struct request *rqb = container_of(b, struct request, queuelist); return rqa->mq_hctx > rqb->mq_hctx; } static bool blk_mq_dispatch_hctx_list(struct list_head *rq_list) { struct blk_mq_hw_ctx *hctx = list_first_entry(rq_list, struct request, queuelist)->mq_hctx; struct request *rq; LIST_HEAD(hctx_list); list_for_each_entry(rq, rq_list, queuelist) { if (rq->mq_hctx != hctx) { list_cut_before(&hctx_list, rq_list, &rq->queuelist); goto dispatch; } } list_splice_tail_init(rq_list, &hctx_list); dispatch: return blk_mq_dispatch_rq_list(hctx, &hctx_list, false); } #define BLK_MQ_BUDGET_DELAY 3 /* ms units */ /* * Only SCSI implements .get_budget and .put_budget, and SCSI restarts * its queue by itself in its completion handler, so we don't need to * restart queue if .get_budget() fails to get the budget. * * Returns -EAGAIN if hctx->dispatch was found non-empty and run_work has to * be run again. This is necessary to avoid starving flushes. */ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx) { struct request_queue *q = hctx->queue; struct elevator_queue *e = q->elevator; bool multi_hctxs = false, run_queue = false; bool dispatched = false, busy = false; unsigned int max_dispatch; LIST_HEAD(rq_list); int count = 0; if (hctx->dispatch_busy) max_dispatch = 1; else max_dispatch = hctx->queue->nr_requests; do { struct request *rq; int budget_token; if (e->type->ops.has_work && !e->type->ops.has_work(hctx)) break; if (!list_empty_careful(&hctx->dispatch)) { busy = true; break; } budget_token = blk_mq_get_dispatch_budget(q); if (budget_token < 0) break; rq = e->type->ops.dispatch_request(hctx); if (!rq) { blk_mq_put_dispatch_budget(q, budget_token); /* * We're releasing without dispatching. Holding the * budget could have blocked any "hctx"s with the * same queue and if we didn't dispatch then there's * no guarantee anyone will kick the queue. Kick it * ourselves. */ run_queue = true; break; } blk_mq_set_rq_budget_token(rq, budget_token); /* * Now this rq owns the budget which has to be released * if this rq won't be queued to driver via .queue_rq() * in blk_mq_dispatch_rq_list(). */ list_add_tail(&rq->queuelist, &rq_list); count++; if (rq->mq_hctx != hctx) multi_hctxs = true; /* * If we cannot get tag for the request, stop dequeueing * requests from the IO scheduler. We are unlikely to be able * to submit them anyway and it creates false impression for * scheduling heuristics that the device can take more IO. */ if (!blk_mq_get_driver_tag(rq)) break; } while (count < max_dispatch); if (!count) { if (run_queue) blk_mq_delay_run_hw_queues(q, BLK_MQ_BUDGET_DELAY); } else if (multi_hctxs) { /* * Requests from different hctx may be dequeued from some * schedulers, such as bfq and deadline. * * Sort the requests in the list according to their hctx, * dispatch batching requests from same hctx at a time. */ list_sort(NULL, &rq_list, sched_rq_cmp); do { dispatched |= blk_mq_dispatch_hctx_list(&rq_list); } while (!list_empty(&rq_list)); } else { dispatched = blk_mq_dispatch_rq_list(hctx, &rq_list, false); } if (busy) return -EAGAIN; return !!dispatched; } static int blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx) { unsigned long end = jiffies + HZ; int ret; do { ret = __blk_mq_do_dispatch_sched(hctx); if (ret != 1) break; if (need_resched() || time_is_before_jiffies(end)) { blk_mq_delay_run_hw_queue(hctx, 0); break; } } while (1); return ret; } static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx) { unsigned short idx = ctx->index_hw[hctx->type]; if (++idx == hctx->nr_ctx) idx = 0; return hctx->ctxs[idx]; } /* * Only SCSI implements .get_budget and .put_budget, and SCSI restarts * its queue by itself in its completion handler, so we don't need to * restart queue if .get_budget() fails to get the budget. * * Returns -EAGAIN if hctx->dispatch was found non-empty and run_work has to * be run again. This is necessary to avoid starving flushes. */ static int blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx) { struct request_queue *q = hctx->queue; LIST_HEAD(rq_list); struct blk_mq_ctx *ctx = READ_ONCE(hctx->dispatch_from); int ret = 0; struct request *rq; do { int budget_token; if (!list_empty_careful(&hctx->dispatch)) { ret = -EAGAIN; break; } if (!sbitmap_any_bit_set(&hctx->ctx_map)) break; budget_token = blk_mq_get_dispatch_budget(q); if (budget_token < 0) break; rq = blk_mq_dequeue_from_ctx(hctx, ctx); if (!rq) { blk_mq_put_dispatch_budget(q, budget_token); /* * We're releasing without dispatching. Holding the * budget could have blocked any "hctx"s with the * same queue and if we didn't dispatch then there's * no guarantee anyone will kick the queue. Kick it * ourselves. */ blk_mq_delay_run_hw_queues(q, BLK_MQ_BUDGET_DELAY); break; } blk_mq_set_rq_budget_token(rq, budget_token); /* * Now this rq owns the budget which has to be released * if this rq won't be queued to driver via .queue_rq() * in blk_mq_dispatch_rq_list(). */ list_add(&rq->queuelist, &rq_list); /* round robin for fair dispatch */ ctx = blk_mq_next_ctx(hctx, rq->mq_ctx); } while (blk_mq_dispatch_rq_list(rq->mq_hctx, &rq_list, false)); WRITE_ONCE(hctx->dispatch_from, ctx); return ret; } static int __blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx) { bool need_dispatch = false; LIST_HEAD(rq_list); /* * If we have previous entries on our dispatch list, grab them first for * more fair dispatch. */ if (!list_empty_careful(&hctx->dispatch)) { spin_lock(&hctx->lock); if (!list_empty(&hctx->dispatch)) list_splice_init(&hctx->dispatch, &rq_list); spin_unlock(&hctx->lock); } /* * Only ask the scheduler for requests, if we didn't have residual * requests from the dispatch list. This is to avoid the case where * we only ever dispatch a fraction of the requests available because * of low device queue depth. Once we pull requests out of the IO * scheduler, we can no longer merge or sort them. So it's best to * leave them there for as long as we can. Mark the hw queue as * needing a restart in that case. * * We want to dispatch from the scheduler if there was nothing * on the dispatch list or we were able to dispatch from the * dispatch list. */ if (!list_empty(&rq_list)) { blk_mq_sched_mark_restart_hctx(hctx); if (!blk_mq_dispatch_rq_list(hctx, &rq_list, true)) return 0; need_dispatch = true; } else { need_dispatch = hctx->dispatch_busy; } if (hctx->queue->elevator) return blk_mq_do_dispatch_sched(hctx); /* dequeue request one by one from sw queue if queue is busy */ if (need_dispatch) return blk_mq_do_dispatch_ctx(hctx); blk_mq_flush_busy_ctxs(hctx, &rq_list); blk_mq_dispatch_rq_list(hctx, &rq_list, true); return 0; } void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx) { struct request_queue *q = hctx->queue; /* RCU or SRCU read lock is needed before checking quiesced flag */ if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q))) return; /* * A return of -EAGAIN is an indication that hctx->dispatch is not * empty and we must run again in order to avoid starving flushes. */ if (__blk_mq_sched_dispatch_requests(hctx) == -EAGAIN) { if (__blk_mq_sched_dispatch_requests(hctx) == -EAGAIN) blk_mq_run_hw_queue(hctx, true); } } bool blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio, unsigned int nr_segs) { struct elevator_queue *e = q->elevator; struct blk_mq_ctx *ctx; struct blk_mq_hw_ctx *hctx; bool ret = false; enum hctx_type type; if (e && e->type->ops.bio_merge) { ret = e->type->ops.bio_merge(q, bio, nr_segs); goto out_put; } ctx = blk_mq_get_ctx(q); hctx = blk_mq_map_queue(bio->bi_opf, ctx); type = hctx->type; if (list_empty_careful(&ctx->rq_lists[type])) goto out_put; /* default per sw-queue merge */ spin_lock(&ctx->lock); /* * Reverse check our software queue for entries that we could * potentially merge with. Currently includes a hand-wavy stop * count of 8, to not spend too much time checking for merges. */ if (blk_bio_list_merge(q, &ctx->rq_lists[type], bio, nr_segs)) ret = true; spin_unlock(&ctx->lock); out_put: return ret; } bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq, struct list_head *free) { return rq_mergeable(rq) && elv_attempt_insert_merge(q, rq, free); } EXPORT_SYMBOL_GPL(blk_mq_sched_try_insert_merge); /* called in queue's release handler, tagset has gone away */ static void blk_mq_sched_tags_teardown(struct request_queue *q, unsigned int flags) { struct blk_mq_hw_ctx *hctx; unsigned long i; queue_for_each_hw_ctx(q, hctx, i) hctx->sched_tags = NULL; if (blk_mq_is_shared_tags(flags)) q->sched_shared_tags = NULL; } void blk_mq_sched_reg_debugfs(struct request_queue *q) { struct blk_mq_hw_ctx *hctx; unsigned int memflags; unsigned long i; memflags = blk_debugfs_lock(q); blk_mq_debugfs_register_sched(q); queue_for_each_hw_ctx(q, hctx, i) blk_mq_debugfs_register_sched_hctx(q, hctx); blk_debugfs_unlock(q, memflags); } void blk_mq_sched_unreg_debugfs(struct request_queue *q) { struct blk_mq_hw_ctx *hctx; unsigned long i; blk_debugfs_lock_nomemsave(q); queue_for_each_hw_ctx(q, hctx, i) blk_mq_debugfs_unregister_sched_hctx(hctx); blk_mq_debugfs_unregister_sched(q); blk_debugfs_unlock_nomemrestore(q); } void blk_mq_free_sched_tags(struct elevator_tags *et, struct blk_mq_tag_set *set) { unsigned long i; /* Shared tags are stored at index 0 in @tags. */ if (blk_mq_is_shared_tags(set->flags)) blk_mq_free_map_and_rqs(set, et->tags[0], BLK_MQ_NO_HCTX_IDX); else { for (i = 0; i < et->nr_hw_queues; i++) blk_mq_free_map_and_rqs(set, et->tags[i], i); } kfree(et); } void blk_mq_free_sched_res(struct elevator_resources *res, struct elevator_type *type, struct blk_mq_tag_set *set) { if (res->et) { blk_mq_free_sched_tags(res->et, set); res->et = NULL; } if (res->data) { blk_mq_free_sched_data(type, res->data); res->data = NULL; } } void blk_mq_free_sched_res_batch(struct xarray *elv_tbl, struct blk_mq_tag_set *set) { struct request_queue *q; struct elv_change_ctx *ctx; lockdep_assert_held_write(&set->update_nr_hwq_lock); list_for_each_entry(q, &set->tag_list, tag_set_list) { /* * Accessing q->elevator without holding q->elevator_lock is * safe because we're holding here set->update_nr_hwq_lock in * the writer context. So, scheduler update/switch code (which * acquires the same lock but in the reader context) can't run * concurrently. */ if (q->elevator) { ctx = xa_load(elv_tbl, q->id); if (!ctx) { WARN_ON_ONCE(1); continue; } blk_mq_free_sched_res(&ctx->res, ctx->type, set); } } } void blk_mq_free_sched_ctx_batch(struct xarray *elv_tbl) { unsigned long i; struct elv_change_ctx *ctx; xa_for_each(elv_tbl, i, ctx) { xa_erase(elv_tbl, i); kfree(ctx); } } int blk_mq_alloc_sched_ctx_batch(struct xarray *elv_tbl, struct blk_mq_tag_set *set) { struct request_queue *q; struct elv_change_ctx *ctx; lockdep_assert_held_write(&set->update_nr_hwq_lock); list_for_each_entry(q, &set->tag_list, tag_set_list) { ctx = kzalloc_obj(struct elv_change_ctx); if (!ctx) return -ENOMEM; if (xa_insert(elv_tbl, q->id, ctx, GFP_KERNEL)) { kfree(ctx); return -ENOMEM; } } return 0; } struct elevator_tags *blk_mq_alloc_sched_tags(struct blk_mq_tag_set *set, unsigned int nr_hw_queues, unsigned int nr_requests) { unsigned int nr_tags; int i; struct elevator_tags *et; gfp_t gfp = GFP_NOIO | __GFP_ZERO | __GFP_NOWARN | __GFP_NORETRY; if (blk_mq_is_shared_tags(set->flags)) nr_tags = 1; else nr_tags = nr_hw_queues; et = kmalloc_flex(*et, tags, nr_tags, gfp); if (!et) return NULL; et->nr_requests = nr_requests; et->nr_hw_queues = nr_hw_queues; if (blk_mq_is_shared_tags(set->flags)) { /* Shared tags are stored at index 0 in @tags. */ et->tags[0] = blk_mq_alloc_map_and_rqs(set, BLK_MQ_NO_HCTX_IDX, MAX_SCHED_RQ); if (!et->tags[0]) goto out; } else { for (i = 0; i < et->nr_hw_queues; i++) { et->tags[i] = blk_mq_alloc_map_and_rqs(set, i, et->nr_requests); if (!et->tags[i]) goto out_unwind; } } return et; out_unwind: while (--i >= 0) blk_mq_free_map_and_rqs(set, et->tags[i], i); out: kfree(et); return NULL; } int blk_mq_alloc_sched_res(struct request_queue *q, struct elevator_type *type, struct elevator_resources *res, unsigned int nr_hw_queues) { struct blk_mq_tag_set *set = q->tag_set; res->et = blk_mq_alloc_sched_tags(set, nr_hw_queues, blk_mq_default_nr_requests(set)); if (!res->et) return -ENOMEM; res->data = blk_mq_alloc_sched_data(q, type); if (IS_ERR(res->data)) { blk_mq_free_sched_tags(res->et, set); return -ENOMEM; } return 0; } int blk_mq_alloc_sched_res_batch(struct xarray *elv_tbl, struct blk_mq_tag_set *set, unsigned int nr_hw_queues) { struct elv_change_ctx *ctx; struct request_queue *q; int ret = -ENOMEM; lockdep_assert_held_write(&set->update_nr_hwq_lock); list_for_each_entry(q, &set->tag_list, tag_set_list) { /* * Accessing q->elevator without holding q->elevator_lock is * safe because we're holding here set->update_nr_hwq_lock in * the writer context. So, scheduler update/switch code (which * acquires the same lock but in the reader context) can't run * concurrently. */ if (q->elevator) { ctx = xa_load(elv_tbl, q->id); if (WARN_ON_ONCE(!ctx)) { ret = -ENOENT; goto out_unwind; } ret = blk_mq_alloc_sched_res(q, q->elevator->type, &ctx->res, nr_hw_queues); if (ret) goto out_unwind; } } return 0; out_unwind: list_for_each_entry_continue_reverse(q, &set->tag_list, tag_set_list) { if (q->elevator) { ctx = xa_load(elv_tbl, q->id); if (ctx) blk_mq_free_sched_res(&ctx->res, ctx->type, set); } } return ret; } /* caller must have a reference to @e, will grab another one if successful */ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e, struct elevator_resources *res) { unsigned int flags = q->tag_set->flags; struct elevator_tags *et = res->et; struct blk_mq_hw_ctx *hctx; struct elevator_queue *eq; unsigned long i; int ret; eq = elevator_alloc(q, e, res); if (!eq) return -ENOMEM; q->nr_requests = et->nr_requests; if (blk_mq_is_shared_tags(flags)) { /* Shared tags are stored at index 0 in @et->tags. */ q->sched_shared_tags = et->tags[0]; blk_mq_tag_update_sched_shared_tags(q, et->nr_requests); } queue_for_each_hw_ctx(q, hctx, i) { if (blk_mq_is_shared_tags(flags)) hctx->sched_tags = q->sched_shared_tags; else hctx->sched_tags = et->tags[i]; } ret = e->ops.init_sched(q, eq); if (ret) goto out; queue_for_each_hw_ctx(q, hctx, i) { if (e->ops.init_hctx) { ret = e->ops.init_hctx(hctx, i); if (ret) { blk_mq_exit_sched(q, eq); kobject_put(&eq->kobj); return ret; } } } return 0; out: blk_mq_sched_tags_teardown(q, flags); kobject_put(&eq->kobj); q->elevator = NULL; return ret; } /* * called in either blk_queue_cleanup or elevator_switch, tagset * is required for freeing requests */ void blk_mq_sched_free_rqs(struct request_queue *q) { struct blk_mq_hw_ctx *hctx; unsigned long i; if (blk_mq_is_shared_tags(q->tag_set->flags)) { blk_mq_free_rqs(q->tag_set, q->sched_shared_tags, BLK_MQ_NO_HCTX_IDX); } else { queue_for_each_hw_ctx(q, hctx, i) { if (hctx->sched_tags) blk_mq_free_rqs(q->tag_set, hctx->sched_tags, i); } } } void blk_mq_exit_sched(struct request_queue *q, struct elevator_queue *e) { struct blk_mq_hw_ctx *hctx; unsigned long i; unsigned int flags = 0; queue_for_each_hw_ctx(q, hctx, i) { if (e->type->ops.exit_hctx && hctx->sched_data) { e->type->ops.exit_hctx(hctx, i); hctx->sched_data = NULL; } flags = hctx->flags; } if (e->type->ops.exit_sched) e->type->ops.exit_sched(e); blk_mq_sched_tags_teardown(q, flags); set_bit(ELEVATOR_FLAG_DYING, &q->elevator->flags); q->elevator = NULL; } |
| 28 28 25 2 2 2 2 2 2 2 2 70 84 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 | // SPDX-License-Identifier: GPL-2.0-or-later /* SCTP kernel implementation * (C) Copyright Red Hat Inc. 2017 * * This file is part of the SCTP kernel implementation * * These functions implement sctp stream message interleaving, mostly * including I-DATA and I-FORWARD-TSN chunks process. * * Please send any bug reports or fixes you make to the * email addresched(es): * lksctp developers <linux-sctp@vger.kernel.org> * * Written or modified by: * Xin Long <lucien.xin@gmail.com> */ #include <net/busy_poll.h> #include <net/sctp/sctp.h> #include <net/sctp/sm.h> #include <net/sctp/ulpevent.h> #include <linux/sctp.h> static struct sctp_chunk *sctp_make_idatafrag_empty( const struct sctp_association *asoc, const struct sctp_sndrcvinfo *sinfo, int len, __u8 flags, gfp_t gfp) { struct sctp_chunk *retval; struct sctp_idatahdr dp; memset(&dp, 0, sizeof(dp)); dp.stream = htons(sinfo->sinfo_stream); if (sinfo->sinfo_flags & SCTP_UNORDERED) flags |= SCTP_DATA_UNORDERED; retval = sctp_make_idata(asoc, flags, sizeof(dp) + len, gfp); if (!retval) return NULL; retval->subh.idata_hdr = sctp_addto_chunk(retval, sizeof(dp), &dp); memcpy(&retval->sinfo, sinfo, sizeof(struct sctp_sndrcvinfo)); return retval; } static void sctp_chunk_assign_mid(struct sctp_chunk *chunk) { struct sctp_stream *stream; struct sctp_chunk *lchunk; __u32 cfsn = 0; __u16 sid; if (chunk->has_mid) return; sid = sctp_chunk_stream_no(chunk); stream = &chunk->asoc->stream; list_for_each_entry(lchunk, &chunk->msg->chunks, frag_list) { struct sctp_idatahdr *hdr; __u32 mid; lchunk->has_mid = 1; hdr = lchunk->subh.idata_hdr; if (lchunk->chunk_hdr->flags & SCTP_DATA_FIRST_FRAG) hdr->ppid = lchunk->sinfo.sinfo_ppid; else hdr->fsn = htonl(cfsn++); if (lchunk->chunk_hdr->flags & SCTP_DATA_UNORDERED) { mid = lchunk->chunk_hdr->flags & SCTP_DATA_LAST_FRAG ? sctp_mid_uo_next(stream, out, sid) : sctp_mid_uo_peek(stream, out, sid); } else { mid = lchunk->chunk_hdr->flags & SCTP_DATA_LAST_FRAG ? sctp_mid_next(stream, out, sid) : sctp_mid_peek(stream, out, sid); } hdr->mid = htonl(mid); } } static bool sctp_validate_data(struct sctp_chunk *chunk) { struct sctp_stream *stream; __u16 sid, ssn; if (chunk->chunk_hdr->type != SCTP_CID_DATA) return false; if (chunk->chunk_hdr->flags & SCTP_DATA_UNORDERED) return true; stream = &chunk->asoc->stream; sid = sctp_chunk_stream_no(chunk); ssn = ntohs(chunk->subh.data_hdr->ssn); return !SSN_lt(ssn, sctp_ssn_peek(stream, in, sid)); } static bool sctp_validate_idata(struct sctp_chunk *chunk) { struct sctp_stream *stream; __u32 mid; __u16 sid; if (chunk->chunk_hdr->type != SCTP_CID_I_DATA) return false; if (chunk->chunk_hdr->flags & SCTP_DATA_UNORDERED) return true; stream = &chunk->asoc->stream; sid = sctp_chunk_stream_no(chunk); mid = ntohl(chunk->subh.idata_hdr->mid); return !MID_lt(mid, sctp_mid_peek(stream, in, sid)); } static void sctp_intl_store_reasm(struct sctp_ulpq *ulpq, struct sctp_ulpevent *event) { struct sctp_ulpevent *cevent; struct sk_buff *pos, *loc; pos = skb_peek_tail(&ulpq->reasm); if (!pos) { __skb_queue_tail(&ulpq->reasm, sctp_event2skb(event)); return; } cevent = sctp_skb2event(pos); if (event->stream == cevent->stream && event->mid == cevent->mid && (cevent->msg_flags & SCTP_DATA_FIRST_FRAG || (!(event->msg_flags & SCTP_DATA_FIRST_FRAG) && event->fsn > cevent->fsn))) { __skb_queue_tail(&ulpq->reasm, sctp_event2skb(event)); return; } if ((event->stream == cevent->stream && MID_lt(cevent->mid, event->mid)) || event->stream > cevent->stream) { __skb_queue_tail(&ulpq->reasm, sctp_event2skb(event)); return; } loc = NULL; skb_queue_walk(&ulpq->reasm, pos) { cevent = sctp_skb2event(pos); if (event->stream < cevent->stream || (event->stream == cevent->stream && MID_lt(event->mid, cevent->mid))) { loc = pos; break; } if (event->stream == cevent->stream && event->mid == cevent->mid && !(cevent->msg_flags & SCTP_DATA_FIRST_FRAG) && (event->msg_flags & SCTP_DATA_FIRST_FRAG || event->fsn < cevent->fsn)) { loc = pos; break; } } if (!loc) __skb_queue_tail(&ulpq->reasm, sctp_event2skb(event)); else __skb_queue_before(&ulpq->reasm, loc, sctp_event2skb(event)); } static struct sctp_ulpevent *sctp_intl_retrieve_partial( struct sctp_ulpq *ulpq, struct sctp_ulpevent *event) { struct sk_buff *first_frag = NULL; struct sk_buff *last_frag = NULL; struct sctp_ulpevent *retval; struct sctp_stream_in *sin; struct sk_buff *pos; __u32 next_fsn = 0; int is_last = 0; sin = sctp_stream_in(&ulpq->asoc->stream, event->stream); skb_queue_walk(&ulpq->reasm, pos) { struct sctp_ulpevent *cevent = sctp_skb2event(pos); if (cevent->stream < event->stream) continue; if (cevent->stream > event->stream || cevent->mid != sin->mid) break; switch (cevent->msg_flags & SCTP_DATA_FRAG_MASK) { case SCTP_DATA_FIRST_FRAG: goto out; case SCTP_DATA_MIDDLE_FRAG: if (!first_frag) { if (cevent->fsn == sin->fsn) { first_frag = pos; last_frag = pos; next_fsn = cevent->fsn + 1; } } else if (cevent->fsn == next_fsn) { last_frag = pos; next_fsn++; } else { goto out; } break; case SCTP_DATA_LAST_FRAG: if (!first_frag) { if (cevent->fsn == sin->fsn) { first_frag = pos; last_frag = pos; next_fsn = 0; is_last = 1; } } else if (cevent->fsn == next_fsn) { last_frag = pos; next_fsn = 0; is_last = 1; } goto out; default: goto out; } } out: if (!first_frag) return NULL; retval = sctp_make_reassembled_event(ulpq->asoc->base.net, &ulpq->reasm, first_frag, last_frag); if (retval) { sin->fsn = next_fsn; if (is_last) { retval->msg_flags |= MSG_EOR; sin->pd_mode = 0; } } return retval; } static struct sctp_ulpevent *sctp_intl_retrieve_reassembled( struct sctp_ulpq *ulpq, struct sctp_ulpevent *event) { struct sctp_association *asoc = ulpq->asoc; struct sk_buff *pos, *first_frag = NULL; struct sctp_ulpevent *retval = NULL; struct sk_buff *pd_first = NULL; struct sk_buff *pd_last = NULL; struct sctp_stream_in *sin; __u32 next_fsn = 0; __u32 pd_point = 0; __u32 pd_len = 0; __u32 mid = 0; sin = sctp_stream_in(&ulpq->asoc->stream, event->stream); skb_queue_walk(&ulpq->reasm, pos) { struct sctp_ulpevent *cevent = sctp_skb2event(pos); if (cevent->stream < event->stream) continue; if (cevent->stream > event->stream) break; if (MID_lt(cevent->mid, event->mid)) continue; if (MID_lt(event->mid, cevent->mid)) break; switch (cevent->msg_flags & SCTP_DATA_FRAG_MASK) { case SCTP_DATA_FIRST_FRAG: if (cevent->mid == sin->mid) { pd_first = pos; pd_last = pos; pd_len = pos->len; } first_frag = pos; next_fsn = 0; mid = cevent->mid; break; case SCTP_DATA_MIDDLE_FRAG: if (first_frag && cevent->mid == mid && cevent->fsn == next_fsn) { next_fsn++; if (pd_first) { pd_last = pos; pd_len += pos->len; } } else { first_frag = NULL; } break; case SCTP_DATA_LAST_FRAG: if (first_frag && cevent->mid == mid && cevent->fsn == next_fsn) goto found; else first_frag = NULL; break; } } if (!pd_first) goto out; pd_point = sctp_sk(asoc->base.sk)->pd_point; if (pd_point && pd_point <= pd_len) { retval = sctp_make_reassembled_event(asoc->base.net, &ulpq->reasm, pd_first, pd_last); if (retval) { sin->fsn = next_fsn; sin->pd_mode = 1; } } goto out; found: retval = sctp_make_reassembled_event(asoc->base.net, &ulpq->reasm, first_frag, pos); if (retval) retval->msg_flags |= MSG_EOR; out: return retval; } static struct sctp_ulpevent *sctp_intl_reasm(struct sctp_ulpq *ulpq, struct sctp_ulpevent *event) { struct sctp_ulpevent *retval = NULL; struct sctp_stream_in *sin; if (SCTP_DATA_NOT_FRAG == (event->msg_flags & SCTP_DATA_FRAG_MASK)) { event->msg_flags |= MSG_EOR; return event; } sctp_intl_store_reasm(ulpq, event); sin = sctp_stream_in(&ulpq->asoc->stream, event->stream); if (sin->pd_mode && event->mid == sin->mid && event->fsn == sin->fsn) retval = sctp_intl_retrieve_partial(ulpq, event); if (!retval) retval = sctp_intl_retrieve_reassembled(ulpq, event); return retval; } static void sctp_intl_store_ordered(struct sctp_ulpq *ulpq, struct sctp_ulpevent *event) { struct sctp_ulpevent *cevent; struct sk_buff *pos, *loc; pos = skb_peek_tail(&ulpq->lobby); if (!pos) { __skb_queue_tail(&ulpq->lobby, sctp_event2skb(event)); return; } cevent = (struct sctp_ulpevent *)pos->cb; if (event->stream == cevent->stream && MID_lt(cevent->mid, event->mid)) { __skb_queue_tail(&ulpq->lobby, sctp_event2skb(event)); return; } if (event->stream > cevent->stream) { __skb_queue_tail(&ulpq->lobby, sctp_event2skb(event)); return; } loc = NULL; skb_queue_walk(&ulpq->lobby, pos) { cevent = (struct sctp_ulpevent *)pos->cb; if (cevent->stream > event->stream) { loc = pos; break; } if (cevent->stream == event->stream && MID_lt(event->mid, cevent->mid)) { loc = pos; break; } } if (!loc) __skb_queue_tail(&ulpq->lobby, sctp_event2skb(event)); else __skb_queue_before(&ulpq->lobby, loc, sctp_event2skb(event)); } static void sctp_intl_retrieve_ordered(struct sctp_ulpq *ulpq, struct sctp_ulpevent *event) { struct sk_buff_head *event_list; struct sctp_stream *stream; struct sk_buff *pos, *tmp; __u16 sid = event->stream; stream = &ulpq->asoc->stream; event_list = (struct sk_buff_head *)sctp_event2skb(event)->prev; sctp_skb_for_each(pos, &ulpq->lobby, tmp) { struct sctp_ulpevent *cevent = (struct sctp_ulpevent *)pos->cb; if (cevent->stream > sid) break; if (cevent->stream < sid) continue; if (cevent->mid != sctp_mid_peek(stream, in, sid)) break; sctp_mid_next(stream, in, sid); __skb_unlink(pos, &ulpq->lobby); __skb_queue_tail(event_list, pos); } } static struct sctp_ulpevent *sctp_intl_order(struct sctp_ulpq *ulpq, struct sctp_ulpevent *event) { struct sctp_stream *stream; __u16 sid; stream = &ulpq->asoc->stream; sid = event->stream; if (event->mid != sctp_mid_peek(stream, in, sid)) { sctp_intl_store_ordered(ulpq, event); return NULL; } sctp_mid_next(stream, in, sid); sctp_intl_retrieve_ordered(ulpq, event); return event; } static int sctp_enqueue_event(struct sctp_ulpq *ulpq, struct sk_buff_head *skb_list) { struct sock *sk = ulpq->asoc->base.sk; struct sctp_sock *sp = sctp_sk(sk); struct sctp_ulpevent *event; struct sk_buff *skb; skb = __skb_peek(skb_list); event = sctp_skb2event(skb); if (sk->sk_shutdown & RCV_SHUTDOWN && (sk->sk_shutdown & SEND_SHUTDOWN || !sctp_ulpevent_is_notification(event))) goto out_free; if (!sctp_ulpevent_is_notification(event)) { sk_mark_napi_id(sk, skb); sk_incoming_cpu_update(sk); } if (!sctp_ulpevent_is_enabled(event, ulpq->asoc->subscribe)) goto out_free; skb_queue_splice_tail_init(skb_list, &sk->sk_receive_queue); if (!sp->data_ready_signalled) { sp->data_ready_signalled = 1; sk->sk_data_ready(sk); } return 1; out_free: sctp_queue_purge_ulpevents(skb_list); return 0; } static void sctp_intl_store_reasm_uo(struct sctp_ulpq *ulpq, struct sctp_ulpevent *event) { struct sctp_ulpevent *cevent; struct sk_buff *pos; pos = skb_peek_tail(&ulpq->reasm_uo); if (!pos) { __skb_queue_tail(&ulpq->reasm_uo, sctp_event2skb(event)); return; } cevent = sctp_skb2event(pos); if (event->stream == cevent->stream && event->mid == cevent->mid && (cevent->msg_flags & SCTP_DATA_FIRST_FRAG || (!(event->msg_flags & SCTP_DATA_FIRST_FRAG) && event->fsn > cevent->fsn))) { __skb_queue_tail(&ulpq->reasm_uo, sctp_event2skb(event)); return; } if ((event->stream == cevent->stream && MID_lt(cevent->mid, event->mid)) || event->stream > cevent->stream) { __skb_queue_tail(&ulpq->reasm_uo, sctp_event2skb(event)); return; } skb_queue_walk(&ulpq->reasm_uo, pos) { cevent = sctp_skb2event(pos); if (event->stream < cevent->stream || (event->stream == cevent->stream && MID_lt(event->mid, cevent->mid))) break; if (event->stream == cevent->stream && event->mid == cevent->mid && !(cevent->msg_flags & SCTP_DATA_FIRST_FRAG) && (event->msg_flags & SCTP_DATA_FIRST_FRAG || event->fsn < cevent->fsn)) break; } __skb_queue_before(&ulpq->reasm_uo, pos, sctp_event2skb(event)); } static struct sctp_ulpevent *sctp_intl_retrieve_partial_uo( struct sctp_ulpq *ulpq, struct sctp_ulpevent *event) { struct sk_buff *first_frag = NULL; struct sk_buff *last_frag = NULL; struct sctp_ulpevent *retval; struct sctp_stream_in *sin; struct sk_buff *pos; __u32 next_fsn = 0; int is_last = 0; sin = sctp_stream_in(&ulpq->asoc->stream, event->stream); skb_queue_walk(&ulpq->reasm_uo, pos) { struct sctp_ulpevent *cevent = sctp_skb2event(pos); if (cevent->stream < event->stream) continue; if (cevent->stream > event->stream) break; if (MID_lt(cevent->mid, sin->mid_uo)) continue; if (MID_lt(sin->mid_uo, cevent->mid)) break; switch (cevent->msg_flags & SCTP_DATA_FRAG_MASK) { case SCTP_DATA_FIRST_FRAG: goto out; case SCTP_DATA_MIDDLE_FRAG: if (!first_frag) { if (cevent->fsn == sin->fsn_uo) { first_frag = pos; last_frag = pos; next_fsn = cevent->fsn + 1; } } else if (cevent->fsn == next_fsn) { last_frag = pos; next_fsn++; } else { goto out; } break; case SCTP_DATA_LAST_FRAG: if (!first_frag) { if (cevent->fsn == sin->fsn_uo) { first_frag = pos; last_frag = pos; next_fsn = 0; is_last = 1; } } else if (cevent->fsn == next_fsn) { last_frag = pos; next_fsn = 0; is_last = 1; } goto out; default: goto out; } } out: if (!first_frag) return NULL; retval = sctp_make_reassembled_event(ulpq->asoc->base.net, &ulpq->reasm_uo, first_frag, last_frag); if (retval) { sin->fsn_uo = next_fsn; if (is_last) { retval->msg_flags |= MSG_EOR; sin->pd_mode_uo = 0; } } return retval; } static struct sctp_ulpevent *sctp_intl_retrieve_reassembled_uo( struct sctp_ulpq *ulpq, struct sctp_ulpevent *event) { struct sctp_association *asoc = ulpq->asoc; struct sk_buff *pos, *first_frag = NULL; struct sctp_ulpevent *retval = NULL; struct sk_buff *pd_first = NULL; struct sk_buff *pd_last = NULL; struct sctp_stream_in *sin; __u32 next_fsn = 0; __u32 pd_point = 0; __u32 pd_len = 0; __u32 mid = 0; sin = sctp_stream_in(&ulpq->asoc->stream, event->stream); skb_queue_walk(&ulpq->reasm_uo, pos) { struct sctp_ulpevent *cevent = sctp_skb2event(pos); if (cevent->stream < event->stream) continue; if (cevent->stream > event->stream) break; if (MID_lt(cevent->mid, event->mid)) continue; if (MID_lt(event->mid, cevent->mid)) break; switch (cevent->msg_flags & SCTP_DATA_FRAG_MASK) { case SCTP_DATA_FIRST_FRAG: if (!sin->pd_mode_uo) { sin->mid_uo = cevent->mid; pd_first = pos; pd_last = pos; pd_len = pos->len; } first_frag = pos; next_fsn = 0; mid = cevent->mid; break; case SCTP_DATA_MIDDLE_FRAG: if (first_frag && cevent->mid == mid && cevent->fsn == next_fsn) { next_fsn++; if (pd_first) { pd_last = pos; pd_len += pos->len; } } else { first_frag = NULL; } break; case SCTP_DATA_LAST_FRAG: if (first_frag && cevent->mid == mid && cevent->fsn == next_fsn) goto found; else first_frag = NULL; break; } } if (!pd_first) goto out; pd_point = sctp_sk(asoc->base.sk)->pd_point; if (pd_point && pd_point <= pd_len) { retval = sctp_make_reassembled_event(asoc->base.net, &ulpq->reasm_uo, pd_first, pd_last); if (retval) { sin->fsn_uo = next_fsn; sin->pd_mode_uo = 1; } } goto out; found: retval = sctp_make_reassembled_event(asoc->base.net, &ulpq->reasm_uo, first_frag, pos); if (retval) retval->msg_flags |= MSG_EOR; out: return retval; } static struct sctp_ulpevent *sctp_intl_reasm_uo(struct sctp_ulpq *ulpq, struct sctp_ulpevent *event) { struct sctp_ulpevent *retval = NULL; struct sctp_stream_in *sin; if (SCTP_DATA_NOT_FRAG == (event->msg_flags & SCTP_DATA_FRAG_MASK)) { event->msg_flags |= MSG_EOR; return event; } sctp_intl_store_reasm_uo(ulpq, event); sin = sctp_stream_in(&ulpq->asoc->stream, event->stream); if (sin->pd_mode_uo && event->mid == sin->mid_uo && event->fsn == sin->fsn_uo) retval = sctp_intl_retrieve_partial_uo(ulpq, event); if (!retval) retval = sctp_intl_retrieve_reassembled_uo(ulpq, event); return retval; } static struct sctp_ulpevent *sctp_intl_retrieve_first_uo(struct sctp_ulpq *ulpq) { struct sctp_stream_in *csin, *sin = NULL; struct sk_buff *first_frag = NULL; struct sk_buff *last_frag = NULL; struct sctp_ulpevent *retval; struct sk_buff *pos; __u32 next_fsn = 0; __u16 sid = 0; skb_queue_walk(&ulpq->reasm_uo, pos) { struct sctp_ulpevent *cevent = sctp_skb2event(pos); csin = sctp_stream_in(&ulpq->asoc->stream, cevent->stream); if (csin->pd_mode_uo) continue; switch (cevent->msg_flags & SCTP_DATA_FRAG_MASK) { case SCTP_DATA_FIRST_FRAG: if (first_frag) goto out; first_frag = pos; last_frag = pos; next_fsn = 0; sin = csin; sid = cevent->stream; sin->mid_uo = cevent->mid; break; case SCTP_DATA_MIDDLE_FRAG: if (!first_frag) break; if (cevent->stream == sid && cevent->mid == sin->mid_uo && cevent->fsn == next_fsn) { next_fsn++; last_frag = pos; } else { goto out; } break; case SCTP_DATA_LAST_FRAG: if (first_frag) goto out; break; default: break; } } if (!first_frag) return NULL; out: retval = sctp_make_reassembled_event(ulpq->asoc->base.net, &ulpq->reasm_uo, first_frag, last_frag); if (retval) { sin->fsn_uo = next_fsn; sin->pd_mode_uo = 1; } return retval; } static int sctp_ulpevent_idata(struct sctp_ulpq *ulpq, struct sctp_chunk *chunk, gfp_t gfp) { struct sctp_ulpevent *event; struct sk_buff_head temp; int event_eor = 0; event = sctp_ulpevent_make_rcvmsg(chunk->asoc, chunk, gfp); if (!event) return -ENOMEM; event->mid = ntohl(chunk->subh.idata_hdr->mid); if (event->msg_flags & SCTP_DATA_FIRST_FRAG) event->ppid = chunk->subh.idata_hdr->ppid; else event->fsn = ntohl(chunk->subh.idata_hdr->fsn); if (!(event->msg_flags & SCTP_DATA_UNORDERED)) { event = sctp_intl_reasm(ulpq, event); if (event) { skb_queue_head_init(&temp); __skb_queue_tail(&temp, sctp_event2skb(event)); if (event->msg_flags & MSG_EOR) event = sctp_intl_order(ulpq, event); } } else { event = sctp_intl_reasm_uo(ulpq, event); if (event) { skb_queue_head_init(&temp); __skb_queue_tail(&temp, sctp_event2skb(event)); } } if (event) { event_eor = (event->msg_flags & MSG_EOR) ? 1 : 0; sctp_enqueue_event(ulpq, &temp); } return event_eor; } static struct sctp_ulpevent *sctp_intl_retrieve_first(struct sctp_ulpq *ulpq) { struct sctp_stream_in *csin, *sin = NULL; struct sk_buff *first_frag = NULL; struct sk_buff *last_frag = NULL; struct sctp_ulpevent *retval; struct sk_buff *pos; __u32 next_fsn = 0; __u16 sid = 0; skb_queue_walk(&ulpq->reasm, pos) { struct sctp_ulpevent *cevent = sctp_skb2event(pos); csin = sctp_stream_in(&ulpq->asoc->stream, cevent->stream); if (csin->pd_mode) continue; switch (cevent->msg_flags & SCTP_DATA_FRAG_MASK) { case SCTP_DATA_FIRST_FRAG: if (first_frag) goto out; if (cevent->mid == csin->mid) { first_frag = pos; last_frag = pos; next_fsn = 0; sin = csin; sid = cevent->stream; } break; case SCTP_DATA_MIDDLE_FRAG: if (!first_frag) break; if (cevent->stream == sid && cevent->mid == sin->mid && cevent->fsn == next_fsn) { next_fsn++; last_frag = pos; } else { goto out; } break; case SCTP_DATA_LAST_FRAG: if (first_frag) goto out; break; default: break; } } if (!first_frag) return NULL; out: retval = sctp_make_reassembled_event(ulpq->asoc->base.net, &ulpq->reasm, first_frag, last_frag); if (retval) { sin->fsn = next_fsn; sin->pd_mode = 1; } return retval; } static void sctp_intl_start_pd(struct sctp_ulpq *ulpq, gfp_t gfp) { struct sctp_ulpevent *event; struct sk_buff_head temp; if (!skb_queue_empty(&ulpq->reasm)) { do { event = sctp_intl_retrieve_first(ulpq); if (event) { skb_queue_head_init(&temp); __skb_queue_tail(&temp, sctp_event2skb(event)); sctp_enqueue_event(ulpq, &temp); } } while (event); } if (!skb_queue_empty(&ulpq->reasm_uo)) { do { event = sctp_intl_retrieve_first_uo(ulpq); if (event) { skb_queue_head_init(&temp); __skb_queue_tail(&temp, sctp_event2skb(event)); sctp_enqueue_event(ulpq, &temp); } } while (event); } } static void sctp_renege_events(struct sctp_ulpq *ulpq, struct sctp_chunk *chunk, gfp_t gfp) { struct sctp_association *asoc = ulpq->asoc; __u32 freed = 0; __u16 needed; needed = ntohs(chunk->chunk_hdr->length) - sizeof(struct sctp_idata_chunk); if (skb_queue_empty(&asoc->base.sk->sk_receive_queue)) { freed = sctp_ulpq_renege_list(ulpq, &ulpq->lobby, needed); if (freed < needed) freed += sctp_ulpq_renege_list(ulpq, &ulpq->reasm, needed); if (freed < needed) freed += sctp_ulpq_renege_list(ulpq, &ulpq->reasm_uo, needed); } if (freed >= needed && sctp_ulpevent_idata(ulpq, chunk, gfp) <= 0) sctp_intl_start_pd(ulpq, gfp); } static void sctp_intl_stream_abort_pd(struct sctp_ulpq *ulpq, __u16 sid, __u32 mid, __u16 flags, gfp_t gfp) { struct sock *sk = ulpq->asoc->base.sk; struct sctp_ulpevent *ev = NULL; if (!sctp_ulpevent_type_enabled(ulpq->asoc->subscribe, SCTP_PARTIAL_DELIVERY_EVENT)) return; ev = sctp_ulpevent_make_pdapi(ulpq->asoc, SCTP_PARTIAL_DELIVERY_ABORTED, sid, mid, flags, gfp); if (ev) { struct sctp_sock *sp = sctp_sk(sk); __skb_queue_tail(&sk->sk_receive_queue, sctp_event2skb(ev)); if (!sp->data_ready_signalled) { sp->data_ready_signalled = 1; sk->sk_data_ready(sk); } } } static void sctp_intl_reap_ordered(struct sctp_ulpq *ulpq, __u16 sid) { struct sctp_stream *stream = &ulpq->asoc->stream; struct sctp_ulpevent *cevent, *event = NULL; struct sk_buff_head *lobby = &ulpq->lobby; struct sk_buff *pos, *tmp; struct sk_buff_head temp; __u16 csid; __u32 cmid; skb_queue_head_init(&temp); sctp_skb_for_each(pos, lobby, tmp) { cevent = (struct sctp_ulpevent *)pos->cb; csid = cevent->stream; cmid = cevent->mid; if (csid > sid) break; if (csid < sid) continue; if (!MID_lt(cmid, sctp_mid_peek(stream, in, csid))) break; __skb_unlink(pos, lobby); if (!event) event = sctp_skb2event(pos); __skb_queue_tail(&temp, pos); } if (!event && pos != (struct sk_buff *)lobby) { cevent = (struct sctp_ulpevent *)pos->cb; csid = cevent->stream; cmid = cevent->mid; if (csid == sid && cmid == sctp_mid_peek(stream, in, csid)) { sctp_mid_next(stream, in, csid); __skb_unlink(pos, lobby); __skb_queue_tail(&temp, pos); event = sctp_skb2event(pos); } } if (event) { sctp_intl_retrieve_ordered(ulpq, event); sctp_enqueue_event(ulpq, &temp); } } static void sctp_intl_abort_pd(struct sctp_ulpq *ulpq, gfp_t gfp) { struct sctp_stream *stream = &ulpq->asoc->stream; __u16 sid; for (sid = 0; sid < stream->incnt; sid++) { struct sctp_stream_in *sin = SCTP_SI(stream, sid); __u32 mid; if (sin->pd_mode_uo) { sin->pd_mode_uo = 0; mid = sin->mid_uo; sctp_intl_stream_abort_pd(ulpq, sid, mid, 0x1, gfp); } if (sin->pd_mode) { sin->pd_mode = 0; mid = sin->mid; sctp_intl_stream_abort_pd(ulpq, sid, mid, 0, gfp); sctp_mid_skip(stream, in, sid, mid); sctp_intl_reap_ordered(ulpq, sid); } } /* intl abort pd happens only when all data needs to be cleaned */ sctp_ulpq_flush(ulpq); } static inline int sctp_get_skip_pos(struct sctp_ifwdtsn_skip *skiplist, int nskips, __be16 stream, __u8 flags) { int i; for (i = 0; i < nskips; i++) if (skiplist[i].stream == stream && skiplist[i].flags == flags) return i; return i; } #define SCTP_FTSN_U_BIT 0x1 static void sctp_generate_iftsn(struct sctp_outq *q, __u32 ctsn) { struct sctp_ifwdtsn_skip ftsn_skip_arr[10]; struct sctp_association *asoc = q->asoc; struct sctp_chunk *ftsn_chunk = NULL; struct list_head *lchunk, *temp; int nskips = 0, skip_pos; struct sctp_chunk *chunk; __u32 tsn; if (!asoc->peer.prsctp_capable) return; if (TSN_lt(asoc->adv_peer_ack_point, ctsn)) asoc->adv_peer_ack_point = ctsn; list_for_each_safe(lchunk, temp, &q->abandoned) { chunk = list_entry(lchunk, struct sctp_chunk, transmitted_list); tsn = ntohl(chunk->subh.data_hdr->tsn); if (TSN_lte(tsn, ctsn)) { list_del_init(lchunk); sctp_chunk_free(chunk); } else if (TSN_lte(tsn, asoc->adv_peer_ack_point + 1)) { __be16 sid = chunk->subh.idata_hdr->stream; __be32 mid = chunk->subh.idata_hdr->mid; __u8 flags = 0; if (chunk->chunk_hdr->flags & SCTP_DATA_UNORDERED) flags |= SCTP_FTSN_U_BIT; asoc->adv_peer_ack_point = tsn; skip_pos = sctp_get_skip_pos(&ftsn_skip_arr[0], nskips, sid, flags); ftsn_skip_arr[skip_pos].stream = sid; ftsn_skip_arr[skip_pos].reserved = 0; ftsn_skip_arr[skip_pos].flags = flags; ftsn_skip_arr[skip_pos].mid = mid; if (skip_pos == nskips) nskips++; if (nskips == 10) break; } else { break; } } if (asoc->adv_peer_ack_point > ctsn) ftsn_chunk = sctp_make_ifwdtsn(asoc, asoc->adv_peer_ack_point, nskips, &ftsn_skip_arr[0]); if (ftsn_chunk) { list_add_tail(&ftsn_chunk->list, &q->control_chunk_list); SCTP_INC_STATS(asoc->base.net, SCTP_MIB_OUTCTRLCHUNKS); } } #define _sctp_walk_ifwdtsn(pos, chunk, end) \ for (pos = (void *)(chunk->subh.ifwdtsn_hdr + 1); \ (void *)pos <= (void *)(chunk->subh.ifwdtsn_hdr + 1) + (end) - \ sizeof(struct sctp_ifwdtsn_skip); pos++) #define sctp_walk_ifwdtsn(pos, ch) \ _sctp_walk_ifwdtsn((pos), (ch), ntohs((ch)->chunk_hdr->length) - \ sizeof(struct sctp_ifwdtsn_chunk)) static bool sctp_validate_fwdtsn(struct sctp_chunk *chunk) { struct sctp_fwdtsn_skip *skip; __u16 incnt; if (chunk->chunk_hdr->type != SCTP_CID_FWD_TSN) return false; incnt = chunk->asoc->stream.incnt; sctp_walk_fwdtsn(skip, chunk) if (ntohs(skip->stream) >= incnt) return false; return true; } static bool sctp_validate_iftsn(struct sctp_chunk *chunk) { struct sctp_ifwdtsn_skip *skip; __u16 incnt; if (chunk->chunk_hdr->type != SCTP_CID_I_FWD_TSN) return false; incnt = chunk->asoc->stream.incnt; sctp_walk_ifwdtsn(skip, chunk) if (ntohs(skip->stream) >= incnt) return false; return true; } static void sctp_report_fwdtsn(struct sctp_ulpq *ulpq, __u32 ftsn) { /* Move the Cumulattive TSN Ack ahead. */ sctp_tsnmap_skip(&ulpq->asoc->peer.tsn_map, ftsn); /* purge the fragmentation queue */ sctp_ulpq_reasm_flushtsn(ulpq, ftsn); /* Abort any in progress partial delivery. */ sctp_ulpq_abort_pd(ulpq, GFP_ATOMIC); } static void sctp_intl_reasm_flushtsn(struct sctp_ulpq *ulpq, __u32 ftsn) { struct sk_buff *pos, *tmp; skb_queue_walk_safe(&ulpq->reasm, pos, tmp) { struct sctp_ulpevent *event = sctp_skb2event(pos); __u32 tsn = event->tsn; if (TSN_lte(tsn, ftsn)) { __skb_unlink(pos, &ulpq->reasm); sctp_ulpevent_free(event); } } skb_queue_walk_safe(&ulpq->reasm_uo, pos, tmp) { struct sctp_ulpevent *event = sctp_skb2event(pos); __u32 tsn = event->tsn; if (TSN_lte(tsn, ftsn)) { __skb_unlink(pos, &ulpq->reasm_uo); sctp_ulpevent_free(event); } } } static void sctp_report_iftsn(struct sctp_ulpq *ulpq, __u32 ftsn) { /* Move the Cumulattive TSN Ack ahead. */ sctp_tsnmap_skip(&ulpq->asoc->peer.tsn_map, ftsn); /* purge the fragmentation queue */ sctp_intl_reasm_flushtsn(ulpq, ftsn); /* abort only when it's for all data */ if (ftsn == sctp_tsnmap_get_max_tsn_seen(&ulpq->asoc->peer.tsn_map)) sctp_intl_abort_pd(ulpq, GFP_ATOMIC); } static void sctp_handle_fwdtsn(struct sctp_ulpq *ulpq, struct sctp_chunk *chunk) { struct sctp_fwdtsn_skip *skip; /* Walk through all the skipped SSNs */ sctp_walk_fwdtsn(skip, chunk) sctp_ulpq_skip(ulpq, ntohs(skip->stream), ntohs(skip->ssn)); } static void sctp_intl_skip(struct sctp_ulpq *ulpq, __u16 sid, __u32 mid, __u8 flags) { struct sctp_stream_in *sin = sctp_stream_in(&ulpq->asoc->stream, sid); struct sctp_stream *stream = &ulpq->asoc->stream; if (flags & SCTP_FTSN_U_BIT) { if (sin->pd_mode_uo && MID_lt(sin->mid_uo, mid)) { sin->pd_mode_uo = 0; sctp_intl_stream_abort_pd(ulpq, sid, mid, 0x1, GFP_ATOMIC); } return; } if (MID_lt(mid, sctp_mid_peek(stream, in, sid))) return; if (sin->pd_mode) { sin->pd_mode = 0; sctp_intl_stream_abort_pd(ulpq, sid, mid, 0x0, GFP_ATOMIC); } sctp_mid_skip(stream, in, sid, mid); sctp_intl_reap_ordered(ulpq, sid); } static void sctp_handle_iftsn(struct sctp_ulpq *ulpq, struct sctp_chunk *chunk) { struct sctp_ifwdtsn_skip *skip; /* Walk through all the skipped MIDs and abort stream pd if possible */ sctp_walk_ifwdtsn(skip, chunk) sctp_intl_skip(ulpq, ntohs(skip->stream), ntohl(skip->mid), skip->flags); } static int do_ulpq_tail_event(struct sctp_ulpq *ulpq, struct sctp_ulpevent *event) { struct sk_buff_head temp; skb_queue_head_init(&temp); __skb_queue_tail(&temp, sctp_event2skb(event)); return sctp_ulpq_tail_event(ulpq, &temp); } static struct sctp_stream_interleave sctp_stream_interleave_0 = { .data_chunk_len = sizeof(struct sctp_data_chunk), .ftsn_chunk_len = sizeof(struct sctp_fwdtsn_chunk), /* DATA process functions */ .make_datafrag = sctp_make_datafrag_empty, .assign_number = sctp_chunk_assign_ssn, .validate_data = sctp_validate_data, .ulpevent_data = sctp_ulpq_tail_data, .enqueue_event = do_ulpq_tail_event, .renege_events = sctp_ulpq_renege, .start_pd = sctp_ulpq_partial_delivery, .abort_pd = sctp_ulpq_abort_pd, /* FORWARD-TSN process functions */ .generate_ftsn = sctp_generate_fwdtsn, .validate_ftsn = sctp_validate_fwdtsn, .report_ftsn = sctp_report_fwdtsn, .handle_ftsn = sctp_handle_fwdtsn, }; static int do_sctp_enqueue_event(struct sctp_ulpq *ulpq, struct sctp_ulpevent *event) { struct sk_buff_head temp; skb_queue_head_init(&temp); __skb_queue_tail(&temp, sctp_event2skb(event)); return sctp_enqueue_event(ulpq, &temp); } static struct sctp_stream_interleave sctp_stream_interleave_1 = { .data_chunk_len = sizeof(struct sctp_idata_chunk), .ftsn_chunk_len = sizeof(struct sctp_ifwdtsn_chunk), /* I-DATA process functions */ .make_datafrag = sctp_make_idatafrag_empty, .assign_number = sctp_chunk_assign_mid, .validate_data = sctp_validate_idata, .ulpevent_data = sctp_ulpevent_idata, .enqueue_event = do_sctp_enqueue_event, .renege_events = sctp_renege_events, .start_pd = sctp_intl_start_pd, .abort_pd = sctp_intl_abort_pd, /* I-FORWARD-TSN process functions */ .generate_ftsn = sctp_generate_iftsn, .validate_ftsn = sctp_validate_iftsn, .report_ftsn = sctp_report_iftsn, .handle_ftsn = sctp_handle_iftsn, }; void sctp_stream_interleave_init(struct sctp_stream *stream) { struct sctp_association *asoc; asoc = container_of(stream, struct sctp_association, stream); stream->si = asoc->peer.intl_capable ? &sctp_stream_interleave_1 : &sctp_stream_interleave_0; } |
| 10 10 10 10 10 9 10 10 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 | // SPDX-License-Identifier: GPL-2.0 #include <linux/slab.h> #include <linux/lockdep.h> #include <linux/sysfs.h> #include <linux/kobject.h> #include <linux/memory.h> #include <linux/memory-tiers.h> #include <linux/notifier.h> #include <linux/sched/sysctl.h> #include "internal.h" struct memory_tier { /* hierarchy of memory tiers */ struct list_head list; /* list of all memory types part of this tier */ struct list_head memory_types; /* * start value of abstract distance. memory tier maps * an abstract distance range, * adistance_start .. adistance_start + MEMTIER_CHUNK_SIZE */ int adistance_start; struct device dev; /* All the nodes that are part of all the lower memory tiers. */ nodemask_t lower_tier_mask; }; struct demotion_nodes { nodemask_t preferred; }; struct node_memory_type_map { struct memory_dev_type *memtype; int map_count; }; static DEFINE_MUTEX(memory_tier_lock); static LIST_HEAD(memory_tiers); /* * The list is used to store all memory types that are not created * by a device driver. */ static LIST_HEAD(default_memory_types); static struct node_memory_type_map node_memory_types[MAX_NUMNODES]; struct memory_dev_type *default_dram_type; nodemask_t default_dram_nodes __initdata = NODE_MASK_NONE; static const struct bus_type memory_tier_subsys = { .name = "memory_tiering", .dev_name = "memory_tier", }; #ifdef CONFIG_NUMA_BALANCING /** * folio_use_access_time - check if a folio reuses cpupid for page access time * @folio: folio to check * * folio's _last_cpupid field is repurposed by memory tiering. In memory * tiering mode, cpupid of slow memory folio (not toptier memory) is used to * record page access time. * * Return: the folio _last_cpupid is used to record page access time */ bool folio_use_access_time(struct folio *folio) { return (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) && !node_is_toptier(folio_nid(folio)); } #endif #ifdef CONFIG_MIGRATION static int top_tier_adistance; /* * node_demotion[] examples: * * Example 1: * * Node 0 & 1 are CPU + DRAM nodes, node 2 & 3 are PMEM nodes. * * node distances: * node 0 1 2 3 * 0 10 20 30 40 * 1 20 10 40 30 * 2 30 40 10 40 * 3 40 30 40 10 * * memory_tiers0 = 0-1 * memory_tiers1 = 2-3 * * node_demotion[0].preferred = 2 * node_demotion[1].preferred = 3 * node_demotion[2].preferred = <empty> * node_demotion[3].preferred = <empty> * * Example 2: * * Node 0 & 1 are CPU + DRAM nodes, node 2 is memory-only DRAM node. * * node distances: * node 0 1 2 * 0 10 20 30 * 1 20 10 30 * 2 30 30 10 * * memory_tiers0 = 0-2 * * node_demotion[0].preferred = <empty> * node_demotion[1].preferred = <empty> * node_demotion[2].preferred = <empty> * * Example 3: * * Node 0 is CPU + DRAM nodes, Node 1 is HBM node, node 2 is PMEM node. * * node distances: * node 0 1 2 * 0 10 20 30 * 1 20 10 40 * 2 30 40 10 * * memory_tiers0 = 1 * memory_tiers1 = 0 * memory_tiers2 = 2 * * node_demotion[0].preferred = 2 * node_demotion[1].preferred = 0 * node_demotion[2].preferred = <empty> * */ static struct demotion_nodes *node_demotion __read_mostly; #endif /* CONFIG_MIGRATION */ static BLOCKING_NOTIFIER_HEAD(mt_adistance_algorithms); /* The lock is used to protect `default_dram_perf*` info and nid. */ static DEFINE_MUTEX(default_dram_perf_lock); static bool default_dram_perf_error; static struct access_coordinate default_dram_perf; static int default_dram_perf_ref_nid = NUMA_NO_NODE; static const char *default_dram_perf_ref_source; static inline struct memory_tier *to_memory_tier(struct device *device) { return container_of(device, struct memory_tier, dev); } static __always_inline nodemask_t get_memtier_nodemask(struct memory_tier *memtier) { nodemask_t nodes = NODE_MASK_NONE; struct memory_dev_type *memtype; list_for_each_entry(memtype, &memtier->memory_types, tier_sibling) nodes_or(nodes, nodes, memtype->nodes); return nodes; } static void memory_tier_device_release(struct device *dev) { struct memory_tier *tier = to_memory_tier(dev); /* * synchronize_rcu in clear_node_memory_tier makes sure * we don't have rcu access to this memory tier. */ kfree(tier); } static ssize_t nodelist_show(struct device *dev, struct device_attribute *attr, char *buf) { int ret; nodemask_t nmask; mutex_lock(&memory_tier_lock); nmask = get_memtier_nodemask(to_memory_tier(dev)); ret = sysfs_emit(buf, "%*pbl\n", nodemask_pr_args(&nmask)); mutex_unlock(&memory_tier_lock); return ret; } static DEVICE_ATTR_RO(nodelist); static struct attribute *memtier_dev_attrs[] = { &dev_attr_nodelist.attr, NULL }; static const struct attribute_group memtier_dev_group = { .attrs = memtier_dev_attrs, }; static const struct attribute_group *memtier_dev_groups[] = { &memtier_dev_group, NULL }; static struct memory_tier *find_create_memory_tier(struct memory_dev_type *memtype) { int ret; bool found_slot = false; struct memory_tier *memtier, *new_memtier; int adistance = memtype->adistance; unsigned int memtier_adistance_chunk_size = MEMTIER_CHUNK_SIZE; lockdep_assert_held_once(&memory_tier_lock); adistance = round_down(adistance, memtier_adistance_chunk_size); /* * If the memtype is already part of a memory tier, * just return that. */ if (!list_empty(&memtype->tier_sibling)) { list_for_each_entry(memtier, &memory_tiers, list) { if (adistance == memtier->adistance_start) return memtier; } WARN_ON(1); return ERR_PTR(-EINVAL); } list_for_each_entry(memtier, &memory_tiers, list) { if (adistance == memtier->adistance_start) { goto link_memtype; } else if (adistance < memtier->adistance_start) { found_slot = true; break; } } new_memtier = kzalloc_obj(struct memory_tier); if (!new_memtier) return ERR_PTR(-ENOMEM); new_memtier->adistance_start = adistance; INIT_LIST_HEAD(&new_memtier->list); INIT_LIST_HEAD(&new_memtier->memory_types); if (found_slot) list_add_tail(&new_memtier->list, &memtier->list); else list_add_tail(&new_memtier->list, &memory_tiers); new_memtier->dev.id = adistance >> MEMTIER_CHUNK_BITS; new_memtier->dev.bus = &memory_tier_subsys; new_memtier->dev.release = memory_tier_device_release; new_memtier->dev.groups = memtier_dev_groups; ret = device_register(&new_memtier->dev); if (ret) { list_del(&new_memtier->list); put_device(&new_memtier->dev); return ERR_PTR(ret); } memtier = new_memtier; link_memtype: list_add(&memtype->tier_sibling, &memtier->memory_types); return memtier; } static struct memory_tier *__node_get_memory_tier(int node) { pg_data_t *pgdat; pgdat = NODE_DATA(node); if (!pgdat) return NULL; /* * Since we hold memory_tier_lock, we can avoid * RCU read locks when accessing the details. No * parallel updates are possible here. */ return rcu_dereference_check(pgdat->memtier, lockdep_is_held(&memory_tier_lock)); } #ifdef CONFIG_MIGRATION bool node_is_toptier(int node) { bool toptier; pg_data_t *pgdat; struct memory_tier *memtier; pgdat = NODE_DATA(node); if (!pgdat) return false; rcu_read_lock(); memtier = rcu_dereference(pgdat->memtier); if (!memtier) { toptier = true; goto out; } if (memtier->adistance_start <= top_tier_adistance) toptier = true; else toptier = false; out: rcu_read_unlock(); return toptier; } void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets) { struct memory_tier *memtier; /* * pg_data_t.memtier updates includes a synchronize_rcu() * which ensures that we either find NULL or a valid memtier * in NODE_DATA. protect the access via rcu_read_lock(); */ rcu_read_lock(); memtier = rcu_dereference(pgdat->memtier); if (memtier) *targets = memtier->lower_tier_mask; else *targets = NODE_MASK_NONE; rcu_read_unlock(); } /** * next_demotion_node() - Get the next node in the demotion path * @node: The starting node to lookup the next node * @allowed_mask: The pointer to allowed node mask * * Return: node id for next memory node in the demotion path hierarchy * from @node; NUMA_NO_NODE if @node is terminal. This does not keep * @node online or guarantee that it *continues* to be the next demotion * target. */ int next_demotion_node(int node, const nodemask_t *allowed_mask) { struct demotion_nodes *nd; nodemask_t mask; if (!node_demotion) return NUMA_NO_NODE; nd = &node_demotion[node]; /* * node_demotion[] is updated without excluding this * function from running. * * Make sure to use RCU over entire code blocks if * node_demotion[] reads need to be consistent. */ rcu_read_lock(); /* Filter out nodes that are not in allowed_mask. */ nodes_and(mask, nd->preferred, *allowed_mask); rcu_read_unlock(); /* * If there are multiple target nodes, just select one * target node randomly. * * In addition, we can also use round-robin to select * target node, but we should introduce another variable * for node_demotion[] to record last selected target node, * that may cause cache ping-pong due to the changing of * last target node. Or introducing per-cpu data to avoid * caching issue, which seems more complicated. So selecting * target node randomly seems better until now. */ if (!nodes_empty(mask)) return node_random(&mask); /* * Preferred nodes are not in allowed_mask. Flip bits in * allowed_mask as used node mask. Then, use it to get the * closest demotion target. */ nodes_complement(mask, *allowed_mask); return find_next_best_node(node, &mask); } static void disable_all_demotion_targets(void) { struct memory_tier *memtier; int node; for_each_node_state(node, N_MEMORY) { node_demotion[node].preferred = NODE_MASK_NONE; /* * We are holding memory_tier_lock, it is safe * to access pgda->memtier. */ memtier = __node_get_memory_tier(node); if (memtier) memtier->lower_tier_mask = NODE_MASK_NONE; } /* * Ensure that the "disable" is visible across the system. * Readers will see either a combination of before+disable * state or disable+after. They will never see before and * after state together. */ synchronize_rcu(); } static void dump_demotion_targets(void) { int node; for_each_node_state(node, N_MEMORY) { struct memory_tier *memtier = __node_get_memory_tier(node); nodemask_t preferred = node_demotion[node].preferred; if (!memtier) continue; if (nodes_empty(preferred)) pr_info("Demotion targets for Node %d: null\n", node); else pr_info("Demotion targets for Node %d: preferred: %*pbl, fallback: %*pbl\n", node, nodemask_pr_args(&preferred), nodemask_pr_args(&memtier->lower_tier_mask)); } } /* * Find an automatic demotion target for all memory * nodes. Failing here is OK. It might just indicate * being at the end of a chain. */ static void establish_demotion_targets(void) { struct memory_tier *memtier; struct demotion_nodes *nd; int target = NUMA_NO_NODE, node; int distance, best_distance; nodemask_t tier_nodes, lower_tier; lockdep_assert_held_once(&memory_tier_lock); if (!node_demotion) return; disable_all_demotion_targets(); for_each_node_state(node, N_MEMORY) { best_distance = -1; nd = &node_demotion[node]; memtier = __node_get_memory_tier(node); if (!memtier || list_is_last(&memtier->list, &memory_tiers)) continue; /* * Get the lower memtier to find the demotion node list. */ memtier = list_next_entry(memtier, list); tier_nodes = get_memtier_nodemask(memtier); /* * find_next_best_node, use 'used' nodemask as a skip list. * Add all memory nodes except the selected memory tier * nodelist to skip list so that we find the best node from the * memtier nodelist. */ nodes_andnot(tier_nodes, node_states[N_MEMORY], tier_nodes); /* * Find all the nodes in the memory tier node list of same best distance. * add them to the preferred mask. We randomly select between nodes * in the preferred mask when allocating pages during demotion. */ do { target = find_next_best_node(node, &tier_nodes); if (target == NUMA_NO_NODE) break; distance = node_distance(node, target); if (distance == best_distance || best_distance == -1) { best_distance = distance; node_set(target, nd->preferred); } else { break; } } while (1); } /* * Promotion is allowed from a memory tier to higher * memory tier only if the memory tier doesn't include * compute. We want to skip promotion from a memory tier, * if any node that is part of the memory tier have CPUs. * Once we detect such a memory tier, we consider that tier * as top tiper from which promotion is not allowed. */ list_for_each_entry_reverse(memtier, &memory_tiers, list) { tier_nodes = get_memtier_nodemask(memtier); if (nodes_and(tier_nodes, node_states[N_CPU], tier_nodes)) { /* * abstract distance below the max value of this memtier * is considered toptier. */ top_tier_adistance = memtier->adistance_start + MEMTIER_CHUNK_SIZE - 1; break; } } /* * Now build the lower_tier mask for each node collecting node mask from * all memory tier below it. This allows us to fallback demotion page * allocation to a set of nodes that is closer the above selected * preferred node. */ lower_tier = node_states[N_MEMORY]; list_for_each_entry(memtier, &memory_tiers, list) { /* * Keep removing current tier from lower_tier nodes, * This will remove all nodes in current and above * memory tier from the lower_tier mask. */ tier_nodes = get_memtier_nodemask(memtier); nodes_andnot(lower_tier, lower_tier, tier_nodes); memtier->lower_tier_mask = lower_tier; } dump_demotion_targets(); } #else static inline void establish_demotion_targets(void) {} #endif /* CONFIG_MIGRATION */ static inline void __init_node_memory_type(int node, struct memory_dev_type *memtype) { if (!node_memory_types[node].memtype) node_memory_types[node].memtype = memtype; /* * for each device getting added in the same NUMA node * with this specific memtype, bump the map count. We * Only take memtype device reference once, so that * changing a node memtype can be done by dropping the * only reference count taken here. */ if (node_memory_types[node].memtype == memtype) { if (!node_memory_types[node].map_count++) kref_get(&memtype->kref); } } static struct memory_tier *set_node_memory_tier(int node) { struct memory_tier *memtier; struct memory_dev_type *memtype = default_dram_type; int adist = MEMTIER_ADISTANCE_DRAM; pg_data_t *pgdat = NODE_DATA(node); lockdep_assert_held_once(&memory_tier_lock); if (!node_state(node, N_MEMORY)) return ERR_PTR(-EINVAL); mt_calc_adistance(node, &adist); if (!node_memory_types[node].memtype) { memtype = mt_find_alloc_memory_type(adist, &default_memory_types); if (IS_ERR(memtype)) { memtype = default_dram_type; pr_info("Failed to allocate a memory type. Fall back.\n"); } } __init_node_memory_type(node, memtype); memtype = node_memory_types[node].memtype; node_set(node, memtype->nodes); memtier = find_create_memory_tier(memtype); if (!IS_ERR(memtier)) rcu_assign_pointer(pgdat->memtier, memtier); return memtier; } static void destroy_memory_tier(struct memory_tier *memtier) { list_del(&memtier->list); device_unregister(&memtier->dev); } static bool clear_node_memory_tier(int node) { bool cleared = false; pg_data_t *pgdat; struct memory_tier *memtier; pgdat = NODE_DATA(node); if (!pgdat) return false; /* * Make sure that anybody looking at NODE_DATA who finds * a valid memtier finds memory_dev_types with nodes still * linked to the memtier. We achieve this by waiting for * rcu read section to finish using synchronize_rcu. * This also enables us to free the destroyed memory tier * with kfree instead of kfree_rcu */ memtier = __node_get_memory_tier(node); if (memtier) { struct memory_dev_type *memtype; rcu_assign_pointer(pgdat->memtier, NULL); synchronize_rcu(); memtype = node_memory_types[node].memtype; node_clear(node, memtype->nodes); if (nodes_empty(memtype->nodes)) { list_del_init(&memtype->tier_sibling); if (list_empty(&memtier->memory_types)) destroy_memory_tier(memtier); } cleared = true; } return cleared; } static void release_memtype(struct kref *kref) { struct memory_dev_type *memtype; memtype = container_of(kref, struct memory_dev_type, kref); kfree(memtype); } struct memory_dev_type *alloc_memory_type(int adistance) { struct memory_dev_type *memtype; memtype = kmalloc_obj(*memtype); if (!memtype) return ERR_PTR(-ENOMEM); memtype->adistance = adistance; INIT_LIST_HEAD(&memtype->tier_sibling); memtype->nodes = NODE_MASK_NONE; kref_init(&memtype->kref); return memtype; } EXPORT_SYMBOL_GPL(alloc_memory_type); void put_memory_type(struct memory_dev_type *memtype) { kref_put(&memtype->kref, release_memtype); } EXPORT_SYMBOL_GPL(put_memory_type); void init_node_memory_type(int node, struct memory_dev_type *memtype) { mutex_lock(&memory_tier_lock); __init_node_memory_type(node, memtype); mutex_unlock(&memory_tier_lock); } EXPORT_SYMBOL_GPL(init_node_memory_type); void clear_node_memory_type(int node, struct memory_dev_type *memtype) { mutex_lock(&memory_tier_lock); if (node_memory_types[node].memtype == memtype || !memtype) node_memory_types[node].map_count--; /* * If we unmapped all the attached devices to this node, * clear the node memory type. */ if (!node_memory_types[node].map_count) { memtype = node_memory_types[node].memtype; node_memory_types[node].memtype = NULL; put_memory_type(memtype); } mutex_unlock(&memory_tier_lock); } EXPORT_SYMBOL_GPL(clear_node_memory_type); struct memory_dev_type *mt_find_alloc_memory_type(int adist, struct list_head *memory_types) { struct memory_dev_type *mtype; list_for_each_entry(mtype, memory_types, list) if (mtype->adistance == adist) return mtype; mtype = alloc_memory_type(adist); if (IS_ERR(mtype)) return mtype; list_add(&mtype->list, memory_types); return mtype; } EXPORT_SYMBOL_GPL(mt_find_alloc_memory_type); void mt_put_memory_types(struct list_head *memory_types) { struct memory_dev_type *mtype, *mtn; list_for_each_entry_safe(mtype, mtn, memory_types, list) { list_del(&mtype->list); put_memory_type(mtype); } } EXPORT_SYMBOL_GPL(mt_put_memory_types); /* * This is invoked via `late_initcall()` to initialize memory tiers for * memory nodes, both with and without CPUs. After the initialization of * firmware and devices, adistance algorithms are expected to be provided. */ static int __init memory_tier_late_init(void) { int nid; struct memory_tier *memtier; get_online_mems(); guard(mutex)(&memory_tier_lock); /* Assign each uninitialized N_MEMORY node to a memory tier. */ for_each_node_state(nid, N_MEMORY) { /* * Some device drivers may have initialized * memory tiers, potentially bringing memory nodes * online and configuring memory tiers. * Exclude them here. */ if (node_memory_types[nid].memtype) continue; memtier = set_node_memory_tier(nid); if (IS_ERR(memtier)) continue; } establish_demotion_targets(); put_online_mems(); return 0; } late_initcall(memory_tier_late_init); static void dump_hmem_attrs(struct access_coordinate *coord, const char *prefix) { pr_info( "%sread_latency: %u, write_latency: %u, read_bandwidth: %u, write_bandwidth: %u\n", prefix, coord->read_latency, coord->write_latency, coord->read_bandwidth, coord->write_bandwidth); } int mt_set_default_dram_perf(int nid, struct access_coordinate *perf, const char *source) { guard(mutex)(&default_dram_perf_lock); if (default_dram_perf_error) return -EIO; if (perf->read_latency + perf->write_latency == 0 || perf->read_bandwidth + perf->write_bandwidth == 0) return -EINVAL; if (default_dram_perf_ref_nid == NUMA_NO_NODE) { default_dram_perf = *perf; default_dram_perf_ref_nid = nid; default_dram_perf_ref_source = kstrdup(source, GFP_KERNEL); return 0; } /* * The performance of all default DRAM nodes is expected to be * same (that is, the variation is less than 10%). And it * will be used as base to calculate the abstract distance of * other memory nodes. */ if (abs(perf->read_latency - default_dram_perf.read_latency) * 10 > default_dram_perf.read_latency || abs(perf->write_latency - default_dram_perf.write_latency) * 10 > default_dram_perf.write_latency || abs(perf->read_bandwidth - default_dram_perf.read_bandwidth) * 10 > default_dram_perf.read_bandwidth || abs(perf->write_bandwidth - default_dram_perf.write_bandwidth) * 10 > default_dram_perf.write_bandwidth) { pr_info( "memory-tiers: the performance of DRAM node %d mismatches that of the reference\n" "DRAM node %d.\n", nid, default_dram_perf_ref_nid); pr_info(" performance of reference DRAM node %d from %s:\n", default_dram_perf_ref_nid, default_dram_perf_ref_source); dump_hmem_attrs(&default_dram_perf, " "); pr_info(" performance of DRAM node %d from %s:\n", nid, source); dump_hmem_attrs(perf, " "); pr_info( " disable default DRAM node performance based abstract distance algorithm.\n"); default_dram_perf_error = true; return -EINVAL; } return 0; } int mt_perf_to_adistance(struct access_coordinate *perf, int *adist) { guard(mutex)(&default_dram_perf_lock); if (default_dram_perf_error) return -EIO; if (perf->read_latency + perf->write_latency == 0 || perf->read_bandwidth + perf->write_bandwidth == 0) return -EINVAL; if (default_dram_perf_ref_nid == NUMA_NO_NODE) return -ENOENT; /* * The abstract distance of a memory node is in direct proportion to * its memory latency (read + write) and inversely proportional to its * memory bandwidth (read + write). The abstract distance, memory * latency, and memory bandwidth of the default DRAM nodes are used as * the base. */ *adist = MEMTIER_ADISTANCE_DRAM * (perf->read_latency + perf->write_latency) / (default_dram_perf.read_latency + default_dram_perf.write_latency) * (default_dram_perf.read_bandwidth + default_dram_perf.write_bandwidth) / (perf->read_bandwidth + perf->write_bandwidth); return 0; } EXPORT_SYMBOL_GPL(mt_perf_to_adistance); /** * register_mt_adistance_algorithm() - Register memory tiering abstract distance algorithm * @nb: The notifier block which describe the algorithm * * Return: 0 on success, errno on error. * * Every memory tiering abstract distance algorithm provider needs to * register the algorithm with register_mt_adistance_algorithm(). To * calculate the abstract distance for a specified memory node, the * notifier function will be called unless some high priority * algorithm has provided result. The prototype of the notifier * function is as follows, * * int (*algorithm_notifier)(struct notifier_block *nb, * unsigned long nid, void *data); * * Where "nid" specifies the memory node, "data" is the pointer to the * returned abstract distance (that is, "int *adist"). If the * algorithm provides the result, NOTIFY_STOP should be returned. * Otherwise, return_value & %NOTIFY_STOP_MASK == 0 to allow the next * algorithm in the chain to provide the result. */ int register_mt_adistance_algorithm(struct notifier_block *nb) { return blocking_notifier_chain_register(&mt_adistance_algorithms, nb); } EXPORT_SYMBOL_GPL(register_mt_adistance_algorithm); /** * unregister_mt_adistance_algorithm() - Unregister memory tiering abstract distance algorithm * @nb: the notifier block which describe the algorithm * * Return: 0 on success, errno on error. */ int unregister_mt_adistance_algorithm(struct notifier_block *nb) { return blocking_notifier_chain_unregister(&mt_adistance_algorithms, nb); } EXPORT_SYMBOL_GPL(unregister_mt_adistance_algorithm); /** * mt_calc_adistance() - Calculate abstract distance with registered algorithms * @node: the node to calculate abstract distance for * @adist: the returned abstract distance * * Return: if return_value & %NOTIFY_STOP_MASK != 0, then some * abstract distance algorithm provides the result, and return it via * @adist. Otherwise, no algorithm can provide the result and @adist * will be kept as it is. */ int mt_calc_adistance(int node, int *adist) { return blocking_notifier_call_chain(&mt_adistance_algorithms, node, adist); } EXPORT_SYMBOL_GPL(mt_calc_adistance); static int __meminit memtier_hotplug_callback(struct notifier_block *self, unsigned long action, void *_arg) { struct memory_tier *memtier; struct node_notify *nn = _arg; switch (action) { case NODE_REMOVED_LAST_MEMORY: mutex_lock(&memory_tier_lock); if (clear_node_memory_tier(nn->nid)) establish_demotion_targets(); mutex_unlock(&memory_tier_lock); break; case NODE_ADDED_FIRST_MEMORY: mutex_lock(&memory_tier_lock); memtier = set_node_memory_tier(nn->nid); if (!IS_ERR(memtier)) establish_demotion_targets(); mutex_unlock(&memory_tier_lock); break; } return notifier_from_errno(0); } static int __init memory_tier_init(void) { int ret; ret = subsys_virtual_register(&memory_tier_subsys, NULL); if (ret) panic("%s() failed to register memory tier subsystem\n", __func__); #ifdef CONFIG_MIGRATION node_demotion = kzalloc_objs(struct demotion_nodes, nr_node_ids); WARN_ON(!node_demotion); #endif mutex_lock(&memory_tier_lock); /* * For now we can have 4 faster memory tiers with smaller adistance * than default DRAM tier. */ default_dram_type = mt_find_alloc_memory_type(MEMTIER_ADISTANCE_DRAM, &default_memory_types); mutex_unlock(&memory_tier_lock); if (IS_ERR(default_dram_type)) panic("%s() failed to allocate default DRAM tier\n", __func__); /* Record nodes with memory and CPU to set default DRAM performance. */ nodes_and(default_dram_nodes, node_states[N_MEMORY], node_states[N_CPU]); hotplug_node_notifier(memtier_hotplug_callback, MEMTIER_HOTPLUG_PRI); return 0; } subsys_initcall(memory_tier_init); bool numa_demotion_enabled = false; #ifdef CONFIG_MIGRATION #ifdef CONFIG_SYSFS static ssize_t demotion_enabled_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf) { return sysfs_emit(buf, "%s\n", str_true_false(numa_demotion_enabled)); } static ssize_t demotion_enabled_store(struct kobject *kobj, struct kobj_attribute *attr, const char *buf, size_t count) { ssize_t ret; bool before = numa_demotion_enabled; ret = kstrtobool(buf, &numa_demotion_enabled); if (ret) return ret; /* * Reset kswapd_failures statistics. They may no longer be * valid since the policy for kswapd has changed. */ if (before == false && numa_demotion_enabled == true) { struct pglist_data *pgdat; for_each_online_pgdat(pgdat) kswapd_clear_hopeless(pgdat, KSWAPD_CLEAR_HOPELESS_OTHER); } return count; } static struct kobj_attribute numa_demotion_enabled_attr = __ATTR_RW(demotion_enabled); static struct attribute *numa_attrs[] = { &numa_demotion_enabled_attr.attr, NULL, }; static const struct attribute_group numa_attr_group = { .attrs = numa_attrs, }; static int __init numa_init_sysfs(void) { int err; struct kobject *numa_kobj; numa_kobj = kobject_create_and_add("numa", mm_kobj); if (!numa_kobj) { pr_err("failed to create numa kobject\n"); return -ENOMEM; } err = sysfs_create_group(numa_kobj, &numa_attr_group); if (err) { pr_err("failed to register numa group\n"); goto delete_obj; } return 0; delete_obj: kobject_put(numa_kobj); return err; } subsys_initcall(numa_init_sysfs); #endif /* CONFIG_SYSFS */ #endif |
| 2 4 4 2 2 2 70 1 1 1 1 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 | // SPDX-License-Identifier: MIT #include <linux/export.h> #include <linux/fb.h> #include <drm/drm_drv.h> #include <drm/drm_fbdev_shmem.h> #include <drm/drm_fb_helper.h> #include <drm/drm_framebuffer.h> #include <drm/drm_gem_framebuffer_helper.h> #include <drm/drm_gem_shmem_helper.h> #include <drm/drm_print.h> /* * struct fb_ops */ static int drm_fbdev_shmem_fb_open(struct fb_info *info, int user) { struct drm_fb_helper *fb_helper = info->par; /* No need to take a ref for fbcon because it unbinds on unregister */ if (user && !try_module_get(fb_helper->dev->driver->fops->owner)) return -ENODEV; return 0; } static int drm_fbdev_shmem_fb_release(struct fb_info *info, int user) { struct drm_fb_helper *fb_helper = info->par; if (user) module_put(fb_helper->dev->driver->fops->owner); return 0; } FB_GEN_DEFAULT_DEFERRED_SYSMEM_OPS(drm_fbdev_shmem, drm_fb_helper_damage_range, drm_fb_helper_damage_area); static int drm_fbdev_shmem_fb_mmap(struct fb_info *info, struct vm_area_struct *vma) { struct drm_fb_helper *fb_helper = info->par; struct drm_framebuffer *fb = fb_helper->fb; struct drm_gem_object *obj = drm_gem_fb_get_obj(fb, 0); struct drm_gem_shmem_object *shmem = to_drm_gem_shmem_obj(obj); if (shmem->map_wc) vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot); return fb_deferred_io_mmap(info, vma); } static void drm_fbdev_shmem_fb_destroy(struct fb_info *info) { struct drm_fb_helper *fb_helper = info->par; if (!fb_helper->dev) return; fb_deferred_io_cleanup(info); drm_fb_helper_fini(fb_helper); drm_client_buffer_vunmap(fb_helper->buffer); drm_client_buffer_delete(fb_helper->buffer); drm_client_release(&fb_helper->client); } static const struct fb_ops drm_fbdev_shmem_fb_ops = { .owner = THIS_MODULE, .fb_open = drm_fbdev_shmem_fb_open, .fb_release = drm_fbdev_shmem_fb_release, __FB_DEFAULT_DEFERRED_OPS_RDWR(drm_fbdev_shmem), DRM_FB_HELPER_DEFAULT_OPS, __FB_DEFAULT_DEFERRED_OPS_DRAW(drm_fbdev_shmem), .fb_mmap = drm_fbdev_shmem_fb_mmap, .fb_destroy = drm_fbdev_shmem_fb_destroy, }; static struct page *drm_fbdev_shmem_get_page(struct fb_info *info, unsigned long offset) { struct drm_fb_helper *fb_helper = info->par; struct drm_framebuffer *fb = fb_helper->fb; struct drm_gem_object *obj = drm_gem_fb_get_obj(fb, 0); struct drm_gem_shmem_object *shmem = to_drm_gem_shmem_obj(obj); unsigned int i = offset >> PAGE_SHIFT; struct page *page; if (fb_WARN_ON_ONCE(info, offset > obj->size)) return NULL; page = shmem->pages[i]; // protected by active vmap if (page) get_page(page); fb_WARN_ON_ONCE(info, !page); return page; } /* * struct drm_fb_helper */ static int drm_fbdev_shmem_helper_fb_dirty(struct drm_fb_helper *helper, struct drm_clip_rect *clip) { struct drm_device *dev = helper->dev; int ret; /* Call damage handlers only if necessary */ if (!(clip->x1 < clip->x2 && clip->y1 < clip->y2)) return 0; if (helper->fb->funcs->dirty) { ret = helper->fb->funcs->dirty(helper->fb, NULL, 0, 0, clip, 1); if (drm_WARN_ONCE(dev, ret, "Dirty helper failed: ret=%d\n", ret)) return ret; } return 0; } static const struct drm_fb_helper_funcs drm_fbdev_shmem_helper_funcs = { .fb_dirty = drm_fbdev_shmem_helper_fb_dirty, }; /* * struct drm_driver */ int drm_fbdev_shmem_driver_fbdev_probe(struct drm_fb_helper *fb_helper, struct drm_fb_helper_surface_size *sizes) { struct drm_client_dev *client = &fb_helper->client; struct drm_device *dev = fb_helper->dev; struct fb_info *info = fb_helper->info; struct drm_client_buffer *buffer; struct drm_gem_shmem_object *shmem; struct drm_framebuffer *fb; u32 format; struct iosys_map map; int ret; drm_dbg_kms(dev, "surface width(%d), height(%d) and bpp(%d)\n", sizes->surface_width, sizes->surface_height, sizes->surface_bpp); format = drm_driver_legacy_fb_format(dev, sizes->surface_bpp, sizes->surface_depth); buffer = drm_client_buffer_create_dumb(client, sizes->surface_width, sizes->surface_height, format); if (IS_ERR(buffer)) return PTR_ERR(buffer); shmem = to_drm_gem_shmem_obj(buffer->gem); fb = buffer->fb; ret = drm_client_buffer_vmap(buffer, &map); if (ret) { goto err_drm_client_buffer_delete; } else if (drm_WARN_ON(dev, map.is_iomem)) { ret = -ENODEV; /* I/O memory not supported; use generic emulation */ goto err_drm_client_buffer_delete; } fb_helper->funcs = &drm_fbdev_shmem_helper_funcs; fb_helper->buffer = buffer; fb_helper->fb = fb; drm_fb_helper_fill_info(info, fb_helper, sizes); info->fbops = &drm_fbdev_shmem_fb_ops; /* screen */ info->flags |= FBINFO_VIRTFB; /* system memory */ if (!shmem->map_wc) info->flags |= FBINFO_READS_FAST; /* signal caching */ info->screen_size = sizes->surface_height * fb->pitches[0]; info->screen_buffer = map.vaddr; info->fix.smem_len = info->screen_size; /* deferred I/O */ fb_helper->fbdefio.delay = HZ / 20; fb_helper->fbdefio.get_page = drm_fbdev_shmem_get_page; fb_helper->fbdefio.deferred_io = drm_fb_helper_deferred_io; info->fbdefio = &fb_helper->fbdefio; ret = fb_deferred_io_init(info); if (ret) goto err_drm_client_buffer_vunmap; return 0; err_drm_client_buffer_vunmap: fb_helper->fb = NULL; fb_helper->buffer = NULL; drm_client_buffer_vunmap(buffer); err_drm_client_buffer_delete: drm_client_buffer_delete(buffer); return ret; } EXPORT_SYMBOL(drm_fbdev_shmem_driver_fbdev_probe); |
| 9 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 | // SPDX-License-Identifier: GPL-2.0 /* * The class-specific portions of the driver model * * Copyright (c) 2001-2003 Patrick Mochel <mochel@osdl.org> * Copyright (c) 2004-2009 Greg Kroah-Hartman <gregkh@suse.de> * Copyright (c) 2008-2009 Novell Inc. * Copyright (c) 2012-2019 Greg Kroah-Hartman <gregkh@linuxfoundation.org> * Copyright (c) 2012-2019 Linux Foundation * * See Documentation/driver-api/driver-model/ for more information. */ #ifndef _DEVICE_CLASS_H_ #define _DEVICE_CLASS_H_ #include <linux/kobject.h> #include <linux/klist.h> #include <linux/pm.h> #include <linux/device/bus.h> struct device; struct fwnode_handle; /** * struct class - device classes * @name: Name of the class. * @class_groups: Default attributes of this class. * @dev_groups: Default attributes of the devices that belong to the class. * @dev_uevent: Called when a device is added, removed from this class, or a * few other things that generate uevents to add the environment * variables. * @devnode: Callback to provide the devtmpfs. * @class_release: Called to release this class. * @dev_release: Called to release the device. * @shutdown_pre: Called at shut-down time before driver shutdown. * @ns_type: Callbacks so sysfs can detemine namespaces. * @namespace: Namespace of the device belongs to this class. * @get_ownership: Allows class to specify uid/gid of the sysfs directories * for the devices belonging to the class. Usually tied to * device's namespace. * @pm: The default device power management operations of this class. * * A class is a higher-level view of a device that abstracts out low-level * implementation details. Drivers may see a SCSI disk or an ATA disk, but, * at the class level, they are all simply disks. Classes allow user space * to work with devices based on what they do, rather than how they are * connected or how they work. */ struct class { const char *name; const struct attribute_group **class_groups; const struct attribute_group **dev_groups; int (*dev_uevent)(const struct device *dev, struct kobj_uevent_env *env); char *(*devnode)(const struct device *dev, umode_t *mode); void (*class_release)(const struct class *class); void (*dev_release)(struct device *dev); int (*shutdown_pre)(struct device *dev); const struct kobj_ns_type_operations *ns_type; const void *(*namespace)(const struct device *dev); void (*get_ownership)(const struct device *dev, kuid_t *uid, kgid_t *gid); const struct dev_pm_ops *pm; }; struct class_dev_iter { struct klist_iter ki; const struct device_type *type; struct subsys_private *sp; }; int __must_check class_register(const struct class *class); void class_unregister(const struct class *class); bool class_is_registered(const struct class *class); struct class_compat; struct class_compat *class_compat_register(const char *name); void class_compat_unregister(struct class_compat *cls); int class_compat_create_link(struct class_compat *cls, struct device *dev); void class_compat_remove_link(struct class_compat *cls, struct device *dev); void class_dev_iter_init(struct class_dev_iter *iter, const struct class *class, const struct device *start, const struct device_type *type); struct device *class_dev_iter_next(struct class_dev_iter *iter); void class_dev_iter_exit(struct class_dev_iter *iter); int class_for_each_device(const struct class *class, const struct device *start, void *data, device_iter_t fn); struct device *class_find_device(const struct class *class, const struct device *start, const void *data, device_match_t match); /** * class_find_device_by_name - device iterator for locating a particular device * of a specific name. * @class: class type * @name: name of the device to match */ static inline struct device *class_find_device_by_name(const struct class *class, const char *name) { return class_find_device(class, NULL, name, device_match_name); } /** * class_find_device_by_of_node : device iterator for locating a particular device * matching the of_node. * @class: class type * @np: of_node of the device to match. */ static inline struct device *class_find_device_by_of_node(const struct class *class, const struct device_node *np) { return class_find_device(class, NULL, np, device_match_of_node); } /** * class_find_device_by_fwnode : device iterator for locating a particular device * matching the fwnode. * @class: class type * @fwnode: fwnode of the device to match. */ static inline struct device *class_find_device_by_fwnode(const struct class *class, const struct fwnode_handle *fwnode) { return class_find_device(class, NULL, fwnode, device_match_fwnode); } /** * class_find_device_by_devt : device iterator for locating a particular device * matching the device type. * @class: class type * @devt: device type of the device to match. */ static inline struct device *class_find_device_by_devt(const struct class *class, dev_t devt) { return class_find_device(class, NULL, &devt, device_match_devt); } #ifdef CONFIG_ACPI struct acpi_device; /** * class_find_device_by_acpi_dev : device iterator for locating a particular * device matching the ACPI_COMPANION device. * @class: class type * @adev: ACPI_COMPANION device to match. */ static inline struct device *class_find_device_by_acpi_dev(const struct class *class, const struct acpi_device *adev) { return class_find_device(class, NULL, adev, device_match_acpi_dev); } #else static inline struct device *class_find_device_by_acpi_dev(const struct class *class, const void *adev) { return NULL; } #endif struct class_attribute { struct attribute attr; ssize_t (*show)(const struct class *class, const struct class_attribute *attr, char *buf); ssize_t (*store)(const struct class *class, const struct class_attribute *attr, const char *buf, size_t count); }; #define CLASS_ATTR_RW(_name) \ struct class_attribute class_attr_##_name = __ATTR_RW(_name) #define CLASS_ATTR_RO(_name) \ struct class_attribute class_attr_##_name = __ATTR_RO(_name) #define CLASS_ATTR_WO(_name) \ struct class_attribute class_attr_##_name = __ATTR_WO(_name) int __must_check class_create_file_ns(const struct class *class, const struct class_attribute *attr, const void *ns); void class_remove_file_ns(const struct class *class, const struct class_attribute *attr, const void *ns); static inline int __must_check class_create_file(const struct class *class, const struct class_attribute *attr) { return class_create_file_ns(class, attr, NULL); } static inline void class_remove_file(const struct class *class, const struct class_attribute *attr) { class_remove_file_ns(class, attr, NULL); } /* Simple class attribute that is just a static string */ struct class_attribute_string { struct class_attribute attr; char *str; }; /* Currently read-only only */ #define _CLASS_ATTR_STRING(_name, _mode, _str) \ { __ATTR(_name, _mode, show_class_attr_string, NULL), _str } #define CLASS_ATTR_STRING(_name, _mode, _str) \ struct class_attribute_string class_attr_##_name = \ _CLASS_ATTR_STRING(_name, _mode, _str) ssize_t show_class_attr_string(const struct class *class, const struct class_attribute *attr, char *buf); struct class_interface { struct list_head node; const struct class *class; int (*add_dev) (struct device *dev); void (*remove_dev) (struct device *dev); }; int __must_check class_interface_register(struct class_interface *); void class_interface_unregister(struct class_interface *); struct class * __must_check class_create(const char *name); void class_destroy(const struct class *cls); #endif /* _DEVICE_CLASS_H_ */ |
| 1 1 2 2 2 2 1 2 2 2 1 2 2 1 1 1 1 1 1 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 | // SPDX-License-Identifier: GPL-2.0-or-later /* * Glue Code for 3-way parallel assembler optimized version of Twofish * * Copyright (c) 2011 Jussi Kivilinna <jussi.kivilinna@mbnet.fi> */ #include <asm/cpu_device_id.h> #include <crypto/algapi.h> #include <crypto/twofish.h> #include <linux/crypto.h> #include <linux/export.h> #include <linux/init.h> #include <linux/module.h> #include <linux/types.h> #include "twofish.h" #include "ecb_cbc_helpers.h" EXPORT_SYMBOL_GPL(__twofish_enc_blk_3way); EXPORT_SYMBOL_GPL(twofish_dec_blk_3way); static int twofish_setkey_skcipher(struct crypto_skcipher *tfm, const u8 *key, unsigned int keylen) { return twofish_setkey(&tfm->base, key, keylen); } static inline void twofish_enc_blk_3way(const void *ctx, u8 *dst, const u8 *src) { __twofish_enc_blk_3way(ctx, dst, src, false); } void twofish_dec_blk_cbc_3way(const void *ctx, u8 *dst, const u8 *src) { u8 buf[2][TF_BLOCK_SIZE]; const u8 *s = src; if (dst == src) s = memcpy(buf, src, sizeof(buf)); twofish_dec_blk_3way(ctx, dst, src); crypto_xor(dst + TF_BLOCK_SIZE, s, sizeof(buf)); } EXPORT_SYMBOL_GPL(twofish_dec_blk_cbc_3way); static int ecb_encrypt(struct skcipher_request *req) { ECB_WALK_START(req, TF_BLOCK_SIZE, -1); ECB_BLOCK(3, twofish_enc_blk_3way); ECB_BLOCK(1, twofish_enc_blk); ECB_WALK_END(); } static int ecb_decrypt(struct skcipher_request *req) { ECB_WALK_START(req, TF_BLOCK_SIZE, -1); ECB_BLOCK(3, twofish_dec_blk_3way); ECB_BLOCK(1, twofish_dec_blk); ECB_WALK_END(); } static int cbc_encrypt(struct skcipher_request *req) { CBC_WALK_START(req, TF_BLOCK_SIZE, -1); CBC_ENC_BLOCK(twofish_enc_blk); CBC_WALK_END(); } static int cbc_decrypt(struct skcipher_request *req) { CBC_WALK_START(req, TF_BLOCK_SIZE, -1); CBC_DEC_BLOCK(3, twofish_dec_blk_cbc_3way); CBC_DEC_BLOCK(1, twofish_dec_blk); CBC_WALK_END(); } static struct skcipher_alg tf_skciphers[] = { { .base.cra_name = "ecb(twofish)", .base.cra_driver_name = "ecb-twofish-3way", .base.cra_priority = 300, .base.cra_blocksize = TF_BLOCK_SIZE, .base.cra_ctxsize = sizeof(struct twofish_ctx), .base.cra_module = THIS_MODULE, .min_keysize = TF_MIN_KEY_SIZE, .max_keysize = TF_MAX_KEY_SIZE, .setkey = twofish_setkey_skcipher, .encrypt = ecb_encrypt, .decrypt = ecb_decrypt, }, { .base.cra_name = "cbc(twofish)", .base.cra_driver_name = "cbc-twofish-3way", .base.cra_priority = 300, .base.cra_blocksize = TF_BLOCK_SIZE, .base.cra_ctxsize = sizeof(struct twofish_ctx), .base.cra_module = THIS_MODULE, .min_keysize = TF_MIN_KEY_SIZE, .max_keysize = TF_MAX_KEY_SIZE, .ivsize = TF_BLOCK_SIZE, .setkey = twofish_setkey_skcipher, .encrypt = cbc_encrypt, .decrypt = cbc_decrypt, }, }; static bool is_blacklisted_cpu(void) { if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL) return false; switch (boot_cpu_data.x86_vfm) { case INTEL_ATOM_BONNELL: case INTEL_ATOM_BONNELL_MID: case INTEL_ATOM_SALTWELL: /* * On Atom, twofish-3way is slower than original assembler * implementation. Twofish-3way trades off some performance in * storing blocks in 64bit registers to allow three blocks to * be processed parallel. Parallel operation then allows gaining * more performance than was trade off, on out-of-order CPUs. * However Atom does not benefit from this parallelism and * should be blacklisted. */ return true; } if (boot_cpu_data.x86 == 0x0f) { /* * On Pentium 4, twofish-3way is slower than original assembler * implementation because excessive uses of 64bit rotate and * left-shifts (which are really slow on P4) needed to store and * handle 128bit block in two 64bit registers. */ return true; } return false; } static int force; module_param(force, int, 0); MODULE_PARM_DESC(force, "Force module load, ignore CPU blacklist"); static int __init twofish_3way_init(void) { if (!force && is_blacklisted_cpu()) { printk(KERN_INFO "twofish-x86_64-3way: performance on this CPU " "would be suboptimal: disabling " "twofish-x86_64-3way.\n"); return -ENODEV; } return crypto_register_skciphers(tf_skciphers, ARRAY_SIZE(tf_skciphers)); } static void __exit twofish_3way_fini(void) { crypto_unregister_skciphers(tf_skciphers, ARRAY_SIZE(tf_skciphers)); } module_init(twofish_3way_init); module_exit(twofish_3way_fini); MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("Twofish Cipher Algorithm, 3-way parallel asm optimized"); MODULE_ALIAS_CRYPTO("twofish"); MODULE_ALIAS_CRYPTO("twofish-asm"); |
| 1 1 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 | // SPDX-License-Identifier: GPL-2.0 /* * linux/fs/affs/amigaffs.c * * (c) 1996 Hans-Joachim Widmaier - Rewritten * * (C) 1993 Ray Burr - Amiga FFS filesystem. * * Please send bug reports to: hjw@zvw.de */ #include <linux/math64.h> #include <linux/iversion.h> #include "affs.h" /* * Functions for accessing Amiga-FFS structures. */ /* Insert a header block bh into the directory dir * caller must hold AFFS_DIR->i_hash_lock! */ int affs_insert_hash(struct inode *dir, struct buffer_head *bh) { struct super_block *sb = dir->i_sb; struct buffer_head *dir_bh; u32 ino, hash_ino; int offset; ino = bh->b_blocknr; offset = affs_hash_name(sb, AFFS_TAIL(sb, bh)->name + 1, AFFS_TAIL(sb, bh)->name[0]); pr_debug("%s(dir=%lu, ino=%d)\n", __func__, dir->i_ino, ino); dir_bh = affs_bread(sb, dir->i_ino); if (!dir_bh) return -EIO; hash_ino = be32_to_cpu(AFFS_HEAD(dir_bh)->table[offset]); while (hash_ino) { affs_brelse(dir_bh); dir_bh = affs_bread(sb, hash_ino); if (!dir_bh) return -EIO; hash_ino = be32_to_cpu(AFFS_TAIL(sb, dir_bh)->hash_chain); } AFFS_TAIL(sb, bh)->parent = cpu_to_be32(dir->i_ino); AFFS_TAIL(sb, bh)->hash_chain = 0; affs_fix_checksum(sb, bh); if (dir->i_ino == dir_bh->b_blocknr) AFFS_HEAD(dir_bh)->table[offset] = cpu_to_be32(ino); else AFFS_TAIL(sb, dir_bh)->hash_chain = cpu_to_be32(ino); affs_adjust_checksum(dir_bh, ino); mark_buffer_dirty_inode(dir_bh, dir); affs_brelse(dir_bh); inode_set_mtime_to_ts(dir, inode_set_ctime_current(dir)); inode_inc_iversion(dir); mark_inode_dirty(dir); return 0; } /* Remove a header block from its directory. * caller must hold AFFS_DIR->i_hash_lock! */ int affs_remove_hash(struct inode *dir, struct buffer_head *rem_bh) { struct super_block *sb; struct buffer_head *bh; u32 rem_ino, hash_ino; __be32 ino; int offset, retval; sb = dir->i_sb; rem_ino = rem_bh->b_blocknr; offset = affs_hash_name(sb, AFFS_TAIL(sb, rem_bh)->name+1, AFFS_TAIL(sb, rem_bh)->name[0]); pr_debug("%s(dir=%lu, ino=%d, hashval=%d)\n", __func__, dir->i_ino, rem_ino, offset); bh = affs_bread(sb, dir->i_ino); if (!bh) return -EIO; retval = -ENOENT; hash_ino = be32_to_cpu(AFFS_HEAD(bh)->table[offset]); while (hash_ino) { if (hash_ino == rem_ino) { ino = AFFS_TAIL(sb, rem_bh)->hash_chain; if (dir->i_ino == bh->b_blocknr) AFFS_HEAD(bh)->table[offset] = ino; else AFFS_TAIL(sb, bh)->hash_chain = ino; affs_adjust_checksum(bh, be32_to_cpu(ino) - hash_ino); mark_buffer_dirty_inode(bh, dir); AFFS_TAIL(sb, rem_bh)->parent = 0; retval = 0; break; } affs_brelse(bh); bh = affs_bread(sb, hash_ino); if (!bh) return -EIO; hash_ino = be32_to_cpu(AFFS_TAIL(sb, bh)->hash_chain); } affs_brelse(bh); inode_set_mtime_to_ts(dir, inode_set_ctime_current(dir)); inode_inc_iversion(dir); mark_inode_dirty(dir); return retval; } static void affs_fix_dcache(struct inode *inode, u32 entry_ino) { struct dentry *dentry; spin_lock(&inode->i_lock); hlist_for_each_entry(dentry, &inode->i_dentry, d_u.d_alias) { if (entry_ino == (u32)(long)dentry->d_fsdata) { dentry->d_fsdata = (void *)inode->i_ino; break; } } spin_unlock(&inode->i_lock); } /* Remove header from link chain */ static int affs_remove_link(struct dentry *dentry) { struct inode *dir, *inode = d_inode(dentry); struct super_block *sb = inode->i_sb; struct buffer_head *bh, *link_bh = NULL; u32 link_ino, ino; int retval; pr_debug("%s(key=%ld)\n", __func__, inode->i_ino); retval = -EIO; bh = affs_bread(sb, inode->i_ino); if (!bh) goto done; link_ino = (u32)(long)dentry->d_fsdata; if (inode->i_ino == link_ino) { /* we can't remove the head of the link, as its blocknr is still used as ino, * so we remove the block of the first link instead. */ link_ino = be32_to_cpu(AFFS_TAIL(sb, bh)->link_chain); link_bh = affs_bread(sb, link_ino); if (!link_bh) goto done; dir = affs_iget(sb, be32_to_cpu(AFFS_TAIL(sb, link_bh)->parent)); if (IS_ERR(dir)) { retval = PTR_ERR(dir); goto done; } affs_lock_dir(dir); /* * if there's a dentry for that block, make it * refer to inode itself. */ affs_fix_dcache(inode, link_ino); retval = affs_remove_hash(dir, link_bh); if (retval) { affs_unlock_dir(dir); goto done; } mark_buffer_dirty_inode(link_bh, inode); memcpy(AFFS_TAIL(sb, bh)->name, AFFS_TAIL(sb, link_bh)->name, 32); retval = affs_insert_hash(dir, bh); if (retval) { affs_unlock_dir(dir); goto done; } mark_buffer_dirty_inode(bh, inode); affs_unlock_dir(dir); iput(dir); } else { link_bh = affs_bread(sb, link_ino); if (!link_bh) goto done; } while ((ino = be32_to_cpu(AFFS_TAIL(sb, bh)->link_chain)) != 0) { if (ino == link_ino) { __be32 ino2 = AFFS_TAIL(sb, link_bh)->link_chain; AFFS_TAIL(sb, bh)->link_chain = ino2; affs_adjust_checksum(bh, be32_to_cpu(ino2) - link_ino); mark_buffer_dirty_inode(bh, inode); retval = 0; /* Fix the link count, if bh is a normal header block without links */ switch (be32_to_cpu(AFFS_TAIL(sb, bh)->stype)) { case ST_LINKDIR: case ST_LINKFILE: break; default: if (!AFFS_TAIL(sb, bh)->link_chain) set_nlink(inode, 1); } affs_free_block(sb, link_ino); goto done; } affs_brelse(bh); bh = affs_bread(sb, ino); if (!bh) goto done; } retval = -ENOENT; done: affs_brelse(link_bh); affs_brelse(bh); return retval; } static int affs_empty_dir(struct inode *inode) { struct super_block *sb = inode->i_sb; struct buffer_head *bh; int retval, size; retval = -EIO; bh = affs_bread(sb, inode->i_ino); if (!bh) goto done; retval = -ENOTEMPTY; for (size = AFFS_SB(sb)->s_hashsize - 1; size >= 0; size--) if (AFFS_HEAD(bh)->table[size]) goto not_empty; retval = 0; not_empty: affs_brelse(bh); done: return retval; } /* Remove a filesystem object. If the object to be removed has * links to it, one of the links must be changed to inherit * the file or directory. As above, any inode will do. * The buffer will not be freed. If the header is a link, the * block will be marked as free. * This function returns a negative error number in case of * an error, else 0 if the inode is to be deleted or 1 if not. */ int affs_remove_header(struct dentry *dentry) { struct super_block *sb; struct inode *inode, *dir; struct buffer_head *bh = NULL; int retval; dir = d_inode(dentry->d_parent); sb = dir->i_sb; retval = -ENOENT; inode = d_inode(dentry); if (!inode) goto done; pr_debug("%s(key=%ld)\n", __func__, inode->i_ino); retval = -EIO; bh = affs_bread(sb, (u32)(long)dentry->d_fsdata); if (!bh) goto done; affs_lock_link(inode); affs_lock_dir(dir); switch (be32_to_cpu(AFFS_TAIL(sb, bh)->stype)) { case ST_USERDIR: /* if we ever want to support links to dirs * i_hash_lock of the inode must only be * taken after some checks */ affs_lock_dir(inode); retval = affs_empty_dir(inode); affs_unlock_dir(inode); if (retval) goto done_unlock; break; default: break; } retval = affs_remove_hash(dir, bh); if (retval) goto done_unlock; mark_buffer_dirty_inode(bh, inode); affs_unlock_dir(dir); if (inode->i_nlink > 1) retval = affs_remove_link(dentry); else clear_nlink(inode); affs_unlock_link(inode); inode_set_ctime_current(inode); mark_inode_dirty(inode); done: affs_brelse(bh); return retval; done_unlock: affs_unlock_dir(dir); affs_unlock_link(inode); goto done; } /* Checksum a block, do various consistency checks and optionally return the blocks type number. DATA points to the block. If their pointers are non-null, *PTYPE and *STYPE are set to the primary and secondary block types respectively, *HASHSIZE is set to the size of the hashtable (which lets us calculate the block size). Returns non-zero if the block is not consistent. */ u32 affs_checksum_block(struct super_block *sb, struct buffer_head *bh) { __be32 *ptr = (__be32 *)bh->b_data; u32 sum; int bsize; sum = 0; for (bsize = sb->s_blocksize / sizeof(__be32); bsize > 0; bsize--) sum += be32_to_cpu(*ptr++); return sum; } /* * Calculate the checksum of a disk block and store it * at the indicated position. */ void affs_fix_checksum(struct super_block *sb, struct buffer_head *bh) { int cnt = sb->s_blocksize / sizeof(__be32); __be32 *ptr = (__be32 *)bh->b_data; u32 checksum; __be32 *checksumptr; checksumptr = ptr + 5; *checksumptr = 0; for (checksum = 0; cnt > 0; ptr++, cnt--) checksum += be32_to_cpu(*ptr); *checksumptr = cpu_to_be32(-checksum); } void affs_secs_to_datestamp(time64_t secs, struct affs_date *ds) { u32 days; u32 minute; s32 rem; secs -= sys_tz.tz_minuteswest * 60 + AFFS_EPOCH_DELTA; if (secs < 0) secs = 0; days = div_s64_rem(secs, 86400, &rem); minute = rem / 60; rem -= minute * 60; ds->days = cpu_to_be32(days); ds->mins = cpu_to_be32(minute); ds->ticks = cpu_to_be32(rem * 50); } umode_t affs_prot_to_mode(u32 prot) { umode_t mode = 0; if (!(prot & FIBF_NOWRITE)) mode |= 0200; if (!(prot & FIBF_NOREAD)) mode |= 0400; if (!(prot & FIBF_NOEXECUTE)) mode |= 0100; if (prot & FIBF_GRP_WRITE) mode |= 0020; if (prot & FIBF_GRP_READ) mode |= 0040; if (prot & FIBF_GRP_EXECUTE) mode |= 0010; if (prot & FIBF_OTR_WRITE) mode |= 0002; if (prot & FIBF_OTR_READ) mode |= 0004; if (prot & FIBF_OTR_EXECUTE) mode |= 0001; return mode; } void affs_mode_to_prot(struct inode *inode) { u32 prot = AFFS_I(inode)->i_protect; umode_t mode = inode->i_mode; /* * First, clear all RWED bits for owner, group, other. * Then, recalculate them afresh. * * We'll always clear the delete-inhibit bit for the owner, as that is * the classic single-user mode AmigaOS protection bit and we need to * stay compatible with all scenarios. * * Since multi-user AmigaOS is an extension, we'll only set the * delete-allow bit if any of the other bits in the same user class * (group/other) are used. */ prot &= ~(FIBF_NOEXECUTE | FIBF_NOREAD | FIBF_NOWRITE | FIBF_NODELETE | FIBF_GRP_EXECUTE | FIBF_GRP_READ | FIBF_GRP_WRITE | FIBF_GRP_DELETE | FIBF_OTR_EXECUTE | FIBF_OTR_READ | FIBF_OTR_WRITE | FIBF_OTR_DELETE); /* Classic single-user AmigaOS flags. These are inverted. */ if (!(mode & 0100)) prot |= FIBF_NOEXECUTE; if (!(mode & 0400)) prot |= FIBF_NOREAD; if (!(mode & 0200)) prot |= FIBF_NOWRITE; /* Multi-user extended flags. Not inverted. */ if (mode & 0010) prot |= FIBF_GRP_EXECUTE; if (mode & 0040) prot |= FIBF_GRP_READ; if (mode & 0020) prot |= FIBF_GRP_WRITE; if (mode & 0070) prot |= FIBF_GRP_DELETE; if (mode & 0001) prot |= FIBF_OTR_EXECUTE; if (mode & 0004) prot |= FIBF_OTR_READ; if (mode & 0002) prot |= FIBF_OTR_WRITE; if (mode & 0007) prot |= FIBF_OTR_DELETE; AFFS_I(inode)->i_protect = prot; } void affs_error(struct super_block *sb, const char *function, const char *fmt, ...) { struct va_format vaf; va_list args; va_start(args, fmt); vaf.fmt = fmt; vaf.va = &args; pr_crit("error (device %s): %s(): %pV\n", sb->s_id, function, &vaf); if (!sb_rdonly(sb)) pr_warn("Remounting filesystem read-only\n"); sb->s_flags |= SB_RDONLY; va_end(args); } void affs_warning(struct super_block *sb, const char *function, const char *fmt, ...) { struct va_format vaf; va_list args; va_start(args, fmt); vaf.fmt = fmt; vaf.va = &args; pr_warn("(device %s): %s(): %pV\n", sb->s_id, function, &vaf); va_end(args); } bool affs_nofilenametruncate(const struct dentry *dentry) { return affs_test_opt(AFFS_SB(dentry->d_sb)->s_flags, SF_NO_TRUNCATE); } /* Check if the name is valid for a affs object. */ int affs_check_name(const unsigned char *name, int len, bool notruncate) { int i; if (len > AFFSNAMEMAX) { if (notruncate) return -ENAMETOOLONG; len = AFFSNAMEMAX; } for (i = 0; i < len; i++) { if (name[i] < ' ' || name[i] == ':' || (name[i] > 0x7e && name[i] < 0xa0)) return -EINVAL; } return 0; } /* This function copies name to bstr, with at most 30 * characters length. The bstr will be prepended by * a length byte. * NOTE: The name will must be already checked by * affs_check_name()! */ int affs_copy_name(unsigned char *bstr, struct dentry *dentry) { u32 len = min(dentry->d_name.len, AFFSNAMEMAX); *bstr++ = len; memcpy(bstr, dentry->d_name.name, len); return len; } |
| 7 4 5 16 14 43 16 40 38 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 | /* SPDX-License-Identifier: GPL-2.0 */ #undef TRACE_SYSTEM #define TRACE_SYSTEM compaction #if !defined(_TRACE_COMPACTION_H) || defined(TRACE_HEADER_MULTI_READ) #define _TRACE_COMPACTION_H #include <linux/types.h> #include <linux/list.h> #include <linux/tracepoint.h> #include <trace/events/mmflags.h> DECLARE_EVENT_CLASS(mm_compaction_isolate_template, TP_PROTO( unsigned long start_pfn, unsigned long end_pfn, unsigned long nr_scanned, unsigned long nr_taken), TP_ARGS(start_pfn, end_pfn, nr_scanned, nr_taken), TP_STRUCT__entry( __field(unsigned long, start_pfn) __field(unsigned long, end_pfn) __field(unsigned long, nr_scanned) __field(unsigned long, nr_taken) ), TP_fast_assign( __entry->start_pfn = start_pfn; __entry->end_pfn = end_pfn; __entry->nr_scanned = nr_scanned; __entry->nr_taken = nr_taken; ), TP_printk("range=(0x%lx ~ 0x%lx) nr_scanned=%lu nr_taken=%lu", __entry->start_pfn, __entry->end_pfn, __entry->nr_scanned, __entry->nr_taken) ); DEFINE_EVENT(mm_compaction_isolate_template, mm_compaction_isolate_migratepages, TP_PROTO( unsigned long start_pfn, unsigned long end_pfn, unsigned long nr_scanned, unsigned long nr_taken), TP_ARGS(start_pfn, end_pfn, nr_scanned, nr_taken) ); DEFINE_EVENT(mm_compaction_isolate_template, mm_compaction_isolate_freepages, TP_PROTO( unsigned long start_pfn, unsigned long end_pfn, unsigned long nr_scanned, unsigned long nr_taken), TP_ARGS(start_pfn, end_pfn, nr_scanned, nr_taken) ); DEFINE_EVENT(mm_compaction_isolate_template, mm_compaction_fast_isolate_freepages, TP_PROTO( unsigned long start_pfn, unsigned long end_pfn, unsigned long nr_scanned, unsigned long nr_taken), TP_ARGS(start_pfn, end_pfn, nr_scanned, nr_taken) ); #ifdef CONFIG_COMPACTION TRACE_EVENT(mm_compaction_migratepages, TP_PROTO(unsigned int nr_migratepages, unsigned int nr_succeeded), TP_ARGS(nr_migratepages, nr_succeeded), TP_STRUCT__entry( __field(unsigned long, nr_migrated) __field(unsigned long, nr_failed) ), TP_fast_assign( __entry->nr_migrated = nr_succeeded; __entry->nr_failed = nr_migratepages - nr_succeeded; ), TP_printk("nr_migrated=%lu nr_failed=%lu", __entry->nr_migrated, __entry->nr_failed) ); TRACE_EVENT(mm_compaction_begin, TP_PROTO(struct compact_control *cc, unsigned long zone_start, unsigned long zone_end, bool sync), TP_ARGS(cc, zone_start, zone_end, sync), TP_STRUCT__entry( __field(unsigned long, zone_start) __field(unsigned long, migrate_pfn) __field(unsigned long, free_pfn) __field(unsigned long, zone_end) __field(bool, sync) ), TP_fast_assign( __entry->zone_start = zone_start; __entry->migrate_pfn = cc->migrate_pfn; __entry->free_pfn = cc->free_pfn; __entry->zone_end = zone_end; __entry->sync = sync; ), TP_printk("zone_start=0x%lx migrate_pfn=0x%lx free_pfn=0x%lx zone_end=0x%lx, mode=%s", __entry->zone_start, __entry->migrate_pfn, __entry->free_pfn, __entry->zone_end, __entry->sync ? "sync" : "async") ); TRACE_EVENT(mm_compaction_end, TP_PROTO(struct compact_control *cc, unsigned long zone_start, unsigned long zone_end, bool sync, int status), TP_ARGS(cc, zone_start, zone_end, sync, status), TP_STRUCT__entry( __field(unsigned long, zone_start) __field(unsigned long, migrate_pfn) __field(unsigned long, free_pfn) __field(unsigned long, zone_end) __field(bool, sync) __field(int, status) ), TP_fast_assign( __entry->zone_start = zone_start; __entry->migrate_pfn = cc->migrate_pfn; __entry->free_pfn = cc->free_pfn; __entry->zone_end = zone_end; __entry->sync = sync; __entry->status = status; ), TP_printk("zone_start=0x%lx migrate_pfn=0x%lx free_pfn=0x%lx zone_end=0x%lx, mode=%s status=%s", __entry->zone_start, __entry->migrate_pfn, __entry->free_pfn, __entry->zone_end, __entry->sync ? "sync" : "async", __print_symbolic(__entry->status, COMPACTION_STATUS)) ); TRACE_EVENT(mm_compaction_try_to_compact_pages, TP_PROTO( int order, gfp_t gfp_mask, int prio), TP_ARGS(order, gfp_mask, prio), TP_STRUCT__entry( __field(int, order) __field(unsigned long, gfp_mask) __field(int, prio) ), TP_fast_assign( __entry->order = order; __entry->gfp_mask = (__force unsigned long)gfp_mask; __entry->prio = prio; ), TP_printk("order=%d gfp_mask=%s priority=%d", __entry->order, show_gfp_flags(__entry->gfp_mask), __entry->prio) ); DECLARE_EVENT_CLASS(mm_compaction_suitable_template, TP_PROTO(struct zone *zone, int order, int ret), TP_ARGS(zone, order, ret), TP_STRUCT__entry( __field(int, nid) __field(enum zone_type, idx) __field(int, order) __field(int, ret) ), TP_fast_assign( __entry->nid = zone_to_nid(zone); __entry->idx = zone_idx(zone); __entry->order = order; __entry->ret = ret; ), TP_printk("node=%d zone=%-8s order=%d ret=%s", __entry->nid, __print_symbolic(__entry->idx, ZONE_TYPE), __entry->order, __print_symbolic(__entry->ret, COMPACTION_STATUS)) ); DEFINE_EVENT(mm_compaction_suitable_template, mm_compaction_finished, TP_PROTO(struct zone *zone, int order, int ret), TP_ARGS(zone, order, ret) ); DEFINE_EVENT(mm_compaction_suitable_template, mm_compaction_suitable, TP_PROTO(struct zone *zone, int order, int ret), TP_ARGS(zone, order, ret) ); DECLARE_EVENT_CLASS(mm_compaction_defer_template, TP_PROTO(struct zone *zone, int order), TP_ARGS(zone, order), TP_STRUCT__entry( __field(int, nid) __field(enum zone_type, idx) __field(int, order) __field(unsigned int, considered) __field(unsigned int, defer_shift) __field(int, order_failed) ), TP_fast_assign( __entry->nid = zone_to_nid(zone); __entry->idx = zone_idx(zone); __entry->order = order; __entry->considered = zone->compact_considered; __entry->defer_shift = zone->compact_defer_shift; __entry->order_failed = zone->compact_order_failed; ), TP_printk("node=%d zone=%-8s order=%d order_failed=%d consider=%u limit=%lu", __entry->nid, __print_symbolic(__entry->idx, ZONE_TYPE), __entry->order, __entry->order_failed, __entry->considered, 1UL << __entry->defer_shift) ); DEFINE_EVENT(mm_compaction_defer_template, mm_compaction_deferred, TP_PROTO(struct zone *zone, int order), TP_ARGS(zone, order) ); DEFINE_EVENT(mm_compaction_defer_template, mm_compaction_defer_compaction, TP_PROTO(struct zone *zone, int order), TP_ARGS(zone, order) ); DEFINE_EVENT(mm_compaction_defer_template, mm_compaction_defer_reset, TP_PROTO(struct zone *zone, int order), TP_ARGS(zone, order) ); TRACE_EVENT(mm_compaction_kcompactd_sleep, TP_PROTO(int nid), TP_ARGS(nid), TP_STRUCT__entry( __field(int, nid) ), TP_fast_assign( __entry->nid = nid; ), TP_printk("nid=%d", __entry->nid) ); DECLARE_EVENT_CLASS(kcompactd_wake_template, TP_PROTO(int nid, int order, enum zone_type highest_zoneidx), TP_ARGS(nid, order, highest_zoneidx), TP_STRUCT__entry( __field(int, nid) __field(int, order) __field(enum zone_type, highest_zoneidx) ), TP_fast_assign( __entry->nid = nid; __entry->order = order; __entry->highest_zoneidx = highest_zoneidx; ), /* * classzone_idx is previous name of the highest_zoneidx. * Reason not to change it is the ABI requirement of the tracepoint. */ TP_printk("nid=%d order=%d classzone_idx=%-8s", __entry->nid, __entry->order, __print_symbolic(__entry->highest_zoneidx, ZONE_TYPE)) ); DEFINE_EVENT(kcompactd_wake_template, mm_compaction_wakeup_kcompactd, TP_PROTO(int nid, int order, enum zone_type highest_zoneidx), TP_ARGS(nid, order, highest_zoneidx) ); DEFINE_EVENT(kcompactd_wake_template, mm_compaction_kcompactd_wake, TP_PROTO(int nid, int order, enum zone_type highest_zoneidx), TP_ARGS(nid, order, highest_zoneidx) ); #endif #endif /* _TRACE_COMPACTION_H */ /* This part must be outside protection */ #include <trace/define_trace.h> |
| 10 10 11 11 10 1 10 11 11 11 11 11 9 10 10 10 11 11 11 2 1 2 1 2 2 2 2 3 3 3 3 2 1 1 1 1 1 2 2 2 2 2 2 3 1 2 2 1 1 1 1 1 1 1 1 1 1 9 1 1 1 1 1 1 1 1 1 1 1 11 11 6 6 6 2 3 1 2 9 1 1 1 21 9 10 10 10 9 10 10 10 10 7 3 10 10 10 7 7 7 9 8 2 9 6 6 11 11 11 1 1 5 5 5 5 5 5 5 5 5 5 5 4 4 4 4 4 4 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 4 4 4 4 4 2 2 2 2 1 1 1 1 1 1 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 | // SPDX-License-Identifier: GPL-2.0-only /* * Framework for buffer objects that can be shared across devices/subsystems. * * Copyright(C) 2011 Linaro Limited. All rights reserved. * Author: Sumit Semwal <sumit.semwal@ti.com> * * Many thanks to linaro-mm-sig list, and specially * Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and * Daniel Vetter <daniel@ffwll.ch> for their support in creation and * refining of this idea. */ #include <linux/fs.h> #include <linux/slab.h> #include <linux/dma-buf.h> #include <linux/dma-fence.h> #include <linux/dma-fence-unwrap.h> #include <linux/anon_inodes.h> #include <linux/export.h> #include <linux/debugfs.h> #include <linux/list.h> #include <linux/module.h> #include <linux/mutex.h> #include <linux/seq_file.h> #include <linux/sync_file.h> #include <linux/poll.h> #include <linux/dma-resv.h> #include <linux/mm.h> #include <linux/mount.h> #include <linux/pseudo_fs.h> #include <uapi/linux/dma-buf.h> #include <uapi/linux/magic.h> #define CREATE_TRACE_POINTS #include <trace/events/dma_buf.h> /* * dmabuf->name must be accessed with holding dmabuf->name_lock. * we need to take the lock around the tracepoint call itself where * it is called in the code. * * Note: FUNC##_enabled() is a static branch that will only * be set when the trace event is enabled. */ #define DMA_BUF_TRACE(FUNC, ...) \ do { \ /* Always expose lock if lockdep is enabled */ \ if (IS_ENABLED(CONFIG_LOCKDEP) || FUNC##_enabled()) { \ guard(spinlock)(&dmabuf->name_lock); \ FUNC(__VA_ARGS__); \ } \ } while (0) /* Wrapper to hide the sg_table page link from the importer */ struct dma_buf_sg_table_wrapper { struct sg_table *original; struct sg_table wrapper; }; static inline int is_dma_buf_file(struct file *); static DEFINE_MUTEX(dmabuf_list_mutex); static LIST_HEAD(dmabuf_list); static void __dma_buf_list_add(struct dma_buf *dmabuf) { mutex_lock(&dmabuf_list_mutex); list_add(&dmabuf->list_node, &dmabuf_list); mutex_unlock(&dmabuf_list_mutex); } static void __dma_buf_list_del(struct dma_buf *dmabuf) { if (!dmabuf) return; mutex_lock(&dmabuf_list_mutex); list_del(&dmabuf->list_node); mutex_unlock(&dmabuf_list_mutex); } /** * dma_buf_iter_begin - begin iteration through global list of all DMA buffers * * Returns the first buffer in the global list of DMA-bufs that's not in the * process of being destroyed. Increments that buffer's reference count to * prevent buffer destruction. Callers must release the reference, either by * continuing iteration with dma_buf_iter_next(), or with dma_buf_put(). * * Return: * * First buffer from global list, with refcount elevated * * NULL if no active buffers are present */ struct dma_buf *dma_buf_iter_begin(void) { struct dma_buf *ret = NULL, *dmabuf; /* * The list mutex does not protect a dmabuf's refcount, so it can be * zeroed while we are iterating. We cannot call get_dma_buf() since the * caller may not already own a reference to the buffer. */ mutex_lock(&dmabuf_list_mutex); list_for_each_entry(dmabuf, &dmabuf_list, list_node) { if (file_ref_get(&dmabuf->file->f_ref)) { ret = dmabuf; break; } } mutex_unlock(&dmabuf_list_mutex); return ret; } /** * dma_buf_iter_next - continue iteration through global list of all DMA buffers * @dmabuf: [in] pointer to dma_buf * * Decrements the reference count on the provided buffer. Returns the next * buffer from the remainder of the global list of DMA-bufs with its reference * count incremented. Callers must release the reference, either by continuing * iteration with dma_buf_iter_next(), or with dma_buf_put(). * * Return: * * Next buffer from global list, with refcount elevated * * NULL if no additional active buffers are present */ struct dma_buf *dma_buf_iter_next(struct dma_buf *dmabuf) { struct dma_buf *ret = NULL; /* * The list mutex does not protect a dmabuf's refcount, so it can be * zeroed while we are iterating. We cannot call get_dma_buf() since the * caller may not already own a reference to the buffer. */ mutex_lock(&dmabuf_list_mutex); dma_buf_put(dmabuf); list_for_each_entry_continue(dmabuf, &dmabuf_list, list_node) { if (file_ref_get(&dmabuf->file->f_ref)) { ret = dmabuf; break; } } mutex_unlock(&dmabuf_list_mutex); return ret; } static char *dmabuffs_dname(struct dentry *dentry, char *buffer, int buflen) { struct dma_buf *dmabuf; char name[DMA_BUF_NAME_LEN]; ssize_t ret = 0; dmabuf = dentry->d_fsdata; spin_lock(&dmabuf->name_lock); if (dmabuf->name) ret = strscpy(name, dmabuf->name, sizeof(name)); spin_unlock(&dmabuf->name_lock); return dynamic_dname(buffer, buflen, "/%s:%s", dentry->d_name.name, ret > 0 ? name : ""); } static void dma_buf_release(struct dentry *dentry) { struct dma_buf *dmabuf; dmabuf = dentry->d_fsdata; if (unlikely(!dmabuf)) return; BUG_ON(dmabuf->vmapping_counter); /* * If you hit this BUG() it could mean: * * There's a file reference imbalance in dma_buf_poll / dma_buf_poll_cb or somewhere else * * dmabuf->cb_in/out.active are non-0 despite no pending fence callback */ BUG_ON(dmabuf->cb_in.active || dmabuf->cb_out.active); dmabuf->ops->release(dmabuf); if (dmabuf->resv == (struct dma_resv *)&dmabuf[1]) dma_resv_fini(dmabuf->resv); WARN_ON(!list_empty(&dmabuf->attachments)); module_put(dmabuf->owner); kfree(dmabuf->name); kfree(dmabuf); } static int dma_buf_file_release(struct inode *inode, struct file *file) { if (!is_dma_buf_file(file)) return -EINVAL; __dma_buf_list_del(file->private_data); return 0; } static const struct dentry_operations dma_buf_dentry_ops = { .d_dname = dmabuffs_dname, .d_release = dma_buf_release, }; static struct vfsmount *dma_buf_mnt; static int dma_buf_fs_init_context(struct fs_context *fc) { struct pseudo_fs_context *ctx; ctx = init_pseudo(fc, DMA_BUF_MAGIC); if (!ctx) return -ENOMEM; ctx->dops = &dma_buf_dentry_ops; return 0; } static struct file_system_type dma_buf_fs_type = { .name = "dmabuf", .init_fs_context = dma_buf_fs_init_context, .kill_sb = kill_anon_super, }; static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct *vma) { struct dma_buf *dmabuf; if (!is_dma_buf_file(file)) return -EINVAL; dmabuf = file->private_data; /* check if buffer supports mmap */ if (!dmabuf->ops->mmap) return -EINVAL; /* check for overflowing the buffer's size */ if (vma->vm_pgoff + vma_pages(vma) > dmabuf->size >> PAGE_SHIFT) return -EINVAL; DMA_BUF_TRACE(trace_dma_buf_mmap_internal, dmabuf); return dmabuf->ops->mmap(dmabuf, vma); } static loff_t dma_buf_llseek(struct file *file, loff_t offset, int whence) { struct dma_buf *dmabuf; loff_t base; if (!is_dma_buf_file(file)) return -EBADF; dmabuf = file->private_data; /* only support discovering the end of the buffer, * but also allow SEEK_SET to maintain the idiomatic * SEEK_END(0), SEEK_CUR(0) pattern. */ if (whence == SEEK_END) base = dmabuf->size; else if (whence == SEEK_SET) base = 0; else return -EINVAL; if (offset != 0) return -EINVAL; return base + offset; } /** * DOC: implicit fence polling * * To support cross-device and cross-driver synchronization of buffer access * implicit fences (represented internally in the kernel with &struct dma_fence) * can be attached to a &dma_buf. The glue for that and a few related things are * provided in the &dma_resv structure. * * Userspace can query the state of these implicitly tracked fences using poll() * and related system calls: * * - Checking for EPOLLIN, i.e. read access, can be use to query the state of the * most recent write or exclusive fence. * * - Checking for EPOLLOUT, i.e. write access, can be used to query the state of * all attached fences, shared and exclusive ones. * * Note that this only signals the completion of the respective fences, i.e. the * DMA transfers are complete. Cache flushing and any other necessary * preparations before CPU access can begin still need to happen. * * As an alternative to poll(), the set of fences on DMA buffer can be * exported as a &sync_file using &dma_buf_sync_file_export. */ static void dma_buf_poll_cb(struct dma_fence *fence, struct dma_fence_cb *cb) { struct dma_buf_poll_cb_t *dcb = (struct dma_buf_poll_cb_t *)cb; struct dma_buf *dmabuf = container_of(dcb->poll, struct dma_buf, poll); unsigned long flags; spin_lock_irqsave(&dcb->poll->lock, flags); wake_up_locked_poll(dcb->poll, dcb->active); dcb->active = 0; spin_unlock_irqrestore(&dcb->poll->lock, flags); dma_fence_put(fence); /* Paired with get_file in dma_buf_poll */ fput(dmabuf->file); } static bool dma_buf_poll_add_cb(struct dma_resv *resv, bool write, struct dma_buf_poll_cb_t *dcb) { struct dma_resv_iter cursor; struct dma_fence *fence; int r; dma_resv_for_each_fence(&cursor, resv, dma_resv_usage_rw(write), fence) { dma_fence_get(fence); r = dma_fence_add_callback(fence, &dcb->cb, dma_buf_poll_cb); if (!r) return true; dma_fence_put(fence); } return false; } static __poll_t dma_buf_poll(struct file *file, poll_table *poll) { struct dma_buf *dmabuf; struct dma_resv *resv; __poll_t events; dmabuf = file->private_data; if (!dmabuf || !dmabuf->resv) return EPOLLERR; resv = dmabuf->resv; poll_wait(file, &dmabuf->poll, poll); events = poll_requested_events(poll) & (EPOLLIN | EPOLLOUT); if (!events) return 0; dma_resv_lock(resv, NULL); if (events & EPOLLOUT) { struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_out; /* Check that callback isn't busy */ spin_lock_irq(&dmabuf->poll.lock); if (dcb->active) events &= ~EPOLLOUT; else dcb->active = EPOLLOUT; spin_unlock_irq(&dmabuf->poll.lock); if (events & EPOLLOUT) { /* Paired with fput in dma_buf_poll_cb */ get_file(dmabuf->file); if (!dma_buf_poll_add_cb(resv, true, dcb)) /* No callback queued, wake up any other waiters */ dma_buf_poll_cb(NULL, &dcb->cb); else events &= ~EPOLLOUT; } } if (events & EPOLLIN) { struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_in; /* Check that callback isn't busy */ spin_lock_irq(&dmabuf->poll.lock); if (dcb->active) events &= ~EPOLLIN; else dcb->active = EPOLLIN; spin_unlock_irq(&dmabuf->poll.lock); if (events & EPOLLIN) { /* Paired with fput in dma_buf_poll_cb */ get_file(dmabuf->file); if (!dma_buf_poll_add_cb(resv, false, dcb)) /* No callback queued, wake up any other waiters */ dma_buf_poll_cb(NULL, &dcb->cb); else events &= ~EPOLLIN; } } dma_resv_unlock(resv); return events; } /** * dma_buf_set_name - Set a name to a specific dma_buf to track the usage. * It could support changing the name of the dma-buf if the same * piece of memory is used for multiple purpose between different devices. * * @dmabuf: [in] dmabuf buffer that will be renamed. * @buf: [in] A piece of userspace memory that contains the name of * the dma-buf. * * Returns 0 on success. If the dma-buf buffer is already attached to * devices, return -EBUSY. * */ static long dma_buf_set_name(struct dma_buf *dmabuf, const char __user *buf) { char *name = strndup_user(buf, DMA_BUF_NAME_LEN); if (IS_ERR(name)) return PTR_ERR(name); spin_lock(&dmabuf->name_lock); kfree(dmabuf->name); dmabuf->name = name; spin_unlock(&dmabuf->name_lock); return 0; } #if IS_ENABLED(CONFIG_SYNC_FILE) static long dma_buf_export_sync_file(struct dma_buf *dmabuf, void __user *user_data) { struct dma_buf_export_sync_file arg; enum dma_resv_usage usage; struct dma_fence *fence = NULL; struct sync_file *sync_file; int fd, ret; if (copy_from_user(&arg, user_data, sizeof(arg))) return -EFAULT; if (arg.flags & ~DMA_BUF_SYNC_RW) return -EINVAL; if ((arg.flags & DMA_BUF_SYNC_RW) == 0) return -EINVAL; fd = get_unused_fd_flags(O_CLOEXEC); if (fd < 0) return fd; usage = dma_resv_usage_rw(arg.flags & DMA_BUF_SYNC_WRITE); ret = dma_resv_get_singleton(dmabuf->resv, usage, &fence); if (ret) goto err_put_fd; if (!fence) fence = dma_fence_get_stub(); sync_file = sync_file_create(fence); dma_fence_put(fence); if (!sync_file) { ret = -ENOMEM; goto err_put_fd; } arg.fd = fd; if (copy_to_user(user_data, &arg, sizeof(arg))) { ret = -EFAULT; goto err_put_file; } fd_install(fd, sync_file->file); return 0; err_put_file: fput(sync_file->file); err_put_fd: put_unused_fd(fd); return ret; } static long dma_buf_import_sync_file(struct dma_buf *dmabuf, const void __user *user_data) { struct dma_buf_import_sync_file arg; struct dma_fence *fence, *f; enum dma_resv_usage usage; struct dma_fence_unwrap iter; unsigned int num_fences; int ret = 0; if (copy_from_user(&arg, user_data, sizeof(arg))) return -EFAULT; if (arg.flags & ~DMA_BUF_SYNC_RW) return -EINVAL; if ((arg.flags & DMA_BUF_SYNC_RW) == 0) return -EINVAL; fence = sync_file_get_fence(arg.fd); if (!fence) return -EINVAL; usage = (arg.flags & DMA_BUF_SYNC_WRITE) ? DMA_RESV_USAGE_WRITE : DMA_RESV_USAGE_READ; num_fences = 0; dma_fence_unwrap_for_each(f, &iter, fence) ++num_fences; if (num_fences > 0) { dma_resv_lock(dmabuf->resv, NULL); ret = dma_resv_reserve_fences(dmabuf->resv, num_fences); if (!ret) { dma_fence_unwrap_for_each(f, &iter, fence) dma_resv_add_fence(dmabuf->resv, f, usage); } dma_resv_unlock(dmabuf->resv); } dma_fence_put(fence); return ret; } #endif static long dma_buf_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { struct dma_buf *dmabuf; struct dma_buf_sync sync; enum dma_data_direction direction; int ret; dmabuf = file->private_data; switch (cmd) { case DMA_BUF_IOCTL_SYNC: if (copy_from_user(&sync, (void __user *) arg, sizeof(sync))) return -EFAULT; if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK) return -EINVAL; switch (sync.flags & DMA_BUF_SYNC_RW) { case DMA_BUF_SYNC_READ: direction = DMA_FROM_DEVICE; break; case DMA_BUF_SYNC_WRITE: direction = DMA_TO_DEVICE; break; case DMA_BUF_SYNC_RW: direction = DMA_BIDIRECTIONAL; break; default: return -EINVAL; } if (sync.flags & DMA_BUF_SYNC_END) ret = dma_buf_end_cpu_access(dmabuf, direction); else ret = dma_buf_begin_cpu_access(dmabuf, direction); return ret; case DMA_BUF_SET_NAME_A: case DMA_BUF_SET_NAME_B: return dma_buf_set_name(dmabuf, (const char __user *)arg); #if IS_ENABLED(CONFIG_SYNC_FILE) case DMA_BUF_IOCTL_EXPORT_SYNC_FILE: return dma_buf_export_sync_file(dmabuf, (void __user *)arg); case DMA_BUF_IOCTL_IMPORT_SYNC_FILE: return dma_buf_import_sync_file(dmabuf, (const void __user *)arg); #endif default: return -ENOTTY; } } static void dma_buf_show_fdinfo(struct seq_file *m, struct file *file) { struct dma_buf *dmabuf = file->private_data; seq_printf(m, "size:\t%zu\n", dmabuf->size); /* Don't count the temporary reference taken inside procfs seq_show */ seq_printf(m, "count:\t%ld\n", file_count(dmabuf->file) - 1); seq_printf(m, "exp_name:\t%s\n", dmabuf->exp_name); spin_lock(&dmabuf->name_lock); if (dmabuf->name) seq_printf(m, "name:\t%s\n", dmabuf->name); spin_unlock(&dmabuf->name_lock); } static const struct file_operations dma_buf_fops = { .release = dma_buf_file_release, .mmap = dma_buf_mmap_internal, .llseek = dma_buf_llseek, .poll = dma_buf_poll, .unlocked_ioctl = dma_buf_ioctl, .compat_ioctl = compat_ptr_ioctl, .show_fdinfo = dma_buf_show_fdinfo, }; /* * is_dma_buf_file - Check if struct file* is associated with dma_buf */ static inline int is_dma_buf_file(struct file *file) { return file->f_op == &dma_buf_fops; } static struct file *dma_buf_getfile(size_t size, int flags) { static atomic64_t dmabuf_inode = ATOMIC64_INIT(0); struct inode *inode = alloc_anon_inode(dma_buf_mnt->mnt_sb); struct file *file; if (IS_ERR(inode)) return ERR_CAST(inode); inode->i_size = size; inode_set_bytes(inode, size); /* * The ->i_ino acquired from get_next_ino() is not unique thus * not suitable for using it as dentry name by dmabuf stats. * Override ->i_ino with the unique and dmabuffs specific * value. */ inode->i_ino = atomic64_inc_return(&dmabuf_inode); flags &= O_ACCMODE | O_NONBLOCK; file = alloc_file_pseudo(inode, dma_buf_mnt, "dmabuf", flags, &dma_buf_fops); if (IS_ERR(file)) goto err_alloc_file; return file; err_alloc_file: iput(inode); return file; } /** * DOC: dma buf device access * * For device DMA access to a shared DMA buffer the usual sequence of operations * is fairly simple: * * 1. The exporter defines his exporter instance using * DEFINE_DMA_BUF_EXPORT_INFO() and calls dma_buf_export() to wrap a private * buffer object into a &dma_buf. It then exports that &dma_buf to userspace * as a file descriptor by calling dma_buf_fd(). * * 2. Userspace passes this file-descriptors to all drivers it wants this buffer * to share with: First the file descriptor is converted to a &dma_buf using * dma_buf_get(). Then the buffer is attached to the device using * dma_buf_attach(). * * Up to this stage the exporter is still free to migrate or reallocate the * backing storage. * * 3. Once the buffer is attached to all devices userspace can initiate DMA * access to the shared buffer. In the kernel this is done by calling * dma_buf_map_attachment() and dma_buf_unmap_attachment(). * * 4. Once a driver is done with a shared buffer it needs to call * dma_buf_detach() (after cleaning up any mappings) and then release the * reference acquired with dma_buf_get() by calling dma_buf_put(). * * For the detailed semantics exporters are expected to implement see * &dma_buf_ops. */ /** * dma_buf_export - Creates a new dma_buf, and associates an anon file * with this buffer, so it can be exported. * Also connect the allocator specific data and ops to the buffer. * Additionally, provide a name string for exporter; useful in debugging. * * @exp_info: [in] holds all the export related information provided * by the exporter. see &struct dma_buf_export_info * for further details. * * Returns, on success, a newly created struct dma_buf object, which wraps the * supplied private data and operations for struct dma_buf_ops. On either * missing ops, or error in allocating struct dma_buf, will return negative * error. * * For most cases the easiest way to create @exp_info is through the * %DEFINE_DMA_BUF_EXPORT_INFO macro. */ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info) { struct dma_buf *dmabuf; struct dma_resv *resv = exp_info->resv; struct file *file; size_t alloc_size = sizeof(struct dma_buf); int ret; if (WARN_ON(!exp_info->priv || !exp_info->ops || !exp_info->ops->map_dma_buf || !exp_info->ops->unmap_dma_buf || !exp_info->ops->release)) return ERR_PTR(-EINVAL); if (WARN_ON(!exp_info->ops->pin != !exp_info->ops->unpin)) return ERR_PTR(-EINVAL); if (!try_module_get(exp_info->owner)) return ERR_PTR(-ENOENT); file = dma_buf_getfile(exp_info->size, exp_info->flags); if (IS_ERR(file)) { ret = PTR_ERR(file); goto err_module; } if (!exp_info->resv) alloc_size += sizeof(struct dma_resv); else /* prevent &dma_buf[1] == dma_buf->resv */ alloc_size += 1; dmabuf = kzalloc(alloc_size, GFP_KERNEL); if (!dmabuf) { ret = -ENOMEM; goto err_file; } dmabuf->priv = exp_info->priv; dmabuf->ops = exp_info->ops; dmabuf->size = exp_info->size; dmabuf->exp_name = exp_info->exp_name; dmabuf->owner = exp_info->owner; spin_lock_init(&dmabuf->name_lock); init_waitqueue_head(&dmabuf->poll); dmabuf->cb_in.poll = dmabuf->cb_out.poll = &dmabuf->poll; dmabuf->cb_in.active = dmabuf->cb_out.active = 0; INIT_LIST_HEAD(&dmabuf->attachments); if (!resv) { dmabuf->resv = (struct dma_resv *)&dmabuf[1]; dma_resv_init(dmabuf->resv); } else { dmabuf->resv = resv; } file->private_data = dmabuf; file->f_path.dentry->d_fsdata = dmabuf; dmabuf->file = file; __dma_buf_list_add(dmabuf); DMA_BUF_TRACE(trace_dma_buf_export, dmabuf); return dmabuf; err_file: fput(file); err_module: module_put(exp_info->owner); return ERR_PTR(ret); } EXPORT_SYMBOL_NS_GPL(dma_buf_export, "DMA_BUF"); /** * dma_buf_fd - returns a file descriptor for the given struct dma_buf * @dmabuf: [in] pointer to dma_buf for which fd is required. * @flags: [in] flags to give to fd * * On success, returns an associated 'fd'. Else, returns error. */ int dma_buf_fd(struct dma_buf *dmabuf, int flags) { int fd; if (!dmabuf || !dmabuf->file) return -EINVAL; fd = FD_ADD(flags, dmabuf->file); DMA_BUF_TRACE(trace_dma_buf_fd, dmabuf, fd); return fd; } EXPORT_SYMBOL_NS_GPL(dma_buf_fd, "DMA_BUF"); /** * dma_buf_get - returns the struct dma_buf related to an fd * @fd: [in] fd associated with the struct dma_buf to be returned * * On success, returns the struct dma_buf associated with an fd; uses * file's refcounting done by fget to increase refcount. returns ERR_PTR * otherwise. */ struct dma_buf *dma_buf_get(int fd) { struct file *file; struct dma_buf *dmabuf; file = fget(fd); if (!file) return ERR_PTR(-EBADF); if (!is_dma_buf_file(file)) { fput(file); return ERR_PTR(-EINVAL); } dmabuf = file->private_data; DMA_BUF_TRACE(trace_dma_buf_get, dmabuf, fd); return dmabuf; } EXPORT_SYMBOL_NS_GPL(dma_buf_get, "DMA_BUF"); /** * dma_buf_put - decreases refcount of the buffer * @dmabuf: [in] buffer to reduce refcount of * * Uses file's refcounting done implicitly by fput(). * * If, as a result of this call, the refcount becomes 0, the 'release' file * operation related to this fd is called. It calls &dma_buf_ops.release vfunc * in turn, and frees the memory allocated for dmabuf when exported. */ void dma_buf_put(struct dma_buf *dmabuf) { if (WARN_ON(!dmabuf || !dmabuf->file)) return; fput(dmabuf->file); DMA_BUF_TRACE(trace_dma_buf_put, dmabuf); } EXPORT_SYMBOL_NS_GPL(dma_buf_put, "DMA_BUF"); static int dma_buf_wrap_sg_table(struct sg_table **sg_table) { struct scatterlist *to_sg, *from_sg; struct sg_table *from = *sg_table; struct dma_buf_sg_table_wrapper *to; int i, ret; if (!IS_ENABLED(CONFIG_DMABUF_DEBUG)) return 0; /* * To catch abuse of the underlying struct page by importers copy the * sg_table without copying the page_link and give only the copy back to * the importer. */ to = kzalloc_obj(*to); if (!to) return -ENOMEM; ret = sg_alloc_table(&to->wrapper, from->nents, GFP_KERNEL); if (ret) goto free_to; to_sg = to->wrapper.sgl; for_each_sgtable_dma_sg(from, from_sg, i) { to_sg->offset = 0; to_sg->length = 0; sg_assign_page(to_sg, NULL); sg_dma_address(to_sg) = sg_dma_address(from_sg); sg_dma_len(to_sg) = sg_dma_len(from_sg); to_sg = sg_next(to_sg); } to->original = from; *sg_table = &to->wrapper; return 0; free_to: kfree(to); return ret; } static void dma_buf_unwrap_sg_table(struct sg_table **sg_table) { struct dma_buf_sg_table_wrapper *copy; if (!IS_ENABLED(CONFIG_DMABUF_DEBUG)) return; copy = container_of(*sg_table, typeof(*copy), wrapper); *sg_table = copy->original; sg_free_table(©->wrapper); kfree(copy); } static inline bool dma_buf_attachment_is_dynamic(struct dma_buf_attachment *attach) { return !!attach->importer_ops; } static bool dma_buf_pin_on_map(struct dma_buf_attachment *attach) { return attach->dmabuf->ops->pin && (!dma_buf_attachment_is_dynamic(attach) || !IS_ENABLED(CONFIG_DMABUF_MOVE_NOTIFY)); } /** * DOC: locking convention * * In order to avoid deadlock situations between dma-buf exports and importers, * all dma-buf API users must follow the common dma-buf locking convention. * * Convention for importers * * 1. Importers must hold the dma-buf reservation lock when calling these * functions: * * - dma_buf_pin() * - dma_buf_unpin() * - dma_buf_map_attachment() * - dma_buf_unmap_attachment() * - dma_buf_vmap() * - dma_buf_vunmap() * * 2. Importers must not hold the dma-buf reservation lock when calling these * functions: * * - dma_buf_attach() * - dma_buf_dynamic_attach() * - dma_buf_detach() * - dma_buf_export() * - dma_buf_fd() * - dma_buf_get() * - dma_buf_put() * - dma_buf_mmap() * - dma_buf_begin_cpu_access() * - dma_buf_end_cpu_access() * - dma_buf_map_attachment_unlocked() * - dma_buf_unmap_attachment_unlocked() * - dma_buf_vmap_unlocked() * - dma_buf_vunmap_unlocked() * * Convention for exporters * * 1. These &dma_buf_ops callbacks are invoked with unlocked dma-buf * reservation and exporter can take the lock: * * - &dma_buf_ops.attach() * - &dma_buf_ops.detach() * - &dma_buf_ops.release() * - &dma_buf_ops.begin_cpu_access() * - &dma_buf_ops.end_cpu_access() * - &dma_buf_ops.mmap() * * 2. These &dma_buf_ops callbacks are invoked with locked dma-buf * reservation and exporter can't take the lock: * * - &dma_buf_ops.pin() * - &dma_buf_ops.unpin() * - &dma_buf_ops.map_dma_buf() * - &dma_buf_ops.unmap_dma_buf() * - &dma_buf_ops.vmap() * - &dma_buf_ops.vunmap() * * 3. Exporters must hold the dma-buf reservation lock when calling these * functions: * * - dma_buf_move_notify() */ /** * dma_buf_dynamic_attach - Add the device to dma_buf's attachments list * @dmabuf: [in] buffer to attach device to. * @dev: [in] device to be attached. * @importer_ops: [in] importer operations for the attachment * @importer_priv: [in] importer private pointer for the attachment * * Returns struct dma_buf_attachment pointer for this attachment. Attachments * must be cleaned up by calling dma_buf_detach(). * * Optionally this calls &dma_buf_ops.attach to allow device-specific attach * functionality. * * Returns: * * A pointer to newly created &dma_buf_attachment on success, or a negative * error code wrapped into a pointer on failure. * * Note that this can fail if the backing storage of @dmabuf is in a place not * accessible to @dev, and cannot be moved to a more suitable place. This is * indicated with the error code -EBUSY. */ struct dma_buf_attachment * dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct device *dev, const struct dma_buf_attach_ops *importer_ops, void *importer_priv) { struct dma_buf_attachment *attach; int ret; if (WARN_ON(!dmabuf || !dev)) return ERR_PTR(-EINVAL); if (WARN_ON(importer_ops && !importer_ops->move_notify)) return ERR_PTR(-EINVAL); attach = kzalloc_obj(*attach); if (!attach) return ERR_PTR(-ENOMEM); attach->dev = dev; attach->dmabuf = dmabuf; if (importer_ops) attach->peer2peer = importer_ops->allow_peer2peer; attach->importer_ops = importer_ops; attach->importer_priv = importer_priv; if (dmabuf->ops->attach) { ret = dmabuf->ops->attach(dmabuf, attach); if (ret) goto err_attach; } dma_resv_lock(dmabuf->resv, NULL); list_add(&attach->node, &dmabuf->attachments); dma_resv_unlock(dmabuf->resv); DMA_BUF_TRACE(trace_dma_buf_dynamic_attach, dmabuf, attach, dma_buf_attachment_is_dynamic(attach), dev); return attach; err_attach: kfree(attach); return ERR_PTR(ret); } EXPORT_SYMBOL_NS_GPL(dma_buf_dynamic_attach, "DMA_BUF"); /** * dma_buf_attach - Wrapper for dma_buf_dynamic_attach * @dmabuf: [in] buffer to attach device to. * @dev: [in] device to be attached. * * Wrapper to call dma_buf_dynamic_attach() for drivers which still use a static * mapping. */ struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf, struct device *dev) { return dma_buf_dynamic_attach(dmabuf, dev, NULL, NULL); } EXPORT_SYMBOL_NS_GPL(dma_buf_attach, "DMA_BUF"); /** * dma_buf_detach - Remove the given attachment from dmabuf's attachments list * @dmabuf: [in] buffer to detach from. * @attach: [in] attachment to be detached; is free'd after this call. * * Clean up a device attachment obtained by calling dma_buf_attach(). * * Optionally this calls &dma_buf_ops.detach for device-specific detach. */ void dma_buf_detach(struct dma_buf *dmabuf, struct dma_buf_attachment *attach) { if (WARN_ON(!dmabuf || !attach || dmabuf != attach->dmabuf)) return; dma_resv_lock(dmabuf->resv, NULL); list_del(&attach->node); dma_resv_unlock(dmabuf->resv); if (dmabuf->ops->detach) dmabuf->ops->detach(dmabuf, attach); DMA_BUF_TRACE(trace_dma_buf_detach, dmabuf, attach, dma_buf_attachment_is_dynamic(attach), attach->dev); kfree(attach); } EXPORT_SYMBOL_NS_GPL(dma_buf_detach, "DMA_BUF"); /** * dma_buf_pin - Lock down the DMA-buf * @attach: [in] attachment which should be pinned * * Only dynamic importers (who set up @attach with dma_buf_dynamic_attach()) may * call this, and only for limited use cases like scanout and not for temporary * pin operations. It is not permitted to allow userspace to pin arbitrary * amounts of buffers through this interface. * * Buffers must be unpinned by calling dma_buf_unpin(). * * Returns: * 0 on success, negative error code on failure. */ int dma_buf_pin(struct dma_buf_attachment *attach) { struct dma_buf *dmabuf = attach->dmabuf; int ret = 0; WARN_ON(!attach->importer_ops); dma_resv_assert_held(dmabuf->resv); if (dmabuf->ops->pin) ret = dmabuf->ops->pin(attach); return ret; } EXPORT_SYMBOL_NS_GPL(dma_buf_pin, "DMA_BUF"); /** * dma_buf_unpin - Unpin a DMA-buf * @attach: [in] attachment which should be unpinned * * This unpins a buffer pinned by dma_buf_pin() and allows the exporter to move * any mapping of @attach again and inform the importer through * &dma_buf_attach_ops.move_notify. */ void dma_buf_unpin(struct dma_buf_attachment *attach) { struct dma_buf *dmabuf = attach->dmabuf; WARN_ON(!attach->importer_ops); dma_resv_assert_held(dmabuf->resv); if (dmabuf->ops->unpin) dmabuf->ops->unpin(attach); } EXPORT_SYMBOL_NS_GPL(dma_buf_unpin, "DMA_BUF"); /** * dma_buf_map_attachment - Returns the scatterlist table of the attachment; * mapped into _device_ address space. Is a wrapper for map_dma_buf() of the * dma_buf_ops. * @attach: [in] attachment whose scatterlist is to be returned * @direction: [in] direction of DMA transfer * * Returns sg_table containing the scatterlist to be returned; returns ERR_PTR * on error. May return -EINTR if it is interrupted by a signal. * * On success, the DMA addresses and lengths in the returned scatterlist are * PAGE_SIZE aligned. * * A mapping must be unmapped by using dma_buf_unmap_attachment(). Note that * the underlying backing storage is pinned for as long as a mapping exists, * therefore users/importers should not hold onto a mapping for undue amounts of * time. * * Important: Dynamic importers must wait for the exclusive fence of the struct * dma_resv attached to the DMA-BUF first. */ struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *attach, enum dma_data_direction direction) { struct sg_table *sg_table; signed long ret; might_sleep(); if (WARN_ON(!attach || !attach->dmabuf)) return ERR_PTR(-EINVAL); dma_resv_assert_held(attach->dmabuf->resv); if (dma_buf_pin_on_map(attach)) { ret = attach->dmabuf->ops->pin(attach); /* * Catch exporters making buffers inaccessible even when * attachments preventing that exist. */ WARN_ON_ONCE(ret == -EBUSY); if (ret) return ERR_PTR(ret); } sg_table = attach->dmabuf->ops->map_dma_buf(attach, direction); if (!sg_table) sg_table = ERR_PTR(-ENOMEM); if (IS_ERR(sg_table)) goto error_unpin; /* * Importers with static attachments don't wait for fences. */ if (!dma_buf_attachment_is_dynamic(attach)) { ret = dma_resv_wait_timeout(attach->dmabuf->resv, DMA_RESV_USAGE_KERNEL, true, MAX_SCHEDULE_TIMEOUT); if (ret < 0) goto error_unmap; } ret = dma_buf_wrap_sg_table(&sg_table); if (ret) goto error_unmap; if (IS_ENABLED(CONFIG_DMA_API_DEBUG)) { struct scatterlist *sg; u64 addr; int len; int i; for_each_sgtable_dma_sg(sg_table, sg, i) { addr = sg_dma_address(sg); len = sg_dma_len(sg); if (!PAGE_ALIGNED(addr) || !PAGE_ALIGNED(len)) { pr_debug("%s: addr %llx or len %x is not page aligned!\n", __func__, addr, len); break; } } } return sg_table; error_unmap: attach->dmabuf->ops->unmap_dma_buf(attach, sg_table, direction); sg_table = ERR_PTR(ret); error_unpin: if (dma_buf_pin_on_map(attach)) attach->dmabuf->ops->unpin(attach); return sg_table; } EXPORT_SYMBOL_NS_GPL(dma_buf_map_attachment, "DMA_BUF"); /** * dma_buf_map_attachment_unlocked - Returns the scatterlist table of the attachment; * mapped into _device_ address space. Is a wrapper for map_dma_buf() of the * dma_buf_ops. * @attach: [in] attachment whose scatterlist is to be returned * @direction: [in] direction of DMA transfer * * Unlocked variant of dma_buf_map_attachment(). */ struct sg_table * dma_buf_map_attachment_unlocked(struct dma_buf_attachment *attach, enum dma_data_direction direction) { struct sg_table *sg_table; might_sleep(); if (WARN_ON(!attach || !attach->dmabuf)) return ERR_PTR(-EINVAL); dma_resv_lock(attach->dmabuf->resv, NULL); sg_table = dma_buf_map_attachment(attach, direction); dma_resv_unlock(attach->dmabuf->resv); return sg_table; } EXPORT_SYMBOL_NS_GPL(dma_buf_map_attachment_unlocked, "DMA_BUF"); /** * dma_buf_unmap_attachment - unmaps and decreases usecount of the buffer;might * deallocate the scatterlist associated. Is a wrapper for unmap_dma_buf() of * dma_buf_ops. * @attach: [in] attachment to unmap buffer from * @sg_table: [in] scatterlist info of the buffer to unmap * @direction: [in] direction of DMA transfer * * This unmaps a DMA mapping for @attached obtained by dma_buf_map_attachment(). */ void dma_buf_unmap_attachment(struct dma_buf_attachment *attach, struct sg_table *sg_table, enum dma_data_direction direction) { might_sleep(); if (WARN_ON(!attach || !attach->dmabuf || !sg_table)) return; dma_resv_assert_held(attach->dmabuf->resv); dma_buf_unwrap_sg_table(&sg_table); attach->dmabuf->ops->unmap_dma_buf(attach, sg_table, direction); if (dma_buf_pin_on_map(attach)) attach->dmabuf->ops->unpin(attach); } EXPORT_SYMBOL_NS_GPL(dma_buf_unmap_attachment, "DMA_BUF"); /** * dma_buf_unmap_attachment_unlocked - unmaps and decreases usecount of the buffer;might * deallocate the scatterlist associated. Is a wrapper for unmap_dma_buf() of * dma_buf_ops. * @attach: [in] attachment to unmap buffer from * @sg_table: [in] scatterlist info of the buffer to unmap * @direction: [in] direction of DMA transfer * * Unlocked variant of dma_buf_unmap_attachment(). */ void dma_buf_unmap_attachment_unlocked(struct dma_buf_attachment *attach, struct sg_table *sg_table, enum dma_data_direction direction) { might_sleep(); if (WARN_ON(!attach || !attach->dmabuf || !sg_table)) return; dma_resv_lock(attach->dmabuf->resv, NULL); dma_buf_unmap_attachment(attach, sg_table, direction); dma_resv_unlock(attach->dmabuf->resv); } EXPORT_SYMBOL_NS_GPL(dma_buf_unmap_attachment_unlocked, "DMA_BUF"); /** * dma_buf_move_notify - notify attachments that DMA-buf is moving * * @dmabuf: [in] buffer which is moving * * Informs all attachments that they need to destroy and recreate all their * mappings. */ void dma_buf_move_notify(struct dma_buf *dmabuf) { struct dma_buf_attachment *attach; dma_resv_assert_held(dmabuf->resv); list_for_each_entry(attach, &dmabuf->attachments, node) if (attach->importer_ops) attach->importer_ops->move_notify(attach); } EXPORT_SYMBOL_NS_GPL(dma_buf_move_notify, "DMA_BUF"); /** * DOC: cpu access * * There are multiple reasons for supporting CPU access to a dma buffer object: * * - Fallback operations in the kernel, for example when a device is connected * over USB and the kernel needs to shuffle the data around first before * sending it away. Cache coherency is handled by bracketing any transactions * with calls to dma_buf_begin_cpu_access() and dma_buf_end_cpu_access() * access. * * Since for most kernel internal dma-buf accesses need the entire buffer, a * vmap interface is introduced. Note that on very old 32-bit architectures * vmalloc space might be limited and result in vmap calls failing. * * Interfaces: * * .. code-block:: c * * void *dma_buf_vmap(struct dma_buf *dmabuf, struct iosys_map *map) * void dma_buf_vunmap(struct dma_buf *dmabuf, struct iosys_map *map) * * The vmap call can fail if there is no vmap support in the exporter, or if * it runs out of vmalloc space. Note that the dma-buf layer keeps a reference * count for all vmap access and calls down into the exporter's vmap function * only when no vmapping exists, and only unmaps it once. Protection against * concurrent vmap/vunmap calls is provided by taking the &dma_buf.lock mutex. * * - For full compatibility on the importer side with existing userspace * interfaces, which might already support mmap'ing buffers. This is needed in * many processing pipelines (e.g. feeding a software rendered image into a * hardware pipeline, thumbnail creation, snapshots, ...). Also, Android's ION * framework already supported this and for DMA buffer file descriptors to * replace ION buffers mmap support was needed. * * There is no special interfaces, userspace simply calls mmap on the dma-buf * fd. But like for CPU access there's a need to bracket the actual access, * which is handled by the ioctl (DMA_BUF_IOCTL_SYNC). Note that * DMA_BUF_IOCTL_SYNC can fail with -EAGAIN or -EINTR, in which case it must * be restarted. * * Some systems might need some sort of cache coherency management e.g. when * CPU and GPU domains are being accessed through dma-buf at the same time. * To circumvent this problem there are begin/end coherency markers, that * forward directly to existing dma-buf device drivers vfunc hooks. Userspace * can make use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The * sequence would be used like following: * * - mmap dma-buf fd * - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write * to mmap area 3. SYNC_END ioctl. This can be repeated as often as you * want (with the new data being consumed by say the GPU or the scanout * device) * - munmap once you don't need the buffer any more * * For correctness and optimal performance, it is always required to use * SYNC_START and SYNC_END before and after, respectively, when accessing the * mapped address. Userspace cannot rely on coherent access, even when there * are systems where it just works without calling these ioctls. * * - And as a CPU fallback in userspace processing pipelines. * * Similar to the motivation for kernel cpu access it is again important that * the userspace code of a given importing subsystem can use the same * interfaces with a imported dma-buf buffer object as with a native buffer * object. This is especially important for drm where the userspace part of * contemporary OpenGL, X, and other drivers is huge, and reworking them to * use a different way to mmap a buffer rather invasive. * * The assumption in the current dma-buf interfaces is that redirecting the * initial mmap is all that's needed. A survey of some of the existing * subsystems shows that no driver seems to do any nefarious thing like * syncing up with outstanding asynchronous processing on the device or * allocating special resources at fault time. So hopefully this is good * enough, since adding interfaces to intercept pagefaults and allow pte * shootdowns would increase the complexity quite a bit. * * Interface: * * .. code-block:: c * * int dma_buf_mmap(struct dma_buf *, struct vm_area_struct *, unsigned long); * * If the importing subsystem simply provides a special-purpose mmap call to * set up a mapping in userspace, calling do_mmap with &dma_buf.file will * equally achieve that for a dma-buf object. */ static int __dma_buf_begin_cpu_access(struct dma_buf *dmabuf, enum dma_data_direction direction) { bool write = (direction == DMA_BIDIRECTIONAL || direction == DMA_TO_DEVICE); struct dma_resv *resv = dmabuf->resv; long ret; /* Wait on any implicit rendering fences */ ret = dma_resv_wait_timeout(resv, dma_resv_usage_rw(write), true, MAX_SCHEDULE_TIMEOUT); if (ret < 0) return ret; return 0; } /** * dma_buf_begin_cpu_access - Must be called before accessing a dma_buf from the * cpu in the kernel context. Calls begin_cpu_access to allow exporter-specific * preparations. Coherency is only guaranteed in the specified range for the * specified access direction. * @dmabuf: [in] buffer to prepare cpu access for. * @direction: [in] direction of access. * * After the cpu access is complete the caller should call * dma_buf_end_cpu_access(). Only when cpu access is bracketed by both calls is * it guaranteed to be coherent with other DMA access. * * This function will also wait for any DMA transactions tracked through * implicit synchronization in &dma_buf.resv. For DMA transactions with explicit * synchronization this function will only ensure cache coherency, callers must * ensure synchronization with such DMA transactions on their own. * * Can return negative error values, returns 0 on success. */ int dma_buf_begin_cpu_access(struct dma_buf *dmabuf, enum dma_data_direction direction) { int ret = 0; if (WARN_ON(!dmabuf)) return -EINVAL; might_lock(&dmabuf->resv->lock.base); if (dmabuf->ops->begin_cpu_access) ret = dmabuf->ops->begin_cpu_access(dmabuf, direction); /* Ensure that all fences are waited upon - but we first allow * the native handler the chance to do so more efficiently if it * chooses. A double invocation here will be reasonably cheap no-op. */ if (ret == 0) ret = __dma_buf_begin_cpu_access(dmabuf, direction); return ret; } EXPORT_SYMBOL_NS_GPL(dma_buf_begin_cpu_access, "DMA_BUF"); /** * dma_buf_end_cpu_access - Must be called after accessing a dma_buf from the * cpu in the kernel context. Calls end_cpu_access to allow exporter-specific * actions. Coherency is only guaranteed in the specified range for the * specified access direction. * @dmabuf: [in] buffer to complete cpu access for. * @direction: [in] direction of access. * * This terminates CPU access started with dma_buf_begin_cpu_access(). * * Can return negative error values, returns 0 on success. */ int dma_buf_end_cpu_access(struct dma_buf *dmabuf, enum dma_data_direction direction) { int ret = 0; WARN_ON(!dmabuf); might_lock(&dmabuf->resv->lock.base); if (dmabuf->ops->end_cpu_access) ret = dmabuf->ops->end_cpu_access(dmabuf, direction); return ret; } EXPORT_SYMBOL_NS_GPL(dma_buf_end_cpu_access, "DMA_BUF"); /** * dma_buf_mmap - Setup up a userspace mmap with the given vma * @dmabuf: [in] buffer that should back the vma * @vma: [in] vma for the mmap * @pgoff: [in] offset in pages where this mmap should start within the * dma-buf buffer. * * This function adjusts the passed in vma so that it points at the file of the * dma_buf operation. It also adjusts the starting pgoff and does bounds * checking on the size of the vma. Then it calls the exporters mmap function to * set up the mapping. * * Can return negative error values, returns 0 on success. */ int dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma, unsigned long pgoff) { if (WARN_ON(!dmabuf || !vma)) return -EINVAL; /* check if buffer supports mmap */ if (!dmabuf->ops->mmap) return -EINVAL; /* check for offset overflow */ if (pgoff + vma_pages(vma) < pgoff) return -EOVERFLOW; /* check for overflowing the buffer's size */ if (pgoff + vma_pages(vma) > dmabuf->size >> PAGE_SHIFT) return -EINVAL; /* readjust the vma */ vma_set_file(vma, dmabuf->file); vma->vm_pgoff = pgoff; DMA_BUF_TRACE(trace_dma_buf_mmap, dmabuf); return dmabuf->ops->mmap(dmabuf, vma); } EXPORT_SYMBOL_NS_GPL(dma_buf_mmap, "DMA_BUF"); /** * dma_buf_vmap - Create virtual mapping for the buffer object into kernel * address space. Same restrictions as for vmap and friends apply. * @dmabuf: [in] buffer to vmap * @map: [out] returns the vmap pointer * * This call may fail due to lack of virtual mapping address space. * These calls are optional in drivers. The intended use for them * is for mapping objects linear in kernel space for high use objects. * * To ensure coherency users must call dma_buf_begin_cpu_access() and * dma_buf_end_cpu_access() around any cpu access performed through this * mapping. * * Returns 0 on success, or a negative errno code otherwise. */ int dma_buf_vmap(struct dma_buf *dmabuf, struct iosys_map *map) { struct iosys_map ptr; int ret; iosys_map_clear(map); if (WARN_ON(!dmabuf)) return -EINVAL; dma_resv_assert_held(dmabuf->resv); if (!dmabuf->ops->vmap) return -EINVAL; if (dmabuf->vmapping_counter) { dmabuf->vmapping_counter++; BUG_ON(iosys_map_is_null(&dmabuf->vmap_ptr)); *map = dmabuf->vmap_ptr; return 0; } BUG_ON(iosys_map_is_set(&dmabuf->vmap_ptr)); ret = dmabuf->ops->vmap(dmabuf, &ptr); if (WARN_ON_ONCE(ret)) return ret; dmabuf->vmap_ptr = ptr; dmabuf->vmapping_counter = 1; *map = dmabuf->vmap_ptr; return 0; } EXPORT_SYMBOL_NS_GPL(dma_buf_vmap, "DMA_BUF"); /** * dma_buf_vmap_unlocked - Create virtual mapping for the buffer object into kernel * address space. Same restrictions as for vmap and friends apply. * @dmabuf: [in] buffer to vmap * @map: [out] returns the vmap pointer * * Unlocked version of dma_buf_vmap() * * Returns 0 on success, or a negative errno code otherwise. */ int dma_buf_vmap_unlocked(struct dma_buf *dmabuf, struct iosys_map *map) { int ret; iosys_map_clear(map); if (WARN_ON(!dmabuf)) return -EINVAL; dma_resv_lock(dmabuf->resv, NULL); ret = dma_buf_vmap(dmabuf, map); dma_resv_unlock(dmabuf->resv); return ret; } EXPORT_SYMBOL_NS_GPL(dma_buf_vmap_unlocked, "DMA_BUF"); /** * dma_buf_vunmap - Unmap a vmap obtained by dma_buf_vmap. * @dmabuf: [in] buffer to vunmap * @map: [in] vmap pointer to vunmap */ void dma_buf_vunmap(struct dma_buf *dmabuf, struct iosys_map *map) { if (WARN_ON(!dmabuf)) return; dma_resv_assert_held(dmabuf->resv); BUG_ON(iosys_map_is_null(&dmabuf->vmap_ptr)); BUG_ON(dmabuf->vmapping_counter == 0); BUG_ON(!iosys_map_is_equal(&dmabuf->vmap_ptr, map)); if (--dmabuf->vmapping_counter == 0) { if (dmabuf->ops->vunmap) dmabuf->ops->vunmap(dmabuf, map); iosys_map_clear(&dmabuf->vmap_ptr); } } EXPORT_SYMBOL_NS_GPL(dma_buf_vunmap, "DMA_BUF"); /** * dma_buf_vunmap_unlocked - Unmap a vmap obtained by dma_buf_vmap. * @dmabuf: [in] buffer to vunmap * @map: [in] vmap pointer to vunmap */ void dma_buf_vunmap_unlocked(struct dma_buf *dmabuf, struct iosys_map *map) { if (WARN_ON(!dmabuf)) return; dma_resv_lock(dmabuf->resv, NULL); dma_buf_vunmap(dmabuf, map); dma_resv_unlock(dmabuf->resv); } EXPORT_SYMBOL_NS_GPL(dma_buf_vunmap_unlocked, "DMA_BUF"); #ifdef CONFIG_DEBUG_FS static int dma_buf_debug_show(struct seq_file *s, void *unused) { struct dma_buf *buf_obj; struct dma_buf_attachment *attach_obj; int count = 0, attach_count; size_t size = 0; int ret; ret = mutex_lock_interruptible(&dmabuf_list_mutex); if (ret) return ret; seq_puts(s, "\nDma-buf Objects:\n"); seq_printf(s, "%-8s\t%-8s\t%-8s\t%-8s\texp_name\t%-8s\tname\n", "size", "flags", "mode", "count", "ino"); list_for_each_entry(buf_obj, &dmabuf_list, list_node) { ret = dma_resv_lock_interruptible(buf_obj->resv, NULL); if (ret) goto error_unlock; spin_lock(&buf_obj->name_lock); seq_printf(s, "%08zu\t%08x\t%08x\t%08ld\t%s\t%08lu\t%s\n", buf_obj->size, buf_obj->file->f_flags, buf_obj->file->f_mode, file_count(buf_obj->file), buf_obj->exp_name, file_inode(buf_obj->file)->i_ino, buf_obj->name ?: "<none>"); spin_unlock(&buf_obj->name_lock); dma_resv_describe(buf_obj->resv, s); seq_puts(s, "\tAttached Devices:\n"); attach_count = 0; list_for_each_entry(attach_obj, &buf_obj->attachments, node) { seq_printf(s, "\t%s\n", dev_name(attach_obj->dev)); attach_count++; } dma_resv_unlock(buf_obj->resv); seq_printf(s, "Total %d devices attached\n\n", attach_count); count++; size += buf_obj->size; } seq_printf(s, "\nTotal %d objects, %zu bytes\n", count, size); mutex_unlock(&dmabuf_list_mutex); return 0; error_unlock: mutex_unlock(&dmabuf_list_mutex); return ret; } DEFINE_SHOW_ATTRIBUTE(dma_buf_debug); static struct dentry *dma_buf_debugfs_dir; static int dma_buf_init_debugfs(void) { struct dentry *d; int err = 0; d = debugfs_create_dir("dma_buf", NULL); if (IS_ERR(d)) return PTR_ERR(d); dma_buf_debugfs_dir = d; d = debugfs_create_file("bufinfo", 0444, dma_buf_debugfs_dir, NULL, &dma_buf_debug_fops); if (IS_ERR(d)) { pr_debug("dma_buf: debugfs: failed to create node bufinfo\n"); debugfs_remove_recursive(dma_buf_debugfs_dir); dma_buf_debugfs_dir = NULL; err = PTR_ERR(d); } return err; } static void dma_buf_uninit_debugfs(void) { debugfs_remove_recursive(dma_buf_debugfs_dir); } #else static inline int dma_buf_init_debugfs(void) { return 0; } static inline void dma_buf_uninit_debugfs(void) { } #endif static int __init dma_buf_init(void) { dma_buf_mnt = kern_mount(&dma_buf_fs_type); if (IS_ERR(dma_buf_mnt)) return PTR_ERR(dma_buf_mnt); dma_buf_init_debugfs(); return 0; } subsys_initcall(dma_buf_init); static void __exit dma_buf_deinit(void) { dma_buf_uninit_debugfs(); kern_unmount(dma_buf_mnt); } __exitcall(dma_buf_deinit); |
| 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 | // SPDX-License-Identifier: GPL-2.0-only /* * vsock sock_diag(7) module * * Copyright (C) 2017 Red Hat, Inc. * Author: Stefan Hajnoczi <stefanha@redhat.com> */ #include <linux/module.h> #include <linux/sock_diag.h> #include <linux/vm_sockets_diag.h> #include <net/af_vsock.h> static int sk_diag_fill(struct sock *sk, struct sk_buff *skb, u32 portid, u32 seq, u32 flags) { struct vsock_sock *vsk = vsock_sk(sk); struct vsock_diag_msg *rep; struct nlmsghdr *nlh; nlh = nlmsg_put(skb, portid, seq, SOCK_DIAG_BY_FAMILY, sizeof(*rep), flags); if (!nlh) return -EMSGSIZE; rep = nlmsg_data(nlh); rep->vdiag_family = AF_VSOCK; /* Lock order dictates that sk_lock is acquired before * vsock_table_lock, so we cannot lock here. Simply don't take * sk_lock; sk is guaranteed to stay alive since vsock_table_lock is * held. */ rep->vdiag_type = sk->sk_type; rep->vdiag_state = sk->sk_state; rep->vdiag_shutdown = sk->sk_shutdown; rep->vdiag_src_cid = vsk->local_addr.svm_cid; rep->vdiag_src_port = vsk->local_addr.svm_port; rep->vdiag_dst_cid = vsk->remote_addr.svm_cid; rep->vdiag_dst_port = vsk->remote_addr.svm_port; rep->vdiag_ino = sock_i_ino(sk); sock_diag_save_cookie(sk, rep->vdiag_cookie); return 0; } static int vsock_diag_dump(struct sk_buff *skb, struct netlink_callback *cb) { struct vsock_diag_req *req; struct vsock_sock *vsk; unsigned int bucket; unsigned int last_i; unsigned int table; struct net *net; unsigned int i; req = nlmsg_data(cb->nlh); net = sock_net(skb->sk); /* State saved between calls: */ table = cb->args[0]; bucket = cb->args[1]; i = last_i = cb->args[2]; /* TODO VMCI pending sockets? */ spin_lock_bh(&vsock_table_lock); /* Bind table (locally created sockets) */ if (table == 0) { while (bucket < ARRAY_SIZE(vsock_bind_table)) { struct list_head *head = &vsock_bind_table[bucket]; i = 0; list_for_each_entry(vsk, head, bound_table) { struct sock *sk = sk_vsock(vsk); if (!net_eq(sock_net(sk), net)) continue; if (i < last_i) goto next_bind; if (!(req->vdiag_states & (1 << sk->sk_state))) goto next_bind; if (sk_diag_fill(sk, skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, NLM_F_MULTI) < 0) goto done; next_bind: i++; } last_i = 0; bucket++; } table++; bucket = 0; } /* Connected table (accepted connections) */ while (bucket < ARRAY_SIZE(vsock_connected_table)) { struct list_head *head = &vsock_connected_table[bucket]; i = 0; list_for_each_entry(vsk, head, connected_table) { struct sock *sk = sk_vsock(vsk); /* Skip sockets we've already seen above */ if (__vsock_in_bound_table(vsk)) continue; if (!net_eq(sock_net(sk), net)) continue; if (i < last_i) goto next_connected; if (!(req->vdiag_states & (1 << sk->sk_state))) goto next_connected; if (sk_diag_fill(sk, skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, NLM_F_MULTI) < 0) goto done; next_connected: i++; } last_i = 0; bucket++; } done: spin_unlock_bh(&vsock_table_lock); cb->args[0] = table; cb->args[1] = bucket; cb->args[2] = i; return skb->len; } static int vsock_diag_handler_dump(struct sk_buff *skb, struct nlmsghdr *h) { int hdrlen = sizeof(struct vsock_diag_req); struct net *net = sock_net(skb->sk); if (nlmsg_len(h) < hdrlen) return -EINVAL; if (h->nlmsg_flags & NLM_F_DUMP) { struct netlink_dump_control c = { .dump = vsock_diag_dump, }; return netlink_dump_start(net->diag_nlsk, skb, h, &c); } return -EOPNOTSUPP; } static const struct sock_diag_handler vsock_diag_handler = { .owner = THIS_MODULE, .family = AF_VSOCK, .dump = vsock_diag_handler_dump, }; static int __init vsock_diag_init(void) { return sock_diag_register(&vsock_diag_handler); } static void __exit vsock_diag_exit(void) { sock_diag_unregister(&vsock_diag_handler); } module_init(vsock_diag_init); module_exit(vsock_diag_exit); MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("VMware Virtual Sockets monitoring via SOCK_DIAG"); MODULE_ALIAS_NET_PF_PROTO_TYPE(PF_NETLINK, NETLINK_SOCK_DIAG, 40 /* AF_VSOCK */); |
| 1 1 1 1 1 2 2 2 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 | // SPDX-License-Identifier: GPL-2.0-or-later /* * Abilis Systems Single DVB-T Receiver * Copyright (C) 2008 Pierrick Hascoet <pierrick.hascoet@abilis.com> * Copyright (C) 2010 Devin Heitmueller <dheitmueller@kernellabs.com> */ #include <linux/kernel.h> #include <linux/errno.h> #include <linux/slab.h> #include <linux/mm.h> #include <linux/usb.h> #include "as102_drv.h" #include "as102_usb_drv.h" #include "as102_fw.h" static void as102_usb_disconnect(struct usb_interface *interface); static int as102_usb_probe(struct usb_interface *interface, const struct usb_device_id *id); static int as102_usb_start_stream(struct as102_dev_t *dev); static void as102_usb_stop_stream(struct as102_dev_t *dev); static int as102_open(struct inode *inode, struct file *file); static int as102_release(struct inode *inode, struct file *file); static const struct usb_device_id as102_usb_id_table[] = { { USB_DEVICE(AS102_USB_DEVICE_VENDOR_ID, AS102_USB_DEVICE_PID_0001) }, { USB_DEVICE(PCTV_74E_USB_VID, PCTV_74E_USB_PID) }, { USB_DEVICE(ELGATO_EYETV_DTT_USB_VID, ELGATO_EYETV_DTT_USB_PID) }, { USB_DEVICE(NBOX_DVBT_DONGLE_USB_VID, NBOX_DVBT_DONGLE_USB_PID) }, { USB_DEVICE(SKY_IT_DIGITAL_KEY_USB_VID, SKY_IT_DIGITAL_KEY_USB_PID) }, { } /* Terminating entry */ }; /* Note that this table must always have the same number of entries as the as102_usb_id_table struct */ static const char * const as102_device_names[] = { AS102_REFERENCE_DESIGN, AS102_PCTV_74E, AS102_ELGATO_EYETV_DTT_NAME, AS102_NBOX_DVBT_DONGLE_NAME, AS102_SKY_IT_DIGITAL_KEY_NAME, NULL /* Terminating entry */ }; /* eLNA configuration: devices built on the reference design work best with 0xA0, while custom designs seem to require 0xC0 */ static uint8_t const as102_elna_cfg[] = { 0xA0, 0xC0, 0xC0, 0xA0, 0xA0, 0x00 /* Terminating entry */ }; struct usb_driver as102_usb_driver = { .name = DRIVER_FULL_NAME, .probe = as102_usb_probe, .disconnect = as102_usb_disconnect, .id_table = as102_usb_id_table }; static const struct file_operations as102_dev_fops = { .owner = THIS_MODULE, .open = as102_open, .release = as102_release, }; static struct usb_class_driver as102_usb_class_driver = { .name = "aton2-%d", .fops = &as102_dev_fops, .minor_base = AS102_DEVICE_MAJOR, }; static int as102_usb_xfer_cmd(struct as10x_bus_adapter_t *bus_adap, unsigned char *send_buf, int send_buf_len, unsigned char *recv_buf, int recv_buf_len) { int ret = 0; if (send_buf != NULL) { ret = usb_control_msg(bus_adap->usb_dev, usb_sndctrlpipe(bus_adap->usb_dev, 0), AS102_USB_DEVICE_TX_CTRL_CMD, USB_DIR_OUT | USB_TYPE_VENDOR | USB_RECIP_DEVICE, bus_adap->cmd_xid, /* value */ 0, /* index */ send_buf, send_buf_len, USB_CTRL_SET_TIMEOUT /* 200 */); if (ret < 0) { dev_dbg(&bus_adap->usb_dev->dev, "usb_control_msg(send) failed, err %i\n", ret); return ret; } if (ret != send_buf_len) { dev_dbg(&bus_adap->usb_dev->dev, "only wrote %d of %d bytes\n", ret, send_buf_len); return -1; } } if (recv_buf != NULL) { #ifdef TRACE dev_dbg(bus_adap->usb_dev->dev, "want to read: %d bytes\n", recv_buf_len); #endif ret = usb_control_msg(bus_adap->usb_dev, usb_rcvctrlpipe(bus_adap->usb_dev, 0), AS102_USB_DEVICE_RX_CTRL_CMD, USB_DIR_IN | USB_TYPE_VENDOR | USB_RECIP_DEVICE, bus_adap->cmd_xid, /* value */ 0, /* index */ recv_buf, recv_buf_len, USB_CTRL_GET_TIMEOUT /* 200 */); if (ret < 0) { dev_dbg(&bus_adap->usb_dev->dev, "usb_control_msg(recv) failed, err %i\n", ret); return ret; } #ifdef TRACE dev_dbg(bus_adap->usb_dev->dev, "read %d bytes\n", recv_buf_len); #endif } return ret; } static int as102_send_ep1(struct as10x_bus_adapter_t *bus_adap, unsigned char *send_buf, int send_buf_len, int swap32) { int ret, actual_len; ret = usb_bulk_msg(bus_adap->usb_dev, usb_sndbulkpipe(bus_adap->usb_dev, 1), send_buf, send_buf_len, &actual_len, 200); if (ret) { dev_dbg(&bus_adap->usb_dev->dev, "usb_bulk_msg(send) failed, err %i\n", ret); return ret; } if (actual_len != send_buf_len) { dev_dbg(&bus_adap->usb_dev->dev, "only wrote %d of %d bytes\n", actual_len, send_buf_len); return -1; } return actual_len; } static int as102_read_ep2(struct as10x_bus_adapter_t *bus_adap, unsigned char *recv_buf, int recv_buf_len) { int ret, actual_len; if (recv_buf == NULL) return -EINVAL; ret = usb_bulk_msg(bus_adap->usb_dev, usb_rcvbulkpipe(bus_adap->usb_dev, 2), recv_buf, recv_buf_len, &actual_len, 200); if (ret) { dev_dbg(&bus_adap->usb_dev->dev, "usb_bulk_msg(recv) failed, err %i\n", ret); return ret; } if (actual_len != recv_buf_len) { dev_dbg(&bus_adap->usb_dev->dev, "only read %d of %d bytes\n", actual_len, recv_buf_len); return -1; } return actual_len; } static const struct as102_priv_ops_t as102_priv_ops = { .upload_fw_pkt = as102_send_ep1, .xfer_cmd = as102_usb_xfer_cmd, .as102_read_ep2 = as102_read_ep2, .start_stream = as102_usb_start_stream, .stop_stream = as102_usb_stop_stream, }; static int as102_submit_urb_stream(struct as102_dev_t *dev, struct urb *urb) { int err; usb_fill_bulk_urb(urb, dev->bus_adap.usb_dev, usb_rcvbulkpipe(dev->bus_adap.usb_dev, 0x2), urb->transfer_buffer, AS102_USB_BUF_SIZE, as102_urb_stream_irq, dev); err = usb_submit_urb(urb, GFP_ATOMIC); if (err) dev_dbg(&urb->dev->dev, "%s: usb_submit_urb failed\n", __func__); return err; } void as102_urb_stream_irq(struct urb *urb) { struct as102_dev_t *as102_dev = urb->context; if (urb->actual_length > 0) { dvb_dmx_swfilter(&as102_dev->dvb_dmx, urb->transfer_buffer, urb->actual_length); } else { if (urb->actual_length == 0) memset(urb->transfer_buffer, 0, AS102_USB_BUF_SIZE); } /* is not stopped, re-submit urb */ if (as102_dev->streaming) as102_submit_urb_stream(as102_dev, urb); } static void as102_free_usb_stream_buffer(struct as102_dev_t *dev) { int i; for (i = 0; i < MAX_STREAM_URB; i++) usb_free_urb(dev->stream_urb[i]); usb_free_coherent(dev->bus_adap.usb_dev, MAX_STREAM_URB * AS102_USB_BUF_SIZE, dev->stream, dev->dma_addr); } static int as102_alloc_usb_stream_buffer(struct as102_dev_t *dev) { int i; dev->stream = usb_alloc_coherent(dev->bus_adap.usb_dev, MAX_STREAM_URB * AS102_USB_BUF_SIZE, GFP_KERNEL, &dev->dma_addr); if (!dev->stream) { dev_dbg(&dev->bus_adap.usb_dev->dev, "%s: usb_buffer_alloc failed\n", __func__); return -ENOMEM; } memset(dev->stream, 0, MAX_STREAM_URB * AS102_USB_BUF_SIZE); /* init urb buffers */ for (i = 0; i < MAX_STREAM_URB; i++) { struct urb *urb; urb = usb_alloc_urb(0, GFP_KERNEL); if (urb == NULL) { as102_free_usb_stream_buffer(dev); return -ENOMEM; } urb->transfer_buffer = dev->stream + (i * AS102_USB_BUF_SIZE); urb->transfer_dma = dev->dma_addr + (i * AS102_USB_BUF_SIZE); urb->transfer_flags = URB_NO_TRANSFER_DMA_MAP; urb->transfer_buffer_length = AS102_USB_BUF_SIZE; dev->stream_urb[i] = urb; } return 0; } static void as102_usb_stop_stream(struct as102_dev_t *dev) { int i; for (i = 0; i < MAX_STREAM_URB; i++) usb_kill_urb(dev->stream_urb[i]); } static int as102_usb_start_stream(struct as102_dev_t *dev) { int i, ret = 0; for (i = 0; i < MAX_STREAM_URB; i++) { ret = as102_submit_urb_stream(dev, dev->stream_urb[i]); if (ret) { as102_usb_stop_stream(dev); return ret; } } return 0; } static void as102_usb_release(struct kref *kref) { struct as102_dev_t *as102_dev; as102_dev = container_of(kref, struct as102_dev_t, kref); usb_put_dev(as102_dev->bus_adap.usb_dev); kfree(as102_dev); } static void as102_usb_disconnect(struct usb_interface *intf) { struct as102_dev_t *as102_dev; /* extract as102_dev_t from usb_device private data */ as102_dev = usb_get_intfdata(intf); /* unregister dvb layer */ as102_dvb_unregister(as102_dev); /* free usb buffers */ as102_free_usb_stream_buffer(as102_dev); usb_set_intfdata(intf, NULL); /* usb unregister device */ usb_deregister_dev(intf, &as102_usb_class_driver); /* decrement usage counter */ kref_put(&as102_dev->kref, as102_usb_release); pr_info("%s: device has been disconnected\n", DRIVER_NAME); } static int as102_usb_probe(struct usb_interface *intf, const struct usb_device_id *id) { int ret; struct as102_dev_t *as102_dev; int i; /* This should never actually happen */ if (ARRAY_SIZE(as102_usb_id_table) != (sizeof(as102_device_names) / sizeof(const char *))) { pr_err("Device names table invalid size"); return -EINVAL; } as102_dev = kzalloc_obj(struct as102_dev_t); if (as102_dev == NULL) return -ENOMEM; /* Assign the user-friendly device name */ for (i = 0; i < ARRAY_SIZE(as102_usb_id_table); i++) { if (id == &as102_usb_id_table[i]) { as102_dev->name = as102_device_names[i]; as102_dev->elna_cfg = as102_elna_cfg[i]; } } if (as102_dev->name == NULL) as102_dev->name = "Unknown AS102 device"; /* set private callback functions */ as102_dev->bus_adap.ops = &as102_priv_ops; /* init cmd token for usb bus */ as102_dev->bus_adap.cmd = &as102_dev->bus_adap.token.usb.c; as102_dev->bus_adap.rsp = &as102_dev->bus_adap.token.usb.r; /* init kernel device reference */ kref_init(&as102_dev->kref); /* store as102 device to usb_device private data */ usb_set_intfdata(intf, (void *) as102_dev); /* store in as102 device the usb_device pointer */ as102_dev->bus_adap.usb_dev = usb_get_dev(interface_to_usbdev(intf)); /* we can register the device now, as it is ready */ ret = usb_register_dev(intf, &as102_usb_class_driver); if (ret < 0) { /* something prevented us from registering this driver */ dev_err(&intf->dev, "%s: usb_register_dev() failed (errno = %d)\n", __func__, ret); goto failed; } pr_info("%s: device has been detected\n", DRIVER_NAME); /* request buffer allocation for streaming */ ret = as102_alloc_usb_stream_buffer(as102_dev); if (ret != 0) goto failed_stream; /* register dvb layer */ ret = as102_dvb_register(as102_dev); if (ret != 0) goto failed_dvb; return ret; failed_dvb: as102_free_usb_stream_buffer(as102_dev); failed_stream: usb_deregister_dev(intf, &as102_usb_class_driver); failed: usb_put_dev(as102_dev->bus_adap.usb_dev); usb_set_intfdata(intf, NULL); kfree(as102_dev); return ret; } static int as102_open(struct inode *inode, struct file *file) { int ret = 0, minor = 0; struct usb_interface *intf = NULL; struct as102_dev_t *dev = NULL; /* read minor from inode */ minor = iminor(inode); /* fetch device from usb interface */ intf = usb_find_interface(&as102_usb_driver, minor); if (intf == NULL) { pr_err("%s: can't find device for minor %d\n", __func__, minor); ret = -ENODEV; goto exit; } /* get our device */ dev = usb_get_intfdata(intf); if (dev == NULL) { ret = -EFAULT; goto exit; } /* save our device object in the file's private structure */ file->private_data = dev; /* increment our usage count for the device */ kref_get(&dev->kref); exit: return ret; } static int as102_release(struct inode *inode, struct file *file) { struct as102_dev_t *dev = NULL; dev = file->private_data; if (dev != NULL) { /* decrement the count on our device */ kref_put(&dev->kref, as102_usb_release); } return 0; } MODULE_DEVICE_TABLE(usb, as102_usb_id_table); |
| 26 26 26 26 26 24 26 6 6 6 6 6 6 6 5 6 52 52 52 6 1 5 51 51 51 51 52 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 | // SPDX-License-Identifier: GPL-2.0-or-later /* * INET An implementation of the TCP/IP protocol suite for the LINUX * operating system. INET is implemented using the BSD Socket * interface as the means of communication with the user level. * * Support for INET6 connection oriented protocols. * * Authors: See the TCPv6 sources */ #include <linux/module.h> #include <linux/in6.h> #include <linux/ipv6.h> #include <linux/jhash.h> #include <linux/slab.h> #include <net/addrconf.h> #include <net/inet_connection_sock.h> #include <net/inet_ecn.h> #include <net/inet_hashtables.h> #include <net/ip6_route.h> #include <net/sock.h> #include <net/inet6_connection_sock.h> #include <net/sock_reuseport.h> struct dst_entry *inet6_csk_route_req(const struct sock *sk, struct dst_entry *dst, struct flowi6 *fl6, const struct request_sock *req, u8 proto) { const struct inet_request_sock *ireq = inet_rsk(req); const struct ipv6_pinfo *np = inet6_sk(sk); struct in6_addr *final_p, final; memset(fl6, 0, sizeof(*fl6)); fl6->flowi6_proto = proto; fl6->daddr = ireq->ir_v6_rmt_addr; rcu_read_lock(); final_p = fl6_update_dst(fl6, rcu_dereference(np->opt), &final); rcu_read_unlock(); fl6->saddr = ireq->ir_v6_loc_addr; fl6->flowi6_oif = ireq->ir_iif; fl6->flowi6_mark = ireq->ir_mark; fl6->fl6_dport = ireq->ir_rmt_port; fl6->fl6_sport = htons(ireq->ir_num); fl6->flowi6_uid = sk_uid(sk); security_req_classify_flow(req, flowi6_to_flowi_common(fl6)); if (!dst) { dst = ip6_dst_lookup_flow(sock_net(sk), sk, fl6, final_p); if (IS_ERR(dst)) return NULL; } return dst; } static struct dst_entry *inet6_csk_route_socket(struct sock *sk, struct flowi6 *fl6) { struct inet_sock *inet = inet_sk(sk); struct ipv6_pinfo *np = inet6_sk(sk); struct in6_addr *final_p; struct dst_entry *dst; memset(fl6, 0, sizeof(*fl6)); fl6->flowi6_proto = sk->sk_protocol; fl6->daddr = sk->sk_v6_daddr; fl6->saddr = np->saddr; fl6->flowlabel = np->flow_label; IP6_ECN_flow_xmit(sk, fl6->flowlabel); fl6->flowi6_oif = sk->sk_bound_dev_if; fl6->flowi6_mark = sk->sk_mark; fl6->fl6_sport = inet->inet_sport; fl6->fl6_dport = inet->inet_dport; fl6->flowi6_uid = sk_uid(sk); security_sk_classify_flow(sk, flowi6_to_flowi_common(fl6)); rcu_read_lock(); final_p = fl6_update_dst(fl6, rcu_dereference(np->opt), &np->final); rcu_read_unlock(); dst = ip6_dst_lookup_flow(sock_net(sk), sk, fl6, final_p); if (!IS_ERR(dst)) ip6_dst_store(sk, dst, false, false); return dst; } int inet6_csk_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl_unused) { struct flowi6 *fl6 = &inet_sk(sk)->cork.fl.u.ip6; struct ipv6_pinfo *np = inet6_sk(sk); struct dst_entry *dst; int res; dst = __sk_dst_check(sk, np->dst_cookie); if (unlikely(!dst)) { dst = inet6_csk_route_socket(sk, fl6); if (IS_ERR(dst)) { WRITE_ONCE(sk->sk_err_soft, -PTR_ERR(dst)); sk->sk_route_caps = 0; kfree_skb(skb); return PTR_ERR(dst); } /* Restore final destination back after routing done */ fl6->daddr = sk->sk_v6_daddr; } rcu_read_lock(); skb_dst_set_noref(skb, dst); res = ip6_xmit(sk, skb, fl6, sk->sk_mark, rcu_dereference(np->opt), np->tclass, READ_ONCE(sk->sk_priority)); rcu_read_unlock(); return res; } EXPORT_SYMBOL_GPL(inet6_csk_xmit); struct dst_entry *inet6_csk_update_pmtu(struct sock *sk, u32 mtu) { struct flowi6 *fl6 = &inet_sk(sk)->cork.fl.u.ip6; struct dst_entry *dst; dst = inet6_csk_route_socket(sk, fl6); if (IS_ERR(dst)) return NULL; dst->ops->update_pmtu(dst, sk, NULL, mtu, true); dst = inet6_csk_route_socket(sk, fl6); return IS_ERR(dst) ? NULL : dst; } |
| 1 15 3 15 15 18 14 13 15 15 15 15 13 15 15 10 10 9 10 84 109 95 2 1 3 2 3 31 31 2 30 30 30 25 31 31 6 31 4 30 30 31 31 31 31 31 11 8 11 31 26 9 31 31 6 45 6 44 42 30 29 29 1 2 2 2 5 5 5 2 5 5 5 5 22 22 22 20 1 22 19 15 3 2 2 16 16 18 1 19 4 5 5 4 34 35 2 33 9 9 3 9 9 2 1 9 18 18 17 18 3 17 18 31 21 20 2 2 11 3 1 1 1 1 2 1 40 40 39 40 39 33 38 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 | // SPDX-License-Identifier: GPL-2.0-or-later /* * ALSA sequencer Timing queue handling * Copyright (c) 1998-1999 by Frank van de Pol <fvdpol@coil.demon.nl> * * MAJOR CHANGES * Nov. 13, 1999 Takashi Iwai <iwai@ww.uni-erlangen.de> * - Queues are allocated dynamically via ioctl. * - When owner client is deleted, all owned queues are deleted, too. * - Owner of unlocked queue is kept unmodified even if it is * manipulated by other clients. * - Owner field in SET_QUEUE_OWNER ioctl must be identical with the * caller client. i.e. Changing owner to a third client is not * allowed. * * Aug. 30, 2000 Takashi Iwai * - Queues are managed in static array again, but with better way. * The API itself is identical. * - The queue is locked when struct snd_seq_queue pointer is returned via * queueptr(). This pointer *MUST* be released afterward by * queuefree(ptr). * - Addition of experimental sync support. */ #include <linux/init.h> #include <linux/slab.h> #include <sound/core.h> #include "seq_memory.h" #include "seq_queue.h" #include "seq_clientmgr.h" #include "seq_fifo.h" #include "seq_timer.h" #include "seq_info.h" /* list of allocated queues */ static struct snd_seq_queue *queue_list[SNDRV_SEQ_MAX_QUEUES]; static DEFINE_SPINLOCK(queue_list_lock); /* number of queues allocated */ static int num_queues; int snd_seq_queue_get_cur_queues(void) { return num_queues; } /*----------------------------------------------------------------*/ /* assign queue id and insert to list */ static int queue_list_add(struct snd_seq_queue *q) { int i; guard(spinlock_irqsave)(&queue_list_lock); for (i = 0; i < SNDRV_SEQ_MAX_QUEUES; i++) { if (! queue_list[i]) { queue_list[i] = q; q->queue = i; num_queues++; return i; } } return -1; } static struct snd_seq_queue *queue_list_remove(int id, int client) { struct snd_seq_queue *q; guard(spinlock_irqsave)(&queue_list_lock); q = queue_list[id]; if (q) { guard(spinlock)(&q->owner_lock); if (q->owner == client) { /* found */ q->klocked = 1; queue_list[id] = NULL; num_queues--; return q; } } return NULL; } /*----------------------------------------------------------------*/ /* create new queue (constructor) */ static struct snd_seq_queue *queue_new(int owner, int locked) { struct snd_seq_queue *q; q = kzalloc_obj(*q); if (!q) return NULL; spin_lock_init(&q->owner_lock); spin_lock_init(&q->check_lock); mutex_init(&q->timer_mutex); snd_use_lock_init(&q->use_lock); q->queue = -1; q->tickq = snd_seq_prioq_new(); q->timeq = snd_seq_prioq_new(); q->timer = snd_seq_timer_new(); if (q->tickq == NULL || q->timeq == NULL || q->timer == NULL) { snd_seq_prioq_delete(&q->tickq); snd_seq_prioq_delete(&q->timeq); snd_seq_timer_delete(&q->timer); kfree(q); return NULL; } q->owner = owner; q->locked = locked; q->klocked = 0; return q; } /* delete queue (destructor) */ static void queue_delete(struct snd_seq_queue *q) { /* stop and release the timer */ mutex_lock(&q->timer_mutex); snd_seq_timer_stop(q->timer); snd_seq_timer_close(q); mutex_unlock(&q->timer_mutex); /* wait until access free */ snd_use_lock_sync(&q->use_lock); /* release resources... */ snd_seq_prioq_delete(&q->tickq); snd_seq_prioq_delete(&q->timeq); snd_seq_timer_delete(&q->timer); kfree(q); } /*----------------------------------------------------------------*/ /* delete all existing queues */ void snd_seq_queues_delete(void) { int i; /* clear list */ for (i = 0; i < SNDRV_SEQ_MAX_QUEUES; i++) { if (queue_list[i]) queue_delete(queue_list[i]); } } static void queue_use(struct snd_seq_queue *queue, int client, int use); /* allocate a new queue - * return pointer to new queue or ERR_PTR(-errno) for error * The new queue's use_lock is set to 1. It is the caller's responsibility to * call snd_use_lock_free(&q->use_lock). */ struct snd_seq_queue *snd_seq_queue_alloc(int client, int locked, unsigned int info_flags) { struct snd_seq_queue *q; q = queue_new(client, locked); if (q == NULL) return ERR_PTR(-ENOMEM); q->info_flags = info_flags; queue_use(q, client, 1); snd_use_lock_use(&q->use_lock); if (queue_list_add(q) < 0) { snd_use_lock_free(&q->use_lock); queue_delete(q); return ERR_PTR(-ENOMEM); } return q; } /* delete a queue - queue must be owned by the client */ int snd_seq_queue_delete(int client, int queueid) { struct snd_seq_queue *q; if (queueid < 0 || queueid >= SNDRV_SEQ_MAX_QUEUES) return -EINVAL; q = queue_list_remove(queueid, client); if (q == NULL) return -EINVAL; queue_delete(q); return 0; } /* return pointer to queue structure for specified id */ struct snd_seq_queue *queueptr(int queueid) { struct snd_seq_queue *q; if (queueid < 0 || queueid >= SNDRV_SEQ_MAX_QUEUES) return NULL; guard(spinlock_irqsave)(&queue_list_lock); q = queue_list[queueid]; if (q) snd_use_lock_use(&q->use_lock); return q; } /* return the (first) queue matching with the specified name */ struct snd_seq_queue *snd_seq_queue_find_name(char *name) { int i; for (i = 0; i < SNDRV_SEQ_MAX_QUEUES; i++) { struct snd_seq_queue *q __free(snd_seq_queue) = queueptr(i); if (q) { if (strncmp(q->name, name, sizeof(q->name)) == 0) return no_free_ptr(q); } } return NULL; } /* -------------------------------------------------------- */ #define MAX_CELL_PROCESSES_IN_QUEUE 1000 void snd_seq_check_queue(struct snd_seq_queue *q, int atomic, int hop) { struct snd_seq_event_cell *cell; snd_seq_tick_time_t cur_tick; snd_seq_real_time_t cur_time; int processed = 0; if (q == NULL) return; /* make this function non-reentrant */ scoped_guard(spinlock_irqsave, &q->check_lock) { if (q->check_blocked) { q->check_again = 1; return; /* other thread is already checking queues */ } q->check_blocked = 1; } __again: /* Process tick queue... */ cur_tick = snd_seq_timer_get_cur_tick(q->timer); for (;;) { cell = snd_seq_prioq_cell_out(q->tickq, &cur_tick); if (!cell) break; snd_seq_dispatch_event(cell, atomic, hop); if (++processed >= MAX_CELL_PROCESSES_IN_QUEUE) goto out; /* the rest processed at the next batch */ } /* Process time queue... */ cur_time = snd_seq_timer_get_cur_time(q->timer, false); for (;;) { cell = snd_seq_prioq_cell_out(q->timeq, &cur_time); if (!cell) break; snd_seq_dispatch_event(cell, atomic, hop); if (++processed >= MAX_CELL_PROCESSES_IN_QUEUE) goto out; /* the rest processed at the next batch */ } out: /* free lock */ scoped_guard(spinlock_irqsave, &q->check_lock) { if (q->check_again) { q->check_again = 0; if (processed < MAX_CELL_PROCESSES_IN_QUEUE) goto __again; } q->check_blocked = 0; } } /* enqueue a event to singe queue */ int snd_seq_enqueue_event(struct snd_seq_event_cell *cell, int atomic, int hop) { int dest, err; if (snd_BUG_ON(!cell)) return -EINVAL; dest = cell->event.queue; /* destination queue */ struct snd_seq_queue *q __free(snd_seq_queue) = queueptr(dest); if (q == NULL) return -EINVAL; /* handle relative time stamps, convert them into absolute */ if ((cell->event.flags & SNDRV_SEQ_TIME_MODE_MASK) == SNDRV_SEQ_TIME_MODE_REL) { switch (cell->event.flags & SNDRV_SEQ_TIME_STAMP_MASK) { case SNDRV_SEQ_TIME_STAMP_TICK: cell->event.time.tick += q->timer->tick.cur_tick; break; case SNDRV_SEQ_TIME_STAMP_REAL: snd_seq_inc_real_time(&cell->event.time.time, &q->timer->cur_time); break; } cell->event.flags &= ~SNDRV_SEQ_TIME_MODE_MASK; cell->event.flags |= SNDRV_SEQ_TIME_MODE_ABS; } /* enqueue event in the real-time or midi queue */ switch (cell->event.flags & SNDRV_SEQ_TIME_STAMP_MASK) { case SNDRV_SEQ_TIME_STAMP_TICK: err = snd_seq_prioq_cell_in(q->tickq, cell); break; case SNDRV_SEQ_TIME_STAMP_REAL: default: err = snd_seq_prioq_cell_in(q->timeq, cell); break; } if (err < 0) return err; /* trigger dispatching */ snd_seq_check_queue(q, atomic, hop); return 0; } /*----------------------------------------------------------------*/ static inline int check_access(struct snd_seq_queue *q, int client) { return (q->owner == client) || (!q->locked && !q->klocked); } /* check if the client has permission to modify queue parameters. * if it does, lock the queue */ static int queue_access_lock(struct snd_seq_queue *q, int client) { int access_ok; guard(spinlock_irqsave)(&q->owner_lock); access_ok = check_access(q, client); if (access_ok) q->klocked = 1; return access_ok; } /* unlock the queue */ static inline void queue_access_unlock(struct snd_seq_queue *q) { guard(spinlock_irqsave)(&q->owner_lock); q->klocked = 0; } /* exported - only checking permission */ int snd_seq_queue_check_access(int queueid, int client) { struct snd_seq_queue *q __free(snd_seq_queue) = queueptr(queueid); if (! q) return 0; guard(spinlock_irqsave)(&q->owner_lock); return check_access(q, client); } /*----------------------------------------------------------------*/ /* * change queue's owner and permission */ int snd_seq_queue_set_owner(int queueid, int client, int locked) { struct snd_seq_queue *q __free(snd_seq_queue) = queueptr(queueid); if (q == NULL) return -EINVAL; if (!queue_access_lock(q, client)) return -EPERM; scoped_guard(spinlock_irqsave, &q->owner_lock) { q->locked = locked ? 1 : 0; q->owner = client; } queue_access_unlock(q); return 0; } /*----------------------------------------------------------------*/ /* open timer - * q->use mutex should be down before calling this function to avoid * confliction with snd_seq_queue_use() */ int snd_seq_queue_timer_open(int queueid) { int result = 0; struct snd_seq_timer *tmr; struct snd_seq_queue *queue __free(snd_seq_queue) = queueptr(queueid); if (queue == NULL) return -EINVAL; tmr = queue->timer; result = snd_seq_timer_open(queue); if (result < 0) { snd_seq_timer_defaults(tmr); result = snd_seq_timer_open(queue); } return result; } /* close timer - * q->use mutex should be down before calling this function */ int snd_seq_queue_timer_close(int queueid) { int result = 0; struct snd_seq_queue *queue __free(snd_seq_queue) = queueptr(queueid); if (queue == NULL) return -EINVAL; snd_seq_timer_close(queue); return result; } /* change queue tempo and ppq */ int snd_seq_queue_timer_set_tempo(int queueid, int client, struct snd_seq_queue_tempo *info) { struct snd_seq_queue *q __free(snd_seq_queue) = queueptr(queueid); int result; if (q == NULL) return -EINVAL; if (!queue_access_lock(q, client)) return -EPERM; result = snd_seq_timer_set_tempo_ppq(q->timer, info->tempo, info->ppq, info->tempo_base); if (result >= 0 && info->skew_base > 0) result = snd_seq_timer_set_skew(q->timer, info->skew_value, info->skew_base); queue_access_unlock(q); return result; } /* use or unuse this queue */ static void queue_use(struct snd_seq_queue *queue, int client, int use) { if (use) { if (!test_and_set_bit(client, queue->clients_bitmap)) queue->clients++; } else { if (test_and_clear_bit(client, queue->clients_bitmap)) queue->clients--; } if (queue->clients) { if (use && queue->clients == 1) snd_seq_timer_defaults(queue->timer); snd_seq_timer_open(queue); } else { snd_seq_timer_close(queue); } } /* use or unuse this queue - * if it is the first client, starts the timer. * if it is not longer used by any clients, stop the timer. */ int snd_seq_queue_use(int queueid, int client, int use) { struct snd_seq_queue *queue __free(snd_seq_queue) = queueptr(queueid); if (queue == NULL) return -EINVAL; guard(mutex)(&queue->timer_mutex); queue_use(queue, client, use); return 0; } /* * check if queue is used by the client * return negative value if the queue is invalid. * return 0 if not used, 1 if used. */ int snd_seq_queue_is_used(int queueid, int client) { struct snd_seq_queue *q __free(snd_seq_queue) = queueptr(queueid); if (q == NULL) return -EINVAL; /* invalid queue */ return test_bit(client, q->clients_bitmap) ? 1 : 0; } /*----------------------------------------------------------------*/ /* final stage notification - * remove cells for no longer exist client (for non-owned queue) * or delete this queue (for owned queue) */ void snd_seq_queue_client_leave(int client) { int i; /* delete own queues from queue list */ for (i = 0; i < SNDRV_SEQ_MAX_QUEUES; i++) { struct snd_seq_queue *q = queue_list_remove(i, client); if (q) queue_delete(q); } /* remove cells from existing queues - * they are not owned by this client */ for (i = 0; i < SNDRV_SEQ_MAX_QUEUES; i++) { struct snd_seq_queue *q __free(snd_seq_queue) = queueptr(i); if (!q) continue; if (test_bit(client, q->clients_bitmap)) { snd_seq_prioq_leave(q->tickq, client, 0); snd_seq_prioq_leave(q->timeq, client, 0); snd_seq_queue_use(q->queue, client, 0); } } } /*----------------------------------------------------------------*/ /* remove cells based on flush criteria */ void snd_seq_queue_remove_cells(int client, struct snd_seq_remove_events *info) { int i; for (i = 0; i < SNDRV_SEQ_MAX_QUEUES; i++) { struct snd_seq_queue *q __free(snd_seq_queue) = queueptr(i); if (!q) continue; if (test_bit(client, q->clients_bitmap) && (! (info->remove_mode & SNDRV_SEQ_REMOVE_DEST) || q->queue == info->queue)) { snd_seq_prioq_remove_events(q->tickq, client, info); snd_seq_prioq_remove_events(q->timeq, client, info); } } } /*----------------------------------------------------------------*/ /* * send events to all subscribed ports */ static void queue_broadcast_event(struct snd_seq_queue *q, struct snd_seq_event *ev, int atomic, int hop) { struct snd_seq_event sev; sev = *ev; sev.flags = SNDRV_SEQ_TIME_STAMP_TICK|SNDRV_SEQ_TIME_MODE_ABS; sev.time.tick = q->timer->tick.cur_tick; sev.queue = q->queue; sev.data.queue.queue = q->queue; /* broadcast events from Timer port */ sev.source.client = SNDRV_SEQ_CLIENT_SYSTEM; sev.source.port = SNDRV_SEQ_PORT_SYSTEM_TIMER; sev.dest.client = SNDRV_SEQ_ADDRESS_SUBSCRIBERS; snd_seq_kernel_client_dispatch(SNDRV_SEQ_CLIENT_SYSTEM, &sev, atomic, hop); } /* * process a received queue-control event. * this function is exported for seq_sync.c. */ static void snd_seq_queue_process_event(struct snd_seq_queue *q, struct snd_seq_event *ev, int atomic, int hop) { switch (ev->type) { case SNDRV_SEQ_EVENT_START: snd_seq_prioq_leave(q->tickq, ev->source.client, 1); snd_seq_prioq_leave(q->timeq, ev->source.client, 1); if (! snd_seq_timer_start(q->timer)) queue_broadcast_event(q, ev, atomic, hop); break; case SNDRV_SEQ_EVENT_CONTINUE: if (! snd_seq_timer_continue(q->timer)) queue_broadcast_event(q, ev, atomic, hop); break; case SNDRV_SEQ_EVENT_STOP: snd_seq_timer_stop(q->timer); queue_broadcast_event(q, ev, atomic, hop); break; case SNDRV_SEQ_EVENT_TEMPO: snd_seq_timer_set_tempo(q->timer, ev->data.queue.param.value); queue_broadcast_event(q, ev, atomic, hop); break; case SNDRV_SEQ_EVENT_SETPOS_TICK: if (snd_seq_timer_set_position_tick(q->timer, ev->data.queue.param.time.tick) == 0) { queue_broadcast_event(q, ev, atomic, hop); } break; case SNDRV_SEQ_EVENT_SETPOS_TIME: if (snd_seq_timer_set_position_time(q->timer, ev->data.queue.param.time.time) == 0) { queue_broadcast_event(q, ev, atomic, hop); } break; case SNDRV_SEQ_EVENT_QUEUE_SKEW: if (snd_seq_timer_set_skew(q->timer, ev->data.queue.param.skew.value, ev->data.queue.param.skew.base) == 0) { queue_broadcast_event(q, ev, atomic, hop); } break; } } /* * Queue control via timer control port: * this function is exported as a callback of timer port. */ int snd_seq_control_queue(struct snd_seq_event *ev, int atomic, int hop) { if (snd_BUG_ON(!ev)) return -EINVAL; struct snd_seq_queue *q __free(snd_seq_queue) = queueptr(ev->data.queue.queue); if (q == NULL) return -EINVAL; if (!queue_access_lock(q, ev->source.client)) return -EPERM; snd_seq_queue_process_event(q, ev, atomic, hop); queue_access_unlock(q); return 0; } /*----------------------------------------------------------------*/ #ifdef CONFIG_SND_PROC_FS /* exported to seq_info.c */ void snd_seq_info_queues_read(struct snd_info_entry *entry, struct snd_info_buffer *buffer) { int i, bpm; struct snd_seq_timer *tmr; bool locked; int owner; for (i = 0; i < SNDRV_SEQ_MAX_QUEUES; i++) { struct snd_seq_queue *q __free(snd_seq_queue) = queueptr(i); if (!q) continue; tmr = q->timer; if (tmr->tempo) bpm = (60000 * tmr->tempo_base) / tmr->tempo; else bpm = 0; scoped_guard(spinlock_irq, &q->owner_lock) { locked = q->locked; owner = q->owner; } snd_iprintf(buffer, "queue %d: [%s]\n", q->queue, q->name); snd_iprintf(buffer, "owned by client : %d\n", owner); snd_iprintf(buffer, "lock status : %s\n", locked ? "Locked" : "Free"); snd_iprintf(buffer, "queued time events : %d\n", snd_seq_prioq_avail(q->timeq)); snd_iprintf(buffer, "queued tick events : %d\n", snd_seq_prioq_avail(q->tickq)); snd_iprintf(buffer, "timer state : %s\n", tmr->running ? "Running" : "Stopped"); snd_iprintf(buffer, "timer PPQ : %d\n", tmr->ppq); snd_iprintf(buffer, "current tempo : %d\n", tmr->tempo); snd_iprintf(buffer, "tempo base : %d ns\n", tmr->tempo_base); snd_iprintf(buffer, "current BPM : %d\n", bpm); snd_iprintf(buffer, "current time : %d.%09d s\n", tmr->cur_time.tv_sec, tmr->cur_time.tv_nsec); snd_iprintf(buffer, "current tick : %d\n", tmr->tick.cur_tick); snd_iprintf(buffer, "\n"); } } #endif /* CONFIG_SND_PROC_FS */ |
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 | /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _ASM_X86_CMPXCHG_64_H #define _ASM_X86_CMPXCHG_64_H #define arch_cmpxchg64(ptr, o, n) \ ({ \ BUILD_BUG_ON(sizeof(*(ptr)) != 8); \ arch_cmpxchg((ptr), (o), (n)); \ }) #define arch_cmpxchg64_local(ptr, o, n) \ ({ \ BUILD_BUG_ON(sizeof(*(ptr)) != 8); \ arch_cmpxchg_local((ptr), (o), (n)); \ }) #define arch_try_cmpxchg64(ptr, po, n) \ ({ \ BUILD_BUG_ON(sizeof(*(ptr)) != 8); \ arch_try_cmpxchg((ptr), (po), (n)); \ }) #define arch_try_cmpxchg64_local(ptr, po, n) \ ({ \ BUILD_BUG_ON(sizeof(*(ptr)) != 8); \ arch_try_cmpxchg_local((ptr), (po), (n)); \ }) union __u128_halves { u128 full; struct { u64 low, high; }; }; #define __arch_cmpxchg128(_ptr, _old, _new, _lock) \ ({ \ union __u128_halves o = { .full = (_old), }, \ n = { .full = (_new), }; \ \ asm_inline volatile(_lock "cmpxchg16b %[ptr]" \ : [ptr] "+m" (*(_ptr)), \ "+a" (o.low), "+d" (o.high) \ : "b" (n.low), "c" (n.high) \ : "memory"); \ \ o.full; \ }) static __always_inline u128 arch_cmpxchg128(volatile u128 *ptr, u128 old, u128 new) { return __arch_cmpxchg128(ptr, old, new, LOCK_PREFIX); } #define arch_cmpxchg128 arch_cmpxchg128 static __always_inline u128 arch_cmpxchg128_local(volatile u128 *ptr, u128 old, u128 new) { return __arch_cmpxchg128(ptr, old, new,); } #define arch_cmpxchg128_local arch_cmpxchg128_local #define __arch_try_cmpxchg128(_ptr, _oldp, _new, _lock) \ ({ \ union __u128_halves o = { .full = *(_oldp), }, \ n = { .full = (_new), }; \ bool ret; \ \ asm_inline volatile(_lock "cmpxchg16b %[ptr]" \ : "=@ccz" (ret), \ [ptr] "+m" (*(_ptr)), \ "+a" (o.low), "+d" (o.high) \ : "b" (n.low), "c" (n.high) \ : "memory"); \ \ if (unlikely(!ret)) \ *(_oldp) = o.full; \ \ likely(ret); \ }) static __always_inline bool arch_try_cmpxchg128(volatile u128 *ptr, u128 *oldp, u128 new) { return __arch_try_cmpxchg128(ptr, oldp, new, LOCK_PREFIX); } #define arch_try_cmpxchg128 arch_try_cmpxchg128 static __always_inline bool arch_try_cmpxchg128_local(volatile u128 *ptr, u128 *oldp, u128 new) { return __arch_try_cmpxchg128(ptr, oldp, new,); } #define arch_try_cmpxchg128_local arch_try_cmpxchg128_local #define system_has_cmpxchg128() boot_cpu_has(X86_FEATURE_CX16) #endif /* _ASM_X86_CMPXCHG_64_H */ |
| 4 4 37 139 115 26 4 25 14 6 8 6 140 111 21 70 1 70 53 98 5 10 52 97 69 57 53 54 54 57 98 98 98 96 98 58 4 53 54 53 53 97 98 3 3 3 3 3 3 3 3 70 70 69 69 67 70 70 99 67 6 67 66 33 61 20 21 61 61 47 61 109 109 109 94 109 1 109 109 72 72 72 72 72 72 72 63 72 71 71 71 55 71 3 71 3 71 71 71 72 71 72 1 5 5 3 3 3 3 3 3 3 3 3 3 5 3 5 3 7 2 2 2 2 2 2 2 2 2 2 2 2 2 2 7 2 5 7 1 1 1 1 1 1 1 7 7 1 1 1 1 1 1 157 156 25 159 147 147 156 8 149 156 8 8 8 8 81 23 23 188 94 94 93 79 79 34 1 34 2 34 34 98 172 186 1 1 1 1 208 190 208 203 208 10 8 6 9 7 9 10 34 34 25 25 25 25 2 2 23 23 4 4 34 138 21 140 21 140 17 5 79 77 17 75 71 69 68 67 69 69 68 69 1 163 163 163 163 95 91 85 92 211 206 1 203 1 203 1 6 206 210 1 1 1 1 208 199 202 163 202 201 164 202 6 108 94 108 206 116 84 120 209 209 1 208 204 4 202 205 202 117 106 10 10 10 10 2 10 98 89 117 210 201 4 3 4 3 1 1 74 75 74 3 71 75 73 75 73 74 74 68 68 68 67 8 67 68 67 67 68 1 1 1 1 3 2 1 2 1 1 1 1 2 1 1 3 3 3 1 2 2 2 2 1 2 1 1 2 3 2 1 1 1 2 1 1 1 2 2 1 9 9 4 9 1 6 4 1 1 3 1 3 5 66 65 1 1 13 66 65 63 185 1 185 186 79 79 197 193 198 196 2 194 184 197 5 4 6 6 6 6 5 6 7 5 8 6 1 5 5 1 5 1 5 5 1 5 5 5 5 5 3 2 2 3 3 5 8 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 | // SPDX-License-Identifier: GPL-2.0-or-later /* * net/sched/act_api.c Packet action API. * * Author: Jamal Hadi Salim */ #include <linux/types.h> #include <linux/kernel.h> #include <linux/string.h> #include <linux/errno.h> #include <linux/slab.h> #include <linux/skbuff.h> #include <linux/init.h> #include <linux/kmod.h> #include <linux/err.h> #include <linux/module.h> #include <net/net_namespace.h> #include <net/sock.h> #include <net/sch_generic.h> #include <net/pkt_cls.h> #include <net/tc_act/tc_pedit.h> #include <net/act_api.h> #include <net/netlink.h> #include <net/flow_offload.h> #include <net/tc_wrapper.h> #ifdef CONFIG_INET DEFINE_STATIC_KEY_FALSE(tcf_frag_xmit_count); EXPORT_SYMBOL_GPL(tcf_frag_xmit_count); #endif int tcf_dev_queue_xmit(struct sk_buff *skb, int (*xmit)(struct sk_buff *skb)) { #ifdef CONFIG_INET if (static_branch_unlikely(&tcf_frag_xmit_count)) return sch_frag_xmit_hook(skb, xmit); #endif return xmit(skb); } EXPORT_SYMBOL_GPL(tcf_dev_queue_xmit); static void tcf_action_goto_chain_exec(const struct tc_action *a, struct tcf_result *res) { const struct tcf_chain *chain = rcu_dereference_bh(a->goto_chain); res->goto_tp = rcu_dereference_bh(chain->filter_chain); } static void tcf_free_cookie_rcu(struct rcu_head *p) { struct tc_cookie *cookie = container_of(p, struct tc_cookie, rcu); kfree(cookie->data); kfree(cookie); } static void tcf_set_action_cookie(struct tc_cookie __rcu **old_cookie, struct tc_cookie *new_cookie) { struct tc_cookie *old; old = unrcu_pointer(xchg(old_cookie, RCU_INITIALIZER(new_cookie))); if (old) call_rcu(&old->rcu, tcf_free_cookie_rcu); } int tcf_action_check_ctrlact(int action, struct tcf_proto *tp, struct tcf_chain **newchain, struct netlink_ext_ack *extack) { int opcode = TC_ACT_EXT_OPCODE(action), ret = -EINVAL; u32 chain_index; if (!opcode) ret = action > TC_ACT_VALUE_MAX ? -EINVAL : 0; else if (opcode <= TC_ACT_EXT_OPCODE_MAX || action == TC_ACT_UNSPEC) ret = 0; if (ret) { NL_SET_ERR_MSG(extack, "invalid control action"); goto end; } if (TC_ACT_EXT_CMP(action, TC_ACT_GOTO_CHAIN)) { chain_index = action & TC_ACT_EXT_VAL_MASK; if (!tp || !newchain) { ret = -EINVAL; NL_SET_ERR_MSG(extack, "can't goto NULL proto/chain"); goto end; } *newchain = tcf_chain_get_by_act(tp->chain->block, chain_index); if (!*newchain) { ret = -ENOMEM; NL_SET_ERR_MSG(extack, "can't allocate goto_chain"); } } end: return ret; } EXPORT_SYMBOL(tcf_action_check_ctrlact); struct tcf_chain *tcf_action_set_ctrlact(struct tc_action *a, int action, struct tcf_chain *goto_chain) { a->tcfa_action = action; goto_chain = rcu_replace_pointer(a->goto_chain, goto_chain, 1); return goto_chain; } EXPORT_SYMBOL(tcf_action_set_ctrlact); /* XXX: For standalone actions, we don't need a RCU grace period either, because * actions are always connected to filters and filters are already destroyed in * RCU callbacks, so after a RCU grace period actions are already disconnected * from filters. Readers later can not find us. */ static void free_tcf(struct tc_action *p) { struct tcf_chain *chain = rcu_dereference_protected(p->goto_chain, 1); free_percpu(p->cpu_bstats); free_percpu(p->cpu_bstats_hw); free_percpu(p->cpu_qstats); tcf_set_action_cookie(&p->user_cookie, NULL); if (chain) tcf_chain_put_by_act(chain); kfree(p); } static void offload_action_hw_count_set(struct tc_action *act, u32 hw_count) { act->in_hw_count = hw_count; } static void offload_action_hw_count_inc(struct tc_action *act, u32 hw_count) { act->in_hw_count += hw_count; } static void offload_action_hw_count_dec(struct tc_action *act, u32 hw_count) { act->in_hw_count = act->in_hw_count > hw_count ? act->in_hw_count - hw_count : 0; } static unsigned int tcf_offload_act_num_actions_single(struct tc_action *act) { if (is_tcf_pedit(act)) return tcf_pedit_nkeys(act); else return 1; } static bool tc_act_skip_hw(u32 flags) { return (flags & TCA_ACT_FLAGS_SKIP_HW) ? true : false; } static bool tc_act_skip_sw(u32 flags) { return (flags & TCA_ACT_FLAGS_SKIP_SW) ? true : false; } /* SKIP_HW and SKIP_SW are mutually exclusive flags. */ static bool tc_act_flags_valid(u32 flags) { flags &= TCA_ACT_FLAGS_SKIP_HW | TCA_ACT_FLAGS_SKIP_SW; return flags ^ (TCA_ACT_FLAGS_SKIP_HW | TCA_ACT_FLAGS_SKIP_SW); } static int offload_action_init(struct flow_offload_action *fl_action, struct tc_action *act, enum offload_act_command cmd, struct netlink_ext_ack *extack) { int err; fl_action->extack = extack; fl_action->command = cmd; fl_action->index = act->tcfa_index; fl_action->cookie = (unsigned long)act; if (act->ops->offload_act_setup) { spin_lock_bh(&act->tcfa_lock); err = act->ops->offload_act_setup(act, fl_action, NULL, false, extack); spin_unlock_bh(&act->tcfa_lock); return err; } return -EOPNOTSUPP; } static int tcf_action_offload_cmd_ex(struct flow_offload_action *fl_act, u32 *hw_count) { int err; err = flow_indr_dev_setup_offload(NULL, NULL, TC_SETUP_ACT, fl_act, NULL, NULL); if (err < 0) return err; if (hw_count) *hw_count = err; return 0; } static int tcf_action_offload_cmd_cb_ex(struct flow_offload_action *fl_act, u32 *hw_count, flow_indr_block_bind_cb_t *cb, void *cb_priv) { int err; err = cb(NULL, NULL, cb_priv, TC_SETUP_ACT, NULL, fl_act, NULL); if (err < 0) return err; if (hw_count) *hw_count = 1; return 0; } static int tcf_action_offload_cmd(struct flow_offload_action *fl_act, u32 *hw_count, flow_indr_block_bind_cb_t *cb, void *cb_priv) { return cb ? tcf_action_offload_cmd_cb_ex(fl_act, hw_count, cb, cb_priv) : tcf_action_offload_cmd_ex(fl_act, hw_count); } static int tcf_action_offload_add_ex(struct tc_action *action, struct netlink_ext_ack *extack, flow_indr_block_bind_cb_t *cb, void *cb_priv) { bool skip_sw = tc_act_skip_sw(action->tcfa_flags); struct tc_action *actions[TCA_ACT_MAX_PRIO] = { [0] = action, }; struct flow_offload_action *fl_action; u32 in_hw_count = 0; int num, err = 0; if (tc_act_skip_hw(action->tcfa_flags)) return 0; num = tcf_offload_act_num_actions_single(action); fl_action = offload_action_alloc(num); if (!fl_action) return -ENOMEM; err = offload_action_init(fl_action, action, FLOW_ACT_REPLACE, extack); if (err) goto fl_err; err = tc_setup_action(&fl_action->action, actions, 0, extack); if (err) { NL_SET_ERR_MSG_MOD(extack, "Failed to setup tc actions for offload"); goto fl_err; } err = tcf_action_offload_cmd(fl_action, &in_hw_count, cb, cb_priv); if (!err) cb ? offload_action_hw_count_inc(action, in_hw_count) : offload_action_hw_count_set(action, in_hw_count); if (skip_sw && !tc_act_in_hw(action)) err = -EINVAL; tc_cleanup_offload_action(&fl_action->action); fl_err: kfree(fl_action); return err; } /* offload the tc action after it is inserted */ static int tcf_action_offload_add(struct tc_action *action, struct netlink_ext_ack *extack) { return tcf_action_offload_add_ex(action, extack, NULL, NULL); } int tcf_action_update_hw_stats(struct tc_action *action) { struct flow_offload_action fl_act = {}; int err; err = offload_action_init(&fl_act, action, FLOW_ACT_STATS, NULL); if (err) return err; err = tcf_action_offload_cmd(&fl_act, NULL, NULL, NULL); if (!err) { preempt_disable(); tcf_action_stats_update(action, fl_act.stats.bytes, fl_act.stats.pkts, fl_act.stats.drops, fl_act.stats.lastused, true); preempt_enable(); action->used_hw_stats = fl_act.stats.used_hw_stats; action->used_hw_stats_valid = true; } else { return -EOPNOTSUPP; } return 0; } EXPORT_SYMBOL(tcf_action_update_hw_stats); static int tcf_action_offload_del_ex(struct tc_action *action, flow_indr_block_bind_cb_t *cb, void *cb_priv) { struct flow_offload_action fl_act = {}; u32 in_hw_count = 0; int err = 0; if (!tc_act_in_hw(action)) return 0; err = offload_action_init(&fl_act, action, FLOW_ACT_DESTROY, NULL); if (err) return err; err = tcf_action_offload_cmd(&fl_act, &in_hw_count, cb, cb_priv); if (err < 0) return err; if (!cb && action->in_hw_count != in_hw_count) return -EINVAL; /* do not need to update hw state when deleting action */ if (cb && in_hw_count) offload_action_hw_count_dec(action, in_hw_count); return 0; } static int tcf_action_offload_del(struct tc_action *action) { return tcf_action_offload_del_ex(action, NULL, NULL); } static void tcf_action_cleanup(struct tc_action *p) { tcf_action_offload_del(p); if (p->ops->cleanup) p->ops->cleanup(p); gen_kill_estimator(&p->tcfa_rate_est); free_tcf(p); } static int __tcf_action_put(struct tc_action *p, bool bind) { struct tcf_idrinfo *idrinfo = p->idrinfo; if (refcount_dec_and_mutex_lock(&p->tcfa_refcnt, &idrinfo->lock)) { if (bind) atomic_dec(&p->tcfa_bindcnt); idr_remove(&idrinfo->action_idr, p->tcfa_index); mutex_unlock(&idrinfo->lock); tcf_action_cleanup(p); return 1; } if (bind) atomic_dec(&p->tcfa_bindcnt); return 0; } static int __tcf_idr_release(struct tc_action *p, bool bind, bool strict) { int ret = 0; /* Release with strict==1 and bind==0 is only called through act API * interface (classifiers always bind). Only case when action with * positive reference count and zero bind count can exist is when it was * also created with act API (unbinding last classifier will destroy the * action if it was created by classifier). So only case when bind count * can be changed after initial check is when unbound action is * destroyed by act API while classifier binds to action with same id * concurrently. This result either creation of new action(same behavior * as before), or reusing existing action if concurrent process * increments reference count before action is deleted. Both scenarios * are acceptable. */ if (p) { if (!bind && strict && atomic_read(&p->tcfa_bindcnt) > 0) return -EPERM; if (__tcf_action_put(p, bind)) ret = ACT_P_DELETED; } return ret; } int tcf_idr_release(struct tc_action *a, bool bind) { const struct tc_action_ops *ops = a->ops; int ret; ret = __tcf_idr_release(a, bind, false); if (ret == ACT_P_DELETED) module_put(ops->owner); return ret; } EXPORT_SYMBOL(tcf_idr_release); static size_t tcf_action_shared_attrs_size(const struct tc_action *act) { struct tc_cookie *user_cookie; u32 cookie_len = 0; rcu_read_lock(); user_cookie = rcu_dereference(act->user_cookie); if (user_cookie) cookie_len = nla_total_size(user_cookie->len); rcu_read_unlock(); return nla_total_size(0) /* action number nested */ + nla_total_size(IFNAMSIZ) /* TCA_ACT_KIND */ + cookie_len /* TCA_ACT_COOKIE */ + nla_total_size(sizeof(struct nla_bitfield32)) /* TCA_ACT_HW_STATS */ + nla_total_size(0) /* TCA_ACT_STATS nested */ + nla_total_size(sizeof(struct nla_bitfield32)) /* TCA_ACT_FLAGS */ /* TCA_STATS_BASIC */ + nla_total_size_64bit(sizeof(struct gnet_stats_basic)) /* TCA_STATS_PKT64 */ + nla_total_size_64bit(sizeof(u64)) /* TCA_STATS_QUEUE */ + nla_total_size_64bit(sizeof(struct gnet_stats_queue)) + nla_total_size(0) /* TCA_ACT_OPTIONS nested */ + nla_total_size(sizeof(struct tcf_t)); /* TCA_GACT_TM */ } static size_t tcf_action_full_attrs_size(size_t sz) { return NLMSG_HDRLEN /* struct nlmsghdr */ + sizeof(struct tcamsg) + nla_total_size(0) /* TCA_ACT_TAB nested */ + sz; } static size_t tcf_action_fill_size(const struct tc_action *act) { size_t sz = tcf_action_shared_attrs_size(act); if (act->ops->get_fill_size) return act->ops->get_fill_size(act) + sz; return sz; } static int tcf_action_dump_terse(struct sk_buff *skb, struct tc_action *a, bool from_act) { unsigned char *b = skb_tail_pointer(skb); struct tc_cookie *cookie; if (nla_put_string(skb, TCA_ACT_KIND, a->ops->kind)) goto nla_put_failure; if (tcf_action_copy_stats(skb, a, 0)) goto nla_put_failure; if (from_act && nla_put_u32(skb, TCA_ACT_INDEX, a->tcfa_index)) goto nla_put_failure; rcu_read_lock(); cookie = rcu_dereference(a->user_cookie); if (cookie) { if (nla_put(skb, TCA_ACT_COOKIE, cookie->len, cookie->data)) { rcu_read_unlock(); goto nla_put_failure; } } rcu_read_unlock(); return 0; nla_put_failure: nlmsg_trim(skb, b); return -1; } static int tcf_action_dump_1(struct sk_buff *skb, struct tc_action *a, int bind, int ref) { unsigned char *b = skb_tail_pointer(skb); struct nlattr *nest; int err = -EINVAL; u32 flags; if (tcf_action_dump_terse(skb, a, false)) goto nla_put_failure; if (a->hw_stats != TCA_ACT_HW_STATS_ANY && nla_put_bitfield32(skb, TCA_ACT_HW_STATS, a->hw_stats, TCA_ACT_HW_STATS_ANY)) goto nla_put_failure; if (a->used_hw_stats_valid && nla_put_bitfield32(skb, TCA_ACT_USED_HW_STATS, a->used_hw_stats, TCA_ACT_HW_STATS_ANY)) goto nla_put_failure; flags = a->tcfa_flags & TCA_ACT_FLAGS_USER_MASK; if (flags && nla_put_bitfield32(skb, TCA_ACT_FLAGS, flags, flags)) goto nla_put_failure; if (nla_put_u32(skb, TCA_ACT_IN_HW_COUNT, a->in_hw_count)) goto nla_put_failure; nest = nla_nest_start_noflag(skb, TCA_ACT_OPTIONS); if (nest == NULL) goto nla_put_failure; err = tcf_action_dump_old(skb, a, bind, ref); if (err > 0) { nla_nest_end(skb, nest); return err; } nla_put_failure: nlmsg_trim(skb, b); return -1; } static int tcf_dump_walker(struct tcf_idrinfo *idrinfo, struct sk_buff *skb, struct netlink_callback *cb) { int err = 0, index = -1, s_i = 0, n_i = 0; u32 act_flags = cb->args[2]; unsigned long jiffy_since = cb->args[3]; struct nlattr *nest; struct idr *idr = &idrinfo->action_idr; struct tc_action *p; unsigned long id = 1; unsigned long tmp; mutex_lock(&idrinfo->lock); s_i = cb->args[0]; idr_for_each_entry_ul(idr, p, tmp, id) { index++; if (index < s_i) continue; if (IS_ERR(p)) continue; if (jiffy_since && time_after(jiffy_since, (unsigned long)p->tcfa_tm.lastuse)) continue; tcf_action_update_hw_stats(p); nest = nla_nest_start_noflag(skb, n_i); if (!nest) { index--; goto nla_put_failure; } err = (act_flags & TCA_ACT_FLAG_TERSE_DUMP) ? tcf_action_dump_terse(skb, p, true) : tcf_action_dump_1(skb, p, 0, 0); if (err < 0) { index--; nlmsg_trim(skb, nest); goto done; } nla_nest_end(skb, nest); n_i++; if (!(act_flags & TCA_ACT_FLAG_LARGE_DUMP_ON) && n_i >= TCA_ACT_MAX_PRIO) goto done; } done: if (index >= 0) cb->args[0] = index + 1; mutex_unlock(&idrinfo->lock); if (n_i) { if (act_flags & TCA_ACT_FLAG_LARGE_DUMP_ON) cb->args[1] = n_i; } return n_i; nla_put_failure: nla_nest_cancel(skb, nest); goto done; } static int tcf_idr_release_unsafe(struct tc_action *p) { if (atomic_read(&p->tcfa_bindcnt) > 0) return -EPERM; if (refcount_dec_and_test(&p->tcfa_refcnt)) { idr_remove(&p->idrinfo->action_idr, p->tcfa_index); tcf_action_cleanup(p); return ACT_P_DELETED; } return 0; } static int tcf_del_walker(struct tcf_idrinfo *idrinfo, struct sk_buff *skb, const struct tc_action_ops *ops, struct netlink_ext_ack *extack) { struct nlattr *nest; int n_i = 0; int ret = -EINVAL; struct idr *idr = &idrinfo->action_idr; struct tc_action *p; unsigned long id = 1; unsigned long tmp; nest = nla_nest_start_noflag(skb, 0); if (nest == NULL) goto nla_put_failure; if (nla_put_string(skb, TCA_ACT_KIND, ops->kind)) goto nla_put_failure; ret = 0; mutex_lock(&idrinfo->lock); idr_for_each_entry_ul(idr, p, tmp, id) { if (IS_ERR(p)) continue; ret = tcf_idr_release_unsafe(p); if (ret == ACT_P_DELETED) module_put(ops->owner); else if (ret < 0) break; n_i++; } mutex_unlock(&idrinfo->lock); if (ret < 0) { if (n_i) NL_SET_ERR_MSG(extack, "Unable to flush all TC actions"); else goto nla_put_failure; } ret = nla_put_u32(skb, TCA_FCNT, n_i); if (ret) goto nla_put_failure; nla_nest_end(skb, nest); return n_i; nla_put_failure: nla_nest_cancel(skb, nest); return ret; } int tcf_generic_walker(struct tc_action_net *tn, struct sk_buff *skb, struct netlink_callback *cb, int type, const struct tc_action_ops *ops, struct netlink_ext_ack *extack) { struct tcf_idrinfo *idrinfo = tn->idrinfo; if (type == RTM_DELACTION) { return tcf_del_walker(idrinfo, skb, ops, extack); } else if (type == RTM_GETACTION) { return tcf_dump_walker(idrinfo, skb, cb); } else { WARN(1, "tcf_generic_walker: unknown command %d\n", type); NL_SET_ERR_MSG(extack, "tcf_generic_walker: unknown command"); return -EINVAL; } } EXPORT_SYMBOL(tcf_generic_walker); int tcf_idr_search(struct tc_action_net *tn, struct tc_action **a, u32 index) { struct tcf_idrinfo *idrinfo = tn->idrinfo; struct tc_action *p; mutex_lock(&idrinfo->lock); p = idr_find(&idrinfo->action_idr, index); if (IS_ERR(p)) p = NULL; else if (p) refcount_inc(&p->tcfa_refcnt); mutex_unlock(&idrinfo->lock); if (p) { *a = p; return true; } return false; } EXPORT_SYMBOL(tcf_idr_search); static int __tcf_generic_walker(struct net *net, struct sk_buff *skb, struct netlink_callback *cb, int type, const struct tc_action_ops *ops, struct netlink_ext_ack *extack) { struct tc_action_net *tn = net_generic(net, ops->net_id); if (unlikely(ops->walk)) return ops->walk(net, skb, cb, type, ops, extack); return tcf_generic_walker(tn, skb, cb, type, ops, extack); } static int __tcf_idr_search(struct net *net, const struct tc_action_ops *ops, struct tc_action **a, u32 index) { struct tc_action_net *tn = net_generic(net, ops->net_id); if (unlikely(ops->lookup)) return ops->lookup(net, a, index); return tcf_idr_search(tn, a, index); } static int tcf_idr_delete_index(struct tcf_idrinfo *idrinfo, u32 index) { struct tc_action *p; int ret = 0; mutex_lock(&idrinfo->lock); p = idr_find(&idrinfo->action_idr, index); if (!p) { mutex_unlock(&idrinfo->lock); return -ENOENT; } if (!atomic_read(&p->tcfa_bindcnt)) { if (refcount_dec_and_test(&p->tcfa_refcnt)) { struct module *owner = p->ops->owner; WARN_ON(p != idr_remove(&idrinfo->action_idr, p->tcfa_index)); mutex_unlock(&idrinfo->lock); tcf_action_cleanup(p); module_put(owner); return 0; } ret = 0; } else { ret = -EPERM; } mutex_unlock(&idrinfo->lock); return ret; } int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est, struct tc_action **a, const struct tc_action_ops *ops, int bind, bool cpustats, u32 flags) { struct tc_action *p = kzalloc(ops->size, GFP_KERNEL); struct tcf_idrinfo *idrinfo = tn->idrinfo; int err = -ENOMEM; if (unlikely(!p)) return -ENOMEM; refcount_set(&p->tcfa_refcnt, 1); if (bind) atomic_set(&p->tcfa_bindcnt, 1); if (cpustats) { p->cpu_bstats = netdev_alloc_pcpu_stats(struct gnet_stats_basic_sync); if (!p->cpu_bstats) goto err1; p->cpu_bstats_hw = netdev_alloc_pcpu_stats(struct gnet_stats_basic_sync); if (!p->cpu_bstats_hw) goto err2; p->cpu_qstats = alloc_percpu(struct gnet_stats_queue); if (!p->cpu_qstats) goto err3; } gnet_stats_basic_sync_init(&p->tcfa_bstats); gnet_stats_basic_sync_init(&p->tcfa_bstats_hw); spin_lock_init(&p->tcfa_lock); p->tcfa_index = index; p->tcfa_tm.install = jiffies; p->tcfa_tm.lastuse = jiffies; p->tcfa_tm.firstuse = 0; p->tcfa_flags = flags; if (est) { err = gen_new_estimator(&p->tcfa_bstats, p->cpu_bstats, &p->tcfa_rate_est, &p->tcfa_lock, false, est); if (err) goto err4; } p->idrinfo = idrinfo; __module_get(ops->owner); p->ops = ops; *a = p; return 0; err4: free_percpu(p->cpu_qstats); err3: free_percpu(p->cpu_bstats_hw); err2: free_percpu(p->cpu_bstats); err1: kfree(p); return err; } EXPORT_SYMBOL(tcf_idr_create); int tcf_idr_create_from_flags(struct tc_action_net *tn, u32 index, struct nlattr *est, struct tc_action **a, const struct tc_action_ops *ops, int bind, u32 flags) { /* Set cpustats according to actions flags. */ return tcf_idr_create(tn, index, est, a, ops, bind, !(flags & TCA_ACT_FLAGS_NO_PERCPU_STATS), flags); } EXPORT_SYMBOL(tcf_idr_create_from_flags); /* Cleanup idr index that was allocated but not initialized. */ void tcf_idr_cleanup(struct tc_action_net *tn, u32 index) { struct tcf_idrinfo *idrinfo = tn->idrinfo; mutex_lock(&idrinfo->lock); /* Remove ERR_PTR(-EBUSY) allocated by tcf_idr_check_alloc */ WARN_ON(!IS_ERR(idr_remove(&idrinfo->action_idr, index))); mutex_unlock(&idrinfo->lock); } EXPORT_SYMBOL(tcf_idr_cleanup); /* Check if action with specified index exists. If actions is found, increments * its reference and bind counters, and return 1. Otherwise insert temporary * error pointer (to prevent concurrent users from inserting actions with same * index) and return 0. * * May return -EAGAIN for binding actions in case of a parallel add/delete on * the requested index. */ int tcf_idr_check_alloc(struct tc_action_net *tn, u32 *index, struct tc_action **a, int bind) { struct tcf_idrinfo *idrinfo = tn->idrinfo; struct tc_action *p; int ret; u32 max; if (*index) { rcu_read_lock(); p = idr_find(&idrinfo->action_idr, *index); if (IS_ERR(p)) { /* This means that another process allocated * index but did not assign the pointer yet. */ rcu_read_unlock(); return -EAGAIN; } if (!p) { /* Empty slot, try to allocate it */ max = *index; rcu_read_unlock(); goto new; } if (!refcount_inc_not_zero(&p->tcfa_refcnt)) { /* Action was deleted in parallel */ rcu_read_unlock(); return -EAGAIN; } if (bind) atomic_inc(&p->tcfa_bindcnt); *a = p; rcu_read_unlock(); return 1; } else { /* Find a slot */ *index = 1; max = UINT_MAX; } new: *a = NULL; mutex_lock(&idrinfo->lock); ret = idr_alloc_u32(&idrinfo->action_idr, ERR_PTR(-EBUSY), index, max, GFP_KERNEL); mutex_unlock(&idrinfo->lock); /* N binds raced for action allocation, * retry for all the ones that failed. */ if (ret == -ENOSPC && *index == max) ret = -EAGAIN; return ret; } EXPORT_SYMBOL(tcf_idr_check_alloc); void tcf_idrinfo_destroy(const struct tc_action_ops *ops, struct tcf_idrinfo *idrinfo) { struct idr *idr = &idrinfo->action_idr; bool mutex_taken = false; struct tc_action *p; unsigned long id = 1; unsigned long tmp; int ret; idr_for_each_entry_ul(idr, p, tmp, id) { if (IS_ERR(p)) continue; if (tc_act_in_hw(p) && !mutex_taken) { rtnl_lock(); mutex_taken = true; } ret = __tcf_idr_release(p, false, true); if (ret == ACT_P_DELETED) module_put(ops->owner); else if (ret < 0) return; } if (mutex_taken) rtnl_unlock(); idr_destroy(&idrinfo->action_idr); } EXPORT_SYMBOL(tcf_idrinfo_destroy); static LIST_HEAD(act_base); static DEFINE_RWLOCK(act_mod_lock); /* since act ops id is stored in pernet subsystem list, * then there is no way to walk through only all the action * subsystem, so we keep tc action pernet ops id for * reoffload to walk through. */ static LIST_HEAD(act_pernet_id_list); static DEFINE_MUTEX(act_id_mutex); struct tc_act_pernet_id { struct list_head list; unsigned int id; }; static int tcf_pernet_add_id_list(unsigned int id) { struct tc_act_pernet_id *id_ptr; int ret = 0; mutex_lock(&act_id_mutex); list_for_each_entry(id_ptr, &act_pernet_id_list, list) { if (id_ptr->id == id) { ret = -EEXIST; goto err_out; } } id_ptr = kzalloc_obj(*id_ptr); if (!id_ptr) { ret = -ENOMEM; goto err_out; } id_ptr->id = id; list_add_tail(&id_ptr->list, &act_pernet_id_list); err_out: mutex_unlock(&act_id_mutex); return ret; } static void tcf_pernet_del_id_list(unsigned int id) { struct tc_act_pernet_id *id_ptr; mutex_lock(&act_id_mutex); list_for_each_entry(id_ptr, &act_pernet_id_list, list) { if (id_ptr->id == id) { list_del(&id_ptr->list); kfree(id_ptr); break; } } mutex_unlock(&act_id_mutex); } int tcf_register_action(struct tc_action_ops *act, struct pernet_operations *ops) { struct tc_action_ops *a; int ret; if (!act->act || !act->dump || !act->init) return -EINVAL; /* We have to register pernet ops before making the action ops visible, * otherwise tcf_action_init_1() could get a partially initialized * netns. */ ret = register_pernet_subsys(ops); if (ret) return ret; if (ops->id) { ret = tcf_pernet_add_id_list(*ops->id); if (ret) goto err_id; } write_lock(&act_mod_lock); list_for_each_entry(a, &act_base, head) { if (act->id == a->id || (strcmp(act->kind, a->kind) == 0)) { ret = -EEXIST; goto err_out; } } list_add_tail(&act->head, &act_base); write_unlock(&act_mod_lock); return 0; err_out: write_unlock(&act_mod_lock); if (ops->id) tcf_pernet_del_id_list(*ops->id); err_id: unregister_pernet_subsys(ops); return ret; } EXPORT_SYMBOL(tcf_register_action); int tcf_unregister_action(struct tc_action_ops *act, struct pernet_operations *ops) { struct tc_action_ops *a; int err = -ENOENT; write_lock(&act_mod_lock); list_for_each_entry(a, &act_base, head) { if (a == act) { list_del(&act->head); err = 0; break; } } write_unlock(&act_mod_lock); if (!err) { unregister_pernet_subsys(ops); if (ops->id) tcf_pernet_del_id_list(*ops->id); } return err; } EXPORT_SYMBOL(tcf_unregister_action); /* lookup by name */ static struct tc_action_ops *tc_lookup_action_n(char *kind) { struct tc_action_ops *a, *res = NULL; if (kind) { read_lock(&act_mod_lock); list_for_each_entry(a, &act_base, head) { if (strcmp(kind, a->kind) == 0) { if (try_module_get(a->owner)) res = a; break; } } read_unlock(&act_mod_lock); } return res; } /* lookup by nlattr */ static struct tc_action_ops *tc_lookup_action(struct nlattr *kind) { struct tc_action_ops *a, *res = NULL; if (kind) { read_lock(&act_mod_lock); list_for_each_entry(a, &act_base, head) { if (nla_strcmp(kind, a->kind) == 0) { if (try_module_get(a->owner)) res = a; break; } } read_unlock(&act_mod_lock); } return res; } /*TCA_ACT_MAX_PRIO is 32, there count up to 32 */ #define TCA_ACT_MAX_PRIO_MASK 0x1FF int tcf_action_exec(struct sk_buff *skb, struct tc_action **actions, int nr_actions, struct tcf_result *res) { u32 jmp_prgcnt = 0; u32 jmp_ttl = TCA_ACT_MAX_PRIO; /*matches actions per filter */ int i; int ret = TC_ACT_OK; if (skb_skip_tc_classify(skb)) return TC_ACT_OK; restart_act_graph: for (i = 0; i < nr_actions; i++) { const struct tc_action *a = actions[i]; int repeat_ttl; if (jmp_prgcnt > 0) { jmp_prgcnt -= 1; continue; } if (tc_act_skip_sw(a->tcfa_flags)) continue; repeat_ttl = 32; repeat: ret = tc_act(skb, a, res); if (unlikely(ret == TC_ACT_REPEAT)) { if (--repeat_ttl != 0) goto repeat; /* suspicious opcode, stop pipeline */ net_warn_ratelimited("TC_ACT_REPEAT abuse ?\n"); return TC_ACT_OK; } if (TC_ACT_EXT_CMP(ret, TC_ACT_JUMP)) { jmp_prgcnt = ret & TCA_ACT_MAX_PRIO_MASK; if (!jmp_prgcnt || (jmp_prgcnt > nr_actions)) { /* faulty opcode, stop pipeline */ return TC_ACT_OK; } else { jmp_ttl -= 1; if (jmp_ttl > 0) goto restart_act_graph; else /* faulty graph, stop pipeline */ return TC_ACT_OK; } } else if (TC_ACT_EXT_CMP(ret, TC_ACT_GOTO_CHAIN)) { if (unlikely(!rcu_access_pointer(a->goto_chain))) { tcf_set_drop_reason(skb, SKB_DROP_REASON_TC_CHAIN_NOTFOUND); return TC_ACT_SHOT; } tcf_action_goto_chain_exec(a, res); } if (ret != TC_ACT_PIPE) break; } return ret; } EXPORT_SYMBOL(tcf_action_exec); int tcf_action_destroy(struct tc_action *actions[], int bind) { const struct tc_action_ops *ops; struct tc_action *a; int ret = 0, i; tcf_act_for_each_action(i, a, actions) { actions[i] = NULL; ops = a->ops; ret = __tcf_idr_release(a, bind, true); if (ret == ACT_P_DELETED) module_put(ops->owner); else if (ret < 0) return ret; } return ret; } static int tcf_action_put(struct tc_action *p) { return __tcf_action_put(p, false); } static void tcf_action_put_many(struct tc_action *actions[]) { struct tc_action *a; int i; tcf_act_for_each_action(i, a, actions) { const struct tc_action_ops *ops = a->ops; if (tcf_action_put(a)) module_put(ops->owner); } } static void tca_put_bound_many(struct tc_action *actions[], int init_res[]) { struct tc_action *a; int i; tcf_act_for_each_action(i, a, actions) { const struct tc_action_ops *ops = a->ops; if (init_res[i] == ACT_P_CREATED) continue; if (tcf_action_put(a)) module_put(ops->owner); } } int tcf_action_dump_old(struct sk_buff *skb, struct tc_action *a, int bind, int ref) { return a->ops->dump(skb, a, bind, ref); } int tcf_action_dump(struct sk_buff *skb, struct tc_action *actions[], int bind, int ref, bool terse) { struct tc_action *a; int err = -EINVAL, i; struct nlattr *nest; tcf_act_for_each_action(i, a, actions) { nest = nla_nest_start_noflag(skb, i + 1); if (nest == NULL) goto nla_put_failure; err = terse ? tcf_action_dump_terse(skb, a, false) : tcf_action_dump_1(skb, a, bind, ref); if (err < 0) goto errout; nla_nest_end(skb, nest); } return 0; nla_put_failure: err = -EINVAL; errout: nla_nest_cancel(skb, nest); return err; } static struct tc_cookie *nla_memdup_cookie(struct nlattr **tb) { struct tc_cookie *c = kzalloc_obj(*c); if (!c) return NULL; c->data = nla_memdup(tb[TCA_ACT_COOKIE], GFP_KERNEL); if (!c->data) { kfree(c); return NULL; } c->len = nla_len(tb[TCA_ACT_COOKIE]); return c; } static u8 tcf_action_hw_stats_get(struct nlattr *hw_stats_attr) { struct nla_bitfield32 hw_stats_bf; /* If the user did not pass the attr, that means he does * not care about the type. Return "any" in that case * which is setting on all supported types. */ if (!hw_stats_attr) return TCA_ACT_HW_STATS_ANY; hw_stats_bf = nla_get_bitfield32(hw_stats_attr); return hw_stats_bf.value; } static const struct nla_policy tcf_action_policy[TCA_ACT_MAX + 1] = { [TCA_ACT_KIND] = { .type = NLA_STRING }, [TCA_ACT_INDEX] = { .type = NLA_U32 }, [TCA_ACT_COOKIE] = { .type = NLA_BINARY, .len = TC_COOKIE_MAX_SIZE }, [TCA_ACT_OPTIONS] = { .type = NLA_NESTED }, [TCA_ACT_FLAGS] = NLA_POLICY_BITFIELD32(TCA_ACT_FLAGS_NO_PERCPU_STATS | TCA_ACT_FLAGS_SKIP_HW | TCA_ACT_FLAGS_SKIP_SW), [TCA_ACT_HW_STATS] = NLA_POLICY_BITFIELD32(TCA_ACT_HW_STATS_ANY), }; void tcf_idr_insert_many(struct tc_action *actions[], int init_res[]) { struct tc_action *a; int i; tcf_act_for_each_action(i, a, actions) { struct tcf_idrinfo *idrinfo; if (init_res[i] == ACT_P_BOUND) continue; idrinfo = a->idrinfo; mutex_lock(&idrinfo->lock); /* Replace ERR_PTR(-EBUSY) allocated by tcf_idr_check_alloc */ idr_replace(&idrinfo->action_idr, a, a->tcfa_index); mutex_unlock(&idrinfo->lock); } } struct tc_action_ops *tc_action_load_ops(struct nlattr *nla, u32 flags, struct netlink_ext_ack *extack) { bool police = flags & TCA_ACT_FLAGS_POLICE; struct nlattr *tb[TCA_ACT_MAX + 1]; struct tc_action_ops *a_o; char act_name[IFNAMSIZ]; struct nlattr *kind; int err; if (!police) { err = nla_parse_nested_deprecated(tb, TCA_ACT_MAX, nla, tcf_action_policy, extack); if (err < 0) return ERR_PTR(err); err = -EINVAL; kind = tb[TCA_ACT_KIND]; if (!kind) { NL_SET_ERR_MSG(extack, "TC action kind must be specified"); return ERR_PTR(err); } if (nla_strscpy(act_name, kind, IFNAMSIZ) < 0) { NL_SET_ERR_MSG(extack, "TC action name too long"); return ERR_PTR(err); } } else { if (strscpy(act_name, "police", IFNAMSIZ) < 0) { NL_SET_ERR_MSG(extack, "TC action name too long"); return ERR_PTR(-EINVAL); } } a_o = tc_lookup_action_n(act_name); if (a_o == NULL) { #ifdef CONFIG_MODULES bool rtnl_held = !(flags & TCA_ACT_FLAGS_NO_RTNL); if (rtnl_held) rtnl_unlock(); request_module(NET_ACT_ALIAS_PREFIX "%s", act_name); if (rtnl_held) rtnl_lock(); a_o = tc_lookup_action_n(act_name); /* We dropped the RTNL semaphore in order to * perform the module load. So, even if we * succeeded in loading the module we have to * tell the caller to replay the request. We * indicate this using -EAGAIN. */ if (a_o != NULL) { module_put(a_o->owner); return ERR_PTR(-EAGAIN); } #endif NL_SET_ERR_MSG(extack, "Failed to load TC action module"); return ERR_PTR(-ENOENT); } return a_o; } struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp, struct nlattr *nla, struct nlattr *est, struct tc_action_ops *a_o, int *init_res, u32 flags, struct netlink_ext_ack *extack) { bool police = flags & TCA_ACT_FLAGS_POLICE; struct nla_bitfield32 userflags = { 0, 0 }; struct tc_cookie *user_cookie = NULL; u8 hw_stats = TCA_ACT_HW_STATS_ANY; struct nlattr *tb[TCA_ACT_MAX + 1]; struct tc_action *a; int err; /* backward compatibility for policer */ if (!police) { err = nla_parse_nested_deprecated(tb, TCA_ACT_MAX, nla, tcf_action_policy, extack); if (err < 0) return ERR_PTR(err); if (tb[TCA_ACT_COOKIE]) { user_cookie = nla_memdup_cookie(tb); if (!user_cookie) { NL_SET_ERR_MSG(extack, "No memory to generate TC cookie"); err = -ENOMEM; goto err_out; } } hw_stats = tcf_action_hw_stats_get(tb[TCA_ACT_HW_STATS]); if (tb[TCA_ACT_FLAGS]) { userflags = nla_get_bitfield32(tb[TCA_ACT_FLAGS]); if (!tc_act_flags_valid(userflags.value)) { err = -EINVAL; goto err_out; } } err = a_o->init(net, tb[TCA_ACT_OPTIONS], est, &a, tp, userflags.value | flags, extack); } else { err = a_o->init(net, nla, est, &a, tp, userflags.value | flags, extack); } if (err < 0) goto err_out; *init_res = err; if (!police && tb[TCA_ACT_COOKIE]) tcf_set_action_cookie(&a->user_cookie, user_cookie); if (!police) a->hw_stats = hw_stats; return a; err_out: if (user_cookie) { kfree(user_cookie->data); kfree(user_cookie); } return ERR_PTR(err); } static bool tc_act_bind(u32 flags) { return !!(flags & TCA_ACT_FLAGS_BIND); } /* Returns numbers of initialized actions or negative error. */ int tcf_action_init(struct net *net, struct tcf_proto *tp, struct nlattr *nla, struct nlattr *est, struct tc_action *actions[], int init_res[], size_t *attr_size, u32 flags, u32 fl_flags, struct netlink_ext_ack *extack) { struct tc_action_ops *ops[TCA_ACT_MAX_PRIO] = {}; struct nlattr *tb[TCA_ACT_MAX_PRIO + 2]; struct tc_action *act; size_t sz = 0; int err; int i; err = nla_parse_nested_deprecated(tb, TCA_ACT_MAX_PRIO + 1, nla, NULL, extack); if (err < 0) return err; /* The nested attributes are parsed as types, but they are really an * array of actions. So we parse one more than we can handle, and return * an error if the last one is set (as that indicates that the request * contained more than the maximum number of actions). */ if (tb[TCA_ACT_MAX_PRIO + 1]) { NL_SET_ERR_MSG_FMT(extack, "Only %d actions supported per filter", TCA_ACT_MAX_PRIO); return -EINVAL; } for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) { struct tc_action_ops *a_o; a_o = tc_action_load_ops(tb[i], flags, extack); if (IS_ERR(a_o)) { err = PTR_ERR(a_o); goto err_mod; } ops[i - 1] = a_o; } for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) { act = tcf_action_init_1(net, tp, tb[i], est, ops[i - 1], &init_res[i - 1], flags, extack); if (IS_ERR(act)) { err = PTR_ERR(act); goto err; } sz += tcf_action_fill_size(act); /* Start from index 0 */ actions[i - 1] = act; if (tc_act_bind(flags)) { bool skip_sw = tc_skip_sw(fl_flags); bool skip_hw = tc_skip_hw(fl_flags); if (tc_act_bind(act->tcfa_flags)) { /* Action is created by classifier and is not * standalone. Check that the user did not set * any action flags different than the * classifier flags, and inherit the flags from * the classifier for the compatibility case * where no flags were specified at all. */ if ((tc_act_skip_sw(act->tcfa_flags) && !skip_sw) || (tc_act_skip_hw(act->tcfa_flags) && !skip_hw)) { NL_SET_ERR_MSG(extack, "Mismatch between action and filter offload flags"); err = -EINVAL; goto err; } if (skip_sw) act->tcfa_flags |= TCA_ACT_FLAGS_SKIP_SW; if (skip_hw) act->tcfa_flags |= TCA_ACT_FLAGS_SKIP_HW; continue; } /* Action is standalone */ if (skip_sw != tc_act_skip_sw(act->tcfa_flags) || skip_hw != tc_act_skip_hw(act->tcfa_flags)) { NL_SET_ERR_MSG(extack, "Mismatch between action and filter offload flags"); err = -EINVAL; goto err; } } else { err = tcf_action_offload_add(act, extack); if (tc_act_skip_sw(act->tcfa_flags) && err) goto err; } } /* We have to commit them all together, because if any error happened in * between, we could not handle the failure gracefully. */ tcf_idr_insert_many(actions, init_res); *attr_size = tcf_action_full_attrs_size(sz); err = i - 1; goto err_mod; err: tcf_action_destroy(actions, flags & TCA_ACT_FLAGS_BIND); err_mod: for (i = 0; i < TCA_ACT_MAX_PRIO && ops[i]; i++) module_put(ops[i]->owner); return err; } void tcf_action_update_stats(struct tc_action *a, u64 bytes, u64 packets, u64 drops, bool hw) { if (a->cpu_bstats) { _bstats_update(this_cpu_ptr(a->cpu_bstats), bytes, packets); this_cpu_ptr(a->cpu_qstats)->drops += drops; if (hw) _bstats_update(this_cpu_ptr(a->cpu_bstats_hw), bytes, packets); return; } _bstats_update(&a->tcfa_bstats, bytes, packets); atomic_add(drops, &a->tcfa_drops); if (hw) _bstats_update(&a->tcfa_bstats_hw, bytes, packets); } EXPORT_SYMBOL(tcf_action_update_stats); int tcf_action_copy_stats(struct sk_buff *skb, struct tc_action *p, int compat_mode) { struct gnet_stats_queue qstats = {0}; struct gnet_dump d; int err = 0; if (p == NULL) goto errout; /* compat_mode being true specifies a call that is supposed * to add additional backward compatibility statistic TLVs. */ if (compat_mode) { if (p->type == TCA_OLD_COMPAT) err = gnet_stats_start_copy_compat(skb, 0, TCA_STATS, TCA_XSTATS, &p->tcfa_lock, &d, TCA_PAD); else return 0; } else err = gnet_stats_start_copy(skb, TCA_ACT_STATS, &p->tcfa_lock, &d, TCA_ACT_PAD); if (err < 0) goto errout; qstats.drops = atomic_read(&p->tcfa_drops); qstats.overlimits = atomic_read(&p->tcfa_overlimits); if (gnet_stats_copy_basic(&d, p->cpu_bstats, &p->tcfa_bstats, false) < 0 || gnet_stats_copy_basic_hw(&d, p->cpu_bstats_hw, &p->tcfa_bstats_hw, false) < 0 || gnet_stats_copy_rate_est(&d, &p->tcfa_rate_est) < 0 || gnet_stats_copy_queue(&d, p->cpu_qstats, &qstats, qstats.qlen) < 0) goto errout; if (gnet_stats_finish_copy(&d) < 0) goto errout; return 0; errout: return -1; } static int tca_get_fill(struct sk_buff *skb, struct tc_action *actions[], u32 portid, u32 seq, u16 flags, int event, int bind, int ref, struct netlink_ext_ack *extack) { struct tcamsg *t; struct nlmsghdr *nlh; unsigned char *b = skb_tail_pointer(skb); struct nlattr *nest; nlh = nlmsg_put(skb, portid, seq, event, sizeof(*t), flags); if (!nlh) goto out_nlmsg_trim; t = nlmsg_data(nlh); t->tca_family = AF_UNSPEC; t->tca__pad1 = 0; t->tca__pad2 = 0; if (extack && extack->_msg && nla_put_string(skb, TCA_ROOT_EXT_WARN_MSG, extack->_msg)) goto out_nlmsg_trim; nest = nla_nest_start_noflag(skb, TCA_ACT_TAB); if (!nest) goto out_nlmsg_trim; if (tcf_action_dump(skb, actions, bind, ref, false) < 0) goto out_nlmsg_trim; nla_nest_end(skb, nest); nlh->nlmsg_len = skb_tail_pointer(skb) - b; return skb->len; out_nlmsg_trim: nlmsg_trim(skb, b); return -1; } static int tcf_get_notify(struct net *net, u32 portid, struct nlmsghdr *n, struct tc_action *actions[], int event, struct netlink_ext_ack *extack) { struct sk_buff *skb; skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL); if (!skb) return -ENOBUFS; if (tca_get_fill(skb, actions, portid, n->nlmsg_seq, 0, event, 0, 1, NULL) <= 0) { NL_SET_ERR_MSG(extack, "Failed to fill netlink attributes while adding TC action"); kfree_skb(skb); return -EINVAL; } return rtnl_unicast(skb, net, portid); } static struct tc_action *tcf_action_get_1(struct net *net, struct nlattr *nla, struct nlmsghdr *n, u32 portid, struct netlink_ext_ack *extack) { struct nlattr *tb[TCA_ACT_MAX + 1]; const struct tc_action_ops *ops; struct tc_action *a; int index; int err; err = nla_parse_nested_deprecated(tb, TCA_ACT_MAX, nla, tcf_action_policy, extack); if (err < 0) goto err_out; err = -EINVAL; if (tb[TCA_ACT_INDEX] == NULL || nla_len(tb[TCA_ACT_INDEX]) < sizeof(index)) { NL_SET_ERR_MSG(extack, "Invalid TC action index value"); goto err_out; } index = nla_get_u32(tb[TCA_ACT_INDEX]); err = -EINVAL; ops = tc_lookup_action(tb[TCA_ACT_KIND]); if (!ops) { /* could happen in batch of actions */ NL_SET_ERR_MSG(extack, "Specified TC action kind not found"); goto err_out; } err = -ENOENT; if (__tcf_idr_search(net, ops, &a, index) == 0) { NL_SET_ERR_MSG(extack, "TC action with specified index not found"); goto err_mod; } module_put(ops->owner); return a; err_mod: module_put(ops->owner); err_out: return ERR_PTR(err); } static int tca_action_flush(struct net *net, struct nlattr *nla, struct nlmsghdr *n, u32 portid, struct netlink_ext_ack *extack) { struct sk_buff *skb; unsigned char *b; struct nlmsghdr *nlh; struct tcamsg *t; struct netlink_callback dcb; struct nlattr *nest; struct nlattr *tb[TCA_ACT_MAX + 1]; const struct tc_action_ops *ops; struct nlattr *kind; int err = -ENOMEM; skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL); if (!skb) return err; b = skb_tail_pointer(skb); err = nla_parse_nested_deprecated(tb, TCA_ACT_MAX, nla, tcf_action_policy, extack); if (err < 0) goto err_out; err = -EINVAL; kind = tb[TCA_ACT_KIND]; ops = tc_lookup_action(kind); if (!ops) { /*some idjot trying to flush unknown action */ NL_SET_ERR_MSG(extack, "Cannot flush unknown TC action"); goto err_out; } nlh = nlmsg_put(skb, portid, n->nlmsg_seq, RTM_DELACTION, sizeof(*t), 0); if (!nlh) { NL_SET_ERR_MSG(extack, "Failed to create TC action flush notification"); goto out_module_put; } t = nlmsg_data(nlh); t->tca_family = AF_UNSPEC; t->tca__pad1 = 0; t->tca__pad2 = 0; nest = nla_nest_start_noflag(skb, TCA_ACT_TAB); if (!nest) { NL_SET_ERR_MSG(extack, "Failed to add new netlink message"); goto out_module_put; } err = __tcf_generic_walker(net, skb, &dcb, RTM_DELACTION, ops, extack); if (err <= 0) { nla_nest_cancel(skb, nest); goto out_module_put; } nla_nest_end(skb, nest); nlh->nlmsg_len = skb_tail_pointer(skb) - b; nlh->nlmsg_flags |= NLM_F_ROOT; module_put(ops->owner); err = rtnetlink_send(skb, net, portid, RTNLGRP_TC, n->nlmsg_flags & NLM_F_ECHO); if (err < 0) NL_SET_ERR_MSG(extack, "Failed to send TC action flush notification"); return err; out_module_put: module_put(ops->owner); err_out: kfree_skb(skb); return err; } static int tcf_action_delete(struct net *net, struct tc_action *actions[]) { struct tc_action *a; int i; tcf_act_for_each_action(i, a, actions) { const struct tc_action_ops *ops = a->ops; /* Actions can be deleted concurrently so we must save their * type and id to search again after reference is released. */ struct tcf_idrinfo *idrinfo = a->idrinfo; u32 act_index = a->tcfa_index; actions[i] = NULL; if (tcf_action_put(a)) { /* last reference, action was deleted concurrently */ module_put(ops->owner); } else { int ret; /* now do the delete */ ret = tcf_idr_delete_index(idrinfo, act_index); if (ret < 0) return ret; } } return 0; } static struct sk_buff *tcf_reoffload_del_notify_msg(struct net *net, struct tc_action *action) { size_t attr_size = tcf_action_fill_size(action); struct tc_action *actions[TCA_ACT_MAX_PRIO] = { [0] = action, }; struct sk_buff *skb; skb = alloc_skb(max(attr_size, NLMSG_GOODSIZE), GFP_KERNEL); if (!skb) return ERR_PTR(-ENOBUFS); if (tca_get_fill(skb, actions, 0, 0, 0, RTM_DELACTION, 0, 1, NULL) <= 0) { kfree_skb(skb); return ERR_PTR(-EINVAL); } return skb; } static int tcf_reoffload_del_notify(struct net *net, struct tc_action *action) { const struct tc_action_ops *ops = action->ops; struct sk_buff *skb; int ret; if (!rtnl_notify_needed(net, 0, RTNLGRP_TC)) { skb = NULL; } else { skb = tcf_reoffload_del_notify_msg(net, action); if (IS_ERR(skb)) return PTR_ERR(skb); } ret = tcf_idr_release_unsafe(action); if (ret == ACT_P_DELETED) { module_put(ops->owner); ret = rtnetlink_maybe_send(skb, net, 0, RTNLGRP_TC, 0); } else { kfree_skb(skb); } return ret; } int tcf_action_reoffload_cb(flow_indr_block_bind_cb_t *cb, void *cb_priv, bool add) { struct tc_act_pernet_id *id_ptr; struct tcf_idrinfo *idrinfo; struct tc_action_net *tn; struct tc_action *p; unsigned int act_id; unsigned long tmp; unsigned long id; struct idr *idr; struct net *net; int ret; if (!cb) return -EINVAL; down_read(&net_rwsem); mutex_lock(&act_id_mutex); for_each_net(net) { list_for_each_entry(id_ptr, &act_pernet_id_list, list) { act_id = id_ptr->id; tn = net_generic(net, act_id); if (!tn) continue; idrinfo = tn->idrinfo; if (!idrinfo) continue; mutex_lock(&idrinfo->lock); idr = &idrinfo->action_idr; idr_for_each_entry_ul(idr, p, tmp, id) { if (IS_ERR(p) || tc_act_bind(p->tcfa_flags)) continue; if (add) { tcf_action_offload_add_ex(p, NULL, cb, cb_priv); continue; } /* cb unregister to update hw count */ ret = tcf_action_offload_del_ex(p, cb, cb_priv); if (ret < 0) continue; if (tc_act_skip_sw(p->tcfa_flags) && !tc_act_in_hw(p)) tcf_reoffload_del_notify(net, p); } mutex_unlock(&idrinfo->lock); } } mutex_unlock(&act_id_mutex); up_read(&net_rwsem); return 0; } static struct sk_buff *tcf_del_notify_msg(struct net *net, struct nlmsghdr *n, struct tc_action *actions[], u32 portid, size_t attr_size, struct netlink_ext_ack *extack) { struct sk_buff *skb; skb = alloc_skb(max(attr_size, NLMSG_GOODSIZE), GFP_KERNEL); if (!skb) return ERR_PTR(-ENOBUFS); if (tca_get_fill(skb, actions, portid, n->nlmsg_seq, 0, RTM_DELACTION, 0, 2, extack) <= 0) { NL_SET_ERR_MSG(extack, "Failed to fill netlink TC action attributes"); kfree_skb(skb); return ERR_PTR(-EINVAL); } return skb; } static int tcf_del_notify(struct net *net, struct nlmsghdr *n, struct tc_action *actions[], u32 portid, size_t attr_size, struct netlink_ext_ack *extack) { struct sk_buff *skb; int ret; if (!rtnl_notify_needed(net, n->nlmsg_flags, RTNLGRP_TC)) { skb = NULL; } else { skb = tcf_del_notify_msg(net, n, actions, portid, attr_size, extack); if (IS_ERR(skb)) return PTR_ERR(skb); } /* now do the delete */ ret = tcf_action_delete(net, actions); if (ret < 0) { NL_SET_ERR_MSG(extack, "Failed to delete TC action"); kfree_skb(skb); return ret; } return rtnetlink_maybe_send(skb, net, portid, RTNLGRP_TC, n->nlmsg_flags & NLM_F_ECHO); } static int tca_action_gd(struct net *net, struct nlattr *nla, struct nlmsghdr *n, u32 portid, int event, struct netlink_ext_ack *extack) { int i, ret; struct nlattr *tb[TCA_ACT_MAX_PRIO + 1]; struct tc_action *act; size_t attr_size = 0; struct tc_action *actions[TCA_ACT_MAX_PRIO] = {}; ret = nla_parse_nested_deprecated(tb, TCA_ACT_MAX_PRIO, nla, NULL, extack); if (ret < 0) return ret; if (event == RTM_DELACTION && n->nlmsg_flags & NLM_F_ROOT) { if (tb[1]) return tca_action_flush(net, tb[1], n, portid, extack); NL_SET_ERR_MSG(extack, "Invalid netlink attributes while flushing TC action"); return -EINVAL; } for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) { act = tcf_action_get_1(net, tb[i], n, portid, extack); if (IS_ERR(act)) { ret = PTR_ERR(act); goto err; } attr_size += tcf_action_fill_size(act); actions[i - 1] = act; } attr_size = tcf_action_full_attrs_size(attr_size); if (event == RTM_GETACTION) ret = tcf_get_notify(net, portid, n, actions, event, extack); else { /* delete */ ret = tcf_del_notify(net, n, actions, portid, attr_size, extack); if (ret) goto err; return 0; } err: tcf_action_put_many(actions); return ret; } static struct sk_buff *tcf_add_notify_msg(struct net *net, struct nlmsghdr *n, struct tc_action *actions[], u32 portid, size_t attr_size, struct netlink_ext_ack *extack) { struct sk_buff *skb; skb = alloc_skb(max(attr_size, NLMSG_GOODSIZE), GFP_KERNEL); if (!skb) return ERR_PTR(-ENOBUFS); if (tca_get_fill(skb, actions, portid, n->nlmsg_seq, n->nlmsg_flags, RTM_NEWACTION, 0, 0, extack) <= 0) { NL_SET_ERR_MSG(extack, "Failed to fill netlink attributes while adding TC action"); kfree_skb(skb); return ERR_PTR(-EINVAL); } return skb; } static int tcf_add_notify(struct net *net, struct nlmsghdr *n, struct tc_action *actions[], u32 portid, size_t attr_size, struct netlink_ext_ack *extack) { struct sk_buff *skb; if (!rtnl_notify_needed(net, n->nlmsg_flags, RTNLGRP_TC)) { skb = NULL; } else { skb = tcf_add_notify_msg(net, n, actions, portid, attr_size, extack); if (IS_ERR(skb)) return PTR_ERR(skb); } return rtnetlink_maybe_send(skb, net, portid, RTNLGRP_TC, n->nlmsg_flags & NLM_F_ECHO); } static int tcf_action_add(struct net *net, struct nlattr *nla, struct nlmsghdr *n, u32 portid, u32 flags, struct netlink_ext_ack *extack) { size_t attr_size = 0; int loop, ret; struct tc_action *actions[TCA_ACT_MAX_PRIO] = {}; int init_res[TCA_ACT_MAX_PRIO] = {}; for (loop = 0; loop < 10; loop++) { ret = tcf_action_init(net, NULL, nla, NULL, actions, init_res, &attr_size, flags, 0, extack); if (ret != -EAGAIN) break; } if (ret < 0) return ret; ret = tcf_add_notify(net, n, actions, portid, attr_size, extack); /* only put bound actions */ tca_put_bound_many(actions, init_res); return ret; } static const struct nla_policy tcaa_policy[TCA_ROOT_MAX + 1] = { [TCA_ROOT_FLAGS] = NLA_POLICY_BITFIELD32(TCA_ACT_FLAG_LARGE_DUMP_ON | TCA_ACT_FLAG_TERSE_DUMP), [TCA_ROOT_TIME_DELTA] = { .type = NLA_U32 }, }; static int tc_ctl_action(struct sk_buff *skb, struct nlmsghdr *n, struct netlink_ext_ack *extack) { struct net *net = sock_net(skb->sk); struct nlattr *tca[TCA_ROOT_MAX + 1]; u32 portid = NETLINK_CB(skb).portid; u32 flags = 0; int ret = 0; if ((n->nlmsg_type != RTM_GETACTION) && !netlink_capable(skb, CAP_NET_ADMIN)) return -EPERM; ret = nlmsg_parse_deprecated(n, sizeof(struct tcamsg), tca, TCA_ROOT_MAX, NULL, extack); if (ret < 0) return ret; if (tca[TCA_ACT_TAB] == NULL) { NL_SET_ERR_MSG(extack, "Netlink action attributes missing"); return -EINVAL; } /* n->nlmsg_flags & NLM_F_CREATE */ switch (n->nlmsg_type) { case RTM_NEWACTION: /* we are going to assume all other flags * imply create only if it doesn't exist * Note that CREATE | EXCL implies that * but since we want avoid ambiguity (eg when flags * is zero) then just set this */ if (n->nlmsg_flags & NLM_F_REPLACE) flags = TCA_ACT_FLAGS_REPLACE; ret = tcf_action_add(net, tca[TCA_ACT_TAB], n, portid, flags, extack); break; case RTM_DELACTION: ret = tca_action_gd(net, tca[TCA_ACT_TAB], n, portid, RTM_DELACTION, extack); break; case RTM_GETACTION: ret = tca_action_gd(net, tca[TCA_ACT_TAB], n, portid, RTM_GETACTION, extack); break; default: BUG(); } return ret; } static struct nlattr *find_dump_kind(struct nlattr **nla) { struct nlattr *tb1, *tb2[TCA_ACT_MAX + 1]; struct nlattr *tb[TCA_ACT_MAX_PRIO + 1]; struct nlattr *kind; tb1 = nla[TCA_ACT_TAB]; if (tb1 == NULL) return NULL; if (nla_parse_deprecated(tb, TCA_ACT_MAX_PRIO, nla_data(tb1), NLMSG_ALIGN(nla_len(tb1)), NULL, NULL) < 0) return NULL; if (tb[1] == NULL) return NULL; if (nla_parse_nested_deprecated(tb2, TCA_ACT_MAX, tb[1], tcf_action_policy, NULL) < 0) return NULL; kind = tb2[TCA_ACT_KIND]; return kind; } static int tc_dump_action(struct sk_buff *skb, struct netlink_callback *cb) { struct net *net = sock_net(skb->sk); struct nlmsghdr *nlh; unsigned char *b = skb_tail_pointer(skb); struct nlattr *nest; struct tc_action_ops *a_o; int ret = 0; struct tcamsg *t = (struct tcamsg *) nlmsg_data(cb->nlh); struct nlattr *tb[TCA_ROOT_MAX + 1]; struct nlattr *count_attr = NULL; unsigned long jiffy_since = 0; struct nlattr *kind = NULL; struct nla_bitfield32 bf; u32 msecs_since = 0; u32 act_count = 0; ret = nlmsg_parse_deprecated(cb->nlh, sizeof(struct tcamsg), tb, TCA_ROOT_MAX, tcaa_policy, cb->extack); if (ret < 0) return ret; kind = find_dump_kind(tb); if (kind == NULL) { pr_info("tc_dump_action: action bad kind\n"); return 0; } a_o = tc_lookup_action(kind); if (a_o == NULL) return 0; cb->args[2] = 0; if (tb[TCA_ROOT_FLAGS]) { bf = nla_get_bitfield32(tb[TCA_ROOT_FLAGS]); cb->args[2] = bf.value; } if (tb[TCA_ROOT_TIME_DELTA]) { msecs_since = nla_get_u32(tb[TCA_ROOT_TIME_DELTA]); } nlh = nlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, cb->nlh->nlmsg_type, sizeof(*t), 0); if (!nlh) goto out_module_put; if (msecs_since) jiffy_since = jiffies - msecs_to_jiffies(msecs_since); t = nlmsg_data(nlh); t->tca_family = AF_UNSPEC; t->tca__pad1 = 0; t->tca__pad2 = 0; cb->args[3] = jiffy_since; count_attr = nla_reserve(skb, TCA_ROOT_COUNT, sizeof(u32)); if (!count_attr) goto out_module_put; nest = nla_nest_start_noflag(skb, TCA_ACT_TAB); if (nest == NULL) goto out_module_put; ret = __tcf_generic_walker(net, skb, cb, RTM_GETACTION, a_o, NULL); if (ret < 0) goto out_module_put; if (ret > 0) { nla_nest_end(skb, nest); ret = skb->len; act_count = cb->args[1]; memcpy(nla_data(count_attr), &act_count, sizeof(u32)); cb->args[1] = 0; } else nlmsg_trim(skb, b); nlh->nlmsg_len = skb_tail_pointer(skb) - b; if (NETLINK_CB(cb->skb).portid && ret) nlh->nlmsg_flags |= NLM_F_MULTI; module_put(a_o->owner); return skb->len; out_module_put: module_put(a_o->owner); nlmsg_trim(skb, b); return skb->len; } static const struct rtnl_msg_handler tc_action_rtnl_msg_handlers[] __initconst = { {.msgtype = RTM_NEWACTION, .doit = tc_ctl_action}, {.msgtype = RTM_DELACTION, .doit = tc_ctl_action}, {.msgtype = RTM_GETACTION, .doit = tc_ctl_action, .dumpit = tc_dump_action}, }; static int __init tc_action_init(void) { rtnl_register_many(tc_action_rtnl_msg_handlers); return 0; } subsys_initcall(tc_action_init); |
| 4 4 1 1 4 4 4 1 8 1 9 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 | // SPDX-License-Identifier: GPL-2.0-or-later /* * Cryptographic API. * * Serpent Cipher Algorithm. * * Copyright (C) 2002 Dag Arne Osvik <osvik@ii.uib.no> */ #include <crypto/algapi.h> #include <linux/init.h> #include <linux/module.h> #include <linux/errno.h> #include <linux/unaligned.h> #include <linux/types.h> #include <crypto/serpent.h> /* Key is padded to the maximum of 256 bits before round key generation. * Any key length <= 256 bits (32 bytes) is allowed by the algorithm. */ #define PHI 0x9e3779b9UL #define keyiter(a, b, c, d, i, j) \ ({ b ^= d; b ^= c; b ^= a; b ^= PHI ^ i; b = rol32(b, 11); k[j] = b; }) #define loadkeys(x0, x1, x2, x3, i) \ ({ x0 = k[i]; x1 = k[i+1]; x2 = k[i+2]; x3 = k[i+3]; }) #define storekeys(x0, x1, x2, x3, i) \ ({ k[i] = x0; k[i+1] = x1; k[i+2] = x2; k[i+3] = x3; }) #define store_and_load_keys(x0, x1, x2, x3, s, l) \ ({ storekeys(x0, x1, x2, x3, s); loadkeys(x0, x1, x2, x3, l); }) #define K(x0, x1, x2, x3, i) ({ \ x3 ^= k[4*(i)+3]; x2 ^= k[4*(i)+2]; \ x1 ^= k[4*(i)+1]; x0 ^= k[4*(i)+0]; \ }) #define LK(x0, x1, x2, x3, x4, i) ({ \ x0 = rol32(x0, 13);\ x2 = rol32(x2, 3); x1 ^= x0; x4 = x0 << 3; \ x3 ^= x2; x1 ^= x2; \ x1 = rol32(x1, 1); x3 ^= x4; \ x3 = rol32(x3, 7); x4 = x1; \ x0 ^= x1; x4 <<= 7; x2 ^= x3; \ x0 ^= x3; x2 ^= x4; x3 ^= k[4*i+3]; \ x1 ^= k[4*i+1]; x0 = rol32(x0, 5); x2 = rol32(x2, 22);\ x0 ^= k[4*i+0]; x2 ^= k[4*i+2]; \ }) #define KL(x0, x1, x2, x3, x4, i) ({ \ x0 ^= k[4*i+0]; x1 ^= k[4*i+1]; x2 ^= k[4*i+2]; \ x3 ^= k[4*i+3]; x0 = ror32(x0, 5); x2 = ror32(x2, 22);\ x4 = x1; x2 ^= x3; x0 ^= x3; \ x4 <<= 7; x0 ^= x1; x1 = ror32(x1, 1); \ x2 ^= x4; x3 = ror32(x3, 7); x4 = x0 << 3; \ x1 ^= x0; x3 ^= x4; x0 = ror32(x0, 13);\ x1 ^= x2; x3 ^= x2; x2 = ror32(x2, 3); \ }) #define S0(x0, x1, x2, x3, x4) ({ \ x4 = x3; \ x3 |= x0; x0 ^= x4; x4 ^= x2; \ x4 = ~x4; x3 ^= x1; x1 &= x0; \ x1 ^= x4; x2 ^= x0; x0 ^= x3; \ x4 |= x0; x0 ^= x2; x2 &= x1; \ x3 ^= x2; x1 = ~x1; x2 ^= x4; \ x1 ^= x2; \ }) #define S1(x0, x1, x2, x3, x4) ({ \ x4 = x1; \ x1 ^= x0; x0 ^= x3; x3 = ~x3; \ x4 &= x1; x0 |= x1; x3 ^= x2; \ x0 ^= x3; x1 ^= x3; x3 ^= x4; \ x1 |= x4; x4 ^= x2; x2 &= x0; \ x2 ^= x1; x1 |= x0; x0 = ~x0; \ x0 ^= x2; x4 ^= x1; \ }) #define S2(x0, x1, x2, x3, x4) ({ \ x3 = ~x3; \ x1 ^= x0; x4 = x0; x0 &= x2; \ x0 ^= x3; x3 |= x4; x2 ^= x1; \ x3 ^= x1; x1 &= x0; x0 ^= x2; \ x2 &= x3; x3 |= x1; x0 = ~x0; \ x3 ^= x0; x4 ^= x0; x0 ^= x2; \ x1 |= x2; \ }) #define S3(x0, x1, x2, x3, x4) ({ \ x4 = x1; \ x1 ^= x3; x3 |= x0; x4 &= x0; \ x0 ^= x2; x2 ^= x1; x1 &= x3; \ x2 ^= x3; x0 |= x4; x4 ^= x3; \ x1 ^= x0; x0 &= x3; x3 &= x4; \ x3 ^= x2; x4 |= x1; x2 &= x1; \ x4 ^= x3; x0 ^= x3; x3 ^= x2; \ }) #define S4(x0, x1, x2, x3, x4) ({ \ x4 = x3; \ x3 &= x0; x0 ^= x4; \ x3 ^= x2; x2 |= x4; x0 ^= x1; \ x4 ^= x3; x2 |= x0; \ x2 ^= x1; x1 &= x0; \ x1 ^= x4; x4 &= x2; x2 ^= x3; \ x4 ^= x0; x3 |= x1; x1 = ~x1; \ x3 ^= x0; \ }) #define S5(x0, x1, x2, x3, x4) ({ \ x4 = x1; x1 |= x0; \ x2 ^= x1; x3 = ~x3; x4 ^= x0; \ x0 ^= x2; x1 &= x4; x4 |= x3; \ x4 ^= x0; x0 &= x3; x1 ^= x3; \ x3 ^= x2; x0 ^= x1; x2 &= x4; \ x1 ^= x2; x2 &= x0; \ x3 ^= x2; \ }) #define S6(x0, x1, x2, x3, x4) ({ \ x4 = x1; \ x3 ^= x0; x1 ^= x2; x2 ^= x0; \ x0 &= x3; x1 |= x3; x4 = ~x4; \ x0 ^= x1; x1 ^= x2; \ x3 ^= x4; x4 ^= x0; x2 &= x0; \ x4 ^= x1; x2 ^= x3; x3 &= x1; \ x3 ^= x0; x1 ^= x2; \ }) #define S7(x0, x1, x2, x3, x4) ({ \ x1 = ~x1; \ x4 = x1; x0 = ~x0; x1 &= x2; \ x1 ^= x3; x3 |= x4; x4 ^= x2; \ x2 ^= x3; x3 ^= x0; x0 |= x1; \ x2 &= x0; x0 ^= x4; x4 ^= x3; \ x3 &= x0; x4 ^= x1; \ x2 ^= x4; x3 ^= x1; x4 |= x0; \ x4 ^= x1; \ }) #define SI0(x0, x1, x2, x3, x4) ({ \ x4 = x3; x1 ^= x0; \ x3 |= x1; x4 ^= x1; x0 = ~x0; \ x2 ^= x3; x3 ^= x0; x0 &= x1; \ x0 ^= x2; x2 &= x3; x3 ^= x4; \ x2 ^= x3; x1 ^= x3; x3 &= x0; \ x1 ^= x0; x0 ^= x2; x4 ^= x3; \ }) #define SI1(x0, x1, x2, x3, x4) ({ \ x1 ^= x3; x4 = x0; \ x0 ^= x2; x2 = ~x2; x4 |= x1; \ x4 ^= x3; x3 &= x1; x1 ^= x2; \ x2 &= x4; x4 ^= x1; x1 |= x3; \ x3 ^= x0; x2 ^= x0; x0 |= x4; \ x2 ^= x4; x1 ^= x0; \ x4 ^= x1; \ }) #define SI2(x0, x1, x2, x3, x4) ({ \ x2 ^= x1; x4 = x3; x3 = ~x3; \ x3 |= x2; x2 ^= x4; x4 ^= x0; \ x3 ^= x1; x1 |= x2; x2 ^= x0; \ x1 ^= x4; x4 |= x3; x2 ^= x3; \ x4 ^= x2; x2 &= x1; \ x2 ^= x3; x3 ^= x4; x4 ^= x0; \ }) #define SI3(x0, x1, x2, x3, x4) ({ \ x2 ^= x1; \ x4 = x1; x1 &= x2; \ x1 ^= x0; x0 |= x4; x4 ^= x3; \ x0 ^= x3; x3 |= x1; x1 ^= x2; \ x1 ^= x3; x0 ^= x2; x2 ^= x3; \ x3 &= x1; x1 ^= x0; x0 &= x2; \ x4 ^= x3; x3 ^= x0; x0 ^= x1; \ }) #define SI4(x0, x1, x2, x3, x4) ({ \ x2 ^= x3; x4 = x0; x0 &= x1; \ x0 ^= x2; x2 |= x3; x4 = ~x4; \ x1 ^= x0; x0 ^= x2; x2 &= x4; \ x2 ^= x0; x0 |= x4; \ x0 ^= x3; x3 &= x2; \ x4 ^= x3; x3 ^= x1; x1 &= x0; \ x4 ^= x1; x0 ^= x3; \ }) #define SI5(x0, x1, x2, x3, x4) ({ \ x4 = x1; x1 |= x2; \ x2 ^= x4; x1 ^= x3; x3 &= x4; \ x2 ^= x3; x3 |= x0; x0 = ~x0; \ x3 ^= x2; x2 |= x0; x4 ^= x1; \ x2 ^= x4; x4 &= x0; x0 ^= x1; \ x1 ^= x3; x0 &= x2; x2 ^= x3; \ x0 ^= x2; x2 ^= x4; x4 ^= x3; \ }) #define SI6(x0, x1, x2, x3, x4) ({ \ x0 ^= x2; \ x4 = x0; x0 &= x3; x2 ^= x3; \ x0 ^= x2; x3 ^= x1; x2 |= x4; \ x2 ^= x3; x3 &= x0; x0 = ~x0; \ x3 ^= x1; x1 &= x2; x4 ^= x0; \ x3 ^= x4; x4 ^= x2; x0 ^= x1; \ x2 ^= x0; \ }) #define SI7(x0, x1, x2, x3, x4) ({ \ x4 = x3; x3 &= x0; x0 ^= x2; \ x2 |= x4; x4 ^= x1; x0 = ~x0; \ x1 |= x3; x4 ^= x0; x0 &= x2; \ x0 ^= x1; x1 &= x2; x3 ^= x2; \ x4 ^= x3; x2 &= x3; x3 |= x0; \ x1 ^= x4; x3 ^= x4; x4 &= x0; \ x4 ^= x2; \ }) /* * both gcc and clang have misoptimized this function in the past, * producing horrible object code from spilling temporary variables * on the stack. Forcing this part out of line avoids that. */ static noinline void __serpent_setkey_sbox(u32 r0, u32 r1, u32 r2, u32 r3, u32 r4, u32 *k) { k += 100; S3(r3, r4, r0, r1, r2); store_and_load_keys(r1, r2, r4, r3, 28, 24); S4(r1, r2, r4, r3, r0); store_and_load_keys(r2, r4, r3, r0, 24, 20); S5(r2, r4, r3, r0, r1); store_and_load_keys(r1, r2, r4, r0, 20, 16); S6(r1, r2, r4, r0, r3); store_and_load_keys(r4, r3, r2, r0, 16, 12); S7(r4, r3, r2, r0, r1); store_and_load_keys(r1, r2, r0, r4, 12, 8); S0(r1, r2, r0, r4, r3); store_and_load_keys(r0, r2, r4, r1, 8, 4); S1(r0, r2, r4, r1, r3); store_and_load_keys(r3, r4, r1, r0, 4, 0); S2(r3, r4, r1, r0, r2); store_and_load_keys(r2, r4, r3, r0, 0, -4); S3(r2, r4, r3, r0, r1); store_and_load_keys(r0, r1, r4, r2, -4, -8); S4(r0, r1, r4, r2, r3); store_and_load_keys(r1, r4, r2, r3, -8, -12); S5(r1, r4, r2, r3, r0); store_and_load_keys(r0, r1, r4, r3, -12, -16); S6(r0, r1, r4, r3, r2); store_and_load_keys(r4, r2, r1, r3, -16, -20); S7(r4, r2, r1, r3, r0); store_and_load_keys(r0, r1, r3, r4, -20, -24); S0(r0, r1, r3, r4, r2); store_and_load_keys(r3, r1, r4, r0, -24, -28); k -= 50; S1(r3, r1, r4, r0, r2); store_and_load_keys(r2, r4, r0, r3, 22, 18); S2(r2, r4, r0, r3, r1); store_and_load_keys(r1, r4, r2, r3, 18, 14); S3(r1, r4, r2, r3, r0); store_and_load_keys(r3, r0, r4, r1, 14, 10); S4(r3, r0, r4, r1, r2); store_and_load_keys(r0, r4, r1, r2, 10, 6); S5(r0, r4, r1, r2, r3); store_and_load_keys(r3, r0, r4, r2, 6, 2); S6(r3, r0, r4, r2, r1); store_and_load_keys(r4, r1, r0, r2, 2, -2); S7(r4, r1, r0, r2, r3); store_and_load_keys(r3, r0, r2, r4, -2, -6); S0(r3, r0, r2, r4, r1); store_and_load_keys(r2, r0, r4, r3, -6, -10); S1(r2, r0, r4, r3, r1); store_and_load_keys(r1, r4, r3, r2, -10, -14); S2(r1, r4, r3, r2, r0); store_and_load_keys(r0, r4, r1, r2, -14, -18); S3(r0, r4, r1, r2, r3); store_and_load_keys(r2, r3, r4, r0, -18, -22); k -= 50; S4(r2, r3, r4, r0, r1); store_and_load_keys(r3, r4, r0, r1, 28, 24); S5(r3, r4, r0, r1, r2); store_and_load_keys(r2, r3, r4, r1, 24, 20); S6(r2, r3, r4, r1, r0); store_and_load_keys(r4, r0, r3, r1, 20, 16); S7(r4, r0, r3, r1, r2); store_and_load_keys(r2, r3, r1, r4, 16, 12); S0(r2, r3, r1, r4, r0); store_and_load_keys(r1, r3, r4, r2, 12, 8); S1(r1, r3, r4, r2, r0); store_and_load_keys(r0, r4, r2, r1, 8, 4); S2(r0, r4, r2, r1, r3); store_and_load_keys(r3, r4, r0, r1, 4, 0); S3(r3, r4, r0, r1, r2); storekeys(r1, r2, r4, r3, 0); } int __serpent_setkey(struct serpent_ctx *ctx, const u8 *key, unsigned int keylen) { u32 *k = ctx->expkey; u8 *k8 = (u8 *)k; u32 r0, r1, r2, r3, r4; __le32 *lk; int i; /* Copy key, add padding */ for (i = 0; i < keylen; ++i) k8[i] = key[i]; if (i < SERPENT_MAX_KEY_SIZE) k8[i++] = 1; while (i < SERPENT_MAX_KEY_SIZE) k8[i++] = 0; lk = (__le32 *)k; k[0] = le32_to_cpu(lk[0]); k[1] = le32_to_cpu(lk[1]); k[2] = le32_to_cpu(lk[2]); k[3] = le32_to_cpu(lk[3]); k[4] = le32_to_cpu(lk[4]); k[5] = le32_to_cpu(lk[5]); k[6] = le32_to_cpu(lk[6]); k[7] = le32_to_cpu(lk[7]); /* Expand key using polynomial */ r0 = k[3]; r1 = k[4]; r2 = k[5]; r3 = k[6]; r4 = k[7]; keyiter(k[0], r0, r4, r2, 0, 0); keyiter(k[1], r1, r0, r3, 1, 1); keyiter(k[2], r2, r1, r4, 2, 2); keyiter(k[3], r3, r2, r0, 3, 3); keyiter(k[4], r4, r3, r1, 4, 4); keyiter(k[5], r0, r4, r2, 5, 5); keyiter(k[6], r1, r0, r3, 6, 6); keyiter(k[7], r2, r1, r4, 7, 7); keyiter(k[0], r3, r2, r0, 8, 8); keyiter(k[1], r4, r3, r1, 9, 9); keyiter(k[2], r0, r4, r2, 10, 10); keyiter(k[3], r1, r0, r3, 11, 11); keyiter(k[4], r2, r1, r4, 12, 12); keyiter(k[5], r3, r2, r0, 13, 13); keyiter(k[6], r4, r3, r1, 14, 14); keyiter(k[7], r0, r4, r2, 15, 15); keyiter(k[8], r1, r0, r3, 16, 16); keyiter(k[9], r2, r1, r4, 17, 17); keyiter(k[10], r3, r2, r0, 18, 18); keyiter(k[11], r4, r3, r1, 19, 19); keyiter(k[12], r0, r4, r2, 20, 20); keyiter(k[13], r1, r0, r3, 21, 21); keyiter(k[14], r2, r1, r4, 22, 22); keyiter(k[15], r3, r2, r0, 23, 23); keyiter(k[16], r4, r3, r1, 24, 24); keyiter(k[17], r0, r4, r2, 25, 25); keyiter(k[18], r1, r0, r3, 26, 26); keyiter(k[19], r2, r1, r4, 27, 27); keyiter(k[20], r3, r2, r0, 28, 28); keyiter(k[21], r4, r3, r1, 29, 29); keyiter(k[22], r0, r4, r2, 30, 30); keyiter(k[23], r1, r0, r3, 31, 31); k += 50; keyiter(k[-26], r2, r1, r4, 32, -18); keyiter(k[-25], r3, r2, r0, 33, -17); keyiter(k[-24], r4, r3, r1, 34, -16); keyiter(k[-23], r0, r4, r2, 35, -15); keyiter(k[-22], r1, r0, r3, 36, -14); keyiter(k[-21], r2, r1, r4, 37, -13); keyiter(k[-20], r3, r2, r0, 38, -12); keyiter(k[-19], r4, r3, r1, 39, -11); keyiter(k[-18], r0, r4, r2, 40, -10); keyiter(k[-17], r1, r0, r3, 41, -9); keyiter(k[-16], r2, r1, r4, 42, -8); keyiter(k[-15], r3, r2, r0, 43, -7); keyiter(k[-14], r4, r3, r1, 44, -6); keyiter(k[-13], r0, r4, r2, 45, -5); keyiter(k[-12], r1, r0, r3, 46, -4); keyiter(k[-11], r2, r1, r4, 47, -3); keyiter(k[-10], r3, r2, r0, 48, -2); keyiter(k[-9], r4, r3, r1, 49, -1); keyiter(k[-8], r0, r4, r2, 50, 0); keyiter(k[-7], r1, r0, r3, 51, 1); keyiter(k[-6], r2, r1, r4, 52, 2); keyiter(k[-5], r3, r2, r0, 53, 3); keyiter(k[-4], r4, r3, r1, 54, 4); keyiter(k[-3], r0, r4, r2, 55, 5); keyiter(k[-2], r1, r0, r3, 56, 6); keyiter(k[-1], r2, r1, r4, 57, 7); keyiter(k[0], r3, r2, r0, 58, 8); keyiter(k[1], r4, r3, r1, 59, 9); keyiter(k[2], r0, r4, r2, 60, 10); keyiter(k[3], r1, r0, r3, 61, 11); keyiter(k[4], r2, r1, r4, 62, 12); keyiter(k[5], r3, r2, r0, 63, 13); keyiter(k[6], r4, r3, r1, 64, 14); keyiter(k[7], r0, r4, r2, 65, 15); keyiter(k[8], r1, r0, r3, 66, 16); keyiter(k[9], r2, r1, r4, 67, 17); keyiter(k[10], r3, r2, r0, 68, 18); keyiter(k[11], r4, r3, r1, 69, 19); keyiter(k[12], r0, r4, r2, 70, 20); keyiter(k[13], r1, r0, r3, 71, 21); keyiter(k[14], r2, r1, r4, 72, 22); keyiter(k[15], r3, r2, r0, 73, 23); keyiter(k[16], r4, r3, r1, 74, 24); keyiter(k[17], r0, r4, r2, 75, 25); keyiter(k[18], r1, r0, r3, 76, 26); keyiter(k[19], r2, r1, r4, 77, 27); keyiter(k[20], r3, r2, r0, 78, 28); keyiter(k[21], r4, r3, r1, 79, 29); keyiter(k[22], r0, r4, r2, 80, 30); keyiter(k[23], r1, r0, r3, 81, 31); k += 50; keyiter(k[-26], r2, r1, r4, 82, -18); keyiter(k[-25], r3, r2, r0, 83, -17); keyiter(k[-24], r4, r3, r1, 84, -16); keyiter(k[-23], r0, r4, r2, 85, -15); keyiter(k[-22], r1, r0, r3, 86, -14); keyiter(k[-21], r2, r1, r4, 87, -13); keyiter(k[-20], r3, r2, r0, 88, -12); keyiter(k[-19], r4, r3, r1, 89, -11); keyiter(k[-18], r0, r4, r2, 90, -10); keyiter(k[-17], r1, r0, r3, 91, -9); keyiter(k[-16], r2, r1, r4, 92, -8); keyiter(k[-15], r3, r2, r0, 93, -7); keyiter(k[-14], r4, r3, r1, 94, -6); keyiter(k[-13], r0, r4, r2, 95, -5); keyiter(k[-12], r1, r0, r3, 96, -4); keyiter(k[-11], r2, r1, r4, 97, -3); keyiter(k[-10], r3, r2, r0, 98, -2); keyiter(k[-9], r4, r3, r1, 99, -1); keyiter(k[-8], r0, r4, r2, 100, 0); keyiter(k[-7], r1, r0, r3, 101, 1); keyiter(k[-6], r2, r1, r4, 102, 2); keyiter(k[-5], r3, r2, r0, 103, 3); keyiter(k[-4], r4, r3, r1, 104, 4); keyiter(k[-3], r0, r4, r2, 105, 5); keyiter(k[-2], r1, r0, r3, 106, 6); keyiter(k[-1], r2, r1, r4, 107, 7); keyiter(k[0], r3, r2, r0, 108, 8); keyiter(k[1], r4, r3, r1, 109, 9); keyiter(k[2], r0, r4, r2, 110, 10); keyiter(k[3], r1, r0, r3, 111, 11); keyiter(k[4], r2, r1, r4, 112, 12); keyiter(k[5], r3, r2, r0, 113, 13); keyiter(k[6], r4, r3, r1, 114, 14); keyiter(k[7], r0, r4, r2, 115, 15); keyiter(k[8], r1, r0, r3, 116, 16); keyiter(k[9], r2, r1, r4, 117, 17); keyiter(k[10], r3, r2, r0, 118, 18); keyiter(k[11], r4, r3, r1, 119, 19); keyiter(k[12], r0, r4, r2, 120, 20); keyiter(k[13], r1, r0, r3, 121, 21); keyiter(k[14], r2, r1, r4, 122, 22); keyiter(k[15], r3, r2, r0, 123, 23); keyiter(k[16], r4, r3, r1, 124, 24); keyiter(k[17], r0, r4, r2, 125, 25); keyiter(k[18], r1, r0, r3, 126, 26); keyiter(k[19], r2, r1, r4, 127, 27); keyiter(k[20], r3, r2, r0, 128, 28); keyiter(k[21], r4, r3, r1, 129, 29); keyiter(k[22], r0, r4, r2, 130, 30); keyiter(k[23], r1, r0, r3, 131, 31); /* Apply S-boxes */ __serpent_setkey_sbox(r0, r1, r2, r3, r4, ctx->expkey); return 0; } EXPORT_SYMBOL_GPL(__serpent_setkey); int serpent_setkey(struct crypto_tfm *tfm, const u8 *key, unsigned int keylen) { return __serpent_setkey(crypto_tfm_ctx(tfm), key, keylen); } EXPORT_SYMBOL_GPL(serpent_setkey); void __serpent_encrypt(const void *c, u8 *dst, const u8 *src) { const struct serpent_ctx *ctx = c; const u32 *k = ctx->expkey; u32 r0, r1, r2, r3, r4; r0 = get_unaligned_le32(src); r1 = get_unaligned_le32(src + 4); r2 = get_unaligned_le32(src + 8); r3 = get_unaligned_le32(src + 12); K(r0, r1, r2, r3, 0); S0(r0, r1, r2, r3, r4); LK(r2, r1, r3, r0, r4, 1); S1(r2, r1, r3, r0, r4); LK(r4, r3, r0, r2, r1, 2); S2(r4, r3, r0, r2, r1); LK(r1, r3, r4, r2, r0, 3); S3(r1, r3, r4, r2, r0); LK(r2, r0, r3, r1, r4, 4); S4(r2, r0, r3, r1, r4); LK(r0, r3, r1, r4, r2, 5); S5(r0, r3, r1, r4, r2); LK(r2, r0, r3, r4, r1, 6); S6(r2, r0, r3, r4, r1); LK(r3, r1, r0, r4, r2, 7); S7(r3, r1, r0, r4, r2); LK(r2, r0, r4, r3, r1, 8); S0(r2, r0, r4, r3, r1); LK(r4, r0, r3, r2, r1, 9); S1(r4, r0, r3, r2, r1); LK(r1, r3, r2, r4, r0, 10); S2(r1, r3, r2, r4, r0); LK(r0, r3, r1, r4, r2, 11); S3(r0, r3, r1, r4, r2); LK(r4, r2, r3, r0, r1, 12); S4(r4, r2, r3, r0, r1); LK(r2, r3, r0, r1, r4, 13); S5(r2, r3, r0, r1, r4); LK(r4, r2, r3, r1, r0, 14); S6(r4, r2, r3, r1, r0); LK(r3, r0, r2, r1, r4, 15); S7(r3, r0, r2, r1, r4); LK(r4, r2, r1, r3, r0, 16); S0(r4, r2, r1, r3, r0); LK(r1, r2, r3, r4, r0, 17); S1(r1, r2, r3, r4, r0); LK(r0, r3, r4, r1, r2, 18); S2(r0, r3, r4, r1, r2); LK(r2, r3, r0, r1, r4, 19); S3(r2, r3, r0, r1, r4); LK(r1, r4, r3, r2, r0, 20); S4(r1, r4, r3, r2, r0); LK(r4, r3, r2, r0, r1, 21); S5(r4, r3, r2, r0, r1); LK(r1, r4, r3, r0, r2, 22); S6(r1, r4, r3, r0, r2); LK(r3, r2, r4, r0, r1, 23); S7(r3, r2, r4, r0, r1); LK(r1, r4, r0, r3, r2, 24); S0(r1, r4, r0, r3, r2); LK(r0, r4, r3, r1, r2, 25); S1(r0, r4, r3, r1, r2); LK(r2, r3, r1, r0, r4, 26); S2(r2, r3, r1, r0, r4); LK(r4, r3, r2, r0, r1, 27); S3(r4, r3, r2, r0, r1); LK(r0, r1, r3, r4, r2, 28); S4(r0, r1, r3, r4, r2); LK(r1, r3, r4, r2, r0, 29); S5(r1, r3, r4, r2, r0); LK(r0, r1, r3, r2, r4, 30); S6(r0, r1, r3, r2, r4); LK(r3, r4, r1, r2, r0, 31); S7(r3, r4, r1, r2, r0); K(r0, r1, r2, r3, 32); put_unaligned_le32(r0, dst); put_unaligned_le32(r1, dst + 4); put_unaligned_le32(r2, dst + 8); put_unaligned_le32(r3, dst + 12); } EXPORT_SYMBOL_GPL(__serpent_encrypt); static void serpent_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src) { struct serpent_ctx *ctx = crypto_tfm_ctx(tfm); __serpent_encrypt(ctx, dst, src); } void __serpent_decrypt(const void *c, u8 *dst, const u8 *src) { const struct serpent_ctx *ctx = c; const u32 *k = ctx->expkey; u32 r0, r1, r2, r3, r4; r0 = get_unaligned_le32(src); r1 = get_unaligned_le32(src + 4); r2 = get_unaligned_le32(src + 8); r3 = get_unaligned_le32(src + 12); K(r0, r1, r2, r3, 32); SI7(r0, r1, r2, r3, r4); KL(r1, r3, r0, r4, r2, 31); SI6(r1, r3, r0, r4, r2); KL(r0, r2, r4, r1, r3, 30); SI5(r0, r2, r4, r1, r3); KL(r2, r3, r0, r4, r1, 29); SI4(r2, r3, r0, r4, r1); KL(r2, r0, r1, r4, r3, 28); SI3(r2, r0, r1, r4, r3); KL(r1, r2, r3, r4, r0, 27); SI2(r1, r2, r3, r4, r0); KL(r2, r0, r4, r3, r1, 26); SI1(r2, r0, r4, r3, r1); KL(r1, r0, r4, r3, r2, 25); SI0(r1, r0, r4, r3, r2); KL(r4, r2, r0, r1, r3, 24); SI7(r4, r2, r0, r1, r3); KL(r2, r1, r4, r3, r0, 23); SI6(r2, r1, r4, r3, r0); KL(r4, r0, r3, r2, r1, 22); SI5(r4, r0, r3, r2, r1); KL(r0, r1, r4, r3, r2, 21); SI4(r0, r1, r4, r3, r2); KL(r0, r4, r2, r3, r1, 20); SI3(r0, r4, r2, r3, r1); KL(r2, r0, r1, r3, r4, 19); SI2(r2, r0, r1, r3, r4); KL(r0, r4, r3, r1, r2, 18); SI1(r0, r4, r3, r1, r2); KL(r2, r4, r3, r1, r0, 17); SI0(r2, r4, r3, r1, r0); KL(r3, r0, r4, r2, r1, 16); SI7(r3, r0, r4, r2, r1); KL(r0, r2, r3, r1, r4, 15); SI6(r0, r2, r3, r1, r4); KL(r3, r4, r1, r0, r2, 14); SI5(r3, r4, r1, r0, r2); KL(r4, r2, r3, r1, r0, 13); SI4(r4, r2, r3, r1, r0); KL(r4, r3, r0, r1, r2, 12); SI3(r4, r3, r0, r1, r2); KL(r0, r4, r2, r1, r3, 11); SI2(r0, r4, r2, r1, r3); KL(r4, r3, r1, r2, r0, 10); SI1(r4, r3, r1, r2, r0); KL(r0, r3, r1, r2, r4, 9); SI0(r0, r3, r1, r2, r4); KL(r1, r4, r3, r0, r2, 8); SI7(r1, r4, r3, r0, r2); KL(r4, r0, r1, r2, r3, 7); SI6(r4, r0, r1, r2, r3); KL(r1, r3, r2, r4, r0, 6); SI5(r1, r3, r2, r4, r0); KL(r3, r0, r1, r2, r4, 5); SI4(r3, r0, r1, r2, r4); KL(r3, r1, r4, r2, r0, 4); SI3(r3, r1, r4, r2, r0); KL(r4, r3, r0, r2, r1, 3); SI2(r4, r3, r0, r2, r1); KL(r3, r1, r2, r0, r4, 2); SI1(r3, r1, r2, r0, r4); KL(r4, r1, r2, r0, r3, 1); SI0(r4, r1, r2, r0, r3); K(r2, r3, r1, r4, 0); put_unaligned_le32(r2, dst); put_unaligned_le32(r3, dst + 4); put_unaligned_le32(r1, dst + 8); put_unaligned_le32(r4, dst + 12); } EXPORT_SYMBOL_GPL(__serpent_decrypt); static void serpent_decrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src) { struct serpent_ctx *ctx = crypto_tfm_ctx(tfm); __serpent_decrypt(ctx, dst, src); } static struct crypto_alg srp_alg = { .cra_name = "serpent", .cra_driver_name = "serpent-generic", .cra_priority = 100, .cra_flags = CRYPTO_ALG_TYPE_CIPHER, .cra_blocksize = SERPENT_BLOCK_SIZE, .cra_ctxsize = sizeof(struct serpent_ctx), .cra_module = THIS_MODULE, .cra_u = { .cipher = { .cia_min_keysize = SERPENT_MIN_KEY_SIZE, .cia_max_keysize = SERPENT_MAX_KEY_SIZE, .cia_setkey = serpent_setkey, .cia_encrypt = serpent_encrypt, .cia_decrypt = serpent_decrypt } } }; static int __init serpent_mod_init(void) { return crypto_register_alg(&srp_alg); } static void __exit serpent_mod_fini(void) { crypto_unregister_alg(&srp_alg); } module_init(serpent_mod_init); module_exit(serpent_mod_fini); MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("Serpent Cipher Algorithm"); MODULE_AUTHOR("Dag Arne Osvik <osvik@ii.uib.no>"); MODULE_ALIAS_CRYPTO("serpent"); MODULE_ALIAS_CRYPTO("serpent-generic"); |
| 4 6 6 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 | // SPDX-License-Identifier: GPL-2.0-or-later /* * LAPB release 002 * * This code REQUIRES 2.1.15 or higher/ NET3.038 * * History * LAPB 001 Jonathan Naylor Started Coding * LAPB 002 Jonathan Naylor New timer architecture. */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include <linux/errno.h> #include <linux/types.h> #include <linux/socket.h> #include <linux/in.h> #include <linux/kernel.h> #include <linux/jiffies.h> #include <linux/timer.h> #include <linux/string.h> #include <linux/sockios.h> #include <linux/net.h> #include <linux/inet.h> #include <linux/skbuff.h> #include <net/sock.h> #include <linux/uaccess.h> #include <linux/fcntl.h> #include <linux/mm.h> #include <linux/interrupt.h> #include <net/lapb.h> static void lapb_t1timer_expiry(struct timer_list *); static void lapb_t2timer_expiry(struct timer_list *); void lapb_start_t1timer(struct lapb_cb *lapb) { timer_delete(&lapb->t1timer); lapb->t1timer.function = lapb_t1timer_expiry; lapb->t1timer.expires = jiffies + lapb->t1; lapb->t1timer_running = true; add_timer(&lapb->t1timer); } void lapb_start_t2timer(struct lapb_cb *lapb) { timer_delete(&lapb->t2timer); lapb->t2timer.function = lapb_t2timer_expiry; lapb->t2timer.expires = jiffies + lapb->t2; lapb->t2timer_running = true; add_timer(&lapb->t2timer); } void lapb_stop_t1timer(struct lapb_cb *lapb) { lapb->t1timer_running = false; timer_delete(&lapb->t1timer); } void lapb_stop_t2timer(struct lapb_cb *lapb) { lapb->t2timer_running = false; timer_delete(&lapb->t2timer); } int lapb_t1timer_running(struct lapb_cb *lapb) { return lapb->t1timer_running; } static void lapb_t2timer_expiry(struct timer_list *t) { struct lapb_cb *lapb = timer_container_of(lapb, t, t2timer); spin_lock_bh(&lapb->lock); if (timer_pending(&lapb->t2timer)) /* A new timer has been set up */ goto out; if (!lapb->t2timer_running) /* The timer has been stopped */ goto out; if (lapb->condition & LAPB_ACK_PENDING_CONDITION) { lapb->condition &= ~LAPB_ACK_PENDING_CONDITION; lapb_timeout_response(lapb); } lapb->t2timer_running = false; out: spin_unlock_bh(&lapb->lock); } static void lapb_t1timer_expiry(struct timer_list *t) { struct lapb_cb *lapb = timer_container_of(lapb, t, t1timer); spin_lock_bh(&lapb->lock); if (timer_pending(&lapb->t1timer)) /* A new timer has been set up */ goto out; if (!lapb->t1timer_running) /* The timer has been stopped */ goto out; switch (lapb->state) { /* * If we are a DCE, send DM up to N2 times, then switch to * STATE_1 and send SABM(E). */ case LAPB_STATE_0: if (lapb->mode & LAPB_DCE && lapb->n2count != lapb->n2) { lapb->n2count++; lapb_send_control(lapb, LAPB_DM, LAPB_POLLOFF, LAPB_RESPONSE); } else { lapb->state = LAPB_STATE_1; lapb_establish_data_link(lapb); } break; /* * Awaiting connection state, send SABM(E), up to N2 times. */ case LAPB_STATE_1: if (lapb->n2count == lapb->n2) { lapb_clear_queues(lapb); lapb->state = LAPB_STATE_0; lapb_disconnect_indication(lapb, LAPB_TIMEDOUT); lapb_dbg(0, "(%p) S1 -> S0\n", lapb->dev); lapb->t1timer_running = false; goto out; } else { lapb->n2count++; if (lapb->mode & LAPB_EXTENDED) { lapb_dbg(1, "(%p) S1 TX SABME(1)\n", lapb->dev); lapb_send_control(lapb, LAPB_SABME, LAPB_POLLON, LAPB_COMMAND); } else { lapb_dbg(1, "(%p) S1 TX SABM(1)\n", lapb->dev); lapb_send_control(lapb, LAPB_SABM, LAPB_POLLON, LAPB_COMMAND); } } break; /* * Awaiting disconnection state, send DISC, up to N2 times. */ case LAPB_STATE_2: if (lapb->n2count == lapb->n2) { lapb_clear_queues(lapb); lapb->state = LAPB_STATE_0; lapb_disconnect_confirmation(lapb, LAPB_TIMEDOUT); lapb_dbg(0, "(%p) S2 -> S0\n", lapb->dev); lapb->t1timer_running = false; goto out; } else { lapb->n2count++; lapb_dbg(1, "(%p) S2 TX DISC(1)\n", lapb->dev); lapb_send_control(lapb, LAPB_DISC, LAPB_POLLON, LAPB_COMMAND); } break; /* * Data transfer state, restransmit I frames, up to N2 times. */ case LAPB_STATE_3: if (lapb->n2count == lapb->n2) { lapb_clear_queues(lapb); lapb->state = LAPB_STATE_0; lapb_stop_t2timer(lapb); lapb_disconnect_indication(lapb, LAPB_TIMEDOUT); lapb_dbg(0, "(%p) S3 -> S0\n", lapb->dev); lapb->t1timer_running = false; goto out; } else { lapb->n2count++; lapb_requeue_frames(lapb); lapb_kick(lapb); } break; /* * Frame reject state, restransmit FRMR frames, up to N2 times. */ case LAPB_STATE_4: if (lapb->n2count == lapb->n2) { lapb_clear_queues(lapb); lapb->state = LAPB_STATE_0; lapb_disconnect_indication(lapb, LAPB_TIMEDOUT); lapb_dbg(0, "(%p) S4 -> S0\n", lapb->dev); lapb->t1timer_running = false; goto out; } else { lapb->n2count++; lapb_transmit_frmr(lapb); } break; } lapb_start_t1timer(lapb); out: spin_unlock_bh(&lapb->lock); } |
| 8 15 3 14 18 18 11 13 8 18 18 18 18 18 18 15 15 15 8 7 15 15 8 8 1 1 15 15 15 15 17 20 20 20 3 3 20 2 2 20 9 11 20 1 19 9 1 8 1 18 4 1 17 17 1 18 18 18 18 20 15 15 15 3 3 3 12 12 3 3 3 12 12 11 5 5 5 8 1 1 8 8 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 | // SPDX-License-Identifier: GPL-2.0-only /* * xt_hashlimit - Netfilter module to limit the number of packets per time * separately for each hashbucket (sourceip/sourceport/dstip/dstport) * * (C) 2003-2004 by Harald Welte <laforge@netfilter.org> * (C) 2006-2012 Patrick McHardy <kaber@trash.net> * Copyright © CC Computer Consultants GmbH, 2007 - 2008 * * Development of this code was funded by Astaro AG, http://www.astaro.com/ */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include <linux/module.h> #include <linux/spinlock.h> #include <linux/random.h> #include <linux/jhash.h> #include <linux/slab.h> #include <linux/proc_fs.h> #include <linux/seq_file.h> #include <linux/list.h> #include <linux/skbuff.h> #include <linux/mm.h> #include <linux/in.h> #include <linux/ip.h> #if IS_ENABLED(CONFIG_IP6_NF_IPTABLES) #include <linux/ipv6.h> #include <net/ipv6.h> #endif #include <net/net_namespace.h> #include <net/netns/generic.h> #include <linux/netfilter/x_tables.h> #include <linux/netfilter_ipv4/ip_tables.h> #include <linux/netfilter_ipv6/ip6_tables.h> #include <linux/mutex.h> #include <linux/kernel.h> #include <linux/refcount.h> #include <uapi/linux/netfilter/xt_hashlimit.h> #define XT_HASHLIMIT_ALL (XT_HASHLIMIT_HASH_DIP | XT_HASHLIMIT_HASH_DPT | \ XT_HASHLIMIT_HASH_SIP | XT_HASHLIMIT_HASH_SPT | \ XT_HASHLIMIT_INVERT | XT_HASHLIMIT_BYTES |\ XT_HASHLIMIT_RATE_MATCH) MODULE_LICENSE("GPL"); MODULE_AUTHOR("Harald Welte <laforge@netfilter.org>"); MODULE_AUTHOR("Jan Engelhardt <jengelh@medozas.de>"); MODULE_DESCRIPTION("Xtables: per hash-bucket rate-limit match"); MODULE_ALIAS("ipt_hashlimit"); MODULE_ALIAS("ip6t_hashlimit"); struct hashlimit_net { struct hlist_head htables; struct proc_dir_entry *ipt_hashlimit; struct proc_dir_entry *ip6t_hashlimit; }; static unsigned int hashlimit_net_id; static inline struct hashlimit_net *hashlimit_pernet(struct net *net) { return net_generic(net, hashlimit_net_id); } /* need to declare this at the top */ static const struct seq_operations dl_seq_ops_v2; static const struct seq_operations dl_seq_ops_v1; static const struct seq_operations dl_seq_ops; /* hash table crap */ struct dsthash_dst { union { struct { __be32 src; __be32 dst; } ip; #if IS_ENABLED(CONFIG_IP6_NF_IPTABLES) struct { __be32 src[4]; __be32 dst[4]; } ip6; #endif }; __be16 src_port; __be16 dst_port; }; struct dsthash_ent { /* static / read-only parts in the beginning */ struct hlist_node node; struct dsthash_dst dst; /* modified structure members in the end */ spinlock_t lock; unsigned long expires; /* precalculated expiry time */ struct { unsigned long prev; /* last modification */ union { struct { u_int64_t credit; u_int64_t credit_cap; u_int64_t cost; }; struct { u_int32_t interval, prev_window; u_int64_t current_rate; u_int64_t rate; int64_t burst; }; }; } rateinfo; struct rcu_head rcu; }; struct xt_hashlimit_htable { struct hlist_node node; /* global list of all htables */ refcount_t use; u_int8_t family; bool rnd_initialized; struct hashlimit_cfg3 cfg; /* config */ /* used internally */ spinlock_t lock; /* lock for list_head */ u_int32_t rnd; /* random seed for hash */ unsigned int count; /* number entries in table */ struct delayed_work gc_work; /* seq_file stuff */ struct proc_dir_entry *pde; const char *name; struct net *net; struct hlist_head hash[]; /* hashtable itself */ }; static int cfg_copy(struct hashlimit_cfg3 *to, const void *from, int revision) { if (revision == 1) { struct hashlimit_cfg1 *cfg = (struct hashlimit_cfg1 *)from; to->mode = cfg->mode; to->avg = cfg->avg; to->burst = cfg->burst; to->size = cfg->size; to->max = cfg->max; to->gc_interval = cfg->gc_interval; to->expire = cfg->expire; to->srcmask = cfg->srcmask; to->dstmask = cfg->dstmask; } else if (revision == 2) { struct hashlimit_cfg2 *cfg = (struct hashlimit_cfg2 *)from; to->mode = cfg->mode; to->avg = cfg->avg; to->burst = cfg->burst; to->size = cfg->size; to->max = cfg->max; to->gc_interval = cfg->gc_interval; to->expire = cfg->expire; to->srcmask = cfg->srcmask; to->dstmask = cfg->dstmask; } else if (revision == 3) { memcpy(to, from, sizeof(struct hashlimit_cfg3)); } else { return -EINVAL; } return 0; } static DEFINE_MUTEX(hashlimit_mutex); /* protects htables list */ static struct kmem_cache *hashlimit_cachep __read_mostly; static inline bool dst_cmp(const struct dsthash_ent *ent, const struct dsthash_dst *b) { return !memcmp(&ent->dst, b, sizeof(ent->dst)); } static u_int32_t hash_dst(const struct xt_hashlimit_htable *ht, const struct dsthash_dst *dst) { u_int32_t hash = jhash2((const u32 *)dst, sizeof(*dst)/sizeof(u32), ht->rnd); /* * Instead of returning hash % ht->cfg.size (implying a divide) * we return the high 32 bits of the (hash * ht->cfg.size) that will * give results between [0 and cfg.size-1] and same hash distribution, * but using a multiply, less expensive than a divide */ return reciprocal_scale(hash, ht->cfg.size); } static struct dsthash_ent * dsthash_find(const struct xt_hashlimit_htable *ht, const struct dsthash_dst *dst) { struct dsthash_ent *ent; u_int32_t hash = hash_dst(ht, dst); if (!hlist_empty(&ht->hash[hash])) { hlist_for_each_entry_rcu(ent, &ht->hash[hash], node) if (dst_cmp(ent, dst)) { spin_lock(&ent->lock); return ent; } } return NULL; } /* allocate dsthash_ent, initialize dst, put in htable and lock it */ static struct dsthash_ent * dsthash_alloc_init(struct xt_hashlimit_htable *ht, const struct dsthash_dst *dst, bool *race) { struct dsthash_ent *ent; spin_lock(&ht->lock); /* Two or more packets may race to create the same entry in the * hashtable, double check if this packet lost race. */ ent = dsthash_find(ht, dst); if (ent != NULL) { spin_unlock(&ht->lock); *race = true; return ent; } /* initialize hash with random val at the time we allocate * the first hashtable entry */ if (unlikely(!ht->rnd_initialized)) { get_random_bytes(&ht->rnd, sizeof(ht->rnd)); ht->rnd_initialized = true; } if (ht->cfg.max && ht->count >= ht->cfg.max) { /* FIXME: do something. question is what.. */ net_err_ratelimited("max count of %u reached\n", ht->cfg.max); ent = NULL; } else ent = kmem_cache_alloc(hashlimit_cachep, GFP_ATOMIC); if (ent) { memcpy(&ent->dst, dst, sizeof(ent->dst)); spin_lock_init(&ent->lock); spin_lock(&ent->lock); hlist_add_head_rcu(&ent->node, &ht->hash[hash_dst(ht, dst)]); ht->count++; } spin_unlock(&ht->lock); return ent; } static void dsthash_free_rcu(struct rcu_head *head) { struct dsthash_ent *ent = container_of(head, struct dsthash_ent, rcu); kmem_cache_free(hashlimit_cachep, ent); } static inline void dsthash_free(struct xt_hashlimit_htable *ht, struct dsthash_ent *ent) { hlist_del_rcu(&ent->node); call_rcu(&ent->rcu, dsthash_free_rcu); ht->count--; } static void htable_gc(struct work_struct *work); static int htable_create(struct net *net, struct hashlimit_cfg3 *cfg, const char *name, u_int8_t family, struct xt_hashlimit_htable **out_hinfo, int revision) { struct hashlimit_net *hashlimit_net = hashlimit_pernet(net); struct xt_hashlimit_htable *hinfo; const struct seq_operations *ops; unsigned int size, i; unsigned long nr_pages = totalram_pages(); int ret; if (cfg->size) { size = cfg->size; } else { size = (nr_pages << PAGE_SHIFT) / 16384 / sizeof(struct hlist_head); if (nr_pages > 1024 * 1024 * 1024 / PAGE_SIZE) size = 8192; if (size < 16) size = 16; } hinfo = kvmalloc_flex(*hinfo, hash, size); if (hinfo == NULL) return -ENOMEM; *out_hinfo = hinfo; /* copy match config into hashtable config */ ret = cfg_copy(&hinfo->cfg, (void *)cfg, 3); if (ret) { kvfree(hinfo); return ret; } hinfo->cfg.size = size; if (hinfo->cfg.max == 0) hinfo->cfg.max = 8 * hinfo->cfg.size; else if (hinfo->cfg.max < hinfo->cfg.size) hinfo->cfg.max = hinfo->cfg.size; for (i = 0; i < hinfo->cfg.size; i++) INIT_HLIST_HEAD(&hinfo->hash[i]); refcount_set(&hinfo->use, 1); hinfo->count = 0; hinfo->family = family; hinfo->rnd_initialized = false; hinfo->name = kstrdup(name, GFP_KERNEL); if (!hinfo->name) { kvfree(hinfo); return -ENOMEM; } spin_lock_init(&hinfo->lock); switch (revision) { case 1: ops = &dl_seq_ops_v1; break; case 2: ops = &dl_seq_ops_v2; break; default: ops = &dl_seq_ops; } hinfo->pde = proc_create_seq_data(name, 0, (family == NFPROTO_IPV4) ? hashlimit_net->ipt_hashlimit : hashlimit_net->ip6t_hashlimit, ops, hinfo); if (hinfo->pde == NULL) { kfree(hinfo->name); kvfree(hinfo); return -ENOMEM; } hinfo->net = net; INIT_DEFERRABLE_WORK(&hinfo->gc_work, htable_gc); queue_delayed_work(system_power_efficient_wq, &hinfo->gc_work, msecs_to_jiffies(hinfo->cfg.gc_interval)); hlist_add_head(&hinfo->node, &hashlimit_net->htables); return 0; } static void htable_selective_cleanup(struct xt_hashlimit_htable *ht, bool select_all) { unsigned int i; for (i = 0; i < ht->cfg.size; i++) { struct hlist_head *head = &ht->hash[i]; struct dsthash_ent *dh; struct hlist_node *n; if (hlist_empty(head)) continue; spin_lock_bh(&ht->lock); hlist_for_each_entry_safe(dh, n, head, node) { if (time_after_eq(jiffies, dh->expires) || select_all) dsthash_free(ht, dh); } spin_unlock_bh(&ht->lock); cond_resched(); } } static void htable_gc(struct work_struct *work) { struct xt_hashlimit_htable *ht; ht = container_of(work, struct xt_hashlimit_htable, gc_work.work); htable_selective_cleanup(ht, false); queue_delayed_work(system_power_efficient_wq, &ht->gc_work, msecs_to_jiffies(ht->cfg.gc_interval)); } static void htable_remove_proc_entry(struct xt_hashlimit_htable *hinfo) { struct hashlimit_net *hashlimit_net = hashlimit_pernet(hinfo->net); struct proc_dir_entry *parent; if (hinfo->family == NFPROTO_IPV4) parent = hashlimit_net->ipt_hashlimit; else parent = hashlimit_net->ip6t_hashlimit; if (parent != NULL) remove_proc_entry(hinfo->name, parent); } static struct xt_hashlimit_htable *htable_find_get(struct net *net, const char *name, u_int8_t family) { struct hashlimit_net *hashlimit_net = hashlimit_pernet(net); struct xt_hashlimit_htable *hinfo; hlist_for_each_entry(hinfo, &hashlimit_net->htables, node) { if (!strcmp(name, hinfo->name) && hinfo->family == family) { refcount_inc(&hinfo->use); return hinfo; } } return NULL; } static void htable_put(struct xt_hashlimit_htable *hinfo) { if (refcount_dec_and_mutex_lock(&hinfo->use, &hashlimit_mutex)) { hlist_del(&hinfo->node); htable_remove_proc_entry(hinfo); mutex_unlock(&hashlimit_mutex); cancel_delayed_work_sync(&hinfo->gc_work); htable_selective_cleanup(hinfo, true); kfree(hinfo->name); kvfree(hinfo); } } /* The algorithm used is the Simple Token Bucket Filter (TBF) * see net/sched/sch_tbf.c in the linux source tree */ /* Rusty: This is my (non-mathematically-inclined) understanding of this algorithm. The `average rate' in jiffies becomes your initial amount of credit `credit' and the most credit you can ever have `credit_cap'. The `peak rate' becomes the cost of passing the test, `cost'. `prev' tracks the last packet hit: you gain one credit per jiffy. If you get credit balance more than this, the extra credit is discarded. Every time the match passes, you lose `cost' credits; if you don't have that many, the test fails. See Alexey's formal explanation in net/sched/sch_tbf.c. To get the maximum range, we multiply by this factor (ie. you get N credits per jiffy). We want to allow a rate as low as 1 per day (slowest userspace tool allows), which means CREDITS_PER_JIFFY*HZ*60*60*24 < 2^32 ie. */ #define MAX_CPJ_v1 (0xFFFFFFFF / (HZ*60*60*24)) #define MAX_CPJ (0xFFFFFFFFFFFFFFFFULL / (HZ*60*60*24)) /* Repeated shift and or gives us all 1s, final shift and add 1 gives * us the power of 2 below the theoretical max, so GCC simply does a * shift. */ #define _POW2_BELOW2(x) ((x)|((x)>>1)) #define _POW2_BELOW4(x) (_POW2_BELOW2(x)|_POW2_BELOW2((x)>>2)) #define _POW2_BELOW8(x) (_POW2_BELOW4(x)|_POW2_BELOW4((x)>>4)) #define _POW2_BELOW16(x) (_POW2_BELOW8(x)|_POW2_BELOW8((x)>>8)) #define _POW2_BELOW32(x) (_POW2_BELOW16(x)|_POW2_BELOW16((x)>>16)) #define _POW2_BELOW64(x) (_POW2_BELOW32(x)|_POW2_BELOW32((x)>>32)) #define POW2_BELOW32(x) ((_POW2_BELOW32(x)>>1) + 1) #define POW2_BELOW64(x) ((_POW2_BELOW64(x)>>1) + 1) #define CREDITS_PER_JIFFY POW2_BELOW64(MAX_CPJ) #define CREDITS_PER_JIFFY_v1 POW2_BELOW32(MAX_CPJ_v1) /* in byte mode, the lowest possible rate is one packet/second. * credit_cap is used as a counter that tells us how many times we can * refill the "credits available" counter when it becomes empty. */ #define MAX_CPJ_BYTES (0xFFFFFFFF / HZ) #define CREDITS_PER_JIFFY_BYTES POW2_BELOW32(MAX_CPJ_BYTES) static u32 xt_hashlimit_len_to_chunks(u32 len) { return (len >> XT_HASHLIMIT_BYTE_SHIFT) + 1; } /* Precision saver. */ static u64 user2credits(u64 user, int revision) { u64 scale = (revision == 1) ? XT_HASHLIMIT_SCALE : XT_HASHLIMIT_SCALE_v2; u64 cpj = (revision == 1) ? CREDITS_PER_JIFFY_v1 : CREDITS_PER_JIFFY; /* Avoid overflow: divide the constant operands first */ if (scale >= HZ * cpj) return div64_u64(user, div64_u64(scale, HZ * cpj)); return user * div64_u64(HZ * cpj, scale); } static u32 user2credits_byte(u32 user) { u64 us = user; us *= HZ * CREDITS_PER_JIFFY_BYTES; return (u32) (us >> 32); } static u64 user2rate(u64 user) { if (user != 0) { return div64_u64(XT_HASHLIMIT_SCALE_v2, user); } else { pr_info_ratelimited("invalid rate from userspace: %llu\n", user); return 0; } } static u64 user2rate_bytes(u32 user) { u64 r; r = user ? U32_MAX / user : U32_MAX; return (r - 1) << XT_HASHLIMIT_BYTE_SHIFT; } static void rateinfo_recalc(struct dsthash_ent *dh, unsigned long now, u32 mode, int revision) { unsigned long delta = now - dh->rateinfo.prev; u64 cap, cpj; if (delta == 0) return; if (revision >= 3 && mode & XT_HASHLIMIT_RATE_MATCH) { u64 interval = dh->rateinfo.interval * HZ; if (delta < interval) return; dh->rateinfo.prev = now; dh->rateinfo.prev_window = ((dh->rateinfo.current_rate * interval) > (delta * dh->rateinfo.rate)); dh->rateinfo.current_rate = 0; return; } dh->rateinfo.prev = now; if (mode & XT_HASHLIMIT_BYTES) { u64 tmp = dh->rateinfo.credit; dh->rateinfo.credit += CREDITS_PER_JIFFY_BYTES * delta; cap = CREDITS_PER_JIFFY_BYTES * HZ; if (tmp >= dh->rateinfo.credit) {/* overflow */ dh->rateinfo.credit = cap; return; } } else { cpj = (revision == 1) ? CREDITS_PER_JIFFY_v1 : CREDITS_PER_JIFFY; dh->rateinfo.credit += delta * cpj; cap = dh->rateinfo.credit_cap; } if (dh->rateinfo.credit > cap) dh->rateinfo.credit = cap; } static void rateinfo_init(struct dsthash_ent *dh, struct xt_hashlimit_htable *hinfo, int revision) { dh->rateinfo.prev = jiffies; if (revision >= 3 && hinfo->cfg.mode & XT_HASHLIMIT_RATE_MATCH) { dh->rateinfo.prev_window = 0; dh->rateinfo.current_rate = 0; if (hinfo->cfg.mode & XT_HASHLIMIT_BYTES) { dh->rateinfo.rate = user2rate_bytes((u32)hinfo->cfg.avg); if (hinfo->cfg.burst) dh->rateinfo.burst = hinfo->cfg.burst * dh->rateinfo.rate; else dh->rateinfo.burst = dh->rateinfo.rate; } else { dh->rateinfo.rate = user2rate(hinfo->cfg.avg); dh->rateinfo.burst = hinfo->cfg.burst + dh->rateinfo.rate; } dh->rateinfo.interval = hinfo->cfg.interval; } else if (hinfo->cfg.mode & XT_HASHLIMIT_BYTES) { dh->rateinfo.credit = CREDITS_PER_JIFFY_BYTES * HZ; dh->rateinfo.cost = user2credits_byte(hinfo->cfg.avg); dh->rateinfo.credit_cap = hinfo->cfg.burst; } else { dh->rateinfo.credit = user2credits(hinfo->cfg.avg * hinfo->cfg.burst, revision); dh->rateinfo.cost = user2credits(hinfo->cfg.avg, revision); dh->rateinfo.credit_cap = dh->rateinfo.credit; } } static inline __be32 maskl(__be32 a, unsigned int l) { return l ? htonl(ntohl(a) & ~0 << (32 - l)) : 0; } #if IS_ENABLED(CONFIG_IP6_NF_IPTABLES) static void hashlimit_ipv6_mask(__be32 *i, unsigned int p) { switch (p) { case 0 ... 31: i[0] = maskl(i[0], p); i[1] = i[2] = i[3] = 0; break; case 32 ... 63: i[1] = maskl(i[1], p - 32); i[2] = i[3] = 0; break; case 64 ... 95: i[2] = maskl(i[2], p - 64); i[3] = 0; break; case 96 ... 127: i[3] = maskl(i[3], p - 96); break; case 128: break; } } #endif static int hashlimit_init_dst(const struct xt_hashlimit_htable *hinfo, struct dsthash_dst *dst, const struct sk_buff *skb, unsigned int protoff) { __be16 _ports[2], *ports; u8 nexthdr; int poff; memset(dst, 0, sizeof(*dst)); switch (hinfo->family) { case NFPROTO_IPV4: if (hinfo->cfg.mode & XT_HASHLIMIT_HASH_DIP) dst->ip.dst = maskl(ip_hdr(skb)->daddr, hinfo->cfg.dstmask); if (hinfo->cfg.mode & XT_HASHLIMIT_HASH_SIP) dst->ip.src = maskl(ip_hdr(skb)->saddr, hinfo->cfg.srcmask); if (!(hinfo->cfg.mode & (XT_HASHLIMIT_HASH_DPT | XT_HASHLIMIT_HASH_SPT))) return 0; nexthdr = ip_hdr(skb)->protocol; break; #if IS_ENABLED(CONFIG_IP6_NF_IPTABLES) case NFPROTO_IPV6: { __be16 frag_off; if (hinfo->cfg.mode & XT_HASHLIMIT_HASH_DIP) { memcpy(&dst->ip6.dst, &ipv6_hdr(skb)->daddr, sizeof(dst->ip6.dst)); hashlimit_ipv6_mask(dst->ip6.dst, hinfo->cfg.dstmask); } if (hinfo->cfg.mode & XT_HASHLIMIT_HASH_SIP) { memcpy(&dst->ip6.src, &ipv6_hdr(skb)->saddr, sizeof(dst->ip6.src)); hashlimit_ipv6_mask(dst->ip6.src, hinfo->cfg.srcmask); } if (!(hinfo->cfg.mode & (XT_HASHLIMIT_HASH_DPT | XT_HASHLIMIT_HASH_SPT))) return 0; nexthdr = ipv6_hdr(skb)->nexthdr; protoff = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), &nexthdr, &frag_off); if ((int)protoff < 0) return -1; break; } #endif default: BUG(); return 0; } poff = proto_ports_offset(nexthdr); if (poff >= 0) { ports = skb_header_pointer(skb, protoff + poff, sizeof(_ports), &_ports); } else { _ports[0] = _ports[1] = 0; ports = _ports; } if (!ports) return -1; if (hinfo->cfg.mode & XT_HASHLIMIT_HASH_SPT) dst->src_port = ports[0]; if (hinfo->cfg.mode & XT_HASHLIMIT_HASH_DPT) dst->dst_port = ports[1]; return 0; } static u32 hashlimit_byte_cost(unsigned int len, struct dsthash_ent *dh) { u64 tmp = xt_hashlimit_len_to_chunks(len); tmp = tmp * dh->rateinfo.cost; if (unlikely(tmp > CREDITS_PER_JIFFY_BYTES * HZ)) tmp = CREDITS_PER_JIFFY_BYTES * HZ; if (dh->rateinfo.credit < tmp && dh->rateinfo.credit_cap) { dh->rateinfo.credit_cap--; dh->rateinfo.credit = CREDITS_PER_JIFFY_BYTES * HZ; } return (u32) tmp; } static bool hashlimit_mt_common(const struct sk_buff *skb, struct xt_action_param *par, struct xt_hashlimit_htable *hinfo, const struct hashlimit_cfg3 *cfg, int revision) { unsigned long now = jiffies; struct dsthash_ent *dh; struct dsthash_dst dst; bool race = false; u64 cost; if (hashlimit_init_dst(hinfo, &dst, skb, par->thoff) < 0) goto hotdrop; local_bh_disable(); dh = dsthash_find(hinfo, &dst); if (dh == NULL) { dh = dsthash_alloc_init(hinfo, &dst, &race); if (dh == NULL) { local_bh_enable(); goto hotdrop; } else if (race) { /* Already got an entry, update expiration timeout */ dh->expires = now + msecs_to_jiffies(hinfo->cfg.expire); rateinfo_recalc(dh, now, hinfo->cfg.mode, revision); } else { dh->expires = jiffies + msecs_to_jiffies(hinfo->cfg.expire); rateinfo_init(dh, hinfo, revision); } } else { /* update expiration timeout */ dh->expires = now + msecs_to_jiffies(hinfo->cfg.expire); rateinfo_recalc(dh, now, hinfo->cfg.mode, revision); } if (cfg->mode & XT_HASHLIMIT_RATE_MATCH) { cost = (cfg->mode & XT_HASHLIMIT_BYTES) ? skb->len : 1; dh->rateinfo.current_rate += cost; if (!dh->rateinfo.prev_window && (dh->rateinfo.current_rate <= dh->rateinfo.burst)) { spin_unlock(&dh->lock); local_bh_enable(); return !(cfg->mode & XT_HASHLIMIT_INVERT); } else { goto overlimit; } } if (cfg->mode & XT_HASHLIMIT_BYTES) cost = hashlimit_byte_cost(skb->len, dh); else cost = dh->rateinfo.cost; if (dh->rateinfo.credit >= cost) { /* below the limit */ dh->rateinfo.credit -= cost; spin_unlock(&dh->lock); local_bh_enable(); return !(cfg->mode & XT_HASHLIMIT_INVERT); } overlimit: spin_unlock(&dh->lock); local_bh_enable(); /* default match is underlimit - so over the limit, we need to invert */ return cfg->mode & XT_HASHLIMIT_INVERT; hotdrop: par->hotdrop = true; return false; } static bool hashlimit_mt_v1(const struct sk_buff *skb, struct xt_action_param *par) { const struct xt_hashlimit_mtinfo1 *info = par->matchinfo; struct xt_hashlimit_htable *hinfo = info->hinfo; struct hashlimit_cfg3 cfg = {}; int ret; ret = cfg_copy(&cfg, (void *)&info->cfg, 1); if (ret) return ret; return hashlimit_mt_common(skb, par, hinfo, &cfg, 1); } static bool hashlimit_mt_v2(const struct sk_buff *skb, struct xt_action_param *par) { const struct xt_hashlimit_mtinfo2 *info = par->matchinfo; struct xt_hashlimit_htable *hinfo = info->hinfo; struct hashlimit_cfg3 cfg = {}; int ret; ret = cfg_copy(&cfg, (void *)&info->cfg, 2); if (ret) return ret; return hashlimit_mt_common(skb, par, hinfo, &cfg, 2); } static bool hashlimit_mt(const struct sk_buff *skb, struct xt_action_param *par) { const struct xt_hashlimit_mtinfo3 *info = par->matchinfo; struct xt_hashlimit_htable *hinfo = info->hinfo; return hashlimit_mt_common(skb, par, hinfo, &info->cfg, 3); } #define HASHLIMIT_MAX_SIZE 1048576 static int hashlimit_mt_check_common(const struct xt_mtchk_param *par, struct xt_hashlimit_htable **hinfo, struct hashlimit_cfg3 *cfg, const char *name, int revision) { struct net *net = par->net; int ret; if (cfg->gc_interval == 0 || cfg->expire == 0) return -EINVAL; if (cfg->size > HASHLIMIT_MAX_SIZE) { cfg->size = HASHLIMIT_MAX_SIZE; pr_info_ratelimited("size too large, truncated to %u\n", cfg->size); } if (cfg->max > HASHLIMIT_MAX_SIZE) { cfg->max = HASHLIMIT_MAX_SIZE; pr_info_ratelimited("max too large, truncated to %u\n", cfg->max); } if (par->family == NFPROTO_IPV4) { if (cfg->srcmask > 32 || cfg->dstmask > 32) return -EINVAL; } else { if (cfg->srcmask > 128 || cfg->dstmask > 128) return -EINVAL; } if (cfg->mode & ~XT_HASHLIMIT_ALL) { pr_info_ratelimited("Unknown mode mask %X, kernel too old?\n", cfg->mode); return -EINVAL; } /* Check for overflow. */ if (revision >= 3 && cfg->mode & XT_HASHLIMIT_RATE_MATCH) { if (cfg->avg == 0 || cfg->avg > U32_MAX) { pr_info_ratelimited("invalid rate\n"); return -ERANGE; } if (cfg->interval == 0) { pr_info_ratelimited("invalid interval\n"); return -EINVAL; } } else if (cfg->mode & XT_HASHLIMIT_BYTES) { if (user2credits_byte(cfg->avg) == 0) { pr_info_ratelimited("overflow, rate too high: %llu\n", cfg->avg); return -EINVAL; } } else if (cfg->burst == 0 || user2credits(cfg->avg * cfg->burst, revision) < user2credits(cfg->avg, revision)) { pr_info_ratelimited("overflow, try lower: %llu/%llu\n", cfg->avg, cfg->burst); return -ERANGE; } mutex_lock(&hashlimit_mutex); *hinfo = htable_find_get(net, name, par->family); if (*hinfo == NULL) { ret = htable_create(net, cfg, name, par->family, hinfo, revision); if (ret < 0) { mutex_unlock(&hashlimit_mutex); return ret; } } mutex_unlock(&hashlimit_mutex); return 0; } static int hashlimit_mt_check_v1(const struct xt_mtchk_param *par) { struct xt_hashlimit_mtinfo1 *info = par->matchinfo; struct hashlimit_cfg3 cfg = {}; int ret; ret = xt_check_proc_name(info->name, sizeof(info->name)); if (ret) return ret; ret = cfg_copy(&cfg, (void *)&info->cfg, 1); if (ret) return ret; return hashlimit_mt_check_common(par, &info->hinfo, &cfg, info->name, 1); } static int hashlimit_mt_check_v2(const struct xt_mtchk_param *par) { struct xt_hashlimit_mtinfo2 *info = par->matchinfo; struct hashlimit_cfg3 cfg = {}; int ret; ret = xt_check_proc_name(info->name, sizeof(info->name)); if (ret) return ret; ret = cfg_copy(&cfg, (void *)&info->cfg, 2); if (ret) return ret; return hashlimit_mt_check_common(par, &info->hinfo, &cfg, info->name, 2); } static int hashlimit_mt_check(const struct xt_mtchk_param *par) { struct xt_hashlimit_mtinfo3 *info = par->matchinfo; int ret; ret = xt_check_proc_name(info->name, sizeof(info->name)); if (ret) return ret; return hashlimit_mt_check_common(par, &info->hinfo, &info->cfg, info->name, 3); } static void hashlimit_mt_destroy_v2(const struct xt_mtdtor_param *par) { const struct xt_hashlimit_mtinfo2 *info = par->matchinfo; htable_put(info->hinfo); } static void hashlimit_mt_destroy_v1(const struct xt_mtdtor_param *par) { const struct xt_hashlimit_mtinfo1 *info = par->matchinfo; htable_put(info->hinfo); } static void hashlimit_mt_destroy(const struct xt_mtdtor_param *par) { const struct xt_hashlimit_mtinfo3 *info = par->matchinfo; htable_put(info->hinfo); } static struct xt_match hashlimit_mt_reg[] __read_mostly = { { .name = "hashlimit", .revision = 1, .family = NFPROTO_IPV4, .match = hashlimit_mt_v1, .matchsize = sizeof(struct xt_hashlimit_mtinfo1), .usersize = offsetof(struct xt_hashlimit_mtinfo1, hinfo), .checkentry = hashlimit_mt_check_v1, .destroy = hashlimit_mt_destroy_v1, .me = THIS_MODULE, }, { .name = "hashlimit", .revision = 2, .family = NFPROTO_IPV4, .match = hashlimit_mt_v2, .matchsize = sizeof(struct xt_hashlimit_mtinfo2), .usersize = offsetof(struct xt_hashlimit_mtinfo2, hinfo), .checkentry = hashlimit_mt_check_v2, .destroy = hashlimit_mt_destroy_v2, .me = THIS_MODULE, }, { .name = "hashlimit", .revision = 3, .family = NFPROTO_IPV4, .match = hashlimit_mt, .matchsize = sizeof(struct xt_hashlimit_mtinfo3), .usersize = offsetof(struct xt_hashlimit_mtinfo3, hinfo), .checkentry = hashlimit_mt_check, .destroy = hashlimit_mt_destroy, .me = THIS_MODULE, }, #if IS_ENABLED(CONFIG_IP6_NF_IPTABLES) { .name = "hashlimit", .revision = 1, .family = NFPROTO_IPV6, .match = hashlimit_mt_v1, .matchsize = sizeof(struct xt_hashlimit_mtinfo1), .usersize = offsetof(struct xt_hashlimit_mtinfo1, hinfo), .checkentry = hashlimit_mt_check_v1, .destroy = hashlimit_mt_destroy_v1, .me = THIS_MODULE, }, { .name = "hashlimit", .revision = 2, .family = NFPROTO_IPV6, .match = hashlimit_mt_v2, .matchsize = sizeof(struct xt_hashlimit_mtinfo2), .usersize = offsetof(struct xt_hashlimit_mtinfo2, hinfo), .checkentry = hashlimit_mt_check_v2, .destroy = hashlimit_mt_destroy_v2, .me = THIS_MODULE, }, { .name = "hashlimit", .revision = 3, .family = NFPROTO_IPV6, .match = hashlimit_mt, .matchsize = sizeof(struct xt_hashlimit_mtinfo3), .usersize = offsetof(struct xt_hashlimit_mtinfo3, hinfo), .checkentry = hashlimit_mt_check, .destroy = hashlimit_mt_destroy, .me = THIS_MODULE, }, #endif }; /* PROC stuff */ static void *dl_seq_start(struct seq_file *s, loff_t *pos) __acquires(htable->lock) { struct xt_hashlimit_htable *htable = pde_data(file_inode(s->file)); unsigned int *bucket; spin_lock_bh(&htable->lock); if (*pos >= htable->cfg.size) return NULL; bucket = kmalloc(sizeof(unsigned int), GFP_ATOMIC); if (!bucket) return ERR_PTR(-ENOMEM); *bucket = *pos; return bucket; } static void *dl_seq_next(struct seq_file *s, void *v, loff_t *pos) { struct xt_hashlimit_htable *htable = pde_data(file_inode(s->file)); unsigned int *bucket = v; *pos = ++(*bucket); if (*pos >= htable->cfg.size) { kfree(v); return NULL; } return bucket; } static void dl_seq_stop(struct seq_file *s, void *v) __releases(htable->lock) { struct xt_hashlimit_htable *htable = pde_data(file_inode(s->file)); unsigned int *bucket = v; if (!IS_ERR(bucket)) kfree(bucket); spin_unlock_bh(&htable->lock); } static void dl_seq_print(struct dsthash_ent *ent, u_int8_t family, struct seq_file *s) { switch (family) { case NFPROTO_IPV4: seq_printf(s, "%ld %pI4:%u->%pI4:%u %llu %llu %llu\n", (long)(ent->expires - jiffies)/HZ, &ent->dst.ip.src, ntohs(ent->dst.src_port), &ent->dst.ip.dst, ntohs(ent->dst.dst_port), ent->rateinfo.credit, ent->rateinfo.credit_cap, ent->rateinfo.cost); break; #if IS_ENABLED(CONFIG_IP6_NF_IPTABLES) case NFPROTO_IPV6: seq_printf(s, "%ld %pI6:%u->%pI6:%u %llu %llu %llu\n", (long)(ent->expires - jiffies)/HZ, &ent->dst.ip6.src, ntohs(ent->dst.src_port), &ent->dst.ip6.dst, ntohs(ent->dst.dst_port), ent->rateinfo.credit, ent->rateinfo.credit_cap, ent->rateinfo.cost); break; #endif default: BUG(); } } static int dl_seq_real_show_v2(struct dsthash_ent *ent, u_int8_t family, struct seq_file *s) { struct xt_hashlimit_htable *ht = pde_data(file_inode(s->file)); spin_lock(&ent->lock); /* recalculate to show accurate numbers */ rateinfo_recalc(ent, jiffies, ht->cfg.mode, 2); dl_seq_print(ent, family, s); spin_unlock(&ent->lock); return seq_has_overflowed(s); } static int dl_seq_real_show_v1(struct dsthash_ent *ent, u_int8_t family, struct seq_file *s) { struct xt_hashlimit_htable *ht = pde_data(file_inode(s->file)); spin_lock(&ent->lock); /* recalculate to show accurate numbers */ rateinfo_recalc(ent, jiffies, ht->cfg.mode, 1); dl_seq_print(ent, family, s); spin_unlock(&ent->lock); return seq_has_overflowed(s); } static int dl_seq_real_show(struct dsthash_ent *ent, u_int8_t family, struct seq_file *s) { struct xt_hashlimit_htable *ht = pde_data(file_inode(s->file)); spin_lock(&ent->lock); /* recalculate to show accurate numbers */ rateinfo_recalc(ent, jiffies, ht->cfg.mode, 3); dl_seq_print(ent, family, s); spin_unlock(&ent->lock); return seq_has_overflowed(s); } static int dl_seq_show_v2(struct seq_file *s, void *v) { struct xt_hashlimit_htable *htable = pde_data(file_inode(s->file)); unsigned int *bucket = (unsigned int *)v; struct dsthash_ent *ent; if (!hlist_empty(&htable->hash[*bucket])) { hlist_for_each_entry(ent, &htable->hash[*bucket], node) if (dl_seq_real_show_v2(ent, htable->family, s)) return -1; } return 0; } static int dl_seq_show_v1(struct seq_file *s, void *v) { struct xt_hashlimit_htable *htable = pde_data(file_inode(s->file)); unsigned int *bucket = v; struct dsthash_ent *ent; if (!hlist_empty(&htable->hash[*bucket])) { hlist_for_each_entry(ent, &htable->hash[*bucket], node) if (dl_seq_real_show_v1(ent, htable->family, s)) return -1; } return 0; } static int dl_seq_show(struct seq_file *s, void *v) { struct xt_hashlimit_htable *htable = pde_data(file_inode(s->file)); unsigned int *bucket = v; struct dsthash_ent *ent; if (!hlist_empty(&htable->hash[*bucket])) { hlist_for_each_entry(ent, &htable->hash[*bucket], node) if (dl_seq_real_show(ent, htable->family, s)) return -1; } return 0; } static const struct seq_operations dl_seq_ops_v1 = { .start = dl_seq_start, .next = dl_seq_next, .stop = dl_seq_stop, .show = dl_seq_show_v1 }; static const struct seq_operations dl_seq_ops_v2 = { .start = dl_seq_start, .next = dl_seq_next, .stop = dl_seq_stop, .show = dl_seq_show_v2 }; static const struct seq_operations dl_seq_ops = { .start = dl_seq_start, .next = dl_seq_next, .stop = dl_seq_stop, .show = dl_seq_show }; static int __net_init hashlimit_proc_net_init(struct net *net) { struct hashlimit_net *hashlimit_net = hashlimit_pernet(net); hashlimit_net->ipt_hashlimit = proc_mkdir("ipt_hashlimit", net->proc_net); if (!hashlimit_net->ipt_hashlimit) return -ENOMEM; #if IS_ENABLED(CONFIG_IP6_NF_IPTABLES) hashlimit_net->ip6t_hashlimit = proc_mkdir("ip6t_hashlimit", net->proc_net); if (!hashlimit_net->ip6t_hashlimit) { remove_proc_entry("ipt_hashlimit", net->proc_net); return -ENOMEM; } #endif return 0; } static void __net_exit hashlimit_proc_net_exit(struct net *net) { struct xt_hashlimit_htable *hinfo; struct hashlimit_net *hashlimit_net = hashlimit_pernet(net); /* hashlimit_net_exit() is called before hashlimit_mt_destroy(). * Make sure that the parent ipt_hashlimit and ip6t_hashlimit proc * entries is empty before trying to remove it. */ mutex_lock(&hashlimit_mutex); hlist_for_each_entry(hinfo, &hashlimit_net->htables, node) htable_remove_proc_entry(hinfo); hashlimit_net->ipt_hashlimit = NULL; hashlimit_net->ip6t_hashlimit = NULL; mutex_unlock(&hashlimit_mutex); remove_proc_entry("ipt_hashlimit", net->proc_net); #if IS_ENABLED(CONFIG_IP6_NF_IPTABLES) remove_proc_entry("ip6t_hashlimit", net->proc_net); #endif } static int __net_init hashlimit_net_init(struct net *net) { struct hashlimit_net *hashlimit_net = hashlimit_pernet(net); INIT_HLIST_HEAD(&hashlimit_net->htables); return hashlimit_proc_net_init(net); } static void __net_exit hashlimit_net_exit(struct net *net) { hashlimit_proc_net_exit(net); } static struct pernet_operations hashlimit_net_ops = { .init = hashlimit_net_init, .exit = hashlimit_net_exit, .id = &hashlimit_net_id, .size = sizeof(struct hashlimit_net), }; static int __init hashlimit_mt_init(void) { int err; err = register_pernet_subsys(&hashlimit_net_ops); if (err < 0) return err; err = xt_register_matches(hashlimit_mt_reg, ARRAY_SIZE(hashlimit_mt_reg)); if (err < 0) goto err1; err = -ENOMEM; hashlimit_cachep = kmem_cache_create("xt_hashlimit", sizeof(struct dsthash_ent), 0, 0, NULL); if (!hashlimit_cachep) { pr_warn("unable to create slab cache\n"); goto err2; } return 0; err2: xt_unregister_matches(hashlimit_mt_reg, ARRAY_SIZE(hashlimit_mt_reg)); err1: unregister_pernet_subsys(&hashlimit_net_ops); return err; } static void __exit hashlimit_mt_exit(void) { xt_unregister_matches(hashlimit_mt_reg, ARRAY_SIZE(hashlimit_mt_reg)); unregister_pernet_subsys(&hashlimit_net_ops); rcu_barrier(); kmem_cache_destroy(hashlimit_cachep); } module_init(hashlimit_mt_init); module_exit(hashlimit_mt_exit); |
| 535 15 52 11 12 3 11 534 24 313 501 140 245 5 4 21 8 57 59 40 384 88 29 459 15 3 4 1 2 1 3 4 7 8 22 1 54 6 6 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 | /* SPDX-License-Identifier: GPL-2.0 */ #if !defined(_TRACE_KVM_H) || defined(TRACE_HEADER_MULTI_READ) #define _TRACE_KVM_H #include <linux/tracepoint.h> #include <asm/vmx.h> #include <asm/svm.h> #include <asm/clocksource.h> #include <asm/pvclock-abi.h> #undef TRACE_SYSTEM #define TRACE_SYSTEM kvm #ifdef CREATE_TRACE_POINTS #define tracing_kvm_rip_read(vcpu) ({ \ typeof(vcpu) __vcpu = vcpu; \ __vcpu->arch.guest_state_protected ? 0 : kvm_rip_read(__vcpu); \ }) #endif /* * Tracepoint for guest mode entry. */ TRACE_EVENT(kvm_entry, TP_PROTO(struct kvm_vcpu *vcpu, bool force_immediate_exit), TP_ARGS(vcpu, force_immediate_exit), TP_STRUCT__entry( __field( unsigned int, vcpu_id ) __field( unsigned long, rip ) __field( bool, immediate_exit ) __field( u32, intr_info ) __field( u32, error_code ) ), TP_fast_assign( __entry->vcpu_id = vcpu->vcpu_id; __entry->rip = tracing_kvm_rip_read(vcpu); __entry->immediate_exit = force_immediate_exit; kvm_x86_call(get_entry_info)(vcpu, &__entry->intr_info, &__entry->error_code); ), TP_printk("vcpu %u, rip 0x%lx intr_info 0x%08x error_code 0x%08x%s", __entry->vcpu_id, __entry->rip, __entry->intr_info, __entry->error_code, __entry->immediate_exit ? "[immediate exit]" : "") ); /* * Tracepoint for hypercall. */ TRACE_EVENT(kvm_hypercall, TP_PROTO(unsigned long nr, unsigned long a0, unsigned long a1, unsigned long a2, unsigned long a3), TP_ARGS(nr, a0, a1, a2, a3), TP_STRUCT__entry( __field( unsigned long, nr ) __field( unsigned long, a0 ) __field( unsigned long, a1 ) __field( unsigned long, a2 ) __field( unsigned long, a3 ) ), TP_fast_assign( __entry->nr = nr; __entry->a0 = a0; __entry->a1 = a1; __entry->a2 = a2; __entry->a3 = a3; ), TP_printk("nr 0x%lx a0 0x%lx a1 0x%lx a2 0x%lx a3 0x%lx", __entry->nr, __entry->a0, __entry->a1, __entry->a2, __entry->a3) ); /* * Tracepoint for hypercall. */ TRACE_EVENT(kvm_hv_hypercall, TP_PROTO(__u16 code, bool fast, __u16 var_cnt, __u16 rep_cnt, __u16 rep_idx, __u64 ingpa, __u64 outgpa), TP_ARGS(code, fast, var_cnt, rep_cnt, rep_idx, ingpa, outgpa), TP_STRUCT__entry( __field( __u16, rep_cnt ) __field( __u16, rep_idx ) __field( __u64, ingpa ) __field( __u64, outgpa ) __field( __u16, code ) __field( __u16, var_cnt ) __field( bool, fast ) ), TP_fast_assign( __entry->rep_cnt = rep_cnt; __entry->rep_idx = rep_idx; __entry->ingpa = ingpa; __entry->outgpa = outgpa; __entry->code = code; __entry->var_cnt = var_cnt; __entry->fast = fast; ), TP_printk("code 0x%x %s var_cnt 0x%x rep_cnt 0x%x idx 0x%x in 0x%llx out 0x%llx", __entry->code, __entry->fast ? "fast" : "slow", __entry->var_cnt, __entry->rep_cnt, __entry->rep_idx, __entry->ingpa, __entry->outgpa) ); TRACE_EVENT(kvm_hv_hypercall_done, TP_PROTO(u64 result), TP_ARGS(result), TP_STRUCT__entry( __field(__u64, result) ), TP_fast_assign( __entry->result = result; ), TP_printk("result 0x%llx", __entry->result) ); /* * Tracepoint for Xen hypercall. */ TRACE_EVENT(kvm_xen_hypercall, TP_PROTO(u8 cpl, unsigned long nr, unsigned long a0, unsigned long a1, unsigned long a2, unsigned long a3, unsigned long a4, unsigned long a5), TP_ARGS(cpl, nr, a0, a1, a2, a3, a4, a5), TP_STRUCT__entry( __field(u8, cpl) __field(unsigned long, nr) __field(unsigned long, a0) __field(unsigned long, a1) __field(unsigned long, a2) __field(unsigned long, a3) __field(unsigned long, a4) __field(unsigned long, a5) ), TP_fast_assign( __entry->cpl = cpl; __entry->nr = nr; __entry->a0 = a0; __entry->a1 = a1; __entry->a2 = a2; __entry->a3 = a3; __entry->a4 = a4; __entry->a4 = a5; ), TP_printk("cpl %d nr 0x%lx a0 0x%lx a1 0x%lx a2 0x%lx a3 0x%lx a4 0x%lx a5 %lx", __entry->cpl, __entry->nr, __entry->a0, __entry->a1, __entry->a2, __entry->a3, __entry->a4, __entry->a5) ); /* * Tracepoint for PIO. */ #define KVM_PIO_IN 0 #define KVM_PIO_OUT 1 TRACE_EVENT(kvm_pio, TP_PROTO(unsigned int rw, unsigned int port, unsigned int size, unsigned int count, const void *data), TP_ARGS(rw, port, size, count, data), TP_STRUCT__entry( __field( unsigned int, rw ) __field( unsigned int, port ) __field( unsigned int, size ) __field( unsigned int, count ) __field( unsigned int, val ) ), TP_fast_assign( __entry->rw = rw; __entry->port = port; __entry->size = size; __entry->count = count; if (size == 1) __entry->val = *(unsigned char *)data; else if (size == 2) __entry->val = *(unsigned short *)data; else __entry->val = *(unsigned int *)data; ), TP_printk("pio_%s at 0x%x size %d count %d val 0x%x %s", __entry->rw ? "write" : "read", __entry->port, __entry->size, __entry->count, __entry->val, __entry->count > 1 ? "(...)" : "") ); /* * Tracepoint for fast mmio. */ TRACE_EVENT(kvm_fast_mmio, TP_PROTO(u64 gpa), TP_ARGS(gpa), TP_STRUCT__entry( __field(u64, gpa) ), TP_fast_assign( __entry->gpa = gpa; ), TP_printk("fast mmio at gpa 0x%llx", __entry->gpa) ); /* * Tracepoint for cpuid. */ TRACE_EVENT(kvm_cpuid, TP_PROTO(unsigned int function, unsigned int index, unsigned long rax, unsigned long rbx, unsigned long rcx, unsigned long rdx, bool found, bool used_max_basic), TP_ARGS(function, index, rax, rbx, rcx, rdx, found, used_max_basic), TP_STRUCT__entry( __field( unsigned int, function ) __field( unsigned int, index ) __field( unsigned long, rax ) __field( unsigned long, rbx ) __field( unsigned long, rcx ) __field( unsigned long, rdx ) __field( bool, found ) __field( bool, used_max_basic ) ), TP_fast_assign( __entry->function = function; __entry->index = index; __entry->rax = rax; __entry->rbx = rbx; __entry->rcx = rcx; __entry->rdx = rdx; __entry->found = found; __entry->used_max_basic = used_max_basic; ), TP_printk("func %x idx %x rax %lx rbx %lx rcx %lx rdx %lx, cpuid entry %s%s", __entry->function, __entry->index, __entry->rax, __entry->rbx, __entry->rcx, __entry->rdx, __entry->found ? "found" : "not found", __entry->used_max_basic ? ", used max basic" : "") ); #define kvm_deliver_mode \ {0x0, "Fixed"}, \ {0x1, "LowPrio"}, \ {0x2, "SMI"}, \ {0x3, "Res3"}, \ {0x4, "NMI"}, \ {0x5, "INIT"}, \ {0x6, "SIPI"}, \ {0x7, "ExtINT"} #ifdef CONFIG_KVM_IOAPIC TRACE_EVENT(kvm_ioapic_set_irq, TP_PROTO(__u64 e, int pin, bool coalesced), TP_ARGS(e, pin, coalesced), TP_STRUCT__entry( __field( __u64, e ) __field( int, pin ) __field( bool, coalesced ) ), TP_fast_assign( __entry->e = e; __entry->pin = pin; __entry->coalesced = coalesced; ), TP_printk("pin %u dst %x vec %u (%s|%s|%s%s)%s", __entry->pin, (u8)(__entry->e >> 56), (u8)__entry->e, __print_symbolic((__entry->e >> 8 & 0x7), kvm_deliver_mode), (__entry->e & (1<<11)) ? "logical" : "physical", (__entry->e & (1<<15)) ? "level" : "edge", (__entry->e & (1<<16)) ? "|masked" : "", __entry->coalesced ? " (coalesced)" : "") ); TRACE_EVENT(kvm_ioapic_delayed_eoi_inj, TP_PROTO(__u64 e), TP_ARGS(e), TP_STRUCT__entry( __field( __u64, e ) ), TP_fast_assign( __entry->e = e; ), TP_printk("dst %x vec %u (%s|%s|%s%s)", (u8)(__entry->e >> 56), (u8)__entry->e, __print_symbolic((__entry->e >> 8 & 0x7), kvm_deliver_mode), (__entry->e & (1<<11)) ? "logical" : "physical", (__entry->e & (1<<15)) ? "level" : "edge", (__entry->e & (1<<16)) ? "|masked" : "") ); #endif TRACE_EVENT(kvm_msi_set_irq, TP_PROTO(__u64 address, __u64 data), TP_ARGS(address, data), TP_STRUCT__entry( __field( __u64, address ) __field( __u64, data ) ), TP_fast_assign( __entry->address = address; __entry->data = data; ), TP_printk("dst %llx vec %u (%s|%s|%s%s)", (u8)(__entry->address >> 12) | ((__entry->address >> 32) & 0xffffff00), (u8)__entry->data, __print_symbolic((__entry->data >> 8 & 0x7), kvm_deliver_mode), (__entry->address & (1<<2)) ? "logical" : "physical", (__entry->data & (1<<15)) ? "level" : "edge", (__entry->address & (1<<3)) ? "|rh" : "") ); #define AREG(x) { APIC_##x, "APIC_" #x } #define kvm_trace_symbol_apic \ AREG(ID), AREG(LVR), AREG(TASKPRI), AREG(ARBPRI), AREG(PROCPRI), \ AREG(EOI), AREG(RRR), AREG(LDR), AREG(DFR), AREG(SPIV), AREG(ISR), \ AREG(TMR), AREG(IRR), AREG(ESR), AREG(ICR), AREG(ICR2), AREG(LVTT), \ AREG(LVTTHMR), AREG(LVTPC), AREG(LVT0), AREG(LVT1), AREG(LVTERR), \ AREG(TMICT), AREG(TMCCT), AREG(TDCR), AREG(SELF_IPI), AREG(EFEAT), \ AREG(ECTRL) /* * Tracepoint for apic access. */ TRACE_EVENT(kvm_apic, TP_PROTO(unsigned int rw, unsigned int reg, u64 val), TP_ARGS(rw, reg, val), TP_STRUCT__entry( __field( unsigned int, rw ) __field( unsigned int, reg ) __field( u64, val ) ), TP_fast_assign( __entry->rw = rw; __entry->reg = reg; __entry->val = val; ), TP_printk("apic_%s %s = 0x%llx", __entry->rw ? "write" : "read", __print_symbolic(__entry->reg, kvm_trace_symbol_apic), __entry->val) ); #define trace_kvm_apic_read(reg, val) trace_kvm_apic(0, reg, val) #define trace_kvm_apic_write(reg, val) trace_kvm_apic(1, reg, val) #define KVM_ISA_VMX 1 #define KVM_ISA_SVM 2 #define kvm_print_exit_reason(exit_reason, isa) \ (isa == KVM_ISA_VMX) ? \ __print_symbolic(exit_reason & 0xffff, VMX_EXIT_REASONS) : \ __print_symbolic_u64(exit_reason, SVM_EXIT_REASONS), \ (isa == KVM_ISA_VMX && exit_reason & ~0xffff) ? " " : "", \ (isa == KVM_ISA_VMX) ? \ __print_flags_u64(exit_reason & ~0xffff, " ", VMX_EXIT_REASON_FLAGS) : "" #define TRACE_EVENT_KVM_EXIT(name) \ TRACE_EVENT(name, \ TP_PROTO(struct kvm_vcpu *vcpu, u32 isa), \ TP_ARGS(vcpu, isa), \ \ TP_STRUCT__entry( \ __field( unsigned int, exit_reason ) \ __field( unsigned long, guest_rip ) \ __field( u32, isa ) \ __field( u64, info1 ) \ __field( u64, info2 ) \ __field( u32, intr_info ) \ __field( u32, error_code ) \ __field( unsigned int, vcpu_id ) \ __field( u64, requests ) \ ), \ \ TP_fast_assign( \ __entry->guest_rip = tracing_kvm_rip_read(vcpu); \ __entry->isa = isa; \ __entry->vcpu_id = vcpu->vcpu_id; \ __entry->requests = READ_ONCE(vcpu->requests); \ kvm_x86_call(get_exit_info)(vcpu, \ &__entry->exit_reason, \ &__entry->info1, \ &__entry->info2, \ &__entry->intr_info, \ &__entry->error_code); \ ), \ \ TP_printk("vcpu %u reason %s%s%s rip 0x%lx info1 0x%016llx " \ "info2 0x%016llx intr_info 0x%08x error_code 0x%08x " \ "requests 0x%016llx", \ __entry->vcpu_id, \ kvm_print_exit_reason(__entry->exit_reason, __entry->isa), \ __entry->guest_rip, __entry->info1, __entry->info2, \ __entry->intr_info, __entry->error_code, \ __entry->requests) \ ) /* * Tracepoint for kvm guest exit: */ TRACE_EVENT_KVM_EXIT(kvm_exit); /* * Tracepoint for kvm interrupt injection: */ TRACE_EVENT(kvm_inj_virq, TP_PROTO(unsigned int vector, bool soft, bool reinjected), TP_ARGS(vector, soft, reinjected), TP_STRUCT__entry( __field( unsigned int, vector ) __field( bool, soft ) __field( bool, reinjected ) ), TP_fast_assign( __entry->vector = vector; __entry->soft = soft; __entry->reinjected = reinjected; ), TP_printk("%s 0x%x%s", __entry->soft ? "Soft/INTn" : "IRQ", __entry->vector, __entry->reinjected ? " [reinjected]" : "") ); #define EXS(x) { x##_VECTOR, "#" #x } #define kvm_trace_sym_exc \ EXS(DE), EXS(DB), EXS(BP), EXS(OF), EXS(BR), EXS(UD), EXS(NM), \ EXS(DF), EXS(TS), EXS(NP), EXS(SS), EXS(GP), EXS(PF), EXS(MF), \ EXS(AC), EXS(MC), EXS(XM), EXS(VE), EXS(CP), \ EXS(HV), EXS(VC), EXS(SX) /* * Tracepoint for kvm interrupt injection: */ TRACE_EVENT(kvm_inj_exception, TP_PROTO(unsigned exception, bool has_error, unsigned error_code, bool reinjected), TP_ARGS(exception, has_error, error_code, reinjected), TP_STRUCT__entry( __field( u8, exception ) __field( u8, has_error ) __field( u32, error_code ) __field( bool, reinjected ) ), TP_fast_assign( __entry->exception = exception; __entry->has_error = has_error; __entry->error_code = error_code; __entry->reinjected = reinjected; ), TP_printk("%s%s%s%s%s", __print_symbolic(__entry->exception, kvm_trace_sym_exc), !__entry->has_error ? "" : " (", !__entry->has_error ? "" : __print_symbolic(__entry->error_code, { }), !__entry->has_error ? "" : ")", __entry->reinjected ? " [reinjected]" : "") ); /* * Tracepoint for page fault. */ TRACE_EVENT(kvm_page_fault, TP_PROTO(struct kvm_vcpu *vcpu, u64 fault_address, u64 error_code), TP_ARGS(vcpu, fault_address, error_code), TP_STRUCT__entry( __field( unsigned int, vcpu_id ) __field( unsigned long, guest_rip ) __field( u64, fault_address ) __field( u64, error_code ) ), TP_fast_assign( __entry->vcpu_id = vcpu->vcpu_id; __entry->guest_rip = tracing_kvm_rip_read(vcpu); __entry->fault_address = fault_address; __entry->error_code = error_code; ), TP_printk("vcpu %u rip 0x%lx address 0x%016llx error_code 0x%llx", __entry->vcpu_id, __entry->guest_rip, __entry->fault_address, __entry->error_code) ); /* * Tracepoint for guest MSR access. */ TRACE_EVENT(kvm_msr, TP_PROTO(unsigned write, u32 ecx, u64 data, bool exception), TP_ARGS(write, ecx, data, exception), TP_STRUCT__entry( __field( unsigned, write ) __field( u32, ecx ) __field( u64, data ) __field( u8, exception ) ), TP_fast_assign( __entry->write = write; __entry->ecx = ecx; __entry->data = data; __entry->exception = exception; ), TP_printk("msr_%s %x = 0x%llx%s", __entry->write ? "write" : "read", __entry->ecx, __entry->data, __entry->exception ? " (#GP)" : "") ); #define trace_kvm_msr_read(ecx, data) trace_kvm_msr(0, ecx, data, false) #define trace_kvm_msr_write(ecx, data) trace_kvm_msr(1, ecx, data, false) #define trace_kvm_msr_read_ex(ecx) trace_kvm_msr(0, ecx, 0, true) #define trace_kvm_msr_write_ex(ecx, data) trace_kvm_msr(1, ecx, data, true) /* * Tracepoint for guest CR access. */ TRACE_EVENT(kvm_cr, TP_PROTO(unsigned int rw, unsigned int cr, unsigned long val), TP_ARGS(rw, cr, val), TP_STRUCT__entry( __field( unsigned int, rw ) __field( unsigned int, cr ) __field( unsigned long, val ) ), TP_fast_assign( __entry->rw = rw; __entry->cr = cr; __entry->val = val; ), TP_printk("cr_%s %x = 0x%lx", __entry->rw ? "write" : "read", __entry->cr, __entry->val) ); #define trace_kvm_cr_read(cr, val) trace_kvm_cr(0, cr, val) #define trace_kvm_cr_write(cr, val) trace_kvm_cr(1, cr, val) TRACE_EVENT(kvm_pic_set_irq, TP_PROTO(__u8 chip, __u8 pin, __u8 elcr, __u8 imr, bool coalesced), TP_ARGS(chip, pin, elcr, imr, coalesced), TP_STRUCT__entry( __field( __u8, chip ) __field( __u8, pin ) __field( __u8, elcr ) __field( __u8, imr ) __field( bool, coalesced ) ), TP_fast_assign( __entry->chip = chip; __entry->pin = pin; __entry->elcr = elcr; __entry->imr = imr; __entry->coalesced = coalesced; ), TP_printk("chip %u pin %u (%s%s)%s", __entry->chip, __entry->pin, (__entry->elcr & (1 << __entry->pin)) ? "level":"edge", (__entry->imr & (1 << __entry->pin)) ? "|masked":"", __entry->coalesced ? " (coalesced)" : "") ); #define kvm_apic_dst_shorthand \ {0x0, "dst"}, \ {0x1, "self"}, \ {0x2, "all"}, \ {0x3, "all-but-self"} TRACE_EVENT(kvm_apic_ipi, TP_PROTO(__u32 icr_low, __u32 dest_id), TP_ARGS(icr_low, dest_id), TP_STRUCT__entry( __field( __u32, icr_low ) __field( __u32, dest_id ) ), TP_fast_assign( __entry->icr_low = icr_low; __entry->dest_id = dest_id; ), TP_printk("dst %x vec %u (%s|%s|%s|%s|%s)", __entry->dest_id, (u8)__entry->icr_low, __print_symbolic((__entry->icr_low >> 8 & 0x7), kvm_deliver_mode), (__entry->icr_low & (1<<11)) ? "logical" : "physical", (__entry->icr_low & (1<<14)) ? "assert" : "de-assert", (__entry->icr_low & (1<<15)) ? "level" : "edge", __print_symbolic((__entry->icr_low >> 18 & 0x3), kvm_apic_dst_shorthand)) ); TRACE_EVENT(kvm_apic_accept_irq, TP_PROTO(__u32 apicid, __u16 dm, __u16 tm, __u8 vec), TP_ARGS(apicid, dm, tm, vec), TP_STRUCT__entry( __field( __u32, apicid ) __field( __u16, dm ) __field( __u16, tm ) __field( __u8, vec ) ), TP_fast_assign( __entry->apicid = apicid; __entry->dm = dm; __entry->tm = tm; __entry->vec = vec; ), TP_printk("apicid %x vec %u (%s|%s)", __entry->apicid, __entry->vec, __print_symbolic((__entry->dm >> 8 & 0x7), kvm_deliver_mode), __entry->tm ? "level" : "edge") ); TRACE_EVENT(kvm_eoi, TP_PROTO(struct kvm_lapic *apic, int vector), TP_ARGS(apic, vector), TP_STRUCT__entry( __field( __u32, apicid ) __field( int, vector ) ), TP_fast_assign( __entry->apicid = apic->vcpu->vcpu_id; __entry->vector = vector; ), TP_printk("apicid %x vector %d", __entry->apicid, __entry->vector) ); TRACE_EVENT(kvm_pv_eoi, TP_PROTO(struct kvm_lapic *apic, int vector), TP_ARGS(apic, vector), TP_STRUCT__entry( __field( __u32, apicid ) __field( int, vector ) ), TP_fast_assign( __entry->apicid = apic->vcpu->vcpu_id; __entry->vector = vector; ), TP_printk("apicid %x vector %d", __entry->apicid, __entry->vector) ); /* * Tracepoint for nested VMRUN */ TRACE_EVENT(kvm_nested_vmenter, TP_PROTO(__u64 rip, __u64 vmcb, __u64 nested_rip, __u32 int_ctl, __u32 event_inj, bool tdp_enabled, __u64 guest_tdp_pgd, __u64 guest_cr3, __u32 isa), TP_ARGS(rip, vmcb, nested_rip, int_ctl, event_inj, tdp_enabled, guest_tdp_pgd, guest_cr3, isa), TP_STRUCT__entry( __field( __u64, rip ) __field( __u64, vmcb ) __field( __u64, nested_rip ) __field( __u32, int_ctl ) __field( __u32, event_inj ) __field( bool, tdp_enabled ) __field( __u64, guest_pgd ) __field( __u32, isa ) ), TP_fast_assign( __entry->rip = rip; __entry->vmcb = vmcb; __entry->nested_rip = nested_rip; __entry->int_ctl = int_ctl; __entry->event_inj = event_inj; __entry->tdp_enabled = tdp_enabled; __entry->guest_pgd = tdp_enabled ? guest_tdp_pgd : guest_cr3; __entry->isa = isa; ), TP_printk("rip: 0x%016llx %s: 0x%016llx nested_rip: 0x%016llx " "int_ctl: 0x%08x event_inj: 0x%08x nested_%s=%s %s: 0x%016llx", __entry->rip, __entry->isa == KVM_ISA_VMX ? "vmcs" : "vmcb", __entry->vmcb, __entry->nested_rip, __entry->int_ctl, __entry->event_inj, __entry->isa == KVM_ISA_VMX ? "ept" : "npt", __entry->tdp_enabled ? "y" : "n", !__entry->tdp_enabled ? "guest_cr3" : __entry->isa == KVM_ISA_VMX ? "nested_eptp" : "nested_cr3", __entry->guest_pgd) ); TRACE_EVENT(kvm_nested_intercepts, TP_PROTO(__u16 cr_read, __u16 cr_write, __u32 exceptions, __u32 intercept1, __u32 intercept2, __u32 intercept3), TP_ARGS(cr_read, cr_write, exceptions, intercept1, intercept2, intercept3), TP_STRUCT__entry( __field( __u16, cr_read ) __field( __u16, cr_write ) __field( __u32, exceptions ) __field( __u32, intercept1 ) __field( __u32, intercept2 ) __field( __u32, intercept3 ) ), TP_fast_assign( __entry->cr_read = cr_read; __entry->cr_write = cr_write; __entry->exceptions = exceptions; __entry->intercept1 = intercept1; __entry->intercept2 = intercept2; __entry->intercept3 = intercept3; ), TP_printk("cr_read: %04x cr_write: %04x excp: %08x " "intercepts: %08x %08x %08x", __entry->cr_read, __entry->cr_write, __entry->exceptions, __entry->intercept1, __entry->intercept2, __entry->intercept3) ); /* * Tracepoint for #VMEXIT while nested */ TRACE_EVENT_KVM_EXIT(kvm_nested_vmexit); /* * Tracepoint for #VMEXIT reinjected to the guest */ TRACE_EVENT(kvm_nested_vmexit_inject, TP_PROTO(__u64 exit_code, __u64 exit_info1, __u64 exit_info2, __u32 exit_int_info, __u32 exit_int_info_err, __u32 isa), TP_ARGS(exit_code, exit_info1, exit_info2, exit_int_info, exit_int_info_err, isa), TP_STRUCT__entry( __field( __u32, exit_code ) __field( __u64, exit_info1 ) __field( __u64, exit_info2 ) __field( __u32, exit_int_info ) __field( __u32, exit_int_info_err ) __field( __u32, isa ) ), TP_fast_assign( __entry->exit_code = exit_code; __entry->exit_info1 = exit_info1; __entry->exit_info2 = exit_info2; __entry->exit_int_info = exit_int_info; __entry->exit_int_info_err = exit_int_info_err; __entry->isa = isa; ), TP_printk("reason: %s%s%s ext_inf1: 0x%016llx " "ext_inf2: 0x%016llx ext_int: 0x%08x ext_int_err: 0x%08x", kvm_print_exit_reason(__entry->exit_code, __entry->isa), __entry->exit_info1, __entry->exit_info2, __entry->exit_int_info, __entry->exit_int_info_err) ); /* * Tracepoint for nested #vmexit because of interrupt pending */ TRACE_EVENT(kvm_nested_intr_vmexit, TP_PROTO(__u64 rip), TP_ARGS(rip), TP_STRUCT__entry( __field( __u64, rip ) ), TP_fast_assign( __entry->rip = rip ), TP_printk("rip: 0x%016llx", __entry->rip) ); /* * Tracepoint for nested #vmexit because of interrupt pending */ TRACE_EVENT(kvm_invlpga, TP_PROTO(__u64 rip, unsigned int asid, u64 address), TP_ARGS(rip, asid, address), TP_STRUCT__entry( __field( __u64, rip ) __field( unsigned int, asid ) __field( __u64, address ) ), TP_fast_assign( __entry->rip = rip; __entry->asid = asid; __entry->address = address; ), TP_printk("rip: 0x%016llx asid: %u address: 0x%016llx", __entry->rip, __entry->asid, __entry->address) ); /* * Tracepoint for nested #vmexit because of interrupt pending */ TRACE_EVENT(kvm_skinit, TP_PROTO(__u64 rip, __u32 slb), TP_ARGS(rip, slb), TP_STRUCT__entry( __field( __u64, rip ) __field( __u32, slb ) ), TP_fast_assign( __entry->rip = rip; __entry->slb = slb; ), TP_printk("rip: 0x%016llx slb: 0x%08x", __entry->rip, __entry->slb) ); #define KVM_EMUL_INSN_F_CR0_PE (1 << 0) #define KVM_EMUL_INSN_F_EFL_VM (1 << 1) #define KVM_EMUL_INSN_F_CS_D (1 << 2) #define KVM_EMUL_INSN_F_CS_L (1 << 3) #define kvm_trace_symbol_emul_flags \ { 0, "real" }, \ { KVM_EMUL_INSN_F_CR0_PE \ | KVM_EMUL_INSN_F_EFL_VM, "vm16" }, \ { KVM_EMUL_INSN_F_CR0_PE, "prot16" }, \ { KVM_EMUL_INSN_F_CR0_PE \ | KVM_EMUL_INSN_F_CS_D, "prot32" }, \ { KVM_EMUL_INSN_F_CR0_PE \ | KVM_EMUL_INSN_F_CS_L, "prot64" } #define kei_decode_mode(mode) ({ \ u8 flags = 0xff; \ switch (mode) { \ case X86EMUL_MODE_REAL: \ flags = 0; \ break; \ case X86EMUL_MODE_VM86: \ flags = KVM_EMUL_INSN_F_EFL_VM; \ break; \ case X86EMUL_MODE_PROT16: \ flags = KVM_EMUL_INSN_F_CR0_PE; \ break; \ case X86EMUL_MODE_PROT32: \ flags = KVM_EMUL_INSN_F_CR0_PE \ | KVM_EMUL_INSN_F_CS_D; \ break; \ case X86EMUL_MODE_PROT64: \ flags = KVM_EMUL_INSN_F_CR0_PE \ | KVM_EMUL_INSN_F_CS_L; \ break; \ } \ flags; \ }) TRACE_EVENT(kvm_emulate_insn, TP_PROTO(struct kvm_vcpu *vcpu, __u8 failed), TP_ARGS(vcpu, failed), TP_STRUCT__entry( __field( __u64, rip ) __field( __u32, csbase ) __field( __u8, len ) __array( __u8, insn, X86_MAX_INSTRUCTION_LENGTH ) __field( __u8, flags ) __field( __u8, failed ) ), TP_fast_assign( __entry->csbase = kvm_x86_call(get_segment_base)(vcpu, VCPU_SREG_CS); __entry->len = vcpu->arch.emulate_ctxt->fetch.ptr - vcpu->arch.emulate_ctxt->fetch.data; __entry->rip = vcpu->arch.emulate_ctxt->_eip - __entry->len; memcpy(__entry->insn, vcpu->arch.emulate_ctxt->fetch.data, X86_MAX_INSTRUCTION_LENGTH); __entry->flags = kei_decode_mode(vcpu->arch.emulate_ctxt->mode); __entry->failed = failed; ), TP_printk("%x:%llx:%s (%s)%s", __entry->csbase, __entry->rip, __print_hex(__entry->insn, __entry->len), __print_symbolic(__entry->flags, kvm_trace_symbol_emul_flags), __entry->failed ? " failed" : "" ) ); #define trace_kvm_emulate_insn_start(vcpu) trace_kvm_emulate_insn(vcpu, 0) #define trace_kvm_emulate_insn_failed(vcpu) trace_kvm_emulate_insn(vcpu, 1) TRACE_EVENT( vcpu_match_mmio, TP_PROTO(gva_t gva, gpa_t gpa, bool write, bool gpa_match), TP_ARGS(gva, gpa, write, gpa_match), TP_STRUCT__entry( __field(gva_t, gva) __field(gpa_t, gpa) __field(bool, write) __field(bool, gpa_match) ), TP_fast_assign( __entry->gva = gva; __entry->gpa = gpa; __entry->write = write; __entry->gpa_match = gpa_match ), TP_printk("gva %#lx gpa %#llx %s %s", __entry->gva, __entry->gpa, __entry->write ? "Write" : "Read", __entry->gpa_match ? "GPA" : "GVA") ); TRACE_EVENT(kvm_write_tsc_offset, TP_PROTO(unsigned int vcpu_id, __u64 previous_tsc_offset, __u64 next_tsc_offset), TP_ARGS(vcpu_id, previous_tsc_offset, next_tsc_offset), TP_STRUCT__entry( __field( unsigned int, vcpu_id ) __field( __u64, previous_tsc_offset ) __field( __u64, next_tsc_offset ) ), TP_fast_assign( __entry->vcpu_id = vcpu_id; __entry->previous_tsc_offset = previous_tsc_offset; __entry->next_tsc_offset = next_tsc_offset; ), TP_printk("vcpu=%u prev=%llu next=%llu", __entry->vcpu_id, __entry->previous_tsc_offset, __entry->next_tsc_offset) ); #ifdef CONFIG_X86_64 #define host_clocks \ {VDSO_CLOCKMODE_NONE, "none"}, \ {VDSO_CLOCKMODE_TSC, "tsc"} \ TRACE_EVENT(kvm_update_master_clock, TP_PROTO(bool use_master_clock, unsigned int host_clock, bool offset_matched), TP_ARGS(use_master_clock, host_clock, offset_matched), TP_STRUCT__entry( __field( bool, use_master_clock ) __field( unsigned int, host_clock ) __field( bool, offset_matched ) ), TP_fast_assign( __entry->use_master_clock = use_master_clock; __entry->host_clock = host_clock; __entry->offset_matched = offset_matched; ), TP_printk("masterclock %d hostclock %s offsetmatched %u", __entry->use_master_clock, __print_symbolic(__entry->host_clock, host_clocks), __entry->offset_matched) ); TRACE_EVENT(kvm_track_tsc, TP_PROTO(unsigned int vcpu_id, unsigned int nr_matched, unsigned int online_vcpus, bool use_master_clock, unsigned int host_clock), TP_ARGS(vcpu_id, nr_matched, online_vcpus, use_master_clock, host_clock), TP_STRUCT__entry( __field( unsigned int, vcpu_id ) __field( unsigned int, nr_vcpus_matched_tsc ) __field( unsigned int, online_vcpus ) __field( bool, use_master_clock ) __field( unsigned int, host_clock ) ), TP_fast_assign( __entry->vcpu_id = vcpu_id; __entry->nr_vcpus_matched_tsc = nr_matched; __entry->online_vcpus = online_vcpus; __entry->use_master_clock = use_master_clock; __entry->host_clock = host_clock; ), TP_printk("vcpu_id %u masterclock %u offsetmatched %u nr_online %u" " hostclock %s", __entry->vcpu_id, __entry->use_master_clock, __entry->nr_vcpus_matched_tsc, __entry->online_vcpus, __print_symbolic(__entry->host_clock, host_clocks)) ); #endif /* CONFIG_X86_64 */ /* * Tracepoint for PML full VMEXIT. */ TRACE_EVENT(kvm_pml_full, TP_PROTO(unsigned int vcpu_id), TP_ARGS(vcpu_id), TP_STRUCT__entry( __field( unsigned int, vcpu_id ) ), TP_fast_assign( __entry->vcpu_id = vcpu_id; ), TP_printk("vcpu %d: PML full", __entry->vcpu_id) ); TRACE_EVENT(kvm_ple_window_update, TP_PROTO(unsigned int vcpu_id, unsigned int new, unsigned int old), TP_ARGS(vcpu_id, new, old), TP_STRUCT__entry( __field( unsigned int, vcpu_id ) __field( unsigned int, new ) __field( unsigned int, old ) ), TP_fast_assign( __entry->vcpu_id = vcpu_id; __entry->new = new; __entry->old = old; ), TP_printk("vcpu %u old %u new %u (%s)", __entry->vcpu_id, __entry->old, __entry->new, __entry->old < __entry->new ? "growed" : "shrinked") ); TRACE_EVENT(kvm_pvclock_update, TP_PROTO(unsigned int vcpu_id, struct pvclock_vcpu_time_info *pvclock), TP_ARGS(vcpu_id, pvclock), TP_STRUCT__entry( __field( unsigned int, vcpu_id ) __field( __u32, version ) __field( __u64, tsc_timestamp ) __field( __u64, system_time ) __field( __u32, tsc_to_system_mul ) __field( __s8, tsc_shift ) __field( __u8, flags ) ), TP_fast_assign( __entry->vcpu_id = vcpu_id; __entry->version = pvclock->version; __entry->tsc_timestamp = pvclock->tsc_timestamp; __entry->system_time = pvclock->system_time; __entry->tsc_to_system_mul = pvclock->tsc_to_system_mul; __entry->tsc_shift = pvclock->tsc_shift; __entry->flags = pvclock->flags; ), TP_printk("vcpu_id %u, pvclock { version %u, tsc_timestamp 0x%llx, " "system_time 0x%llx, tsc_to_system_mul 0x%x, tsc_shift %d, " "flags 0x%x }", __entry->vcpu_id, __entry->version, __entry->tsc_timestamp, __entry->system_time, __entry->tsc_to_system_mul, __entry->tsc_shift, __entry->flags) ); TRACE_EVENT(kvm_wait_lapic_expire, TP_PROTO(unsigned int vcpu_id, s64 delta), TP_ARGS(vcpu_id, delta), TP_STRUCT__entry( __field( unsigned int, vcpu_id ) __field( s64, delta ) ), TP_fast_assign( __entry->vcpu_id = vcpu_id; __entry->delta = delta; ), TP_printk("vcpu %u: delta %lld (%s)", __entry->vcpu_id, __entry->delta, __entry->delta < 0 ? "early" : "late") ); TRACE_EVENT(kvm_smm_transition, TP_PROTO(unsigned int vcpu_id, u64 smbase, bool entering), TP_ARGS(vcpu_id, smbase, entering), TP_STRUCT__entry( __field( unsigned int, vcpu_id ) __field( u64, smbase ) __field( bool, entering ) ), TP_fast_assign( __entry->vcpu_id = vcpu_id; __entry->smbase = smbase; __entry->entering = entering; |