| 9 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 | /* * Copyright (c) 2016 Intel Corporation * * Permission to use, copy, modify, distribute, and sell this software and its * documentation for any purpose is hereby granted without fee, provided that * the above copyright notice appear in all copies and that both that copyright * notice and this permission notice appear in supporting documentation, and * that the name of the copyright holders not be used in advertising or * publicity pertaining to distribution of the software without specific, * written prior permission. The copyright holders make no representations * about the suitability of this software for any purpose. It is provided "as * is" without express or implied warranty. * * THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, * INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO * EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY SPECIAL, INDIRECT OR * CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, * DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER * TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE * OF THIS SOFTWARE. */ #include <drm/drm_atomic_helper.h> #include <drm/drm_client_event.h> #include <drm/drm_fourcc.h> #include <drm/drm_framebuffer.h> #include <drm/drm_modeset_helper.h> #include <drm/drm_plane_helper.h> #include <drm/drm_print.h> #include <drm/drm_probe_helper.h> /** * DOC: aux kms helpers * * This helper library contains various one-off functions which don't really fit * anywhere else in the DRM modeset helper library. */ /** * drm_helper_move_panel_connectors_to_head() - move panels to the front in the * connector list * @dev: drm device to operate on * * Some userspace presumes that the first connected connector is the main * display, where it's supposed to display e.g. the login screen. For * laptops, this should be the main panel. Use this function to sort all * (eDP/LVDS/DSI) panels to the front of the connector list, instead of * painstakingly trying to initialize them in the right order. 
*/ void drm_helper_move_panel_connectors_to_head(struct drm_device *dev) { struct drm_connector *connector, *tmp; struct list_head panel_list; INIT_LIST_HEAD(&panel_list); spin_lock_irq(&dev->mode_config.connector_list_lock); list_for_each_entry_safe(connector, tmp, &dev->mode_config.connector_list, head) { if (connector->connector_type == DRM_MODE_CONNECTOR_LVDS || connector->connector_type == DRM_MODE_CONNECTOR_eDP || connector->connector_type == DRM_MODE_CONNECTOR_DSI) list_move_tail(&connector->head, &panel_list); } list_splice(&panel_list, &dev->mode_config.connector_list); spin_unlock_irq(&dev->mode_config.connector_list_lock); } EXPORT_SYMBOL(drm_helper_move_panel_connectors_to_head); /** * drm_helper_mode_fill_fb_struct - fill out framebuffer metadata * @dev: DRM device * @fb: drm_framebuffer object to fill out * @mode_cmd: metadata from the userspace fb creation request * * This helper can be used in a drivers fb_create callback to pre-fill the fb's * metadata fields. */ void drm_helper_mode_fill_fb_struct(struct drm_device *dev, struct drm_framebuffer *fb, const struct drm_mode_fb_cmd2 *mode_cmd) { int i; fb->dev = dev; fb->format = drm_get_format_info(dev, mode_cmd); fb->width = mode_cmd->width; fb->height = mode_cmd->height; for (i = 0; i < 4; i++) { fb->pitches[i] = mode_cmd->pitches[i]; fb->offsets[i] = mode_cmd->offsets[i]; } fb->modifier = mode_cmd->modifier[0]; fb->flags = mode_cmd->flags; } EXPORT_SYMBOL(drm_helper_mode_fill_fb_struct); /* * This is the minimal list of formats that seem to be safe for modeset use * with all current DRM drivers. Most hardware can actually support more * formats than this and drivers may specify a more accurate list when * creating the primary plane. */ static const uint32_t safe_modeset_formats[] = { DRM_FORMAT_XRGB8888, DRM_FORMAT_ARGB8888, }; static const struct drm_plane_funcs primary_plane_funcs = { DRM_PLANE_NON_ATOMIC_FUNCS, }; /** * drm_crtc_init - Legacy CRTC initialization function * @dev: DRM device * @crtc: CRTC object to init * @funcs: callbacks for the new CRTC * * Initialize a CRTC object with a default helper-provided primary plane and no * cursor plane. * * Note that we make some assumptions about hardware limitations that may not be * true for all hardware: * * 1. Primary plane cannot be repositioned. * 2. Primary plane cannot be scaled. * 3. Primary plane must cover the entire CRTC. * 4. Subpixel positioning is not supported. * 5. The primary plane must always be on if the CRTC is enabled. * * This is purely a backwards compatibility helper for old drivers. Drivers * should instead implement their own primary plane. Atomic drivers must do so. * Drivers with the above hardware restriction can look into using &struct * drm_simple_display_pipe, which encapsulates the above limitations into a nice * interface. * * Returns: * Zero on success, error code on failure. */ int drm_crtc_init(struct drm_device *dev, struct drm_crtc *crtc, const struct drm_crtc_funcs *funcs) { struct drm_plane *primary; int ret; /* possible_crtc's will be filled in later by crtc_init */ primary = __drm_universal_plane_alloc(dev, sizeof(*primary), 0, 0, &primary_plane_funcs, safe_modeset_formats, ARRAY_SIZE(safe_modeset_formats), NULL, DRM_PLANE_TYPE_PRIMARY, NULL); if (IS_ERR(primary)) return PTR_ERR(primary); /* * Remove the format_default field from drm_plane when dropping * this helper. 
 */
	primary->format_default = true;

	ret = drm_crtc_init_with_planes(dev, crtc, primary, NULL, funcs, NULL);
	if (ret)
		goto err_drm_plane_cleanup;

	return 0;

err_drm_plane_cleanup:
	drm_plane_cleanup(primary);
	kfree(primary);
	return ret;
}
EXPORT_SYMBOL(drm_crtc_init);

/**
 * drm_mode_config_helper_suspend - Modeset suspend helper
 * @dev: DRM device
 *
 * This helper function takes care of suspending the modeset side. It disables
 * output polling if initialized, suspends fbdev if used and finally calls
 * drm_atomic_helper_suspend().
 * If suspending fails, fbdev and polling are re-enabled.
 *
 * Returns:
 * Zero on success, negative error code on error.
 *
 * See also:
 * drm_kms_helper_poll_disable() and drm_client_dev_suspend().
 */
int drm_mode_config_helper_suspend(struct drm_device *dev)
{
	struct drm_atomic_state *state;

	if (!dev)
		return 0;

	/*
	 * Don't disable polling if it was never initialized
	 */
	if (dev->mode_config.poll_enabled)
		drm_kms_helper_poll_disable(dev);

	drm_client_dev_suspend(dev, false);
	state = drm_atomic_helper_suspend(dev);
	if (IS_ERR(state)) {
		drm_client_dev_resume(dev, false);

		/*
		 * Don't enable polling if it was never initialized
		 */
		if (dev->mode_config.poll_enabled)
			drm_kms_helper_poll_enable(dev);

		return PTR_ERR(state);
	}

	dev->mode_config.suspend_state = state;

	return 0;
}
EXPORT_SYMBOL(drm_mode_config_helper_suspend);

/**
 * drm_mode_config_helper_resume - Modeset resume helper
 * @dev: DRM device
 *
 * This helper function takes care of resuming the modeset side. It calls
 * drm_atomic_helper_resume(), resumes fbdev if used and enables output polling
 * if initialized.
 *
 * Returns:
 * Zero on success, negative error code on error.
 *
 * See also:
 * drm_client_dev_resume() and drm_kms_helper_poll_enable().
 */
int drm_mode_config_helper_resume(struct drm_device *dev)
{
	int ret;

	if (!dev)
		return 0;

	if (WARN_ON(!dev->mode_config.suspend_state))
		return -EINVAL;

	ret = drm_atomic_helper_resume(dev, dev->mode_config.suspend_state);
	if (ret)
		DRM_ERROR("Failed to resume (%d)\n", ret);
	dev->mode_config.suspend_state = NULL;

	drm_client_dev_resume(dev, false);

	/*
	 * Don't enable polling if it is not initialized
	 */
	if (dev->mode_config.poll_enabled)
		drm_kms_helper_poll_enable(dev);

	return ret;
}
EXPORT_SYMBOL(drm_mode_config_helper_resume);
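The two suspend/resume helpers above are normally wired straight into a driver's system-sleep callbacks. The following is a minimal sketch of that pattern, not code from this file: the "foo" driver name, foo_pm_suspend()/foo_pm_resume() and the drvdata plumbing are hypothetical, while drm_mode_config_helper_suspend()/_resume() and DEFINE_SIMPLE_DEV_PM_OPS() are the existing kernel interfaces being exercised.

#include <linux/device.h>
#include <linux/pm.h>
#include <drm/drm_drv.h>
#include <drm/drm_modeset_helper.h>

/* Sketch only: hypothetical "foo" DRM driver using the helpers above. */
static int foo_pm_suspend(struct device *dev)
{
	struct drm_device *drm = dev_get_drvdata(dev);

	/* Stops polling, suspends DRM clients (fbdev) and saves atomic state. */
	return drm_mode_config_helper_suspend(drm);
}

static int foo_pm_resume(struct device *dev)
{
	struct drm_device *drm = dev_get_drvdata(dev);

	/* Restores the saved state, resumes clients and re-enables polling. */
	return drm_mode_config_helper_resume(drm);
}

static DEFINE_SIMPLE_DEV_PM_OPS(foo_pm_ops, foo_pm_suspend, foo_pm_resume);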
| 7 7 7 5 5 6 6 6 6 1 6 6 6 2 1 6 5 4 6 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 | // SPDX-License-Identifier: GPL-2.0-or-later /* * Copyright (C) 2011 Intel Corporation. All rights reserved. 
*/ #define pr_fmt(fmt) "llcp: %s: " fmt, __func__ #include <linux/init.h> #include <linux/kernel.h> #include <linux/module.h> #include <linux/nfc.h> #include <net/nfc/nfc.h> #include "nfc.h" #include "llcp.h" static const u8 llcp_tlv_length[LLCP_TLV_MAX] = { 0, 1, /* VERSION */ 2, /* MIUX */ 2, /* WKS */ 1, /* LTO */ 1, /* RW */ 0, /* SN */ 1, /* OPT */ 0, /* SDREQ */ 2, /* SDRES */ }; static u8 llcp_tlv8(const u8 *tlv, u8 type) { if (tlv[0] != type || tlv[1] != llcp_tlv_length[tlv[0]]) return 0; return tlv[2]; } static u16 llcp_tlv16(const u8 *tlv, u8 type) { if (tlv[0] != type || tlv[1] != llcp_tlv_length[tlv[0]]) return 0; return be16_to_cpu(*((__be16 *)(tlv + 2))); } static u8 llcp_tlv_version(const u8 *tlv) { return llcp_tlv8(tlv, LLCP_TLV_VERSION); } static u16 llcp_tlv_miux(const u8 *tlv) { return llcp_tlv16(tlv, LLCP_TLV_MIUX) & 0x7ff; } static u16 llcp_tlv_wks(const u8 *tlv) { return llcp_tlv16(tlv, LLCP_TLV_WKS); } static u16 llcp_tlv_lto(const u8 *tlv) { return llcp_tlv8(tlv, LLCP_TLV_LTO); } static u8 llcp_tlv_opt(const u8 *tlv) { return llcp_tlv8(tlv, LLCP_TLV_OPT); } static u8 llcp_tlv_rw(const u8 *tlv) { return llcp_tlv8(tlv, LLCP_TLV_RW) & 0xf; } u8 *nfc_llcp_build_tlv(u8 type, const u8 *value, u8 value_length, u8 *tlv_length) { u8 *tlv, length; pr_debug("type %d\n", type); if (type >= LLCP_TLV_MAX) return NULL; length = llcp_tlv_length[type]; if (length == 0 && value_length == 0) return NULL; else if (length == 0) length = value_length; *tlv_length = 2 + length; tlv = kzalloc(2 + length, GFP_KERNEL); if (tlv == NULL) return tlv; tlv[0] = type; tlv[1] = length; memcpy(tlv + 2, value, length); return tlv; } struct nfc_llcp_sdp_tlv *nfc_llcp_build_sdres_tlv(u8 tid, u8 sap) { struct nfc_llcp_sdp_tlv *sdres; u8 value[2]; sdres = kzalloc(sizeof(struct nfc_llcp_sdp_tlv), GFP_KERNEL); if (sdres == NULL) return NULL; value[0] = tid; value[1] = sap; sdres->tlv = nfc_llcp_build_tlv(LLCP_TLV_SDRES, value, 2, &sdres->tlv_len); if (sdres->tlv == NULL) { kfree(sdres); return NULL; } sdres->tid = tid; sdres->sap = sap; INIT_HLIST_NODE(&sdres->node); return sdres; } struct nfc_llcp_sdp_tlv *nfc_llcp_build_sdreq_tlv(u8 tid, const char *uri, size_t uri_len) { struct nfc_llcp_sdp_tlv *sdreq; pr_debug("uri: %s, len: %zu\n", uri, uri_len); /* sdreq->tlv_len is u8, takes uri_len, + 3 for header, + 1 for NULL */ if (WARN_ON_ONCE(uri_len > U8_MAX - 4)) return NULL; sdreq = kzalloc(sizeof(struct nfc_llcp_sdp_tlv), GFP_KERNEL); if (sdreq == NULL) return NULL; sdreq->tlv_len = uri_len + 3; if (uri[uri_len - 1] == 0) sdreq->tlv_len--; sdreq->tlv = kzalloc(sdreq->tlv_len + 1, GFP_KERNEL); if (sdreq->tlv == NULL) { kfree(sdreq); return NULL; } sdreq->tlv[0] = LLCP_TLV_SDREQ; sdreq->tlv[1] = sdreq->tlv_len - 2; sdreq->tlv[2] = tid; sdreq->tid = tid; sdreq->uri = sdreq->tlv + 3; memcpy(sdreq->uri, uri, uri_len); sdreq->time = jiffies; INIT_HLIST_NODE(&sdreq->node); return sdreq; } void nfc_llcp_free_sdp_tlv(struct nfc_llcp_sdp_tlv *sdp) { kfree(sdp->tlv); kfree(sdp); } void nfc_llcp_free_sdp_tlv_list(struct hlist_head *head) { struct nfc_llcp_sdp_tlv *sdp; struct hlist_node *n; hlist_for_each_entry_safe(sdp, n, head, node) { hlist_del(&sdp->node); nfc_llcp_free_sdp_tlv(sdp); } } int nfc_llcp_parse_gb_tlv(struct nfc_llcp_local *local, const u8 *tlv_array, u16 tlv_array_len) { const u8 *tlv = tlv_array; u8 type, length, offset = 0; pr_debug("TLV array length %d\n", tlv_array_len); if (local == NULL) return -ENODEV; while (offset < tlv_array_len) { type = tlv[0]; length = tlv[1]; pr_debug("type 0x%x length 
%d\n", type, length); switch (type) { case LLCP_TLV_VERSION: local->remote_version = llcp_tlv_version(tlv); break; case LLCP_TLV_MIUX: local->remote_miu = llcp_tlv_miux(tlv) + 128; break; case LLCP_TLV_WKS: local->remote_wks = llcp_tlv_wks(tlv); break; case LLCP_TLV_LTO: local->remote_lto = llcp_tlv_lto(tlv) * 10; break; case LLCP_TLV_OPT: local->remote_opt = llcp_tlv_opt(tlv); break; default: pr_err("Invalid gt tlv value 0x%x\n", type); break; } offset += length + 2; tlv += length + 2; } pr_debug("version 0x%x miu %d lto %d opt 0x%x wks 0x%x\n", local->remote_version, local->remote_miu, local->remote_lto, local->remote_opt, local->remote_wks); return 0; } int nfc_llcp_parse_connection_tlv(struct nfc_llcp_sock *sock, const u8 *tlv_array, u16 tlv_array_len) { const u8 *tlv = tlv_array; u8 type, length, offset = 0; pr_debug("TLV array length %d\n", tlv_array_len); if (sock == NULL) return -ENOTCONN; while (offset < tlv_array_len) { type = tlv[0]; length = tlv[1]; pr_debug("type 0x%x length %d\n", type, length); switch (type) { case LLCP_TLV_MIUX: sock->remote_miu = llcp_tlv_miux(tlv) + 128; break; case LLCP_TLV_RW: sock->remote_rw = llcp_tlv_rw(tlv); break; case LLCP_TLV_SN: break; default: pr_err("Invalid gt tlv value 0x%x\n", type); break; } offset += length + 2; tlv += length + 2; } pr_debug("sock %p rw %d miu %d\n", sock, sock->remote_rw, sock->remote_miu); return 0; } static struct sk_buff *llcp_add_header(struct sk_buff *pdu, u8 dsap, u8 ssap, u8 ptype) { u8 header[2]; pr_debug("ptype 0x%x dsap 0x%x ssap 0x%x\n", ptype, dsap, ssap); header[0] = (u8)((dsap << 2) | (ptype >> 2)); header[1] = (u8)((ptype << 6) | ssap); pr_debug("header 0x%x 0x%x\n", header[0], header[1]); skb_put_data(pdu, header, LLCP_HEADER_SIZE); return pdu; } static struct sk_buff *llcp_add_tlv(struct sk_buff *pdu, const u8 *tlv, u8 tlv_length) { /* XXX Add an skb length check */ if (tlv == NULL) return NULL; skb_put_data(pdu, tlv, tlv_length); return pdu; } static struct sk_buff *llcp_allocate_pdu(struct nfc_llcp_sock *sock, u8 cmd, u16 size) { struct sk_buff *skb; int err; if (sock->ssap == 0) return NULL; skb = nfc_alloc_send_skb(sock->dev, &sock->sk, MSG_DONTWAIT, size + LLCP_HEADER_SIZE, &err); if (skb == NULL) { pr_err("Could not allocate PDU\n"); return NULL; } skb = llcp_add_header(skb, sock->dsap, sock->ssap, cmd); return skb; } int nfc_llcp_send_disconnect(struct nfc_llcp_sock *sock) { struct sk_buff *skb; struct nfc_dev *dev; struct nfc_llcp_local *local; local = sock->local; if (local == NULL) return -ENODEV; dev = sock->dev; if (dev == NULL) return -ENODEV; skb = llcp_allocate_pdu(sock, LLCP_PDU_DISC, 0); if (skb == NULL) return -ENOMEM; skb_queue_tail(&local->tx_queue, skb); return 0; } int nfc_llcp_send_symm(struct nfc_dev *dev) { struct sk_buff *skb; struct nfc_llcp_local *local; u16 size = 0; int err; local = nfc_llcp_find_local(dev); if (local == NULL) return -ENODEV; size += LLCP_HEADER_SIZE; size += dev->tx_headroom + dev->tx_tailroom + NFC_HEADER_SIZE; skb = alloc_skb(size, GFP_KERNEL); if (skb == NULL) { err = -ENOMEM; goto out; } skb_reserve(skb, dev->tx_headroom + NFC_HEADER_SIZE); skb = llcp_add_header(skb, 0, 0, LLCP_PDU_SYMM); __net_timestamp(skb); nfc_llcp_send_to_raw_sock(local, skb, NFC_DIRECTION_TX); err = nfc_data_exchange(dev, local->target_idx, skb, nfc_llcp_recv, local); out: nfc_llcp_local_put(local); return err; } int nfc_llcp_send_connect(struct nfc_llcp_sock *sock) { struct nfc_llcp_local *local; struct sk_buff *skb; const u8 *service_name_tlv = NULL; const u8 *miux_tlv = NULL; 
const u8 *rw_tlv = NULL; u8 service_name_tlv_length = 0; u8 miux_tlv_length, rw_tlv_length, rw; int err; u16 size = 0; __be16 miux; local = sock->local; if (local == NULL) return -ENODEV; if (sock->service_name != NULL) { service_name_tlv = nfc_llcp_build_tlv(LLCP_TLV_SN, sock->service_name, sock->service_name_len, &service_name_tlv_length); if (!service_name_tlv) { err = -ENOMEM; goto error_tlv; } size += service_name_tlv_length; } /* If the socket parameters are not set, use the local ones */ miux = be16_to_cpu(sock->miux) > LLCP_MAX_MIUX ? local->miux : sock->miux; rw = sock->rw > LLCP_MAX_RW ? local->rw : sock->rw; miux_tlv = nfc_llcp_build_tlv(LLCP_TLV_MIUX, (u8 *)&miux, 0, &miux_tlv_length); if (!miux_tlv) { err = -ENOMEM; goto error_tlv; } size += miux_tlv_length; rw_tlv = nfc_llcp_build_tlv(LLCP_TLV_RW, &rw, 0, &rw_tlv_length); if (!rw_tlv) { err = -ENOMEM; goto error_tlv; } size += rw_tlv_length; pr_debug("SKB size %d SN length %zu\n", size, sock->service_name_len); skb = llcp_allocate_pdu(sock, LLCP_PDU_CONNECT, size); if (skb == NULL) { err = -ENOMEM; goto error_tlv; } llcp_add_tlv(skb, service_name_tlv, service_name_tlv_length); llcp_add_tlv(skb, miux_tlv, miux_tlv_length); llcp_add_tlv(skb, rw_tlv, rw_tlv_length); skb_queue_tail(&local->tx_queue, skb); err = 0; error_tlv: if (err) pr_err("error %d\n", err); kfree(service_name_tlv); kfree(miux_tlv); kfree(rw_tlv); return err; } int nfc_llcp_send_cc(struct nfc_llcp_sock *sock) { struct nfc_llcp_local *local; struct sk_buff *skb; const u8 *miux_tlv = NULL; const u8 *rw_tlv = NULL; u8 miux_tlv_length, rw_tlv_length, rw; int err; u16 size = 0; __be16 miux; local = sock->local; if (local == NULL) return -ENODEV; /* If the socket parameters are not set, use the local ones */ miux = be16_to_cpu(sock->miux) > LLCP_MAX_MIUX ? local->miux : sock->miux; rw = sock->rw > LLCP_MAX_RW ? 
local->rw : sock->rw; miux_tlv = nfc_llcp_build_tlv(LLCP_TLV_MIUX, (u8 *)&miux, 0, &miux_tlv_length); if (!miux_tlv) { err = -ENOMEM; goto error_tlv; } size += miux_tlv_length; rw_tlv = nfc_llcp_build_tlv(LLCP_TLV_RW, &rw, 0, &rw_tlv_length); if (!rw_tlv) { err = -ENOMEM; goto error_tlv; } size += rw_tlv_length; skb = llcp_allocate_pdu(sock, LLCP_PDU_CC, size); if (skb == NULL) { err = -ENOMEM; goto error_tlv; } llcp_add_tlv(skb, miux_tlv, miux_tlv_length); llcp_add_tlv(skb, rw_tlv, rw_tlv_length); skb_queue_tail(&local->tx_queue, skb); err = 0; error_tlv: if (err) pr_err("error %d\n", err); kfree(miux_tlv); kfree(rw_tlv); return err; } static struct sk_buff *nfc_llcp_allocate_snl(struct nfc_llcp_local *local, size_t tlv_length) { struct sk_buff *skb; struct nfc_dev *dev; u16 size = 0; if (local == NULL) return ERR_PTR(-ENODEV); dev = local->dev; if (dev == NULL) return ERR_PTR(-ENODEV); size += LLCP_HEADER_SIZE; size += dev->tx_headroom + dev->tx_tailroom + NFC_HEADER_SIZE; size += tlv_length; skb = alloc_skb(size, GFP_KERNEL); if (skb == NULL) return ERR_PTR(-ENOMEM); skb_reserve(skb, dev->tx_headroom + NFC_HEADER_SIZE); skb = llcp_add_header(skb, LLCP_SAP_SDP, LLCP_SAP_SDP, LLCP_PDU_SNL); return skb; } int nfc_llcp_send_snl_sdres(struct nfc_llcp_local *local, struct hlist_head *tlv_list, size_t tlvs_len) { struct nfc_llcp_sdp_tlv *sdp; struct hlist_node *n; struct sk_buff *skb; skb = nfc_llcp_allocate_snl(local, tlvs_len); if (IS_ERR(skb)) return PTR_ERR(skb); hlist_for_each_entry_safe(sdp, n, tlv_list, node) { skb_put_data(skb, sdp->tlv, sdp->tlv_len); hlist_del(&sdp->node); nfc_llcp_free_sdp_tlv(sdp); } skb_queue_tail(&local->tx_queue, skb); return 0; } int nfc_llcp_send_snl_sdreq(struct nfc_llcp_local *local, struct hlist_head *tlv_list, size_t tlvs_len) { struct nfc_llcp_sdp_tlv *sdreq; struct hlist_node *n; struct sk_buff *skb; skb = nfc_llcp_allocate_snl(local, tlvs_len); if (IS_ERR(skb)) return PTR_ERR(skb); mutex_lock(&local->sdreq_lock); if (hlist_empty(&local->pending_sdreqs)) mod_timer(&local->sdreq_timer, jiffies + msecs_to_jiffies(3 * local->remote_lto)); hlist_for_each_entry_safe(sdreq, n, tlv_list, node) { pr_debug("tid %d for %s\n", sdreq->tid, sdreq->uri); skb_put_data(skb, sdreq->tlv, sdreq->tlv_len); hlist_del(&sdreq->node); hlist_add_head(&sdreq->node, &local->pending_sdreqs); } mutex_unlock(&local->sdreq_lock); skb_queue_tail(&local->tx_queue, skb); return 0; } int nfc_llcp_send_dm(struct nfc_llcp_local *local, u8 ssap, u8 dsap, u8 reason) { struct sk_buff *skb; struct nfc_dev *dev; u16 size = 1; /* Reason code */ pr_debug("Sending DM reason 0x%x\n", reason); if (local == NULL) return -ENODEV; dev = local->dev; if (dev == NULL) return -ENODEV; size += LLCP_HEADER_SIZE; size += dev->tx_headroom + dev->tx_tailroom + NFC_HEADER_SIZE; skb = alloc_skb(size, GFP_KERNEL); if (skb == NULL) return -ENOMEM; skb_reserve(skb, dev->tx_headroom + NFC_HEADER_SIZE); skb = llcp_add_header(skb, dsap, ssap, LLCP_PDU_DM); skb_put_data(skb, &reason, 1); skb_queue_head(&local->tx_queue, skb); return 0; } int nfc_llcp_send_i_frame(struct nfc_llcp_sock *sock, struct msghdr *msg, size_t len) { struct sk_buff *pdu; struct sock *sk = &sock->sk; struct nfc_llcp_local *local; size_t frag_len = 0, remaining_len; u8 *msg_data, *msg_ptr; u16 remote_miu; pr_debug("Send I frame len %zd\n", len); local = sock->local; if (local == NULL) return -ENODEV; /* Remote is ready but has not acknowledged our frames */ if((sock->remote_ready && skb_queue_len(&sock->tx_pending_queue) >= sock->remote_rw && 
skb_queue_len(&sock->tx_queue) >= 2 * sock->remote_rw)) { pr_err("Pending queue is full %d frames\n", skb_queue_len(&sock->tx_pending_queue)); return -ENOBUFS; } /* Remote is not ready and we've been queueing enough frames */ if ((!sock->remote_ready && skb_queue_len(&sock->tx_queue) >= 2 * sock->remote_rw)) { pr_err("Tx queue is full %d frames\n", skb_queue_len(&sock->tx_queue)); return -ENOBUFS; } msg_data = kmalloc(len, GFP_USER | __GFP_NOWARN); if (msg_data == NULL) return -ENOMEM; if (memcpy_from_msg(msg_data, msg, len)) { kfree(msg_data); return -EFAULT; } remaining_len = len; msg_ptr = msg_data; do { remote_miu = sock->remote_miu > LLCP_MAX_MIU ? LLCP_DEFAULT_MIU : sock->remote_miu; frag_len = min_t(size_t, remote_miu, remaining_len); pr_debug("Fragment %zd bytes remaining %zd", frag_len, remaining_len); pdu = llcp_allocate_pdu(sock, LLCP_PDU_I, frag_len + LLCP_SEQUENCE_SIZE); if (pdu == NULL) { kfree(msg_data); return -ENOMEM; } skb_put(pdu, LLCP_SEQUENCE_SIZE); if (likely(frag_len > 0)) skb_put_data(pdu, msg_ptr, frag_len); skb_queue_tail(&sock->tx_queue, pdu); lock_sock(sk); nfc_llcp_queue_i_frames(sock); release_sock(sk); remaining_len -= frag_len; msg_ptr += frag_len; } while (remaining_len > 0); kfree(msg_data); return len; } int nfc_llcp_send_ui_frame(struct nfc_llcp_sock *sock, u8 ssap, u8 dsap, struct msghdr *msg, size_t len) { struct sk_buff *pdu; struct nfc_llcp_local *local; size_t frag_len = 0, remaining_len; u8 *msg_ptr, *msg_data; u16 remote_miu; int err; pr_debug("Send UI frame len %zd\n", len); local = sock->local; if (local == NULL) return -ENODEV; msg_data = kmalloc(len, GFP_USER | __GFP_NOWARN); if (msg_data == NULL) return -ENOMEM; if (memcpy_from_msg(msg_data, msg, len)) { kfree(msg_data); return -EFAULT; } remaining_len = len; msg_ptr = msg_data; do { remote_miu = sock->remote_miu > LLCP_MAX_MIU ? local->remote_miu : sock->remote_miu; frag_len = min_t(size_t, remote_miu, remaining_len); pr_debug("Fragment %zd bytes remaining %zd", frag_len, remaining_len); pdu = nfc_alloc_send_skb(sock->dev, &sock->sk, 0, frag_len + LLCP_HEADER_SIZE, &err); if (pdu == NULL) { pr_err("Could not allocate PDU (error=%d)\n", err); len -= remaining_len; if (len == 0) len = err; break; } pdu = llcp_add_header(pdu, dsap, ssap, LLCP_PDU_UI); if (likely(frag_len > 0)) skb_put_data(pdu, msg_ptr, frag_len); /* No need to check for the peer RW for UI frames */ skb_queue_tail(&local->tx_queue, pdu); remaining_len -= frag_len; msg_ptr += frag_len; } while (remaining_len > 0); kfree(msg_data); return len; } int nfc_llcp_send_rr(struct nfc_llcp_sock *sock) { struct sk_buff *skb; struct nfc_llcp_local *local; pr_debug("Send rr nr %d\n", sock->recv_n); local = sock->local; if (local == NULL) return -ENODEV; skb = llcp_allocate_pdu(sock, LLCP_PDU_RR, LLCP_SEQUENCE_SIZE); if (skb == NULL) return -ENOMEM; skb_put(skb, LLCP_SEQUENCE_SIZE); skb->data[2] = sock->recv_n; skb_queue_head(&local->tx_queue, skb); return 0; } |
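As a usage sketch for nfc_llcp_build_tlv() above (illustrative only, the wrapper function below is not part of this file and assumes the file's existing headers): the returned buffer is a plain type/length/value triple, and for fixed-size parameters the length comes from the llcp_tlv_length[] table rather than from the caller.

/* Sketch: build a Receive Window (RW) TLV and show its layout. */
static void llcp_rw_tlv_example(void)
{
	u8 rw = 4;	/* desired receive window */
	u8 tlv_len;
	u8 *tlv;

	/* value_length is 0 because RW has a fixed length in llcp_tlv_length[] */
	tlv = nfc_llcp_build_tlv(LLCP_TLV_RW, &rw, 0, &tlv_len);
	if (!tlv)
		return;

	/*
	 * tlv_len is now 3: tlv[0] = LLCP_TLV_RW (type), tlv[1] = 1 (length),
	 * tlv[2] = 4 (the value copied from rw).
	 */
	kfree(tlv);
}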
| 876 973 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 | /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _BCACHEFS_ERRCODE_H #define _BCACHEFS_ERRCODE_H #define BCH_ERRCODES() \ x(ERANGE, ERANGE_option_too_small) \ x(ERANGE, ERANGE_option_too_big) \ x(EINVAL, mount_option) \ x(BCH_ERR_mount_option, option_name) \ x(BCH_ERR_mount_option, option_value) \ x(BCH_ERR_mount_option, option_not_bool) \ x(ENOMEM, ENOMEM_stripe_buf) \ x(ENOMEM, ENOMEM_replicas_table) \ x(ENOMEM, ENOMEM_cpu_replicas) \ x(ENOMEM, ENOMEM_replicas_gc) \ x(ENOMEM, ENOMEM_disk_groups_validate) \ x(ENOMEM, ENOMEM_disk_groups_to_cpu) \ x(ENOMEM, ENOMEM_mark_snapshot) \ x(ENOMEM, ENOMEM_mark_stripe) \ x(ENOMEM, ENOMEM_mark_stripe_ptr) \ x(ENOMEM, ENOMEM_btree_key_cache_create) \ x(ENOMEM, ENOMEM_btree_key_cache_fill) \ x(ENOMEM, ENOMEM_btree_key_cache_insert) \ x(ENOMEM, ENOMEM_trans_kmalloc) \ x(ENOMEM, ENOMEM_trans_log_msg) \ x(ENOMEM, ENOMEM_do_encrypt) \ x(ENOMEM, ENOMEM_ec_read_extent) \ x(ENOMEM, ENOMEM_ec_stripe_mem_alloc) \ x(ENOMEM, ENOMEM_ec_new_stripe_alloc) \ x(ENOMEM, ENOMEM_fs_btree_cache_init) \ x(ENOMEM, ENOMEM_fs_btree_key_cache_init) \ x(ENOMEM, ENOMEM_fs_counters_init) \ x(ENOMEM, ENOMEM_fs_btree_write_buffer_init) \ x(ENOMEM, ENOMEM_io_clock_init) \ x(ENOMEM, ENOMEM_blacklist_table_init) \ x(ENOMEM, ENOMEM_sb_realloc_injected) \ x(ENOMEM, ENOMEM_sb_bio_realloc) \ x(ENOMEM, ENOMEM_sb_buf_realloc) \ x(ENOMEM, ENOMEM_sb_journal_validate) \ x(ENOMEM, ENOMEM_sb_journal_v2_validate) \ x(ENOMEM, ENOMEM_journal_entry_add) \ x(ENOMEM, ENOMEM_journal_read_buf_realloc) \ x(ENOMEM, ENOMEM_btree_interior_update_worker_init)\ x(ENOMEM, ENOMEM_btree_interior_update_pool_init) \ x(ENOMEM, ENOMEM_bio_read_init) \ x(ENOMEM, ENOMEM_bio_read_split_init) \ x(ENOMEM, ENOMEM_bio_write_init) \ x(ENOMEM, ENOMEM_bio_bounce_pages_init) \ x(ENOMEM, ENOMEM_writepage_bioset_init) \ x(ENOMEM, ENOMEM_dio_read_bioset_init) \ x(ENOMEM, ENOMEM_dio_write_bioset_init) \ x(ENOMEM, ENOMEM_nocow_flush_bioset_init) \ x(ENOMEM, ENOMEM_promote_table_init) \ x(ENOMEM, ENOMEM_compression_bounce_read_init) \ x(ENOMEM, ENOMEM_compression_bounce_write_init) \ x(ENOMEM, ENOMEM_compression_workspace_init) \ x(ENOMEM, ENOMEM_backpointer_mismatches_bitmap) \ x(EIO, compression_workspace_not_initialized) \ x(ENOMEM, ENOMEM_bucket_gens) \ x(ENOMEM, ENOMEM_buckets_nouse) \ x(ENOMEM, ENOMEM_usage_init) \ x(ENOMEM, 
ENOMEM_btree_node_read_all_replicas) \ x(ENOMEM, ENOMEM_btree_node_reclaim) \ x(ENOMEM, ENOMEM_btree_node_mem_alloc) \ x(ENOMEM, ENOMEM_btree_cache_cannibalize_lock) \ x(ENOMEM, ENOMEM_buckets_waiting_for_journal_init)\ x(ENOMEM, ENOMEM_buckets_waiting_for_journal_set) \ x(ENOMEM, ENOMEM_set_nr_journal_buckets) \ x(ENOMEM, ENOMEM_dev_journal_init) \ x(ENOMEM, ENOMEM_journal_pin_fifo) \ x(ENOMEM, ENOMEM_journal_buf) \ x(ENOMEM, ENOMEM_gc_start) \ x(ENOMEM, ENOMEM_gc_alloc_start) \ x(ENOMEM, ENOMEM_gc_reflink_start) \ x(ENOMEM, ENOMEM_gc_gens) \ x(ENOMEM, ENOMEM_gc_repair_key) \ x(ENOMEM, ENOMEM_fsck_extent_ends_at) \ x(ENOMEM, ENOMEM_fsck_add_nlink) \ x(ENOMEM, ENOMEM_journal_key_insert) \ x(ENOMEM, ENOMEM_journal_keys_sort) \ x(ENOMEM, ENOMEM_read_superblock_clean) \ x(ENOMEM, ENOMEM_fs_alloc) \ x(ENOMEM, ENOMEM_fs_name_alloc) \ x(ENOMEM, ENOMEM_fs_other_alloc) \ x(ENOMEM, ENOMEM_dev_alloc) \ x(ENOMEM, ENOMEM_disk_accounting) \ x(ENOMEM, ENOMEM_stripe_head_alloc) \ x(ENOMEM, ENOMEM_journal_read_bucket) \ x(ENOSPC, ENOSPC_disk_reservation) \ x(ENOSPC, ENOSPC_bucket_alloc) \ x(ENOSPC, ENOSPC_disk_label_add) \ x(ENOSPC, ENOSPC_stripe_create) \ x(ENOSPC, ENOSPC_inode_create) \ x(ENOSPC, ENOSPC_str_hash_create) \ x(ENOSPC, ENOSPC_snapshot_create) \ x(ENOSPC, ENOSPC_subvolume_create) \ x(ENOSPC, ENOSPC_sb) \ x(ENOSPC, ENOSPC_sb_journal) \ x(ENOSPC, ENOSPC_sb_journal_seq_blacklist) \ x(ENOSPC, ENOSPC_sb_quota) \ x(ENOSPC, ENOSPC_sb_replicas) \ x(ENOSPC, ENOSPC_sb_members) \ x(ENOSPC, ENOSPC_sb_members_v2) \ x(ENOSPC, ENOSPC_sb_crypt) \ x(ENOSPC, ENOSPC_sb_downgrade) \ x(ENOSPC, ENOSPC_btree_slot) \ x(ENOSPC, ENOSPC_snapshot_tree) \ x(ENOENT, ENOENT_bkey_type_mismatch) \ x(ENOENT, ENOENT_str_hash_lookup) \ x(ENOENT, ENOENT_str_hash_set_must_replace) \ x(ENOENT, ENOENT_inode) \ x(ENOENT, ENOENT_not_subvol) \ x(ENOENT, ENOENT_not_directory) \ x(ENOENT, ENOENT_directory_dead) \ x(ENOENT, ENOENT_subvolume) \ x(ENOENT, ENOENT_snapshot_tree) \ x(ENOENT, ENOENT_dirent_doesnt_match_inode) \ x(ENOENT, ENOENT_dev_not_found) \ x(ENOENT, ENOENT_dev_idx_not_found) \ x(ENOENT, ENOENT_inode_no_backpointer) \ x(ENOENT, ENOENT_no_snapshot_tree_subvol) \ x(ENOTEMPTY, ENOTEMPTY_dir_not_empty) \ x(ENOTEMPTY, ENOTEMPTY_subvol_not_empty) \ x(EEXIST, EEXIST_str_hash_set) \ x(EEXIST, EEXIST_discard_in_flight_add) \ x(EEXIST, EEXIST_subvolume_create) \ x(ENOSPC, open_buckets_empty) \ x(ENOSPC, freelist_empty) \ x(BCH_ERR_freelist_empty, no_buckets_found) \ x(0, transaction_restart) \ x(BCH_ERR_transaction_restart, transaction_restart_fault_inject) \ x(BCH_ERR_transaction_restart, transaction_restart_relock) \ x(BCH_ERR_transaction_restart, transaction_restart_relock_path) \ x(BCH_ERR_transaction_restart, transaction_restart_relock_path_intent) \ x(BCH_ERR_transaction_restart, transaction_restart_relock_after_fill) \ x(BCH_ERR_transaction_restart, transaction_restart_too_many_iters) \ x(BCH_ERR_transaction_restart, transaction_restart_lock_node_reused) \ x(BCH_ERR_transaction_restart, transaction_restart_fill_relock) \ x(BCH_ERR_transaction_restart, transaction_restart_fill_mem_alloc_fail)\ x(BCH_ERR_transaction_restart, transaction_restart_mem_realloced) \ x(BCH_ERR_transaction_restart, transaction_restart_in_traverse_all) \ x(BCH_ERR_transaction_restart, transaction_restart_would_deadlock) \ x(BCH_ERR_transaction_restart, transaction_restart_would_deadlock_write)\ x(BCH_ERR_transaction_restart, transaction_restart_deadlock_recursion_limit)\ x(BCH_ERR_transaction_restart, transaction_restart_upgrade) \ 
x(BCH_ERR_transaction_restart, transaction_restart_key_cache_upgrade) \ x(BCH_ERR_transaction_restart, transaction_restart_key_cache_fill) \ x(BCH_ERR_transaction_restart, transaction_restart_key_cache_raced) \ x(BCH_ERR_transaction_restart, transaction_restart_key_cache_realloced)\ x(BCH_ERR_transaction_restart, transaction_restart_journal_preres_get) \ x(BCH_ERR_transaction_restart, transaction_restart_split_race) \ x(BCH_ERR_transaction_restart, transaction_restart_write_buffer_flush) \ x(BCH_ERR_transaction_restart, transaction_restart_nested) \ x(BCH_ERR_transaction_restart, transaction_restart_commit) \ x(0, no_btree_node) \ x(BCH_ERR_no_btree_node, no_btree_node_relock) \ x(BCH_ERR_no_btree_node, no_btree_node_upgrade) \ x(BCH_ERR_no_btree_node, no_btree_node_drop) \ x(BCH_ERR_no_btree_node, no_btree_node_lock_root) \ x(BCH_ERR_no_btree_node, no_btree_node_up) \ x(BCH_ERR_no_btree_node, no_btree_node_down) \ x(BCH_ERR_no_btree_node, no_btree_node_init) \ x(BCH_ERR_no_btree_node, no_btree_node_cached) \ x(BCH_ERR_no_btree_node, no_btree_node_srcu_reset) \ x(0, btree_insert_fail) \ x(BCH_ERR_btree_insert_fail, btree_insert_btree_node_full) \ x(BCH_ERR_btree_insert_fail, btree_insert_need_mark_replicas) \ x(BCH_ERR_btree_insert_fail, btree_insert_need_journal_res) \ x(BCH_ERR_btree_insert_fail, btree_insert_need_journal_reclaim) \ x(0, backpointer_to_overwritten_btree_node) \ x(0, journal_reclaim_would_deadlock) \ x(EINVAL, fsck) \ x(BCH_ERR_fsck, fsck_fix) \ x(BCH_ERR_fsck, fsck_delete_bkey) \ x(BCH_ERR_fsck, fsck_ignore) \ x(BCH_ERR_fsck, fsck_errors_not_fixed) \ x(BCH_ERR_fsck, fsck_repair_unimplemented) \ x(BCH_ERR_fsck, fsck_repair_impossible) \ x(EINVAL, restart_recovery) \ x(EINVAL, not_in_recovery) \ x(EINVAL, cannot_rewind_recovery) \ x(0, data_update_done) \ x(EINVAL, device_state_not_allowed) \ x(EINVAL, member_info_missing) \ x(EINVAL, mismatched_block_size) \ x(EINVAL, block_size_too_small) \ x(EINVAL, bucket_size_too_small) \ x(EINVAL, device_size_too_small) \ x(EINVAL, device_size_too_big) \ x(EINVAL, device_not_a_member_of_filesystem) \ x(EINVAL, device_has_been_removed) \ x(EINVAL, device_splitbrain) \ x(EINVAL, device_already_online) \ x(EINVAL, insufficient_devices_to_start) \ x(EINVAL, invalid) \ x(EINVAL, internal_fsck_err) \ x(EINVAL, opt_parse_error) \ x(EINVAL, remove_with_metadata_missing_unimplemented)\ x(EINVAL, remove_would_lose_data) \ x(EINVAL, no_resize_with_buckets_nouse) \ x(EINVAL, inode_unpack_error) \ x(EINVAL, varint_decode_error) \ x(EROFS, erofs_trans_commit) \ x(EROFS, erofs_no_writes) \ x(EROFS, erofs_journal_err) \ x(EROFS, erofs_sb_err) \ x(EROFS, erofs_unfixed_errors) \ x(EROFS, erofs_norecovery) \ x(EROFS, erofs_nochanges) \ x(EROFS, insufficient_devices) \ x(0, operation_blocked) \ x(BCH_ERR_operation_blocked, btree_cache_cannibalize_lock_blocked) \ x(BCH_ERR_operation_blocked, journal_res_get_blocked) \ x(BCH_ERR_operation_blocked, journal_preres_get_blocked) \ x(BCH_ERR_operation_blocked, bucket_alloc_blocked) \ x(BCH_ERR_operation_blocked, stripe_alloc_blocked) \ x(BCH_ERR_invalid, invalid_sb) \ x(BCH_ERR_invalid_sb, invalid_sb_magic) \ x(BCH_ERR_invalid_sb, invalid_sb_version) \ x(BCH_ERR_invalid_sb, invalid_sb_features) \ x(BCH_ERR_invalid_sb, invalid_sb_too_big) \ x(BCH_ERR_invalid_sb, invalid_sb_csum_type) \ x(BCH_ERR_invalid_sb, invalid_sb_csum) \ x(BCH_ERR_invalid_sb, invalid_sb_block_size) \ x(BCH_ERR_invalid_sb, invalid_sb_uuid) \ x(BCH_ERR_invalid_sb, invalid_sb_too_many_members) \ x(BCH_ERR_invalid_sb, invalid_sb_dev_idx) \ 
x(BCH_ERR_invalid_sb, invalid_sb_time_precision) \ x(BCH_ERR_invalid_sb, invalid_sb_field_size) \ x(BCH_ERR_invalid_sb, invalid_sb_layout) \ x(BCH_ERR_invalid_sb_layout, invalid_sb_layout_type) \ x(BCH_ERR_invalid_sb_layout, invalid_sb_layout_nr_superblocks) \ x(BCH_ERR_invalid_sb_layout, invalid_sb_layout_superblocks_overlap) \ x(BCH_ERR_invalid_sb_layout, invalid_sb_layout_sb_max_size_bits) \ x(BCH_ERR_invalid_sb, invalid_sb_members_missing) \ x(BCH_ERR_invalid_sb, invalid_sb_members) \ x(BCH_ERR_invalid_sb, invalid_sb_disk_groups) \ x(BCH_ERR_invalid_sb, invalid_sb_replicas) \ x(BCH_ERR_invalid_sb, invalid_replicas_entry) \ x(BCH_ERR_invalid_sb, invalid_sb_journal) \ x(BCH_ERR_invalid_sb, invalid_sb_journal_seq_blacklist) \ x(BCH_ERR_invalid_sb, invalid_sb_crypt) \ x(BCH_ERR_invalid_sb, invalid_sb_clean) \ x(BCH_ERR_invalid_sb, invalid_sb_quota) \ x(BCH_ERR_invalid_sb, invalid_sb_errors) \ x(BCH_ERR_invalid_sb, invalid_sb_opt_compression) \ x(BCH_ERR_invalid_sb, invalid_sb_ext) \ x(BCH_ERR_invalid_sb, invalid_sb_downgrade) \ x(BCH_ERR_invalid, invalid_bkey) \ x(BCH_ERR_operation_blocked, nocow_lock_blocked) \ x(EIO, journal_shutdown) \ x(EIO, journal_flush_err) \ x(EIO, btree_node_read_err) \ x(BCH_ERR_btree_node_read_err, btree_node_read_err_cached) \ x(EIO, sb_not_downgraded) \ x(EIO, btree_node_write_all_failed) \ x(EIO, btree_node_read_error) \ x(EIO, btree_node_read_validate_error) \ x(EIO, btree_need_topology_repair) \ x(EIO, bucket_ref_update) \ x(EIO, trigger_pointer) \ x(EIO, trigger_stripe_pointer) \ x(EIO, metadata_bucket_inconsistency) \ x(EIO, mark_stripe) \ x(EIO, stripe_reconstruct) \ x(EIO, key_type_error) \ x(EIO, no_device_to_read_from) \ x(EIO, missing_indirect_extent) \ x(EIO, invalidate_stripe_to_dev) \ x(EIO, no_encryption_key) \ x(EIO, insufficient_journal_devices) \ x(BCH_ERR_btree_node_read_err, btree_node_read_err_fixable) \ x(BCH_ERR_btree_node_read_err, btree_node_read_err_want_retry) \ x(BCH_ERR_btree_node_read_err, btree_node_read_err_must_retry) \ x(BCH_ERR_btree_node_read_err, btree_node_read_err_bad_node) \ x(BCH_ERR_btree_node_read_err, btree_node_read_err_incompatible) \ x(0, nopromote) \ x(BCH_ERR_nopromote, nopromote_may_not) \ x(BCH_ERR_nopromote, nopromote_already_promoted) \ x(BCH_ERR_nopromote, nopromote_unwritten) \ x(BCH_ERR_nopromote, nopromote_congested) \ x(BCH_ERR_nopromote, nopromote_in_flight) \ x(BCH_ERR_nopromote, nopromote_no_writes) \ x(BCH_ERR_nopromote, nopromote_enomem) \ x(0, invalid_snapshot_node) \ x(0, option_needs_open_fs) \ x(0, remove_disk_accounting_entry) enum bch_errcode { BCH_ERR_START = 2048, #define x(class, err) BCH_ERR_##err, BCH_ERRCODES() #undef x BCH_ERR_MAX }; const char *bch2_err_str(int); bool __bch2_err_matches(int, int); static inline bool _bch2_err_matches(int err, int class) { return err < 0 && __bch2_err_matches(err, class); } #define bch2_err_matches(_err, _class) \ ({ \ BUILD_BUG_ON(!__builtin_constant_p(_class)); \ unlikely(_bch2_err_matches(_err, _class)); \ }) int __bch2_err_class(int); static inline long bch2_err_class(long err) { return err < 0 ? __bch2_err_class(err) : err; } #define BLK_STS_REMOVED ((__force blk_status_t)128) #include <linux/blk_types.h> const char *bch2_blk_status_to_str(blk_status_t); #endif /* _BCACHFES_ERRCODE_H */ |
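A short consumption sketch for the error-code scheme above (the function is hypothetical, not part of the header): a callee returns a specific negative BCH_ERR_* value, callers match it against its parent class with bch2_err_matches(), and bch2_err_class() collapses it back to the standard errno before it leaves the filesystem.

/* Sketch only: how a private bcachefs errcode is matched and demoted. */
static int example_bucket_alloc_error(void)
{
	int ret = -BCH_ERR_ENOSPC_bucket_alloc;	/* pretend an allocation failed */

	/*
	 * bch2_err_matches() follows the x(class, err) parent chain, so the
	 * specific ENOSPC_bucket_alloc code still matches plain ENOSPC.
	 */
	if (bch2_err_matches(ret, ENOSPC))
		pr_debug("allocation failed: %s\n", bch2_err_str(ret));

	/* Convert back to -ENOSPC before handing the error to generic code. */
	return bch2_err_class(ret);
}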
| 8 8 8 8 1 1 1 1 1 1 2 2 2 3 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 | /* SPDX-License-Identifier: GPL-2.0-only */ /* * Copyright (c) 2008, Intel Corporation. * * Author: Alexander Duyck <alexander.h.duyck@intel.com> */ #ifndef __NET_TC_SKBEDIT_H #define __NET_TC_SKBEDIT_H #include <net/act_api.h> #include <linux/tc_act/tc_skbedit.h> struct tcf_skbedit_params { u32 flags; u32 priority; u32 mark; u32 mask; u16 queue_mapping; u16 mapping_mod; u16 ptype; struct rcu_head rcu; }; struct tcf_skbedit { struct tc_action common; struct tcf_skbedit_params __rcu *params; }; #define to_skbedit(a) ((struct tcf_skbedit *)a) /* Return true iff action is the one identified by FLAG. */ static inline bool is_tcf_skbedit_with_flag(const struct tc_action *a, u32 flag) { #ifdef CONFIG_NET_CLS_ACT u32 flags; if (a->ops && a->ops->id == TCA_ID_SKBEDIT) { rcu_read_lock(); flags = rcu_dereference(to_skbedit(a)->params)->flags; rcu_read_unlock(); return flags == flag; } #endif return false; } /* Return true iff action is mark */ static inline bool is_tcf_skbedit_mark(const struct tc_action *a) { return is_tcf_skbedit_with_flag(a, SKBEDIT_F_MARK); } static inline u32 tcf_skbedit_mark(const struct tc_action *a) { u32 mark; rcu_read_lock(); mark = rcu_dereference(to_skbedit(a)->params)->mark; rcu_read_unlock(); return mark; } /* Return true iff action is ptype */ static inline bool is_tcf_skbedit_ptype(const struct tc_action *a) { return is_tcf_skbedit_with_flag(a, SKBEDIT_F_PTYPE); } static inline u32 tcf_skbedit_ptype(const struct tc_action *a) { u16 ptype; rcu_read_lock(); ptype = rcu_dereference(to_skbedit(a)->params)->ptype; rcu_read_unlock(); return ptype; } /* Return true iff action is priority */ static inline bool is_tcf_skbedit_priority(const struct tc_action *a) { return is_tcf_skbedit_with_flag(a, SKBEDIT_F_PRIORITY); } static inline u32 tcf_skbedit_priority(const struct tc_action *a) { u32 priority; rcu_read_lock(); priority = rcu_dereference(to_skbedit(a)->params)->priority; rcu_read_unlock(); return priority; } static inline u16 tcf_skbedit_rx_queue_mapping(const struct tc_action *a) { u16 rx_queue; rcu_read_lock(); rx_queue = rcu_dereference(to_skbedit(a)->params)->queue_mapping; rcu_read_unlock(); return rx_queue; } /* Return true iff action is queue_mapping */ static inline bool is_tcf_skbedit_queue_mapping(const struct tc_action *a) { return is_tcf_skbedit_with_flag(a, SKBEDIT_F_QUEUE_MAPPING); } /* Return true if action is on ingress traffic */ static inline bool is_tcf_skbedit_ingress(u32 flags) { return flags & TCA_ACT_FLAGS_AT_INGRESS; } static inline bool is_tcf_skbedit_tx_queue_mapping(const struct tc_action *a) { return is_tcf_skbedit_queue_mapping(a) && !is_tcf_skbedit_ingress(a->tcfa_flags); } static inline bool is_tcf_skbedit_rx_queue_mapping(const struct tc_action *a) { return is_tcf_skbedit_queue_mapping(a) && is_tcf_skbedit_ingress(a->tcfa_flags); } /* Return true iff action is inheritdsfield */ static inline bool is_tcf_skbedit_inheritdsfield(const struct tc_action *a) { return is_tcf_skbedit_with_flag(a, SKBEDIT_F_INHERITDSFIELD); } #endif /* __NET_TC_SKBEDIT_H */ |
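A hedged sketch of how an offload driver might consume the inline helpers above; the parse function and the hw_set_mark() hook are hypothetical, only is_tcf_skbedit_mark() and tcf_skbedit_mark() come from this header.

/* Sketch only: recognize an "skbedit mark" action and program hardware. */
static int hw_set_mark(u32 mark);	/* hypothetical device-specific hook */

static int example_parse_skbedit(const struct tc_action *act)
{
	if (is_tcf_skbedit_mark(act)) {
		/* Safe to read the mark now that the action type is known. */
		u32 mark = tcf_skbedit_mark(act);

		return hw_set_mark(mark);
	}

	return -EOPNOTSUPP;
}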
2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 2447 2448 2449 | // SPDX-License-Identifier: GPL-2.0-or-later /* * TCP over IPv6 * Linux INET6 implementation * * Authors: * Pedro Roque <roque@di.fc.ul.pt> * * Based on: * linux/net/ipv4/tcp.c * linux/net/ipv4/tcp_input.c * linux/net/ipv4/tcp_output.c * * Fixes: * Hideaki YOSHIFUJI : sin6_scope_id support * YOSHIFUJI Hideaki @USAGI and: Support IPV6_V6ONLY socket option, which * Alexey Kuznetsov allow both IPv4 and IPv6 sockets to bind * a single port at the same time. * YOSHIFUJI Hideaki @USAGI: convert /proc/net/tcp6 to seq_file. */ #include <linux/bottom_half.h> #include <linux/module.h> #include <linux/errno.h> #include <linux/types.h> #include <linux/socket.h> #include <linux/sockios.h> #include <linux/net.h> #include <linux/jiffies.h> #include <linux/in.h> #include <linux/in6.h> #include <linux/netdevice.h> #include <linux/init.h> #include <linux/jhash.h> #include <linux/ipsec.h> #include <linux/times.h> #include <linux/slab.h> #include <linux/uaccess.h> #include <linux/ipv6.h> #include <linux/icmpv6.h> #include <linux/random.h> #include <linux/indirect_call_wrapper.h> #include <net/tcp.h> #include <net/ndisc.h> #include <net/inet6_hashtables.h> #include <net/inet6_connection_sock.h> #include <net/ipv6.h> #include <net/transp_v6.h> #include <net/addrconf.h> #include <net/ip6_route.h> #include <net/ip6_checksum.h> #include <net/inet_ecn.h> #include <net/protocol.h> #include <net/xfrm.h> #include <net/snmp.h> #include <net/dsfield.h> #include <net/timewait_sock.h> #include <net/inet_common.h> #include <net/secure_seq.h> #include <net/hotdata.h> #include <net/busy_poll.h> #include <net/rstreason.h> #include <linux/proc_fs.h> #include <linux/seq_file.h> #include <crypto/hash.h> #include <linux/scatterlist.h> #include <trace/events/tcp.h> static void tcp_v6_send_reset(const struct sock *sk, struct sk_buff *skb, enum sk_rst_reason reason); static void tcp_v6_reqsk_send_ack(const struct sock *sk, struct sk_buff *skb, struct request_sock *req); INDIRECT_CALLABLE_SCOPE int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb); static const struct inet_connection_sock_af_ops ipv6_mapped; const struct inet_connection_sock_af_ops ipv6_specific; #if defined(CONFIG_TCP_MD5SIG) || defined(CONFIG_TCP_AO) static const struct tcp_sock_af_ops tcp_sock_ipv6_specific; static const struct tcp_sock_af_ops tcp_sock_ipv6_mapped_specific; #endif /* Helper returning the inet6 address from a given tcp socket. 
* It can be used in TCP stack instead of inet6_sk(sk). * This avoids a dereference and allow compiler optimizations. * It is a specialized version of inet6_sk_generic(). */ #define tcp_inet6_sk(sk) (&container_of_const(tcp_sk(sk), \ struct tcp6_sock, tcp)->inet6) static void inet6_sk_rx_dst_set(struct sock *sk, const struct sk_buff *skb) { struct dst_entry *dst = skb_dst(skb); if (dst && dst_hold_safe(dst)) { rcu_assign_pointer(sk->sk_rx_dst, dst); sk->sk_rx_dst_ifindex = skb->skb_iif; sk->sk_rx_dst_cookie = rt6_get_cookie(dst_rt6_info(dst)); } } static u32 tcp_v6_init_seq(const struct sk_buff *skb) { return secure_tcpv6_seq(ipv6_hdr(skb)->daddr.s6_addr32, ipv6_hdr(skb)->saddr.s6_addr32, tcp_hdr(skb)->dest, tcp_hdr(skb)->source); } static u32 tcp_v6_init_ts_off(const struct net *net, const struct sk_buff *skb) { return secure_tcpv6_ts_off(net, ipv6_hdr(skb)->daddr.s6_addr32, ipv6_hdr(skb)->saddr.s6_addr32); } static int tcp_v6_pre_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len) { /* This check is replicated from tcp_v6_connect() and intended to * prevent BPF program called below from accessing bytes that are out * of the bound specified by user in addr_len. */ if (addr_len < SIN6_LEN_RFC2133) return -EINVAL; sock_owned_by_me(sk); return BPF_CGROUP_RUN_PROG_INET6_CONNECT(sk, uaddr, &addr_len); } static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len) { struct sockaddr_in6 *usin = (struct sockaddr_in6 *) uaddr; struct inet_connection_sock *icsk = inet_csk(sk); struct in6_addr *saddr = NULL, *final_p, final; struct inet_timewait_death_row *tcp_death_row; struct ipv6_pinfo *np = tcp_inet6_sk(sk); struct inet_sock *inet = inet_sk(sk); struct tcp_sock *tp = tcp_sk(sk); struct net *net = sock_net(sk); struct ipv6_txoptions *opt; struct dst_entry *dst; struct flowi6 fl6; int addr_type; int err; if (addr_len < SIN6_LEN_RFC2133) return -EINVAL; if (usin->sin6_family != AF_INET6) return -EAFNOSUPPORT; memset(&fl6, 0, sizeof(fl6)); if (inet6_test_bit(SNDFLOW, sk)) { fl6.flowlabel = usin->sin6_flowinfo&IPV6_FLOWINFO_MASK; IP6_ECN_flow_init(fl6.flowlabel); if (fl6.flowlabel&IPV6_FLOWLABEL_MASK) { struct ip6_flowlabel *flowlabel; flowlabel = fl6_sock_lookup(sk, fl6.flowlabel); if (IS_ERR(flowlabel)) return -EINVAL; fl6_sock_release(flowlabel); } } /* * connect() to INADDR_ANY means loopback (BSD'ism). */ if (ipv6_addr_any(&usin->sin6_addr)) { if (ipv6_addr_v4mapped(&sk->sk_v6_rcv_saddr)) ipv6_addr_set_v4mapped(htonl(INADDR_LOOPBACK), &usin->sin6_addr); else usin->sin6_addr = in6addr_loopback; } addr_type = ipv6_addr_type(&usin->sin6_addr); if (addr_type & IPV6_ADDR_MULTICAST) return -ENETUNREACH; if (addr_type&IPV6_ADDR_LINKLOCAL) { if (addr_len >= sizeof(struct sockaddr_in6) && usin->sin6_scope_id) { /* If interface is set while binding, indices * must coincide. 
*/ if (!sk_dev_equal_l3scope(sk, usin->sin6_scope_id)) return -EINVAL; sk->sk_bound_dev_if = usin->sin6_scope_id; } /* Connect to link-local address requires an interface */ if (!sk->sk_bound_dev_if) return -EINVAL; } if (tp->rx_opt.ts_recent_stamp && !ipv6_addr_equal(&sk->sk_v6_daddr, &usin->sin6_addr)) { tp->rx_opt.ts_recent = 0; tp->rx_opt.ts_recent_stamp = 0; WRITE_ONCE(tp->write_seq, 0); } sk->sk_v6_daddr = usin->sin6_addr; np->flow_label = fl6.flowlabel; /* * TCP over IPv4 */ if (addr_type & IPV6_ADDR_MAPPED) { u32 exthdrlen = icsk->icsk_ext_hdr_len; struct sockaddr_in sin; if (ipv6_only_sock(sk)) return -ENETUNREACH; sin.sin_family = AF_INET; sin.sin_port = usin->sin6_port; sin.sin_addr.s_addr = usin->sin6_addr.s6_addr32[3]; /* Paired with READ_ONCE() in tcp_(get|set)sockopt() */ WRITE_ONCE(icsk->icsk_af_ops, &ipv6_mapped); if (sk_is_mptcp(sk)) mptcpv6_handle_mapped(sk, true); sk->sk_backlog_rcv = tcp_v4_do_rcv; #if defined(CONFIG_TCP_MD5SIG) || defined(CONFIG_TCP_AO) tp->af_specific = &tcp_sock_ipv6_mapped_specific; #endif err = tcp_v4_connect(sk, (struct sockaddr *)&sin, sizeof(sin)); if (err) { icsk->icsk_ext_hdr_len = exthdrlen; /* Paired with READ_ONCE() in tcp_(get|set)sockopt() */ WRITE_ONCE(icsk->icsk_af_ops, &ipv6_specific); if (sk_is_mptcp(sk)) mptcpv6_handle_mapped(sk, false); sk->sk_backlog_rcv = tcp_v6_do_rcv; #if defined(CONFIG_TCP_MD5SIG) || defined(CONFIG_TCP_AO) tp->af_specific = &tcp_sock_ipv6_specific; #endif goto failure; } np->saddr = sk->sk_v6_rcv_saddr; return err; } if (!ipv6_addr_any(&sk->sk_v6_rcv_saddr)) saddr = &sk->sk_v6_rcv_saddr; fl6.flowi6_proto = IPPROTO_TCP; fl6.daddr = sk->sk_v6_daddr; fl6.saddr = saddr ? *saddr : np->saddr; fl6.flowlabel = ip6_make_flowinfo(np->tclass, np->flow_label); fl6.flowi6_oif = sk->sk_bound_dev_if; fl6.flowi6_mark = sk->sk_mark; fl6.fl6_dport = usin->sin6_port; fl6.fl6_sport = inet->inet_sport; fl6.flowi6_uid = sk->sk_uid; opt = rcu_dereference_protected(np->opt, lockdep_sock_is_held(sk)); final_p = fl6_update_dst(&fl6, opt, &final); security_sk_classify_flow(sk, flowi6_to_flowi_common(&fl6)); dst = ip6_dst_lookup_flow(net, sk, &fl6, final_p); if (IS_ERR(dst)) { err = PTR_ERR(dst); goto failure; } tp->tcp_usec_ts = dst_tcp_usec_ts(dst); tcp_death_row = &sock_net(sk)->ipv4.tcp_death_row; if (!saddr) { saddr = &fl6.saddr; err = inet_bhash2_update_saddr(sk, saddr, AF_INET6); if (err) goto failure; } /* set the source address */ np->saddr = *saddr; inet->inet_rcv_saddr = LOOPBACK4_IPV6; sk->sk_gso_type = SKB_GSO_TCPV6; ip6_dst_store(sk, dst, NULL, NULL); icsk->icsk_ext_hdr_len = 0; if (opt) icsk->icsk_ext_hdr_len = opt->opt_flen + opt->opt_nflen; tp->rx_opt.mss_clamp = IPV6_MIN_MTU - sizeof(struct tcphdr) - sizeof(struct ipv6hdr); inet->inet_dport = usin->sin6_port; tcp_set_state(sk, TCP_SYN_SENT); err = inet6_hash_connect(tcp_death_row, sk); if (err) goto late_failure; sk_set_txhash(sk); if (likely(!tp->repair)) { if (!tp->write_seq) WRITE_ONCE(tp->write_seq, secure_tcpv6_seq(np->saddr.s6_addr32, sk->sk_v6_daddr.s6_addr32, inet->inet_sport, inet->inet_dport)); tp->tsoffset = secure_tcpv6_ts_off(net, np->saddr.s6_addr32, sk->sk_v6_daddr.s6_addr32); } if (tcp_fastopen_defer_connect(sk, &err)) return err; if (err) goto late_failure; err = tcp_connect(sk); if (err) goto late_failure; return 0; late_failure: tcp_set_state(sk, TCP_CLOSE); inet_bhash2_reset_saddr(sk); failure: inet->inet_dport = 0; sk->sk_route_caps = 0; return err; } static void tcp_v6_mtu_reduced(struct sock *sk) { struct dst_entry *dst; u32 mtu; if ((1 << 
sk->sk_state) & (TCPF_LISTEN | TCPF_CLOSE)) return; mtu = READ_ONCE(tcp_sk(sk)->mtu_info); /* Drop requests trying to increase our current mss. * Check done in __ip6_rt_update_pmtu() is too late. */ if (tcp_mtu_to_mss(sk, mtu) >= tcp_sk(sk)->mss_cache) return; dst = inet6_csk_update_pmtu(sk, mtu); if (!dst) return; if (inet_csk(sk)->icsk_pmtu_cookie > dst_mtu(dst)) { tcp_sync_mss(sk, dst_mtu(dst)); tcp_simple_retransmit(sk); } } static int tcp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt, u8 type, u8 code, int offset, __be32 info) { const struct ipv6hdr *hdr = (const struct ipv6hdr *)skb->data; const struct tcphdr *th = (struct tcphdr *)(skb->data+offset); struct net *net = dev_net(skb->dev); struct request_sock *fastopen; struct ipv6_pinfo *np; struct tcp_sock *tp; __u32 seq, snd_una; struct sock *sk; bool fatal; int err; sk = __inet6_lookup_established(net, net->ipv4.tcp_death_row.hashinfo, &hdr->daddr, th->dest, &hdr->saddr, ntohs(th->source), skb->dev->ifindex, inet6_sdif(skb)); if (!sk) { __ICMP6_INC_STATS(net, __in6_dev_get(skb->dev), ICMP6_MIB_INERRORS); return -ENOENT; } if (sk->sk_state == TCP_TIME_WAIT) { /* To increase the counter of ignored icmps for TCP-AO */ tcp_ao_ignore_icmp(sk, AF_INET6, type, code); inet_twsk_put(inet_twsk(sk)); return 0; } seq = ntohl(th->seq); fatal = icmpv6_err_convert(type, code, &err); if (sk->sk_state == TCP_NEW_SYN_RECV) { tcp_req_err(sk, seq, fatal); return 0; } if (tcp_ao_ignore_icmp(sk, AF_INET6, type, code)) { sock_put(sk); return 0; } bh_lock_sock(sk); if (sock_owned_by_user(sk) && type != ICMPV6_PKT_TOOBIG) __NET_INC_STATS(net, LINUX_MIB_LOCKDROPPEDICMPS); if (sk->sk_state == TCP_CLOSE) goto out; if (static_branch_unlikely(&ip6_min_hopcount)) { /* min_hopcount can be changed concurrently from do_ipv6_setsockopt() */ if (ipv6_hdr(skb)->hop_limit < READ_ONCE(tcp_inet6_sk(sk)->min_hopcount)) { __NET_INC_STATS(net, LINUX_MIB_TCPMINTTLDROP); goto out; } } tp = tcp_sk(sk); /* XXX (TFO) - tp->snd_una should be ISN (tcp_create_openreq_child() */ fastopen = rcu_dereference(tp->fastopen_rsk); snd_una = fastopen ? tcp_rsk(fastopen)->snt_isn : tp->snd_una; if (sk->sk_state != TCP_LISTEN && !between(seq, snd_una, tp->snd_nxt)) { __NET_INC_STATS(net, LINUX_MIB_OUTOFWINDOWICMPS); goto out; } np = tcp_inet6_sk(sk); if (type == NDISC_REDIRECT) { if (!sock_owned_by_user(sk)) { struct dst_entry *dst = __sk_dst_check(sk, np->dst_cookie); if (dst) dst->ops->redirect(dst, sk, skb); } goto out; } if (type == ICMPV6_PKT_TOOBIG) { u32 mtu = ntohl(info); /* We are not interested in TCP_LISTEN and open_requests * (SYN-ACKs send out by Linux are always <576bytes so * they should go through unfragmented). */ if (sk->sk_state == TCP_LISTEN) goto out; if (!ip6_sk_accept_pmtu(sk)) goto out; if (mtu < IPV6_MIN_MTU) goto out; WRITE_ONCE(tp->mtu_info, mtu); if (!sock_owned_by_user(sk)) tcp_v6_mtu_reduced(sk); else if (!test_and_set_bit(TCP_MTU_REDUCED_DEFERRED, &sk->sk_tsq_flags)) sock_hold(sk); goto out; } /* Might be for an request_sock */ switch (sk->sk_state) { case TCP_SYN_SENT: case TCP_SYN_RECV: /* Only in fast or simultaneous open. If a fast open socket is * already accepted it is treated as a connected one below. */ if (fastopen && !fastopen->sk) break; ipv6_icmp_error(sk, skb, err, th->dest, ntohl(info), (u8 *)th); if (!sock_owned_by_user(sk)) tcp_done_with_error(sk, err); else WRITE_ONCE(sk->sk_err_soft, err); goto out; case TCP_LISTEN: break; default: /* check if this ICMP message allows revert of backoff. 
* (see RFC 6069) */ if (!fastopen && type == ICMPV6_DEST_UNREACH && code == ICMPV6_NOROUTE) tcp_ld_RTO_revert(sk, seq); } if (!sock_owned_by_user(sk) && inet6_test_bit(RECVERR6, sk)) { WRITE_ONCE(sk->sk_err, err); sk_error_report(sk); } else { WRITE_ONCE(sk->sk_err_soft, err); } out: bh_unlock_sock(sk); sock_put(sk); return 0; } static int tcp_v6_send_synack(const struct sock *sk, struct dst_entry *dst, struct flowi *fl, struct request_sock *req, struct tcp_fastopen_cookie *foc, enum tcp_synack_type synack_type, struct sk_buff *syn_skb) { struct inet_request_sock *ireq = inet_rsk(req); const struct ipv6_pinfo *np = tcp_inet6_sk(sk); struct ipv6_txoptions *opt; struct flowi6 *fl6 = &fl->u.ip6; struct sk_buff *skb; int err = -ENOMEM; u8 tclass; /* First, grab a route. */ if (!dst && (dst = inet6_csk_route_req(sk, fl6, req, IPPROTO_TCP)) == NULL) goto done; skb = tcp_make_synack(sk, dst, req, foc, synack_type, syn_skb); if (skb) { __tcp_v6_send_check(skb, &ireq->ir_v6_loc_addr, &ireq->ir_v6_rmt_addr); fl6->daddr = ireq->ir_v6_rmt_addr; if (inet6_test_bit(REPFLOW, sk) && ireq->pktopts) fl6->flowlabel = ip6_flowlabel(ipv6_hdr(ireq->pktopts)); tclass = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_reflect_tos) ? (tcp_rsk(req)->syn_tos & ~INET_ECN_MASK) | (np->tclass & INET_ECN_MASK) : np->tclass; if (!INET_ECN_is_capable(tclass) && tcp_bpf_ca_needs_ecn((struct sock *)req)) tclass |= INET_ECN_ECT_0; rcu_read_lock(); opt = ireq->ipv6_opt; if (!opt) opt = rcu_dereference(np->opt); err = ip6_xmit(sk, skb, fl6, skb->mark ? : READ_ONCE(sk->sk_mark), opt, tclass, READ_ONCE(sk->sk_priority)); rcu_read_unlock(); err = net_xmit_eval(err); } done: return err; } static void tcp_v6_reqsk_destructor(struct request_sock *req) { kfree(inet_rsk(req)->ipv6_opt); consume_skb(inet_rsk(req)->pktopts); } #ifdef CONFIG_TCP_MD5SIG static struct tcp_md5sig_key *tcp_v6_md5_do_lookup(const struct sock *sk, const struct in6_addr *addr, int l3index) { return tcp_md5_do_lookup(sk, l3index, (union tcp_md5_addr *)addr, AF_INET6); } static struct tcp_md5sig_key *tcp_v6_md5_lookup(const struct sock *sk, const struct sock *addr_sk) { int l3index; l3index = l3mdev_master_ifindex_by_index(sock_net(sk), addr_sk->sk_bound_dev_if); return tcp_v6_md5_do_lookup(sk, &addr_sk->sk_v6_daddr, l3index); } static int tcp_v6_parse_md5_keys(struct sock *sk, int optname, sockptr_t optval, int optlen) { struct tcp_md5sig cmd; struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)&cmd.tcpm_addr; union tcp_ao_addr *addr; int l3index = 0; u8 prefixlen; bool l3flag; u8 flags; if (optlen < sizeof(cmd)) return -EINVAL; if (copy_from_sockptr(&cmd, optval, sizeof(cmd))) return -EFAULT; if (sin6->sin6_family != AF_INET6) return -EINVAL; flags = cmd.tcpm_flags & TCP_MD5SIG_FLAG_IFINDEX; l3flag = cmd.tcpm_flags & TCP_MD5SIG_FLAG_IFINDEX; if (optname == TCP_MD5SIG_EXT && cmd.tcpm_flags & TCP_MD5SIG_FLAG_PREFIX) { prefixlen = cmd.tcpm_prefixlen; if (prefixlen > 128 || (ipv6_addr_v4mapped(&sin6->sin6_addr) && prefixlen > 32)) return -EINVAL; } else { prefixlen = ipv6_addr_v4mapped(&sin6->sin6_addr) ? 
32 : 128; } if (optname == TCP_MD5SIG_EXT && cmd.tcpm_ifindex && cmd.tcpm_flags & TCP_MD5SIG_FLAG_IFINDEX) { struct net_device *dev; rcu_read_lock(); dev = dev_get_by_index_rcu(sock_net(sk), cmd.tcpm_ifindex); if (dev && netif_is_l3_master(dev)) l3index = dev->ifindex; rcu_read_unlock(); /* ok to reference set/not set outside of rcu; * right now device MUST be an L3 master */ if (!dev || !l3index) return -EINVAL; } if (!cmd.tcpm_keylen) { if (ipv6_addr_v4mapped(&sin6->sin6_addr)) return tcp_md5_do_del(sk, (union tcp_md5_addr *)&sin6->sin6_addr.s6_addr32[3], AF_INET, prefixlen, l3index, flags); return tcp_md5_do_del(sk, (union tcp_md5_addr *)&sin6->sin6_addr, AF_INET6, prefixlen, l3index, flags); } if (cmd.tcpm_keylen > TCP_MD5SIG_MAXKEYLEN) return -EINVAL; if (ipv6_addr_v4mapped(&sin6->sin6_addr)) { addr = (union tcp_md5_addr *)&sin6->sin6_addr.s6_addr32[3]; /* Don't allow keys for peers that have a matching TCP-AO key. * See the comment in tcp_ao_add_cmd() */ if (tcp_ao_required(sk, addr, AF_INET, l3flag ? l3index : -1, false)) return -EKEYREJECTED; return tcp_md5_do_add(sk, addr, AF_INET, prefixlen, l3index, flags, cmd.tcpm_key, cmd.tcpm_keylen); } addr = (union tcp_md5_addr *)&sin6->sin6_addr; /* Don't allow keys for peers that have a matching TCP-AO key. * See the comment in tcp_ao_add_cmd() */ if (tcp_ao_required(sk, addr, AF_INET6, l3flag ? l3index : -1, false)) return -EKEYREJECTED; return tcp_md5_do_add(sk, addr, AF_INET6, prefixlen, l3index, flags, cmd.tcpm_key, cmd.tcpm_keylen); } static int tcp_v6_md5_hash_headers(struct tcp_sigpool *hp, const struct in6_addr *daddr, const struct in6_addr *saddr, const struct tcphdr *th, int nbytes) { struct tcp6_pseudohdr *bp; struct scatterlist sg; struct tcphdr *_th; bp = hp->scratch; /* 1. TCP pseudo-header (RFC2460) */ bp->saddr = *saddr; bp->daddr = *daddr; bp->protocol = cpu_to_be32(IPPROTO_TCP); bp->len = cpu_to_be32(nbytes); _th = (struct tcphdr *)(bp + 1); memcpy(_th, th, sizeof(*th)); _th->check = 0; sg_init_one(&sg, bp, sizeof(*bp) + sizeof(*th)); ahash_request_set_crypt(hp->req, &sg, NULL, sizeof(*bp) + sizeof(*th)); return crypto_ahash_update(hp->req); } static int tcp_v6_md5_hash_hdr(char *md5_hash, const struct tcp_md5sig_key *key, const struct in6_addr *daddr, struct in6_addr *saddr, const struct tcphdr *th) { struct tcp_sigpool hp; if (tcp_sigpool_start(tcp_md5_sigpool_id, &hp)) goto clear_hash_nostart; if (crypto_ahash_init(hp.req)) goto clear_hash; if (tcp_v6_md5_hash_headers(&hp, daddr, saddr, th, th->doff << 2)) goto clear_hash; if (tcp_md5_hash_key(&hp, key)) goto clear_hash; ahash_request_set_crypt(hp.req, NULL, md5_hash, 0); if (crypto_ahash_final(hp.req)) goto clear_hash; tcp_sigpool_end(&hp); return 0; clear_hash: tcp_sigpool_end(&hp); clear_hash_nostart: memset(md5_hash, 0, 16); return 1; } static int tcp_v6_md5_hash_skb(char *md5_hash, const struct tcp_md5sig_key *key, const struct sock *sk, const struct sk_buff *skb) { const struct tcphdr *th = tcp_hdr(skb); const struct in6_addr *saddr, *daddr; struct tcp_sigpool hp; if (sk) { /* valid for establish/request sockets */ saddr = &sk->sk_v6_rcv_saddr; daddr = &sk->sk_v6_daddr; } else { const struct ipv6hdr *ip6h = ipv6_hdr(skb); saddr = &ip6h->saddr; daddr = &ip6h->daddr; } if (tcp_sigpool_start(tcp_md5_sigpool_id, &hp)) goto clear_hash_nostart; if (crypto_ahash_init(hp.req)) goto clear_hash; if (tcp_v6_md5_hash_headers(&hp, daddr, saddr, th, skb->len)) goto clear_hash; if (tcp_sigpool_hash_skb_data(&hp, skb, th->doff << 2)) goto clear_hash; if (tcp_md5_hash_key(&hp, 
key)) goto clear_hash; ahash_request_set_crypt(hp.req, NULL, md5_hash, 0); if (crypto_ahash_final(hp.req)) goto clear_hash; tcp_sigpool_end(&hp); return 0; clear_hash: tcp_sigpool_end(&hp); clear_hash_nostart: memset(md5_hash, 0, 16); return 1; } #endif static void tcp_v6_init_req(struct request_sock *req, const struct sock *sk_listener, struct sk_buff *skb, u32 tw_isn) { bool l3_slave = ipv6_l3mdev_skb(TCP_SKB_CB(skb)->header.h6.flags); struct inet_request_sock *ireq = inet_rsk(req); const struct ipv6_pinfo *np = tcp_inet6_sk(sk_listener); ireq->ir_v6_rmt_addr = ipv6_hdr(skb)->saddr; ireq->ir_v6_loc_addr = ipv6_hdr(skb)->daddr; /* So that link locals have meaning */ if ((!sk_listener->sk_bound_dev_if || l3_slave) && ipv6_addr_type(&ireq->ir_v6_rmt_addr) & IPV6_ADDR_LINKLOCAL) ireq->ir_iif = tcp_v6_iif(skb); if (!tw_isn && (ipv6_opt_accepted(sk_listener, skb, &TCP_SKB_CB(skb)->header.h6) || np->rxopt.bits.rxinfo || np->rxopt.bits.rxoinfo || np->rxopt.bits.rxhlim || np->rxopt.bits.rxohlim || inet6_test_bit(REPFLOW, sk_listener))) { refcount_inc(&skb->users); ireq->pktopts = skb; } } static struct dst_entry *tcp_v6_route_req(const struct sock *sk, struct sk_buff *skb, struct flowi *fl, struct request_sock *req, u32 tw_isn) { tcp_v6_init_req(req, sk, skb, tw_isn); if (security_inet_conn_request(sk, skb, req)) return NULL; return inet6_csk_route_req(sk, &fl->u.ip6, req, IPPROTO_TCP); } struct request_sock_ops tcp6_request_sock_ops __read_mostly = { .family = AF_INET6, .obj_size = sizeof(struct tcp6_request_sock), .rtx_syn_ack = tcp_rtx_synack, .send_ack = tcp_v6_reqsk_send_ack, .destructor = tcp_v6_reqsk_destructor, .send_reset = tcp_v6_send_reset, .syn_ack_timeout = tcp_syn_ack_timeout, }; const struct tcp_request_sock_ops tcp_request_sock_ipv6_ops = { .mss_clamp = IPV6_MIN_MTU - sizeof(struct tcphdr) - sizeof(struct ipv6hdr), #ifdef CONFIG_TCP_MD5SIG .req_md5_lookup = tcp_v6_md5_lookup, .calc_md5_hash = tcp_v6_md5_hash_skb, #endif #ifdef CONFIG_TCP_AO .ao_lookup = tcp_v6_ao_lookup_rsk, .ao_calc_key = tcp_v6_ao_calc_key_rsk, .ao_synack_hash = tcp_v6_ao_synack_hash, #endif #ifdef CONFIG_SYN_COOKIES .cookie_init_seq = cookie_v6_init_sequence, #endif .route_req = tcp_v6_route_req, .init_seq = tcp_v6_init_seq, .init_ts_off = tcp_v6_init_ts_off, .send_synack = tcp_v6_send_synack, }; static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32 seq, u32 ack, u32 win, u32 tsval, u32 tsecr, int oif, int rst, u8 tclass, __be32 label, u32 priority, u32 txhash, struct tcp_key *key) { const struct tcphdr *th = tcp_hdr(skb); struct tcphdr *t1; struct sk_buff *buff; struct flowi6 fl6; struct net *net = sk ? sock_net(sk) : dev_net(skb_dst(skb)->dev); struct sock *ctl_sk = net->ipv6.tcp_sk; unsigned int tot_len = sizeof(struct tcphdr); __be32 mrst = 0, *topt; struct dst_entry *dst; __u32 mark = 0; if (tsecr) tot_len += TCPOLEN_TSTAMP_ALIGNED; if (tcp_key_is_md5(key)) tot_len += TCPOLEN_MD5SIG_ALIGNED; if (tcp_key_is_ao(key)) tot_len += tcp_ao_len_aligned(key->ao_key); #ifdef CONFIG_MPTCP if (rst && !tcp_key_is_md5(key)) { mrst = mptcp_reset_option(skb); if (mrst) tot_len += sizeof(__be32); } #endif buff = alloc_skb(MAX_TCP_HEADER, GFP_ATOMIC); if (!buff) return; skb_reserve(buff, MAX_TCP_HEADER); t1 = skb_push(buff, tot_len); skb_reset_transport_header(buff); /* Swap the send and the receive. 
*/ memset(t1, 0, sizeof(*t1)); t1->dest = th->source; t1->source = th->dest; t1->doff = tot_len / 4; t1->seq = htonl(seq); t1->ack_seq = htonl(ack); t1->ack = !rst || !th->ack; t1->rst = rst; t1->window = htons(win); topt = (__be32 *)(t1 + 1); if (tsecr) { *topt++ = htonl((TCPOPT_NOP << 24) | (TCPOPT_NOP << 16) | (TCPOPT_TIMESTAMP << 8) | TCPOLEN_TIMESTAMP); *topt++ = htonl(tsval); *topt++ = htonl(tsecr); } if (mrst) *topt++ = mrst; #ifdef CONFIG_TCP_MD5SIG if (tcp_key_is_md5(key)) { *topt++ = htonl((TCPOPT_NOP << 24) | (TCPOPT_NOP << 16) | (TCPOPT_MD5SIG << 8) | TCPOLEN_MD5SIG); tcp_v6_md5_hash_hdr((__u8 *)topt, key->md5_key, &ipv6_hdr(skb)->saddr, &ipv6_hdr(skb)->daddr, t1); } #endif #ifdef CONFIG_TCP_AO if (tcp_key_is_ao(key)) { *topt++ = htonl((TCPOPT_AO << 24) | (tcp_ao_len(key->ao_key) << 16) | (key->ao_key->sndid << 8) | (key->rcv_next)); tcp_ao_hash_hdr(AF_INET6, (char *)topt, key->ao_key, key->traffic_key, (union tcp_ao_addr *)&ipv6_hdr(skb)->saddr, (union tcp_ao_addr *)&ipv6_hdr(skb)->daddr, t1, key->sne); } #endif memset(&fl6, 0, sizeof(fl6)); fl6.daddr = ipv6_hdr(skb)->saddr; fl6.saddr = ipv6_hdr(skb)->daddr; fl6.flowlabel = label; buff->ip_summed = CHECKSUM_PARTIAL; __tcp_v6_send_check(buff, &fl6.saddr, &fl6.daddr); fl6.flowi6_proto = IPPROTO_TCP; if (rt6_need_strict(&fl6.daddr) && !oif) fl6.flowi6_oif = tcp_v6_iif(skb); else { if (!oif && netif_index_is_l3_master(net, skb->skb_iif)) oif = skb->skb_iif; fl6.flowi6_oif = oif; } if (sk) { /* unconstify the socket only to attach it to buff with care. */ skb_set_owner_edemux(buff, (struct sock *)sk); if (sk->sk_state == TCP_TIME_WAIT) mark = inet_twsk(sk)->tw_mark; else mark = READ_ONCE(sk->sk_mark); skb_set_delivery_time(buff, tcp_transmit_time(sk), SKB_CLOCK_MONOTONIC); } if (txhash) { /* autoflowlabel/skb_get_hash_flowi6 rely on buff->hash */ skb_set_hash(buff, txhash, PKT_HASH_TYPE_L4); } fl6.flowi6_mark = IP6_REPLY_MARK(net, skb->mark) ?: mark; fl6.fl6_dport = t1->dest; fl6.fl6_sport = t1->source; fl6.flowi6_uid = sock_net_uid(net, sk && sk_fullsock(sk) ? sk : NULL); security_skb_classify_flow(skb, flowi6_to_flowi_common(&fl6)); /* Pass a socket to ip6_dst_lookup either it is for RST * Underlying function will use this to retrieve the network * namespace */ if (sk && sk->sk_state != TCP_TIME_WAIT) dst = ip6_dst_lookup_flow(net, sk, &fl6, NULL); /*sk's xfrm_policy can be referred*/ else dst = ip6_dst_lookup_flow(net, ctl_sk, &fl6, NULL); if (!IS_ERR(dst)) { skb_dst_set(buff, dst); ip6_xmit(ctl_sk, buff, &fl6, fl6.flowi6_mark, NULL, tclass & ~INET_ECN_MASK, priority); TCP_INC_STATS(net, TCP_MIB_OUTSEGS); if (rst) TCP_INC_STATS(net, TCP_MIB_OUTRSTS); return; } kfree_skb(buff); } static void tcp_v6_send_reset(const struct sock *sk, struct sk_buff *skb, enum sk_rst_reason reason) { const struct tcphdr *th = tcp_hdr(skb); struct ipv6hdr *ipv6h = ipv6_hdr(skb); const __u8 *md5_hash_location = NULL; #if defined(CONFIG_TCP_MD5SIG) || defined(CONFIG_TCP_AO) bool allocated_traffic_key = false; #endif const struct tcp_ao_hdr *aoh; struct tcp_key key = {}; u32 seq = 0, ack_seq = 0; __be32 label = 0; u32 priority = 0; struct net *net; u32 txhash = 0; int oif = 0; #ifdef CONFIG_TCP_MD5SIG unsigned char newhash[16]; int genhash; struct sock *sk1 = NULL; #endif if (th->rst) return; /* If sk not NULL, it means we did a successful lookup and incoming * route had to be correct. prequeue might have dropped our dst. */ if (!sk && !ipv6_unicast_destination(skb)) return; net = sk ? 
sock_net(sk) : dev_net(skb_dst(skb)->dev); /* Invalid TCP option size or twice included auth */ if (tcp_parse_auth_options(th, &md5_hash_location, &aoh)) return; #if defined(CONFIG_TCP_MD5SIG) || defined(CONFIG_TCP_AO) rcu_read_lock(); #endif #ifdef CONFIG_TCP_MD5SIG if (sk && sk_fullsock(sk)) { int l3index; /* sdif set, means packet ingressed via a device * in an L3 domain and inet_iif is set to it. */ l3index = tcp_v6_sdif(skb) ? tcp_v6_iif_l3_slave(skb) : 0; key.md5_key = tcp_v6_md5_do_lookup(sk, &ipv6h->saddr, l3index); if (key.md5_key) key.type = TCP_KEY_MD5; } else if (md5_hash_location) { int dif = tcp_v6_iif_l3_slave(skb); int sdif = tcp_v6_sdif(skb); int l3index; /* * active side is lost. Try to find listening socket through * source port, and then find md5 key through listening socket. * we are not loose security here: * Incoming packet is checked with md5 hash with finding key, * no RST generated if md5 hash doesn't match. */ sk1 = inet6_lookup_listener(net, net->ipv4.tcp_death_row.hashinfo, NULL, 0, &ipv6h->saddr, th->source, &ipv6h->daddr, ntohs(th->source), dif, sdif); if (!sk1) goto out; /* sdif set, means packet ingressed via a device * in an L3 domain and dif is set to it. */ l3index = tcp_v6_sdif(skb) ? dif : 0; key.md5_key = tcp_v6_md5_do_lookup(sk1, &ipv6h->saddr, l3index); if (!key.md5_key) goto out; key.type = TCP_KEY_MD5; genhash = tcp_v6_md5_hash_skb(newhash, key.md5_key, NULL, skb); if (genhash || memcmp(md5_hash_location, newhash, 16) != 0) goto out; } #endif if (th->ack) seq = ntohl(th->ack_seq); else ack_seq = ntohl(th->seq) + th->syn + th->fin + skb->len - (th->doff << 2); #ifdef CONFIG_TCP_AO if (aoh) { int l3index; l3index = tcp_v6_sdif(skb) ? tcp_v6_iif_l3_slave(skb) : 0; if (tcp_ao_prepare_reset(sk, skb, aoh, l3index, seq, &key.ao_key, &key.traffic_key, &allocated_traffic_key, &key.rcv_next, &key.sne)) goto out; key.type = TCP_KEY_AO; } #endif if (sk) { oif = sk->sk_bound_dev_if; if (sk_fullsock(sk)) { if (inet6_test_bit(REPFLOW, sk)) label = ip6_flowlabel(ipv6h); priority = READ_ONCE(sk->sk_priority); txhash = sk->sk_txhash; } if (sk->sk_state == TCP_TIME_WAIT) { label = cpu_to_be32(inet_twsk(sk)->tw_flowlabel); priority = inet_twsk(sk)->tw_priority; txhash = inet_twsk(sk)->tw_txhash; } } else { if (net->ipv6.sysctl.flowlabel_reflect & FLOWLABEL_REFLECT_TCP_RESET) label = ip6_flowlabel(ipv6h); } trace_tcp_send_reset(sk, skb, reason); tcp_v6_send_response(sk, skb, seq, ack_seq, 0, 0, 0, oif, 1, ipv6_get_dsfield(ipv6h), label, priority, txhash, &key); #if defined(CONFIG_TCP_MD5SIG) || defined(CONFIG_TCP_AO) out: if (allocated_traffic_key) kfree(key.traffic_key); rcu_read_unlock(); #endif } static void tcp_v6_send_ack(const struct sock *sk, struct sk_buff *skb, u32 seq, u32 ack, u32 win, u32 tsval, u32 tsecr, int oif, struct tcp_key *key, u8 tclass, __be32 label, u32 priority, u32 txhash) { tcp_v6_send_response(sk, skb, seq, ack, win, tsval, tsecr, oif, 0, tclass, label, priority, txhash, key); } static void tcp_v6_timewait_ack(struct sock *sk, struct sk_buff *skb) { struct inet_timewait_sock *tw = inet_twsk(sk); struct tcp_timewait_sock *tcptw = tcp_twsk(sk); struct tcp_key key = {}; #ifdef CONFIG_TCP_AO struct tcp_ao_info *ao_info; if (static_branch_unlikely(&tcp_ao_needed.key)) { /* FIXME: the segment to-be-acked is not verified yet */ ao_info = rcu_dereference(tcptw->ao_info); if (ao_info) { const struct tcp_ao_hdr *aoh; /* Invalid TCP option size or twice included auth */ if (tcp_parse_auth_options(tcp_hdr(skb), NULL, &aoh)) goto out; if (aoh) key.ao_key 
= tcp_ao_established_key(sk, ao_info, aoh->rnext_keyid, -1); } } if (key.ao_key) { struct tcp_ao_key *rnext_key; key.traffic_key = snd_other_key(key.ao_key); /* rcv_next switches to our rcv_next */ rnext_key = READ_ONCE(ao_info->rnext_key); key.rcv_next = rnext_key->rcvid; key.sne = READ_ONCE(ao_info->snd_sne); key.type = TCP_KEY_AO; #else if (0) { #endif #ifdef CONFIG_TCP_MD5SIG } else if (static_branch_unlikely(&tcp_md5_needed.key)) { key.md5_key = tcp_twsk_md5_key(tcptw); if (key.md5_key) key.type = TCP_KEY_MD5; #endif } tcp_v6_send_ack(sk, skb, tcptw->tw_snd_nxt, READ_ONCE(tcptw->tw_rcv_nxt), tcptw->tw_rcv_wnd >> tw->tw_rcv_wscale, tcp_tw_tsval(tcptw), READ_ONCE(tcptw->tw_ts_recent), tw->tw_bound_dev_if, &key, tw->tw_tclass, cpu_to_be32(tw->tw_flowlabel), tw->tw_priority, tw->tw_txhash); #ifdef CONFIG_TCP_AO out: #endif inet_twsk_put(tw); } static void tcp_v6_reqsk_send_ack(const struct sock *sk, struct sk_buff *skb, struct request_sock *req) { struct tcp_key key = {}; #ifdef CONFIG_TCP_AO if (static_branch_unlikely(&tcp_ao_needed.key) && tcp_rsk_used_ao(req)) { const struct in6_addr *addr = &ipv6_hdr(skb)->saddr; const struct tcp_ao_hdr *aoh; int l3index; l3index = tcp_v6_sdif(skb) ? tcp_v6_iif_l3_slave(skb) : 0; /* Invalid TCP option size or twice included auth */ if (tcp_parse_auth_options(tcp_hdr(skb), NULL, &aoh)) return; if (!aoh) return; key.ao_key = tcp_ao_do_lookup(sk, l3index, (union tcp_ao_addr *)addr, AF_INET6, aoh->rnext_keyid, -1); if (unlikely(!key.ao_key)) { /* Send ACK with any matching MKT for the peer */ key.ao_key = tcp_ao_do_lookup(sk, l3index, (union tcp_ao_addr *)addr, AF_INET6, -1, -1); /* Matching key disappeared (user removed the key?) * let the handshake timeout. */ if (!key.ao_key) { net_info_ratelimited("TCP-AO key for (%pI6, %d)->(%pI6, %d) suddenly disappeared, won't ACK new connection\n", addr, ntohs(tcp_hdr(skb)->source), &ipv6_hdr(skb)->daddr, ntohs(tcp_hdr(skb)->dest)); return; } } key.traffic_key = kmalloc(tcp_ao_digest_size(key.ao_key), GFP_ATOMIC); if (!key.traffic_key) return; key.type = TCP_KEY_AO; key.rcv_next = aoh->keyid; tcp_v6_ao_calc_key_rsk(key.ao_key, key.traffic_key, req); #else if (0) { #endif #ifdef CONFIG_TCP_MD5SIG } else if (static_branch_unlikely(&tcp_md5_needed.key)) { int l3index = tcp_v6_sdif(skb) ? tcp_v6_iif_l3_slave(skb) : 0; key.md5_key = tcp_v6_md5_do_lookup(sk, &ipv6_hdr(skb)->saddr, l3index); if (key.md5_key) key.type = TCP_KEY_MD5; #endif } /* sk->sk_state == TCP_LISTEN -> for regular TCP_SYN_RECV * sk->sk_state == TCP_SYN_RECV -> for Fast Open. */ tcp_v6_send_ack(sk, skb, (sk->sk_state == TCP_LISTEN) ? 
tcp_rsk(req)->snt_isn + 1 : tcp_sk(sk)->snd_nxt, tcp_rsk(req)->rcv_nxt, tcp_synack_window(req) >> inet_rsk(req)->rcv_wscale, tcp_rsk_tsval(tcp_rsk(req)), READ_ONCE(req->ts_recent), sk->sk_bound_dev_if, &key, ipv6_get_dsfield(ipv6_hdr(skb)), 0, READ_ONCE(sk->sk_priority), READ_ONCE(tcp_rsk(req)->txhash)); if (tcp_key_is_ao(&key)) kfree(key.traffic_key); } static struct sock *tcp_v6_cookie_check(struct sock *sk, struct sk_buff *skb) { #ifdef CONFIG_SYN_COOKIES const struct tcphdr *th = tcp_hdr(skb); if (!th->syn) sk = cookie_v6_check(sk, skb); #endif return sk; } u16 tcp_v6_get_syncookie(struct sock *sk, struct ipv6hdr *iph, struct tcphdr *th, u32 *cookie) { u16 mss = 0; #ifdef CONFIG_SYN_COOKIES mss = tcp_get_syncookie_mss(&tcp6_request_sock_ops, &tcp_request_sock_ipv6_ops, sk, th); if (mss) { *cookie = __cookie_v6_init_sequence(iph, th, &mss); tcp_synq_overflow(sk); } #endif return mss; } static int tcp_v6_conn_request(struct sock *sk, struct sk_buff *skb) { if (skb->protocol == htons(ETH_P_IP)) return tcp_v4_conn_request(sk, skb); if (!ipv6_unicast_destination(skb)) goto drop; if (ipv6_addr_v4mapped(&ipv6_hdr(skb)->saddr)) { __IP6_INC_STATS(sock_net(sk), NULL, IPSTATS_MIB_INHDRERRORS); return 0; } return tcp_conn_request(&tcp6_request_sock_ops, &tcp_request_sock_ipv6_ops, sk, skb); drop: tcp_listendrop(sk); return 0; /* don't send reset */ } static void tcp_v6_restore_cb(struct sk_buff *skb) { /* We need to move header back to the beginning if xfrm6_policy_check() * and tcp_v6_fill_cb() are going to be called again. * ip6_datagram_recv_specific_ctl() also expects IP6CB to be there. */ memmove(IP6CB(skb), &TCP_SKB_CB(skb)->header.h6, sizeof(struct inet6_skb_parm)); } static struct sock *tcp_v6_syn_recv_sock(const struct sock *sk, struct sk_buff *skb, struct request_sock *req, struct dst_entry *dst, struct request_sock *req_unhash, bool *own_req) { struct inet_request_sock *ireq; struct ipv6_pinfo *newnp; const struct ipv6_pinfo *np = tcp_inet6_sk(sk); struct ipv6_txoptions *opt; struct inet_sock *newinet; bool found_dup_sk = false; struct tcp_sock *newtp; struct sock *newsk; #ifdef CONFIG_TCP_MD5SIG struct tcp_md5sig_key *key; int l3index; #endif struct flowi6 fl6; if (skb->protocol == htons(ETH_P_IP)) { /* * v6 mapped */ newsk = tcp_v4_syn_recv_sock(sk, skb, req, dst, req_unhash, own_req); if (!newsk) return NULL; inet_sk(newsk)->pinet6 = tcp_inet6_sk(newsk); newnp = tcp_inet6_sk(newsk); newtp = tcp_sk(newsk); memcpy(newnp, np, sizeof(struct ipv6_pinfo)); newnp->saddr = newsk->sk_v6_rcv_saddr; inet_csk(newsk)->icsk_af_ops = &ipv6_mapped; if (sk_is_mptcp(newsk)) mptcpv6_handle_mapped(newsk, true); newsk->sk_backlog_rcv = tcp_v4_do_rcv; #if defined(CONFIG_TCP_MD5SIG) || defined(CONFIG_TCP_AO) newtp->af_specific = &tcp_sock_ipv6_mapped_specific; #endif newnp->ipv6_mc_list = NULL; newnp->ipv6_ac_list = NULL; newnp->ipv6_fl_list = NULL; newnp->pktoptions = NULL; newnp->opt = NULL; newnp->mcast_oif = inet_iif(skb); newnp->mcast_hops = ip_hdr(skb)->ttl; newnp->rcv_flowinfo = 0; if (inet6_test_bit(REPFLOW, sk)) newnp->flow_label = 0; /* * No need to charge this sock to the relevant IPv6 refcnt debug socks count * here, tcp_create_openreq_child now does this for us, see the comment in * that function for the gory details. -acme */ /* It is tricky place. Until this moment IPv4 tcp worked with IPv6 icsk.icsk_af_ops. Sync it now. 
*/ tcp_sync_mss(newsk, inet_csk(newsk)->icsk_pmtu_cookie); return newsk; } ireq = inet_rsk(req); if (sk_acceptq_is_full(sk)) goto out_overflow; if (!dst) { dst = inet6_csk_route_req(sk, &fl6, req, IPPROTO_TCP); if (!dst) goto out; } newsk = tcp_create_openreq_child(sk, req, skb); if (!newsk) goto out_nonewsk; /* * No need to charge this sock to the relevant IPv6 refcnt debug socks * count here, tcp_create_openreq_child now does this for us, see the * comment in that function for the gory details. -acme */ newsk->sk_gso_type = SKB_GSO_TCPV6; inet6_sk_rx_dst_set(newsk, skb); inet_sk(newsk)->pinet6 = tcp_inet6_sk(newsk); newtp = tcp_sk(newsk); newinet = inet_sk(newsk); newnp = tcp_inet6_sk(newsk); memcpy(newnp, np, sizeof(struct ipv6_pinfo)); ip6_dst_store(newsk, dst, NULL, NULL); newsk->sk_v6_daddr = ireq->ir_v6_rmt_addr; newnp->saddr = ireq->ir_v6_loc_addr; newsk->sk_v6_rcv_saddr = ireq->ir_v6_loc_addr; newsk->sk_bound_dev_if = ireq->ir_iif; /* Now IPv6 options... First: no IPv4 options. */ newinet->inet_opt = NULL; newnp->ipv6_mc_list = NULL; newnp->ipv6_ac_list = NULL; newnp->ipv6_fl_list = NULL; /* Clone RX bits */ newnp->rxopt.all = np->rxopt.all; newnp->pktoptions = NULL; newnp->opt = NULL; newnp->mcast_oif = tcp_v6_iif(skb); newnp->mcast_hops = ipv6_hdr(skb)->hop_limit; newnp->rcv_flowinfo = ip6_flowinfo(ipv6_hdr(skb)); if (inet6_test_bit(REPFLOW, sk)) newnp->flow_label = ip6_flowlabel(ipv6_hdr(skb)); /* Set ToS of the new socket based upon the value of incoming SYN. * ECT bits are set later in tcp_init_transfer(). */ if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_reflect_tos)) newnp->tclass = tcp_rsk(req)->syn_tos & ~INET_ECN_MASK; /* Clone native IPv6 options from listening socket (if any) Yes, keeping reference count would be much more clever, but we make one more one thing there: reattach optmem to newsk. 
*/ opt = ireq->ipv6_opt; if (!opt) opt = rcu_dereference(np->opt); if (opt) { opt = ipv6_dup_options(newsk, opt); RCU_INIT_POINTER(newnp->opt, opt); } inet_csk(newsk)->icsk_ext_hdr_len = 0; if (opt) inet_csk(newsk)->icsk_ext_hdr_len = opt->opt_nflen + opt->opt_flen; tcp_ca_openreq_child(newsk, dst); tcp_sync_mss(newsk, dst_mtu(dst)); newtp->advmss = tcp_mss_clamp(tcp_sk(sk), dst_metric_advmss(dst)); tcp_initialize_rcv_mss(newsk); newinet->inet_daddr = newinet->inet_saddr = LOOPBACK4_IPV6; newinet->inet_rcv_saddr = LOOPBACK4_IPV6; #ifdef CONFIG_TCP_MD5SIG l3index = l3mdev_master_ifindex_by_index(sock_net(sk), ireq->ir_iif); if (!tcp_rsk_used_ao(req)) { /* Copy over the MD5 key from the original socket */ key = tcp_v6_md5_do_lookup(sk, &newsk->sk_v6_daddr, l3index); if (key) { const union tcp_md5_addr *addr; addr = (union tcp_md5_addr *)&newsk->sk_v6_daddr; if (tcp_md5_key_copy(newsk, addr, AF_INET6, 128, l3index, key)) { inet_csk_prepare_forced_close(newsk); tcp_done(newsk); goto out; } } } #endif #ifdef CONFIG_TCP_AO /* Copy over tcp_ao_info if any */ if (tcp_ao_copy_all_matching(sk, newsk, req, skb, AF_INET6)) goto out; /* OOM */ #endif if (__inet_inherit_port(sk, newsk) < 0) { inet_csk_prepare_forced_close(newsk); tcp_done(newsk); goto out; } *own_req = inet_ehash_nolisten(newsk, req_to_sk(req_unhash), &found_dup_sk); if (*own_req) { tcp_move_syn(newtp, req); /* Clone pktoptions received with SYN, if we own the req */ if (ireq->pktopts) { newnp->pktoptions = skb_clone_and_charge_r(ireq->pktopts, newsk); consume_skb(ireq->pktopts); ireq->pktopts = NULL; if (newnp->pktoptions) tcp_v6_restore_cb(newnp->pktoptions); } } else { if (!req_unhash && found_dup_sk) { /* This code path should only be executed in the * syncookie case only */ bh_unlock_sock(newsk); sock_put(newsk); newsk = NULL; } } return newsk; out_overflow: __NET_INC_STATS(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS); out_nonewsk: dst_release(dst); out: tcp_listendrop(sk); return NULL; } INDIRECT_CALLABLE_DECLARE(struct dst_entry *ipv4_dst_check(struct dst_entry *, u32)); /* The socket must have it's spinlock held when we get * here, unless it is a TCP_LISTEN socket. * * We have a potential double-lock case here, so even when * doing backlog processing we use the BH locking scheme. * This is because we cannot sleep with the original spinlock * held. */ INDIRECT_CALLABLE_SCOPE int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb) { struct ipv6_pinfo *np = tcp_inet6_sk(sk); struct sk_buff *opt_skb = NULL; enum skb_drop_reason reason; struct tcp_sock *tp; /* Imagine: socket is IPv6. IPv4 packet arrives, goes to IPv4 receive handler and backlogged. From backlog it always goes here. Kerboom... Fortunately, tcp_rcv_established and rcv_established handle them correctly, but it is not case with tcp_v6_hnd_req and tcp_v6_send_reset(). --ANK */ if (skb->protocol == htons(ETH_P_IP)) return tcp_v4_do_rcv(sk, skb); /* * socket locking is here for SMP purposes as backlog rcv * is currently called with bh processing disabled. */ /* Do Stevens' IPV6_PKTOPTIONS. Yes, guys, it is the only place in our code, where we may make it not affecting IPv4. The rest of code is protocol independent, and I do not like idea to uglify IPv4. Actually, all the idea behind IPV6_PKTOPTIONS looks not very well thought. For now we latch options, received in the last packet, enqueued by tcp. Feel free to propose better solution. 
--ANK (980728) */ if (np->rxopt.all && sk->sk_state != TCP_LISTEN) opt_skb = skb_clone_and_charge_r(skb, sk); if (sk->sk_state == TCP_ESTABLISHED) { /* Fast path */ struct dst_entry *dst; dst = rcu_dereference_protected(sk->sk_rx_dst, lockdep_sock_is_held(sk)); sock_rps_save_rxhash(sk, skb); sk_mark_napi_id(sk, skb); if (dst) { if (sk->sk_rx_dst_ifindex != skb->skb_iif || INDIRECT_CALL_1(dst->ops->check, ip6_dst_check, dst, sk->sk_rx_dst_cookie) == NULL) { RCU_INIT_POINTER(sk->sk_rx_dst, NULL); dst_release(dst); } } tcp_rcv_established(sk, skb); if (opt_skb) goto ipv6_pktoptions; return 0; } if (tcp_checksum_complete(skb)) goto csum_err; if (sk->sk_state == TCP_LISTEN) { struct sock *nsk = tcp_v6_cookie_check(sk, skb); if (nsk != sk) { if (nsk) { reason = tcp_child_process(sk, nsk, skb); if (reason) goto reset; } return 0; } } else sock_rps_save_rxhash(sk, skb); reason = tcp_rcv_state_process(sk, skb); if (reason) goto reset; if (opt_skb) goto ipv6_pktoptions; return 0; reset: tcp_v6_send_reset(sk, skb, sk_rst_convert_drop_reason(reason)); discard: if (opt_skb) __kfree_skb(opt_skb); sk_skb_reason_drop(sk, skb, reason); return 0; csum_err: reason = SKB_DROP_REASON_TCP_CSUM; trace_tcp_bad_csum(skb); TCP_INC_STATS(sock_net(sk), TCP_MIB_CSUMERRORS); TCP_INC_STATS(sock_net(sk), TCP_MIB_INERRS); goto discard; ipv6_pktoptions: /* Do you ask, what is it? 1. skb was enqueued by tcp. 2. skb is added to tail of read queue, rather than out of order. 3. socket is not in passive state. 4. Finally, it really contains options, which user wants to receive. */ tp = tcp_sk(sk); if (TCP_SKB_CB(opt_skb)->end_seq == tp->rcv_nxt && !((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_LISTEN))) { if (np->rxopt.bits.rxinfo || np->rxopt.bits.rxoinfo) WRITE_ONCE(np->mcast_oif, tcp_v6_iif(opt_skb)); if (np->rxopt.bits.rxhlim || np->rxopt.bits.rxohlim) WRITE_ONCE(np->mcast_hops, ipv6_hdr(opt_skb)->hop_limit); if (np->rxopt.bits.rxflow || np->rxopt.bits.rxtclass) np->rcv_flowinfo = ip6_flowinfo(ipv6_hdr(opt_skb)); if (inet6_test_bit(REPFLOW, sk)) np->flow_label = ip6_flowlabel(ipv6_hdr(opt_skb)); if (ipv6_opt_accepted(sk, opt_skb, &TCP_SKB_CB(opt_skb)->header.h6)) { tcp_v6_restore_cb(opt_skb); opt_skb = xchg(&np->pktoptions, opt_skb); } else { __kfree_skb(opt_skb); opt_skb = xchg(&np->pktoptions, NULL); } } consume_skb(opt_skb); return 0; } static void tcp_v6_fill_cb(struct sk_buff *skb, const struct ipv6hdr *hdr, const struct tcphdr *th) { /* This is tricky: we move IP6CB at its correct location into * TCP_SKB_CB(). It must be done after xfrm6_policy_check(), because * _decode_session6() uses IP6CB(). * barrier() makes sure compiler won't play aliasing games. 
*/ memmove(&TCP_SKB_CB(skb)->header.h6, IP6CB(skb), sizeof(struct inet6_skb_parm)); barrier(); TCP_SKB_CB(skb)->seq = ntohl(th->seq); TCP_SKB_CB(skb)->end_seq = (TCP_SKB_CB(skb)->seq + th->syn + th->fin + skb->len - th->doff*4); TCP_SKB_CB(skb)->ack_seq = ntohl(th->ack_seq); TCP_SKB_CB(skb)->tcp_flags = tcp_flag_byte(th); TCP_SKB_CB(skb)->ip_dsfield = ipv6_get_dsfield(hdr); TCP_SKB_CB(skb)->sacked = 0; TCP_SKB_CB(skb)->has_rxtstamp = skb->tstamp || skb_hwtstamps(skb)->hwtstamp; } INDIRECT_CALLABLE_SCOPE int tcp_v6_rcv(struct sk_buff *skb) { enum skb_drop_reason drop_reason; int sdif = inet6_sdif(skb); int dif = inet6_iif(skb); const struct tcphdr *th; const struct ipv6hdr *hdr; struct sock *sk = NULL; bool refcounted; int ret; u32 isn; struct net *net = dev_net(skb->dev); drop_reason = SKB_DROP_REASON_NOT_SPECIFIED; if (skb->pkt_type != PACKET_HOST) goto discard_it; /* * Count it even if it's bad. */ __TCP_INC_STATS(net, TCP_MIB_INSEGS); if (!pskb_may_pull(skb, sizeof(struct tcphdr))) goto discard_it; th = (const struct tcphdr *)skb->data; if (unlikely(th->doff < sizeof(struct tcphdr) / 4)) { drop_reason = SKB_DROP_REASON_PKT_TOO_SMALL; goto bad_packet; } if (!pskb_may_pull(skb, th->doff*4)) goto discard_it; if (skb_checksum_init(skb, IPPROTO_TCP, ip6_compute_pseudo)) goto csum_error; th = (const struct tcphdr *)skb->data; hdr = ipv6_hdr(skb); lookup: sk = __inet6_lookup_skb(net->ipv4.tcp_death_row.hashinfo, skb, __tcp_hdrlen(th), th->source, th->dest, inet6_iif(skb), sdif, &refcounted); if (!sk) goto no_tcp_socket; if (sk->sk_state == TCP_TIME_WAIT) goto do_time_wait; if (sk->sk_state == TCP_NEW_SYN_RECV) { struct request_sock *req = inet_reqsk(sk); bool req_stolen = false; struct sock *nsk; sk = req->rsk_listener; if (!xfrm6_policy_check(sk, XFRM_POLICY_IN, skb)) drop_reason = SKB_DROP_REASON_XFRM_POLICY; else drop_reason = tcp_inbound_hash(sk, req, skb, &hdr->saddr, &hdr->daddr, AF_INET6, dif, sdif); if (drop_reason) { sk_drops_add(sk, skb); reqsk_put(req); goto discard_it; } if (tcp_checksum_complete(skb)) { reqsk_put(req); goto csum_error; } if (unlikely(sk->sk_state != TCP_LISTEN)) { nsk = reuseport_migrate_sock(sk, req_to_sk(req), skb); if (!nsk) { inet_csk_reqsk_queue_drop_and_put(sk, req); goto lookup; } sk = nsk; /* reuseport_migrate_sock() has already held one sk_refcnt * before returning. */ } else { sock_hold(sk); } refcounted = true; nsk = NULL; if (!tcp_filter(sk, skb)) { th = (const struct tcphdr *)skb->data; hdr = ipv6_hdr(skb); tcp_v6_fill_cb(skb, hdr, th); nsk = tcp_check_req(sk, skb, req, false, &req_stolen); } else { drop_reason = SKB_DROP_REASON_SOCKET_FILTER; } if (!nsk) { reqsk_put(req); if (req_stolen) { /* Another cpu got exclusive access to req * and created a full blown socket. * Try to feed this packet to this socket * instead of discarding it. 
*/ tcp_v6_restore_cb(skb); sock_put(sk); goto lookup; } goto discard_and_relse; } nf_reset_ct(skb); if (nsk == sk) { reqsk_put(req); tcp_v6_restore_cb(skb); } else { drop_reason = tcp_child_process(sk, nsk, skb); if (drop_reason) { enum sk_rst_reason rst_reason; rst_reason = sk_rst_convert_drop_reason(drop_reason); tcp_v6_send_reset(nsk, skb, rst_reason); goto discard_and_relse; } sock_put(sk); return 0; } } process: if (static_branch_unlikely(&ip6_min_hopcount)) { /* min_hopcount can be changed concurrently from do_ipv6_setsockopt() */ if (unlikely(hdr->hop_limit < READ_ONCE(tcp_inet6_sk(sk)->min_hopcount))) { __NET_INC_STATS(net, LINUX_MIB_TCPMINTTLDROP); drop_reason = SKB_DROP_REASON_TCP_MINTTL; goto discard_and_relse; } } if (!xfrm6_policy_check(sk, XFRM_POLICY_IN, skb)) { drop_reason = SKB_DROP_REASON_XFRM_POLICY; goto discard_and_relse; } drop_reason = tcp_inbound_hash(sk, NULL, skb, &hdr->saddr, &hdr->daddr, AF_INET6, dif, sdif); if (drop_reason) goto discard_and_relse; nf_reset_ct(skb); if (tcp_filter(sk, skb)) { drop_reason = SKB_DROP_REASON_SOCKET_FILTER; goto discard_and_relse; } th = (const struct tcphdr *)skb->data; hdr = ipv6_hdr(skb); tcp_v6_fill_cb(skb, hdr, th); skb->dev = NULL; if (sk->sk_state == TCP_LISTEN) { ret = tcp_v6_do_rcv(sk, skb); goto put_and_return; } sk_incoming_cpu_update(sk); bh_lock_sock_nested(sk); tcp_segs_in(tcp_sk(sk), skb); ret = 0; if (!sock_owned_by_user(sk)) { ret = tcp_v6_do_rcv(sk, skb); } else { if (tcp_add_backlog(sk, skb, &drop_reason)) goto discard_and_relse; } bh_unlock_sock(sk); put_and_return: if (refcounted) sock_put(sk); return ret ? -1 : 0; no_tcp_socket: drop_reason = SKB_DROP_REASON_NO_SOCKET; if (!xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) goto discard_it; tcp_v6_fill_cb(skb, hdr, th); if (tcp_checksum_complete(skb)) { csum_error: drop_reason = SKB_DROP_REASON_TCP_CSUM; trace_tcp_bad_csum(skb); __TCP_INC_STATS(net, TCP_MIB_CSUMERRORS); bad_packet: __TCP_INC_STATS(net, TCP_MIB_INERRS); } else { tcp_v6_send_reset(NULL, skb, sk_rst_convert_drop_reason(drop_reason)); } discard_it: SKB_DR_OR(drop_reason, NOT_SPECIFIED); sk_skb_reason_drop(sk, skb, drop_reason); return 0; discard_and_relse: sk_drops_add(sk, skb); if (refcounted) sock_put(sk); goto discard_it; do_time_wait: if (!xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) { drop_reason = SKB_DROP_REASON_XFRM_POLICY; inet_twsk_put(inet_twsk(sk)); goto discard_it; } tcp_v6_fill_cb(skb, hdr, th); if (tcp_checksum_complete(skb)) { inet_twsk_put(inet_twsk(sk)); goto csum_error; } switch (tcp_timewait_state_process(inet_twsk(sk), skb, th, &isn)) { case TCP_TW_SYN: { struct sock *sk2; sk2 = inet6_lookup_listener(net, net->ipv4.tcp_death_row.hashinfo, skb, __tcp_hdrlen(th), &ipv6_hdr(skb)->saddr, th->source, &ipv6_hdr(skb)->daddr, ntohs(th->dest), tcp_v6_iif_l3_slave(skb), sdif); if (sk2) { struct inet_timewait_sock *tw = inet_twsk(sk); inet_twsk_deschedule_put(tw); sk = sk2; tcp_v6_restore_cb(skb); refcounted = false; __this_cpu_write(tcp_tw_isn, isn); goto process; } } /* to ACK */ fallthrough; case TCP_TW_ACK: tcp_v6_timewait_ack(sk, skb); break; case TCP_TW_RST: tcp_v6_send_reset(sk, skb, SK_RST_REASON_TCP_TIMEWAIT_SOCKET); inet_twsk_deschedule_put(inet_twsk(sk)); goto discard_it; case TCP_TW_SUCCESS: ; } goto discard_it; } void tcp_v6_early_demux(struct sk_buff *skb) { struct net *net = dev_net(skb->dev); const struct ipv6hdr *hdr; const struct tcphdr *th; struct sock *sk; if (skb->pkt_type != PACKET_HOST) return; if (!pskb_may_pull(skb, skb_transport_offset(skb) + sizeof(struct 
tcphdr))) return; hdr = ipv6_hdr(skb); th = tcp_hdr(skb); if (th->doff < sizeof(struct tcphdr) / 4) return; /* Note : We use inet6_iif() here, not tcp_v6_iif() */ sk = __inet6_lookup_established(net, net->ipv4.tcp_death_row.hashinfo, &hdr->saddr, th->source, &hdr->daddr, ntohs(th->dest), inet6_iif(skb), inet6_sdif(skb)); if (sk) { skb->sk = sk; skb->destructor = sock_edemux; if (sk_fullsock(sk)) { struct dst_entry *dst = rcu_dereference(sk->sk_rx_dst); if (dst) dst = dst_check(dst, sk->sk_rx_dst_cookie); if (dst && sk->sk_rx_dst_ifindex == skb->skb_iif) skb_dst_set_noref(skb, dst); } } } static struct timewait_sock_ops tcp6_timewait_sock_ops = { .twsk_obj_size = sizeof(struct tcp6_timewait_sock), .twsk_destructor = tcp_twsk_destructor, }; INDIRECT_CALLABLE_SCOPE void tcp_v6_send_check(struct sock *sk, struct sk_buff *skb) { __tcp_v6_send_check(skb, &sk->sk_v6_rcv_saddr, &sk->sk_v6_daddr); } const struct inet_connection_sock_af_ops ipv6_specific = { .queue_xmit = inet6_csk_xmit, .send_check = tcp_v6_send_check, .rebuild_header = inet6_sk_rebuild_header, .sk_rx_dst_set = inet6_sk_rx_dst_set, .conn_request = tcp_v6_conn_request, .syn_recv_sock = tcp_v6_syn_recv_sock, .net_header_len = sizeof(struct ipv6hdr), .setsockopt = ipv6_setsockopt, .getsockopt = ipv6_getsockopt, .addr2sockaddr = inet6_csk_addr2sockaddr, .sockaddr_len = sizeof(struct sockaddr_in6), .mtu_reduced = tcp_v6_mtu_reduced, }; #if defined(CONFIG_TCP_MD5SIG) || defined(CONFIG_TCP_AO) static const struct tcp_sock_af_ops tcp_sock_ipv6_specific = { #ifdef CONFIG_TCP_MD5SIG .md5_lookup = tcp_v6_md5_lookup, .calc_md5_hash = tcp_v6_md5_hash_skb, .md5_parse = tcp_v6_parse_md5_keys, #endif #ifdef CONFIG_TCP_AO .ao_lookup = tcp_v6_ao_lookup, .calc_ao_hash = tcp_v6_ao_hash_skb, .ao_parse = tcp_v6_parse_ao, .ao_calc_key_sk = tcp_v6_ao_calc_key_sk, #endif }; #endif /* * TCP over IPv4 via INET6 API */ static const struct inet_connection_sock_af_ops ipv6_mapped = { .queue_xmit = ip_queue_xmit, .send_check = tcp_v4_send_check, .rebuild_header = inet_sk_rebuild_header, .sk_rx_dst_set = inet_sk_rx_dst_set, .conn_request = tcp_v6_conn_request, .syn_recv_sock = tcp_v6_syn_recv_sock, .net_header_len = sizeof(struct iphdr), .setsockopt = ipv6_setsockopt, .getsockopt = ipv6_getsockopt, .addr2sockaddr = inet6_csk_addr2sockaddr, .sockaddr_len = sizeof(struct sockaddr_in6), .mtu_reduced = tcp_v4_mtu_reduced, }; #if defined(CONFIG_TCP_MD5SIG) || defined(CONFIG_TCP_AO) static const struct tcp_sock_af_ops tcp_sock_ipv6_mapped_specific = { #ifdef CONFIG_TCP_MD5SIG .md5_lookup = tcp_v4_md5_lookup, .calc_md5_hash = tcp_v4_md5_hash_skb, .md5_parse = tcp_v6_parse_md5_keys, #endif #ifdef CONFIG_TCP_AO .ao_lookup = tcp_v6_ao_lookup, .calc_ao_hash = tcp_v4_ao_hash_skb, .ao_parse = tcp_v6_parse_ao, .ao_calc_key_sk = tcp_v4_ao_calc_key_sk, #endif }; #endif /* NOTE: A lot of things set to zero explicitly by call to * sk_alloc() so need not be done here. */ static int tcp_v6_init_sock(struct sock *sk) { struct inet_connection_sock *icsk = inet_csk(sk); tcp_init_sock(sk); icsk->icsk_af_ops = &ipv6_specific; #if defined(CONFIG_TCP_MD5SIG) || defined(CONFIG_TCP_AO) tcp_sk(sk)->af_specific = &tcp_sock_ipv6_specific; #endif return 0; } #ifdef CONFIG_PROC_FS /* Proc filesystem TCPv6 sock list dumping. 
*/ static void get_openreq6(struct seq_file *seq, const struct request_sock *req, int i) { long ttd = req->rsk_timer.expires - jiffies; const struct in6_addr *src = &inet_rsk(req)->ir_v6_loc_addr; const struct in6_addr *dest = &inet_rsk(req)->ir_v6_rmt_addr; if (ttd < 0) ttd = 0; seq_printf(seq, "%4d: %08X%08X%08X%08X:%04X %08X%08X%08X%08X:%04X " "%02X %08X:%08X %02X:%08lX %08X %5u %8d %d %d %pK\n", i, src->s6_addr32[0], src->s6_addr32[1], src->s6_addr32[2], src->s6_addr32[3], inet_rsk(req)->ir_num, dest->s6_addr32[0], dest->s6_addr32[1], dest->s6_addr32[2], dest->s6_addr32[3], ntohs(inet_rsk(req)->ir_rmt_port), TCP_SYN_RECV, 0, 0, /* could print option size, but that is af dependent. */ 1, /* timers active (only the expire timer) */ jiffies_to_clock_t(ttd), req->num_timeout, from_kuid_munged(seq_user_ns(seq), sock_i_uid(req->rsk_listener)), 0, /* non standard timer */ 0, /* open_requests have no inode */ 0, req); } static void get_tcp6_sock(struct seq_file *seq, struct sock *sp, int i) { const struct in6_addr *dest, *src; __u16 destp, srcp; int timer_active; unsigned long timer_expires; const struct inet_sock *inet = inet_sk(sp); const struct tcp_sock *tp = tcp_sk(sp); const struct inet_connection_sock *icsk = inet_csk(sp); const struct fastopen_queue *fastopenq = &icsk->icsk_accept_queue.fastopenq; u8 icsk_pending; int rx_queue; int state; dest = &sp->sk_v6_daddr; src = &sp->sk_v6_rcv_saddr; destp = ntohs(inet->inet_dport); srcp = ntohs(inet->inet_sport); icsk_pending = smp_load_acquire(&icsk->icsk_pending); if (icsk_pending == ICSK_TIME_RETRANS || icsk_pending == ICSK_TIME_REO_TIMEOUT || icsk_pending == ICSK_TIME_LOSS_PROBE) { timer_active = 1; timer_expires = icsk->icsk_timeout; } else if (icsk_pending == ICSK_TIME_PROBE0) { timer_active = 4; timer_expires = icsk->icsk_timeout; } else if (timer_pending(&sp->sk_timer)) { timer_active = 2; timer_expires = sp->sk_timer.expires; } else { timer_active = 0; timer_expires = jiffies; } state = inet_sk_state_load(sp); if (state == TCP_LISTEN) rx_queue = READ_ONCE(sp->sk_ack_backlog); else /* Because we don't lock the socket, * we might find a transient negative value. */ rx_queue = max_t(int, READ_ONCE(tp->rcv_nxt) - READ_ONCE(tp->copied_seq), 0); seq_printf(seq, "%4d: %08X%08X%08X%08X:%04X %08X%08X%08X%08X:%04X " "%02X %08X:%08X %02X:%08lX %08X %5u %8d %lu %d %pK %lu %lu %u %u %d\n", i, src->s6_addr32[0], src->s6_addr32[1], src->s6_addr32[2], src->s6_addr32[3], srcp, dest->s6_addr32[0], dest->s6_addr32[1], dest->s6_addr32[2], dest->s6_addr32[3], destp, state, READ_ONCE(tp->write_seq) - tp->snd_una, rx_queue, timer_active, jiffies_delta_to_clock_t(timer_expires - jiffies), icsk->icsk_retransmits, from_kuid_munged(seq_user_ns(seq), sock_i_uid(sp)), icsk->icsk_probes_out, sock_i_ino(sp), refcount_read(&sp->sk_refcnt), sp, jiffies_to_clock_t(icsk->icsk_rto), jiffies_to_clock_t(icsk->icsk_ack.ato), (icsk->icsk_ack.quick << 1) | inet_csk_in_pingpong_mode(sp), tcp_snd_cwnd(tp), state == TCP_LISTEN ? fastopenq->max_qlen : (tcp_in_initial_slowstart(tp) ? 
-1 : tp->snd_ssthresh) ); } static void get_timewait6_sock(struct seq_file *seq, struct inet_timewait_sock *tw, int i) { long delta = tw->tw_timer.expires - jiffies; const struct in6_addr *dest, *src; __u16 destp, srcp; dest = &tw->tw_v6_daddr; src = &tw->tw_v6_rcv_saddr; destp = ntohs(tw->tw_dport); srcp = ntohs(tw->tw_sport); seq_printf(seq, "%4d: %08X%08X%08X%08X:%04X %08X%08X%08X%08X:%04X " "%02X %08X:%08X %02X:%08lX %08X %5d %8d %d %d %pK\n", i, src->s6_addr32[0], src->s6_addr32[1], src->s6_addr32[2], src->s6_addr32[3], srcp, dest->s6_addr32[0], dest->s6_addr32[1], dest->s6_addr32[2], dest->s6_addr32[3], destp, READ_ONCE(tw->tw_substate), 0, 0, 3, jiffies_delta_to_clock_t(delta), 0, 0, 0, 0, refcount_read(&tw->tw_refcnt), tw); } static int tcp6_seq_show(struct seq_file *seq, void *v) { struct tcp_iter_state *st; struct sock *sk = v; if (v == SEQ_START_TOKEN) { seq_puts(seq, " sl " "local_address " "remote_address " "st tx_queue rx_queue tr tm->when retrnsmt" " uid timeout inode\n"); goto out; } st = seq->private; if (sk->sk_state == TCP_TIME_WAIT) get_timewait6_sock(seq, v, st->num); else if (sk->sk_state == TCP_NEW_SYN_RECV) get_openreq6(seq, v, st->num); else get_tcp6_sock(seq, v, st->num); out: return 0; } static const struct seq_operations tcp6_seq_ops = { .show = tcp6_seq_show, .start = tcp_seq_start, .next = tcp_seq_next, .stop = tcp_seq_stop, }; static struct tcp_seq_afinfo tcp6_seq_afinfo = { .family = AF_INET6, }; int __net_init tcp6_proc_init(struct net *net) { if (!proc_create_net_data("tcp6", 0444, net->proc_net, &tcp6_seq_ops, sizeof(struct tcp_iter_state), &tcp6_seq_afinfo)) return -ENOMEM; return 0; } void tcp6_proc_exit(struct net *net) { remove_proc_entry("tcp6", net->proc_net); } #endif struct proto tcpv6_prot = { .name = "TCPv6", .owner = THIS_MODULE, .close = tcp_close, .pre_connect = tcp_v6_pre_connect, .connect = tcp_v6_connect, .disconnect = tcp_disconnect, .accept = inet_csk_accept, .ioctl = tcp_ioctl, .init = tcp_v6_init_sock, .destroy = tcp_v4_destroy_sock, .shutdown = tcp_shutdown, .setsockopt = tcp_setsockopt, .getsockopt = tcp_getsockopt, .bpf_bypass_getsockopt = tcp_bpf_bypass_getsockopt, .keepalive = tcp_set_keepalive, .recvmsg = tcp_recvmsg, .sendmsg = tcp_sendmsg, .splice_eof = tcp_splice_eof, .backlog_rcv = tcp_v6_do_rcv, .release_cb = tcp_release_cb, .hash = inet6_hash, .unhash = inet_unhash, .get_port = inet_csk_get_port, .put_port = inet_put_port, #ifdef CONFIG_BPF_SYSCALL .psock_update_sk_prot = tcp_bpf_update_proto, #endif .enter_memory_pressure = tcp_enter_memory_pressure, .leave_memory_pressure = tcp_leave_memory_pressure, .stream_memory_free = tcp_stream_memory_free, .sockets_allocated = &tcp_sockets_allocated, .memory_allocated = &tcp_memory_allocated, .per_cpu_fw_alloc = &tcp_memory_per_cpu_fw_alloc, .memory_pressure = &tcp_memory_pressure, .orphan_count = &tcp_orphan_count, .sysctl_mem = sysctl_tcp_mem, .sysctl_wmem_offset = offsetof(struct net, ipv4.sysctl_tcp_wmem), .sysctl_rmem_offset = offsetof(struct net, ipv4.sysctl_tcp_rmem), .max_header = MAX_TCP_HEADER, .obj_size = sizeof(struct tcp6_sock), .ipv6_pinfo_offset = offsetof(struct tcp6_sock, inet6), .slab_flags = SLAB_TYPESAFE_BY_RCU, .twsk_prot = &tcp6_timewait_sock_ops, .rsk_prot = &tcp6_request_sock_ops, .h.hashinfo = NULL, .no_autobind = true, .diag_destroy = tcp_abort, }; EXPORT_SYMBOL_GPL(tcpv6_prot); static struct inet_protosw tcpv6_protosw = { .type = SOCK_STREAM, .protocol = IPPROTO_TCP, .prot = &tcpv6_prot, .ops = &inet6_stream_ops, .flags = INET_PROTOSW_PERMANENT | 
INET_PROTOSW_ICSK, }; static int __net_init tcpv6_net_init(struct net *net) { int res; res = inet_ctl_sock_create(&net->ipv6.tcp_sk, PF_INET6, SOCK_RAW, IPPROTO_TCP, net); if (!res) net->ipv6.tcp_sk->sk_clockid = CLOCK_MONOTONIC; return res; } static void __net_exit tcpv6_net_exit(struct net *net) { inet_ctl_sock_destroy(net->ipv6.tcp_sk); } static struct pernet_operations tcpv6_net_ops = { .init = tcpv6_net_init, .exit = tcpv6_net_exit, }; int __init tcpv6_init(void) { int ret; net_hotdata.tcpv6_protocol = (struct inet6_protocol) { .handler = tcp_v6_rcv, .err_handler = tcp_v6_err, .flags = INET6_PROTO_NOPOLICY | INET6_PROTO_FINAL, }; ret = inet6_add_protocol(&net_hotdata.tcpv6_protocol, IPPROTO_TCP); if (ret) goto out; /* register inet6 protocol */ ret = inet6_register_protosw(&tcpv6_protosw); if (ret) goto out_tcpv6_protocol; ret = register_pernet_subsys(&tcpv6_net_ops); if (ret) goto out_tcpv6_protosw; ret = mptcpv6_init(); if (ret) goto out_tcpv6_pernet_subsys; out: return ret; out_tcpv6_pernet_subsys: unregister_pernet_subsys(&tcpv6_net_ops); out_tcpv6_protosw: inet6_unregister_protosw(&tcpv6_protosw); out_tcpv6_protocol: inet6_del_protocol(&net_hotdata.tcpv6_protocol, IPPROTO_TCP); goto out; } void tcpv6_exit(void) { unregister_pernet_subsys(&tcpv6_net_ops); inet6_unregister_protosw(&tcpv6_protosw); inet6_del_protocol(&net_hotdata.tcpv6_protocol, IPPROTO_TCP); } |
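/*
 * Illustrative userspace sketch (not part of the kernel sources above): parse
 * the records that get_tcp6_sock() emits to /proc/net/tcp6. Only the leading
 * fields are decoded; field names follow the header line printed by
 * tcp6_seq_show(). The format string itself is the one shown in the code above.
 */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/net/tcp6", "r");
	char line[512], local[40], remote[40];
	unsigned int lport, rport, state;
	int sl;

	if (!f)
		return 1;
	if (!fgets(line, sizeof(line), f)) {	/* skip the header line */
		fclose(f);
		return 1;
	}
	while (fgets(line, sizeof(line), f)) {
		/* "%4d: %08X%08X%08X%08X:%04X %08X%08X%08X%08X:%04X %02X ..." */
		if (sscanf(line, "%d: %39[0-9A-Fa-f]:%x %39[0-9A-Fa-f]:%x %x",
			   &sl, local, &lport, remote, &rport, &state) == 6)
			printf("slot %d local port %u remote port %u state %#x\n",
			       sl, lport, rport, state);
	}
	fclose(f);
	return 0;
}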
// SPDX-License-Identifier: GPL-2.0 /* Multipath TCP token management * Copyright (c) 2017 - 2019, Intel Corporation.
* * Note: This code is based on mptcp_ctrl.c from multipath-tcp.org, * authored by: * * Sébastien Barré <sebastien.barre@uclouvain.be> * Christoph Paasch <christoph.paasch@uclouvain.be> * Jaakko Korkeaniemi <jaakko.korkeaniemi@aalto.fi> * Gregory Detal <gregory.detal@uclouvain.be> * Fabien Duchêne <fabien.duchene@uclouvain.be> * Andreas Seelinger <Andreas.Seelinger@rwth-aachen.de> * Lavkesh Lahngir <lavkesh51@gmail.com> * Andreas Ripke <ripke@neclab.eu> * Vlad Dogaru <vlad.dogaru@intel.com> * Octavian Purdila <octavian.purdila@intel.com> * John Ronan <jronan@tssg.org> * Catalin Nicutar <catalin.nicutar@gmail.com> * Brandon Heller <brandonh@stanford.edu> */ #define pr_fmt(fmt) "MPTCP: " fmt #include <linux/kernel.h> #include <linux/module.h> #include <linux/memblock.h> #include <linux/ip.h> #include <linux/tcp.h> #include <net/sock.h> #include <net/inet_common.h> #include <net/protocol.h> #include <net/mptcp.h> #include "protocol.h" #define TOKEN_MAX_CHAIN_LEN 4 struct token_bucket { spinlock_t lock; int chain_len; struct hlist_nulls_head req_chain; struct hlist_nulls_head msk_chain; }; static struct token_bucket *token_hash __read_mostly; static unsigned int token_mask __read_mostly; static struct token_bucket *token_bucket(u32 token) { return &token_hash[token & token_mask]; } /* called with bucket lock held */ static struct mptcp_subflow_request_sock * __token_lookup_req(struct token_bucket *t, u32 token) { struct mptcp_subflow_request_sock *req; struct hlist_nulls_node *pos; hlist_nulls_for_each_entry_rcu(req, pos, &t->req_chain, token_node) if (req->token == token) return req; return NULL; } /* called with bucket lock held */ static struct mptcp_sock * __token_lookup_msk(struct token_bucket *t, u32 token) { struct hlist_nulls_node *pos; struct sock *sk; sk_nulls_for_each_rcu(sk, pos, &t->msk_chain) if (mptcp_sk(sk)->token == token) return mptcp_sk(sk); return NULL; } static bool __token_bucket_busy(struct token_bucket *t, u32 token) { return !token || t->chain_len >= TOKEN_MAX_CHAIN_LEN || __token_lookup_req(t, token) || __token_lookup_msk(t, token); } static void mptcp_crypto_key_gen_sha(u64 *key, u32 *token, u64 *idsn) { /* we might consider a faster version that computes the key as a * hash of some information available in the MPTCP socket. Use * random data at the moment, as it's probably the safest option * in case multiple sockets are opened in different namespaces at * the same time. */ get_random_bytes(key, sizeof(u64)); mptcp_crypto_key_sha(*key, token, idsn); } /** * mptcp_token_new_request - create new key/idsn/token for subflow_request * @req: the request socket * * This function is called when a new mptcp connection is coming in. * * It creates a unique token to identify the new mptcp connection, * a secret local key and the initial data sequence number (idsn). * * Returns 0 on success. 
*/ int mptcp_token_new_request(struct request_sock *req) { struct mptcp_subflow_request_sock *subflow_req = mptcp_subflow_rsk(req); struct token_bucket *bucket; u32 token; mptcp_crypto_key_sha(subflow_req->local_key, &subflow_req->token, &subflow_req->idsn); pr_debug("req=%p local_key=%llu, token=%u, idsn=%llu\n", req, subflow_req->local_key, subflow_req->token, subflow_req->idsn); token = subflow_req->token; bucket = token_bucket(token); spin_lock_bh(&bucket->lock); if (__token_bucket_busy(bucket, token)) { spin_unlock_bh(&bucket->lock); return -EBUSY; } hlist_nulls_add_head_rcu(&subflow_req->token_node, &bucket->req_chain); bucket->chain_len++; spin_unlock_bh(&bucket->lock); return 0; } /** * mptcp_token_new_connect - create new key/idsn/token for subflow * @ssk: the socket that will initiate a connection * * This function is called when a new outgoing mptcp connection is * initiated. * * It creates a unique token to identify the new mptcp connection, * a secret local key and the initial data sequence number (idsn). * * On success, the mptcp connection can be found again using * the computed token at a later time, this is needed to process * join requests. * * returns 0 on success. */ int mptcp_token_new_connect(struct sock *ssk) { struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk); struct mptcp_sock *msk = mptcp_sk(subflow->conn); int retries = MPTCP_TOKEN_MAX_RETRIES; struct sock *sk = subflow->conn; struct token_bucket *bucket; again: mptcp_crypto_key_gen_sha(&subflow->local_key, &subflow->token, &subflow->idsn); bucket = token_bucket(subflow->token); spin_lock_bh(&bucket->lock); if (__token_bucket_busy(bucket, subflow->token)) { spin_unlock_bh(&bucket->lock); if (!--retries) return -EBUSY; goto again; } pr_debug("ssk=%p, local_key=%llu, token=%u, idsn=%llu\n", ssk, subflow->local_key, subflow->token, subflow->idsn); WRITE_ONCE(msk->token, subflow->token); __sk_nulls_add_node_rcu((struct sock *)msk, &bucket->msk_chain); bucket->chain_len++; spin_unlock_bh(&bucket->lock); sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1); return 0; } /** * mptcp_token_accept - replace a req sk with full sock in token hash * @req: the request socket to be removed * @msk: the just cloned socket linked to the new connection * * Called when a SYN packet creates a new logical connection, i.e. * is not a join request. 
*/ void mptcp_token_accept(struct mptcp_subflow_request_sock *req, struct mptcp_sock *msk) { struct mptcp_subflow_request_sock *pos; struct sock *sk = (struct sock *)msk; struct token_bucket *bucket; sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1); bucket = token_bucket(req->token); spin_lock_bh(&bucket->lock); /* pedantic lookup check for the moved token */ pos = __token_lookup_req(bucket, req->token); if (!WARN_ON_ONCE(pos != req)) hlist_nulls_del_init_rcu(&req->token_node); __sk_nulls_add_node_rcu((struct sock *)msk, &bucket->msk_chain); spin_unlock_bh(&bucket->lock); } bool mptcp_token_exists(u32 token) { struct hlist_nulls_node *pos; struct token_bucket *bucket; struct mptcp_sock *msk; struct sock *sk; rcu_read_lock(); bucket = token_bucket(token); again: sk_nulls_for_each_rcu(sk, pos, &bucket->msk_chain) { msk = mptcp_sk(sk); if (READ_ONCE(msk->token) == token) goto found; } if (get_nulls_value(pos) != (token & token_mask)) goto again; rcu_read_unlock(); return false; found: rcu_read_unlock(); return true; } /** * mptcp_token_get_sock - retrieve mptcp connection sock using its token * @net: restrict to this namespace * @token: token of the mptcp connection to retrieve * * This function returns the mptcp connection structure with the given token. * A reference count on the mptcp socket returned is taken. * * returns NULL if no connection with the given token value exists. */ struct mptcp_sock *mptcp_token_get_sock(struct net *net, u32 token) { struct hlist_nulls_node *pos; struct token_bucket *bucket; struct mptcp_sock *msk; struct sock *sk; rcu_read_lock(); bucket = token_bucket(token); again: sk_nulls_for_each_rcu(sk, pos, &bucket->msk_chain) { msk = mptcp_sk(sk); if (READ_ONCE(msk->token) != token || !net_eq(sock_net(sk), net)) continue; if (!refcount_inc_not_zero(&sk->sk_refcnt)) goto not_found; if (READ_ONCE(msk->token) != token || !net_eq(sock_net(sk), net)) { sock_put(sk); goto again; } goto found; } if (get_nulls_value(pos) != (token & token_mask)) goto again; not_found: msk = NULL; found: rcu_read_unlock(); return msk; } EXPORT_SYMBOL_GPL(mptcp_token_get_sock); /** * mptcp_token_iter_next - iterate over the token container from given pos * @net: namespace to be iterated * @s_slot: start slot number * @s_num: start number inside the given lock * * This function returns the first mptcp connection structure found inside the * token container starting from the specified position, or NULL. * * On successful iteration, the iterator is moved to the next position and * a reference to the returned socket is acquired. */ struct mptcp_sock *mptcp_token_iter_next(const struct net *net, long *s_slot, long *s_num) { struct mptcp_sock *ret = NULL; struct hlist_nulls_node *pos; int slot, num = 0; for (slot = *s_slot; slot <= token_mask; *s_num = 0, slot++) { struct token_bucket *bucket = &token_hash[slot]; struct sock *sk; num = 0; if (hlist_nulls_empty(&bucket->msk_chain)) continue; rcu_read_lock(); sk_nulls_for_each_rcu(sk, pos, &bucket->msk_chain) { ++num; if (!net_eq(sock_net(sk), net)) continue; if (num <= *s_num) continue; if (!refcount_inc_not_zero(&sk->sk_refcnt)) continue; if (!net_eq(sock_net(sk), net)) { sock_put(sk); continue; } ret = mptcp_sk(sk); rcu_read_unlock(); goto out; } rcu_read_unlock(); } out: *s_slot = slot; *s_num = num; return ret; } EXPORT_SYMBOL_GPL(mptcp_token_iter_next); /** * mptcp_token_destroy_request - remove mptcp connection/token * @req: mptcp request socket dropping the token * * Remove the token associated to @req. 
*/ void mptcp_token_destroy_request(struct request_sock *req) { struct mptcp_subflow_request_sock *subflow_req = mptcp_subflow_rsk(req); struct mptcp_subflow_request_sock *pos; struct token_bucket *bucket; if (hlist_nulls_unhashed(&subflow_req->token_node)) return; bucket = token_bucket(subflow_req->token); spin_lock_bh(&bucket->lock); pos = __token_lookup_req(bucket, subflow_req->token); if (!WARN_ON_ONCE(pos != subflow_req)) { hlist_nulls_del_init_rcu(&pos->token_node); bucket->chain_len--; } spin_unlock_bh(&bucket->lock); } /** * mptcp_token_destroy - remove mptcp connection/token * @msk: mptcp connection dropping the token * * Remove the token associated to @msk */ void mptcp_token_destroy(struct mptcp_sock *msk) { struct sock *sk = (struct sock *)msk; struct token_bucket *bucket; struct mptcp_sock *pos; if (sk_unhashed((struct sock *)msk)) return; sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1); bucket = token_bucket(msk->token); spin_lock_bh(&bucket->lock); pos = __token_lookup_msk(bucket, msk->token); if (!WARN_ON_ONCE(pos != msk)) { __sk_nulls_del_node_init_rcu((struct sock *)pos); bucket->chain_len--; } spin_unlock_bh(&bucket->lock); WRITE_ONCE(msk->token, 0); } void __init mptcp_token_init(void) { int i; token_hash = alloc_large_system_hash("MPTCP token", sizeof(struct token_bucket), 0, 20,/* one slot per 1MB of memory */ HASH_ZERO, NULL, &token_mask, 0, 64 * 1024); for (i = 0; i < token_mask + 1; ++i) { INIT_HLIST_NULLS_HEAD(&token_hash[i].req_chain, i); INIT_HLIST_NULLS_HEAD(&token_hash[i].msk_chain, i); spin_lock_init(&token_hash[i].lock); } } #if IS_MODULE(CONFIG_MPTCP_KUNIT_TEST) EXPORT_SYMBOL_GPL(mptcp_token_new_request); EXPORT_SYMBOL_GPL(mptcp_token_new_connect); EXPORT_SYMBOL_GPL(mptcp_token_accept); EXPORT_SYMBOL_GPL(mptcp_token_destroy_request); EXPORT_SYMBOL_GPL(mptcp_token_destroy); #endif |
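/*
 * Simplified, single-threaded userspace analogue of the token container above
 * (hypothetical demo_* names). The kernel code uses RCU/nulls lists and
 * per-bucket spinlocks, all omitted here; this only illustrates bucket
 * selection by (token & mask) and the TOKEN_MAX_CHAIN_LEN busy check that
 * forces callers such as mptcp_token_new_connect() to retry with a new token.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_HASH_BITS	4
#define DEMO_MASK	((1u << DEMO_HASH_BITS) - 1)
#define DEMO_MAX_CHAIN	4	/* mirrors TOKEN_MAX_CHAIN_LEN */

struct demo_bucket {
	int chain_len;
	uint32_t tokens[DEMO_MAX_CHAIN];
};

static struct demo_bucket table[DEMO_MASK + 1];

static struct demo_bucket *demo_bucket(uint32_t token)
{
	return &table[token & DEMO_MASK];
}

/* busy: zero token, full chain, or token already present */
static bool demo_bucket_busy(struct demo_bucket *b, uint32_t token)
{
	int i;

	if (!token || b->chain_len >= DEMO_MAX_CHAIN)
		return true;
	for (i = 0; i < b->chain_len; i++)
		if (b->tokens[i] == token)
			return true;
	return false;
}

static int demo_insert(uint32_t token)
{
	struct demo_bucket *b = demo_bucket(token);

	if (demo_bucket_busy(b, token))
		return -1;	/* caller retries with a fresh token */
	b->tokens[b->chain_len++] = token;
	return 0;
}

int main(void)
{
	printf("%d\n", demo_insert(0x1234));	/* 0: token claimed */
	printf("%d\n", demo_insert(0x1234));	/* -1: already in the bucket */
	printf("%d\n", demo_insert(0));		/* -1: a zero token is always busy */
	return 0;
}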
| 204 204 2 2 154 155 203 204 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 | // SPDX-License-Identifier: GPL-2.0 #include <linux/export.h> #include <linux/min_heap.h> void __min_heap_init(min_heap_char *heap, void *data, int size) { __min_heap_init_inline(heap, data, size); } EXPORT_SYMBOL(__min_heap_init); void *__min_heap_peek(struct min_heap_char *heap) { return __min_heap_peek_inline(heap); } EXPORT_SYMBOL(__min_heap_peek); bool __min_heap_full(min_heap_char *heap) { return __min_heap_full_inline(heap); } EXPORT_SYMBOL(__min_heap_full); void __min_heap_sift_down(min_heap_char *heap, int pos, size_t elem_size, const struct min_heap_callbacks *func, void *args) { __min_heap_sift_down_inline(heap, pos, elem_size, func, args); } EXPORT_SYMBOL(__min_heap_sift_down); void __min_heap_sift_up(min_heap_char *heap, size_t elem_size, size_t idx, const struct min_heap_callbacks *func, void *args) { __min_heap_sift_up_inline(heap, elem_size, idx, func, args); } EXPORT_SYMBOL(__min_heap_sift_up); void __min_heapify_all(min_heap_char *heap, size_t elem_size, const struct min_heap_callbacks *func, void *args) { __min_heapify_all_inline(heap, elem_size, func, args); } EXPORT_SYMBOL(__min_heapify_all); bool __min_heap_pop(min_heap_char *heap, size_t elem_size, const struct min_heap_callbacks *func, void *args) { return __min_heap_pop_inline(heap, elem_size, func, args); } EXPORT_SYMBOL(__min_heap_pop); void __min_heap_pop_push(min_heap_char *heap, const void *element, size_t elem_size, const struct min_heap_callbacks *func, void *args) { __min_heap_pop_push_inline(heap, element, elem_size, func, args); } EXPORT_SYMBOL(__min_heap_pop_push); bool __min_heap_push(min_heap_char *heap, const void *element, size_t elem_size, const struct min_heap_callbacks *func, void *args) { return __min_heap_push_inline(heap, element, elem_size, func, args); } EXPORT_SYMBOL(__min_heap_push); bool __min_heap_del(min_heap_char *heap, size_t elem_size, size_t idx, const struct min_heap_callbacks *func, void *args) { return __min_heap_del_inline(heap, elem_size, idx, func, args); } EXPORT_SYMBOL(__min_heap_del); |
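/*
 * Standalone sketch of the sift-down step that the __min_heap_sift_down()
 * wrapper above forwards to its inline counterpart. This is a plain int-array
 * version for illustration only; the kernel helpers are generic over element
 * size and use caller-supplied less()/swp() callbacks from
 * struct min_heap_callbacks.
 */
#include <stdio.h>

static void demo_sift_down(int *heap, int pos, int nr)
{
	for (;;) {
		int left = 2 * pos + 1, right = left + 1, smallest = pos, tmp;

		if (left < nr && heap[left] < heap[smallest])
			smallest = left;
		if (right < nr && heap[right] < heap[smallest])
			smallest = right;
		if (smallest == pos)
			break;
		tmp = heap[pos];
		heap[pos] = heap[smallest];
		heap[smallest] = tmp;
		pos = smallest;
	}
}

int main(void)
{
	int heap[] = { 9, 1, 3, 4, 5 };
	int i;

	demo_sift_down(heap, 0, 5);	/* restore the min-heap property at the root */
	for (i = 0; i < 5; i++)
		printf("%d ", heap[i]);
	printf("\n");
	return 0;
}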
| 16 4 12 12 1 7 3 1 1 1 1 3 3 1 4 1 3 3 3 3 3 2 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 | // SPDX-License-Identifier: GPL-2.0-only /* * * Generic part shared by ipv4 and ipv6 backends. */ #include <linux/kernel.h> #include <linux/init.h> #include <linux/module.h> #include <linux/netlink.h> #include <linux/netfilter.h> #include <linux/netfilter/nf_tables.h> #include <net/netfilter/nf_tables_core.h> #include <net/netfilter/nf_tables.h> #include <linux/in.h> #include <net/xfrm.h> static const struct nla_policy nft_xfrm_policy[NFTA_XFRM_MAX + 1] = { [NFTA_XFRM_KEY] = NLA_POLICY_MAX(NLA_BE32, 255), [NFTA_XFRM_DIR] = { .type = NLA_U8 }, [NFTA_XFRM_SPNUM] = NLA_POLICY_MAX(NLA_BE32, 255), [NFTA_XFRM_DREG] = { .type = NLA_U32 }, }; struct nft_xfrm { enum nft_xfrm_keys key:8; u8 dreg; u8 dir; u8 spnum; u8 len; }; static int nft_xfrm_get_init(const struct nft_ctx *ctx, const struct nft_expr *expr, const struct nlattr * const tb[]) { struct nft_xfrm *priv = nft_expr_priv(expr); unsigned int len = 0; u32 spnum = 0; u8 dir; if (!tb[NFTA_XFRM_KEY] || !tb[NFTA_XFRM_DIR] || !tb[NFTA_XFRM_DREG]) return -EINVAL; switch (ctx->family) { case NFPROTO_IPV4: case NFPROTO_IPV6: case NFPROTO_INET: break; default: return -EOPNOTSUPP; } priv->key = ntohl(nla_get_be32(tb[NFTA_XFRM_KEY])); switch (priv->key) { case NFT_XFRM_KEY_REQID: case NFT_XFRM_KEY_SPI: len = sizeof(u32); break; case NFT_XFRM_KEY_DADDR_IP4: case NFT_XFRM_KEY_SADDR_IP4: len = sizeof(struct in_addr); break; case NFT_XFRM_KEY_DADDR_IP6: case NFT_XFRM_KEY_SADDR_IP6: len = sizeof(struct in6_addr); break; default: return -EINVAL; } dir = nla_get_u8(tb[NFTA_XFRM_DIR]); switch (dir) { case XFRM_POLICY_IN: case XFRM_POLICY_OUT: priv->dir = dir; break; default: return -EINVAL; } if (tb[NFTA_XFRM_SPNUM]) spnum = ntohl(nla_get_be32(tb[NFTA_XFRM_SPNUM])); if (spnum >= XFRM_MAX_DEPTH) return -ERANGE; priv->spnum = spnum; priv->len = len; return nft_parse_register_store(ctx, tb[NFTA_XFRM_DREG], &priv->dreg, NULL, NFT_DATA_VALUE, len); } /* Return true if key asks for daddr/saddr and current * state does have a valid address (BEET, TUNNEL). 
*/ static bool xfrm_state_addr_ok(enum nft_xfrm_keys k, u8 family, u8 mode) { switch (k) { case NFT_XFRM_KEY_DADDR_IP4: case NFT_XFRM_KEY_SADDR_IP4: if (family == NFPROTO_IPV4) break; return false; case NFT_XFRM_KEY_DADDR_IP6: case NFT_XFRM_KEY_SADDR_IP6: if (family == NFPROTO_IPV6) break; return false; default: return true; } return mode == XFRM_MODE_BEET || mode == XFRM_MODE_TUNNEL || mode == XFRM_MODE_IPTFS; } static void nft_xfrm_state_get_key(const struct nft_xfrm *priv, struct nft_regs *regs, const struct xfrm_state *state) { u32 *dest = ®s->data[priv->dreg]; if (!xfrm_state_addr_ok(priv->key, state->props.family, state->props.mode)) { regs->verdict.code = NFT_BREAK; return; } switch (priv->key) { case NFT_XFRM_KEY_UNSPEC: case __NFT_XFRM_KEY_MAX: WARN_ON_ONCE(1); break; case NFT_XFRM_KEY_DADDR_IP4: *dest = (__force __u32)state->id.daddr.a4; return; case NFT_XFRM_KEY_DADDR_IP6: memcpy(dest, &state->id.daddr.in6, sizeof(struct in6_addr)); return; case NFT_XFRM_KEY_SADDR_IP4: *dest = (__force __u32)state->props.saddr.a4; return; case NFT_XFRM_KEY_SADDR_IP6: memcpy(dest, &state->props.saddr.in6, sizeof(struct in6_addr)); return; case NFT_XFRM_KEY_REQID: *dest = state->props.reqid; return; case NFT_XFRM_KEY_SPI: *dest = (__force __u32)state->id.spi; return; } regs->verdict.code = NFT_BREAK; } static void nft_xfrm_get_eval_in(const struct nft_xfrm *priv, struct nft_regs *regs, const struct nft_pktinfo *pkt) { const struct sec_path *sp = skb_sec_path(pkt->skb); const struct xfrm_state *state; if (sp == NULL || sp->len <= priv->spnum) { regs->verdict.code = NFT_BREAK; return; } state = sp->xvec[priv->spnum]; nft_xfrm_state_get_key(priv, regs, state); } static void nft_xfrm_get_eval_out(const struct nft_xfrm *priv, struct nft_regs *regs, const struct nft_pktinfo *pkt) { const struct dst_entry *dst = skb_dst(pkt->skb); int i; for (i = 0; dst && dst->xfrm; dst = ((const struct xfrm_dst *)dst)->child, i++) { if (i < priv->spnum) continue; nft_xfrm_state_get_key(priv, regs, dst->xfrm); return; } regs->verdict.code = NFT_BREAK; } static void nft_xfrm_get_eval(const struct nft_expr *expr, struct nft_regs *regs, const struct nft_pktinfo *pkt) { const struct nft_xfrm *priv = nft_expr_priv(expr); switch (priv->dir) { case XFRM_POLICY_IN: nft_xfrm_get_eval_in(priv, regs, pkt); break; case XFRM_POLICY_OUT: nft_xfrm_get_eval_out(priv, regs, pkt); break; default: WARN_ON_ONCE(1); regs->verdict.code = NFT_BREAK; break; } } static int nft_xfrm_get_dump(struct sk_buff *skb, const struct nft_expr *expr, bool reset) { const struct nft_xfrm *priv = nft_expr_priv(expr); if (nft_dump_register(skb, NFTA_XFRM_DREG, priv->dreg)) return -1; if (nla_put_be32(skb, NFTA_XFRM_KEY, htonl(priv->key))) return -1; if (nla_put_u8(skb, NFTA_XFRM_DIR, priv->dir)) return -1; if (nla_put_be32(skb, NFTA_XFRM_SPNUM, htonl(priv->spnum))) return -1; return 0; } static int nft_xfrm_validate(const struct nft_ctx *ctx, const struct nft_expr *expr) { const struct nft_xfrm *priv = nft_expr_priv(expr); unsigned int hooks; if (ctx->family != NFPROTO_IPV4 && ctx->family != NFPROTO_IPV6 && ctx->family != NFPROTO_INET) return -EOPNOTSUPP; switch (priv->dir) { case XFRM_POLICY_IN: hooks = (1 << NF_INET_FORWARD) | (1 << NF_INET_LOCAL_IN) | (1 << NF_INET_PRE_ROUTING); break; case XFRM_POLICY_OUT: hooks = (1 << NF_INET_FORWARD) | (1 << NF_INET_LOCAL_OUT) | (1 << NF_INET_POST_ROUTING); break; default: WARN_ON_ONCE(1); return -EINVAL; } return nft_chain_validate_hooks(ctx->chain, hooks); } static bool nft_xfrm_reduce(struct nft_regs_track *track, 
const struct nft_expr *expr) { const struct nft_xfrm *priv = nft_expr_priv(expr); const struct nft_xfrm *xfrm; if (!nft_reg_track_cmp(track, expr, priv->dreg)) { nft_reg_track_update(track, expr, priv->dreg, priv->len); return false; } xfrm = nft_expr_priv(track->regs[priv->dreg].selector); if (priv->key != xfrm->key || priv->dreg != xfrm->dreg || priv->dir != xfrm->dir || priv->spnum != xfrm->spnum) { nft_reg_track_update(track, expr, priv->dreg, priv->len); return false; } if (!track->regs[priv->dreg].bitwise) return true; return nft_expr_reduce_bitwise(track, expr); } static struct nft_expr_type nft_xfrm_type; static const struct nft_expr_ops nft_xfrm_get_ops = { .type = &nft_xfrm_type, .size = NFT_EXPR_SIZE(sizeof(struct nft_xfrm)), .eval = nft_xfrm_get_eval, .init = nft_xfrm_get_init, .dump = nft_xfrm_get_dump, .validate = nft_xfrm_validate, .reduce = nft_xfrm_reduce, }; static struct nft_expr_type nft_xfrm_type __read_mostly = { .name = "xfrm", .ops = &nft_xfrm_get_ops, .policy = nft_xfrm_policy, .maxattr = NFTA_XFRM_MAX, .owner = THIS_MODULE, }; static int __init nft_xfrm_module_init(void) { return nft_register_expr(&nft_xfrm_type); } static void __exit nft_xfrm_module_exit(void) { nft_unregister_expr(&nft_xfrm_type); } module_init(nft_xfrm_module_init); module_exit(nft_xfrm_module_exit); MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("nf_tables: xfrm/IPSec matching"); MODULE_AUTHOR("Florian Westphal <fw@strlen.de>"); MODULE_AUTHOR("Máté Eckl <ecklm94@gmail.com>"); MODULE_ALIAS_NFT_EXPR("xfrm"); |
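/*
 * Standalone sketch of the selection logic in nft_xfrm_get_eval_out(): walk
 * the chain of transforms attached to the output path and pick the spnum-th
 * one, falling back to a "break" verdict when the index is out of range.
 * Simplified userspace demo_* types are used; the kernel walks the
 * dst_entry/xfrm_dst chain instead.
 */
#include <stdio.h>

struct demo_xfrm { unsigned int reqid; };
struct demo_dst {
	struct demo_dst *child;
	struct demo_xfrm *xfrm;	/* NULL once the transform chain ends */
};

static struct demo_xfrm *demo_pick(struct demo_dst *dst, unsigned int spnum)
{
	unsigned int i;

	for (i = 0; dst && dst->xfrm; dst = dst->child, i++)
		if (i >= spnum)
			return dst->xfrm;
	return NULL;	/* corresponds to regs->verdict.code = NFT_BREAK */
}

int main(void)
{
	struct demo_xfrm outer = { .reqid = 1 }, inner = { .reqid = 2 };
	struct demo_dst d1 = { .child = NULL, .xfrm = &inner };
	struct demo_dst d0 = { .child = &d1, .xfrm = &outer };

	printf("spnum 0 -> reqid %u\n", demo_pick(&d0, 0)->reqid);
	printf("spnum 1 -> reqid %u\n", demo_pick(&d0, 1)->reqid);
	printf("spnum 5 -> %s\n", demo_pick(&d0, 5) ? "match" : "break");
	return 0;
}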
| 6 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 | #ifndef INTERNAL_IO_WQ_H #define INTERNAL_IO_WQ_H #include <linux/refcount.h> #include <linux/io_uring_types.h> struct io_wq; enum { IO_WQ_WORK_CANCEL = 1, IO_WQ_WORK_HASHED = 2, IO_WQ_WORK_UNBOUND = 4, IO_WQ_WORK_CONCURRENT = 16, IO_WQ_HASH_SHIFT = 24, /* upper 8 bits are used for hash key */ }; enum io_wq_cancel { IO_WQ_CANCEL_OK, /* cancelled before started */ IO_WQ_CANCEL_RUNNING, /* found, running, and attempted cancelled */ IO_WQ_CANCEL_NOTFOUND, /* work not found */ }; typedef struct io_wq_work *(free_work_fn)(struct io_wq_work *); typedef void (io_wq_work_fn)(struct io_wq_work *); struct io_wq_hash { refcount_t refs; unsigned long map; struct wait_queue_head wait; }; static inline void io_wq_put_hash(struct io_wq_hash *hash) { if (refcount_dec_and_test(&hash->refs)) kfree(hash); } struct io_wq_data { struct io_wq_hash *hash; struct task_struct *task; io_wq_work_fn *do_work; free_work_fn *free_work; }; struct io_wq *io_wq_create(unsigned bounded, struct io_wq_data *data); void io_wq_exit_start(struct io_wq *wq); void io_wq_put_and_exit(struct io_wq *wq); void io_wq_enqueue(struct io_wq *wq, struct io_wq_work *work); void io_wq_hash_work(struct io_wq_work *work, void *val); int io_wq_cpu_affinity(struct io_uring_task *tctx, cpumask_var_t mask); int io_wq_max_workers(struct io_wq *wq, int *new_count); bool io_wq_worker_stopped(void); static inline bool io_wq_is_hashed(struct io_wq_work *work) { return atomic_read(&work->flags) & IO_WQ_WORK_HASHED; } typedef bool (work_cancel_fn)(struct io_wq_work *, void *); enum io_wq_cancel io_wq_cancel_cb(struct io_wq *wq, work_cancel_fn *cancel, void *data, bool cancel_all); #if defined(CONFIG_IO_WQ) extern void io_wq_worker_sleeping(struct task_struct *); extern void io_wq_worker_running(struct task_struct *); #else static inline void io_wq_worker_sleeping(struct task_struct *tsk) { } static inline void io_wq_worker_running(struct task_struct *tsk) { } #endif static inline bool io_wq_current_is_worker(void) { return in_task() && (current->flags & PF_IO_WORKER) && current->worker_private; } #endif |
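/*
 * Illustrative sketch of the flag layout declared above: the low bits hold the
 * IO_WQ_WORK_* flags and, per the comment on IO_WQ_HASH_SHIFT, the upper 8
 * bits carry a hash key used to serialise hashed work items. How
 * io_wq_hash_work() derives the key from its @val cookie is not visible in
 * this header, so the demo_* packing below is an assumption for illustration
 * only.
 */
#include <stdio.h>

#define DEMO_WORK_HASHED	2u
#define DEMO_HASH_SHIFT		24	/* upper 8 bits hold the hash key */

static unsigned int demo_hash_work(unsigned int flags, unsigned int key)
{
	return flags | DEMO_WORK_HASHED | ((key & 0xffu) << DEMO_HASH_SHIFT);
}

static unsigned int demo_hash_key(unsigned int flags)
{
	return flags >> DEMO_HASH_SHIFT;
}

int main(void)
{
	unsigned int flags = demo_hash_work(0, 0x5a);

	printf("hashed=%u key=%#x\n", !!(flags & DEMO_WORK_HASHED),
	       demo_hash_key(flags));
	return 0;
}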
| 359 358 359 359 358 112 237 235 266 235 328 9 392 6 148 249 331 4 171 18 444 351 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 | /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _BCACHEFS_ALLOC_BACKGROUND_H #define _BCACHEFS_ALLOC_BACKGROUND_H #include "bcachefs.h" #include "alloc_types.h" #include "buckets.h" #include "debug.h" #include "super.h" /* How out of date a pointer gen is allowed to be: */ #define BUCKET_GC_GEN_MAX 96U static inline bool bch2_dev_bucket_exists(struct bch_fs *c, struct bpos pos) { rcu_read_lock(); struct bch_dev *ca = bch2_dev_rcu_noerror(c, pos.inode); bool ret = ca && bucket_valid(ca, pos.offset); rcu_read_unlock(); return ret; } static inline u64 bucket_to_u64(struct bpos bucket) { return (bucket.inode << 48) | bucket.offset; } static inline struct bpos u64_to_bucket(u64 bucket) { return POS(bucket >> 48, bucket & ~(~0ULL << 48)); } static inline u8 alloc_gc_gen(struct bch_alloc_v4 a) { return a.gen - a.oldest_gen; } static inline void alloc_to_bucket(struct bucket *dst, struct bch_alloc_v4 src) { dst->gen = src.gen; dst->data_type = src.data_type; dst->stripe_sectors = src.stripe_sectors; dst->dirty_sectors = src.dirty_sectors; dst->cached_sectors = src.cached_sectors; dst->stripe = src.stripe; } static inline void __bucket_m_to_alloc(struct bch_alloc_v4 *dst, struct bucket src) { dst->gen = src.gen; dst->data_type = src.data_type; dst->stripe_sectors = src.stripe_sectors; dst->dirty_sectors = src.dirty_sectors; dst->cached_sectors = src.cached_sectors; dst->stripe = src.stripe; } static inline struct bch_alloc_v4 bucket_m_to_alloc(struct bucket b) { struct bch_alloc_v4 ret = {}; __bucket_m_to_alloc(&ret, b); return ret; } static inline enum bch_data_type bucket_data_type(enum bch_data_type data_type) { switch (data_type) { case BCH_DATA_cached: case BCH_DATA_stripe: return BCH_DATA_user; default: return data_type; } } static inline bool bucket_data_type_mismatch(enum bch_data_type bucket, enum bch_data_type ptr) { return !data_type_is_empty(bucket) && bucket_data_type(bucket) != bucket_data_type(ptr); } /* * It is my general preference to use unsigned types for unsigned quantities - * however, these helpers are used in disk accounting calculations run by * triggers where the output will be 
negated and added to an s64. unsigned is * right out even though all these quantities will fit in 32 bits, since it * won't be sign extended correctly; u64 will negate "correctly", but s64 is the * simpler option here. */ static inline s64 bch2_bucket_sectors_total(struct bch_alloc_v4 a) { return a.stripe_sectors + a.dirty_sectors + a.cached_sectors; } static inline s64 bch2_bucket_sectors_dirty(struct bch_alloc_v4 a) { return a.stripe_sectors + a.dirty_sectors; } static inline s64 bch2_bucket_sectors(struct bch_alloc_v4 a) { return a.data_type == BCH_DATA_cached ? a.cached_sectors : bch2_bucket_sectors_dirty(a); } static inline s64 bch2_bucket_sectors_fragmented(struct bch_dev *ca, struct bch_alloc_v4 a) { int d = bch2_bucket_sectors(a); return d ? max(0, ca->mi.bucket_size - d) : 0; } static inline s64 bch2_gc_bucket_sectors_fragmented(struct bch_dev *ca, struct bucket a) { int d = a.stripe_sectors + a.dirty_sectors; return d ? max(0, ca->mi.bucket_size - d) : 0; } static inline s64 bch2_bucket_sectors_unstriped(struct bch_alloc_v4 a) { return a.data_type == BCH_DATA_stripe ? a.dirty_sectors : 0; } static inline enum bch_data_type alloc_data_type(struct bch_alloc_v4 a, enum bch_data_type data_type) { if (a.stripe) return data_type == BCH_DATA_parity ? data_type : BCH_DATA_stripe; if (bch2_bucket_sectors_dirty(a)) return data_type; if (a.cached_sectors) return BCH_DATA_cached; if (BCH_ALLOC_V4_NEED_DISCARD(&a)) return BCH_DATA_need_discard; if (alloc_gc_gen(a) >= BUCKET_GC_GEN_MAX) return BCH_DATA_need_gc_gens; return BCH_DATA_free; } static inline void alloc_data_type_set(struct bch_alloc_v4 *a, enum bch_data_type data_type) { a->data_type = alloc_data_type(*a, data_type); } static inline u64 alloc_lru_idx_read(struct bch_alloc_v4 a) { return a.data_type == BCH_DATA_cached ? 
a.io_time[READ] & LRU_TIME_MAX : 0; } #define DATA_TYPES_MOVABLE \ ((1U << BCH_DATA_btree)| \ (1U << BCH_DATA_user)| \ (1U << BCH_DATA_stripe)) static inline bool data_type_movable(enum bch_data_type type) { return (1U << type) & DATA_TYPES_MOVABLE; } static inline u64 alloc_lru_idx_fragmentation(struct bch_alloc_v4 a, struct bch_dev *ca) { if (a.data_type >= BCH_DATA_NR) return 0; if (!data_type_movable(a.data_type) || !bch2_bucket_sectors_fragmented(ca, a)) return 0; /* * avoid overflowing LRU_TIME_BITS on a corrupted fs, when * bucket_sectors_dirty is (much) bigger than bucket_size */ u64 d = min_t(s64, bch2_bucket_sectors_dirty(a), ca->mi.bucket_size); return div_u64(d * (1ULL << 31), ca->mi.bucket_size); } static inline u64 alloc_freespace_genbits(struct bch_alloc_v4 a) { return ((u64) alloc_gc_gen(a) >> 4) << 56; } static inline struct bpos alloc_freespace_pos(struct bpos pos, struct bch_alloc_v4 a) { pos.offset |= alloc_freespace_genbits(a); return pos; } static inline unsigned alloc_v4_u64s_noerror(const struct bch_alloc_v4 *a) { return (BCH_ALLOC_V4_BACKPOINTERS_START(a) ?: BCH_ALLOC_V4_U64s_V0) + BCH_ALLOC_V4_NR_BACKPOINTERS(a) * (sizeof(struct bch_backpointer) / sizeof(u64)); } static inline unsigned alloc_v4_u64s(const struct bch_alloc_v4 *a) { unsigned ret = alloc_v4_u64s_noerror(a); BUG_ON(ret > U8_MAX - BKEY_U64s); return ret; } static inline void set_alloc_v4_u64s(struct bkey_i_alloc_v4 *a) { set_bkey_val_u64s(&a->k, alloc_v4_u64s(&a->v)); } struct bkey_i_alloc_v4 * bch2_trans_start_alloc_update_noupdate(struct btree_trans *, struct btree_iter *, struct bpos); struct bkey_i_alloc_v4 * bch2_trans_start_alloc_update(struct btree_trans *, struct bpos, enum btree_iter_update_trigger_flags); void __bch2_alloc_to_v4(struct bkey_s_c, struct bch_alloc_v4 *); static inline const struct bch_alloc_v4 *bch2_alloc_to_v4(struct bkey_s_c k, struct bch_alloc_v4 *convert) { const struct bch_alloc_v4 *ret; if (unlikely(k.k->type != KEY_TYPE_alloc_v4)) goto slowpath; ret = bkey_s_c_to_alloc_v4(k).v; if (BCH_ALLOC_V4_BACKPOINTERS_START(ret) != BCH_ALLOC_V4_U64s) goto slowpath; return ret; slowpath: __bch2_alloc_to_v4(k, convert); return convert; } struct bkey_i_alloc_v4 *bch2_alloc_to_v4_mut(struct btree_trans *, struct bkey_s_c); int bch2_bucket_io_time_reset(struct btree_trans *, unsigned, size_t, int); int bch2_alloc_v1_validate(struct bch_fs *, struct bkey_s_c, struct bkey_validate_context); int bch2_alloc_v2_validate(struct bch_fs *, struct bkey_s_c, struct bkey_validate_context); int bch2_alloc_v3_validate(struct bch_fs *, struct bkey_s_c, struct bkey_validate_context); int bch2_alloc_v4_validate(struct bch_fs *, struct bkey_s_c, struct bkey_validate_context); void bch2_alloc_v4_swab(struct bkey_s); void bch2_alloc_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c); #define bch2_bkey_ops_alloc ((struct bkey_ops) { \ .key_validate = bch2_alloc_v1_validate, \ .val_to_text = bch2_alloc_to_text, \ .trigger = bch2_trigger_alloc, \ .min_val_size = 8, \ }) #define bch2_bkey_ops_alloc_v2 ((struct bkey_ops) { \ .key_validate = bch2_alloc_v2_validate, \ .val_to_text = bch2_alloc_to_text, \ .trigger = bch2_trigger_alloc, \ .min_val_size = 8, \ }) #define bch2_bkey_ops_alloc_v3 ((struct bkey_ops) { \ .key_validate = bch2_alloc_v3_validate, \ .val_to_text = bch2_alloc_to_text, \ .trigger = bch2_trigger_alloc, \ .min_val_size = 16, \ }) #define bch2_bkey_ops_alloc_v4 ((struct bkey_ops) { \ .key_validate = bch2_alloc_v4_validate, \ .val_to_text = bch2_alloc_to_text, \ .swab = 
bch2_alloc_v4_swab, \ .trigger = bch2_trigger_alloc, \ .min_val_size = 48, \ }) int bch2_bucket_gens_validate(struct bch_fs *, struct bkey_s_c, struct bkey_validate_context); void bch2_bucket_gens_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c); #define bch2_bkey_ops_bucket_gens ((struct bkey_ops) { \ .key_validate = bch2_bucket_gens_validate, \ .val_to_text = bch2_bucket_gens_to_text, \ }) int bch2_bucket_gens_init(struct bch_fs *); static inline bool bkey_is_alloc(const struct bkey *k) { return k->type == KEY_TYPE_alloc || k->type == KEY_TYPE_alloc_v2 || k->type == KEY_TYPE_alloc_v3; } int bch2_alloc_read(struct bch_fs *); int bch2_alloc_key_to_dev_counters(struct btree_trans *, struct bch_dev *, const struct bch_alloc_v4 *, const struct bch_alloc_v4 *, unsigned); int bch2_trigger_alloc(struct btree_trans *, enum btree_id, unsigned, struct bkey_s_c, struct bkey_s, enum btree_iter_update_trigger_flags); int bch2_check_discard_freespace_key(struct btree_trans *, struct btree_iter *, u8 *, bool); int bch2_check_alloc_info(struct bch_fs *); int bch2_check_alloc_to_lru_refs(struct bch_fs *); void bch2_dev_do_discards(struct bch_dev *); void bch2_do_discards(struct bch_fs *); static inline u64 should_invalidate_buckets(struct bch_dev *ca, struct bch_dev_usage u) { u64 want_free = ca->mi.nbuckets >> 7; u64 free = max_t(s64, 0, u.d[BCH_DATA_free].buckets + u.d[BCH_DATA_need_discard].buckets - bch2_dev_buckets_reserved(ca, BCH_WATERMARK_stripe)); return clamp_t(s64, want_free - free, 0, u.d[BCH_DATA_cached].buckets); } void bch2_dev_do_invalidates(struct bch_dev *); void bch2_do_invalidates(struct bch_fs *); static inline struct bch_backpointer *alloc_v4_backpointers(struct bch_alloc_v4 *a) { return (void *) ((u64 *) &a->v + (BCH_ALLOC_V4_BACKPOINTERS_START(a) ?: BCH_ALLOC_V4_U64s_V0)); } static inline const struct bch_backpointer *alloc_v4_backpointers_c(const struct bch_alloc_v4 *a) { return (void *) ((u64 *) &a->v + BCH_ALLOC_V4_BACKPOINTERS_START(a)); } int bch2_dev_freespace_init(struct bch_fs *, struct bch_dev *, u64, u64); int bch2_fs_freespace_init(struct bch_fs *); int bch2_dev_remove_alloc(struct bch_fs *, struct bch_dev *); void bch2_recalc_capacity(struct bch_fs *); u64 bch2_min_rw_member_capacity(struct bch_fs *); void bch2_dev_allocator_remove(struct bch_fs *, struct bch_dev *); void bch2_dev_allocator_add(struct bch_fs *, struct bch_dev *); void bch2_dev_allocator_background_exit(struct bch_dev *); void bch2_dev_allocator_background_init(struct bch_dev *); void bch2_fs_allocator_background_init(struct bch_fs *); #endif /* _BCACHEFS_ALLOC_BACKGROUND_H */ |
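/*
 * Standalone round-trip check for the bucket <-> u64 packing used by
 * bucket_to_u64()/u64_to_bucket() above: the device index (bpos.inode) lands
 * in the top 16 bits and the bucket offset in the low 48 bits. demo_* names
 * are illustrative only.
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

struct demo_bpos { uint64_t inode, offset; };

static uint64_t demo_bucket_to_u64(struct demo_bpos b)
{
	return (b.inode << 48) | b.offset;
}

static struct demo_bpos demo_u64_to_bucket(uint64_t v)
{
	return (struct demo_bpos){ v >> 48, v & ~(~0ULL << 48) };
}

int main(void)
{
	struct demo_bpos b = { .inode = 3, .offset = 123456789 };
	struct demo_bpos r = demo_u64_to_bucket(demo_bucket_to_u64(b));

	assert(r.inode == b.inode && r.offset == b.offset);
	printf("dev %llu bucket %llu\n",
	       (unsigned long long)r.inode, (unsigned long long)r.offset);
	return 0;
}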
| 45 22 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 | /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _ASM_X86_LOCAL_H #define _ASM_X86_LOCAL_H #include <linux/percpu.h> #include <linux/atomic.h> #include <asm/asm.h> typedef struct { atomic_long_t a; } local_t; #define LOCAL_INIT(i) { ATOMIC_LONG_INIT(i) } #define local_read(l) atomic_long_read(&(l)->a) #define local_set(l, i) atomic_long_set(&(l)->a, (i)) static inline void local_inc(local_t *l) { asm volatile(_ASM_INC "%0" : "+m" (l->a.counter)); } static inline void local_dec(local_t *l) { asm volatile(_ASM_DEC "%0" : "+m" (l->a.counter)); } static inline void local_add(long i, local_t *l) { asm volatile(_ASM_ADD "%1,%0" : "+m" (l->a.counter) : "ir" (i)); } static inline void local_sub(long i, local_t *l) { asm volatile(_ASM_SUB "%1,%0" : "+m" (l->a.counter) : "ir" (i)); } /** * local_sub_and_test - subtract value from variable and test result * @i: integer value to subtract * @l: pointer to type local_t * * Atomically subtracts @i from @l and returns * true if the result is zero, or false for all * other cases. */ static inline bool local_sub_and_test(long i, local_t *l) { return GEN_BINARY_RMWcc(_ASM_SUB, l->a.counter, e, "er", i); } /** * local_dec_and_test - decrement and test * @l: pointer to type local_t * * Atomically decrements @l by 1 and * returns true if the result is 0, or false for all other * cases. */ static inline bool local_dec_and_test(local_t *l) { return GEN_UNARY_RMWcc(_ASM_DEC, l->a.counter, e); } /** * local_inc_and_test - increment and test * @l: pointer to type local_t * * Atomically increments @l by 1 * and returns true if the result is zero, or false for all * other cases. */ static inline bool local_inc_and_test(local_t *l) { return GEN_UNARY_RMWcc(_ASM_INC, l->a.counter, e); } /** * local_add_negative - add and test if negative * @i: integer value to add * @l: pointer to type local_t * * Atomically adds @i to @l and returns true * if the result is negative, or false when * result is greater than or equal to zero. 
*/ static inline bool local_add_negative(long i, local_t *l) { return GEN_BINARY_RMWcc(_ASM_ADD, l->a.counter, s, "er", i); } /** * local_add_return - add and return * @i: integer value to add * @l: pointer to type local_t * * Atomically adds @i to @l and returns @i + @l */ static inline long local_add_return(long i, local_t *l) { long __i = i; asm volatile(_ASM_XADD "%0, %1;" : "+r" (i), "+m" (l->a.counter) : : "memory"); return i + __i; } static inline long local_sub_return(long i, local_t *l) { return local_add_return(-i, l); } #define local_inc_return(l) (local_add_return(1, l)) #define local_dec_return(l) (local_sub_return(1, l)) static inline long local_cmpxchg(local_t *l, long old, long new) { return cmpxchg_local(&l->a.counter, old, new); } static inline bool local_try_cmpxchg(local_t *l, long *old, long new) { return try_cmpxchg_local(&l->a.counter, (typeof(l->a.counter) *) old, new); } /* * Implement local_xchg using CMPXCHG instruction without the LOCK prefix. * XCHG is expensive due to the implied LOCK prefix. The processor * cannot prefetch cachelines if XCHG is used. */ static __always_inline long local_xchg(local_t *l, long n) { long c = local_read(l); do { } while (!local_try_cmpxchg(l, &c, n)); return c; } /** * local_add_unless - add unless the number is already a given value * @l: pointer of type local_t * @a: the amount to add to l... * @u: ...unless l is equal to u. * * Atomically adds @a to @l, if @v was not already @u. * Returns true if the addition was done. */ static __always_inline bool local_add_unless(local_t *l, long a, long u) { long c = local_read(l); do { if (unlikely(c == u)) return false; } while (!local_try_cmpxchg(l, &c, c + a)); return true; } #define local_inc_not_zero(l) local_add_unless((l), 1, 0) /* On x86_32, these are no better than the atomic variants. * On x86-64 these are better than the atomic variants on SMP kernels * because they dont use a lock prefix. */ #define __local_inc(l) local_inc(l) #define __local_dec(l) local_dec(l) #define __local_add(i, l) local_add((i), (l)) #define __local_sub(i, l) local_sub((i), (l)) #endif /* _ASM_X86_LOCAL_H */ |
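/*
 * Userspace C11 analogue of the local_add_unless() loop above: read the
 * current value once, then retry compare-and-exchange until it either succeeds
 * or the "unless" value is observed. The kernel version uses local_read() and
 * local_try_cmpxchg() on a local_t instead of C11 atomics; this demo_* sketch
 * only shows the retry pattern.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static bool demo_add_unless(atomic_long *l, long a, long u)
{
	long c = atomic_load(l);

	do {
		if (c == u)
			return false;
	} while (!atomic_compare_exchange_weak(l, &c, c + a));
	return true;
}

int main(void)
{
	atomic_long v = 5;

	printf("%d %ld\n", demo_add_unless(&v, 1, 0), atomic_load(&v));	/* 1 6 */
	printf("%d %ld\n", demo_add_unless(&v, 1, 6), atomic_load(&v));	/* 0 6 */
	return 0;
}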
| 130 5 8 8 8 3 3 3 3 10 10 10 5 5 10 3 3 3 3 3 3 3 3 3 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 | /* SPDX-License-Identifier: GPL-2.0 */ #ifndef __NET_TC_PED_H #define __NET_TC_PED_H #include <net/act_api.h> #include <linux/tc_act/tc_pedit.h> #include <linux/types.h> struct tcf_pedit_key_ex { enum pedit_header_type htype; enum pedit_cmd cmd; }; struct tcf_pedit_parms { struct tc_pedit_key *tcfp_keys; struct tcf_pedit_key_ex *tcfp_keys_ex; u32 tcfp_off_max_hint; unsigned char tcfp_nkeys; unsigned char tcfp_flags; struct rcu_head rcu; }; struct tcf_pedit { struct tc_action common; struct tcf_pedit_parms __rcu *parms; }; #define to_pedit(a) ((struct tcf_pedit *)a) #define to_pedit_parms(a) (rcu_dereference(to_pedit(a)->parms)) static inline bool is_tcf_pedit(const struct tc_action *a) { #ifdef CONFIG_NET_CLS_ACT if (a->ops && a->ops->id == TCA_ID_PEDIT) return true; #endif return false; } static inline int tcf_pedit_nkeys(const struct tc_action *a) { struct tcf_pedit_parms *parms; int nkeys; rcu_read_lock(); parms = to_pedit_parms(a); nkeys = parms->tcfp_nkeys; rcu_read_unlock(); return nkeys; } static inline u32 tcf_pedit_htype(const struct tc_action *a, int index) { u32 htype = TCA_PEDIT_KEY_EX_HDR_TYPE_NETWORK; struct tcf_pedit_parms *parms; rcu_read_lock(); parms = to_pedit_parms(a); if (parms->tcfp_keys_ex) htype = parms->tcfp_keys_ex[index].htype; rcu_read_unlock(); return htype; } static inline u32 tcf_pedit_cmd(const struct tc_action *a, int index) { struct tcf_pedit_parms *parms; u32 cmd = __PEDIT_CMD_MAX; rcu_read_lock(); parms = to_pedit_parms(a); if (parms->tcfp_keys_ex) cmd = parms->tcfp_keys_ex[index].cmd; rcu_read_unlock(); return cmd; } static inline u32 tcf_pedit_mask(const struct tc_action *a, int index) { struct tcf_pedit_parms *parms; u32 mask; rcu_read_lock(); parms = to_pedit_parms(a); mask = parms->tcfp_keys[index].mask; rcu_read_unlock(); return mask; } static inline u32 tcf_pedit_val(const struct tc_action *a, int index) { struct tcf_pedit_parms *parms; u32 val; rcu_read_lock(); parms = to_pedit_parms(a); val = parms->tcfp_keys[index].val; rcu_read_unlock(); return val; } static inline u32 tcf_pedit_offset(const struct tc_action *a, int index) { struct tcf_pedit_parms *parms; u32 off; rcu_read_lock(); parms = to_pedit_parms(a); off = parms->tcfp_keys[index].off; rcu_read_unlock(); return off; } #endif /* __NET_TC_PED_H */ |
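/*
 * Sketch (not from the kernel tree) of how an offload driver might walk a
 * pedit action using only the accessors declared above; error handling and the
 * actual hardware programming are omitted, and demo_dump_pedit() is a
 * hypothetical name.
 */
#include <linux/printk.h>
#include <net/tc_act/tc_pedit.h>

static void demo_dump_pedit(const struct tc_action *act)
{
	int i, nkeys;

	if (!is_tcf_pedit(act))
		return;

	nkeys = tcf_pedit_nkeys(act);
	for (i = 0; i < nkeys; i++) {
		u32 htype = tcf_pedit_htype(act, i);
		u32 cmd = tcf_pedit_cmd(act, i);
		u32 mask = tcf_pedit_mask(act, i);
		u32 val = tcf_pedit_val(act, i);
		u32 off = tcf_pedit_offset(act, i);

		pr_debug("pedit key %d: htype %u cmd %u off %u val %08x mask %08x\n",
			 i, htype, cmd, off, val, mask);
	}
}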
// SPDX-License-Identifier:
GPL-2.0-only /* * Copyright (C) Sistina Software, Inc. 1997-2003 All rights reserved. * Copyright (C) 2004-2011 Red Hat, Inc. All rights reserved. */ #include <linux/slab.h> #include <linux/spinlock.h> #include <linux/completion.h> #include <linux/buffer_head.h> #include <linux/namei.h> #include <linux/mm.h> #include <linux/cred.h> #include <linux/xattr.h> #include <linux/posix_acl.h> #include <linux/gfs2_ondisk.h> #include <linux/crc32.h> #include <linux/iomap.h> #include <linux/security.h> #include <linux/fiemap.h> #include <linux/uaccess.h> #include "gfs2.h" #include "incore.h" #include "acl.h" #include "bmap.h" #include "dir.h" #include "xattr.h" #include "glock.h" #include "inode.h" #include "meta_io.h" #include "quota.h" #include "rgrp.h" #include "trans.h" #include "util.h" #include "super.h" #include "glops.h" static const struct inode_operations gfs2_file_iops; static const struct inode_operations gfs2_dir_iops; static const struct inode_operations gfs2_symlink_iops; /** * gfs2_set_iop - Sets inode operations * @inode: The inode with correct i_mode filled in * * GFS2 lookup code fills in vfs inode contents based on info obtained * from directory entry inside gfs2_inode_lookup(). */ static void gfs2_set_iop(struct inode *inode) { struct gfs2_sbd *sdp = GFS2_SB(inode); umode_t mode = inode->i_mode; if (S_ISREG(mode)) { inode->i_op = &gfs2_file_iops; if (gfs2_localflocks(sdp)) inode->i_fop = &gfs2_file_fops_nolock; else inode->i_fop = &gfs2_file_fops; } else if (S_ISDIR(mode)) { inode->i_op = &gfs2_dir_iops; if (gfs2_localflocks(sdp)) inode->i_fop = &gfs2_dir_fops_nolock; else inode->i_fop = &gfs2_dir_fops; } else if (S_ISLNK(mode)) { inode->i_op = &gfs2_symlink_iops; } else { inode->i_op = &gfs2_file_iops; init_special_inode(inode, inode->i_mode, inode->i_rdev); } } static int iget_test(struct inode *inode, void *opaque) { u64 no_addr = *(u64 *)opaque; return GFS2_I(inode)->i_no_addr == no_addr; } static int iget_set(struct inode *inode, void *opaque) { u64 no_addr = *(u64 *)opaque; GFS2_I(inode)->i_no_addr = no_addr; inode->i_ino = no_addr; return 0; } /** * gfs2_inode_lookup - Lookup an inode * @sb: The super block * @type: The type of the inode * @no_addr: The inode number * @no_formal_ino: The inode generation number * @blktype: Requested block type (GFS2_BLKST_DINODE or GFS2_BLKST_UNLINKED; * GFS2_BLKST_FREE to indicate not to verify) * * If @type is DT_UNKNOWN, the inode type is fetched from disk. * * If @blktype is anything other than GFS2_BLKST_FREE (which is used as a * placeholder because it doesn't otherwise make sense), the on-disk block type * is verified to be @blktype. * * When @no_formal_ino is non-zero, this function will return ERR_PTR(-ESTALE) * if it detects that @no_formal_ino doesn't match the actual inode generation * number. However, it doesn't always know unless @type is DT_UNKNOWN. 
* * Returns: A VFS inode, or an error */ struct inode *gfs2_inode_lookup(struct super_block *sb, unsigned int type, u64 no_addr, u64 no_formal_ino, unsigned int blktype) { struct inode *inode; struct gfs2_inode *ip; struct gfs2_holder i_gh; int error; gfs2_holder_mark_uninitialized(&i_gh); inode = iget5_locked(sb, no_addr, iget_test, iget_set, &no_addr); if (!inode) return ERR_PTR(-ENOMEM); ip = GFS2_I(inode); if (inode->i_state & I_NEW) { struct gfs2_sbd *sdp = GFS2_SB(inode); struct gfs2_glock *io_gl; int extra_flags = 0; error = gfs2_glock_get(sdp, no_addr, &gfs2_inode_glops, CREATE, &ip->i_gl); if (unlikely(error)) goto fail; error = gfs2_glock_get(sdp, no_addr, &gfs2_iopen_glops, CREATE, &io_gl); if (unlikely(error)) goto fail; /* * The only caller that sets @blktype to GFS2_BLKST_UNLINKED is * delete_work_func(). Make sure not to cancel the delete work * from within itself here. */ if (blktype == GFS2_BLKST_UNLINKED) extra_flags |= LM_FLAG_TRY; else gfs2_cancel_delete_work(io_gl); error = gfs2_glock_nq_init(io_gl, LM_ST_SHARED, GL_EXACT | GL_NOPID | extra_flags, &ip->i_iopen_gh); gfs2_glock_put(io_gl); if (unlikely(error)) goto fail; if (type == DT_UNKNOWN || blktype != GFS2_BLKST_FREE) { /* * The GL_SKIP flag indicates to skip reading the inode * block. We read the inode when instantiating it * after possibly checking the block type. */ error = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, GL_SKIP, &i_gh); if (error) goto fail; error = -ESTALE; if (no_formal_ino && gfs2_inode_already_deleted(ip->i_gl, no_formal_ino)) goto fail; if (blktype != GFS2_BLKST_FREE) { error = gfs2_check_blk_type(sdp, no_addr, blktype); if (error) goto fail; } } set_bit(GLF_INSTANTIATE_NEEDED, &ip->i_gl->gl_flags); /* Lowest possible timestamp; will be overwritten in gfs2_dinode_in. 
*/ inode_set_atime(inode, 1LL << (8 * sizeof(inode_get_atime_sec(inode)) - 1), 0); glock_set_object(ip->i_gl, ip); if (type == DT_UNKNOWN) { /* Inode glock must be locked already */ error = gfs2_instantiate(&i_gh); if (error) { glock_clear_object(ip->i_gl, ip); goto fail; } } else { ip->i_no_formal_ino = no_formal_ino; inode->i_mode = DT2IF(type); } if (gfs2_holder_initialized(&i_gh)) gfs2_glock_dq_uninit(&i_gh); glock_set_object(ip->i_iopen_gh.gh_gl, ip); gfs2_set_iop(inode); unlock_new_inode(inode); } if (no_formal_ino && ip->i_no_formal_ino && no_formal_ino != ip->i_no_formal_ino) { iput(inode); return ERR_PTR(-ESTALE); } return inode; fail: if (error == GLR_TRYFAILED) error = -EAGAIN; if (gfs2_holder_initialized(&ip->i_iopen_gh)) gfs2_glock_dq_uninit(&ip->i_iopen_gh); if (gfs2_holder_initialized(&i_gh)) gfs2_glock_dq_uninit(&i_gh); if (ip->i_gl) { gfs2_glock_put(ip->i_gl); ip->i_gl = NULL; } iget_failed(inode); return ERR_PTR(error); } /** * gfs2_lookup_by_inum - look up an inode by inode number * @sdp: The super block * @no_addr: The inode number * @no_formal_ino: The inode generation number (0 for any) * @blktype: Requested block type (see gfs2_inode_lookup) */ struct inode *gfs2_lookup_by_inum(struct gfs2_sbd *sdp, u64 no_addr, u64 no_formal_ino, unsigned int blktype) { struct super_block *sb = sdp->sd_vfs; struct inode *inode; int error; inode = gfs2_inode_lookup(sb, DT_UNKNOWN, no_addr, no_formal_ino, blktype); if (IS_ERR(inode)) return inode; if (no_formal_ino) { error = -EIO; if (GFS2_I(inode)->i_diskflags & GFS2_DIF_SYSTEM) goto fail_iput; } return inode; fail_iput: iput(inode); return ERR_PTR(error); } /** * gfs2_lookup_meta - Look up an inode in a metadata directory * @dip: The directory * @name: The name of the inode */ struct inode *gfs2_lookup_meta(struct inode *dip, const char *name) { struct qstr qstr; struct inode *inode; gfs2_str2qstr(&qstr, name); inode = gfs2_lookupi(dip, &qstr, 1); if (IS_ERR_OR_NULL(inode)) return inode ? inode : ERR_PTR(-ENOENT); /* * Must not call back into the filesystem when allocating * pages in the metadata inode's address space. */ mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS); return inode; } /** * gfs2_lookupi - Look up a filename in a directory and return its inode * @dir: The inode of the directory containing the inode to look-up * @name: The name of the inode to look for * @is_root: If 1, ignore the caller's permissions * * This can be called via the VFS filldir function when NFS is doing * a readdirplus and the inode which its intending to stat isn't * already in cache. In this case we must not take the directory glock * again, since the readdir call will have already taken that lock. 
* * Returns: errno */ struct inode *gfs2_lookupi(struct inode *dir, const struct qstr *name, int is_root) { struct super_block *sb = dir->i_sb; struct gfs2_inode *dip = GFS2_I(dir); struct gfs2_holder d_gh; int error = 0; struct inode *inode = NULL; gfs2_holder_mark_uninitialized(&d_gh); if (!name->len || name->len > GFS2_FNAMESIZE) return ERR_PTR(-ENAMETOOLONG); if ((name->len == 1 && memcmp(name->name, ".", 1) == 0) || (name->len == 2 && memcmp(name->name, "..", 2) == 0 && dir == d_inode(sb->s_root))) { igrab(dir); return dir; } if (gfs2_glock_is_locked_by_me(dip->i_gl) == NULL) { error = gfs2_glock_nq_init(dip->i_gl, LM_ST_SHARED, 0, &d_gh); if (error) return ERR_PTR(error); } if (!is_root) { error = gfs2_permission(&nop_mnt_idmap, dir, MAY_EXEC); if (error) goto out; } inode = gfs2_dir_search(dir, name, false); if (IS_ERR(inode)) error = PTR_ERR(inode); out: if (gfs2_holder_initialized(&d_gh)) gfs2_glock_dq_uninit(&d_gh); if (error == -ENOENT) return NULL; return inode ? inode : ERR_PTR(error); } /** * create_ok - OK to create a new on-disk inode here? * @dip: Directory in which dinode is to be created * @name: Name of new dinode * @mode: * * Returns: errno */ static int create_ok(struct gfs2_inode *dip, const struct qstr *name, umode_t mode) { int error; error = gfs2_permission(&nop_mnt_idmap, &dip->i_inode, MAY_WRITE | MAY_EXEC); if (error) return error; /* Don't create entries in an unlinked directory */ if (!dip->i_inode.i_nlink) return -ENOENT; if (dip->i_entries == (u32)-1) return -EFBIG; if (S_ISDIR(mode) && dip->i_inode.i_nlink == (u32)-1) return -EMLINK; return 0; } static void munge_mode_uid_gid(const struct gfs2_inode *dip, struct inode *inode) { if (GFS2_SB(&dip->i_inode)->sd_args.ar_suiddir && (dip->i_inode.i_mode & S_ISUID) && !uid_eq(dip->i_inode.i_uid, GLOBAL_ROOT_UID)) { if (S_ISDIR(inode->i_mode)) inode->i_mode |= S_ISUID; else if (!uid_eq(dip->i_inode.i_uid, current_fsuid())) inode->i_mode &= ~07111; inode->i_uid = dip->i_inode.i_uid; } else inode->i_uid = current_fsuid(); if (dip->i_inode.i_mode & S_ISGID) { if (S_ISDIR(inode->i_mode)) inode->i_mode |= S_ISGID; inode->i_gid = dip->i_inode.i_gid; } else inode->i_gid = current_fsgid(); } static int alloc_dinode(struct gfs2_inode *ip, u32 flags, unsigned *dblocks) { struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode); struct gfs2_alloc_parms ap = { .target = *dblocks, .aflags = flags, }; int error; error = gfs2_quota_lock_check(ip, &ap); if (error) goto out; error = gfs2_inplace_reserve(ip, &ap); if (error) goto out_quota; error = gfs2_trans_begin(sdp, (*dblocks * RES_RG_BIT) + RES_STATFS + RES_QUOTA, 0); if (error) goto out_ipreserv; error = gfs2_alloc_blocks(ip, &ip->i_no_addr, dblocks, 1); if (error) goto out_trans_end; ip->i_no_formal_ino = ip->i_generation; ip->i_inode.i_ino = ip->i_no_addr; ip->i_goal = ip->i_no_addr; if (*dblocks > 1) ip->i_eattr = ip->i_no_addr + 1; out_trans_end: gfs2_trans_end(sdp); out_ipreserv: gfs2_inplace_release(ip); out_quota: gfs2_quota_unlock(ip); out: return error; } static void gfs2_init_dir(struct buffer_head *dibh, const struct gfs2_inode *parent) { struct gfs2_dinode *di = (struct gfs2_dinode *)dibh->b_data; struct gfs2_dirent *dent = (struct gfs2_dirent *)(di+1); gfs2_qstr2dirent(&gfs2_qdot, GFS2_DIRENT_SIZE(gfs2_qdot.len), dent); dent->de_inum = di->di_num; /* already GFS2 endian */ dent->de_type = cpu_to_be16(DT_DIR); dent = (struct gfs2_dirent *)((char*)dent + GFS2_DIRENT_SIZE(1)); gfs2_qstr2dirent(&gfs2_qdotdot, dibh->b_size - GFS2_DIRENT_SIZE(1) - sizeof(struct gfs2_dinode), 
dent); gfs2_inum_out(parent, dent); dent->de_type = cpu_to_be16(DT_DIR); } /** * gfs2_init_xattr - Initialise an xattr block for a new inode * @ip: The inode in question * * This sets up an empty xattr block for a new inode, ready to * take any ACLs, LSM xattrs, etc. */ static void gfs2_init_xattr(struct gfs2_inode *ip) { struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode); struct buffer_head *bh; struct gfs2_ea_header *ea; bh = gfs2_meta_new(ip->i_gl, ip->i_eattr); gfs2_trans_add_meta(ip->i_gl, bh); gfs2_metatype_set(bh, GFS2_METATYPE_EA, GFS2_FORMAT_EA); gfs2_buffer_clear_tail(bh, sizeof(struct gfs2_meta_header)); ea = GFS2_EA_BH2FIRST(bh); ea->ea_rec_len = cpu_to_be32(sdp->sd_jbsize); ea->ea_type = GFS2_EATYPE_UNUSED; ea->ea_flags = GFS2_EAFLAG_LAST; brelse(bh); } /** * init_dinode - Fill in a new dinode structure * @dip: The directory this inode is being created in * @ip: The inode * @symname: The symlink destination (if a symlink) * */ static void init_dinode(struct gfs2_inode *dip, struct gfs2_inode *ip, const char *symname) { struct gfs2_dinode *di; struct buffer_head *dibh; dibh = gfs2_meta_new(ip->i_gl, ip->i_no_addr); gfs2_trans_add_meta(ip->i_gl, dibh); di = (struct gfs2_dinode *)dibh->b_data; gfs2_dinode_out(ip, di); di->di_major = cpu_to_be32(imajor(&ip->i_inode)); di->di_minor = cpu_to_be32(iminor(&ip->i_inode)); di->__pad1 = 0; di->__pad2 = 0; di->__pad3 = 0; memset(&di->__pad4, 0, sizeof(di->__pad4)); memset(&di->di_reserved, 0, sizeof(di->di_reserved)); gfs2_buffer_clear_tail(dibh, sizeof(struct gfs2_dinode)); switch(ip->i_inode.i_mode & S_IFMT) { case S_IFDIR: gfs2_init_dir(dibh, dip); break; case S_IFLNK: memcpy(dibh->b_data + sizeof(struct gfs2_dinode), symname, ip->i_inode.i_size); break; } set_buffer_uptodate(dibh); brelse(dibh); } /** * gfs2_trans_da_blks - Calculate number of blocks to link inode * @dip: The directory we are linking into * @da: The dir add information * @nr_inodes: The number of inodes involved * * This calculate the number of blocks we need to reserve in a * transaction to link @nr_inodes into a directory. In most cases * @nr_inodes will be 2 (the directory plus the inode being linked in) * but in case of rename, 4 may be required. 
* * Returns: Number of blocks */ static unsigned gfs2_trans_da_blks(const struct gfs2_inode *dip, const struct gfs2_diradd *da, unsigned nr_inodes) { return da->nr_blocks + gfs2_rg_blocks(dip, da->nr_blocks) + (nr_inodes * RES_DINODE) + RES_QUOTA + RES_STATFS; } static int link_dinode(struct gfs2_inode *dip, const struct qstr *name, struct gfs2_inode *ip, struct gfs2_diradd *da) { struct gfs2_sbd *sdp = GFS2_SB(&dip->i_inode); struct gfs2_alloc_parms ap = { .target = da->nr_blocks, }; int error; if (da->nr_blocks) { error = gfs2_quota_lock_check(dip, &ap); if (error) goto fail_quota_locks; error = gfs2_inplace_reserve(dip, &ap); if (error) goto fail_quota_locks; error = gfs2_trans_begin(sdp, gfs2_trans_da_blks(dip, da, 2), 0); if (error) goto fail_ipreserv; } else { error = gfs2_trans_begin(sdp, RES_LEAF + 2 * RES_DINODE, 0); if (error) goto fail_quota_locks; } error = gfs2_dir_add(&dip->i_inode, name, ip, da); gfs2_trans_end(sdp); fail_ipreserv: gfs2_inplace_release(dip); fail_quota_locks: gfs2_quota_unlock(dip); return error; } static int gfs2_initxattrs(struct inode *inode, const struct xattr *xattr_array, void *fs_info) { const struct xattr *xattr; int err = 0; for (xattr = xattr_array; xattr->name != NULL; xattr++) { err = __gfs2_xattr_set(inode, xattr->name, xattr->value, xattr->value_len, 0, GFS2_EATYPE_SECURITY); if (err < 0) break; } return err; } /** * gfs2_create_inode - Create a new inode * @dir: The parent directory * @dentry: The new dentry * @file: If non-NULL, the file which is being opened * @mode: The permissions on the new inode * @dev: For device nodes, this is the device number * @symname: For symlinks, this is the link destination * @size: The initial size of the inode (ignored for directories) * @excl: Force fail if inode exists * * FIXME: Change to allocate the disk blocks and write them out in the same * transaction. That way, we can no longer end up in a situation in which an * inode is allocated, the node crashes, and the block looks like a valid * inode. (With atomic creates in place, we will also no longer need to zero * the link count and dirty the inode here on failure.) 
* * Returns: 0 on success, or error code */ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry, struct file *file, umode_t mode, dev_t dev, const char *symname, unsigned int size, int excl) { const struct qstr *name = &dentry->d_name; struct posix_acl *default_acl, *acl; struct gfs2_holder d_gh, gh; struct inode *inode = NULL; struct gfs2_inode *dip = GFS2_I(dir), *ip; struct gfs2_sbd *sdp = GFS2_SB(&dip->i_inode); struct gfs2_glock *io_gl; int error; u32 aflags = 0; unsigned blocks = 1; struct gfs2_diradd da = { .bh = NULL, .save_loc = 1, }; if (!name->len || name->len > GFS2_FNAMESIZE) return -ENAMETOOLONG; error = gfs2_qa_get(dip); if (error) return error; error = gfs2_rindex_update(sdp); if (error) goto fail; error = gfs2_glock_nq_init(dip->i_gl, LM_ST_EXCLUSIVE, 0, &d_gh); if (error) goto fail; gfs2_holder_mark_uninitialized(&gh); error = create_ok(dip, name, mode); if (error) goto fail_gunlock; inode = gfs2_dir_search(dir, &dentry->d_name, !S_ISREG(mode) || excl); error = PTR_ERR(inode); if (!IS_ERR(inode)) { if (S_ISDIR(inode->i_mode)) { iput(inode); inode = ERR_PTR(-EISDIR); goto fail_gunlock; } d_instantiate(dentry, inode); error = 0; if (file) { if (S_ISREG(inode->i_mode)) error = finish_open(file, dentry, gfs2_open_common); else error = finish_no_open(file, NULL); } gfs2_glock_dq_uninit(&d_gh); goto fail; } else if (error != -ENOENT) { goto fail_gunlock; } error = gfs2_diradd_alloc_required(dir, name, &da); if (error < 0) goto fail_gunlock; inode = new_inode(sdp->sd_vfs); error = -ENOMEM; if (!inode) goto fail_gunlock; ip = GFS2_I(inode); error = posix_acl_create(dir, &mode, &default_acl, &acl); if (error) goto fail_gunlock; error = gfs2_qa_get(ip); if (error) goto fail_free_acls; inode->i_mode = mode; set_nlink(inode, S_ISDIR(mode) ? 
2 : 1); inode->i_rdev = dev; inode->i_size = size; simple_inode_init_ts(inode); munge_mode_uid_gid(dip, inode); check_and_update_goal(dip); ip->i_goal = dip->i_goal; ip->i_diskflags = 0; ip->i_eattr = 0; ip->i_height = 0; ip->i_depth = 0; ip->i_entries = 0; ip->i_no_addr = 0; /* Temporarily zero until real addr is assigned */ switch(mode & S_IFMT) { case S_IFREG: if ((dip->i_diskflags & GFS2_DIF_INHERIT_JDATA) || gfs2_tune_get(sdp, gt_new_files_jdata)) ip->i_diskflags |= GFS2_DIF_JDATA; gfs2_set_aops(inode); break; case S_IFDIR: ip->i_diskflags |= (dip->i_diskflags & GFS2_DIF_INHERIT_JDATA); ip->i_diskflags |= GFS2_DIF_JDATA; ip->i_entries = 2; break; } /* Force SYSTEM flag on all files and subdirs of a SYSTEM directory */ if (dip->i_diskflags & GFS2_DIF_SYSTEM) ip->i_diskflags |= GFS2_DIF_SYSTEM; gfs2_set_inode_flags(inode); if ((GFS2_I(d_inode(sdp->sd_root_dir)) == dip) || (dip->i_diskflags & GFS2_DIF_TOPDIR)) aflags |= GFS2_AF_ORLOV; if (default_acl || acl) blocks++; error = alloc_dinode(ip, aflags, &blocks); if (error) goto fail_free_inode; gfs2_set_inode_blocks(inode, blocks); error = gfs2_glock_get(sdp, ip->i_no_addr, &gfs2_inode_glops, CREATE, &ip->i_gl); if (error) goto fail_free_inode; error = gfs2_glock_get(sdp, ip->i_no_addr, &gfs2_iopen_glops, CREATE, &io_gl); if (error) goto fail_free_inode; gfs2_cancel_delete_work(io_gl); io_gl->gl_no_formal_ino = ip->i_no_formal_ino; retry: error = insert_inode_locked4(inode, ip->i_no_addr, iget_test, &ip->i_no_addr); if (error == -EBUSY) goto retry; if (error) goto fail_gunlock2; error = gfs2_glock_nq_init(io_gl, LM_ST_SHARED, GL_EXACT | GL_NOPID, &ip->i_iopen_gh); if (error) goto fail_gunlock2; error = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, GL_SKIP, &gh); if (error) goto fail_gunlock3; error = gfs2_trans_begin(sdp, blocks, 0); if (error) goto fail_gunlock3; if (blocks > 1) gfs2_init_xattr(ip); init_dinode(dip, ip, symname); gfs2_trans_end(sdp); glock_set_object(ip->i_gl, ip); glock_set_object(io_gl, ip); gfs2_set_iop(inode); if (default_acl) { error = __gfs2_set_acl(inode, default_acl, ACL_TYPE_DEFAULT); if (error) goto fail_gunlock4; posix_acl_release(default_acl); default_acl = NULL; } if (acl) { error = __gfs2_set_acl(inode, acl, ACL_TYPE_ACCESS); if (error) goto fail_gunlock4; posix_acl_release(acl); acl = NULL; } error = security_inode_init_security(&ip->i_inode, &dip->i_inode, name, &gfs2_initxattrs, NULL); if (error) goto fail_gunlock4; error = link_dinode(dip, name, ip, &da); if (error) goto fail_gunlock4; mark_inode_dirty(inode); d_instantiate(dentry, inode); /* After instantiate, errors should result in evict which will destroy * both inode and iopen glocks properly. 
*/ if (file) { file->f_mode |= FMODE_CREATED; error = finish_open(file, dentry, gfs2_open_common); } gfs2_glock_dq_uninit(&d_gh); gfs2_qa_put(ip); gfs2_glock_dq_uninit(&gh); gfs2_glock_put(io_gl); gfs2_qa_put(dip); unlock_new_inode(inode); return error; fail_gunlock4: glock_clear_object(ip->i_gl, ip); glock_clear_object(io_gl, ip); fail_gunlock3: gfs2_glock_dq_uninit(&ip->i_iopen_gh); fail_gunlock2: gfs2_glock_put(io_gl); fail_free_inode: if (ip->i_gl) { gfs2_glock_put(ip->i_gl); ip->i_gl = NULL; } gfs2_rs_deltree(&ip->i_res); gfs2_qa_put(ip); fail_free_acls: posix_acl_release(default_acl); posix_acl_release(acl); fail_gunlock: gfs2_dir_no_add(&da); gfs2_glock_dq_uninit(&d_gh); if (!IS_ERR_OR_NULL(inode)) { set_bit(GIF_ALLOC_FAILED, &ip->i_flags); clear_nlink(inode); if (ip->i_no_addr) mark_inode_dirty(inode); if (inode->i_state & I_NEW) iget_failed(inode); else iput(inode); } if (gfs2_holder_initialized(&gh)) gfs2_glock_dq_uninit(&gh); fail: gfs2_qa_put(dip); return error; } /** * gfs2_create - Create a file * @idmap: idmap of the mount the inode was found from * @dir: The directory in which to create the file * @dentry: The dentry of the new file * @mode: The mode of the new file * @excl: Force fail if inode exists * * Returns: errno */ static int gfs2_create(struct mnt_idmap *idmap, struct inode *dir, struct dentry *dentry, umode_t mode, bool excl) { return gfs2_create_inode(dir, dentry, NULL, S_IFREG | mode, 0, NULL, 0, excl); } /** * __gfs2_lookup - Look up a filename in a directory and return its inode * @dir: The directory inode * @dentry: The dentry of the new inode * @file: File to be opened * * * Returns: errno */ static struct dentry *__gfs2_lookup(struct inode *dir, struct dentry *dentry, struct file *file) { struct inode *inode; struct dentry *d; struct gfs2_holder gh; struct gfs2_glock *gl; int error; inode = gfs2_lookupi(dir, &dentry->d_name, 0); if (inode == NULL) { d_add(dentry, NULL); return NULL; } if (IS_ERR(inode)) return ERR_CAST(inode); gl = GFS2_I(inode)->i_gl; error = gfs2_glock_nq_init(gl, LM_ST_SHARED, LM_FLAG_ANY, &gh); if (error) { iput(inode); return ERR_PTR(error); } d = d_splice_alias(inode, dentry); if (IS_ERR(d)) { gfs2_glock_dq_uninit(&gh); return d; } if (file && S_ISREG(inode->i_mode)) error = finish_open(file, dentry, gfs2_open_common); gfs2_glock_dq_uninit(&gh); if (error) { dput(d); return ERR_PTR(error); } return d; } static struct dentry *gfs2_lookup(struct inode *dir, struct dentry *dentry, unsigned flags) { return __gfs2_lookup(dir, dentry, NULL); } /** * gfs2_link - Link to a file * @old_dentry: The inode to link * @dir: Add link to this directory * @dentry: The name of the link * * Link the inode in "old_dentry" into the directory "dir" with the * name in "dentry". 
* * Returns: errno */ static int gfs2_link(struct dentry *old_dentry, struct inode *dir, struct dentry *dentry) { struct gfs2_inode *dip = GFS2_I(dir); struct gfs2_sbd *sdp = GFS2_SB(dir); struct inode *inode = d_inode(old_dentry); struct gfs2_inode *ip = GFS2_I(inode); struct gfs2_holder d_gh, gh; struct buffer_head *dibh; struct gfs2_diradd da = { .bh = NULL, .save_loc = 1, }; int error; if (S_ISDIR(inode->i_mode)) return -EPERM; error = gfs2_qa_get(dip); if (error) return error; gfs2_holder_init(dip->i_gl, LM_ST_EXCLUSIVE, 0, &d_gh); gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &gh); error = gfs2_glock_nq(&d_gh); if (error) goto out_parent; error = gfs2_glock_nq(&gh); if (error) goto out_child; error = -ENOENT; if (inode->i_nlink == 0) goto out_gunlock; error = gfs2_permission(&nop_mnt_idmap, dir, MAY_WRITE | MAY_EXEC); if (error) goto out_gunlock; error = gfs2_dir_check(dir, &dentry->d_name, NULL); switch (error) { case -ENOENT: break; case 0: error = -EEXIST; goto out_gunlock; default: goto out_gunlock; } error = -EINVAL; if (!dip->i_inode.i_nlink) goto out_gunlock; error = -EFBIG; if (dip->i_entries == (u32)-1) goto out_gunlock; error = -EPERM; if (IS_IMMUTABLE(inode) || IS_APPEND(inode)) goto out_gunlock; error = -EMLINK; if (ip->i_inode.i_nlink == (u32)-1) goto out_gunlock; error = gfs2_diradd_alloc_required(dir, &dentry->d_name, &da); if (error < 0) goto out_gunlock; if (da.nr_blocks) { struct gfs2_alloc_parms ap = { .target = da.nr_blocks, }; error = gfs2_quota_lock_check(dip, &ap); if (error) goto out_gunlock; error = gfs2_inplace_reserve(dip, &ap); if (error) goto out_gunlock_q; error = gfs2_trans_begin(sdp, gfs2_trans_da_blks(dip, &da, 2), 0); if (error) goto out_ipres; } else { error = gfs2_trans_begin(sdp, 2 * RES_DINODE + RES_LEAF, 0); if (error) goto out_ipres; } error = gfs2_meta_inode_buffer(ip, &dibh); if (error) goto out_end_trans; error = gfs2_dir_add(dir, &dentry->d_name, ip, &da); if (error) goto out_brelse; gfs2_trans_add_meta(ip->i_gl, dibh); inc_nlink(&ip->i_inode); inode_set_ctime_current(&ip->i_inode); ihold(inode); d_instantiate(dentry, inode); mark_inode_dirty(inode); out_brelse: brelse(dibh); out_end_trans: gfs2_trans_end(sdp); out_ipres: if (da.nr_blocks) gfs2_inplace_release(dip); out_gunlock_q: if (da.nr_blocks) gfs2_quota_unlock(dip); out_gunlock: gfs2_dir_no_add(&da); gfs2_glock_dq(&gh); out_child: gfs2_glock_dq(&d_gh); out_parent: gfs2_qa_put(dip); gfs2_holder_uninit(&d_gh); gfs2_holder_uninit(&gh); return error; } /* * gfs2_unlink_ok - check to see that a inode is still in a directory * @dip: the directory * @name: the name of the file * @ip: the inode * * Assumes that the lock on (at least) @dip is held. * * Returns: 0 if the parent/child relationship is correct, errno if it isn't */ static int gfs2_unlink_ok(struct gfs2_inode *dip, const struct qstr *name, const struct gfs2_inode *ip) { int error; if (IS_IMMUTABLE(&ip->i_inode) || IS_APPEND(&ip->i_inode)) return -EPERM; if ((dip->i_inode.i_mode & S_ISVTX) && !uid_eq(dip->i_inode.i_uid, current_fsuid()) && !uid_eq(ip->i_inode.i_uid, current_fsuid()) && !capable(CAP_FOWNER)) return -EPERM; if (IS_APPEND(&dip->i_inode)) return -EPERM; error = gfs2_permission(&nop_mnt_idmap, &dip->i_inode, MAY_WRITE | MAY_EXEC); if (error) return error; return gfs2_dir_check(&dip->i_inode, name, ip); } /** * gfs2_unlink_inode - Removes an inode from its parent dir and unlinks it * @dip: The parent directory * @dentry: The dentry to unlink * * Called with all the locks and in a transaction. 
This will only be * called for a directory after it has been checked to ensure it is empty. * * Returns: 0 on success, or an error */ static int gfs2_unlink_inode(struct gfs2_inode *dip, const struct dentry *dentry) { struct inode *inode = d_inode(dentry); struct gfs2_inode *ip = GFS2_I(inode); int error; error = gfs2_dir_del(dip, dentry); if (error) return error; ip->i_entries = 0; inode_set_ctime_current(inode); if (S_ISDIR(inode->i_mode)) clear_nlink(inode); else drop_nlink(inode); mark_inode_dirty(inode); if (inode->i_nlink == 0) gfs2_unlink_di(inode); return 0; } /** * gfs2_unlink - Unlink an inode (this does rmdir as well) * @dir: The inode of the directory containing the inode to unlink * @dentry: The file itself * * This routine uses the type of the inode as a flag to figure out * whether this is an unlink or an rmdir. * * Returns: errno */ static int gfs2_unlink(struct inode *dir, struct dentry *dentry) { struct gfs2_inode *dip = GFS2_I(dir); struct gfs2_sbd *sdp = GFS2_SB(dir); struct inode *inode = d_inode(dentry); struct gfs2_inode *ip = GFS2_I(inode); struct gfs2_holder d_gh, r_gh, gh; struct gfs2_rgrpd *rgd; int error; error = gfs2_rindex_update(sdp); if (error) return error; error = -EROFS; gfs2_holder_init(dip->i_gl, LM_ST_EXCLUSIVE, 0, &d_gh); gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &gh); rgd = gfs2_blk2rgrpd(sdp, ip->i_no_addr, 1); if (!rgd) goto out_inodes; gfs2_holder_init(rgd->rd_gl, LM_ST_EXCLUSIVE, LM_FLAG_NODE_SCOPE, &r_gh); error = gfs2_glock_nq(&d_gh); if (error) goto out_parent; error = gfs2_glock_nq(&gh); if (error) goto out_child; error = -ENOENT; if (inode->i_nlink == 0) goto out_rgrp; if (S_ISDIR(inode->i_mode)) { error = -ENOTEMPTY; if (ip->i_entries > 2 || inode->i_nlink > 2) goto out_rgrp; } error = gfs2_glock_nq(&r_gh); /* rgrp */ if (error) goto out_rgrp; error = gfs2_unlink_ok(dip, &dentry->d_name, ip); if (error) goto out_gunlock; error = gfs2_trans_begin(sdp, 2*RES_DINODE + 3*RES_LEAF + RES_RG_BIT, 0); if (error) goto out_gunlock; error = gfs2_unlink_inode(dip, dentry); gfs2_trans_end(sdp); out_gunlock: gfs2_glock_dq(&r_gh); out_rgrp: gfs2_glock_dq(&gh); out_child: gfs2_glock_dq(&d_gh); out_parent: gfs2_holder_uninit(&r_gh); out_inodes: gfs2_holder_uninit(&gh); gfs2_holder_uninit(&d_gh); return error; } /** * gfs2_symlink - Create a symlink * @idmap: idmap of the mount the inode was found from * @dir: The directory to create the symlink in * @dentry: The dentry to put the symlink in * @symname: The thing which the link points to * * Returns: errno */ static int gfs2_symlink(struct mnt_idmap *idmap, struct inode *dir, struct dentry *dentry, const char *symname) { unsigned int size; size = strlen(symname); if (size >= gfs2_max_stuffed_size(GFS2_I(dir))) return -ENAMETOOLONG; return gfs2_create_inode(dir, dentry, NULL, S_IFLNK | S_IRWXUGO, 0, symname, size, 0); } /** * gfs2_mkdir - Make a directory * @idmap: idmap of the mount the inode was found from * @dir: The parent directory of the new one * @dentry: The dentry of the new directory * @mode: The mode of the new directory * * Returns: errno */ static int gfs2_mkdir(struct mnt_idmap *idmap, struct inode *dir, struct dentry *dentry, umode_t mode) { unsigned dsize = gfs2_max_stuffed_size(GFS2_I(dir)); return gfs2_create_inode(dir, dentry, NULL, S_IFDIR | mode, 0, NULL, dsize, 0); } /** * gfs2_mknod - Make a special file * @idmap: idmap of the mount the inode was found from * @dir: The directory in which the special file will reside * @dentry: The dentry of the special file * @mode: The mode of the 
special file * @dev: The device specification of the special file * */ static int gfs2_mknod(struct mnt_idmap *idmap, struct inode *dir, struct dentry *dentry, umode_t mode, dev_t dev) { return gfs2_create_inode(dir, dentry, NULL, mode, dev, NULL, 0, 0); } /** * gfs2_atomic_open - Atomically open a file * @dir: The directory * @dentry: The proposed new entry * @file: The proposed new struct file * @flags: open flags * @mode: File mode * * Returns: error code or 0 for success */ static int gfs2_atomic_open(struct inode *dir, struct dentry *dentry, struct file *file, unsigned flags, umode_t mode) { struct dentry *d; bool excl = !!(flags & O_EXCL); if (!d_in_lookup(dentry)) goto skip_lookup; d = __gfs2_lookup(dir, dentry, file); if (IS_ERR(d)) return PTR_ERR(d); if (d != NULL) dentry = d; if (d_really_is_positive(dentry)) { if (!(file->f_mode & FMODE_OPENED)) return finish_no_open(file, d); dput(d); return excl && (flags & O_CREAT) ? -EEXIST : 0; } BUG_ON(d != NULL); skip_lookup: if (!(flags & O_CREAT)) return -ENOENT; return gfs2_create_inode(dir, dentry, file, S_IFREG | mode, 0, NULL, 0, excl); } /* * gfs2_ok_to_move - check if it's ok to move a directory to another directory * @this: move this * @to: to here * * Follow @to back to the root and make sure we don't encounter @this * Assumes we already hold the rename lock. * * Returns: errno */ static int gfs2_ok_to_move(struct gfs2_inode *this, struct gfs2_inode *to) { struct inode *dir = &to->i_inode; struct super_block *sb = dir->i_sb; struct inode *tmp; int error = 0; igrab(dir); for (;;) { if (dir == &this->i_inode) { error = -EINVAL; break; } if (dir == d_inode(sb->s_root)) { error = 0; break; } tmp = gfs2_lookupi(dir, &gfs2_qdotdot, 1); if (!tmp) { error = -ENOENT; break; } if (IS_ERR(tmp)) { error = PTR_ERR(tmp); break; } iput(dir); dir = tmp; } iput(dir); return error; } /** * update_moved_ino - Update an inode that's being moved * @ip: The inode being moved * @ndip: The parent directory of the new filename * @dir_rename: True of ip is a directory * * Returns: errno */ static int update_moved_ino(struct gfs2_inode *ip, struct gfs2_inode *ndip, int dir_rename) { if (dir_rename) return gfs2_dir_mvino(ip, &gfs2_qdotdot, ndip, DT_DIR); inode_set_ctime_current(&ip->i_inode); mark_inode_dirty_sync(&ip->i_inode); return 0; } /** * gfs2_rename - Rename a file * @odir: Parent directory of old file name * @odentry: The old dentry of the file * @ndir: Parent directory of new file name * @ndentry: The new dentry of the file * * Returns: errno */ static int gfs2_rename(struct inode *odir, struct dentry *odentry, struct inode *ndir, struct dentry *ndentry) { struct gfs2_inode *odip = GFS2_I(odir); struct gfs2_inode *ndip = GFS2_I(ndir); struct gfs2_inode *ip = GFS2_I(d_inode(odentry)); struct gfs2_inode *nip = NULL; struct gfs2_sbd *sdp = GFS2_SB(odir); struct gfs2_holder ghs[4], r_gh, rd_gh; struct gfs2_rgrpd *nrgd; unsigned int num_gh; int dir_rename = 0; struct gfs2_diradd da = { .nr_blocks = 0, .save_loc = 0, }; unsigned int x; int error; gfs2_holder_mark_uninitialized(&r_gh); gfs2_holder_mark_uninitialized(&rd_gh); if (d_really_is_positive(ndentry)) { nip = GFS2_I(d_inode(ndentry)); if (ip == nip) return 0; } error = gfs2_rindex_update(sdp); if (error) return error; error = gfs2_qa_get(ndip); if (error) return error; if (odip != ndip) { error = gfs2_glock_nq_init(sdp->sd_rename_gl, LM_ST_EXCLUSIVE, 0, &r_gh); if (error) goto out; if (S_ISDIR(ip->i_inode.i_mode)) { dir_rename = 1; /* don't move a directory into its subdir */ error = 
gfs2_ok_to_move(ip, ndip); if (error) goto out_gunlock_r; } } num_gh = 1; gfs2_holder_init(odip->i_gl, LM_ST_EXCLUSIVE, GL_ASYNC, ghs); if (odip != ndip) { gfs2_holder_init(ndip->i_gl, LM_ST_EXCLUSIVE,GL_ASYNC, ghs + num_gh); num_gh++; } gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, GL_ASYNC, ghs + num_gh); num_gh++; if (nip) { gfs2_holder_init(nip->i_gl, LM_ST_EXCLUSIVE, GL_ASYNC, ghs + num_gh); num_gh++; } for (x = 0; x < num_gh; x++) { error = gfs2_glock_nq(ghs + x); if (error) goto out_gunlock; } error = gfs2_glock_async_wait(num_gh, ghs); if (error) goto out_gunlock; if (nip) { /* Grab the resource group glock for unlink flag twiddling. * This is the case where the target dinode already exists * so we unlink before doing the rename. */ nrgd = gfs2_blk2rgrpd(sdp, nip->i_no_addr, 1); if (!nrgd) { error = -ENOENT; goto out_gunlock; } error = gfs2_glock_nq_init(nrgd->rd_gl, LM_ST_EXCLUSIVE, LM_FLAG_NODE_SCOPE, &rd_gh); if (error) goto out_gunlock; } error = -ENOENT; if (ip->i_inode.i_nlink == 0) goto out_gunlock; /* Check out the old directory */ error = gfs2_unlink_ok(odip, &odentry->d_name, ip); if (error) goto out_gunlock; /* Check out the new directory */ if (nip) { error = gfs2_unlink_ok(ndip, &ndentry->d_name, nip); if (error) goto out_gunlock; if (nip->i_inode.i_nlink == 0) { error = -EAGAIN; goto out_gunlock; } if (S_ISDIR(nip->i_inode.i_mode)) { if (nip->i_entries < 2) { gfs2_consist_inode(nip); error = -EIO; goto out_gunlock; } if (nip->i_entries > 2) { error = -ENOTEMPTY; goto out_gunlock; } } } else { error = gfs2_permission(&nop_mnt_idmap, ndir, MAY_WRITE | MAY_EXEC); if (error) goto out_gunlock; error = gfs2_dir_check(ndir, &ndentry->d_name, NULL); switch (error) { case -ENOENT: error = 0; break; case 0: error = -EEXIST; goto out_gunlock; default: goto out_gunlock; } if (odip != ndip) { if (!ndip->i_inode.i_nlink) { error = -ENOENT; goto out_gunlock; } if (ndip->i_entries == (u32)-1) { error = -EFBIG; goto out_gunlock; } if (S_ISDIR(ip->i_inode.i_mode) && ndip->i_inode.i_nlink == (u32)-1) { error = -EMLINK; goto out_gunlock; } } } /* Check out the dir to be renamed */ if (dir_rename) { error = gfs2_permission(&nop_mnt_idmap, d_inode(odentry), MAY_WRITE); if (error) goto out_gunlock; } if (nip == NULL) { error = gfs2_diradd_alloc_required(ndir, &ndentry->d_name, &da); if (error) goto out_gunlock; } if (da.nr_blocks) { struct gfs2_alloc_parms ap = { .target = da.nr_blocks, }; error = gfs2_quota_lock_check(ndip, &ap); if (error) goto out_gunlock; error = gfs2_inplace_reserve(ndip, &ap); if (error) goto out_gunlock_q; error = gfs2_trans_begin(sdp, gfs2_trans_da_blks(ndip, &da, 4) + 4 * RES_LEAF + 4, 0); if (error) goto out_ipreserv; } else { error = gfs2_trans_begin(sdp, 4 * RES_DINODE + 5 * RES_LEAF + 4, 0); if (error) goto out_gunlock; } /* Remove the target file, if it exists */ if (nip) error = gfs2_unlink_inode(ndip, ndentry); error = update_moved_ino(ip, ndip, dir_rename); if (error) goto out_end_trans; error = gfs2_dir_del(odip, odentry); if (error) goto out_end_trans; error = gfs2_dir_add(ndir, &ndentry->d_name, ip, &da); if (error) goto out_end_trans; out_end_trans: gfs2_trans_end(sdp); out_ipreserv: if (da.nr_blocks) gfs2_inplace_release(ndip); out_gunlock_q: if (da.nr_blocks) gfs2_quota_unlock(ndip); out_gunlock: gfs2_dir_no_add(&da); if (gfs2_holder_initialized(&rd_gh)) gfs2_glock_dq_uninit(&rd_gh); while (x--) { if (gfs2_holder_queued(ghs + x)) gfs2_glock_dq(ghs + x); gfs2_holder_uninit(ghs + x); } out_gunlock_r: if (gfs2_holder_initialized(&r_gh)) 
gfs2_glock_dq_uninit(&r_gh); out: gfs2_qa_put(ndip); return error; } /** * gfs2_exchange - exchange two files * @odir: Parent directory of old file name * @odentry: The old dentry of the file * @ndir: Parent directory of new file name * @ndentry: The new dentry of the file * @flags: The rename flags * * Returns: errno */ static int gfs2_exchange(struct inode *odir, struct dentry *odentry, struct inode *ndir, struct dentry *ndentry, unsigned int flags) { struct gfs2_inode *odip = GFS2_I(odir); struct gfs2_inode *ndip = GFS2_I(ndir); struct gfs2_inode *oip = GFS2_I(odentry->d_inode); struct gfs2_inode *nip = GFS2_I(ndentry->d_inode); struct gfs2_sbd *sdp = GFS2_SB(odir); struct gfs2_holder ghs[4], r_gh; unsigned int num_gh; unsigned int x; umode_t old_mode = oip->i_inode.i_mode; umode_t new_mode = nip->i_inode.i_mode; int error; gfs2_holder_mark_uninitialized(&r_gh); error = gfs2_rindex_update(sdp); if (error) return error; if (odip != ndip) { error = gfs2_glock_nq_init(sdp->sd_rename_gl, LM_ST_EXCLUSIVE, 0, &r_gh); if (error) goto out; if (S_ISDIR(old_mode)) { /* don't move a directory into its subdir */ error = gfs2_ok_to_move(oip, ndip); if (error) goto out_gunlock_r; } if (S_ISDIR(new_mode)) { /* don't move a directory into its subdir */ error = gfs2_ok_to_move(nip, odip); if (error) goto out_gunlock_r; } } num_gh = 1; gfs2_holder_init(odip->i_gl, LM_ST_EXCLUSIVE, GL_ASYNC, ghs); if (odip != ndip) { gfs2_holder_init(ndip->i_gl, LM_ST_EXCLUSIVE, GL_ASYNC, ghs + num_gh); num_gh++; } gfs2_holder_init(oip->i_gl, LM_ST_EXCLUSIVE, GL_ASYNC, ghs + num_gh); num_gh++; gfs2_holder_init(nip->i_gl, LM_ST_EXCLUSIVE, GL_ASYNC, ghs + num_gh); num_gh++; for (x = 0; x < num_gh; x++) { error = gfs2_glock_nq(ghs + x); if (error) goto out_gunlock; } error = gfs2_glock_async_wait(num_gh, ghs); if (error) goto out_gunlock; error = -ENOENT; if (oip->i_inode.i_nlink == 0 || nip->i_inode.i_nlink == 0) goto out_gunlock; error = gfs2_unlink_ok(odip, &odentry->d_name, oip); if (error) goto out_gunlock; error = gfs2_unlink_ok(ndip, &ndentry->d_name, nip); if (error) goto out_gunlock; if (S_ISDIR(old_mode)) { error = gfs2_permission(&nop_mnt_idmap, odentry->d_inode, MAY_WRITE); if (error) goto out_gunlock; } if (S_ISDIR(new_mode)) { error = gfs2_permission(&nop_mnt_idmap, ndentry->d_inode, MAY_WRITE); if (error) goto out_gunlock; } error = gfs2_trans_begin(sdp, 4 * RES_DINODE + 4 * RES_LEAF, 0); if (error) goto out_gunlock; error = update_moved_ino(oip, ndip, S_ISDIR(old_mode)); if (error) goto out_end_trans; error = update_moved_ino(nip, odip, S_ISDIR(new_mode)); if (error) goto out_end_trans; error = gfs2_dir_mvino(ndip, &ndentry->d_name, oip, IF2DT(old_mode)); if (error) goto out_end_trans; error = gfs2_dir_mvino(odip, &odentry->d_name, nip, IF2DT(new_mode)); if (error) goto out_end_trans; if (odip != ndip) { if (S_ISDIR(new_mode) && !S_ISDIR(old_mode)) { inc_nlink(&odip->i_inode); drop_nlink(&ndip->i_inode); } else if (S_ISDIR(old_mode) && !S_ISDIR(new_mode)) { inc_nlink(&ndip->i_inode); drop_nlink(&odip->i_inode); } } mark_inode_dirty(&ndip->i_inode); if (odip != ndip) mark_inode_dirty(&odip->i_inode); out_end_trans: gfs2_trans_end(sdp); out_gunlock: while (x--) { if (gfs2_holder_queued(ghs + x)) gfs2_glock_dq(ghs + x); gfs2_holder_uninit(ghs + x); } out_gunlock_r: if (gfs2_holder_initialized(&r_gh)) gfs2_glock_dq_uninit(&r_gh); out: return error; } static int gfs2_rename2(struct mnt_idmap *idmap, struct inode *odir, struct dentry *odentry, struct inode *ndir, struct dentry *ndentry, unsigned int flags) { flags 
&= ~RENAME_NOREPLACE; if (flags & ~RENAME_EXCHANGE) return -EINVAL; if (flags & RENAME_EXCHANGE) return gfs2_exchange(odir, odentry, ndir, ndentry, flags); return gfs2_rename(odir, odentry, ndir, ndentry); } /** * gfs2_get_link - Follow a symbolic link * @dentry: The dentry of the link * @inode: The inode of the link * @done: destructor for return value * * This can handle symlinks of any size. * * Returns: 0 on success or error code */ static const char *gfs2_get_link(struct dentry *dentry, struct inode *inode, struct delayed_call *done) { struct gfs2_inode *ip = GFS2_I(inode); struct gfs2_holder i_gh; struct buffer_head *dibh; unsigned int size; char *buf; int error; if (!dentry) return ERR_PTR(-ECHILD); gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, &i_gh); error = gfs2_glock_nq(&i_gh); if (error) { gfs2_holder_uninit(&i_gh); return ERR_PTR(error); } size = (unsigned int)i_size_read(&ip->i_inode); if (size == 0) { gfs2_consist_inode(ip); buf = ERR_PTR(-EIO); goto out; } error = gfs2_meta_inode_buffer(ip, &dibh); if (error) { buf = ERR_PTR(error); goto out; } buf = kzalloc(size + 1, GFP_NOFS); if (!buf) buf = ERR_PTR(-ENOMEM); else memcpy(buf, dibh->b_data + sizeof(struct gfs2_dinode), size); brelse(dibh); out: gfs2_glock_dq_uninit(&i_gh); if (!IS_ERR(buf)) set_delayed_call(done, kfree_link, buf); return buf; } /** * gfs2_permission * @idmap: idmap of the mount the inode was found from * @inode: The inode * @mask: The mask to be tested * * This may be called from the VFS directly, or from within GFS2 with the * inode locked, so we look to see if the glock is already locked and only * lock the glock if its not already been done. * * Returns: errno */ int gfs2_permission(struct mnt_idmap *idmap, struct inode *inode, int mask) { int may_not_block = mask & MAY_NOT_BLOCK; struct gfs2_inode *ip; struct gfs2_holder i_gh; struct gfs2_glock *gl; int error; gfs2_holder_mark_uninitialized(&i_gh); ip = GFS2_I(inode); gl = rcu_dereference_check(ip->i_gl, !may_not_block); if (unlikely(!gl)) { /* inode is getting torn down, must be RCU mode */ WARN_ON_ONCE(!may_not_block); return -ECHILD; } if (gfs2_glock_is_locked_by_me(gl) == NULL) { if (may_not_block) return -ECHILD; error = gfs2_glock_nq_init(gl, LM_ST_SHARED, LM_FLAG_ANY, &i_gh); if (error) return error; } if ((mask & MAY_WRITE) && IS_IMMUTABLE(inode)) error = -EPERM; else error = generic_permission(&nop_mnt_idmap, inode, mask); if (gfs2_holder_initialized(&i_gh)) gfs2_glock_dq_uninit(&i_gh); return error; } static int __gfs2_setattr_simple(struct inode *inode, struct iattr *attr) { setattr_copy(&nop_mnt_idmap, inode, attr); mark_inode_dirty(inode); return 0; } static int gfs2_setattr_simple(struct inode *inode, struct iattr *attr) { int error; if (current->journal_info) return __gfs2_setattr_simple(inode, attr); error = gfs2_trans_begin(GFS2_SB(inode), RES_DINODE, 0); if (error) return error; error = __gfs2_setattr_simple(inode, attr); gfs2_trans_end(GFS2_SB(inode)); return error; } static int setattr_chown(struct inode *inode, struct iattr *attr) { struct gfs2_inode *ip = GFS2_I(inode); struct gfs2_sbd *sdp = GFS2_SB(inode); kuid_t ouid, nuid; kgid_t ogid, ngid; int error; struct gfs2_alloc_parms ap = {}; ouid = inode->i_uid; ogid = inode->i_gid; nuid = attr->ia_uid; ngid = attr->ia_gid; if (!(attr->ia_valid & ATTR_UID) || uid_eq(ouid, nuid)) ouid = nuid = NO_UID_QUOTA_CHANGE; if (!(attr->ia_valid & ATTR_GID) || gid_eq(ogid, ngid)) ogid = ngid = NO_GID_QUOTA_CHANGE; error = gfs2_qa_get(ip); if (error) return error; error = gfs2_rindex_update(sdp); 
if (error) goto out; error = gfs2_quota_lock(ip, nuid, ngid); if (error) goto out; ap.target = gfs2_get_inode_blocks(&ip->i_inode); if (!uid_eq(ouid, NO_UID_QUOTA_CHANGE) || !gid_eq(ogid, NO_GID_QUOTA_CHANGE)) { error = gfs2_quota_check(ip, nuid, ngid, &ap); if (error) goto out_gunlock_q; } error = gfs2_trans_begin(sdp, RES_DINODE + 2 * RES_QUOTA, 0); if (error) goto out_gunlock_q; error = gfs2_setattr_simple(inode, attr); if (error) goto out_end_trans; if (!uid_eq(ouid, NO_UID_QUOTA_CHANGE) || !gid_eq(ogid, NO_GID_QUOTA_CHANGE)) { gfs2_quota_change(ip, -(s64)ap.target, ouid, ogid); gfs2_quota_change(ip, ap.target, nuid, ngid); } out_end_trans: gfs2_trans_end(sdp); out_gunlock_q: gfs2_quota_unlock(ip); out: gfs2_qa_put(ip); return error; } /** * gfs2_setattr - Change attributes on an inode * @idmap: idmap of the mount the inode was found from * @dentry: The dentry which is changing * @attr: The structure describing the change * * The VFS layer wants to change one or more of an inodes attributes. Write * that change out to disk. * * Returns: errno */ static int gfs2_setattr(struct mnt_idmap *idmap, struct dentry *dentry, struct iattr *attr) { struct inode *inode = d_inode(dentry); struct gfs2_inode *ip = GFS2_I(inode); struct gfs2_holder i_gh; int error; error = gfs2_qa_get(ip); if (error) return error; error = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &i_gh); if (error) goto out; error = may_setattr(&nop_mnt_idmap, inode, attr->ia_valid); if (error) goto error; error = setattr_prepare(&nop_mnt_idmap, dentry, attr); if (error) goto error; if (attr->ia_valid & ATTR_SIZE) error = gfs2_setattr_size(inode, attr->ia_size); else if (attr->ia_valid & (ATTR_UID | ATTR_GID)) error = setattr_chown(inode, attr); else { error = gfs2_setattr_simple(inode, attr); if (!error && attr->ia_valid & ATTR_MODE) error = posix_acl_chmod(&nop_mnt_idmap, dentry, inode->i_mode); } error: if (!error) mark_inode_dirty(inode); gfs2_glock_dq_uninit(&i_gh); out: gfs2_qa_put(ip); return error; } /** * gfs2_getattr - Read out an inode's attributes * @idmap: idmap of the mount the inode was found from * @path: Object to query * @stat: The inode's stats * @request_mask: Mask of STATX_xxx flags indicating the caller's interests * @flags: AT_STATX_xxx setting * * This may be called from the VFS directly, or from within GFS2 with the * inode locked, so we look to see if the glock is already locked and only * lock the glock if its not already been done. Note that its the NFS * readdirplus operation which causes this to be called (from filldir) * with the glock already held. 
* * Returns: errno */ static int gfs2_getattr(struct mnt_idmap *idmap, const struct path *path, struct kstat *stat, u32 request_mask, unsigned int flags) { struct inode *inode = d_inode(path->dentry); struct gfs2_inode *ip = GFS2_I(inode); struct gfs2_holder gh; u32 gfsflags; int error; gfs2_holder_mark_uninitialized(&gh); if (gfs2_glock_is_locked_by_me(ip->i_gl) == NULL) { error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED, LM_FLAG_ANY, &gh); if (error) return error; } gfsflags = ip->i_diskflags; if (gfsflags & GFS2_DIF_APPENDONLY) stat->attributes |= STATX_ATTR_APPEND; if (gfsflags & GFS2_DIF_IMMUTABLE) stat->attributes |= STATX_ATTR_IMMUTABLE; stat->attributes_mask |= (STATX_ATTR_APPEND | STATX_ATTR_COMPRESSED | STATX_ATTR_ENCRYPTED | STATX_ATTR_IMMUTABLE | STATX_ATTR_NODUMP); generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat); if (gfs2_holder_initialized(&gh)) gfs2_glock_dq_uninit(&gh); return 0; } static int gfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, u64 start, u64 len) { struct gfs2_inode *ip = GFS2_I(inode); struct gfs2_holder gh; int ret; inode_lock_shared(inode); ret = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED, 0, &gh); if (ret) goto out; ret = iomap_fiemap(inode, fieinfo, start, len, &gfs2_iomap_ops); gfs2_glock_dq_uninit(&gh); out: inode_unlock_shared(inode); return ret; } loff_t gfs2_seek_data(struct file *file, loff_t offset) { struct inode *inode = file->f_mapping->host; struct gfs2_inode *ip = GFS2_I(inode); struct gfs2_holder gh; loff_t ret; inode_lock_shared(inode); ret = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED, 0, &gh); if (!ret) ret = iomap_seek_data(inode, offset, &gfs2_iomap_ops); gfs2_glock_dq_uninit(&gh); inode_unlock_shared(inode); if (ret < 0) return ret; return vfs_setpos(file, ret, inode->i_sb->s_maxbytes); } loff_t gfs2_seek_hole(struct file *file, loff_t offset) { struct inode *inode = file->f_mapping->host; struct gfs2_inode *ip = GFS2_I(inode); struct gfs2_holder gh; loff_t ret; inode_lock_shared(inode); ret = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED, 0, &gh); if (!ret) ret = iomap_seek_hole(inode, offset, &gfs2_iomap_ops); gfs2_glock_dq_uninit(&gh); inode_unlock_shared(inode); if (ret < 0) return ret; return vfs_setpos(file, ret, inode->i_sb->s_maxbytes); } static int gfs2_update_time(struct inode *inode, int flags) { struct gfs2_inode *ip = GFS2_I(inode); struct gfs2_glock *gl = ip->i_gl; struct gfs2_holder *gh; int error; gh = gfs2_glock_is_locked_by_me(gl); if (gh && gl->gl_state != LM_ST_EXCLUSIVE) { gfs2_glock_dq(gh); gfs2_holder_reinit(LM_ST_EXCLUSIVE, 0, gh); error = gfs2_glock_nq(gh); if (error) return error; } generic_update_time(inode, flags); return 0; } static const struct inode_operations gfs2_file_iops = { .permission = gfs2_permission, .setattr = gfs2_setattr, .getattr = gfs2_getattr, .listxattr = gfs2_listxattr, .fiemap = gfs2_fiemap, .get_inode_acl = gfs2_get_acl, .set_acl = gfs2_set_acl, .update_time = gfs2_update_time, .fileattr_get = gfs2_fileattr_get, .fileattr_set = gfs2_fileattr_set, }; static const struct inode_operations gfs2_dir_iops = { .create = gfs2_create, .lookup = gfs2_lookup, .link = gfs2_link, .unlink = gfs2_unlink, .symlink = gfs2_symlink, .mkdir = gfs2_mkdir, .rmdir = gfs2_unlink, .mknod = gfs2_mknod, .rename = gfs2_rename2, .permission = gfs2_permission, .setattr = gfs2_setattr, .getattr = gfs2_getattr, .listxattr = gfs2_listxattr, .fiemap = gfs2_fiemap, .get_inode_acl = gfs2_get_acl, .set_acl = gfs2_set_acl, .update_time = gfs2_update_time, .atomic_open = gfs2_atomic_open, 
	.fileattr_get = gfs2_fileattr_get,
	.fileattr_set = gfs2_fileattr_set,
};

static const struct inode_operations gfs2_symlink_iops = {
	.get_link = gfs2_get_link,
	.permission = gfs2_permission,
	.setattr = gfs2_setattr,
	.getattr = gfs2_getattr,
	.listxattr = gfs2_listxattr,
	.fiemap = gfs2_fiemap,
};
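A pattern worth calling out in the file above: gfs2_lookupi(), gfs2_permission() and gfs2_getattr() may be entered either from the VFS or from within GFS2 with the inode glock already held, so each of them checks gfs2_glock_is_locked_by_me() and only takes (and later drops) a shared holder when the glock is not already held by the current task. The helper below is a hypothetical condensation of that idiom, not a GFS2 function; gfs2_do_shared_locked() and its callback are illustrative names, but every glock call it makes appears in the functions above.

/*
 * Illustrative sketch only (not part of GFS2): take the inode glock in
 * shared mode unless the caller already holds it, run the work, then
 * drop the holder only if it was acquired here.
 */
static int gfs2_do_shared_locked(struct gfs2_inode *ip,
				 int (*fn)(struct gfs2_inode *ip, void *arg),
				 void *arg)
{
	struct gfs2_holder gh;
	int error = 0;

	gfs2_holder_mark_uninitialized(&gh);
	if (gfs2_glock_is_locked_by_me(ip->i_gl) == NULL) {
		error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED,
					   LM_FLAG_ANY, &gh);
		if (error)
			return error;
	}

	error = fn(ip, arg);	/* the real work runs under the shared glock */

	if (gfs2_holder_initialized(&gh))
		gfs2_glock_dq_uninit(&gh);
	return error;
}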
// SPDX-License-Identifier: GPL-2.0
/*
 * xfrm6_input.c: based on net/ipv4/xfrm4_input.c
 *
 * Authors:
 *	Mitsuru KANDA @USAGI
 *	Kazunori MIYAZAWA @USAGI
 *	Kunihiro Ishiguro <kunihiro@ipinfusion.com>
 *	YOSHIFUJI Hideaki @USAGI
 *		IPv6 support
 */

#include <linux/module.h>
#include <linux/string.h>
#include <linux/netfilter.h>
#include <linux/netfilter_ipv6.h>
#include <net/ipv6.h>
#include <net/xfrm.h>
#include <net/protocol.h>
#include <net/gro.h>

int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi,
		  struct ip6_tnl *t)
{
	XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6 = t;
	XFRM_SPI_SKB_CB(skb)->family = AF_INET6;
	XFRM_SPI_SKB_CB(skb)->daddroff = offsetof(struct ipv6hdr, daddr);
	return xfrm_input(skb, nexthdr, spi, 0);
}
EXPORT_SYMBOL(xfrm6_rcv_spi);

static int xfrm6_transport_finish2(struct net *net, struct sock *sk,
				   struct sk_buff *skb)
{
	if (xfrm_trans_queue(skb, ip6_rcv_finish)) {
		kfree_skb(skb);
		return NET_RX_DROP;
	}

	return 0;
}

int xfrm6_transport_finish(struct sk_buff *skb, int async)
{
	struct xfrm_offload *xo = xfrm_offload(skb);
	int nhlen = -skb_network_offset(skb);

	skb_network_header(skb)[IP6CB(skb)->nhoff] =
		XFRM_MODE_SKB_CB(skb)->protocol;

#ifndef CONFIG_NETFILTER
	if (!async)
		return 1;
#endif

	__skb_push(skb, nhlen);
	ipv6_hdr(skb)->payload_len = htons(skb->len - sizeof(struct ipv6hdr));
	skb_postpush_rcsum(skb, skb_network_header(skb), nhlen);

	if (xo && (xo->flags & XFRM_GRO)) {
		/* The full l2 header needs to be preserved so that re-injecting the packet at l2
		 * works correctly in the presence of vlan tags.
		 */
		skb_mac_header_rebuild_full(skb, xo->orig_mac_len);
		skb_reset_network_header(skb);
		skb_reset_transport_header(skb);
		return 0;
	}

	NF_HOOK(NFPROTO_IPV6, NF_INET_PRE_ROUTING,
		dev_net(skb->dev), NULL, skb, skb->dev, NULL,
		xfrm6_transport_finish2);
	return 0;
}

static int __xfrm6_udp_encap_rcv(struct sock *sk, struct sk_buff *skb, bool pull)
{
	struct udp_sock *up = udp_sk(sk);
	struct udphdr *uh;
	struct ipv6hdr *ip6h;
	int len;
	int ip6hlen = sizeof(struct ipv6hdr);
	__u8 *udpdata;
	__be32 *udpdata32;
	u16 encap_type;

	encap_type = READ_ONCE(up->encap_type);
	/* if this is not encapsulated socket, then just return now */
	if (!encap_type)
		return 1;

	/* If this is a paged skb, make sure we pull up
	 * whatever data we need to look at.
*/ len = skb->len - sizeof(struct udphdr); if (!pskb_may_pull(skb, sizeof(struct udphdr) + min(len, 8))) return 1; /* Now we can get the pointers */ uh = udp_hdr(skb); udpdata = (__u8 *)uh + sizeof(struct udphdr); udpdata32 = (__be32 *)udpdata; switch (encap_type) { default: case UDP_ENCAP_ESPINUDP: /* Check if this is a keepalive packet. If so, eat it. */ if (len == 1 && udpdata[0] == 0xff) { return -EINVAL; } else if (len > sizeof(struct ip_esp_hdr) && udpdata32[0] != 0) { /* ESP Packet without Non-ESP header */ len = sizeof(struct udphdr); } else /* Must be an IKE packet.. pass it through */ return 1; break; } /* At this point we are sure that this is an ESPinUDP packet, * so we need to remove 'len' bytes from the packet (the UDP * header and optional ESP marker bytes) and then modify the * protocol to ESP, and then call into the transform receiver. */ if (skb_unclone(skb, GFP_ATOMIC)) return -EINVAL; /* Now we can update and verify the packet length... */ ip6h = ipv6_hdr(skb); ip6h->payload_len = htons(ntohs(ip6h->payload_len) - len); if (skb->len < ip6hlen + len) { /* packet is too small!?! */ return -EINVAL; } /* pull the data buffer up to the ESP header and set the * transport header to point to ESP. Keep UDP on the stack * for later. */ if (pull) { __skb_pull(skb, len); skb_reset_transport_header(skb); } else { skb_set_transport_header(skb, len); } /* process ESP */ return 0; } /* If it's a keepalive packet, then just eat it. * If it's an encapsulated packet, then pass it to the * IPsec xfrm input. * Returns 0 if skb passed to xfrm or was dropped. * Returns >0 if skb should be passed to UDP. * Returns <0 if skb should be resubmitted (-ret is protocol) */ int xfrm6_udp_encap_rcv(struct sock *sk, struct sk_buff *skb) { int ret; if (skb->protocol == htons(ETH_P_IP)) return xfrm4_udp_encap_rcv(sk, skb); ret = __xfrm6_udp_encap_rcv(sk, skb, true); if (!ret) return xfrm6_rcv_encap(skb, IPPROTO_ESP, 0, udp_sk(sk)->encap_type); if (ret < 0) { kfree_skb(skb); return 0; } return ret; } struct sk_buff *xfrm6_gro_udp_encap_rcv(struct sock *sk, struct list_head *head, struct sk_buff *skb) { int offset = skb_gro_offset(skb); const struct net_offload *ops; struct sk_buff *pp = NULL; int ret; if (skb->protocol == htons(ETH_P_IP)) return xfrm4_gro_udp_encap_rcv(sk, head, skb); offset = offset - sizeof(struct udphdr); if (!pskb_pull(skb, offset)) return NULL; rcu_read_lock(); ops = rcu_dereference(inet6_offloads[IPPROTO_ESP]); if (!ops || !ops->callbacks.gro_receive) goto out; ret = __xfrm6_udp_encap_rcv(sk, skb, false); if (ret) goto out; skb_push(skb, offset); NAPI_GRO_CB(skb)->proto = IPPROTO_UDP; pp = call_gro_receive(ops->callbacks.gro_receive, head, skb); rcu_read_unlock(); return pp; out: rcu_read_unlock(); skb_push(skb, offset); NAPI_GRO_CB(skb)->same_flow = 0; NAPI_GRO_CB(skb)->flush = 1; return NULL; } int xfrm6_rcv_tnl(struct sk_buff *skb, struct ip6_tnl *t) { return xfrm6_rcv_spi(skb, skb_network_header(skb)[IP6CB(skb)->nhoff], 0, t); } EXPORT_SYMBOL(xfrm6_rcv_tnl); int xfrm6_rcv(struct sk_buff *skb) { return xfrm6_rcv_tnl(skb, NULL); } EXPORT_SYMBOL(xfrm6_rcv); int xfrm6_input_addr(struct sk_buff *skb, xfrm_address_t *daddr, xfrm_address_t *saddr, u8 proto) { struct net *net = dev_net(skb->dev); struct xfrm_state *x = NULL; struct sec_path *sp; int i = 0; sp = secpath_set(skb); if (!sp) { XFRM_INC_STATS(net, LINUX_MIB_XFRMINERROR); goto drop; } if (1 + sp->len == XFRM_MAX_DEPTH) { XFRM_INC_STATS(net, LINUX_MIB_XFRMINBUFFERERROR); goto drop; } for (i = 0; i < 3; i++) { xfrm_address_t 
		*dst, *src;

		switch (i) {
		case 0:
			dst = daddr;
			src = saddr;
			break;
		case 1:
			/* lookup state with wild-card source address */
			dst = daddr;
			src = (xfrm_address_t *)&in6addr_any;
			break;
		default:
			/* lookup state with wild-card addresses */
			dst = (xfrm_address_t *)&in6addr_any;
			src = (xfrm_address_t *)&in6addr_any;
			break;
		}

		x = xfrm_state_lookup_byaddr(net, skb->mark, dst, src,
					     proto, AF_INET6);
		if (!x)
			continue;

		if (unlikely(x->dir && x->dir != XFRM_SA_DIR_IN)) {
			XFRM_INC_STATS(net, LINUX_MIB_XFRMINSTATEDIRERROR);
			xfrm_state_put(x);
			x = NULL;
			continue;
		}

		spin_lock(&x->lock);

		if ((!i || (x->props.flags & XFRM_STATE_WILDRECV)) &&
		    likely(x->km.state == XFRM_STATE_VALID) &&
		    !xfrm_state_check_expire(x)) {
			spin_unlock(&x->lock);
			if (x->type->input(x, skb) > 0) {
				/* found a valid state */
				break;
			}
		} else
			spin_unlock(&x->lock);

		xfrm_state_put(x);
		x = NULL;
	}

	if (!x) {
		XFRM_INC_STATS(net, LINUX_MIB_XFRMINNOSTATES);
		xfrm_audit_state_notfound_simple(skb, AF_INET6);
		goto drop;
	}

	sp->xvec[sp->len++] = x;

	spin_lock(&x->lock);
	x->curlft.bytes += skb->len;
	x->curlft.packets++;
	spin_unlock(&x->lock);

	return 1;

drop:
	return -1;
}
EXPORT_SYMBOL(xfrm6_input_addr);
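The switch in __xfrm6_udp_encap_rcv() above is what implements the behaviour promised by the comment before xfrm6_udp_encap_rcv(): a one-byte 0xff payload is a NAT-T keepalive and is eaten, a payload whose first 32-bit word is non-zero is ESP carried directly in UDP (no Non-ESP marker), and anything else is treated as an IKE packet and passed through to the socket. The fragment below is a hypothetical restatement of that classification for illustration only; the toy_ names are made up and are not part of the kernel, and it assumes the same UDP-encapsulated ESP framing (cf. RFC 3948) that the code above handles.

/*
 * Illustrative sketch only (not kernel code): classify the UDP payload of
 * a UDP_ENCAP_ESPINUDP socket the same way __xfrm6_udp_encap_rcv() does.
 */
enum toy_espinudp_kind { TOY_KEEPALIVE, TOY_IKE, TOY_ESP };

static enum toy_espinudp_kind toy_classify(const __u8 *payload, int len)
{
	const __be32 *words = (const __be32 *)payload;

	if (len == 1 && payload[0] == 0xff)
		return TOY_KEEPALIVE;	/* NAT-T keepalive: eat it */
	if (len > sizeof(struct ip_esp_hdr) && words[0] != 0)
		return TOY_ESP;		/* ESP packet without Non-ESP marker */
	return TOY_IKE;			/* zero marker: IKE, pass it through */
}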
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _FAT_H
#define _FAT_H

#include <linux/buffer_head.h>
#include <linux/nls.h>
#include <linux/hash.h>
#include <linux/ratelimit.h>
#include <linux/msdos_fs.h>
#include <linux/fs_context.h>
#include <linux/fs_parser.h>

/*
 * vfat shortname flags
 */
#define VFAT_SFN_DISPLAY_LOWER	0x0001 /* convert to lowercase for display */
#define VFAT_SFN_DISPLAY_WIN95	0x0002 /* emulate win95 rule for display */
#define VFAT_SFN_DISPLAY_WINNT	0x0004 /* emulate winnt rule for display */
#define VFAT_SFN_CREATE_WIN95	0x0100 /* emulate win95 rule for create */
#define VFAT_SFN_CREATE_WINNT	0x0200 /* emulate winnt rule for create */

#define FAT_ERRORS_CONT		1      /* ignore error and continue */
#define FAT_ERRORS_PANIC	2      /* panic on error */
#define FAT_ERRORS_RO		3      /* remount r/o on error */

#define FAT_NFS_STALE_RW	1      /* NFS RW support, can cause ESTALE */
#define FAT_NFS_NOSTALE_RO	2      /* NFS RO support, no ESTALE issue */

struct fat_mount_options {
	kuid_t fs_uid;
	kgid_t fs_gid;
	unsigned short fs_fmask;
	unsigned short fs_dmask;
	unsigned short codepage;   /* Codepage for shortname conversions */
	int time_offset;	   /* Offset of timestamps from UTC (in minutes) */
	char *iocharset;	   /* Charset used for filename input/display */
	unsigned short shortname;  /* flags for shortname display/create rule */
	unsigned char name_check;  /* r = relaxed, n = normal, s = strict */
	unsigned char errors;	   /* On error: continue, panic, remount-ro */
	unsigned char nfs;	   /* NFS support: nostale_ro, stale_rw */
	unsigned short allow_utime;/* permission for setting the [am]time */
	unsigned quiet:1,	   /* set = fake successful chmods and
chowns */ showexec:1, /* set = only set x bit for com/exe/bat */ sys_immutable:1, /* set = system files are immutable */ dotsOK:1, /* set = hidden and system files are named '.filename' */ isvfat:1, /* 0=no vfat long filename support, 1=vfat support */ utf8:1, /* Use of UTF-8 character set (Default) */ unicode_xlate:1, /* create escape sequences for unhandled Unicode */ numtail:1, /* Does first alias have a numeric '~1' type tail? */ flush:1, /* write things quickly */ nocase:1, /* Does this need case conversion? 0=need case conversion*/ usefree:1, /* Use free_clusters for FAT32 */ tz_set:1, /* Filesystem timestamps' offset set */ rodir:1, /* allow ATTR_RO for directory */ discard:1, /* Issue discard requests on deletions */ dos1xfloppy:1, /* Assume default BPB for DOS 1.x floppies */ debug:1; /* Not currently used */ }; #define FAT_HASH_BITS 8 #define FAT_HASH_SIZE (1UL << FAT_HASH_BITS) /* * MS-DOS file system in-core superblock data */ struct msdos_sb_info { unsigned short sec_per_clus; /* sectors/cluster */ unsigned short cluster_bits; /* log2(cluster_size) */ unsigned int cluster_size; /* cluster size */ unsigned char fats, fat_bits; /* number of FATs, FAT bits (12,16 or 32) */ unsigned short fat_start; unsigned long fat_length; /* FAT start & length (sec.) */ unsigned long dir_start; unsigned short dir_entries; /* root dir start & entries */ unsigned long data_start; /* first data sector */ unsigned long max_cluster; /* maximum cluster number */ unsigned long root_cluster; /* first cluster of the root directory */ unsigned long fsinfo_sector; /* sector number of FAT32 fsinfo */ struct mutex fat_lock; struct mutex nfs_build_inode_lock; struct mutex s_lock; unsigned int prev_free; /* previously allocated cluster number */ unsigned int free_clusters; /* -1 if undefined */ unsigned int free_clus_valid; /* is free_clusters valid? 
*/ struct fat_mount_options options; struct nls_table *nls_disk; /* Codepage used on disk */ struct nls_table *nls_io; /* Charset used for input and display */ const void *dir_ops; /* Opaque; default directory operations */ int dir_per_block; /* dir entries per block */ int dir_per_block_bits; /* log2(dir_per_block) */ unsigned int vol_id; /*volume ID*/ int fatent_shift; const struct fatent_operations *fatent_ops; struct inode *fat_inode; struct inode *fsinfo_inode; struct ratelimit_state ratelimit; spinlock_t inode_hash_lock; struct hlist_head inode_hashtable[FAT_HASH_SIZE]; spinlock_t dir_hash_lock; struct hlist_head dir_hashtable[FAT_HASH_SIZE]; unsigned int dirty; /* fs state before mount */ struct rcu_head rcu; }; #define FAT_CACHE_VALID 0 /* special case for valid cache */ /* * MS-DOS file system inode data in memory */ struct msdos_inode_info { spinlock_t cache_lru_lock; struct list_head cache_lru; int nr_caches; /* for avoiding the race between fat_free() and fat_get_cluster() */ unsigned int cache_valid_id; /* NOTE: mmu_private is 64bits, so must hold ->i_mutex to access */ loff_t mmu_private; /* physically allocated size */ int i_start; /* first cluster or 0 */ int i_logstart; /* logical first cluster */ int i_attrs; /* unused attribute bits */ loff_t i_pos; /* on-disk position of directory entry or 0 */ struct hlist_node i_fat_hash; /* hash by i_location */ struct hlist_node i_dir_hash; /* hash by i_logstart */ struct rw_semaphore truncate_lock; /* protect bmap against truncate */ struct timespec64 i_crtime; /* File creation (birth) time */ struct inode vfs_inode; }; struct fat_slot_info { loff_t i_pos; /* on-disk position of directory entry */ loff_t slot_off; /* offset for slot or de start */ int nr_slots; /* number of slots + 1(de) in filename */ struct msdos_dir_entry *de; struct buffer_head *bh; }; static inline struct msdos_sb_info *MSDOS_SB(struct super_block *sb) { return sb->s_fs_info; } /* * Functions that determine the variant of the FAT file system (i.e., * whether this is FAT12, FAT16 or FAT32. */ static inline bool is_fat12(const struct msdos_sb_info *sbi) { return sbi->fat_bits == 12; } static inline bool is_fat16(const struct msdos_sb_info *sbi) { return sbi->fat_bits == 16; } static inline bool is_fat32(const struct msdos_sb_info *sbi) { return sbi->fat_bits == 32; } /* Maximum number of clusters */ static inline u32 max_fat(struct super_block *sb) { struct msdos_sb_info *sbi = MSDOS_SB(sb); return is_fat32(sbi) ? MAX_FAT32 : is_fat16(sbi) ? MAX_FAT16 : MAX_FAT12; } static inline struct msdos_inode_info *MSDOS_I(struct inode *inode) { return container_of(inode, struct msdos_inode_info, vfs_inode); } /* * If ->i_mode can't hold S_IWUGO (i.e. ATTR_RO), we use ->i_attrs to * save ATTR_RO instead of ->i_mode. * * If it's directory and !sbi->options.rodir, ATTR_RO isn't read-only * bit, it's just used as flag for app. */ static inline int fat_mode_can_hold_ro(struct inode *inode) { struct msdos_sb_info *sbi = MSDOS_SB(inode->i_sb); umode_t mask; if (S_ISDIR(inode->i_mode)) { if (!sbi->options.rodir) return 0; mask = ~sbi->options.fs_dmask; } else mask = ~sbi->options.fs_fmask; if (!(mask & S_IWUGO)) return 0; return 1; } /* Convert attribute bits and a mask to the UNIX mode. 
*/ static inline umode_t fat_make_mode(struct msdos_sb_info *sbi, u8 attrs, umode_t mode) { if (attrs & ATTR_RO && !((attrs & ATTR_DIR) && !sbi->options.rodir)) mode &= ~S_IWUGO; if (attrs & ATTR_DIR) return (mode & ~sbi->options.fs_dmask) | S_IFDIR; else return (mode & ~sbi->options.fs_fmask) | S_IFREG; } /* Return the FAT attribute byte for this inode */ static inline u8 fat_make_attrs(struct inode *inode) { u8 attrs = MSDOS_I(inode)->i_attrs; if (S_ISDIR(inode->i_mode)) attrs |= ATTR_DIR; if (fat_mode_can_hold_ro(inode) && !(inode->i_mode & S_IWUGO)) attrs |= ATTR_RO; return attrs; } static inline void fat_save_attrs(struct inode *inode, u8 attrs) { if (fat_mode_can_hold_ro(inode)) MSDOS_I(inode)->i_attrs = attrs & ATTR_UNUSED; else MSDOS_I(inode)->i_attrs = attrs & (ATTR_UNUSED | ATTR_RO); } static inline unsigned char fat_checksum(const __u8 *name) { unsigned char s = name[0]; s = (s<<7) + (s>>1) + name[1]; s = (s<<7) + (s>>1) + name[2]; s = (s<<7) + (s>>1) + name[3]; s = (s<<7) + (s>>1) + name[4]; s = (s<<7) + (s>>1) + name[5]; s = (s<<7) + (s>>1) + name[6]; s = (s<<7) + (s>>1) + name[7]; s = (s<<7) + (s>>1) + name[8]; s = (s<<7) + (s>>1) + name[9]; s = (s<<7) + (s>>1) + name[10]; return s; } static inline sector_t fat_clus_to_blknr(struct msdos_sb_info *sbi, int clus) { return ((sector_t)clus - FAT_START_ENT) * sbi->sec_per_clus + sbi->data_start; } static inline void fat_get_blknr_offset(struct msdos_sb_info *sbi, loff_t i_pos, sector_t *blknr, int *offset) { *blknr = i_pos >> sbi->dir_per_block_bits; *offset = i_pos & (sbi->dir_per_block - 1); } static inline loff_t fat_i_pos_read(struct msdos_sb_info *sbi, struct inode *inode) { loff_t i_pos; #if BITS_PER_LONG == 32 spin_lock(&sbi->inode_hash_lock); #endif i_pos = MSDOS_I(inode)->i_pos; #if BITS_PER_LONG == 32 spin_unlock(&sbi->inode_hash_lock); #endif return i_pos; } static inline void fat16_towchar(wchar_t *dst, const __u8 *src, size_t len) { #ifdef __BIG_ENDIAN while (len--) { *dst++ = src[0] | (src[1] << 8); src += 2; } #else memcpy(dst, src, len * 2); #endif } static inline int fat_get_start(const struct msdos_sb_info *sbi, const struct msdos_dir_entry *de) { int cluster = le16_to_cpu(de->start); if (is_fat32(sbi)) cluster |= (le16_to_cpu(de->starthi) << 16); return cluster; } static inline void fat_set_start(struct msdos_dir_entry *de, int cluster) { de->start = cpu_to_le16(cluster); de->starthi = cpu_to_le16(cluster >> 16); } static inline void fatwchar_to16(__u8 *dst, const wchar_t *src, size_t len) { #ifdef __BIG_ENDIAN while (len--) { dst[0] = *src & 0x00FF; dst[1] = (*src & 0xFF00) >> 8; dst += 2; src++; } #else memcpy(dst, src, len * 2); #endif } /* fat/cache.c */ extern void fat_cache_inval_inode(struct inode *inode); extern int fat_get_cluster(struct inode *inode, int cluster, int *fclus, int *dclus); extern int fat_get_mapped_cluster(struct inode *inode, sector_t sector, sector_t last_block, unsigned long *mapped_blocks, sector_t *bmap); extern int fat_bmap(struct inode *inode, sector_t sector, sector_t *phys, unsigned long *mapped_blocks, int create, bool from_bmap); /* fat/dir.c */ extern const struct file_operations fat_dir_operations; extern int fat_search_long(struct inode *inode, const unsigned char *name, int name_len, struct fat_slot_info *sinfo); extern int fat_dir_empty(struct inode *dir); extern int fat_subdirs(struct inode *dir); extern int fat_scan(struct inode *dir, const unsigned char *name, struct fat_slot_info *sinfo); extern int fat_scan_logstart(struct inode *dir, int i_logstart, struct 
fat_slot_info *sinfo); extern int fat_get_dotdot_entry(struct inode *dir, struct buffer_head **bh, struct msdos_dir_entry **de); extern int fat_alloc_new_dir(struct inode *dir, struct timespec64 *ts); extern int fat_add_entries(struct inode *dir, void *slots, int nr_slots, struct fat_slot_info *sinfo); extern int fat_remove_entries(struct inode *dir, struct fat_slot_info *sinfo); /* fat/fatent.c */ struct fat_entry { int entry; union { u8 *ent12_p[2]; __le16 *ent16_p; __le32 *ent32_p; } u; int nr_bhs; struct buffer_head *bhs[2]; struct inode *fat_inode; }; static inline void fatent_init(struct fat_entry *fatent) { fatent->nr_bhs = 0; fatent->entry = 0; fatent->u.ent32_p = NULL; fatent->bhs[0] = fatent->bhs[1] = NULL; fatent->fat_inode = NULL; } static inline void fatent_set_entry(struct fat_entry *fatent, int entry) { fatent->entry = entry; fatent->u.ent32_p = NULL; } static inline void fatent_brelse(struct fat_entry *fatent) { int i; fatent->u.ent32_p = NULL; for (i = 0; i < fatent->nr_bhs; i++) brelse(fatent->bhs[i]); fatent->nr_bhs = 0; fatent->bhs[0] = fatent->bhs[1] = NULL; fatent->fat_inode = NULL; } static inline bool fat_valid_entry(struct msdos_sb_info *sbi, int entry) { return FAT_START_ENT <= entry && entry < sbi->max_cluster; } extern void fat_ent_access_init(struct super_block *sb); extern int fat_ent_read(struct inode *inode, struct fat_entry *fatent, int entry); extern int fat_ent_write(struct inode *inode, struct fat_entry *fatent, int new, int wait); extern int fat_alloc_clusters(struct inode *inode, int *cluster, int nr_cluster); extern int fat_free_clusters(struct inode *inode, int cluster); extern int fat_count_free_clusters(struct super_block *sb); extern int fat_trim_fs(struct inode *inode, struct fstrim_range *range); /* fat/file.c */ extern long fat_generic_ioctl(struct file *filp, unsigned int cmd, unsigned long arg); extern const struct file_operations fat_file_operations; extern const struct inode_operations fat_file_inode_operations; extern int fat_setattr(struct mnt_idmap *idmap, struct dentry *dentry, struct iattr *attr); extern void fat_truncate_blocks(struct inode *inode, loff_t offset); extern int fat_getattr(struct mnt_idmap *idmap, const struct path *path, struct kstat *stat, u32 request_mask, unsigned int flags); extern int fat_file_fsync(struct file *file, loff_t start, loff_t end, int datasync); /* fat/inode.c */ extern int fat_block_truncate_page(struct inode *inode, loff_t from); extern void fat_attach(struct inode *inode, loff_t i_pos); extern void fat_detach(struct inode *inode); extern struct inode *fat_iget(struct super_block *sb, loff_t i_pos); extern struct inode *fat_build_inode(struct super_block *sb, struct msdos_dir_entry *de, loff_t i_pos); extern int fat_sync_inode(struct inode *inode); extern int fat_fill_super(struct super_block *sb, struct fs_context *fc, void (*setup)(struct super_block *)); extern int fat_fill_inode(struct inode *inode, struct msdos_dir_entry *de); extern int fat_flush_inodes(struct super_block *sb, struct inode *i1, struct inode *i2); extern const struct fs_parameter_spec fat_param_spec[]; int fat_init_fs_context(struct fs_context *fc, bool is_vfat); void fat_free_fc(struct fs_context *fc); int fat_parse_param(struct fs_context *fc, struct fs_parameter *param, bool is_vfat); int fat_reconfigure(struct fs_context *fc); static inline unsigned long fat_dir_hash(int logstart) { return hash_32(logstart, FAT_HASH_BITS); } extern int fat_add_cluster(struct inode *inode); /* fat/misc.c */ extern __printf(3, 4) __cold void 
__fat_fs_error(struct super_block *sb, int report, const char *fmt, ...); #define fat_fs_error(sb, fmt, args...) \ __fat_fs_error(sb, 1, fmt , ## args) #define fat_fs_error_ratelimit(sb, fmt, args...) \ __fat_fs_error(sb, __ratelimit(&MSDOS_SB(sb)->ratelimit), fmt , ## args) #define FAT_PRINTK_PREFIX "%sFAT-fs (%s): " #define fat_msg(sb, level, fmt, args...) \ do { \ printk_index_subsys_emit(FAT_PRINTK_PREFIX, level, fmt, ##args);\ _fat_msg(sb, level, fmt, ##args); \ } while (0) __printf(3, 4) __cold void _fat_msg(struct super_block *sb, const char *level, const char *fmt, ...); #define fat_msg_ratelimit(sb, level, fmt, args...) \ do { \ if (__ratelimit(&MSDOS_SB(sb)->ratelimit)) \ fat_msg(sb, level, fmt, ## args); \ } while (0) extern int fat_clusters_flush(struct super_block *sb); extern int fat_chain_add(struct inode *inode, int new_dclus, int nr_cluster); extern void fat_time_fat2unix(struct msdos_sb_info *sbi, struct timespec64 *ts, __le16 __time, __le16 __date, u8 time_cs); extern void fat_time_unix2fat(struct msdos_sb_info *sbi, struct timespec64 *ts, __le16 *time, __le16 *date, u8 *time_cs); extern struct timespec64 fat_truncate_atime(const struct msdos_sb_info *sbi, const struct timespec64 *ts); extern struct timespec64 fat_truncate_mtime(const struct msdos_sb_info *sbi, const struct timespec64 *ts); extern int fat_truncate_time(struct inode *inode, struct timespec64 *now, int flags); extern int fat_update_time(struct inode *inode, int flags); extern int fat_sync_bhs(struct buffer_head **bhs, int nr_bhs); int fat_cache_init(void); void fat_cache_destroy(void); /* fat/nfs.c */ extern const struct export_operations fat_export_ops; extern const struct export_operations fat_export_ops_nostale; /* helper for printk */ typedef unsigned long long llu; #endif /* !_FAT_H */ |
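As a worked illustration of the unrolled fat_checksum() helper declared above, the standalone sketch below computes the same 8.3 short-name checksum with a rolled-up loop over the 11 space-padded directory-entry name bytes (the value VFAT long-name slots carry to tie them to their short entry). The example name and main() harness are illustrative only, not part of the kernel API.

#include <stdio.h>

/* Rolled-up form of fat_checksum(): rotate right by one, then add. */
static unsigned char fat_shortname_checksum(const unsigned char *name)
{
	unsigned char s = 0;
	int i;

	for (i = 0; i < 11; i++)
		s = (unsigned char)((s << 7) + (s >> 1) + name[i]);
	return s;
}

int main(void)
{
	/* "FOO.TXT" as an on-disk 8.3 entry: 8-byte base, 3-byte extension. */
	const unsigned char name[11] = "FOO     TXT";

	printf("checksum = 0x%02x\n", fat_shortname_checksum(name));
	return 0;
}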
// SPDX-License-Identifier: GPL-2.0-only /* * File: pn_netlink.c * * Phonet netlink interface * * Copyright (C) 2008 Nokia Corporation. * * Authors: Sakari Ailus <sakari.ailus@nokia.com> * Remi Denis-Courmont */ #include <linux/kernel.h> #include <linux/netlink.h> #include <linux/phonet.h> #include <linux/slab.h> #include <net/sock.h> #include <net/phonet/pn_dev.h> /* Device address handling */ static int fill_addr(struct sk_buff *skb, u32 ifindex, u8 addr, u32 portid, u32 seq, int event); void phonet_address_notify(struct net *net, int event, u32 ifindex, u8 addr) { struct sk_buff *skb; int err = -ENOBUFS; skb = nlmsg_new(NLMSG_ALIGN(sizeof(struct ifaddrmsg)) + nla_total_size(1), GFP_KERNEL); if (skb == NULL) goto errout; err = fill_addr(skb, ifindex, addr, 0, 0, event); if (err < 0) { WARN_ON(err == -EMSGSIZE); kfree_skb(skb); goto errout; } rtnl_notify(skb, net, 0, RTNLGRP_PHONET_IFADDR, NULL, GFP_KERNEL); return; errout: rtnl_set_sk_err(net, RTNLGRP_PHONET_IFADDR, err); } static const struct nla_policy ifa_phonet_policy[IFA_MAX+1] = { [IFA_LOCAL] = { .type = NLA_U8 }, }; static int addr_doit(struct sk_buff *skb, struct nlmsghdr *nlh, struct netlink_ext_ack *extack) { struct net *net = sock_net(skb->sk); struct nlattr *tb[IFA_MAX+1]; struct net_device *dev; struct ifaddrmsg *ifm; int err; u8 pnaddr; if (!netlink_capable(skb, CAP_NET_ADMIN)) return -EPERM; if (!netlink_capable(skb, CAP_SYS_ADMIN)) return -EPERM; err = nlmsg_parse_deprecated(nlh, sizeof(*ifm), tb, IFA_MAX, ifa_phonet_policy, extack); if (err < 0) return err; ifm = nlmsg_data(nlh); if (tb[IFA_LOCAL] == NULL) return -EINVAL; pnaddr = nla_get_u8(tb[IFA_LOCAL]); if (pnaddr & 3) /* Phonet addresses only have 6 high-order bits */ return -EINVAL; rcu_read_lock(); dev = dev_get_by_index_rcu(net, ifm->ifa_index); if (!dev) { rcu_read_unlock(); return -ENODEV; } if (nlh->nlmsg_type == RTM_NEWADDR) err = phonet_address_add(dev, pnaddr); else err = phonet_address_del(dev, pnaddr); rcu_read_unlock(); if (!err) phonet_address_notify(net, nlh->nlmsg_type, ifm->ifa_index, pnaddr); return err; } static int fill_addr(struct sk_buff *skb, u32 ifindex, u8 addr, u32 portid, u32 seq, int event) { struct ifaddrmsg *ifm; struct
nlmsghdr *nlh; nlh = nlmsg_put(skb, portid, seq, event, sizeof(*ifm), 0); if (nlh == NULL) return -EMSGSIZE; ifm = nlmsg_data(nlh); ifm->ifa_family = AF_PHONET; ifm->ifa_prefixlen = 0; ifm->ifa_flags = IFA_F_PERMANENT; ifm->ifa_scope = RT_SCOPE_LINK; ifm->ifa_index = ifindex; if (nla_put_u8(skb, IFA_LOCAL, addr)) goto nla_put_failure; nlmsg_end(skb, nlh); return 0; nla_put_failure: nlmsg_cancel(skb, nlh); return -EMSGSIZE; } static int getaddr_dumpit(struct sk_buff *skb, struct netlink_callback *cb) { int addr_idx = 0, addr_start_idx = cb->args[1]; int dev_idx = 0, dev_start_idx = cb->args[0]; struct phonet_device_list *pndevs; struct phonet_device *pnd; int err = 0; pndevs = phonet_device_list(sock_net(skb->sk)); rcu_read_lock(); list_for_each_entry_rcu(pnd, &pndevs->list, list) { DECLARE_BITMAP(addrs, 64); u8 addr; if (dev_idx > dev_start_idx) addr_start_idx = 0; if (dev_idx++ < dev_start_idx) continue; addr_idx = 0; memcpy(addrs, pnd->addrs, sizeof(pnd->addrs)); for_each_set_bit(addr, addrs, 64) { if (addr_idx++ < addr_start_idx) continue; err = fill_addr(skb, READ_ONCE(pnd->netdev->ifindex), addr << 2, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, RTM_NEWADDR); if (err < 0) goto out; } } out: rcu_read_unlock(); cb->args[0] = dev_idx; cb->args[1] = addr_idx; return err; } /* Routes handling */ static int fill_route(struct sk_buff *skb, u32 ifindex, u8 dst, u32 portid, u32 seq, int event) { struct rtmsg *rtm; struct nlmsghdr *nlh; nlh = nlmsg_put(skb, portid, seq, event, sizeof(*rtm), 0); if (nlh == NULL) return -EMSGSIZE; rtm = nlmsg_data(nlh); rtm->rtm_family = AF_PHONET; rtm->rtm_dst_len = 6; rtm->rtm_src_len = 0; rtm->rtm_tos = 0; rtm->rtm_table = RT_TABLE_MAIN; rtm->rtm_protocol = RTPROT_STATIC; rtm->rtm_scope = RT_SCOPE_UNIVERSE; rtm->rtm_type = RTN_UNICAST; rtm->rtm_flags = 0; if (nla_put_u8(skb, RTA_DST, dst) || nla_put_u32(skb, RTA_OIF, ifindex)) goto nla_put_failure; nlmsg_end(skb, nlh); return 0; nla_put_failure: nlmsg_cancel(skb, nlh); return -EMSGSIZE; } void rtm_phonet_notify(struct net *net, int event, u32 ifindex, u8 dst) { struct sk_buff *skb; int err = -ENOBUFS; skb = nlmsg_new(NLMSG_ALIGN(sizeof(struct rtmsg)) + nla_total_size(1) + nla_total_size(4), GFP_KERNEL); if (skb == NULL) goto errout; err = fill_route(skb, ifindex, dst, 0, 0, event); if (err < 0) { WARN_ON(err == -EMSGSIZE); kfree_skb(skb); goto errout; } rtnl_notify(skb, net, 0, RTNLGRP_PHONET_ROUTE, NULL, GFP_KERNEL); return; errout: rtnl_set_sk_err(net, RTNLGRP_PHONET_ROUTE, err); } static const struct nla_policy rtm_phonet_policy[RTA_MAX+1] = { [RTA_DST] = { .type = NLA_U8 }, [RTA_OIF] = { .type = NLA_U32 }, }; static int route_doit(struct sk_buff *skb, struct nlmsghdr *nlh, struct netlink_ext_ack *extack) { struct net *net = sock_net(skb->sk); struct nlattr *tb[RTA_MAX+1]; bool sync_needed = false; struct net_device *dev; struct rtmsg *rtm; u32 ifindex; int err; u8 dst; if (!netlink_capable(skb, CAP_NET_ADMIN)) return -EPERM; if (!netlink_capable(skb, CAP_SYS_ADMIN)) return -EPERM; err = nlmsg_parse_deprecated(nlh, sizeof(*rtm), tb, RTA_MAX, rtm_phonet_policy, extack); if (err < 0) return err; rtm = nlmsg_data(nlh); if (rtm->rtm_table != RT_TABLE_MAIN || rtm->rtm_type != RTN_UNICAST) return -EINVAL; if (tb[RTA_DST] == NULL || tb[RTA_OIF] == NULL) return -EINVAL; dst = nla_get_u8(tb[RTA_DST]); if (dst & 3) /* Phonet addresses only have 6 high-order bits */ return -EINVAL; ifindex = nla_get_u32(tb[RTA_OIF]); rcu_read_lock(); dev = dev_get_by_index_rcu(net, ifindex); if (!dev) { rcu_read_unlock(); return 
-ENODEV; } if (nlh->nlmsg_type == RTM_NEWROUTE) { err = phonet_route_add(dev, dst); } else { err = phonet_route_del(dev, dst); if (!err) sync_needed = true; } rcu_read_unlock(); if (sync_needed) { synchronize_rcu(); dev_put(dev); } if (!err) rtm_phonet_notify(net, nlh->nlmsg_type, ifindex, dst); return err; } static int route_dumpit(struct sk_buff *skb, struct netlink_callback *cb) { struct net *net = sock_net(skb->sk); int err = 0; u8 addr; rcu_read_lock(); for (addr = cb->args[0]; addr < 64; addr++) { struct net_device *dev = phonet_route_get_rcu(net, addr << 2); if (!dev) continue; err = fill_route(skb, READ_ONCE(dev->ifindex), addr << 2, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, RTM_NEWROUTE); if (err < 0) break; } rcu_read_unlock(); cb->args[0] = addr; return err; } static const struct rtnl_msg_handler phonet_rtnl_msg_handlers[] __initdata_or_module = { {.owner = THIS_MODULE, .protocol = PF_PHONET, .msgtype = RTM_NEWADDR, .doit = addr_doit, .flags = RTNL_FLAG_DOIT_UNLOCKED}, {.owner = THIS_MODULE, .protocol = PF_PHONET, .msgtype = RTM_DELADDR, .doit = addr_doit, .flags = RTNL_FLAG_DOIT_UNLOCKED}, {.owner = THIS_MODULE, .protocol = PF_PHONET, .msgtype = RTM_GETADDR, .dumpit = getaddr_dumpit, .flags = RTNL_FLAG_DUMP_UNLOCKED}, {.owner = THIS_MODULE, .protocol = PF_PHONET, .msgtype = RTM_NEWROUTE, .doit = route_doit, .flags = RTNL_FLAG_DOIT_UNLOCKED}, {.owner = THIS_MODULE, .protocol = PF_PHONET, .msgtype = RTM_DELROUTE, .doit = route_doit, .flags = RTNL_FLAG_DOIT_UNLOCKED}, {.owner = THIS_MODULE, .protocol = PF_PHONET, .msgtype = RTM_GETROUTE, .dumpit = route_dumpit, .flags = RTNL_FLAG_DUMP_UNLOCKED}, }; int __init phonet_netlink_register(void) { return rtnl_register_many(phonet_rtnl_msg_handlers); } |
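The `pnaddr & 3` checks in addr_doit() and route_doit(), and the `addr << 2` expansion in the dump handlers, both follow from Phonet device addresses occupying the 6 high-order bits of an octet, so the two low-order bits must be zero on the wire and a 6-bit bitmap index maps back to the wire value by shifting left twice. A small standalone sketch of that encoding follows; phonet_addr_valid() is a hypothetical helper for illustration, not part of the kernel API.

#include <stdbool.h>
#include <stdio.h>

/* Wire addresses keep their two low-order bits clear. */
static bool phonet_addr_valid(unsigned char pnaddr)
{
	return (pnaddr & 3) == 0;
}

int main(void)
{
	unsigned char idx;

	printf("0x23 valid? %d\n", phonet_addr_valid(0x23));	/* 0: low bits set */
	printf("0x20 valid? %d\n", phonet_addr_valid(0x20));	/* 1: 6-bit address */

	/* A dump walks 6-bit bitmap indexes and re-expands them. */
	for (idx = 0; idx < 4; idx++)
		printf("index %u -> wire address 0x%02x\n", idx, idx << 2);
	return 0;
}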
// SPDX-License-Identifier: GPL-2.0 /* * Copyright © 2019 Oracle and/or its affiliates. All rights reserved. * Copyright © 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. * * KVM Xen emulation */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include "x86.h" #include "xen.h" #include "hyperv.h" #include "irq.h" #include <linux/eventfd.h> #include <linux/kvm_host.h> #include <linux/sched/stat.h> #include <trace/events/kvm.h> #include <xen/interface/xen.h> #include <xen/interface/vcpu.h> #include <xen/interface/version.h> #include <xen/interface/event_channel.h> #include <xen/interface/sched.h> #include <asm/xen/cpuid.h> #include <asm/pvclock.h> #include "cpuid.h" #include "trace.h" static int kvm_xen_set_evtchn(struct kvm_xen_evtchn *xe, struct kvm *kvm); static int kvm_xen_setattr_evtchn(struct kvm *kvm, struct kvm_xen_hvm_attr *data); static bool kvm_xen_hcall_evtchn_send(struct kvm_vcpu *vcpu, u64 param, u64 *r); DEFINE_STATIC_KEY_DEFERRED_FALSE(kvm_xen_enabled, HZ); static int kvm_xen_shared_info_init(struct kvm *kvm) { struct gfn_to_pfn_cache *gpc = &kvm->arch.xen.shinfo_cache; struct pvclock_wall_clock *wc; u32 *wc_sec_hi; u32 wc_version; u64 wall_nsec; int ret = 0; int idx = srcu_read_lock(&kvm->srcu); read_lock_irq(&gpc->lock); while (!kvm_gpc_check(gpc, PAGE_SIZE)) { read_unlock_irq(&gpc->lock); ret = kvm_gpc_refresh(gpc, PAGE_SIZE); if (ret) goto out; read_lock_irq(&gpc->lock); } /* * This code mirrors kvm_write_wall_clock() except that it writes * directly through the pfn cache and doesn't mark the page dirty. */ wall_nsec = kvm_get_wall_clock_epoch(kvm); /* Paranoia checks on the 32-bit struct layout */ BUILD_BUG_ON(offsetof(struct compat_shared_info, wc) != 0x900); BUILD_BUG_ON(offsetof(struct compat_shared_info, arch.wc_sec_hi) != 0x924); BUILD_BUG_ON(offsetof(struct pvclock_vcpu_time_info, version) != 0); #ifdef CONFIG_X86_64 /* Paranoia checks on the 64-bit struct layout */ BUILD_BUG_ON(offsetof(struct shared_info, wc) != 0xc00); BUILD_BUG_ON(offsetof(struct shared_info, wc_sec_hi) != 0xc0c); if (IS_ENABLED(CONFIG_64BIT) && kvm->arch.xen.long_mode) { struct shared_info *shinfo = gpc->khva; wc_sec_hi = &shinfo->wc_sec_hi; wc = &shinfo->wc; } else #endif { struct compat_shared_info *shinfo = gpc->khva; wc_sec_hi = &shinfo->arch.wc_sec_hi; wc = &shinfo->wc; } /* Increment and ensure an odd value */ wc_version = wc->version = (wc->version + 1) | 1; smp_wmb(); wc->nsec = do_div(wall_nsec, NSEC_PER_SEC); wc->sec = (u32)wall_nsec; *wc_sec_hi = wall_nsec >> 32; smp_wmb(); wc->version = wc_version + 1; read_unlock_irq(&gpc->lock); kvm_make_all_cpus_request(kvm, KVM_REQ_MASTERCLOCK_UPDATE); out: srcu_read_unlock(&kvm->srcu, idx); return ret; } void kvm_xen_inject_timer_irqs(struct kvm_vcpu *vcpu) { if (atomic_read(&vcpu->arch.xen.timer_pending) > 0) { struct kvm_xen_evtchn e; e.vcpu_id = vcpu->vcpu_id; e.vcpu_idx = vcpu->vcpu_idx; e.port = vcpu->arch.xen.timer_virq; e.priority = KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL; kvm_xen_set_evtchn(&e, vcpu->kvm); vcpu->arch.xen.timer_expires = 0; atomic_set(&vcpu->arch.xen.timer_pending, 0); } } static enum hrtimer_restart xen_timer_callback(struct hrtimer *timer) { struct kvm_vcpu *vcpu = container_of(timer, struct kvm_vcpu, arch.xen.timer); struct kvm_xen_evtchn e; int rc; if (atomic_read(&vcpu->arch.xen.timer_pending)) return HRTIMER_NORESTART; e.vcpu_id = vcpu->vcpu_id; e.vcpu_idx = vcpu->vcpu_idx; e.port = vcpu->arch.xen.timer_virq; e.priority = KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL; rc
= kvm_xen_set_evtchn_fast(&e, vcpu->kvm); if (rc != -EWOULDBLOCK) { vcpu->arch.xen.timer_expires = 0; return HRTIMER_NORESTART; } atomic_inc(&vcpu->arch.xen.timer_pending); kvm_make_request(KVM_REQ_UNBLOCK, vcpu); kvm_vcpu_kick(vcpu); return HRTIMER_NORESTART; } static void kvm_xen_start_timer(struct kvm_vcpu *vcpu, u64 guest_abs, bool linux_wa) { int64_t kernel_now, delta; uint64_t guest_now; /* * The guest provides the requested timeout in absolute nanoseconds * of the KVM clock — as *it* sees it, based on the scaled TSC and * the pvclock information provided by KVM. * * The kernel doesn't support hrtimers based on CLOCK_MONOTONIC_RAW * so use CLOCK_MONOTONIC. In the timescales covered by timers, the * difference won't matter much as there is no cumulative effect. * * Calculate the time for some arbitrary point in time around "now" * in terms of both kvmclock and CLOCK_MONOTONIC. Calculate the * delta between the kvmclock "now" value and the guest's requested * timeout, apply the "Linux workaround" described below, and add * the resulting delta to the CLOCK_MONOTONIC "now" value, to get * the absolute CLOCK_MONOTONIC time at which the timer should * fire. */ if (vcpu->arch.hv_clock.version && vcpu->kvm->arch.use_master_clock && static_cpu_has(X86_FEATURE_CONSTANT_TSC)) { uint64_t host_tsc, guest_tsc; if (!IS_ENABLED(CONFIG_64BIT) || !kvm_get_monotonic_and_clockread(&kernel_now, &host_tsc)) { /* * Don't fall back to get_kvmclock_ns() because it's * broken; it has a systemic error in its results * because it scales directly from host TSC to * nanoseconds, and doesn't scale first to guest TSC * and *then* to nanoseconds as the guest does. * * There is a small error introduced here because time * continues to elapse between the ktime_get() and the * subsequent rdtsc(). But not the systemic drift due * to get_kvmclock_ns(). */ kernel_now = ktime_get(); /* This is CLOCK_MONOTONIC */ host_tsc = rdtsc(); } /* Calculate the guest kvmclock as the guest would do it. */ guest_tsc = kvm_read_l1_tsc(vcpu, host_tsc); guest_now = __pvclock_read_cycles(&vcpu->arch.hv_clock, guest_tsc); } else { /* * Without CONSTANT_TSC, get_kvmclock_ns() is the only option. * * Also if the guest PV clock hasn't been set up yet, as is * likely to be the case during migration when the vCPU has * not been run yet. It would be possible to calculate the * scaling factors properly in that case but there's not much * point in doing so. The get_kvmclock_ns() drift accumulates * over time, so it's OK to use it at startup. Besides, on * migration there's going to be a little bit of skew in the * precise moment at which timers fire anyway. Often they'll * be in the "past" by the time the VM is running again after * migration. */ guest_now = get_kvmclock_ns(vcpu->kvm); kernel_now = ktime_get(); } delta = guest_abs - guest_now; /* * Xen has a 'Linux workaround' in do_set_timer_op() which checks for * negative absolute timeout values (caused by integer overflow), and * for values about 13 days in the future (2^50ns) which would be * caused by jiffies overflow. For those cases, Xen sets the timeout * 100ms in the future (not *too* soon, since if a guest really did * set a long timeout on purpose we don't want to keep churning CPU * time by waking it up). Emulate Xen's workaround when starting the * timer in response to __HYPERVISOR_set_timer_op. 
*/ if (linux_wa && unlikely((int64_t)guest_abs < 0 || (delta > 0 && (uint32_t) (delta >> 50) != 0))) { delta = 100 * NSEC_PER_MSEC; guest_abs = guest_now + delta; } /* * Avoid races with the old timer firing. Checking timer_expires * to avoid calling hrtimer_cancel() will only have false positives * so is fine. */ if (vcpu->arch.xen.timer_expires) hrtimer_cancel(&vcpu->arch.xen.timer); atomic_set(&vcpu->arch.xen.timer_pending, 0); vcpu->arch.xen.timer_expires = guest_abs; if (delta <= 0) xen_timer_callback(&vcpu->arch.xen.timer); else hrtimer_start(&vcpu->arch.xen.timer, ktime_add_ns(kernel_now, delta), HRTIMER_MODE_ABS_HARD); } static void kvm_xen_stop_timer(struct kvm_vcpu *vcpu) { hrtimer_cancel(&vcpu->arch.xen.timer); vcpu->arch.xen.timer_expires = 0; atomic_set(&vcpu->arch.xen.timer_pending, 0); } static void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, bool atomic) { struct kvm_vcpu_xen *vx = &v->arch.xen; struct gfn_to_pfn_cache *gpc1 = &vx->runstate_cache; struct gfn_to_pfn_cache *gpc2 = &vx->runstate2_cache; size_t user_len, user_len1, user_len2; struct vcpu_runstate_info rs; unsigned long flags; size_t times_ofs; uint8_t *update_bit = NULL; uint64_t entry_time; uint64_t *rs_times; int *rs_state; /* * The only difference between 32-bit and 64-bit versions of the * runstate struct is the alignment of uint64_t in 32-bit, which * means that the 64-bit version has an additional 4 bytes of * padding after the first field 'state'. Let's be really really * paranoid about that, and matching it with our internal data * structures that we memcpy into it... */ BUILD_BUG_ON(offsetof(struct vcpu_runstate_info, state) != 0); BUILD_BUG_ON(offsetof(struct compat_vcpu_runstate_info, state) != 0); BUILD_BUG_ON(sizeof(struct compat_vcpu_runstate_info) != 0x2c); #ifdef CONFIG_X86_64 /* * The 64-bit structure has 4 bytes of padding before 'state_entry_time' * so each subsequent field is shifted by 4, and it's 4 bytes longer. */ BUILD_BUG_ON(offsetof(struct vcpu_runstate_info, state_entry_time) != offsetof(struct compat_vcpu_runstate_info, state_entry_time) + 4); BUILD_BUG_ON(offsetof(struct vcpu_runstate_info, time) != offsetof(struct compat_vcpu_runstate_info, time) + 4); BUILD_BUG_ON(sizeof(struct vcpu_runstate_info) != 0x2c + 4); #endif /* * The state field is in the same place at the start of both structs, * and is the same size (int) as vx->current_runstate. */ BUILD_BUG_ON(offsetof(struct vcpu_runstate_info, state) != offsetof(struct compat_vcpu_runstate_info, state)); BUILD_BUG_ON(sizeof_field(struct vcpu_runstate_info, state) != sizeof(vx->current_runstate)); BUILD_BUG_ON(sizeof_field(struct compat_vcpu_runstate_info, state) != sizeof(vx->current_runstate)); /* * The state_entry_time field is 64 bits in both versions, and the * XEN_RUNSTATE_UPDATE flag is in the top bit, which given that x86 * is little-endian means that it's in the last *byte* of the word. * That detail is important later. */ BUILD_BUG_ON(sizeof_field(struct vcpu_runstate_info, state_entry_time) != sizeof(uint64_t)); BUILD_BUG_ON(sizeof_field(struct compat_vcpu_runstate_info, state_entry_time) != sizeof(uint64_t)); BUILD_BUG_ON((XEN_RUNSTATE_UPDATE >> 56) != 0x80); /* * The time array is four 64-bit quantities in both versions, matching * the vx->runstate_times and immediately following state_entry_time. 
*/ BUILD_BUG_ON(offsetof(struct vcpu_runstate_info, state_entry_time) != offsetof(struct vcpu_runstate_info, time) - sizeof(uint64_t)); BUILD_BUG_ON(offsetof(struct compat_vcpu_runstate_info, state_entry_time) != offsetof(struct compat_vcpu_runstate_info, time) - sizeof(uint64_t)); BUILD_BUG_ON(sizeof_field(struct vcpu_runstate_info, time) != sizeof_field(struct compat_vcpu_runstate_info, time)); BUILD_BUG_ON(sizeof_field(struct vcpu_runstate_info, time) != sizeof(vx->runstate_times)); if (IS_ENABLED(CONFIG_64BIT) && v->kvm->arch.xen.long_mode) { user_len = sizeof(struct vcpu_runstate_info); times_ofs = offsetof(struct vcpu_runstate_info, state_entry_time); } else { user_len = sizeof(struct compat_vcpu_runstate_info); times_ofs = offsetof(struct compat_vcpu_runstate_info, state_entry_time); } /* * There are basically no alignment constraints. The guest can set it * up so it crosses from one page to the next, and at arbitrary byte * alignment (and the 32-bit ABI doesn't align the 64-bit integers * anyway, even if the overall struct had been 64-bit aligned). */ if ((gpc1->gpa & ~PAGE_MASK) + user_len >= PAGE_SIZE) { user_len1 = PAGE_SIZE - (gpc1->gpa & ~PAGE_MASK); user_len2 = user_len - user_len1; } else { user_len1 = user_len; user_len2 = 0; } BUG_ON(user_len1 + user_len2 != user_len); retry: /* * Attempt to obtain the GPC lock on *both* (if there are two) * gfn_to_pfn caches that cover the region. */ if (atomic) { local_irq_save(flags); if (!read_trylock(&gpc1->lock)) { local_irq_restore(flags); return; } } else { read_lock_irqsave(&gpc1->lock, flags); } while (!kvm_gpc_check(gpc1, user_len1)) { read_unlock_irqrestore(&gpc1->lock, flags); /* When invoked from kvm_sched_out() we cannot sleep */ if (atomic) return; if (kvm_gpc_refresh(gpc1, user_len1)) return; read_lock_irqsave(&gpc1->lock, flags); } if (likely(!user_len2)) { /* * Set up three pointers directly to the runstate_info * struct in the guest (via the GPC). * * • @rs_state → state field * • @rs_times → state_entry_time field. * • @update_bit → last byte of state_entry_time, which * contains the XEN_RUNSTATE_UPDATE bit. */ rs_state = gpc1->khva; rs_times = gpc1->khva + times_ofs; if (v->kvm->arch.xen.runstate_update_flag) update_bit = ((void *)(&rs_times[1])) - 1; } else { /* * The guest's runstate_info is split across two pages and we * need to hold and validate both GPCs simultaneously. We can * declare a lock ordering GPC1 > GPC2 because nothing else * takes them more than one at a time. Set a subclass on the * gpc1 lock to make lockdep shut up about it. */ lock_set_subclass(&gpc1->lock.dep_map, 1, _THIS_IP_); if (atomic) { if (!read_trylock(&gpc2->lock)) { read_unlock_irqrestore(&gpc1->lock, flags); return; } } else { read_lock(&gpc2->lock); } if (!kvm_gpc_check(gpc2, user_len2)) { read_unlock(&gpc2->lock); read_unlock_irqrestore(&gpc1->lock, flags); /* When invoked from kvm_sched_out() we cannot sleep */ if (atomic) return; /* * Use kvm_gpc_activate() here because if the runstate * area was configured in 32-bit mode and only extends * to the second page now because the guest changed to * 64-bit mode, the second GPC won't have been set up. */ if (kvm_gpc_activate(gpc2, gpc1->gpa + user_len1, user_len2)) return; /* * We dropped the lock on GPC1 so we have to go all the * way back and revalidate that too. */ goto retry; } /* * In this case, the runstate_info struct will be assembled on * the kernel stack (compat or not as appropriate) and will * be copied to GPC1/GPC2 with a dual memcpy. 
Set up the three * rs pointers accordingly. */ rs_times = &rs.state_entry_time; /* * The rs_state pointer points to the start of what we'll * copy to the guest, which in the case of a compat guest * is the 32-bit field that the compiler thinks is padding. */ rs_state = ((void *)rs_times) - times_ofs; /* * The update_bit is still directly in the guest memory, * via one GPC or the other. */ if (v->kvm->arch.xen.runstate_update_flag) { if (user_len1 >= times_ofs + sizeof(uint64_t)) update_bit = gpc1->khva + times_ofs + sizeof(uint64_t) - 1; else update_bit = gpc2->khva + times_ofs + sizeof(uint64_t) - 1 - user_len1; } #ifdef CONFIG_X86_64 /* * Don't leak kernel memory through the padding in the 64-bit * version of the struct. */ memset(&rs, 0, offsetof(struct vcpu_runstate_info, state_entry_time)); #endif } /* * First, set the XEN_RUNSTATE_UPDATE bit in the top bit of the * state_entry_time field, directly in the guest. We need to set * that (and write-barrier) before writing to the rest of the * structure, and clear it last. Just as Xen does, we address the * single *byte* in which it resides because it might be in a * different cache line to the rest of the 64-bit word, due to * the (lack of) alignment constraints. */ entry_time = vx->runstate_entry_time; if (update_bit) { entry_time |= XEN_RUNSTATE_UPDATE; *update_bit = (vx->runstate_entry_time | XEN_RUNSTATE_UPDATE) >> 56; smp_wmb(); } /* * Now assemble the actual structure, either on our kernel stack * or directly in the guest according to how the rs_state and * rs_times pointers were set up above. */ *rs_state = vx->current_runstate; rs_times[0] = entry_time; memcpy(rs_times + 1, vx->runstate_times, sizeof(vx->runstate_times)); /* For the split case, we have to then copy it to the guest. */ if (user_len2) { memcpy(gpc1->khva, rs_state, user_len1); memcpy(gpc2->khva, ((void *)rs_state) + user_len1, user_len2); } smp_wmb(); /* Finally, clear the XEN_RUNSTATE_UPDATE bit. */ if (update_bit) { entry_time &= ~XEN_RUNSTATE_UPDATE; *update_bit = entry_time >> 56; smp_wmb(); } if (user_len2) { kvm_gpc_mark_dirty_in_slot(gpc2); read_unlock(&gpc2->lock); } kvm_gpc_mark_dirty_in_slot(gpc1); read_unlock_irqrestore(&gpc1->lock, flags); } void kvm_xen_update_runstate(struct kvm_vcpu *v, int state) { struct kvm_vcpu_xen *vx = &v->arch.xen; u64 now = get_kvmclock_ns(v->kvm); u64 delta_ns = now - vx->runstate_entry_time; u64 run_delay = current->sched_info.run_delay; if (unlikely(!vx->runstate_entry_time)) vx->current_runstate = RUNSTATE_offline; /* * Time waiting for the scheduler isn't "stolen" if the * vCPU wasn't running anyway. */ if (vx->current_runstate == RUNSTATE_running) { u64 steal_ns = run_delay - vx->last_steal; delta_ns -= steal_ns; vx->runstate_times[RUNSTATE_runnable] += steal_ns; } vx->last_steal = run_delay; vx->runstate_times[vx->current_runstate] += delta_ns; vx->current_runstate = state; vx->runstate_entry_time = now; if (vx->runstate_cache.active) kvm_xen_update_runstate_guest(v, state == RUNSTATE_runnable); } void kvm_xen_inject_vcpu_vector(struct kvm_vcpu *v) { struct kvm_lapic_irq irq = { }; irq.dest_id = v->vcpu_id; irq.vector = v->arch.xen.upcall_vector; irq.dest_mode = APIC_DEST_PHYSICAL; irq.shorthand = APIC_DEST_NOSHORT; irq.delivery_mode = APIC_DM_FIXED; irq.level = 1; kvm_irq_delivery_to_apic(v->kvm, NULL, &irq, NULL); } /* * On event channel delivery, the vcpu_info may not have been accessible. 
* In that case, there are bits in vcpu->arch.xen.evtchn_pending_sel which * need to be marked into the vcpu_info (and evtchn_upcall_pending set). * Do so now that we can sleep in the context of the vCPU to bring the * page in, and refresh the pfn cache for it. */ void kvm_xen_inject_pending_events(struct kvm_vcpu *v) { unsigned long evtchn_pending_sel = READ_ONCE(v->arch.xen.evtchn_pending_sel); struct gfn_to_pfn_cache *gpc = &v->arch.xen.vcpu_info_cache; unsigned long flags; if (!evtchn_pending_sel) return; /* * Yes, this is an open-coded loop. But that's just what put_user() * does anyway. Page it in and retry the instruction. We're just a * little more honest about it. */ read_lock_irqsave(&gpc->lock, flags); while (!kvm_gpc_check(gpc, sizeof(struct vcpu_info))) { read_unlock_irqrestore(&gpc->lock, flags); if (kvm_gpc_refresh(gpc, sizeof(struct vcpu_info))) return; read_lock_irqsave(&gpc->lock, flags); } /* Now gpc->khva is a valid kernel address for the vcpu_info */ if (IS_ENABLED(CONFIG_64BIT) && v->kvm->arch.xen.long_mode) { struct vcpu_info *vi = gpc->khva; asm volatile(LOCK_PREFIX "orq %0, %1\n" "notq %0\n" LOCK_PREFIX "andq %0, %2\n" : "=r" (evtchn_pending_sel), "+m" (vi->evtchn_pending_sel), "+m" (v->arch.xen.evtchn_pending_sel) : "0" (evtchn_pending_sel)); WRITE_ONCE(vi->evtchn_upcall_pending, 1); } else { u32 evtchn_pending_sel32 = evtchn_pending_sel; struct compat_vcpu_info *vi = gpc->khva; asm volatile(LOCK_PREFIX "orl %0, %1\n" "notl %0\n" LOCK_PREFIX "andl %0, %2\n" : "=r" (evtchn_pending_sel32), "+m" (vi->evtchn_pending_sel), "+m" (v->arch.xen.evtchn_pending_sel) : "0" (evtchn_pending_sel32)); WRITE_ONCE(vi->evtchn_upcall_pending, 1); } kvm_gpc_mark_dirty_in_slot(gpc); read_unlock_irqrestore(&gpc->lock, flags); /* For the per-vCPU lapic vector, deliver it as MSI. */ if (v->arch.xen.upcall_vector) kvm_xen_inject_vcpu_vector(v); } int __kvm_xen_has_interrupt(struct kvm_vcpu *v) { struct gfn_to_pfn_cache *gpc = &v->arch.xen.vcpu_info_cache; unsigned long flags; u8 rc = 0; /* * If the global upcall vector (HVMIRQ_callback_vector) is set and * the vCPU's evtchn_upcall_pending flag is set, the IRQ is pending. */ /* No need for compat handling here */ BUILD_BUG_ON(offsetof(struct vcpu_info, evtchn_upcall_pending) != offsetof(struct compat_vcpu_info, evtchn_upcall_pending)); BUILD_BUG_ON(sizeof(rc) != sizeof_field(struct vcpu_info, evtchn_upcall_pending)); BUILD_BUG_ON(sizeof(rc) != sizeof_field(struct compat_vcpu_info, evtchn_upcall_pending)); read_lock_irqsave(&gpc->lock, flags); while (!kvm_gpc_check(gpc, sizeof(struct vcpu_info))) { read_unlock_irqrestore(&gpc->lock, flags); /* * This function gets called from kvm_vcpu_block() after setting the * task to TASK_INTERRUPTIBLE, to see if it needs to wake immediately * from a HLT. So we really mustn't sleep. If the page ended up absent * at that point, just return 1 in order to trigger an immediate wake, * and we'll end up getting called again from a context where we *can* * fault in the page and wait for it. */ if (in_atomic() || !task_is_running(current)) return 1; if (kvm_gpc_refresh(gpc, sizeof(struct vcpu_info))) { /* * If this failed, userspace has screwed up the * vcpu_info mapping. No interrupts for you. 
*/ return 0; } read_lock_irqsave(&gpc->lock, flags); } rc = ((struct vcpu_info *)gpc->khva)->evtchn_upcall_pending; read_unlock_irqrestore(&gpc->lock, flags); return rc; } int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data) { int r = -ENOENT; switch (data->type) { case KVM_XEN_ATTR_TYPE_LONG_MODE: if (!IS_ENABLED(CONFIG_64BIT) && data->u.long_mode) { r = -EINVAL; } else { mutex_lock(&kvm->arch.xen.xen_lock); kvm->arch.xen.long_mode = !!data->u.long_mode; /* * Re-initialize shared_info to put the wallclock in the * correct place. Whilst it's not necessary to do this * unless the mode is actually changed, it does no harm * to make the call anyway. */ r = kvm->arch.xen.shinfo_cache.active ? kvm_xen_shared_info_init(kvm) : 0; mutex_unlock(&kvm->arch.xen.xen_lock); } break; case KVM_XEN_ATTR_TYPE_SHARED_INFO: case KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA: { int idx; mutex_lock(&kvm->arch.xen.xen_lock); idx = srcu_read_lock(&kvm->srcu); if (data->type == KVM_XEN_ATTR_TYPE_SHARED_INFO) { gfn_t gfn = data->u.shared_info.gfn; if (gfn == KVM_XEN_INVALID_GFN) { kvm_gpc_deactivate(&kvm->arch.xen.shinfo_cache); r = 0; } else { r = kvm_gpc_activate(&kvm->arch.xen.shinfo_cache, gfn_to_gpa(gfn), PAGE_SIZE); } } else { void __user * hva = u64_to_user_ptr(data->u.shared_info.hva); if (!PAGE_ALIGNED(hva)) { r = -EINVAL; } else if (!hva) { kvm_gpc_deactivate(&kvm->arch.xen.shinfo_cache); r = 0; } else { r = kvm_gpc_activate_hva(&kvm->arch.xen.shinfo_cache, (unsigned long)hva, PAGE_SIZE); } } srcu_read_unlock(&kvm->srcu, idx); if (!r && kvm->arch.xen.shinfo_cache.active) r = kvm_xen_shared_info_init(kvm); mutex_unlock(&kvm->arch.xen.xen_lock); break; } case KVM_XEN_ATTR_TYPE_UPCALL_VECTOR: if (data->u.vector && data->u.vector < 0x10) r = -EINVAL; else { mutex_lock(&kvm->arch.xen.xen_lock); kvm->arch.xen.upcall_vector = data->u.vector; mutex_unlock(&kvm->arch.xen.xen_lock); r = 0; } break; case KVM_XEN_ATTR_TYPE_EVTCHN: r = kvm_xen_setattr_evtchn(kvm, data); break; case KVM_XEN_ATTR_TYPE_XEN_VERSION: mutex_lock(&kvm->arch.xen.xen_lock); kvm->arch.xen.xen_version = data->u.xen_version; mutex_unlock(&kvm->arch.xen.xen_lock); r = 0; break; case KVM_XEN_ATTR_TYPE_RUNSTATE_UPDATE_FLAG: if (!sched_info_on()) { r = -EOPNOTSUPP; break; } mutex_lock(&kvm->arch.xen.xen_lock); kvm->arch.xen.runstate_update_flag = !!data->u.runstate_update_flag; mutex_unlock(&kvm->arch.xen.xen_lock); r = 0; break; default: break; } return r; } int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data) { int r = -ENOENT; mutex_lock(&kvm->arch.xen.xen_lock); switch (data->type) { case KVM_XEN_ATTR_TYPE_LONG_MODE: data->u.long_mode = kvm->arch.xen.long_mode; r = 0; break; case KVM_XEN_ATTR_TYPE_SHARED_INFO: if (kvm_gpc_is_gpa_active(&kvm->arch.xen.shinfo_cache)) data->u.shared_info.gfn = gpa_to_gfn(kvm->arch.xen.shinfo_cache.gpa); else data->u.shared_info.gfn = KVM_XEN_INVALID_GFN; r = 0; break; case KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA: if (kvm_gpc_is_hva_active(&kvm->arch.xen.shinfo_cache)) data->u.shared_info.hva = kvm->arch.xen.shinfo_cache.uhva; else data->u.shared_info.hva = 0; r = 0; break; case KVM_XEN_ATTR_TYPE_UPCALL_VECTOR: data->u.vector = kvm->arch.xen.upcall_vector; r = 0; break; case KVM_XEN_ATTR_TYPE_XEN_VERSION: data->u.xen_version = kvm->arch.xen.xen_version; r = 0; break; case KVM_XEN_ATTR_TYPE_RUNSTATE_UPDATE_FLAG: if (!sched_info_on()) { r = -EOPNOTSUPP; break; } data->u.runstate_update_flag = kvm->arch.xen.runstate_update_flag; r = 0; break; default: break; } 
mutex_unlock(&kvm->arch.xen.xen_lock); return r; } int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data) { int idx, r = -ENOENT; mutex_lock(&vcpu->kvm->arch.xen.xen_lock); idx = srcu_read_lock(&vcpu->kvm->srcu); switch (data->type) { case KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO: case KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO_HVA: /* No compat necessary here. */ BUILD_BUG_ON(sizeof(struct vcpu_info) != sizeof(struct compat_vcpu_info)); BUILD_BUG_ON(offsetof(struct vcpu_info, time) != offsetof(struct compat_vcpu_info, time)); if (data->type == KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO) { if (data->u.gpa == KVM_XEN_INVALID_GPA) { kvm_gpc_deactivate(&vcpu->arch.xen.vcpu_info_cache); r = 0; break; } r = kvm_gpc_activate(&vcpu->arch.xen.vcpu_info_cache, data->u.gpa, sizeof(struct vcpu_info)); } else { if (data->u.hva == 0) { kvm_gpc_deactivate(&vcpu->arch.xen.vcpu_info_cache); r = 0; break; } r = kvm_gpc_activate_hva(&vcpu->arch.xen.vcpu_info_cache, data->u.hva, sizeof(struct vcpu_info)); } if (!r) kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); break; case KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO: if (data->u.gpa == KVM_XEN_INVALID_GPA) { kvm_gpc_deactivate(&vcpu->arch.xen.vcpu_time_info_cache); r = 0; break; } r = kvm_gpc_activate(&vcpu->arch.xen.vcpu_time_info_cache, data->u.gpa, sizeof(struct pvclock_vcpu_time_info)); if (!r) kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); break; case KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADDR: { size_t sz, sz1, sz2; if (!sched_info_on()) { r = -EOPNOTSUPP; break; } if (data->u.gpa == KVM_XEN_INVALID_GPA) { r = 0; deactivate_out: kvm_gpc_deactivate(&vcpu->arch.xen.runstate_cache); kvm_gpc_deactivate(&vcpu->arch.xen.runstate2_cache); break; } /* * If the guest switches to 64-bit mode after setting the runstate * address, that's actually OK. kvm_xen_update_runstate_guest() * will cope. */ if (IS_ENABLED(CONFIG_64BIT) && vcpu->kvm->arch.xen.long_mode) sz = sizeof(struct vcpu_runstate_info); else sz = sizeof(struct compat_vcpu_runstate_info); /* How much fits in the (first) page? 
*/ sz1 = PAGE_SIZE - (data->u.gpa & ~PAGE_MASK); r = kvm_gpc_activate(&vcpu->arch.xen.runstate_cache, data->u.gpa, sz1); if (r) goto deactivate_out; /* Either map the second page, or deactivate the second GPC */ if (sz1 >= sz) { kvm_gpc_deactivate(&vcpu->arch.xen.runstate2_cache); } else { sz2 = sz - sz1; BUG_ON((data->u.gpa + sz1) & ~PAGE_MASK); r = kvm_gpc_activate(&vcpu->arch.xen.runstate2_cache, data->u.gpa + sz1, sz2); if (r) goto deactivate_out; } kvm_xen_update_runstate_guest(vcpu, false); break; } case KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_CURRENT: if (!sched_info_on()) { r = -EOPNOTSUPP; break; } if (data->u.runstate.state > RUNSTATE_offline) { r = -EINVAL; break; } kvm_xen_update_runstate(vcpu, data->u.runstate.state); r = 0; break; case KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_DATA: if (!sched_info_on()) { r = -EOPNOTSUPP; break; } if (data->u.runstate.state > RUNSTATE_offline) { r = -EINVAL; break; } if (data->u.runstate.state_entry_time != (data->u.runstate.time_running + data->u.runstate.time_runnable + data->u.runstate.time_blocked + data->u.runstate.time_offline)) { r = -EINVAL; break; } if (get_kvmclock_ns(vcpu->kvm) < data->u.runstate.state_entry_time) { r = -EINVAL; break; } vcpu->arch.xen.current_runstate = data->u.runstate.state; vcpu->arch.xen.runstate_entry_time = data->u.runstate.state_entry_time; vcpu->arch.xen.runstate_times[RUNSTATE_running] = data->u.runstate.time_running; vcpu->arch.xen.runstate_times[RUNSTATE_runnable] = data->u.runstate.time_runnable; vcpu->arch.xen.runstate_times[RUNSTATE_blocked] = data->u.runstate.time_blocked; vcpu->arch.xen.runstate_times[RUNSTATE_offline] = data->u.runstate.time_offline; vcpu->arch.xen.last_steal = current->sched_info.run_delay; r = 0; break; case KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST: if (!sched_info_on()) { r = -EOPNOTSUPP; break; } if (data->u.runstate.state > RUNSTATE_offline && data->u.runstate.state != (u64)-1) { r = -EINVAL; break; } /* The adjustment must add up */ if (data->u.runstate.state_entry_time != (data->u.runstate.time_running + data->u.runstate.time_runnable + data->u.runstate.time_blocked + data->u.runstate.time_offline)) { r = -EINVAL; break; } if (get_kvmclock_ns(vcpu->kvm) < (vcpu->arch.xen.runstate_entry_time + data->u.runstate.state_entry_time)) { r = -EINVAL; break; } vcpu->arch.xen.runstate_entry_time += data->u.runstate.state_entry_time; vcpu->arch.xen.runstate_times[RUNSTATE_running] += data->u.runstate.time_running; vcpu->arch.xen.runstate_times[RUNSTATE_runnable] += data->u.runstate.time_runnable; vcpu->arch.xen.runstate_times[RUNSTATE_blocked] += data->u.runstate.time_blocked; vcpu->arch.xen.runstate_times[RUNSTATE_offline] += data->u.runstate.time_offline; if (data->u.runstate.state <= RUNSTATE_offline) kvm_xen_update_runstate(vcpu, data->u.runstate.state); else if (vcpu->arch.xen.runstate_cache.active) kvm_xen_update_runstate_guest(vcpu, false); r = 0; break; case KVM_XEN_VCPU_ATTR_TYPE_VCPU_ID: if (data->u.vcpu_id >= KVM_MAX_VCPUS) r = -EINVAL; else { vcpu->arch.xen.vcpu_id = data->u.vcpu_id; r = 0; } break; case KVM_XEN_VCPU_ATTR_TYPE_TIMER: if (data->u.timer.port && data->u.timer.priority != KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL) { r = -EINVAL; break; } /* Stop the timer (if it's running) before changing the vector */ kvm_xen_stop_timer(vcpu); vcpu->arch.xen.timer_virq = data->u.timer.port; /* Start the timer if the new value has a valid vector+expiry. 
*/ if (data->u.timer.port && data->u.timer.expires_ns) kvm_xen_start_timer(vcpu, data->u.timer.expires_ns, false); r = 0; break; case KVM_XEN_VCPU_ATTR_TYPE_UPCALL_VECTOR: if (data->u.vector && data->u.vector < 0x10) r = -EINVAL; else { vcpu->arch.xen.upcall_vector = data->u.vector; r = 0; } break; default: break; } srcu_read_unlock(&vcpu->kvm->srcu, idx); mutex_unlock(&vcpu->kvm->arch.xen.xen_lock); return r; } int kvm_xen_vcpu_get_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data) { int r = -ENOENT; mutex_lock(&vcpu->kvm->arch.xen.xen_lock); switch (data->type) { case KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO: if (kvm_gpc_is_gpa_active(&vcpu->arch.xen.vcpu_info_cache)) data->u.gpa = vcpu->arch.xen.vcpu_info_cache.gpa; else data->u.gpa = KVM_XEN_INVALID_GPA; r = 0; break; case KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO_HVA: if (kvm_gpc_is_hva_active(&vcpu->arch.xen.vcpu_info_cache)) data->u.hva = vcpu->arch.xen.vcpu_info_cache.uhva; else data->u.hva = 0; r = 0; break; case KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO: if (vcpu->arch.xen.vcpu_time_info_cache.active) data->u.gpa = vcpu->arch.xen.vcpu_time_info_cache.gpa; else data->u.gpa = KVM_XEN_INVALID_GPA; r = 0; break; case KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADDR: if (!sched_info_on()) { r = -EOPNOTSUPP; break; } if (vcpu->arch.xen.runstate_cache.active) { data->u.gpa = vcpu->arch.xen.runstate_cache.gpa; r = 0; } break; case KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_CURRENT: if (!sched_info_on()) { r = -EOPNOTSUPP; break; } data->u.runstate.state = vcpu->arch.xen.current_runstate; r = 0; break; case KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_DATA: if (!sched_info_on()) { r = -EOPNOTSUPP; break; } data->u.runstate.state = vcpu->arch.xen.current_runstate; data->u.runstate.state_entry_time = vcpu->arch.xen.runstate_entry_time; data->u.runstate.time_running = vcpu->arch.xen.runstate_times[RUNSTATE_running]; data->u.runstate.time_runnable = vcpu->arch.xen.runstate_times[RUNSTATE_runnable]; data->u.runstate.time_blocked = vcpu->arch.xen.runstate_times[RUNSTATE_blocked]; data->u.runstate.time_offline = vcpu->arch.xen.runstate_times[RUNSTATE_offline]; r = 0; break; case KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST: r = -EINVAL; break; case KVM_XEN_VCPU_ATTR_TYPE_VCPU_ID: data->u.vcpu_id = vcpu->arch.xen.vcpu_id; r = 0; break; case KVM_XEN_VCPU_ATTR_TYPE_TIMER: /* * Ensure a consistent snapshot of state is captured, with a * timer either being pending, or the event channel delivered * to the corresponding bit in the shared_info. Not still * lurking in the timer_pending flag for deferred delivery. * Purely as an optimisation, if the timer_expires field is * zero, that means the timer isn't active (or even in the * timer_pending flag) and there is no need to cancel it. */ if (vcpu->arch.xen.timer_expires) { hrtimer_cancel(&vcpu->arch.xen.timer); kvm_xen_inject_timer_irqs(vcpu); } data->u.timer.port = vcpu->arch.xen.timer_virq; data->u.timer.priority = KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL; data->u.timer.expires_ns = vcpu->arch.xen.timer_expires; /* * The hrtimer may trigger and raise the IRQ immediately, * while the returned state causes it to be set up and * raised again on the destination system after migration. * That's fine, as the guest won't even have had a chance * to run and handle the interrupt. Asserting an already * pending event channel is idempotent. 
*/ if (vcpu->arch.xen.timer_expires) hrtimer_start_expires(&vcpu->arch.xen.timer, HRTIMER_MODE_ABS_HARD); r = 0; break; case KVM_XEN_VCPU_ATTR_TYPE_UPCALL_VECTOR: data->u.vector = vcpu->arch.xen.upcall_vector; r = 0; break; default: break; } mutex_unlock(&vcpu->kvm->arch.xen.xen_lock); return r; } int kvm_xen_write_hypercall_page(struct kvm_vcpu *vcpu, u64 data) { struct kvm *kvm = vcpu->kvm; u32 page_num = data & ~PAGE_MASK; u64 page_addr = data & PAGE_MASK; bool lm = is_long_mode(vcpu); int r = 0; mutex_lock(&kvm->arch.xen.xen_lock); if (kvm->arch.xen.long_mode != lm) { kvm->arch.xen.long_mode = lm; /* * Re-initialize shared_info to put the wallclock in the * correct place. */ if (kvm->arch.xen.shinfo_cache.active && kvm_xen_shared_info_init(kvm)) r = 1; } mutex_unlock(&kvm->arch.xen.xen_lock); if (r) return r; /* * If Xen hypercall intercept is enabled, fill the hypercall * page with VMCALL/VMMCALL instructions since that's what * we catch. Else the VMM has provided the hypercall pages * with instructions of its own choosing, so use those. */ if (kvm_xen_hypercall_enabled(kvm)) { u8 instructions[32]; int i; if (page_num) return 1; /* mov imm32, %eax */ instructions[0] = 0xb8; /* vmcall / vmmcall */ kvm_x86_call(patch_hypercall)(vcpu, instructions + 5); /* ret */ instructions[8] = 0xc3; /* int3 to pad */ memset(instructions + 9, 0xcc, sizeof(instructions) - 9); for (i = 0; i < PAGE_SIZE / sizeof(instructions); i++) { *(u32 *)&instructions[1] = i; if (kvm_vcpu_write_guest(vcpu, page_addr + (i * sizeof(instructions)), instructions, sizeof(instructions))) return 1; } } else { /* * Note, truncation is a non-issue as 'lm' is guaranteed to be * false for a 32-bit kernel, i.e. when hva_t is only 4 bytes. */ hva_t blob_addr = lm ? kvm->arch.xen_hvm_config.blob_addr_64 : kvm->arch.xen_hvm_config.blob_addr_32; u8 blob_size = lm ? kvm->arch.xen_hvm_config.blob_size_64 : kvm->arch.xen_hvm_config.blob_size_32; u8 *page; int ret; if (page_num >= blob_size) return 1; blob_addr += page_num * PAGE_SIZE; page = memdup_user((u8 __user *)blob_addr, PAGE_SIZE); if (IS_ERR(page)) return PTR_ERR(page); ret = kvm_vcpu_write_guest(vcpu, page_addr, page, PAGE_SIZE); kfree(page); if (ret) return 1; } return 0; } int kvm_xen_hvm_config(struct kvm *kvm, struct kvm_xen_hvm_config *xhc) { /* Only some feature flags need to be *enabled* by userspace */ u32 permitted_flags = KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL | KVM_XEN_HVM_CONFIG_EVTCHN_SEND | KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE; u32 old_flags; if (xhc->flags & ~permitted_flags) return -EINVAL; /* * With hypercall interception the kernel generates its own * hypercall page so it must not be provided. 
*/ if ((xhc->flags & KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL) && (xhc->blob_addr_32 || xhc->blob_addr_64 || xhc->blob_size_32 || xhc->blob_size_64)) return -EINVAL; mutex_lock(&kvm->arch.xen.xen_lock); if (xhc->msr && !kvm->arch.xen_hvm_config.msr) static_branch_inc(&kvm_xen_enabled.key); else if (!xhc->msr && kvm->arch.xen_hvm_config.msr) static_branch_slow_dec_deferred(&kvm_xen_enabled); old_flags = kvm->arch.xen_hvm_config.flags; memcpy(&kvm->arch.xen_hvm_config, xhc, sizeof(*xhc)); mutex_unlock(&kvm->arch.xen.xen_lock); if ((old_flags ^ xhc->flags) & KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE) kvm_make_all_cpus_request(kvm, KVM_REQ_CLOCK_UPDATE); return 0; } static int kvm_xen_hypercall_set_result(struct kvm_vcpu *vcpu, u64 result) { kvm_rax_write(vcpu, result); return kvm_skip_emulated_instruction(vcpu); } static int kvm_xen_hypercall_complete_userspace(struct kvm_vcpu *vcpu) { struct kvm_run *run = vcpu->run; if (unlikely(!kvm_is_linear_rip(vcpu, vcpu->arch.xen.hypercall_rip))) return 1; return kvm_xen_hypercall_set_result(vcpu, run->xen.u.hcall.result); } static inline int max_evtchn_port(struct kvm *kvm) { if (IS_ENABLED(CONFIG_64BIT) && kvm->arch.xen.long_mode) return EVTCHN_2L_NR_CHANNELS; else return COMPAT_EVTCHN_2L_NR_CHANNELS; } static bool wait_pending_event(struct kvm_vcpu *vcpu, int nr_ports, evtchn_port_t *ports) { struct kvm *kvm = vcpu->kvm; struct gfn_to_pfn_cache *gpc = &kvm->arch.xen.shinfo_cache; unsigned long *pending_bits; unsigned long flags; bool ret = true; int idx, i; idx = srcu_read_lock(&kvm->srcu); read_lock_irqsave(&gpc->lock, flags); if (!kvm_gpc_check(gpc, PAGE_SIZE)) goto out_rcu; ret = false; if (IS_ENABLED(CONFIG_64BIT) && kvm->arch.xen.long_mode) { struct shared_info *shinfo = gpc->khva; pending_bits = (unsigned long *)&shinfo->evtchn_pending; } else { struct compat_shared_info *shinfo = gpc->khva; pending_bits = (unsigned long *)&shinfo->evtchn_pending; } for (i = 0; i < nr_ports; i++) { if (test_bit(ports[i], pending_bits)) { ret = true; break; } } out_rcu: read_unlock_irqrestore(&gpc->lock, flags); srcu_read_unlock(&kvm->srcu, idx); return ret; } static bool kvm_xen_schedop_poll(struct kvm_vcpu *vcpu, bool longmode, u64 param, u64 *r) { struct sched_poll sched_poll; evtchn_port_t port, *ports; struct x86_exception e; int i; if (!lapic_in_kernel(vcpu) || !(vcpu->kvm->arch.xen_hvm_config.flags & KVM_XEN_HVM_CONFIG_EVTCHN_SEND)) return false; if (IS_ENABLED(CONFIG_64BIT) && !longmode) { struct compat_sched_poll sp32; /* Sanity check that the compat struct definition is correct */ BUILD_BUG_ON(sizeof(sp32) != 16); if (kvm_read_guest_virt(vcpu, param, &sp32, sizeof(sp32), &e)) { *r = -EFAULT; return true; } /* * This is a 32-bit pointer to an array of evtchn_port_t which * are uint32_t, so once it's converted no further compat * handling is needed. 
*/ sched_poll.ports = (void *)(unsigned long)(sp32.ports); sched_poll.nr_ports = sp32.nr_ports; sched_poll.timeout = sp32.timeout; } else { if (kvm_read_guest_virt(vcpu, param, &sched_poll, sizeof(sched_poll), &e)) { *r = -EFAULT; return true; } } if (unlikely(sched_poll.nr_ports > 1)) { /* Xen (unofficially) limits number of pollers to 128 */ if (sched_poll.nr_ports > 128) { *r = -EINVAL; return true; } ports = kmalloc_array(sched_poll.nr_ports, sizeof(*ports), GFP_KERNEL); if (!ports) { *r = -ENOMEM; return true; } } else ports = &port; if (kvm_read_guest_virt(vcpu, (gva_t)sched_poll.ports, ports, sched_poll.nr_ports * sizeof(*ports), &e)) { *r = -EFAULT; return true; } for (i = 0; i < sched_poll.nr_ports; i++) { if (ports[i] >= max_evtchn_port(vcpu->kvm)) { *r = -EINVAL; goto out; } } if (sched_poll.nr_ports == 1) vcpu->arch.xen.poll_evtchn = port; else vcpu->arch.xen.poll_evtchn = -1; set_bit(vcpu->vcpu_idx, vcpu->kvm->arch.xen.poll_mask); if (!wait_pending_event(vcpu, sched_poll.nr_ports, ports)) { vcpu->arch.mp_state = KVM_MP_STATE_HALTED; if (sched_poll.timeout) mod_timer(&vcpu->arch.xen.poll_timer, jiffies + nsecs_to_jiffies(sched_poll.timeout)); kvm_vcpu_halt(vcpu); if (sched_poll.timeout) del_timer(&vcpu->arch.xen.poll_timer); vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; } vcpu->arch.xen.poll_evtchn = 0; *r = 0; out: /* Really, this is only needed in case of timeout */ clear_bit(vcpu->vcpu_idx, vcpu->kvm->arch.xen.poll_mask); if (unlikely(sched_poll.nr_ports > 1)) kfree(ports); return true; } static void cancel_evtchn_poll(struct timer_list *t) { struct kvm_vcpu *vcpu = from_timer(vcpu, t, arch.xen.poll_timer); kvm_make_request(KVM_REQ_UNBLOCK, vcpu); kvm_vcpu_kick(vcpu); } static bool kvm_xen_hcall_sched_op(struct kvm_vcpu *vcpu, bool longmode, int cmd, u64 param, u64 *r) { switch (cmd) { case SCHEDOP_poll: if (kvm_xen_schedop_poll(vcpu, longmode, param, r)) return true; fallthrough; case SCHEDOP_yield: kvm_vcpu_on_spin(vcpu, true); *r = 0; return true; default: break; } return false; } struct compat_vcpu_set_singleshot_timer { uint64_t timeout_abs_ns; uint32_t flags; } __attribute__((packed)); static bool kvm_xen_hcall_vcpu_op(struct kvm_vcpu *vcpu, bool longmode, int cmd, int vcpu_id, u64 param, u64 *r) { struct vcpu_set_singleshot_timer oneshot; struct x86_exception e; if (!kvm_xen_timer_enabled(vcpu)) return false; switch (cmd) { case VCPUOP_set_singleshot_timer: if (vcpu->arch.xen.vcpu_id != vcpu_id) { *r = -EINVAL; return true; } /* * The only difference for 32-bit compat is the 4 bytes of * padding after the interesting part of the structure. So * for a faithful emulation of Xen we have to *try* to copy * the padding and return -EFAULT if we can't. Otherwise we * might as well just have copied the 12-byte 32-bit struct. */ BUILD_BUG_ON(offsetof(struct compat_vcpu_set_singleshot_timer, timeout_abs_ns) != offsetof(struct vcpu_set_singleshot_timer, timeout_abs_ns)); BUILD_BUG_ON(sizeof_field(struct compat_vcpu_set_singleshot_timer, timeout_abs_ns) != sizeof_field(struct vcpu_set_singleshot_timer, timeout_abs_ns)); BUILD_BUG_ON(offsetof(struct compat_vcpu_set_singleshot_timer, flags) != offsetof(struct vcpu_set_singleshot_timer, flags)); BUILD_BUG_ON(sizeof_field(struct compat_vcpu_set_singleshot_timer, flags) != sizeof_field(struct vcpu_set_singleshot_timer, flags)); if (kvm_read_guest_virt(vcpu, param, &oneshot, longmode ? 
sizeof(oneshot) : sizeof(struct compat_vcpu_set_singleshot_timer), &e)) { *r = -EFAULT; return true; } kvm_xen_start_timer(vcpu, oneshot.timeout_abs_ns, false); *r = 0; return true; case VCPUOP_stop_singleshot_timer: if (vcpu->arch.xen.vcpu_id != vcpu_id) { *r = -EINVAL; return true; } kvm_xen_stop_timer(vcpu); *r = 0; return true; } return false; } static bool kvm_xen_hcall_set_timer_op(struct kvm_vcpu *vcpu, uint64_t timeout, u64 *r) { if (!kvm_xen_timer_enabled(vcpu)) return false; if (timeout) kvm_xen_start_timer(vcpu, timeout, true); else kvm_xen_stop_timer(vcpu); *r = 0; return true; } int kvm_xen_hypercall(struct kvm_vcpu *vcpu) { bool longmode; u64 input, params[6], r = -ENOSYS; bool handled = false; u8 cpl; input = (u64)kvm_register_read(vcpu, VCPU_REGS_RAX); /* Hyper-V hypercalls get bit 31 set in EAX */ if ((input & 0x80000000) && kvm_hv_hypercall_enabled(vcpu)) return kvm_hv_hypercall(vcpu); longmode = is_64_bit_hypercall(vcpu); if (!longmode) { params[0] = (u32)kvm_rbx_read(vcpu); params[1] = (u32)kvm_rcx_read(vcpu); params[2] = (u32)kvm_rdx_read(vcpu); params[3] = (u32)kvm_rsi_read(vcpu); params[4] = (u32)kvm_rdi_read(vcpu); params[5] = (u32)kvm_rbp_read(vcpu); } #ifdef CONFIG_X86_64 else { params[0] = (u64)kvm_rdi_read(vcpu); params[1] = (u64)kvm_rsi_read(vcpu); params[2] = (u64)kvm_rdx_read(vcpu); params[3] = (u64)kvm_r10_read(vcpu); params[4] = (u64)kvm_r8_read(vcpu); params[5] = (u64)kvm_r9_read(vcpu); } #endif cpl = kvm_x86_call(get_cpl)(vcpu); trace_kvm_xen_hypercall(cpl, input, params[0], params[1], params[2], params[3], params[4], params[5]); /* * Only allow hypercall acceleration for CPL0. The rare hypercalls that * are permitted in guest userspace can be handled by the VMM. */ if (unlikely(cpl > 0)) goto handle_in_userspace; switch (input) { case __HYPERVISOR_xen_version: if (params[0] == XENVER_version && vcpu->kvm->arch.xen.xen_version) { r = vcpu->kvm->arch.xen.xen_version; handled = true; } break; case __HYPERVISOR_event_channel_op: if (params[0] == EVTCHNOP_send) handled = kvm_xen_hcall_evtchn_send(vcpu, params[1], &r); break; case __HYPERVISOR_sched_op: handled = kvm_xen_hcall_sched_op(vcpu, longmode, params[0], params[1], &r); break; case __HYPERVISOR_vcpu_op: handled = kvm_xen_hcall_vcpu_op(vcpu, longmode, params[0], params[1], params[2], &r); break; case __HYPERVISOR_set_timer_op: { u64 timeout = params[0]; /* In 32-bit mode, the 64-bit timeout is in two 32-bit params. 
*/ if (!longmode) timeout |= params[1] << 32; handled = kvm_xen_hcall_set_timer_op(vcpu, timeout, &r); break; } default: break; } if (handled) return kvm_xen_hypercall_set_result(vcpu, r); handle_in_userspace: vcpu->run->exit_reason = KVM_EXIT_XEN; vcpu->run->xen.type = KVM_EXIT_XEN_HCALL; vcpu->run->xen.u.hcall.longmode = longmode; vcpu->run->xen.u.hcall.cpl = cpl; vcpu->run->xen.u.hcall.input = input; vcpu->run->xen.u.hcall.params[0] = params[0]; vcpu->run->xen.u.hcall.params[1] = params[1]; vcpu->run->xen.u.hcall.params[2] = params[2]; vcpu->run->xen.u.hcall.params[3] = params[3]; vcpu->run->xen.u.hcall.params[4] = params[4]; vcpu->run->xen.u.hcall.params[5] = params[5]; vcpu->arch.xen.hypercall_rip = kvm_get_linear_rip(vcpu); vcpu->arch.complete_userspace_io = kvm_xen_hypercall_complete_userspace; return 0; } static void kvm_xen_check_poller(struct kvm_vcpu *vcpu, int port) { int poll_evtchn = vcpu->arch.xen.poll_evtchn; if ((poll_evtchn == port || poll_evtchn == -1) && test_and_clear_bit(vcpu->vcpu_idx, vcpu->kvm->arch.xen.poll_mask)) { kvm_make_request(KVM_REQ_UNBLOCK, vcpu); kvm_vcpu_kick(vcpu); } } /* * The return value from this function is propagated to kvm_set_irq() API, * so it returns: * < 0 Interrupt was ignored (masked or not delivered for other reasons) * = 0 Interrupt was coalesced (previous irq is still pending) * > 0 Number of CPUs interrupt was delivered to * * It is also called directly from kvm_arch_set_irq_inatomic(), where the * only check on its return value is a comparison with -EWOULDBLOCK'. */ int kvm_xen_set_evtchn_fast(struct kvm_xen_evtchn *xe, struct kvm *kvm) { struct gfn_to_pfn_cache *gpc = &kvm->arch.xen.shinfo_cache; struct kvm_vcpu *vcpu; unsigned long *pending_bits, *mask_bits; unsigned long flags; int port_word_bit; bool kick_vcpu = false; int vcpu_idx, idx, rc; vcpu_idx = READ_ONCE(xe->vcpu_idx); if (vcpu_idx >= 0) vcpu = kvm_get_vcpu(kvm, vcpu_idx); else { vcpu = kvm_get_vcpu_by_id(kvm, xe->vcpu_id); if (!vcpu) return -EINVAL; WRITE_ONCE(xe->vcpu_idx, vcpu->vcpu_idx); } if (xe->port >= max_evtchn_port(kvm)) return -EINVAL; rc = -EWOULDBLOCK; idx = srcu_read_lock(&kvm->srcu); read_lock_irqsave(&gpc->lock, flags); if (!kvm_gpc_check(gpc, PAGE_SIZE)) goto out_rcu; if (IS_ENABLED(CONFIG_64BIT) && kvm->arch.xen.long_mode) { struct shared_info *shinfo = gpc->khva; pending_bits = (unsigned long *)&shinfo->evtchn_pending; mask_bits = (unsigned long *)&shinfo->evtchn_mask; port_word_bit = xe->port / 64; } else { struct compat_shared_info *shinfo = gpc->khva; pending_bits = (unsigned long *)&shinfo->evtchn_pending; mask_bits = (unsigned long *)&shinfo->evtchn_mask; port_word_bit = xe->port / 32; } /* * If this port wasn't already set, and if it isn't masked, then * we try to set the corresponding bit in the in-kernel shadow of * evtchn_pending_sel for the target vCPU. And if *that* wasn't * already set, then we kick the vCPU in question to write to the * *real* evtchn_pending_sel in its own guest vcpu_info struct. */ if (test_and_set_bit(xe->port, pending_bits)) { rc = 0; /* It was already raised */ } else if (test_bit(xe->port, mask_bits)) { rc = -ENOTCONN; /* Masked */ kvm_xen_check_poller(vcpu, xe->port); } else { rc = 1; /* Delivered to the bitmap in shared_info. */ /* Now switch to the vCPU's vcpu_info to set the index and pending_sel */ read_unlock_irqrestore(&gpc->lock, flags); gpc = &vcpu->arch.xen.vcpu_info_cache; read_lock_irqsave(&gpc->lock, flags); if (!kvm_gpc_check(gpc, sizeof(struct vcpu_info))) { /* * Could not access the vcpu_info. 
Set the bit in-kernel * and prod the vCPU to deliver it for itself. */ if (!test_and_set_bit(port_word_bit, &vcpu->arch.xen.evtchn_pending_sel)) kick_vcpu = true; goto out_rcu; } if (IS_ENABLED(CONFIG_64BIT) && kvm->arch.xen.long_mode) { struct vcpu_info *vcpu_info = gpc->khva; if (!test_and_set_bit(port_word_bit, &vcpu_info->evtchn_pending_sel)) { WRITE_ONCE(vcpu_info->evtchn_upcall_pending, 1); kick_vcpu = true; } } else { struct compat_vcpu_info *vcpu_info = gpc->khva; if (!test_and_set_bit(port_word_bit, (unsigned long *)&vcpu_info->evtchn_pending_sel)) { WRITE_ONCE(vcpu_info->evtchn_upcall_pending, 1); kick_vcpu = true; } } /* For the per-vCPU lapic vector, deliver it as MSI. */ if (kick_vcpu && vcpu->arch.xen.upcall_vector) { kvm_xen_inject_vcpu_vector(vcpu); kick_vcpu = false; } } out_rcu: read_unlock_irqrestore(&gpc->lock, flags); srcu_read_unlock(&kvm->srcu, idx); if (kick_vcpu) { kvm_make_request(KVM_REQ_UNBLOCK, vcpu); kvm_vcpu_kick(vcpu); } return rc; } static int kvm_xen_set_evtchn(struct kvm_xen_evtchn *xe, struct kvm *kvm) { bool mm_borrowed = false; int rc; rc = kvm_xen_set_evtchn_fast(xe, kvm); if (rc != -EWOULDBLOCK) return rc; if (current->mm != kvm->mm) { /* * If not on a thread which already belongs to this KVM, * we'd better be in the irqfd workqueue. */ if (WARN_ON_ONCE(current->mm)) return -EINVAL; kthread_use_mm(kvm->mm); mm_borrowed = true; } /* * It is theoretically possible for the page to be unmapped * and the MMU notifier to invalidate the shared_info before * we even get to use it. In that case, this looks like an * infinite loop. It was tempting to do it via the userspace * HVA instead... but that just *hides* the fact that it's * an infinite loop, because if a fault occurs and it waits * for the page to come back, it can *still* immediately * fault and have to wait again, repeatedly. * * Conversely, the page could also have been reinstated by * another thread before we even obtain the mutex above, so * check again *first* before remapping it. */ do { struct gfn_to_pfn_cache *gpc = &kvm->arch.xen.shinfo_cache; int idx; rc = kvm_xen_set_evtchn_fast(xe, kvm); if (rc != -EWOULDBLOCK) break; idx = srcu_read_lock(&kvm->srcu); rc = kvm_gpc_refresh(gpc, PAGE_SIZE); srcu_read_unlock(&kvm->srcu, idx); } while(!rc); if (mm_borrowed) kthread_unuse_mm(kvm->mm); return rc; } /* This is the version called from kvm_set_irq() as the .set function */ static int evtchn_set_fn(struct kvm_kernel_irq_routing_entry *e, struct kvm *kvm, int irq_source_id, int level, bool line_status) { if (!level) return -EINVAL; return kvm_xen_set_evtchn(&e->xen_evtchn, kvm); } /* * Set up an event channel interrupt from the KVM IRQ routing table. * Used for e.g. PIRQ from passed through physical devices. */ int kvm_xen_setup_evtchn(struct kvm *kvm, struct kvm_kernel_irq_routing_entry *e, const struct kvm_irq_routing_entry *ue) { struct kvm_vcpu *vcpu; if (ue->u.xen_evtchn.port >= max_evtchn_port(kvm)) return -EINVAL; /* We only support 2 level event channels for now */ if (ue->u.xen_evtchn.priority != KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL) return -EINVAL; /* * Xen gives us interesting mappings from vCPU index to APIC ID, * which means kvm_get_vcpu_by_id() has to iterate over all vCPUs * to find it. Do that once at setup time, instead of every time. * But beware that on live update / live migration, the routing * table might be reinstated before the vCPU threads have finished * recreating their vCPUs. 
*/ vcpu = kvm_get_vcpu_by_id(kvm, ue->u.xen_evtchn.vcpu); if (vcpu) e->xen_evtchn.vcpu_idx = vcpu->vcpu_idx; else e->xen_evtchn.vcpu_idx = -1; e->xen_evtchn.port = ue->u.xen_evtchn.port; e->xen_evtchn.vcpu_id = ue->u.xen_evtchn.vcpu; e->xen_evtchn.priority = ue->u.xen_evtchn.priority; e->set = evtchn_set_fn; return 0; } /* * Explicit event sending from userspace with KVM_XEN_HVM_EVTCHN_SEND ioctl. */ int kvm_xen_hvm_evtchn_send(struct kvm *kvm, struct kvm_irq_routing_xen_evtchn *uxe) { struct kvm_xen_evtchn e; int ret; if (!uxe->port || uxe->port >= max_evtchn_port(kvm)) return -EINVAL; /* We only support 2 level event channels for now */ if (uxe->priority != KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL) return -EINVAL; e.port = uxe->port; e.vcpu_id = uxe->vcpu; e.vcpu_idx = -1; e.priority = uxe->priority; ret = kvm_xen_set_evtchn(&e, kvm); /* * None of that 'return 1 if it actually got delivered' nonsense. * We don't care if it was masked (-ENOTCONN) either. */ if (ret > 0 || ret == -ENOTCONN) ret = 0; return ret; } /* * Support for *outbound* event channel events via the EVTCHNOP_send hypercall. */ struct evtchnfd { u32 send_port; u32 type; union { struct kvm_xen_evtchn port; struct { u32 port; /* zero */ struct eventfd_ctx *ctx; } eventfd; } deliver; }; /* * Update target vCPU or priority for a registered sending channel. */ static int kvm_xen_eventfd_update(struct kvm *kvm, struct kvm_xen_hvm_attr *data) { u32 port = data->u.evtchn.send_port; struct evtchnfd *evtchnfd; int ret; /* Protect writes to evtchnfd as well as the idr lookup. */ mutex_lock(&kvm->arch.xen.xen_lock); evtchnfd = idr_find(&kvm->arch.xen.evtchn_ports, port); ret = -ENOENT; if (!evtchnfd) goto out_unlock; /* For an UPDATE, nothing may change except the priority/vcpu */ ret = -EINVAL; if (evtchnfd->type != data->u.evtchn.type) goto out_unlock; /* * Port cannot change, and if it's zero that was an eventfd * which can't be changed either. */ if (!evtchnfd->deliver.port.port || evtchnfd->deliver.port.port != data->u.evtchn.deliver.port.port) goto out_unlock; /* We only support 2 level event channels for now */ if (data->u.evtchn.deliver.port.priority != KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL) goto out_unlock; evtchnfd->deliver.port.priority = data->u.evtchn.deliver.port.priority; if (evtchnfd->deliver.port.vcpu_id != data->u.evtchn.deliver.port.vcpu) { evtchnfd->deliver.port.vcpu_id = data->u.evtchn.deliver.port.vcpu; evtchnfd->deliver.port.vcpu_idx = -1; } ret = 0; out_unlock: mutex_unlock(&kvm->arch.xen.xen_lock); return ret; } /* * Configure the target (eventfd or local port delivery) for sending on * a given event channel. 
*/ static int kvm_xen_eventfd_assign(struct kvm *kvm, struct kvm_xen_hvm_attr *data) { u32 port = data->u.evtchn.send_port; struct eventfd_ctx *eventfd = NULL; struct evtchnfd *evtchnfd; int ret = -EINVAL; evtchnfd = kzalloc(sizeof(struct evtchnfd), GFP_KERNEL); if (!evtchnfd) return -ENOMEM; switch(data->u.evtchn.type) { case EVTCHNSTAT_ipi: /* IPI must map back to the same port# */ if (data->u.evtchn.deliver.port.port != data->u.evtchn.send_port) goto out_noeventfd; /* -EINVAL */ break; case EVTCHNSTAT_interdomain: if (data->u.evtchn.deliver.port.port) { if (data->u.evtchn.deliver.port.port >= max_evtchn_port(kvm)) goto out_noeventfd; /* -EINVAL */ } else { eventfd = eventfd_ctx_fdget(data->u.evtchn.deliver.eventfd.fd); if (IS_ERR(eventfd)) { ret = PTR_ERR(eventfd); goto out_noeventfd; } } break; case EVTCHNSTAT_virq: case EVTCHNSTAT_closed: case EVTCHNSTAT_unbound: case EVTCHNSTAT_pirq: default: /* Unknown event channel type */ goto out; /* -EINVAL */ } evtchnfd->send_port = data->u.evtchn.send_port; evtchnfd->type = data->u.evtchn.type; if (eventfd) { evtchnfd->deliver.eventfd.ctx = eventfd; } else { /* We only support 2 level event channels for now */ if (data->u.evtchn.deliver.port.priority != KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL) goto out; /* -EINVAL; */ evtchnfd->deliver.port.port = data->u.evtchn.deliver.port.port; evtchnfd->deliver.port.vcpu_id = data->u.evtchn.deliver.port.vcpu; evtchnfd->deliver.port.vcpu_idx = -1; evtchnfd->deliver.port.priority = data->u.evtchn.deliver.port.priority; } mutex_lock(&kvm->arch.xen.xen_lock); ret = idr_alloc(&kvm->arch.xen.evtchn_ports, evtchnfd, port, port + 1, GFP_KERNEL); mutex_unlock(&kvm->arch.xen.xen_lock); if (ret >= 0) return 0; if (ret == -ENOSPC) ret = -EEXIST; out: if (eventfd) eventfd_ctx_put(eventfd); out_noeventfd: kfree(evtchnfd); return ret; } static int kvm_xen_eventfd_deassign(struct kvm *kvm, u32 port) { struct evtchnfd *evtchnfd; mutex_lock(&kvm->arch.xen.xen_lock); evtchnfd = idr_remove(&kvm->arch.xen.evtchn_ports, port); mutex_unlock(&kvm->arch.xen.xen_lock); if (!evtchnfd) return -ENOENT; synchronize_srcu(&kvm->srcu); if (!evtchnfd->deliver.port.port) eventfd_ctx_put(evtchnfd->deliver.eventfd.ctx); kfree(evtchnfd); return 0; } static int kvm_xen_eventfd_reset(struct kvm *kvm) { struct evtchnfd *evtchnfd, **all_evtchnfds; int i; int n = 0; mutex_lock(&kvm->arch.xen.xen_lock); /* * Because synchronize_srcu() cannot be called inside the * critical section, first collect all the evtchnfd objects * in an array as they are removed from evtchn_ports. 
*/ idr_for_each_entry(&kvm->arch.xen.evtchn_ports, evtchnfd, i) n++; all_evtchnfds = kmalloc_array(n, sizeof(struct evtchnfd *), GFP_KERNEL); if (!all_evtchnfds) { mutex_unlock(&kvm->arch.xen.xen_lock); return -ENOMEM; } n = 0; idr_for_each_entry(&kvm->arch.xen.evtchn_ports, evtchnfd, i) { all_evtchnfds[n++] = evtchnfd; idr_remove(&kvm->arch.xen.evtchn_ports, evtchnfd->send_port); } mutex_unlock(&kvm->arch.xen.xen_lock); synchronize_srcu(&kvm->srcu); while (n--) { evtchnfd = all_evtchnfds[n]; if (!evtchnfd->deliver.port.port) eventfd_ctx_put(evtchnfd->deliver.eventfd.ctx); kfree(evtchnfd); } kfree(all_evtchnfds); return 0; } static int kvm_xen_setattr_evtchn(struct kvm *kvm, struct kvm_xen_hvm_attr *data) { u32 port = data->u.evtchn.send_port; if (data->u.evtchn.flags == KVM_XEN_EVTCHN_RESET) return kvm_xen_eventfd_reset(kvm); if (!port || port >= max_evtchn_port(kvm)) return -EINVAL; if (data->u.evtchn.flags == KVM_XEN_EVTCHN_DEASSIGN) return kvm_xen_eventfd_deassign(kvm, port); if (data->u.evtchn.flags == KVM_XEN_EVTCHN_UPDATE) return kvm_xen_eventfd_update(kvm, data); if (data->u.evtchn.flags) return -EINVAL; return kvm_xen_eventfd_assign(kvm, data); } static bool kvm_xen_hcall_evtchn_send(struct kvm_vcpu *vcpu, u64 param, u64 *r) { struct evtchnfd *evtchnfd; struct evtchn_send send; struct x86_exception e; /* Sanity check: this structure is the same for 32-bit and 64-bit */ BUILD_BUG_ON(sizeof(send) != 4); if (kvm_read_guest_virt(vcpu, param, &send, sizeof(send), &e)) { *r = -EFAULT; return true; } /* * evtchnfd is protected by kvm->srcu; the idr lookup instead * is protected by RCU. */ rcu_read_lock(); evtchnfd = idr_find(&vcpu->kvm->arch.xen.evtchn_ports, send.port); rcu_read_unlock(); if (!evtchnfd) return false; if (evtchnfd->deliver.port.port) { int ret = kvm_xen_set_evtchn(&evtchnfd->deliver.port, vcpu->kvm); if (ret < 0 && ret != -ENOTCONN) return false; } else { eventfd_signal(evtchnfd->deliver.eventfd.ctx); } *r = 0; return true; } void kvm_xen_init_vcpu(struct kvm_vcpu *vcpu) { vcpu->arch.xen.vcpu_id = vcpu->vcpu_idx; vcpu->arch.xen.poll_evtchn = 0; timer_setup(&vcpu->arch.xen.poll_timer, cancel_evtchn_poll, 0); hrtimer_init(&vcpu->arch.xen.timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD); vcpu->arch.xen.timer.function = xen_timer_callback; kvm_gpc_init(&vcpu->arch.xen.runstate_cache, vcpu->kvm); kvm_gpc_init(&vcpu->arch.xen.runstate2_cache, vcpu->kvm); kvm_gpc_init(&vcpu->arch.xen.vcpu_info_cache, vcpu->kvm); kvm_gpc_init(&vcpu->arch.xen.vcpu_time_info_cache, vcpu->kvm); } void kvm_xen_destroy_vcpu(struct kvm_vcpu *vcpu) { if (kvm_xen_timer_enabled(vcpu)) kvm_xen_stop_timer(vcpu); kvm_gpc_deactivate(&vcpu->arch.xen.runstate_cache); kvm_gpc_deactivate(&vcpu->arch.xen.runstate2_cache); kvm_gpc_deactivate(&vcpu->arch.xen.vcpu_info_cache); kvm_gpc_deactivate(&vcpu->arch.xen.vcpu_time_info_cache); del_timer_sync(&vcpu->arch.xen.poll_timer); } void kvm_xen_update_tsc_info(struct kvm_vcpu *vcpu) { struct kvm_cpuid_entry2 *entry; u32 function; if (!vcpu->arch.xen.cpuid.base) return; function = vcpu->arch.xen.cpuid.base | XEN_CPUID_LEAF(3); if (function > vcpu->arch.xen.cpuid.limit) return; entry = kvm_find_cpuid_entry_index(vcpu, function, 1); if (entry) { entry->ecx = vcpu->arch.hv_clock.tsc_to_system_mul; entry->edx = vcpu->arch.hv_clock.tsc_shift; } entry = kvm_find_cpuid_entry_index(vcpu, function, 2); if (entry) entry->eax = vcpu->arch.hw_tsc_khz; } void kvm_xen_init_vm(struct kvm *kvm) { mutex_init(&kvm->arch.xen.xen_lock); idr_init(&kvm->arch.xen.evtchn_ports); 
kvm_gpc_init(&kvm->arch.xen.shinfo_cache, kvm); } void kvm_xen_destroy_vm(struct kvm *kvm) { struct evtchnfd *evtchnfd; int i; kvm_gpc_deactivate(&kvm->arch.xen.shinfo_cache); idr_for_each_entry(&kvm->arch.xen.evtchn_ports, evtchnfd, i) { if (!evtchnfd->deliver.port.port) eventfd_ctx_put(evtchnfd->deliver.eventfd.ctx); kfree(evtchnfd); } idr_destroy(&kvm->arch.xen.evtchn_ports); if (kvm->arch.xen_hvm_config.msr) static_branch_slow_dec_deferred(&kvm_xen_enabled); }
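Editor's note: the attribute handlers above (kvm_xen_hvm_set_attr() and friends) are reached from userspace via the KVM_XEN_HVM_SET_ATTR ioctl on the VM fd. The sketch below shows roughly how a VMM might enable 64-bit layout and point KVM at the guest's shared_info frame; set_xen_shared_info() is a hypothetical helper, error handling is trimmed, and it assumes the VM has already been configured for Xen emulation (KVM_XEN_HVM_CONFIG on a kernel with CONFIG_KVM_XEN).

/*
 * Rough userspace sketch (not part of this file): drive the attributes
 * handled by kvm_xen_hvm_set_attr() above. 'vm_fd' is a KVM VM fd.
 */
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int set_xen_shared_info(int vm_fd, __u64 gfn)
{
	struct kvm_xen_hvm_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.type = KVM_XEN_ATTR_TYPE_LONG_MODE;
	attr.u.long_mode = 1;			/* 64-bit guest layout */
	if (ioctl(vm_fd, KVM_XEN_HVM_SET_ATTR, &attr))
		return -1;

	memset(&attr, 0, sizeof(attr));
	attr.type = KVM_XEN_ATTR_TYPE_SHARED_INFO;
	attr.u.shared_info.gfn = gfn;		/* KVM_XEN_INVALID_GFN unmaps it */
	return ioctl(vm_fd, KVM_XEN_HVM_SET_ATTR, &attr);
}

Setting LONG_MODE before SHARED_INFO matters because, as the handler above notes, the wallclock lives at a different offset in the 32-bit and 64-bit shared_info layouts.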
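Editor's note: when KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL is set, kvm_xen_write_hypercall_page() above fills the guest page with one 32-byte stub per hypercall number: mov $nr, %eax; a 3-byte vmcall/vmmcall; ret; padded with int3. The following is a minimal host-side sketch of that layout only (no guest memory write); fill_xen_hypercall_page() is a hypothetical helper, not a KVM function.

/*
 * Sketch of the hypercall-page layout built when interception is enabled.
 * Slot i (32 bytes): b8 <i as imm32> | vmcall/vmmcall (3 bytes) | c3 | int3...
 */
#include <stdint.h>
#include <string.h>

#define XEN_PAGE_SIZE	4096
#define HCALL_STUB_SZ	32

static void fill_xen_hypercall_page(uint8_t *page, const uint8_t vmcall[3])
{
	for (unsigned int i = 0; i < XEN_PAGE_SIZE / HCALL_STUB_SZ; i++) {
		uint8_t *stub = page + i * HCALL_STUB_SZ;
		uint32_t nr = i;

		memset(stub, 0xcc, HCALL_STUB_SZ);	/* int3 padding */
		stub[0] = 0xb8;				/* mov imm32, %eax */
		memcpy(stub + 1, &nr, sizeof(nr));	/* hypercall number */
		memcpy(stub + 5, vmcall, 3);		/* vmcall or vmmcall */
		stub[8] = 0xc3;				/* ret */
	}
}

A guest makes hypercall nr by calling offset nr * 32 into this page; because each stub ends in vmcall/vmmcall, the hypercall traps to KVM with the number already in RAX, which is what kvm_xen_hypercall() above dispatches on.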
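Editor's note: the delivery path in kvm_xen_set_evtchn_fast() above reduces to three bitmap operations: set the port's bit in shared_info->evtchn_pending, bail out if the corresponding mask bit is set, otherwise mark the word index in the vCPU's evtchn_pending_sel and raise evtchn_upcall_pending. The standalone model below mirrors only that flow; the structures and toy_deliver_port() are simplified stand-ins, not the real Xen ABI or KVM code, and there is no locking or compat handling.

/*
 * Illustrative model of 2-level event channel delivery (64-bit layout).
 * Return: <0 masked, 0 already pending, >0 newly delivered (kick the vCPU).
 */
#include <stdint.h>

#define TOY_NR_PORTS	4096		/* EVTCHN_2L_NR_CHANNELS for 64-bit */
#define TOY_WORD_BITS	64

struct toy_shared_info {
	uint64_t evtchn_pending[TOY_NR_PORTS / TOY_WORD_BITS];
	uint64_t evtchn_mask[TOY_NR_PORTS / TOY_WORD_BITS];
};

struct toy_vcpu_info {
	uint8_t  evtchn_upcall_pending;
	uint64_t evtchn_pending_sel;	/* one bit per pending word above */
};

static int toy_deliver_port(struct toy_shared_info *si,
			    struct toy_vcpu_info *vi, unsigned int port)
{
	unsigned int word = port / TOY_WORD_BITS;
	uint64_t bit = 1ULL << (port % TOY_WORD_BITS);

	if (si->evtchn_pending[word] & bit)
		return 0;			/* was already raised */
	si->evtchn_pending[word] |= bit;

	if (si->evtchn_mask[word] & bit)
		return -1;			/* masked: pending but not signalled */

	/* Second level: word index, then the per-vCPU upcall flag. */
	vi->evtchn_pending_sel |= 1ULL << word;
	vi->evtchn_upcall_pending = 1;
	return 1;
}

kvm_xen_inject_pending_events() above exists for the case where the vcpu_info page cannot be touched at delivery time: the selector bits are parked in vcpu->arch.xen.evtchn_pending_sel and OR-ed into the guest copy later, once the vCPU can fault the page in.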
// SPDX-License-Identifier: GPL-2.0-only /* * reservations.c * * Allocation reservations implementation * * Some code
borrowed from fs/ext3/balloc.c and is: * * Copyright (C) 1992, 1993, 1994, 1995 * Remy Card (card@masi.ibp.fr) * Laboratoire MASI - Institut Blaise Pascal * Universite Pierre et Marie Curie (Paris VI) * * The rest is copyright (C) 2010 Novell. All rights reserved. */ #include <linux/fs.h> #include <linux/types.h> #include <linux/highmem.h> #include <linux/bitops.h> #include <linux/list.h> #include <cluster/masklog.h> #include "ocfs2.h" #include "ocfs2_trace.h" #ifdef CONFIG_OCFS2_DEBUG_FS #define OCFS2_CHECK_RESERVATIONS #endif static DEFINE_SPINLOCK(resv_lock); int ocfs2_dir_resv_allowed(struct ocfs2_super *osb) { return (osb->osb_resv_level && osb->osb_dir_resv_level); } static unsigned int ocfs2_resv_window_bits(struct ocfs2_reservation_map *resmap, struct ocfs2_alloc_reservation *resv) { struct ocfs2_super *osb = resmap->m_osb; unsigned int bits; if (!(resv->r_flags & OCFS2_RESV_FLAG_DIR)) { /* 8, 16, 32, 64, 128, 256, 512, 1024 */ bits = 4 << osb->osb_resv_level; } else { bits = 4 << osb->osb_dir_resv_level; } return bits; } static inline unsigned int ocfs2_resv_end(struct ocfs2_alloc_reservation *resv) { if (resv->r_len) return resv->r_start + resv->r_len - 1; return resv->r_start; } static inline int ocfs2_resv_empty(struct ocfs2_alloc_reservation *resv) { return !!(resv->r_len == 0); } static inline int ocfs2_resmap_disabled(struct ocfs2_reservation_map *resmap) { if (resmap->m_osb->osb_resv_level == 0) return 1; return 0; } static void ocfs2_dump_resv(struct ocfs2_reservation_map *resmap) { struct ocfs2_super *osb = resmap->m_osb; struct rb_node *node; struct ocfs2_alloc_reservation *resv; int i = 0; mlog(ML_NOTICE, "Dumping resmap for device %s. Bitmap length: %u\n", osb->dev_str, resmap->m_bitmap_len); node = rb_first(&resmap->m_reservations); while (node) { resv = rb_entry(node, struct ocfs2_alloc_reservation, r_node); mlog(ML_NOTICE, "start: %u\tend: %u\tlen: %u\tlast_start: %u" "\tlast_len: %u\n", resv->r_start, ocfs2_resv_end(resv), resv->r_len, resv->r_last_start, resv->r_last_len); node = rb_next(node); i++; } mlog(ML_NOTICE, "%d reservations found. 
LRU follows\n", i); i = 0; list_for_each_entry(resv, &resmap->m_lru, r_lru) { mlog(ML_NOTICE, "LRU(%d) start: %u\tend: %u\tlen: %u\t" "last_start: %u\tlast_len: %u\n", i, resv->r_start, ocfs2_resv_end(resv), resv->r_len, resv->r_last_start, resv->r_last_len); i++; } } #ifdef OCFS2_CHECK_RESERVATIONS static int ocfs2_validate_resmap_bits(struct ocfs2_reservation_map *resmap, int i, struct ocfs2_alloc_reservation *resv) { char *disk_bitmap = resmap->m_disk_bitmap; unsigned int start = resv->r_start; unsigned int end = ocfs2_resv_end(resv); while (start <= end) { if (ocfs2_test_bit(start, disk_bitmap)) { mlog(ML_ERROR, "reservation %d covers an allocated area " "starting at bit %u!\n", i, start); return 1; } start++; } return 0; } static void ocfs2_check_resmap(struct ocfs2_reservation_map *resmap) { unsigned int off = 0; int i = 0; struct rb_node *node; struct ocfs2_alloc_reservation *resv; node = rb_first(&resmap->m_reservations); while (node) { resv = rb_entry(node, struct ocfs2_alloc_reservation, r_node); if (i > 0 && resv->r_start <= off) { mlog(ML_ERROR, "reservation %d has bad start off!\n", i); goto bad; } if (resv->r_len == 0) { mlog(ML_ERROR, "reservation %d has no length!\n", i); goto bad; } if (resv->r_start > ocfs2_resv_end(resv)) { mlog(ML_ERROR, "reservation %d has invalid range!\n", i); goto bad; } if (ocfs2_resv_end(resv) >= resmap->m_bitmap_len) { mlog(ML_ERROR, "reservation %d extends past bitmap!\n", i); goto bad; } if (ocfs2_validate_resmap_bits(resmap, i, resv)) goto bad; off = ocfs2_resv_end(resv); node = rb_next(node); i++; } return; bad: ocfs2_dump_resv(resmap); BUG(); } #else static inline void ocfs2_check_resmap(struct ocfs2_reservation_map *resmap) { } #endif void ocfs2_resv_init_once(struct ocfs2_alloc_reservation *resv) { memset(resv, 0, sizeof(*resv)); INIT_LIST_HEAD(&resv->r_lru); } void ocfs2_resv_set_type(struct ocfs2_alloc_reservation *resv, unsigned int flags) { BUG_ON(flags & ~OCFS2_RESV_TYPES); resv->r_flags |= flags; } void ocfs2_resmap_init(struct ocfs2_super *osb, struct ocfs2_reservation_map *resmap) { memset(resmap, 0, sizeof(*resmap)); resmap->m_osb = osb; resmap->m_reservations = RB_ROOT; /* m_bitmap_len is initialized to zero by the above memset. */ INIT_LIST_HEAD(&resmap->m_lru); } static void ocfs2_resv_mark_lru(struct ocfs2_reservation_map *resmap, struct ocfs2_alloc_reservation *resv) { assert_spin_locked(&resv_lock); if (!list_empty(&resv->r_lru)) list_del_init(&resv->r_lru); list_add_tail(&resv->r_lru, &resmap->m_lru); } static void __ocfs2_resv_trunc(struct ocfs2_alloc_reservation *resv) { resv->r_len = 0; resv->r_start = 0; } static void ocfs2_resv_remove(struct ocfs2_reservation_map *resmap, struct ocfs2_alloc_reservation *resv) { if (resv->r_flags & OCFS2_RESV_FLAG_INUSE) { list_del_init(&resv->r_lru); rb_erase(&resv->r_node, &resmap->m_reservations); resv->r_flags &= ~OCFS2_RESV_FLAG_INUSE; } } static void __ocfs2_resv_discard(struct ocfs2_reservation_map *resmap, struct ocfs2_alloc_reservation *resv) { assert_spin_locked(&resv_lock); __ocfs2_resv_trunc(resv); /* * last_len and last_start no longer make sense if * we're changing the range of our allocations. 
*/ resv->r_last_len = resv->r_last_start = 0; ocfs2_resv_remove(resmap, resv); } /* does nothing if 'resv' is null */ void ocfs2_resv_discard(struct ocfs2_reservation_map *resmap, struct ocfs2_alloc_reservation *resv) { if (resv) { spin_lock(&resv_lock); __ocfs2_resv_discard(resmap, resv); spin_unlock(&resv_lock); } } static void ocfs2_resmap_clear_all_resv(struct ocfs2_reservation_map *resmap) { struct rb_node *node; struct ocfs2_alloc_reservation *resv; assert_spin_locked(&resv_lock); while ((node = rb_last(&resmap->m_reservations)) != NULL) { resv = rb_entry(node, struct ocfs2_alloc_reservation, r_node); __ocfs2_resv_discard(resmap, resv); } } void ocfs2_resmap_restart(struct ocfs2_reservation_map *resmap, unsigned int clen, char *disk_bitmap) { if (ocfs2_resmap_disabled(resmap)) return; spin_lock(&resv_lock); ocfs2_resmap_clear_all_resv(resmap); resmap->m_bitmap_len = clen; resmap->m_disk_bitmap = disk_bitmap; spin_unlock(&resv_lock); } void ocfs2_resmap_uninit(struct ocfs2_reservation_map *resmap) { /* Does nothing for now. Keep this around for API symmetry */ } static void ocfs2_resv_insert(struct ocfs2_reservation_map *resmap, struct ocfs2_alloc_reservation *new) { struct rb_root *root = &resmap->m_reservations; struct rb_node *parent = NULL; struct rb_node **p = &root->rb_node; struct ocfs2_alloc_reservation *tmp; assert_spin_locked(&resv_lock); trace_ocfs2_resv_insert(new->r_start, new->r_len); while (*p) { parent = *p; tmp = rb_entry(parent, struct ocfs2_alloc_reservation, r_node); if (new->r_start < tmp->r_start) { p = &(*p)->rb_left; /* * This is a good place to check for * overlapping reservations. */ BUG_ON(ocfs2_resv_end(new) >= tmp->r_start); } else if (new->r_start > ocfs2_resv_end(tmp)) { p = &(*p)->rb_right; } else { /* This should never happen! */ mlog(ML_ERROR, "Duplicate reservation window!\n"); BUG(); } } rb_link_node(&new->r_node, parent, p); rb_insert_color(&new->r_node, root); new->r_flags |= OCFS2_RESV_FLAG_INUSE; ocfs2_resv_mark_lru(resmap, new); ocfs2_check_resmap(resmap); } /** * ocfs2_find_resv_lhs() - find the window which contains goal * @resmap: reservation map to search * @goal: which bit to search for * * If a window containing that goal is not found, we return the window * which comes before goal. Returns NULL on empty rbtree or no window * before goal. */ static struct ocfs2_alloc_reservation * ocfs2_find_resv_lhs(struct ocfs2_reservation_map *resmap, unsigned int goal) { struct ocfs2_alloc_reservation *resv = NULL; struct ocfs2_alloc_reservation *prev_resv = NULL; struct rb_node *node = resmap->m_reservations.rb_node; assert_spin_locked(&resv_lock); if (!node) return NULL; node = rb_first(&resmap->m_reservations); while (node) { resv = rb_entry(node, struct ocfs2_alloc_reservation, r_node); if (resv->r_start <= goal && ocfs2_resv_end(resv) >= goal) break; /* Check if we overshot the reservation just before goal? */ if (resv->r_start > goal) { resv = prev_resv; break; } prev_resv = resv; node = rb_next(node); } return resv; } /* * We are given a range within the bitmap, which corresponds to a gap * inside the reservations tree (search_start, search_len). The range * can be anything from the whole bitmap, to a gap between * reservations. * * The start value of *rstart is insignificant. * * This function searches the bitmap range starting at search_start * with length search_len for a set of contiguous free bits. We try * to find up to 'wanted' bits, but can sometimes return less. * * Returns the length of allocation, 0 if no free bits are found. 
* * *cstart and *clen will also be populated with the result. */ static int ocfs2_resmap_find_free_bits(struct ocfs2_reservation_map *resmap, unsigned int wanted, unsigned int search_start, unsigned int search_len, unsigned int *rstart, unsigned int *rlen) { void *bitmap = resmap->m_disk_bitmap; unsigned int best_start, best_len = 0; int offset, start, found; trace_ocfs2_resmap_find_free_bits_begin(search_start, search_len, wanted, resmap->m_bitmap_len); found = best_start = best_len = 0; start = search_start; while ((offset = ocfs2_find_next_zero_bit(bitmap, resmap->m_bitmap_len, start)) < resmap->m_bitmap_len) { /* Search reached end of the region */ if (offset >= (search_start + search_len)) break; if (offset == start) { /* we found a zero */ found++; /* move start to the next bit to test */ start++; } else { /* got a zero after some ones */ found = 1; start = offset + 1; } if (found > best_len) { best_len = found; best_start = start - found; } if (found >= wanted) break; } if (best_len == 0) return 0; if (best_len >= wanted) best_len = wanted; *rlen = best_len; *rstart = best_start; trace_ocfs2_resmap_find_free_bits_end(best_start, best_len); return *rlen; } static void __ocfs2_resv_find_window(struct ocfs2_reservation_map *resmap, struct ocfs2_alloc_reservation *resv, unsigned int goal, unsigned int wanted) { struct rb_root *root = &resmap->m_reservations; unsigned int gap_start, gap_end, gap_len; struct ocfs2_alloc_reservation *prev_resv, *next_resv; struct rb_node *prev, *next; unsigned int cstart, clen; unsigned int best_start = 0, best_len = 0; /* * Nasty cases to consider: * * - rbtree is empty * - our window should be first in all reservations * - our window should be last in all reservations * - need to make sure we don't go past end of bitmap */ trace_ocfs2_resv_find_window_begin(resv->r_start, ocfs2_resv_end(resv), goal, wanted, RB_EMPTY_ROOT(root)); assert_spin_locked(&resv_lock); if (RB_EMPTY_ROOT(root)) { /* * Easiest case - empty tree. We can just take * whatever window of free bits we want. */ clen = ocfs2_resmap_find_free_bits(resmap, wanted, goal, resmap->m_bitmap_len - goal, &cstart, &clen); /* * This should never happen - the local alloc window * will always have free bits when we're called. */ BUG_ON(goal == 0 && clen == 0); if (clen == 0) return; resv->r_start = cstart; resv->r_len = clen; ocfs2_resv_insert(resmap, resv); return; } prev_resv = ocfs2_find_resv_lhs(resmap, goal); if (prev_resv == NULL) { /* * A NULL here means that the search code couldn't * find a window that starts before goal. * * However, we can take the first window after goal, * which is also by definition, the leftmost window in * the entire tree. If we can find free bits in the * gap between goal and the LHS window, then the * reservation can safely be placed there. * * Otherwise we fall back to a linear search, checking * the gaps in between windows for a place to * allocate. */ next = rb_first(root); next_resv = rb_entry(next, struct ocfs2_alloc_reservation, r_node); /* * The search should never return such a window. 
(see * comment above */ if (next_resv->r_start <= goal) { mlog(ML_ERROR, "goal: %u next_resv: start %u len %u\n", goal, next_resv->r_start, next_resv->r_len); ocfs2_dump_resv(resmap); BUG(); } clen = ocfs2_resmap_find_free_bits(resmap, wanted, goal, next_resv->r_start - goal, &cstart, &clen); if (clen) { best_len = clen; best_start = cstart; if (best_len == wanted) goto out_insert; } prev_resv = next_resv; next_resv = NULL; } trace_ocfs2_resv_find_window_prev(prev_resv->r_start, ocfs2_resv_end(prev_resv)); prev = &prev_resv->r_node; /* Now we do a linear search for a window, starting at 'prev_rsv' */ while (1) { next = rb_next(prev); if (next) { next_resv = rb_entry(next, struct ocfs2_alloc_reservation, r_node); gap_start = ocfs2_resv_end(prev_resv) + 1; gap_end = next_resv->r_start - 1; gap_len = gap_end - gap_start + 1; } else { /* * We're at the rightmost edge of the * tree. See if a reservation between this * window and the end of the bitmap will work. */ gap_start = ocfs2_resv_end(prev_resv) + 1; gap_len = resmap->m_bitmap_len - gap_start; gap_end = resmap->m_bitmap_len - 1; } trace_ocfs2_resv_find_window_next(next ? next_resv->r_start: -1, next ? ocfs2_resv_end(next_resv) : -1); /* * No need to check this gap if we have already found * a larger region of free bits. */ if (gap_len <= best_len) goto next_resv; clen = ocfs2_resmap_find_free_bits(resmap, wanted, gap_start, gap_len, &cstart, &clen); if (clen == wanted) { best_len = clen; best_start = cstart; goto out_insert; } else if (clen > best_len) { best_len = clen; best_start = cstart; } next_resv: if (!next) break; prev = next; prev_resv = rb_entry(prev, struct ocfs2_alloc_reservation, r_node); } out_insert: if (best_len) { resv->r_start = best_start; resv->r_len = best_len; ocfs2_resv_insert(resmap, resv); } } static void ocfs2_cannibalize_resv(struct ocfs2_reservation_map *resmap, struct ocfs2_alloc_reservation *resv, unsigned int wanted) { struct ocfs2_alloc_reservation *lru_resv; int tmpwindow = !!(resv->r_flags & OCFS2_RESV_FLAG_TMP); unsigned int min_bits; if (!tmpwindow) min_bits = ocfs2_resv_window_bits(resmap, resv) >> 1; else min_bits = wanted; /* We at know the temp window will use all * of these bits */ /* * Take the first reservation off the LRU as our 'target'. We * don't try to be smart about it. There might be a case for * searching based on size but I don't have enough data to be * sure. --Mark (3/16/2010) */ lru_resv = list_first_entry(&resmap->m_lru, struct ocfs2_alloc_reservation, r_lru); trace_ocfs2_cannibalize_resv_begin(lru_resv->r_start, lru_resv->r_len, ocfs2_resv_end(lru_resv)); /* * Cannibalize (some or all) of the target reservation and * feed it to the current window. */ if (lru_resv->r_len <= min_bits) { /* * Discard completely if size is less than or equal to a * reasonable threshold - 50% of window bits for non temporary * windows. 
*/ resv->r_start = lru_resv->r_start; resv->r_len = lru_resv->r_len; __ocfs2_resv_discard(resmap, lru_resv); } else { unsigned int shrink; if (tmpwindow) shrink = min_bits; else shrink = lru_resv->r_len / 2; lru_resv->r_len -= shrink; resv->r_start = ocfs2_resv_end(lru_resv) + 1; resv->r_len = shrink; } trace_ocfs2_cannibalize_resv_end(resv->r_start, ocfs2_resv_end(resv), resv->r_len, resv->r_last_start, resv->r_last_len); ocfs2_resv_insert(resmap, resv); } static void ocfs2_resv_find_window(struct ocfs2_reservation_map *resmap, struct ocfs2_alloc_reservation *resv, unsigned int wanted) { unsigned int goal = 0; BUG_ON(!ocfs2_resv_empty(resv)); /* * Begin by trying to get a window as close to the previous * one as possible. Using the most recent allocation as a * start goal makes sense. */ if (resv->r_last_len) { goal = resv->r_last_start + resv->r_last_len; if (goal >= resmap->m_bitmap_len) goal = 0; } __ocfs2_resv_find_window(resmap, resv, goal, wanted); /* Search from last alloc didn't work, try once more from beginning. */ if (ocfs2_resv_empty(resv) && goal != 0) __ocfs2_resv_find_window(resmap, resv, 0, wanted); if (ocfs2_resv_empty(resv)) { /* * Still empty? Pull oldest one off the LRU, remove it from * tree, put this one in its place. */ ocfs2_cannibalize_resv(resmap, resv, wanted); } BUG_ON(ocfs2_resv_empty(resv)); } int ocfs2_resmap_resv_bits(struct ocfs2_reservation_map *resmap, struct ocfs2_alloc_reservation *resv, int *cstart, int *clen) { if (resv == NULL || ocfs2_resmap_disabled(resmap)) return -ENOSPC; spin_lock(&resv_lock); if (ocfs2_resv_empty(resv)) { /* * We don't want to over-allocate for temporary * windows. Otherwise, we run the risk of fragmenting the * allocation space. */ unsigned int wanted = ocfs2_resv_window_bits(resmap, resv); if ((resv->r_flags & OCFS2_RESV_FLAG_TMP) || wanted < *clen) wanted = *clen; /* * Try to get a window here. If it works, we must fall * through and test the bitmap. This avoids some * ping-ponging of windows due to non-reserved space * being allocated before we initialize a window for * that inode. */ ocfs2_resv_find_window(resmap, resv, wanted); trace_ocfs2_resmap_resv_bits(resv->r_start, resv->r_len); } BUG_ON(ocfs2_resv_empty(resv)); *cstart = resv->r_start; *clen = resv->r_len; spin_unlock(&resv_lock); return 0; } static void ocfs2_adjust_resv_from_alloc(struct ocfs2_reservation_map *resmap, struct ocfs2_alloc_reservation *resv, unsigned int start, unsigned int end) { unsigned int rhs = 0; unsigned int old_end = ocfs2_resv_end(resv); BUG_ON(start != resv->r_start || old_end < end); /* * Completely used? We can remove it then. */ if (old_end == end) { __ocfs2_resv_discard(resmap, resv); return; } rhs = old_end - end; /* * This should have been trapped above.
*/ BUG_ON(rhs == 0); resv->r_start = end + 1; resv->r_len = old_end - resv->r_start + 1; } void ocfs2_resmap_claimed_bits(struct ocfs2_reservation_map *resmap, struct ocfs2_alloc_reservation *resv, u32 cstart, u32 clen) { unsigned int cend = cstart + clen - 1; if (resmap == NULL || ocfs2_resmap_disabled(resmap)) return; if (resv == NULL) return; BUG_ON(cstart != resv->r_start); spin_lock(&resv_lock); trace_ocfs2_resmap_claimed_bits_begin(cstart, cend, clen, resv->r_start, ocfs2_resv_end(resv), resv->r_len, resv->r_last_start, resv->r_last_len); BUG_ON(cstart < resv->r_start); BUG_ON(cstart > ocfs2_resv_end(resv)); BUG_ON(cend > ocfs2_resv_end(resv)); ocfs2_adjust_resv_from_alloc(resmap, resv, cstart, cend); resv->r_last_start = cstart; resv->r_last_len = clen; /* * May have been discarded above from * ocfs2_adjust_resv_from_alloc(). */ if (!ocfs2_resv_empty(resv)) ocfs2_resv_mark_lru(resmap, resv); trace_ocfs2_resmap_claimed_bits_end(resv->r_start, ocfs2_resv_end(resv), resv->r_len, resv->r_last_start, resv->r_last_len); ocfs2_check_resmap(resmap); spin_unlock(&resv_lock); } |
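/*
 * Illustrative stand-alone sketch (not part of the ocfs2 sources): the
 * window-trimming arithmetic performed by ocfs2_adjust_resv_from_alloc()
 * above, with hypothetical "toy_" names and made-up bit numbers. Only the
 * arithmetic mirrors the kernel code: a claim always starts at r_start, so
 * the reservation either disappears or shrinks from the left.
 */
#include <assert.h>
#include <stdio.h>

struct toy_resv { unsigned int r_start, r_len; };

static unsigned int toy_resv_end(const struct toy_resv *r)
{
	return r->r_start + r->r_len - 1;
}

/* Returns 1 when the window is fully consumed (caller would discard it). */
static int toy_claim(struct toy_resv *r, unsigned int start, unsigned int end)
{
	unsigned int old_end = toy_resv_end(r);

	assert(start == r->r_start && end <= old_end);
	if (end == old_end)
		return 1;
	r->r_start = end + 1;
	r->r_len = old_end - r->r_start + 1;
	return 0;
}

int main(void)
{
	struct toy_resv r = { .r_start = 100, .r_len = 32 };	/* bits 100..131 */

	toy_claim(&r, 100, 107);		/* allocate the first 8 bits */
	printf("%u %u\n", r.r_start, r.r_len);	/* prints "108 24" */
	return 0;
}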
// SPDX-License-Identifier: GPL-2.0 /* * Copyright (C) 1992, 1998-2004 Linus Torvalds, Ingo Molnar * * This file contains the /proc/irq/ handling code. */ #include <linux/irq.h> #include <linux/gfp.h> #include <linux/proc_fs.h> #include <linux/seq_file.h> #include <linux/interrupt.h> #include <linux/kernel_stat.h> #include <linux/mutex.h> #include "internals.h" /* * Access rules: * * procfs protects read/write of /proc/irq/N/ files against a * concurrent free of the interrupt descriptor. remove_proc_entry() * immediately prevents new read/writes to happen and waits for * already running read/write functions to complete. * * We remove the proc entries first and then delete the interrupt * descriptor from the radix tree and free it. So it is guaranteed * that irq_to_desc(N) is valid as long as the read/writes are * permitted by procfs. * * The read from /proc/interrupts is a different problem because there * is no protection. So the lookup and the access to irqdesc * information must be protected by sparse_irq_lock.
*/ static struct proc_dir_entry *root_irq_dir; #ifdef CONFIG_SMP enum { AFFINITY, AFFINITY_LIST, EFFECTIVE, EFFECTIVE_LIST, }; static int show_irq_affinity(int type, struct seq_file *m) { struct irq_desc *desc = irq_to_desc((long)m->private); const struct cpumask *mask; switch (type) { case AFFINITY: case AFFINITY_LIST: mask = desc->irq_common_data.affinity; if (irq_move_pending(&desc->irq_data)) mask = irq_desc_get_pending_mask(desc); break; case EFFECTIVE: case EFFECTIVE_LIST: #ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK mask = irq_data_get_effective_affinity_mask(&desc->irq_data); break; #endif default: return -EINVAL; } switch (type) { case AFFINITY_LIST: case EFFECTIVE_LIST: seq_printf(m, "%*pbl\n", cpumask_pr_args(mask)); break; case AFFINITY: case EFFECTIVE: seq_printf(m, "%*pb\n", cpumask_pr_args(mask)); break; } return 0; } static int irq_affinity_hint_proc_show(struct seq_file *m, void *v) { struct irq_desc *desc = irq_to_desc((long)m->private); unsigned long flags; cpumask_var_t mask; if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) return -ENOMEM; raw_spin_lock_irqsave(&desc->lock, flags); if (desc->affinity_hint) cpumask_copy(mask, desc->affinity_hint); raw_spin_unlock_irqrestore(&desc->lock, flags); seq_printf(m, "%*pb\n", cpumask_pr_args(mask)); free_cpumask_var(mask); return 0; } int no_irq_affinity; static int irq_affinity_proc_show(struct seq_file *m, void *v) { return show_irq_affinity(AFFINITY, m); } static int irq_affinity_list_proc_show(struct seq_file *m, void *v) { return show_irq_affinity(AFFINITY_LIST, m); } #ifndef CONFIG_AUTO_IRQ_AFFINITY static inline int irq_select_affinity_usr(unsigned int irq) { /* * If the interrupt is started up already then this fails. The * interrupt is assigned to an online CPU already. There is no * point to move it around randomly. Tell user space that the * selected mask is bogus. * * If not then any change to the affinity is pointless because the * startup code invokes irq_setup_affinity() which will select * a online CPU anyway. */ return -EINVAL; } #else /* ALPHA magic affinity auto selector. Keep it for historical reasons. */ static inline int irq_select_affinity_usr(unsigned int irq) { return irq_select_affinity(irq); } #endif static ssize_t write_irq_affinity(int type, struct file *file, const char __user *buffer, size_t count, loff_t *pos) { unsigned int irq = (int)(long)pde_data(file_inode(file)); cpumask_var_t new_value; int err; if (!irq_can_set_affinity_usr(irq) || no_irq_affinity) return -EPERM; if (!zalloc_cpumask_var(&new_value, GFP_KERNEL)) return -ENOMEM; if (type) err = cpumask_parselist_user(buffer, count, new_value); else err = cpumask_parse_user(buffer, count, new_value); if (err) goto free_cpumask; /* * Do not allow disabling IRQs completely - it's a too easy * way to make the system unusable accidentally :-) At least * one online CPU still has to be targeted. */ if (!cpumask_intersects(new_value, cpu_online_mask)) { /* * Special case for empty set - allow the architecture code * to set default SMP affinity. */ err = irq_select_affinity_usr(irq) ? 
-EINVAL : count; } else { err = irq_set_affinity(irq, new_value); if (!err) err = count; } free_cpumask: free_cpumask_var(new_value); return err; } static ssize_t irq_affinity_proc_write(struct file *file, const char __user *buffer, size_t count, loff_t *pos) { return write_irq_affinity(0, file, buffer, count, pos); } static ssize_t irq_affinity_list_proc_write(struct file *file, const char __user *buffer, size_t count, loff_t *pos) { return write_irq_affinity(1, file, buffer, count, pos); } static int irq_affinity_proc_open(struct inode *inode, struct file *file) { return single_open(file, irq_affinity_proc_show, pde_data(inode)); } static int irq_affinity_list_proc_open(struct inode *inode, struct file *file) { return single_open(file, irq_affinity_list_proc_show, pde_data(inode)); } static const struct proc_ops irq_affinity_proc_ops = { .proc_open = irq_affinity_proc_open, .proc_read = seq_read, .proc_lseek = seq_lseek, .proc_release = single_release, .proc_write = irq_affinity_proc_write, }; static const struct proc_ops irq_affinity_list_proc_ops = { .proc_open = irq_affinity_list_proc_open, .proc_read = seq_read, .proc_lseek = seq_lseek, .proc_release = single_release, .proc_write = irq_affinity_list_proc_write, }; #ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK static int irq_effective_aff_proc_show(struct seq_file *m, void *v) { return show_irq_affinity(EFFECTIVE, m); } static int irq_effective_aff_list_proc_show(struct seq_file *m, void *v) { return show_irq_affinity(EFFECTIVE_LIST, m); } #endif static int default_affinity_show(struct seq_file *m, void *v) { seq_printf(m, "%*pb\n", cpumask_pr_args(irq_default_affinity)); return 0; } static ssize_t default_affinity_write(struct file *file, const char __user *buffer, size_t count, loff_t *ppos) { cpumask_var_t new_value; int err; if (!zalloc_cpumask_var(&new_value, GFP_KERNEL)) return -ENOMEM; err = cpumask_parse_user(buffer, count, new_value); if (err) goto out; /* * Do not allow disabling IRQs completely - it's a too easy * way to make the system unusable accidentally :-) At least * one online CPU still has to be targeted. 
*/ if (!cpumask_intersects(new_value, cpu_online_mask)) { err = -EINVAL; goto out; } cpumask_copy(irq_default_affinity, new_value); err = count; out: free_cpumask_var(new_value); return err; } static int default_affinity_open(struct inode *inode, struct file *file) { return single_open(file, default_affinity_show, pde_data(inode)); } static const struct proc_ops default_affinity_proc_ops = { .proc_open = default_affinity_open, .proc_read = seq_read, .proc_lseek = seq_lseek, .proc_release = single_release, .proc_write = default_affinity_write, }; static int irq_node_proc_show(struct seq_file *m, void *v) { struct irq_desc *desc = irq_to_desc((long) m->private); seq_printf(m, "%d\n", irq_desc_get_node(desc)); return 0; } #endif static int irq_spurious_proc_show(struct seq_file *m, void *v) { struct irq_desc *desc = irq_to_desc((long) m->private); seq_printf(m, "count %u\n" "unhandled %u\n" "last_unhandled %u ms\n", desc->irq_count, desc->irqs_unhandled, jiffies_to_msecs(desc->last_unhandled)); return 0; } #define MAX_NAMELEN 128 static int name_unique(unsigned int irq, struct irqaction *new_action) { struct irq_desc *desc = irq_to_desc(irq); struct irqaction *action; unsigned long flags; int ret = 1; raw_spin_lock_irqsave(&desc->lock, flags); for_each_action_of_desc(desc, action) { if ((action != new_action) && action->name && !strcmp(new_action->name, action->name)) { ret = 0; break; } } raw_spin_unlock_irqrestore(&desc->lock, flags); return ret; } void register_handler_proc(unsigned int irq, struct irqaction *action) { char name [MAX_NAMELEN]; struct irq_desc *desc = irq_to_desc(irq); if (!desc->dir || action->dir || !action->name || !name_unique(irq, action)) return; snprintf(name, MAX_NAMELEN, "%s", action->name); /* create /proc/irq/1234/handler/ */ action->dir = proc_mkdir(name, desc->dir); } #undef MAX_NAMELEN #define MAX_NAMELEN 10 void register_irq_proc(unsigned int irq, struct irq_desc *desc) { static DEFINE_MUTEX(register_lock); void __maybe_unused *irqp = (void *)(unsigned long) irq; char name [MAX_NAMELEN]; if (!root_irq_dir || (desc->irq_data.chip == &no_irq_chip)) return; /* * irq directories are registered only when a handler is * added, not when the descriptor is created, so multiple * tasks might try to register at the same time. 
*/ mutex_lock(®ister_lock); if (desc->dir) goto out_unlock; sprintf(name, "%d", irq); /* create /proc/irq/1234 */ desc->dir = proc_mkdir(name, root_irq_dir); if (!desc->dir) goto out_unlock; #ifdef CONFIG_SMP umode_t umode = S_IRUGO; if (irq_can_set_affinity_usr(desc->irq_data.irq)) umode |= S_IWUSR; /* create /proc/irq/<irq>/smp_affinity */ proc_create_data("smp_affinity", umode, desc->dir, &irq_affinity_proc_ops, irqp); /* create /proc/irq/<irq>/affinity_hint */ proc_create_single_data("affinity_hint", 0444, desc->dir, irq_affinity_hint_proc_show, irqp); /* create /proc/irq/<irq>/smp_affinity_list */ proc_create_data("smp_affinity_list", umode, desc->dir, &irq_affinity_list_proc_ops, irqp); proc_create_single_data("node", 0444, desc->dir, irq_node_proc_show, irqp); # ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK proc_create_single_data("effective_affinity", 0444, desc->dir, irq_effective_aff_proc_show, irqp); proc_create_single_data("effective_affinity_list", 0444, desc->dir, irq_effective_aff_list_proc_show, irqp); # endif #endif proc_create_single_data("spurious", 0444, desc->dir, irq_spurious_proc_show, (void *)(long)irq); out_unlock: mutex_unlock(®ister_lock); } void unregister_irq_proc(unsigned int irq, struct irq_desc *desc) { char name [MAX_NAMELEN]; if (!root_irq_dir || !desc->dir) return; #ifdef CONFIG_SMP remove_proc_entry("smp_affinity", desc->dir); remove_proc_entry("affinity_hint", desc->dir); remove_proc_entry("smp_affinity_list", desc->dir); remove_proc_entry("node", desc->dir); # ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK remove_proc_entry("effective_affinity", desc->dir); remove_proc_entry("effective_affinity_list", desc->dir); # endif #endif remove_proc_entry("spurious", desc->dir); sprintf(name, "%u", irq); remove_proc_entry(name, root_irq_dir); } #undef MAX_NAMELEN void unregister_handler_proc(unsigned int irq, struct irqaction *action) { proc_remove(action->dir); } static void register_default_affinity_proc(void) { #ifdef CONFIG_SMP proc_create("irq/default_smp_affinity", 0644, NULL, &default_affinity_proc_ops); #endif } void init_irq_proc(void) { unsigned int irq; struct irq_desc *desc; /* create /proc/irq */ root_irq_dir = proc_mkdir("irq", NULL); if (!root_irq_dir) return; register_default_affinity_proc(); /* * Create entries for all existing IRQs. */ for_each_irq_desc(irq, desc) register_irq_proc(irq, desc); } #ifdef CONFIG_GENERIC_IRQ_SHOW int __weak arch_show_interrupts(struct seq_file *p, int prec) { return 0; } #ifndef ACTUAL_NR_IRQS # define ACTUAL_NR_IRQS irq_get_nr_irqs() #endif int show_interrupts(struct seq_file *p, void *v) { const unsigned int nr_irqs = irq_get_nr_irqs(); static int prec; int i = *(loff_t *) v, j; struct irqaction *action; struct irq_desc *desc; unsigned long flags; if (i > ACTUAL_NR_IRQS) return 0; if (i == ACTUAL_NR_IRQS) return arch_show_interrupts(p, prec); /* print header and calculate the width of the first column */ if (i == 0) { for (prec = 3, j = 1000; prec < 10 && j <= nr_irqs; ++prec) j *= 10; seq_printf(p, "%*s", prec + 8, ""); for_each_online_cpu(j) seq_printf(p, "CPU%-8d", j); seq_putc(p, '\n'); } rcu_read_lock(); desc = irq_to_desc(i); if (!desc || irq_settings_is_hidden(desc)) goto outsparse; if (!desc->action || irq_desc_is_chained(desc) || !desc->kstat_irqs) goto outsparse; seq_printf(p, "%*d:", prec, i); for_each_online_cpu(j) { unsigned int cnt = desc->kstat_irqs ? 
per_cpu(desc->kstat_irqs->cnt, j) : 0; seq_put_decimal_ull_width(p, " ", cnt, 10); } seq_putc(p, ' '); raw_spin_lock_irqsave(&desc->lock, flags); if (desc->irq_data.chip) { if (desc->irq_data.chip->irq_print_chip) desc->irq_data.chip->irq_print_chip(&desc->irq_data, p); else if (desc->irq_data.chip->name) seq_printf(p, "%8s", desc->irq_data.chip->name); else seq_printf(p, "%8s", "-"); } else { seq_printf(p, "%8s", "None"); } if (desc->irq_data.domain) seq_printf(p, " %*lu", prec, desc->irq_data.hwirq); else seq_printf(p, " %*s", prec, ""); #ifdef CONFIG_GENERIC_IRQ_SHOW_LEVEL seq_printf(p, " %-8s", irqd_is_level_type(&desc->irq_data) ? "Level" : "Edge"); #endif if (desc->name) seq_printf(p, "-%-8s", desc->name); action = desc->action; if (action) { seq_printf(p, " %s", action->name); while ((action = action->next) != NULL) seq_printf(p, ", %s", action->name); } seq_putc(p, '\n'); raw_spin_unlock_irqrestore(&desc->lock, flags); outsparse: rcu_read_unlock(); return 0; } #endif |
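/*
 * Illustrative stand-alone sketch (not part of kernel/irq/proc.c): a
 * user-space view of the proc files registered above. The IRQ number and
 * hex mask are assumptions; writing smp_affinity requires root and fails
 * with EPERM when the affinity is not user-settable, which is what
 * write_irq_affinity() above enforces.
 */
#include <stdio.h>

int main(int argc, char **argv)
{
	const char *irq = argc > 1 ? argv[1] : "16";	/* hypothetical IRQ */
	const char *mask = argc > 2 ? argv[2] : "3";	/* hex mask: CPUs 0-1 */
	char path[64], buf[256];
	FILE *f;

	/* Write the affinity mask, mirroring write_irq_affinity(type=0). */
	snprintf(path, sizeof(path), "/proc/irq/%s/smp_affinity", irq);
	f = fopen(path, "w");
	if (!f || fprintf(f, "%s\n", mask) < 0)
		perror(path);
	if (f)
		fclose(f);

	/* Read back the list form served by irq_affinity_list_proc_show(). */
	snprintf(path, sizeof(path), "/proc/irq/%s/smp_affinity_list", irq);
	f = fopen(path, "r");
	if (f) {
		if (fgets(buf, sizeof(buf), f))
			printf("smp_affinity_list: %s", buf);
		fclose(f);
	}
	return 0;
}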
/* SPDX-License-Identifier: GPL-2.0-only */ /* * Landlock LSM - Credential hooks * * Copyright © 2019-2020 Mickaël Salaün <mic@digikod.net> * Copyright © 2019-2020 ANSSI */ #ifndef _SECURITY_LANDLOCK_CRED_H #define _SECURITY_LANDLOCK_CRED_H #include <linux/cred.h> #include <linux/init.h> #include <linux/rcupdate.h> #include "ruleset.h" #include "setup.h" struct landlock_cred_security { struct landlock_ruleset *domain; }; static inline struct landlock_cred_security * landlock_cred(const struct cred *cred) { return cred->security + landlock_blob_sizes.lbs_cred; } static inline struct landlock_ruleset *landlock_get_current_domain(void) { return landlock_cred(current_cred())->domain; } /* * The call needs to come from an RCU read-side critical section. */ static inline const struct landlock_ruleset * landlock_get_task_domain(const struct task_struct *const task) { return landlock_cred(__task_cred(task))->domain; } static inline bool landlocked(const struct task_struct *const task) { bool has_dom; if (task == current) return !!landlock_get_current_domain(); rcu_read_lock(); has_dom = !!landlock_get_task_domain(task); rcu_read_unlock(); return has_dom; } __init void landlock_add_cred_hooks(void); #endif /* _SECURITY_LANDLOCK_CRED_H */
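/*
 * Illustrative stand-alone sketch (not part of the Landlock sources):
 * landlock_cred() above relies on the LSM framework handing each security
 * module a fixed byte offset (lbs_cred) into the shared cred->security
 * allocation. The names and the offset value below are hypothetical and
 * only show that pointer arithmetic.
 */
#include <stdio.h>
#include <stdlib.h>

struct toy_blob_sizes { int lbs_cred; };
static const struct toy_blob_sizes toy_blob_sizes = { .lbs_cred = 16 }; /* assumed */

struct toy_cred_security { void *domain; };

/* Same shape as landlock_cred(): base pointer plus this module's offset. */
static struct toy_cred_security *toy_cred(void *security_blob)
{
	return (struct toy_cred_security *)((char *)security_blob +
					    toy_blob_sizes.lbs_cred);
}

int main(void)
{
	/* One allocation shared by all "modules"; ours starts at offset 16. */
	void *blob = calloc(1, 64);

	toy_cred(blob)->domain = (void *)0x1234;
	printf("%p\n", toy_cred(blob)->domain);
	free(blob);
	return 0;
}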
// SPDX-License-Identifier: GPL-2.0 /* * Lockless hierarchical page accounting & limiting * * Copyright (C) 2014 Red Hat, Inc., Johannes Weiner */ #include <linux/page_counter.h> #include <linux/atomic.h> #include <linux/kernel.h> #include <linux/string.h> #include <linux/sched.h> #include <linux/bug.h> #include <asm/page.h> static bool track_protection(struct page_counter *c) { return c->protection_support; } static void propagate_protected_usage(struct page_counter *c, unsigned long usage) { unsigned long protected, old_protected; long delta; if (!c->parent) return; protected = min(usage, READ_ONCE(c->min)); old_protected = atomic_long_read(&c->min_usage); if (protected != old_protected) { old_protected = atomic_long_xchg(&c->min_usage, protected); delta = protected - old_protected; if (delta) atomic_long_add(delta, &c->parent->children_min_usage); } protected = min(usage, READ_ONCE(c->low)); old_protected = atomic_long_read(&c->low_usage); if (protected != old_protected) { old_protected = atomic_long_xchg(&c->low_usage, protected); delta = protected - old_protected; if (delta) atomic_long_add(delta, &c->parent->children_low_usage); } } /** * page_counter_cancel - take pages out of the local counter * @counter: counter * @nr_pages: number of pages to cancel */ void page_counter_cancel(struct page_counter *counter, unsigned long nr_pages) { long new; new = atomic_long_sub_return(nr_pages, &counter->usage); /* More uncharges than charges?
*/ if (WARN_ONCE(new < 0, "page_counter underflow: %ld nr_pages=%lu\n", new, nr_pages)) { new = 0; atomic_long_set(&counter->usage, new); } if (track_protection(counter)) propagate_protected_usage(counter, new); } /** * page_counter_charge - hierarchically charge pages * @counter: counter * @nr_pages: number of pages to charge * * NOTE: This does not consider any configured counter limits. */ void page_counter_charge(struct page_counter *counter, unsigned long nr_pages) { struct page_counter *c; bool protection = track_protection(counter); for (c = counter; c; c = c->parent) { long new; new = atomic_long_add_return(nr_pages, &c->usage); if (protection) propagate_protected_usage(c, new); /* * This is indeed racy, but we can live with some * inaccuracy in the watermark. * * Notably, we have two watermarks to allow for both a globally * visible peak and one that can be reset at a smaller scope. * * Since we reset both watermarks when the global reset occurs, * we can guarantee that watermark >= local_watermark, so we * don't need to do both comparisons every time. * * On systems with branch predictors, the inner condition should * be almost free. */ if (new > READ_ONCE(c->local_watermark)) { WRITE_ONCE(c->local_watermark, new); if (new > READ_ONCE(c->watermark)) WRITE_ONCE(c->watermark, new); } } } /** * page_counter_try_charge - try to hierarchically charge pages * @counter: counter * @nr_pages: number of pages to charge * @fail: points first counter to hit its limit, if any * * Returns %true on success, or %false and @fail if the counter or one * of its ancestors has hit its configured limit. */ bool page_counter_try_charge(struct page_counter *counter, unsigned long nr_pages, struct page_counter **fail) { struct page_counter *c; bool protection = track_protection(counter); for (c = counter; c; c = c->parent) { long new; /* * Charge speculatively to avoid an expensive CAS. If * a bigger charge fails, it might falsely lock out a * racing smaller charge and send it into reclaim * early, but the error is limited to the difference * between the two sizes, which is less than 2M/4M in * case of a THP locking out a regular page charge. * * The atomic_long_add_return() implies a full memory * barrier between incrementing the count and reading * the limit. When racing with page_counter_set_max(), * we either see the new limit or the setter sees the * counter has changed and retries. */ new = atomic_long_add_return(nr_pages, &c->usage); if (new > c->max) { atomic_long_sub(nr_pages, &c->usage); /* * This is racy, but we can live with some * inaccuracy in the failcnt which is only used * to report stats. 
*/ data_race(c->failcnt++); *fail = c; goto failed; } if (protection) propagate_protected_usage(c, new); /* see comment on page_counter_charge */ if (new > READ_ONCE(c->local_watermark)) { WRITE_ONCE(c->local_watermark, new); if (new > READ_ONCE(c->watermark)) WRITE_ONCE(c->watermark, new); } } return true; failed: for (c = counter; c != *fail; c = c->parent) page_counter_cancel(c, nr_pages); return false; } /** * page_counter_uncharge - hierarchically uncharge pages * @counter: counter * @nr_pages: number of pages to uncharge */ void page_counter_uncharge(struct page_counter *counter, unsigned long nr_pages) { struct page_counter *c; for (c = counter; c; c = c->parent) page_counter_cancel(c, nr_pages); } /** * page_counter_set_max - set the maximum number of pages allowed * @counter: counter * @nr_pages: limit to set * * Returns 0 on success, -EBUSY if the current number of pages on the * counter already exceeds the specified limit. * * The caller must serialize invocations on the same counter. */ int page_counter_set_max(struct page_counter *counter, unsigned long nr_pages) { for (;;) { unsigned long old; long usage; /* * Update the limit while making sure that it's not * below the concurrently-changing counter value. * * The xchg implies two full memory barriers before * and after, so the read-swap-read is ordered and * ensures coherency with page_counter_try_charge(): * that function modifies the count before checking * the limit, so if it sees the old limit, we see the * modified counter and retry. */ usage = page_counter_read(counter); if (usage > nr_pages) return -EBUSY; old = xchg(&counter->max, nr_pages); if (page_counter_read(counter) <= usage || nr_pages >= old) return 0; counter->max = old; cond_resched(); } } /** * page_counter_set_min - set the amount of protected memory * @counter: counter * @nr_pages: value to set * * The caller must serialize invocations on the same counter. */ void page_counter_set_min(struct page_counter *counter, unsigned long nr_pages) { struct page_counter *c; WRITE_ONCE(counter->min, nr_pages); for (c = counter; c; c = c->parent) propagate_protected_usage(c, atomic_long_read(&c->usage)); } /** * page_counter_set_low - set the amount of protected memory * @counter: counter * @nr_pages: value to set * * The caller must serialize invocations on the same counter. */ void page_counter_set_low(struct page_counter *counter, unsigned long nr_pages) { struct page_counter *c; WRITE_ONCE(counter->low, nr_pages); for (c = counter; c; c = c->parent) propagate_protected_usage(c, atomic_long_read(&c->usage)); } /** * page_counter_memparse - memparse() for page counter limits * @buf: string to parse * @max: string meaning maximum possible value * @nr_pages: returns the result in number of pages * * Returns -EINVAL, or 0 and @nr_pages on success. @nr_pages will be * limited to %PAGE_COUNTER_MAX. */ int page_counter_memparse(const char *buf, const char *max, unsigned long *nr_pages) { char *end; u64 bytes; if (!strcmp(buf, max)) { *nr_pages = PAGE_COUNTER_MAX; return 0; } bytes = memparse(buf, &end); if (*end != '\0') return -EINVAL; *nr_pages = min(bytes / PAGE_SIZE, (u64)PAGE_COUNTER_MAX); return 0; } #if IS_ENABLED(CONFIG_MEMCG) || IS_ENABLED(CONFIG_CGROUP_DMEM) /* * This function calculates an individual page counter's effective * protection which is derived from its own memory.min/low, its * parent's and siblings' settings, as well as the actual memory * distribution in the tree. * * The following rules apply to the effective protection values: * * 1. 
At the first level of reclaim, effective protection is equal to * the declared protection in memory.min and memory.low. * * 2. To enable safe delegation of the protection configuration, at * subsequent levels the effective protection is capped to the * parent's effective protection. * * 3. To make complex and dynamic subtrees easier to configure, the * user is allowed to overcommit the declared protection at a given * level. If that is the case, the parent's effective protection is * distributed to the children in proportion to how much protection * they have declared and how much of it they are utilizing. * * This makes distribution proportional, but also work-conserving: * if one counter claims much more protection than it uses memory, * the unused remainder is available to its siblings. * * 4. Conversely, when the declared protection is undercommitted at a * given level, the distribution of the larger parental protection * budget is NOT proportional. A counter's protection from a sibling * is capped to its own memory.min/low setting. * * 5. However, to allow protecting recursive subtrees from each other * without having to declare each individual counter's fixed share * of the ancestor's claim to protection, any unutilized - * "floating" - protection from up the tree is distributed in * proportion to each counter's *usage*. This makes the protection * neutral wrt sibling cgroups and lets them compete freely over * the shared parental protection budget, but it protects the * subtree as a whole from neighboring subtrees. * * Note that 4. and 5. are not in conflict: 4. is about protecting * against immediate siblings whereas 5. is about protecting against * neighboring subtrees. */ static unsigned long effective_protection(unsigned long usage, unsigned long parent_usage, unsigned long setting, unsigned long parent_effective, unsigned long siblings_protected, bool recursive_protection) { unsigned long protected; unsigned long ep; protected = min(usage, setting); /* * If all cgroups at this level combined claim and use more * protection than what the parent affords them, distribute * shares in proportion to utilization. * * We are using actual utilization rather than the statically * claimed protection in order to be work-conserving: claimed * but unused protection is available to siblings that would * otherwise get a smaller chunk than what they claimed. */ if (siblings_protected > parent_effective) return protected * parent_effective / siblings_protected; /* * Ok, utilized protection of all children is within what the * parent affords them, so we know whatever this child claims * and utilizes is effectively protected. * * If there is unprotected usage beyond this value, reclaim * will apply pressure in proportion to that amount. * * If there is unutilized protection, the cgroup will be fully * shielded from reclaim, but we do return a smaller value for * protection than what the group could enjoy in theory. This * is okay. With the overcommit distribution above, effective * protection is always dependent on how memory is actually * consumed among the siblings anyway. */ ep = protected; /* * If the children aren't claiming (all of) the protection * afforded to them by the parent, distribute the remainder in * proportion to the (unprotected) memory of each cgroup. That * way, cgroups that aren't explicitly prioritized wrt each * other compete freely over the allowance, but they are * collectively protected from neighboring trees. 
* * We're using unprotected memory for the weight so that if * some cgroups DO claim explicit protection, we don't protect * the same bytes twice. * * Check both usage and parent_usage against the respective * protected values. One should imply the other, but they * aren't read atomically - make sure the division is sane. */ if (!recursive_protection) return ep; if (parent_effective > siblings_protected && parent_usage > siblings_protected && usage > protected) { unsigned long unclaimed; unclaimed = parent_effective - siblings_protected; unclaimed *= usage - protected; unclaimed /= parent_usage - siblings_protected; ep += unclaimed; } return ep; } /** * page_counter_calculate_protection - check if memory consumption is in the normal range * @root: the top ancestor of the sub-tree being checked * @counter: the page_counter the counter to update * @recursive_protection: Whether to use memory_recursiveprot behavior. * * Calculates elow/emin thresholds for given page_counter. * * WARNING: This function is not stateless! It can only be used as part * of a top-down tree iteration, not for isolated queries. */ void page_counter_calculate_protection(struct page_counter *root, struct page_counter *counter, bool recursive_protection) { unsigned long usage, parent_usage; struct page_counter *parent = counter->parent; /* * Effective values of the reclaim targets are ignored so they * can be stale. Have a look at mem_cgroup_protection for more * details. * TODO: calculation should be more robust so that we do not need * that special casing. */ if (root == counter) return; usage = page_counter_read(counter); if (!usage) return; if (parent == root) { counter->emin = READ_ONCE(counter->min); counter->elow = READ_ONCE(counter->low); return; } parent_usage = page_counter_read(parent); WRITE_ONCE(counter->emin, effective_protection(usage, parent_usage, READ_ONCE(counter->min), READ_ONCE(parent->emin), atomic_long_read(&parent->children_min_usage), recursive_protection)); WRITE_ONCE(counter->elow, effective_protection(usage, parent_usage, READ_ONCE(counter->low), READ_ONCE(parent->elow), atomic_long_read(&parent->children_low_usage), recursive_protection)); } #endif /* CONFIG_MEMCG || CONFIG_CGROUP_DMEM */ |
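/*
 * Illustrative stand-alone sketch (not part of mm/page_counter.c): a numeric
 * walk through the overcommit branch of effective_protection() above. When
 * the children collectively claim and utilize more protection than the
 * parent affords, each child's effective protection is scaled down in
 * proportion to its utilized claim. The page counts are assumptions chosen
 * only to make the arithmetic visible.
 */
#include <stdio.h>

static unsigned long toy_effective_protection(unsigned long usage,
					      unsigned long setting,
					      unsigned long parent_effective,
					      unsigned long siblings_protected)
{
	unsigned long protected = usage < setting ? usage : setting;

	if (siblings_protected > parent_effective)
		return protected * parent_effective / siblings_protected;
	return protected;
}

int main(void)
{
	/*
	 * Parent affords 100 pages; its children together utilize 200 pages
	 * of claimed protection; this child uses 80 of its 120-page claim.
	 * Effective protection: 80 * 100 / 200 = 40 pages.
	 */
	printf("%lu\n", toy_effective_protection(80, 120, 100, 200));
	return 0;
}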
// SPDX-License-Identifier: GPL-2.0-only /* * QNX4 file system, Linux implementation. * * Version : 0.2.1 * * Using parts of the xiafs filesystem. * * History : * * 01-06-1998 by Richard Frowijn : first release. * 20-06-1998 by Frank Denis : Linux 2.1.99+ support, boot signature, misc. * 30-06-1998 by Frank Denis : first step to write inodes.
*/ #include <linux/module.h> #include <linux/init.h> #include <linux/slab.h> #include <linux/highuid.h> #include <linux/pagemap.h> #include <linux/buffer_head.h> #include <linux/writeback.h> #include <linux/statfs.h> #include <linux/fs_context.h> #include "qnx4.h" #define QNX4_VERSION 4 #define QNX4_BMNAME ".bitmap" static const struct super_operations qnx4_sops; static struct inode *qnx4_alloc_inode(struct super_block *sb); static void qnx4_free_inode(struct inode *inode); static int qnx4_statfs(struct dentry *, struct kstatfs *); static int qnx4_get_tree(struct fs_context *fc); static const struct super_operations qnx4_sops = { .alloc_inode = qnx4_alloc_inode, .free_inode = qnx4_free_inode, .statfs = qnx4_statfs, }; static int qnx4_reconfigure(struct fs_context *fc) { struct super_block *sb = fc->root->d_sb; struct qnx4_sb_info *qs; sync_filesystem(sb); qs = qnx4_sb(sb); qs->Version = QNX4_VERSION; fc->sb_flags |= SB_RDONLY; return 0; } static const struct fs_context_operations qnx4_context_opts = { .get_tree = qnx4_get_tree, .reconfigure = qnx4_reconfigure, }; static int qnx4_get_block( struct inode *inode, sector_t iblock, struct buffer_head *bh, int create ) { unsigned long phys; QNX4DEBUG((KERN_INFO "qnx4: qnx4_get_block inode=[%ld] iblock=[%ld]\n",inode->i_ino,iblock)); phys = qnx4_block_map( inode, iblock ); if ( phys ) { // logical block is before EOF map_bh(bh, inode->i_sb, phys); } return 0; } static inline u32 try_extent(qnx4_xtnt_t *extent, u32 *offset) { u32 size = le32_to_cpu(extent->xtnt_size); if (*offset < size) return le32_to_cpu(extent->xtnt_blk) + *offset - 1; *offset -= size; return 0; } unsigned long qnx4_block_map( struct inode *inode, long iblock ) { int ix; long i_xblk; struct buffer_head *bh = NULL; struct qnx4_xblk *xblk = NULL; struct qnx4_inode_entry *qnx4_inode = qnx4_raw_inode(inode); u16 nxtnt = le16_to_cpu(qnx4_inode->di_num_xtnts); u32 offset = iblock; u32 block = try_extent(&qnx4_inode->di_first_xtnt, &offset); if (block) { // iblock is in the first extent. This is easy. } else { // iblock is beyond first extent. We have to follow the extent chain. i_xblk = le32_to_cpu(qnx4_inode->di_xblk); ix = 0; while ( --nxtnt > 0 ) { if ( ix == 0 ) { // read next xtnt block. bh = sb_bread(inode->i_sb, i_xblk - 1); if ( !bh ) { QNX4DEBUG((KERN_ERR "qnx4: I/O error reading xtnt block [%ld])\n", i_xblk - 1)); return -EIO; } xblk = (struct qnx4_xblk*)bh->b_data; if ( memcmp( xblk->xblk_signature, "IamXblk", 7 ) ) { QNX4DEBUG((KERN_ERR "qnx4: block at %ld is not a valid xtnt\n", qnx4_inode->i_xblk)); return -EIO; } } block = try_extent(&xblk->xblk_xtnts[ix], &offset); if (block) { // got it! break; } if ( ++ix >= xblk->xblk_num_xtnts ) { i_xblk = le32_to_cpu(xblk->xblk_next_xblk); ix = 0; brelse( bh ); bh = NULL; } } if ( bh ) brelse( bh ); } QNX4DEBUG((KERN_INFO "qnx4: mapping block %ld of inode %ld = %ld\n",iblock,inode->i_ino,block)); return block; } static int qnx4_statfs(struct dentry *dentry, struct kstatfs *buf) { struct super_block *sb = dentry->d_sb; u64 id = huge_encode_dev(sb->s_bdev->bd_dev); buf->f_type = sb->s_magic; buf->f_bsize = sb->s_blocksize; buf->f_blocks = le32_to_cpu(qnx4_sb(sb)->BitMap->di_size) * 8; buf->f_bfree = qnx4_count_free_blocks(sb); buf->f_bavail = buf->f_bfree; buf->f_namelen = QNX4_NAME_MAX; buf->f_fsid = u64_to_fsid(id); return 0; } /* * Check the root directory of the filesystem to make sure * it really _is_ a qnx4 filesystem, and to check the size * of the directory entry. 
*/ static const char *qnx4_checkroot(struct super_block *sb, struct qnx4_super_block *s) { struct buffer_head *bh; struct qnx4_inode_entry *rootdir; int rd, rl; int i, j; if (s->RootDir.di_fname[0] != '/' || s->RootDir.di_fname[1] != '\0') return "no qnx4 filesystem (no root dir)."; QNX4DEBUG((KERN_NOTICE "QNX4 filesystem found on dev %s.\n", sb->s_id)); rd = le32_to_cpu(s->RootDir.di_first_xtnt.xtnt_blk) - 1; rl = le32_to_cpu(s->RootDir.di_first_xtnt.xtnt_size); for (j = 0; j < rl; j++) { bh = sb_bread(sb, rd + j); /* root dir, first block */ if (bh == NULL) return "unable to read root entry."; rootdir = (struct qnx4_inode_entry *) bh->b_data; for (i = 0; i < QNX4_INODES_PER_BLOCK; i++, rootdir++) { QNX4DEBUG((KERN_INFO "rootdir entry found : [%s]\n", rootdir->di_fname)); if (strcmp(rootdir->di_fname, QNX4_BMNAME) != 0) continue; qnx4_sb(sb)->BitMap = kmemdup(rootdir, sizeof(struct qnx4_inode_entry), GFP_KERNEL); brelse(bh); if (!qnx4_sb(sb)->BitMap) return "not enough memory for bitmap inode"; /* keep bitmap inode known */ return NULL; } brelse(bh); } return "bitmap file not found."; } static int qnx4_fill_super(struct super_block *s, struct fs_context *fc) { struct buffer_head *bh; struct inode *root; const char *errmsg; struct qnx4_sb_info *qs; int silent = fc->sb_flags & SB_SILENT; qs = kzalloc(sizeof(struct qnx4_sb_info), GFP_KERNEL); if (!qs) return -ENOMEM; s->s_fs_info = qs; sb_set_blocksize(s, QNX4_BLOCK_SIZE); s->s_op = &qnx4_sops; s->s_magic = QNX4_SUPER_MAGIC; s->s_flags |= SB_RDONLY; /* Yup, read-only yet */ s->s_time_min = 0; s->s_time_max = U32_MAX; /* Check the superblock signature. Since the qnx4 code is dangerous, we should leave as quickly as possible if we don't belong here... */ bh = sb_bread(s, 1); if (!bh) { printk(KERN_ERR "qnx4: unable to read the superblock\n"); return -EINVAL; } /* check before allocating dentries, inodes, .. */ errmsg = qnx4_checkroot(s, (struct qnx4_super_block *) bh->b_data); brelse(bh); if (errmsg != NULL) { if (!silent) printk(KERN_ERR "qnx4: %s\n", errmsg); return -EINVAL; } /* does root not have inode number QNX4_ROOT_INO ?? 
*/ root = qnx4_iget(s, QNX4_ROOT_INO * QNX4_INODES_PER_BLOCK); if (IS_ERR(root)) { printk(KERN_ERR "qnx4: get inode failed\n"); return PTR_ERR(root); } s->s_root = d_make_root(root); if (s->s_root == NULL) return -ENOMEM; return 0; } static int qnx4_get_tree(struct fs_context *fc) { return get_tree_bdev(fc, qnx4_fill_super); } static int qnx4_init_fs_context(struct fs_context *fc) { fc->ops = &qnx4_context_opts; return 0; } static void qnx4_kill_sb(struct super_block *sb) { struct qnx4_sb_info *qs = qnx4_sb(sb); kill_block_super(sb); if (qs) { kfree(qs->BitMap); kfree(qs); } } static int qnx4_read_folio(struct file *file, struct folio *folio) { return block_read_full_folio(folio, qnx4_get_block); } static sector_t qnx4_bmap(struct address_space *mapping, sector_t block) { return generic_block_bmap(mapping,block,qnx4_get_block); } static const struct address_space_operations qnx4_aops = { .read_folio = qnx4_read_folio, .bmap = qnx4_bmap }; struct inode *qnx4_iget(struct super_block *sb, unsigned long ino) { struct buffer_head *bh; struct qnx4_inode_entry *raw_inode; int block; struct qnx4_inode_entry *qnx4_inode; struct inode *inode; inode = iget_locked(sb, ino); if (!inode) return ERR_PTR(-ENOMEM); if (!(inode->i_state & I_NEW)) return inode; qnx4_inode = qnx4_raw_inode(inode); inode->i_mode = 0; QNX4DEBUG((KERN_INFO "reading inode : [%d]\n", ino)); if (!ino) { printk(KERN_ERR "qnx4: bad inode number on dev %s: %lu is " "out of range\n", sb->s_id, ino); iget_failed(inode); return ERR_PTR(-EIO); } block = ino / QNX4_INODES_PER_BLOCK; if (!(bh = sb_bread(sb, block))) { printk(KERN_ERR "qnx4: major problem: unable to read inode from dev " "%s\n", sb->s_id); iget_failed(inode); return ERR_PTR(-EIO); } raw_inode = ((struct qnx4_inode_entry *) bh->b_data) + (ino % QNX4_INODES_PER_BLOCK); inode->i_mode = le16_to_cpu(raw_inode->di_mode); i_uid_write(inode, (uid_t)le16_to_cpu(raw_inode->di_uid)); i_gid_write(inode, (gid_t)le16_to_cpu(raw_inode->di_gid)); set_nlink(inode, le16_to_cpu(raw_inode->di_nlink)); inode->i_size = le32_to_cpu(raw_inode->di_size); inode_set_mtime(inode, le32_to_cpu(raw_inode->di_mtime), 0); inode_set_atime(inode, le32_to_cpu(raw_inode->di_atime), 0); inode_set_ctime(inode, le32_to_cpu(raw_inode->di_ctime), 0); inode->i_blocks = le32_to_cpu(raw_inode->di_first_xtnt.xtnt_size); memcpy(qnx4_inode, raw_inode, QNX4_DIR_ENTRY_SIZE); if (S_ISREG(inode->i_mode)) { inode->i_fop = &generic_ro_fops; inode->i_mapping->a_ops = &qnx4_aops; qnx4_i(inode)->mmu_private = inode->i_size; } else if (S_ISDIR(inode->i_mode)) { inode->i_op = &qnx4_dir_inode_operations; inode->i_fop = &qnx4_dir_operations; } else if (S_ISLNK(inode->i_mode)) { inode->i_op = &page_symlink_inode_operations; inode_nohighmem(inode); inode->i_mapping->a_ops = &qnx4_aops; qnx4_i(inode)->mmu_private = inode->i_size; } else { printk(KERN_ERR "qnx4: bad inode %lu on dev %s\n", ino, sb->s_id); iget_failed(inode); brelse(bh); return ERR_PTR(-EIO); } brelse(bh); unlock_new_inode(inode); return inode; } static struct kmem_cache *qnx4_inode_cachep; static struct inode *qnx4_alloc_inode(struct super_block *sb) { struct qnx4_inode_info *ei; ei = alloc_inode_sb(sb, qnx4_inode_cachep, GFP_KERNEL); if (!ei) return NULL; return &ei->vfs_inode; } static void qnx4_free_inode(struct inode *inode) { kmem_cache_free(qnx4_inode_cachep, qnx4_i(inode)); } static void init_once(void *foo) { struct qnx4_inode_info *ei = (struct qnx4_inode_info *) foo; inode_init_once(&ei->vfs_inode); } static int init_inodecache(void) { qnx4_inode_cachep = 
kmem_cache_create("qnx4_inode_cache", sizeof(struct qnx4_inode_info), 0, (SLAB_RECLAIM_ACCOUNT| SLAB_ACCOUNT), init_once); if (qnx4_inode_cachep == NULL) return -ENOMEM; return 0; } static void destroy_inodecache(void) { /* * Make sure all delayed rcu free inodes are flushed before we * destroy cache. */ rcu_barrier(); kmem_cache_destroy(qnx4_inode_cachep); } static struct file_system_type qnx4_fs_type = { .owner = THIS_MODULE, .name = "qnx4", .kill_sb = qnx4_kill_sb, .fs_flags = FS_REQUIRES_DEV, .init_fs_context = qnx4_init_fs_context, }; MODULE_ALIAS_FS("qnx4"); static int __init init_qnx4_fs(void) { int err; err = init_inodecache(); if (err) return err; err = register_filesystem(&qnx4_fs_type); if (err) { destroy_inodecache(); return err; } printk(KERN_INFO "QNX4 filesystem 0.2.3 registered.\n"); return 0; } static void __exit exit_qnx4_fs(void) { unregister_filesystem(&qnx4_fs_type); destroy_inodecache(); } module_init(init_qnx4_fs) module_exit(exit_qnx4_fs) MODULE_DESCRIPTION("QNX4 file system"); MODULE_LICENSE("GPL"); |
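/*
 * Illustrative stand-alone sketch (not part of fs/qnx4/inode.c): the extent
 * walk done by try_extent()/qnx4_block_map() above. Each extent covers
 * xtnt_size consecutive blocks starting at the 1-based on-disk block
 * xtnt_blk (hence the "- 1" when converting to the 0-based block-layer
 * number); offsets that fall past an extent carry over into the next one.
 * The extent chain below is made up for illustration.
 */
#include <stdio.h>

struct toy_extent { unsigned int blk, size; };

static unsigned int toy_try_extent(const struct toy_extent *e, unsigned int *offset)
{
	if (*offset < e->size)
		return e->blk + *offset - 1;
	*offset -= e->size;
	return 0;
}

int main(void)
{
	struct toy_extent chain[] = { { 50, 10 }, { 200, 20 } };	/* hypothetical */
	unsigned int offset = 13;	/* logical block 13 of the file */
	unsigned int block = 0;
	int i;

	for (i = 0; i < 2 && !block; i++)
		block = toy_try_extent(&chain[i], &offset);
	printf("physical block %u\n", block);	/* 200 + 3 - 1 = 202 */
	return 0;
}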
// SPDX-License-Identifier: GPL-2.0+ /* * Copyright (C) 2001-2004 by David Brownell */ /* this file is part of ehci-hcd.c */ /*-------------------------------------------------------------------------*/ /* * EHCI Root Hub ... the nonsharable stuff * * Registers don't need cpu_to_le32, that happens transparently */ /*-------------------------------------------------------------------------*/ #define PORT_WAKE_BITS (PORT_WKOC_E|PORT_WKDISC_E|PORT_WKCONN_E) #ifdef CONFIG_PM static void unlink_empty_async_suspended(struct ehci_hcd *ehci); static int persist_enabled_on_companion(struct usb_device *udev, void *unused) { return !udev->maxchild && udev->persist_enabled && udev->bus->root_hub->speed < USB_SPEED_HIGH; } /* After a power loss, ports that were owned by the companion must be * reset so that the companion can still own them. */ static void ehci_handover_companion_ports(struct ehci_hcd *ehci) { u32 __iomem *reg; u32 status; int port; __le32 buf; struct usb_hcd *hcd = ehci_to_hcd(ehci); if (!ehci->owned_ports) return; /* * USB 1.1 devices are mostly HIDs, which don't need to persist across * suspends. If we ensure that none of our companion's devices have * persist_enabled (by looking through all USB 1.1 buses in the system), * we can skip this and avoid slowing resume down. Devices without * persist will just get reenumerated shortly after resume anyway.
*/ if (!usb_for_each_dev(NULL, persist_enabled_on_companion)) return; /* Make sure the ports are powered */ port = HCS_N_PORTS(ehci->hcs_params); while (port--) { if (test_bit(port, &ehci->owned_ports)) { reg = &ehci->regs->port_status[port]; status = ehci_readl(ehci, reg) & ~PORT_RWC_BITS; if (!(status & PORT_POWER)) ehci_port_power(ehci, port, true); } } /* Give the connections some time to appear */ msleep(20); spin_lock_irq(&ehci->lock); port = HCS_N_PORTS(ehci->hcs_params); while (port--) { if (test_bit(port, &ehci->owned_ports)) { reg = &ehci->regs->port_status[port]; status = ehci_readl(ehci, reg) & ~PORT_RWC_BITS; /* Port already owned by companion? */ if (status & PORT_OWNER) clear_bit(port, &ehci->owned_ports); else if (test_bit(port, &ehci->companion_ports)) ehci_writel(ehci, status & ~PORT_PE, reg); else { spin_unlock_irq(&ehci->lock); ehci_hub_control(hcd, SetPortFeature, USB_PORT_FEAT_RESET, port + 1, NULL, 0); spin_lock_irq(&ehci->lock); } } } spin_unlock_irq(&ehci->lock); if (!ehci->owned_ports) return; msleep(90); /* Wait for resets to complete */ spin_lock_irq(&ehci->lock); port = HCS_N_PORTS(ehci->hcs_params); while (port--) { if (test_bit(port, &ehci->owned_ports)) { spin_unlock_irq(&ehci->lock); ehci_hub_control(hcd, GetPortStatus, 0, port + 1, (char *) &buf, sizeof(buf)); spin_lock_irq(&ehci->lock); /* The companion should now own the port, * but if something went wrong the port must not * remain enabled. */ reg = &ehci->regs->port_status[port]; status = ehci_readl(ehci, reg) & ~PORT_RWC_BITS; if (status & PORT_OWNER) ehci_writel(ehci, status | PORT_CSC, reg); else { ehci_dbg(ehci, "failed handover port %d: %x\n", port + 1, status); ehci_writel(ehci, status & ~PORT_PE, reg); } } } ehci->owned_ports = 0; spin_unlock_irq(&ehci->lock); } static int ehci_port_change(struct ehci_hcd *ehci) { int i = HCS_N_PORTS(ehci->hcs_params); /* First check if the controller indicates a change event */ if (ehci_readl(ehci, &ehci->regs->status) & STS_PCD) return 1; /* * Not all controllers appear to update this while going from D3 to D0, * so check the individual port status registers as well */ while (i--) if (ehci_readl(ehci, &ehci->regs->port_status[i]) & PORT_CSC) return 1; return 0; } void ehci_adjust_port_wakeup_flags(struct ehci_hcd *ehci, bool suspending, bool do_wakeup) { int port; u32 temp; /* If remote wakeup is enabled for the root hub but disabled * for the controller, we must adjust all the port wakeup flags * when the controller is suspended or resumed. In all other * cases they don't need to be changed. */ if (!ehci_to_hcd(ehci)->self.root_hub->do_remote_wakeup || do_wakeup) return; spin_lock_irq(&ehci->lock); /* clear phy low-power mode before changing wakeup flags */ if (ehci->has_tdi_phy_lpm) { port = HCS_N_PORTS(ehci->hcs_params); while (port--) { u32 __iomem *hostpc_reg = &ehci->regs->hostpc[port]; temp = ehci_readl(ehci, hostpc_reg); ehci_writel(ehci, temp & ~HOSTPC_PHCD, hostpc_reg); } spin_unlock_irq(&ehci->lock); msleep(5); spin_lock_irq(&ehci->lock); } port = HCS_N_PORTS(ehci->hcs_params); while (port--) { u32 __iomem *reg = &ehci->regs->port_status[port]; u32 t1 = ehci_readl(ehci, reg) & ~PORT_RWC_BITS; u32 t2 = t1 & ~PORT_WAKE_BITS; /* If we are suspending the controller, clear the flags. * If we are resuming the controller, set the wakeup flags. 
*/ if (!suspending) { if (t1 & PORT_CONNECT) t2 |= PORT_WKOC_E | PORT_WKDISC_E; else t2 |= PORT_WKOC_E | PORT_WKCONN_E; } ehci_writel(ehci, t2, reg); } /* enter phy low-power mode again */ if (ehci->has_tdi_phy_lpm) { port = HCS_N_PORTS(ehci->hcs_params); while (port--) { u32 __iomem *hostpc_reg = &ehci->regs->hostpc[port]; temp = ehci_readl(ehci, hostpc_reg); ehci_writel(ehci, temp | HOSTPC_PHCD, hostpc_reg); } } /* Does the root hub have a port wakeup pending? */ if (!suspending && ehci_port_change(ehci)) usb_hcd_resume_root_hub(ehci_to_hcd(ehci)); spin_unlock_irq(&ehci->lock); } EXPORT_SYMBOL_GPL(ehci_adjust_port_wakeup_flags); static int ehci_bus_suspend (struct usb_hcd *hcd) { struct ehci_hcd *ehci = hcd_to_ehci (hcd); int port; int mask; int changed; bool fs_idle_delay; ehci_dbg(ehci, "suspend root hub\n"); if (time_before (jiffies, ehci->next_statechange)) msleep(5); /* stop the schedules */ ehci_quiesce(ehci); spin_lock_irq (&ehci->lock); if (ehci->rh_state < EHCI_RH_RUNNING) goto done; /* Once the controller is stopped, port resumes that are already * in progress won't complete. Hence if remote wakeup is enabled * for the root hub and any ports are in the middle of a resume or * remote wakeup, we must fail the suspend. */ if (hcd->self.root_hub->do_remote_wakeup) { if (ehci->resuming_ports) { spin_unlock_irq(&ehci->lock); ehci_dbg(ehci, "suspend failed because a port is resuming\n"); return -EBUSY; } } /* Unlike other USB host controller types, EHCI doesn't have * any notion of "global" or bus-wide suspend. The driver has * to manually suspend all the active unsuspended ports, and * then manually resume them in the bus_resume() routine. */ ehci->bus_suspended = 0; ehci->owned_ports = 0; changed = 0; fs_idle_delay = false; port = HCS_N_PORTS(ehci->hcs_params); while (port--) { u32 __iomem *reg = &ehci->regs->port_status [port]; u32 t1 = ehci_readl(ehci, reg) & ~PORT_RWC_BITS; u32 t2 = t1 & ~PORT_WAKE_BITS; /* keep track of which ports we suspend */ if (t1 & PORT_OWNER) set_bit(port, &ehci->owned_ports); else if ((t1 & PORT_PE) && !(t1 & PORT_SUSPEND)) { t2 |= PORT_SUSPEND; set_bit(port, &ehci->bus_suspended); } /* enable remote wakeup on all ports, if told to do so */ if (hcd->self.root_hub->do_remote_wakeup) { /* only enable appropriate wake bits, otherwise the * hardware can not go phy low power mode. If a race * condition happens here(connection change during bits * set), the port change detection will finally fix it. */ if (t1 & PORT_CONNECT) t2 |= PORT_WKOC_E | PORT_WKDISC_E; else t2 |= PORT_WKOC_E | PORT_WKCONN_E; } if (t1 != t2) { /* * On some controllers, Wake-On-Disconnect will * generate false wakeup signals until the bus * switches over to full-speed idle. For their * sake, add a delay if we need one. */ if ((t2 & PORT_WKDISC_E) && ehci_port_speed(ehci, t2) == USB_PORT_STAT_HIGH_SPEED) fs_idle_delay = true; ehci_writel(ehci, t2, reg); changed = 1; } } spin_unlock_irq(&ehci->lock); if (changed && ehci_has_fsl_susp_errata(ehci)) /* * Wait for at least 10 millisecondes to ensure the controller * enter the suspend status before initiating a port resume * using the Force Port Resume bit (Not-EHCI compatible). */ usleep_range(10000, 20000); if ((changed && ehci->has_tdi_phy_lpm) || fs_idle_delay) { /* * Wait for HCD to enter low-power mode or for the bus * to switch to full-speed idle. 
*/ usleep_range(5000, 5500); } if (changed && ehci->has_tdi_phy_lpm) { spin_lock_irq(&ehci->lock); port = HCS_N_PORTS(ehci->hcs_params); while (port--) { u32 __iomem *hostpc_reg = &ehci->regs->hostpc[port]; u32 t3; t3 = ehci_readl(ehci, hostpc_reg); ehci_writel(ehci, t3 | HOSTPC_PHCD, hostpc_reg); t3 = ehci_readl(ehci, hostpc_reg); ehci_dbg(ehci, "Port %d phy low-power mode %s\n", port, (t3 & HOSTPC_PHCD) ? "succeeded" : "failed"); } spin_unlock_irq(&ehci->lock); } /* Apparently some devices need a >= 1-uframe delay here */ if (ehci->bus_suspended) udelay(150); /* turn off now-idle HC */ ehci_halt (ehci); spin_lock_irq(&ehci->lock); if (ehci->enabled_hrtimer_events & BIT(EHCI_HRTIMER_POLL_DEAD)) ehci_handle_controller_death(ehci); if (ehci->rh_state != EHCI_RH_RUNNING) goto done; ehci->rh_state = EHCI_RH_SUSPENDED; unlink_empty_async_suspended(ehci); /* Some Synopsys controllers mistakenly leave IAA turned on */ ehci_writel(ehci, STS_IAA, &ehci->regs->status); /* Any IAA cycle that started before the suspend is now invalid */ end_iaa_cycle(ehci); ehci_handle_start_intr_unlinks(ehci); ehci_handle_intr_unlinks(ehci); end_free_itds(ehci); /* allow remote wakeup */ mask = INTR_MASK; if (!hcd->self.root_hub->do_remote_wakeup) mask &= ~STS_PCD; ehci_writel(ehci, mask, &ehci->regs->intr_enable); ehci_readl(ehci, &ehci->regs->intr_enable); done: ehci->next_statechange = jiffies + msecs_to_jiffies(10); ehci->enabled_hrtimer_events = 0; ehci->next_hrtimer_event = EHCI_HRTIMER_NO_EVENT; spin_unlock_irq (&ehci->lock); hrtimer_cancel(&ehci->hrtimer); return 0; } /* caller has locked the root hub, and should reset/reinit on error */ static int ehci_bus_resume (struct usb_hcd *hcd) { struct ehci_hcd *ehci = hcd_to_ehci (hcd); u32 temp; u32 power_okay; int i; unsigned long resume_needed = 0; if (time_before (jiffies, ehci->next_statechange)) msleep(5); spin_lock_irq (&ehci->lock); if (!HCD_HW_ACCESSIBLE(hcd) || ehci->shutdown) goto shutdown; if (unlikely(ehci->debug)) { if (!dbgp_reset_prep(hcd)) ehci->debug = NULL; else dbgp_external_startup(hcd); } /* Ideally and we've got a real resume here, and no port's power * was lost. (For PCI, that means Vaux was maintained.) But we * could instead be restoring a swsusp snapshot -- so that BIOS was * the last user of the controller, not reset/pm hardware keeping * state we gave to it. */ power_okay = ehci_readl(ehci, &ehci->regs->intr_enable); ehci_dbg(ehci, "resume root hub%s\n", power_okay ? "" : " after power loss"); /* at least some APM implementations will try to deliver * IRQs right away, so delay them until we're ready. */ ehci_writel(ehci, 0, &ehci->regs->intr_enable); /* re-init operational registers */ ehci_writel(ehci, 0, &ehci->regs->segment); ehci_writel(ehci, ehci->periodic_dma, &ehci->regs->frame_list); ehci_writel(ehci, (u32) ehci->async->qh_dma, &ehci->regs->async_next); /* restore CMD_RUN, framelist size, and irq threshold */ ehci->command |= CMD_RUN; ehci_writel(ehci, ehci->command, &ehci->regs->command); ehci->rh_state = EHCI_RH_RUNNING; /* * According to Bugzilla #8190, the port status for some controllers * will be wrong without a delay. At their wrong status, the port * is enabled, but not suspended neither resumed. 
*/ i = HCS_N_PORTS(ehci->hcs_params); while (i--) { temp = ehci_readl(ehci, &ehci->regs->port_status[i]); if ((temp & PORT_PE) && !(temp & (PORT_SUSPEND | PORT_RESUME))) { ehci_dbg(ehci, "Port status(0x%x) is wrong\n", temp); spin_unlock_irq(&ehci->lock); msleep(8); spin_lock_irq(&ehci->lock); break; } } if (ehci->shutdown) goto shutdown; /* clear phy low-power mode before resume */ if (ehci->bus_suspended && ehci->has_tdi_phy_lpm) { i = HCS_N_PORTS(ehci->hcs_params); while (i--) { if (test_bit(i, &ehci->bus_suspended)) { u32 __iomem *hostpc_reg = &ehci->regs->hostpc[i]; temp = ehci_readl(ehci, hostpc_reg); ehci_writel(ehci, temp & ~HOSTPC_PHCD, hostpc_reg); } } spin_unlock_irq(&ehci->lock); msleep(5); spin_lock_irq(&ehci->lock); if (ehci->shutdown) goto shutdown; } /* manually resume the ports we suspended during bus_suspend() */ i = HCS_N_PORTS (ehci->hcs_params); while (i--) { temp = ehci_readl(ehci, &ehci->regs->port_status [i]); temp &= ~(PORT_RWC_BITS | PORT_WAKE_BITS); if (test_bit(i, &ehci->bus_suspended) && (temp & PORT_SUSPEND)) { temp |= PORT_RESUME; set_bit(i, &resume_needed); } ehci_writel(ehci, temp, &ehci->regs->port_status [i]); } /* * msleep for USB_RESUME_TIMEOUT ms only if code is trying to resume * port */ if (resume_needed) { spin_unlock_irq(&ehci->lock); msleep(USB_RESUME_TIMEOUT); spin_lock_irq(&ehci->lock); if (ehci->shutdown) goto shutdown; } i = HCS_N_PORTS (ehci->hcs_params); while (i--) { temp = ehci_readl(ehci, &ehci->regs->port_status [i]); if (test_bit(i, &resume_needed)) { temp &= ~(PORT_RWC_BITS | PORT_SUSPEND | PORT_RESUME); ehci_writel(ehci, temp, &ehci->regs->port_status [i]); } } ehci->next_statechange = jiffies + msecs_to_jiffies(5); spin_unlock_irq(&ehci->lock); ehci_handover_companion_ports(ehci); /* Now we can safely re-enable irqs */ spin_lock_irq(&ehci->lock); if (ehci->shutdown) goto shutdown; ehci_writel(ehci, INTR_MASK, &ehci->regs->intr_enable); (void) ehci_readl(ehci, &ehci->regs->intr_enable); spin_unlock_irq(&ehci->lock); return 0; shutdown: spin_unlock_irq(&ehci->lock); return -ESHUTDOWN; } static unsigned long ehci_get_resuming_ports(struct usb_hcd *hcd) { struct ehci_hcd *ehci = hcd_to_ehci(hcd); return ehci->resuming_ports; } #else #define ehci_bus_suspend NULL #define ehci_bus_resume NULL #define ehci_get_resuming_ports NULL #endif /* CONFIG_PM */ /*-------------------------------------------------------------------------*/ /* * Sets the owner of a port */ static void set_owner(struct ehci_hcd *ehci, int portnum, int new_owner) { u32 __iomem *status_reg; u32 port_status; int try; status_reg = &ehci->regs->port_status[portnum]; /* * The controller won't set the OWNER bit if the port is * enabled, so this loop will sometimes require at least two * iterations: one to disable the port and one to set OWNER. 
*/ for (try = 4; try > 0; --try) { spin_lock_irq(&ehci->lock); port_status = ehci_readl(ehci, status_reg); if ((port_status & PORT_OWNER) == new_owner || (port_status & (PORT_OWNER | PORT_CONNECT)) == 0) try = 0; else { port_status ^= PORT_OWNER; port_status &= ~(PORT_PE | PORT_RWC_BITS); ehci_writel(ehci, port_status, status_reg); } spin_unlock_irq(&ehci->lock); if (try > 1) msleep(5); } } /*-------------------------------------------------------------------------*/ static int check_reset_complete ( struct ehci_hcd *ehci, int index, u32 __iomem *status_reg, int port_status ) { if (!(port_status & PORT_CONNECT)) return port_status; /* if reset finished and it's still not enabled -- handoff */ if (!(port_status & PORT_PE)) { /* with integrated TT, there's nobody to hand it to! */ if (ehci_is_TDI(ehci)) { ehci_dbg (ehci, "Failed to enable port %d on root hub TT\n", index+1); return port_status; } ehci_dbg (ehci, "port %d full speed --> companion\n", index + 1); // what happens if HCS_N_CC(params) == 0 ? port_status |= PORT_OWNER; port_status &= ~PORT_RWC_BITS; ehci_writel(ehci, port_status, status_reg); /* ensure 440EPX ohci controller state is operational */ if (ehci->has_amcc_usb23) set_ohci_hcfs(ehci, 1); } else { ehci_dbg(ehci, "port %d reset complete, port enabled\n", index + 1); /* ensure 440EPx ohci controller state is suspended */ if (ehci->has_amcc_usb23) set_ohci_hcfs(ehci, 0); } return port_status; } /*-------------------------------------------------------------------------*/ /* build "status change" packet (one or two bytes) from HC registers */ static int ehci_hub_status_data (struct usb_hcd *hcd, char *buf) { struct ehci_hcd *ehci = hcd_to_ehci (hcd); u32 temp, status; u32 mask; int ports, i, retval = 1; unsigned long flags; u32 ppcd = ~0; /* init status to no-changes */ buf [0] = 0; ports = HCS_N_PORTS (ehci->hcs_params); if (ports > 7) { buf [1] = 0; retval++; } /* Inform the core about resumes-in-progress by returning * a non-zero value even if there are no status changes. */ status = ehci->resuming_ports; /* Some boards (mostly VIA?) report bogus overcurrent indications, * causing massive log spam unless we completely ignore them. It * may be relevant that VIA VT8235 controllers, where PORT_POWER is * always set, seem to clear PORT_OCC and PORT_CSC when writing to * PORT_POWER; that's surprising, but maybe within-spec. */ if (!ignore_oc && !ehci->spurious_oc) mask = PORT_CSC | PORT_PEC | PORT_OCC; else mask = PORT_CSC | PORT_PEC; // PORT_RESUME from hardware ~= PORT_STAT_C_SUSPEND /* no hub change reports (bit 0) for now (power, ...) */ /* port N changes (bit N)? */ spin_lock_irqsave (&ehci->lock, flags); /* get per-port change detect bits */ if (ehci->has_ppcd) ppcd = ehci_readl(ehci, &ehci->regs->status) >> 16; for (i = 0; i < ports; i++) { /* leverage per-port change bits feature */ if (ppcd & (1 << i)) temp = ehci_readl(ehci, &ehci->regs->port_status[i]); else temp = 0; /* * Return status information even for ports with OWNER set. * Otherwise hub_wq wouldn't see the disconnect event when a * high-speed device is switched over to the companion * controller by the user. 
*/ if ((temp & mask) != 0 || test_bit(i, &ehci->port_c_suspend) || (ehci->reset_done[i] && time_after_eq( jiffies, ehci->reset_done[i])) || ehci_has_ci_pec_bug(ehci, temp)) { if (i < 7) buf [0] |= 1 << (i + 1); else buf [1] |= 1 << (i - 7); status = STS_PCD; } } /* If a resume is in progress, make sure it can finish */ if (ehci->resuming_ports) mod_timer(&hcd->rh_timer, jiffies + msecs_to_jiffies(25)); spin_unlock_irqrestore (&ehci->lock, flags); return status ? retval : 0; } /*-------------------------------------------------------------------------*/ static void ehci_hub_descriptor ( struct ehci_hcd *ehci, struct usb_hub_descriptor *desc ) { int ports = HCS_N_PORTS (ehci->hcs_params); u16 temp; desc->bDescriptorType = USB_DT_HUB; desc->bPwrOn2PwrGood = 10; /* ehci 1.0, 2.3.9 says 20ms max */ desc->bHubContrCurrent = 0; desc->bNbrPorts = ports; temp = 1 + (ports / 8); desc->bDescLength = 7 + 2 * temp; /* two bitmaps: ports removable, and usb 1.0 legacy PortPwrCtrlMask */ memset(&desc->u.hs.DeviceRemovable[0], 0, temp); memset(&desc->u.hs.DeviceRemovable[temp], 0xff, temp); temp = HUB_CHAR_INDV_PORT_OCPM; /* per-port overcurrent reporting */ if (HCS_PPC (ehci->hcs_params)) temp |= HUB_CHAR_INDV_PORT_LPSM; /* per-port power control */ else temp |= HUB_CHAR_NO_LPSM; /* no power switching */ #if 0 // re-enable when we support USB_PORT_FEAT_INDICATOR below. if (HCS_INDICATOR (ehci->hcs_params)) temp |= HUB_CHAR_PORTIND; /* per-port indicators (LEDs) */ #endif desc->wHubCharacteristics = cpu_to_le16(temp); } /*-------------------------------------------------------------------------*/ int ehci_hub_control( struct usb_hcd *hcd, u16 typeReq, u16 wValue, u16 wIndex, char *buf, u16 wLength ) { struct ehci_hcd *ehci = hcd_to_ehci (hcd); int ports = HCS_N_PORTS (ehci->hcs_params); u32 __iomem *status_reg, *hostpc_reg; u32 temp, temp1, status; unsigned long flags; int retval = 0; unsigned selector; /* * Avoid out-of-bounds values while calculating the port index * from wIndex. The compiler doesn't like pointers to invalid * addresses, even if they are never used. */ temp = (wIndex - 1) & 0xff; if (temp >= HCS_N_PORTS_MAX) temp = 0; status_reg = &ehci->regs->port_status[temp]; hostpc_reg = &ehci->regs->hostpc[temp]; /* * FIXME: support SetPortFeatures USB_PORT_FEAT_INDICATOR. * HCS_INDICATOR may say we can change LEDs to off/amber/green. * (track current state ourselves) ... blink for diagnostics, * power, "this is the one", etc. EHCI spec supports this. */ spin_lock_irqsave (&ehci->lock, flags); switch (typeReq) { case ClearHubFeature: switch (wValue) { case C_HUB_LOCAL_POWER: case C_HUB_OVER_CURRENT: /* no hub-wide feature/status flags */ break; default: goto error; } break; case ClearPortFeature: if (!wIndex || wIndex > ports) goto error; wIndex--; temp = ehci_readl(ehci, status_reg); temp &= ~PORT_RWC_BITS; /* * Even if OWNER is set, so the port is owned by the * companion controller, hub_wq needs to be able to clear * the port-change status bits (especially * USB_PORT_STAT_C_CONNECTION). 
*/ switch (wValue) { case USB_PORT_FEAT_ENABLE: ehci_writel(ehci, temp & ~PORT_PE, status_reg); break; case USB_PORT_FEAT_C_ENABLE: ehci_writel(ehci, temp | PORT_PEC, status_reg); break; case USB_PORT_FEAT_SUSPEND: if (temp & PORT_RESET) goto error; if (ehci->no_selective_suspend) break; #ifdef CONFIG_USB_OTG if ((hcd->self.otg_port == (wIndex + 1)) && hcd->self.b_hnp_enable) { otg_start_hnp(hcd->usb_phy->otg); break; } #endif if (!(temp & PORT_SUSPEND)) break; if ((temp & PORT_PE) == 0) goto error; /* clear phy low-power mode before resume */ if (ehci->has_tdi_phy_lpm) { temp1 = ehci_readl(ehci, hostpc_reg); ehci_writel(ehci, temp1 & ~HOSTPC_PHCD, hostpc_reg); spin_unlock_irqrestore(&ehci->lock, flags); msleep(5);/* wait to leave low-power mode */ spin_lock_irqsave(&ehci->lock, flags); } /* resume signaling for 20 msec */ temp &= ~PORT_WAKE_BITS; ehci_writel(ehci, temp | PORT_RESUME, status_reg); ehci->reset_done[wIndex] = jiffies + msecs_to_jiffies(USB_RESUME_TIMEOUT); set_bit(wIndex, &ehci->resuming_ports); usb_hcd_start_port_resume(&hcd->self, wIndex); break; case USB_PORT_FEAT_C_SUSPEND: clear_bit(wIndex, &ehci->port_c_suspend); break; case USB_PORT_FEAT_POWER: if (HCS_PPC(ehci->hcs_params)) { spin_unlock_irqrestore(&ehci->lock, flags); ehci_port_power(ehci, wIndex, false); spin_lock_irqsave(&ehci->lock, flags); } break; case USB_PORT_FEAT_C_CONNECTION: ehci_writel(ehci, temp | PORT_CSC, status_reg); break; case USB_PORT_FEAT_C_OVER_CURRENT: ehci_writel(ehci, temp | PORT_OCC, status_reg); break; case USB_PORT_FEAT_C_RESET: /* GetPortStatus clears reset */ break; default: goto error; } ehci_readl(ehci, &ehci->regs->command); /* unblock posted write */ break; case GetHubDescriptor: ehci_hub_descriptor (ehci, (struct usb_hub_descriptor *) buf); break; case GetHubStatus: /* no hub-wide feature/status flags */ memset (buf, 0, 4); //cpu_to_le32s ((u32 *) buf); break; case GetPortStatus: if (!wIndex || wIndex > ports) goto error; wIndex--; status = 0; temp = ehci_readl(ehci, status_reg); // wPortChange bits if (temp & PORT_CSC) status |= USB_PORT_STAT_C_CONNECTION << 16; if (temp & PORT_PEC) status |= USB_PORT_STAT_C_ENABLE << 16; if (ehci_has_ci_pec_bug(ehci, temp)) { status |= USB_PORT_STAT_C_ENABLE << 16; ehci_info(ehci, "PE is cleared by HW port:%d PORTSC:%08x\n", wIndex + 1, temp); } if ((temp & PORT_OCC) && (!ignore_oc && !ehci->spurious_oc)){ status |= USB_PORT_STAT_C_OVERCURRENT << 16; /* * Hubs should disable port power on over-current. * However, not all EHCI implementations do this * automatically, even if they _do_ support per-port * power switching; they're allowed to just limit the * current. hub_wq will turn the power back on. */ if (((temp & PORT_OC) || (ehci->need_oc_pp_cycle)) && HCS_PPC(ehci->hcs_params)) { spin_unlock_irqrestore(&ehci->lock, flags); ehci_port_power(ehci, wIndex, false); spin_lock_irqsave(&ehci->lock, flags); temp = ehci_readl(ehci, status_reg); } } /* no reset or resume pending */ if (!ehci->reset_done[wIndex]) { /* Remote Wakeup received? 
*/ if (temp & PORT_RESUME) { /* resume signaling for 20 msec */ ehci->reset_done[wIndex] = jiffies + msecs_to_jiffies(20); usb_hcd_start_port_resume(&hcd->self, wIndex); set_bit(wIndex, &ehci->resuming_ports); /* check the port again */ mod_timer(&ehci_to_hcd(ehci)->rh_timer, ehci->reset_done[wIndex]); } /* reset or resume not yet complete */ } else if (!time_after_eq(jiffies, ehci->reset_done[wIndex])) { ; /* wait until it is complete */ /* resume completed */ } else if (test_bit(wIndex, &ehci->resuming_ports)) { clear_bit(wIndex, &ehci->suspended_ports); set_bit(wIndex, &ehci->port_c_suspend); ehci->reset_done[wIndex] = 0; usb_hcd_end_port_resume(&hcd->self, wIndex); /* stop resume signaling */ temp &= ~(PORT_RWC_BITS | PORT_SUSPEND | PORT_RESUME); ehci_writel(ehci, temp, status_reg); clear_bit(wIndex, &ehci->resuming_ports); retval = ehci_handshake(ehci, status_reg, PORT_RESUME, 0, 2000 /* 2msec */); if (retval != 0) { ehci_err(ehci, "port %d resume error %d\n", wIndex + 1, retval); goto error; } temp = ehci_readl(ehci, status_reg); /* whoever resets must GetPortStatus to complete it!! */ } else { status |= USB_PORT_STAT_C_RESET << 16; ehci->reset_done [wIndex] = 0; /* force reset to complete */ ehci_writel(ehci, temp & ~(PORT_RWC_BITS | PORT_RESET), status_reg); /* REVISIT: some hardware needs 550+ usec to clear * this bit; seems too long to spin routinely... */ retval = ehci_handshake(ehci, status_reg, PORT_RESET, 0, 1000); if (retval != 0) { ehci_err (ehci, "port %d reset error %d\n", wIndex + 1, retval); goto error; } /* see what we found out */ temp = check_reset_complete (ehci, wIndex, status_reg, ehci_readl(ehci, status_reg)); } /* transfer dedicated ports to the companion hc */ if ((temp & PORT_CONNECT) && test_bit(wIndex, &ehci->companion_ports)) { temp &= ~PORT_RWC_BITS; temp |= PORT_OWNER; ehci_writel(ehci, temp, status_reg); ehci_dbg(ehci, "port %d --> companion\n", wIndex + 1); temp = ehci_readl(ehci, status_reg); } /* * Even if OWNER is set, there's no harm letting hub_wq * see the wPortStatus values (they should all be 0 except * for PORT_POWER anyway). 
*/ if (temp & PORT_CONNECT) { status |= USB_PORT_STAT_CONNECTION; // status may be from integrated TT if (ehci->has_hostpc) { temp1 = ehci_readl(ehci, hostpc_reg); status |= ehci_port_speed(ehci, temp1); } else status |= ehci_port_speed(ehci, temp); } if (temp & PORT_PE) status |= USB_PORT_STAT_ENABLE; /* maybe the port was unsuspended without our knowledge */ if (temp & (PORT_SUSPEND|PORT_RESUME)) { status |= USB_PORT_STAT_SUSPEND; } else if (test_bit(wIndex, &ehci->suspended_ports)) { clear_bit(wIndex, &ehci->suspended_ports); clear_bit(wIndex, &ehci->resuming_ports); ehci->reset_done[wIndex] = 0; if (temp & PORT_PE) set_bit(wIndex, &ehci->port_c_suspend); usb_hcd_end_port_resume(&hcd->self, wIndex); } if (temp & PORT_OC) status |= USB_PORT_STAT_OVERCURRENT; if (temp & PORT_RESET) status |= USB_PORT_STAT_RESET; if (temp & PORT_POWER) status |= USB_PORT_STAT_POWER; if (test_bit(wIndex, &ehci->port_c_suspend)) status |= USB_PORT_STAT_C_SUSPEND << 16; if (status & ~0xffff) /* only if wPortChange is interesting */ dbg_port(ehci, "GetStatus", wIndex + 1, temp); put_unaligned_le32(status, buf); break; case SetHubFeature: switch (wValue) { case C_HUB_LOCAL_POWER: case C_HUB_OVER_CURRENT: /* no hub-wide feature/status flags */ break; default: goto error; } break; case SetPortFeature: selector = wIndex >> 8; wIndex &= 0xff; if (unlikely(ehci->debug)) { /* If the debug port is active any port * feature requests should get denied */ if (wIndex == HCS_DEBUG_PORT(ehci->hcs_params) && (readl(&ehci->debug->control) & DBGP_ENABLED)) { retval = -ENODEV; goto error_exit; } } if (!wIndex || wIndex > ports) goto error; wIndex--; temp = ehci_readl(ehci, status_reg); if (temp & PORT_OWNER) break; temp &= ~PORT_RWC_BITS; switch (wValue) { case USB_PORT_FEAT_SUSPEND: if (ehci->no_selective_suspend) break; if ((temp & PORT_PE) == 0 || (temp & PORT_RESET) != 0) goto error; /* After above check the port must be connected. * Set appropriate bit thus could put phy into low power * mode if we have tdi_phy_lpm feature */ temp &= ~PORT_WKCONN_E; temp |= PORT_WKDISC_E | PORT_WKOC_E; ehci_writel(ehci, temp | PORT_SUSPEND, status_reg); if (ehci->has_tdi_phy_lpm) { spin_unlock_irqrestore(&ehci->lock, flags); msleep(5);/* 5ms for HCD enter low pwr mode */ spin_lock_irqsave(&ehci->lock, flags); temp1 = ehci_readl(ehci, hostpc_reg); ehci_writel(ehci, temp1 | HOSTPC_PHCD, hostpc_reg); temp1 = ehci_readl(ehci, hostpc_reg); ehci_dbg(ehci, "Port%d phy low pwr mode %s\n", wIndex, (temp1 & HOSTPC_PHCD) ? "succeeded" : "failed"); } if (ehci_has_fsl_susp_errata(ehci)) { /* 10ms for HCD enter suspend */ spin_unlock_irqrestore(&ehci->lock, flags); usleep_range(10000, 20000); spin_lock_irqsave(&ehci->lock, flags); } set_bit(wIndex, &ehci->suspended_ports); break; case USB_PORT_FEAT_POWER: if (HCS_PPC(ehci->hcs_params)) { spin_unlock_irqrestore(&ehci->lock, flags); ehci_port_power(ehci, wIndex, true); spin_lock_irqsave(&ehci->lock, flags); } break; case USB_PORT_FEAT_RESET: if (temp & (PORT_SUSPEND|PORT_RESUME)) goto error; /* line status bits may report this as low speed, * which can be fine if this root hub has a * transaction translator built in. 
*/ if ((temp & (PORT_PE|PORT_CONNECT)) == PORT_CONNECT && !ehci_is_TDI(ehci) && PORT_USB11 (temp)) { ehci_dbg (ehci, "port %d low speed --> companion\n", wIndex + 1); temp |= PORT_OWNER; } else { temp |= PORT_RESET; temp &= ~PORT_PE; /* * caller must wait, then call GetPortStatus * usb 2.0 spec says 50 ms resets on root */ ehci->reset_done [wIndex] = jiffies + msecs_to_jiffies (50); /* * Force full-speed connect for FSL high-speed * erratum; disable HS Chirp by setting PFSC bit */ if (ehci_has_fsl_hs_errata(ehci)) temp |= (1 << PORTSC_FSL_PFSC); } ehci_writel(ehci, temp, status_reg); break; /* For downstream facing ports (these): one hub port is put * into test mode according to USB2 11.24.2.13, then the hub * must be reset (which for root hub now means rmmod+modprobe, * or else system reboot). See EHCI 2.3.9 and 4.14 for info * about the EHCI-specific stuff. */ case USB_PORT_FEAT_TEST: #ifdef CONFIG_USB_HCD_TEST_MODE if (selector == EHSET_TEST_SINGLE_STEP_SET_FEATURE) { spin_unlock_irqrestore(&ehci->lock, flags); retval = ehset_single_step_set_feature(hcd, wIndex + 1); spin_lock_irqsave(&ehci->lock, flags); break; } #endif if (!selector || selector > 5) goto error; spin_unlock_irqrestore(&ehci->lock, flags); ehci_quiesce(ehci); spin_lock_irqsave(&ehci->lock, flags); /* Put all enabled ports into suspend */ while (ports--) { u32 __iomem *sreg = &ehci->regs->port_status[ports]; temp = ehci_readl(ehci, sreg) & ~PORT_RWC_BITS; if (temp & PORT_PE) ehci_writel(ehci, temp | PORT_SUSPEND, sreg); } spin_unlock_irqrestore(&ehci->lock, flags); ehci_halt(ehci); spin_lock_irqsave(&ehci->lock, flags); temp = ehci_readl(ehci, status_reg); temp |= selector << 16; ehci_writel(ehci, temp, status_reg); break; default: goto error; } ehci_readl(ehci, &ehci->regs->command); /* unblock posted writes */ break; default: error: /* "stall" on error */ retval = -EPIPE; } error_exit: spin_unlock_irqrestore (&ehci->lock, flags); return retval; } EXPORT_SYMBOL_GPL(ehci_hub_control); static void ehci_relinquish_port(struct usb_hcd *hcd, int portnum) { struct ehci_hcd *ehci = hcd_to_ehci(hcd); if (ehci_is_TDI(ehci)) return; set_owner(ehci, --portnum, PORT_OWNER); } static int ehci_port_handed_over(struct usb_hcd *hcd, int portnum) { struct ehci_hcd *ehci = hcd_to_ehci(hcd); u32 __iomem *reg; if (ehci_is_TDI(ehci)) return 0; reg = &ehci->regs->port_status[portnum - 1]; return ehci_readl(ehci, reg) & PORT_OWNER; } static int ehci_port_power(struct ehci_hcd *ehci, int portnum, bool enable) { struct usb_hcd *hcd = ehci_to_hcd(ehci); u32 __iomem *status_reg = &ehci->regs->port_status[portnum]; u32 temp = ehci_readl(ehci, status_reg) & ~PORT_RWC_BITS; if (enable) ehci_writel(ehci, temp | PORT_POWER, status_reg); else ehci_writel(ehci, temp & ~PORT_POWER, status_reg); if (hcd->driver->port_power) hcd->driver->port_power(hcd, portnum, enable); return 0; } |
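ehci_hub_control() is exported (EXPORT_SYMBOL_GPL) so that bus-glue drivers can reuse it from their own hub_control callbacks instead of reimplementing root-hub request handling. The fragment below is an illustrative sketch of that delegation pattern, not part of ehci-hub.c; the my_ehci_* name and the dev_dbg() quirk hook are hypothetical and only show how a wrapper would intercept one request and fall through to the generic helper.

/*
 * Hypothetical glue-driver wrapper around the generic EHCI root-hub
 * request handler. Real glue drivers (for example to work around a
 * controller quirk) follow the same shape: handle or observe the
 * request of interest, then delegate to ehci_hub_control().
 */
#include <linux/usb.h>
#include <linux/usb/hcd.h>
#include "ehci.h"

static int my_ehci_hub_control(struct usb_hcd *hcd, u16 typeReq, u16 wValue,
			       u16 wIndex, char *buf, u16 wLength)
{
	/* Example hook: log port resets before the generic handling runs. */
	if (typeReq == SetPortFeature && wValue == USB_PORT_FEAT_RESET)
		dev_dbg(hcd->self.controller, "resetting port %d\n", wIndex);

	/* Everything else is handled by the generic EHCI implementation. */
	return ehci_hub_control(hcd, typeReq, wValue, wIndex, buf, wLength);
}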
// SPDX-License-Identifier: GPL-2.0-only /* * linux/kernel/power/snapshot.c * * This file provides system snapshot/restore functionality for swsusp. * * Copyright (C) 1998-2005 Pavel Machek <pavel@ucw.cz> * Copyright (C) 2006 Rafael J.
Wysocki <rjw@sisk.pl> */ #define pr_fmt(fmt) "PM: hibernation: " fmt #include <linux/version.h> #include <linux/module.h> #include <linux/mm.h> #include <linux/suspend.h> #include <linux/delay.h> #include <linux/bitops.h> #include <linux/spinlock.h> #include <linux/kernel.h> #include <linux/pm.h> #include <linux/device.h> #include <linux/init.h> #include <linux/memblock.h> #include <linux/nmi.h> #include <linux/syscalls.h> #include <linux/console.h> #include <linux/highmem.h> #include <linux/list.h> #include <linux/slab.h> #include <linux/compiler.h> #include <linux/ktime.h> #include <linux/set_memory.h> #include <linux/uaccess.h> #include <asm/mmu_context.h> #include <asm/tlbflush.h> #include <asm/io.h> #include "power.h" #if defined(CONFIG_STRICT_KERNEL_RWX) && defined(CONFIG_ARCH_HAS_SET_MEMORY) static bool hibernate_restore_protection; static bool hibernate_restore_protection_active; void enable_restore_image_protection(void) { hibernate_restore_protection = true; } static inline void hibernate_restore_protection_begin(void) { hibernate_restore_protection_active = hibernate_restore_protection; } static inline void hibernate_restore_protection_end(void) { hibernate_restore_protection_active = false; } static inline int __must_check hibernate_restore_protect_page(void *page_address) { if (hibernate_restore_protection_active) return set_memory_ro((unsigned long)page_address, 1); return 0; } static inline int hibernate_restore_unprotect_page(void *page_address) { if (hibernate_restore_protection_active) return set_memory_rw((unsigned long)page_address, 1); return 0; } #else static inline void hibernate_restore_protection_begin(void) {} static inline void hibernate_restore_protection_end(void) {} static inline int __must_check hibernate_restore_protect_page(void *page_address) {return 0; } static inline int hibernate_restore_unprotect_page(void *page_address) {return 0; } #endif /* CONFIG_STRICT_KERNEL_RWX && CONFIG_ARCH_HAS_SET_MEMORY */ /* * The calls to set_direct_map_*() should not fail because remapping a page * here means that we only update protection bits in an existing PTE. * It is still worth to have a warning here if something changes and this * will no longer be the case. */ static inline void hibernate_map_page(struct page *page) { if (IS_ENABLED(CONFIG_ARCH_HAS_SET_DIRECT_MAP)) { int ret = set_direct_map_default_noflush(page); if (ret) pr_warn_once("Failed to remap page\n"); } else { debug_pagealloc_map_pages(page, 1); } } static inline void hibernate_unmap_page(struct page *page) { if (IS_ENABLED(CONFIG_ARCH_HAS_SET_DIRECT_MAP)) { unsigned long addr = (unsigned long)page_address(page); int ret = set_direct_map_invalid_noflush(page); if (ret) pr_warn_once("Failed to remap page\n"); flush_tlb_kernel_range(addr, addr + PAGE_SIZE); } else { debug_pagealloc_unmap_pages(page, 1); } } static int swsusp_page_is_free(struct page *); static void swsusp_set_page_forbidden(struct page *); static void swsusp_unset_page_forbidden(struct page *); /* * Number of bytes to reserve for memory allocations made by device drivers * from their ->freeze() and ->freeze_noirq() callbacks so that they don't * cause image creation to fail (tunable via /sys/power/reserved_size). */ unsigned long reserved_size; void __init hibernate_reserved_size_init(void) { reserved_size = SPARE_PAGES * PAGE_SIZE; } /* * Preferred image size in bytes (tunable via /sys/power/image_size). 
* When it is set to N, swsusp will do its best to ensure the image * size will not exceed N bytes, but if that is impossible, it will * try to create the smallest image possible. */ unsigned long image_size; void __init hibernate_image_size_init(void) { image_size = ((totalram_pages() * 2) / 5) * PAGE_SIZE; } /* * List of PBEs needed for restoring the pages that were allocated before * the suspend and included in the suspend image, but have also been * allocated by the "resume" kernel, so their contents cannot be written * directly to their "original" page frames. */ struct pbe *restore_pblist; /* struct linked_page is used to build chains of pages */ #define LINKED_PAGE_DATA_SIZE (PAGE_SIZE - sizeof(void *)) struct linked_page { struct linked_page *next; char data[LINKED_PAGE_DATA_SIZE]; } __packed; /* * List of "safe" pages (ie. pages that were not used by the image kernel * before hibernation) that may be used as temporary storage for image kernel * memory contents. */ static struct linked_page *safe_pages_list; /* Pointer to an auxiliary buffer (1 page) */ static void *buffer; #define PG_ANY 0 #define PG_SAFE 1 #define PG_UNSAFE_CLEAR 1 #define PG_UNSAFE_KEEP 0 static unsigned int allocated_unsafe_pages; /** * get_image_page - Allocate a page for a hibernation image. * @gfp_mask: GFP mask for the allocation. * @safe_needed: Get pages that were not used before hibernation (restore only) * * During image restoration, for storing the PBE list and the image data, we can * only use memory pages that do not conflict with the pages used before * hibernation. The "unsafe" pages have PageNosaveFree set and we count them * using allocated_unsafe_pages. * * Each allocated image page is marked as PageNosave and PageNosaveFree so that * swsusp_free() can release it. */ static void *get_image_page(gfp_t gfp_mask, int safe_needed) { void *res; res = (void *)get_zeroed_page(gfp_mask); if (safe_needed) while (res && swsusp_page_is_free(virt_to_page(res))) { /* The page is unsafe, mark it for swsusp_free() */ swsusp_set_page_forbidden(virt_to_page(res)); allocated_unsafe_pages++; res = (void *)get_zeroed_page(gfp_mask); } if (res) { swsusp_set_page_forbidden(virt_to_page(res)); swsusp_set_page_free(virt_to_page(res)); } return res; } static void *__get_safe_page(gfp_t gfp_mask) { if (safe_pages_list) { void *ret = safe_pages_list; safe_pages_list = safe_pages_list->next; memset(ret, 0, PAGE_SIZE); return ret; } return get_image_page(gfp_mask, PG_SAFE); } unsigned long get_safe_page(gfp_t gfp_mask) { return (unsigned long)__get_safe_page(gfp_mask); } static struct page *alloc_image_page(gfp_t gfp_mask) { struct page *page; page = alloc_page(gfp_mask); if (page) { swsusp_set_page_forbidden(page); swsusp_set_page_free(page); } return page; } static void recycle_safe_page(void *page_address) { struct linked_page *lp = page_address; lp->next = safe_pages_list; safe_pages_list = lp; } /** * free_image_page - Free a page allocated for hibernation image. * @addr: Address of the page to free. * @clear_nosave_free: If set, clear the PageNosaveFree bit for the page. * * The page to free should have been allocated by get_image_page() (page flags * set by it are affected). 
*/ static inline void free_image_page(void *addr, int clear_nosave_free) { struct page *page; BUG_ON(!virt_addr_valid(addr)); page = virt_to_page(addr); swsusp_unset_page_forbidden(page); if (clear_nosave_free) swsusp_unset_page_free(page); __free_page(page); } static inline void free_list_of_pages(struct linked_page *list, int clear_page_nosave) { while (list) { struct linked_page *lp = list->next; free_image_page(list, clear_page_nosave); list = lp; } } /* * struct chain_allocator is used for allocating small objects out of * a linked list of pages called 'the chain'. * * The chain grows each time when there is no room for a new object in * the current page. The allocated objects cannot be freed individually. * It is only possible to free them all at once, by freeing the entire * chain. * * NOTE: The chain allocator may be inefficient if the allocated objects * are not much smaller than PAGE_SIZE. */ struct chain_allocator { struct linked_page *chain; /* the chain */ unsigned int used_space; /* total size of objects allocated out of the current page */ gfp_t gfp_mask; /* mask for allocating pages */ int safe_needed; /* if set, only "safe" pages are allocated */ }; static void chain_init(struct chain_allocator *ca, gfp_t gfp_mask, int safe_needed) { ca->chain = NULL; ca->used_space = LINKED_PAGE_DATA_SIZE; ca->gfp_mask = gfp_mask; ca->safe_needed = safe_needed; } static void *chain_alloc(struct chain_allocator *ca, unsigned int size) { void *ret; if (LINKED_PAGE_DATA_SIZE - ca->used_space < size) { struct linked_page *lp; lp = ca->safe_needed ? __get_safe_page(ca->gfp_mask) : get_image_page(ca->gfp_mask, PG_ANY); if (!lp) return NULL; lp->next = ca->chain; ca->chain = lp; ca->used_space = 0; } ret = ca->chain->data + ca->used_space; ca->used_space += size; return ret; } /* * Data types related to memory bitmaps. * * Memory bitmap is a structure consisting of many linked lists of * objects. The main list's elements are of type struct zone_bitmap * and each of them corresponds to one zone. For each zone bitmap * object there is a list of objects of type struct bm_block that * represent each blocks of bitmap in which information is stored. * * struct memory_bitmap contains a pointer to the main list of zone * bitmap objects, a struct bm_position used for browsing the bitmap, * and a pointer to the list of pages used for allocating all of the * zone bitmap objects and bitmap block objects. * * NOTE: It has to be possible to lay out the bitmap in memory * using only allocations of order 0. Additionally, the bitmap is * designed to work with arbitrary number of zones (this is over the * top for now, but let's avoid making unnecessary assumptions ;-). * * struct zone_bitmap contains a pointer to a list of bitmap block * objects and a pointer to the bitmap block object that has been * most recently used for setting bits. Additionally, it contains the * PFNs that correspond to the start and end of the represented zone. * * struct bm_block contains a pointer to the memory page in which * information is stored (in the form of a block of bitmap) * It also contains the pfns that correspond to the start and end of * the represented memory area. * * The memory bitmap is organized as a radix tree to guarantee fast random * access to the bits. There is one radix tree for each zone (as returned * from create_mem_extents). * * One radix tree is represented by one struct mem_zone_bm_rtree. There are * two linked lists for the nodes of the tree, one for the inner nodes and * one for the leave nodes. 
The linked leave nodes are used for fast linear * access of the memory bitmap. * * The struct rtree_node represents one node of the radix tree. */ #define BM_END_OF_MAP (~0UL) #define BM_BITS_PER_BLOCK (PAGE_SIZE * BITS_PER_BYTE) #define BM_BLOCK_SHIFT (PAGE_SHIFT + 3) #define BM_BLOCK_MASK ((1UL << BM_BLOCK_SHIFT) - 1) /* * struct rtree_node is a wrapper struct to link the nodes * of the rtree together for easy linear iteration over * bits and easy freeing */ struct rtree_node { struct list_head list; unsigned long *data; }; /* * struct mem_zone_bm_rtree represents a bitmap used for one * populated memory zone. */ struct mem_zone_bm_rtree { struct list_head list; /* Link Zones together */ struct list_head nodes; /* Radix Tree inner nodes */ struct list_head leaves; /* Radix Tree leaves */ unsigned long start_pfn; /* Zone start page frame */ unsigned long end_pfn; /* Zone end page frame + 1 */ struct rtree_node *rtree; /* Radix Tree Root */ int levels; /* Number of Radix Tree Levels */ unsigned int blocks; /* Number of Bitmap Blocks */ }; /* struct bm_position is used for browsing memory bitmaps */ struct bm_position { struct mem_zone_bm_rtree *zone; struct rtree_node *node; unsigned long node_pfn; unsigned long cur_pfn; int node_bit; }; struct memory_bitmap { struct list_head zones; struct linked_page *p_list; /* list of pages used to store zone bitmap objects and bitmap block objects */ struct bm_position cur; /* most recently used bit position */ }; /* Functions that operate on memory bitmaps */ #define BM_ENTRIES_PER_LEVEL (PAGE_SIZE / sizeof(unsigned long)) #if BITS_PER_LONG == 32 #define BM_RTREE_LEVEL_SHIFT (PAGE_SHIFT - 2) #else #define BM_RTREE_LEVEL_SHIFT (PAGE_SHIFT - 3) #endif #define BM_RTREE_LEVEL_MASK ((1UL << BM_RTREE_LEVEL_SHIFT) - 1) /** * alloc_rtree_node - Allocate a new node and add it to the radix tree. * @gfp_mask: GFP mask for the allocation. * @safe_needed: Get pages not used before hibernation (restore only) * @ca: Pointer to a linked list of pages ("a chain") to allocate from * @list: Radix Tree node to add. * * This function is used to allocate inner nodes as well as the * leave nodes of the radix tree. It also adds the node to the * corresponding linked list passed in by the *list parameter. */ static struct rtree_node *alloc_rtree_node(gfp_t gfp_mask, int safe_needed, struct chain_allocator *ca, struct list_head *list) { struct rtree_node *node; node = chain_alloc(ca, sizeof(struct rtree_node)); if (!node) return NULL; node->data = get_image_page(gfp_mask, safe_needed); if (!node->data) return NULL; list_add_tail(&node->list, list); return node; } /** * add_rtree_block - Add a new leave node to the radix tree. * * The leave nodes need to be allocated in order to keep the leaves * linked list in order. This is guaranteed by the zone->blocks * counter. */ static int add_rtree_block(struct mem_zone_bm_rtree *zone, gfp_t gfp_mask, int safe_needed, struct chain_allocator *ca) { struct rtree_node *node, *block, **dst; unsigned int levels_needed, block_nr; int i; block_nr = zone->blocks; levels_needed = 0; /* How many levels do we need for this block nr? 
*/ while (block_nr) { levels_needed += 1; block_nr >>= BM_RTREE_LEVEL_SHIFT; } /* Make sure the rtree has enough levels */ for (i = zone->levels; i < levels_needed; i++) { node = alloc_rtree_node(gfp_mask, safe_needed, ca, &zone->nodes); if (!node) return -ENOMEM; node->data[0] = (unsigned long)zone->rtree; zone->rtree = node; zone->levels += 1; } /* Allocate new block */ block = alloc_rtree_node(gfp_mask, safe_needed, ca, &zone->leaves); if (!block) return -ENOMEM; /* Now walk the rtree to insert the block */ node = zone->rtree; dst = &zone->rtree; block_nr = zone->blocks; for (i = zone->levels; i > 0; i--) { int index; if (!node) { node = alloc_rtree_node(gfp_mask, safe_needed, ca, &zone->nodes); if (!node) return -ENOMEM; *dst = node; } index = block_nr >> ((i - 1) * BM_RTREE_LEVEL_SHIFT); index &= BM_RTREE_LEVEL_MASK; dst = (struct rtree_node **)&((*dst)->data[index]); node = *dst; } zone->blocks += 1; *dst = block; return 0; } static void free_zone_bm_rtree(struct mem_zone_bm_rtree *zone, int clear_nosave_free); /** * create_zone_bm_rtree - Create a radix tree for one zone. * * Allocated the mem_zone_bm_rtree structure and initializes it. * This function also allocated and builds the radix tree for the * zone. */ static struct mem_zone_bm_rtree *create_zone_bm_rtree(gfp_t gfp_mask, int safe_needed, struct chain_allocator *ca, unsigned long start, unsigned long end) { struct mem_zone_bm_rtree *zone; unsigned int i, nr_blocks; unsigned long pages; pages = end - start; zone = chain_alloc(ca, sizeof(struct mem_zone_bm_rtree)); if (!zone) return NULL; INIT_LIST_HEAD(&zone->nodes); INIT_LIST_HEAD(&zone->leaves); zone->start_pfn = start; zone->end_pfn = end; nr_blocks = DIV_ROUND_UP(pages, BM_BITS_PER_BLOCK); for (i = 0; i < nr_blocks; i++) { if (add_rtree_block(zone, gfp_mask, safe_needed, ca)) { free_zone_bm_rtree(zone, PG_UNSAFE_CLEAR); return NULL; } } return zone; } /** * free_zone_bm_rtree - Free the memory of the radix tree. * * Free all node pages of the radix tree. The mem_zone_bm_rtree * structure itself is not freed here nor are the rtree_node * structs. */ static void free_zone_bm_rtree(struct mem_zone_bm_rtree *zone, int clear_nosave_free) { struct rtree_node *node; list_for_each_entry(node, &zone->nodes, list) free_image_page(node->data, clear_nosave_free); list_for_each_entry(node, &zone->leaves, list) free_image_page(node->data, clear_nosave_free); } static void memory_bm_position_reset(struct memory_bitmap *bm) { bm->cur.zone = list_entry(bm->zones.next, struct mem_zone_bm_rtree, list); bm->cur.node = list_entry(bm->cur.zone->leaves.next, struct rtree_node, list); bm->cur.node_pfn = 0; bm->cur.cur_pfn = BM_END_OF_MAP; bm->cur.node_bit = 0; } static void memory_bm_free(struct memory_bitmap *bm, int clear_nosave_free); struct mem_extent { struct list_head hook; unsigned long start; unsigned long end; }; /** * free_mem_extents - Free a list of memory extents. * @list: List of extents to free. */ static void free_mem_extents(struct list_head *list) { struct mem_extent *ext, *aux; list_for_each_entry_safe(ext, aux, list, hook) { list_del(&ext->hook); kfree(ext); } } /** * create_mem_extents - Create a list of memory extents. * @list: List to put the extents into. * @gfp_mask: Mask to use for memory allocations. * * The extents represent contiguous ranges of PFNs. 
*/ static int create_mem_extents(struct list_head *list, gfp_t gfp_mask) { struct zone *zone; INIT_LIST_HEAD(list); for_each_populated_zone(zone) { unsigned long zone_start, zone_end; struct mem_extent *ext, *cur, *aux; zone_start = zone->zone_start_pfn; zone_end = zone_end_pfn(zone); list_for_each_entry(ext, list, hook) if (zone_start <= ext->end) break; if (&ext->hook == list || zone_end < ext->start) { /* New extent is necessary */ struct mem_extent *new_ext; new_ext = kzalloc(sizeof(struct mem_extent), gfp_mask); if (!new_ext) { free_mem_extents(list); return -ENOMEM; } new_ext->start = zone_start; new_ext->end = zone_end; list_add_tail(&new_ext->hook, &ext->hook); continue; } /* Merge this zone's range of PFNs with the existing one */ if (zone_start < ext->start) ext->start = zone_start; if (zone_end > ext->end) ext->end = zone_end; /* More merging may be possible */ cur = ext; list_for_each_entry_safe_continue(cur, aux, list, hook) { if (zone_end < cur->start) break; if (zone_end < cur->end) ext->end = cur->end; list_del(&cur->hook); kfree(cur); } } return 0; } /** * memory_bm_create - Allocate memory for a memory bitmap. */ static int memory_bm_create(struct memory_bitmap *bm, gfp_t gfp_mask, int safe_needed) { struct chain_allocator ca; struct list_head mem_extents; struct mem_extent *ext; int error; chain_init(&ca, gfp_mask, safe_needed); INIT_LIST_HEAD(&bm->zones); error = create_mem_extents(&mem_extents, gfp_mask); if (error) return error; list_for_each_entry(ext, &mem_extents, hook) { struct mem_zone_bm_rtree *zone; zone = create_zone_bm_rtree(gfp_mask, safe_needed, &ca, ext->start, ext->end); if (!zone) { error = -ENOMEM; goto Error; } list_add_tail(&zone->list, &bm->zones); } bm->p_list = ca.chain; memory_bm_position_reset(bm); Exit: free_mem_extents(&mem_extents); return error; Error: bm->p_list = ca.chain; memory_bm_free(bm, PG_UNSAFE_CLEAR); goto Exit; } /** * memory_bm_free - Free memory occupied by the memory bitmap. * @bm: Memory bitmap. */ static void memory_bm_free(struct memory_bitmap *bm, int clear_nosave_free) { struct mem_zone_bm_rtree *zone; list_for_each_entry(zone, &bm->zones, list) free_zone_bm_rtree(zone, clear_nosave_free); free_list_of_pages(bm->p_list, clear_nosave_free); INIT_LIST_HEAD(&bm->zones); } /** * memory_bm_find_bit - Find the bit for a given PFN in a memory bitmap. * * Find the bit in memory bitmap @bm that corresponds to the given PFN. * The cur.zone, cur.block and cur.node_pfn members of @bm are updated. * * Walk the radix tree to find the page containing the bit that represents @pfn * and return the position of the bit in @addr and @bit_nr. */ static int memory_bm_find_bit(struct memory_bitmap *bm, unsigned long pfn, void **addr, unsigned int *bit_nr) { struct mem_zone_bm_rtree *curr, *zone; struct rtree_node *node; int i, block_nr; zone = bm->cur.zone; if (pfn >= zone->start_pfn && pfn < zone->end_pfn) goto zone_found; zone = NULL; /* Find the right zone */ list_for_each_entry(curr, &bm->zones, list) { if (pfn >= curr->start_pfn && pfn < curr->end_pfn) { zone = curr; break; } } if (!zone) return -EFAULT; zone_found: /* * We have found the zone. Now walk the radix tree to find the leaf node * for our PFN. */ /* * If the zone we wish to scan is the current zone and the * pfn falls into the current node then we do not need to walk * the tree. 
*/ node = bm->cur.node; if (zone == bm->cur.zone && ((pfn - zone->start_pfn) & ~BM_BLOCK_MASK) == bm->cur.node_pfn) goto node_found; node = zone->rtree; block_nr = (pfn - zone->start_pfn) >> BM_BLOCK_SHIFT; for (i = zone->levels; i > 0; i--) { int index; index = block_nr >> ((i - 1) * BM_RTREE_LEVEL_SHIFT); index &= BM_RTREE_LEVEL_MASK; BUG_ON(node->data[index] == 0); node = (struct rtree_node *)node->data[index]; } node_found: /* Update last position */ bm->cur.zone = zone; bm->cur.node = node; bm->cur.node_pfn = (pfn - zone->start_pfn) & ~BM_BLOCK_MASK; bm->cur.cur_pfn = pfn; /* Set return values */ *addr = node->data; *bit_nr = (pfn - zone->start_pfn) & BM_BLOCK_MASK; return 0; } static void memory_bm_set_bit(struct memory_bitmap *bm, unsigned long pfn) { void *addr; unsigned int bit; int error; error = memory_bm_find_bit(bm, pfn, &addr, &bit); BUG_ON(error); set_bit(bit, addr); } static int mem_bm_set_bit_check(struct memory_bitmap *bm, unsigned long pfn) { void *addr; unsigned int bit; int error; error = memory_bm_find_bit(bm, pfn, &addr, &bit); if (!error) set_bit(bit, addr); return error; } static void memory_bm_clear_bit(struct memory_bitmap *bm, unsigned long pfn) { void *addr; unsigned int bit; int error; error = memory_bm_find_bit(bm, pfn, &addr, &bit); BUG_ON(error); clear_bit(bit, addr); } static void memory_bm_clear_current(struct memory_bitmap *bm) { int bit; bit = max(bm->cur.node_bit - 1, 0); clear_bit(bit, bm->cur.node->data); } static unsigned long memory_bm_get_current(struct memory_bitmap *bm) { return bm->cur.cur_pfn; } static int memory_bm_test_bit(struct memory_bitmap *bm, unsigned long pfn) { void *addr; unsigned int bit; int error; error = memory_bm_find_bit(bm, pfn, &addr, &bit); BUG_ON(error); return test_bit(bit, addr); } static bool memory_bm_pfn_present(struct memory_bitmap *bm, unsigned long pfn) { void *addr; unsigned int bit; return !memory_bm_find_bit(bm, pfn, &addr, &bit); } /* * rtree_next_node - Jump to the next leaf node. * * Set the position to the beginning of the next node in the * memory bitmap. This is either the next node in the current * zone's radix tree or the first node in the radix tree of the * next zone. * * Return true if there is a next node, false otherwise. */ static bool rtree_next_node(struct memory_bitmap *bm) { if (!list_is_last(&bm->cur.node->list, &bm->cur.zone->leaves)) { bm->cur.node = list_entry(bm->cur.node->list.next, struct rtree_node, list); bm->cur.node_pfn += BM_BITS_PER_BLOCK; bm->cur.node_bit = 0; touch_softlockup_watchdog(); return true; } /* No more nodes, goto next zone */ if (!list_is_last(&bm->cur.zone->list, &bm->zones)) { bm->cur.zone = list_entry(bm->cur.zone->list.next, struct mem_zone_bm_rtree, list); bm->cur.node = list_entry(bm->cur.zone->leaves.next, struct rtree_node, list); bm->cur.node_pfn = 0; bm->cur.node_bit = 0; return true; } /* No more zones */ return false; } /** * memory_bm_next_pfn - Find the next set bit in a memory bitmap. * @bm: Memory bitmap. * * Starting from the last returned position this function searches for the next * set bit in @bm and returns the PFN represented by it. If no more bits are * set, BM_END_OF_MAP is returned. * * It is required to run memory_bm_position_reset() before the first call to * this function for the given memory bitmap. 
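 *
 * A minimal scan loop over all set bits might look like this (sketch
 * only; handle_pfn() is a hypothetical consumer):
 *
 *	memory_bm_position_reset(bm);
 *	for (pfn = memory_bm_next_pfn(bm); pfn != BM_END_OF_MAP;
 *	     pfn = memory_bm_next_pfn(bm))
 *		handle_pfn(pfn);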
*/ static unsigned long memory_bm_next_pfn(struct memory_bitmap *bm) { unsigned long bits, pfn, pages; int bit; do { pages = bm->cur.zone->end_pfn - bm->cur.zone->start_pfn; bits = min(pages - bm->cur.node_pfn, BM_BITS_PER_BLOCK); bit = find_next_bit(bm->cur.node->data, bits, bm->cur.node_bit); if (bit < bits) { pfn = bm->cur.zone->start_pfn + bm->cur.node_pfn + bit; bm->cur.node_bit = bit + 1; bm->cur.cur_pfn = pfn; return pfn; } } while (rtree_next_node(bm)); bm->cur.cur_pfn = BM_END_OF_MAP; return BM_END_OF_MAP; } /* * This structure represents a range of page frames the contents of which * should not be saved during hibernation. */ struct nosave_region { struct list_head list; unsigned long start_pfn; unsigned long end_pfn; }; static LIST_HEAD(nosave_regions); static void recycle_zone_bm_rtree(struct mem_zone_bm_rtree *zone) { struct rtree_node *node; list_for_each_entry(node, &zone->nodes, list) recycle_safe_page(node->data); list_for_each_entry(node, &zone->leaves, list) recycle_safe_page(node->data); } static void memory_bm_recycle(struct memory_bitmap *bm) { struct mem_zone_bm_rtree *zone; struct linked_page *p_list; list_for_each_entry(zone, &bm->zones, list) recycle_zone_bm_rtree(zone); p_list = bm->p_list; while (p_list) { struct linked_page *lp = p_list; p_list = lp->next; recycle_safe_page(lp); } } /** * register_nosave_region - Register a region of unsaveable memory. * * Register a range of page frames the contents of which should not be saved * during hibernation (to be used in the early initialization code). */ void __init register_nosave_region(unsigned long start_pfn, unsigned long end_pfn) { struct nosave_region *region; if (start_pfn >= end_pfn) return; if (!list_empty(&nosave_regions)) { /* Try to extend the previous region (they should be sorted) */ region = list_entry(nosave_regions.prev, struct nosave_region, list); if (region->end_pfn == start_pfn) { region->end_pfn = end_pfn; goto Report; } } /* This allocation cannot fail */ region = memblock_alloc_or_panic(sizeof(struct nosave_region), SMP_CACHE_BYTES); region->start_pfn = start_pfn; region->end_pfn = end_pfn; list_add_tail(®ion->list, &nosave_regions); Report: pr_info("Registered nosave memory: [mem %#010llx-%#010llx]\n", (unsigned long long) start_pfn << PAGE_SHIFT, ((unsigned long long) end_pfn << PAGE_SHIFT) - 1); } /* * Set bits in this map correspond to the page frames the contents of which * should not be saved during the suspend. */ static struct memory_bitmap *forbidden_pages_map; /* Set bits in this map correspond to free page frames. */ static struct memory_bitmap *free_pages_map; /* * Each page frame allocated for creating the image is marked by setting the * corresponding bits in forbidden_pages_map and free_pages_map simultaneously */ void swsusp_set_page_free(struct page *page) { if (free_pages_map) memory_bm_set_bit(free_pages_map, page_to_pfn(page)); } static int swsusp_page_is_free(struct page *page) { return free_pages_map ? memory_bm_test_bit(free_pages_map, page_to_pfn(page)) : 0; } void swsusp_unset_page_free(struct page *page) { if (free_pages_map) memory_bm_clear_bit(free_pages_map, page_to_pfn(page)); } static void swsusp_set_page_forbidden(struct page *page) { if (forbidden_pages_map) memory_bm_set_bit(forbidden_pages_map, page_to_pfn(page)); } int swsusp_page_is_forbidden(struct page *page) { return forbidden_pages_map ? 
memory_bm_test_bit(forbidden_pages_map, page_to_pfn(page)) : 0; } static void swsusp_unset_page_forbidden(struct page *page) { if (forbidden_pages_map) memory_bm_clear_bit(forbidden_pages_map, page_to_pfn(page)); } /** * mark_nosave_pages - Mark pages that should not be saved. * @bm: Memory bitmap. * * Set the bits in @bm that correspond to the page frames the contents of which * should not be saved. */ static void mark_nosave_pages(struct memory_bitmap *bm) { struct nosave_region *region; if (list_empty(&nosave_regions)) return; list_for_each_entry(region, &nosave_regions, list) { unsigned long pfn; pr_debug("Marking nosave pages: [mem %#010llx-%#010llx]\n", (unsigned long long) region->start_pfn << PAGE_SHIFT, ((unsigned long long) region->end_pfn << PAGE_SHIFT) - 1); for (pfn = region->start_pfn; pfn < region->end_pfn; pfn++) if (pfn_valid(pfn)) { /* * It is safe to ignore the result of * mem_bm_set_bit_check() here, since we won't * touch the PFNs for which the error is * returned anyway. */ mem_bm_set_bit_check(bm, pfn); } } } /** * create_basic_memory_bitmaps - Create bitmaps to hold basic page information. * * Create bitmaps needed for marking page frames that should not be saved and * free page frames. The forbidden_pages_map and free_pages_map pointers are * only modified if everything goes well, because we don't want the bits to be * touched before both bitmaps are set up. */ int create_basic_memory_bitmaps(void) { struct memory_bitmap *bm1, *bm2; int error; if (forbidden_pages_map && free_pages_map) return 0; else BUG_ON(forbidden_pages_map || free_pages_map); bm1 = kzalloc(sizeof(struct memory_bitmap), GFP_KERNEL); if (!bm1) return -ENOMEM; error = memory_bm_create(bm1, GFP_KERNEL, PG_ANY); if (error) goto Free_first_object; bm2 = kzalloc(sizeof(struct memory_bitmap), GFP_KERNEL); if (!bm2) goto Free_first_bitmap; error = memory_bm_create(bm2, GFP_KERNEL, PG_ANY); if (error) goto Free_second_object; forbidden_pages_map = bm1; free_pages_map = bm2; mark_nosave_pages(forbidden_pages_map); pr_debug("Basic memory bitmaps created\n"); return 0; Free_second_object: kfree(bm2); Free_first_bitmap: memory_bm_free(bm1, PG_UNSAFE_CLEAR); Free_first_object: kfree(bm1); return -ENOMEM; } /** * free_basic_memory_bitmaps - Free memory bitmaps holding basic information. * * Free memory bitmaps allocated by create_basic_memory_bitmaps(). The * auxiliary pointers are necessary so that the bitmaps themselves are not * referred to while they are being freed. 
*/ void free_basic_memory_bitmaps(void) { struct memory_bitmap *bm1, *bm2; if (WARN_ON(!(forbidden_pages_map && free_pages_map))) return; bm1 = forbidden_pages_map; bm2 = free_pages_map; forbidden_pages_map = NULL; free_pages_map = NULL; memory_bm_free(bm1, PG_UNSAFE_CLEAR); kfree(bm1); memory_bm_free(bm2, PG_UNSAFE_CLEAR); kfree(bm2); pr_debug("Basic memory bitmaps freed\n"); } static void clear_or_poison_free_page(struct page *page) { if (page_poisoning_enabled_static()) __kernel_poison_pages(page, 1); else if (want_init_on_free()) clear_highpage(page); } void clear_or_poison_free_pages(void) { struct memory_bitmap *bm = free_pages_map; unsigned long pfn; if (WARN_ON(!(free_pages_map))) return; if (page_poisoning_enabled() || want_init_on_free()) { memory_bm_position_reset(bm); pfn = memory_bm_next_pfn(bm); while (pfn != BM_END_OF_MAP) { if (pfn_valid(pfn)) clear_or_poison_free_page(pfn_to_page(pfn)); pfn = memory_bm_next_pfn(bm); } memory_bm_position_reset(bm); pr_info("free pages cleared after restore\n"); } } /** * snapshot_additional_pages - Estimate the number of extra pages needed. * @zone: Memory zone to carry out the computation for. * * Estimate the number of additional pages needed for setting up a hibernation * image data structures for @zone (usually, the returned value is greater than * the exact number). */ unsigned int snapshot_additional_pages(struct zone *zone) { unsigned int rtree, nodes; rtree = nodes = DIV_ROUND_UP(zone->spanned_pages, BM_BITS_PER_BLOCK); rtree += DIV_ROUND_UP(rtree * sizeof(struct rtree_node), LINKED_PAGE_DATA_SIZE); while (nodes > 1) { nodes = DIV_ROUND_UP(nodes, BM_ENTRIES_PER_LEVEL); rtree += nodes; } return 2 * rtree; } /* * Touch the watchdog for every WD_PAGE_COUNT pages. */ #define WD_PAGE_COUNT (128*1024) static void mark_free_pages(struct zone *zone) { unsigned long pfn, max_zone_pfn, page_count = WD_PAGE_COUNT; unsigned long flags; unsigned int order, t; struct page *page; if (zone_is_empty(zone)) return; spin_lock_irqsave(&zone->lock, flags); max_zone_pfn = zone_end_pfn(zone); for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++) if (pfn_valid(pfn)) { page = pfn_to_page(pfn); if (!--page_count) { touch_nmi_watchdog(); page_count = WD_PAGE_COUNT; } if (page_zone(page) != zone) continue; if (!swsusp_page_is_forbidden(page)) swsusp_unset_page_free(page); } for_each_migratetype_order(order, t) { list_for_each_entry(page, &zone->free_area[order].free_list[t], buddy_list) { unsigned long i; pfn = page_to_pfn(page); for (i = 0; i < (1UL << order); i++) { if (!--page_count) { touch_nmi_watchdog(); page_count = WD_PAGE_COUNT; } swsusp_set_page_free(pfn_to_page(pfn + i)); } } } spin_unlock_irqrestore(&zone->lock, flags); } #ifdef CONFIG_HIGHMEM /** * count_free_highmem_pages - Compute the total number of free highmem pages. * * The returned number is system-wide. */ static unsigned int count_free_highmem_pages(void) { struct zone *zone; unsigned int cnt = 0; for_each_populated_zone(zone) if (is_highmem(zone)) cnt += zone_page_state(zone, NR_FREE_PAGES); return cnt; } /** * saveable_highmem_page - Check if a highmem page is saveable. * * Determine whether a highmem page should be included in a hibernation image. * * We should save the page if it isn't Nosave or NosaveFree, or Reserved, * and it isn't part of a free chunk of pages. 
*/ static struct page *saveable_highmem_page(struct zone *zone, unsigned long pfn) { struct page *page; if (!pfn_valid(pfn)) return NULL; page = pfn_to_online_page(pfn); if (!page || page_zone(page) != zone) return NULL; BUG_ON(!PageHighMem(page)); if (swsusp_page_is_forbidden(page) || swsusp_page_is_free(page)) return NULL; if (PageReserved(page) || PageOffline(page)) return NULL; if (page_is_guard(page)) return NULL; return page; } /** * count_highmem_pages - Compute the total number of saveable highmem pages. */ static unsigned int count_highmem_pages(void) { struct zone *zone; unsigned int n = 0; for_each_populated_zone(zone) { unsigned long pfn, max_zone_pfn; if (!is_highmem(zone)) continue; mark_free_pages(zone); max_zone_pfn = zone_end_pfn(zone); for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++) if (saveable_highmem_page(zone, pfn)) n++; } return n; } #endif /* CONFIG_HIGHMEM */ /** * saveable_page - Check if the given page is saveable. * * Determine whether a non-highmem page should be included in a hibernation * image. * * We should save the page if it isn't Nosave, and is not in the range * of pages statically defined as 'unsaveable', and it isn't part of * a free chunk of pages. */ static struct page *saveable_page(struct zone *zone, unsigned long pfn) { struct page *page; if (!pfn_valid(pfn)) return NULL; page = pfn_to_online_page(pfn); if (!page || page_zone(page) != zone) return NULL; BUG_ON(PageHighMem(page)); if (swsusp_page_is_forbidden(page) || swsusp_page_is_free(page)) return NULL; if (PageOffline(page)) return NULL; if (PageReserved(page) && (!kernel_page_present(page) || pfn_is_nosave(pfn))) return NULL; if (page_is_guard(page)) return NULL; return page; } /** * count_data_pages - Compute the total number of saveable non-highmem pages. */ static unsigned int count_data_pages(void) { struct zone *zone; unsigned long pfn, max_zone_pfn; unsigned int n = 0; for_each_populated_zone(zone) { if (is_highmem(zone)) continue; mark_free_pages(zone); max_zone_pfn = zone_end_pfn(zone); for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++) if (saveable_page(zone, pfn)) n++; } return n; } /* * This is needed, because copy_page and memcpy are not usable for copying * task structs. Returns true if the page was filled with only zeros, * otherwise false. */ static inline bool do_copy_page(long *dst, long *src) { long z = 0; int n; for (n = PAGE_SIZE / sizeof(long); n; n--) { z |= *src; *dst++ = *src++; } return !z; } /** * safe_copy_page - Copy a page in a safe way. * * Check if the page we are going to copy is marked as present in the kernel * page tables. This always is the case if CONFIG_DEBUG_PAGEALLOC or * CONFIG_ARCH_HAS_SET_DIRECT_MAP is not set. In that case kernel_page_present() * always returns 'true'. Returns true if the page was entirely composed of * zeros, otherwise it will return false. */ static bool safe_copy_page(void *dst, struct page *s_page) { bool zeros_only; if (kernel_page_present(s_page)) { zeros_only = do_copy_page(dst, page_address(s_page)); } else { hibernate_map_page(s_page); zeros_only = do_copy_page(dst, page_address(s_page)); hibernate_unmap_page(s_page); } return zeros_only; } #ifdef CONFIG_HIGHMEM static inline struct page *page_is_saveable(struct zone *zone, unsigned long pfn) { return is_highmem(zone) ? 
saveable_highmem_page(zone, pfn) : saveable_page(zone, pfn); } static bool copy_data_page(unsigned long dst_pfn, unsigned long src_pfn) { struct page *s_page, *d_page; void *src, *dst; bool zeros_only; s_page = pfn_to_page(src_pfn); d_page = pfn_to_page(dst_pfn); if (PageHighMem(s_page)) { src = kmap_local_page(s_page); dst = kmap_local_page(d_page); zeros_only = do_copy_page(dst, src); kunmap_local(dst); kunmap_local(src); } else { if (PageHighMem(d_page)) { /* * The page pointed to by src may contain some kernel * data modified by kmap_atomic() */ zeros_only = safe_copy_page(buffer, s_page); dst = kmap_local_page(d_page); copy_page(dst, buffer); kunmap_local(dst); } else { zeros_only = safe_copy_page(page_address(d_page), s_page); } } return zeros_only; } #else #define page_is_saveable(zone, pfn) saveable_page(zone, pfn) static inline int copy_data_page(unsigned long dst_pfn, unsigned long src_pfn) { return safe_copy_page(page_address(pfn_to_page(dst_pfn)), pfn_to_page(src_pfn)); } #endif /* CONFIG_HIGHMEM */ /* * Copy data pages will copy all pages into pages pulled from the copy_bm. * If a page was entirely filled with zeros it will be marked in the zero_bm. * * Returns the number of pages copied. */ static unsigned long copy_data_pages(struct memory_bitmap *copy_bm, struct memory_bitmap *orig_bm, struct memory_bitmap *zero_bm) { unsigned long copied_pages = 0; struct zone *zone; unsigned long pfn, copy_pfn; for_each_populated_zone(zone) { unsigned long max_zone_pfn; mark_free_pages(zone); max_zone_pfn = zone_end_pfn(zone); for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++) if (page_is_saveable(zone, pfn)) memory_bm_set_bit(orig_bm, pfn); } memory_bm_position_reset(orig_bm); memory_bm_position_reset(copy_bm); copy_pfn = memory_bm_next_pfn(copy_bm); for(;;) { pfn = memory_bm_next_pfn(orig_bm); if (unlikely(pfn == BM_END_OF_MAP)) break; if (copy_data_page(copy_pfn, pfn)) { memory_bm_set_bit(zero_bm, pfn); /* Use this copy_pfn for a page that is not full of zeros */ continue; } copied_pages++; copy_pfn = memory_bm_next_pfn(copy_bm); } return copied_pages; } /* Total number of image pages */ static unsigned int nr_copy_pages; /* Number of pages needed for saving the original pfns of the image pages */ static unsigned int nr_meta_pages; /* Number of zero pages */ static unsigned int nr_zero_pages; /* * Numbers of normal and highmem page frames allocated for hibernation image * before suspending devices. */ static unsigned int alloc_normal, alloc_highmem; /* * Memory bitmap used for marking saveable pages (during hibernation) or * hibernation image pages (during restore) */ static struct memory_bitmap orig_bm; /* * Memory bitmap used during hibernation for marking allocated page frames that * will contain copies of saveable pages. During restore it is initially used * for marking hibernation image pages, but then the set bits from it are * duplicated in @orig_bm and it is released. On highmem systems it is next * used for marking "safe" highmem pages, but it has to be reinitialized for * this purpose. */ static struct memory_bitmap copy_bm; /* Memory bitmap which tracks which saveable pages were zero filled. */ static struct memory_bitmap zero_bm; /** * swsusp_free - Free pages allocated for hibernation image. * * Image pages are allocated before snapshot creation, so they need to be * released after resume. 
*/ void swsusp_free(void) { unsigned long fb_pfn, fr_pfn; if (!forbidden_pages_map || !free_pages_map) goto out; memory_bm_position_reset(forbidden_pages_map); memory_bm_position_reset(free_pages_map); loop: fr_pfn = memory_bm_next_pfn(free_pages_map); fb_pfn = memory_bm_next_pfn(forbidden_pages_map); /* * Find the next bit set in both bitmaps. This is guaranteed to * terminate when fb_pfn == fr_pfn == BM_END_OF_MAP. */ do { if (fb_pfn < fr_pfn) fb_pfn = memory_bm_next_pfn(forbidden_pages_map); if (fr_pfn < fb_pfn) fr_pfn = memory_bm_next_pfn(free_pages_map); } while (fb_pfn != fr_pfn); if (fr_pfn != BM_END_OF_MAP && pfn_valid(fr_pfn)) { struct page *page = pfn_to_page(fr_pfn); memory_bm_clear_current(forbidden_pages_map); memory_bm_clear_current(free_pages_map); hibernate_restore_unprotect_page(page_address(page)); __free_page(page); goto loop; } out: nr_copy_pages = 0; nr_meta_pages = 0; nr_zero_pages = 0; restore_pblist = NULL; buffer = NULL; alloc_normal = 0; alloc_highmem = 0; hibernate_restore_protection_end(); } /* Helper functions used for the shrinking of memory. */ #define GFP_IMAGE (GFP_KERNEL | __GFP_NOWARN) /** * preallocate_image_pages - Allocate a number of pages for hibernation image. * @nr_pages: Number of page frames to allocate. * @mask: GFP flags to use for the allocation. * * Return value: Number of page frames actually allocated */ static unsigned long preallocate_image_pages(unsigned long nr_pages, gfp_t mask) { unsigned long nr_alloc = 0; while (nr_pages > 0) { struct page *page; page = alloc_image_page(mask); if (!page) break; memory_bm_set_bit(&copy_bm, page_to_pfn(page)); if (PageHighMem(page)) alloc_highmem++; else alloc_normal++; nr_pages--; nr_alloc++; } return nr_alloc; } static unsigned long preallocate_image_memory(unsigned long nr_pages, unsigned long avail_normal) { unsigned long alloc; if (avail_normal <= alloc_normal) return 0; alloc = avail_normal - alloc_normal; if (nr_pages < alloc) alloc = nr_pages; return preallocate_image_pages(alloc, GFP_IMAGE); } #ifdef CONFIG_HIGHMEM static unsigned long preallocate_image_highmem(unsigned long nr_pages) { return preallocate_image_pages(nr_pages, GFP_IMAGE | __GFP_HIGHMEM); } /** * __fraction - Compute (an approximation of) x * (multiplier / base). */ static unsigned long __fraction(u64 x, u64 multiplier, u64 base) { return div64_u64(x * multiplier, base); } static unsigned long preallocate_highmem_fraction(unsigned long nr_pages, unsigned long highmem, unsigned long total) { unsigned long alloc = __fraction(nr_pages, highmem, total); return preallocate_image_pages(alloc, GFP_IMAGE | __GFP_HIGHMEM); } #else /* CONFIG_HIGHMEM */ static inline unsigned long preallocate_image_highmem(unsigned long nr_pages) { return 0; } static inline unsigned long preallocate_highmem_fraction(unsigned long nr_pages, unsigned long highmem, unsigned long total) { return 0; } #endif /* CONFIG_HIGHMEM */ /** * free_unnecessary_pages - Release preallocated pages not needed for the image.
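 *
 * Worked example with purely illustrative numbers: if 1000 normal pages
 * were preallocated but only 800 saveable data pages remain, 200 normal
 * pages become candidates for release; if 300 saveable highmem pages
 * remain against 250 preallocated highmem pages, the 50-page highmem
 * shortfall is covered by keeping 50 of those candidates, so 150 normal
 * pages and no highmem pages are actually freed.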
*/ static unsigned long free_unnecessary_pages(void) { unsigned long save, to_free_normal, to_free_highmem, free; save = count_data_pages(); if (alloc_normal >= save) { to_free_normal = alloc_normal - save; save = 0; } else { to_free_normal = 0; save -= alloc_normal; } save += count_highmem_pages(); if (alloc_highmem >= save) { to_free_highmem = alloc_highmem - save; } else { to_free_highmem = 0; save -= alloc_highmem; if (to_free_normal > save) to_free_normal -= save; else to_free_normal = 0; } free = to_free_normal + to_free_highmem; memory_bm_position_reset(&copy_bm); while (to_free_normal > 0 || to_free_highmem > 0) { unsigned long pfn = memory_bm_next_pfn(&copy_bm); struct page *page = pfn_to_page(pfn); if (PageHighMem(page)) { if (!to_free_highmem) continue; to_free_highmem--; alloc_highmem--; } else { if (!to_free_normal) continue; to_free_normal--; alloc_normal--; } memory_bm_clear_bit(&copy_bm, pfn); swsusp_unset_page_forbidden(page); swsusp_unset_page_free(page); __free_page(page); } return free; } /** * minimum_image_size - Estimate the minimum acceptable size of an image. * @saveable: Number of saveable pages in the system. * * We want to avoid attempting to free too much memory too hard, so estimate the * minimum acceptable size of a hibernation image to use as the lower limit for * preallocating memory. * * We assume that the minimum image size should be proportional to * * [number of saveable pages] - [number of pages that can be freed in theory] * * where the second term is the sum of (1) reclaimable slab pages, (2) active * and (3) inactive anonymous pages, (4) active and (5) inactive file pages. */ static unsigned long minimum_image_size(unsigned long saveable) { unsigned long size; size = global_node_page_state_pages(NR_SLAB_RECLAIMABLE_B) + global_node_page_state(NR_ACTIVE_ANON) + global_node_page_state(NR_INACTIVE_ANON) + global_node_page_state(NR_ACTIVE_FILE) + global_node_page_state(NR_INACTIVE_FILE); return saveable <= size ? 0 : saveable - size; } /** * hibernate_preallocate_memory - Preallocate memory for hibernation image. * * To create a hibernation image it is necessary to make a copy of every page * frame in use. We also need a number of page frames to be free during * hibernation for allocations made while saving the image and for device * drivers, in case they need to allocate memory from their hibernation * callbacks (these two numbers are given by PAGES_FOR_IO, which is a rough * estimate, and reserved_size divided by PAGE_SIZE, which is tunable through * /sys/power/reserved_size, respectively). To make this happen, we compute the * total number of available page frames and allocate at least * * ([page frames total] - PAGES_FOR_IO - [metadata pages]) / 2 * - 2 * DIV_ROUND_UP(reserved_size, PAGE_SIZE) * * of them, which corresponds to the maximum size of a hibernation image. * * If image_size is set below the number following from the above formula, * the preallocation of memory is continued until the total number of saveable * pages in the system is below the requested image size or the minimum * acceptable image size returned by minimum_image_size(), whichever is greater.
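 *
 * Purely illustrative arithmetic (all numbers made up, including the
 * PAGES_FOR_IO value): with 1,000,000 usable page frames, 2,000
 * metadata pages, PAGES_FOR_IO equal to 1,024 and reserved_size worth
 * 256 pages, the formula above yields
 * (1,000,000 - 1,024 - 2,000) / 2 - 2 * 256 = 497,976 pages as the
 * maximum image size.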
*/ int hibernate_preallocate_memory(void) { struct zone *zone; unsigned long saveable, size, max_size, count, highmem, pages = 0; unsigned long alloc, save_highmem, pages_highmem, avail_normal; ktime_t start, stop; int error; pr_info("Preallocating image memory\n"); start = ktime_get(); error = memory_bm_create(&orig_bm, GFP_IMAGE, PG_ANY); if (error) { pr_err("Cannot allocate original bitmap\n"); goto err_out; } error = memory_bm_create(©_bm, GFP_IMAGE, PG_ANY); if (error) { pr_err("Cannot allocate copy bitmap\n"); goto err_out; } error = memory_bm_create(&zero_bm, GFP_IMAGE, PG_ANY); if (error) { pr_err("Cannot allocate zero bitmap\n"); goto err_out; } alloc_normal = 0; alloc_highmem = 0; nr_zero_pages = 0; /* Count the number of saveable data pages. */ save_highmem = count_highmem_pages(); saveable = count_data_pages(); /* * Compute the total number of page frames we can use (count) and the * number of pages needed for image metadata (size). */ count = saveable; saveable += save_highmem; highmem = save_highmem; size = 0; for_each_populated_zone(zone) { size += snapshot_additional_pages(zone); if (is_highmem(zone)) highmem += zone_page_state(zone, NR_FREE_PAGES); else count += zone_page_state(zone, NR_FREE_PAGES); } avail_normal = count; count += highmem; count -= totalreserve_pages; /* Compute the maximum number of saveable pages to leave in memory. */ max_size = (count - (size + PAGES_FOR_IO)) / 2 - 2 * DIV_ROUND_UP(reserved_size, PAGE_SIZE); /* Compute the desired number of image pages specified by image_size. */ size = DIV_ROUND_UP(image_size, PAGE_SIZE); if (size > max_size) size = max_size; /* * If the desired number of image pages is at least as large as the * current number of saveable pages in memory, allocate page frames for * the image and we're done. */ if (size >= saveable) { pages = preallocate_image_highmem(save_highmem); pages += preallocate_image_memory(saveable - pages, avail_normal); goto out; } /* Estimate the minimum size of the image. */ pages = minimum_image_size(saveable); /* * To avoid excessive pressure on the normal zone, leave room in it to * accommodate an image of the minimum size (unless it's already too * small, in which case don't preallocate pages from it at all). */ if (avail_normal > pages) avail_normal -= pages; else avail_normal = 0; if (size < pages) size = min_t(unsigned long, pages, max_size); /* * Let the memory management subsystem know that we're going to need a * large number of page frames to allocate and make it free some memory. * NOTE: If this is not done, performance will be hurt badly in some * test cases. */ shrink_all_memory(saveable - size); /* * The number of saveable pages in memory was too high, so apply some * pressure to decrease it. First, make room for the largest possible * image and fail if that doesn't work. Next, try to decrease the size * of the image as much as indicated by 'size' using allocations from * highmem and non-highmem zones separately. */ pages_highmem = preallocate_image_highmem(highmem / 2); alloc = count - max_size; if (alloc > pages_highmem) alloc -= pages_highmem; else alloc = 0; pages = preallocate_image_memory(alloc, avail_normal); if (pages < alloc) { /* We have exhausted non-highmem pages, try highmem. 
*/ alloc -= pages; pages += pages_highmem; pages_highmem = preallocate_image_highmem(alloc); if (pages_highmem < alloc) { pr_err("Image allocation is %lu pages short\n", alloc - pages_highmem); goto err_out; } pages += pages_highmem; /* * size is the desired number of saveable pages to leave in * memory, so try to preallocate (all memory - size) pages. */ alloc = (count - pages) - size; pages += preallocate_image_highmem(alloc); } else { /* * There are approximately max_size saveable pages at this point * and we want to reduce this number down to size. */ alloc = max_size - size; size = preallocate_highmem_fraction(alloc, highmem, count); pages_highmem += size; alloc -= size; size = preallocate_image_memory(alloc, avail_normal); pages_highmem += preallocate_image_highmem(alloc - size); pages += pages_highmem + size; } /* * We only need as many page frames for the image as there are saveable * pages in memory, but we have allocated more. Release the excessive * ones now. */ pages -= free_unnecessary_pages(); out: stop = ktime_get(); pr_info("Allocated %lu pages for snapshot\n", pages); swsusp_show_speed(start, stop, pages, "Allocated"); return 0; err_out: swsusp_free(); return -ENOMEM; } #ifdef CONFIG_HIGHMEM /** * count_pages_for_highmem - Count non-highmem pages needed for copying highmem. * * Compute the number of non-highmem pages that will be necessary for creating * copies of highmem pages. */ static unsigned int count_pages_for_highmem(unsigned int nr_highmem) { unsigned int free_highmem = count_free_highmem_pages() + alloc_highmem; if (free_highmem >= nr_highmem) nr_highmem = 0; else nr_highmem -= free_highmem; return nr_highmem; } #else static unsigned int count_pages_for_highmem(unsigned int nr_highmem) { return 0; } #endif /* CONFIG_HIGHMEM */ /** * enough_free_mem - Check if there is enough free memory for the image. */ static int enough_free_mem(unsigned int nr_pages, unsigned int nr_highmem) { struct zone *zone; unsigned int free = alloc_normal; for_each_populated_zone(zone) if (!is_highmem(zone)) free += zone_page_state(zone, NR_FREE_PAGES); nr_pages += count_pages_for_highmem(nr_highmem); pr_debug("Normal pages needed: %u + %u, available pages: %u\n", nr_pages, PAGES_FOR_IO, free); return free > nr_pages + PAGES_FOR_IO; } #ifdef CONFIG_HIGHMEM /** * get_highmem_buffer - Allocate a buffer for highmem pages. * * If there are some highmem pages in the hibernation image, we may need a * buffer to copy them and/or load their data. */ static inline int get_highmem_buffer(int safe_needed) { buffer = get_image_page(GFP_ATOMIC, safe_needed); return buffer ? 0 : -ENOMEM; } /** * alloc_highmem_pages - Allocate some highmem pages for the image. * * Try to allocate as many pages as needed, but if the number of free highmem * pages is less than that, allocate them all. */ static inline unsigned int alloc_highmem_pages(struct memory_bitmap *bm, unsigned int nr_highmem) { unsigned int to_alloc = count_free_highmem_pages(); if (to_alloc > nr_highmem) to_alloc = nr_highmem; nr_highmem -= to_alloc; while (to_alloc-- > 0) { struct page *page; page = alloc_image_page(__GFP_HIGHMEM|__GFP_KSWAPD_RECLAIM); memory_bm_set_bit(bm, page_to_pfn(page)); } return nr_highmem; } #else static inline int get_highmem_buffer(int safe_needed) { return 0; } static inline unsigned int alloc_highmem_pages(struct memory_bitmap *bm, unsigned int n) { return 0; } #endif /* CONFIG_HIGHMEM */ /** * swsusp_alloc - Allocate memory for hibernation image. 
* * We first try to allocate as many highmem pages as there are * saveable highmem pages in the system. If that fails, we allocate * non-highmem pages for the copies of the remaining highmem ones. * * In this approach it is likely that the copies of highmem pages will * also be located in the high memory, because of the way in which * copy_data_pages() works. */ static int swsusp_alloc(struct memory_bitmap *copy_bm, unsigned int nr_pages, unsigned int nr_highmem) { if (nr_highmem > 0) { if (get_highmem_buffer(PG_ANY)) goto err_out; if (nr_highmem > alloc_highmem) { nr_highmem -= alloc_highmem; nr_pages += alloc_highmem_pages(copy_bm, nr_highmem); } } if (nr_pages > alloc_normal) { nr_pages -= alloc_normal; while (nr_pages-- > 0) { struct page *page; page = alloc_image_page(GFP_ATOMIC); if (!page) goto err_out; memory_bm_set_bit(copy_bm, page_to_pfn(page)); } } return 0; err_out: swsusp_free(); return -ENOMEM; } asmlinkage __visible int swsusp_save(void) { unsigned int nr_pages, nr_highmem; pr_info("Creating image:\n"); drain_local_pages(NULL); nr_pages = count_data_pages(); nr_highmem = count_highmem_pages(); pr_info("Need to copy %u pages\n", nr_pages + nr_highmem); if (!enough_free_mem(nr_pages, nr_highmem)) { pr_err("Not enough free memory\n"); return -ENOMEM; } if (swsusp_alloc(©_bm, nr_pages, nr_highmem)) { pr_err("Memory allocation failed\n"); return -ENOMEM; } /* * During allocating of suspend pagedir, new cold pages may appear. * Kill them. */ drain_local_pages(NULL); nr_copy_pages = copy_data_pages(©_bm, &orig_bm, &zero_bm); /* * End of critical section. From now on, we can write to memory, * but we should not touch disk. This specially means we must _not_ * touch swap space! Except we must write out our image of course. */ nr_pages += nr_highmem; /* We don't actually copy the zero pages */ nr_zero_pages = nr_pages - nr_copy_pages; nr_meta_pages = DIV_ROUND_UP(nr_pages * sizeof(long), PAGE_SIZE); pr_info("Image created (%d pages copied, %d zero pages)\n", nr_copy_pages, nr_zero_pages); return 0; } #ifndef CONFIG_ARCH_HIBERNATION_HEADER static int init_header_complete(struct swsusp_info *info) { memcpy(&info->uts, init_utsname(), sizeof(struct new_utsname)); info->version_code = LINUX_VERSION_CODE; return 0; } static const char *check_image_kernel(struct swsusp_info *info) { if (info->version_code != LINUX_VERSION_CODE) return "kernel version"; if (strcmp(info->uts.sysname,init_utsname()->sysname)) return "system type"; if (strcmp(info->uts.release,init_utsname()->release)) return "kernel release"; if (strcmp(info->uts.version,init_utsname()->version)) return "version"; if (strcmp(info->uts.machine,init_utsname()->machine)) return "machine"; return NULL; } #endif /* CONFIG_ARCH_HIBERNATION_HEADER */ unsigned long snapshot_get_image_size(void) { return nr_copy_pages + nr_meta_pages + 1; } static int init_header(struct swsusp_info *info) { memset(info, 0, sizeof(struct swsusp_info)); info->num_physpages = get_num_physpages(); info->image_pages = nr_copy_pages; info->pages = snapshot_get_image_size(); info->size = info->pages; info->size <<= PAGE_SHIFT; return init_header_complete(info); } #define ENCODED_PFN_ZERO_FLAG ((unsigned long)1 << (BITS_PER_LONG - 1)) #define ENCODED_PFN_MASK (~ENCODED_PFN_ZERO_FLAG) /** * pack_pfns - Prepare PFNs for saving. * @bm: Memory bitmap. * @buf: Memory buffer to store the PFNs in. * @zero_bm: Memory bitmap containing PFNs of zero pages. 
* * PFNs corresponding to set bits in @bm are stored in the area of memory * pointed to by @buf (1 page at a time). Pages which were filled with only * zeros will have the highest bit set in the packed format to distinguish * them from PFNs which will be contained in the image file. */ static inline void pack_pfns(unsigned long *buf, struct memory_bitmap *bm, struct memory_bitmap *zero_bm) { int j; for (j = 0; j < PAGE_SIZE / sizeof(long); j++) { buf[j] = memory_bm_next_pfn(bm); if (unlikely(buf[j] == BM_END_OF_MAP)) break; if (memory_bm_test_bit(zero_bm, buf[j])) buf[j] |= ENCODED_PFN_ZERO_FLAG; } } /** * snapshot_read_next - Get the address to read the next image page from. * @handle: Snapshot handle to be used for the reading. * * On the first call, @handle should point to a zeroed snapshot_handle * structure. The structure gets populated then and a pointer to it should be * passed to this function every next time. * * On success, the function returns a positive number. Then, the caller * is allowed to read up to the returned number of bytes from the memory * location computed by the data_of() macro. * * The function returns 0 to indicate the end of the data stream condition, * and negative numbers are returned on errors. If that happens, the structure * pointed to by @handle is not updated and should not be used any more. */ int snapshot_read_next(struct snapshot_handle *handle) { if (handle->cur > nr_meta_pages + nr_copy_pages) return 0; if (!buffer) { /* This makes the buffer be freed by swsusp_free() */ buffer = get_image_page(GFP_ATOMIC, PG_ANY); if (!buffer) return -ENOMEM; } if (!handle->cur) { int error; error = init_header((struct swsusp_info *)buffer); if (error) return error; handle->buffer = buffer; memory_bm_position_reset(&orig_bm); memory_bm_position_reset(©_bm); } else if (handle->cur <= nr_meta_pages) { clear_page(buffer); pack_pfns(buffer, &orig_bm, &zero_bm); } else { struct page *page; page = pfn_to_page(memory_bm_next_pfn(©_bm)); if (PageHighMem(page)) { /* * Highmem pages are copied to the buffer, * because we can't return with a kmapped * highmem page (we may not be called again). */ void *kaddr; kaddr = kmap_atomic(page); copy_page(buffer, kaddr); kunmap_atomic(kaddr); handle->buffer = buffer; } else { handle->buffer = page_address(page); } } handle->cur++; return PAGE_SIZE; } static void duplicate_memory_bitmap(struct memory_bitmap *dst, struct memory_bitmap *src) { unsigned long pfn; memory_bm_position_reset(src); pfn = memory_bm_next_pfn(src); while (pfn != BM_END_OF_MAP) { memory_bm_set_bit(dst, pfn); pfn = memory_bm_next_pfn(src); } } /** * mark_unsafe_pages - Mark pages that were used before hibernation. * * Mark the pages that cannot be used for storing the image during restoration, * because they conflict with the pages that had been used before hibernation. 
*/ static void mark_unsafe_pages(struct memory_bitmap *bm) { unsigned long pfn; /* Clear the "free"/"unsafe" bit for all PFNs */ memory_bm_position_reset(free_pages_map); pfn = memory_bm_next_pfn(free_pages_map); while (pfn != BM_END_OF_MAP) { memory_bm_clear_current(free_pages_map); pfn = memory_bm_next_pfn(free_pages_map); } /* Mark pages that correspond to the "original" PFNs as "unsafe" */ duplicate_memory_bitmap(free_pages_map, bm); allocated_unsafe_pages = 0; } static int check_header(struct swsusp_info *info) { const char *reason; reason = check_image_kernel(info); if (!reason && info->num_physpages != get_num_physpages()) reason = "memory size"; if (reason) { pr_err("Image mismatch: %s\n", reason); return -EPERM; } return 0; } /** * load_header - Check the image header and copy the data from it. */ static int load_header(struct swsusp_info *info) { int error; restore_pblist = NULL; error = check_header(info); if (!error) { nr_copy_pages = info->image_pages; nr_meta_pages = info->pages - info->image_pages - 1; } return error; } /** * unpack_orig_pfns - Set bits corresponding to given PFNs in a memory bitmap. * @bm: Memory bitmap. * @buf: Area of memory containing the PFNs. * @zero_bm: Memory bitmap with the zero PFNs marked. * * For each element of the array pointed to by @buf (1 page at a time), set the * corresponding bit in @bm. If the page was originally populated with only * zeros then a corresponding bit will also be set in @zero_bm. */ static int unpack_orig_pfns(unsigned long *buf, struct memory_bitmap *bm, struct memory_bitmap *zero_bm) { unsigned long decoded_pfn; bool zero; int j; for (j = 0; j < PAGE_SIZE / sizeof(long); j++) { if (unlikely(buf[j] == BM_END_OF_MAP)) break; zero = !!(buf[j] & ENCODED_PFN_ZERO_FLAG); decoded_pfn = buf[j] & ENCODED_PFN_MASK; if (pfn_valid(decoded_pfn) && memory_bm_pfn_present(bm, decoded_pfn)) { memory_bm_set_bit(bm, decoded_pfn); if (zero) { memory_bm_set_bit(zero_bm, decoded_pfn); nr_zero_pages++; } } else { if (!pfn_valid(decoded_pfn)) pr_err(FW_BUG "Memory map mismatch at 0x%llx after hibernation\n", (unsigned long long)PFN_PHYS(decoded_pfn)); return -EFAULT; } } return 0; } #ifdef CONFIG_HIGHMEM /* * struct highmem_pbe is used for creating the list of highmem pages that * should be restored atomically during the resume from disk, because the page * frames they have occupied before the suspend are in use. */ struct highmem_pbe { struct page *copy_page; /* data is here now */ struct page *orig_page; /* data was here before the suspend */ struct highmem_pbe *next; }; /* * List of highmem PBEs needed for restoring the highmem pages that were * allocated before the suspend and included in the suspend image, but have * also been allocated by the "resume" kernel, so their contents cannot be * written directly to their "original" page frames. */ static struct highmem_pbe *highmem_pblist; /** * count_highmem_image_pages - Compute the number of highmem pages in the image. * @bm: Memory bitmap. * * The bits in @bm that correspond to image pages are assumed to be set. */ static unsigned int count_highmem_image_pages(struct memory_bitmap *bm) { unsigned long pfn; unsigned int cnt = 0; memory_bm_position_reset(bm); pfn = memory_bm_next_pfn(bm); while (pfn != BM_END_OF_MAP) { if (PageHighMem(pfn_to_page(pfn))) cnt++; pfn = memory_bm_next_pfn(bm); } return cnt; } static unsigned int safe_highmem_pages; static struct memory_bitmap *safe_highmem_bm; /** * prepare_highmem_image - Allocate memory for loading highmem data from image. 
* @bm: Pointer to an uninitialized memory bitmap structure. * @nr_highmem_p: Pointer to the number of highmem image pages. * * Try to allocate as many highmem pages as there are highmem image pages * (@nr_highmem_p points to the variable containing the number of highmem image * pages). The pages that are "safe" (ie. will not be overwritten when the * hibernation image is restored entirely) have the corresponding bits set in * @bm (it must be uninitialized). * * NOTE: This function should not be called if there are no highmem image pages. */ static int prepare_highmem_image(struct memory_bitmap *bm, unsigned int *nr_highmem_p) { unsigned int to_alloc; if (memory_bm_create(bm, GFP_ATOMIC, PG_SAFE)) return -ENOMEM; if (get_highmem_buffer(PG_SAFE)) return -ENOMEM; to_alloc = count_free_highmem_pages(); if (to_alloc > *nr_highmem_p) to_alloc = *nr_highmem_p; else *nr_highmem_p = to_alloc; safe_highmem_pages = 0; while (to_alloc-- > 0) { struct page *page; page = alloc_page(__GFP_HIGHMEM); if (!swsusp_page_is_free(page)) { /* The page is "safe", set its bit the bitmap */ memory_bm_set_bit(bm, page_to_pfn(page)); safe_highmem_pages++; } /* Mark the page as allocated */ swsusp_set_page_forbidden(page); swsusp_set_page_free(page); } memory_bm_position_reset(bm); safe_highmem_bm = bm; return 0; } static struct page *last_highmem_page; /** * get_highmem_page_buffer - Prepare a buffer to store a highmem image page. * * For a given highmem image page get a buffer that suspend_write_next() should * return to its caller to write to. * * If the page is to be saved to its "original" page frame or a copy of * the page is to be made in the highmem, @buffer is returned. Otherwise, * the copy of the page is to be made in normal memory, so the address of * the copy is returned. * * If @buffer is returned, the caller of suspend_write_next() will write * the page's contents to @buffer, so they will have to be copied to the * right location on the next call to suspend_write_next() and it is done * with the help of copy_last_highmem_page(). For this purpose, if * @buffer is returned, @last_highmem_page is set to the page to which * the data will have to be copied from @buffer. */ static void *get_highmem_page_buffer(struct page *page, struct chain_allocator *ca) { struct highmem_pbe *pbe; void *kaddr; if (swsusp_page_is_forbidden(page) && swsusp_page_is_free(page)) { /* * We have allocated the "original" page frame and we can * use it directly to store the loaded page. */ last_highmem_page = page; return buffer; } /* * The "original" page frame has not been allocated and we have to * use a "safe" page frame to store the loaded page. */ pbe = chain_alloc(ca, sizeof(struct highmem_pbe)); if (!pbe) { swsusp_free(); return ERR_PTR(-ENOMEM); } pbe->orig_page = page; if (safe_highmem_pages > 0) { struct page *tmp; /* Copy of the page will be stored in high memory */ kaddr = buffer; tmp = pfn_to_page(memory_bm_next_pfn(safe_highmem_bm)); safe_highmem_pages--; last_highmem_page = tmp; pbe->copy_page = tmp; } else { /* Copy of the page will be stored in normal memory */ kaddr = __get_safe_page(ca->gfp_mask); if (!kaddr) return ERR_PTR(-ENOMEM); pbe->copy_page = virt_to_page(kaddr); } pbe->next = highmem_pblist; highmem_pblist = pbe; return kaddr; } /** * copy_last_highmem_page - Copy most the most recent highmem image page. * * Copy the contents of a highmem image from @buffer, where the caller of * snapshot_write_next() has stored them, to the right location represented by * @last_highmem_page . 
*/ static void copy_last_highmem_page(void) { if (last_highmem_page) { void *dst; dst = kmap_atomic(last_highmem_page); copy_page(dst, buffer); kunmap_atomic(dst); last_highmem_page = NULL; } } static inline int last_highmem_page_copied(void) { return !last_highmem_page; } static inline void free_highmem_data(void) { if (safe_highmem_bm) memory_bm_free(safe_highmem_bm, PG_UNSAFE_CLEAR); if (buffer) free_image_page(buffer, PG_UNSAFE_CLEAR); } #else static unsigned int count_highmem_image_pages(struct memory_bitmap *bm) { return 0; } static inline int prepare_highmem_image(struct memory_bitmap *bm, unsigned int *nr_highmem_p) { return 0; } static inline void *get_highmem_page_buffer(struct page *page, struct chain_allocator *ca) { return ERR_PTR(-EINVAL); } static inline void copy_last_highmem_page(void) {} static inline int last_highmem_page_copied(void) { return 1; } static inline void free_highmem_data(void) {} #endif /* CONFIG_HIGHMEM */ #define PBES_PER_LINKED_PAGE (LINKED_PAGE_DATA_SIZE / sizeof(struct pbe)) /** * prepare_image - Make room for loading hibernation image. * @new_bm: Uninitialized memory bitmap structure. * @bm: Memory bitmap with unsafe pages marked. * @zero_bm: Memory bitmap containing the zero pages. * * Use @bm to mark the pages that will be overwritten in the process of * restoring the system memory state from the suspend image ("unsafe" pages) * and allocate memory for the image. * * The idea is to allocate a new memory bitmap first and then allocate * as many pages as needed for image data, but without specifying what those * pages will be used for just yet. Instead, we mark them all as allocated and * create a lists of "safe" pages to be used later. On systems with high * memory a list of "safe" highmem pages is created too. * * Because it was not known which pages were unsafe when @zero_bm was created, * make a copy of it and recreate it within safe pages. */ static int prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm, struct memory_bitmap *zero_bm) { unsigned int nr_pages, nr_highmem; struct memory_bitmap tmp; struct linked_page *lp; int error; /* If there is no highmem, the buffer will not be necessary */ free_image_page(buffer, PG_UNSAFE_CLEAR); buffer = NULL; nr_highmem = count_highmem_image_pages(bm); mark_unsafe_pages(bm); error = memory_bm_create(new_bm, GFP_ATOMIC, PG_SAFE); if (error) goto Free; duplicate_memory_bitmap(new_bm, bm); memory_bm_free(bm, PG_UNSAFE_KEEP); /* Make a copy of zero_bm so it can be created in safe pages */ error = memory_bm_create(&tmp, GFP_ATOMIC, PG_SAFE); if (error) goto Free; duplicate_memory_bitmap(&tmp, zero_bm); memory_bm_free(zero_bm, PG_UNSAFE_KEEP); /* Recreate zero_bm in safe pages */ error = memory_bm_create(zero_bm, GFP_ATOMIC, PG_SAFE); if (error) goto Free; duplicate_memory_bitmap(zero_bm, &tmp); memory_bm_free(&tmp, PG_UNSAFE_CLEAR); /* At this point zero_bm is in safe pages and it can be used for restoring. */ if (nr_highmem > 0) { error = prepare_highmem_image(bm, &nr_highmem); if (error) goto Free; } /* * Reserve some safe pages for potential later use. * * NOTE: This way we make sure there will be enough safe pages for the * chain_alloc() in get_buffer(). It is a bit wasteful, but * nr_copy_pages cannot be greater than 50% of the memory anyway. * * nr_copy_pages cannot be less than allocated_unsafe_pages too. 
*/ nr_pages = (nr_zero_pages + nr_copy_pages) - nr_highmem - allocated_unsafe_pages; nr_pages = DIV_ROUND_UP(nr_pages, PBES_PER_LINKED_PAGE); while (nr_pages > 0) { lp = get_image_page(GFP_ATOMIC, PG_SAFE); if (!lp) { error = -ENOMEM; goto Free; } lp->next = safe_pages_list; safe_pages_list = lp; nr_pages--; } /* Preallocate memory for the image */ nr_pages = (nr_zero_pages + nr_copy_pages) - nr_highmem - allocated_unsafe_pages; while (nr_pages > 0) { lp = (struct linked_page *)get_zeroed_page(GFP_ATOMIC); if (!lp) { error = -ENOMEM; goto Free; } if (!swsusp_page_is_free(virt_to_page(lp))) { /* The page is "safe", add it to the list */ lp->next = safe_pages_list; safe_pages_list = lp; } /* Mark the page as allocated */ swsusp_set_page_forbidden(virt_to_page(lp)); swsusp_set_page_free(virt_to_page(lp)); nr_pages--; } return 0; Free: swsusp_free(); return error; } /** * get_buffer - Get the address to store the next image data page. * * Get the address that snapshot_write_next() should return to its caller to * write to. */ static void *get_buffer(struct memory_bitmap *bm, struct chain_allocator *ca) { struct pbe *pbe; struct page *page; unsigned long pfn = memory_bm_next_pfn(bm); if (pfn == BM_END_OF_MAP) return ERR_PTR(-EFAULT); page = pfn_to_page(pfn); if (PageHighMem(page)) return get_highmem_page_buffer(page, ca); if (swsusp_page_is_forbidden(page) && swsusp_page_is_free(page)) /* * We have allocated the "original" page frame and we can * use it directly to store the loaded page. */ return page_address(page); /* * The "original" page frame has not been allocated and we have to * use a "safe" page frame to store the loaded page. */ pbe = chain_alloc(ca, sizeof(struct pbe)); if (!pbe) { swsusp_free(); return ERR_PTR(-ENOMEM); } pbe->orig_address = page_address(page); pbe->address = __get_safe_page(ca->gfp_mask); if (!pbe->address) return ERR_PTR(-ENOMEM); pbe->next = restore_pblist; restore_pblist = pbe; return pbe->address; } /** * snapshot_write_next - Get the address to store the next image page. * @handle: Snapshot handle structure to guide the writing. * * On the first call, @handle should point to a zeroed snapshot_handle * structure. The structure gets populated then and a pointer to it should be * passed to this function every next time. * * On success, the function returns a positive number. Then, the caller * is allowed to write up to the returned number of bytes to the memory * location computed by the data_of() macro. * * The function returns 0 to indicate the "end of file" condition. Negative * numbers are returned on errors, in which cases the structure pointed to by * @handle is not updated and should not be used any more. 
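 *
 * A caller loop might look roughly like this (sketch only; read_page()
 * stands in for whatever actually fetches the image data):
 *
 *	while ((n = snapshot_write_next(&handle)) > 0) {
 *		error = read_page(data_of(handle), n);
 *		if (error)
 *			break;
 *	}
 *	if (!error)
 *		error = snapshot_write_finalize(&handle);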
*/ int snapshot_write_next(struct snapshot_handle *handle) { static struct chain_allocator ca; int error; next: /* Check if we have already loaded the entire image */ if (handle->cur > 1 && handle->cur > nr_meta_pages + nr_copy_pages + nr_zero_pages) return 0; if (!handle->cur) { if (!buffer) /* This makes the buffer be freed by swsusp_free() */ buffer = get_image_page(GFP_ATOMIC, PG_ANY); if (!buffer) return -ENOMEM; handle->buffer = buffer; } else if (handle->cur == 1) { error = load_header(buffer); if (error) return error; safe_pages_list = NULL; error = memory_bm_create(©_bm, GFP_ATOMIC, PG_ANY); if (error) return error; error = memory_bm_create(&zero_bm, GFP_ATOMIC, PG_ANY); if (error) return error; nr_zero_pages = 0; hibernate_restore_protection_begin(); } else if (handle->cur <= nr_meta_pages + 1) { error = unpack_orig_pfns(buffer, ©_bm, &zero_bm); if (error) return error; if (handle->cur == nr_meta_pages + 1) { error = prepare_image(&orig_bm, ©_bm, &zero_bm); if (error) return error; chain_init(&ca, GFP_ATOMIC, PG_SAFE); memory_bm_position_reset(&orig_bm); memory_bm_position_reset(&zero_bm); restore_pblist = NULL; handle->buffer = get_buffer(&orig_bm, &ca); if (IS_ERR(handle->buffer)) return PTR_ERR(handle->buffer); } } else { copy_last_highmem_page(); error = hibernate_restore_protect_page(handle->buffer); if (error) return error; handle->buffer = get_buffer(&orig_bm, &ca); if (IS_ERR(handle->buffer)) return PTR_ERR(handle->buffer); } handle->sync_read = (handle->buffer == buffer); handle->cur++; /* Zero pages were not included in the image, memset it and move on. */ if (handle->cur > nr_meta_pages + 1 && memory_bm_test_bit(&zero_bm, memory_bm_get_current(&orig_bm))) { memset(handle->buffer, 0, PAGE_SIZE); goto next; } return PAGE_SIZE; } /** * snapshot_write_finalize - Complete the loading of a hibernation image. * * Must be called after the last call to snapshot_write_next() in case the last * page in the image happens to be a highmem page and its contents should be * stored in highmem. Additionally, it recycles bitmap memory that's not * necessary any more. */ int snapshot_write_finalize(struct snapshot_handle *handle) { int error; copy_last_highmem_page(); error = hibernate_restore_protect_page(handle->buffer); /* Do that only if we have loaded the image entirely */ if (handle->cur > 1 && handle->cur > nr_meta_pages + nr_copy_pages + nr_zero_pages) { memory_bm_recycle(&orig_bm); free_highmem_data(); } return error; } int snapshot_image_loaded(struct snapshot_handle *handle) { return !(!nr_copy_pages || !last_highmem_page_copied() || handle->cur <= nr_meta_pages + nr_copy_pages + nr_zero_pages); } #ifdef CONFIG_HIGHMEM /* Assumes that @buf is ready and points to a "safe" page */ static inline void swap_two_pages_data(struct page *p1, struct page *p2, void *buf) { void *kaddr1, *kaddr2; kaddr1 = kmap_atomic(p1); kaddr2 = kmap_atomic(p2); copy_page(buf, kaddr1); copy_page(kaddr1, kaddr2); copy_page(kaddr2, buf); kunmap_atomic(kaddr2); kunmap_atomic(kaddr1); } /** * restore_highmem - Put highmem image pages into their original locations. * * For each highmem page that was in use before hibernation and is included in * the image, and also has been allocated by the "restore" kernel, swap its * current contents with the previous (ie. "before hibernation") ones. * * If the restore eventually fails, we can call this function once again and * restore the highmem state as seen by the restore kernel. 
*/ int restore_highmem(void) { struct highmem_pbe *pbe = highmem_pblist; void *buf; if (!pbe) return 0; buf = get_image_page(GFP_ATOMIC, PG_SAFE); if (!buf) return -ENOMEM; while (pbe) { swap_two_pages_data(pbe->copy_page, pbe->orig_page, buf); pbe = pbe->next; } free_image_page(buf, PG_UNSAFE_CLEAR); return 0; } #endif /* CONFIG_HIGHMEM */
882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 | // SPDX-License-Identifier: GPL-2.0 /* * Performance events ring-buffer code: * * Copyright (C) 2008 Thomas Gleixner <tglx@linutronix.de> * Copyright (C) 2008-2011 Red Hat, Inc., Ingo Molnar * Copyright (C) 2008-2011 Red Hat, Inc., Peter Zijlstra * Copyright © 2009 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com> */ #include <linux/perf_event.h> #include <linux/vmalloc.h> #include <linux/slab.h> #include <linux/circ_buf.h> #include <linux/poll.h> #include <linux/nospec.h> #include "internal.h" static void perf_output_wakeup(struct perf_output_handle *handle) { atomic_set(&handle->rb->poll, EPOLLIN); handle->event->pending_wakeup = 1; if (*perf_event_fasync(handle->event) && !handle->event->pending_kill) handle->event->pending_kill = POLL_IN; irq_work_queue(&handle->event->pending_irq); } /* * We need to ensure a later event_id doesn't publish a head when a former * event isn't done writing. However since we need to deal with NMIs we * cannot fully serialize things. * * We only publish the head (and generate a wakeup) when the outer-most * event completes. */ static void perf_output_get_handle(struct perf_output_handle *handle) { struct perf_buffer *rb = handle->rb; preempt_disable(); /* * Avoid an explicit LOAD/STORE such that architectures with memops * can use them. */ (*(volatile unsigned int *)&rb->nest)++; handle->wakeup = local_read(&rb->wakeup); } static void perf_output_put_handle(struct perf_output_handle *handle) { struct perf_buffer *rb = handle->rb; unsigned long head; unsigned int nest; /* * If this isn't the outermost nesting, we don't have to update * @rb->user_page->data_head. */ nest = READ_ONCE(rb->nest); if (nest > 1) { WRITE_ONCE(rb->nest, nest - 1); goto out; } again: /* * In order to avoid publishing a head value that goes backwards, * we must ensure the load of @rb->head happens after we've * incremented @rb->nest. * * Otherwise we can observe a @rb->head value before one published * by an IRQ/NMI happening between the load and the increment. */ barrier(); head = local_read(&rb->head); /* * IRQ/NMI can happen here and advance @rb->head, causing our * load above to be stale. */ /* * Since the mmap() consumer (userspace) can run on a different CPU: * * kernel user * * if (LOAD ->data_tail) { LOAD ->data_head * (A) smp_rmb() (C) * STORE $data LOAD $data * smp_wmb() (B) smp_mb() (D) * STORE ->data_head STORE ->data_tail * } * * Where A pairs with D, and B pairs with C. * * In our case (A) is a control dependency that separates the load of * the ->data_tail and the stores of $data. In case ->data_tail * indicates there is no room in the buffer to store $data we do not. * * D needs to be a full barrier since it separates the data READ * from the tail WRITE. * * For B a WMB is sufficient since it separates two WRITEs, and for C * an RMB is sufficient since it separates two READs. * * See perf_output_begin(). */ smp_wmb(); /* B, matches C */ WRITE_ONCE(rb->user_page->data_head, head); /* * We must publish the head before decrementing the nest count, * otherwise an IRQ/NMI can publish a more recent head value and our * write will (temporarily) publish a stale value. 
*/ barrier(); WRITE_ONCE(rb->nest, 0); /* * Ensure we decrement @rb->nest before we validate the @rb->head. * Otherwise we cannot be sure we caught the 'last' nested update. */ barrier(); if (unlikely(head != local_read(&rb->head))) { WRITE_ONCE(rb->nest, 1); goto again; } if (handle->wakeup != local_read(&rb->wakeup)) perf_output_wakeup(handle); out: preempt_enable(); } static __always_inline bool ring_buffer_has_space(unsigned long head, unsigned long tail, unsigned long data_size, unsigned int size, bool backward) { if (!backward) return CIRC_SPACE(head, tail, data_size) >= size; else return CIRC_SPACE(tail, head, data_size) >= size; } static __always_inline int __perf_output_begin(struct perf_output_handle *handle, struct perf_sample_data *data, struct perf_event *event, unsigned int size, bool backward) { struct perf_buffer *rb; unsigned long tail, offset, head; int have_lost, page_shift; struct { struct perf_event_header header; u64 id; u64 lost; } lost_event; rcu_read_lock(); /* * For inherited events we send all the output towards the parent. */ if (event->parent) event = event->parent; rb = rcu_dereference(event->rb); if (unlikely(!rb)) goto out; if (unlikely(rb->paused)) { if (rb->nr_pages) { local_inc(&rb->lost); atomic64_inc(&event->lost_samples); } goto out; } handle->rb = rb; handle->event = event; have_lost = local_read(&rb->lost); if (unlikely(have_lost)) { size += sizeof(lost_event); if (event->attr.sample_id_all) size += event->id_header_size; } perf_output_get_handle(handle); offset = local_read(&rb->head); do { head = offset; tail = READ_ONCE(rb->user_page->data_tail); if (!rb->overwrite) { if (unlikely(!ring_buffer_has_space(head, tail, perf_data_size(rb), size, backward))) goto fail; } /* * The above forms a control dependency barrier separating the * @tail load above from the data stores below. Since the @tail * load is required to compute the branch to fail below. * * A, matches D; the full memory barrier userspace SHOULD issue * after reading the data and before storing the new tail * position. * * See perf_output_put_handle(). */ if (!backward) head += size; else head -= size; } while (!local_try_cmpxchg(&rb->head, &offset, head)); if (backward) { offset = head; head = (u64)(-head); } /* * We rely on the implied barrier() by local_cmpxchg() to ensure * none of the data stores below can be lifted up by the compiler. 
*/ if (unlikely(head - local_read(&rb->wakeup) > rb->watermark)) local_add(rb->watermark, &rb->wakeup); page_shift = PAGE_SHIFT + page_order(rb); handle->page = (offset >> page_shift) & (rb->nr_pages - 1); offset &= (1UL << page_shift) - 1; handle->addr = rb->data_pages[handle->page] + offset; handle->size = (1UL << page_shift) - offset; if (unlikely(have_lost)) { lost_event.header.size = sizeof(lost_event); lost_event.header.type = PERF_RECORD_LOST; lost_event.header.misc = 0; lost_event.id = event->id; lost_event.lost = local_xchg(&rb->lost, 0); /* XXX mostly redundant; @data is already fully initializes */ perf_event_header__init_id(&lost_event.header, data, event); perf_output_put(handle, lost_event); perf_event__output_id_sample(event, handle, data); } return 0; fail: local_inc(&rb->lost); atomic64_inc(&event->lost_samples); perf_output_put_handle(handle); out: rcu_read_unlock(); return -ENOSPC; } int perf_output_begin_forward(struct perf_output_handle *handle, struct perf_sample_data *data, struct perf_event *event, unsigned int size) { return __perf_output_begin(handle, data, event, size, false); } int perf_output_begin_backward(struct perf_output_handle *handle, struct perf_sample_data *data, struct perf_event *event, unsigned int size) { return __perf_output_begin(handle, data, event, size, true); } int perf_output_begin(struct perf_output_handle *handle, struct perf_sample_data *data, struct perf_event *event, unsigned int size) { return __perf_output_begin(handle, data, event, size, unlikely(is_write_backward(event))); } unsigned int perf_output_copy(struct perf_output_handle *handle, const void *buf, unsigned int len) { return __output_copy(handle, buf, len); } unsigned int perf_output_skip(struct perf_output_handle *handle, unsigned int len) { return __output_skip(handle, NULL, len); } void perf_output_end(struct perf_output_handle *handle) { perf_output_put_handle(handle); rcu_read_unlock(); } static void ring_buffer_init(struct perf_buffer *rb, long watermark, int flags) { long max_size = perf_data_size(rb); if (watermark) rb->watermark = min(max_size, watermark); if (!rb->watermark) rb->watermark = max_size / 2; if (flags & RING_BUFFER_WRITABLE) rb->overwrite = 0; else rb->overwrite = 1; refcount_set(&rb->refcount, 1); INIT_LIST_HEAD(&rb->event_list); spin_lock_init(&rb->event_lock); /* * perf_output_begin() only checks rb->paused, therefore * rb->paused must be true if we have no pages for output. */ if (!rb->nr_pages) rb->paused = 1; mutex_init(&rb->aux_mutex); } void perf_aux_output_flag(struct perf_output_handle *handle, u64 flags) { /* * OVERWRITE is determined by perf_aux_output_end() and can't * be passed in directly. */ if (WARN_ON_ONCE(flags & PERF_AUX_FLAG_OVERWRITE)) return; handle->aux_flags |= flags; } EXPORT_SYMBOL_GPL(perf_aux_output_flag); /* * This is called before hardware starts writing to the AUX area to * obtain an output handle and make sure there's room in the buffer. * When the capture completes, call perf_aux_output_end() to commit * the recorded data to the buffer. * * The ordering is similar to that of perf_output_{begin,end}, with * the exception of (B), which should be taken care of by the pmu * driver, since ordering rules will differ depending on hardware. * * Call this from pmu::start(); see the comment in perf_aux_output_end() * about its use in pmu callbacks. Both can also be called from the PMI * handler if needed. 
*/ void *perf_aux_output_begin(struct perf_output_handle *handle, struct perf_event *event) { struct perf_event *output_event = event; unsigned long aux_head, aux_tail; struct perf_buffer *rb; unsigned int nest; if (output_event->parent) output_event = output_event->parent; /* * Since this will typically be open across pmu::add/pmu::del, we * grab ring_buffer's refcount instead of holding rcu read lock * to make sure it doesn't disappear under us. */ rb = ring_buffer_get(output_event); if (!rb) return NULL; if (!rb_has_aux(rb)) goto err; /* * If aux_mmap_count is zero, the aux buffer is in perf_mmap_close(), * about to get freed, so we leave immediately. * * Checking rb::aux_mmap_count and rb::refcount has to be done in * the same order, see perf_mmap_close. Otherwise we end up freeing * aux pages in this path, which is a bug, because in_atomic(). */ if (!atomic_read(&rb->aux_mmap_count)) goto err; if (!refcount_inc_not_zero(&rb->aux_refcount)) goto err; nest = READ_ONCE(rb->aux_nest); /* * Nesting is not supported for AUX area, make sure nested * writers are caught early */ if (WARN_ON_ONCE(nest)) goto err_put; WRITE_ONCE(rb->aux_nest, nest + 1); aux_head = rb->aux_head; handle->rb = rb; handle->event = event; handle->head = aux_head; handle->size = 0; handle->aux_flags = 0; /* * In overwrite mode, AUX data stores do not depend on aux_tail, * therefore (A) control dependency barrier does not exist. The * (B) <-> (C) ordering is still observed by the pmu driver. */ if (!rb->aux_overwrite) { aux_tail = READ_ONCE(rb->user_page->aux_tail); handle->wakeup = rb->aux_wakeup + rb->aux_watermark; if (aux_head - aux_tail < perf_aux_size(rb)) handle->size = CIRC_SPACE(aux_head, aux_tail, perf_aux_size(rb)); /* * handle->size computation depends on aux_tail load; this forms a * control dependency barrier separating aux_tail load from aux data * store that will be enabled on successful return */ if (!handle->size) { /* A, matches D */ event->pending_disable = smp_processor_id(); perf_output_wakeup(handle); WRITE_ONCE(rb->aux_nest, 0); goto err_put; } } return handle->rb->aux_priv; err_put: /* can't be last */ rb_free_aux(rb); err: ring_buffer_put(rb); handle->event = NULL; return NULL; } EXPORT_SYMBOL_GPL(perf_aux_output_begin); static __always_inline bool rb_need_aux_wakeup(struct perf_buffer *rb) { if (rb->aux_overwrite) return false; if (rb->aux_head - rb->aux_wakeup >= rb->aux_watermark) { rb->aux_wakeup = rounddown(rb->aux_head, rb->aux_watermark); return true; } return false; } /* * Commit the data written by hardware into the ring buffer by adjusting * aux_head and posting a PERF_RECORD_AUX into the perf buffer. It is the * pmu driver's responsibility to observe ordering rules of the hardware, * so that all the data is externally visible before this is called. * * Note: this has to be called from pmu::stop() callback, as the assumption * of the AUX buffer management code is that after pmu::stop(), the AUX * transaction must be stopped and therefore drop the AUX reference count. 
*/ void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size) { bool wakeup = !!(handle->aux_flags & PERF_AUX_FLAG_TRUNCATED); struct perf_buffer *rb = handle->rb; unsigned long aux_head; /* in overwrite mode, driver provides aux_head via handle */ if (rb->aux_overwrite) { handle->aux_flags |= PERF_AUX_FLAG_OVERWRITE; aux_head = handle->head; rb->aux_head = aux_head; } else { handle->aux_flags &= ~PERF_AUX_FLAG_OVERWRITE; aux_head = rb->aux_head; rb->aux_head += size; } /* * Only send RECORD_AUX if we have something useful to communicate * * Note: the OVERWRITE records by themselves are not considered * useful, as they don't communicate any *new* information, * aside from the short-lived offset, that becomes history at * the next event sched-in and therefore isn't useful. * The userspace that needs to copy out AUX data in overwrite * mode should know to use user_page::aux_head for the actual * offset. So, from now on we don't output AUX records that * have *only* OVERWRITE flag set. */ if (size || (handle->aux_flags & ~(u64)PERF_AUX_FLAG_OVERWRITE)) perf_event_aux_event(handle->event, aux_head, size, handle->aux_flags); WRITE_ONCE(rb->user_page->aux_head, rb->aux_head); if (rb_need_aux_wakeup(rb)) wakeup = true; if (wakeup) { if (handle->aux_flags & PERF_AUX_FLAG_TRUNCATED) handle->event->pending_disable = smp_processor_id(); perf_output_wakeup(handle); } handle->event = NULL; WRITE_ONCE(rb->aux_nest, 0); /* can't be last */ rb_free_aux(rb); ring_buffer_put(rb); } EXPORT_SYMBOL_GPL(perf_aux_output_end); /* * Skip over a given number of bytes in the AUX buffer, due to, for example, * hardware's alignment constraints. */ int perf_aux_output_skip(struct perf_output_handle *handle, unsigned long size) { struct perf_buffer *rb = handle->rb; if (size > handle->size) return -ENOSPC; rb->aux_head += size; WRITE_ONCE(rb->user_page->aux_head, rb->aux_head); if (rb_need_aux_wakeup(rb)) { perf_output_wakeup(handle); handle->wakeup = rb->aux_wakeup + rb->aux_watermark; } handle->head = rb->aux_head; handle->size -= size; return 0; } EXPORT_SYMBOL_GPL(perf_aux_output_skip); void *perf_get_aux(struct perf_output_handle *handle) { /* this is only valid between perf_aux_output_begin and *_end */ if (!handle->event) return NULL; return handle->rb->aux_priv; } EXPORT_SYMBOL_GPL(perf_get_aux); /* * Copy out AUX data from an AUX handle. 
*/ long perf_output_copy_aux(struct perf_output_handle *aux_handle, struct perf_output_handle *handle, unsigned long from, unsigned long to) { struct perf_buffer *rb = aux_handle->rb; unsigned long tocopy, remainder, len = 0; void *addr; from &= (rb->aux_nr_pages << PAGE_SHIFT) - 1; to &= (rb->aux_nr_pages << PAGE_SHIFT) - 1; do { tocopy = PAGE_SIZE - offset_in_page(from); if (to > from) tocopy = min(tocopy, to - from); if (!tocopy) break; addr = rb->aux_pages[from >> PAGE_SHIFT]; addr += offset_in_page(from); remainder = perf_output_copy(handle, addr, tocopy); if (remainder) return -EFAULT; len += tocopy; from += tocopy; from &= (rb->aux_nr_pages << PAGE_SHIFT) - 1; } while (to != from); return len; } #define PERF_AUX_GFP (GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN | __GFP_NORETRY) static struct page *rb_alloc_aux_page(int node, int order) { struct page *page; if (order > MAX_PAGE_ORDER) order = MAX_PAGE_ORDER; do { page = alloc_pages_node(node, PERF_AUX_GFP, order); } while (!page && order--); if (page && order) { /* * Communicate the allocation size to the driver: * if we managed to secure a high-order allocation, * set its first page's private to this order; * !PagePrivate(page) means it's just a normal page. */ split_page(page, order); SetPagePrivate(page); set_page_private(page, order); } return page; } static void rb_free_aux_page(struct perf_buffer *rb, int idx) { struct page *page = virt_to_page(rb->aux_pages[idx]); ClearPagePrivate(page); __free_page(page); } static void __rb_free_aux(struct perf_buffer *rb) { int pg; /* * Should never happen, the last reference should be dropped from * perf_mmap_close() path, which first stops aux transactions (which * in turn are the atomic holders of aux_refcount) and then does the * last rb_free_aux(). */ WARN_ON_ONCE(in_atomic()); if (rb->aux_priv) { rb->free_aux(rb->aux_priv); rb->free_aux = NULL; rb->aux_priv = NULL; } if (rb->aux_nr_pages) { for (pg = 0; pg < rb->aux_nr_pages; pg++) rb_free_aux_page(rb, pg); kfree(rb->aux_pages); rb->aux_nr_pages = 0; } } int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event, pgoff_t pgoff, int nr_pages, long watermark, int flags) { bool overwrite = !(flags & RING_BUFFER_WRITABLE); int node = (event->cpu == -1) ? -1 : cpu_to_node(event->cpu); int ret = -ENOMEM, max_order; if (!has_aux(event)) return -EOPNOTSUPP; if (nr_pages <= 0) return -EINVAL; if (!overwrite) { /* * Watermark defaults to half the buffer, and so does the * max_order, to aid PMU drivers in double buffering. */ if (!watermark) watermark = min_t(unsigned long, U32_MAX, (unsigned long)nr_pages << (PAGE_SHIFT - 1)); /* * Use aux_watermark as the basis for chunking to * help PMU drivers honor the watermark. */ max_order = get_order(watermark); } else { /* * We need to start with the max_order that fits in nr_pages, * not the other way around, hence ilog2() and not get_order. */ max_order = ilog2(nr_pages); watermark = 0; } /* * kcalloc_node() is unable to allocate buffer if the size is larger * than: PAGE_SIZE << MAX_PAGE_ORDER; directly bail out in this case. 
*/ if (get_order((unsigned long)nr_pages * sizeof(void *)) > MAX_PAGE_ORDER) return -ENOMEM; rb->aux_pages = kcalloc_node(nr_pages, sizeof(void *), GFP_KERNEL, node); if (!rb->aux_pages) return -ENOMEM; rb->free_aux = event->pmu->free_aux; for (rb->aux_nr_pages = 0; rb->aux_nr_pages < nr_pages;) { struct page *page; int last, order; order = min(max_order, ilog2(nr_pages - rb->aux_nr_pages)); page = rb_alloc_aux_page(node, order); if (!page) goto out; for (last = rb->aux_nr_pages + (1 << page_private(page)); last > rb->aux_nr_pages; rb->aux_nr_pages++) rb->aux_pages[rb->aux_nr_pages] = page_address(page++); } /* * In overwrite mode, PMUs that don't support SG may not handle more * than one contiguous allocation, since they rely on PMI to do double * buffering. In this case, the entire buffer has to be one contiguous * chunk. */ if ((event->pmu->capabilities & PERF_PMU_CAP_AUX_NO_SG) && overwrite) { struct page *page = virt_to_page(rb->aux_pages[0]); if (page_private(page) != max_order) goto out; } rb->aux_priv = event->pmu->setup_aux(event, rb->aux_pages, nr_pages, overwrite); if (!rb->aux_priv) goto out; ret = 0; /* * aux_pages (and pmu driver's private data, aux_priv) will be * referenced in both producer's and consumer's contexts, thus * we keep a refcount here to make sure either of the two can * reference them safely. */ refcount_set(&rb->aux_refcount, 1); rb->aux_overwrite = overwrite; rb->aux_watermark = watermark; out: if (!ret) rb->aux_pgoff = pgoff; else __rb_free_aux(rb); return ret; } void rb_free_aux(struct perf_buffer *rb) { if (refcount_dec_and_test(&rb->aux_refcount)) __rb_free_aux(rb); } #ifndef CONFIG_PERF_USE_VMALLOC /* * Back perf_mmap() with regular GFP_KERNEL-0 pages. */ static struct page * __perf_mmap_to_page(struct perf_buffer *rb, unsigned long pgoff) { if (pgoff > rb->nr_pages) return NULL; if (pgoff == 0) return virt_to_page(rb->user_page); return virt_to_page(rb->data_pages[pgoff - 1]); } static void *perf_mmap_alloc_page(int cpu) { struct page *page; int node; node = (cpu == -1) ? cpu : cpu_to_node(cpu); page = alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO, 0); if (!page) return NULL; return page_address(page); } static void perf_mmap_free_page(void *addr) { struct page *page = virt_to_page(addr); __free_page(page); } struct perf_buffer *rb_alloc(int nr_pages, long watermark, int cpu, int flags) { struct perf_buffer *rb; unsigned long size; int i, node; size = sizeof(struct perf_buffer); size += nr_pages * sizeof(void *); if (order_base_2(size) > PAGE_SHIFT+MAX_PAGE_ORDER) goto fail; node = (cpu == -1) ? cpu : cpu_to_node(cpu); rb = kzalloc_node(size, GFP_KERNEL, node); if (!rb) goto fail; rb->user_page = perf_mmap_alloc_page(cpu); if (!rb->user_page) goto fail_user_page; for (i = 0; i < nr_pages; i++) { rb->data_pages[i] = perf_mmap_alloc_page(cpu); if (!rb->data_pages[i]) goto fail_data_pages; } rb->nr_pages = nr_pages; ring_buffer_init(rb, watermark, flags); return rb; fail_data_pages: for (i--; i >= 0; i--) perf_mmap_free_page(rb->data_pages[i]); perf_mmap_free_page(rb->user_page); fail_user_page: kfree(rb); fail: return NULL; } void rb_free(struct perf_buffer *rb) { int i; perf_mmap_free_page(rb->user_page); for (i = 0; i < rb->nr_pages; i++) perf_mmap_free_page(rb->data_pages[i]); kfree(rb); } #else static struct page * __perf_mmap_to_page(struct perf_buffer *rb, unsigned long pgoff) { /* The '>' counts in the user page. 
*/ if (pgoff > data_page_nr(rb)) return NULL; return vmalloc_to_page((void *)rb->user_page + pgoff * PAGE_SIZE); } static void rb_free_work(struct work_struct *work) { struct perf_buffer *rb; rb = container_of(work, struct perf_buffer, work); vfree(rb->user_page); kfree(rb); } void rb_free(struct perf_buffer *rb) { schedule_work(&rb->work); } struct perf_buffer *rb_alloc(int nr_pages, long watermark, int cpu, int flags) { struct perf_buffer *rb; unsigned long size; void *all_buf; int node; size = sizeof(struct perf_buffer); size += sizeof(void *); node = (cpu == -1) ? cpu : cpu_to_node(cpu); rb = kzalloc_node(size, GFP_KERNEL, node); if (!rb) goto fail; INIT_WORK(&rb->work, rb_free_work); all_buf = vmalloc_user((nr_pages + 1) * PAGE_SIZE); if (!all_buf) goto fail_all_buf; rb->user_page = all_buf; rb->data_pages[0] = all_buf + PAGE_SIZE; if (nr_pages) { rb->nr_pages = 1; rb->page_order = ilog2(nr_pages); } ring_buffer_init(rb, watermark, flags); return rb; fail_all_buf: kfree(rb); fail: return NULL; } #endif struct page * perf_mmap_to_page(struct perf_buffer *rb, unsigned long pgoff) { if (rb->aux_nr_pages) { /* above AUX space */ if (pgoff > rb->aux_pgoff + rb->aux_nr_pages) return NULL; /* AUX space */ if (pgoff >= rb->aux_pgoff) { int aux_pgoff = array_index_nospec(pgoff - rb->aux_pgoff, rb->aux_nr_pages); return virt_to_page(rb->aux_pages[aux_pgoff]); } } return __perf_mmap_to_page(rb, pgoff); } |
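/*
 * Illustrative sketch only, not part of ring_buffer.c above: the user-space
 * side of the data_head/data_tail protocol that the comment in
 * perf_output_put_handle() labels (C) and (D).  read_mmap_records() and
 * process_record() are hypothetical names, record wrap-around handling is
 * omitted, data_size is assumed to be a power of two, and the GCC/Clang
 * __atomic builtins are just one way to obtain the required ordering.
 */
#include <stdint.h>
#include <linux/perf_event.h>

static void read_mmap_records(struct perf_event_mmap_page *pg,
			      unsigned char *data, uint64_t data_size,
			      void (*process_record)(const void *rec))
{
	/* (C): acquire-load of data_head; the data reads below cannot move above it */
	uint64_t head = __atomic_load_n(&pg->data_head, __ATOMIC_ACQUIRE);
	uint64_t tail = pg->data_tail;

	while (tail != head) {
		const struct perf_event_header *hdr =
			(const void *)(data + (tail & (data_size - 1)));

		process_record(hdr);
		tail += hdr->size;
	}

	/*
	 * (D): full barrier between reading the records and publishing the new
	 * tail, pairing with the control dependency (A) in __perf_output_begin().
	 */
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
	__atomic_store_n(&pg->data_tail, tail, __ATOMIC_RELAXED);
}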
810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 | // SPDX-License-Identifier: GPL-2.0-or-later /* SCTP kernel implementation * (C) Copyright IBM Corp. 2001, 2004 * Copyright (c) 1999-2000 Cisco, Inc. * Copyright (c) 1999-2001 Motorola, Inc. * * This file is part of the SCTP kernel implementation * * These functions handle output processing. * * Please send any bug reports or fixes you make to the * email address(es): * lksctp developers <linux-sctp@vger.kernel.org> * * Written or modified by: * La Monte H.P. Yarroll <piggy@acm.org> * Karl Knutson <karl@athena.chicago.il.us> * Jon Grimm <jgrimm@austin.ibm.com> * Sridhar Samudrala <sri@us.ibm.com> */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include <linux/types.h> #include <linux/kernel.h> #include <linux/wait.h> #include <linux/time.h> #include <linux/ip.h> #include <linux/ipv6.h> #include <linux/init.h> #include <linux/slab.h> #include <net/inet_ecn.h> #include <net/ip.h> #include <net/icmp.h> #include <net/net_namespace.h> #include <linux/socket.h> /* for sa_family_t */ #include <net/sock.h> #include <net/sctp/sctp.h> #include <net/sctp/sm.h> #include <net/sctp/checksum.h> /* Forward declarations for private helpers. */ static enum sctp_xmit __sctp_packet_append_chunk(struct sctp_packet *packet, struct sctp_chunk *chunk); static enum sctp_xmit sctp_packet_can_append_data(struct sctp_packet *packet, struct sctp_chunk *chunk); static void sctp_packet_append_data(struct sctp_packet *packet, struct sctp_chunk *chunk); static enum sctp_xmit sctp_packet_will_fit(struct sctp_packet *packet, struct sctp_chunk *chunk, u16 chunk_len); static void sctp_packet_reset(struct sctp_packet *packet) { /* sctp_packet_transmit() relies on this to reset size to the * current overhead after sending packets. */ packet->size = packet->overhead; packet->has_cookie_echo = 0; packet->has_sack = 0; packet->has_data = 0; packet->has_auth = 0; packet->ipfragok = 0; packet->auth = NULL; } /* Config a packet. * This appears to be a followup set of initializations. */ void sctp_packet_config(struct sctp_packet *packet, __u32 vtag, int ecn_capable) { struct sctp_transport *tp = packet->transport; struct sctp_association *asoc = tp->asoc; struct sctp_sock *sp = NULL; struct sock *sk; pr_debug("%s: packet:%p vtag:0x%x\n", __func__, packet, vtag); packet->vtag = vtag; /* do the following jobs only once for a flush schedule */ if (!sctp_packet_empty(packet)) return; /* set packet max_size with pathmtu, then calculate overhead */ packet->max_size = tp->pathmtu; if (asoc) { sk = asoc->base.sk; sp = sctp_sk(sk); } packet->overhead = sctp_mtu_payload(sp, 0, 0); packet->size = packet->overhead; if (!asoc) return; /* update dst or transport pathmtu if in need */ if (!sctp_transport_dst_check(tp)) { sctp_transport_route(tp, NULL, sp); if (asoc->param_flags & SPP_PMTUD_ENABLE) sctp_assoc_sync_pmtu(asoc); } else if (!sctp_transport_pl_enabled(tp) && asoc->param_flags & SPP_PMTUD_ENABLE) { if (!sctp_transport_pmtu_check(tp)) sctp_assoc_sync_pmtu(asoc); } if (asoc->pmtu_pending) { if (asoc->param_flags & SPP_PMTUD_ENABLE) sctp_assoc_sync_pmtu(asoc); asoc->pmtu_pending = 0; } /* If there a is a prepend chunk stick it on the list before * any other chunks get appended. 
*/ if (ecn_capable) { struct sctp_chunk *chunk = sctp_get_ecne_prepend(asoc); if (chunk) sctp_packet_append_chunk(packet, chunk); } if (!tp->dst) return; /* set packet max_size with gso_max_size if gso is enabled*/ rcu_read_lock(); if (__sk_dst_get(sk) != tp->dst) { dst_hold(tp->dst); sk_setup_caps(sk, tp->dst); } packet->max_size = sk_can_gso(sk) ? min(READ_ONCE(tp->dst->dev->gso_max_size), GSO_LEGACY_MAX_SIZE) : asoc->pathmtu; rcu_read_unlock(); } /* Initialize the packet structure. */ void sctp_packet_init(struct sctp_packet *packet, struct sctp_transport *transport, __u16 sport, __u16 dport) { pr_debug("%s: packet:%p transport:%p\n", __func__, packet, transport); packet->transport = transport; packet->source_port = sport; packet->destination_port = dport; INIT_LIST_HEAD(&packet->chunk_list); /* The overhead will be calculated by sctp_packet_config() */ packet->overhead = 0; sctp_packet_reset(packet); packet->vtag = 0; } /* Free a packet. */ void sctp_packet_free(struct sctp_packet *packet) { struct sctp_chunk *chunk, *tmp; pr_debug("%s: packet:%p\n", __func__, packet); list_for_each_entry_safe(chunk, tmp, &packet->chunk_list, list) { list_del_init(&chunk->list); sctp_chunk_free(chunk); } } /* This routine tries to append the chunk to the offered packet. If adding * the chunk causes the packet to exceed the path MTU and COOKIE_ECHO chunk * is not present in the packet, it transmits the input packet. * Data can be bundled with a packet containing a COOKIE_ECHO chunk as long * as it can fit in the packet, but any more data that does not fit in this * packet can be sent only after receiving the COOKIE_ACK. */ enum sctp_xmit sctp_packet_transmit_chunk(struct sctp_packet *packet, struct sctp_chunk *chunk, int one_packet, gfp_t gfp) { enum sctp_xmit retval; pr_debug("%s: packet:%p size:%zu chunk:%p size:%d\n", __func__, packet, packet->size, chunk, chunk->skb ? chunk->skb->len : -1); switch ((retval = (sctp_packet_append_chunk(packet, chunk)))) { case SCTP_XMIT_PMTU_FULL: if (!packet->has_cookie_echo) { int error = 0; error = sctp_packet_transmit(packet, gfp); if (error < 0) chunk->skb->sk->sk_err = -error; /* If we have an empty packet, then we can NOT ever * return PMTU_FULL. */ if (!one_packet) retval = sctp_packet_append_chunk(packet, chunk); } break; case SCTP_XMIT_RWND_FULL: case SCTP_XMIT_OK: case SCTP_XMIT_DELAY: break; } return retval; } /* Try to bundle a pad chunk into a packet with a heartbeat chunk for PLPMTUTD probe */ static enum sctp_xmit sctp_packet_bundle_pad(struct sctp_packet *pkt, struct sctp_chunk *chunk) { struct sctp_transport *t = pkt->transport; struct sctp_chunk *pad; int overhead = 0; if (!chunk->pmtu_probe) return SCTP_XMIT_OK; /* calculate the Padding Data size for the pad chunk */ overhead += sizeof(struct sctphdr) + sizeof(struct sctp_chunkhdr); overhead += sizeof(struct sctp_sender_hb_info) + sizeof(struct sctp_pad_chunk); pad = sctp_make_pad(t->asoc, t->pl.probe_size - overhead); if (!pad) return SCTP_XMIT_DELAY; list_add_tail(&pad->list, &pkt->chunk_list); pkt->size += SCTP_PAD4(ntohs(pad->chunk_hdr->length)); chunk->transport = t; return SCTP_XMIT_OK; } /* Try to bundle an auth chunk into the packet. 
*/ static enum sctp_xmit sctp_packet_bundle_auth(struct sctp_packet *pkt, struct sctp_chunk *chunk) { struct sctp_association *asoc = pkt->transport->asoc; enum sctp_xmit retval = SCTP_XMIT_OK; struct sctp_chunk *auth; /* if we don't have an association, we can't do authentication */ if (!asoc) return retval; /* See if this is an auth chunk we are bundling or if * auth is already bundled. */ if (chunk->chunk_hdr->type == SCTP_CID_AUTH || pkt->has_auth) return retval; /* if the peer did not request this chunk to be authenticated, * don't do it */ if (!chunk->auth) return retval; auth = sctp_make_auth(asoc, chunk->shkey->key_id); if (!auth) return retval; auth->shkey = chunk->shkey; sctp_auth_shkey_hold(auth->shkey); retval = __sctp_packet_append_chunk(pkt, auth); if (retval != SCTP_XMIT_OK) sctp_chunk_free(auth); return retval; } /* Try to bundle a SACK with the packet. */ static enum sctp_xmit sctp_packet_bundle_sack(struct sctp_packet *pkt, struct sctp_chunk *chunk) { enum sctp_xmit retval = SCTP_XMIT_OK; /* If sending DATA and haven't aleady bundled a SACK, try to * bundle one in to the packet. */ if (sctp_chunk_is_data(chunk) && !pkt->has_sack && !pkt->has_cookie_echo) { struct sctp_association *asoc; struct timer_list *timer; asoc = pkt->transport->asoc; timer = &asoc->timers[SCTP_EVENT_TIMEOUT_SACK]; /* If the SACK timer is running, we have a pending SACK */ if (timer_pending(timer)) { struct sctp_chunk *sack; if (pkt->transport->sack_generation != pkt->transport->asoc->peer.sack_generation) return retval; asoc->a_rwnd = asoc->rwnd; sack = sctp_make_sack(asoc); if (sack) { retval = __sctp_packet_append_chunk(pkt, sack); if (retval != SCTP_XMIT_OK) { sctp_chunk_free(sack); goto out; } SCTP_INC_STATS(asoc->base.net, SCTP_MIB_OUTCTRLCHUNKS); asoc->stats.octrlchunks++; asoc->peer.sack_needed = 0; if (del_timer(timer)) sctp_association_put(asoc); } } } out: return retval; } /* Append a chunk to the offered packet reporting back any inability to do * so. */ static enum sctp_xmit __sctp_packet_append_chunk(struct sctp_packet *packet, struct sctp_chunk *chunk) { __u16 chunk_len = SCTP_PAD4(ntohs(chunk->chunk_hdr->length)); enum sctp_xmit retval = SCTP_XMIT_OK; /* Check to see if this chunk will fit into the packet */ retval = sctp_packet_will_fit(packet, chunk, chunk_len); if (retval != SCTP_XMIT_OK) goto finish; /* We believe that this chunk is OK to add to the packet */ switch (chunk->chunk_hdr->type) { case SCTP_CID_DATA: case SCTP_CID_I_DATA: /* Account for the data being in the packet */ sctp_packet_append_data(packet, chunk); /* Disallow SACK bundling after DATA. */ packet->has_sack = 1; /* Disallow AUTH bundling after DATA */ packet->has_auth = 1; /* Let it be knows that packet has DATA in it */ packet->has_data = 1; /* timestamp the chunk for rtx purposes */ chunk->sent_at = jiffies; /* Mainly used for prsctp RTX policy */ chunk->sent_count++; break; case SCTP_CID_COOKIE_ECHO: packet->has_cookie_echo = 1; break; case SCTP_CID_SACK: packet->has_sack = 1; if (chunk->asoc) chunk->asoc->stats.osacks++; break; case SCTP_CID_AUTH: packet->has_auth = 1; packet->auth = chunk; break; } /* It is OK to send this chunk. */ list_add_tail(&chunk->list, &packet->chunk_list); packet->size += chunk_len; chunk->transport = packet->transport; finish: return retval; } /* Append a chunk to the offered packet reporting back any inability to do * so. 
*/ enum sctp_xmit sctp_packet_append_chunk(struct sctp_packet *packet, struct sctp_chunk *chunk) { enum sctp_xmit retval = SCTP_XMIT_OK; pr_debug("%s: packet:%p chunk:%p\n", __func__, packet, chunk); /* Data chunks are special. Before seeing what else we can * bundle into this packet, check to see if we are allowed to * send this DATA. */ if (sctp_chunk_is_data(chunk)) { retval = sctp_packet_can_append_data(packet, chunk); if (retval != SCTP_XMIT_OK) goto finish; } /* Try to bundle AUTH chunk */ retval = sctp_packet_bundle_auth(packet, chunk); if (retval != SCTP_XMIT_OK) goto finish; /* Try to bundle SACK chunk */ retval = sctp_packet_bundle_sack(packet, chunk); if (retval != SCTP_XMIT_OK) goto finish; retval = __sctp_packet_append_chunk(packet, chunk); if (retval != SCTP_XMIT_OK) goto finish; retval = sctp_packet_bundle_pad(packet, chunk); finish: return retval; } static void sctp_packet_gso_append(struct sk_buff *head, struct sk_buff *skb) { if (SCTP_OUTPUT_CB(head)->last == head) skb_shinfo(head)->frag_list = skb; else SCTP_OUTPUT_CB(head)->last->next = skb; SCTP_OUTPUT_CB(head)->last = skb; head->truesize += skb->truesize; head->data_len += skb->len; head->len += skb->len; refcount_add(skb->truesize, &head->sk->sk_wmem_alloc); __skb_header_release(skb); } static int sctp_packet_pack(struct sctp_packet *packet, struct sk_buff *head, int gso, gfp_t gfp) { struct sctp_transport *tp = packet->transport; struct sctp_auth_chunk *auth = NULL; struct sctp_chunk *chunk, *tmp; int pkt_count = 0, pkt_size; struct sock *sk = head->sk; struct sk_buff *nskb; int auth_len = 0; if (gso) { skb_shinfo(head)->gso_type = sk->sk_gso_type; SCTP_OUTPUT_CB(head)->last = head; } else { nskb = head; pkt_size = packet->size; goto merge; } do { /* calculate the pkt_size and alloc nskb */ pkt_size = packet->overhead; list_for_each_entry_safe(chunk, tmp, &packet->chunk_list, list) { int padded = SCTP_PAD4(chunk->skb->len); if (chunk == packet->auth) auth_len = padded; else if (auth_len + padded + packet->overhead > tp->pathmtu) return 0; else if (pkt_size + padded > tp->pathmtu) break; pkt_size += padded; } nskb = alloc_skb(pkt_size + MAX_HEADER, gfp); if (!nskb) return 0; skb_reserve(nskb, packet->overhead + MAX_HEADER); merge: /* merge chunks into nskb and append nskb into head list */ pkt_size -= packet->overhead; list_for_each_entry_safe(chunk, tmp, &packet->chunk_list, list) { int padding; list_del_init(&chunk->list); if (sctp_chunk_is_data(chunk)) { if (!sctp_chunk_retransmitted(chunk) && !tp->rto_pending) { chunk->rtt_in_progress = 1; tp->rto_pending = 1; } } padding = SCTP_PAD4(chunk->skb->len) - chunk->skb->len; if (padding) skb_put_zero(chunk->skb, padding); if (chunk == packet->auth) auth = (struct sctp_auth_chunk *) skb_tail_pointer(nskb); skb_put_data(nskb, chunk->skb->data, chunk->skb->len); pr_debug("*** Chunk:%p[%s] %s 0x%x, length:%d, chunk->skb->len:%d, rtt_in_progress:%d\n", chunk, sctp_cname(SCTP_ST_CHUNK(chunk->chunk_hdr->type)), chunk->has_tsn ? "TSN" : "No TSN", chunk->has_tsn ? 
ntohl(chunk->subh.data_hdr->tsn) : 0, ntohs(chunk->chunk_hdr->length), chunk->skb->len, chunk->rtt_in_progress); pkt_size -= SCTP_PAD4(chunk->skb->len); if (!sctp_chunk_is_data(chunk) && chunk != packet->auth) sctp_chunk_free(chunk); if (!pkt_size) break; } if (auth) { sctp_auth_calculate_hmac(tp->asoc, nskb, auth, packet->auth->shkey, gfp); /* free auth if no more chunks, or add it back */ if (list_empty(&packet->chunk_list)) sctp_chunk_free(packet->auth); else list_add(&packet->auth->list, &packet->chunk_list); } if (gso) sctp_packet_gso_append(head, nskb); pkt_count++; } while (!list_empty(&packet->chunk_list)); if (gso) { memset(head->cb, 0, max(sizeof(struct inet_skb_parm), sizeof(struct inet6_skb_parm))); skb_shinfo(head)->gso_segs = pkt_count; skb_shinfo(head)->gso_size = GSO_BY_FRAGS; goto chksum; } if (sctp_checksum_disable) return 1; if (!(tp->dst->dev->features & NETIF_F_SCTP_CRC) || dst_xfrm(tp->dst) || packet->ipfragok || tp->encap_port) { struct sctphdr *sh = (struct sctphdr *)skb_transport_header(head); sh->checksum = sctp_compute_cksum(head, 0); } else { chksum: head->ip_summed = CHECKSUM_PARTIAL; head->csum_not_inet = 1; head->csum_start = skb_transport_header(head) - head->head; head->csum_offset = offsetof(struct sctphdr, checksum); } return pkt_count; } /* All packets are sent to the network through this function from * sctp_outq_tail(). * * The return value is always 0 for now. */ int sctp_packet_transmit(struct sctp_packet *packet, gfp_t gfp) { struct sctp_transport *tp = packet->transport; struct sctp_association *asoc = tp->asoc; struct sctp_chunk *chunk, *tmp; int pkt_count, gso = 0; struct sk_buff *head; struct sctphdr *sh; struct sock *sk; pr_debug("%s: packet:%p\n", __func__, packet); if (list_empty(&packet->chunk_list)) return 0; chunk = list_entry(packet->chunk_list.next, struct sctp_chunk, list); sk = chunk->skb->sk; if (packet->size > tp->pathmtu && !packet->ipfragok && !chunk->pmtu_probe) { if (tp->pl.state == SCTP_PL_ERROR) { /* do IP fragmentation if in Error state */ packet->ipfragok = 1; } else { if (!sk_can_gso(sk)) { /* check gso */ pr_err_once("Trying to GSO but underlying device doesn't support it."); goto out; } gso = 1; } } /* alloc head skb */ head = alloc_skb((gso ? 
packet->overhead : packet->size) + MAX_HEADER, gfp); if (!head) goto out; skb_reserve(head, packet->overhead + MAX_HEADER); skb_set_owner_w(head, sk); /* set sctp header */ sh = skb_push(head, sizeof(struct sctphdr)); skb_reset_transport_header(head); sh->source = htons(packet->source_port); sh->dest = htons(packet->destination_port); sh->vtag = htonl(packet->vtag); sh->checksum = 0; /* drop packet if no dst */ if (!tp->dst) { IP_INC_STATS(sock_net(sk), IPSTATS_MIB_OUTNOROUTES); kfree_skb(head); goto out; } /* pack up chunks */ pkt_count = sctp_packet_pack(packet, head, gso, gfp); if (!pkt_count) { kfree_skb(head); goto out; } pr_debug("***sctp_transmit_packet*** skb->len:%d\n", head->len); /* start autoclose timer */ if (packet->has_data && sctp_state(asoc, ESTABLISHED) && asoc->timeouts[SCTP_EVENT_TIMEOUT_AUTOCLOSE]) { struct timer_list *timer = &asoc->timers[SCTP_EVENT_TIMEOUT_AUTOCLOSE]; unsigned long timeout = asoc->timeouts[SCTP_EVENT_TIMEOUT_AUTOCLOSE]; if (!mod_timer(timer, jiffies + timeout)) sctp_association_hold(asoc); } /* sctp xmit */ tp->af_specific->ecn_capable(sk); if (asoc) { asoc->stats.opackets += pkt_count; if (asoc->peer.last_sent_to != tp) asoc->peer.last_sent_to = tp; } head->ignore_df = packet->ipfragok; if (tp->dst_pending_confirm) skb_set_dst_pending_confirm(head, 1); /* neighbour should be confirmed on successful transmission or * positive error */ if (tp->af_specific->sctp_xmit(head, tp) >= 0 && tp->dst_pending_confirm) tp->dst_pending_confirm = 0; out: list_for_each_entry_safe(chunk, tmp, &packet->chunk_list, list) { list_del_init(&chunk->list); if (!sctp_chunk_is_data(chunk)) sctp_chunk_free(chunk); } sctp_packet_reset(packet); return 0; } /******************************************************************** * 2nd Level Abstractions ********************************************************************/ /* This private function check to see if a chunk can be added */ static enum sctp_xmit sctp_packet_can_append_data(struct sctp_packet *packet, struct sctp_chunk *chunk) { size_t datasize, rwnd, inflight, flight_size; struct sctp_transport *transport = packet->transport; struct sctp_association *asoc = transport->asoc; struct sctp_outq *q = &asoc->outqueue; /* RFC 2960 6.1 Transmission of DATA Chunks * * A) At any given time, the data sender MUST NOT transmit new data to * any destination transport address if its peer's rwnd indicates * that the peer has no buffer space (i.e. rwnd is 0, see Section * 6.2.1). However, regardless of the value of rwnd (including if it * is 0), the data sender can always have one DATA chunk in flight to * the receiver if allowed by cwnd (see rule B below). This rule * allows the sender to probe for a change in rwnd that the sender * missed due to the SACK having been lost in transit from the data * receiver to the data sender. */ rwnd = asoc->peer.rwnd; inflight = q->outstanding_bytes; flight_size = transport->flight_size; datasize = sctp_data_size(chunk); if (datasize > rwnd && inflight > 0) /* We have (at least) one data chunk in flight, * so we can't fall back to rule 6.1 B). */ return SCTP_XMIT_RWND_FULL; /* RFC 2960 6.1 Transmission of DATA Chunks * * B) At any given time, the sender MUST NOT transmit new data * to a given transport address if it has cwnd or more bytes * of data outstanding to that transport address. */ /* RFC 7.2.4 & the Implementers Guide 2.8. * * 3) ... * When a Fast Retransmit is being performed the sender SHOULD * ignore the value of cwnd and SHOULD NOT delay retransmission. 
*/ if (chunk->fast_retransmit != SCTP_NEED_FRTX && flight_size >= transport->cwnd) return SCTP_XMIT_RWND_FULL; /* Nagle's algorithm to solve small-packet problem: * Inhibit the sending of new chunks when new outgoing data arrives * if any previously transmitted data on the connection remains * unacknowledged. */ if ((sctp_sk(asoc->base.sk)->nodelay || inflight == 0) && !asoc->force_delay) /* Nothing unacked */ return SCTP_XMIT_OK; if (!sctp_packet_empty(packet)) /* Append to packet */ return SCTP_XMIT_OK; if (!sctp_state(asoc, ESTABLISHED)) return SCTP_XMIT_OK; /* Check whether this chunk and all the rest of pending data will fit * or delay in hopes of bundling a full sized packet. */ if (chunk->skb->len + q->out_qlen > transport->pathmtu - packet->overhead - sctp_datachk_len(&chunk->asoc->stream) - 4) /* Enough data queued to fill a packet */ return SCTP_XMIT_OK; /* Don't delay large message writes that may have been fragmented */ if (!chunk->msg->can_delay) return SCTP_XMIT_OK; /* Defer until all data acked or packet full */ return SCTP_XMIT_DELAY; } /* This private function does management things when adding DATA chunk */ static void sctp_packet_append_data(struct sctp_packet *packet, struct sctp_chunk *chunk) { struct sctp_transport *transport = packet->transport; size_t datasize = sctp_data_size(chunk); struct sctp_association *asoc = transport->asoc; u32 rwnd = asoc->peer.rwnd; /* Keep track of how many bytes are in flight over this transport. */ transport->flight_size += datasize; /* Keep track of how many bytes are in flight to the receiver. */ asoc->outqueue.outstanding_bytes += datasize; /* Update our view of the receiver's rwnd. */ if (datasize < rwnd) rwnd -= datasize; else rwnd = 0; asoc->peer.rwnd = rwnd; sctp_chunk_assign_tsn(chunk); asoc->stream.si->assign_number(chunk); } static enum sctp_xmit sctp_packet_will_fit(struct sctp_packet *packet, struct sctp_chunk *chunk, u16 chunk_len) { enum sctp_xmit retval = SCTP_XMIT_OK; size_t psize, pmtu, maxsize; /* Don't bundle in this packet if this chunk's auth key doesn't * match other chunks already enqueued on this packet. Also, * don't bundle the chunk with auth key if other chunks in this * packet don't have auth key. */ if ((packet->auth && chunk->shkey != packet->auth->shkey) || (!packet->auth && chunk->shkey && chunk->chunk_hdr->type != SCTP_CID_AUTH)) return SCTP_XMIT_PMTU_FULL; psize = packet->size; if (packet->transport->asoc) pmtu = packet->transport->asoc->pathmtu; else pmtu = packet->transport->pathmtu; /* Decide if we need to fragment or resubmit later. */ if (psize + chunk_len > pmtu) { /* It's OK to fragment at IP level if any one of the following * is true: * 1. The packet is empty (meaning this chunk is greater * the MTU) * 2. The packet doesn't have any data in it yet and data * requires authentication. */ if (sctp_packet_empty(packet) || (!packet->has_data && chunk->auth)) { /* We no longer do re-fragmentation. * Just fragment at the IP layer, if we * actually hit this condition */ packet->ipfragok = 1; goto out; } /* Similarly, if this chunk was built before a PMTU * reduction, we have to fragment it at IP level now. So * if the packet already contains something, we need to * flush. 
*/ maxsize = pmtu - packet->overhead; if (packet->auth) maxsize -= SCTP_PAD4(packet->auth->skb->len); if (chunk_len > maxsize) retval = SCTP_XMIT_PMTU_FULL; /* It is also okay to fragment if the chunk we are * adding is a control chunk, but only if current packet * is not a GSO one otherwise it causes fragmentation of * a large frame. So in this case we allow the * fragmentation by forcing it to be in a new packet. */ if (!sctp_chunk_is_data(chunk) && packet->has_data) retval = SCTP_XMIT_PMTU_FULL; if (psize + chunk_len > packet->max_size) /* Hit GSO/PMTU limit, gotta flush */ retval = SCTP_XMIT_PMTU_FULL; if (!packet->transport->burst_limited && psize + chunk_len > (packet->transport->cwnd >> 1)) /* Do not allow a single GSO packet to use more * than half of cwnd. */ retval = SCTP_XMIT_PMTU_FULL; if (packet->transport->burst_limited && psize + chunk_len > (packet->transport->burst_limited >> 1)) /* Do not allow a single GSO packet to use more * than half of original cwnd. */ retval = SCTP_XMIT_PMTU_FULL; /* Otherwise it will fit in the GSO packet */ } out: return retval; } |
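/*
 * Illustrative sketch only, not part of output.c above: the calling sequence
 * the helpers above are written for, normally driven from the SCTP outqueue
 * code.  flush_one_chunk() and its GFP_ATOMIC/ecn_capable choices are
 * hypothetical and assume the same includes as output.c; the real outqueue
 * logic batches many chunks before transmitting.
 */
static void flush_one_chunk(struct sctp_packet *packet,
			    struct sctp_chunk *chunk, __u32 vtag)
{
	enum sctp_xmit status;

	/* (Re)configure the packet for this flush before appending anything. */
	sctp_packet_config(packet, vtag, 0 /* ecn_capable */);

	/*
	 * Try to append; on SCTP_XMIT_PMTU_FULL this transmits what has been
	 * bundled so far and retries the append (see
	 * sctp_packet_transmit_chunk() above).
	 */
	status = sctp_packet_transmit_chunk(packet, chunk, 0 /* one_packet */,
					    GFP_ATOMIC);
	if (status != SCTP_XMIT_OK)
		return;	/* e.g. RWND_FULL or DELAY: leave the chunk queued */

	/* Push whatever ended up bundled in the packet onto the wire. */
	sctp_packet_transmit(packet, GFP_ATOMIC);
}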
| 53 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 | /* SPDX-License-Identifier: GPL-2.0 */ #undef TRACE_SYSTEM #define TRACE_SYSTEM vsock #if !defined(_TRACE_VSOCK_VIRTIO_TRANSPORT_COMMON_H) || \ defined(TRACE_HEADER_MULTI_READ) #define _TRACE_VSOCK_VIRTIO_TRANSPORT_COMMON_H #include <linux/tracepoint.h> TRACE_DEFINE_ENUM(VIRTIO_VSOCK_TYPE_STREAM); TRACE_DEFINE_ENUM(VIRTIO_VSOCK_TYPE_SEQPACKET); #define show_type(val) \ __print_symbolic(val, \ { VIRTIO_VSOCK_TYPE_STREAM, "STREAM" }, \ { VIRTIO_VSOCK_TYPE_SEQPACKET, "SEQPACKET" }) TRACE_DEFINE_ENUM(VIRTIO_VSOCK_OP_INVALID); TRACE_DEFINE_ENUM(VIRTIO_VSOCK_OP_REQUEST); TRACE_DEFINE_ENUM(VIRTIO_VSOCK_OP_RESPONSE); TRACE_DEFINE_ENUM(VIRTIO_VSOCK_OP_RST); TRACE_DEFINE_ENUM(VIRTIO_VSOCK_OP_SHUTDOWN); TRACE_DEFINE_ENUM(VIRTIO_VSOCK_OP_RW); TRACE_DEFINE_ENUM(VIRTIO_VSOCK_OP_CREDIT_UPDATE); TRACE_DEFINE_ENUM(VIRTIO_VSOCK_OP_CREDIT_REQUEST); #define show_op(val) \ __print_symbolic(val, \ { VIRTIO_VSOCK_OP_INVALID, "INVALID" }, \ { VIRTIO_VSOCK_OP_REQUEST, "REQUEST" }, \ { VIRTIO_VSOCK_OP_RESPONSE, "RESPONSE" }, \ { VIRTIO_VSOCK_OP_RST, "RST" }, \ { VIRTIO_VSOCK_OP_SHUTDOWN, "SHUTDOWN" }, \ { VIRTIO_VSOCK_OP_RW, "RW" }, \ { VIRTIO_VSOCK_OP_CREDIT_UPDATE, "CREDIT_UPDATE" }, \ { VIRTIO_VSOCK_OP_CREDIT_REQUEST, "CREDIT_REQUEST" }) TRACE_EVENT(virtio_transport_alloc_pkt, TP_PROTO( __u32 src_cid, __u32 src_port, __u32 dst_cid, __u32 dst_port, __u32 len, __u16 type, __u16 op, __u32 flags, bool zcopy ), TP_ARGS( src_cid, src_port, dst_cid, dst_port, len, type, op, flags, zcopy ), TP_STRUCT__entry( __field(__u32, src_cid) __field(__u32, src_port) __field(__u32, dst_cid) __field(__u32, dst_port) __field(__u32, len) __field(__u16, type) __field(__u16, op) __field(__u32, flags) __field(bool, zcopy) ), TP_fast_assign( __entry->src_cid = src_cid; __entry->src_port = src_port; __entry->dst_cid = dst_cid; __entry->dst_port = dst_port; __entry->len = len; __entry->type = type; __entry->op = op; __entry->flags = flags; __entry->zcopy = zcopy; ), TP_printk("%u:%u -> %u:%u len=%u type=%s op=%s flags=%#x zcopy=%s", __entry->src_cid, __entry->src_port, __entry->dst_cid, __entry->dst_port, __entry->len, show_type(__entry->type), show_op(__entry->op), __entry->flags, __entry->zcopy ? 
"true" : "false") ); TRACE_EVENT(virtio_transport_recv_pkt, TP_PROTO( __u32 src_cid, __u32 src_port, __u32 dst_cid, __u32 dst_port, __u32 len, __u16 type, __u16 op, __u32 flags, __u32 buf_alloc, __u32 fwd_cnt ), TP_ARGS( src_cid, src_port, dst_cid, dst_port, len, type, op, flags, buf_alloc, fwd_cnt ), TP_STRUCT__entry( __field(__u32, src_cid) __field(__u32, src_port) __field(__u32, dst_cid) __field(__u32, dst_port) __field(__u32, len) __field(__u16, type) __field(__u16, op) __field(__u32, flags) __field(__u32, buf_alloc) __field(__u32, fwd_cnt) ), TP_fast_assign( __entry->src_cid = src_cid; __entry->src_port = src_port; __entry->dst_cid = dst_cid; __entry->dst_port = dst_port; __entry->len = len; __entry->type = type; __entry->op = op; __entry->flags = flags; __entry->buf_alloc = buf_alloc; __entry->fwd_cnt = fwd_cnt; ), TP_printk("%u:%u -> %u:%u len=%u type=%s op=%s flags=%#x " "buf_alloc=%u fwd_cnt=%u", __entry->src_cid, __entry->src_port, __entry->dst_cid, __entry->dst_port, __entry->len, show_type(__entry->type), show_op(__entry->op), __entry->flags, __entry->buf_alloc, __entry->fwd_cnt) ); #endif /* _TRACE_VSOCK_VIRTIO_TRANSPORT_COMMON_H */ #undef TRACE_INCLUDE_FILE #define TRACE_INCLUDE_FILE vsock_virtio_transport_common /* This part must be outside protection */ #include <trace/define_trace.h> |
// SPDX-License-Identifier: GPL-2.0-or-later
/*
 * (C) 2011 Pablo Neira Ayuso <pablo@netfilter.org>
 * (C) 2011 Intra2net AG <https://www.intra2net.com>
 */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <linux/module.h>
#include <linux/skbuff.h>

#include <linux/netfilter/x_tables.h>
#include <linux/netfilter/nfnetlink_acct.h>
#include <linux/netfilter/xt_nfacct.h>

MODULE_AUTHOR("Pablo Neira Ayuso <pablo@netfilter.org>");
MODULE_DESCRIPTION("Xtables: match for the extended accounting infrastructure");
MODULE_LICENSE("GPL");
MODULE_ALIAS("ipt_nfacct");
MODULE_ALIAS("ip6t_nfacct");

static bool nfacct_mt(const struct sk_buff *skb, struct xt_action_param *par)
{
	int overquota;
	const struct xt_nfacct_match_info *info = par->targinfo;

	nfnl_acct_update(skb, info->nfacct);

	overquota = nfnl_acct_overquota(xt_net(par), info->nfacct);

	return overquota != NFACCT_UNDERQUOTA;
}

static int nfacct_mt_checkentry(const struct xt_mtchk_param *par)
{
	struct xt_nfacct_match_info *info = par->matchinfo;
	struct nf_acct *nfacct;

	nfacct = nfnl_acct_find_get(par->net, info->name);
	if (nfacct == NULL) {
		pr_info_ratelimited("accounting object `%s' does not exist\n",
				    info->name);
		return -ENOENT;
	}
	info->nfacct = nfacct;
	return 0;
}

static void nfacct_mt_destroy(const struct xt_mtdtor_param *par)
{
	const struct xt_nfacct_match_info *info = par->matchinfo;

	nfnl_acct_put(info->nfacct);
}

static struct xt_match nfacct_mt_reg[] __read_mostly = {
	{
		.name       = "nfacct",
		.revision   = 0,
		.family     = NFPROTO_UNSPEC,
		.checkentry = nfacct_mt_checkentry,
		.match      = nfacct_mt,
		.destroy    = nfacct_mt_destroy,
		.matchsize  = sizeof(struct xt_nfacct_match_info),
		.usersize   = offsetof(struct xt_nfacct_match_info, nfacct),
		.me         = THIS_MODULE,
	},
	{
		.name       = "nfacct",
		.revision   = 1,
		.family     = NFPROTO_UNSPEC,
		.checkentry = nfacct_mt_checkentry,
		.match      = nfacct_mt,
		.destroy    = nfacct_mt_destroy,
		.matchsize  = sizeof(struct xt_nfacct_match_info_v1),
		.usersize   = offsetof(struct xt_nfacct_match_info_v1, nfacct),
		.me         = THIS_MODULE,
	},
};

static int __init nfacct_mt_init(void)
{
	return xt_register_matches(nfacct_mt_reg, ARRAY_SIZE(nfacct_mt_reg));
}

static void __exit nfacct_mt_exit(void)
{
	xt_unregister_matches(nfacct_mt_reg, ARRAY_SIZE(nfacct_mt_reg));
}

module_init(nfacct_mt_init);
module_exit(nfacct_mt_exit);
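/*
 * Illustrative usage note, not part of xt_nfacct.c above: the match only
 * references accounting objects that already exist, so a typical setup first
 * creates the object over nfnetlink_acct and then attaches the match, e.g.:
 *
 *   nfacct add http-traffic
 *   iptables -A INPUT -p tcp --dport 80 -m nfacct --nfacct-name http-traffic
 *   nfacct get http-traffic
 *
 * If the named object is missing, nfacct_mt_checkentry() above fails with
 * -ENOENT at rule insertion time.
 */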
| 18 1 3 28 1 49 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 | // SPDX-License-Identifier: GPL-2.0 /* * Copyright © 2019 Oracle and/or its affiliates. All rights reserved. * Copyright © 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. * * KVM Xen emulation */ #ifndef __ARCH_X86_KVM_XEN_H__ #define __ARCH_X86_KVM_XEN_H__ #include <asm/xen/hypervisor.h> #ifdef CONFIG_KVM_XEN #include <linux/jump_label_ratelimit.h> extern struct static_key_false_deferred kvm_xen_enabled; int __kvm_xen_has_interrupt(struct kvm_vcpu *vcpu); void kvm_xen_inject_pending_events(struct kvm_vcpu *vcpu); void kvm_xen_inject_vcpu_vector(struct kvm_vcpu *vcpu); int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data); int kvm_xen_vcpu_get_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data); int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data); int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data); int kvm_xen_hvm_evtchn_send(struct kvm *kvm, struct kvm_irq_routing_xen_evtchn *evt); int kvm_xen_write_hypercall_page(struct kvm_vcpu *vcpu, u64 data); int kvm_xen_hvm_config(struct kvm *kvm, struct kvm_xen_hvm_config *xhc); void kvm_xen_init_vm(struct kvm *kvm); void kvm_xen_destroy_vm(struct kvm *kvm); void kvm_xen_init_vcpu(struct kvm_vcpu *vcpu); void kvm_xen_destroy_vcpu(struct kvm_vcpu *vcpu); int kvm_xen_set_evtchn_fast(struct kvm_xen_evtchn *xe, struct kvm *kvm); int kvm_xen_setup_evtchn(struct kvm *kvm, struct kvm_kernel_irq_routing_entry *e, const struct kvm_irq_routing_entry *ue); void kvm_xen_update_tsc_info(struct kvm_vcpu *vcpu); static inline void kvm_xen_sw_enable_lapic(struct kvm_vcpu *vcpu) { /* * The local APIC is being enabled. If the per-vCPU upcall vector is * set and the vCPU's evtchn_upcall_pending flag is set, inject the * interrupt. 
*/ if (static_branch_unlikely(&kvm_xen_enabled.key) && vcpu->arch.xen.vcpu_info_cache.active && vcpu->arch.xen.upcall_vector && __kvm_xen_has_interrupt(vcpu)) kvm_xen_inject_vcpu_vector(vcpu); } static inline bool kvm_xen_msr_enabled(struct kvm *kvm) { return static_branch_unlikely(&kvm_xen_enabled.key) && kvm->arch.xen_hvm_config.msr; } static inline bool kvm_xen_hypercall_enabled(struct kvm *kvm) { return static_branch_unlikely(&kvm_xen_enabled.key) && (kvm->arch.xen_hvm_config.flags & KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL); } static inline int kvm_xen_has_interrupt(struct kvm_vcpu *vcpu) { if (static_branch_unlikely(&kvm_xen_enabled.key) && vcpu->arch.xen.vcpu_info_cache.active && vcpu->kvm->arch.xen.upcall_vector) return __kvm_xen_has_interrupt(vcpu); return 0; } static inline bool kvm_xen_has_pending_events(struct kvm_vcpu *vcpu) { return static_branch_unlikely(&kvm_xen_enabled.key) && vcpu->arch.xen.evtchn_pending_sel; } static inline bool kvm_xen_timer_enabled(struct kvm_vcpu *vcpu) { return !!vcpu->arch.xen.timer_virq; } static inline int kvm_xen_has_pending_timer(struct kvm_vcpu *vcpu) { if (kvm_xen_hypercall_enabled(vcpu->kvm) && kvm_xen_timer_enabled(vcpu)) return atomic_read(&vcpu->arch.xen.timer_pending); return 0; } void kvm_xen_inject_timer_irqs(struct kvm_vcpu *vcpu); #else static inline int kvm_xen_write_hypercall_page(struct kvm_vcpu *vcpu, u64 data) { return 1; } static inline void kvm_xen_init_vm(struct kvm *kvm) { } static inline void kvm_xen_destroy_vm(struct kvm *kvm) { } static inline void kvm_xen_init_vcpu(struct kvm_vcpu *vcpu) { } static inline void kvm_xen_destroy_vcpu(struct kvm_vcpu *vcpu) { } static inline void kvm_xen_sw_enable_lapic(struct kvm_vcpu *vcpu) { } static inline bool kvm_xen_msr_enabled(struct kvm *kvm) { return false; } static inline bool kvm_xen_hypercall_enabled(struct kvm *kvm) { return false; } static inline int kvm_xen_has_interrupt(struct kvm_vcpu *vcpu) { return 0; } static inline void kvm_xen_inject_pending_events(struct kvm_vcpu *vcpu) { } static inline bool kvm_xen_has_pending_events(struct kvm_vcpu *vcpu) { return false; } static inline int kvm_xen_has_pending_timer(struct kvm_vcpu *vcpu) { return 0; } static inline void kvm_xen_inject_timer_irqs(struct kvm_vcpu *vcpu) { } static inline bool kvm_xen_timer_enabled(struct kvm_vcpu *vcpu) { return false; } static inline void kvm_xen_update_tsc_info(struct kvm_vcpu *vcpu) { } #endif int kvm_xen_hypercall(struct kvm_vcpu *vcpu); #include <asm/pvclock-abi.h> #include <asm/xen/interface.h> #include <xen/interface/vcpu.h> void kvm_xen_update_runstate(struct kvm_vcpu *vcpu, int state); static inline void kvm_xen_runstate_set_running(struct kvm_vcpu *vcpu) { kvm_xen_update_runstate(vcpu, RUNSTATE_running); } static inline void kvm_xen_runstate_set_preempted(struct kvm_vcpu *vcpu) { /* * If the vCPU wasn't preempted but took a normal exit for * some reason (hypercalls, I/O, etc.), that is accounted as * still RUNSTATE_running, as the VMM is still operating on * behalf of the vCPU. Only if the VMM does actually block * does it need to enter RUNSTATE_blocked. 
*/ if (WARN_ON_ONCE(!vcpu->preempted)) return; kvm_xen_update_runstate(vcpu, RUNSTATE_runnable); } /* 32-bit compatibility definitions, also used natively in 32-bit build */ struct compat_arch_vcpu_info { unsigned int cr2; unsigned int pad[5]; }; struct compat_vcpu_info { uint8_t evtchn_upcall_pending; uint8_t evtchn_upcall_mask; uint16_t pad; uint32_t evtchn_pending_sel; struct compat_arch_vcpu_info arch; struct pvclock_vcpu_time_info time; }; /* 64 bytes (x86) */ struct compat_arch_shared_info { unsigned int max_pfn; unsigned int pfn_to_mfn_frame_list_list; unsigned int nmi_reason; unsigned int p2m_cr3; unsigned int p2m_vaddr; unsigned int p2m_generation; uint32_t wc_sec_hi; }; struct compat_shared_info { struct compat_vcpu_info vcpu_info[MAX_VIRT_CPUS]; uint32_t evtchn_pending[32]; uint32_t evtchn_mask[32]; struct pvclock_wall_clock wc; struct compat_arch_shared_info arch; }; #define COMPAT_EVTCHN_2L_NR_CHANNELS (8 * \ sizeof_field(struct compat_shared_info, \ evtchn_pending)) struct compat_vcpu_runstate_info { int state; uint64_t state_entry_time; uint64_t time[4]; } __attribute__((packed)); struct compat_sched_poll { /* This is actually a guest virtual address which points to ports. */ uint32_t ports; unsigned int nr_ports; uint64_t timeout; }; #endif /* __ARCH_X86_KVM_XEN_H__ */ |
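The predicates above (kvm_xen_msr_enabled(), kvm_xen_hypercall_enabled()) key off state that a userspace VMM installs with the KVM_XEN_HVM_CONFIG ioctl. Below is a hedged sketch of that VMM-side call; the header, struct, and flag names are written from memory of the KVM UAPI and should be checked against <linux/kvm.h> before use.

/*
 * Hedged userspace sketch: enable Xen hypercall interception for a VM.
 */
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int enable_xen_hcall_intercept(int vm_fd)
{
	struct kvm_xen_hvm_config cfg;

	memset(&cfg, 0, sizeof(cfg));
	/* Route guest hypercalls to userspace instead of a hypercall page. */
	cfg.flags = KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL;

	/* On success, kvm_xen_hypercall_enabled() becomes true for this VM. */
	return ioctl(vm_fd, KVM_XEN_HVM_CONFIG, &cfg);
}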
// SPDX-License-Identifier: GPL-2.0
/*
 * Copyright (c) 2000-2005 Silicon Graphics, Inc.
 * All Rights Reserved.
 */
#include "xfs.h"
#include "xfs_fs.h"
#include "xfs_shared.h"
#include "xfs_format.h"
#include "xfs_log_format.h"
#include "xfs_trans_resv.h"
#include "xfs_mount.h"
#include "xfs_defer.h"
#include "xfs_da_format.h"
#include "xfs_da_btree.h"
#include "xfs_attr_sf.h"
#include "xfs_inode.h"
#include "xfs_trans.h"
#include "xfs_bmap.h"
#include "xfs_bmap_btree.h"
#include "xfs_attr.h"
#include "xfs_attr_leaf.h"
#include "xfs_attr_remote.h"
#include "xfs_quota.h"
#include "xfs_trans_space.h"
#include "xfs_trace.h"
#include "xfs_attr_item.h"
#include "xfs_xattr.h"
#include "xfs_parent.h"

struct kmem_cache *xfs_attr_intent_cache;

/*
 * xfs_attr.c
 *
 * Provide the external interfaces to manage attribute lists.
 */

/*========================================================================
 * Function prototypes for the kernel.
 *========================================================================*/

/*
 * Internal routines when attribute list fits inside the inode.
 */
STATIC int xfs_attr_shortform_addname(xfs_da_args_t *args);

/*
 * Internal routines when attribute list is one block.
 */
STATIC int xfs_attr_leaf_get(xfs_da_args_t *args);
STATIC int xfs_attr_leaf_removename(xfs_da_args_t *args);
STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);

/*
 * Internal routines when attribute list is more than one block.
 */
STATIC int xfs_attr_node_get(xfs_da_args_t *args);
STATIC void xfs_attr_restore_rmt_blk(struct xfs_da_args *args);
static int xfs_attr_node_try_addname(struct xfs_attr_intent *attr);
STATIC int xfs_attr_node_addname_find_attr(struct xfs_attr_intent *attr);
STATIC int xfs_attr_node_remove_attr(struct xfs_attr_intent *attr);
STATIC int xfs_attr_node_lookup(struct xfs_da_args *args,
		struct xfs_da_state *state);

int
xfs_inode_hasattr(
	struct xfs_inode	*ip)
{
	if (!xfs_inode_has_attr_fork(ip))
		return 0;
	if (ip->i_af.if_format == XFS_DINODE_FMT_EXTENTS &&
	    ip->i_af.if_nextents == 0)
		return 0;
	return 1;
}

/*
 * Returns true if there is exactly one block in the attr fork, in which
 * case the attribute fork consists of a single leaf block entry.
 */
bool
xfs_attr_is_leaf(
	struct xfs_inode	*ip)
{
	struct xfs_ifork	*ifp = &ip->i_af;
	struct xfs_iext_cursor	icur;
	struct xfs_bmbt_irec	imap;

	ASSERT(!xfs_need_iread_extents(ifp));

	if (ifp->if_nextents != 1 || ifp->if_format != XFS_DINODE_FMT_EXTENTS)
		return false;

	xfs_iext_first(ifp, &icur);
	xfs_iext_get_extent(ifp, &icur, &imap);
	return imap.br_startoff == 0 && imap.br_blockcount == 1;
}

/*
 * XXX (dchinner): name path state saving and refilling is an optimisation to
 * avoid needing to look up name entries after rolling transactions removing
 * remote xattr blocks between the name entry lookup and name entry removal.
 * This optimisation got sidelined when combining the set and remove state
 * machines, but the code has been left in place because it is worthwhile to
 * restore the optimisation once the combined state machine paths have settled.
 *
 * This comment is a public service announcement to remind Future Dave that he
 * still needs to restore this code to working order.
 */
#if 0
/*
 * Fill in the disk block numbers in the state structure for the buffers
 * that are attached to the state structure.
* This is done so that we can quickly reattach ourselves to those buffers * after some set of transaction commits have released these buffers. */ static int xfs_attr_fillstate(xfs_da_state_t *state) { xfs_da_state_path_t *path; xfs_da_state_blk_t *blk; int level; trace_xfs_attr_fillstate(state->args); /* * Roll down the "path" in the state structure, storing the on-disk * block number for those buffers in the "path". */ path = &state->path; ASSERT((path->active >= 0) && (path->active < XFS_DA_NODE_MAXDEPTH)); for (blk = path->blk, level = 0; level < path->active; blk++, level++) { if (blk->bp) { blk->disk_blkno = xfs_buf_daddr(blk->bp); blk->bp = NULL; } else { blk->disk_blkno = 0; } } /* * Roll down the "altpath" in the state structure, storing the on-disk * block number for those buffers in the "altpath". */ path = &state->altpath; ASSERT((path->active >= 0) && (path->active < XFS_DA_NODE_MAXDEPTH)); for (blk = path->blk, level = 0; level < path->active; blk++, level++) { if (blk->bp) { blk->disk_blkno = xfs_buf_daddr(blk->bp); blk->bp = NULL; } else { blk->disk_blkno = 0; } } return 0; } /* * Reattach the buffers to the state structure based on the disk block * numbers stored in the state structure. * This is done after some set of transaction commits have released those * buffers from our grip. */ static int xfs_attr_refillstate(xfs_da_state_t *state) { xfs_da_state_path_t *path; xfs_da_state_blk_t *blk; int level, error; trace_xfs_attr_refillstate(state->args); /* * Roll down the "path" in the state structure, storing the on-disk * block number for those buffers in the "path". */ path = &state->path; ASSERT((path->active >= 0) && (path->active < XFS_DA_NODE_MAXDEPTH)); for (blk = path->blk, level = 0; level < path->active; blk++, level++) { if (blk->disk_blkno) { error = xfs_da3_node_read_mapped(state->args->trans, state->args->dp, blk->disk_blkno, &blk->bp, XFS_ATTR_FORK); if (error) return error; } else { blk->bp = NULL; } } /* * Roll down the "altpath" in the state structure, storing the on-disk * block number for those buffers in the "altpath". */ path = &state->altpath; ASSERT((path->active >= 0) && (path->active < XFS_DA_NODE_MAXDEPTH)); for (blk = path->blk, level = 0; level < path->active; blk++, level++) { if (blk->disk_blkno) { error = xfs_da3_node_read_mapped(state->args->trans, state->args->dp, blk->disk_blkno, &blk->bp, XFS_ATTR_FORK); if (error) return error; } else { blk->bp = NULL; } } return 0; } #else static int xfs_attr_fillstate(xfs_da_state_t *state) { return 0; } #endif /*======================================================================== * Overall external interface routines. *========================================================================*/ /* * Retrieve an extended attribute and its value. Must have ilock. * Returns 0 on successful retrieval, otherwise an error. */ int xfs_attr_get_ilocked( struct xfs_da_args *args) { int error; xfs_assert_ilocked(args->dp, XFS_ILOCK_SHARED | XFS_ILOCK_EXCL); if (!xfs_inode_hasattr(args->dp)) return -ENOATTR; /* * The incore attr fork iext tree must be loaded for xfs_attr_is_leaf * to work correctly. */ error = xfs_iread_extents(args->trans, args->dp, XFS_ATTR_FORK); if (error) return error; if (args->dp->i_af.if_format == XFS_DINODE_FMT_LOCAL) return xfs_attr_shortform_getvalue(args); if (xfs_attr_is_leaf(args->dp)) return xfs_attr_leaf_get(args); return xfs_attr_node_get(args); } /* * Retrieve an extended attribute by name, and its value if requested. 
* * If args->valuelen is zero, then the caller does not want the value, just an * indication whether the attribute exists and the size of the value if it * exists. The size is returned in args.valuelen. * * If args->value is NULL but args->valuelen is non-zero, allocate the buffer * for the value after existence of the attribute has been determined. The * caller always has to free args->value if it is set, no matter if this * function was successful or not. * * If the attribute is found, but exceeds the size limit set by the caller in * args->valuelen, return -ERANGE with the size of the attribute that was found * in args->valuelen. */ int xfs_attr_get( struct xfs_da_args *args) { uint lock_mode; int error; XFS_STATS_INC(args->dp->i_mount, xs_attr_get); if (xfs_is_shutdown(args->dp->i_mount)) return -EIO; if (!args->owner) args->owner = args->dp->i_ino; args->geo = args->dp->i_mount->m_attr_geo; args->whichfork = XFS_ATTR_FORK; xfs_attr_sethash(args); /* Entirely possible to look up a name which doesn't exist */ args->op_flags = XFS_DA_OP_OKNOENT; lock_mode = xfs_ilock_attr_map_shared(args->dp); error = xfs_attr_get_ilocked(args); xfs_iunlock(args->dp, lock_mode); return error; } /* * Calculate how many blocks we need for the new attribute, */ int xfs_attr_calc_size( struct xfs_da_args *args, int *local) { struct xfs_mount *mp = args->dp->i_mount; int size; int nblks; /* * Determine space new attribute will use, and if it would be * "local" or "remote" (note: local != inline). */ size = xfs_attr_leaf_newentsize(args, local); nblks = XFS_DAENTER_SPACE_RES(mp, XFS_ATTR_FORK); if (*local) { if (size > (args->geo->blksize / 2)) { /* Double split possible */ nblks *= 2; } } else { /* * Out of line attribute, cannot double split, but * make room for the attribute value itself. */ uint dblocks = xfs_attr3_rmt_blocks(mp, args->valuelen); nblks += dblocks; nblks += XFS_NEXTENTADD_SPACE_RES(mp, dblocks, XFS_ATTR_FORK); } return nblks; } /* Initialize transaction reservation for an xattr set/replace/upsert */ inline struct xfs_trans_res xfs_attr_set_resv( const struct xfs_da_args *args) { struct xfs_mount *mp = args->dp->i_mount; struct xfs_trans_res ret = { .tr_logres = M_RES(mp)->tr_attrsetm.tr_logres + M_RES(mp)->tr_attrsetrt.tr_logres * args->total, .tr_logcount = XFS_ATTRSET_LOG_COUNT, .tr_logflags = XFS_TRANS_PERM_LOG_RES, }; return ret; } /* * Add an attr to a shortform fork. If there is no space, * xfs_attr_shortform_addname() will convert to leaf format and return -ENOSPC. * to use. */ STATIC int xfs_attr_try_sf_addname( struct xfs_inode *dp, struct xfs_da_args *args) { int error; /* * Build initial attribute list (if required). */ if (dp->i_af.if_format == XFS_DINODE_FMT_EXTENTS) xfs_attr_shortform_create(args); error = xfs_attr_shortform_addname(args); if (error == -ENOSPC) return error; /* * Commit the shortform mods, and we're done. * NOTE: this is also the error path (EEXIST, etc). */ if (!error) xfs_trans_ichgtime(args->trans, dp, XFS_ICHGTIME_CHG); if (xfs_has_wsync(dp->i_mount)) xfs_trans_set_sync(args->trans); return error; } static int xfs_attr_sf_addname( struct xfs_attr_intent *attr) { struct xfs_da_args *args = attr->xattri_da_args; struct xfs_inode *dp = args->dp; int error = 0; error = xfs_attr_try_sf_addname(dp, args); if (error != -ENOSPC) { ASSERT(!error || error == -EEXIST); attr->xattri_dela_state = XFS_DAS_DONE; goto out; } /* * It won't fit in the shortform, transform to a leaf block. GROT: * another possible req'mt for a double-split btree op. 
*/ error = xfs_attr_shortform_to_leaf(args); if (error) return error; attr->xattri_dela_state = XFS_DAS_LEAF_ADD; out: trace_xfs_attr_sf_addname_return(attr->xattri_dela_state, args->dp); return error; } /* Compute the hash value for a user/root/secure extended attribute */ xfs_dahash_t xfs_attr_hashname( const uint8_t *name, int namelen) { return xfs_da_hashname(name, namelen); } /* Compute the hash value for any extended attribute from any namespace. */ xfs_dahash_t xfs_attr_hashval( struct xfs_mount *mp, unsigned int attr_flags, const uint8_t *name, int namelen, const void *value, int valuelen) { ASSERT(xfs_attr_check_namespace(attr_flags)); if (attr_flags & XFS_ATTR_PARENT) return xfs_parent_hashattr(mp, name, namelen, value, valuelen); return xfs_attr_hashname(name, namelen); } /* Save the current remote block info and clear the current pointers. */ static void xfs_attr_save_rmt_blk( struct xfs_da_args *args) { args->blkno2 = args->blkno; args->index2 = args->index; args->rmtblkno2 = args->rmtblkno; args->rmtblkcnt2 = args->rmtblkcnt; args->rmtvaluelen2 = args->rmtvaluelen; args->rmtblkno = 0; args->rmtblkcnt = 0; args->rmtvaluelen = 0; } /* Set stored info about a remote block */ static void xfs_attr_restore_rmt_blk( struct xfs_da_args *args) { args->blkno = args->blkno2; args->index = args->index2; args->rmtblkno = args->rmtblkno2; args->rmtblkcnt = args->rmtblkcnt2; args->rmtvaluelen = args->rmtvaluelen2; } /* * PPTR_REPLACE operations require the caller to set the old and new names and * values explicitly. Update the canonical fields to the new name and value * here now that the removal phase has finished. */ static void xfs_attr_update_pptr_replace_args( struct xfs_da_args *args) { ASSERT(args->new_namelen > 0); args->name = args->new_name; args->namelen = args->new_namelen; args->value = args->new_value; args->valuelen = args->new_valuelen; xfs_attr_sethash(args); } /* * Handle the state change on completion of a multi-state attr operation. * * If the XFS_DA_OP_REPLACE flag is set, this means the operation was the first * modification in a attr replace operation and we still have to do the second * state, indicated by @replace_state. * * We consume the XFS_DA_OP_REPLACE flag so that when we are called again on * completion of the second half of the attr replace operation we correctly * signal that it is done. */ static enum xfs_delattr_state xfs_attr_complete_op( struct xfs_attr_intent *attr, enum xfs_delattr_state replace_state) { struct xfs_da_args *args = attr->xattri_da_args; if (!(args->op_flags & XFS_DA_OP_REPLACE)) replace_state = XFS_DAS_DONE; else if (xfs_attr_intent_op(attr) == XFS_ATTRI_OP_FLAGS_PPTR_REPLACE) xfs_attr_update_pptr_replace_args(args); args->op_flags &= ~XFS_DA_OP_REPLACE; args->attr_filter &= ~XFS_ATTR_INCOMPLETE; return replace_state; } /* * Try to add an attribute to an inode in leaf form. */ static int xfs_attr_leaf_addname( struct xfs_attr_intent *attr) { struct xfs_da_args *args = attr->xattri_da_args; struct xfs_buf *bp; int error; ASSERT(xfs_attr_is_leaf(args->dp)); error = xfs_attr3_leaf_read(args->trans, args->dp, args->owner, 0, &bp); if (error) return error; /* * Look up the xattr name to set the insertion point for the new xattr. 
*/ error = xfs_attr3_leaf_lookup_int(bp, args); switch (error) { case -ENOATTR: if (args->op_flags & XFS_DA_OP_REPLACE) goto out_brelse; break; case -EEXIST: if (!(args->op_flags & XFS_DA_OP_REPLACE)) goto out_brelse; trace_xfs_attr_leaf_replace(args); /* * Save the existing remote attr state so that the current * values reflect the state of the new attribute we are about to * add, not the attribute we just found and will remove later. */ xfs_attr_save_rmt_blk(args); break; case 0: break; default: goto out_brelse; } /* * We need to commit and roll if we need to allocate remote xattr blocks * or perform more xattr manipulations. Otherwise there is nothing more * to do and we can return success. */ if (!xfs_attr3_leaf_add(bp, args)) { error = xfs_attr3_leaf_to_node(args); if (error) return error; attr->xattri_dela_state = XFS_DAS_NODE_ADD; } else if (args->rmtblkno) { attr->xattri_dela_state = XFS_DAS_LEAF_SET_RMT; } else { attr->xattri_dela_state = xfs_attr_complete_op(attr, XFS_DAS_LEAF_REPLACE); } trace_xfs_attr_leaf_addname_return(attr->xattri_dela_state, args->dp); return 0; out_brelse: xfs_trans_brelse(args->trans, bp); return error; } /* * Add an entry to a node format attr tree. * * Note that we might still have a leaf here - xfs_attr_is_leaf() cannot tell * the difference between leaf + remote attr blocks and a node format tree, * so we may still end up having to convert from leaf to node format here. */ static int xfs_attr_node_addname( struct xfs_attr_intent *attr) { struct xfs_da_args *args = attr->xattri_da_args; int error; error = xfs_attr_node_addname_find_attr(attr); if (error) return error; error = xfs_attr_node_try_addname(attr); if (error == 1) { error = xfs_attr3_leaf_to_node(args); if (error) return error; /* * No state change, we really are in node form now * but we need the transaction rolled to continue. */ goto out; } if (error) return error; if (args->rmtblkno) attr->xattri_dela_state = XFS_DAS_NODE_SET_RMT; else attr->xattri_dela_state = xfs_attr_complete_op(attr, XFS_DAS_NODE_REPLACE); out: trace_xfs_attr_node_addname_return(attr->xattri_dela_state, args->dp); return error; } static int xfs_attr_rmtval_alloc( struct xfs_attr_intent *attr) { struct xfs_da_args *args = attr->xattri_da_args; int error = 0; /* * If there was an out-of-line value, allocate the blocks we * identified for its storage and copy the value. This is done * after we create the attribute so that we don't overflow the * maximum size of a transaction and/or hit a deadlock. */ if (attr->xattri_blkcnt > 0) { error = xfs_attr_rmtval_set_blk(attr); if (error) return error; /* Roll the transaction only if there is more to allocate. */ if (attr->xattri_blkcnt > 0) goto out; } error = xfs_attr_rmtval_set_value(args); if (error) return error; attr->xattri_dela_state = xfs_attr_complete_op(attr, ++attr->xattri_dela_state); /* * If we are not doing a rename, we've finished the operation but still * have to clear the incomplete flag protecting the new attr from * exposing partially initialised state if we crash during creation. */ if (attr->xattri_dela_state == XFS_DAS_DONE) error = xfs_attr3_leaf_clearflag(args); out: trace_xfs_attr_rmtval_alloc(attr->xattri_dela_state, args->dp); return error; } /* * Mark an attribute entry INCOMPLETE and save pointers to the relevant buffers * for later deletion of the entry. 
*/ static int xfs_attr_leaf_mark_incomplete( struct xfs_da_args *args, struct xfs_da_state *state) { int error; /* * Fill in disk block numbers in the state structure * so that we can get the buffers back after we commit * several transactions in the following calls. */ error = xfs_attr_fillstate(state); if (error) return error; /* * Mark the attribute as INCOMPLETE */ return xfs_attr3_leaf_setflag(args); } /* Ensure the da state of an xattr deferred work item is ready to go. */ static inline void xfs_attr_item_init_da_state( struct xfs_attr_intent *attr) { struct xfs_da_args *args = attr->xattri_da_args; if (!attr->xattri_da_state) attr->xattri_da_state = xfs_da_state_alloc(args); else xfs_da_state_reset(attr->xattri_da_state, args); } /* * Initial setup for xfs_attr_node_removename. Make sure the attr is there and * the blocks are valid. Attr keys with remote blocks will be marked * incomplete. */ static int xfs_attr_node_removename_setup( struct xfs_attr_intent *attr) { struct xfs_da_args *args = attr->xattri_da_args; struct xfs_da_state *state; int error; xfs_attr_item_init_da_state(attr); error = xfs_attr_node_lookup(args, attr->xattri_da_state); if (error != -EEXIST) goto out; error = 0; state = attr->xattri_da_state; ASSERT(state->path.blk[state->path.active - 1].bp != NULL); ASSERT(state->path.blk[state->path.active - 1].magic == XFS_ATTR_LEAF_MAGIC); error = xfs_attr_leaf_mark_incomplete(args, state); if (error) goto out; if (args->rmtblkno > 0) error = xfs_attr_rmtval_invalidate(args); out: if (error) { xfs_da_state_free(attr->xattri_da_state); attr->xattri_da_state = NULL; } return error; } /* * Remove the original attr we have just replaced. This is dependent on the * original lookup and insert placing the old attr in args->blkno/args->index * and the new attr in args->blkno2/args->index2. */ static int xfs_attr_leaf_remove_attr( struct xfs_attr_intent *attr) { struct xfs_da_args *args = attr->xattri_da_args; struct xfs_inode *dp = args->dp; struct xfs_buf *bp = NULL; int forkoff; int error; error = xfs_attr3_leaf_read(args->trans, args->dp, args->owner, args->blkno, &bp); if (error) return error; xfs_attr3_leaf_remove(bp, args); forkoff = xfs_attr_shortform_allfit(bp, dp); if (forkoff) error = xfs_attr3_leaf_to_shortform(bp, args, forkoff); /* bp is gone due to xfs_da_shrink_inode */ return error; } /* * Shrink an attribute from leaf to shortform. Used by the node format remove * path when the node format collapses to a single block and so we have to check * if it can be collapsed further. */ static int xfs_attr_leaf_shrink( struct xfs_da_args *args) { struct xfs_inode *dp = args->dp; struct xfs_buf *bp; int forkoff; int error; if (!xfs_attr_is_leaf(dp)) return 0; error = xfs_attr3_leaf_read(args->trans, args->dp, args->owner, 0, &bp); if (error) return error; forkoff = xfs_attr_shortform_allfit(bp, dp); if (forkoff) { error = xfs_attr3_leaf_to_shortform(bp, args, forkoff); /* bp is gone due to xfs_da_shrink_inode */ } else { xfs_trans_brelse(args->trans, bp); } return error; } /* * Run the attribute operation specified in @attr. * * This routine is meant to function as a delayed operation and will set the * state to XFS_DAS_DONE when the operation is complete. Calling functions will * need to handle this, and recall the function until either an error or * XFS_DAS_DONE is detected. 
*/ int xfs_attr_set_iter( struct xfs_attr_intent *attr) { struct xfs_da_args *args = attr->xattri_da_args; int error = 0; /* State machine switch */ next_state: switch (attr->xattri_dela_state) { case XFS_DAS_UNINIT: ASSERT(0); return -EFSCORRUPTED; case XFS_DAS_SF_ADD: return xfs_attr_sf_addname(attr); case XFS_DAS_LEAF_ADD: return xfs_attr_leaf_addname(attr); case XFS_DAS_NODE_ADD: return xfs_attr_node_addname(attr); case XFS_DAS_SF_REMOVE: error = xfs_attr_sf_removename(args); attr->xattri_dela_state = xfs_attr_complete_op(attr, xfs_attr_init_add_state(args)); break; case XFS_DAS_LEAF_REMOVE: error = xfs_attr_leaf_removename(args); attr->xattri_dela_state = xfs_attr_complete_op(attr, xfs_attr_init_add_state(args)); break; case XFS_DAS_NODE_REMOVE: error = xfs_attr_node_removename_setup(attr); if (error == -ENOATTR && (args->op_flags & XFS_DA_OP_RECOVERY)) { attr->xattri_dela_state = xfs_attr_complete_op(attr, xfs_attr_init_add_state(args)); error = 0; break; } if (error) return error; attr->xattri_dela_state = XFS_DAS_NODE_REMOVE_RMT; if (args->rmtblkno == 0) attr->xattri_dela_state++; break; case XFS_DAS_LEAF_SET_RMT: case XFS_DAS_NODE_SET_RMT: error = xfs_attr_rmtval_find_space(attr); if (error) return error; attr->xattri_dela_state++; fallthrough; case XFS_DAS_LEAF_ALLOC_RMT: case XFS_DAS_NODE_ALLOC_RMT: error = xfs_attr_rmtval_alloc(attr); if (error) return error; if (attr->xattri_dela_state == XFS_DAS_DONE) break; goto next_state; case XFS_DAS_LEAF_REPLACE: case XFS_DAS_NODE_REPLACE: /* * We must "flip" the incomplete flags on the "new" and "old" * attribute/value pairs so that one disappears and one appears * atomically. */ error = xfs_attr3_leaf_flipflags(args); if (error) return error; /* * We must commit the flag value change now to make it atomic * and then we can start the next trans in series at REMOVE_OLD. */ attr->xattri_dela_state++; break; case XFS_DAS_LEAF_REMOVE_OLD: case XFS_DAS_NODE_REMOVE_OLD: /* * If we have a remote attr, start the process of removing it * by invalidating any cached buffers. * * If we don't have a remote attr, we skip the remote block * removal state altogether with a second state increment. */ xfs_attr_restore_rmt_blk(args); if (args->rmtblkno) { error = xfs_attr_rmtval_invalidate(args); if (error) return error; } else { attr->xattri_dela_state++; } attr->xattri_dela_state++; goto next_state; case XFS_DAS_LEAF_REMOVE_RMT: case XFS_DAS_NODE_REMOVE_RMT: error = xfs_attr_rmtval_remove(attr); if (error == -EAGAIN) { error = 0; break; } if (error) return error; /* * We've finished removing the remote attr blocks, so commit the * transaction and move on to removing the attr name from the * leaf/node block. Removing the attr might require a full * transaction reservation for btree block freeing, so we * can't do that in the same transaction where we removed the * remote attr blocks. 
*/ attr->xattri_dela_state++; break; case XFS_DAS_LEAF_REMOVE_ATTR: error = xfs_attr_leaf_remove_attr(attr); attr->xattri_dela_state = xfs_attr_complete_op(attr, xfs_attr_init_add_state(args)); break; case XFS_DAS_NODE_REMOVE_ATTR: error = xfs_attr_node_remove_attr(attr); if (!error) error = xfs_attr_leaf_shrink(args); attr->xattri_dela_state = xfs_attr_complete_op(attr, xfs_attr_init_add_state(args)); break; default: ASSERT(0); break; } trace_xfs_attr_set_iter_return(attr->xattri_dela_state, args->dp); return error; } /* * Return EEXIST if attr is found, or ENOATTR if not */ static int xfs_attr_lookup( struct xfs_da_args *args) { struct xfs_inode *dp = args->dp; struct xfs_buf *bp = NULL; struct xfs_da_state *state; int error; if (!xfs_inode_hasattr(dp)) return -ENOATTR; if (dp->i_af.if_format == XFS_DINODE_FMT_LOCAL) { if (xfs_attr_sf_findname(args)) return -EEXIST; return -ENOATTR; } /* Prerequisite for xfs_attr_is_leaf */ error = xfs_iread_extents(args->trans, args->dp, XFS_ATTR_FORK); if (error) return error; if (xfs_attr_is_leaf(dp)) { error = xfs_attr_leaf_hasname(args, &bp); if (bp) xfs_trans_brelse(args->trans, bp); return error; } state = xfs_da_state_alloc(args); error = xfs_attr_node_lookup(args, state); xfs_da_state_free(state); return error; } int xfs_attr_add_fork( struct xfs_inode *ip, /* incore inode pointer */ int size, /* space new attribute needs */ int rsvd) /* xact may use reserved blks */ { struct xfs_mount *mp = ip->i_mount; struct xfs_trans *tp; /* transaction pointer */ unsigned int blks; /* space reservation */ int error; /* error return value */ if (!xfs_is_metadir_inode(ip)) ASSERT(!XFS_NOT_DQATTACHED(mp, ip)); blks = XFS_ADDAFORK_SPACE_RES(mp); error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_addafork, blks, 0, rsvd, &tp); if (error) return error; if (xfs_inode_has_attr_fork(ip)) goto trans_cancel; error = xfs_bmap_add_attrfork(tp, ip, size, rsvd); if (error) goto trans_cancel; error = xfs_trans_commit(tp); xfs_iunlock(ip, XFS_ILOCK_EXCL); return error; trans_cancel: xfs_trans_cancel(tp); xfs_iunlock(ip, XFS_ILOCK_EXCL); return error; } /* * Make a change to the xattr structure. * * The caller must have initialized @args, attached dquots, and must not hold * any ILOCKs. Reserved data blocks may be used if @rsvd is set. * * Returns -EEXIST for XFS_ATTRUPDATE_CREATE if the name already exists. * Returns -ENOATTR for XFS_ATTRUPDATE_REMOVE if the name does not exist. * Returns 0 on success, or a negative errno if something else went wrong. */ int xfs_attr_set( struct xfs_da_args *args, enum xfs_attr_update op, bool rsvd) { struct xfs_inode *dp = args->dp; struct xfs_mount *mp = dp->i_mount; struct xfs_trans_res tres; int error, local; int rmt_blks = 0; unsigned int total = 0; ASSERT(!args->trans); switch (op) { case XFS_ATTRUPDATE_UPSERT: case XFS_ATTRUPDATE_CREATE: case XFS_ATTRUPDATE_REPLACE: XFS_STATS_INC(mp, xs_attr_set); args->total = xfs_attr_calc_size(args, &local); /* * If the inode doesn't have an attribute fork, add one. 
* (inode must not be locked when we call this routine) */ if (xfs_inode_has_attr_fork(dp) == 0) { int sf_size = sizeof(struct xfs_attr_sf_hdr) + xfs_attr_sf_entsize_byname(args->namelen, args->valuelen); error = xfs_attr_add_fork(dp, sf_size, rsvd); if (error) return error; } if (!local) rmt_blks = xfs_attr3_rmt_blocks(mp, args->valuelen); tres = xfs_attr_set_resv(args); total = args->total; break; case XFS_ATTRUPDATE_REMOVE: XFS_STATS_INC(mp, xs_attr_remove); rmt_blks = xfs_attr3_max_rmt_blocks(mp); tres = M_RES(mp)->tr_attrrm; total = XFS_ATTRRM_SPACE_RES(mp); break; } /* * Root fork attributes can use reserved data blocks for this * operation if necessary */ error = xfs_trans_alloc_inode(dp, &tres, total, 0, rsvd, &args->trans); if (error) return error; if (op != XFS_ATTRUPDATE_REMOVE || xfs_inode_hasattr(dp)) { error = xfs_iext_count_extend(args->trans, dp, XFS_ATTR_FORK, XFS_IEXT_ATTR_MANIP_CNT(rmt_blks)); if (error) goto out_trans_cancel; } error = xfs_attr_lookup(args); switch (error) { case -EEXIST: if (op == XFS_ATTRUPDATE_REMOVE) { /* if no value, we are performing a remove operation */ xfs_attr_defer_add(args, XFS_ATTR_DEFER_REMOVE); break; } /* Pure create fails if the attr already exists */ if (op == XFS_ATTRUPDATE_CREATE) goto out_trans_cancel; xfs_attr_defer_add(args, XFS_ATTR_DEFER_REPLACE); break; case -ENOATTR: /* Can't remove what isn't there. */ if (op == XFS_ATTRUPDATE_REMOVE) goto out_trans_cancel; /* Pure replace fails if no existing attr to replace. */ if (op == XFS_ATTRUPDATE_REPLACE) goto out_trans_cancel; xfs_attr_defer_add(args, XFS_ATTR_DEFER_SET); break; default: goto out_trans_cancel; } /* * If this is a synchronous mount, make sure that the * transaction goes to disk before returning to the user. */ if (xfs_has_wsync(mp)) xfs_trans_set_sync(args->trans); xfs_trans_ichgtime(args->trans, dp, XFS_ICHGTIME_CHG); /* * Commit the last in the sequence of transactions. */ xfs_trans_log_inode(args->trans, dp, XFS_ILOG_CORE); error = xfs_trans_commit(args->trans); out_unlock: xfs_iunlock(dp, XFS_ILOCK_EXCL); args->trans = NULL; return error; out_trans_cancel: if (args->trans) xfs_trans_cancel(args->trans); goto out_unlock; } /*======================================================================== * External routines when attribute list is inside the inode *========================================================================*/ int xfs_attr_sf_totsize(struct xfs_inode *dp) { struct xfs_attr_sf_hdr *sf = dp->i_af.if_data; return be16_to_cpu(sf->totsize); } /* * Add a name to the shortform attribute list structure * This is the external routine. */ static int xfs_attr_shortform_addname( struct xfs_da_args *args) { int newsize, forkoff; trace_xfs_attr_sf_addname(args); if (xfs_attr_sf_findname(args)) { int error; ASSERT(args->op_flags & XFS_DA_OP_REPLACE); error = xfs_attr_sf_removename(args); if (error) return error; /* * Since we have removed the old attr, clear XFS_DA_OP_REPLACE * so that the new attr doesn't fit in shortform format, the * leaf format add routine won't trip over the attr not being * around. 
*/ args->op_flags &= ~XFS_DA_OP_REPLACE; } else { ASSERT(!(args->op_flags & XFS_DA_OP_REPLACE)); } if (args->namelen >= XFS_ATTR_SF_ENTSIZE_MAX || args->valuelen >= XFS_ATTR_SF_ENTSIZE_MAX) return -ENOSPC; newsize = xfs_attr_sf_totsize(args->dp); newsize += xfs_attr_sf_entsize_byname(args->namelen, args->valuelen); forkoff = xfs_attr_shortform_bytesfit(args->dp, newsize); if (!forkoff) return -ENOSPC; xfs_attr_shortform_add(args, forkoff); return 0; } /*======================================================================== * External routines when attribute list is one block *========================================================================*/ /* * Return EEXIST if attr is found, or ENOATTR if not */ STATIC int xfs_attr_leaf_hasname( struct xfs_da_args *args, struct xfs_buf **bp) { int error = 0; error = xfs_attr3_leaf_read(args->trans, args->dp, args->owner, 0, bp); if (error) return error; error = xfs_attr3_leaf_lookup_int(*bp, args); if (error != -ENOATTR && error != -EEXIST) xfs_trans_brelse(args->trans, *bp); return error; } /* * Remove a name from the leaf attribute list structure * * This leaf block cannot have a "remote" value, we only call this routine * if bmap_one_block() says there is only one block (ie: no remote blks). */ STATIC int xfs_attr_leaf_removename( struct xfs_da_args *args) { struct xfs_inode *dp; struct xfs_buf *bp; int error, forkoff; trace_xfs_attr_leaf_removename(args); /* * Remove the attribute. */ dp = args->dp; error = xfs_attr_leaf_hasname(args, &bp); if (error == -ENOATTR) { xfs_trans_brelse(args->trans, bp); if (args->op_flags & XFS_DA_OP_RECOVERY) return 0; return error; } else if (error != -EEXIST) return error; xfs_attr3_leaf_remove(bp, args); /* * If the result is small enough, shrink it all into the inode. */ forkoff = xfs_attr_shortform_allfit(bp, dp); if (forkoff) return xfs_attr3_leaf_to_shortform(bp, args, forkoff); /* bp is gone due to xfs_da_shrink_inode */ return 0; } /* * Look up a name in a leaf attribute list structure. * * This leaf block cannot have a "remote" value, we only call this routine * if bmap_one_block() says there is only one block (ie: no remote blks). * * Returns 0 on successful retrieval, otherwise an error. */ STATIC int xfs_attr_leaf_get(xfs_da_args_t *args) { struct xfs_buf *bp; int error; trace_xfs_attr_leaf_get(args); error = xfs_attr_leaf_hasname(args, &bp); if (error == -ENOATTR) { xfs_trans_brelse(args->trans, bp); return error; } else if (error != -EEXIST) return error; error = xfs_attr3_leaf_getvalue(bp, args); xfs_trans_brelse(args->trans, bp); return error; } /* Return EEXIST if attr is found, or ENOATTR if not. */ STATIC int xfs_attr_node_lookup( struct xfs_da_args *args, struct xfs_da_state *state) { int retval, error; /* * Search to see if name exists, and get back a pointer to it. */ error = xfs_da3_node_lookup_int(state, &retval); if (error) return error; return retval; } /*======================================================================== * External routines when attribute list size > geo->blksize *========================================================================*/ STATIC int xfs_attr_node_addname_find_attr( struct xfs_attr_intent *attr) { struct xfs_da_args *args = attr->xattri_da_args; int error; /* * Search to see if name already exists, and get back a pointer * to where it should go. 
*/ xfs_attr_item_init_da_state(attr); error = xfs_attr_node_lookup(args, attr->xattri_da_state); switch (error) { case -ENOATTR: if (args->op_flags & XFS_DA_OP_REPLACE) goto error; break; case -EEXIST: if (!(args->op_flags & XFS_DA_OP_REPLACE)) goto error; trace_xfs_attr_node_replace(args); /* * Save the existing remote attr state so that the current * values reflect the state of the new attribute we are about to * add, not the attribute we just found and will remove later. */ xfs_attr_save_rmt_blk(args); break; case 0: break; default: goto error; } return 0; error: if (attr->xattri_da_state) { xfs_da_state_free(attr->xattri_da_state); attr->xattri_da_state = NULL; } return error; } /* * Add a name to a Btree-format attribute list. * * This will involve walking down the Btree, and may involve splitting leaf * nodes and even splitting intermediate nodes up to and including the root * node (a special case of an intermediate node). * * If the tree was still in single leaf format and needs to converted to * real node format return 1 and let the caller handle that. */ static int xfs_attr_node_try_addname( struct xfs_attr_intent *attr) { struct xfs_da_state *state = attr->xattri_da_state; struct xfs_da_state_blk *blk; int error = 0; trace_xfs_attr_node_addname(state->args); blk = &state->path.blk[state->path.active-1]; ASSERT(blk->magic == XFS_ATTR_LEAF_MAGIC); if (!xfs_attr3_leaf_add(blk->bp, state->args)) { if (state->path.active == 1) { /* * Its really a single leaf node, but it had * out-of-line values so it looked like it *might* * have been a b-tree. Let the caller deal with this. */ error = 1; goto out; } /* * Split as many Btree elements as required. * This code tracks the new and old attr's location * in the index/blkno/rmtblkno/rmtblkcnt fields and * in the index2/blkno2/rmtblkno2/rmtblkcnt2 fields. */ error = xfs_da3_split(state); if (error) goto out; } else { /* * Addition succeeded, update Btree hashvals. */ xfs_da3_fixhashpath(state, &state->path); } out: xfs_da_state_free(state); attr->xattri_da_state = NULL; return error; } static int xfs_attr_node_removename( struct xfs_da_args *args, struct xfs_da_state *state) { struct xfs_da_state_blk *blk; int retval; /* * Remove the name and update the hashvals in the tree. */ blk = &state->path.blk[state->path.active-1]; ASSERT(blk->magic == XFS_ATTR_LEAF_MAGIC); retval = xfs_attr3_leaf_remove(blk->bp, args); xfs_da3_fixhashpath(state, &state->path); return retval; } static int xfs_attr_node_remove_attr( struct xfs_attr_intent *attr) { struct xfs_da_args *args = attr->xattri_da_args; struct xfs_da_state *state = xfs_da_state_alloc(args); int retval = 0; int error = 0; /* * The attr we are removing has already been marked incomplete, so * we need to set the filter appropriately to re-find the "old" * attribute entry after any split ops. */ args->attr_filter |= XFS_ATTR_INCOMPLETE; error = xfs_da3_node_lookup_int(state, &retval); if (error) goto out; error = xfs_attr_node_removename(args, state); /* * Check to see if the tree needs to be collapsed. */ if (retval && (state->path.active > 1)) { error = xfs_da3_join(state); if (error) goto out; } retval = error = 0; out: xfs_da_state_free(state); if (error) return error; return retval; } /* * Retrieve the attribute data from a node attribute list. * * This routine gets called for any attribute fork that has more than one * block, ie: both true Btree attr lists and for single-leaf-blocks with * "remote" values taking up more blocks. * * Returns 0 on successful retrieval, otherwise an error. 
*/ STATIC int xfs_attr_node_get( struct xfs_da_args *args) { struct xfs_da_state *state; struct xfs_da_state_blk *blk; int i; int error; trace_xfs_attr_node_get(args); /* * Search to see if name exists, and get back a pointer to it. */ state = xfs_da_state_alloc(args); error = xfs_attr_node_lookup(args, state); if (error != -EEXIST) goto out_release; /* * Get the value, local or "remote" */ blk = &state->path.blk[state->path.active - 1]; error = xfs_attr3_leaf_getvalue(blk->bp, args); /* * If not in a transaction, we have to release all the buffers. */ out_release: for (i = 0; i < state->path.active; i++) { xfs_trans_brelse(args->trans, state->path.blk[i].bp); state->path.blk[i].bp = NULL; } xfs_da_state_free(state); return error; } /* Enforce that there is at most one namespace bit per attr. */ inline bool xfs_attr_check_namespace(unsigned int attr_flags) { return hweight32(attr_flags & XFS_ATTR_NSP_ONDISK_MASK) < 2; } /* Returns true if the attribute entry name is valid. */ bool xfs_attr_namecheck( unsigned int attr_flags, const void *name, size_t length) { /* Only one namespace bit allowed. */ if (!xfs_attr_check_namespace(attr_flags)) return false; /* * MAXNAMELEN includes the trailing null, but (name/length) leave it * out, so use >= for the length check. */ if (length >= MAXNAMELEN) return false; /* Parent pointers have their own validation. */ if (attr_flags & XFS_ATTR_PARENT) return xfs_parent_namecheck(attr_flags, name, length); /* There shouldn't be any nulls here */ return !memchr(name, 0, length); } int __init xfs_attr_intent_init_cache(void) { xfs_attr_intent_cache = kmem_cache_create("xfs_attr_intent", sizeof(struct xfs_attr_intent), 0, 0, NULL); return xfs_attr_intent_cache != NULL ? 0 : -ENOMEM; } void xfs_attr_intent_destroy_cache(void) { kmem_cache_destroy(xfs_attr_intent_cache); xfs_attr_intent_cache = NULL; } |
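The routines above back the ordinary extended-attribute syscalls. As a point of reference, here is a hedged userspace sketch of the VFS entry points that eventually reach xfs_attr_set() and xfs_attr_get(): XATTR_CREATE corresponds to the pure-create case (fails with EEXIST if the name exists) and XATTR_REPLACE to pure replace (fails if it does not). The path and attribute name are illustrative examples only.

/*
 * Hedged userspace sketch, not kernel code.
 */
#include <stdio.h>
#include <sys/xattr.h>

int main(void)
{
	const char *path = "/mnt/xfs/testfile";	/* example path */
	char buf[64];
	ssize_t len;

	/* Fails with EEXIST if "user.comment" is already set on the file. */
	if (setxattr(path, "user.comment", "hello", 5, XATTR_CREATE) != 0)
		perror("setxattr");

	/* Reads the value back; returns -1 with ENODATA if it is missing. */
	len = getxattr(path, "user.comment", buf, sizeof(buf));
	if (len >= 0)
		printf("user.comment = %.*s\n", (int)len, buf);
	return 0;
}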
// SPDX-License-Identifier: GPL-2.0-or-later
/*
 * The "hash function" used as the core of the ChaCha stream cipher (RFC7539)
 *
 * Copyright (C) 2015 Martin Willi
 */

#include <linux/bug.h>
#include <linux/kernel.h>
#include <linux/export.h>
#include <linux/bitops.h>
#include <linux/string.h>
#include <linux/unaligned.h>
#include <crypto/chacha.h>

static void chacha_permute(u32 *x, int nrounds)
{
	int i;

	/* whitelist the allowed round counts */
	WARN_ON_ONCE(nrounds != 20 && nrounds != 12);

	for (i = 0; i < nrounds; i += 2) {
		x[0]  += x[4];    x[12] = rol32(x[12] ^ x[0],  16);
		x[1]  += x[5];    x[13] = rol32(x[13] ^ x[1],  16);
		x[2]  += x[6];    x[14] = rol32(x[14] ^ x[2],  16);
		x[3]  += x[7];    x[15] = rol32(x[15] ^ x[3],  16);

		x[8]  += x[12];   x[4]  = rol32(x[4]  ^ x[8],  12);
		x[9]  += x[13];   x[5]  = rol32(x[5]  ^ x[9],  12);
		x[10] += x[14];   x[6]  = rol32(x[6]  ^ x[10], 12);
		x[11] += x[15];   x[7]  = rol32(x[7]  ^ x[11], 12);

		x[0]  += x[4];    x[12] = rol32(x[12] ^ x[0],   8);
		x[1]  += x[5];    x[13] = rol32(x[13] ^ x[1],   8);
		x[2]  += x[6];    x[14] = rol32(x[14] ^ x[2],   8);
		x[3]  += x[7];    x[15] = rol32(x[15] ^ x[3],   8);

		x[8]  += x[12];   x[4]  = rol32(x[4]  ^ x[8],   7);
		x[9]  += x[13];   x[5]  = rol32(x[5]  ^ x[9],   7);
		x[10] += x[14];   x[6]  = rol32(x[6]  ^ x[10],  7);
		x[11] += x[15];   x[7]  = rol32(x[7]  ^ x[11],  7);

		x[0]  += x[5];    x[15] = rol32(x[15] ^ x[0],  16);
		x[1]  += x[6];    x[12] = rol32(x[12] ^ x[1],  16);
		x[2]  += x[7];    x[13] = rol32(x[13] ^ x[2],  16);
		x[3]  += x[4];    x[14] = rol32(x[14] ^ x[3],  16);

		x[10] += x[15];   x[5]  = rol32(x[5]  ^ x[10], 12);
		x[11] += x[12];   x[6]  = rol32(x[6]  ^ x[11], 12);
		x[8]  += x[13];   x[7]  = rol32(x[7]  ^ x[8],  12);
		x[9]  += x[14];   x[4]  = rol32(x[4]  ^ x[9],  12);

		x[0]  += x[5];    x[15] = rol32(x[15] ^ x[0],   8);
		x[1]  += x[6];    x[12] = rol32(x[12] ^ x[1],   8);
		x[2]  += x[7];    x[13] = rol32(x[13] ^ x[2],   8);
		x[3]  += x[4];    x[14] = rol32(x[14] ^ x[3],   8);

		x[10] += x[15];   x[5]  = rol32(x[5]  ^ x[10],  7);
		x[11] += x[12];   x[6]  = rol32(x[6]  ^ x[11],  7);
		x[8]  += x[13];   x[7]  = rol32(x[7]  ^ x[8],   7);
		x[9]  += x[14];   x[4]  = rol32(x[4]  ^ x[9],   7);
	}
}

/**
 * chacha_block_generic - generate one keystream block and increment block counter
 * @state: input state matrix (16 32-bit words)
 * @stream: output keystream block (64 bytes)
 * @nrounds: number of rounds (20 or 12; 20 is recommended)
 *
 * This is the ChaCha core, a function from 64-byte strings to 64-byte strings.
 * The caller has already converted the endianness of the input. This function
 * also handles incrementing the block counter in the input matrix.
 */
void chacha_block_generic(u32 *state, u8 *stream, int nrounds)
{
	u32 x[16];
	int i;

	memcpy(x, state, 64);

	chacha_permute(x, nrounds);

	for (i = 0; i < ARRAY_SIZE(x); i++)
		put_unaligned_le32(x[i] + state[i], &stream[i * sizeof(u32)]);

	state[12]++;
}
EXPORT_SYMBOL(chacha_block_generic);

/**
 * hchacha_block_generic - abbreviated ChaCha core, for XChaCha
 * @state: input state matrix (16 32-bit words)
 * @stream: output (8 32-bit words)
 * @nrounds: number of rounds (20 or 12; 20 is recommended)
 *
 * HChaCha is the ChaCha equivalent of HSalsa and is an intermediate step
 * towards XChaCha (see https://cr.yp.to/snuffle/xsalsa-20081128.pdf). HChaCha
 * skips the final addition of the initial state, and outputs only certain
 * words of the state. It should not be used for streaming directly.
 */
void hchacha_block_generic(const u32 *state, u32 *stream, int nrounds)
{
	u32 x[16];

	memcpy(x, state, 64);

	chacha_permute(x, nrounds);

	memcpy(&stream[0], &x[0], 16);
	memcpy(&stream[4], &x[12], 16);
}
EXPORT_SYMBOL(hchacha_block_generic);
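For readers unfamiliar with the state layout that chacha_block_generic() expects, the sketch below builds the RFC 7539 matrix by hand (the "expand 32-byte k" constants, then the key, counter, and nonce) and pulls out one keystream block. It is an illustrative sketch only: in-kernel users normally go through the higher-level ChaCha helpers in <crypto/chacha.h>, and the key and nonce here are placeholders, not real secrets.

/*
 * Illustrative sketch of the RFC 7539 state layout.
 */
static void chacha_keystream_demo(const u8 *key /* 32 bytes */,
				  const u8 *nonce /* 12 bytes */)
{
	u32 state[16];
	u8 block[CHACHA_BLOCK_SIZE];
	int i;

	/* Words 0-3: the "expand 32-byte k" constants. */
	state[0] = 0x61707865; state[1] = 0x3320646e;
	state[2] = 0x79622d32; state[3] = 0x6b206574;

	/* Words 4-11: the 256-bit key, little-endian. */
	for (i = 0; i < 8; i++)
		state[4 + i] = get_unaligned_le32(key + i * 4);

	/* Word 12: block counter; words 13-15: the 96-bit nonce. */
	state[12] = 0;
	for (i = 0; i < 3; i++)
		state[13 + i] = get_unaligned_le32(nonce + i * 4);

	/* One 64-byte keystream block; the call also bumps state[12]. */
	chacha_block_generic(state, block, 20);
}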
// SPDX-License-Identifier: GPL-2.0-or-later
/*
 *
 * Copyright (C) Jonathan Naylor G4KLX (g4klx@g4klx.demon.co.uk)
 */
#include <linux/module.h>
#include <linux/proc_fs.h>
#include <linux/kernel.h>
#include <linux/interrupt.h>
#include <linux/fs.h>
#include <linux/types.h>
#include <linux/sysctl.h>
#include <linux/string.h>
#include <linux/socket.h>
#include <linux/errno.h>
#include <linux/fcntl.h>
#include <linux/in.h>
#include <linux/if_ether.h>
#include <linux/slab.h>

#include <asm/io.h>

#include <linux/inet.h>
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <linux/if_arp.h>
#include <linux/skbuff.h>

#include <net/ip.h>
#include <net/arp.h>

#include <net/ax25.h>
#include <net/rose.h>

static int rose_header(struct sk_buff *skb, struct net_device *dev,
		       unsigned short type,
		       const void *daddr, const void *saddr, unsigned int len)
{
	unsigned char *buff = skb_push(skb, ROSE_MIN_LEN + 2);

	if (daddr)
		memcpy(buff + 7, daddr, dev->addr_len);

	*buff++ = ROSE_GFI | ROSE_Q_BIT;
	*buff++ = 0x00;
	*buff++ = ROSE_DATA;
	*buff++ = 0x7F;
	*buff++ = AX25_P_IP;

	if (daddr != NULL)
		return 37;

	return -37;
}

static int rose_set_mac_address(struct net_device *dev, void *addr)
{
	struct sockaddr *sa = addr;
	int err;

	if (!memcmp(dev->dev_addr, sa->sa_data, dev->addr_len))
		return 0;

	if (dev->flags & IFF_UP) {
		err = rose_add_loopback_node((rose_address *)sa->sa_data);
		if (err)
			return err;

		rose_del_loopback_node((const rose_address *)dev->dev_addr);
	}

	dev_addr_set(dev, sa->sa_data);

	return 0;
}

static int rose_open(struct net_device *dev)
{
	int err;

	err = rose_add_loopback_node((const rose_address *)dev->dev_addr);
	if (err)
		return err;

	netif_start_queue(dev);

	return 0;
}

static int rose_close(struct net_device *dev)
{
	netif_stop_queue(dev);
	rose_del_loopback_node((const rose_address *)dev->dev_addr);
	return 0;
}

static netdev_tx_t rose_xmit(struct sk_buff *skb, struct net_device *dev)
{
	struct net_device_stats *stats = &dev->stats;
	unsigned int len = skb->len;

	if (!netif_running(dev)) {
		printk(KERN_ERR "ROSE: rose_xmit - called when iface is down\n");
		return NETDEV_TX_BUSY;
	}

	if (!rose_route_frame(skb, NULL)) {
		dev_kfree_skb(skb);
		stats->tx_errors++;
		return NETDEV_TX_OK;
	}

	stats->tx_packets++;
	stats->tx_bytes += len;
	return NETDEV_TX_OK;
}

static const struct header_ops rose_header_ops = {
	.create	= rose_header,
};

static const struct net_device_ops rose_netdev_ops = {
	.ndo_open		= rose_open,
	.ndo_stop		= rose_close,
	.ndo_start_xmit		= rose_xmit,
	.ndo_set_mac_address	= rose_set_mac_address,
};

void rose_setup(struct net_device *dev)
{
	dev->mtu		= ROSE_MAX_PACKET_SIZE - 2;
	dev->netdev_ops		= &rose_netdev_ops;

	dev->header_ops		= &rose_header_ops;
	dev->hard_header_len	= AX25_BPQ_HEADER_LEN + AX25_MAX_HEADER_LEN + ROSE_MIN_LEN;
	dev->addr_len		= ROSE_ADDR_LEN;
	dev->type		= ARPHRD_ROSE;

	/* New-style flags. */
	dev->flags		= IFF_NOARP;
}
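rose_setup() is only a setup callback; something else has to allocate and register the device that uses it. The sketch below shows the usual alloc_netdev()/register_netdev() pattern with rose_setup() plugged in. It is a hedged illustration of that pattern, not the actual registration code in af_rose.c, and the "rose%d" name template and helper name are assumptions.

/*
 * Hedged sketch: creating a "rose0"-style device around rose_setup().
 */
static struct net_device *rose_create_dev(void)
{
	struct net_device *dev;

	/* No private data; the kernel picks the next free "roseN" name. */
	dev = alloc_netdev(0, "rose%d", NET_NAME_UNKNOWN, rose_setup);
	if (!dev)
		return NULL;

	if (register_netdev(dev)) {
		free_netdev(dev);
		return NULL;
	}
	return dev;
}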
| 36 36 5 36 36 5 5 5 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 | // SPDX-License-Identifier: GPL-2.0-or-later /* 6LoWPAN fragment reassembly * * Authors: * Alexander Aring <aar@pengutronix.de> * * Based on: net/ipv6/reassembly.c */ #define pr_fmt(fmt) "6LoWPAN: " fmt #include <linux/net.h> #include <linux/list.h> #include <linux/netdevice.h> #include <linux/random.h> #include <linux/jhash.h> #include <linux/skbuff.h> #include <linux/slab.h> #include <linux/export.h> #include <net/ieee802154_netdev.h> #include <net/6lowpan.h> #include <net/ipv6_frag.h> #include <net/inet_frag.h> #include <net/ip.h> #include "6lowpan_i.h" static const char lowpan_frags_cache_name[] = "lowpan-frags"; static struct inet_frags lowpan_frags; static int lowpan_frag_reasm(struct lowpan_frag_queue *fq, struct sk_buff *skb, struct sk_buff *prev, struct net_device *ldev); static void lowpan_frag_init(struct inet_frag_queue *q, const void *a) { const struct frag_lowpan_compare_key *key = a; BUILD_BUG_ON(sizeof(*key) > sizeof(q->key)); memcpy(&q->key, key, sizeof(*key)); } static void lowpan_frag_expire(struct timer_list *t) { struct inet_frag_queue *frag = from_timer(frag, t, timer); struct frag_queue *fq; fq = container_of(frag, struct frag_queue, q); spin_lock(&fq->q.lock); if (fq->q.flags & INET_FRAG_COMPLETE) goto out; inet_frag_kill(&fq->q); out: spin_unlock(&fq->q.lock); inet_frag_put(&fq->q); } static inline struct lowpan_frag_queue * fq_find(struct net *net, const struct lowpan_802154_cb *cb, 
const struct ieee802154_addr *src, const struct ieee802154_addr *dst) { struct netns_ieee802154_lowpan *ieee802154_lowpan = net_ieee802154_lowpan(net); struct frag_lowpan_compare_key key = {}; struct inet_frag_queue *q; key.tag = cb->d_tag; key.d_size = cb->d_size; key.src = *src; key.dst = *dst; q = inet_frag_find(ieee802154_lowpan->fqdir, &key); if (!q) return NULL; return container_of(q, struct lowpan_frag_queue, q); } static int lowpan_frag_queue(struct lowpan_frag_queue *fq, struct sk_buff *skb, u8 frag_type) { struct sk_buff *prev_tail; struct net_device *ldev; int end, offset, err; /* inet_frag_queue_* functions use skb->cb; see struct ipfrag_skb_cb * in inet_fragment.c */ BUILD_BUG_ON(sizeof(struct lowpan_802154_cb) > sizeof(struct inet_skb_parm)); BUILD_BUG_ON(sizeof(struct lowpan_802154_cb) > sizeof(struct inet6_skb_parm)); if (fq->q.flags & INET_FRAG_COMPLETE) goto err; offset = lowpan_802154_cb(skb)->d_offset << 3; end = lowpan_802154_cb(skb)->d_size; /* Is this the final fragment? */ if (offset + skb->len == end) { /* If we already have some bits beyond end * or have different end, the segment is corrupted. */ if (end < fq->q.len || ((fq->q.flags & INET_FRAG_LAST_IN) && end != fq->q.len)) goto err; fq->q.flags |= INET_FRAG_LAST_IN; fq->q.len = end; } else { if (end > fq->q.len) { /* Some bits beyond end -> corruption. */ if (fq->q.flags & INET_FRAG_LAST_IN) goto err; fq->q.len = end; } } ldev = skb->dev; if (ldev) skb->dev = NULL; barrier(); prev_tail = fq->q.fragments_tail; err = inet_frag_queue_insert(&fq->q, skb, offset, end); if (err) goto err; fq->q.stamp = skb->tstamp; fq->q.tstamp_type = skb->tstamp_type; if (frag_type == LOWPAN_DISPATCH_FRAG1) fq->q.flags |= INET_FRAG_FIRST_IN; fq->q.meat += skb->len; add_frag_mem_limit(fq->q.fqdir, skb->truesize); if (fq->q.flags == (INET_FRAG_FIRST_IN | INET_FRAG_LAST_IN) && fq->q.meat == fq->q.len) { int res; unsigned long orefdst = skb->_skb_refdst; skb->_skb_refdst = 0UL; res = lowpan_frag_reasm(fq, skb, prev_tail, ldev); skb->_skb_refdst = orefdst; return res; } skb_dst_drop(skb); return -1; err: kfree_skb(skb); return -1; } /* Check if this packet is complete. * * It is called with locked fq, and caller must check that * queue is eligible for reassembly i.e. it is not COMPLETE, * the last and the first frames arrived and all the bits are here. 
*/ static int lowpan_frag_reasm(struct lowpan_frag_queue *fq, struct sk_buff *skb, struct sk_buff *prev_tail, struct net_device *ldev) { void *reasm_data; inet_frag_kill(&fq->q); reasm_data = inet_frag_reasm_prepare(&fq->q, skb, prev_tail); if (!reasm_data) goto out_oom; inet_frag_reasm_finish(&fq->q, skb, reasm_data, false); skb->dev = ldev; skb->tstamp = fq->q.stamp; fq->q.rb_fragments = RB_ROOT; fq->q.fragments_tail = NULL; fq->q.last_run_head = NULL; return 1; out_oom: net_dbg_ratelimited("lowpan_frag_reasm: no memory for reassembly\n"); return -1; } static int lowpan_frag_rx_handlers_result(struct sk_buff *skb, lowpan_rx_result res) { switch (res) { case RX_QUEUED: return NET_RX_SUCCESS; case RX_CONTINUE: /* nobody cared about this packet */ net_warn_ratelimited("%s: received unknown dispatch\n", __func__); fallthrough; default: /* all others failure */ return NET_RX_DROP; } } static lowpan_rx_result lowpan_frag_rx_h_iphc(struct sk_buff *skb) { int ret; if (!lowpan_is_iphc(*skb_network_header(skb))) return RX_CONTINUE; ret = lowpan_iphc_decompress(skb); if (ret < 0) return RX_DROP; return RX_QUEUED; } static int lowpan_invoke_frag_rx_handlers(struct sk_buff *skb) { lowpan_rx_result res; #define CALL_RXH(rxh) \ do { \ res = rxh(skb); \ if (res != RX_CONTINUE) \ goto rxh_next; \ } while (0) /* likely at first */ CALL_RXH(lowpan_frag_rx_h_iphc); CALL_RXH(lowpan_rx_h_ipv6); rxh_next: return lowpan_frag_rx_handlers_result(skb, res); #undef CALL_RXH } #define LOWPAN_FRAG_DGRAM_SIZE_HIGH_MASK 0x07 #define LOWPAN_FRAG_DGRAM_SIZE_HIGH_SHIFT 8 static int lowpan_get_cb(struct sk_buff *skb, u8 frag_type, struct lowpan_802154_cb *cb) { bool fail; u8 high = 0, low = 0; __be16 d_tag = 0; fail = lowpan_fetch_skb(skb, &high, 1); fail |= lowpan_fetch_skb(skb, &low, 1); /* remove the dispatch value and use first three bits as high value * for the datagram size */ cb->d_size = (high & LOWPAN_FRAG_DGRAM_SIZE_HIGH_MASK) << LOWPAN_FRAG_DGRAM_SIZE_HIGH_SHIFT | low; fail |= lowpan_fetch_skb(skb, &d_tag, 2); cb->d_tag = ntohs(d_tag); if (frag_type == LOWPAN_DISPATCH_FRAGN) { fail |= lowpan_fetch_skb(skb, &cb->d_offset, 1); } else { skb_reset_network_header(skb); cb->d_offset = 0; /* check if datagram_size has ipv6hdr on FRAG1 */ fail |= cb->d_size < sizeof(struct ipv6hdr); /* check if we can dereference the dispatch value */ fail |= !skb->len; } if (unlikely(fail)) return -EIO; return 0; } int lowpan_frag_rcv(struct sk_buff *skb, u8 frag_type) { struct lowpan_frag_queue *fq; struct net *net = dev_net(skb->dev); struct lowpan_802154_cb *cb = lowpan_802154_cb(skb); struct ieee802154_hdr hdr = {}; int err; if (ieee802154_hdr_peek_addrs(skb, &hdr) < 0) goto err; err = lowpan_get_cb(skb, frag_type, cb); if (err < 0) goto err; if (frag_type == LOWPAN_DISPATCH_FRAG1) { err = lowpan_invoke_frag_rx_handlers(skb); if (err == NET_RX_DROP) goto err; } if (cb->d_size > IPV6_MIN_MTU) { net_warn_ratelimited("lowpan_frag_rcv: datagram size exceeds MTU\n"); goto err; } fq = fq_find(net, cb, &hdr.source, &hdr.dest); if (fq != NULL) { int ret; spin_lock(&fq->q.lock); ret = lowpan_frag_queue(fq, skb, frag_type); spin_unlock(&fq->q.lock); inet_frag_put(&fq->q); return ret; } err: kfree_skb(skb); return -1; } #ifdef CONFIG_SYSCTL static struct ctl_table lowpan_frags_ns_ctl_table[] = { { .procname = "6lowpanfrag_high_thresh", .maxlen = sizeof(unsigned long), .mode = 0644, .proc_handler = proc_doulongvec_minmax, }, { .procname = "6lowpanfrag_low_thresh", .maxlen = sizeof(unsigned long), .mode = 0644, .proc_handler = 
proc_doulongvec_minmax, }, { .procname = "6lowpanfrag_time", .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_jiffies, }, }; /* secret interval has been deprecated */ static int lowpan_frags_secret_interval_unused; static struct ctl_table lowpan_frags_ctl_table[] = { { .procname = "6lowpanfrag_secret_interval", .data = &lowpan_frags_secret_interval_unused, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_jiffies, }, }; static int __net_init lowpan_frags_ns_sysctl_register(struct net *net) { struct ctl_table *table; struct ctl_table_header *hdr; struct netns_ieee802154_lowpan *ieee802154_lowpan = net_ieee802154_lowpan(net); size_t table_size = ARRAY_SIZE(lowpan_frags_ns_ctl_table); table = lowpan_frags_ns_ctl_table; if (!net_eq(net, &init_net)) { table = kmemdup(table, sizeof(lowpan_frags_ns_ctl_table), GFP_KERNEL); if (table == NULL) goto err_alloc; /* Don't export sysctls to unprivileged users */ if (net->user_ns != &init_user_ns) table_size = 0; } table[0].data = &ieee802154_lowpan->fqdir->high_thresh; table[0].extra1 = &ieee802154_lowpan->fqdir->low_thresh; table[1].data = &ieee802154_lowpan->fqdir->low_thresh; table[1].extra2 = &ieee802154_lowpan->fqdir->high_thresh; table[2].data = &ieee802154_lowpan->fqdir->timeout; hdr = register_net_sysctl_sz(net, "net/ieee802154/6lowpan", table, table_size); if (hdr == NULL) goto err_reg; ieee802154_lowpan->sysctl.frags_hdr = hdr; return 0; err_reg: if (!net_eq(net, &init_net)) kfree(table); err_alloc: return -ENOMEM; } static void __net_exit lowpan_frags_ns_sysctl_unregister(struct net *net) { const struct ctl_table *table; struct netns_ieee802154_lowpan *ieee802154_lowpan = net_ieee802154_lowpan(net); table = ieee802154_lowpan->sysctl.frags_hdr->ctl_table_arg; unregister_net_sysctl_table(ieee802154_lowpan->sysctl.frags_hdr); if (!net_eq(net, &init_net)) kfree(table); } static struct ctl_table_header *lowpan_ctl_header; static int __init lowpan_frags_sysctl_register(void) { lowpan_ctl_header = register_net_sysctl(&init_net, "net/ieee802154/6lowpan", lowpan_frags_ctl_table); return lowpan_ctl_header == NULL ? 
-ENOMEM : 0; } static void lowpan_frags_sysctl_unregister(void) { unregister_net_sysctl_table(lowpan_ctl_header); } #else static inline int lowpan_frags_ns_sysctl_register(struct net *net) { return 0; } static inline void lowpan_frags_ns_sysctl_unregister(struct net *net) { } static inline int __init lowpan_frags_sysctl_register(void) { return 0; } static inline void lowpan_frags_sysctl_unregister(void) { } #endif static int __net_init lowpan_frags_init_net(struct net *net) { struct netns_ieee802154_lowpan *ieee802154_lowpan = net_ieee802154_lowpan(net); int res; res = fqdir_init(&ieee802154_lowpan->fqdir, &lowpan_frags, net); if (res < 0) return res; ieee802154_lowpan->fqdir->high_thresh = IPV6_FRAG_HIGH_THRESH; ieee802154_lowpan->fqdir->low_thresh = IPV6_FRAG_LOW_THRESH; ieee802154_lowpan->fqdir->timeout = IPV6_FRAG_TIMEOUT; res = lowpan_frags_ns_sysctl_register(net); if (res < 0) fqdir_exit(ieee802154_lowpan->fqdir); return res; } static void __net_exit lowpan_frags_pre_exit_net(struct net *net) { struct netns_ieee802154_lowpan *ieee802154_lowpan = net_ieee802154_lowpan(net); fqdir_pre_exit(ieee802154_lowpan->fqdir); } static void __net_exit lowpan_frags_exit_net(struct net *net) { struct netns_ieee802154_lowpan *ieee802154_lowpan = net_ieee802154_lowpan(net); lowpan_frags_ns_sysctl_unregister(net); fqdir_exit(ieee802154_lowpan->fqdir); } static struct pernet_operations lowpan_frags_ops = { .init = lowpan_frags_init_net, .pre_exit = lowpan_frags_pre_exit_net, .exit = lowpan_frags_exit_net, }; static u32 lowpan_key_hashfn(const void *data, u32 len, u32 seed) { return jhash2(data, sizeof(struct frag_lowpan_compare_key) / sizeof(u32), seed); } static u32 lowpan_obj_hashfn(const void *data, u32 len, u32 seed) { const struct inet_frag_queue *fq = data; return jhash2((const u32 *)&fq->key, sizeof(struct frag_lowpan_compare_key) / sizeof(u32), seed); } static int lowpan_obj_cmpfn(struct rhashtable_compare_arg *arg, const void *ptr) { const struct frag_lowpan_compare_key *key = arg->key; const struct inet_frag_queue *fq = ptr; return !!memcmp(&fq->key, key, sizeof(*key)); } static const struct rhashtable_params lowpan_rhash_params = { .head_offset = offsetof(struct inet_frag_queue, node), .hashfn = lowpan_key_hashfn, .obj_hashfn = lowpan_obj_hashfn, .obj_cmpfn = lowpan_obj_cmpfn, .automatic_shrinking = true, }; int __init lowpan_net_frag_init(void) { int ret; lowpan_frags.constructor = lowpan_frag_init; lowpan_frags.destructor = NULL; lowpan_frags.qsize = sizeof(struct frag_queue); lowpan_frags.frag_expire = lowpan_frag_expire; lowpan_frags.frags_cache_name = lowpan_frags_cache_name; lowpan_frags.rhash_params = lowpan_rhash_params; ret = inet_frags_init(&lowpan_frags); if (ret) goto out; ret = lowpan_frags_sysctl_register(); if (ret) goto err_sysctl; ret = register_pernet_subsys(&lowpan_frags_ops); if (ret) goto err_pernet; out: return ret; err_pernet: lowpan_frags_sysctl_unregister(); err_sysctl: inet_frags_fini(&lowpan_frags); return ret; } void lowpan_net_frag_exit(void) { lowpan_frags_sysctl_unregister(); unregister_pernet_subsys(&lowpan_frags_ops); inet_frags_fini(&lowpan_frags); } |
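/*
 * Illustrative sketch (not part of the reassembly code above): lowpan_get_cb()
 * rebuilds the 11-bit datagram size from the first two header bytes (the low
 * three bits of the dispatch byte become the high bits, per
 * LOWPAN_FRAG_DGRAM_SIZE_HIGH_MASK/SHIFT), reads a big-endian 16-bit tag, and
 * for FRAGN an extra offset byte counted in 8-octet units (hence the
 * "d_offset << 3" in lowpan_frag_queue()).  The standalone parser below shows
 * that bit layout; all demo_* names are invented for the example.
 */
#include <stdint.h>
#include <stdio.h>

#define DGRAM_SIZE_HIGH_MASK	0x07	/* as LOWPAN_FRAG_DGRAM_SIZE_HIGH_MASK */
#define DGRAM_SIZE_HIGH_SHIFT	8	/* as LOWPAN_FRAG_DGRAM_SIZE_HIGH_SHIFT */

struct demo_frag_cb {
	uint16_t d_size;	/* total datagram size, 11 bits */
	uint16_t d_tag;		/* datagram tag */
	uint8_t  d_offset;	/* offset in 8-octet units (FRAGN only) */
};

/*
 * Parse a FRAG1 (4-byte) or FRAGN (5-byte) fragmentation header.
 * Returns the number of header bytes consumed.
 */
static int demo_parse_frag(const uint8_t *hdr, int is_fragn,
			   struct demo_frag_cb *cb)
{
	cb->d_size = (hdr[0] & DGRAM_SIZE_HIGH_MASK) << DGRAM_SIZE_HIGH_SHIFT
		     | hdr[1];
	cb->d_tag = (uint16_t)(hdr[2] << 8 | hdr[3]);	/* big-endian tag */
	cb->d_offset = is_fragn ? hdr[4] : 0;

	return is_fragn ? 5 : 4;
}

int main(void)
{
	/* FRAG1: dispatch bits plus size 0x4d2 (1234), tag 0xbeef. */
	const uint8_t frag1[] = { 0xc4, 0xd2, 0xbe, 0xef };
	/* FRAGN: same datagram, offset byte 0x0c -> byte offset 96. */
	const uint8_t fragn[] = { 0xe4, 0xd2, 0xbe, 0xef, 0x0c };
	struct demo_frag_cb cb;

	demo_parse_frag(frag1, 0, &cb);
	printf("FRAG1: size=%u tag=0x%04x\n",
	       (unsigned)cb.d_size, (unsigned)cb.d_tag);

	demo_parse_frag(fragn, 1, &cb);
	printf("FRAGN: size=%u tag=0x%04x byte-offset=%u\n",
	       (unsigned)cb.d_size, (unsigned)cb.d_tag,
	       (unsigned)cb.d_offset << 3);
	return 0;
}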
| 296 848 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 | /* SPDX-License-Identifier: GPL-2.0-only */ /* * Copyright (C) 2005,2006,2007,2008 IBM Corporation * * Authors: * Reiner Sailer <sailer@watson.ibm.com> * Mimi Zohar <zohar@us.ibm.com> * * File: ima.h * internal Integrity Measurement Architecture (IMA) definitions */ #ifndef __LINUX_IMA_H #define __LINUX_IMA_H #include <linux/types.h> #include <linux/crypto.h> #include <linux/fs.h> #include <linux/security.h> #include <linux/hash.h> #include <linux/tpm.h> #include <linux/audit.h> #include <crypto/hash_info.h> #include "../integrity.h" enum ima_show_type { IMA_SHOW_BINARY, IMA_SHOW_BINARY_NO_FIELD_LEN, IMA_SHOW_BINARY_OLD_STRING_FMT, IMA_SHOW_ASCII }; enum tpm_pcrs { TPM_PCR0 = 0, TPM_PCR8 = 8, TPM_PCR10 = 10 }; /* digest size for IMA, fits SHA1 or MD5 */ #define IMA_DIGEST_SIZE SHA1_DIGEST_SIZE #define IMA_EVENT_NAME_LEN_MAX 255 #define IMA_HASH_BITS 10 #define IMA_MEASURE_HTABLE_SIZE (1 << IMA_HASH_BITS) #define IMA_TEMPLATE_FIELD_ID_MAX_LEN 16 #define IMA_TEMPLATE_NUM_FIELDS_MAX 15 #define IMA_TEMPLATE_IMA_NAME "ima" #define IMA_TEMPLATE_IMA_FMT "d|n" #define NR_BANKS(chip) ((chip != NULL) ? 
chip->nr_allocated_banks : 0) /* current content of the policy */ extern int ima_policy_flag; /* bitset of digests algorithms allowed in the setxattr hook */ extern atomic_t ima_setxattr_allowed_hash_algorithms; /* IMA hash algorithm description */ struct ima_algo_desc { struct crypto_shash *tfm; enum hash_algo algo; }; /* set during initialization */ extern int ima_hash_algo __ro_after_init; extern int ima_sha1_idx __ro_after_init; extern int ima_hash_algo_idx __ro_after_init; extern int ima_extra_slots __ro_after_init; extern struct ima_algo_desc *ima_algo_array __ro_after_init; extern int ima_appraise; extern struct tpm_chip *ima_tpm_chip; extern const char boot_aggregate_name[]; /* IMA event related data */ struct ima_event_data { struct ima_iint_cache *iint; struct file *file; const unsigned char *filename; struct evm_ima_xattr_data *xattr_value; int xattr_len; const struct modsig *modsig; const char *violation; const void *buf; int buf_len; }; /* IMA template field data definition */ struct ima_field_data { u8 *data; u32 len; }; /* IMA template field definition */ struct ima_template_field { const char field_id[IMA_TEMPLATE_FIELD_ID_MAX_LEN]; int (*field_init)(struct ima_event_data *event_data, struct ima_field_data *field_data); void (*field_show)(struct seq_file *m, enum ima_show_type show, struct ima_field_data *field_data); }; /* IMA template descriptor definition */ struct ima_template_desc { struct list_head list; char *name; char *fmt; int num_fields; const struct ima_template_field **fields; }; struct ima_template_entry { int pcr; struct tpm_digest *digests; struct ima_template_desc *template_desc; /* template descriptor */ u32 template_data_len; struct ima_field_data template_data[]; /* template related data */ }; struct ima_queue_entry { struct hlist_node hnext; /* place in hash collision list */ struct list_head later; /* place in ima_measurements list */ struct ima_template_entry *entry; }; extern struct list_head ima_measurements; /* list of all measurements */ /* Some details preceding the binary serialized measurement list */ struct ima_kexec_hdr { u16 version; u16 _reserved0; u32 _reserved1; u64 buffer_size; u64 count; }; /* IMA iint action cache flags */ #define IMA_MEASURE 0x00000001 #define IMA_MEASURED 0x00000002 #define IMA_APPRAISE 0x00000004 #define IMA_APPRAISED 0x00000008 /*#define IMA_COLLECT 0x00000010 do not use this flag */ #define IMA_COLLECTED 0x00000020 #define IMA_AUDIT 0x00000040 #define IMA_AUDITED 0x00000080 #define IMA_HASH 0x00000100 #define IMA_HASHED 0x00000200 /* IMA iint policy rule cache flags */ #define IMA_NONACTION_FLAGS 0xff000000 #define IMA_DIGSIG_REQUIRED 0x01000000 #define IMA_PERMIT_DIRECTIO 0x02000000 #define IMA_NEW_FILE 0x04000000 #define IMA_FAIL_UNVERIFIABLE_SIGS 0x10000000 #define IMA_MODSIG_ALLOWED 0x20000000 #define IMA_CHECK_BLACKLIST 0x40000000 #define IMA_VERITY_REQUIRED 0x80000000 #define IMA_DO_MASK (IMA_MEASURE | IMA_APPRAISE | IMA_AUDIT | \ IMA_HASH | IMA_APPRAISE_SUBMASK) #define IMA_DONE_MASK (IMA_MEASURED | IMA_APPRAISED | IMA_AUDITED | \ IMA_HASHED | IMA_COLLECTED | \ IMA_APPRAISED_SUBMASK) /* IMA iint subaction appraise cache flags */ #define IMA_FILE_APPRAISE 0x00001000 #define IMA_FILE_APPRAISED 0x00002000 #define IMA_MMAP_APPRAISE 0x00004000 #define IMA_MMAP_APPRAISED 0x00008000 #define IMA_BPRM_APPRAISE 0x00010000 #define IMA_BPRM_APPRAISED 0x00020000 #define IMA_READ_APPRAISE 0x00040000 #define IMA_READ_APPRAISED 0x00080000 #define IMA_CREDS_APPRAISE 0x00100000 #define IMA_CREDS_APPRAISED 0x00200000 #define 
IMA_APPRAISE_SUBMASK (IMA_FILE_APPRAISE | IMA_MMAP_APPRAISE | \ IMA_BPRM_APPRAISE | IMA_READ_APPRAISE | \ IMA_CREDS_APPRAISE) #define IMA_APPRAISED_SUBMASK (IMA_FILE_APPRAISED | IMA_MMAP_APPRAISED | \ IMA_BPRM_APPRAISED | IMA_READ_APPRAISED | \ IMA_CREDS_APPRAISED) /* IMA iint cache atomic_flags */ #define IMA_CHANGE_XATTR 0 #define IMA_UPDATE_XATTR 1 #define IMA_CHANGE_ATTR 2 #define IMA_DIGSIG 3 #define IMA_MUST_MEASURE 4 /* IMA integrity metadata associated with an inode */ struct ima_iint_cache { struct mutex mutex; /* protects: version, flags, digest */ struct integrity_inode_attributes real_inode; unsigned long flags; unsigned long measured_pcrs; unsigned long atomic_flags; enum integrity_status ima_file_status:4; enum integrity_status ima_mmap_status:4; enum integrity_status ima_bprm_status:4; enum integrity_status ima_read_status:4; enum integrity_status ima_creds_status:4; struct ima_digest_data *ima_hash; }; extern struct lsm_blob_sizes ima_blob_sizes; static inline struct ima_iint_cache * ima_inode_get_iint(const struct inode *inode) { struct ima_iint_cache **iint_sec; if (unlikely(!inode->i_security)) return NULL; iint_sec = inode->i_security + ima_blob_sizes.lbs_inode; return *iint_sec; } static inline void ima_inode_set_iint(const struct inode *inode, struct ima_iint_cache *iint) { struct ima_iint_cache **iint_sec; if (unlikely(!inode->i_security)) return; iint_sec = inode->i_security + ima_blob_sizes.lbs_inode; *iint_sec = iint; } struct ima_iint_cache *ima_iint_find(struct inode *inode); struct ima_iint_cache *ima_inode_get(struct inode *inode); void ima_inode_free_rcu(void *inode_security); void __init ima_iintcache_init(void); extern const int read_idmap[]; #ifdef CONFIG_HAVE_IMA_KEXEC void ima_load_kexec_buffer(void); #else static inline void ima_load_kexec_buffer(void) {} #endif /* CONFIG_HAVE_IMA_KEXEC */ #ifdef CONFIG_IMA_MEASURE_ASYMMETRIC_KEYS void ima_post_key_create_or_update(struct key *keyring, struct key *key, const void *payload, size_t plen, unsigned long flags, bool create); #endif /* * The default binary_runtime_measurements list format is defined as the * platform native format. The canonical format is defined as little-endian. 
*/ extern bool ima_canonical_fmt; /* Internal IMA function definitions */ int ima_init(void); int ima_fs_init(void); int ima_add_template_entry(struct ima_template_entry *entry, int violation, const char *op, struct inode *inode, const unsigned char *filename); int ima_calc_file_hash(struct file *file, struct ima_digest_data *hash); int ima_calc_buffer_hash(const void *buf, loff_t len, struct ima_digest_data *hash); int ima_calc_field_array_hash(struct ima_field_data *field_data, struct ima_template_entry *entry); int ima_calc_boot_aggregate(struct ima_digest_data *hash); void ima_add_violation(struct file *file, const unsigned char *filename, struct ima_iint_cache *iint, const char *op, const char *cause); int ima_init_crypto(void); void ima_putc(struct seq_file *m, void *data, int datalen); void ima_print_digest(struct seq_file *m, u8 *digest, u32 size); int template_desc_init_fields(const char *template_fmt, const struct ima_template_field ***fields, int *num_fields); struct ima_template_desc *ima_template_desc_current(void); struct ima_template_desc *ima_template_desc_buf(void); struct ima_template_desc *lookup_template_desc(const char *name); bool ima_template_has_modsig(const struct ima_template_desc *ima_template); int ima_restore_measurement_entry(struct ima_template_entry *entry); int ima_restore_measurement_list(loff_t bufsize, void *buf); int ima_measurements_show(struct seq_file *m, void *v); unsigned long ima_get_binary_runtime_size(void); int ima_init_template(void); void ima_init_template_list(void); int __init ima_init_digests(void); void __init ima_init_reboot_notifier(void); int ima_lsm_policy_change(struct notifier_block *nb, unsigned long event, void *lsm_data); /* * used to protect h_table and sha_table */ extern spinlock_t ima_queue_lock; struct ima_h_table { atomic_long_t len; /* number of stored measurements in the list */ atomic_long_t violations; struct hlist_head queue[IMA_MEASURE_HTABLE_SIZE]; }; extern struct ima_h_table ima_htable; static inline unsigned int ima_hash_key(u8 *digest) { /* there is no point in taking a hash of part of a digest */ return (digest[0] | digest[1] << 8) % IMA_MEASURE_HTABLE_SIZE; } #define __ima_hooks(hook) \ hook(NONE, none) \ hook(FILE_CHECK, file) \ hook(MMAP_CHECK, mmap) \ hook(MMAP_CHECK_REQPROT, mmap_reqprot) \ hook(BPRM_CHECK, bprm) \ hook(CREDS_CHECK, creds) \ hook(POST_SETATTR, post_setattr) \ hook(MODULE_CHECK, module) \ hook(FIRMWARE_CHECK, firmware) \ hook(KEXEC_KERNEL_CHECK, kexec_kernel) \ hook(KEXEC_INITRAMFS_CHECK, kexec_initramfs) \ hook(POLICY_CHECK, policy) \ hook(KEXEC_CMDLINE, kexec_cmdline) \ hook(KEY_CHECK, key) \ hook(CRITICAL_DATA, critical_data) \ hook(SETXATTR_CHECK, setxattr_check) \ hook(MAX_CHECK, none) #define __ima_hook_enumify(ENUM, str) ENUM, #define __ima_stringify(arg) (#arg) #define __ima_hook_measuring_stringify(ENUM, str) \ (__ima_stringify(measuring_ ##str)), enum ima_hooks { __ima_hooks(__ima_hook_enumify) }; static const char * const ima_hooks_measure_str[] = { __ima_hooks(__ima_hook_measuring_stringify) }; static inline const char *func_measure_str(enum ima_hooks func) { if (func >= MAX_CHECK) return ima_hooks_measure_str[NONE]; return ima_hooks_measure_str[func]; } extern const char *const func_tokens[]; struct modsig; #ifdef CONFIG_IMA_QUEUE_EARLY_BOOT_KEYS /* * To track keys that need to be measured. 
*/ struct ima_key_entry { struct list_head list; void *payload; size_t payload_len; char *keyring_name; }; void ima_init_key_queue(void); bool ima_should_queue_key(void); bool ima_queue_key(struct key *keyring, const void *payload, size_t payload_len); void ima_process_queued_keys(void); #else static inline void ima_init_key_queue(void) {} static inline bool ima_should_queue_key(void) { return false; } static inline bool ima_queue_key(struct key *keyring, const void *payload, size_t payload_len) { return false; } static inline void ima_process_queued_keys(void) {} #endif /* CONFIG_IMA_QUEUE_EARLY_BOOT_KEYS */ /* LIM API function definitions */ int ima_get_action(struct mnt_idmap *idmap, struct inode *inode, const struct cred *cred, struct lsm_prop *prop, int mask, enum ima_hooks func, int *pcr, struct ima_template_desc **template_desc, const char *func_data, unsigned int *allowed_algos); int ima_must_measure(struct inode *inode, int mask, enum ima_hooks func); int ima_collect_measurement(struct ima_iint_cache *iint, struct file *file, void *buf, loff_t size, enum hash_algo algo, struct modsig *modsig); void ima_store_measurement(struct ima_iint_cache *iint, struct file *file, const unsigned char *filename, struct evm_ima_xattr_data *xattr_value, int xattr_len, const struct modsig *modsig, int pcr, struct ima_template_desc *template_desc); int process_buffer_measurement(struct mnt_idmap *idmap, struct inode *inode, const void *buf, int size, const char *eventname, enum ima_hooks func, int pcr, const char *func_data, bool buf_hash, u8 *digest, size_t digest_len); void ima_audit_measurement(struct ima_iint_cache *iint, const unsigned char *filename); int ima_alloc_init_template(struct ima_event_data *event_data, struct ima_template_entry **entry, struct ima_template_desc *template_desc); int ima_store_template(struct ima_template_entry *entry, int violation, struct inode *inode, const unsigned char *filename, int pcr); void ima_free_template_entry(struct ima_template_entry *entry); const char *ima_d_path(const struct path *path, char **pathbuf, char *filename); /* IMA policy related functions */ int ima_match_policy(struct mnt_idmap *idmap, struct inode *inode, const struct cred *cred, struct lsm_prop *prop, enum ima_hooks func, int mask, int flags, int *pcr, struct ima_template_desc **template_desc, const char *func_data, unsigned int *allowed_algos); void ima_init_policy(void); void ima_update_policy(void); void ima_update_policy_flags(void); ssize_t ima_parse_add_rule(char *); void ima_delete_rules(void); int ima_check_policy(void); void *ima_policy_start(struct seq_file *m, loff_t *pos); void *ima_policy_next(struct seq_file *m, void *v, loff_t *pos); void ima_policy_stop(struct seq_file *m, void *v); int ima_policy_show(struct seq_file *m, void *v); /* Appraise integrity measurements */ #define IMA_APPRAISE_ENFORCE 0x01 #define IMA_APPRAISE_FIX 0x02 #define IMA_APPRAISE_LOG 0x04 #define IMA_APPRAISE_MODULES 0x08 #define IMA_APPRAISE_FIRMWARE 0x10 #define IMA_APPRAISE_POLICY 0x20 #define IMA_APPRAISE_KEXEC 0x40 #ifdef CONFIG_IMA_APPRAISE int ima_check_blacklist(struct ima_iint_cache *iint, const struct modsig *modsig, int pcr); int ima_appraise_measurement(enum ima_hooks func, struct ima_iint_cache *iint, struct file *file, const unsigned char *filename, struct evm_ima_xattr_data *xattr_value, int xattr_len, const struct modsig *modsig); int ima_must_appraise(struct mnt_idmap *idmap, struct inode *inode, int mask, enum ima_hooks func); void ima_update_xattr(struct ima_iint_cache *iint, 
struct file *file); enum integrity_status ima_get_cache_status(struct ima_iint_cache *iint, enum ima_hooks func); enum hash_algo ima_get_hash_algo(const struct evm_ima_xattr_data *xattr_value, int xattr_len); int ima_read_xattr(struct dentry *dentry, struct evm_ima_xattr_data **xattr_value, int xattr_len); void __init init_ima_appraise_lsm(const struct lsm_id *lsmid); #else static inline int ima_check_blacklist(struct ima_iint_cache *iint, const struct modsig *modsig, int pcr) { return 0; } static inline int ima_appraise_measurement(enum ima_hooks func, struct ima_iint_cache *iint, struct file *file, const unsigned char *filename, struct evm_ima_xattr_data *xattr_value, int xattr_len, const struct modsig *modsig) { return INTEGRITY_UNKNOWN; } static inline int ima_must_appraise(struct mnt_idmap *idmap, struct inode *inode, int mask, enum ima_hooks func) { return 0; } static inline void ima_update_xattr(struct ima_iint_cache *iint, struct file *file) { } static inline enum integrity_status ima_get_cache_status(struct ima_iint_cache *iint, enum ima_hooks func) { return INTEGRITY_UNKNOWN; } static inline enum hash_algo ima_get_hash_algo(struct evm_ima_xattr_data *xattr_value, int xattr_len) { return ima_hash_algo; } static inline int ima_read_xattr(struct dentry *dentry, struct evm_ima_xattr_data **xattr_value, int xattr_len) { return 0; } static inline void __init init_ima_appraise_lsm(const struct lsm_id *lsmid) { } #endif /* CONFIG_IMA_APPRAISE */ #ifdef CONFIG_IMA_APPRAISE_MODSIG int ima_read_modsig(enum ima_hooks func, const void *buf, loff_t buf_len, struct modsig **modsig); void ima_collect_modsig(struct modsig *modsig, const void *buf, loff_t size); int ima_get_modsig_digest(const struct modsig *modsig, enum hash_algo *algo, const u8 **digest, u32 *digest_size); int ima_get_raw_modsig(const struct modsig *modsig, const void **data, u32 *data_len); void ima_free_modsig(struct modsig *modsig); #else static inline int ima_read_modsig(enum ima_hooks func, const void *buf, loff_t buf_len, struct modsig **modsig) { return -EOPNOTSUPP; } static inline void ima_collect_modsig(struct modsig *modsig, const void *buf, loff_t size) { } static inline int ima_get_modsig_digest(const struct modsig *modsig, enum hash_algo *algo, const u8 **digest, u32 *digest_size) { return -EOPNOTSUPP; } static inline int ima_get_raw_modsig(const struct modsig *modsig, const void **data, u32 *data_len) { return -EOPNOTSUPP; } static inline void ima_free_modsig(struct modsig *modsig) { } #endif /* CONFIG_IMA_APPRAISE_MODSIG */ /* LSM based policy rules require audit */ #ifdef CONFIG_IMA_LSM_RULES #define ima_filter_rule_init security_audit_rule_init #define ima_filter_rule_free security_audit_rule_free #define ima_filter_rule_match security_audit_rule_match #else static inline int ima_filter_rule_init(u32 field, u32 op, char *rulestr, void **lsmrule, gfp_t gfp) { return -EINVAL; } static inline void ima_filter_rule_free(void *lsmrule) { } static inline int ima_filter_rule_match(struct lsm_prop *prop, u32 field, u32 op, void *lsmrule) { return -EINVAL; } #endif /* CONFIG_IMA_LSM_RULES */ #ifdef CONFIG_IMA_READ_POLICY #define POLICY_FILE_FLAGS (S_IWUSR | S_IRUSR) #else #define POLICY_FILE_FLAGS S_IWUSR #endif /* CONFIG_IMA_READ_POLICY */ #endif /* __LINUX_IMA_H */ |
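/*
 * Illustrative sketch (not part of ima.h above): ima_hash_key() folds only the
 * first two digest bytes into a bucket index modulo IMA_MEASURE_HTABLE_SIZE
 * (1 << IMA_HASH_BITS = 1024), relying on the digest already being uniformly
 * distributed.  Minimal userspace version of that bucket computation; the
 * demo_* names and the sample digest bytes are made up.
 */
#include <stdint.h>
#include <stdio.h>

#define DEMO_HASH_BITS		10			/* IMA_HASH_BITS */
#define DEMO_HTABLE_SIZE	(1 << DEMO_HASH_BITS)	/* IMA_MEASURE_HTABLE_SIZE */

/* Same bucket computation as ima_hash_key(): low 16 bits of the digest. */
static unsigned int demo_hash_key(const uint8_t *digest)
{
	return (digest[0] | digest[1] << 8) % DEMO_HTABLE_SIZE;
}

int main(void)
{
	/* First bytes of a made-up SHA-1 file digest. */
	const uint8_t digest[20] = { 0xde, 0xad, 0xbe, 0xef };

	printf("bucket = %u of %d\n", demo_hash_key(digest), DEMO_HTABLE_SIZE);
	return 0;
}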
| 3 3 8 8 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 | /* * Copyright (C) 2015 Red Hat, Inc. * All Rights Reserved. * * Permission is hereby granted, free of charge, to any person obtaining * a copy of this software and associated documentation files (the * "Software"), to deal in the Software without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Software, and to * permit persons to whom the Software is furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice (including the * next paragraph) shall be included in all copies or substantial * portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE * LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION * OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION * WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include <linux/virtio.h> #include <linux/virtio_config.h> #include <linux/virtio_ring.h> #include <drm/drm_file.h> #include <drm/drm_managed.h> #include "virtgpu_drv.h" static void virtio_gpu_config_changed_work_func(struct work_struct *work) { struct virtio_gpu_device *vgdev = container_of(work, struct virtio_gpu_device, config_changed_work); u32 events_read, events_clear = 0; /* read the config space */ virtio_cread_le(vgdev->vdev, struct virtio_gpu_config, events_read, &events_read); if (events_read & VIRTIO_GPU_EVENT_DISPLAY) { if (vgdev->num_scanouts) { if (vgdev->has_edid) virtio_gpu_cmd_get_edids(vgdev); virtio_gpu_cmd_get_display_info(vgdev); virtio_gpu_notify(vgdev); drm_helper_hpd_irq_event(vgdev->ddev); } events_clear |= VIRTIO_GPU_EVENT_DISPLAY; } virtio_cwrite_le(vgdev->vdev, struct virtio_gpu_config, events_clear, &events_clear); } static void virtio_gpu_init_vq(struct virtio_gpu_queue *vgvq, void (*work_func)(struct work_struct *work)) { spin_lock_init(&vgvq->qlock); init_waitqueue_head(&vgvq->ack_queue); INIT_WORK(&vgvq->dequeue_work, work_func); } static void virtio_gpu_get_capsets(struct virtio_gpu_device *vgdev, int num_capsets) { int i, ret; bool invalid_capset_id = false; struct drm_device *drm = vgdev->ddev; vgdev->capsets = drmm_kcalloc(drm, num_capsets, sizeof(struct virtio_gpu_drv_capset), GFP_KERNEL); if (!vgdev->capsets) { DRM_ERROR("failed to allocate cap sets\n"); return; } for (i = 0; i < num_capsets; i++) { virtio_gpu_cmd_get_capset_info(vgdev, i); virtio_gpu_notify(vgdev); ret = wait_event_timeout(vgdev->resp_wq, vgdev->capsets[i].id > 0, 5 * HZ); /* * Capability ids are defined in the virtio-gpu spec and are * between 1 to 63, inclusive. */ if (!vgdev->capsets[i].id || vgdev->capsets[i].id > MAX_CAPSET_ID) invalid_capset_id = true; if (ret == 0) DRM_ERROR("timed out waiting for cap set %d\n", i); else if (invalid_capset_id) DRM_ERROR("invalid capset id %u", vgdev->capsets[i].id); if (ret == 0 || invalid_capset_id) { spin_lock(&vgdev->display_info_lock); drmm_kfree(drm, vgdev->capsets); vgdev->capsets = NULL; spin_unlock(&vgdev->display_info_lock); return; } vgdev->capset_id_mask |= 1 << vgdev->capsets[i].id; DRM_INFO("cap set %d: id %d, max-version %d, max-size %d\n", i, vgdev->capsets[i].id, vgdev->capsets[i].max_version, vgdev->capsets[i].max_size); } vgdev->num_capsets = num_capsets; } int virtio_gpu_init(struct virtio_device *vdev, struct drm_device *dev) { struct virtqueue_info vqs_info[] = { { "control", virtio_gpu_ctrl_ack }, { "cursor", virtio_gpu_cursor_ack }, }; struct virtio_gpu_device *vgdev; /* this will expand later */ struct virtqueue *vqs[2]; u32 num_scanouts, num_capsets; int ret = 0; if (!virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) return -ENODEV; vgdev = drmm_kzalloc(dev, sizeof(struct virtio_gpu_device), GFP_KERNEL); if (!vgdev) return -ENOMEM; vgdev->ddev = dev; dev->dev_private = vgdev; vgdev->vdev = vdev; spin_lock_init(&vgdev->display_info_lock); spin_lock_init(&vgdev->resource_export_lock); spin_lock_init(&vgdev->host_visible_lock); ida_init(&vgdev->ctx_id_ida); ida_init(&vgdev->resource_ida); init_waitqueue_head(&vgdev->resp_wq); virtio_gpu_init_vq(&vgdev->ctrlq, virtio_gpu_dequeue_ctrl_func); virtio_gpu_init_vq(&vgdev->cursorq, virtio_gpu_dequeue_cursor_func); vgdev->fence_drv.context = dma_fence_context_alloc(1); spin_lock_init(&vgdev->fence_drv.lock); INIT_LIST_HEAD(&vgdev->fence_drv.fences); INIT_LIST_HEAD(&vgdev->cap_cache); INIT_WORK(&vgdev->config_changed_work, virtio_gpu_config_changed_work_func); 
INIT_WORK(&vgdev->obj_free_work, virtio_gpu_array_put_free_work); INIT_LIST_HEAD(&vgdev->obj_free_list); spin_lock_init(&vgdev->obj_free_lock); #ifdef __LITTLE_ENDIAN if (virtio_has_feature(vgdev->vdev, VIRTIO_GPU_F_VIRGL)) vgdev->has_virgl_3d = true; #endif if (virtio_has_feature(vgdev->vdev, VIRTIO_GPU_F_EDID)) { vgdev->has_edid = true; } if (virtio_has_feature(vgdev->vdev, VIRTIO_RING_F_INDIRECT_DESC)) { vgdev->has_indirect = true; } if (virtio_has_feature(vgdev->vdev, VIRTIO_GPU_F_RESOURCE_UUID)) { vgdev->has_resource_assign_uuid = true; } if (virtio_has_feature(vgdev->vdev, VIRTIO_GPU_F_RESOURCE_BLOB)) { vgdev->has_resource_blob = true; } if (virtio_get_shm_region(vgdev->vdev, &vgdev->host_visible_region, VIRTIO_GPU_SHM_ID_HOST_VISIBLE)) { if (!devm_request_mem_region(&vgdev->vdev->dev, vgdev->host_visible_region.addr, vgdev->host_visible_region.len, dev_name(&vgdev->vdev->dev))) { DRM_ERROR("Could not reserve host visible region\n"); ret = -EBUSY; goto err_vqs; } DRM_INFO("Host memory window: 0x%lx +0x%lx\n", (unsigned long)vgdev->host_visible_region.addr, (unsigned long)vgdev->host_visible_region.len); vgdev->has_host_visible = true; drm_mm_init(&vgdev->host_visible_mm, (unsigned long)vgdev->host_visible_region.addr, (unsigned long)vgdev->host_visible_region.len); } if (virtio_has_feature(vgdev->vdev, VIRTIO_GPU_F_CONTEXT_INIT)) { vgdev->has_context_init = true; } DRM_INFO("features: %cvirgl %cedid %cresource_blob %chost_visible", vgdev->has_virgl_3d ? '+' : '-', vgdev->has_edid ? '+' : '-', vgdev->has_resource_blob ? '+' : '-', vgdev->has_host_visible ? '+' : '-'); DRM_INFO("features: %ccontext_init\n", vgdev->has_context_init ? '+' : '-'); ret = virtio_find_vqs(vgdev->vdev, 2, vqs, vqs_info, NULL); if (ret) { DRM_ERROR("failed to find virt queues\n"); goto err_vqs; } vgdev->ctrlq.vq = vqs[0]; vgdev->cursorq.vq = vqs[1]; ret = virtio_gpu_alloc_vbufs(vgdev); if (ret) { DRM_ERROR("failed to alloc vbufs\n"); goto err_vbufs; } /* get display info */ virtio_cread_le(vgdev->vdev, struct virtio_gpu_config, num_scanouts, &num_scanouts); vgdev->num_scanouts = min_t(uint32_t, num_scanouts, VIRTIO_GPU_MAX_SCANOUTS); if (!IS_ENABLED(CONFIG_DRM_VIRTIO_GPU_KMS) || !vgdev->num_scanouts) { DRM_INFO("KMS disabled\n"); vgdev->num_scanouts = 0; vgdev->has_edid = false; dev->driver_features &= ~(DRIVER_MODESET | DRIVER_ATOMIC); } else { DRM_INFO("number of scanouts: %d\n", num_scanouts); } virtio_cread_le(vgdev->vdev, struct virtio_gpu_config, num_capsets, &num_capsets); DRM_INFO("number of cap sets: %d\n", num_capsets); ret = virtio_gpu_modeset_init(vgdev); if (ret) { DRM_ERROR("modeset init failed\n"); goto err_scanouts; } virtio_device_ready(vgdev->vdev); if (num_capsets) virtio_gpu_get_capsets(vgdev, num_capsets); if (vgdev->num_scanouts) { if (vgdev->has_edid) virtio_gpu_cmd_get_edids(vgdev); virtio_gpu_cmd_get_display_info(vgdev); virtio_gpu_notify(vgdev); wait_event_timeout(vgdev->resp_wq, !vgdev->display_info_pending, 5 * HZ); } return 0; err_scanouts: virtio_gpu_free_vbufs(vgdev); err_vbufs: vgdev->vdev->config->del_vqs(vgdev->vdev); err_vqs: dev->dev_private = NULL; return ret; } static void virtio_gpu_cleanup_cap_cache(struct virtio_gpu_device *vgdev) { struct virtio_gpu_drv_cap_cache *cache_ent, *tmp; list_for_each_entry_safe(cache_ent, tmp, &vgdev->cap_cache, head) { kfree(cache_ent->caps_cache); kfree(cache_ent); } } void virtio_gpu_deinit(struct drm_device *dev) { struct virtio_gpu_device *vgdev = dev->dev_private; flush_work(&vgdev->obj_free_work); 
flush_work(&vgdev->ctrlq.dequeue_work); flush_work(&vgdev->cursorq.dequeue_work); flush_work(&vgdev->config_changed_work); virtio_reset_device(vgdev->vdev); vgdev->vdev->config->del_vqs(vgdev->vdev); } void virtio_gpu_release(struct drm_device *dev) { struct virtio_gpu_device *vgdev = dev->dev_private; if (!vgdev) return; virtio_gpu_modeset_fini(vgdev); virtio_gpu_free_vbufs(vgdev); virtio_gpu_cleanup_cap_cache(vgdev); if (vgdev->has_host_visible) drm_mm_takedown(&vgdev->host_visible_mm); } int virtio_gpu_driver_open(struct drm_device *dev, struct drm_file *file) { struct virtio_gpu_device *vgdev = dev->dev_private; struct virtio_gpu_fpriv *vfpriv; int handle; /* can't create contexts without 3d renderer */ if (!vgdev->has_virgl_3d) return 0; /* allocate a virt GPU context for this opener */ vfpriv = kzalloc(sizeof(*vfpriv), GFP_KERNEL); if (!vfpriv) return -ENOMEM; mutex_init(&vfpriv->context_lock); handle = ida_alloc(&vgdev->ctx_id_ida, GFP_KERNEL); if (handle < 0) { kfree(vfpriv); return handle; } vfpriv->ctx_id = handle + 1; file->driver_priv = vfpriv; return 0; } void virtio_gpu_driver_postclose(struct drm_device *dev, struct drm_file *file) { struct virtio_gpu_device *vgdev = dev->dev_private; struct virtio_gpu_fpriv *vfpriv = file->driver_priv; if (!vgdev->has_virgl_3d) return; if (vfpriv->context_created) { virtio_gpu_cmd_context_destroy(vgdev, vfpriv->ctx_id); virtio_gpu_notify(vgdev); } ida_free(&vgdev->ctx_id_ida, vfpriv->ctx_id - 1); mutex_destroy(&vfpriv->context_lock); kfree(vfpriv); file->driver_priv = NULL; } |
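/*
 * Illustrative sketch (not part of virtgpu_kms.c above): the capset probing in
 * virtio_gpu_get_capsets() rejects capability-set ids outside the spec range
 * (1..63) before folding them into capset_id_mask.  The demo below shows the
 * same validity check and bit accumulation; MAX_CAPSET_ID is assumed to be 63
 * here, the reported ids are invented, and unlike the driver (which aborts
 * capset probing entirely on a bad id) the demo merely skips the bad entry.
 */
#include <stdint.h>
#include <stdio.h>

#define DEMO_MAX_CAPSET_ID 63	/* assumed value of MAX_CAPSET_ID */

/* Capability ids are defined in the virtio-gpu spec, range 1..63 inclusive. */
static int demo_capset_id_valid(uint32_t id)
{
	return id >= 1 && id <= DEMO_MAX_CAPSET_ID;
}

int main(void)
{
	const uint32_t reported[] = { 1, 2, 64, 3 };	/* ids returned by the host */
	uint64_t capset_id_mask = 0;
	unsigned int i;

	for (i = 0; i < sizeof(reported) / sizeof(reported[0]); i++) {
		if (!demo_capset_id_valid(reported[i])) {
			printf("invalid capset id %u, ignoring\n",
			       (unsigned)reported[i]);
			continue;
		}
		capset_id_mask |= 1ULL << reported[i];
	}

	printf("capset_id_mask = 0x%llx\n", (unsigned long long)capset_id_mask);
	return 0;
}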
| 3 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 | /* * Copyright (c) 2007 Mellanox Technologies. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU * General Public License (GPL) Version 2, available from the file * COPYING in the main directory of this source tree, or the * OpenIB.org BSD license below: * * Redistribution and use in source and binary forms, with or * without modification, are permitted provided that the following * conditions are met: * * - Redistributions of source code must retain the above * copyright notice, this list of conditions and the following * disclaimer. * * - Redistributions in binary form must reproduce the above * copyright notice, this list of conditions and the following * disclaimer in the documentation and/or other materials * provided with the distribution. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. 
*/ #include <linux/kernel.h> #include <linux/ethtool.h> #include <linux/netdevice.h> #include "ipoib.h" struct ipoib_stats { char stat_string[ETH_GSTRING_LEN]; int stat_offset; }; #define IPOIB_NETDEV_STAT(m) { \ .stat_string = #m, \ .stat_offset = offsetof(struct rtnl_link_stats64, m) } static const struct ipoib_stats ipoib_gstrings_stats[] = { IPOIB_NETDEV_STAT(rx_packets), IPOIB_NETDEV_STAT(tx_packets), IPOIB_NETDEV_STAT(rx_bytes), IPOIB_NETDEV_STAT(tx_bytes), IPOIB_NETDEV_STAT(tx_errors), IPOIB_NETDEV_STAT(rx_dropped), IPOIB_NETDEV_STAT(tx_dropped), IPOIB_NETDEV_STAT(multicast), }; #define IPOIB_GLOBAL_STATS_LEN ARRAY_SIZE(ipoib_gstrings_stats) static void ipoib_get_drvinfo(struct net_device *netdev, struct ethtool_drvinfo *drvinfo) { struct ipoib_dev_priv *priv = ipoib_priv(netdev); ib_get_device_fw_str(priv->ca, drvinfo->fw_version); strscpy(drvinfo->bus_info, dev_name(priv->ca->dev.parent), sizeof(drvinfo->bus_info)); strscpy(drvinfo->driver, "ib_ipoib", sizeof(drvinfo->driver)); } static int ipoib_get_coalesce(struct net_device *dev, struct ethtool_coalesce *coal, struct kernel_ethtool_coalesce *kernel_coal, struct netlink_ext_ack *extack) { struct ipoib_dev_priv *priv = ipoib_priv(dev); coal->rx_coalesce_usecs = priv->ethtool.coalesce_usecs; coal->rx_max_coalesced_frames = priv->ethtool.max_coalesced_frames; return 0; } static int ipoib_set_coalesce(struct net_device *dev, struct ethtool_coalesce *coal, struct kernel_ethtool_coalesce *kernel_coal, struct netlink_ext_ack *extack) { struct ipoib_dev_priv *priv = ipoib_priv(dev); int ret; /* * These values are saved in the private data and returned * when ipoib_get_coalesce() is called */ if (coal->rx_coalesce_usecs > 0xffff || coal->rx_max_coalesced_frames > 0xffff) return -EINVAL; ret = rdma_set_cq_moderation(priv->recv_cq, coal->rx_max_coalesced_frames, coal->rx_coalesce_usecs); if (ret && ret != -EOPNOTSUPP) { ipoib_warn(priv, "failed modifying CQ (%d)\n", ret); return ret; } priv->ethtool.coalesce_usecs = coal->rx_coalesce_usecs; priv->ethtool.max_coalesced_frames = coal->rx_max_coalesced_frames; return 0; } static void ipoib_get_ethtool_stats(struct net_device *dev, struct ethtool_stats __always_unused *stats, u64 *data) { int i; struct net_device_stats *net_stats = &dev->stats; u8 *p = (u8 *)net_stats; for (i = 0; i < IPOIB_GLOBAL_STATS_LEN; i++) data[i] = *(u64 *)(p + ipoib_gstrings_stats[i].stat_offset); } static void ipoib_get_strings(struct net_device __always_unused *dev, u32 stringset, u8 *data) { int i; switch (stringset) { case ETH_SS_STATS: for (i = 0; i < IPOIB_GLOBAL_STATS_LEN; i++) ethtool_puts(&data, ipoib_gstrings_stats[i].stat_string); break; default: break; } } static int ipoib_get_sset_count(struct net_device __always_unused *dev, int sset) { switch (sset) { case ETH_SS_STATS: return IPOIB_GLOBAL_STATS_LEN; default: break; } return -EOPNOTSUPP; } /* Return lane speed in unit of 1e6 bit/sec */ static inline int ib_speed_enum_to_int(int speed) { switch (speed) { case IB_SPEED_SDR: return SPEED_2500; case IB_SPEED_DDR: return SPEED_5000; case IB_SPEED_QDR: case IB_SPEED_FDR10: return SPEED_10000; case IB_SPEED_FDR: return SPEED_14000; case IB_SPEED_EDR: return SPEED_25000; case IB_SPEED_HDR: return SPEED_50000; case IB_SPEED_NDR: return SPEED_100000; case IB_SPEED_XDR: return SPEED_200000; } return SPEED_UNKNOWN; } static int ipoib_get_link_ksettings(struct net_device *netdev, struct ethtool_link_ksettings *cmd) { struct ipoib_dev_priv *priv = ipoib_priv(netdev); struct ib_port_attr attr; int ret, speed, width; 
if (!netif_carrier_ok(netdev)) { cmd->base.speed = SPEED_UNKNOWN; cmd->base.duplex = DUPLEX_UNKNOWN; return 0; } ret = ib_query_port(priv->ca, priv->port, &attr); if (ret < 0) return -EINVAL; speed = ib_speed_enum_to_int(attr.active_speed); width = ib_width_enum_to_int(attr.active_width); if (speed < 0 || width < 0) return -EINVAL; /* Except the following are set, the other members of * the struct ethtool_link_settings are initialized to * zero in the function __ethtool_get_link_ksettings. */ cmd->base.speed = speed * width; cmd->base.duplex = DUPLEX_FULL; cmd->base.phy_address = 0xFF; cmd->base.autoneg = AUTONEG_ENABLE; cmd->base.port = PORT_OTHER; return 0; } static const struct ethtool_ops ipoib_ethtool_ops = { .supported_coalesce_params = ETHTOOL_COALESCE_RX_USECS | ETHTOOL_COALESCE_RX_MAX_FRAMES, .get_link_ksettings = ipoib_get_link_ksettings, .get_drvinfo = ipoib_get_drvinfo, .get_coalesce = ipoib_get_coalesce, .set_coalesce = ipoib_set_coalesce, .get_strings = ipoib_get_strings, .get_ethtool_stats = ipoib_get_ethtool_stats, .get_sset_count = ipoib_get_sset_count, .get_link = ethtool_op_get_link, }; void ipoib_set_ethtool_ops(struct net_device *dev) { dev->ethtool_ops = &ipoib_ethtool_ops; } |
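/*
 * Illustrative sketch (not part of ipoib_ethtool.c above):
 * ipoib_get_link_ksettings() reports the ethtool speed as the per-lane speed
 * from ib_speed_enum_to_int() (in Mb/s) multiplied by the lane count from
 * ib_width_enum_to_int().  The demo below redoes that multiplication in
 * userspace; the demo_* enum and the 4-lane width are assumptions made for
 * the example, not values taken from the code above.
 */
#include <stdio.h>

/* Per-lane speeds in Mb/s, mirroring the mapping in ib_speed_enum_to_int(). */
enum demo_ib_speed { DEMO_SDR, DEMO_DDR, DEMO_QDR, DEMO_FDR,
		     DEMO_EDR, DEMO_HDR, DEMO_NDR };

static int demo_lane_speed(enum demo_ib_speed speed)
{
	switch (speed) {
	case DEMO_SDR: return 2500;
	case DEMO_DDR: return 5000;
	case DEMO_QDR: return 10000;
	case DEMO_FDR: return 14000;
	case DEMO_EDR: return 25000;
	case DEMO_HDR: return 50000;
	case DEMO_NDR: return 100000;
	}
	return -1;
}

int main(void)
{
	/* e.g. a 4x EDR link: 4 lanes x 25 Gb/s per lane = 100 Gb/s. */
	int width = 4;			/* assumed lane count of the port */
	int speed = demo_lane_speed(DEMO_EDR);

	printf("link speed = %d Mb/s\n", speed * width);
	return 0;
}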
| 18 2867 2865 1086 1995 2867 2866 2867 1 1 1 1 1 1 1 6 1 6 7 20 1256 3791 3418 1273 1 1 200 198 3 4 23 23 52 52 5 5 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 | // SPDX-License-Identifier: GPL-2.0-only /* * net/core/dst.c Protocol independent destination cache. * * Authors: Alexey Kuznetsov, <kuznet@ms2.inr.ac.ru> * */ #include <linux/bitops.h> #include <linux/errno.h> #include <linux/init.h> #include <linux/kernel.h> #include <linux/workqueue.h> #include <linux/mm.h> #include <linux/module.h> #include <linux/slab.h> #include <linux/netdevice.h> #include <linux/skbuff.h> #include <linux/string.h> #include <linux/types.h> #include <net/net_namespace.h> #include <linux/sched.h> #include <linux/prefetch.h> #include <net/lwtunnel.h> #include <net/xfrm.h> #include <net/dst.h> #include <net/dst_metadata.h> int dst_discard_out(struct net *net, struct sock *sk, struct sk_buff *skb) { kfree_skb(skb); return 0; } EXPORT_SYMBOL(dst_discard_out); const struct dst_metrics dst_default_metrics = { /* This initializer is needed to force linker to place this variable * into const section. Otherwise it might end into bss section. * We really want to avoid false sharing on this variable, and catch * any writes on it. 
*/ .refcnt = REFCOUNT_INIT(1), }; EXPORT_SYMBOL(dst_default_metrics); void dst_init(struct dst_entry *dst, struct dst_ops *ops, struct net_device *dev, int initial_obsolete, unsigned short flags) { dst->dev = dev; netdev_hold(dev, &dst->dev_tracker, GFP_ATOMIC); dst->ops = ops; dst_init_metrics(dst, dst_default_metrics.metrics, true); dst->expires = 0UL; #ifdef CONFIG_XFRM dst->xfrm = NULL; #endif dst->input = dst_discard; dst->output = dst_discard_out; dst->error = 0; dst->obsolete = initial_obsolete; dst->header_len = 0; dst->trailer_len = 0; #ifdef CONFIG_IP_ROUTE_CLASSID dst->tclassid = 0; #endif dst->lwtstate = NULL; rcuref_init(&dst->__rcuref, 1); INIT_LIST_HEAD(&dst->rt_uncached); dst->__use = 0; dst->lastuse = jiffies; dst->flags = flags; if (!(flags & DST_NOCOUNT)) dst_entries_add(ops, 1); } EXPORT_SYMBOL(dst_init); void *dst_alloc(struct dst_ops *ops, struct net_device *dev, int initial_obsolete, unsigned short flags) { struct dst_entry *dst; if (ops->gc && !(flags & DST_NOCOUNT) && dst_entries_get_fast(ops) > ops->gc_thresh) ops->gc(ops); dst = kmem_cache_alloc(ops->kmem_cachep, GFP_ATOMIC); if (!dst) return NULL; dst_init(dst, ops, dev, initial_obsolete, flags); return dst; } EXPORT_SYMBOL(dst_alloc); static void dst_destroy(struct dst_entry *dst) { struct dst_entry *child = NULL; smp_rmb(); #ifdef CONFIG_XFRM if (dst->xfrm) { struct xfrm_dst *xdst = (struct xfrm_dst *) dst; child = xdst->child; } #endif if (dst->ops->destroy) dst->ops->destroy(dst); netdev_put(dst->dev, &dst->dev_tracker); lwtstate_put(dst->lwtstate); if (dst->flags & DST_METADATA) metadata_dst_free((struct metadata_dst *)dst); else kmem_cache_free(dst->ops->kmem_cachep, dst); dst = child; if (dst) dst_release_immediate(dst); } static void dst_destroy_rcu(struct rcu_head *head) { struct dst_entry *dst = container_of(head, struct dst_entry, rcu_head); dst_destroy(dst); } /* Operations to mark dst as DEAD and clean up the net device referenced * by dst: * 1. put the dst under blackhole interface and discard all tx/rx packets * on this route. * 2. release the net_device * This function should be called when removing routes from the fib tree * in preparation for a NETDEV_DOWN/NETDEV_UNREGISTER event and also to * make the next dst_ops->check() fail. 
*/ void dst_dev_put(struct dst_entry *dst) { struct net_device *dev = dst->dev; dst->obsolete = DST_OBSOLETE_DEAD; if (dst->ops->ifdown) dst->ops->ifdown(dst, dev); dst->input = dst_discard; dst->output = dst_discard_out; dst->dev = blackhole_netdev; netdev_ref_replace(dev, blackhole_netdev, &dst->dev_tracker, GFP_ATOMIC); } EXPORT_SYMBOL(dst_dev_put); static void dst_count_dec(struct dst_entry *dst) { if (!(dst->flags & DST_NOCOUNT)) dst_entries_add(dst->ops, -1); } void dst_release(struct dst_entry *dst) { if (dst && rcuref_put(&dst->__rcuref)) { dst_count_dec(dst); call_rcu_hurry(&dst->rcu_head, dst_destroy_rcu); } } EXPORT_SYMBOL(dst_release); void dst_release_immediate(struct dst_entry *dst) { if (dst && rcuref_put(&dst->__rcuref)) { dst_count_dec(dst); dst_destroy(dst); } } EXPORT_SYMBOL(dst_release_immediate); u32 *dst_cow_metrics_generic(struct dst_entry *dst, unsigned long old) { struct dst_metrics *p = kmalloc(sizeof(*p), GFP_ATOMIC); if (p) { struct dst_metrics *old_p = (struct dst_metrics *)__DST_METRICS_PTR(old); unsigned long prev, new; refcount_set(&p->refcnt, 1); memcpy(p->metrics, old_p->metrics, sizeof(p->metrics)); new = (unsigned long) p; prev = cmpxchg(&dst->_metrics, old, new); if (prev != old) { kfree(p); p = (struct dst_metrics *)__DST_METRICS_PTR(prev); if (prev & DST_METRICS_READ_ONLY) p = NULL; } else if (prev & DST_METRICS_REFCOUNTED) { if (refcount_dec_and_test(&old_p->refcnt)) kfree(old_p); } } BUILD_BUG_ON(offsetof(struct dst_metrics, metrics) != 0); return (u32 *)p; } EXPORT_SYMBOL(dst_cow_metrics_generic); /* Caller asserts that dst_metrics_read_only(dst) is false. */ void __dst_destroy_metrics_generic(struct dst_entry *dst, unsigned long old) { unsigned long prev, new; new = ((unsigned long) &dst_default_metrics) | DST_METRICS_READ_ONLY; prev = cmpxchg(&dst->_metrics, old, new); if (prev == old) kfree(__DST_METRICS_PTR(old)); } EXPORT_SYMBOL(__dst_destroy_metrics_generic); struct dst_entry *dst_blackhole_check(struct dst_entry *dst, u32 cookie) { return NULL; } u32 *dst_blackhole_cow_metrics(struct dst_entry *dst, unsigned long old) { return NULL; } struct neighbour *dst_blackhole_neigh_lookup(const struct dst_entry *dst, struct sk_buff *skb, const void *daddr) { return NULL; } void dst_blackhole_update_pmtu(struct dst_entry *dst, struct sock *sk, struct sk_buff *skb, u32 mtu, bool confirm_neigh) { } EXPORT_SYMBOL_GPL(dst_blackhole_update_pmtu); void dst_blackhole_redirect(struct dst_entry *dst, struct sock *sk, struct sk_buff *skb) { } EXPORT_SYMBOL_GPL(dst_blackhole_redirect); unsigned int dst_blackhole_mtu(const struct dst_entry *dst) { unsigned int mtu = dst_metric_raw(dst, RTAX_MTU); return mtu ? 
: dst->dev->mtu; } EXPORT_SYMBOL_GPL(dst_blackhole_mtu); static struct dst_ops dst_blackhole_ops = { .family = AF_UNSPEC, .neigh_lookup = dst_blackhole_neigh_lookup, .check = dst_blackhole_check, .cow_metrics = dst_blackhole_cow_metrics, .update_pmtu = dst_blackhole_update_pmtu, .redirect = dst_blackhole_redirect, .mtu = dst_blackhole_mtu, }; static void __metadata_dst_init(struct metadata_dst *md_dst, enum metadata_type type, u8 optslen) { struct dst_entry *dst; dst = &md_dst->dst; dst_init(dst, &dst_blackhole_ops, NULL, DST_OBSOLETE_NONE, DST_METADATA | DST_NOCOUNT); memset(dst + 1, 0, sizeof(*md_dst) + optslen - sizeof(*dst)); md_dst->type = type; } struct metadata_dst *metadata_dst_alloc(u8 optslen, enum metadata_type type, gfp_t flags) { struct metadata_dst *md_dst; md_dst = kmalloc(sizeof(*md_dst) + optslen, flags); if (!md_dst) return NULL; __metadata_dst_init(md_dst, type, optslen); return md_dst; } EXPORT_SYMBOL_GPL(metadata_dst_alloc); void metadata_dst_free(struct metadata_dst *md_dst) { #ifdef CONFIG_DST_CACHE if (md_dst->type == METADATA_IP_TUNNEL) dst_cache_destroy(&md_dst->u.tun_info.dst_cache); #endif if (md_dst->type == METADATA_XFRM) dst_release(md_dst->u.xfrm_info.dst_orig); kfree(md_dst); } EXPORT_SYMBOL_GPL(metadata_dst_free); struct metadata_dst __percpu * metadata_dst_alloc_percpu(u8 optslen, enum metadata_type type, gfp_t flags) { int cpu; struct metadata_dst __percpu *md_dst; md_dst = __alloc_percpu_gfp(sizeof(struct metadata_dst) + optslen, __alignof__(struct metadata_dst), flags); if (!md_dst) return NULL; for_each_possible_cpu(cpu) __metadata_dst_init(per_cpu_ptr(md_dst, cpu), type, optslen); return md_dst; } EXPORT_SYMBOL_GPL(metadata_dst_alloc_percpu); void metadata_dst_free_percpu(struct metadata_dst __percpu *md_dst) { int cpu; for_each_possible_cpu(cpu) { struct metadata_dst *one_md_dst = per_cpu_ptr(md_dst, cpu); #ifdef CONFIG_DST_CACHE if (one_md_dst->type == METADATA_IP_TUNNEL) dst_cache_destroy(&one_md_dst->u.tun_info.dst_cache); #endif if (one_md_dst->type == METADATA_XFRM) dst_release(one_md_dst->u.xfrm_info.dst_orig); } free_percpu(md_dst); } EXPORT_SYMBOL_GPL(metadata_dst_free_percpu); |
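/*
 * Illustrative sketch, not part of the original file: one plausible way a
 * tunnel-style driver could use metadata_dst_alloc() and dst_release()
 * defined above. The function name and the skb handling are invented for
 * this example and assume process context; only metadata_dst_alloc(),
 * skb_dst_set() and the metadata dst layout are taken from the code above.
 */
#include <linux/skbuff.h>
#include <net/dst_metadata.h>

static int example_attach_tunnel_metadata(struct sk_buff *skb)
{
	struct metadata_dst *md;

	/* METADATA_IP_TUNNEL dst with no tunnel-option space (optslen == 0). */
	md = metadata_dst_alloc(0, METADATA_IP_TUNNEL, GFP_KERNEL);
	if (!md)
		return -ENOMEM;

	/*
	 * skb_dst_set() consumes the reference returned by
	 * metadata_dst_alloc(); the dst is dropped when the skb is freed.
	 */
	skb_dst_set(skb, &md->dst);
	return 0;
}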
// SPDX-License-Identifier: GPL-2.0-only /* * linux/net/sunrpc/svc.c * * High-level RPC service routines * * Copyright (C) 1995, 1996 Olaf Kirch <okir@monad.swb.de> * * Multiple threads pools and NUMAisation * Copyright (c) 2006 Silicon Graphics, Inc. * by Greg Banks <gnb@melbourne.sgi.com> */ #include <linux/linkage.h> #include <linux/sched/signal.h> #include <linux/errno.h> #include <linux/net.h> #include <linux/in.h> #include <linux/mm.h> #include <linux/interrupt.h> #include <linux/module.h> #include <linux/kthread.h> #include <linux/slab.h> #include <linux/sunrpc/types.h> #include <linux/sunrpc/xdr.h> #include <linux/sunrpc/stats.h> #include <linux/sunrpc/svcsock.h> #include <linux/sunrpc/clnt.h> #include <linux/sunrpc/bc_xprt.h> #include <trace/events/sunrpc.h> #include "fail.h" #include "sunrpc.h" #define RPCDBG_FACILITY RPCDBG_SVCDSP static void svc_unregister(const struct svc_serv *serv, struct net *net); #define SVC_POOL_DEFAULT SVC_POOL_GLOBAL /* * Mode for mapping cpus to pools. */ enum { SVC_POOL_AUTO = -1, /* choose one of the others */ SVC_POOL_GLOBAL, /* no mapping, just a single global pool * (legacy & UP mode) */ SVC_POOL_PERCPU, /* one pool per cpu */ SVC_POOL_PERNODE /* one pool per numa node */ }; /* * Structure for mapping cpus to pools and vice versa. * Setup once during sunrpc initialisation. 
*/ struct svc_pool_map { int count; /* How many svc_servs use us */ int mode; /* Note: int not enum to avoid * warnings about "enumeration value * not handled in switch" */ unsigned int npools; unsigned int *pool_to; /* maps pool id to cpu or node */ unsigned int *to_pool; /* maps cpu or node to pool id */ }; static struct svc_pool_map svc_pool_map = { .mode = SVC_POOL_DEFAULT }; static DEFINE_MUTEX(svc_pool_map_mutex);/* protects svc_pool_map.count only */ static int __param_set_pool_mode(const char *val, struct svc_pool_map *m) { int err, mode; mutex_lock(&svc_pool_map_mutex); err = 0; if (!strncmp(val, "auto", 4)) mode = SVC_POOL_AUTO; else if (!strncmp(val, "global", 6)) mode = SVC_POOL_GLOBAL; else if (!strncmp(val, "percpu", 6)) mode = SVC_POOL_PERCPU; else if (!strncmp(val, "pernode", 7)) mode = SVC_POOL_PERNODE; else err = -EINVAL; if (err) goto out; if (m->count == 0) m->mode = mode; else if (mode != m->mode) err = -EBUSY; out: mutex_unlock(&svc_pool_map_mutex); return err; } static int param_set_pool_mode(const char *val, const struct kernel_param *kp) { struct svc_pool_map *m = kp->arg; return __param_set_pool_mode(val, m); } int sunrpc_set_pool_mode(const char *val) { return __param_set_pool_mode(val, &svc_pool_map); } EXPORT_SYMBOL(sunrpc_set_pool_mode); /** * sunrpc_get_pool_mode - get the current pool_mode for the host * @buf: where to write the current pool_mode * @size: size of @buf * * Grab the current pool_mode from the svc_pool_map and write * the resulting string to @buf. Returns the number of characters * written to @buf (a'la snprintf()). */ int sunrpc_get_pool_mode(char *buf, size_t size) { struct svc_pool_map *m = &svc_pool_map; switch (m->mode) { case SVC_POOL_AUTO: return snprintf(buf, size, "auto"); case SVC_POOL_GLOBAL: return snprintf(buf, size, "global"); case SVC_POOL_PERCPU: return snprintf(buf, size, "percpu"); case SVC_POOL_PERNODE: return snprintf(buf, size, "pernode"); default: return snprintf(buf, size, "%d", m->mode); } } EXPORT_SYMBOL(sunrpc_get_pool_mode); static int param_get_pool_mode(char *buf, const struct kernel_param *kp) { char str[16]; int len; len = sunrpc_get_pool_mode(str, ARRAY_SIZE(str)); /* Ensure we have room for newline and NUL */ len = min_t(int, len, ARRAY_SIZE(str) - 2); /* tack on the newline */ str[len] = '\n'; str[len + 1] = '\0'; return sysfs_emit(buf, "%s", str); } module_param_call(pool_mode, param_set_pool_mode, param_get_pool_mode, &svc_pool_map, 0644); /* * Detect best pool mapping mode heuristically, * according to the machine's topology. */ static int svc_pool_map_choose_mode(void) { unsigned int node; if (nr_online_nodes > 1) { /* * Actually have multiple NUMA nodes, * so split pools on NUMA node boundaries */ return SVC_POOL_PERNODE; } node = first_online_node; if (nr_cpus_node(node) > 2) { /* * Non-trivial SMP, or CONFIG_NUMA on * non-NUMA hardware, e.g. with a generic * x86_64 kernel on Xeons. In this case we * want to divide the pools on cpu boundaries. */ return SVC_POOL_PERCPU; } /* default: one global pool */ return SVC_POOL_GLOBAL; } /* * Allocate the to_pool[] and pool_to[] arrays. * Returns 0 on success or an errno. 
*/ static int svc_pool_map_alloc_arrays(struct svc_pool_map *m, unsigned int maxpools) { m->to_pool = kcalloc(maxpools, sizeof(unsigned int), GFP_KERNEL); if (!m->to_pool) goto fail; m->pool_to = kcalloc(maxpools, sizeof(unsigned int), GFP_KERNEL); if (!m->pool_to) goto fail_free; return 0; fail_free: kfree(m->to_pool); m->to_pool = NULL; fail: return -ENOMEM; } /* * Initialise the pool map for SVC_POOL_PERCPU mode. * Returns number of pools or <0 on error. */ static int svc_pool_map_init_percpu(struct svc_pool_map *m) { unsigned int maxpools = nr_cpu_ids; unsigned int pidx = 0; unsigned int cpu; int err; err = svc_pool_map_alloc_arrays(m, maxpools); if (err) return err; for_each_online_cpu(cpu) { BUG_ON(pidx >= maxpools); m->to_pool[cpu] = pidx; m->pool_to[pidx] = cpu; pidx++; } /* cpus brought online later all get mapped to pool0, sorry */ return pidx; }; /* * Initialise the pool map for SVC_POOL_PERNODE mode. * Returns number of pools or <0 on error. */ static int svc_pool_map_init_pernode(struct svc_pool_map *m) { unsigned int maxpools = nr_node_ids; unsigned int pidx = 0; unsigned int node; int err; err = svc_pool_map_alloc_arrays(m, maxpools); if (err) return err; for_each_node_with_cpus(node) { /* some architectures (e.g. SN2) have cpuless nodes */ BUG_ON(pidx > maxpools); m->to_pool[node] = pidx; m->pool_to[pidx] = node; pidx++; } /* nodes brought online later all get mapped to pool0, sorry */ return pidx; } /* * Add a reference to the global map of cpus to pools (and * vice versa) if pools are in use. * Initialise the map if we're the first user. * Returns the number of pools. If this is '1', no reference * was taken. */ static unsigned int svc_pool_map_get(void) { struct svc_pool_map *m = &svc_pool_map; int npools = -1; mutex_lock(&svc_pool_map_mutex); if (m->count++) { mutex_unlock(&svc_pool_map_mutex); return m->npools; } if (m->mode == SVC_POOL_AUTO) m->mode = svc_pool_map_choose_mode(); switch (m->mode) { case SVC_POOL_PERCPU: npools = svc_pool_map_init_percpu(m); break; case SVC_POOL_PERNODE: npools = svc_pool_map_init_pernode(m); break; } if (npools <= 0) { /* default, or memory allocation failure */ npools = 1; m->mode = SVC_POOL_GLOBAL; } m->npools = npools; mutex_unlock(&svc_pool_map_mutex); return npools; } /* * Drop a reference to the global map of cpus to pools. * When the last reference is dropped, the map data is * freed; this allows the sysadmin to change the pool. */ static void svc_pool_map_put(void) { struct svc_pool_map *m = &svc_pool_map; mutex_lock(&svc_pool_map_mutex); if (!--m->count) { kfree(m->to_pool); m->to_pool = NULL; kfree(m->pool_to); m->pool_to = NULL; m->npools = 0; } mutex_unlock(&svc_pool_map_mutex); } static int svc_pool_map_get_node(unsigned int pidx) { const struct svc_pool_map *m = &svc_pool_map; if (m->count) { if (m->mode == SVC_POOL_PERCPU) return cpu_to_node(m->pool_to[pidx]); if (m->mode == SVC_POOL_PERNODE) return m->pool_to[pidx]; } return NUMA_NO_NODE; } /* * Set the given thread's cpus_allowed mask so that it * will only run on cpus in the given pool. */ static inline void svc_pool_map_set_cpumask(struct task_struct *task, unsigned int pidx) { struct svc_pool_map *m = &svc_pool_map; unsigned int node = m->pool_to[pidx]; /* * The caller checks for sv_nrpools > 1, which * implies that we've been initialized. 
*/ WARN_ON_ONCE(m->count == 0); if (m->count == 0) return; switch (m->mode) { case SVC_POOL_PERCPU: { set_cpus_allowed_ptr(task, cpumask_of(node)); break; } case SVC_POOL_PERNODE: { set_cpus_allowed_ptr(task, cpumask_of_node(node)); break; } } } /** * svc_pool_for_cpu - Select pool to run a thread on this cpu * @serv: An RPC service * * Use the active CPU and the svc_pool_map's mode setting to * select the svc thread pool to use. Once initialized, the * svc_pool_map does not change. * * Return value: * A pointer to an svc_pool */ struct svc_pool *svc_pool_for_cpu(struct svc_serv *serv) { struct svc_pool_map *m = &svc_pool_map; int cpu = raw_smp_processor_id(); unsigned int pidx = 0; if (serv->sv_nrpools <= 1) return serv->sv_pools; switch (m->mode) { case SVC_POOL_PERCPU: pidx = m->to_pool[cpu]; break; case SVC_POOL_PERNODE: pidx = m->to_pool[cpu_to_node(cpu)]; break; } return &serv->sv_pools[pidx % serv->sv_nrpools]; } static int svc_rpcb_setup(struct svc_serv *serv, struct net *net) { int err; err = rpcb_create_local(net); if (err) return err; /* Remove any stale portmap registrations */ svc_unregister(serv, net); return 0; } void svc_rpcb_cleanup(struct svc_serv *serv, struct net *net) { svc_unregister(serv, net); rpcb_put_local(net); } EXPORT_SYMBOL_GPL(svc_rpcb_cleanup); static int svc_uses_rpcbind(struct svc_serv *serv) { unsigned int p, i; for (p = 0; p < serv->sv_nprogs; p++) { struct svc_program *progp = &serv->sv_programs[p]; for (i = 0; i < progp->pg_nvers; i++) { if (progp->pg_vers[i] == NULL) continue; if (!progp->pg_vers[i]->vs_hidden) return 1; } } return 0; } int svc_bind(struct svc_serv *serv, struct net *net) { if (!svc_uses_rpcbind(serv)) return 0; return svc_rpcb_setup(serv, net); } EXPORT_SYMBOL_GPL(svc_bind); #if defined(CONFIG_SUNRPC_BACKCHANNEL) static void __svc_init_bc(struct svc_serv *serv) { lwq_init(&serv->sv_cb_list); } #else static void __svc_init_bc(struct svc_serv *serv) { } #endif /* * Create an RPC service */ static struct svc_serv * __svc_create(struct svc_program *prog, int nprogs, struct svc_stat *stats, unsigned int bufsize, int npools, int (*threadfn)(void *data)) { struct svc_serv *serv; unsigned int vers; unsigned int xdrsize; unsigned int i; if (!(serv = kzalloc(sizeof(*serv), GFP_KERNEL))) return NULL; serv->sv_name = prog->pg_name; serv->sv_programs = prog; serv->sv_nprogs = nprogs; serv->sv_stats = stats; if (bufsize > RPCSVC_MAXPAYLOAD) bufsize = RPCSVC_MAXPAYLOAD; serv->sv_max_payload = bufsize? 
bufsize : 4096; serv->sv_max_mesg = roundup(serv->sv_max_payload + PAGE_SIZE, PAGE_SIZE); serv->sv_threadfn = threadfn; xdrsize = 0; for (i = 0; i < nprogs; i++) { struct svc_program *progp = &prog[i]; progp->pg_lovers = progp->pg_nvers-1; for (vers = 0; vers < progp->pg_nvers ; vers++) if (progp->pg_vers[vers]) { progp->pg_hivers = vers; if (progp->pg_lovers > vers) progp->pg_lovers = vers; if (progp->pg_vers[vers]->vs_xdrsize > xdrsize) xdrsize = progp->pg_vers[vers]->vs_xdrsize; } } serv->sv_xdrsize = xdrsize; INIT_LIST_HEAD(&serv->sv_tempsocks); INIT_LIST_HEAD(&serv->sv_permsocks); timer_setup(&serv->sv_temptimer, NULL, 0); spin_lock_init(&serv->sv_lock); __svc_init_bc(serv); serv->sv_nrpools = npools; serv->sv_pools = kcalloc(serv->sv_nrpools, sizeof(struct svc_pool), GFP_KERNEL); if (!serv->sv_pools) { kfree(serv); return NULL; } for (i = 0; i < serv->sv_nrpools; i++) { struct svc_pool *pool = &serv->sv_pools[i]; dprintk("svc: initialising pool %u for %s\n", i, serv->sv_name); pool->sp_id = i; lwq_init(&pool->sp_xprts); INIT_LIST_HEAD(&pool->sp_all_threads); init_llist_head(&pool->sp_idle_threads); percpu_counter_init(&pool->sp_messages_arrived, 0, GFP_KERNEL); percpu_counter_init(&pool->sp_sockets_queued, 0, GFP_KERNEL); percpu_counter_init(&pool->sp_threads_woken, 0, GFP_KERNEL); } return serv; } /** * svc_create - Create an RPC service * @prog: the RPC program the new service will handle * @bufsize: maximum message size for @prog * @threadfn: a function to service RPC requests for @prog * * Returns an instantiated struct svc_serv object or NULL. */ struct svc_serv *svc_create(struct svc_program *prog, unsigned int bufsize, int (*threadfn)(void *data)) { return __svc_create(prog, 1, NULL, bufsize, 1, threadfn); } EXPORT_SYMBOL_GPL(svc_create); /** * svc_create_pooled - Create an RPC service with pooled threads * @prog: Array of RPC programs the new service will handle * @nprogs: Number of programs in the array * @stats: the stats struct if desired * @bufsize: maximum message size for @prog * @threadfn: a function to service RPC requests for @prog * * Returns an instantiated struct svc_serv object or NULL. */ struct svc_serv *svc_create_pooled(struct svc_program *prog, unsigned int nprogs, struct svc_stat *stats, unsigned int bufsize, int (*threadfn)(void *data)) { struct svc_serv *serv; unsigned int npools = svc_pool_map_get(); serv = __svc_create(prog, nprogs, stats, bufsize, npools, threadfn); if (!serv) goto out_err; serv->sv_is_pooled = true; return serv; out_err: svc_pool_map_put(); return NULL; } EXPORT_SYMBOL_GPL(svc_create_pooled); /* * Destroy an RPC service. Should be called with appropriate locking to * protect sv_permsocks and sv_tempsocks. */ void svc_destroy(struct svc_serv **servp) { struct svc_serv *serv = *servp; unsigned int i; *servp = NULL; dprintk("svc: svc_destroy(%s)\n", serv->sv_programs->pg_name); timer_shutdown_sync(&serv->sv_temptimer); /* * Remaining transports at this point are not expected. 
*/ WARN_ONCE(!list_empty(&serv->sv_permsocks), "SVC: permsocks remain for %s\n", serv->sv_programs->pg_name); WARN_ONCE(!list_empty(&serv->sv_tempsocks), "SVC: tempsocks remain for %s\n", serv->sv_programs->pg_name); cache_clean_deferred(serv); if (serv->sv_is_pooled) svc_pool_map_put(); for (i = 0; i < serv->sv_nrpools; i++) { struct svc_pool *pool = &serv->sv_pools[i]; percpu_counter_destroy(&pool->sp_messages_arrived); percpu_counter_destroy(&pool->sp_sockets_queued); percpu_counter_destroy(&pool->sp_threads_woken); } kfree(serv->sv_pools); kfree(serv); } EXPORT_SYMBOL_GPL(svc_destroy); static bool svc_init_buffer(struct svc_rqst *rqstp, unsigned int size, int node) { unsigned long pages, ret; /* bc_xprt uses fore channel allocated buffers */ if (svc_is_backchannel(rqstp)) return true; pages = size / PAGE_SIZE + 1; /* extra page as we hold both request and reply. * We assume one is at most one page */ WARN_ON_ONCE(pages > RPCSVC_MAXPAGES); if (pages > RPCSVC_MAXPAGES) pages = RPCSVC_MAXPAGES; ret = alloc_pages_bulk_node(GFP_KERNEL, node, pages, rqstp->rq_pages); return ret == pages; } /* * Release an RPC server buffer */ static void svc_release_buffer(struct svc_rqst *rqstp) { unsigned int i; for (i = 0; i < ARRAY_SIZE(rqstp->rq_pages); i++) if (rqstp->rq_pages[i]) put_page(rqstp->rq_pages[i]); } static void svc_rqst_free(struct svc_rqst *rqstp) { folio_batch_release(&rqstp->rq_fbatch); svc_release_buffer(rqstp); if (rqstp->rq_scratch_page) put_page(rqstp->rq_scratch_page); kfree(rqstp->rq_resp); kfree(rqstp->rq_argp); kfree(rqstp->rq_auth_data); kfree_rcu(rqstp, rq_rcu_head); } static struct svc_rqst * svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node) { struct svc_rqst *rqstp; rqstp = kzalloc_node(sizeof(*rqstp), GFP_KERNEL, node); if (!rqstp) return rqstp; folio_batch_init(&rqstp->rq_fbatch); rqstp->rq_server = serv; rqstp->rq_pool = pool; rqstp->rq_scratch_page = alloc_pages_node(node, GFP_KERNEL, 0); if (!rqstp->rq_scratch_page) goto out_enomem; rqstp->rq_argp = kmalloc_node(serv->sv_xdrsize, GFP_KERNEL, node); if (!rqstp->rq_argp) goto out_enomem; rqstp->rq_resp = kmalloc_node(serv->sv_xdrsize, GFP_KERNEL, node); if (!rqstp->rq_resp) goto out_enomem; if (!svc_init_buffer(rqstp, serv->sv_max_mesg, node)) goto out_enomem; rqstp->rq_err = -EAGAIN; /* No error yet */ serv->sv_nrthreads += 1; pool->sp_nrthreads += 1; /* Protected by whatever lock the service uses when calling * svc_set_num_threads() */ list_add_rcu(&rqstp->rq_all, &pool->sp_all_threads); return rqstp; out_enomem: svc_rqst_free(rqstp); return NULL; } /** * svc_pool_wake_idle_thread - Awaken an idle thread in @pool * @pool: service thread pool * * Can be called from soft IRQ or process context. Finding an idle * service thread and marking it BUSY is atomic with respect to * other calls to svc_pool_wake_idle_thread(). * */ void svc_pool_wake_idle_thread(struct svc_pool *pool) { struct svc_rqst *rqstp; struct llist_node *ln; rcu_read_lock(); ln = READ_ONCE(pool->sp_idle_threads.first); if (ln) { rqstp = llist_entry(ln, struct svc_rqst, rq_idle); WRITE_ONCE(rqstp->rq_qtime, ktime_get()); if (!task_is_running(rqstp->rq_task)) { wake_up_process(rqstp->rq_task); trace_svc_wake_up(rqstp->rq_task->pid); percpu_counter_inc(&pool->sp_threads_woken); } rcu_read_unlock(); return; } rcu_read_unlock(); } EXPORT_SYMBOL_GPL(svc_pool_wake_idle_thread); static struct svc_pool * svc_pool_next(struct svc_serv *serv, struct svc_pool *pool, unsigned int *state) { return pool ? 
pool : &serv->sv_pools[(*state)++ % serv->sv_nrpools]; } static struct svc_pool * svc_pool_victim(struct svc_serv *serv, struct svc_pool *target_pool, unsigned int *state) { struct svc_pool *pool; unsigned int i; pool = target_pool; if (!pool) { for (i = 0; i < serv->sv_nrpools; i++) { pool = &serv->sv_pools[--(*state) % serv->sv_nrpools]; if (pool->sp_nrthreads) break; } } if (pool && pool->sp_nrthreads) { set_bit(SP_VICTIM_REMAINS, &pool->sp_flags); set_bit(SP_NEED_VICTIM, &pool->sp_flags); return pool; } return NULL; } static int svc_start_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs) { struct svc_rqst *rqstp; struct task_struct *task; struct svc_pool *chosen_pool; unsigned int state = serv->sv_nrthreads-1; int node; int err; do { nrservs--; chosen_pool = svc_pool_next(serv, pool, &state); node = svc_pool_map_get_node(chosen_pool->sp_id); rqstp = svc_prepare_thread(serv, chosen_pool, node); if (!rqstp) return -ENOMEM; task = kthread_create_on_node(serv->sv_threadfn, rqstp, node, "%s", serv->sv_name); if (IS_ERR(task)) { svc_exit_thread(rqstp); return PTR_ERR(task); } rqstp->rq_task = task; if (serv->sv_nrpools > 1) svc_pool_map_set_cpumask(task, chosen_pool->sp_id); svc_sock_update_bufs(serv); wake_up_process(task); wait_var_event(&rqstp->rq_err, rqstp->rq_err != -EAGAIN); err = rqstp->rq_err; if (err) { svc_exit_thread(rqstp); return err; } } while (nrservs > 0); return 0; } static int svc_stop_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs) { unsigned int state = serv->sv_nrthreads-1; struct svc_pool *victim; do { victim = svc_pool_victim(serv, pool, &state); if (!victim) break; svc_pool_wake_idle_thread(victim); wait_on_bit(&victim->sp_flags, SP_VICTIM_REMAINS, TASK_IDLE); nrservs++; } while (nrservs < 0); return 0; } /** * svc_set_num_threads - adjust number of threads per RPC service * @serv: RPC service to adjust * @pool: Specific pool from which to choose threads, or NULL * @nrservs: New number of threads for @serv (0 or less means kill all threads) * * Create or destroy threads to make the number of threads for @serv the * given number. If @pool is non-NULL, change only threads in that pool; * otherwise, round-robin between all pools for @serv. @serv's * sv_nrthreads is adjusted for each thread created or destroyed. * * Caller must ensure mutual exclusion between this and server startup or * shutdown. * * Returns zero on success or a negative errno if an error occurred while * starting a thread. */ int svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs) { if (!pool) nrservs -= serv->sv_nrthreads; else nrservs -= pool->sp_nrthreads; if (nrservs > 0) return svc_start_kthreads(serv, pool, nrservs); if (nrservs < 0) return svc_stop_kthreads(serv, pool, nrservs); return 0; } EXPORT_SYMBOL_GPL(svc_set_num_threads); /** * svc_rqst_replace_page - Replace one page in rq_pages[] * @rqstp: svc_rqst with pages to replace * @page: replacement page * * When replacing a page in rq_pages, batch the release of the * replaced pages to avoid hammering the page allocator. 
* * Return values: * %true: page replaced * %false: array bounds checking failed */ bool svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page) { struct page **begin = rqstp->rq_pages; struct page **end = &rqstp->rq_pages[RPCSVC_MAXPAGES]; if (unlikely(rqstp->rq_next_page < begin || rqstp->rq_next_page > end)) { trace_svc_replace_page_err(rqstp); return false; } if (*rqstp->rq_next_page) { if (!folio_batch_add(&rqstp->rq_fbatch, page_folio(*rqstp->rq_next_page))) __folio_batch_release(&rqstp->rq_fbatch); } get_page(page); *(rqstp->rq_next_page++) = page; return true; } EXPORT_SYMBOL_GPL(svc_rqst_replace_page); /** * svc_rqst_release_pages - Release Reply buffer pages * @rqstp: RPC transaction context * * Release response pages that might still be in flight after * svc_send, and any spliced filesystem-owned pages. */ void svc_rqst_release_pages(struct svc_rqst *rqstp) { int i, count = rqstp->rq_next_page - rqstp->rq_respages; if (count) { release_pages(rqstp->rq_respages, count); for (i = 0; i < count; i++) rqstp->rq_respages[i] = NULL; } } /** * svc_exit_thread - finalise the termination of a sunrpc server thread * @rqstp: the svc_rqst which represents the thread. * * When a thread started with svc_new_thread() exits it must call * svc_exit_thread() as its last act. This must be done with the * service mutex held. Normally this is held by a DIFFERENT thread, the * one that is calling svc_set_num_threads() and which will wait for * SP_VICTIM_REMAINS to be cleared before dropping the mutex. If the * thread exits for any reason other than svc_thread_should_stop() * returning %true (which indicated that svc_set_num_threads() is * waiting for it to exit), then it must take the service mutex itself, * which can only safely be done using mutex_try_lock(). */ void svc_exit_thread(struct svc_rqst *rqstp) { struct svc_serv *serv = rqstp->rq_server; struct svc_pool *pool = rqstp->rq_pool; list_del_rcu(&rqstp->rq_all); pool->sp_nrthreads -= 1; serv->sv_nrthreads -= 1; svc_sock_update_bufs(serv); svc_rqst_free(rqstp); clear_and_wake_up_bit(SP_VICTIM_REMAINS, &pool->sp_flags); } EXPORT_SYMBOL_GPL(svc_exit_thread); /* * Register an "inet" protocol family netid with the local * rpcbind daemon via an rpcbind v4 SET request. * * No netconfig infrastructure is available in the kernel, so * we map IP_ protocol numbers to netids by hand. * * Returns zero on success; a negative errno value is returned * if any error occurs. */ static int __svc_rpcb_register4(struct net *net, const u32 program, const u32 version, const unsigned short protocol, const unsigned short port) { const struct sockaddr_in sin = { .sin_family = AF_INET, .sin_addr.s_addr = htonl(INADDR_ANY), .sin_port = htons(port), }; const char *netid; int error; switch (protocol) { case IPPROTO_UDP: netid = RPCBIND_NETID_UDP; break; case IPPROTO_TCP: netid = RPCBIND_NETID_TCP; break; default: return -ENOPROTOOPT; } error = rpcb_v4_register(net, program, version, (const struct sockaddr *)&sin, netid); /* * User space didn't support rpcbind v4, so retry this * registration request with the legacy rpcbind v2 protocol. */ if (error == -EPROTONOSUPPORT) error = rpcb_register(net, program, version, protocol, port); return error; } #if IS_ENABLED(CONFIG_IPV6) /* * Register an "inet6" protocol family netid with the local * rpcbind daemon via an rpcbind v4 SET request. * * No netconfig infrastructure is available in the kernel, so * we map IP_ protocol numbers to netids by hand. 
* * Returns zero on success; a negative errno value is returned * if any error occurs. */ static int __svc_rpcb_register6(struct net *net, const u32 program, const u32 version, const unsigned short protocol, const unsigned short port) { const struct sockaddr_in6 sin6 = { .sin6_family = AF_INET6, .sin6_addr = IN6ADDR_ANY_INIT, .sin6_port = htons(port), }; const char *netid; int error; switch (protocol) { case IPPROTO_UDP: netid = RPCBIND_NETID_UDP6; break; case IPPROTO_TCP: netid = RPCBIND_NETID_TCP6; break; default: return -ENOPROTOOPT; } error = rpcb_v4_register(net, program, version, (const struct sockaddr *)&sin6, netid); /* * User space didn't support rpcbind version 4, so we won't * use a PF_INET6 listener. */ if (error == -EPROTONOSUPPORT) error = -EAFNOSUPPORT; return error; } #endif /* IS_ENABLED(CONFIG_IPV6) */ /* * Register a kernel RPC service via rpcbind version 4. * * Returns zero on success; a negative errno value is returned * if any error occurs. */ static int __svc_register(struct net *net, const char *progname, const u32 program, const u32 version, const int family, const unsigned short protocol, const unsigned short port) { int error = -EAFNOSUPPORT; switch (family) { case PF_INET: error = __svc_rpcb_register4(net, program, version, protocol, port); break; #if IS_ENABLED(CONFIG_IPV6) case PF_INET6: error = __svc_rpcb_register6(net, program, version, protocol, port); #endif } trace_svc_register(progname, version, family, protocol, port, error); return error; } static int svc_rpcbind_set_version(struct net *net, const struct svc_program *progp, u32 version, int family, unsigned short proto, unsigned short port) { return __svc_register(net, progp->pg_name, progp->pg_prog, version, family, proto, port); } int svc_generic_rpcbind_set(struct net *net, const struct svc_program *progp, u32 version, int family, unsigned short proto, unsigned short port) { const struct svc_version *vers = progp->pg_vers[version]; int error; if (vers == NULL) return 0; if (vers->vs_hidden) { trace_svc_noregister(progp->pg_name, version, proto, port, family, 0); return 0; } /* * Don't register a UDP port if we need congestion * control. */ if (vers->vs_need_cong_ctrl && proto == IPPROTO_UDP) return 0; error = svc_rpcbind_set_version(net, progp, version, family, proto, port); return (vers->vs_rpcb_optnl) ? 0 : error; } EXPORT_SYMBOL_GPL(svc_generic_rpcbind_set); /** * svc_register - register an RPC service with the local portmapper * @serv: svc_serv struct for the service to register * @net: net namespace for the service to register * @family: protocol family of service's listener socket * @proto: transport protocol number to advertise * @port: port to advertise * * Service is registered for any address in the passed-in protocol family */ int svc_register(const struct svc_serv *serv, struct net *net, const int family, const unsigned short proto, const unsigned short port) { unsigned int p, i; int error = 0; WARN_ON_ONCE(proto == 0 && port == 0); if (proto == 0 && port == 0) return -EINVAL; for (p = 0; p < serv->sv_nprogs; p++) { struct svc_program *progp = &serv->sv_programs[p]; for (i = 0; i < progp->pg_nvers; i++) { error = progp->pg_rpcbind_set(net, progp, i, family, proto, port); if (error < 0) { printk(KERN_WARNING "svc: failed to register " "%sv%u RPC service (errno %d).\n", progp->pg_name, i, -error); break; } } } return error; } /* * If user space is running rpcbind, it should take the v4 UNSET * and clear everything for this [program, version]. 
If user space * is running portmap, it will reject the v4 UNSET, but won't have * any "inet6" entries anyway. So a PMAP_UNSET should be sufficient * in this case to clear all existing entries for [program, version]. */ static void __svc_unregister(struct net *net, const u32 program, const u32 version, const char *progname) { int error; error = rpcb_v4_register(net, program, version, NULL, ""); /* * User space didn't support rpcbind v4, so retry this * request with the legacy rpcbind v2 protocol. */ if (error == -EPROTONOSUPPORT) error = rpcb_register(net, program, version, 0, 0); trace_svc_unregister(progname, version, error); } /* * All netids, bind addresses and ports registered for [program, version] * are removed from the local rpcbind database (if the service is not * hidden) to make way for a new instance of the service. * * The result of unregistration is reported via dprintk for those who want * verification of the result, but is otherwise not important. */ static void svc_unregister(const struct svc_serv *serv, struct net *net) { struct sighand_struct *sighand; unsigned long flags; unsigned int p, i; clear_thread_flag(TIF_SIGPENDING); for (p = 0; p < serv->sv_nprogs; p++) { struct svc_program *progp = &serv->sv_programs[p]; for (i = 0; i < progp->pg_nvers; i++) { if (progp->pg_vers[i] == NULL) continue; if (progp->pg_vers[i]->vs_hidden) continue; __svc_unregister(net, progp->pg_prog, i, progp->pg_name); } } rcu_read_lock(); sighand = rcu_dereference(current->sighand); spin_lock_irqsave(&sighand->siglock, flags); recalc_sigpending(); spin_unlock_irqrestore(&sighand->siglock, flags); rcu_read_unlock(); } /* * dprintk the given error with the address of the client that caused it. */ #if IS_ENABLED(CONFIG_SUNRPC_DEBUG) static __printf(2, 3) void svc_printk(struct svc_rqst *rqstp, const char *fmt, ...) { struct va_format vaf; va_list args; char buf[RPC_MAX_ADDRBUFLEN]; va_start(args, fmt); vaf.fmt = fmt; vaf.va = &args; dprintk("svc: %s: %pV", svc_print_addr(rqstp, buf, sizeof(buf)), &vaf); va_end(args); } #else static __printf(2,3) void svc_printk(struct svc_rqst *rqstp, const char *fmt, ...) {} #endif __be32 svc_generic_init_request(struct svc_rqst *rqstp, const struct svc_program *progp, struct svc_process_info *ret) { const struct svc_version *versp = NULL; /* compiler food */ const struct svc_procedure *procp = NULL; if (rqstp->rq_vers >= progp->pg_nvers ) goto err_bad_vers; versp = progp->pg_vers[rqstp->rq_vers]; if (!versp) goto err_bad_vers; /* * Some protocol versions (namely NFSv4) require some form of * congestion control. (See RFC 7530 section 3.1 paragraph 2) * In other words, UDP is not allowed. We mark those when setting * up the svc_xprt, and verify that here. * * The spec is not very clear about what error should be returned * when someone tries to access a server that is listening on UDP * for lower versions. RPC_PROG_MISMATCH seems to be the closest * fit. 
*/ if (versp->vs_need_cong_ctrl && rqstp->rq_xprt && !test_bit(XPT_CONG_CTRL, &rqstp->rq_xprt->xpt_flags)) goto err_bad_vers; if (rqstp->rq_proc >= versp->vs_nproc) goto err_bad_proc; rqstp->rq_procinfo = procp = &versp->vs_proc[rqstp->rq_proc]; /* Initialize storage for argp and resp */ memset(rqstp->rq_argp, 0, procp->pc_argzero); memset(rqstp->rq_resp, 0, procp->pc_ressize); /* Bump per-procedure stats counter */ this_cpu_inc(versp->vs_count[rqstp->rq_proc]); ret->dispatch = versp->vs_dispatch; return rpc_success; err_bad_vers: ret->mismatch.lovers = progp->pg_lovers; ret->mismatch.hivers = progp->pg_hivers; return rpc_prog_mismatch; err_bad_proc: return rpc_proc_unavail; } EXPORT_SYMBOL_GPL(svc_generic_init_request); /* * Common routine for processing the RPC request. */ static int svc_process_common(struct svc_rqst *rqstp) { struct xdr_stream *xdr = &rqstp->rq_res_stream; struct svc_program *progp = NULL; const struct svc_procedure *procp = NULL; struct svc_serv *serv = rqstp->rq_server; struct svc_process_info process; enum svc_auth_status auth_res; unsigned int aoffset; int pr, rc; __be32 *p; /* Will be turned off only when NFSv4 Sessions are used */ set_bit(RQ_USEDEFERRAL, &rqstp->rq_flags); clear_bit(RQ_DROPME, &rqstp->rq_flags); /* Construct the first words of the reply: */ svcxdr_init_encode(rqstp); xdr_stream_encode_be32(xdr, rqstp->rq_xid); xdr_stream_encode_be32(xdr, rpc_reply); p = xdr_inline_decode(&rqstp->rq_arg_stream, XDR_UNIT * 4); if (unlikely(!p)) goto err_short_len; if (*p++ != cpu_to_be32(RPC_VERSION)) goto err_bad_rpc; xdr_stream_encode_be32(xdr, rpc_msg_accepted); rqstp->rq_prog = be32_to_cpup(p++); rqstp->rq_vers = be32_to_cpup(p++); rqstp->rq_proc = be32_to_cpup(p); for (pr = 0; pr < serv->sv_nprogs; pr++) if (rqstp->rq_prog == serv->sv_programs[pr].pg_prog) progp = &serv->sv_programs[pr]; /* * Decode auth data, and add verifier to reply buffer. * We do this before anything else in order to get a decent * auth verifier. */ auth_res = svc_authenticate(rqstp); /* Also give the program a chance to reject this call: */ if (auth_res == SVC_OK && progp) auth_res = progp->pg_authenticate(rqstp); trace_svc_authenticate(rqstp, auth_res); switch (auth_res) { case SVC_OK: break; case SVC_GARBAGE: goto err_garbage_args; case SVC_SYSERR: goto err_system_err; case SVC_DENIED: goto err_bad_auth; case SVC_CLOSE: goto close; case SVC_DROP: goto dropit; case SVC_COMPLETE: goto sendit; default: pr_warn_once("Unexpected svc_auth_status (%d)\n", auth_res); goto err_system_err; } if (progp == NULL) goto err_bad_prog; switch (progp->pg_init_request(rqstp, progp, &process)) { case rpc_success: break; case rpc_prog_unavail: goto err_bad_prog; case rpc_prog_mismatch: goto err_bad_vers; case rpc_proc_unavail: goto err_bad_proc; } procp = rqstp->rq_procinfo; /* Should this check go into the dispatcher? */ if (!procp || !procp->pc_func) goto err_bad_proc; /* Syntactic check complete */ if (serv->sv_stats) serv->sv_stats->rpccnt++; trace_svc_process(rqstp, progp->pg_name); aoffset = xdr_stream_pos(xdr); /* un-reserve some of the out-queue now that we have a * better idea of reply size */ if (procp->pc_xdrressize) svc_reserve_auth(rqstp, procp->pc_xdrressize<<2); /* Call the function that processes the request. 
*/ rc = process.dispatch(rqstp); if (procp->pc_release) procp->pc_release(rqstp); xdr_finish_decode(xdr); if (!rc) goto dropit; if (rqstp->rq_auth_stat != rpc_auth_ok) goto err_bad_auth; if (*rqstp->rq_accept_statp != rpc_success) xdr_truncate_encode(xdr, aoffset); if (procp->pc_encode == NULL) goto dropit; sendit: if (svc_authorise(rqstp)) goto close_xprt; return 1; /* Caller can now send it */ dropit: svc_authorise(rqstp); /* doesn't hurt to call this twice */ dprintk("svc: svc_process dropit\n"); return 0; close: svc_authorise(rqstp); close_xprt: if (rqstp->rq_xprt && test_bit(XPT_TEMP, &rqstp->rq_xprt->xpt_flags)) svc_xprt_close(rqstp->rq_xprt); dprintk("svc: svc_process close\n"); return 0; err_short_len: svc_printk(rqstp, "short len %u, dropping request\n", rqstp->rq_arg.len); goto close_xprt; err_bad_rpc: if (serv->sv_stats) serv->sv_stats->rpcbadfmt++; xdr_stream_encode_u32(xdr, RPC_MSG_DENIED); xdr_stream_encode_u32(xdr, RPC_MISMATCH); /* Only RPCv2 supported */ xdr_stream_encode_u32(xdr, RPC_VERSION); xdr_stream_encode_u32(xdr, RPC_VERSION); return 1; /* don't wrap */ err_bad_auth: dprintk("svc: authentication failed (%d)\n", be32_to_cpu(rqstp->rq_auth_stat)); if (serv->sv_stats) serv->sv_stats->rpcbadauth++; /* Restore write pointer to location of reply status: */ xdr_truncate_encode(xdr, XDR_UNIT * 2); xdr_stream_encode_u32(xdr, RPC_MSG_DENIED); xdr_stream_encode_u32(xdr, RPC_AUTH_ERROR); xdr_stream_encode_be32(xdr, rqstp->rq_auth_stat); goto sendit; err_bad_prog: dprintk("svc: unknown program %d\n", rqstp->rq_prog); if (serv->sv_stats) serv->sv_stats->rpcbadfmt++; *rqstp->rq_accept_statp = rpc_prog_unavail; goto sendit; err_bad_vers: svc_printk(rqstp, "unknown version (%d for prog %d, %s)\n", rqstp->rq_vers, rqstp->rq_prog, progp->pg_name); if (serv->sv_stats) serv->sv_stats->rpcbadfmt++; *rqstp->rq_accept_statp = rpc_prog_mismatch; /* * svc_authenticate() has already added the verifier and * advanced the stream just past rq_accept_statp. */ xdr_stream_encode_u32(xdr, process.mismatch.lovers); xdr_stream_encode_u32(xdr, process.mismatch.hivers); goto sendit; err_bad_proc: svc_printk(rqstp, "unknown procedure (%d)\n", rqstp->rq_proc); if (serv->sv_stats) serv->sv_stats->rpcbadfmt++; *rqstp->rq_accept_statp = rpc_proc_unavail; goto sendit; err_garbage_args: svc_printk(rqstp, "failed to decode RPC header\n"); if (serv->sv_stats) serv->sv_stats->rpcbadfmt++; *rqstp->rq_accept_statp = rpc_garbage_args; goto sendit; err_system_err: if (serv->sv_stats) serv->sv_stats->rpcbadfmt++; *rqstp->rq_accept_statp = rpc_system_err; goto sendit; } /* * Drop request */ static void svc_drop(struct svc_rqst *rqstp) { trace_svc_drop(rqstp); } /** * svc_process - Execute one RPC transaction * @rqstp: RPC transaction context * */ void svc_process(struct svc_rqst *rqstp) { struct kvec *resv = &rqstp->rq_res.head[0]; __be32 *p; #if IS_ENABLED(CONFIG_FAIL_SUNRPC) if (!fail_sunrpc.ignore_server_disconnect && should_fail(&fail_sunrpc.attr, 1)) svc_xprt_deferred_close(rqstp->rq_xprt); #endif /* * Setup response xdr_buf. 
* Initially it has just one page */ rqstp->rq_next_page = &rqstp->rq_respages[1]; resv->iov_base = page_address(rqstp->rq_respages[0]); resv->iov_len = 0; rqstp->rq_res.pages = rqstp->rq_next_page; rqstp->rq_res.len = 0; rqstp->rq_res.page_base = 0; rqstp->rq_res.page_len = 0; rqstp->rq_res.buflen = PAGE_SIZE; rqstp->rq_res.tail[0].iov_base = NULL; rqstp->rq_res.tail[0].iov_len = 0; svcxdr_init_decode(rqstp); p = xdr_inline_decode(&rqstp->rq_arg_stream, XDR_UNIT * 2); if (unlikely(!p)) goto out_drop; rqstp->rq_xid = *p++; if (unlikely(*p != rpc_call)) goto out_baddir; if (!svc_process_common(rqstp)) goto out_drop; svc_send(rqstp); return; out_baddir: svc_printk(rqstp, "bad direction 0x%08x, dropping request\n", be32_to_cpu(*p)); if (rqstp->rq_server->sv_stats) rqstp->rq_server->sv_stats->rpcbadfmt++; out_drop: svc_drop(rqstp); } #if defined(CONFIG_SUNRPC_BACKCHANNEL) /** * svc_process_bc - process a reverse-direction RPC request * @req: RPC request to be used for client-side processing * @rqstp: server-side execution context * */ void svc_process_bc(struct rpc_rqst *req, struct svc_rqst *rqstp) { struct rpc_timeout timeout = { .to_increment = 0, }; struct rpc_task *task; int proc_error; /* Build the svc_rqst used by the common processing routine */ rqstp->rq_xid = req->rq_xid; rqstp->rq_prot = req->rq_xprt->prot; rqstp->rq_bc_net = req->rq_xprt->xprt_net; rqstp->rq_addrlen = sizeof(req->rq_xprt->addr); memcpy(&rqstp->rq_addr, &req->rq_xprt->addr, rqstp->rq_addrlen); memcpy(&rqstp->rq_arg, &req->rq_rcv_buf, sizeof(rqstp->rq_arg)); memcpy(&rqstp->rq_res, &req->rq_snd_buf, sizeof(rqstp->rq_res)); /* Adjust the argument buffer length */ rqstp->rq_arg.len = req->rq_private_buf.len; if (rqstp->rq_arg.len <= rqstp->rq_arg.head[0].iov_len) { rqstp->rq_arg.head[0].iov_len = rqstp->rq_arg.len; rqstp->rq_arg.page_len = 0; } else if (rqstp->rq_arg.len <= rqstp->rq_arg.head[0].iov_len + rqstp->rq_arg.page_len) rqstp->rq_arg.page_len = rqstp->rq_arg.len - rqstp->rq_arg.head[0].iov_len; else rqstp->rq_arg.len = rqstp->rq_arg.head[0].iov_len + rqstp->rq_arg.page_len; /* Reset the response buffer */ rqstp->rq_res.head[0].iov_len = 0; /* * Skip the XID and calldir fields because they've already * been processed by the caller. */ svcxdr_init_decode(rqstp); if (!xdr_inline_decode(&rqstp->rq_arg_stream, XDR_UNIT * 2)) return; /* Parse and execute the bc call */ proc_error = svc_process_common(rqstp); atomic_dec(&req->rq_xprt->bc_slot_count); if (!proc_error) { /* Processing error: drop the request */ xprt_free_bc_request(req); return; } /* Finally, send the reply synchronously */ if (rqstp->bc_to_initval > 0) { timeout.to_initval = rqstp->bc_to_initval; timeout.to_retries = rqstp->bc_to_retries; } else { timeout.to_initval = req->rq_xprt->timeout->to_initval; timeout.to_retries = req->rq_xprt->timeout->to_retries; } timeout.to_maxval = timeout.to_initval; memcpy(&req->rq_snd_buf, &rqstp->rq_res, sizeof(req->rq_snd_buf)); task = rpc_run_bc_task(req, &timeout); if (IS_ERR(task)) return; WARN_ON_ONCE(atomic_read(&task->tk_count) != 1); rpc_put_task(task); } #endif /* CONFIG_SUNRPC_BACKCHANNEL */ /** * svc_max_payload - Return transport-specific limit on the RPC payload * @rqstp: RPC transaction context * * Returns the maximum number of payload bytes the current transport * allows. 
*/ u32 svc_max_payload(const struct svc_rqst *rqstp) { u32 max = rqstp->rq_xprt->xpt_class->xcl_max_payload; if (rqstp->rq_server->sv_max_payload < max) max = rqstp->rq_server->sv_max_payload; return max; } EXPORT_SYMBOL_GPL(svc_max_payload); /** * svc_proc_name - Return RPC procedure name in string form * @rqstp: svc_rqst to operate on * * Return value: * Pointer to a NUL-terminated string */ const char *svc_proc_name(const struct svc_rqst *rqstp) { if (rqstp && rqstp->rq_procinfo) return rqstp->rq_procinfo->pc_name; return "unknown"; } /** * svc_encode_result_payload - mark a range of bytes as a result payload * @rqstp: svc_rqst to operate on * @offset: payload's byte offset in rqstp->rq_res * @length: size of payload, in bytes * * Returns zero on success, or a negative errno if a permanent * error occurred. */ int svc_encode_result_payload(struct svc_rqst *rqstp, unsigned int offset, unsigned int length) { return rqstp->rq_xprt->xpt_ops->xpo_result_payload(rqstp, offset, length); } EXPORT_SYMBOL_GPL(svc_encode_result_payload); /** * svc_fill_write_vector - Construct data argument for VFS write call * @rqstp: svc_rqst to operate on * @payload: xdr_buf containing only the write data payload * * Fills in rqstp::rq_vec, and returns the number of elements. */ unsigned int svc_fill_write_vector(struct svc_rqst *rqstp, struct xdr_buf *payload) { struct page **pages = payload->pages; struct kvec *first = payload->head; struct kvec *vec = rqstp->rq_vec; size_t total = payload->len; unsigned int i; /* Some types of transport can present the write payload * entirely in rq_arg.pages. In this case, @first is empty. */ i = 0; if (first->iov_len) { vec[i].iov_base = first->iov_base; vec[i].iov_len = min_t(size_t, total, first->iov_len); total -= vec[i].iov_len; ++i; } while (total) { vec[i].iov_base = page_address(*pages); vec[i].iov_len = min_t(size_t, total, PAGE_SIZE); total -= vec[i].iov_len; ++i; ++pages; } WARN_ON_ONCE(i > ARRAY_SIZE(rqstp->rq_vec)); return i; } EXPORT_SYMBOL_GPL(svc_fill_write_vector); /** * svc_fill_symlink_pathname - Construct pathname argument for VFS symlink call * @rqstp: svc_rqst to operate on * @first: buffer containing first section of pathname * @p: buffer containing remaining section of pathname * @total: total length of the pathname argument * * The VFS symlink API demands a NUL-terminated pathname in mapped memory. * Returns pointer to a NUL-terminated string, or an ERR_PTR. Caller must free * the returned string. */ char *svc_fill_symlink_pathname(struct svc_rqst *rqstp, struct kvec *first, void *p, size_t total) { size_t len, remaining; char *result, *dst; result = kmalloc(total + 1, GFP_KERNEL); if (!result) return ERR_PTR(-ESERVERFAULT); dst = result; remaining = total; len = min_t(size_t, total, first->iov_len); if (len) { memcpy(dst, first->iov_base, len); dst += len; remaining -= len; } if (remaining) { len = min_t(size_t, remaining, PAGE_SIZE); memcpy(dst, p, len); dst += len; } *dst = '\0'; /* Sanity check: Linux doesn't allow the pathname argument to * contain a NUL byte. */ if (strlen(result) != total) { kfree(result); return ERR_PTR(-EINVAL); } return result; } EXPORT_SYMBOL_GPL(svc_fill_symlink_pathname); |
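/*
 * Illustrative sketch, not part of the original file: the create / resize /
 * destroy sequence a kernel RPC server (nfsd, lockd, ...) typically drives
 * through svc_create_pooled(), svc_set_num_threads() and svc_destroy() shown
 * above. The program definition, thread function body, buffer size and
 * thread count below are placeholders invented for this example, and the
 * caller is assumed to provide the mutual exclusion that
 * svc_set_num_threads() requires.
 */
static int example_svc_threadfn(void *data)
{
	/*
	 * A real per-pool worker would loop here, receiving and dispatching
	 * requests for the svc_rqst passed in @data until
	 * svc_thread_should_stop() tells it to exit.
	 */
	return 0;
}

static int example_start_rpc_service(struct svc_program *prog)
{
	struct svc_serv *serv;
	int err;

	/* One program, no stats collection, 64KB maximum payload. */
	serv = svc_create_pooled(prog, 1, NULL, 64 * 1024, example_svc_threadfn);
	if (!serv)
		return -ENOMEM;

	/* Ask for eight worker threads, spread round-robin over all pools. */
	err = svc_set_num_threads(serv, NULL, 8);
	if (err < 0)
		svc_destroy(&serv);
	return err;
}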
/* * Copyright (C) 2014 Red Hat * Copyright (C) 2014 Intel Corp. * Copyright (C) 2018 Intel Corp. * Copyright (c) 2020, The Linux Foundation. All rights reserved. 
* * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the "Software"), * to deal in the Software without restriction, including without limitation * the rights to use, copy, modify, merge, publish, distribute, sublicense, * and/or sell copies of the Software, and to permit persons to whom the * Software is furnished to do so, subject to the following conditions: * * The above copyright notice and this permission notice shall be included in * all copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR * OTHER DEALINGS IN THE SOFTWARE. * * Authors: * Rob Clark <robdclark@gmail.com> * Daniel Vetter <daniel.vetter@ffwll.ch> */ #include <drm/drm_atomic_uapi.h> #include <drm/drm_atomic.h> #include <drm/drm_framebuffer.h> #include <drm/drm_print.h> #include <drm/drm_drv.h> #include <drm/drm_writeback.h> #include <drm/drm_vblank.h> #include <linux/dma-fence.h> #include <linux/uaccess.h> #include <linux/sync_file.h> #include <linux/file.h> #include "drm_crtc_internal.h" /** * DOC: overview * * This file contains the marshalling and demarshalling glue for the atomic UAPI * in all its forms: The monster ATOMIC IOCTL itself, code for GET_PROPERTY and * SET_PROPERTY IOCTLs. Plus interface functions for compatibility helpers and * drivers which have special needs to construct their own atomic updates, e.g. * for load detect or similar. */ /** * drm_atomic_set_mode_for_crtc - set mode for CRTC * @state: the CRTC whose incoming state to update * @mode: kernel-internal mode to use for the CRTC, or NULL to disable * * Set a mode (originating from the kernel) on the desired CRTC state and update * the enable property. * * RETURNS: * Zero on success, error code on failure. Cannot return -EDEADLK. */ int drm_atomic_set_mode_for_crtc(struct drm_crtc_state *state, const struct drm_display_mode *mode) { struct drm_crtc *crtc = state->crtc; struct drm_mode_modeinfo umode; /* Early return for no change. */ if (mode && memcmp(&state->mode, mode, sizeof(*mode)) == 0) return 0; drm_property_blob_put(state->mode_blob); state->mode_blob = NULL; if (mode) { struct drm_property_blob *blob; drm_mode_convert_to_umode(&umode, mode); blob = drm_property_create_blob(crtc->dev, sizeof(umode), &umode); if (IS_ERR(blob)) return PTR_ERR(blob); drm_mode_copy(&state->mode, mode); state->mode_blob = blob; state->enable = true; drm_dbg_atomic(crtc->dev, "Set [MODE:%s] for [CRTC:%d:%s] state %p\n", mode->name, crtc->base.id, crtc->name, state); } else { memset(&state->mode, 0, sizeof(state->mode)); state->enable = false; drm_dbg_atomic(crtc->dev, "Set [NOMODE] for [CRTC:%d:%s] state %p\n", crtc->base.id, crtc->name, state); } return 0; } EXPORT_SYMBOL(drm_atomic_set_mode_for_crtc); /** * drm_atomic_set_mode_prop_for_crtc - set mode for CRTC * @state: the CRTC whose incoming state to update * @blob: pointer to blob property to use for mode * * Set a mode (originating from a blob property) on the desired CRTC state. 
* This function will take a reference on the blob property for the CRTC state, * and release the reference held on the state's existing mode property, if any * was set. * * RETURNS: * Zero on success, error code on failure. Cannot return -EDEADLK. */ int drm_atomic_set_mode_prop_for_crtc(struct drm_crtc_state *state, struct drm_property_blob *blob) { struct drm_crtc *crtc = state->crtc; if (blob == state->mode_blob) return 0; drm_property_blob_put(state->mode_blob); state->mode_blob = NULL; memset(&state->mode, 0, sizeof(state->mode)); if (blob) { int ret; if (blob->length != sizeof(struct drm_mode_modeinfo)) { drm_dbg_atomic(crtc->dev, "[CRTC:%d:%s] bad mode blob length: %zu\n", crtc->base.id, crtc->name, blob->length); return -EINVAL; } ret = drm_mode_convert_umode(crtc->dev, &state->mode, blob->data); if (ret) { drm_dbg_atomic(crtc->dev, "[CRTC:%d:%s] invalid mode (%s, %pe): " DRM_MODE_FMT "\n", crtc->base.id, crtc->name, drm_get_mode_status_name(state->mode.status), ERR_PTR(ret), DRM_MODE_ARG(&state->mode)); return -EINVAL; } state->mode_blob = drm_property_blob_get(blob); state->enable = true; drm_dbg_atomic(crtc->dev, "Set [MODE:%s] for [CRTC:%d:%s] state %p\n", state->mode.name, crtc->base.id, crtc->name, state); } else { state->enable = false; drm_dbg_atomic(crtc->dev, "Set [NOMODE] for [CRTC:%d:%s] state %p\n", crtc->base.id, crtc->name, state); } return 0; } EXPORT_SYMBOL(drm_atomic_set_mode_prop_for_crtc); /** * drm_atomic_set_crtc_for_plane - set CRTC for plane * @plane_state: the plane whose incoming state to update * @crtc: CRTC to use for the plane * * Changing the assigned CRTC for a plane requires us to grab the lock and state * for the new CRTC, as needed. This function takes care of all these details * besides updating the pointer in the state object itself. * * Returns: * 0 on success or can fail with -EDEADLK or -ENOMEM. When the error is EDEADLK * then the w/w mutex code has detected a deadlock and the entire atomic * sequence must be restarted. All other errors are fatal. */ int drm_atomic_set_crtc_for_plane(struct drm_plane_state *plane_state, struct drm_crtc *crtc) { struct drm_plane *plane = plane_state->plane; struct drm_crtc_state *crtc_state; /* Nothing to do for same crtc*/ if (plane_state->crtc == crtc) return 0; if (plane_state->crtc) { crtc_state = drm_atomic_get_crtc_state(plane_state->state, plane_state->crtc); if (WARN_ON(IS_ERR(crtc_state))) return PTR_ERR(crtc_state); crtc_state->plane_mask &= ~drm_plane_mask(plane); } plane_state->crtc = crtc; if (crtc) { crtc_state = drm_atomic_get_crtc_state(plane_state->state, crtc); if (IS_ERR(crtc_state)) return PTR_ERR(crtc_state); crtc_state->plane_mask |= drm_plane_mask(plane); } if (crtc) drm_dbg_atomic(plane->dev, "Link [PLANE:%d:%s] state %p to [CRTC:%d:%s]\n", plane->base.id, plane->name, plane_state, crtc->base.id, crtc->name); else drm_dbg_atomic(plane->dev, "Link [PLANE:%d:%s] state %p to [NOCRTC]\n", plane->base.id, plane->name, plane_state); return 0; } EXPORT_SYMBOL(drm_atomic_set_crtc_for_plane); /** * drm_atomic_set_fb_for_plane - set framebuffer for plane * @plane_state: atomic state object for the plane * @fb: fb to use for the plane * * Changing the assigned framebuffer for a plane requires us to grab a reference * to the new fb and drop the reference to the old fb, if there is one. This * function takes care of all these details besides updating the pointer in the * state object itself. 
*/ void drm_atomic_set_fb_for_plane(struct drm_plane_state *plane_state, struct drm_framebuffer *fb) { struct drm_plane *plane = plane_state->plane; if (fb) drm_dbg_atomic(plane->dev, "Set [FB:%d] for [PLANE:%d:%s] state %p\n", fb->base.id, plane->base.id, plane->name, plane_state); else drm_dbg_atomic(plane->dev, "Set [NOFB] for [PLANE:%d:%s] state %p\n", plane->base.id, plane->name, plane_state); drm_framebuffer_assign(&plane_state->fb, fb); } EXPORT_SYMBOL(drm_atomic_set_fb_for_plane); /** * drm_atomic_set_crtc_for_connector - set CRTC for connector * @conn_state: atomic state object for the connector * @crtc: CRTC to use for the connector * * Changing the assigned CRTC for a connector requires us to grab the lock and * state for the new CRTC, as needed. This function takes care of all these * details besides updating the pointer in the state object itself. * * Returns: * 0 on success or can fail with -EDEADLK or -ENOMEM. When the error is EDEADLK * then the w/w mutex code has detected a deadlock and the entire atomic * sequence must be restarted. All other errors are fatal. */ int drm_atomic_set_crtc_for_connector(struct drm_connector_state *conn_state, struct drm_crtc *crtc) { struct drm_connector *connector = conn_state->connector; struct drm_crtc_state *crtc_state; if (conn_state->crtc == crtc) return 0; if (conn_state->crtc) { crtc_state = drm_atomic_get_new_crtc_state(conn_state->state, conn_state->crtc); crtc_state->connector_mask &= ~drm_connector_mask(conn_state->connector); drm_connector_put(conn_state->connector); conn_state->crtc = NULL; } if (crtc) { crtc_state = drm_atomic_get_crtc_state(conn_state->state, crtc); if (IS_ERR(crtc_state)) return PTR_ERR(crtc_state); crtc_state->connector_mask |= drm_connector_mask(conn_state->connector); drm_connector_get(conn_state->connector); conn_state->crtc = crtc; drm_dbg_atomic(connector->dev, "Link [CONNECTOR:%d:%s] state %p to [CRTC:%d:%s]\n", connector->base.id, connector->name, conn_state, crtc->base.id, crtc->name); } else { drm_dbg_atomic(connector->dev, "Link [CONNECTOR:%d:%s] state %p to [NOCRTC]\n", connector->base.id, connector->name, conn_state); } return 0; } EXPORT_SYMBOL(drm_atomic_set_crtc_for_connector); static void set_out_fence_for_crtc(struct drm_atomic_state *state, struct drm_crtc *crtc, s32 __user *fence_ptr) { state->crtcs[drm_crtc_index(crtc)].out_fence_ptr = fence_ptr; } static s32 __user *get_out_fence_for_crtc(struct drm_atomic_state *state, struct drm_crtc *crtc) { s32 __user *fence_ptr; fence_ptr = state->crtcs[drm_crtc_index(crtc)].out_fence_ptr; state->crtcs[drm_crtc_index(crtc)].out_fence_ptr = NULL; return fence_ptr; } static int set_out_fence_for_connector(struct drm_atomic_state *state, struct drm_connector *connector, s32 __user *fence_ptr) { unsigned int index = drm_connector_index(connector); if (!fence_ptr) return 0; if (put_user(-1, fence_ptr)) return -EFAULT; state->connectors[index].out_fence_ptr = fence_ptr; return 0; } static s32 __user *get_out_fence_for_connector(struct drm_atomic_state *state, struct drm_connector *connector) { unsigned int index = drm_connector_index(connector); s32 __user *fence_ptr; fence_ptr = state->connectors[index].out_fence_ptr; state->connectors[index].out_fence_ptr = NULL; return fence_ptr; } static int drm_atomic_crtc_set_property(struct drm_crtc *crtc, struct drm_crtc_state *state, struct drm_property *property, uint64_t val) { struct drm_device *dev = crtc->dev; struct drm_mode_config *config = &dev->mode_config; bool replaced = false; int ret; if (property 
== config->prop_active) state->active = val; else if (property == config->prop_mode_id) { struct drm_property_blob *mode = drm_property_lookup_blob(dev, val); ret = drm_atomic_set_mode_prop_for_crtc(state, mode); drm_property_blob_put(mode); return ret; } else if (property == config->prop_vrr_enabled) { state->vrr_enabled = val; } else if (property == config->degamma_lut_property) { ret = drm_property_replace_blob_from_id(dev, &state->degamma_lut, val, -1, sizeof(struct drm_color_lut), &replaced); state->color_mgmt_changed |= replaced; return ret; } else if (property == config->ctm_property) { ret = drm_property_replace_blob_from_id(dev, &state->ctm, val, sizeof(struct drm_color_ctm), -1, &replaced); state->color_mgmt_changed |= replaced; return ret; } else if (property == config->gamma_lut_property) { ret = drm_property_replace_blob_from_id(dev, &state->gamma_lut, val, -1, sizeof(struct drm_color_lut), &replaced); state->color_mgmt_changed |= replaced; return ret; } else if (property == config->prop_out_fence_ptr) { s32 __user *fence_ptr = u64_to_user_ptr(val); if (!fence_ptr) return 0; if (put_user(-1, fence_ptr)) return -EFAULT; set_out_fence_for_crtc(state->state, crtc, fence_ptr); } else if (property == crtc->scaling_filter_property) { state->scaling_filter = val; } else if (crtc->funcs->atomic_set_property) { return crtc->funcs->atomic_set_property(crtc, state, property, val); } else { drm_dbg_atomic(crtc->dev, "[CRTC:%d:%s] unknown property [PROP:%d:%s]\n", crtc->base.id, crtc->name, property->base.id, property->name); return -EINVAL; } return 0; } static int drm_atomic_crtc_get_property(struct drm_crtc *crtc, const struct drm_crtc_state *state, struct drm_property *property, uint64_t *val) { struct drm_device *dev = crtc->dev; struct drm_mode_config *config = &dev->mode_config; if (property == config->prop_active) *val = drm_atomic_crtc_effectively_active(state); else if (property == config->prop_mode_id) *val = (state->mode_blob) ? state->mode_blob->base.id : 0; else if (property == config->prop_vrr_enabled) *val = state->vrr_enabled; else if (property == config->degamma_lut_property) *val = (state->degamma_lut) ? state->degamma_lut->base.id : 0; else if (property == config->ctm_property) *val = (state->ctm) ? state->ctm->base.id : 0; else if (property == config->gamma_lut_property) *val = (state->gamma_lut) ? 
state->gamma_lut->base.id : 0; else if (property == config->prop_out_fence_ptr) *val = 0; else if (property == crtc->scaling_filter_property) *val = state->scaling_filter; else if (crtc->funcs->atomic_get_property) return crtc->funcs->atomic_get_property(crtc, state, property, val); else { drm_dbg_atomic(dev, "[CRTC:%d:%s] unknown property [PROP:%d:%s]\n", crtc->base.id, crtc->name, property->base.id, property->name); return -EINVAL; } return 0; } static int drm_atomic_plane_set_property(struct drm_plane *plane, struct drm_plane_state *state, struct drm_file *file_priv, struct drm_property *property, uint64_t val) { struct drm_device *dev = plane->dev; struct drm_mode_config *config = &dev->mode_config; bool replaced = false; int ret; if (property == config->prop_fb_id) { struct drm_framebuffer *fb; fb = drm_framebuffer_lookup(dev, file_priv, val); drm_atomic_set_fb_for_plane(state, fb); if (fb) drm_framebuffer_put(fb); } else if (property == config->prop_in_fence_fd) { if (state->fence) return -EINVAL; if (U642I64(val) == -1) return 0; state->fence = sync_file_get_fence(val); if (!state->fence) return -EINVAL; } else if (property == config->prop_crtc_id) { struct drm_crtc *crtc = drm_crtc_find(dev, file_priv, val); if (val && !crtc) { drm_dbg_atomic(dev, "[PROP:%d:%s] cannot find CRTC with ID %llu\n", property->base.id, property->name, val); return -EACCES; } return drm_atomic_set_crtc_for_plane(state, crtc); } else if (property == config->prop_crtc_x) { state->crtc_x = U642I64(val); } else if (property == config->prop_crtc_y) { state->crtc_y = U642I64(val); } else if (property == config->prop_crtc_w) { state->crtc_w = val; } else if (property == config->prop_crtc_h) { state->crtc_h = val; } else if (property == config->prop_src_x) { state->src_x = val; } else if (property == config->prop_src_y) { state->src_y = val; } else if (property == config->prop_src_w) { state->src_w = val; } else if (property == config->prop_src_h) { state->src_h = val; } else if (property == plane->alpha_property) { state->alpha = val; } else if (property == plane->blend_mode_property) { state->pixel_blend_mode = val; } else if (property == plane->rotation_property) { if (!is_power_of_2(val & DRM_MODE_ROTATE_MASK)) { drm_dbg_atomic(plane->dev, "[PLANE:%d:%s] bad rotation bitmask: 0x%llx\n", plane->base.id, plane->name, val); return -EINVAL; } state->rotation = val; } else if (property == plane->zpos_property) { state->zpos = val; } else if (property == plane->color_encoding_property) { state->color_encoding = val; } else if (property == plane->color_range_property) { state->color_range = val; } else if (property == config->prop_fb_damage_clips) { ret = drm_property_replace_blob_from_id(dev, &state->fb_damage_clips, val, -1, sizeof(struct drm_mode_rect), &replaced); return ret; } else if (property == plane->scaling_filter_property) { state->scaling_filter = val; } else if (plane->funcs->atomic_set_property) { return plane->funcs->atomic_set_property(plane, state, property, val); } else if (property == plane->hotspot_x_property) { if (plane->type != DRM_PLANE_TYPE_CURSOR) { drm_dbg_atomic(plane->dev, "[PLANE:%d:%s] is not a cursor plane: 0x%llx\n", plane->base.id, plane->name, val); return -EINVAL; } state->hotspot_x = val; } else if (property == plane->hotspot_y_property) { if (plane->type != DRM_PLANE_TYPE_CURSOR) { drm_dbg_atomic(plane->dev, "[PLANE:%d:%s] is not a cursor plane: 0x%llx\n", plane->base.id, plane->name, val); return -EINVAL; } state->hotspot_y = val; } else { drm_dbg_atomic(plane->dev, 
"[PLANE:%d:%s] unknown property [PROP:%d:%s]\n", plane->base.id, plane->name, property->base.id, property->name); return -EINVAL; } return 0; } static int drm_atomic_plane_get_property(struct drm_plane *plane, const struct drm_plane_state *state, struct drm_property *property, uint64_t *val) { struct drm_device *dev = plane->dev; struct drm_mode_config *config = &dev->mode_config; if (property == config->prop_fb_id) { *val = (state->fb) ? state->fb->base.id : 0; } else if (property == config->prop_in_fence_fd) { *val = -1; } else if (property == config->prop_crtc_id) { *val = (state->crtc) ? state->crtc->base.id : 0; } else if (property == config->prop_crtc_x) { *val = I642U64(state->crtc_x); } else if (property == config->prop_crtc_y) { *val = I642U64(state->crtc_y); } else if (property == config->prop_crtc_w) { *val = state->crtc_w; } else if (property == config->prop_crtc_h) { *val = state->crtc_h; } else if (property == config->prop_src_x) { *val = state->src_x; } else if (property == config->prop_src_y) { *val = state->src_y; } else if (property == config->prop_src_w) { *val = state->src_w; } else if (property == config->prop_src_h) { *val = state->src_h; } else if (property == plane->alpha_property) { *val = state->alpha; } else if (property == plane->blend_mode_property) { *val = state->pixel_blend_mode; } else if (property == plane->rotation_property) { *val = state->rotation; } else if (property == plane->zpos_property) { *val = state->zpos; } else if (property == plane->color_encoding_property) { *val = state->color_encoding; } else if (property == plane->color_range_property) { *val = state->color_range; } else if (property == config->prop_fb_damage_clips) { *val = (state->fb_damage_clips) ? state->fb_damage_clips->base.id : 0; } else if (property == plane->scaling_filter_property) { *val = state->scaling_filter; } else if (plane->funcs->atomic_get_property) { return plane->funcs->atomic_get_property(plane, state, property, val); } else if (property == plane->hotspot_x_property) { *val = state->hotspot_x; } else if (property == plane->hotspot_y_property) { *val = state->hotspot_y; } else { drm_dbg_atomic(dev, "[PLANE:%d:%s] unknown property [PROP:%d:%s]\n", plane->base.id, plane->name, property->base.id, property->name); return -EINVAL; } return 0; } static int drm_atomic_set_writeback_fb_for_connector( struct drm_connector_state *conn_state, struct drm_framebuffer *fb) { int ret; struct drm_connector *conn = conn_state->connector; ret = drm_writeback_set_fb(conn_state, fb); if (ret < 0) return ret; if (fb) drm_dbg_atomic(conn->dev, "Set [FB:%d] for connector state %p\n", fb->base.id, conn_state); else drm_dbg_atomic(conn->dev, "Set [NOFB] for connector state %p\n", conn_state); return 0; } static int drm_atomic_connector_set_property(struct drm_connector *connector, struct drm_connector_state *state, struct drm_file *file_priv, struct drm_property *property, uint64_t val) { struct drm_device *dev = connector->dev; struct drm_mode_config *config = &dev->mode_config; bool replaced = false; int ret; if (property == config->prop_crtc_id) { struct drm_crtc *crtc = drm_crtc_find(dev, file_priv, val); if (val && !crtc) { drm_dbg_atomic(dev, "[PROP:%d:%s] cannot find CRTC with ID %llu\n", property->base.id, property->name, val); return -EACCES; } return drm_atomic_set_crtc_for_connector(state, crtc); } else if (property == config->dpms_property) { /* setting DPMS property requires special handling, which * is done in legacy setprop path for us. Disallow (for * now?) 
atomic writes to DPMS property: */ drm_dbg_atomic(dev, "legacy [PROP:%d:%s] can only be set via legacy uAPI\n", property->base.id, property->name); return -EINVAL; } else if (property == config->tv_select_subconnector_property) { state->tv.select_subconnector = val; } else if (property == config->tv_subconnector_property) { state->tv.subconnector = val; } else if (property == config->tv_left_margin_property) { state->tv.margins.left = val; } else if (property == config->tv_right_margin_property) { state->tv.margins.right = val; } else if (property == config->tv_top_margin_property) { state->tv.margins.top = val; } else if (property == config->tv_bottom_margin_property) { state->tv.margins.bottom = val; } else if (property == config->legacy_tv_mode_property) { state->tv.legacy_mode = val; } else if (property == config->tv_mode_property) { state->tv.mode = val; } else if (property == config->tv_brightness_property) { state->tv.brightness = val; } else if (property == config->tv_contrast_property) { state->tv.contrast = val; } else if (property == config->tv_flicker_reduction_property) { state->tv.flicker_reduction = val; } else if (property == config->tv_overscan_property) { state->tv.overscan = val; } else if (property == config->tv_saturation_property) { state->tv.saturation = val; } else if (property == config->tv_hue_property) { state->tv.hue = val; } else if (property == config->link_status_property) { /* Never downgrade from GOOD to BAD on userspace's request here, * only hw issues can do that. * * For an atomic property the userspace doesn't need to be able * to understand all the properties, but needs to be able to * restore the state it wants on VT switch. So if the userspace * tries to change the link_status from GOOD to BAD, driver * silently rejects it and returns a 0. This prevents userspace * from accidentally breaking the display when it restores the * state. 
*/ if (state->link_status != DRM_LINK_STATUS_GOOD) state->link_status = val; } else if (property == config->hdr_output_metadata_property) { ret = drm_property_replace_blob_from_id(dev, &state->hdr_output_metadata, val, sizeof(struct hdr_output_metadata), -1, &replaced); return ret; } else if (property == config->aspect_ratio_property) { state->picture_aspect_ratio = val; } else if (property == config->content_type_property) { state->content_type = val; } else if (property == connector->scaling_mode_property) { state->scaling_mode = val; } else if (property == config->content_protection_property) { if (val == DRM_MODE_CONTENT_PROTECTION_ENABLED) { drm_dbg_kms(dev, "only drivers can set CP Enabled\n"); return -EINVAL; } state->content_protection = val; } else if (property == config->hdcp_content_type_property) { state->hdcp_content_type = val; } else if (property == connector->colorspace_property) { state->colorspace = val; } else if (property == config->writeback_fb_id_property) { struct drm_framebuffer *fb; int ret; fb = drm_framebuffer_lookup(dev, file_priv, val); ret = drm_atomic_set_writeback_fb_for_connector(state, fb); if (fb) drm_framebuffer_put(fb); return ret; } else if (property == config->writeback_out_fence_ptr_property) { s32 __user *fence_ptr = u64_to_user_ptr(val); return set_out_fence_for_connector(state->state, connector, fence_ptr); } else if (property == connector->max_bpc_property) { state->max_requested_bpc = val; } else if (property == connector->privacy_screen_sw_state_property) { state->privacy_screen_sw_state = val; } else if (property == connector->broadcast_rgb_property) { state->hdmi.broadcast_rgb = val; } else if (connector->funcs->atomic_set_property) { return connector->funcs->atomic_set_property(connector, state, property, val); } else { drm_dbg_atomic(connector->dev, "[CONNECTOR:%d:%s] unknown property [PROP:%d:%s]\n", connector->base.id, connector->name, property->base.id, property->name); return -EINVAL; } return 0; } static int drm_atomic_connector_get_property(struct drm_connector *connector, const struct drm_connector_state *state, struct drm_property *property, uint64_t *val) { struct drm_device *dev = connector->dev; struct drm_mode_config *config = &dev->mode_config; if (property == config->prop_crtc_id) { *val = (state->crtc) ? 
state->crtc->base.id : 0; } else if (property == config->dpms_property) { if (state->crtc && state->crtc->state->self_refresh_active) *val = DRM_MODE_DPMS_ON; else *val = connector->dpms; } else if (property == config->tv_select_subconnector_property) { *val = state->tv.select_subconnector; } else if (property == config->tv_subconnector_property) { *val = state->tv.subconnector; } else if (property == config->tv_left_margin_property) { *val = state->tv.margins.left; } else if (property == config->tv_right_margin_property) { *val = state->tv.margins.right; } else if (property == config->tv_top_margin_property) { *val = state->tv.margins.top; } else if (property == config->tv_bottom_margin_property) { *val = state->tv.margins.bottom; } else if (property == config->legacy_tv_mode_property) { *val = state->tv.legacy_mode; } else if (property == config->tv_mode_property) { *val = state->tv.mode; } else if (property == config->tv_brightness_property) { *val = state->tv.brightness; } else if (property == config->tv_contrast_property) { *val = state->tv.contrast; } else if (property == config->tv_flicker_reduction_property) { *val = state->tv.flicker_reduction; } else if (property == config->tv_overscan_property) { *val = state->tv.overscan; } else if (property == config->tv_saturation_property) { *val = state->tv.saturation; } else if (property == config->tv_hue_property) { *val = state->tv.hue; } else if (property == config->link_status_property) { *val = state->link_status; } else if (property == config->aspect_ratio_property) { *val = state->picture_aspect_ratio; } else if (property == config->content_type_property) { *val = state->content_type; } else if (property == connector->colorspace_property) { *val = state->colorspace; } else if (property == connector->scaling_mode_property) { *val = state->scaling_mode; } else if (property == config->hdr_output_metadata_property) { *val = state->hdr_output_metadata ? 
state->hdr_output_metadata->base.id : 0; } else if (property == config->content_protection_property) { *val = state->content_protection; } else if (property == config->hdcp_content_type_property) { *val = state->hdcp_content_type; } else if (property == config->writeback_fb_id_property) { /* Writeback framebuffer is one-shot, write and forget */ *val = 0; } else if (property == config->writeback_out_fence_ptr_property) { *val = 0; } else if (property == connector->max_bpc_property) { *val = state->max_requested_bpc; } else if (property == connector->privacy_screen_sw_state_property) { *val = state->privacy_screen_sw_state; } else if (property == connector->broadcast_rgb_property) { *val = state->hdmi.broadcast_rgb; } else if (connector->funcs->atomic_get_property) { return connector->funcs->atomic_get_property(connector, state, property, val); } else { drm_dbg_atomic(dev, "[CONNECTOR:%d:%s] unknown property [PROP:%d:%s]\n", connector->base.id, connector->name, property->base.id, property->name); return -EINVAL; } return 0; } int drm_atomic_get_property(struct drm_mode_object *obj, struct drm_property *property, uint64_t *val) { struct drm_device *dev = property->dev; int ret; switch (obj->type) { case DRM_MODE_OBJECT_CONNECTOR: { struct drm_connector *connector = obj_to_connector(obj); WARN_ON(!drm_modeset_is_locked(&dev->mode_config.connection_mutex)); ret = drm_atomic_connector_get_property(connector, connector->state, property, val); break; } case DRM_MODE_OBJECT_CRTC: { struct drm_crtc *crtc = obj_to_crtc(obj); WARN_ON(!drm_modeset_is_locked(&crtc->mutex)); ret = drm_atomic_crtc_get_property(crtc, crtc->state, property, val); break; } case DRM_MODE_OBJECT_PLANE: { struct drm_plane *plane = obj_to_plane(obj); WARN_ON(!drm_modeset_is_locked(&plane->mutex)); ret = drm_atomic_plane_get_property(plane, plane->state, property, val); break; } default: drm_dbg_atomic(dev, "[OBJECT:%d] has no properties\n", obj->id); ret = -EINVAL; break; } return ret; } /* * The big monster ioctl */ static struct drm_pending_vblank_event *create_vblank_event( struct drm_crtc *crtc, uint64_t user_data) { struct drm_pending_vblank_event *e = NULL; e = kzalloc(sizeof *e, GFP_KERNEL); if (!e) return NULL; e->event.base.type = DRM_EVENT_FLIP_COMPLETE; e->event.base.length = sizeof(e->event); e->event.vbl.crtc_id = crtc->base.id; e->event.vbl.user_data = user_data; return e; } int drm_atomic_connector_commit_dpms(struct drm_atomic_state *state, struct drm_connector *connector, int mode) { struct drm_connector *tmp_connector; struct drm_connector_state *new_conn_state; struct drm_crtc *crtc; struct drm_crtc_state *crtc_state; int i, ret, old_mode = connector->dpms; bool active = false; ret = drm_modeset_lock(&state->dev->mode_config.connection_mutex, state->acquire_ctx); if (ret) return ret; if (mode != DRM_MODE_DPMS_ON) mode = DRM_MODE_DPMS_OFF; connector->dpms = mode; crtc = connector->state->crtc; if (!crtc) goto out; ret = drm_atomic_add_affected_connectors(state, crtc); if (ret) goto out; crtc_state = drm_atomic_get_crtc_state(state, crtc); if (IS_ERR(crtc_state)) { ret = PTR_ERR(crtc_state); goto out; } for_each_new_connector_in_state(state, tmp_connector, new_conn_state, i) { if (new_conn_state->crtc != crtc) continue; if (tmp_connector->dpms == DRM_MODE_DPMS_ON) { active = true; break; } } crtc_state->active = active; ret = drm_atomic_commit(state); out: if (ret != 0) connector->dpms = old_mode; return ret; } static int drm_atomic_check_prop_changes(int ret, uint64_t old_val, uint64_t prop_value, struct 
drm_property *prop) { if (ret != 0 || old_val != prop_value) { drm_dbg_atomic(prop->dev, "[PROP:%d:%s] No prop can be changed during async flip\n", prop->base.id, prop->name); return -EINVAL; } return 0; } int drm_atomic_set_property(struct drm_atomic_state *state, struct drm_file *file_priv, struct drm_mode_object *obj, struct drm_property *prop, u64 prop_value, bool async_flip) { struct drm_mode_object *ref; u64 old_val; int ret; if (!drm_property_change_valid_get(prop, prop_value, &ref)) return -EINVAL; switch (obj->type) { case DRM_MODE_OBJECT_CONNECTOR: { struct drm_connector *connector = obj_to_connector(obj); struct drm_connector_state *connector_state; connector_state = drm_atomic_get_connector_state(state, connector); if (IS_ERR(connector_state)) { ret = PTR_ERR(connector_state); break; } if (async_flip) { ret = drm_atomic_connector_get_property(connector, connector_state, prop, &old_val); ret = drm_atomic_check_prop_changes(ret, old_val, prop_value, prop); break; } ret = drm_atomic_connector_set_property(connector, connector_state, file_priv, prop, prop_value); break; } case DRM_MODE_OBJECT_CRTC: { struct drm_crtc *crtc = obj_to_crtc(obj); struct drm_crtc_state *crtc_state; crtc_state = drm_atomic_get_crtc_state(state, crtc); if (IS_ERR(crtc_state)) { ret = PTR_ERR(crtc_state); break; } if (async_flip) { ret = drm_atomic_crtc_get_property(crtc, crtc_state, prop, &old_val); ret = drm_atomic_check_prop_changes(ret, old_val, prop_value, prop); break; } ret = drm_atomic_crtc_set_property(crtc, crtc_state, prop, prop_value); break; } case DRM_MODE_OBJECT_PLANE: { struct drm_plane *plane = obj_to_plane(obj); struct drm_plane_state *plane_state; struct drm_mode_config *config = &plane->dev->mode_config; plane_state = drm_atomic_get_plane_state(state, plane); if (IS_ERR(plane_state)) { ret = PTR_ERR(plane_state); break; } if (async_flip && (plane_state->plane->type != DRM_PLANE_TYPE_PRIMARY || (prop != config->prop_fb_id && prop != config->prop_in_fence_fd && prop != config->prop_fb_damage_clips))) { ret = drm_atomic_plane_get_property(plane, plane_state, prop, &old_val); ret = drm_atomic_check_prop_changes(ret, old_val, prop_value, prop); break; } ret = drm_atomic_plane_set_property(plane, plane_state, file_priv, prop, prop_value); break; } default: drm_dbg_atomic(prop->dev, "[OBJECT:%d] has no properties\n", obj->id); ret = -EINVAL; break; } drm_property_change_valid_put(prop, ref); return ret; } /** * DOC: explicit fencing properties * * Explicit fencing allows userspace to control the buffer synchronization * between devices. A Fence or a group of fences are transferred to/from * userspace using Sync File fds and there are two DRM properties for that. * IN_FENCE_FD on each DRM Plane to send fences to the kernel and * OUT_FENCE_PTR on each DRM CRTC to receive fences from the kernel. * * As a contrast, with implicit fencing the kernel keeps track of any * ongoing rendering, and automatically ensures that the atomic update waits * for any pending rendering to complete. This is usually tracked in &struct * dma_resv which can also contain mandatory kernel fences. Implicit syncing * is how Linux traditionally worked (e.g. DRI2/3 on X.org), whereas explicit * fencing is what Android wants. * * "IN_FENCE_FD”: * Use this property to pass a fence that DRM should wait on before * proceeding with the Atomic Commit request and show the framebuffer for * the plane on the screen. 
The fence can be either a normal fence or a * merged one, the sync_file framework will handle both cases and use a * fence_array if a merged fence is received. Passing -1 here means no * fences to wait on. * * If the Atomic Commit request has the DRM_MODE_ATOMIC_TEST_ONLY flag * it will only check if the Sync File is a valid one. * * On the driver side the fence is stored on the @fence parameter of * &struct drm_plane_state. Drivers which also support implicit fencing * should extract the implicit fence using drm_gem_plane_helper_prepare_fb(), * to make sure there's consistent behaviour between drivers in precedence * of implicit vs. explicit fencing. * * "OUT_FENCE_PTR”: * Use this property to pass a file descriptor pointer to DRM. Once the * Atomic Commit request call returns OUT_FENCE_PTR will be filled with * the file descriptor number of a Sync File. This Sync File contains the * CRTC fence that will be signaled when all framebuffers present on the * Atomic Commit * request for that given CRTC are scanned out on the * screen. * * The Atomic Commit request fails if a invalid pointer is passed. If the * Atomic Commit request fails for any other reason the out fence fd * returned will be -1. On a Atomic Commit with the * DRM_MODE_ATOMIC_TEST_ONLY flag the out fence will also be set to -1. * * Note that out-fences don't have a special interface to drivers and are * internally represented by a &struct drm_pending_vblank_event in struct * &drm_crtc_state, which is also used by the nonblocking atomic commit * helpers and for the DRM event handling for existing userspace. */ struct drm_out_fence_state { s32 __user *out_fence_ptr; struct sync_file *sync_file; int fd; }; static int setup_out_fence(struct drm_out_fence_state *fence_state, struct dma_fence *fence) { fence_state->fd = get_unused_fd_flags(O_CLOEXEC); if (fence_state->fd < 0) return fence_state->fd; if (put_user(fence_state->fd, fence_state->out_fence_ptr)) return -EFAULT; fence_state->sync_file = sync_file_create(fence); if (!fence_state->sync_file) return -ENOMEM; return 0; } static int prepare_signaling(struct drm_device *dev, struct drm_atomic_state *state, struct drm_mode_atomic *arg, struct drm_file *file_priv, struct drm_out_fence_state **fence_state, unsigned int *num_fences) { struct drm_crtc *crtc; struct drm_crtc_state *crtc_state; struct drm_connector *conn; struct drm_connector_state *conn_state; int i, c = 0, ret; if (arg->flags & DRM_MODE_ATOMIC_TEST_ONLY) return 0; for_each_new_crtc_in_state(state, crtc, crtc_state, i) { s32 __user *fence_ptr; fence_ptr = get_out_fence_for_crtc(crtc_state->state, crtc); if (arg->flags & DRM_MODE_PAGE_FLIP_EVENT || fence_ptr) { struct drm_pending_vblank_event *e; e = create_vblank_event(crtc, arg->user_data); if (!e) return -ENOMEM; crtc_state->event = e; } if (arg->flags & DRM_MODE_PAGE_FLIP_EVENT) { struct drm_pending_vblank_event *e = crtc_state->event; if (!file_priv) continue; ret = drm_event_reserve_init(dev, file_priv, &e->base, &e->event.base); if (ret) { kfree(e); crtc_state->event = NULL; return ret; } } if (fence_ptr) { struct dma_fence *fence; struct drm_out_fence_state *f; f = krealloc(*fence_state, sizeof(**fence_state) * (*num_fences + 1), GFP_KERNEL); if (!f) return -ENOMEM; memset(&f[*num_fences], 0, sizeof(*f)); f[*num_fences].out_fence_ptr = fence_ptr; *fence_state = f; fence = drm_crtc_create_fence(crtc); if (!fence) return -ENOMEM; ret = setup_out_fence(&f[(*num_fences)++], fence); if (ret) { dma_fence_put(fence); return ret; } crtc_state->event->base.fence = 
fence; } c++; } for_each_new_connector_in_state(state, conn, conn_state, i) { struct drm_writeback_connector *wb_conn; struct drm_out_fence_state *f; struct dma_fence *fence; s32 __user *fence_ptr; if (!conn_state->writeback_job) continue; fence_ptr = get_out_fence_for_connector(state, conn); if (!fence_ptr) continue; f = krealloc(*fence_state, sizeof(**fence_state) * (*num_fences + 1), GFP_KERNEL); if (!f) return -ENOMEM; memset(&f[*num_fences], 0, sizeof(*f)); f[*num_fences].out_fence_ptr = fence_ptr; *fence_state = f; wb_conn = drm_connector_to_writeback(conn); fence = drm_writeback_get_out_fence(wb_conn); if (!fence) return -ENOMEM; ret = setup_out_fence(&f[(*num_fences)++], fence); if (ret) { dma_fence_put(fence); return ret; } conn_state->writeback_job->out_fence = fence; } /* * Having this flag means user mode pends on event which will never * reach due to lack of at least one CRTC for signaling */ if (c == 0 && (arg->flags & DRM_MODE_PAGE_FLIP_EVENT)) { drm_dbg_atomic(dev, "need at least one CRTC for DRM_MODE_PAGE_FLIP_EVENT"); return -EINVAL; } return 0; } static void complete_signaling(struct drm_device *dev, struct drm_atomic_state *state, struct drm_out_fence_state *fence_state, unsigned int num_fences, bool install_fds) { struct drm_crtc *crtc; struct drm_crtc_state *crtc_state; int i; if (install_fds) { for (i = 0; i < num_fences; i++) fd_install(fence_state[i].fd, fence_state[i].sync_file->file); kfree(fence_state); return; } for_each_new_crtc_in_state(state, crtc, crtc_state, i) { struct drm_pending_vblank_event *event = crtc_state->event; /* * Free the allocated event. drm_atomic_helper_setup_commit * can allocate an event too, so only free it if it's ours * to prevent a double free in drm_atomic_state_clear. */ if (event && (event->base.fence || event->base.file_priv)) { drm_event_cancel_free(dev, &event->base); crtc_state->event = NULL; } } if (!fence_state) return; for (i = 0; i < num_fences; i++) { if (fence_state[i].sync_file) fput(fence_state[i].sync_file->file); if (fence_state[i].fd >= 0) put_unused_fd(fence_state[i].fd); /* If this fails log error to the user */ if (fence_state[i].out_fence_ptr && put_user(-1, fence_state[i].out_fence_ptr)) drm_dbg_atomic(dev, "Couldn't clear out_fence_ptr\n"); } kfree(fence_state); } static void set_async_flip(struct drm_atomic_state *state) { struct drm_crtc *crtc; struct drm_crtc_state *crtc_state; int i; for_each_new_crtc_in_state(state, crtc, crtc_state, i) { crtc_state->async_flip = true; } } int drm_mode_atomic_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv) { struct drm_mode_atomic *arg = data; uint32_t __user *objs_ptr = (uint32_t __user *)(unsigned long)(arg->objs_ptr); uint32_t __user *count_props_ptr = (uint32_t __user *)(unsigned long)(arg->count_props_ptr); uint32_t __user *props_ptr = (uint32_t __user *)(unsigned long)(arg->props_ptr); uint64_t __user *prop_values_ptr = (uint64_t __user *)(unsigned long)(arg->prop_values_ptr); unsigned int copied_objs, copied_props; struct drm_atomic_state *state; struct drm_modeset_acquire_ctx ctx; struct drm_out_fence_state *fence_state; int ret = 0; unsigned int i, j, num_fences; bool async_flip = false; /* disallow for drivers not supporting atomic: */ if (!drm_core_check_feature(dev, DRIVER_ATOMIC)) return -EOPNOTSUPP; /* disallow for userspace that has not enabled atomic cap (even * though this may be a bit overkill, since legacy userspace * wouldn't know how to call this ioctl) */ if (!file_priv->atomic) { drm_dbg_atomic(dev, "commit failed: atomic cap 
not enabled\n"); return -EINVAL; } if (arg->flags & ~DRM_MODE_ATOMIC_FLAGS) { drm_dbg_atomic(dev, "commit failed: invalid flag\n"); return -EINVAL; } if (arg->reserved) { drm_dbg_atomic(dev, "commit failed: reserved field set\n"); return -EINVAL; } if (arg->flags & DRM_MODE_PAGE_FLIP_ASYNC) { if (!dev->mode_config.async_page_flip) { drm_dbg_atomic(dev, "commit failed: DRM_MODE_PAGE_FLIP_ASYNC not supported\n"); return -EINVAL; } async_flip = true; } /* can't test and expect an event at the same time. */ if ((arg->flags & DRM_MODE_ATOMIC_TEST_ONLY) && (arg->flags & DRM_MODE_PAGE_FLIP_EVENT)) { drm_dbg_atomic(dev, "commit failed: page-flip event requested with test-only commit\n"); return -EINVAL; } state = drm_atomic_state_alloc(dev); if (!state) return -ENOMEM; drm_modeset_acquire_init(&ctx, DRM_MODESET_ACQUIRE_INTERRUPTIBLE); state->acquire_ctx = &ctx; state->allow_modeset = !!(arg->flags & DRM_MODE_ATOMIC_ALLOW_MODESET); retry: copied_objs = 0; copied_props = 0; fence_state = NULL; num_fences = 0; for (i = 0; i < arg->count_objs; i++) { uint32_t obj_id, count_props; struct drm_mode_object *obj; if (get_user(obj_id, objs_ptr + copied_objs)) { ret = -EFAULT; goto out; } obj = drm_mode_object_find(dev, file_priv, obj_id, DRM_MODE_OBJECT_ANY); if (!obj) { drm_dbg_atomic(dev, "cannot find object ID %d", obj_id); ret = -ENOENT; goto out; } if (!obj->properties) { drm_dbg_atomic(dev, "[OBJECT:%d] has no properties", obj_id); drm_mode_object_put(obj); ret = -ENOENT; goto out; } if (get_user(count_props, count_props_ptr + copied_objs)) { drm_mode_object_put(obj); ret = -EFAULT; goto out; } copied_objs++; for (j = 0; j < count_props; j++) { uint32_t prop_id; uint64_t prop_value; struct drm_property *prop; if (get_user(prop_id, props_ptr + copied_props)) { drm_mode_object_put(obj); ret = -EFAULT; goto out; } prop = drm_mode_obj_find_prop_id(obj, prop_id); if (!prop) { drm_dbg_atomic(dev, "[OBJECT:%d] cannot find property ID %d", obj_id, prop_id); drm_mode_object_put(obj); ret = -ENOENT; goto out; } if (copy_from_user(&prop_value, prop_values_ptr + copied_props, sizeof(prop_value))) { drm_mode_object_put(obj); ret = -EFAULT; goto out; } ret = drm_atomic_set_property(state, file_priv, obj, prop, prop_value, async_flip); if (ret) { drm_mode_object_put(obj); goto out; } copied_props++; } drm_mode_object_put(obj); } ret = prepare_signaling(dev, state, arg, file_priv, &fence_state, &num_fences); if (ret) goto out; if (arg->flags & DRM_MODE_PAGE_FLIP_ASYNC) set_async_flip(state); if (arg->flags & DRM_MODE_ATOMIC_TEST_ONLY) { ret = drm_atomic_check_only(state); } else if (arg->flags & DRM_MODE_ATOMIC_NONBLOCK) { ret = drm_atomic_nonblocking_commit(state); } else { ret = drm_atomic_commit(state); } out: complete_signaling(dev, state, fence_state, num_fences, !ret); if (ret == -EDEADLK) { drm_atomic_state_clear(state); ret = drm_modeset_backoff(&ctx); if (!ret) goto retry; } drm_atomic_state_put(state); drm_modeset_drop_locks(&ctx); drm_modeset_acquire_fini(&ctx); return ret; } |
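For context on the -EDEADLK contract documented above for drm_atomic_set_crtc_for_plane() and drm_atomic_set_crtc_for_connector(), here is a minimal, illustrative sketch of an in-kernel caller driving a plane update with the usual acquire-context retry loop. The function name example_flip_plane() is hypothetical and the sketch omits details a real driver would need (source/destination coordinates, affected-plane bookkeeping); the DRM calls themselves are the existing atomic and modeset-lock APIs.

/*
 * Illustrative sketch only: shows the restart-on-deadlock pattern that the
 * helpers above require of their callers.
 */
static int example_flip_plane(struct drm_plane *plane,
			      struct drm_framebuffer *fb,
			      struct drm_crtc *crtc)
{
	struct drm_modeset_acquire_ctx ctx;
	struct drm_atomic_state *state;
	struct drm_plane_state *plane_state;
	int ret;

	drm_modeset_acquire_init(&ctx, 0);

	state = drm_atomic_state_alloc(plane->dev);
	if (!state) {
		drm_modeset_acquire_fini(&ctx);
		return -ENOMEM;
	}
	state->acquire_ctx = &ctx;

retry:
	plane_state = drm_atomic_get_plane_state(state, plane);
	if (IS_ERR(plane_state)) {
		ret = PTR_ERR(plane_state);
		goto out;
	}

	/* May return -EDEADLK: the whole sequence must then be restarted. */
	ret = drm_atomic_set_crtc_for_plane(plane_state, crtc);
	if (ret)
		goto out;
	drm_atomic_set_fb_for_plane(plane_state, fb);
	/* A real caller would also fill plane_state->crtc_* and ->src_* here. */

	ret = drm_atomic_commit(state);
out:
	if (ret == -EDEADLK) {
		/* ww-mutex deadlock: drop all locks and restart the sequence. */
		drm_atomic_state_clear(state);
		drm_modeset_backoff(&ctx);
		goto retry;
	}
	drm_atomic_state_put(state);
	drm_modeset_drop_locks(&ctx);
	drm_modeset_acquire_fini(&ctx);
	return ret;
}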
| 1 1 2 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 | // SPDX-License-Identifier: GPL-2.0-only /* * Copyright(c) 2013-2015 Intel Corporation. All rights reserved. */ #include <linux/blkdev.h> #include <linux/device.h> #include <linux/sizes.h> #include <linux/slab.h> #include <linux/fs.h> #include <linux/mm.h> #include "nd-core.h" #include "btt.h" #include "nd.h" static void nd_btt_release(struct device *dev) { struct nd_region *nd_region = to_nd_region(dev->parent); struct nd_btt *nd_btt = to_nd_btt(dev); dev_dbg(dev, "trace\n"); nd_detach_ndns(&nd_btt->dev, &nd_btt->ndns); ida_free(&nd_region->btt_ida, nd_btt->id); kfree(nd_btt->uuid); kfree(nd_btt); } struct nd_btt *to_nd_btt(struct device *dev) { struct nd_btt *nd_btt = container_of(dev, struct nd_btt, dev); WARN_ON(!is_nd_btt(dev)); return nd_btt; } EXPORT_SYMBOL(to_nd_btt); static const unsigned long btt_lbasize_supported[] = { 512, 520, 528, 4096, 4104, 4160, 4224, 0 }; static ssize_t sector_size_show(struct device *dev, struct device_attribute *attr, char *buf) { struct nd_btt *nd_btt = to_nd_btt(dev); return nd_size_select_show(nd_btt->lbasize, btt_lbasize_supported, buf); } static ssize_t sector_size_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t len) { struct nd_btt *nd_btt = to_nd_btt(dev); ssize_t rc; device_lock(dev); nvdimm_bus_lock(dev); rc = nd_size_select_store(dev, buf, &nd_btt->lbasize, btt_lbasize_supported); dev_dbg(dev, "result: %zd wrote: %s%s", rc, buf, buf[len - 1] == '\n' ? "" : "\n"); nvdimm_bus_unlock(dev); device_unlock(dev); return rc ? rc : len; } static DEVICE_ATTR_RW(sector_size); static ssize_t uuid_show(struct device *dev, struct device_attribute *attr, char *buf) { struct nd_btt *nd_btt = to_nd_btt(dev); if (nd_btt->uuid) return sprintf(buf, "%pUb\n", nd_btt->uuid); return sprintf(buf, "\n"); } static ssize_t uuid_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t len) { struct nd_btt *nd_btt = to_nd_btt(dev); ssize_t rc; device_lock(dev); rc = nd_uuid_store(dev, &nd_btt->uuid, buf, len); dev_dbg(dev, "result: %zd wrote: %s%s", rc, buf, buf[len - 1] == '\n' ? "" : "\n"); device_unlock(dev); return rc ? 
rc : len; } static DEVICE_ATTR_RW(uuid); static ssize_t namespace_show(struct device *dev, struct device_attribute *attr, char *buf) { struct nd_btt *nd_btt = to_nd_btt(dev); ssize_t rc; nvdimm_bus_lock(dev); rc = sprintf(buf, "%s\n", nd_btt->ndns ? dev_name(&nd_btt->ndns->dev) : ""); nvdimm_bus_unlock(dev); return rc; } static ssize_t namespace_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t len) { struct nd_btt *nd_btt = to_nd_btt(dev); ssize_t rc; device_lock(dev); nvdimm_bus_lock(dev); rc = nd_namespace_store(dev, &nd_btt->ndns, buf, len); dev_dbg(dev, "result: %zd wrote: %s%s", rc, buf, buf[len - 1] == '\n' ? "" : "\n"); nvdimm_bus_unlock(dev); device_unlock(dev); return rc; } static DEVICE_ATTR_RW(namespace); static ssize_t size_show(struct device *dev, struct device_attribute *attr, char *buf) { struct nd_btt *nd_btt = to_nd_btt(dev); ssize_t rc; device_lock(dev); if (dev->driver) rc = sprintf(buf, "%llu\n", nd_btt->size); else { /* no size to convey if the btt instance is disabled */ rc = -ENXIO; } device_unlock(dev); return rc; } static DEVICE_ATTR_RO(size); static ssize_t log_zero_flags_show(struct device *dev, struct device_attribute *attr, char *buf) { return sprintf(buf, "Y\n"); } static DEVICE_ATTR_RO(log_zero_flags); static struct attribute *nd_btt_attributes[] = { &dev_attr_sector_size.attr, &dev_attr_namespace.attr, &dev_attr_uuid.attr, &dev_attr_size.attr, &dev_attr_log_zero_flags.attr, NULL, }; static struct attribute_group nd_btt_attribute_group = { .attrs = nd_btt_attributes, }; static const struct attribute_group *nd_btt_attribute_groups[] = { &nd_btt_attribute_group, &nd_device_attribute_group, &nd_numa_attribute_group, NULL, }; static const struct device_type nd_btt_device_type = { .name = "nd_btt", .release = nd_btt_release, .groups = nd_btt_attribute_groups, }; bool is_nd_btt(struct device *dev) { return dev->type == &nd_btt_device_type; } EXPORT_SYMBOL(is_nd_btt); static struct lock_class_key nvdimm_btt_key; static struct device *__nd_btt_create(struct nd_region *nd_region, unsigned long lbasize, uuid_t *uuid, struct nd_namespace_common *ndns) { struct nd_btt *nd_btt; struct device *dev; nd_btt = kzalloc(sizeof(*nd_btt), GFP_KERNEL); if (!nd_btt) return NULL; nd_btt->id = ida_alloc(&nd_region->btt_ida, GFP_KERNEL); if (nd_btt->id < 0) goto out_nd_btt; nd_btt->lbasize = lbasize; if (uuid) { uuid = kmemdup(uuid, 16, GFP_KERNEL); if (!uuid) goto out_put_id; } nd_btt->uuid = uuid; dev = &nd_btt->dev; dev_set_name(dev, "btt%d.%d", nd_region->id, nd_btt->id); dev->parent = &nd_region->dev; dev->type = &nd_btt_device_type; device_initialize(&nd_btt->dev); lockdep_set_class(&nd_btt->dev.mutex, &nvdimm_btt_key); if (ndns && !__nd_attach_ndns(&nd_btt->dev, ndns, &nd_btt->ndns)) { dev_dbg(&ndns->dev, "failed, already claimed by %s\n", dev_name(ndns->claim)); put_device(dev); return NULL; } return dev; out_put_id: ida_free(&nd_region->btt_ida, nd_btt->id); out_nd_btt: kfree(nd_btt); return NULL; } struct device *nd_btt_create(struct nd_region *nd_region) { struct device *dev = __nd_btt_create(nd_region, 0, NULL, NULL); nd_device_register(dev); return dev; } /** * nd_btt_arena_is_valid - check if the metadata layout is valid * @nd_btt: device with BTT geometry and backing device info * @super: pointer to the arena's info block being tested * * Check consistency of the btt info block with itself by validating * the checksum, and with the parent namespace by verifying the * parent_uuid contained in the info block with the one supplied in. 
* * Returns: * false for an invalid info block, true for a valid one */ bool nd_btt_arena_is_valid(struct nd_btt *nd_btt, struct btt_sb *super) { const uuid_t *ns_uuid = nd_dev_to_uuid(&nd_btt->ndns->dev); uuid_t parent_uuid; u64 checksum; if (memcmp(super->signature, BTT_SIG, BTT_SIG_LEN) != 0) return false; import_uuid(&parent_uuid, super->parent_uuid); if (!uuid_is_null(&parent_uuid)) if (!uuid_equal(&parent_uuid, ns_uuid)) return false; checksum = le64_to_cpu(super->checksum); super->checksum = 0; if (checksum != nd_sb_checksum((struct nd_gen_sb *) super)) return false; super->checksum = cpu_to_le64(checksum); /* TODO: figure out action for this */ if ((le32_to_cpu(super->flags) & IB_FLAG_ERROR_MASK) != 0) dev_info(&nd_btt->dev, "Found arena with an error flag\n"); return true; } EXPORT_SYMBOL(nd_btt_arena_is_valid); int nd_btt_version(struct nd_btt *nd_btt, struct nd_namespace_common *ndns, struct btt_sb *btt_sb) { if (ndns->claim_class == NVDIMM_CCLASS_BTT2) { /* Probe/setup for BTT v2.0 */ nd_btt->initial_offset = 0; nd_btt->version_major = 2; nd_btt->version_minor = 0; if (nvdimm_read_bytes(ndns, 0, btt_sb, sizeof(*btt_sb), 0)) return -ENXIO; if (!nd_btt_arena_is_valid(nd_btt, btt_sb)) return -ENODEV; if ((le16_to_cpu(btt_sb->version_major) != 2) || (le16_to_cpu(btt_sb->version_minor) != 0)) return -ENODEV; } else { /* * Probe/setup for BTT v1.1 (NVDIMM_CCLASS_NONE or * NVDIMM_CCLASS_BTT) */ nd_btt->initial_offset = SZ_4K; nd_btt->version_major = 1; nd_btt->version_minor = 1; if (nvdimm_read_bytes(ndns, SZ_4K, btt_sb, sizeof(*btt_sb), 0)) return -ENXIO; if (!nd_btt_arena_is_valid(nd_btt, btt_sb)) return -ENODEV; if ((le16_to_cpu(btt_sb->version_major) != 1) || (le16_to_cpu(btt_sb->version_minor) != 1)) return -ENODEV; } return 0; } EXPORT_SYMBOL(nd_btt_version); static int __nd_btt_probe(struct nd_btt *nd_btt, struct nd_namespace_common *ndns, struct btt_sb *btt_sb) { int rc; if (!btt_sb || !ndns || !nd_btt) return -ENODEV; if (nvdimm_namespace_capacity(ndns) < SZ_16M) return -ENXIO; rc = nd_btt_version(nd_btt, ndns, btt_sb); if (rc < 0) return rc; nd_btt->lbasize = le32_to_cpu(btt_sb->external_lbasize); nd_btt->uuid = kmemdup(&btt_sb->uuid, sizeof(uuid_t), GFP_KERNEL); if (!nd_btt->uuid) return -ENOMEM; nd_device_register(&nd_btt->dev); return 0; } int nd_btt_probe(struct device *dev, struct nd_namespace_common *ndns) { int rc; struct device *btt_dev; struct btt_sb *btt_sb; struct nd_region *nd_region = to_nd_region(ndns->dev.parent); if (ndns->force_raw) return -ENODEV; switch (ndns->claim_class) { case NVDIMM_CCLASS_NONE: case NVDIMM_CCLASS_BTT: case NVDIMM_CCLASS_BTT2: break; default: return -ENODEV; } nvdimm_bus_lock(&ndns->dev); btt_dev = __nd_btt_create(nd_region, 0, NULL, ndns); nvdimm_bus_unlock(&ndns->dev); if (!btt_dev) return -ENOMEM; btt_sb = devm_kzalloc(dev, sizeof(*btt_sb), GFP_KERNEL); rc = __nd_btt_probe(to_nd_btt(btt_dev), ndns, btt_sb); dev_dbg(dev, "btt: %s\n", rc == 0 ? dev_name(btt_dev) : "<none>"); if (rc < 0) { struct nd_btt *nd_btt = to_nd_btt(btt_dev); nd_detach_ndns(btt_dev, &nd_btt->ndns); put_device(btt_dev); } return rc; } EXPORT_SYMBOL(nd_btt_probe); |
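As a usage illustration for nd_btt_probe() above: a namespace driver is expected to offer its namespace to the BTT layer from its own probe routine and back off if a valid BTT info block is found. The sketch below is hypothetical (example_ns_probe() is not a function in this file), but the helpers it calls are the existing nvdimm APIs and the flow loosely mirrors how a raw-namespace driver such as pmem hands over to the BTT personality.

/*
 * Sketch only: how a namespace driver would call nd_btt_probe().
 * If a BTT info block is found, nd_btt_probe() registers the nd_btt
 * device and this raw probe returns -ENXIO so the btt driver can
 * attach to the claimed namespace instead.
 */
static int example_ns_probe(struct device *dev)
{
	struct nd_namespace_common *ndns;

	ndns = nvdimm_namespace_common_probe(dev);
	if (IS_ERR(ndns))
		return PTR_ERR(ndns);

	if (nd_btt_probe(dev, ndns) == 0)
		return -ENXIO;	/* BTT claimed the namespace */

	/* ... otherwise continue setting the namespace up for raw access ... */
	return 0;
}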
| 9 71 92 73 68 68 24 68 74 89 8 95 90 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 | /* SPDX-License-Identifier: GPL-2.0-only */ /* * sha512_base.h - core logic for SHA-512 implementations * * Copyright (C) 2015 Linaro Ltd <ard.biesheuvel@linaro.org> */ #ifndef _CRYPTO_SHA512_BASE_H #define _CRYPTO_SHA512_BASE_H #include <crypto/internal/hash.h> #include <crypto/sha2.h> #include <linux/crypto.h> #include <linux/module.h> #include <linux/string.h> #include <linux/unaligned.h> typedef void (sha512_block_fn)(struct sha512_state *sst, u8 const *src, int blocks); static inline int sha384_base_init(struct shash_desc *desc) { struct sha512_state *sctx = shash_desc_ctx(desc); sctx->state[0] = SHA384_H0; sctx->state[1] = SHA384_H1; sctx->state[2] = SHA384_H2; sctx->state[3] = SHA384_H3; sctx->state[4] = SHA384_H4; sctx->state[5] = SHA384_H5; sctx->state[6] = SHA384_H6; sctx->state[7] = SHA384_H7; sctx->count[0] = sctx->count[1] = 0; return 0; } static inline int sha512_base_init(struct shash_desc *desc) { struct sha512_state *sctx = shash_desc_ctx(desc); sctx->state[0] = SHA512_H0; sctx->state[1] = SHA512_H1; sctx->state[2] = SHA512_H2; sctx->state[3] = SHA512_H3; sctx->state[4] = SHA512_H4; sctx->state[5] = SHA512_H5; sctx->state[6] = SHA512_H6; sctx->state[7] = SHA512_H7; sctx->count[0] = sctx->count[1] = 0; return 0; } static inline int sha512_base_do_update(struct shash_desc *desc, const u8 *data, unsigned int len, sha512_block_fn *block_fn) { struct sha512_state *sctx = shash_desc_ctx(desc); unsigned int partial = sctx->count[0] % SHA512_BLOCK_SIZE; sctx->count[0] += len; if (sctx->count[0] < len) sctx->count[1]++; if (unlikely((partial + len) >= SHA512_BLOCK_SIZE)) { int blocks; if (partial) { int p = SHA512_BLOCK_SIZE - partial; memcpy(sctx->buf + partial, data, p); data += p; len -= p; block_fn(sctx, sctx->buf, 1); } blocks = len / SHA512_BLOCK_SIZE; len %= SHA512_BLOCK_SIZE; if (blocks) { block_fn(sctx, data, blocks); data += blocks * SHA512_BLOCK_SIZE; } partial = 0; } if (len) memcpy(sctx->buf + partial, data, len); return 0; } static inline int sha512_base_do_finalize(struct shash_desc *desc, sha512_block_fn *block_fn) { const int bit_offset = SHA512_BLOCK_SIZE - sizeof(__be64[2]); struct sha512_state *sctx = shash_desc_ctx(desc); __be64 *bits = (__be64 *)(sctx->buf + bit_offset); unsigned int partial = sctx->count[0] % SHA512_BLOCK_SIZE; sctx->buf[partial++] = 0x80; if (partial > bit_offset) { memset(sctx->buf + partial, 0x0, SHA512_BLOCK_SIZE - partial); partial = 0; block_fn(sctx, sctx->buf, 1); } memset(sctx->buf + partial, 0x0, bit_offset - partial); bits[0] = cpu_to_be64(sctx->count[1] << 3 | sctx->count[0] >> 61); bits[1] = cpu_to_be64(sctx->count[0] << 3); block_fn(sctx, sctx->buf, 1); return 0; } static inline int sha512_base_finish(struct shash_desc *desc, u8 *out) { unsigned int digest_size = crypto_shash_digestsize(desc->tfm); struct sha512_state *sctx = shash_desc_ctx(desc); __be64 *digest = (__be64 *)out; int i; for (i = 0; digest_size > 0; i++, digest_size -= sizeof(__be64)) put_unaligned_be64(sctx->state[i], digest++); memzero_explicit(sctx, sizeof(*sctx)); return 0; } #endif /* 
_CRYPTO_SHA512_BASE_H */ |
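For context, this is a sketch of how an architecture-specific SHA-512 driver typically plugs its block transform into the base helpers above. The my_sha512_* names are hypothetical, the block-transform body is left as a stub, and a real driver would additionally register a struct shash_alg around these callbacks.

#include <crypto/internal/hash.h>
#include <crypto/sha512_base.h>

/* Hypothetical transform: would consume 'blocks' 128-byte blocks into sst->state. */
static void my_sha512_block(struct sha512_state *sst, u8 const *src, int blocks)
{
	/* arch-specific SHA-512 block function goes here */
}

static int my_sha512_update(struct shash_desc *desc, const u8 *data,
			    unsigned int len)
{
	return sha512_base_do_update(desc, data, len, my_sha512_block);
}

static int my_sha512_finup(struct shash_desc *desc, const u8 *data,
			   unsigned int len, u8 *out)
{
	if (len)
		sha512_base_do_update(desc, data, len, my_sha512_block);
	sha512_base_do_finalize(desc, my_sha512_block);
	return sha512_base_finish(desc, out);
}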
| 22 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 | /* SPDX-License-Identifier: GPL-2.0-or-later */ /* * Copyright (c) 2005 Andrea Bittau <a.bittau@cs.ucl.ac.uk> */ #ifndef _DCCP_CCID2_H_ #define _DCCP_CCID2_H_ #include <linux/timer.h> #include <linux/types.h> #include "../ccid.h" #include "../dccp.h" /* * CCID-2 timestamping faces the same issues as TCP timestamping. * Hence we reuse/share as much of the code as possible. */ #define ccid2_jiffies32 ((u32)jiffies) /* NUMDUPACK parameter from RFC 4341, p. 6 */ #define NUMDUPACK 3 struct ccid2_seq { u64 ccid2s_seq; u32 ccid2s_sent; int ccid2s_acked; struct ccid2_seq *ccid2s_prev; struct ccid2_seq *ccid2s_next; }; #define CCID2_SEQBUF_LEN 1024 #define CCID2_SEQBUF_MAX 128 /* * Multiple of congestion window to keep the sequence window at * (RFC 4340 7.5.2) */ #define CCID2_WIN_CHANGE_FACTOR 5 /** * struct ccid2_hc_tx_sock - CCID2 TX half connection * @tx_{cwnd,ssthresh,pipe}: as per RFC 4341, section 5 * @tx_packets_acked: Ack counter for deriving cwnd growth (RFC 3465) * @tx_srtt: smoothed RTT estimate, scaled by 2^3 * @tx_mdev: smoothed RTT variation, scaled by 2^2 * @tx_mdev_max: maximum of @mdev during one flight * @tx_rttvar: moving average/maximum of @mdev_max * @tx_rto: RTO value deriving from SRTT and RTTVAR (RFC 2988) * @tx_rtt_seq: to decay RTTVAR at most once per flight * @tx_cwnd_used: actually used cwnd, W_used of RFC 2861 * @tx_expected_wnd: moving average of @tx_cwnd_used * @tx_cwnd_stamp: to track idle periods in CWV * @tx_lsndtime: last time (in jiffies) a data packet was sent * @tx_rpseq: last consecutive seqno * @tx_rpdupack: dupacks since rpseq * @tx_av_chunks: list of Ack Vectors received on current skb */ struct ccid2_hc_tx_sock { u32 tx_cwnd; u32 tx_ssthresh; u32 tx_pipe; u32 tx_packets_acked; struct ccid2_seq *tx_seqbuf[CCID2_SEQBUF_MAX]; int tx_seqbufc; struct ccid2_seq *tx_seqh; struct ccid2_seq *tx_seqt; /* RTT measurement: variables/principles are the same as in TCP */ u32 tx_srtt, tx_mdev, tx_mdev_max, tx_rttvar, tx_rto; u64 tx_rtt_seq:48; struct timer_list tx_rtotimer; struct sock *sk; /* Congestion Window validation (optional, RFC 2861) */ u32 tx_cwnd_used, tx_expected_wnd, tx_cwnd_stamp, tx_lsndtime; u64 tx_rpseq; int tx_rpdupack; u32 tx_last_cong; u64 tx_high_ack; struct list_head tx_av_chunks; }; static inline bool ccid2_cwnd_network_limited(struct ccid2_hc_tx_sock *hc) { return hc->tx_pipe >= hc->tx_cwnd; } /* * Convert RFC 3390 larger initial window into an equivalent number of packets. * This is based on the numbers specified in RFC 5681, 3.1. */ static inline u32 rfc3390_bytes_to_packets(const u32 smss) { return smss <= 1095 ? 4 : (smss > 2190 ? 2 : 3); } /** * struct ccid2_hc_rx_sock - Receiving end of CCID-2 half-connection * @rx_num_data_pkts: number of data packets received since last feedback */ struct ccid2_hc_rx_sock { u32 rx_num_data_pkts; }; static inline struct ccid2_hc_tx_sock *ccid2_hc_tx_sk(const struct sock *sk) { return ccid_priv(dccp_sk(sk)->dccps_hc_tx_ccid); } static inline struct ccid2_hc_rx_sock *ccid2_hc_rx_sk(const struct sock *sk) { return ccid_priv(dccp_sk(sk)->dccps_hc_rx_ccid); } #endif /* _DCCP_CCID2_H_ */ |
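The rfc3390_bytes_to_packets() helper above collapses RFC 5681 section 3.1's rule, IW = min(4*SMSS, max(2*SMSS, 4380 bytes)), into a three-way test on SMSS. A standalone userspace check of that mapping (the helper body is copied from the header purely for illustration):

#include <assert.h>
#include <stdint.h>

static uint32_t rfc3390_bytes_to_packets(const uint32_t smss)
{
	return smss <= 1095 ? 4 : (smss > 2190 ? 2 : 3);
}

int main(void)
{
	assert(rfc3390_bytes_to_packets(536)  == 4);	/* 4 * 536 <= 4380 bytes */
	assert(rfc3390_bytes_to_packets(1095) == 4);	/* boundary: 4 * 1095 == 4380 */
	assert(rfc3390_bytes_to_packets(1460) == 3);	/* typical Ethernet MSS */
	assert(rfc3390_bytes_to_packets(2190) == 3);	/* boundary: 2 * 2190 == 4380 */
	assert(rfc3390_bytes_to_packets(9000) == 2);	/* 2 * 9000 > 4380 (jumbo) */
	return 0;
}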
| 2 2 1 2 1 2 1 2 1 3 1 3 1 2 2 2 1 13 8 5 13 13 5 8 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 | // SPDX-License-Identifier: GPL-2.0-only /* * This file contains vfs directory ops for the 9P2000 protocol. * * Copyright (C) 2004 by Eric Van Hensbergen <ericvh@gmail.com> * Copyright (C) 2002 by Ron Minnich <rminnich@lanl.gov> */ #include <linux/module.h> #include <linux/errno.h> #include <linux/fs.h> #include <linux/file.h> #include <linux/stat.h> #include <linux/string.h> #include <linux/sched.h> #include <linux/slab.h> #include <linux/uio.h> #include <linux/fscache.h> #include <net/9p/9p.h> #include <net/9p/client.h> #include "v9fs.h" #include "v9fs_vfs.h" #include "fid.h" /** * struct p9_rdir - readdir accounting * @head: start offset of current dirread buffer * @tail: end offset of current dirread buffer * @buf: dirread buffer * * private structure for keeping track of readdir * allocated on demand */ struct p9_rdir { int head; int tail; uint8_t buf[]; }; /** * dt_type - return file type * @mistat: mistat structure * */ static inline int dt_type(struct p9_wstat *mistat) { unsigned long perm = mistat->mode; int rettype = DT_REG; if (perm & P9_DMDIR) rettype = DT_DIR; if (perm & P9_DMSYMLINK) rettype = DT_LNK; return rettype; } /** * v9fs_alloc_rdir_buf - Allocate buffer used for read and readdir * @filp: opened file structure * @buflen: Length in bytes of buffer to allocate * */ static struct p9_rdir *v9fs_alloc_rdir_buf(struct file *filp, int buflen) { struct p9_fid *fid = filp->private_data; if (!fid->rdir) fid->rdir = kzalloc(sizeof(struct p9_rdir) + buflen, GFP_KERNEL); return fid->rdir; } /** * v9fs_dir_readdir - iterate through a directory * @file: opened file structure * @ctx: actor we feed the entries to * */ static int v9fs_dir_readdir(struct file *file, struct dir_context *ctx) { bool over; struct p9_wstat st; int err = 0; struct p9_fid *fid; int buflen; struct p9_rdir *rdir; struct kvec kvec; p9_debug(P9_DEBUG_VFS, "name %pD\n", file); fid = file->private_data; buflen = fid->clnt->msize - P9_IOHDRSZ; rdir = v9fs_alloc_rdir_buf(file, buflen); if (!rdir) return -ENOMEM; kvec.iov_base = rdir->buf; kvec.iov_len = buflen; while (1) { if (rdir->tail == rdir->head) { struct iov_iter to; int n; iov_iter_kvec(&to, ITER_DEST, &kvec, 1, buflen); n = p9_client_read(file->private_data, ctx->pos, &to, &err); if (err) return err; if (n == 0) return 0; rdir->head = 0; rdir->tail = n; } while (rdir->head < rdir->tail) { err = p9stat_read(fid->clnt, rdir->buf + rdir->head, rdir->tail - rdir->head, &st); if (err <= 0) { p9_debug(P9_DEBUG_VFS, "returned %d\n", err); return -EIO; } over = !dir_emit(ctx, st.name, strlen(st.name), QID2INO(&st.qid), 
dt_type(&st)); p9stat_free(&st); if (over) return 0; rdir->head += err; ctx->pos += err; } } } /** * v9fs_dir_readdir_dotl - iterate through a directory * @file: opened file structure * @ctx: actor we feed the entries to * */ static int v9fs_dir_readdir_dotl(struct file *file, struct dir_context *ctx) { int err = 0; struct p9_fid *fid; int buflen; struct p9_rdir *rdir; struct p9_dirent curdirent; p9_debug(P9_DEBUG_VFS, "name %pD\n", file); fid = file->private_data; buflen = fid->clnt->msize - P9_READDIRHDRSZ; rdir = v9fs_alloc_rdir_buf(file, buflen); if (!rdir) return -ENOMEM; while (1) { if (rdir->tail == rdir->head) { err = p9_client_readdir(fid, rdir->buf, buflen, ctx->pos); if (err <= 0) return err; rdir->head = 0; rdir->tail = err; } while (rdir->head < rdir->tail) { err = p9dirent_read(fid->clnt, rdir->buf + rdir->head, rdir->tail - rdir->head, &curdirent); if (err < 0) { p9_debug(P9_DEBUG_VFS, "returned %d\n", err); return -EIO; } if (!dir_emit(ctx, curdirent.d_name, strlen(curdirent.d_name), QID2INO(&curdirent.qid), curdirent.d_type)) return 0; ctx->pos = curdirent.d_off; rdir->head += err; } } } /** * v9fs_dir_release - close a directory or a file * @inode: inode of the directory or file * @filp: file pointer to a directory or file * */ int v9fs_dir_release(struct inode *inode, struct file *filp) { struct v9fs_inode *v9inode = V9FS_I(inode); struct p9_fid *fid; __le32 version; loff_t i_size; int retval = 0, put_err; fid = filp->private_data; p9_debug(P9_DEBUG_VFS, "inode: %p filp: %p fid: %d\n", inode, filp, fid ? fid->fid : -1); if (fid) { if ((S_ISREG(inode->i_mode)) && (filp->f_mode & FMODE_WRITE)) retval = filemap_fdatawrite(inode->i_mapping); spin_lock(&inode->i_lock); hlist_del(&fid->ilist); spin_unlock(&inode->i_lock); put_err = p9_fid_put(fid); retval = retval < 0 ? retval : put_err; } if ((filp->f_mode & FMODE_WRITE)) { version = cpu_to_le32(v9inode->qid.version); i_size = i_size_read(inode); fscache_unuse_cookie(v9fs_inode_cookie(v9inode), &version, &i_size); } else { fscache_unuse_cookie(v9fs_inode_cookie(v9inode), NULL, NULL); } return retval; } const struct file_operations v9fs_dir_operations = { .read = generic_read_dir, .llseek = generic_file_llseek, .iterate_shared = v9fs_dir_readdir, .open = v9fs_file_open, .release = v9fs_dir_release, }; const struct file_operations v9fs_dir_operations_dotl = { .read = generic_read_dir, .llseek = generic_file_llseek, .iterate_shared = v9fs_dir_readdir_dotl, .open = v9fs_file_open, .release = v9fs_dir_release, .fsync = v9fs_file_fsync_dotl, }; |
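Both readdir paths above follow the same fill-then-drain cursor loop: refill rdir->buf from the server when head catches up with tail, then parse and emit one entry at a time, advancing head by the bytes each parse consumed. The standalone sketch below shows only that control flow; fetch(), parse_one() and emit() are hypothetical stand-ins for p9_client_readdir()/p9dirent_read()/dir_emit(), and the position bookkeeping (ctx->pos) is deliberately omitted.

#include <stdbool.h>
#include <stddef.h>

struct cursor {
	size_t head;	/* next unparsed byte */
	size_t tail;	/* end of valid data  */
	unsigned char buf[8192];
};

/* Returns bytes read, 0 at end of directory, negative on error. */
typedef int (*fetch_fn)(void *buf, size_t len);
/* Returns bytes consumed, <= 0 on error; *name receives the entry name. */
typedef int (*parse_fn)(const void *buf, size_t len, const char **name);
/* Returns false when the caller's buffer is full and iteration should stop. */
typedef bool (*emit_fn)(const char *name);

static int iterate_dir_buffered(struct cursor *c, fetch_fn fetch,
				parse_fn parse_one, emit_fn emit)
{
	for (;;) {
		if (c->head == c->tail) {	/* buffer drained: refill */
			int n = fetch(c->buf, sizeof(c->buf));

			if (n <= 0)
				return n;	/* end of directory or error */
			c->head = 0;
			c->tail = (size_t)n;
		}
		while (c->head < c->tail) {	/* drain one entry at a time */
			const char *name;
			int used = parse_one(c->buf + c->head,
					     c->tail - c->head, &name);

			if (used <= 0)
				return -1;	/* corrupt entry */
			if (!emit(name))
				return 0;	/* caller has had enough */
			c->head += (size_t)used;
		}
	}
}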
1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 | /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _LINUX_MM_TYPES_H #define _LINUX_MM_TYPES_H #include <linux/mm_types_task.h> #include <linux/auxvec.h> #include <linux/kref.h> #include <linux/list.h> #include <linux/spinlock.h> #include <linux/rbtree.h> #include <linux/maple_tree.h> #include <linux/rwsem.h> #include <linux/completion.h> #include <linux/cpumask.h> #include <linux/uprobes.h> #include <linux/rcupdate.h> #include <linux/page-flags-layout.h> #include <linux/workqueue.h> #include <linux/seqlock.h> #include <linux/percpu_counter.h> #include <asm/mmu.h> #ifndef AT_VECTOR_SIZE_ARCH #define AT_VECTOR_SIZE_ARCH 0 #endif #define AT_VECTOR_SIZE (2*(AT_VECTOR_SIZE_ARCH + AT_VECTOR_SIZE_BASE + 1)) #define INIT_PASID 0 struct address_space; struct mem_cgroup; /* * Each physical page in the system has a struct page associated with * it to keep track of whatever it is we are using the page for at the * moment. Note that we have no way to track which tasks are using * a page, though if it is a pagecache page, rmap structures can tell us * who is mapping it. * * If you allocate the page using alloc_pages(), you can use some of the * space in struct page for your own purposes. The five words in the main * union are available, except for bit 0 of the first word which must be * kept clear. Many users use this word to store a pointer to an object * which is guaranteed to be aligned. If you use the same storage as * page->mapping, you must restore it to NULL before freeing the page. * * The mapcount field must not be used for own purposes. * * If you want to use the refcount field, it must be used in such a way * that other CPUs temporarily incrementing and then decrementing the * refcount does not cause problems. On receiving the page from * alloc_pages(), the refcount will be positive. * * If you allocate pages of order > 0, you can use some of the fields * in each subpage, but you may need to restore some of their values * afterwards. * * SLUB uses cmpxchg_double() to atomically update its freelist and counters. * That requires that freelist & counters in struct slab be adjacent and * double-word aligned. Because struct slab currently just reinterprets the * bits of struct page, we align all struct pages to double-word boundaries, * and ensure that 'freelist' is aligned within struct slab. */ #ifdef CONFIG_HAVE_ALIGNED_STRUCT_PAGE #define _struct_page_alignment __aligned(2 * sizeof(unsigned long)) #else #define _struct_page_alignment __aligned(sizeof(unsigned long)) #endif struct page { unsigned long flags; /* Atomic flags, some possibly * updated asynchronously */ /* * Five words (20/40 bytes) are available in this union. * WARNING: bit 0 of the first word is used for PageTail(). That * means the other users of this union MUST NOT use the bit to * avoid collision and false-positive PageTail(). */ union { struct { /* Page cache and anonymous pages */ /** * @lru: Pageout list, eg. active_list protected by * lruvec->lru_lock. Sometimes used as a generic list * by the page owner. */ union { struct list_head lru; /* Or, for the Unevictable "LRU list" slot */ struct { /* Always even, to negate PageTail */ void *__filler; /* Count page's or folio's mlocks */ unsigned int mlock_count; }; /* Or, free page */ struct list_head buddy_list; struct list_head pcp_list; }; /* See page-flags.h for PAGE_MAPPING_FLAGS */ struct address_space *mapping; union { pgoff_t index; /* Our offset within mapping. 
*/ unsigned long share; /* share count for fsdax */ }; /** * @private: Mapping-private opaque data. * Usually used for buffer_heads if PagePrivate. * Used for swp_entry_t if swapcache flag set. * Indicates order in the buddy system if PageBuddy. */ unsigned long private; }; struct { /* page_pool used by netstack */ /** * @pp_magic: magic value to avoid recycling non * page_pool allocated pages. */ unsigned long pp_magic; struct page_pool *pp; unsigned long _pp_mapping_pad; unsigned long dma_addr; atomic_long_t pp_ref_count; }; struct { /* Tail pages of compound page */ unsigned long compound_head; /* Bit zero is set */ }; struct { /* ZONE_DEVICE pages */ /** @pgmap: Points to the hosting device page map. */ struct dev_pagemap *pgmap; void *zone_device_data; /* * ZONE_DEVICE private pages are counted as being * mapped so the next 3 words hold the mapping, index, * and private fields from the source anonymous or * page cache page while the page is migrated to device * private memory. * ZONE_DEVICE MEMORY_DEVICE_FS_DAX pages also * use the mapping, index, and private fields when * pmem backed DAX files are mapped. */ }; /** @rcu_head: You can use this to free a page by RCU. */ struct rcu_head rcu_head; }; union { /* This union is 4 bytes in size. */ /* * For head pages of typed folios, the value stored here * allows for determining what this page is used for. The * tail pages of typed folios will not store a type * (page_type == _mapcount == -1). * * See page-flags.h for a list of page types which are currently * stored here. * * Owners of typed folios may reuse the lower 16 bit of the * head page page_type field after setting the page type, * but must reset these 16 bit to -1 before clearing the * page type. */ unsigned int page_type; /* * For pages that are part of non-typed folios for which mappings * are tracked via the RMAP, encodes the number of times this page * is directly referenced by a page table. * * Note that the mapcount is always initialized to -1, so that * transitions both from it and to it can be tracked, using * atomic_inc_and_test() and atomic_add_negative(-1). */ atomic_t _mapcount; }; /* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */ atomic_t _refcount; #ifdef CONFIG_MEMCG unsigned long memcg_data; #elif defined(CONFIG_SLAB_OBJ_EXT) unsigned long _unused_slab_obj_exts; #endif /* * On machines where all RAM is mapped into kernel address space, * we can simply calculate the virtual address. On machines with * highmem some memory is mapped into kernel virtual memory * dynamically, so we need a place to store that address. * Note that this field could be 16 bits on x86 ... ;) * * Architectures with slow multiplication can define * WANT_PAGE_VIRTUAL in asm/page.h */ #if defined(WANT_PAGE_VIRTUAL) void *virtual; /* Kernel virtual address (NULL if not kmapped, ie. highmem) */ #endif /* WANT_PAGE_VIRTUAL */ #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS int _last_cpupid; #endif #ifdef CONFIG_KMSAN /* * KMSAN metadata for this page: * - shadow page: every bit indicates whether the corresponding * bit of the original page is initialized (0) or not (1); * - origin page: every 4 bytes contain an id of the stack trace * where the uninitialized value was created. */ struct page *kmsan_shadow; struct page *kmsan_origin; #endif } _struct_page_alignment; /* * struct encoded_page - a nonexistent type marking this pointer * * An 'encoded_page' pointer is a pointer to a regular 'struct page', but * with the low bits of the pointer indicating extra context-dependent * information. 
Only used in mmu_gather handling, and this acts as a type * system check on that use. * * We only really have two guaranteed bits in general, although you could * play with 'struct page' alignment (see CONFIG_HAVE_ALIGNED_STRUCT_PAGE) * for more. * * Use the supplied helper functions to endcode/decode the pointer and bits. */ struct encoded_page; #define ENCODED_PAGE_BITS 3ul /* Perform rmap removal after we have flushed the TLB. */ #define ENCODED_PAGE_BIT_DELAY_RMAP 1ul /* * The next item in an encoded_page array is the "nr_pages" argument, specifying * the number of consecutive pages starting from this page, that all belong to * the same folio. For example, "nr_pages" corresponds to the number of folio * references that must be dropped. If this bit is not set, "nr_pages" is * implicitly 1. */ #define ENCODED_PAGE_BIT_NR_PAGES_NEXT 2ul static __always_inline struct encoded_page *encode_page(struct page *page, unsigned long flags) { BUILD_BUG_ON(flags > ENCODED_PAGE_BITS); return (struct encoded_page *)(flags | (unsigned long)page); } static inline unsigned long encoded_page_flags(struct encoded_page *page) { return ENCODED_PAGE_BITS & (unsigned long)page; } static inline struct page *encoded_page_ptr(struct encoded_page *page) { return (struct page *)(~ENCODED_PAGE_BITS & (unsigned long)page); } static __always_inline struct encoded_page *encode_nr_pages(unsigned long nr) { VM_WARN_ON_ONCE((nr << 2) >> 2 != nr); return (struct encoded_page *)(nr << 2); } static __always_inline unsigned long encoded_nr_pages(struct encoded_page *page) { return ((unsigned long)page) >> 2; } /* * A swap entry has to fit into a "unsigned long", as the entry is hidden * in the "index" field of the swapper address space. */ typedef struct { unsigned long val; } swp_entry_t; /** * struct folio - Represents a contiguous set of bytes. * @flags: Identical to the page flags. * @lru: Least Recently Used list; tracks how recently this folio was used. * @mlock_count: Number of times this folio has been pinned by mlock(). * @mapping: The file this page belongs to, or refers to the anon_vma for * anonymous memory. * @index: Offset within the file, in units of pages. For anonymous memory, * this is the index from the beginning of the mmap. * @private: Filesystem per-folio data (see folio_attach_private()). * @swap: Used for swp_entry_t if folio_test_swapcache(). * @_mapcount: Do not access this member directly. Use folio_mapcount() to * find out how many times this folio is mapped by userspace. * @_refcount: Do not access this member directly. Use folio_ref_count() * to find how many references there are to this folio. * @memcg_data: Memory Control Group data. * @virtual: Virtual address in the kernel direct map. * @_last_cpupid: IDs of last CPU and last process that accessed the folio. * @_entire_mapcount: Do not use directly, call folio_entire_mapcount(). * @_large_mapcount: Do not use directly, call folio_mapcount(). * @_nr_pages_mapped: Do not use outside of rmap and debug code. * @_pincount: Do not use directly, call folio_maybe_dma_pinned(). * @_folio_nr_pages: Do not use directly, call folio_nr_pages(). * @_hugetlb_subpool: Do not use directly, use accessor in hugetlb.h. * @_hugetlb_cgroup: Do not use directly, use accessor in hugetlb_cgroup.h. * @_hugetlb_cgroup_rsvd: Do not use directly, use accessor in hugetlb_cgroup.h. * @_hugetlb_hwpoison: Do not use directly, call raw_hwp_list_head(). * @_deferred_list: Folios to be split under memory pressure. 
* @_unused_slab_obj_exts: Placeholder to match obj_exts in struct slab. * * A folio is a physically, virtually and logically contiguous set * of bytes. It is a power-of-two in size, and it is aligned to that * same power-of-two. It is at least as large as %PAGE_SIZE. If it is * in the page cache, it is at a file offset which is a multiple of that * power-of-two. It may be mapped into userspace at an address which is * at an arbitrary page offset, but its kernel virtual address is aligned * to its size. */ struct folio { /* private: don't document the anon union */ union { struct { /* public: */ unsigned long flags; union { struct list_head lru; /* private: avoid cluttering the output */ struct { void *__filler; /* public: */ unsigned int mlock_count; /* private: */ }; /* public: */ }; struct address_space *mapping; pgoff_t index; union { void *private; swp_entry_t swap; }; atomic_t _mapcount; atomic_t _refcount; #ifdef CONFIG_MEMCG unsigned long memcg_data; #elif defined(CONFIG_SLAB_OBJ_EXT) unsigned long _unused_slab_obj_exts; #endif #if defined(WANT_PAGE_VIRTUAL) void *virtual; #endif #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS int _last_cpupid; #endif /* private: the union with struct page is transitional */ }; struct page page; }; union { struct { unsigned long _flags_1; unsigned long _head_1; /* public: */ atomic_t _large_mapcount; atomic_t _entire_mapcount; atomic_t _nr_pages_mapped; atomic_t _pincount; #ifdef CONFIG_64BIT unsigned int _folio_nr_pages; #endif /* private: the union with struct page is transitional */ }; struct page __page_1; }; union { struct { unsigned long _flags_2; unsigned long _head_2; /* public: */ void *_hugetlb_subpool; void *_hugetlb_cgroup; void *_hugetlb_cgroup_rsvd; void *_hugetlb_hwpoison; /* private: the union with struct page is transitional */ }; struct { unsigned long _flags_2a; unsigned long _head_2a; /* public: */ struct list_head _deferred_list; /* private: the union with struct page is transitional */ }; struct page __page_2; }; }; #define FOLIO_MATCH(pg, fl) \ static_assert(offsetof(struct page, pg) == offsetof(struct folio, fl)) FOLIO_MATCH(flags, flags); FOLIO_MATCH(lru, lru); FOLIO_MATCH(mapping, mapping); FOLIO_MATCH(compound_head, lru); FOLIO_MATCH(index, index); FOLIO_MATCH(private, private); FOLIO_MATCH(_mapcount, _mapcount); FOLIO_MATCH(_refcount, _refcount); #ifdef CONFIG_MEMCG FOLIO_MATCH(memcg_data, memcg_data); #endif #if defined(WANT_PAGE_VIRTUAL) FOLIO_MATCH(virtual, virtual); #endif #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS FOLIO_MATCH(_last_cpupid, _last_cpupid); #endif #undef FOLIO_MATCH #define FOLIO_MATCH(pg, fl) \ static_assert(offsetof(struct folio, fl) == \ offsetof(struct page, pg) + sizeof(struct page)) FOLIO_MATCH(flags, _flags_1); FOLIO_MATCH(compound_head, _head_1); #undef FOLIO_MATCH #define FOLIO_MATCH(pg, fl) \ static_assert(offsetof(struct folio, fl) == \ offsetof(struct page, pg) + 2 * sizeof(struct page)) FOLIO_MATCH(flags, _flags_2); FOLIO_MATCH(compound_head, _head_2); FOLIO_MATCH(flags, _flags_2a); FOLIO_MATCH(compound_head, _head_2a); #undef FOLIO_MATCH /** * struct ptdesc - Memory descriptor for page tables. * @__page_flags: Same as page flags. Powerpc only. * @pt_rcu_head: For freeing page table pages. * @pt_list: List of used page tables. Used for s390 gmap shadow pages * (which are not linked into the user page tables) and x86 * pgds. * @_pt_pad_1: Padding that aliases with page's compound head. * @pmd_huge_pte: Protected by ptdesc->ptl, used for THPs. * @__page_mapping: Aliases with page->mapping. 
Unused for page tables. * @pt_index: Used for s390 gmap. * @pt_mm: Used for x86 pgds. * @pt_frag_refcount: For fragmented page table tracking. Powerpc only. * @pt_share_count: Used for HugeTLB PMD page table share count. * @_pt_pad_2: Padding to ensure proper alignment. * @ptl: Lock for the page table. * @__page_type: Same as page->page_type. Unused for page tables. * @__page_refcount: Same as page refcount. * @pt_memcg_data: Memcg data. Tracked for page tables here. * * This struct overlays struct page for now. Do not modify without a good * understanding of the issues. */ struct ptdesc { unsigned long __page_flags; union { struct rcu_head pt_rcu_head; struct list_head pt_list; struct { unsigned long _pt_pad_1; pgtable_t pmd_huge_pte; }; }; unsigned long __page_mapping; union { pgoff_t pt_index; struct mm_struct *pt_mm; atomic_t pt_frag_refcount; #ifdef CONFIG_HUGETLB_PMD_PAGE_TABLE_SHARING atomic_t pt_share_count; #endif }; union { unsigned long _pt_pad_2; #if ALLOC_SPLIT_PTLOCKS spinlock_t *ptl; #else spinlock_t ptl; #endif }; unsigned int __page_type; atomic_t __page_refcount; #ifdef CONFIG_MEMCG unsigned long pt_memcg_data; #endif }; #define TABLE_MATCH(pg, pt) \ static_assert(offsetof(struct page, pg) == offsetof(struct ptdesc, pt)) TABLE_MATCH(flags, __page_flags); TABLE_MATCH(compound_head, pt_list); TABLE_MATCH(compound_head, _pt_pad_1); TABLE_MATCH(mapping, __page_mapping); TABLE_MATCH(index, pt_index); TABLE_MATCH(rcu_head, pt_rcu_head); TABLE_MATCH(page_type, __page_type); TABLE_MATCH(_refcount, __page_refcount); #ifdef CONFIG_MEMCG TABLE_MATCH(memcg_data, pt_memcg_data); #endif #undef TABLE_MATCH static_assert(sizeof(struct ptdesc) <= sizeof(struct page)); #define ptdesc_page(pt) (_Generic((pt), \ const struct ptdesc *: (const struct page *)(pt), \ struct ptdesc *: (struct page *)(pt))) #define ptdesc_folio(pt) (_Generic((pt), \ const struct ptdesc *: (const struct folio *)(pt), \ struct ptdesc *: (struct folio *)(pt))) #define page_ptdesc(p) (_Generic((p), \ const struct page *: (const struct ptdesc *)(p), \ struct page *: (struct ptdesc *)(p))) #ifdef CONFIG_HUGETLB_PMD_PAGE_TABLE_SHARING static inline void ptdesc_pmd_pts_init(struct ptdesc *ptdesc) { atomic_set(&ptdesc->pt_share_count, 0); } static inline void ptdesc_pmd_pts_inc(struct ptdesc *ptdesc) { atomic_inc(&ptdesc->pt_share_count); } static inline void ptdesc_pmd_pts_dec(struct ptdesc *ptdesc) { atomic_dec(&ptdesc->pt_share_count); } static inline int ptdesc_pmd_pts_count(struct ptdesc *ptdesc) { return atomic_read(&ptdesc->pt_share_count); } #else static inline void ptdesc_pmd_pts_init(struct ptdesc *ptdesc) { } #endif /* * Used for sizing the vmemmap region on some architectures */ #define STRUCT_PAGE_MAX_SHIFT (order_base_2(sizeof(struct page))) /* * page_private can be used on tail pages. However, PagePrivate is only * checked by the VM on the head page. So page_private on the tail pages * should be used for data that's ancillary to the head page (eg attaching * buffer heads to tail pages after attaching buffer heads to the head page) */ #define page_private(page) ((page)->private) static inline void set_page_private(struct page *page, unsigned long private) { page->private = private; } static inline void *folio_get_private(struct folio *folio) { return folio->private; } typedef unsigned long vm_flags_t; /* * A region containing a mapping of a non-memory backed file under NOMMU * conditions. These are held in a global tree and are pinned by the VMAs that * map parts of them. 
*/ struct vm_region { struct rb_node vm_rb; /* link in global region tree */ vm_flags_t vm_flags; /* VMA vm_flags */ unsigned long vm_start; /* start address of region */ unsigned long vm_end; /* region initialised to here */ unsigned long vm_top; /* region allocated to here */ unsigned long vm_pgoff; /* the offset in vm_file corresponding to vm_start */ struct file *vm_file; /* the backing file or NULL */ int vm_usage; /* region usage count (access under nommu_region_sem) */ bool vm_icache_flushed : 1; /* true if the icache has been flushed for * this region */ }; #ifdef CONFIG_USERFAULTFD #define NULL_VM_UFFD_CTX ((struct vm_userfaultfd_ctx) { NULL, }) struct vm_userfaultfd_ctx { struct userfaultfd_ctx *ctx; }; #else /* CONFIG_USERFAULTFD */ #define NULL_VM_UFFD_CTX ((struct vm_userfaultfd_ctx) {}) struct vm_userfaultfd_ctx {}; #endif /* CONFIG_USERFAULTFD */ struct anon_vma_name { struct kref kref; /* The name needs to be at the end because it is dynamically sized. */ char name[]; }; #ifdef CONFIG_ANON_VMA_NAME /* * mmap_lock should be read-locked when calling anon_vma_name(). Caller should * either keep holding the lock while using the returned pointer or it should * raise anon_vma_name refcount before releasing the lock. */ struct anon_vma_name *anon_vma_name(struct vm_area_struct *vma); struct anon_vma_name *anon_vma_name_alloc(const char *name); void anon_vma_name_free(struct kref *kref); #else /* CONFIG_ANON_VMA_NAME */ static inline struct anon_vma_name *anon_vma_name(struct vm_area_struct *vma) { return NULL; } static inline struct anon_vma_name *anon_vma_name_alloc(const char *name) { return NULL; } #endif struct vma_lock { struct rw_semaphore lock; }; struct vma_numab_state { /* * Initialised as time in 'jiffies' after which VMA * should be scanned. Delays first scan of new VMA by at * least sysctl_numa_balancing_scan_delay: */ unsigned long next_scan; /* * Time in jiffies when pids_active[] is reset to * detect phase change behaviour: */ unsigned long pids_active_reset; /* * Approximate tracking of PIDs that trapped a NUMA hinting * fault. May produce false positives due to hash collisions. * * [0] Previous PID tracking * [1] Current PID tracking * * Window moves after next_pid_reset has expired approximately * every VMA_PID_RESET_PERIOD jiffies: */ unsigned long pids_active[2]; /* MM scan sequence ID when scan first started after VMA creation */ int start_scan_seq; /* * MM scan sequence ID when the VMA was last completely scanned. * A VMA is not eligible for scanning if prev_scan_seq == numa_scan_seq */ int prev_scan_seq; }; /* * This struct describes a virtual memory area. There is one of these * per VM-area/task. A VM area is any part of the process virtual memory * space that has a special rule for the page-fault handlers (ie a shared * library, the executable area etc). * * Only explicitly marked struct members may be accessed by RCU readers before * getting a stable reference. */ struct vm_area_struct { /* The first cache line has the info for VMA tree walking. */ union { struct { /* VMA covers [vm_start; vm_end) addresses within mm */ unsigned long vm_start; unsigned long vm_end; }; #ifdef CONFIG_PER_VMA_LOCK struct rcu_head vm_rcu; /* Used for deferred freeing. */ #endif }; /* * The address space we belong to. * Unstable RCU readers are allowed to read this. */ struct mm_struct *vm_mm; pgprot_t vm_page_prot; /* Access permissions of this VMA. */ /* * Flags, see mm.h. * To modify use vm_flags_{init|reset|set|clear|mod} functions. 
*/ union { const vm_flags_t vm_flags; vm_flags_t __private __vm_flags; }; #ifdef CONFIG_PER_VMA_LOCK /* * Flag to indicate areas detached from the mm->mm_mt tree. * Unstable RCU readers are allowed to read this. */ bool detached; /* * Can only be written (using WRITE_ONCE()) while holding both: * - mmap_lock (in write mode) * - vm_lock->lock (in write mode) * Can be read reliably while holding one of: * - mmap_lock (in read or write mode) * - vm_lock->lock (in read or write mode) * Can be read unreliably (using READ_ONCE()) for pessimistic bailout * while holding nothing (except RCU to keep the VMA struct allocated). * * This sequence counter is explicitly allowed to overflow; sequence * counter reuse can only lead to occasional unnecessary use of the * slowpath. */ unsigned int vm_lock_seq; /* Unstable RCU readers are allowed to read this. */ struct vma_lock *vm_lock; #endif /* * For areas with an address space and backing store, * linkage into the address_space->i_mmap interval tree. * */ struct { struct rb_node rb; unsigned long rb_subtree_last; } shared; /* * A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma * list, after a COW of one of the file pages. A MAP_SHARED vma * can only be in the i_mmap tree. An anonymous MAP_PRIVATE, stack * or brk vma (with NULL file) can only be in an anon_vma list. */ struct list_head anon_vma_chain; /* Serialized by mmap_lock & * page_table_lock */ struct anon_vma *anon_vma; /* Serialized by page_table_lock */ /* Function pointers to deal with this struct. */ const struct vm_operations_struct *vm_ops; /* Information about our backing store: */ unsigned long vm_pgoff; /* Offset (within vm_file) in PAGE_SIZE units */ struct file * vm_file; /* File we map to (can be NULL). */ void * vm_private_data; /* was vm_pte (shared mem) */ #ifdef CONFIG_ANON_VMA_NAME /* * For private and shared anonymous mappings, a pointer to a null * terminated string containing the name given to the vma, or NULL if * unnamed. Serialized by mmap_lock. Use anon_vma_name to access. */ struct anon_vma_name *anon_name; #endif #ifdef CONFIG_SWAP atomic_long_t swap_readahead_info; #endif #ifndef CONFIG_MMU struct vm_region *vm_region; /* NOMMU mapping region */ #endif #ifdef CONFIG_NUMA struct mempolicy *vm_policy; /* NUMA policy for the VMA */ #endif #ifdef CONFIG_NUMA_BALANCING struct vma_numab_state *numab_state; /* NUMA Balancing state */ #endif struct vm_userfaultfd_ctx vm_userfaultfd_ctx; } __randomize_layout; #ifdef CONFIG_NUMA #define vma_policy(vma) ((vma)->vm_policy) #else #define vma_policy(vma) NULL #endif #ifdef CONFIG_SCHED_MM_CID struct mm_cid { u64 time; int cid; int recent_cid; }; #endif struct kioctx_table; struct iommu_mm_data; struct mm_struct { struct { /* * Fields which are often written to are placed in a separate * cache line. */ struct { /** * @mm_count: The number of references to &struct * mm_struct (@mm_users count as 1). * * Use mmgrab()/mmdrop() to modify. When this drops to * 0, the &struct mm_struct is freed. */ atomic_t mm_count; } ____cacheline_aligned_in_smp; struct maple_tree mm_mt; unsigned long mmap_base; /* base of mmap area */ unsigned long mmap_legacy_base; /* base of mmap area in bottom-up allocations */ #ifdef CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES /* Base addresses for compatible mmap() */ unsigned long mmap_compat_base; unsigned long mmap_compat_legacy_base; #endif unsigned long task_size; /* size of task vm space */ pgd_t * pgd; #ifdef CONFIG_MEMBARRIER /** * @membarrier_state: Flags controlling membarrier behavior. 
* * This field is close to @pgd to hopefully fit in the same * cache-line, which needs to be touched by switch_mm(). */ atomic_t membarrier_state; #endif /** * @mm_users: The number of users including userspace. * * Use mmget()/mmget_not_zero()/mmput() to modify. When this * drops to 0 (i.e. when the task exits and there are no other * temporary reference holders), we also release a reference on * @mm_count (which may then free the &struct mm_struct if * @mm_count also drops to 0). */ atomic_t mm_users; #ifdef CONFIG_SCHED_MM_CID /** * @pcpu_cid: Per-cpu current cid. * * Keep track of the currently allocated mm_cid for each cpu. * The per-cpu mm_cid values are serialized by their respective * runqueue locks. */ struct mm_cid __percpu *pcpu_cid; /* * @mm_cid_next_scan: Next mm_cid scan (in jiffies). * * When the next mm_cid scan is due (in jiffies). */ unsigned long mm_cid_next_scan; /** * @nr_cpus_allowed: Number of CPUs allowed for mm. * * Number of CPUs allowed in the union of all mm's * threads allowed CPUs. */ unsigned int nr_cpus_allowed; /** * @max_nr_cid: Maximum number of concurrency IDs allocated. * * Track the highest number of concurrency IDs allocated for the * mm. */ atomic_t max_nr_cid; /** * @cpus_allowed_lock: Lock protecting mm cpus_allowed. * * Provide mutual exclusion for mm cpus_allowed and * mm nr_cpus_allowed updates. */ raw_spinlock_t cpus_allowed_lock; #endif #ifdef CONFIG_MMU atomic_long_t pgtables_bytes; /* size of all page tables */ #endif int map_count; /* number of VMAs */ spinlock_t page_table_lock; /* Protects page tables and some * counters */ /* * With some kernel config, the current mmap_lock's offset * inside 'mm_struct' is at 0x120, which is very optimal, as * its two hot fields 'count' and 'owner' sit in 2 different * cachelines, and when mmap_lock is highly contended, both * of the 2 fields will be accessed frequently, current layout * will help to reduce cache bouncing. * * So please be careful with adding new fields before * mmap_lock, which can easily push the 2 fields into one * cacheline. */ struct rw_semaphore mmap_lock; struct list_head mmlist; /* List of maybe swapped mm's. These * are globally strung together off * init_mm.mmlist, and are protected * by mmlist_lock */ #ifdef CONFIG_PER_VMA_LOCK /* * This field has lock-like semantics, meaning it is sometimes * accessed with ACQUIRE/RELEASE semantics. * Roughly speaking, incrementing the sequence number is * equivalent to releasing locks on VMAs; reading the sequence * number can be part of taking a read lock on a VMA. * Incremented every time mmap_lock is write-locked/unlocked. * Initialized to 0, therefore odd values indicate mmap_lock * is write-locked and even values that it's released. * * Can be modified under write mmap_lock using RELEASE * semantics. * Can be read with no other protection when holding write * mmap_lock. * Can be read with ACQUIRE semantics if not holding write * mmap_lock. 
*/ seqcount_t mm_lock_seq; #endif unsigned long hiwater_rss; /* High-watermark of RSS usage */ unsigned long hiwater_vm; /* High-water virtual memory usage */ unsigned long total_vm; /* Total pages mapped */ unsigned long locked_vm; /* Pages that have PG_mlocked set */ atomic64_t pinned_vm; /* Refcount permanently increased */ unsigned long data_vm; /* VM_WRITE & ~VM_SHARED & ~VM_STACK */ unsigned long exec_vm; /* VM_EXEC & ~VM_WRITE & ~VM_STACK */ unsigned long stack_vm; /* VM_STACK */ unsigned long def_flags; /** * @write_protect_seq: Locked when any thread is write * protecting pages mapped by this mm to enforce a later COW, * for instance during page table copying for fork(). */ seqcount_t write_protect_seq; spinlock_t arg_lock; /* protect the below fields */ unsigned long start_code, end_code, start_data, end_data; unsigned long start_brk, brk, start_stack; unsigned long arg_start, arg_end, env_start, env_end; unsigned long saved_auxv[AT_VECTOR_SIZE]; /* for /proc/PID/auxv */ struct percpu_counter rss_stat[NR_MM_COUNTERS]; struct linux_binfmt *binfmt; /* Architecture-specific MM context */ mm_context_t context; unsigned long flags; /* Must use atomic bitops to access */ #ifdef CONFIG_AIO spinlock_t ioctx_lock; struct kioctx_table __rcu *ioctx_table; #endif #ifdef CONFIG_MEMCG /* * "owner" points to a task that is regarded as the canonical * user/owner of this mm. All of the following must be true in * order for it to be changed: * * current == mm->owner * current->mm != mm * new_owner->mm == mm * new_owner->alloc_lock is held */ struct task_struct __rcu *owner; #endif struct user_namespace *user_ns; /* store ref to file /proc/<pid>/exe symlink points to */ struct file __rcu *exe_file; #ifdef CONFIG_MMU_NOTIFIER struct mmu_notifier_subscriptions *notifier_subscriptions; #endif #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !defined(CONFIG_SPLIT_PMD_PTLOCKS) pgtable_t pmd_huge_pte; /* protected by page_table_lock */ #endif #ifdef CONFIG_NUMA_BALANCING /* * numa_next_scan is the next time that PTEs will be remapped * PROT_NONE to trigger NUMA hinting faults; such faults gather * statistics and migrate pages to new nodes if necessary. */ unsigned long numa_next_scan; /* Restart point for scanning and remapping PTEs. */ unsigned long numa_scan_offset; /* numa_scan_seq prevents two threads remapping PTEs. */ int numa_scan_seq; #endif /* * An operation with batched TLB flushing is going on. Anything * that can move process memory needs to flush the TLB when * moving a PROT_NONE mapped page. */ atomic_t tlb_flush_pending; #ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH /* See flush_tlb_batched_pending() */ atomic_t tlb_flush_batched; #endif struct uprobes_state uprobes_state; #ifdef CONFIG_PREEMPT_RT struct rcu_head delayed_drop; #endif #ifdef CONFIG_HUGETLB_PAGE atomic_long_t hugetlb_usage; #endif struct work_struct async_put_work; #ifdef CONFIG_IOMMU_MM_DATA struct iommu_mm_data *iommu_mm; #endif #ifdef CONFIG_KSM /* * Represent how many pages of this process are involved in KSM * merging (not including ksm_zero_pages). */ unsigned long ksm_merging_pages; /* * Represent how many pages are checked for ksm merging * including merged and not merged. */ unsigned long ksm_rmap_items; /* * Represent how many empty pages are merged with kernel zero * pages when enabling KSM use_zero_pages. 
*/ atomic_long_t ksm_zero_pages; #endif /* CONFIG_KSM */ #ifdef CONFIG_LRU_GEN_WALKS_MMU struct { /* this mm_struct is on lru_gen_mm_list */ struct list_head list; /* * Set when switching to this mm_struct, as a hint of * whether it has been used since the last time per-node * page table walkers cleared the corresponding bits. */ unsigned long bitmap; #ifdef CONFIG_MEMCG /* points to the memcg of "owner" above */ struct mem_cgroup *memcg; #endif } lru_gen; #endif /* CONFIG_LRU_GEN_WALKS_MMU */ } __randomize_layout; /* * The mm_cpumask needs to be at the end of mm_struct, because it * is dynamically sized based on nr_cpu_ids. */ unsigned long cpu_bitmap[]; }; #define MM_MT_FLAGS (MT_FLAGS_ALLOC_RANGE | MT_FLAGS_LOCK_EXTERN | \ MT_FLAGS_USE_RCU) extern struct mm_struct init_mm; /* Pointer magic because the dynamic array size confuses some compilers. */ static inline void mm_init_cpumask(struct mm_struct *mm) { unsigned long cpu_bitmap = (unsigned long)mm; cpu_bitmap += offsetof(struct mm_struct, cpu_bitmap); cpumask_clear((struct cpumask *)cpu_bitmap); } /* Future-safe accessor for struct mm_struct's cpu_vm_mask. */ static inline cpumask_t *mm_cpumask(struct mm_struct *mm) { return (struct cpumask *)&mm->cpu_bitmap; } #ifdef CONFIG_LRU_GEN struct lru_gen_mm_list { /* mm_struct list for page table walkers */ struct list_head fifo; /* protects the list above */ spinlock_t lock; }; #endif /* CONFIG_LRU_GEN */ #ifdef CONFIG_LRU_GEN_WALKS_MMU void lru_gen_add_mm(struct mm_struct *mm); void lru_gen_del_mm(struct mm_struct *mm); void lru_gen_migrate_mm(struct mm_struct *mm); static inline void lru_gen_init_mm(struct mm_struct *mm) { INIT_LIST_HEAD(&mm->lru_gen.list); mm->lru_gen.bitmap = 0; #ifdef CONFIG_MEMCG mm->lru_gen.memcg = NULL; #endif } static inline void lru_gen_use_mm(struct mm_struct *mm) { /* * When the bitmap is set, page reclaim knows this mm_struct has been * used since the last time it cleared the bitmap. So it might be worth * walking the page tables of this mm_struct to clear the accessed bit. */ WRITE_ONCE(mm->lru_gen.bitmap, -1); } #else /* !CONFIG_LRU_GEN_WALKS_MMU */ static inline void lru_gen_add_mm(struct mm_struct *mm) { } static inline void lru_gen_del_mm(struct mm_struct *mm) { } static inline void lru_gen_migrate_mm(struct mm_struct *mm) { } static inline void lru_gen_init_mm(struct mm_struct *mm) { } static inline void lru_gen_use_mm(struct mm_struct *mm) { } #endif /* CONFIG_LRU_GEN_WALKS_MMU */ struct vma_iterator { struct ma_state mas; }; #define VMA_ITERATOR(name, __mm, __addr) \ struct vma_iterator name = { \ .mas = { \ .tree = &(__mm)->mm_mt, \ .index = __addr, \ .node = NULL, \ .status = ma_start, \ }, \ } static inline void vma_iter_init(struct vma_iterator *vmi, struct mm_struct *mm, unsigned long addr) { mas_init(&vmi->mas, &mm->mm_mt, addr); } #ifdef CONFIG_SCHED_MM_CID enum mm_cid_state { MM_CID_UNSET = -1U, /* Unset state has lazy_put flag set. */ MM_CID_LAZY_PUT = (1U << 31), }; static inline bool mm_cid_is_unset(int cid) { return cid == MM_CID_UNSET; } static inline bool mm_cid_is_lazy_put(int cid) { return !mm_cid_is_unset(cid) && (cid & MM_CID_LAZY_PUT); } static inline bool mm_cid_is_valid(int cid) { return !(cid & MM_CID_LAZY_PUT); } static inline int mm_cid_set_lazy_put(int cid) { return cid | MM_CID_LAZY_PUT; } static inline int mm_cid_clear_lazy_put(int cid) { return cid & ~MM_CID_LAZY_PUT; } /* * mm_cpus_allowed: Union of all mm's threads allowed CPUs. 
*/ static inline cpumask_t *mm_cpus_allowed(struct mm_struct *mm) { unsigned long bitmap = (unsigned long)mm; bitmap += offsetof(struct mm_struct, cpu_bitmap); /* Skip cpu_bitmap */ bitmap += cpumask_size(); return (struct cpumask *)bitmap; } /* Accessor for struct mm_struct's cidmask. */ static inline cpumask_t *mm_cidmask(struct mm_struct *mm) { unsigned long cid_bitmap = (unsigned long)mm_cpus_allowed(mm); /* Skip mm_cpus_allowed */ cid_bitmap += cpumask_size(); return (struct cpumask *)cid_bitmap; } static inline void mm_init_cid(struct mm_struct *mm, struct task_struct *p) { int i; for_each_possible_cpu(i) { struct mm_cid *pcpu_cid = per_cpu_ptr(mm->pcpu_cid, i); pcpu_cid->cid = MM_CID_UNSET; pcpu_cid->recent_cid = MM_CID_UNSET; pcpu_cid->time = 0; } mm->nr_cpus_allowed = p->nr_cpus_allowed; atomic_set(&mm->max_nr_cid, 0); raw_spin_lock_init(&mm->cpus_allowed_lock); cpumask_copy(mm_cpus_allowed(mm), &p->cpus_mask); cpumask_clear(mm_cidmask(mm)); } static inline int mm_alloc_cid_noprof(struct mm_struct *mm, struct task_struct *p) { mm->pcpu_cid = alloc_percpu_noprof(struct mm_cid); if (!mm->pcpu_cid) return -ENOMEM; mm_init_cid(mm, p); return 0; } #define mm_alloc_cid(...) alloc_hooks(mm_alloc_cid_noprof(__VA_ARGS__)) static inline void mm_destroy_cid(struct mm_struct *mm) { free_percpu(mm->pcpu_cid); mm->pcpu_cid = NULL; } static inline unsigned int mm_cid_size(void) { return 2 * cpumask_size(); /* mm_cpus_allowed(), mm_cidmask(). */ } static inline void mm_set_cpus_allowed(struct mm_struct *mm, const struct cpumask *cpumask) { struct cpumask *mm_allowed = mm_cpus_allowed(mm); if (!mm) return; /* The mm_cpus_allowed is the union of each thread allowed CPUs masks. */ raw_spin_lock(&mm->cpus_allowed_lock); cpumask_or(mm_allowed, mm_allowed, cpumask); WRITE_ONCE(mm->nr_cpus_allowed, cpumask_weight(mm_allowed)); raw_spin_unlock(&mm->cpus_allowed_lock); } #else /* CONFIG_SCHED_MM_CID */ static inline void mm_init_cid(struct mm_struct *mm, struct task_struct *p) { } static inline int mm_alloc_cid(struct mm_struct *mm, struct task_struct *p) { return 0; } static inline void mm_destroy_cid(struct mm_struct *mm) { } static inline unsigned int mm_cid_size(void) { return 0; } static inline void mm_set_cpus_allowed(struct mm_struct *mm, const struct cpumask *cpumask) { } #endif /* CONFIG_SCHED_MM_CID */ struct mmu_gather; extern void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm); extern void tlb_gather_mmu_fullmm(struct mmu_gather *tlb, struct mm_struct *mm); extern void tlb_finish_mmu(struct mmu_gather *tlb); struct vm_fault; /** * typedef vm_fault_t - Return type for page fault handlers. * * Page fault handlers return a bitmask of %VM_FAULT values. */ typedef __bitwise unsigned int vm_fault_t; /** * enum vm_fault_reason - Page fault handlers return a bitmask of * these values to tell the core VM what happened when handling the * fault. Used to decide whether a process gets delivered SIGBUS or * just gets major/minor fault counters bumped up. * * @VM_FAULT_OOM: Out Of Memory * @VM_FAULT_SIGBUS: Bad access * @VM_FAULT_MAJOR: Page read from storage * @VM_FAULT_HWPOISON: Hit poisoned small page * @VM_FAULT_HWPOISON_LARGE: Hit poisoned large page. 
Index encoded * in upper bits * @VM_FAULT_SIGSEGV: segmentation fault * @VM_FAULT_NOPAGE: ->fault installed the pte, not return page * @VM_FAULT_LOCKED: ->fault locked the returned page * @VM_FAULT_RETRY: ->fault blocked, must retry * @VM_FAULT_FALLBACK: huge page fault failed, fall back to small * @VM_FAULT_DONE_COW: ->fault has fully handled COW * @VM_FAULT_NEEDDSYNC: ->fault did not modify page tables and needs * fsync() to complete (for synchronous page faults * in DAX) * @VM_FAULT_COMPLETED: ->fault completed, meanwhile mmap lock released * @VM_FAULT_HINDEX_MASK: mask HINDEX value * */ enum vm_fault_reason { VM_FAULT_OOM = (__force vm_fault_t)0x000001, VM_FAULT_SIGBUS = (__force vm_fault_t)0x000002, VM_FAULT_MAJOR = (__force vm_fault_t)0x000004, VM_FAULT_HWPOISON = (__force vm_fault_t)0x000010, VM_FAULT_HWPOISON_LARGE = (__force vm_fault_t)0x000020, VM_FAULT_SIGSEGV = (__force vm_fault_t)0x000040, VM_FAULT_NOPAGE = (__force vm_fault_t)0x000100, VM_FAULT_LOCKED = (__force vm_fault_t)0x000200, VM_FAULT_RETRY = (__force vm_fault_t)0x000400, VM_FAULT_FALLBACK = (__force vm_fault_t)0x000800, VM_FAULT_DONE_COW = (__force vm_fault_t)0x001000, VM_FAULT_NEEDDSYNC = (__force vm_fault_t)0x002000, VM_FAULT_COMPLETED = (__force vm_fault_t)0x004000, VM_FAULT_HINDEX_MASK = (__force vm_fault_t)0x0f0000, }; /* Encode hstate index for a hwpoisoned large page */ #define VM_FAULT_SET_HINDEX(x) ((__force vm_fault_t)((x) << 16)) #define VM_FAULT_GET_HINDEX(x) (((__force unsigned int)(x) >> 16) & 0xf) #define VM_FAULT_ERROR (VM_FAULT_OOM | VM_FAULT_SIGBUS | \ VM_FAULT_SIGSEGV | VM_FAULT_HWPOISON | \ VM_FAULT_HWPOISON_LARGE | VM_FAULT_FALLBACK) #define VM_FAULT_RESULT_TRACE \ { VM_FAULT_OOM, "OOM" }, \ { VM_FAULT_SIGBUS, "SIGBUS" }, \ { VM_FAULT_MAJOR, "MAJOR" }, \ { VM_FAULT_HWPOISON, "HWPOISON" }, \ { VM_FAULT_HWPOISON_LARGE, "HWPOISON_LARGE" }, \ { VM_FAULT_SIGSEGV, "SIGSEGV" }, \ { VM_FAULT_NOPAGE, "NOPAGE" }, \ { VM_FAULT_LOCKED, "LOCKED" }, \ { VM_FAULT_RETRY, "RETRY" }, \ { VM_FAULT_FALLBACK, "FALLBACK" }, \ { VM_FAULT_DONE_COW, "DONE_COW" }, \ { VM_FAULT_NEEDDSYNC, "NEEDDSYNC" }, \ { VM_FAULT_COMPLETED, "COMPLETED" } struct vm_special_mapping { const char *name; /* The name, e.g. "[vdso]". */ /* * If .fault is not provided, this points to a * NULL-terminated array of pages that back the special mapping. * * This must not be NULL unless .fault is provided. */ struct page **pages; /* * If non-NULL, then this is called to resolve page faults * on the special mapping. If used, .pages is not checked. */ vm_fault_t (*fault)(const struct vm_special_mapping *sm, struct vm_area_struct *vma, struct vm_fault *vmf); int (*mremap)(const struct vm_special_mapping *sm, struct vm_area_struct *new_vma); void (*close)(const struct vm_special_mapping *sm, struct vm_area_struct *vma); }; enum tlb_flush_reason { TLB_FLUSH_ON_TASK_SWITCH, TLB_REMOTE_SHOOTDOWN, TLB_LOCAL_SHOOTDOWN, TLB_LOCAL_MM_SHOOTDOWN, TLB_REMOTE_SEND_IPI, TLB_REMOTE_WRONG_CPU, NR_TLB_FLUSH_REASONS, }; /** * enum fault_flag - Fault flag definitions. * @FAULT_FLAG_WRITE: Fault was a write fault. * @FAULT_FLAG_MKWRITE: Fault was mkwrite of existing PTE. * @FAULT_FLAG_ALLOW_RETRY: Allow to retry the fault if blocked. * @FAULT_FLAG_RETRY_NOWAIT: Don't drop mmap_lock and wait when retrying. * @FAULT_FLAG_KILLABLE: The fault task is in SIGKILL killable region. * @FAULT_FLAG_TRIED: The fault has been tried once. * @FAULT_FLAG_USER: The fault originated in userspace. * @FAULT_FLAG_REMOTE: The fault is not for current task/mm. 
* @FAULT_FLAG_INSTRUCTION: The fault was during an instruction fetch. * @FAULT_FLAG_INTERRUPTIBLE: The fault can be interrupted by non-fatal signals. * @FAULT_FLAG_UNSHARE: The fault is an unsharing request to break COW in a * COW mapping, making sure that an exclusive anon page is * mapped after the fault. * @FAULT_FLAG_ORIG_PTE_VALID: whether the fault has vmf->orig_pte cached. * We should only access orig_pte if this flag set. * @FAULT_FLAG_VMA_LOCK: The fault is handled under VMA lock. * * About @FAULT_FLAG_ALLOW_RETRY and @FAULT_FLAG_TRIED: we can specify * whether we would allow page faults to retry by specifying these two * fault flags correctly. Currently there can be three legal combinations: * * (a) ALLOW_RETRY and !TRIED: this means the page fault allows retry, and * this is the first try * * (b) ALLOW_RETRY and TRIED: this means the page fault allows retry, and * we've already tried at least once * * (c) !ALLOW_RETRY and !TRIED: this means the page fault does not allow retry * * The unlisted combination (!ALLOW_RETRY && TRIED) is illegal and should never * be used. Note that page faults can be allowed to retry for multiple times, * in which case we'll have an initial fault with flags (a) then later on * continuous faults with flags (b). We should always try to detect pending * signals before a retry to make sure the continuous page faults can still be * interrupted if necessary. * * The combination FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE is illegal. * FAULT_FLAG_UNSHARE is ignored and treated like an ordinary read fault when * applied to mappings that are not COW mappings. */ enum fault_flag { FAULT_FLAG_WRITE = 1 << 0, FAULT_FLAG_MKWRITE = 1 << 1, FAULT_FLAG_ALLOW_RETRY = 1 << 2, FAULT_FLAG_RETRY_NOWAIT = 1 << 3, FAULT_FLAG_KILLABLE = 1 << 4, FAULT_FLAG_TRIED = 1 << 5, FAULT_FLAG_USER = 1 << 6, FAULT_FLAG_REMOTE = 1 << 7, FAULT_FLAG_INSTRUCTION = 1 << 8, FAULT_FLAG_INTERRUPTIBLE = 1 << 9, FAULT_FLAG_UNSHARE = 1 << 10, FAULT_FLAG_ORIG_PTE_VALID = 1 << 11, FAULT_FLAG_VMA_LOCK = 1 << 12, }; typedef unsigned int __bitwise zap_flags_t; /* Flags for clear_young_dirty_ptes(). */ typedef int __bitwise cydp_t; /* Clear the access bit */ #define CYDP_CLEAR_YOUNG ((__force cydp_t)BIT(0)) /* Clear the dirty bit */ #define CYDP_CLEAR_DIRTY ((__force cydp_t)BIT(1)) /* * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each * other. Here is what they mean, and how to use them: * * * FIXME: For pages which are part of a filesystem, mappings are subject to the * lifetime enforced by the filesystem and we need guarantees that longterm * users like RDMA and V4L2 only establish mappings which coordinate usage with * the filesystem. Ideas for this coordination include revoking the longterm * pin, delaying writeback, bounce buffer page writeback, etc. As FS DAX was * added after the problem with filesystems was found FS DAX VMAs are * specifically failed. Filesystem pages are still subject to bugs and use of * FOLL_LONGTERM should be avoided on those pages. * * In the CMA case: long term pins in a CMA region would unnecessarily fragment * that region. And so, CMA attempts to migrate the page before pinning, when * FOLL_LONGTERM is specified. * * FOLL_PIN indicates that a special kind of tracking (not just page->_refcount, * but an additional pin counting system) will be invoked. This is intended for * anything that gets a page reference and then touches page data (for example, * Direct IO). 
This lets the filesystem know that some non-file-system entity is * potentially changing the pages' data. In contrast to FOLL_GET (whose pages * are released via put_page()), FOLL_PIN pages must be released, ultimately, by * a call to unpin_user_page(). * * FOLL_PIN is similar to FOLL_GET: both of these pin pages. They use different * and separate refcounting mechanisms, however, and that means that each has * its own acquire and release mechanisms: * * FOLL_GET: get_user_pages*() to acquire, and put_page() to release. * * FOLL_PIN: pin_user_pages*() to acquire, and unpin_user_pages to release. * * FOLL_PIN and FOLL_GET are mutually exclusive for a given function call. * (The underlying pages may experience both FOLL_GET-based and FOLL_PIN-based * calls applied to them, and that's perfectly OK. This is a constraint on the * callers, not on the pages.) * * FOLL_PIN should be set internally by the pin_user_pages*() APIs, never * directly by the caller. That's in order to help avoid mismatches when * releasing pages: get_user_pages*() pages must be released via put_page(), * while pin_user_pages*() pages must be released via unpin_user_page(). * * Please see Documentation/core-api/pin_user_pages.rst for more information. */ enum { /* check pte is writable */ FOLL_WRITE = 1 << 0, /* do get_page on page */ FOLL_GET = 1 << 1, /* give error on hole if it would be zero */ FOLL_DUMP = 1 << 2, /* get_user_pages read/write w/o permission */ FOLL_FORCE = 1 << 3, /* * if a disk transfer is needed, start the IO and return without waiting * upon it */ FOLL_NOWAIT = 1 << 4, /* do not fault in pages */ FOLL_NOFAULT = 1 << 5, /* check page is hwpoisoned */ FOLL_HWPOISON = 1 << 6, /* don't do file mappings */ FOLL_ANON = 1 << 7, /* * FOLL_LONGTERM indicates that the page will be held for an indefinite * time period _often_ under userspace control. This is in contrast to * iov_iter_get_pages(), whose usages are transient. */ FOLL_LONGTERM = 1 << 8, /* split huge pmd before returning */ FOLL_SPLIT_PMD = 1 << 9, /* allow returning PCI P2PDMA pages */ FOLL_PCI_P2PDMA = 1 << 10, /* allow interrupts from generic signals */ FOLL_INTERRUPTIBLE = 1 << 11, /* * Always honor (trigger) NUMA hinting faults. * * FOLL_WRITE implicitly honors NUMA hinting faults because a * PROT_NONE-mapped page is not writable (exceptions with FOLL_FORCE * apply). get_user_pages_fast_only() always implicitly honors NUMA * hinting faults. 
*/ FOLL_HONOR_NUMA_FAULT = 1 << 12, /* See also internal only FOLL flags in mm/internal.h */ }; /* mm flags */ /* * The first two bits represent core dump modes for set-user-ID, * the modes are SUID_DUMP_* defined in linux/sched/coredump.h */ #define MMF_DUMPABLE_BITS 2 #define MMF_DUMPABLE_MASK ((1 << MMF_DUMPABLE_BITS) - 1) /* coredump filter bits */ #define MMF_DUMP_ANON_PRIVATE 2 #define MMF_DUMP_ANON_SHARED 3 #define MMF_DUMP_MAPPED_PRIVATE 4 #define MMF_DUMP_MAPPED_SHARED 5 #define MMF_DUMP_ELF_HEADERS 6 #define MMF_DUMP_HUGETLB_PRIVATE 7 #define MMF_DUMP_HUGETLB_SHARED 8 #define MMF_DUMP_DAX_PRIVATE 9 #define MMF_DUMP_DAX_SHARED 10 #define MMF_DUMP_FILTER_SHIFT MMF_DUMPABLE_BITS #define MMF_DUMP_FILTER_BITS 9 #define MMF_DUMP_FILTER_MASK \ (((1 << MMF_DUMP_FILTER_BITS) - 1) << MMF_DUMP_FILTER_SHIFT) #define MMF_DUMP_FILTER_DEFAULT \ ((1 << MMF_DUMP_ANON_PRIVATE) | (1 << MMF_DUMP_ANON_SHARED) |\ (1 << MMF_DUMP_HUGETLB_PRIVATE) | MMF_DUMP_MASK_DEFAULT_ELF) #ifdef CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS # define MMF_DUMP_MASK_DEFAULT_ELF (1 << MMF_DUMP_ELF_HEADERS) #else # define MMF_DUMP_MASK_DEFAULT_ELF 0 #endif /* leave room for more dump flags */ #define MMF_VM_MERGEABLE 16 /* KSM may merge identical pages */ #define MMF_VM_HUGEPAGE 17 /* set when mm is available for khugepaged */ /* * This one-shot flag is dropped due to necessity of changing exe once again * on NFS restore */ //#define MMF_EXE_FILE_CHANGED 18 /* see prctl_set_mm_exe_file() */ #define MMF_HAS_UPROBES 19 /* has uprobes */ #define MMF_RECALC_UPROBES 20 /* MMF_HAS_UPROBES can be wrong */ #define MMF_OOM_SKIP 21 /* mm is of no interest for the OOM killer */ #define MMF_UNSTABLE 22 /* mm is unstable for copy_from_user */ #define MMF_HUGE_ZERO_PAGE 23 /* mm has ever used the global huge zero page */ #define MMF_DISABLE_THP 24 /* disable THP for all VMAs */ #define MMF_DISABLE_THP_MASK (1 << MMF_DISABLE_THP) #define MMF_OOM_REAP_QUEUED 25 /* mm was queued for oom_reaper */ #define MMF_MULTIPROCESS 26 /* mm is shared between processes */ /* * MMF_HAS_PINNED: Whether this mm has pinned any pages. This can be either * replaced in the future by mm.pinned_vm when it becomes stable, or grow into * a counter on its own. We're aggresive on this bit for now: even if the * pinned pages were unpinned later on, we'll still keep this bit set for the * lifecycle of this mm, just for simplicity. */ #define MMF_HAS_PINNED 27 /* FOLL_PIN has run, never cleared */ #define MMF_HAS_MDWE 28 #define MMF_HAS_MDWE_MASK (1 << MMF_HAS_MDWE) #define MMF_HAS_MDWE_NO_INHERIT 29 #define MMF_VM_MERGE_ANY 30 #define MMF_VM_MERGE_ANY_MASK (1 << MMF_VM_MERGE_ANY) #define MMF_TOPDOWN 31 /* mm searches top down by default */ #define MMF_TOPDOWN_MASK (1 << MMF_TOPDOWN) #define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\ MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK |\ MMF_VM_MERGE_ANY_MASK | MMF_TOPDOWN_MASK) static inline unsigned long mmf_init_flags(unsigned long flags) { if (flags & (1UL << MMF_HAS_MDWE_NO_INHERIT)) flags &= ~((1UL << MMF_HAS_MDWE) | (1UL << MMF_HAS_MDWE_NO_INHERIT)); return flags & MMF_INIT_MASK; } #endif /* _LINUX_MM_TYPES_H */ |
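A minimal sketch of the usage pattern the FOLL_PIN documentation above describes, assuming the pin_user_pages_fast()/unpin_user_pages() helpers declared in <linux/mm.h>; the function name and error handling here are illustrative only, and the exact pin_user_pages*() signatures vary between kernel versions, so treat this as a hedged example rather than a drop-in implementation:

/*
 * Illustrative only: pin a user buffer for long-lived DMA-style access,
 * following the FOLL_PIN rules documented above.
 */
static int example_pin_user_buffer(unsigned long uaddr, int nr_pages,
				   struct page **pages)
{
	int pinned;

	/*
	 * FOLL_PIN itself is set internally by the pin_user_pages*() family;
	 * callers only pass modifiers such as FOLL_WRITE or FOLL_LONGTERM.
	 */
	pinned = pin_user_pages_fast(uaddr, nr_pages,
				     FOLL_WRITE | FOLL_LONGTERM, pages);
	if (pinned < 0)
		return pinned;

	if (pinned != nr_pages) {
		/*
		 * A short pin is still released with unpin_user_pages(),
		 * never with put_page(), per the FOLL_PIN/FOLL_GET contract.
		 */
		unpin_user_pages(pages, pinned);
		return -EFAULT;
	}

	return 0;
}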
| 1 7 6 1 1 1 2 1 1 10 1 1 1 1 1 5 2 1 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 | // SPDX-License-Identifier: GPL-2.0-only /* * File: datagram.c * * Datagram (ISI) Phonet sockets * * Copyright (C) 2008 Nokia Corporation. * * Authors: Sakari Ailus <sakari.ailus@nokia.com> * Rémi Denis-Courmont */ #include <linux/kernel.h> #include <linux/slab.h> #include <linux/socket.h> #include <asm/ioctls.h> #include <net/sock.h> #include <linux/phonet.h> #include <linux/export.h> #include <net/phonet/phonet.h> static int pn_backlog_rcv(struct sock *sk, struct sk_buff *skb); /* associated socket ceases to exist */ static void pn_sock_close(struct sock *sk, long timeout) { sk_common_release(sk); } static int pn_ioctl(struct sock *sk, int cmd, int *karg) { struct sk_buff *skb; switch (cmd) { case SIOCINQ: spin_lock_bh(&sk->sk_receive_queue.lock); skb = skb_peek(&sk->sk_receive_queue); *karg = skb ? skb->len : 0; spin_unlock_bh(&sk->sk_receive_queue.lock); return 0; case SIOCPNADDRESOURCE: case SIOCPNDELRESOURCE: { u32 res = *karg; if (res >= 256) return -EINVAL; if (cmd == SIOCPNADDRESOURCE) return pn_sock_bind_res(sk, res); else return pn_sock_unbind_res(sk, res); } } return -ENOIOCTLCMD; } /* Destroy socket. All references are gone. */ static void pn_destruct(struct sock *sk) { skb_queue_purge(&sk->sk_receive_queue); } static int pn_init(struct sock *sk) { sk->sk_destruct = pn_destruct; return 0; } static int pn_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) { DECLARE_SOCKADDR(struct sockaddr_pn *, target, msg->msg_name); struct sk_buff *skb; int err; if (msg->msg_flags & ~(MSG_DONTWAIT|MSG_EOR|MSG_NOSIGNAL| MSG_CMSG_COMPAT)) return -EOPNOTSUPP; if (target == NULL) return -EDESTADDRREQ; if (msg->msg_namelen < sizeof(struct sockaddr_pn)) return -EINVAL; if (target->spn_family != AF_PHONET) return -EAFNOSUPPORT; skb = sock_alloc_send_skb(sk, MAX_PHONET_HEADER + len, msg->msg_flags & MSG_DONTWAIT, &err); if (skb == NULL) return err; skb_reserve(skb, MAX_PHONET_HEADER); err = memcpy_from_msg((void *)skb_put(skb, len), msg, len); if (err < 0) { kfree_skb(skb); return err; } /* * Fill in the Phonet header and * finally pass the packet forwards. */ err = pn_skb_send(sk, skb, target); /* If ok, return len. */ return (err >= 0) ? len : err; } static int pn_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int flags, int *addr_len) { struct sk_buff *skb = NULL; struct sockaddr_pn sa; int rval = -EOPNOTSUPP; int copylen; if (flags & ~(MSG_PEEK|MSG_TRUNC|MSG_DONTWAIT|MSG_NOSIGNAL| MSG_CMSG_COMPAT)) goto out_nofree; skb = skb_recv_datagram(sk, flags, &rval); if (skb == NULL) goto out_nofree; pn_skb_get_src_sockaddr(skb, &sa); copylen = skb->len; if (len < copylen) { msg->msg_flags |= MSG_TRUNC; copylen = len; } rval = skb_copy_datagram_msg(skb, 0, msg, copylen); if (rval) { rval = -EFAULT; goto out; } rval = (flags & MSG_TRUNC) ? 
skb->len : copylen; if (msg->msg_name != NULL) { __sockaddr_check_size(sizeof(sa)); memcpy(msg->msg_name, &sa, sizeof(sa)); *addr_len = sizeof(sa); } out: skb_free_datagram(sk, skb); out_nofree: return rval; } /* Queue an skb for a sock. */ static int pn_backlog_rcv(struct sock *sk, struct sk_buff *skb) { int err = sock_queue_rcv_skb(sk, skb); if (err < 0) kfree_skb(skb); return err ? NET_RX_DROP : NET_RX_SUCCESS; } /* Module registration */ static struct proto pn_proto = { .close = pn_sock_close, .ioctl = pn_ioctl, .init = pn_init, .sendmsg = pn_sendmsg, .recvmsg = pn_recvmsg, .backlog_rcv = pn_backlog_rcv, .hash = pn_sock_hash, .unhash = pn_sock_unhash, .get_port = pn_sock_get_port, .obj_size = sizeof(struct pn_sock), .owner = THIS_MODULE, .name = "PHONET", }; static const struct phonet_protocol pn_dgram_proto = { .ops = &phonet_dgram_ops, .prot = &pn_proto, .sock_type = SOCK_DGRAM, }; int __init isi_register(void) { return phonet_proto_register(PN_PROTO_PHONET, &pn_dgram_proto); } void __exit isi_unregister(void) { phonet_proto_unregister(PN_PROTO_PHONET, &pn_dgram_proto); } |
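A hedged userspace sketch of how the paths above are exercised: pn_sendmsg() rejects a msg_name that is shorter than struct sockaddr_pn or whose family is not AF_PHONET, and pn_ioctl() answers SIOCINQ with the length of the first queued packet. The device, object and resource values below are placeholders, not real Phonet addresses, and the payload byte is a stand-in for an ISI message:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/ioctl.h>
#include <linux/phonet.h>
#include <linux/sockios.h>

int main(void)
{
	struct sockaddr_pn dst;
	const char payload[1] = { 0x00 };	/* placeholder ISI payload */
	int pending = 0;
	int fd;

	fd = socket(AF_PHONET, SOCK_DGRAM, PN_PROTO_PHONET);
	if (fd < 0) {
		perror("socket(AF_PHONET)");
		return 1;
	}

	memset(&dst, 0, sizeof(dst));
	dst.spn_family = AF_PHONET;
	dst.spn_dev = 0x00;		/* placeholder device address */
	dst.spn_obj = 0x00;		/* placeholder object */
	dst.spn_resource = 0x10;	/* placeholder resource id */

	/* pn_sendmsg() requires the full sockaddr_pn with AF_PHONET set. */
	if (sendto(fd, payload, sizeof(payload), 0,
		   (struct sockaddr *)&dst, sizeof(dst)) < 0)
		perror("sendto");

	/* SIOCINQ is handled by pn_ioctl(): length of the head skb, or 0. */
	if (ioctl(fd, SIOCINQ, &pending) == 0)
		printf("next queued datagram: %d bytes\n", pending);

	close(fd);
	return 0;
}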
| 2 2 2 5 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 | /* * linux/fs/nls/mac-romanian.c * * Charset macromanian translation tables. * Generated automatically from the Unicode and charset * tables from the Unicode Organization (www.unicode.org). * The Unicode to charset table has only exact mappings. */ /* * COPYRIGHT AND PERMISSION NOTICE * * Copyright 1991-2012 Unicode, Inc. All rights reserved. Distributed under * the Terms of Use in http://www.unicode.org/copyright.html. 
* * Permission is hereby granted, free of charge, to any person obtaining a * copy of the Unicode data files and any associated documentation (the "Data * Files") or Unicode software and any associated documentation (the * "Software") to deal in the Data Files or Software without restriction, * including without limitation the rights to use, copy, modify, merge, * publish, distribute, and/or sell copies of the Data Files or Software, and * to permit persons to whom the Data Files or Software are furnished to do * so, provided that (a) the above copyright notice(s) and this permission * notice appear with all copies of the Data Files or Software, (b) both the * above copyright notice(s) and this permission notice appear in associated * documentation, and (c) there is clear notice in each modified Data File or * in the Software as well as in the documentation associated with the Data * File(s) or Software that the data or software has been modified. * * THE DATA FILES AND SOFTWARE ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY * KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF * THIRD PARTY RIGHTS. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS * INCLUDED IN THIS NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT * OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF * USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR * OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR * PERFORMANCE OF THE DATA FILES OR SOFTWARE. * * Except as contained in this notice, the name of a copyright holder shall * not be used in advertising or otherwise to promote the sale, use or other * dealings in these Data Files or Software without prior written * authorization of the copyright holder. 
*/ #include <linux/module.h> #include <linux/kernel.h> #include <linux/string.h> #include <linux/nls.h> #include <linux/errno.h> static const wchar_t charset2uni[256] = { /* 0x00 */ 0x0000, 0x0001, 0x0002, 0x0003, 0x0004, 0x0005, 0x0006, 0x0007, 0x0008, 0x0009, 0x000a, 0x000b, 0x000c, 0x000d, 0x000e, 0x000f, /* 0x10 */ 0x0010, 0x0011, 0x0012, 0x0013, 0x0014, 0x0015, 0x0016, 0x0017, 0x0018, 0x0019, 0x001a, 0x001b, 0x001c, 0x001d, 0x001e, 0x001f, /* 0x20 */ 0x0020, 0x0021, 0x0022, 0x0023, 0x0024, 0x0025, 0x0026, 0x0027, 0x0028, 0x0029, 0x002a, 0x002b, 0x002c, 0x002d, 0x002e, 0x002f, /* 0x30 */ 0x0030, 0x0031, 0x0032, 0x0033, 0x0034, 0x0035, 0x0036, 0x0037, 0x0038, 0x0039, 0x003a, 0x003b, 0x003c, 0x003d, 0x003e, 0x003f, /* 0x40 */ 0x0040, 0x0041, 0x0042, 0x0043, 0x0044, 0x0045, 0x0046, 0x0047, 0x0048, 0x0049, 0x004a, 0x004b, 0x004c, 0x004d, 0x004e, 0x004f, /* 0x50 */ 0x0050, 0x0051, 0x0052, 0x0053, 0x0054, 0x0055, 0x0056, 0x0057, 0x0058, 0x0059, 0x005a, 0x005b, 0x005c, 0x005d, 0x005e, 0x005f, /* 0x60 */ 0x0060, 0x0061, 0x0062, 0x0063, 0x0064, 0x0065, 0x0066, 0x0067, 0x0068, 0x0069, 0x006a, 0x006b, 0x006c, 0x006d, 0x006e, 0x006f, /* 0x70 */ 0x0070, 0x0071, 0x0072, 0x0073, 0x0074, 0x0075, 0x0076, 0x0077, 0x0078, 0x0079, 0x007a, 0x007b, 0x007c, 0x007d, 0x007e, 0x007f, /* 0x80 */ 0x00c4, 0x00c5, 0x00c7, 0x00c9, 0x00d1, 0x00d6, 0x00dc, 0x00e1, 0x00e0, 0x00e2, 0x00e4, 0x00e3, 0x00e5, 0x00e7, 0x00e9, 0x00e8, /* 0x90 */ 0x00ea, 0x00eb, 0x00ed, 0x00ec, 0x00ee, 0x00ef, 0x00f1, 0x00f3, 0x00f2, 0x00f4, 0x00f6, 0x00f5, 0x00fa, 0x00f9, 0x00fb, 0x00fc, /* 0xa0 */ 0x2020, 0x00b0, 0x00a2, 0x00a3, 0x00a7, 0x2022, 0x00b6, 0x00df, 0x00ae, 0x00a9, 0x2122, 0x00b4, 0x00a8, 0x2260, 0x0102, 0x0218, /* 0xb0 */ 0x221e, 0x00b1, 0x2264, 0x2265, 0x00a5, 0x00b5, 0x2202, 0x2211, 0x220f, 0x03c0, 0x222b, 0x00aa, 0x00ba, 0x03a9, 0x0103, 0x0219, /* 0xc0 */ 0x00bf, 0x00a1, 0x00ac, 0x221a, 0x0192, 0x2248, 0x2206, 0x00ab, 0x00bb, 0x2026, 0x00a0, 0x00c0, 0x00c3, 0x00d5, 0x0152, 0x0153, /* 0xd0 */ 0x2013, 0x2014, 0x201c, 0x201d, 0x2018, 0x2019, 0x00f7, 0x25ca, 0x00ff, 0x0178, 0x2044, 0x20ac, 0x2039, 0x203a, 0x021a, 0x021b, /* 0xe0 */ 0x2021, 0x00b7, 0x201a, 0x201e, 0x2030, 0x00c2, 0x00ca, 0x00c1, 0x00cb, 0x00c8, 0x00cd, 0x00ce, 0x00cf, 0x00cc, 0x00d3, 0x00d4, /* 0xf0 */ 0xf8ff, 0x00d2, 0x00da, 0x00db, 0x00d9, 0x0131, 0x02c6, 0x02dc, 0x00af, 0x02d8, 0x02d9, 0x02da, 0x00b8, 0x02dd, 0x02db, 0x02c7, }; static const unsigned char page00[256] = { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, /* 0x00-0x07 */ 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f, /* 0x08-0x0f */ 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, /* 0x10-0x17 */ 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f, /* 0x18-0x1f */ 0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27, /* 0x20-0x27 */ 0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f, /* 0x28-0x2f */ 0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, /* 0x30-0x37 */ 0x38, 0x39, 0x3a, 0x3b, 0x3c, 0x3d, 0x3e, 0x3f, /* 0x38-0x3f */ 0x40, 0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47, /* 0x40-0x47 */ 0x48, 0x49, 0x4a, 0x4b, 0x4c, 0x4d, 0x4e, 0x4f, /* 0x48-0x4f */ 0x50, 0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57, /* 0x50-0x57 */ 0x58, 0x59, 0x5a, 0x5b, 0x5c, 0x5d, 0x5e, 0x5f, /* 0x58-0x5f */ 0x60, 0x61, 0x62, 0x63, 0x64, 0x65, 0x66, 0x67, /* 0x60-0x67 */ 0x68, 0x69, 0x6a, 0x6b, 0x6c, 0x6d, 0x6e, 0x6f, /* 0x68-0x6f */ 0x70, 0x71, 0x72, 0x73, 0x74, 0x75, 0x76, 0x77, /* 0x70-0x77 */ 0x78, 0x79, 0x7a, 0x7b, 0x7c, 0x7d, 0x7e, 0x7f, /* 0x78-0x7f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x80-0x87 */ 0x00, 0x00, 0x00, 0x00, 
0x00, 0x00, 0x00, 0x00, /* 0x88-0x8f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x90-0x97 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x98-0x9f */ 0xca, 0xc1, 0xa2, 0xa3, 0x00, 0xb4, 0x00, 0xa4, /* 0xa0-0xa7 */ 0xac, 0xa9, 0xbb, 0xc7, 0xc2, 0x00, 0xa8, 0xf8, /* 0xa8-0xaf */ 0xa1, 0xb1, 0x00, 0x00, 0xab, 0xb5, 0xa6, 0xe1, /* 0xb0-0xb7 */ 0xfc, 0x00, 0xbc, 0xc8, 0x00, 0x00, 0x00, 0xc0, /* 0xb8-0xbf */ 0xcb, 0xe7, 0xe5, 0xcc, 0x80, 0x81, 0x00, 0x82, /* 0xc0-0xc7 */ 0xe9, 0x83, 0xe6, 0xe8, 0xed, 0xea, 0xeb, 0xec, /* 0xc8-0xcf */ 0x00, 0x84, 0xf1, 0xee, 0xef, 0xcd, 0x85, 0x00, /* 0xd0-0xd7 */ 0x00, 0xf4, 0xf2, 0xf3, 0x86, 0x00, 0x00, 0xa7, /* 0xd8-0xdf */ 0x88, 0x87, 0x89, 0x8b, 0x8a, 0x8c, 0x00, 0x8d, /* 0xe0-0xe7 */ 0x8f, 0x8e, 0x90, 0x91, 0x93, 0x92, 0x94, 0x95, /* 0xe8-0xef */ 0x00, 0x96, 0x98, 0x97, 0x99, 0x9b, 0x9a, 0xd6, /* 0xf0-0xf7 */ 0x00, 0x9d, 0x9c, 0x9e, 0x9f, 0x00, 0x00, 0xd8, /* 0xf8-0xff */ }; static const unsigned char page01[256] = { 0x00, 0x00, 0xae, 0xbe, 0x00, 0x00, 0x00, 0x00, /* 0x00-0x07 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x08-0x0f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x10-0x17 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x18-0x1f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x20-0x27 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x28-0x2f */ 0x00, 0xf5, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x30-0x37 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x38-0x3f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x40-0x47 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x48-0x4f */ 0x00, 0x00, 0xce, 0xcf, 0x00, 0x00, 0x00, 0x00, /* 0x50-0x57 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x58-0x5f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x60-0x67 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x68-0x6f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x70-0x77 */ 0xd9, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x78-0x7f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x80-0x87 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x88-0x8f */ 0x00, 0x00, 0xc4, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x90-0x97 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x98-0x9f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xa0-0xa7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xa8-0xaf */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xb0-0xb7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xb8-0xbf */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xc0-0xc7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xc8-0xcf */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xd0-0xd7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xd8-0xdf */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xe0-0xe7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xe8-0xef */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xf0-0xf7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xf8-0xff */ }; static const unsigned char page02[256] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x00-0x07 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x08-0x0f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x10-0x17 */ 0xaf, 0xbf, 0xde, 0xdf, 0x00, 0x00, 0x00, 0x00, /* 0x18-0x1f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x20-0x27 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x28-0x2f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x30-0x37 */ 0x00, 0x00, 0x00, 0x00, 0x00, 
0x00, 0x00, 0x00, /* 0x38-0x3f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x40-0x47 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x48-0x4f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x50-0x57 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x58-0x5f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x60-0x67 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x68-0x6f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x70-0x77 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x78-0x7f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x80-0x87 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x88-0x8f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x90-0x97 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x98-0x9f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xa0-0xa7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xa8-0xaf */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xb0-0xb7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xb8-0xbf */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf6, 0xff, /* 0xc0-0xc7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xc8-0xcf */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xd0-0xd7 */ 0xf9, 0xfa, 0xfb, 0xfe, 0xf7, 0xfd, 0x00, 0x00, /* 0xd8-0xdf */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xe0-0xe7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xe8-0xef */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xf0-0xf7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xf8-0xff */ }; static const unsigned char page03[256] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x00-0x07 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x08-0x0f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x10-0x17 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x18-0x1f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x20-0x27 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x28-0x2f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x30-0x37 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x38-0x3f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x40-0x47 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x48-0x4f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x50-0x57 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x58-0x5f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x60-0x67 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x68-0x6f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x70-0x77 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x78-0x7f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x80-0x87 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x88-0x8f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x90-0x97 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x98-0x9f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xa0-0xa7 */ 0x00, 0xbd, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xa8-0xaf */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xb0-0xb7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xb8-0xbf */ 0xb9, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xc0-0xc7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xc8-0xcf */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xd0-0xd7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xd8-0xdf */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xe0-0xe7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xe8-0xef */ 0x00, 0x00, 0x00, 
0x00, 0x00, 0x00, 0x00, 0x00, /* 0xf0-0xf7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xf8-0xff */ }; static const unsigned char page20[256] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x00-0x07 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x08-0x0f */ 0x00, 0x00, 0x00, 0xd0, 0xd1, 0x00, 0x00, 0x00, /* 0x10-0x17 */ 0xd4, 0xd5, 0xe2, 0x00, 0xd2, 0xd3, 0xe3, 0x00, /* 0x18-0x1f */ 0xa0, 0xe0, 0xa5, 0x00, 0x00, 0x00, 0xc9, 0x00, /* 0x20-0x27 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x28-0x2f */ 0xe4, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x30-0x37 */ 0x00, 0xdc, 0xdd, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x38-0x3f */ 0x00, 0x00, 0x00, 0x00, 0xda, 0x00, 0x00, 0x00, /* 0x40-0x47 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x48-0x4f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x50-0x57 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x58-0x5f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x60-0x67 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x68-0x6f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x70-0x77 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x78-0x7f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x80-0x87 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x88-0x8f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x90-0x97 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x98-0x9f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xa0-0xa7 */ 0x00, 0x00, 0x00, 0x00, 0xdb, 0x00, 0x00, 0x00, /* 0xa8-0xaf */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xb0-0xb7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xb8-0xbf */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xc0-0xc7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xc8-0xcf */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xd0-0xd7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xd8-0xdf */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xe0-0xe7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xe8-0xef */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xf0-0xf7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xf8-0xff */ }; static const unsigned char page21[256] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x00-0x07 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x08-0x0f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x10-0x17 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x18-0x1f */ 0x00, 0x00, 0xaa, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x20-0x27 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x28-0x2f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x30-0x37 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x38-0x3f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x40-0x47 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x48-0x4f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x50-0x57 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x58-0x5f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x60-0x67 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x68-0x6f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x70-0x77 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x78-0x7f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x80-0x87 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x88-0x8f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x90-0x97 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x98-0x9f */ 0x00, 0x00, 0x00, 0x00, 
0x00, 0x00, 0x00, 0x00, /* 0xa0-0xa7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xa8-0xaf */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xb0-0xb7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xb8-0xbf */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xc0-0xc7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xc8-0xcf */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xd0-0xd7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xd8-0xdf */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xe0-0xe7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xe8-0xef */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xf0-0xf7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xf8-0xff */ }; static const unsigned char page22[256] = { 0x00, 0x00, 0xb6, 0x00, 0x00, 0x00, 0xc6, 0x00, /* 0x00-0x07 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xb8, /* 0x08-0x0f */ 0x00, 0xb7, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x10-0x17 */ 0x00, 0x00, 0xc3, 0x00, 0x00, 0x00, 0xb0, 0x00, /* 0x18-0x1f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x20-0x27 */ 0x00, 0x00, 0x00, 0xba, 0x00, 0x00, 0x00, 0x00, /* 0x28-0x2f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x30-0x37 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x38-0x3f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x40-0x47 */ 0xc5, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x48-0x4f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x50-0x57 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x58-0x5f */ 0xad, 0x00, 0x00, 0x00, 0xb2, 0xb3, 0x00, 0x00, /* 0x60-0x67 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x68-0x6f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x70-0x77 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x78-0x7f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x80-0x87 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x88-0x8f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x90-0x97 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x98-0x9f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xa0-0xa7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xa8-0xaf */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xb0-0xb7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xb8-0xbf */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xc0-0xc7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xc8-0xcf */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xd0-0xd7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xd8-0xdf */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xe0-0xe7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xe8-0xef */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xf0-0xf7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xf8-0xff */ }; static const unsigned char page25[256] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x00-0x07 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x08-0x0f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x10-0x17 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x18-0x1f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x20-0x27 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x28-0x2f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x30-0x37 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x38-0x3f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x40-0x47 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x48-0x4f */ 0x00, 0x00, 0x00, 0x00, 0x00, 
0x00, 0x00, 0x00, /* 0x50-0x57 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x58-0x5f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x60-0x67 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x68-0x6f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x70-0x77 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x78-0x7f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x80-0x87 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x88-0x8f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x90-0x97 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x98-0x9f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xa0-0xa7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xa8-0xaf */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xb0-0xb7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xb8-0xbf */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xc0-0xc7 */ 0x00, 0x00, 0xd7, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xc8-0xcf */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xd0-0xd7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xd8-0xdf */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xe0-0xe7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xe8-0xef */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xf0-0xf7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xf8-0xff */ }; static const unsigned char pagef8[256] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x00-0x07 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x08-0x0f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x10-0x17 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x18-0x1f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x20-0x27 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x28-0x2f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x30-0x37 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x38-0x3f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x40-0x47 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x48-0x4f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x50-0x57 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x58-0x5f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x60-0x67 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x68-0x6f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x70-0x77 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x78-0x7f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x80-0x87 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x88-0x8f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x90-0x97 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x98-0x9f */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xa0-0xa7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xa8-0xaf */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xb0-0xb7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xb8-0xbf */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xc0-0xc7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xc8-0xcf */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xd0-0xd7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xd8-0xdf */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xe0-0xe7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xe8-0xef */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xf0-0xf7 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, /* 0xf8-0xff */ }; static const unsigned char *const page_uni2charset[256] = { page00, page01, 
page02, page03, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, page20, page21, page22, NULL, NULL, page25, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, pagef8, NULL, NULL, NULL, NULL, NULL, NULL, NULL, }; static const unsigned char charset2lower[256] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x00-0x07 */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x08-0x0f */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x10-0x17 */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x18-0x1f */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x20-0x27 */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x28-0x2f */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x30-0x37 */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x38-0x3f */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x40-0x47 */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x48-0x4f */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x50-0x57 */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x58-0x5f */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x60-0x67 */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x68-0x6f */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x70-0x77 */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x78-0x7f */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x80-0x87 */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x88-0x8f */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x90-0x97 */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x98-0x9f */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0xa0-0xa7 */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0xa8-0xaf */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0xb0-0xb7 */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0xb8-0xbf */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0xc0-0xc7 */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0xc8-0xcf */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0xd0-0xd7 */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0xd8-0xdf */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0xe0-0xe7 */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0xe8-0xef */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 
0xff, /* 0xf0-0xf7 */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0xf8-0xff */ }; static const unsigned char charset2upper[256] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x00-0x07 */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x08-0x0f */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x10-0x17 */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x18-0x1f */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x20-0x27 */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x28-0x2f */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x30-0x37 */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x38-0x3f */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x40-0x47 */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x48-0x4f */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x50-0x57 */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x58-0x5f */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x60-0x67 */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x68-0x6f */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x70-0x77 */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x78-0x7f */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x80-0x87 */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x88-0x8f */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x90-0x97 */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0x98-0x9f */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0xa0-0xa7 */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0xa8-0xaf */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0xb0-0xb7 */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0xb8-0xbf */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0xc0-0xc7 */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0xc8-0xcf */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0xd0-0xd7 */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0xd8-0xdf */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0xe0-0xe7 */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0xe8-0xef */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0xf0-0xf7 */ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* 0xf8-0xff */ }; static int uni2char(wchar_t uni, unsigned char *out, int boundlen) { const unsigned char *uni2charset; unsigned char cl = uni & 0x00ff; unsigned char ch = (uni & 0xff00) >> 8; if (boundlen <= 0) return -ENAMETOOLONG; uni2charset = page_uni2charset[ch]; if (uni2charset && uni2charset[cl]) out[0] = uni2charset[cl]; else return -EINVAL; return 1; } static int char2uni(const unsigned char *rawstring, int boundlen, wchar_t *uni) { *uni = charset2uni[*rawstring]; if (*uni == 0x0000) return -EINVAL; return 1; } static struct nls_table table = { .charset = "macromanian", .uni2char = uni2char, .char2uni = char2uni, .charset2lower = charset2lower, .charset2upper = charset2upper, }; static int __init init_nls_macromanian(void) { return register_nls(&table); } static void __exit exit_nls_macromanian(void) { unregister_nls(&table); } module_init(init_nls_macromanian) module_exit(exit_nls_macromanian) MODULE_DESCRIPTION("NLS Codepage macromanian"); MODULE_LICENSE("Dual BSD/GPL"); |
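A small kernel-side sketch, assuming only the generic <linux/nls.h> interface, of how a filesystem would drive this table once it is registered: U+0102 (A with breve) resolves through page_uni2charset[0x01] (page01[0x02]) to byte 0xae, and charset2uni[0xae] maps it straight back. The function name below is illustrative, not part of the NLS API:

/* Illustrative only: round-trip one character through the table above. */
static int example_macromanian_roundtrip(void)
{
	struct nls_table *nls;
	unsigned char c;
	wchar_t uni;
	int ret;

	nls = load_nls("macromanian");
	if (!nls)
		return -EINVAL;

	/* U+0102 maps to byte 0xae via page01[0x02]. */
	ret = nls->uni2char(0x0102, &c, 1);
	if (ret == 1)
		/* ...and charset2uni[0xae] recovers U+0102. */
		ret = nls->char2uni(&c, 1, &uni);

	unload_nls(nls);
	return ret < 0 ? ret : 0;
}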
| 32 128 46 4 108 18 18 19 19 18 18 87 31 27 30 45 46 25 25 10 11 11 19 9 3 3 1 3 3 1 3 3 3 3 3 3 3 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 | // SPDX-License-Identifier: GPL-2.0-or-later /* * xfrm algorithm interface * * Copyright (c) 2002 James Morris 
<jmorris@intercode.com.au> */ #include <crypto/aead.h> #include <crypto/hash.h> #include <crypto/skcipher.h> #include <linux/module.h> #include <linux/kernel.h> #include <linux/pfkeyv2.h> #include <linux/crypto.h> #include <linux/scatterlist.h> #include <net/xfrm.h> #if IS_ENABLED(CONFIG_INET_ESP) || IS_ENABLED(CONFIG_INET6_ESP) #include <net/esp.h> #endif /* * Algorithms supported by IPsec. These entries contain properties which * are used in key negotiation and xfrm processing, and are used to verify * that instantiated crypto transforms have correct parameters for IPsec * purposes. */ static struct xfrm_algo_desc aead_list[] = { { .name = "rfc4106(gcm(aes))", .uinfo = { .aead = { .geniv = "seqiv", .icv_truncbits = 64, } }, .pfkey_supported = 1, .desc = { .sadb_alg_id = SADB_X_EALG_AES_GCM_ICV8, .sadb_alg_ivlen = 8, .sadb_alg_minbits = 128, .sadb_alg_maxbits = 256 } }, { .name = "rfc4106(gcm(aes))", .uinfo = { .aead = { .geniv = "seqiv", .icv_truncbits = 96, } }, .pfkey_supported = 1, .desc = { .sadb_alg_id = SADB_X_EALG_AES_GCM_ICV12, .sadb_alg_ivlen = 8, .sadb_alg_minbits = 128, .sadb_alg_maxbits = 256 } }, { .name = "rfc4106(gcm(aes))", .uinfo = { .aead = { .geniv = "seqiv", .icv_truncbits = 128, } }, .pfkey_supported = 1, .desc = { .sadb_alg_id = SADB_X_EALG_AES_GCM_ICV16, .sadb_alg_ivlen = 8, .sadb_alg_minbits = 128, .sadb_alg_maxbits = 256 } }, { .name = "rfc4309(ccm(aes))", .uinfo = { .aead = { .geniv = "seqiv", .icv_truncbits = 64, } }, .pfkey_supported = 1, .desc = { .sadb_alg_id = SADB_X_EALG_AES_CCM_ICV8, .sadb_alg_ivlen = 8, .sadb_alg_minbits = 128, .sadb_alg_maxbits = 256 } }, { .name = "rfc4309(ccm(aes))", .uinfo = { .aead = { .geniv = "seqiv", .icv_truncbits = 96, } }, .pfkey_supported = 1, .desc = { .sadb_alg_id = SADB_X_EALG_AES_CCM_ICV12, .sadb_alg_ivlen = 8, .sadb_alg_minbits = 128, .sadb_alg_maxbits = 256 } }, { .name = "rfc4309(ccm(aes))", .uinfo = { .aead = { .geniv = "seqiv", .icv_truncbits = 128, } }, .pfkey_supported = 1, .desc = { .sadb_alg_id = SADB_X_EALG_AES_CCM_ICV16, .sadb_alg_ivlen = 8, .sadb_alg_minbits = 128, .sadb_alg_maxbits = 256 } }, { .name = "rfc4543(gcm(aes))", .uinfo = { .aead = { .geniv = "seqiv", .icv_truncbits = 128, } }, .pfkey_supported = 1, .desc = { .sadb_alg_id = SADB_X_EALG_NULL_AES_GMAC, .sadb_alg_ivlen = 8, .sadb_alg_minbits = 128, .sadb_alg_maxbits = 256 } }, { .name = "rfc7539esp(chacha20,poly1305)", .uinfo = { .aead = { .geniv = "seqiv", .icv_truncbits = 128, } }, .pfkey_supported = 0, }, }; static struct xfrm_algo_desc aalg_list[] = { { .name = "digest_null", .uinfo = { .auth = { .icv_truncbits = 0, .icv_fullbits = 0, } }, .pfkey_supported = 1, .desc = { .sadb_alg_id = SADB_X_AALG_NULL, .sadb_alg_ivlen = 0, .sadb_alg_minbits = 0, .sadb_alg_maxbits = 0 } }, { .name = "hmac(md5)", .compat = "md5", .uinfo = { .auth = { .icv_truncbits = 96, .icv_fullbits = 128, } }, .pfkey_supported = 1, .desc = { .sadb_alg_id = SADB_AALG_MD5HMAC, .sadb_alg_ivlen = 0, .sadb_alg_minbits = 128, .sadb_alg_maxbits = 128 } }, { .name = "hmac(sha1)", .compat = "sha1", .uinfo = { .auth = { .icv_truncbits = 96, .icv_fullbits = 160, } }, .pfkey_supported = 1, .desc = { .sadb_alg_id = SADB_AALG_SHA1HMAC, .sadb_alg_ivlen = 0, .sadb_alg_minbits = 160, .sadb_alg_maxbits = 160 } }, { .name = "hmac(sha256)", .compat = "sha256", .uinfo = { .auth = { .icv_truncbits = 96, .icv_fullbits = 256, } }, .pfkey_supported = 1, .desc = { .sadb_alg_id = SADB_X_AALG_SHA2_256HMAC, .sadb_alg_ivlen = 0, .sadb_alg_minbits = 256, .sadb_alg_maxbits = 256 } }, { .name = "hmac(sha384)", 
.uinfo = { .auth = { .icv_truncbits = 192, .icv_fullbits = 384, } }, .pfkey_supported = 1, .desc = { .sadb_alg_id = SADB_X_AALG_SHA2_384HMAC, .sadb_alg_ivlen = 0, .sadb_alg_minbits = 384, .sadb_alg_maxbits = 384 } }, { .name = "hmac(sha512)", .uinfo = { .auth = { .icv_truncbits = 256, .icv_fullbits = 512, } }, .pfkey_supported = 1, .desc = { .sadb_alg_id = SADB_X_AALG_SHA2_512HMAC, .sadb_alg_ivlen = 0, .sadb_alg_minbits = 512, .sadb_alg_maxbits = 512 } }, { .name = "hmac(rmd160)", .compat = "rmd160", .uinfo = { .auth = { .icv_truncbits = 96, .icv_fullbits = 160, } }, .pfkey_supported = 1, .desc = { .sadb_alg_id = SADB_X_AALG_RIPEMD160HMAC, .sadb_alg_ivlen = 0, .sadb_alg_minbits = 160, .sadb_alg_maxbits = 160 } }, { .name = "xcbc(aes)", .uinfo = { .auth = { .icv_truncbits = 96, .icv_fullbits = 128, } }, .pfkey_supported = 1, .desc = { .sadb_alg_id = SADB_X_AALG_AES_XCBC_MAC, .sadb_alg_ivlen = 0, .sadb_alg_minbits = 128, .sadb_alg_maxbits = 128 } }, { /* rfc4494 */ .name = "cmac(aes)", .uinfo = { .auth = { .icv_truncbits = 96, .icv_fullbits = 128, } }, .pfkey_supported = 0, }, { .name = "hmac(sm3)", .compat = "sm3", .uinfo = { .auth = { .icv_truncbits = 256, .icv_fullbits = 256, } }, .pfkey_supported = 1, .desc = { .sadb_alg_id = SADB_X_AALG_SM3_256HMAC, .sadb_alg_ivlen = 0, .sadb_alg_minbits = 256, .sadb_alg_maxbits = 256 } }, }; static struct xfrm_algo_desc ealg_list[] = { { .name = "ecb(cipher_null)", .compat = "cipher_null", .uinfo = { .encr = { .blockbits = 8, .defkeybits = 0, } }, .pfkey_supported = 1, .desc = { .sadb_alg_id = SADB_EALG_NULL, .sadb_alg_ivlen = 0, .sadb_alg_minbits = 0, .sadb_alg_maxbits = 0 } }, { .name = "cbc(des)", .compat = "des", .uinfo = { .encr = { .geniv = "echainiv", .blockbits = 64, .defkeybits = 64, } }, .pfkey_supported = 1, .desc = { .sadb_alg_id = SADB_EALG_DESCBC, .sadb_alg_ivlen = 8, .sadb_alg_minbits = 64, .sadb_alg_maxbits = 64 } }, { .name = "cbc(des3_ede)", .compat = "des3_ede", .uinfo = { .encr = { .geniv = "echainiv", .blockbits = 64, .defkeybits = 192, } }, .pfkey_supported = 1, .desc = { .sadb_alg_id = SADB_EALG_3DESCBC, .sadb_alg_ivlen = 8, .sadb_alg_minbits = 192, .sadb_alg_maxbits = 192 } }, { .name = "cbc(cast5)", .compat = "cast5", .uinfo = { .encr = { .geniv = "echainiv", .blockbits = 64, .defkeybits = 128, } }, .pfkey_supported = 1, .desc = { .sadb_alg_id = SADB_X_EALG_CASTCBC, .sadb_alg_ivlen = 8, .sadb_alg_minbits = 40, .sadb_alg_maxbits = 128 } }, { .name = "cbc(blowfish)", .compat = "blowfish", .uinfo = { .encr = { .geniv = "echainiv", .blockbits = 64, .defkeybits = 128, } }, .pfkey_supported = 1, .desc = { .sadb_alg_id = SADB_X_EALG_BLOWFISHCBC, .sadb_alg_ivlen = 8, .sadb_alg_minbits = 40, .sadb_alg_maxbits = 448 } }, { .name = "cbc(aes)", .compat = "aes", .uinfo = { .encr = { .geniv = "echainiv", .blockbits = 128, .defkeybits = 128, } }, .pfkey_supported = 1, .desc = { .sadb_alg_id = SADB_X_EALG_AESCBC, .sadb_alg_ivlen = 8, .sadb_alg_minbits = 128, .sadb_alg_maxbits = 256 } }, { .name = "cbc(serpent)", .compat = "serpent", .uinfo = { .encr = { .geniv = "echainiv", .blockbits = 128, .defkeybits = 128, } }, .pfkey_supported = 1, .desc = { .sadb_alg_id = SADB_X_EALG_SERPENTCBC, .sadb_alg_ivlen = 8, .sadb_alg_minbits = 128, .sadb_alg_maxbits = 256, } }, { .name = "cbc(camellia)", .compat = "camellia", .uinfo = { .encr = { .geniv = "echainiv", .blockbits = 128, .defkeybits = 128, } }, .pfkey_supported = 1, .desc = { .sadb_alg_id = SADB_X_EALG_CAMELLIACBC, .sadb_alg_ivlen = 8, .sadb_alg_minbits = 128, .sadb_alg_maxbits = 256 } }, { .name = 
"cbc(twofish)", .compat = "twofish", .uinfo = { .encr = { .geniv = "echainiv", .blockbits = 128, .defkeybits = 128, } }, .pfkey_supported = 1, .desc = { .sadb_alg_id = SADB_X_EALG_TWOFISHCBC, .sadb_alg_ivlen = 8, .sadb_alg_minbits = 128, .sadb_alg_maxbits = 256 } }, { .name = "rfc3686(ctr(aes))", .uinfo = { .encr = { .geniv = "seqiv", .blockbits = 128, .defkeybits = 160, /* 128-bit key + 32-bit nonce */ } }, .pfkey_supported = 1, .desc = { .sadb_alg_id = SADB_X_EALG_AESCTR, .sadb_alg_ivlen = 8, .sadb_alg_minbits = 160, .sadb_alg_maxbits = 288 } }, { .name = "cbc(sm4)", .compat = "sm4", .uinfo = { .encr = { .geniv = "echainiv", .blockbits = 128, .defkeybits = 128, } }, .pfkey_supported = 1, .desc = { .sadb_alg_id = SADB_X_EALG_SM4CBC, .sadb_alg_ivlen = 16, .sadb_alg_minbits = 128, .sadb_alg_maxbits = 256 } }, }; static struct xfrm_algo_desc calg_list[] = { { .name = "deflate", .uinfo = { .comp = { .threshold = 90, } }, .pfkey_supported = 1, .desc = { .sadb_alg_id = SADB_X_CALG_DEFLATE } }, { .name = "lzs", .uinfo = { .comp = { .threshold = 90, } }, .pfkey_supported = 1, .desc = { .sadb_alg_id = SADB_X_CALG_LZS } }, { .name = "lzjh", .uinfo = { .comp = { .threshold = 50, } }, .pfkey_supported = 1, .desc = { .sadb_alg_id = SADB_X_CALG_LZJH } }, }; static inline int aalg_entries(void) { return ARRAY_SIZE(aalg_list); } static inline int ealg_entries(void) { return ARRAY_SIZE(ealg_list); } static inline int calg_entries(void) { return ARRAY_SIZE(calg_list); } struct xfrm_algo_list { int (*find)(const char *name, u32 type, u32 mask); struct xfrm_algo_desc *algs; int entries; }; static const struct xfrm_algo_list xfrm_aead_list = { .find = crypto_has_aead, .algs = aead_list, .entries = ARRAY_SIZE(aead_list), }; static const struct xfrm_algo_list xfrm_aalg_list = { .find = crypto_has_ahash, .algs = aalg_list, .entries = ARRAY_SIZE(aalg_list), }; static const struct xfrm_algo_list xfrm_ealg_list = { .find = crypto_has_skcipher, .algs = ealg_list, .entries = ARRAY_SIZE(ealg_list), }; static const struct xfrm_algo_list xfrm_calg_list = { .find = crypto_has_comp, .algs = calg_list, .entries = ARRAY_SIZE(calg_list), }; static struct xfrm_algo_desc *xfrm_find_algo( const struct xfrm_algo_list *algo_list, int match(const struct xfrm_algo_desc *entry, const void *data), const void *data, int probe) { struct xfrm_algo_desc *list = algo_list->algs; int i, status; for (i = 0; i < algo_list->entries; i++) { if (!match(list + i, data)) continue; if (list[i].available) return &list[i]; if (!probe) break; status = algo_list->find(list[i].name, 0, 0); if (!status) break; list[i].available = status; return &list[i]; } return NULL; } static int xfrm_alg_id_match(const struct xfrm_algo_desc *entry, const void *data) { return entry->desc.sadb_alg_id == (unsigned long)data; } struct xfrm_algo_desc *xfrm_aalg_get_byid(int alg_id) { return xfrm_find_algo(&xfrm_aalg_list, xfrm_alg_id_match, (void *)(unsigned long)alg_id, 1); } EXPORT_SYMBOL_GPL(xfrm_aalg_get_byid); struct xfrm_algo_desc *xfrm_ealg_get_byid(int alg_id) { return xfrm_find_algo(&xfrm_ealg_list, xfrm_alg_id_match, (void *)(unsigned long)alg_id, 1); } EXPORT_SYMBOL_GPL(xfrm_ealg_get_byid); struct xfrm_algo_desc *xfrm_calg_get_byid(int alg_id) { return xfrm_find_algo(&xfrm_calg_list, xfrm_alg_id_match, (void *)(unsigned long)alg_id, 1); } EXPORT_SYMBOL_GPL(xfrm_calg_get_byid); static int xfrm_alg_name_match(const struct xfrm_algo_desc *entry, const void *data) { const char *name = data; return name && (!strcmp(name, entry->name) || (entry->compat && 
!strcmp(name, entry->compat))); } struct xfrm_algo_desc *xfrm_aalg_get_byname(const char *name, int probe) { return xfrm_find_algo(&xfrm_aalg_list, xfrm_alg_name_match, name, probe); } EXPORT_SYMBOL_GPL(xfrm_aalg_get_byname); struct xfrm_algo_desc *xfrm_ealg_get_byname(const char *name, int probe) { return xfrm_find_algo(&xfrm_ealg_list, xfrm_alg_name_match, name, probe); } EXPORT_SYMBOL_GPL(xfrm_ealg_get_byname); struct xfrm_algo_desc *xfrm_calg_get_byname(const char *name, int probe) { return xfrm_find_algo(&xfrm_calg_list, xfrm_alg_name_match, name, probe); } EXPORT_SYMBOL_GPL(xfrm_calg_get_byname); struct xfrm_aead_name { const char *name; int icvbits; }; static int xfrm_aead_name_match(const struct xfrm_algo_desc *entry, const void *data) { const struct xfrm_aead_name *aead = data; const char *name = aead->name; return aead->icvbits == entry->uinfo.aead.icv_truncbits && name && !strcmp(name, entry->name); } struct xfrm_algo_desc *xfrm_aead_get_byname(const char *name, int icv_len, int probe) { struct xfrm_aead_name data = { .name = name, .icvbits = icv_len, }; return xfrm_find_algo(&xfrm_aead_list, xfrm_aead_name_match, &data, probe); } EXPORT_SYMBOL_GPL(xfrm_aead_get_byname); struct xfrm_algo_desc *xfrm_aalg_get_byidx(unsigned int idx) { if (idx >= aalg_entries()) return NULL; return &aalg_list[idx]; } EXPORT_SYMBOL_GPL(xfrm_aalg_get_byidx); struct xfrm_algo_desc *xfrm_ealg_get_byidx(unsigned int idx) { if (idx >= ealg_entries()) return NULL; return &ealg_list[idx]; } EXPORT_SYMBOL_GPL(xfrm_ealg_get_byidx); /* * Probe for the availability of crypto algorithms, and set the available * flag for any algorithms found on the system. This is typically called by * pfkey during userspace SA add, update or register. */ void xfrm_probe_algs(void) { int i, status; BUG_ON(in_softirq()); for (i = 0; i < aalg_entries(); i++) { status = crypto_has_ahash(aalg_list[i].name, 0, 0); if (aalg_list[i].available != status) aalg_list[i].available = status; } for (i = 0; i < ealg_entries(); i++) { status = crypto_has_skcipher(ealg_list[i].name, 0, 0); if (ealg_list[i].available != status) ealg_list[i].available = status; } for (i = 0; i < calg_entries(); i++) { status = crypto_has_comp(calg_list[i].name, 0, CRYPTO_ALG_ASYNC); if (calg_list[i].available != status) calg_list[i].available = status; } } EXPORT_SYMBOL_GPL(xfrm_probe_algs); int xfrm_count_pfkey_auth_supported(void) { int i, n; for (i = 0, n = 0; i < aalg_entries(); i++) if (aalg_list[i].available && aalg_list[i].pfkey_supported) n++; return n; } EXPORT_SYMBOL_GPL(xfrm_count_pfkey_auth_supported); int xfrm_count_pfkey_enc_supported(void) { int i, n; for (i = 0, n = 0; i < ealg_entries(); i++) if (ealg_list[i].available && ealg_list[i].pfkey_supported) n++; return n; } EXPORT_SYMBOL_GPL(xfrm_count_pfkey_enc_supported); MODULE_DESCRIPTION("XFRM Algorithm interface"); MODULE_LICENSE("GPL"); |
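/*
 * Illustrative sketch (not part of the original file): how a caller such as
 * the netlink SA configuration path might resolve a user-supplied algorithm
 * name with the lookup helpers above. "example_resolve_auth" is a made-up
 * name for this example; passing probe = 1 lets xfrm_find_algo() ask the
 * crypto layer whether the transform can actually be instantiated before the
 * descriptor is returned.
 */
static int example_resolve_auth(const char *alg_name, u32 *icv_truncbits)
{
	struct xfrm_algo_desc *desc;

	/* Matches either .name (e.g. "hmac(sha256)") or the legacy .compat alias. */
	desc = xfrm_aalg_get_byname(alg_name, 1);
	if (!desc)
		return -ENOSYS;

	*icv_truncbits = desc->uinfo.auth.icv_truncbits;
	return 0;
}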
/* SPDX-License-Identifier: GPL-2.0 */ /* * Code for manipulating bucket marks for garbage collection. * * Copyright 2014 Datera, Inc. */ #ifndef _BUCKETS_H #define _BUCKETS_H #include "buckets_types.h" #include "extents.h" #include "sb-members.h" static inline u64 sector_to_bucket(const struct bch_dev *ca, sector_t s) { return div_u64(s, ca->mi.bucket_size); } static inline sector_t bucket_to_sector(const struct bch_dev *ca, size_t b) { return ((sector_t) b) * ca->mi.bucket_size; } static inline sector_t bucket_remainder(const struct bch_dev *ca, sector_t s) { u32 remainder; div_u64_rem(s, ca->mi.bucket_size, &remainder); return remainder; } static inline u64 sector_to_bucket_and_offset(const struct bch_dev *ca, sector_t s, u32 *offset) { return div_u64_rem(s, ca->mi.bucket_size, offset); } #define for_each_bucket(_b, _buckets) \ for (_b = (_buckets)->b + (_buckets)->first_bucket; \ _b < (_buckets)->b + (_buckets)->nbuckets; _b++) /* * Ugly hack alert: * * We need to cram a spinlock in a single byte, because that's what we have left * in struct bucket, and we care about the size of these - during fsck, we need * in-memory state for every single bucket on every device. * * We used to do * while (xchg(&b->lock, 1)) cpu_relax(); * but, it turns out not all architectures support xchg on a single byte. * * So now we use bit_spin_lock(), with fun games since we can't burn a whole * ulong for this - we just need to make sure the lock bit always ends up in the * first byte.
*/ #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ #define BUCKET_LOCK_BITNR 0 #else #define BUCKET_LOCK_BITNR (BITS_PER_LONG - 1) #endif union ulong_byte_assert { ulong ulong; u8 byte; }; static inline void bucket_unlock(struct bucket *b) { BUILD_BUG_ON(!((union ulong_byte_assert) { .ulong = 1UL << BUCKET_LOCK_BITNR }).byte); clear_bit_unlock(BUCKET_LOCK_BITNR, (void *) &b->lock); wake_up_bit((void *) &b->lock, BUCKET_LOCK_BITNR); } static inline void bucket_lock(struct bucket *b) { wait_on_bit_lock((void *) &b->lock, BUCKET_LOCK_BITNR, TASK_UNINTERRUPTIBLE); } static inline struct bucket *gc_bucket(struct bch_dev *ca, size_t b) { return bucket_valid(ca, b) ? genradix_ptr(&ca->buckets_gc, b) : NULL; } static inline struct bucket_gens *bucket_gens(struct bch_dev *ca) { return rcu_dereference_check(ca->bucket_gens, lockdep_is_held(&ca->fs->state_lock)); } static inline u8 *bucket_gen(struct bch_dev *ca, size_t b) { struct bucket_gens *gens = bucket_gens(ca); if (b - gens->first_bucket >= gens->nbuckets_minus_first) return NULL; return gens->b + b; } static inline int bucket_gen_get_rcu(struct bch_dev *ca, size_t b) { u8 *gen = bucket_gen(ca, b); return gen ? *gen : -1; } static inline int bucket_gen_get(struct bch_dev *ca, size_t b) { rcu_read_lock(); int ret = bucket_gen_get_rcu(ca, b); rcu_read_unlock(); return ret; } static inline size_t PTR_BUCKET_NR(const struct bch_dev *ca, const struct bch_extent_ptr *ptr) { return sector_to_bucket(ca, ptr->offset); } static inline struct bpos PTR_BUCKET_POS(const struct bch_dev *ca, const struct bch_extent_ptr *ptr) { return POS(ptr->dev, PTR_BUCKET_NR(ca, ptr)); } static inline struct bpos PTR_BUCKET_POS_OFFSET(const struct bch_dev *ca, const struct bch_extent_ptr *ptr, u32 *bucket_offset) { return POS(ptr->dev, sector_to_bucket_and_offset(ca, ptr->offset, bucket_offset)); } static inline struct bucket *PTR_GC_BUCKET(struct bch_dev *ca, const struct bch_extent_ptr *ptr) { return gc_bucket(ca, PTR_BUCKET_NR(ca, ptr)); } static inline enum bch_data_type ptr_data_type(const struct bkey *k, const struct bch_extent_ptr *ptr) { if (bkey_is_btree_ptr(k)) return BCH_DATA_btree; return ptr->cached ? BCH_DATA_cached : BCH_DATA_user; } static inline s64 ptr_disk_sectors(s64 sectors, struct extent_ptr_decoded p) { EBUG_ON(sectors < 0); return crc_is_compressed(p.crc) ? DIV_ROUND_UP_ULL(sectors * p.crc.compressed_size, p.crc.uncompressed_size) : sectors; } static inline int gen_cmp(u8 a, u8 b) { return (s8) (a - b); } static inline int gen_after(u8 a, u8 b) { int r = gen_cmp(a, b); return r > 0 ? r : 0; } static inline int dev_ptr_stale_rcu(struct bch_dev *ca, const struct bch_extent_ptr *ptr) { int gen = bucket_gen_get_rcu(ca, PTR_BUCKET_NR(ca, ptr)); return gen < 0 ? gen : gen_after(gen, ptr->gen); } /** * dev_ptr_stale() - check if a pointer points into a bucket that has been * invalidated. 
*/ static inline int dev_ptr_stale(struct bch_dev *ca, const struct bch_extent_ptr *ptr) { rcu_read_lock(); int ret = dev_ptr_stale_rcu(ca, ptr); rcu_read_unlock(); return ret; } /* Device usage: */ void bch2_dev_usage_read_fast(struct bch_dev *, struct bch_dev_usage *); static inline struct bch_dev_usage bch2_dev_usage_read(struct bch_dev *ca) { struct bch_dev_usage ret; bch2_dev_usage_read_fast(ca, &ret); return ret; } void bch2_dev_usage_to_text(struct printbuf *, struct bch_dev *, struct bch_dev_usage *); static inline u64 bch2_dev_buckets_reserved(struct bch_dev *ca, enum bch_watermark watermark) { s64 reserved = 0; switch (watermark) { case BCH_WATERMARK_NR: BUG(); case BCH_WATERMARK_stripe: reserved += ca->mi.nbuckets >> 6; fallthrough; case BCH_WATERMARK_normal: reserved += ca->mi.nbuckets >> 6; fallthrough; case BCH_WATERMARK_copygc: reserved += ca->nr_btree_reserve; fallthrough; case BCH_WATERMARK_btree: reserved += ca->nr_btree_reserve; fallthrough; case BCH_WATERMARK_btree_copygc: case BCH_WATERMARK_reclaim: case BCH_WATERMARK_interior_updates: break; } return reserved; } static inline u64 dev_buckets_free(struct bch_dev *ca, struct bch_dev_usage usage, enum bch_watermark watermark) { return max_t(s64, 0, usage.d[BCH_DATA_free].buckets - ca->nr_open_buckets - bch2_dev_buckets_reserved(ca, watermark)); } static inline u64 __dev_buckets_available(struct bch_dev *ca, struct bch_dev_usage usage, enum bch_watermark watermark) { return max_t(s64, 0, usage.d[BCH_DATA_free].buckets + usage.d[BCH_DATA_cached].buckets + usage.d[BCH_DATA_need_gc_gens].buckets + usage.d[BCH_DATA_need_discard].buckets - ca->nr_open_buckets - bch2_dev_buckets_reserved(ca, watermark)); } static inline u64 dev_buckets_available(struct bch_dev *ca, enum bch_watermark watermark) { return __dev_buckets_available(ca, bch2_dev_usage_read(ca), watermark); } /* Filesystem usage: */ static inline unsigned dev_usage_u64s(void) { return sizeof(struct bch_dev_usage) / sizeof(u64); } struct bch_fs_usage_short bch2_fs_usage_read_short(struct bch_fs *); int bch2_bucket_ref_update(struct btree_trans *, struct bch_dev *, struct bkey_s_c, const struct bch_extent_ptr *, s64, enum bch_data_type, u8, u8, u32 *); int bch2_check_fix_ptrs(struct btree_trans *, enum btree_id, unsigned, struct bkey_s_c, enum btree_iter_update_trigger_flags); int bch2_trigger_extent(struct btree_trans *, enum btree_id, unsigned, struct bkey_s_c, struct bkey_s, enum btree_iter_update_trigger_flags); int bch2_trigger_reservation(struct btree_trans *, enum btree_id, unsigned, struct bkey_s_c, struct bkey_s, enum btree_iter_update_trigger_flags); #define trigger_run_overwrite_then_insert(_fn, _trans, _btree_id, _level, _old, _new, _flags)\ ({ \ int ret = 0; \ \ if (_old.k->type) \ ret = _fn(_trans, _btree_id, _level, _old, _flags & ~BTREE_TRIGGER_insert); \ if (!ret && _new.k->type) \ ret = _fn(_trans, _btree_id, _level, _new.s_c, _flags & ~BTREE_TRIGGER_overwrite);\ ret; \ }) void bch2_trans_account_disk_usage_change(struct btree_trans *); int bch2_trans_mark_metadata_bucket(struct btree_trans *, struct bch_dev *, u64, enum bch_data_type, unsigned, enum btree_iter_update_trigger_flags); int bch2_trans_mark_dev_sb(struct bch_fs *, struct bch_dev *, enum btree_iter_update_trigger_flags); int bch2_trans_mark_dev_sbs_flags(struct bch_fs *, enum btree_iter_update_trigger_flags); int bch2_trans_mark_dev_sbs(struct bch_fs *); bool bch2_is_superblock_bucket(struct bch_dev *, u64); static inline const char *bch2_data_type_str(enum bch_data_type type) { return type 
< BCH_DATA_NR ? __bch2_data_types[type] : "(invalid data type)"; } /* disk reservations: */ static inline void bch2_disk_reservation_put(struct bch_fs *c, struct disk_reservation *res) { if (res->sectors) { this_cpu_sub(*c->online_reserved, res->sectors); res->sectors = 0; } } enum bch_reservation_flags { BCH_DISK_RESERVATION_NOFAIL = 1 << 0, BCH_DISK_RESERVATION_PARTIAL = 1 << 1, }; int __bch2_disk_reservation_add(struct bch_fs *, struct disk_reservation *, u64, enum bch_reservation_flags); static inline int bch2_disk_reservation_add(struct bch_fs *c, struct disk_reservation *res, u64 sectors, enum bch_reservation_flags flags) { #ifdef __KERNEL__ u64 old, new; old = this_cpu_read(c->pcpu->sectors_available); do { if (sectors > old) return __bch2_disk_reservation_add(c, res, sectors, flags); new = old - sectors; } while (!this_cpu_try_cmpxchg(c->pcpu->sectors_available, &old, new)); this_cpu_add(*c->online_reserved, sectors); res->sectors += sectors; return 0; #else return __bch2_disk_reservation_add(c, res, sectors, flags); #endif } static inline struct disk_reservation bch2_disk_reservation_init(struct bch_fs *c, unsigned nr_replicas) { return (struct disk_reservation) { .sectors = 0, #if 0 /* not used yet: */ .gen = c->capacity_gen, #endif .nr_replicas = nr_replicas, }; } static inline int bch2_disk_reservation_get(struct bch_fs *c, struct disk_reservation *res, u64 sectors, unsigned nr_replicas, int flags) { *res = bch2_disk_reservation_init(c, nr_replicas); return bch2_disk_reservation_add(c, res, sectors * nr_replicas, flags); } #define RESERVE_FACTOR 6 static inline u64 avail_factor(u64 r) { return div_u64(r << RESERVE_FACTOR, (1 << RESERVE_FACTOR) + 1); } void bch2_buckets_nouse_free(struct bch_fs *); int bch2_buckets_nouse_alloc(struct bch_fs *); int bch2_dev_buckets_resize(struct bch_fs *, struct bch_dev *, u64); void bch2_dev_buckets_free(struct bch_dev *); int bch2_dev_buckets_alloc(struct bch_fs *, struct bch_dev *); #endif /* _BUCKETS_H */ |
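/*
 * Illustrative sketch (not part of the original header): the basic
 * reserve-then-release pattern a write path might follow with the disk
 * reservation helpers above. "example_reserve_for_write" and the choice of
 * two replicas are assumptions for the example; bch2_disk_reservation_get()
 * goes through the per-cpu fast path in bch2_disk_reservation_add() before
 * falling back to __bch2_disk_reservation_add().
 */
static inline int example_reserve_for_write(struct bch_fs *c, u64 sectors)
{
	struct disk_reservation res;
	int ret;

	ret = bch2_disk_reservation_get(c, &res, sectors, 2, 0);
	if (ret)
		return ret;

	/* ... perform the write and journal the update ... */

	bch2_disk_reservation_put(c, &res);
	return 0;
}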
// SPDX-License-Identifier: GPL-2.0 /* Watch queue and general notification mechanism, built on pipes * * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
* Written by David Howells (dhowells@redhat.com) * * See Documentation/core-api/watch_queue.rst */ #define pr_fmt(fmt) "watchq: " fmt #include <linux/module.h> #include <linux/init.h> #include <linux/sched.h> #include <linux/slab.h> #include <linux/printk.h> #include <linux/miscdevice.h> #include <linux/fs.h> #include <linux/mm.h> #include <linux/pagemap.h> #include <linux/poll.h> #include <linux/uaccess.h> #include <linux/vmalloc.h> #include <linux/file.h> #include <linux/security.h> #include <linux/cred.h> #include <linux/sched/signal.h> #include <linux/watch_queue.h> #include <linux/pipe_fs_i.h> MODULE_DESCRIPTION("Watch queue"); MODULE_AUTHOR("Red Hat, Inc."); #define WATCH_QUEUE_NOTE_SIZE 128 #define WATCH_QUEUE_NOTES_PER_PAGE (PAGE_SIZE / WATCH_QUEUE_NOTE_SIZE) /* * This must be called under the RCU read-lock, which makes * sure that the wqueue still exists. It can then take the lock, * and check that the wqueue hasn't been destroyed, which in * turn makes sure that the notification pipe still exists. */ static inline bool lock_wqueue(struct watch_queue *wqueue) { spin_lock_bh(&wqueue->lock); if (unlikely(!wqueue->pipe)) { spin_unlock_bh(&wqueue->lock); return false; } return true; } static inline void unlock_wqueue(struct watch_queue *wqueue) { spin_unlock_bh(&wqueue->lock); } static void watch_queue_pipe_buf_release(struct pipe_inode_info *pipe, struct pipe_buffer *buf) { struct watch_queue *wqueue = (struct watch_queue *)buf->private; struct page *page; unsigned int bit; /* We need to work out which note within the page this refers to, but * the note might have been maximum size, so merely ANDing the offset * off doesn't work. OTOH, the note must've been more than zero size. */ bit = buf->offset + buf->len; if ((bit & (WATCH_QUEUE_NOTE_SIZE - 1)) == 0) bit -= WATCH_QUEUE_NOTE_SIZE; bit /= WATCH_QUEUE_NOTE_SIZE; page = buf->page; bit += page->private; set_bit(bit, wqueue->notes_bitmap); generic_pipe_buf_release(pipe, buf); } // No try_steal function => no stealing #define watch_queue_pipe_buf_try_steal NULL /* New data written to a pipe may be appended to a buffer with this type. */ static const struct pipe_buf_operations watch_queue_pipe_buf_ops = { .release = watch_queue_pipe_buf_release, .try_steal = watch_queue_pipe_buf_try_steal, .get = generic_pipe_buf_get, }; /* * Post a notification to a watch queue. * * Must be called with the RCU lock for reading, and the * watch_queue lock held, which guarantees that the pipe * hasn't been released. 
*/ static bool post_one_notification(struct watch_queue *wqueue, struct watch_notification *n) { void *p; struct pipe_inode_info *pipe = wqueue->pipe; struct pipe_buffer *buf; struct page *page; unsigned int head, tail, mask, note, offset, len; bool done = false; spin_lock_irq(&pipe->rd_wait.lock); mask = pipe->ring_size - 1; head = pipe->head; tail = pipe->tail; if (pipe_full(head, tail, pipe->ring_size)) goto lost; note = find_first_bit(wqueue->notes_bitmap, wqueue->nr_notes); if (note >= wqueue->nr_notes) goto lost; page = wqueue->notes[note / WATCH_QUEUE_NOTES_PER_PAGE]; offset = note % WATCH_QUEUE_NOTES_PER_PAGE * WATCH_QUEUE_NOTE_SIZE; get_page(page); len = n->info & WATCH_INFO_LENGTH; p = kmap_atomic(page); memcpy(p + offset, n, len); kunmap_atomic(p); buf = &pipe->bufs[head & mask]; buf->page = page; buf->private = (unsigned long)wqueue; buf->ops = &watch_queue_pipe_buf_ops; buf->offset = offset; buf->len = len; buf->flags = PIPE_BUF_FLAG_WHOLE; smp_store_release(&pipe->head, head + 1); /* vs pipe_read() */ if (!test_and_clear_bit(note, wqueue->notes_bitmap)) { spin_unlock_irq(&pipe->rd_wait.lock); BUG(); } wake_up_interruptible_sync_poll_locked(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM); done = true; out: spin_unlock_irq(&pipe->rd_wait.lock); if (done) kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN); return done; lost: buf = &pipe->bufs[(head - 1) & mask]; buf->flags |= PIPE_BUF_FLAG_LOSS; goto out; } /* * Apply filter rules to a notification. */ static bool filter_watch_notification(const struct watch_filter *wf, const struct watch_notification *n) { const struct watch_type_filter *wt; unsigned int st_bits = sizeof(wt->subtype_filter[0]) * 8; unsigned int st_index = n->subtype / st_bits; unsigned int st_bit = 1U << (n->subtype % st_bits); int i; if (!test_bit(n->type, wf->type_filter)) return false; for (i = 0; i < wf->nr_filters; i++) { wt = &wf->filters[i]; if (n->type == wt->type && (wt->subtype_filter[st_index] & st_bit) && (n->info & wt->info_mask) == wt->info_filter) return true; } return false; /* If there is a filter, the default is to reject. */ } /** * __post_watch_notification - Post an event notification * @wlist: The watch list to post the event to. * @n: The notification record to post. * @cred: The creds of the process that triggered the notification. * @id: The ID to match on the watch. * * Post a notification of an event into a set of watch queues and let the users * know. * * The size of the notification should be set in n->info & WATCH_INFO_LENGTH and * should be in units of sizeof(*n). */ void __post_watch_notification(struct watch_list *wlist, struct watch_notification *n, const struct cred *cred, u64 id) { const struct watch_filter *wf; struct watch_queue *wqueue; struct watch *watch; if (((n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT) == 0) { WARN_ON(1); return; } rcu_read_lock(); hlist_for_each_entry_rcu(watch, &wlist->watchers, list_node) { if (watch->id != id) continue; n->info &= ~WATCH_INFO_ID; n->info |= watch->info_id; wqueue = rcu_dereference(watch->queue); wf = rcu_dereference(wqueue->filter); if (wf && !filter_watch_notification(wf, n)) continue; if (security_post_notification(watch->cred, cred, n) < 0) continue; if (lock_wqueue(wqueue)) { post_one_notification(wqueue, n); unlock_wqueue(wqueue); } } rcu_read_unlock(); } EXPORT_SYMBOL(__post_watch_notification); /* * Allocate sufficient pages to preallocation for the requested number of * notifications. 
*/ long watch_queue_set_size(struct pipe_inode_info *pipe, unsigned int nr_notes) { struct watch_queue *wqueue = pipe->watch_queue; struct page **pages; unsigned long *bitmap; unsigned long user_bufs; int ret, i, nr_pages; if (!wqueue) return -ENODEV; if (wqueue->notes) return -EBUSY; if (nr_notes < 1 || nr_notes > 512) /* TODO: choose a better hard limit */ return -EINVAL; nr_pages = (nr_notes + WATCH_QUEUE_NOTES_PER_PAGE - 1); nr_pages /= WATCH_QUEUE_NOTES_PER_PAGE; user_bufs = account_pipe_buffers(pipe->user, pipe->nr_accounted, nr_pages); if (nr_pages > pipe->max_usage && (too_many_pipe_buffers_hard(user_bufs) || too_many_pipe_buffers_soft(user_bufs)) && pipe_is_unprivileged_user()) { ret = -EPERM; goto error; } nr_notes = nr_pages * WATCH_QUEUE_NOTES_PER_PAGE; ret = pipe_resize_ring(pipe, roundup_pow_of_two(nr_notes)); if (ret < 0) goto error; ret = -ENOMEM; pages = kcalloc(nr_pages, sizeof(struct page *), GFP_KERNEL); if (!pages) goto error; for (i = 0; i < nr_pages; i++) { pages[i] = alloc_page(GFP_KERNEL); if (!pages[i]) goto error_p; pages[i]->private = i * WATCH_QUEUE_NOTES_PER_PAGE; } bitmap = bitmap_alloc(nr_notes, GFP_KERNEL); if (!bitmap) goto error_p; bitmap_fill(bitmap, nr_notes); wqueue->notes = pages; wqueue->notes_bitmap = bitmap; wqueue->nr_pages = nr_pages; wqueue->nr_notes = nr_notes; return 0; error_p: while (--i >= 0) __free_page(pages[i]); kfree(pages); error: (void) account_pipe_buffers(pipe->user, nr_pages, pipe->nr_accounted); return ret; } /* * Set the filter on a watch queue. */ long watch_queue_set_filter(struct pipe_inode_info *pipe, struct watch_notification_filter __user *_filter) { struct watch_notification_type_filter *tf; struct watch_notification_filter filter; struct watch_type_filter *q; struct watch_filter *wfilter; struct watch_queue *wqueue = pipe->watch_queue; int ret, nr_filter = 0, i; if (!wqueue) return -ENODEV; if (!_filter) { /* Remove the old filter */ wfilter = NULL; goto set; } /* Grab the user's filter specification */ if (copy_from_user(&filter, _filter, sizeof(filter)) != 0) return -EFAULT; if (filter.nr_filters == 0 || filter.nr_filters > 16 || filter.__reserved != 0) return -EINVAL; tf = memdup_array_user(_filter->filters, filter.nr_filters, sizeof(*tf)); if (IS_ERR(tf)) return PTR_ERR(tf); ret = -EINVAL; for (i = 0; i < filter.nr_filters; i++) { if ((tf[i].info_filter & ~tf[i].info_mask) || tf[i].info_mask & WATCH_INFO_LENGTH) goto err_filter; /* Ignore any unknown types */ if (tf[i].type >= WATCH_TYPE__NR) continue; nr_filter++; } /* Now we need to build the internal filter from only the relevant * user-specified filters. 
*/ ret = -ENOMEM; wfilter = kzalloc(struct_size(wfilter, filters, nr_filter), GFP_KERNEL); if (!wfilter) goto err_filter; wfilter->nr_filters = nr_filter; q = wfilter->filters; for (i = 0; i < filter.nr_filters; i++) { if (tf[i].type >= WATCH_TYPE__NR) continue; q->type = tf[i].type; q->info_filter = tf[i].info_filter; q->info_mask = tf[i].info_mask; q->subtype_filter[0] = tf[i].subtype_filter[0]; __set_bit(q->type, wfilter->type_filter); q++; } kfree(tf); set: pipe_lock(pipe); wfilter = rcu_replace_pointer(wqueue->filter, wfilter, lockdep_is_held(&pipe->mutex)); pipe_unlock(pipe); if (wfilter) kfree_rcu(wfilter, rcu); return 0; err_filter: kfree(tf); return ret; } static void __put_watch_queue(struct kref *kref) { struct watch_queue *wqueue = container_of(kref, struct watch_queue, usage); struct watch_filter *wfilter; int i; for (i = 0; i < wqueue->nr_pages; i++) __free_page(wqueue->notes[i]); kfree(wqueue->notes); bitmap_free(wqueue->notes_bitmap); wfilter = rcu_access_pointer(wqueue->filter); if (wfilter) kfree_rcu(wfilter, rcu); kfree_rcu(wqueue, rcu); } /** * put_watch_queue - Dispose of a ref on a watchqueue. * @wqueue: The watch queue to unref. */ void put_watch_queue(struct watch_queue *wqueue) { kref_put(&wqueue->usage, __put_watch_queue); } EXPORT_SYMBOL(put_watch_queue); static void free_watch(struct rcu_head *rcu) { struct watch *watch = container_of(rcu, struct watch, rcu); put_watch_queue(rcu_access_pointer(watch->queue)); atomic_dec(&watch->cred->user->nr_watches); put_cred(watch->cred); kfree(watch); } static void __put_watch(struct kref *kref) { struct watch *watch = container_of(kref, struct watch, usage); call_rcu(&watch->rcu, free_watch); } /* * Discard a watch. */ static void put_watch(struct watch *watch) { kref_put(&watch->usage, __put_watch); } /** * init_watch - Initialise a watch * @watch: The watch to initialise. * @wqueue: The queue to assign. * * Initialise a watch and set the watch queue. */ void init_watch(struct watch *watch, struct watch_queue *wqueue) { kref_init(&watch->usage); INIT_HLIST_NODE(&watch->list_node); INIT_HLIST_NODE(&watch->queue_node); rcu_assign_pointer(watch->queue, wqueue); } static int add_one_watch(struct watch *watch, struct watch_list *wlist, struct watch_queue *wqueue) { const struct cred *cred; struct watch *w; hlist_for_each_entry(w, &wlist->watchers, list_node) { struct watch_queue *wq = rcu_access_pointer(w->queue); if (wqueue == wq && watch->id == w->id) return -EBUSY; } cred = current_cred(); if (atomic_inc_return(&cred->user->nr_watches) > task_rlimit(current, RLIMIT_NOFILE)) { atomic_dec(&cred->user->nr_watches); return -EAGAIN; } watch->cred = get_cred(cred); rcu_assign_pointer(watch->watch_list, wlist); kref_get(&wqueue->usage); kref_get(&watch->usage); hlist_add_head(&watch->queue_node, &wqueue->watches); hlist_add_head_rcu(&watch->list_node, &wlist->watchers); return 0; } /** * add_watch_to_object - Add a watch on an object to a watch list * @watch: The watch to add * @wlist: The watch list to add to * * @watch->queue must have been set to point to the queue to post notifications * to and the watch list of the object to be watched. @watch->cred must also * have been set to the appropriate credentials and a ref taken on them. * * The caller must pin the queue and the list both and must hold the list * locked against racing watch additions/removals. 
*/ int add_watch_to_object(struct watch *watch, struct watch_list *wlist) { struct watch_queue *wqueue; int ret = -ENOENT; rcu_read_lock(); wqueue = rcu_access_pointer(watch->queue); if (lock_wqueue(wqueue)) { spin_lock(&wlist->lock); ret = add_one_watch(watch, wlist, wqueue); spin_unlock(&wlist->lock); unlock_wqueue(wqueue); } rcu_read_unlock(); return ret; } EXPORT_SYMBOL(add_watch_to_object); /** * remove_watch_from_object - Remove a watch or all watches from an object. * @wlist: The watch list to remove from * @wq: The watch queue of interest (ignored if @all is true) * @id: The ID of the watch to remove (ignored if @all is true) * @all: True to remove all objects * * Remove a specific watch or all watches from an object. A notification is * sent to the watcher to tell them that this happened. */ int remove_watch_from_object(struct watch_list *wlist, struct watch_queue *wq, u64 id, bool all) { struct watch_notification_removal n; struct watch_queue *wqueue; struct watch *watch; int ret = -EBADSLT; rcu_read_lock(); again: spin_lock(&wlist->lock); hlist_for_each_entry(watch, &wlist->watchers, list_node) { if (all || (watch->id == id && rcu_access_pointer(watch->queue) == wq)) goto found; } spin_unlock(&wlist->lock); goto out; found: ret = 0; hlist_del_init_rcu(&watch->list_node); rcu_assign_pointer(watch->watch_list, NULL); spin_unlock(&wlist->lock); /* We now own the reference on watch that used to belong to wlist. */ n.watch.type = WATCH_TYPE_META; n.watch.subtype = WATCH_META_REMOVAL_NOTIFICATION; n.watch.info = watch->info_id | watch_sizeof(n.watch); n.id = id; if (id != 0) n.watch.info = watch->info_id | watch_sizeof(n); wqueue = rcu_dereference(watch->queue); if (lock_wqueue(wqueue)) { post_one_notification(wqueue, &n.watch); if (!hlist_unhashed(&watch->queue_node)) { hlist_del_init_rcu(&watch->queue_node); put_watch(watch); } unlock_wqueue(wqueue); } if (wlist->release_watch) { void (*release_watch)(struct watch *); release_watch = wlist->release_watch; rcu_read_unlock(); (*release_watch)(watch); rcu_read_lock(); } put_watch(watch); if (all && !hlist_empty(&wlist->watchers)) goto again; out: rcu_read_unlock(); return ret; } EXPORT_SYMBOL(remove_watch_from_object); /* * Remove all the watches that are contributory to a queue. This has the * potential to race with removal of the watches by the destruction of the * objects being watched or with the distribution of notifications. */ void watch_queue_clear(struct watch_queue *wqueue) { struct watch_list *wlist; struct watch *watch; bool release; rcu_read_lock(); spin_lock_bh(&wqueue->lock); /* * This pipe can be freed by callers like free_pipe_info(). * Removing this reference also prevents new notifications. */ wqueue->pipe = NULL; while (!hlist_empty(&wqueue->watches)) { watch = hlist_entry(wqueue->watches.first, struct watch, queue_node); hlist_del_init_rcu(&watch->queue_node); /* We now own a ref on the watch. */ spin_unlock_bh(&wqueue->lock); /* We can't do the next bit under the queue lock as we need to * get the list lock - which would cause a deadlock if someone * was removing from the opposite direction at the same time or * posting a notification. */ wlist = rcu_dereference(watch->watch_list); if (wlist) { void (*release_watch)(struct watch *); spin_lock(&wlist->lock); release = !hlist_unhashed(&watch->list_node); if (release) { hlist_del_init_rcu(&watch->list_node); rcu_assign_pointer(watch->watch_list, NULL); /* We now own a second ref on the watch. 
*/ } release_watch = wlist->release_watch; spin_unlock(&wlist->lock); if (release) { if (release_watch) { rcu_read_unlock(); /* This might need to call dput(), so * we have to drop all the locks. */ (*release_watch)(watch); rcu_read_lock(); } put_watch(watch); } } put_watch(watch); spin_lock_bh(&wqueue->lock); } spin_unlock_bh(&wqueue->lock); rcu_read_unlock(); } /** * get_watch_queue - Get a watch queue from its file descriptor. * @fd: The fd to query. */ struct watch_queue *get_watch_queue(int fd) { struct pipe_inode_info *pipe; struct watch_queue *wqueue = ERR_PTR(-EINVAL); CLASS(fd, f)(fd); if (!fd_empty(f)) { pipe = get_pipe_info(fd_file(f), false); if (pipe && pipe->watch_queue) { wqueue = pipe->watch_queue; kref_get(&wqueue->usage); } } return wqueue; } EXPORT_SYMBOL(get_watch_queue); /* * Initialise a watch queue */ int watch_queue_init(struct pipe_inode_info *pipe) { struct watch_queue *wqueue; wqueue = kzalloc(sizeof(*wqueue), GFP_KERNEL); if (!wqueue) return -ENOMEM; wqueue->pipe = pipe; kref_init(&wqueue->usage); spin_lock_init(&wqueue->lock); INIT_HLIST_HEAD(&wqueue->watches); pipe->watch_queue = wqueue; return 0; } |
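/*
 * Illustrative userspace sketch (not part of the original file), following
 * Documentation/core-api/watch_queue.rst: create a notification pipe, size
 * its note buffer and install a filter before attaching watches (for example
 * with keyctl(KEYCTL_WATCH_KEY, ...)). The filter contents and note count
 * below are assumptions for the example.
 */
#if 0	/* userspace example, not kernel code */
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <limits.h>
#include <sys/ioctl.h>
#include <linux/watch_queue.h>

/* Accept every subtype of key change notification (GCC permits static
 * initialisation of the flexible filters[] array). */
static struct watch_notification_filter example_filter = {
	.nr_filters	= 1,
	.filters	= {
		[0] = {
			.type			= WATCH_TYPE_KEY_NOTIFY,
			.subtype_filter[0]	= UINT_MAX,
		},
	},
};

static int example_open_watch_queue(void)
{
	int fds[2];

	if (pipe2(fds, O_NOTIFICATION_PIPE) < 0)
		return -1;

	/* Notes are 128 bytes each; watch_queue_set_size() rounds the buffer
	 * up to whole pages. */
	if (ioctl(fds[0], IOC_WATCH_QUEUE_SET_SIZE, 256) < 0 ||
	    ioctl(fds[0], IOC_WATCH_QUEUE_SET_FILTER, &example_filter) < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	return fds[0];	/* notifications are read from this end */
}
#endif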
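/*
 * Illustrative kernel-side sketch (not part of the original file): how a
 * watched subsystem might post an event to every queue watching one of its
 * objects. "example_object", "example_post_event" and the use of
 * WATCH_TYPE_META are assumptions for the example; a real subsystem (see the
 * keys code) defines its own watch type and subtypes and goes through
 * post_watch_notification(), the inline helper from <linux/watch_queue.h>
 * that calls __post_watch_notification() above when a watch list exists.
 */
struct example_object {
	struct watch_list	watchers;
	u64			id;
};

static void example_post_event(struct example_object *obj, u32 subtype)
{
	struct watch_notification n = {
		.type		= WATCH_TYPE_META,
		.subtype	= subtype,
		.info		= watch_sizeof(n),	/* length encoded in WATCH_INFO_LENGTH */
	};

	post_watch_notification(&obj->watchers, &n, current_cred(), obj->id);
}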
/* * Copyright (c) 2014 Samsung Electronics Co., Ltd * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the "Software"), * to deal in the Software without restriction, including without limitation * the rights to use, copy, modify, merge, publish, distribute, sub license, * and/or sell copies of the Software, and to permit persons to whom the * Software is furnished to do so, subject to the following conditions: * * The above copyright notice and this permission notice (including the * next paragraph) shall be included in all copies or substantial portions * of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER * DEALINGS IN THE SOFTWARE.
*/ #include <linux/err.h> #include <linux/media-bus-format.h> #include <linux/module.h> #include <linux/mutex.h> #include <drm/drm_atomic_state_helper.h> #include <drm/drm_bridge.h> #include <drm/drm_debugfs.h> #include <drm/drm_edid.h> #include <drm/drm_encoder.h> #include <drm/drm_file.h> #include <drm/drm_of.h> #include <drm/drm_print.h> #include "drm_crtc_internal.h" /** * DOC: overview * * &struct drm_bridge represents a device that hangs on to an encoder. These are * handy when a regular &drm_encoder entity isn't enough to represent the entire * encoder chain. * * A bridge is always attached to a single &drm_encoder at a time, but can be * either connected to it directly, or through a chain of bridges:: * * [ CRTC ---> ] Encoder ---> Bridge A ---> Bridge B * * Here, the output of the encoder feeds to bridge A, and that furthers feeds to * bridge B. Bridge chains can be arbitrarily long, and shall be fully linear: * Chaining multiple bridges to the output of a bridge, or the same bridge to * the output of different bridges, is not supported. * * &drm_bridge, like &drm_panel, aren't &drm_mode_object entities like planes, * CRTCs, encoders or connectors and hence are not visible to userspace. They * just provide additional hooks to get the desired output at the end of the * encoder chain. */ /** * DOC: display driver integration * * Display drivers are responsible for linking encoders with the first bridge * in the chains. This is done by acquiring the appropriate bridge with * devm_drm_of_get_bridge(). Once acquired, the bridge shall be attached to the * encoder with a call to drm_bridge_attach(). * * Bridges are responsible for linking themselves with the next bridge in the * chain, if any. This is done the same way as for encoders, with the call to * drm_bridge_attach() occurring in the &drm_bridge_funcs.attach operation. * * Once these links are created, the bridges can participate along with encoder * functions to perform mode validation and fixup (through * drm_bridge_chain_mode_valid() and drm_atomic_bridge_chain_check()), mode * setting (through drm_bridge_chain_mode_set()), enable (through * drm_atomic_bridge_chain_pre_enable() and drm_atomic_bridge_chain_enable()) * and disable (through drm_atomic_bridge_chain_disable() and * drm_atomic_bridge_chain_post_disable()). Those functions call the * corresponding operations provided in &drm_bridge_funcs in sequence for all * bridges in the chain. * * For display drivers that use the atomic helpers * drm_atomic_helper_check_modeset(), * drm_atomic_helper_commit_modeset_enables() and * drm_atomic_helper_commit_modeset_disables() (either directly in hand-rolled * commit check and commit tail handlers, or through the higher-level * drm_atomic_helper_check() and drm_atomic_helper_commit_tail() or * drm_atomic_helper_commit_tail_rpm() helpers), this is done transparently and * requires no intervention from the driver. For other drivers, the relevant * DRM bridge chain functions shall be called manually. * * Bridges also participate in implementing the &drm_connector at the end of * the bridge chain. Display drivers may use the drm_bridge_connector_init() * helper to create the &drm_connector, or implement it manually on top of the * connector-related operations exposed by the bridge (see the overview * documentation of bridge operations for more details). */ /** * DOC: special care dsi * * The interaction between the bridges and other frameworks involved in * the probing of the upstream driver and the bridge driver can be * challenging. 
Indeed, there are multiple cases that need to be * considered: * * - The upstream driver doesn't use the component framework and isn't a * MIPI-DSI host. In this case, the bridge driver will probe at some * point and the upstream driver should try to probe again by returning * EPROBE_DEFER as long as the bridge driver hasn't probed. * * - The upstream driver doesn't use the component framework, but is a * MIPI-DSI host. The bridge device uses the MIPI-DCS commands to be * controlled. In this case, the bridge device is a child of the * display device and when it probes it is assured that the display * device (and MIPI-DSI host) is present. The upstream driver will be * assured that the bridge driver is connected between the * &mipi_dsi_host_ops.attach and &mipi_dsi_host_ops.detach operations. * Therefore, it must run mipi_dsi_host_register() in its probe * function, and then run drm_bridge_attach() in its * &mipi_dsi_host_ops.attach hook. * * - The upstream driver uses the component framework and is a MIPI-DSI * host. The bridge device uses the MIPI-DCS commands to be * controlled. This is the same situation as above, and it can run * mipi_dsi_host_register() in either its probe or bind hooks. * * - The upstream driver uses the component framework and is a MIPI-DSI * host. The bridge device uses a separate bus (such as I2C) to be * controlled. In this case, there's no correlation between the probe * of the bridge and upstream drivers, so care must be taken to avoid * an endless EPROBE_DEFER loop, with each driver waiting for the * other to probe. * * The ideal pattern to cover the last item (and all the others in the * MIPI-DSI host driver case) is to split the operations like this: * * - The MIPI-DSI host driver must run mipi_dsi_host_register() in its * probe hook. It will make sure that the MIPI-DSI host sticks around, * and that the driver's bind can be called. * * - In its probe hook, the bridge driver must try to find its MIPI-DSI * host, register as a MIPI-DSI device and attach the MIPI-DSI device * to its host. The bridge driver is now functional. * * - In its &struct mipi_dsi_host_ops.attach hook, the MIPI-DSI host can * now add its component. Its bind hook will now be called and since * the bridge driver is attached and registered, we can now look for * and attach it. * * At this point, we're certain that both the upstream driver and * the bridge driver are functional and we can't have a deadlock-like * situation when probing. */ /** * DOC: dsi bridge operations * * DSI host interfaces are expected to be implemented as bridges rather than * encoders; however, there are a few aspects of their operation that need to * be defined in order to provide a consistent interface. * * A DSI host should keep the PHY powered down until the pre_enable operation is * called. All lanes are in an undefined idle state up to this point, and it * must not be assumed that it is LP-11. * pre_enable should initialise the PHY, set the data lanes to LP-11, and the * clock lane to either LP-11 or HS depending on the mode_flag * %MIPI_DSI_CLOCK_NON_CONTINUOUS. * * Ordinarily the downstream bridge DSI peripheral pre_enable will have been * called before the DSI host. If the DSI peripheral requires LP-11 and/or * the clock lane to be in HS mode prior to pre_enable, then it can set the * &pre_enable_prev_first flag to request the pre_enable (and * post_disable) order to be altered to enable the DSI host first.
* * Either the CRTC being enabled, or the DSI host enable operation should switch * the host to actively transmitting video on the data lanes. * * The reverse also applies. The DSI host disable operation or stopping the CRTC * should stop transmitting video, and the data lanes should return to the LP-11 * state. The DSI host &post_disable operation should disable the PHY. * If the &pre_enable_prev_first flag is set, then the DSI peripheral's * bridge &post_disable will be called before the DSI host's post_disable. * * Whilst it is valid to call &host_transfer prior to pre_enable or after * post_disable, the exact state of the lanes is undefined at this point. The * DSI host should initialise the interface, transmit the data, and then disable * the interface again. * * Ultra Low Power State (ULPS) is not explicitly supported by DRM. If * implemented, it therefore needs to be handled entirely within the DSI Host * driver. */ static DEFINE_MUTEX(bridge_lock); static LIST_HEAD(bridge_list); /** * drm_bridge_add - add the given bridge to the global bridge list * * @bridge: bridge control structure */ void drm_bridge_add(struct drm_bridge *bridge) { mutex_init(&bridge->hpd_mutex); if (bridge->ops & DRM_BRIDGE_OP_HDMI) bridge->ycbcr_420_allowed = !!(bridge->supported_formats & BIT(HDMI_COLORSPACE_YUV420)); mutex_lock(&bridge_lock); list_add_tail(&bridge->list, &bridge_list); mutex_unlock(&bridge_lock); } EXPORT_SYMBOL(drm_bridge_add); static void drm_bridge_remove_void(void *bridge) { drm_bridge_remove(bridge); } /** * devm_drm_bridge_add - devm managed version of drm_bridge_add() * * @dev: device to tie the bridge lifetime to * @bridge: bridge control structure * * This is the managed version of drm_bridge_add() which automatically * calls drm_bridge_remove() when @dev is unbound. * * Return: 0 if no error or negative error code. */ int devm_drm_bridge_add(struct device *dev, struct drm_bridge *bridge) { drm_bridge_add(bridge); return devm_add_action_or_reset(dev, drm_bridge_remove_void, bridge); } EXPORT_SYMBOL(devm_drm_bridge_add); /** * drm_bridge_remove - remove the given bridge from the global bridge list * * @bridge: bridge control structure */ void drm_bridge_remove(struct drm_bridge *bridge) { mutex_lock(&bridge_lock); list_del_init(&bridge->list); mutex_unlock(&bridge_lock); mutex_destroy(&bridge->hpd_mutex); } EXPORT_SYMBOL(drm_bridge_remove); static struct drm_private_state * drm_bridge_atomic_duplicate_priv_state(struct drm_private_obj *obj) { struct drm_bridge *bridge = drm_priv_to_bridge(obj); struct drm_bridge_state *state; state = bridge->funcs->atomic_duplicate_state(bridge); return state ? &state->base : NULL; } static void drm_bridge_atomic_destroy_priv_state(struct drm_private_obj *obj, struct drm_private_state *s) { struct drm_bridge_state *state = drm_priv_to_bridge_state(s); struct drm_bridge *bridge = drm_priv_to_bridge(obj); bridge->funcs->atomic_destroy_state(bridge, state); } static const struct drm_private_state_funcs drm_bridge_priv_state_funcs = { .atomic_duplicate_state = drm_bridge_atomic_duplicate_priv_state, .atomic_destroy_state = drm_bridge_atomic_destroy_priv_state, }; /** * drm_bridge_attach - attach the bridge to an encoder's chain * * @encoder: DRM encoder * @bridge: bridge to attach * @previous: previous bridge in the chain (optional) * @flags: DRM_BRIDGE_ATTACH_* flags * * Called by a kms driver to link the bridge to an encoder's chain. The previous * argument specifies the previous bridge in the chain. 
If NULL, the bridge is * linked directly at the encoder's output. Otherwise it is linked at the * previous bridge's output. * * If non-NULL the previous bridge must be already attached by a call to this * function. * * Note that bridges attached to encoders are auto-detached during encoder * cleanup in drm_encoder_cleanup(), so drm_bridge_attach() should generally * *not* be balanced with a drm_bridge_detach() in driver code. * * RETURNS: * Zero on success, error code on failure */ int drm_bridge_attach(struct drm_encoder *encoder, struct drm_bridge *bridge, struct drm_bridge *previous, enum drm_bridge_attach_flags flags) { int ret; if (!encoder || !bridge) return -EINVAL; if (previous && (!previous->dev || previous->encoder != encoder)) return -EINVAL; if (bridge->dev) return -EBUSY; bridge->dev = encoder->dev; bridge->encoder = encoder; if (previous) list_add(&bridge->chain_node, &previous->chain_node); else list_add(&bridge->chain_node, &encoder->bridge_chain); if (bridge->funcs->attach) { ret = bridge->funcs->attach(bridge, flags); if (ret < 0) goto err_reset_bridge; } if (bridge->funcs->atomic_reset) { struct drm_bridge_state *state; state = bridge->funcs->atomic_reset(bridge); if (IS_ERR(state)) { ret = PTR_ERR(state); goto err_detach_bridge; } drm_atomic_private_obj_init(bridge->dev, &bridge->base, &state->base, &drm_bridge_priv_state_funcs); } return 0; err_detach_bridge: if (bridge->funcs->detach) bridge->funcs->detach(bridge); err_reset_bridge: bridge->dev = NULL; bridge->encoder = NULL; list_del(&bridge->chain_node); if (ret != -EPROBE_DEFER) DRM_ERROR("failed to attach bridge %pOF to encoder %s: %d\n", bridge->of_node, encoder->name, ret); else dev_err_probe(encoder->dev->dev, -EPROBE_DEFER, "failed to attach bridge %pOF to encoder %s\n", bridge->of_node, encoder->name); return ret; } EXPORT_SYMBOL(drm_bridge_attach); void drm_bridge_detach(struct drm_bridge *bridge) { if (WARN_ON(!bridge)) return; if (WARN_ON(!bridge->dev)) return; if (bridge->funcs->atomic_reset) drm_atomic_private_obj_fini(&bridge->base); if (bridge->funcs->detach) bridge->funcs->detach(bridge); list_del(&bridge->chain_node); bridge->dev = NULL; } /** * DOC: bridge operations * * Bridge drivers expose operations through the &drm_bridge_funcs structure. * The DRM internals (atomic and CRTC helpers) use the helpers defined in * drm_bridge.c to call bridge operations. Those operations are divided in * three big categories to support different parts of the bridge usage. * * - The encoder-related operations support control of the bridges in the * chain, and are roughly counterparts to the &drm_encoder_helper_funcs * operations. They are used by the legacy CRTC and the atomic modeset * helpers to perform mode validation, fixup and setting, and enable and * disable the bridge automatically. * * The enable and disable operations are split in * &drm_bridge_funcs.pre_enable, &drm_bridge_funcs.enable, * &drm_bridge_funcs.disable and &drm_bridge_funcs.post_disable to provide * finer-grained control. * * Bridge drivers may implement the legacy version of those operations, or * the atomic version (prefixed with atomic\_), in which case they shall also * implement the atomic state bookkeeping operations * (&drm_bridge_funcs.atomic_duplicate_state, * &drm_bridge_funcs.atomic_destroy_state and &drm_bridge_funcs.reset). * Mixing atomic and non-atomic versions of the operations is not supported. 
* * - The bus format negotiation operations * &drm_bridge_funcs.atomic_get_output_bus_fmts and * &drm_bridge_funcs.atomic_get_input_bus_fmts allow bridge drivers to * negotiate the formats transmitted between bridges in the chain when * multiple formats are supported. Negotiation for formats is performed * transparently for display drivers by the atomic modeset helpers. Only * atomic versions of those operations exist, bridge drivers that need to * implement them shall thus also implement the atomic version of the * encoder-related operations. This feature is not supported by the legacy * CRTC helpers. * * - The connector-related operations support implementing a &drm_connector * based on a chain of bridges. DRM bridges traditionally create a * &drm_connector for bridges meant to be used at the end of the chain. This * puts additional burden on bridge drivers, especially for bridges that may * be used in the middle of a chain or at the end of it. Furthermore, it * requires all operations of the &drm_connector to be handled by a single * bridge, which doesn't always match the hardware architecture. * * To simplify bridge drivers and make the connector implementation more * flexible, a new model allows bridges to unconditionally skip creation of * &drm_connector and instead expose &drm_bridge_funcs operations to support * an externally-implemented &drm_connector. Those operations are * &drm_bridge_funcs.detect, &drm_bridge_funcs.get_modes, * &drm_bridge_funcs.get_edid, &drm_bridge_funcs.hpd_notify, * &drm_bridge_funcs.hpd_enable and &drm_bridge_funcs.hpd_disable. When * implemented, display drivers shall create a &drm_connector instance for * each chain of bridges, and implement those connector instances based on * the bridge connector operations. * * Bridge drivers shall implement the connector-related operations for all * the features that the bridge hardware support. For instance, if a bridge * supports reading EDID, the &drm_bridge_funcs.get_edid shall be * implemented. This however doesn't mean that the DDC lines are wired to the * bridge on a particular platform, as they could also be connected to an I2C * controller of the SoC. Support for the connector-related operations on the * running platform is reported through the &drm_bridge.ops flags. Bridge * drivers shall detect which operations they can support on the platform * (usually this information is provided by ACPI or DT), and set the * &drm_bridge.ops flags for all supported operations. A flag shall only be * set if the corresponding &drm_bridge_funcs operation is implemented, but * an implemented operation doesn't necessarily imply that the corresponding * flag will be set. Display drivers shall use the &drm_bridge.ops flags to * decide which bridge to delegate a connector operation to. This mechanism * allows providing a single static const &drm_bridge_funcs instance in * bridge drivers, improving security by storing function pointers in * read-only memory. * * In order to ease transition, bridge drivers may support both the old and * new models by making connector creation optional and implementing the * connected-related bridge operations. Connector creation is then controlled * by the flags argument to the drm_bridge_attach() function. Display drivers * that support the new model and create connectors themselves shall set the * %DRM_BRIDGE_ATTACH_NO_CONNECTOR flag, and bridge drivers shall then skip * connector creation. 
For intermediate bridges in the chain, the flag shall * be passed to the drm_bridge_attach() call for the downstream bridge. * Bridge drivers that implement the new model only shall return an error * from their &drm_bridge_funcs.attach handler when the * %DRM_BRIDGE_ATTACH_NO_CONNECTOR flag is not set. New display drivers * should use the new model, and convert the bridge drivers they use if * needed, in order to gradually transition to the new model. */ /** * drm_bridge_chain_mode_valid - validate the mode against all bridges in the * encoder chain. * @bridge: bridge control structure * @info: display info against which the mode shall be validated * @mode: desired mode to be validated * * Calls &drm_bridge_funcs.mode_valid for all the bridges in the encoder * chain, starting from the first bridge to the last. If at least one bridge * does not accept the mode the function returns the error code. * * Note: the bridge passed should be the one closest to the encoder. * * RETURNS: * MODE_OK on success, drm_mode_status Enum error code on failure */ enum drm_mode_status drm_bridge_chain_mode_valid(struct drm_bridge *bridge, const struct drm_display_info *info, const struct drm_display_mode *mode) { struct drm_encoder *encoder; if (!bridge) return MODE_OK; encoder = bridge->encoder; list_for_each_entry_from(bridge, &encoder->bridge_chain, chain_node) { enum drm_mode_status ret; if (!bridge->funcs->mode_valid) continue; ret = bridge->funcs->mode_valid(bridge, info, mode); if (ret != MODE_OK) return ret; } return MODE_OK; } EXPORT_SYMBOL(drm_bridge_chain_mode_valid); /** * drm_bridge_chain_mode_set - set proposed mode for all bridges in the * encoder chain * @bridge: bridge control structure * @mode: desired mode to be set for the encoder chain * @adjusted_mode: updated mode that works for this encoder chain * * Calls &drm_bridge_funcs.mode_set op for all the bridges in the * encoder chain, starting from the first bridge to the last. * * Note: the bridge passed should be the one closest to the encoder */ void drm_bridge_chain_mode_set(struct drm_bridge *bridge, const struct drm_display_mode *mode, const struct drm_display_mode *adjusted_mode) { struct drm_encoder *encoder; if (!bridge) return; encoder = bridge->encoder; list_for_each_entry_from(bridge, &encoder->bridge_chain, chain_node) { if (bridge->funcs->mode_set) bridge->funcs->mode_set(bridge, mode, adjusted_mode); } } EXPORT_SYMBOL(drm_bridge_chain_mode_set); /** * drm_atomic_bridge_chain_disable - disables all bridges in the encoder chain * @bridge: bridge control structure * @old_state: old atomic state * * Calls &drm_bridge_funcs.atomic_disable (falls back on * &drm_bridge_funcs.disable) op for all the bridges in the encoder chain, * starting from the last bridge to the first. 
These are called before calling * &drm_encoder_helper_funcs.atomic_disable * * Note: the bridge passed should be the one closest to the encoder */ void drm_atomic_bridge_chain_disable(struct drm_bridge *bridge, struct drm_atomic_state *old_state) { struct drm_encoder *encoder; struct drm_bridge *iter; if (!bridge) return; encoder = bridge->encoder; list_for_each_entry_reverse(iter, &encoder->bridge_chain, chain_node) { if (iter->funcs->atomic_disable) { struct drm_bridge_state *old_bridge_state; old_bridge_state = drm_atomic_get_old_bridge_state(old_state, iter); if (WARN_ON(!old_bridge_state)) return; iter->funcs->atomic_disable(iter, old_bridge_state); } else if (iter->funcs->disable) { iter->funcs->disable(iter); } if (iter == bridge) break; } } EXPORT_SYMBOL(drm_atomic_bridge_chain_disable); static void drm_atomic_bridge_call_post_disable(struct drm_bridge *bridge, struct drm_atomic_state *old_state) { if (old_state && bridge->funcs->atomic_post_disable) { struct drm_bridge_state *old_bridge_state; old_bridge_state = drm_atomic_get_old_bridge_state(old_state, bridge); if (WARN_ON(!old_bridge_state)) return; bridge->funcs->atomic_post_disable(bridge, old_bridge_state); } else if (bridge->funcs->post_disable) { bridge->funcs->post_disable(bridge); } } /** * drm_atomic_bridge_chain_post_disable - cleans up after disabling all bridges * in the encoder chain * @bridge: bridge control structure * @old_state: old atomic state * * Calls &drm_bridge_funcs.atomic_post_disable (falls back on * &drm_bridge_funcs.post_disable) op for all the bridges in the encoder chain, * starting from the first bridge to the last. These are called after completing * &drm_encoder_helper_funcs.atomic_disable * * If a bridge sets @pre_enable_prev_first, then the @post_disable for that * bridge will be called before the previous one to reverse the @pre_enable * calling direction. * * Example: * Bridge A ---> Bridge B ---> Bridge C ---> Bridge D ---> Bridge E * * With pre_enable_prev_first flag enable in Bridge B, D, E then the resulting * @post_disable order would be, * Bridge B, Bridge A, Bridge E, Bridge D, Bridge C. 
* * Note: the bridge passed should be the one closest to the encoder */ void drm_atomic_bridge_chain_post_disable(struct drm_bridge *bridge, struct drm_atomic_state *old_state) { struct drm_encoder *encoder; struct drm_bridge *next, *limit; if (!bridge) return; encoder = bridge->encoder; list_for_each_entry_from(bridge, &encoder->bridge_chain, chain_node) { limit = NULL; if (!list_is_last(&bridge->chain_node, &encoder->bridge_chain)) { next = list_next_entry(bridge, chain_node); if (next->pre_enable_prev_first) { /* next bridge had requested that prev * was enabled first, so disabled last */ limit = next; /* Find the next bridge that has NOT requested * prev to be enabled first / disabled last */ list_for_each_entry_from(next, &encoder->bridge_chain, chain_node) { if (!next->pre_enable_prev_first) { next = list_prev_entry(next, chain_node); limit = next; break; } if (list_is_last(&next->chain_node, &encoder->bridge_chain)) { limit = next; break; } } /* Call these bridges in reverse order */ list_for_each_entry_from_reverse(next, &encoder->bridge_chain, chain_node) { if (next == bridge) break; drm_atomic_bridge_call_post_disable(next, old_state); } } } drm_atomic_bridge_call_post_disable(bridge, old_state); if (limit) /* Jump all bridges that we have already post_disabled */ bridge = limit; } } EXPORT_SYMBOL(drm_atomic_bridge_chain_post_disable); static void drm_atomic_bridge_call_pre_enable(struct drm_bridge *bridge, struct drm_atomic_state *old_state) { if (old_state && bridge->funcs->atomic_pre_enable) { struct drm_bridge_state *old_bridge_state; old_bridge_state = drm_atomic_get_old_bridge_state(old_state, bridge); if (WARN_ON(!old_bridge_state)) return; bridge->funcs->atomic_pre_enable(bridge, old_bridge_state); } else if (bridge->funcs->pre_enable) { bridge->funcs->pre_enable(bridge); } } /** * drm_atomic_bridge_chain_pre_enable - prepares for enabling all bridges in * the encoder chain * @bridge: bridge control structure * @old_state: old atomic state * * Calls &drm_bridge_funcs.atomic_pre_enable (falls back on * &drm_bridge_funcs.pre_enable) op for all the bridges in the encoder chain, * starting from the last bridge to the first. These are called before calling * &drm_encoder_helper_funcs.atomic_enable * * If a bridge sets @pre_enable_prev_first, then the pre_enable for the * prev bridge will be called before pre_enable of this bridge. * * Example: * Bridge A ---> Bridge B ---> Bridge C ---> Bridge D ---> Bridge E * * With pre_enable_prev_first flag enable in Bridge B, D, E then the resulting * @pre_enable order would be, * Bridge C, Bridge D, Bridge E, Bridge A, Bridge B. * * Note: the bridge passed should be the one closest to the encoder */ void drm_atomic_bridge_chain_pre_enable(struct drm_bridge *bridge, struct drm_atomic_state *old_state) { struct drm_encoder *encoder; struct drm_bridge *iter, *next, *limit; if (!bridge) return; encoder = bridge->encoder; list_for_each_entry_reverse(iter, &encoder->bridge_chain, chain_node) { if (iter->pre_enable_prev_first) { next = iter; limit = bridge; list_for_each_entry_from_reverse(next, &encoder->bridge_chain, chain_node) { if (next == bridge) break; if (!next->pre_enable_prev_first) { /* Found first bridge that does NOT * request prev to be enabled first */ limit = next; break; } } list_for_each_entry_from(next, &encoder->bridge_chain, chain_node) { /* Call requested prev bridge pre_enable * in order. */ if (next == iter) /* At the first bridge to request prev * bridges called first. 
*/ break; drm_atomic_bridge_call_pre_enable(next, old_state); } } drm_atomic_bridge_call_pre_enable(iter, old_state); if (iter->pre_enable_prev_first) /* Jump all bridges that we have already pre_enabled */ iter = limit; if (iter == bridge) break; } } EXPORT_SYMBOL(drm_atomic_bridge_chain_pre_enable); /** * drm_atomic_bridge_chain_enable - enables all bridges in the encoder chain * @bridge: bridge control structure * @old_state: old atomic state * * Calls &drm_bridge_funcs.atomic_enable (falls back on * &drm_bridge_funcs.enable) op for all the bridges in the encoder chain, * starting from the first bridge to the last. These are called after completing * &drm_encoder_helper_funcs.atomic_enable * * Note: the bridge passed should be the one closest to the encoder */ void drm_atomic_bridge_chain_enable(struct drm_bridge *bridge, struct drm_atomic_state *old_state) { struct drm_encoder *encoder; if (!bridge) return; encoder = bridge->encoder; list_for_each_entry_from(bridge, &encoder->bridge_chain, chain_node) { if (bridge->funcs->atomic_enable) { struct drm_bridge_state *old_bridge_state; old_bridge_state = drm_atomic_get_old_bridge_state(old_state, bridge); if (WARN_ON(!old_bridge_state)) return; bridge->funcs->atomic_enable(bridge, old_bridge_state); } else if (bridge->funcs->enable) { bridge->funcs->enable(bridge); } } } EXPORT_SYMBOL(drm_atomic_bridge_chain_enable); static int drm_atomic_bridge_check(struct drm_bridge *bridge, struct drm_crtc_state *crtc_state, struct drm_connector_state *conn_state) { if (bridge->funcs->atomic_check) { struct drm_bridge_state *bridge_state; int ret; bridge_state = drm_atomic_get_new_bridge_state(crtc_state->state, bridge); if (WARN_ON(!bridge_state)) return -EINVAL; ret = bridge->funcs->atomic_check(bridge, bridge_state, crtc_state, conn_state); if (ret) return ret; } else if (bridge->funcs->mode_fixup) { if (!bridge->funcs->mode_fixup(bridge, &crtc_state->mode, &crtc_state->adjusted_mode)) return -EINVAL; } return 0; } static int select_bus_fmt_recursive(struct drm_bridge *first_bridge, struct drm_bridge *cur_bridge, struct drm_crtc_state *crtc_state, struct drm_connector_state *conn_state, u32 out_bus_fmt) { unsigned int i, num_in_bus_fmts = 0; struct drm_bridge_state *cur_state; struct drm_bridge *prev_bridge; u32 *in_bus_fmts; int ret; prev_bridge = drm_bridge_get_prev_bridge(cur_bridge); cur_state = drm_atomic_get_new_bridge_state(crtc_state->state, cur_bridge); /* * If bus format negotiation is not supported by this bridge, let's * pass MEDIA_BUS_FMT_FIXED to the previous bridge in the chain and * hope that it can handle this situation gracefully (by providing * appropriate default values). */ if (!cur_bridge->funcs->atomic_get_input_bus_fmts) { if (cur_bridge != first_bridge) { ret = select_bus_fmt_recursive(first_bridge, prev_bridge, crtc_state, conn_state, MEDIA_BUS_FMT_FIXED); if (ret) return ret; } /* * Driver does not implement the atomic state hooks, but that's * fine, as long as it does not access the bridge state. */ if (cur_state) { cur_state->input_bus_cfg.format = MEDIA_BUS_FMT_FIXED; cur_state->output_bus_cfg.format = out_bus_fmt; } return 0; } /* * If the driver implements ->atomic_get_input_bus_fmts() it * should also implement the atomic state hooks. 
*/ if (WARN_ON(!cur_state)) return -EINVAL; in_bus_fmts = cur_bridge->funcs->atomic_get_input_bus_fmts(cur_bridge, cur_state, crtc_state, conn_state, out_bus_fmt, &num_in_bus_fmts); if (!num_in_bus_fmts) return -ENOTSUPP; else if (!in_bus_fmts) return -ENOMEM; if (first_bridge == cur_bridge) { cur_state->input_bus_cfg.format = in_bus_fmts[0]; cur_state->output_bus_cfg.format = out_bus_fmt; kfree(in_bus_fmts); return 0; } for (i = 0; i < num_in_bus_fmts; i++) { ret = select_bus_fmt_recursive(first_bridge, prev_bridge, crtc_state, conn_state, in_bus_fmts[i]); if (ret != -ENOTSUPP) break; } if (!ret) { cur_state->input_bus_cfg.format = in_bus_fmts[i]; cur_state->output_bus_cfg.format = out_bus_fmt; } kfree(in_bus_fmts); return ret; } /* * This function is called by &drm_atomic_bridge_chain_check() just before * calling &drm_bridge_funcs.atomic_check() on all elements of the chain. * It performs bus format negotiation between bridge elements. The negotiation * happens in reverse order, starting from the last element in the chain up to * @bridge. * * Negotiation starts by retrieving supported output bus formats on the last * bridge element and testing them one by one. The test is recursive, meaning * that for each tested output format, the whole chain will be walked backward, * and each element will have to choose an input bus format that can be * transcoded to the requested output format. When a bridge element does not * support transcoding into a specific output format -ENOTSUPP is returned and * the next bridge element will have to try a different format. If none of the * combinations worked, -ENOTSUPP is returned and the atomic modeset will fail. * * This implementation is relying on * &drm_bridge_funcs.atomic_get_output_bus_fmts() and * &drm_bridge_funcs.atomic_get_input_bus_fmts() to gather supported * input/output formats. * * When &drm_bridge_funcs.atomic_get_output_bus_fmts() is not implemented by * the last element of the chain, &drm_atomic_bridge_chain_select_bus_fmts() * tries a single format: &drm_connector.display_info.bus_formats[0] if * available, MEDIA_BUS_FMT_FIXED otherwise. * * When &drm_bridge_funcs.atomic_get_input_bus_fmts() is not implemented, * &drm_atomic_bridge_chain_select_bus_fmts() skips the negotiation on the * bridge element that lacks this hook and asks the previous element in the * chain to try MEDIA_BUS_FMT_FIXED. It's up to bridge drivers to decide what * to do in that case (fail if they want to enforce bus format negotiation, or * provide a reasonable default if they need to support pipelines where not * all elements support bus format negotiation). */ static int drm_atomic_bridge_chain_select_bus_fmts(struct drm_bridge *bridge, struct drm_crtc_state *crtc_state, struct drm_connector_state *conn_state) { struct drm_connector *conn = conn_state->connector; struct drm_encoder *encoder = bridge->encoder; struct drm_bridge_state *last_bridge_state; unsigned int i, num_out_bus_fmts = 0; struct drm_bridge *last_bridge; u32 *out_bus_fmts; int ret = 0; last_bridge = list_last_entry(&encoder->bridge_chain, struct drm_bridge, chain_node); last_bridge_state = drm_atomic_get_new_bridge_state(crtc_state->state, last_bridge); if (last_bridge->funcs->atomic_get_output_bus_fmts) { const struct drm_bridge_funcs *funcs = last_bridge->funcs; /* * If the driver implements ->atomic_get_output_bus_fmts() it * should also implement the atomic state hooks. 
*/ if (WARN_ON(!last_bridge_state)) return -EINVAL; out_bus_fmts = funcs->atomic_get_output_bus_fmts(last_bridge, last_bridge_state, crtc_state, conn_state, &num_out_bus_fmts); if (!num_out_bus_fmts) return -ENOTSUPP; else if (!out_bus_fmts) return -ENOMEM; } else { num_out_bus_fmts = 1; out_bus_fmts = kmalloc(sizeof(*out_bus_fmts), GFP_KERNEL); if (!out_bus_fmts) return -ENOMEM; if (conn->display_info.num_bus_formats && conn->display_info.bus_formats) out_bus_fmts[0] = conn->display_info.bus_formats[0]; else out_bus_fmts[0] = MEDIA_BUS_FMT_FIXED; } for (i = 0; i < num_out_bus_fmts; i++) { ret = select_bus_fmt_recursive(bridge, last_bridge, crtc_state, conn_state, out_bus_fmts[i]); if (ret != -ENOTSUPP) break; } kfree(out_bus_fmts); return ret; } static void drm_atomic_bridge_propagate_bus_flags(struct drm_bridge *bridge, struct drm_connector *conn, struct drm_atomic_state *state) { struct drm_bridge_state *bridge_state, *next_bridge_state; struct drm_bridge *next_bridge; u32 output_flags = 0; bridge_state = drm_atomic_get_new_bridge_state(state, bridge); /* No bridge state attached to this bridge => nothing to propagate. */ if (!bridge_state) return; next_bridge = drm_bridge_get_next_bridge(bridge); /* * Let's try to apply the most common case here, that is, propagate * display_info flags for the last bridge, and propagate the input * flags of the next bridge element to the output end of the current * bridge when the bridge is not the last one. * There are exceptions to this rule, like when signal inversion is * happening at the board level, but that's something drivers can deal * with from their &drm_bridge_funcs.atomic_check() implementation by * simply overriding the flags value we've set here. */ if (!next_bridge) { output_flags = conn->display_info.bus_flags; } else { next_bridge_state = drm_atomic_get_new_bridge_state(state, next_bridge); /* * No bridge state attached to the next bridge, just leave the * flags to 0. */ if (next_bridge_state) output_flags = next_bridge_state->input_bus_cfg.flags; } bridge_state->output_bus_cfg.flags = output_flags; /* * Propagate the output flags to the input end of the bridge. Again, it's * not necessarily what all bridges want, but that's what most of them * do, and by doing that by default we avoid forcing drivers to * duplicate the "dummy propagation" logic. */ bridge_state->input_bus_cfg.flags = output_flags; } /** * drm_atomic_bridge_chain_check() - Do an atomic check on the bridge chain * @bridge: bridge control structure * @crtc_state: new CRTC state * @conn_state: new connector state * * First trigger a bus format negotiation before calling * &drm_bridge_funcs.atomic_check() (falls back on * &drm_bridge_funcs.mode_fixup()) op for all the bridges in the encoder chain, * starting from the last bridge to the first. These are called before calling * &drm_encoder_helper_funcs.atomic_check() * * RETURNS: * 0 on success, a negative error code on failure */ int drm_atomic_bridge_chain_check(struct drm_bridge *bridge, struct drm_crtc_state *crtc_state, struct drm_connector_state *conn_state) { struct drm_connector *conn = conn_state->connector; struct drm_encoder *encoder; struct drm_bridge *iter; int ret; if (!bridge) return 0; ret = drm_atomic_bridge_chain_select_bus_fmts(bridge, crtc_state, conn_state); if (ret) return ret; encoder = bridge->encoder; list_for_each_entry_reverse(iter, &encoder->bridge_chain, chain_node) { int ret; /* * Bus flags are propagated by default. 
If a bridge needs to * tweak the input bus flags for any reason, it should happen * in its &drm_bridge_funcs.atomic_check() implementation such * that preceding bridges in the chain can propagate the new * bus flags. */ drm_atomic_bridge_propagate_bus_flags(iter, conn, crtc_state->state); ret = drm_atomic_bridge_check(iter, crtc_state, conn_state); if (ret) return ret; if (iter == bridge) break; } return 0; } EXPORT_SYMBOL(drm_atomic_bridge_chain_check); /** * drm_bridge_detect - check if anything is attached to the bridge output * @bridge: bridge control structure * * If the bridge supports output detection, as reported by the * DRM_BRIDGE_OP_DETECT bridge ops flag, call &drm_bridge_funcs.detect for the * bridge and return the connection status. Otherwise return * connector_status_unknown. * * RETURNS: * The detection status on success, or connector_status_unknown if the bridge * doesn't support output detection. */ enum drm_connector_status drm_bridge_detect(struct drm_bridge *bridge) { if (!(bridge->ops & DRM_BRIDGE_OP_DETECT)) return connector_status_unknown; return bridge->funcs->detect(bridge); } EXPORT_SYMBOL_GPL(drm_bridge_detect); /** * drm_bridge_get_modes - fill all modes currently valid for the sink into the * @connector * @bridge: bridge control structure * @connector: the connector to fill with modes * * If the bridge supports output modes retrieval, as reported by the * DRM_BRIDGE_OP_MODES bridge ops flag, call &drm_bridge_funcs.get_modes to * fill the connector with all valid modes and return the number of modes * added. Otherwise return 0. * * RETURNS: * The number of modes added to the connector. */ int drm_bridge_get_modes(struct drm_bridge *bridge, struct drm_connector *connector) { if (!(bridge->ops & DRM_BRIDGE_OP_MODES)) return 0; return bridge->funcs->get_modes(bridge, connector); } EXPORT_SYMBOL_GPL(drm_bridge_get_modes); /** * drm_bridge_edid_read - read the EDID data of the connected display * @bridge: bridge control structure * @connector: the connector to read EDID for * * If the bridge supports output EDID retrieval, as reported by the * DRM_BRIDGE_OP_EDID bridge ops flag, call &drm_bridge_funcs.edid_read to get * the EDID and return it. Otherwise return NULL. * * RETURNS: * The retrieved EDID on success, or NULL otherwise. */ const struct drm_edid *drm_bridge_edid_read(struct drm_bridge *bridge, struct drm_connector *connector) { if (!(bridge->ops & DRM_BRIDGE_OP_EDID)) return NULL; return bridge->funcs->edid_read(bridge, connector); } EXPORT_SYMBOL_GPL(drm_bridge_edid_read); /** * drm_bridge_hpd_enable - enable hot plug detection for the bridge * @bridge: bridge control structure * @cb: hot-plug detection callback * @data: data to be passed to the hot-plug detection callback * * Call &drm_bridge_funcs.hpd_enable if implemented and register the given @cb * and @data as hot plug notification callback. From now on the @cb will be * called with @data when an output status change is detected by the bridge, * until hot plug notification gets disabled with drm_bridge_hpd_disable(). * * Hot plug detection is supported only if the DRM_BRIDGE_OP_HPD flag is set in * bridge->ops. This function shall not be called when the flag is not set. * * Only one hot plug detection callback can be registered at a time, it is an * error to call this function when hot plug detection is already enabled for * the bridge. 
*/ void drm_bridge_hpd_enable(struct drm_bridge *bridge, void (*cb)(void *data, enum drm_connector_status status), void *data) { if (!(bridge->ops & DRM_BRIDGE_OP_HPD)) return; mutex_lock(&bridge->hpd_mutex); if (WARN(bridge->hpd_cb, "Hot plug detection already enabled\n")) goto unlock; bridge->hpd_cb = cb; bridge->hpd_data = data; if (bridge->funcs->hpd_enable) bridge->funcs->hpd_enable(bridge); unlock: mutex_unlock(&bridge->hpd_mutex); } EXPORT_SYMBOL_GPL(drm_bridge_hpd_enable); /** * drm_bridge_hpd_disable - disable hot plug detection for the bridge * @bridge: bridge control structure * * Call &drm_bridge_funcs.hpd_disable if implemented and unregister the hot * plug detection callback previously registered with drm_bridge_hpd_enable(). * Once this function returns the callback will not be called by the bridge * when an output status change occurs. * * Hot plug detection is supported only if the DRM_BRIDGE_OP_HPD flag is set in * bridge->ops. This function shall not be called when the flag is not set. */ void drm_bridge_hpd_disable(struct drm_bridge *bridge) { if (!(bridge->ops & DRM_BRIDGE_OP_HPD)) return; mutex_lock(&bridge->hpd_mutex); if (bridge->funcs->hpd_disable) bridge->funcs->hpd_disable(bridge); bridge->hpd_cb = NULL; bridge->hpd_data = NULL; mutex_unlock(&bridge->hpd_mutex); } EXPORT_SYMBOL_GPL(drm_bridge_hpd_disable); /** * drm_bridge_hpd_notify - notify hot plug detection events * @bridge: bridge control structure * @status: output connection status * * Bridge drivers shall call this function to report hot plug events when they * detect a change in the output status, when hot plug detection has been * enabled by drm_bridge_hpd_enable(). * * This function shall be called in a context that can sleep. */ void drm_bridge_hpd_notify(struct drm_bridge *bridge, enum drm_connector_status status) { mutex_lock(&bridge->hpd_mutex); if (bridge->hpd_cb) bridge->hpd_cb(bridge->hpd_data, status); mutex_unlock(&bridge->hpd_mutex); } EXPORT_SYMBOL_GPL(drm_bridge_hpd_notify); #ifdef CONFIG_OF /** * of_drm_find_bridge - find the bridge corresponding to the device node in * the global bridge list * * @np: device node * * RETURNS: * drm_bridge control struct on success, NULL on failure */ struct drm_bridge *of_drm_find_bridge(struct device_node *np) { struct drm_bridge *bridge; mutex_lock(&bridge_lock); list_for_each_entry(bridge, &bridge_list, list) { if (bridge->of_node == np) { mutex_unlock(&bridge_lock); return bridge; } } mutex_unlock(&bridge_lock); return NULL; } EXPORT_SYMBOL(of_drm_find_bridge); #endif MODULE_AUTHOR("Ajay Kumar <ajaykumar.rs@samsung.com>"); MODULE_DESCRIPTION("DRM bridge infrastructure"); MODULE_LICENSE("GPL and additional rights"); |
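/*
 * Illustrative sketch (not part of the original file): how a display driver
 * might use the "new" connector model described in the DOC: bridge operations
 * comment above. The driver attaches the bridge chain with
 * DRM_BRIDGE_ATTACH_NO_CONNECTOR and creates the connector itself. The
 * function name is hypothetical, error unwinding is abbreviated, and
 * drm_bridge_connector_init() is assumed to be available (it is provided by
 * drm_bridge_connector.c in recent kernels).
 */
static int example_bind_bridge_chain(struct drm_device *drm,
				     struct drm_encoder *encoder,
				     struct drm_bridge *bridge)
{
	struct drm_connector *connector;
	int ret;

	/* Link the bridge at the encoder output; no bridge-created connector. */
	ret = drm_bridge_attach(encoder, bridge, NULL,
				DRM_BRIDGE_ATTACH_NO_CONNECTOR);
	if (ret)
		return ret;

	/* One connector for the whole chain, backed by the bridge ops flags. */
	connector = drm_bridge_connector_init(drm, encoder);
	if (IS_ERR(connector))
		return PTR_ERR(connector);

	return drm_connector_attach_encoder(connector, encoder);
}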
/* SPDX-License-Identifier: GPL-2.0 */
#undef TRACE_SYSTEM
#define TRACE_INCLUDE_PATH ../../drivers/dma-buf
#define TRACE_SYSTEM sync_trace

#if !defined(_TRACE_SYNC_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_SYNC_H

#include "sync_debug.h"
#include <linux/tracepoint.h>

TRACE_EVENT(sync_timeline,
	TP_PROTO(struct sync_timeline *timeline),

	TP_ARGS(timeline),

	TP_STRUCT__entry(
			__string(name, timeline->name)
			__field(u32, value)
	),

	TP_fast_assign(
			__assign_str(name);
			__entry->value = timeline->value;
	),

	TP_printk("name=%s value=%d", __get_str(name), __entry->value)
);

#endif /* if !defined(_TRACE_SYNC_H) || defined(TRACE_HEADER_MULTI_READ) */

/* This part must be outside protection */
#include <trace/define_trace.h>
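/*
 * Illustrative usage sketch (not part of the original header): a TRACE_EVENT()
 * definition only becomes a callable tracepoint once a single translation unit
 * defines CREATE_TRACE_POINTS before including the header; callers then emit
 * the event with the generated trace_<name>() function. The function below is
 * hypothetical; in the kernel the include and the call sites live in
 * sync_debug.c.
 */
#define CREATE_TRACE_POINTS
#include "sync_trace.h"

static void example_note_timeline(struct sync_timeline *obj)
{
	/* Records "name=<obj->name> value=<obj->value>" in the trace buffer. */
	trace_sync_timeline(obj);
}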
// SPDX-License-Identifier: GPL-2.0
#include <linux/cpufreq.h>
#include <linux/fs.h>
#include <linux/init.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>

extern const struct seq_operations cpuinfo_op;

static int cpuinfo_open(struct inode *inode, struct file *file)
{
	return seq_open(file, &cpuinfo_op);
}

static const struct proc_ops cpuinfo_proc_ops = {
	.proc_flags	= PROC_ENTRY_PERMANENT,
	.proc_open	= cpuinfo_open,
	.proc_read_iter	= seq_read_iter,
	.proc_lseek	= seq_lseek,
	.proc_release	= seq_release,
};

static int __init proc_cpuinfo_init(void)
{
	proc_create("cpuinfo", 0, NULL, &cpuinfo_proc_ops);
	return 0;
}
fs_initcall(proc_cpuinfo_init);
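/*
 * Illustrative sketch (not part of the original file): the cpuinfo_op
 * referenced above is supplied by each architecture as a seq_operations
 * implementation that walks the CPUs and prints one record per CPU. A minimal
 * version could look roughly like this; the example_* names and the use of
 * nr_cpu_ids as the iteration bound are assumptions for illustration only.
 */
static void *example_c_start(struct seq_file *m, loff_t *pos)
{
	/* Encode "cpu + 1" as the iterator cookie so CPU 0 is not NULL. */
	return *pos < nr_cpu_ids ? (void *)(unsigned long)(*pos + 1) : NULL;
}

static void *example_c_next(struct seq_file *m, void *v, loff_t *pos)
{
	++*pos;
	return example_c_start(m, pos);
}

static void example_c_stop(struct seq_file *m, void *v)
{
}

static int example_c_show(struct seq_file *m, void *v)
{
	unsigned int cpu = (unsigned long)v - 1;

	seq_printf(m, "processor\t: %u\n\n", cpu);
	return 0;
}

const struct seq_operations cpuinfo_op = {
	.start	= example_c_start,
	.next	= example_c_next,
	.stop	= example_c_stop,
	.show	= example_c_show,
};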
// SPDX-License-Identifier: GPL-2.0-only /* * xt_conntrack - Netfilter module to match connection tracking * information. (Superset of Rusty's minimalistic state match.) * * (C) 2001 Marc Boucher (marc@mbsi.ca). * (C) 2006-2012 Patrick McHardy <kaber@trash.net> * Copyright © CC Computer Consultants GmbH, 2007 - 2008 */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include <linux/module.h> #include <linux/skbuff.h> #include <net/ipv6.h> #include <linux/netfilter/x_tables.h> #include <linux/netfilter/xt_conntrack.h> #include <net/netfilter/nf_conntrack.h> MODULE_LICENSE("GPL"); MODULE_AUTHOR("Marc Boucher <marc@mbsi.ca>"); MODULE_AUTHOR("Jan Engelhardt <jengelh@medozas.de>"); MODULE_DESCRIPTION("Xtables: connection tracking state match"); MODULE_ALIAS("ipt_conntrack"); MODULE_ALIAS("ip6t_conntrack"); static bool conntrack_addrcmp(const union nf_inet_addr *kaddr, const union nf_inet_addr *uaddr, const union nf_inet_addr *umask, unsigned int l3proto) { if (l3proto == NFPROTO_IPV4) return ((kaddr->ip ^ uaddr->ip) & umask->ip) == 0; else if (l3proto == NFPROTO_IPV6) return ipv6_masked_addr_cmp(&kaddr->in6, &umask->in6, &uaddr->in6) == 0; else return false; } static inline bool conntrack_mt_origsrc(const struct nf_conn *ct, const struct xt_conntrack_mtinfo2 *info, u_int8_t family) { return conntrack_addrcmp(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u3, &info->origsrc_addr, &info->origsrc_mask, family); } static inline bool conntrack_mt_origdst(const struct nf_conn *ct, const struct xt_conntrack_mtinfo2 *info, u_int8_t family) { return conntrack_addrcmp(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.u3, &info->origdst_addr, &info->origdst_mask, family); } static inline bool conntrack_mt_replsrc(const struct nf_conn *ct, const struct xt_conntrack_mtinfo2 *info, u_int8_t family) { return conntrack_addrcmp(&ct->tuplehash[IP_CT_DIR_REPLY].tuple.src.u3, &info->replsrc_addr, &info->replsrc_mask, family); } static inline bool conntrack_mt_repldst(const struct nf_conn *ct, const struct xt_conntrack_mtinfo2 *info, u_int8_t family) { return conntrack_addrcmp(&ct->tuplehash[IP_CT_DIR_REPLY].tuple.dst.u3, &info->repldst_addr, &info->repldst_mask, family); } static inline bool ct_proto_port_check(const struct xt_conntrack_mtinfo2 *info, const struct nf_conn *ct) { const struct nf_conntrack_tuple *tuple; tuple =
&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple; if ((info->match_flags & XT_CONNTRACK_PROTO) && (nf_ct_protonum(ct) == info->l4proto) ^ !(info->invert_flags & XT_CONNTRACK_PROTO)) return false; /* Shortcut to match all recognized protocols by using ->src.all. */ if ((info->match_flags & XT_CONNTRACK_ORIGSRC_PORT) && (tuple->src.u.all == info->origsrc_port) ^ !(info->invert_flags & XT_CONNTRACK_ORIGSRC_PORT)) return false; if ((info->match_flags & XT_CONNTRACK_ORIGDST_PORT) && (tuple->dst.u.all == info->origdst_port) ^ !(info->invert_flags & XT_CONNTRACK_ORIGDST_PORT)) return false; tuple = &ct->tuplehash[IP_CT_DIR_REPLY].tuple; if ((info->match_flags & XT_CONNTRACK_REPLSRC_PORT) && (tuple->src.u.all == info->replsrc_port) ^ !(info->invert_flags & XT_CONNTRACK_REPLSRC_PORT)) return false; if ((info->match_flags & XT_CONNTRACK_REPLDST_PORT) && (tuple->dst.u.all == info->repldst_port) ^ !(info->invert_flags & XT_CONNTRACK_REPLDST_PORT)) return false; return true; } static inline bool port_match(u16 min, u16 max, u16 port, bool invert) { return (port >= min && port <= max) ^ invert; } static inline bool ct_proto_port_check_v3(const struct xt_conntrack_mtinfo3 *info, const struct nf_conn *ct) { const struct nf_conntrack_tuple *tuple; tuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple; if ((info->match_flags & XT_CONNTRACK_PROTO) && (nf_ct_protonum(ct) == info->l4proto) ^ !(info->invert_flags & XT_CONNTRACK_PROTO)) return false; /* Shortcut to match all recognized protocols by using ->src.all. */ if ((info->match_flags & XT_CONNTRACK_ORIGSRC_PORT) && !port_match(info->origsrc_port, info->origsrc_port_high, ntohs(tuple->src.u.all), info->invert_flags & XT_CONNTRACK_ORIGSRC_PORT)) return false; if ((info->match_flags & XT_CONNTRACK_ORIGDST_PORT) && !port_match(info->origdst_port, info->origdst_port_high, ntohs(tuple->dst.u.all), info->invert_flags & XT_CONNTRACK_ORIGDST_PORT)) return false; tuple = &ct->tuplehash[IP_CT_DIR_REPLY].tuple; if ((info->match_flags & XT_CONNTRACK_REPLSRC_PORT) && !port_match(info->replsrc_port, info->replsrc_port_high, ntohs(tuple->src.u.all), info->invert_flags & XT_CONNTRACK_REPLSRC_PORT)) return false; if ((info->match_flags & XT_CONNTRACK_REPLDST_PORT) && !port_match(info->repldst_port, info->repldst_port_high, ntohs(tuple->dst.u.all), info->invert_flags & XT_CONNTRACK_REPLDST_PORT)) return false; return true; } static bool conntrack_mt(const struct sk_buff *skb, struct xt_action_param *par, u16 state_mask, u16 status_mask) { const struct xt_conntrack_mtinfo2 *info = par->matchinfo; enum ip_conntrack_info ctinfo; const struct nf_conn *ct; unsigned int statebit; ct = nf_ct_get(skb, &ctinfo); if (ct) statebit = XT_CONNTRACK_STATE_BIT(ctinfo); else if (ctinfo == IP_CT_UNTRACKED) statebit = XT_CONNTRACK_STATE_UNTRACKED; else statebit = XT_CONNTRACK_STATE_INVALID; if (info->match_flags & XT_CONNTRACK_STATE) { if (ct != NULL) { if (test_bit(IPS_SRC_NAT_BIT, &ct->status)) statebit |= XT_CONNTRACK_STATE_SNAT; if (test_bit(IPS_DST_NAT_BIT, &ct->status)) statebit |= XT_CONNTRACK_STATE_DNAT; } if (!!(state_mask & statebit) ^ !(info->invert_flags & XT_CONNTRACK_STATE)) return false; } if (ct == NULL) return info->match_flags & XT_CONNTRACK_STATE; if ((info->match_flags & XT_CONNTRACK_DIRECTION) && (CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) ^ !(info->invert_flags & XT_CONNTRACK_DIRECTION)) return false; if (info->match_flags & XT_CONNTRACK_ORIGSRC) if (conntrack_mt_origsrc(ct, info, xt_family(par)) ^ !(info->invert_flags & XT_CONNTRACK_ORIGSRC)) return false; if (info->match_flags & 
XT_CONNTRACK_ORIGDST) if (conntrack_mt_origdst(ct, info, xt_family(par)) ^ !(info->invert_flags & XT_CONNTRACK_ORIGDST)) return false; if (info->match_flags & XT_CONNTRACK_REPLSRC) if (conntrack_mt_replsrc(ct, info, xt_family(par)) ^ !(info->invert_flags & XT_CONNTRACK_REPLSRC)) return false; if (info->match_flags & XT_CONNTRACK_REPLDST) if (conntrack_mt_repldst(ct, info, xt_family(par)) ^ !(info->invert_flags & XT_CONNTRACK_REPLDST)) return false; if (par->match->revision != 3) { if (!ct_proto_port_check(info, ct)) return false; } else { if (!ct_proto_port_check_v3(par->matchinfo, ct)) return false; } if ((info->match_flags & XT_CONNTRACK_STATUS) && (!!(status_mask & ct->status) ^ !(info->invert_flags & XT_CONNTRACK_STATUS))) return false; if (info->match_flags & XT_CONNTRACK_EXPIRES) { unsigned long expires = nf_ct_expires(ct) / HZ; if ((expires >= info->expires_min && expires <= info->expires_max) ^ !(info->invert_flags & XT_CONNTRACK_EXPIRES)) return false; } return true; } static bool conntrack_mt_v1(const struct sk_buff *skb, struct xt_action_param *par) { const struct xt_conntrack_mtinfo1 *info = par->matchinfo; return conntrack_mt(skb, par, info->state_mask, info->status_mask); } static bool conntrack_mt_v2(const struct sk_buff *skb, struct xt_action_param *par) { const struct xt_conntrack_mtinfo2 *info = par->matchinfo; return conntrack_mt(skb, par, info->state_mask, info->status_mask); } static bool conntrack_mt_v3(const struct sk_buff *skb, struct xt_action_param *par) { const struct xt_conntrack_mtinfo3 *info = par->matchinfo; return conntrack_mt(skb, par, info->state_mask, info->status_mask); } static int conntrack_mt_check(const struct xt_mtchk_param *par) { int ret; ret = nf_ct_netns_get(par->net, par->family); if (ret < 0) pr_info_ratelimited("cannot load conntrack support for proto=%u\n", par->family); return ret; } static void conntrack_mt_destroy(const struct xt_mtdtor_param *par) { nf_ct_netns_put(par->net, par->family); } static struct xt_match conntrack_mt_reg[] __read_mostly = { { .name = "conntrack", .revision = 1, .family = NFPROTO_UNSPEC, .matchsize = sizeof(struct xt_conntrack_mtinfo1), .match = conntrack_mt_v1, .checkentry = conntrack_mt_check, .destroy = conntrack_mt_destroy, .me = THIS_MODULE, }, { .name = "conntrack", .revision = 2, .family = NFPROTO_UNSPEC, .matchsize = sizeof(struct xt_conntrack_mtinfo2), .match = conntrack_mt_v2, .checkentry = conntrack_mt_check, .destroy = conntrack_mt_destroy, .me = THIS_MODULE, }, { .name = "conntrack", .revision = 3, .family = NFPROTO_UNSPEC, .matchsize = sizeof(struct xt_conntrack_mtinfo3), .match = conntrack_mt_v3, .checkentry = conntrack_mt_check, .destroy = conntrack_mt_destroy, .me = THIS_MODULE, }, }; static int __init conntrack_mt_init(void) { return xt_register_matches(conntrack_mt_reg, ARRAY_SIZE(conntrack_mt_reg)); } static void __exit conntrack_mt_exit(void) { xt_unregister_matches(conntrack_mt_reg, ARRAY_SIZE(conntrack_mt_reg)); } module_init(conntrack_mt_init); module_exit(conntrack_mt_exit); |
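/*
 * Illustrative note (not part of the original module): the match functions
 * above repeatedly use the pattern
 *
 *	if ((info->match_flags & F) &&
 *	    (cond) ^ !(info->invert_flags & F))
 *		return false;
 *
 * i.e. when criterion F is selected, the rule fails unless "cond" agrees with
 * the requested (possibly inverted) sense. The hypothetical helper below
 * spells out the same truth table.
 */
static bool example_criterion_ok(bool cond, bool selected, bool inverted)
{
	if (!selected)
		return true;		/* criterion not requested: always passes */

	/* Non-inverted: require cond; inverted ("!"): require !cond. */
	return cond != inverted;
}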
// SPDX-License-Identifier: GPL-2.0 /* * Kernel timekeeping code and accessor functions. Based on code from * timer.c, moved in commit 8524070b7982. */ #include <linux/timekeeper_internal.h> #include <linux/module.h> #include <linux/interrupt.h> #include <linux/percpu.h> #include <linux/init.h> #include <linux/mm.h> #include <linux/nmi.h> #include <linux/sched.h> #include <linux/sched/loadavg.h> #include <linux/sched/clock.h> #include <linux/syscore_ops.h> #include <linux/clocksource.h> #include <linux/jiffies.h> #include <linux/time.h> #include <linux/timex.h> #include <linux/tick.h> #include <linux/stop_machine.h> #include <linux/pvclock_gtod.h> #include <linux/compiler.h> #include <linux/audit.h> #include <linux/random.h> #include "tick-internal.h" #include "ntp_internal.h" #include "timekeeping_internal.h" #define TK_CLEAR_NTP (1 << 0) #define TK_CLOCK_WAS_SET (1 << 1) #define TK_UPDATE_ALL (TK_CLEAR_NTP | TK_CLOCK_WAS_SET) enum timekeeping_adv_mode { /* Update timekeeper when a tick has passed */ TK_ADV_TICK, /* Update timekeeper on a direct frequency change */ TK_ADV_FREQ }; /* * The most important data for readout fits into a single 64 byte * cache line. */ struct tk_data { seqcount_raw_spinlock_t seq; struct timekeeper timekeeper; struct timekeeper shadow_timekeeper; raw_spinlock_t lock; } ____cacheline_aligned; static struct tk_data tk_core; /* flag for if timekeeping is suspended */ int __read_mostly timekeeping_suspended; /** * struct tk_fast - NMI safe timekeeper * @seq: Sequence counter for protecting updates. The lowest bit * is the index for the tk_read_base array * @base: tk_read_base array. Access is indexed by the lowest bit of * @seq. * * See @update_fast_timekeeper() below. */ struct tk_fast { seqcount_latch_t seq; struct tk_read_base base[2]; }; /* Suspend-time cycles value for halted fast timekeeper.
*/ static u64 cycles_at_suspend; static u64 dummy_clock_read(struct clocksource *cs) { if (timekeeping_suspended) return cycles_at_suspend; return local_clock(); } static struct clocksource dummy_clock = { .read = dummy_clock_read, }; /* * Boot time initialization which allows local_clock() to be utilized * during early boot when clocksources are not available. local_clock() * returns nanoseconds already so no conversion is required, hence mult=1 * and shift=0. When the first proper clocksource is installed then * the fast time keepers are updated with the correct values. */ #define FAST_TK_INIT \ { \ .clock = &dummy_clock, \ .mask = CLOCKSOURCE_MASK(64), \ .mult = 1, \ .shift = 0, \ } static struct tk_fast tk_fast_mono ____cacheline_aligned = { .seq = SEQCNT_LATCH_ZERO(tk_fast_mono.seq), .base[0] = FAST_TK_INIT, .base[1] = FAST_TK_INIT, }; static struct tk_fast tk_fast_raw ____cacheline_aligned = { .seq = SEQCNT_LATCH_ZERO(tk_fast_raw.seq), .base[0] = FAST_TK_INIT, .base[1] = FAST_TK_INIT, }; unsigned long timekeeper_lock_irqsave(void) { unsigned long flags; raw_spin_lock_irqsave(&tk_core.lock, flags); return flags; } void timekeeper_unlock_irqrestore(unsigned long flags) { raw_spin_unlock_irqrestore(&tk_core.lock, flags); } /* * Multigrain timestamps require tracking the latest fine-grained timestamp * that has been issued, and never returning a coarse-grained timestamp that is * earlier than that value. * * mg_floor represents the latest fine-grained time that has been handed out as * a file timestamp on the system. This is tracked as a monotonic ktime_t, and * converted to a realtime clock value on an as-needed basis. * * Maintaining mg_floor ensures the multigrain interfaces never issue a * timestamp earlier than one that has been previously issued. * * The exception to this rule is when there is a backward realtime clock jump. If * such an event occurs, a timestamp can appear to be earlier than a previous one. 
*/ static __cacheline_aligned_in_smp atomic64_t mg_floor; static inline void tk_normalize_xtime(struct timekeeper *tk) { while (tk->tkr_mono.xtime_nsec >= ((u64)NSEC_PER_SEC << tk->tkr_mono.shift)) { tk->tkr_mono.xtime_nsec -= (u64)NSEC_PER_SEC << tk->tkr_mono.shift; tk->xtime_sec++; } while (tk->tkr_raw.xtime_nsec >= ((u64)NSEC_PER_SEC << tk->tkr_raw.shift)) { tk->tkr_raw.xtime_nsec -= (u64)NSEC_PER_SEC << tk->tkr_raw.shift; tk->raw_sec++; } } static inline struct timespec64 tk_xtime(const struct timekeeper *tk) { struct timespec64 ts; ts.tv_sec = tk->xtime_sec; ts.tv_nsec = (long)(tk->tkr_mono.xtime_nsec >> tk->tkr_mono.shift); return ts; } static void tk_set_xtime(struct timekeeper *tk, const struct timespec64 *ts) { tk->xtime_sec = ts->tv_sec; tk->tkr_mono.xtime_nsec = (u64)ts->tv_nsec << tk->tkr_mono.shift; } static void tk_xtime_add(struct timekeeper *tk, const struct timespec64 *ts) { tk->xtime_sec += ts->tv_sec; tk->tkr_mono.xtime_nsec += (u64)ts->tv_nsec << tk->tkr_mono.shift; tk_normalize_xtime(tk); } static void tk_set_wall_to_mono(struct timekeeper *tk, struct timespec64 wtm) { struct timespec64 tmp; /* * Verify consistency of: offset_real = -wall_to_monotonic * before modifying anything */ set_normalized_timespec64(&tmp, -tk->wall_to_monotonic.tv_sec, -tk->wall_to_monotonic.tv_nsec); WARN_ON_ONCE(tk->offs_real != timespec64_to_ktime(tmp)); tk->wall_to_monotonic = wtm; set_normalized_timespec64(&tmp, -wtm.tv_sec, -wtm.tv_nsec); /* Paired with READ_ONCE() in ktime_mono_to_any() */ WRITE_ONCE(tk->offs_real, timespec64_to_ktime(tmp)); WRITE_ONCE(tk->offs_tai, ktime_add(tk->offs_real, ktime_set(tk->tai_offset, 0))); } static inline void tk_update_sleep_time(struct timekeeper *tk, ktime_t delta) { /* Paired with READ_ONCE() in ktime_mono_to_any() */ WRITE_ONCE(tk->offs_boot, ktime_add(tk->offs_boot, delta)); /* * Timespec representation for VDSO update to avoid 64bit division * on every update. */ tk->monotonic_to_boot = ktime_to_timespec64(tk->offs_boot); } /* * tk_clock_read - atomic clocksource read() helper * * This helper is necessary to use in the read paths because, while the * seqcount ensures we don't return a bad value while structures are updated, * it doesn't protect from potential crashes. There is the possibility that * the tkr's clocksource may change between the read reference, and the * clock reference passed to the read function. This can cause crashes if * the wrong clocksource is passed to the wrong read function. * This isn't necessary to use when holding the tk_core.lock or doing * a read of the fast-timekeeper tkrs (which is protected by its own locking * and update logic). */ static inline u64 tk_clock_read(const struct tk_read_base *tkr) { struct clocksource *clock = READ_ONCE(tkr->clock); return clock->read(clock); } /** * tk_setup_internals - Set up internals to use clocksource clock. * * @tk: The target timekeeper to setup. * @clock: Pointer to clocksource. * * Calculates a fixed cycle/nsec interval for a given clocksource/adjustment * pair and interval request. * * Unless you're the timekeeping code, you should not be using this! 
*/ static void tk_setup_internals(struct timekeeper *tk, struct clocksource *clock) { u64 interval; u64 tmp, ntpinterval; struct clocksource *old_clock; ++tk->cs_was_changed_seq; old_clock = tk->tkr_mono.clock; tk->tkr_mono.clock = clock; tk->tkr_mono.mask = clock->mask; tk->tkr_mono.cycle_last = tk_clock_read(&tk->tkr_mono); tk->tkr_raw.clock = clock; tk->tkr_raw.mask = clock->mask; tk->tkr_raw.cycle_last = tk->tkr_mono.cycle_last; /* Do the ns -> cycle conversion first, using original mult */ tmp = NTP_INTERVAL_LENGTH; tmp <<= clock->shift; ntpinterval = tmp; tmp += clock->mult/2; do_div(tmp, clock->mult); if (tmp == 0) tmp = 1; interval = (u64) tmp; tk->cycle_interval = interval; /* Go back from cycles -> shifted ns */ tk->xtime_interval = interval * clock->mult; tk->xtime_remainder = ntpinterval - tk->xtime_interval; tk->raw_interval = interval * clock->mult; /* if changing clocks, convert xtime_nsec shift units */ if (old_clock) { int shift_change = clock->shift - old_clock->shift; if (shift_change < 0) { tk->tkr_mono.xtime_nsec >>= -shift_change; tk->tkr_raw.xtime_nsec >>= -shift_change; } else { tk->tkr_mono.xtime_nsec <<= shift_change; tk->tkr_raw.xtime_nsec <<= shift_change; } } tk->tkr_mono.shift = clock->shift; tk->tkr_raw.shift = clock->shift; tk->ntp_error = 0; tk->ntp_error_shift = NTP_SCALE_SHIFT - clock->shift; tk->ntp_tick = ntpinterval << tk->ntp_error_shift; /* * The timekeeper keeps its own mult values for the currently * active clocksource. These value will be adjusted via NTP * to counteract clock drifting. */ tk->tkr_mono.mult = clock->mult; tk->tkr_raw.mult = clock->mult; tk->ntp_err_mult = 0; tk->skip_second_overflow = 0; } /* Timekeeper helper functions. */ static noinline u64 delta_to_ns_safe(const struct tk_read_base *tkr, u64 delta) { return mul_u64_u32_add_u64_shr(delta, tkr->mult, tkr->xtime_nsec, tkr->shift); } static inline u64 timekeeping_cycles_to_ns(const struct tk_read_base *tkr, u64 cycles) { /* Calculate the delta since the last update_wall_time() */ u64 mask = tkr->mask, delta = (cycles - tkr->cycle_last) & mask; /* * This detects both negative motion and the case where the delta * overflows the multiplication with tkr->mult. */ if (unlikely(delta > tkr->clock->max_cycles)) { /* * Handle clocksource inconsistency between CPUs to prevent * time from going backwards by checking for the MSB of the * mask being set in the delta. */ if (delta & ~(mask >> 1)) return tkr->xtime_nsec >> tkr->shift; return delta_to_ns_safe(tkr, delta); } return ((delta * tkr->mult) + tkr->xtime_nsec) >> tkr->shift; } static __always_inline u64 timekeeping_get_ns(const struct tk_read_base *tkr) { return timekeeping_cycles_to_ns(tkr, tk_clock_read(tkr)); } /** * update_fast_timekeeper - Update the fast and NMI safe monotonic timekeeper. * @tkr: Timekeeping readout base from which we take the update * @tkf: Pointer to NMI safe timekeeper * * We want to use this from any context including NMI and tracing / * instrumenting the timekeeping code itself. * * Employ the latch technique; see @write_seqcount_latch. * * So if a NMI hits the update of base[0] then it will use base[1] * which is still consistent. In the worst case this can result is a * slightly wrong timestamp (a few nanoseconds). See * @ktime_get_mono_fast_ns. 
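 *
 * Sketch of the resulting latch ordering (illustrative only; the real
 * ordering and barriers are provided by the write_seqcount_latch_*()
 * helpers):
 *
 *	seq++;			readers now select base[1] (old data)
 *	base[0] = *tkr;
 *	seq++;			readers now select base[0] (new data)
 *	base[1] = base[0];
 *
 * so the copy selected via "seq & 0x01" is never the one currently being
 * written.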
*/ static void update_fast_timekeeper(const struct tk_read_base *tkr, struct tk_fast *tkf) { struct tk_read_base *base = tkf->base; /* Force readers off to base[1] */ write_seqcount_latch_begin(&tkf->seq); /* Update base[0] */ memcpy(base, tkr, sizeof(*base)); /* Force readers back to base[0] */ write_seqcount_latch(&tkf->seq); /* Update base[1] */ memcpy(base + 1, base, sizeof(*base)); write_seqcount_latch_end(&tkf->seq); } static __always_inline u64 __ktime_get_fast_ns(struct tk_fast *tkf) { struct tk_read_base *tkr; unsigned int seq; u64 now; do { seq = read_seqcount_latch(&tkf->seq); tkr = tkf->base + (seq & 0x01); now = ktime_to_ns(tkr->base); now += timekeeping_get_ns(tkr); } while (read_seqcount_latch_retry(&tkf->seq, seq)); return now; } /** * ktime_get_mono_fast_ns - Fast NMI safe access to clock monotonic * * This timestamp is not guaranteed to be monotonic across an update. * The timestamp is calculated by: * * now = base_mono + clock_delta * slope * * So if the update lowers the slope, readers who are forced to the * not yet updated second array are still using the old steeper slope. * * tmono * ^ * | o n * | o n * | u * | o * |o * |12345678---> reader order * * o = old slope * u = update * n = new slope * * So reader 6 will observe time going backwards versus reader 5. * * While other CPUs are likely to be able to observe that, the only way * for a CPU local observation is when an NMI hits in the middle of * the update. Timestamps taken from that NMI context might be ahead * of the following timestamps. Callers need to be aware of that and * deal with it. */ u64 notrace ktime_get_mono_fast_ns(void) { return __ktime_get_fast_ns(&tk_fast_mono); } EXPORT_SYMBOL_GPL(ktime_get_mono_fast_ns); /** * ktime_get_raw_fast_ns - Fast NMI safe access to clock monotonic raw * * Contrary to ktime_get_mono_fast_ns() this is always correct because the * conversion factor is not affected by NTP/PTP correction. */ u64 notrace ktime_get_raw_fast_ns(void) { return __ktime_get_fast_ns(&tk_fast_raw); } EXPORT_SYMBOL_GPL(ktime_get_raw_fast_ns); /** * ktime_get_boot_fast_ns - NMI safe and fast access to boot clock. * * To keep it NMI safe since we're accessing from tracing, we're not using a * separate timekeeper with updates to monotonic clock and boot offset * protected with seqcounts. This has the following minor side effects: * * (1) Its possible that a timestamp be taken after the boot offset is updated * but before the timekeeper is updated. If this happens, the new boot offset * is added to the old timekeeping making the clock appear to update slightly * earlier: * CPU 0 CPU 1 * timekeeping_inject_sleeptime64() * __timekeeping_inject_sleeptime(tk, delta); * timestamp(); * timekeeping_update_staged(tkd, TK_CLEAR_NTP...); * * (2) On 32-bit systems, the 64-bit boot offset (tk->offs_boot) may be * partially updated. Since the tk->offs_boot update is a rare event, this * should be a rare occurrence which postprocessing should be able to handle. * * The caveats vs. timestamp ordering as documented for ktime_get_mono_fast_ns() * apply as well. */ u64 notrace ktime_get_boot_fast_ns(void) { struct timekeeper *tk = &tk_core.timekeeper; return (ktime_get_mono_fast_ns() + ktime_to_ns(data_race(tk->offs_boot))); } EXPORT_SYMBOL_GPL(ktime_get_boot_fast_ns); /** * ktime_get_tai_fast_ns - NMI safe and fast access to tai clock. * * The same limitations as described for ktime_get_boot_fast_ns() apply. The * mono time and the TAI offset are not read atomically which may yield wrong * readouts. 
However, an update of the TAI offset is an rare event e.g., caused * by settime or adjtimex with an offset. The user of this function has to deal * with the possibility of wrong timestamps in post processing. */ u64 notrace ktime_get_tai_fast_ns(void) { struct timekeeper *tk = &tk_core.timekeeper; return (ktime_get_mono_fast_ns() + ktime_to_ns(data_race(tk->offs_tai))); } EXPORT_SYMBOL_GPL(ktime_get_tai_fast_ns); /** * ktime_get_real_fast_ns: - NMI safe and fast access to clock realtime. * * See ktime_get_mono_fast_ns() for documentation of the time stamp ordering. */ u64 ktime_get_real_fast_ns(void) { struct tk_fast *tkf = &tk_fast_mono; struct tk_read_base *tkr; u64 baser, delta; unsigned int seq; do { seq = raw_read_seqcount_latch(&tkf->seq); tkr = tkf->base + (seq & 0x01); baser = ktime_to_ns(tkr->base_real); delta = timekeeping_get_ns(tkr); } while (raw_read_seqcount_latch_retry(&tkf->seq, seq)); return baser + delta; } EXPORT_SYMBOL_GPL(ktime_get_real_fast_ns); /** * halt_fast_timekeeper - Prevent fast timekeeper from accessing clocksource. * @tk: Timekeeper to snapshot. * * It generally is unsafe to access the clocksource after timekeeping has been * suspended, so take a snapshot of the readout base of @tk and use it as the * fast timekeeper's readout base while suspended. It will return the same * number of cycles every time until timekeeping is resumed at which time the * proper readout base for the fast timekeeper will be restored automatically. */ static void halt_fast_timekeeper(const struct timekeeper *tk) { static struct tk_read_base tkr_dummy; const struct tk_read_base *tkr = &tk->tkr_mono; memcpy(&tkr_dummy, tkr, sizeof(tkr_dummy)); cycles_at_suspend = tk_clock_read(tkr); tkr_dummy.clock = &dummy_clock; tkr_dummy.base_real = tkr->base + tk->offs_real; update_fast_timekeeper(&tkr_dummy, &tk_fast_mono); tkr = &tk->tkr_raw; memcpy(&tkr_dummy, tkr, sizeof(tkr_dummy)); tkr_dummy.clock = &dummy_clock; update_fast_timekeeper(&tkr_dummy, &tk_fast_raw); } static RAW_NOTIFIER_HEAD(pvclock_gtod_chain); static void update_pvclock_gtod(struct timekeeper *tk, bool was_set) { raw_notifier_call_chain(&pvclock_gtod_chain, was_set, tk); } /** * pvclock_gtod_register_notifier - register a pvclock timedata update listener * @nb: Pointer to the notifier block to register */ int pvclock_gtod_register_notifier(struct notifier_block *nb) { struct timekeeper *tk = &tk_core.timekeeper; int ret; guard(raw_spinlock_irqsave)(&tk_core.lock); ret = raw_notifier_chain_register(&pvclock_gtod_chain, nb); update_pvclock_gtod(tk, true); return ret; } EXPORT_SYMBOL_GPL(pvclock_gtod_register_notifier); /** * pvclock_gtod_unregister_notifier - unregister a pvclock * timedata update listener * @nb: Pointer to the notifier block to unregister */ int pvclock_gtod_unregister_notifier(struct notifier_block *nb) { guard(raw_spinlock_irqsave)(&tk_core.lock); return raw_notifier_chain_unregister(&pvclock_gtod_chain, nb); } EXPORT_SYMBOL_GPL(pvclock_gtod_unregister_notifier); /* * tk_update_leap_state - helper to update the next_leap_ktime */ static inline void tk_update_leap_state(struct timekeeper *tk) { tk->next_leap_ktime = ntp_get_next_leap(); if (tk->next_leap_ktime != KTIME_MAX) /* Convert to monotonic time */ tk->next_leap_ktime = ktime_sub(tk->next_leap_ktime, tk->offs_real); } /* * Leap state update for both shadow and the real timekeeper * Separate to spare a full memcpy() of the timekeeper. 
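 *
 * Illustrative note: the monotonic next_leap_ktime maintained here is what
 * ktime_get_update_offsets_now() later in this file compares against,
 * roughly:
 *
 *	if (base >= tk->next_leap_ktime)
 *		*offs_real = ktime_sub(tk->offs_real, ktime_set(1, 0));
 *
 * so hrtimers see the pending leap second applied to the realtime offset as
 * soon as the monotonic clock passes the converted boundary.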
*/ static void tk_update_leap_state_all(struct tk_data *tkd) { write_seqcount_begin(&tkd->seq); tk_update_leap_state(&tkd->shadow_timekeeper); tkd->timekeeper.next_leap_ktime = tkd->shadow_timekeeper.next_leap_ktime; write_seqcount_end(&tkd->seq); } /* * Update the ktime_t based scalar nsec members of the timekeeper */ static inline void tk_update_ktime_data(struct timekeeper *tk) { u64 seconds; u32 nsec; /* * The xtime based monotonic readout is: * nsec = (xtime_sec + wtm_sec) * 1e9 + wtm_nsec + now(); * The ktime based monotonic readout is: * nsec = base_mono + now(); * ==> base_mono = (xtime_sec + wtm_sec) * 1e9 + wtm_nsec */ seconds = (u64)(tk->xtime_sec + tk->wall_to_monotonic.tv_sec); nsec = (u32) tk->wall_to_monotonic.tv_nsec; tk->tkr_mono.base = ns_to_ktime(seconds * NSEC_PER_SEC + nsec); /* * The sum of the nanoseconds portions of xtime and * wall_to_monotonic can be greater/equal one second. Take * this into account before updating tk->ktime_sec. */ nsec += (u32)(tk->tkr_mono.xtime_nsec >> tk->tkr_mono.shift); if (nsec >= NSEC_PER_SEC) seconds++; tk->ktime_sec = seconds; /* Update the monotonic raw base */ tk->tkr_raw.base = ns_to_ktime(tk->raw_sec * NSEC_PER_SEC); } /* * Restore the shadow timekeeper from the real timekeeper. */ static void timekeeping_restore_shadow(struct tk_data *tkd) { lockdep_assert_held(&tkd->lock); memcpy(&tkd->shadow_timekeeper, &tkd->timekeeper, sizeof(tkd->timekeeper)); } static void timekeeping_update_from_shadow(struct tk_data *tkd, unsigned int action) { struct timekeeper *tk = &tk_core.shadow_timekeeper; lockdep_assert_held(&tkd->lock); /* * Block out readers before running the updates below because that * updates VDSO and other time related infrastructure. Not blocking * the readers might let a reader see time going backwards when * reading from the VDSO after the VDSO update and then reading in * the kernel from the timekeeper before that got updated. */ write_seqcount_begin(&tkd->seq); if (action & TK_CLEAR_NTP) { tk->ntp_error = 0; ntp_clear(); } tk_update_leap_state(tk); tk_update_ktime_data(tk); update_vsyscall(tk); update_pvclock_gtod(tk, action & TK_CLOCK_WAS_SET); tk->tkr_mono.base_real = tk->tkr_mono.base + tk->offs_real; update_fast_timekeeper(&tk->tkr_mono, &tk_fast_mono); update_fast_timekeeper(&tk->tkr_raw, &tk_fast_raw); if (action & TK_CLOCK_WAS_SET) tk->clock_was_set_seq++; /* * Update the real timekeeper. * * We could avoid this memcpy() by switching pointers, but that has * the downside that the reader side does not longer benefit from * the cacheline optimized data layout of the timekeeper and requires * another indirection. */ memcpy(&tkd->timekeeper, tk, sizeof(*tk)); write_seqcount_end(&tkd->seq); } /** * timekeeping_forward_now - update clock to the current time * @tk: Pointer to the timekeeper to update * * Forward the current clock to update its state since the last call to * update_wall_time(). This is useful before significant clock changes, * as it avoids having to deal with this time offset explicitly. */ static void timekeeping_forward_now(struct timekeeper *tk) { u64 cycle_now, delta; cycle_now = tk_clock_read(&tk->tkr_mono); delta = clocksource_delta(cycle_now, tk->tkr_mono.cycle_last, tk->tkr_mono.mask, tk->tkr_mono.clock->max_raw_delta); tk->tkr_mono.cycle_last = cycle_now; tk->tkr_raw.cycle_last = cycle_now; while (delta > 0) { u64 max = tk->tkr_mono.clock->max_cycles; u64 incr = delta < max ? 
delta : max; tk->tkr_mono.xtime_nsec += incr * tk->tkr_mono.mult; tk->tkr_raw.xtime_nsec += incr * tk->tkr_raw.mult; tk_normalize_xtime(tk); delta -= incr; } } /** * ktime_get_real_ts64 - Returns the time of day in a timespec64. * @ts: pointer to the timespec to be set * * Returns the time of day in a timespec64 (WARN if suspended). */ void ktime_get_real_ts64(struct timespec64 *ts) { struct timekeeper *tk = &tk_core.timekeeper; unsigned int seq; u64 nsecs; WARN_ON(timekeeping_suspended); do { seq = read_seqcount_begin(&tk_core.seq); ts->tv_sec = tk->xtime_sec; nsecs = timekeeping_get_ns(&tk->tkr_mono); } while (read_seqcount_retry(&tk_core.seq, seq)); ts->tv_nsec = 0; timespec64_add_ns(ts, nsecs); } EXPORT_SYMBOL(ktime_get_real_ts64); ktime_t ktime_get(void) { struct timekeeper *tk = &tk_core.timekeeper; unsigned int seq; ktime_t base; u64 nsecs; WARN_ON(timekeeping_suspended); do { seq = read_seqcount_begin(&tk_core.seq); base = tk->tkr_mono.base; nsecs = timekeeping_get_ns(&tk->tkr_mono); } while (read_seqcount_retry(&tk_core.seq, seq)); return ktime_add_ns(base, nsecs); } EXPORT_SYMBOL_GPL(ktime_get); u32 ktime_get_resolution_ns(void) { struct timekeeper *tk = &tk_core.timekeeper; unsigned int seq; u32 nsecs; WARN_ON(timekeeping_suspended); do { seq = read_seqcount_begin(&tk_core.seq); nsecs = tk->tkr_mono.mult >> tk->tkr_mono.shift; } while (read_seqcount_retry(&tk_core.seq, seq)); return nsecs; } EXPORT_SYMBOL_GPL(ktime_get_resolution_ns); static ktime_t *offsets[TK_OFFS_MAX] = { [TK_OFFS_REAL] = &tk_core.timekeeper.offs_real, [TK_OFFS_BOOT] = &tk_core.timekeeper.offs_boot, [TK_OFFS_TAI] = &tk_core.timekeeper.offs_tai, }; ktime_t ktime_get_with_offset(enum tk_offsets offs) { struct timekeeper *tk = &tk_core.timekeeper; unsigned int seq; ktime_t base, *offset = offsets[offs]; u64 nsecs; WARN_ON(timekeeping_suspended); do { seq = read_seqcount_begin(&tk_core.seq); base = ktime_add(tk->tkr_mono.base, *offset); nsecs = timekeeping_get_ns(&tk->tkr_mono); } while (read_seqcount_retry(&tk_core.seq, seq)); return ktime_add_ns(base, nsecs); } EXPORT_SYMBOL_GPL(ktime_get_with_offset); ktime_t ktime_get_coarse_with_offset(enum tk_offsets offs) { struct timekeeper *tk = &tk_core.timekeeper; unsigned int seq; ktime_t base, *offset = offsets[offs]; u64 nsecs; WARN_ON(timekeeping_suspended); do { seq = read_seqcount_begin(&tk_core.seq); base = ktime_add(tk->tkr_mono.base, *offset); nsecs = tk->tkr_mono.xtime_nsec >> tk->tkr_mono.shift; } while (read_seqcount_retry(&tk_core.seq, seq)); return ktime_add_ns(base, nsecs); } EXPORT_SYMBOL_GPL(ktime_get_coarse_with_offset); /** * ktime_mono_to_any() - convert monotonic time to any other time * @tmono: time to convert. * @offs: which offset to use */ ktime_t ktime_mono_to_any(ktime_t tmono, enum tk_offsets offs) { ktime_t *offset = offsets[offs]; unsigned int seq; ktime_t tconv; if (IS_ENABLED(CONFIG_64BIT)) { /* * Paired with WRITE_ONCE()s in tk_set_wall_to_mono() and * tk_update_sleep_time(). 
*/ return ktime_add(tmono, READ_ONCE(*offset)); } do { seq = read_seqcount_begin(&tk_core.seq); tconv = ktime_add(tmono, *offset); } while (read_seqcount_retry(&tk_core.seq, seq)); return tconv; } EXPORT_SYMBOL_GPL(ktime_mono_to_any); /** * ktime_get_raw - Returns the raw monotonic time in ktime_t format */ ktime_t ktime_get_raw(void) { struct timekeeper *tk = &tk_core.timekeeper; unsigned int seq; ktime_t base; u64 nsecs; do { seq = read_seqcount_begin(&tk_core.seq); base = tk->tkr_raw.base; nsecs = timekeeping_get_ns(&tk->tkr_raw); } while (read_seqcount_retry(&tk_core.seq, seq)); return ktime_add_ns(base, nsecs); } EXPORT_SYMBOL_GPL(ktime_get_raw); /** * ktime_get_ts64 - get the monotonic clock in timespec64 format * @ts: pointer to timespec variable * * The function calculates the monotonic clock from the realtime * clock and the wall_to_monotonic offset and stores the result * in normalized timespec64 format in the variable pointed to by @ts. */ void ktime_get_ts64(struct timespec64 *ts) { struct timekeeper *tk = &tk_core.timekeeper; struct timespec64 tomono; unsigned int seq; u64 nsec; WARN_ON(timekeeping_suspended); do { seq = read_seqcount_begin(&tk_core.seq); ts->tv_sec = tk->xtime_sec; nsec = timekeeping_get_ns(&tk->tkr_mono); tomono = tk->wall_to_monotonic; } while (read_seqcount_retry(&tk_core.seq, seq)); ts->tv_sec += tomono.tv_sec; ts->tv_nsec = 0; timespec64_add_ns(ts, nsec + tomono.tv_nsec); } EXPORT_SYMBOL_GPL(ktime_get_ts64); /** * ktime_get_seconds - Get the seconds portion of CLOCK_MONOTONIC * * Returns the seconds portion of CLOCK_MONOTONIC with a single non * serialized read. tk->ktime_sec is of type 'unsigned long' so this * works on both 32 and 64 bit systems. On 32 bit systems the readout * covers ~136 years of uptime which should be enough to prevent * premature wrap arounds. */ time64_t ktime_get_seconds(void) { struct timekeeper *tk = &tk_core.timekeeper; WARN_ON(timekeeping_suspended); return tk->ktime_sec; } EXPORT_SYMBOL_GPL(ktime_get_seconds); /** * ktime_get_real_seconds - Get the seconds portion of CLOCK_REALTIME * * Returns the wall clock seconds since 1970. * * For 64bit systems the fast access to tk->xtime_sec is preserved. On * 32bit systems the access must be protected with the sequence * counter to provide "atomic" access to the 64bit tk->xtime_sec * value. */ time64_t ktime_get_real_seconds(void) { struct timekeeper *tk = &tk_core.timekeeper; time64_t seconds; unsigned int seq; if (IS_ENABLED(CONFIG_64BIT)) return tk->xtime_sec; do { seq = read_seqcount_begin(&tk_core.seq); seconds = tk->xtime_sec; } while (read_seqcount_retry(&tk_core.seq, seq)); return seconds; } EXPORT_SYMBOL_GPL(ktime_get_real_seconds); /** * __ktime_get_real_seconds - The same as ktime_get_real_seconds * but without the sequence counter protect. This internal function * is called just when timekeeping lock is already held. 
*/ noinstr time64_t __ktime_get_real_seconds(void) { struct timekeeper *tk = &tk_core.timekeeper; return tk->xtime_sec; } /** * ktime_get_snapshot - snapshots the realtime/monotonic raw clocks with counter * @systime_snapshot: pointer to struct receiving the system time snapshot */ void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot) { struct timekeeper *tk = &tk_core.timekeeper; unsigned int seq; ktime_t base_raw; ktime_t base_real; ktime_t base_boot; u64 nsec_raw; u64 nsec_real; u64 now; WARN_ON_ONCE(timekeeping_suspended); do { seq = read_seqcount_begin(&tk_core.seq); now = tk_clock_read(&tk->tkr_mono); systime_snapshot->cs_id = tk->tkr_mono.clock->id; systime_snapshot->cs_was_changed_seq = tk->cs_was_changed_seq; systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq; base_real = ktime_add(tk->tkr_mono.base, tk_core.timekeeper.offs_real); base_boot = ktime_add(tk->tkr_mono.base, tk_core.timekeeper.offs_boot); base_raw = tk->tkr_raw.base; nsec_real = timekeeping_cycles_to_ns(&tk->tkr_mono, now); nsec_raw = timekeeping_cycles_to_ns(&tk->tkr_raw, now); } while (read_seqcount_retry(&tk_core.seq, seq)); systime_snapshot->cycles = now; systime_snapshot->real = ktime_add_ns(base_real, nsec_real); systime_snapshot->boot = ktime_add_ns(base_boot, nsec_real); systime_snapshot->raw = ktime_add_ns(base_raw, nsec_raw); } EXPORT_SYMBOL_GPL(ktime_get_snapshot); /* Scale base by mult/div checking for overflow */ static int scale64_check_overflow(u64 mult, u64 div, u64 *base) { u64 tmp, rem; tmp = div64_u64_rem(*base, div, &rem); if (((int)sizeof(u64)*8 - fls64(mult) < fls64(tmp)) || ((int)sizeof(u64)*8 - fls64(mult) < fls64(rem))) return -EOVERFLOW; tmp *= mult; rem = div64_u64(rem * mult, div); *base = tmp + rem; return 0; } /** * adjust_historical_crosststamp - adjust crosstimestamp previous to current interval * @history: Snapshot representing start of history * @partial_history_cycles: Cycle offset into history (fractional part) * @total_history_cycles: Total history length in cycles * @discontinuity: True indicates clock was set on history period * @ts: Cross timestamp that should be adjusted using * partial/total ratio * * Helper function used by get_device_system_crosststamp() to correct the * crosstimestamp corresponding to the start of the current interval to the * system counter value (timestamp point) provided by the driver. The * total_history_* quantities are the total history starting at the provided * reference point and ending at the start of the current interval. The cycle * count between the driver timestamp point and the start of the current * interval is partial_history_cycles. */ static int adjust_historical_crosststamp(struct system_time_snapshot *history, u64 partial_history_cycles, u64 total_history_cycles, bool discontinuity, struct system_device_crosststamp *ts) { struct timekeeper *tk = &tk_core.timekeeper; u64 corr_raw, corr_real; bool interp_forward; int ret; if (total_history_cycles == 0 || partial_history_cycles == 0) return 0; /* Interpolate shortest distance from beginning or end of history */ interp_forward = partial_history_cycles > total_history_cycles / 2; partial_history_cycles = interp_forward ? 
total_history_cycles - partial_history_cycles : partial_history_cycles; /* * Scale the monotonic raw time delta by: * partial_history_cycles / total_history_cycles */ corr_raw = (u64)ktime_to_ns( ktime_sub(ts->sys_monoraw, history->raw)); ret = scale64_check_overflow(partial_history_cycles, total_history_cycles, &corr_raw); if (ret) return ret; /* * If there is a discontinuity in the history, scale monotonic raw * correction by: * mult(real)/mult(raw) yielding the realtime correction * Otherwise, calculate the realtime correction similar to monotonic * raw calculation */ if (discontinuity) { corr_real = mul_u64_u32_div (corr_raw, tk->tkr_mono.mult, tk->tkr_raw.mult); } else { corr_real = (u64)ktime_to_ns( ktime_sub(ts->sys_realtime, history->real)); ret = scale64_check_overflow(partial_history_cycles, total_history_cycles, &corr_real); if (ret) return ret; } /* Fixup monotonic raw and real time time values */ if (interp_forward) { ts->sys_monoraw = ktime_add_ns(history->raw, corr_raw); ts->sys_realtime = ktime_add_ns(history->real, corr_real); } else { ts->sys_monoraw = ktime_sub_ns(ts->sys_monoraw, corr_raw); ts->sys_realtime = ktime_sub_ns(ts->sys_realtime, corr_real); } return 0; } /* * timestamp_in_interval - true if ts is chronologically in [start, end] * * True if ts occurs chronologically at or after start, and before or at end. */ static bool timestamp_in_interval(u64 start, u64 end, u64 ts) { if (ts >= start && ts <= end) return true; if (start > end && (ts >= start || ts <= end)) return true; return false; } static bool convert_clock(u64 *val, u32 numerator, u32 denominator) { u64 rem, res; if (!numerator || !denominator) return false; res = div64_u64_rem(*val, denominator, &rem) * numerator; *val = res + div_u64(rem * numerator, denominator); return true; } static bool convert_base_to_cs(struct system_counterval_t *scv) { struct clocksource *cs = tk_core.timekeeper.tkr_mono.clock; struct clocksource_base *base; u32 num, den; /* The timestamp was taken from the time keeper clock source */ if (cs->id == scv->cs_id) return true; /* * Check whether cs_id matches the base clock. Prevent the compiler from * re-evaluating @base as the clocksource might change concurrently. */ base = READ_ONCE(cs->base); if (!base || base->id != scv->cs_id) return false; num = scv->use_nsecs ? cs->freq_khz : base->numerator; den = scv->use_nsecs ? USEC_PER_SEC : base->denominator; if (!convert_clock(&scv->cycles, num, den)) return false; scv->cycles += base->offset; return true; } static bool convert_cs_to_base(u64 *cycles, enum clocksource_ids base_id) { struct clocksource *cs = tk_core.timekeeper.tkr_mono.clock; struct clocksource_base *base; /* * Check whether base_id matches the base clock. Prevent the compiler from * re-evaluating @base as the clocksource might change concurrently. 
*/ base = READ_ONCE(cs->base); if (!base || base->id != base_id) return false; *cycles -= base->offset; if (!convert_clock(cycles, base->denominator, base->numerator)) return false; return true; } static bool convert_ns_to_cs(u64 *delta) { struct tk_read_base *tkr = &tk_core.timekeeper.tkr_mono; if (BITS_TO_BYTES(fls64(*delta) + tkr->shift) >= sizeof(*delta)) return false; *delta = div_u64((*delta << tkr->shift) - tkr->xtime_nsec, tkr->mult); return true; } /** * ktime_real_to_base_clock() - Convert CLOCK_REALTIME timestamp to a base clock timestamp * @treal: CLOCK_REALTIME timestamp to convert * @base_id: base clocksource id * @cycles: pointer to store the converted base clock timestamp * * Converts a supplied, future realtime clock value to the corresponding base clock value. * * Return: true if the conversion is successful, false otherwise. */ bool ktime_real_to_base_clock(ktime_t treal, enum clocksource_ids base_id, u64 *cycles) { struct timekeeper *tk = &tk_core.timekeeper; unsigned int seq; u64 delta; do { seq = read_seqcount_begin(&tk_core.seq); if ((u64)treal < tk->tkr_mono.base_real) return false; delta = (u64)treal - tk->tkr_mono.base_real; if (!convert_ns_to_cs(&delta)) return false; *cycles = tk->tkr_mono.cycle_last + delta; if (!convert_cs_to_base(cycles, base_id)) return false; } while (read_seqcount_retry(&tk_core.seq, seq)); return true; } EXPORT_SYMBOL_GPL(ktime_real_to_base_clock); /** * get_device_system_crosststamp - Synchronously capture system/device timestamp * @get_time_fn: Callback to get simultaneous device time and * system counter from the device driver * @ctx: Context passed to get_time_fn() * @history_begin: Historical reference point used to interpolate system * time when counter provided by the driver is before the current interval * @xtstamp: Receives simultaneously captured system and device time * * Reads a timestamp from a device and correlates it to system time */ int get_device_system_crosststamp(int (*get_time_fn) (ktime_t *device_time, struct system_counterval_t *sys_counterval, void *ctx), void *ctx, struct system_time_snapshot *history_begin, struct system_device_crosststamp *xtstamp) { struct system_counterval_t system_counterval; struct timekeeper *tk = &tk_core.timekeeper; u64 cycles, now, interval_start; unsigned int clock_was_set_seq = 0; ktime_t base_real, base_raw; u64 nsec_real, nsec_raw; u8 cs_was_changed_seq; unsigned int seq; bool do_interp; int ret; do { seq = read_seqcount_begin(&tk_core.seq); /* * Try to synchronously capture device time and a system * counter value calling back into the device driver */ ret = get_time_fn(&xtstamp->device, &system_counterval, ctx); if (ret) return ret; /* * Verify that the clocksource ID associated with the captured * system counter value is the same as for the currently * installed timekeeper clocksource */ if (system_counterval.cs_id == CSID_GENERIC || !convert_base_to_cs(&system_counterval)) return -ENODEV; cycles = system_counterval.cycles; /* * Check whether the system counter value provided by the * device driver is on the current timekeeping interval. 
*/ now = tk_clock_read(&tk->tkr_mono); interval_start = tk->tkr_mono.cycle_last; if (!timestamp_in_interval(interval_start, now, cycles)) { clock_was_set_seq = tk->clock_was_set_seq; cs_was_changed_seq = tk->cs_was_changed_seq; cycles = interval_start; do_interp = true; } else { do_interp = false; } base_real = ktime_add(tk->tkr_mono.base, tk_core.timekeeper.offs_real); base_raw = tk->tkr_raw.base; nsec_real = timekeeping_cycles_to_ns(&tk->tkr_mono, cycles); nsec_raw = timekeeping_cycles_to_ns(&tk->tkr_raw, cycles); } while (read_seqcount_retry(&tk_core.seq, seq)); xtstamp->sys_realtime = ktime_add_ns(base_real, nsec_real); xtstamp->sys_monoraw = ktime_add_ns(base_raw, nsec_raw); /* * Interpolate if necessary, adjusting back from the start of the * current interval */ if (do_interp) { u64 partial_history_cycles, total_history_cycles; bool discontinuity; /* * Check that the counter value is not before the provided * history reference and that the history doesn't cross a * clocksource change */ if (!history_begin || !timestamp_in_interval(history_begin->cycles, cycles, system_counterval.cycles) || history_begin->cs_was_changed_seq != cs_was_changed_seq) return -EINVAL; partial_history_cycles = cycles - system_counterval.cycles; total_history_cycles = cycles - history_begin->cycles; discontinuity = history_begin->clock_was_set_seq != clock_was_set_seq; ret = adjust_historical_crosststamp(history_begin, partial_history_cycles, total_history_cycles, discontinuity, xtstamp); if (ret) return ret; } return 0; } EXPORT_SYMBOL_GPL(get_device_system_crosststamp); /** * timekeeping_clocksource_has_base - Check whether the current clocksource * is based on given a base clock * @id: base clocksource ID * * Note: The return value is a snapshot which can become invalid right * after the function returns. * * Return: true if the timekeeper clocksource has a base clock with @id, * false otherwise */ bool timekeeping_clocksource_has_base(enum clocksource_ids id) { /* * This is a snapshot, so no point in using the sequence * count. Just prevent the compiler from re-evaluating @base as the * clocksource might change concurrently. */ struct clocksource_base *base = READ_ONCE(tk_core.timekeeper.tkr_mono.clock->base); return base ? base->id == id : false; } EXPORT_SYMBOL_GPL(timekeeping_clocksource_has_base); /** * do_settimeofday64 - Sets the time of day. * @ts: pointer to the timespec64 variable containing the new time * * Sets the time of day to the new time and update NTP and notify hrtimers */ int do_settimeofday64(const struct timespec64 *ts) { struct timespec64 ts_delta, xt; if (!timespec64_valid_settod(ts)) return -EINVAL; scoped_guard (raw_spinlock_irqsave, &tk_core.lock) { struct timekeeper *tks = &tk_core.shadow_timekeeper; timekeeping_forward_now(tks); xt = tk_xtime(tks); ts_delta = timespec64_sub(*ts, xt); if (timespec64_compare(&tks->wall_to_monotonic, &ts_delta) > 0) { timekeeping_restore_shadow(&tk_core); return -EINVAL; } tk_set_wall_to_mono(tks, timespec64_sub(tks->wall_to_monotonic, ts_delta)); tk_set_xtime(tks, ts); timekeeping_update_from_shadow(&tk_core, TK_UPDATE_ALL); } /* Signal hrtimers about time change */ clock_was_set(CLOCK_SET_WALL); audit_tk_injoffset(ts_delta); add_device_randomness(ts, sizeof(*ts)); return 0; } EXPORT_SYMBOL(do_settimeofday64); /** * timekeeping_inject_offset - Adds or subtracts from the current time. * @ts: Pointer to the timespec variable containing the offset * * Adds or subtracts an offset value from the current time. 
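 *
 * Worked example (illustrative): stepping the clock back by 500 ms means
 * passing a normalized delta of
 *
 *	ts.tv_sec  = -1;
 *	ts.tv_nsec = 500000000;		(-1 s + 0.5 s == -0.5 s)
 *
 * since tv_nsec must stay within [0, NSEC_PER_SEC). do_adjtimex() builds its
 * ADJ_SETOFFSET delta in exactly this normalized form before calling in here.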
*/ static int timekeeping_inject_offset(const struct timespec64 *ts) { if (ts->tv_nsec < 0 || ts->tv_nsec >= NSEC_PER_SEC) return -EINVAL; scoped_guard (raw_spinlock_irqsave, &tk_core.lock) { struct timekeeper *tks = &tk_core.shadow_timekeeper; struct timespec64 tmp; timekeeping_forward_now(tks); /* Make sure the proposed value is valid */ tmp = timespec64_add(tk_xtime(tks), *ts); if (timespec64_compare(&tks->wall_to_monotonic, ts) > 0 || !timespec64_valid_settod(&tmp)) { timekeeping_restore_shadow(&tk_core); return -EINVAL; } tk_xtime_add(tks, ts); tk_set_wall_to_mono(tks, timespec64_sub(tks->wall_to_monotonic, *ts)); timekeeping_update_from_shadow(&tk_core, TK_UPDATE_ALL); } /* Signal hrtimers about time change */ clock_was_set(CLOCK_SET_WALL); return 0; } /* * Indicates if there is an offset between the system clock and the hardware * clock/persistent clock/rtc. */ int persistent_clock_is_local; /* * Adjust the time obtained from the CMOS to be UTC time instead of * local time. * * This is ugly, but preferable to the alternatives. Otherwise we * would either need to write a program to do it in /etc/rc (and risk * confusion if the program gets run more than once; it would also be * hard to make the program warp the clock precisely n hours) or * compile in the timezone information into the kernel. Bad, bad.... * * - TYT, 1992-01-01 * * The best thing to do is to keep the CMOS clock in universal time (UTC) * as real UNIX machines always do it. This avoids all headaches about * daylight saving times and warping kernel clocks. */ void timekeeping_warp_clock(void) { if (sys_tz.tz_minuteswest != 0) { struct timespec64 adjust; persistent_clock_is_local = 1; adjust.tv_sec = sys_tz.tz_minuteswest * 60; adjust.tv_nsec = 0; timekeeping_inject_offset(&adjust); } } /* * __timekeeping_set_tai_offset - Sets the TAI offset from UTC and monotonic */ static void __timekeeping_set_tai_offset(struct timekeeper *tk, s32 tai_offset) { tk->tai_offset = tai_offset; tk->offs_tai = ktime_add(tk->offs_real, ktime_set(tai_offset, 0)); } /* * change_clocksource - Swaps clocksources if a new one is available * * Accumulates current time interval and initializes new clocksource */ static int change_clocksource(void *data) { struct clocksource *new = data, *old = NULL; /* * If the clocksource is in a module, get a module reference. * Succeeds for built-in code (owner == NULL) as well. Abort if the * reference can't be acquired. */ if (!try_module_get(new->owner)) return 0; /* Abort if the device can't be enabled */ if (new->enable && new->enable(new) != 0) { module_put(new->owner); return 0; } scoped_guard (raw_spinlock_irqsave, &tk_core.lock) { struct timekeeper *tks = &tk_core.shadow_timekeeper; timekeeping_forward_now(tks); old = tks->tkr_mono.clock; tk_setup_internals(tks, new); timekeeping_update_from_shadow(&tk_core, TK_UPDATE_ALL); } if (old) { if (old->disable) old->disable(old); module_put(old->owner); } return 0; } /** * timekeeping_notify - Install a new clock source * @clock: pointer to the clock source * * This function is called from clocksource.c after a new, better clock * source has been registered. The caller holds the clocksource_mutex. */ int timekeeping_notify(struct clocksource *clock) { struct timekeeper *tk = &tk_core.timekeeper; if (tk->tkr_mono.clock == clock) return 0; stop_machine(change_clocksource, clock, NULL); tick_clock_notify(); return tk->tkr_mono.clock == clock ? 
0 : -1; } /** * ktime_get_raw_ts64 - Returns the raw monotonic time in a timespec * @ts: pointer to the timespec64 to be set * * Returns the raw monotonic time (completely un-modified by ntp) */ void ktime_get_raw_ts64(struct timespec64 *ts) { struct timekeeper *tk = &tk_core.timekeeper; unsigned int seq; u64 nsecs; do { seq = read_seqcount_begin(&tk_core.seq); ts->tv_sec = tk->raw_sec; nsecs = timekeeping_get_ns(&tk->tkr_raw); } while (read_seqcount_retry(&tk_core.seq, seq)); ts->tv_nsec = 0; timespec64_add_ns(ts, nsecs); } EXPORT_SYMBOL(ktime_get_raw_ts64); /** * timekeeping_valid_for_hres - Check if timekeeping is suitable for hres */ int timekeeping_valid_for_hres(void) { struct timekeeper *tk = &tk_core.timekeeper; unsigned int seq; int ret; do { seq = read_seqcount_begin(&tk_core.seq); ret = tk->tkr_mono.clock->flags & CLOCK_SOURCE_VALID_FOR_HRES; } while (read_seqcount_retry(&tk_core.seq, seq)); return ret; } /** * timekeeping_max_deferment - Returns max time the clocksource can be deferred */ u64 timekeeping_max_deferment(void) { struct timekeeper *tk = &tk_core.timekeeper; unsigned int seq; u64 ret; do { seq = read_seqcount_begin(&tk_core.seq); ret = tk->tkr_mono.clock->max_idle_ns; } while (read_seqcount_retry(&tk_core.seq, seq)); return ret; } /** * read_persistent_clock64 - Return time from the persistent clock. * @ts: Pointer to the storage for the readout value * * Weak dummy function for arches that do not yet support it. * Reads the time from the battery backed persistent clock. * Returns a timespec with tv_sec=0 and tv_nsec=0 if unsupported. * * XXX - Do be sure to remove it once all arches implement it. */ void __weak read_persistent_clock64(struct timespec64 *ts) { ts->tv_sec = 0; ts->tv_nsec = 0; } /** * read_persistent_wall_and_boot_offset - Read persistent clock, and also offset * from the boot. * @wall_time: current time as returned by persistent clock * @boot_offset: offset that is defined as wall_time - boot_time * * Weak dummy function for arches that do not yet support it. * * The default function calculates offset based on the current value of * local_clock(). This way architectures that support sched_clock() but don't * support dedicated boot time clock will provide the best estimate of the * boot time. */ void __weak __init read_persistent_wall_and_boot_offset(struct timespec64 *wall_time, struct timespec64 *boot_offset) { read_persistent_clock64(wall_time); *boot_offset = ns_to_timespec64(local_clock()); } static __init void tkd_basic_setup(struct tk_data *tkd) { raw_spin_lock_init(&tkd->lock); seqcount_raw_spinlock_init(&tkd->seq, &tkd->lock); } /* * Flag reflecting whether timekeeping_resume() has injected sleeptime. * * The flag starts of false and is only set when a suspend reaches * timekeeping_suspend(), timekeeping_resume() sets it to false when the * timekeeper clocksource is not stopping across suspend and has been * used to update sleep time. If the timekeeper clocksource has stopped * then the flag stays true and is used by the RTC resume code to decide * whether sleeptime must be injected and if so the flag gets false then. * * If a suspend fails before reaching timekeeping_resume() then the flag * stays false and prevents erroneous sleeptime injection. 
*/ static bool suspend_timing_needed; /* Flag for whether there is a persistent clock on this platform */ static bool persistent_clock_exists; /* * timekeeping_init - Initializes the clocksource and common timekeeping values */ void __init timekeeping_init(void) { struct timespec64 wall_time, boot_offset, wall_to_mono; struct timekeeper *tks = &tk_core.shadow_timekeeper; struct clocksource *clock; tkd_basic_setup(&tk_core); read_persistent_wall_and_boot_offset(&wall_time, &boot_offset); if (timespec64_valid_settod(&wall_time) && timespec64_to_ns(&wall_time) > 0) { persistent_clock_exists = true; } else if (timespec64_to_ns(&wall_time) != 0) { pr_warn("Persistent clock returned invalid value"); wall_time = (struct timespec64){0}; } if (timespec64_compare(&wall_time, &boot_offset) < 0) boot_offset = (struct timespec64){0}; /* * We want to set wall_to_mono, so the following is true: * wall time + wall_to_mono = boot time */ wall_to_mono = timespec64_sub(boot_offset, wall_time); guard(raw_spinlock_irqsave)(&tk_core.lock); ntp_init(); clock = clocksource_default_clock(); if (clock->enable) clock->enable(clock); tk_setup_internals(tks, clock); tk_set_xtime(tks, &wall_time); tks->raw_sec = 0; tk_set_wall_to_mono(tks, wall_to_mono); timekeeping_update_from_shadow(&tk_core, TK_CLOCK_WAS_SET); } /* time in seconds when suspend began for persistent clock */ static struct timespec64 timekeeping_suspend_time; /** * __timekeeping_inject_sleeptime - Internal function to add sleep interval * @tk: Pointer to the timekeeper to be updated * @delta: Pointer to the delta value in timespec64 format * * Takes a timespec offset measuring a suspend interval and properly * adds the sleep offset to the timekeeping variables. */ static void __timekeeping_inject_sleeptime(struct timekeeper *tk, const struct timespec64 *delta) { if (!timespec64_valid_strict(delta)) { printk_deferred(KERN_WARNING "__timekeeping_inject_sleeptime: Invalid " "sleep delta value!\n"); return; } tk_xtime_add(tk, delta); tk_set_wall_to_mono(tk, timespec64_sub(tk->wall_to_monotonic, *delta)); tk_update_sleep_time(tk, timespec64_to_ktime(*delta)); tk_debug_account_sleep_time(delta); } #if defined(CONFIG_PM_SLEEP) && defined(CONFIG_RTC_HCTOSYS_DEVICE) /* * We have three kinds of time sources to use for sleep time * injection, the preference order is: * 1) non-stop clocksource * 2) persistent clock (ie: RTC accessible when irqs are off) * 3) RTC * * 1) and 2) are used by timekeeping, 3) by the RTC subsystem. * If the system has neither 1) nor 2), 3) will be used eventually. * * * If timekeeping has injected sleeptime via either 1) or 2), * 3) becomes needless, so in this case we don't need to call * rtc_resume(), and this is what timekeeping_rtc_skipresume() * means. */ bool timekeeping_rtc_skipresume(void) { return !suspend_timing_needed; } /* * Whether 1) can be used is only determined in timekeeping_resume(), * which is invoked after rtc_suspend(), so rtc_suspend() cannot safely * be skipped if the system has 1). * * But if the system has 2), 2) will definitely be used, so in this * case we don't need to call rtc_suspend(), and this is what * timekeeping_rtc_skipsuspend() means. */ bool timekeeping_rtc_skipsuspend(void) { return persistent_clock_exists; } /** * timekeeping_inject_sleeptime64 - Adds suspend interval to timekeeping values * @delta: pointer to a timespec64 delta value * * This hook is for architectures that cannot support read_persistent_clock64 * because their RTC/persistent clock is only accessible when irqs are enabled,
* and also don't have an effective nonstop clocksource. * * This function should only be called by rtc_resume(), and allows * a suspend offset to be injected into the timekeeping values. */ void timekeeping_inject_sleeptime64(const struct timespec64 *delta) { scoped_guard(raw_spinlock_irqsave, &tk_core.lock) { struct timekeeper *tks = &tk_core.shadow_timekeeper; suspend_timing_needed = false; timekeeping_forward_now(tks); __timekeeping_inject_sleeptime(tks, delta); timekeeping_update_from_shadow(&tk_core, TK_UPDATE_ALL); } /* Signal hrtimers about time change */ clock_was_set(CLOCK_SET_WALL | CLOCK_SET_BOOT); } #endif /** * timekeeping_resume - Resumes the generic timekeeping subsystem. */ void timekeeping_resume(void) { struct timekeeper *tks = &tk_core.shadow_timekeeper; struct clocksource *clock = tks->tkr_mono.clock; struct timespec64 ts_new, ts_delta; bool inject_sleeptime = false; u64 cycle_now, nsec; unsigned long flags; read_persistent_clock64(&ts_new); clockevents_resume(); clocksource_resume(); raw_spin_lock_irqsave(&tk_core.lock, flags); /* * After system resumes, we need to calculate the suspended time and * compensate it for the OS time. There are 3 sources that could be * used: Nonstop clocksource during suspend, persistent clock and rtc * device. * * One specific platform may have 1 or 2 or all of them, and the * preference will be: * suspend-nonstop clocksource -> persistent clock -> rtc * The less preferred source will only be tried if there is no better * usable source. The rtc part is handled separately in rtc core code. */ cycle_now = tk_clock_read(&tks->tkr_mono); nsec = clocksource_stop_suspend_timing(clock, cycle_now); if (nsec > 0) { ts_delta = ns_to_timespec64(nsec); inject_sleeptime = true; } else if (timespec64_compare(&ts_new, &timekeeping_suspend_time) > 0) { ts_delta = timespec64_sub(ts_new, timekeeping_suspend_time); inject_sleeptime = true; } if (inject_sleeptime) { suspend_timing_needed = false; __timekeeping_inject_sleeptime(tks, &ts_delta); } /* Re-base the last cycle value */ tks->tkr_mono.cycle_last = cycle_now; tks->tkr_raw.cycle_last = cycle_now; tks->ntp_error = 0; timekeeping_suspended = 0; timekeeping_update_from_shadow(&tk_core, TK_CLOCK_WAS_SET); raw_spin_unlock_irqrestore(&tk_core.lock, flags); touch_softlockup_watchdog(); /* Resume the clockevent device(s) and hrtimers */ tick_resume(); /* Notify timerfd as resume is equivalent to clock_was_set() */ timerfd_resume(); } int timekeeping_suspend(void) { struct timekeeper *tks = &tk_core.shadow_timekeeper; struct timespec64 delta, delta_delta; static struct timespec64 old_delta; struct clocksource *curr_clock; unsigned long flags; u64 cycle_now; read_persistent_clock64(&timekeeping_suspend_time); /* * On some systems the persistent_clock can not be detected at * timekeeping_init by its return value, so if we see a valid * value returned, update the persistent_clock_exists flag. */ if (timekeeping_suspend_time.tv_sec || timekeeping_suspend_time.tv_nsec) persistent_clock_exists = true; suspend_timing_needed = true; raw_spin_lock_irqsave(&tk_core.lock, flags); timekeeping_forward_now(tks); timekeeping_suspended = 1; /* * Since we've called forward_now, cycle_last stores the value * just read from the current clocksource. Save this to potentially * use in suspend timing. 
*/ curr_clock = tks->tkr_mono.clock; cycle_now = tks->tkr_mono.cycle_last; clocksource_start_suspend_timing(curr_clock, cycle_now); if (persistent_clock_exists) { /* * To avoid drift caused by repeated suspend/resumes, * which each can add ~1 second drift error, * try to compensate so the difference in system time * and persistent_clock time stays close to constant. */ delta = timespec64_sub(tk_xtime(tks), timekeeping_suspend_time); delta_delta = timespec64_sub(delta, old_delta); if (abs(delta_delta.tv_sec) >= 2) { /* * if delta_delta is too large, assume time correction * has occurred and set old_delta to the current delta. */ old_delta = delta; } else { /* Otherwise try to adjust old_system to compensate */ timekeeping_suspend_time = timespec64_add(timekeeping_suspend_time, delta_delta); } } timekeeping_update_from_shadow(&tk_core, 0); halt_fast_timekeeper(tks); raw_spin_unlock_irqrestore(&tk_core.lock, flags); tick_suspend(); clocksource_suspend(); clockevents_suspend(); return 0; } /* sysfs resume/suspend bits for timekeeping */ static struct syscore_ops timekeeping_syscore_ops = { .resume = timekeeping_resume, .suspend = timekeeping_suspend, }; static int __init timekeeping_init_ops(void) { register_syscore_ops(&timekeeping_syscore_ops); return 0; } device_initcall(timekeeping_init_ops); /* * Apply a multiplier adjustment to the timekeeper */ static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk, s64 offset, s32 mult_adj) { s64 interval = tk->cycle_interval; if (mult_adj == 0) { return; } else if (mult_adj == -1) { interval = -interval; offset = -offset; } else if (mult_adj != 1) { interval *= mult_adj; offset *= mult_adj; } /* * So the following can be confusing. * * To keep things simple, lets assume mult_adj == 1 for now. * * When mult_adj != 1, remember that the interval and offset values * have been appropriately scaled so the math is the same. * * The basic idea here is that we're increasing the multiplier * by one, this causes the xtime_interval to be incremented by * one cycle_interval. This is because: * xtime_interval = cycle_interval * mult * So if mult is being incremented by one: * xtime_interval = cycle_interval * (mult + 1) * Its the same as: * xtime_interval = (cycle_interval * mult) + cycle_interval * Which can be shortened to: * xtime_interval += cycle_interval * * So offset stores the non-accumulated cycles. Thus the current * time (in shifted nanoseconds) is: * now = (offset * adj) + xtime_nsec * Now, even though we're adjusting the clock frequency, we have * to keep time consistent. In other words, we can't jump back * in time, and we also want to avoid jumping forward in time. * * So given the same offset value, we need the time to be the same * both before and after the freq adjustment. 
* now = (offset * adj_1) + xtime_nsec_1 * now = (offset * adj_2) + xtime_nsec_2 * So: * (offset * adj_1) + xtime_nsec_1 = * (offset * adj_2) + xtime_nsec_2 * And we know: * adj_2 = adj_1 + 1 * So: * (offset * adj_1) + xtime_nsec_1 = * (offset * (adj_1+1)) + xtime_nsec_2 * (offset * adj_1) + xtime_nsec_1 = * (offset * adj_1) + offset + xtime_nsec_2 * Canceling the sides: * xtime_nsec_1 = offset + xtime_nsec_2 * Which gives us: * xtime_nsec_2 = xtime_nsec_1 - offset * Which simplifies to: * xtime_nsec -= offset */ if ((mult_adj > 0) && (tk->tkr_mono.mult + mult_adj < mult_adj)) { /* NTP adjustment caused clocksource mult overflow */ WARN_ON_ONCE(1); return; } tk->tkr_mono.mult += mult_adj; tk->xtime_interval += interval; tk->tkr_mono.xtime_nsec -= offset; } /* * Adjust the timekeeper's multiplier to the correct frequency * and also to reduce the accumulated error value. */ static void timekeeping_adjust(struct timekeeper *tk, s64 offset) { u64 ntp_tl = ntp_tick_length(); u32 mult; /* * Determine the multiplier from the current NTP tick length. * Avoid expensive division when the tick length doesn't change. */ if (likely(tk->ntp_tick == ntp_tl)) { mult = tk->tkr_mono.mult - tk->ntp_err_mult; } else { tk->ntp_tick = ntp_tl; mult = div64_u64((tk->ntp_tick >> tk->ntp_error_shift) - tk->xtime_remainder, tk->cycle_interval); } /* * If the clock is behind the NTP time, increase the multiplier by 1 * to catch up with it. If it's ahead and there was a remainder in the * tick division, the clock will slow down. Otherwise it will stay * ahead until the tick length changes to a non-divisible value. */ tk->ntp_err_mult = tk->ntp_error > 0 ? 1 : 0; mult += tk->ntp_err_mult; timekeeping_apply_adjustment(tk, offset, mult - tk->tkr_mono.mult); if (unlikely(tk->tkr_mono.clock->maxadj && (abs(tk->tkr_mono.mult - tk->tkr_mono.clock->mult) > tk->tkr_mono.clock->maxadj))) { printk_once(KERN_WARNING "Adjusting %s more than 11%% (%ld vs %ld)\n", tk->tkr_mono.clock->name, (long)tk->tkr_mono.mult, (long)tk->tkr_mono.clock->mult + tk->tkr_mono.clock->maxadj); } /* * It may be possible that when we entered this function, xtime_nsec * was very small. Further, if we're slightly speeding the clocksource * in the code above, its possible the required corrective factor to * xtime_nsec could cause it to underflow. * * Now, since we have already accumulated the second and the NTP * subsystem has been notified via second_overflow(), we need to skip * the next update. */ if (unlikely((s64)tk->tkr_mono.xtime_nsec < 0)) { tk->tkr_mono.xtime_nsec += (u64)NSEC_PER_SEC << tk->tkr_mono.shift; tk->xtime_sec--; tk->skip_second_overflow = 1; } } /* * accumulate_nsecs_to_secs - Accumulates nsecs into secs * * Helper function that accumulates the nsecs greater than a second * from the xtime_nsec field to the xtime_secs field. * It also calls into the NTP code to handle leapsecond processing. */ static inline unsigned int accumulate_nsecs_to_secs(struct timekeeper *tk) { u64 nsecps = (u64)NSEC_PER_SEC << tk->tkr_mono.shift; unsigned int clock_set = 0; while (tk->tkr_mono.xtime_nsec >= nsecps) { int leap; tk->tkr_mono.xtime_nsec -= nsecps; tk->xtime_sec++; /* * Skip NTP update if this second was accumulated before, * i.e. 
xtime_nsec underflowed in timekeeping_adjust() */ if (unlikely(tk->skip_second_overflow)) { tk->skip_second_overflow = 0; continue; } /* Figure out if it's a leap second and apply it if needed */ leap = second_overflow(tk->xtime_sec); if (unlikely(leap)) { struct timespec64 ts; tk->xtime_sec += leap; ts.tv_sec = leap; ts.tv_nsec = 0; tk_set_wall_to_mono(tk, timespec64_sub(tk->wall_to_monotonic, ts)); __timekeeping_set_tai_offset(tk, tk->tai_offset - leap); clock_set = TK_CLOCK_WAS_SET; } } return clock_set; } /* * logarithmic_accumulation - shifted accumulation of cycles * * This function accumulates a shifted interval of cycles into a shifted * interval of nanoseconds, allowing for an O(log) accumulation loop. * * Returns the unconsumed cycles. */ static u64 logarithmic_accumulation(struct timekeeper *tk, u64 offset, u32 shift, unsigned int *clock_set) { u64 interval = tk->cycle_interval << shift; u64 snsec_per_sec; /* If the offset is smaller than a shifted interval, do nothing */ if (offset < interval) return offset; /* Accumulate one shifted interval */ offset -= interval; tk->tkr_mono.cycle_last += interval; tk->tkr_raw.cycle_last += interval; tk->tkr_mono.xtime_nsec += tk->xtime_interval << shift; *clock_set |= accumulate_nsecs_to_secs(tk); /* Accumulate raw time */ tk->tkr_raw.xtime_nsec += tk->raw_interval << shift; snsec_per_sec = (u64)NSEC_PER_SEC << tk->tkr_raw.shift; while (tk->tkr_raw.xtime_nsec >= snsec_per_sec) { tk->tkr_raw.xtime_nsec -= snsec_per_sec; tk->raw_sec++; } /* Accumulate error between NTP and clock interval */ tk->ntp_error += tk->ntp_tick << shift; tk->ntp_error -= (tk->xtime_interval + tk->xtime_remainder) << (tk->ntp_error_shift + shift); return offset; } /* * timekeeping_advance - Updates the timekeeper to the current time and * current NTP tick length */ static bool timekeeping_advance(enum timekeeping_adv_mode mode) { struct timekeeper *tk = &tk_core.shadow_timekeeper; struct timekeeper *real_tk = &tk_core.timekeeper; unsigned int clock_set = 0; int shift = 0, maxshift; u64 offset; guard(raw_spinlock_irqsave)(&tk_core.lock); /* Make sure we're fully resumed: */ if (unlikely(timekeeping_suspended)) return false; offset = clocksource_delta(tk_clock_read(&tk->tkr_mono), tk->tkr_mono.cycle_last, tk->tkr_mono.mask, tk->tkr_mono.clock->max_raw_delta); /* Check if there's really nothing to do */ if (offset < real_tk->cycle_interval && mode == TK_ADV_TICK) return false; /* * With NO_HZ we may have to accumulate many cycle_intervals * (think "ticks") worth of time at once. To do this efficiently, * we calculate the largest doubling multiple of cycle_intervals * that is smaller than the offset. We then accumulate that * chunk in one go, and then try to consume the next smaller * doubled multiple.
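 *
 * Worked example (illustrative numbers): if the system was idle for about
 * 70 cycle_intervals, the ilog2() difference below yields a shift of 6 or 7
 * (the shift-- in the loop corrects any overshoot), and the offset is then
 * consumed in chunks of 64, 4 and 2 intervals instead of 70 single-tick
 * iterations, leaving the sub-interval remainder to timekeeping_adjust().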
*/ shift = ilog2(offset) - ilog2(tk->cycle_interval); shift = max(0, shift); /* Bound shift to one less than what overflows tick_length */ maxshift = (64 - (ilog2(ntp_tick_length())+1)) - 1; shift = min(shift, maxshift); while (offset >= tk->cycle_interval) { offset = logarithmic_accumulation(tk, offset, shift, &clock_set); if (offset < tk->cycle_interval<<shift) shift--; } /* Adjust the multiplier to correct NTP error */ timekeeping_adjust(tk, offset); /* * Finally, make sure that after the rounding * xtime_nsec isn't larger than NSEC_PER_SEC */ clock_set |= accumulate_nsecs_to_secs(tk); timekeeping_update_from_shadow(&tk_core, clock_set); return !!clock_set; } /** * update_wall_time - Uses the current clocksource to increment the wall time * */ void update_wall_time(void) { if (timekeeping_advance(TK_ADV_TICK)) clock_was_set_delayed(); } /** * getboottime64 - Return the real time of system boot. * @ts: pointer to the timespec64 to be set * * Returns the wall-time of boot in a timespec64. * * This is based on the wall_to_monotonic offset and the total suspend * time. Calls to settimeofday will affect the value returned (which * basically means that however wrong your real time clock is at boot time, * you get the right time here). */ void getboottime64(struct timespec64 *ts) { struct timekeeper *tk = &tk_core.timekeeper; ktime_t t = ktime_sub(tk->offs_real, tk->offs_boot); *ts = ktime_to_timespec64(t); } EXPORT_SYMBOL_GPL(getboottime64); void ktime_get_coarse_real_ts64(struct timespec64 *ts) { struct timekeeper *tk = &tk_core.timekeeper; unsigned int seq; do { seq = read_seqcount_begin(&tk_core.seq); *ts = tk_xtime(tk); } while (read_seqcount_retry(&tk_core.seq, seq)); } EXPORT_SYMBOL(ktime_get_coarse_real_ts64); /** * ktime_get_coarse_real_ts64_mg - return latter of coarse grained time or floor * @ts: timespec64 to be filled * * Fetch the global mg_floor value, convert it to realtime and compare it * to the current coarse-grained time. Fill @ts with whichever is * latest. Note that this is a filesystem-specific interface and should be * avoided outside of that context. */ void ktime_get_coarse_real_ts64_mg(struct timespec64 *ts) { struct timekeeper *tk = &tk_core.timekeeper; u64 floor = atomic64_read(&mg_floor); ktime_t f_real, offset, coarse; unsigned int seq; do { seq = read_seqcount_begin(&tk_core.seq); *ts = tk_xtime(tk); offset = tk_core.timekeeper.offs_real; } while (read_seqcount_retry(&tk_core.seq, seq)); coarse = timespec64_to_ktime(*ts); f_real = ktime_add(floor, offset); if (ktime_after(f_real, coarse)) *ts = ktime_to_timespec64(f_real); } /** * ktime_get_real_ts64_mg - attempt to update floor value and return result * @ts: pointer to the timespec to be set * * Get a monotonic fine-grained time value and attempt to swap it into * mg_floor. If that succeeds then accept the new floor value. If it fails * then another task raced in during the interim time and updated the * floor. Since any update to the floor must be later than the previous * floor, either outcome is acceptable. * * Typically this will be called after calling ktime_get_coarse_real_ts64_mg(), * and determining that the resulting coarse-grained timestamp did not effect * a change in ctime. Any more recent floor value would effect a change to * ctime, so there is no need to retry the atomic64_try_cmpxchg() on failure. * * @ts will be filled with the latest floor value, regardless of the outcome of * the cmpxchg. Note that this is a filesystem specific interface and should be * avoided outside of that context. 
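 *
 * Illustrative caller pattern (a sketch; "inode_ctime" is a stand-in for
 * whatever timestamp the filesystem is about to update, not a name used
 * here):
 *
 *	ktime_get_coarse_real_ts64_mg(&now);
 *	if (timespec64_equal(&now, &inode_ctime)) {
 *		// coarse value would not change ctime: take a fine-grained
 *		// stamp instead, which also raises the floor
 *		ktime_get_real_ts64_mg(&now);
 *	}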
*/ void ktime_get_real_ts64_mg(struct timespec64 *ts) { struct timekeeper *tk = &tk_core.timekeeper; ktime_t old = atomic64_read(&mg_floor); ktime_t offset, mono; unsigned int seq; u64 nsecs; do { seq = read_seqcount_begin(&tk_core.seq); ts->tv_sec = tk->xtime_sec; mono = tk->tkr_mono.base; nsecs = timekeeping_get_ns(&tk->tkr_mono); offset = tk_core.timekeeper.offs_real; } while (read_seqcount_retry(&tk_core.seq, seq)); mono = ktime_add_ns(mono, nsecs); /* * Attempt to update the floor with the new time value. As any * update must be later then the existing floor, and would effect * a change to ctime from the perspective of the current task, * accept the resulting floor value regardless of the outcome of * the swap. */ if (atomic64_try_cmpxchg(&mg_floor, &old, mono)) { ts->tv_nsec = 0; timespec64_add_ns(ts, nsecs); timekeeping_inc_mg_floor_swaps(); } else { /* * Another task changed mg_floor since "old" was fetched. * "old" has been updated with the latest value of "mg_floor". * That value is newer than the previous floor value, which * is enough to effect a change to ctime. Accept it. */ *ts = ktime_to_timespec64(ktime_add(old, offset)); } } void ktime_get_coarse_ts64(struct timespec64 *ts) { struct timekeeper *tk = &tk_core.timekeeper; struct timespec64 now, mono; unsigned int seq; do { seq = read_seqcount_begin(&tk_core.seq); now = tk_xtime(tk); mono = tk->wall_to_monotonic; } while (read_seqcount_retry(&tk_core.seq, seq)); set_normalized_timespec64(ts, now.tv_sec + mono.tv_sec, now.tv_nsec + mono.tv_nsec); } EXPORT_SYMBOL(ktime_get_coarse_ts64); /* * Must hold jiffies_lock */ void do_timer(unsigned long ticks) { jiffies_64 += ticks; calc_global_load(); } /** * ktime_get_update_offsets_now - hrtimer helper * @cwsseq: pointer to check and store the clock was set sequence number * @offs_real: pointer to storage for monotonic -> realtime offset * @offs_boot: pointer to storage for monotonic -> boottime offset * @offs_tai: pointer to storage for monotonic -> clock tai offset * * Returns current monotonic time and updates the offsets if the * sequence number in @cwsseq and timekeeper.clock_was_set_seq are * different. * * Called from hrtimer_interrupt() or retrigger_next_event() */ ktime_t ktime_get_update_offsets_now(unsigned int *cwsseq, ktime_t *offs_real, ktime_t *offs_boot, ktime_t *offs_tai) { struct timekeeper *tk = &tk_core.timekeeper; unsigned int seq; ktime_t base; u64 nsecs; do { seq = read_seqcount_begin(&tk_core.seq); base = tk->tkr_mono.base; nsecs = timekeeping_get_ns(&tk->tkr_mono); base = ktime_add_ns(base, nsecs); if (*cwsseq != tk->clock_was_set_seq) { *cwsseq = tk->clock_was_set_seq; *offs_real = tk->offs_real; *offs_boot = tk->offs_boot; *offs_tai = tk->offs_tai; } /* Handle leapsecond insertion adjustments */ if (unlikely(base >= tk->next_leap_ktime)) *offs_real = ktime_sub(tk->offs_real, ktime_set(1, 0)); } while (read_seqcount_retry(&tk_core.seq, seq)); return base; } /* * timekeeping_validate_timex - Ensures the timex is ok for use in do_adjtimex */ static int timekeeping_validate_timex(const struct __kernel_timex *txc) { if (txc->modes & ADJ_ADJTIME) { /* singleshot must not be used with any other mode bits */ if (!(txc->modes & ADJ_OFFSET_SINGLESHOT)) return -EINVAL; if (!(txc->modes & ADJ_OFFSET_READONLY) && !capable(CAP_SYS_TIME)) return -EPERM; } else { /* In order to modify anything, you gotta be super-user! */ if (txc->modes && !capable(CAP_SYS_TIME)) return -EPERM; /* * if the quartz is off by more than 10% then * something is VERY wrong! 
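 *
 * (txc->tick is expressed in microseconds per USER_HZ tick; the nominal
 * value is 1000000/USER_HZ, so the bounds below reject anything more than
 * roughly 10% away from it.)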
*/ if (txc->modes & ADJ_TICK && (txc->tick < 900000/USER_HZ || txc->tick > 1100000/USER_HZ)) return -EINVAL; } if (txc->modes & ADJ_SETOFFSET) { /* In order to inject time, you gotta be super-user! */ if (!capable(CAP_SYS_TIME)) return -EPERM; /* * Validate if a timespec/timeval used to inject a time * offset is valid. Offsets can be positive or negative, so * we don't check tv_sec. The value of the timeval/timespec * is the sum of its fields,but *NOTE*: * The field tv_usec/tv_nsec must always be non-negative and * we can't have more nanoseconds/microseconds than a second. */ if (txc->time.tv_usec < 0) return -EINVAL; if (txc->modes & ADJ_NANO) { if (txc->time.tv_usec >= NSEC_PER_SEC) return -EINVAL; } else { if (txc->time.tv_usec >= USEC_PER_SEC) return -EINVAL; } } /* * Check for potential multiplication overflows that can * only happen on 64-bit systems: */ if ((txc->modes & ADJ_FREQUENCY) && (BITS_PER_LONG == 64)) { if (LLONG_MIN / PPM_SCALE > txc->freq) return -EINVAL; if (LLONG_MAX / PPM_SCALE < txc->freq) return -EINVAL; } return 0; } /** * random_get_entropy_fallback - Returns the raw clock source value, * used by random.c for platforms with no valid random_get_entropy(). */ unsigned long random_get_entropy_fallback(void) { struct tk_read_base *tkr = &tk_core.timekeeper.tkr_mono; struct clocksource *clock = READ_ONCE(tkr->clock); if (unlikely(timekeeping_suspended || !clock)) return 0; return clock->read(clock); } EXPORT_SYMBOL_GPL(random_get_entropy_fallback); /** * do_adjtimex() - Accessor function to NTP __do_adjtimex function * @txc: Pointer to kernel_timex structure containing NTP parameters */ int do_adjtimex(struct __kernel_timex *txc) { struct audit_ntp_data ad; bool offset_set = false; bool clock_set = false; struct timespec64 ts; int ret; /* Validate the data before disabling interrupts */ ret = timekeeping_validate_timex(txc); if (ret) return ret; add_device_randomness(txc, sizeof(*txc)); if (txc->modes & ADJ_SETOFFSET) { struct timespec64 delta; delta.tv_sec = txc->time.tv_sec; delta.tv_nsec = txc->time.tv_usec; if (!(txc->modes & ADJ_NANO)) delta.tv_nsec *= 1000; ret = timekeeping_inject_offset(&delta); if (ret) return ret; offset_set = delta.tv_sec != 0; audit_tk_injoffset(delta); } audit_ntp_init(&ad); ktime_get_real_ts64(&ts); add_device_randomness(&ts, sizeof(ts)); scoped_guard (raw_spinlock_irqsave, &tk_core.lock) { struct timekeeper *tks = &tk_core.shadow_timekeeper; s32 orig_tai, tai; orig_tai = tai = tks->tai_offset; ret = __do_adjtimex(txc, &ts, &tai, &ad); if (tai != orig_tai) { __timekeeping_set_tai_offset(tks, tai); timekeeping_update_from_shadow(&tk_core, TK_CLOCK_WAS_SET); clock_set = true; } else { tk_update_leap_state_all(&tk_core); } } audit_ntp_log(&ad); /* Update the multiplier immediately if frequency was set directly */ if (txc->modes & (ADJ_FREQUENCY | ADJ_TICK)) clock_set |= timekeeping_advance(TK_ADV_FREQ); if (clock_set) clock_was_set(CLOCK_SET_WALL); ntp_notify_cmos_timer(offset_set); return ret; } #ifdef CONFIG_NTP_PPS /** * hardpps() - Accessor function to NTP __hardpps function * @phase_ts: Pointer to timespec64 structure representing phase timestamp * @raw_ts: Pointer to timespec64 structure representing raw timestamp */ void hardpps(const struct timespec64 *phase_ts, const struct timespec64 *raw_ts) { guard(raw_spinlock_irqsave)(&tk_core.lock); __hardpps(phase_ts, raw_ts); } EXPORT_SYMBOL(hardpps); #endif /* CONFIG_NTP_PPS */ |
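
/*
 * Illustration (not part of the original source): a minimal sketch of a
 * kernel-internal caller stepping the clock forward by one second through
 * do_adjtimex(). With ADJ_SETOFFSET | ADJ_NANO the time.tv_usec field is
 * interpreted as nanoseconds, which is exactly what
 * timekeeping_validate_timex() above checks for. The function name is
 * hypothetical.
 */
static int __maybe_unused example_step_clock_one_second(void)
{
	struct __kernel_timex txc = {
		.modes = ADJ_SETOFFSET | ADJ_NANO,
		.time = {
			.tv_sec	 = 1,
			.tv_usec = 0,	/* nanoseconds, because ADJ_NANO is set */
		},
	};

	/* Returns the clock state (TIME_OK, TIME_INS, ...) or a negative errno. */
	return do_adjtimex(&txc);
}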
/* * Copyright (C) 2014 Red Hat * Copyright (C) 2014 Intel Corp. * Copyright (c) 2020-2021, The Linux Foundation. All rights reserved. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the "Software"), * to deal in the Software without restriction, including without limitation * the rights to use, copy, modify, merge, publish, distribute, sublicense, * and/or sell copies of the Software, and to permit persons to whom the * Software is furnished to do so, subject to the following conditions: * * The above copyright notice and this permission notice shall be included in * all copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR * OTHER DEALINGS IN THE SOFTWARE.
* * Authors: * Rob Clark <robdclark@gmail.com> * Daniel Vetter <daniel.vetter@ffwll.ch> */ #include <linux/sync_file.h> #include <drm/drm_atomic.h> #include <drm/drm_atomic_uapi.h> #include <drm/drm_blend.h> #include <drm/drm_bridge.h> #include <drm/drm_debugfs.h> #include <drm/drm_device.h> #include <drm/drm_drv.h> #include <drm/drm_file.h> #include <drm/drm_fourcc.h> #include <drm/drm_framebuffer.h> #include <drm/drm_mode.h> #include <drm/drm_print.h> #include <drm/drm_writeback.h> #include "drm_crtc_internal.h" #include "drm_internal.h" void __drm_crtc_commit_free(struct kref *kref) { struct drm_crtc_commit *commit = container_of(kref, struct drm_crtc_commit, ref); kfree(commit); } EXPORT_SYMBOL(__drm_crtc_commit_free); /** * drm_crtc_commit_wait - Waits for a commit to complete * @commit: &drm_crtc_commit to wait for * * Waits for a given &drm_crtc_commit to be programmed into the * hardware and flipped to. * * Returns: * 0 on success, a negative error code otherwise. */ int drm_crtc_commit_wait(struct drm_crtc_commit *commit) { unsigned long timeout = 10 * HZ; int ret; if (!commit) return 0; ret = wait_for_completion_timeout(&commit->hw_done, timeout); if (!ret) { drm_err(commit->crtc->dev, "hw_done timed out\n"); return -ETIMEDOUT; } /* * Currently no support for overwriting flips, hence * stall for previous one to execute completely. */ ret = wait_for_completion_timeout(&commit->flip_done, timeout); if (!ret) { drm_err(commit->crtc->dev, "flip_done timed out\n"); return -ETIMEDOUT; } return 0; } EXPORT_SYMBOL(drm_crtc_commit_wait); /** * drm_atomic_state_default_release - * release memory initialized by drm_atomic_state_init * @state: atomic state * * Free all the memory allocated by drm_atomic_state_init. * This should only be used by drivers which are still subclassing * &drm_atomic_state and haven't switched to &drm_private_state yet. */ void drm_atomic_state_default_release(struct drm_atomic_state *state) { kfree(state->connectors); kfree(state->crtcs); kfree(state->planes); kfree(state->private_objs); } EXPORT_SYMBOL(drm_atomic_state_default_release); /** * drm_atomic_state_init - init new atomic state * @dev: DRM device * @state: atomic state * * Default implementation for filling in a new atomic state. * This should only be used by drivers which are still subclassing * &drm_atomic_state and haven't switched to &drm_private_state yet. */ int drm_atomic_state_init(struct drm_device *dev, struct drm_atomic_state *state) { kref_init(&state->ref); /* TODO legacy paths should maybe do a better job about * setting this appropriately? */ state->allow_modeset = true; state->crtcs = kcalloc(dev->mode_config.num_crtc, sizeof(*state->crtcs), GFP_KERNEL); if (!state->crtcs) goto fail; state->planes = kcalloc(dev->mode_config.num_total_plane, sizeof(*state->planes), GFP_KERNEL); if (!state->planes) goto fail; /* * Because drm_atomic_state can be committed asynchronously we need our * own reference and cannot rely on the on implied by drm_file in the * ioctl call. */ drm_dev_get(dev); state->dev = dev; drm_dbg_atomic(dev, "Allocated atomic state %p\n", state); return 0; fail: drm_atomic_state_default_release(state); return -ENOMEM; } EXPORT_SYMBOL(drm_atomic_state_init); /** * drm_atomic_state_alloc - allocate atomic state * @dev: DRM device * * This allocates an empty atomic state to track updates. 
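 *
 * The returned state is reference counted; callers are expected to drop
 * their reference with drm_atomic_state_put() once they are done with it.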
*/ struct drm_atomic_state * drm_atomic_state_alloc(struct drm_device *dev) { struct drm_mode_config *config = &dev->mode_config; if (!config->funcs->atomic_state_alloc) { struct drm_atomic_state *state; state = kzalloc(sizeof(*state), GFP_KERNEL); if (!state) return NULL; if (drm_atomic_state_init(dev, state) < 0) { kfree(state); return NULL; } return state; } return config->funcs->atomic_state_alloc(dev); } EXPORT_SYMBOL(drm_atomic_state_alloc); /** * drm_atomic_state_default_clear - clear base atomic state * @state: atomic state * * Default implementation for clearing atomic state. * This should only be used by drivers which are still subclassing * &drm_atomic_state and haven't switched to &drm_private_state yet. */ void drm_atomic_state_default_clear(struct drm_atomic_state *state) { struct drm_device *dev = state->dev; struct drm_mode_config *config = &dev->mode_config; int i; drm_dbg_atomic(dev, "Clearing atomic state %p\n", state); for (i = 0; i < state->num_connector; i++) { struct drm_connector *connector = state->connectors[i].ptr; if (!connector) continue; connector->funcs->atomic_destroy_state(connector, state->connectors[i].state); state->connectors[i].ptr = NULL; state->connectors[i].state = NULL; state->connectors[i].old_state = NULL; state->connectors[i].new_state = NULL; drm_connector_put(connector); } for (i = 0; i < config->num_crtc; i++) { struct drm_crtc *crtc = state->crtcs[i].ptr; if (!crtc) continue; crtc->funcs->atomic_destroy_state(crtc, state->crtcs[i].state); state->crtcs[i].ptr = NULL; state->crtcs[i].state = NULL; state->crtcs[i].old_state = NULL; state->crtcs[i].new_state = NULL; if (state->crtcs[i].commit) { drm_crtc_commit_put(state->crtcs[i].commit); state->crtcs[i].commit = NULL; } } for (i = 0; i < config->num_total_plane; i++) { struct drm_plane *plane = state->planes[i].ptr; if (!plane) continue; plane->funcs->atomic_destroy_state(plane, state->planes[i].state); state->planes[i].ptr = NULL; state->planes[i].state = NULL; state->planes[i].old_state = NULL; state->planes[i].new_state = NULL; } for (i = 0; i < state->num_private_objs; i++) { struct drm_private_obj *obj = state->private_objs[i].ptr; obj->funcs->atomic_destroy_state(obj, state->private_objs[i].state); state->private_objs[i].ptr = NULL; state->private_objs[i].state = NULL; state->private_objs[i].old_state = NULL; state->private_objs[i].new_state = NULL; } state->num_private_objs = 0; if (state->fake_commit) { drm_crtc_commit_put(state->fake_commit); state->fake_commit = NULL; } } EXPORT_SYMBOL(drm_atomic_state_default_clear); /** * drm_atomic_state_clear - clear state object * @state: atomic state * * When the w/w mutex algorithm detects a deadlock we need to back off and drop * all locks. So someone else could sneak in and change the current modeset * configuration. Which means that all the state assembled in @state is no * longer an atomic update to the current state, but to some arbitrary earlier * state. Which could break assumptions the driver's * &drm_mode_config_funcs.atomic_check likely relies on. * * Hence we must clear all cached state and completely start over, using this * function. 
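 *
 * In the typical -EDEADLK handling sequence a caller clears the state with
 * this function, calls drm_modeset_backoff() on its acquire context and then
 * retries; see the sketch in the drm_atomic_commit() kerneldoc further down.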
*/ void drm_atomic_state_clear(struct drm_atomic_state *state) { struct drm_device *dev = state->dev; struct drm_mode_config *config = &dev->mode_config; if (config->funcs->atomic_state_clear) config->funcs->atomic_state_clear(state); else drm_atomic_state_default_clear(state); } EXPORT_SYMBOL(drm_atomic_state_clear); /** * __drm_atomic_state_free - free all memory for an atomic state * @ref: This atomic state to deallocate * * This frees all memory associated with an atomic state, including all the * per-object state for planes, CRTCs and connectors. */ void __drm_atomic_state_free(struct kref *ref) { struct drm_atomic_state *state = container_of(ref, typeof(*state), ref); struct drm_device *dev = state->dev; struct drm_mode_config *config = &dev->mode_config; drm_atomic_state_clear(state); drm_dbg_atomic(state->dev, "Freeing atomic state %p\n", state); if (config->funcs->atomic_state_free) { config->funcs->atomic_state_free(state); } else { drm_atomic_state_default_release(state); kfree(state); } drm_dev_put(dev); } EXPORT_SYMBOL(__drm_atomic_state_free); /** * drm_atomic_get_crtc_state - get CRTC state * @state: global atomic state object * @crtc: CRTC to get state object for * * This function returns the CRTC state for the given CRTC, allocating it if * needed. It will also grab the relevant CRTC lock to make sure that the state * is consistent. * * WARNING: Drivers may only add new CRTC states to a @state if * drm_atomic_state.allow_modeset is set, or if it's a driver-internal commit * not created by userspace through an IOCTL call. * * Returns: * Either the allocated state or the error code encoded into the pointer. When * the error is EDEADLK then the w/w mutex code has detected a deadlock and the * entire atomic sequence must be restarted. All other errors are fatal. */ struct drm_crtc_state * drm_atomic_get_crtc_state(struct drm_atomic_state *state, struct drm_crtc *crtc) { int ret, index = drm_crtc_index(crtc); struct drm_crtc_state *crtc_state; WARN_ON(!state->acquire_ctx); crtc_state = drm_atomic_get_existing_crtc_state(state, crtc); if (crtc_state) return crtc_state; ret = drm_modeset_lock(&crtc->mutex, state->acquire_ctx); if (ret) return ERR_PTR(ret); crtc_state = crtc->funcs->atomic_duplicate_state(crtc); if (!crtc_state) return ERR_PTR(-ENOMEM); state->crtcs[index].state = crtc_state; state->crtcs[index].old_state = crtc->state; state->crtcs[index].new_state = crtc_state; state->crtcs[index].ptr = crtc; crtc_state->state = state; drm_dbg_atomic(state->dev, "Added [CRTC:%d:%s] %p state to %p\n", crtc->base.id, crtc->name, crtc_state, state); return crtc_state; } EXPORT_SYMBOL(drm_atomic_get_crtc_state); static int drm_atomic_crtc_check(const struct drm_crtc_state *old_crtc_state, const struct drm_crtc_state *new_crtc_state) { struct drm_crtc *crtc = new_crtc_state->crtc; /* NOTE: we explicitly don't enforce constraints such as primary * layer covering entire screen, since that is something we want * to allow (on hw that supports it). For hw that does not, it * should be checked in driver's crtc->atomic_check() vfunc. * * TODO: Add generic modeset state checks once we support those. */ if (new_crtc_state->active && !new_crtc_state->enable) { drm_dbg_atomic(crtc->dev, "[CRTC:%d:%s] active without enabled\n", crtc->base.id, crtc->name); return -EINVAL; } /* The state->enable vs. state->mode_blob checks can be WARN_ON, * as this is a kernel-internal detail that userspace should never * be able to trigger. 
*/ if (drm_core_check_feature(crtc->dev, DRIVER_ATOMIC) && WARN_ON(new_crtc_state->enable && !new_crtc_state->mode_blob)) { drm_dbg_atomic(crtc->dev, "[CRTC:%d:%s] enabled without mode blob\n", crtc->base.id, crtc->name); return -EINVAL; } if (drm_core_check_feature(crtc->dev, DRIVER_ATOMIC) && WARN_ON(!new_crtc_state->enable && new_crtc_state->mode_blob)) { drm_dbg_atomic(crtc->dev, "[CRTC:%d:%s] disabled with mode blob\n", crtc->base.id, crtc->name); return -EINVAL; } /* * Reject event generation for when a CRTC is off and stays off. * It wouldn't be hard to implement this, but userspace has a track * record of happily burning through 100% cpu (or worse, crash) when the * display pipe is suspended. To avoid all that fun just reject updates * that ask for events since likely that indicates a bug in the * compositor's drawing loop. This is consistent with the vblank IOCTL * and legacy page_flip IOCTL which also reject service on a disabled * pipe. */ if (new_crtc_state->event && !new_crtc_state->active && !old_crtc_state->active) { drm_dbg_atomic(crtc->dev, "[CRTC:%d:%s] requesting event but off\n", crtc->base.id, crtc->name); return -EINVAL; } return 0; } static void drm_atomic_crtc_print_state(struct drm_printer *p, const struct drm_crtc_state *state) { struct drm_crtc *crtc = state->crtc; drm_printf(p, "crtc[%u]: %s\n", crtc->base.id, crtc->name); drm_printf(p, "\tenable=%d\n", state->enable); drm_printf(p, "\tactive=%d\n", state->active); drm_printf(p, "\tself_refresh_active=%d\n", state->self_refresh_active); drm_printf(p, "\tplanes_changed=%d\n", state->planes_changed); drm_printf(p, "\tmode_changed=%d\n", state->mode_changed); drm_printf(p, "\tactive_changed=%d\n", state->active_changed); drm_printf(p, "\tconnectors_changed=%d\n", state->connectors_changed); drm_printf(p, "\tcolor_mgmt_changed=%d\n", state->color_mgmt_changed); drm_printf(p, "\tplane_mask=%x\n", state->plane_mask); drm_printf(p, "\tconnector_mask=%x\n", state->connector_mask); drm_printf(p, "\tencoder_mask=%x\n", state->encoder_mask); drm_printf(p, "\tmode: " DRM_MODE_FMT "\n", DRM_MODE_ARG(&state->mode)); if (crtc->funcs->atomic_print_state) crtc->funcs->atomic_print_state(p, state); } static int drm_atomic_connector_check(struct drm_connector *connector, struct drm_connector_state *state) { struct drm_crtc_state *crtc_state; struct drm_writeback_job *writeback_job = state->writeback_job; const struct drm_display_info *info = &connector->display_info; state->max_bpc = info->bpc ? 
info->bpc : 8; if (connector->max_bpc_property) state->max_bpc = min(state->max_bpc, state->max_requested_bpc); if ((connector->connector_type != DRM_MODE_CONNECTOR_WRITEBACK) || !writeback_job) return 0; if (writeback_job->fb && !state->crtc) { drm_dbg_atomic(connector->dev, "[CONNECTOR:%d:%s] framebuffer without CRTC\n", connector->base.id, connector->name); return -EINVAL; } if (state->crtc) crtc_state = drm_atomic_get_existing_crtc_state(state->state, state->crtc); if (writeback_job->fb && !crtc_state->active) { drm_dbg_atomic(connector->dev, "[CONNECTOR:%d:%s] has framebuffer, but [CRTC:%d] is off\n", connector->base.id, connector->name, state->crtc->base.id); return -EINVAL; } if (!writeback_job->fb) { if (writeback_job->out_fence) { drm_dbg_atomic(connector->dev, "[CONNECTOR:%d:%s] requesting out-fence without framebuffer\n", connector->base.id, connector->name); return -EINVAL; } drm_writeback_cleanup_job(writeback_job); state->writeback_job = NULL; } return 0; } /** * drm_atomic_get_plane_state - get plane state * @state: global atomic state object * @plane: plane to get state object for * * This function returns the plane state for the given plane, allocating it if * needed. It will also grab the relevant plane lock to make sure that the state * is consistent. * * Returns: * Either the allocated state or the error code encoded into the pointer. When * the error is EDEADLK then the w/w mutex code has detected a deadlock and the * entire atomic sequence must be restarted. All other errors are fatal. */ struct drm_plane_state * drm_atomic_get_plane_state(struct drm_atomic_state *state, struct drm_plane *plane) { int ret, index = drm_plane_index(plane); struct drm_plane_state *plane_state; WARN_ON(!state->acquire_ctx); /* the legacy pointers should never be set */ WARN_ON(plane->fb); WARN_ON(plane->old_fb); WARN_ON(plane->crtc); plane_state = drm_atomic_get_existing_plane_state(state, plane); if (plane_state) return plane_state; ret = drm_modeset_lock(&plane->mutex, state->acquire_ctx); if (ret) return ERR_PTR(ret); plane_state = plane->funcs->atomic_duplicate_state(plane); if (!plane_state) return ERR_PTR(-ENOMEM); state->planes[index].state = plane_state; state->planes[index].ptr = plane; state->planes[index].old_state = plane->state; state->planes[index].new_state = plane_state; plane_state->state = state; drm_dbg_atomic(plane->dev, "Added [PLANE:%d:%s] %p state to %p\n", plane->base.id, plane->name, plane_state, state); if (plane_state->crtc) { struct drm_crtc_state *crtc_state; crtc_state = drm_atomic_get_crtc_state(state, plane_state->crtc); if (IS_ERR(crtc_state)) return ERR_CAST(crtc_state); } return plane_state; } EXPORT_SYMBOL(drm_atomic_get_plane_state); static bool plane_switching_crtc(const struct drm_plane_state *old_plane_state, const struct drm_plane_state *new_plane_state) { if (!old_plane_state->crtc || !new_plane_state->crtc) return false; if (old_plane_state->crtc == new_plane_state->crtc) return false; /* This could be refined, but currently there's no helper or driver code * to implement direct switching of active planes nor userspace to take * advantage of more direct plane switching without the intermediate * full OFF state. */ return true; } /** * drm_atomic_plane_check - check plane state * @old_plane_state: old plane state to check * @new_plane_state: new plane state to check * * Provides core sanity checks for plane state. 
* * RETURNS: * Zero on success, error code on failure */ static int drm_atomic_plane_check(const struct drm_plane_state *old_plane_state, const struct drm_plane_state *new_plane_state) { struct drm_plane *plane = new_plane_state->plane; struct drm_crtc *crtc = new_plane_state->crtc; const struct drm_framebuffer *fb = new_plane_state->fb; unsigned int fb_width, fb_height; struct drm_mode_rect *clips; uint32_t num_clips; /* either *both* CRTC and FB must be set, or neither */ if (crtc && !fb) { drm_dbg_atomic(plane->dev, "[PLANE:%d:%s] CRTC set but no FB\n", plane->base.id, plane->name); return -EINVAL; } else if (fb && !crtc) { drm_dbg_atomic(plane->dev, "[PLANE:%d:%s] FB set but no CRTC\n", plane->base.id, plane->name); return -EINVAL; } /* if disabled, we don't care about the rest of the state: */ if (!crtc) return 0; /* Check whether this plane is usable on this CRTC */ if (!(plane->possible_crtcs & drm_crtc_mask(crtc))) { drm_dbg_atomic(plane->dev, "Invalid [CRTC:%d:%s] for [PLANE:%d:%s]\n", crtc->base.id, crtc->name, plane->base.id, plane->name); return -EINVAL; } /* Check whether this plane supports the fb pixel format. */ if (!drm_plane_has_format(plane, fb->format->format, fb->modifier)) { drm_dbg_atomic(plane->dev, "[PLANE:%d:%s] invalid pixel format %p4cc, modifier 0x%llx\n", plane->base.id, plane->name, &fb->format->format, fb->modifier); return -EINVAL; } /* Give drivers some help against integer overflows */ if (new_plane_state->crtc_w > INT_MAX || new_plane_state->crtc_x > INT_MAX - (int32_t) new_plane_state->crtc_w || new_plane_state->crtc_h > INT_MAX || new_plane_state->crtc_y > INT_MAX - (int32_t) new_plane_state->crtc_h) { drm_dbg_atomic(plane->dev, "[PLANE:%d:%s] invalid CRTC coordinates %ux%u+%d+%d\n", plane->base.id, plane->name, new_plane_state->crtc_w, new_plane_state->crtc_h, new_plane_state->crtc_x, new_plane_state->crtc_y); return -ERANGE; } fb_width = fb->width << 16; fb_height = fb->height << 16; /* Make sure source coordinates are inside the fb. */ if (new_plane_state->src_w > fb_width || new_plane_state->src_x > fb_width - new_plane_state->src_w || new_plane_state->src_h > fb_height || new_plane_state->src_y > fb_height - new_plane_state->src_h) { drm_dbg_atomic(plane->dev, "[PLANE:%d:%s] invalid source coordinates " "%u.%06ux%u.%06u+%u.%06u+%u.%06u (fb %ux%u)\n", plane->base.id, plane->name, new_plane_state->src_w >> 16, ((new_plane_state->src_w & 0xffff) * 15625) >> 10, new_plane_state->src_h >> 16, ((new_plane_state->src_h & 0xffff) * 15625) >> 10, new_plane_state->src_x >> 16, ((new_plane_state->src_x & 0xffff) * 15625) >> 10, new_plane_state->src_y >> 16, ((new_plane_state->src_y & 0xffff) * 15625) >> 10, fb->width, fb->height); return -ENOSPC; } clips = __drm_plane_get_damage_clips(new_plane_state); num_clips = drm_plane_get_damage_clips_count(new_plane_state); /* Make sure damage clips are valid and inside the fb. 
*/ while (num_clips > 0) { if (clips->x1 >= clips->x2 || clips->y1 >= clips->y2 || clips->x1 < 0 || clips->y1 < 0 || clips->x2 > fb_width || clips->y2 > fb_height) { drm_dbg_atomic(plane->dev, "[PLANE:%d:%s] invalid damage clip %d %d %d %d\n", plane->base.id, plane->name, clips->x1, clips->y1, clips->x2, clips->y2); return -EINVAL; } clips++; num_clips--; } if (plane_switching_crtc(old_plane_state, new_plane_state)) { drm_dbg_atomic(plane->dev, "[PLANE:%d:%s] switching CRTC directly\n", plane->base.id, plane->name); return -EINVAL; } return 0; } static void drm_atomic_plane_print_state(struct drm_printer *p, const struct drm_plane_state *state) { struct drm_plane *plane = state->plane; struct drm_rect src = drm_plane_state_src(state); struct drm_rect dest = drm_plane_state_dest(state); drm_printf(p, "plane[%u]: %s\n", plane->base.id, plane->name); drm_printf(p, "\tcrtc=%s\n", state->crtc ? state->crtc->name : "(null)"); drm_printf(p, "\tfb=%u\n", state->fb ? state->fb->base.id : 0); if (state->fb) drm_framebuffer_print_info(p, 2, state->fb); drm_printf(p, "\tcrtc-pos=" DRM_RECT_FMT "\n", DRM_RECT_ARG(&dest)); drm_printf(p, "\tsrc-pos=" DRM_RECT_FP_FMT "\n", DRM_RECT_FP_ARG(&src)); drm_printf(p, "\trotation=%x\n", state->rotation); drm_printf(p, "\tnormalized-zpos=%x\n", state->normalized_zpos); drm_printf(p, "\tcolor-encoding=%s\n", drm_get_color_encoding_name(state->color_encoding)); drm_printf(p, "\tcolor-range=%s\n", drm_get_color_range_name(state->color_range)); drm_printf(p, "\tcolor_mgmt_changed=%d\n", state->color_mgmt_changed); if (plane->funcs->atomic_print_state) plane->funcs->atomic_print_state(p, state); } /** * DOC: handling driver private state * * Very often the DRM objects exposed to userspace in the atomic modeset api * (&drm_connector, &drm_crtc and &drm_plane) do not map neatly to the * underlying hardware. Especially for any kind of shared resources (e.g. shared * clocks, scaler units, bandwidth and fifo limits shared among a group of * planes or CRTCs, and so on) it makes sense to model these as independent * objects. Drivers then need to do similar state tracking and commit ordering for * such private (since not exposed to userspace) objects as the atomic core and * helpers already provide for connectors, planes and CRTCs. * * To make this easier on drivers the atomic core provides some support to track * driver private state objects using struct &drm_private_obj, with the * associated state struct &drm_private_state. * * Similar to userspace-exposed objects, private state structures can be * acquired by calling drm_atomic_get_private_obj_state(). This also takes care * of locking, hence drivers should not have a need to call drm_modeset_lock() * directly. Sequence of the actual hardware state commit is not handled, * drivers might need to keep track of struct drm_crtc_commit within subclassed * structure of &drm_private_state as necessary, e.g. similar to * &drm_plane_state.commit. See also &drm_atomic_state.fake_commit. * * All private state structures contained in a &drm_atomic_state update can be * iterated using for_each_oldnew_private_obj_in_state(), * for_each_new_private_obj_in_state() and for_each_old_private_obj_in_state(). * Drivers are recommended to wrap these for each type of driver private state * object they have, filtering on &drm_private_obj.funcs using for_each_if(), at * least if they want to iterate over all objects of a given type. * * An earlier way to handle driver private state was by subclassing struct * &drm_atomic_state. 
But since that encourages non-standard ways to implement * the check/commit split atomic requires (by using e.g. "check and rollback or * commit instead" of "duplicate state, check, then either commit or release * duplicated state) it is deprecated in favour of using &drm_private_state. */ /** * drm_atomic_private_obj_init - initialize private object * @dev: DRM device this object will be attached to * @obj: private object * @state: initial private object state * @funcs: pointer to the struct of function pointers that identify the object * type * * Initialize the private object, which can be embedded into any * driver private object that needs its own atomic state. */ void drm_atomic_private_obj_init(struct drm_device *dev, struct drm_private_obj *obj, struct drm_private_state *state, const struct drm_private_state_funcs *funcs) { memset(obj, 0, sizeof(*obj)); drm_modeset_lock_init(&obj->lock); obj->state = state; obj->funcs = funcs; list_add_tail(&obj->head, &dev->mode_config.privobj_list); state->obj = obj; } EXPORT_SYMBOL(drm_atomic_private_obj_init); /** * drm_atomic_private_obj_fini - finalize private object * @obj: private object * * Finalize the private object. */ void drm_atomic_private_obj_fini(struct drm_private_obj *obj) { list_del(&obj->head); obj->funcs->atomic_destroy_state(obj, obj->state); drm_modeset_lock_fini(&obj->lock); } EXPORT_SYMBOL(drm_atomic_private_obj_fini); /** * drm_atomic_get_private_obj_state - get private object state * @state: global atomic state * @obj: private object to get the state for * * This function returns the private object state for the given private object, * allocating the state if needed. It will also grab the relevant private * object lock to make sure that the state is consistent. * * RETURNS: * Either the allocated state or the error code encoded into a pointer. */ struct drm_private_state * drm_atomic_get_private_obj_state(struct drm_atomic_state *state, struct drm_private_obj *obj) { int index, num_objs, i, ret; size_t size; struct __drm_private_objs_state *arr; struct drm_private_state *obj_state; for (i = 0; i < state->num_private_objs; i++) if (obj == state->private_objs[i].ptr) return state->private_objs[i].state; ret = drm_modeset_lock(&obj->lock, state->acquire_ctx); if (ret) return ERR_PTR(ret); num_objs = state->num_private_objs + 1; size = sizeof(*state->private_objs) * num_objs; arr = krealloc(state->private_objs, size, GFP_KERNEL); if (!arr) return ERR_PTR(-ENOMEM); state->private_objs = arr; index = state->num_private_objs; memset(&state->private_objs[index], 0, sizeof(*state->private_objs)); obj_state = obj->funcs->atomic_duplicate_state(obj); if (!obj_state) return ERR_PTR(-ENOMEM); state->private_objs[index].state = obj_state; state->private_objs[index].old_state = obj->state; state->private_objs[index].new_state = obj_state; state->private_objs[index].ptr = obj; obj_state->state = state; state->num_private_objs = num_objs; drm_dbg_atomic(state->dev, "Added new private object %p state %p to %p\n", obj, obj_state, state); return obj_state; } EXPORT_SYMBOL(drm_atomic_get_private_obj_state); /** * drm_atomic_get_old_private_obj_state * @state: global atomic state object * @obj: private_obj to grab * * This function returns the old private object state for the given private_obj, * or NULL if the private_obj is not part of the global atomic state. 
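 *
 * Related note: the DOC section on driver private state above recommends
 * wrapping the generic private-object iterators per object type, filtering
 * on &drm_private_obj.funcs. A hypothetical sketch (the "foo" names are
 * illustrative only):
 *
 *	#define for_each_new_foo_obj_in_state(state, obj, new_state, i) \
 *		for_each_new_private_obj_in_state(state, obj, new_state, i) \
 *			for_each_if((obj)->funcs == &foo_private_state_funcs)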
*/ struct drm_private_state * drm_atomic_get_old_private_obj_state(const struct drm_atomic_state *state, struct drm_private_obj *obj) { int i; for (i = 0; i < state->num_private_objs; i++) if (obj == state->private_objs[i].ptr) return state->private_objs[i].old_state; return NULL; } EXPORT_SYMBOL(drm_atomic_get_old_private_obj_state); /** * drm_atomic_get_new_private_obj_state * @state: global atomic state object * @obj: private_obj to grab * * This function returns the new private object state for the given private_obj, * or NULL if the private_obj is not part of the global atomic state. */ struct drm_private_state * drm_atomic_get_new_private_obj_state(const struct drm_atomic_state *state, struct drm_private_obj *obj) { int i; for (i = 0; i < state->num_private_objs; i++) if (obj == state->private_objs[i].ptr) return state->private_objs[i].new_state; return NULL; } EXPORT_SYMBOL(drm_atomic_get_new_private_obj_state); /** * drm_atomic_get_old_connector_for_encoder - Get old connector for an encoder * @state: Atomic state * @encoder: The encoder to fetch the connector state for * * This function finds and returns the connector that was connected to @encoder * as specified by the @state. * * If there is no connector in @state which previously had @encoder connected to * it, this function will return NULL. While this may seem like an invalid use * case, it is sometimes useful to differentiate commits which had no prior * connectors attached to @encoder vs ones that did (and to inspect their * state). This is especially true in enable hooks because the pipeline has * changed. * * Returns: The old connector connected to @encoder, or NULL if the encoder is * not connected. */ struct drm_connector * drm_atomic_get_old_connector_for_encoder(const struct drm_atomic_state *state, struct drm_encoder *encoder) { struct drm_connector_state *conn_state; struct drm_connector *connector; unsigned int i; for_each_old_connector_in_state(state, connector, conn_state, i) { if (conn_state->best_encoder == encoder) return connector; } return NULL; } EXPORT_SYMBOL(drm_atomic_get_old_connector_for_encoder); /** * drm_atomic_get_new_connector_for_encoder - Get new connector for an encoder * @state: Atomic state * @encoder: The encoder to fetch the connector state for * * This function finds and returns the connector that will be connected to * @encoder as specified by the @state. * * If there is no connector in @state which will have @encoder connected to it, * this function will return NULL. While this may seem like an invalid use case, * it is sometimes useful to differentiate commits which have no connectors * attached to @encoder vs ones that do (and to inspect their state). This is * especially true in disable hooks because the pipeline will change. * * Returns: The new connector connected to @encoder, or NULL if the encoder is * not connected. 
*/ struct drm_connector * drm_atomic_get_new_connector_for_encoder(const struct drm_atomic_state *state, struct drm_encoder *encoder) { struct drm_connector_state *conn_state; struct drm_connector *connector; unsigned int i; for_each_new_connector_in_state(state, connector, conn_state, i) { if (conn_state->best_encoder == encoder) return connector; } return NULL; } EXPORT_SYMBOL(drm_atomic_get_new_connector_for_encoder); /** * drm_atomic_get_old_crtc_for_encoder - Get old crtc for an encoder * @state: Atomic state * @encoder: The encoder to fetch the crtc state for * * This function finds and returns the crtc that was connected to @encoder * as specified by the @state. * * Returns: The old crtc connected to @encoder, or NULL if the encoder is * not connected. */ struct drm_crtc * drm_atomic_get_old_crtc_for_encoder(struct drm_atomic_state *state, struct drm_encoder *encoder) { struct drm_connector *connector; struct drm_connector_state *conn_state; connector = drm_atomic_get_old_connector_for_encoder(state, encoder); if (!connector) return NULL; conn_state = drm_atomic_get_old_connector_state(state, connector); if (!conn_state) return NULL; return conn_state->crtc; } EXPORT_SYMBOL(drm_atomic_get_old_crtc_for_encoder); /** * drm_atomic_get_new_crtc_for_encoder - Get new crtc for an encoder * @state: Atomic state * @encoder: The encoder to fetch the crtc state for * * This function finds and returns the crtc that will be connected to @encoder * as specified by the @state. * * Returns: The new crtc connected to @encoder, or NULL if the encoder is * not connected. */ struct drm_crtc * drm_atomic_get_new_crtc_for_encoder(struct drm_atomic_state *state, struct drm_encoder *encoder) { struct drm_connector *connector; struct drm_connector_state *conn_state; connector = drm_atomic_get_new_connector_for_encoder(state, encoder); if (!connector) return NULL; conn_state = drm_atomic_get_new_connector_state(state, connector); if (!conn_state) return NULL; return conn_state->crtc; } EXPORT_SYMBOL(drm_atomic_get_new_crtc_for_encoder); /** * drm_atomic_get_connector_state - get connector state * @state: global atomic state object * @connector: connector to get state object for * * This function returns the connector state for the given connector, * allocating it if needed. It will also grab the relevant connector lock to * make sure that the state is consistent. * * Returns: * Either the allocated state or the error code encoded into the pointer. When * the error is EDEADLK then the w/w mutex code has detected a deadlock and the * entire atomic sequence must be restarted. All other errors are fatal. 
*/ struct drm_connector_state * drm_atomic_get_connector_state(struct drm_atomic_state *state, struct drm_connector *connector) { int ret, index; struct drm_mode_config *config = &connector->dev->mode_config; struct drm_connector_state *connector_state; WARN_ON(!state->acquire_ctx); ret = drm_modeset_lock(&config->connection_mutex, state->acquire_ctx); if (ret) return ERR_PTR(ret); index = drm_connector_index(connector); if (index >= state->num_connector) { struct __drm_connnectors_state *c; int alloc = max(index + 1, config->num_connector); c = krealloc_array(state->connectors, alloc, sizeof(*state->connectors), GFP_KERNEL); if (!c) return ERR_PTR(-ENOMEM); state->connectors = c; memset(&state->connectors[state->num_connector], 0, sizeof(*state->connectors) * (alloc - state->num_connector)); state->num_connector = alloc; } if (state->connectors[index].state) return state->connectors[index].state; connector_state = connector->funcs->atomic_duplicate_state(connector); if (!connector_state) return ERR_PTR(-ENOMEM); drm_connector_get(connector); state->connectors[index].state = connector_state; state->connectors[index].old_state = connector->state; state->connectors[index].new_state = connector_state; state->connectors[index].ptr = connector; connector_state->state = state; drm_dbg_atomic(connector->dev, "Added [CONNECTOR:%d:%s] %p state to %p\n", connector->base.id, connector->name, connector_state, state); if (connector_state->crtc) { struct drm_crtc_state *crtc_state; crtc_state = drm_atomic_get_crtc_state(state, connector_state->crtc); if (IS_ERR(crtc_state)) return ERR_CAST(crtc_state); } return connector_state; } EXPORT_SYMBOL(drm_atomic_get_connector_state); static void drm_atomic_connector_print_state(struct drm_printer *p, const struct drm_connector_state *state) { struct drm_connector *connector = state->connector; drm_printf(p, "connector[%u]: %s\n", connector->base.id, connector->name); drm_printf(p, "\tcrtc=%s\n", state->crtc ? state->crtc->name : "(null)"); drm_printf(p, "\tself_refresh_aware=%d\n", state->self_refresh_aware); drm_printf(p, "\tinterlace_allowed=%d\n", connector->interlace_allowed); drm_printf(p, "\tycbcr_420_allowed=%d\n", connector->ycbcr_420_allowed); drm_printf(p, "\tmax_requested_bpc=%d\n", state->max_requested_bpc); drm_printf(p, "\tcolorspace=%s\n", drm_get_colorspace_name(state->colorspace)); if (connector->connector_type == DRM_MODE_CONNECTOR_HDMIA || connector->connector_type == DRM_MODE_CONNECTOR_HDMIB) { drm_printf(p, "\tbroadcast_rgb=%s\n", drm_hdmi_connector_get_broadcast_rgb_name(state->hdmi.broadcast_rgb)); drm_printf(p, "\tis_limited_range=%c\n", state->hdmi.is_limited_range ? 'y' : 'n'); drm_printf(p, "\toutput_bpc=%u\n", state->hdmi.output_bpc); drm_printf(p, "\toutput_format=%s\n", drm_hdmi_connector_get_output_format_name(state->hdmi.output_format)); drm_printf(p, "\ttmds_char_rate=%llu\n", state->hdmi.tmds_char_rate); } if (connector->connector_type == DRM_MODE_CONNECTOR_WRITEBACK) if (state->writeback_job && state->writeback_job->fb) drm_printf(p, "\tfb=%d\n", state->writeback_job->fb->base.id); if (connector->funcs->atomic_print_state) connector->funcs->atomic_print_state(p, state); } /** * drm_atomic_get_bridge_state - get bridge state * @state: global atomic state object * @bridge: bridge to get state object for * * This function returns the bridge state for the given bridge, allocating it * if needed. It will also grab the relevant bridge lock to make sure that the * state is consistent. 
* * Returns: * Either the allocated state or the error code encoded into the pointer. When * the error is EDEADLK then the w/w mutex code has detected a deadlock and the * entire atomic sequence must be restarted. */ struct drm_bridge_state * drm_atomic_get_bridge_state(struct drm_atomic_state *state, struct drm_bridge *bridge) { struct drm_private_state *obj_state; obj_state = drm_atomic_get_private_obj_state(state, &bridge->base); if (IS_ERR(obj_state)) return ERR_CAST(obj_state); return drm_priv_to_bridge_state(obj_state); } EXPORT_SYMBOL(drm_atomic_get_bridge_state); /** * drm_atomic_get_old_bridge_state - get old bridge state, if it exists * @state: global atomic state object * @bridge: bridge to grab * * This function returns the old bridge state for the given bridge, or NULL if * the bridge is not part of the global atomic state. */ struct drm_bridge_state * drm_atomic_get_old_bridge_state(const struct drm_atomic_state *state, struct drm_bridge *bridge) { struct drm_private_state *obj_state; obj_state = drm_atomic_get_old_private_obj_state(state, &bridge->base); if (!obj_state) return NULL; return drm_priv_to_bridge_state(obj_state); } EXPORT_SYMBOL(drm_atomic_get_old_bridge_state); /** * drm_atomic_get_new_bridge_state - get new bridge state, if it exists * @state: global atomic state object * @bridge: bridge to grab * * This function returns the new bridge state for the given bridge, or NULL if * the bridge is not part of the global atomic state. */ struct drm_bridge_state * drm_atomic_get_new_bridge_state(const struct drm_atomic_state *state, struct drm_bridge *bridge) { struct drm_private_state *obj_state; obj_state = drm_atomic_get_new_private_obj_state(state, &bridge->base); if (!obj_state) return NULL; return drm_priv_to_bridge_state(obj_state); } EXPORT_SYMBOL(drm_atomic_get_new_bridge_state); /** * drm_atomic_add_encoder_bridges - add bridges attached to an encoder * @state: atomic state * @encoder: DRM encoder * * This function adds all bridges attached to @encoder. This is needed to add * bridge states to @state and make them available when * &drm_bridge_funcs.atomic_check(), &drm_bridge_funcs.atomic_pre_enable(), * &drm_bridge_funcs.atomic_enable(), * &drm_bridge_funcs.atomic_disable_post_disable() are called. * * Returns: * 0 on success or can fail with -EDEADLK or -ENOMEM. When the error is EDEADLK * then the w/w mutex code has detected a deadlock and the entire atomic * sequence must be restarted. All other errors are fatal. */ int drm_atomic_add_encoder_bridges(struct drm_atomic_state *state, struct drm_encoder *encoder) { struct drm_bridge_state *bridge_state; struct drm_bridge *bridge; if (!encoder) return 0; drm_dbg_atomic(encoder->dev, "Adding all bridges for [encoder:%d:%s] to %p\n", encoder->base.id, encoder->name, state); drm_for_each_bridge_in_chain(encoder, bridge) { /* Skip bridges that don't implement the atomic state hooks. */ if (!bridge->funcs->atomic_duplicate_state) continue; bridge_state = drm_atomic_get_bridge_state(state, bridge); if (IS_ERR(bridge_state)) return PTR_ERR(bridge_state); } return 0; } EXPORT_SYMBOL(drm_atomic_add_encoder_bridges); /** * drm_atomic_add_affected_connectors - add connectors for CRTC * @state: atomic state * @crtc: DRM CRTC * * This function walks the current configuration and adds all connectors * currently using @crtc to the atomic configuration @state. Note that this * function must acquire the connection mutex. This can potentially cause * unneeded serialization if the update is just for the planes on one CRTC. 
Hence * drivers and helpers should only call this when really needed (e.g. when a * full modeset needs to happen due to some change). * * Returns: * 0 on success or can fail with -EDEADLK or -ENOMEM. When the error is EDEADLK * then the w/w mutex code has detected a deadlock and the entire atomic * sequence must be restarted. All other errors are fatal. */ int drm_atomic_add_affected_connectors(struct drm_atomic_state *state, struct drm_crtc *crtc) { struct drm_mode_config *config = &state->dev->mode_config; struct drm_connector *connector; struct drm_connector_state *conn_state; struct drm_connector_list_iter conn_iter; struct drm_crtc_state *crtc_state; int ret; crtc_state = drm_atomic_get_crtc_state(state, crtc); if (IS_ERR(crtc_state)) return PTR_ERR(crtc_state); ret = drm_modeset_lock(&config->connection_mutex, state->acquire_ctx); if (ret) return ret; drm_dbg_atomic(crtc->dev, "Adding all current connectors for [CRTC:%d:%s] to %p\n", crtc->base.id, crtc->name, state); /* * Changed connectors are already in @state, so only need to look * at the connector_mask in crtc_state. */ drm_connector_list_iter_begin(state->dev, &conn_iter); drm_for_each_connector_iter(connector, &conn_iter) { if (!(crtc_state->connector_mask & drm_connector_mask(connector))) continue; conn_state = drm_atomic_get_connector_state(state, connector); if (IS_ERR(conn_state)) { drm_connector_list_iter_end(&conn_iter); return PTR_ERR(conn_state); } } drm_connector_list_iter_end(&conn_iter); return 0; } EXPORT_SYMBOL(drm_atomic_add_affected_connectors); /** * drm_atomic_add_affected_planes - add planes for CRTC * @state: atomic state * @crtc: DRM CRTC * * This function walks the current configuration and adds all planes * currently used by @crtc to the atomic configuration @state. This is useful * when an atomic commit also needs to check all currently enabled plane on * @crtc, e.g. when changing the mode. It's also useful when re-enabling a CRTC * to avoid special code to force-enable all planes. * * Since acquiring a plane state will always also acquire the w/w mutex of the * current CRTC for that plane (if there is any) adding all the plane states for * a CRTC will not reduce parallelism of atomic updates. * * Returns: * 0 on success or can fail with -EDEADLK or -ENOMEM. When the error is EDEADLK * then the w/w mutex code has detected a deadlock and the entire atomic * sequence must be restarted. All other errors are fatal. */ int drm_atomic_add_affected_planes(struct drm_atomic_state *state, struct drm_crtc *crtc) { const struct drm_crtc_state *old_crtc_state = drm_atomic_get_old_crtc_state(state, crtc); struct drm_plane *plane; WARN_ON(!drm_atomic_get_new_crtc_state(state, crtc)); drm_dbg_atomic(crtc->dev, "Adding all current planes for [CRTC:%d:%s] to %p\n", crtc->base.id, crtc->name, state); drm_for_each_plane_mask(plane, state->dev, old_crtc_state->plane_mask) { struct drm_plane_state *plane_state = drm_atomic_get_plane_state(state, plane); if (IS_ERR(plane_state)) return PTR_ERR(plane_state); } return 0; } EXPORT_SYMBOL(drm_atomic_add_affected_planes); /** * drm_atomic_check_only - check whether a given config would work * @state: atomic configuration to check * * Note that this function can return -EDEADLK if the driver needed to acquire * more locks but encountered a deadlock. The caller must then do the usual w/w * backoff dance and restart. All other errors are fatal. * * Returns: * 0 on success, negative error code on failure. 
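 *
 * This is also what the atomic ioctl uses to service
 * DRM_MODE_ATOMIC_TEST_ONLY requests: the configuration is fully validated
 * but never committed.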
*/ int drm_atomic_check_only(struct drm_atomic_state *state) { struct drm_device *dev = state->dev; struct drm_mode_config *config = &dev->mode_config; struct drm_plane *plane; struct drm_plane_state *old_plane_state; struct drm_plane_state *new_plane_state; struct drm_crtc *crtc; struct drm_crtc_state *old_crtc_state; struct drm_crtc_state *new_crtc_state; struct drm_connector *conn; struct drm_connector_state *conn_state; unsigned int requested_crtc = 0; unsigned int affected_crtc = 0; int i, ret = 0; drm_dbg_atomic(dev, "checking %p\n", state); for_each_new_crtc_in_state(state, crtc, new_crtc_state, i) { if (new_crtc_state->enable) requested_crtc |= drm_crtc_mask(crtc); } for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) { ret = drm_atomic_plane_check(old_plane_state, new_plane_state); if (ret) { drm_dbg_atomic(dev, "[PLANE:%d:%s] atomic core check failed\n", plane->base.id, plane->name); return ret; } } for_each_oldnew_crtc_in_state(state, crtc, old_crtc_state, new_crtc_state, i) { ret = drm_atomic_crtc_check(old_crtc_state, new_crtc_state); if (ret) { drm_dbg_atomic(dev, "[CRTC:%d:%s] atomic core check failed\n", crtc->base.id, crtc->name); return ret; } } for_each_new_connector_in_state(state, conn, conn_state, i) { ret = drm_atomic_connector_check(conn, conn_state); if (ret) { drm_dbg_atomic(dev, "[CONNECTOR:%d:%s] atomic core check failed\n", conn->base.id, conn->name); return ret; } } if (config->funcs->atomic_check) { ret = config->funcs->atomic_check(state->dev, state); if (ret) { drm_dbg_atomic(dev, "atomic driver check for %p failed: %d\n", state, ret); return ret; } } if (!state->allow_modeset) { for_each_new_crtc_in_state(state, crtc, new_crtc_state, i) { if (drm_atomic_crtc_needs_modeset(new_crtc_state)) { drm_dbg_atomic(dev, "[CRTC:%d:%s] requires full modeset\n", crtc->base.id, crtc->name); return -EINVAL; } } } for_each_new_crtc_in_state(state, crtc, new_crtc_state, i) { if (new_crtc_state->enable) affected_crtc |= drm_crtc_mask(crtc); } /* * For commits that allow modesets drivers can add other CRTCs to the * atomic commit, e.g. when they need to reallocate global resources. * This can cause spurious EBUSY, which robs compositors of a very * effective sanity check for their drawing loop. Therefor only allow * drivers to add unrelated CRTC states for modeset commits. * * FIXME: Should add affected_crtc mask to the ATOMIC IOCTL as an output * so compositors know what's going on. */ if (affected_crtc != requested_crtc) { drm_dbg_atomic(dev, "driver added CRTC to commit: requested 0x%x, affected 0x%0x\n", requested_crtc, affected_crtc); WARN(!state->allow_modeset, "adding CRTC not allowed without modesets: requested 0x%x, affected 0x%0x\n", requested_crtc, affected_crtc); } return 0; } EXPORT_SYMBOL(drm_atomic_check_only); /** * drm_atomic_commit - commit configuration atomically * @state: atomic configuration to check * * Note that this function can return -EDEADLK if the driver needed to acquire * more locks but encountered a deadlock. The caller must then do the usual w/w * backoff dance and restart. All other errors are fatal. * * This function will take its own reference on @state. * Callers should always release their reference with drm_atomic_state_put(). * * Returns: * 0 on success, negative error code on failure. 
*/ int drm_atomic_commit(struct drm_atomic_state *state) { struct drm_mode_config *config = &state->dev->mode_config; struct drm_printer p = drm_info_printer(state->dev->dev); int ret; if (drm_debug_enabled(DRM_UT_STATE)) drm_atomic_print_new_state(state, &p); ret = drm_atomic_check_only(state); if (ret) return ret; drm_dbg_atomic(state->dev, "committing %p\n", state); return config->funcs->atomic_commit(state->dev, state, false); } EXPORT_SYMBOL(drm_atomic_commit); /** * drm_atomic_nonblocking_commit - atomic nonblocking commit * @state: atomic configuration to check * * Note that this function can return -EDEADLK if the driver needed to acquire * more locks but encountered a deadlock. The caller must then do the usual w/w * backoff dance and restart. All other errors are fatal. * * This function will take its own reference on @state. * Callers should always release their reference with drm_atomic_state_put(). * * Returns: * 0 on success, negative error code on failure. */ int drm_atomic_nonblocking_commit(struct drm_atomic_state *state) { struct drm_mode_config *config = &state->dev->mode_config; int ret; ret = drm_atomic_check_only(state); if (ret) return ret; drm_dbg_atomic(state->dev, "committing %p nonblocking\n", state); return config->funcs->atomic_commit(state->dev, state, true); } EXPORT_SYMBOL(drm_atomic_nonblocking_commit); /* just used from drm-client and atomic-helper: */ int __drm_atomic_helper_disable_plane(struct drm_plane *plane, struct drm_plane_state *plane_state) { int ret; ret = drm_atomic_set_crtc_for_plane(plane_state, NULL); if (ret != 0) return ret; drm_atomic_set_fb_for_plane(plane_state, NULL); plane_state->crtc_x = 0; plane_state->crtc_y = 0; plane_state->crtc_w = 0; plane_state->crtc_h = 0; plane_state->src_x = 0; plane_state->src_y = 0; plane_state->src_w = 0; plane_state->src_h = 0; return 0; } EXPORT_SYMBOL(__drm_atomic_helper_disable_plane); static int update_output_state(struct drm_atomic_state *state, struct drm_mode_set *set) { struct drm_device *dev = set->crtc->dev; struct drm_crtc *crtc; struct drm_crtc_state *new_crtc_state; struct drm_connector *connector; struct drm_connector_state *new_conn_state; int ret, i; ret = drm_modeset_lock(&dev->mode_config.connection_mutex, state->acquire_ctx); if (ret) return ret; /* First disable all connectors on the target crtc. */ ret = drm_atomic_add_affected_connectors(state, set->crtc); if (ret) return ret; for_each_new_connector_in_state(state, connector, new_conn_state, i) { if (new_conn_state->crtc == set->crtc) { ret = drm_atomic_set_crtc_for_connector(new_conn_state, NULL); if (ret) return ret; /* Make sure legacy setCrtc always re-trains */ new_conn_state->link_status = DRM_LINK_STATUS_GOOD; } } /* Then set all connectors from set->connectors on the target crtc */ for (i = 0; i < set->num_connectors; i++) { new_conn_state = drm_atomic_get_connector_state(state, set->connectors[i]); if (IS_ERR(new_conn_state)) return PTR_ERR(new_conn_state); ret = drm_atomic_set_crtc_for_connector(new_conn_state, set->crtc); if (ret) return ret; } for_each_new_crtc_in_state(state, crtc, new_crtc_state, i) { /* * Don't update ->enable for the CRTC in the set_config request, * since a mismatch would indicate a bug in the upper layers. * The actual modeset code later on will catch any * inconsistencies here. 
*/ if (crtc == set->crtc) continue; if (!new_crtc_state->connector_mask) { ret = drm_atomic_set_mode_prop_for_crtc(new_crtc_state, NULL); if (ret < 0) return ret; new_crtc_state->active = false; } } return 0; } /* just used from drm-client and atomic-helper: */ int __drm_atomic_helper_set_config(struct drm_mode_set *set, struct drm_atomic_state *state) { struct drm_crtc_state *crtc_state; struct drm_plane_state *primary_state; struct drm_crtc *crtc = set->crtc; int hdisplay, vdisplay; int ret; crtc_state = drm_atomic_get_crtc_state(state, crtc); if (IS_ERR(crtc_state)) return PTR_ERR(crtc_state); primary_state = drm_atomic_get_plane_state(state, crtc->primary); if (IS_ERR(primary_state)) return PTR_ERR(primary_state); if (!set->mode) { WARN_ON(set->fb); WARN_ON(set->num_connectors); ret = drm_atomic_set_mode_for_crtc(crtc_state, NULL); if (ret != 0) return ret; crtc_state->active = false; ret = drm_atomic_set_crtc_for_plane(primary_state, NULL); if (ret != 0) return ret; drm_atomic_set_fb_for_plane(primary_state, NULL); goto commit; } WARN_ON(!set->fb); WARN_ON(!set->num_connectors); ret = drm_atomic_set_mode_for_crtc(crtc_state, set->mode); if (ret != 0) return ret; crtc_state->active = true; ret = drm_atomic_set_crtc_for_plane(primary_state, crtc); if (ret != 0) return ret; drm_mode_get_hv_timing(set->mode, &hdisplay, &vdisplay); drm_atomic_set_fb_for_plane(primary_state, set->fb); primary_state->crtc_x = 0; primary_state->crtc_y = 0; primary_state->crtc_w = hdisplay; primary_state->crtc_h = vdisplay; primary_state->src_x = set->x << 16; primary_state->src_y = set->y << 16; if (drm_rotation_90_or_270(primary_state->rotation)) { primary_state->src_w = vdisplay << 16; primary_state->src_h = hdisplay << 16; } else { primary_state->src_w = hdisplay << 16; primary_state->src_h = vdisplay << 16; } commit: ret = update_output_state(state, set); if (ret) return ret; return 0; } EXPORT_SYMBOL(__drm_atomic_helper_set_config); static void drm_atomic_private_obj_print_state(struct drm_printer *p, const struct drm_private_state *state) { struct drm_private_obj *obj = state->obj; if (obj->funcs->atomic_print_state) obj->funcs->atomic_print_state(p, state); } /** * drm_atomic_print_new_state - prints drm atomic state * @state: atomic configuration to check * @p: drm printer * * This functions prints the drm atomic state snapshot using the drm printer * which is passed to it. This snapshot can be used for debugging purposes. * * Note that this function looks into the new state objects and hence its not * safe to be used after the call to drm_atomic_helper_commit_hw_done(). 
*/ void drm_atomic_print_new_state(const struct drm_atomic_state *state, struct drm_printer *p) { struct drm_plane *plane; struct drm_plane_state *plane_state; struct drm_crtc *crtc; struct drm_crtc_state *crtc_state; struct drm_connector *connector; struct drm_connector_state *connector_state; struct drm_private_obj *obj; struct drm_private_state *obj_state; int i; if (!p) { drm_err(state->dev, "invalid drm printer\n"); return; } drm_dbg_atomic(state->dev, "checking %p\n", state); for_each_new_plane_in_state(state, plane, plane_state, i) drm_atomic_plane_print_state(p, plane_state); for_each_new_crtc_in_state(state, crtc, crtc_state, i) drm_atomic_crtc_print_state(p, crtc_state); for_each_new_connector_in_state(state, connector, connector_state, i) drm_atomic_connector_print_state(p, connector_state); for_each_new_private_obj_in_state(state, obj, obj_state, i) drm_atomic_private_obj_print_state(p, obj_state); } EXPORT_SYMBOL(drm_atomic_print_new_state); static void __drm_state_dump(struct drm_device *dev, struct drm_printer *p, bool take_locks) { struct drm_mode_config *config = &dev->mode_config; struct drm_plane *plane; struct drm_crtc *crtc; struct drm_connector *connector; struct drm_connector_list_iter conn_iter; struct drm_private_obj *obj; if (!drm_drv_uses_atomic_modeset(dev)) return; list_for_each_entry(plane, &config->plane_list, head) { if (take_locks) drm_modeset_lock(&plane->mutex, NULL); drm_atomic_plane_print_state(p, plane->state); if (take_locks) drm_modeset_unlock(&plane->mutex); } list_for_each_entry(crtc, &config->crtc_list, head) { if (take_locks) drm_modeset_lock(&crtc->mutex, NULL); drm_atomic_crtc_print_state(p, crtc->state); if (take_locks) drm_modeset_unlock(&crtc->mutex); } drm_connector_list_iter_begin(dev, &conn_iter); if (take_locks) drm_modeset_lock(&dev->mode_config.connection_mutex, NULL); drm_for_each_connector_iter(connector, &conn_iter) drm_atomic_connector_print_state(p, connector->state); if (take_locks) drm_modeset_unlock(&dev->mode_config.connection_mutex); drm_connector_list_iter_end(&conn_iter); list_for_each_entry(obj, &config->privobj_list, head) { if (take_locks) drm_modeset_lock(&obj->lock, NULL); drm_atomic_private_obj_print_state(p, obj->state); if (take_locks) drm_modeset_unlock(&obj->lock); } } /** * drm_state_dump - dump entire device atomic state * @dev: the drm device * @p: where to print the state to * * Just for debugging. Drivers might want an option to dump state * to dmesg in case of error irq's. (Hint, you probably want to * ratelimit this!) * * The caller must wrap this drm_modeset_lock_all_ctx() and * drm_modeset_drop_locks(). If this is called from error irq handler, it should * not be enabled by default - if you are debugging errors you might * not care that this is racey, but calling this without all modeset locks held * is inherently unsafe. */ void drm_state_dump(struct drm_device *dev, struct drm_printer *p) { __drm_state_dump(dev, p, false); } EXPORT_SYMBOL(drm_state_dump); #ifdef CONFIG_DEBUG_FS static int drm_state_info(struct seq_file *m, void *data) { struct drm_debugfs_entry *entry = m->private; struct drm_device *dev = entry->dev; struct drm_printer p = drm_seq_file_printer(m); __drm_state_dump(dev, &p, true); return 0; } /* any use in debugfs files to dump individual planes/crtc/etc? 
*/ static const struct drm_debugfs_info drm_atomic_debugfs_list[] = { {"state", drm_state_info, 0}, }; void drm_atomic_debugfs_init(struct drm_device *dev) { drm_debugfs_add_files(dev, drm_atomic_debugfs_list, ARRAY_SIZE(drm_atomic_debugfs_list)); } #endif
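The kernel-doc above for drm_atomic_check_only() and drm_atomic_commit() refers several times to the "usual w/w backoff dance" a caller must perform on -EDEADLK. The following is a minimal sketch of that pattern, not part of drm_atomic.c; it assumes a caller that allocates its own state, and the function name example_commit_with_backoff() is made up for illustration.

#include <drm/drm_atomic.h>
#include <drm/drm_device.h>
#include <drm/drm_modeset_lock.h>

static int example_commit_with_backoff(struct drm_device *dev)
{
	struct drm_modeset_acquire_ctx ctx;
	struct drm_atomic_state *state;
	int ret;

	drm_modeset_acquire_init(&ctx, 0);

	state = drm_atomic_state_alloc(dev);
	if (!state) {
		ret = -ENOMEM;
		goto out_fini;
	}
	state->acquire_ctx = &ctx;

retry:
	/* ...get CRTC/plane/connector states and set properties here... */

	ret = drm_atomic_commit(state);
	if (ret == -EDEADLK) {
		/* ww_mutex deadlock detected: drop states and locks, then restart */
		drm_atomic_state_clear(state);
		drm_modeset_backoff(&ctx);
		goto retry;
	}

	/* drm_atomic_commit() took its own reference; drop ours */
	drm_atomic_state_put(state);
out_fini:
	drm_modeset_drop_locks(&ctx);
	drm_modeset_acquire_fini(&ctx);
	return ret;
}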
| 3 13 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 | /* SPDX-License-Identifier: GPL-2.0 */ #undef TRACE_SYSTEM #define TRACE_SYSTEM v4l2 #if !defined(_TRACE_V4L2_H) || defined(TRACE_HEADER_MULTI_READ) #define _TRACE_V4L2_H #include <linux/tracepoint.h> #include <media/videobuf2-v4l2.h> /* Enums require being exported to userspace, for user tool parsing */ #undef EM #undef EMe #define EM(a, b) TRACE_DEFINE_ENUM(a); #define EMe(a, b) TRACE_DEFINE_ENUM(a); #define show_type(type) \ __print_symbolic(type, SHOW_TYPE) #define SHOW_TYPE \ EM( V4L2_BUF_TYPE_VIDEO_CAPTURE, "VIDEO_CAPTURE" ) \ EM( V4L2_BUF_TYPE_VIDEO_OUTPUT, "VIDEO_OUTPUT" ) \ EM( V4L2_BUF_TYPE_VIDEO_OVERLAY, "VIDEO_OVERLAY" ) \ EM( V4L2_BUF_TYPE_VBI_CAPTURE, "VBI_CAPTURE" ) \ EM( V4L2_BUF_TYPE_VBI_OUTPUT, "VBI_OUTPUT" ) \ EM( V4L2_BUF_TYPE_SLICED_VBI_CAPTURE, "SLICED_VBI_CAPTURE" ) \ EM( V4L2_BUF_TYPE_SLICED_VBI_OUTPUT, "SLICED_VBI_OUTPUT" ) \ EM( V4L2_BUF_TYPE_VIDEO_OUTPUT_OVERLAY, "VIDEO_OUTPUT_OVERLAY" ) \ EM( V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE, "VIDEO_CAPTURE_MPLANE" ) \ EM( V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE, "VIDEO_OUTPUT_MPLANE" ) \ EM( V4L2_BUF_TYPE_SDR_CAPTURE, "SDR_CAPTURE" ) \ EM( V4L2_BUF_TYPE_SDR_OUTPUT, "SDR_OUTPUT" ) \ EM( V4L2_BUF_TYPE_META_CAPTURE, "META_CAPTURE" ) \ EMe(V4L2_BUF_TYPE_PRIVATE, "PRIVATE" ) SHOW_TYPE #define show_field(field) \ __print_symbolic(field, SHOW_FIELD) #define SHOW_FIELD \ EM( V4L2_FIELD_ANY, "ANY" ) \ EM( V4L2_FIELD_NONE, "NONE" ) \ EM( V4L2_FIELD_TOP, "TOP" ) \ EM( V4L2_FIELD_BOTTOM, "BOTTOM" ) \ EM( V4L2_FIELD_INTERLACED, "INTERLACED" ) \ EM( V4L2_FIELD_SEQ_TB, "SEQ_TB" ) \ EM( V4L2_FIELD_SEQ_BT, "SEQ_BT" ) \ EM( V4L2_FIELD_ALTERNATE, "ALTERNATE" ) \ EM( V4L2_FIELD_INTERLACED_TB, "INTERLACED_TB" ) \ EMe( V4L2_FIELD_INTERLACED_BT, "INTERLACED_BT" ) SHOW_FIELD /* * Now redefine the EM() and EMe() macros to map the enums to the strings * that will be printed in the output. 
*/ #undef EM #undef EMe #define EM(a, b) {a, b}, #define EMe(a, b) {a, b} /* V4L2_TC_TYPE_* are macros, not defines, they do not need processing */ #define show_timecode_type(type) \ __print_symbolic(type, \ { V4L2_TC_TYPE_24FPS, "24FPS" }, \ { V4L2_TC_TYPE_25FPS, "25FPS" }, \ { V4L2_TC_TYPE_30FPS, "30FPS" }, \ { V4L2_TC_TYPE_50FPS, "50FPS" }, \ { V4L2_TC_TYPE_60FPS, "60FPS" }) #define show_flags(flags) \ __print_flags(flags, "|", \ { V4L2_BUF_FLAG_MAPPED, "MAPPED" }, \ { V4L2_BUF_FLAG_QUEUED, "QUEUED" }, \ { V4L2_BUF_FLAG_DONE, "DONE" }, \ { V4L2_BUF_FLAG_KEYFRAME, "KEYFRAME" }, \ { V4L2_BUF_FLAG_PFRAME, "PFRAME" }, \ { V4L2_BUF_FLAG_BFRAME, "BFRAME" }, \ { V4L2_BUF_FLAG_ERROR, "ERROR" }, \ { V4L2_BUF_FLAG_TIMECODE, "TIMECODE" }, \ { V4L2_BUF_FLAG_PREPARED, "PREPARED" }, \ { V4L2_BUF_FLAG_NO_CACHE_INVALIDATE, "NO_CACHE_INVALIDATE" }, \ { V4L2_BUF_FLAG_NO_CACHE_CLEAN, "NO_CACHE_CLEAN" }, \ { V4L2_BUF_FLAG_TIMESTAMP_MASK, "TIMESTAMP_MASK" }, \ { V4L2_BUF_FLAG_TIMESTAMP_UNKNOWN, "TIMESTAMP_UNKNOWN" }, \ { V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC, "TIMESTAMP_MONOTONIC" }, \ { V4L2_BUF_FLAG_TIMESTAMP_COPY, "TIMESTAMP_COPY" }, \ { V4L2_BUF_FLAG_LAST, "LAST" }) #define show_timecode_flags(flags) \ __print_flags(flags, "|", \ { V4L2_TC_FLAG_DROPFRAME, "DROPFRAME" }, \ { V4L2_TC_FLAG_COLORFRAME, "COLORFRAME" }, \ { V4L2_TC_USERBITS_USERDEFINED, "USERBITS_USERDEFINED" }, \ { V4L2_TC_USERBITS_8BITCHARS, "USERBITS_8BITCHARS" }) DECLARE_EVENT_CLASS(v4l2_event_class, TP_PROTO(int minor, struct v4l2_buffer *buf), TP_ARGS(minor, buf), TP_STRUCT__entry( __field(int, minor) __field(u32, index) __field(u32, type) __field(u32, bytesused) __field(u32, flags) __field(u32, field) __field(s64, timestamp) __field(u32, timecode_type) __field(u32, timecode_flags) __field(u8, timecode_frames) __field(u8, timecode_seconds) __field(u8, timecode_minutes) __field(u8, timecode_hours) __field(u8, timecode_userbits0) __field(u8, timecode_userbits1) __field(u8, timecode_userbits2) __field(u8, timecode_userbits3) __field(u32, sequence) ), TP_fast_assign( __entry->minor = minor; __entry->index = buf->index; __entry->type = buf->type; __entry->bytesused = buf->bytesused; __entry->flags = buf->flags; __entry->field = buf->field; __entry->timestamp = v4l2_buffer_get_timestamp(buf); __entry->timecode_type = buf->timecode.type; __entry->timecode_flags = buf->timecode.flags; __entry->timecode_frames = buf->timecode.frames; __entry->timecode_seconds = buf->timecode.seconds; __entry->timecode_minutes = buf->timecode.minutes; __entry->timecode_hours = buf->timecode.hours; __entry->timecode_userbits0 = buf->timecode.userbits[0]; __entry->timecode_userbits1 = buf->timecode.userbits[1]; __entry->timecode_userbits2 = buf->timecode.userbits[2]; __entry->timecode_userbits3 = buf->timecode.userbits[3]; __entry->sequence = buf->sequence; ), TP_printk("minor = %d, index = %u, type = %s, bytesused = %u, " "flags = %s, field = %s, timestamp = %llu, " "timecode = { type = %s, flags = %s, frames = %u, " "seconds = %u, minutes = %u, hours = %u, " "userbits = { %u %u %u %u } }, sequence = %u", __entry->minor, __entry->index, show_type(__entry->type), __entry->bytesused, show_flags(__entry->flags), show_field(__entry->field), __entry->timestamp, show_timecode_type(__entry->timecode_type), show_timecode_flags(__entry->timecode_flags), __entry->timecode_frames, __entry->timecode_seconds, __entry->timecode_minutes, __entry->timecode_hours, __entry->timecode_userbits0, __entry->timecode_userbits1, __entry->timecode_userbits2, __entry->timecode_userbits3, 
__entry->sequence ) ) DEFINE_EVENT(v4l2_event_class, v4l2_dqbuf, TP_PROTO(int minor, struct v4l2_buffer *buf), TP_ARGS(minor, buf) ); DEFINE_EVENT(v4l2_event_class, v4l2_qbuf, TP_PROTO(int minor, struct v4l2_buffer *buf), TP_ARGS(minor, buf) ); DECLARE_EVENT_CLASS(vb2_v4l2_event_class, TP_PROTO(struct vb2_queue *q, struct vb2_buffer *vb), TP_ARGS(q, vb), TP_STRUCT__entry( __field(int, minor) __field(u32, flags) __field(u32, field) __field(u64, timestamp) __field(u32, timecode_type) __field(u32, timecode_flags) __field(u8, timecode_frames) __field(u8, timecode_seconds) __field(u8, timecode_minutes) __field(u8, timecode_hours) __field(u8, timecode_userbits0) __field(u8, timecode_userbits1) __field(u8, timecode_userbits2) __field(u8, timecode_userbits3) __field(u32, sequence) ), TP_fast_assign( struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb); struct v4l2_fh *owner = q->owner; __entry->minor = owner ? owner->vdev->minor : -1; __entry->flags = vbuf->flags; __entry->field = vbuf->field; __entry->timestamp = vb->timestamp; __entry->timecode_type = vbuf->timecode.type; __entry->timecode_flags = vbuf->timecode.flags; __entry->timecode_frames = vbuf->timecode.frames; __entry->timecode_seconds = vbuf->timecode.seconds; __entry->timecode_minutes = vbuf->timecode.minutes; __entry->timecode_hours = vbuf->timecode.hours; __entry->timecode_userbits0 = vbuf->timecode.userbits[0]; __entry->timecode_userbits1 = vbuf->timecode.userbits[1]; __entry->timecode_userbits2 = vbuf->timecode.userbits[2]; __entry->timecode_userbits3 = vbuf->timecode.userbits[3]; __entry->sequence = vbuf->sequence; ), TP_printk("minor=%d flags = %s, field = %s, " "timestamp = %llu, timecode = { type = %s, flags = %s, " "frames = %u, seconds = %u, minutes = %u, hours = %u, " "userbits = { %u %u %u %u } }, sequence = %u", __entry->minor, show_flags(__entry->flags), show_field(__entry->field), __entry->timestamp, show_timecode_type(__entry->timecode_type), show_timecode_flags(__entry->timecode_flags), __entry->timecode_frames, __entry->timecode_seconds, __entry->timecode_minutes, __entry->timecode_hours, __entry->timecode_userbits0, __entry->timecode_userbits1, __entry->timecode_userbits2, __entry->timecode_userbits3, __entry->sequence ) ) DEFINE_EVENT(vb2_v4l2_event_class, vb2_v4l2_buf_done, TP_PROTO(struct vb2_queue *q, struct vb2_buffer *vb), TP_ARGS(q, vb) ); DEFINE_EVENT(vb2_v4l2_event_class, vb2_v4l2_buf_queue, TP_PROTO(struct vb2_queue *q, struct vb2_buffer *vb), TP_ARGS(q, vb) ); DEFINE_EVENT(vb2_v4l2_event_class, vb2_v4l2_dqbuf, TP_PROTO(struct vb2_queue *q, struct vb2_buffer *vb), TP_ARGS(q, vb) ); DEFINE_EVENT(vb2_v4l2_event_class, vb2_v4l2_qbuf, TP_PROTO(struct vb2_queue *q, struct vb2_buffer *vb), TP_ARGS(q, vb) ); #endif /* if !defined(_TRACE_V4L2_H) || defined(TRACE_HEADER_MULTI_READ) */ /* This part must be outside protection */ #include <trace/define_trace.h> |
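For context, the DEFINE_EVENT() entries above generate trace stubs named trace_v4l2_qbuf(), trace_v4l2_dqbuf(), trace_vb2_v4l2_buf_done() and so on, each taking the arguments given in the corresponding TP_PROTO(). A minimal sketch of emitting one of these stubs from C code follows; the wrapper function is hypothetical, and exactly one compilation unit in the V4L2 core is expected to define CREATE_TRACE_POINTS before including this header.

#include <trace/events/v4l2.h>

/* Hypothetical helper: emit the v4l2:v4l2_qbuf event for a queued buffer. */
static void example_trace_queued_buffer(int minor, struct v4l2_buffer *buf)
{
	/* Compiles down to a static-branch no-op unless the event is enabled. */
	trace_v4l2_qbuf(minor, buf);
}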
| 16 4 4 4 4 11 18 4 22 10 10 7 3 10 10 11 11 11 563 563 10 135 128 11 569 115 19 113 134 533 528 18 134 134 4 29 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 | // SPDX-License-Identifier: GPL-2.0 /* * Ldisc rw semaphore * * The ldisc semaphore is semantically a rw_semaphore but which enforces * an alternate policy, namely: * 1) Supports lock wait timeouts * 2) Write waiter has priority * 3) Downgrading is not supported * * Implementation notes: * 1) Upper half of semaphore count is a wait count (differs from rwsem * in that rwsem normalizes the upper half to the wait bias) * 2) Lacks overflow checking * * The generic counting was copied and modified from include/asm-generic/rwsem.h * by Paul Mackerras <paulus@samba.org>. * * The scheduling policy was copied and modified from lib/rwsem.c * Written by David Howells (dhowells@redhat.com). * * This implementation incorporates the write lock stealing work of * Michel Lespinasse <walken@google.com>. 
* * Copyright (C) 2013 Peter Hurley <peter@hurleysoftware.com> */ #include <linux/list.h> #include <linux/spinlock.h> #include <linux/atomic.h> #include <linux/tty.h> #include <linux/sched.h> #include <linux/sched/debug.h> #include <linux/sched/task.h> #if BITS_PER_LONG == 64 # define LDSEM_ACTIVE_MASK 0xffffffffL #else # define LDSEM_ACTIVE_MASK 0x0000ffffL #endif #define LDSEM_UNLOCKED 0L #define LDSEM_ACTIVE_BIAS 1L #define LDSEM_WAIT_BIAS (-LDSEM_ACTIVE_MASK-1) #define LDSEM_READ_BIAS LDSEM_ACTIVE_BIAS #define LDSEM_WRITE_BIAS (LDSEM_WAIT_BIAS + LDSEM_ACTIVE_BIAS) struct ldsem_waiter { struct list_head list; struct task_struct *task; }; /* * Initialize an ldsem: */ void __init_ldsem(struct ld_semaphore *sem, const char *name, struct lock_class_key *key) { #ifdef CONFIG_DEBUG_LOCK_ALLOC /* * Make sure we are not reinitializing a held semaphore: */ debug_check_no_locks_freed((void *)sem, sizeof(*sem)); lockdep_init_map(&sem->dep_map, name, key, 0); #endif atomic_long_set(&sem->count, LDSEM_UNLOCKED); sem->wait_readers = 0; raw_spin_lock_init(&sem->wait_lock); INIT_LIST_HEAD(&sem->read_wait); INIT_LIST_HEAD(&sem->write_wait); } static void __ldsem_wake_readers(struct ld_semaphore *sem) { struct ldsem_waiter *waiter, *next; struct task_struct *tsk; long adjust, count; /* * Try to grant read locks to all readers on the read wait list. * Note the 'active part' of the count is incremented by * the number of readers before waking any processes up. */ adjust = sem->wait_readers * (LDSEM_ACTIVE_BIAS - LDSEM_WAIT_BIAS); count = atomic_long_add_return(adjust, &sem->count); do { if (count > 0) break; if (atomic_long_try_cmpxchg(&sem->count, &count, count - adjust)) return; } while (1); list_for_each_entry_safe(waiter, next, &sem->read_wait, list) { tsk = waiter->task; smp_store_release(&waiter->task, NULL); wake_up_process(tsk); put_task_struct(tsk); } INIT_LIST_HEAD(&sem->read_wait); sem->wait_readers = 0; } static inline int writer_trylock(struct ld_semaphore *sem) { /* * Only wake this writer if the active part of the count can be * transitioned from 0 -> 1 */ long count = atomic_long_add_return(LDSEM_ACTIVE_BIAS, &sem->count); do { if ((count & LDSEM_ACTIVE_MASK) == LDSEM_ACTIVE_BIAS) return 1; if (atomic_long_try_cmpxchg(&sem->count, &count, count - LDSEM_ACTIVE_BIAS)) return 0; } while (1); } static void __ldsem_wake_writer(struct ld_semaphore *sem) { struct ldsem_waiter *waiter; waiter = list_entry(sem->write_wait.next, struct ldsem_waiter, list); wake_up_process(waiter->task); } /* * handle the lock release when processes blocked on it that can now run * - if we come here from up_xxxx(), then: * - the 'active part' of count (&0x0000ffff) reached 0 (but may have changed) * - the 'waiting part' of count (&0xffff0000) is -ve (and will still be so) * - the spinlock must be held by the caller * - woken process blocks are discarded from the list after having task zeroed */ static void __ldsem_wake(struct ld_semaphore *sem) { if (!list_empty(&sem->write_wait)) __ldsem_wake_writer(sem); else if (!list_empty(&sem->read_wait)) __ldsem_wake_readers(sem); } static void ldsem_wake(struct ld_semaphore *sem) { unsigned long flags; raw_spin_lock_irqsave(&sem->wait_lock, flags); __ldsem_wake(sem); raw_spin_unlock_irqrestore(&sem->wait_lock, flags); } /* * wait for the read lock to be granted */ static struct ld_semaphore __sched * down_read_failed(struct ld_semaphore *sem, long count, long timeout) { struct ldsem_waiter waiter; long adjust = -LDSEM_ACTIVE_BIAS + LDSEM_WAIT_BIAS; /* set up my own style of 
waitqueue */ raw_spin_lock_irq(&sem->wait_lock); /* * Try to reverse the lock attempt but if the count has changed * so that reversing fails, check if there are no waiters, * and early-out if not */ do { if (atomic_long_try_cmpxchg(&sem->count, &count, count + adjust)) { count += adjust; break; } if (count > 0) { raw_spin_unlock_irq(&sem->wait_lock); return sem; } } while (1); list_add_tail(&waiter.list, &sem->read_wait); sem->wait_readers++; waiter.task = current; get_task_struct(current); /* if there are no active locks, wake the new lock owner(s) */ if ((count & LDSEM_ACTIVE_MASK) == 0) __ldsem_wake(sem); raw_spin_unlock_irq(&sem->wait_lock); /* wait to be given the lock */ for (;;) { set_current_state(TASK_UNINTERRUPTIBLE); if (!smp_load_acquire(&waiter.task)) break; if (!timeout) break; timeout = schedule_timeout(timeout); } __set_current_state(TASK_RUNNING); if (!timeout) { /* * Lock timed out but check if this task was just * granted lock ownership - if so, pretend there * was no timeout; otherwise, cleanup lock wait. */ raw_spin_lock_irq(&sem->wait_lock); if (waiter.task) { atomic_long_add_return(-LDSEM_WAIT_BIAS, &sem->count); sem->wait_readers--; list_del(&waiter.list); raw_spin_unlock_irq(&sem->wait_lock); put_task_struct(waiter.task); return NULL; } raw_spin_unlock_irq(&sem->wait_lock); } return sem; } /* * wait for the write lock to be granted */ static struct ld_semaphore __sched * down_write_failed(struct ld_semaphore *sem, long count, long timeout) { struct ldsem_waiter waiter; long adjust = -LDSEM_ACTIVE_BIAS; int locked = 0; /* set up my own style of waitqueue */ raw_spin_lock_irq(&sem->wait_lock); /* * Try to reverse the lock attempt but if the count has changed * so that reversing fails, check if the lock is now owned, * and early-out if so. */ do { if (atomic_long_try_cmpxchg(&sem->count, &count, count + adjust)) break; if ((count & LDSEM_ACTIVE_MASK) == LDSEM_ACTIVE_BIAS) { raw_spin_unlock_irq(&sem->wait_lock); return sem; } } while (1); list_add_tail(&waiter.list, &sem->write_wait); waiter.task = current; set_current_state(TASK_UNINTERRUPTIBLE); for (;;) { if (!timeout) break; raw_spin_unlock_irq(&sem->wait_lock); timeout = schedule_timeout(timeout); raw_spin_lock_irq(&sem->wait_lock); set_current_state(TASK_UNINTERRUPTIBLE); locked = writer_trylock(sem); if (locked) break; } if (!locked) atomic_long_add_return(-LDSEM_WAIT_BIAS, &sem->count); list_del(&waiter.list); /* * In case of timeout, wake up every reader who gave the right of way * to writer. Prevent separation readers into two groups: * one that helds semaphore and another that sleeps. 
* (in case of no contention with a writer) */ if (!locked && list_empty(&sem->write_wait)) __ldsem_wake_readers(sem); raw_spin_unlock_irq(&sem->wait_lock); __set_current_state(TASK_RUNNING); /* lock wait may have timed out */ if (!locked) return NULL; return sem; } static int __ldsem_down_read_nested(struct ld_semaphore *sem, int subclass, long timeout) { long count; rwsem_acquire_read(&sem->dep_map, subclass, 0, _RET_IP_); count = atomic_long_add_return(LDSEM_READ_BIAS, &sem->count); if (count <= 0) { lock_contended(&sem->dep_map, _RET_IP_); if (!down_read_failed(sem, count, timeout)) { rwsem_release(&sem->dep_map, _RET_IP_); return 0; } } lock_acquired(&sem->dep_map, _RET_IP_); return 1; } static int __ldsem_down_write_nested(struct ld_semaphore *sem, int subclass, long timeout) { long count; rwsem_acquire(&sem->dep_map, subclass, 0, _RET_IP_); count = atomic_long_add_return(LDSEM_WRITE_BIAS, &sem->count); if ((count & LDSEM_ACTIVE_MASK) != LDSEM_ACTIVE_BIAS) { lock_contended(&sem->dep_map, _RET_IP_); if (!down_write_failed(sem, count, timeout)) { rwsem_release(&sem->dep_map, _RET_IP_); return 0; } } lock_acquired(&sem->dep_map, _RET_IP_); return 1; } /* * lock for reading -- returns 1 if successful, 0 if timed out */ int __sched ldsem_down_read(struct ld_semaphore *sem, long timeout) { might_sleep(); return __ldsem_down_read_nested(sem, 0, timeout); } /* * trylock for reading -- returns 1 if successful, 0 if contention */ int ldsem_down_read_trylock(struct ld_semaphore *sem) { long count = atomic_long_read(&sem->count); while (count >= 0) { if (atomic_long_try_cmpxchg(&sem->count, &count, count + LDSEM_READ_BIAS)) { rwsem_acquire_read(&sem->dep_map, 0, 1, _RET_IP_); lock_acquired(&sem->dep_map, _RET_IP_); return 1; } } return 0; } /* * lock for writing -- returns 1 if successful, 0 if timed out */ int __sched ldsem_down_write(struct ld_semaphore *sem, long timeout) { might_sleep(); return __ldsem_down_write_nested(sem, 0, timeout); } /* * trylock for writing -- returns 1 if successful, 0 if contention */ int ldsem_down_write_trylock(struct ld_semaphore *sem) { long count = atomic_long_read(&sem->count); while ((count & LDSEM_ACTIVE_MASK) == 0) { if (atomic_long_try_cmpxchg(&sem->count, &count, count + LDSEM_WRITE_BIAS)) { rwsem_acquire(&sem->dep_map, 0, 1, _RET_IP_); lock_acquired(&sem->dep_map, _RET_IP_); return 1; } } return 0; } /* * release a read lock */ void ldsem_up_read(struct ld_semaphore *sem) { long count; rwsem_release(&sem->dep_map, _RET_IP_); count = atomic_long_add_return(-LDSEM_READ_BIAS, &sem->count); if (count < 0 && (count & LDSEM_ACTIVE_MASK) == 0) ldsem_wake(sem); } /* * release a write lock */ void ldsem_up_write(struct ld_semaphore *sem) { long count; rwsem_release(&sem->dep_map, _RET_IP_); count = atomic_long_add_return(-LDSEM_WRITE_BIAS, &sem->count); if (count < 0) ldsem_wake(sem); } #ifdef CONFIG_DEBUG_LOCK_ALLOC int ldsem_down_read_nested(struct ld_semaphore *sem, int subclass, long timeout) { might_sleep(); return __ldsem_down_read_nested(sem, subclass, timeout); } int ldsem_down_write_nested(struct ld_semaphore *sem, int subclass, long timeout) { might_sleep(); return __ldsem_down_write_nested(sem, subclass, timeout); } #endif |
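As a usage illustration (not part of ldsem.c), the sketch below takes the semaphore for writing with a timeout, following the return convention documented above: 1 means the lock was acquired, 0 means the wait timed out. The caller, the 5 second timeout and the error code are made up for the example, and the prototypes are assumed to come in via <linux/tty.h> as in this file.

#include <linux/jiffies.h>
#include <linux/tty.h>

static int example_take_ldisc_write_lock(struct ld_semaphore *sem)
{
	/* Wait at most 5 seconds for readers and writers to drain. */
	if (!ldsem_down_write(sem, 5 * HZ))
		return -EBUSY;		/* timed out, lock not taken */

	/* ...exclusive section: no readers or other writers here... */

	ldsem_up_write(sem);
	return 0;
}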
| 28 28 28 27 49 49 49 2 2 2 2 2 2 2 4 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 | // SPDX-License-Identifier: GPL-2.0-only // Copyright (c) 2020 Facebook Inc. #include <linux/debugfs.h> #include <linux/netdevice.h> #include <linux/slab.h> #include <net/udp_tunnel.h> #include "netdevsim.h" static int nsim_udp_tunnel_set_port(struct net_device *dev, unsigned int table, unsigned int entry, struct udp_tunnel_info *ti) { struct netdevsim *ns = netdev_priv(dev); int ret; ret = -ns->udp_ports.inject_error; ns->udp_ports.inject_error = 0; if (ns->udp_ports.sleep) msleep(ns->udp_ports.sleep); if (!ret) { if (ns->udp_ports.ports[table][entry]) { WARN(1, "entry already in use\n"); ret = -EBUSY; } else { ns->udp_ports.ports[table][entry] = be16_to_cpu(ti->port) << 16 | ti->type; } } netdev_info(dev, "set [%d, %d] type %d family %d port %d - %d\n", table, entry, ti->type, ti->sa_family, ntohs(ti->port), ret); return ret; } static int nsim_udp_tunnel_unset_port(struct net_device *dev, unsigned int table, unsigned int entry, struct udp_tunnel_info *ti) { struct netdevsim *ns = netdev_priv(dev); int ret; ret = -ns->udp_ports.inject_error; ns->udp_ports.inject_error = 0; if (ns->udp_ports.sleep) msleep(ns->udp_ports.sleep); if (!ret) { u32 val = be16_to_cpu(ti->port) << 16 | ti->type; if (val == ns->udp_ports.ports[table][entry]) { ns->udp_ports.ports[table][entry] = 0; } else { WARN(1, "entry not installed %x vs %x\n", val, ns->udp_ports.ports[table][entry]); ret = -ENOENT; } } netdev_info(dev, "unset [%d, %d] type %d family %d port %d - %d\n", table, entry, ti->type, ti->sa_family, ntohs(ti->port), ret); return ret; } static int nsim_udp_tunnel_sync_table(struct net_device *dev, unsigned int table) { struct netdevsim *ns = netdev_priv(dev); struct udp_tunnel_info ti; unsigned int i; int ret; ret = -ns->udp_ports.inject_error; ns->udp_ports.inject_error = 0; for (i = 0; i < NSIM_UDP_TUNNEL_N_PORTS; i++) { udp_tunnel_nic_get_port(dev, table, i, &ti); ns->udp_ports.ports[table][i] = be16_to_cpu(ti.port) << 16 | ti.type; } return ret; } static const struct udp_tunnel_nic_info nsim_udp_tunnel_info = { .set_port = nsim_udp_tunnel_set_port, .unset_port = nsim_udp_tunnel_unset_port, .sync_table = nsim_udp_tunnel_sync_table, .tables = { { .n_entries = NSIM_UDP_TUNNEL_N_PORTS, .tunnel_types = UDP_TUNNEL_TYPE_VXLAN, }, { .n_entries = NSIM_UDP_TUNNEL_N_PORTS, .tunnel_types = UDP_TUNNEL_TYPE_GENEVE | UDP_TUNNEL_TYPE_VXLAN_GPE, }, }, }; static ssize_t nsim_udp_tunnels_info_reset_write(struct file *file, const char __user *data, size_t count, loff_t *ppos) { struct net_device *dev = file->private_data; struct netdevsim *ns = netdev_priv(dev); rtnl_lock(); if (dev->reg_state == NETREG_REGISTERED) { memset(ns->udp_ports.ports, 0, sizeof(ns->udp_ports.__ports)); udp_tunnel_nic_reset_ntf(dev); } 
rtnl_unlock(); return count; } static const struct file_operations nsim_udp_tunnels_info_reset_fops = { .open = simple_open, .write = nsim_udp_tunnels_info_reset_write, .llseek = generic_file_llseek, .owner = THIS_MODULE, }; int nsim_udp_tunnels_info_create(struct nsim_dev *nsim_dev, struct net_device *dev) { struct netdevsim *ns = netdev_priv(dev); struct udp_tunnel_nic_info *info; if (nsim_dev->udp_ports.shared && nsim_dev->udp_ports.open_only) { dev_err(&nsim_dev->nsim_bus_dev->dev, "shared can't be used in conjunction with open_only\n"); return -EINVAL; } if (!nsim_dev->udp_ports.shared) ns->udp_ports.ports = ns->udp_ports.__ports; else ns->udp_ports.ports = nsim_dev->udp_ports.__ports; ns->udp_ports.ddir = debugfs_create_dir("udp_ports", ns->nsim_dev_port->ddir); debugfs_create_u32("inject_error", 0600, ns->udp_ports.ddir, &ns->udp_ports.inject_error); ns->udp_ports.dfs_ports[0].array = ns->udp_ports.ports[0]; ns->udp_ports.dfs_ports[0].n_elements = NSIM_UDP_TUNNEL_N_PORTS; debugfs_create_u32_array("table0", 0400, ns->udp_ports.ddir, &ns->udp_ports.dfs_ports[0]); ns->udp_ports.dfs_ports[1].array = ns->udp_ports.ports[1]; ns->udp_ports.dfs_ports[1].n_elements = NSIM_UDP_TUNNEL_N_PORTS; debugfs_create_u32_array("table1", 0400, ns->udp_ports.ddir, &ns->udp_ports.dfs_ports[1]); debugfs_create_file("reset", 0200, ns->udp_ports.ddir, dev, &nsim_udp_tunnels_info_reset_fops); /* Note: it's not normal to allocate the info struct like this! * Drivers are expected to use a static const one, here we're testing. */ info = kmemdup(&nsim_udp_tunnel_info, sizeof(nsim_udp_tunnel_info), GFP_KERNEL); if (!info) return -ENOMEM; ns->udp_ports.sleep = nsim_dev->udp_ports.sleep; if (nsim_dev->udp_ports.sync_all) { info->set_port = NULL; info->unset_port = NULL; } else { info->sync_table = NULL; } if (ns->udp_ports.sleep) info->flags |= UDP_TUNNEL_NIC_INFO_MAY_SLEEP; if (nsim_dev->udp_ports.open_only) info->flags |= UDP_TUNNEL_NIC_INFO_OPEN_ONLY; if (nsim_dev->udp_ports.ipv4_only) info->flags |= UDP_TUNNEL_NIC_INFO_IPV4_ONLY; if (nsim_dev->udp_ports.shared) info->shared = &nsim_dev->udp_ports.utn_shared; if (nsim_dev->udp_ports.static_iana_vxlan) info->flags |= UDP_TUNNEL_NIC_INFO_STATIC_IANA_VXLAN; dev->udp_tunnel_nic_info = info; return 0; } void nsim_udp_tunnels_info_destroy(struct net_device *dev) { struct netdevsim *ns = netdev_priv(dev); debugfs_remove_recursive(ns->udp_ports.ddir); kfree(dev->udp_tunnel_nic_info); dev->udp_tunnel_nic_info = NULL; } void nsim_udp_tunnels_debugfs_create(struct nsim_dev *nsim_dev) { debugfs_create_bool("udp_ports_sync_all", 0600, nsim_dev->ddir, &nsim_dev->udp_ports.sync_all); debugfs_create_bool("udp_ports_open_only", 0600, nsim_dev->ddir, &nsim_dev->udp_ports.open_only); debugfs_create_bool("udp_ports_ipv4_only", 0600, nsim_dev->ddir, &nsim_dev->udp_ports.ipv4_only); debugfs_create_bool("udp_ports_shared", 0600, nsim_dev->ddir, &nsim_dev->udp_ports.shared); debugfs_create_bool("udp_ports_static_iana_vxlan", 0600, nsim_dev->ddir, &nsim_dev->udp_ports.static_iana_vxlan); debugfs_create_u32("udp_ports_sleep", 0600, nsim_dev->ddir, &nsim_dev->udp_ports.sleep); } |
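The comment in nsim_udp_tunnels_info_create() above points out that allocating the info struct with kmemdup() is a test-only quirk and that real drivers are expected to use a static const table. A hedged sketch of what such a declaration might look like for a hypothetical driver follows; the driver name, callbacks and table size are invented, and only fields already visible above are used.

#include <linux/netdevice.h>
#include <net/udp_tunnel.h>

static int foo_udp_tunnel_set_port(struct net_device *dev, unsigned int table,
				   unsigned int entry, struct udp_tunnel_info *ti)
{
	/* Program hardware offload slot table/entry with ti->port and ti->type. */
	return 0;
}

static int foo_udp_tunnel_unset_port(struct net_device *dev, unsigned int table,
				     unsigned int entry, struct udp_tunnel_info *ti)
{
	/* Clear hardware offload slot table/entry. */
	return 0;
}

static const struct udp_tunnel_nic_info foo_udp_tunnel_info = {
	.set_port	= foo_udp_tunnel_set_port,
	.unset_port	= foo_udp_tunnel_unset_port,
	.flags		= UDP_TUNNEL_NIC_INFO_MAY_SLEEP,
	.tables		= {
		{
			.n_entries	= 4,
			.tunnel_types	= UDP_TUNNEL_TYPE_VXLAN,
		},
	},
};

/* At probe time the driver would simply do:
 *	dev->udp_tunnel_nic_info = &foo_udp_tunnel_info;
 */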
| 67 67 66 67 68 66 1 67 90 89 90 90 90 86 88 89 54 40 59 1 23 1 1 2 14 38 16 35 19 12 2 2 19 56 8 8 2 1 1 4 2 110 110 67 110 81 120 121 120 119 116 119 117 1 30 1 5 2 2 81 110 109 108 1 110 238 241 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 | // SPDX-License-Identifier: GPL-2.0-or-later /* * Handle incoming frames * Linux ethernet bridge * * Authors: * Lennert Buytenhek <buytenh@gnu.org> */ #include <linux/slab.h> #include <linux/kernel.h> #include <linux/netdevice.h> #include <linux/etherdevice.h> #include <linux/netfilter_bridge.h> #ifdef CONFIG_NETFILTER_FAMILY_BRIDGE #include <net/netfilter/nf_queue.h> #endif #include <linux/neighbour.h> #include <net/arp.h> #include <net/dsa.h> #include <linux/export.h> #include <linux/rculist.h> #include "br_private.h" #include "br_private_tunnel.h" static int br_netif_receive_skb(struct net *net, struct sock *sk, struct sk_buff *skb) { br_drop_fake_rtable(skb); return netif_receive_skb(skb); } static int br_pass_frame_up(struct sk_buff *skb, bool promisc) { struct net_device *indev, *brdev = BR_INPUT_SKB_CB(skb)->brdev; struct net_bridge *br = netdev_priv(brdev); struct net_bridge_vlan_group *vg; dev_sw_netstats_rx_add(brdev, skb->len); vg = br_vlan_group_rcu(br); /* Reset the offload_fwd_mark because there could be a stacked * bridge above, and it should not think this bridge it doing * that bridge's work forwarding out its ports. */ br_switchdev_frame_unmark(skb); /* Bridge is just like any other port. Make sure the * packet is allowed except in promisc mode when someone * may be running packet capture. 
*/ if (!(brdev->flags & IFF_PROMISC) && !br_allowed_egress(vg, skb)) { kfree_skb(skb); return NET_RX_DROP; } indev = skb->dev; skb->dev = brdev; skb = br_handle_vlan(br, NULL, vg, skb); if (!skb) return NET_RX_DROP; /* update the multicast stats if the packet is IGMP/MLD */ br_multicast_count(br, NULL, skb, br_multicast_igmp_type(skb), BR_MCAST_DIR_TX); BR_INPUT_SKB_CB(skb)->promisc = promisc; return NF_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_IN, dev_net(indev), NULL, skb, indev, NULL, br_netif_receive_skb); } /* note: already called with rcu_read_lock */ int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb) { enum skb_drop_reason reason = SKB_DROP_REASON_NOT_SPECIFIED; struct net_bridge_port *p = br_port_get_rcu(skb->dev); enum br_pkt_type pkt_type = BR_PKT_UNICAST; struct net_bridge_fdb_entry *dst = NULL; struct net_bridge_mcast_port *pmctx; struct net_bridge_mdb_entry *mdst; bool local_rcv, mcast_hit = false; struct net_bridge_mcast *brmctx; struct net_bridge_vlan *vlan; struct net_bridge *br; bool promisc; u16 vid = 0; u8 state; if (!p) goto drop; br = p->br; if (br_mst_is_enabled(br)) { state = BR_STATE_FORWARDING; } else { if (p->state == BR_STATE_DISABLED) { reason = SKB_DROP_REASON_BRIDGE_INGRESS_STP_STATE; goto drop; } state = p->state; } brmctx = &p->br->multicast_ctx; pmctx = &p->multicast_ctx; if (!br_allowed_ingress(p->br, nbp_vlan_group_rcu(p), skb, &vid, &state, &vlan)) goto out; if (p->flags & BR_PORT_LOCKED) { struct net_bridge_fdb_entry *fdb_src = br_fdb_find_rcu(br, eth_hdr(skb)->h_source, vid); if (!fdb_src) { /* FDB miss. Create locked FDB entry if MAB is enabled * and drop the packet. */ if (p->flags & BR_PORT_MAB) br_fdb_update(br, p, eth_hdr(skb)->h_source, vid, BIT(BR_FDB_LOCKED)); goto drop; } else if (READ_ONCE(fdb_src->dst) != p || test_bit(BR_FDB_LOCAL, &fdb_src->flags)) { /* FDB mismatch. Drop the packet without roaming. */ goto drop; } else if (test_bit(BR_FDB_LOCKED, &fdb_src->flags)) { /* FDB match, but entry is locked. Refresh it and drop * the packet. 
*/ br_fdb_update(br, p, eth_hdr(skb)->h_source, vid, BIT(BR_FDB_LOCKED)); goto drop; } } nbp_switchdev_frame_mark(p, skb); /* insert into forwarding database after filtering to avoid spoofing */ if (p->flags & BR_LEARNING) br_fdb_update(br, p, eth_hdr(skb)->h_source, vid, 0); promisc = !!(br->dev->flags & IFF_PROMISC); local_rcv = promisc; if (is_multicast_ether_addr(eth_hdr(skb)->h_dest)) { /* by definition the broadcast is also a multicast address */ if (is_broadcast_ether_addr(eth_hdr(skb)->h_dest)) { pkt_type = BR_PKT_BROADCAST; local_rcv = true; } else { pkt_type = BR_PKT_MULTICAST; if (br_multicast_rcv(&brmctx, &pmctx, vlan, skb, vid)) goto drop; } } if (state == BR_STATE_LEARNING) { reason = SKB_DROP_REASON_BRIDGE_INGRESS_STP_STATE; goto drop; } BR_INPUT_SKB_CB(skb)->brdev = br->dev; BR_INPUT_SKB_CB(skb)->src_port_isolated = !!(p->flags & BR_ISOLATED); if (IS_ENABLED(CONFIG_INET) && (skb->protocol == htons(ETH_P_ARP) || skb->protocol == htons(ETH_P_RARP))) { br_do_proxy_suppress_arp(skb, br, vid, p); } else if (IS_ENABLED(CONFIG_IPV6) && skb->protocol == htons(ETH_P_IPV6) && br_opt_get(br, BROPT_NEIGH_SUPPRESS_ENABLED) && pskb_may_pull(skb, sizeof(struct ipv6hdr) + sizeof(struct nd_msg)) && ipv6_hdr(skb)->nexthdr == IPPROTO_ICMPV6) { struct nd_msg *msg, _msg; msg = br_is_nd_neigh_msg(skb, &_msg); if (msg) br_do_suppress_nd(skb, br, vid, p, msg); } switch (pkt_type) { case BR_PKT_MULTICAST: mdst = br_mdb_entry_skb_get(brmctx, skb, vid); if ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) && br_multicast_querier_exists(brmctx, eth_hdr(skb), mdst)) { if ((mdst && mdst->host_joined) || br_multicast_is_router(brmctx, skb)) { local_rcv = true; DEV_STATS_INC(br->dev, multicast); } mcast_hit = true; } else { local_rcv = true; DEV_STATS_INC(br->dev, multicast); } break; case BR_PKT_UNICAST: dst = br_fdb_find_rcu(br, eth_hdr(skb)->h_dest, vid); break; default: break; } if (dst) { unsigned long now = jiffies; if (test_bit(BR_FDB_LOCAL, &dst->flags)) return br_pass_frame_up(skb, false); if (now != dst->used) dst->used = now; br_forward(dst->dst, skb, local_rcv, false); } else { if (!mcast_hit) br_flood(br, skb, pkt_type, local_rcv, false, vid); else br_multicast_flood(mdst, skb, brmctx, local_rcv, false); } if (local_rcv) return br_pass_frame_up(skb, promisc); out: return 0; drop: kfree_skb_reason(skb, reason); goto out; } EXPORT_SYMBOL_GPL(br_handle_frame_finish); static void __br_handle_local_finish(struct sk_buff *skb) { struct net_bridge_port *p = br_port_get_rcu(skb->dev); u16 vid = 0; /* check if vlan is allowed, to avoid spoofing */ if ((p->flags & BR_LEARNING) && nbp_state_should_learn(p) && !br_opt_get(p->br, BROPT_NO_LL_LEARN) && br_should_learn(p, skb, &vid)) br_fdb_update(p->br, p, eth_hdr(skb)->h_source, vid, 0); } /* note: already called with rcu_read_lock */ static int br_handle_local_finish(struct net *net, struct sock *sk, struct sk_buff *skb) { __br_handle_local_finish(skb); /* return 1 to signal the okfn() was called so it's ok to use the skb */ return 1; } static int nf_hook_bridge_pre(struct sk_buff *skb, struct sk_buff **pskb) { #ifdef CONFIG_NETFILTER_FAMILY_BRIDGE struct nf_hook_entries *e = NULL; struct nf_hook_state state; unsigned int verdict, i; struct net *net; int ret; net = dev_net(skb->dev); #ifdef HAVE_JUMP_LABEL if (!static_key_false(&nf_hooks_needed[NFPROTO_BRIDGE][NF_BR_PRE_ROUTING])) goto frame_finish; #endif e = rcu_dereference(net->nf.hooks_bridge[NF_BR_PRE_ROUTING]); if (!e) goto frame_finish; nf_hook_state_init(&state, NF_BR_PRE_ROUTING, NFPROTO_BRIDGE, 
skb->dev, NULL, NULL, net, br_handle_frame_finish); for (i = 0; i < e->num_hook_entries; i++) { verdict = nf_hook_entry_hookfn(&e->hooks[i], skb, &state); switch (verdict & NF_VERDICT_MASK) { case NF_ACCEPT: if (BR_INPUT_SKB_CB(skb)->br_netfilter_broute) { *pskb = skb; return RX_HANDLER_PASS; } break; case NF_DROP: kfree_skb(skb); return RX_HANDLER_CONSUMED; case NF_QUEUE: ret = nf_queue(skb, &state, i, verdict); if (ret == 1) continue; return RX_HANDLER_CONSUMED; default: /* STOLEN */ return RX_HANDLER_CONSUMED; } } frame_finish: net = dev_net(skb->dev); br_handle_frame_finish(net, NULL, skb); #else br_handle_frame_finish(dev_net(skb->dev), NULL, skb); #endif return RX_HANDLER_CONSUMED; } /* Return 0 if the frame was not processed otherwise 1 * note: already called with rcu_read_lock */ static int br_process_frame_type(struct net_bridge_port *p, struct sk_buff *skb) { struct br_frame_type *tmp; hlist_for_each_entry_rcu(tmp, &p->br->frame_type_list, list) if (unlikely(tmp->type == skb->protocol)) return tmp->frame_handler(p, skb); return 0; } /* * Return NULL if skb is handled * note: already called with rcu_read_lock */ static rx_handler_result_t br_handle_frame(struct sk_buff **pskb) { enum skb_drop_reason reason = SKB_DROP_REASON_NOT_SPECIFIED; struct net_bridge_port *p; struct sk_buff *skb = *pskb; const unsigned char *dest = eth_hdr(skb)->h_dest; if (unlikely(skb->pkt_type == PACKET_LOOPBACK)) return RX_HANDLER_PASS; if (!is_valid_ether_addr(eth_hdr(skb)->h_source)) { reason = SKB_DROP_REASON_MAC_INVALID_SOURCE; goto drop; } skb = skb_share_check(skb, GFP_ATOMIC); if (!skb) return RX_HANDLER_CONSUMED; memset(skb->cb, 0, sizeof(struct br_input_skb_cb)); br_tc_skb_miss_set(skb, false); p = br_port_get_rcu(skb->dev); if (p->flags & BR_VLAN_TUNNEL) br_handle_ingress_vlan_tunnel(skb, p, nbp_vlan_group_rcu(p)); if (unlikely(is_link_local_ether_addr(dest))) { u16 fwd_mask = p->br->group_fwd_mask_required; /* * See IEEE 802.1D Table 7-10 Reserved addresses * * Assignment Value * Bridge Group Address 01-80-C2-00-00-00 * (MAC Control) 802.3 01-80-C2-00-00-01 * (Link Aggregation) 802.3 01-80-C2-00-00-02 * 802.1X PAE address 01-80-C2-00-00-03 * * 802.1AB LLDP 01-80-C2-00-00-0E * * Others reserved for future standardization */ fwd_mask |= p->group_fwd_mask; switch (dest[5]) { case 0x00: /* Bridge Group Address */ /* If STP is turned off, then must forward to keep loop detection */ if (p->br->stp_enabled == BR_NO_STP || fwd_mask & (1u << dest[5])) goto forward; *pskb = skb; __br_handle_local_finish(skb); return RX_HANDLER_PASS; case 0x01: /* IEEE MAC (Pause) */ reason = SKB_DROP_REASON_MAC_IEEE_MAC_CONTROL; goto drop; case 0x0E: /* 802.1AB LLDP */ fwd_mask |= p->br->group_fwd_mask; if (fwd_mask & (1u << dest[5])) goto forward; *pskb = skb; __br_handle_local_finish(skb); return RX_HANDLER_PASS; default: /* Allow selective forwarding for most other protocols */ fwd_mask |= p->br->group_fwd_mask; if (fwd_mask & (1u << dest[5])) goto forward; } BR_INPUT_SKB_CB(skb)->promisc = false; /* The else clause should be hit when nf_hook(): * - returns < 0 (drop/error) * - returns = 0 (stolen/nf_queue) * Thus return 1 from the okfn() to signal the skb is ok to pass */ if (NF_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_IN, dev_net(skb->dev), NULL, skb, skb->dev, NULL, br_handle_local_finish) == 1) { return RX_HANDLER_PASS; } else { return RX_HANDLER_CONSUMED; } } if (unlikely(br_process_frame_type(p, skb))) return RX_HANDLER_PASS; forward: if (br_mst_is_enabled(p->br)) goto defer_stp_filtering; switch (p->state) { case 
BR_STATE_FORWARDING: case BR_STATE_LEARNING: defer_stp_filtering: if (ether_addr_equal(p->br->dev->dev_addr, dest)) skb->pkt_type = PACKET_HOST; return nf_hook_bridge_pre(skb, pskb); default: reason = SKB_DROP_REASON_BRIDGE_INGRESS_STP_STATE; drop: kfree_skb_reason(skb, reason); } return RX_HANDLER_CONSUMED; } /* This function has no purpose other than to appease the br_port_get_rcu/rtnl * helpers which identify bridged ports according to the rx_handler installed * on them (so there _needs_ to be a bridge rx_handler even if we don't need it * to do anything useful). This bridge won't support traffic to/from the stack, * but only hardware bridging. So return RX_HANDLER_PASS so we don't steal * frames from the ETH_P_XDSA packet_type handler. */ static rx_handler_result_t br_handle_frame_dummy(struct sk_buff **pskb) { return RX_HANDLER_PASS; } rx_handler_func_t *br_get_rx_handler(const struct net_device *dev) { if (netdev_uses_dsa(dev)) return br_handle_frame_dummy; return br_handle_frame; } void br_add_frame(struct net_bridge *br, struct br_frame_type *ft) { hlist_add_head_rcu(&ft->list, &br->frame_type_list); } void br_del_frame(struct net_bridge *br, struct br_frame_type *ft) { struct br_frame_type *tmp; hlist_for_each_entry(tmp, &br->frame_type_list, list) if (ft == tmp) { hlist_del_rcu(&ft->list); return; } } |
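br_process_frame_type() above dispatches on skb->protocol to handlers registered with br_add_frame(), and br_handle_frame() returns RX_HANDLER_PASS when such a handler returns non-zero. The sketch below shows a hypothetical registration; the EtherType, handler and call sites are illustrative, and the field names are assumed to match struct br_frame_type in br_private.h.

#include "br_private.h"

static int example_frame_handler(struct net_bridge_port *p, struct sk_buff *skb)
{
	/*
	 * Returning non-zero makes br_handle_frame() return RX_HANDLER_PASS,
	 * i.e. the frame bypasses bridging and is handed to the local stack.
	 */
	return 1;
}

static struct br_frame_type example_frame_type = {
	.type		= cpu_to_be16(0x88B5),	/* IEEE 802 local experimental EtherType */
	.frame_handler	= example_frame_handler,
};

/* During setup:    br_add_frame(br, &example_frame_type);  */
/* During teardown: br_del_frame(br, &example_frame_type);  */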
| 890 668 1 51 307 10 436 26 258 225 35 259 260 3 257 256 257 10 115 56 68 11 180 69 5 5 73 2 1 76 6 6 67 67 67 10 57 67 57 10 67 57 9 6 6 7 7 114 114 113 111 113 44 70 1 76 5 5 23 25 23 2 23 21 17 1 16 16 1 2 1 6 5 5 1 7 7 2 5 46 2 11 33 33 6 24 2 7 20 25 29 21 5 1 5 4 1 3 2 8 2 7 2 22 1 10 10 10 1 7 2 9 7 1 6 1 1 2 152 146 1 5 6 4 2 13 1 6 6 43 43 35 1 1 1 33 7 13 22 7 7 1681 1505 198 399 80 471 11 473 91 714 1 1 1 1 1 5 1 5 12 12 5 7 12 36 5 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 
// SPDX-License-Identifier: GPL-2.0-or-later
/* linux/net/ipv4/arp.c
 *
 * Copyright (C) 1994 by Florian La Roche
 *
 * This module implements the Address Resolution
 * Protocol ARP (RFC 826), which is used to convert IP addresses (or in
 * the future maybe other high-level addresses) into a low-level hardware
 * address (like an Ethernet address).
 *
 * Fixes:
 *	Alan Cox	:	Removed the Ethernet assumptions in
 *				Florian's code
 *	Alan Cox	:	Fixed some small errors in the ARP
 *				logic
 *	Alan Cox	:	Allow >4K in /proc
 *	Alan Cox	:	Make ARP add its own protocol entry
 *	Ross Martin	:	Rewrote arp_rcv() and arp_get_info()
 *	Stephen Henson	:	Add AX25 support to arp_get_info()
 *	Alan Cox	:	Drop data when a device is downed.
 *	Alan Cox	:	Use init_timer().
 *	Alan Cox	:	Double lock fixes.
 *	Martin Seine	:	Move the arphdr structure
 *				to if_arp.h for compatibility.
 *				with BSD based programs.
 *	Andrew Tridgell	:	Added ARP netmask code and
 *				re-arranged proxy handling.
 *	Alan Cox	:	Changed to use notifiers.
 *	Niibe Yutaka	:	Reply for this device or proxies only.
 *	Alan Cox	:	Don't proxy across hardware types!
 *	Jonathan Naylor	:	Added support for NET/ROM.
 *	Mike Shaver	:	RFC1122 checks.
 *	Jonathan Naylor	:	Only lookup the hardware address for
 *				the correct hardware type.
 *	Germano Caronni	:	Assorted subtle races.
 *	Craig Schlenter	:	Don't modify permanent entry
 *				during arp_rcv.
 *	Russ Nelson	:	Tidied up a few bits.
 *	Alexey Kuznetsov:	Major changes to caching and behaviour,
 *				eg intelligent arp probing and
 *				generation of host down events.
 *	Alan Cox	:	Missing unlock in device events.
 *	Eckes		:	ARP ioctl control errors.
 *	Alexey Kuznetsov:	Arp free fix.
 *	Manuel Rodriguez:	Gratuitous ARP.
 *	Jonathan Layes	:	Added arpd support through kerneld
 *				message queue (960314)
 *	Mike Shaver	:	/proc/sys/net/ipv4/arp_* support
 *	Mike McLagan	:	Routing by source
 *	Stuart Cheshire	:	Metricom and grat arp fixes
 *				*** FOR 2.1 clean this up ***
 *	Lawrence V. Stefani: (08/12/96) Added FDDI support.
 *	Alan Cox	:	Took the AP1000 nasty FDDI hack and
 *				folded into the mainstream FDDI code.
 *				Ack spit, Linus how did you allow that
 *				one in...
 *	Jes Sorensen	:	Make FDDI work again in 2.1.x and
 *				clean up the APFDDI & gen. FDDI bits.
 *	Alexey Kuznetsov:	new arp state machine;
 *				now it is in net/core/neighbour.c.
 *	Krzysztof Halasa:	Added Frame Relay ARP support.
 *	Arnaldo C. Melo	:	convert /proc/net/arp to seq_file
 *	Shmulik Hen	:	Split arp_send to arp_create and
 *				arp_xmit so intermediate drivers like
 *				bonding can change the skb before
 *				sending (e.g. insert 8021q tag).
 *	Harald Welte	:	convert to make use of jenkins hash
 *	Jesper D. Brouer:	Proxy ARP PVLAN RFC 3069 support.
 */

#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/module.h>
#include <linux/types.h>
#include <linux/string.h>
#include <linux/kernel.h>
#include <linux/capability.h>
#include <linux/socket.h>
#include <linux/sockios.h>
#include <linux/errno.h>
#include <linux/in.h>
#include <linux/mm.h>
#include <linux/inet.h>
#include <linux/inetdevice.h>
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <linux/fddidevice.h>
#include <linux/if_arp.h>
#include <linux/skbuff.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <linux/stat.h>
#include <linux/init.h>
#include <linux/net.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>
#ifdef CONFIG_SYSCTL
#include <linux/sysctl.h>
#endif

#include <net/net_namespace.h>
#include <net/ip.h>
#include <net/icmp.h>
#include <net/route.h>
#include <net/protocol.h>
#include <net/tcp.h>
#include <net/sock.h>
#include <net/arp.h>
#include <net/ax25.h>
#include <net/netrom.h>
#include <net/dst_metadata.h>
#include <net/ip_tunnels.h>

#include <linux/uaccess.h>

#include <linux/netfilter_arp.h>

/*
 *	Interface to generic neighbour cache.
 */
static u32 arp_hash(const void *pkey, const struct net_device *dev, __u32 *hash_rnd);
static bool arp_key_eq(const struct neighbour *n, const void *pkey);
static int arp_constructor(struct neighbour *neigh);
static void arp_solicit(struct neighbour *neigh, struct sk_buff *skb);
static void arp_error_report(struct neighbour *neigh, struct sk_buff *skb);
static void parp_redo(struct sk_buff *skb);
static int arp_is_multicast(const void *pkey);

static const struct neigh_ops arp_generic_ops = {
	.family =		AF_INET,
	.solicit =		arp_solicit,
	.error_report =		arp_error_report,
	.output =		neigh_resolve_output,
	.connected_output =	neigh_connected_output,
};

static const struct neigh_ops arp_hh_ops = {
	.family =		AF_INET,
	.solicit =		arp_solicit,
	.error_report =		arp_error_report,
	.output =		neigh_resolve_output,
	.connected_output =	neigh_resolve_output,
};

static const struct neigh_ops arp_direct_ops = {
	.family =		AF_INET,
	.output =		neigh_direct_output,
	.connected_output =	neigh_direct_output,
};

struct neigh_table arp_tbl = {
	.family		= AF_INET,
	.key_len	= 4,
	.protocol	= cpu_to_be16(ETH_P_IP),
	.hash		= arp_hash,
	.key_eq		= arp_key_eq,
	.constructor	= arp_constructor,
	.proxy_redo	= parp_redo,
	.is_multicast	= arp_is_multicast,
	.id		= "arp_cache",
	.parms		= {
		.tbl			= &arp_tbl,
		.reachable_time		= 30 * HZ,
		.data	= {
			[NEIGH_VAR_MCAST_PROBES] = 3,
			[NEIGH_VAR_UCAST_PROBES] = 3,
			[NEIGH_VAR_RETRANS_TIME] = 1 * HZ,
			[NEIGH_VAR_BASE_REACHABLE_TIME] = 30 * HZ,
			[NEIGH_VAR_DELAY_PROBE_TIME] = 5 * HZ,
			[NEIGH_VAR_INTERVAL_PROBE_TIME_MS] = 5 * HZ,
			[NEIGH_VAR_GC_STALETIME] = 60 * HZ,
			[NEIGH_VAR_QUEUE_LEN_BYTES] = SK_WMEM_MAX,
			[NEIGH_VAR_PROXY_QLEN] = 64,
			[NEIGH_VAR_ANYCAST_DELAY] = 1 * HZ,
			[NEIGH_VAR_PROXY_DELAY] = (8 * HZ) / 10,
			[NEIGH_VAR_LOCKTIME] = 1 * HZ,
		},
	},
	.gc_interval	= 30 * HZ,
	.gc_thresh1	= 128,
	.gc_thresh2	= 512,
	.gc_thresh3	= 1024,
};
EXPORT_SYMBOL(arp_tbl);

int arp_mc_map(__be32 addr, u8 *haddr, struct net_device *dev, int dir)
{
	switch (dev->type) {
	case ARPHRD_ETHER:
	case ARPHRD_FDDI:
	case ARPHRD_IEEE802:
		ip_eth_mc_map(addr, haddr);
		return 0;
	case ARPHRD_INFINIBAND:
		ip_ib_mc_map(addr, dev->broadcast, haddr);
		return 0;
	case ARPHRD_IPGRE:
		ip_ipgre_mc_map(addr, dev->broadcast, haddr);
		return 0;
	default:
		if (dir) {
			memcpy(haddr, dev->broadcast, dev->addr_len);
			return 0;
		}
	}
	return -EINVAL;
}

static u32 arp_hash(const void *pkey,
		    const struct net_device *dev,
		    __u32 *hash_rnd)
{
	return arp_hashfn(pkey, dev, hash_rnd);
} static bool arp_key_eq(const struct neighbour *neigh, const void *pkey) { return neigh_key_eq32(neigh, pkey); } static int arp_constructor(struct neighbour *neigh) { __be32 addr; struct net_device *dev = neigh->dev; struct in_device *in_dev; struct neigh_parms *parms; u32 inaddr_any = INADDR_ANY; if (dev->flags & (IFF_LOOPBACK | IFF_POINTOPOINT)) memcpy(neigh->primary_key, &inaddr_any, arp_tbl.key_len); addr = *(__be32 *)neigh->primary_key; rcu_read_lock(); in_dev = __in_dev_get_rcu(dev); if (!in_dev) { rcu_read_unlock(); return -EINVAL; } neigh->type = inet_addr_type_dev_table(dev_net(dev), dev, addr); parms = in_dev->arp_parms; __neigh_parms_put(neigh->parms); neigh->parms = neigh_parms_clone(parms); rcu_read_unlock(); if (!dev->header_ops) { neigh->nud_state = NUD_NOARP; neigh->ops = &arp_direct_ops; neigh->output = neigh_direct_output; } else { /* Good devices (checked by reading texts, but only Ethernet is tested) ARPHRD_ETHER: (ethernet, apfddi) ARPHRD_FDDI: (fddi) ARPHRD_IEEE802: (tr) ARPHRD_METRICOM: (strip) ARPHRD_ARCNET: etc. etc. etc. ARPHRD_IPDDP will also work, if author repairs it. I did not it, because this driver does not work even in old paradigm. */ if (neigh->type == RTN_MULTICAST) { neigh->nud_state = NUD_NOARP; arp_mc_map(addr, neigh->ha, dev, 1); } else if (dev->flags & (IFF_NOARP | IFF_LOOPBACK)) { neigh->nud_state = NUD_NOARP; memcpy(neigh->ha, dev->dev_addr, dev->addr_len); } else if (neigh->type == RTN_BROADCAST || (dev->flags & IFF_POINTOPOINT)) { neigh->nud_state = NUD_NOARP; memcpy(neigh->ha, dev->broadcast, dev->addr_len); } if (dev->header_ops->cache) neigh->ops = &arp_hh_ops; else neigh->ops = &arp_generic_ops; if (neigh->nud_state & NUD_VALID) neigh->output = neigh->ops->connected_output; else neigh->output = neigh->ops->output; } return 0; } static void arp_error_report(struct neighbour *neigh, struct sk_buff *skb) { dst_link_failure(skb); kfree_skb_reason(skb, SKB_DROP_REASON_NEIGH_FAILED); } /* Create and send an arp packet. */ static void arp_send_dst(int type, int ptype, __be32 dest_ip, struct net_device *dev, __be32 src_ip, const unsigned char *dest_hw, const unsigned char *src_hw, const unsigned char *target_hw, struct dst_entry *dst) { struct sk_buff *skb; /* arp on this interface. 
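 *
 * Illustrative note (not part of the original file): the arp_send()
 * wrapper defined just below is the usual entry point.  A driver
 * announcing an address takeover might, for example, emit a gratuitous
 * ARP with something like
 *
 *	arp_send(ARPOP_REQUEST, ETH_P_ARP, my_ip, dev, my_ip,
 *		 NULL, dev->dev_addr, NULL);
 *
 * where my_ip is a hypothetical __be32 holding the announced address;
 * passing dest_hw == NULL makes arp_create() fall back to the device
 * broadcast address.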
*/ if (dev->flags & IFF_NOARP) return; skb = arp_create(type, ptype, dest_ip, dev, src_ip, dest_hw, src_hw, target_hw); if (!skb) return; skb_dst_set(skb, dst_clone(dst)); arp_xmit(skb); } void arp_send(int type, int ptype, __be32 dest_ip, struct net_device *dev, __be32 src_ip, const unsigned char *dest_hw, const unsigned char *src_hw, const unsigned char *target_hw) { arp_send_dst(type, ptype, dest_ip, dev, src_ip, dest_hw, src_hw, target_hw, NULL); } EXPORT_SYMBOL(arp_send); static void arp_solicit(struct neighbour *neigh, struct sk_buff *skb) { __be32 saddr = 0; u8 dst_ha[MAX_ADDR_LEN], *dst_hw = NULL; struct net_device *dev = neigh->dev; __be32 target = *(__be32 *)neigh->primary_key; int probes = atomic_read(&neigh->probes); struct in_device *in_dev; struct dst_entry *dst = NULL; rcu_read_lock(); in_dev = __in_dev_get_rcu(dev); if (!in_dev) { rcu_read_unlock(); return; } switch (IN_DEV_ARP_ANNOUNCE(in_dev)) { default: case 0: /* By default announce any local IP */ if (skb && inet_addr_type_dev_table(dev_net(dev), dev, ip_hdr(skb)->saddr) == RTN_LOCAL) saddr = ip_hdr(skb)->saddr; break; case 1: /* Restrict announcements of saddr in same subnet */ if (!skb) break; saddr = ip_hdr(skb)->saddr; if (inet_addr_type_dev_table(dev_net(dev), dev, saddr) == RTN_LOCAL) { /* saddr should be known to target */ if (inet_addr_onlink(in_dev, target, saddr)) break; } saddr = 0; break; case 2: /* Avoid secondary IPs, get a primary/preferred one */ break; } rcu_read_unlock(); if (!saddr) saddr = inet_select_addr(dev, target, RT_SCOPE_LINK); probes -= NEIGH_VAR(neigh->parms, UCAST_PROBES); if (probes < 0) { if (!(READ_ONCE(neigh->nud_state) & NUD_VALID)) pr_debug("trying to ucast probe in NUD_INVALID\n"); neigh_ha_snapshot(dst_ha, neigh, dev); dst_hw = dst_ha; } else { probes -= NEIGH_VAR(neigh->parms, APP_PROBES); if (probes < 0) { neigh_app_ns(neigh); return; } } if (skb && !(dev->priv_flags & IFF_XMIT_DST_RELEASE)) dst = skb_dst(skb); arp_send_dst(ARPOP_REQUEST, ETH_P_ARP, target, dev, saddr, dst_hw, dev->dev_addr, NULL, dst); } static int arp_ignore(struct in_device *in_dev, __be32 sip, __be32 tip) { struct net *net = dev_net(in_dev->dev); int scope; switch (IN_DEV_ARP_IGNORE(in_dev)) { case 0: /* Reply, the tip is already validated */ return 0; case 1: /* Reply only if tip is configured on the incoming interface */ sip = 0; scope = RT_SCOPE_HOST; break; case 2: /* * Reply only if tip is configured on the incoming interface * and is in same subnet as sip */ scope = RT_SCOPE_HOST; break; case 3: /* Do not reply for scope host addresses */ sip = 0; scope = RT_SCOPE_LINK; in_dev = NULL; break; case 4: /* Reserved */ case 5: case 6: case 7: return 0; case 8: /* Do not reply */ return 1; default: return 0; } return !inet_confirm_addr(net, in_dev, sip, tip, scope); } static int arp_accept(struct in_device *in_dev, __be32 sip) { struct net *net = dev_net(in_dev->dev); int scope = RT_SCOPE_LINK; switch (IN_DEV_ARP_ACCEPT(in_dev)) { case 0: /* Don't create new entries from garp */ return 0; case 1: /* Create new entries from garp */ return 1; case 2: /* Create a neighbor in the arp table only if sip * is in the same subnet as an address configured * on the interface that received the garp message */ return !!inet_confirm_addr(net, in_dev, sip, 0, scope); default: return 0; } } static int arp_filter(__be32 sip, __be32 tip, struct net_device *dev) { struct rtable *rt; int flag = 0; /*unsigned long now; */ struct net *net = dev_net(dev); rt = ip_route_output(net, sip, tip, 0, l3mdev_master_ifindex_rcu(dev), 
RT_SCOPE_UNIVERSE); if (IS_ERR(rt)) return 1; if (rt->dst.dev != dev) { __NET_INC_STATS(net, LINUX_MIB_ARPFILTER); flag = 1; } ip_rt_put(rt); return flag; } /* * Check if we can use proxy ARP for this path */ static inline int arp_fwd_proxy(struct in_device *in_dev, struct net_device *dev, struct rtable *rt) { struct in_device *out_dev; int imi, omi = -1; if (rt->dst.dev == dev) return 0; if (!IN_DEV_PROXY_ARP(in_dev)) return 0; imi = IN_DEV_MEDIUM_ID(in_dev); if (imi == 0) return 1; if (imi == -1) return 0; /* place to check for proxy_arp for routes */ out_dev = __in_dev_get_rcu(rt->dst.dev); if (out_dev) omi = IN_DEV_MEDIUM_ID(out_dev); return omi != imi && omi != -1; } /* * Check for RFC3069 proxy arp private VLAN (allow to send back to same dev) * * RFC3069 supports proxy arp replies back to the same interface. This * is done to support (ethernet) switch features, like RFC 3069, where * the individual ports are not allowed to communicate with each * other, BUT they are allowed to talk to the upstream router. As * described in RFC 3069, it is possible to allow these hosts to * communicate through the upstream router, by proxy_arp'ing. * * RFC 3069: "VLAN Aggregation for Efficient IP Address Allocation" * * This technology is known by different names: * In RFC 3069 it is called VLAN Aggregation. * Cisco and Allied Telesyn call it Private VLAN. * Hewlett-Packard call it Source-Port filtering or port-isolation. * Ericsson call it MAC-Forced Forwarding (RFC Draft). * */ static inline int arp_fwd_pvlan(struct in_device *in_dev, struct net_device *dev, struct rtable *rt, __be32 sip, __be32 tip) { /* Private VLAN is only concerned about the same ethernet segment */ if (rt->dst.dev != dev) return 0; /* Don't reply on self probes (often done by windowz boxes)*/ if (sip == tip) return 0; if (IN_DEV_PROXY_ARP_PVLAN(in_dev)) return 1; else return 0; } /* * Interface to link layer: send routine and receive handler. */ /* * Create an arp packet. If dest_hw is not set, we create a broadcast * message. */ struct sk_buff *arp_create(int type, int ptype, __be32 dest_ip, struct net_device *dev, __be32 src_ip, const unsigned char *dest_hw, const unsigned char *src_hw, const unsigned char *target_hw) { struct sk_buff *skb; struct arphdr *arp; unsigned char *arp_ptr; int hlen = LL_RESERVED_SPACE(dev); int tlen = dev->needed_tailroom; /* * Allocate a buffer */ skb = alloc_skb(arp_hdr_len(dev) + hlen + tlen, GFP_ATOMIC); if (!skb) return NULL; skb_reserve(skb, hlen); skb_reset_network_header(skb); arp = skb_put(skb, arp_hdr_len(dev)); skb->dev = dev; skb->protocol = htons(ETH_P_ARP); if (!src_hw) src_hw = dev->dev_addr; if (!dest_hw) dest_hw = dev->broadcast; /* * Fill the device header for the ARP frame */ if (dev_hard_header(skb, dev, ptype, dest_hw, src_hw, skb->len) < 0) goto out; /* * Fill out the arp protocol part. * * The arp hardware type should match the device type, except for FDDI, * which (according to RFC 1390) should always equal 1 (Ethernet). */ /* * Exceptions everywhere. AX.25 uses the AX.25 PID value not the * DIX code for the protocol. Make these device structure fields. 
*/ switch (dev->type) { default: arp->ar_hrd = htons(dev->type); arp->ar_pro = htons(ETH_P_IP); break; #if IS_ENABLED(CONFIG_AX25) case ARPHRD_AX25: arp->ar_hrd = htons(ARPHRD_AX25); arp->ar_pro = htons(AX25_P_IP); break; #if IS_ENABLED(CONFIG_NETROM) case ARPHRD_NETROM: arp->ar_hrd = htons(ARPHRD_NETROM); arp->ar_pro = htons(AX25_P_IP); break; #endif #endif #if IS_ENABLED(CONFIG_FDDI) case ARPHRD_FDDI: arp->ar_hrd = htons(ARPHRD_ETHER); arp->ar_pro = htons(ETH_P_IP); break; #endif } arp->ar_hln = dev->addr_len; arp->ar_pln = 4; arp->ar_op = htons(type); arp_ptr = (unsigned char *)(arp + 1); memcpy(arp_ptr, src_hw, dev->addr_len); arp_ptr += dev->addr_len; memcpy(arp_ptr, &src_ip, 4); arp_ptr += 4; switch (dev->type) { #if IS_ENABLED(CONFIG_FIREWIRE_NET) case ARPHRD_IEEE1394: break; #endif default: if (target_hw) memcpy(arp_ptr, target_hw, dev->addr_len); else memset(arp_ptr, 0, dev->addr_len); arp_ptr += dev->addr_len; } memcpy(arp_ptr, &dest_ip, 4); return skb; out: kfree_skb(skb); return NULL; } EXPORT_SYMBOL(arp_create); static int arp_xmit_finish(struct net *net, struct sock *sk, struct sk_buff *skb) { return dev_queue_xmit(skb); } /* * Send an arp packet. */ void arp_xmit(struct sk_buff *skb) { /* Send it off, maybe filter it using firewalling first. */ NF_HOOK(NFPROTO_ARP, NF_ARP_OUT, dev_net(skb->dev), NULL, skb, NULL, skb->dev, arp_xmit_finish); } EXPORT_SYMBOL(arp_xmit); static bool arp_is_garp(struct net *net, struct net_device *dev, int *addr_type, __be16 ar_op, __be32 sip, __be32 tip, unsigned char *sha, unsigned char *tha) { bool is_garp = tip == sip; /* Gratuitous ARP _replies_ also require target hwaddr to be * the same as source. */ if (is_garp && ar_op == htons(ARPOP_REPLY)) is_garp = /* IPv4 over IEEE 1394 doesn't provide target * hardware address field in its ARP payload. */ tha && !memcmp(tha, sha, dev->addr_len); if (is_garp) { *addr_type = inet_addr_type_dev_table(net, dev, sip); if (*addr_type != RTN_UNICAST) is_garp = false; } return is_garp; } /* * Process an arp request. */ static int arp_process(struct net *net, struct sock *sk, struct sk_buff *skb) { struct net_device *dev = skb->dev; struct in_device *in_dev = __in_dev_get_rcu(dev); struct arphdr *arp; unsigned char *arp_ptr; struct rtable *rt; unsigned char *sha; unsigned char *tha = NULL; __be32 sip, tip; u16 dev_type = dev->type; int addr_type; struct neighbour *n; struct dst_entry *reply_dst = NULL; bool is_garp = false; /* arp_rcv below verifies the ARP header and verifies the device * is ARP'able. */ if (!in_dev) goto out_free_skb; arp = arp_hdr(skb); switch (dev_type) { default: if (arp->ar_pro != htons(ETH_P_IP) || htons(dev_type) != arp->ar_hrd) goto out_free_skb; break; case ARPHRD_ETHER: case ARPHRD_FDDI: case ARPHRD_IEEE802: /* * ETHERNET, and Fibre Channel (which are IEEE 802 * devices, according to RFC 2625) devices will accept ARP * hardware types of either 1 (Ethernet) or 6 (IEEE 802.2). 
* This is the case also of FDDI, where the RFC 1390 says that * FDDI devices should accept ARP hardware of (1) Ethernet, * however, to be more robust, we'll accept both 1 (Ethernet) * or 6 (IEEE 802.2) */ if ((arp->ar_hrd != htons(ARPHRD_ETHER) && arp->ar_hrd != htons(ARPHRD_IEEE802)) || arp->ar_pro != htons(ETH_P_IP)) goto out_free_skb; break; case ARPHRD_AX25: if (arp->ar_pro != htons(AX25_P_IP) || arp->ar_hrd != htons(ARPHRD_AX25)) goto out_free_skb; break; case ARPHRD_NETROM: if (arp->ar_pro != htons(AX25_P_IP) || arp->ar_hrd != htons(ARPHRD_NETROM)) goto out_free_skb; break; } /* Understand only these message types */ if (arp->ar_op != htons(ARPOP_REPLY) && arp->ar_op != htons(ARPOP_REQUEST)) goto out_free_skb; /* * Extract fields */ arp_ptr = (unsigned char *)(arp + 1); sha = arp_ptr; arp_ptr += dev->addr_len; memcpy(&sip, arp_ptr, 4); arp_ptr += 4; switch (dev_type) { #if IS_ENABLED(CONFIG_FIREWIRE_NET) case ARPHRD_IEEE1394: break; #endif default: tha = arp_ptr; arp_ptr += dev->addr_len; } memcpy(&tip, arp_ptr, 4); /* * Check for bad requests for 127.x.x.x and requests for multicast * addresses. If this is one such, delete it. */ if (ipv4_is_multicast(tip) || (!IN_DEV_ROUTE_LOCALNET(in_dev) && ipv4_is_loopback(tip))) goto out_free_skb; /* * For some 802.11 wireless deployments (and possibly other networks), * there will be an ARP proxy and gratuitous ARP frames are attacks * and thus should not be accepted. */ if (sip == tip && IN_DEV_ORCONF(in_dev, DROP_GRATUITOUS_ARP)) goto out_free_skb; /* * Special case: We must set Frame Relay source Q.922 address */ if (dev_type == ARPHRD_DLCI) sha = dev->broadcast; /* * Process entry. The idea here is we want to send a reply if it is a * request for us or if it is a request for someone else that we hold * a proxy for. We want to add an entry to our cache if it is a reply * to us or if it is a request for our address. * (The assumption for this last is that if someone is requesting our * address, they are probably intending to talk to us, so it saves time * if we cache their address. Their address is also probably not in * our cache, since ours is not in their cache.) * * Putting this another way, we only care about replies if they are to * us, in which case we add them to the cache. For requests, we care * about those for us and those for our proxies. We reply to both, * and in the case of requests for us we add the requester to the arp * cache. 
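 *
 * A small worked illustration of that policy (addresses are made up):
 *
 *	request "who-has 10.0.0.1 tell 10.0.0.2" and 10.0.0.1 is local:
 *	we reply with our hardware address and create or refresh the
 *	cache entry for 10.0.0.2.
 *
 *	request "who-has 10.0.0.7 tell 10.0.0.2" and we proxy for
 *	10.0.0.7: we reply on its behalf, possibly after the configured
 *	proxy delay.
 *
 *	reply "10.0.0.2 is-at <hw>" addressed to us: we update an
 *	existing entry for 10.0.0.2; an unsolicited reply creates a new
 *	entry only when arp_accept allows it.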
*/ if (arp->ar_op == htons(ARPOP_REQUEST) && skb_metadata_dst(skb)) reply_dst = (struct dst_entry *) iptunnel_metadata_reply(skb_metadata_dst(skb), GFP_ATOMIC); /* Special case: IPv4 duplicate address detection packet (RFC2131) */ if (sip == 0) { if (arp->ar_op == htons(ARPOP_REQUEST) && inet_addr_type_dev_table(net, dev, tip) == RTN_LOCAL && !arp_ignore(in_dev, sip, tip)) arp_send_dst(ARPOP_REPLY, ETH_P_ARP, sip, dev, tip, sha, dev->dev_addr, sha, reply_dst); goto out_consume_skb; } if (arp->ar_op == htons(ARPOP_REQUEST) && ip_route_input_noref(skb, tip, sip, 0, dev) == 0) { rt = skb_rtable(skb); addr_type = rt->rt_type; if (addr_type == RTN_LOCAL) { int dont_send; dont_send = arp_ignore(in_dev, sip, tip); if (!dont_send && IN_DEV_ARPFILTER(in_dev)) dont_send = arp_filter(sip, tip, dev); if (!dont_send) { n = neigh_event_ns(&arp_tbl, sha, &sip, dev); if (n) { arp_send_dst(ARPOP_REPLY, ETH_P_ARP, sip, dev, tip, sha, dev->dev_addr, sha, reply_dst); neigh_release(n); } } goto out_consume_skb; } else if (IN_DEV_FORWARD(in_dev)) { if (addr_type == RTN_UNICAST && (arp_fwd_proxy(in_dev, dev, rt) || arp_fwd_pvlan(in_dev, dev, rt, sip, tip) || (rt->dst.dev != dev && pneigh_lookup(&arp_tbl, net, &tip, dev, 0)))) { n = neigh_event_ns(&arp_tbl, sha, &sip, dev); if (n) neigh_release(n); if (NEIGH_CB(skb)->flags & LOCALLY_ENQUEUED || skb->pkt_type == PACKET_HOST || NEIGH_VAR(in_dev->arp_parms, PROXY_DELAY) == 0) { arp_send_dst(ARPOP_REPLY, ETH_P_ARP, sip, dev, tip, sha, dev->dev_addr, sha, reply_dst); } else { pneigh_enqueue(&arp_tbl, in_dev->arp_parms, skb); goto out_free_dst; } goto out_consume_skb; } } } /* Update our ARP tables */ n = __neigh_lookup(&arp_tbl, &sip, dev, 0); addr_type = -1; if (n || arp_accept(in_dev, sip)) { is_garp = arp_is_garp(net, dev, &addr_type, arp->ar_op, sip, tip, sha, tha); } if (arp_accept(in_dev, sip)) { /* Unsolicited ARP is not accepted by default. It is possible, that this option should be enabled for some devices (strip is candidate) */ if (!n && (is_garp || (arp->ar_op == htons(ARPOP_REPLY) && (addr_type == RTN_UNICAST || (addr_type < 0 && /* postpone calculation to as late as possible */ inet_addr_type_dev_table(net, dev, sip) == RTN_UNICAST))))) n = __neigh_lookup(&arp_tbl, &sip, dev, 1); } if (n) { int state = NUD_REACHABLE; int override; /* If several different ARP replies follows back-to-back, use the FIRST one. It is possible, if several proxy agents are active. Taking the first reply prevents arp trashing and chooses the fastest router. */ override = time_after(jiffies, n->updated + NEIGH_VAR(n->parms, LOCKTIME)) || is_garp; /* Broadcast replies and request packets do not assert neighbour reachability. */ if (arp->ar_op != htons(ARPOP_REPLY) || skb->pkt_type != PACKET_HOST) state = NUD_STALE; neigh_update(n, sha, state, override ? NEIGH_UPDATE_F_OVERRIDE : 0, 0); neigh_release(n); } out_consume_skb: consume_skb(skb); out_free_dst: dst_release(reply_dst); return NET_RX_SUCCESS; out_free_skb: kfree_skb(skb); return NET_RX_DROP; } static void parp_redo(struct sk_buff *skb) { arp_process(dev_net(skb->dev), NULL, skb); } static int arp_is_multicast(const void *pkey) { return ipv4_is_multicast(*((__be32 *)pkey)); } /* * Receive an arp request from the device layer. 
*/ static int arp_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev) { const struct arphdr *arp; /* do not tweak dropwatch on an ARP we will ignore */ if (dev->flags & IFF_NOARP || skb->pkt_type == PACKET_OTHERHOST || skb->pkt_type == PACKET_LOOPBACK) goto consumeskb; skb = skb_share_check(skb, GFP_ATOMIC); if (!skb) goto out_of_mem; /* ARP header, plus 2 device addresses, plus 2 IP addresses. */ if (!pskb_may_pull(skb, arp_hdr_len(dev))) goto freeskb; arp = arp_hdr(skb); if (arp->ar_hln != dev->addr_len || arp->ar_pln != 4) goto freeskb; memset(NEIGH_CB(skb), 0, sizeof(struct neighbour_cb)); return NF_HOOK(NFPROTO_ARP, NF_ARP_IN, dev_net(dev), NULL, skb, dev, NULL, arp_process); consumeskb: consume_skb(skb); return NET_RX_SUCCESS; freeskb: kfree_skb(skb); out_of_mem: return NET_RX_DROP; } /* * User level interface (ioctl) */ static struct net_device *arp_req_dev_by_name(struct net *net, struct arpreq *r, bool getarp) { struct net_device *dev; if (getarp) dev = dev_get_by_name_rcu(net, r->arp_dev); else dev = __dev_get_by_name(net, r->arp_dev); if (!dev) return ERR_PTR(-ENODEV); /* Mmmm... It is wrong... ARPHRD_NETROM == 0 */ if (!r->arp_ha.sa_family) r->arp_ha.sa_family = dev->type; if ((r->arp_flags & ATF_COM) && r->arp_ha.sa_family != dev->type) return ERR_PTR(-EINVAL); return dev; } static struct net_device *arp_req_dev(struct net *net, struct arpreq *r) { struct net_device *dev; struct rtable *rt; __be32 ip; if (r->arp_dev[0]) return arp_req_dev_by_name(net, r, false); if (r->arp_flags & ATF_PUBL) return NULL; ip = ((struct sockaddr_in *)&r->arp_pa)->sin_addr.s_addr; rt = ip_route_output(net, ip, 0, 0, 0, RT_SCOPE_LINK); if (IS_ERR(rt)) return ERR_CAST(rt); dev = rt->dst.dev; ip_rt_put(rt); if (!dev) return ERR_PTR(-EINVAL); return dev; } /* * Set (create) an ARP cache entry. */ static int arp_req_set_proxy(struct net *net, struct net_device *dev, int on) { if (!dev) { IPV4_DEVCONF_ALL(net, PROXY_ARP) = on; return 0; } if (__in_dev_get_rtnl(dev)) { IN_DEV_CONF_SET(__in_dev_get_rtnl(dev), PROXY_ARP, on); return 0; } return -ENXIO; } static int arp_req_set_public(struct net *net, struct arpreq *r, struct net_device *dev) { __be32 mask = ((struct sockaddr_in *)&r->arp_netmask)->sin_addr.s_addr; if (!dev && (r->arp_flags & ATF_COM)) { dev = dev_getbyhwaddr_rcu(net, r->arp_ha.sa_family, r->arp_ha.sa_data); if (!dev) return -ENODEV; } if (mask) { __be32 ip = ((struct sockaddr_in *)&r->arp_pa)->sin_addr.s_addr; if (!pneigh_lookup(&arp_tbl, net, &ip, dev, 1)) return -ENOBUFS; return 0; } return arp_req_set_proxy(net, dev, 1); } static int arp_req_set(struct net *net, struct arpreq *r) { struct neighbour *neigh; struct net_device *dev; __be32 ip; int err; dev = arp_req_dev(net, r); if (IS_ERR(dev)) return PTR_ERR(dev); if (r->arp_flags & ATF_PUBL) return arp_req_set_public(net, r, dev); switch (dev->type) { #if IS_ENABLED(CONFIG_FDDI) case ARPHRD_FDDI: /* * According to RFC 1390, FDDI devices should accept ARP * hardware types of 1 (Ethernet). However, to be more * robust, we'll accept hardware types of either 1 (Ethernet) * or 6 (IEEE 802.2). 
*/ if (r->arp_ha.sa_family != ARPHRD_FDDI && r->arp_ha.sa_family != ARPHRD_ETHER && r->arp_ha.sa_family != ARPHRD_IEEE802) return -EINVAL; break; #endif default: if (r->arp_ha.sa_family != dev->type) return -EINVAL; break; } ip = ((struct sockaddr_in *)&r->arp_pa)->sin_addr.s_addr; neigh = __neigh_lookup_errno(&arp_tbl, &ip, dev); err = PTR_ERR(neigh); if (!IS_ERR(neigh)) { unsigned int state = NUD_STALE; if (r->arp_flags & ATF_PERM) { r->arp_flags |= ATF_COM; state = NUD_PERMANENT; } err = neigh_update(neigh, (r->arp_flags & ATF_COM) ? r->arp_ha.sa_data : NULL, state, NEIGH_UPDATE_F_OVERRIDE | NEIGH_UPDATE_F_ADMIN, 0); neigh_release(neigh); } return err; } static unsigned int arp_state_to_flags(struct neighbour *neigh) { if (neigh->nud_state&NUD_PERMANENT) return ATF_PERM | ATF_COM; else if (neigh->nud_state&NUD_VALID) return ATF_COM; else return 0; } /* * Get an ARP cache entry. */ static int arp_req_get(struct net *net, struct arpreq *r) { __be32 ip = ((struct sockaddr_in *) &r->arp_pa)->sin_addr.s_addr; struct neighbour *neigh; struct net_device *dev; if (!r->arp_dev[0]) return -ENODEV; dev = arp_req_dev_by_name(net, r, true); if (IS_ERR(dev)) return PTR_ERR(dev); neigh = neigh_lookup(&arp_tbl, &ip, dev); if (!neigh) return -ENXIO; if (READ_ONCE(neigh->nud_state) & NUD_NOARP) { neigh_release(neigh); return -ENXIO; } read_lock_bh(&neigh->lock); memcpy(r->arp_ha.sa_data, neigh->ha, min(dev->addr_len, sizeof(r->arp_ha.sa_data_min))); r->arp_flags = arp_state_to_flags(neigh); read_unlock_bh(&neigh->lock); neigh_release(neigh); r->arp_ha.sa_family = dev->type; netdev_copy_name(dev, r->arp_dev); return 0; } int arp_invalidate(struct net_device *dev, __be32 ip, bool force) { struct neighbour *neigh = neigh_lookup(&arp_tbl, &ip, dev); int err = -ENXIO; struct neigh_table *tbl = &arp_tbl; if (neigh) { if ((READ_ONCE(neigh->nud_state) & NUD_VALID) && !force) { neigh_release(neigh); return 0; } if (READ_ONCE(neigh->nud_state) & ~NUD_NOARP) err = neigh_update(neigh, NULL, NUD_FAILED, NEIGH_UPDATE_F_OVERRIDE| NEIGH_UPDATE_F_ADMIN, 0); write_lock_bh(&tbl->lock); neigh_release(neigh); neigh_remove_one(neigh); write_unlock_bh(&tbl->lock); } return err; } static int arp_req_delete_public(struct net *net, struct arpreq *r, struct net_device *dev) { __be32 mask = ((struct sockaddr_in *)&r->arp_netmask)->sin_addr.s_addr; if (mask) { __be32 ip = ((struct sockaddr_in *)&r->arp_pa)->sin_addr.s_addr; return pneigh_delete(&arp_tbl, net, &ip, dev); } return arp_req_set_proxy(net, dev, 0); } static int arp_req_delete(struct net *net, struct arpreq *r) { struct net_device *dev; __be32 ip; dev = arp_req_dev(net, r); if (IS_ERR(dev)) return PTR_ERR(dev); if (r->arp_flags & ATF_PUBL) return arp_req_delete_public(net, r, dev); ip = ((struct sockaddr_in *)&r->arp_pa)->sin_addr.s_addr; return arp_invalidate(dev, ip, true); } /* * Handle an ARP layer I/O control request. 
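 *
 * Illustrative sketch (not part of the original file): a minimal
 * userspace caller could query an entry through SIOCGARP roughly as
 * follows, assuming the interface name "eth0" and the address
 * 192.168.1.1:
 *
 *	struct arpreq r = { };
 *	struct sockaddr_in *sin = (struct sockaddr_in *)&r.arp_pa;
 *	int fd = socket(AF_INET, SOCK_DGRAM, 0);
 *
 *	sin->sin_family = AF_INET;
 *	inet_pton(AF_INET, "192.168.1.1", &sin->sin_addr);
 *	strncpy(r.arp_dev, "eth0", sizeof(r.arp_dev) - 1);
 *	ioctl(fd, SIOCGARP, &r);
 *
 * On success, r.arp_ha.sa_data carries the hardware address and
 * r.arp_flags the ATF_* flags filled in by arp_req_get() above.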
*/ int arp_ioctl(struct net *net, unsigned int cmd, void __user *arg) { struct arpreq r; __be32 *netmask; int err; switch (cmd) { case SIOCDARP: case SIOCSARP: if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) return -EPERM; fallthrough; case SIOCGARP: err = copy_from_user(&r, arg, sizeof(struct arpreq)); if (err) return -EFAULT; break; default: return -EINVAL; } if (r.arp_pa.sa_family != AF_INET) return -EPFNOSUPPORT; if (!(r.arp_flags & ATF_PUBL) && (r.arp_flags & (ATF_NETMASK | ATF_DONTPUB))) return -EINVAL; netmask = &((struct sockaddr_in *)&r.arp_netmask)->sin_addr.s_addr; if (!(r.arp_flags & ATF_NETMASK)) *netmask = htonl(0xFFFFFFFFUL); else if (*netmask && *netmask != htonl(0xFFFFFFFFUL)) return -EINVAL; switch (cmd) { case SIOCDARP: rtnl_lock(); err = arp_req_delete(net, &r); rtnl_unlock(); break; case SIOCSARP: rtnl_lock(); err = arp_req_set(net, &r); rtnl_unlock(); break; case SIOCGARP: rcu_read_lock(); err = arp_req_get(net, &r); rcu_read_unlock(); if (!err && copy_to_user(arg, &r, sizeof(r))) err = -EFAULT; break; } return err; } static int arp_netdev_event(struct notifier_block *this, unsigned long event, void *ptr) { struct net_device *dev = netdev_notifier_info_to_dev(ptr); struct netdev_notifier_change_info *change_info; struct in_device *in_dev; bool evict_nocarrier; switch (event) { case NETDEV_CHANGEADDR: neigh_changeaddr(&arp_tbl, dev); rt_cache_flush(dev_net(dev)); break; case NETDEV_CHANGE: change_info = ptr; if (change_info->flags_changed & IFF_NOARP) neigh_changeaddr(&arp_tbl, dev); in_dev = __in_dev_get_rtnl(dev); if (!in_dev) evict_nocarrier = true; else evict_nocarrier = IN_DEV_ARP_EVICT_NOCARRIER(in_dev); if (evict_nocarrier && !netif_carrier_ok(dev)) neigh_carrier_down(&arp_tbl, dev); break; default: break; } return NOTIFY_DONE; } static struct notifier_block arp_netdev_notifier = { .notifier_call = arp_netdev_event, }; /* Note, that it is not on notifier chain. It is necessary, that this routine was called after route cache will be flushed. */ void arp_ifdown(struct net_device *dev) { neigh_ifdown(&arp_tbl, dev); } /* * Called once on startup. */ static struct packet_type arp_packet_type __read_mostly = { .type = cpu_to_be16(ETH_P_ARP), .func = arp_rcv, }; #ifdef CONFIG_PROC_FS #if IS_ENABLED(CONFIG_AX25) /* * ax25 -> ASCII conversion */ static void ax2asc2(ax25_address *a, char *buf) { char c, *s; int n; for (n = 0, s = buf; n < 6; n++) { c = (a->ax25_call[n] >> 1) & 0x7F; if (c != ' ') *s++ = c; } *s++ = '-'; n = (a->ax25_call[6] >> 1) & 0x0F; if (n > 9) { *s++ = '1'; n -= 10; } *s++ = n + '0'; *s++ = '\0'; if (*buf == '\0' || *buf == '-') { buf[0] = '*'; buf[1] = '\0'; } } #endif /* CONFIG_AX25 */ #define HBUFFERLEN 30 static void arp_format_neigh_entry(struct seq_file *seq, struct neighbour *n) { char hbuffer[HBUFFERLEN]; int k, j; char tbuf[16]; struct net_device *dev = n->dev; int hatype = dev->type; read_lock(&n->lock); /* Convert hardware address to XX:XX:XX:XX ... form. 
*/ #if IS_ENABLED(CONFIG_AX25) if (hatype == ARPHRD_AX25 || hatype == ARPHRD_NETROM) ax2asc2((ax25_address *)n->ha, hbuffer); else { #endif for (k = 0, j = 0; k < HBUFFERLEN - 3 && j < dev->addr_len; j++) { hbuffer[k++] = hex_asc_hi(n->ha[j]); hbuffer[k++] = hex_asc_lo(n->ha[j]); hbuffer[k++] = ':'; } if (k != 0) --k; hbuffer[k] = 0; #if IS_ENABLED(CONFIG_AX25) } #endif sprintf(tbuf, "%pI4", n->primary_key); seq_printf(seq, "%-16s 0x%-10x0x%-10x%-17s * %s\n", tbuf, hatype, arp_state_to_flags(n), hbuffer, dev->name); read_unlock(&n->lock); } static void arp_format_pneigh_entry(struct seq_file *seq, struct pneigh_entry *n) { struct net_device *dev = n->dev; int hatype = dev ? dev->type : 0; char tbuf[16]; sprintf(tbuf, "%pI4", n->key); seq_printf(seq, "%-16s 0x%-10x0x%-10x%s * %s\n", tbuf, hatype, ATF_PUBL | ATF_PERM, "00:00:00:00:00:00", dev ? dev->name : "*"); } static int arp_seq_show(struct seq_file *seq, void *v) { if (v == SEQ_START_TOKEN) { seq_puts(seq, "IP address HW type Flags " "HW address Mask Device\n"); } else { struct neigh_seq_state *state = seq->private; if (state->flags & NEIGH_SEQ_IS_PNEIGH) arp_format_pneigh_entry(seq, v); |
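		/*
		 * Illustrative sample of the resulting /proc/net/arp
		 * output (values are made up; exact column widths come
		 * from the format strings above):
		 *
		 * IP address       HW type     Flags       HW address            Mask     Device
		 * 192.168.1.1      0x1         0x2         aa:bb:cc:dd:ee:ff     *        eth0
		 */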