/*	$OpenBSD: kref.h,v 1.4 2020/06/17 02:58:15 jsg Exp $	*/
/*
 * Copyright (c) 2015 Mark Kettenis
 *
 * Permission to use, copy, modify, and distribute this software for any
 * purpose with or without fee is hereby granted, provided that the above
 * copyright notice and this permission notice appear in all copies.
 *
 * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
 * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
 * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
 * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
 * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
 * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
 * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
 */

#ifndef _LINUX_KREF_H
#define _LINUX_KREF_H

#include <sys/types.h>
#include <sys/rwlock.h>
#include <sys/atomic.h>

#include <linux/atomic.h>
#include <linux/compiler.h>
#include <linux/refcount.h>

struct kref {
	uint32_t refcount;
};

static inline void
kref_init(struct kref *ref)
{
	atomic_set(&ref->refcount, 1);
}

static inline unsigned int
kref_read(const struct kref *ref)
{
	return atomic_read(&ref->refcount);
}

static inline void
kref_get(struct kref *ref)
{
	atomic_inc_int(&ref->refcount);
}

static inline int
kref_get_unless_zero(struct kref *ref)
{
	if (ref->refcount != 0) {
		atomic_inc_int(&ref->refcount);
		return (1);
	} else {
		return (0);
	}
}

static inline int
kref_put(struct kref *ref, void (*release)(struct kref *ref))
{
	if (atomic_dec_int_nv(&ref->refcount) == 0) {
		release(ref);
		return 1;
	}
	return 0;
}

static inline int
kref_put_mutex(struct kref *kref, void (*release)(struct kref *kref),
    struct rwlock *lock)
{
	if (!atomic_add_unless(&kref->refcount, -1, 1)) {
		rw_enter_write(lock);
		if (likely(atomic_dec_and_test(&kref->refcount))) {
			release(kref);
			return 1;
		}
		rw_exit_write(lock);
		return 0;
	}
	return 0;
}

static inline int
kref_put_lock(struct kref *kref, void (*release)(struct kref *kref),
    struct mutex *lock)
{
	if (!atomic_add_unless(&kref->refcount, -1, 1)) {
		mtx_enter(lock);
		if (likely(atomic_dec_and_test(&kref->refcount))) {
			release(kref);
			return 1;
		}
		mtx_leave(lock);
		return 0;
	}
	return 0;
}

#endif
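/*
 * Usage sketch (illustrative, not part of kref.h): how a refcounted
 * object might be built on the API above.  "struct foo", foo_create()
 * and foo_release() are hypothetical names; container_of() is assumed
 * to be available from the drm compat headers, and malloc(9)/free(9)
 * from <sys/malloc.h>.
 */
struct foo {
	struct kref	f_ref;
	/* ... object state ... */
};

static void
foo_release(struct kref *ref)
{
	struct foo *f = container_of(ref, struct foo, f_ref);

	free(f, M_DEVBUF, sizeof(*f));
}

static struct foo *
foo_create(void)
{
	struct foo *f;

	f = malloc(sizeof(*f), M_DEVBUF, M_WAITOK | M_ZERO);
	kref_init(&f->f_ref);		/* reference count starts at 1 */
	return (f);
}

static void
foo_example(struct foo *f)
{
	kref_get(&f->f_ref);			/* take an extra reference */
	kref_put(&f->f_ref, foo_release);	/* drop it again */
	/* the last holder's kref_put() ends up calling foo_release() */
}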
/*	$OpenBSD: ifq.c,v 1.46 2022/04/30 21:13:57 bluhm Exp $ */
/*
 * Copyright (c) 2015 David Gwynne <dlg@openbsd.org>
 *
 * Permission to use, copy, modify, and distribute this software for any
 * purpose with or without fee is hereby granted, provided that the above
 * copyright notice and this permission notice appear in all copies.
 *
 * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
 * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
 * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
 * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
 * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
 * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
 * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
 */

#include "bpfilter.h"
#include "kstat.h"

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/socket.h>
#include <sys/mbuf.h>
#include <sys/proc.h>
#include <sys/sysctl.h>

#include <net/if.h>
#include <net/if_var.h>

#if NBPFILTER > 0
#include <net/bpf.h>
#endif

#if NKSTAT > 0
#include <sys/kstat.h>
#endif

/*
 * priq glue
 */
unsigned int	 priq_idx(unsigned int, const struct mbuf *);
struct mbuf	*priq_enq(struct ifqueue *, struct mbuf *);
struct mbuf	*priq_deq_begin(struct ifqueue *, void **);
void		 priq_deq_commit(struct ifqueue *, struct mbuf *, void *);
void		 priq_purge(struct ifqueue *, struct mbuf_list *);

void		*priq_alloc(unsigned int, void *);
void		 priq_free(unsigned int, void *);

const struct ifq_ops priq_ops = {
	priq_idx,
	priq_enq,
	priq_deq_begin,
	priq_deq_commit,
	priq_purge,
	priq_alloc,
	priq_free,
};

const struct ifq_ops * const ifq_priq_ops = &priq_ops;

/*
 * priq internal structures
 */

struct priq {
	struct mbuf_list	 pq_lists[IFQ_NQUEUES];
};

/*
 * ifqueue serialiser
 */

void	ifq_start_task(void *);
void	ifq_restart_task(void *);
void	ifq_barrier_task(void *);
void	ifq_bundle_task(void *);

static inline void
ifq_run_start(struct ifqueue *ifq)
{
	ifq_serialize(ifq, &ifq->ifq_start);
}

void
ifq_serialize(struct ifqueue *ifq, struct task *t)
{
	struct task work;

	if (ISSET(t->t_flags, TASK_ONQUEUE))
		return;

	mtx_enter(&ifq->ifq_task_mtx);
	if (!ISSET(t->t_flags, TASK_ONQUEUE)) {
		SET(t->t_flags, TASK_ONQUEUE);
		TAILQ_INSERT_TAIL(&ifq->ifq_task_list, t, t_entry);
	}

	if (ifq->ifq_serializer == NULL) {
		ifq->ifq_serializer = curcpu();

		while ((t = TAILQ_FIRST(&ifq->ifq_task_list)) != NULL) {
			TAILQ_REMOVE(&ifq->ifq_task_list, t, t_entry);
			CLR(t->t_flags, TASK_ONQUEUE);
			work = *t; /* copy to caller to avoid races */

			mtx_leave(&ifq->ifq_task_mtx);

			(*work.t_func)(work.t_arg);

			mtx_enter(&ifq->ifq_task_mtx);
		}

		ifq->ifq_serializer = NULL;
	}
	mtx_leave(&ifq->ifq_task_mtx);
}

int
ifq_is_serialized(struct ifqueue *ifq)
{
	return (ifq->ifq_serializer == curcpu());
}

void
ifq_start(struct ifqueue *ifq)
{
	if (ifq_len(ifq) >= min(ifq->ifq_if->if_txmit, ifq->ifq_maxlen)) {
		task_del(ifq->ifq_softnet, &ifq->ifq_bundle);
		ifq_run_start(ifq);
	} else
		task_add(ifq->ifq_softnet, &ifq->ifq_bundle);
}

void
ifq_start_task(void *p)
{
	struct ifqueue *ifq = p;
	struct ifnet *ifp = ifq->ifq_if;

	if (!ISSET(ifp->if_flags, IFF_RUNNING) ||
	    ifq_empty(ifq) || ifq_is_oactive(ifq))
		return;

	ifp->if_qstart(ifq);
}

void
ifq_restart_task(void *p)
{
	struct ifqueue *ifq = p;
	struct ifnet *ifp = ifq->ifq_if;

	ifq_clr_oactive(ifq);
ifp->if_qstart(ifq); } void ifq_bundle_task(void *p) { struct ifqueue *ifq = p; ifq_run_start(ifq); } void ifq_barrier(struct ifqueue *ifq) { struct cond c = COND_INITIALIZER(); struct task t = TASK_INITIALIZER(ifq_barrier_task, &c); task_del(ifq->ifq_softnet, &ifq->ifq_bundle); if (ifq->ifq_serializer == NULL) return; ifq_serialize(ifq, &t); cond_wait(&c, "ifqbar"); } void ifq_barrier_task(void *p) { struct cond *c = p; cond_signal(c); } /* * ifqueue mbuf queue API */ #if NKSTAT > 0 struct ifq_kstat_data { struct kstat_kv kd_packets; struct kstat_kv kd_bytes; struct kstat_kv kd_qdrops; struct kstat_kv kd_errors; struct kstat_kv kd_qlen; struct kstat_kv kd_maxqlen; struct kstat_kv kd_oactive; }; static const struct ifq_kstat_data ifq_kstat_tpl = { KSTAT_KV_UNIT_INITIALIZER("packets", KSTAT_KV_T_COUNTER64, KSTAT_KV_U_PACKETS), KSTAT_KV_UNIT_INITIALIZER("bytes", KSTAT_KV_T_COUNTER64, KSTAT_KV_U_BYTES), KSTAT_KV_UNIT_INITIALIZER("qdrops", KSTAT_KV_T_COUNTER64, KSTAT_KV_U_PACKETS), KSTAT_KV_UNIT_INITIALIZER("errors", KSTAT_KV_T_COUNTER64, KSTAT_KV_U_PACKETS), KSTAT_KV_UNIT_INITIALIZER("qlen", KSTAT_KV_T_UINT32, KSTAT_KV_U_PACKETS), KSTAT_KV_UNIT_INITIALIZER("maxqlen", KSTAT_KV_T_UINT32, KSTAT_KV_U_PACKETS), KSTAT_KV_INITIALIZER("oactive", KSTAT_KV_T_BOOL), }; int ifq_kstat_copy(struct kstat *ks, void *dst) { struct ifqueue *ifq = ks->ks_softc; struct ifq_kstat_data *kd = dst; *kd = ifq_kstat_tpl; kstat_kv_u64(&kd->kd_packets) = ifq->ifq_packets; kstat_kv_u64(&kd->kd_bytes) = ifq->ifq_bytes; kstat_kv_u64(&kd->kd_qdrops) = ifq->ifq_qdrops; kstat_kv_u64(&kd->kd_errors) = ifq->ifq_errors; kstat_kv_u32(&kd->kd_qlen) = ifq->ifq_len; kstat_kv_u32(&kd->kd_maxqlen) = ifq->ifq_maxlen; kstat_kv_bool(&kd->kd_oactive) = ifq->ifq_oactive; return (0); } #endif void ifq_init(struct ifqueue *ifq, struct ifnet *ifp, unsigned int idx) { ifq->ifq_if = ifp; ifq->ifq_softnet = net_tq(ifp->if_index + idx); ifq->ifq_softc = NULL; mtx_init(&ifq->ifq_mtx, IPL_NET); /* default to priq */ ifq->ifq_ops = &priq_ops; ifq->ifq_q = priq_ops.ifqop_alloc(idx, NULL); ml_init(&ifq->ifq_free); ifq->ifq_len = 0; ifq->ifq_packets = 0; ifq->ifq_bytes = 0; ifq->ifq_qdrops = 0; ifq->ifq_errors = 0; ifq->ifq_mcasts = 0; mtx_init(&ifq->ifq_task_mtx, IPL_NET); TAILQ_INIT(&ifq->ifq_task_list); ifq->ifq_serializer = NULL; task_set(&ifq->ifq_bundle, ifq_bundle_task, ifq); task_set(&ifq->ifq_start, ifq_start_task, ifq); task_set(&ifq->ifq_restart, ifq_restart_task, ifq); if (ifq->ifq_maxlen == 0) ifq_set_maxlen(ifq, IFQ_MAXLEN); ifq->ifq_idx = idx; #if NKSTAT > 0 /* XXX xname vs driver name and unit */ ifq->ifq_kstat = kstat_create(ifp->if_xname, 0, "txq", ifq->ifq_idx, KSTAT_T_KV, 0); KASSERT(ifq->ifq_kstat != NULL); kstat_set_mutex(ifq->ifq_kstat, &ifq->ifq_mtx); ifq->ifq_kstat->ks_softc = ifq; ifq->ifq_kstat->ks_datalen = sizeof(ifq_kstat_tpl); ifq->ifq_kstat->ks_copy = ifq_kstat_copy; kstat_install(ifq->ifq_kstat); #endif } void ifq_attach(struct ifqueue *ifq, const struct ifq_ops *newops, void *opsarg) { struct mbuf_list ml = MBUF_LIST_INITIALIZER(); struct mbuf_list free_ml = MBUF_LIST_INITIALIZER(); struct mbuf *m; const struct ifq_ops *oldops; void *newq, *oldq; newq = newops->ifqop_alloc(ifq->ifq_idx, opsarg); mtx_enter(&ifq->ifq_mtx); ifq->ifq_ops->ifqop_purge(ifq, &ml); ifq->ifq_len = 0; oldops = ifq->ifq_ops; oldq = ifq->ifq_q; ifq->ifq_ops = newops; ifq->ifq_q = newq; while ((m = ml_dequeue(&ml)) != NULL) { m = ifq->ifq_ops->ifqop_enq(ifq, m); if (m != NULL) { ifq->ifq_qdrops++; ml_enqueue(&free_ml, m); } else ifq->ifq_len++; } 
mtx_leave(&ifq->ifq_mtx); oldops->ifqop_free(ifq->ifq_idx, oldq); ml_purge(&free_ml); } void ifq_destroy(struct ifqueue *ifq) { struct mbuf_list ml = MBUF_LIST_INITIALIZER(); #if NKSTAT > 0 kstat_destroy(ifq->ifq_kstat); #endif NET_ASSERT_UNLOCKED(); if (!task_del(ifq->ifq_softnet, &ifq->ifq_bundle)) taskq_barrier(ifq->ifq_softnet); /* don't need to lock because this is the last use of the ifq */ ifq->ifq_ops->ifqop_purge(ifq, &ml); ifq->ifq_ops->ifqop_free(ifq->ifq_idx, ifq->ifq_q); ml_purge(&ml); } void ifq_add_data(struct ifqueue *ifq, struct if_data *data) { mtx_enter(&ifq->ifq_mtx); data->ifi_opackets += ifq->ifq_packets; data->ifi_obytes += ifq->ifq_bytes; data->ifi_oqdrops += ifq->ifq_qdrops; data->ifi_omcasts += ifq->ifq_mcasts; /* ifp->if_data.ifi_oerrors */ mtx_leave(&ifq->ifq_mtx); } int ifq_enqueue(struct ifqueue *ifq, struct mbuf *m) { struct mbuf *dm; mtx_enter(&ifq->ifq_mtx); dm = ifq->ifq_ops->ifqop_enq(ifq, m); if (dm != m) { ifq->ifq_packets++; ifq->ifq_bytes += m->m_pkthdr.len; if (ISSET(m->m_flags, M_MCAST)) ifq->ifq_mcasts++; } if (dm == NULL) ifq->ifq_len++; else ifq->ifq_qdrops++; mtx_leave(&ifq->ifq_mtx); if (dm != NULL) m_freem(dm); return (dm == m ? ENOBUFS : 0); } static inline void ifq_deq_enter(struct ifqueue *ifq) { mtx_enter(&ifq->ifq_mtx); } static inline void ifq_deq_leave(struct ifqueue *ifq) { struct mbuf_list ml; ml = ifq->ifq_free; ml_init(&ifq->ifq_free); mtx_leave(&ifq->ifq_mtx); if (!ml_empty(&ml)) ml_purge(&ml); } struct mbuf * ifq_deq_begin(struct ifqueue *ifq) { struct mbuf *m = NULL; void *cookie; ifq_deq_enter(ifq); if (ifq->ifq_len == 0 || (m = ifq->ifq_ops->ifqop_deq_begin(ifq, &cookie)) == NULL) { ifq_deq_leave(ifq); return (NULL); } m->m_pkthdr.ph_cookie = cookie; return (m); } void ifq_deq_commit(struct ifqueue *ifq, struct mbuf *m) { void *cookie; KASSERT(m != NULL); cookie = m->m_pkthdr.ph_cookie; ifq->ifq_ops->ifqop_deq_commit(ifq, m, cookie); ifq->ifq_len--; ifq_deq_leave(ifq); } void ifq_deq_rollback(struct ifqueue *ifq, struct mbuf *m) { KASSERT(m != NULL); ifq_deq_leave(ifq); } struct mbuf * ifq_dequeue(struct ifqueue *ifq) { struct mbuf *m; m = ifq_deq_begin(ifq); if (m == NULL) return (NULL); ifq_deq_commit(ifq, m); return (m); } int ifq_deq_sleep(struct ifqueue *ifq, struct mbuf **mp, int nbio, int priority, const char *wmesg, volatile unsigned int *sleeping, volatile unsigned int *alive) { struct mbuf *m; void *cookie; int error = 0; ifq_deq_enter(ifq); if (ifq->ifq_len == 0 && nbio) error = EWOULDBLOCK; else { for (;;) { m = ifq->ifq_ops->ifqop_deq_begin(ifq, &cookie); if (m != NULL) { ifq->ifq_ops->ifqop_deq_commit(ifq, m, cookie); ifq->ifq_len--; *mp = m; break; } (*sleeping)++; error = msleep_nsec(ifq, &ifq->ifq_mtx, priority, wmesg, INFSLP); (*sleeping)--; if (error != 0) break; if (!(*alive)) { error = EIO; break; } } } ifq_deq_leave(ifq); return (error); } int ifq_hdatalen(struct ifqueue *ifq) { struct mbuf *m; int len = 0; if (ifq_empty(ifq)) return (0); m = ifq_deq_begin(ifq); if (m != NULL) { len = m->m_pkthdr.len; ifq_deq_rollback(ifq, m); } return (len); } unsigned int ifq_purge(struct ifqueue *ifq) { struct mbuf_list ml = MBUF_LIST_INITIALIZER(); unsigned int rv; mtx_enter(&ifq->ifq_mtx); ifq->ifq_ops->ifqop_purge(ifq, &ml); rv = ifq->ifq_len; ifq->ifq_len = 0; ifq->ifq_qdrops += rv; mtx_leave(&ifq->ifq_mtx); KASSERT(rv == ml_len(&ml)); ml_purge(&ml); return (rv); } void * ifq_q_enter(struct ifqueue *ifq, const struct ifq_ops *ops) { mtx_enter(&ifq->ifq_mtx); if (ifq->ifq_ops == ops) return (ifq->ifq_q); 
mtx_leave(&ifq->ifq_mtx); return (NULL); } void ifq_q_leave(struct ifqueue *ifq, void *q) { KASSERT(q == ifq->ifq_q); mtx_leave(&ifq->ifq_mtx); } void ifq_mfreem(struct ifqueue *ifq, struct mbuf *m) { MUTEX_ASSERT_LOCKED(&ifq->ifq_mtx); ifq->ifq_len--; ifq->ifq_qdrops++; ml_enqueue(&ifq->ifq_free, m); } void ifq_mfreeml(struct ifqueue *ifq, struct mbuf_list *ml) { MUTEX_ASSERT_LOCKED(&ifq->ifq_mtx); ifq->ifq_len -= ml_len(ml); ifq->ifq_qdrops += ml_len(ml); ml_enlist(&ifq->ifq_free, ml); } /* * ifiq */ #if NKSTAT > 0 struct ifiq_kstat_data { struct kstat_kv kd_packets; struct kstat_kv kd_bytes; struct kstat_kv kd_qdrops; struct kstat_kv kd_errors; struct kstat_kv kd_qlen; }; static const struct ifiq_kstat_data ifiq_kstat_tpl = { KSTAT_KV_UNIT_INITIALIZER("packets", KSTAT_KV_T_COUNTER64, KSTAT_KV_U_PACKETS), KSTAT_KV_UNIT_INITIALIZER("bytes", KSTAT_KV_T_COUNTER64, KSTAT_KV_U_BYTES), KSTAT_KV_UNIT_INITIALIZER("qdrops", KSTAT_KV_T_COUNTER64, KSTAT_KV_U_PACKETS), KSTAT_KV_UNIT_INITIALIZER("errors", KSTAT_KV_T_COUNTER64, KSTAT_KV_U_PACKETS), KSTAT_KV_UNIT_INITIALIZER("qlen", KSTAT_KV_T_UINT32, KSTAT_KV_U_PACKETS), }; int ifiq_kstat_copy(struct kstat *ks, void *dst) { struct ifiqueue *ifiq = ks->ks_softc; struct ifiq_kstat_data *kd = dst; *kd = ifiq_kstat_tpl; kstat_kv_u64(&kd->kd_packets) = ifiq->ifiq_packets; kstat_kv_u64(&kd->kd_bytes) = ifiq->ifiq_bytes; kstat_kv_u64(&kd->kd_qdrops) = ifiq->ifiq_qdrops; kstat_kv_u64(&kd->kd_errors) = ifiq->ifiq_errors; kstat_kv_u32(&kd->kd_qlen) = ml_len(&ifiq->ifiq_ml); return (0); } #endif static void ifiq_process(void *); void ifiq_init(struct ifiqueue *ifiq, struct ifnet *ifp, unsigned int idx) { ifiq->ifiq_if = ifp; ifiq->ifiq_softnet = net_tq(ifp->if_index + idx); ifiq->ifiq_softc = NULL; mtx_init(&ifiq->ifiq_mtx, IPL_NET); ml_init(&ifiq->ifiq_ml); task_set(&ifiq->ifiq_task, ifiq_process, ifiq); ifiq->ifiq_pressure = 0; ifiq->ifiq_packets = 0; ifiq->ifiq_bytes = 0; ifiq->ifiq_qdrops = 0; ifiq->ifiq_errors = 0; ifiq->ifiq_idx = idx; #if NKSTAT > 0 /* XXX xname vs driver name and unit */ ifiq->ifiq_kstat = kstat_create(ifp->if_xname, 0, "rxq", ifiq->ifiq_idx, KSTAT_T_KV, 0); KASSERT(ifiq->ifiq_kstat != NULL); kstat_set_mutex(ifiq->ifiq_kstat, &ifiq->ifiq_mtx); ifiq->ifiq_kstat->ks_softc = ifiq; ifiq->ifiq_kstat->ks_datalen = sizeof(ifiq_kstat_tpl); ifiq->ifiq_kstat->ks_copy = ifiq_kstat_copy; kstat_install(ifiq->ifiq_kstat); #endif } void ifiq_destroy(struct ifiqueue *ifiq) { #if NKSTAT > 0 kstat_destroy(ifiq->ifiq_kstat); #endif NET_ASSERT_UNLOCKED(); if (!task_del(ifiq->ifiq_softnet, &ifiq->ifiq_task)) taskq_barrier(ifiq->ifiq_softnet); /* don't need to lock because this is the last use of the ifiq */ ml_purge(&ifiq->ifiq_ml); } unsigned int ifiq_maxlen_drop = 2048 * 5; unsigned int ifiq_maxlen_return = 2048 * 3; int ifiq_input(struct ifiqueue *ifiq, struct mbuf_list *ml) { struct ifnet *ifp = ifiq->ifiq_if; struct mbuf *m; uint64_t packets; uint64_t bytes = 0; unsigned int len; #if NBPFILTER > 0 caddr_t if_bpf; #endif if (ml_empty(ml)) return (0); MBUF_LIST_FOREACH(ml, m) { m->m_pkthdr.ph_ifidx = ifp->if_index; m->m_pkthdr.ph_rtableid = ifp->if_rdomain; bytes += m->m_pkthdr.len; } packets = ml_len(ml); #if NBPFILTER > 0 if_bpf = ifp->if_bpf; if (if_bpf) { struct mbuf_list ml0 = *ml; ml_init(ml); while ((m = ml_dequeue(&ml0)) != NULL) { if ((*ifp->if_bpf_mtap)(if_bpf, m, BPF_DIRECTION_IN)) m_freem(m); else ml_enqueue(ml, m); } if (ml_empty(ml)) { mtx_enter(&ifiq->ifiq_mtx); ifiq->ifiq_packets += packets; ifiq->ifiq_bytes += bytes; 
mtx_leave(&ifiq->ifiq_mtx); return (0); } } #endif mtx_enter(&ifiq->ifiq_mtx); ifiq->ifiq_packets += packets; ifiq->ifiq_bytes += bytes; len = ml_len(&ifiq->ifiq_ml); if (__predict_true(!ISSET(ifp->if_xflags, IFXF_MONITOR))) { if (len > ifiq_maxlen_drop) ifiq->ifiq_qdrops += ml_len(ml); else ml_enlist(&ifiq->ifiq_ml, ml); } mtx_leave(&ifiq->ifiq_mtx); if (ml_empty(ml)) task_add(ifiq->ifiq_softnet, &ifiq->ifiq_task); else ml_purge(ml); return (len > ifiq_maxlen_return); } void ifiq_add_data(struct ifiqueue *ifiq, struct if_data *data) { mtx_enter(&ifiq->ifiq_mtx); data->ifi_ipackets += ifiq->ifiq_packets; data->ifi_ibytes += ifiq->ifiq_bytes; data->ifi_iqdrops += ifiq->ifiq_qdrops; mtx_leave(&ifiq->ifiq_mtx); } int ifiq_enqueue(struct ifiqueue *ifiq, struct mbuf *m) { mtx_enter(&ifiq->ifiq_mtx); ml_enqueue(&ifiq->ifiq_ml, m); mtx_leave(&ifiq->ifiq_mtx); task_add(ifiq->ifiq_softnet, &ifiq->ifiq_task); return (0); } static void ifiq_process(void *arg) { struct ifiqueue *ifiq = arg; struct mbuf_list ml; if (ifiq_empty(ifiq)) return; mtx_enter(&ifiq->ifiq_mtx); ml = ifiq->ifiq_ml; ml_init(&ifiq->ifiq_ml); mtx_leave(&ifiq->ifiq_mtx); if_input_process(ifiq->ifiq_if, &ml); } int net_ifiq_sysctl(int *name, u_int namelen, void *oldp, size_t *oldlenp, void *newp, size_t newlen) { int error = EOPNOTSUPP; /* pressure is disabled for 6.6-release */ #if 0 int val; if (namelen != 1) return (EISDIR); switch (name[0]) { case NET_LINK_IFRXQ_PRESSURE_RETURN: val = ifiq_pressure_return; error = sysctl_int(oldp, oldlenp, newp, newlen, &val); if (error != 0) return (error); if (val < 1 || val > ifiq_pressure_drop) return (EINVAL); ifiq_pressure_return = val; break; case NET_LINK_IFRXQ_PRESSURE_DROP: val = ifiq_pressure_drop; error = sysctl_int(oldp, oldlenp, newp, newlen, &val); if (error != 0) return (error); if (ifiq_pressure_return > val) return (EINVAL); ifiq_pressure_drop = val; break; default: error = EOPNOTSUPP; break; } #endif return (error); } /* * priq implementation */ unsigned int priq_idx(unsigned int nqueues, const struct mbuf *m) { unsigned int flow = 0; if (ISSET(m->m_pkthdr.csum_flags, M_FLOWID)) flow = m->m_pkthdr.ph_flowid; return (flow % nqueues); } void * priq_alloc(unsigned int idx, void *null) { struct priq *pq; int i; pq = malloc(sizeof(struct priq), M_DEVBUF, M_WAITOK); for (i = 0; i < IFQ_NQUEUES; i++) ml_init(&pq->pq_lists[i]); return (pq); } void priq_free(unsigned int idx, void *pq) { free(pq, M_DEVBUF, sizeof(struct priq)); } struct mbuf * priq_enq(struct ifqueue *ifq, struct mbuf *m) { struct priq *pq; struct mbuf_list *pl; struct mbuf *n = NULL; unsigned int prio; pq = ifq->ifq_q; KASSERT(m->m_pkthdr.pf.prio <= IFQ_MAXPRIO); /* Find a lower priority queue to drop from */ if (ifq_len(ifq) >= ifq->ifq_maxlen) { for (prio = 0; prio < m->m_pkthdr.pf.prio; prio++) { pl = &pq->pq_lists[prio]; if (ml_len(pl) > 0) { n = ml_dequeue(pl); goto enqueue; } } /* * There's no lower priority queue that we can * drop from so don't enqueue this one. 
*/ return (m); } enqueue: pl = &pq->pq_lists[m->m_pkthdr.pf.prio]; ml_enqueue(pl, m); return (n); } struct mbuf * priq_deq_begin(struct ifqueue *ifq, void **cookiep) { struct priq *pq = ifq->ifq_q; struct mbuf_list *pl; unsigned int prio = nitems(pq->pq_lists); struct mbuf *m; do { pl = &pq->pq_lists[--prio]; m = MBUF_LIST_FIRST(pl); if (m != NULL) { *cookiep = pl; return (m); } } while (prio > 0); return (NULL); } void priq_deq_commit(struct ifqueue *ifq, struct mbuf *m, void *cookie) { struct mbuf_list *pl = cookie; KASSERT(MBUF_LIST_FIRST(pl) == m); ml_dequeue(pl); } void priq_purge(struct ifqueue *ifq, struct mbuf_list *ml) { struct priq *pq = ifq->ifq_q; struct mbuf_list *pl; unsigned int prio = nitems(pq->pq_lists); do { pl = &pq->pq_lists[--prio]; ml_enlist(ml, pl); } while (prio > 0); }
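/*
 * Usage sketch (illustrative, not part of ifq.c): a hypothetical
 * driver start routine consuming packets from its transmit ifqueue.
 * It would be hooked up as ifp->if_qstart and run under the ifq
 * serialiser above.  "struct xx_softc" and xx_encap() are invented
 * for the example; the ifq_* calls are the real API.
 */
struct xx_softc;
int	xx_encap(struct xx_softc *, struct mbuf *);

void
xx_start(struct ifqueue *ifq)
{
	struct ifnet *ifp = ifq->ifq_if;
	struct xx_softc *sc = ifp->if_softc;
	struct mbuf *m;

	for (;;) {
		/* peek at the next packet without committing to it */
		m = ifq_deq_begin(ifq);
		if (m == NULL)
			break;

		if (xx_encap(sc, m) != 0) {
			/* out of tx descriptors: put it back and stop */
			ifq_deq_rollback(ifq, m);
			ifq_set_oactive(ifq);
			break;
		}

		/* the packet is on the ring; take it off the queue */
		ifq_deq_commit(ifq, m);
	}
}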
/*	$OpenBSD: bpf_filter.c,v 1.34 2020/08/03 03:21:24 dlg Exp $	*/
/*	$NetBSD: bpf_filter.c,v 1.12 1996/02/13 22:00:00 christos Exp $	*/

/*
 * Copyright (c) 1990, 1991, 1992, 1993
 *	The Regents of the University of California.  All rights reserved.
 *
 * This code is derived from the Stanford/CMU enet packet filter,
 * (net/enet.c) distributed as part of 4.3BSD, and code contributed
 * to Berkeley by Steven McCanne and Van Jacobson both of Lawrence
 * Berkeley Laboratory.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * @(#)bpf_filter.c 8.1 (Berkeley) 6/10/93 */ #include <sys/param.h> #include <sys/time.h> #ifndef _KERNEL #include <stdlib.h> #include <string.h> #include "pcap.h" #else #include <sys/systm.h> #endif #include <sys/endian.h> #ifdef _KERNEL extern int bpf_maxbufsize; #define Static #else /* _KERNEL */ #define Static static #endif /* _KERNEL */ #include <net/bpf.h> struct bpf_mem { const u_char *pkt; u_int len; }; Static u_int32_t bpf_mem_ldw(const void *, u_int32_t, int *); Static u_int32_t bpf_mem_ldh(const void *, u_int32_t, int *); Static u_int32_t bpf_mem_ldb(const void *, u_int32_t, int *); static const struct bpf_ops bpf_mem_ops = { bpf_mem_ldw, bpf_mem_ldh, bpf_mem_ldb, }; Static u_int32_t bpf_mem_ldw(const void *mem, u_int32_t k, int *err) { const struct bpf_mem *bm = mem; u_int32_t v; *err = 1; if (k + sizeof(v) > bm->len) return (0); memcpy(&v, bm->pkt + k, sizeof(v)); *err = 0; return ntohl(v); } Static u_int32_t bpf_mem_ldh(const void *mem, u_int32_t k, int *err) { const struct bpf_mem *bm = mem; u_int16_t v; *err = 1; if (k + sizeof(v) > bm->len) return (0); memcpy(&v, bm->pkt + k, sizeof(v)); *err = 0; return ntohs(v); } Static u_int32_t bpf_mem_ldb(const void *mem, u_int32_t k, int *err) { const struct bpf_mem *bm = mem; *err = 1; if (k >= bm->len) return (0); *err = 0; return bm->pkt[k]; } /* * Execute the filter program starting at pc on the packet p * wirelen is the length of the original packet * buflen is the amount of data present */ u_int bpf_filter(const struct bpf_insn *pc, const u_char *pkt, u_int wirelen, u_int buflen) { struct bpf_mem bm; bm.pkt = pkt; bm.len = buflen; return _bpf_filter(pc, &bpf_mem_ops, &bm, wirelen); } u_int _bpf_filter(const struct bpf_insn *pc, const struct bpf_ops *ops, const void *pkt, u_int wirelen) { u_int32_t A = 0, X = 0; u_int32_t k; int32_t mem[BPF_MEMWORDS]; int err; if (pc == NULL) { /* * No filter means accept all. 
*/ return (u_int)-1; } memset(mem, 0, sizeof(mem)); --pc; while (1) { ++pc; switch (pc->code) { default: #ifdef _KERNEL return 0; #else abort(); #endif case BPF_RET|BPF_K: return (u_int)pc->k; case BPF_RET|BPF_A: return (u_int)A; case BPF_LD|BPF_W|BPF_ABS: A = ops->ldw(pkt, pc->k, &err); if (err != 0) return 0; continue; case BPF_LD|BPF_H|BPF_ABS: A = ops->ldh(pkt, pc->k, &err); if (err != 0) return 0; continue; case BPF_LD|BPF_B|BPF_ABS: A = ops->ldb(pkt, pc->k, &err); if (err != 0) return 0; continue; case BPF_LD|BPF_W|BPF_LEN: A = wirelen; continue; case BPF_LDX|BPF_W|BPF_LEN: X = wirelen; continue; case BPF_LD|BPF_W|BPF_RND: A = arc4random(); continue; case BPF_LD|BPF_W|BPF_IND: k = X + pc->k; A = ops->ldw(pkt, k, &err); if (err != 0) return 0; continue; case BPF_LD|BPF_H|BPF_IND: k = X + pc->k; A = ops->ldh(pkt, k, &err); if (err != 0) return 0; continue; case BPF_LD|BPF_B|BPF_IND: k = X + pc->k; A = ops->ldb(pkt, k, &err); if (err != 0) return 0; continue; case BPF_LDX|BPF_MSH|BPF_B: X = ops->ldb(pkt, pc->k, &err); if (err != 0) return 0; X &= 0xf; X <<= 2; continue; case BPF_LD|BPF_IMM: A = pc->k; continue; case BPF_LDX|BPF_IMM: X = pc->k; continue; case BPF_LD|BPF_MEM: A = mem[pc->k]; continue; case BPF_LDX|BPF_MEM: X = mem[pc->k]; continue; case BPF_ST: mem[pc->k] = A; continue; case BPF_STX: mem[pc->k] = X; continue; case BPF_JMP|BPF_JA: pc += pc->k; continue; case BPF_JMP|BPF_JGT|BPF_K: pc += (A > pc->k) ? pc->jt : pc->jf; continue; case BPF_JMP|BPF_JGE|BPF_K: pc += (A >= pc->k) ? pc->jt : pc->jf; continue; case BPF_JMP|BPF_JEQ|BPF_K: pc += (A == pc->k) ? pc->jt : pc->jf; continue; case BPF_JMP|BPF_JSET|BPF_K: pc += (A & pc->k) ? pc->jt : pc->jf; continue; case BPF_JMP|BPF_JGT|BPF_X: pc += (A > X) ? pc->jt : pc->jf; continue; case BPF_JMP|BPF_JGE|BPF_X: pc += (A >= X) ? pc->jt : pc->jf; continue; case BPF_JMP|BPF_JEQ|BPF_X: pc += (A == X) ? pc->jt : pc->jf; continue; case BPF_JMP|BPF_JSET|BPF_X: pc += (A & X) ? pc->jt : pc->jf; continue; case BPF_ALU|BPF_ADD|BPF_X: A += X; continue; case BPF_ALU|BPF_SUB|BPF_X: A -= X; continue; case BPF_ALU|BPF_MUL|BPF_X: A *= X; continue; case BPF_ALU|BPF_DIV|BPF_X: if (X == 0) return 0; A /= X; continue; case BPF_ALU|BPF_AND|BPF_X: A &= X; continue; case BPF_ALU|BPF_OR|BPF_X: A |= X; continue; case BPF_ALU|BPF_LSH|BPF_X: A <<= X; continue; case BPF_ALU|BPF_RSH|BPF_X: A >>= X; continue; case BPF_ALU|BPF_ADD|BPF_K: A += pc->k; continue; case BPF_ALU|BPF_SUB|BPF_K: A -= pc->k; continue; case BPF_ALU|BPF_MUL|BPF_K: A *= pc->k; continue; case BPF_ALU|BPF_DIV|BPF_K: A /= pc->k; continue; case BPF_ALU|BPF_AND|BPF_K: A &= pc->k; continue; case BPF_ALU|BPF_OR|BPF_K: A |= pc->k; continue; case BPF_ALU|BPF_LSH|BPF_K: A <<= pc->k; continue; case BPF_ALU|BPF_RSH|BPF_K: A >>= pc->k; continue; case BPF_ALU|BPF_NEG: A = -A; continue; case BPF_MISC|BPF_TAX: X = A; continue; case BPF_MISC|BPF_TXA: A = X; continue; } } } #ifdef _KERNEL /* * Return true if the 'fcode' is a valid filter program. * The constraints are that each jump be forward and to a valid * code and memory operations use valid addresses. The code * must terminate with either an accept or reject. * * The kernel needs to be able to verify an application's filter code. * Otherwise, a bogus program could easily crash the system. */ int bpf_validate(struct bpf_insn *f, int len) { u_int i, from; struct bpf_insn *p; if (len < 1 || len > BPF_MAXINSNS) return 0; for (i = 0; i < len; ++i) { p = &f[i]; switch (BPF_CLASS(p->code)) { /* * Check that memory operations use valid addresses. 
*/ case BPF_LD: case BPF_LDX: switch (BPF_MODE(p->code)) { case BPF_IMM: break; case BPF_ABS: case BPF_IND: case BPF_MSH: /* * More strict check with actual packet length * is done runtime. */ if (p->k >= bpf_maxbufsize) return 0; break; case BPF_MEM: if (p->k >= BPF_MEMWORDS) return 0; break; case BPF_LEN: case BPF_RND: break; default: return 0; } break; case BPF_ST: case BPF_STX: if (p->k >= BPF_MEMWORDS) return 0; break; case BPF_ALU: switch (BPF_OP(p->code)) { case BPF_ADD: case BPF_SUB: case BPF_MUL: case BPF_OR: case BPF_AND: case BPF_LSH: case BPF_RSH: case BPF_NEG: break; case BPF_DIV: /* * Check for constant division by 0. */ if (BPF_SRC(p->code) == BPF_K && p->k == 0) return 0; break; default: return 0; } break; case BPF_JMP: /* * Check that jumps are forward, and within * the code block. */ from = i + 1; switch (BPF_OP(p->code)) { case BPF_JA: if (from + p->k < from || from + p->k >= len) return 0; break; case BPF_JEQ: case BPF_JGT: case BPF_JGE: case BPF_JSET: if (from + p->jt >= len || from + p->jf >= len) return 0; break; default: return 0; } break; case BPF_RET: break; case BPF_MISC: break; default: return 0; } } return BPF_CLASS(f[len - 1].code) == BPF_RET; } #endif
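/*
 * Usage sketch (illustrative, not part of bpf_filter.c): a tiny
 * classic-BPF program that accepts Ethernet frames carrying IPv4
 * (ethertype 0x0800) and rejects everything else.  BPF_STMT() and
 * BPF_JUMP() come from <net/bpf.h>; "ipv4_prog" and ipv4_match() are
 * invented names, and bpf_validate() is only available in the kernel.
 */
static struct bpf_insn ipv4_prog[] = {
	/* A = 16-bit word at byte offset 12 (Ethernet type field) */
	BPF_STMT(BPF_LD | BPF_H | BPF_ABS, 12),
	/* if (A == 0x0800) fall through, else skip one instruction */
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0x0800, 0, 1),
	/* accept: snapshot the whole packet */
	BPF_STMT(BPF_RET | BPF_K, (u_int)-1),
	/* reject */
	BPF_STMT(BPF_RET | BPF_K, 0),
};

u_int
ipv4_match(const u_char *pkt, u_int wirelen, u_int buflen)
{
#ifdef _KERNEL
	/* never hand an unvalidated program to bpf_filter() */
	if (!bpf_validate(ipv4_prog,
	    sizeof(ipv4_prog) / sizeof(ipv4_prog[0])))
		return (0);
#endif
	/* a non-zero return is the number of bytes to accept */
	return (bpf_filter(ipv4_prog, pkt, wirelen, buflen));
}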
/*	$OpenBSD: uvm_mmap.c,v 1.172 2022/08/01 14:56:59 deraadt Exp $	*/
/*	$NetBSD: uvm_mmap.c,v 1.49 2001/02/18 21:19:08 chs Exp $	*/

/*
 * Copyright (c) 1997 Charles D. Cranor and Washington University.
 * Copyright (c) 1991, 1993 The Regents of the University of California.
 * Copyright (c) 1988 University of Utah.
 *
 * All rights reserved.
 *
 * This code is derived from software contributed to Berkeley by
 * the Systems Programming Group of the University of Utah Computer
 * Science Department.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. All advertising materials mentioning features or use of this software
 *    must display the following acknowledgement:
 *      This product includes software developed by the Charles D. Cranor,
 *	Washington University, University of California, Berkeley and
 *	its contributors.
 * 4. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
* * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * from: Utah $Hdr: vm_mmap.c 1.6 91/10/21$ * @(#)vm_mmap.c 8.5 (Berkeley) 5/19/94 * from: Id: uvm_mmap.c,v 1.1.2.14 1998/01/05 21:04:26 chuck Exp */ /* * uvm_mmap.c: system call interface into VM system, plus kernel vm_mmap * function. */ #include <sys/param.h> #include <sys/systm.h> #include <sys/fcntl.h> #include <sys/file.h> #include <sys/filedesc.h> #include <sys/resourcevar.h> #include <sys/mman.h> #include <sys/mount.h> #include <sys/proc.h> #include <sys/malloc.h> #include <sys/vnode.h> #include <sys/conf.h> #include <sys/signalvar.h> #include <sys/syslog.h> #include <sys/stat.h> #include <sys/specdev.h> #include <sys/stdint.h> #include <sys/pledge.h> #include <sys/unistd.h> /* for KBIND* */ #include <sys/user.h> #include <machine/exec.h> /* for __LDPGSZ */ #include <sys/syscallargs.h> #include <uvm/uvm.h> #include <uvm/uvm_device.h> #include <uvm/uvm_vnode.h> int uvm_mmapanon(vm_map_t, vaddr_t *, vsize_t, vm_prot_t, vm_prot_t, int, vsize_t, struct proc *); int uvm_mmapfile(vm_map_t, vaddr_t *, vsize_t, vm_prot_t, vm_prot_t, int, struct vnode *, voff_t, vsize_t, struct proc *); /* * Page align addr and size, returning EINVAL on wraparound. */ #define ALIGN_ADDR(addr, size, pageoff) do { \ pageoff = (addr & PAGE_MASK); \ if (pageoff != 0) { \ if (size > SIZE_MAX - pageoff) \ return EINVAL; /* wraparound */ \ addr -= pageoff; \ size += pageoff; \ } \ if (size != 0) { \ size = (vsize_t)round_page(size); \ if (size == 0) \ return EINVAL; /* wraparound */ \ } \ } while (0) /* * sys_mquery: provide mapping hints to applications that do fixed mappings * * flags: 0 or MAP_FIXED (MAP_FIXED - means that we insist on this addr and * don't care about PMAP_PREFER or such) * addr: hint where we'd like to place the mapping. 
* size: size of the mapping * fd: fd of the file we want to map * off: offset within the file */ int sys_mquery(struct proc *p, void *v, register_t *retval) { struct sys_mquery_args /* { syscallarg(void *) addr; syscallarg(size_t) len; syscallarg(int) prot; syscallarg(int) flags; syscallarg(int) fd; syscallarg(off_t) pos; } */ *uap = v; struct file *fp; voff_t uoff; int error; vaddr_t vaddr; int flags = 0; vsize_t size; vm_prot_t prot; int fd; vaddr = (vaddr_t) SCARG(uap, addr); prot = SCARG(uap, prot); size = (vsize_t) SCARG(uap, len); fd = SCARG(uap, fd); if ((prot & PROT_MASK) != prot) return EINVAL; if (SCARG(uap, flags) & MAP_FIXED) flags |= UVM_FLAG_FIXED; if (fd >= 0) { if ((error = getvnode(p, fd, &fp)) != 0) return error; uoff = SCARG(uap, pos); } else { fp = NULL; uoff = UVM_UNKNOWN_OFFSET; } if (vaddr == 0) vaddr = uvm_map_hint(p->p_vmspace, prot, VM_MIN_ADDRESS, VM_MAXUSER_ADDRESS); error = uvm_map_mquery(&p->p_vmspace->vm_map, &vaddr, size, uoff, flags); if (error == 0) *retval = (register_t)(vaddr); if (fp != NULL) FRELE(fp, p); return error; } int uvm_wxabort; /* * W^X violations are only allowed on permitted filesystems. */ static inline int uvm_wxcheck(struct proc *p, char *call) { struct process *pr = p->p_p; int wxallowed = (pr->ps_textvp->v_mount && (pr->ps_textvp->v_mount->mnt_flag & MNT_WXALLOWED)); if (wxallowed && (pr->ps_flags & PS_WXNEEDED)) return 0; if (uvm_wxabort) { KERNEL_LOCK(); /* Report W^X failures */ if (pr->ps_wxcounter++ == 0) log(LOG_NOTICE, "%s(%d): %s W^X violation\n", pr->ps_comm, pr->ps_pid, call); /* Send uncatchable SIGABRT for coredump */ sigexit(p, SIGABRT); KERNEL_UNLOCK(); } return ENOTSUP; } /* * sys_mmap: mmap system call. * * => file offset and address may not be page aligned * - if MAP_FIXED, offset and address must have remainder mod PAGE_SIZE * - if address isn't page aligned the mapping starts at trunc_page(addr) * and the return value is adjusted up by the page offset. */ int sys_mmap(struct proc *p, void *v, register_t *retval) { struct sys_mmap_args /* { syscallarg(void *) addr; syscallarg(size_t) len; syscallarg(int) prot; syscallarg(int) flags; syscallarg(int) fd; syscallarg(off_t) pos; } */ *uap = v; vaddr_t addr; struct vattr va; off_t pos; vsize_t limit, pageoff, size; vm_prot_t prot, maxprot; int flags, fd; vaddr_t vm_min_address = VM_MIN_ADDRESS; struct filedesc *fdp = p->p_fd; struct file *fp = NULL; struct vnode *vp; int error; /* first, extract syscall args from the uap. */ addr = (vaddr_t) SCARG(uap, addr); size = (vsize_t) SCARG(uap, len); prot = SCARG(uap, prot); flags = SCARG(uap, flags); fd = SCARG(uap, fd); pos = SCARG(uap, pos); /* * Validate the flags. */ if ((prot & PROT_MASK) != prot) return EINVAL; if ((prot & (PROT_WRITE | PROT_EXEC)) == (PROT_WRITE | PROT_EXEC) && (error = uvm_wxcheck(p, "mmap"))) return error; if ((flags & MAP_FLAGMASK) != flags) return EINVAL; if ((flags & (MAP_SHARED|MAP_PRIVATE)) == (MAP_SHARED|MAP_PRIVATE)) return EINVAL; if ((flags & (MAP_FIXED|__MAP_NOREPLACE)) == __MAP_NOREPLACE) return EINVAL; if (flags & MAP_STACK) { if ((flags & (MAP_ANON|MAP_PRIVATE)) != (MAP_ANON|MAP_PRIVATE)) return EINVAL; if (flags & ~(MAP_STACK|MAP_FIXED|MAP_ANON|MAP_PRIVATE)) return EINVAL; if (pos != 0) return EINVAL; if ((prot & (PROT_READ|PROT_WRITE)) != (PROT_READ|PROT_WRITE)) return EINVAL; } if (size == 0) return EINVAL; error = pledge_protexec(p, prot); if (error) return error; /* align file position and save offset. adjust size. 
*/ ALIGN_ADDR(pos, size, pageoff); /* now check (MAP_FIXED) or get (!MAP_FIXED) the "addr" */ if (flags & MAP_FIXED) { /* adjust address by the same amount as we did the offset */ addr -= pageoff; if (addr & PAGE_MASK) return EINVAL; /* not page aligned */ if (addr > SIZE_MAX - size) return EINVAL; /* no wrapping! */ if (VM_MAXUSER_ADDRESS > 0 && (addr + size) > VM_MAXUSER_ADDRESS) return EINVAL; if (vm_min_address > 0 && addr < vm_min_address) return EINVAL; } /* check for file mappings (i.e. not anonymous) and verify file. */ if ((flags & MAP_ANON) == 0) { KERNEL_LOCK(); if ((fp = fd_getfile(fdp, fd)) == NULL) { error = EBADF; goto out; } if (fp->f_type != DTYPE_VNODE) { error = ENODEV; /* only mmap vnodes! */ goto out; } vp = (struct vnode *)fp->f_data; /* convert to vnode */ if (vp->v_type != VREG && vp->v_type != VCHR && vp->v_type != VBLK) { error = ENODEV; /* only REG/CHR/BLK support mmap */ goto out; } if (vp->v_type == VREG && (pos + size) < pos) { error = EINVAL; /* no offset wrapping */ goto out; } /* special case: catch SunOS style /dev/zero */ if (vp->v_type == VCHR && iszerodev(vp->v_rdev)) { flags |= MAP_ANON; FRELE(fp, p); fp = NULL; KERNEL_UNLOCK(); goto is_anon; } /* * Old programs may not select a specific sharing type, so * default to an appropriate one. */ if ((flags & (MAP_SHARED|MAP_PRIVATE)) == 0) { #if defined(DEBUG) printf("WARNING: defaulted mmap() share type to" " %s (pid %d comm %s)\n", vp->v_type == VCHR ? "MAP_SHARED" : "MAP_PRIVATE", p->p_p->ps_pid, p->p_p->ps_comm); #endif if (vp->v_type == VCHR) flags |= MAP_SHARED; /* for a device */ else flags |= MAP_PRIVATE; /* for a file */ } /* * MAP_PRIVATE device mappings don't make sense (and aren't * supported anyway). However, some programs rely on this, * so just change it to MAP_SHARED. */ if (vp->v_type == VCHR && (flags & MAP_PRIVATE) != 0) { flags = (flags & ~MAP_PRIVATE) | MAP_SHARED; } /* now check protection */ maxprot = PROT_EXEC; /* check read access */ if (fp->f_flag & FREAD) maxprot |= PROT_READ; else if (prot & PROT_READ) { error = EACCES; goto out; } /* check write access, shared case first */ if (flags & MAP_SHARED) { /* * if the file is writable, only add PROT_WRITE to * maxprot if the file is not immutable, append-only. * otherwise, if we have asked for PROT_WRITE, return * EPERM. */ if (fp->f_flag & FWRITE) { error = VOP_GETATTR(vp, &va, p->p_ucred, p); if (error) goto out; if ((va.va_flags & (IMMUTABLE|APPEND)) == 0) maxprot |= PROT_WRITE; else if (prot & PROT_WRITE) { error = EPERM; goto out; } } else if (prot & PROT_WRITE) { error = EACCES; goto out; } } else { /* MAP_PRIVATE mappings can always write to */ maxprot |= PROT_WRITE; } if ((flags & __MAP_NOFAULT) != 0 || ((flags & MAP_PRIVATE) != 0 && (prot & PROT_WRITE) != 0)) { limit = lim_cur(RLIMIT_DATA); if (limit < size || limit - size < ptoa(p->p_vmspace->vm_dused)) { error = ENOMEM; goto out; } } error = uvm_mmapfile(&p->p_vmspace->vm_map, &addr, size, prot, maxprot, flags, vp, pos, lim_cur(RLIMIT_MEMLOCK), p); FRELE(fp, p); KERNEL_UNLOCK(); } else { /* MAP_ANON case */ if (fd != -1) return EINVAL; is_anon: /* label for SunOS style /dev/zero */ /* __MAP_NOFAULT only makes sense with a backing object */ if ((flags & __MAP_NOFAULT) != 0) return EINVAL; if (prot != PROT_NONE || (flags & MAP_SHARED)) { limit = lim_cur(RLIMIT_DATA); if (limit < size || limit - size < ptoa(p->p_vmspace->vm_dused)) { return ENOMEM; } } /* * We've been treating (MAP_SHARED|MAP_PRIVATE) == 0 as * MAP_PRIVATE, so make that clear. 
*/ if ((flags & MAP_SHARED) == 0) flags |= MAP_PRIVATE; maxprot = PROT_MASK; error = uvm_mmapanon(&p->p_vmspace->vm_map, &addr, size, prot, maxprot, flags, lim_cur(RLIMIT_MEMLOCK), p); } if (error == 0) /* remember to add offset */ *retval = (register_t)(addr + pageoff); return error; out: KERNEL_UNLOCK(); if (fp) FRELE(fp, p); return error; } #if 1 int sys_pad_mquery(struct proc *p, void *v, register_t *retval) { struct sys_pad_mquery_args *uap = v; struct sys_mquery_args unpad; SCARG(&unpad, addr) = SCARG(uap, addr); SCARG(&unpad, len) = SCARG(uap, len); SCARG(&unpad, prot) = SCARG(uap, prot); SCARG(&unpad, flags) = SCARG(uap, flags); SCARG(&unpad, fd) = SCARG(uap, fd); SCARG(&unpad, pos) = SCARG(uap, pos); return sys_mquery(p, &unpad, retval); } int sys_pad_mmap(struct proc *p, void *v, register_t *retval) { struct sys_pad_mmap_args *uap = v; struct sys_mmap_args unpad; SCARG(&unpad, addr) = SCARG(uap, addr); SCARG(&unpad, len) = SCARG(uap, len); SCARG(&unpad, prot) = SCARG(uap, prot); SCARG(&unpad, flags) = SCARG(uap, flags); SCARG(&unpad, fd) = SCARG(uap, fd); SCARG(&unpad, pos) = SCARG(uap, pos); return sys_mmap(p, &unpad, retval); } #endif /* * sys_msync: the msync system call (a front-end for flush) */ int sys_msync(struct proc *p, void *v, register_t *retval) { struct sys_msync_args /* { syscallarg(void *) addr; syscallarg(size_t) len; syscallarg(int) flags; } */ *uap = v; vaddr_t addr; vsize_t size, pageoff; vm_map_t map; int flags, uvmflags; /* extract syscall args from the uap */ addr = (vaddr_t)SCARG(uap, addr); size = (vsize_t)SCARG(uap, len); flags = SCARG(uap, flags); /* sanity check flags */ if ((flags & ~(MS_ASYNC | MS_SYNC | MS_INVALIDATE)) != 0 || (flags & (MS_ASYNC | MS_SYNC | MS_INVALIDATE)) == 0 || (flags & (MS_ASYNC | MS_SYNC)) == (MS_ASYNC | MS_SYNC)) return EINVAL; if ((flags & (MS_ASYNC | MS_SYNC)) == 0) flags |= MS_SYNC; /* align the address to a page boundary, and adjust the size accordingly */ ALIGN_ADDR(addr, size, pageoff); if (addr > SIZE_MAX - size) return EINVAL; /* disallow wrap-around. */ /* get map */ map = &p->p_vmspace->vm_map; /* translate MS_ flags into PGO_ flags */ uvmflags = PGO_CLEANIT; if (flags & MS_INVALIDATE) uvmflags |= PGO_FREE; if (flags & MS_SYNC) uvmflags |= PGO_SYNCIO; else uvmflags |= PGO_SYNCIO; /* XXXCDC: force sync for now! */ return uvm_map_clean(map, addr, addr+size, uvmflags); } /* * sys_munmap: unmap a users memory */ int sys_munmap(struct proc *p, void *v, register_t *retval) { struct sys_munmap_args /* { syscallarg(void *) addr; syscallarg(size_t) len; } */ *uap = v; vaddr_t addr; vsize_t size, pageoff; vm_map_t map; vaddr_t vm_min_address = VM_MIN_ADDRESS; struct uvm_map_deadq dead_entries; /* get syscall args... */ addr = (vaddr_t) SCARG(uap, addr); size = (vsize_t) SCARG(uap, len); /* align address to a page boundary, and adjust size accordingly */ ALIGN_ADDR(addr, size, pageoff); /* * Check for illegal addresses. Watch out for address wrap... * Note that VM_*_ADDRESS are not constants due to casts (argh). */ if (addr > SIZE_MAX - size) return EINVAL; if (VM_MAXUSER_ADDRESS > 0 && addr + size > VM_MAXUSER_ADDRESS) return EINVAL; if (vm_min_address > 0 && addr < vm_min_address) return EINVAL; map = &p->p_vmspace->vm_map; vm_map_lock(map); /* lock map so we can checkprot */ /* * interesting system call semantic: make sure entire range is * allocated before allowing an unmap. 
*/ if (!uvm_map_checkprot(map, addr, addr + size, PROT_NONE)) { vm_map_unlock(map); return EINVAL; } TAILQ_INIT(&dead_entries); uvm_unmap_remove(map, addr, addr + size, &dead_entries, FALSE, TRUE); vm_map_unlock(map); /* and unlock */ uvm_unmap_detach(&dead_entries, 0); return 0; } /* * sys_mprotect: the mprotect system call */ int sys_mprotect(struct proc *p, void *v, register_t *retval) { struct sys_mprotect_args /* { syscallarg(void *) addr; syscallarg(size_t) len; syscallarg(int) prot; } */ *uap = v; vaddr_t addr; vsize_t size, pageoff; vm_prot_t prot; int error; /* * extract syscall args from uap */ addr = (vaddr_t)SCARG(uap, addr); size = (vsize_t)SCARG(uap, len); prot = SCARG(uap, prot); if ((prot & PROT_MASK) != prot) return EINVAL; if ((prot & (PROT_WRITE | PROT_EXEC)) == (PROT_WRITE | PROT_EXEC) && (error = uvm_wxcheck(p, "mprotect"))) return error; error = pledge_protexec(p, prot); if (error) return error; /* * align the address to a page boundary, and adjust the size accordingly */ ALIGN_ADDR(addr, size, pageoff); if (addr > SIZE_MAX - size) return EINVAL; /* disallow wrap-around. */ return (uvm_map_protect(&p->p_vmspace->vm_map, addr, addr+size, prot, FALSE)); } /* * sys_msyscall: the msyscall system call */ int sys_msyscall(struct proc *p, void *v, register_t *retval) { struct sys_msyscall_args /* { syscallarg(void *) addr; syscallarg(size_t) len; } */ *uap = v; vaddr_t addr; vsize_t size, pageoff; addr = (vaddr_t)SCARG(uap, addr); size = (vsize_t)SCARG(uap, len); /* * align the address to a page boundary, and adjust the size accordingly */ ALIGN_ADDR(addr, size, pageoff); if (addr > SIZE_MAX - size) return EINVAL; /* disallow wrap-around. */ return uvm_map_syscall(&p->p_vmspace->vm_map, addr, addr+size); } /* * sys_minherit: the minherit system call */ int sys_minherit(struct proc *p, void *v, register_t *retval) { struct sys_minherit_args /* { syscallarg(void *) addr; syscallarg(size_t) len; syscallarg(int) inherit; } */ *uap = v; vaddr_t addr; vsize_t size, pageoff; vm_inherit_t inherit; addr = (vaddr_t)SCARG(uap, addr); size = (vsize_t)SCARG(uap, len); inherit = SCARG(uap, inherit); /* * align the address to a page boundary, and adjust the size accordingly */ ALIGN_ADDR(addr, size, pageoff); if (addr > SIZE_MAX - size) return EINVAL; /* disallow wrap-around. */ return (uvm_map_inherit(&p->p_vmspace->vm_map, addr, addr+size, inherit)); } /* * sys_madvise: give advice about memory usage. */ /* ARGSUSED */ int sys_madvise(struct proc *p, void *v, register_t *retval) { struct sys_madvise_args /* { syscallarg(void *) addr; syscallarg(size_t) len; syscallarg(int) behav; } */ *uap = v; vaddr_t addr; vsize_t size, pageoff; int advice, error; addr = (vaddr_t)SCARG(uap, addr); size = (vsize_t)SCARG(uap, len); advice = SCARG(uap, behav); /* * align the address to a page boundary, and adjust the size accordingly */ ALIGN_ADDR(addr, size, pageoff); if (addr > SIZE_MAX - size) return EINVAL; /* disallow wrap-around. */ switch (advice) { case MADV_NORMAL: case MADV_RANDOM: case MADV_SEQUENTIAL: error = uvm_map_advice(&p->p_vmspace->vm_map, addr, addr + size, advice); break; case MADV_WILLNEED: /* * Activate all these pages, pre-faulting them in if * necessary. */ /* * XXX IMPLEMENT ME. * Should invent a "weak" mode for uvm_fault() * which would only do the PGO_LOCKED pgo_get(). */ return 0; case MADV_DONTNEED: /* * Deactivate all these pages. We don't need them * any more. We don't, however, toss the data in * the pages. 
*/ error = uvm_map_clean(&p->p_vmspace->vm_map, addr, addr + size, PGO_DEACTIVATE); break; case MADV_FREE: /* * These pages contain no valid data, and may be * garbage-collected. Toss all resources, including * any swap space in use. */ error = uvm_map_clean(&p->p_vmspace->vm_map, addr, addr + size, PGO_FREE); break; case MADV_SPACEAVAIL: /* * XXXMRG What is this? I think it's: * * Ensure that we have allocated backing-store * for these pages. * * This is going to require changes to the page daemon, * as it will free swap space allocated to pages in core. * There's also what to do for device/file/anonymous memory. */ return EINVAL; default: return EINVAL; } return error; } /* * sys_mlock: memory lock */ int sys_mlock(struct proc *p, void *v, register_t *retval) { struct sys_mlock_args /* { syscallarg(const void *) addr; syscallarg(size_t) len; } */ *uap = v; vaddr_t addr; vsize_t size, pageoff; int error; /* extract syscall args from uap */ addr = (vaddr_t)SCARG(uap, addr); size = (vsize_t)SCARG(uap, len); /* align address to a page boundary and adjust size accordingly */ ALIGN_ADDR(addr, size, pageoff); if (addr > SIZE_MAX - size) return EINVAL; /* disallow wrap-around. */ if (atop(size) + uvmexp.wired > uvmexp.wiredmax) return EAGAIN; #ifdef pmap_wired_count if (size + ptoa(pmap_wired_count(vm_map_pmap(&p->p_vmspace->vm_map))) > lim_cur(RLIMIT_MEMLOCK)) return EAGAIN; #else if ((error = suser(p)) != 0) return error; #endif error = uvm_map_pageable(&p->p_vmspace->vm_map, addr, addr+size, FALSE, 0); return error == 0 ? 0 : ENOMEM; } /* * sys_munlock: unlock wired pages */ int sys_munlock(struct proc *p, void *v, register_t *retval) { struct sys_munlock_args /* { syscallarg(const void *) addr; syscallarg(size_t) len; } */ *uap = v; vaddr_t addr; vsize_t size, pageoff; int error; /* extract syscall args from uap */ addr = (vaddr_t)SCARG(uap, addr); size = (vsize_t)SCARG(uap, len); /* align address to a page boundary, and adjust size accordingly */ ALIGN_ADDR(addr, size, pageoff); if (addr > SIZE_MAX - size) return EINVAL; /* disallow wrap-around. */ #ifndef pmap_wired_count if ((error = suser(p)) != 0) return error; #endif error = uvm_map_pageable(&p->p_vmspace->vm_map, addr, addr+size, TRUE, 0); return error == 0 ? 0 : ENOMEM; } /* * sys_mlockall: lock all pages mapped into an address space. */ int sys_mlockall(struct proc *p, void *v, register_t *retval) { struct sys_mlockall_args /* { syscallarg(int) flags; } */ *uap = v; int error, flags; flags = SCARG(uap, flags); if (flags == 0 || (flags & ~(MCL_CURRENT|MCL_FUTURE)) != 0) return EINVAL; #ifndef pmap_wired_count if ((error = suser(p)) != 0) return error; #endif error = uvm_map_pageable_all(&p->p_vmspace->vm_map, flags, lim_cur(RLIMIT_MEMLOCK)); if (error != 0 && error != ENOMEM) return EAGAIN; return error; } /* * sys_munlockall: unlock all pages mapped into an address space. */ int sys_munlockall(struct proc *p, void *v, register_t *retval) { (void) uvm_map_pageable_all(&p->p_vmspace->vm_map, 0, 0); return 0; } /* * common code for mmapanon and mmapfile to lock a mmaping */ int uvm_mmaplock(vm_map_t map, vaddr_t *addr, vsize_t size, vm_prot_t prot, vsize_t locklimit) { int error; /* * POSIX 1003.1b -- if our address space was configured * to lock all future mappings, wire the one we just made. */ if (prot == PROT_NONE) { /* * No more work to do in this case. 
*/ return 0; } vm_map_lock(map); if (map->flags & VM_MAP_WIREFUTURE) { KERNEL_LOCK(); if ((atop(size) + uvmexp.wired) > uvmexp.wiredmax #ifdef pmap_wired_count || (locklimit != 0 && (size + ptoa(pmap_wired_count(vm_map_pmap(map)))) > locklimit) #endif ) { error = ENOMEM; vm_map_unlock(map); /* unmap the region! */ uvm_unmap(map, *addr, *addr + size); KERNEL_UNLOCK(); return error; } /* * uvm_map_pageable() always returns the map * unlocked. */ error = uvm_map_pageable(map, *addr, *addr + size, FALSE, UVM_LK_ENTER); if (error != 0) { /* unmap the region! */ uvm_unmap(map, *addr, *addr + size); KERNEL_UNLOCK(); return error; } KERNEL_UNLOCK(); return 0; } vm_map_unlock(map); return 0; } /* * uvm_mmapanon: internal version of mmap for anons * * - used by sys_mmap */ int uvm_mmapanon(vm_map_t map, vaddr_t *addr, vsize_t size, vm_prot_t prot, vm_prot_t maxprot, int flags, vsize_t locklimit, struct proc *p) { int error; int advice = MADV_NORMAL; unsigned int uvmflag = 0; vsize_t align = 0; /* userland page size */ /* * for non-fixed mappings, round off the suggested address. * for fixed mappings, check alignment and zap old mappings. */ if ((flags & MAP_FIXED) == 0) { *addr = round_page(*addr); /* round */ } else { if (*addr & PAGE_MASK) return EINVAL; uvmflag |= UVM_FLAG_FIXED; if ((flags & __MAP_NOREPLACE) == 0) uvmflag |= UVM_FLAG_UNMAP; } if ((flags & MAP_FIXED) == 0 && size >= __LDPGSZ) align = __LDPGSZ; if ((flags & MAP_SHARED) == 0) /* XXX: defer amap create */ uvmflag |= UVM_FLAG_COPYONW; else /* shared: create amap now */ uvmflag |= UVM_FLAG_OVERLAY; if (flags & MAP_STACK) uvmflag |= UVM_FLAG_STACK; if (flags & MAP_CONCEAL) uvmflag |= UVM_FLAG_CONCEAL; /* set up mapping flags */ uvmflag = UVM_MAPFLAG(prot, maxprot, (flags & MAP_SHARED) ? MAP_INHERIT_SHARE : MAP_INHERIT_COPY, advice, uvmflag); error = uvm_mapanon(map, addr, size, align, uvmflag); if (error == 0) error = uvm_mmaplock(map, addr, size, prot, locklimit); return error; } /* * uvm_mmapfile: internal version of mmap for non-anons * * - used by sys_mmap * - caller must page-align the file offset */ int uvm_mmapfile(vm_map_t map, vaddr_t *addr, vsize_t size, vm_prot_t prot, vm_prot_t maxprot, int flags, struct vnode *vp, voff_t foff, vsize_t locklimit, struct proc *p) { struct uvm_object *uobj; int error; int advice = MADV_NORMAL; unsigned int uvmflag = 0; vsize_t align = 0; /* userland page size */ /* * for non-fixed mappings, round off the suggested address. * for fixed mappings, check alignment and zap old mappings. */ if ((flags & MAP_FIXED) == 0) { *addr = round_page(*addr); /* round */ } else { if (*addr & PAGE_MASK) return EINVAL; uvmflag |= UVM_FLAG_FIXED; if ((flags & __MAP_NOREPLACE) == 0) uvmflag |= UVM_FLAG_UNMAP; } /* * attach to underlying vm object. */ if (vp->v_type != VCHR) { uobj = uvn_attach(vp, (flags & MAP_SHARED) ? maxprot : (maxprot & ~PROT_WRITE)); /* * XXXCDC: hack from old code * don't allow vnodes which have been mapped * shared-writeable to persist [forces them to be * flushed out when last reference goes]. * XXXCDC: interesting side effect: avoids a bug. * note that in WRITE [ufs_readwrite.c] that we * allocate buffer, uncache, and then do the write. * the problem with this is that if the uncache causes * VM data to be flushed to the same area of the file * we are writing to... in that case we've got the * buffer locked and our process goes to sleep forever. * * XXXCDC: checking maxprot protects us from the * "persistbug" program but this is not a long term * solution. 
* * XXXCDC: we don't bother calling uncache with the vp * VOP_LOCKed since we know that we are already * holding a valid reference to the uvn (from the * uvn_attach above), and thus it is impossible for * the uncache to kill the uvn and trigger I/O. */ if (flags & MAP_SHARED) { if ((prot & PROT_WRITE) || (maxprot & PROT_WRITE)) { uvm_vnp_uncache(vp); } } } else { uobj = udv_attach(vp->v_rdev, (flags & MAP_SHARED) ? maxprot : (maxprot & ~PROT_WRITE), foff, size); /* * XXX Some devices don't like to be mapped with * XXX PROT_EXEC, but we don't really have a * XXX better way of handling this, right now */ if (uobj == NULL && (prot & PROT_EXEC) == 0) { maxprot &= ~PROT_EXEC; uobj = udv_attach(vp->v_rdev, (flags & MAP_SHARED) ? maxprot : (maxprot & ~PROT_WRITE), foff, size); } advice = MADV_RANDOM; } if (uobj == NULL) return vp->v_type == VREG ? ENOMEM : EINVAL; if ((flags & MAP_SHARED) == 0) uvmflag |= UVM_FLAG_COPYONW; if (flags & __MAP_NOFAULT) uvmflag |= (UVM_FLAG_NOFAULT | UVM_FLAG_OVERLAY); if (flags & MAP_STACK) uvmflag |= UVM_FLAG_STACK; if (flags & MAP_CONCEAL) uvmflag |= UVM_FLAG_CONCEAL; /* set up mapping flags */ uvmflag = UVM_MAPFLAG(prot, maxprot, (flags & MAP_SHARED) ? MAP_INHERIT_SHARE : MAP_INHERIT_COPY, advice, uvmflag); error = uvm_map(map, addr, size, uobj, foff, align, uvmflag); if (error == 0) return uvm_mmaplock(map, addr, size, prot, locklimit); /* errors: first detach from the uobj, if any. */ if (uobj) uobj->pgops->pgo_detach(uobj); return error; } /* an address that can't be in userspace or kernelspace */ #define BOGO_PC (u_long)-1 int sys_kbind(struct proc *p, void *v, register_t *retval) { struct sys_kbind_args /* { syscallarg(const struct __kbind *) param; syscallarg(size_t) psize; syscallarg(uint64_t) proc_cookie; } */ *uap = v; const struct __kbind *paramp; union { struct __kbind uk[KBIND_BLOCK_MAX]; char upad[KBIND_BLOCK_MAX * sizeof(*paramp) + KBIND_DATA_MAX]; } param; struct uvm_map_deadq dead_entries; struct process *pr = p->p_p; const char *data; vaddr_t baseva, last_baseva, endva, pageoffset, kva; size_t psize, s; u_long pc; int count, i, extra; int error, sigill = 0; /* * extract syscall args from uap */ paramp = SCARG(uap, param); psize = SCARG(uap, psize); /* * If paramp is NULL and we're uninitialized, disable the syscall * for the process. Raise SIGILL if paramp is NULL and we're * already initialized. * * If paramp is non-NULL and we're uninitialized, do initialization. * Otherwise, do security checks and raise SIGILL on failure. */ pc = PROC_PC(p); mtx_enter(&pr->ps_mtx); if (paramp == NULL) { if (pr->ps_kbind_addr == 0) pr->ps_kbind_addr = BOGO_PC; else sigill = 1; } else if (pr->ps_kbind_addr == 0) { pr->ps_kbind_addr = pc; pr->ps_kbind_cookie = SCARG(uap, proc_cookie); } else if (pc != pr->ps_kbind_addr || pc == BOGO_PC || pr->ps_kbind_cookie != SCARG(uap, proc_cookie)) { sigill = 1; } mtx_leave(&pr->ps_mtx); /* Raise SIGILL if something is off. */ if (sigill) { KERNEL_LOCK(); sigexit(p, SIGILL); /* NOTREACHED */ KERNEL_UNLOCK(); } /* We're done if we were disabling the syscall. */ if (paramp == NULL) return 0; if (psize < sizeof(struct __kbind) || psize > sizeof(param)) return EINVAL; if ((error = copyin(paramp, &param, psize))) return error; /* * The param argument points to an array of __kbind structures * followed by the corresponding new data areas for them. Verify * that the sizes in the __kbind structures add up to the total * size and find the start of the new area. 
*/ paramp = &param.uk[0]; s = psize; for (count = 0; s > 0 && count < KBIND_BLOCK_MAX; count++) { if (s < sizeof(*paramp)) return EINVAL; s -= sizeof(*paramp); baseva = (vaddr_t)paramp[count].kb_addr; endva = baseva + paramp[count].kb_size - 1; if (paramp[count].kb_addr == NULL || paramp[count].kb_size == 0 || paramp[count].kb_size > KBIND_DATA_MAX || baseva >= VM_MAXUSER_ADDRESS || endva >= VM_MAXUSER_ADDRESS || s < paramp[count].kb_size) return EINVAL; s -= paramp[count].kb_size; } if (s > 0) return EINVAL; data = (const char *)&paramp[count]; /* all looks good, so do the bindings */ last_baseva = VM_MAXUSER_ADDRESS; kva = 0; TAILQ_INIT(&dead_entries); KERNEL_LOCK(); for (i = 0; i < count; i++) { baseva = (vaddr_t)paramp[i].kb_addr; s = paramp[i].kb_size; pageoffset = baseva & PAGE_MASK; baseva = trunc_page(baseva); /* hppa at least runs PLT entries over page edge */ extra = (pageoffset + s) & PAGE_MASK; if (extra > pageoffset) extra = 0; else s -= extra; redo: /* make sure sure the desired page is mapped into kernel_map */ if (baseva != last_baseva) { if (kva != 0) { vm_map_lock(kernel_map); uvm_unmap_remove(kernel_map, kva, kva+PAGE_SIZE, &dead_entries, FALSE, TRUE); vm_map_unlock(kernel_map); kva = 0; } if ((error = uvm_map_extract(&p->p_vmspace->vm_map, baseva, PAGE_SIZE, &kva, UVM_EXTRACT_FIXPROT))) break; last_baseva = baseva; } /* do the update */ if ((error = kcopy(data, (char *)kva + pageoffset, s))) break; data += s; if (extra > 0) { baseva += PAGE_SIZE; s = extra; pageoffset = 0; extra = 0; goto redo; } } if (kva != 0) { vm_map_lock(kernel_map); uvm_unmap_remove(kernel_map, kva, kva+PAGE_SIZE, &dead_entries, FALSE, TRUE); vm_map_unlock(kernel_map); } uvm_unmap_detach(&dead_entries, AMAP_REFALL); KERNEL_UNLOCK(); return error; }
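/*
 * Illustrative sketch (not part of the kernel sources above): a minimal
 * userland program showing how the syscalls implemented in this file are
 * typically exercised.  mmap(2) of anonymous memory reaches uvm_mmapanon(),
 * mlock(2) goes through sys_mlock() and its RLIMIT_MEMLOCK check,
 * mprotect(2) through sys_mprotect(), and munmap(2) through sys_munmap().
 * Error handling is deliberately minimal; sizes and values are made up
 * for the example.
 */
#include <sys/mman.h>

#include <err.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	size_t len = 4 * getpagesize();
	char *p;

	/* anonymous private mapping: handled by uvm_mmapanon() */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANON, -1, 0);
	if (p == MAP_FAILED)
		err(1, "mmap");

	memset(p, 0xa5, len);

	/* wire the pages; may fail if RLIMIT_MEMLOCK is exceeded */
	if (mlock(p, len) == -1)
		warn("mlock");

	/* drop the write permission on the range */
	if (mprotect(p, len, PROT_READ) == -1)
		err(1, "mprotect");

	/* unmap; wired pages are unwired as part of the teardown */
	if (munmap(p, len) == -1)
		err(1, "munmap");

	return 0;
}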
/* $OpenBSD: ntfs_vfsops.c,v 1.65 2022/01/11 03:13:59 jsg Exp $ */ /* $NetBSD: ntfs_vfsops.c,v 1.7 2003/04/24 07:50:19 christos Exp $ */ /*- * Copyright (c) 1998, 1999 Semen Ustimenko * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1.
Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * Id: ntfs_vfsops.c,v 1.7 1999/05/31 11:28:30 phk Exp */ #include <sys/param.h> #include <sys/systm.h> #include <sys/namei.h> #include <sys/proc.h> #include <sys/kernel.h> #include <sys/vnode.h> #include <sys/lock.h> #include <sys/mount.h> #include <sys/buf.h> #include <sys/disk.h> #include <sys/fcntl.h> #include <sys/malloc.h> #include <sys/device.h> #include <sys/conf.h> #include <sys/specdev.h> /*#define NTFS_DEBUG 1*/ #include <ntfs/ntfs.h> #include <ntfs/ntfs_inode.h> #include <ntfs/ntfs_subr.h> #include <ntfs/ntfs_vfsops.h> #include <ntfs/ntfs_ihash.h> int ntfs_mount(struct mount *, const char *, void *, struct nameidata *, struct proc *); int ntfs_quotactl(struct mount *, int, uid_t, caddr_t, struct proc *); int ntfs_root(struct mount *, struct vnode **); int ntfs_start(struct mount *, int, struct proc *); int ntfs_statfs(struct mount *, struct statfs *, struct proc *); int ntfs_sync(struct mount *, int, int, struct ucred *, struct proc *); int ntfs_unmount(struct mount *, int, struct proc *); int ntfs_vget(struct mount *mp, ino_t ino, struct vnode **vpp); int ntfs_mountfs(struct vnode *, struct mount *, struct ntfs_args *, struct proc *); int ntfs_vptofh(struct vnode *, struct fid *); int ntfs_init(struct vfsconf *); int ntfs_fhtovp(struct mount *, struct fid *, struct vnode **); int ntfs_checkexp(struct mount *, struct mbuf *, int *, struct ucred **); int ntfs_sysctl(int *, u_int, void *, size_t *, void *, size_t, struct proc *); /* * Verify a remote client has export rights and return these rights via. * exflagsp and credanonp. */ int ntfs_checkexp(struct mount *mp, struct mbuf *nam, int *exflagsp, struct ucred **credanonp) { struct netcred *np; struct ntfsmount *ntm = VFSTONTFS(mp); /* * Get the export permission structure for this <mp, client> tuple. 
*/ np = vfs_export_lookup(mp, &ntm->ntm_export, nam); if (np == NULL) return (EACCES); *exflagsp = np->netc_exflags; *credanonp = &np->netc_anon; return (0); } int ntfs_sysctl(int *name, u_int namelen, void *oldp, size_t *oldlenp, void *newp, size_t newlen, struct proc *p) { return (EINVAL); } int ntfs_init(struct vfsconf *vcp) { return 0; } int ntfs_mount(struct mount *mp, const char *path, void *data, struct nameidata *ndp, struct proc *p) { int err = 0; struct vnode *devvp; struct ntfs_args *args = data; char fname[MNAMELEN]; char fspec[MNAMELEN]; ntfs_nthashinit(); /* *** * Mounting non-root file system or updating a file system *** */ /* * If updating, check whether changing from read-only to * read/write; if there is no device name, that's all we do. */ if (mp->mnt_flag & MNT_UPDATE) { /* if not updating name...*/ if (args && args->fspec == NULL) { /* * Process export requests. Jumping to "success" * will return the vfs_export() error code. */ struct ntfsmount *ntm = VFSTONTFS(mp); err = vfs_export(mp, &ntm->ntm_export, &args->export_info); goto success; } printf("ntfs_mount(): MNT_UPDATE not supported\n"); err = EINVAL; goto error_1; } /* * Not an update, or updating the name: look up the name * and verify that it refers to a sensible block device. */ err = copyinstr(args->fspec, fspec, sizeof(fspec), NULL); if (err) goto error_1; if (disk_map(fspec, fname, sizeof(fname), DM_OPENBLCK) == -1) bcopy(fspec, fname, sizeof(fname)); NDINIT(ndp, LOOKUP, FOLLOW, UIO_SYSSPACE, fname, p); err = namei(ndp); if (err) { /* can't get devvp!*/ goto error_1; } devvp = ndp->ni_vp; if (devvp->v_type != VBLK) { err = ENOTBLK; goto error_2; } if (major(devvp->v_rdev) >= nblkdev) { err = ENXIO; goto error_2; } if (mp->mnt_flag & MNT_UPDATE) { #if 0 /* ******************** * UPDATE ******************** */ if (devvp != ntmp->um_devvp) err = EINVAL; /* needs translation */ else vrele(devvp); /* * Update device name only on success */ if( !err) { err = set_statfs_info(NULL, UIO_USERSPACE, args->fspec, UIO_USERSPACE, mp, p); } #endif } else { /* ******************** * NEW MOUNT ******************** */ /* * Since this is a new mount, we want the names for * the device and the mount point copied in. If an * error occurs, the mountpoint is discarded by the * upper level code. */ /* Save "last mounted on" info for mount point (NULL pad)*/ bzero(mp->mnt_stat.f_mntonname, MNAMELEN); strlcpy(mp->mnt_stat.f_mntonname, path, MNAMELEN); bzero(mp->mnt_stat.f_mntfromname, MNAMELEN); strlcpy(mp->mnt_stat.f_mntfromname, fname, MNAMELEN); bzero(mp->mnt_stat.f_mntfromspec, MNAMELEN); strlcpy(mp->mnt_stat.f_mntfromspec, fspec, MNAMELEN); bcopy(args, &mp->mnt_stat.mount_info.ntfs_args, sizeof(*args)); if ( !err) { err = ntfs_mountfs(devvp, mp, args, p); } } if (err) { goto error_2; } /* * Initialize FS stat information in mount struct; uses both * mp->mnt_stat.f_mntonname and mp->mnt_stat.f_mntfromname * * This code is common to root and non-root mounts */ (void)VFS_STATFS(mp, &mp->mnt_stat, p); goto success; error_2: /* error with devvp held*/ /* release devvp before failing*/ vrele(devvp); error_1: /* no state to back out*/ success: return(err); } /* * Common code for mount and mountroot */ int ntfs_mountfs(struct vnode *devvp, struct mount *mp, struct ntfs_args *argsp, struct proc *p) { struct buf *bp; struct ntfsmount *ntmp = NULL; dev_t dev = devvp->v_rdev; int error, ncount, i; struct vnode *vp; /* * Disallow multiple mounts of the same device. 
* Disallow mounting of a device that is currently in use * (except for root, which might share swap device for miniroot). * Flush out any old buffers remaining from a previous use. */ error = vfs_mountedon(devvp); if (error) return (error); ncount = vcount(devvp); if (ncount > 1 && devvp != rootvp) return (EBUSY); vn_lock(devvp, LK_EXCLUSIVE | LK_RETRY); error = vinvalbuf(devvp, V_SAVE, p->p_ucred, p, 0, INFSLP); VOP_UNLOCK(devvp); if (error) return (error); error = VOP_OPEN(devvp, FREAD, FSCRED, p); if (error) return (error); bp = NULL; error = bread(devvp, BBLOCK, BBSIZE, &bp); if (error) goto out; ntmp = malloc(sizeof *ntmp, M_NTFSMNT, M_WAITOK | M_ZERO); bcopy(bp->b_data, &ntmp->ntm_bootfile, sizeof(struct bootfile)); brelse(bp); bp = NULL; if (strncmp(ntmp->ntm_bootfile.bf_sysid, NTFS_BBID, NTFS_BBIDLEN)) { error = EINVAL; DPRINTF("ntfs_mountfs: invalid boot block\n"); goto out; } { int8_t cpr = ntmp->ntm_mftrecsz; if( cpr > 0 ) ntmp->ntm_bpmftrec = ntmp->ntm_spc * cpr; else ntmp->ntm_bpmftrec = (1 << (-cpr)) / ntmp->ntm_bps; } DPRINTF("ntfs_mountfs(): bps: %u, spc: %u, media: %x, " "mftrecsz: %u (%u sects)\n", ntmp->ntm_bps, ntmp->ntm_spc, ntmp->ntm_bootfile.bf_media, ntmp->ntm_mftrecsz, ntmp->ntm_bpmftrec); DPRINTF("ntfs_mountfs(): mftcn: 0x%llx|0x%llx\n", ntmp->ntm_mftcn, ntmp->ntm_mftmirrcn); ntmp->ntm_mountp = mp; ntmp->ntm_dev = dev; ntmp->ntm_devvp = devvp; ntmp->ntm_uid = argsp->uid; ntmp->ntm_gid = argsp->gid; ntmp->ntm_mode = argsp->mode; ntmp->ntm_flag = argsp->flag; mp->mnt_data = ntmp; TAILQ_INIT(&ntmp->ntm_ntnodeq); /* set file name encode/decode hooks XXX utf-8 only for now */ ntmp->ntm_wget = ntfs_utf8_wget; ntmp->ntm_wput = ntfs_utf8_wput; ntmp->ntm_wcmp = ntfs_utf8_wcmp; DPRINTF("ntfs_mountfs(): case-%s,%s uid: %d, gid: %d, mode: %o\n", (ntmp->ntm_flag & NTFS_MFLAG_CASEINS) ? "insens." : "sens.", (ntmp->ntm_flag & NTFS_MFLAG_ALLNAMES) ? " allnames," : "", ntmp->ntm_uid, ntmp->ntm_gid, ntmp->ntm_mode); /* * We read in some system nodes to do not allow * reclaim them and to have everytime access to them. */ { int pi[3] = { NTFS_MFTINO, NTFS_ROOTINO, NTFS_BITMAPINO }; for (i=0; i<3; i++) { error = VFS_VGET(mp, pi[i], &(ntmp->ntm_sysvn[pi[i]])); if(error) goto out1; ntmp->ntm_sysvn[pi[i]]->v_flag |= VSYSTEM; vref(ntmp->ntm_sysvn[pi[i]]); vput(ntmp->ntm_sysvn[pi[i]]); } } /* read the Unicode lowercase --> uppercase translation table, * if necessary */ if ((error = ntfs_toupper_use(mp, ntmp, p))) goto out1; /* * Scan $BitMap and count free clusters */ error = ntfs_calccfree(ntmp, &ntmp->ntm_cfree); if(error) goto out1; /* * Read and translate to internal format attribute * definition file. 
*/ { int num,j; struct attrdef ad; /* Open $AttrDef */ error = VFS_VGET(mp, NTFS_ATTRDEFINO, &vp ); if(error) goto out1; /* Count valid entries */ for(num = 0; ; num++) { error = ntfs_readattr(ntmp, VTONT(vp), NTFS_A_DATA, NULL, num * sizeof(ad), sizeof(ad), &ad, NULL); if (error) goto out1; if (ad.ad_name[0] == 0) break; } /* Alloc memory for attribute definitions */ ntmp->ntm_ad = mallocarray(num, sizeof(struct ntvattrdef), M_NTFSMNT, M_WAITOK); ntmp->ntm_adnum = num; /* Read them and translate */ for(i = 0; i < num; i++){ error = ntfs_readattr(ntmp, VTONT(vp), NTFS_A_DATA, NULL, i * sizeof(ad), sizeof(ad), &ad, NULL); if (error) goto out1; j = 0; do { ntmp->ntm_ad[i].ad_name[j] = ad.ad_name[j]; } while(ad.ad_name[j++]); ntmp->ntm_ad[i].ad_namelen = j - 1; ntmp->ntm_ad[i].ad_type = ad.ad_type; } vput(vp); } mp->mnt_stat.f_fsid.val[0] = dev; mp->mnt_stat.f_fsid.val[1] = mp->mnt_vfc->vfc_typenum; mp->mnt_stat.f_namemax = NTFS_MAXFILENAME; mp->mnt_flag |= MNT_LOCAL; devvp->v_specmountpoint = mp; return (0); out1: for (i = 0; i < NTFS_SYSNODESNUM; i++) if (ntmp->ntm_sysvn[i]) vrele(ntmp->ntm_sysvn[i]); if (vflush(mp,NULLVP,0)) DPRINTF("ntfs_mountfs: vflush failed\n"); out: if (devvp->v_specinfo) devvp->v_specmountpoint = NULL; if (bp) brelse(bp); if (ntmp != NULL) { if (ntmp->ntm_ad != NULL) free(ntmp->ntm_ad, M_NTFSMNT, 0); free(ntmp, M_NTFSMNT, 0); mp->mnt_data = NULL; } /* lock the device vnode before calling VOP_CLOSE() */ vn_lock(devvp, LK_EXCLUSIVE | LK_RETRY); (void)VOP_CLOSE(devvp, FREAD, NOCRED, p); VOP_UNLOCK(devvp); return (error); } int ntfs_start(struct mount *mp, int flags, struct proc *p) { return (0); } int ntfs_unmount(struct mount *mp, int mntflags, struct proc *p) { struct ntfsmount *ntmp; int error, flags, i; DPRINTF("ntfs_unmount: unmounting...\n"); ntmp = VFSTONTFS(mp); flags = 0; if(mntflags & MNT_FORCE) flags |= FORCECLOSE; DPRINTF("ntfs_unmount: vflushing...\n"); error = vflush(mp,NULLVP,flags | SKIPSYSTEM); if (error) { DPRINTF("ntfs_unmount: vflush failed: %d\n", error); return (error); } /* Check if system vnodes are still referenced */ for(i=0;i<NTFS_SYSNODESNUM;i++) { if(((mntflags & MNT_FORCE) == 0) && (ntmp->ntm_sysvn[i] && ntmp->ntm_sysvn[i]->v_usecount > 1)) return (EBUSY); } /* Dereference all system vnodes */ for(i=0;i<NTFS_SYSNODESNUM;i++) if(ntmp->ntm_sysvn[i]) vrele(ntmp->ntm_sysvn[i]); /* vflush system vnodes */ error = vflush(mp,NULLVP,flags); if (error) { /* XXX should this be panic() ? */ printf("ntfs_unmount: vflush failed(sysnodes): %d\n",error); } /* Check if the type of device node isn't VBAD before * touching v_specinfo. If the device vnode is revoked, the * field is NULL and touching it causes null pointer dereference. 
*/ if (ntmp->ntm_devvp->v_type != VBAD) ntmp->ntm_devvp->v_specmountpoint = NULL; /* lock the device vnode before calling VOP_CLOSE() */ vn_lock(ntmp->ntm_devvp, LK_EXCLUSIVE | LK_RETRY); vinvalbuf(ntmp->ntm_devvp, V_SAVE, NOCRED, p, 0, INFSLP); (void)VOP_CLOSE(ntmp->ntm_devvp, FREAD, NOCRED, p); vput(ntmp->ntm_devvp); /* free the toupper table, if this has been last mounted ntfs volume */ ntfs_toupper_unuse(p); DPRINTF("ntfs_unmount: freeing memory...\n"); free(ntmp->ntm_ad, M_NTFSMNT, 0); free(ntmp, M_NTFSMNT, 0); mp->mnt_data = NULL; mp->mnt_flag &= ~MNT_LOCAL; return (0); } int ntfs_root(struct mount *mp, struct vnode **vpp) { struct vnode *nvp; int error = 0; DPRINTF("ntfs_root(): sysvn: %p\n", VFSTONTFS(mp)->ntm_sysvn[NTFS_ROOTINO]); error = VFS_VGET(mp, (ino_t)NTFS_ROOTINO, &nvp); if(error) { printf("ntfs_root: VFS_VGET failed: %d\n",error); return (error); } *vpp = nvp; return (0); } /* * Do operations associated with quotas, not supported */ int ntfs_quotactl(struct mount *mp, int cmds, uid_t uid, caddr_t arg, struct proc *p) { return EOPNOTSUPP; } int ntfs_calccfree(struct ntfsmount *ntmp, cn_t *cfreep) { struct vnode *vp; u_int8_t *tmp; int j, error; cn_t cfree = 0; uint64_t bmsize, offset; size_t chunksize, i; vp = ntmp->ntm_sysvn[NTFS_BITMAPINO]; bmsize = VTOF(vp)->f_size; if (bmsize > 1024 * 1024) chunksize = 1024 * 1024; else chunksize = bmsize; tmp = malloc(chunksize, M_TEMP, M_WAITOK); for (offset = 0; offset < bmsize; offset += chunksize) { if (chunksize > bmsize - offset) chunksize = bmsize - offset; error = ntfs_readattr(ntmp, VTONT(vp), NTFS_A_DATA, NULL, offset, chunksize, tmp, NULL); if (error) goto out; for (i = 0; i < chunksize; i++) for (j = 0; j < 8; j++) if (~tmp[i] & (1 << j)) cfree++; } *cfreep = cfree; out: free(tmp, M_TEMP, 0); return(error); } int ntfs_statfs(struct mount *mp, struct statfs *sbp, struct proc *p) { struct ntfsmount *ntmp = VFSTONTFS(mp); u_int64_t mftallocated; DPRINTF("ntfs_statfs():\n"); mftallocated = VTOF(ntmp->ntm_sysvn[NTFS_MFTINO])->f_allocated; sbp->f_bsize = ntmp->ntm_bps; sbp->f_iosize = ntmp->ntm_bps * ntmp->ntm_spc; sbp->f_blocks = ntmp->ntm_bootfile.bf_spv; sbp->f_bfree = sbp->f_bavail = ntfs_cntobn(ntmp->ntm_cfree); sbp->f_ffree = sbp->f_favail = sbp->f_bfree / ntmp->ntm_bpmftrec; sbp->f_files = mftallocated / ntfs_bntob(ntmp->ntm_bpmftrec) + sbp->f_ffree; copy_statfs_info(sbp, mp); return (0); } int ntfs_sync(struct mount *mp, int waitfor, int stall, struct ucred *cred, struct proc *p) { /*DPRINTF("ntfs_sync():\n");*/ return (0); } int ntfs_fhtovp(struct mount *mp, struct fid *fhp, struct vnode **vpp) { struct ntfid *ntfhp = (struct ntfid *)fhp; int error; DDPRINTF("ntfs_fhtovp(): %s: %u\n", mp->mnt_stat.f_mntonname, ntfhp->ntfid_ino); error = ntfs_vgetex(mp, ntfhp->ntfid_ino, ntfhp->ntfid_attr, NULL, LK_EXCLUSIVE | LK_RETRY, 0, vpp); /* XXX */ if (error != 0) { *vpp = NULLVP; return (error); } /* XXX as unlink/rmdir/mkdir/creat are not currently possible * with NTFS, we don't need to check anything else for now */ return (0); } int ntfs_vptofh(struct vnode *vp, struct fid *fhp) { struct ntnode *ntp; struct ntfid *ntfhp; struct fnode *fn; DDPRINTF("ntfs_fhtovp(): %s: %p\n", vp->v_mount->mnt_stat.f_mntonname, vp); fn = VTOF(vp); ntp = VTONT(vp); ntfhp = (struct ntfid *)fhp; ntfhp->ntfid_len = sizeof(struct ntfid); ntfhp->ntfid_ino = ntp->i_number; ntfhp->ntfid_attr = fn->f_attrtype; #ifdef notyet ntfhp->ntfid_gen = ntp->i_gen; #endif return (0); } int ntfs_vgetex(struct mount *mp, ntfsino_t ino, u_int32_t attrtype, char 
*attrname, u_long lkflags, u_long flags, struct vnode **vpp) { int error; struct ntfsmount *ntmp; struct ntnode *ip; struct fnode *fp; struct vnode *vp; enum vtype f_type; DPRINTF("ntfs_vgetex: ino: %u, attr: 0x%x:%s, lkf: 0x%lx, f: 0x%lx\n", ino, attrtype, attrname ? attrname : "", lkflags, flags); ntmp = VFSTONTFS(mp); *vpp = NULL; /* Get ntnode */ error = ntfs_ntlookup(ntmp, ino, &ip); if (error) { printf("ntfs_vget: ntfs_ntget failed\n"); return (error); } /* It may be not initialized fully, so force load it */ if (!(flags & VG_DONTLOADIN) && !(ip->i_flag & IN_LOADED)) { error = ntfs_loadntnode(ntmp, ip); if(error) { printf("ntfs_vget: CAN'T LOAD ATTRIBUTES FOR INO: %d\n", ip->i_number); ntfs_ntput(ip); return (error); } } error = ntfs_fget(ntmp, ip, attrtype, attrname, &fp); if (error) { printf("ntfs_vget: ntfs_fget failed\n"); ntfs_ntput(ip); return (error); } if (!(flags & VG_DONTVALIDFN) && !(fp->f_flag & FN_VALID)) { if ((ip->i_frflag & NTFS_FRFLAG_DIR) && (fp->f_attrtype == NTFS_A_DATA && fp->f_attrname == NULL)) { f_type = VDIR; } else if (flags & VG_EXT) { f_type = VNON; fp->f_size = fp->f_allocated = 0; } else { f_type = VREG; error = ntfs_filesize(ntmp, fp, &fp->f_size, &fp->f_allocated); if (error) { ntfs_ntput(ip); return (error); } } fp->f_flag |= FN_VALID; } /* * We may be calling vget() now. To avoid potential deadlock, we need * to release ntnode lock, since due to locking order vnode * lock has to be acquired first. * ntfs_fget() bumped ntnode usecount, so ntnode won't be recycled * prematurely. */ ntfs_ntput(ip); if (FTOV(fp)) { /* vget() returns error if the vnode has been recycled */ if (vget(FTOV(fp), lkflags) == 0) { *vpp = FTOV(fp); return (0); } } error = getnewvnode(VT_NTFS, ntmp->ntm_mountp, &ntfs_vops, &vp); if(error) { ntfs_frele(fp); ntfs_ntput(ip); return (error); } DPRINTF("ntfs_vget: vnode: %p for ntnode: %u\n", vp, ino); fp->f_vp = vp; vp->v_data = fp; vp->v_type = f_type; if (ino == NTFS_ROOTINO) vp->v_flag |= VROOT; if (lkflags & LK_TYPE_MASK) { error = vn_lock(vp, lkflags); if (error) { vput(vp); return (error); } } *vpp = vp; return (0); } int ntfs_vget(struct mount *mp, ino_t ino, struct vnode **vpp) { if (ino > (ntfsino_t)-1) panic("ntfs_vget: alien ino_t %llu", (unsigned long long)ino); return ntfs_vgetex(mp, ino, NTFS_A_DATA, NULL, LK_EXCLUSIVE | LK_RETRY, 0, vpp); /* XXX */ } const struct vfsops ntfs_vfsops = { .vfs_mount = ntfs_mount, .vfs_start = ntfs_start, .vfs_unmount = ntfs_unmount, .vfs_root = ntfs_root, .vfs_quotactl = ntfs_quotactl, .vfs_statfs = ntfs_statfs, .vfs_sync = ntfs_sync, .vfs_vget = ntfs_vget, .vfs_fhtovp = ntfs_fhtovp, .vfs_vptofh = ntfs_vptofh, .vfs_init = ntfs_init, .vfs_sysctl = ntfs_sysctl, .vfs_checkexp = ntfs_checkexp, };
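/*
 * Illustrative sketch (not part of ntfs_vfsops.c): roughly what a userland
 * mount program hands to mount(2), which the kernel routes through
 * ntfs_vfsops.vfs_mount == ntfs_mount() above.  The struct ntfs_args
 * fields mirror the ones ntfs_mount()/ntfs_mountfs() read (fspec, uid,
 * gid, mode, flag); the device path, mount point and ids are made-up
 * example values, and "ntfs" is the filesystem type name passed as the
 * first argument to mount(2).
 */
#include <sys/types.h>
#include <sys/mount.h>

#include <err.h>
#include <string.h>

int
main(void)
{
	struct ntfs_args args;
	char fspec[] = "/dev/sd1i";	/* hypothetical NTFS partition */

	memset(&args, 0, sizeof(args));
	args.fspec = fspec;
	args.uid = 1000;
	args.gid = 1000;
	args.mode = 0555;
	args.flag = 0;			/* e.g. NTFS_MFLAG_CASEINS */

	/* NTFS support is read-only, so request MNT_RDONLY */
	if (mount("ntfs", "/mnt", MNT_RDONLY, &args) == -1)
		err(1, "mount");

	return 0;
}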
/* $OpenBSD: mpls_output.c,v 1.28 2019/09/03 10:39:08 jsg Exp $ */ /* * Copyright (c) 2008 Claudio Jeker <claudio@openbsd.org> * Copyright (c) 2008 Michele Marchetto <michele@openbsd.org> * * Permission to use, copy, modify, and distribute this software for any * purpose with or without fee is hereby granted, provided that the above * copyright notice and this permission notice appear in all copies. * * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. */ #include <sys/param.h> #include <sys/mbuf.h> #include <sys/systm.h> #include <sys/socket.h> #include <net/if.h> #include <net/if_var.h> #include <net/route.h> #include <netmpls/mpls.h> #include <netinet/in.h> #include <netinet/ip.h> #ifdef INET6 #include <netinet/ip6.h> #endif #ifdef MPLS_DEBUG #define MPLS_LABEL_GET(l) ((ntohl((l) & MPLS_LABEL_MASK)) >> MPLS_LABEL_OFFSET) #endif void mpls_do_cksum(struct mbuf *); u_int8_t mpls_getttl(struct mbuf *, sa_family_t); int mpls_output(struct ifnet *ifp, struct mbuf *m, struct sockaddr *dst, struct rtentry *rt) { struct sockaddr_mpls *smpls; struct sockaddr_mpls sa_mpls; struct shim_hdr *shim; struct rt_mpls *rt_mpls; int error; u_int8_t ttl; if (rt == NULL || (dst->sa_family != AF_INET && dst->sa_family != AF_INET6 && dst->sa_family != AF_MPLS)) { if (!ISSET(ifp->if_xflags, IFXF_MPLS)) return (ifp->if_output(ifp, m, dst, rt)); else return (ifp->if_ll_output(ifp, m, dst, rt)); } /* need to calculate checksums now if necessary */ mpls_do_cksum(m); /* initialize sockaddr_mpls */ bzero(&sa_mpls, sizeof(sa_mpls)); smpls = &sa_mpls; smpls->smpls_family = AF_MPLS; smpls->smpls_len = sizeof(*smpls); ttl = mpls_getttl(m, dst->sa_family); rt_mpls = (struct rt_mpls *)rt->rt_llinfo; if (rt_mpls == NULL || (rt->rt_flags & RTF_MPLS) == 0) { /* no MPLS information for this entry */ if (!ISSET(ifp->if_xflags, IFXF_MPLS)) { #ifdef MPLS_DEBUG printf("MPLS_DEBUG: interface not mpls enabled\n"); #endif error = ENETUNREACH; goto bad; } return (ifp->if_ll_output(ifp, m, dst, rt)); } /* to be honest here only the push operation makes sense */ switch (rt_mpls->mpls_operation) { case MPLS_OP_PUSH: m = mpls_shim_push(m, rt_mpls); break; case MPLS_OP_POP: m = mpls_shim_pop(m); break; case MPLS_OP_SWAP: m = mpls_shim_swap(m, rt_mpls); break; default: error = EINVAL; goto bad; } if (m == NULL) { error = ENOBUFS; goto bad; } /* refetch label */ shim = mtod(m, struct shim_hdr *); /*
mark first label with BOS flag */ if (dst->sa_family != AF_MPLS) shim->shim_label |= MPLS_BOS_MASK; /* write back TTL */ shim->shim_label &= ~MPLS_TTL_MASK; shim->shim_label |= htonl(ttl); #ifdef MPLS_DEBUG printf("MPLS: sending on %s outshim %x outlabel %d\n", ifp->if_xname, ntohl(shim->shim_label), MPLS_LABEL_GET(rt_mpls->mpls_label)); #endif /* Output iface is not MPLS-enabled */ if (!ISSET(ifp->if_xflags, IFXF_MPLS)) { #ifdef MPLS_DEBUG printf("MPLS_DEBUG: interface not mpls enabled\n"); #endif error = ENETUNREACH; goto bad; } /* reset broadcast and multicast flags, this is a P2P tunnel */ m->m_flags &= ~(M_BCAST | M_MCAST); smpls->smpls_label = shim->shim_label & MPLS_LABEL_MASK; error = ifp->if_ll_output(ifp, m, smplstosa(smpls), rt); return (error); bad: m_freem(m); return (error); } void mpls_do_cksum(struct mbuf *m) { struct ip *ip; u_int16_t hlen; in_proto_cksum_out(m, NULL); if (m->m_pkthdr.csum_flags & M_IPV4_CSUM_OUT) { ip = mtod(m, struct ip *); hlen = ip->ip_hl << 2; ip->ip_sum = in_cksum(m, hlen); m->m_pkthdr.csum_flags &= ~M_IPV4_CSUM_OUT; } } u_int8_t mpls_getttl(struct mbuf *m, sa_family_t af) { struct mbuf *n; int loc, off; u_int8_t ttl = mpls_defttl; /* If the AF is MPLS then inherit the TTL from the present label. */ if (af == AF_MPLS) loc = 3; else { switch (*mtod(m, uint8_t *) >> 4) { case 4: if (!mpls_mapttl_ip) return (ttl); loc = offsetof(struct ip, ip_ttl); break; #ifdef INET6 case 6: if (!mpls_mapttl_ip6) return (ttl); loc = offsetof(struct ip6_hdr, ip6_hlim); break; #endif default: return (ttl); } } n = m_getptr(m, loc, &off); if (n == NULL) return (ttl); ttl = *(mtod(n, uint8_t *) + off); return (ttl); }
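/*
 * Illustrative sketch (not part of mpls_output.c): the 32-bit MPLS shim
 * word that mpls_output() and mpls_getttl() manipulate is laid out per
 * RFC 3032 -- a 20-bit label, 3-bit EXP, 1-bit bottom-of-stack and an
 * 8-bit TTL.  The kernel builds it piecewise with MPLS_LABEL_MASK,
 * MPLS_BOS_MASK and MPLS_TTL_MASK (kept in network byte order); the
 * local constants below merely restate that layout for clarity and are
 * not the kernel's macros.
 */
#include <arpa/inet.h>

#include <stdint.h>
#include <stdio.h>

#define SHIM_LABEL_SHIFT	12		/* label occupies bits 31..12 */
#define SHIM_EXP_SHIFT		9		/* EXP/TC occupies bits 11..9 */
#define SHIM_BOS_BIT		(1U << 8)	/* bottom-of-stack flag */

static uint32_t
shim_pack(uint32_t label, uint32_t exp, int bos, uint8_t ttl)
{
	uint32_t w;

	w = ((label & 0xfffffU) << SHIM_LABEL_SHIFT) |
	    ((exp & 0x7U) << SHIM_EXP_SHIFT) | ttl;
	if (bos)
		w |= SHIM_BOS_BIT;	/* like shim_label |= MPLS_BOS_MASK */
	return htonl(w);		/* stored on the wire, as in shim_hdr */
}

int
main(void)
{
	/* label 42, EXP 0, bottom of stack, TTL 64 */
	printf("shim word: 0x%08x\n", ntohl(shim_pack(42, 0, 1, 64)));
	return 0;
}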
/* $OpenBSD: kern_event.c,v 1.193 2022/08/14 01:58:27 jsg Exp $ */ /*- * Copyright (c) 1999,2000,2001 Jonathan Lemon <jlemon@FreeBSD.org> * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD: src/sys/kern/kern_event.c,v 1.22 2001/02/23 20:32:42 jlemon Exp $ */ #include <sys/param.h> #include <sys/systm.h> #include <sys/proc.h> #include <sys/pledge.h> #include <sys/malloc.h> #include <sys/file.h> #include <sys/filedesc.h> #include <sys/fcntl.h> #include <sys/queue.h> #include <sys/event.h> #include <sys/eventvar.h> #include <sys/ktrace.h> #include <sys/pool.h> #include <sys/stat.h> #include <sys/mount.h> #include <sys/syscallargs.h> #include <sys/time.h> #include <sys/timeout.h> #include <sys/vnode.h> #include <sys/wait.h> #ifdef DIAGNOSTIC #define KLIST_ASSERT_LOCKED(kl) do { \ if ((kl)->kl_ops != NULL) \ (kl)->kl_ops->klo_assertlk((kl)->kl_arg); \ else \ KERNEL_ASSERT_LOCKED(); \ } while (0) #else #define KLIST_ASSERT_LOCKED(kl) ((void)(kl)) #endif struct kqueue *kqueue_alloc(struct filedesc *); void kqueue_terminate(struct proc *p, struct kqueue *); void KQREF(struct kqueue *); void KQRELE(struct kqueue *); void kqueue_purge(struct proc *, struct kqueue *); int kqueue_sleep(struct kqueue *, struct timespec *); int kqueue_read(struct file *, struct uio *, int); int kqueue_write(struct file *, struct uio *, int); int kqueue_ioctl(struct file *fp, u_long com, caddr_t data, struct proc *p); int kqueue_kqfilter(struct file *fp, struct knote *kn); int kqueue_stat(struct file *fp, struct stat *st, struct proc *p); int kqueue_close(struct file *fp, struct proc *p); void kqueue_wakeup(struct kqueue *kq); #ifdef KQUEUE_DEBUG void kqueue_do_check(struct kqueue *kq, const char *func, int line); #define kqueue_check(kq) kqueue_do_check((kq), __func__, __LINE__) #else #define kqueue_check(kq) do {} while (0) #endif static int filter_attach(struct knote *kn); static void filter_detach(struct knote *kn); static int filter_event(struct knote *kn, long hint); static int filter_modify(struct kevent *kev, struct knote *kn); static int filter_process(struct knote *kn, struct kevent *kev); static void kqueue_expand_hash(struct kqueue *kq); static void kqueue_expand_list(struct kqueue *kq, int fd); static void kqueue_task(void *); static int klist_lock(struct
klist *); static void klist_unlock(struct klist *, int); const struct fileops kqueueops = { .fo_read = kqueue_read, .fo_write = kqueue_write, .fo_ioctl = kqueue_ioctl, .fo_kqfilter = kqueue_kqfilter, .fo_stat = kqueue_stat, .fo_close = kqueue_close }; void knote_attach(struct knote *kn); void knote_detach(struct knote *kn); void knote_drop(struct knote *kn, struct proc *p); void knote_enqueue(struct knote *kn); void knote_dequeue(struct knote *kn); int knote_acquire(struct knote *kn, struct klist *, int); void knote_release(struct knote *kn); void knote_activate(struct knote *kn); void knote_remove(struct proc *p, struct kqueue *kq, struct knlist **plist, int idx, int purge); void filt_kqdetach(struct knote *kn); int filt_kqueue(struct knote *kn, long hint); int filt_kqueuemodify(struct kevent *kev, struct knote *kn); int filt_kqueueprocess(struct knote *kn, struct kevent *kev); int filt_kqueue_common(struct knote *kn, struct kqueue *kq); int filt_procattach(struct knote *kn); void filt_procdetach(struct knote *kn); int filt_proc(struct knote *kn, long hint); int filt_fileattach(struct knote *kn); void filt_timerexpire(void *knx); int filt_timerattach(struct knote *kn); void filt_timerdetach(struct knote *kn); int filt_timermodify(struct kevent *kev, struct knote *kn); int filt_timerprocess(struct knote *kn, struct kevent *kev); void filt_seltruedetach(struct knote *kn); const struct filterops kqread_filtops = { .f_flags = FILTEROP_ISFD | FILTEROP_MPSAFE, .f_attach = NULL, .f_detach = filt_kqdetach, .f_event = filt_kqueue, .f_modify = filt_kqueuemodify, .f_process = filt_kqueueprocess, }; const struct filterops proc_filtops = { .f_flags = 0, .f_attach = filt_procattach, .f_detach = filt_procdetach, .f_event = filt_proc, }; const struct filterops file_filtops = { .f_flags = FILTEROP_ISFD | FILTEROP_MPSAFE, .f_attach = filt_fileattach, .f_detach = NULL, .f_event = NULL, }; const struct filterops timer_filtops = { .f_flags = 0, .f_attach = filt_timerattach, .f_detach = filt_timerdetach, .f_event = NULL, .f_modify = filt_timermodify, .f_process = filt_timerprocess, }; struct pool knote_pool; struct pool kqueue_pool; struct mutex kqueue_klist_lock = MUTEX_INITIALIZER(IPL_MPFLOOR); int kq_ntimeouts = 0; int kq_timeoutmax = (4 * 1024); #define KN_HASH(val, mask) (((val) ^ (val >> 8)) & (mask)) /* * Table for for all system-defined filters. 
*/ const struct filterops *const sysfilt_ops[] = { &file_filtops, /* EVFILT_READ */ &file_filtops, /* EVFILT_WRITE */ NULL, /*&aio_filtops,*/ /* EVFILT_AIO */ &file_filtops, /* EVFILT_VNODE */ &proc_filtops, /* EVFILT_PROC */ &sig_filtops, /* EVFILT_SIGNAL */ &timer_filtops, /* EVFILT_TIMER */ &file_filtops, /* EVFILT_DEVICE */ &file_filtops, /* EVFILT_EXCEPT */ }; void KQREF(struct kqueue *kq) { refcnt_take(&kq->kq_refcnt); } void KQRELE(struct kqueue *kq) { struct filedesc *fdp; if (refcnt_rele(&kq->kq_refcnt) == 0) return; fdp = kq->kq_fdp; if (rw_status(&fdp->fd_lock) == RW_WRITE) { LIST_REMOVE(kq, kq_next); } else { fdplock(fdp); LIST_REMOVE(kq, kq_next); fdpunlock(fdp); } KASSERT(TAILQ_EMPTY(&kq->kq_head)); KASSERT(kq->kq_nknotes == 0); free(kq->kq_knlist, M_KEVENT, kq->kq_knlistsize * sizeof(struct knlist)); hashfree(kq->kq_knhash, KN_HASHSIZE, M_KEVENT); klist_free(&kq->kq_klist); pool_put(&kqueue_pool, kq); } void kqueue_init(void) { pool_init(&kqueue_pool, sizeof(struct kqueue), 0, IPL_MPFLOOR, PR_WAITOK, "kqueuepl", NULL); pool_init(&knote_pool, sizeof(struct knote), 0, IPL_MPFLOOR, PR_WAITOK, "knotepl", NULL); } void kqueue_init_percpu(void) { pool_cache_init(&knote_pool); } int filt_fileattach(struct knote *kn) { struct file *fp = kn->kn_fp; return fp->f_ops->fo_kqfilter(fp, kn); } int kqueue_kqfilter(struct file *fp, struct knote *kn) { struct kqueue *kq = kn->kn_fp->f_data; if (kn->kn_filter != EVFILT_READ) return (EINVAL); kn->kn_fop = &kqread_filtops; klist_insert(&kq->kq_klist, kn); return (0); } void filt_kqdetach(struct knote *kn) { struct kqueue *kq = kn->kn_fp->f_data; klist_remove(&kq->kq_klist, kn); } int filt_kqueue_common(struct knote *kn, struct kqueue *kq) { MUTEX_ASSERT_LOCKED(&kq->kq_lock); kn->kn_data = kq->kq_count; return (kn->kn_data > 0); } int filt_kqueue(struct knote *kn, long hint) { struct kqueue *kq = kn->kn_fp->f_data; int active; mtx_enter(&kq->kq_lock); active = filt_kqueue_common(kn, kq); mtx_leave(&kq->kq_lock); return (active); } int filt_kqueuemodify(struct kevent *kev, struct knote *kn) { struct kqueue *kq = kn->kn_fp->f_data; int active; mtx_enter(&kq->kq_lock); knote_assign(kev, kn); active = filt_kqueue_common(kn, kq); mtx_leave(&kq->kq_lock); return (active); } int filt_kqueueprocess(struct knote *kn, struct kevent *kev) { struct kqueue *kq = kn->kn_fp->f_data; int active; mtx_enter(&kq->kq_lock); if (kev != NULL && (kn->kn_flags & EV_ONESHOT)) active = 1; else active = filt_kqueue_common(kn, kq); if (active) knote_submit(kn, kev); mtx_leave(&kq->kq_lock); return (active); } int filt_procattach(struct knote *kn) { struct process *pr; int s; if ((curproc->p_p->ps_flags & PS_PLEDGE) && (curproc->p_p->ps_pledge & PLEDGE_PROC) == 0) return pledge_fail(curproc, EPERM, PLEDGE_PROC); if (kn->kn_id > PID_MAX) return ESRCH; pr = prfind(kn->kn_id); if (pr == NULL) return (ESRCH); /* exiting processes can't be specified */ if (pr->ps_flags & PS_EXITING) return (ESRCH); kn->kn_ptr.p_process = pr; kn->kn_flags |= EV_CLEAR; /* automatically set */ /* * internal flag indicating registration done by kernel */ if (kn->kn_flags & EV_FLAG1) { kn->kn_data = kn->kn_sdata; /* ppid */ kn->kn_fflags = NOTE_CHILD; kn->kn_flags &= ~EV_FLAG1; } s = splhigh(); klist_insert_locked(&pr->ps_klist, kn); splx(s); return (0); } /* * The knote may be attached to a different process, which may exit, * leaving nothing for the knote to be attached to. 
So when the process * exits, the knote is marked as DETACHED and also flagged as ONESHOT so * it will be deleted when read out. However, as part of the knote deletion, * this routine is called, so a check is needed to avoid actually performing * a detach, because the original process does not exist any more. */ void filt_procdetach(struct knote *kn) { struct kqueue *kq = kn->kn_kq; struct process *pr = kn->kn_ptr.p_process; int s, status; mtx_enter(&kq->kq_lock); status = kn->kn_status; mtx_leave(&kq->kq_lock); if (status & KN_DETACHED) return; s = splhigh(); klist_remove_locked(&pr->ps_klist, kn); splx(s); } int filt_proc(struct knote *kn, long hint) { struct kqueue *kq = kn->kn_kq; u_int event; /* * mask off extra data */ event = (u_int)hint & NOTE_PCTRLMASK; /* * if the user is interested in this event, record it. */ if (kn->kn_sfflags & event) kn->kn_fflags |= event; /* * process is gone, so flag the event as finished and remove it * from the process's klist */ if (event == NOTE_EXIT) { struct process *pr = kn->kn_ptr.p_process; int s; mtx_enter(&kq->kq_lock); kn->kn_status |= KN_DETACHED; mtx_leave(&kq->kq_lock); s = splhigh(); kn->kn_flags |= (EV_EOF | EV_ONESHOT); kn->kn_data = W_EXITCODE(pr->ps_xexit, pr->ps_xsig); klist_remove_locked(&pr->ps_klist, kn); splx(s); return (1); } /* * process forked, and user wants to track the new process, * so attach a new knote to it, and immediately report an * event with the parent's pid. */ if ((event == NOTE_FORK) && (kn->kn_sfflags & NOTE_TRACK)) { struct kevent kev; int error; /* * register knote with new process. */ memset(&kev, 0, sizeof(kev)); kev.ident = hint & NOTE_PDATAMASK; /* pid */ kev.filter = kn->kn_filter; kev.flags = kn->kn_flags | EV_ADD | EV_ENABLE | EV_FLAG1; kev.fflags = kn->kn_sfflags; kev.data = kn->kn_id; /* parent */ kev.udata = kn->kn_udata; /* preserve udata */ error = kqueue_register(kq, &kev, 0, NULL); if (error) kn->kn_fflags |= NOTE_TRACKERR; } return (kn->kn_fflags != 0); } static void filt_timer_timeout_add(struct knote *kn) { struct timeval tv; struct timeout *to = kn->kn_hook; int tticks; tv.tv_sec = kn->kn_sdata / 1000; tv.tv_usec = (kn->kn_sdata % 1000) * 1000; tticks = tvtohz(&tv); /* Remove extra tick from tvtohz() if timeout has fired before. */ if (timeout_triggered(to)) tticks--; timeout_add(to, (tticks > 0) ? tticks : 1); } void filt_timerexpire(void *knx) { struct knote *kn = knx; struct kqueue *kq = kn->kn_kq; kn->kn_data++; mtx_enter(&kq->kq_lock); knote_activate(kn); mtx_leave(&kq->kq_lock); if ((kn->kn_flags & EV_ONESHOT) == 0) filt_timer_timeout_add(kn); } /* * data contains amount of time to sleep, in milliseconds */ int filt_timerattach(struct knote *kn) { struct timeout *to; if (kq_ntimeouts > kq_timeoutmax) return (ENOMEM); kq_ntimeouts++; kn->kn_flags |= EV_CLEAR; /* automatically set */ to = malloc(sizeof(*to), M_KEVENT, M_WAITOK); timeout_set(to, filt_timerexpire, kn); kn->kn_hook = to; filt_timer_timeout_add(kn); return (0); } void filt_timerdetach(struct knote *kn) { struct timeout *to; to = (struct timeout *)kn->kn_hook; timeout_del_barrier(to); free(to, M_KEVENT, sizeof(*to)); kq_ntimeouts--; } int filt_timermodify(struct kevent *kev, struct knote *kn) { struct kqueue *kq = kn->kn_kq; struct timeout *to = kn->kn_hook; /* Reset the timer. Any pending events are discarded. 
*/ timeout_del_barrier(to); mtx_enter(&kq->kq_lock); if (kn->kn_status & KN_QUEUED) knote_dequeue(kn); kn->kn_status &= ~KN_ACTIVE; mtx_leave(&kq->kq_lock); kn->kn_data = 0; knote_assign(kev, kn); /* Reinit timeout to invoke tick adjustment again. */ timeout_set(to, filt_timerexpire, kn); filt_timer_timeout_add(kn); return (0); } int filt_timerprocess(struct knote *kn, struct kevent *kev) { int active, s; s = splsoftclock(); active = (kn->kn_data != 0); if (active) knote_submit(kn, kev); splx(s); return (active); } /* * filt_seltrue: * * This filter "event" routine simulates seltrue(). */ int filt_seltrue(struct knote *kn, long hint) { /* * We don't know how much data can be read/written, * but we know that it *can* be. This is about as * good as select/poll does as well. */ kn->kn_data = 0; return (1); } int filt_seltruemodify(struct kevent *kev, struct knote *kn) { knote_assign(kev, kn); return (kn->kn_fop->f_event(kn, 0)); } int filt_seltrueprocess(struct knote *kn, struct kevent *kev) { int active; active = kn->kn_fop->f_event(kn, 0); if (active) knote_submit(kn, kev); return (active); } /* * This provides full kqfilter entry for device switch tables, which * has same effect as filter using filt_seltrue() as filter method. */ void filt_seltruedetach(struct knote *kn) { /* Nothing to do */ } const struct filterops seltrue_filtops = { .f_flags = FILTEROP_ISFD | FILTEROP_MPSAFE, .f_attach = NULL, .f_detach = filt_seltruedetach, .f_event = filt_seltrue, .f_modify = filt_seltruemodify, .f_process = filt_seltrueprocess, }; int seltrue_kqfilter(dev_t dev, struct knote *kn) { switch (kn->kn_filter) { case EVFILT_READ: case EVFILT_WRITE: kn->kn_fop = &seltrue_filtops; break; default: return (EINVAL); } /* Nothing more to do */ return (0); } static int filt_dead(struct knote *kn, long hint) { if (kn->kn_filter == EVFILT_EXCEPT) { /* * Do not deliver event because there is no out-of-band data. * However, let HUP condition pass for poll(2). */ if ((kn->kn_flags & __EV_POLL) == 0) { kn->kn_flags |= EV_DISABLE; return (0); } } kn->kn_flags |= (EV_EOF | EV_ONESHOT); if (kn->kn_flags & __EV_POLL) kn->kn_flags |= __EV_HUP; kn->kn_data = 0; return (1); } static void filt_deaddetach(struct knote *kn) { /* Nothing to do */ } const struct filterops dead_filtops = { .f_flags = FILTEROP_ISFD | FILTEROP_MPSAFE, .f_attach = NULL, .f_detach = filt_deaddetach, .f_event = filt_dead, .f_modify = filt_seltruemodify, .f_process = filt_seltrueprocess, }; static int filt_badfd(struct knote *kn, long hint) { kn->kn_flags |= (EV_ERROR | EV_ONESHOT); kn->kn_data = EBADF; return (1); } /* For use with kqpoll. 
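 * knote_remove() switches a knote whose file descriptor was closed to
 * badfd_filtops, so poll(2) and select(2) receive a one-shot EBADF event
 * without allocating a new knote.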
*/ const struct filterops badfd_filtops = { .f_flags = FILTEROP_ISFD | FILTEROP_MPSAFE, .f_attach = NULL, .f_detach = filt_deaddetach, .f_event = filt_badfd, .f_modify = filt_seltruemodify, .f_process = filt_seltrueprocess, }; static int filter_attach(struct knote *kn) { int error; if (kn->kn_fop->f_flags & FILTEROP_MPSAFE) { error = kn->kn_fop->f_attach(kn); } else { KERNEL_LOCK(); error = kn->kn_fop->f_attach(kn); KERNEL_UNLOCK(); } return (error); } static void filter_detach(struct knote *kn) { if (kn->kn_fop->f_flags & FILTEROP_MPSAFE) { kn->kn_fop->f_detach(kn); } else { KERNEL_LOCK(); kn->kn_fop->f_detach(kn); KERNEL_UNLOCK(); } } static int filter_event(struct knote *kn, long hint) { if ((kn->kn_fop->f_flags & FILTEROP_MPSAFE) == 0) KERNEL_ASSERT_LOCKED(); return (kn->kn_fop->f_event(kn, hint)); } static int filter_modify(struct kevent *kev, struct knote *kn) { int active, s; if (kn->kn_fop->f_flags & FILTEROP_MPSAFE) { active = kn->kn_fop->f_modify(kev, kn); } else { KERNEL_LOCK(); if (kn->kn_fop->f_modify != NULL) { active = kn->kn_fop->f_modify(kev, kn); } else { s = splhigh(); active = knote_modify(kev, kn); splx(s); } KERNEL_UNLOCK(); } return (active); } static int filter_process(struct knote *kn, struct kevent *kev) { int active, s; if (kn->kn_fop->f_flags & FILTEROP_MPSAFE) { active = kn->kn_fop->f_process(kn, kev); } else { KERNEL_LOCK(); if (kn->kn_fop->f_process != NULL) { active = kn->kn_fop->f_process(kn, kev); } else { s = splhigh(); active = knote_process(kn, kev); splx(s); } KERNEL_UNLOCK(); } return (active); } /* * Initialize the current thread for poll/select system call. * num indicates the number of serials that the system call may utilize. * After this function, the valid range of serials is * p_kq_serial <= x < p_kq_serial + num. */ void kqpoll_init(unsigned int num) { struct proc *p = curproc; struct filedesc *fdp; if (p->p_kq == NULL) { p->p_kq = kqueue_alloc(p->p_fd); p->p_kq_serial = arc4random(); fdp = p->p_fd; fdplock(fdp); LIST_INSERT_HEAD(&fdp->fd_kqlist, p->p_kq, kq_next); fdpunlock(fdp); } if (p->p_kq_serial + num < p->p_kq_serial) { /* Serial is about to wrap. Clear all attached knotes. */ kqueue_purge(p, p->p_kq); p->p_kq_serial = 0; } } /* * Finish poll/select system call. * num must have the same value that was used with kqpoll_init(). */ void kqpoll_done(unsigned int num) { struct proc *p = curproc; struct kqueue *kq = p->p_kq; KASSERT(p->p_kq != NULL); KASSERT(p->p_kq_serial + num >= p->p_kq_serial); p->p_kq_serial += num; /* * Because of kn_pollid key, a thread can in principle allocate * up to O(maxfiles^2) knotes by calling poll(2) repeatedly * with suitably varying pollfd arrays. * Prevent such a large allocation by clearing knotes eagerly * if there are too many of them. * * A small multiple of kq_knlistsize should give enough margin * that eager clearing is infrequent, or does not happen at all, * with normal programs. * A single pollfd entry can use up to three knotes. * Typically there is no significant overlap of fd and events * between different entries in the pollfd array. 
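 * For example, with kq_knlistsize of 64 the purge below triggers once more
 * than 256 knotes (4 * 64) have accumulated on the kqueue.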
*/ if (kq->kq_nknotes > 4 * kq->kq_knlistsize) kqueue_purge(p, kq); } void kqpoll_exit(void) { struct proc *p = curproc; if (p->p_kq == NULL) return; kqueue_purge(p, p->p_kq); kqueue_terminate(p, p->p_kq); KASSERT(p->p_kq->kq_refcnt.r_refs == 1); KQRELE(p->p_kq); p->p_kq = NULL; } struct kqueue * kqueue_alloc(struct filedesc *fdp) { struct kqueue *kq; kq = pool_get(&kqueue_pool, PR_WAITOK | PR_ZERO); refcnt_init(&kq->kq_refcnt); kq->kq_fdp = fdp; TAILQ_INIT(&kq->kq_head); mtx_init(&kq->kq_lock, IPL_HIGH); task_set(&kq->kq_task, kqueue_task, kq); klist_init_mutex(&kq->kq_klist, &kqueue_klist_lock); return (kq); } int sys_kqueue(struct proc *p, void *v, register_t *retval) { struct filedesc *fdp = p->p_fd; struct kqueue *kq; struct file *fp; int fd, error; kq = kqueue_alloc(fdp); fdplock(fdp); error = falloc(p, &fp, &fd); if (error) goto out; fp->f_flag = FREAD | FWRITE; fp->f_type = DTYPE_KQUEUE; fp->f_ops = &kqueueops; fp->f_data = kq; *retval = fd; LIST_INSERT_HEAD(&fdp->fd_kqlist, kq, kq_next); kq = NULL; fdinsert(fdp, fd, 0, fp); FRELE(fp, p); out: fdpunlock(fdp); if (kq != NULL) pool_put(&kqueue_pool, kq); return (error); } int sys_kevent(struct proc *p, void *v, register_t *retval) { struct kqueue_scan_state scan; struct filedesc* fdp = p->p_fd; struct sys_kevent_args /* { syscallarg(int) fd; syscallarg(const struct kevent *) changelist; syscallarg(int) nchanges; syscallarg(struct kevent *) eventlist; syscallarg(int) nevents; syscallarg(const struct timespec *) timeout; } */ *uap = v; struct kevent *kevp; struct kqueue *kq; struct file *fp; struct timespec ts; struct timespec *tsp = NULL; int i, n, nerrors, error; int ready, total; struct kevent kev[KQ_NEVENTS]; if ((fp = fd_getfile(fdp, SCARG(uap, fd))) == NULL) return (EBADF); if (fp->f_type != DTYPE_KQUEUE) { error = EBADF; goto done; } if (SCARG(uap, timeout) != NULL) { error = copyin(SCARG(uap, timeout), &ts, sizeof(ts)); if (error) goto done; #ifdef KTRACE if (KTRPOINT(p, KTR_STRUCT)) ktrreltimespec(p, &ts); #endif if (ts.tv_sec < 0 || !timespecisvalid(&ts)) { error = EINVAL; goto done; } tsp = &ts; } kq = fp->f_data; nerrors = 0; while ((n = SCARG(uap, nchanges)) > 0) { if (n > nitems(kev)) n = nitems(kev); error = copyin(SCARG(uap, changelist), kev, n * sizeof(struct kevent)); if (error) goto done; #ifdef KTRACE if (KTRPOINT(p, KTR_STRUCT)) ktrevent(p, kev, n); #endif for (i = 0; i < n; i++) { kevp = &kev[i]; kevp->flags &= ~EV_SYSFLAGS; error = kqueue_register(kq, kevp, 0, p); if (error || (kevp->flags & EV_RECEIPT)) { if (SCARG(uap, nevents) != 0) { kevp->flags = EV_ERROR; kevp->data = error; copyout(kevp, SCARG(uap, eventlist), sizeof(*kevp)); SCARG(uap, eventlist)++; SCARG(uap, nevents)--; nerrors++; } else { goto done; } } } SCARG(uap, nchanges) -= n; SCARG(uap, changelist) += n; } if (nerrors) { *retval = nerrors; error = 0; goto done; } kqueue_scan_setup(&scan, kq); FRELE(fp, p); /* * Collect as many events as we can. The timeout on successive * loops is disabled (kqueue_scan() becomes non-blocking). 
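 * Once at least one event has been collected, kqueue_scan() returns instead
 * of sleeping; see the scan->kqs_nevent check in its kq_count == 0 path.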
*/ total = 0; error = 0; while ((n = SCARG(uap, nevents) - total) > 0) { if (n > nitems(kev)) n = nitems(kev); ready = kqueue_scan(&scan, n, kev, tsp, p, &error); if (ready == 0) break; error = copyout(kev, SCARG(uap, eventlist) + total, sizeof(struct kevent) * ready); #ifdef KTRACE if (KTRPOINT(p, KTR_STRUCT)) ktrevent(p, kev, ready); #endif total += ready; if (error || ready < n) break; } kqueue_scan_finish(&scan); *retval = total; return (error); done: FRELE(fp, p); return (error); } #ifdef KQUEUE_DEBUG void kqueue_do_check(struct kqueue *kq, const char *func, int line) { struct knote *kn; int count = 0, nmarker = 0; MUTEX_ASSERT_LOCKED(&kq->kq_lock); TAILQ_FOREACH(kn, &kq->kq_head, kn_tqe) { if (kn->kn_filter == EVFILT_MARKER) { if ((kn->kn_status & KN_QUEUED) != 0) panic("%s:%d: kq=%p kn=%p marker QUEUED", func, line, kq, kn); nmarker++; } else { if ((kn->kn_status & KN_ACTIVE) == 0) panic("%s:%d: kq=%p kn=%p knote !ACTIVE", func, line, kq, kn); if ((kn->kn_status & KN_QUEUED) == 0) panic("%s:%d: kq=%p kn=%p knote !QUEUED", func, line, kq, kn); if (kn->kn_kq != kq) panic("%s:%d: kq=%p kn=%p kn_kq=%p != kq", func, line, kq, kn, kn->kn_kq); count++; if (count > kq->kq_count) goto bad; } } if (count != kq->kq_count) { bad: panic("%s:%d: kq=%p kq_count=%d count=%d nmarker=%d", func, line, kq, kq->kq_count, count, nmarker); } } #endif int kqueue_register(struct kqueue *kq, struct kevent *kev, unsigned int pollid, struct proc *p) { struct filedesc *fdp = kq->kq_fdp; const struct filterops *fops = NULL; struct file *fp = NULL; struct knote *kn = NULL, *newkn = NULL; struct knlist *list = NULL; int active, error = 0; KASSERT(pollid == 0 || (p != NULL && p->p_kq == kq)); if (kev->filter < 0) { if (kev->filter + EVFILT_SYSCOUNT < 0) return (EINVAL); fops = sysfilt_ops[~kev->filter]; /* to 0-base index */ } if (fops == NULL) { /* * XXX * filter attach routine is responsible for ensuring that * the identifier can be attached to it. */ return (EINVAL); } if (fops->f_flags & FILTEROP_ISFD) { /* validate descriptor */ if (kev->ident > INT_MAX) return (EBADF); } if (kev->flags & EV_ADD) newkn = pool_get(&knote_pool, PR_WAITOK | PR_ZERO); again: if (fops->f_flags & FILTEROP_ISFD) { if ((fp = fd_getfile(fdp, kev->ident)) == NULL) { error = EBADF; goto done; } mtx_enter(&kq->kq_lock); if (kev->flags & EV_ADD) kqueue_expand_list(kq, kev->ident); if (kev->ident < kq->kq_knlistsize) list = &kq->kq_knlist[kev->ident]; } else { mtx_enter(&kq->kq_lock); if (kev->flags & EV_ADD) kqueue_expand_hash(kq); if (kq->kq_knhashmask != 0) { list = &kq->kq_knhash[ KN_HASH((u_long)kev->ident, kq->kq_knhashmask)]; } } if (list != NULL) { SLIST_FOREACH(kn, list, kn_link) { if (kev->filter == kn->kn_filter && kev->ident == kn->kn_id && pollid == kn->kn_pollid) { if (!knote_acquire(kn, NULL, 0)) { /* knote_acquire() has released * kq_lock. */ if (fp != NULL) { FRELE(fp, p); fp = NULL; } goto again; } break; } } } KASSERT(kn == NULL || (kn->kn_status & KN_PROCESSING) != 0); if (kn == NULL && ((kev->flags & EV_ADD) == 0)) { mtx_leave(&kq->kq_lock); error = ENOENT; goto done; } /* * kn now contains the matching knote, or NULL if no match. */ if (kev->flags & EV_ADD) { if (kn == NULL) { kn = newkn; newkn = NULL; kn->kn_status = KN_PROCESSING; kn->kn_fp = fp; kn->kn_kq = kq; kn->kn_fop = fops; /* * apply reference count to knote structure, and * do not release it at the end of this routine. 
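 * Clearing fp below hands the file reference over to kn->kn_fp, so the
 * FRELE() in the done: path is skipped for it.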
*/ fp = NULL; kn->kn_sfflags = kev->fflags; kn->kn_sdata = kev->data; kev->fflags = 0; kev->data = 0; kn->kn_kevent = *kev; kn->kn_pollid = pollid; knote_attach(kn); mtx_leave(&kq->kq_lock); error = filter_attach(kn); if (error != 0) { knote_drop(kn, p); goto done; } /* * If this is a file descriptor filter, check if * fd was closed while the knote was being added. * knote_fdclose() has missed kn if the function * ran before kn appeared in kq_knlist. */ if ((fops->f_flags & FILTEROP_ISFD) && fd_checkclosed(fdp, kev->ident, kn->kn_fp)) { /* * Drop the knote silently without error * because another thread might already have * seen it. This corresponds to the insert * happening in full before the close. */ filter_detach(kn); knote_drop(kn, p); goto done; } /* Check if there is a pending event. */ active = filter_process(kn, NULL); mtx_enter(&kq->kq_lock); if (active) knote_activate(kn); } else if (kn->kn_fop == &badfd_filtops) { /* * Nothing expects this badfd knote any longer. * Drop it to make room for the new knote and retry. */ KASSERT(kq == p->p_kq); mtx_leave(&kq->kq_lock); filter_detach(kn); knote_drop(kn, p); KASSERT(fp != NULL); FRELE(fp, p); fp = NULL; goto again; } else { /* * The user may change some filter values after the * initial EV_ADD, but doing so will not reset any * filters which have already been triggered. */ mtx_leave(&kq->kq_lock); active = filter_modify(kev, kn); mtx_enter(&kq->kq_lock); if (active) knote_activate(kn); if (kev->flags & EV_ERROR) { error = kev->data; goto release; } } } else if (kev->flags & EV_DELETE) { mtx_leave(&kq->kq_lock); filter_detach(kn); knote_drop(kn, p); goto done; } if ((kev->flags & EV_DISABLE) && ((kn->kn_status & KN_DISABLED) == 0)) kn->kn_status |= KN_DISABLED; if ((kev->flags & EV_ENABLE) && (kn->kn_status & KN_DISABLED)) { kn->kn_status &= ~KN_DISABLED; mtx_leave(&kq->kq_lock); /* Check if there is a pending event. */ active = filter_process(kn, NULL); mtx_enter(&kq->kq_lock); if (active) knote_activate(kn); } release: knote_release(kn); mtx_leave(&kq->kq_lock); done: if (fp != NULL) FRELE(fp, p); if (newkn != NULL) pool_put(&knote_pool, newkn); return (error); } int kqueue_sleep(struct kqueue *kq, struct timespec *tsp) { struct timespec elapsed, start, stop; uint64_t nsecs; int error; MUTEX_ASSERT_LOCKED(&kq->kq_lock); if (tsp != NULL) { getnanouptime(&start); nsecs = MIN(TIMESPEC_TO_NSEC(tsp), MAXTSLP); } else nsecs = INFSLP; error = msleep_nsec(kq, &kq->kq_lock, PSOCK | PCATCH | PNORELOCK, "kqread", nsecs); if (tsp != NULL) { getnanouptime(&stop); timespecsub(&stop, &start, &elapsed); timespecsub(tsp, &elapsed, tsp); if (tsp->tv_sec < 0) timespecclear(tsp); } return (error); } /* * Scan the kqueue, blocking if necessary until the target time is reached. * If tsp is NULL we block indefinitely. If tsp->ts_secs/nsecs are both * 0 we do not block at all. */ int kqueue_scan(struct kqueue_scan_state *scan, int maxevents, struct kevent *kevp, struct timespec *tsp, struct proc *p, int *errorp) { struct kqueue *kq = scan->kqs_kq; struct knote *kn; int error = 0, nkev = 0; int reinserted; if (maxevents == 0) goto done; retry: KASSERT(nkev == 0); error = 0; reinserted = 0; /* msleep() with PCATCH requires kernel lock. */ KERNEL_LOCK(); mtx_enter(&kq->kq_lock); if (kq->kq_state & KQ_DYING) { mtx_leave(&kq->kq_lock); KERNEL_UNLOCK(); error = EBADF; goto done; } if (kq->kq_count == 0) { /* * Successive loops are only necessary if there are more * ready events to gather, so they don't need to block. 
*/ if ((tsp != NULL && !timespecisset(tsp)) || scan->kqs_nevent != 0) { mtx_leave(&kq->kq_lock); KERNEL_UNLOCK(); error = 0; goto done; } kq->kq_state |= KQ_SLEEP; error = kqueue_sleep(kq, tsp); /* kqueue_sleep() has released kq_lock. */ KERNEL_UNLOCK(); if (error == 0 || error == EWOULDBLOCK) goto retry; /* don't restart after signals... */ if (error == ERESTART) error = EINTR; goto done; } /* The actual scan does not sleep on kq, so unlock the kernel. */ KERNEL_UNLOCK(); /* * Put the end marker in the queue to limit the scan to the events * that are currently active. This prevents events from being * recollected if they reactivate during scan. * * If a partial scan has been performed already but no events have * been collected, reposition the end marker to make any new events * reachable. */ if (!scan->kqs_queued) { TAILQ_INSERT_TAIL(&kq->kq_head, &scan->kqs_end, kn_tqe); scan->kqs_queued = 1; } else if (scan->kqs_nevent == 0) { TAILQ_REMOVE(&kq->kq_head, &scan->kqs_end, kn_tqe); TAILQ_INSERT_TAIL(&kq->kq_head, &scan->kqs_end, kn_tqe); } TAILQ_INSERT_HEAD(&kq->kq_head, &scan->kqs_start, kn_tqe); while (nkev < maxevents) { kn = TAILQ_NEXT(&scan->kqs_start, kn_tqe); if (kn->kn_filter == EVFILT_MARKER) { if (kn == &scan->kqs_end) break; /* Move start marker past another thread's marker. */ TAILQ_REMOVE(&kq->kq_head, &scan->kqs_start, kn_tqe); TAILQ_INSERT_AFTER(&kq->kq_head, kn, &scan->kqs_start, kn_tqe); continue; } if (!knote_acquire(kn, NULL, 0)) { /* knote_acquire() has released kq_lock. */ mtx_enter(&kq->kq_lock); continue; } kqueue_check(kq); TAILQ_REMOVE(&kq->kq_head, kn, kn_tqe); kn->kn_status &= ~KN_QUEUED; kq->kq_count--; kqueue_check(kq); if (kn->kn_status & KN_DISABLED) { knote_release(kn); continue; } mtx_leave(&kq->kq_lock); /* Drop expired kqpoll knotes. */ if (p->p_kq == kq && p->p_kq_serial > (unsigned long)kn->kn_udata) { filter_detach(kn); knote_drop(kn, p); mtx_enter(&kq->kq_lock); continue; } /* * Invalidate knotes whose vnodes have been revoked. * This is a workaround; it is tricky to clear existing * knotes and prevent new ones from being registered * with the current revocation mechanism. */ if ((kn->kn_fop->f_flags & FILTEROP_ISFD) && kn->kn_fp != NULL && kn->kn_fp->f_type == DTYPE_VNODE) { struct vnode *vp = kn->kn_fp->f_data; if (__predict_false(vp->v_op == &dead_vops && kn->kn_fop != &dead_filtops)) { filter_detach(kn); kn->kn_fop = &dead_filtops; /* * Check if the event should be delivered. * Use f_event directly because this is * a special situation. */ if (kn->kn_fop->f_event(kn, 0) == 0) { filter_detach(kn); knote_drop(kn, p); mtx_enter(&kq->kq_lock); continue; } } } memset(kevp, 0, sizeof(*kevp)); if (filter_process(kn, kevp) == 0) { mtx_enter(&kq->kq_lock); if ((kn->kn_status & KN_QUEUED) == 0) kn->kn_status &= ~KN_ACTIVE; knote_release(kn); kqueue_check(kq); continue; } /* * Post-event action on the note */ if (kevp->flags & EV_ONESHOT) { filter_detach(kn); knote_drop(kn, p); mtx_enter(&kq->kq_lock); } else if (kevp->flags & (EV_CLEAR | EV_DISPATCH)) { mtx_enter(&kq->kq_lock); if (kevp->flags & EV_DISPATCH) kn->kn_status |= KN_DISABLED; if ((kn->kn_status & KN_QUEUED) == 0) kn->kn_status &= ~KN_ACTIVE; knote_release(kn); } else { mtx_enter(&kq->kq_lock); if ((kn->kn_status & KN_QUEUED) == 0) { kqueue_check(kq); kq->kq_count++; kn->kn_status |= KN_QUEUED; TAILQ_INSERT_TAIL(&kq->kq_head, kn, kn_tqe); /* Wakeup is done after loop. 
*/ reinserted = 1; } knote_release(kn); } kqueue_check(kq); kevp++; nkev++; scan->kqs_nevent++; } TAILQ_REMOVE(&kq->kq_head, &scan->kqs_start, kn_tqe); if (reinserted && kq->kq_count != 0) kqueue_wakeup(kq); mtx_leave(&kq->kq_lock); if (scan->kqs_nevent == 0) goto retry; done: *errorp = error; return (nkev); } void kqueue_scan_setup(struct kqueue_scan_state *scan, struct kqueue *kq) { memset(scan, 0, sizeof(*scan)); KQREF(kq); scan->kqs_kq = kq; scan->kqs_start.kn_filter = EVFILT_MARKER; scan->kqs_start.kn_status = KN_PROCESSING; scan->kqs_end.kn_filter = EVFILT_MARKER; scan->kqs_end.kn_status = KN_PROCESSING; } void kqueue_scan_finish(struct kqueue_scan_state *scan) { struct kqueue *kq = scan->kqs_kq; KASSERT(scan->kqs_start.kn_filter == EVFILT_MARKER); KASSERT(scan->kqs_start.kn_status == KN_PROCESSING); KASSERT(scan->kqs_end.kn_filter == EVFILT_MARKER); KASSERT(scan->kqs_end.kn_status == KN_PROCESSING); if (scan->kqs_queued) { scan->kqs_queued = 0; mtx_enter(&kq->kq_lock); TAILQ_REMOVE(&kq->kq_head, &scan->kqs_end, kn_tqe); mtx_leave(&kq->kq_lock); } KQRELE(kq); } /* * XXX * This could be expanded to call kqueue_scan, if desired. */ int kqueue_read(struct file *fp, struct uio *uio, int fflags) { return (ENXIO); } int kqueue_write(struct file *fp, struct uio *uio, int fflags) { return (ENXIO); } int kqueue_ioctl(struct file *fp, u_long com, caddr_t data, struct proc *p) { return (ENOTTY); } int kqueue_stat(struct file *fp, struct stat *st, struct proc *p) { struct kqueue *kq = fp->f_data; memset(st, 0, sizeof(*st)); st->st_size = kq->kq_count; /* unlocked read */ st->st_blksize = sizeof(struct kevent); st->st_mode = S_IFIFO; return (0); } void kqueue_purge(struct proc *p, struct kqueue *kq) { int i; mtx_enter(&kq->kq_lock); for (i = 0; i < kq->kq_knlistsize; i++) knote_remove(p, kq, &kq->kq_knlist, i, 1); if (kq->kq_knhashmask != 0) { for (i = 0; i < kq->kq_knhashmask + 1; i++) knote_remove(p, kq, &kq->kq_knhash, i, 1); } mtx_leave(&kq->kq_lock); } void kqueue_terminate(struct proc *p, struct kqueue *kq) { struct knote *kn; int state; mtx_enter(&kq->kq_lock); /* * Any remaining entries should be scan markers. * They are removed when the ongoing scans finish. */ KASSERT(kq->kq_count == 0); TAILQ_FOREACH(kn, &kq->kq_head, kn_tqe) KASSERT(kn->kn_filter == EVFILT_MARKER); kq->kq_state |= KQ_DYING; state = kq->kq_state; kqueue_wakeup(kq); mtx_leave(&kq->kq_lock); /* * Any knotes that were attached to this kqueue were deleted * by knote_fdclose() when this kqueue's file descriptor was closed. */ KASSERT(klist_empty(&kq->kq_klist)); if (state & KQ_TASK) taskq_del_barrier(systqmp, &kq->kq_task); } int kqueue_close(struct file *fp, struct proc *p) { struct kqueue *kq = fp->f_data; fp->f_data = NULL; kqueue_purge(p, kq); kqueue_terminate(p, kq); KQRELE(kq); return (0); } static void kqueue_task(void *arg) { struct kqueue *kq = arg; mtx_enter(&kqueue_klist_lock); KNOTE(&kq->kq_klist, 0); mtx_leave(&kqueue_klist_lock); } void kqueue_wakeup(struct kqueue *kq) { MUTEX_ASSERT_LOCKED(&kq->kq_lock); if (kq->kq_state & KQ_SLEEP) { kq->kq_state &= ~KQ_SLEEP; wakeup(kq); } if (!klist_empty(&kq->kq_klist)) { /* Defer activation to avoid recursion. 
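 * The deferred task runs kqueue_task(), which delivers KNOTE(&kq->kq_klist, 0)
 * from the systqmp task queue instead of recursing here.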
*/ kq->kq_state |= KQ_TASK; task_add(systqmp, &kq->kq_task); } } static void kqueue_expand_hash(struct kqueue *kq) { struct knlist *hash; u_long hashmask; MUTEX_ASSERT_LOCKED(&kq->kq_lock); if (kq->kq_knhashmask == 0) { mtx_leave(&kq->kq_lock); hash = hashinit(KN_HASHSIZE, M_KEVENT, M_WAITOK, &hashmask); mtx_enter(&kq->kq_lock); if (kq->kq_knhashmask == 0) { kq->kq_knhash = hash; kq->kq_knhashmask = hashmask; } else { /* Another thread has allocated the hash. */ mtx_leave(&kq->kq_lock); hashfree(hash, KN_HASHSIZE, M_KEVENT); mtx_enter(&kq->kq_lock); } } } static void kqueue_expand_list(struct kqueue *kq, int fd) { struct knlist *list, *olist; int size, osize; MUTEX_ASSERT_LOCKED(&kq->kq_lock); if (kq->kq_knlistsize <= fd) { size = kq->kq_knlistsize; mtx_leave(&kq->kq_lock); while (size <= fd) size += KQEXTENT; list = mallocarray(size, sizeof(*list), M_KEVENT, M_WAITOK); mtx_enter(&kq->kq_lock); if (kq->kq_knlistsize <= fd) { memcpy(list, kq->kq_knlist, kq->kq_knlistsize * sizeof(*list)); memset(&list[kq->kq_knlistsize], 0, (size - kq->kq_knlistsize) * sizeof(*list)); olist = kq->kq_knlist; osize = kq->kq_knlistsize; kq->kq_knlist = list; kq->kq_knlistsize = size; mtx_leave(&kq->kq_lock); free(olist, M_KEVENT, osize * sizeof(*list)); mtx_enter(&kq->kq_lock); } else { /* Another thread has expanded the list. */ mtx_leave(&kq->kq_lock); free(list, M_KEVENT, size * sizeof(*list)); mtx_enter(&kq->kq_lock); } } } /* * Acquire a knote, return non-zero on success, 0 on failure. * * If we cannot acquire the knote we sleep and return 0. The knote * may be stale on return in this case and the caller must restart * whatever loop they are in. * * If we are about to sleep and klist is non-NULL, the list is unlocked * before sleep and remains unlocked on return. */ int knote_acquire(struct knote *kn, struct klist *klist, int ls) { struct kqueue *kq = kn->kn_kq; MUTEX_ASSERT_LOCKED(&kq->kq_lock); KASSERT(kn->kn_filter != EVFILT_MARKER); if (kn->kn_status & KN_PROCESSING) { kn->kn_status |= KN_WAITING; if (klist != NULL) { mtx_leave(&kq->kq_lock); klist_unlock(klist, ls); /* XXX Timeout resolves potential loss of wakeup. */ tsleep_nsec(kn, 0, "kqepts", SEC_TO_NSEC(1)); } else { msleep_nsec(kn, &kq->kq_lock, PNORELOCK, "kqepts", SEC_TO_NSEC(1)); } /* knote may be stale now */ return (0); } kn->kn_status |= KN_PROCESSING; return (1); } /* * Release an acquired knote, clearing KN_PROCESSING. */ void knote_release(struct knote *kn) { MUTEX_ASSERT_LOCKED(&kn->kn_kq->kq_lock); KASSERT(kn->kn_filter != EVFILT_MARKER); KASSERT(kn->kn_status & KN_PROCESSING); if (kn->kn_status & KN_WAITING) { kn->kn_status &= ~KN_WAITING; wakeup(kn); } kn->kn_status &= ~KN_PROCESSING; /* kn should not be accessed anymore */ } /* * activate one knote. */ void knote_activate(struct knote *kn) { MUTEX_ASSERT_LOCKED(&kn->kn_kq->kq_lock); kn->kn_status |= KN_ACTIVE; if ((kn->kn_status & (KN_QUEUED | KN_DISABLED)) == 0) knote_enqueue(kn); } /* * walk down a list of knotes, activating them if their event has triggered. 
*/ void knote(struct klist *list, long hint) { struct knote *kn, *kn0; struct kqueue *kq; KLIST_ASSERT_LOCKED(list); SLIST_FOREACH_SAFE(kn, &list->kl_list, kn_selnext, kn0) { if (filter_event(kn, hint)) { kq = kn->kn_kq; mtx_enter(&kq->kq_lock); knote_activate(kn); mtx_leave(&kq->kq_lock); } } } /* * remove all knotes from a specified knlist */ void knote_remove(struct proc *p, struct kqueue *kq, struct knlist **plist, int idx, int purge) { struct knote *kn; MUTEX_ASSERT_LOCKED(&kq->kq_lock); /* Always fetch array pointer as another thread can resize kq_knlist. */ while ((kn = SLIST_FIRST(*plist + idx)) != NULL) { KASSERT(kn->kn_kq == kq); if (!purge) { /* Skip pending badfd knotes. */ while (kn->kn_fop == &badfd_filtops) { kn = SLIST_NEXT(kn, kn_link); if (kn == NULL) return; KASSERT(kn->kn_kq == kq); } } if (!knote_acquire(kn, NULL, 0)) { /* knote_acquire() has released kq_lock. */ mtx_enter(&kq->kq_lock); continue; } mtx_leave(&kq->kq_lock); filter_detach(kn); /* * Notify poll(2) and select(2) when a monitored * file descriptor is closed. * * This reuses the original knote for delivering the * notification so as to avoid allocating memory. */ if (!purge && (kn->kn_flags & (__EV_POLL | __EV_SELECT)) && !(p->p_kq == kq && p->p_kq_serial > (unsigned long)kn->kn_udata) && kn->kn_fop != &badfd_filtops) { KASSERT(kn->kn_fop->f_flags & FILTEROP_ISFD); FRELE(kn->kn_fp, p); kn->kn_fp = NULL; kn->kn_fop = &badfd_filtops; filter_event(kn, 0); mtx_enter(&kq->kq_lock); knote_activate(kn); knote_release(kn); continue; } knote_drop(kn, p); mtx_enter(&kq->kq_lock); } } /* * remove all knotes referencing a specified fd */ void knote_fdclose(struct proc *p, int fd) { struct filedesc *fdp = p->p_p->ps_fd; struct kqueue *kq; /* * fdplock can be ignored if the file descriptor table is being freed * because no other thread can access the fdp. */ if (fdp->fd_refcnt != 0) fdpassertlocked(fdp); LIST_FOREACH(kq, &fdp->fd_kqlist, kq_next) { mtx_enter(&kq->kq_lock); if (fd < kq->kq_knlistsize) knote_remove(p, kq, &kq->kq_knlist, fd, 0); mtx_leave(&kq->kq_lock); } } /* * handle a process exiting, including the triggering of NOTE_EXIT notes * XXX this could be more efficient, doing a single pass down the klist */ void knote_processexit(struct process *pr) { KERNEL_ASSERT_LOCKED(); KNOTE(&pr->ps_klist, NOTE_EXIT); /* remove other knotes hanging off the process */ klist_invalidate(&pr->ps_klist); } void knote_attach(struct knote *kn) { struct kqueue *kq = kn->kn_kq; struct knlist *list; MUTEX_ASSERT_LOCKED(&kq->kq_lock); KASSERT(kn->kn_status & KN_PROCESSING); if (kn->kn_fop->f_flags & FILTEROP_ISFD) { KASSERT(kq->kq_knlistsize > kn->kn_id); list = &kq->kq_knlist[kn->kn_id]; } else { KASSERT(kq->kq_knhashmask != 0); list = &kq->kq_knhash[KN_HASH(kn->kn_id, kq->kq_knhashmask)]; } SLIST_INSERT_HEAD(list, kn, kn_link); kq->kq_nknotes++; } void knote_detach(struct knote *kn) { struct kqueue *kq = kn->kn_kq; struct knlist *list; MUTEX_ASSERT_LOCKED(&kq->kq_lock); KASSERT(kn->kn_status & KN_PROCESSING); kq->kq_nknotes--; if (kn->kn_fop->f_flags & FILTEROP_ISFD) list = &kq->kq_knlist[kn->kn_id]; else list = &kq->kq_knhash[KN_HASH(kn->kn_id, kq->kq_knhashmask)]; SLIST_REMOVE(list, kn, knote, kn_link); } /* * should be called at spl == 0, since we don't want to hold spl * while calling FRELE and pool_put. 
*/ void knote_drop(struct knote *kn, struct proc *p) { struct kqueue *kq = kn->kn_kq; KASSERT(kn->kn_filter != EVFILT_MARKER); mtx_enter(&kq->kq_lock); knote_detach(kn); if (kn->kn_status & KN_QUEUED) knote_dequeue(kn); if (kn->kn_status & KN_WAITING) { kn->kn_status &= ~KN_WAITING; wakeup(kn); } mtx_leave(&kq->kq_lock); if ((kn->kn_fop->f_flags & FILTEROP_ISFD) && kn->kn_fp != NULL) FRELE(kn->kn_fp, p); pool_put(&knote_pool, kn); } void knote_enqueue(struct knote *kn) { struct kqueue *kq = kn->kn_kq; MUTEX_ASSERT_LOCKED(&kq->kq_lock); KASSERT(kn->kn_filter != EVFILT_MARKER); KASSERT((kn->kn_status & KN_QUEUED) == 0); kqueue_check(kq); TAILQ_INSERT_TAIL(&kq->kq_head, kn, kn_tqe); kn->kn_status |= KN_QUEUED; kq->kq_count++; kqueue_check(kq); kqueue_wakeup(kq); } void knote_dequeue(struct knote *kn) { struct kqueue *kq = kn->kn_kq; MUTEX_ASSERT_LOCKED(&kq->kq_lock); KASSERT(kn->kn_filter != EVFILT_MARKER); KASSERT(kn->kn_status & KN_QUEUED); kqueue_check(kq); TAILQ_REMOVE(&kq->kq_head, kn, kn_tqe); kn->kn_status &= ~KN_QUEUED; kq->kq_count--; kqueue_check(kq); } /* * Assign parameters to the knote. * * The knote's object lock must be held. */ void knote_assign(const struct kevent *kev, struct knote *kn) { if ((kn->kn_fop->f_flags & FILTEROP_MPSAFE) == 0) KERNEL_ASSERT_LOCKED(); kn->kn_sfflags = kev->fflags; kn->kn_sdata = kev->data; kn->kn_udata = kev->udata; } /* * Submit the knote's event for delivery. * * The knote's object lock must be held. */ void knote_submit(struct knote *kn, struct kevent *kev) { if ((kn->kn_fop->f_flags & FILTEROP_MPSAFE) == 0) KERNEL_ASSERT_LOCKED(); if (kev != NULL) { *kev = kn->kn_kevent; if (kn->kn_flags & EV_CLEAR) { kn->kn_fflags = 0; kn->kn_data = 0; } } } void klist_init(struct klist *klist, const struct klistops *ops, void *arg) { SLIST_INIT(&klist->kl_list); klist->kl_ops = ops; klist->kl_arg = arg; } void klist_free(struct klist *klist) { KASSERT(SLIST_EMPTY(&klist->kl_list)); } void klist_insert(struct klist *klist, struct knote *kn) { int ls; ls = klist_lock(klist); SLIST_INSERT_HEAD(&klist->kl_list, kn, kn_selnext); klist_unlock(klist, ls); } void klist_insert_locked(struct klist *klist, struct knote *kn) { KLIST_ASSERT_LOCKED(klist); SLIST_INSERT_HEAD(&klist->kl_list, kn, kn_selnext); } void klist_remove(struct klist *klist, struct knote *kn) { int ls; ls = klist_lock(klist); SLIST_REMOVE(&klist->kl_list, kn, knote, kn_selnext); klist_unlock(klist, ls); } void klist_remove_locked(struct klist *klist, struct knote *kn) { KLIST_ASSERT_LOCKED(klist); SLIST_REMOVE(&klist->kl_list, kn, knote, kn_selnext); } /* * Detach all knotes from klist. The knotes are rewired to indicate EOF. * * The caller of this function must not hold any locks that can block * filterops callbacks that run with KN_PROCESSING. * Otherwise this function might deadlock. */ void klist_invalidate(struct klist *list) { struct knote *kn; struct kqueue *kq; struct proc *p = curproc; int ls; NET_ASSERT_UNLOCKED(); ls = klist_lock(list); while ((kn = SLIST_FIRST(&list->kl_list)) != NULL) { kq = kn->kn_kq; mtx_enter(&kq->kq_lock); if (!knote_acquire(kn, list, ls)) { /* knote_acquire() has released kq_lock * and klist lock. 
*/ ls = klist_lock(list); continue; } mtx_leave(&kq->kq_lock); klist_unlock(list, ls); filter_detach(kn); if (kn->kn_fop->f_flags & FILTEROP_ISFD) { kn->kn_fop = &dead_filtops; filter_event(kn, 0); mtx_enter(&kq->kq_lock); knote_activate(kn); knote_release(kn); mtx_leave(&kq->kq_lock); } else { knote_drop(kn, p); } ls = klist_lock(list); } klist_unlock(list, ls); } static int klist_lock(struct klist *list) { int ls = 0; if (list->kl_ops != NULL) { ls = list->kl_ops->klo_lock(list->kl_arg); } else { KERNEL_LOCK(); ls = splhigh(); } return ls; } static void klist_unlock(struct klist *list, int ls) { if (list->kl_ops != NULL) { list->kl_ops->klo_unlock(list->kl_arg, ls); } else { splx(ls); KERNEL_UNLOCK(); } } static void klist_mutex_assertlk(void *arg) { struct mutex *mtx = arg; (void)mtx; MUTEX_ASSERT_LOCKED(mtx); } static int klist_mutex_lock(void *arg) { struct mutex *mtx = arg; mtx_enter(mtx); return 0; } static void klist_mutex_unlock(void *arg, int s) { struct mutex *mtx = arg; mtx_leave(mtx); } static const struct klistops mutex_klistops = { .klo_assertlk = klist_mutex_assertlk, .klo_lock = klist_mutex_lock, .klo_unlock = klist_mutex_unlock, }; void klist_init_mutex(struct klist *klist, struct mutex *mtx) { klist_init(klist, &mutex_klistops, mtx); } static void klist_rwlock_assertlk(void *arg) { struct rwlock *rwl = arg; (void)rwl; rw_assert_wrlock(rwl); } static int klist_rwlock_lock(void *arg) { struct rwlock *rwl = arg; rw_enter_write(rwl); return 0; } static void klist_rwlock_unlock(void *arg, int s) { struct rwlock *rwl = arg; rw_exit_write(rwl); } static const struct klistops rwlock_klistops = { .klo_assertlk = klist_rwlock_assertlk, .klo_lock = klist_rwlock_lock, .klo_unlock = klist_rwlock_unlock, }; void klist_init_rwlock(struct klist *klist, struct rwlock *rwl) { klist_init(klist, &rwlock_klistops, rwl); }
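/*
 * Illustrative userland sketch (not part of the kernel sources above): it
 * exercises the interface implemented by sys_kqueue() and sys_kevent(), and
 * the EVFILT_TIMER path handled by timer_filtops, where kn_sdata is a period
 * in milliseconds (see filt_timer_timeout_add()).  The timer ident (1) is an
 * arbitrary user-chosen value and error handling is abbreviated.
 */
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>

#include <err.h>
#include <stdio.h>

int
main(void)
{
	struct kevent change, event;
	int kq, n;

	if ((kq = kqueue()) == -1)
		err(1, "kqueue");

	/* One-shot timer that fires after 500 ms. */
	EV_SET(&change, 1, EVFILT_TIMER, EV_ADD | EV_ONESHOT, 0, 500, NULL);

	/* Register the change and block until the event is delivered. */
	n = kevent(kq, &change, 1, &event, 1, NULL);
	if (n == -1)
		err(1, "kevent");
	if (n == 1)
		printf("timer %lu fired, data=%lld\n",
		    (unsigned long)event.ident, (long long)event.data);
	return 0;
}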
/*	$OpenBSD: kern_proc.c,v 1.92 2022/08/14 01:58:27 jsg Exp $	*/
/*	$NetBSD: kern_proc.c,v 1.14 1996/02/09 18:59:41 christos Exp $	*/

/*
 * Copyright (c) 1982, 1986, 1989, 1991, 1993
 *	The Regents of the University of California. All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
* * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * @(#)kern_proc.c 8.4 (Berkeley) 1/4/94 */ #include <sys/param.h> #include <sys/systm.h> #include <sys/proc.h> #include <sys/wait.h> #include <sys/rwlock.h> #include <sys/malloc.h> #include <sys/tty.h> #include <sys/signalvar.h> #include <sys/pool.h> #include <sys/vnode.h> struct rwlock uidinfolk; #define UIHASH(uid) (&uihashtbl[(uid) & uihash]) LIST_HEAD(uihashhead, uidinfo) *uihashtbl; u_long uihash; /* size of hash table - 1 */ /* * Other process lists */ struct tidhashhead *tidhashtbl; u_long tidhash; struct pidhashhead *pidhashtbl; u_long pidhash; struct pgrphashhead *pgrphashtbl; u_long pgrphash; struct processlist allprocess; struct processlist zombprocess; struct proclist allproc; struct pool proc_pool; struct pool process_pool; struct pool rusage_pool; struct pool ucred_pool; struct pool pgrp_pool; struct pool session_pool; void pgdelete(struct pgrp *); void fixjobc(struct process *, struct pgrp *, int); static void orphanpg(struct pgrp *); #ifdef DEBUG void pgrpdump(void); #endif /* * Initialize global process hashing structures. */ void procinit(void) { LIST_INIT(&allprocess); LIST_INIT(&zombprocess); LIST_INIT(&allproc); rw_init(&uidinfolk, "uidinfo"); tidhashtbl = hashinit(maxthread / 4, M_PROC, M_NOWAIT, &tidhash); pidhashtbl = hashinit(maxprocess / 4, M_PROC, M_NOWAIT, &pidhash); pgrphashtbl = hashinit(maxprocess / 4, M_PROC, M_NOWAIT, &pgrphash); uihashtbl = hashinit(maxprocess / 16, M_PROC, M_NOWAIT, &uihash); if (!tidhashtbl || !pidhashtbl || !pgrphashtbl || !uihashtbl) panic("procinit: malloc"); pool_init(&proc_pool, sizeof(struct proc), 0, IPL_NONE, PR_WAITOK, "procpl", NULL); pool_init(&process_pool, sizeof(struct process), 0, IPL_NONE, PR_WAITOK, "processpl", NULL); pool_init(&rusage_pool, sizeof(struct rusage), 0, IPL_NONE, PR_WAITOK, "zombiepl", NULL); pool_init(&ucred_pool, sizeof(struct ucred), 0, IPL_MPFLOOR, 0, "ucredpl", NULL); pool_init(&pgrp_pool, sizeof(struct pgrp), 0, IPL_NONE, PR_WAITOK, "pgrppl", NULL); pool_init(&session_pool, sizeof(struct session), 0, IPL_NONE, PR_WAITOK, "sessionpl", NULL); } /* * This returns with `uidinfolk' held: caller must call uid_release() * after making whatever change they needed. 
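 * chgproccnt() below shows the pattern: uid_find(), adjust ui_proccnt, then
 * uid_release().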
*/ struct uidinfo * uid_find(uid_t uid) { struct uidinfo *uip, *nuip; struct uihashhead *uipp; uipp = UIHASH(uid); rw_enter_write(&uidinfolk); LIST_FOREACH(uip, uipp, ui_hash) if (uip->ui_uid == uid) break; if (uip) return (uip); rw_exit_write(&uidinfolk); nuip = malloc(sizeof(*nuip), M_PROC, M_WAITOK|M_ZERO); rw_enter_write(&uidinfolk); LIST_FOREACH(uip, uipp, ui_hash) if (uip->ui_uid == uid) break; if (uip) { free(nuip, M_PROC, sizeof(*nuip)); return (uip); } nuip->ui_uid = uid; LIST_INSERT_HEAD(uipp, nuip, ui_hash); return (nuip); } void uid_release(struct uidinfo *uip) { rw_exit_write(&uidinfolk); } /* * Change the count associated with number of threads * a given user is using. */ int chgproccnt(uid_t uid, int diff) { struct uidinfo *uip; long count; uip = uid_find(uid); count = (uip->ui_proccnt += diff); uid_release(uip); if (count < 0) panic("chgproccnt: procs < 0"); return count; } /* * Is pr an inferior of parent? */ int inferior(struct process *pr, struct process *parent) { for (; pr != parent; pr = pr->ps_pptr) if (pr->ps_pid == 0 || pr->ps_pid == 1) return (0); return (1); } /* * Locate a proc (thread) by number */ struct proc * tfind(pid_t tid) { struct proc *p; LIST_FOREACH(p, TIDHASH(tid), p_hash) if (p->p_tid == tid) return (p); return (NULL); } /* * Locate a process by number */ struct process * prfind(pid_t pid) { struct process *pr; LIST_FOREACH(pr, PIDHASH(pid), ps_hash) if (pr->ps_pid == pid) return (pr); return (NULL); } /* * Locate a process group by number */ struct pgrp * pgfind(pid_t pgid) { struct pgrp *pgrp; LIST_FOREACH(pgrp, PGRPHASH(pgid), pg_hash) if (pgrp->pg_id == pgid) return (pgrp); return (NULL); } /* * Locate a zombie process */ struct process * zombiefind(pid_t pid) { struct process *pr; LIST_FOREACH(pr, &zombprocess, ps_list) if (pr->ps_pid == pid) return (pr); return (NULL); } /* * Move process to a new process group. If a session is provided * then it's a new session to contain this process group; otherwise * the process is staying within its existing session. */ void enternewpgrp(struct process *pr, struct pgrp *pgrp, struct session *newsess) { #ifdef DIAGNOSTIC if (SESS_LEADER(pr)) panic("%s: session leader attempted setpgrp", __func__); #endif if (newsess != NULL) { /* * New session. Initialize it completely */ timeout_set(&newsess->s_verauthto, zapverauth, newsess); newsess->s_leader = pr; newsess->s_count = 1; newsess->s_ttyvp = NULL; newsess->s_ttyp = NULL; memcpy(newsess->s_login, pr->ps_session->s_login, sizeof(newsess->s_login)); atomic_clearbits_int(&pr->ps_flags, PS_CONTROLT); pgrp->pg_session = newsess; #ifdef DIAGNOSTIC if (pr != curproc->p_p) panic("%s: mksession but not curproc", __func__); #endif } else { pgrp->pg_session = pr->ps_session; pgrp->pg_session->s_count++; } pgrp->pg_id = pr->ps_pid; LIST_INIT(&pgrp->pg_members); LIST_INIT(&pgrp->pg_sigiolst); LIST_INSERT_HEAD(PGRPHASH(pr->ps_pid), pgrp, pg_hash); pgrp->pg_jobc = 0; enterthispgrp(pr, pgrp); } /* * move process to an existing process group */ void enterthispgrp(struct process *pr, struct pgrp *pgrp) { struct pgrp *savepgrp = pr->ps_pgrp; /* * Adjust eligibility of affected pgrps to participate in job control. * Increment eligibility counts before decrementing, otherwise we * could reach 0 spuriously during the first call. 
*/ fixjobc(pr, pgrp, 1); fixjobc(pr, savepgrp, 0); LIST_REMOVE(pr, ps_pglist); pr->ps_pgrp = pgrp; LIST_INSERT_HEAD(&pgrp->pg_members, pr, ps_pglist); if (LIST_EMPTY(&savepgrp->pg_members)) pgdelete(savepgrp); } /* * remove process from process group */ void leavepgrp(struct process *pr) { if (pr->ps_session->s_verauthppid == pr->ps_pid) zapverauth(pr->ps_session); LIST_REMOVE(pr, ps_pglist); if (LIST_EMPTY(&pr->ps_pgrp->pg_members)) pgdelete(pr->ps_pgrp); pr->ps_pgrp = NULL; } /* * delete a process group */ void pgdelete(struct pgrp *pgrp) { sigio_freelist(&pgrp->pg_sigiolst); if (pgrp->pg_session->s_ttyp != NULL && pgrp->pg_session->s_ttyp->t_pgrp == pgrp) pgrp->pg_session->s_ttyp->t_pgrp = NULL; LIST_REMOVE(pgrp, pg_hash); SESSRELE(pgrp->pg_session); pool_put(&pgrp_pool, pgrp); } void zapverauth(void *v) { struct session *sess = v; sess->s_verauthuid = 0; sess->s_verauthppid = 0; } /* * Adjust pgrp jobc counters when specified process changes process group. * We count the number of processes in each process group that "qualify" * the group for terminal job control (those with a parent in a different * process group of the same session). If that count reaches zero, the * process group becomes orphaned. Check both the specified process' * process group and that of its children. * entering == 0 => pr is leaving specified group. * entering == 1 => pr is entering specified group. * XXX need proctree lock */ void fixjobc(struct process *pr, struct pgrp *pgrp, int entering) { struct pgrp *hispgrp; struct session *mysession = pgrp->pg_session; /* * Check pr's parent to see whether pr qualifies its own process * group; if so, adjust count for pr's process group. */ if ((hispgrp = pr->ps_pptr->ps_pgrp) != pgrp && hispgrp->pg_session == mysession) { if (entering) pgrp->pg_jobc++; else if (--pgrp->pg_jobc == 0) orphanpg(pgrp); } /* * Check this process' children to see whether they qualify * their process groups; if so, adjust counts for children's * process groups. */ LIST_FOREACH(pr, &pr->ps_children, ps_sibling) if ((hispgrp = pr->ps_pgrp) != pgrp && hispgrp->pg_session == mysession && (pr->ps_flags & PS_ZOMBIE) == 0) { if (entering) hispgrp->pg_jobc++; else if (--hispgrp->pg_jobc == 0) orphanpg(hispgrp); } } void killjobc(struct process *pr) { if (SESS_LEADER(pr)) { struct session *sp = pr->ps_session; if (sp->s_ttyvp) { struct vnode *ovp; /* * Controlling process. * Signal foreground pgrp, * drain controlling terminal * and revoke access to controlling terminal. */ if (sp->s_ttyp->t_session == sp) { if (sp->s_ttyp->t_pgrp) pgsignal(sp->s_ttyp->t_pgrp, SIGHUP, 1); ttywait(sp->s_ttyp); /* * The tty could have been revoked * if we blocked. */ if (sp->s_ttyvp) VOP_REVOKE(sp->s_ttyvp, REVOKEALL); } ovp = sp->s_ttyvp; sp->s_ttyvp = NULL; if (ovp) vrele(ovp); /* * s_ttyp is not zero'd; we use this to * indicate that the session once had a * controlling terminal. (for logging and * informational purposes) */ } sp->s_leader = NULL; } fixjobc(pr, pr->ps_pgrp, 0); } /* * A process group has become orphaned; * if there are any stopped processes in the group, * hang-up all process in that group. */ static void orphanpg(struct pgrp *pg) { struct process *pr; LIST_FOREACH(pr, &pg->pg_members, ps_pglist) { if (pr->ps_mainproc->p_stat == SSTOP) { LIST_FOREACH(pr, &pg->pg_members, ps_pglist) { prsignal(pr, SIGHUP); prsignal(pr, SIGCONT); } return; } } } #ifdef DDB void proc_printit(struct proc *p, const char *modif, int (*pr)(const char *, ...) 
__attribute__((__format__(__kprintf__,1,2)))) { static const char *const pstat[] = { "idle", "run", "sleep", "stop", "zombie", "dead", "onproc" }; char pstbuf[5]; const char *pst = pstbuf; if (p->p_stat < 1 || p->p_stat > sizeof(pstat) / sizeof(pstat[0])) snprintf(pstbuf, sizeof(pstbuf), "%d", p->p_stat); else pst = pstat[(int)p->p_stat - 1]; (*pr)("PROC (%s) pid=%d stat=%s\n", p->p_p->ps_comm, p->p_tid, pst); (*pr)(" flags process=%b proc=%b\n", p->p_p->ps_flags, PS_BITS, p->p_flag, P_BITS); (*pr)(" pri=%u, usrpri=%u, nice=%d\n", p->p_runpri, p->p_usrpri, p->p_p->ps_nice); (*pr)(" forw=%p, list=%p,%p\n", TAILQ_NEXT(p, p_runq), p->p_list.le_next, p->p_list.le_prev); (*pr)(" process=%p user=%p, vmspace=%p\n", p->p_p, p->p_addr, p->p_vmspace); (*pr)(" estcpu=%u, cpticks=%d, pctcpu=%u.%u\n", p->p_estcpu, p->p_cpticks, p->p_pctcpu / 100, p->p_pctcpu % 100); (*pr)(" user=%u, sys=%u, intr=%u\n", p->p_uticks, p->p_sticks, p->p_iticks); } #include <machine/db_machdep.h> #include <ddb/db_output.h> void db_kill_cmd(db_expr_t addr, int have_addr, db_expr_t count, char *modif) { struct process *pr; struct proc *p; pr = prfind(addr); if (pr == NULL) { db_printf("%ld: No such process", addr); return; } p = TAILQ_FIRST(&pr->ps_threads); /* Send uncatchable SIGABRT for coredump */ sigabort(p); } void db_show_all_procs(db_expr_t addr, int haddr, db_expr_t count, char *modif) { char *mode; int skipzomb = 0; int has_kernel_lock = 0; struct proc *p; struct process *pr, *ppr; if (modif[0] == 0) modif[0] = 'n'; /* default == normal mode */ mode = "mawno"; while (*mode && *mode != modif[0]) mode++; if (*mode == 0 || *mode == 'm') { db_printf("usage: show all procs [/a] [/n] [/w]\n"); db_printf("\t/a == show process address info\n"); db_printf("\t/n == show normal process info [default]\n"); db_printf("\t/w == show process pgrp/wait info\n"); db_printf("\t/o == show normal info for non-idle SONPROC\n"); return; } pr = LIST_FIRST(&allprocess); switch (*mode) { case 'a': db_printf(" TID %-9s %18s %18s %18s\n", "COMMAND", "STRUCT PROC *", "UAREA *", "VMSPACE/VM_MAP"); break; case 'n': db_printf(" PID %6s %5s %5s S %10s %-12s %-15s\n", "TID", "PPID", "UID", "FLAGS", "WAIT", "COMMAND"); break; case 'w': db_printf(" TID %-15s %-5s %18s %s\n", "COMMAND", "PGRP", "WAIT-CHANNEL", "WAIT-MSG"); break; case 'o': skipzomb = 1; db_printf(" TID %5s %5s %10s %10s %3s %-30s\n", "PID", "UID", "PRFLAGS", "PFLAGS", "CPU", "COMMAND"); break; } while (pr != NULL) { ppr = pr->ps_pptr; TAILQ_FOREACH(p, &pr->ps_threads, p_thr_link) { #ifdef MULTIPROCESSOR if (__mp_lock_held(&kernel_lock, p->p_cpu)) has_kernel_lock = 1; else has_kernel_lock = 0; #endif if (p->p_stat) { if (*mode == 'o') { if (p->p_stat != SONPROC) continue; if (p->p_cpu != NULL && p->p_cpu-> ci_schedstate.spc_idleproc == p) continue; } if (*mode == 'n') { db_printf("%c%5d ", (p == curproc ? '*' : ' '), pr->ps_pid); } else { db_printf("%c%6d ", (p == curproc ? '*' : ' '), p->p_tid); } switch (*mode) { case 'a': db_printf("%-9.9s %18p %18p %18p\n", pr->ps_comm, p, p->p_addr, p->p_vmspace); break; case 'n': db_printf("%6d %5d %5d %d %#10x " "%-12.12s %-15s\n", p->p_tid, ppr ? ppr->ps_pid : -1, pr->ps_ucred->cr_ruid, p->p_stat, p->p_flag | pr->ps_flags, (p->p_wchan && p->p_wmesg) ? p->p_wmesg : "", pr->ps_comm); break; case 'w': db_printf("%-15s %-5d %18p %s\n", pr->ps_comm, (pr->ps_pgrp ? pr->ps_pgrp->pg_id : -1), p->p_wchan, (p->p_wchan && p->p_wmesg) ? 
p->p_wmesg : ""); break; case 'o': db_printf("%5d %5d %#10x %#10x %3d" "%c %-31s\n", pr->ps_pid, pr->ps_ucred->cr_ruid, pr->ps_flags, p->p_flag, CPU_INFO_UNIT(p->p_cpu), has_kernel_lock ? 'K' : ' ', pr->ps_comm); break; } } } pr = LIST_NEXT(pr, ps_list); if (pr == NULL && skipzomb == 0) { skipzomb = 1; pr = LIST_FIRST(&zombprocess); } } } #endif #ifdef DEBUG void pgrpdump(void) { struct pgrp *pgrp; struct process *pr; int i; for (i = 0; i <= pgrphash; i++) { if (!LIST_EMPTY(&pgrphashtbl[i])) { printf("\tindx %d\n", i); LIST_FOREACH(pgrp, &pgrphashtbl[i], pg_hash) { printf("\tpgrp %p, pgid %d, sess %p, sesscnt %d, mem %p\n", pgrp, pgrp->pg_id, pgrp->pg_session, pgrp->pg_session->s_count, LIST_FIRST(&pgrp->pg_members)); LIST_FOREACH(pr, &pgrp->pg_members, ps_pglist) { printf("\t\tpid %d addr %p pgrp %p\n", pr->ps_pid, pr, pr->ps_pgrp); } } } } } #endif /* DEBUG */
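/*
 * Illustrative sketch (not part of the kernel sources above) of the
 * hashinit()-style lookup used by UIHASH/TIDHASH/PIDHASH/PGRPHASH: the table
 * size is a power of two, the stored mask is size - 1, and a lookup masks the
 * key to pick a bucket and then walks that bucket's chain.  The structure and
 * function names below are hypothetical stand-ins, not kernel identifiers.
 */
#include <stddef.h>

struct hobj {
	unsigned long	 h_key;		/* e.g. a uid, tid or pid */
	struct hobj	*h_next;	/* bucket chain */
};

struct hbucket {
	struct hobj	*hb_head;
};

static struct hobj *
hash_lookup(struct hbucket *tbl, unsigned long mask, unsigned long key)
{
	struct hobj *o;

	/* Equivalent of e.g. LIST_FOREACH(pr, PIDHASH(pid), ps_hash). */
	for (o = tbl[key & mask].hb_head; o != NULL; o = o->h_next)
		if (o->h_key == key)
			return (o);
	return (NULL);
}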
/*	$OpenBSD: uvm_pager.c,v 1.89 2022/08/19 05:53:19 mpi Exp $	*/
/*	$NetBSD: uvm_pager.c,v 1.36 2000/11/27 18:26:41 chs Exp $	*/

/*
 * Copyright (c) 1997 Charles D. Cranor and Washington University.
* All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. * * from: Id: uvm_pager.c,v 1.1.2.23 1998/02/02 20:38:06 chuck Exp */ /* * uvm_pager.c: generic functions used to assist the pagers. */ #include <sys/param.h> #include <sys/systm.h> #include <sys/malloc.h> #include <sys/pool.h> #include <sys/buf.h> #include <sys/atomic.h> #include <uvm/uvm.h> const struct uvm_pagerops *uvmpagerops[] = { &aobj_pager, &uvm_deviceops, &uvm_vnodeops, }; /* * the pager map: provides KVA for I/O * * Each uvm_pseg has room for MAX_PAGERMAP_SEGS pager io space of * MAXBSIZE bytes. * * The number of uvm_pseg instances is dynamic using an array segs. * At most UVM_PSEG_COUNT instances can exist. * * psegs[0/1] always exist (so that the pager can always map in pages). * psegs[0/1] element 0 are always reserved for the pagedaemon. * * Any other pseg is automatically created when no space is available * and automatically destroyed when it is no longer in use. */ #define MAX_PAGER_SEGS 16 #define PSEG_NUMSEGS (PAGER_MAP_SIZE / MAX_PAGER_SEGS / MAXBSIZE) struct uvm_pseg { /* Start of virtual space; 0 if not inited. */ vaddr_t start; /* Bitmap of the segments in use in this pseg. */ int use; }; struct mutex uvm_pseg_lck; struct uvm_pseg psegs[PSEG_NUMSEGS]; #define UVM_PSEG_FULL(pseg) ((pseg)->use == (1 << MAX_PAGER_SEGS) - 1) #define UVM_PSEG_EMPTY(pseg) ((pseg)->use == 0) #define UVM_PSEG_INUSE(pseg,id) (((pseg)->use & (1 << (id))) != 0) void uvm_pseg_init(struct uvm_pseg *); vaddr_t uvm_pseg_get(int); void uvm_pseg_release(vaddr_t); /* * uvm_pager_init: init pagers (at boot time) */ void uvm_pager_init(void) { int lcv; /* init pager map */ uvm_pseg_init(&psegs[0]); uvm_pseg_init(&psegs[1]); mtx_init(&uvm_pseg_lck, IPL_VM); /* init ASYNC I/O queue */ TAILQ_INIT(&uvm.aio_done); /* call pager init functions */ for (lcv = 0 ; lcv < sizeof(uvmpagerops)/sizeof(struct uvm_pagerops *); lcv++) { if (uvmpagerops[lcv]->pgo_init) uvmpagerops[lcv]->pgo_init(); } } /* * Initialize a uvm_pseg. * * May fail, in which case seg->start == 0. * * Caller locks uvm_pseg_lck. */ void uvm_pseg_init(struct uvm_pseg *pseg) { KASSERT(pseg->start == 0); KASSERT(pseg->use == 0); pseg->start = (vaddr_t)km_alloc(MAX_PAGER_SEGS * MAXBSIZE, &kv_any, &kp_none, &kd_trylock); } /* * Acquire a pager map segment. * * Returns a vaddr for paging. 0 on failure. * * Caller does not lock. 
*/ vaddr_t uvm_pseg_get(int flags) { int i; struct uvm_pseg *pseg; /* * XXX Prevent lock ordering issue in uvm_unmap_detach(). A real * fix would be to move the KERNEL_LOCK() out of uvm_unmap_detach(). * * witness_checkorder() at witness_checkorder+0xba0 * __mp_lock() at __mp_lock+0x5f * uvm_unmap_detach() at uvm_unmap_detach+0xc5 * uvm_map() at uvm_map+0x857 * uvm_km_valloc_try() at uvm_km_valloc_try+0x65 * uvm_pseg_get() at uvm_pseg_get+0x6f * uvm_pagermapin() at uvm_pagermapin+0x45 * uvn_io() at uvn_io+0xcf * uvn_get() at uvn_get+0x156 * uvm_fault_lower() at uvm_fault_lower+0x28a * uvm_fault() at uvm_fault+0x1b3 * upageflttrap() at upageflttrap+0x62 */ KERNEL_LOCK(); mtx_enter(&uvm_pseg_lck); pager_seg_restart: /* Find first pseg that has room. */ for (pseg = &psegs[0]; pseg != &psegs[PSEG_NUMSEGS]; pseg++) { if (UVM_PSEG_FULL(pseg)) continue; if (pseg->start == 0) { /* Need initialization. */ uvm_pseg_init(pseg); if (pseg->start == 0) goto pager_seg_fail; } /* Keep indexes 0,1 reserved for pagedaemon. */ if ((pseg == &psegs[0] || pseg == &psegs[1]) && (curproc != uvm.pagedaemon_proc)) i = 2; else i = 0; for (; i < MAX_PAGER_SEGS; i++) { if (!UVM_PSEG_INUSE(pseg, i)) { pseg->use |= 1 << i; mtx_leave(&uvm_pseg_lck); KERNEL_UNLOCK(); return pseg->start + i * MAXBSIZE; } } } pager_seg_fail: if ((flags & UVMPAGER_MAPIN_WAITOK) != 0) { msleep_nsec(&psegs, &uvm_pseg_lck, PVM, "pagerseg", INFSLP); goto pager_seg_restart; } mtx_leave(&uvm_pseg_lck); KERNEL_UNLOCK(); return 0; } /* * Release a pager map segment. * * Caller does not lock. * * Deallocates pseg if it is no longer in use. */ void uvm_pseg_release(vaddr_t segaddr) { int id; struct uvm_pseg *pseg; vaddr_t va = 0; mtx_enter(&uvm_pseg_lck); for (pseg = &psegs[0]; pseg != &psegs[PSEG_NUMSEGS]; pseg++) { if (pseg->start <= segaddr && segaddr < pseg->start + MAX_PAGER_SEGS * MAXBSIZE) break; } KASSERT(pseg != &psegs[PSEG_NUMSEGS]); id = (segaddr - pseg->start) / MAXBSIZE; KASSERT(id >= 0 && id < MAX_PAGER_SEGS); /* test for no remainder */ KDASSERT(segaddr == pseg->start + id * MAXBSIZE); KASSERT(UVM_PSEG_INUSE(pseg, id)); pseg->use &= ~(1 << id); wakeup(&psegs); if ((pseg != &psegs[0] && pseg != &psegs[1]) && UVM_PSEG_EMPTY(pseg)) { va = pseg->start; pseg->start = 0; } mtx_leave(&uvm_pseg_lck); if (va) { km_free((void *)va, MAX_PAGER_SEGS * MAXBSIZE, &kv_any, &kp_none); } } /* * uvm_pagermapin: map pages into KVA for I/O that needs mappings * * We basically just km_valloc a blank map entry to reserve the space in the * kernel map and then use pmap_enter() to put the mappings in by hand. */ vaddr_t uvm_pagermapin(struct vm_page **pps, int npages, int flags) { vaddr_t kva, cva; vm_prot_t prot; vsize_t size; struct vm_page *pp; #if defined(__HAVE_PMAP_DIRECT) /* * Use direct mappings for single page, unless there is a risk * of aliasing. */ if (npages == 1 && PMAP_PREFER_ALIGN() == 0) { KASSERT(pps[0]); KASSERT(pps[0]->pg_flags & PG_BUSY); return pmap_map_direct(pps[0]); } #endif prot = PROT_READ; if (flags & UVMPAGER_MAPIN_READ) prot |= PROT_WRITE; size = ptoa(npages); KASSERT(size <= MAXBSIZE); kva = uvm_pseg_get(flags); if (kva == 0) return 0; for (cva = kva ; size != 0 ; size -= PAGE_SIZE, cva += PAGE_SIZE) { pp = *pps++; KASSERT(pp); KASSERT(pp->pg_flags & PG_BUSY); /* Allow pmap_enter to fail. 
*/ if (pmap_enter(pmap_kernel(), cva, VM_PAGE_TO_PHYS(pp), prot, PMAP_WIRED | PMAP_CANFAIL | prot) != 0) { pmap_remove(pmap_kernel(), kva, cva); pmap_update(pmap_kernel()); uvm_pseg_release(kva); return 0; } } pmap_update(pmap_kernel()); return kva; } /* * uvm_pagermapout: remove KVA mapping * * We remove our mappings by hand and then remove the mapping. */ void uvm_pagermapout(vaddr_t kva, int npages) { #if defined(__HAVE_PMAP_DIRECT) /* * Use direct mappings for single page, unless there is a risk * of aliasing. */ if (npages == 1 && PMAP_PREFER_ALIGN() == 0) { pmap_unmap_direct(kva); return; } #endif pmap_remove(pmap_kernel(), kva, kva + ((vsize_t)npages << PAGE_SHIFT)); pmap_update(pmap_kernel()); uvm_pseg_release(kva); } /* * uvm_mk_pcluster * * generic "make 'pager put' cluster" function. a pager can either * [1] set pgo_mk_pcluster to NULL (never cluster), [2] set it to this * generic function, or [3] set it to a pager specific function. * * => caller must lock object _and_ pagequeues (since we need to look * at active vs. inactive bits, etc.) * => caller must make center page busy and write-protect it * => we mark all cluster pages busy for the caller * => the caller must unbusy all pages (and check wanted/released * status if it drops the object lock) * => flags: * PGO_ALLPAGES: all pages in object are valid targets * !PGO_ALLPAGES: use "lo" and "hi" to limit range of cluster * PGO_DOACTCLUST: include active pages in cluster. * PGO_FREE: set the PG_RELEASED bits on the cluster so they'll be freed * in async io (caller must clean on error). * NOTE: the caller should clear PG_CLEANCHK bits if PGO_DOACTCLUST. * PG_CLEANCHK is only a hint, but clearing will help reduce * the number of calls we make to the pmap layer. */ struct vm_page ** uvm_mk_pcluster(struct uvm_object *uobj, struct vm_page **pps, int *npages, struct vm_page *center, int flags, voff_t mlo, voff_t mhi) { struct vm_page **ppsp, *pclust; voff_t lo, hi, curoff; int center_idx, forward, incr; /* * center page should already be busy and write protected. XXX: * suppose page is wired? if we lock, then a process could * fault/block on it. if we don't lock, a process could write the * pages in the middle of an I/O. (consider an msync()). let's * lock it for now (better to delay than corrupt data?). */ /* get cluster boundaries, check sanity, and apply our limits as well.*/ uobj->pgops->pgo_cluster(uobj, center->offset, &lo, &hi); if ((flags & PGO_ALLPAGES) == 0) { if (lo < mlo) lo = mlo; if (hi > mhi) hi = mhi; } if ((hi - lo) >> PAGE_SHIFT > *npages) { /* pps too small, bail out! */ pps[0] = center; *npages = 1; return pps; } /* now determine the center and attempt to cluster around the edges */ center_idx = (center->offset - lo) >> PAGE_SHIFT; pps[center_idx] = center; /* plug in the center page */ ppsp = &pps[center_idx]; *npages = 1; /* * attempt to cluster around the left [backward], and then * the right side [forward]. * * note that for inactive pages (pages that have been deactivated) * there are no valid mappings and PG_CLEAN should be up to date. * [i.e. there is no need to query the pmap with pmap_is_modified * since there are no mappings]. */ for (forward = 0 ; forward <= 1 ; forward++) { incr = forward ? 
PAGE_SIZE : -PAGE_SIZE; curoff = center->offset + incr; for ( ;(forward == 0 && curoff >= lo) || (forward && curoff < hi); curoff += incr) { pclust = uvm_pagelookup(uobj, curoff); /* lookup page */ if (pclust == NULL) { break; /* no page */ } /* handle active pages */ /* NOTE: inactive pages don't have pmap mappings */ if ((pclust->pg_flags & PQ_INACTIVE) == 0) { if ((flags & PGO_DOACTCLUST) == 0) { /* dont want mapped pages at all */ break; } /* make sure "clean" bit is sync'd */ if ((pclust->pg_flags & PG_CLEANCHK) == 0) { if ((pclust->pg_flags & (PG_CLEAN|PG_BUSY)) == PG_CLEAN && pmap_is_modified(pclust)) atomic_clearbits_int( &pclust->pg_flags, PG_CLEAN); /* now checked */ atomic_setbits_int(&pclust->pg_flags, PG_CLEANCHK); } } /* is page available for cleaning and does it need it */ if ((pclust->pg_flags & (PG_CLEAN|PG_BUSY)) != 0) { break; /* page is already clean or is busy */ } /* yes! enroll the page in our array */ atomic_setbits_int(&pclust->pg_flags, PG_BUSY); UVM_PAGE_OWN(pclust, "uvm_mk_pcluster"); /* * If we want to free after io is done, and we're * async, set the released flag */ if ((flags & (PGO_FREE|PGO_SYNCIO)) == PGO_FREE) atomic_setbits_int(&pclust->pg_flags, PG_RELEASED); /* XXX: protect wired page? see above comment. */ pmap_page_protect(pclust, PROT_READ); if (!forward) { ppsp--; /* back up one page */ *ppsp = pclust; } else { /* move forward one page */ ppsp[*npages] = pclust; } (*npages)++; } } /* * done! return the cluster array to the caller!!! */ return ppsp; } /* * uvm_pager_put: high level pageout routine * * we want to pageout page "pg" to backing store, clustering if * possible. * * => page queues must be locked by caller * => if page is not swap-backed, then "uobj" points to the object * backing it. * => if page is swap-backed, then "uobj" should be NULL. * => "pg" should be PG_BUSY (by caller), and !PG_CLEAN * for swap-backed memory, "pg" can be NULL if there is no page * of interest [sometimes the case for the pagedaemon] * => "ppsp_ptr" should point to an array of npages vm_page pointers * for possible cluster building * => flags (first two for non-swap-backed pages) * PGO_ALLPAGES: all pages in uobj are valid targets * PGO_DOACTCLUST: include "PQ_ACTIVE" pages as valid targets * PGO_SYNCIO: do SYNC I/O (no async) * PGO_PDFREECLUST: pagedaemon: drop cluster on successful I/O * PGO_FREE: tell the aio daemon to free pages in the async case. * => start/stop: if (uobj && !PGO_ALLPAGES) limit targets to this range * if (!uobj) start is the (daddr_t) of the starting swapblk * => return state: * 1. we return the VM_PAGER status code of the pageout * 2. we return with the page queues unlocked * 3. on errors we always drop the cluster. thus, if we return * !PEND, !OK, then the caller only has to worry about * un-busying the main page (not the cluster pages). * 4. on success, if !PGO_PDFREECLUST, we return the cluster * with all pages busy (caller must un-busy and check * wanted/released flags). */ int uvm_pager_put(struct uvm_object *uobj, struct vm_page *pg, struct vm_page ***ppsp_ptr, int *npages, int flags, voff_t start, voff_t stop) { int result; daddr_t swblk; struct vm_page **ppsp = *ppsp_ptr; /* * note that uobj is null if we are doing a swap-backed pageout. * note that uobj is !null if we are doing normal object pageout. * note that the page queues must be locked to cluster. */ if (uobj) { /* if !swap-backed */ /* * attempt to build a cluster for pageout using its * make-put-cluster function (if it has one). 
*/ if (uobj->pgops->pgo_mk_pcluster) { ppsp = uobj->pgops->pgo_mk_pcluster(uobj, ppsp, npages, pg, flags, start, stop); *ppsp_ptr = ppsp; /* update caller's pointer */ } else { ppsp[0] = pg; *npages = 1; } swblk = 0; /* XXX: keep gcc happy */ } else { /* * for swap-backed pageout, the caller (the pagedaemon) has * already built the cluster for us. the starting swap * block we are writing to has been passed in as "start." * "pg" could be NULL if there is no page we are especially * interested in (in which case the whole cluster gets dropped * in the event of an error or a sync "done"). */ swblk = start; /* ppsp and npages should be ok */ } /* now that we've clustered we can unlock the page queues */ uvm_unlock_pageq(); /* * now attempt the I/O. if we have a failure and we are * clustered, we will drop the cluster and try again. */ ReTry: if (uobj) { result = uobj->pgops->pgo_put(uobj, ppsp, *npages, flags); } else { /* XXX daddr_t -> int */ result = uvm_swap_put(swblk, ppsp, *npages, flags); } /* * we have attempted the I/O. * * if the I/O was a success then: * if !PGO_PDFREECLUST, we return the cluster to the * caller (who must un-busy all pages) * else we un-busy cluster pages for the pagedaemon * * if I/O is pending (async i/o) then we return the pending code. * [in this case the async i/o done function must clean up when * i/o is done...] */ if (result == VM_PAGER_PEND || result == VM_PAGER_OK) { if (result == VM_PAGER_OK && (flags & PGO_PDFREECLUST)) { /* drop cluster */ if (*npages > 1 || pg == NULL) uvm_pager_dropcluster(uobj, pg, ppsp, npages, PGO_PDFREECLUST); } return (result); } /* * a pager error occurred (even after dropping the cluster, if there * was one). give up! the caller only has one page ("pg") * to worry about. */ if (*npages > 1 || pg == NULL) { uvm_pager_dropcluster(uobj, pg, ppsp, npages, PGO_REALLOCSWAP); /* * for failed swap-backed pageouts with a "pg", * we need to reset pg's swslot to either: * "swblk" (for transient errors, so we can retry), * or 0 (for hard errors). */ if (uobj == NULL && pg != NULL) { /* XXX daddr_t -> int */ int nswblk = (result == VM_PAGER_AGAIN) ? swblk : 0; if (pg->pg_flags & PQ_ANON) { rw_enter(pg->uanon->an_lock, RW_WRITE); pg->uanon->an_swslot = nswblk; rw_exit(pg->uanon->an_lock); } else { rw_enter(pg->uobject->vmobjlock, RW_WRITE); uao_set_swslot(pg->uobject, pg->offset >> PAGE_SHIFT, nswblk); rw_exit(pg->uobject->vmobjlock); } } if (result == VM_PAGER_AGAIN) { /* * for transient failures, free all the swslots that * we're not going to retry with. */ if (uobj == NULL) { if (pg) { /* XXX daddr_t -> int */ uvm_swap_free(swblk + 1, *npages - 1); } else { /* XXX daddr_t -> int */ uvm_swap_free(swblk, *npages); } } if (pg) { ppsp[0] = pg; *npages = 1; goto ReTry; } } else if (uobj == NULL) { /* * for hard errors on swap-backed pageouts, * mark the swslots as bad. note that we do not * free swslots that we mark bad. */ /* XXX daddr_t -> int */ uvm_swap_markbad(swblk, *npages); } } /* * a pager error occurred (even after dropping the cluster, if there * was one). give up! the caller only has one page ("pg") * to worry about. */ return result; } /* * uvm_pager_dropcluster: drop a cluster we have built (because we * got an error, or, if PGO_PDFREECLUST we are un-busying the * cluster pages on behalf of the pagedaemon). 
* * => uobj, if non-null, is a non-swap-backed object * => page queues are not locked * => pg is our page of interest (the one we clustered around, can be null) * => ppsp/npages is our current cluster * => flags: PGO_PDFREECLUST: pageout was a success: un-busy cluster * pages on behalf of the pagedaemon. * PGO_REALLOCSWAP: drop previously allocated swap slots for * clustered swap-backed pages (except for "pg" if !NULL) * "swblk" is the start of swap alloc (e.g. for ppsp[0]) * [only meaningful if swap-backed (uobj == NULL)] */ void uvm_pager_dropcluster(struct uvm_object *uobj, struct vm_page *pg, struct vm_page **ppsp, int *npages, int flags) { int lcv; KASSERT(uobj == NULL || rw_write_held(uobj->vmobjlock)); /* drop all pages but "pg" */ for (lcv = 0 ; lcv < *npages ; lcv++) { /* skip "pg" or empty slot */ if (ppsp[lcv] == pg || ppsp[lcv] == NULL) continue; /* * Note that PQ_ANON bit can't change as long as we are holding * the PG_BUSY bit (so there is no need to lock the page * queues to test it). */ if (!uobj) { if (ppsp[lcv]->pg_flags & PQ_ANON) { rw_enter(ppsp[lcv]->uanon->an_lock, RW_WRITE); if (flags & PGO_REALLOCSWAP) /* zap swap block */ ppsp[lcv]->uanon->an_swslot = 0; } else { rw_enter(ppsp[lcv]->uobject->vmobjlock, RW_WRITE); if (flags & PGO_REALLOCSWAP) uao_set_swslot(ppsp[lcv]->uobject, ppsp[lcv]->offset >> PAGE_SHIFT, 0); } } /* did someone want the page while we had it busy-locked? */ if (ppsp[lcv]->pg_flags & PG_WANTED) { wakeup(ppsp[lcv]); } /* if page was released, release it. otherwise un-busy it */ if (ppsp[lcv]->pg_flags & PG_RELEASED && ppsp[lcv]->pg_flags & PQ_ANON) { /* kills anon and frees pg */ uvm_anon_release(ppsp[lcv]->uanon); continue; } else { /* * if we were planning on async io then we would * have PG_RELEASED set, clear that with the others. */ atomic_clearbits_int(&ppsp[lcv]->pg_flags, PG_BUSY|PG_WANTED|PG_FAKE|PG_RELEASED); UVM_PAGE_OWN(ppsp[lcv], NULL); } /* * if we are operating on behalf of the pagedaemon and we * had a successful pageout update the page! */ if (flags & PGO_PDFREECLUST) { pmap_clear_reference(ppsp[lcv]); pmap_clear_modify(ppsp[lcv]); atomic_setbits_int(&ppsp[lcv]->pg_flags, PG_CLEAN); } /* if anonymous cluster, unlock object and move on */ if (!uobj) { if (ppsp[lcv]->pg_flags & PQ_ANON) rw_exit(ppsp[lcv]->uanon->an_lock); else rw_exit(ppsp[lcv]->uobject->vmobjlock); } } } /* * interrupt-context iodone handler for single-buf i/os * or the top-level buf of a nested-buf i/o. * * => must be at splbio(). */ void uvm_aio_biodone(struct buf *bp) { splassert(IPL_BIO); /* reset b_iodone for when this is a single-buf i/o. */ bp->b_iodone = uvm_aio_aiodone; mtx_enter(&uvm.aiodoned_lock); TAILQ_INSERT_TAIL(&uvm.aio_done, bp, b_freelist); wakeup(&uvm.aiodoned); mtx_leave(&uvm.aiodoned_lock); } void uvm_aio_aiodone_pages(struct vm_page **pgs, int npages, boolean_t write, int error) { struct vm_page *pg; struct uvm_object *uobj; boolean_t swap; int i; uobj = NULL; for (i = 0; i < npages; i++) { pg = pgs[i]; if (i == 0) { swap = (pg->pg_flags & PQ_SWAPBACKED) != 0; if (!swap) { uobj = pg->uobject; rw_enter(uobj->vmobjlock, RW_WRITE); } } KASSERT(swap || pg->uobject == uobj); /* * if this is a read and we got an error, mark the pages * PG_RELEASED so that uvm_page_unbusy() will free them. */ if (!write && error) { atomic_setbits_int(&pg->pg_flags, PG_RELEASED); continue; } KASSERT(!write || (pgs[i]->pg_flags & PG_FAKE) == 0); /* * if this is a read and the page is PG_FAKE, * or this was a successful write, * mark the page PG_CLEAN and not PG_FAKE. 
*/ if ((pgs[i]->pg_flags & PG_FAKE) || (write && error != ENOMEM)) { pmap_clear_reference(pgs[i]); pmap_clear_modify(pgs[i]); atomic_setbits_int(&pgs[i]->pg_flags, PG_CLEAN); atomic_clearbits_int(&pgs[i]->pg_flags, PG_FAKE); } } uvm_page_unbusy(pgs, npages); if (!swap) { rw_exit(uobj->vmobjlock); } } /* * uvm_aio_aiodone: do iodone processing for async i/os. * this should be called in thread context, not interrupt context. */ void uvm_aio_aiodone(struct buf *bp) { int npages = bp->b_bufsize >> PAGE_SHIFT; struct vm_page *pgs[MAXPHYS >> PAGE_SHIFT]; int i, error; boolean_t write; KASSERT(npages <= MAXPHYS >> PAGE_SHIFT); splassert(IPL_BIO); error = (bp->b_flags & B_ERROR) ? (bp->b_error ? bp->b_error : EIO) : 0; write = (bp->b_flags & B_READ) == 0; for (i = 0; i < npages; i++) pgs[i] = uvm_atopg((vaddr_t)bp->b_data + ((vsize_t)i << PAGE_SHIFT)); uvm_pagermapout((vaddr_t)bp->b_data, npages); #ifdef UVM_SWAP_ENCRYPT /* * XXX - assumes that we only get ASYNC writes. used to be above. */ if (pgs[0]->pg_flags & PQ_ENCRYPT) { uvm_swap_freepages(pgs, npages); goto freed; } #endif /* UVM_SWAP_ENCRYPT */ uvm_aio_aiodone_pages(pgs, npages, write, error); #ifdef UVM_SWAP_ENCRYPT freed: #endif pool_put(&bufpool, bp); }
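/*
 * Editorial sketch, not part of uvm_pager.c: the pager map above hands out
 * fixed-size KVA windows from a segment by scanning a per-segment "use"
 * bitmask, much as uvm_pseg_get()/uvm_pseg_release() do, minus locking,
 * on-demand segment creation and the pagedaemon reservation.  The constants
 * and struct sketch_pseg below are simplified stand-ins, not kernel names.
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_NSEGS	16		/* stand-in for MAX_PAGER_SEGS */
#define SKETCH_SEGSZ	(64 * 1024)	/* stand-in for MAXBSIZE */

struct sketch_pseg {
	uintptr_t	start;	/* base address of the window area */
	unsigned int	use;	/* bit i set => window i is handed out */
};

/* Grab a free window; returns 0 when the segment is full. */
static uintptr_t
sketch_pseg_get(struct sketch_pseg *pseg)
{
	int i;

	for (i = 0; i < SKETCH_NSEGS; i++) {
		if ((pseg->use & (1U << i)) == 0) {
			pseg->use |= 1U << i;
			return pseg->start + (uintptr_t)i * SKETCH_SEGSZ;
		}
	}
	return 0;
}

/* Return a window; the bit index is recovered from the address. */
static void
sketch_pseg_release(struct sketch_pseg *pseg, uintptr_t va)
{
	int id = (int)((va - pseg->start) / SKETCH_SEGSZ);

	assert(id >= 0 && id < SKETCH_NSEGS);
	assert(pseg->use & (1U << id));
	pseg->use &= ~(1U << id);
}

int
main(void)
{
	struct sketch_pseg pseg = { .start = 0x100000, .use = 0 };
	uintptr_t va = sketch_pseg_get(&pseg);

	printf("got window at %#lx\n", (unsigned long)va);
	sketch_pseg_release(&pseg, va);
	return 0;
}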
/*	$OpenBSD: uvm_amap.c,v 1.91 2022/08/01 14:15:46 mpi Exp $	*/
/*	$NetBSD: uvm_amap.c,v 1.27 2000/11/25 06:27:59 chs Exp $	*/

/*
 * Copyright (c) 1997 Charles D. Cranor and Washington University.
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
 * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
* IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ /* * uvm_amap.c: amap operations * * this file contains functions that perform operations on amaps. see * uvm_amap.h for a brief explanation of the role of amaps in uvm. */ #include <sys/param.h> #include <sys/systm.h> #include <sys/malloc.h> #include <sys/kernel.h> #include <sys/pool.h> #include <sys/atomic.h> #include <uvm/uvm.h> #include <uvm/uvm_swap.h> /* * pools for allocation of vm_amap structures. note that in order to * avoid an endless loop, the amap pool's allocator cannot allocate * memory from an amap (it currently goes through the kernel uobj, so * we are ok). */ struct pool uvm_amap_pool; struct pool uvm_small_amap_pool[UVM_AMAP_CHUNK]; struct pool uvm_amap_chunk_pool; LIST_HEAD(, vm_amap) amap_list; struct rwlock amap_list_lock = RWLOCK_INITIALIZER("amaplstlk"); #define amap_lock_list() rw_enter_write(&amap_list_lock) #define amap_unlock_list() rw_exit_write(&amap_list_lock) static char amap_small_pool_names[UVM_AMAP_CHUNK][9]; /* * local functions */ static struct vm_amap *amap_alloc1(int, int, int); static inline void amap_list_insert(struct vm_amap *); static inline void amap_list_remove(struct vm_amap *); struct vm_amap_chunk *amap_chunk_get(struct vm_amap *, int, int, int); void amap_chunk_free(struct vm_amap *, struct vm_amap_chunk *); /* * if we enable PPREF, then we have a couple of extra functions that * we need to prototype here... */ #ifdef UVM_AMAP_PPREF #define PPREF_NONE ((int *) -1) /* not using ppref */ void amap_pp_adjref(struct vm_amap *, int, vsize_t, int); void amap_pp_establish(struct vm_amap *); void amap_wiperange_chunk(struct vm_amap *, struct vm_amap_chunk *, int, int); void amap_wiperange(struct vm_amap *, int, int); #endif /* UVM_AMAP_PPREF */ static inline void amap_list_insert(struct vm_amap *amap) { amap_lock_list(); LIST_INSERT_HEAD(&amap_list, amap, am_list); amap_unlock_list(); } static inline void amap_list_remove(struct vm_amap *amap) { amap_lock_list(); LIST_REMOVE(amap, am_list); amap_unlock_list(); } /* * amap_chunk_get: lookup a chunk for slot. if create is non-zero, * the chunk is created if it does not yet exist. 
* * => returns the chunk on success or NULL on error */ struct vm_amap_chunk * amap_chunk_get(struct vm_amap *amap, int slot, int create, int waitf) { int bucket = UVM_AMAP_BUCKET(amap, slot); int baseslot = AMAP_BASE_SLOT(slot); int n; struct vm_amap_chunk *chunk, *newchunk, *pchunk = NULL; if (UVM_AMAP_SMALL(amap)) return &amap->am_small; for (chunk = amap->am_buckets[bucket]; chunk != NULL; chunk = TAILQ_NEXT(chunk, ac_list)) { if (UVM_AMAP_BUCKET(amap, chunk->ac_baseslot) != bucket) break; if (chunk->ac_baseslot == baseslot) return chunk; pchunk = chunk; } if (!create) return NULL; if (amap->am_nslot - baseslot >= UVM_AMAP_CHUNK) n = UVM_AMAP_CHUNK; else n = amap->am_nslot - baseslot; newchunk = pool_get(&uvm_amap_chunk_pool, waitf | PR_ZERO); if (newchunk == NULL) return NULL; if (pchunk == NULL) { TAILQ_INSERT_TAIL(&amap->am_chunks, newchunk, ac_list); KASSERT(amap->am_buckets[bucket] == NULL); amap->am_buckets[bucket] = newchunk; } else TAILQ_INSERT_AFTER(&amap->am_chunks, pchunk, newchunk, ac_list); amap->am_ncused++; newchunk->ac_baseslot = baseslot; newchunk->ac_nslot = n; return newchunk; } void amap_chunk_free(struct vm_amap *amap, struct vm_amap_chunk *chunk) { int bucket = UVM_AMAP_BUCKET(amap, chunk->ac_baseslot); struct vm_amap_chunk *nchunk; if (UVM_AMAP_SMALL(amap)) return; nchunk = TAILQ_NEXT(chunk, ac_list); TAILQ_REMOVE(&amap->am_chunks, chunk, ac_list); if (amap->am_buckets[bucket] == chunk) { if (nchunk != NULL && UVM_AMAP_BUCKET(amap, nchunk->ac_baseslot) == bucket) amap->am_buckets[bucket] = nchunk; else amap->am_buckets[bucket] = NULL; } pool_put(&uvm_amap_chunk_pool, chunk); amap->am_ncused--; } #ifdef UVM_AMAP_PPREF /* * what is ppref? ppref is an _optional_ amap feature which is used * to keep track of reference counts on a per-page basis. it is enabled * when UVM_AMAP_PPREF is defined. * * when enabled, an array of ints is allocated for the pprefs. this * array is allocated only when a partial reference is added to the * map (either by unmapping part of the amap, or gaining a reference * to only a part of an amap). if the allocation of the array fails * (M_NOWAIT), then we set the array pointer to PPREF_NONE to indicate * that we tried to do ppref's but couldn't alloc the array so just * give up (after all, this is an optional feature!). * * the array is divided into page sized "chunks." for chunks of length 1, * the chunk reference count plus one is stored in that chunk's slot. * for chunks of length > 1 the first slot contains (the reference count * plus one) * -1. [the negative value indicates that the length is * greater than one.] the second slot of the chunk contains the length * of the chunk. here is an example: * * actual REFS: 2 2 2 2 3 1 1 0 0 0 4 4 0 1 1 1 * ppref: -3 4 x x 4 -2 2 -1 3 x -5 2 1 -2 3 x * <----------><-><----><-------><----><-><-------> * (x = don't care) * * this allows us to allow one int to contain the ref count for the whole * chunk. note that the "plus one" part is needed because a reference * count of zero is neither positive or negative (need a way to tell * if we've got one zero or a bunch of them). * * here are some in-line functions to help us. 
*/ /* * pp_getreflen: get the reference and length for a specific offset * * => ppref's amap must be locked */ static inline void pp_getreflen(int *ppref, int offset, int *refp, int *lenp) { if (ppref[offset] > 0) { /* chunk size must be 1 */ *refp = ppref[offset] - 1; /* don't forget to adjust */ *lenp = 1; } else { *refp = (ppref[offset] * -1) - 1; *lenp = ppref[offset+1]; } } /* * pp_setreflen: set the reference and length for a specific offset * * => ppref's amap must be locked */ static inline void pp_setreflen(int *ppref, int offset, int ref, int len) { if (len == 1) { ppref[offset] = ref + 1; } else { ppref[offset] = (ref + 1) * -1; ppref[offset+1] = len; } } #endif /* UVM_AMAP_PPREF */ /* * amap_init: called at boot time to init global amap data structures */ void amap_init(void) { int i; size_t size; /* Initialize the vm_amap pool. */ pool_init(&uvm_amap_pool, sizeof(struct vm_amap), 0, IPL_MPFLOOR, PR_WAITOK, "amappl", NULL); pool_sethiwat(&uvm_amap_pool, 4096); /* initialize small amap pools */ for (i = 0; i < nitems(uvm_small_amap_pool); i++) { snprintf(amap_small_pool_names[i], sizeof(amap_small_pool_names[0]), "amappl%d", i + 1); size = offsetof(struct vm_amap, am_small.ac_anon) + (i + 1) * sizeof(struct vm_anon *); pool_init(&uvm_small_amap_pool[i], size, 0, IPL_MPFLOOR, PR_WAITOK, amap_small_pool_names[i], NULL); } pool_init(&uvm_amap_chunk_pool, sizeof(struct vm_amap_chunk) + UVM_AMAP_CHUNK * sizeof(struct vm_anon *), 0, IPL_MPFLOOR, PR_WAITOK, "amapchunkpl", NULL); pool_sethiwat(&uvm_amap_chunk_pool, 4096); } /* * amap_alloc1: allocate an amap, but do not initialise the overlay. * * => Note: lock is not set. */ static inline struct vm_amap * amap_alloc1(int slots, int waitf, int lazyalloc) { struct vm_amap *amap; struct vm_amap_chunk *chunk, *tmp; int chunks, log_chunks, chunkperbucket = 1, hashshift = 0; int buckets, i, n; int pwaitf = (waitf & M_WAITOK) ? PR_WAITOK : PR_NOWAIT; KASSERT(slots > 0); /* * Cast to unsigned so that rounding up cannot cause integer overflow * if slots is large. */ chunks = roundup((unsigned int)slots, UVM_AMAP_CHUNK) / UVM_AMAP_CHUNK; if (lazyalloc) { /* * Basically, the amap is a hash map where the number of * buckets is fixed. We select the number of buckets using the * following strategy: * * 1. The maximal number of entries to search in a bucket upon * a collision should be less than or equal to * log2(slots / UVM_AMAP_CHUNK). This is the worst-case number * of lookups we would have if we could chunk the amap. The * log2(n) comes from the fact that amaps are chunked by * splitting up their vm_map_entries and organizing those * in a binary search tree. * * 2. The maximal number of entries in a bucket must be a * power of two. * * The maximal number of entries per bucket is used to hash * a slot to a bucket. * * In the future, this strategy could be refined to make it * even harder/impossible that the total amount of KVA needed * for the hash buckets of all amaps to exceed the maximal * amount of KVA memory reserved for amaps. 
*/ for (log_chunks = 1; (chunks >> log_chunks) > 0; log_chunks++) continue; chunkperbucket = 1 << hashshift; while (chunkperbucket + 1 < log_chunks) { hashshift++; chunkperbucket = 1 << hashshift; } } if (slots > UVM_AMAP_CHUNK) amap = pool_get(&uvm_amap_pool, pwaitf); else amap = pool_get(&uvm_small_amap_pool[slots - 1], pwaitf | PR_ZERO); if (amap == NULL) return NULL; amap->am_lock = NULL; amap->am_ref = 1; amap->am_flags = 0; #ifdef UVM_AMAP_PPREF amap->am_ppref = NULL; #endif amap->am_nslot = slots; amap->am_nused = 0; if (UVM_AMAP_SMALL(amap)) { amap->am_small.ac_nslot = slots; return amap; } amap->am_ncused = 0; TAILQ_INIT(&amap->am_chunks); amap->am_hashshift = hashshift; amap->am_buckets = NULL; buckets = howmany(chunks, chunkperbucket); amap->am_buckets = mallocarray(buckets, sizeof(*amap->am_buckets), M_UVMAMAP, waitf | (lazyalloc ? M_ZERO : 0)); if (amap->am_buckets == NULL) goto fail1; amap->am_nbuckets = buckets; if (!lazyalloc) { for (i = 0; i < buckets; i++) { if (i == buckets - 1) { n = slots % UVM_AMAP_CHUNK; if (n == 0) n = UVM_AMAP_CHUNK; } else n = UVM_AMAP_CHUNK; chunk = pool_get(&uvm_amap_chunk_pool, PR_ZERO | pwaitf); if (chunk == NULL) goto fail1; amap->am_buckets[i] = chunk; amap->am_ncused++; chunk->ac_baseslot = i * UVM_AMAP_CHUNK; chunk->ac_nslot = n; TAILQ_INSERT_TAIL(&amap->am_chunks, chunk, ac_list); } } return amap; fail1: free(amap->am_buckets, M_UVMAMAP, buckets * sizeof(*amap->am_buckets)); TAILQ_FOREACH_SAFE(chunk, &amap->am_chunks, ac_list, tmp) pool_put(&uvm_amap_chunk_pool, chunk); pool_put(&uvm_amap_pool, amap); return NULL; } static void amap_lock_alloc(struct vm_amap *amap) { rw_obj_alloc(&amap->am_lock, "amaplk"); } /* * amap_alloc: allocate an amap to manage "sz" bytes of anonymous VM * * => caller should ensure sz is a multiple of PAGE_SIZE * => reference count to new amap is set to one * => new amap is returned unlocked */ struct vm_amap * amap_alloc(vaddr_t sz, int waitf, int lazyalloc) { struct vm_amap *amap; size_t slots; AMAP_B2SLOT(slots, sz); /* load slots */ if (slots > INT_MAX) return NULL; amap = amap_alloc1(slots, waitf, lazyalloc); if (amap != NULL) { amap_lock_alloc(amap); amap_list_insert(amap); } return amap; } /* * amap_free: free an amap * * => the amap must be unlocked * => the amap should have a zero reference count and be empty */ void amap_free(struct vm_amap *amap) { struct vm_amap_chunk *chunk, *tmp; KASSERT(amap->am_ref == 0 && amap->am_nused == 0); KASSERT((amap->am_flags & AMAP_SWAPOFF) == 0); if (amap->am_lock != NULL) { KASSERT(amap->am_lock == NULL || !rw_write_held(amap->am_lock)); rw_obj_free(amap->am_lock); } #ifdef UVM_AMAP_PPREF if (amap->am_ppref && amap->am_ppref != PPREF_NONE) free(amap->am_ppref, M_UVMAMAP, amap->am_nslot * sizeof(int)); #endif if (UVM_AMAP_SMALL(amap)) pool_put(&uvm_small_amap_pool[amap->am_nslot - 1], amap); else { TAILQ_FOREACH_SAFE(chunk, &amap->am_chunks, ac_list, tmp) pool_put(&uvm_amap_chunk_pool, chunk); free(amap->am_buckets, M_UVMAMAP, amap->am_nbuckets * sizeof(*amap->am_buckets)); pool_put(&uvm_amap_pool, amap); } } /* * amap_wipeout: wipeout all anon's in an amap; then free the amap! * * => Called from amap_unref(), when reference count drops to zero. * => amap must be locked. */ void amap_wipeout(struct vm_amap *amap) { int slot; struct vm_anon *anon; struct vm_amap_chunk *chunk; struct pglist pgl; KASSERT(rw_write_held(amap->am_lock)); KASSERT(amap->am_ref == 0); if (__predict_false((amap->am_flags & AMAP_SWAPOFF) != 0)) { /* * Note: amap_swap_off() will call us again. 
*/ amap_unlock(amap); return; } TAILQ_INIT(&pgl); amap_list_remove(amap); AMAP_CHUNK_FOREACH(chunk, amap) { int i, refs, map = chunk->ac_usedmap; for (i = ffs(map); i != 0; i = ffs(map)) { slot = i - 1; map ^= 1 << slot; anon = chunk->ac_anon[slot]; if (anon == NULL || anon->an_ref == 0) panic("amap_wipeout: corrupt amap"); KASSERT(anon->an_lock == amap->am_lock); /* * Drop the reference. */ refs = --anon->an_ref; if (refs == 0) { uvm_anfree_list(anon, &pgl); } } } /* free the pages */ uvm_pglistfree(&pgl); /* * Finally, destroy the amap. */ amap->am_ref = 0; /* ... was one */ amap->am_nused = 0; amap_unlock(amap); amap_free(amap); } /* * amap_copy: ensure that a map entry's "needs_copy" flag is false * by copying the amap if necessary. * * => an entry with a null amap pointer will get a new (blank) one. * => the map that the map entry belongs to must be locked by caller. * => the amap currently attached to "entry" (if any) must be unlocked. * => if canchunk is true, then we may clip the entry into a chunk * => "startva" and "endva" are used only if canchunk is true. they are * used to limit chunking (e.g. if you have a large space that you * know you are going to need to allocate amaps for, there is no point * in allowing that to be chunked) */ void amap_copy(struct vm_map *map, struct vm_map_entry *entry, int waitf, boolean_t canchunk, vaddr_t startva, vaddr_t endva) { struct vm_amap *amap, *srcamap; int slots, lcv, lazyalloc = 0; vaddr_t chunksize; int i, j, k, n, srcslot; struct vm_amap_chunk *chunk = NULL, *srcchunk = NULL; struct vm_anon *anon; KASSERT(map != kernel_map); /* we use sleeping locks */ /* * Is there an amap to copy? If not, create one. */ if (entry->aref.ar_amap == NULL) { /* * Check to see if we have a large amap that we can * chunk. We align startva/endva to chunk-sized * boundaries and then clip to them. * * If we cannot chunk the amap, allocate it in a way * that makes it grow or shrink dynamically with * the number of slots. */ if (atop(entry->end - entry->start) >= UVM_AMAP_LARGE) { if (canchunk) { /* convert slots to bytes */ chunksize = UVM_AMAP_CHUNK << PAGE_SHIFT; startva = (startva / chunksize) * chunksize; endva = roundup(endva, chunksize); UVM_MAP_CLIP_START(map, entry, startva); /* watch out for endva wrap-around! */ if (endva >= startva) UVM_MAP_CLIP_END(map, entry, endva); } else lazyalloc = 1; } entry->aref.ar_pageoff = 0; entry->aref.ar_amap = amap_alloc(entry->end - entry->start, waitf, lazyalloc); if (entry->aref.ar_amap != NULL) entry->etype &= ~UVM_ET_NEEDSCOPY; return; } /* * First check and see if we are the only map entry referencing * he amap we currently have. If so, then just take it over instead * of copying it. Note that we are reading am_ref without lock held * as the value value can only be one if we have the only reference * to the amap (via our locked map). If the value is greater than * one, then allocate amap and re-check the value. */ if (entry->aref.ar_amap->am_ref == 1) { entry->etype &= ~UVM_ET_NEEDSCOPY; return; } /* * Allocate a new amap (note: not initialised, etc). */ AMAP_B2SLOT(slots, entry->end - entry->start); if (!UVM_AMAP_SMALL(entry->aref.ar_amap) && entry->aref.ar_amap->am_hashshift != 0) lazyalloc = 1; amap = amap_alloc1(slots, waitf, lazyalloc); if (amap == NULL) return; srcamap = entry->aref.ar_amap; /* * Make the new amap share the source amap's lock, and then lock * both. */ amap->am_lock = srcamap->am_lock; rw_obj_hold(amap->am_lock); amap_lock(srcamap); /* * Re-check the reference count with the lock held. 
If it has * dropped to one - we can take over the existing map. */ if (srcamap->am_ref == 1) { /* Just take over the existing amap. */ entry->etype &= ~UVM_ET_NEEDSCOPY; amap_unlock(srcamap); /* Destroy the new (unused) amap. */ amap->am_ref--; amap_free(amap); return; } /* * Copy the slots. */ for (lcv = 0; lcv < slots; lcv += n) { srcslot = entry->aref.ar_pageoff + lcv; i = UVM_AMAP_SLOTIDX(lcv); j = UVM_AMAP_SLOTIDX(srcslot); n = UVM_AMAP_CHUNK; if (i > j) n -= i; else n -= j; if (lcv + n > slots) n = slots - lcv; srcchunk = amap_chunk_get(srcamap, srcslot, 0, PR_NOWAIT); if (srcchunk == NULL) continue; chunk = amap_chunk_get(amap, lcv, 1, PR_NOWAIT); if (chunk == NULL) { /* amap_wipeout() releases the lock. */ amap->am_ref = 0; amap_wipeout(amap); return; } for (k = 0; k < n; i++, j++, k++) { chunk->ac_anon[i] = anon = srcchunk->ac_anon[j]; if (anon == NULL) continue; KASSERT(anon->an_lock == srcamap->am_lock); KASSERT(anon->an_ref > 0); chunk->ac_usedmap |= (1 << i); anon->an_ref++; amap->am_nused++; } } /* * Drop our reference to the old amap (srcamap) and unlock. * Since the reference count on srcamap is greater than one, * (we checked above), it cannot drop to zero while it is locked. */ srcamap->am_ref--; KASSERT(srcamap->am_ref > 0); if (srcamap->am_ref == 1 && (srcamap->am_flags & AMAP_SHARED) != 0) srcamap->am_flags &= ~AMAP_SHARED; /* clear shared flag */ #ifdef UVM_AMAP_PPREF if (srcamap->am_ppref && srcamap->am_ppref != PPREF_NONE) { amap_pp_adjref(srcamap, entry->aref.ar_pageoff, (entry->end - entry->start) >> PAGE_SHIFT, -1); } #endif /* * If we referenced any anons, then share the source amap's lock. * Otherwise, we have nothing in common, so allocate a new one. */ KASSERT(amap->am_lock == srcamap->am_lock); if (amap->am_nused == 0) { rw_obj_free(amap->am_lock); amap->am_lock = NULL; } amap_unlock(srcamap); if (amap->am_lock == NULL) amap_lock_alloc(amap); /* * Install new amap. */ entry->aref.ar_pageoff = 0; entry->aref.ar_amap = amap; entry->etype &= ~UVM_ET_NEEDSCOPY; amap_list_insert(amap); } /* * amap_cow_now: resolve all copy-on-write faults in an amap now for fork(2) * * called during fork(2) when the parent process has a wired map * entry. in that case we want to avoid write-protecting pages * in the parent's map (e.g. like what you'd do for a COW page) * so we resolve the COW here. * * => assume parent's entry was wired, thus all pages are resident. * => the parent and child vm_map must both be locked. * => caller passes child's map/entry in to us * => XXXCDC: out of memory should cause fork to fail, but there is * currently no easy way to do this (needs fix) */ void amap_cow_now(struct vm_map *map, struct vm_map_entry *entry) { struct vm_amap *amap = entry->aref.ar_amap; int slot; struct vm_anon *anon, *nanon; struct vm_page *pg, *npg; struct vm_amap_chunk *chunk; /* * note that if we unlock the amap then we must ReStart the "lcv" for * loop because some other process could reorder the anon's in the * am_anon[] array on us while the lock is dropped. */ ReStart: amap_lock(amap); AMAP_CHUNK_FOREACH(chunk, amap) { int i, map = chunk->ac_usedmap; for (i = ffs(map); i != 0; i = ffs(map)) { slot = i - 1; map ^= 1 << slot; anon = chunk->ac_anon[slot]; pg = anon->an_page; KASSERT(anon->an_lock == amap->am_lock); /* * The old page must be resident since the parent is * wired. */ KASSERT(pg != NULL); /* * if the anon ref count is one, we are safe (the child * has exclusive access to the page). 
*/ if (anon->an_ref <= 1) continue; /* * If the page is busy, then we have to unlock, wait for * it and then restart. */ if (pg->pg_flags & PG_BUSY) { uvm_pagewait(pg, amap->am_lock, "cownow"); goto ReStart; } /* * Perform a copy-on-write. * First - get a new anon and a page. */ nanon = uvm_analloc(); if (nanon != NULL) { /* the new anon will share the amap's lock */ nanon->an_lock = amap->am_lock; npg = uvm_pagealloc(NULL, 0, nanon, 0); } else npg = NULL; /* XXX: quiet gcc warning */ if (nanon == NULL || npg == NULL) { /* out of memory */ amap_unlock(amap); if (nanon != NULL) { nanon->an_lock = NULL; nanon->an_ref--; KASSERT(nanon->an_ref == 0); uvm_anfree(nanon); } uvm_wait("cownowpage"); goto ReStart; } /* * Copy the data and replace anon with the new one. * Also, setup its lock (share the with amap's lock). */ uvm_pagecopy(pg, npg); anon->an_ref--; KASSERT(anon->an_ref > 0); chunk->ac_anon[slot] = nanon; /* * Drop PG_BUSY on new page. Since its owner was write * locked all this time - it cannot be PG_RELEASED or * PG_WANTED. */ atomic_clearbits_int(&npg->pg_flags, PG_BUSY|PG_FAKE); UVM_PAGE_OWN(npg, NULL); uvm_lock_pageq(); uvm_pageactivate(npg); uvm_unlock_pageq(); } } amap_unlock(amap); } /* * amap_splitref: split a single reference into two separate references * * => called from uvm_map's clip routines * => origref's map should be locked * => origref->ar_amap should be unlocked (we will lock) */ void amap_splitref(struct vm_aref *origref, struct vm_aref *splitref, vaddr_t offset) { struct vm_amap *amap = origref->ar_amap; int leftslots; KASSERT(splitref->ar_amap == amap); AMAP_B2SLOT(leftslots, offset); if (leftslots == 0) panic("amap_splitref: split at zero offset"); amap_lock(amap); if (amap->am_nslot - origref->ar_pageoff - leftslots <= 0) panic("amap_splitref: map size check failed"); #ifdef UVM_AMAP_PPREF /* Establish ppref before we add a duplicate reference to the amap. */ if (amap->am_ppref == NULL) amap_pp_establish(amap); #endif /* Note: not a share reference. */ amap->am_ref++; splitref->ar_amap = amap; splitref->ar_pageoff = origref->ar_pageoff + leftslots; amap_unlock(amap); } #ifdef UVM_AMAP_PPREF /* * amap_pp_establish: add a ppref array to an amap, if possible. * * => amap should be locked by caller* => amap should be locked by caller */ void amap_pp_establish(struct vm_amap *amap) { KASSERT(rw_write_held(amap->am_lock)); amap->am_ppref = mallocarray(amap->am_nslot, sizeof(int), M_UVMAMAP, M_NOWAIT|M_ZERO); if (amap->am_ppref == NULL) { /* Failure - just do not use ppref. */ amap->am_ppref = PPREF_NONE; return; } pp_setreflen(amap->am_ppref, 0, amap->am_ref, amap->am_nslot); } /* * amap_pp_adjref: adjust reference count to a part of an amap using the * per-page reference count array. * * => caller must check that ppref != PPREF_NONE before calling. * => map and amap must be locked. */ void amap_pp_adjref(struct vm_amap *amap, int curslot, vsize_t slotlen, int adjval) { int stopslot, *ppref, lcv, prevlcv; int ref, len, prevref, prevlen; KASSERT(rw_write_held(amap->am_lock)); stopslot = curslot + slotlen; ppref = amap->am_ppref; prevlcv = 0; /* * Advance to the correct place in the array, fragment if needed. */ for (lcv = 0 ; lcv < curslot ; lcv += len) { pp_getreflen(ppref, lcv, &ref, &len); if (lcv + len > curslot) { /* goes past start? 
*/ pp_setreflen(ppref, lcv, ref, curslot - lcv); pp_setreflen(ppref, curslot, ref, len - (curslot -lcv)); len = curslot - lcv; /* new length of entry @ lcv */ } prevlcv = lcv; } if (lcv != 0) pp_getreflen(ppref, prevlcv, &prevref, &prevlen); else { /* * Ensure that the "prevref == ref" test below always * fails, since we are starting from the beginning of * the ppref array; that is, there is no previous chunk. */ prevref = -1; prevlen = 0; } /* * Now adjust reference counts in range. Merge the first * changed entry with the last unchanged entry if possible. */ if (lcv != curslot) panic("amap_pp_adjref: overshot target"); for (/* lcv already set */; lcv < stopslot ; lcv += len) { pp_getreflen(ppref, lcv, &ref, &len); if (lcv + len > stopslot) { /* goes past end? */ pp_setreflen(ppref, lcv, ref, stopslot - lcv); pp_setreflen(ppref, stopslot, ref, len - (stopslot - lcv)); len = stopslot - lcv; } ref += adjval; if (ref < 0) panic("amap_pp_adjref: negative reference count"); if (lcv == prevlcv + prevlen && ref == prevref) { pp_setreflen(ppref, prevlcv, ref, prevlen + len); } else { pp_setreflen(ppref, lcv, ref, len); } if (ref == 0) amap_wiperange(amap, lcv, len); } } void amap_wiperange_chunk(struct vm_amap *amap, struct vm_amap_chunk *chunk, int slotoff, int slots) { int curslot, i, map; int startbase, endbase; struct vm_anon *anon; startbase = AMAP_BASE_SLOT(slotoff); endbase = AMAP_BASE_SLOT(slotoff + slots - 1); map = chunk->ac_usedmap; if (startbase == chunk->ac_baseslot) map &= ~((1 << (slotoff - startbase)) - 1); if (endbase == chunk->ac_baseslot) map &= (1 << (slotoff + slots - endbase)) - 1; for (i = ffs(map); i != 0; i = ffs(map)) { int refs; curslot = i - 1; map ^= 1 << curslot; chunk->ac_usedmap ^= 1 << curslot; anon = chunk->ac_anon[curslot]; KASSERT(anon->an_lock == amap->am_lock); /* remove it from the amap */ chunk->ac_anon[curslot] = NULL; amap->am_nused--; /* drop anon reference count */ refs = --anon->an_ref; if (refs == 0) { uvm_anfree(anon); } /* * done with this anon, next ...! */ } /* end of 'for' loop */ } /* * amap_wiperange: wipe out a range of an amap. * Note: different from amap_wipeout because the amap is kept intact. * * => Both map and amap must be locked by caller. */ void amap_wiperange(struct vm_amap *amap, int slotoff, int slots) { int bucket, startbucket, endbucket; struct vm_amap_chunk *chunk, *nchunk; KASSERT(rw_write_held(amap->am_lock)); startbucket = UVM_AMAP_BUCKET(amap, slotoff); endbucket = UVM_AMAP_BUCKET(amap, slotoff + slots - 1); /* * We can either traverse the amap by am_chunks or by am_buckets. * Determine which way is less expensive. 
*/ if (UVM_AMAP_SMALL(amap)) amap_wiperange_chunk(amap, &amap->am_small, slotoff, slots); else if (endbucket + 1 - startbucket >= amap->am_ncused) { TAILQ_FOREACH_SAFE(chunk, &amap->am_chunks, ac_list, nchunk) { if (chunk->ac_baseslot + chunk->ac_nslot <= slotoff) continue; if (chunk->ac_baseslot >= slotoff + slots) continue; amap_wiperange_chunk(amap, chunk, slotoff, slots); if (chunk->ac_usedmap == 0) amap_chunk_free(amap, chunk); } } else { for (bucket = startbucket; bucket <= endbucket; bucket++) { for (chunk = amap->am_buckets[bucket]; chunk != NULL; chunk = nchunk) { nchunk = TAILQ_NEXT(chunk, ac_list); if (UVM_AMAP_BUCKET(amap, chunk->ac_baseslot) != bucket) break; if (chunk->ac_baseslot + chunk->ac_nslot <= slotoff) continue; if (chunk->ac_baseslot >= slotoff + slots) continue; amap_wiperange_chunk(amap, chunk, slotoff, slots); if (chunk->ac_usedmap == 0) amap_chunk_free(amap, chunk); } } } } #endif /* * amap_swap_off: pagein anonymous pages in amaps and drop swap slots. * * => note that we don't always traverse all anons. * eg. amaps being wiped out, released anons. * => return TRUE if failed. */ boolean_t amap_swap_off(int startslot, int endslot) { struct vm_amap *am; struct vm_amap *am_next; struct vm_amap marker; boolean_t rv = FALSE; amap_lock_list(); for (am = LIST_FIRST(&amap_list); am != NULL && !rv; am = am_next) { int i, map; struct vm_amap_chunk *chunk; amap_lock(am); if (am->am_nused == 0) { amap_unlock(am); am_next = LIST_NEXT(am, am_list); continue; } LIST_INSERT_AFTER(am, &marker, am_list); amap_unlock_list(); again: AMAP_CHUNK_FOREACH(chunk, am) { map = chunk->ac_usedmap; for (i = ffs(map); i != 0; i = ffs(map)) { int swslot; int slot = i - 1; struct vm_anon *anon; map ^= 1 << slot; anon = chunk->ac_anon[slot]; swslot = anon->an_swslot; if (swslot < startslot || endslot <= swslot) { continue; } am->am_flags |= AMAP_SWAPOFF; rv = uvm_anon_pagein(am, anon); amap_lock(am); am->am_flags &= ~AMAP_SWAPOFF; if (amap_refs(am) == 0) { amap_wipeout(am); am = NULL; goto nextamap; } if (rv) goto nextamap; goto again; } } nextamap: if (am != NULL) amap_unlock(am); amap_lock_list(); am_next = LIST_NEXT(&marker, am_list); LIST_REMOVE(&marker, am_list); } amap_unlock_list(); return rv; } /* * amap_lookup: look up a page in an amap. * * => amap should be locked by caller. */ struct vm_anon * amap_lookup(struct vm_aref *aref, vaddr_t offset) { int slot; struct vm_amap *amap = aref->ar_amap; struct vm_amap_chunk *chunk; AMAP_B2SLOT(slot, offset); slot += aref->ar_pageoff; KASSERT(slot < amap->am_nslot); chunk = amap_chunk_get(amap, slot, 0, PR_NOWAIT); if (chunk == NULL) return NULL; return chunk->ac_anon[UVM_AMAP_SLOTIDX(slot)]; } /* * amap_lookups: look up a range of pages in an amap. * * => amap should be locked by caller. * => XXXCDC: this interface is biased toward array-based amaps. fix. 
*/ void amap_lookups(struct vm_aref *aref, vaddr_t offset, struct vm_anon **anons, int npages) { int i, lcv, n, slot; struct vm_amap *amap = aref->ar_amap; struct vm_amap_chunk *chunk = NULL; AMAP_B2SLOT(slot, offset); slot += aref->ar_pageoff; KASSERT((slot + (npages - 1)) < amap->am_nslot); for (i = 0, lcv = slot; lcv < slot + npages; i += n, lcv += n) { n = UVM_AMAP_CHUNK - UVM_AMAP_SLOTIDX(lcv); if (lcv + n > slot + npages) n = slot + npages - lcv; chunk = amap_chunk_get(amap, lcv, 0, PR_NOWAIT); if (chunk == NULL) memset(&anons[i], 0, n * sizeof(*anons)); else memcpy(&anons[i], &chunk->ac_anon[UVM_AMAP_SLOTIDX(lcv)], n * sizeof(*anons)); } } /* * amap_populate: ensure that the amap can store an anon for the page at * offset. This function can sleep until memory to store the anon is * available. */ void amap_populate(struct vm_aref *aref, vaddr_t offset) { int slot; struct vm_amap *amap = aref->ar_amap; struct vm_amap_chunk *chunk; AMAP_B2SLOT(slot, offset); slot += aref->ar_pageoff; KASSERT(slot < amap->am_nslot); chunk = amap_chunk_get(amap, slot, 1, PR_WAITOK); KASSERT(chunk != NULL); } /* * amap_add: add (or replace) a page to an amap. * * => amap should be locked by caller. * => anon must have the lock associated with this amap. */ int amap_add(struct vm_aref *aref, vaddr_t offset, struct vm_anon *anon, boolean_t replace) { int slot; struct vm_amap *amap = aref->ar_amap; struct vm_amap_chunk *chunk; AMAP_B2SLOT(slot, offset); slot += aref->ar_pageoff; KASSERT(slot < amap->am_nslot); chunk = amap_chunk_get(amap, slot, 1, PR_NOWAIT); if (chunk == NULL) return 1; slot = UVM_AMAP_SLOTIDX(slot); if (replace) { struct vm_anon *oanon = chunk->ac_anon[slot]; KASSERT(oanon != NULL); if (oanon->an_page && (amap->am_flags & AMAP_SHARED) != 0) { pmap_page_protect(oanon->an_page, PROT_NONE); /* * XXX: suppose page is supposed to be wired somewhere? */ } } else { /* !replace */ if (chunk->ac_anon[slot] != NULL) panic("amap_add: slot in use"); chunk->ac_usedmap |= 1 << slot; amap->am_nused++; } chunk->ac_anon[slot] = anon; return 0; } /* * amap_unadd: remove a page from an amap. * * => amap should be locked by caller. */ void amap_unadd(struct vm_aref *aref, vaddr_t offset) { struct vm_amap *amap = aref->ar_amap; struct vm_amap_chunk *chunk; int slot; KASSERT(rw_write_held(amap->am_lock)); AMAP_B2SLOT(slot, offset); slot += aref->ar_pageoff; KASSERT(slot < amap->am_nslot); chunk = amap_chunk_get(amap, slot, 0, PR_NOWAIT); KASSERT(chunk != NULL); slot = UVM_AMAP_SLOTIDX(slot); KASSERT(chunk->ac_anon[slot] != NULL); chunk->ac_anon[slot] = NULL; chunk->ac_usedmap &= ~(1 << slot); amap->am_nused--; if (chunk->ac_usedmap == 0) amap_chunk_free(amap, chunk); } /* * amap_adjref_anons: adjust the reference count(s) on amap and its anons. */ static void amap_adjref_anons(struct vm_amap *amap, vaddr_t offset, vsize_t len, int refv, boolean_t all) { #ifdef UVM_AMAP_PPREF KASSERT(rw_write_held(amap->am_lock)); /* * We must establish the ppref array before changing am_ref * so that the ppref values match the current amap refcount. */ if (amap->am_ppref == NULL && !all && len != amap->am_nslot) { amap_pp_establish(amap); } #endif amap->am_ref += refv; #ifdef UVM_AMAP_PPREF if (amap->am_ppref && amap->am_ppref != PPREF_NONE) { if (all) { amap_pp_adjref(amap, 0, amap->am_nslot, refv); } else { amap_pp_adjref(amap, offset, len, refv); } } #endif amap_unlock(amap); } /* * amap_ref: gain a reference to an amap. * * => amap must not be locked (we will lock). * => "offset" and "len" are in units of pages. 
* => Called at fork time to gain the child's reference. */ void amap_ref(struct vm_amap *amap, vaddr_t offset, vsize_t len, int flags) { amap_lock(amap); if (flags & AMAP_SHARED) amap->am_flags |= AMAP_SHARED; amap_adjref_anons(amap, offset, len, 1, (flags & AMAP_REFALL) != 0); } /* * amap_unref: remove a reference to an amap. * * => All pmap-level references to this amap must be already removed. * => Called from uvm_unmap_detach(); entry is already removed from the map. * => We will lock amap, so it must be unlocked. */ void amap_unref(struct vm_amap *amap, vaddr_t offset, vsize_t len, boolean_t all) { amap_lock(amap); KASSERT(amap->am_ref > 0); if (amap->am_ref == 1) { /* * If the last reference - wipeout and destroy the amap. */ amap->am_ref--; amap_wipeout(amap); return; } /* * Otherwise, drop the reference count(s) on anons. */ if (amap->am_ref == 2 && (amap->am_flags & AMAP_SHARED) != 0) { amap->am_flags &= ~AMAP_SHARED; } amap_adjref_anons(amap, offset, len, -1, all); }
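/*
 * Illustrative sketch (not part of the OpenBSD sources above): chunk
 * routines such as amap_wiperange_chunk() and amap_swap_off() walk the
 * ac_usedmap bitmap by repeatedly taking ffs() of the remaining bits and
 * clearing each bit as it is visited.  The standalone userland program
 * below only demonstrates that iteration pattern on a plain int bitmap;
 * the variable values are made up for the example.
 */
#include <stdio.h>
#include <strings.h>	/* ffs() */

int
main(void)
{
	int map = 0x2c;			/* slots 2, 3 and 5 are "in use" */
	int i, slot;

	for (i = ffs(map); i != 0; i = ffs(map)) {
		slot = i - 1;		/* ffs() is 1-based */
		map ^= 1 << slot;	/* mark the slot as handled */
		printf("visiting slot %d\n", slot);
	}
	return 0;
}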
/*	$OpenBSD: vfs_sync.c,v 1.68 2022/08/14 01:58:28 jsg Exp $	*/

/*
 * Portions of this code are:
 *
 * Copyright (c) 1989, 1993
 *	The Regents of the University of California.  All rights reserved.
 * (c) UNIX System Laboratories, Inc.
 * All or some portions of this file are derived from material licensed
 * to the University of California by American Telephone and Telegraph
 * Co. or Unix System Laboratories, Inc. and are reproduced herein with
 * the permission of UNIX System Laboratories, Inc.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
*/ /* * Syncer daemon */ #include <sys/queue.h> #include <sys/param.h> #include <sys/systm.h> #include <sys/proc.h> #include <sys/mount.h> #include <sys/vnode.h> #include <sys/lock.h> #include <sys/malloc.h> #include <sys/time.h> #ifdef FFS_SOFTUPDATES int softdep_process_worklist(struct mount *); #endif /* * The workitem queue. */ #define SYNCER_MAXDELAY 32 /* maximum sync delay time */ #define SYNCER_DEFAULT 30 /* default sync delay time */ int syncer_maxdelay = SYNCER_MAXDELAY; /* maximum delay time */ int syncdelay = SYNCER_DEFAULT; /* time to delay syncing vnodes */ int rushjob = 0; /* number of slots to run ASAP */ int stat_rush_requests = 0; /* number of rush requests */ int syncer_delayno = 0; long syncer_mask; LIST_HEAD(synclist, vnode); static struct synclist *syncer_workitem_pending; struct proc *syncerproc; int syncer_chan; /* * The workitem queue. * * It is useful to delay writes of file data and filesystem metadata * for tens of seconds so that quickly created and deleted files need * not waste disk bandwidth being created and removed. To realize this, * we append vnodes to a "workitem" queue. When running with a soft * updates implementation, most pending metadata dependencies should * not wait for more than a few seconds. Thus, mounted block devices * are delayed only about half the time that file data is delayed. * Similarly, directory updates are more critical, so are only delayed * about a third the time that file data is delayed. Thus, there are * SYNCER_MAXDELAY queues that are processed round-robin at a rate of * one each second (driven off the filesystem syncer process). The * syncer_delayno variable indicates the next queue that is to be processed. * Items that need to be processed soon are placed in this queue: * * syncer_workitem_pending[syncer_delayno] * * A delay of fifteen seconds is done by placing the request fifteen * entries later in the queue: * * syncer_workitem_pending[(syncer_delayno + 15) & syncer_mask] * */ void vn_initialize_syncerd(void) { syncer_workitem_pending = hashinit(syncer_maxdelay, M_VNODE, M_WAITOK, &syncer_mask); syncer_maxdelay = syncer_mask + 1; } /* * Add an item to the syncer work queue. */ void vn_syncer_add_to_worklist(struct vnode *vp, int delay) { int s, slot; if (delay > syncer_maxdelay - 2) delay = syncer_maxdelay - 2; slot = (syncer_delayno + delay) & syncer_mask; s = splbio(); if (vp->v_bioflag & VBIOONSYNCLIST) LIST_REMOVE(vp, v_synclist); vp->v_bioflag |= VBIOONSYNCLIST; LIST_INSERT_HEAD(&syncer_workitem_pending[slot], vp, v_synclist); splx(s); } /* * System filesystem synchronizer daemon. */ void syncer_thread(void *arg) { uint64_t elapsed, start; struct proc *p = curproc; struct synclist *slp; struct vnode *vp; int s; for (;;) { start = getnsecuptime(); /* * Push files whose dirty time has expired. */ s = splbio(); slp = &syncer_workitem_pending[syncer_delayno]; syncer_delayno += 1; if (syncer_delayno == syncer_maxdelay) syncer_delayno = 0; while ((vp = LIST_FIRST(slp)) != NULL) { if (vget(vp, LK_EXCLUSIVE | LK_NOWAIT)) { /* * If we fail to get the lock, we move this * vnode one second ahead in time. * XXX - no good, but the best we can do. */ vn_syncer_add_to_worklist(vp, 1); continue; } splx(s); (void) VOP_FSYNC(vp, p->p_ucred, MNT_LAZY, p); vput(vp); s = splbio(); if (LIST_FIRST(slp) == vp) { /* * Note: disk vps can remain on the * worklist too with no dirty blocks, but * since sync_fsync() moves it to a different * slot we are safe. 
*/ #ifdef DIAGNOSTIC if (LIST_FIRST(&vp->v_dirtyblkhd) == NULL && vp->v_type != VBLK) { vprint("fsync failed", vp); if (vp->v_mount != NULL) printf("mounted on: %s\n", vp->v_mount->mnt_stat.f_mntonname); panic("%s: fsync failed", __func__); } #endif /* DIAGNOSTIC */ /* * Put us back on the worklist. The worklist * routine will remove us from our current * position and then add us back in at a later * position. */ vn_syncer_add_to_worklist(vp, syncdelay); } sched_pause(yield); } splx(s); #ifdef FFS_SOFTUPDATES /* * Do soft update processing. */ softdep_process_worklist(NULL); #endif /* * The variable rushjob allows the kernel to speed up the * processing of the filesystem syncer process. A rushjob * value of N tells the filesystem syncer to process the next * N seconds worth of work on its queue ASAP. Currently rushjob * is used by the soft update code to speed up the filesystem * syncer process when the incore state is getting so far * ahead of the disk that the kernel memory pool is being * threatened with exhaustion. */ if (rushjob > 0) { rushjob -= 1; continue; } /* * If it has taken us less than a second to process the * current work, then wait. Otherwise start right over * again. We can still lose time if any single round * takes more than two seconds, but it does not really * matter as we are just trying to generally pace the * filesystem activity. */ elapsed = getnsecuptime() - start; if (elapsed < SEC_TO_NSEC(1)) { tsleep_nsec(&syncer_chan, PPAUSE, "syncer", SEC_TO_NSEC(1) - elapsed); } } } /* * Request the syncer daemon to speed up its work. * We never push it to speed up more than half of its * normal turn time, otherwise it could take over the cpu. */ int speedup_syncer(void) { if (syncerproc) wakeup_proc(syncerproc, &syncer_chan); if (rushjob < syncdelay / 2) { rushjob += 1; stat_rush_requests += 1; return 1; } return 0; } /* Routine to create and manage a filesystem syncer vnode. */ int sync_fsync(void *); int sync_inactive(void *); int sync_print(void *); const struct vops sync_vops = { .vop_close = nullop, .vop_fsync = sync_fsync, .vop_inactive = sync_inactive, .vop_reclaim = nullop, .vop_lock = nullop, .vop_unlock = nullop, .vop_islocked = nullop, .vop_print = sync_print }; /* * Create a new filesystem syncer vnode for the specified mount point. */ int vfs_allocate_syncvnode(struct mount *mp) { struct vnode *vp; static long start, incr, next; int error; /* Allocate a new vnode */ if ((error = getnewvnode(VT_VFS, mp, &sync_vops, &vp)) != 0) { mp->mnt_syncer = NULL; return (error); } vp->v_writecount = 1; vp->v_type = VNON; /* * Place the vnode onto the syncer worklist. We attempt to * scatter them about on the list so that they will go off * at evenly distributed times even if all the filesystems * are mounted at once. */ next += incr; if (next == 0 || next > syncer_maxdelay) { start /= 2; incr /= 2; if (start == 0) { start = syncer_maxdelay / 2; incr = syncer_maxdelay; } next = start; } vn_syncer_add_to_worklist(vp, next); mp->mnt_syncer = vp; return (0); } /* * Do a lazy sync of the filesystem. */ int sync_fsync(void *v) { struct vop_fsync_args *ap = v; struct vnode *syncvp = ap->a_vp; struct mount *mp = syncvp->v_mount; int asyncflag; /* * We only need to do something if this is a lazy evaluation. */ if (ap->a_waitfor != MNT_LAZY) return (0); /* * Move ourselves to the back of the sync list. */ vn_syncer_add_to_worklist(syncvp, syncdelay); /* * Walk the list of vnodes pushing all that are dirty and * not already on the sync list. 
*/ if (vfs_busy(mp, VB_READ|VB_NOWAIT) == 0) { asyncflag = mp->mnt_flag & MNT_ASYNC; mp->mnt_flag &= ~MNT_ASYNC; VFS_SYNC(mp, MNT_LAZY, 0, ap->a_cred, ap->a_p); if (asyncflag) mp->mnt_flag |= MNT_ASYNC; vfs_unbusy(mp); } return (0); } /* * The syncer vnode is no longer needed and is being decommissioned. */ int sync_inactive(void *v) { struct vop_inactive_args *ap = v; struct vnode *vp = ap->a_vp; int s; if (vp->v_usecount == 0) { VOP_UNLOCK(vp); return (0); } vp->v_mount->mnt_syncer = NULL; s = splbio(); LIST_REMOVE(vp, v_synclist); vp->v_bioflag &= ~VBIOONSYNCLIST; splx(s); vp->v_writecount = 0; vput(vp); return (0); } /* * Print out a syncer vnode. */ int sync_print(void *v) { printf("syncer vnode\n"); return (0); }
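/*
 * Illustrative sketch (not part of vfs_sync.c): the syncer keeps a ring of
 * syncer_maxdelay work lists and advances one slot per second.  Delaying a
 * vnode by N seconds just means inserting it (syncer_delayno + N) entries
 * ahead, modulo the ring size, exactly as vn_syncer_add_to_worklist() does
 * above.  The userland program below only reproduces that slot arithmetic;
 * the variable names mirror the kernel ones but the program is hypothetical.
 */
#include <stdio.h>

#define SYNCER_MAXDELAY	32		/* power of two, as after hashinit() */

int
main(void)
{
	int syncer_mask = SYNCER_MAXDELAY - 1;
	int syncer_delayno = 30;	/* slot currently being processed */
	int delay;

	for (delay = 1; delay <= 4; delay++)
		printf("delay %d -> slot %d\n", delay,
		    (syncer_delayno + delay) & syncer_mask);
	return 0;
}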
/*	$OpenBSD: video.c,v 1.57 2022/07/02 08:50:41 visa Exp $	*/

/*
 * Copyright (c) 2008 Robert Nagy <robert@openbsd.org>
 * Copyright (c) 2008 Marcus Glocker <mglocker@openbsd.org>
 *
 * Permission to use, copy, modify, and distribute this software for any
 * purpose with or without fee is hereby granted, provided that the above
 * copyright notice and this permission notice appear in all copies.
 *
 * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
 * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
 * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
 * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
 * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
 * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
 * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/ #include <sys/param.h> #include <sys/systm.h> #include <sys/errno.h> #include <sys/ioctl.h> #include <sys/fcntl.h> #include <sys/device.h> #include <sys/vnode.h> #include <sys/kernel.h> #include <sys/malloc.h> #include <sys/conf.h> #include <sys/proc.h> #include <sys/videoio.h> #include <dev/video_if.h> #include <uvm/uvm_extern.h> #ifdef VIDEO_DEBUG int video_debug = 1; #define DPRINTF(l, x...) do { if ((l) <= video_debug) printf(x); } while (0) #else #define DPRINTF(l, x...) #endif struct video_softc { struct device dev; void *hw_hdl; /* hardware driver handle */ struct device *sc_dev; /* hardware device struct */ const struct video_hw_if *hw_if; /* hardware interface */ char sc_dying; /* device detached */ struct process *sc_owner; /* owner process */ uint8_t sc_open; /* device opened */ int sc_fsize; uint8_t *sc_fbuffer; caddr_t sc_fbuffer_mmap; size_t sc_fbufferlen; int sc_vidmode; /* access mode */ #define VIDMODE_NONE 0 #define VIDMODE_MMAP 1 #define VIDMODE_READ 2 int sc_frames_ready; struct selinfo sc_rsel; /* read selector */ }; int videoprobe(struct device *, void *, void *); void videoattach(struct device *, struct device *, void *); int videodetach(struct device *, int); int videoactivate(struct device *, int); int videoprint(void *, const char *); void video_intr(void *); int video_stop(struct video_softc *); int video_claim(struct video_softc *, struct process *); const struct cfattach video_ca = { sizeof(struct video_softc), videoprobe, videoattach, videodetach, videoactivate }; struct cfdriver video_cd = { NULL, "video", DV_DULL }; /* * Global flag to control if video recording is enabled by kern.video.record. */ int video_record_enable = 0; int videoprobe(struct device *parent, void *match, void *aux) { return (1); } void videoattach(struct device *parent, struct device *self, void *aux) { struct video_softc *sc = (void *)self; struct video_attach_args *sa = aux; printf("\n"); sc->hw_if = sa->hwif; sc->hw_hdl = sa->hdl; sc->sc_dev = parent; sc->sc_fbufferlen = 0; sc->sc_owner = NULL; if (sc->hw_if->get_bufsize) sc->sc_fbufferlen = (sc->hw_if->get_bufsize)(sc->hw_hdl); if (sc->sc_fbufferlen == 0) { printf("video: could not request frame buffer size\n"); return; } sc->sc_fbuffer = malloc(sc->sc_fbufferlen, M_DEVBUF, M_NOWAIT); if (sc->sc_fbuffer == NULL) { printf("video: could not allocate frame buffer\n"); return; } } int videoopen(dev_t dev, int flags, int fmt, struct proc *p) { int unit = VIDEOUNIT(dev); struct video_softc *sc; int error = 0; KERNEL_ASSERT_LOCKED(); if (unit >= video_cd.cd_ndevs || (sc = video_cd.cd_devs[unit]) == NULL || sc->hw_if == NULL) return (ENXIO); if (sc->sc_open) { DPRINTF(1, "%s: device already open\n", __func__); return (0); } sc->sc_vidmode = VIDMODE_NONE; sc->sc_frames_ready = 0; if (sc->hw_if->open != NULL) { error = sc->hw_if->open(sc->hw_hdl, flags, &sc->sc_fsize, sc->sc_fbuffer, video_intr, sc); } if (error == 0) { sc->sc_open = 1; DPRINTF(1, "%s: set device to open\n", __func__); } return (error); } int videoclose(dev_t dev, int flags, int fmt, struct proc *p) { struct video_softc *sc; int error = 0; KERNEL_ASSERT_LOCKED(); DPRINTF(1, "%s: last close\n", __func__); sc = video_cd.cd_devs[VIDEOUNIT(dev)]; error = video_stop(sc); sc->sc_open = 0; return (error); } int videoread(dev_t dev, struct uio *uio, int ioflag) { int unit = VIDEOUNIT(dev); struct video_softc *sc; int error; size_t size; KERNEL_ASSERT_LOCKED(); if (unit >= video_cd.cd_ndevs || (sc = video_cd.cd_devs[unit]) == NULL) return (ENXIO); if (sc->sc_dying) return (EIO); if 
(sc->sc_vidmode == VIDMODE_MMAP) return (EBUSY); if ((error = video_claim(sc, curproc->p_p))) return (error); /* start the stream if not already started */ if (sc->sc_vidmode == VIDMODE_NONE && sc->hw_if->start_read) { error = sc->hw_if->start_read(sc->hw_hdl); if (error) return (error); sc->sc_vidmode = VIDMODE_READ; } DPRINTF(1, "resid=%zu\n", uio->uio_resid); if (sc->sc_frames_ready < 1) { /* block userland read until a frame is ready */ error = tsleep_nsec(sc, PWAIT | PCATCH, "vid_rd", INFSLP); if (sc->sc_dying) error = EIO; if (error) return (error); } /* move no more than 1 frame to userland, as per specification */ size = ulmin(uio->uio_resid, sc->sc_fsize); if (!video_record_enable) bzero(sc->sc_fbuffer, size); error = uiomove(sc->sc_fbuffer, size, uio); sc->sc_frames_ready--; if (error) return (error); DPRINTF(1, "uiomove successfully done (%zu bytes)\n", size); return (0); } int videoioctl(dev_t dev, u_long cmd, caddr_t data, int flags, struct proc *p) { int unit = VIDEOUNIT(dev); struct video_softc *sc; struct v4l2_buffer *vb = (struct v4l2_buffer *)data; int error; KERNEL_ASSERT_LOCKED(); if (unit >= video_cd.cd_ndevs || (sc = video_cd.cd_devs[unit]) == NULL || sc->hw_if == NULL) return (ENXIO); DPRINTF(3, "video_ioctl(%zu, '%c', %zu)\n", IOCPARM_LEN(cmd), (int) IOCGROUP(cmd), cmd & 0xff); error = EOPNOTSUPP; switch (cmd) { case VIDIOC_G_CTRL: if (sc->hw_if->g_ctrl) error = (sc->hw_if->g_ctrl)(sc->hw_hdl, (struct v4l2_control *)data); break; case VIDIOC_S_CTRL: if (sc->hw_if->s_ctrl) error = (sc->hw_if->s_ctrl)(sc->hw_hdl, (struct v4l2_control *)data); break; default: error = (ENOTTY); } if (error != ENOTTY) return (error); if ((error = video_claim(sc, p->p_p))) return (error); /* * The following IOCTLs can only be called by the device owner. * For further shared IOCTLs please move it up. 
*/ error = EOPNOTSUPP; switch (cmd) { case VIDIOC_QUERYCAP: if (sc->hw_if->querycap) error = (sc->hw_if->querycap)(sc->hw_hdl, (struct v4l2_capability *)data); break; case VIDIOC_ENUM_FMT: if (sc->hw_if->enum_fmt) error = (sc->hw_if->enum_fmt)(sc->hw_hdl, (struct v4l2_fmtdesc *)data); break; case VIDIOC_ENUM_FRAMESIZES: if (sc->hw_if->enum_fsizes) error = (sc->hw_if->enum_fsizes)(sc->hw_hdl, (struct v4l2_frmsizeenum *)data); break; case VIDIOC_ENUM_FRAMEINTERVALS: if (sc->hw_if->enum_fivals) error = (sc->hw_if->enum_fivals)(sc->hw_hdl, (struct v4l2_frmivalenum *)data); break; case VIDIOC_S_FMT: if (!(flags & FWRITE)) return (EACCES); if (sc->hw_if->s_fmt) error = (sc->hw_if->s_fmt)(sc->hw_hdl, (struct v4l2_format *)data); break; case VIDIOC_G_FMT: if (sc->hw_if->g_fmt) error = (sc->hw_if->g_fmt)(sc->hw_hdl, (struct v4l2_format *)data); break; case VIDIOC_S_PARM: if (sc->hw_if->s_parm) error = (sc->hw_if->s_parm)(sc->hw_hdl, (struct v4l2_streamparm *)data); break; case VIDIOC_G_PARM: if (sc->hw_if->g_parm) error = (sc->hw_if->g_parm)(sc->hw_hdl, (struct v4l2_streamparm *)data); break; case VIDIOC_ENUMINPUT: if (sc->hw_if->enum_input) error = (sc->hw_if->enum_input)(sc->hw_hdl, (struct v4l2_input *)data); break; case VIDIOC_S_INPUT: if (sc->hw_if->s_input) error = (sc->hw_if->s_input)(sc->hw_hdl, (int)*data); break; case VIDIOC_G_INPUT: if (sc->hw_if->g_input) error = (sc->hw_if->g_input)(sc->hw_hdl, (int *)data); break; case VIDIOC_REQBUFS: if (sc->hw_if->reqbufs) error = (sc->hw_if->reqbufs)(sc->hw_hdl, (struct v4l2_requestbuffers *)data); break; case VIDIOC_QUERYBUF: if (sc->hw_if->querybuf) error = (sc->hw_if->querybuf)(sc->hw_hdl, (struct v4l2_buffer *)data); break; case VIDIOC_QBUF: if (sc->hw_if->qbuf) error = (sc->hw_if->qbuf)(sc->hw_hdl, (struct v4l2_buffer *)data); break; case VIDIOC_DQBUF: if (!sc->hw_if->dqbuf) break; /* should have called mmap() before now */ if (sc->sc_vidmode != VIDMODE_MMAP) { error = EINVAL; break; } error = (sc->hw_if->dqbuf)(sc->hw_hdl, (struct v4l2_buffer *)data); if (!video_record_enable) bzero(sc->sc_fbuffer_mmap + vb->m.offset, vb->length); sc->sc_frames_ready--; break; case VIDIOC_STREAMON: if (sc->hw_if->streamon) error = (sc->hw_if->streamon)(sc->hw_hdl, (int)*data); break; case VIDIOC_STREAMOFF: if (sc->hw_if->streamoff) error = (sc->hw_if->streamoff)(sc->hw_hdl, (int)*data); if (!error) { /* Release device ownership and streaming buffers. 
*/ error = video_stop(sc); } break; case VIDIOC_TRY_FMT: if (sc->hw_if->try_fmt) error = (sc->hw_if->try_fmt)(sc->hw_hdl, (struct v4l2_format *)data); break; case VIDIOC_QUERYCTRL: if (sc->hw_if->queryctrl) error = (sc->hw_if->queryctrl)(sc->hw_hdl, (struct v4l2_queryctrl *)data); break; default: error = (ENOTTY); } return (error); } paddr_t videommap(dev_t dev, off_t off, int prot) { int unit = VIDEOUNIT(dev); struct video_softc *sc; caddr_t p; paddr_t pa; KERNEL_ASSERT_LOCKED(); DPRINTF(2, "%s: off=%lld, prot=%d\n", __func__, off, prot); if (unit >= video_cd.cd_ndevs || (sc = video_cd.cd_devs[unit]) == NULL) return (-1); if (sc->sc_dying) return (-1); if (sc->hw_if->mappage == NULL) return (-1); p = sc->hw_if->mappage(sc->hw_hdl, off, prot); if (p == NULL) return (-1); if (pmap_extract(pmap_kernel(), (vaddr_t)p, &pa) == FALSE) panic("videommap: invalid page"); sc->sc_vidmode = VIDMODE_MMAP; /* store frame buffer base address for later blanking */ if (off == 0) sc->sc_fbuffer_mmap = p; return (pa); } void filt_videodetach(struct knote *kn) { struct video_softc *sc = kn->kn_hook; int s; s = splhigh(); klist_remove_locked(&sc->sc_rsel.si_note, kn); splx(s); } int filt_videoread(struct knote *kn, long hint) { struct video_softc *sc = kn->kn_hook; if (sc->sc_frames_ready > 0) return (1); return (0); } const struct filterops video_filtops = { .f_flags = FILTEROP_ISFD, .f_attach = NULL, .f_detach = filt_videodetach, .f_event = filt_videoread, }; int videokqfilter(dev_t dev, struct knote *kn) { int unit = VIDEOUNIT(dev); struct video_softc *sc; int s, error; KERNEL_ASSERT_LOCKED(); if (unit >= video_cd.cd_ndevs || (sc = video_cd.cd_devs[unit]) == NULL) return (ENXIO); if (sc->sc_dying) return (ENXIO); switch (kn->kn_filter) { case EVFILT_READ: kn->kn_fop = &video_filtops; kn->kn_hook = sc; break; default: return (EINVAL); } if ((error = video_claim(sc, curproc->p_p))) return (error); /* * Start the stream in read() mode if not already started. If * the user wanted mmap() mode, he should have called mmap() * before now. */ if (sc->sc_vidmode == VIDMODE_NONE && sc->hw_if->start_read) { if (sc->hw_if->start_read(sc->hw_hdl)) return (ENXIO); sc->sc_vidmode = VIDMODE_READ; } s = splhigh(); klist_insert_locked(&sc->sc_rsel.si_note, kn); splx(s); return (0); } int video_submatch(struct device *parent, void *match, void *aux) { struct cfdata *cf = match; return (cf->cf_driver == &video_cd); } /* * Called from hardware driver. 
This is where the MI video driver gets * probed/attached to the hardware driver */ struct device * video_attach_mi(const struct video_hw_if *rhwp, void *hdlp, struct device *dev) { struct video_attach_args arg; arg.hwif = rhwp; arg.hdl = hdlp; return (config_found_sm(dev, &arg, videoprint, video_submatch)); } void video_intr(void *addr) { struct video_softc *sc = (struct video_softc *)addr; DPRINTF(3, "video_intr sc=%p\n", sc); if (sc->sc_vidmode != VIDMODE_NONE) sc->sc_frames_ready++; else printf("%s: interrupt but no streams!\n", __func__); if (sc->sc_vidmode == VIDMODE_READ) wakeup(sc); selwakeup(&sc->sc_rsel); } int video_stop(struct video_softc *sc) { int error = 0; DPRINTF(1, "%s: stream close\n", __func__); if (sc->hw_if->close != NULL) error = sc->hw_if->close(sc->hw_hdl); sc->sc_vidmode = VIDMODE_NONE; sc->sc_frames_ready = 0; sc->sc_owner = NULL; return (error); } int video_claim(struct video_softc *sc, struct process *pr) { if (sc->sc_owner != NULL && sc->sc_owner != pr) { DPRINTF(1, "%s: already owned=%p\n", __func__, sc->sc_owner); return (EBUSY); } if (sc->sc_owner == NULL) { sc->sc_owner = pr; DPRINTF(1, "%s: new owner=%p\n", __func__, sc->sc_owner); } return (0); } int videoprint(void *aux, const char *pnp) { if (pnp != NULL) printf("video at %s", pnp); return (UNCONF); } int videodetach(struct device *self, int flags) { struct video_softc *sc = (struct video_softc *)self; int s, maj, mn; /* locate the major number */ for (maj = 0; maj < nchrdev; maj++) if (cdevsw[maj].d_open == videoopen) break; /* Nuke the vnodes for any open instances (calls close). */ mn = self->dv_unit; vdevgone(maj, mn, mn, VCHR); s = splhigh(); klist_invalidate(&sc->sc_rsel.si_note); splx(s); free(sc->sc_fbuffer, M_DEVBUF, sc->sc_fbufferlen); return (0); } int videoactivate(struct device *self, int act) { struct video_softc *sc = (struct video_softc *)self; switch (act) { case DVACT_DEACTIVATE: sc->sc_dying = 1; break; } return (0); }
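/*
 * Illustrative sketch (not part of video.c): video_claim() above gives the
 * device a single owning process.  The first claimant becomes sc_owner,
 * later claims by the same process succeed, and claims by anyone else fail
 * with EBUSY until video_stop() clears the owner.  The userland model below
 * imitates that policy with made-up names; it is not driver code.
 */
#include <errno.h>
#include <stdio.h>

static const char *owner;		/* NULL while the device is unowned */

static int
claim(const char *proc)
{
	if (owner != NULL && owner != proc)
		return EBUSY;		/* somebody else already owns it */
	if (owner == NULL)
		owner = proc;		/* first claimant becomes the owner */
	return 0;
}

int
main(void)
{
	const char *a = "A", *b = "B";

	printf("A: %d\n", claim(a));	/* 0: A becomes owner */
	printf("B: %d\n", claim(b));	/* EBUSY: A still owns the device */
	owner = NULL;			/* the video_stop() equivalent */
	printf("B: %d\n", claim(b));	/* 0: B can claim now */
	return 0;
}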
/*	$OpenBSD: uipc_socket2.c,v 1.128 2022/09/05 14:56:09 bluhm Exp $	*/
/*	$NetBSD: uipc_socket2.c,v 1.11 1996/02/04 02:17:55 christos Exp $	*/

/*
 * Copyright (c) 1982, 1986, 1988, 1990, 1993
 *	The Regents of the University of California.  All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
* * @(#)uipc_socket2.c 8.1 (Berkeley) 6/10/93 */ #include <sys/param.h> #include <sys/systm.h> #include <sys/malloc.h> #include <sys/mbuf.h> #include <sys/protosw.h> #include <sys/domain.h> #include <sys/socket.h> #include <sys/socketvar.h> #include <sys/signalvar.h> #include <sys/event.h> #include <sys/pool.h> /* * Primitive routines for operating on sockets and socket buffers */ u_long sb_max = SB_MAX; /* patchable */ extern struct pool mclpools[]; extern struct pool mbpool; /* * Procedures to manipulate state flags of socket * and do appropriate wakeups. Normal sequence from the * active (originating) side is that soisconnecting() is * called during processing of connect() call, * resulting in an eventual call to soisconnected() if/when the * connection is established. When the connection is torn down * soisdisconnecting() is called during processing of disconnect() call, * and soisdisconnected() is called when the connection to the peer * is totally severed. The semantics of these routines are such that * connectionless protocols can call soisconnected() and soisdisconnected() * only, bypassing the in-progress calls when setting up a ``connection'' * takes no time. * * From the passive side, a socket is created with * two queues of sockets: so_q0 for connections in progress * and so_q for connections already made and awaiting user acceptance. * As a protocol is preparing incoming connections, it creates a socket * structure queued on so_q0 by calling sonewconn(). When the connection * is established, soisconnected() is called, and transfers the * socket structure to so_q, making it available to accept(). * * If a socket is closed with sockets on either * so_q0 or so_q, these sockets are dropped. * * If higher level protocols are implemented in * the kernel, the wakeups done here will sometimes * cause software-interrupt process scheduling. */ void soisconnecting(struct socket *so) { soassertlocked(so); so->so_state &= ~(SS_ISCONNECTED|SS_ISDISCONNECTING); so->so_state |= SS_ISCONNECTING; } void soisconnected(struct socket *so) { struct socket *head = so->so_head; soassertlocked(so); so->so_state &= ~(SS_ISCONNECTING|SS_ISDISCONNECTING); so->so_state |= SS_ISCONNECTED; if (head != NULL && so->so_onq == &head->so_q0) { int persocket = solock_persocket(so); if (persocket) { soref(so); soref(head); sounlock(so); solock(head); solock(so); if (so->so_onq != &head->so_q0) { sounlock(head); sorele(head); sorele(so); return; } sorele(head); sorele(so); } soqremque(so, 0); soqinsque(head, so, 1); sorwakeup(head); wakeup_one(&head->so_timeo); if (persocket) sounlock(head); } else { wakeup(&so->so_timeo); sorwakeup(so); sowwakeup(so); } } void soisdisconnecting(struct socket *so) { soassertlocked(so); so->so_state &= ~SS_ISCONNECTING; so->so_state |= (SS_ISDISCONNECTING|SS_CANTRCVMORE|SS_CANTSENDMORE); wakeup(&so->so_timeo); sowwakeup(so); sorwakeup(so); } void soisdisconnected(struct socket *so) { soassertlocked(so); so->so_state &= ~(SS_ISCONNECTING|SS_ISCONNECTED|SS_ISDISCONNECTING); so->so_state |= (SS_CANTRCVMORE|SS_CANTSENDMORE|SS_ISDISCONNECTED); wakeup(&so->so_timeo); sowwakeup(so); sorwakeup(so); } /* * When an attempt at a new connection is noted on a socket * which accepts connections, sonewconn is called. If the * connection is possible (subject to space constraints, etc.) * then we allocate a new structure, properly linked into the * data structure of the original socket, and return this. * Connstatus may be 0 or SS_ISCONNECTED. 
*/ struct socket * sonewconn(struct socket *head, int connstatus) { struct socket *so; int persocket = solock_persocket(head); int error; /* * XXXSMP as long as `so' and `head' share the same lock, we * can call soreserve() and pr_attach() below w/o explicitly * locking `so'. */ soassertlocked(head); if (m_pool_used() > 95) return (NULL); if (head->so_qlen + head->so_q0len > head->so_qlimit * 3) return (NULL); so = soalloc(PR_NOWAIT | PR_ZERO); if (so == NULL) return (NULL); so->so_type = head->so_type; so->so_options = head->so_options &~ SO_ACCEPTCONN; so->so_linger = head->so_linger; so->so_state = head->so_state | SS_NOFDREF; so->so_proto = head->so_proto; so->so_timeo = head->so_timeo; so->so_euid = head->so_euid; so->so_ruid = head->so_ruid; so->so_egid = head->so_egid; so->so_rgid = head->so_rgid; so->so_cpid = head->so_cpid; /* * Lock order will be `head' -> `so' while these sockets are linked. */ if (persocket) solock(so); /* * Inherit watermarks but those may get clamped in low mem situations. */ if (soreserve(so, head->so_snd.sb_hiwat, head->so_rcv.sb_hiwat)) { if (persocket) sounlock(so); pool_put(&socket_pool, so); return (NULL); } so->so_snd.sb_wat = head->so_snd.sb_wat; so->so_snd.sb_lowat = head->so_snd.sb_lowat; so->so_snd.sb_timeo_nsecs = head->so_snd.sb_timeo_nsecs; so->so_rcv.sb_wat = head->so_rcv.sb_wat; so->so_rcv.sb_lowat = head->so_rcv.sb_lowat; so->so_rcv.sb_timeo_nsecs = head->so_rcv.sb_timeo_nsecs; klist_init(&so->so_rcv.sb_sel.si_note, &socket_klistops, so); klist_init(&so->so_snd.sb_sel.si_note, &socket_klistops, so); sigio_init(&so->so_sigio); sigio_copy(&so->so_sigio, &head->so_sigio); soqinsque(head, so, 0); /* * We need to unlock `head' because PCB layer could release * solock() to enforce desired lock order. */ if (persocket) { head->so_newconn++; sounlock(head); } error = pru_attach(so, 0); if (persocket) { sounlock(so); solock(head); solock(so); if ((head->so_newconn--) == 0) { if ((head->so_state & SS_NEWCONN_WAIT) != 0) { head->so_state &= ~SS_NEWCONN_WAIT; wakeup(&head->so_newconn); } } } if (error) { soqremque(so, 0); if (persocket) sounlock(so); sigio_free(&so->so_sigio); klist_free(&so->so_rcv.sb_sel.si_note); klist_free(&so->so_snd.sb_sel.si_note); pool_put(&socket_pool, so); return (NULL); } if (connstatus) { so->so_state |= connstatus; soqremque(so, 0); soqinsque(head, so, 1); sorwakeup(head); wakeup(&head->so_timeo); } if (persocket) sounlock(so); return (so); } void soqinsque(struct socket *head, struct socket *so, int q) { soassertlocked(head); soassertlocked(so); KASSERT(so->so_onq == NULL); so->so_head = head; if (q == 0) { head->so_q0len++; so->so_onq = &head->so_q0; } else { head->so_qlen++; so->so_onq = &head->so_q; } TAILQ_INSERT_TAIL(so->so_onq, so, so_qe); } int soqremque(struct socket *so, int q) { struct socket *head = so->so_head; soassertlocked(so); soassertlocked(head); if (q == 0) { if (so->so_onq != &head->so_q0) return (0); head->so_q0len--; } else { if (so->so_onq != &head->so_q) return (0); head->so_qlen--; } TAILQ_REMOVE(so->so_onq, so, so_qe); so->so_onq = NULL; so->so_head = NULL; return (1); } /* * Socantsendmore indicates that no more data will be sent on the * socket; it would normally be applied to a socket when the user * informs the system that no more data is to be sent, by the protocol * code (in case PRU_SHUTDOWN). Socantrcvmore indicates that no more data * will be received, and will normally be applied to the socket by a * protocol when it detects that the peer will send no more data. 
* Data queued for reading in the socket may yet be read. */ void socantsendmore(struct socket *so) { soassertlocked(so); so->so_state |= SS_CANTSENDMORE; sowwakeup(so); } void socantrcvmore(struct socket *so) { soassertlocked(so); so->so_state |= SS_CANTRCVMORE; sorwakeup(so); } void solock(struct socket *so) { switch (so->so_proto->pr_domain->dom_family) { case PF_INET: case PF_INET6: NET_LOCK(); break; default: rw_enter_write(&so->so_lock); break; } } void solock_shared(struct socket *so) { switch (so->so_proto->pr_domain->dom_family) { case PF_INET: case PF_INET6: if (so->so_proto->pr_usrreqs->pru_lock != NULL) { NET_LOCK_SHARED(); pru_lock(so); } else NET_LOCK(); break; default: rw_enter_write(&so->so_lock); break; } } int solock_persocket(struct socket *so) { switch (so->so_proto->pr_domain->dom_family) { case PF_INET: case PF_INET6: return 0; default: return 1; } } void solock_pair(struct socket *so1, struct socket *so2) { KASSERT(so1 != so2); KASSERT(so1->so_type == so2->so_type); KASSERT(solock_persocket(so1)); if (so1 < so2) { solock(so1); solock(so2); } else { solock(so2); solock(so1); } } void sounlock(struct socket *so) { switch (so->so_proto->pr_domain->dom_family) { case PF_INET: case PF_INET6: NET_UNLOCK(); break; default: rw_exit_write(&so->so_lock); break; } } void sounlock_shared(struct socket *so) { switch (so->so_proto->pr_domain->dom_family) { case PF_INET: case PF_INET6: if (so->so_proto->pr_usrreqs->pru_unlock != NULL) { pru_unlock(so); NET_UNLOCK_SHARED(); } else NET_UNLOCK(); break; default: rw_exit_write(&so->so_lock); break; } } void soassertlocked(struct socket *so) { switch (so->so_proto->pr_domain->dom_family) { case PF_INET: case PF_INET6: NET_ASSERT_LOCKED(); break; default: rw_assert_wrlock(&so->so_lock); break; } } int sosleep_nsec(struct socket *so, void *ident, int prio, const char *wmesg, uint64_t nsecs) { int ret; switch (so->so_proto->pr_domain->dom_family) { case PF_INET: case PF_INET6: if (so->so_proto->pr_usrreqs->pru_unlock != NULL && rw_status(&netlock) == RW_READ) { pru_unlock(so); } ret = rwsleep_nsec(ident, &netlock, prio, wmesg, nsecs); if (so->so_proto->pr_usrreqs->pru_lock != NULL && rw_status(&netlock) == RW_READ) { pru_lock(so); } break; default: ret = rwsleep_nsec(ident, &so->so_lock, prio, wmesg, nsecs); break; } return ret; } /* * Wait for data to arrive at/drain from a socket buffer. */ int sbwait(struct socket *so, struct sockbuf *sb) { int prio = (sb->sb_flags & SB_NOINTR) ? PSOCK : PSOCK | PCATCH; soassertlocked(so); sb->sb_flags |= SB_WAIT; return sosleep_nsec(so, &sb->sb_cc, prio, "netio", sb->sb_timeo_nsecs); } int sblock(struct socket *so, struct sockbuf *sb, int wait) { int error, prio = (sb->sb_flags & SB_NOINTR) ? PSOCK : PSOCK | PCATCH; soassertlocked(so); if ((sb->sb_flags & SB_LOCK) == 0) { sb->sb_flags |= SB_LOCK; return (0); } if (wait & M_NOWAIT) return (EWOULDBLOCK); while (sb->sb_flags & SB_LOCK) { sb->sb_flags |= SB_WANT; error = sosleep_nsec(so, &sb->sb_flags, prio, "netlck", INFSLP); if (error) return (error); } sb->sb_flags |= SB_LOCK; return (0); } void sbunlock(struct socket *so, struct sockbuf *sb) { soassertlocked(so); sb->sb_flags &= ~SB_LOCK; if (sb->sb_flags & SB_WANT) { sb->sb_flags &= ~SB_WANT; wakeup(&sb->sb_flags); } } /* * Wakeup processes waiting on a socket buffer. * Do asynchronous notification via SIGIO * if the socket buffer has the SB_ASYNC flag set. 
*/ void sowakeup(struct socket *so, struct sockbuf *sb) { soassertlocked(so); if (sb->sb_flags & SB_WAIT) { sb->sb_flags &= ~SB_WAIT; wakeup(&sb->sb_cc); } if (sb->sb_flags & SB_ASYNC) pgsigio(&so->so_sigio, SIGIO, 0); KNOTE(&sb->sb_sel.si_note, 0); } /* * Socket buffer (struct sockbuf) utility routines. * * Each socket contains two socket buffers: one for sending data and * one for receiving data. Each buffer contains a queue of mbufs, * information about the number of mbufs and amount of data in the * queue, and other fields allowing select() statements and notification * on data availability to be implemented. * * Data stored in a socket buffer is maintained as a list of records. * Each record is a list of mbufs chained together with the m_next * field. Records are chained together with the m_nextpkt field. The upper * level routine soreceive() expects the following conventions to be * observed when placing information in the receive buffer: * * 1. If the protocol requires each message be preceded by the sender's * name, then a record containing that name must be present before * any associated data (mbuf's must be of type MT_SONAME). * 2. If the protocol supports the exchange of ``access rights'' (really * just additional data associated with the message), and there are * ``rights'' to be received, then a record containing this data * should be present (mbuf's must be of type MT_CONTROL). * 3. If a name or rights record exists, then it must be followed by * a data record, perhaps of zero length. * * Before using a new socket structure it is first necessary to reserve * buffer space to the socket, by calling sbreserve(). This should commit * some of the available buffer space in the system buffer pool for the * socket (currently, it does nothing but enforce limits). The space * should be released by calling sbrelease() when the socket is destroyed. */ int soreserve(struct socket *so, u_long sndcc, u_long rcvcc) { soassertlocked(so); if (sbreserve(so, &so->so_snd, sndcc)) goto bad; if (sbreserve(so, &so->so_rcv, rcvcc)) goto bad2; so->so_snd.sb_wat = sndcc; so->so_rcv.sb_wat = rcvcc; if (so->so_rcv.sb_lowat == 0) so->so_rcv.sb_lowat = 1; if (so->so_snd.sb_lowat == 0) so->so_snd.sb_lowat = MCLBYTES; if (so->so_snd.sb_lowat > so->so_snd.sb_hiwat) so->so_snd.sb_lowat = so->so_snd.sb_hiwat; return (0); bad2: sbrelease(so, &so->so_snd); bad: return (ENOBUFS); } /* * Allot mbufs to a sockbuf. * Attempt to scale mbmax so that mbcnt doesn't become limiting * if buffering efficiency is near the normal case. */ int sbreserve(struct socket *so, struct sockbuf *sb, u_long cc) { KASSERT(sb == &so->so_rcv || sb == &so->so_snd); soassertlocked(so); if (cc == 0 || cc > sb_max) return (1); sb->sb_hiwat = cc; sb->sb_mbmax = max(3 * MAXMCLBYTES, cc * 8); if (sb->sb_lowat > sb->sb_hiwat) sb->sb_lowat = sb->sb_hiwat; return (0); } /* * In low memory situation, do not accept any greater than normal request. */ int sbcheckreserve(u_long cnt, u_long defcnt) { if (cnt > defcnt && sbchecklowmem()) return (ENOBUFS); return (0); } int sbchecklowmem(void) { static int sblowmem; unsigned int used = m_pool_used(); if (used < 60) sblowmem = 0; else if (used > 80) sblowmem = 1; return (sblowmem); } /* * Free mbufs held by a socket, and reserved mbuf space. */ void sbrelease(struct socket *so, struct sockbuf *sb) { sbflush(so, sb); sb->sb_hiwat = sb->sb_mbmax = 0; } /* * Routines to add and remove * data from an mbuf queue. 
* * The routines sbappend() or sbappendrecord() are normally called to * append new mbufs to a socket buffer, after checking that adequate * space is available, comparing the function sbspace() with the amount * of data to be added. sbappendrecord() differs from sbappend() in * that data supplied is treated as the beginning of a new record. * To place a sender's address, optional access rights, and data in a * socket receive buffer, sbappendaddr() should be used. To place * access rights and data in a socket receive buffer, sbappendrights() * should be used. In either case, the new data begins a new record. * Note that unlike sbappend() and sbappendrecord(), these routines check * for the caller that there will be enough space to store the data. * Each fails if there is not enough space, or if it cannot find mbufs * to store additional information in. * * Reliable protocols may use the socket send buffer to hold data * awaiting acknowledgement. Data is normally copied from a socket * send buffer in a protocol with m_copym for output to a peer, * and then removing the data from the socket buffer with sbdrop() * or sbdroprecord() when the data is acknowledged by the peer. */ #ifdef SOCKBUF_DEBUG void sblastrecordchk(struct sockbuf *sb, const char *where) { struct mbuf *m = sb->sb_mb; while (m && m->m_nextpkt) m = m->m_nextpkt; if (m != sb->sb_lastrecord) { printf("sblastrecordchk: sb_mb %p sb_lastrecord %p last %p\n", sb->sb_mb, sb->sb_lastrecord, m); printf("packet chain:\n"); for (m = sb->sb_mb; m != NULL; m = m->m_nextpkt) printf("\t%p\n", m); panic("sblastrecordchk from %s", where); } } void sblastmbufchk(struct sockbuf *sb, const char *where) { struct mbuf *m = sb->sb_mb; struct mbuf *n; while (m && m->m_nextpkt) m = m->m_nextpkt; while (m && m->m_next) m = m->m_next; if (m != sb->sb_mbtail) { printf("sblastmbufchk: sb_mb %p sb_mbtail %p last %p\n", sb->sb_mb, sb->sb_mbtail, m); printf("packet tree:\n"); for (m = sb->sb_mb; m != NULL; m = m->m_nextpkt) { printf("\t"); for (n = m; n != NULL; n = n->m_next) printf("%p ", n); printf("\n"); } panic("sblastmbufchk from %s", where); } } #endif /* SOCKBUF_DEBUG */ #define SBLINKRECORD(sb, m0) \ do { \ if ((sb)->sb_lastrecord != NULL) \ (sb)->sb_lastrecord->m_nextpkt = (m0); \ else \ (sb)->sb_mb = (m0); \ (sb)->sb_lastrecord = (m0); \ } while (/*CONSTCOND*/0) /* * Append mbuf chain m to the last record in the * socket buffer sb. The additional space associated * the mbuf chain is recorded in sb. Empty mbufs are * discarded and mbufs are compacted where possible. */ void sbappend(struct socket *so, struct sockbuf *sb, struct mbuf *m) { struct mbuf *n; if (m == NULL) return; soassertlocked(so); SBLASTRECORDCHK(sb, "sbappend 1"); if ((n = sb->sb_lastrecord) != NULL) { /* * XXX Would like to simply use sb_mbtail here, but * XXX I need to verify that I won't miss an EOR that * XXX way. */ do { if (n->m_flags & M_EOR) { sbappendrecord(so, sb, m); /* XXXXXX!!!! */ return; } } while (n->m_next && (n = n->m_next)); } else { /* * If this is the first record in the socket buffer, it's * also the last record. */ sb->sb_lastrecord = m; } sbcompress(so, sb, m, n); SBLASTRECORDCHK(sb, "sbappend 2"); } /* * This version of sbappend() should only be used when the caller * absolutely knows that there will never be more than one record * in the socket buffer, that is, a stream protocol (such as TCP). 
*/ void sbappendstream(struct socket *so, struct sockbuf *sb, struct mbuf *m) { KASSERT(sb == &so->so_rcv || sb == &so->so_snd); soassertlocked(so); KDASSERT(m->m_nextpkt == NULL); KASSERT(sb->sb_mb == sb->sb_lastrecord); SBLASTMBUFCHK(sb, __func__); sbcompress(so, sb, m, sb->sb_mbtail); sb->sb_lastrecord = sb->sb_mb; SBLASTRECORDCHK(sb, __func__); } #ifdef SOCKBUF_DEBUG void sbcheck(struct socket *so, struct sockbuf *sb) { struct mbuf *m, *n; u_long len = 0, mbcnt = 0; for (m = sb->sb_mb; m; m = m->m_nextpkt) { for (n = m; n; n = n->m_next) { len += n->m_len; mbcnt += MSIZE; if (n->m_flags & M_EXT) mbcnt += n->m_ext.ext_size; if (m != n && n->m_nextpkt) panic("sbcheck nextpkt"); } } if (len != sb->sb_cc || mbcnt != sb->sb_mbcnt) { printf("cc %lu != %lu || mbcnt %lu != %lu\n", len, sb->sb_cc, mbcnt, sb->sb_mbcnt); panic("sbcheck"); } } #endif /* * As above, except the mbuf chain * begins a new record. */ void sbappendrecord(struct socket *so, struct sockbuf *sb, struct mbuf *m0) { struct mbuf *m; KASSERT(sb == &so->so_rcv || sb == &so->so_snd); soassertlocked(so); if (m0 == NULL) return; /* * Put the first mbuf on the queue. * Note this permits zero length records. */ sballoc(so, sb, m0); SBLASTRECORDCHK(sb, "sbappendrecord 1"); SBLINKRECORD(sb, m0); m = m0->m_next; m0->m_next = NULL; if (m && (m0->m_flags & M_EOR)) { m0->m_flags &= ~M_EOR; m->m_flags |= M_EOR; } sbcompress(so, sb, m, m0); SBLASTRECORDCHK(sb, "sbappendrecord 2"); } /* * Append address and data, and optionally, control (ancillary) data * to the receive queue of a socket. If present, * m0 must include a packet header with total length. * Returns 0 if no space in sockbuf or insufficient mbufs. */ int sbappendaddr(struct socket *so, struct sockbuf *sb, const struct sockaddr *asa, struct mbuf *m0, struct mbuf *control) { struct mbuf *m, *n, *nlast; int space = asa->sa_len; soassertlocked(so); if (m0 && (m0->m_flags & M_PKTHDR) == 0) panic("sbappendaddr"); if (m0) space += m0->m_pkthdr.len; for (n = control; n; n = n->m_next) { space += n->m_len; if (n->m_next == NULL) /* keep pointer to last control buf */ break; } if (space > sbspace(so, sb)) return (0); if (asa->sa_len > MLEN) return (0); MGET(m, M_DONTWAIT, MT_SONAME); if (m == NULL) return (0); m->m_len = asa->sa_len; memcpy(mtod(m, caddr_t), asa, asa->sa_len); if (n) n->m_next = m0; /* concatenate data to control */ else control = m0; m->m_next = control; SBLASTRECORDCHK(sb, "sbappendaddr 1"); for (n = m; n->m_next != NULL; n = n->m_next) sballoc(so, sb, n); sballoc(so, sb, n); nlast = n; SBLINKRECORD(sb, m); sb->sb_mbtail = nlast; SBLASTMBUFCHK(sb, "sbappendaddr"); SBLASTRECORDCHK(sb, "sbappendaddr 2"); return (1); } int sbappendcontrol(struct socket *so, struct sockbuf *sb, struct mbuf *m0, struct mbuf *control) { struct mbuf *m, *mlast, *n; int space = 0; if (control == NULL) panic("sbappendcontrol"); for (m = control; ; m = m->m_next) { space += m->m_len; if (m->m_next == NULL) break; } n = m; /* save pointer to last control buffer */ for (m = m0; m; m = m->m_next) space += m->m_len; if (space > sbspace(so, sb)) return (0); n->m_next = m0; /* concatenate data to control */ SBLASTRECORDCHK(sb, "sbappendcontrol 1"); for (m = control; m->m_next != NULL; m = m->m_next) sballoc(so, sb, m); sballoc(so, sb, m); mlast = m; SBLINKRECORD(sb, control); sb->sb_mbtail = mlast; SBLASTMBUFCHK(sb, "sbappendcontrol"); SBLASTRECORDCHK(sb, "sbappendcontrol 2"); return (1); } /* * Compress mbuf chain m into the socket * buffer sb following mbuf n. 
If n * is null, the buffer is presumed empty. */ void sbcompress(struct socket *so, struct sockbuf *sb, struct mbuf *m, struct mbuf *n) { int eor = 0; struct mbuf *o; while (m) { eor |= m->m_flags & M_EOR; if (m->m_len == 0 && (eor == 0 || (((o = m->m_next) || (o = n)) && o->m_type == m->m_type))) { if (sb->sb_lastrecord == m) sb->sb_lastrecord = m->m_next; m = m_free(m); continue; } if (n && (n->m_flags & M_EOR) == 0 && /* m_trailingspace() checks buffer writeability */ m->m_len <= ((n->m_flags & M_EXT)? n->m_ext.ext_size : MCLBYTES) / 4 && /* XXX Don't copy too much */ m->m_len <= m_trailingspace(n) && n->m_type == m->m_type) { memcpy(mtod(n, caddr_t) + n->m_len, mtod(m, caddr_t), m->m_len); n->m_len += m->m_len; sb->sb_cc += m->m_len; if (m->m_type != MT_CONTROL && m->m_type != MT_SONAME) sb->sb_datacc += m->m_len; m = m_free(m); continue; } if (n) n->m_next = m; else sb->sb_mb = m; sb->sb_mbtail = m; sballoc(so, sb, m); n = m; m->m_flags &= ~M_EOR; m = m->m_next; n->m_next = NULL; } if (eor) { if (n) n->m_flags |= eor; else printf("semi-panic: sbcompress"); } SBLASTMBUFCHK(sb, __func__); } /* * Free all mbufs in a sockbuf. * Check that all resources are reclaimed. */ void sbflush(struct socket *so, struct sockbuf *sb) { KASSERT(sb == &so->so_rcv || sb == &so->so_snd); KASSERT((sb->sb_flags & SB_LOCK) == 0); while (sb->sb_mbcnt) sbdrop(so, sb, (int)sb->sb_cc); KASSERT(sb->sb_cc == 0); KASSERT(sb->sb_datacc == 0); KASSERT(sb->sb_mb == NULL); KASSERT(sb->sb_mbtail == NULL); KASSERT(sb->sb_lastrecord == NULL); } /* * Drop data from (the front of) a sockbuf. */ void sbdrop(struct socket *so, struct sockbuf *sb, int len) { struct mbuf *m, *mn; struct mbuf *next; KASSERT(sb == &so->so_rcv || sb == &so->so_snd); soassertlocked(so); next = (m = sb->sb_mb) ? m->m_nextpkt : NULL; while (len > 0) { if (m == NULL) { if (next == NULL) panic("sbdrop"); m = next; next = m->m_nextpkt; continue; } if (m->m_len > len) { m->m_len -= len; m->m_data += len; sb->sb_cc -= len; if (m->m_type != MT_CONTROL && m->m_type != MT_SONAME) sb->sb_datacc -= len; break; } len -= m->m_len; sbfree(so, sb, m); mn = m_free(m); m = mn; } while (m && m->m_len == 0) { sbfree(so, sb, m); mn = m_free(m); m = mn; } if (m) { sb->sb_mb = m; m->m_nextpkt = next; } else sb->sb_mb = next; /* * First part is an inline SB_EMPTY_FIXUP(). Second part * makes sure sb_lastrecord is up-to-date if we dropped * part of the last record. */ m = sb->sb_mb; if (m == NULL) { sb->sb_mbtail = NULL; sb->sb_lastrecord = NULL; } else if (m->m_nextpkt == NULL) sb->sb_lastrecord = m; } /* * Drop a record off the front of a sockbuf * and move the next record to the front. */ void sbdroprecord(struct socket *so, struct sockbuf *sb) { struct mbuf *m, *mn; m = sb->sb_mb; if (m) { sb->sb_mb = m->m_nextpkt; do { sbfree(so, sb, m); mn = m_free(m); } while ((m = mn) != NULL); } SB_EMPTY_FIXUP(sb); } /* * Create a "control" mbuf containing the specified data * with the specified type for presentation on a socket buffer. 
*/ struct mbuf * sbcreatecontrol(const void *p, size_t size, int type, int level) { struct cmsghdr *cp; struct mbuf *m; if (CMSG_SPACE(size) > MCLBYTES) { printf("sbcreatecontrol: message too large %zu\n", size); return (NULL); } if ((m = m_get(M_DONTWAIT, MT_CONTROL)) == NULL) return (NULL); if (CMSG_SPACE(size) > MLEN) { MCLGET(m, M_DONTWAIT); if ((m->m_flags & M_EXT) == 0) { m_free(m); return NULL; } } cp = mtod(m, struct cmsghdr *); memset(cp, 0, CMSG_SPACE(size)); memcpy(CMSG_DATA(cp), p, size); m->m_len = CMSG_SPACE(size); cp->cmsg_len = CMSG_LEN(size); cp->cmsg_level = level; cp->cmsg_type = type; return (m); }
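/*
 * Illustrative aside (not part of the kernel sources above): sbcreatecontrol()
 * lays out a control message as CMSG_SPACE(size) bytes holding a cmsghdr
 * followed by the payload.  The standalone userspace sketch below builds the
 * same layout with the CMSG_SPACE/CMSG_LEN/CMSG_DATA macros from
 * <sys/socket.h>; the payload value and the SOL_SOCKET/SCM_RIGHTS choice are
 * invented for the example and carry no meaning here.
 */
#include <sys/types.h>
#include <sys/socket.h>

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(void)
{
	int payload = 42;			/* hypothetical control payload */
	size_t size = sizeof(payload);
	unsigned char *buf;
	struct cmsghdr *cp;

	/* sbcreatecontrol() reserves CMSG_SPACE(size) bytes in the mbuf. */
	if ((buf = calloc(1, CMSG_SPACE(size))) == NULL)
		return (1);

	cp = (struct cmsghdr *)buf;
	cp->cmsg_len = CMSG_LEN(size);	/* header plus data, no trailing padding */
	cp->cmsg_level = SOL_SOCKET;	/* assumed level, for illustration only */
	cp->cmsg_type = SCM_RIGHTS;	/* assumed type, for illustration only */
	memcpy(CMSG_DATA(cp), &payload, size);

	printf("CMSG_SPACE %zu, cmsg_len %zu\n",
	    (size_t)CMSG_SPACE(size), (size_t)cp->cmsg_len);
	free(buf);
	return (0);
}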
/* $OpenBSD: vfs_default.c,v 1.51 2022/04/27 14:52:25 claudio Exp $ */ /* * Portions of this code are: * * Copyright (c) 1989, 1993 * The Regents of the University of California. All rights reserved. * (c) UNIX System Laboratories, Inc. * All or some portions of this file are derived from material licensed * to the University of California by American Telephone and Telegraph * Co. or Unix System Laboratories, Inc. and are reproduced herein with * the permission of UNIX System Laboratories, Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include <sys/param.h> #include <sys/systm.h> #include <sys/mount.h> #include <sys/vnode.h> #include <sys/namei.h> #include <sys/pool.h> #include <sys/event.h> #include <sys/specdev.h> int filt_generic_readwrite(struct knote *, long); void filt_generic_detach(struct knote *); /* * Eliminate all activity associated with the requested vnode * and with all vnodes aliased to the requested vnode.
*/ int vop_generic_revoke(void *v) { struct vop_revoke_args *ap = v; struct vnode *vp, *vq; struct proc *p = curproc; #ifdef DIAGNOSTIC if ((ap->a_flags & REVOKEALL) == 0) panic("vop_generic_revoke"); #endif vp = ap->a_vp; while (vp->v_type == VBLK && vp->v_specinfo != NULL && vp->v_specmountpoint != NULL) { struct mount *mp = vp->v_specmountpoint; /* * If we have a mount point associated with the vnode, we must * flush it out now, as to not leave a dangling zombie mount * point laying around in VFS. */ if (!vfs_busy(mp, VB_WRITE|VB_WAIT)) { dounmount(mp, MNT_FORCE | MNT_DOOMED, p); break; } } if (vp->v_flag & VALIASED) { /* * If a vgone (or vclean) is already in progress, * wait until it is done and return. */ mtx_enter(&vnode_mtx); if (vp->v_lflag & VXLOCK) { vp->v_lflag |= VXWANT; msleep_nsec(vp, &vnode_mtx, PINOD, "vop_generic_revokeall", INFSLP); mtx_leave(&vnode_mtx); return(0); } /* * Ensure that vp will not be vgone'd while we * are eliminating its aliases. */ vp->v_lflag |= VXLOCK; mtx_leave(&vnode_mtx); while (vp->v_flag & VALIASED) { SLIST_FOREACH(vq, vp->v_hashchain, v_specnext) { if (vq->v_rdev != vp->v_rdev || vq->v_type != vp->v_type || vp == vq) continue; vgonel(vq, p); break; } } /* * Remove the lock so that vgone below will * really eliminate the vnode after which time * vgone will awaken any sleepers. */ mtx_enter(&vnode_mtx); vp->v_lflag &= ~VXLOCK; mtx_leave(&vnode_mtx); } vgonel(vp, p); return (0); } int vop_generic_badop(void *v) { panic("%s", __func__); } int vop_generic_bmap(void *v) { struct vop_bmap_args *ap = v; if (ap->a_vpp) *ap->a_vpp = ap->a_vp; if (ap->a_bnp) *ap->a_bnp = ap->a_bn; if (ap->a_runp) *ap->a_runp = 0; return (0); } int vop_generic_bwrite(void *v) { struct vop_bwrite_args *ap = v; return (bwrite(ap->a_bp)); } int vop_generic_abortop(void *v) { struct vop_abortop_args *ap = v; if ((ap->a_cnp->cn_flags & (HASBUF | SAVESTART)) == HASBUF) pool_put(&namei_pool, ap->a_cnp->cn_pnbuf); return (0); } const struct filterops generic_filtops = { .f_flags = FILTEROP_ISFD, .f_attach = NULL, .f_detach = filt_generic_detach, .f_event = filt_generic_readwrite, }; int vop_generic_kqfilter(void *v) { struct vop_kqfilter_args *ap = v; struct knote *kn = ap->a_kn; switch (kn->kn_filter) { case EVFILT_READ: case EVFILT_WRITE: kn->kn_fop = &generic_filtops; break; default: return (EINVAL); } return (0); } /* Trivial lookup routine that always fails. */ int vop_generic_lookup(void *v) { struct vop_lookup_args *ap = v; *ap->a_vpp = NULL; return (ENOTDIR); } void filt_generic_detach(struct knote *kn) { } int filt_generic_readwrite(struct knote *kn, long hint) { /* * filesystem is gone, so set the EOF flag and schedule * the knote for deletion. */ if (hint == NOTE_REVOKE) { kn->kn_flags |= (EV_EOF | EV_ONESHOT); return (1); } kn->kn_data = 0; return (1); }
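/*
 * Illustrative aside (not from the file above): vop_generic_kqfilter() follows
 * a common pattern, mapping a requested filter type onto a table of function
 * pointers and rejecting anything else with EINVAL.  The userspace sketch
 * below mimics that pattern with invented "demo_" types; it is not the
 * kernel's knote/filterops machinery.
 */
#include <errno.h>
#include <stdio.h>

struct demo_note;

struct demo_filtops {
	void	(*f_detach)(struct demo_note *);
	int	(*f_event)(struct demo_note *, long);
};

struct demo_note {
	int				 kn_filter;
	const struct demo_filtops	*kn_fop;
};

#define DEMO_EVFILT_READ	1
#define DEMO_EVFILT_WRITE	2

static void
demo_detach(struct demo_note *kn)
{
	(void)kn;			/* nothing to tear down in this sketch */
}

static int
demo_event(struct demo_note *kn, long hint)
{
	(void)kn;
	(void)hint;
	return (1);			/* always "ready", like the generic filter */
}

static const struct demo_filtops demo_ops = {
	.f_detach = demo_detach,
	.f_event = demo_event,
};

static int
demo_kqfilter(struct demo_note *kn)
{
	switch (kn->kn_filter) {
	case DEMO_EVFILT_READ:
	case DEMO_EVFILT_WRITE:
		kn->kn_fop = &demo_ops;	/* hook up the generic handlers */
		return (0);
	default:
		return (EINVAL);	/* unknown filter type */
	}
}

int
main(void)
{
	struct demo_note kn = { .kn_filter = DEMO_EVFILT_READ, .kn_fop = NULL };
	int error;

	error = demo_kqfilter(&kn);
	if (error == 0)
		printf("event reports %d\n", kn.kn_fop->f_event(&kn, 0));
	else
		printf("attach failed: %d\n", error);
	return (0);
}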
/* $OpenBSD: ip_ipip.c,v 1.98 2022/01/02 22:36:04 jsg Exp $ */ /* * The authors of this code are John Ioannidis (ji@tla.org), * Angelos D. Keromytis (kermit@csd.uch.gr) and * Niels Provos (provos@physnet.uni-hamburg.de). * * The original version of this code was written by John Ioannidis * for BSD/OS in Athens, Greece, in November 1995. * * Ported to OpenBSD and NetBSD, with additional transforms, in December 1996, * by Angelos D. Keromytis. * * Additional transforms and features in 1997 and 1998 by Angelos D. Keromytis * and Niels Provos. * * Additional features in 1999 by Angelos D. Keromytis. * * Copyright (C) 1995, 1996, 1997, 1998, 1999 by John Ioannidis, * Angelos D. Keromytis and Niels Provos. * Copyright (c) 2001, Angelos D. Keromytis. * * Permission to use, copy, and modify this software with or without fee * is hereby granted, provided that this entire notice is included in * all copies of any software which is or includes a copy or * modification of this software. * You may use this code under the GNU public license if you so wish. Please * contribute changes back to the authors under this freer than GPL license * so that we may further the use of strong encryption without limitations to * all.
* * THIS SOFTWARE IS BEING PROVIDED "AS IS", WITHOUT ANY EXPRESS OR * IMPLIED WARRANTY. IN PARTICULAR, NONE OF THE AUTHORS MAKES ANY * REPRESENTATION OR WARRANTY OF ANY KIND CONCERNING THE * MERCHANTABILITY OF THIS SOFTWARE OR ITS FITNESS FOR ANY PARTICULAR * PURPOSE. */ /* * IP-inside-IP processing */ #include "bpfilter.h" #include "gif.h" #include "pf.h" #include <sys/param.h> #include <sys/systm.h> #include <sys/mbuf.h> #include <sys/socket.h> #include <sys/sysctl.h> #include <net/if.h> #include <net/if_types.h> #include <net/if_var.h> #include <net/route.h> #include <net/netisr.h> #include <net/bpf.h> #include <netinet/in.h> #include <netinet/ip.h> #include <netinet/in_pcb.h> #include <netinet/ip_var.h> #include <netinet/ip_ecn.h> #include <netinet/ip_ipip.h> #ifdef MROUTING #include <netinet/ip_mroute.h> #endif #if NPF > 0 #include <net/pfvar.h> #endif #ifdef ENCDEBUG #define DPRINTF(fmt, args...) \ do { \ if (encdebug) \ printf("%s: " fmt "\n", __func__, ## args); \ } while (0) #else #define DPRINTF(fmt, args...) \ do { } while (0) #endif /* * We can control the acceptance of IP4 packets by altering the sysctl * net.inet.ipip.allow value. Zero means drop them, all else is acceptance. */ int ipip_allow = 0; struct cpumem *ipipcounters; void ipip_init(void) { ipipcounters = counters_alloc(ipips_ncounters); } /* * Really only a wrapper for ipip_input_if(), for use with pr_input. */ int ipip_input(struct mbuf **mp, int *offp, int nxt, int af) { struct ifnet *ifp; /* If we do not accept IP-in-IP explicitly, drop. */ if (!ipip_allow && ((*mp)->m_flags & (M_AUTH|M_CONF)) == 0) { DPRINTF("dropped due to policy"); ipipstat_inc(ipips_pdrops); m_freemp(mp); return IPPROTO_DONE; } ifp = if_get((*mp)->m_pkthdr.ph_ifidx); if (ifp == NULL) { m_freemp(mp); return IPPROTO_DONE; } nxt = ipip_input_if(mp, offp, nxt, af, ifp); if_put(ifp); return nxt; } /* * ipip_input gets called when we receive an IP{46} encapsulated packet, * either because we got it at a real interface, or because AH or ESP * were being used in tunnel mode (in which case the ph_ifidx element * will contain the index of the encX interface associated with the * tunnel. */ int ipip_input_if(struct mbuf **mp, int *offp, int proto, int oaf, struct ifnet *ifp) { struct mbuf *m = *mp; struct sockaddr_in *sin; struct ip *ip; #ifdef INET6 struct sockaddr_in6 *sin6; struct ip6_hdr *ip6; #endif int mode, hlen; u_int8_t itos, otos; sa_family_t iaf; ipipstat_inc(ipips_ipackets); switch (oaf) { case AF_INET: hlen = sizeof(struct ip); break; #ifdef INET6 case AF_INET6: hlen = sizeof(struct ip6_hdr); break; #endif default: unhandled_af(oaf); } /* Bring the IP header in the first mbuf, if not there already */ if (m->m_len < hlen) { if ((m = *mp = m_pullup(m, hlen)) == NULL) { DPRINTF("m_pullup() failed"); ipipstat_inc(ipips_hdrops); goto bad; } } /* Keep outer ecn field. */ switch (oaf) { case AF_INET: ip = mtod(m, struct ip *); otos = ip->ip_tos; break; #ifdef INET6 case AF_INET6: ip6 = mtod(m, struct ip6_hdr *); otos = (ntohl(ip6->ip6_flow) >> 20) & 0xff; break; #endif } /* Remove outer IP header */ KASSERT(*offp > 0); m_adj(m, *offp); *offp = 0; ip = NULL; #ifdef INET6 ip6 = NULL; #endif switch (proto) { case IPPROTO_IPV4: hlen = sizeof(struct ip); break; #ifdef INET6 case IPPROTO_IPV6: hlen = sizeof(struct ip6_hdr); break; #endif default: ipipstat_inc(ipips_family); goto bad; } /* Sanity check */ if (m->m_pkthdr.len < hlen) { ipipstat_inc(ipips_hdrops); goto bad; } /* * Bring the inner header into the first mbuf, if not there already. 
*/ if (m->m_len < hlen) { if ((m = *mp = m_pullup(m, hlen)) == NULL) { DPRINTF("m_pullup() failed"); ipipstat_inc(ipips_hdrops); goto bad; } } /* * RFC 1853 specifies that the inner TTL should not be touched on * decapsulation. There's no reason this comment should be here, but * this is as good as any a position. */ /* Some sanity checks in the inner IP header */ switch (proto) { case IPPROTO_IPV4: iaf = AF_INET; ip = mtod(m, struct ip *); hlen = ip->ip_hl << 2; if (m->m_pkthdr.len < hlen) { ipipstat_inc(ipips_hdrops); goto bad; } itos = ip->ip_tos; mode = m->m_flags & (M_AUTH|M_CONF) ? ECN_ALLOWED_IPSEC : ECN_ALLOWED; if (!ip_ecn_egress(mode, &otos, &itos)) { DPRINTF("ip_ecn_egress() failed"); ipipstat_inc(ipips_pdrops); goto bad; } /* re-calculate the checksum if ip_tos was changed */ if (itos != ip->ip_tos) ip_tos_patch(ip, itos); break; #ifdef INET6 case IPPROTO_IPV6: iaf = AF_INET6; ip6 = mtod(m, struct ip6_hdr *); itos = (ntohl(ip6->ip6_flow) >> 20) & 0xff; if (!ip_ecn_egress(ECN_ALLOWED, &otos, &itos)) { DPRINTF("ip_ecn_egress() failed"); ipipstat_inc(ipips_pdrops); goto bad; } ip6->ip6_flow &= ~htonl(0xff << 20); ip6->ip6_flow |= htonl((u_int32_t) itos << 20); break; #endif } /* Check for local address spoofing. */ if (!(ifp->if_flags & IFF_LOOPBACK) && ipip_allow != 2) { struct sockaddr_storage ss; struct rtentry *rt; memset(&ss, 0, sizeof(ss)); if (ip) { sin = (struct sockaddr_in *)&ss; sin->sin_family = AF_INET; sin->sin_len = sizeof(*sin); sin->sin_addr = ip->ip_src; #ifdef INET6 } else if (ip6) { sin6 = (struct sockaddr_in6 *)&ss; sin6->sin6_family = AF_INET6; sin6->sin6_len = sizeof(*sin6); sin6->sin6_addr = ip6->ip6_src; #endif /* INET6 */ } rt = rtalloc(sstosa(&ss), 0, m->m_pkthdr.ph_rtableid); if ((rt != NULL) && (rt->rt_flags & RTF_LOCAL)) { ipipstat_inc(ipips_spoof); rtfree(rt); goto bad; } rtfree(rt); } /* Statistics */ ipipstat_add(ipips_ibytes, m->m_pkthdr.len - hlen); #if NBPFILTER > 0 && NGIF > 0 if (ifp->if_type == IFT_GIF && ifp->if_bpf != NULL) bpf_mtap_af(ifp->if_bpf, iaf, m, BPF_DIRECTION_IN); #endif #if NPF > 0 pf_pkt_addr_changed(m); #endif /* * Interface pointer stays the same; if no IPsec processing has * been done (or will be done), this will point to a normal * interface. Otherwise, it'll point to an enc interface, which * will allow a packet filter to distinguish between secure and * untrusted packets. */ switch (proto) { case IPPROTO_IPV4: return ip_input_if(mp, offp, proto, oaf, ifp); #ifdef INET6 case IPPROTO_IPV6: return ip6_input_if(mp, offp, proto, oaf, ifp); #endif } bad: m_freemp(mp); return IPPROTO_DONE; } int ipip_output(struct mbuf **mp, struct tdb *tdb) { struct mbuf *m = *mp; u_int8_t tp, otos, itos; u_int64_t obytes; struct ip *ipo; #ifdef INET6 struct ip6_hdr *ip6, *ip6o; #endif /* INET6 */ #ifdef ENCDEBUG char buf[INET6_ADDRSTRLEN]; #endif int error; /* XXX Deal with empty TDB source/destination addresses. */ m_copydata(m, 0, 1, &tp); tp = (tp >> 4) & 0xff; /* Get the IP version number. 
*/ switch (tdb->tdb_dst.sa.sa_family) { case AF_INET: if (tdb->tdb_src.sa.sa_family != AF_INET || tdb->tdb_src.sin.sin_addr.s_addr == INADDR_ANY || tdb->tdb_dst.sin.sin_addr.s_addr == INADDR_ANY) { DPRINTF("unspecified tunnel endpoint address " "in SA %s/%08x", ipsp_address(&tdb->tdb_dst, buf, sizeof(buf)), ntohl(tdb->tdb_spi)); ipipstat_inc(ipips_unspec); error = EINVAL; goto drop; } M_PREPEND(*mp, sizeof(struct ip), M_DONTWAIT); if (*mp == NULL) { DPRINTF("M_PREPEND failed"); ipipstat_inc(ipips_hdrops); error = ENOBUFS; goto drop; } m = *mp; ipo = mtod(m, struct ip *); ipo->ip_v = IPVERSION; ipo->ip_hl = 5; ipo->ip_len = htons(m->m_pkthdr.len); ipo->ip_ttl = ip_defttl; ipo->ip_sum = 0; ipo->ip_src = tdb->tdb_src.sin.sin_addr; ipo->ip_dst = tdb->tdb_dst.sin.sin_addr; /* * We do the htons() to prevent snoopers from determining our * endianness. */ ipo->ip_id = htons(ip_randomid()); /* If the inner protocol is IP... */ if (tp == IPVERSION) { /* Save ECN notification */ m_copydata(m, sizeof(struct ip) + offsetof(struct ip, ip_tos), sizeof(u_int8_t), (caddr_t) &itos); ipo->ip_p = IPPROTO_IPIP; /* * We should be keeping tunnel soft-state and * send back ICMPs if needed. */ m_copydata(m, sizeof(struct ip) + offsetof(struct ip, ip_off), sizeof(u_int16_t), (caddr_t) &ipo->ip_off); ipo->ip_off = ntohs(ipo->ip_off); ipo->ip_off &= ~(IP_DF | IP_MF | IP_OFFMASK); ipo->ip_off = htons(ipo->ip_off); } #ifdef INET6 else if (tp == (IPV6_VERSION >> 4)) { u_int32_t itos32; /* Save ECN notification. */ m_copydata(m, sizeof(struct ip) + offsetof(struct ip6_hdr, ip6_flow), sizeof(u_int32_t), (caddr_t) &itos32); itos = ntohl(itos32) >> 20; ipo->ip_p = IPPROTO_IPV6; ipo->ip_off = 0; } #endif /* INET6 */ else { ipipstat_inc(ipips_family); error = EAFNOSUPPORT; goto drop; } otos = 0; ip_ecn_ingress(ECN_ALLOWED, &otos, &itos); ipo->ip_tos = otos; obytes = m->m_pkthdr.len - sizeof(struct ip); if (tdb->tdb_xform->xf_type == XF_IP4) tdb->tdb_cur_bytes += obytes; break; #ifdef INET6 case AF_INET6: if (IN6_IS_ADDR_UNSPECIFIED(&tdb->tdb_dst.sin6.sin6_addr) || tdb->tdb_src.sa.sa_family != AF_INET6 || IN6_IS_ADDR_UNSPECIFIED(&tdb->tdb_src.sin6.sin6_addr)) { DPRINTF("unspecified tunnel endpoint address " "in SA %s/%08x", ipsp_address(&tdb->tdb_dst, buf, sizeof(buf)), ntohl(tdb->tdb_spi)); ipipstat_inc(ipips_unspec); error = EINVAL; goto drop; } /* If the inner protocol is IPv6, clear link local scope */ if (tp == (IPV6_VERSION >> 4)) { /* scoped address handling */ ip6 = mtod(m, struct ip6_hdr *); if (IN6_IS_SCOPE_EMBED(&ip6->ip6_src)) ip6->ip6_src.s6_addr16[1] = 0; if (IN6_IS_SCOPE_EMBED(&ip6->ip6_dst)) ip6->ip6_dst.s6_addr16[1] = 0; } M_PREPEND(*mp, sizeof(struct ip6_hdr), M_DONTWAIT); if (*mp == NULL) { DPRINTF("M_PREPEND failed"); ipipstat_inc(ipips_hdrops); error = ENOBUFS; goto drop; } m = *mp; /* Initialize IPv6 header */ ip6o = mtod(m, struct ip6_hdr *); ip6o->ip6_flow = 0; ip6o->ip6_vfc &= ~IPV6_VERSION_MASK; ip6o->ip6_vfc |= IPV6_VERSION; ip6o->ip6_plen = htons(m->m_pkthdr.len - sizeof(*ip6o)); ip6o->ip6_hlim = ip_defttl; in6_embedscope(&ip6o->ip6_src, &tdb->tdb_src.sin6, NULL); in6_embedscope(&ip6o->ip6_dst, &tdb->tdb_dst.sin6, NULL); if (tp == IPVERSION) { /* Save ECN notification */ m_copydata(m, sizeof(struct ip6_hdr) + offsetof(struct ip, ip_tos), sizeof(u_int8_t), (caddr_t) &itos); /* This is really IPVERSION. */ ip6o->ip6_nxt = IPPROTO_IPIP; } else if (tp == (IPV6_VERSION >> 4)) { u_int32_t itos32; /* Save ECN notification. 
*/ m_copydata(m, sizeof(struct ip6_hdr) + offsetof(struct ip6_hdr, ip6_flow), sizeof(u_int32_t), (caddr_t) &itos32); itos = ntohl(itos32) >> 20; ip6o->ip6_nxt = IPPROTO_IPV6; } else { ipipstat_inc(ipips_family); error = EAFNOSUPPORT; goto drop; } otos = 0; ip_ecn_ingress(ECN_ALLOWED, &otos, &itos); ip6o->ip6_flow |= htonl((u_int32_t) otos << 20); obytes = m->m_pkthdr.len - sizeof(struct ip6_hdr); if (tdb->tdb_xform->xf_type == XF_IP4) tdb->tdb_cur_bytes += obytes; break; #endif /* INET6 */ default: DPRINTF("unsupported protocol family %d", tdb->tdb_dst.sa.sa_family); ipipstat_inc(ipips_family); error = EPFNOSUPPORT; goto drop; } ipipstat_pkt(ipips_opackets, ipips_obytes, obytes); return 0; drop: m_freemp(mp); return error; } #ifdef IPSEC int ipe4_attach(void) { return 0; } int ipe4_init(struct tdb *tdbp, const struct xformsw *xsp, struct ipsecinit *ii) { tdbp->tdb_xform = xsp; return 0; } int ipe4_zeroize(struct tdb *tdbp) { return 0; } int ipe4_input(struct mbuf **mp, struct tdb *tdb, int hlen, int proto) { /* This is a rather serious mistake, so no conditional printing. */ printf("%s: should never be called\n", __func__); m_freemp(mp); return EINVAL; } #endif /* IPSEC */ int ipip_sysctl_ipipstat(void *oldp, size_t *oldlenp, void *newp) { struct ipipstat ipipstat; CTASSERT(sizeof(ipipstat) == (ipips_ncounters * sizeof(uint64_t))); memset(&ipipstat, 0, sizeof ipipstat); counters_read(ipipcounters, (uint64_t *)&ipipstat, ipips_ncounters); return (sysctl_rdstruct(oldp, oldlenp, newp, &ipipstat, sizeof(ipipstat))); } int ipip_sysctl(int *name, u_int namelen, void *oldp, size_t *oldlenp, void *newp, size_t newlen) { int error; /* All sysctl names at this level are terminal. */ if (namelen != 1) return (ENOTDIR); switch (name[0]) { case IPIPCTL_ALLOW: NET_LOCK(); error = sysctl_int_bounded(oldp, oldlenp, newp, newlen, &ipip_allow, 0, 2); NET_UNLOCK(); return (error); case IPIPCTL_STATS: return (ipip_sysctl_ipipstat(oldp, oldlenp, newp)); default: return (ENOPROTOOPT); } /* NOTREACHED */ }
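/*
 * Illustrative aside (not from the file above): for an AF_INET tunnel
 * endpoint, ipip_output() prepends an outer IPv4 header carrying
 * IPPROTO_IPIP.  The standalone userspace sketch below fills in the same
 * fields to show the layout and byte order; the addresses, TTL and payload
 * length are invented, and arc4random() merely stands in for the kernel's
 * ip_randomid().
 */
#include <sys/types.h>

#include <netinet/in_systm.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <arpa/inet.h>

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(void)
{
	struct ip ipo;
	size_t payload_len = 1400;	/* assumed size of the inner packet */

	memset(&ipo, 0, sizeof(ipo));
	ipo.ip_v = IPVERSION;
	ipo.ip_hl = 5;			/* 20-byte header, no options */
	ipo.ip_len = htons((u_short)(sizeof(ipo) + payload_len));
	ipo.ip_ttl = 64;		/* stand-in for ip_defttl */
	ipo.ip_p = IPPROTO_IPIP;	/* inner packet is IPv4 */
	ipo.ip_id = htons(arc4random() & 0xffff);
	ipo.ip_sum = 0;			/* checksum is computed on output */
	inet_pton(AF_INET, "192.0.2.1", &ipo.ip_src);	 /* example tunnel source */
	inet_pton(AF_INET, "198.51.100.1", &ipo.ip_dst); /* example tunnel destination */

	printf("outer header: proto %d, total length %d\n",
	    ipo.ip_p, ntohs(ipo.ip_len));
	return (0);
}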
/* $OpenBSD: mpls_input.c,v 1.78 2021/07/22 11:07:17 mvs Exp $ */ /* * Copyright (c) 2008 Claudio Jeker <claudio@openbsd.org> * * Permission to use, copy, modify, and distribute this software for any * purpose with or without fee is hereby granted, provided that the above * copyright notice and this permission notice appear in all copies. * * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/ #include <sys/param.h> #include <sys/mbuf.h> #include <sys/systm.h> #include <sys/socket.h> #include <net/if.h> #include <net/if_var.h> #include <net/if_types.h> #include <net/netisr.h> #include <net/route.h> #include <netinet/in.h> #include <netinet/ip.h> #include <netinet/ip_var.h> #include <netinet/ip_icmp.h> #ifdef INET6 #include <netinet/ip6.h> #endif /* INET6 */ #include <netmpls/mpls.h> #ifdef MPLS_DEBUG #define MPLS_LABEL_GET(l) ((ntohl((l) & MPLS_LABEL_MASK)) >> MPLS_LABEL_OFFSET) #define MPLS_TTL_GET(l) (ntohl((l) & MPLS_TTL_MASK)) #endif struct mbuf *mpls_do_error(struct mbuf *, int, int, int); void mpls_input_local(struct rtentry *, struct mbuf *); void mpls_input(struct ifnet *ifp, struct mbuf *m) { struct sockaddr_mpls *smpls; struct sockaddr_mpls sa_mpls; struct shim_hdr *shim; struct rtentry *rt; struct rt_mpls *rt_mpls; uint8_t ttl; int hasbos; if (!ISSET(ifp->if_xflags, IFXF_MPLS)) { m_freem(m); return; } /* drop all broadcast and multicast packets */ if (m->m_flags & (M_BCAST | M_MCAST)) { m_freem(m); return; } if (m->m_len < sizeof(*shim)) { m = m_pullup(m, sizeof(*shim)); if (m == NULL) return; } shim = mtod(m, struct shim_hdr *); #ifdef MPLS_DEBUG printf("mpls_input: iface %s label=%d, ttl=%d BoS %d\n", ifp->if_xname, MPLS_LABEL_GET(shim->shim_label), MPLS_TTL_GET(shim->shim_label), MPLS_BOS_ISSET(shim->shim_label)); #endif /* check and decrement TTL */ ttl = ntohl(shim->shim_label & MPLS_TTL_MASK); if (ttl <= 1) { /* TTL exceeded */ m = mpls_do_error(m, ICMP_TIMXCEED, ICMP_TIMXCEED_INTRANS, 0); if (m == NULL) return; shim = mtod(m, struct shim_hdr *); ttl = ntohl(shim->shim_label & MPLS_TTL_MASK); } else ttl--; hasbos = MPLS_BOS_ISSET(shim->shim_label); bzero(&sa_mpls, sizeof(sa_mpls)); smpls = &sa_mpls; smpls->smpls_family = AF_MPLS; smpls->smpls_len = sizeof(*smpls); smpls->smpls_label = shim->shim_label & MPLS_LABEL_MASK; if (ntohl(smpls->smpls_label) < MPLS_LABEL_RESERVED_MAX) { m = mpls_shim_pop(m); if (m == NULL) return; if (!hasbos) { /* * RFC 4182 relaxes the position of the * explicit NULL labels. They no longer need * to be at the beginning of the stack. * In this case the label is ignored and the decision * is made based on the lower one. 
*/ shim = mtod(m, struct shim_hdr *); smpls->smpls_label = shim->shim_label & MPLS_LABEL_MASK; hasbos = MPLS_BOS_ISSET(shim->shim_label); } else { switch (ntohl(smpls->smpls_label)) { case MPLS_LABEL_IPV4NULL: do_v4: if (mpls_mapttl_ip) { m = mpls_ip_adjttl(m, ttl); if (m == NULL) return; } ipv4_input(ifp, m); return; #ifdef INET6 case MPLS_LABEL_IPV6NULL: do_v6: if (mpls_mapttl_ip6) { m = mpls_ip6_adjttl(m, ttl); if (m == NULL) return; } ipv6_input(ifp, m); return; #endif /* INET6 */ case MPLS_LABEL_IMPLNULL: if (m->m_len < sizeof(u_char) && (m = m_pullup(m, sizeof(u_char))) == NULL) return; switch (*mtod(m, u_char *) >> 4) { case IPVERSION: goto do_v4; #ifdef INET6 case IPV6_VERSION >> 4: goto do_v6; #endif default: m_freem(m); return; } default: /* Other cases are not handled for now */ m_freem(m); return; } } } ifp = NULL; rt = rtalloc(smplstosa(smpls), RT_RESOLVE, m->m_pkthdr.ph_rtableid); if (!rtisvalid(rt)) { /* no entry for this label */ #ifdef MPLS_DEBUG printf("MPLS_DEBUG: label not found\n"); #endif m_freem(m); goto done; } rt_mpls = (struct rt_mpls *)rt->rt_llinfo; if (rt_mpls == NULL || (rt->rt_flags & RTF_MPLS) == 0) { #ifdef MPLS_DEBUG printf("MPLS_DEBUG: no MPLS information attached\n"); #endif m_freem(m); goto done; } switch (rt_mpls->mpls_operation) { case MPLS_OP_POP: if (ISSET(rt->rt_flags, RTF_LOCAL)) { mpls_input_local(rt, m); goto done; } m = mpls_shim_pop(m); if (m == NULL) goto done; if (!hasbos) /* just forward to gw */ break; /* last label popped so decide where to push it to */ ifp = if_get(rt->rt_ifidx); if (ifp == NULL) { m_freem(m); goto done; } KASSERT(rt->rt_gateway); switch(rt->rt_gateway->sa_family) { case AF_INET: if ((m = mpls_ip_adjttl(m, ttl)) == NULL) goto done; break; #ifdef INET6 case AF_INET6: if ((m = mpls_ip6_adjttl(m, ttl)) == NULL) goto done; break; #endif case AF_LINK: break; default: m_freem(m); goto done; } /* shortcut sending out the packet */ if (!ISSET(ifp->if_xflags, IFXF_MPLS)) (*ifp->if_output)(ifp, m, rt->rt_gateway, rt); else (*ifp->if_ll_output)(ifp, m, rt->rt_gateway, rt); goto done; case MPLS_OP_PUSH: /* this does not make much sense but it does not hurt */ m = mpls_shim_push(m, rt_mpls); break; case MPLS_OP_SWAP: m = mpls_shim_swap(m, rt_mpls); break; default: m_freem(m); goto done; } if (m == NULL) goto done; /* refetch label and write back TTL */ shim = mtod(m, struct shim_hdr *); shim->shim_label = (shim->shim_label & ~MPLS_TTL_MASK) | htonl(ttl); ifp = if_get(rt->rt_ifidx); if (ifp == NULL) { m_freem(m); goto done; } #ifdef MPLS_DEBUG printf("MPLS: sending on %s outlabel %x dst af %d in %d out %d\n", ifp->if_xname, ntohl(shim->shim_label), smpls->smpls_family, MPLS_LABEL_GET(smpls->smpls_label), MPLS_LABEL_GET(rt_mpls->mpls_label)); #endif /* Output iface is not MPLS-enabled */ if (!ISSET(ifp->if_xflags, IFXF_MPLS)) { #ifdef MPLS_DEBUG printf("MPLS_DEBUG: interface %s not mpls enabled\n", ifp->if_xname); #endif m_freem(m); goto done; } (*ifp->if_ll_output)(ifp, m, smplstosa(smpls), rt); done: if_put(ifp); rtfree(rt); } void mpls_input_local(struct rtentry *rt, struct mbuf *m) { struct ifnet *ifp; ifp = if_get(rt->rt_ifidx); if (ifp == NULL) { m_freem(m); return; } /* shortcut sending out the packet */ if (!ISSET(ifp->if_xflags, IFXF_MPLS)) (*ifp->if_output)(ifp, m, rt->rt_gateway, rt); else (*ifp->if_ll_output)(ifp, m, rt->rt_gateway, rt); if_put(ifp); } struct mbuf * mpls_ip_adjttl(struct mbuf *m, u_int8_t ttl) { struct ip *ip; uint16_t old, new; uint32_t x; if (m->m_len < sizeof(*ip)) { m = m_pullup(m, sizeof(*ip)); if (m 
== NULL) return (NULL); } ip = mtod(m, struct ip *); old = htons(ip->ip_ttl << 8); new = htons(ttl << 8); x = ip->ip_sum + old - new; ip->ip_ttl = ttl; /* see pf_cksum_fixup() */ ip->ip_sum = (x) + (x >> 16); return (m); } #ifdef INET6 struct mbuf * mpls_ip6_adjttl(struct mbuf *m, u_int8_t ttl) { struct ip6_hdr *ip6; if (m->m_len < sizeof(*ip6)) { m = m_pullup(m, sizeof(*ip6)); if (m == NULL) return (NULL); } ip6 = mtod(m, struct ip6_hdr *); ip6->ip6_hlim = ttl; return (m); } #endif /* INET6 */ struct mbuf * mpls_do_error(struct mbuf *m, int type, int code, int destmtu) { struct shim_hdr stack[MPLS_INKERNEL_LOOP_MAX]; struct sockaddr_mpls sa_mpls; struct sockaddr_mpls *smpls; struct rtentry *rt = NULL; struct shim_hdr *shim; struct in_ifaddr *ia; struct icmp *icp; struct ip *ip; int nstk, error; for (nstk = 0; nstk < MPLS_INKERNEL_LOOP_MAX; nstk++) { if (m->m_len < sizeof(*shim) && (m = m_pullup(m, sizeof(*shim))) == NULL) return (NULL); stack[nstk] = *mtod(m, struct shim_hdr *); m_adj(m, sizeof(*shim)); if (MPLS_BOS_ISSET(stack[nstk].shim_label)) break; } shim = &stack[0]; if (m->m_len < sizeof(u_char) && (m = m_pullup(m, sizeof(u_char))) == NULL) return (NULL); switch (*mtod(m, u_char *) >> 4) { case IPVERSION: if (m->m_len < sizeof(*ip) && (m = m_pullup(m, sizeof(*ip))) == NULL) return (NULL); m = icmp_do_error(m, type, code, 0, destmtu); if (m == NULL) return (NULL); if (icmp_do_exthdr(m, ICMP_EXT_MPLS, 1, stack, (nstk + 1) * sizeof(*shim))) return (NULL); /* set ip_src to something usable, based on the MPLS label */ bzero(&sa_mpls, sizeof(sa_mpls)); smpls = &sa_mpls; smpls->smpls_family = AF_MPLS; smpls->smpls_len = sizeof(*smpls); smpls->smpls_label = shim->shim_label & MPLS_LABEL_MASK; rt = rtalloc(smplstosa(smpls), RT_RESOLVE, 0); if (!rtisvalid(rt)) { rtfree(rt); /* no entry for this label */ m_freem(m); return (NULL); } if (rt->rt_ifa->ifa_addr->sa_family == AF_INET) ia = ifatoia(rt->rt_ifa); else { /* XXX this needs fixing, if the MPLS is on an IP * less interface we need to find some other IP to * use as source. */ rtfree(rt); m_freem(m); return (NULL); } /* It is safe to dereference ``ia'' iff ``rt'' is valid. */ error = icmp_reflect(m, NULL, ia); rtfree(rt); if (error) return (NULL); ip = mtod(m, struct ip *); /* stuff to fix up which is normally done in ip_output */ ip->ip_v = IPVERSION; ip->ip_id = htons(ip_randomid()); ip->ip_sum = 0; ip->ip_sum = in_cksum(m, sizeof(*ip)); /* stolen from icmp_send() */ icp = (struct icmp *)(mtod(m, caddr_t) + sizeof(*ip)); icp->icmp_cksum = 0; icp->icmp_cksum = in4_cksum(m, 0, sizeof(*ip), ntohs(ip->ip_len) - sizeof(*ip)); break; #ifdef INET6 case IPV6_VERSION >> 4: #endif default: m_freem(m); return (NULL); } /* add mpls stack back to new packet */ M_PREPEND(m, (nstk + 1) * sizeof(*shim), M_NOWAIT); if (m == NULL) return (NULL); m_copyback(m, 0, (nstk + 1) * sizeof(*shim), stack, M_NOWAIT); /* change TTL to default */ shim = mtod(m, struct shim_hdr *); shim->shim_label = (shim->shim_label & ~MPLS_TTL_MASK) | htonl(mpls_defttl); return (m); }
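/*
 * Illustrative aside (not from the file above): mpls_input() picks the label,
 * bottom-of-stack bit and TTL out of the 32-bit shim word using
 * MPLS_LABEL_MASK, MPLS_BOS_ISSET() and MPLS_TTL_MASK.  The userspace sketch
 * below decodes the same RFC 3032 layout (20-bit label, 3-bit traffic class,
 * 1 bottom-of-stack bit, 8-bit TTL) from an invented sample word.
 */
#include <arpa/inet.h>

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	/* label 100, TC 0, bottom of stack set, TTL 64, in wire byte order */
	uint32_t shim_label = htonl((100U << 12) | (1U << 8) | 64);
	uint32_t w = ntohl(shim_label);

	unsigned int label = w >> 12;		/* top 20 bits */
	unsigned int tc = (w >> 9) & 0x7;	/* traffic class */
	int hasbos = (w >> 8) & 0x1;		/* last label on the stack? */
	int ttl = w & 0xff;			/* checked and decremented before forwarding */

	printf("label=%u tc=%u bos=%d ttl=%d\n", label, tc, hasbos, ttl);
	return (0);
}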
/* $OpenBSD: cons.c,v 1.30 2022/07/02 08:50:41 visa Exp $ */ /* $NetBSD: cons.c,v 1.30 1996/04/08 19:57:30 jonathan Exp $ */ /* * Copyright (c) 1988 University of Utah. * Copyright (c) 1990, 1993 * The Regents of the University of California. All rights reserved. * * This code is derived from software contributed to Berkeley by * the Systems Programming Group of the University of Utah Computer * Science Department. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * from: Utah $Hdr: cons.c 1.7 92/01/21$ * * @(#)cons.c 8.2 (Berkeley) 1/12/94 */ #include <sys/param.h> #include <sys/systm.h> #include <sys/ioctl.h> #include <sys/tty.h> #include <sys/conf.h> #include <sys/vnode.h> #include <dev/cons.h> struct tty *constty = NULL; /* virtual console output device */ struct vnode *cn_devvp = NULLVP; /* vnode for underlying device. */ int cnopen(dev_t dev, int flag, int mode, struct proc *p) { dev_t cndev; if (cn_tab == NULL) return (0); /* * always open the 'real' console device, so we don't get nailed * later. This follows normal device semantics; they always get * open() calls.
*/ cndev = cn_tab->cn_dev; if (cndev == NODEV) return (ENXIO); #ifdef DIAGNOSTIC if (cndev == dev) panic("cnopen: recursive"); #endif if (cn_devvp == NULLVP) { /* try to get a reference on its vnode, but fail silently */ cdevvp(cndev, &cn_devvp); } return ((*cdevsw[major(cndev)].d_open)(cndev, flag, mode, p)); } int cnclose(dev_t dev, int flag, int mode, struct proc *p) { struct vnode *vp; if (cn_tab == NULL) return (0); /* * If the real console isn't otherwise open, close it. * If it's otherwise open, don't close it, because that'll * screw up others who have it open. */ dev = cn_tab->cn_dev; if (cn_devvp != NULLVP) { /* release our reference to real dev's vnode */ vrele(cn_devvp); cn_devvp = NULLVP; } if (vfinddev(dev, VCHR, &vp) && vcount(vp)) return (0); return ((*cdevsw[major(dev)].d_close)(dev, flag, mode, p)); } int cnread(dev_t dev, struct uio *uio, int flag) { /* * If we would redirect input, punt. This will keep strange * things from happening to people who are using the real * console. Nothing should be using /dev/console for * input (except a shell in single-user mode, but then, * one wouldn't TIOCCONS then). */ if (constty != NULL) return 0; else if (cn_tab == NULL) return ENXIO; dev = cn_tab->cn_dev; return ((*cdevsw[major(dev)].d_read)(dev, uio, flag)); } int cnwrite(dev_t dev, struct uio *uio, int flag) { /* * Redirect output, if that's appropriate. * If there's no real console, return ENXIO. */ if (constty != NULL) dev = constty->t_dev; else if (cn_tab == NULL) return ENXIO; else dev = cn_tab->cn_dev; return ((*cdevsw[major(dev)].d_write)(dev, uio, flag)); } int cnstop(struct tty *tp, int flag) { return (0); } int cnioctl(dev_t dev, u_long cmd, caddr_t data, int flag, struct proc *p) { int error; /* * Superuser can always use this to wrest control of console * output from the "virtual" console. */ if (cmd == TIOCCONS && constty != NULL) { error = suser(p); if (error) return (error); constty = NULL; return (0); } /* * Redirect the ioctl, if that's appropriate. * Note that strange things can happen, if a program does * ioctls on /dev/console, then the console is redirected * out from under it. */ if (constty != NULL) dev = constty->t_dev; else if (cn_tab == NULL) return ENXIO; else dev = cn_tab->cn_dev; return ((*cdevsw[major(dev)].d_ioctl)(dev, cmd, data, flag, p)); } int cnkqfilter(dev_t dev, struct knote *kn) { /* * Redirect output, if that's appropriate. * If there's no real console, return 1. */ if (constty != NULL) dev = constty->t_dev; else if (cn_tab == NULL) return (ENXIO); else dev = cn_tab->cn_dev; if (cdevsw[major(dev)].d_kqfilter) return ((*cdevsw[major(dev)].d_kqfilter)(dev, kn)); return (EOPNOTSUPP); } int cngetc(void) { if (cn_tab == NULL) return (0); return ((*cn_tab->cn_getc)(cn_tab->cn_dev)); } void cnputc(int c) { if (cn_tab == NULL) return; if (c) { (*cn_tab->cn_putc)(cn_tab->cn_dev, c); if (c == '\n') (*cn_tab->cn_putc)(cn_tab->cn_dev, '\r'); } } void cnpollc(int on) { static int refcount = 0; if (cn_tab == NULL) return; if (!on) --refcount; if (refcount == 0) (*cn_tab->cn_pollc)(cn_tab->cn_dev, on); if (on) ++refcount; } void nullcnpollc(dev_t dev, int on) { } void cnbell(u_int pitch, u_int period, u_int volume) { if (cn_tab == NULL || cn_tab->cn_bell == NULL) return; (*cn_tab->cn_bell)(cn_tab->cn_dev, pitch, period, volume); }
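/*
 * Illustrative aside (not from the file above): cnwrite(), cnioctl() and
 * cnkqfilter() all make the same choice: use the virtual console (constty)
 * when output has been redirected, otherwise fall back to the real console
 * (cn_tab), and fail with ENXIO when neither exists.  The userspace sketch
 * below isolates that decision with trimmed "demo_" stand-ins rather than
 * the kernel's struct tty and struct consdev.
 */
#include <errno.h>
#include <stdio.h>

typedef int demo_dev_t;				/* stand-in for dev_t */
#define DEMO_NODEV	(-1)

struct demo_tty { demo_dev_t t_dev; };		/* virtual console */
struct demo_consdev { demo_dev_t cn_dev; };	/* real console */

static int
pick_console(const struct demo_tty *constty, const struct demo_consdev *cn_tab,
    demo_dev_t *devp)
{
	if (constty != NULL)
		*devp = constty->t_dev;		/* output was redirected */
	else if (cn_tab == NULL)
		return (ENXIO);			/* no console at all */
	else
		*devp = cn_tab->cn_dev;
	return (0);
}

int
main(void)
{
	struct demo_consdev real = { .cn_dev = 12 };	/* made-up device number */
	demo_dev_t dev = DEMO_NODEV;
	int error;

	error = pick_console(NULL, &real, &dev);
	printf("error=%d dev=%d\n", error, dev);
	return (0);
}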
/* $OpenBSD: pf_lb.c,v 1.72 2022/08/31 11:29:12 benno Exp $ */ /* * Copyright (c) 2001 Daniel Hartmeier * Copyright (c) 2002 - 2008 Henning Brauer * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * - Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * - Redistributions in binary form must reproduce the above * copyright notice, this list of conditions and the following * disclaimer in the documentation and/or other materials provided * with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE * COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, * BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN * ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. * * Effort sponsored in part by the Defense Advanced Research Projects * Agency (DARPA) and Air Force Research Laboratory, Air Force * Materiel Command, USAF, under agreement number F30602-01-2-0537.
* */ #include "bpfilter.h" #include "pflog.h" #include "pfsync.h" #include "pflow.h" #include <sys/param.h> #include <sys/systm.h> #include <sys/mbuf.h> #include <sys/filio.h> #include <sys/socket.h> #include <sys/socketvar.h> #include <sys/kernel.h> #include <sys/time.h> #include <sys/pool.h> #include <sys/rwlock.h> #include <sys/syslog.h> #include <sys/stdint.h> #include <crypto/siphash.h> #include <net/if.h> #include <net/bpf.h> #include <net/route.h> #include <netinet/in.h> #include <netinet/ip.h> #include <netinet/in_pcb.h> #include <netinet/ip_var.h> #include <netinet/ip_icmp.h> #include <netinet/icmp_var.h> #include <netinet/tcp.h> #include <netinet/tcp_seq.h> #include <netinet/tcp_timer.h> #include <netinet/udp.h> #include <netinet/udp_var.h> #include <netinet/if_ether.h> #ifdef INET6 #include <netinet/ip6.h> #include <netinet/icmp6.h> #endif /* INET6 */ #include <net/pfvar.h> #include <net/pfvar_priv.h> #if NPFLOG > 0 #include <net/if_pflog.h> #endif /* NPFLOG > 0 */ #if NPFLOW > 0 #include <net/if_pflow.h> #endif /* NPFLOW > 0 */ #if NPFSYNC > 0 #include <net/if_pfsync.h> #endif /* NPFSYNC > 0 */ u_int64_t pf_hash(struct pf_addr *, struct pf_addr *, struct pf_poolhashkey *, sa_family_t); int pf_get_sport(struct pf_pdesc *, struct pf_rule *, struct pf_addr *, u_int16_t *, u_int16_t, u_int16_t, struct pf_src_node **); int pf_map_addr_states_increase(sa_family_t, struct pf_pool *, struct pf_addr *); int pf_get_transaddr_af(struct pf_rule *, struct pf_pdesc *, struct pf_src_node **); int pf_map_addr_sticky(sa_family_t, struct pf_rule *, struct pf_addr *, struct pf_addr *, struct pf_src_node **, struct pf_pool *, enum pf_sn_types); u_int64_t pf_hash(struct pf_addr *inaddr, struct pf_addr *hash, struct pf_poolhashkey *key, sa_family_t af) { uint64_t res = 0; #ifdef INET6 union { uint64_t hash64; uint32_t hash32[2]; } h; #endif /* INET6 */ switch (af) { case AF_INET: res = SipHash24((SIPHASH_KEY *)key, &inaddr->addr32[0], sizeof(inaddr->addr32[0])); hash->addr32[0] = res; break; #ifdef INET6 case AF_INET6: res = SipHash24((SIPHASH_KEY *)key, &inaddr->addr32[0], 4 * sizeof(inaddr->addr32[0])); h.hash64 = res; hash->addr32[0] = h.hash32[0]; hash->addr32[1] = h.hash32[1]; /* * siphash isn't big enough, but flipping it around is * good enough here. */ hash->addr32[2] = ~h.hash32[1]; hash->addr32[3] = ~h.hash32[0]; break; #endif /* INET6 */ default: unhandled_af(af); } return (res); } int pf_get_sport(struct pf_pdesc *pd, struct pf_rule *r, struct pf_addr *naddr, u_int16_t *nport, u_int16_t low, u_int16_t high, struct pf_src_node **sn) { struct pf_state_key_cmp key; struct pf_addr init_addr; u_int16_t cut; int dir = (pd->dir == PF_IN) ? 
PF_OUT : PF_IN; int sidx = pd->sidx; int didx = pd->didx; memset(&init_addr, 0, sizeof(init_addr)); if (pf_map_addr(pd->naf, r, &pd->nsaddr, naddr, &init_addr, sn, &r->nat, PF_SN_NAT)) return (1); if (pd->proto == IPPROTO_ICMP) { if (pd->ndport == htons(ICMP_ECHO)) { low = 1; high = 65535; } else return (0); /* Don't try to modify non-echo ICMP */ } #ifdef INET6 if (pd->proto == IPPROTO_ICMPV6) { if (pd->ndport == htons(ICMP6_ECHO_REQUEST)) { low = 1; high = 65535; } else return (0); /* Don't try to modify non-echo ICMP */ } #endif /* INET6 */ do { key.af = pd->naf; key.proto = pd->proto; key.rdomain = pd->rdomain; pf_addrcpy(&key.addr[didx], &pd->ndaddr, key.af); pf_addrcpy(&key.addr[sidx], naddr, key.af); key.port[didx] = pd->ndport; /* * port search; start random, step; * similar 2 portloop in in_pcbbind */ if (!(pd->proto == IPPROTO_TCP || pd->proto == IPPROTO_UDP || pd->proto == IPPROTO_ICMP || pd->proto == IPPROTO_ICMPV6)) { /* XXX bug: icmp states dont use the id on both * XXX sides (traceroute -I through nat) */ key.port[sidx] = pd->nsport; if (pf_find_state_all(&key, dir, NULL) == NULL) { *nport = pd->nsport; return (0); } } else if (low == 0 && high == 0) { key.port[sidx] = pd->nsport; if (pf_find_state_all(&key, dir, NULL) == NULL) { *nport = pd->nsport; return (0); } } else if (low == high) { key.port[sidx] = htons(low); if (pf_find_state_all(&key, dir, NULL) == NULL) { *nport = htons(low); return (0); } } else { u_int32_t tmp; if (low > high) { tmp = low; low = high; high = tmp; } /* low < high */ cut = arc4random_uniform(1 + high - low) + low; /* low <= cut <= high */ for (tmp = cut; tmp <= high && tmp <= 0xffff; ++tmp) { key.port[sidx] = htons(tmp); if (pf_find_state_all(&key, dir, NULL) == NULL && !in_baddynamic(tmp, pd->proto)) { *nport = htons(tmp); return (0); } } tmp = cut; for (tmp -= 1; tmp >= low && tmp <= 0xffff; --tmp) { key.port[sidx] = htons(tmp); if (pf_find_state_all(&key, dir, NULL) == NULL && !in_baddynamic(tmp, pd->proto)) { *nport = htons(tmp); return (0); } } } switch (r->nat.opts & PF_POOL_TYPEMASK) { case PF_POOL_RANDOM: case PF_POOL_ROUNDROBIN: case PF_POOL_LEASTSTATES: /* * pick a different source address since we're out * of free port choices for the current one. */ if (pf_map_addr(pd->naf, r, &pd->nsaddr, naddr, &init_addr, sn, &r->nat, PF_SN_NAT)) return (1); break; case PF_POOL_NONE: case PF_POOL_SRCHASH: case PF_POOL_BITMASK: default: return (1); } } while (! 
PF_AEQ(&init_addr, naddr, pd->naf) ); return (1); /* none available */ } int pf_map_addr_sticky(sa_family_t af, struct pf_rule *r, struct pf_addr *saddr, struct pf_addr *naddr, struct pf_src_node **sns, struct pf_pool *rpool, enum pf_sn_types type) { struct pf_addr *raddr, *rmask, *cached; struct pf_state *s; struct pf_src_node k; int valid; k.af = af; k.type = type; pf_addrcpy(&k.addr, saddr, af); k.rule.ptr = r; pf_status.scounters[SCNT_SRC_NODE_SEARCH]++; sns[type] = RB_FIND(pf_src_tree, &tree_src_tracking, &k); if (sns[type] == NULL) return (-1); /* check if the cached entry is still valid */ cached = &(sns[type])->raddr; valid = 0; if (PF_AZERO(cached, af)) { valid = 1; } else if (rpool->addr.type == PF_ADDR_DYNIFTL) { if (pfr_kentry_byaddr(rpool->addr.p.dyn->pfid_kt, cached, af, 0)) valid = 1; } else if (rpool->addr.type == PF_ADDR_TABLE) { if (pfr_kentry_byaddr(rpool->addr.p.tbl, cached, af, 0)) valid = 1; } else if (rpool->addr.type != PF_ADDR_NOROUTE) { raddr = &rpool->addr.v.a.addr; rmask = &rpool->addr.v.a.mask; valid = pf_match_addr(0, raddr, rmask, cached, af); } if (!valid) { if (pf_status.debug >= LOG_DEBUG) { log(LOG_DEBUG, "pf: pf_map_addr: " "stale src tracking (%u) ", type); pf_print_host(&k.addr, 0, af); addlog(" to "); pf_print_host(cached, 0, af); addlog("\n"); } if (sns[type]->states != 0) { /* XXX expensive */ RB_FOREACH(s, pf_state_tree_id, &tree_id) pf_state_rm_src_node(s, sns[type]); } sns[type]->expire = 1; pf_remove_src_node(sns[type]); sns[type] = NULL; return (-1); } if (!PF_AZERO(cached, af)) { pf_addrcpy(naddr, cached, af); if ((rpool->opts & PF_POOL_TYPEMASK) == PF_POOL_LEASTSTATES && pf_map_addr_states_increase(af, rpool, cached) == -1) return (-1); } if (pf_status.debug >= LOG_DEBUG) { log(LOG_DEBUG, "pf: pf_map_addr: " "src tracking (%u) maps ", type); pf_print_host(&k.addr, 0, af); addlog(" to "); pf_print_host(naddr, 0, af); addlog("\n"); } if (sns[type]->kif != NULL) rpool->kif = sns[type]->kif; return (0); } uint32_t pf_rand_addr(uint32_t mask) { uint32_t addr; mask = ~ntohl(mask); addr = arc4random_uniform(mask + 1); return (htonl(addr)); } int pf_map_addr(sa_family_t af, struct pf_rule *r, struct pf_addr *saddr, struct pf_addr *naddr, struct pf_addr *init_addr, struct pf_src_node **sns, struct pf_pool *rpool, enum pf_sn_types type) { struct pf_addr hash; struct pf_addr faddr; struct pf_addr *raddr = &rpool->addr.v.a.addr; struct pf_addr *rmask = &rpool->addr.v.a.mask; struct pfr_ktable *kt; struct pfi_kif *kif; u_int64_t states; u_int16_t weight; u_int64_t load; u_int64_t cload; u_int64_t hashidx; int cnt; if (sns[type] == NULL && rpool->opts & PF_POOL_STICKYADDR && (rpool->opts & PF_POOL_TYPEMASK) != PF_POOL_NONE && pf_map_addr_sticky(af, r, saddr, naddr, sns, rpool, type) == 0) return (0); if (rpool->addr.type == PF_ADDR_NOROUTE) return (1); if (rpool->addr.type == PF_ADDR_DYNIFTL) { switch (af) { case AF_INET: if (rpool->addr.p.dyn->pfid_acnt4 < 1 && !PF_POOL_DYNTYPE(rpool->opts)) return (1); raddr = &rpool->addr.p.dyn->pfid_addr4; rmask = &rpool->addr.p.dyn->pfid_mask4; break; #ifdef INET6 case AF_INET6: if (rpool->addr.p.dyn->pfid_acnt6 < 1 && !PF_POOL_DYNTYPE(rpool->opts)) return (1); raddr = &rpool->addr.p.dyn->pfid_addr6; rmask = &rpool->addr.p.dyn->pfid_mask6; break; #endif /* INET6 */ default: unhandled_af(af); } } else if (rpool->addr.type == PF_ADDR_TABLE) { if (!PF_POOL_DYNTYPE(rpool->opts)) return (1); /* unsupported */ } else { raddr = &rpool->addr.v.a.addr; rmask = &rpool->addr.v.a.mask; } switch (rpool->opts & PF_POOL_TYPEMASK) { 
case PF_POOL_NONE: pf_addrcpy(naddr, raddr, af); break; case PF_POOL_BITMASK: pf_poolmask(naddr, raddr, rmask, saddr, af); break; case PF_POOL_RANDOM: if (rpool->addr.type == PF_ADDR_TABLE || rpool->addr.type == PF_ADDR_DYNIFTL) { if (rpool->addr.type == PF_ADDR_TABLE) kt = rpool->addr.p.tbl; else kt = rpool->addr.p.dyn->pfid_kt; kt = pfr_ktable_select_active(kt); if (kt == NULL) return (1); cnt = kt->pfrkt_cnt; if (cnt == 0) rpool->tblidx = 0; else rpool->tblidx = (int)arc4random_uniform(cnt); memset(&rpool->counter, 0, sizeof(rpool->counter)); if (pfr_pool_get(rpool, &raddr, &rmask, af)) return (1); pf_addrcpy(naddr, &rpool->counter, af); } else if (init_addr != NULL && PF_AZERO(init_addr, af)) { switch (af) { case AF_INET: rpool->counter.addr32[0] = pf_rand_addr( rmask->addr32[0]); break; #ifdef INET6 case AF_INET6: if (rmask->addr32[3] != 0xffffffff) rpool->counter.addr32[3] = pf_rand_addr( rmask->addr32[3]); else break; if (rmask->addr32[2] != 0xffffffff) rpool->counter.addr32[2] = pf_rand_addr( rmask->addr32[2]); else break; if (rmask->addr32[1] != 0xffffffff) rpool->counter.addr32[1] = pf_rand_addr( rmask->addr32[1]); else break; if (rmask->addr32[0] != 0xffffffff) rpool->counter.addr32[0] = pf_rand_addr( rmask->addr32[0]); break; #endif /* INET6 */ default: unhandled_af(af); } pf_poolmask(naddr, raddr, rmask, &rpool->counter, af); pf_addrcpy(init_addr, naddr, af); } else { pf_addr_inc(&rpool->counter, af); pf_poolmask(naddr, raddr, rmask, &rpool->counter, af); } break; case PF_POOL_SRCHASH: hashidx = pf_hash(saddr, &hash, &rpool->key, af); if (rpool->addr.type == PF_ADDR_TABLE || rpool->addr.type == PF_ADDR_DYNIFTL) { if (rpool->addr.type == PF_ADDR_TABLE) kt = rpool->addr.p.tbl; else kt = rpool->addr.p.dyn->pfid_kt; kt = pfr_ktable_select_active(kt); if (kt == NULL) return (1); cnt = kt->pfrkt_cnt; if (cnt == 0) rpool->tblidx = 0; else rpool->tblidx = (int)(hashidx % cnt); memset(&rpool->counter, 0, sizeof(rpool->counter)); if (pfr_pool_get(rpool, &raddr, &rmask, af)) return (1); pf_addrcpy(naddr, &rpool->counter, af); } else { pf_poolmask(naddr, raddr, rmask, &hash, af); } break; case PF_POOL_ROUNDROBIN: if (rpool->addr.type == PF_ADDR_TABLE || rpool->addr.type == PF_ADDR_DYNIFTL) { if (pfr_pool_get(rpool, &raddr, &rmask, af)) { /* * reset counter in case its value * has been removed from the pool. */ memset(&rpool->counter, 0, sizeof(rpool->counter)); if (pfr_pool_get(rpool, &raddr, &rmask, af)) return (1); } } else if (PF_AZERO(&rpool->counter, af)) { /* * fall back to POOL_NONE if there is a single host * address in pool. 
*/ if (af == AF_INET && rmask->addr32[0] == INADDR_BROADCAST) { pf_addrcpy(naddr, raddr, af); break; } #ifdef INET6 if (af == AF_INET6 && IN6_ARE_ADDR_EQUAL(&rmask->v6, &in6mask128)) { pf_addrcpy(naddr, raddr, af); break; } #endif } else if (pf_match_addr(0, raddr, rmask, &rpool->counter, af)) return (1); /* iterate over table if it contains entries which are weighted */ if ((rpool->addr.type == PF_ADDR_TABLE && rpool->addr.p.tbl->pfrkt_refcntcost > 0) || (rpool->addr.type == PF_ADDR_DYNIFTL && rpool->addr.p.dyn->pfid_kt->pfrkt_refcntcost > 0)) { do { if (rpool->addr.type == PF_ADDR_TABLE || rpool->addr.type == PF_ADDR_DYNIFTL) { if (pfr_pool_get(rpool, &raddr, &rmask, af)) return (1); } else { log(LOG_ERR, "pf: pf_map_addr: " "weighted RR failure"); return (1); } if (rpool->weight >= rpool->curweight) break; pf_addr_inc(&rpool->counter, af); } while (1); weight = rpool->weight; } pf_poolmask(naddr, raddr, rmask, &rpool->counter, af); if (init_addr != NULL && PF_AZERO(init_addr, af)) pf_addrcpy(init_addr, &rpool->counter, af); pf_addr_inc(&rpool->counter, af); break; case PF_POOL_LEASTSTATES: /* retrieve an address first */ if (rpool->addr.type == PF_ADDR_TABLE || rpool->addr.type == PF_ADDR_DYNIFTL) { if (pfr_pool_get(rpool, &raddr, &rmask, af)) { /* see PF_POOL_ROUNDROBIN */ memset(&rpool->counter, 0, sizeof(rpool->counter)); if (pfr_pool_get(rpool, &raddr, &rmask, af)) return (1); } } else if (pf_match_addr(0, raddr, rmask, &rpool->counter, af)) return (1); states = rpool->states; weight = rpool->weight; kif = rpool->kif; if ((rpool->addr.type == PF_ADDR_TABLE && rpool->addr.p.tbl->pfrkt_refcntcost > 0) || (rpool->addr.type == PF_ADDR_DYNIFTL && rpool->addr.p.dyn->pfid_kt->pfrkt_refcntcost > 0)) load = ((UINT16_MAX * rpool->states) / rpool->weight); else load = states; pf_addrcpy(&faddr, &rpool->counter, af); pf_addrcpy(naddr, &rpool->counter, af); if (init_addr != NULL && PF_AZERO(init_addr, af)) pf_addrcpy(init_addr, naddr, af); /* * iterate *once* over whole table and find destination with * least connection */ do { pf_addr_inc(&rpool->counter, af); if (rpool->addr.type == PF_ADDR_TABLE || rpool->addr.type == PF_ADDR_DYNIFTL) { if (pfr_pool_get(rpool, &raddr, &rmask, af)) return (1); } else if (pf_match_addr(0, raddr, rmask, &rpool->counter, af)) return (1); if ((rpool->addr.type == PF_ADDR_TABLE && rpool->addr.p.tbl->pfrkt_refcntcost > 0) || (rpool->addr.type == PF_ADDR_DYNIFTL && rpool->addr.p.dyn->pfid_kt->pfrkt_refcntcost > 0)) cload = ((UINT16_MAX * rpool->states) / rpool->weight); else cload = rpool->states; /* find lc minimum */ if (cload < load) { states = rpool->states; weight = rpool->weight; kif = rpool->kif; load = cload; pf_addrcpy(naddr, &rpool->counter, af); if (init_addr != NULL && PF_AZERO(init_addr, af)) pf_addrcpy(init_addr, naddr, af); } } while (pf_match_addr(1, &faddr, rmask, &rpool->counter, af) && (states > 0)); if (pf_map_addr_states_increase(af, rpool, naddr) == -1) return (1); /* revert the kif which was set by pfr_pool_get() */ rpool->kif = kif; break; } if (rpool->opts & PF_POOL_STICKYADDR) { if (sns[type] != NULL) { pf_remove_src_node(sns[type]); sns[type] = NULL; } if (pf_insert_src_node(&sns[type], r, type, af, saddr, naddr, rpool->kif)) return (1); } if (pf_status.debug >= LOG_INFO && (rpool->opts & PF_POOL_TYPEMASK) != PF_POOL_NONE) { log(LOG_INFO, "pf: pf_map_addr: selected address "); pf_print_host(naddr, 0, af); if ((rpool->opts & PF_POOL_TYPEMASK) == PF_POOL_LEASTSTATES) addlog(" with state count %llu", states); if ((rpool->addr.type == 
PF_ADDR_TABLE && rpool->addr.p.tbl->pfrkt_refcntcost > 0) || (rpool->addr.type == PF_ADDR_DYNIFTL && rpool->addr.p.dyn->pfid_kt->pfrkt_refcntcost > 0)) addlog(" with weight %u", weight); addlog("\n"); } return (0); } int pf_map_addr_states_increase(sa_family_t af, struct pf_pool *rpool, struct pf_addr *naddr) { if (rpool->addr.type == PF_ADDR_TABLE) { if (pfr_states_increase(rpool->addr.p.tbl, naddr, af) == -1) { if (pf_status.debug >= LOG_DEBUG) { log(LOG_DEBUG, "pf: pf_map_addr_states_increase: " "selected address "); pf_print_host(naddr, 0, af); addlog(". Failed to increase count!\n"); } return (-1); } } else if (rpool->addr.type == PF_ADDR_DYNIFTL) { if (pfr_states_increase(rpool->addr.p.dyn->pfid_kt, naddr, af) == -1) { if (pf_status.debug >= LOG_DEBUG) { log(LOG_DEBUG, "pf: pf_map_addr_states_increase: " "selected address "); pf_print_host(naddr, 0, af); addlog(". Failed to increase count!\n"); } return (-1); } } return (0); } int pf_get_transaddr(struct pf_rule *r, struct pf_pdesc *pd, struct pf_src_node **sns, struct pf_rule **nr) { struct pf_addr naddr; u_int16_t nport; #ifdef INET6 if (pd->af != pd->naf) return (pf_get_transaddr_af(r, pd, sns)); #endif /* INET6 */ if (r->nat.addr.type != PF_ADDR_NONE) { /* XXX is this right? what if rtable is changed at the same * XXX time? where do I need to figure out the sport? */ nport = 0; if (pf_get_sport(pd, r, &naddr, &nport, r->nat.proxy_port[0], r->nat.proxy_port[1], sns)) { DPFPRINTF(LOG_NOTICE, "pf: NAT proxy port allocation (%u-%u) failed", r->nat.proxy_port[0], r->nat.proxy_port[1]); return (-1); } *nr = r; pf_addrcpy(&pd->nsaddr, &naddr, pd->af); pd->nsport = nport; } if (r->rdr.addr.type != PF_ADDR_NONE) { if (pf_map_addr(pd->af, r, &pd->nsaddr, &naddr, NULL, sns, &r->rdr, PF_SN_RDR)) return (-1); if ((r->rdr.opts & PF_POOL_TYPEMASK) == PF_POOL_BITMASK) pf_poolmask(&naddr, &naddr, &r->rdr.addr.v.a.mask, &pd->ndaddr, pd->af); nport = 0; if (r->rdr.proxy_port[1]) { u_int32_t tmp_nport; u_int16_t div; div = r->rdr.proxy_port[1] - r->rdr.proxy_port[0] + 1; div = (div == 0) ? 1 : div; tmp_nport = ((ntohs(pd->ndport) - ntohs(r->dst.port[0])) % div) + r->rdr.proxy_port[0]; /* wrap around if necessary */ if (tmp_nport > 65535) tmp_nport -= 65535; nport = htons((u_int16_t)tmp_nport); } else if (r->rdr.proxy_port[0]) nport = htons(r->rdr.proxy_port[0]); *nr = r; pf_addrcpy(&pd->ndaddr, &naddr, pd->af); if (nport) pd->ndport = nport; } return (0); } #ifdef INET6 int pf_get_transaddr_af(struct pf_rule *r, struct pf_pdesc *pd, struct pf_src_node **sns) { struct pf_addr ndaddr, nsaddr, naddr; u_int16_t nport; int prefixlen = 96; if (pf_status.debug >= LOG_INFO) { log(LOG_INFO, "pf: af-to %s %s, ", pd->naf == AF_INET ? "inet" : "inet6", r->rdr.addr.type == PF_ADDR_NONE ? 
"nat" : "rdr"); pf_print_host(&pd->nsaddr, pd->nsport, pd->af); addlog(" -> "); pf_print_host(&pd->ndaddr, pd->ndport, pd->af); addlog("\n"); } if (r->nat.addr.type == PF_ADDR_NONE) panic("pf_get_transaddr_af: no nat pool for source address"); /* get source address and port */ nport = 0; if (pf_get_sport(pd, r, &nsaddr, &nport, r->nat.proxy_port[0], r->nat.proxy_port[1], sns)) { DPFPRINTF(LOG_NOTICE, "pf: af-to NAT proxy port allocation (%u-%u) failed", r->nat.proxy_port[0], r->nat.proxy_port[1]); return (-1); } pd->nsport = nport; if (pd->proto == IPPROTO_ICMPV6 && pd->naf == AF_INET) { if (pd->dir == PF_IN) { pd->ndport = ntohs(pd->ndport); if (pd->ndport == ICMP6_ECHO_REQUEST) pd->ndport = ICMP_ECHO; else if (pd->ndport == ICMP6_ECHO_REPLY) pd->ndport = ICMP_ECHOREPLY; pd->ndport = htons(pd->ndport); } else { pd->nsport = ntohs(pd->nsport); if (pd->nsport == ICMP6_ECHO_REQUEST) pd->nsport = ICMP_ECHO; else if (pd->nsport == ICMP6_ECHO_REPLY) pd->nsport = ICMP_ECHOREPLY; pd->nsport = htons(pd->nsport); } } else if (pd->proto == IPPROTO_ICMP && pd->naf == AF_INET6) { if (pd->dir == PF_IN) { pd->ndport = ntohs(pd->ndport); if (pd->ndport == ICMP_ECHO) pd->ndport = ICMP6_ECHO_REQUEST; else if (pd->ndport == ICMP_ECHOREPLY) pd->ndport = ICMP6_ECHO_REPLY; pd->ndport = htons(pd->ndport); } else { pd->nsport = ntohs(pd->nsport); if (pd->nsport == ICMP_ECHO) pd->nsport = ICMP6_ECHO_REQUEST; else if (pd->nsport == ICMP_ECHOREPLY) pd->nsport = ICMP6_ECHO_REPLY; pd->nsport = htons(pd->nsport); } } /* get the destination address and port */ if (r->rdr.addr.type != PF_ADDR_NONE) { if (pf_map_addr(pd->naf, r, &nsaddr, &naddr, NULL, sns, &r->rdr, PF_SN_RDR)) return (-1); if (r->rdr.proxy_port[0]) pd->ndport = htons(r->rdr.proxy_port[0]); if (pd->naf == AF_INET) { /* The prefix is the IPv4 rdr address */ prefixlen = in_mask2len((struct in_addr *) &r->rdr.addr.v.a.mask); inet_nat46(pd->naf, &pd->ndaddr, &ndaddr, &naddr, prefixlen); } else { /* The prefix is the IPv6 rdr address */ prefixlen = in6_mask2len((struct in6_addr *) &r->rdr.addr.v.a.mask, NULL); inet_nat64(pd->naf, &pd->ndaddr, &ndaddr, &naddr, prefixlen); } } else { if (pd->naf == AF_INET) { /* The prefix is the IPv6 dst address */ prefixlen = in6_mask2len((struct in6_addr *) &r->dst.addr.v.a.mask, NULL); if (prefixlen < 32) prefixlen = 96; inet_nat64(pd->naf, &pd->ndaddr, &ndaddr, &pd->ndaddr, prefixlen); } else { /* * The prefix is the IPv6 nat address * (that was stored in pd->nsaddr) */ prefixlen = in6_mask2len((struct in6_addr *) &r->nat.addr.v.a.mask, NULL); if (prefixlen > 96) prefixlen = 96; inet_nat64(pd->naf, &pd->ndaddr, &ndaddr, &nsaddr, prefixlen); } } pf_addrcpy(&pd->nsaddr, &nsaddr, pd->naf); pf_addrcpy(&pd->ndaddr, &ndaddr, pd->naf); if (pf_status.debug >= LOG_INFO) { log(LOG_INFO, "pf: af-to %s %s done, prefixlen %d, ", pd->naf == AF_INET ? "inet" : "inet6", r->rdr.addr.type == PF_ADDR_NONE ? 
"nat" : "rdr", prefixlen); pf_print_host(&pd->nsaddr, pd->nsport, pd->naf); addlog(" -> "); pf_print_host(&pd->ndaddr, pd->ndport, pd->naf); addlog("\n"); } return (0); } #endif /* INET6 */ int pf_postprocess_addr(struct pf_state *cur) { struct pf_rule *nr; struct pf_state_key *sks; struct pf_pool rpool; struct pf_addr lookup_addr; int slbcount = -1; nr = cur->natrule.ptr; if (nr == NULL) return (0); /* decrease counter */ sks = cur->key[PF_SK_STACK]; /* check for outgoing or ingoing balancing */ if (nr->rt == PF_ROUTETO) lookup_addr = cur->rt_addr; else if (sks != NULL) lookup_addr = sks->addr[1]; else { if (pf_status.debug >= LOG_DEBUG) { log(LOG_DEBUG, "pf: %s: unable to obtain address", __func__); } return (1); } /* check for appropriate pool */ if (nr->rdr.addr.type != PF_ADDR_NONE) rpool = nr->rdr; else if (nr->nat.addr.type != PF_ADDR_NONE) rpool = nr->nat; else if (nr->route.addr.type != PF_ADDR_NONE) rpool = nr->route; else return (0); if (((rpool.opts & PF_POOL_TYPEMASK) != PF_POOL_LEASTSTATES)) return (0); if (rpool.addr.type == PF_ADDR_TABLE) { if ((slbcount = pfr_states_decrease( rpool.addr.p.tbl, &lookup_addr, sks->af)) == -1) { if (pf_status.debug >= LOG_DEBUG) { log(LOG_DEBUG, "pf: %s: selected address ", __func__); pf_print_host(&lookup_addr, sks->port[0], sks->af); addlog(". Failed to " "decrease count!\n"); } return (1); } } else if (rpool.addr.type == PF_ADDR_DYNIFTL) { if ((slbcount = pfr_states_decrease( rpool.addr.p.dyn->pfid_kt, &lookup_addr, sks->af)) == -1) { if (pf_status.debug >= LOG_DEBUG) { log(LOG_DEBUG, "pf: %s: selected address ", __func__); pf_print_host(&lookup_addr, sks->port[0], sks->af); addlog(". Failed to " "decrease count!\n"); } return (1); } } if (slbcount > -1) { if (pf_status.debug >= LOG_INFO) { log(LOG_INFO, "pf: %s: selected address ", __func__); pf_print_host(&lookup_addr, sks->port[0], sks->af); addlog(" decreased state count to %u\n", slbcount); } } return (0); }
/* $OpenBSD: bus_dma.c,v 1.51 2019/06/09 12:52:04 kettenis Exp $ */ /* $NetBSD: bus_dma.c,v 1.3 2003/05/07 21:33:58 fvdl Exp $ */ /*- * Copyright (c) 1996, 1997, 1998 The NetBSD Foundation, Inc. * All rights reserved. * * This code is derived from software contributed to The NetBSD Foundation * by Charles M. Hannum and by Jason R. Thorpe of the Numerical Aerospace * Simulation Facility, NASA Ames Research Center. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2.
Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. */ /* * The following is included because _bus_dma_uiomove is derived from * uiomove() in kern_subr.c. */ /* * Copyright (c) 1982, 1986, 1991, 1993 * The Regents of the University of California. All rights reserved. * (c) UNIX System Laboratories, Inc. * All or some portions of this file are derived from material licensed * to the University of California by American Telephone and Telegraph * Co. or Unix System Laboratories, Inc. and are reproduced herein with * the permission of UNIX System Laboratories, Inc. * * Copyright (c) 1992, 1993 * The Regents of the University of California. All rights reserved. * * This software was developed by the Computer Systems Engineering group * at Lawrence Berkeley Laboratory under DARPA contract BG 91-66 and * contributed to Berkeley. * * All advertising materials mentioning features or use of this software * must display the following acknowledgement: * This product includes software developed by the University of * California, Lawrence Berkeley Laboratory. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. All advertising materials mentioning features or use of this software * must display the following acknowledgement: * This product includes software developed by the University of * California, Berkeley and its contributors. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include <sys/param.h> #include <sys/systm.h> #include <sys/malloc.h> #include <sys/mbuf.h> #include <sys/proc.h> #include <machine/bus.h> #include <uvm/uvm_extern.h> int _bus_dmamap_load_buffer(bus_dma_tag_t, bus_dmamap_t, void *, bus_size_t, struct proc *, int, paddr_t *, int *, int); /* * Common function for DMA map creation. May be called by bus-specific * DMA map creation functions. */ int _bus_dmamap_create(bus_dma_tag_t t, bus_size_t size, int nsegments, bus_size_t maxsegsz, bus_size_t boundary, int flags, bus_dmamap_t *dmamp) { struct bus_dmamap *map; void *mapstore; size_t mapsize; /* * Allocate and initialize the DMA map. The end of the map * is a variable-sized array of segments, so we allocate enough * room for them in one shot. * * Note we don't preserve the WAITOK or NOWAIT flags. Preservation * of ALLOCNOW notifies others that we've reserved these resources, * and they are not to be freed. * * The bus_dmamap_t includes one bus_dma_segment_t, hence * the (nsegments - 1). */ mapsize = sizeof(struct bus_dmamap) + (sizeof(bus_dma_segment_t) * (nsegments - 1)); if ((mapstore = malloc(mapsize, M_DEVBUF, (flags & BUS_DMA_NOWAIT) ? (M_NOWAIT|M_ZERO) : (M_WAITOK|M_ZERO))) == NULL) return (ENOMEM); map = (struct bus_dmamap *)mapstore; map->_dm_size = size; map->_dm_segcnt = nsegments; map->_dm_maxsegsz = maxsegsz; map->_dm_boundary = boundary; map->_dm_flags = flags & ~(BUS_DMA_WAITOK|BUS_DMA_NOWAIT); *dmamp = map; return (0); } /* * Common function for DMA map destruction. May be called by bus-specific * DMA map destruction functions. */ void _bus_dmamap_destroy(bus_dma_tag_t t, bus_dmamap_t map) { size_t mapsize; mapsize = sizeof(struct bus_dmamap) + (sizeof(bus_dma_segment_t) * (map->_dm_segcnt - 1)); free(map, M_DEVBUF, mapsize); } /* * Common function for loading a DMA map with a linear buffer. May * be called by bus-specific DMA map load functions. */ int _bus_dmamap_load(bus_dma_tag_t t, bus_dmamap_t map, void *buf, bus_size_t buflen, struct proc *p, int flags) { bus_addr_t lastaddr = 0; int seg, error; /* * Make sure that on error condition we return "no valid mappings". */ map->dm_mapsize = 0; map->dm_nsegs = 0; if (buflen > map->_dm_size) return (EINVAL); seg = 0; error = _bus_dmamap_load_buffer(t, map, buf, buflen, p, flags, &lastaddr, &seg, 1); if (error == 0) { map->dm_mapsize = buflen; map->dm_nsegs = seg + 1; } return (error); } /* * Like _bus_dmamap_load(), but for mbufs. */ int _bus_dmamap_load_mbuf(bus_dma_tag_t t, bus_dmamap_t map, struct mbuf *m0, int flags) { paddr_t lastaddr = 0; int seg, error, first; struct mbuf *m; /* * Make sure that on error condition we return "no valid mappings". 
*/ map->dm_mapsize = 0; map->dm_nsegs = 0; #ifdef DIAGNOSTIC if ((m0->m_flags & M_PKTHDR) == 0) panic("_bus_dmamap_load_mbuf: no packet header"); #endif if (m0->m_pkthdr.len > map->_dm_size) return (EINVAL); first = 1; seg = 0; error = 0; for (m = m0; m != NULL && error == 0; m = m->m_next) { if (m->m_len == 0) continue; error = _bus_dmamap_load_buffer(t, map, m->m_data, m->m_len, NULL, flags, &lastaddr, &seg, first); first = 0; } if (error == 0) { map->dm_mapsize = m0->m_pkthdr.len; map->dm_nsegs = seg + 1; } return (error); } /* * Like _bus_dmamap_load(), but for uios. */ int _bus_dmamap_load_uio(bus_dma_tag_t t, bus_dmamap_t map, struct uio *uio, int flags) { paddr_t lastaddr = 0; int seg, i, error, first; bus_size_t minlen, resid; struct proc *p = NULL; struct iovec *iov; caddr_t addr; /* * Make sure that on error condition we return "no valid mappings". */ map->dm_mapsize = 0; map->dm_nsegs = 0; resid = uio->uio_resid; iov = uio->uio_iov; if (uio->uio_segflg == UIO_USERSPACE) { p = uio->uio_procp; #ifdef DIAGNOSTIC if (p == NULL) panic("_bus_dmamap_load_uio: USERSPACE but no proc"); #endif } first = 1; seg = 0; error = 0; for (i = 0; i < uio->uio_iovcnt && resid != 0 && error == 0; i++) { /* * Now at the first iovec to load. Load each iovec * until we have exhausted the residual count. */ minlen = resid < iov[i].iov_len ? resid : iov[i].iov_len; addr = (caddr_t)iov[i].iov_base; error = _bus_dmamap_load_buffer(t, map, addr, minlen, p, flags, &lastaddr, &seg, first); first = 0; resid -= minlen; } if (error == 0) { map->dm_mapsize = uio->uio_resid; map->dm_nsegs = seg + 1; } return (error); } /* * Like _bus_dmamap_load(), but for raw memory allocated with * bus_dmamem_alloc(). */ int _bus_dmamap_load_raw(bus_dma_tag_t t, bus_dmamap_t map, bus_dma_segment_t *segs, int nsegs, bus_size_t size, int flags) { bus_addr_t paddr, baddr, bmask, lastaddr = 0; bus_size_t plen, sgsize, mapsize; int first = 1; int i, seg = 0; /* * Make sure that on error condition we return "no valid mappings". */ map->dm_mapsize = 0; map->dm_nsegs = 0; if (nsegs > map->_dm_segcnt || size > map->_dm_size) return (EINVAL); mapsize = size; bmask = ~(map->_dm_boundary - 1); for (i = 0; i < nsegs && size > 0; i++) { paddr = segs[i].ds_addr; plen = MIN(segs[i].ds_len, size); while (plen > 0) { /* * Compute the segment size, and adjust counts. */ sgsize = PAGE_SIZE - ((u_long)paddr & PGOFSET); if (plen < sgsize) sgsize = plen; if (paddr > dma_constraint.ucr_high && (map->_dm_flags & BUS_DMA_64BIT) == 0) panic("Non dma-reachable buffer at paddr %#lx(raw)", paddr); /* * Make sure we don't cross any boundaries. */ if (map->_dm_boundary > 0) { baddr = (paddr + map->_dm_boundary) & bmask; if (sgsize > (baddr - paddr)) sgsize = (baddr - paddr); } /* * Insert chunk into a segment, coalescing with * previous segment if possible. */ if (first) { map->dm_segs[seg].ds_addr = paddr; map->dm_segs[seg].ds_len = sgsize; first = 0; } else { if (paddr == lastaddr && (map->dm_segs[seg].ds_len + sgsize) <= map->_dm_maxsegsz && (map->_dm_boundary == 0 || (map->dm_segs[seg].ds_addr & bmask) == (paddr & bmask))) map->dm_segs[seg].ds_len += sgsize; else { if (++seg >= map->_dm_segcnt) return (EINVAL); map->dm_segs[seg].ds_addr = paddr; map->dm_segs[seg].ds_len = sgsize; } } paddr += sgsize; plen -= sgsize; size -= sgsize; lastaddr = paddr; } } map->dm_mapsize = mapsize; map->dm_nsegs = seg + 1; return (0); } /* * Common function for unloading a DMA map. May be called by * bus-specific DMA map unload functions. 
*/ void _bus_dmamap_unload(bus_dma_tag_t t, bus_dmamap_t map) { /* * No resources to free; just mark the mappings as * invalid. */ map->dm_mapsize = 0; map->dm_nsegs = 0; } /* * Common function for DMA map synchronization. May be called * by bus-specific DMA map synchronization functions. */ void _bus_dmamap_sync(bus_dma_tag_t t, bus_dmamap_t map, bus_addr_t addr, bus_size_t size, int op) { /* Nothing to do here. */ } /* * Common function for DMA-safe memory allocation. May be called * by bus-specific DMA memory allocation functions. */ int _bus_dmamem_alloc(bus_dma_tag_t t, bus_size_t size, bus_size_t alignment, bus_size_t boundary, bus_dma_segment_t *segs, int nsegs, int *rsegs, int flags) { /* * XXX in the presence of decent (working) iommus and bouncebuffers * we can then fallback this allocation to a range of { 0, -1 }. * However for now we err on the side of caution and allocate dma * memory under the 4gig boundary. */ return (_bus_dmamem_alloc_range(t, size, alignment, boundary, segs, nsegs, rsegs, flags, (bus_addr_t)0, (bus_addr_t)0xffffffff)); } /* * Common function for freeing DMA-safe memory. May be called by * bus-specific DMA memory free functions. */ void _bus_dmamem_free(bus_dma_tag_t t, bus_dma_segment_t *segs, int nsegs) { struct vm_page *m; bus_addr_t addr; struct pglist mlist; int curseg; /* * Build a list of pages to free back to the VM system. */ TAILQ_INIT(&mlist); for (curseg = 0; curseg < nsegs; curseg++) { for (addr = segs[curseg].ds_addr; addr < (segs[curseg].ds_addr + segs[curseg].ds_len); addr += PAGE_SIZE) { m = PHYS_TO_VM_PAGE(addr); TAILQ_INSERT_TAIL(&mlist, m, pageq); } } uvm_pglistfree(&mlist); } /* * Common function for mapping DMA-safe memory. May be called by * bus-specific DMA memory map functions. */ int _bus_dmamem_map(bus_dma_tag_t t, bus_dma_segment_t *segs, int nsegs, size_t size, caddr_t *kvap, int flags) { vaddr_t va, sva; size_t ssize; bus_addr_t addr; int curseg, pmapflags = 0, error; const struct kmem_dyn_mode *kd; if (nsegs == 1 && (flags & BUS_DMA_NOCACHE) == 0) { *kvap = (caddr_t)PMAP_DIRECT_MAP(segs[0].ds_addr); return (0); } if (flags & BUS_DMA_NOCACHE) pmapflags |= PMAP_NOCACHE; size = round_page(size); kd = flags & BUS_DMA_NOWAIT ? &kd_trylock : &kd_waitok; va = (vaddr_t)km_alloc(size, &kv_any, &kp_none, kd); if (va == 0) return (ENOMEM); *kvap = (caddr_t)va; sva = va; ssize = size; for (curseg = 0; curseg < nsegs; curseg++) { for (addr = segs[curseg].ds_addr; addr < (segs[curseg].ds_addr + segs[curseg].ds_len); addr += PAGE_SIZE, va += PAGE_SIZE, size -= PAGE_SIZE) { if (size == 0) panic("_bus_dmamem_map: size botch"); error = pmap_enter(pmap_kernel(), va, addr | pmapflags, PROT_READ | PROT_WRITE, PROT_READ | PROT_WRITE | PMAP_WIRED | PMAP_CANFAIL); if (error) { pmap_update(pmap_kernel()); km_free((void *)sva, ssize, &kv_any, &kp_none); return (error); } } } pmap_update(pmap_kernel()); return (0); } /* * Common function for unmapping DMA-safe memory. May be called by * bus-specific DMA memory unmapping functions. */ void _bus_dmamem_unmap(bus_dma_tag_t t, caddr_t kva, size_t size) { #ifdef DIAGNOSTIC if ((u_long)kva & PGOFSET) panic("_bus_dmamem_unmap"); #endif if (kva >= (caddr_t)PMAP_DIRECT_BASE && kva <= (caddr_t)PMAP_DIRECT_END) return; km_free(kva, round_page(size), &kv_any, &kp_none); } /* * Common function for mmap(2)'ing DMA-safe memory. May be called by * bus-specific DMA mmap(2)'ing functions. 
*/ paddr_t _bus_dmamem_mmap(bus_dma_tag_t t, bus_dma_segment_t *segs, int nsegs, off_t off, int prot, int flags) { int i, pmapflags = 0; if (flags & BUS_DMA_NOCACHE) pmapflags |= PMAP_NOCACHE; for (i = 0; i < nsegs; i++) { #ifdef DIAGNOSTIC if (off & PGOFSET) panic("_bus_dmamem_mmap: offset unaligned"); if (segs[i].ds_addr & PGOFSET) panic("_bus_dmamem_mmap: segment unaligned"); if (segs[i].ds_len & PGOFSET) panic("_bus_dmamem_mmap: segment size not multiple" " of page size"); #endif if (off >= segs[i].ds_len) { off -= segs[i].ds_len; continue; } return ((segs[i].ds_addr + off) | pmapflags); } /* Page not found. */ return (-1); } /********************************************************************** * DMA utility functions **********************************************************************/ /* * Utility function to load a linear buffer. lastaddrp holds state * between invocations (for multiple-buffer loads). segp contains * the starting segment on entrance, and the ending segment on exit. * first indicates if this is the first invocation of this function. */ int _bus_dmamap_load_buffer(bus_dma_tag_t t, bus_dmamap_t map, void *buf, bus_size_t buflen, struct proc *p, int flags, paddr_t *lastaddrp, int *segp, int first) { bus_size_t sgsize; bus_addr_t curaddr, lastaddr, baddr, bmask; vaddr_t vaddr = (vaddr_t)buf; int seg; pmap_t pmap; if (p != NULL) pmap = p->p_vmspace->vm_map.pmap; else pmap = pmap_kernel(); lastaddr = *lastaddrp; bmask = ~(map->_dm_boundary - 1); for (seg = *segp; buflen > 0 ; ) { /* * Get the physical address for this segment. */ pmap_extract(pmap, vaddr, (paddr_t *)&curaddr); if (curaddr > dma_constraint.ucr_high && (map->_dm_flags & BUS_DMA_64BIT) == 0) panic("Non dma-reachable buffer at curaddr %#lx(raw)", curaddr); /* * Compute the segment size, and adjust counts. */ sgsize = PAGE_SIZE - ((u_long)vaddr & PGOFSET); if (buflen < sgsize) sgsize = buflen; /* * Make sure we don't cross any boundaries. */ if (map->_dm_boundary > 0) { baddr = (curaddr + map->_dm_boundary) & bmask; if (sgsize > (baddr - curaddr)) sgsize = (baddr - curaddr); } /* * Insert chunk into a segment, coalescing with * previous segment if possible. */ if (first) { map->dm_segs[seg].ds_addr = curaddr; map->dm_segs[seg].ds_len = sgsize; first = 0; } else { if (curaddr == lastaddr && (map->dm_segs[seg].ds_len + sgsize) <= map->_dm_maxsegsz && (map->_dm_boundary == 0 || (map->dm_segs[seg].ds_addr & bmask) == (curaddr & bmask))) map->dm_segs[seg].ds_len += sgsize; else { if (++seg >= map->_dm_segcnt) break; map->dm_segs[seg].ds_addr = curaddr; map->dm_segs[seg].ds_len = sgsize; } } lastaddr = curaddr + sgsize; vaddr += sgsize; buflen -= sgsize; } *segp = seg; *lastaddrp = lastaddr; /* * Did we fit? */ if (buflen != 0) return (EFBIG); /* XXX better return value here? */ return (0); } /* * Allocate physical memory from the given physical address range. * Called by DMA-safe memory allocation methods. */ int _bus_dmamem_alloc_range(bus_dma_tag_t t, bus_size_t size, bus_size_t alignment, bus_size_t boundary, bus_dma_segment_t *segs, int nsegs, int *rsegs, int flags, bus_addr_t low, bus_addr_t high) { paddr_t curaddr, lastaddr; struct vm_page *m; struct pglist mlist; int curseg, error, plaflag; /* Always round the size. */ size = round_page(size); segs[0]._ds_boundary = boundary; segs[0]._ds_align = alignment; /* * Allocate pages from the VM system. */ plaflag = flags & BUS_DMA_NOWAIT ? 
UVM_PLA_NOWAIT : UVM_PLA_WAITOK; if (flags & BUS_DMA_ZERO) plaflag |= UVM_PLA_ZERO; TAILQ_INIT(&mlist); error = uvm_pglistalloc(size, low, high, alignment, boundary, &mlist, nsegs, plaflag); if (error) return (error); /* * Compute the location, size, and number of segments actually * returned by the VM code. */ m = TAILQ_FIRST(&mlist); curseg = 0; lastaddr = segs[curseg].ds_addr = VM_PAGE_TO_PHYS(m); segs[curseg].ds_len = PAGE_SIZE; for (m = TAILQ_NEXT(m, pageq); m != NULL; m = TAILQ_NEXT(m, pageq)) { curaddr = VM_PAGE_TO_PHYS(m); #ifdef DIAGNOSTIC if (curseg == nsegs) { printf("uvm_pglistalloc returned too many\n"); panic("_bus_dmamem_alloc_range"); } if (curaddr < low || curaddr >= high) { printf("uvm_pglistalloc returned non-sensical" " address 0x%lx\n", curaddr); panic("_bus_dmamem_alloc_range"); } #endif if (curaddr == (lastaddr + PAGE_SIZE)) segs[curseg].ds_len += PAGE_SIZE; else { curseg++; segs[curseg].ds_addr = curaddr; segs[curseg].ds_len = PAGE_SIZE; } lastaddr = curaddr; } *rsegs = curseg + 1; return (0); }
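/*
 * Illustration only, not kernel code: a small userland sketch of the segment
 * coalescing rule shared by _bus_dmamap_load_buffer() and
 * _bus_dmamap_load_raw() above.  A chunk is merged into the current segment
 * only if it is physically contiguous, the merged length stays within
 * maxsegsz, and both addresses fall into the same boundary window.  The
 * names coalesce(), struct seg and the addresses in main() are hypothetical.
 */
#include <stdint.h>
#include <stdio.h>

#define DEMO_NSEGS	8

struct seg { uint64_t addr, len; };

/* Returns the number of segments used, or -1 when the chunks do not fit
 * (the real code returns EFBIG/EINVAL).  Assumes nchunks >= 1 and that a
 * nonzero boundary is a power of two, as bus_dma requires. */
static int
coalesce(const uint64_t *caddr, const uint64_t *clen, int nchunks,
    struct seg *segs, uint64_t maxsegsz, uint64_t boundary)
{
	uint64_t bmask = ~(boundary - 1);
	int i, seg = 0;

	segs[0].addr = caddr[0];
	segs[0].len = clen[0];
	for (i = 1; i < nchunks; i++) {
		if (caddr[i] == segs[seg].addr + segs[seg].len &&
		    segs[seg].len + clen[i] <= maxsegsz &&
		    (boundary == 0 ||
		    (segs[seg].addr & bmask) == (caddr[i] & bmask))) {
			segs[seg].len += clen[i];	/* extend current segment */
		} else {
			if (++seg >= DEMO_NSEGS)
				return (-1);
			segs[seg].addr = caddr[i];
			segs[seg].len = clen[i];
		}
	}
	return (seg + 1);
}

int
main(void)
{
	/* three contiguous 4K pages, then a discontiguous fourth page */
	uint64_t addr[] = { 0x10000, 0x11000, 0x12000, 0x20000 };
	uint64_t len[] = { 0x1000, 0x1000, 0x1000, 0x1000 };
	struct seg segs[DEMO_NSEGS];
	int i, n = coalesce(addr, len, 4, segs, 0x10000, 0);

	for (i = 0; i < n; i++)		/* expect two segments */
		printf("seg %d: addr 0x%llx len 0x%llx\n", i,
		    (unsigned long long)segs[i].addr,
		    (unsigned long long)segs[i].len);
	return (0);
}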
/* $OpenBSD: ipifuncs.c,v 1.37 2022/08/07 23:56:06 guenther Exp $ */ /* $NetBSD: ipifuncs.c,v 1.1 2003/04/26 18:39:28 fvdl Exp $ */ /*- * Copyright (c) 2000 The NetBSD Foundation, Inc. * All rights reserved. * * This code is derived from software contributed to The NetBSD Foundation * by RedBack Networks Inc. * * Author: Bill Sommerfeld * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. */ /* * Interprocessor interrupt handlers.
*/ #include <sys/param.h> #include <sys/device.h> #include <sys/memrange.h> #include <sys/systm.h> #include <uvm/uvm_extern.h> #include <machine/intr.h> #include <machine/atomic.h> #include <machine/cpuvar.h> #include <machine/i82093var.h> #include <machine/i82489var.h> #include <machine/fpu.h> #include <machine/mplock.h> #include <machine/db_machdep.h> #include "vmm.h" #if NVMM > 0 #include <machine/vmmvar.h> #endif /* NVMM > 0 */ void x86_64_ipi_nop(struct cpu_info *); void x86_64_ipi_halt(struct cpu_info *); void x86_64_ipi_wbinvd(struct cpu_info *); #if NVMM > 0 void x86_64_ipi_vmclear_vmm(struct cpu_info *); void x86_64_ipi_start_vmm(struct cpu_info *); void x86_64_ipi_stop_vmm(struct cpu_info *); #endif /* NVMM > 0 */ #include "pctr.h" #if NPCTR > 0 #include <machine/pctr.h> #define x86_64_ipi_reload_pctr pctr_reload #else #define x86_64_ipi_reload_pctr NULL #endif #ifdef MTRR void x86_64_ipi_reload_mtrr(struct cpu_info *); #else #define x86_64_ipi_reload_mtrr NULL #endif void (*ipifunc[X86_NIPI])(struct cpu_info *) = { x86_64_ipi_halt, x86_64_ipi_nop, #if NVMM > 0 x86_64_ipi_vmclear_vmm, #else NULL, #endif NULL, x86_64_ipi_reload_pctr, x86_64_ipi_reload_mtrr, x86_setperf_ipi, #ifdef DDB x86_ipi_db, #else NULL, #endif #if NVMM > 0 x86_64_ipi_start_vmm, x86_64_ipi_stop_vmm, #else NULL, NULL, #endif x86_64_ipi_wbinvd, }; void x86_64_ipi_nop(struct cpu_info *ci) { } void x86_64_ipi_halt(struct cpu_info *ci) { SCHED_ASSERT_UNLOCKED(); KASSERT(!_kernel_lock_held()); intr_disable(); lapic_disable(); wbinvd(); atomic_clearbits_int(&ci->ci_flags, CPUF_RUNNING); wbinvd(); for(;;) { __asm volatile("hlt"); } } #ifdef MTRR void x86_64_ipi_reload_mtrr(struct cpu_info *ci) { if (mem_range_softc.mr_op != NULL) mem_range_softc.mr_op->reload(&mem_range_softc); } #endif #if NVMM > 0 void x86_64_ipi_vmclear_vmm(struct cpu_info *ci) { vmclear_on_cpu(ci); } void x86_64_ipi_start_vmm(struct cpu_info *ci) { start_vmm_on_cpu(ci); } void x86_64_ipi_stop_vmm(struct cpu_info *ci) { stop_vmm_on_cpu(ci); } #endif /* NVMM > 0 */ void x86_64_ipi_wbinvd(struct cpu_info *ci) { wbinvd(); }
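/*
 * Illustration only, not kernel code: the ipifunc[] table above is consumed
 * by the machine's IPI handler (not shown in this file), which walks the
 * pending-IPI bitmask and calls one slot per set bit.  The sketch below shows
 * that dispatch pattern in plain userland C; struct fake_cpu, dispatch() and
 * the demo_* names are hypothetical, and the NULL check stands in for "this
 * IPI is never sent when the feature is not configured".
 */
#include <stdio.h>

#define DEMO_NIPI	3

struct fake_cpu { int id; };

static void demo_ipi_nop(struct fake_cpu *ci) { (void)ci; }
static void demo_ipi_halt(struct fake_cpu *ci) { printf("cpu%d: halt\n", ci->id); }

static void (*demo_ipifunc[DEMO_NIPI])(struct fake_cpu *) = {
	demo_ipi_halt,		/* slot 0 */
	demo_ipi_nop,		/* slot 1 */
	NULL,			/* slot 2: not configured in this build */
};

static void
dispatch(struct fake_cpu *ci, unsigned int pending)
{
	int bit;

	for (bit = 0; bit < DEMO_NIPI && pending; bit++) {
		if ((pending & (1U << bit)) == 0)
			continue;
		pending &= ~(1U << bit);
		if (demo_ipifunc[bit] != NULL)
			demo_ipifunc[bit](ci);
	}
}

int
main(void)
{
	struct fake_cpu cpu0 = { 0 };

	/* request slot 0 and slot 2; only slot 0 has a handler */
	dispatch(&cpu0, (1U << 0) | (1U << 2));
	return (0);
}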
/* $OpenBSD: time.h,v 1.62 2022/07/23 22:58:51 cheloha Exp $ */ /* $NetBSD: time.h,v 1.18 1996/04/23 10:29:33 mycroft Exp $ */ /* * Copyright (c) 1982, 1986, 1993 * The Regents of the University of California. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE.
* * @(#)time.h 8.2 (Berkeley) 7/10/94 */ #ifndef _SYS_TIME_H_ #define _SYS_TIME_H_ #include <sys/select.h> #ifndef _TIMEVAL_DECLARED #define _TIMEVAL_DECLARED /* * Structure returned by gettimeofday(2) system call, * and used in other calls. */ struct timeval { time_t tv_sec; /* seconds */ suseconds_t tv_usec; /* and microseconds */ }; #endif #ifndef _TIMESPEC_DECLARED #define _TIMESPEC_DECLARED /* * Structure defined by POSIX.1b to be like a timeval. */ struct timespec { time_t tv_sec; /* seconds */ long tv_nsec; /* and nanoseconds */ }; #endif #define TIMEVAL_TO_TIMESPEC(tv, ts) do { \ (ts)->tv_sec = (tv)->tv_sec; \ (ts)->tv_nsec = (tv)->tv_usec * 1000; \ } while (0) #define TIMESPEC_TO_TIMEVAL(tv, ts) do { \ (tv)->tv_sec = (ts)->tv_sec; \ (tv)->tv_usec = (ts)->tv_nsec / 1000; \ } while (0) struct timezone { int tz_minuteswest; /* minutes west of Greenwich */ int tz_dsttime; /* type of dst correction */ }; #define DST_NONE 0 /* not on dst */ #define DST_USA 1 /* USA style dst */ #define DST_AUST 2 /* Australian style dst */ #define DST_WET 3 /* Western European dst */ #define DST_MET 4 /* Middle European dst */ #define DST_EET 5 /* Eastern European dst */ #define DST_CAN 6 /* Canada */ /* Operations on timevals. */ #define timerclear(tvp) (tvp)->tv_sec = (tvp)->tv_usec = 0 #define timerisset(tvp) ((tvp)->tv_sec || (tvp)->tv_usec) #define timerisvalid(tvp) \ ((tvp)->tv_usec >= 0 && (tvp)->tv_usec < 1000000) #define timercmp(tvp, uvp, cmp) \ (((tvp)->tv_sec == (uvp)->tv_sec) ? \ ((tvp)->tv_usec cmp (uvp)->tv_usec) : \ ((tvp)->tv_sec cmp (uvp)->tv_sec)) #define timeradd(tvp, uvp, vvp) \ do { \ (vvp)->tv_sec = (tvp)->tv_sec + (uvp)->tv_sec; \ (vvp)->tv_usec = (tvp)->tv_usec + (uvp)->tv_usec; \ if ((vvp)->tv_usec >= 1000000) { \ (vvp)->tv_sec++; \ (vvp)->tv_usec -= 1000000; \ } \ } while (0) #define timersub(tvp, uvp, vvp) \ do { \ (vvp)->tv_sec = (tvp)->tv_sec - (uvp)->tv_sec; \ (vvp)->tv_usec = (tvp)->tv_usec - (uvp)->tv_usec; \ if ((vvp)->tv_usec < 0) { \ (vvp)->tv_sec--; \ (vvp)->tv_usec += 1000000; \ } \ } while (0) /* Operations on timespecs. */ #define timespecclear(tsp) (tsp)->tv_sec = (tsp)->tv_nsec = 0 #define timespecisset(tsp) ((tsp)->tv_sec || (tsp)->tv_nsec) #define timespecisvalid(tsp) \ ((tsp)->tv_nsec >= 0 && (tsp)->tv_nsec < 1000000000L) #define timespeccmp(tsp, usp, cmp) \ (((tsp)->tv_sec == (usp)->tv_sec) ? \ ((tsp)->tv_nsec cmp (usp)->tv_nsec) : \ ((tsp)->tv_sec cmp (usp)->tv_sec)) #define timespecadd(tsp, usp, vsp) \ do { \ (vsp)->tv_sec = (tsp)->tv_sec + (usp)->tv_sec; \ (vsp)->tv_nsec = (tsp)->tv_nsec + (usp)->tv_nsec; \ if ((vsp)->tv_nsec >= 1000000000L) { \ (vsp)->tv_sec++; \ (vsp)->tv_nsec -= 1000000000L; \ } \ } while (0) #define timespecsub(tsp, usp, vsp) \ do { \ (vsp)->tv_sec = (tsp)->tv_sec - (usp)->tv_sec; \ (vsp)->tv_nsec = (tsp)->tv_nsec - (usp)->tv_nsec; \ if ((vsp)->tv_nsec < 0) { \ (vsp)->tv_sec--; \ (vsp)->tv_nsec += 1000000000L; \ } \ } while (0) /* * Names of the interval timers, and structure * defining a timer setting. 
*/ #define ITIMER_REAL 0 #define ITIMER_VIRTUAL 1 #define ITIMER_PROF 2 struct itimerval { struct timeval it_interval; /* timer interval */ struct timeval it_value; /* current value */ }; #if __BSD_VISIBLE /* * clock information structure for sysctl({CTL_KERN, KERN_CLOCKRATE}) */ struct clockinfo { int hz; /* clock frequency */ int tick; /* micro-seconds per hz tick */ int stathz; /* statistics clock frequency */ int profhz; /* profiling clock frequency */ }; #endif /* __BSD_VISIBLE */ #if defined(_KERNEL) || defined(_STANDALONE) || defined (_LIBC) #include <sys/_time.h> /* Time expressed as seconds and fractions of a second + operations on it. */ struct bintime { time_t sec; uint64_t frac; }; #endif #if defined(_KERNEL) || defined(_STANDALONE) || defined (_LIBC) #define bintimecmp(btp, ctp, cmp) \ ((btp)->sec == (ctp)->sec ? \ (btp)->frac cmp (ctp)->frac : \ (btp)->sec cmp (ctp)->sec) static inline void bintimeaddfrac(const struct bintime *bt, uint64_t x, struct bintime *ct) { ct->sec = bt->sec; if (bt->frac > bt->frac + x) ct->sec++; ct->frac = bt->frac + x; } static inline void bintimeadd(const struct bintime *bt, const struct bintime *ct, struct bintime *dt) { dt->sec = bt->sec + ct->sec; if (bt->frac > bt->frac + ct->frac) dt->sec++; dt->frac = bt->frac + ct->frac; } static inline void bintimesub(const struct bintime *bt, const struct bintime *ct, struct bintime *dt) { dt->sec = bt->sec - ct->sec; if (bt->frac < bt->frac - ct->frac) dt->sec--; dt->frac = bt->frac - ct->frac; } static inline void TIMECOUNT_TO_BINTIME(u_int count, uint64_t scale, struct bintime *bt) { uint64_t hi64; hi64 = count * (scale >> 32); bt->sec = hi64 >> 32; bt->frac = hi64 << 32; bintimeaddfrac(bt, count * (scale & 0xffffffff), bt); } /*- * Background information: * * When converting between timestamps on parallel timescales of differing * resolutions it is historical and scientific practice to round down rather * than doing 4/5 rounding. * * The date changes at midnight, not at noon. * * Even at 15:59:59.999999999 it's not four'o'clock. * * time_second ticks after N.999999999 not after N.4999999999 */ static inline uint32_t FRAC_TO_NSEC(uint64_t frac) { return ((frac >> 32) * 1000000000ULL) >> 32; } static inline void BINTIME_TO_TIMESPEC(const struct bintime *bt, struct timespec *ts) { ts->tv_sec = bt->sec; ts->tv_nsec = FRAC_TO_NSEC(bt->frac); } static inline void TIMESPEC_TO_BINTIME(const struct timespec *ts, struct bintime *bt) { bt->sec = ts->tv_sec; /* 18446744073 = int(2^64 / 1000000000) */ bt->frac = (uint64_t)ts->tv_nsec * (uint64_t)18446744073ULL; } static inline void BINTIME_TO_TIMEVAL(const struct bintime *bt, struct timeval *tv) { tv->tv_sec = bt->sec; tv->tv_usec = (long)(((uint64_t)1000000 * (uint32_t)(bt->frac >> 32)) >> 32); } static inline void TIMEVAL_TO_BINTIME(const struct timeval *tv, struct bintime *bt) { bt->sec = (time_t)tv->tv_sec; /* 18446744073709 = int(2^64 / 1000000) */ bt->frac = (uint64_t)tv->tv_usec * (uint64_t)18446744073709ULL; } #endif #if defined(_KERNEL) || defined(_STANDALONE) /* * Functions for looking at our clocks: [get]{bin,nano,micro}[boot|up]time() * * Functions without the "get" prefix returns the best timestamp * we can produce in the given format. * * "bin" == struct bintime == seconds + 64 bit fraction of seconds. * "nano" == struct timespec == seconds + nanoseconds. * "micro" == struct timeval == seconds + microseconds. * * Functions containing "up" returns time relative to boot and * should be used for calculating time intervals. 
* * Functions containing "boot" return the GMT time at which the * system booted. * * Functions with just "time" return the current GMT time. * * Functions with the "get" prefix returns a less precise result * much faster than the functions without "get" prefix and should * be used where a precision of 10 msec is acceptable or where * performance is priority. (NB: "precision", _not_ "resolution" !) */ void bintime(struct bintime *); void nanotime(struct timespec *); void microtime(struct timeval *); void getnanotime(struct timespec *); void getmicrotime(struct timeval *); void binuptime(struct bintime *); void nanouptime(struct timespec *); void microuptime(struct timeval *); void getbinuptime(struct bintime *); void getnanouptime(struct timespec *); void getmicrouptime(struct timeval *); void binboottime(struct bintime *); void microboottime(struct timeval *); void nanoboottime(struct timespec *); void binruntime(struct bintime *); void nanoruntime(struct timespec *); time_t gettime(void); time_t getuptime(void); uint64_t nsecuptime(void); uint64_t getnsecuptime(void); struct proc; int clock_gettime(struct proc *, clockid_t, struct timespec *); void cancel_all_itimers(void); int itimerdecr(struct itimerspec *, long); int settime(const struct timespec *); int ratecheck(struct timeval *, const struct timeval *); int ppsratecheck(struct timeval *, int *, int); /* * "POSIX time" to/from "YY/MM/DD/hh/mm/ss" */ struct clock_ymdhms { u_short dt_year; u_char dt_mon; u_char dt_day; u_char dt_wday; /* Day of week */ u_char dt_hour; u_char dt_min; u_char dt_sec; }; time_t clock_ymdhms_to_secs(struct clock_ymdhms *); void clock_secs_to_ymdhms(time_t, struct clock_ymdhms *); /* * BCD to decimal and decimal to BCD. */ #define FROMBCD(x) (((x) >> 4) * 10 + ((x) & 0xf)) #define TOBCD(x) (((x) / 10 * 16) + ((x) % 10)) /* Some handy constants. 
*/ #define SECDAY 86400L #define SECYR (SECDAY * 365) /* Traditional POSIX base year */ #define POSIX_BASE_YEAR 1970 #include <sys/stdint.h> static inline void USEC_TO_TIMEVAL(uint64_t us, struct timeval *tv) { tv->tv_sec = us / 1000000; tv->tv_usec = us % 1000000; } static inline void NSEC_TO_TIMEVAL(uint64_t ns, struct timeval *tv) { tv->tv_sec = ns / 1000000000L; tv->tv_usec = (ns % 1000000000L) / 1000; } static inline uint64_t TIMEVAL_TO_NSEC(const struct timeval *tv) { uint64_t nsecs; if (tv->tv_sec > UINT64_MAX / 1000000000ULL) return UINT64_MAX; nsecs = tv->tv_sec * 1000000000ULL; if (tv->tv_usec * 1000ULL > UINT64_MAX - nsecs) return UINT64_MAX; return nsecs + tv->tv_usec * 1000ULL; } static inline void NSEC_TO_TIMESPEC(uint64_t ns, struct timespec *ts) { ts->tv_sec = ns / 1000000000L; ts->tv_nsec = ns % 1000000000L; } static inline uint64_t SEC_TO_NSEC(uint64_t seconds) { if (seconds > UINT64_MAX / 1000000000ULL) return UINT64_MAX; return seconds * 1000000000ULL; } static inline uint64_t MSEC_TO_NSEC(uint64_t milliseconds) { if (milliseconds > UINT64_MAX / 1000000ULL) return UINT64_MAX; return milliseconds * 1000000ULL; } static inline uint64_t USEC_TO_NSEC(uint64_t microseconds) { if (microseconds > UINT64_MAX / 1000ULL) return UINT64_MAX; return microseconds * 1000ULL; } static inline uint64_t TIMESPEC_TO_NSEC(const struct timespec *ts) { if (ts->tv_sec > (UINT64_MAX - ts->tv_nsec) / 1000000000ULL) return UINT64_MAX; return ts->tv_sec * 1000000000ULL + ts->tv_nsec; } static inline uint64_t BINTIME_TO_NSEC(const struct bintime *bt) { return bt->sec * 1000000000ULL + FRAC_TO_NSEC(bt->frac); } #else /* !_KERNEL */ #include <time.h> #if __BSD_VISIBLE || __XPG_VISIBLE __BEGIN_DECLS #if __BSD_VISIBLE int adjtime(const struct timeval *, struct timeval *); int adjfreq(const int64_t *, int64_t *); #endif #if __XPG_VISIBLE int futimes(int, const struct timeval *); int getitimer(int, struct itimerval *); int gettimeofday(struct timeval *, struct timezone *); int setitimer(int, const struct itimerval *, struct itimerval *); int settimeofday(const struct timeval *, const struct timezone *); int utimes(const char *, const struct timeval *); #endif /* __XPG_VISIBLE */ __END_DECLS #endif /* __BSD_VISIBLE || __XPG_VISIBLE */ #endif /* !_KERNEL */ #endif /* !_SYS_TIME_H_ */
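/*
 * Illustration only, not part of <sys/time.h>: a userland sketch of the
 * bintime arithmetic above.  frac is a fixed-point fraction of a second with
 * 2^64 steps, a carry is detected by checking for unsigned wraparound, and
 * 18446744073 is floor(2^64 / 10^9).  The demo mirrors TIMESPEC_TO_BINTIME,
 * bintimeadd and FRAC_TO_NSEC under hypothetical demo_* names; the printed
 * nanoseconds round down (199999999 ns for 1.2 s), as the "Background
 * information" comment above explains.
 */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

struct demo_bintime { time_t sec; uint64_t frac; };

static uint32_t
demo_frac_to_nsec(uint64_t frac)
{
	/* take the top 32 fraction bits, scale to nanoseconds, truncate */
	return (((frac >> 32) * 1000000000ULL) >> 32);
}

static void
demo_ts_to_bin(const struct timespec *ts, struct demo_bintime *bt)
{
	bt->sec = ts->tv_sec;
	bt->frac = (uint64_t)ts->tv_nsec * 18446744073ULL;	/* 2^64 / 10^9 */
}

static void
demo_bintimeadd(const struct demo_bintime *a, const struct demo_bintime *b,
    struct demo_bintime *c)
{
	c->sec = a->sec + b->sec;
	if (a->frac > a->frac + b->frac)	/* wraparound == carry */
		c->sec++;
	c->frac = a->frac + b->frac;
}

int
main(void)
{
	struct timespec ts = { 0, 600000000 };	/* 0.6 s */
	struct demo_bintime a, b;

	demo_ts_to_bin(&ts, &a);
	demo_bintimeadd(&a, &a, &b);		/* 0.6 s + 0.6 s */
	printf("%lld s + %u ns\n", (long long)b.sec, demo_frac_to_nsec(b.frac));
	return (0);
}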
913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 
/* $OpenBSD: wstpad.c,v 1.31 2022/06/09 22:17:18 bru Exp $ */

/*
 * Copyright (c) 2015, 2016 Ulf Brosziewski
 *
 * Permission to use, copy, modify, and distribute this software for any
 * purpose with or without fee is hereby granted, provided that the above
 * copyright notice and this permission notice appear in all copies.
 *
 * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
 * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
 * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
 * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
 * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
 * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
 * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
 */

/*
 * touchpad input processing
 */

#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/malloc.h>
#include <sys/proc.h>
#include <sys/systm.h>
#include <sys/signalvar.h>
#include <sys/timeout.h>

#include <dev/wscons/wsconsio.h>
#include <dev/wscons/wsmousevar.h>
#include <dev/wscons/wseventvar.h>
#include <dev/wscons/wsmouseinput.h>

#define BTNMASK(n)		((n) > 0 && (n) <= 32 ? \
1 << ((n) - 1) : 0) #define LEFTBTN BTNMASK(1) #define MIDDLEBTN BTNMASK(2) #define RIGHTBTN BTNMASK(3) #define PRIMARYBTN LEFTBTN #define PRIMARYBTN_CLICKED(tp) ((tp)->btns_sync & PRIMARYBTN & (tp)->btns) #define PRIMARYBTN_RELEASED(tp) ((tp)->btns_sync & PRIMARYBTN & ~(tp)->btns) #define IS_MT(tp) ((tp)->features & WSTPAD_MT) #define DISABLE(tp) ((tp)->features & WSTPAD_DISABLE) /* * Ratios to the height or width of the touchpad surface, in * [*.12] fixed-point format: */ #define V_EDGE_RATIO_DEFAULT 205 #define B_EDGE_RATIO_DEFAULT 410 #define T_EDGE_RATIO_DEFAULT 512 #define CENTER_RATIO_DEFAULT 512 #define TAP_MAXTIME_DEFAULT 180 #define TAP_CLICKTIME_DEFAULT 180 #define TAP_LOCKTIME_DEFAULT 0 #define TAP_BTNMAP_SIZE 3 #define CLICKDELAY_MS 20 #define FREEZE_MS 100 #define MATCHINTERVAL_MS 45 #define STOPINTERVAL_MS 55 #define MAG_LOW (10 << 12) #define MAG_MEDIUM (18 << 12) enum tpad_handlers { SOFTBUTTON_HDLR, TOPBUTTON_HDLR, TAP_HDLR, F2SCROLL_HDLR, EDGESCROLL_HDLR, CLICK_HDLR, }; enum tap_state { TAP_DETECT, TAP_IGNORE, TAP_LIFTED, TAP_LOCKED, TAP_LOCKED_DRAG, }; enum tpad_cmd { CLEAR_MOTION_DELTAS, SOFTBUTTON_DOWN, SOFTBUTTON_UP, TAPBUTTON_SYNC, TAPBUTTON_DOWN, TAPBUTTON_UP, VSCROLL, HSCROLL, }; /* * tpad_touch.flags: */ #define L_EDGE (1 << 0) #define R_EDGE (1 << 1) #define T_EDGE (1 << 2) #define B_EDGE (1 << 3) #define THUMB (1 << 4) #define EDGES (L_EDGE | R_EDGE | T_EDGE | B_EDGE) /* * A touch is "centered" if it does not start and remain at the top * edge or one of the vertical edges. Two-finger scrolling and tapping * require that at least one touch is centered. */ #define CENTERED(t) (((t)->flags & (L_EDGE | R_EDGE | T_EDGE)) == 0) enum touchstates { TOUCH_NONE, TOUCH_BEGIN, TOUCH_UPDATE, TOUCH_END, }; struct tpad_touch { u_int flags; enum touchstates state; int x; int y; int dir; struct timespec start; struct timespec match; struct position *pos; struct { int x; int y; struct timespec time; } orig; }; /* * wstpad.features */ #define WSTPAD_SOFTBUTTONS (1 << 0) #define WSTPAD_SOFTMBTN (1 << 1) #define WSTPAD_TOPBUTTONS (1 << 2) #define WSTPAD_TWOFINGERSCROLL (1 << 3) #define WSTPAD_EDGESCROLL (1 << 4) #define WSTPAD_HORIZSCROLL (1 << 5) #define WSTPAD_SWAPSIDES (1 << 6) #define WSTPAD_DISABLE (1 << 7) #define WSTPAD_MT (1 << 31) struct wstpad { u_int features; u_int handlers; /* * t always points into the tpad_touches array, which has at * least one element. If there is more than one, t selects * the pointer-controlling touch. 
*/ struct tpad_touch *t; struct tpad_touch *tpad_touches; u_int mtcycle; u_int ignore; int contacts; int prev_contacts; u_int btns; u_int btns_sync; int ratio; struct timespec time; u_int freeze; struct timespec freeze_ts; /* edge coordinates */ struct { int left; int right; int top; int bottom; int center; int center_left; int center_right; int low; } edge; struct { /* ratios to the surface width or height */ int left_edge; int right_edge; int top_edge; int bottom_edge; int center_width; /* two-finger contacts */ int f2pressure; int f2width; } params; /* handler state and configuration: */ u_int softbutton; u_int sbtnswap; struct { enum tap_state state; int contacts; int valid; u_int pending; u_int button; int masked; int maxdist; struct timeout to; /* parameters: */ struct timespec maxtime; int clicktime; int locktime; u_int btnmap[TAP_BTNMAP_SIZE]; } tap; struct { int dz; int dw; int hdist; int vdist; int mag; } scroll; }; static const struct timespec match_interval = { .tv_sec = 0, .tv_nsec = MATCHINTERVAL_MS * 1000000 }; static const struct timespec stop_interval = { .tv_sec = 0, .tv_nsec = STOPINTERVAL_MS * 1000000 }; /* * Coordinates in the wstpad struct are "normalized" device coordinates, * the orientation is left-to-right and upward. */ static inline int normalize_abs(struct axis_filter *filter, int val) { return (filter->inv ? filter->inv - val : val); } static inline int normalize_rel(struct axis_filter *filter, int val) { return (filter->inv ? -val : val); } /* * Directions of motion are represented by numbers in the range 0 - 11, * corresponding to clockwise counted circle sectors: * * 11 | 0 * 10 | 1 * 9 | 2 * -------+------- * 8 | 3 * 7 | 4 * 6 | 5 * */ /* Tangent constants in [*.12] fixed-point format: */ #define TAN_DEG_60 7094 #define TAN_DEG_30 2365 #define NORTH(d) ((d) == 0 || (d) == 11) #define SOUTH(d) ((d) == 5 || (d) == 6) #define EAST(d) ((d) == 2 || (d) == 3) #define WEST(d) ((d) == 8 || (d) == 9) static inline int direction(int dx, int dy, int ratio) { int rdy, dir = -1; if (dx || dy) { rdy = abs(dy) * ratio; if (abs(dx) * TAN_DEG_60 < rdy) dir = 0; else if (abs(dx) * TAN_DEG_30 < rdy) dir = 1; else dir = 2; if ((dx < 0) != (dy < 0)) dir = 5 - dir; if (dx < 0) dir += 6; } return dir; } static inline int dircmp(int dir1, int dir2) { int diff = abs(dir1 - dir2); return (diff <= 6 ? diff : 12 - diff); } /* * Update direction and timespec attributes for a touch. They are used to * determine whether it is moving - or resting - stably. * * The callers pass touches from the current frame and the touches that are * no longer present in the update cycle to this function. Even though this * ensures that pairs of zero deltas do not result from stale coordinates, * zero deltas do not reset the state immediately. A short time span - the * "stop interval" - must pass before the state is cleared, which is * necessary because some touchpads report intermediate stops when a touch * is moving very slowly. 
*/ void wstpad_set_direction(struct wstpad *tp, struct tpad_touch *t, int dx, int dy) { int dir; struct timespec ts; if (t->state != TOUCH_UPDATE) { t->dir = -1; memcpy(&t->start, &tp->time, sizeof(struct timespec)); return; } dir = direction(dx, dy, tp->ratio); if (dir >= 0) { if (t->dir < 0 || dircmp(dir, t->dir) > 1) { memcpy(&t->start, &tp->time, sizeof(struct timespec)); } t->dir = dir; memcpy(&t->match, &tp->time, sizeof(struct timespec)); } else if (t->dir >= 0) { timespecsub(&tp->time, &t->match, &ts); if (timespeccmp(&ts, &stop_interval, >=)) { t->dir = -1; memcpy(&t->start, &t->match, sizeof(struct timespec)); } } } /* * Make a rough, but quick estimation of the speed of a touch. Its * distance to the previous position is scaled by factors derived * from the average update rate and the deceleration parameter * (filter.dclr). The unit of the result is: * (filter.dclr / 100) device units per millisecond * * Magnitudes are returned in [*.12] fixed-point format. For purposes * of filtering, they are divided into medium and high speeds * (> MAG_MEDIUM), low speeds, and very low speeds (< MAG_LOW). * * The scale factors are not affected if deceleration is turned off. */ static inline int magnitude(struct wsmouseinput *input, int dx, int dy) { int h, v; h = abs(dx) * input->filter.h.mag_scale; v = abs(dy) * input->filter.v.mag_scale; /* Return an "alpha-max-plus-beta-min" approximation: */ return (h >= v ? h + 3 * v / 8 : v + 3 * h / 8); } /* * Treat a touch as stable if it is moving at a medium or high speed, * if it is moving continuously, or if it has stopped for a certain * time span. */ int wstpad_is_stable(struct wsmouseinput *input, struct tpad_touch *t) { struct timespec ts; if (t->dir >= 0) { if (magnitude(input, t->pos->dx, t->pos->dy) > MAG_MEDIUM) return (1); timespecsub(&t->match, &t->start, &ts); } else { timespecsub(&input->tp->time, &t->start, &ts); } return (timespeccmp(&ts, &match_interval, >=)); } /* * If a touch starts in an edge area, pointer movement will be * suppressed as long as it stays in that area. */ static inline u_int edge_flags(struct wstpad *tp, int x, int y) { u_int flags = 0; if (x < tp->edge.left) flags |= L_EDGE; else if (x >= tp->edge.right) flags |= R_EDGE; if (y < tp->edge.bottom) flags |= B_EDGE; else if (y >= tp->edge.top) flags |= T_EDGE; return (flags); } static inline struct tpad_touch * get_2nd_touch(struct wsmouseinput *input) { struct wstpad *tp = input->tp; int slot; if (IS_MT(tp)) { slot = ffs(input->mt.touches & ~(input->mt.ptr | tp->ignore)); if (slot) return &tp->tpad_touches[--slot]; } return NULL; } /* Suppress pointer motion for a short period of time. */ static inline void set_freeze_ts(struct wstpad *tp, int sec, int ms) { tp->freeze_ts.tv_sec = sec; tp->freeze_ts.tv_nsec = ms * 1000000; timespecadd(&tp->time, &tp->freeze_ts, &tp->freeze_ts); } /* Return TRUE if two-finger- or edge-scrolling would be valid. */ int wstpad_scroll_coords(struct wsmouseinput *input, int *dx, int *dy) { struct wstpad *tp = input->tp; if (tp->contacts != tp->prev_contacts || tp->btns || tp->btns_sync) { tp->scroll.dz = 0; tp->scroll.dw = 0; return (0); } if ((input->motion.sync & SYNC_POSITION) == 0) return (0); /* * Try to exclude accidental scroll events by checking whether the * pointer-controlling touch is stable. The check, which may cause * a short delay, is only applied initially, a touch that stops and * resumes scrolling is not affected. 
*/ if (tp->scroll.dz || tp->scroll.dw || wstpad_is_stable(input, tp->t)) { *dx = normalize_rel(&input->filter.h, input->motion.pos.dx); *dy = normalize_rel(&input->filter.v, input->motion.pos.dy); return (*dx || *dy); } return (0); } void wstpad_scroll(struct wstpad *tp, int dx, int dy, int mag, u_int *cmds) { int dz, dw, n = 1; /* * The function applies strong deceleration, but only to input with * very low speeds. A higher threshold might make applications * without support for precision scrolling appear unresponsive. */ mag = tp->scroll.mag = imin(MAG_MEDIUM, (mag + 3 * tp->scroll.mag) / 4); if (mag < MAG_LOW) n = (MAG_LOW - mag) / 4096 + 1; if (dy && tp->scroll.vdist) { if (tp->scroll.dw) { /* * Before switching the axis, wstpad_scroll_coords() * should check again whether the movement is stable. */ tp->scroll.dw = 0; return; } dz = -dy * 4096 / (tp->scroll.vdist * n); if (tp->scroll.dz) { if ((dy < 0) != (tp->scroll.dz > 0)) tp->scroll.dz = -tp->scroll.dz; dz = (dz + 3 * tp->scroll.dz) / 4; } if (dz) { tp->scroll.dz = dz; *cmds |= 1 << VSCROLL; } } else if (dx && tp->scroll.hdist) { if (tp->scroll.dz) { tp->scroll.dz = 0; return; } dw = dx * 4096 / (tp->scroll.hdist * n); if (tp->scroll.dw) { if ((dx > 0) != (tp->scroll.dw > 0)) tp->scroll.dw = -tp->scroll.dw; dw = (dw + 3 * tp->scroll.dw) / 4; } if (dw) { tp->scroll.dw = dw; *cmds |= 1 << HSCROLL; } } } void wstpad_f2scroll(struct wsmouseinput *input, u_int *cmds) { struct wstpad *tp = input->tp; struct tpad_touch *t2; int dir, dx, dy, centered; if (tp->ignore == 0) { if (tp->contacts != 2) return; } else if (tp->contacts != 3 || (tp->ignore == input->mt.ptr)) { return; } if (!wstpad_scroll_coords(input, &dx, &dy)) return; dir = tp->t->dir; if (!(NORTH(dir) || SOUTH(dir))) dy = 0; if (!(EAST(dir) || WEST(dir))) dx = 0; if (dx || dy) { centered = CENTERED(tp->t); if (IS_MT(tp)) { t2 = get_2nd_touch(input); if (t2 == NULL) return; dir = t2->dir; if ((dy > 0 && !NORTH(dir)) || (dy < 0 && !SOUTH(dir))) return; if ((dx > 0 && !EAST(dir)) || (dx < 0 && !WEST(dir))) return; if (!wstpad_is_stable(input, t2) && !(tp->scroll.dz || tp->scroll.dw)) return; centered |= CENTERED(t2); } if (centered) { wstpad_scroll(tp, dx, dy, magnitude(input, dx, dy), cmds); set_freeze_ts(tp, 0, FREEZE_MS); } } } void wstpad_edgescroll(struct wsmouseinput *input, u_int *cmds) { struct wstpad *tp = input->tp; struct tpad_touch *t = tp->t; u_int v_edge, b_edge; int dx, dy; if (!wstpad_scroll_coords(input, &dx, &dy) || tp->contacts != 1) return; v_edge = (tp->features & WSTPAD_SWAPSIDES) ? L_EDGE : R_EDGE; b_edge = (tp->features & WSTPAD_HORIZSCROLL) ? B_EDGE : 0; if ((t->flags & v_edge) == 0) dy = 0; if ((t->flags & b_edge) == 0) dx = 0; if (dx || dy) wstpad_scroll(tp, dx, dy, magnitude(input, dx, dy), cmds); } static inline u_int sbtn(struct wstpad *tp, int x, int y) { if (y >= tp->edge.bottom) return (0); if ((tp->features & WSTPAD_SOFTMBTN) && x >= tp->edge.center_left && x < tp->edge.center_right) return (MIDDLEBTN); return ((x < tp->edge.center ? LEFTBTN : RIGHTBTN) ^ tp->sbtnswap); } static inline u_int top_sbtn(struct wstpad *tp, int x, int y) { if (y < tp->edge.top) return (0); if (x < tp->edge.center_left) return (LEFTBTN ^ tp->sbtnswap); return (x > tp->edge.center_right ? (RIGHTBTN ^ tp->sbtnswap) : MIDDLEBTN); } u_int wstpad_get_sbtn(struct wsmouseinput *input, int top) { struct wstpad *tp = input->tp; struct tpad_touch *t = tp->t; u_int btn; btn = 0; if (tp->contacts) { btn = top ? 
top_sbtn(tp, t->x, t->y) : sbtn(tp, t->x, t->y); /* * If there is no middle-button area, but contacts in both * halves of the edge zone, generate a middle-button event: */ if (btn && IS_MT(tp) && tp->contacts == 2 && !top && !(tp->features & WSTPAD_SOFTMBTN)) { if ((t = get_2nd_touch(input)) != NULL) btn |= sbtn(tp, t->x, t->y); if (btn == (LEFTBTN | RIGHTBTN)) btn = MIDDLEBTN; } } return (btn != PRIMARYBTN ? btn : 0); } void wstpad_softbuttons(struct wsmouseinput *input, u_int *cmds, int hdlr) { struct wstpad *tp = input->tp; int top = (hdlr == TOPBUTTON_HDLR); if (tp->softbutton && PRIMARYBTN_RELEASED(tp)) { *cmds |= 1 << SOFTBUTTON_UP; return; } if (tp->softbutton == 0 && PRIMARYBTN_CLICKED(tp)) { tp->softbutton = wstpad_get_sbtn(input, top); if (tp->softbutton) *cmds |= 1 << SOFTBUTTON_DOWN; } } /* Check whether the duration of t is within the tap limit. */ int wstpad_is_tap(struct wstpad *tp, struct tpad_touch *t) { struct timespec ts; timespecsub(&tp->time, &t->orig.time, &ts); return (timespeccmp(&ts, &tp->tap.maxtime, <)); } /* * At least one MT touch must remain close to its origin and end * in the main area. The same conditions apply to one-finger taps * on single-touch devices. */ void wstpad_tap_filter(struct wstpad *tp, struct tpad_touch *t) { int dx, dy, dist = 0; if (IS_MT(tp) || tp->tap.contacts == 1) { dx = abs(t->x - t->orig.x) << 12; dy = abs(t->y - t->orig.y) * tp->ratio; dist = (dx >= dy ? dx + 3 * dy / 8 : dy + 3 * dx / 8); } tp->tap.valid = (CENTERED(t) && dist <= (tp->tap.maxdist << 12)); } /* * Return the oldest touch in the TOUCH_END state, or NULL. */ struct tpad_touch * wstpad_tap_touch(struct wsmouseinput *input) { struct wstpad *tp = input->tp; struct tpad_touch *s, *t = NULL; u_int lifted; int slot; if (IS_MT(tp)) { lifted = (input->mt.sync[MTS_TOUCH] & ~input->mt.touches); FOREACHBIT(lifted, slot) { s = &tp->tpad_touches[slot]; if (tp->tap.state == TAP_DETECT && !tp->tap.valid) wstpad_tap_filter(tp, s); if (t == NULL || timespeccmp(&t->orig.time, &s->orig.time, >)) t = s; } } else { if (tp->t->state == TOUCH_END) { t = tp->t; if (tp->tap.state == TAP_DETECT && !tp->tap.valid) wstpad_tap_filter(tp, t); } } return (t); } /* Determine the "tap button", keep track of whether a touch is masked. */ u_int wstpad_tap_button(struct wstpad *tp) { int n = tp->tap.contacts - tp->contacts - 1; tp->tap.masked = tp->contacts; return (n >= 0 && n < TAP_BTNMAP_SIZE ? tp->tap.btnmap[n] : 0); } /* * In the hold/drag state, do not mask touches if no masking was involved * in the preceding tap gesture. */ static inline int tap_unmask(struct wstpad *tp) { return ((tp->tap.button || tp->tap.pending) && tp->tap.masked == 0); } /* * In the default configuration, this handler maps one-, two-, and * three-finger taps to left-button, right-button, and middle-button * events, respectively. Setting the LOCKTIME parameter enables * "locked drags", which are finished by a timeout or a tap-to-end * gesture. */ void wstpad_tap(struct wsmouseinput *input, u_int *cmds) { struct wstpad *tp = input->tp; struct tpad_touch *t; int contacts, is_tap, slot, err = 0; /* Synchronize the button states, if necessary. */ if (input->btn.sync) *cmds |= 1 << TAPBUTTON_SYNC; /* * It is possible to produce a click within the tap timeout. * Wait for a new touch before generating new button events. */ if (PRIMARYBTN_RELEASED(tp)) tp->tap.contacts = 0; /* Reset the detection state whenever a new touch starts. 
*/ if (tp->contacts > tp->prev_contacts || (IS_MT(tp) && (input->mt.touches & input->mt.sync[MTS_TOUCH]))) { tp->tap.contacts = tp->contacts; tp->tap.valid = 0; } /* * The filtered number of active touches excludes a masked * touch if its duration exceeds the tap limit. */ contacts = tp->contacts; if ((slot = ffs(input->mt.ptr_mask) - 1) >= 0 && !wstpad_is_tap(tp, &tp->tpad_touches[slot]) && !tap_unmask(tp)) { contacts--; } switch (tp->tap.state) { case TAP_DETECT: /* Find the oldest touch in the TOUCH_END state. */ t = wstpad_tap_touch(input); if (t) { is_tap = wstpad_is_tap(tp, t); if (is_tap && contacts == 0) { if (tp->tap.button) *cmds |= 1 << TAPBUTTON_UP; tp->tap.pending = (tp->tap.valid ? wstpad_tap_button(tp) : 0); if (tp->tap.pending) { tp->tap.state = TAP_LIFTED; err = !timeout_add_msec(&tp->tap.to, CLICKDELAY_MS); } } else if (!is_tap && tp->tap.locktime == 0) { if (contacts == 0 && tp->tap.button) *cmds |= 1 << TAPBUTTON_UP; else if (contacts) tp->tap.state = TAP_IGNORE; } else if (!is_tap && tp->tap.button) { if (contacts == 0) { tp->tap.state = TAP_LOCKED; err = !timeout_add_msec(&tp->tap.to, tp->tap.locktime); } else { tp->tap.state = TAP_LOCKED_DRAG; } } } break; case TAP_IGNORE: if (contacts == 0) { tp->tap.state = TAP_DETECT; if (tp->tap.button) *cmds |= 1 << TAPBUTTON_UP; } break; case TAP_LIFTED: if (contacts) { timeout_del(&tp->tap.to); tp->tap.state = TAP_DETECT; if (tp->tap.pending) *cmds |= 1 << TAPBUTTON_DOWN; } break; case TAP_LOCKED: if (contacts) { timeout_del(&tp->tap.to); tp->tap.state = TAP_LOCKED_DRAG; } break; case TAP_LOCKED_DRAG: if (contacts == 0) { t = wstpad_tap_touch(input); if (t && wstpad_is_tap(tp, t)) { /* "tap-to-end" */ *cmds |= 1 << TAPBUTTON_UP; tp->tap.state = TAP_DETECT; } else { tp->tap.state = TAP_LOCKED; err = !timeout_add_msec(&tp->tap.to, tp->tap.locktime); } } break; } if (err) { /* Did timeout_add fail? */ input->sbtn.buttons &= ~tp->tap.button; input->sbtn.sync |= tp->tap.button; tp->tap.pending = 0; tp->tap.button = 0; tp->tap.state = TAP_DETECT; } } int wstpad_tap_sync(struct wsmouseinput *input) { struct wstpad *tp = input->tp; return ((tp->tap.button & (input->btn.buttons | tp->softbutton)) == 0 || (tp->tap.button == PRIMARYBTN && tp->softbutton)); } void wstpad_tap_timeout(void *p) { struct wsmouseinput *input = p; struct wstpad *tp = input->tp; struct evq_access evq; u_int btn; int s, ev; s = spltty(); evq.evar = *input->evar; if (evq.evar != NULL && tp != NULL) { ev = 0; if (tp->tap.pending) { tp->tap.button = tp->tap.pending; tp->tap.pending = 0; input->sbtn.buttons |= tp->tap.button; timeout_add_msec(&tp->tap.to, tp->tap.clicktime); if (wstpad_tap_sync(input)) { ev = BTN_DOWN_EV; btn = ffs(tp->tap.button) - 1; } } else { if (wstpad_tap_sync(input)) { ev = BTN_UP_EV; btn = ffs(tp->tap.button) - 1; } if (tp->tap.button != tp->softbutton) input->sbtn.buttons &= ~tp->tap.button; tp->tap.button = 0; tp->tap.state = TAP_DETECT; } if (ev) { evq.put = evq.evar->put; evq.result = EVQ_RESULT_NONE; getnanotime(&evq.ts); wsmouse_evq_put(&evq, ev, btn); wsmouse_evq_put(&evq, SYNC_EV, 0); if (evq.result == EVQ_RESULT_SUCCESS) { if (input->flags & LOG_EVENTS) { wsmouse_log_events(input, &evq); } evq.evar->put = evq.put; WSEVENT_WAKEUP(evq.evar); } else { input->sbtn.sync |= tp->tap.button; } } } splx(s); } /* * Suppress accidental pointer movements after a click on a clickpad. 
*/ void wstpad_click(struct wsmouseinput *input) { struct wstpad *tp = input->tp; if (tp->contacts == 1 && (PRIMARYBTN_CLICKED(tp) || PRIMARYBTN_RELEASED(tp))) set_freeze_ts(tp, 0, FREEZE_MS); } /* Translate the "command" bits into the sync-state of wsmouse. */ void wstpad_cmds(struct wsmouseinput *input, u_int cmds) { struct wstpad *tp = input->tp; int n; FOREACHBIT(cmds, n) { switch (n) { case CLEAR_MOTION_DELTAS: input->motion.dx = input->motion.dy = 0; if (input->motion.dz == 0 && input->motion.dw == 0) input->motion.sync &= ~SYNC_DELTAS; continue; case SOFTBUTTON_DOWN: input->btn.sync &= ~PRIMARYBTN; input->sbtn.buttons |= tp->softbutton; if (tp->softbutton != tp->tap.button) input->sbtn.sync |= tp->softbutton; continue; case SOFTBUTTON_UP: input->btn.sync &= ~PRIMARYBTN; if (tp->softbutton != tp->tap.button) { input->sbtn.buttons &= ~tp->softbutton; input->sbtn.sync |= tp->softbutton; } tp->softbutton = 0; continue; case TAPBUTTON_SYNC: if (tp->tap.button) input->btn.sync &= ~tp->tap.button; continue; case TAPBUTTON_DOWN: tp->tap.button = tp->tap.pending; tp->tap.pending = 0; input->sbtn.buttons |= tp->tap.button; if (wstpad_tap_sync(input)) input->sbtn.sync |= tp->tap.button; continue; case TAPBUTTON_UP: if (tp->tap.button != tp->softbutton) input->sbtn.buttons &= ~tp->tap.button; if (wstpad_tap_sync(input)) input->sbtn.sync |= tp->tap.button; tp->tap.button = 0; continue; case HSCROLL: input->motion.dw = tp->scroll.dw; input->motion.sync |= SYNC_DELTAS; continue; case VSCROLL: input->motion.dz = tp->scroll.dz; input->motion.sync |= SYNC_DELTAS; continue; default: printf("[wstpad] invalid cmd %d\n", n); break; } } } /* * Set the state of touches that have ended. TOUCH_END is a transitional * state and will be changed to TOUCH_NONE before process_input() returns. */ static inline void clear_touchstates(struct wsmouseinput *input, enum touchstates state) { u_int touches; int slot; touches = input->mt.sync[MTS_TOUCH] & ~input->mt.touches; FOREACHBIT(touches, slot) input->tp->tpad_touches[slot].state = state; } void wstpad_mt_inputs(struct wsmouseinput *input) { struct wstpad *tp = input->tp; struct tpad_touch *t; int slot, dx, dy; u_int touches, inactive; /* TOUCH_BEGIN */ touches = input->mt.touches & input->mt.sync[MTS_TOUCH]; FOREACHBIT(touches, slot) { t = &tp->tpad_touches[slot]; t->state = TOUCH_BEGIN; t->x = normalize_abs(&input->filter.h, t->pos->x); t->y = normalize_abs(&input->filter.v, t->pos->y); t->orig.x = t->x; t->orig.y = t->y; memcpy(&t->orig.time, &tp->time, sizeof(struct timespec)); t->flags = edge_flags(tp, t->x, t->y); wstpad_set_direction(tp, t, 0, 0); } /* TOUCH_UPDATE */ touches = input->mt.touches & input->mt.frame; if (touches & tp->mtcycle) { /* * Slot data may be synchronized separately, in any order, * or not at all if there is no delta. Identify the touches * without deltas. 
*/ inactive = input->mt.touches & ~tp->mtcycle; tp->mtcycle = touches; } else { inactive = 0; tp->mtcycle |= touches; } touches = input->mt.touches & ~input->mt.sync[MTS_TOUCH]; FOREACHBIT(touches, slot) { t = &tp->tpad_touches[slot]; t->state = TOUCH_UPDATE; if ((1 << slot) & input->mt.frame) { dx = normalize_abs(&input->filter.h, t->pos->x) - t->x; t->x += dx; dy = normalize_abs(&input->filter.v, t->pos->y) - t->y; t->y += dy; t->flags &= (~EDGES | edge_flags(tp, t->x, t->y)); if (wsmouse_hysteresis(input, t->pos)) dx = dy = 0; wstpad_set_direction(tp, t, dx, dy); } else if ((1 << slot) & inactive) { wstpad_set_direction(tp, t, 0, 0); } } clear_touchstates(input, TOUCH_END); } /* * Identify "thumb" contacts in the bottom area. The identification * has three stages: * 1. If exactly one of two or more touches is in the bottom area, it * is masked, which means it does not receive pointer control as long * as there are alternatives. Once set, the mask will only be cleared * when the touch is released. * Tap detection ignores a masked touch if it does not participate in * a tap gesture. * 2. If the pointer-controlling touch is moving stably while a masked * touch in the bottom area is resting, or only moving minimally, the * pointer mask is copied to tp->ignore. In this stage, the masked * touch does not block pointer movement, and it is ignored by * wstpad_f2scroll(). * Decisions are made more or less immediately, there may be errors * in edge cases. If a fast or long upward movement is detected, * tp->ignore is cleared. There is no other transition from stage 2 * to scrolling, or vice versa, for a pair of touches. * 3. If tp->ignore is set and the touch is resting, it is marked as * thumb, and it will be ignored until it ends. */ void wstpad_mt_masks(struct wsmouseinput *input) { struct wstpad *tp = input->tp; struct tpad_touch *t; struct position *pos; u_int mask; int slot; tp->ignore &= input->mt.touches; if (tp->contacts < 2) return; if (tp->ignore) { slot = ffs(tp->ignore) - 1; t = &tp->tpad_touches[slot]; if (t->flags & THUMB) return; if (t->dir < 0 && wstpad_is_stable(input, t)) { t->flags |= THUMB; return; } /* The edge.low area is a bit larger than the bottom area. */ if (t->y >= tp->edge.low || (NORTH(t->dir) && magnitude(input, t->pos->dx, t->pos->dy) >= MAG_MEDIUM)) tp->ignore = 0; return; } if (input->mt.ptr_mask == 0) { mask = ~0; FOREACHBIT(input->mt.touches, slot) { t = &tp->tpad_touches[slot]; if (t->flags & B_EDGE) { mask &= (1 << slot); input->mt.ptr_mask = mask; } } } if ((input->mt.ptr_mask & ~input->mt.ptr) && !(tp->scroll.dz || tp->scroll.dw) && tp->t->dir >= 0 && wstpad_is_stable(input, tp->t)) { slot = ffs(input->mt.ptr_mask) - 1; t = &tp->tpad_touches[slot]; if (t->y >= tp->edge.low) return; if (!wstpad_is_stable(input, t)) return; /* Default hysteresis limits are low. Make a strict check. */ pos = tp->t->pos; if (abs(pos->acc_dx) < 3 * input->filter.h.hysteresis && abs(pos->acc_dy) < 3 * input->filter.v.hysteresis) return; if (t->dir >= 0) { /* Treat t as thumb if it is slow while tp->t is fast. 
*/ if (magnitude(input, t->pos->dx, t->pos->dy) > MAG_LOW || magnitude(input, pos->dx, pos->dy) < MAG_MEDIUM) return; } tp->ignore = input->mt.ptr_mask; } } void wstpad_touch_inputs(struct wsmouseinput *input) { struct wstpad *tp = input->tp; struct tpad_touch *t; int slot, x, y, dx, dy; tp->btns = input->btn.buttons; tp->btns_sync = input->btn.sync; tp->prev_contacts = tp->contacts; tp->contacts = input->touch.contacts; if (tp->contacts == 1 && ((tp->params.f2width && input->touch.width >= tp->params.f2width) || (tp->params.f2pressure && input->touch.pressure >= tp->params.f2pressure))) tp->contacts = 2; if (IS_MT(tp)) { wstpad_mt_inputs(input); if (input->mt.ptr) { slot = ffs(input->mt.ptr) - 1; tp->t = &tp->tpad_touches[slot]; } wstpad_mt_masks(input); } else { t = tp->t; if (tp->contacts) t->state = (tp->prev_contacts ? TOUCH_UPDATE : TOUCH_BEGIN); else t->state = (tp->prev_contacts ? TOUCH_END : TOUCH_NONE); dx = dy = 0; x = normalize_abs(&input->filter.h, t->pos->x); y = normalize_abs(&input->filter.v, t->pos->y); if (t->state == TOUCH_BEGIN) { t->x = t->orig.x = x; t->y = t->orig.y = y; memcpy(&t->orig.time, &tp->time, sizeof(struct timespec)); t->flags = edge_flags(tp, x, y); } else if (input->motion.sync & SYNC_POSITION) { if (!wsmouse_hysteresis(input, t->pos)) { dx = x - t->x; dy = y - t->y; } t->x = x; t->y = y; t->flags &= (~EDGES | edge_flags(tp, x, y)); } wstpad_set_direction(tp, t, dx, dy); } } static inline int t2_ignore(struct wsmouseinput *input) { /* * If there are two touches, do not block pointer movement if they * perform a click-and-drag action, or if the second touch is * resting in the bottom area. */ return (input->tp->contacts == 2 && ((input->tp->btns & PRIMARYBTN) || (input->tp->ignore & ~input->mt.ptr))); } void wstpad_process_input(struct wsmouseinput *input, struct evq_access *evq) { struct wstpad *tp = input->tp; u_int handlers, hdlr, cmds; memcpy(&tp->time, &evq->ts, sizeof(struct timespec)); wstpad_touch_inputs(input); cmds = 0; handlers = tp->handlers; if (DISABLE(tp)) handlers &= ((1 << TOPBUTTON_HDLR) | (1 << SOFTBUTTON_HDLR)); FOREACHBIT(handlers, hdlr) { switch (hdlr) { case SOFTBUTTON_HDLR: case TOPBUTTON_HDLR: wstpad_softbuttons(input, &cmds, hdlr); continue; case TAP_HDLR: wstpad_tap(input, &cmds); continue; case F2SCROLL_HDLR: wstpad_f2scroll(input, &cmds); continue; case EDGESCROLL_HDLR: wstpad_edgescroll(input, &cmds); continue; case CLICK_HDLR: wstpad_click(input); continue; } } /* Check whether pointer movement should be blocked. */ if (input->motion.dx || input->motion.dy) { if (DISABLE(tp) || (tp->t->flags & tp->freeze) || timespeccmp(&tp->time, &tp->freeze_ts, <) || (tp->contacts > 1 && !t2_ignore(input))) { cmds |= 1 << CLEAR_MOTION_DELTAS; } } wstpad_cmds(input, cmds); if (IS_MT(tp)) clear_touchstates(input, TOUCH_NONE); } /* * Try to determine the average interval between two updates. Various * conditions are checked in order to ensure that only valid samples enter * into the calculation. Above all, it is restricted to motion events * occurring when there is only one contact. MT devices may need more than * one packet to transmit their state if there are multiple touches, and * the update frequency may be higher in this case. 
*/ void wstpad_track_interval(struct wsmouseinput *input, struct timespec *time) { static const struct timespec limit = { 0, 30 * 1000000L }; struct timespec ts; int samples; if (input->motion.sync == 0 || (input->touch.sync & SYNC_CONTACTS) || (input->touch.contacts > 1)) { input->intv.track = 0; return; } if (input->intv.track) { timespecsub(time, &input->intv.ts, &ts); if (timespeccmp(&ts, &limit, <)) { /* The unit of the sum is 4096 nanoseconds. */ input->intv.sum += ts.tv_nsec >> 12; samples = ++input->intv.samples; /* * Make the first calculation quickly and later * a more reliable one: */ if (samples == 8) { input->intv.avg = input->intv.sum << 9; wstpad_init_deceleration(input); } else if (samples == 128) { input->intv.avg = input->intv.sum << 5; wstpad_init_deceleration(input); input->intv.samples = 0; input->intv.sum = 0; input->flags &= ~TRACK_INTERVAL; } } } memcpy(&input->intv.ts, time, sizeof(struct timespec)); input->intv.track = 1; } /* * The default acceleration options of X don't work convincingly with * touchpads (the synaptics driver installs its own "acceleration * profile" and callback function). As a preliminary workaround, this * filter applies a simple deceleration scheme to small deltas, based * on the "magnitude" of the delta pair. A magnitude of 8 corresponds, * roughly, to a speed of (filter.dclr / 12.5) device units per milli- * second. If its magnitude is smaller than 7 a delta will be downscaled * by the factor 2/8, deltas with magnitudes from 7 to 11 by factors * ranging from 3/8 to 7/8. */ int wstpad_decelerate(struct wsmouseinput *input, int *dx, int *dy) { int mag, n, h, v; mag = magnitude(input, *dx, *dy); /* Don't change deceleration levels abruptly. */ mag = (mag + 7 * input->filter.mag) / 8; /* Don't use arbitrarily high values. */ input->filter.mag = imin(mag, 24 << 12); n = imax((mag >> 12) - 4, 2); if (n < 8) { /* Scale by (n / 8). */ h = *dx * n + input->filter.h.dclr_rmdr; v = *dy * n + input->filter.v.dclr_rmdr; input->filter.h.dclr_rmdr = (h >= 0 ? h & 7 : -(-h & 7)); input->filter.v.dclr_rmdr = (v >= 0 ? v & 7 : -(-v & 7)); *dx = h / 8; *dy = v / 8; return (1); } return (0); } void wstpad_filter(struct wsmouseinput *input) { struct axis_filter *h = &input->filter.h; struct axis_filter *v = &input->filter.v; struct position *pos = &input->motion.pos; int strength = input->filter.mode & 7; int dx, dy; if (!(input->motion.sync & SYNC_POSITION) || (h->dmax && (abs(pos->dx) > h->dmax)) || (v->dmax && (abs(pos->dy) > v->dmax))) { dx = dy = 0; } else { dx = pos->dx; dy = pos->dy; } if (wsmouse_hysteresis(input, pos)) dx = dy = 0; if (input->filter.dclr && wstpad_decelerate(input, &dx, &dy)) /* Strong smoothing may hamper the precision at low speeds. */ strength = imin(strength, 2); if (strength) { if ((input->touch.sync & SYNC_CONTACTS) || input->mt.ptr != input->mt.prev_ptr) { h->avg = v->avg = 0; } /* Use a weighted decaying average for smoothing. */ dx = dx * (8 - strength) + h->avg * strength + h->avg_rmdr; dy = dy * (8 - strength) + v->avg * strength + v->avg_rmdr; h->avg_rmdr = (dx >= 0 ? dx & 7 : -(-dx & 7)); v->avg_rmdr = (dy >= 0 ? dy & 7 : -(-dy & 7)); dx = h->avg = dx / 8; dy = v->avg = dy / 8; } input->motion.dx = dx; input->motion.dy = dy; } /* * Compatibility-mode conversions. wstpad_filter transforms and filters * the coordinate inputs, extended functionality is provided by * wstpad_process_input. 
*/ void wstpad_compat_convert(struct wsmouseinput *input, struct evq_access *evq) { if (input->flags & TRACK_INTERVAL) wstpad_track_interval(input, &evq->ts); wstpad_filter(input); if ((input->motion.dx || input->motion.dy) && !(input->motion.sync & SYNC_DELTAS)) { input->motion.dz = input->motion.dw = 0; input->motion.sync |= SYNC_DELTAS; } if (input->tp != NULL) wstpad_process_input(input, evq); input->motion.sync &= ~SYNC_POSITION; input->touch.sync = 0; } int wstpad_init(struct wsmouseinput *input) { struct wstpad *tp = input->tp; int i, slots; if (tp != NULL) return (0); input->tp = tp = malloc(sizeof(struct wstpad), M_DEVBUF, M_WAITOK | M_ZERO); if (tp == NULL) return (-1); slots = imax(input->mt.num_slots, 1); tp->tpad_touches = malloc(slots * sizeof(struct tpad_touch), M_DEVBUF, M_WAITOK | M_ZERO); if (tp->tpad_touches == NULL) { free(tp, M_DEVBUF, sizeof(struct wstpad)); return (-1); } tp->t = &tp->tpad_touches[0]; if (input->mt.num_slots) { tp->features |= WSTPAD_MT; for (i = 0; i < input->mt.num_slots; i++) tp->tpad_touches[i].pos = &input->mt.slots[i].pos; } else { tp->t->pos = &input->motion.pos; } timeout_set(&tp->tap.to, wstpad_tap_timeout, input); tp->ratio = input->filter.ratio; return (0); } /* * Integer square root (Halleck's method) * * An adaption of code from John B. Halleck (from * http://www.cc.utah.edu/~nahaj/factoring/code.html). This version is * used and published under the OpenBSD license terms with his permission. * * Cf. also Martin Guy's "Square root by abacus" method. */ static inline u_int isqrt(u_int n) { u_int root, sqbit; root = 0; sqbit = 1 << (sizeof(u_int) * 8 - 2); while (sqbit) { if (n >= (sqbit | root)) { n -= (sqbit | root); root = (root >> 1) | sqbit; } else { root >>= 1; } sqbit >>= 2; } return (root); } void wstpad_init_deceleration(struct wsmouseinput *input) { int n, dclr; if ((dclr = input->filter.dclr) == 0) return; dclr = imax(dclr, 4); /* * For a standard update rate of about 80Hz, (dclr) units * will be mapped to a magnitude of 8. If the average rate * is significantly higher or lower, adjust the coefficient * accordingly: */ if (input->intv.avg == 0) { n = 8; } else { n = 8 * 13000000 / input->intv.avg; n = imax(imin(n, 32), 4); } input->filter.h.mag_scale = (n << 12) / dclr; input->filter.v.mag_scale = (input->filter.ratio ? n * input->filter.ratio : n << 12) / dclr; input->filter.h.dclr_rmdr = 0; input->filter.v.dclr_rmdr = 0; input->flags |= TRACK_INTERVAL; } int wstpad_configure(struct wsmouseinput *input) { struct wstpad *tp; int width, height, diag, offset, h_res, v_res, h_unit, v_unit, i; width = abs(input->hw.x_max - input->hw.x_min); height = abs(input->hw.y_max - input->hw.y_min); if (width == 0 || height == 0) return (-1); /* We can't do anything. */ if (input->tp == NULL && wstpad_init(input)) return (-1); tp = input->tp; if (!(input->flags & CONFIGURED)) { /* * The filter parameters are derived from the length of the * diagonal in device units, with some magic constants which * are partly adapted from libinput or synaptics code, or are * based on tests and guess work. The absolute resolution * values might not be reliable, but if they are present the * settings are adapted to the ratio. 
*/ h_res = input->hw.h_res; v_res = input->hw.v_res; if (h_res == 0 || v_res == 0) h_res = v_res = 1; diag = isqrt(width * width + height * height); input->filter.h.scale = (imin(920, diag) << 12) / diag; input->filter.v.scale = input->filter.h.scale * h_res / v_res; h_unit = imax(diag / 280, 3); v_unit = imax((h_unit * v_res + h_res / 2) / h_res, 3); input->filter.h.hysteresis = h_unit; input->filter.v.hysteresis = v_unit; input->filter.mode = FILTER_MODE_DEFAULT; input->filter.dclr = h_unit - h_unit / 5; wstpad_init_deceleration(input); tp->features &= (WSTPAD_MT | WSTPAD_DISABLE); if (input->hw.contacts_max != 1) tp->features |= WSTPAD_TWOFINGERSCROLL; else tp->features |= WSTPAD_EDGESCROLL; if (input->hw.hw_type == WSMOUSEHW_CLICKPAD) { if (input->hw.type == WSMOUSE_TYPE_SYNAP_SBTN) { tp->features |= WSTPAD_TOPBUTTONS; } else { tp->features |= WSTPAD_SOFTBUTTONS; tp->features |= WSTPAD_SOFTMBTN; } } tp->params.left_edge = V_EDGE_RATIO_DEFAULT; tp->params.right_edge = V_EDGE_RATIO_DEFAULT; tp->params.bottom_edge = ((tp->features & WSTPAD_SOFTBUTTONS) ? B_EDGE_RATIO_DEFAULT : 0); tp->params.top_edge = ((tp->features & WSTPAD_TOPBUTTONS) ? T_EDGE_RATIO_DEFAULT : 0); tp->params.center_width = CENTER_RATIO_DEFAULT; tp->tap.maxtime.tv_nsec = TAP_MAXTIME_DEFAULT * 1000000; tp->tap.clicktime = TAP_CLICKTIME_DEFAULT; tp->tap.locktime = TAP_LOCKTIME_DEFAULT; tp->scroll.hdist = 4 * h_unit; tp->scroll.vdist = 4 * v_unit; tp->tap.maxdist = 4 * h_unit; } /* A touch with a flag set in this mask does not move the pointer. */ tp->freeze = EDGES; offset = width * tp->params.left_edge / 4096; tp->edge.left = (offset ? input->hw.x_min + offset : INT_MIN); offset = width * tp->params.right_edge / 4096; tp->edge.right = (offset ? input->hw.x_max - offset : INT_MAX); offset = height * tp->params.bottom_edge / 4096; tp->edge.bottom = (offset ? input->hw.y_min + offset : INT_MIN); tp->edge.low = tp->edge.bottom + offset / 2; offset = height * tp->params.top_edge / 4096; tp->edge.top = (offset ? input->hw.y_max - offset : INT_MAX); offset = width * abs(tp->params.center_width) / 8192; tp->edge.center = input->hw.x_min + width / 2; tp->edge.center_left = tp->edge.center - offset; tp->edge.center_right = tp->edge.center + offset; tp->handlers = 0; if (tp->features & WSTPAD_SOFTBUTTONS) tp->handlers |= 1 << SOFTBUTTON_HDLR; if (tp->features & WSTPAD_TOPBUTTONS) tp->handlers |= 1 << TOPBUTTON_HDLR; if (tp->features & WSTPAD_TWOFINGERSCROLL) tp->handlers |= 1 << F2SCROLL_HDLR; else if (tp->features & WSTPAD_EDGESCROLL) tp->handlers |= 1 << EDGESCROLL_HDLR; for (i = 0; i < TAP_BTNMAP_SIZE; i++) { if (tp->tap.btnmap[i] == 0) continue; tp->tap.clicktime = imin(imax(tp->tap.clicktime, 80), 350); if (tp->tap.locktime) tp->tap.locktime = imin(imax(tp->tap.locktime, 150), 5000); tp->handlers |= 1 << TAP_HDLR; break; } if (input->hw.hw_type == WSMOUSEHW_CLICKPAD) tp->handlers |= 1 << CLICK_HDLR; tp->sbtnswap = ((tp->features & WSTPAD_SWAPSIDES) ? 
(LEFTBTN | RIGHTBTN) : 0); return (0); } void wstpad_reset(struct wsmouseinput *input) { struct wstpad *tp = input->tp; if (tp != NULL) { timeout_del(&tp->tap.to); tp->tap.state = TAP_DETECT; } if (input->sbtn.buttons) { input->sbtn.sync = input->sbtn.buttons; input->sbtn.buttons = 0; } } void wstpad_cleanup(struct wsmouseinput *input) { struct wstpad *tp = input->tp; int slots; timeout_del(&tp->tap.to); slots = imax(input->mt.num_slots, 1); free(tp->tpad_touches, M_DEVBUF, slots * sizeof(struct tpad_touch)); free(tp, M_DEVBUF, sizeof(struct wstpad)); input->tp = NULL; } int wstpad_set_param(struct wsmouseinput *input, int key, int val) { struct wstpad *tp = input->tp; u_int flag; if (tp == NULL) return (EINVAL); switch (key) { case WSMOUSECFG_SOFTBUTTONS ... WSMOUSECFG_DISABLE: switch (key) { case WSMOUSECFG_SOFTBUTTONS: flag = WSTPAD_SOFTBUTTONS; break; case WSMOUSECFG_SOFTMBTN: flag = WSTPAD_SOFTMBTN; break; case WSMOUSECFG_TOPBUTTONS: flag = WSTPAD_TOPBUTTONS; break; case WSMOUSECFG_TWOFINGERSCROLL: flag = WSTPAD_TWOFINGERSCROLL; break; case WSMOUSECFG_EDGESCROLL: flag = WSTPAD_EDGESCROLL; break; case WSMOUSECFG_HORIZSCROLL: flag = WSTPAD_HORIZSCROLL; break; case WSMOUSECFG_SWAPSIDES: flag = WSTPAD_SWAPSIDES; break; case WSMOUSECFG_DISABLE: flag = WSTPAD_DISABLE; break; } if (val) tp->features |= flag; else tp->features &= ~flag; break; case WSMOUSECFG_LEFT_EDGE: tp->params.left_edge = val; break; case WSMOUSECFG_RIGHT_EDGE: tp->params.right_edge = val; break; case WSMOUSECFG_TOP_EDGE: tp->params.top_edge = val; break; case WSMOUSECFG_BOTTOM_EDGE: tp->params.bottom_edge = val; break; case WSMOUSECFG_CENTERWIDTH: tp->params.center_width = val; break; case WSMOUSECFG_HORIZSCROLLDIST: tp->scroll.hdist = val; break; case WSMOUSECFG_VERTSCROLLDIST: tp->scroll.vdist = val; break; case WSMOUSECFG_F2WIDTH: tp->params.f2width = val; break; case WSMOUSECFG_F2PRESSURE: tp->params.f2pressure = val; break; case WSMOUSECFG_TAP_MAXTIME: tp->tap.maxtime.tv_nsec = imin(val, 999) * 1000000; break; case WSMOUSECFG_TAP_CLICKTIME: tp->tap.clicktime = val; break; case WSMOUSECFG_TAP_LOCKTIME: tp->tap.locktime = val; break; case WSMOUSECFG_TAP_ONE_BTNMAP: tp->tap.btnmap[0] = BTNMASK(val); break; case WSMOUSECFG_TAP_TWO_BTNMAP: tp->tap.btnmap[1] = BTNMASK(val); break; case WSMOUSECFG_TAP_THREE_BTNMAP: tp->tap.btnmap[2] = BTNMASK(val); break; default: return (ENOTSUP); } return (0); } int wstpad_get_param(struct wsmouseinput *input, int key, int *pval) { struct wstpad *tp = input->tp; u_int flag; if (tp == NULL) return (EINVAL); switch (key) { case WSMOUSECFG_SOFTBUTTONS ... 
WSMOUSECFG_DISABLE: switch (key) { case WSMOUSECFG_SOFTBUTTONS: flag = WSTPAD_SOFTBUTTONS; break; case WSMOUSECFG_SOFTMBTN: flag = WSTPAD_SOFTMBTN; break; case WSMOUSECFG_TOPBUTTONS: flag = WSTPAD_TOPBUTTONS; break; case WSMOUSECFG_TWOFINGERSCROLL: flag = WSTPAD_TWOFINGERSCROLL; break; case WSMOUSECFG_EDGESCROLL: flag = WSTPAD_EDGESCROLL; break; case WSMOUSECFG_HORIZSCROLL: flag = WSTPAD_HORIZSCROLL; break; case WSMOUSECFG_SWAPSIDES: flag = WSTPAD_SWAPSIDES; break; case WSMOUSECFG_DISABLE: flag = WSTPAD_DISABLE; break; } *pval = !!(tp->features & flag); break; case WSMOUSECFG_LEFT_EDGE: *pval = tp->params.left_edge; break; case WSMOUSECFG_RIGHT_EDGE: *pval = tp->params.right_edge; break; case WSMOUSECFG_TOP_EDGE: *pval = tp->params.top_edge; break; case WSMOUSECFG_BOTTOM_EDGE: *pval = tp->params.bottom_edge; break; case WSMOUSECFG_CENTERWIDTH: *pval = tp->params.center_width; break; case WSMOUSECFG_HORIZSCROLLDIST: *pval = tp->scroll.hdist; break; case WSMOUSECFG_VERTSCROLLDIST: *pval = tp->scroll.vdist; break; case WSMOUSECFG_F2WIDTH: *pval = tp->params.f2width; break; case WSMOUSECFG_F2PRESSURE: *pval = tp->params.f2pressure; break; case WSMOUSECFG_TAP_MAXTIME: *pval = tp->tap.maxtime.tv_nsec / 1000000; break; case WSMOUSECFG_TAP_CLICKTIME: *pval = tp->tap.clicktime; break; case WSMOUSECFG_TAP_LOCKTIME: *pval = tp->tap.locktime; break; case WSMOUSECFG_TAP_ONE_BTNMAP: *pval = ffs(tp->tap.btnmap[0]); break; case WSMOUSECFG_TAP_TWO_BTNMAP: *pval = ffs(tp->tap.btnmap[1]); break; case WSMOUSECFG_TAP_THREE_BTNMAP: *pval = ffs(tp->tap.btnmap[2]); break; default: return (ENOTSUP); } return (0); }
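/*
 * Editor's sketch (not part of wstpad.c): a self-contained illustration of
 * the 12-sector direction encoding described in the driver comments above.
 * The constants and the helper restate the driver's direction() logic for
 * demonstration only; the ratio argument of 4096 assumes square device
 * units in [*.12] fixed point.
 */
#include <stdio.h>
#include <stdlib.h>

#define DEMO_TAN_DEG_60	7094	/* tan(60 deg) in [*.12] fixed point */
#define DEMO_TAN_DEG_30	2365	/* tan(30 deg) in [*.12] fixed point */

/* Same sector mapping as wstpad's direction(): 0/11 = north, 2/3 = east. */
static int
demo_direction(int dx, int dy, int ratio)
{
	int rdy, dir = -1;

	if (dx || dy) {
		rdy = abs(dy) * ratio;
		if (abs(dx) * DEMO_TAN_DEG_60 < rdy)
			dir = 0;
		else if (abs(dx) * DEMO_TAN_DEG_30 < rdy)
			dir = 1;
		else
			dir = 2;
		if ((dx < 0) != (dy < 0))
			dir = 5 - dir;
		if (dx < 0)
			dir += 6;
	}
	return dir;
}

int
main(void)
{
	/* Upward, rightward, downward, and leftward deltas: */
	printf("%d %d %d %d\n",
	    demo_direction(0, 8, 4096),		/* 0 (north) */
	    demo_direction(8, 0, 4096),		/* 2 (east)  */
	    demo_direction(0, -8, 4096),	/* 5 (south) */
	    demo_direction(-8, 0, 4096));	/* 9 (west)  */
	return 0;
}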
/* $OpenBSD: chacha_private.h,v 1.4 2020/07/22 13:54:30 tobhe Exp $ */

/*
chacha-merged.c version 20080118
D. J. Bernstein
Public domain.
*/

#include <sys/systm.h>

typedef unsigned char u8;
typedef unsigned int u32;

typedef struct {
	u32 input[16]; /* could be compressed */
} chacha_ctx;

#define U8C(v) (v##U)
#define U32C(v) (v##U)

#define U8V(v) ((u8)(v) & U8C(0xFF))
#define U32V(v) ((u32)(v) & U32C(0xFFFFFFFF))

#define ROTL32(v, n) \
  (U32V((v) << (n)) | ((v) >> (32 - (n))))

#define U8TO32_LITTLE(p) \
  (((u32)((p)[0])      ) | \
   ((u32)((p)[1]) <<  8) | \
   ((u32)((p)[2]) << 16) | \
   ((u32)((p)[3]) << 24))

#define U32TO8_LITTLE(p, v) \
  do { \
    (p)[0] = U8V((v)      ); \
    (p)[1] = U8V((v) >>  8); \
    (p)[2] = U8V((v) >> 16); \
    (p)[3] = U8V((v) >> 24); \
  } while (0)

#define ROTATE(v,c) (ROTL32(v,c))
#define XOR(v,w) ((v) ^ (w))
#define PLUS(v,w) (U32V((v) + (w)))
#define PLUSONE(v) (PLUS((v),1))

#define QUARTERROUND(a,b,c,d) \
  a = PLUS(a,b); d = ROTATE(XOR(d,a),16); \
  c = PLUS(c,d); b = ROTATE(XOR(b,c),12); \
  a = PLUS(a,b); d = ROTATE(XOR(d,a), 8); \
  c = PLUS(c,d); b = ROTATE(XOR(b,c), 7);

static const char sigma[16] = "expand 32-byte k";
static const char tau[16] = "expand 16-byte k";

static inline void
hchacha20(u32 derived_key[8], const u8 nonce[16], const u8 key[32])
{
	int i;
	uint32_t x[] = {
		U8TO32_LITTLE(sigma + 0),
		U8TO32_LITTLE(sigma + 4),
		U8TO32_LITTLE(sigma + 8),
		U8TO32_LITTLE(sigma + 12),
		U8TO32_LITTLE(key + 0),
		U8TO32_LITTLE(key + 4),
		U8TO32_LITTLE(key + 8),
		U8TO32_LITTLE(key + 12),
		U8TO32_LITTLE(key + 16),
		U8TO32_LITTLE(key + 20),
		U8TO32_LITTLE(key + 24),
		U8TO32_LITTLE(key + 28),
		U8TO32_LITTLE(nonce + 0),
		U8TO32_LITTLE(nonce + 4),
		U8TO32_LITTLE(nonce + 8),
		U8TO32_LITTLE(nonce + 12)
	};

	for (i = 20; i > 0; i -= 2) {
		QUARTERROUND(x[0], x[4], x[8], x[12])
		QUARTERROUND(x[1], x[5], x[9], x[13])
		QUARTERROUND(x[2], x[6], x[10], x[14])
		QUARTERROUND(x[3], x[7], x[11], x[15])
		QUARTERROUND(x[0], x[5], x[10], x[15])
		QUARTERROUND(x[1], x[6], x[11], x[12])
		QUARTERROUND(x[2], x[7], x[8], x[13])
		QUARTERROUND(x[3], x[4], x[9], x[14])
	}

	memcpy(derived_key + 0, x + 0, sizeof(u32) * 4);
	memcpy(derived_key + 4, x + 12, sizeof(u32) * 4);
}

static void
chacha_keysetup(chacha_ctx *x, const u8 *k, u32 kbits)
{
	const char *constants;

	x->input[4] = U8TO32_LITTLE(k + 0);
	x->input[5] = U8TO32_LITTLE(k + 4);
	x->input[6] = U8TO32_LITTLE(k + 8);
	x->input[7] = U8TO32_LITTLE(k + 12);
	if (kbits == 256) { /* recommended */
		k += 16;
		constants = sigma;
	} else { /* kbits == 128 */
		constants = tau;
	}
	x->input[8] = U8TO32_LITTLE(k + 0);
	x->input[9] = U8TO32_LITTLE(k + 4);
	x->input[10] = U8TO32_LITTLE(k + 8);
x->input[11] = U8TO32_LITTLE(k + 12); x->input[0] = U8TO32_LITTLE(constants + 0); x->input[1] = U8TO32_LITTLE(constants + 4); x->input[2] = U8TO32_LITTLE(constants + 8); x->input[3] = U8TO32_LITTLE(constants + 12); } static void chacha_ivsetup(chacha_ctx *x, const u8 *iv, const u8 *counter) { x->input[12] = counter == NULL ? 0 : U8TO32_LITTLE(counter + 0); x->input[13] = counter == NULL ? 0 : U8TO32_LITTLE(counter + 4); x->input[14] = U8TO32_LITTLE(iv + 0); x->input[15] = U8TO32_LITTLE(iv + 4); } static void chacha_encrypt_bytes(chacha_ctx *x,const u8 *m,u8 *c,u32 bytes) { u32 x0, x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12, x13, x14, x15; u32 j0, j1, j2, j3, j4, j5, j6, j7, j8, j9, j10, j11, j12, j13, j14, j15; u8 *ctarget = NULL; u8 tmp[64]; u_int i; if (!bytes) return; j0 = x->input[0]; j1 = x->input[1]; j2 = x->input[2]; j3 = x->input[3]; j4 = x->input[4]; j5 = x->input[5]; j6 = x->input[6]; j7 = x->input[7]; j8 = x->input[8]; j9 = x->input[9]; j10 = x->input[10]; j11 = x->input[11]; j12 = x->input[12]; j13 = x->input[13]; j14 = x->input[14]; j15 = x->input[15]; for (;;) { if (bytes < 64) { for (i = 0;i < bytes;++i) tmp[i] = m[i]; m = tmp; ctarget = c; c = tmp; } x0 = j0; x1 = j1; x2 = j2; x3 = j3; x4 = j4; x5 = j5; x6 = j6; x7 = j7; x8 = j8; x9 = j9; x10 = j10; x11 = j11; x12 = j12; x13 = j13; x14 = j14; x15 = j15; for (i = 20;i > 0;i -= 2) { QUARTERROUND( x0, x4, x8,x12) QUARTERROUND( x1, x5, x9,x13) QUARTERROUND( x2, x6,x10,x14) QUARTERROUND( x3, x7,x11,x15) QUARTERROUND( x0, x5,x10,x15) QUARTERROUND( x1, x6,x11,x12) QUARTERROUND( x2, x7, x8,x13) QUARTERROUND( x3, x4, x9,x14) } x0 = PLUS(x0,j0); x1 = PLUS(x1,j1); x2 = PLUS(x2,j2); x3 = PLUS(x3,j3); x4 = PLUS(x4,j4); x5 = PLUS(x5,j5); x6 = PLUS(x6,j6); x7 = PLUS(x7,j7); x8 = PLUS(x8,j8); x9 = PLUS(x9,j9); x10 = PLUS(x10,j10); x11 = PLUS(x11,j11); x12 = PLUS(x12,j12); x13 = PLUS(x13,j13); x14 = PLUS(x14,j14); x15 = PLUS(x15,j15); #ifndef KEYSTREAM_ONLY x0 = XOR(x0,U8TO32_LITTLE(m + 0)); x1 = XOR(x1,U8TO32_LITTLE(m + 4)); x2 = XOR(x2,U8TO32_LITTLE(m + 8)); x3 = XOR(x3,U8TO32_LITTLE(m + 12)); x4 = XOR(x4,U8TO32_LITTLE(m + 16)); x5 = XOR(x5,U8TO32_LITTLE(m + 20)); x6 = XOR(x6,U8TO32_LITTLE(m + 24)); x7 = XOR(x7,U8TO32_LITTLE(m + 28)); x8 = XOR(x8,U8TO32_LITTLE(m + 32)); x9 = XOR(x9,U8TO32_LITTLE(m + 36)); x10 = XOR(x10,U8TO32_LITTLE(m + 40)); x11 = XOR(x11,U8TO32_LITTLE(m + 44)); x12 = XOR(x12,U8TO32_LITTLE(m + 48)); x13 = XOR(x13,U8TO32_LITTLE(m + 52)); x14 = XOR(x14,U8TO32_LITTLE(m + 56)); x15 = XOR(x15,U8TO32_LITTLE(m + 60)); #endif j12 = PLUSONE(j12); if (!j12) { j13 = PLUSONE(j13); /* stopping at 2^70 bytes per nonce is user's responsibility */ } U32TO8_LITTLE(c + 0,x0); U32TO8_LITTLE(c + 4,x1); U32TO8_LITTLE(c + 8,x2); U32TO8_LITTLE(c + 12,x3); U32TO8_LITTLE(c + 16,x4); U32TO8_LITTLE(c + 20,x5); U32TO8_LITTLE(c + 24,x6); U32TO8_LITTLE(c + 28,x7); U32TO8_LITTLE(c + 32,x8); U32TO8_LITTLE(c + 36,x9); U32TO8_LITTLE(c + 40,x10); U32TO8_LITTLE(c + 44,x11); U32TO8_LITTLE(c + 48,x12); U32TO8_LITTLE(c + 52,x13); U32TO8_LITTLE(c + 56,x14); U32TO8_LITTLE(c + 60,x15); if (bytes <= 64) { if (bytes < 64) { for (i = 0;i < bytes;++i) ctarget[i] = c[i]; } x->input[12] = j12; x->input[13] = j13; return; } bytes -= 64; c += 64; #ifndef KEYSTREAM_ONLY m += 64; #endif } }
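/*
 * Editor's sketch (not part of chacha_private.h): how a consumer would
 * typically drive the routines above.  Including this header in a C file
 * pulls in the static functions; the all-zero 32-byte key, 8-byte IV and
 * the buffer below are made-up example values.
 */
static void
chacha_demo(void)
{
	chacha_ctx ctx;
	static const u8 key[32] = { 0 };	/* example key only */
	static const u8 iv[8] = { 0 };		/* example IV only */
	u8 buf[64] = { 0 };

	chacha_keysetup(&ctx, key, 256);	/* 256-bit key, uses sigma */
	chacha_ivsetup(&ctx, iv, NULL);		/* block counter starts at 0 */
	/*
	 * Without KEYSTREAM_ONLY this XORs the input with the keystream,
	 * so encrypting zeroes in place yields the raw keystream.
	 */
	chacha_encrypt_bytes(&ctx, buf, buf, sizeof(buf));
}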
/* $OpenBSD: in4_cksum.c,v 1.11 2022/02/01 15:30:10 miod Exp $ */
/* $KAME: in4_cksum.c,v 1.10 2001/11/30 10:06:15 itojun Exp $ */
/* $NetBSD: in_cksum.c,v 1.13 1996/10/13 02:03:03 christos Exp $ */

/*
 * Copyright (C) 1999 WIDE Project.
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. Neither the name of the project nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */

/*
 * Copyright (c) 1988, 1992, 1993
 *	The Regents of the University of California.  All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.
 * IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 *	@(#)in_cksum.c	8.1 (Berkeley) 6/10/93
 */

#include <sys/param.h>
#include <sys/mbuf.h>
#include <sys/systm.h>
#include <sys/socket.h>
#include <sys/socketvar.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <netinet/ip_var.h>

/*
 * Checksum routine for Internet Protocol family headers (Portable Version).
 * This is only for the IPv4 pseudo header checksum.
 * There is no need to clear non-pseudo-header fields in the IPv4 header.
 * len is the actual payload size; it does not include the IPv4 header or
 * the skipped header chain (off + len should equal the whole packet).
 *
 * This routine is very heavily used in the network
 * code and should be modified for each CPU to be as fast as possible.
 */

/*
 * Fold the 32-bit accumulator back into 16 bits (one's complement carry),
 * e.g. a running sum of 0x12345 reduces to 0x2346.
 */
#define ADDCARRY(x) (x > 65535 ? x -= 65535 : x)
#define REDUCE {l_util.l = sum; sum = l_util.s[0] + l_util.s[1]; ADDCARRY(sum);}

int
in4_cksum(struct mbuf *m, u_int8_t nxt, int off, int len)
{
	u_int16_t *w;
	int sum = 0;
	int mlen = 0;
	int byte_swapped = 0;
	union {
		struct ipovly ipov;
		u_int16_t w[10];
	} u;
	union {
		u_int8_t c[2];
		u_int16_t s;
	} s_util;
	union {
		u_int16_t s[2];
		u_int32_t l;
	} l_util;

	if (nxt != 0) {
		/* pseudo header */
		if (off < sizeof(struct ipovly))
			panic("in4_cksum: offset too short");
		if (m->m_len < sizeof(struct ip))
			panic("in4_cksum: bad mbuf chain");
		u.ipov.ih_x1[8] = 0;
		u.ipov.ih_pr = nxt;
		u.ipov.ih_len = htons(len);
		u.ipov.ih_src = mtod(m, struct ip *)->ip_src;
		u.ipov.ih_dst = mtod(m, struct ip *)->ip_dst;
		w = u.w;
		/* assumes sizeof(ipov) == 20 and first 8 bytes are zeroes */
		sum += w[4]; sum += w[5]; sum += w[6];
		sum += w[7]; sum += w[8]; sum += w[9];
	}

	/* skip unnecessary part */
	while (m && off > 0) {
		if (m->m_len > off)
			break;
		off -= m->m_len;
		m = m->m_next;
	}

	for (; m && len; m = m->m_next) {
		if (m->m_len == 0)
			continue;
		w = (u_int16_t *)(mtod(m, caddr_t) + off);
		if (mlen == -1) {
			/*
			 * The first byte of this mbuf is the continuation
			 * of a word spanning between this mbuf and the
			 * last mbuf.
			 *
			 * s_util.c[0] is already saved when scanning previous
			 * mbuf.
			 */
			s_util.c[1] = *(u_int8_t *)w;
			sum += s_util.s;
			w = (u_int16_t *)((u_int8_t *)w + 1);
			mlen = m->m_len - off - 1;
			len--;
		} else
			mlen = m->m_len - off;
		off = 0;
		if (len < mlen)
			mlen = len;
		len -= mlen;
		/*
		 * Force to even boundary.
		 */
		if ((1 & (long) w) && (mlen > 0)) {
			REDUCE;
			sum <<= 8;
			s_util.c[0] = *(u_int8_t *)w;
			w = (u_int16_t *)((int8_t *)w + 1);
			mlen--;
			byte_swapped = 1;
		}
		/*
		 * Unroll the loop to make overhead from
		 * branches &c small.
		 */
		while ((mlen -= 32) >= 0) {
			sum += w[0]; sum += w[1]; sum += w[2]; sum += w[3];
			sum += w[4]; sum += w[5]; sum += w[6]; sum += w[7];
			sum += w[8]; sum += w[9]; sum += w[10]; sum += w[11];
			sum += w[12]; sum += w[13]; sum += w[14]; sum += w[15];
			w += 16;
		}
		mlen += 32;
		while ((mlen -= 8) >= 0) {
			sum += w[0]; sum += w[1]; sum += w[2]; sum += w[3];
			w += 4;
		}
		mlen += 8;
		if (mlen == 0 && byte_swapped == 0)
			continue;
		REDUCE;
		while ((mlen -= 2) >= 0) {
			sum += *w++;
		}
		if (byte_swapped) {
			REDUCE;
			sum <<= 8;
			byte_swapped = 0;
			if (mlen == -1) {
				s_util.c[1] = *(u_int8_t *)w;
				sum += s_util.s;
				mlen = 0;
			} else
				mlen = -1;
		} else if (mlen == -1)
			s_util.c[0] = *(u_int8_t *)w;
	}
	if (len)
		printf("cksum4: out of data\n");
	if (mlen == -1) {
		/*
		 * The last mbuf has an odd # of bytes.  Follow the standard
		 * (the odd byte may be shifted left by 8 bits or not, as
		 * determined by the endian-ness of the machine).
		 */
		s_util.c[1] = 0;
		sum += s_util.s;
	}
	REDUCE;
	return (~sum & 0xffff);
}
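/*
 * Illustrative sketch (not part of the original file, kept out of the
 * build with #if 0): how a caller might verify the TCP checksum of a
 * received packet with in4_cksum().  "m" is assumed to be an mbuf chain
 * whose data begins with the IP header, "hlen" the IP header length and
 * "tlen" the TCP header plus payload length (ip_len - hlen), so that
 * off + len covers the whole packet as the comment above requires.
 * Because nxt is nonzero, the pseudo header (protocol, length, addresses)
 * is folded into the sum; over a correct packet the folded sum is 0xffff,
 * so in4_cksum() returns 0.  The helper name is hypothetical.
 */
#if 0
static int
tcp_cksum_ok(struct mbuf *m, int hlen, int tlen)
{
	return (in4_cksum(m, IPPROTO_TCP, hlen, tlen) == 0);
}
#endif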
/*	$OpenBSD: kern_tc.c,v 1.77 2022/08/12 02:20:36 cheloha Exp $ */

/*
 * Copyright (c) 2000 Poul-Henning Kamp <phk@FreeBSD.org>
 *
 * Permission to use, copy, modify, and distribute this software for any
 * purpose with or without fee is hereby granted, provided that the above
 * copyright notice and this permission notice appear in all copies.
 *
 * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
 * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
 * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
 * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
 * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
 * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
 * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
 */

/*
 * If we meet some day, and you think this stuff is worth it, you
 * can buy me a beer in return.  Poul-Henning Kamp
 */

#include <sys/param.h>
#include <sys/atomic.h>
#include <sys/kernel.h>
#include <sys/mutex.h>
#include <sys/rwlock.h>
#include <sys/stdint.h>
#include <sys/timeout.h>
#include <sys/sysctl.h>
#include <sys/syslog.h>
#include <sys/systm.h>
#include <sys/timetc.h>
#include <sys/queue.h>
#include <sys/malloc.h>

u_int dummy_get_timecount(struct timecounter *);

int sysctl_tc_hardware(void *, size_t *, void *, size_t);
int sysctl_tc_choice(void *, size_t *, void *, size_t);

/*
 * Implement a dummy timecounter which we can use until we get a real one
 * in the air.  This allows the console and other early stuff to use
 * time services.
 */

u_int
dummy_get_timecount(struct timecounter *tc)
{
	static u_int now;

	return atomic_inc_int_nv(&now);
}

static struct timecounter dummy_timecounter = {
	.tc_get_timecount = dummy_get_timecount,
	.tc_poll_pps = NULL,
	.tc_counter_mask = ~0u,
	.tc_frequency = 1000000,
	.tc_name = "dummy",
	.tc_quality = -1000000,
	.tc_priv = NULL,
	.tc_user = 0,
};

/*
 * Locks used to protect struct members, global variables in this file:
 *	I	immutable after initialization
 *	T	tc_lock
 *	W	windup_mtx
 */

struct timehands {
	/* These fields must be initialized by the driver. */
	struct timecounter *th_counter;		/* [W] */
	int64_t th_adjtimedelta;		/* [T,W] */
	struct bintime th_next_ntp_update;	/* [T,W] */
	int64_t th_adjustment;			/* [W] */
	u_int64_t th_scale;			/* [W] */
	u_int th_offset_count;			/* [W] */
	struct bintime th_boottime;		/* [T,W] */
	struct bintime th_offset;		/* [W] */
	struct bintime th_naptime;		/* [W] */
	struct timeval th_microtime;		/* [W] */
	struct timespec th_nanotime;		/* [W] */
	/* Fields not to be copied in tc_windup start with th_generation. */
	volatile u_int th_generation;		/* [W] */
	struct timehands *th_next;		/* [I] */
};

static struct timehands th0;
static struct timehands th1 = {
	.th_next = &th0
};
static struct timehands th0 = {
	.th_counter = &dummy_timecounter,
	.th_scale = UINT64_MAX / 1000000,
	.th_offset = { .sec = 1, .frac = 0 },
	.th_generation = 1,
	.th_next = &th1
};

struct rwlock tc_lock = RWLOCK_INITIALIZER("tc_lock");
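/*
 * Illustrative sketch (not part of the original file, kept out of the
 * build with #if 0): the shape of a timecounter a clock driver could
 * provide in place of the dummy above.  The name, frequency, mask and
 * read routine are hypothetical; a real driver fills in a structure like
 * this for its free-running hardware counter and registers it with the
 * framework (in OpenBSD, via tc_init()) so the kernel can switch to it.
 */
#if 0
u_int example_get_timecount(struct timecounter *);

u_int
example_get_timecount(struct timecounter *tc)
{
	/* Read and return the current value of a free-running counter. */
	return 0;
}

static struct timecounter example_timecounter = {
	.tc_get_timecount = example_get_timecount,
	.tc_poll_pps = NULL,
	.tc_counter_mask = 0xffffffff,	/* counter width: full 32 bits */
	.tc_frequency = 100000000,	/* 100 MHz, hypothetical */
	.tc_name = "example",
	.tc_quality = 0,
	.tc_priv = NULL,
	.tc_user = 0,
};
#endif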
/*
 * tc_windup() must be called before leaving this mutex.
 */
struct mutex windup_mtx = MUTEX_INITIALIZER(IPL_CLOCK);

static struct timehands *volatile timehands = &th0;	/* [W] */
struct timecounter *timecounter = &dummy_timecounter;	/* [T] */
static SLIST_HEAD(, timecounter) tc_list = SLIST_HEAD_INITIALIZER(tc_list);

/*
 * These are updated from tc_windup().  They are useful when
 * examining kernel core dumps.
 */
volatile time_t naptime = 0;
volatile time_t time_second = 1;
volatile time_t time_uptime = 0;

static int timestepwarnings;

void ntp_update_second(struct timehands *);
void tc_windup(struct bintime *, struct bintime *, int64_t *);

/*
 * Return the difference between the timehands' counter value now and what
 * was when we copied it to the timehands' offset_count.
 */
static __inline u_int
tc_delta(struct timehands *th)
{
	struct timecounter *tc;

	tc = th->th_counter;
	return ((tc->tc_get_timecount(tc) - th->th_offset_count) &
	    tc->tc_counter_mask);
}

/*
 * Functions for reading the time.  We have to loop until we are sure that
 * the timehands that we operated on was not updated under our feet.  See
 * the comment in <sys/time.h> for a description of these functions.
 */

void
binboottime(struct bintime *bt)
{
	struct timehands *th;
	u_int gen;

	do {
		th = timehands;
		gen = th->th_generation;
		membar_consumer();
		*bt = th->th_boottime;
		membar_consumer();
	} while (gen == 0 || gen != th->th_ge