| 238 229 5 234 229 229 51 3 48 6 3 16 34 1 2 1 38 38 12 10 1 3 1 1 1 4 7 2 5 7 7 5 7 45 275 27 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 | // SPDX-License-Identifier: GPL-2.0 #include <linux/mount.h> #include <linux/pseudo_fs.h> #include <linux/file.h> #include <linux/fs.h> #include <linux/proc_fs.h> #include <linux/proc_ns.h> #include <linux/magic.h> #include <linux/ktime.h> #include <linux/seq_file.h> #include <linux/pid_namespace.h> #include <linux/user_namespace.h> #include <linux/nsfs.h> #include <linux/uaccess.h> #include <linux/mnt_namespace.h> #include <linux/ipc_namespace.h> #include <linux/time_namespace.h> #include <linux/utsname.h> #include <linux/exportfs.h> #include <linux/nstree.h> #include <net/net_namespace.h> #include "mount.h" #include "internal.h" static struct vfsmount *nsfs_mnt; static struct path nsfs_root_path = {}; void nsfs_get_root(struct path *path) { *path = nsfs_root_path; path_get(path); } static long ns_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg); static const struct file_operations ns_file_operations = { .unlocked_ioctl = ns_ioctl, .compat_ioctl = compat_ptr_ioctl, }; static char *ns_dname(struct dentry *dentry, char *buffer, int buflen) { struct inode *inode = d_inode(dentry); struct ns_common *ns = inode->i_private; const struct proc_ns_operations *ns_ops = ns->ops; return dynamic_dname(buffer, buflen, "%s:[%lu]", ns_ops->name, inode->i_ino); } const struct dentry_operations ns_dentry_operations = { .d_dname = ns_dname, .d_prune = stashed_dentry_prune, }; static void nsfs_evict(struct inode *inode) { struct ns_common *ns = inode->i_private; clear_inode(inode); ns->ops->put(ns); } int ns_get_path_cb(struct path *path, ns_get_path_helper_t *ns_get_cb, void *private_data) { struct ns_common *ns; ns = ns_get_cb(private_data); if (!ns) return -ENOENT; return path_from_stashed(&ns->stashed, nsfs_mnt, ns, path); } struct ns_get_path_task_args { const struct proc_ns_operations *ns_ops; struct task_struct *task; }; static struct ns_common *ns_get_path_task(void *private_data) { struct ns_get_path_task_args *args = private_data; return args->ns_ops->get(args->task); } int ns_get_path(struct path *path, struct task_struct *task, const struct proc_ns_operations *ns_ops) { struct ns_get_path_task_args args = { .ns_ops = ns_ops, .task = task, }; return ns_get_path_cb(path, ns_get_path_task, &args); } /** * open_namespace - open a namespace * @ns: the namespace to open * * This will consume a reference to @ns indendent of success or failure. * * Return: A file descriptor on success or a negative error code on failure. */ int open_namespace(struct ns_common *ns) { struct path path __free(path_put) = {}; struct file *f; int err; /* call first to consume reference */ err = path_from_stashed(&ns->stashed, nsfs_mnt, ns, &path); if (err < 0) return err; CLASS(get_unused_fd, fd)(O_CLOEXEC); if (fd < 0) return fd; f = dentry_open(&path, O_RDONLY, current_cred()); if (IS_ERR(f)) return PTR_ERR(f); fd_install(fd, f); return take_fd(fd); } int open_related_ns(struct ns_common *ns, struct ns_common *(*get_ns)(struct ns_common *ns)) { struct ns_common *relative; relative = get_ns(ns); if (IS_ERR(relative)) return PTR_ERR(relative); return open_namespace(relative); } EXPORT_SYMBOL_GPL(open_related_ns); static int copy_ns_info_to_user(const struct mnt_namespace *mnt_ns, struct mnt_ns_info __user *uinfo, size_t usize, struct mnt_ns_info *kinfo) { /* * If userspace and the kernel have the same struct size it can just * be copied. If userspace provides an older struct, only the bits that * userspace knows about will be copied. If userspace provides a new * struct, only the bits that the kernel knows aobut will be copied and * the size value will be set to the size the kernel knows about. */ kinfo->size = min(usize, sizeof(*kinfo)); kinfo->mnt_ns_id = mnt_ns->ns.ns_id; kinfo->nr_mounts = READ_ONCE(mnt_ns->nr_mounts); /* Subtract the root mount of the mount namespace. */ if (kinfo->nr_mounts) kinfo->nr_mounts--; if (copy_to_user(uinfo, kinfo, kinfo->size)) return -EFAULT; return 0; } static bool nsfs_ioctl_valid(unsigned int cmd) { switch (cmd) { case NS_GET_USERNS: case NS_GET_PARENT: case NS_GET_NSTYPE: case NS_GET_OWNER_UID: case NS_GET_MNTNS_ID: case NS_GET_PID_FROM_PIDNS: case NS_GET_TGID_FROM_PIDNS: case NS_GET_PID_IN_PIDNS: case NS_GET_TGID_IN_PIDNS: case NS_GET_ID: return true; } /* Extensible ioctls require some extra handling. */ switch (_IOC_NR(cmd)) { case _IOC_NR(NS_MNT_GET_INFO): return extensible_ioctl_valid(cmd, NS_MNT_GET_INFO, MNT_NS_INFO_SIZE_VER0); case _IOC_NR(NS_MNT_GET_NEXT): return extensible_ioctl_valid(cmd, NS_MNT_GET_NEXT, MNT_NS_INFO_SIZE_VER0); case _IOC_NR(NS_MNT_GET_PREV): return extensible_ioctl_valid(cmd, NS_MNT_GET_PREV, MNT_NS_INFO_SIZE_VER0); } return false; } static long ns_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { struct user_namespace *user_ns; struct pid_namespace *pid_ns; struct task_struct *tsk; struct ns_common *ns; struct mnt_namespace *mnt_ns; bool previous = false; uid_t __user *argp; uid_t uid; int ret; if (!nsfs_ioctl_valid(ioctl)) return -ENOIOCTLCMD; ns = get_proc_ns(file_inode(filp)); switch (ioctl) { case NS_GET_USERNS: return open_related_ns(ns, ns_get_owner); case NS_GET_PARENT: if (!ns->ops->get_parent) return -EINVAL; return open_related_ns(ns, ns->ops->get_parent); case NS_GET_NSTYPE: return ns->ns_type; case NS_GET_OWNER_UID: if (ns->ns_type != CLONE_NEWUSER) return -EINVAL; user_ns = container_of(ns, struct user_namespace, ns); argp = (uid_t __user *) arg; uid = from_kuid_munged(current_user_ns(), user_ns->owner); return put_user(uid, argp); case NS_GET_PID_FROM_PIDNS: fallthrough; case NS_GET_TGID_FROM_PIDNS: fallthrough; case NS_GET_PID_IN_PIDNS: fallthrough; case NS_GET_TGID_IN_PIDNS: { if (ns->ns_type != CLONE_NEWPID) return -EINVAL; ret = -ESRCH; pid_ns = container_of(ns, struct pid_namespace, ns); guard(rcu)(); if (ioctl == NS_GET_PID_IN_PIDNS || ioctl == NS_GET_TGID_IN_PIDNS) tsk = find_task_by_vpid(arg); else tsk = find_task_by_pid_ns(arg, pid_ns); if (!tsk) break; switch (ioctl) { case NS_GET_PID_FROM_PIDNS: ret = task_pid_vnr(tsk); break; case NS_GET_TGID_FROM_PIDNS: ret = task_tgid_vnr(tsk); break; case NS_GET_PID_IN_PIDNS: ret = task_pid_nr_ns(tsk, pid_ns); break; case NS_GET_TGID_IN_PIDNS: ret = task_tgid_nr_ns(tsk, pid_ns); break; default: ret = 0; break; } if (!ret) ret = -ESRCH; return ret; } case NS_GET_MNTNS_ID: if (ns->ns_type != CLONE_NEWNS) return -EINVAL; fallthrough; case NS_GET_ID: { __u64 __user *idp; __u64 id; idp = (__u64 __user *)arg; id = ns->ns_id; return put_user(id, idp); } } /* extensible ioctls */ switch (_IOC_NR(ioctl)) { case _IOC_NR(NS_MNT_GET_INFO): { struct mnt_ns_info kinfo = {}; struct mnt_ns_info __user *uinfo = (struct mnt_ns_info __user *)arg; size_t usize = _IOC_SIZE(ioctl); if (ns->ns_type != CLONE_NEWNS) return -EINVAL; if (!uinfo) return -EINVAL; if (usize < MNT_NS_INFO_SIZE_VER0) return -EINVAL; return copy_ns_info_to_user(to_mnt_ns(ns), uinfo, usize, &kinfo); } case _IOC_NR(NS_MNT_GET_PREV): previous = true; fallthrough; case _IOC_NR(NS_MNT_GET_NEXT): { struct mnt_ns_info kinfo = {}; struct mnt_ns_info __user *uinfo = (struct mnt_ns_info __user *)arg; struct path path __free(path_put) = {}; struct file *f __free(fput) = NULL; size_t usize = _IOC_SIZE(ioctl); if (ns->ns_type != CLONE_NEWNS) return -EINVAL; if (usize < MNT_NS_INFO_SIZE_VER0) return -EINVAL; mnt_ns = get_sequential_mnt_ns(to_mnt_ns(ns), previous); if (IS_ERR(mnt_ns)) return PTR_ERR(mnt_ns); ns = to_ns_common(mnt_ns); /* Transfer ownership of @mnt_ns reference to @path. */ ret = path_from_stashed(&ns->stashed, nsfs_mnt, ns, &path); if (ret) return ret; CLASS(get_unused_fd, fd)(O_CLOEXEC); if (fd < 0) return fd; f = dentry_open(&path, O_RDONLY, current_cred()); if (IS_ERR(f)) return PTR_ERR(f); if (uinfo) { /* * If @uinfo is passed return all information about the * mount namespace as well. */ ret = copy_ns_info_to_user(to_mnt_ns(ns), uinfo, usize, &kinfo); if (ret) return ret; } /* Transfer reference of @f to caller's fdtable. */ fd_install(fd, no_free_ptr(f)); /* File descriptor is live so hand it off to the caller. */ return take_fd(fd); } default: ret = -ENOTTY; } return ret; } int ns_get_name(char *buf, size_t size, struct task_struct *task, const struct proc_ns_operations *ns_ops) { struct ns_common *ns; int res = -ENOENT; const char *name; ns = ns_ops->get(task); if (ns) { name = ns_ops->real_ns_name ? : ns_ops->name; res = snprintf(buf, size, "%s:[%u]", name, ns->inum); ns_ops->put(ns); } return res; } bool proc_ns_file(const struct file *file) { return file->f_op == &ns_file_operations; } /** * ns_match() - Returns true if current namespace matches dev/ino provided. * @ns: current namespace * @dev: dev_t from nsfs that will be matched against current nsfs * @ino: ino_t from nsfs that will be matched against current nsfs * * Return: true if dev and ino matches the current nsfs. */ bool ns_match(const struct ns_common *ns, dev_t dev, ino_t ino) { return (ns->inum == ino) && (nsfs_mnt->mnt_sb->s_dev == dev); } static int nsfs_show_path(struct seq_file *seq, struct dentry *dentry) { struct inode *inode = d_inode(dentry); const struct ns_common *ns = inode->i_private; const struct proc_ns_operations *ns_ops = ns->ops; seq_printf(seq, "%s:[%lu]", ns_ops->name, inode->i_ino); return 0; } static const struct super_operations nsfs_ops = { .statfs = simple_statfs, .evict_inode = nsfs_evict, .show_path = nsfs_show_path, }; static int nsfs_init_inode(struct inode *inode, void *data) { struct ns_common *ns = data; inode->i_private = data; inode->i_mode |= S_IRUGO; inode->i_fop = &ns_file_operations; inode->i_ino = ns->inum; return 0; } static void nsfs_put_data(void *data) { struct ns_common *ns = data; ns->ops->put(ns); } static const struct stashed_operations nsfs_stashed_ops = { .init_inode = nsfs_init_inode, .put_data = nsfs_put_data, }; #define NSFS_FID_SIZE_U32_VER0 (NSFS_FILE_HANDLE_SIZE_VER0 / sizeof(u32)) #define NSFS_FID_SIZE_U32_LATEST (NSFS_FILE_HANDLE_SIZE_LATEST / sizeof(u32)) static int nsfs_encode_fh(struct inode *inode, u32 *fh, int *max_len, struct inode *parent) { struct nsfs_file_handle *fid = (struct nsfs_file_handle *)fh; struct ns_common *ns = inode->i_private; int len = *max_len; if (parent) return FILEID_INVALID; if (len < NSFS_FID_SIZE_U32_VER0) { *max_len = NSFS_FID_SIZE_U32_LATEST; return FILEID_INVALID; } else if (len > NSFS_FID_SIZE_U32_LATEST) { *max_len = NSFS_FID_SIZE_U32_LATEST; } fid->ns_id = ns->ns_id; fid->ns_type = ns->ns_type; fid->ns_inum = inode->i_ino; return FILEID_NSFS; } static struct dentry *nsfs_fh_to_dentry(struct super_block *sb, struct fid *fh, int fh_len, int fh_type) { struct path path __free(path_put) = {}; struct nsfs_file_handle *fid = (struct nsfs_file_handle *)fh; struct user_namespace *owning_ns = NULL; struct ns_common *ns; int ret; if (fh_len < NSFS_FID_SIZE_U32_VER0) return NULL; /* Check that any trailing bytes are zero. */ if ((fh_len > NSFS_FID_SIZE_U32_LATEST) && memchr_inv((void *)fid + NSFS_FID_SIZE_U32_LATEST, 0, fh_len - NSFS_FID_SIZE_U32_LATEST)) return NULL; switch (fh_type) { case FILEID_NSFS: break; default: return NULL; } scoped_guard(rcu) { ns = ns_tree_lookup_rcu(fid->ns_id, fid->ns_type); if (!ns) return NULL; VFS_WARN_ON_ONCE(ns->ns_id != fid->ns_id); VFS_WARN_ON_ONCE(ns->ns_type != fid->ns_type); if (ns->inum != fid->ns_inum) return NULL; if (!__ns_ref_get(ns)) return NULL; } switch (ns->ns_type) { #ifdef CONFIG_CGROUPS case CLONE_NEWCGROUP: if (!current_in_namespace(to_cg_ns(ns))) owning_ns = to_cg_ns(ns)->user_ns; break; #endif #ifdef CONFIG_IPC_NS case CLONE_NEWIPC: if (!current_in_namespace(to_ipc_ns(ns))) owning_ns = to_ipc_ns(ns)->user_ns; break; #endif case CLONE_NEWNS: if (!current_in_namespace(to_mnt_ns(ns))) owning_ns = to_mnt_ns(ns)->user_ns; break; #ifdef CONFIG_NET_NS case CLONE_NEWNET: if (!current_in_namespace(to_net_ns(ns))) owning_ns = to_net_ns(ns)->user_ns; break; #endif #ifdef CONFIG_PID_NS case CLONE_NEWPID: if (!current_in_namespace(to_pid_ns(ns))) { owning_ns = to_pid_ns(ns)->user_ns; } else if (!READ_ONCE(to_pid_ns(ns)->child_reaper)) { ns->ops->put(ns); return ERR_PTR(-EPERM); } break; #endif #ifdef CONFIG_TIME_NS case CLONE_NEWTIME: if (!current_in_namespace(to_time_ns(ns))) owning_ns = to_time_ns(ns)->user_ns; break; #endif #ifdef CONFIG_USER_NS case CLONE_NEWUSER: if (!current_in_namespace(to_user_ns(ns))) owning_ns = to_user_ns(ns); break; #endif #ifdef CONFIG_UTS_NS case CLONE_NEWUTS: if (!current_in_namespace(to_uts_ns(ns))) owning_ns = to_uts_ns(ns)->user_ns; break; #endif default: return ERR_PTR(-EOPNOTSUPP); } if (owning_ns && !ns_capable(owning_ns, CAP_SYS_ADMIN)) { ns->ops->put(ns); return ERR_PTR(-EPERM); } /* path_from_stashed() unconditionally consumes the reference. */ ret = path_from_stashed(&ns->stashed, nsfs_mnt, ns, &path); if (ret) return ERR_PTR(ret); return no_free_ptr(path.dentry); } static int nsfs_export_permission(struct handle_to_path_ctx *ctx, unsigned int oflags) { /* nsfs_fh_to_dentry() performs all permission checks. */ return 0; } static struct file *nsfs_export_open(const struct path *path, unsigned int oflags) { return file_open_root(path, "", oflags, 0); } static const struct export_operations nsfs_export_operations = { .encode_fh = nsfs_encode_fh, .fh_to_dentry = nsfs_fh_to_dentry, .open = nsfs_export_open, .permission = nsfs_export_permission, }; static int nsfs_init_fs_context(struct fs_context *fc) { struct pseudo_fs_context *ctx = init_pseudo(fc, NSFS_MAGIC); if (!ctx) return -ENOMEM; ctx->ops = &nsfs_ops; ctx->eops = &nsfs_export_operations; ctx->dops = &ns_dentry_operations; fc->s_fs_info = (void *)&nsfs_stashed_ops; return 0; } static struct file_system_type nsfs = { .name = "nsfs", .init_fs_context = nsfs_init_fs_context, .kill_sb = kill_anon_super, }; void __init nsfs_init(void) { nsfs_mnt = kern_mount(&nsfs); if (IS_ERR(nsfs_mnt)) panic("can't set nsfs up\n"); nsfs_mnt->mnt_sb->s_flags &= ~SB_NOUSER; nsfs_root_path.mnt = nsfs_mnt; nsfs_root_path.dentry = nsfs_mnt->mnt_root; } |
| 54 12 54 54 11 54 149 95 54 45 137 70 70 70 70 22 22 22 22 91 91 2 2 2 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 | // SPDX-License-Identifier: GPL-2.0-only /* * Pid namespaces * * Authors: * (C) 2007 Pavel Emelyanov <xemul@openvz.org>, OpenVZ, SWsoft Inc. * (C) 2007 Sukadev Bhattiprolu <sukadev@us.ibm.com>, IBM * Many thanks to Oleg Nesterov for comments and help * */ #include <linux/pid.h> #include <linux/pid_namespace.h> #include <linux/user_namespace.h> #include <linux/syscalls.h> #include <linux/cred.h> #include <linux/err.h> #include <linux/acct.h> #include <linux/slab.h> #include <linux/proc_ns.h> #include <linux/reboot.h> #include <linux/export.h> #include <linux/sched/task.h> #include <linux/sched/signal.h> #include <linux/idr.h> #include <linux/nstree.h> #include <uapi/linux/wait.h> #include "pid_sysctl.h" static DEFINE_MUTEX(pid_caches_mutex); static struct kmem_cache *pid_ns_cachep; /* Write once array, filled from the beginning. */ static struct kmem_cache *pid_cache[MAX_PID_NS_LEVEL]; /* * creates the kmem cache to allocate pids from. * @level: pid namespace level */ static struct kmem_cache *create_pid_cachep(unsigned int level) { /* Level 0 is init_pid_ns.pid_cachep */ struct kmem_cache **pkc = &pid_cache[level - 1]; struct kmem_cache *kc; char name[4 + 10 + 1]; unsigned int len; kc = READ_ONCE(*pkc); if (kc) return kc; snprintf(name, sizeof(name), "pid_%u", level + 1); len = struct_size_t(struct pid, numbers, level + 1); mutex_lock(&pid_caches_mutex); /* Name collision forces to do allocation under mutex. */ if (!*pkc) *pkc = kmem_cache_create(name, len, 0, SLAB_HWCACHE_ALIGN | SLAB_ACCOUNT, NULL); mutex_unlock(&pid_caches_mutex); /* current can fail, but someone else can succeed. */ return READ_ONCE(*pkc); } static struct ucounts *inc_pid_namespaces(struct user_namespace *ns) { return inc_ucount(ns, current_euid(), UCOUNT_PID_NAMESPACES); } static void dec_pid_namespaces(struct ucounts *ucounts) { dec_ucount(ucounts, UCOUNT_PID_NAMESPACES); } static void destroy_pid_namespace_work(struct work_struct *work); static struct pid_namespace *create_pid_namespace(struct user_namespace *user_ns, struct pid_namespace *parent_pid_ns) { struct pid_namespace *ns; unsigned int level = parent_pid_ns->level + 1; struct ucounts *ucounts; int err; err = -EINVAL; if (!in_userns(parent_pid_ns->user_ns, user_ns)) goto out; err = -ENOSPC; if (level > MAX_PID_NS_LEVEL) goto out; ucounts = inc_pid_namespaces(user_ns); if (!ucounts) goto out; err = -ENOMEM; ns = kmem_cache_zalloc(pid_ns_cachep, GFP_KERNEL); if (ns == NULL) goto out_dec; idr_init(&ns->idr); ns->pid_cachep = create_pid_cachep(level); if (ns->pid_cachep == NULL) goto out_free_idr; err = ns_common_init(ns); if (err) goto out_free_idr; ns->pid_max = PID_MAX_LIMIT; err = register_pidns_sysctls(ns); if (err) goto out_free_inum; ns->level = level; ns->parent = get_pid_ns(parent_pid_ns); ns->user_ns = get_user_ns(user_ns); ns->ucounts = ucounts; ns->pid_allocated = PIDNS_ADDING; INIT_WORK(&ns->work, destroy_pid_namespace_work); #if defined(CONFIG_SYSCTL) && defined(CONFIG_MEMFD_CREATE) ns->memfd_noexec_scope = pidns_memfd_noexec_scope(parent_pid_ns); #endif ns_tree_add(ns); return ns; out_free_inum: ns_common_free(ns); out_free_idr: idr_destroy(&ns->idr); kmem_cache_free(pid_ns_cachep, ns); out_dec: dec_pid_namespaces(ucounts); out: return ERR_PTR(err); } static void delayed_free_pidns(struct rcu_head *p) { struct pid_namespace *ns = container_of(p, struct pid_namespace, rcu); dec_pid_namespaces(ns->ucounts); put_user_ns(ns->user_ns); kmem_cache_free(pid_ns_cachep, ns); } static void destroy_pid_namespace(struct pid_namespace *ns) { ns_tree_remove(ns); unregister_pidns_sysctls(ns); ns_common_free(ns); idr_destroy(&ns->idr); call_rcu(&ns->rcu, delayed_free_pidns); } static void destroy_pid_namespace_work(struct work_struct *work) { struct pid_namespace *ns = container_of(work, struct pid_namespace, work); do { struct pid_namespace *parent; parent = ns->parent; destroy_pid_namespace(ns); ns = parent; } while (ns != &init_pid_ns && ns_ref_put(ns)); } struct pid_namespace *copy_pid_ns(u64 flags, struct user_namespace *user_ns, struct pid_namespace *old_ns) { if (!(flags & CLONE_NEWPID)) return get_pid_ns(old_ns); if (task_active_pid_ns(current) != old_ns) return ERR_PTR(-EINVAL); return create_pid_namespace(user_ns, old_ns); } void put_pid_ns(struct pid_namespace *ns) { if (ns && ns != &init_pid_ns && ns_ref_put(ns)) schedule_work(&ns->work); } EXPORT_SYMBOL_GPL(put_pid_ns); void zap_pid_ns_processes(struct pid_namespace *pid_ns) { int nr; int rc; struct task_struct *task, *me = current; int init_pids = thread_group_leader(me) ? 1 : 2; struct pid *pid; /* Don't allow any more processes into the pid namespace */ disable_pid_allocation(pid_ns); /* * Ignore SIGCHLD causing any terminated children to autoreap. * This speeds up the namespace shutdown, plus see the comment * below. */ spin_lock_irq(&me->sighand->siglock); me->sighand->action[SIGCHLD - 1].sa.sa_handler = SIG_IGN; spin_unlock_irq(&me->sighand->siglock); /* * The last thread in the cgroup-init thread group is terminating. * Find remaining pid_ts in the namespace, signal and wait for them * to exit. * * Note: This signals each threads in the namespace - even those that * belong to the same thread group, To avoid this, we would have * to walk the entire tasklist looking a processes in this * namespace, but that could be unnecessarily expensive if the * pid namespace has just a few processes. Or we need to * maintain a tasklist for each pid namespace. * */ rcu_read_lock(); read_lock(&tasklist_lock); nr = 2; idr_for_each_entry_continue(&pid_ns->idr, pid, nr) { task = pid_task(pid, PIDTYPE_PID); if (task && !__fatal_signal_pending(task)) group_send_sig_info(SIGKILL, SEND_SIG_PRIV, task, PIDTYPE_MAX); } read_unlock(&tasklist_lock); rcu_read_unlock(); /* * Reap the EXIT_ZOMBIE children we had before we ignored SIGCHLD. * kernel_wait4() will also block until our children traced from the * parent namespace are detached and become EXIT_DEAD. */ do { clear_thread_flag(TIF_SIGPENDING); clear_thread_flag(TIF_NOTIFY_SIGNAL); rc = kernel_wait4(-1, NULL, __WALL, NULL); } while (rc != -ECHILD); /* * kernel_wait4() misses EXIT_DEAD children, and EXIT_ZOMBIE * process whose parents processes are outside of the pid * namespace. Such processes are created with setns()+fork(). * * If those EXIT_ZOMBIE processes are not reaped by their * parents before their parents exit, they will be reparented * to pid_ns->child_reaper. Thus pidns->child_reaper needs to * stay valid until they all go away. * * The code relies on the pid_ns->child_reaper ignoring * SIGCHILD to cause those EXIT_ZOMBIE processes to be * autoreaped if reparented. * * Semantically it is also desirable to wait for EXIT_ZOMBIE * processes before allowing the child_reaper to be reaped, as * that gives the invariant that when the init process of a * pid namespace is reaped all of the processes in the pid * namespace are gone. * * Once all of the other tasks are gone from the pid_namespace * free_pid() will awaken this task. */ for (;;) { set_current_state(TASK_INTERRUPTIBLE); if (pid_ns->pid_allocated == init_pids) break; schedule(); } __set_current_state(TASK_RUNNING); if (pid_ns->reboot) current->signal->group_exit_code = pid_ns->reboot; acct_exit_ns(pid_ns); return; } #ifdef CONFIG_CHECKPOINT_RESTORE static int pid_ns_ctl_handler(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { struct pid_namespace *pid_ns = task_active_pid_ns(current); struct ctl_table tmp = *table; int ret, next; if (write && !checkpoint_restore_ns_capable(pid_ns->user_ns)) return -EPERM; next = idr_get_cursor(&pid_ns->idr) - 1; tmp.data = &next; tmp.extra2 = &pid_ns->pid_max; ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos); if (!ret && write) idr_set_cursor(&pid_ns->idr, next + 1); return ret; } static const struct ctl_table pid_ns_ctl_table[] = { { .procname = "ns_last_pid", .maxlen = sizeof(int), .mode = 0666, /* permissions are checked in the handler */ .proc_handler = pid_ns_ctl_handler, .extra1 = SYSCTL_ZERO, .extra2 = &init_pid_ns.pid_max, }, }; #endif /* CONFIG_CHECKPOINT_RESTORE */ int reboot_pid_ns(struct pid_namespace *pid_ns, int cmd) { if (pid_ns == &init_pid_ns) return 0; switch (cmd) { case LINUX_REBOOT_CMD_RESTART2: case LINUX_REBOOT_CMD_RESTART: pid_ns->reboot = SIGHUP; break; case LINUX_REBOOT_CMD_POWER_OFF: case LINUX_REBOOT_CMD_HALT: pid_ns->reboot = SIGINT; break; default: return -EINVAL; } read_lock(&tasklist_lock); send_sig(SIGKILL, pid_ns->child_reaper, 1); read_unlock(&tasklist_lock); do_exit(0); /* Not reached */ return 0; } static struct ns_common *pidns_get(struct task_struct *task) { struct pid_namespace *ns; rcu_read_lock(); ns = task_active_pid_ns(task); if (ns) get_pid_ns(ns); rcu_read_unlock(); return ns ? &ns->ns : NULL; } static struct ns_common *pidns_for_children_get(struct task_struct *task) { struct pid_namespace *ns = NULL; task_lock(task); if (task->nsproxy) { ns = task->nsproxy->pid_ns_for_children; get_pid_ns(ns); } task_unlock(task); if (ns) { read_lock(&tasklist_lock); if (!ns->child_reaper) { put_pid_ns(ns); ns = NULL; } read_unlock(&tasklist_lock); } return ns ? &ns->ns : NULL; } static void pidns_put(struct ns_common *ns) { put_pid_ns(to_pid_ns(ns)); } bool pidns_is_ancestor(struct pid_namespace *child, struct pid_namespace *ancestor) { struct pid_namespace *ns; if (child->level < ancestor->level) return false; for (ns = child; ns->level > ancestor->level; ns = ns->parent) ; return ns == ancestor; } static int pidns_install(struct nsset *nsset, struct ns_common *ns) { struct nsproxy *nsproxy = nsset->nsproxy; struct pid_namespace *active = task_active_pid_ns(current); struct pid_namespace *new = to_pid_ns(ns); if (!ns_capable(new->user_ns, CAP_SYS_ADMIN) || !ns_capable(nsset->cred->user_ns, CAP_SYS_ADMIN)) return -EPERM; /* * Only allow entering the current active pid namespace * or a child of the current active pid namespace. * * This is required for fork to return a usable pid value and * this maintains the property that processes and their * children can not escape their current pid namespace. */ if (!pidns_is_ancestor(new, active)) return -EINVAL; put_pid_ns(nsproxy->pid_ns_for_children); nsproxy->pid_ns_for_children = get_pid_ns(new); return 0; } static struct ns_common *pidns_get_parent(struct ns_common *ns) { struct pid_namespace *active = task_active_pid_ns(current); struct pid_namespace *pid_ns, *p; /* See if the parent is in the current namespace */ pid_ns = p = to_pid_ns(ns)->parent; for (;;) { if (!p) return ERR_PTR(-EPERM); if (p == active) break; p = p->parent; } return &get_pid_ns(pid_ns)->ns; } static struct user_namespace *pidns_owner(struct ns_common *ns) { return to_pid_ns(ns)->user_ns; } const struct proc_ns_operations pidns_operations = { .name = "pid", .get = pidns_get, .put = pidns_put, .install = pidns_install, .owner = pidns_owner, .get_parent = pidns_get_parent, }; const struct proc_ns_operations pidns_for_children_operations = { .name = "pid_for_children", .real_ns_name = "pid", .get = pidns_for_children_get, .put = pidns_put, .install = pidns_install, .owner = pidns_owner, .get_parent = pidns_get_parent, }; static __init int pid_namespaces_init(void) { pid_ns_cachep = KMEM_CACHE(pid_namespace, SLAB_PANIC | SLAB_ACCOUNT); #ifdef CONFIG_CHECKPOINT_RESTORE register_sysctl_init("kernel", pid_ns_ctl_table); #endif register_pid_ns_sysctl_table_vm(); ns_tree_add(&init_pid_ns); return 0; } __initcall(pid_namespaces_init); |
| 5 5 3 3 12 10 2 9 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 | // SPDX-License-Identifier: GPL-2.0-only /* * Copyright (c) 2012-2016 Pablo Neira Ayuso <pablo@netfilter.org> */ #include <linux/init.h> #include <linux/module.h> #include <linux/skbuff.h> #include <linux/netlink.h> #include <linux/netfilter.h> #include <linux/netfilter/nf_tables.h> #include <net/netfilter/nf_tables_core.h> #define nft_objref_priv(expr) *((struct nft_object **)nft_expr_priv(expr)) void nft_objref_eval(const struct nft_expr *expr, struct nft_regs *regs, const struct nft_pktinfo *pkt) { struct nft_object *obj = nft_objref_priv(expr); obj->ops->eval(obj, regs, pkt); } static int nft_objref_validate_obj_type(const struct nft_ctx *ctx, u32 type) { unsigned int hooks; switch (type) { case NFT_OBJECT_SYNPROXY: if (ctx->family != NFPROTO_IPV4 && ctx->family != NFPROTO_IPV6 && ctx->family != NFPROTO_INET) return -EOPNOTSUPP; hooks = (1 << NF_INET_LOCAL_IN) | (1 << NF_INET_FORWARD); return nft_chain_validate_hooks(ctx->chain, hooks); default: break; } return 0; } static int nft_objref_validate(const struct nft_ctx *ctx, const struct nft_expr *expr) { struct nft_object *obj = nft_objref_priv(expr); return nft_objref_validate_obj_type(ctx, obj->ops->type->type); } static int nft_objref_init(const struct nft_ctx *ctx, const struct nft_expr *expr, const struct nlattr * const tb[]) { struct nft_object *obj = nft_objref_priv(expr); u8 genmask = nft_genmask_next(ctx->net); u32 objtype; if (!tb[NFTA_OBJREF_IMM_NAME] || !tb[NFTA_OBJREF_IMM_TYPE]) return -EINVAL; objtype = ntohl(nla_get_be32(tb[NFTA_OBJREF_IMM_TYPE])); obj = nft_obj_lookup(ctx->net, ctx->table, tb[NFTA_OBJREF_IMM_NAME], objtype, genmask); if (IS_ERR(obj)) return -ENOENT; if (!nft_use_inc(&obj->use)) return -EMFILE; nft_objref_priv(expr) = obj; return 0; } static int nft_objref_dump(struct sk_buff *skb, const struct nft_expr *expr, bool reset) { const struct nft_object *obj = nft_objref_priv(expr); if (nla_put_string(skb, NFTA_OBJREF_IMM_NAME, obj->key.name) || nla_put_be32(skb, NFTA_OBJREF_IMM_TYPE, htonl(obj->ops->type->type))) goto nla_put_failure; return 0; nla_put_failure: return -1; } static void nft_objref_deactivate(const struct nft_ctx *ctx, const struct nft_expr *expr, enum nft_trans_phase phase) { struct nft_object *obj = nft_objref_priv(expr); if (phase == NFT_TRANS_COMMIT) return; nft_use_dec(&obj->use); } static void nft_objref_activate(const struct nft_ctx *ctx, const struct nft_expr *expr) { struct nft_object *obj = nft_objref_priv(expr); nft_use_inc_restore(&obj->use); } static const struct nft_expr_ops nft_objref_ops = { .type = &nft_objref_type, .size = NFT_EXPR_SIZE(sizeof(struct nft_object *)), .eval = nft_objref_eval, .init = nft_objref_init, .activate = nft_objref_activate, .deactivate = nft_objref_deactivate, .dump = nft_objref_dump, .validate = nft_objref_validate, .reduce = NFT_REDUCE_READONLY, }; struct nft_objref_map { struct nft_set *set; u8 sreg; struct nft_set_binding binding; }; void nft_objref_map_eval(const struct nft_expr *expr, struct nft_regs *regs, const struct nft_pktinfo *pkt) { struct nft_objref_map *priv = nft_expr_priv(expr); const struct nft_set *set = priv->set; struct net *net = nft_net(pkt); const struct nft_set_ext *ext; struct nft_object *obj; ext = nft_set_do_lookup(net, set, ®s->data[priv->sreg]); if (!ext) { ext = nft_set_catchall_lookup(net, set); if (!ext) { regs->verdict.code = NFT_BREAK; return; } } obj = *nft_set_ext_obj(ext); obj->ops->eval(obj, regs, pkt); } static int nft_objref_map_init(const struct nft_ctx *ctx, const struct nft_expr *expr, const struct nlattr * const tb[]) { struct nft_objref_map *priv = nft_expr_priv(expr); u8 genmask = nft_genmask_next(ctx->net); struct nft_set *set; int err; set = nft_set_lookup_global(ctx->net, ctx->table, tb[NFTA_OBJREF_SET_NAME], tb[NFTA_OBJREF_SET_ID], genmask); if (IS_ERR(set)) return PTR_ERR(set); if (!(set->flags & NFT_SET_OBJECT)) return -EINVAL; err = nft_parse_register_load(ctx, tb[NFTA_OBJREF_SET_SREG], &priv->sreg, set->klen); if (err < 0) return err; priv->binding.flags = set->flags & NFT_SET_OBJECT; err = nf_tables_bind_set(ctx, set, &priv->binding); if (err < 0) return err; priv->set = set; return 0; } static int nft_objref_map_dump(struct sk_buff *skb, const struct nft_expr *expr, bool reset) { const struct nft_objref_map *priv = nft_expr_priv(expr); if (nft_dump_register(skb, NFTA_OBJREF_SET_SREG, priv->sreg) || nla_put_string(skb, NFTA_OBJREF_SET_NAME, priv->set->name)) goto nla_put_failure; return 0; nla_put_failure: return -1; } static void nft_objref_map_deactivate(const struct nft_ctx *ctx, const struct nft_expr *expr, enum nft_trans_phase phase) { struct nft_objref_map *priv = nft_expr_priv(expr); nf_tables_deactivate_set(ctx, priv->set, &priv->binding, phase); } static void nft_objref_map_activate(const struct nft_ctx *ctx, const struct nft_expr *expr) { struct nft_objref_map *priv = nft_expr_priv(expr); nf_tables_activate_set(ctx, priv->set); } static void nft_objref_map_destroy(const struct nft_ctx *ctx, const struct nft_expr *expr) { struct nft_objref_map *priv = nft_expr_priv(expr); nf_tables_destroy_set(ctx, priv->set); } static int nft_objref_map_validate(const struct nft_ctx *ctx, const struct nft_expr *expr) { const struct nft_objref_map *priv = nft_expr_priv(expr); return nft_objref_validate_obj_type(ctx, priv->set->objtype); } static const struct nft_expr_ops nft_objref_map_ops = { .type = &nft_objref_type, .size = NFT_EXPR_SIZE(sizeof(struct nft_objref_map)), .eval = nft_objref_map_eval, .init = nft_objref_map_init, .activate = nft_objref_map_activate, .deactivate = nft_objref_map_deactivate, .destroy = nft_objref_map_destroy, .dump = nft_objref_map_dump, .validate = nft_objref_map_validate, .reduce = NFT_REDUCE_READONLY, }; static const struct nft_expr_ops * nft_objref_select_ops(const struct nft_ctx *ctx, const struct nlattr * const tb[]) { if (tb[NFTA_OBJREF_SET_SREG] && (tb[NFTA_OBJREF_SET_NAME] || tb[NFTA_OBJREF_SET_ID])) return &nft_objref_map_ops; else if (tb[NFTA_OBJREF_IMM_NAME] && tb[NFTA_OBJREF_IMM_TYPE]) return &nft_objref_ops; return ERR_PTR(-EOPNOTSUPP); } static const struct nla_policy nft_objref_policy[NFTA_OBJREF_MAX + 1] = { [NFTA_OBJREF_IMM_NAME] = { .type = NLA_STRING, .len = NFT_OBJ_MAXNAMELEN - 1 }, [NFTA_OBJREF_IMM_TYPE] = { .type = NLA_U32 }, [NFTA_OBJREF_SET_SREG] = { .type = NLA_U32 }, [NFTA_OBJREF_SET_NAME] = { .type = NLA_STRING, .len = NFT_SET_MAXNAMELEN - 1 }, [NFTA_OBJREF_SET_ID] = { .type = NLA_U32 }, }; struct nft_expr_type nft_objref_type __read_mostly = { .name = "objref", .select_ops = nft_objref_select_ops, .policy = nft_objref_policy, .maxattr = NFTA_OBJREF_MAX, .owner = THIS_MODULE, }; |
| 2 2 2 5 5 4 1 1 5 3 2 2 2 5 5 5 5 5 5 5 5 5 8 2 1 5 5 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 | // SPDX-License-Identifier: GPL-2.0-only /* * Copyright (c) 2007-2014 Nicira, Inc. */ #include <linux/uaccess.h> #include <linux/netdevice.h> #include <linux/etherdevice.h> #include <linux/if_ether.h> #include <linux/if_vlan.h> #include <net/llc_pdu.h> #include <linux/kernel.h> #include <linux/jhash.h> #include <linux/jiffies.h> #include <linux/llc.h> #include <linux/module.h> #include <linux/in.h> #include <linux/rcupdate.h> #include <linux/cpumask.h> #include <linux/if_arp.h> #include <linux/ip.h> #include <linux/ipv6.h> #include <linux/mpls.h> #include <linux/sctp.h> #include <linux/smp.h> #include <linux/tcp.h> #include <linux/udp.h> #include <linux/icmp.h> #include <linux/icmpv6.h> #include <linux/rculist.h> #include <net/ip.h> #include <net/ip_tunnels.h> #include <net/ipv6.h> #include <net/mpls.h> #include <net/ndisc.h> #include <net/nsh.h> #include <net/pkt_cls.h> #include <net/netfilter/nf_conntrack_zones.h> #include "conntrack.h" #include "datapath.h" #include "flow.h" #include "flow_netlink.h" #include "vport.h" u64 ovs_flow_used_time(unsigned long flow_jiffies) { struct timespec64 cur_ts; u64 cur_ms, idle_ms; ktime_get_ts64(&cur_ts); idle_ms = jiffies_to_msecs(jiffies - flow_jiffies); cur_ms = (u64)(u32)cur_ts.tv_sec * MSEC_PER_SEC + cur_ts.tv_nsec / NSEC_PER_MSEC; return cur_ms - idle_ms; } #define TCP_FLAGS_BE16(tp) (*(__be16 *)&tcp_flag_word(tp) & htons(0x0FFF)) void ovs_flow_stats_update(struct sw_flow *flow, __be16 tcp_flags, const struct sk_buff *skb) { struct sw_flow_stats *stats; unsigned int cpu = smp_processor_id(); int len = skb->len + (skb_vlan_tag_present(skb) ? VLAN_HLEN : 0); stats = rcu_dereference(flow->stats[cpu]); /* Check if already have CPU-specific stats. */ if (likely(stats)) { spin_lock(&stats->lock); /* Mark if we write on the pre-allocated stats. */ if (cpu == 0 && unlikely(flow->stats_last_writer != cpu)) flow->stats_last_writer = cpu; } else { stats = rcu_dereference(flow->stats[0]); /* Pre-allocated. */ spin_lock(&stats->lock); /* If the current CPU is the only writer on the * pre-allocated stats keep using them. */ if (unlikely(flow->stats_last_writer != cpu)) { /* A previous locker may have already allocated the * stats, so we need to check again. If CPU-specific * stats were already allocated, we update the pre- * allocated stats as we have already locked them. */ if (likely(flow->stats_last_writer != -1) && likely(!rcu_access_pointer(flow->stats[cpu]))) { /* Try to allocate CPU-specific stats. */ struct sw_flow_stats *new_stats; new_stats = kmem_cache_alloc_node(flow_stats_cache, GFP_NOWAIT | __GFP_THISNODE | __GFP_NOWARN | __GFP_NOMEMALLOC, numa_node_id()); if (likely(new_stats)) { new_stats->used = jiffies; new_stats->packet_count = 1; new_stats->byte_count = len; new_stats->tcp_flags = tcp_flags; spin_lock_init(&new_stats->lock); rcu_assign_pointer(flow->stats[cpu], new_stats); cpumask_set_cpu(cpu, flow->cpu_used_mask); goto unlock; } } flow->stats_last_writer = cpu; } } stats->used = jiffies; stats->packet_count++; stats->byte_count += len; stats->tcp_flags |= tcp_flags; unlock: spin_unlock(&stats->lock); } /* Must be called with rcu_read_lock or ovs_mutex. */ void ovs_flow_stats_get(const struct sw_flow *flow, struct ovs_flow_stats *ovs_stats, unsigned long *used, __be16 *tcp_flags) { unsigned int cpu; *used = 0; *tcp_flags = 0; memset(ovs_stats, 0, sizeof(*ovs_stats)); for_each_cpu(cpu, flow->cpu_used_mask) { struct sw_flow_stats *stats = rcu_dereference_ovsl(flow->stats[cpu]); if (stats) { /* Local CPU may write on non-local stats, so we must * block bottom-halves here. */ spin_lock_bh(&stats->lock); if (!*used || time_after(stats->used, *used)) *used = stats->used; *tcp_flags |= stats->tcp_flags; ovs_stats->n_packets += stats->packet_count; ovs_stats->n_bytes += stats->byte_count; spin_unlock_bh(&stats->lock); } } } /* Called with ovs_mutex. */ void ovs_flow_stats_clear(struct sw_flow *flow) { unsigned int cpu; for_each_cpu(cpu, flow->cpu_used_mask) { struct sw_flow_stats *stats = ovsl_dereference(flow->stats[cpu]); if (stats) { spin_lock_bh(&stats->lock); stats->used = 0; stats->packet_count = 0; stats->byte_count = 0; stats->tcp_flags = 0; spin_unlock_bh(&stats->lock); } } } static int check_header(struct sk_buff *skb, int len) { if (unlikely(skb->len < len)) return -EINVAL; if (unlikely(!pskb_may_pull(skb, len))) return -ENOMEM; return 0; } static bool arphdr_ok(struct sk_buff *skb) { return pskb_may_pull(skb, skb_network_offset(skb) + sizeof(struct arp_eth_header)); } static int check_iphdr(struct sk_buff *skb) { unsigned int nh_ofs = skb_network_offset(skb); unsigned int ip_len; int err; err = check_header(skb, nh_ofs + sizeof(struct iphdr)); if (unlikely(err)) return err; ip_len = ip_hdrlen(skb); if (unlikely(ip_len < sizeof(struct iphdr) || skb->len < nh_ofs + ip_len)) return -EINVAL; skb_set_transport_header(skb, nh_ofs + ip_len); return 0; } static bool tcphdr_ok(struct sk_buff *skb) { int th_ofs = skb_transport_offset(skb); int tcp_len; if (unlikely(!pskb_may_pull(skb, th_ofs + sizeof(struct tcphdr)))) return false; tcp_len = tcp_hdrlen(skb); if (unlikely(tcp_len < sizeof(struct tcphdr) || skb->len < th_ofs + tcp_len)) return false; return true; } static bool udphdr_ok(struct sk_buff *skb) { return pskb_may_pull(skb, skb_transport_offset(skb) + sizeof(struct udphdr)); } static bool sctphdr_ok(struct sk_buff *skb) { return pskb_may_pull(skb, skb_transport_offset(skb) + sizeof(struct sctphdr)); } static bool icmphdr_ok(struct sk_buff *skb) { return pskb_may_pull(skb, skb_transport_offset(skb) + sizeof(struct icmphdr)); } /** * get_ipv6_ext_hdrs() - Parses packet and sets IPv6 extension header flags. * * @skb: buffer where extension header data starts in packet * @nh: ipv6 header * @ext_hdrs: flags are stored here * * OFPIEH12_UNREP is set if more than one of a given IPv6 extension header * is unexpectedly encountered. (Two destination options headers may be * expected and would not cause this bit to be set.) * * OFPIEH12_UNSEQ is set if IPv6 extension headers were not in the order * preferred (but not required) by RFC 2460: * * When more than one extension header is used in the same packet, it is * recommended that those headers appear in the following order: * IPv6 header * Hop-by-Hop Options header * Destination Options header * Routing header * Fragment header * Authentication header * Encapsulating Security Payload header * Destination Options header * upper-layer header */ static void get_ipv6_ext_hdrs(struct sk_buff *skb, struct ipv6hdr *nh, u16 *ext_hdrs) { u8 next_type = nh->nexthdr; unsigned int start = skb_network_offset(skb) + sizeof(struct ipv6hdr); int dest_options_header_count = 0; *ext_hdrs = 0; while (ipv6_ext_hdr(next_type)) { struct ipv6_opt_hdr _hdr, *hp; switch (next_type) { case IPPROTO_NONE: *ext_hdrs |= OFPIEH12_NONEXT; /* stop parsing */ return; case IPPROTO_ESP: if (*ext_hdrs & OFPIEH12_ESP) *ext_hdrs |= OFPIEH12_UNREP; if ((*ext_hdrs & ~(OFPIEH12_HOP | OFPIEH12_DEST | OFPIEH12_ROUTER | IPPROTO_FRAGMENT | OFPIEH12_AUTH | OFPIEH12_UNREP)) || dest_options_header_count >= 2) { *ext_hdrs |= OFPIEH12_UNSEQ; } *ext_hdrs |= OFPIEH12_ESP; break; case IPPROTO_AH: if (*ext_hdrs & OFPIEH12_AUTH) *ext_hdrs |= OFPIEH12_UNREP; if ((*ext_hdrs & ~(OFPIEH12_HOP | OFPIEH12_DEST | OFPIEH12_ROUTER | IPPROTO_FRAGMENT | OFPIEH12_UNREP)) || dest_options_header_count >= 2) { *ext_hdrs |= OFPIEH12_UNSEQ; } *ext_hdrs |= OFPIEH12_AUTH; break; case IPPROTO_DSTOPTS: if (dest_options_header_count == 0) { if (*ext_hdrs & ~(OFPIEH12_HOP | OFPIEH12_UNREP)) *ext_hdrs |= OFPIEH12_UNSEQ; *ext_hdrs |= OFPIEH12_DEST; } else if (dest_options_header_count == 1) { if (*ext_hdrs & ~(OFPIEH12_HOP | OFPIEH12_DEST | OFPIEH12_ROUTER | OFPIEH12_FRAG | OFPIEH12_AUTH | OFPIEH12_ESP | OFPIEH12_UNREP)) { *ext_hdrs |= OFPIEH12_UNSEQ; } } else { *ext_hdrs |= OFPIEH12_UNREP; } dest_options_header_count++; break; case IPPROTO_FRAGMENT: if (*ext_hdrs & OFPIEH12_FRAG) *ext_hdrs |= OFPIEH12_UNREP; if ((*ext_hdrs & ~(OFPIEH12_HOP | OFPIEH12_DEST | OFPIEH12_ROUTER | OFPIEH12_UNREP)) || dest_options_header_count >= 2) { *ext_hdrs |= OFPIEH12_UNSEQ; } *ext_hdrs |= OFPIEH12_FRAG; break; case IPPROTO_ROUTING: if (*ext_hdrs & OFPIEH12_ROUTER) *ext_hdrs |= OFPIEH12_UNREP; if ((*ext_hdrs & ~(OFPIEH12_HOP | OFPIEH12_DEST | OFPIEH12_UNREP)) || dest_options_header_count >= 2) { *ext_hdrs |= OFPIEH12_UNSEQ; } *ext_hdrs |= OFPIEH12_ROUTER; break; case IPPROTO_HOPOPTS: if (*ext_hdrs & OFPIEH12_HOP) *ext_hdrs |= OFPIEH12_UNREP; /* OFPIEH12_HOP is set to 1 if a hop-by-hop IPv6 * extension header is present as the first * extension header in the packet. */ if (*ext_hdrs == 0) *ext_hdrs |= OFPIEH12_HOP; else *ext_hdrs |= OFPIEH12_UNSEQ; break; default: return; } hp = skb_header_pointer(skb, start, sizeof(_hdr), &_hdr); if (!hp) break; next_type = hp->nexthdr; start += ipv6_optlen(hp); } } static int parse_ipv6hdr(struct sk_buff *skb, struct sw_flow_key *key) { unsigned short frag_off; unsigned int payload_ofs = 0; unsigned int nh_ofs = skb_network_offset(skb); unsigned int nh_len; struct ipv6hdr *nh; int err, nexthdr, flags = 0; err = check_header(skb, nh_ofs + sizeof(*nh)); if (unlikely(err)) return err; nh = ipv6_hdr(skb); get_ipv6_ext_hdrs(skb, nh, &key->ipv6.exthdrs); key->ip.proto = NEXTHDR_NONE; key->ip.tos = ipv6_get_dsfield(nh); key->ip.ttl = nh->hop_limit; key->ipv6.label = *(__be32 *)nh & htonl(IPV6_FLOWINFO_FLOWLABEL); key->ipv6.addr.src = nh->saddr; key->ipv6.addr.dst = nh->daddr; nexthdr = ipv6_find_hdr(skb, &payload_ofs, -1, &frag_off, &flags); if (flags & IP6_FH_F_FRAG) { if (frag_off) { key->ip.frag = OVS_FRAG_TYPE_LATER; key->ip.proto = NEXTHDR_FRAGMENT; return 0; } key->ip.frag = OVS_FRAG_TYPE_FIRST; } else { key->ip.frag = OVS_FRAG_TYPE_NONE; } /* Delayed handling of error in ipv6_find_hdr() as it * always sets flags and frag_off to a valid value which may be * used to set key->ip.frag above. */ if (unlikely(nexthdr < 0)) return -EPROTO; nh_len = payload_ofs - nh_ofs; skb_set_transport_header(skb, nh_ofs + nh_len); key->ip.proto = nexthdr; return nh_len; } static bool icmp6hdr_ok(struct sk_buff *skb) { return pskb_may_pull(skb, skb_transport_offset(skb) + sizeof(struct icmp6hdr)); } /** * parse_vlan_tag - Parse vlan tag from vlan header. * @skb: skb containing frame to parse * @key_vh: pointer to parsed vlan tag * @untag_vlan: should the vlan header be removed from the frame * * Return: ERROR on memory error. * %0 if it encounters a non-vlan or incomplete packet. * %1 after successfully parsing vlan tag. */ static int parse_vlan_tag(struct sk_buff *skb, struct vlan_head *key_vh, bool untag_vlan) { struct vlan_head *vh = (struct vlan_head *)skb->data; if (likely(!eth_type_vlan(vh->tpid))) return 0; if (unlikely(skb->len < sizeof(struct vlan_head) + sizeof(__be16))) return 0; if (unlikely(!pskb_may_pull(skb, sizeof(struct vlan_head) + sizeof(__be16)))) return -ENOMEM; vh = (struct vlan_head *)skb->data; key_vh->tci = vh->tci | htons(VLAN_CFI_MASK); key_vh->tpid = vh->tpid; if (unlikely(untag_vlan)) { int offset = skb->data - skb_mac_header(skb); u16 tci; int err; __skb_push(skb, offset); err = __skb_vlan_pop(skb, &tci); __skb_pull(skb, offset); if (err) return err; __vlan_hwaccel_put_tag(skb, key_vh->tpid, tci); } else { __skb_pull(skb, sizeof(struct vlan_head)); } return 1; } static void clear_vlan(struct sw_flow_key *key) { key->eth.vlan.tci = 0; key->eth.vlan.tpid = 0; key->eth.cvlan.tci = 0; key->eth.cvlan.tpid = 0; } static int parse_vlan(struct sk_buff *skb, struct sw_flow_key *key) { int res; if (skb_vlan_tag_present(skb)) { key->eth.vlan.tci = htons(skb->vlan_tci) | htons(VLAN_CFI_MASK); key->eth.vlan.tpid = skb->vlan_proto; } else { /* Parse outer vlan tag in the non-accelerated case. */ res = parse_vlan_tag(skb, &key->eth.vlan, true); if (res <= 0) return res; } /* Parse inner vlan tag. */ res = parse_vlan_tag(skb, &key->eth.cvlan, false); if (res <= 0) return res; return 0; } static __be16 parse_ethertype(struct sk_buff *skb) { struct llc_snap_hdr { u8 dsap; /* Always 0xAA */ u8 ssap; /* Always 0xAA */ u8 ctrl; u8 oui[3]; __be16 ethertype; }; struct llc_snap_hdr *llc; __be16 proto; proto = *(__be16 *) skb->data; __skb_pull(skb, sizeof(__be16)); if (eth_proto_is_802_3(proto)) return proto; if (skb->len < sizeof(struct llc_snap_hdr)) return htons(ETH_P_802_2); if (unlikely(!pskb_may_pull(skb, sizeof(struct llc_snap_hdr)))) return htons(0); llc = (struct llc_snap_hdr *) skb->data; if (llc->dsap != LLC_SAP_SNAP || llc->ssap != LLC_SAP_SNAP || (llc->oui[0] | llc->oui[1] | llc->oui[2]) != 0) return htons(ETH_P_802_2); __skb_pull(skb, sizeof(struct llc_snap_hdr)); if (eth_proto_is_802_3(llc->ethertype)) return llc->ethertype; return htons(ETH_P_802_2); } static int parse_icmpv6(struct sk_buff *skb, struct sw_flow_key *key, int nh_len) { struct icmp6hdr *icmp = icmp6_hdr(skb); /* The ICMPv6 type and code fields use the 16-bit transport port * fields, so we need to store them in 16-bit network byte order. */ key->tp.src = htons(icmp->icmp6_type); key->tp.dst = htons(icmp->icmp6_code); if (icmp->icmp6_code == 0 && (icmp->icmp6_type == NDISC_NEIGHBOUR_SOLICITATION || icmp->icmp6_type == NDISC_NEIGHBOUR_ADVERTISEMENT)) { int icmp_len = skb->len - skb_transport_offset(skb); struct nd_msg *nd; int offset; memset(&key->ipv6.nd, 0, sizeof(key->ipv6.nd)); /* In order to process neighbor discovery options, we need the * entire packet. */ if (unlikely(icmp_len < sizeof(*nd))) return 0; if (unlikely(skb_linearize(skb))) return -ENOMEM; nd = (struct nd_msg *)skb_transport_header(skb); key->ipv6.nd.target = nd->target; icmp_len -= sizeof(*nd); offset = 0; while (icmp_len >= 8) { struct nd_opt_hdr *nd_opt = (struct nd_opt_hdr *)(nd->opt + offset); int opt_len = nd_opt->nd_opt_len * 8; if (unlikely(!opt_len || opt_len > icmp_len)) return 0; /* Store the link layer address if the appropriate * option is provided. It is considered an error if * the same link layer option is specified twice. */ if (nd_opt->nd_opt_type == ND_OPT_SOURCE_LL_ADDR && opt_len == 8) { if (unlikely(!is_zero_ether_addr(key->ipv6.nd.sll))) goto invalid; ether_addr_copy(key->ipv6.nd.sll, &nd->opt[offset+sizeof(*nd_opt)]); } else if (nd_opt->nd_opt_type == ND_OPT_TARGET_LL_ADDR && opt_len == 8) { if (unlikely(!is_zero_ether_addr(key->ipv6.nd.tll))) goto invalid; ether_addr_copy(key->ipv6.nd.tll, &nd->opt[offset+sizeof(*nd_opt)]); } icmp_len -= opt_len; offset += opt_len; } } return 0; invalid: memset(&key->ipv6.nd.target, 0, sizeof(key->ipv6.nd.target)); memset(key->ipv6.nd.sll, 0, sizeof(key->ipv6.nd.sll)); memset(key->ipv6.nd.tll, 0, sizeof(key->ipv6.nd.tll)); return 0; } static int parse_nsh(struct sk_buff *skb, struct sw_flow_key *key) { struct nshhdr *nh; unsigned int nh_ofs = skb_network_offset(skb); u8 version, length; int err; err = check_header(skb, nh_ofs + NSH_BASE_HDR_LEN); if (unlikely(err)) return err; nh = nsh_hdr(skb); version = nsh_get_ver(nh); length = nsh_hdr_len(nh); if (version != 0) return -EINVAL; err = check_header(skb, nh_ofs + length); if (unlikely(err)) return err; nh = nsh_hdr(skb); key->nsh.base.flags = nsh_get_flags(nh); key->nsh.base.ttl = nsh_get_ttl(nh); key->nsh.base.mdtype = nh->mdtype; key->nsh.base.np = nh->np; key->nsh.base.path_hdr = nh->path_hdr; switch (key->nsh.base.mdtype) { case NSH_M_TYPE1: if (length != NSH_M_TYPE1_LEN) return -EINVAL; memcpy(key->nsh.context, nh->md1.context, sizeof(nh->md1)); break; case NSH_M_TYPE2: memset(key->nsh.context, 0, sizeof(nh->md1)); break; default: return -EINVAL; } return 0; } /** * key_extract_l3l4 - extracts L3/L4 header information. * @skb: sk_buff that contains the frame, with skb->data pointing to the * L3 header * @key: output flow key * * Return: %0 if successful, otherwise a negative errno value. */ static int key_extract_l3l4(struct sk_buff *skb, struct sw_flow_key *key) { int error; /* Network layer. */ if (key->eth.type == htons(ETH_P_IP)) { struct iphdr *nh; __be16 offset; error = check_iphdr(skb); if (unlikely(error)) { memset(&key->ip, 0, sizeof(key->ip)); memset(&key->ipv4, 0, sizeof(key->ipv4)); if (error == -EINVAL) { skb->transport_header = skb->network_header; error = 0; } return error; } nh = ip_hdr(skb); key->ipv4.addr.src = nh->saddr; key->ipv4.addr.dst = nh->daddr; key->ip.proto = nh->protocol; key->ip.tos = nh->tos; key->ip.ttl = nh->ttl; offset = nh->frag_off & htons(IP_OFFSET); if (offset) { key->ip.frag = OVS_FRAG_TYPE_LATER; memset(&key->tp, 0, sizeof(key->tp)); return 0; } if (nh->frag_off & htons(IP_MF) || skb_shinfo(skb)->gso_type & SKB_GSO_UDP) key->ip.frag = OVS_FRAG_TYPE_FIRST; else key->ip.frag = OVS_FRAG_TYPE_NONE; /* Transport layer. */ if (key->ip.proto == IPPROTO_TCP) { if (tcphdr_ok(skb)) { struct tcphdr *tcp = tcp_hdr(skb); key->tp.src = tcp->source; key->tp.dst = tcp->dest; key->tp.flags = TCP_FLAGS_BE16(tcp); } else { memset(&key->tp, 0, sizeof(key->tp)); } } else if (key->ip.proto == IPPROTO_UDP) { if (udphdr_ok(skb)) { struct udphdr *udp = udp_hdr(skb); key->tp.src = udp->source; key->tp.dst = udp->dest; } else { memset(&key->tp, 0, sizeof(key->tp)); } } else if (key->ip.proto == IPPROTO_SCTP) { if (sctphdr_ok(skb)) { struct sctphdr *sctp = sctp_hdr(skb); key->tp.src = sctp->source; key->tp.dst = sctp->dest; } else { memset(&key->tp, 0, sizeof(key->tp)); } } else if (key->ip.proto == IPPROTO_ICMP) { if (icmphdr_ok(skb)) { struct icmphdr *icmp = icmp_hdr(skb); /* The ICMP type and code fields use the 16-bit * transport port fields, so we need to store * them in 16-bit network byte order. */ key->tp.src = htons(icmp->type); key->tp.dst = htons(icmp->code); } else { memset(&key->tp, 0, sizeof(key->tp)); } } } else if (key->eth.type == htons(ETH_P_ARP) || key->eth.type == htons(ETH_P_RARP)) { struct arp_eth_header *arp; bool arp_available = arphdr_ok(skb); arp = (struct arp_eth_header *)skb_network_header(skb); if (arp_available && arp->ar_hrd == htons(ARPHRD_ETHER) && arp->ar_pro == htons(ETH_P_IP) && arp->ar_hln == ETH_ALEN && arp->ar_pln == 4) { /* We only match on the lower 8 bits of the opcode. */ if (ntohs(arp->ar_op) <= 0xff) key->ip.proto = ntohs(arp->ar_op); else key->ip.proto = 0; memcpy(&key->ipv4.addr.src, arp->ar_sip, sizeof(key->ipv4.addr.src)); memcpy(&key->ipv4.addr.dst, arp->ar_tip, sizeof(key->ipv4.addr.dst)); ether_addr_copy(key->ipv4.arp.sha, arp->ar_sha); ether_addr_copy(key->ipv4.arp.tha, arp->ar_tha); } else { memset(&key->ip, 0, sizeof(key->ip)); memset(&key->ipv4, 0, sizeof(key->ipv4)); } } else if (eth_p_mpls(key->eth.type)) { size_t label_count = 1; memset(&key->mpls, 0, sizeof(key->mpls)); skb_set_inner_network_header(skb, skb->mac_len); while (1) { __be32 lse; error = check_header(skb, skb->mac_len + label_count * MPLS_HLEN); if (unlikely(error)) return 0; memcpy(&lse, skb_inner_network_header(skb), MPLS_HLEN); if (label_count <= MPLS_LABEL_DEPTH) memcpy(&key->mpls.lse[label_count - 1], &lse, MPLS_HLEN); skb_set_inner_network_header(skb, skb->mac_len + label_count * MPLS_HLEN); if (lse & htonl(MPLS_LS_S_MASK)) break; label_count++; } if (label_count > MPLS_LABEL_DEPTH) label_count = MPLS_LABEL_DEPTH; key->mpls.num_labels_mask = GENMASK(label_count - 1, 0); } else if (key->eth.type == htons(ETH_P_IPV6)) { int nh_len; /* IPv6 Header + Extensions */ nh_len = parse_ipv6hdr(skb, key); if (unlikely(nh_len < 0)) { switch (nh_len) { case -EINVAL: memset(&key->ip, 0, sizeof(key->ip)); memset(&key->ipv6.addr, 0, sizeof(key->ipv6.addr)); fallthrough; case -EPROTO: skb->transport_header = skb->network_header; error = 0; break; default: error = nh_len; } return error; } if (key->ip.frag == OVS_FRAG_TYPE_LATER) { memset(&key->tp, 0, sizeof(key->tp)); return 0; } if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP) key->ip.frag = OVS_FRAG_TYPE_FIRST; /* Transport layer. */ if (key->ip.proto == NEXTHDR_TCP) { if (tcphdr_ok(skb)) { struct tcphdr *tcp = tcp_hdr(skb); key->tp.src = tcp->source; key->tp.dst = tcp->dest; key->tp.flags = TCP_FLAGS_BE16(tcp); } else { memset(&key->tp, 0, sizeof(key->tp)); } } else if (key->ip.proto == NEXTHDR_UDP) { if (udphdr_ok(skb)) { struct udphdr *udp = udp_hdr(skb); key->tp.src = udp->source; key->tp.dst = udp->dest; } else { memset(&key->tp, 0, sizeof(key->tp)); } } else if (key->ip.proto == NEXTHDR_SCTP) { if (sctphdr_ok(skb)) { struct sctphdr *sctp = sctp_hdr(skb); key->tp.src = sctp->source; key->tp.dst = sctp->dest; } else { memset(&key->tp, 0, sizeof(key->tp)); } } else if (key->ip.proto == NEXTHDR_ICMP) { if (icmp6hdr_ok(skb)) { error = parse_icmpv6(skb, key, nh_len); if (error) return error; } else { memset(&key->tp, 0, sizeof(key->tp)); } } } else if (key->eth.type == htons(ETH_P_NSH)) { error = parse_nsh(skb, key); if (error) return error; } return 0; } /** * key_extract - extracts a flow key from an Ethernet frame. * @skb: sk_buff that contains the frame, with skb->data pointing to the * Ethernet header * @key: output flow key * * The caller must ensure that skb->len >= ETH_HLEN. * * Initializes @skb header fields as follows: * * - skb->mac_header: the L2 header. * * - skb->network_header: just past the L2 header, or just past the * VLAN header, to the first byte of the L2 payload. * * - skb->transport_header: If key->eth.type is ETH_P_IP or ETH_P_IPV6 * on output, then just past the IP header, if one is present and * of a correct length, otherwise the same as skb->network_header. * For other key->eth.type values it is left untouched. * * - skb->protocol: the type of the data starting at skb->network_header. * Equals to key->eth.type. * * Return: %0 if successful, otherwise a negative errno value. */ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key) { struct ethhdr *eth; /* Flags are always used as part of stats */ key->tp.flags = 0; skb_reset_mac_header(skb); /* Link layer. */ clear_vlan(key); if (ovs_key_mac_proto(key) == MAC_PROTO_NONE) { if (unlikely(eth_type_vlan(skb->protocol))) return -EINVAL; skb_reset_network_header(skb); key->eth.type = skb->protocol; } else { eth = eth_hdr(skb); ether_addr_copy(key->eth.src, eth->h_source); ether_addr_copy(key->eth.dst, eth->h_dest); __skb_pull(skb, 2 * ETH_ALEN); /* We are going to push all headers that we pull, so no need to * update skb->csum here. */ if (unlikely(parse_vlan(skb, key))) return -ENOMEM; key->eth.type = parse_ethertype(skb); if (unlikely(key->eth.type == htons(0))) return -ENOMEM; /* Multiple tagged packets need to retain TPID to satisfy * skb_vlan_pop(), which will later shift the ethertype into * skb->protocol. */ if (key->eth.cvlan.tci & htons(VLAN_CFI_MASK)) skb->protocol = key->eth.cvlan.tpid; else skb->protocol = key->eth.type; skb_reset_network_header(skb); __skb_push(skb, skb->data - skb_mac_header(skb)); } skb_reset_mac_len(skb); /* Fill out L3/L4 key info, if any */ return key_extract_l3l4(skb, key); } /* In the case of conntrack fragment handling it expects L3 headers, * add a helper. */ int ovs_flow_key_update_l3l4(struct sk_buff *skb, struct sw_flow_key *key) { return key_extract_l3l4(skb, key); } int ovs_flow_key_update(struct sk_buff *skb, struct sw_flow_key *key) { int res; res = key_extract(skb, key); if (!res) key->mac_proto &= ~SW_FLOW_KEY_INVALID; return res; } static int key_extract_mac_proto(struct sk_buff *skb) { switch (skb->dev->type) { case ARPHRD_ETHER: return MAC_PROTO_ETHERNET; case ARPHRD_NONE: if (skb->protocol == htons(ETH_P_TEB)) return MAC_PROTO_ETHERNET; return MAC_PROTO_NONE; } WARN_ON_ONCE(1); return -EINVAL; } int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info, struct sk_buff *skb, struct sw_flow_key *key) { #if IS_ENABLED(CONFIG_NET_TC_SKB_EXT) struct tc_skb_ext *tc_ext; #endif bool post_ct = false, post_ct_snat = false, post_ct_dnat = false; int res, err; u16 zone = 0; /* Extract metadata from packet. */ if (tun_info) { key->tun_proto = ip_tunnel_info_af(tun_info); memcpy(&key->tun_key, &tun_info->key, sizeof(key->tun_key)); if (tun_info->options_len) { BUILD_BUG_ON((1 << (sizeof(tun_info->options_len) * 8)) - 1 > sizeof(key->tun_opts)); ip_tunnel_info_opts_get(TUN_METADATA_OPTS(key, tun_info->options_len), tun_info); key->tun_opts_len = tun_info->options_len; } else { key->tun_opts_len = 0; } } else { key->tun_proto = 0; key->tun_opts_len = 0; memset(&key->tun_key, 0, sizeof(key->tun_key)); } key->phy.priority = skb->priority; key->phy.in_port = OVS_CB(skb)->input_vport->port_no; key->phy.skb_mark = skb->mark; key->ovs_flow_hash = 0; res = key_extract_mac_proto(skb); if (res < 0) return res; key->mac_proto = res; #if IS_ENABLED(CONFIG_NET_TC_SKB_EXT) if (tc_skb_ext_tc_enabled()) { tc_ext = skb_ext_find(skb, TC_SKB_EXT); key->recirc_id = tc_ext && !tc_ext->act_miss ? tc_ext->chain : 0; OVS_CB(skb)->mru = tc_ext ? tc_ext->mru : 0; post_ct = tc_ext ? tc_ext->post_ct : false; post_ct_snat = post_ct ? tc_ext->post_ct_snat : false; post_ct_dnat = post_ct ? tc_ext->post_ct_dnat : false; zone = post_ct ? tc_ext->zone : 0; } else { key->recirc_id = 0; } #else key->recirc_id = 0; #endif err = key_extract(skb, key); if (!err) { ovs_ct_fill_key(skb, key, post_ct); /* Must be after key_extract(). */ if (post_ct) { if (!skb_get_nfct(skb)) { key->ct_zone = zone; } else { if (!post_ct_dnat) key->ct_state &= ~OVS_CS_F_DST_NAT; if (!post_ct_snat) key->ct_state &= ~OVS_CS_F_SRC_NAT; } } } return err; } int ovs_flow_key_extract_userspace(struct net *net, const struct nlattr *attr, struct sk_buff *skb, struct sw_flow_key *key, bool log) { const struct nlattr *a[OVS_KEY_ATTR_MAX + 1]; u64 attrs = 0; int err; err = parse_flow_nlattrs(attr, a, &attrs, log); if (err) return -EINVAL; /* Extract metadata from netlink attributes. */ err = ovs_nla_get_flow_metadata(net, a, attrs, key, log); if (err) return err; /* key_extract assumes that skb->protocol is set-up for * layer 3 packets which is the case for other callers, * in particular packets received from the network stack. * Here the correct value can be set from the metadata * extracted above. * For L2 packet key eth type would be zero. skb protocol * would be set to correct value later during key-extact. */ skb->protocol = key->eth.type; err = key_extract(skb, key); if (err) return err; /* Check that we have conntrack original direction tuple metadata only * for packets for which it makes sense. Otherwise the key may be * corrupted due to overlapping key fields. */ if (attrs & (1 << OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4) && key->eth.type != htons(ETH_P_IP)) return -EINVAL; if (attrs & (1 << OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6) && (key->eth.type != htons(ETH_P_IPV6) || sw_flow_key_is_nd(key))) return -EINVAL; return 0; } |
| 8300 45 8300 45 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 | /* SPDX-License-Identifier: GPL-2.0-only */ #ifndef LLIST_H #define LLIST_H /* * Lock-less NULL terminated single linked list * * Cases where locking is not needed: * If there are multiple producers and multiple consumers, llist_add can be * used in producers and llist_del_all can be used in consumers simultaneously * without locking. Also a single consumer can use llist_del_first while * multiple producers simultaneously use llist_add, without any locking. * * Cases where locking is needed: * If we have multiple consumers with llist_del_first used in one consumer, and * llist_del_first or llist_del_all used in other consumers, then a lock is * needed. This is because llist_del_first depends on list->first->next not * changing, but without lock protection, there's no way to be sure about that * if a preemption happens in the middle of the delete operation and on being * preempted back, the list->first is the same as before causing the cmpxchg in * llist_del_first to succeed. For example, while a llist_del_first operation * is in progress in one consumer, then a llist_del_first, llist_add, * llist_add (or llist_del_all, llist_add, llist_add) sequence in another * consumer may cause violations. * * This can be summarized as follows: * * | add | del_first | del_all * add | - | - | - * del_first | | L | L * del_all | | | - * * Where, a particular row's operation can happen concurrently with a column's * operation, with "-" being no lock needed, while "L" being lock is needed. * * The list entries deleted via llist_del_all can be traversed with * traversing function such as llist_for_each etc. But the list * entries can not be traversed safely before deleted from the list. * The order of deleted entries is from the newest to the oldest added * one. If you want to traverse from the oldest to the newest, you * must reverse the order by yourself before traversing. * * The basic atomic operation of this list is cmpxchg on long. On * architectures that don't have NMI-safe cmpxchg implementation, the * list can NOT be used in NMI handlers. So code that uses the list in * an NMI handler should depend on CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG. * * Copyright 2010,2011 Intel Corp. * Author: Huang Ying <ying.huang@intel.com> */ #include <linux/atomic.h> #include <linux/container_of.h> #include <linux/stddef.h> #include <linux/types.h> struct llist_head { struct llist_node *first; }; struct llist_node { struct llist_node *next; }; #define LLIST_HEAD_INIT(name) { NULL } #define LLIST_HEAD(name) struct llist_head name = LLIST_HEAD_INIT(name) /** * init_llist_head - initialize lock-less list head * @head: the head for your lock-less list */ static inline void init_llist_head(struct llist_head *list) { list->first = NULL; } /** * init_llist_node - initialize lock-less list node * @node: the node to be initialised * * In cases where there is a need to test if a node is on * a list or not, this initialises the node to clearly * not be on any list. */ static inline void init_llist_node(struct llist_node *node) { WRITE_ONCE(node->next, node); } /** * llist_on_list - test if a lock-list list node is on a list * @node: the node to test * * When a node is on a list the ->next pointer will be NULL or * some other node. It can never point to itself. We use that * in init_llist_node() to record that a node is not on any list, * and here to test whether it is on any list. */ static inline bool llist_on_list(const struct llist_node *node) { return READ_ONCE(node->next) != node; } /** * llist_entry - get the struct of this entry * @ptr: the &struct llist_node pointer. * @type: the type of the struct this is embedded in. * @member: the name of the llist_node within the struct. */ #define llist_entry(ptr, type, member) \ container_of(ptr, type, member) /** * member_address_is_nonnull - check whether the member address is not NULL * @ptr: the object pointer (struct type * that contains the llist_node) * @member: the name of the llist_node within the struct. * * This macro is conceptually the same as * &ptr->member != NULL * but it works around the fact that compilers can decide that taking a member * address is never a NULL pointer. * * Real objects that start at a high address and have a member at NULL are * unlikely to exist, but such pointers may be returned e.g. by the * container_of() macro. */ #define member_address_is_nonnull(ptr, member) \ ((uintptr_t)(ptr) + offsetof(typeof(*(ptr)), member) != 0) /** * llist_for_each - iterate over some deleted entries of a lock-less list * @pos: the &struct llist_node to use as a loop cursor * @node: the first entry of deleted list entries * * In general, some entries of the lock-less list can be traversed * safely only after being deleted from list, so start with an entry * instead of list head. * * If being used on entries deleted from lock-less list directly, the * traverse order is from the newest to the oldest added entry. If * you want to traverse from the oldest to the newest, you must * reverse the order by yourself before traversing. */ #define llist_for_each(pos, node) \ for ((pos) = (node); pos; (pos) = (pos)->next) /** * llist_for_each_safe - iterate over some deleted entries of a lock-less list * safe against removal of list entry * @pos: the &struct llist_node to use as a loop cursor * @n: another &struct llist_node to use as temporary storage * @node: the first entry of deleted list entries * * In general, some entries of the lock-less list can be traversed * safely only after being deleted from list, so start with an entry * instead of list head. * * If being used on entries deleted from lock-less list directly, the * traverse order is from the newest to the oldest added entry. If * you want to traverse from the oldest to the newest, you must * reverse the order by yourself before traversing. */ #define llist_for_each_safe(pos, n, node) \ for ((pos) = (node); (pos) && ((n) = (pos)->next, true); (pos) = (n)) /** * llist_for_each_entry - iterate over some deleted entries of lock-less list of given type * @pos: the type * to use as a loop cursor. * @node: the fist entry of deleted list entries. * @member: the name of the llist_node with the struct. * * In general, some entries of the lock-less list can be traversed * safely only after being removed from list, so start with an entry * instead of list head. * * If being used on entries deleted from lock-less list directly, the * traverse order is from the newest to the oldest added entry. If * you want to traverse from the oldest to the newest, you must * reverse the order by yourself before traversing. */ #define llist_for_each_entry(pos, node, member) \ for ((pos) = llist_entry((node), typeof(*(pos)), member); \ member_address_is_nonnull(pos, member); \ (pos) = llist_entry((pos)->member.next, typeof(*(pos)), member)) /** * llist_for_each_entry_safe - iterate over some deleted entries of lock-less list of given type * safe against removal of list entry * @pos: the type * to use as a loop cursor. * @n: another type * to use as temporary storage * @node: the first entry of deleted list entries. * @member: the name of the llist_node with the struct. * * In general, some entries of the lock-less list can be traversed * safely only after being removed from list, so start with an entry * instead of list head. * * If being used on entries deleted from lock-less list directly, the * traverse order is from the newest to the oldest added entry. If * you want to traverse from the oldest to the newest, you must * reverse the order by yourself before traversing. */ #define llist_for_each_entry_safe(pos, n, node, member) \ for (pos = llist_entry((node), typeof(*pos), member); \ member_address_is_nonnull(pos, member) && \ (n = llist_entry(pos->member.next, typeof(*n), member), true); \ pos = n) /** * llist_empty - tests whether a lock-less list is empty * @head: the list to test * * Not guaranteed to be accurate or up to date. Just a quick way to * test whether the list is empty without deleting something from the * list. */ static inline bool llist_empty(const struct llist_head *head) { return READ_ONCE(head->first) == NULL; } static inline struct llist_node *llist_next(struct llist_node *node) { return READ_ONCE(node->next); } /** * llist_add_batch - add several linked entries in batch * @new_first: first entry in batch to be added * @new_last: last entry in batch to be added * @head: the head for your lock-less list * * Return whether list is empty before adding. */ static inline bool llist_add_batch(struct llist_node *new_first, struct llist_node *new_last, struct llist_head *head) { struct llist_node *first = READ_ONCE(head->first); do { new_last->next = first; } while (!try_cmpxchg(&head->first, &first, new_first)); return !first; } static inline bool __llist_add_batch(struct llist_node *new_first, struct llist_node *new_last, struct llist_head *head) { new_last->next = head->first; head->first = new_first; return new_last->next == NULL; } /** * llist_add - add a new entry * @new: new entry to be added * @head: the head for your lock-less list * * Returns true if the list was empty prior to adding this entry. */ static inline bool llist_add(struct llist_node *new, struct llist_head *head) { return llist_add_batch(new, new, head); } static inline bool __llist_add(struct llist_node *new, struct llist_head *head) { return __llist_add_batch(new, new, head); } /** * llist_del_all - delete all entries from lock-less list * @head: the head of lock-less list to delete all entries * * If list is empty, return NULL, otherwise, delete all entries and * return the pointer to the first entry. The order of entries * deleted is from the newest to the oldest added one. */ static inline struct llist_node *llist_del_all(struct llist_head *head) { return xchg(&head->first, NULL); } static inline struct llist_node *__llist_del_all(struct llist_head *head) { struct llist_node *first = head->first; head->first = NULL; return first; } extern struct llist_node *llist_del_first(struct llist_head *head); /** * llist_del_first_init - delete first entry from lock-list and mark is as being off-list * @head: the head of lock-less list to delete from. * * This behave the same as llist_del_first() except that llist_init_node() is called * on the returned node so that llist_on_list() will report false for the node. */ static inline struct llist_node *llist_del_first_init(struct llist_head *head) { struct llist_node *n = llist_del_first(head); if (n) init_llist_node(n); return n; } extern bool llist_del_first_this(struct llist_head *head, struct llist_node *this); struct llist_node *llist_reverse_order(struct llist_node *head); #endif /* LLIST_H */ |
| 1093 134 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 | /* SPDX-License-Identifier: GPL-2.0 */ /* * This is <linux/capability.h> * * Andrew G. Morgan <morgan@kernel.org> * Alexander Kjeldaas <astor@guardian.no> * with help from Aleph1, Roland Buresund and Andrew Main. * * See here for the libcap library ("POSIX draft" compliance): * * ftp://www.kernel.org/pub/linux/libs/security/linux-privs/kernel-2.6/ */ #ifndef _LINUX_CAPABILITY_H #define _LINUX_CAPABILITY_H #include <uapi/linux/capability.h> #include <linux/uidgid.h> #include <linux/bits.h> #define _KERNEL_CAPABILITY_VERSION _LINUX_CAPABILITY_VERSION_3 extern int file_caps_enabled; typedef struct { u64 val; } kernel_cap_t; /* same as vfs_ns_cap_data but in cpu endian and always filled completely */ struct cpu_vfs_cap_data { __u32 magic_etc; kuid_t rootid; kernel_cap_t permitted; kernel_cap_t inheritable; }; #define _USER_CAP_HEADER_SIZE (sizeof(struct __user_cap_header_struct)) #define _KERNEL_CAP_T_SIZE (sizeof(kernel_cap_t)) struct file; struct inode; struct dentry; struct task_struct; struct user_namespace; struct mnt_idmap; /* * CAP_FS_MASK and CAP_NFSD_MASKS: * * The fs mask is all the privileges that fsuid==0 historically meant. * At one time in the past, that included CAP_MKNOD and CAP_LINUX_IMMUTABLE. * * It has never meant setting security.* and trusted.* xattrs. * * We could also define fsmask as follows: * 1. CAP_FS_MASK is the privilege to bypass all fs-related DAC permissions * 2. The security.* and trusted.* xattrs are fs-related MAC permissions */ # define CAP_FS_MASK (BIT_ULL(CAP_CHOWN) \ | BIT_ULL(CAP_MKNOD) \ | BIT_ULL(CAP_DAC_OVERRIDE) \ | BIT_ULL(CAP_DAC_READ_SEARCH) \ | BIT_ULL(CAP_FOWNER) \ | BIT_ULL(CAP_FSETID) \ | BIT_ULL(CAP_MAC_OVERRIDE)) #define CAP_VALID_MASK (BIT_ULL(CAP_LAST_CAP+1)-1) # define CAP_EMPTY_SET ((kernel_cap_t) { 0 }) # define CAP_FULL_SET ((kernel_cap_t) { CAP_VALID_MASK }) # define CAP_FS_SET ((kernel_cap_t) { CAP_FS_MASK | BIT_ULL(CAP_LINUX_IMMUTABLE) }) # define CAP_NFSD_SET ((kernel_cap_t) { CAP_FS_MASK | BIT_ULL(CAP_SYS_RESOURCE) }) # define cap_clear(c) do { (c).val = 0; } while (0) #define cap_raise(c, flag) ((c).val |= BIT_ULL(flag)) #define cap_lower(c, flag) ((c).val &= ~BIT_ULL(flag)) #define cap_raised(c, flag) (((c).val & BIT_ULL(flag)) != 0) static inline kernel_cap_t cap_combine(const kernel_cap_t a, const kernel_cap_t b) { return (kernel_cap_t) { a.val | b.val }; } static inline kernel_cap_t cap_intersect(const kernel_cap_t a, const kernel_cap_t b) { return (kernel_cap_t) { a.val & b.val }; } static inline kernel_cap_t cap_drop(const kernel_cap_t a, const kernel_cap_t drop) { return (kernel_cap_t) { a.val &~ drop.val }; } static inline bool cap_isclear(const kernel_cap_t a) { return !a.val; } static inline bool cap_isidentical(const kernel_cap_t a, const kernel_cap_t b) { return a.val == b.val; } /* * Check if "a" is a subset of "set". * return true if ALL of the capabilities in "a" are also in "set" * cap_issubset(0101, 1111) will return true * return false if ANY of the capabilities in "a" are not in "set" * cap_issubset(1111, 0101) will return false */ static inline bool cap_issubset(const kernel_cap_t a, const kernel_cap_t set) { return !(a.val & ~set.val); } /* Used to decide between falling back on the old suser() or fsuser(). */ static inline kernel_cap_t cap_drop_fs_set(const kernel_cap_t a) { return cap_drop(a, CAP_FS_SET); } static inline kernel_cap_t cap_raise_fs_set(const kernel_cap_t a, const kernel_cap_t permitted) { return cap_combine(a, cap_intersect(permitted, CAP_FS_SET)); } static inline kernel_cap_t cap_drop_nfsd_set(const kernel_cap_t a) { return cap_drop(a, CAP_NFSD_SET); } static inline kernel_cap_t cap_raise_nfsd_set(const kernel_cap_t a, const kernel_cap_t permitted) { return cap_combine(a, cap_intersect(permitted, CAP_NFSD_SET)); } #ifdef CONFIG_MULTIUSER extern bool has_ns_capability(struct task_struct *t, struct user_namespace *ns, int cap); extern bool has_capability_noaudit(struct task_struct *t, int cap); extern bool has_ns_capability_noaudit(struct task_struct *t, struct user_namespace *ns, int cap); extern bool capable(int cap); extern bool ns_capable(struct user_namespace *ns, int cap); extern bool ns_capable_noaudit(struct user_namespace *ns, int cap); extern bool ns_capable_setid(struct user_namespace *ns, int cap); #else static inline bool has_ns_capability(struct task_struct *t, struct user_namespace *ns, int cap) { return true; } static inline bool has_capability_noaudit(struct task_struct *t, int cap) { return true; } static inline bool has_ns_capability_noaudit(struct task_struct *t, struct user_namespace *ns, int cap) { return true; } static inline bool capable(int cap) { return true; } static inline bool ns_capable(struct user_namespace *ns, int cap) { return true; } static inline bool ns_capable_noaudit(struct user_namespace *ns, int cap) { return true; } static inline bool ns_capable_setid(struct user_namespace *ns, int cap) { return true; } #endif /* CONFIG_MULTIUSER */ bool privileged_wrt_inode_uidgid(struct user_namespace *ns, struct mnt_idmap *idmap, const struct inode *inode); bool capable_wrt_inode_uidgid(struct mnt_idmap *idmap, const struct inode *inode, int cap); extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap); extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns); static inline bool perfmon_capable(void) { return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN); } static inline bool bpf_capable(void) { return capable(CAP_BPF) || capable(CAP_SYS_ADMIN); } static inline bool checkpoint_restore_ns_capable(struct user_namespace *ns) { return ns_capable(ns, CAP_CHECKPOINT_RESTORE) || ns_capable(ns, CAP_SYS_ADMIN); } /* audit system wants to get cap info from files as well */ int get_vfs_caps_from_disk(struct mnt_idmap *idmap, const struct dentry *dentry, struct cpu_vfs_cap_data *cpu_caps); int cap_convert_nscap(struct mnt_idmap *idmap, struct dentry *dentry, const void **ivalue, size_t size); #endif /* !_LINUX_CAPABILITY_H */ |
| 5 4 7 7 33 33 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 | // SPDX-License-Identifier: GPL-2.0 #include <linux/proc_fs.h> #include <linux/ethtool.h> #include <linux/export.h> #include <net/net_namespace.h> #include <net/netns/generic.h> #include <net/bonding.h> #include "bonding_priv.h" static void *bond_info_seq_start(struct seq_file *seq, loff_t *pos) __acquires(RCU) { struct bonding *bond = pde_data(file_inode(seq->file)); struct list_head *iter; struct slave *slave; loff_t off = 0; rcu_read_lock(); if (*pos == 0) return SEQ_START_TOKEN; bond_for_each_slave_rcu(bond, slave, iter) if (++off == *pos) return slave; return NULL; } static void *bond_info_seq_next(struct seq_file *seq, void *v, loff_t *pos) { struct bonding *bond = pde_data(file_inode(seq->file)); struct list_head *iter; struct slave *slave; bool found = false; ++*pos; if (v == SEQ_START_TOKEN) return bond_first_slave_rcu(bond); bond_for_each_slave_rcu(bond, slave, iter) { if (found) return slave; if (slave == v) found = true; } return NULL; } static void bond_info_seq_stop(struct seq_file *seq, void *v) __releases(RCU) { rcu_read_unlock(); } static void bond_info_show_master(struct seq_file *seq) { struct bonding *bond = pde_data(file_inode(seq->file)); const struct bond_opt_value *optval; struct slave *curr, *primary; int i; curr = rcu_dereference(bond->curr_active_slave); seq_printf(seq, "Bonding Mode: %s", bond_mode_name(BOND_MODE(bond))); if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP && bond->params.fail_over_mac) { optval = bond_opt_get_val(BOND_OPT_FAIL_OVER_MAC, bond->params.fail_over_mac); seq_printf(seq, " (fail_over_mac %s)", optval->string); } seq_printf(seq, "\n"); if (bond_mode_uses_xmit_hash(bond)) { optval = bond_opt_get_val(BOND_OPT_XMIT_HASH, bond->params.xmit_policy); seq_printf(seq, "Transmit Hash Policy: %s (%d)\n", optval->string, bond->params.xmit_policy); } if (bond_uses_primary(bond)) { primary = rcu_dereference(bond->primary_slave); seq_printf(seq, "Primary Slave: %s", primary ? primary->dev->name : "None"); if (primary) { optval = bond_opt_get_val(BOND_OPT_PRIMARY_RESELECT, bond->params.primary_reselect); seq_printf(seq, " (primary_reselect %s)", optval->string); } seq_printf(seq, "\nCurrently Active Slave: %s\n", (curr) ? curr->dev->name : "None"); } seq_printf(seq, "MII Status: %s\n", netif_carrier_ok(bond->dev) ? "up" : "down"); seq_printf(seq, "MII Polling Interval (ms): %d\n", bond->params.miimon); seq_printf(seq, "Up Delay (ms): %d\n", bond->params.updelay * bond->params.miimon); seq_printf(seq, "Down Delay (ms): %d\n", bond->params.downdelay * bond->params.miimon); seq_printf(seq, "Peer Notification Delay (ms): %d\n", bond->params.peer_notif_delay * bond->params.miimon); /* ARP information */ if (bond->params.arp_interval > 0) { int printed = 0; seq_printf(seq, "ARP Polling Interval (ms): %d\n", bond->params.arp_interval); seq_printf(seq, "ARP Missed Max: %u\n", bond->params.missed_max); seq_printf(seq, "ARP IP target/s (n.n.n.n form):"); for (i = 0; (i < BOND_MAX_ARP_TARGETS); i++) { if (!bond->params.arp_targets[i]) break; if (printed) seq_printf(seq, ","); seq_printf(seq, " %pI4", &bond->params.arp_targets[i]); printed = 1; } seq_printf(seq, "\n"); #if IS_ENABLED(CONFIG_IPV6) printed = 0; seq_printf(seq, "NS IPv6 target/s (xx::xx form):"); for (i = 0; (i < BOND_MAX_NS_TARGETS); i++) { if (ipv6_addr_any(&bond->params.ns_targets[i])) break; if (printed) seq_printf(seq, ","); seq_printf(seq, " %pI6c", &bond->params.ns_targets[i]); printed = 1; } seq_printf(seq, "\n"); #endif } if (BOND_MODE(bond) == BOND_MODE_8023AD) { struct ad_info ad_info; seq_puts(seq, "\n802.3ad info\n"); seq_printf(seq, "LACP active: %s\n", (bond->params.lacp_active) ? "on" : "off"); seq_printf(seq, "LACP rate: %s\n", (bond->params.lacp_fast) ? "fast" : "slow"); seq_printf(seq, "Min links: %d\n", bond->params.min_links); optval = bond_opt_get_val(BOND_OPT_AD_SELECT, bond->params.ad_select); seq_printf(seq, "Aggregator selection policy (ad_select): %s\n", optval->string); if (capable(CAP_NET_ADMIN)) { seq_printf(seq, "System priority: %d\n", BOND_AD_INFO(bond).system.sys_priority); seq_printf(seq, "System MAC address: %pM\n", &BOND_AD_INFO(bond).system.sys_mac_addr); if (__bond_3ad_get_active_agg_info(bond, &ad_info)) { seq_printf(seq, "bond %s has no active aggregator\n", bond->dev->name); } else { seq_printf(seq, "Active Aggregator Info:\n"); seq_printf(seq, "\tAggregator ID: %d\n", ad_info.aggregator_id); seq_printf(seq, "\tNumber of ports: %d\n", ad_info.ports); seq_printf(seq, "\tActor Key: %d\n", ad_info.actor_key); seq_printf(seq, "\tPartner Key: %d\n", ad_info.partner_key); seq_printf(seq, "\tPartner Mac Address: %pM\n", ad_info.partner_system); } } } } static void bond_info_show_slave(struct seq_file *seq, const struct slave *slave) { struct bonding *bond = pde_data(file_inode(seq->file)); seq_printf(seq, "\nSlave Interface: %s\n", slave->dev->name); seq_printf(seq, "MII Status: %s\n", bond_slave_link_status(slave->link)); if (slave->speed == SPEED_UNKNOWN) seq_printf(seq, "Speed: %s\n", "Unknown"); else seq_printf(seq, "Speed: %d Mbps\n", slave->speed); if (slave->duplex == DUPLEX_UNKNOWN) seq_printf(seq, "Duplex: %s\n", "Unknown"); else seq_printf(seq, "Duplex: %s\n", slave->duplex ? "full" : "half"); seq_printf(seq, "Link Failure Count: %u\n", slave->link_failure_count); seq_printf(seq, "Permanent HW addr: %*phC\n", slave->dev->addr_len, slave->perm_hwaddr); seq_printf(seq, "Slave queue ID: %d\n", READ_ONCE(slave->queue_id)); if (BOND_MODE(bond) == BOND_MODE_8023AD) { const struct port *port = &SLAVE_AD_INFO(slave)->port; const struct aggregator *agg = port->aggregator; if (agg) { seq_printf(seq, "Aggregator ID: %d\n", agg->aggregator_identifier); seq_printf(seq, "Actor Churn State: %s\n", bond_3ad_churn_desc(port->sm_churn_actor_state)); seq_printf(seq, "Partner Churn State: %s\n", bond_3ad_churn_desc(port->sm_churn_partner_state)); seq_printf(seq, "Actor Churned Count: %d\n", port->churn_actor_count); seq_printf(seq, "Partner Churned Count: %d\n", port->churn_partner_count); if (capable(CAP_NET_ADMIN)) { seq_puts(seq, "details actor lacp pdu:\n"); seq_printf(seq, " system priority: %d\n", port->actor_system_priority); seq_printf(seq, " system mac address: %pM\n", &port->actor_system); seq_printf(seq, " port key: %d\n", port->actor_oper_port_key); seq_printf(seq, " port priority: %d\n", port->actor_port_priority); seq_printf(seq, " port number: %d\n", port->actor_port_number); seq_printf(seq, " port state: %d\n", port->actor_oper_port_state); seq_puts(seq, "details partner lacp pdu:\n"); seq_printf(seq, " system priority: %d\n", port->partner_oper.system_priority); seq_printf(seq, " system mac address: %pM\n", &port->partner_oper.system); seq_printf(seq, " oper key: %d\n", port->partner_oper.key); seq_printf(seq, " port priority: %d\n", port->partner_oper.port_priority); seq_printf(seq, " port number: %d\n", port->partner_oper.port_number); seq_printf(seq, " port state: %d\n", port->partner_oper.port_state); } } else { seq_puts(seq, "Aggregator ID: N/A\n"); } } } static int bond_info_seq_show(struct seq_file *seq, void *v) { if (v == SEQ_START_TOKEN) { seq_printf(seq, "%s\n", bond_version); bond_info_show_master(seq); } else bond_info_show_slave(seq, v); return 0; } static const struct seq_operations bond_info_seq_ops = { .start = bond_info_seq_start, .next = bond_info_seq_next, .stop = bond_info_seq_stop, .show = bond_info_seq_show, }; void bond_create_proc_entry(struct bonding *bond) { struct net_device *bond_dev = bond->dev; struct bond_net *bn = net_generic(dev_net(bond_dev), bond_net_id); if (bn->proc_dir) { bond->proc_entry = proc_create_seq_data(bond_dev->name, 0444, bn->proc_dir, &bond_info_seq_ops, bond); if (bond->proc_entry == NULL) netdev_warn(bond_dev, "Cannot create /proc/net/%s/%s\n", DRV_NAME, bond_dev->name); else memcpy(bond->proc_file_name, bond_dev->name, IFNAMSIZ); } } void bond_remove_proc_entry(struct bonding *bond) { struct net_device *bond_dev = bond->dev; struct bond_net *bn = net_generic(dev_net(bond_dev), bond_net_id); if (bn->proc_dir && bond->proc_entry) { remove_proc_entry(bond->proc_file_name, bn->proc_dir); memset(bond->proc_file_name, 0, IFNAMSIZ); bond->proc_entry = NULL; } } /* Create the bonding directory under /proc/net, if doesn't exist yet. * Caller must hold rtnl_lock. */ void __net_init bond_create_proc_dir(struct bond_net *bn) { if (!bn->proc_dir) { bn->proc_dir = proc_mkdir(DRV_NAME, bn->net->proc_net); if (!bn->proc_dir) pr_warn("Warning: Cannot create /proc/net/%s\n", DRV_NAME); } } /* Destroy the bonding directory under /proc/net, if empty. */ void __net_exit bond_destroy_proc_dir(struct bond_net *bn) { if (bn->proc_dir) { remove_proc_entry(DRV_NAME, bn->net->proc_net); bn->proc_dir = NULL; } } |
| 19 19 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 | // SPDX-License-Identifier: GPL-2.0-or-later /* * Handle firewalling core * Linux ethernet bridge * * Authors: * Lennert Buytenhek <buytenh@gnu.org> * Bart De Schuymer <bdschuym@pandora.be> * * Lennert dedicates this file to Kerstin Wurdinger. */ #include <linux/module.h> #include <linux/kernel.h> #include <linux/in_route.h> #include <linux/inetdevice.h> #include <net/route.h> #include "br_private.h" #ifdef CONFIG_SYSCTL #include <linux/sysctl.h> #endif static void fake_update_pmtu(struct dst_entry *dst, struct sock *sk, struct sk_buff *skb, u32 mtu, bool confirm_neigh) { } static void fake_redirect(struct dst_entry *dst, struct sock *sk, struct sk_buff *skb) { } static u32 *fake_cow_metrics(struct dst_entry *dst, unsigned long old) { return NULL; } static struct neighbour *fake_neigh_lookup(const struct dst_entry *dst, struct sk_buff *skb, const void *daddr) { return NULL; } static unsigned int fake_mtu(const struct dst_entry *dst) { return dst->dev->mtu; } static struct dst_ops fake_dst_ops = { .family = AF_INET, .update_pmtu = fake_update_pmtu, .redirect = fake_redirect, .cow_metrics = fake_cow_metrics, .neigh_lookup = fake_neigh_lookup, .mtu = fake_mtu, }; /* * Initialize bogus route table used to keep netfilter happy. * Currently, we fill in the PMTU entry because netfilter * refragmentation needs it, and the rt_flags entry because * ipt_REJECT needs it. Future netfilter modules might * require us to fill additional fields. */ void br_netfilter_rtable_init(struct net_bridge *br) { struct rtable *rt = &br->fake_rtable; rcuref_init(&rt->dst.__rcuref, 1); rt->dst.dev = br->dev; dst_init_metrics(&rt->dst, br->metrics, false); dst_metric_set(&rt->dst, RTAX_MTU, br->dev->mtu); rt->dst.flags = DST_NOXFRM | DST_FAKE_RTABLE; rt->dst.ops = &fake_dst_ops; } int __init br_nf_core_init(void) { return dst_entries_init(&fake_dst_ops); } void br_nf_core_fini(void) { dst_entries_destroy(&fake_dst_ops); } |
| 4 2 2 13 11 2 13 13 13 13 15 15 15 15 15 1 37 13 24 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 | // SPDX-License-Identifier: GPL-2.0 /* * linux/fs/proc/root.c * * Copyright (C) 1991, 1992 Linus Torvalds * * proc root directory handling functions */ #include <linux/errno.h> #include <linux/time.h> #include <linux/proc_fs.h> #include <linux/stat.h> #include <linux/init.h> #include <linux/sched.h> #include <linux/sched/stat.h> #include <linux/module.h> #include <linux/bitops.h> #include <linux/user_namespace.h> #include <linux/fs_context.h> #include <linux/mount.h> #include <linux/pid_namespace.h> #include <linux/fs_parser.h> #include <linux/cred.h> #include <linux/magic.h> #include <linux/slab.h> #include "internal.h" struct proc_fs_context { struct pid_namespace *pid_ns; unsigned int mask; enum proc_hidepid hidepid; int gid; enum proc_pidonly pidonly; }; enum proc_param { Opt_gid, Opt_hidepid, Opt_subset, Opt_pidns, }; static const struct fs_parameter_spec proc_fs_parameters[] = { fsparam_u32("gid", Opt_gid), fsparam_string("hidepid", Opt_hidepid), fsparam_string("subset", Opt_subset), fsparam_file_or_string("pidns", Opt_pidns), {} }; static inline int valid_hidepid(unsigned int value) { return (value == HIDEPID_OFF || value == HIDEPID_NO_ACCESS || value == HIDEPID_INVISIBLE || value == HIDEPID_NOT_PTRACEABLE); } static int proc_parse_hidepid_param(struct fs_context *fc, struct fs_parameter *param) { struct proc_fs_context *ctx = fc->fs_private; struct fs_parameter_spec hidepid_u32_spec = fsparam_u32("hidepid", Opt_hidepid); struct fs_parse_result result; int base = (unsigned long)hidepid_u32_spec.data; if (param->type != fs_value_is_string) return invalf(fc, "proc: unexpected type of hidepid value\n"); if (!kstrtouint(param->string, base, &result.uint_32)) { if (!valid_hidepid(result.uint_32)) return invalf(fc, "proc: unknown value of hidepid - %s\n", param->string); ctx->hidepid = result.uint_32; return 0; } if (!strcmp(param->string, "off")) ctx->hidepid = HIDEPID_OFF; else if (!strcmp(param->string, "noaccess")) ctx->hidepid = HIDEPID_NO_ACCESS; else if (!strcmp(param->string, "invisible")) ctx->hidepid = HIDEPID_INVISIBLE; else if (!strcmp(param->string, "ptraceable")) ctx->hidepid = HIDEPID_NOT_PTRACEABLE; else return invalf(fc, "proc: unknown value of hidepid - %s\n", param->string); return 0; } static int proc_parse_subset_param(struct fs_context *fc, char *value) { struct proc_fs_context *ctx = fc->fs_private; while (value) { char *ptr = strchr(value, ','); if (ptr != NULL) *ptr++ = '\0'; if (*value != '\0') { if (!strcmp(value, "pid")) { ctx->pidonly = PROC_PIDONLY_ON; } else { return invalf(fc, "proc: unsupported subset option - %s\n", value); } } value = ptr; } return 0; } #ifdef CONFIG_PID_NS static int proc_parse_pidns_param(struct fs_context *fc, struct fs_parameter *param, struct fs_parse_result *result) { struct proc_fs_context *ctx = fc->fs_private; struct pid_namespace *target, *active = task_active_pid_ns(current); struct ns_common *ns; struct file *ns_filp __free(fput) = NULL; switch (param->type) { case fs_value_is_file: /* came through fsconfig, steal the file reference */ ns_filp = no_free_ptr(param->file); break; case fs_value_is_string: ns_filp = filp_open(param->string, O_RDONLY, 0); break; default: WARN_ON_ONCE(true); break; } if (!ns_filp) ns_filp = ERR_PTR(-EBADF); if (IS_ERR(ns_filp)) { errorfc(fc, "could not get file from pidns argument"); return PTR_ERR(ns_filp); } if (!proc_ns_file(ns_filp)) return invalfc(fc, "pidns argument is not an nsfs file"); ns = get_proc_ns(file_inode(ns_filp)); if (ns->ns_type != CLONE_NEWPID) return invalfc(fc, "pidns argument is not a pidns file"); target = container_of(ns, struct pid_namespace, ns); /* * pidns= is shorthand for joining the pidns to get a fsopen fd, so the * permission model should be the same as pidns_install(). */ if (!ns_capable(target->user_ns, CAP_SYS_ADMIN)) { errorfc(fc, "insufficient permissions to set pidns"); return -EPERM; } if (!pidns_is_ancestor(target, active)) return invalfc(fc, "cannot set pidns to non-descendant pidns"); put_pid_ns(ctx->pid_ns); ctx->pid_ns = get_pid_ns(target); put_user_ns(fc->user_ns); fc->user_ns = get_user_ns(ctx->pid_ns->user_ns); return 0; } #endif /* CONFIG_PID_NS */ static int proc_parse_param(struct fs_context *fc, struct fs_parameter *param) { struct proc_fs_context *ctx = fc->fs_private; struct fs_parse_result result; int opt, err; opt = fs_parse(fc, proc_fs_parameters, param, &result); if (opt < 0) return opt; switch (opt) { case Opt_gid: ctx->gid = result.uint_32; break; case Opt_hidepid: err = proc_parse_hidepid_param(fc, param); if (err) return err; break; case Opt_subset: err = proc_parse_subset_param(fc, param->string); if (err) return err; break; case Opt_pidns: #ifdef CONFIG_PID_NS /* * We would have to RCU-protect every proc_pid_ns() or * proc_sb_info() access if we allowed this to be reconfigured * for an existing procfs instance. Luckily, procfs instances * are cheap to create, and mount-beneath would let you * atomically replace an instance even with overmounts. */ if (fc->purpose == FS_CONTEXT_FOR_RECONFIGURE) { errorfc(fc, "cannot reconfigure pidns for existing procfs"); return -EBUSY; } err = proc_parse_pidns_param(fc, param, &result); if (err) return err; break; #else errorfc(fc, "pidns mount flag not supported on this system"); return -EOPNOTSUPP; #endif default: return -EINVAL; } ctx->mask |= 1 << opt; return 0; } static void proc_apply_options(struct proc_fs_info *fs_info, struct fs_context *fc, struct user_namespace *user_ns) { struct proc_fs_context *ctx = fc->fs_private; if (ctx->mask & (1 << Opt_gid)) fs_info->pid_gid = make_kgid(user_ns, ctx->gid); if (ctx->mask & (1 << Opt_hidepid)) fs_info->hide_pid = ctx->hidepid; if (ctx->mask & (1 << Opt_subset)) fs_info->pidonly = ctx->pidonly; if (ctx->mask & (1 << Opt_pidns) && !WARN_ON_ONCE(fc->purpose == FS_CONTEXT_FOR_RECONFIGURE)) { put_pid_ns(fs_info->pid_ns); fs_info->pid_ns = get_pid_ns(ctx->pid_ns); } } static int proc_fill_super(struct super_block *s, struct fs_context *fc) { struct proc_fs_context *ctx = fc->fs_private; struct inode *root_inode; struct proc_fs_info *fs_info; int ret; fs_info = kzalloc(sizeof(*fs_info), GFP_KERNEL); if (!fs_info) return -ENOMEM; fs_info->pid_ns = get_pid_ns(ctx->pid_ns); proc_apply_options(fs_info, fc, current_user_ns()); /* User space would break if executables or devices appear on proc */ s->s_iflags |= SB_I_USERNS_VISIBLE | SB_I_NOEXEC | SB_I_NODEV; s->s_flags |= SB_NODIRATIME | SB_NOSUID | SB_NOEXEC; s->s_blocksize = 1024; s->s_blocksize_bits = 10; s->s_magic = PROC_SUPER_MAGIC; s->s_op = &proc_sops; s->s_time_gran = 1; s->s_fs_info = fs_info; /* * procfs isn't actually a stacking filesystem; however, there is * too much magic going on inside it to permit stacking things on * top of it */ s->s_stack_depth = FILESYSTEM_MAX_STACK_DEPTH; /* procfs dentries and inodes don't require IO to create */ s->s_shrink->seeks = 0; pde_get(&proc_root); root_inode = proc_get_inode(s, &proc_root); if (!root_inode) { pr_err("proc_fill_super: get root inode failed\n"); return -ENOMEM; } s->s_root = d_make_root(root_inode); if (!s->s_root) { pr_err("proc_fill_super: allocate dentry failed\n"); return -ENOMEM; } ret = proc_setup_self(s); if (ret) { return ret; } return proc_setup_thread_self(s); } static int proc_reconfigure(struct fs_context *fc) { struct super_block *sb = fc->root->d_sb; struct proc_fs_info *fs_info = proc_sb_info(sb); sync_filesystem(sb); proc_apply_options(fs_info, fc, current_user_ns()); return 0; } static int proc_get_tree(struct fs_context *fc) { return get_tree_nodev(fc, proc_fill_super); } static void proc_fs_context_free(struct fs_context *fc) { struct proc_fs_context *ctx = fc->fs_private; put_pid_ns(ctx->pid_ns); kfree(ctx); } static const struct fs_context_operations proc_fs_context_ops = { .free = proc_fs_context_free, .parse_param = proc_parse_param, .get_tree = proc_get_tree, .reconfigure = proc_reconfigure, }; static int proc_init_fs_context(struct fs_context *fc) { struct proc_fs_context *ctx; ctx = kzalloc(sizeof(struct proc_fs_context), GFP_KERNEL); if (!ctx) return -ENOMEM; ctx->pid_ns = get_pid_ns(task_active_pid_ns(current)); put_user_ns(fc->user_ns); fc->user_ns = get_user_ns(ctx->pid_ns->user_ns); fc->fs_private = ctx; fc->ops = &proc_fs_context_ops; return 0; } static void proc_kill_sb(struct super_block *sb) { struct proc_fs_info *fs_info = proc_sb_info(sb); if (!fs_info) { kill_anon_super(sb); return; } dput(fs_info->proc_self); dput(fs_info->proc_thread_self); kill_anon_super(sb); put_pid_ns(fs_info->pid_ns); kfree_rcu(fs_info, rcu); } static struct file_system_type proc_fs_type = { .name = "proc", .init_fs_context = proc_init_fs_context, .parameters = proc_fs_parameters, .kill_sb = proc_kill_sb, .fs_flags = FS_USERNS_MOUNT | FS_DISALLOW_NOTIFY_PERM, }; void __init proc_root_init(void) { proc_init_kmemcache(); set_proc_pid_nlink(); proc_self_init(); proc_thread_self_init(); proc_symlink("mounts", NULL, "self/mounts"); proc_net_init(); proc_mkdir("fs", NULL); proc_mkdir("driver", NULL); proc_create_mount_point("fs/nfsd"); /* somewhere for the nfsd filesystem to be mounted */ #if defined(CONFIG_SUN_OPENPROMFS) || defined(CONFIG_SUN_OPENPROMFS_MODULE) /* just give it a mountpoint */ proc_create_mount_point("openprom"); #endif proc_tty_init(); proc_mkdir("bus", NULL); proc_sys_init(); /* * Last things last. It is not like userspace processes eager * to open /proc files exist at this point but register last * anyway. */ register_filesystem(&proc_fs_type); } static int proc_root_getattr(struct mnt_idmap *idmap, const struct path *path, struct kstat *stat, u32 request_mask, unsigned int query_flags) { generic_fillattr(&nop_mnt_idmap, request_mask, d_inode(path->dentry), stat); stat->nlink = proc_root.nlink + nr_processes(); return 0; } static struct dentry *proc_root_lookup(struct inode * dir, struct dentry * dentry, unsigned int flags) { if (!proc_pid_lookup(dentry, flags)) return NULL; return proc_lookup(dir, dentry, flags); } static int proc_root_readdir(struct file *file, struct dir_context *ctx) { if (ctx->pos < FIRST_PROCESS_ENTRY) { int error = proc_readdir(file, ctx); if (unlikely(error <= 0)) return error; ctx->pos = FIRST_PROCESS_ENTRY; } return proc_pid_readdir(file, ctx); } /* * The root /proc directory is special, as it has the * <pid> directories. Thus we don't use the generic * directory handling functions for that.. */ static const struct file_operations proc_root_operations = { .read = generic_read_dir, .iterate_shared = proc_root_readdir, .llseek = generic_file_llseek, }; /* * proc root can do almost nothing.. */ static const struct inode_operations proc_root_inode_operations = { .lookup = proc_root_lookup, .getattr = proc_root_getattr, }; /* * This is the root "inode" in the /proc tree.. */ struct proc_dir_entry proc_root = { .low_ino = PROCFS_ROOT_INO, .namelen = 5, .mode = S_IFDIR | S_IRUGO | S_IXUGO, .nlink = 2, .refcnt = REFCOUNT_INIT(1), .proc_iops = &proc_root_inode_operations, .proc_dir_ops = &proc_root_operations, .parent = &proc_root, .subdir = RB_ROOT, .name = "/proc", }; |
| 29 46 56 33 54 43 149 149 149 26 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 | // SPDX-License-Identifier: GPL-2.0-only #include <linux/ns_common.h> #include <linux/proc_ns.h> #include <linux/vfsdebug.h> #ifdef CONFIG_DEBUG_VFS static void ns_debug(struct ns_common *ns, const struct proc_ns_operations *ops) { switch (ns->ns_type) { #ifdef CONFIG_CGROUPS case CLONE_NEWCGROUP: VFS_WARN_ON_ONCE(ops != &cgroupns_operations); break; #endif #ifdef CONFIG_IPC_NS case CLONE_NEWIPC: VFS_WARN_ON_ONCE(ops != &ipcns_operations); break; #endif case CLONE_NEWNS: VFS_WARN_ON_ONCE(ops != &mntns_operations); break; #ifdef CONFIG_NET_NS case CLONE_NEWNET: VFS_WARN_ON_ONCE(ops != &netns_operations); break; #endif #ifdef CONFIG_PID_NS case CLONE_NEWPID: VFS_WARN_ON_ONCE(ops != &pidns_operations); break; #endif #ifdef CONFIG_TIME_NS case CLONE_NEWTIME: VFS_WARN_ON_ONCE(ops != &timens_operations); break; #endif #ifdef CONFIG_USER_NS case CLONE_NEWUSER: VFS_WARN_ON_ONCE(ops != &userns_operations); break; #endif #ifdef CONFIG_UTS_NS case CLONE_NEWUTS: VFS_WARN_ON_ONCE(ops != &utsns_operations); break; #endif } } #endif int __ns_common_init(struct ns_common *ns, u32 ns_type, const struct proc_ns_operations *ops, int inum) { refcount_set(&ns->__ns_ref, 1); ns->stashed = NULL; ns->ops = ops; ns->ns_id = 0; ns->ns_type = ns_type; RB_CLEAR_NODE(&ns->ns_tree_node); INIT_LIST_HEAD(&ns->ns_list_node); #ifdef CONFIG_DEBUG_VFS ns_debug(ns, ops); #endif if (inum) { ns->inum = inum; return 0; } return proc_alloc_inum(&ns->inum); } void __ns_common_free(struct ns_common *ns) { proc_free_inum(ns->inum); } |
| 6 4 6 2 4 6 6 6 4 6 6 4 6 4 6 4 6 4 6 4 6 4 6 4 6 4 6 4 6 4 6 4 6 4 6 4 6 4 6 4 6 4 6 4 6 4 6 4 6 4 6 4 6 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 | // SPDX-License-Identifier: GPL-2.0-only #include <linux/dim.h> #include "netlink.h" #include "common.h" struct coalesce_req_info { struct ethnl_req_info base; }; struct coalesce_reply_data { struct ethnl_reply_data base; struct ethtool_coalesce coalesce; struct kernel_ethtool_coalesce kernel_coalesce; u32 supported_params; }; #define COALESCE_REPDATA(__reply_base) \ container_of(__reply_base, struct coalesce_reply_data, base) #define __SUPPORTED_OFFSET ETHTOOL_A_COALESCE_RX_USECS static u32 attr_to_mask(unsigned int attr_type) { return BIT(attr_type - __SUPPORTED_OFFSET); } /* build time check that indices in ethtool_ops::supported_coalesce_params * match corresponding attribute types with an offset */ #define __CHECK_SUPPORTED_OFFSET(x) \ static_assert((ETHTOOL_ ## x) == \ BIT((ETHTOOL_A_ ## x) - __SUPPORTED_OFFSET)) __CHECK_SUPPORTED_OFFSET(COALESCE_RX_USECS); __CHECK_SUPPORTED_OFFSET(COALESCE_RX_MAX_FRAMES); __CHECK_SUPPORTED_OFFSET(COALESCE_RX_USECS_IRQ); __CHECK_SUPPORTED_OFFSET(COALESCE_RX_MAX_FRAMES_IRQ); __CHECK_SUPPORTED_OFFSET(COALESCE_TX_USECS); __CHECK_SUPPORTED_OFFSET(COALESCE_TX_MAX_FRAMES); __CHECK_SUPPORTED_OFFSET(COALESCE_TX_USECS_IRQ); __CHECK_SUPPORTED_OFFSET(COALESCE_TX_MAX_FRAMES_IRQ); __CHECK_SUPPORTED_OFFSET(COALESCE_STATS_BLOCK_USECS); __CHECK_SUPPORTED_OFFSET(COALESCE_USE_ADAPTIVE_RX); __CHECK_SUPPORTED_OFFSET(COALESCE_USE_ADAPTIVE_TX); __CHECK_SUPPORTED_OFFSET(COALESCE_PKT_RATE_LOW); __CHECK_SUPPORTED_OFFSET(COALESCE_RX_USECS_LOW); __CHECK_SUPPORTED_OFFSET(COALESCE_RX_MAX_FRAMES_LOW); __CHECK_SUPPORTED_OFFSET(COALESCE_TX_USECS_LOW); __CHECK_SUPPORTED_OFFSET(COALESCE_TX_MAX_FRAMES_LOW); __CHECK_SUPPORTED_OFFSET(COALESCE_PKT_RATE_HIGH); __CHECK_SUPPORTED_OFFSET(COALESCE_RX_USECS_HIGH); __CHECK_SUPPORTED_OFFSET(COALESCE_RX_MAX_FRAMES_HIGH); __CHECK_SUPPORTED_OFFSET(COALESCE_TX_USECS_HIGH); __CHECK_SUPPORTED_OFFSET(COALESCE_TX_MAX_FRAMES_HIGH); __CHECK_SUPPORTED_OFFSET(COALESCE_RATE_SAMPLE_INTERVAL); const struct nla_policy ethnl_coalesce_get_policy[] = { [ETHTOOL_A_COALESCE_HEADER] = NLA_POLICY_NESTED(ethnl_header_policy), }; static int coalesce_prepare_data(const struct ethnl_req_info *req_base, struct ethnl_reply_data *reply_base, const struct genl_info *info) { struct coalesce_reply_data *data = COALESCE_REPDATA(reply_base); struct net_device *dev = reply_base->dev; int ret; if (!dev->ethtool_ops->get_coalesce) return -EOPNOTSUPP; data->supported_params = dev->ethtool_ops->supported_coalesce_params; ret = ethnl_ops_begin(dev); if (ret < 0) return ret; ret = dev->ethtool_ops->get_coalesce(dev, &data->coalesce, &data->kernel_coalesce, info->extack); ethnl_ops_complete(dev); return ret; } static int coalesce_reply_size(const struct ethnl_req_info *req_base, const struct ethnl_reply_data *reply_base) { int modersz = nla_total_size(0) + /* _PROFILE_IRQ_MODERATION, nest */ nla_total_size(sizeof(u32)) + /* _IRQ_MODERATION_USEC */ nla_total_size(sizeof(u32)) + /* _IRQ_MODERATION_PKTS */ nla_total_size(sizeof(u32)); /* _IRQ_MODERATION_COMPS */ int total_modersz = nla_total_size(0) + /* _{R,T}X_PROFILE, nest */ modersz * NET_DIM_PARAMS_NUM_PROFILES; return nla_total_size(sizeof(u32)) + /* _RX_USECS */ nla_total_size(sizeof(u32)) + /* _RX_MAX_FRAMES */ nla_total_size(sizeof(u32)) + /* _RX_USECS_IRQ */ nla_total_size(sizeof(u32)) + /* _RX_MAX_FRAMES_IRQ */ nla_total_size(sizeof(u32)) + /* _TX_USECS */ nla_total_size(sizeof(u32)) + /* _TX_MAX_FRAMES */ nla_total_size(sizeof(u32)) + /* _TX_USECS_IRQ */ nla_total_size(sizeof(u32)) + /* _TX_MAX_FRAMES_IRQ */ nla_total_size(sizeof(u32)) + /* _STATS_BLOCK_USECS */ nla_total_size(sizeof(u8)) + /* _USE_ADAPTIVE_RX */ nla_total_size(sizeof(u8)) + /* _USE_ADAPTIVE_TX */ nla_total_size(sizeof(u32)) + /* _PKT_RATE_LOW */ nla_total_size(sizeof(u32)) + /* _RX_USECS_LOW */ nla_total_size(sizeof(u32)) + /* _RX_MAX_FRAMES_LOW */ nla_total_size(sizeof(u32)) + /* _TX_USECS_LOW */ nla_total_size(sizeof(u32)) + /* _TX_MAX_FRAMES_LOW */ nla_total_size(sizeof(u32)) + /* _PKT_RATE_HIGH */ nla_total_size(sizeof(u32)) + /* _RX_USECS_HIGH */ nla_total_size(sizeof(u32)) + /* _RX_MAX_FRAMES_HIGH */ nla_total_size(sizeof(u32)) + /* _TX_USECS_HIGH */ nla_total_size(sizeof(u32)) + /* _TX_MAX_FRAMES_HIGH */ nla_total_size(sizeof(u32)) + /* _RATE_SAMPLE_INTERVAL */ nla_total_size(sizeof(u8)) + /* _USE_CQE_MODE_TX */ nla_total_size(sizeof(u8)) + /* _USE_CQE_MODE_RX */ nla_total_size(sizeof(u32)) + /* _TX_AGGR_MAX_BYTES */ nla_total_size(sizeof(u32)) + /* _TX_AGGR_MAX_FRAMES */ nla_total_size(sizeof(u32)) + /* _TX_AGGR_TIME_USECS */ total_modersz * 2; /* _{R,T}X_PROFILE */ } static bool coalesce_put_u32(struct sk_buff *skb, u16 attr_type, u32 val, u32 supported_params) { if (!val && !(supported_params & attr_to_mask(attr_type))) return false; return nla_put_u32(skb, attr_type, val); } static bool coalesce_put_bool(struct sk_buff *skb, u16 attr_type, u32 val, u32 supported_params) { if (!val && !(supported_params & attr_to_mask(attr_type))) return false; return nla_put_u8(skb, attr_type, !!val); } /** * coalesce_put_profile - fill reply with a nla nest with four child nla nests. * @skb: socket buffer the message is stored in * @attr_type: nest attr type ETHTOOL_A_COALESCE_*X_PROFILE * @profile: data passed to userspace * @coal_flags: modifiable parameters supported by the driver * * Put a dim profile nest attribute. Refer to ETHTOOL_A_PROFILE_IRQ_MODERATION. * * Return: 0 on success or a negative error code. */ static int coalesce_put_profile(struct sk_buff *skb, u16 attr_type, const struct dim_cq_moder *profile, u8 coal_flags) { struct nlattr *profile_attr, *moder_attr; int i, ret; if (!profile || !coal_flags) return 0; profile_attr = nla_nest_start(skb, attr_type); if (!profile_attr) return -EMSGSIZE; for (i = 0; i < NET_DIM_PARAMS_NUM_PROFILES; i++) { moder_attr = nla_nest_start(skb, ETHTOOL_A_PROFILE_IRQ_MODERATION); if (!moder_attr) { ret = -EMSGSIZE; goto cancel_profile; } if (coal_flags & DIM_COALESCE_USEC) { ret = nla_put_u32(skb, ETHTOOL_A_IRQ_MODERATION_USEC, profile[i].usec); if (ret) goto cancel_moder; } if (coal_flags & DIM_COALESCE_PKTS) { ret = nla_put_u32(skb, ETHTOOL_A_IRQ_MODERATION_PKTS, profile[i].pkts); if (ret) goto cancel_moder; } if (coal_flags & DIM_COALESCE_COMPS) { ret = nla_put_u32(skb, ETHTOOL_A_IRQ_MODERATION_COMPS, profile[i].comps); if (ret) goto cancel_moder; } nla_nest_end(skb, moder_attr); } nla_nest_end(skb, profile_attr); return 0; cancel_moder: nla_nest_cancel(skb, moder_attr); cancel_profile: nla_nest_cancel(skb, profile_attr); return ret; } static int coalesce_fill_reply(struct sk_buff *skb, const struct ethnl_req_info *req_base, const struct ethnl_reply_data *reply_base) { const struct coalesce_reply_data *data = COALESCE_REPDATA(reply_base); const struct kernel_ethtool_coalesce *kcoal = &data->kernel_coalesce; const struct ethtool_coalesce *coal = &data->coalesce; u32 supported = data->supported_params; struct dim_irq_moder *moder; int ret = 0; if (coalesce_put_u32(skb, ETHTOOL_A_COALESCE_RX_USECS, coal->rx_coalesce_usecs, supported) || coalesce_put_u32(skb, ETHTOOL_A_COALESCE_RX_MAX_FRAMES, coal->rx_max_coalesced_frames, supported) || coalesce_put_u32(skb, ETHTOOL_A_COALESCE_RX_USECS_IRQ, coal->rx_coalesce_usecs_irq, supported) || coalesce_put_u32(skb, ETHTOOL_A_COALESCE_RX_MAX_FRAMES_IRQ, coal->rx_max_coalesced_frames_irq, supported) || coalesce_put_u32(skb, ETHTOOL_A_COALESCE_TX_USECS, coal->tx_coalesce_usecs, supported) || coalesce_put_u32(skb, ETHTOOL_A_COALESCE_TX_MAX_FRAMES, coal->tx_max_coalesced_frames, supported) || coalesce_put_u32(skb, ETHTOOL_A_COALESCE_TX_USECS_IRQ, coal->tx_coalesce_usecs_irq, supported) || coalesce_put_u32(skb, ETHTOOL_A_COALESCE_TX_MAX_FRAMES_IRQ, coal->tx_max_coalesced_frames_irq, supported) || coalesce_put_u32(skb, ETHTOOL_A_COALESCE_STATS_BLOCK_USECS, coal->stats_block_coalesce_usecs, supported) || coalesce_put_bool(skb, ETHTOOL_A_COALESCE_USE_ADAPTIVE_RX, coal->use_adaptive_rx_coalesce, supported) || coalesce_put_bool(skb, ETHTOOL_A_COALESCE_USE_ADAPTIVE_TX, coal->use_adaptive_tx_coalesce, supported) || coalesce_put_u32(skb, ETHTOOL_A_COALESCE_PKT_RATE_LOW, coal->pkt_rate_low, supported) || coalesce_put_u32(skb, ETHTOOL_A_COALESCE_RX_USECS_LOW, coal->rx_coalesce_usecs_low, supported) || coalesce_put_u32(skb, ETHTOOL_A_COALESCE_RX_MAX_FRAMES_LOW, coal->rx_max_coalesced_frames_low, supported) || coalesce_put_u32(skb, ETHTOOL_A_COALESCE_TX_USECS_LOW, coal->tx_coalesce_usecs_low, supported) || coalesce_put_u32(skb, ETHTOOL_A_COALESCE_TX_MAX_FRAMES_LOW, coal->tx_max_coalesced_frames_low, supported) || coalesce_put_u32(skb, ETHTOOL_A_COALESCE_PKT_RATE_HIGH, coal->pkt_rate_high, supported) || coalesce_put_u32(skb, ETHTOOL_A_COALESCE_RX_USECS_HIGH, coal->rx_coalesce_usecs_high, supported) || coalesce_put_u32(skb, ETHTOOL_A_COALESCE_RX_MAX_FRAMES_HIGH, coal->rx_max_coalesced_frames_high, supported) || coalesce_put_u32(skb, ETHTOOL_A_COALESCE_TX_USECS_HIGH, coal->tx_coalesce_usecs_high, supported) || coalesce_put_u32(skb, ETHTOOL_A_COALESCE_TX_MAX_FRAMES_HIGH, coal->tx_max_coalesced_frames_high, supported) || coalesce_put_u32(skb, ETHTOOL_A_COALESCE_RATE_SAMPLE_INTERVAL, coal->rate_sample_interval, supported) || coalesce_put_bool(skb, ETHTOOL_A_COALESCE_USE_CQE_MODE_TX, kcoal->use_cqe_mode_tx, supported) || coalesce_put_bool(skb, ETHTOOL_A_COALESCE_USE_CQE_MODE_RX, kcoal->use_cqe_mode_rx, supported) || coalesce_put_u32(skb, ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES, kcoal->tx_aggr_max_bytes, supported) || coalesce_put_u32(skb, ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES, kcoal->tx_aggr_max_frames, supported) || coalesce_put_u32(skb, ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS, kcoal->tx_aggr_time_usecs, supported)) return -EMSGSIZE; if (!req_base->dev || !req_base->dev->irq_moder) return 0; moder = req_base->dev->irq_moder; rcu_read_lock(); if (moder->profile_flags & DIM_PROFILE_RX) { ret = coalesce_put_profile(skb, ETHTOOL_A_COALESCE_RX_PROFILE, rcu_dereference(moder->rx_profile), moder->coal_flags); if (ret) goto out; } if (moder->profile_flags & DIM_PROFILE_TX) ret = coalesce_put_profile(skb, ETHTOOL_A_COALESCE_TX_PROFILE, rcu_dereference(moder->tx_profile), moder->coal_flags); out: rcu_read_unlock(); return ret; } /* COALESCE_SET */ static const struct nla_policy coalesce_irq_moderation_policy[] = { [ETHTOOL_A_IRQ_MODERATION_USEC] = { .type = NLA_U32 }, [ETHTOOL_A_IRQ_MODERATION_PKTS] = { .type = NLA_U32 }, [ETHTOOL_A_IRQ_MODERATION_COMPS] = { .type = NLA_U32 }, }; static const struct nla_policy coalesce_profile_policy[] = { [ETHTOOL_A_PROFILE_IRQ_MODERATION] = NLA_POLICY_NESTED(coalesce_irq_moderation_policy), }; const struct nla_policy ethnl_coalesce_set_policy[] = { [ETHTOOL_A_COALESCE_HEADER] = NLA_POLICY_NESTED(ethnl_header_policy), [ETHTOOL_A_COALESCE_RX_USECS] = { .type = NLA_U32 }, [ETHTOOL_A_COALESCE_RX_MAX_FRAMES] = { .type = NLA_U32 }, [ETHTOOL_A_COALESCE_RX_USECS_IRQ] = { .type = NLA_U32 }, [ETHTOOL_A_COALESCE_RX_MAX_FRAMES_IRQ] = { .type = NLA_U32 }, [ETHTOOL_A_COALESCE_TX_USECS] = { .type = NLA_U32 }, [ETHTOOL_A_COALESCE_TX_MAX_FRAMES] = { .type = NLA_U32 }, [ETHTOOL_A_COALESCE_TX_USECS_IRQ] = { .type = NLA_U32 }, [ETHTOOL_A_COALESCE_TX_MAX_FRAMES_IRQ] = { .type = NLA_U32 }, [ETHTOOL_A_COALESCE_STATS_BLOCK_USECS] = { .type = NLA_U32 }, [ETHTOOL_A_COALESCE_USE_ADAPTIVE_RX] = { .type = NLA_U8 }, [ETHTOOL_A_COALESCE_USE_ADAPTIVE_TX] = { .type = NLA_U8 }, [ETHTOOL_A_COALESCE_PKT_RATE_LOW] = { .type = NLA_U32 }, [ETHTOOL_A_COALESCE_RX_USECS_LOW] = { .type = NLA_U32 }, [ETHTOOL_A_COALESCE_RX_MAX_FRAMES_LOW] = { .type = NLA_U32 }, [ETHTOOL_A_COALESCE_TX_USECS_LOW] = { .type = NLA_U32 }, [ETHTOOL_A_COALESCE_TX_MAX_FRAMES_LOW] = { .type = NLA_U32 }, [ETHTOOL_A_COALESCE_PKT_RATE_HIGH] = { .type = NLA_U32 }, [ETHTOOL_A_COALESCE_RX_USECS_HIGH] = { .type = NLA_U32 }, [ETHTOOL_A_COALESCE_RX_MAX_FRAMES_HIGH] = { .type = NLA_U32 }, [ETHTOOL_A_COALESCE_TX_USECS_HIGH] = { .type = NLA_U32 }, [ETHTOOL_A_COALESCE_TX_MAX_FRAMES_HIGH] = { .type = NLA_U32 }, [ETHTOOL_A_COALESCE_RATE_SAMPLE_INTERVAL] = { .type = NLA_U32 }, [ETHTOOL_A_COALESCE_USE_CQE_MODE_TX] = NLA_POLICY_MAX(NLA_U8, 1), [ETHTOOL_A_COALESCE_USE_CQE_MODE_RX] = NLA_POLICY_MAX(NLA_U8, 1), [ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES] = { .type = NLA_U32 }, [ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES] = { .type = NLA_U32 }, [ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS] = { .type = NLA_U32 }, [ETHTOOL_A_COALESCE_RX_PROFILE] = NLA_POLICY_NESTED(coalesce_profile_policy), [ETHTOOL_A_COALESCE_TX_PROFILE] = NLA_POLICY_NESTED(coalesce_profile_policy), }; static int ethnl_set_coalesce_validate(struct ethnl_req_info *req_info, struct genl_info *info) { const struct ethtool_ops *ops = req_info->dev->ethtool_ops; struct dim_irq_moder *irq_moder = req_info->dev->irq_moder; struct nlattr **tb = info->attrs; u32 supported_params; u16 a; if (!ops->get_coalesce || !ops->set_coalesce) return -EOPNOTSUPP; /* make sure that only supported parameters are present */ supported_params = ops->supported_coalesce_params; if (irq_moder && irq_moder->profile_flags & DIM_PROFILE_RX) supported_params |= ETHTOOL_COALESCE_RX_PROFILE; if (irq_moder && irq_moder->profile_flags & DIM_PROFILE_TX) supported_params |= ETHTOOL_COALESCE_TX_PROFILE; for (a = ETHTOOL_A_COALESCE_RX_USECS; a < __ETHTOOL_A_COALESCE_CNT; a++) if (tb[a] && !(supported_params & attr_to_mask(a))) { NL_SET_ERR_MSG_ATTR(info->extack, tb[a], "cannot modify an unsupported parameter"); return -EINVAL; } return 1; } /** * ethnl_update_irq_moder - update a specific field in the given profile * @irq_moder: place that collects dim related information * @irq_field: field in profile to modify * @attr_type: attr type ETHTOOL_A_IRQ_MODERATION_* * @tb: netlink attribute with new values or null * @coal_bit: DIM_COALESCE_* bit from coal_flags * @mod: pointer to bool for modification tracking * @extack: netlink extended ack * * Return: 0 on success or a negative error code. */ static int ethnl_update_irq_moder(struct dim_irq_moder *irq_moder, u16 *irq_field, u16 attr_type, struct nlattr **tb, u8 coal_bit, bool *mod, struct netlink_ext_ack *extack) { int ret = 0; u32 val; if (!tb[attr_type]) return 0; if (irq_moder->coal_flags & coal_bit) { val = nla_get_u32(tb[attr_type]); if (*irq_field == val) return 0; *irq_field = val; *mod = true; } else { NL_SET_BAD_ATTR(extack, tb[attr_type]); ret = -EOPNOTSUPP; } return ret; } /** * ethnl_update_profile - get a profile nest with child nests from userspace. * @dev: netdevice to update the profile * @dst: profile get from the driver and modified by ethnl_update_profile. * @nests: nest attr ETHTOOL_A_COALESCE_*X_PROFILE to set profile. * @mod: pointer to bool for modification tracking * @extack: Netlink extended ack * * Layout of nests: * Nested ETHTOOL_A_COALESCE_*X_PROFILE attr * Nested ETHTOOL_A_PROFILE_IRQ_MODERATION attr * ETHTOOL_A_IRQ_MODERATION_USEC attr * ETHTOOL_A_IRQ_MODERATION_PKTS attr * ETHTOOL_A_IRQ_MODERATION_COMPS attr * ... * Nested ETHTOOL_A_PROFILE_IRQ_MODERATION attr * ETHTOOL_A_IRQ_MODERATION_USEC attr * ETHTOOL_A_IRQ_MODERATION_PKTS attr * ETHTOOL_A_IRQ_MODERATION_COMPS attr * * Return: 0 on success or a negative error code. */ static int ethnl_update_profile(struct net_device *dev, struct dim_cq_moder __rcu **dst, const struct nlattr *nests, bool *mod, struct netlink_ext_ack *extack) { int len_irq_moder = ARRAY_SIZE(coalesce_irq_moderation_policy); struct nlattr *tb[ARRAY_SIZE(coalesce_irq_moderation_policy)]; struct dim_irq_moder *irq_moder = dev->irq_moder; struct dim_cq_moder *new_profile, *old_profile; int ret, rem, i = 0, len; struct nlattr *nest; if (!nests) return 0; if (!*dst) return -EOPNOTSUPP; old_profile = rtnl_dereference(*dst); len = NET_DIM_PARAMS_NUM_PROFILES * sizeof(*old_profile); new_profile = kmemdup(old_profile, len, GFP_KERNEL); if (!new_profile) return -ENOMEM; nla_for_each_nested_type(nest, ETHTOOL_A_PROFILE_IRQ_MODERATION, nests, rem) { ret = nla_parse_nested(tb, len_irq_moder - 1, nest, coalesce_irq_moderation_policy, extack); if (ret) goto err_out; ret = ethnl_update_irq_moder(irq_moder, &new_profile[i].usec, ETHTOOL_A_IRQ_MODERATION_USEC, tb, DIM_COALESCE_USEC, mod, extack); if (ret) goto err_out; ret = ethnl_update_irq_moder(irq_moder, &new_profile[i].pkts, ETHTOOL_A_IRQ_MODERATION_PKTS, tb, DIM_COALESCE_PKTS, mod, extack); if (ret) goto err_out; ret = ethnl_update_irq_moder(irq_moder, &new_profile[i].comps, ETHTOOL_A_IRQ_MODERATION_COMPS, tb, DIM_COALESCE_COMPS, mod, extack); if (ret) goto err_out; i++; } /* After the profile is modified, dim itself is a dynamic * mechanism and will quickly fit to the appropriate * coalescing parameters according to the new profile. */ rcu_assign_pointer(*dst, new_profile); kfree_rcu(old_profile, rcu); return 0; err_out: kfree(new_profile); return ret; } static int __ethnl_set_coalesce(struct ethnl_req_info *req_info, struct genl_info *info, bool *dual_change) { struct kernel_ethtool_coalesce kernel_coalesce = {}; struct net_device *dev = req_info->dev; struct ethtool_coalesce coalesce = {}; bool mod_mode = false, mod = false; struct nlattr **tb = info->attrs; int ret; ret = dev->ethtool_ops->get_coalesce(dev, &coalesce, &kernel_coalesce, info->extack); if (ret < 0) return ret; /* Update values */ ethnl_update_u32(&coalesce.rx_coalesce_usecs, tb[ETHTOOL_A_COALESCE_RX_USECS], &mod); ethnl_update_u32(&coalesce.rx_max_coalesced_frames, tb[ETHTOOL_A_COALESCE_RX_MAX_FRAMES], &mod); ethnl_update_u32(&coalesce.rx_coalesce_usecs_irq, tb[ETHTOOL_A_COALESCE_RX_USECS_IRQ], &mod); ethnl_update_u32(&coalesce.rx_max_coalesced_frames_irq, tb[ETHTOOL_A_COALESCE_RX_MAX_FRAMES_IRQ], &mod); ethnl_update_u32(&coalesce.tx_coalesce_usecs, tb[ETHTOOL_A_COALESCE_TX_USECS], &mod); ethnl_update_u32(&coalesce.tx_max_coalesced_frames, tb[ETHTOOL_A_COALESCE_TX_MAX_FRAMES], &mod); ethnl_update_u32(&coalesce.tx_coalesce_usecs_irq, tb[ETHTOOL_A_COALESCE_TX_USECS_IRQ], &mod); ethnl_update_u32(&coalesce.tx_max_coalesced_frames_irq, tb[ETHTOOL_A_COALESCE_TX_MAX_FRAMES_IRQ], &mod); ethnl_update_u32(&coalesce.stats_block_coalesce_usecs, tb[ETHTOOL_A_COALESCE_STATS_BLOCK_USECS], &mod); ethnl_update_u32(&coalesce.pkt_rate_low, tb[ETHTOOL_A_COALESCE_PKT_RATE_LOW], &mod); ethnl_update_u32(&coalesce.rx_coalesce_usecs_low, tb[ETHTOOL_A_COALESCE_RX_USECS_LOW], &mod); ethnl_update_u32(&coalesce.rx_max_coalesced_frames_low, tb[ETHTOOL_A_COALESCE_RX_MAX_FRAMES_LOW], &mod); ethnl_update_u32(&coalesce.tx_coalesce_usecs_low, tb[ETHTOOL_A_COALESCE_TX_USECS_LOW], &mod); ethnl_update_u32(&coalesce.tx_max_coalesced_frames_low, tb[ETHTOOL_A_COALESCE_TX_MAX_FRAMES_LOW], &mod); ethnl_update_u32(&coalesce.pkt_rate_high, tb[ETHTOOL_A_COALESCE_PKT_RATE_HIGH], &mod); ethnl_update_u32(&coalesce.rx_coalesce_usecs_high, tb[ETHTOOL_A_COALESCE_RX_USECS_HIGH], &mod); ethnl_update_u32(&coalesce.rx_max_coalesced_frames_high, tb[ETHTOOL_A_COALESCE_RX_MAX_FRAMES_HIGH], &mod); ethnl_update_u32(&coalesce.tx_coalesce_usecs_high, tb[ETHTOOL_A_COALESCE_TX_USECS_HIGH], &mod); ethnl_update_u32(&coalesce.tx_max_coalesced_frames_high, tb[ETHTOOL_A_COALESCE_TX_MAX_FRAMES_HIGH], &mod); ethnl_update_u32(&coalesce.rate_sample_interval, tb[ETHTOOL_A_COALESCE_RATE_SAMPLE_INTERVAL], &mod); ethnl_update_u32(&kernel_coalesce.tx_aggr_max_bytes, tb[ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES], &mod); ethnl_update_u32(&kernel_coalesce.tx_aggr_max_frames, tb[ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES], &mod); ethnl_update_u32(&kernel_coalesce.tx_aggr_time_usecs, tb[ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS], &mod); if (dev->irq_moder && dev->irq_moder->profile_flags & DIM_PROFILE_RX) { ret = ethnl_update_profile(dev, &dev->irq_moder->rx_profile, tb[ETHTOOL_A_COALESCE_RX_PROFILE], &mod, info->extack); if (ret < 0) return ret; } if (dev->irq_moder && dev->irq_moder->profile_flags & DIM_PROFILE_TX) { ret = ethnl_update_profile(dev, &dev->irq_moder->tx_profile, tb[ETHTOOL_A_COALESCE_TX_PROFILE], &mod, info->extack); if (ret < 0) return ret; } /* Update operation modes */ ethnl_update_bool32(&coalesce.use_adaptive_rx_coalesce, tb[ETHTOOL_A_COALESCE_USE_ADAPTIVE_RX], &mod_mode); ethnl_update_bool32(&coalesce.use_adaptive_tx_coalesce, tb[ETHTOOL_A_COALESCE_USE_ADAPTIVE_TX], &mod_mode); ethnl_update_u8(&kernel_coalesce.use_cqe_mode_tx, tb[ETHTOOL_A_COALESCE_USE_CQE_MODE_TX], &mod_mode); ethnl_update_u8(&kernel_coalesce.use_cqe_mode_rx, tb[ETHTOOL_A_COALESCE_USE_CQE_MODE_RX], &mod_mode); *dual_change = mod && mod_mode; if (!mod && !mod_mode) return 0; ret = dev->ethtool_ops->set_coalesce(dev, &coalesce, &kernel_coalesce, info->extack); return ret < 0 ? ret : 1; } static int ethnl_set_coalesce(struct ethnl_req_info *req_info, struct genl_info *info) { bool dual_change; int err, ret; /* SET_COALESCE may change operation mode and parameters in one call. * Changing operation mode may cause the driver to reset the parameter * values, and therefore ignore user input (driver does not know which * parameters come from user and which are echoed back from ->get). * To not complicate the drivers if user tries to change both the mode * and parameters at once - call the driver twice. */ err = __ethnl_set_coalesce(req_info, info, &dual_change); if (err < 0) return err; ret = err; if (ret && dual_change) { err = __ethnl_set_coalesce(req_info, info, &dual_change); if (err < 0) return err; } return ret; } const struct ethnl_request_ops ethnl_coalesce_request_ops = { .request_cmd = ETHTOOL_MSG_COALESCE_GET, .reply_cmd = ETHTOOL_MSG_COALESCE_GET_REPLY, .hdr_attr = ETHTOOL_A_COALESCE_HEADER, .req_info_size = sizeof(struct coalesce_req_info), .reply_data_size = sizeof(struct coalesce_reply_data), .prepare_data = coalesce_prepare_data, .reply_size = coalesce_reply_size, .fill_reply = coalesce_fill_reply, .set_validate = ethnl_set_coalesce_validate, .set = ethnl_set_coalesce, .set_ntf_cmd = ETHTOOL_MSG_COALESCE_NTF, }; |
| 2470 2476 1159 771 643 1440 1439 2356 2354 2354 2388 2386 2351 1441 2355 2357 1438 1439 1439 1439 1440 1220 2354 2346 2352 2354 2350 1459 1466 2348 2358 2345 2291 116 1436 1439 1432 1435 1434 1438 2485 67 2475 20 1114 1873 2352 345 1855 1840 124 1443 1442 1444 1441 6 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 | // SPDX-License-Identifier: GPL-2.0 /* * KFENCE guarded object allocator and fault handling. * * Copyright (C) 2020, Google LLC. */ #define pr_fmt(fmt) "kfence: " fmt #include <linux/atomic.h> #include <linux/bug.h> #include <linux/debugfs.h> #include <linux/hash.h> #include <linux/irq_work.h> #include <linux/jhash.h> #include <linux/kcsan-checks.h> #include <linux/kfence.h> #include <linux/kmemleak.h> #include <linux/list.h> #include <linux/lockdep.h> #include <linux/log2.h> #include <linux/memblock.h> #include <linux/moduleparam.h> #include <linux/nodemask.h> #include <linux/notifier.h> #include <linux/panic_notifier.h> #include <linux/random.h> #include <linux/rcupdate.h> #include <linux/sched/clock.h> #include <linux/seq_file.h> #include <linux/slab.h> #include <linux/spinlock.h> #include <linux/string.h> #include <asm/kfence.h> #include "kfence.h" /* Disables KFENCE on the first warning assuming an irrecoverable error. */ #define KFENCE_WARN_ON(cond) \ ({ \ const bool __cond = WARN_ON(cond); \ if (unlikely(__cond)) { \ WRITE_ONCE(kfence_enabled, false); \ disabled_by_warn = true; \ } \ __cond; \ }) /* === Data ================================================================= */ static bool kfence_enabled __read_mostly; static bool disabled_by_warn __read_mostly; unsigned long kfence_sample_interval __read_mostly = CONFIG_KFENCE_SAMPLE_INTERVAL; EXPORT_SYMBOL_GPL(kfence_sample_interval); /* Export for test modules. */ #ifdef MODULE_PARAM_PREFIX #undef MODULE_PARAM_PREFIX #endif #define MODULE_PARAM_PREFIX "kfence." static int kfence_enable_late(void); static int param_set_sample_interval(const char *val, const struct kernel_param *kp) { unsigned long num; int ret = kstrtoul(val, 0, &num); if (ret < 0) return ret; /* Using 0 to indicate KFENCE is disabled. */ if (!num && READ_ONCE(kfence_enabled)) { pr_info("disabled\n"); WRITE_ONCE(kfence_enabled, false); } *((unsigned long *)kp->arg) = num; if (num && !READ_ONCE(kfence_enabled) && system_state != SYSTEM_BOOTING) return disabled_by_warn ? -EINVAL : kfence_enable_late(); return 0; } static int param_get_sample_interval(char *buffer, const struct kernel_param *kp) { if (!READ_ONCE(kfence_enabled)) return sprintf(buffer, "0\n"); return param_get_ulong(buffer, kp); } static const struct kernel_param_ops sample_interval_param_ops = { .set = param_set_sample_interval, .get = param_get_sample_interval, }; module_param_cb(sample_interval, &sample_interval_param_ops, &kfence_sample_interval, 0600); /* Pool usage% threshold when currently covered allocations are skipped. */ static unsigned long kfence_skip_covered_thresh __read_mostly = 75; module_param_named(skip_covered_thresh, kfence_skip_covered_thresh, ulong, 0644); /* Allocation burst count: number of excess KFENCE allocations per sample. */ static unsigned int kfence_burst __read_mostly; module_param_named(burst, kfence_burst, uint, 0644); /* If true, use a deferrable timer. */ static bool kfence_deferrable __read_mostly = IS_ENABLED(CONFIG_KFENCE_DEFERRABLE); module_param_named(deferrable, kfence_deferrable, bool, 0444); /* If true, check all canary bytes on panic. */ static bool kfence_check_on_panic __read_mostly; module_param_named(check_on_panic, kfence_check_on_panic, bool, 0444); /* The pool of pages used for guard pages and objects. */ char *__kfence_pool __read_mostly; EXPORT_SYMBOL(__kfence_pool); /* Export for test modules. */ /* * Per-object metadata, with one-to-one mapping of object metadata to * backing pages (in __kfence_pool). */ static_assert(CONFIG_KFENCE_NUM_OBJECTS > 0); struct kfence_metadata *kfence_metadata __read_mostly; /* * If kfence_metadata is not NULL, it may be accessed by kfence_shutdown_cache(). * So introduce kfence_metadata_init to initialize metadata, and then make * kfence_metadata visible after initialization is successful. This prevents * potential UAF or access to uninitialized metadata. */ static struct kfence_metadata *kfence_metadata_init __read_mostly; /* Freelist with available objects. */ static struct list_head kfence_freelist = LIST_HEAD_INIT(kfence_freelist); static DEFINE_RAW_SPINLOCK(kfence_freelist_lock); /* Lock protecting freelist. */ /* * The static key to set up a KFENCE allocation; or if static keys are not used * to gate allocations, to avoid a load and compare if KFENCE is disabled. */ DEFINE_STATIC_KEY_FALSE(kfence_allocation_key); /* Gates the allocation, ensuring only one succeeds in a given period. */ atomic_t kfence_allocation_gate = ATOMIC_INIT(1); /* * A Counting Bloom filter of allocation coverage: limits currently covered * allocations of the same source filling up the pool. * * Assuming a range of 15%-85% unique allocations in the pool at any point in * time, the below parameters provide a probablity of 0.02-0.33 for false * positive hits respectively: * * P(alloc_traces) = (1 - e^(-HNUM * (alloc_traces / SIZE)) ^ HNUM */ #define ALLOC_COVERED_HNUM 2 #define ALLOC_COVERED_ORDER (const_ilog2(CONFIG_KFENCE_NUM_OBJECTS) + 2) #define ALLOC_COVERED_SIZE (1 << ALLOC_COVERED_ORDER) #define ALLOC_COVERED_HNEXT(h) hash_32(h, ALLOC_COVERED_ORDER) #define ALLOC_COVERED_MASK (ALLOC_COVERED_SIZE - 1) static atomic_t alloc_covered[ALLOC_COVERED_SIZE]; /* Stack depth used to determine uniqueness of an allocation. */ #define UNIQUE_ALLOC_STACK_DEPTH ((size_t)8) /* * Randomness for stack hashes, making the same collisions across reboots and * different machines less likely. */ static u32 stack_hash_seed __ro_after_init; /* Statistics counters for debugfs. */ enum kfence_counter_id { KFENCE_COUNTER_ALLOCATED, KFENCE_COUNTER_ALLOCS, KFENCE_COUNTER_FREES, KFENCE_COUNTER_ZOMBIES, KFENCE_COUNTER_BUGS, KFENCE_COUNTER_SKIP_INCOMPAT, KFENCE_COUNTER_SKIP_CAPACITY, KFENCE_COUNTER_SKIP_COVERED, KFENCE_COUNTER_COUNT, }; static atomic_long_t counters[KFENCE_COUNTER_COUNT]; static const char *const counter_names[] = { [KFENCE_COUNTER_ALLOCATED] = "currently allocated", [KFENCE_COUNTER_ALLOCS] = "total allocations", [KFENCE_COUNTER_FREES] = "total frees", [KFENCE_COUNTER_ZOMBIES] = "zombie allocations", [KFENCE_COUNTER_BUGS] = "total bugs", [KFENCE_COUNTER_SKIP_INCOMPAT] = "skipped allocations (incompatible)", [KFENCE_COUNTER_SKIP_CAPACITY] = "skipped allocations (capacity)", [KFENCE_COUNTER_SKIP_COVERED] = "skipped allocations (covered)", }; static_assert(ARRAY_SIZE(counter_names) == KFENCE_COUNTER_COUNT); /* === Internals ============================================================ */ static inline bool should_skip_covered(void) { unsigned long thresh = (CONFIG_KFENCE_NUM_OBJECTS * kfence_skip_covered_thresh) / 100; return atomic_long_read(&counters[KFENCE_COUNTER_ALLOCATED]) > thresh; } static u32 get_alloc_stack_hash(unsigned long *stack_entries, size_t num_entries) { num_entries = min(num_entries, UNIQUE_ALLOC_STACK_DEPTH); num_entries = filter_irq_stacks(stack_entries, num_entries); return jhash(stack_entries, num_entries * sizeof(stack_entries[0]), stack_hash_seed); } /* * Adds (or subtracts) count @val for allocation stack trace hash * @alloc_stack_hash from Counting Bloom filter. */ static void alloc_covered_add(u32 alloc_stack_hash, int val) { int i; for (i = 0; i < ALLOC_COVERED_HNUM; i++) { atomic_add(val, &alloc_covered[alloc_stack_hash & ALLOC_COVERED_MASK]); alloc_stack_hash = ALLOC_COVERED_HNEXT(alloc_stack_hash); } } /* * Returns true if the allocation stack trace hash @alloc_stack_hash is * currently contained (non-zero count) in Counting Bloom filter. */ static bool alloc_covered_contains(u32 alloc_stack_hash) { int i; for (i = 0; i < ALLOC_COVERED_HNUM; i++) { if (!atomic_read(&alloc_covered[alloc_stack_hash & ALLOC_COVERED_MASK])) return false; alloc_stack_hash = ALLOC_COVERED_HNEXT(alloc_stack_hash); } return true; } static bool kfence_protect(unsigned long addr) { return !KFENCE_WARN_ON(!kfence_protect_page(ALIGN_DOWN(addr, PAGE_SIZE), true)); } static bool kfence_unprotect(unsigned long addr) { return !KFENCE_WARN_ON(!kfence_protect_page(ALIGN_DOWN(addr, PAGE_SIZE), false)); } static inline unsigned long metadata_to_pageaddr(const struct kfence_metadata *meta) { unsigned long offset = (meta - kfence_metadata + 1) * PAGE_SIZE * 2; unsigned long pageaddr = (unsigned long)&__kfence_pool[offset]; /* The checks do not affect performance; only called from slow-paths. */ /* Only call with a pointer into kfence_metadata. */ if (KFENCE_WARN_ON(meta < kfence_metadata || meta >= kfence_metadata + CONFIG_KFENCE_NUM_OBJECTS)) return 0; /* * This metadata object only ever maps to 1 page; verify that the stored * address is in the expected range. */ if (KFENCE_WARN_ON(ALIGN_DOWN(meta->addr, PAGE_SIZE) != pageaddr)) return 0; return pageaddr; } static inline bool kfence_obj_allocated(const struct kfence_metadata *meta) { enum kfence_object_state state = READ_ONCE(meta->state); return state == KFENCE_OBJECT_ALLOCATED || state == KFENCE_OBJECT_RCU_FREEING; } /* * Update the object's metadata state, including updating the alloc/free stacks * depending on the state transition. */ static noinline void metadata_update_state(struct kfence_metadata *meta, enum kfence_object_state next, unsigned long *stack_entries, size_t num_stack_entries) { struct kfence_track *track = next == KFENCE_OBJECT_ALLOCATED ? &meta->alloc_track : &meta->free_track; lockdep_assert_held(&meta->lock); /* Stack has been saved when calling rcu, skip. */ if (READ_ONCE(meta->state) == KFENCE_OBJECT_RCU_FREEING) goto out; if (stack_entries) { memcpy(track->stack_entries, stack_entries, num_stack_entries * sizeof(stack_entries[0])); } else { /* * Skip over 1 (this) functions; noinline ensures we do not * accidentally skip over the caller by never inlining. */ num_stack_entries = stack_trace_save(track->stack_entries, KFENCE_STACK_DEPTH, 1); } track->num_stack_entries = num_stack_entries; track->pid = task_pid_nr(current); track->cpu = raw_smp_processor_id(); track->ts_nsec = local_clock(); /* Same source as printk timestamps. */ out: /* * Pairs with READ_ONCE() in * kfence_shutdown_cache(), * kfence_handle_page_fault(). */ WRITE_ONCE(meta->state, next); } #ifdef CONFIG_KMSAN #define check_canary_attributes noinline __no_kmsan_checks #else #define check_canary_attributes inline #endif /* Check canary byte at @addr. */ static check_canary_attributes bool check_canary_byte(u8 *addr) { struct kfence_metadata *meta; unsigned long flags; if (likely(*addr == KFENCE_CANARY_PATTERN_U8(addr))) return true; atomic_long_inc(&counters[KFENCE_COUNTER_BUGS]); meta = addr_to_metadata((unsigned long)addr); raw_spin_lock_irqsave(&meta->lock, flags); kfence_report_error((unsigned long)addr, false, NULL, meta, KFENCE_ERROR_CORRUPTION); raw_spin_unlock_irqrestore(&meta->lock, flags); return false; } static inline void set_canary(const struct kfence_metadata *meta) { const unsigned long pageaddr = ALIGN_DOWN(meta->addr, PAGE_SIZE); unsigned long addr = pageaddr; /* * The canary may be written to part of the object memory, but it does * not affect it. The user should initialize the object before using it. */ for (; addr < meta->addr; addr += sizeof(u64)) *((u64 *)addr) = KFENCE_CANARY_PATTERN_U64; addr = ALIGN_DOWN(meta->addr + meta->size, sizeof(u64)); for (; addr - pageaddr < PAGE_SIZE; addr += sizeof(u64)) *((u64 *)addr) = KFENCE_CANARY_PATTERN_U64; } static check_canary_attributes void check_canary(const struct kfence_metadata *meta) { const unsigned long pageaddr = ALIGN_DOWN(meta->addr, PAGE_SIZE); unsigned long addr = pageaddr; /* * We'll iterate over each canary byte per-side until a corrupted byte * is found. However, we'll still iterate over the canary bytes to the * right of the object even if there was an error in the canary bytes to * the left of the object. Specifically, if check_canary_byte() * generates an error, showing both sides might give more clues as to * what the error is about when displaying which bytes were corrupted. */ /* Apply to left of object. */ for (; meta->addr - addr >= sizeof(u64); addr += sizeof(u64)) { if (unlikely(*((u64 *)addr) != KFENCE_CANARY_PATTERN_U64)) break; } /* * If the canary is corrupted in a certain 64 bytes, or the canary * memory cannot be completely covered by multiple consecutive 64 bytes, * it needs to be checked one by one. */ for (; addr < meta->addr; addr++) { if (unlikely(!check_canary_byte((u8 *)addr))) break; } /* Apply to right of object. */ for (addr = meta->addr + meta->size; addr % sizeof(u64) != 0; addr++) { if (unlikely(!check_canary_byte((u8 *)addr))) return; } for (; addr - pageaddr < PAGE_SIZE; addr += sizeof(u64)) { if (unlikely(*((u64 *)addr) != KFENCE_CANARY_PATTERN_U64)) { for (; addr - pageaddr < PAGE_SIZE; addr++) { if (!check_canary_byte((u8 *)addr)) return; } } } } static void *kfence_guarded_alloc(struct kmem_cache *cache, size_t size, gfp_t gfp, unsigned long *stack_entries, size_t num_stack_entries, u32 alloc_stack_hash) { struct kfence_metadata *meta = NULL; unsigned long flags; struct slab *slab; void *addr; const bool random_right_allocate = get_random_u32_below(2); const bool random_fault = CONFIG_KFENCE_STRESS_TEST_FAULTS && !get_random_u32_below(CONFIG_KFENCE_STRESS_TEST_FAULTS); /* Try to obtain a free object. */ raw_spin_lock_irqsave(&kfence_freelist_lock, flags); if (!list_empty(&kfence_freelist)) { meta = list_entry(kfence_freelist.next, struct kfence_metadata, list); list_del_init(&meta->list); } raw_spin_unlock_irqrestore(&kfence_freelist_lock, flags); if (!meta) { atomic_long_inc(&counters[KFENCE_COUNTER_SKIP_CAPACITY]); return NULL; } if (unlikely(!raw_spin_trylock_irqsave(&meta->lock, flags))) { /* * This is extremely unlikely -- we are reporting on a * use-after-free, which locked meta->lock, and the reporting * code via printk calls kmalloc() which ends up in * kfence_alloc() and tries to grab the same object that we're * reporting on. While it has never been observed, lockdep does * report that there is a possibility of deadlock. Fix it by * using trylock and bailing out gracefully. */ raw_spin_lock_irqsave(&kfence_freelist_lock, flags); /* Put the object back on the freelist. */ list_add_tail(&meta->list, &kfence_freelist); raw_spin_unlock_irqrestore(&kfence_freelist_lock, flags); return NULL; } meta->addr = metadata_to_pageaddr(meta); /* Unprotect if we're reusing this page. */ if (meta->state == KFENCE_OBJECT_FREED) kfence_unprotect(meta->addr); /* * Note: for allocations made before RNG initialization, will always * return zero. We still benefit from enabling KFENCE as early as * possible, even when the RNG is not yet available, as this will allow * KFENCE to detect bugs due to earlier allocations. The only downside * is that the out-of-bounds accesses detected are deterministic for * such allocations. */ if (random_right_allocate) { /* Allocate on the "right" side, re-calculate address. */ meta->addr += PAGE_SIZE - size; meta->addr = ALIGN_DOWN(meta->addr, cache->align); } addr = (void *)meta->addr; /* Update remaining metadata. */ metadata_update_state(meta, KFENCE_OBJECT_ALLOCATED, stack_entries, num_stack_entries); /* Pairs with READ_ONCE() in kfence_shutdown_cache(). */ WRITE_ONCE(meta->cache, cache); meta->size = size; meta->alloc_stack_hash = alloc_stack_hash; raw_spin_unlock_irqrestore(&meta->lock, flags); alloc_covered_add(alloc_stack_hash, 1); /* Set required slab fields. */ slab = virt_to_slab((void *)meta->addr); slab->slab_cache = cache; slab->objects = 1; /* Memory initialization. */ set_canary(meta); /* * We check slab_want_init_on_alloc() ourselves, rather than letting * SL*B do the initialization, as otherwise we might overwrite KFENCE's * redzone. */ if (unlikely(slab_want_init_on_alloc(gfp, cache))) memzero_explicit(addr, size); if (cache->ctor) cache->ctor(addr); if (random_fault) kfence_protect(meta->addr); /* Random "faults" by protecting the object. */ atomic_long_inc(&counters[KFENCE_COUNTER_ALLOCATED]); atomic_long_inc(&counters[KFENCE_COUNTER_ALLOCS]); return addr; } static void kfence_guarded_free(void *addr, struct kfence_metadata *meta, bool zombie) { struct kcsan_scoped_access assert_page_exclusive; unsigned long flags; bool init; raw_spin_lock_irqsave(&meta->lock, flags); if (!kfence_obj_allocated(meta) || meta->addr != (unsigned long)addr) { /* Invalid or double-free, bail out. */ atomic_long_inc(&counters[KFENCE_COUNTER_BUGS]); kfence_report_error((unsigned long)addr, false, NULL, meta, KFENCE_ERROR_INVALID_FREE); raw_spin_unlock_irqrestore(&meta->lock, flags); return; } /* Detect racy use-after-free, or incorrect reallocation of this page by KFENCE. */ kcsan_begin_scoped_access((void *)ALIGN_DOWN((unsigned long)addr, PAGE_SIZE), PAGE_SIZE, KCSAN_ACCESS_SCOPED | KCSAN_ACCESS_WRITE | KCSAN_ACCESS_ASSERT, &assert_page_exclusive); if (CONFIG_KFENCE_STRESS_TEST_FAULTS) kfence_unprotect((unsigned long)addr); /* To check canary bytes. */ /* Restore page protection if there was an OOB access. */ if (meta->unprotected_page) { memzero_explicit((void *)ALIGN_DOWN(meta->unprotected_page, PAGE_SIZE), PAGE_SIZE); kfence_protect(meta->unprotected_page); meta->unprotected_page = 0; } /* Mark the object as freed. */ metadata_update_state(meta, KFENCE_OBJECT_FREED, NULL, 0); init = slab_want_init_on_free(meta->cache); raw_spin_unlock_irqrestore(&meta->lock, flags); alloc_covered_add(meta->alloc_stack_hash, -1); /* Check canary bytes for memory corruption. */ check_canary(meta); /* * Clear memory if init-on-free is set. While we protect the page, the * data is still there, and after a use-after-free is detected, we * unprotect the page, so the data is still accessible. */ if (!zombie && unlikely(init)) memzero_explicit(addr, meta->size); /* Protect to detect use-after-frees. */ kfence_protect((unsigned long)addr); kcsan_end_scoped_access(&assert_page_exclusive); if (!zombie) { /* Add it to the tail of the freelist for reuse. */ raw_spin_lock_irqsave(&kfence_freelist_lock, flags); KFENCE_WARN_ON(!list_empty(&meta->list)); list_add_tail(&meta->list, &kfence_freelist); raw_spin_unlock_irqrestore(&kfence_freelist_lock, flags); atomic_long_dec(&counters[KFENCE_COUNTER_ALLOCATED]); atomic_long_inc(&counters[KFENCE_COUNTER_FREES]); } else { /* See kfence_shutdown_cache(). */ atomic_long_inc(&counters[KFENCE_COUNTER_ZOMBIES]); } } static void rcu_guarded_free(struct rcu_head *h) { struct kfence_metadata *meta = container_of(h, struct kfence_metadata, rcu_head); kfence_guarded_free((void *)meta->addr, meta, false); } /* * Initialization of the KFENCE pool after its allocation. * Returns 0 on success; otherwise returns the address up to * which partial initialization succeeded. */ static unsigned long kfence_init_pool(void) { unsigned long addr, start_pfn; int i; if (!arch_kfence_init_pool()) return (unsigned long)__kfence_pool; addr = (unsigned long)__kfence_pool; start_pfn = PHYS_PFN(virt_to_phys(__kfence_pool)); /* * Set up object pages: they must have PGTY_slab set to avoid freeing * them as real pages. * * We also want to avoid inserting kfence_free() in the kfree() * fast-path in SLUB, and therefore need to ensure kfree() correctly * enters __slab_free() slow-path. */ for (i = 0; i < KFENCE_POOL_SIZE / PAGE_SIZE; i++) { struct slab *slab; if (!i || (i % 2)) continue; slab = page_slab(pfn_to_page(start_pfn + i)); __folio_set_slab(slab_folio(slab)); #ifdef CONFIG_MEMCG slab->obj_exts = (unsigned long)&kfence_metadata_init[i / 2 - 1].obj_exts | MEMCG_DATA_OBJEXTS; #endif } /* * Protect the first 2 pages. The first page is mostly unnecessary, and * merely serves as an extended guard page. However, adding one * additional page in the beginning gives us an even number of pages, * which simplifies the mapping of address to metadata index. */ for (i = 0; i < 2; i++) { if (unlikely(!kfence_protect(addr))) return addr; addr += PAGE_SIZE; } for (i = 0; i < CONFIG_KFENCE_NUM_OBJECTS; i++) { struct kfence_metadata *meta = &kfence_metadata_init[i]; /* Initialize metadata. */ INIT_LIST_HEAD(&meta->list); raw_spin_lock_init(&meta->lock); meta->state = KFENCE_OBJECT_UNUSED; meta->addr = addr; /* Initialize for validation in metadata_to_pageaddr(). */ list_add_tail(&meta->list, &kfence_freelist); /* Protect the right redzone. */ if (unlikely(!kfence_protect(addr + PAGE_SIZE))) goto reset_slab; addr += 2 * PAGE_SIZE; } /* * Make kfence_metadata visible only when initialization is successful. * Otherwise, if the initialization fails and kfence_metadata is freed, * it may cause UAF in kfence_shutdown_cache(). */ smp_store_release(&kfence_metadata, kfence_metadata_init); return 0; reset_slab: for (i = 0; i < KFENCE_POOL_SIZE / PAGE_SIZE; i++) { struct slab *slab; if (!i || (i % 2)) continue; slab = page_slab(pfn_to_page(start_pfn + i)); #ifdef CONFIG_MEMCG slab->obj_exts = 0; #endif __folio_clear_slab(slab_folio(slab)); } return addr; } static bool __init kfence_init_pool_early(void) { unsigned long addr; if (!__kfence_pool) return false; addr = kfence_init_pool(); if (!addr) { /* * The pool is live and will never be deallocated from this point on. * Ignore the pool object from the kmemleak phys object tree, as it would * otherwise overlap with allocations returned by kfence_alloc(), which * are registered with kmemleak through the slab post-alloc hook. */ kmemleak_ignore_phys(__pa(__kfence_pool)); return true; } /* * Only release unprotected pages, and do not try to go back and change * page attributes due to risk of failing to do so as well. If changing * page attributes for some pages fails, it is very likely that it also * fails for the first page, and therefore expect addr==__kfence_pool in * most failure cases. */ memblock_free_late(__pa(addr), KFENCE_POOL_SIZE - (addr - (unsigned long)__kfence_pool)); __kfence_pool = NULL; memblock_free_late(__pa(kfence_metadata_init), KFENCE_METADATA_SIZE); kfence_metadata_init = NULL; return false; } /* === DebugFS Interface ==================================================== */ static int stats_show(struct seq_file *seq, void *v) { int i; seq_printf(seq, "enabled: %i\n", READ_ONCE(kfence_enabled)); for (i = 0; i < KFENCE_COUNTER_COUNT; i++) seq_printf(seq, "%s: %ld\n", counter_names[i], atomic_long_read(&counters[i])); return 0; } DEFINE_SHOW_ATTRIBUTE(stats); /* * debugfs seq_file operations for /sys/kernel/debug/kfence/objects. * start_object() and next_object() return the object index + 1, because NULL is used * to stop iteration. */ static void *start_object(struct seq_file *seq, loff_t *pos) { if (*pos < CONFIG_KFENCE_NUM_OBJECTS) return (void *)((long)*pos + 1); return NULL; } static void stop_object(struct seq_file *seq, void *v) { } static void *next_object(struct seq_file *seq, void *v, loff_t *pos) { ++*pos; if (*pos < CONFIG_KFENCE_NUM_OBJECTS) return (void *)((long)*pos + 1); return NULL; } static int show_object(struct seq_file *seq, void *v) { struct kfence_metadata *meta = &kfence_metadata[(long)v - 1]; unsigned long flags; raw_spin_lock_irqsave(&meta->lock, flags); kfence_print_object(seq, meta); raw_spin_unlock_irqrestore(&meta->lock, flags); seq_puts(seq, "---------------------------------\n"); return 0; } static const struct seq_operations objects_sops = { .start = start_object, .next = next_object, .stop = stop_object, .show = show_object, }; DEFINE_SEQ_ATTRIBUTE(objects); static int kfence_debugfs_init(void) { struct dentry *kfence_dir; if (!READ_ONCE(kfence_enabled)) return 0; kfence_dir = debugfs_create_dir("kfence", NULL); debugfs_create_file("stats", 0444, kfence_dir, NULL, &stats_fops); debugfs_create_file("objects", 0400, kfence_dir, NULL, &objects_fops); return 0; } late_initcall(kfence_debugfs_init); /* === Panic Notifier ====================================================== */ static void kfence_check_all_canary(void) { int i; for (i = 0; i < CONFIG_KFENCE_NUM_OBJECTS; i++) { struct kfence_metadata *meta = &kfence_metadata[i]; if (kfence_obj_allocated(meta)) check_canary(meta); } } static int kfence_check_canary_callback(struct notifier_block *nb, unsigned long reason, void *arg) { kfence_check_all_canary(); return NOTIFY_OK; } static struct notifier_block kfence_check_canary_notifier = { .notifier_call = kfence_check_canary_callback, }; /* === Allocation Gate Timer ================================================ */ static struct delayed_work kfence_timer; #ifdef CONFIG_KFENCE_STATIC_KEYS /* Wait queue to wake up allocation-gate timer task. */ static DECLARE_WAIT_QUEUE_HEAD(allocation_wait); static void wake_up_kfence_timer(struct irq_work *work) { wake_up(&allocation_wait); } static DEFINE_IRQ_WORK(wake_up_kfence_timer_work, wake_up_kfence_timer); #endif /* * Set up delayed work, which will enable and disable the static key. We need to * use a work queue (rather than a simple timer), since enabling and disabling a * static key cannot be done from an interrupt. * * Note: Toggling a static branch currently causes IPIs, and here we'll end up * with a total of 2 IPIs to all CPUs. If this ends up a problem in future (with * more aggressive sampling intervals), we could get away with a variant that * avoids IPIs, at the cost of not immediately capturing allocations if the * instructions remain cached. */ static void toggle_allocation_gate(struct work_struct *work) { if (!READ_ONCE(kfence_enabled)) return; atomic_set(&kfence_allocation_gate, -kfence_burst); #ifdef CONFIG_KFENCE_STATIC_KEYS /* Enable static key, and await allocation to happen. */ static_branch_enable(&kfence_allocation_key); wait_event_idle(allocation_wait, atomic_read(&kfence_allocation_gate) > 0); /* Disable static key and reset timer. */ static_branch_disable(&kfence_allocation_key); #endif queue_delayed_work(system_unbound_wq, &kfence_timer, msecs_to_jiffies(kfence_sample_interval)); } /* === Public interface ===================================================== */ void __init kfence_alloc_pool_and_metadata(void) { if (!kfence_sample_interval) return; /* * If the pool has already been initialized by arch, there is no need to * re-allocate the memory pool. */ if (!__kfence_pool) __kfence_pool = memblock_alloc(KFENCE_POOL_SIZE, PAGE_SIZE); if (!__kfence_pool) { pr_err("failed to allocate pool\n"); return; } /* The memory allocated by memblock has been zeroed out. */ kfence_metadata_init = memblock_alloc(KFENCE_METADATA_SIZE, PAGE_SIZE); if (!kfence_metadata_init) { pr_err("failed to allocate metadata\n"); memblock_free(__kfence_pool, KFENCE_POOL_SIZE); __kfence_pool = NULL; } } static void kfence_init_enable(void) { if (!IS_ENABLED(CONFIG_KFENCE_STATIC_KEYS)) static_branch_enable(&kfence_allocation_key); if (kfence_deferrable) INIT_DEFERRABLE_WORK(&kfence_timer, toggle_allocation_gate); else INIT_DELAYED_WORK(&kfence_timer, toggle_allocation_gate); if (kfence_check_on_panic) atomic_notifier_chain_register(&panic_notifier_list, &kfence_check_canary_notifier); WRITE_ONCE(kfence_enabled, true); queue_delayed_work(system_unbound_wq, &kfence_timer, 0); pr_info("initialized - using %lu bytes for %d objects at 0x%p-0x%p\n", KFENCE_POOL_SIZE, CONFIG_KFENCE_NUM_OBJECTS, (void *)__kfence_pool, (void *)(__kfence_pool + KFENCE_POOL_SIZE)); } void __init kfence_init(void) { stack_hash_seed = get_random_u32(); /* Setting kfence_sample_interval to 0 on boot disables KFENCE. */ if (!kfence_sample_interval) return; if (!kfence_init_pool_early()) { pr_err("%s failed\n", __func__); return; } kfence_init_enable(); } static int kfence_init_late(void) { const unsigned long nr_pages_pool = KFENCE_POOL_SIZE / PAGE_SIZE; const unsigned long nr_pages_meta = KFENCE_METADATA_SIZE / PAGE_SIZE; unsigned long addr = (unsigned long)__kfence_pool; unsigned long free_size = KFENCE_POOL_SIZE; int err = -ENOMEM; #ifdef CONFIG_CONTIG_ALLOC struct page *pages; pages = alloc_contig_pages(nr_pages_pool, GFP_KERNEL, first_online_node, NULL); if (!pages) return -ENOMEM; __kfence_pool = page_to_virt(pages); pages = alloc_contig_pages(nr_pages_meta, GFP_KERNEL, first_online_node, NULL); if (pages) kfence_metadata_init = page_to_virt(pages); #else if (nr_pages_pool > MAX_ORDER_NR_PAGES || nr_pages_meta > MAX_ORDER_NR_PAGES) { pr_warn("KFENCE_NUM_OBJECTS too large for buddy allocator\n"); return -EINVAL; } __kfence_pool = alloc_pages_exact(KFENCE_POOL_SIZE, GFP_KERNEL); if (!__kfence_pool) return -ENOMEM; kfence_metadata_init = alloc_pages_exact(KFENCE_METADATA_SIZE, GFP_KERNEL); #endif if (!kfence_metadata_init) goto free_pool; memzero_explicit(kfence_metadata_init, KFENCE_METADATA_SIZE); addr = kfence_init_pool(); if (!addr) { kfence_init_enable(); kfence_debugfs_init(); return 0; } pr_err("%s failed\n", __func__); free_size = KFENCE_POOL_SIZE - (addr - (unsigned long)__kfence_pool); err = -EBUSY; #ifdef CONFIG_CONTIG_ALLOC free_contig_range(page_to_pfn(virt_to_page((void *)kfence_metadata_init)), nr_pages_meta); free_pool: free_contig_range(page_to_pfn(virt_to_page((void *)addr)), free_size / PAGE_SIZE); #else free_pages_exact((void *)kfence_metadata_init, KFENCE_METADATA_SIZE); free_pool: free_pages_exact((void *)addr, free_size); #endif kfence_metadata_init = NULL; __kfence_pool = NULL; return err; } static int kfence_enable_late(void) { if (!__kfence_pool) return kfence_init_late(); WRITE_ONCE(kfence_enabled, true); queue_delayed_work(system_unbound_wq, &kfence_timer, 0); pr_info("re-enabled\n"); return 0; } void kfence_shutdown_cache(struct kmem_cache *s) { unsigned long flags; struct kfence_metadata *meta; int i; /* Pairs with release in kfence_init_pool(). */ if (!smp_load_acquire(&kfence_metadata)) return; for (i = 0; i < CONFIG_KFENCE_NUM_OBJECTS; i++) { bool in_use; meta = &kfence_metadata[i]; /* * If we observe some inconsistent cache and state pair where we * should have returned false here, cache destruction is racing * with either kmem_cache_alloc() or kmem_cache_free(). Taking * the lock will not help, as different critical section * serialization will have the same outcome. */ if (READ_ONCE(meta->cache) != s || !kfence_obj_allocated(meta)) continue; raw_spin_lock_irqsave(&meta->lock, flags); in_use = meta->cache == s && kfence_obj_allocated(meta); raw_spin_unlock_irqrestore(&meta->lock, flags); if (in_use) { /* * This cache still has allocations, and we should not * release them back into the freelist so they can still * safely be used and retain the kernel's default * behaviour of keeping the allocations alive (leak the * cache); however, they effectively become "zombie * allocations" as the KFENCE objects are the only ones * still in use and the owning cache is being destroyed. * * We mark them freed, so that any subsequent use shows * more useful error messages that will include stack * traces of the user of the object, the original * allocation, and caller to shutdown_cache(). */ kfence_guarded_free((void *)meta->addr, meta, /*zombie=*/true); } } for (i = 0; i < CONFIG_KFENCE_NUM_OBJECTS; i++) { meta = &kfence_metadata[i]; /* See above. */ if (READ_ONCE(meta->cache) != s || READ_ONCE(meta->state) != KFENCE_OBJECT_FREED) continue; raw_spin_lock_irqsave(&meta->lock, flags); if (meta->cache == s && meta->state == KFENCE_OBJECT_FREED) meta->cache = NULL; raw_spin_unlock_irqrestore(&meta->lock, flags); } } void *__kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags) { unsigned long stack_entries[KFENCE_STACK_DEPTH]; size_t num_stack_entries; u32 alloc_stack_hash; int allocation_gate; /* * Perform size check before switching kfence_allocation_gate, so that * we don't disable KFENCE without making an allocation. */ if (size > PAGE_SIZE) { atomic_long_inc(&counters[KFENCE_COUNTER_SKIP_INCOMPAT]); return NULL; } /* * Skip allocations from non-default zones, including DMA. We cannot * guarantee that pages in the KFENCE pool will have the requested * properties (e.g. reside in DMAable memory). */ if ((flags & GFP_ZONEMASK) || ((flags & __GFP_THISNODE) && num_online_nodes() > 1) || (s->flags & (SLAB_CACHE_DMA | SLAB_CACHE_DMA32))) { atomic_long_inc(&counters[KFENCE_COUNTER_SKIP_INCOMPAT]); return NULL; } /* * Skip allocations for this slab, if KFENCE has been disabled for * this slab. */ if (s->flags & SLAB_SKIP_KFENCE) return NULL; allocation_gate = atomic_inc_return(&kfence_allocation_gate); if (allocation_gate > 1) return NULL; #ifdef CONFIG_KFENCE_STATIC_KEYS /* * waitqueue_active() is fully ordered after the update of * kfence_allocation_gate per atomic_inc_return(). */ if (allocation_gate == 1 && waitqueue_active(&allocation_wait)) { /* * Calling wake_up() here may deadlock when allocations happen * from within timer code. Use an irq_work to defer it. */ irq_work_queue(&wake_up_kfence_timer_work); } #endif if (!READ_ONCE(kfence_enabled)) return NULL; num_stack_entries = stack_trace_save(stack_entries, KFENCE_STACK_DEPTH, 0); /* * Do expensive check for coverage of allocation in slow-path after * allocation_gate has already become non-zero, even though it might * mean not making any allocation within a given sample interval. * * This ensures reasonable allocation coverage when the pool is almost * full, including avoiding long-lived allocations of the same source * filling up the pool (e.g. pagecache allocations). */ alloc_stack_hash = get_alloc_stack_hash(stack_entries, num_stack_entries); if (should_skip_covered() && alloc_covered_contains(alloc_stack_hash)) { atomic_long_inc(&counters[KFENCE_COUNTER_SKIP_COVERED]); return NULL; } return kfence_guarded_alloc(s, size, flags, stack_entries, num_stack_entries, alloc_stack_hash); } size_t kfence_ksize(const void *addr) { const struct kfence_metadata *meta = addr_to_metadata((unsigned long)addr); /* * Read locklessly -- if there is a race with __kfence_alloc(), this is * either a use-after-free or invalid access. */ return meta ? meta->size : 0; } void *kfence_object_start(const void *addr) { const struct kfence_metadata *meta = addr_to_metadata((unsigned long)addr); /* * Read locklessly -- if there is a race with __kfence_alloc(), this is * either a use-after-free or invalid access. */ return meta ? (void *)meta->addr : NULL; } void __kfence_free(void *addr) { struct kfence_metadata *meta = addr_to_metadata((unsigned long)addr); #ifdef CONFIG_MEMCG KFENCE_WARN_ON(meta->obj_exts.objcg); #endif /* * If the objects of the cache are SLAB_TYPESAFE_BY_RCU, defer freeing * the object, as the object page may be recycled for other-typed * objects once it has been freed. meta->cache may be NULL if the cache * was destroyed. * Save the stack trace here so that reports show where the user freed * the object. */ if (unlikely(meta->cache && (meta->cache->flags & SLAB_TYPESAFE_BY_RCU))) { unsigned long flags; raw_spin_lock_irqsave(&meta->lock, flags); metadata_update_state(meta, KFENCE_OBJECT_RCU_FREEING, NULL, 0); raw_spin_unlock_irqrestore(&meta->lock, flags); call_rcu(&meta->rcu_head, rcu_guarded_free); } else { kfence_guarded_free(addr, meta, false); } } bool kfence_handle_page_fault(unsigned long addr, bool is_write, struct pt_regs *regs) { const int page_index = (addr - (unsigned long)__kfence_pool) / PAGE_SIZE; struct kfence_metadata *to_report = NULL; enum kfence_error_type error_type; unsigned long flags; if (!is_kfence_address((void *)addr)) return false; if (!READ_ONCE(kfence_enabled)) /* If disabled at runtime ... */ return kfence_unprotect(addr); /* ... unprotect and proceed. */ atomic_long_inc(&counters[KFENCE_COUNTER_BUGS]); if (page_index % 2) { /* This is a redzone, report a buffer overflow. */ struct kfence_metadata *meta; int distance = 0; meta = addr_to_metadata(addr - PAGE_SIZE); if (meta && kfence_obj_allocated(meta)) { to_report = meta; /* Data race ok; distance calculation approximate. */ distance = addr - data_race(meta->addr + meta->size); } meta = addr_to_metadata(addr + PAGE_SIZE); if (meta && kfence_obj_allocated(meta)) { /* Data race ok; distance calculation approximate. */ if (!to_report || distance > data_race(meta->addr) - addr) to_report = meta; } if (!to_report) goto out; raw_spin_lock_irqsave(&to_report->lock, flags); to_report->unprotected_page = addr; error_type = KFENCE_ERROR_OOB; /* * If the object was freed before we took the look we can still * report this as an OOB -- the report will simply show the * stacktrace of the free as well. */ } else { to_report = addr_to_metadata(addr); if (!to_report) goto out; raw_spin_lock_irqsave(&to_report->lock, flags); error_type = KFENCE_ERROR_UAF; /* * We may race with __kfence_alloc(), and it is possible that a * freed object may be reallocated. We simply report this as a * use-after-free, with the stack trace showing the place where * the object was re-allocated. */ } out: if (to_report) { kfence_report_error(addr, is_write, regs, to_report, error_type); raw_spin_unlock_irqrestore(&to_report->lock, flags); } else { /* This may be a UAF or OOB access, but we can't be sure. */ kfence_report_error(addr, is_write, regs, NULL, KFENCE_ERROR_INVALID); } return kfence_unprotect(addr); /* Unprotect and let access proceed. */ } |
| 10 10 10 10 10 10 9 10 9 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 | /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _BLK_CGROUP_PRIVATE_H #define _BLK_CGROUP_PRIVATE_H /* * block cgroup private header * * Based on ideas and code from CFQ, CFS and BFQ: * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk> * * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it> * Paolo Valente <paolo.valente@unimore.it> * * Copyright (C) 2009 Vivek Goyal <vgoyal@redhat.com> * Nauman Rafique <nauman@google.com> */ #include <linux/blk-cgroup.h> #include <linux/cgroup.h> #include <linux/kthread.h> #include <linux/blk-mq.h> #include <linux/llist.h> #include "blk.h" struct blkcg_gq; struct blkg_policy_data; /* percpu_counter batch for blkg_[rw]stats, per-cpu drift doesn't matter */ #define BLKG_STAT_CPU_BATCH (INT_MAX / 2) #ifdef CONFIG_BLK_CGROUP enum blkg_iostat_type { BLKG_IOSTAT_READ, BLKG_IOSTAT_WRITE, BLKG_IOSTAT_DISCARD, BLKG_IOSTAT_NR, }; struct blkg_iostat { u64 bytes[BLKG_IOSTAT_NR]; u64 ios[BLKG_IOSTAT_NR]; }; struct blkg_iostat_set { struct u64_stats_sync sync; struct blkcg_gq *blkg; struct llist_node lnode; int lqueued; /* queued in llist */ struct blkg_iostat cur; struct blkg_iostat last; }; /* association between a blk cgroup and a request queue */ struct blkcg_gq { /* Pointer to the associated request_queue */ struct request_queue *q; struct list_head q_node; struct hlist_node blkcg_node; struct blkcg *blkcg; /* all non-root blkcg_gq's are guaranteed to have access to parent */ struct blkcg_gq *parent; /* reference count */ struct percpu_ref refcnt; /* is this blkg online? protected by both blkcg and q locks */ bool online; struct blkg_iostat_set __percpu *iostat_cpu; struct blkg_iostat_set iostat; struct blkg_policy_data *pd[BLKCG_MAX_POLS]; #ifdef CONFIG_BLK_CGROUP_PUNT_BIO spinlock_t async_bio_lock; struct bio_list async_bios; #endif union { struct work_struct async_bio_work; struct work_struct free_work; }; atomic_t use_delay; atomic64_t delay_nsec; atomic64_t delay_start; u64 last_delay; int last_use; struct rcu_head rcu_head; }; struct blkcg { struct cgroup_subsys_state css; spinlock_t lock; refcount_t online_pin; /* If there is block congestion on this cgroup. */ atomic_t congestion_count; struct radix_tree_root blkg_tree; struct blkcg_gq __rcu *blkg_hint; struct hlist_head blkg_list; struct blkcg_policy_data *cpd[BLKCG_MAX_POLS]; struct list_head all_blkcgs_node; /* * List of updated percpu blkg_iostat_set's since the last flush. */ struct llist_head __percpu *lhead; #ifdef CONFIG_BLK_CGROUP_FC_APPID char fc_app_id[FC_APPID_LEN]; #endif #ifdef CONFIG_CGROUP_WRITEBACK struct list_head cgwb_list; #endif }; static inline struct blkcg *css_to_blkcg(struct cgroup_subsys_state *css) { return css ? container_of(css, struct blkcg, css) : NULL; } /* * A blkcg_gq (blkg) is association between a block cgroup (blkcg) and a * request_queue (q). This is used by blkcg policies which need to track * information per blkcg - q pair. * * There can be multiple active blkcg policies and each blkg:policy pair is * represented by a blkg_policy_data which is allocated and freed by each * policy's pd_alloc/free_fn() methods. A policy can allocate private data * area by allocating larger data structure which embeds blkg_policy_data * at the beginning. */ struct blkg_policy_data { /* the blkg and policy id this per-policy data belongs to */ struct blkcg_gq *blkg; int plid; bool online; }; /* * Policies that need to keep per-blkcg data which is independent from any * request_queue associated to it should implement cpd_alloc/free_fn() * methods. A policy can allocate private data area by allocating larger * data structure which embeds blkcg_policy_data at the beginning. * cpd_init() is invoked to let each policy handle per-blkcg data. */ struct blkcg_policy_data { /* the blkcg and policy id this per-policy data belongs to */ struct blkcg *blkcg; int plid; }; typedef struct blkcg_policy_data *(blkcg_pol_alloc_cpd_fn)(gfp_t gfp); typedef void (blkcg_pol_init_cpd_fn)(struct blkcg_policy_data *cpd); typedef void (blkcg_pol_free_cpd_fn)(struct blkcg_policy_data *cpd); typedef void (blkcg_pol_bind_cpd_fn)(struct blkcg_policy_data *cpd); typedef struct blkg_policy_data *(blkcg_pol_alloc_pd_fn)(struct gendisk *disk, struct blkcg *blkcg, gfp_t gfp); typedef void (blkcg_pol_init_pd_fn)(struct blkg_policy_data *pd); typedef void (blkcg_pol_online_pd_fn)(struct blkg_policy_data *pd); typedef void (blkcg_pol_offline_pd_fn)(struct blkg_policy_data *pd); typedef void (blkcg_pol_free_pd_fn)(struct blkg_policy_data *pd); typedef void (blkcg_pol_reset_pd_stats_fn)(struct blkg_policy_data *pd); typedef void (blkcg_pol_stat_pd_fn)(struct blkg_policy_data *pd, struct seq_file *s); struct blkcg_policy { int plid; /* cgroup files for the policy */ struct cftype *dfl_cftypes; struct cftype *legacy_cftypes; /* operations */ blkcg_pol_alloc_cpd_fn *cpd_alloc_fn; blkcg_pol_free_cpd_fn *cpd_free_fn; blkcg_pol_alloc_pd_fn *pd_alloc_fn; blkcg_pol_init_pd_fn *pd_init_fn; blkcg_pol_online_pd_fn *pd_online_fn; blkcg_pol_offline_pd_fn *pd_offline_fn; blkcg_pol_free_pd_fn *pd_free_fn; blkcg_pol_reset_pd_stats_fn *pd_reset_stats_fn; blkcg_pol_stat_pd_fn *pd_stat_fn; }; extern struct blkcg blkcg_root; extern bool blkcg_debug_stats; void blkg_init_queue(struct request_queue *q); int blkcg_init_disk(struct gendisk *disk); void blkcg_exit_disk(struct gendisk *disk); /* Blkio controller policy registration */ int blkcg_policy_register(struct blkcg_policy *pol); void blkcg_policy_unregister(struct blkcg_policy *pol); int blkcg_activate_policy(struct gendisk *disk, const struct blkcg_policy *pol); void blkcg_deactivate_policy(struct gendisk *disk, const struct blkcg_policy *pol); const char *blkg_dev_name(struct blkcg_gq *blkg); void blkcg_print_blkgs(struct seq_file *sf, struct blkcg *blkcg, u64 (*prfill)(struct seq_file *, struct blkg_policy_data *, int), const struct blkcg_policy *pol, int data, bool show_total); u64 __blkg_prfill_u64(struct seq_file *sf, struct blkg_policy_data *pd, u64 v); struct blkg_conf_ctx { char *input; char *body; struct block_device *bdev; struct blkcg_gq *blkg; }; void blkg_conf_init(struct blkg_conf_ctx *ctx, char *input); int blkg_conf_open_bdev(struct blkg_conf_ctx *ctx); unsigned long blkg_conf_open_bdev_frozen(struct blkg_conf_ctx *ctx); int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol, struct blkg_conf_ctx *ctx); void blkg_conf_exit(struct blkg_conf_ctx *ctx); void blkg_conf_exit_frozen(struct blkg_conf_ctx *ctx, unsigned long memflags); /** * bio_issue_as_root_blkg - see if this bio needs to be issued as root blkg * @bio: the target &bio * * Return: true if this bio needs to be submitted with the root blkg context. * * In order to avoid priority inversions we sometimes need to issue a bio as if * it were attached to the root blkg, and then backcharge to the actual owning * blkg. The idea is we do bio_blkcg_css() to look up the actual context for * the bio and attach the appropriate blkg to the bio. Then we call this helper * and if it is true run with the root blkg for that queue and then do any * backcharging to the originating cgroup once the io is complete. */ static inline bool bio_issue_as_root_blkg(struct bio *bio) { return (bio->bi_opf & (REQ_META | REQ_SWAP)) != 0; } /** * blkg_lookup - lookup blkg for the specified blkcg - q pair * @blkcg: blkcg of interest * @q: request_queue of interest * * Lookup blkg for the @blkcg - @q pair. * * Must be called in a RCU critical section. */ static inline struct blkcg_gq *blkg_lookup(struct blkcg *blkcg, struct request_queue *q) { struct blkcg_gq *blkg; if (blkcg == &blkcg_root) return q->root_blkg; blkg = rcu_dereference_check(blkcg->blkg_hint, lockdep_is_held(&q->queue_lock)); if (blkg && blkg->q == q) return blkg; blkg = radix_tree_lookup(&blkcg->blkg_tree, q->id); if (blkg && blkg->q != q) blkg = NULL; return blkg; } /** * blkg_to_pd - get policy private data * @blkg: blkg of interest * @pol: policy of interest * * Return pointer to private data associated with the @blkg-@pol pair. */ static inline struct blkg_policy_data *blkg_to_pd(struct blkcg_gq *blkg, struct blkcg_policy *pol) { return blkg ? blkg->pd[pol->plid] : NULL; } static inline struct blkcg_policy_data *blkcg_to_cpd(struct blkcg *blkcg, struct blkcg_policy *pol) { return blkcg ? blkcg->cpd[pol->plid] : NULL; } /** * pd_to_blkg - get blkg associated with policy private data * @pd: policy private data of interest * * @pd is policy private data. Determine the blkg it's associated with. */ static inline struct blkcg_gq *pd_to_blkg(struct blkg_policy_data *pd) { return pd ? pd->blkg : NULL; } static inline struct blkcg *cpd_to_blkcg(struct blkcg_policy_data *cpd) { return cpd ? cpd->blkcg : NULL; } /** * blkg_get - get a blkg reference * @blkg: blkg to get * * The caller should be holding an existing reference. */ static inline void blkg_get(struct blkcg_gq *blkg) { percpu_ref_get(&blkg->refcnt); } /** * blkg_tryget - try and get a blkg reference * @blkg: blkg to get * * This is for use when doing an RCU lookup of the blkg. We may be in the midst * of freeing this blkg, so we can only use it if the refcnt is not zero. */ static inline bool blkg_tryget(struct blkcg_gq *blkg) { return blkg && percpu_ref_tryget(&blkg->refcnt); } /** * blkg_put - put a blkg reference * @blkg: blkg to put */ static inline void blkg_put(struct blkcg_gq *blkg) { percpu_ref_put(&blkg->refcnt); } /** * blkg_for_each_descendant_pre - pre-order walk of a blkg's descendants * @d_blkg: loop cursor pointing to the current descendant * @pos_css: used for iteration * @p_blkg: target blkg to walk descendants of * * Walk @c_blkg through the descendants of @p_blkg. Must be used with RCU * read locked. If called under either blkcg or queue lock, the iteration * is guaranteed to include all and only online blkgs. The caller may * update @pos_css by calling css_rightmost_descendant() to skip subtree. * @p_blkg is included in the iteration and the first node to be visited. */ #define blkg_for_each_descendant_pre(d_blkg, pos_css, p_blkg) \ css_for_each_descendant_pre((pos_css), &(p_blkg)->blkcg->css) \ if (((d_blkg) = blkg_lookup(css_to_blkcg(pos_css), \ (p_blkg)->q))) /** * blkg_for_each_descendant_post - post-order walk of a blkg's descendants * @d_blkg: loop cursor pointing to the current descendant * @pos_css: used for iteration * @p_blkg: target blkg to walk descendants of * * Similar to blkg_for_each_descendant_pre() but performs post-order * traversal instead. Synchronization rules are the same. @p_blkg is * included in the iteration and the last node to be visited. */ #define blkg_for_each_descendant_post(d_blkg, pos_css, p_blkg) \ css_for_each_descendant_post((pos_css), &(p_blkg)->blkcg->css) \ if (((d_blkg) = blkg_lookup(css_to_blkcg(pos_css), \ (p_blkg)->q))) static inline void blkcg_use_delay(struct blkcg_gq *blkg) { if (WARN_ON_ONCE(atomic_read(&blkg->use_delay) < 0)) return; if (atomic_add_return(1, &blkg->use_delay) == 1) atomic_inc(&blkg->blkcg->congestion_count); } static inline int blkcg_unuse_delay(struct blkcg_gq *blkg) { int old = atomic_read(&blkg->use_delay); if (WARN_ON_ONCE(old < 0)) return 0; if (old == 0) return 0; /* * We do this song and dance because we can race with somebody else * adding or removing delay. If we just did an atomic_dec we'd end up * negative and we'd already be in trouble. We need to subtract 1 and * then check to see if we were the last delay so we can drop the * congestion count on the cgroup. */ while (old && !atomic_try_cmpxchg(&blkg->use_delay, &old, old - 1)) ; if (old == 0) return 0; if (old == 1) atomic_dec(&blkg->blkcg->congestion_count); return 1; } /** * blkcg_set_delay - Enable allocator delay mechanism with the specified delay amount * @blkg: target blkg * @delay: delay duration in nsecs * * When enabled with this function, the delay is not decayed and must be * explicitly cleared with blkcg_clear_delay(). Must not be mixed with * blkcg_[un]use_delay() and blkcg_add_delay() usages. */ static inline void blkcg_set_delay(struct blkcg_gq *blkg, u64 delay) { int old = atomic_read(&blkg->use_delay); /* We only want 1 person setting the congestion count for this blkg. */ if (!old && atomic_try_cmpxchg(&blkg->use_delay, &old, -1)) atomic_inc(&blkg->blkcg->congestion_count); atomic64_set(&blkg->delay_nsec, delay); } /** * blkcg_clear_delay - Disable allocator delay mechanism * @blkg: target blkg * * Disable use_delay mechanism. See blkcg_set_delay(). */ static inline void blkcg_clear_delay(struct blkcg_gq *blkg) { int old = atomic_read(&blkg->use_delay); /* We only want 1 person clearing the congestion count for this blkg. */ if (old && atomic_try_cmpxchg(&blkg->use_delay, &old, 0)) atomic_dec(&blkg->blkcg->congestion_count); } /** * blk_cgroup_mergeable - Determine whether to allow or disallow merges * @rq: request to merge into * @bio: bio to merge * * @bio and @rq should belong to the same cgroup and their issue_as_root should * match. The latter is necessary as we don't want to throttle e.g. a metadata * update because it happens to be next to a regular IO. */ static inline bool blk_cgroup_mergeable(struct request *rq, struct bio *bio) { return rq->bio->bi_blkg == bio->bi_blkg && bio_issue_as_root_blkg(rq->bio) == bio_issue_as_root_blkg(bio); } static inline bool blkcg_policy_enabled(struct request_queue *q, const struct blkcg_policy *pol) { return pol && test_bit(pol->plid, q->blkcg_pols); } void blk_cgroup_bio_start(struct bio *bio); void blkcg_add_delay(struct blkcg_gq *blkg, u64 now, u64 delta); #else /* CONFIG_BLK_CGROUP */ struct blkg_policy_data { }; struct blkcg_policy_data { }; struct blkcg_policy { }; struct blkcg { }; static inline struct blkcg_gq *blkg_lookup(struct blkcg *blkcg, void *key) { return NULL; } static inline void blkg_init_queue(struct request_queue *q) { } static inline int blkcg_init_disk(struct gendisk *disk) { return 0; } static inline void blkcg_exit_disk(struct gendisk *disk) { } static inline int blkcg_policy_register(struct blkcg_policy *pol) { return 0; } static inline void blkcg_policy_unregister(struct blkcg_policy *pol) { } static inline int blkcg_activate_policy(struct gendisk *disk, const struct blkcg_policy *pol) { return 0; } static inline void blkcg_deactivate_policy(struct gendisk *disk, const struct blkcg_policy *pol) { } static inline struct blkg_policy_data *blkg_to_pd(struct blkcg_gq *blkg, struct blkcg_policy *pol) { return NULL; } static inline struct blkcg_gq *pd_to_blkg(struct blkg_policy_data *pd) { return NULL; } static inline void blkg_get(struct blkcg_gq *blkg) { } static inline void blkg_put(struct blkcg_gq *blkg) { } static inline void blk_cgroup_bio_start(struct bio *bio) { } static inline bool blk_cgroup_mergeable(struct request *rq, struct bio *bio) { return true; } #define blk_queue_for_each_rl(rl, q) \ for ((rl) = &(q)->root_rl; (rl); (rl) = NULL) #endif /* CONFIG_BLK_CGROUP */ #endif /* _BLK_CGROUP_PRIVATE_H */ |
| 6 6 6 6 5 2 3 6 1 5 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 | // SPDX-License-Identifier: GPL-2.0-or-later /* * net/sched/sch_prio.c Simple 3-band priority "scheduler". * * Authors: Alexey Kuznetsov, <kuznet@ms2.inr.ac.ru> * Fixes: 19990609: J Hadi Salim <hadi@nortelnetworks.com>: * Init -- EINVAL when opt undefined */ #include <linux/module.h> #include <linux/slab.h> #include <linux/types.h> #include <linux/kernel.h> #include <linux/string.h> #include <linux/errno.h> #include <linux/skbuff.h> #include <net/netlink.h> #include <net/pkt_sched.h> #include <net/pkt_cls.h> struct prio_sched_data { int bands; struct tcf_proto __rcu *filter_list; struct tcf_block *block; u8 prio2band[TC_PRIO_MAX+1]; struct Qdisc *queues[TCQ_PRIO_BANDS]; }; static struct Qdisc * prio_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr) { struct prio_sched_data *q = qdisc_priv(sch); u32 band = skb->priority; struct tcf_result res; struct tcf_proto *fl; int err; *qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS; if (TC_H_MAJ(skb->priority) != sch->handle) { fl = rcu_dereference_bh(q->filter_list); err = tcf_classify(skb, NULL, fl, &res, false); #ifdef CONFIG_NET_CLS_ACT switch (err) { case TC_ACT_STOLEN: case TC_ACT_QUEUED: case TC_ACT_TRAP: *qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN; fallthrough; case TC_ACT_SHOT: return NULL; } #endif if (!fl || err < 0) { if (TC_H_MAJ(band)) band = 0; return q->queues[q->prio2band[band & TC_PRIO_MAX]]; } band = res.classid; } band = TC_H_MIN(band) - 1; if (band >= q->bands) return q->queues[q->prio2band[0]]; return q->queues[band]; } static int prio_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free) { unsigned int len = qdisc_pkt_len(skb); struct Qdisc *qdisc; int ret; qdisc = prio_classify(skb, sch, &ret); #ifdef CONFIG_NET_CLS_ACT if (qdisc == NULL) { if (ret & __NET_XMIT_BYPASS) qdisc_qstats_drop(sch); __qdisc_drop(skb, to_free); return ret; } #endif ret = qdisc_enqueue(skb, qdisc, to_free); if (ret == NET_XMIT_SUCCESS) { sch->qstats.backlog += len; sch->q.qlen++; return NET_XMIT_SUCCESS; } if (net_xmit_drop_count(ret)) qdisc_qstats_drop(sch); return ret; } static struct sk_buff *prio_peek(struct Qdisc *sch) { struct prio_sched_data *q = qdisc_priv(sch); int prio; for (prio = 0; prio < q->bands; prio++) { struct Qdisc *qdisc = q->queues[prio]; struct sk_buff *skb = qdisc->ops->peek(qdisc); if (skb) return skb; } return NULL; } static struct sk_buff *prio_dequeue(struct Qdisc *sch) { struct prio_sched_data *q = qdisc_priv(sch); int prio; for (prio = 0; prio < q->bands; prio++) { struct Qdisc *qdisc = q->queues[prio]; struct sk_buff *skb = qdisc_dequeue_peeked(qdisc); if (skb) { qdisc_bstats_update(sch, skb); qdisc_qstats_backlog_dec(sch, skb); sch->q.qlen--; return skb; } } return NULL; } static void prio_reset(struct Qdisc *sch) { int prio; struct prio_sched_data *q = qdisc_priv(sch); for (prio = 0; prio < q->bands; prio++) qdisc_reset(q->queues[prio]); } static int prio_offload(struct Qdisc *sch, struct tc_prio_qopt *qopt) { struct net_device *dev = qdisc_dev(sch); struct tc_prio_qopt_offload opt = { .handle = sch->handle, .parent = sch->parent, }; if (!tc_can_offload(dev) || !dev->netdev_ops->ndo_setup_tc) return -EOPNOTSUPP; if (qopt) { opt.command = TC_PRIO_REPLACE; opt.replace_params.bands = qopt->bands; memcpy(&opt.replace_params.priomap, qopt->priomap, TC_PRIO_MAX + 1); opt.replace_params.qstats = &sch->qstats; } else { opt.command = TC_PRIO_DESTROY; } return dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_QDISC_PRIO, &opt); } static void prio_destroy(struct Qdisc *sch) { int prio; struct prio_sched_data *q = qdisc_priv(sch); tcf_block_put(q->block); prio_offload(sch, NULL); for (prio = 0; prio < q->bands; prio++) qdisc_put(q->queues[prio]); } static int prio_tune(struct Qdisc *sch, struct nlattr *opt, struct netlink_ext_ack *extack) { struct prio_sched_data *q = qdisc_priv(sch); struct Qdisc *queues[TCQ_PRIO_BANDS]; int oldbands = q->bands, i; struct tc_prio_qopt *qopt; if (nla_len(opt) < sizeof(*qopt)) return -EINVAL; qopt = nla_data(opt); if (qopt->bands > TCQ_PRIO_BANDS || qopt->bands < TCQ_MIN_PRIO_BANDS) return -EINVAL; for (i = 0; i <= TC_PRIO_MAX; i++) { if (qopt->priomap[i] >= qopt->bands) return -EINVAL; } /* Before commit, make sure we can allocate all new qdiscs */ for (i = oldbands; i < qopt->bands; i++) { queues[i] = qdisc_create_dflt(sch->dev_queue, &pfifo_qdisc_ops, TC_H_MAKE(sch->handle, i + 1), extack); if (!queues[i]) { while (i > oldbands) qdisc_put(queues[--i]); return -ENOMEM; } } prio_offload(sch, qopt); sch_tree_lock(sch); q->bands = qopt->bands; memcpy(q->prio2band, qopt->priomap, TC_PRIO_MAX+1); for (i = q->bands; i < oldbands; i++) qdisc_purge_queue(q->queues[i]); for (i = oldbands; i < q->bands; i++) { q->queues[i] = queues[i]; if (q->queues[i] != &noop_qdisc) qdisc_hash_add(q->queues[i], true); } sch_tree_unlock(sch); for (i = q->bands; i < oldbands; i++) qdisc_put(q->queues[i]); return 0; } static int prio_init(struct Qdisc *sch, struct nlattr *opt, struct netlink_ext_ack *extack) { struct prio_sched_data *q = qdisc_priv(sch); int err; if (!opt) return -EINVAL; err = tcf_block_get(&q->block, &q->filter_list, sch, extack); if (err) return err; return prio_tune(sch, opt, extack); } static int prio_dump_offload(struct Qdisc *sch) { struct tc_prio_qopt_offload hw_stats = { .command = TC_PRIO_STATS, .handle = sch->handle, .parent = sch->parent, { .stats = { .bstats = &sch->bstats, .qstats = &sch->qstats, }, }, }; return qdisc_offload_dump_helper(sch, TC_SETUP_QDISC_PRIO, &hw_stats); } static int prio_dump(struct Qdisc *sch, struct sk_buff *skb) { struct prio_sched_data *q = qdisc_priv(sch); unsigned char *b = skb_tail_pointer(skb); struct tc_prio_qopt opt; int err; opt.bands = q->bands; memcpy(&opt.priomap, q->prio2band, TC_PRIO_MAX + 1); err = prio_dump_offload(sch); if (err) goto nla_put_failure; if (nla_put(skb, TCA_OPTIONS, sizeof(opt), &opt)) goto nla_put_failure; return skb->len; nla_put_failure: nlmsg_trim(skb, b); return -1; } static int prio_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new, struct Qdisc **old, struct netlink_ext_ack *extack) { struct prio_sched_data *q = qdisc_priv(sch); struct tc_prio_qopt_offload graft_offload; unsigned long band = arg - 1; if (!new) { new = qdisc_create_dflt(sch->dev_queue, &pfifo_qdisc_ops, TC_H_MAKE(sch->handle, arg), extack); if (!new) new = &noop_qdisc; else qdisc_hash_add(new, true); } *old = qdisc_replace(sch, new, &q->queues[band]); graft_offload.handle = sch->handle; graft_offload.parent = sch->parent; graft_offload.graft_params.band = band; graft_offload.graft_params.child_handle = new->handle; graft_offload.command = TC_PRIO_GRAFT; qdisc_offload_graft_helper(qdisc_dev(sch), sch, new, *old, TC_SETUP_QDISC_PRIO, &graft_offload, extack); return 0; } static struct Qdisc * prio_leaf(struct Qdisc *sch, unsigned long arg) { struct prio_sched_data *q = qdisc_priv(sch); unsigned long band = arg - 1; return q->queues[band]; } static unsigned long prio_find(struct Qdisc *sch, u32 classid) { struct prio_sched_data *q = qdisc_priv(sch); unsigned long band = TC_H_MIN(classid); if (band - 1 >= q->bands) return 0; return band; } static unsigned long prio_bind(struct Qdisc *sch, unsigned long parent, u32 classid) { return prio_find(sch, classid); } static void prio_unbind(struct Qdisc *q, unsigned long cl) { } static int prio_dump_class(struct Qdisc *sch, unsigned long cl, struct sk_buff *skb, struct tcmsg *tcm) { struct prio_sched_data *q = qdisc_priv(sch); tcm->tcm_handle |= TC_H_MIN(cl); tcm->tcm_info = q->queues[cl-1]->handle; return 0; } static int prio_dump_class_stats(struct Qdisc *sch, unsigned long cl, struct gnet_dump *d) { struct prio_sched_data *q = qdisc_priv(sch); struct Qdisc *cl_q; cl_q = q->queues[cl - 1]; if (gnet_stats_copy_basic(d, cl_q->cpu_bstats, &cl_q->bstats, true) < 0 || qdisc_qstats_copy(d, cl_q) < 0) return -1; return 0; } static void prio_walk(struct Qdisc *sch, struct qdisc_walker *arg) { struct prio_sched_data *q = qdisc_priv(sch); int prio; if (arg->stop) return; for (prio = 0; prio < q->bands; prio++) { if (!tc_qdisc_stats_dump(sch, prio + 1, arg)) break; } } static struct tcf_block *prio_tcf_block(struct Qdisc *sch, unsigned long cl, struct netlink_ext_ack *extack) { struct prio_sched_data *q = qdisc_priv(sch); if (cl) return NULL; return q->block; } static const struct Qdisc_class_ops prio_class_ops = { .graft = prio_graft, .leaf = prio_leaf, .find = prio_find, .walk = prio_walk, .tcf_block = prio_tcf_block, .bind_tcf = prio_bind, .unbind_tcf = prio_unbind, .dump = prio_dump_class, .dump_stats = prio_dump_class_stats, }; static struct Qdisc_ops prio_qdisc_ops __read_mostly = { .next = NULL, .cl_ops = &prio_class_ops, .id = "prio", .priv_size = sizeof(struct prio_sched_data), .enqueue = prio_enqueue, .dequeue = prio_dequeue, .peek = prio_peek, .init = prio_init, .reset = prio_reset, .destroy = prio_destroy, .change = prio_tune, .dump = prio_dump, .owner = THIS_MODULE, }; MODULE_ALIAS_NET_SCH("prio"); static int __init prio_module_init(void) { return register_qdisc(&prio_qdisc_ops); } static void __exit prio_module_exit(void) { unregister_qdisc(&prio_qdisc_ops); } module_init(prio_module_init) module_exit(prio_module_exit) MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("Simple 3-band priority qdisc"); |
| 61 4 74 28 18 156 18 34 8 14 2 52 7 49 1 5 5 3 3 9 61 9 61 7 34 61 61 52 61 47 7 57 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 2447 2448 2449 2450 2451 2452 2453 2454 2455 2456 2457 2458 2459 2460 2461 2462 2463 2464 2465 2466 2467 2468 2469 2470 2471 2472 2473 2474 2475 2476 2477 2478 2479 2480 2481 2482 2483 2484 2485 2486 2487 2488 2489 2490 2491 2492 2493 2494 2495 2496 2497 2498 2499 2500 2501 2502 2503 2504 2505 2506 2507 2508 2509 2510 2511 2512 2513 2514 2515 2516 2517 2518 2519 2520 2521 2522 2523 2524 2525 2526 2527 2528 2529 2530 2531 2532 2533 2534 2535 2536 2537 2538 2539 2540 2541 2542 2543 2544 2545 2546 2547 2548 2549 2550 2551 2552 2553 2554 2555 2556 2557 2558 2559 2560 2561 2562 2563 2564 2565 2566 2567 2568 2569 2570 2571 2572 2573 2574 2575 2576 2577 2578 2579 2580 2581 2582 2583 2584 2585 2586 2587 2588 2589 2590 2591 2592 2593 2594 2595 2596 2597 2598 2599 2600 2601 2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658 2659 2660 2661 2662 2663 2664 2665 2666 2667 2668 2669 2670 2671 2672 2673 2674 2675 2676 2677 2678 2679 2680 2681 2682 2683 2684 2685 2686 2687 2688 2689 2690 2691 2692 2693 2694 2695 2696 2697 2698 2699 2700 2701 2702 2703 2704 2705 2706 2707 2708 2709 2710 2711 2712 2713 2714 2715 2716 2717 2718 2719 2720 2721 2722 2723 2724 2725 2726 2727 2728 2729 2730 2731 2732 2733 2734 2735 2736 2737 2738 2739 2740 2741 2742 2743 2744 2745 2746 2747 2748 2749 2750 2751 2752 2753 2754 2755 2756 2757 2758 2759 2760 2761 2762 2763 2764 2765 2766 2767 2768 2769 2770 2771 2772 2773 2774 2775 2776 2777 2778 2779 2780 2781 2782 2783 2784 2785 2786 2787 2788 2789 2790 2791 2792 2793 2794 2795 2796 2797 2798 2799 2800 2801 2802 2803 2804 2805 2806 2807 2808 2809 2810 2811 2812 2813 2814 2815 2816 2817 2818 2819 2820 2821 2822 2823 2824 2825 2826 2827 2828 2829 2830 2831 2832 2833 2834 2835 2836 2837 2838 2839 2840 2841 2842 2843 2844 2845 2846 2847 2848 2849 2850 2851 2852 2853 2854 2855 2856 2857 2858 2859 2860 2861 2862 2863 2864 2865 2866 2867 2868 2869 2870 2871 2872 2873 2874 2875 2876 2877 2878 2879 2880 2881 2882 2883 2884 2885 2886 2887 2888 2889 2890 2891 2892 2893 2894 2895 2896 2897 2898 2899 2900 2901 2902 2903 2904 2905 2906 2907 2908 2909 2910 2911 2912 2913 2914 2915 2916 2917 2918 2919 2920 2921 2922 2923 2924 2925 2926 2927 2928 2929 2930 2931 2932 2933 2934 2935 2936 2937 2938 2939 2940 2941 2942 2943 2944 2945 2946 2947 2948 2949 2950 2951 2952 2953 2954 2955 2956 2957 2958 2959 2960 2961 2962 2963 2964 2965 2966 2967 2968 2969 2970 2971 2972 2973 2974 2975 2976 2977 2978 2979 2980 2981 2982 2983 2984 2985 2986 2987 2988 2989 2990 2991 2992 2993 2994 2995 2996 2997 2998 2999 3000 3001 3002 3003 3004 3005 3006 3007 3008 3009 3010 3011 3012 3013 3014 3015 3016 3017 3018 3019 3020 3021 3022 3023 3024 3025 3026 3027 3028 3029 3030 3031 3032 3033 3034 3035 3036 3037 3038 3039 3040 3041 3042 3043 3044 3045 3046 3047 3048 3049 3050 3051 3052 3053 3054 3055 3056 3057 3058 3059 3060 3061 3062 3063 3064 3065 3066 3067 3068 3069 3070 3071 3072 3073 3074 3075 3076 3077 3078 3079 3080 3081 3082 3083 3084 3085 3086 3087 3088 3089 3090 3091 3092 3093 3094 3095 3096 3097 3098 3099 3100 3101 3102 3103 3104 3105 3106 3107 3108 3109 3110 3111 3112 3113 3114 3115 3116 3117 3118 3119 3120 3121 3122 3123 3124 3125 3126 3127 3128 3129 3130 3131 3132 3133 3134 3135 3136 3137 3138 3139 3140 3141 3142 3143 3144 3145 3146 3147 3148 3149 3150 3151 3152 3153 3154 3155 3156 3157 3158 3159 3160 3161 3162 3163 3164 3165 3166 3167 3168 3169 3170 3171 3172 3173 3174 3175 3176 3177 3178 3179 3180 3181 3182 3183 3184 3185 3186 3187 3188 3189 3190 3191 3192 3193 3194 3195 3196 3197 3198 3199 3200 3201 3202 3203 3204 3205 3206 3207 3208 3209 3210 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220 3221 3222 3223 3224 3225 3226 3227 3228 3229 3230 3231 3232 3233 3234 3235 3236 3237 3238 3239 3240 3241 3242 3243 3244 3245 3246 3247 3248 3249 3250 3251 3252 3253 3254 3255 3256 3257 3258 3259 3260 3261 3262 3263 3264 3265 3266 3267 3268 3269 3270 3271 3272 3273 3274 3275 3276 3277 3278 3279 3280 3281 3282 3283 3284 3285 3286 3287 3288 3289 3290 3291 3292 3293 3294 3295 3296 3297 3298 3299 3300 3301 3302 3303 3304 3305 3306 3307 3308 3309 3310 3311 3312 3313 3314 3315 3316 3317 3318 3319 3320 3321 3322 3323 3324 3325 3326 3327 3328 3329 3330 3331 3332 3333 3334 3335 3336 3337 3338 3339 3340 3341 3342 3343 3344 3345 3346 3347 3348 3349 3350 3351 3352 3353 3354 3355 3356 3357 3358 3359 3360 3361 3362 3363 3364 3365 3366 3367 3368 3369 3370 3371 3372 3373 3374 3375 3376 3377 3378 3379 3380 3381 3382 3383 3384 3385 3386 3387 3388 3389 3390 3391 3392 3393 3394 3395 3396 3397 3398 3399 3400 3401 3402 3403 3404 3405 3406 3407 3408 3409 3410 3411 3412 3413 3414 3415 3416 3417 3418 3419 3420 3421 3422 3423 3424 3425 3426 3427 3428 3429 3430 3431 3432 3433 3434 3435 3436 3437 3438 3439 3440 3441 3442 3443 3444 3445 3446 3447 3448 3449 3450 3451 3452 3453 3454 3455 3456 3457 3458 3459 3460 3461 3462 3463 3464 3465 3466 3467 3468 3469 3470 3471 3472 3473 3474 3475 3476 3477 3478 3479 3480 3481 3482 3483 3484 3485 3486 3487 3488 3489 3490 3491 3492 3493 3494 3495 3496 3497 3498 3499 3500 3501 3502 3503 3504 3505 3506 3507 3508 3509 3510 3511 3512 3513 3514 3515 3516 3517 3518 3519 3520 3521 3522 3523 3524 3525 3526 3527 3528 3529 3530 3531 3532 3533 3534 3535 3536 3537 3538 3539 3540 3541 3542 3543 3544 3545 3546 3547 3548 3549 3550 3551 3552 3553 3554 3555 3556 3557 3558 3559 3560 3561 3562 3563 3564 3565 3566 3567 3568 3569 3570 3571 3572 3573 3574 3575 3576 3577 3578 3579 3580 3581 3582 3583 3584 3585 3586 3587 3588 3589 3590 3591 3592 3593 3594 3595 3596 3597 3598 3599 3600 3601 3602 3603 3604 3605 3606 3607 3608 3609 3610 3611 3612 3613 3614 3615 3616 3617 3618 3619 3620 3621 3622 3623 3624 3625 3626 3627 3628 3629 3630 3631 3632 3633 3634 3635 3636 3637 3638 3639 3640 3641 3642 3643 3644 3645 3646 3647 3648 3649 3650 3651 3652 3653 3654 3655 3656 3657 3658 3659 3660 3661 3662 3663 3664 3665 3666 3667 3668 3669 3670 3671 3672 3673 3674 3675 3676 3677 3678 3679 3680 3681 3682 3683 3684 3685 3686 3687 3688 3689 3690 3691 3692 3693 3694 3695 3696 3697 3698 3699 3700 3701 3702 3703 3704 3705 3706 3707 3708 3709 3710 3711 3712 3713 3714 3715 3716 3717 3718 3719 3720 3721 3722 3723 3724 3725 3726 3727 3728 3729 3730 3731 3732 3733 3734 3735 3736 3737 3738 3739 3740 3741 3742 3743 3744 3745 3746 3747 3748 3749 3750 3751 3752 3753 3754 3755 3756 3757 3758 3759 3760 3761 3762 3763 3764 3765 3766 3767 3768 3769 3770 3771 3772 3773 3774 3775 3776 3777 3778 3779 3780 3781 3782 3783 3784 3785 3786 3787 3788 3789 3790 3791 3792 3793 3794 3795 3796 3797 3798 3799 3800 3801 3802 3803 3804 3805 3806 3807 3808 3809 3810 3811 3812 3813 3814 3815 3816 3817 3818 3819 3820 3821 3822 3823 3824 3825 3826 3827 3828 3829 3830 3831 3832 3833 3834 3835 3836 3837 3838 3839 3840 3841 3842 3843 3844 3845 3846 3847 3848 3849 3850 3851 3852 3853 3854 3855 3856 3857 3858 3859 3860 3861 3862 3863 3864 3865 3866 3867 3868 3869 3870 3871 3872 3873 3874 3875 3876 3877 3878 3879 3880 3881 3882 3883 3884 3885 3886 3887 3888 3889 3890 3891 3892 3893 3894 3895 3896 3897 3898 3899 3900 3901 3902 3903 3904 3905 3906 3907 3908 3909 3910 3911 3912 3913 3914 3915 3916 3917 3918 3919 3920 3921 3922 3923 3924 3925 3926 3927 3928 3929 3930 3931 3932 3933 3934 3935 3936 3937 3938 3939 3940 3941 3942 3943 3944 3945 3946 3947 3948 3949 3950 3951 3952 3953 3954 3955 3956 3957 3958 3959 3960 3961 3962 3963 3964 3965 3966 3967 3968 3969 3970 3971 3972 3973 3974 3975 3976 3977 3978 3979 3980 3981 3982 3983 3984 3985 3986 3987 3988 3989 3990 3991 3992 3993 3994 3995 3996 3997 3998 3999 4000 4001 4002 4003 4004 4005 4006 4007 4008 4009 4010 4011 4012 4013 4014 4015 4016 4017 4018 4019 4020 4021 4022 4023 4024 4025 4026 4027 4028 4029 4030 4031 4032 4033 4034 4035 4036 4037 4038 4039 4040 4041 4042 4043 4044 4045 4046 4047 4048 4049 4050 4051 4052 4053 4054 4055 4056 4057 4058 4059 4060 4061 4062 4063 4064 4065 4066 4067 4068 4069 4070 4071 4072 4073 4074 4075 4076 4077 4078 4079 4080 4081 4082 4083 4084 4085 4086 4087 4088 4089 4090 4091 4092 4093 4094 4095 4096 4097 4098 4099 4100 4101 4102 4103 4104 4105 4106 4107 4108 4109 4110 4111 4112 4113 4114 4115 4116 4117 4118 4119 4120 4121 4122 4123 4124 4125 4126 4127 4128 4129 4130 4131 4132 4133 4134 4135 4136 4137 4138 4139 4140 4141 4142 4143 4144 4145 4146 4147 4148 4149 4150 4151 4152 4153 4154 4155 4156 4157 4158 4159 4160 4161 4162 4163 4164 4165 4166 4167 4168 4169 4170 4171 4172 4173 4174 4175 4176 4177 4178 4179 4180 4181 4182 4183 4184 4185 4186 4187 4188 4189 4190 4191 4192 4193 4194 4195 4196 4197 4198 4199 4200 4201 4202 4203 4204 4205 4206 4207 4208 4209 4210 | /* SPDX-License-Identifier: GPL-2.0 */ /* * Portions of this file * Copyright(c) 2016-2017 Intel Deutschland GmbH * Copyright (C) 2018, 2020-2025 Intel Corporation */ #undef TRACE_SYSTEM #define TRACE_SYSTEM cfg80211 #if !defined(__RDEV_OPS_TRACE) || defined(TRACE_HEADER_MULTI_READ) #define __RDEV_OPS_TRACE #include <linux/tracepoint.h> #include <linux/rtnetlink.h> #include <linux/etherdevice.h> #include <net/cfg80211.h> #include "core.h" #define MAC_ENTRY(entry_mac) __array(u8, entry_mac, ETH_ALEN) #define MAC_ASSIGN(entry_mac, given_mac) do { \ if (given_mac) \ memcpy(__entry->entry_mac, given_mac, ETH_ALEN); \ else \ eth_zero_addr(__entry->entry_mac); \ } while (0) #define MAXNAME 32 #define WIPHY_ENTRY __array(char, wiphy_name, 32) #define WIPHY_ASSIGN strscpy(__entry->wiphy_name, wiphy_name(wiphy), MAXNAME) #define WIPHY_PR_FMT "%s" #define WIPHY_PR_ARG __entry->wiphy_name #define WDEV_ENTRY __field(u32, id) #define WDEV_ASSIGN (__entry->id) = (!IS_ERR_OR_NULL(wdev) \ ? wdev->identifier : 0) #define WDEV_PR_FMT "wdev(%u)" #define WDEV_PR_ARG (__entry->id) #define NETDEV_ENTRY __array(char, name, IFNAMSIZ) \ __field(int, ifindex) #define NETDEV_ASSIGN \ do { \ memcpy(__entry->name, netdev->name, IFNAMSIZ); \ (__entry->ifindex) = (netdev->ifindex); \ } while (0) #define NETDEV_PR_FMT "netdev:%s(%d)" #define NETDEV_PR_ARG __entry->name, __entry->ifindex #define MESH_CFG_ENTRY __field(u16, dot11MeshRetryTimeout) \ __field(u16, dot11MeshConfirmTimeout) \ __field(u16, dot11MeshHoldingTimeout) \ __field(u16, dot11MeshMaxPeerLinks) \ __field(u8, dot11MeshMaxRetries) \ __field(u8, dot11MeshTTL) \ __field(u8, element_ttl) \ __field(bool, auto_open_plinks) \ __field(u32, dot11MeshNbrOffsetMaxNeighbor) \ __field(u8, dot11MeshHWMPmaxPREQretries) \ __field(u32, path_refresh_time) \ __field(u32, dot11MeshHWMPactivePathTimeout) \ __field(u16, min_discovery_timeout) \ __field(u16, dot11MeshHWMPpreqMinInterval) \ __field(u16, dot11MeshHWMPperrMinInterval) \ __field(u16, dot11MeshHWMPnetDiameterTraversalTime) \ __field(u8, dot11MeshHWMPRootMode) \ __field(u16, dot11MeshHWMPRannInterval) \ __field(bool, dot11MeshGateAnnouncementProtocol) \ __field(bool, dot11MeshForwarding) \ __field(s32, rssi_threshold) \ __field(u16, ht_opmode) \ __field(u32, dot11MeshHWMPactivePathToRootTimeout) \ __field(u16, dot11MeshHWMProotInterval) \ __field(u16, dot11MeshHWMPconfirmationInterval) \ __field(bool, dot11MeshNolearn) #define MESH_CFG_ASSIGN \ do { \ __entry->dot11MeshRetryTimeout = conf->dot11MeshRetryTimeout; \ __entry->dot11MeshConfirmTimeout = \ conf->dot11MeshConfirmTimeout; \ __entry->dot11MeshHoldingTimeout = \ conf->dot11MeshHoldingTimeout; \ __entry->dot11MeshMaxPeerLinks = conf->dot11MeshMaxPeerLinks; \ __entry->dot11MeshMaxRetries = conf->dot11MeshMaxRetries; \ __entry->dot11MeshTTL = conf->dot11MeshTTL; \ __entry->element_ttl = conf->element_ttl; \ __entry->auto_open_plinks = conf->auto_open_plinks; \ __entry->dot11MeshNbrOffsetMaxNeighbor = \ conf->dot11MeshNbrOffsetMaxNeighbor; \ __entry->dot11MeshHWMPmaxPREQretries = \ conf->dot11MeshHWMPmaxPREQretries; \ __entry->path_refresh_time = conf->path_refresh_time; \ __entry->dot11MeshHWMPactivePathTimeout = \ conf->dot11MeshHWMPactivePathTimeout; \ __entry->min_discovery_timeout = conf->min_discovery_timeout; \ __entry->dot11MeshHWMPpreqMinInterval = \ conf->dot11MeshHWMPpreqMinInterval; \ __entry->dot11MeshHWMPperrMinInterval = \ conf->dot11MeshHWMPperrMinInterval; \ __entry->dot11MeshHWMPnetDiameterTraversalTime = \ conf->dot11MeshHWMPnetDiameterTraversalTime; \ __entry->dot11MeshHWMPRootMode = conf->dot11MeshHWMPRootMode; \ __entry->dot11MeshHWMPRannInterval = \ conf->dot11MeshHWMPRannInterval; \ __entry->dot11MeshGateAnnouncementProtocol = \ conf->dot11MeshGateAnnouncementProtocol; \ __entry->dot11MeshForwarding = conf->dot11MeshForwarding; \ __entry->rssi_threshold = conf->rssi_threshold; \ __entry->ht_opmode = conf->ht_opmode; \ __entry->dot11MeshHWMPactivePathToRootTimeout = \ conf->dot11MeshHWMPactivePathToRootTimeout; \ __entry->dot11MeshHWMProotInterval = \ conf->dot11MeshHWMProotInterval; \ __entry->dot11MeshHWMPconfirmationInterval = \ conf->dot11MeshHWMPconfirmationInterval; \ __entry->dot11MeshNolearn = conf->dot11MeshNolearn; \ } while (0) #define CHAN_ENTRY __field(enum nl80211_band, band) \ __field(u32, center_freq) \ __field(u16, freq_offset) #define CHAN_ASSIGN(chan) \ do { \ if (chan) { \ __entry->band = chan->band; \ __entry->center_freq = chan->center_freq; \ __entry->freq_offset = chan->freq_offset; \ } else { \ __entry->band = 0; \ __entry->center_freq = 0; \ __entry->freq_offset = 0; \ } \ } while (0) #define CHAN_PR_FMT "band: %d, freq: %u.%03u" #define CHAN_PR_ARG __entry->band, __entry->center_freq, __entry->freq_offset #define CHAN_DEF_ENTRY __field(enum nl80211_band, band) \ __field(u32, control_freq) \ __field(u32, freq_offset) \ __field(u32, width) \ __field(u32, center_freq1) \ __field(u32, freq1_offset) \ __field(u32, center_freq2) \ __field(u16, punctured) #define CHAN_DEF_ASSIGN(chandef) \ do { \ if ((chandef) && (chandef)->chan) { \ __entry->band = (chandef)->chan->band; \ __entry->control_freq = \ (chandef)->chan->center_freq; \ __entry->freq_offset = \ (chandef)->chan->freq_offset; \ __entry->width = (chandef)->width; \ __entry->center_freq1 = (chandef)->center_freq1;\ __entry->freq1_offset = (chandef)->freq1_offset;\ __entry->center_freq2 = (chandef)->center_freq2;\ __entry->punctured = (chandef)->punctured; \ } else { \ __entry->band = 0; \ __entry->control_freq = 0; \ __entry->freq_offset = 0; \ __entry->width = 0; \ __entry->center_freq1 = 0; \ __entry->freq1_offset = 0; \ __entry->center_freq2 = 0; \ __entry->punctured = 0; \ } \ } while (0) #define CHAN_DEF_PR_FMT \ "band: %d, control freq: %u.%03u, width: %d, cf1: %u.%03u, cf2: %u, punct: 0x%x" #define CHAN_DEF_PR_ARG __entry->band, __entry->control_freq, \ __entry->freq_offset, __entry->width, \ __entry->center_freq1, __entry->freq1_offset, \ __entry->center_freq2, __entry->punctured #define FILS_AAD_ASSIGN(fa) \ do { \ if (fa) { \ ether_addr_copy(__entry->macaddr, fa->macaddr); \ __entry->kek_len = fa->kek_len; \ } else { \ eth_zero_addr(__entry->macaddr); \ __entry->kek_len = 0; \ } \ } while (0) #define FILS_AAD_PR_FMT \ "macaddr: %pM, kek_len: %d" #define SINFO_ENTRY __field(int, generation) \ __field(u32, connected_time) \ __field(u32, inactive_time) \ __field(u32, rx_bytes) \ __field(u32, tx_bytes) \ __field(u32, rx_packets) \ __field(u32, tx_packets) \ __field(u32, tx_retries) \ __field(u32, tx_failed) \ __field(u32, rx_dropped_misc) \ __field(u32, beacon_loss_count) \ __field(u16, llid) \ __field(u16, plid) \ __field(u8, plink_state) #define SINFO_ASSIGN \ do { \ __entry->generation = sinfo->generation; \ __entry->connected_time = sinfo->connected_time; \ __entry->inactive_time = sinfo->inactive_time; \ __entry->rx_bytes = sinfo->rx_bytes; \ __entry->tx_bytes = sinfo->tx_bytes; \ __entry->rx_packets = sinfo->rx_packets; \ __entry->tx_packets = sinfo->tx_packets; \ __entry->tx_retries = sinfo->tx_retries; \ __entry->tx_failed = sinfo->tx_failed; \ __entry->rx_dropped_misc = sinfo->rx_dropped_misc; \ __entry->beacon_loss_count = sinfo->beacon_loss_count; \ __entry->llid = sinfo->llid; \ __entry->plid = sinfo->plid; \ __entry->plink_state = sinfo->plink_state; \ } while (0) #define BOOL_TO_STR(bo) (bo) ? "true" : "false" #define QOS_MAP_ENTRY __field(u8, num_des) \ __array(u8, dscp_exception, \ 2 * IEEE80211_QOS_MAP_MAX_EX) \ __array(u8, up, IEEE80211_QOS_MAP_LEN_MIN) #define QOS_MAP_ASSIGN(qos_map) \ do { \ if ((qos_map)) { \ __entry->num_des = (qos_map)->num_des; \ memcpy(__entry->dscp_exception, \ &(qos_map)->dscp_exception, \ 2 * IEEE80211_QOS_MAP_MAX_EX); \ memcpy(__entry->up, &(qos_map)->up, \ IEEE80211_QOS_MAP_LEN_MIN); \ } else { \ __entry->num_des = 0; \ memset(__entry->dscp_exception, 0, \ 2 * IEEE80211_QOS_MAP_MAX_EX); \ memset(__entry->up, 0, \ IEEE80211_QOS_MAP_LEN_MIN); \ } \ } while (0) /************************************************************* * wiphy work traces * *************************************************************/ DECLARE_EVENT_CLASS(wiphy_work_event, TP_PROTO(struct wiphy *wiphy, struct wiphy_work *work), TP_ARGS(wiphy, work), TP_STRUCT__entry( WIPHY_ENTRY __field(void *, instance) __field(void *, func) ), TP_fast_assign( WIPHY_ASSIGN; __entry->instance = work; __entry->func = work ? work->func : NULL; ), TP_printk(WIPHY_PR_FMT " instance=%p func=%pS", WIPHY_PR_ARG, __entry->instance, __entry->func) ); DEFINE_EVENT(wiphy_work_event, wiphy_work_queue, TP_PROTO(struct wiphy *wiphy, struct wiphy_work *work), TP_ARGS(wiphy, work) ); DEFINE_EVENT(wiphy_work_event, wiphy_work_run, TP_PROTO(struct wiphy *wiphy, struct wiphy_work *work), TP_ARGS(wiphy, work) ); DEFINE_EVENT(wiphy_work_event, wiphy_work_cancel, TP_PROTO(struct wiphy *wiphy, struct wiphy_work *work), TP_ARGS(wiphy, work) ); DEFINE_EVENT(wiphy_work_event, wiphy_work_flush, TP_PROTO(struct wiphy *wiphy, struct wiphy_work *work), TP_ARGS(wiphy, work) ); TRACE_EVENT(wiphy_delayed_work_queue, TP_PROTO(struct wiphy *wiphy, struct wiphy_work *work, unsigned long delay), TP_ARGS(wiphy, work, delay), TP_STRUCT__entry( WIPHY_ENTRY __field(void *, instance) __field(void *, func) __field(unsigned long, delay) ), TP_fast_assign( WIPHY_ASSIGN; __entry->instance = work; __entry->func = work->func; __entry->delay = delay; ), TP_printk(WIPHY_PR_FMT " instance=%p func=%pS delay=%ld", WIPHY_PR_ARG, __entry->instance, __entry->func, __entry->delay) ); TRACE_EVENT(wiphy_work_worker_start, TP_PROTO(struct wiphy *wiphy), TP_ARGS(wiphy), TP_STRUCT__entry( WIPHY_ENTRY ), TP_fast_assign( WIPHY_ASSIGN; ), TP_printk(WIPHY_PR_FMT, WIPHY_PR_ARG) ); /************************************************************* * rdev->ops traces * *************************************************************/ TRACE_EVENT(rdev_suspend, TP_PROTO(struct wiphy *wiphy, struct cfg80211_wowlan *wow), TP_ARGS(wiphy, wow), TP_STRUCT__entry( WIPHY_ENTRY __field(bool, any) __field(bool, disconnect) __field(bool, magic_pkt) __field(bool, gtk_rekey_failure) __field(bool, eap_identity_req) __field(bool, four_way_handshake) __field(bool, rfkill_release) __field(bool, valid_wow) ), TP_fast_assign( WIPHY_ASSIGN; if (wow) { __entry->any = wow->any; __entry->disconnect = wow->disconnect; __entry->magic_pkt = wow->magic_pkt; __entry->gtk_rekey_failure = wow->gtk_rekey_failure; __entry->eap_identity_req = wow->eap_identity_req; __entry->four_way_handshake = wow->four_way_handshake; __entry->rfkill_release = wow->rfkill_release; __entry->valid_wow = true; } else { __entry->valid_wow = false; } ), TP_printk(WIPHY_PR_FMT ", wow%s - any: %d, disconnect: %d, " "magic pkt: %d, gtk rekey failure: %d, eap identify req: %d, " "four way handshake: %d, rfkill release: %d.", WIPHY_PR_ARG, __entry->valid_wow ? "" : "(Not configured!)", __entry->any, __entry->disconnect, __entry->magic_pkt, __entry->gtk_rekey_failure, __entry->eap_identity_req, __entry->four_way_handshake, __entry->rfkill_release) ); TRACE_EVENT(rdev_return_int, TP_PROTO(struct wiphy *wiphy, int ret), TP_ARGS(wiphy, ret), TP_STRUCT__entry( WIPHY_ENTRY __field(int, ret) ), TP_fast_assign( WIPHY_ASSIGN; __entry->ret = ret; ), TP_printk(WIPHY_PR_FMT ", returned: %d", WIPHY_PR_ARG, __entry->ret) ); TRACE_EVENT(rdev_scan, TP_PROTO(struct wiphy *wiphy, struct cfg80211_scan_request_int *request), TP_ARGS(wiphy, request), TP_STRUCT__entry( WIPHY_ENTRY ), TP_fast_assign( WIPHY_ASSIGN; ), TP_printk(WIPHY_PR_FMT, WIPHY_PR_ARG) ); DECLARE_EVENT_CLASS(wiphy_only_evt, TP_PROTO(struct wiphy *wiphy), TP_ARGS(wiphy), TP_STRUCT__entry( WIPHY_ENTRY ), TP_fast_assign( WIPHY_ASSIGN; ), TP_printk(WIPHY_PR_FMT, WIPHY_PR_ARG) ); DEFINE_EVENT(wiphy_only_evt, rdev_resume, TP_PROTO(struct wiphy *wiphy), TP_ARGS(wiphy) ); DEFINE_EVENT(wiphy_only_evt, rdev_return_void, TP_PROTO(struct wiphy *wiphy), TP_ARGS(wiphy) ); TRACE_EVENT(rdev_get_antenna, TP_PROTO(struct wiphy *wiphy, int radio_idx), TP_ARGS(wiphy, radio_idx), TP_STRUCT__entry( WIPHY_ENTRY __field(int, radio_idx) ), TP_fast_assign( WIPHY_ASSIGN; __entry->radio_idx = radio_idx; ), TP_printk(WIPHY_PR_FMT ", radio_idx: %d", WIPHY_PR_ARG, __entry->radio_idx) ); DEFINE_EVENT(wiphy_only_evt, rdev_rfkill_poll, TP_PROTO(struct wiphy *wiphy), TP_ARGS(wiphy) ); DECLARE_EVENT_CLASS(wiphy_enabled_evt, TP_PROTO(struct wiphy *wiphy, bool enabled), TP_ARGS(wiphy, enabled), TP_STRUCT__entry( WIPHY_ENTRY __field(bool, enabled) ), TP_fast_assign( WIPHY_ASSIGN; __entry->enabled = enabled; ), TP_printk(WIPHY_PR_FMT ", %senabled ", WIPHY_PR_ARG, __entry->enabled ? "" : "not ") ); DEFINE_EVENT(wiphy_enabled_evt, rdev_set_wakeup, TP_PROTO(struct wiphy *wiphy, bool enabled), TP_ARGS(wiphy, enabled) ); TRACE_EVENT(rdev_add_virtual_intf, TP_PROTO(struct wiphy *wiphy, char *name, enum nl80211_iftype type), TP_ARGS(wiphy, name, type), TP_STRUCT__entry( WIPHY_ENTRY __string(vir_intf_name, name ? name : "<noname>") __field(enum nl80211_iftype, type) ), TP_fast_assign( WIPHY_ASSIGN; __assign_str(vir_intf_name); __entry->type = type; ), TP_printk(WIPHY_PR_FMT ", virtual intf name: %s, type: %d", WIPHY_PR_ARG, __get_str(vir_intf_name), __entry->type) ); DECLARE_EVENT_CLASS(wiphy_wdev_evt, TP_PROTO(struct wiphy *wiphy, struct wireless_dev *wdev), TP_ARGS(wiphy, wdev), TP_STRUCT__entry( WIPHY_ENTRY WDEV_ENTRY ), TP_fast_assign( WIPHY_ASSIGN; WDEV_ASSIGN; ), TP_printk(WIPHY_PR_FMT ", " WDEV_PR_FMT, WIPHY_PR_ARG, WDEV_PR_ARG) ); DECLARE_EVENT_CLASS(wiphy_wdev_cookie_evt, TP_PROTO(struct wiphy *wiphy, struct wireless_dev *wdev, u64 cookie), TP_ARGS(wiphy, wdev, cookie), TP_STRUCT__entry( WIPHY_ENTRY WDEV_ENTRY __field(u64, cookie) ), TP_fast_assign( WIPHY_ASSIGN; WDEV_ASSIGN; __entry->cookie = cookie; ), TP_printk(WIPHY_PR_FMT ", " WDEV_PR_FMT ", cookie: %lld", WIPHY_PR_ARG, WDEV_PR_ARG, (unsigned long long)__entry->cookie) ); DEFINE_EVENT(wiphy_wdev_evt, rdev_return_wdev, TP_PROTO(struct wiphy *wiphy, struct wireless_dev *wdev), TP_ARGS(wiphy, wdev) ); DEFINE_EVENT(wiphy_wdev_evt, rdev_del_virtual_intf, TP_PROTO(struct wiphy *wiphy, struct wireless_dev *wdev), TP_ARGS(wiphy, wdev) ); TRACE_EVENT(rdev_change_virtual_intf, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, enum nl80211_iftype type), TP_ARGS(wiphy, netdev, type), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY __field(enum nl80211_iftype, type) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; __entry->type = type; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", type: %d", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->type) ); DECLARE_EVENT_CLASS(key_handle, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, int link_id, u8 key_index, bool pairwise, const u8 *mac_addr), TP_ARGS(wiphy, netdev, link_id, key_index, pairwise, mac_addr), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY MAC_ENTRY(mac_addr) __field(int, link_id) __field(u8, key_index) __field(bool, pairwise) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; MAC_ASSIGN(mac_addr, mac_addr); __entry->link_id = link_id; __entry->key_index = key_index; __entry->pairwise = pairwise; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", link_id: %d, " "key_index: %u, pairwise: %s, mac addr: %pM", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->link_id, __entry->key_index, BOOL_TO_STR(__entry->pairwise), __entry->mac_addr) ); DEFINE_EVENT(key_handle, rdev_get_key, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, int link_id, u8 key_index, bool pairwise, const u8 *mac_addr), TP_ARGS(wiphy, netdev, link_id, key_index, pairwise, mac_addr) ); DEFINE_EVENT(key_handle, rdev_del_key, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, int link_id, u8 key_index, bool pairwise, const u8 *mac_addr), TP_ARGS(wiphy, netdev, link_id, key_index, pairwise, mac_addr) ); TRACE_EVENT(rdev_add_key, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, int link_id, u8 key_index, bool pairwise, const u8 *mac_addr, u8 mode), TP_ARGS(wiphy, netdev, link_id, key_index, pairwise, mac_addr, mode), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY MAC_ENTRY(mac_addr) __field(int, link_id) __field(u8, key_index) __field(bool, pairwise) __field(u8, mode) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; MAC_ASSIGN(mac_addr, mac_addr); __entry->link_id = link_id; __entry->key_index = key_index; __entry->pairwise = pairwise; __entry->mode = mode; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", link_id: %d, " "key_index: %u, mode: %u, pairwise: %s, " "mac addr: %pM", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->link_id, __entry->key_index, __entry->mode, BOOL_TO_STR(__entry->pairwise), __entry->mac_addr) ); TRACE_EVENT(rdev_set_default_key, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, int link_id, u8 key_index, bool unicast, bool multicast), TP_ARGS(wiphy, netdev, link_id, key_index, unicast, multicast), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY __field(int, link_id) __field(u8, key_index) __field(bool, unicast) __field(bool, multicast) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; __entry->link_id = link_id; __entry->key_index = key_index; __entry->unicast = unicast; __entry->multicast = multicast; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", link_id: %d, " "key index: %u, unicast: %s, multicast: %s", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->link_id, __entry->key_index, BOOL_TO_STR(__entry->unicast), BOOL_TO_STR(__entry->multicast)) ); TRACE_EVENT(rdev_set_default_mgmt_key, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, int link_id, u8 key_index), TP_ARGS(wiphy, netdev, link_id, key_index), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY __field(int, link_id) __field(u8, key_index) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; __entry->link_id = link_id; __entry->key_index = key_index; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", link_id: %d, " "key index: %u", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->link_id, __entry->key_index) ); TRACE_EVENT(rdev_set_default_beacon_key, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, int link_id, u8 key_index), TP_ARGS(wiphy, netdev, link_id, key_index), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY __field(int, link_id) __field(u8, key_index) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; __entry->link_id = link_id; __entry->key_index = key_index; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", link_id: %d, " "key index: %u", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->link_id, __entry->key_index) ); TRACE_EVENT(rdev_start_ap, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct cfg80211_ap_settings *settings), TP_ARGS(wiphy, netdev, settings), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY CHAN_DEF_ENTRY __field(int, beacon_interval) __field(int, dtim_period) __array(char, ssid, IEEE80211_MAX_SSID_LEN + 1) __field(enum nl80211_hidden_ssid, hidden_ssid) __field(u32, wpa_ver) __field(bool, privacy) __field(enum nl80211_auth_type, auth_type) __field(int, inactivity_timeout) __field(unsigned int, link_id) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; CHAN_DEF_ASSIGN(&settings->chandef); __entry->beacon_interval = settings->beacon_interval; __entry->dtim_period = settings->dtim_period; __entry->hidden_ssid = settings->hidden_ssid; __entry->wpa_ver = settings->crypto.wpa_versions; __entry->privacy = settings->privacy; __entry->auth_type = settings->auth_type; __entry->inactivity_timeout = settings->inactivity_timeout; memset(__entry->ssid, 0, IEEE80211_MAX_SSID_LEN + 1); memcpy(__entry->ssid, settings->ssid, settings->ssid_len); __entry->link_id = settings->beacon.link_id; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", AP settings - ssid: %s, " CHAN_DEF_PR_FMT ", beacon interval: %d, dtim period: %d, " "hidden ssid: %d, wpa versions: %u, privacy: %s, " "auth type: %d, inactivity timeout: %d, link_id: %d", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->ssid, CHAN_DEF_PR_ARG, __entry->beacon_interval, __entry->dtim_period, __entry->hidden_ssid, __entry->wpa_ver, BOOL_TO_STR(__entry->privacy), __entry->auth_type, __entry->inactivity_timeout, __entry->link_id) ); TRACE_EVENT(rdev_change_beacon, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct cfg80211_ap_update *info), TP_ARGS(wiphy, netdev, info), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY __field(int, link_id) __dynamic_array(u8, head, info->beacon.head_len) __dynamic_array(u8, tail, info->beacon.tail_len) __dynamic_array(u8, beacon_ies, info->beacon.beacon_ies_len) __dynamic_array(u8, proberesp_ies, info->beacon.proberesp_ies_len) __dynamic_array(u8, assocresp_ies, info->beacon.assocresp_ies_len) __dynamic_array(u8, probe_resp, info->beacon.probe_resp_len) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; __entry->link_id = info->beacon.link_id; if (info->beacon.head) memcpy(__get_dynamic_array(head), info->beacon.head, info->beacon.head_len); if (info->beacon.tail) memcpy(__get_dynamic_array(tail), info->beacon.tail, info->beacon.tail_len); if (info->beacon.beacon_ies) memcpy(__get_dynamic_array(beacon_ies), info->beacon.beacon_ies, info->beacon.beacon_ies_len); if (info->beacon.proberesp_ies) memcpy(__get_dynamic_array(proberesp_ies), info->beacon.proberesp_ies, info->beacon.proberesp_ies_len); if (info->beacon.assocresp_ies) memcpy(__get_dynamic_array(assocresp_ies), info->beacon.assocresp_ies, info->beacon.assocresp_ies_len); if (info->beacon.probe_resp) memcpy(__get_dynamic_array(probe_resp), info->beacon.probe_resp, info->beacon.probe_resp_len); ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", link_id:%d", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->link_id) ); TRACE_EVENT(rdev_stop_ap, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, unsigned int link_id), TP_ARGS(wiphy, netdev, link_id), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY __field(unsigned int, link_id) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; __entry->link_id = link_id; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", link_id: %d", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->link_id) ); DECLARE_EVENT_CLASS(wiphy_netdev_evt, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev), TP_ARGS(wiphy, netdev), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT, WIPHY_PR_ARG, NETDEV_PR_ARG) ); DEFINE_EVENT(wiphy_netdev_evt, rdev_set_rekey_data, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev), TP_ARGS(wiphy, netdev) ); DEFINE_EVENT(wiphy_netdev_evt, rdev_get_mesh_config, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev), TP_ARGS(wiphy, netdev) ); DEFINE_EVENT(wiphy_netdev_evt, rdev_leave_mesh, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev), TP_ARGS(wiphy, netdev) ); DEFINE_EVENT(wiphy_netdev_evt, rdev_leave_ibss, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev), TP_ARGS(wiphy, netdev) ); DEFINE_EVENT(wiphy_netdev_evt, rdev_leave_ocb, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev), TP_ARGS(wiphy, netdev) ); DEFINE_EVENT(wiphy_netdev_evt, rdev_flush_pmksa, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev), TP_ARGS(wiphy, netdev) ); TRACE_EVENT(rdev_end_cac, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, unsigned int link_id), TP_ARGS(wiphy, netdev, link_id), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY __field(unsigned int, link_id) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; __entry->link_id = link_id; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", link_id: %d", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->link_id) ); DECLARE_EVENT_CLASS(station_add_change, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, u8 *mac, struct station_parameters *params), TP_ARGS(wiphy, netdev, mac, params), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY MAC_ENTRY(sta_mac) __field(u32, sta_flags_mask) __field(u32, sta_flags_set) __field(u32, sta_modify_mask) __field(int, listen_interval) __field(u16, capability) __field(u16, aid) __field(u8, plink_action) __field(u8, plink_state) __field(u8, uapsd_queues) __field(u8, max_sp) __field(u8, opmode_notif) __field(bool, opmode_notif_used) __array(u8, ht_capa, (int)sizeof(struct ieee80211_ht_cap)) __array(u8, vht_capa, (int)sizeof(struct ieee80211_vht_cap)) __array(char, vlan, IFNAMSIZ) __dynamic_array(u8, supported_rates, params->link_sta_params.supported_rates_len) __dynamic_array(u8, ext_capab, params->ext_capab_len) __dynamic_array(u8, supported_channels, params->supported_channels_len) __dynamic_array(u8, supported_oper_classes, params->supported_oper_classes_len) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; MAC_ASSIGN(sta_mac, mac); __entry->sta_flags_mask = params->sta_flags_mask; __entry->sta_flags_set = params->sta_flags_set; __entry->sta_modify_mask = params->sta_modify_mask; __entry->listen_interval = params->listen_interval; __entry->aid = params->aid; __entry->plink_action = params->plink_action; __entry->plink_state = params->plink_state; __entry->uapsd_queues = params->uapsd_queues; memset(__entry->ht_capa, 0, sizeof(struct ieee80211_ht_cap)); if (params->link_sta_params.ht_capa) memcpy(__entry->ht_capa, params->link_sta_params.ht_capa, sizeof(struct ieee80211_ht_cap)); memset(__entry->vht_capa, 0, sizeof(struct ieee80211_vht_cap)); if (params->link_sta_params.vht_capa) memcpy(__entry->vht_capa, params->link_sta_params.vht_capa, sizeof(struct ieee80211_vht_cap)); memset(__entry->vlan, 0, sizeof(__entry->vlan)); if (params->vlan) memcpy(__entry->vlan, params->vlan->name, IFNAMSIZ); if (params->link_sta_params.supported_rates && params->link_sta_params.supported_rates_len) memcpy(__get_dynamic_array(supported_rates), params->link_sta_params.supported_rates, params->link_sta_params.supported_rates_len); if (params->ext_capab && params->ext_capab_len) memcpy(__get_dynamic_array(ext_capab), params->ext_capab, params->ext_capab_len); if (params->supported_channels && params->supported_channels_len) memcpy(__get_dynamic_array(supported_channels), params->supported_channels, params->supported_channels_len); if (params->supported_oper_classes && params->supported_oper_classes_len) memcpy(__get_dynamic_array(supported_oper_classes), params->supported_oper_classes, params->supported_oper_classes_len); __entry->max_sp = params->max_sp; __entry->capability = params->capability; __entry->opmode_notif = params->link_sta_params.opmode_notif; __entry->opmode_notif_used = params->link_sta_params.opmode_notif_used; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", station mac: %pM" ", station flags mask: 0x%x, station flags set: 0x%x, " "station modify mask: 0x%x, listen interval: %d, aid: %u, " "plink action: %u, plink state: %u, uapsd queues: %u, vlan:%s", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->sta_mac, __entry->sta_flags_mask, __entry->sta_flags_set, __entry->sta_modify_mask, __entry->listen_interval, __entry->aid, __entry->plink_action, __entry->plink_state, __entry->uapsd_queues, __entry->vlan) ); DEFINE_EVENT(station_add_change, rdev_add_station, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, u8 *mac, struct station_parameters *params), TP_ARGS(wiphy, netdev, mac, params) ); DEFINE_EVENT(station_add_change, rdev_change_station, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, u8 *mac, struct station_parameters *params), TP_ARGS(wiphy, netdev, mac, params) ); DECLARE_EVENT_CLASS(wiphy_netdev_mac_evt, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, const u8 *mac), TP_ARGS(wiphy, netdev, mac), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY MAC_ENTRY(sta_mac) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; MAC_ASSIGN(sta_mac, mac); ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", mac: %pM", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->sta_mac) ); DECLARE_EVENT_CLASS(station_del, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct station_del_parameters *params), TP_ARGS(wiphy, netdev, params), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY MAC_ENTRY(sta_mac) __field(u8, subtype) __field(u16, reason_code) __field(int, link_id) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; MAC_ASSIGN(sta_mac, params->mac); __entry->subtype = params->subtype; __entry->reason_code = params->reason_code; __entry->link_id = params->link_id; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", station mac: %pM" ", subtype: %u, reason_code: %u, link_id: %d", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->sta_mac, __entry->subtype, __entry->reason_code, __entry->link_id) ); DEFINE_EVENT(station_del, rdev_del_station, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct station_del_parameters *params), TP_ARGS(wiphy, netdev, params) ); DEFINE_EVENT(wiphy_netdev_mac_evt, rdev_get_station, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, const u8 *mac), TP_ARGS(wiphy, netdev, mac) ); DEFINE_EVENT(wiphy_netdev_mac_evt, rdev_del_mpath, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, const u8 *mac), TP_ARGS(wiphy, netdev, mac) ); TRACE_EVENT(rdev_dump_station, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, int _idx, u8 *mac), TP_ARGS(wiphy, netdev, _idx, mac), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY MAC_ENTRY(sta_mac) __field(int, idx) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; MAC_ASSIGN(sta_mac, mac); __entry->idx = _idx; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", station mac: %pM, idx: %d", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->sta_mac, __entry->idx) ); TRACE_EVENT(rdev_return_int_station_info, TP_PROTO(struct wiphy *wiphy, int ret, struct station_info *sinfo), TP_ARGS(wiphy, ret, sinfo), TP_STRUCT__entry( WIPHY_ENTRY __field(int, ret) SINFO_ENTRY ), TP_fast_assign( WIPHY_ASSIGN; __entry->ret = ret; SINFO_ASSIGN; ), TP_printk(WIPHY_PR_FMT ", returned %d" , WIPHY_PR_ARG, __entry->ret) ); DECLARE_EVENT_CLASS(mpath_evt, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, u8 *dst, u8 *next_hop), TP_ARGS(wiphy, netdev, dst, next_hop), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY MAC_ENTRY(dst) MAC_ENTRY(next_hop) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; MAC_ASSIGN(dst, dst); MAC_ASSIGN(next_hop, next_hop); ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", destination: %pM, next hop: %pM", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->dst, __entry->next_hop) ); DEFINE_EVENT(mpath_evt, rdev_add_mpath, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, u8 *dst, u8 *next_hop), TP_ARGS(wiphy, netdev, dst, next_hop) ); DEFINE_EVENT(mpath_evt, rdev_change_mpath, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, u8 *dst, u8 *next_hop), TP_ARGS(wiphy, netdev, dst, next_hop) ); DEFINE_EVENT(mpath_evt, rdev_get_mpath, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, u8 *dst, u8 *next_hop), TP_ARGS(wiphy, netdev, dst, next_hop) ); TRACE_EVENT(rdev_dump_mpath, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, int _idx, u8 *dst, u8 *next_hop), TP_ARGS(wiphy, netdev, _idx, dst, next_hop), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY MAC_ENTRY(dst) MAC_ENTRY(next_hop) __field(int, idx) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; MAC_ASSIGN(dst, dst); MAC_ASSIGN(next_hop, next_hop); __entry->idx = _idx; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", index: %d, destination: %pM, next hop: %pM", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->idx, __entry->dst, __entry->next_hop) ); TRACE_EVENT(rdev_get_mpp, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, u8 *dst, u8 *mpp), TP_ARGS(wiphy, netdev, dst, mpp), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY MAC_ENTRY(dst) MAC_ENTRY(mpp) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; MAC_ASSIGN(dst, dst); MAC_ASSIGN(mpp, mpp); ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", destination: %pM" ", mpp: %pM", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->dst, __entry->mpp) ); TRACE_EVENT(rdev_dump_mpp, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, int _idx, u8 *dst, u8 *mpp), TP_ARGS(wiphy, netdev, _idx, dst, mpp), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY MAC_ENTRY(dst) MAC_ENTRY(mpp) __field(int, idx) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; MAC_ASSIGN(dst, dst); MAC_ASSIGN(mpp, mpp); __entry->idx = _idx; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", index: %d, destination: %pM, mpp: %pM", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->idx, __entry->dst, __entry->mpp) ); TRACE_EVENT(rdev_return_int_mpath_info, TP_PROTO(struct wiphy *wiphy, int ret, struct mpath_info *pinfo), TP_ARGS(wiphy, ret, pinfo), TP_STRUCT__entry( WIPHY_ENTRY __field(int, ret) __field(int, generation) __field(u32, filled) __field(u32, frame_qlen) __field(u32, sn) __field(u32, metric) __field(u32, exptime) __field(u32, discovery_timeout) __field(u8, discovery_retries) __field(u8, flags) ), TP_fast_assign( WIPHY_ASSIGN; __entry->ret = ret; __entry->generation = pinfo->generation; __entry->filled = pinfo->filled; __entry->frame_qlen = pinfo->frame_qlen; __entry->sn = pinfo->sn; __entry->metric = pinfo->metric; __entry->exptime = pinfo->exptime; __entry->discovery_timeout = pinfo->discovery_timeout; __entry->discovery_retries = pinfo->discovery_retries; __entry->flags = pinfo->flags; ), TP_printk(WIPHY_PR_FMT ", returned %d. mpath info - generation: %d, " "filled: %u, frame qlen: %u, sn: %u, metric: %u, exptime: %u," " discovery timeout: %u, discovery retries: %u, flags: 0x%x", WIPHY_PR_ARG, __entry->ret, __entry->generation, __entry->filled, __entry->frame_qlen, __entry->sn, __entry->metric, __entry->exptime, __entry->discovery_timeout, __entry->discovery_retries, __entry->flags) ); TRACE_EVENT(rdev_return_int_mesh_config, TP_PROTO(struct wiphy *wiphy, int ret, struct mesh_config *conf), TP_ARGS(wiphy, ret, conf), TP_STRUCT__entry( WIPHY_ENTRY MESH_CFG_ENTRY __field(int, ret) ), TP_fast_assign( WIPHY_ASSIGN; MESH_CFG_ASSIGN; __entry->ret = ret; ), TP_printk(WIPHY_PR_FMT ", returned: %d", WIPHY_PR_ARG, __entry->ret) ); TRACE_EVENT(rdev_update_mesh_config, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, u32 mask, const struct mesh_config *conf), TP_ARGS(wiphy, netdev, mask, conf), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY MESH_CFG_ENTRY __field(u32, mask) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; MESH_CFG_ASSIGN; __entry->mask = mask; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", mask: %u", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->mask) ); TRACE_EVENT(rdev_join_mesh, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, const struct mesh_config *conf, const struct mesh_setup *setup), TP_ARGS(wiphy, netdev, conf, setup), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY MESH_CFG_ENTRY ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; MESH_CFG_ASSIGN; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT, WIPHY_PR_ARG, NETDEV_PR_ARG) ); TRACE_EVENT(rdev_change_bss, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct bss_parameters *params), TP_ARGS(wiphy, netdev, params), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY __field(int, use_cts_prot) __field(int, use_short_preamble) __field(int, use_short_slot_time) __field(int, ap_isolate) __field(int, ht_opmode) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; __entry->use_cts_prot = params->use_cts_prot; __entry->use_short_preamble = params->use_short_preamble; __entry->use_short_slot_time = params->use_short_slot_time; __entry->ap_isolate = params->ap_isolate; __entry->ht_opmode = params->ht_opmode; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", use cts prot: %d, " "use short preamble: %d, use short slot time: %d, " "ap isolate: %d, ht opmode: %d", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->use_cts_prot, __entry->use_short_preamble, __entry->use_short_slot_time, __entry->ap_isolate, __entry->ht_opmode) ); TRACE_EVENT(rdev_inform_bss, TP_PROTO(struct wiphy *wiphy, struct cfg80211_bss *bss), TP_ARGS(wiphy, bss), TP_STRUCT__entry( WIPHY_ENTRY MAC_ENTRY(bssid) CHAN_ENTRY ), TP_fast_assign( WIPHY_ASSIGN; MAC_ASSIGN(bssid, bss->bssid); CHAN_ASSIGN(bss->channel); ), TP_printk(WIPHY_PR_FMT ", %pM, " CHAN_PR_FMT, WIPHY_PR_ARG, __entry->bssid, CHAN_PR_ARG) ); TRACE_EVENT(rdev_set_txq_params, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct ieee80211_txq_params *params), TP_ARGS(wiphy, netdev, params), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY __field(enum nl80211_ac, ac) __field(u16, txop) __field(u16, cwmin) __field(u16, cwmax) __field(u8, aifs) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; __entry->ac = params->ac; __entry->txop = params->txop; __entry->cwmin = params->cwmin; __entry->cwmax = params->cwmax; __entry->aifs = params->aifs; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", ac: %d, txop: %u, cwmin: %u, cwmax: %u, aifs: %u", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->ac, __entry->txop, __entry->cwmin, __entry->cwmax, __entry->aifs) ); TRACE_EVENT(rdev_libertas_set_mesh_channel, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct ieee80211_channel *chan), TP_ARGS(wiphy, netdev, chan), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY CHAN_ENTRY ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; CHAN_ASSIGN(chan); ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", " CHAN_PR_FMT, WIPHY_PR_ARG, NETDEV_PR_ARG, CHAN_PR_ARG) ); TRACE_EVENT(rdev_set_monitor_channel, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct cfg80211_chan_def *chandef), TP_ARGS(wiphy, netdev, chandef), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY CHAN_DEF_ENTRY ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; CHAN_DEF_ASSIGN(chandef); ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", " CHAN_DEF_PR_FMT, WIPHY_PR_ARG, NETDEV_PR_ARG, CHAN_DEF_PR_ARG) ); TRACE_EVENT(rdev_auth, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct cfg80211_auth_request *req), TP_ARGS(wiphy, netdev, req), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY MAC_ENTRY(bssid) __field(enum nl80211_auth_type, auth_type) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; if (req->bss) MAC_ASSIGN(bssid, req->bss->bssid); else eth_zero_addr(__entry->bssid); __entry->auth_type = req->auth_type; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", auth type: %d, bssid: %pM", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->auth_type, __entry->bssid) ); TRACE_EVENT(rdev_assoc, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct cfg80211_assoc_request *req), TP_ARGS(wiphy, netdev, req), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY MAC_ENTRY(bssid) MAC_ENTRY(prev_bssid) __field(bool, use_mfp) __field(u32, flags) __dynamic_array(u8, elements, req->ie_len) __array(u8, ht_capa, sizeof(struct ieee80211_ht_cap)) __array(u8, ht_capa_mask, sizeof(struct ieee80211_ht_cap)) __array(u8, vht_capa, sizeof(struct ieee80211_vht_cap)) __array(u8, vht_capa_mask, sizeof(struct ieee80211_vht_cap)) __dynamic_array(u8, fils_kek, req->fils_kek_len) __dynamic_array(u8, fils_nonces, req->fils_nonces ? 2 * FILS_NONCE_LEN : 0) __field(u16, ext_mld_capa_ops) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; if (req->bss) MAC_ASSIGN(bssid, req->bss->bssid); else eth_zero_addr(__entry->bssid); MAC_ASSIGN(prev_bssid, req->prev_bssid); __entry->use_mfp = req->use_mfp; __entry->flags = req->flags; if (req->ie) memcpy(__get_dynamic_array(elements), req->ie, req->ie_len); memcpy(__entry->ht_capa, &req->ht_capa, sizeof(req->ht_capa)); memcpy(__entry->ht_capa_mask, &req->ht_capa_mask, sizeof(req->ht_capa_mask)); memcpy(__entry->vht_capa, &req->vht_capa, sizeof(req->vht_capa)); memcpy(__entry->vht_capa_mask, &req->vht_capa_mask, sizeof(req->vht_capa_mask)); if (req->fils_kek) memcpy(__get_dynamic_array(fils_kek), req->fils_kek, req->fils_kek_len); if (req->fils_nonces) memcpy(__get_dynamic_array(fils_nonces), req->fils_nonces, 2 * FILS_NONCE_LEN); __entry->ext_mld_capa_ops = req->ext_mld_capa_ops; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", bssid: %pM" ", previous bssid: %pM, use mfp: %s, flags: 0x%x", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->bssid, __entry->prev_bssid, BOOL_TO_STR(__entry->use_mfp), __entry->flags) ); TRACE_EVENT(rdev_deauth, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct cfg80211_deauth_request *req), TP_ARGS(wiphy, netdev, req), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY MAC_ENTRY(bssid) __field(u16, reason_code) __field(bool, local_state_change) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; MAC_ASSIGN(bssid, req->bssid); __entry->reason_code = req->reason_code; __entry->local_state_change = req->local_state_change; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", bssid: %pM, reason: %u, local_state_change:%d", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->bssid, __entry->reason_code, __entry->local_state_change) ); TRACE_EVENT(rdev_disassoc, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct cfg80211_disassoc_request *req), TP_ARGS(wiphy, netdev, req), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY MAC_ENTRY(bssid) __field(u16, reason_code) __field(bool, local_state_change) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; MAC_ASSIGN(bssid, req->ap_addr); __entry->reason_code = req->reason_code; __entry->local_state_change = req->local_state_change; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", bssid: %pM" ", reason: %u, local state change: %s", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->bssid, __entry->reason_code, BOOL_TO_STR(__entry->local_state_change)) ); TRACE_EVENT(rdev_mgmt_tx_cancel_wait, TP_PROTO(struct wiphy *wiphy, struct wireless_dev *wdev, u64 cookie), TP_ARGS(wiphy, wdev, cookie), TP_STRUCT__entry( WIPHY_ENTRY WDEV_ENTRY __field(u64, cookie) ), TP_fast_assign( WIPHY_ASSIGN; WDEV_ASSIGN; __entry->cookie = cookie; ), TP_printk(WIPHY_PR_FMT ", " WDEV_PR_FMT ", cookie: %llu ", WIPHY_PR_ARG, WDEV_PR_ARG, __entry->cookie) ); TRACE_EVENT(rdev_set_power_mgmt, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, bool enabled, int timeout), TP_ARGS(wiphy, netdev, enabled, timeout), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY __field(bool, enabled) __field(int, timeout) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; __entry->enabled = enabled; __entry->timeout = timeout; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", %senabled, timeout: %d ", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->enabled ? "" : "not ", __entry->timeout) ); TRACE_EVENT(rdev_connect, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct cfg80211_connect_params *sme), TP_ARGS(wiphy, netdev, sme), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY MAC_ENTRY(bssid) __array(char, ssid, IEEE80211_MAX_SSID_LEN + 1) __field(enum nl80211_auth_type, auth_type) __field(bool, privacy) __field(u32, wpa_versions) __field(u32, flags) MAC_ENTRY(prev_bssid) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; MAC_ASSIGN(bssid, sme->bssid); memset(__entry->ssid, 0, IEEE80211_MAX_SSID_LEN + 1); memcpy(__entry->ssid, sme->ssid, sme->ssid_len); __entry->auth_type = sme->auth_type; __entry->privacy = sme->privacy; __entry->wpa_versions = sme->crypto.wpa_versions; __entry->flags = sme->flags; MAC_ASSIGN(prev_bssid, sme->prev_bssid); ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", bssid: %pM" ", ssid: %s, auth type: %d, privacy: %s, wpa versions: %u, " "flags: 0x%x, previous bssid: %pM", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->bssid, __entry->ssid, __entry->auth_type, BOOL_TO_STR(__entry->privacy), __entry->wpa_versions, __entry->flags, __entry->prev_bssid) ); TRACE_EVENT(rdev_update_connect_params, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct cfg80211_connect_params *sme, u32 changed), TP_ARGS(wiphy, netdev, sme, changed), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY __field(u32, changed) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; __entry->changed = changed; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", parameters changed: %u", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->changed) ); TRACE_EVENT(rdev_set_cqm_rssi_config, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, s32 rssi_thold, u32 rssi_hyst), TP_ARGS(wiphy, netdev, rssi_thold, rssi_hyst), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY __field(s32, rssi_thold) __field(u32, rssi_hyst) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; __entry->rssi_thold = rssi_thold; __entry->rssi_hyst = rssi_hyst; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", rssi_thold: %d, rssi_hyst: %u ", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->rssi_thold, __entry->rssi_hyst) ); TRACE_EVENT(rdev_set_cqm_rssi_range_config, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, s32 low, s32 high), TP_ARGS(wiphy, netdev, low, high), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY __field(s32, rssi_low) __field(s32, rssi_high) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; __entry->rssi_low = low; __entry->rssi_high = high; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", range: %d - %d ", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->rssi_low, __entry->rssi_high) ); TRACE_EVENT(rdev_set_cqm_txe_config, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, u32 rate, u32 pkts, u32 intvl), TP_ARGS(wiphy, netdev, rate, pkts, intvl), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY __field(u32, rate) __field(u32, pkts) __field(u32, intvl) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; __entry->rate = rate; __entry->pkts = pkts; __entry->intvl = intvl; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", rate: %u, packets: %u, interval: %u", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->rate, __entry->pkts, __entry->intvl) ); TRACE_EVENT(rdev_disconnect, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, u16 reason_code), TP_ARGS(wiphy, netdev, reason_code), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY __field(u16, reason_code) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; __entry->reason_code = reason_code; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", reason code: %u", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->reason_code) ); TRACE_EVENT(rdev_join_ibss, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct cfg80211_ibss_params *params), TP_ARGS(wiphy, netdev, params), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY MAC_ENTRY(bssid) __array(char, ssid, IEEE80211_MAX_SSID_LEN + 1) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; MAC_ASSIGN(bssid, params->bssid); memset(__entry->ssid, 0, IEEE80211_MAX_SSID_LEN + 1); memcpy(__entry->ssid, params->ssid, params->ssid_len); ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", bssid: %pM, ssid: %s", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->bssid, __entry->ssid) ); TRACE_EVENT(rdev_join_ocb, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, const struct ocb_setup *setup), TP_ARGS(wiphy, netdev, setup), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT, WIPHY_PR_ARG, NETDEV_PR_ARG) ); TRACE_EVENT(rdev_set_wiphy_params, TP_PROTO(struct wiphy *wiphy, int radio_idx, u32 changed), TP_ARGS(wiphy, radio_idx, changed), TP_STRUCT__entry( WIPHY_ENTRY __field(int, radio_idx) __field(u32, changed) ), TP_fast_assign( WIPHY_ASSIGN; __entry->radio_idx = radio_idx; __entry->changed = changed; ), TP_printk(WIPHY_PR_FMT ", radio_idx: %d, changed: %u", WIPHY_PR_ARG, __entry->radio_idx, __entry->changed) ); DECLARE_EVENT_CLASS(wiphy_wdev_link_evt, TP_PROTO(struct wiphy *wiphy, struct wireless_dev *wdev, unsigned int link_id), TP_ARGS(wiphy, wdev, link_id), TP_STRUCT__entry( WIPHY_ENTRY WDEV_ENTRY __field(unsigned int, link_id) ), TP_fast_assign( WIPHY_ASSIGN; WDEV_ASSIGN; __entry->link_id = link_id; ), TP_printk(WIPHY_PR_FMT ", " WDEV_PR_FMT ", link_id: %u", WIPHY_PR_ARG, WDEV_PR_ARG, __entry->link_id) ); TRACE_EVENT(rdev_get_tx_power, TP_PROTO(struct wiphy *wiphy, struct wireless_dev *wdev, int radio_idx, unsigned int link_id), TP_ARGS(wiphy, wdev, radio_idx, link_id), TP_STRUCT__entry( WIPHY_ENTRY WDEV_ENTRY __field(int, radio_idx) __field(unsigned int, link_id) ), TP_fast_assign( WIPHY_ASSIGN; WDEV_ASSIGN; __entry->radio_idx = radio_idx; __entry->link_id = link_id; ), TP_printk(WIPHY_PR_FMT ", " WDEV_PR_FMT ", radio_idx: %d, link_id: %u", WIPHY_PR_ARG, WDEV_PR_ARG, __entry->radio_idx, __entry->link_id) ); TRACE_EVENT(rdev_set_tx_power, TP_PROTO(struct wiphy *wiphy, struct wireless_dev *wdev, int radio_idx, enum nl80211_tx_power_setting type, int mbm), TP_ARGS(wiphy, wdev, radio_idx, type, mbm), TP_STRUCT__entry( WIPHY_ENTRY WDEV_ENTRY __field(int, radio_idx) __field(enum nl80211_tx_power_setting, type) __field(int, mbm) ), TP_fast_assign( WIPHY_ASSIGN; WDEV_ASSIGN; __entry->radio_idx = radio_idx; __entry->type = type; __entry->mbm = mbm; ), TP_printk(WIPHY_PR_FMT ", " WDEV_PR_FMT ", radio_idx: %d, type: %u, mbm: %d", WIPHY_PR_ARG, WDEV_PR_ARG, __entry->radio_idx, __entry->type, __entry->mbm) ); TRACE_EVENT(rdev_return_int_int, TP_PROTO(struct wiphy *wiphy, int func_ret, int func_fill), TP_ARGS(wiphy, func_ret, func_fill), TP_STRUCT__entry( WIPHY_ENTRY __field(int, func_ret) __field(int, func_fill) ), TP_fast_assign( WIPHY_ASSIGN; __entry->func_ret = func_ret; __entry->func_fill = func_fill; ), TP_printk(WIPHY_PR_FMT ", function returns: %d, function filled: %d", WIPHY_PR_ARG, __entry->func_ret, __entry->func_fill) ); #ifdef CONFIG_NL80211_TESTMODE TRACE_EVENT(rdev_testmode_cmd, TP_PROTO(struct wiphy *wiphy, struct wireless_dev *wdev), TP_ARGS(wiphy, wdev), TP_STRUCT__entry( WIPHY_ENTRY WDEV_ENTRY ), TP_fast_assign( WIPHY_ASSIGN; WDEV_ASSIGN; ), TP_printk(WIPHY_PR_FMT WDEV_PR_FMT, WIPHY_PR_ARG, WDEV_PR_ARG) ); TRACE_EVENT(rdev_testmode_dump, TP_PROTO(struct wiphy *wiphy), TP_ARGS(wiphy), TP_STRUCT__entry( WIPHY_ENTRY ), TP_fast_assign( WIPHY_ASSIGN; ), TP_printk(WIPHY_PR_FMT, WIPHY_PR_ARG) ); #endif /* CONFIG_NL80211_TESTMODE */ TRACE_EVENT(rdev_set_bitrate_mask, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, unsigned int link_id, const u8 *peer, const struct cfg80211_bitrate_mask *mask), TP_ARGS(wiphy, netdev, link_id, peer, mask), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY __field(unsigned int, link_id) MAC_ENTRY(peer) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; __entry->link_id = link_id; MAC_ASSIGN(peer, peer); ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", link_id: %d, peer: %pM", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->link_id, __entry->peer) ); TRACE_EVENT(rdev_update_mgmt_frame_registrations, TP_PROTO(struct wiphy *wiphy, struct wireless_dev *wdev, struct mgmt_frame_regs *upd), TP_ARGS(wiphy, wdev, upd), TP_STRUCT__entry( WIPHY_ENTRY WDEV_ENTRY __field(u16, global_stypes) __field(u16, interface_stypes) ), TP_fast_assign( WIPHY_ASSIGN; WDEV_ASSIGN; __entry->global_stypes = upd->global_stypes; __entry->interface_stypes = upd->interface_stypes; ), TP_printk(WIPHY_PR_FMT ", " WDEV_PR_FMT ", global: 0x%.2x, intf: 0x%.2x", WIPHY_PR_ARG, WDEV_PR_ARG, __entry->global_stypes, __entry->interface_stypes) ); TRACE_EVENT(rdev_return_int_tx_rx, TP_PROTO(struct wiphy *wiphy, int ret, u32 tx, u32 rx), TP_ARGS(wiphy, ret, tx, rx), TP_STRUCT__entry( WIPHY_ENTRY __field(int, ret) __field(u32, tx) __field(u32, rx) ), TP_fast_assign( WIPHY_ASSIGN; __entry->ret = ret; __entry->tx = tx; __entry->rx = rx; ), TP_printk(WIPHY_PR_FMT ", returned %d, tx: %u, rx: %u", WIPHY_PR_ARG, __entry->ret, __entry->tx, __entry->rx) ); TRACE_EVENT(rdev_return_void_tx_rx, TP_PROTO(struct wiphy *wiphy, u32 tx, u32 tx_max, u32 rx, u32 rx_max), TP_ARGS(wiphy, tx, tx_max, rx, rx_max), TP_STRUCT__entry( WIPHY_ENTRY __field(u32, tx) __field(u32, tx_max) __field(u32, rx) __field(u32, rx_max) ), TP_fast_assign( WIPHY_ASSIGN; __entry->tx = tx; __entry->tx_max = tx_max; __entry->rx = rx; __entry->rx_max = rx_max; ), TP_printk(WIPHY_PR_FMT ", tx: %u, tx_max: %u, rx: %u, rx_max: %u ", WIPHY_PR_ARG, __entry->tx, __entry->tx_max, __entry->rx, __entry->rx_max) ); TRACE_EVENT(rdev_set_antenna, TP_PROTO(struct wiphy *wiphy, int radio_idx, u32 tx, u32 rx), TP_ARGS(wiphy, radio_idx, tx, rx), TP_STRUCT__entry( WIPHY_ENTRY __field(int, radio_idx) __field(u32, tx) __field(u32, rx) ), TP_fast_assign( WIPHY_ASSIGN; __entry->radio_idx = radio_idx; __entry->tx = tx; __entry->rx = rx; ), TP_printk(WIPHY_PR_FMT ", radio_idx: %d, tx: %u, rx: %u ", WIPHY_PR_ARG, __entry->radio_idx, __entry->tx, __entry->rx) ); DECLARE_EVENT_CLASS(wiphy_netdev_id_evt, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, u64 id), TP_ARGS(wiphy, netdev, id), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY __field(u64, id) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; __entry->id = id; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", id: %llu", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->id) ); DEFINE_EVENT(wiphy_netdev_id_evt, rdev_sched_scan_start, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, u64 id), TP_ARGS(wiphy, netdev, id) ); DEFINE_EVENT(wiphy_netdev_id_evt, rdev_sched_scan_stop, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, u64 id), TP_ARGS(wiphy, netdev, id) ); TRACE_EVENT(rdev_tdls_mgmt, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, u8 *peer, int link_id, u8 action_code, u8 dialog_token, u16 status_code, u32 peer_capability, bool initiator, const u8 *buf, size_t len), TP_ARGS(wiphy, netdev, peer, link_id, action_code, dialog_token, status_code, peer_capability, initiator, buf, len), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY MAC_ENTRY(peer) __field(int, link_id) __field(u8, action_code) __field(u8, dialog_token) __field(u16, status_code) __field(u32, peer_capability) __field(bool, initiator) __dynamic_array(u8, buf, len) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; MAC_ASSIGN(peer, peer); __entry->link_id = link_id; __entry->action_code = action_code; __entry->dialog_token = dialog_token; __entry->status_code = status_code; __entry->peer_capability = peer_capability; __entry->initiator = initiator; memcpy(__get_dynamic_array(buf), buf, len); ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", %pM" ", link_id: %d, action_code: %u " "dialog_token: %u, status_code: %u, peer_capability: %u " "initiator: %s buf: %#.2x ", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->peer, __entry->link_id, __entry->action_code, __entry->dialog_token, __entry->status_code, __entry->peer_capability, BOOL_TO_STR(__entry->initiator), ((u8 *)__get_dynamic_array(buf))[0]) ); TRACE_EVENT(rdev_dump_survey, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, int _idx), TP_ARGS(wiphy, netdev, _idx), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY __field(int, idx) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; __entry->idx = _idx; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", index: %d", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->idx) ); TRACE_EVENT(rdev_return_int_survey_info, TP_PROTO(struct wiphy *wiphy, int ret, struct survey_info *info), TP_ARGS(wiphy, ret, info), TP_STRUCT__entry( WIPHY_ENTRY CHAN_ENTRY __field(int, ret) __field(u64, time) __field(u64, time_busy) __field(u64, time_ext_busy) __field(u64, time_rx) __field(u64, time_tx) __field(u64, time_scan) __field(u32, filled) __field(s8, noise) ), TP_fast_assign( WIPHY_ASSIGN; CHAN_ASSIGN(info->channel); __entry->ret = ret; __entry->time = info->time; __entry->time_busy = info->time_busy; __entry->time_ext_busy = info->time_ext_busy; __entry->time_rx = info->time_rx; __entry->time_tx = info->time_tx; __entry->time_scan = info->time_scan; __entry->filled = info->filled; __entry->noise = info->noise; ), TP_printk(WIPHY_PR_FMT ", returned: %d, " CHAN_PR_FMT ", channel time: %llu, channel time busy: %llu, " "channel time extension busy: %llu, channel time rx: %llu, " "channel time tx: %llu, scan time: %llu, filled: %u, noise: %d", WIPHY_PR_ARG, __entry->ret, CHAN_PR_ARG, __entry->time, __entry->time_busy, __entry->time_ext_busy, __entry->time_rx, __entry->time_tx, __entry->time_scan, __entry->filled, __entry->noise) ); TRACE_EVENT(rdev_tdls_oper, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, u8 *peer, enum nl80211_tdls_operation oper), TP_ARGS(wiphy, netdev, peer, oper), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY MAC_ENTRY(peer) __field(enum nl80211_tdls_operation, oper) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; MAC_ASSIGN(peer, peer); __entry->oper = oper; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", %pM, oper: %d", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->peer, __entry->oper) ); DECLARE_EVENT_CLASS(rdev_pmksa, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct cfg80211_pmksa *pmksa), TP_ARGS(wiphy, netdev, pmksa), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY MAC_ENTRY(bssid) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; MAC_ASSIGN(bssid, pmksa->bssid); ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", bssid: %pM", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->bssid) ); TRACE_EVENT(rdev_probe_client, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, const u8 *peer), TP_ARGS(wiphy, netdev, peer), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY MAC_ENTRY(peer) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; MAC_ASSIGN(peer, peer); ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", %pM", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->peer) ); DEFINE_EVENT(rdev_pmksa, rdev_set_pmksa, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct cfg80211_pmksa *pmksa), TP_ARGS(wiphy, netdev, pmksa) ); DEFINE_EVENT(rdev_pmksa, rdev_del_pmksa, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct cfg80211_pmksa *pmksa), TP_ARGS(wiphy, netdev, pmksa) ); TRACE_EVENT(rdev_remain_on_channel, TP_PROTO(struct wiphy *wiphy, struct wireless_dev *wdev, struct ieee80211_channel *chan, unsigned int duration), TP_ARGS(wiphy, wdev, chan, duration), TP_STRUCT__entry( WIPHY_ENTRY WDEV_ENTRY CHAN_ENTRY __field(unsigned int, duration) ), TP_fast_assign( WIPHY_ASSIGN; WDEV_ASSIGN; CHAN_ASSIGN(chan); __entry->duration = duration; ), TP_printk(WIPHY_PR_FMT ", " WDEV_PR_FMT ", " CHAN_PR_FMT ", duration: %u", WIPHY_PR_ARG, WDEV_PR_ARG, CHAN_PR_ARG, __entry->duration) ); TRACE_EVENT(rdev_return_int_cookie, TP_PROTO(struct wiphy *wiphy, int ret, u64 cookie), TP_ARGS(wiphy, ret, cookie), TP_STRUCT__entry( WIPHY_ENTRY __field(int, ret) __field(u64, cookie) ), TP_fast_assign( WIPHY_ASSIGN; __entry->ret = ret; __entry->cookie = cookie; ), TP_printk(WIPHY_PR_FMT ", returned %d, cookie: %llu", WIPHY_PR_ARG, __entry->ret, __entry->cookie) ); TRACE_EVENT(rdev_cancel_remain_on_channel, TP_PROTO(struct wiphy *wiphy, struct wireless_dev *wdev, u64 cookie), TP_ARGS(wiphy, wdev, cookie), TP_STRUCT__entry( WIPHY_ENTRY WDEV_ENTRY __field(u64, cookie) ), TP_fast_assign( WIPHY_ASSIGN; WDEV_ASSIGN; __entry->cookie = cookie; ), TP_printk(WIPHY_PR_FMT ", " WDEV_PR_FMT ", cookie: %llu", WIPHY_PR_ARG, WDEV_PR_ARG, __entry->cookie) ); TRACE_EVENT(rdev_mgmt_tx, TP_PROTO(struct wiphy *wiphy, struct wireless_dev *wdev, struct cfg80211_mgmt_tx_params *params), TP_ARGS(wiphy, wdev, params), TP_STRUCT__entry( WIPHY_ENTRY WDEV_ENTRY CHAN_ENTRY __field(bool, offchan) __field(unsigned int, wait) __field(bool, no_cck) __field(bool, dont_wait_for_ack) ), TP_fast_assign( WIPHY_ASSIGN; WDEV_ASSIGN; CHAN_ASSIGN(params->chan); __entry->offchan = params->offchan; __entry->wait = params->wait; __entry->no_cck = params->no_cck; __entry->dont_wait_for_ack = params->dont_wait_for_ack; ), TP_printk(WIPHY_PR_FMT ", " WDEV_PR_FMT ", " CHAN_PR_FMT ", offchan: %s," " wait: %u, no cck: %s, dont wait for ack: %s", WIPHY_PR_ARG, WDEV_PR_ARG, CHAN_PR_ARG, BOOL_TO_STR(__entry->offchan), __entry->wait, BOOL_TO_STR(__entry->no_cck), BOOL_TO_STR(__entry->dont_wait_for_ack)) ); TRACE_EVENT(rdev_tx_control_port, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, const u8 *buf, size_t len, const u8 *dest, __be16 proto, bool unencrypted, int link_id), TP_ARGS(wiphy, netdev, buf, len, dest, proto, unencrypted, link_id), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY MAC_ENTRY(dest) __field(__be16, proto) __field(bool, unencrypted) __field(int, link_id) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; MAC_ASSIGN(dest, dest); __entry->proto = proto; __entry->unencrypted = unencrypted; __entry->link_id = link_id; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", %pM," " proto: 0x%x, unencrypted: %s, link: %d", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->dest, be16_to_cpu(__entry->proto), BOOL_TO_STR(__entry->unencrypted), __entry->link_id) ); TRACE_EVENT(rdev_set_noack_map, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, u16 noack_map), TP_ARGS(wiphy, netdev, noack_map), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY __field(u16, noack_map) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; __entry->noack_map = noack_map; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", noack_map: %u", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->noack_map) ); DEFINE_EVENT(wiphy_wdev_link_evt, rdev_get_channel, TP_PROTO(struct wiphy *wiphy, struct wireless_dev *wdev, unsigned int link_id), TP_ARGS(wiphy, wdev, link_id) ); TRACE_EVENT(rdev_return_chandef, TP_PROTO(struct wiphy *wiphy, int ret, struct cfg80211_chan_def *chandef), TP_ARGS(wiphy, ret, chandef), TP_STRUCT__entry( WIPHY_ENTRY __field(int, ret) CHAN_DEF_ENTRY ), TP_fast_assign( WIPHY_ASSIGN; if (ret == 0) CHAN_DEF_ASSIGN(chandef); else CHAN_DEF_ASSIGN((struct cfg80211_chan_def *)NULL); __entry->ret = ret; ), TP_printk(WIPHY_PR_FMT ", " CHAN_DEF_PR_FMT ", ret: %d", WIPHY_PR_ARG, CHAN_DEF_PR_ARG, __entry->ret) ); DEFINE_EVENT(wiphy_wdev_evt, rdev_start_p2p_device, TP_PROTO(struct wiphy *wiphy, struct wireless_dev *wdev), TP_ARGS(wiphy, wdev) ); DEFINE_EVENT(wiphy_wdev_evt, rdev_stop_p2p_device, TP_PROTO(struct wiphy *wiphy, struct wireless_dev *wdev), TP_ARGS(wiphy, wdev) ); TRACE_EVENT(rdev_start_nan, TP_PROTO(struct wiphy *wiphy, struct wireless_dev *wdev, struct cfg80211_nan_conf *conf), TP_ARGS(wiphy, wdev, conf), TP_STRUCT__entry( WIPHY_ENTRY WDEV_ENTRY __field(u8, master_pref) __field(u8, bands) ), TP_fast_assign( WIPHY_ASSIGN; WDEV_ASSIGN; __entry->master_pref = conf->master_pref; __entry->bands = conf->bands; ), TP_printk(WIPHY_PR_FMT ", " WDEV_PR_FMT ", master preference: %u, bands: 0x%0x", WIPHY_PR_ARG, WDEV_PR_ARG, __entry->master_pref, __entry->bands) ); TRACE_EVENT(rdev_nan_change_conf, TP_PROTO(struct wiphy *wiphy, struct wireless_dev *wdev, struct cfg80211_nan_conf *conf, u32 changes), TP_ARGS(wiphy, wdev, conf, changes), TP_STRUCT__entry( WIPHY_ENTRY WDEV_ENTRY __field(u8, master_pref) __field(u8, bands) __field(u32, changes) ), TP_fast_assign( WIPHY_ASSIGN; WDEV_ASSIGN; __entry->master_pref = conf->master_pref; __entry->bands = conf->bands; __entry->changes = changes; ), TP_printk(WIPHY_PR_FMT ", " WDEV_PR_FMT ", master preference: %u, bands: 0x%0x, changes: %x", WIPHY_PR_ARG, WDEV_PR_ARG, __entry->master_pref, __entry->bands, __entry->changes) ); DEFINE_EVENT(wiphy_wdev_evt, rdev_stop_nan, TP_PROTO(struct wiphy *wiphy, struct wireless_dev *wdev), TP_ARGS(wiphy, wdev) ); TRACE_EVENT(rdev_add_nan_func, TP_PROTO(struct wiphy *wiphy, struct wireless_dev *wdev, const struct cfg80211_nan_func *func), TP_ARGS(wiphy, wdev, func), TP_STRUCT__entry( WIPHY_ENTRY WDEV_ENTRY __field(u8, func_type) __field(u64, cookie) ), TP_fast_assign( WIPHY_ASSIGN; WDEV_ASSIGN; __entry->func_type = func->type; __entry->cookie = func->cookie ), TP_printk(WIPHY_PR_FMT ", " WDEV_PR_FMT ", type=%u, cookie=%llu", WIPHY_PR_ARG, WDEV_PR_ARG, __entry->func_type, __entry->cookie) ); TRACE_EVENT(rdev_del_nan_func, TP_PROTO(struct wiphy *wiphy, struct wireless_dev *wdev, u64 cookie), TP_ARGS(wiphy, wdev, cookie), TP_STRUCT__entry( WIPHY_ENTRY WDEV_ENTRY __field(u64, cookie) ), TP_fast_assign( WIPHY_ASSIGN; WDEV_ASSIGN; __entry->cookie = cookie; ), TP_printk(WIPHY_PR_FMT ", " WDEV_PR_FMT ", cookie=%llu", WIPHY_PR_ARG, WDEV_PR_ARG, __entry->cookie) ); TRACE_EVENT(rdev_set_mac_acl, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct cfg80211_acl_data *params), TP_ARGS(wiphy, netdev, params), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY __field(u32, acl_policy) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; __entry->acl_policy = params->acl_policy; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", acl policy: %d", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->acl_policy) ); TRACE_EVENT(rdev_update_ft_ies, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct cfg80211_update_ft_ies_params *ftie), TP_ARGS(wiphy, netdev, ftie), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY __field(u16, md) __dynamic_array(u8, ie, ftie->ie_len) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; __entry->md = ftie->md; memcpy(__get_dynamic_array(ie), ftie->ie, ftie->ie_len); ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", md: 0x%x", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->md) ); TRACE_EVENT(rdev_crit_proto_start, TP_PROTO(struct wiphy *wiphy, struct wireless_dev *wdev, enum nl80211_crit_proto_id protocol, u16 duration), TP_ARGS(wiphy, wdev, protocol, duration), TP_STRUCT__entry( WIPHY_ENTRY WDEV_ENTRY __field(u16, proto) __field(u16, duration) ), TP_fast_assign( WIPHY_ASSIGN; WDEV_ASSIGN; __entry->proto = protocol; __entry->duration = duration; ), TP_printk(WIPHY_PR_FMT ", " WDEV_PR_FMT ", proto=%x, duration=%u", WIPHY_PR_ARG, WDEV_PR_ARG, __entry->proto, __entry->duration) ); TRACE_EVENT(rdev_crit_proto_stop, TP_PROTO(struct wiphy *wiphy, struct wireless_dev *wdev), TP_ARGS(wiphy, wdev), TP_STRUCT__entry( WIPHY_ENTRY WDEV_ENTRY ), TP_fast_assign( WIPHY_ASSIGN; WDEV_ASSIGN; ), TP_printk(WIPHY_PR_FMT ", " WDEV_PR_FMT, WIPHY_PR_ARG, WDEV_PR_ARG) ); TRACE_EVENT(rdev_channel_switch, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct cfg80211_csa_settings *params), TP_ARGS(wiphy, netdev, params), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY CHAN_DEF_ENTRY __field(bool, radar_required) __field(bool, block_tx) __field(u8, count) __dynamic_array(u16, bcn_ofs, params->n_counter_offsets_beacon) __dynamic_array(u16, pres_ofs, params->n_counter_offsets_presp) __field(u8, link_id) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; CHAN_DEF_ASSIGN(¶ms->chandef); __entry->radar_required = params->radar_required; __entry->block_tx = params->block_tx; __entry->count = params->count; memcpy(__get_dynamic_array(bcn_ofs), params->counter_offsets_beacon, params->n_counter_offsets_beacon * sizeof(u16)); /* probe response offsets are optional */ if (params->n_counter_offsets_presp) memcpy(__get_dynamic_array(pres_ofs), params->counter_offsets_presp, params->n_counter_offsets_presp * sizeof(u16)); __entry->link_id = params->link_id; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", " CHAN_DEF_PR_FMT ", block_tx: %d, count: %u, radar_required: %d, link_id: %d", WIPHY_PR_ARG, NETDEV_PR_ARG, CHAN_DEF_PR_ARG, __entry->block_tx, __entry->count, __entry->radar_required, __entry->link_id) ); TRACE_EVENT(rdev_set_qos_map, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct cfg80211_qos_map *qos_map), TP_ARGS(wiphy, netdev, qos_map), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY QOS_MAP_ENTRY ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; QOS_MAP_ASSIGN(qos_map); ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", num_des: %u", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->num_des) ); TRACE_EVENT(rdev_set_ap_chanwidth, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, unsigned int link_id, struct cfg80211_chan_def *chandef), TP_ARGS(wiphy, netdev, link_id, chandef), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY CHAN_DEF_ENTRY __field(unsigned int, link_id) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; CHAN_DEF_ASSIGN(chandef); __entry->link_id = link_id; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", " CHAN_DEF_PR_FMT ", link:%d", WIPHY_PR_ARG, NETDEV_PR_ARG, CHAN_DEF_PR_ARG, __entry->link_id) ); TRACE_EVENT(rdev_add_tx_ts, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, u8 tsid, const u8 *peer, u8 user_prio, u16 admitted_time), TP_ARGS(wiphy, netdev, tsid, peer, user_prio, admitted_time), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY MAC_ENTRY(peer) __field(u8, tsid) __field(u8, user_prio) __field(u16, admitted_time) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; MAC_ASSIGN(peer, peer); __entry->tsid = tsid; __entry->user_prio = user_prio; __entry->admitted_time = admitted_time; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", %pM, TSID %d, UP %d, time %d", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->peer, __entry->tsid, __entry->user_prio, __entry->admitted_time) ); TRACE_EVENT(rdev_del_tx_ts, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, u8 tsid, const u8 *peer), TP_ARGS(wiphy, netdev, tsid, peer), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY MAC_ENTRY(peer) __field(u8, tsid) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; MAC_ASSIGN(peer, peer); __entry->tsid = tsid; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", %pM, TSID %d", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->peer, __entry->tsid) ); TRACE_EVENT(rdev_tdls_channel_switch, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, const u8 *addr, u8 oper_class, struct cfg80211_chan_def *chandef), TP_ARGS(wiphy, netdev, addr, oper_class, chandef), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY MAC_ENTRY(addr) __field(u8, oper_class) CHAN_DEF_ENTRY ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; MAC_ASSIGN(addr, addr); CHAN_DEF_ASSIGN(chandef); ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", %pM" " oper class %d, " CHAN_DEF_PR_FMT, WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->addr, __entry->oper_class, CHAN_DEF_PR_ARG) ); TRACE_EVENT(rdev_tdls_cancel_channel_switch, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, const u8 *addr), TP_ARGS(wiphy, netdev, addr), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY MAC_ENTRY(addr) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; MAC_ASSIGN(addr, addr); ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", %pM", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->addr) ); TRACE_EVENT(rdev_set_pmk, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct cfg80211_pmk_conf *pmk_conf), TP_ARGS(wiphy, netdev, pmk_conf), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY MAC_ENTRY(aa) __field(u8, pmk_len) __field(u8, pmk_r0_name_len) __dynamic_array(u8, pmk, pmk_conf->pmk_len) __dynamic_array(u8, pmk_r0_name, WLAN_PMK_NAME_LEN) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; MAC_ASSIGN(aa, pmk_conf->aa); __entry->pmk_len = pmk_conf->pmk_len; __entry->pmk_r0_name_len = pmk_conf->pmk_r0_name ? WLAN_PMK_NAME_LEN : 0; memcpy(__get_dynamic_array(pmk), pmk_conf->pmk, pmk_conf->pmk_len); memcpy(__get_dynamic_array(pmk_r0_name), pmk_conf->pmk_r0_name, pmk_conf->pmk_r0_name ? WLAN_PMK_NAME_LEN : 0); ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", %pM" "pmk_len=%u, pmk: %s pmk_r0_name: %s", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->aa, __entry->pmk_len, __print_array(__get_dynamic_array(pmk), __get_dynamic_array_len(pmk), 1), __entry->pmk_r0_name_len ? __print_array(__get_dynamic_array(pmk_r0_name), __get_dynamic_array_len(pmk_r0_name), 1) : "") ); TRACE_EVENT(rdev_del_pmk, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, const u8 *aa), TP_ARGS(wiphy, netdev, aa), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY MAC_ENTRY(aa) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; MAC_ASSIGN(aa, aa); ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", %pM", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->aa) ); TRACE_EVENT(rdev_external_auth, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct cfg80211_external_auth_params *params), TP_ARGS(wiphy, netdev, params), TP_STRUCT__entry(WIPHY_ENTRY NETDEV_ENTRY MAC_ENTRY(bssid) __array(u8, ssid, IEEE80211_MAX_SSID_LEN + 1) __field(u16, status) MAC_ENTRY(mld_addr) ), TP_fast_assign(WIPHY_ASSIGN; NETDEV_ASSIGN; MAC_ASSIGN(bssid, params->bssid); memset(__entry->ssid, 0, IEEE80211_MAX_SSID_LEN + 1); memcpy(__entry->ssid, params->ssid.ssid, params->ssid.ssid_len); __entry->status = params->status; MAC_ASSIGN(mld_addr, params->mld_addr); ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", bssid: %pM" ", ssid: %s, status: %u, mld_addr: %pM", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->bssid, __entry->ssid, __entry->status, __entry->mld_addr) ); TRACE_EVENT(rdev_start_radar_detection, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct cfg80211_chan_def *chandef, u32 cac_time_ms, int link_id), TP_ARGS(wiphy, netdev, chandef, cac_time_ms, link_id), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY CHAN_DEF_ENTRY __field(u32, cac_time_ms) __field(int, link_id) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; CHAN_DEF_ASSIGN(chandef); __entry->cac_time_ms = cac_time_ms; __entry->link_id = link_id; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", " CHAN_DEF_PR_FMT ", cac_time_ms=%u, link_id=%d", WIPHY_PR_ARG, NETDEV_PR_ARG, CHAN_DEF_PR_ARG, __entry->cac_time_ms, __entry->link_id) ); TRACE_EVENT(rdev_set_mcast_rate, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, int *mcast_rate), TP_ARGS(wiphy, netdev, mcast_rate), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY __array(int, mcast_rate, NUM_NL80211_BANDS) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; memcpy(__entry->mcast_rate, mcast_rate, sizeof(int) * NUM_NL80211_BANDS); ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", " "mcast_rates [2.4GHz=0x%x, 5.2GHz=0x%x, 6GHz=0x%x, 60GHz=0x%x]", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->mcast_rate[NL80211_BAND_2GHZ], __entry->mcast_rate[NL80211_BAND_5GHZ], __entry->mcast_rate[NL80211_BAND_6GHZ], __entry->mcast_rate[NL80211_BAND_60GHZ]) ); TRACE_EVENT(rdev_set_coalesce, TP_PROTO(struct wiphy *wiphy, struct cfg80211_coalesce *coalesce), TP_ARGS(wiphy, coalesce), TP_STRUCT__entry( WIPHY_ENTRY __field(int, n_rules) ), TP_fast_assign( WIPHY_ASSIGN; __entry->n_rules = coalesce ? coalesce->n_rules : 0; ), TP_printk(WIPHY_PR_FMT ", n_rules=%d", WIPHY_PR_ARG, __entry->n_rules) ); DEFINE_EVENT(wiphy_wdev_evt, rdev_abort_scan, TP_PROTO(struct wiphy *wiphy, struct wireless_dev *wdev), TP_ARGS(wiphy, wdev) ); TRACE_EVENT(rdev_set_multicast_to_unicast, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, const bool enabled), TP_ARGS(wiphy, netdev, enabled), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY __field(bool, enabled) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; __entry->enabled = enabled; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", unicast: %s", WIPHY_PR_ARG, NETDEV_PR_ARG, BOOL_TO_STR(__entry->enabled)) ); DEFINE_EVENT(wiphy_wdev_evt, rdev_get_txq_stats, TP_PROTO(struct wiphy *wiphy, struct wireless_dev *wdev), TP_ARGS(wiphy, wdev) ); TRACE_EVENT(rdev_get_ftm_responder_stats, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct cfg80211_ftm_responder_stats *ftm_stats), TP_ARGS(wiphy, netdev, ftm_stats), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY __field(u64, timestamp) __field(u32, success_num) __field(u32, partial_num) __field(u32, failed_num) __field(u32, asap_num) __field(u32, non_asap_num) __field(u64, duration) __field(u32, unknown_triggers) __field(u32, reschedule) __field(u32, out_of_window) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; __entry->success_num = ftm_stats->success_num; __entry->partial_num = ftm_stats->partial_num; __entry->failed_num = ftm_stats->failed_num; __entry->asap_num = ftm_stats->asap_num; __entry->non_asap_num = ftm_stats->non_asap_num; __entry->duration = ftm_stats->total_duration_ms; __entry->unknown_triggers = ftm_stats->unknown_triggers_num; __entry->reschedule = ftm_stats->reschedule_requests_num; __entry->out_of_window = ftm_stats->out_of_window_triggers_num; ), TP_printk(WIPHY_PR_FMT "Ftm responder stats: success %u, partial %u, " "failed %u, asap %u, non asap %u, total duration %llu, unknown " "triggers %u, rescheduled %u, out of window %u", WIPHY_PR_ARG, __entry->success_num, __entry->partial_num, __entry->failed_num, __entry->asap_num, __entry->non_asap_num, __entry->duration, __entry->unknown_triggers, __entry->reschedule, __entry->out_of_window) ); DEFINE_EVENT(wiphy_wdev_cookie_evt, rdev_start_pmsr, TP_PROTO(struct wiphy *wiphy, struct wireless_dev *wdev, u64 cookie), TP_ARGS(wiphy, wdev, cookie) ); DEFINE_EVENT(wiphy_wdev_cookie_evt, rdev_abort_pmsr, TP_PROTO(struct wiphy *wiphy, struct wireless_dev *wdev, u64 cookie), TP_ARGS(wiphy, wdev, cookie) ); TRACE_EVENT(rdev_set_fils_aad, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct cfg80211_fils_aad *fils_aad), TP_ARGS(wiphy, netdev, fils_aad), TP_STRUCT__entry(WIPHY_ENTRY NETDEV_ENTRY __array(u8, macaddr, ETH_ALEN) __field(u8, kek_len) ), TP_fast_assign(WIPHY_ASSIGN; NETDEV_ASSIGN; FILS_AAD_ASSIGN(fils_aad); ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", " FILS_AAD_PR_FMT, WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->macaddr, __entry->kek_len) ); TRACE_EVENT(rdev_update_owe_info, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct cfg80211_update_owe_info *owe_info), TP_ARGS(wiphy, netdev, owe_info), TP_STRUCT__entry(WIPHY_ENTRY NETDEV_ENTRY MAC_ENTRY(peer) __field(u16, status) __dynamic_array(u8, ie, owe_info->ie_len)), TP_fast_assign(WIPHY_ASSIGN; NETDEV_ASSIGN; MAC_ASSIGN(peer, owe_info->peer); __entry->status = owe_info->status; memcpy(__get_dynamic_array(ie), owe_info->ie, owe_info->ie_len);), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", peer: %pM" " status %d", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->peer, __entry->status) ); TRACE_EVENT(rdev_probe_mesh_link, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, const u8 *dest, const u8 *buf, size_t len), TP_ARGS(wiphy, netdev, dest, buf, len), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY MAC_ENTRY(dest) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; MAC_ASSIGN(dest, dest); ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", %pM", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->dest) ); TRACE_EVENT(rdev_set_tid_config, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct cfg80211_tid_config *tid_conf), TP_ARGS(wiphy, netdev, tid_conf), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY MAC_ENTRY(peer) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; MAC_ASSIGN(peer, tid_conf->peer); ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", peer: %pM", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->peer) ); TRACE_EVENT(rdev_reset_tid_config, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, const u8 *peer, u8 tids), TP_ARGS(wiphy, netdev, peer, tids), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY MAC_ENTRY(peer) __field(u8, tids) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; MAC_ASSIGN(peer, peer); __entry->tids = tids; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", peer: %pM, tids: 0x%x", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->peer, __entry->tids) ); TRACE_EVENT(rdev_set_sar_specs, TP_PROTO(struct wiphy *wiphy, struct cfg80211_sar_specs *sar), TP_ARGS(wiphy, sar), TP_STRUCT__entry( WIPHY_ENTRY __field(u16, type) __field(u16, num) ), TP_fast_assign( WIPHY_ASSIGN; __entry->type = sar->type; __entry->num = sar->num_sub_specs; ), TP_printk(WIPHY_PR_FMT ", Set type:%d, num_specs:%d", WIPHY_PR_ARG, __entry->type, __entry->num) ); TRACE_EVENT(rdev_color_change, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct cfg80211_color_change_settings *params), TP_ARGS(wiphy, netdev, params), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY __field(u8, count) __field(u16, bcn_ofs) __field(u16, pres_ofs) __field(u8, link_id) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; __entry->count = params->count; __entry->bcn_ofs = params->counter_offset_beacon; __entry->pres_ofs = params->counter_offset_presp; __entry->link_id = params->link_id; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", count: %u, link_id: %d", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->count, __entry->link_id) ); TRACE_EVENT(rdev_set_radar_background, TP_PROTO(struct wiphy *wiphy, struct cfg80211_chan_def *chandef), TP_ARGS(wiphy, chandef), TP_STRUCT__entry( WIPHY_ENTRY CHAN_DEF_ENTRY ), TP_fast_assign( WIPHY_ASSIGN; CHAN_DEF_ASSIGN(chandef) ), TP_printk(WIPHY_PR_FMT ", " CHAN_DEF_PR_FMT, WIPHY_PR_ARG, CHAN_DEF_PR_ARG) ); DEFINE_EVENT(wiphy_wdev_link_evt, rdev_add_intf_link, TP_PROTO(struct wiphy *wiphy, struct wireless_dev *wdev, unsigned int link_id), TP_ARGS(wiphy, wdev, link_id) ); DEFINE_EVENT(wiphy_wdev_link_evt, rdev_del_intf_link, TP_PROTO(struct wiphy *wiphy, struct wireless_dev *wdev, unsigned int link_id), TP_ARGS(wiphy, wdev, link_id) ); TRACE_EVENT(rdev_del_link_station, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct link_station_del_parameters *params), TP_ARGS(wiphy, netdev, params), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY __array(u8, mld_mac, 6) __field(u32, link_id) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; memset(__entry->mld_mac, 0, 6); if (params->mld_mac) memcpy(__entry->mld_mac, params->mld_mac, 6); __entry->link_id = params->link_id; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", station mac: %pM" ", link id: %u", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->mld_mac, __entry->link_id) ); TRACE_EVENT(rdev_set_hw_timestamp, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct cfg80211_set_hw_timestamp *hwts), TP_ARGS(wiphy, netdev, hwts), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY MAC_ENTRY(macaddr) __field(bool, enable) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; MAC_ASSIGN(macaddr, hwts->macaddr); __entry->enable = hwts->enable; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", mac %pM, enable: %u", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->macaddr, __entry->enable) ); TRACE_EVENT(rdev_set_ttlm, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct cfg80211_ttlm_params *params), TP_ARGS(wiphy, netdev, params), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY __array(u8, dlink, sizeof(u16) * 8) __array(u8, ulink, sizeof(u16) * 8) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; memcpy(__entry->dlink, params->dlink, sizeof(params->dlink)); memcpy(__entry->ulink, params->ulink, sizeof(params->ulink)); ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT, WIPHY_PR_ARG, NETDEV_PR_ARG) ); TRACE_EVENT(rdev_set_epcs, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, bool val), TP_ARGS(wiphy, netdev, val), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY __field(bool, val) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; __entry->val = val; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", config=%u", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->val) ); /************************************************************* * cfg80211 exported functions traces * *************************************************************/ TRACE_EVENT(cfg80211_return_bool, TP_PROTO(bool ret), TP_ARGS(ret), TP_STRUCT__entry( __field(bool, ret) ), TP_fast_assign( __entry->ret = ret; ), TP_printk("returned %s", BOOL_TO_STR(__entry->ret)) ); DECLARE_EVENT_CLASS(cfg80211_netdev_mac_evt, TP_PROTO(struct net_device *netdev, const u8 *macaddr), TP_ARGS(netdev, macaddr), TP_STRUCT__entry( NETDEV_ENTRY MAC_ENTRY(macaddr) ), TP_fast_assign( NETDEV_ASSIGN; MAC_ASSIGN(macaddr, macaddr); ), TP_printk(NETDEV_PR_FMT ", mac: %pM", NETDEV_PR_ARG, __entry->macaddr) ); DEFINE_EVENT(cfg80211_netdev_mac_evt, cfg80211_notify_new_peer_candidate, TP_PROTO(struct net_device *netdev, const u8 *macaddr), TP_ARGS(netdev, macaddr) ); TRACE_EVENT(cfg80211_send_rx_assoc, TP_PROTO(struct net_device *netdev, const struct cfg80211_rx_assoc_resp_data *data), TP_ARGS(netdev, data), TP_STRUCT__entry( NETDEV_ENTRY MAC_ENTRY(ap_addr) ), TP_fast_assign( NETDEV_ASSIGN; MAC_ASSIGN(ap_addr, data->ap_mld_addr ?: data->links[0].bss->bssid); ), TP_printk(NETDEV_PR_FMT ", %pM", NETDEV_PR_ARG, __entry->ap_addr) ); DECLARE_EVENT_CLASS(netdev_frame_event, TP_PROTO(struct net_device *netdev, const u8 *buf, int len), TP_ARGS(netdev, buf, len), TP_STRUCT__entry( NETDEV_ENTRY __dynamic_array(u8, frame, len) ), TP_fast_assign( NETDEV_ASSIGN; memcpy(__get_dynamic_array(frame), buf, len); ), TP_printk(NETDEV_PR_FMT ", ftype:0x%.2x", NETDEV_PR_ARG, le16_to_cpup((__le16 *)__get_dynamic_array(frame))) ); DEFINE_EVENT(netdev_frame_event, cfg80211_rx_unprot_mlme_mgmt, TP_PROTO(struct net_device *netdev, const u8 *buf, int len), TP_ARGS(netdev, buf, len) ); DEFINE_EVENT(netdev_frame_event, cfg80211_rx_mlme_mgmt, TP_PROTO(struct net_device *netdev, const u8 *buf, int len), TP_ARGS(netdev, buf, len) ); TRACE_EVENT(cfg80211_tx_mlme_mgmt, TP_PROTO(struct net_device *netdev, const u8 *buf, int len, bool reconnect), TP_ARGS(netdev, buf, len, reconnect), TP_STRUCT__entry( NETDEV_ENTRY __dynamic_array(u8, frame, len) __field(int, reconnect) ), TP_fast_assign( NETDEV_ASSIGN; memcpy(__get_dynamic_array(frame), buf, len); __entry->reconnect = reconnect; ), TP_printk(NETDEV_PR_FMT ", ftype:0x%.2x reconnect:%d", NETDEV_PR_ARG, le16_to_cpup((__le16 *)__get_dynamic_array(frame)), __entry->reconnect) ); DECLARE_EVENT_CLASS(netdev_mac_evt, TP_PROTO(struct net_device *netdev, const u8 *mac), TP_ARGS(netdev, mac), TP_STRUCT__entry( NETDEV_ENTRY MAC_ENTRY(mac) ), TP_fast_assign( NETDEV_ASSIGN; MAC_ASSIGN(mac, mac) ), TP_printk(NETDEV_PR_FMT ", mac: %pM", NETDEV_PR_ARG, __entry->mac) ); DEFINE_EVENT(netdev_mac_evt, cfg80211_send_auth_timeout, TP_PROTO(struct net_device *netdev, const u8 *mac), TP_ARGS(netdev, mac) ); TRACE_EVENT(cfg80211_send_assoc_failure, TP_PROTO(struct net_device *netdev, struct cfg80211_assoc_failure *data), TP_ARGS(netdev, data), TP_STRUCT__entry( NETDEV_ENTRY MAC_ENTRY(ap_addr) __field(bool, timeout) ), TP_fast_assign( NETDEV_ASSIGN; MAC_ASSIGN(ap_addr, data->ap_mld_addr ?: data->bss[0]->bssid); __entry->timeout = data->timeout; ), TP_printk(NETDEV_PR_FMT ", mac: %pM, timeout: %d", NETDEV_PR_ARG, __entry->ap_addr, __entry->timeout) ); TRACE_EVENT(cfg80211_michael_mic_failure, TP_PROTO(struct net_device *netdev, const u8 *addr, enum nl80211_key_type key_type, int key_id, const u8 *tsc), TP_ARGS(netdev, addr, key_type, key_id, tsc), TP_STRUCT__entry( NETDEV_ENTRY MAC_ENTRY(addr) __field(enum nl80211_key_type, key_type) __field(int, key_id) __array(u8, tsc, 6) ), TP_fast_assign( NETDEV_ASSIGN; MAC_ASSIGN(addr, addr); __entry->key_type = key_type; __entry->key_id = key_id; if (tsc) memcpy(__entry->tsc, tsc, 6); ), TP_printk(NETDEV_PR_FMT ", %pM, key type: %d, key id: %d, tsc: %pm", NETDEV_PR_ARG, __entry->addr, __entry->key_type, __entry->key_id, __entry->tsc) ); TRACE_EVENT(cfg80211_ready_on_channel, TP_PROTO(struct wireless_dev *wdev, u64 cookie, struct ieee80211_channel *chan, unsigned int duration), TP_ARGS(wdev, cookie, chan, duration), TP_STRUCT__entry( WDEV_ENTRY __field(u64, cookie) CHAN_ENTRY __field(unsigned int, duration) ), TP_fast_assign( WDEV_ASSIGN; __entry->cookie = cookie; CHAN_ASSIGN(chan); __entry->duration = duration; ), TP_printk(WDEV_PR_FMT ", cookie: %llu, " CHAN_PR_FMT ", duration: %u", WDEV_PR_ARG, __entry->cookie, CHAN_PR_ARG, __entry->duration) ); TRACE_EVENT(cfg80211_ready_on_channel_expired, TP_PROTO(struct wireless_dev *wdev, u64 cookie, struct ieee80211_channel *chan), TP_ARGS(wdev, cookie, chan), TP_STRUCT__entry( WDEV_ENTRY __field(u64, cookie) CHAN_ENTRY ), TP_fast_assign( WDEV_ASSIGN; __entry->cookie = cookie; CHAN_ASSIGN(chan); ), TP_printk(WDEV_PR_FMT ", cookie: %llu, " CHAN_PR_FMT, WDEV_PR_ARG, __entry->cookie, CHAN_PR_ARG) ); TRACE_EVENT(cfg80211_tx_mgmt_expired, TP_PROTO(struct wireless_dev *wdev, u64 cookie, struct ieee80211_channel *chan), TP_ARGS(wdev, cookie, chan), TP_STRUCT__entry( WDEV_ENTRY __field(u64, cookie) CHAN_ENTRY ), TP_fast_assign( WDEV_ASSIGN; __entry->cookie = cookie; CHAN_ASSIGN(chan); ), TP_printk(WDEV_PR_FMT ", cookie: %llu, " CHAN_PR_FMT, WDEV_PR_ARG, __entry->cookie, CHAN_PR_ARG) ); TRACE_EVENT(cfg80211_new_sta, TP_PROTO(struct net_device *netdev, const u8 *mac_addr, struct station_info *sinfo), TP_ARGS(netdev, mac_addr, sinfo), TP_STRUCT__entry( NETDEV_ENTRY MAC_ENTRY(mac_addr) SINFO_ENTRY ), TP_fast_assign( NETDEV_ASSIGN; MAC_ASSIGN(mac_addr, mac_addr); SINFO_ASSIGN; ), TP_printk(NETDEV_PR_FMT ", %pM", NETDEV_PR_ARG, __entry->mac_addr) ); DEFINE_EVENT(cfg80211_netdev_mac_evt, cfg80211_del_sta, TP_PROTO(struct net_device *netdev, const u8 *macaddr), TP_ARGS(netdev, macaddr) ); TRACE_EVENT(cfg80211_rx_mgmt, TP_PROTO(struct wireless_dev *wdev, struct cfg80211_rx_info *info), TP_ARGS(wdev, info), TP_STRUCT__entry( WDEV_ENTRY __field(int, freq) __field(int, sig_dbm) ), TP_fast_assign( WDEV_ASSIGN; __entry->freq = info->freq; __entry->sig_dbm = info->sig_dbm; ), TP_printk(WDEV_PR_FMT ", freq: "KHZ_F", sig dbm: %d", WDEV_PR_ARG, PR_KHZ(__entry->freq), __entry->sig_dbm) ); TRACE_EVENT(cfg80211_mgmt_tx_status, TP_PROTO(struct wireless_dev *wdev, u64 cookie, bool ack), TP_ARGS(wdev, cookie, ack), TP_STRUCT__entry( WDEV_ENTRY __field(u64, cookie) __field(bool, ack) ), TP_fast_assign( WDEV_ASSIGN; __entry->cookie = cookie; __entry->ack = ack; ), TP_printk(WDEV_PR_FMT", cookie: %llu, ack: %s", WDEV_PR_ARG, __entry->cookie, BOOL_TO_STR(__entry->ack)) ); TRACE_EVENT(cfg80211_control_port_tx_status, TP_PROTO(struct wireless_dev *wdev, u64 cookie, bool ack), TP_ARGS(wdev, cookie, ack), TP_STRUCT__entry( WDEV_ENTRY __field(u64, cookie) __field(bool, ack) ), TP_fast_assign( WDEV_ASSIGN; __entry->cookie = cookie; __entry->ack = ack; ), TP_printk(WDEV_PR_FMT", cookie: %llu, ack: %s", WDEV_PR_ARG, __entry->cookie, BOOL_TO_STR(__entry->ack)) ); TRACE_EVENT(cfg80211_rx_control_port, TP_PROTO(struct net_device *netdev, struct sk_buff *skb, bool unencrypted, int link_id), TP_ARGS(netdev, skb, unencrypted, link_id), TP_STRUCT__entry( NETDEV_ENTRY __field(int, len) MAC_ENTRY(from) __field(u16, proto) __field(bool, unencrypted) __field(int, link_id) ), TP_fast_assign( NETDEV_ASSIGN; __entry->len = skb->len; MAC_ASSIGN(from, eth_hdr(skb)->h_source); __entry->proto = be16_to_cpu(skb->protocol); __entry->unencrypted = unencrypted; __entry->link_id = link_id; ), TP_printk(NETDEV_PR_FMT ", len=%d, %pM, proto: 0x%x, unencrypted: %s, link: %d", NETDEV_PR_ARG, __entry->len, __entry->from, __entry->proto, BOOL_TO_STR(__entry->unencrypted), __entry->link_id) ); TRACE_EVENT(cfg80211_cqm_rssi_notify, TP_PROTO(struct net_device *netdev, enum nl80211_cqm_rssi_threshold_event rssi_event, s32 rssi_level), TP_ARGS(netdev, rssi_event, rssi_level), TP_STRUCT__entry( NETDEV_ENTRY __field(enum nl80211_cqm_rssi_threshold_event, rssi_event) __field(s32, rssi_level) ), TP_fast_assign( NETDEV_ASSIGN; __entry->rssi_event = rssi_event; __entry->rssi_level = rssi_level; ), TP_printk(NETDEV_PR_FMT ", rssi event: %d, level: %d", NETDEV_PR_ARG, __entry->rssi_event, __entry->rssi_level) ); TRACE_EVENT(cfg80211_reg_can_beacon, TP_PROTO(struct wiphy *wiphy, struct cfg80211_chan_def *chandef, enum nl80211_iftype iftype, u32 prohibited_flags, u32 permitting_flags), TP_ARGS(wiphy, chandef, iftype, prohibited_flags, permitting_flags), TP_STRUCT__entry( WIPHY_ENTRY CHAN_DEF_ENTRY __field(enum nl80211_iftype, iftype) __field(u32, prohibited_flags) __field(u32, permitting_flags) ), TP_fast_assign( WIPHY_ASSIGN; CHAN_DEF_ASSIGN(chandef); __entry->iftype = iftype; __entry->prohibited_flags = prohibited_flags; __entry->permitting_flags = permitting_flags; ), TP_printk(WIPHY_PR_FMT ", " CHAN_DEF_PR_FMT ", iftype=%d prohibited_flags=0x%x permitting_flags=0x%x", WIPHY_PR_ARG, CHAN_DEF_PR_ARG, __entry->iftype, __entry->prohibited_flags, __entry->permitting_flags) ); TRACE_EVENT(cfg80211_ch_switch_notify, TP_PROTO(struct net_device *netdev, struct cfg80211_chan_def *chandef, unsigned int link_id), TP_ARGS(netdev, chandef, link_id), TP_STRUCT__entry( NETDEV_ENTRY CHAN_DEF_ENTRY __field(unsigned int, link_id) ), TP_fast_assign( NETDEV_ASSIGN; CHAN_DEF_ASSIGN(chandef); __entry->link_id = link_id; ), TP_printk(NETDEV_PR_FMT ", " CHAN_DEF_PR_FMT ", link:%d", NETDEV_PR_ARG, CHAN_DEF_PR_ARG, __entry->link_id) ); TRACE_EVENT(cfg80211_ch_switch_started_notify, TP_PROTO(struct net_device *netdev, struct cfg80211_chan_def *chandef, unsigned int link_id), TP_ARGS(netdev, chandef, link_id), TP_STRUCT__entry( NETDEV_ENTRY CHAN_DEF_ENTRY __field(unsigned int, link_id) ), TP_fast_assign( NETDEV_ASSIGN; CHAN_DEF_ASSIGN(chandef); __entry->link_id = link_id; ), TP_printk(NETDEV_PR_FMT ", " CHAN_DEF_PR_FMT ", link:%d", NETDEV_PR_ARG, CHAN_DEF_PR_ARG, __entry->link_id) ); TRACE_EVENT(cfg80211_radar_event, TP_PROTO(struct wiphy *wiphy, struct cfg80211_chan_def *chandef, bool offchan), TP_ARGS(wiphy, chandef, offchan), TP_STRUCT__entry( WIPHY_ENTRY CHAN_DEF_ENTRY __field(bool, offchan) ), TP_fast_assign( WIPHY_ASSIGN; CHAN_DEF_ASSIGN(chandef); __entry->offchan = offchan; ), TP_printk(WIPHY_PR_FMT ", " CHAN_DEF_PR_FMT ", offchan %d", WIPHY_PR_ARG, CHAN_DEF_PR_ARG, __entry->offchan) ); TRACE_EVENT(cfg80211_cac_event, TP_PROTO(struct net_device *netdev, enum nl80211_radar_event evt, unsigned int link_id), TP_ARGS(netdev, evt, link_id), TP_STRUCT__entry( NETDEV_ENTRY __field(enum nl80211_radar_event, evt) __field(unsigned int, link_id) ), TP_fast_assign( NETDEV_ASSIGN; __entry->evt = evt; __entry->link_id = link_id; ), TP_printk(NETDEV_PR_FMT ", event: %d, link_id=%u", NETDEV_PR_ARG, __entry->evt, __entry->link_id) ); DECLARE_EVENT_CLASS(cfg80211_rx_evt, TP_PROTO(struct net_device *netdev, const u8 *addr, int link_id), TP_ARGS(netdev, addr, link_id), TP_STRUCT__entry( NETDEV_ENTRY MAC_ENTRY(addr) __field(int, link_id) ), TP_fast_assign( NETDEV_ASSIGN; MAC_ASSIGN(addr, addr); __entry->link_id = link_id; ), TP_printk(NETDEV_PR_FMT ", %pM, link_id:%d", NETDEV_PR_ARG, __entry->addr, __entry->link_id) ); DEFINE_EVENT(cfg80211_rx_evt, cfg80211_rx_spurious_frame, TP_PROTO(struct net_device *netdev, const u8 *addr, int link_id), TP_ARGS(netdev, addr, link_id) ); DEFINE_EVENT(cfg80211_rx_evt, cfg80211_rx_unexpected_4addr_frame, TP_PROTO(struct net_device *netdev, const u8 *addr, int link_id), TP_ARGS(netdev, addr, link_id) ); TRACE_EVENT(cfg80211_ibss_joined, TP_PROTO(struct net_device *netdev, const u8 *bssid, struct ieee80211_channel *channel), TP_ARGS(netdev, bssid, channel), TP_STRUCT__entry( NETDEV_ENTRY MAC_ENTRY(bssid) CHAN_ENTRY ), TP_fast_assign( NETDEV_ASSIGN; MAC_ASSIGN(bssid, bssid); CHAN_ASSIGN(channel); ), TP_printk(NETDEV_PR_FMT ", bssid: %pM, " CHAN_PR_FMT, NETDEV_PR_ARG, __entry->bssid, CHAN_PR_ARG) ); TRACE_EVENT(cfg80211_probe_status, TP_PROTO(struct net_device *netdev, const u8 *addr, u64 cookie, bool acked), TP_ARGS(netdev, addr, cookie, acked), TP_STRUCT__entry( NETDEV_ENTRY MAC_ENTRY(addr) __field(u64, cookie) __field(bool, acked) ), TP_fast_assign( NETDEV_ASSIGN; MAC_ASSIGN(addr, addr); __entry->cookie = cookie; __entry->acked = acked; ), TP_printk(NETDEV_PR_FMT " addr:%pM, cookie: %llu, acked: %s", NETDEV_PR_ARG, __entry->addr, __entry->cookie, BOOL_TO_STR(__entry->acked)) ); TRACE_EVENT(cfg80211_cqm_pktloss_notify, TP_PROTO(struct net_device *netdev, const u8 *peer, u32 num_packets), TP_ARGS(netdev, peer, num_packets), TP_STRUCT__entry( NETDEV_ENTRY MAC_ENTRY(peer) __field(u32, num_packets) ), TP_fast_assign( NETDEV_ASSIGN; MAC_ASSIGN(peer, peer); __entry->num_packets = num_packets; ), TP_printk(NETDEV_PR_FMT ", peer: %pM, num of lost packets: %u", NETDEV_PR_ARG, __entry->peer, __entry->num_packets) ); DEFINE_EVENT(cfg80211_netdev_mac_evt, cfg80211_gtk_rekey_notify, TP_PROTO(struct net_device *netdev, const u8 *macaddr), TP_ARGS(netdev, macaddr) ); TRACE_EVENT(cfg80211_pmksa_candidate_notify, TP_PROTO(struct net_device *netdev, int index, const u8 *bssid, bool preauth), TP_ARGS(netdev, index, bssid, preauth), TP_STRUCT__entry( NETDEV_ENTRY __field(int, index) MAC_ENTRY(bssid) __field(bool, preauth) ), TP_fast_assign( NETDEV_ASSIGN; __entry->index = index; MAC_ASSIGN(bssid, bssid); __entry->preauth = preauth; ), TP_printk(NETDEV_PR_FMT ", index:%d, bssid: %pM, pre auth: %s", NETDEV_PR_ARG, __entry->index, __entry->bssid, BOOL_TO_STR(__entry->preauth)) ); TRACE_EVENT(cfg80211_report_obss_beacon, TP_PROTO(struct wiphy *wiphy, const u8 *frame, size_t len, int freq, int sig_dbm), TP_ARGS(wiphy, frame, len, freq, sig_dbm), TP_STRUCT__entry( WIPHY_ENTRY __field(int, freq) __field(int, sig_dbm) ), TP_fast_assign( WIPHY_ASSIGN; __entry->freq = freq; __entry->sig_dbm = sig_dbm; ), TP_printk(WIPHY_PR_FMT ", freq: "KHZ_F", sig_dbm: %d", WIPHY_PR_ARG, PR_KHZ(__entry->freq), __entry->sig_dbm) ); TRACE_EVENT(cfg80211_tdls_oper_request, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, const u8 *peer, enum nl80211_tdls_operation oper, u16 reason_code), TP_ARGS(wiphy, netdev, peer, oper, reason_code), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY MAC_ENTRY(peer) __field(enum nl80211_tdls_operation, oper) __field(u16, reason_code) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; MAC_ASSIGN(peer, peer); __entry->oper = oper; __entry->reason_code = reason_code; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", peer: %pM, oper: %d, reason_code %u", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->peer, __entry->oper, __entry->reason_code) ); TRACE_EVENT(cfg80211_scan_done, TP_PROTO(struct cfg80211_scan_request_int *request, struct cfg80211_scan_info *info), TP_ARGS(request, info), TP_STRUCT__entry( __field(u32, n_channels) __dynamic_array(u8, ie, request ? request->req.ie_len : 0) __array(u32, rates, NUM_NL80211_BANDS) __field(u32, wdev_id) MAC_ENTRY(wiphy_mac) __field(bool, no_cck) __field(bool, aborted) __field(u64, scan_start_tsf) MAC_ENTRY(tsf_bssid) ), TP_fast_assign( if (request) { memcpy(__get_dynamic_array(ie), request->req.ie, request->req.ie_len); memcpy(__entry->rates, request->req.rates, NUM_NL80211_BANDS); __entry->wdev_id = request->req.wdev ? request->req.wdev->identifier : 0; if (request->req.wiphy) MAC_ASSIGN(wiphy_mac, request->req.wiphy->perm_addr); __entry->no_cck = request->req.no_cck; } if (info) { __entry->aborted = info->aborted; __entry->scan_start_tsf = info->scan_start_tsf; MAC_ASSIGN(tsf_bssid, info->tsf_bssid); } ), TP_printk("aborted: %s, scan start (TSF): %llu, tsf_bssid: %pM", BOOL_TO_STR(__entry->aborted), (unsigned long long)__entry->scan_start_tsf, __entry->tsf_bssid) ); DECLARE_EVENT_CLASS(wiphy_id_evt, TP_PROTO(struct wiphy *wiphy, u64 id), TP_ARGS(wiphy, id), TP_STRUCT__entry( WIPHY_ENTRY __field(u64, id) ), TP_fast_assign( WIPHY_ASSIGN; __entry->id = id; ), TP_printk(WIPHY_PR_FMT ", id: %llu", WIPHY_PR_ARG, __entry->id) ); DEFINE_EVENT(wiphy_id_evt, cfg80211_sched_scan_stopped, TP_PROTO(struct wiphy *wiphy, u64 id), TP_ARGS(wiphy, id) ); DEFINE_EVENT(wiphy_id_evt, cfg80211_sched_scan_results, TP_PROTO(struct wiphy *wiphy, u64 id), TP_ARGS(wiphy, id) ); TRACE_EVENT(cfg80211_get_bss, TP_PROTO(struct wiphy *wiphy, struct ieee80211_channel *channel, const u8 *bssid, const u8 *ssid, size_t ssid_len, enum ieee80211_bss_type bss_type, enum ieee80211_privacy privacy), TP_ARGS(wiphy, channel, bssid, ssid, ssid_len, bss_type, privacy), TP_STRUCT__entry( WIPHY_ENTRY CHAN_ENTRY MAC_ENTRY(bssid) __dynamic_array(u8, ssid, ssid_len) __field(enum ieee80211_bss_type, bss_type) __field(enum ieee80211_privacy, privacy) ), TP_fast_assign( WIPHY_ASSIGN; CHAN_ASSIGN(channel); MAC_ASSIGN(bssid, bssid); memcpy(__get_dynamic_array(ssid), ssid, ssid_len); __entry->bss_type = bss_type; __entry->privacy = privacy; ), TP_printk(WIPHY_PR_FMT ", " CHAN_PR_FMT ", %pM" ", buf: %#.2x, bss_type: %d, privacy: %d", WIPHY_PR_ARG, CHAN_PR_ARG, __entry->bssid, ((u8 *)__get_dynamic_array(ssid))[0], __entry->bss_type, __entry->privacy) ); TRACE_EVENT(cfg80211_inform_bss_frame, TP_PROTO(struct wiphy *wiphy, struct cfg80211_inform_bss *data, struct ieee80211_mgmt *mgmt, size_t len), TP_ARGS(wiphy, data, mgmt, len), TP_STRUCT__entry( WIPHY_ENTRY CHAN_ENTRY __dynamic_array(u8, mgmt, len) __field(s32, signal) __field(u64, ts_boottime) __field(u64, parent_tsf) MAC_ENTRY(parent_bssid) ), TP_fast_assign( WIPHY_ASSIGN; CHAN_ASSIGN(data->chan); if (mgmt) memcpy(__get_dynamic_array(mgmt), mgmt, len); __entry->signal = data->signal; __entry->ts_boottime = data->boottime_ns; __entry->parent_tsf = data->parent_tsf; MAC_ASSIGN(parent_bssid, data->parent_bssid); ), TP_printk(WIPHY_PR_FMT ", " CHAN_PR_FMT "signal: %d, tsb:%llu, detect_tsf:%llu, tsf_bssid: %pM", WIPHY_PR_ARG, CHAN_PR_ARG, __entry->signal, (unsigned long long)__entry->ts_boottime, (unsigned long long)__entry->parent_tsf, __entry->parent_bssid) ); DECLARE_EVENT_CLASS(cfg80211_bss_evt, TP_PROTO(struct cfg80211_bss *pub), TP_ARGS(pub), TP_STRUCT__entry( MAC_ENTRY(bssid) CHAN_ENTRY ), TP_fast_assign( MAC_ASSIGN(bssid, pub->bssid); CHAN_ASSIGN(pub->channel); ), TP_printk("%pM, " CHAN_PR_FMT, __entry->bssid, CHAN_PR_ARG) ); DEFINE_EVENT(cfg80211_bss_evt, cfg80211_return_bss, TP_PROTO(struct cfg80211_bss *pub), TP_ARGS(pub) ); TRACE_EVENT(cfg80211_report_wowlan_wakeup, TP_PROTO(struct wiphy *wiphy, struct wireless_dev *wdev, struct cfg80211_wowlan_wakeup *wakeup), TP_ARGS(wiphy, wdev, wakeup), TP_STRUCT__entry( WIPHY_ENTRY WDEV_ENTRY __field(bool, non_wireless) __field(bool, disconnect) __field(bool, magic_pkt) __field(bool, gtk_rekey_failure) __field(bool, eap_identity_req) __field(bool, four_way_handshake) __field(bool, rfkill_release) __field(s32, pattern_idx) __field(u32, packet_len) __dynamic_array(u8, packet, wakeup ? wakeup->packet_present_len : 0) ), TP_fast_assign( WIPHY_ASSIGN; WDEV_ASSIGN; __entry->non_wireless = !wakeup; __entry->disconnect = wakeup ? wakeup->disconnect : false; __entry->magic_pkt = wakeup ? wakeup->magic_pkt : false; __entry->gtk_rekey_failure = wakeup ? wakeup->gtk_rekey_failure : false; __entry->eap_identity_req = wakeup ? wakeup->eap_identity_req : false; __entry->four_way_handshake = wakeup ? wakeup->four_way_handshake : false; __entry->rfkill_release = wakeup ? wakeup->rfkill_release : false; __entry->pattern_idx = wakeup ? wakeup->pattern_idx : false; __entry->packet_len = wakeup ? wakeup->packet_len : false; if (wakeup && wakeup->packet && wakeup->packet_present_len) memcpy(__get_dynamic_array(packet), wakeup->packet, wakeup->packet_present_len); ), TP_printk(WIPHY_PR_FMT ", " WDEV_PR_FMT, WIPHY_PR_ARG, WDEV_PR_ARG) ); TRACE_EVENT(cfg80211_ft_event, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct cfg80211_ft_event_params *ft_event), TP_ARGS(wiphy, netdev, ft_event), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY __dynamic_array(u8, ies, ft_event->ies_len) MAC_ENTRY(target_ap) __dynamic_array(u8, ric_ies, ft_event->ric_ies_len) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; if (ft_event->ies) memcpy(__get_dynamic_array(ies), ft_event->ies, ft_event->ies_len); MAC_ASSIGN(target_ap, ft_event->target_ap); if (ft_event->ric_ies) memcpy(__get_dynamic_array(ric_ies), ft_event->ric_ies, ft_event->ric_ies_len); ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", target_ap: %pM", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->target_ap) ); TRACE_EVENT(cfg80211_stop_iface, TP_PROTO(struct wiphy *wiphy, struct wireless_dev *wdev), TP_ARGS(wiphy, wdev), TP_STRUCT__entry( WIPHY_ENTRY WDEV_ENTRY ), TP_fast_assign( WIPHY_ASSIGN; WDEV_ASSIGN; ), TP_printk(WIPHY_PR_FMT ", " WDEV_PR_FMT, WIPHY_PR_ARG, WDEV_PR_ARG) ); TRACE_EVENT(cfg80211_pmsr_report, TP_PROTO(struct wiphy *wiphy, struct wireless_dev *wdev, u64 cookie, const u8 *addr), TP_ARGS(wiphy, wdev, cookie, addr), TP_STRUCT__entry( WIPHY_ENTRY WDEV_ENTRY __field(u64, cookie) MAC_ENTRY(addr) ), TP_fast_assign( WIPHY_ASSIGN; WDEV_ASSIGN; __entry->cookie = cookie; MAC_ASSIGN(addr, addr); ), TP_printk(WIPHY_PR_FMT ", " WDEV_PR_FMT ", cookie:%lld, %pM", WIPHY_PR_ARG, WDEV_PR_ARG, (unsigned long long)__entry->cookie, __entry->addr) ); TRACE_EVENT(cfg80211_pmsr_complete, TP_PROTO(struct wiphy *wiphy, struct wireless_dev *wdev, u64 cookie), TP_ARGS(wiphy, wdev, cookie), TP_STRUCT__entry( WIPHY_ENTRY WDEV_ENTRY __field(u64, cookie) ), TP_fast_assign( WIPHY_ASSIGN; WDEV_ASSIGN; __entry->cookie = cookie; ), TP_printk(WIPHY_PR_FMT ", " WDEV_PR_FMT ", cookie:%lld", WIPHY_PR_ARG, WDEV_PR_ARG, (unsigned long long)__entry->cookie) ); TRACE_EVENT(cfg80211_update_owe_info_event, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct cfg80211_update_owe_info *owe_info), TP_ARGS(wiphy, netdev, owe_info), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY MAC_ENTRY(peer) __dynamic_array(u8, ie, owe_info->ie_len) __field(int, assoc_link_id) MAC_ENTRY(peer_mld_addr) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; MAC_ASSIGN(peer, owe_info->peer); memcpy(__get_dynamic_array(ie), owe_info->ie, owe_info->ie_len); __entry->assoc_link_id = owe_info->assoc_link_id; MAC_ASSIGN(peer_mld_addr, owe_info->peer_mld_addr); ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", peer: %pM," " assoc_link_id: %d, peer_mld_addr: %pM", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->peer, __entry->assoc_link_id, __entry->peer_mld_addr) ); TRACE_EVENT(cfg80211_bss_color_notify, TP_PROTO(struct net_device *netdev, enum nl80211_commands cmd, u8 count, u64 color_bitmap), TP_ARGS(netdev, cmd, count, color_bitmap), TP_STRUCT__entry( NETDEV_ENTRY __field(u32, cmd) __field(u8, count) __field(u64, color_bitmap) ), TP_fast_assign( NETDEV_ASSIGN; __entry->cmd = cmd; __entry->count = count; __entry->color_bitmap = color_bitmap; ), TP_printk(NETDEV_PR_FMT ", cmd: %x, count: %u, bitmap: %llx", NETDEV_PR_ARG, __entry->cmd, __entry->count, __entry->color_bitmap) ); TRACE_EVENT(cfg80211_assoc_comeback, TP_PROTO(struct wireless_dev *wdev, const u8 *ap_addr, u32 timeout), TP_ARGS(wdev, ap_addr, timeout), TP_STRUCT__entry( WDEV_ENTRY MAC_ENTRY(ap_addr) __field(u32, timeout) ), TP_fast_assign( WDEV_ASSIGN; MAC_ASSIGN(ap_addr, ap_addr); __entry->timeout = timeout; ), TP_printk(WDEV_PR_FMT ", %pM, timeout: %u TUs", WDEV_PR_ARG, __entry->ap_addr, __entry->timeout) ); DECLARE_EVENT_CLASS(link_station_add_mod, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct link_station_parameters *params), TP_ARGS(wiphy, netdev, params), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY __array(u8, mld_mac, 6) __array(u8, link_mac, 6) __field(u32, link_id) __dynamic_array(u8, supported_rates, params->supported_rates_len) __array(u8, ht_capa, (int)sizeof(struct ieee80211_ht_cap)) __array(u8, vht_capa, (int)sizeof(struct ieee80211_vht_cap)) __field(u8, opmode_notif) __field(bool, opmode_notif_used) __dynamic_array(u8, he_capa, params->he_capa_len) __array(u8, he_6ghz_capa, (int)sizeof(struct ieee80211_he_6ghz_capa)) __dynamic_array(u8, eht_capa, params->eht_capa_len) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; memset(__entry->mld_mac, 0, 6); memset(__entry->link_mac, 0, 6); if (params->mld_mac) memcpy(__entry->mld_mac, params->mld_mac, 6); if (params->link_mac) memcpy(__entry->link_mac, params->link_mac, 6); __entry->link_id = params->link_id; if (params->supported_rates && params->supported_rates_len) memcpy(__get_dynamic_array(supported_rates), params->supported_rates, params->supported_rates_len); memset(__entry->ht_capa, 0, sizeof(struct ieee80211_ht_cap)); if (params->ht_capa) memcpy(__entry->ht_capa, params->ht_capa, sizeof(struct ieee80211_ht_cap)); memset(__entry->vht_capa, 0, sizeof(struct ieee80211_vht_cap)); if (params->vht_capa) memcpy(__entry->vht_capa, params->vht_capa, sizeof(struct ieee80211_vht_cap)); __entry->opmode_notif = params->opmode_notif; __entry->opmode_notif_used = params->opmode_notif_used; if (params->he_capa && params->he_capa_len) memcpy(__get_dynamic_array(he_capa), params->he_capa, params->he_capa_len); memset(__entry->he_6ghz_capa, 0, sizeof(struct ieee80211_he_6ghz_capa)); if (params->he_6ghz_capa) memcpy(__entry->he_6ghz_capa, params->he_6ghz_capa, sizeof(struct ieee80211_he_6ghz_capa)); if (params->eht_capa && params->eht_capa_len) memcpy(__get_dynamic_array(eht_capa), params->eht_capa, params->eht_capa_len); ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", station mac: %pM" ", link mac: %pM, link id: %u", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->mld_mac, __entry->link_mac, __entry->link_id) ); DEFINE_EVENT(link_station_add_mod, rdev_add_link_station, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct link_station_parameters *params), TP_ARGS(wiphy, netdev, params) ); DEFINE_EVENT(link_station_add_mod, rdev_mod_link_station, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct link_station_parameters *params), TP_ARGS(wiphy, netdev, params) ); TRACE_EVENT(cfg80211_links_removed, TP_PROTO(struct net_device *netdev, u16 link_mask), TP_ARGS(netdev, link_mask), TP_STRUCT__entry( NETDEV_ENTRY __field(u16, link_mask) ), TP_fast_assign( NETDEV_ASSIGN; __entry->link_mask = link_mask; ), TP_printk(NETDEV_PR_FMT ", link_mask:0x%x", NETDEV_PR_ARG, __entry->link_mask) ); TRACE_EVENT(cfg80211_mlo_reconf_add_done, TP_PROTO(struct net_device *netdev, u16 link_mask, const u8 *buf, size_t len, bool driver_initiated), TP_ARGS(netdev, link_mask, buf, len, driver_initiated), TP_STRUCT__entry( NETDEV_ENTRY __field(u16, link_mask) __dynamic_array(u8, buf, len) __field(bool, driver_initiated) ), TP_fast_assign( NETDEV_ASSIGN; __entry->link_mask = link_mask; memcpy(__get_dynamic_array(buf), buf, len); __entry->driver_initiated = driver_initiated; ), TP_printk(NETDEV_PR_FMT ", link_mask:0x%x, driver_initiated:%d", NETDEV_PR_ARG, __entry->link_mask, __entry->driver_initiated) ); TRACE_EVENT(rdev_assoc_ml_reconf, TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, struct cfg80211_ml_reconf_req *req), TP_ARGS(wiphy, netdev, req), TP_STRUCT__entry( WIPHY_ENTRY NETDEV_ENTRY __field(u16, add_links) __field(u16, rem_links) __field(u16, ext_mld_capa_ops) ), TP_fast_assign( WIPHY_ASSIGN; NETDEV_ASSIGN; u32 i; __entry->add_links = 0; __entry->rem_links = req->rem_links; for (i = 0; i < IEEE80211_MLD_MAX_NUM_LINKS; i++) if (req->add_links[i].bss) __entry->add_links |= BIT(i); __entry->ext_mld_capa_ops = req->ext_mld_capa_ops; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", add_links=0x%x, rem_links=0x%x", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->add_links, __entry->rem_links) ); TRACE_EVENT(cfg80211_epcs_changed, TP_PROTO(struct wireless_dev *wdev, bool enabled), TP_ARGS(wdev, enabled), TP_STRUCT__entry( WDEV_ENTRY __field(u32, enabled) ), TP_fast_assign( WDEV_ASSIGN; __entry->enabled = enabled; ), TP_printk(WDEV_PR_FMT ", enabled=%u", WDEV_PR_ARG, __entry->enabled) ); TRACE_EVENT(cfg80211_next_nan_dw_notif, TP_PROTO(struct wireless_dev *wdev, struct ieee80211_channel *chan), TP_ARGS(wdev, chan), TP_STRUCT__entry( WDEV_ENTRY CHAN_ENTRY ), TP_fast_assign( WDEV_ASSIGN; CHAN_ASSIGN(chan); ), TP_printk(WDEV_PR_FMT " " CHAN_PR_FMT, WDEV_PR_ARG, CHAN_PR_ARG) ); TRACE_EVENT(cfg80211_nan_cluster_joined, TP_PROTO(struct wireless_dev *wdev, const u8 *cluster_id, bool new_cluster), TP_ARGS(wdev, cluster_id, new_cluster), TP_STRUCT__entry( WDEV_ENTRY MAC_ENTRY(cluster_id) __field(bool, new_cluster) ), TP_fast_assign( WDEV_ASSIGN; MAC_ASSIGN(cluster_id, cluster_id); __entry->new_cluster = new_cluster; ), TP_printk(WDEV_PR_FMT " cluster_id %pMF%s", WDEV_PR_ARG, __entry->cluster_id, __entry->new_cluster ? " [new]" : "") ); #endif /* !__RDEV_OPS_TRACE || TRACE_HEADER_MULTI_READ */ #undef TRACE_INCLUDE_PATH #define TRACE_INCLUDE_PATH . #undef TRACE_INCLUDE_FILE #define TRACE_INCLUDE_FILE trace #include <trace/define_trace.h> |
| 76 76 76 76 76 76 76 76 75 75 76 76 13 13 12 13 13 13 13 200 199 54 141 213 20 211 3 212 739 740 584 150 508 780 542 751 751 751 751 750 8 17 24 24 585 232 751 752 116 711 750 13 580 578 481 148 580 580 580 581 201 201 203 132 88 200 581 579 580 12 266 84 84 30 30 585 586 565 30 771 89 577 565 70 68 69 89 559 89 197 197 202 11 197 30 581 579 566 84 579 579 2 203 203 188 30 203 581 15 15 9 579 581 579 581 580 579 579 581 565 83 3 2 578 15 579 580 564 84 578 5 597 599 230 184 234 203 201 201 203 203 188 30 203 9 9 203 202 200 199 16 23 15 23 23 23 7 17 17 7 20 20 3 3 19 19 76 76 76 7 7 56 56 56 56 118 22 3 2 135 141 141 141 85 86 66 3 17 7 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 2447 2448 2449 2450 2451 2452 2453 2454 2455 2456 2457 2458 2459 2460 2461 2462 2463 2464 2465 2466 2467 2468 2469 2470 2471 2472 2473 2474 2475 2476 2477 2478 2479 2480 2481 2482 2483 2484 2485 2486 2487 2488 2489 2490 2491 2492 2493 2494 2495 2496 2497 2498 2499 2500 2501 2502 2503 2504 2505 2506 2507 2508 2509 2510 2511 2512 2513 2514 2515 2516 2517 2518 2519 2520 2521 2522 2523 2524 2525 2526 2527 2528 2529 2530 2531 2532 2533 2534 2535 2536 2537 2538 2539 2540 2541 2542 2543 2544 2545 2546 2547 2548 2549 2550 2551 2552 2553 2554 2555 2556 2557 2558 2559 2560 2561 2562 2563 2564 2565 2566 2567 2568 2569 2570 2571 2572 2573 2574 2575 2576 2577 2578 2579 2580 2581 2582 2583 2584 2585 2586 2587 2588 2589 2590 2591 2592 2593 2594 2595 2596 2597 2598 2599 2600 2601 2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658 2659 2660 2661 2662 2663 2664 2665 2666 2667 2668 2669 2670 2671 2672 2673 2674 2675 2676 2677 2678 2679 2680 2681 2682 2683 2684 2685 2686 2687 2688 2689 2690 2691 2692 2693 2694 2695 2696 2697 2698 2699 2700 2701 2702 2703 2704 2705 2706 2707 2708 2709 2710 2711 2712 2713 2714 2715 2716 2717 2718 2719 2720 2721 2722 2723 2724 2725 2726 2727 2728 2729 2730 2731 2732 2733 2734 2735 2736 2737 2738 2739 2740 2741 2742 2743 2744 2745 2746 2747 2748 2749 2750 2751 2752 2753 2754 2755 2756 2757 2758 2759 2760 2761 2762 2763 2764 2765 2766 2767 2768 2769 2770 2771 2772 2773 2774 2775 2776 2777 2778 2779 2780 2781 2782 2783 2784 2785 2786 2787 2788 2789 2790 2791 2792 2793 2794 2795 2796 2797 2798 2799 2800 2801 2802 2803 2804 2805 2806 2807 2808 2809 2810 2811 2812 2813 2814 2815 2816 2817 2818 2819 2820 2821 2822 2823 2824 2825 2826 2827 2828 2829 2830 2831 2832 2833 2834 2835 2836 2837 2838 2839 2840 2841 2842 2843 2844 2845 2846 2847 2848 2849 2850 2851 2852 2853 2854 2855 2856 2857 2858 2859 2860 2861 2862 2863 2864 2865 2866 2867 2868 2869 2870 2871 2872 2873 2874 2875 2876 2877 2878 2879 2880 2881 2882 2883 2884 2885 2886 2887 2888 2889 2890 2891 2892 2893 2894 2895 2896 2897 2898 2899 2900 2901 2902 2903 2904 2905 2906 2907 2908 2909 2910 2911 2912 2913 2914 2915 2916 2917 2918 2919 2920 2921 2922 2923 2924 2925 2926 2927 2928 2929 2930 2931 2932 2933 2934 2935 2936 2937 2938 2939 2940 2941 2942 2943 2944 2945 2946 2947 2948 2949 2950 2951 2952 2953 2954 2955 2956 2957 2958 2959 2960 2961 2962 2963 2964 2965 2966 2967 2968 2969 2970 2971 2972 2973 2974 2975 2976 2977 2978 2979 2980 2981 2982 2983 2984 2985 2986 2987 2988 2989 2990 2991 2992 2993 2994 2995 2996 2997 2998 2999 3000 3001 3002 3003 3004 3005 3006 3007 3008 3009 3010 3011 3012 3013 3014 3015 3016 3017 3018 3019 3020 3021 3022 3023 3024 3025 3026 3027 3028 3029 3030 3031 3032 3033 3034 3035 3036 3037 3038 3039 3040 3041 3042 3043 3044 3045 3046 3047 3048 3049 3050 3051 3052 3053 3054 3055 3056 3057 3058 3059 3060 3061 3062 3063 3064 3065 3066 3067 3068 3069 3070 3071 3072 3073 3074 3075 3076 3077 3078 3079 3080 3081 3082 3083 3084 3085 3086 3087 3088 3089 3090 3091 3092 3093 3094 3095 3096 3097 3098 3099 3100 3101 3102 3103 3104 3105 3106 3107 3108 3109 3110 3111 3112 3113 3114 3115 3116 3117 3118 3119 3120 3121 3122 3123 3124 3125 3126 3127 3128 3129 3130 3131 3132 3133 3134 3135 3136 3137 3138 3139 3140 3141 3142 3143 3144 3145 3146 3147 3148 3149 3150 3151 3152 3153 3154 3155 3156 3157 3158 3159 3160 3161 3162 3163 3164 3165 3166 3167 3168 3169 3170 3171 3172 3173 3174 3175 3176 3177 3178 3179 3180 3181 3182 3183 3184 3185 3186 3187 3188 3189 3190 3191 3192 3193 3194 3195 3196 3197 3198 3199 3200 3201 3202 3203 3204 3205 3206 3207 3208 3209 3210 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220 3221 3222 3223 3224 3225 3226 3227 3228 3229 3230 3231 3232 3233 3234 3235 3236 3237 3238 3239 3240 3241 3242 3243 3244 3245 3246 3247 3248 3249 3250 3251 3252 3253 3254 3255 3256 3257 3258 3259 3260 3261 3262 3263 3264 3265 3266 3267 3268 3269 3270 3271 3272 3273 3274 3275 3276 3277 3278 3279 3280 3281 3282 3283 3284 3285 3286 3287 3288 3289 3290 3291 3292 3293 3294 3295 3296 3297 3298 3299 3300 3301 3302 3303 3304 3305 3306 3307 3308 3309 3310 3311 3312 3313 3314 3315 3316 3317 3318 3319 3320 3321 3322 3323 3324 3325 3326 3327 3328 3329 3330 3331 3332 3333 3334 3335 3336 3337 3338 3339 3340 3341 3342 3343 3344 3345 3346 3347 3348 3349 3350 3351 3352 3353 3354 3355 3356 3357 3358 3359 3360 3361 3362 3363 3364 3365 3366 3367 3368 3369 3370 3371 3372 3373 3374 3375 3376 3377 3378 3379 3380 3381 3382 3383 3384 3385 3386 3387 3388 3389 3390 3391 3392 3393 3394 3395 3396 3397 3398 3399 3400 3401 3402 3403 3404 3405 3406 3407 3408 3409 3410 3411 3412 3413 3414 3415 3416 3417 3418 3419 3420 3421 3422 3423 3424 3425 3426 3427 3428 3429 3430 3431 3432 3433 3434 3435 3436 3437 3438 3439 3440 3441 3442 3443 3444 3445 3446 3447 3448 3449 3450 3451 3452 3453 3454 3455 3456 3457 3458 3459 3460 3461 3462 3463 3464 3465 3466 3467 3468 3469 3470 3471 3472 3473 3474 3475 3476 3477 3478 3479 3480 3481 3482 3483 3484 3485 3486 3487 3488 3489 3490 3491 3492 3493 3494 3495 3496 3497 3498 3499 3500 3501 3502 3503 3504 3505 3506 3507 3508 3509 3510 3511 3512 3513 3514 3515 3516 3517 3518 3519 3520 3521 3522 3523 3524 3525 3526 3527 3528 3529 3530 3531 3532 3533 3534 3535 3536 3537 3538 3539 3540 3541 3542 3543 3544 3545 3546 3547 3548 3549 3550 3551 3552 3553 3554 3555 3556 3557 3558 3559 3560 3561 3562 3563 3564 3565 3566 3567 3568 3569 3570 3571 3572 3573 3574 3575 3576 3577 3578 3579 3580 3581 3582 3583 3584 3585 3586 3587 3588 3589 3590 3591 3592 3593 3594 3595 3596 3597 3598 3599 3600 3601 3602 3603 3604 3605 3606 3607 3608 3609 3610 3611 3612 3613 3614 3615 3616 3617 3618 3619 3620 3621 3622 3623 3624 3625 3626 3627 3628 3629 3630 3631 3632 3633 3634 3635 3636 3637 3638 3639 3640 3641 3642 3643 3644 3645 3646 3647 3648 3649 3650 3651 3652 3653 3654 3655 3656 3657 3658 3659 3660 3661 3662 3663 3664 3665 3666 3667 3668 3669 3670 3671 3672 3673 3674 3675 3676 3677 3678 3679 3680 3681 3682 3683 3684 3685 3686 3687 3688 3689 3690 3691 3692 3693 3694 3695 3696 3697 3698 3699 3700 3701 3702 3703 3704 3705 3706 3707 3708 3709 3710 3711 3712 3713 3714 3715 3716 3717 3718 3719 3720 3721 3722 3723 3724 3725 3726 3727 3728 3729 3730 3731 3732 3733 3734 3735 3736 3737 3738 3739 3740 3741 3742 3743 3744 3745 3746 3747 3748 3749 3750 3751 3752 3753 3754 3755 3756 3757 3758 3759 3760 3761 3762 3763 3764 3765 3766 3767 3768 3769 3770 3771 3772 3773 3774 3775 3776 3777 3778 3779 3780 3781 3782 3783 3784 3785 3786 3787 3788 3789 3790 3791 3792 3793 3794 3795 3796 3797 3798 3799 3800 3801 3802 3803 3804 3805 3806 3807 3808 3809 3810 3811 3812 3813 3814 3815 3816 3817 3818 3819 3820 3821 3822 3823 3824 3825 3826 3827 3828 3829 3830 3831 3832 3833 3834 3835 3836 3837 3838 3839 3840 3841 3842 3843 3844 3845 3846 3847 3848 3849 3850 3851 3852 3853 3854 3855 3856 3857 3858 3859 3860 3861 3862 3863 3864 3865 3866 3867 3868 3869 3870 3871 3872 3873 3874 3875 3876 3877 3878 3879 3880 3881 3882 3883 3884 3885 3886 3887 3888 3889 3890 3891 3892 3893 3894 3895 3896 3897 3898 3899 3900 3901 3902 3903 3904 3905 3906 3907 3908 3909 3910 3911 3912 3913 3914 3915 3916 3917 3918 3919 3920 3921 3922 3923 3924 3925 3926 3927 3928 3929 3930 3931 3932 3933 3934 3935 3936 3937 3938 3939 3940 3941 3942 3943 3944 3945 3946 3947 3948 3949 3950 3951 3952 3953 3954 3955 3956 3957 3958 3959 3960 3961 3962 3963 3964 3965 3966 3967 3968 3969 3970 3971 3972 3973 3974 3975 3976 3977 3978 3979 3980 3981 3982 3983 3984 3985 3986 3987 3988 3989 3990 3991 3992 3993 3994 3995 3996 3997 3998 3999 4000 4001 4002 4003 4004 4005 4006 4007 4008 4009 4010 4011 4012 4013 4014 4015 4016 4017 4018 4019 4020 4021 4022 4023 4024 4025 4026 4027 4028 4029 4030 4031 4032 4033 4034 4035 4036 4037 4038 4039 4040 4041 4042 4043 4044 4045 4046 4047 4048 4049 4050 4051 4052 4053 4054 4055 4056 4057 4058 4059 4060 4061 4062 4063 4064 4065 4066 4067 4068 4069 4070 4071 4072 4073 4074 4075 4076 4077 4078 4079 4080 4081 4082 4083 4084 4085 4086 4087 4088 4089 4090 4091 4092 4093 4094 4095 4096 4097 4098 4099 4100 4101 4102 4103 4104 4105 4106 4107 4108 4109 4110 4111 4112 4113 4114 4115 4116 4117 4118 4119 4120 4121 4122 4123 4124 4125 4126 4127 4128 4129 4130 4131 4132 4133 4134 4135 4136 4137 4138 4139 4140 4141 4142 4143 4144 4145 4146 4147 4148 4149 4150 4151 4152 4153 4154 4155 4156 4157 4158 4159 4160 4161 4162 4163 4164 4165 4166 4167 4168 4169 4170 4171 4172 4173 4174 4175 4176 4177 4178 4179 4180 4181 4182 4183 4184 4185 4186 4187 4188 4189 4190 4191 4192 4193 4194 4195 4196 4197 4198 4199 4200 4201 4202 4203 4204 4205 4206 4207 4208 4209 4210 4211 4212 4213 4214 4215 4216 4217 4218 4219 4220 4221 4222 4223 4224 4225 4226 4227 4228 4229 4230 4231 4232 4233 4234 4235 4236 4237 4238 4239 4240 4241 4242 4243 4244 4245 4246 4247 4248 4249 4250 4251 4252 4253 4254 4255 4256 4257 4258 4259 4260 4261 4262 4263 4264 4265 4266 4267 4268 4269 4270 4271 4272 4273 4274 4275 4276 4277 4278 4279 4280 4281 4282 4283 4284 4285 4286 4287 4288 4289 4290 4291 4292 4293 4294 4295 4296 4297 4298 4299 4300 4301 4302 4303 4304 4305 4306 4307 4308 4309 4310 4311 4312 4313 4314 4315 4316 4317 4318 4319 4320 4321 4322 4323 4324 4325 4326 4327 4328 4329 4330 4331 4332 4333 4334 4335 4336 4337 4338 4339 4340 4341 4342 4343 4344 4345 4346 4347 4348 4349 4350 4351 4352 4353 4354 4355 4356 4357 4358 4359 4360 4361 4362 4363 4364 4365 4366 4367 4368 4369 4370 4371 4372 4373 4374 4375 4376 4377 4378 4379 4380 4381 4382 4383 4384 4385 4386 4387 4388 4389 4390 4391 4392 4393 4394 4395 4396 4397 4398 4399 4400 4401 4402 4403 4404 4405 4406 4407 4408 4409 4410 4411 4412 4413 4414 4415 4416 4417 4418 4419 4420 4421 4422 4423 4424 4425 4426 4427 4428 4429 4430 4431 4432 4433 4434 4435 4436 4437 4438 4439 4440 4441 4442 4443 4444 4445 4446 4447 4448 4449 4450 4451 4452 4453 4454 4455 4456 4457 4458 4459 4460 4461 4462 4463 4464 4465 4466 4467 4468 4469 4470 4471 4472 4473 4474 4475 4476 4477 4478 4479 4480 4481 4482 4483 4484 4485 4486 4487 4488 4489 4490 4491 4492 4493 4494 4495 4496 4497 4498 4499 4500 4501 4502 4503 4504 4505 4506 4507 4508 4509 4510 4511 4512 4513 4514 4515 4516 4517 4518 4519 4520 4521 4522 4523 4524 4525 4526 4527 4528 4529 4530 4531 4532 4533 4534 4535 4536 4537 4538 4539 4540 4541 4542 4543 4544 4545 4546 4547 4548 4549 4550 4551 4552 4553 4554 4555 4556 4557 4558 4559 4560 4561 4562 4563 4564 4565 4566 4567 4568 4569 4570 4571 4572 4573 4574 4575 4576 4577 4578 4579 4580 4581 4582 4583 4584 4585 4586 4587 4588 4589 4590 4591 4592 4593 4594 4595 4596 4597 4598 4599 4600 4601 4602 4603 4604 4605 4606 4607 4608 4609 4610 4611 4612 4613 4614 4615 4616 4617 4618 4619 4620 4621 4622 4623 4624 4625 4626 4627 4628 4629 4630 4631 4632 4633 4634 4635 4636 4637 4638 4639 4640 4641 4642 4643 4644 4645 4646 4647 4648 4649 4650 4651 4652 4653 4654 4655 4656 4657 4658 4659 4660 4661 4662 4663 4664 4665 4666 4667 4668 4669 4670 4671 4672 4673 4674 4675 4676 4677 4678 4679 4680 4681 4682 4683 4684 4685 4686 4687 4688 4689 4690 4691 4692 4693 4694 4695 4696 4697 4698 4699 4700 4701 4702 4703 4704 4705 4706 4707 4708 4709 4710 4711 4712 4713 4714 4715 4716 4717 4718 4719 4720 4721 4722 4723 4724 4725 4726 4727 4728 4729 4730 4731 4732 4733 4734 4735 4736 4737 4738 4739 4740 4741 4742 4743 4744 4745 4746 4747 4748 4749 4750 4751 4752 4753 4754 4755 4756 4757 4758 4759 4760 4761 4762 4763 4764 4765 4766 4767 4768 4769 4770 4771 4772 4773 4774 4775 4776 4777 4778 4779 4780 4781 4782 4783 4784 4785 4786 4787 4788 4789 4790 4791 4792 4793 4794 4795 4796 4797 4798 4799 4800 4801 4802 4803 4804 4805 4806 4807 4808 4809 4810 4811 4812 4813 4814 4815 4816 4817 4818 4819 4820 4821 4822 4823 4824 4825 4826 4827 4828 4829 4830 4831 4832 4833 4834 4835 4836 4837 4838 4839 4840 4841 4842 4843 4844 4845 4846 4847 4848 4849 4850 4851 4852 4853 4854 4855 4856 4857 4858 4859 4860 4861 4862 4863 4864 4865 4866 4867 4868 4869 4870 4871 4872 4873 4874 4875 4876 4877 4878 4879 4880 4881 4882 4883 4884 4885 4886 4887 4888 4889 4890 4891 4892 4893 4894 4895 4896 4897 4898 4899 4900 4901 4902 4903 4904 4905 4906 4907 4908 4909 4910 4911 4912 4913 4914 4915 4916 4917 4918 4919 4920 4921 4922 4923 4924 4925 4926 4927 4928 4929 4930 4931 4932 4933 4934 4935 4936 4937 4938 4939 4940 4941 4942 4943 4944 4945 4946 4947 4948 4949 4950 4951 4952 4953 4954 4955 4956 4957 4958 4959 4960 4961 4962 4963 4964 4965 4966 4967 4968 4969 4970 4971 4972 4973 4974 4975 4976 4977 4978 4979 4980 4981 4982 4983 4984 4985 4986 4987 4988 4989 4990 4991 4992 4993 4994 4995 4996 4997 4998 4999 5000 5001 5002 5003 5004 5005 5006 5007 5008 5009 5010 5011 5012 5013 5014 5015 5016 5017 5018 5019 5020 5021 5022 5023 5024 5025 5026 5027 5028 5029 5030 5031 5032 5033 5034 5035 5036 5037 5038 5039 5040 5041 5042 5043 5044 5045 5046 5047 5048 5049 5050 5051 5052 5053 5054 5055 5056 5057 5058 5059 5060 5061 5062 5063 5064 5065 5066 5067 5068 5069 5070 5071 5072 5073 5074 5075 5076 5077 5078 5079 5080 5081 5082 5083 5084 5085 5086 5087 5088 5089 5090 5091 5092 5093 5094 5095 5096 5097 5098 5099 5100 5101 5102 5103 5104 5105 5106 5107 5108 5109 5110 5111 5112 5113 5114 5115 5116 5117 5118 5119 5120 5121 5122 5123 5124 5125 5126 5127 5128 5129 5130 5131 5132 5133 5134 5135 5136 5137 5138 5139 5140 5141 5142 5143 5144 5145 5146 5147 5148 5149 5150 5151 5152 5153 5154 5155 5156 5157 5158 5159 5160 5161 5162 5163 5164 5165 5166 5167 5168 5169 5170 5171 5172 5173 5174 5175 5176 5177 5178 5179 5180 5181 5182 5183 5184 5185 5186 5187 5188 5189 5190 5191 5192 5193 5194 5195 5196 5197 5198 5199 5200 5201 5202 5203 5204 5205 5206 5207 5208 5209 5210 5211 5212 5213 5214 5215 5216 5217 5218 5219 5220 5221 5222 5223 5224 5225 5226 5227 5228 5229 5230 5231 5232 5233 5234 5235 5236 5237 5238 5239 5240 5241 5242 5243 5244 5245 5246 5247 5248 5249 5250 5251 5252 5253 5254 5255 5256 5257 5258 5259 5260 5261 5262 5263 5264 5265 5266 5267 5268 5269 5270 5271 5272 5273 5274 5275 5276 5277 5278 5279 5280 5281 5282 5283 5284 5285 5286 5287 5288 5289 5290 5291 5292 5293 5294 5295 5296 5297 5298 5299 5300 5301 5302 5303 5304 5305 5306 5307 5308 5309 5310 5311 5312 5313 5314 5315 5316 5317 5318 5319 5320 5321 5322 5323 5324 5325 5326 5327 5328 5329 5330 5331 5332 5333 5334 5335 5336 5337 5338 5339 5340 5341 5342 5343 5344 5345 5346 5347 5348 5349 5350 5351 5352 | // SPDX-License-Identifier: GPL-2.0 /* * drivers/base/core.c - core driver model code (device registration, etc) * * Copyright (c) 2002-3 Patrick Mochel * Copyright (c) 2002-3 Open Source Development Labs * Copyright (c) 2006 Greg Kroah-Hartman <gregkh@suse.de> * Copyright (c) 2006 Novell, Inc. */ #include <linux/acpi.h> #include <linux/blkdev.h> #include <linux/cleanup.h> #include <linux/cpufreq.h> #include <linux/device.h> #include <linux/dma-map-ops.h> /* for dma_default_coherent */ #include <linux/err.h> #include <linux/fwnode.h> #include <linux/init.h> #include <linux/kdev_t.h> #include <linux/kstrtox.h> #include <linux/module.h> #include <linux/mutex.h> #include <linux/netdevice.h> #include <linux/notifier.h> #include <linux/of.h> #include <linux/of_device.h> #include <linux/pm_runtime.h> #include <linux/sched/mm.h> #include <linux/sched/signal.h> #include <linux/slab.h> #include <linux/string_helpers.h> #include <linux/swiotlb.h> #include <linux/sysfs.h> #include "base.h" #include "physical_location.h" #include "power/power.h" /* Device links support. */ static LIST_HEAD(deferred_sync); static unsigned int defer_sync_state_count = 1; static DEFINE_MUTEX(fwnode_link_lock); static bool fw_devlink_is_permissive(void); static void __fw_devlink_link_to_consumers(struct device *dev); static bool fw_devlink_drv_reg_done; static bool fw_devlink_best_effort; static struct workqueue_struct *device_link_wq; /** * __fwnode_link_add - Create a link between two fwnode_handles. * @con: Consumer end of the link. * @sup: Supplier end of the link. * @flags: Link flags. * * Create a fwnode link between fwnode handles @con and @sup. The fwnode link * represents the detail that the firmware lists @sup fwnode as supplying a * resource to @con. * * The driver core will use the fwnode link to create a device link between the * two device objects corresponding to @con and @sup when they are created. The * driver core will automatically delete the fwnode link between @con and @sup * after doing that. * * Attempts to create duplicate links between the same pair of fwnode handles * are ignored and there is no reference counting. */ static int __fwnode_link_add(struct fwnode_handle *con, struct fwnode_handle *sup, u8 flags) { struct fwnode_link *link; list_for_each_entry(link, &sup->consumers, s_hook) if (link->consumer == con) { link->flags |= flags; return 0; } link = kzalloc(sizeof(*link), GFP_KERNEL); if (!link) return -ENOMEM; link->supplier = sup; INIT_LIST_HEAD(&link->s_hook); link->consumer = con; INIT_LIST_HEAD(&link->c_hook); link->flags = flags; list_add(&link->s_hook, &sup->consumers); list_add(&link->c_hook, &con->suppliers); pr_debug("%pfwf Linked as a fwnode consumer to %pfwf\n", con, sup); return 0; } int fwnode_link_add(struct fwnode_handle *con, struct fwnode_handle *sup, u8 flags) { guard(mutex)(&fwnode_link_lock); return __fwnode_link_add(con, sup, flags); } /** * __fwnode_link_del - Delete a link between two fwnode_handles. * @link: the fwnode_link to be deleted * * The fwnode_link_lock needs to be held when this function is called. */ static void __fwnode_link_del(struct fwnode_link *link) { pr_debug("%pfwf Dropping the fwnode link to %pfwf\n", link->consumer, link->supplier); list_del(&link->s_hook); list_del(&link->c_hook); kfree(link); } /** * __fwnode_link_cycle - Mark a fwnode link as being part of a cycle. * @link: the fwnode_link to be marked * * The fwnode_link_lock needs to be held when this function is called. */ static void __fwnode_link_cycle(struct fwnode_link *link) { pr_debug("%pfwf: cycle: depends on %pfwf\n", link->consumer, link->supplier); link->flags |= FWLINK_FLAG_CYCLE; } /** * fwnode_links_purge_suppliers - Delete all supplier links of fwnode_handle. * @fwnode: fwnode whose supplier links need to be deleted * * Deletes all supplier links connecting directly to @fwnode. */ static void fwnode_links_purge_suppliers(struct fwnode_handle *fwnode) { struct fwnode_link *link, *tmp; guard(mutex)(&fwnode_link_lock); list_for_each_entry_safe(link, tmp, &fwnode->suppliers, c_hook) __fwnode_link_del(link); } /** * fwnode_links_purge_consumers - Delete all consumer links of fwnode_handle. * @fwnode: fwnode whose consumer links need to be deleted * * Deletes all consumer links connecting directly to @fwnode. */ static void fwnode_links_purge_consumers(struct fwnode_handle *fwnode) { struct fwnode_link *link, *tmp; guard(mutex)(&fwnode_link_lock); list_for_each_entry_safe(link, tmp, &fwnode->consumers, s_hook) __fwnode_link_del(link); } /** * fwnode_links_purge - Delete all links connected to a fwnode_handle. * @fwnode: fwnode whose links needs to be deleted * * Deletes all links connecting directly to a fwnode. */ void fwnode_links_purge(struct fwnode_handle *fwnode) { fwnode_links_purge_suppliers(fwnode); fwnode_links_purge_consumers(fwnode); } void fw_devlink_purge_absent_suppliers(struct fwnode_handle *fwnode) { struct fwnode_handle *child; /* Don't purge consumer links of an added child */ if (fwnode->dev) return; fwnode->flags |= FWNODE_FLAG_NOT_DEVICE; fwnode_links_purge_consumers(fwnode); fwnode_for_each_available_child_node(fwnode, child) fw_devlink_purge_absent_suppliers(child); } EXPORT_SYMBOL_GPL(fw_devlink_purge_absent_suppliers); /** * __fwnode_links_move_consumers - Move consumer from @from to @to fwnode_handle * @from: move consumers away from this fwnode * @to: move consumers to this fwnode * * Move all consumer links from @from fwnode to @to fwnode. */ static void __fwnode_links_move_consumers(struct fwnode_handle *from, struct fwnode_handle *to) { struct fwnode_link *link, *tmp; list_for_each_entry_safe(link, tmp, &from->consumers, s_hook) { __fwnode_link_add(link->consumer, to, link->flags); __fwnode_link_del(link); } } /** * __fw_devlink_pickup_dangling_consumers - Pick up dangling consumers * @fwnode: fwnode from which to pick up dangling consumers * @new_sup: fwnode of new supplier * * If the @fwnode has a corresponding struct device and the device supports * probing (that is, added to a bus), then we want to let fw_devlink create * MANAGED device links to this device, so leave @fwnode and its descendant's * fwnode links alone. * * Otherwise, move its consumers to the new supplier @new_sup. */ static void __fw_devlink_pickup_dangling_consumers(struct fwnode_handle *fwnode, struct fwnode_handle *new_sup) { struct fwnode_handle *child; if (fwnode->dev && fwnode->dev->bus) return; fwnode->flags |= FWNODE_FLAG_NOT_DEVICE; __fwnode_links_move_consumers(fwnode, new_sup); fwnode_for_each_available_child_node(fwnode, child) __fw_devlink_pickup_dangling_consumers(child, new_sup); } static DEFINE_MUTEX(device_links_lock); DEFINE_STATIC_SRCU(device_links_srcu); static inline void device_links_write_lock(void) { mutex_lock(&device_links_lock); } static inline void device_links_write_unlock(void) { mutex_unlock(&device_links_lock); } int device_links_read_lock(void) __acquires(&device_links_srcu) { return srcu_read_lock(&device_links_srcu); } void device_links_read_unlock(int idx) __releases(&device_links_srcu) { srcu_read_unlock(&device_links_srcu, idx); } int device_links_read_lock_held(void) { return srcu_read_lock_held(&device_links_srcu); } static void device_link_synchronize_removal(void) { synchronize_srcu(&device_links_srcu); } static void device_link_remove_from_lists(struct device_link *link) { list_del_rcu(&link->s_node); list_del_rcu(&link->c_node); } static bool device_is_ancestor(struct device *dev, struct device *target) { while (target->parent) { target = target->parent; if (dev == target) return true; } return false; } #define DL_MARKER_FLAGS (DL_FLAG_INFERRED | \ DL_FLAG_CYCLE | \ DL_FLAG_MANAGED) bool device_link_flag_is_sync_state_only(u32 flags) { return (flags & ~DL_MARKER_FLAGS) == DL_FLAG_SYNC_STATE_ONLY; } /** * device_is_dependent - Check if one device depends on another one * @dev: Device to check dependencies for. * @target: Device to check against. * * Check if @target depends on @dev or any device dependent on it (its child or * its consumer etc). Return 1 if that is the case or 0 otherwise. */ static int device_is_dependent(struct device *dev, void *target) { struct device_link *link; int ret; /* * The "ancestors" check is needed to catch the case when the target * device has not been completely initialized yet and it is still * missing from the list of children of its parent device. */ if (dev == target || device_is_ancestor(dev, target)) return 1; ret = device_for_each_child(dev, target, device_is_dependent); if (ret) return ret; list_for_each_entry(link, &dev->links.consumers, s_node) { if (device_link_flag_is_sync_state_only(link->flags)) continue; if (link->consumer == target) return 1; ret = device_is_dependent(link->consumer, target); if (ret) break; } return ret; } static void device_link_init_status(struct device_link *link, struct device *consumer, struct device *supplier) { switch (supplier->links.status) { case DL_DEV_PROBING: switch (consumer->links.status) { case DL_DEV_PROBING: /* * A consumer driver can create a link to a supplier * that has not completed its probing yet as long as it * knows that the supplier is already functional (for * example, it has just acquired some resources from the * supplier). */ link->status = DL_STATE_CONSUMER_PROBE; break; default: link->status = DL_STATE_DORMANT; break; } break; case DL_DEV_DRIVER_BOUND: switch (consumer->links.status) { case DL_DEV_PROBING: link->status = DL_STATE_CONSUMER_PROBE; break; case DL_DEV_DRIVER_BOUND: link->status = DL_STATE_ACTIVE; break; default: link->status = DL_STATE_AVAILABLE; break; } break; case DL_DEV_UNBINDING: link->status = DL_STATE_SUPPLIER_UNBIND; break; default: link->status = DL_STATE_DORMANT; break; } } static int device_reorder_to_tail(struct device *dev, void *not_used) { struct device_link *link; /* * Devices that have not been registered yet will be put to the ends * of the lists during the registration, so skip them here. */ if (device_is_registered(dev)) devices_kset_move_last(dev); if (device_pm_initialized(dev)) device_pm_move_last(dev); device_for_each_child(dev, NULL, device_reorder_to_tail); list_for_each_entry(link, &dev->links.consumers, s_node) { if (device_link_flag_is_sync_state_only(link->flags)) continue; device_reorder_to_tail(link->consumer, NULL); } return 0; } /** * device_pm_move_to_tail - Move set of devices to the end of device lists * @dev: Device to move * * This is a device_reorder_to_tail() wrapper taking the requisite locks. * * It moves the @dev along with all of its children and all of its consumers * to the ends of the device_kset and dpm_list, recursively. */ void device_pm_move_to_tail(struct device *dev) { int idx; idx = device_links_read_lock(); device_pm_lock(); device_reorder_to_tail(dev, NULL); device_pm_unlock(); device_links_read_unlock(idx); } #define to_devlink(dev) container_of((dev), struct device_link, link_dev) static ssize_t status_show(struct device *dev, struct device_attribute *attr, char *buf) { const char *output; switch (to_devlink(dev)->status) { case DL_STATE_NONE: output = "not tracked"; break; case DL_STATE_DORMANT: output = "dormant"; break; case DL_STATE_AVAILABLE: output = "available"; break; case DL_STATE_CONSUMER_PROBE: output = "consumer probing"; break; case DL_STATE_ACTIVE: output = "active"; break; case DL_STATE_SUPPLIER_UNBIND: output = "supplier unbinding"; break; default: output = "unknown"; break; } return sysfs_emit(buf, "%s\n", output); } static DEVICE_ATTR_RO(status); static ssize_t auto_remove_on_show(struct device *dev, struct device_attribute *attr, char *buf) { struct device_link *link = to_devlink(dev); const char *output; if (device_link_test(link, DL_FLAG_AUTOREMOVE_SUPPLIER)) output = "supplier unbind"; else if (device_link_test(link, DL_FLAG_AUTOREMOVE_CONSUMER)) output = "consumer unbind"; else output = "never"; return sysfs_emit(buf, "%s\n", output); } static DEVICE_ATTR_RO(auto_remove_on); static ssize_t runtime_pm_show(struct device *dev, struct device_attribute *attr, char *buf) { struct device_link *link = to_devlink(dev); return sysfs_emit(buf, "%d\n", device_link_test(link, DL_FLAG_PM_RUNTIME)); } static DEVICE_ATTR_RO(runtime_pm); static ssize_t sync_state_only_show(struct device *dev, struct device_attribute *attr, char *buf) { struct device_link *link = to_devlink(dev); return sysfs_emit(buf, "%d\n", device_link_test(link, DL_FLAG_SYNC_STATE_ONLY)); } static DEVICE_ATTR_RO(sync_state_only); static struct attribute *devlink_attrs[] = { &dev_attr_status.attr, &dev_attr_auto_remove_on.attr, &dev_attr_runtime_pm.attr, &dev_attr_sync_state_only.attr, NULL, }; ATTRIBUTE_GROUPS(devlink); static void device_link_release_fn(struct work_struct *work) { struct device_link *link = container_of(work, struct device_link, rm_work); /* Ensure that all references to the link object have been dropped. */ device_link_synchronize_removal(); pm_runtime_release_supplier(link); /* * If supplier_preactivated is set, the link has been dropped between * the pm_runtime_get_suppliers() and pm_runtime_put_suppliers() calls * in __driver_probe_device(). In that case, drop the supplier's * PM-runtime usage counter to remove the reference taken by * pm_runtime_get_suppliers(). */ if (link->supplier_preactivated) pm_runtime_put_noidle(link->supplier); pm_request_idle(link->supplier); put_device(link->consumer); put_device(link->supplier); kfree(link); } static void devlink_dev_release(struct device *dev) { struct device_link *link = to_devlink(dev); INIT_WORK(&link->rm_work, device_link_release_fn); /* * It may take a while to complete this work because of the SRCU * synchronization in device_link_release_fn() and if the consumer or * supplier devices get deleted when it runs, so put it into the * dedicated workqueue. */ queue_work(device_link_wq, &link->rm_work); } /** * device_link_wait_removal - Wait for ongoing devlink removal jobs to terminate */ void device_link_wait_removal(void) { /* * devlink removal jobs are queued in the dedicated work queue. * To be sure that all removal jobs are terminated, ensure that any * scheduled work has run to completion. */ flush_workqueue(device_link_wq); } EXPORT_SYMBOL_GPL(device_link_wait_removal); static const struct class devlink_class = { .name = "devlink", .dev_groups = devlink_groups, .dev_release = devlink_dev_release, }; static int devlink_add_symlinks(struct device *dev) { char *buf_con __free(kfree) = NULL, *buf_sup __free(kfree) = NULL; int ret; struct device_link *link = to_devlink(dev); struct device *sup = link->supplier; struct device *con = link->consumer; ret = sysfs_create_link(&link->link_dev.kobj, &sup->kobj, "supplier"); if (ret) goto out; ret = sysfs_create_link(&link->link_dev.kobj, &con->kobj, "consumer"); if (ret) goto err_con; buf_con = kasprintf(GFP_KERNEL, "consumer:%s:%s", dev_bus_name(con), dev_name(con)); if (!buf_con) { ret = -ENOMEM; goto err_con_dev; } ret = sysfs_create_link(&sup->kobj, &link->link_dev.kobj, buf_con); if (ret) goto err_con_dev; buf_sup = kasprintf(GFP_KERNEL, "supplier:%s:%s", dev_bus_name(sup), dev_name(sup)); if (!buf_sup) { ret = -ENOMEM; goto err_sup_dev; } ret = sysfs_create_link(&con->kobj, &link->link_dev.kobj, buf_sup); if (ret) goto err_sup_dev; goto out; err_sup_dev: sysfs_remove_link(&sup->kobj, buf_con); err_con_dev: sysfs_remove_link(&link->link_dev.kobj, "consumer"); err_con: sysfs_remove_link(&link->link_dev.kobj, "supplier"); out: return ret; } static void devlink_remove_symlinks(struct device *dev) { char *buf_con __free(kfree) = NULL, *buf_sup __free(kfree) = NULL; struct device_link *link = to_devlink(dev); struct device *sup = link->supplier; struct device *con = link->consumer; sysfs_remove_link(&link->link_dev.kobj, "consumer"); sysfs_remove_link(&link->link_dev.kobj, "supplier"); if (device_is_registered(con)) { buf_sup = kasprintf(GFP_KERNEL, "supplier:%s:%s", dev_bus_name(sup), dev_name(sup)); if (!buf_sup) goto out; sysfs_remove_link(&con->kobj, buf_sup); } buf_con = kasprintf(GFP_KERNEL, "consumer:%s:%s", dev_bus_name(con), dev_name(con)); if (!buf_con) goto out; sysfs_remove_link(&sup->kobj, buf_con); return; out: WARN(1, "Unable to properly free device link symlinks!\n"); } static struct class_interface devlink_class_intf = { .class = &devlink_class, .add_dev = devlink_add_symlinks, .remove_dev = devlink_remove_symlinks, }; static int __init devlink_class_init(void) { int ret; ret = class_register(&devlink_class); if (ret) return ret; ret = class_interface_register(&devlink_class_intf); if (ret) class_unregister(&devlink_class); return ret; } postcore_initcall(devlink_class_init); #define DL_MANAGED_LINK_FLAGS (DL_FLAG_AUTOREMOVE_CONSUMER | \ DL_FLAG_AUTOREMOVE_SUPPLIER | \ DL_FLAG_AUTOPROBE_CONSUMER | \ DL_FLAG_SYNC_STATE_ONLY | \ DL_FLAG_INFERRED | \ DL_FLAG_CYCLE) #define DL_ADD_VALID_FLAGS (DL_MANAGED_LINK_FLAGS | DL_FLAG_STATELESS | \ DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE) /** * device_link_add - Create a link between two devices. * @consumer: Consumer end of the link. * @supplier: Supplier end of the link. * @flags: Link flags. * * Return: On success, a device_link struct will be returned. * On error or invalid flag settings, NULL will be returned. * * The caller is responsible for the proper synchronization of the link creation * with runtime PM. First, setting the DL_FLAG_PM_RUNTIME flag will cause the * runtime PM framework to take the link into account. Second, if the * DL_FLAG_RPM_ACTIVE flag is set in addition to it, the supplier devices will * be forced into the active meta state and reference-counted upon the creation * of the link. If DL_FLAG_PM_RUNTIME is not set, DL_FLAG_RPM_ACTIVE will be * ignored. * * If DL_FLAG_STATELESS is set in @flags, the caller of this function is * expected to release the link returned by it directly with the help of either * device_link_del() or device_link_remove(). * * If that flag is not set, however, the caller of this function is handing the * management of the link over to the driver core entirely and its return value * can only be used to check whether or not the link is present. In that case, * the DL_FLAG_AUTOREMOVE_CONSUMER and DL_FLAG_AUTOREMOVE_SUPPLIER device link * flags can be used to indicate to the driver core when the link can be safely * deleted. Namely, setting one of them in @flags indicates to the driver core * that the link is not going to be used (by the given caller of this function) * after unbinding the consumer or supplier driver, respectively, from its * device, so the link can be deleted at that point. If none of them is set, * the link will be maintained until one of the devices pointed to by it (either * the consumer or the supplier) is unregistered. * * Also, if DL_FLAG_STATELESS, DL_FLAG_AUTOREMOVE_CONSUMER and * DL_FLAG_AUTOREMOVE_SUPPLIER are not set in @flags (that is, a persistent * managed device link is being added), the DL_FLAG_AUTOPROBE_CONSUMER flag can * be used to request the driver core to automatically probe for a consumer * driver after successfully binding a driver to the supplier device. * * The combination of DL_FLAG_STATELESS and one of DL_FLAG_AUTOREMOVE_CONSUMER, * DL_FLAG_AUTOREMOVE_SUPPLIER, or DL_FLAG_AUTOPROBE_CONSUMER set in @flags at * the same time is invalid and will cause NULL to be returned upfront. * However, if a device link between the given @consumer and @supplier pair * exists already when this function is called for them, the existing link will * be returned regardless of its current type and status (the link's flags may * be modified then). The caller of this function is then expected to treat * the link as though it has just been created, so (in particular) if * DL_FLAG_STATELESS was passed in @flags, the link needs to be released * explicitly when not needed any more (as stated above). * * A side effect of the link creation is re-ordering of dpm_list and the * devices_kset list by moving the consumer device and all devices depending * on it to the ends of these lists (that does not happen to devices that have * not been registered when this function is called). * * The supplier device is required to be registered when this function is called * and NULL will be returned if that is not the case. The consumer device need * not be registered, however. */ struct device_link *device_link_add(struct device *consumer, struct device *supplier, u32 flags) { struct device_link *link; if (!consumer || !supplier || consumer == supplier || flags & ~DL_ADD_VALID_FLAGS || (flags & DL_FLAG_STATELESS && flags & DL_MANAGED_LINK_FLAGS) || (flags & DL_FLAG_AUTOPROBE_CONSUMER && flags & (DL_FLAG_AUTOREMOVE_CONSUMER | DL_FLAG_AUTOREMOVE_SUPPLIER))) return NULL; if (flags & DL_FLAG_PM_RUNTIME && flags & DL_FLAG_RPM_ACTIVE) { if (pm_runtime_get_sync(supplier) < 0) { pm_runtime_put_noidle(supplier); return NULL; } } if (!(flags & DL_FLAG_STATELESS)) flags |= DL_FLAG_MANAGED; if (flags & DL_FLAG_SYNC_STATE_ONLY && !device_link_flag_is_sync_state_only(flags)) return NULL; device_links_write_lock(); device_pm_lock(); /* * If the supplier has not been fully registered yet or there is a * reverse (non-SYNC_STATE_ONLY) dependency between the consumer and * the supplier already in the graph, return NULL. If the link is a * SYNC_STATE_ONLY link, we don't check for reverse dependencies * because it only affects sync_state() callbacks. */ if (!device_pm_initialized(supplier) || (!(flags & DL_FLAG_SYNC_STATE_ONLY) && device_is_dependent(consumer, supplier))) { link = NULL; goto out; } /* * SYNC_STATE_ONLY links are useless once a consumer device has probed. * So, only create it if the consumer hasn't probed yet. */ if (flags & DL_FLAG_SYNC_STATE_ONLY && consumer->links.status != DL_DEV_NO_DRIVER && consumer->links.status != DL_DEV_PROBING) { link = NULL; goto out; } /* * DL_FLAG_AUTOREMOVE_SUPPLIER indicates that the link will be needed * longer than for DL_FLAG_AUTOREMOVE_CONSUMER and setting them both * together doesn't make sense, so prefer DL_FLAG_AUTOREMOVE_SUPPLIER. */ if (flags & DL_FLAG_AUTOREMOVE_SUPPLIER) flags &= ~DL_FLAG_AUTOREMOVE_CONSUMER; list_for_each_entry(link, &supplier->links.consumers, s_node) { if (link->consumer != consumer) continue; if (device_link_test(link, DL_FLAG_INFERRED) && !(flags & DL_FLAG_INFERRED)) link->flags &= ~DL_FLAG_INFERRED; if (flags & DL_FLAG_PM_RUNTIME) { if (!device_link_test(link, DL_FLAG_PM_RUNTIME)) { pm_runtime_new_link(consumer); link->flags |= DL_FLAG_PM_RUNTIME; } if (flags & DL_FLAG_RPM_ACTIVE) refcount_inc(&link->rpm_active); } if (flags & DL_FLAG_STATELESS) { kref_get(&link->kref); if (device_link_test(link, DL_FLAG_SYNC_STATE_ONLY) && !device_link_test(link, DL_FLAG_STATELESS)) { link->flags |= DL_FLAG_STATELESS; goto reorder; } else { link->flags |= DL_FLAG_STATELESS; goto out; } } /* * If the life time of the link following from the new flags is * longer than indicated by the flags of the existing link, * update the existing link to stay around longer. */ if (flags & DL_FLAG_AUTOREMOVE_SUPPLIER) { if (device_link_test(link, DL_FLAG_AUTOREMOVE_CONSUMER)) { link->flags &= ~DL_FLAG_AUTOREMOVE_CONSUMER; link->flags |= DL_FLAG_AUTOREMOVE_SUPPLIER; } } else if (!(flags & DL_FLAG_AUTOREMOVE_CONSUMER)) { link->flags &= ~(DL_FLAG_AUTOREMOVE_CONSUMER | DL_FLAG_AUTOREMOVE_SUPPLIER); } if (!device_link_test(link, DL_FLAG_MANAGED)) { kref_get(&link->kref); link->flags |= DL_FLAG_MANAGED; device_link_init_status(link, consumer, supplier); } if (device_link_test(link, DL_FLAG_SYNC_STATE_ONLY) && !(flags & DL_FLAG_SYNC_STATE_ONLY)) { link->flags &= ~DL_FLAG_SYNC_STATE_ONLY; goto reorder; } goto out; } link = kzalloc(sizeof(*link), GFP_KERNEL); if (!link) goto out; refcount_set(&link->rpm_active, 1); get_device(supplier); link->supplier = supplier; INIT_LIST_HEAD(&link->s_node); get_device(consumer); link->consumer = consumer; INIT_LIST_HEAD(&link->c_node); link->flags = flags; kref_init(&link->kref); link->link_dev.class = &devlink_class; device_set_pm_not_required(&link->link_dev); dev_set_name(&link->link_dev, "%s:%s--%s:%s", dev_bus_name(supplier), dev_name(supplier), dev_bus_name(consumer), dev_name(consumer)); if (device_register(&link->link_dev)) { put_device(&link->link_dev); link = NULL; goto out; } if (flags & DL_FLAG_PM_RUNTIME) { if (flags & DL_FLAG_RPM_ACTIVE) refcount_inc(&link->rpm_active); pm_runtime_new_link(consumer); } /* Determine the initial link state. */ if (flags & DL_FLAG_STATELESS) link->status = DL_STATE_NONE; else device_link_init_status(link, consumer, supplier); /* * Some callers expect the link creation during consumer driver probe to * resume the supplier even without DL_FLAG_RPM_ACTIVE. */ if (link->status == DL_STATE_CONSUMER_PROBE && flags & DL_FLAG_PM_RUNTIME) pm_runtime_resume(supplier); list_add_tail_rcu(&link->s_node, &supplier->links.consumers); list_add_tail_rcu(&link->c_node, &consumer->links.suppliers); if (flags & DL_FLAG_SYNC_STATE_ONLY) { dev_dbg(consumer, "Linked as a sync state only consumer to %s\n", dev_name(supplier)); goto out; } reorder: /* * Move the consumer and all of the devices depending on it to the end * of dpm_list and the devices_kset list. * * It is necessary to hold dpm_list locked throughout all that or else * we may end up suspending with a wrong ordering of it. */ device_reorder_to_tail(consumer, NULL); dev_dbg(consumer, "Linked as a consumer to %s\n", dev_name(supplier)); out: device_pm_unlock(); device_links_write_unlock(); if ((flags & DL_FLAG_PM_RUNTIME && flags & DL_FLAG_RPM_ACTIVE) && !link) pm_runtime_put(supplier); return link; } EXPORT_SYMBOL_GPL(device_link_add); static void __device_link_del(struct kref *kref) { struct device_link *link = container_of(kref, struct device_link, kref); dev_dbg(link->consumer, "Dropping the link to %s\n", dev_name(link->supplier)); pm_runtime_drop_link(link); device_link_remove_from_lists(link); device_unregister(&link->link_dev); } static void device_link_put_kref(struct device_link *link) { if (device_link_test(link, DL_FLAG_STATELESS)) kref_put(&link->kref, __device_link_del); else if (!device_is_registered(link->consumer)) __device_link_del(&link->kref); else WARN(1, "Unable to drop a managed device link reference\n"); } /** * device_link_del - Delete a stateless link between two devices. * @link: Device link to delete. * * The caller must ensure proper synchronization of this function with runtime * PM. If the link was added multiple times, it needs to be deleted as often. * Care is required for hotplugged devices: Their links are purged on removal * and calling device_link_del() is then no longer allowed. */ void device_link_del(struct device_link *link) { device_links_write_lock(); device_link_put_kref(link); device_links_write_unlock(); } EXPORT_SYMBOL_GPL(device_link_del); /** * device_link_remove - Delete a stateless link between two devices. * @consumer: Consumer end of the link. * @supplier: Supplier end of the link. * * The caller must ensure proper synchronization of this function with runtime * PM. */ void device_link_remove(void *consumer, struct device *supplier) { struct device_link *link; if (WARN_ON(consumer == supplier)) return; device_links_write_lock(); list_for_each_entry(link, &supplier->links.consumers, s_node) { if (link->consumer == consumer) { device_link_put_kref(link); break; } } device_links_write_unlock(); } EXPORT_SYMBOL_GPL(device_link_remove); static void device_links_missing_supplier(struct device *dev) { struct device_link *link; list_for_each_entry(link, &dev->links.suppliers, c_node) { if (link->status != DL_STATE_CONSUMER_PROBE) continue; if (link->supplier->links.status == DL_DEV_DRIVER_BOUND) { WRITE_ONCE(link->status, DL_STATE_AVAILABLE); } else { WARN_ON(!device_link_test(link, DL_FLAG_SYNC_STATE_ONLY)); WRITE_ONCE(link->status, DL_STATE_DORMANT); } } } static bool dev_is_best_effort(struct device *dev) { return (fw_devlink_best_effort && dev->can_match) || (dev->fwnode && (dev->fwnode->flags & FWNODE_FLAG_BEST_EFFORT)); } static struct fwnode_handle *fwnode_links_check_suppliers( struct fwnode_handle *fwnode) { struct fwnode_link *link; if (!fwnode || fw_devlink_is_permissive()) return NULL; list_for_each_entry(link, &fwnode->suppliers, c_hook) if (!(link->flags & (FWLINK_FLAG_CYCLE | FWLINK_FLAG_IGNORE))) return link->supplier; return NULL; } /** * device_links_check_suppliers - Check presence of supplier drivers. * @dev: Consumer device. * * Check links from this device to any suppliers. Walk the list of the device's * links to suppliers and see if all of them are available. If not, simply * return -EPROBE_DEFER. * * We need to guarantee that the supplier will not go away after the check has * been positive here. It only can go away in __device_release_driver() and * that function checks the device's links to consumers. This means we need to * mark the link as "consumer probe in progress" to make the supplier removal * wait for us to complete (or bad things may happen). * * Links without the DL_FLAG_MANAGED flag set are ignored. */ int device_links_check_suppliers(struct device *dev) { struct device_link *link; int ret = 0, fwnode_ret = 0; struct fwnode_handle *sup_fw; /* * Device waiting for supplier to become available is not allowed to * probe. */ scoped_guard(mutex, &fwnode_link_lock) { sup_fw = fwnode_links_check_suppliers(dev->fwnode); if (sup_fw) { if (dev_is_best_effort(dev)) fwnode_ret = -EAGAIN; else return dev_err_probe(dev, -EPROBE_DEFER, "wait for supplier %pfwf\n", sup_fw); } } device_links_write_lock(); list_for_each_entry(link, &dev->links.suppliers, c_node) { if (!device_link_test(link, DL_FLAG_MANAGED)) continue; if (link->status != DL_STATE_AVAILABLE && !device_link_test(link, DL_FLAG_SYNC_STATE_ONLY)) { if (dev_is_best_effort(dev) && device_link_test(link, DL_FLAG_INFERRED) && !link->supplier->can_match) { ret = -EAGAIN; continue; } device_links_missing_supplier(dev); ret = dev_err_probe(dev, -EPROBE_DEFER, "supplier %s not ready\n", dev_name(link->supplier)); break; } WRITE_ONCE(link->status, DL_STATE_CONSUMER_PROBE); } dev->links.status = DL_DEV_PROBING; device_links_write_unlock(); return ret ? ret : fwnode_ret; } /** * __device_links_queue_sync_state - Queue a device for sync_state() callback * @dev: Device to call sync_state() on * @list: List head to queue the @dev on * * Queues a device for a sync_state() callback when the device links write lock * isn't held. This allows the sync_state() execution flow to use device links * APIs. The caller must ensure this function is called with * device_links_write_lock() held. * * This function does a get_device() to make sure the device is not freed while * on this list. * * So the caller must also ensure that device_links_flush_sync_list() is called * as soon as the caller releases device_links_write_lock(). This is necessary * to make sure the sync_state() is called in a timely fashion and the * put_device() is called on this device. */ static void __device_links_queue_sync_state(struct device *dev, struct list_head *list) { struct device_link *link; if (!dev_has_sync_state(dev)) return; if (dev->state_synced) return; list_for_each_entry(link, &dev->links.consumers, s_node) { if (!device_link_test(link, DL_FLAG_MANAGED)) continue; if (link->status != DL_STATE_ACTIVE) return; } /* * Set the flag here to avoid adding the same device to a list more * than once. This can happen if new consumers get added to the device * and probed before the list is flushed. */ dev->state_synced = true; if (WARN_ON(!list_empty(&dev->links.defer_sync))) return; get_device(dev); list_add_tail(&dev->links.defer_sync, list); } /** * device_links_flush_sync_list - Call sync_state() on a list of devices * @list: List of devices to call sync_state() on * @dont_lock_dev: Device for which lock is already held by the caller * * Calls sync_state() on all the devices that have been queued for it. This * function is used in conjunction with __device_links_queue_sync_state(). The * @dont_lock_dev parameter is useful when this function is called from a * context where a device lock is already held. */ static void device_links_flush_sync_list(struct list_head *list, struct device *dont_lock_dev) { struct device *dev, *tmp; list_for_each_entry_safe(dev, tmp, list, links.defer_sync) { list_del_init(&dev->links.defer_sync); if (dev != dont_lock_dev) device_lock(dev); dev_sync_state(dev); if (dev != dont_lock_dev) device_unlock(dev); put_device(dev); } } void device_links_supplier_sync_state_pause(void) { device_links_write_lock(); defer_sync_state_count++; device_links_write_unlock(); } void device_links_supplier_sync_state_resume(void) { struct device *dev, *tmp; LIST_HEAD(sync_list); device_links_write_lock(); if (!defer_sync_state_count) { WARN(true, "Unmatched sync_state pause/resume!"); goto out; } defer_sync_state_count--; if (defer_sync_state_count) goto out; list_for_each_entry_safe(dev, tmp, &deferred_sync, links.defer_sync) { /* * Delete from deferred_sync list before queuing it to * sync_list because defer_sync is used for both lists. */ list_del_init(&dev->links.defer_sync); __device_links_queue_sync_state(dev, &sync_list); } out: device_links_write_unlock(); device_links_flush_sync_list(&sync_list, NULL); } static int sync_state_resume_initcall(void) { device_links_supplier_sync_state_resume(); return 0; } late_initcall(sync_state_resume_initcall); static void __device_links_supplier_defer_sync(struct device *sup) { if (list_empty(&sup->links.defer_sync) && dev_has_sync_state(sup)) list_add_tail(&sup->links.defer_sync, &deferred_sync); } static void device_link_drop_managed(struct device_link *link) { link->flags &= ~DL_FLAG_MANAGED; WRITE_ONCE(link->status, DL_STATE_NONE); kref_put(&link->kref, __device_link_del); } static ssize_t waiting_for_supplier_show(struct device *dev, struct device_attribute *attr, char *buf) { bool val; device_lock(dev); scoped_guard(mutex, &fwnode_link_lock) val = !!fwnode_links_check_suppliers(dev->fwnode); device_unlock(dev); return sysfs_emit(buf, "%u\n", val); } static DEVICE_ATTR_RO(waiting_for_supplier); /** * device_links_force_bind - Prepares device to be force bound * @dev: Consumer device. * * device_bind_driver() force binds a device to a driver without calling any * driver probe functions. So the consumer really isn't going to wait for any * supplier before it's bound to the driver. We still want the device link * states to be sensible when this happens. * * In preparation for device_bind_driver(), this function goes through each * supplier device links and checks if the supplier is bound. If it is, then * the device link status is set to CONSUMER_PROBE. Otherwise, the device link * is dropped. Links without the DL_FLAG_MANAGED flag set are ignored. */ void device_links_force_bind(struct device *dev) { struct device_link *link, *ln; device_links_write_lock(); list_for_each_entry_safe(link, ln, &dev->links.suppliers, c_node) { if (!device_link_test(link, DL_FLAG_MANAGED)) continue; if (link->status != DL_STATE_AVAILABLE) { device_link_drop_managed(link); continue; } WRITE_ONCE(link->status, DL_STATE_CONSUMER_PROBE); } dev->links.status = DL_DEV_PROBING; device_links_write_unlock(); } /** * device_links_driver_bound - Update device links after probing its driver. * @dev: Device to update the links for. * * The probe has been successful, so update links from this device to any * consumers by changing their status to "available". * * Also change the status of @dev's links to suppliers to "active". * * Links without the DL_FLAG_MANAGED flag set are ignored. */ void device_links_driver_bound(struct device *dev) { struct device_link *link, *ln; LIST_HEAD(sync_list); /* * If a device binds successfully, it's expected to have created all * the device links it needs to or make new device links as it needs * them. So, fw_devlink no longer needs to create device links to any * of the device's suppliers. * * Also, if a child firmware node of this bound device is not added as a * device by now, assume it is never going to be added. Make this bound * device the fallback supplier to the dangling consumers of the child * firmware node because this bound device is probably implementing the * child firmware node functionality and we don't want the dangling * consumers to defer probe indefinitely waiting for a device for the * child firmware node. */ if (dev->fwnode && dev->fwnode->dev == dev) { struct fwnode_handle *child; fwnode_links_purge_suppliers(dev->fwnode); guard(mutex)(&fwnode_link_lock); fwnode_for_each_available_child_node(dev->fwnode, child) __fw_devlink_pickup_dangling_consumers(child, dev->fwnode); __fw_devlink_link_to_consumers(dev); } device_remove_file(dev, &dev_attr_waiting_for_supplier); device_links_write_lock(); list_for_each_entry(link, &dev->links.consumers, s_node) { if (!device_link_test(link, DL_FLAG_MANAGED)) continue; /* * Links created during consumer probe may be in the "consumer * probe" state to start with if the supplier is still probing * when they are created and they may become "active" if the * consumer probe returns first. Skip them here. */ if (link->status == DL_STATE_CONSUMER_PROBE || link->status == DL_STATE_ACTIVE) continue; WARN_ON(link->status != DL_STATE_DORMANT); WRITE_ONCE(link->status, DL_STATE_AVAILABLE); if (device_link_test(link, DL_FLAG_AUTOPROBE_CONSUMER)) driver_deferred_probe_add(link->consumer); } if (defer_sync_state_count) __device_links_supplier_defer_sync(dev); else __device_links_queue_sync_state(dev, &sync_list); list_for_each_entry_safe(link, ln, &dev->links.suppliers, c_node) { struct device *supplier; if (!device_link_test(link, DL_FLAG_MANAGED)) continue; supplier = link->supplier; if (device_link_test(link, DL_FLAG_SYNC_STATE_ONLY)) { /* * When DL_FLAG_SYNC_STATE_ONLY is set, it means no * other DL_MANAGED_LINK_FLAGS have been set. So, it's * save to drop the managed link completely. */ device_link_drop_managed(link); } else if (dev_is_best_effort(dev) && device_link_test(link, DL_FLAG_INFERRED) && link->status != DL_STATE_CONSUMER_PROBE && !link->supplier->can_match) { /* * When dev_is_best_effort() is true, we ignore device * links to suppliers that don't have a driver. If the * consumer device still managed to probe, there's no * point in maintaining a device link in a weird state * (consumer probed before supplier). So delete it. */ device_link_drop_managed(link); } else { WARN_ON(link->status != DL_STATE_CONSUMER_PROBE); WRITE_ONCE(link->status, DL_STATE_ACTIVE); } /* * This needs to be done even for the deleted * DL_FLAG_SYNC_STATE_ONLY device link in case it was the last * device link that was preventing the supplier from getting a * sync_state() call. */ if (defer_sync_state_count) __device_links_supplier_defer_sync(supplier); else __device_links_queue_sync_state(supplier, &sync_list); } dev->links.status = DL_DEV_DRIVER_BOUND; device_links_write_unlock(); device_links_flush_sync_list(&sync_list, dev); } /** * __device_links_no_driver - Update links of a device without a driver. * @dev: Device without a drvier. * * Delete all non-persistent links from this device to any suppliers. * * Persistent links stay around, but their status is changed to "available", * unless they already are in the "supplier unbind in progress" state in which * case they need not be updated. * * Links without the DL_FLAG_MANAGED flag set are ignored. */ static void __device_links_no_driver(struct device *dev) { struct device_link *link, *ln; list_for_each_entry_safe_reverse(link, ln, &dev->links.suppliers, c_node) { if (!device_link_test(link, DL_FLAG_MANAGED)) continue; if (device_link_test(link, DL_FLAG_AUTOREMOVE_CONSUMER)) { device_link_drop_managed(link); continue; } if (link->status != DL_STATE_CONSUMER_PROBE && link->status != DL_STATE_ACTIVE) continue; if (link->supplier->links.status == DL_DEV_DRIVER_BOUND) { WRITE_ONCE(link->status, DL_STATE_AVAILABLE); } else { WARN_ON(!device_link_test(link, DL_FLAG_SYNC_STATE_ONLY)); WRITE_ONCE(link->status, DL_STATE_DORMANT); } } dev->links.status = DL_DEV_NO_DRIVER; } /** * device_links_no_driver - Update links after failing driver probe. * @dev: Device whose driver has just failed to probe. * * Clean up leftover links to consumers for @dev and invoke * %__device_links_no_driver() to update links to suppliers for it as * appropriate. * * Links without the DL_FLAG_MANAGED flag set are ignored. */ void device_links_no_driver(struct device *dev) { struct device_link *link; device_links_write_lock(); list_for_each_entry(link, &dev->links.consumers, s_node) { if (!device_link_test(link, DL_FLAG_MANAGED)) continue; /* * The probe has failed, so if the status of the link is * "consumer probe" or "active", it must have been added by * a probing consumer while this device was still probing. * Change its state to "dormant", as it represents a valid * relationship, but it is not functionally meaningful. */ if (link->status == DL_STATE_CONSUMER_PROBE || link->status == DL_STATE_ACTIVE) WRITE_ONCE(link->status, DL_STATE_DORMANT); } __device_links_no_driver(dev); device_links_write_unlock(); } /** * device_links_driver_cleanup - Update links after driver removal. * @dev: Device whose driver has just gone away. * * Update links to consumers for @dev by changing their status to "dormant" and * invoke %__device_links_no_driver() to update links to suppliers for it as * appropriate. * * Links without the DL_FLAG_MANAGED flag set are ignored. */ void device_links_driver_cleanup(struct device *dev) { struct device_link *link, *ln; device_links_write_lock(); list_for_each_entry_safe(link, ln, &dev->links.consumers, s_node) { if (!device_link_test(link, DL_FLAG_MANAGED)) continue; WARN_ON(device_link_test(link, DL_FLAG_AUTOREMOVE_CONSUMER)); WARN_ON(link->status != DL_STATE_SUPPLIER_UNBIND); /* * autoremove the links between this @dev and its consumer * devices that are not active, i.e. where the link state * has moved to DL_STATE_SUPPLIER_UNBIND. */ if (link->status == DL_STATE_SUPPLIER_UNBIND && device_link_test(link, DL_FLAG_AUTOREMOVE_SUPPLIER)) device_link_drop_managed(link); WRITE_ONCE(link->status, DL_STATE_DORMANT); } list_del_init(&dev->links.defer_sync); __device_links_no_driver(dev); device_links_write_unlock(); } /** * device_links_busy - Check if there are any busy links to consumers. * @dev: Device to check. * * Check each consumer of the device and return 'true' if its link's status * is one of "consumer probe" or "active" (meaning that the given consumer is * probing right now or its driver is present). Otherwise, change the link * state to "supplier unbind" to prevent the consumer from being probed * successfully going forward. * * Return 'false' if there are no probing or active consumers. * * Links without the DL_FLAG_MANAGED flag set are ignored. */ bool device_links_busy(struct device *dev) { struct device_link *link; bool ret = false; device_links_write_lock(); list_for_each_entry(link, &dev->links.consumers, s_node) { if (!device_link_test(link, DL_FLAG_MANAGED)) continue; if (link->status == DL_STATE_CONSUMER_PROBE || link->status == DL_STATE_ACTIVE) { ret = true; break; } WRITE_ONCE(link->status, DL_STATE_SUPPLIER_UNBIND); } dev->links.status = DL_DEV_UNBINDING; device_links_write_unlock(); return ret; } /** * device_links_unbind_consumers - Force unbind consumers of the given device. * @dev: Device to unbind the consumers of. * * Walk the list of links to consumers for @dev and if any of them is in the * "consumer probe" state, wait for all device probes in progress to complete * and start over. * * If that's not the case, change the status of the link to "supplier unbind" * and check if the link was in the "active" state. If so, force the consumer * driver to unbind and start over (the consumer will not re-probe as we have * changed the state of the link already). * * Links without the DL_FLAG_MANAGED flag set are ignored. */ void device_links_unbind_consumers(struct device *dev) { struct device_link *link; start: device_links_write_lock(); list_for_each_entry(link, &dev->links.consumers, s_node) { enum device_link_state status; if (!device_link_test(link, DL_FLAG_MANAGED) || device_link_test(link, DL_FLAG_SYNC_STATE_ONLY)) continue; status = link->status; if (status == DL_STATE_CONSUMER_PROBE) { device_links_write_unlock(); wait_for_device_probe(); goto start; } WRITE_ONCE(link->status, DL_STATE_SUPPLIER_UNBIND); if (status == DL_STATE_ACTIVE) { struct device *consumer = link->consumer; get_device(consumer); device_links_write_unlock(); device_release_driver_internal(consumer, NULL, consumer->parent); put_device(consumer); goto start; } } device_links_write_unlock(); } /** * device_links_purge - Delete existing links to other devices. * @dev: Target device. */ static void device_links_purge(struct device *dev) { struct device_link *link, *ln; if (dev->class == &devlink_class) return; /* * Delete all of the remaining links from this device to any other * devices (either consumers or suppliers). */ device_links_write_lock(); list_for_each_entry_safe_reverse(link, ln, &dev->links.suppliers, c_node) { WARN_ON(link->status == DL_STATE_ACTIVE); __device_link_del(&link->kref); } list_for_each_entry_safe_reverse(link, ln, &dev->links.consumers, s_node) { WARN_ON(link->status != DL_STATE_DORMANT && link->status != DL_STATE_NONE); __device_link_del(&link->kref); } device_links_write_unlock(); } #define FW_DEVLINK_FLAGS_PERMISSIVE (DL_FLAG_INFERRED | \ DL_FLAG_SYNC_STATE_ONLY) #define FW_DEVLINK_FLAGS_ON (DL_FLAG_INFERRED | \ DL_FLAG_AUTOPROBE_CONSUMER) #define FW_DEVLINK_FLAGS_RPM (FW_DEVLINK_FLAGS_ON | \ DL_FLAG_PM_RUNTIME) static u32 fw_devlink_flags = FW_DEVLINK_FLAGS_RPM; static int __init fw_devlink_setup(char *arg) { if (!arg) return -EINVAL; if (strcmp(arg, "off") == 0) { fw_devlink_flags = 0; } else if (strcmp(arg, "permissive") == 0) { fw_devlink_flags = FW_DEVLINK_FLAGS_PERMISSIVE; } else if (strcmp(arg, "on") == 0) { fw_devlink_flags = FW_DEVLINK_FLAGS_ON; } else if (strcmp(arg, "rpm") == 0) { fw_devlink_flags = FW_DEVLINK_FLAGS_RPM; } return 0; } early_param("fw_devlink", fw_devlink_setup); static bool fw_devlink_strict; static int __init fw_devlink_strict_setup(char *arg) { return kstrtobool(arg, &fw_devlink_strict); } early_param("fw_devlink.strict", fw_devlink_strict_setup); #define FW_DEVLINK_SYNC_STATE_STRICT 0 #define FW_DEVLINK_SYNC_STATE_TIMEOUT 1 #ifndef CONFIG_FW_DEVLINK_SYNC_STATE_TIMEOUT static int fw_devlink_sync_state; #else static int fw_devlink_sync_state = FW_DEVLINK_SYNC_STATE_TIMEOUT; #endif static int __init fw_devlink_sync_state_setup(char *arg) { if (!arg) return -EINVAL; if (strcmp(arg, "strict") == 0) { fw_devlink_sync_state = FW_DEVLINK_SYNC_STATE_STRICT; return 0; } else if (strcmp(arg, "timeout") == 0) { fw_devlink_sync_state = FW_DEVLINK_SYNC_STATE_TIMEOUT; return 0; } return -EINVAL; } early_param("fw_devlink.sync_state", fw_devlink_sync_state_setup); static inline u32 fw_devlink_get_flags(u8 fwlink_flags) { if (fwlink_flags & FWLINK_FLAG_CYCLE) return FW_DEVLINK_FLAGS_PERMISSIVE | DL_FLAG_CYCLE; return fw_devlink_flags; } static bool fw_devlink_is_permissive(void) { return fw_devlink_flags == FW_DEVLINK_FLAGS_PERMISSIVE; } bool fw_devlink_is_strict(void) { return fw_devlink_strict && !fw_devlink_is_permissive(); } static void fw_devlink_parse_fwnode(struct fwnode_handle *fwnode) { if (fwnode->flags & FWNODE_FLAG_LINKS_ADDED) return; fwnode_call_int_op(fwnode, add_links); fwnode->flags |= FWNODE_FLAG_LINKS_ADDED; } static void fw_devlink_parse_fwtree(struct fwnode_handle *fwnode) { struct fwnode_handle *child = NULL; fw_devlink_parse_fwnode(fwnode); while ((child = fwnode_get_next_available_child_node(fwnode, child))) fw_devlink_parse_fwtree(child); } static void fw_devlink_relax_link(struct device_link *link) { if (!device_link_test(link, DL_FLAG_INFERRED)) return; if (device_link_flag_is_sync_state_only(link->flags)) return; pm_runtime_drop_link(link); link->flags = DL_FLAG_MANAGED | FW_DEVLINK_FLAGS_PERMISSIVE; dev_dbg(link->consumer, "Relaxing link with %s\n", dev_name(link->supplier)); } static int fw_devlink_no_driver(struct device *dev, void *data) { struct device_link *link = to_devlink(dev); if (!link->supplier->can_match) fw_devlink_relax_link(link); return 0; } void fw_devlink_drivers_done(void) { fw_devlink_drv_reg_done = true; device_links_write_lock(); class_for_each_device(&devlink_class, NULL, NULL, fw_devlink_no_driver); device_links_write_unlock(); } static int fw_devlink_dev_sync_state(struct device *dev, void *data) { struct device_link *link = to_devlink(dev); struct device *sup = link->supplier; if (!device_link_test(link, DL_FLAG_MANAGED) || link->status == DL_STATE_ACTIVE || sup->state_synced || !dev_has_sync_state(sup)) return 0; if (fw_devlink_sync_state == FW_DEVLINK_SYNC_STATE_STRICT) { dev_info(sup, "sync_state() pending due to %s\n", dev_name(link->consumer)); return 0; } if (!list_empty(&sup->links.defer_sync)) return 0; dev_warn(sup, "Timed out. Forcing sync_state()\n"); sup->state_synced = true; get_device(sup); list_add_tail(&sup->links.defer_sync, data); return 0; } void fw_devlink_probing_done(void) { LIST_HEAD(sync_list); device_links_write_lock(); class_for_each_device(&devlink_class, NULL, &sync_list, fw_devlink_dev_sync_state); device_links_write_unlock(); device_links_flush_sync_list(&sync_list, NULL); } /** * wait_for_init_devices_probe - Try to probe any device needed for init * * Some devices might need to be probed and bound successfully before the kernel * boot sequence can finish and move on to init/userspace. For example, a * network interface might need to be bound to be able to mount a NFS rootfs. * * With fw_devlink=on by default, some of these devices might be blocked from * probing because they are waiting on a optional supplier that doesn't have a * driver. While fw_devlink will eventually identify such devices and unblock * the probing automatically, it might be too late by the time it unblocks the * probing of devices. For example, the IP4 autoconfig might timeout before * fw_devlink unblocks probing of the network interface. * * This function is available to temporarily try and probe all devices that have * a driver even if some of their suppliers haven't been added or don't have * drivers. * * The drivers can then decide which of the suppliers are optional vs mandatory * and probe the device if possible. By the time this function returns, all such * "best effort" probes are guaranteed to be completed. If a device successfully * probes in this mode, we delete all fw_devlink discovered dependencies of that * device where the supplier hasn't yet probed successfully because they have to * be optional dependencies. * * Any devices that didn't successfully probe go back to being treated as if * this function was never called. * * This also means that some devices that aren't needed for init and could have * waited for their optional supplier to probe (when the supplier's module is * loaded later on) would end up probing prematurely with limited functionality. * So call this function only when boot would fail without it. */ void __init wait_for_init_devices_probe(void) { if (!fw_devlink_flags || fw_devlink_is_permissive()) return; /* * Wait for all ongoing probes to finish so that the "best effort" is * only applied to devices that can't probe otherwise. */ wait_for_device_probe(); pr_info("Trying to probe devices needed for running init ...\n"); fw_devlink_best_effort = true; driver_deferred_probe_trigger(); /* * Wait for all "best effort" probes to finish before going back to * normal enforcement. */ wait_for_device_probe(); fw_devlink_best_effort = false; } static void fw_devlink_unblock_consumers(struct device *dev) { struct device_link *link; if (!fw_devlink_flags || fw_devlink_is_permissive()) return; device_links_write_lock(); list_for_each_entry(link, &dev->links.consumers, s_node) fw_devlink_relax_link(link); device_links_write_unlock(); } static bool fwnode_init_without_drv(struct fwnode_handle *fwnode) { struct device *dev; bool ret; if (!(fwnode->flags & FWNODE_FLAG_INITIALIZED)) return false; dev = get_dev_from_fwnode(fwnode); ret = !dev || dev->links.status == DL_DEV_NO_DRIVER; put_device(dev); return ret; } static bool fwnode_ancestor_init_without_drv(struct fwnode_handle *fwnode) { struct fwnode_handle *parent; fwnode_for_each_parent_node(fwnode, parent) { if (fwnode_init_without_drv(parent)) { fwnode_handle_put(parent); return true; } } return false; } /** * fwnode_is_ancestor_of - Test if @ancestor is ancestor of @child * @ancestor: Firmware which is tested for being an ancestor * @child: Firmware which is tested for being the child * * A node is considered an ancestor of itself too. * * Return: true if @ancestor is an ancestor of @child. Otherwise, returns false. */ static bool fwnode_is_ancestor_of(const struct fwnode_handle *ancestor, const struct fwnode_handle *child) { struct fwnode_handle *parent; if (IS_ERR_OR_NULL(ancestor)) return false; if (child == ancestor) return true; fwnode_for_each_parent_node(child, parent) { if (parent == ancestor) { fwnode_handle_put(parent); return true; } } return false; } /** * fwnode_get_next_parent_dev - Find device of closest ancestor fwnode * @fwnode: firmware node * * Given a firmware node (@fwnode), this function finds its closest ancestor * firmware node that has a corresponding struct device and returns that struct * device. * * The caller is responsible for calling put_device() on the returned device * pointer. * * Return: a pointer to the device of the @fwnode's closest ancestor. */ static struct device *fwnode_get_next_parent_dev(const struct fwnode_handle *fwnode) { struct fwnode_handle *parent; struct device *dev; fwnode_for_each_parent_node(fwnode, parent) { dev = get_dev_from_fwnode(parent); if (dev) { fwnode_handle_put(parent); return dev; } } return NULL; } /** * __fw_devlink_relax_cycles - Relax and mark dependency cycles. * @con_handle: Potential consumer device fwnode. * @sup_handle: Potential supplier's fwnode. * * Needs to be called with fwnode_lock and device link lock held. * * Check if @sup_handle or any of its ancestors or suppliers direct/indirectly * depend on @con. This function can detect multiple cyles between @sup_handle * and @con. When such dependency cycles are found, convert all device links * created solely by fw_devlink into SYNC_STATE_ONLY device links. Also, mark * all fwnode links in the cycle with FWLINK_FLAG_CYCLE so that when they are * converted into a device link in the future, they are created as * SYNC_STATE_ONLY device links. This is the equivalent of doing * fw_devlink=permissive just between the devices in the cycle. We need to do * this because, at this point, fw_devlink can't tell which of these * dependencies is not a real dependency. * * Return true if one or more cycles were found. Otherwise, return false. */ static bool __fw_devlink_relax_cycles(struct fwnode_handle *con_handle, struct fwnode_handle *sup_handle) { struct device *sup_dev = NULL, *par_dev = NULL, *con_dev = NULL; struct fwnode_link *link; struct device_link *dev_link; bool ret = false; if (!sup_handle) return false; /* * We aren't trying to find all cycles. Just a cycle between con and * sup_handle. */ if (sup_handle->flags & FWNODE_FLAG_VISITED) return false; sup_handle->flags |= FWNODE_FLAG_VISITED; /* Termination condition. */ if (sup_handle == con_handle) { pr_debug("----- cycle: start -----\n"); ret = true; goto out; } sup_dev = get_dev_from_fwnode(sup_handle); con_dev = get_dev_from_fwnode(con_handle); /* * If sup_dev is bound to a driver and @con hasn't started binding to a * driver, sup_dev can't be a consumer of @con. So, no need to check * further. */ if (sup_dev && sup_dev->links.status == DL_DEV_DRIVER_BOUND && con_dev && con_dev->links.status == DL_DEV_NO_DRIVER) { ret = false; goto out; } list_for_each_entry(link, &sup_handle->suppliers, c_hook) { if (link->flags & FWLINK_FLAG_IGNORE) continue; if (__fw_devlink_relax_cycles(con_handle, link->supplier)) { __fwnode_link_cycle(link); ret = true; } } /* * Give priority to device parent over fwnode parent to account for any * quirks in how fwnodes are converted to devices. */ if (sup_dev) par_dev = get_device(sup_dev->parent); else par_dev = fwnode_get_next_parent_dev(sup_handle); if (par_dev && __fw_devlink_relax_cycles(con_handle, par_dev->fwnode)) { pr_debug("%pfwf: cycle: child of %pfwf\n", sup_handle, par_dev->fwnode); ret = true; } if (!sup_dev) goto out; list_for_each_entry(dev_link, &sup_dev->links.suppliers, c_node) { /* * Ignore a SYNC_STATE_ONLY flag only if it wasn't marked as * such due to a cycle. */ if (device_link_flag_is_sync_state_only(dev_link->flags) && !device_link_test(dev_link, DL_FLAG_CYCLE)) continue; if (__fw_devlink_relax_cycles(con_handle, dev_link->supplier->fwnode)) { pr_debug("%pfwf: cycle: depends on %pfwf\n", sup_handle, dev_link->supplier->fwnode); fw_devlink_relax_link(dev_link); dev_link->flags |= DL_FLAG_CYCLE; ret = true; } } out: sup_handle->flags &= ~FWNODE_FLAG_VISITED; put_device(sup_dev); put_device(con_dev); put_device(par_dev); return ret; } /** * fw_devlink_create_devlink - Create a device link from a consumer to fwnode * @con: consumer device for the device link * @sup_handle: fwnode handle of supplier * @link: fwnode link that's being converted to a device link * * This function will try to create a device link between the consumer device * @con and the supplier device represented by @sup_handle. * * The supplier has to be provided as a fwnode because incorrect cycles in * fwnode links can sometimes cause the supplier device to never be created. * This function detects such cases and returns an error if it cannot create a * device link from the consumer to a missing supplier. * * Returns, * 0 on successfully creating a device link * -EINVAL if the device link cannot be created as expected * -EAGAIN if the device link cannot be created right now, but it may be * possible to do that in the future */ static int fw_devlink_create_devlink(struct device *con, struct fwnode_handle *sup_handle, struct fwnode_link *link) { struct device *sup_dev; int ret = 0; u32 flags; if (link->flags & FWLINK_FLAG_IGNORE) return 0; /* * In some cases, a device P might also be a supplier to its child node * C. However, this would defer the probe of C until the probe of P * completes successfully. This is perfectly fine in the device driver * model. device_add() doesn't guarantee probe completion of the device * by the time it returns. * * However, there are a few drivers that assume C will finish probing * as soon as it's added and before P finishes probing. So, we provide * a flag to let fw_devlink know not to delay the probe of C until the * probe of P completes successfully. * * When such a flag is set, we can't create device links where P is the * supplier of C as that would delay the probe of C. */ if (sup_handle->flags & FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD && fwnode_is_ancestor_of(sup_handle, con->fwnode)) return -EINVAL; /* * Don't try to optimize by not calling the cycle detection logic under * certain conditions. There's always some corner case that won't get * detected. */ device_links_write_lock(); if (__fw_devlink_relax_cycles(link->consumer, sup_handle)) { __fwnode_link_cycle(link); pr_debug("----- cycle: end -----\n"); pr_info("%pfwf: Fixed dependency cycle(s) with %pfwf\n", link->consumer, sup_handle); } device_links_write_unlock(); if (con->fwnode == link->consumer) flags = fw_devlink_get_flags(link->flags); else flags = FW_DEVLINK_FLAGS_PERMISSIVE; if (sup_handle->flags & FWNODE_FLAG_NOT_DEVICE) sup_dev = fwnode_get_next_parent_dev(sup_handle); else sup_dev = get_dev_from_fwnode(sup_handle); if (sup_dev) { /* * If it's one of those drivers that don't actually bind to * their device using driver core, then don't wait on this * supplier device indefinitely. */ if (sup_dev->links.status == DL_DEV_NO_DRIVER && sup_handle->flags & FWNODE_FLAG_INITIALIZED) { dev_dbg(con, "Not linking %pfwf - dev might never probe\n", sup_handle); ret = -EINVAL; goto out; } if (con != sup_dev && !device_link_add(con, sup_dev, flags)) { dev_err(con, "Failed to create device link (0x%x) with supplier %s for %pfwf\n", flags, dev_name(sup_dev), link->consumer); ret = -EINVAL; } goto out; } /* * Supplier or supplier's ancestor already initialized without a struct * device or being probed by a driver. */ if (fwnode_init_without_drv(sup_handle) || fwnode_ancestor_init_without_drv(sup_handle)) { dev_dbg(con, "Not linking %pfwf - might never become dev\n", sup_handle); return -EINVAL; } ret = -EAGAIN; out: put_device(sup_dev); return ret; } /** * __fw_devlink_link_to_consumers - Create device links to consumers of a device * @dev: Device that needs to be linked to its consumers * * This function looks at all the consumer fwnodes of @dev and creates device * links between the consumer device and @dev (supplier). * * If the consumer device has not been added yet, then this function creates a * SYNC_STATE_ONLY link between @dev (supplier) and the closest ancestor device * of the consumer fwnode. This is necessary to make sure @dev doesn't get a * sync_state() callback before the real consumer device gets to be added and * then probed. * * Once device links are created from the real consumer to @dev (supplier), the * fwnode links are deleted. */ static void __fw_devlink_link_to_consumers(struct device *dev) { struct fwnode_handle *fwnode = dev->fwnode; struct fwnode_link *link, *tmp; list_for_each_entry_safe(link, tmp, &fwnode->consumers, s_hook) { struct device *con_dev; bool own_link = true; int ret; con_dev = get_dev_from_fwnode(link->consumer); /* * If consumer device is not available yet, make a "proxy" * SYNC_STATE_ONLY link from the consumer's parent device to * the supplier device. This is necessary to make sure the * supplier doesn't get a sync_state() callback before the real * consumer can create a device link to the supplier. * * This proxy link step is needed to handle the case where the * consumer's parent device is added before the supplier. */ if (!con_dev) { con_dev = fwnode_get_next_parent_dev(link->consumer); /* * However, if the consumer's parent device is also the * parent of the supplier, don't create a * consumer-supplier link from the parent to its child * device. Such a dependency is impossible. */ if (con_dev && fwnode_is_ancestor_of(con_dev->fwnode, fwnode)) { put_device(con_dev); con_dev = NULL; } else { own_link = false; } } if (!con_dev) continue; ret = fw_devlink_create_devlink(con_dev, fwnode, link); put_device(con_dev); if (!own_link || ret == -EAGAIN) continue; __fwnode_link_del(link); } } /** * __fw_devlink_link_to_suppliers - Create device links to suppliers of a device * @dev: The consumer device that needs to be linked to its suppliers * @fwnode: Root of the fwnode tree that is used to create device links * * This function looks at all the supplier fwnodes of fwnode tree rooted at * @fwnode and creates device links between @dev (consumer) and all the * supplier devices of the entire fwnode tree at @fwnode. * * The function creates normal (non-SYNC_STATE_ONLY) device links between @dev * and the real suppliers of @dev. Once these device links are created, the * fwnode links are deleted. * * In addition, it also looks at all the suppliers of the entire fwnode tree * because some of the child devices of @dev that have not been added yet * (because @dev hasn't probed) might already have their suppliers added to * driver core. So, this function creates SYNC_STATE_ONLY device links between * @dev (consumer) and these suppliers to make sure they don't execute their * sync_state() callbacks before these child devices have a chance to create * their device links. The fwnode links that correspond to the child devices * aren't delete because they are needed later to create the device links * between the real consumer and supplier devices. */ static void __fw_devlink_link_to_suppliers(struct device *dev, struct fwnode_handle *fwnode) { bool own_link = (dev->fwnode == fwnode); struct fwnode_link *link, *tmp; struct fwnode_handle *child = NULL; list_for_each_entry_safe(link, tmp, &fwnode->suppliers, c_hook) { int ret; struct fwnode_handle *sup = link->supplier; ret = fw_devlink_create_devlink(dev, sup, link); if (!own_link || ret == -EAGAIN) continue; __fwnode_link_del(link); } /* * Make "proxy" SYNC_STATE_ONLY device links to represent the needs of * all the descendants. This proxy link step is needed to handle the * case where the supplier is added before the consumer's parent device * (@dev). */ while ((child = fwnode_get_next_available_child_node(fwnode, child))) __fw_devlink_link_to_suppliers(dev, child); } static void fw_devlink_link_device(struct device *dev) { struct fwnode_handle *fwnode = dev->fwnode; if (!fw_devlink_flags) return; fw_devlink_parse_fwtree(fwnode); guard(mutex)(&fwnode_link_lock); __fw_devlink_link_to_consumers(dev); __fw_devlink_link_to_suppliers(dev, fwnode); } /* Device links support end. */ static struct kobject *dev_kobj; /* /sys/dev/char */ static struct kobject *sysfs_dev_char_kobj; /* /sys/dev/block */ static struct kobject *sysfs_dev_block_kobj; static DEFINE_MUTEX(device_hotplug_lock); void lock_device_hotplug(void) { mutex_lock(&device_hotplug_lock); } void unlock_device_hotplug(void) { mutex_unlock(&device_hotplug_lock); } int lock_device_hotplug_sysfs(void) { if (mutex_trylock(&device_hotplug_lock)) return 0; /* Avoid busy looping (5 ms of sleep should do). */ msleep(5); return restart_syscall(); } #ifdef CONFIG_BLOCK static inline int device_is_not_partition(struct device *dev) { return !(dev->type == &part_type); } #else static inline int device_is_not_partition(struct device *dev) { return 1; } #endif static void device_platform_notify(struct device *dev) { acpi_device_notify(dev); software_node_notify(dev); } static void device_platform_notify_remove(struct device *dev) { software_node_notify_remove(dev); acpi_device_notify_remove(dev); } /** * dev_driver_string - Return a device's driver name, if at all possible * @dev: struct device to get the name of * * Will return the device's driver's name if it is bound to a device. If * the device is not bound to a driver, it will return the name of the bus * it is attached to. If it is not attached to a bus either, an empty * string will be returned. */ const char *dev_driver_string(const struct device *dev) { struct device_driver *drv; /* dev->driver can change to NULL underneath us because of unbinding, * so be careful about accessing it. dev->bus and dev->class should * never change once they are set, so they don't need special care. */ drv = READ_ONCE(dev->driver); return drv ? drv->name : dev_bus_name(dev); } EXPORT_SYMBOL(dev_driver_string); #define to_dev_attr(_attr) container_of(_attr, struct device_attribute, attr) static ssize_t dev_attr_show(struct kobject *kobj, struct attribute *attr, char *buf) { struct device_attribute *dev_attr = to_dev_attr(attr); struct device *dev = kobj_to_dev(kobj); ssize_t ret = -EIO; if (dev_attr->show) ret = dev_attr->show(dev, dev_attr, buf); if (ret >= (ssize_t)PAGE_SIZE) { printk("dev_attr_show: %pS returned bad count\n", dev_attr->show); } return ret; } static ssize_t dev_attr_store(struct kobject *kobj, struct attribute *attr, const char *buf, size_t count) { struct device_attribute *dev_attr = to_dev_attr(attr); struct device *dev = kobj_to_dev(kobj); ssize_t ret = -EIO; if (dev_attr->store) ret = dev_attr->store(dev, dev_attr, buf, count); return ret; } static const struct sysfs_ops dev_sysfs_ops = { .show = dev_attr_show, .store = dev_attr_store, }; #define to_ext_attr(x) container_of(x, struct dev_ext_attribute, attr) ssize_t device_store_ulong(struct device *dev, struct device_attribute *attr, const char *buf, size_t size) { struct dev_ext_attribute *ea = to_ext_attr(attr); int ret; unsigned long new; ret = kstrtoul(buf, 0, &new); if (ret) return ret; *(unsigned long *)(ea->var) = new; /* Always return full write size even if we didn't consume all */ return size; } EXPORT_SYMBOL_GPL(device_store_ulong); ssize_t device_show_ulong(struct device *dev, struct device_attribute *attr, char *buf) { struct dev_ext_attribute *ea = to_ext_attr(attr); return sysfs_emit(buf, "%lx\n", *(unsigned long *)(ea->var)); } EXPORT_SYMBOL_GPL(device_show_ulong); ssize_t device_store_int(struct device *dev, struct device_attribute *attr, const char *buf, size_t size) { struct dev_ext_attribute *ea = to_ext_attr(attr); int ret; long new; ret = kstrtol(buf, 0, &new); if (ret) return ret; if (new > INT_MAX || new < INT_MIN) return -EINVAL; *(int *)(ea->var) = new; /* Always return full write size even if we didn't consume all */ return size; } EXPORT_SYMBOL_GPL(device_store_int); ssize_t device_show_int(struct device *dev, struct device_attribute *attr, char *buf) { struct dev_ext_attribute *ea = to_ext_attr(attr); return sysfs_emit(buf, "%d\n", *(int *)(ea->var)); } EXPORT_SYMBOL_GPL(device_show_int); ssize_t device_store_bool(struct device *dev, struct device_attribute *attr, const char *buf, size_t size) { struct dev_ext_attribute *ea = to_ext_attr(attr); if (kstrtobool(buf, ea->var) < 0) return -EINVAL; return size; } EXPORT_SYMBOL_GPL(device_store_bool); ssize_t device_show_bool(struct device *dev, struct device_attribute *attr, char *buf) { struct dev_ext_attribute *ea = to_ext_attr(attr); return sysfs_emit(buf, "%d\n", *(bool *)(ea->var)); } EXPORT_SYMBOL_GPL(device_show_bool); ssize_t device_show_string(struct device *dev, struct device_attribute *attr, char *buf) { struct dev_ext_attribute *ea = to_ext_attr(attr); return sysfs_emit(buf, "%s\n", (char *)ea->var); } EXPORT_SYMBOL_GPL(device_show_string); /** * device_release - free device structure. * @kobj: device's kobject. * * This is called once the reference count for the object * reaches 0. We forward the call to the device's release * method, which should handle actually freeing the structure. */ static void device_release(struct kobject *kobj) { struct device *dev = kobj_to_dev(kobj); struct device_private *p = dev->p; /* * Some platform devices are driven without driver attached * and managed resources may have been acquired. Make sure * all resources are released. * * Drivers still can add resources into device after device * is deleted but alive, so release devres here to avoid * possible memory leak. */ devres_release_all(dev); kfree(dev->dma_range_map); if (dev->release) dev->release(dev); else if (dev->type && dev->type->release) dev->type->release(dev); else if (dev->class && dev->class->dev_release) dev->class->dev_release(dev); else WARN(1, KERN_ERR "Device '%s' does not have a release() function, it is broken and must be fixed. See Documentation/core-api/kobject.rst.\n", dev_name(dev)); kfree(p); } static const void *device_namespace(const struct kobject *kobj) { const struct device *dev = kobj_to_dev(kobj); const void *ns = NULL; if (dev->class && dev->class->namespace) ns = dev->class->namespace(dev); return ns; } static void device_get_ownership(const struct kobject *kobj, kuid_t *uid, kgid_t *gid) { const struct device *dev = kobj_to_dev(kobj); if (dev->class && dev->class->get_ownership) dev->class->get_ownership(dev, uid, gid); } static const struct kobj_type device_ktype = { .release = device_release, .sysfs_ops = &dev_sysfs_ops, .namespace = device_namespace, .get_ownership = device_get_ownership, }; static int dev_uevent_filter(const struct kobject *kobj) { const struct kobj_type *ktype = get_ktype(kobj); if (ktype == &device_ktype) { const struct device *dev = kobj_to_dev(kobj); if (dev->bus) return 1; if (dev->class) return 1; } return 0; } static const char *dev_uevent_name(const struct kobject *kobj) { const struct device *dev = kobj_to_dev(kobj); if (dev->bus) return dev->bus->name; if (dev->class) return dev->class->name; return NULL; } /* * Try filling "DRIVER=<name>" uevent variable for a device. Because this * function may race with binding and unbinding the device from a driver, * we need to be careful. Binding is generally safe, at worst we miss the * fact that the device is already bound to a driver (but the driver * information that is delivered through uevents is best-effort, it may * become obsolete as soon as it is generated anyways). Unbinding is more * risky as driver pointer is transitioning to NULL, so READ_ONCE() should * be used to make sure we are dealing with the same pointer, and to * ensure that driver structure is not going to disappear from under us * we take bus' drivers klist lock. The assumption that only registered * driver can be bound to a device, and to unregister a driver bus code * will take the same lock. */ static void dev_driver_uevent(const struct device *dev, struct kobj_uevent_env *env) { struct subsys_private *sp = bus_to_subsys(dev->bus); if (sp) { scoped_guard(spinlock, &sp->klist_drivers.k_lock) { struct device_driver *drv = READ_ONCE(dev->driver); if (drv) add_uevent_var(env, "DRIVER=%s", drv->name); } subsys_put(sp); } } static int dev_uevent(const struct kobject *kobj, struct kobj_uevent_env *env) { const struct device *dev = kobj_to_dev(kobj); int retval = 0; /* add device node properties if present */ if (MAJOR(dev->devt)) { const char *tmp; const char *name; umode_t mode = 0; kuid_t uid = GLOBAL_ROOT_UID; kgid_t gid = GLOBAL_ROOT_GID; add_uevent_var(env, "MAJOR=%u", MAJOR(dev->devt)); add_uevent_var(env, "MINOR=%u", MINOR(dev->devt)); name = device_get_devnode(dev, &mode, &uid, &gid, &tmp); if (name) { add_uevent_var(env, "DEVNAME=%s", name); if (mode) add_uevent_var(env, "DEVMODE=%#o", mode & 0777); if (!uid_eq(uid, GLOBAL_ROOT_UID)) add_uevent_var(env, "DEVUID=%u", from_kuid(&init_user_ns, uid)); if (!gid_eq(gid, GLOBAL_ROOT_GID)) add_uevent_var(env, "DEVGID=%u", from_kgid(&init_user_ns, gid)); kfree(tmp); } } if (dev->type && dev->type->name) add_uevent_var(env, "DEVTYPE=%s", dev->type->name); /* Add "DRIVER=%s" variable if the device is bound to a driver */ dev_driver_uevent(dev, env); /* Add common DT information about the device */ of_device_uevent(dev, env); /* have the bus specific function add its stuff */ if (dev->bus && dev->bus->uevent) { retval = dev->bus->uevent(dev, env); if (retval) pr_debug("device: '%s': %s: bus uevent() returned %d\n", dev_name(dev), __func__, retval); } /* have the class specific function add its stuff */ if (dev->class && dev->class->dev_uevent) { retval = dev->class->dev_uevent(dev, env); if (retval) pr_debug("device: '%s': %s: class uevent() " "returned %d\n", dev_name(dev), __func__, retval); } /* have the device type specific function add its stuff */ if (dev->type && dev->type->uevent) { retval = dev->type->uevent(dev, env); if (retval) pr_debug("device: '%s': %s: dev_type uevent() " "returned %d\n", dev_name(dev), __func__, retval); } return retval; } static const struct kset_uevent_ops device_uevent_ops = { .filter = dev_uevent_filter, .name = dev_uevent_name, .uevent = dev_uevent, }; static ssize_t uevent_show(struct device *dev, struct device_attribute *attr, char *buf) { struct kobject *top_kobj; struct kset *kset; struct kobj_uevent_env *env = NULL; int i; int len = 0; int retval; /* search the kset, the device belongs to */ top_kobj = &dev->kobj; while (!top_kobj->kset && top_kobj->parent) top_kobj = top_kobj->parent; if (!top_kobj->kset) goto out; kset = top_kobj->kset; if (!kset->uevent_ops || !kset->uevent_ops->uevent) goto out; /* respect filter */ if (kset->uevent_ops && kset->uevent_ops->filter) if (!kset->uevent_ops->filter(&dev->kobj)) goto out; env = kzalloc(sizeof(struct kobj_uevent_env), GFP_KERNEL); if (!env) return -ENOMEM; /* let the kset specific function add its keys */ retval = kset->uevent_ops->uevent(&dev->kobj, env); if (retval) goto out; /* copy keys to file */ for (i = 0; i < env->envp_idx; i++) len += sysfs_emit_at(buf, len, "%s\n", env->envp[i]); out: kfree(env); return len; } static ssize_t uevent_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) { int rc; rc = kobject_synth_uevent(&dev->kobj, buf, count); if (rc) { dev_err(dev, "uevent: failed to send synthetic uevent: %d\n", rc); return rc; } return count; } static DEVICE_ATTR_RW(uevent); static ssize_t online_show(struct device *dev, struct device_attribute *attr, char *buf) { bool val; device_lock(dev); val = !dev->offline; device_unlock(dev); return sysfs_emit(buf, "%u\n", val); } static ssize_t online_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) { bool val; int ret; ret = kstrtobool(buf, &val); if (ret < 0) return ret; ret = lock_device_hotplug_sysfs(); if (ret) return ret; ret = val ? device_online(dev) : device_offline(dev); unlock_device_hotplug(); return ret < 0 ? ret : count; } static DEVICE_ATTR_RW(online); static ssize_t removable_show(struct device *dev, struct device_attribute *attr, char *buf) { const char *loc; switch (dev->removable) { case DEVICE_REMOVABLE: loc = "removable"; break; case DEVICE_FIXED: loc = "fixed"; break; default: loc = "unknown"; } return sysfs_emit(buf, "%s\n", loc); } static DEVICE_ATTR_RO(removable); int device_add_groups(struct device *dev, const struct attribute_group **groups) { return sysfs_create_groups(&dev->kobj, groups); } EXPORT_SYMBOL_GPL(device_add_groups); void device_remove_groups(struct device *dev, const struct attribute_group **groups) { sysfs_remove_groups(&dev->kobj, groups); } EXPORT_SYMBOL_GPL(device_remove_groups); union device_attr_group_devres { const struct attribute_group *group; const struct attribute_group **groups; }; static void devm_attr_group_remove(struct device *dev, void *res) { union device_attr_group_devres *devres = res; const struct attribute_group *group = devres->group; dev_dbg(dev, "%s: removing group %p\n", __func__, group); sysfs_remove_group(&dev->kobj, group); } /** * devm_device_add_group - given a device, create a managed attribute group * @dev: The device to create the group for * @grp: The attribute group to create * * This function creates a group for the first time. It will explicitly * warn and error if any of the attribute files being created already exist. * * Returns 0 on success or error code on failure. */ int devm_device_add_group(struct device *dev, const struct attribute_group *grp) { union device_attr_group_devres *devres; int error; devres = devres_alloc(devm_attr_group_remove, sizeof(*devres), GFP_KERNEL); if (!devres) return -ENOMEM; error = sysfs_create_group(&dev->kobj, grp); if (error) { devres_free(devres); return error; } devres->group = grp; devres_add(dev, devres); return 0; } EXPORT_SYMBOL_GPL(devm_device_add_group); static int device_add_attrs(struct device *dev) { const struct class *class = dev->class; const struct device_type *type = dev->type; int error; if (class) { error = device_add_groups(dev, class->dev_groups); if (error) return error; } if (type) { error = device_add_groups(dev, type->groups); if (error) goto err_remove_class_groups; } error = device_add_groups(dev, dev->groups); if (error) goto err_remove_type_groups; if (device_supports_offline(dev) && !dev->offline_disabled) { error = device_create_file(dev, &dev_attr_online); if (error) goto err_remove_dev_groups; } if (fw_devlink_flags && !fw_devlink_is_permissive() && dev->fwnode) { error = device_create_file(dev, &dev_attr_waiting_for_supplier); if (error) goto err_remove_dev_online; } if (dev_removable_is_valid(dev)) { error = device_create_file(dev, &dev_attr_removable); if (error) goto err_remove_dev_waiting_for_supplier; } if (dev_add_physical_location(dev)) { error = device_add_group(dev, &dev_attr_physical_location_group); if (error) goto err_remove_dev_removable; } return 0; err_remove_dev_removable: device_remove_file(dev, &dev_attr_removable); err_remove_dev_waiting_for_supplier: device_remove_file(dev, &dev_attr_waiting_for_supplier); err_remove_dev_online: device_remove_file(dev, &dev_attr_online); err_remove_dev_groups: device_remove_groups(dev, dev->groups); err_remove_type_groups: if (type) device_remove_groups(dev, type->groups); err_remove_class_groups: if (class) device_remove_groups(dev, class->dev_groups); return error; } static void device_remove_attrs(struct device *dev) { const struct class *class = dev->class; const struct device_type *type = dev->type; if (dev->physical_location) { device_remove_group(dev, &dev_attr_physical_location_group); kfree(dev->physical_location); } device_remove_file(dev, &dev_attr_removable); device_remove_file(dev, &dev_attr_waiting_for_supplier); device_remove_file(dev, &dev_attr_online); device_remove_groups(dev, dev->groups); if (type) device_remove_groups(dev, type->groups); if (class) device_remove_groups(dev, class->dev_groups); } static ssize_t dev_show(struct device *dev, struct device_attribute *attr, char *buf) { return print_dev_t(buf, dev->devt); } static DEVICE_ATTR_RO(dev); /* /sys/devices/ */ struct kset *devices_kset; /** * devices_kset_move_before - Move device in the devices_kset's list. * @deva: Device to move. * @devb: Device @deva should come before. */ static void devices_kset_move_before(struct device *deva, struct device *devb) { if (!devices_kset) return; pr_debug("devices_kset: Moving %s before %s\n", dev_name(deva), dev_name(devb)); spin_lock(&devices_kset->list_lock); list_move_tail(&deva->kobj.entry, &devb->kobj.entry); spin_unlock(&devices_kset->list_lock); } /** * devices_kset_move_after - Move device in the devices_kset's list. * @deva: Device to move * @devb: Device @deva should come after. */ static void devices_kset_move_after(struct device *deva, struct device *devb) { if (!devices_kset) return; pr_debug("devices_kset: Moving %s after %s\n", dev_name(deva), dev_name(devb)); spin_lock(&devices_kset->list_lock); list_move(&deva->kobj.entry, &devb->kobj.entry); spin_unlock(&devices_kset->list_lock); } /** * devices_kset_move_last - move the device to the end of devices_kset's list. * @dev: device to move */ void devices_kset_move_last(struct device *dev) { if (!devices_kset) return; pr_debug("devices_kset: Moving %s to end of list\n", dev_name(dev)); spin_lock(&devices_kset->list_lock); list_move_tail(&dev->kobj.entry, &devices_kset->list); spin_unlock(&devices_kset->list_lock); } /** * device_create_file - create sysfs attribute file for device. * @dev: device. * @attr: device attribute descriptor. */ int device_create_file(struct device *dev, const struct device_attribute *attr) { int error = 0; if (dev) { WARN(((attr->attr.mode & S_IWUGO) && !attr->store), "Attribute %s: write permission without 'store'\n", attr->attr.name); WARN(((attr->attr.mode & S_IRUGO) && !attr->show), "Attribute %s: read permission without 'show'\n", attr->attr.name); error = sysfs_create_file(&dev->kobj, &attr->attr); } return error; } EXPORT_SYMBOL_GPL(device_create_file); /** * device_remove_file - remove sysfs attribute file. * @dev: device. * @attr: device attribute descriptor. */ void device_remove_file(struct device *dev, const struct device_attribute *attr) { if (dev) sysfs_remove_file(&dev->kobj, &attr->attr); } EXPORT_SYMBOL_GPL(device_remove_file); /** * device_remove_file_self - remove sysfs attribute file from its own method. * @dev: device. * @attr: device attribute descriptor. * * See kernfs_remove_self() for details. */ bool device_remove_file_self(struct device *dev, const struct device_attribute *attr) { if (dev) return sysfs_remove_file_self(&dev->kobj, &attr->attr); else return false; } EXPORT_SYMBOL_GPL(device_remove_file_self); /** * device_create_bin_file - create sysfs binary attribute file for device. * @dev: device. * @attr: device binary attribute descriptor. */ int device_create_bin_file(struct device *dev, const struct bin_attribute *attr) { int error = -EINVAL; if (dev) error = sysfs_create_bin_file(&dev->kobj, attr); return error; } EXPORT_SYMBOL_GPL(device_create_bin_file); /** * device_remove_bin_file - remove sysfs binary attribute file * @dev: device. * @attr: device binary attribute descriptor. */ void device_remove_bin_file(struct device *dev, const struct bin_attribute *attr) { if (dev) sysfs_remove_bin_file(&dev->kobj, attr); } EXPORT_SYMBOL_GPL(device_remove_bin_file); static void klist_children_get(struct klist_node *n) { struct device_private *p = to_device_private_parent(n); struct device *dev = p->device; get_device(dev); } static void klist_children_put(struct klist_node *n) { struct device_private *p = to_device_private_parent(n); struct device *dev = p->device; put_device(dev); } /** * device_initialize - init device structure. * @dev: device. * * This prepares the device for use by other layers by initializing * its fields. * It is the first half of device_register(), if called by * that function, though it can also be called separately, so one * may use @dev's fields. In particular, get_device()/put_device() * may be used for reference counting of @dev after calling this * function. * * All fields in @dev must be initialized by the caller to 0, except * for those explicitly set to some other value. The simplest * approach is to use kzalloc() to allocate the structure containing * @dev. * * NOTE: Use put_device() to give up your reference instead of freeing * @dev directly once you have called this function. */ void device_initialize(struct device *dev) { dev->kobj.kset = devices_kset; kobject_init(&dev->kobj, &device_ktype); INIT_LIST_HEAD(&dev->dma_pools); mutex_init(&dev->mutex); lockdep_set_novalidate_class(&dev->mutex); spin_lock_init(&dev->devres_lock); INIT_LIST_HEAD(&dev->devres_head); device_pm_init(dev); set_dev_node(dev, NUMA_NO_NODE); INIT_LIST_HEAD(&dev->links.consumers); INIT_LIST_HEAD(&dev->links.suppliers); INIT_LIST_HEAD(&dev->links.defer_sync); dev->links.status = DL_DEV_NO_DRIVER; #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \ defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \ defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) dev->dma_coherent = dma_default_coherent; #endif swiotlb_dev_init(dev); } EXPORT_SYMBOL_GPL(device_initialize); struct kobject *virtual_device_parent(void) { static struct kobject *virtual_dir = NULL; if (!virtual_dir) virtual_dir = kobject_create_and_add("virtual", &devices_kset->kobj); return virtual_dir; } struct class_dir { struct kobject kobj; const struct class *class; }; #define to_class_dir(obj) container_of(obj, struct class_dir, kobj) static void class_dir_release(struct kobject *kobj) { struct class_dir *dir = to_class_dir(kobj); kfree(dir); } static const struct kobj_ns_type_operations *class_dir_child_ns_type(const struct kobject *kobj) { const struct class_dir *dir = to_class_dir(kobj); return dir->class->ns_type; } static const struct kobj_type class_dir_ktype = { .release = class_dir_release, .sysfs_ops = &kobj_sysfs_ops, .child_ns_type = class_dir_child_ns_type }; static struct kobject *class_dir_create_and_add(struct subsys_private *sp, struct kobject *parent_kobj) { struct class_dir *dir; int retval; dir = kzalloc(sizeof(*dir), GFP_KERNEL); if (!dir) return ERR_PTR(-ENOMEM); dir->class = sp->class; kobject_init(&dir->kobj, &class_dir_ktype); dir->kobj.kset = &sp->glue_dirs; retval = kobject_add(&dir->kobj, parent_kobj, "%s", sp->class->name); if (retval < 0) { kobject_put(&dir->kobj); return ERR_PTR(retval); } return &dir->kobj; } static DEFINE_MUTEX(gdp_mutex); static struct kobject *get_device_parent(struct device *dev, struct device *parent) { struct subsys_private *sp = class_to_subsys(dev->class); struct kobject *kobj = NULL; if (sp) { struct kobject *parent_kobj; struct kobject *k; /* * If we have no parent, we live in "virtual". * Class-devices with a non class-device as parent, live * in a "glue" directory to prevent namespace collisions. */ if (parent == NULL) parent_kobj = virtual_device_parent(); else if (parent->class && !dev->class->ns_type) { subsys_put(sp); return &parent->kobj; } else { parent_kobj = &parent->kobj; } mutex_lock(&gdp_mutex); /* find our class-directory at the parent and reference it */ spin_lock(&sp->glue_dirs.list_lock); list_for_each_entry(k, &sp->glue_dirs.list, entry) if (k->parent == parent_kobj) { kobj = kobject_get(k); break; } spin_unlock(&sp->glue_dirs.list_lock); if (kobj) { mutex_unlock(&gdp_mutex); subsys_put(sp); return kobj; } /* or create a new class-directory at the parent device */ k = class_dir_create_and_add(sp, parent_kobj); /* do not emit an uevent for this simple "glue" directory */ mutex_unlock(&gdp_mutex); subsys_put(sp); return k; } /* subsystems can specify a default root directory for their devices */ if (!parent && dev->bus) { struct device *dev_root = bus_get_dev_root(dev->bus); if (dev_root) { kobj = &dev_root->kobj; put_device(dev_root); return kobj; } } if (parent) return &parent->kobj; return NULL; } static inline bool live_in_glue_dir(struct kobject *kobj, struct device *dev) { struct subsys_private *sp; bool retval; if (!kobj || !dev->class) return false; sp = class_to_subsys(dev->class); if (!sp) return false; if (kobj->kset == &sp->glue_dirs) retval = true; else retval = false; subsys_put(sp); return retval; } static inline struct kobject *get_glue_dir(struct device *dev) { return dev->kobj.parent; } /** * kobject_has_children - Returns whether a kobject has children. * @kobj: the object to test * * This will return whether a kobject has other kobjects as children. * * It does NOT account for the presence of attribute files, only sub * directories. It also assumes there is no concurrent addition or * removal of such children, and thus relies on external locking. */ static inline bool kobject_has_children(struct kobject *kobj) { WARN_ON_ONCE(kref_read(&kobj->kref) == 0); return kobj->sd && kobj->sd->dir.subdirs; } /* * make sure cleaning up dir as the last step, we need to make * sure .release handler of kobject is run with holding the * global lock */ static void cleanup_glue_dir(struct device *dev, struct kobject *glue_dir) { unsigned int ref; /* see if we live in a "glue" directory */ if (!live_in_glue_dir(glue_dir, dev)) return; mutex_lock(&gdp_mutex); /** * There is a race condition between removing glue directory * and adding a new device under the glue directory. * * CPU1: CPU2: * * device_add() * get_device_parent() * class_dir_create_and_add() * kobject_add_internal() * create_dir() // create glue_dir * * device_add() * get_device_parent() * kobject_get() // get glue_dir * * device_del() * cleanup_glue_dir() * kobject_del(glue_dir) * * kobject_add() * kobject_add_internal() * create_dir() // in glue_dir * sysfs_create_dir_ns() * kernfs_create_dir_ns(sd) * * sysfs_remove_dir() // glue_dir->sd=NULL * sysfs_put() // free glue_dir->sd * * // sd is freed * kernfs_new_node(sd) * kernfs_get(glue_dir) * kernfs_add_one() * kernfs_put() * * Before CPU1 remove last child device under glue dir, if CPU2 add * a new device under glue dir, the glue_dir kobject reference count * will be increase to 2 in kobject_get(k). And CPU2 has been called * kernfs_create_dir_ns(). Meanwhile, CPU1 call sysfs_remove_dir() * and sysfs_put(). This result in glue_dir->sd is freed. * * Then the CPU2 will see a stale "empty" but still potentially used * glue dir around in kernfs_new_node(). * * In order to avoid this happening, we also should make sure that * kernfs_node for glue_dir is released in CPU1 only when refcount * for glue_dir kobj is 1. */ ref = kref_read(&glue_dir->kref); if (!kobject_has_children(glue_dir) && !--ref) kobject_del(glue_dir); kobject_put(glue_dir); mutex_unlock(&gdp_mutex); } static int device_add_class_symlinks(struct device *dev) { struct device_node *of_node = dev_of_node(dev); struct subsys_private *sp; int error; if (of_node) { error = sysfs_create_link(&dev->kobj, of_node_kobj(of_node), "of_node"); if (error) dev_warn(dev, "Error %d creating of_node link\n",error); /* An error here doesn't warrant bringing down the device */ } sp = class_to_subsys(dev->class); if (!sp) return 0; error = sysfs_create_link(&dev->kobj, &sp->subsys.kobj, "subsystem"); if (error) goto out_devnode; if (dev->parent && device_is_not_partition(dev)) { error = sysfs_create_link(&dev->kobj, &dev->parent->kobj, "device"); if (error) goto out_subsys; } /* link in the class directory pointing to the device */ error = sysfs_create_link(&sp->subsys.kobj, &dev->kobj, dev_name(dev)); if (error) goto out_device; goto exit; out_device: sysfs_remove_link(&dev->kobj, "device"); out_subsys: sysfs_remove_link(&dev->kobj, "subsystem"); out_devnode: sysfs_remove_link(&dev->kobj, "of_node"); exit: subsys_put(sp); return error; } static void device_remove_class_symlinks(struct device *dev) { struct subsys_private *sp = class_to_subsys(dev->class); if (dev_of_node(dev)) sysfs_remove_link(&dev->kobj, "of_node"); if (!sp) return; if (dev->parent && device_is_not_partition(dev)) sysfs_remove_link(&dev->kobj, "device"); sysfs_remove_link(&dev->kobj, "subsystem"); sysfs_delete_link(&sp->subsys.kobj, &dev->kobj, dev_name(dev)); subsys_put(sp); } /** * dev_set_name - set a device name * @dev: device * @fmt: format string for the device's name */ int dev_set_name(struct device *dev, const char *fmt, ...) { va_list vargs; int err; va_start(vargs, fmt); err = kobject_set_name_vargs(&dev->kobj, fmt, vargs); va_end(vargs); return err; } EXPORT_SYMBOL_GPL(dev_set_name); /* select a /sys/dev/ directory for the device */ static struct kobject *device_to_dev_kobj(struct device *dev) { if (is_blockdev(dev)) return sysfs_dev_block_kobj; else return sysfs_dev_char_kobj; } static int device_create_sys_dev_entry(struct device *dev) { struct kobject *kobj = device_to_dev_kobj(dev); int error = 0; char devt_str[15]; if (kobj) { format_dev_t(devt_str, dev->devt); error = sysfs_create_link(kobj, &dev->kobj, devt_str); } return error; } static void device_remove_sys_dev_entry(struct device *dev) { struct kobject *kobj = device_to_dev_kobj(dev); char devt_str[15]; if (kobj) { format_dev_t(devt_str, dev->devt); sysfs_remove_link(kobj, devt_str); } } static int device_private_init(struct device *dev) { dev->p = kzalloc(sizeof(*dev->p), GFP_KERNEL); if (!dev->p) return -ENOMEM; dev->p->device = dev; klist_init(&dev->p->klist_children, klist_children_get, klist_children_put); INIT_LIST_HEAD(&dev->p->deferred_probe); return 0; } /** * device_add - add device to device hierarchy. * @dev: device. * * This is part 2 of device_register(), though may be called * separately _iff_ device_initialize() has been called separately. * * This adds @dev to the kobject hierarchy via kobject_add(), adds it * to the global and sibling lists for the device, then * adds it to the other relevant subsystems of the driver model. * * Do not call this routine or device_register() more than once for * any device structure. The driver model core is not designed to work * with devices that get unregistered and then spring back to life. * (Among other things, it's very hard to guarantee that all references * to the previous incarnation of @dev have been dropped.) Allocate * and register a fresh new struct device instead. * * NOTE: _Never_ directly free @dev after calling this function, even * if it returned an error! Always use put_device() to give up your * reference instead. * * Rule of thumb is: if device_add() succeeds, you should call * device_del() when you want to get rid of it. If device_add() has * *not* succeeded, use *only* put_device() to drop the reference * count. */ int device_add(struct device *dev) { struct subsys_private *sp; struct device *parent; struct kobject *kobj; struct class_interface *class_intf; int error = -EINVAL; struct kobject *glue_dir = NULL; dev = get_device(dev); if (!dev) goto done; if (!dev->p) { error = device_private_init(dev); if (error) goto done; } /* * for statically allocated devices, which should all be converted * some day, we need to initialize the name. We prevent reading back * the name, and force the use of dev_name() */ if (dev->init_name) { error = dev_set_name(dev, "%s", dev->init_name); dev->init_name = NULL; } if (dev_name(dev)) error = 0; /* subsystems can specify simple device enumeration */ else if (dev->bus && dev->bus->dev_name) error = dev_set_name(dev, "%s%u", dev->bus->dev_name, dev->id); else error = -EINVAL; if (error) goto name_error; pr_debug("device: '%s': %s\n", dev_name(dev), __func__); parent = get_device(dev->parent); kobj = get_device_parent(dev, parent); if (IS_ERR(kobj)) { error = PTR_ERR(kobj); goto parent_error; } if (kobj) dev->kobj.parent = kobj; /* use parent numa_node */ if (parent && (dev_to_node(dev) == NUMA_NO_NODE)) set_dev_node(dev, dev_to_node(parent)); /* first, register with generic layer. */ /* we require the name to be set before, and pass NULL */ error = kobject_add(&dev->kobj, dev->kobj.parent, NULL); if (error) { glue_dir = kobj; goto Error; } /* notify platform of device entry */ device_platform_notify(dev); error = device_create_file(dev, &dev_attr_uevent); if (error) goto attrError; error = device_add_class_symlinks(dev); if (error) goto SymlinkError; error = device_add_attrs(dev); if (error) goto AttrsError; error = bus_add_device(dev); if (error) goto BusError; error = dpm_sysfs_add(dev); if (error) goto DPMError; device_pm_add(dev); if (MAJOR(dev->devt)) { error = device_create_file(dev, &dev_attr_dev); if (error) goto DevAttrError; error = device_create_sys_dev_entry(dev); if (error) goto SysEntryError; devtmpfs_create_node(dev); } /* Notify clients of device addition. This call must come * after dpm_sysfs_add() and before kobject_uevent(). */ bus_notify(dev, BUS_NOTIFY_ADD_DEVICE); kobject_uevent(&dev->kobj, KOBJ_ADD); /* * Check if any of the other devices (consumers) have been waiting for * this device (supplier) to be added so that they can create a device * link to it. * * This needs to happen after device_pm_add() because device_link_add() * requires the supplier be registered before it's called. * * But this also needs to happen before bus_probe_device() to make sure * waiting consumers can link to it before the driver is bound to the * device and the driver sync_state callback is called for this device. */ if (dev->fwnode && !dev->fwnode->dev) { dev->fwnode->dev = dev; fw_devlink_link_device(dev); } bus_probe_device(dev); /* * If all driver registration is done and a newly added device doesn't * match with any driver, don't block its consumers from probing in * case the consumer device is able to operate without this supplier. */ if (dev->fwnode && fw_devlink_drv_reg_done && !dev->can_match) fw_devlink_unblock_consumers(dev); if (parent) klist_add_tail(&dev->p->knode_parent, &parent->p->klist_children); sp = class_to_subsys(dev->class); if (sp) { mutex_lock(&sp->mutex); /* tie the class to the device */ klist_add_tail(&dev->p->knode_class, &sp->klist_devices); /* notify any interfaces that the device is here */ list_for_each_entry(class_intf, &sp->interfaces, node) if (class_intf->add_dev) class_intf->add_dev(dev); mutex_unlock(&sp->mutex); subsys_put(sp); } done: put_device(dev); return error; SysEntryError: if (MAJOR(dev->devt)) device_remove_file(dev, &dev_attr_dev); DevAttrError: device_pm_remove(dev); dpm_sysfs_remove(dev); DPMError: device_set_driver(dev, NULL); bus_remove_device(dev); BusError: device_remove_attrs(dev); AttrsError: device_remove_class_symlinks(dev); SymlinkError: device_remove_file(dev, &dev_attr_uevent); attrError: device_platform_notify_remove(dev); kobject_uevent(&dev->kobj, KOBJ_REMOVE); glue_dir = get_glue_dir(dev); kobject_del(&dev->kobj); Error: cleanup_glue_dir(dev, glue_dir); parent_error: put_device(parent); name_error: kfree(dev->p); dev->p = NULL; goto done; } EXPORT_SYMBOL_GPL(device_add); /** * device_register - register a device with the system. * @dev: pointer to the device structure * * This happens in two clean steps - initialize the device * and add it to the system. The two steps can be called * separately, but this is the easiest and most common. * I.e. you should only call the two helpers separately if * have a clearly defined need to use and refcount the device * before it is added to the hierarchy. * * For more information, see the kerneldoc for device_initialize() * and device_add(). * * NOTE: _Never_ directly free @dev after calling this function, even * if it returned an error! Always use put_device() to give up the * reference initialized in this function instead. */ int device_register(struct device *dev) { device_initialize(dev); return device_add(dev); } EXPORT_SYMBOL_GPL(device_register); /** * get_device - increment reference count for device. * @dev: device. * * This simply forwards the call to kobject_get(), though * we do take care to provide for the case that we get a NULL * pointer passed in. */ struct device *get_device(struct device *dev) { return dev ? kobj_to_dev(kobject_get(&dev->kobj)) : NULL; } EXPORT_SYMBOL_GPL(get_device); /** * put_device - decrement reference count. * @dev: device in question. */ void put_device(struct device *dev) { /* might_sleep(); */ if (dev) kobject_put(&dev->kobj); } EXPORT_SYMBOL_GPL(put_device); bool kill_device(struct device *dev) { /* * Require the device lock and set the "dead" flag to guarantee that * the update behavior is consistent with the other bitfields near * it and that we cannot have an asynchronous probe routine trying * to run while we are tearing out the bus/class/sysfs from * underneath the device. */ device_lock_assert(dev); if (dev->p->dead) return false; dev->p->dead = true; return true; } EXPORT_SYMBOL_GPL(kill_device); /** * device_del - delete device from system. * @dev: device. * * This is the first part of the device unregistration * sequence. This removes the device from the lists we control * from here, has it removed from the other driver model * subsystems it was added to in device_add(), and removes it * from the kobject hierarchy. * * NOTE: this should be called manually _iff_ device_add() was * also called manually. */ void device_del(struct device *dev) { struct subsys_private *sp; struct device *parent = dev->parent; struct kobject *glue_dir = NULL; struct class_interface *class_intf; unsigned int noio_flag; device_lock(dev); kill_device(dev); device_unlock(dev); if (dev->fwnode && dev->fwnode->dev == dev) dev->fwnode->dev = NULL; /* Notify clients of device removal. This call must come * before dpm_sysfs_remove(). */ noio_flag = memalloc_noio_save(); bus_notify(dev, BUS_NOTIFY_DEL_DEVICE); dpm_sysfs_remove(dev); if (parent) klist_del(&dev->p->knode_parent); if (MAJOR(dev->devt)) { devtmpfs_delete_node(dev); device_remove_sys_dev_entry(dev); device_remove_file(dev, &dev_attr_dev); } sp = class_to_subsys(dev->class); if (sp) { device_remove_class_symlinks(dev); mutex_lock(&sp->mutex); /* notify any interfaces that the device is now gone */ list_for_each_entry(class_intf, &sp->interfaces, node) if (class_intf->remove_dev) class_intf->remove_dev(dev); /* remove the device from the class list */ klist_del(&dev->p->knode_class); mutex_unlock(&sp->mutex); subsys_put(sp); } device_remove_file(dev, &dev_attr_uevent); device_remove_attrs(dev); bus_remove_device(dev); device_pm_remove(dev); driver_deferred_probe_del(dev); device_platform_notify_remove(dev); device_links_purge(dev); /* * If a device does not have a driver attached, we need to clean * up any managed resources. We do this in device_release(), but * it's never called (and we leak the device) if a managed * resource holds a reference to the device. So release all * managed resources here, like we do in driver_detach(). We * still need to do so again in device_release() in case someone * adds a new resource after this point, though. */ devres_release_all(dev); bus_notify(dev, BUS_NOTIFY_REMOVED_DEVICE); kobject_uevent(&dev->kobj, KOBJ_REMOVE); glue_dir = get_glue_dir(dev); kobject_del(&dev->kobj); cleanup_glue_dir(dev, glue_dir); memalloc_noio_restore(noio_flag); put_device(parent); } EXPORT_SYMBOL_GPL(device_del); /** * device_unregister - unregister device from system. * @dev: device going away. * * We do this in two parts, like we do device_register(). First, * we remove it from all the subsystems with device_del(), then * we decrement the reference count via put_device(). If that * is the final reference count, the device will be cleaned up * via device_release() above. Otherwise, the structure will * stick around until the final reference to the device is dropped. */ void device_unregister(struct device *dev) { pr_debug("device: '%s': %s\n", dev_name(dev), __func__); device_del(dev); put_device(dev); } EXPORT_SYMBOL_GPL(device_unregister); static struct device *prev_device(struct klist_iter *i) { struct klist_node *n = klist_prev(i); struct device *dev = NULL; struct device_private *p; if (n) { p = to_device_private_parent(n); dev = p->device; } return dev; } static struct device *next_device(struct klist_iter *i) { struct klist_node *n = klist_next(i); struct device *dev = NULL; struct device_private *p; if (n) { p = to_device_private_parent(n); dev = p->device; } return dev; } /** * device_get_devnode - path of device node file * @dev: device * @mode: returned file access mode * @uid: returned file owner * @gid: returned file group * @tmp: possibly allocated string * * Return the relative path of a possible device node. * Non-default names may need to allocate a memory to compose * a name. This memory is returned in tmp and needs to be * freed by the caller. */ const char *device_get_devnode(const struct device *dev, umode_t *mode, kuid_t *uid, kgid_t *gid, const char **tmp) { char *s; *tmp = NULL; /* the device type may provide a specific name */ if (dev->type && dev->type->devnode) *tmp = dev->type->devnode(dev, mode, uid, gid); if (*tmp) return *tmp; /* the class may provide a specific name */ if (dev->class && dev->class->devnode) *tmp = dev->class->devnode(dev, mode); if (*tmp) return *tmp; /* return name without allocation, tmp == NULL */ if (strchr(dev_name(dev), '!') == NULL) return dev_name(dev); /* replace '!' in the name with '/' */ s = kstrdup_and_replace(dev_name(dev), '!', '/', GFP_KERNEL); if (!s) return NULL; return *tmp = s; } /** * device_for_each_child - device child iterator. * @parent: parent struct device. * @data: data for the callback. * @fn: function to be called for each device. * * Iterate over @parent's child devices, and call @fn for each, * passing it @data. * * We check the return of @fn each time. If it returns anything * other than 0, we break out and return that value. */ int device_for_each_child(struct device *parent, void *data, device_iter_t fn) { struct klist_iter i; struct device *child; int error = 0; if (!parent || !parent->p) return 0; klist_iter_init(&parent->p->klist_children, &i); while (!error && (child = next_device(&i))) error = fn(child, data); klist_iter_exit(&i); return error; } EXPORT_SYMBOL_GPL(device_for_each_child); /** * device_for_each_child_reverse - device child iterator in reversed order. * @parent: parent struct device. * @data: data for the callback. * @fn: function to be called for each device. * * Iterate over @parent's child devices, and call @fn for each, * passing it @data. * * We check the return of @fn each time. If it returns anything * other than 0, we break out and return that value. */ int device_for_each_child_reverse(struct device *parent, void *data, device_iter_t fn) { struct klist_iter i; struct device *child; int error = 0; if (!parent || !parent->p) return 0; klist_iter_init(&parent->p->klist_children, &i); while ((child = prev_device(&i)) && !error) error = fn(child, data); klist_iter_exit(&i); return error; } EXPORT_SYMBOL_GPL(device_for_each_child_reverse); /** * device_for_each_child_reverse_from - device child iterator in reversed order. * @parent: parent struct device. * @from: optional starting point in child list * @data: data for the callback. * @fn: function to be called for each device. * * Iterate over @parent's child devices, starting at @from, and call @fn * for each, passing it @data. This helper is identical to * device_for_each_child_reverse() when @from is NULL. * * @fn is checked each iteration. If it returns anything other than 0, * iteration stop and that value is returned to the caller of * device_for_each_child_reverse_from(); */ int device_for_each_child_reverse_from(struct device *parent, struct device *from, void *data, device_iter_t fn) { struct klist_iter i; struct device *child; int error = 0; if (!parent || !parent->p) return 0; klist_iter_init_node(&parent->p->klist_children, &i, (from ? &from->p->knode_parent : NULL)); while ((child = prev_device(&i)) && !error) error = fn(child, data); klist_iter_exit(&i); return error; } EXPORT_SYMBOL_GPL(device_for_each_child_reverse_from); /** * device_find_child - device iterator for locating a particular device. * @parent: parent struct device * @data: Data to pass to match function * @match: Callback function to check device * * This is similar to the device_for_each_child() function above, but it * returns a reference to a device that is 'found' for later use, as * determined by the @match callback. * * The callback should return 0 if the device doesn't match and non-zero * if it does. If the callback returns non-zero and a reference to the * current device can be obtained, this function will return to the caller * and not iterate over any more devices. * * NOTE: you will need to drop the reference with put_device() after use. */ struct device *device_find_child(struct device *parent, const void *data, device_match_t match) { struct klist_iter i; struct device *child; if (!parent || !parent->p) return NULL; klist_iter_init(&parent->p->klist_children, &i); while ((child = next_device(&i))) { if (match(child, data)) { get_device(child); break; } } klist_iter_exit(&i); return child; } EXPORT_SYMBOL_GPL(device_find_child); int __init devices_init(void) { devices_kset = kset_create_and_add("devices", &device_uevent_ops, NULL); if (!devices_kset) return -ENOMEM; dev_kobj = kobject_create_and_add("dev", NULL); if (!dev_kobj) goto dev_kobj_err; sysfs_dev_block_kobj = kobject_create_and_add("block", dev_kobj); if (!sysfs_dev_block_kobj) goto block_kobj_err; sysfs_dev_char_kobj = kobject_create_and_add("char", dev_kobj); if (!sysfs_dev_char_kobj) goto char_kobj_err; device_link_wq = alloc_workqueue("device_link_wq", 0, 0); if (!device_link_wq) goto wq_err; return 0; wq_err: kobject_put(sysfs_dev_char_kobj); char_kobj_err: kobject_put(sysfs_dev_block_kobj); block_kobj_err: kobject_put(dev_kobj); dev_kobj_err: kset_unregister(devices_kset); return -ENOMEM; } static int device_check_offline(struct device *dev, void *not_used) { int ret; ret = device_for_each_child(dev, NULL, device_check_offline); if (ret) return ret; return device_supports_offline(dev) && !dev->offline ? -EBUSY : 0; } /** * device_offline - Prepare the device for hot-removal. * @dev: Device to be put offline. * * Execute the device bus type's .offline() callback, if present, to prepare * the device for a subsequent hot-removal. If that succeeds, the device must * not be used until either it is removed or its bus type's .online() callback * is executed. * * Call under device_hotplug_lock. */ int device_offline(struct device *dev) { int ret; if (dev->offline_disabled) return -EPERM; ret = device_for_each_child(dev, NULL, device_check_offline); if (ret) return ret; device_lock(dev); if (device_supports_offline(dev)) { if (dev->offline) { ret = 1; } else { ret = dev->bus->offline(dev); if (!ret) { kobject_uevent(&dev->kobj, KOBJ_OFFLINE); dev->offline = true; } } } device_unlock(dev); return ret; } /** * device_online - Put the device back online after successful device_offline(). * @dev: Device to be put back online. * * If device_offline() has been successfully executed for @dev, but the device * has not been removed subsequently, execute its bus type's .online() callback * to indicate that the device can be used again. * * Call under device_hotplug_lock. */ int device_online(struct device *dev) { int ret = 0; device_lock(dev); if (device_supports_offline(dev)) { if (dev->offline) { ret = dev->bus->online(dev); if (!ret) { kobject_uevent(&dev->kobj, KOBJ_ONLINE); dev->offline = false; } } else { ret = 1; } } device_unlock(dev); return ret; } struct root_device { struct device dev; struct module *owner; }; static inline struct root_device *to_root_device(struct device *d) { return container_of(d, struct root_device, dev); } static void root_device_release(struct device *dev) { kfree(to_root_device(dev)); } /** * __root_device_register - allocate and register a root device * @name: root device name * @owner: owner module of the root device, usually THIS_MODULE * * This function allocates a root device and registers it * using device_register(). In order to free the returned * device, use root_device_unregister(). * * Root devices are dummy devices which allow other devices * to be grouped under /sys/devices. Use this function to * allocate a root device and then use it as the parent of * any device which should appear under /sys/devices/{name} * * The /sys/devices/{name} directory will also contain a * 'module' symlink which points to the @owner directory * in sysfs. * * Returns &struct device pointer on success, or ERR_PTR() on error. * * Note: You probably want to use root_device_register(). */ struct device *__root_device_register(const char *name, struct module *owner) { struct root_device *root; int err = -ENOMEM; root = kzalloc(sizeof(struct root_device), GFP_KERNEL); if (!root) return ERR_PTR(err); err = dev_set_name(&root->dev, "%s", name); if (err) { kfree(root); return ERR_PTR(err); } root->dev.release = root_device_release; err = device_register(&root->dev); if (err) { put_device(&root->dev); return ERR_PTR(err); } #ifdef CONFIG_MODULES /* gotta find a "cleaner" way to do this */ if (owner) { struct module_kobject *mk = &owner->mkobj; err = sysfs_create_link(&root->dev.kobj, &mk->kobj, "module"); if (err) { device_unregister(&root->dev); return ERR_PTR(err); } root->owner = owner; } #endif return &root->dev; } EXPORT_SYMBOL_GPL(__root_device_register); /** * root_device_unregister - unregister and free a root device * @dev: device going away * * This function unregisters and cleans up a device that was created by * root_device_register(). */ void root_device_unregister(struct device *dev) { struct root_device *root = to_root_device(dev); if (root->owner) sysfs_remove_link(&root->dev.kobj, "module"); device_unregister(dev); } EXPORT_SYMBOL_GPL(root_device_unregister); static void device_create_release(struct device *dev) { pr_debug("device: '%s': %s\n", dev_name(dev), __func__); kfree(dev); } static __printf(6, 0) struct device * device_create_groups_vargs(const struct class *class, struct device *parent, dev_t devt, void *drvdata, const struct attribute_group **groups, const char *fmt, va_list args) { struct device *dev = NULL; int retval = -ENODEV; if (IS_ERR_OR_NULL(class)) goto error; dev = kzalloc(sizeof(*dev), GFP_KERNEL); if (!dev) { retval = -ENOMEM; goto error; } device_initialize(dev); dev->devt = devt; dev->class = class; dev->parent = parent; dev->groups = groups; dev->release = device_create_release; dev_set_drvdata(dev, drvdata); retval = kobject_set_name_vargs(&dev->kobj, fmt, args); if (retval) goto error; retval = device_add(dev); if (retval) goto error; return dev; error: put_device(dev); return ERR_PTR(retval); } /** * device_create - creates a device and registers it with sysfs * @class: pointer to the struct class that this device should be registered to * @parent: pointer to the parent struct device of this new device, if any * @devt: the dev_t for the char device to be added * @drvdata: the data to be added to the device for callbacks * @fmt: string for the device's name * * This function can be used by char device classes. A struct device * will be created in sysfs, registered to the specified class. * * A "dev" file will be created, showing the dev_t for the device, if * the dev_t is not 0,0. * If a pointer to a parent struct device is passed in, the newly created * struct device will be a child of that device in sysfs. * The pointer to the struct device will be returned from the call. * Any further sysfs files that might be required can be created using this * pointer. * * Returns &struct device pointer on success, or ERR_PTR() on error. */ struct device *device_create(const struct class *class, struct device *parent, dev_t devt, void *drvdata, const char *fmt, ...) { va_list vargs; struct device *dev; va_start(vargs, fmt); dev = device_create_groups_vargs(class, parent, devt, drvdata, NULL, fmt, vargs); va_end(vargs); return dev; } EXPORT_SYMBOL_GPL(device_create); /** * device_create_with_groups - creates a device and registers it with sysfs * @class: pointer to the struct class that this device should be registered to * @parent: pointer to the parent struct device of this new device, if any * @devt: the dev_t for the char device to be added * @drvdata: the data to be added to the device for callbacks * @groups: NULL-terminated list of attribute groups to be created * @fmt: string for the device's name * * This function can be used by char device classes. A struct device * will be created in sysfs, registered to the specified class. * Additional attributes specified in the groups parameter will also * be created automatically. * * A "dev" file will be created, showing the dev_t for the device, if * the dev_t is not 0,0. * If a pointer to a parent struct device is passed in, the newly created * struct device will be a child of that device in sysfs. * The pointer to the struct device will be returned from the call. * Any further sysfs files that might be required can be created using this * pointer. * * Returns &struct device pointer on success, or ERR_PTR() on error. */ struct device *device_create_with_groups(const struct class *class, struct device *parent, dev_t devt, void *drvdata, const struct attribute_group **groups, const char *fmt, ...) { va_list vargs; struct device *dev; va_start(vargs, fmt); dev = device_create_groups_vargs(class, parent, devt, drvdata, groups, fmt, vargs); va_end(vargs); return dev; } EXPORT_SYMBOL_GPL(device_create_with_groups); /** * device_destroy - removes a device that was created with device_create() * @class: pointer to the struct class that this device was registered with * @devt: the dev_t of the device that was previously registered * * This call unregisters and cleans up a device that was created with a * call to device_create(). */ void device_destroy(const struct class *class, dev_t devt) { struct device *dev; dev = class_find_device_by_devt(class, devt); if (dev) { put_device(dev); device_unregister(dev); } } EXPORT_SYMBOL_GPL(device_destroy); /** * device_rename - renames a device * @dev: the pointer to the struct device to be renamed * @new_name: the new name of the device * * It is the responsibility of the caller to provide mutual * exclusion between two different calls of device_rename * on the same device to ensure that new_name is valid and * won't conflict with other devices. * * Note: given that some subsystems (networking and infiniband) use this * function, with no immediate plans for this to change, we cannot assume or * require that this function not be called at all. * * However, if you're writing new code, do not call this function. The following * text from Kay Sievers offers some insight: * * Renaming devices is racy at many levels, symlinks and other stuff are not * replaced atomically, and you get a "move" uevent, but it's not easy to * connect the event to the old and new device. Device nodes are not renamed at * all, there isn't even support for that in the kernel now. * * In the meantime, during renaming, your target name might be taken by another * driver, creating conflicts. Or the old name is taken directly after you * renamed it -- then you get events for the same DEVPATH, before you even see * the "move" event. It's just a mess, and nothing new should ever rely on * kernel device renaming. Besides that, it's not even implemented now for * other things than (driver-core wise very simple) network devices. * * Make up a "real" name in the driver before you register anything, or add * some other attributes for userspace to find the device, or use udev to add * symlinks -- but never rename kernel devices later, it's a complete mess. We * don't even want to get into that and try to implement the missing pieces in * the core. We really have other pieces to fix in the driver core mess. :) */ int device_rename(struct device *dev, const char *new_name) { struct subsys_private *sp = NULL; struct kobject *kobj = &dev->kobj; char *old_device_name = NULL; int error; bool is_link_renamed = false; dev = get_device(dev); if (!dev) return -EINVAL; dev_dbg(dev, "renaming to %s\n", new_name); old_device_name = kstrdup(dev_name(dev), GFP_KERNEL); if (!old_device_name) { error = -ENOMEM; goto out; } if (dev->class) { sp = class_to_subsys(dev->class); if (!sp) { error = -EINVAL; goto out; } error = sysfs_rename_link_ns(&sp->subsys.kobj, kobj, old_device_name, new_name, kobject_namespace(kobj)); if (error) goto out; is_link_renamed = true; } error = kobject_rename(kobj, new_name); out: if (error && is_link_renamed) sysfs_rename_link_ns(&sp->subsys.kobj, kobj, new_name, old_device_name, kobject_namespace(kobj)); subsys_put(sp); put_device(dev); kfree(old_device_name); return error; } EXPORT_SYMBOL_GPL(device_rename); static int device_move_class_links(struct device *dev, struct device *old_parent, struct device *new_parent) { int error = 0; if (old_parent) sysfs_remove_link(&dev->kobj, "device"); if (new_parent) error = sysfs_create_link(&dev->kobj, &new_parent->kobj, "device"); return error; } /** * device_move - moves a device to a new parent * @dev: the pointer to the struct device to be moved * @new_parent: the new parent of the device (can be NULL) * @dpm_order: how to reorder the dpm_list */ int device_move(struct device *dev, struct device *new_parent, enum dpm_order dpm_order) { int error; struct device *old_parent; struct kobject *new_parent_kobj; dev = get_device(dev); if (!dev) return -EINVAL; device_pm_lock(); new_parent = get_device(new_parent); new_parent_kobj = get_device_parent(dev, new_parent); if (IS_ERR(new_parent_kobj)) { error = PTR_ERR(new_parent_kobj); put_device(new_parent); goto out; } pr_debug("device: '%s': %s: moving to '%s'\n", dev_name(dev), __func__, new_parent ? dev_name(new_parent) : "<NULL>"); error = kobject_move(&dev->kobj, new_parent_kobj); if (error) { cleanup_glue_dir(dev, new_parent_kobj); put_device(new_parent); goto out; } old_parent = dev->parent; dev->parent = new_parent; if (old_parent) klist_remove(&dev->p->knode_parent); if (new_parent) { klist_add_tail(&dev->p->knode_parent, &new_parent->p->klist_children); set_dev_node(dev, dev_to_node(new_parent)); } if (dev->class) { error = device_move_class_links(dev, old_parent, new_parent); if (error) { /* We ignore errors on cleanup since we're hosed anyway... */ device_move_class_links(dev, new_parent, old_parent); if (!kobject_move(&dev->kobj, &old_parent->kobj)) { if (new_parent) klist_remove(&dev->p->knode_parent); dev->parent = old_parent; if (old_parent) { klist_add_tail(&dev->p->knode_parent, &old_parent->p->klist_children); set_dev_node(dev, dev_to_node(old_parent)); } } cleanup_glue_dir(dev, new_parent_kobj); put_device(new_parent); goto out; } } switch (dpm_order) { case DPM_ORDER_NONE: break; case DPM_ORDER_DEV_AFTER_PARENT: device_pm_move_after(dev, new_parent); devices_kset_move_after(dev, new_parent); break; case DPM_ORDER_PARENT_BEFORE_DEV: device_pm_move_before(new_parent, dev); devices_kset_move_before(new_parent, dev); break; case DPM_ORDER_DEV_LAST: device_pm_move_last(dev); devices_kset_move_last(dev); break; } put_device(old_parent); out: device_pm_unlock(); put_device(dev); return error; } EXPORT_SYMBOL_GPL(device_move); static int device_attrs_change_owner(struct device *dev, kuid_t kuid, kgid_t kgid) { struct kobject *kobj = &dev->kobj; const struct class *class = dev->class; const struct device_type *type = dev->type; int error; if (class) { /* * Change the device groups of the device class for @dev to * @kuid/@kgid. */ error = sysfs_groups_change_owner(kobj, class->dev_groups, kuid, kgid); if (error) return error; } if (type) { /* * Change the device groups of the device type for @dev to * @kuid/@kgid. */ error = sysfs_groups_change_owner(kobj, type->groups, kuid, kgid); if (error) return error; } /* Change the device groups of @dev to @kuid/@kgid. */ error = sysfs_groups_change_owner(kobj, dev->groups, kuid, kgid); if (error) return error; if (device_supports_offline(dev) && !dev->offline_disabled) { /* Change online device attributes of @dev to @kuid/@kgid. */ error = sysfs_file_change_owner(kobj, dev_attr_online.attr.name, kuid, kgid); if (error) return error; } return 0; } /** * device_change_owner - change the owner of an existing device. * @dev: device. * @kuid: new owner's kuid * @kgid: new owner's kgid * * This changes the owner of @dev and its corresponding sysfs entries to * @kuid/@kgid. This function closely mirrors how @dev was added via driver * core. * * Returns 0 on success or error code on failure. */ int device_change_owner(struct device *dev, kuid_t kuid, kgid_t kgid) { int error; struct kobject *kobj = &dev->kobj; struct subsys_private *sp; dev = get_device(dev); if (!dev) return -EINVAL; /* * Change the kobject and the default attributes and groups of the * ktype associated with it to @kuid/@kgid. */ error = sysfs_change_owner(kobj, kuid, kgid); if (error) goto out; /* * Change the uevent file for @dev to the new owner. The uevent file * was created in a separate step when @dev got added and we mirror * that step here. */ error = sysfs_file_change_owner(kobj, dev_attr_uevent.attr.name, kuid, kgid); if (error) goto out; /* * Change the device groups, the device groups associated with the * device class, and the groups associated with the device type of @dev * to @kuid/@kgid. */ error = device_attrs_change_owner(dev, kuid, kgid); if (error) goto out; error = dpm_sysfs_change_owner(dev, kuid, kgid); if (error) goto out; /* * Change the owner of the symlink located in the class directory of * the device class associated with @dev which points to the actual * directory entry for @dev to @kuid/@kgid. This ensures that the * symlink shows the same permissions as its target. */ sp = class_to_subsys(dev->class); if (!sp) { error = -EINVAL; goto out; } error = sysfs_link_change_owner(&sp->subsys.kobj, &dev->kobj, dev_name(dev), kuid, kgid); subsys_put(sp); out: put_device(dev); return error; } EXPORT_SYMBOL_GPL(device_change_owner); /** * device_shutdown - call ->shutdown() on each device to shutdown. */ void device_shutdown(void) { struct device *dev, *parent; wait_for_device_probe(); device_block_probing(); cpufreq_suspend(); spin_lock(&devices_kset->list_lock); /* * Walk the devices list backward, shutting down each in turn. * Beware that device unplug events may also start pulling * devices offline, even as the system is shutting down. */ while (!list_empty(&devices_kset->list)) { dev = list_entry(devices_kset->list.prev, struct device, kobj.entry); /* * hold reference count of device's parent to * prevent it from being freed because parent's * lock is to be held */ parent = get_device(dev->parent); get_device(dev); /* * Make sure the device is off the kset list, in the * event that dev->*->shutdown() doesn't remove it. */ list_del_init(&dev->kobj.entry); spin_unlock(&devices_kset->list_lock); /* hold lock to avoid race with probe/release */ if (parent) device_lock(parent); device_lock(dev); /* Don't allow any more runtime suspends */ pm_runtime_get_noresume(dev); pm_runtime_barrier(dev); if (dev->class && dev->class->shutdown_pre) { if (initcall_debug) dev_info(dev, "shutdown_pre\n"); dev->class->shutdown_pre(dev); } if (dev->bus && dev->bus->shutdown) { if (initcall_debug) dev_info(dev, "shutdown\n"); dev->bus->shutdown(dev); } else if (dev->driver && dev->driver->shutdown) { if (initcall_debug) dev_info(dev, "shutdown\n"); dev->driver->shutdown(dev); } device_unlock(dev); if (parent) device_unlock(parent); put_device(dev); put_device(parent); spin_lock(&devices_kset->list_lock); } spin_unlock(&devices_kset->list_lock); } /* * Device logging functions */ #ifdef CONFIG_PRINTK static void set_dev_info(const struct device *dev, struct dev_printk_info *dev_info) { const char *subsys; memset(dev_info, 0, sizeof(*dev_info)); if (dev->class) subsys = dev->class->name; else if (dev->bus) subsys = dev->bus->name; else return; strscpy(dev_info->subsystem, subsys); /* * Add device identifier DEVICE=: * b12:8 block dev_t * c127:3 char dev_t * n8 netdev ifindex * +sound:card0 subsystem:devname */ if (MAJOR(dev->devt)) { char c; if (strcmp(subsys, "block") == 0) c = 'b'; else c = 'c'; snprintf(dev_info->device, sizeof(dev_info->device), "%c%u:%u", c, MAJOR(dev->devt), MINOR(dev->devt)); } else if (strcmp(subsys, "net") == 0) { struct net_device *net = to_net_dev(dev); snprintf(dev_info->device, sizeof(dev_info->device), "n%u", net->ifindex); } else { snprintf(dev_info->device, sizeof(dev_info->device), "+%s:%s", subsys, dev_name(dev)); } } int dev_vprintk_emit(int level, const struct device *dev, const char *fmt, va_list args) { struct dev_printk_info dev_info; set_dev_info(dev, &dev_info); return vprintk_emit(0, level, &dev_info, fmt, args); } EXPORT_SYMBOL(dev_vprintk_emit); int dev_printk_emit(int level, const struct device *dev, const char *fmt, ...) { va_list args; int r; va_start(args, fmt); r = dev_vprintk_emit(level, dev, fmt, args); va_end(args); return r; } EXPORT_SYMBOL(dev_printk_emit); static void __dev_printk(const char *level, const struct device *dev, struct va_format *vaf) { if (dev) dev_printk_emit(level[1] - '0', dev, "%s %s: %pV", dev_driver_string(dev), dev_name(dev), vaf); else printk("%s(NULL device *): %pV", level, vaf); } void _dev_printk(const char *level, const struct device *dev, const char *fmt, ...) { struct va_format vaf; va_list args; va_start(args, fmt); vaf.fmt = fmt; vaf.va = &args; __dev_printk(level, dev, &vaf); va_end(args); } EXPORT_SYMBOL(_dev_printk); #define define_dev_printk_level(func, kern_level) \ void func(const struct device *dev, const char *fmt, ...) \ { \ struct va_format vaf; \ va_list args; \ \ va_start(args, fmt); \ \ vaf.fmt = fmt; \ vaf.va = &args; \ \ __dev_printk(kern_level, dev, &vaf); \ \ va_end(args); \ } \ EXPORT_SYMBOL(func); define_dev_printk_level(_dev_emerg, KERN_EMERG); define_dev_printk_level(_dev_alert, KERN_ALERT); define_dev_printk_level(_dev_crit, KERN_CRIT); define_dev_printk_level(_dev_err, KERN_ERR); define_dev_printk_level(_dev_warn, KERN_WARNING); define_dev_printk_level(_dev_notice, KERN_NOTICE); define_dev_printk_level(_dev_info, KERN_INFO); #endif static void __dev_probe_failed(const struct device *dev, int err, bool fatal, const char *fmt, va_list vargsp) { struct va_format vaf; va_list vargs; /* * On x86_64 and possibly on other architectures, va_list is actually a * size-1 array containing a structure. As a result, function parameter * vargsp decays from T[1] to T*, and &vargsp has type T** rather than * T(*)[1], which is expected by its assignment to vaf.va below. * * One standard way to solve this mess is by creating a copy in a local * variable of type va_list and then using a pointer to that local copy * instead, which is the approach employed here. */ va_copy(vargs, vargsp); vaf.fmt = fmt; vaf.va = &vargs; switch (err) { case -EPROBE_DEFER: device_set_deferred_probe_reason(dev, &vaf); dev_dbg(dev, "error %pe: %pV", ERR_PTR(err), &vaf); break; case -ENOMEM: /* Don't print anything on -ENOMEM, there's already enough output */ break; default: /* Log fatal final failures as errors, otherwise produce warnings */ if (fatal) dev_err(dev, "error %pe: %pV", ERR_PTR(err), &vaf); else dev_warn(dev, "error %pe: %pV", ERR_PTR(err), &vaf); break; } va_end(vargs); } /** * dev_err_probe - probe error check and log helper * @dev: the pointer to the struct device * @err: error value to test * @fmt: printf-style format string * @...: arguments as specified in the format string * * This helper implements common pattern present in probe functions for error * checking: print debug or error message depending if the error value is * -EPROBE_DEFER and propagate error upwards. * In case of -EPROBE_DEFER it sets also defer probe reason, which can be * checked later by reading devices_deferred debugfs attribute. * It replaces the following code sequence:: * * if (err != -EPROBE_DEFER) * dev_err(dev, ...); * else * dev_dbg(dev, ...); * return err; * * with:: * * return dev_err_probe(dev, err, ...); * * Using this helper in your probe function is totally fine even if @err * is known to never be -EPROBE_DEFER. * The benefit compared to a normal dev_err() is the standardized format * of the error code, which is emitted symbolically (i.e. you get "EAGAIN" * instead of "-35"), and having the error code returned allows more * compact error paths. * * Returns @err. */ int dev_err_probe(const struct device *dev, int err, const char *fmt, ...) { va_list vargs; va_start(vargs, fmt); /* Use dev_err() for logging when err doesn't equal -EPROBE_DEFER */ __dev_probe_failed(dev, err, true, fmt, vargs); va_end(vargs); return err; } EXPORT_SYMBOL_GPL(dev_err_probe); /** * dev_warn_probe - probe error check and log helper * @dev: the pointer to the struct device * @err: error value to test * @fmt: printf-style format string * @...: arguments as specified in the format string * * This helper implements common pattern present in probe functions for error * checking: print debug or warning message depending if the error value is * -EPROBE_DEFER and propagate error upwards. * In case of -EPROBE_DEFER it sets also defer probe reason, which can be * checked later by reading devices_deferred debugfs attribute. * It replaces the following code sequence:: * * if (err != -EPROBE_DEFER) * dev_warn(dev, ...); * else * dev_dbg(dev, ...); * return err; * * with:: * * return dev_warn_probe(dev, err, ...); * * Using this helper in your probe function is totally fine even if @err * is known to never be -EPROBE_DEFER. * The benefit compared to a normal dev_warn() is the standardized format * of the error code, which is emitted symbolically (i.e. you get "EAGAIN" * instead of "-35"), and having the error code returned allows more * compact error paths. * * Returns @err. */ int dev_warn_probe(const struct device *dev, int err, const char *fmt, ...) { va_list vargs; va_start(vargs, fmt); /* Use dev_warn() for logging when err doesn't equal -EPROBE_DEFER */ __dev_probe_failed(dev, err, false, fmt, vargs); va_end(vargs); return err; } EXPORT_SYMBOL_GPL(dev_warn_probe); static inline bool fwnode_is_primary(struct fwnode_handle *fwnode) { return fwnode && !IS_ERR(fwnode->secondary); } /** * set_primary_fwnode - Change the primary firmware node of a given device. * @dev: Device to handle. * @fwnode: New primary firmware node of the device. * * Set the device's firmware node pointer to @fwnode, but if a secondary * firmware node of the device is present, preserve it. * * Valid fwnode cases are: * - primary --> secondary --> -ENODEV * - primary --> NULL * - secondary --> -ENODEV * - NULL */ void set_primary_fwnode(struct device *dev, struct fwnode_handle *fwnode) { struct device *parent = dev->parent; struct fwnode_handle *fn = dev->fwnode; if (fwnode) { if (fwnode_is_primary(fn)) fn = fn->secondary; if (fn) { WARN_ON(fwnode->secondary); fwnode->secondary = fn; } dev->fwnode = fwnode; } else { if (fwnode_is_primary(fn)) { dev->fwnode = fn->secondary; /* Skip nullifying fn->secondary if the primary is shared */ if (parent && fn == parent->fwnode) return; /* Set fn->secondary = NULL, so fn remains the primary fwnode */ fn->secondary = NULL; } else { dev->fwnode = NULL; } } } EXPORT_SYMBOL_GPL(set_primary_fwnode); /** * set_secondary_fwnode - Change the secondary firmware node of a given device. * @dev: Device to handle. * @fwnode: New secondary firmware node of the device. * * If a primary firmware node of the device is present, set its secondary * pointer to @fwnode. Otherwise, set the device's firmware node pointer to * @fwnode. */ void set_secondary_fwnode(struct device *dev, struct fwnode_handle *fwnode) { if (fwnode) fwnode->secondary = ERR_PTR(-ENODEV); if (fwnode_is_primary(dev->fwnode)) dev->fwnode->secondary = fwnode; else dev->fwnode = fwnode; } EXPORT_SYMBOL_GPL(set_secondary_fwnode); /** * device_remove_of_node - Remove an of_node from a device * @dev: device whose device tree node is being removed */ void device_remove_of_node(struct device *dev) { dev = get_device(dev); if (!dev) return; if (!dev->of_node) goto end; if (dev->fwnode == of_fwnode_handle(dev->of_node)) dev->fwnode = NULL; of_node_put(dev->of_node); dev->of_node = NULL; end: put_device(dev); } EXPORT_SYMBOL_GPL(device_remove_of_node); /** * device_add_of_node - Add an of_node to an existing device * @dev: device whose device tree node is being added * @of_node: of_node to add * * Return: 0 on success or error code on failure. */ int device_add_of_node(struct device *dev, struct device_node *of_node) { int ret; if (!of_node) return -EINVAL; dev = get_device(dev); if (!dev) return -EINVAL; if (dev->of_node) { dev_err(dev, "Cannot replace node %pOF with %pOF\n", dev->of_node, of_node); ret = -EBUSY; goto end; } dev->of_node = of_node_get(of_node); if (!dev->fwnode) dev->fwnode = of_fwnode_handle(of_node); ret = 0; end: put_device(dev); return ret; } EXPORT_SYMBOL_GPL(device_add_of_node); /** * device_set_of_node_from_dev - reuse device-tree node of another device * @dev: device whose device-tree node is being set * @dev2: device whose device-tree node is being reused * * Takes another reference to the new device-tree node after first dropping * any reference held to the old node. */ void device_set_of_node_from_dev(struct device *dev, const struct device *dev2) { of_node_put(dev->of_node); dev->of_node = of_node_get(dev2->of_node); dev->of_node_reused = true; } EXPORT_SYMBOL_GPL(device_set_of_node_from_dev); void device_set_node(struct device *dev, struct fwnode_handle *fwnode) { dev->fwnode = fwnode; dev->of_node = to_of_node(fwnode); } EXPORT_SYMBOL_GPL(device_set_node); /** * get_dev_from_fwnode - Obtain a reference count of the struct device the * struct fwnode_handle is associated with. * @fwnode: The pointer to the struct fwnode_handle to obtain the struct device * reference count of. * * This function obtains a reference count of the device the device pointer * embedded in the struct fwnode_handle points to. * * Note that the struct device pointer embedded in struct fwnode_handle does * *not* have a reference count of the struct device itself. * * Hence, it is a UAF (and thus a bug) to call this function if the caller can't * guarantee that the last reference count of the corresponding struct device is * not dropped concurrently. * * This is possible since struct fwnode_handle has its own reference count and * hence can out-live the struct device it is associated with. */ struct device *get_dev_from_fwnode(struct fwnode_handle *fwnode) { return get_device((fwnode)->dev); } EXPORT_SYMBOL_GPL(get_dev_from_fwnode); int device_match_name(struct device *dev, const void *name) { return sysfs_streq(dev_name(dev), name); } EXPORT_SYMBOL_GPL(device_match_name); int device_match_type(struct device *dev, const void *type) { return dev->type == type; } EXPORT_SYMBOL_GPL(device_match_type); int device_match_of_node(struct device *dev, const void *np) { return np && dev->of_node == np; } EXPORT_SYMBOL_GPL(device_match_of_node); int device_match_fwnode(struct device *dev, const void *fwnode) { return fwnode && dev_fwnode(dev) == fwnode; } EXPORT_SYMBOL_GPL(device_match_fwnode); int device_match_devt(struct device *dev, const void *pdevt) { return dev->devt == *(dev_t *)pdevt; } EXPORT_SYMBOL_GPL(device_match_devt); int device_match_acpi_dev(struct device *dev, const void *adev) { return adev && ACPI_COMPANION(dev) == adev; } EXPORT_SYMBOL(device_match_acpi_dev); int device_match_acpi_handle(struct device *dev, const void *handle) { return handle && ACPI_HANDLE(dev) == handle; } EXPORT_SYMBOL(device_match_acpi_handle); int device_match_any(struct device *dev, const void *unused) { return 1; } EXPORT_SYMBOL_GPL(device_match_any); |
| 507 507 506 507 201 201 201 201 655 657 515 516 515 515 515 515 659 659 516 658 201 202 198 200 657 656 516 516 517 517 26 659 658 657 517 658 200 197 8 189 189 6 6 6 689 689 176 176 714 507 189 190 189 508 45 463 505 33 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 | // SPDX-License-Identifier: GPL-2.0-or-later /* * net-sysfs.c - network device class and attributes * * Copyright (c) 2003 Stephen Hemminger <shemminger@osdl.org> */ #include <linux/capability.h> #include <linux/kernel.h> #include <linux/netdevice.h> #include <linux/if_arp.h> #include <linux/slab.h> #include <linux/sched/signal.h> #include <linux/sched/isolation.h> #include <linux/nsproxy.h> #include <net/sock.h> #include <net/net_namespace.h> #include <linux/rtnetlink.h> #include <linux/vmalloc.h> #include <linux/export.h> #include <linux/jiffies.h> #include <linux/pm_runtime.h> #include <linux/of.h> #include <linux/of_net.h> #include <linux/cpu.h> #include <net/netdev_lock.h> #include <net/netdev_rx_queue.h> #include <net/rps.h> #include "dev.h" #include "net-sysfs.h" #ifdef CONFIG_SYSFS static const char fmt_hex[] = "%#x\n"; static const char fmt_dec[] = "%d\n"; static const char fmt_uint[] = "%u\n"; static const char fmt_ulong[] = "%lu\n"; static const char fmt_u64[] = "%llu\n"; /* Caller holds RTNL, netdev->lock or RCU */ static inline int dev_isalive(const struct net_device *dev) { return READ_ONCE(dev->reg_state) <= NETREG_REGISTERED; } /* There is a possible ABBA deadlock between rtnl_lock and kernfs_node->active, * when unregistering a net device and accessing associated sysfs files. The * potential deadlock is as follow: * * CPU 0 CPU 1 * * rtnl_lock vfs_read * unregister_netdevice_many kernfs_seq_start * device_del / kobject_put kernfs_get_active (kn->active++) * kernfs_drain sysfs_kf_seq_show * wait_event( rtnl_lock * kn->active == KN_DEACTIVATED_BIAS) -> waits on CPU 0 to release * -> waits on CPU 1 to decrease kn->active the rtnl lock. * * The historical fix was to use rtnl_trylock with restart_syscall to bail out * of sysfs operations when the lock couldn't be taken. This fixed the above * issue as it allowed CPU 1 to bail out of the ABBA situation. * * But it came with performances issues, as syscalls are being restarted in * loops when there was contention on the rtnl lock, with huge slow downs in * specific scenarios (e.g. lots of virtual interfaces created and userspace * daemons querying their attributes). * * The idea below is to bail out of the active kernfs_node protection * (kn->active) while trying to take the rtnl lock. * * This replaces rtnl_lock() and still has to be used with rtnl_unlock(). The * net device is guaranteed to be alive if this returns successfully. */ static int sysfs_rtnl_lock(struct kobject *kobj, struct attribute *attr, struct net_device *ndev) { struct kernfs_node *kn; int ret = 0; /* First, we hold a reference to the net device as the unregistration * path might run in parallel. This will ensure the net device and the * associated sysfs objects won't be freed while we try to take the rtnl * lock. */ dev_hold(ndev); /* sysfs_break_active_protection was introduced to allow self-removal of * devices and their associated sysfs files by bailing out of the * sysfs/kernfs protection. We do this here to allow the unregistration * path to complete in parallel. The following takes a reference on the * kobject and the kernfs_node being accessed. * * This works because we hold a reference onto the net device and the * unregistration path will wait for us eventually in netdev_run_todo * (outside an rtnl lock section). */ kn = sysfs_break_active_protection(kobj, attr); /* We can now try to take the rtnl lock. This can't deadlock us as the * unregistration path is able to drain sysfs files (kernfs_node) thanks * to the above dance. */ if (rtnl_lock_interruptible()) { ret = -ERESTARTSYS; goto unbreak; } /* Check dismantle on the device hasn't started, otherwise deny the * operation. */ if (!dev_isalive(ndev)) { rtnl_unlock(); ret = -ENODEV; goto unbreak; } /* We are now sure the device dismantle hasn't started nor that it can * start before we exit the locking section as we hold the rtnl lock. * There's no need to keep unbreaking the sysfs protection nor to hold * a net device reference from that point; that was only needed to take * the rtnl lock. */ unbreak: sysfs_unbreak_active_protection(kn); dev_put(ndev); return ret; } /* use same locking rules as GIF* ioctl's */ static ssize_t netdev_show(const struct device *dev, struct device_attribute *attr, char *buf, ssize_t (*format)(const struct net_device *, char *)) { struct net_device *ndev = to_net_dev(dev); ssize_t ret = -EINVAL; rcu_read_lock(); if (dev_isalive(ndev)) ret = (*format)(ndev, buf); rcu_read_unlock(); return ret; } /* generate a show function for simple field */ #define NETDEVICE_SHOW(field, format_string) \ static ssize_t format_##field(const struct net_device *dev, char *buf) \ { \ return sysfs_emit(buf, format_string, READ_ONCE(dev->field)); \ } \ static ssize_t field##_show(struct device *dev, \ struct device_attribute *attr, char *buf) \ { \ return netdev_show(dev, attr, buf, format_##field); \ } \ #define NETDEVICE_SHOW_RO(field, format_string) \ NETDEVICE_SHOW(field, format_string); \ static DEVICE_ATTR_RO(field) #define NETDEVICE_SHOW_RW(field, format_string) \ NETDEVICE_SHOW(field, format_string); \ static DEVICE_ATTR_RW(field) /* use same locking and permission rules as SIF* ioctl's */ static ssize_t netdev_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t len, int (*set)(struct net_device *, unsigned long)) { struct net_device *netdev = to_net_dev(dev); struct net *net = dev_net(netdev); unsigned long new; int ret; if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) return -EPERM; ret = kstrtoul(buf, 0, &new); if (ret) goto err; ret = sysfs_rtnl_lock(&dev->kobj, &attr->attr, netdev); if (ret) goto err; ret = (*set)(netdev, new); if (ret == 0) ret = len; rtnl_unlock(); err: return ret; } /* Same as netdev_store() but takes netdev_lock() instead of rtnl_lock() */ static ssize_t netdev_lock_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t len, int (*set)(struct net_device *, unsigned long)) { struct net_device *netdev = to_net_dev(dev); struct net *net = dev_net(netdev); unsigned long new; int ret; if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) return -EPERM; ret = kstrtoul(buf, 0, &new); if (ret) return ret; netdev_lock(netdev); if (dev_isalive(netdev)) { ret = (*set)(netdev, new); if (ret == 0) ret = len; } netdev_unlock(netdev); return ret; } NETDEVICE_SHOW_RO(dev_id, fmt_hex); NETDEVICE_SHOW_RO(dev_port, fmt_dec); NETDEVICE_SHOW_RO(addr_assign_type, fmt_dec); NETDEVICE_SHOW_RO(addr_len, fmt_dec); NETDEVICE_SHOW_RO(ifindex, fmt_dec); NETDEVICE_SHOW_RO(type, fmt_dec); NETDEVICE_SHOW_RO(link_mode, fmt_dec); static ssize_t iflink_show(struct device *dev, struct device_attribute *attr, char *buf) { struct net_device *ndev = to_net_dev(dev); return sysfs_emit(buf, fmt_dec, dev_get_iflink(ndev)); } static DEVICE_ATTR_RO(iflink); static ssize_t format_name_assign_type(const struct net_device *dev, char *buf) { return sysfs_emit(buf, fmt_dec, READ_ONCE(dev->name_assign_type)); } static ssize_t name_assign_type_show(struct device *dev, struct device_attribute *attr, char *buf) { struct net_device *ndev = to_net_dev(dev); ssize_t ret = -EINVAL; if (READ_ONCE(ndev->name_assign_type) != NET_NAME_UNKNOWN) ret = netdev_show(dev, attr, buf, format_name_assign_type); return ret; } static DEVICE_ATTR_RO(name_assign_type); /* use same locking rules as GIFHWADDR ioctl's (netif_get_mac_address()) */ static ssize_t address_show(struct device *dev, struct device_attribute *attr, char *buf) { struct net_device *ndev = to_net_dev(dev); ssize_t ret = -EINVAL; down_read(&dev_addr_sem); rcu_read_lock(); if (dev_isalive(ndev)) ret = sysfs_format_mac(buf, ndev->dev_addr, ndev->addr_len); rcu_read_unlock(); up_read(&dev_addr_sem); return ret; } static DEVICE_ATTR_RO(address); static ssize_t broadcast_show(struct device *dev, struct device_attribute *attr, char *buf) { struct net_device *ndev = to_net_dev(dev); int ret = -EINVAL; rcu_read_lock(); if (dev_isalive(ndev)) ret = sysfs_format_mac(buf, ndev->broadcast, ndev->addr_len); rcu_read_unlock(); return ret; } static DEVICE_ATTR_RO(broadcast); static int change_carrier(struct net_device *dev, unsigned long new_carrier) { if (!netif_running(dev)) return -EINVAL; return dev_change_carrier(dev, (bool)new_carrier); } static ssize_t carrier_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t len) { struct net_device *netdev = to_net_dev(dev); /* The check is also done in change_carrier; this helps returning early * without hitting the locking section in netdev_store. */ if (!netdev->netdev_ops->ndo_change_carrier) return -EOPNOTSUPP; return netdev_store(dev, attr, buf, len, change_carrier); } static ssize_t carrier_show(struct device *dev, struct device_attribute *attr, char *buf) { struct net_device *netdev = to_net_dev(dev); int ret; ret = sysfs_rtnl_lock(&dev->kobj, &attr->attr, netdev); if (ret) return ret; ret = -EINVAL; if (netif_running(netdev)) { /* Synchronize carrier state with link watch, * see also rtnl_getlink(). */ linkwatch_sync_dev(netdev); ret = sysfs_emit(buf, fmt_dec, !!netif_carrier_ok(netdev)); } rtnl_unlock(); return ret; } static DEVICE_ATTR_RW(carrier); static ssize_t speed_show(struct device *dev, struct device_attribute *attr, char *buf) { struct net_device *netdev = to_net_dev(dev); int ret = -EINVAL; /* The check is also done in __ethtool_get_link_ksettings; this helps * returning early without hitting the locking section below. */ if (!netdev->ethtool_ops->get_link_ksettings) return ret; ret = sysfs_rtnl_lock(&dev->kobj, &attr->attr, netdev); if (ret) return ret; ret = -EINVAL; if (netif_running(netdev)) { struct ethtool_link_ksettings cmd; if (!__ethtool_get_link_ksettings(netdev, &cmd)) ret = sysfs_emit(buf, fmt_dec, cmd.base.speed); } rtnl_unlock(); return ret; } static DEVICE_ATTR_RO(speed); static ssize_t duplex_show(struct device *dev, struct device_attribute *attr, char *buf) { struct net_device *netdev = to_net_dev(dev); int ret = -EINVAL; /* The check is also done in __ethtool_get_link_ksettings; this helps * returning early without hitting the locking section below. */ if (!netdev->ethtool_ops->get_link_ksettings) return ret; ret = sysfs_rtnl_lock(&dev->kobj, &attr->attr, netdev); if (ret) return ret; ret = -EINVAL; if (netif_running(netdev)) { struct ethtool_link_ksettings cmd; if (!__ethtool_get_link_ksettings(netdev, &cmd)) { const char *duplex; switch (cmd.base.duplex) { case DUPLEX_HALF: duplex = "half"; break; case DUPLEX_FULL: duplex = "full"; break; default: duplex = "unknown"; break; } ret = sysfs_emit(buf, "%s\n", duplex); } } rtnl_unlock(); return ret; } static DEVICE_ATTR_RO(duplex); static ssize_t testing_show(struct device *dev, struct device_attribute *attr, char *buf) { struct net_device *netdev = to_net_dev(dev); if (netif_running(netdev)) return sysfs_emit(buf, fmt_dec, !!netif_testing(netdev)); return -EINVAL; } static DEVICE_ATTR_RO(testing); static ssize_t dormant_show(struct device *dev, struct device_attribute *attr, char *buf) { struct net_device *netdev = to_net_dev(dev); if (netif_running(netdev)) return sysfs_emit(buf, fmt_dec, !!netif_dormant(netdev)); return -EINVAL; } static DEVICE_ATTR_RO(dormant); static const char *const operstates[] = { "unknown", "notpresent", /* currently unused */ "down", "lowerlayerdown", "testing", "dormant", "up" }; static ssize_t operstate_show(struct device *dev, struct device_attribute *attr, char *buf) { const struct net_device *netdev = to_net_dev(dev); unsigned char operstate; operstate = READ_ONCE(netdev->operstate); if (!netif_running(netdev)) operstate = IF_OPER_DOWN; if (operstate >= ARRAY_SIZE(operstates)) return -EINVAL; /* should not happen */ return sysfs_emit(buf, "%s\n", operstates[operstate]); } static DEVICE_ATTR_RO(operstate); static ssize_t carrier_changes_show(struct device *dev, struct device_attribute *attr, char *buf) { struct net_device *netdev = to_net_dev(dev); return sysfs_emit(buf, fmt_dec, atomic_read(&netdev->carrier_up_count) + atomic_read(&netdev->carrier_down_count)); } static DEVICE_ATTR_RO(carrier_changes); static ssize_t carrier_up_count_show(struct device *dev, struct device_attribute *attr, char *buf) { struct net_device *netdev = to_net_dev(dev); return sysfs_emit(buf, fmt_dec, atomic_read(&netdev->carrier_up_count)); } static DEVICE_ATTR_RO(carrier_up_count); static ssize_t carrier_down_count_show(struct device *dev, struct device_attribute *attr, char *buf) { struct net_device *netdev = to_net_dev(dev); return sysfs_emit(buf, fmt_dec, atomic_read(&netdev->carrier_down_count)); } static DEVICE_ATTR_RO(carrier_down_count); /* read-write attributes */ static int change_mtu(struct net_device *dev, unsigned long new_mtu) { return dev_set_mtu(dev, (int)new_mtu); } static ssize_t mtu_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t len) { return netdev_store(dev, attr, buf, len, change_mtu); } NETDEVICE_SHOW_RW(mtu, fmt_dec); static int change_flags(struct net_device *dev, unsigned long new_flags) { return dev_change_flags(dev, (unsigned int)new_flags, NULL); } static ssize_t flags_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t len) { return netdev_store(dev, attr, buf, len, change_flags); } NETDEVICE_SHOW_RW(flags, fmt_hex); static ssize_t tx_queue_len_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t len) { if (!capable(CAP_NET_ADMIN)) return -EPERM; return netdev_store(dev, attr, buf, len, dev_change_tx_queue_len); } NETDEVICE_SHOW_RW(tx_queue_len, fmt_dec); static int change_gro_flush_timeout(struct net_device *dev, unsigned long val) { netdev_set_gro_flush_timeout(dev, val); return 0; } static ssize_t gro_flush_timeout_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t len) { if (!capable(CAP_NET_ADMIN)) return -EPERM; return netdev_lock_store(dev, attr, buf, len, change_gro_flush_timeout); } NETDEVICE_SHOW_RW(gro_flush_timeout, fmt_ulong); static int change_napi_defer_hard_irqs(struct net_device *dev, unsigned long val) { if (val > S32_MAX) return -ERANGE; netdev_set_defer_hard_irqs(dev, (u32)val); return 0; } static ssize_t napi_defer_hard_irqs_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t len) { if (!capable(CAP_NET_ADMIN)) return -EPERM; return netdev_lock_store(dev, attr, buf, len, change_napi_defer_hard_irqs); } NETDEVICE_SHOW_RW(napi_defer_hard_irqs, fmt_uint); static ssize_t ifalias_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t len) { struct net_device *netdev = to_net_dev(dev); struct net *net = dev_net(netdev); size_t count = len; ssize_t ret; if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) return -EPERM; /* ignore trailing newline */ if (len > 0 && buf[len - 1] == '\n') --count; ret = sysfs_rtnl_lock(&dev->kobj, &attr->attr, netdev); if (ret) return ret; ret = dev_set_alias(netdev, buf, count); if (ret < 0) goto err; ret = len; netdev_state_change(netdev); err: rtnl_unlock(); return ret; } static ssize_t ifalias_show(struct device *dev, struct device_attribute *attr, char *buf) { const struct net_device *netdev = to_net_dev(dev); char tmp[IFALIASZ]; ssize_t ret; ret = dev_get_alias(netdev, tmp, sizeof(tmp)); if (ret > 0) ret = sysfs_emit(buf, "%s\n", tmp); return ret; } static DEVICE_ATTR_RW(ifalias); static int change_group(struct net_device *dev, unsigned long new_group) { dev_set_group(dev, (int)new_group); return 0; } static ssize_t group_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t len) { return netdev_store(dev, attr, buf, len, change_group); } NETDEVICE_SHOW(group, fmt_dec); static DEVICE_ATTR(netdev_group, 0644, group_show, group_store); static int change_proto_down(struct net_device *dev, unsigned long proto_down) { return dev_change_proto_down(dev, (bool)proto_down); } static ssize_t proto_down_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t len) { return netdev_store(dev, attr, buf, len, change_proto_down); } NETDEVICE_SHOW_RW(proto_down, fmt_dec); static ssize_t phys_port_id_show(struct device *dev, struct device_attribute *attr, char *buf) { struct net_device *netdev = to_net_dev(dev); struct netdev_phys_item_id ppid; ssize_t ret; ret = sysfs_rtnl_lock(&dev->kobj, &attr->attr, netdev); if (ret) return ret; ret = dev_get_phys_port_id(netdev, &ppid); if (!ret) ret = sysfs_emit(buf, "%*phN\n", ppid.id_len, ppid.id); rtnl_unlock(); return ret; } static DEVICE_ATTR_RO(phys_port_id); static ssize_t phys_port_name_show(struct device *dev, struct device_attribute *attr, char *buf) { struct net_device *netdev = to_net_dev(dev); char name[IFNAMSIZ]; ssize_t ret; ret = sysfs_rtnl_lock(&dev->kobj, &attr->attr, netdev); if (ret) return ret; ret = dev_get_phys_port_name(netdev, name, sizeof(name)); if (!ret) ret = sysfs_emit(buf, "%s\n", name); rtnl_unlock(); return ret; } static DEVICE_ATTR_RO(phys_port_name); static ssize_t phys_switch_id_show(struct device *dev, struct device_attribute *attr, char *buf) { struct net_device *netdev = to_net_dev(dev); struct netdev_phys_item_id ppid = { }; ssize_t ret; ret = sysfs_rtnl_lock(&dev->kobj, &attr->attr, netdev); if (ret) return ret; ret = netif_get_port_parent_id(netdev, &ppid, false); if (!ret) ret = sysfs_emit(buf, "%*phN\n", ppid.id_len, ppid.id); rtnl_unlock(); return ret; } static DEVICE_ATTR_RO(phys_switch_id); static struct attribute *netdev_phys_attrs[] __ro_after_init = { &dev_attr_phys_port_id.attr, &dev_attr_phys_port_name.attr, &dev_attr_phys_switch_id.attr, NULL, }; static umode_t netdev_phys_is_visible(struct kobject *kobj, struct attribute *attr, int index) { struct device *dev = kobj_to_dev(kobj); struct net_device *netdev = to_net_dev(dev); if (attr == &dev_attr_phys_port_id.attr) { if (!netdev->netdev_ops->ndo_get_phys_port_id) return 0; } else if (attr == &dev_attr_phys_port_name.attr) { if (!netdev->netdev_ops->ndo_get_phys_port_name && !netdev->devlink_port) return 0; } else if (attr == &dev_attr_phys_switch_id.attr) { if (!netdev->netdev_ops->ndo_get_port_parent_id && !netdev->devlink_port) return 0; } return attr->mode; } static const struct attribute_group netdev_phys_group = { .attrs = netdev_phys_attrs, .is_visible = netdev_phys_is_visible, }; static ssize_t threaded_show(struct device *dev, struct device_attribute *attr, char *buf) { struct net_device *netdev = to_net_dev(dev); ssize_t ret = -EINVAL; rcu_read_lock(); if (dev_isalive(netdev)) ret = sysfs_emit(buf, fmt_dec, READ_ONCE(netdev->threaded)); rcu_read_unlock(); return ret; } static int modify_napi_threaded(struct net_device *dev, unsigned long val) { int ret; if (list_empty(&dev->napi_list)) return -EOPNOTSUPP; if (val != 0 && val != 1) return -EOPNOTSUPP; ret = netif_set_threaded(dev, val); return ret; } static ssize_t threaded_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t len) { return netdev_lock_store(dev, attr, buf, len, modify_napi_threaded); } static DEVICE_ATTR_RW(threaded); static struct attribute *net_class_attrs[] __ro_after_init = { &dev_attr_netdev_group.attr, &dev_attr_type.attr, &dev_attr_dev_id.attr, &dev_attr_dev_port.attr, &dev_attr_iflink.attr, &dev_attr_ifindex.attr, &dev_attr_name_assign_type.attr, &dev_attr_addr_assign_type.attr, &dev_attr_addr_len.attr, &dev_attr_link_mode.attr, &dev_attr_address.attr, &dev_attr_broadcast.attr, &dev_attr_speed.attr, &dev_attr_duplex.attr, &dev_attr_dormant.attr, &dev_attr_testing.attr, &dev_attr_operstate.attr, &dev_attr_carrier_changes.attr, &dev_attr_ifalias.attr, &dev_attr_carrier.attr, &dev_attr_mtu.attr, &dev_attr_flags.attr, &dev_attr_tx_queue_len.attr, &dev_attr_gro_flush_timeout.attr, &dev_attr_napi_defer_hard_irqs.attr, &dev_attr_proto_down.attr, &dev_attr_carrier_up_count.attr, &dev_attr_carrier_down_count.attr, &dev_attr_threaded.attr, NULL, }; ATTRIBUTE_GROUPS(net_class); /* Show a given an attribute in the statistics group */ static ssize_t netstat_show(const struct device *d, struct device_attribute *attr, char *buf, unsigned long offset) { struct net_device *dev = to_net_dev(d); ssize_t ret = -EINVAL; WARN_ON(offset > sizeof(struct rtnl_link_stats64) || offset % sizeof(u64) != 0); rcu_read_lock(); if (dev_isalive(dev)) { struct rtnl_link_stats64 temp; const struct rtnl_link_stats64 *stats = dev_get_stats(dev, &temp); ret = sysfs_emit(buf, fmt_u64, *(u64 *)(((u8 *)stats) + offset)); } rcu_read_unlock(); return ret; } /* generate a read-only statistics attribute */ #define NETSTAT_ENTRY(name) \ static ssize_t name##_show(struct device *d, \ struct device_attribute *attr, char *buf) \ { \ return netstat_show(d, attr, buf, \ offsetof(struct rtnl_link_stats64, name)); \ } \ static DEVICE_ATTR_RO(name) NETSTAT_ENTRY(rx_packets); NETSTAT_ENTRY(tx_packets); NETSTAT_ENTRY(rx_bytes); NETSTAT_ENTRY(tx_bytes); NETSTAT_ENTRY(rx_errors); NETSTAT_ENTRY(tx_errors); NETSTAT_ENTRY(rx_dropped); NETSTAT_ENTRY(tx_dropped); NETSTAT_ENTRY(multicast); NETSTAT_ENTRY(collisions); NETSTAT_ENTRY(rx_length_errors); NETSTAT_ENTRY(rx_over_errors); NETSTAT_ENTRY(rx_crc_errors); NETSTAT_ENTRY(rx_frame_errors); NETSTAT_ENTRY(rx_fifo_errors); NETSTAT_ENTRY(rx_missed_errors); NETSTAT_ENTRY(tx_aborted_errors); NETSTAT_ENTRY(tx_carrier_errors); NETSTAT_ENTRY(tx_fifo_errors); NETSTAT_ENTRY(tx_heartbeat_errors); NETSTAT_ENTRY(tx_window_errors); NETSTAT_ENTRY(rx_compressed); NETSTAT_ENTRY(tx_compressed); NETSTAT_ENTRY(rx_nohandler); static struct attribute *netstat_attrs[] __ro_after_init = { &dev_attr_rx_packets.attr, &dev_attr_tx_packets.attr, &dev_attr_rx_bytes.attr, &dev_attr_tx_bytes.attr, &dev_attr_rx_errors.attr, &dev_attr_tx_errors.attr, &dev_attr_rx_dropped.attr, &dev_attr_tx_dropped.attr, &dev_attr_multicast.attr, &dev_attr_collisions.attr, &dev_attr_rx_length_errors.attr, &dev_attr_rx_over_errors.attr, &dev_attr_rx_crc_errors.attr, &dev_attr_rx_frame_errors.attr, &dev_attr_rx_fifo_errors.attr, &dev_attr_rx_missed_errors.attr, &dev_attr_tx_aborted_errors.attr, &dev_attr_tx_carrier_errors.attr, &dev_attr_tx_fifo_errors.attr, &dev_attr_tx_heartbeat_errors.attr, &dev_attr_tx_window_errors.attr, &dev_attr_rx_compressed.attr, &dev_attr_tx_compressed.attr, &dev_attr_rx_nohandler.attr, NULL }; static const struct attribute_group netstat_group = { .name = "statistics", .attrs = netstat_attrs, }; static struct attribute *wireless_attrs[] = { NULL }; static const struct attribute_group wireless_group = { .name = "wireless", .attrs = wireless_attrs, }; static bool wireless_group_needed(struct net_device *ndev) { #if IS_ENABLED(CONFIG_CFG80211) if (ndev->ieee80211_ptr) return true; #endif #if IS_ENABLED(CONFIG_WIRELESS_EXT) if (ndev->wireless_handlers) return true; #endif return false; } #else /* CONFIG_SYSFS */ #define net_class_groups NULL #endif /* CONFIG_SYSFS */ #ifdef CONFIG_SYSFS #define to_rx_queue_attr(_attr) \ container_of(_attr, struct rx_queue_attribute, attr) #define to_rx_queue(obj) container_of(obj, struct netdev_rx_queue, kobj) static ssize_t rx_queue_attr_show(struct kobject *kobj, struct attribute *attr, char *buf) { const struct rx_queue_attribute *attribute = to_rx_queue_attr(attr); struct netdev_rx_queue *queue = to_rx_queue(kobj); if (!attribute->show) return -EIO; return attribute->show(queue, buf); } static ssize_t rx_queue_attr_store(struct kobject *kobj, struct attribute *attr, const char *buf, size_t count) { const struct rx_queue_attribute *attribute = to_rx_queue_attr(attr); struct netdev_rx_queue *queue = to_rx_queue(kobj); if (!attribute->store) return -EIO; return attribute->store(queue, buf, count); } static const struct sysfs_ops rx_queue_sysfs_ops = { .show = rx_queue_attr_show, .store = rx_queue_attr_store, }; #ifdef CONFIG_RPS static ssize_t show_rps_map(struct netdev_rx_queue *queue, char *buf) { struct rps_map *map; cpumask_var_t mask; int i, len; if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) return -ENOMEM; rcu_read_lock(); map = rcu_dereference(queue->rps_map); if (map) for (i = 0; i < map->len; i++) cpumask_set_cpu(map->cpus[i], mask); len = sysfs_emit(buf, "%*pb\n", cpumask_pr_args(mask)); rcu_read_unlock(); free_cpumask_var(mask); return len < PAGE_SIZE ? len : -EINVAL; } static int netdev_rx_queue_set_rps_mask(struct netdev_rx_queue *queue, cpumask_var_t mask) { static DEFINE_MUTEX(rps_map_mutex); struct rps_map *old_map, *map; int cpu, i; map = kzalloc(max_t(unsigned int, RPS_MAP_SIZE(cpumask_weight(mask)), L1_CACHE_BYTES), GFP_KERNEL); if (!map) return -ENOMEM; i = 0; for_each_cpu_and(cpu, mask, cpu_online_mask) map->cpus[i++] = cpu; if (i) { map->len = i; } else { kfree(map); map = NULL; } mutex_lock(&rps_map_mutex); old_map = rcu_dereference_protected(queue->rps_map, mutex_is_locked(&rps_map_mutex)); rcu_assign_pointer(queue->rps_map, map); if (map) static_branch_inc(&rps_needed); if (old_map) static_branch_dec(&rps_needed); mutex_unlock(&rps_map_mutex); if (old_map) kfree_rcu(old_map, rcu); return 0; } int rps_cpumask_housekeeping(struct cpumask *mask) { if (!cpumask_empty(mask)) { cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_DOMAIN)); cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_WQ)); if (cpumask_empty(mask)) return -EINVAL; } return 0; } static ssize_t store_rps_map(struct netdev_rx_queue *queue, const char *buf, size_t len) { cpumask_var_t mask; int err; if (!capable(CAP_NET_ADMIN)) return -EPERM; if (!alloc_cpumask_var(&mask, GFP_KERNEL)) return -ENOMEM; err = bitmap_parse(buf, len, cpumask_bits(mask), nr_cpumask_bits); if (err) goto out; err = rps_cpumask_housekeeping(mask); if (err) goto out; err = netdev_rx_queue_set_rps_mask(queue, mask); out: free_cpumask_var(mask); return err ? : len; } static ssize_t show_rps_dev_flow_table_cnt(struct netdev_rx_queue *queue, char *buf) { struct rps_dev_flow_table *flow_table; unsigned long val = 0; rcu_read_lock(); flow_table = rcu_dereference(queue->rps_flow_table); if (flow_table) val = 1UL << flow_table->log; rcu_read_unlock(); return sysfs_emit(buf, "%lu\n", val); } static void rps_dev_flow_table_release(struct rcu_head *rcu) { struct rps_dev_flow_table *table = container_of(rcu, struct rps_dev_flow_table, rcu); vfree(table); } static ssize_t store_rps_dev_flow_table_cnt(struct netdev_rx_queue *queue, const char *buf, size_t len) { unsigned long mask, count; struct rps_dev_flow_table *table, *old_table; static DEFINE_SPINLOCK(rps_dev_flow_lock); int rc; if (!capable(CAP_NET_ADMIN)) return -EPERM; rc = kstrtoul(buf, 0, &count); if (rc < 0) return rc; if (count) { mask = count - 1; /* mask = roundup_pow_of_two(count) - 1; * without overflows... */ while ((mask | (mask >> 1)) != mask) mask |= (mask >> 1); /* On 64 bit arches, must check mask fits in table->mask (u32), * and on 32bit arches, must check * RPS_DEV_FLOW_TABLE_SIZE(mask + 1) doesn't overflow. */ #if BITS_PER_LONG > 32 if (mask > (unsigned long)(u32)mask) return -EINVAL; #else if (mask > (ULONG_MAX - RPS_DEV_FLOW_TABLE_SIZE(1)) / sizeof(struct rps_dev_flow)) { /* Enforce a limit to prevent overflow */ return -EINVAL; } #endif table = vmalloc(RPS_DEV_FLOW_TABLE_SIZE(mask + 1)); if (!table) return -ENOMEM; table->log = ilog2(mask) + 1; for (count = 0; count <= mask; count++) { table->flows[count].cpu = RPS_NO_CPU; table->flows[count].filter = RPS_NO_FILTER; } } else { table = NULL; } spin_lock(&rps_dev_flow_lock); old_table = rcu_dereference_protected(queue->rps_flow_table, lockdep_is_held(&rps_dev_flow_lock)); rcu_assign_pointer(queue->rps_flow_table, table); spin_unlock(&rps_dev_flow_lock); if (old_table) call_rcu(&old_table->rcu, rps_dev_flow_table_release); return len; } static struct rx_queue_attribute rps_cpus_attribute __ro_after_init = __ATTR(rps_cpus, 0644, show_rps_map, store_rps_map); static struct rx_queue_attribute rps_dev_flow_table_cnt_attribute __ro_after_init = __ATTR(rps_flow_cnt, 0644, show_rps_dev_flow_table_cnt, store_rps_dev_flow_table_cnt); #endif /* CONFIG_RPS */ static struct attribute *rx_queue_default_attrs[] __ro_after_init = { #ifdef CONFIG_RPS &rps_cpus_attribute.attr, &rps_dev_flow_table_cnt_attribute.attr, #endif NULL }; ATTRIBUTE_GROUPS(rx_queue_default); static void rx_queue_release(struct kobject *kobj) { struct netdev_rx_queue *queue = to_rx_queue(kobj); #ifdef CONFIG_RPS struct rps_map *map; struct rps_dev_flow_table *flow_table; map = rcu_dereference_protected(queue->rps_map, 1); if (map) { RCU_INIT_POINTER(queue->rps_map, NULL); kfree_rcu(map, rcu); } flow_table = rcu_dereference_protected(queue->rps_flow_table, 1); if (flow_table) { RCU_INIT_POINTER(queue->rps_flow_table, NULL); call_rcu(&flow_table->rcu, rps_dev_flow_table_release); } #endif memset(kobj, 0, sizeof(*kobj)); netdev_put(queue->dev, &queue->dev_tracker); } static const void *rx_queue_namespace(const struct kobject *kobj) { struct netdev_rx_queue *queue = to_rx_queue(kobj); struct device *dev = &queue->dev->dev; const void *ns = NULL; if (dev->class && dev->class->namespace) ns = dev->class->namespace(dev); return ns; } static void rx_queue_get_ownership(const struct kobject *kobj, kuid_t *uid, kgid_t *gid) { const struct net *net = rx_queue_namespace(kobj); net_ns_get_ownership(net, uid, gid); } static const struct kobj_type rx_queue_ktype = { .sysfs_ops = &rx_queue_sysfs_ops, .release = rx_queue_release, .namespace = rx_queue_namespace, .get_ownership = rx_queue_get_ownership, }; static int rx_queue_default_mask(struct net_device *dev, struct netdev_rx_queue *queue) { #if IS_ENABLED(CONFIG_RPS) && IS_ENABLED(CONFIG_SYSCTL) struct cpumask *rps_default_mask; int res = 0; mutex_lock(&rps_default_mask_mutex); rps_default_mask = dev_net(dev)->core.rps_default_mask; if (rps_default_mask && !cpumask_empty(rps_default_mask)) res = netdev_rx_queue_set_rps_mask(queue, rps_default_mask); mutex_unlock(&rps_default_mask_mutex); return res; #else return 0; #endif } static int rx_queue_add_kobject(struct net_device *dev, int index) { struct netdev_rx_queue *queue = dev->_rx + index; struct kobject *kobj = &queue->kobj; int error = 0; /* Rx queues are cleared in rx_queue_release to allow later * re-registration. This is triggered when their kobj refcount is * dropped. * * If a queue is removed while both a read (or write) operation and a * the re-addition of the same queue are pending (waiting on rntl_lock) * it might happen that the re-addition will execute before the read, * making the initial removal to never happen (queue's kobj refcount * won't drop enough because of the pending read). In such rare case, * return to allow the removal operation to complete. */ if (unlikely(kobj->state_initialized)) { netdev_warn_once(dev, "Cannot re-add rx queues before their removal completed"); return -EAGAIN; } /* Kobject_put later will trigger rx_queue_release call which * decreases dev refcount: Take that reference here */ netdev_hold(queue->dev, &queue->dev_tracker, GFP_KERNEL); kobj->kset = dev->queues_kset; error = kobject_init_and_add(kobj, &rx_queue_ktype, NULL, "rx-%u", index); if (error) goto err; queue->groups = rx_queue_default_groups; error = sysfs_create_groups(kobj, queue->groups); if (error) goto err; if (dev->sysfs_rx_queue_group) { error = sysfs_create_group(kobj, dev->sysfs_rx_queue_group); if (error) goto err_default_groups; } error = rx_queue_default_mask(dev, queue); if (error) goto err_default_groups; kobject_uevent(kobj, KOBJ_ADD); return error; err_default_groups: sysfs_remove_groups(kobj, queue->groups); err: kobject_put(kobj); return error; } static int rx_queue_change_owner(struct net_device *dev, int index, kuid_t kuid, kgid_t kgid) { struct netdev_rx_queue *queue = dev->_rx + index; struct kobject *kobj = &queue->kobj; int error; error = sysfs_change_owner(kobj, kuid, kgid); if (error) return error; if (dev->sysfs_rx_queue_group) error = sysfs_group_change_owner( kobj, dev->sysfs_rx_queue_group, kuid, kgid); return error; } #endif /* CONFIG_SYSFS */ int net_rx_queue_update_kobjects(struct net_device *dev, int old_num, int new_num) { #ifdef CONFIG_SYSFS int i; int error = 0; #ifndef CONFIG_RPS if (!dev->sysfs_rx_queue_group) return 0; #endif for (i = old_num; i < new_num; i++) { error = rx_queue_add_kobject(dev, i); if (error) { new_num = old_num; break; } } while (--i >= new_num) { struct netdev_rx_queue *queue = &dev->_rx[i]; struct kobject *kobj = &queue->kobj; if (!check_net(dev_net(dev))) kobj->uevent_suppress = 1; if (dev->sysfs_rx_queue_group) sysfs_remove_group(kobj, dev->sysfs_rx_queue_group); sysfs_remove_groups(kobj, queue->groups); kobject_put(kobj); } return error; #else return 0; #endif } static int net_rx_queue_change_owner(struct net_device *dev, int num, kuid_t kuid, kgid_t kgid) { #ifdef CONFIG_SYSFS int error = 0; int i; #ifndef CONFIG_RPS if (!dev->sysfs_rx_queue_group) return 0; #endif for (i = 0; i < num; i++) { error = rx_queue_change_owner(dev, i, kuid, kgid); if (error) break; } return error; #else return 0; #endif } #ifdef CONFIG_SYSFS /* * netdev_queue sysfs structures and functions. */ struct netdev_queue_attribute { struct attribute attr; ssize_t (*show)(struct kobject *kobj, struct attribute *attr, struct netdev_queue *queue, char *buf); ssize_t (*store)(struct kobject *kobj, struct attribute *attr, struct netdev_queue *queue, const char *buf, size_t len); }; #define to_netdev_queue_attr(_attr) \ container_of(_attr, struct netdev_queue_attribute, attr) #define to_netdev_queue(obj) container_of(obj, struct netdev_queue, kobj) static ssize_t netdev_queue_attr_show(struct kobject *kobj, struct attribute *attr, char *buf) { const struct netdev_queue_attribute *attribute = to_netdev_queue_attr(attr); struct netdev_queue *queue = to_netdev_queue(kobj); if (!attribute->show) return -EIO; return attribute->show(kobj, attr, queue, buf); } static ssize_t netdev_queue_attr_store(struct kobject *kobj, struct attribute *attr, const char *buf, size_t count) { const struct netdev_queue_attribute *attribute = to_netdev_queue_attr(attr); struct netdev_queue *queue = to_netdev_queue(kobj); if (!attribute->store) return -EIO; return attribute->store(kobj, attr, queue, buf, count); } static const struct sysfs_ops netdev_queue_sysfs_ops = { .show = netdev_queue_attr_show, .store = netdev_queue_attr_store, }; static ssize_t tx_timeout_show(struct kobject *kobj, struct attribute *attr, struct netdev_queue *queue, char *buf) { unsigned long trans_timeout = atomic_long_read(&queue->trans_timeout); return sysfs_emit(buf, fmt_ulong, trans_timeout); } static unsigned int get_netdev_queue_index(struct netdev_queue *queue) { struct net_device *dev = queue->dev; unsigned int i; i = queue - dev->_tx; BUG_ON(i >= dev->num_tx_queues); return i; } static ssize_t traffic_class_show(struct kobject *kobj, struct attribute *attr, struct netdev_queue *queue, char *buf) { struct net_device *dev = queue->dev; int num_tc, tc, index, ret; if (!netif_is_multiqueue(dev)) return -ENOENT; ret = sysfs_rtnl_lock(kobj, attr, queue->dev); if (ret) return ret; index = get_netdev_queue_index(queue); /* If queue belongs to subordinate dev use its TC mapping */ dev = netdev_get_tx_queue(dev, index)->sb_dev ? : dev; num_tc = dev->num_tc; tc = netdev_txq_to_tc(dev, index); rtnl_unlock(); if (tc < 0) return -EINVAL; /* We can report the traffic class one of two ways: * Subordinate device traffic classes are reported with the traffic * class first, and then the subordinate class so for example TC0 on * subordinate device 2 will be reported as "0-2". If the queue * belongs to the root device it will be reported with just the * traffic class, so just "0" for TC 0 for example. */ return num_tc < 0 ? sysfs_emit(buf, "%d%d\n", tc, num_tc) : sysfs_emit(buf, "%d\n", tc); } #ifdef CONFIG_XPS static ssize_t tx_maxrate_show(struct kobject *kobj, struct attribute *attr, struct netdev_queue *queue, char *buf) { return sysfs_emit(buf, "%lu\n", queue->tx_maxrate); } static ssize_t tx_maxrate_store(struct kobject *kobj, struct attribute *attr, struct netdev_queue *queue, const char *buf, size_t len) { int err, index = get_netdev_queue_index(queue); struct net_device *dev = queue->dev; u32 rate = 0; if (!capable(CAP_NET_ADMIN)) return -EPERM; /* The check is also done later; this helps returning early without * hitting the locking section below. */ if (!dev->netdev_ops->ndo_set_tx_maxrate) return -EOPNOTSUPP; err = kstrtou32(buf, 10, &rate); if (err < 0) return err; err = sysfs_rtnl_lock(kobj, attr, dev); if (err) return err; err = -EOPNOTSUPP; netdev_lock_ops(dev); if (dev->netdev_ops->ndo_set_tx_maxrate) err = dev->netdev_ops->ndo_set_tx_maxrate(dev, index, rate); netdev_unlock_ops(dev); if (!err) { queue->tx_maxrate = rate; rtnl_unlock(); return len; } rtnl_unlock(); return err; } static struct netdev_queue_attribute queue_tx_maxrate __ro_after_init = __ATTR_RW(tx_maxrate); #endif static struct netdev_queue_attribute queue_trans_timeout __ro_after_init = __ATTR_RO(tx_timeout); static struct netdev_queue_attribute queue_traffic_class __ro_after_init = __ATTR_RO(traffic_class); #ifdef CONFIG_BQL /* * Byte queue limits sysfs structures and functions. */ static ssize_t bql_show(char *buf, unsigned int value) { return sysfs_emit(buf, "%u\n", value); } static ssize_t bql_set(const char *buf, const size_t count, unsigned int *pvalue) { unsigned int value; int err; if (!strcmp(buf, "max") || !strcmp(buf, "max\n")) { value = DQL_MAX_LIMIT; } else { err = kstrtouint(buf, 10, &value); if (err < 0) return err; if (value > DQL_MAX_LIMIT) return -EINVAL; } *pvalue = value; return count; } static ssize_t bql_show_hold_time(struct kobject *kobj, struct attribute *attr, struct netdev_queue *queue, char *buf) { struct dql *dql = &queue->dql; return sysfs_emit(buf, "%u\n", jiffies_to_msecs(dql->slack_hold_time)); } static ssize_t bql_set_hold_time(struct kobject *kobj, struct attribute *attr, struct netdev_queue *queue, const char *buf, size_t len) { struct dql *dql = &queue->dql; unsigned int value; int err; err = kstrtouint(buf, 10, &value); if (err < 0) return err; dql->slack_hold_time = msecs_to_jiffies(value); return len; } static struct netdev_queue_attribute bql_hold_time_attribute __ro_after_init = __ATTR(hold_time, 0644, bql_show_hold_time, bql_set_hold_time); static ssize_t bql_show_stall_thrs(struct kobject *kobj, struct attribute *attr, struct netdev_queue *queue, char *buf) { struct dql *dql = &queue->dql; return sysfs_emit(buf, "%u\n", jiffies_to_msecs(dql->stall_thrs)); } static ssize_t bql_set_stall_thrs(struct kobject *kobj, struct attribute *attr, struct netdev_queue *queue, const char *buf, size_t len) { struct dql *dql = &queue->dql; unsigned int value; int err; err = kstrtouint(buf, 10, &value); if (err < 0) return err; value = msecs_to_jiffies(value); if (value && (value < 4 || value > 4 / 2 * BITS_PER_LONG)) return -ERANGE; if (!dql->stall_thrs && value) dql->last_reap = jiffies; /* Force last_reap to be live */ smp_wmb(); dql->stall_thrs = value; return len; } static struct netdev_queue_attribute bql_stall_thrs_attribute __ro_after_init = __ATTR(stall_thrs, 0644, bql_show_stall_thrs, bql_set_stall_thrs); static ssize_t bql_show_stall_max(struct kobject *kobj, struct attribute *attr, struct netdev_queue *queue, char *buf) { return sysfs_emit(buf, "%u\n", READ_ONCE(queue->dql.stall_max)); } static ssize_t bql_set_stall_max(struct kobject *kobj, struct attribute *attr, struct netdev_queue *queue, const char *buf, size_t len) { WRITE_ONCE(queue->dql.stall_max, 0); return len; } static struct netdev_queue_attribute bql_stall_max_attribute __ro_after_init = __ATTR(stall_max, 0644, bql_show_stall_max, bql_set_stall_max); static ssize_t bql_show_stall_cnt(struct kobject *kobj, struct attribute *attr, struct netdev_queue *queue, char *buf) { struct dql *dql = &queue->dql; return sysfs_emit(buf, "%lu\n", dql->stall_cnt); } static struct netdev_queue_attribute bql_stall_cnt_attribute __ro_after_init = __ATTR(stall_cnt, 0444, bql_show_stall_cnt, NULL); static ssize_t bql_show_inflight(struct kobject *kobj, struct attribute *attr, struct netdev_queue *queue, char *buf) { struct dql *dql = &queue->dql; return sysfs_emit(buf, "%u\n", dql->num_queued - dql->num_completed); } static struct netdev_queue_attribute bql_inflight_attribute __ro_after_init = __ATTR(inflight, 0444, bql_show_inflight, NULL); #define BQL_ATTR(NAME, FIELD) \ static ssize_t bql_show_ ## NAME(struct kobject *kobj, \ struct attribute *attr, \ struct netdev_queue *queue, char *buf) \ { \ return bql_show(buf, queue->dql.FIELD); \ } \ \ static ssize_t bql_set_ ## NAME(struct kobject *kobj, \ struct attribute *attr, \ struct netdev_queue *queue, \ const char *buf, size_t len) \ { \ return bql_set(buf, len, &queue->dql.FIELD); \ } \ \ static struct netdev_queue_attribute bql_ ## NAME ## _attribute __ro_after_init \ = __ATTR(NAME, 0644, \ bql_show_ ## NAME, bql_set_ ## NAME) BQL_ATTR(limit, limit); BQL_ATTR(limit_max, max_limit); BQL_ATTR(limit_min, min_limit); static struct attribute *dql_attrs[] __ro_after_init = { &bql_limit_attribute.attr, &bql_limit_max_attribute.attr, &bql_limit_min_attribute.attr, &bql_hold_time_attribute.attr, &bql_inflight_attribute.attr, &bql_stall_thrs_attribute.attr, &bql_stall_cnt_attribute.attr, &bql_stall_max_attribute.attr, NULL }; static const struct attribute_group dql_group = { .name = "byte_queue_limits", .attrs = dql_attrs, }; #else /* Fake declaration, all the code using it should be dead */ static const struct attribute_group dql_group = {}; #endif /* CONFIG_BQL */ #ifdef CONFIG_XPS static ssize_t xps_queue_show(struct net_device *dev, unsigned int index, int tc, char *buf, enum xps_map_type type) { struct xps_dev_maps *dev_maps; unsigned long *mask; unsigned int nr_ids; int j, len; rcu_read_lock(); dev_maps = rcu_dereference(dev->xps_maps[type]); /* Default to nr_cpu_ids/dev->num_rx_queues and do not just return 0 * when dev_maps hasn't been allocated yet, to be backward compatible. */ nr_ids = dev_maps ? dev_maps->nr_ids : (type == XPS_CPUS ? nr_cpu_ids : dev->num_rx_queues); mask = bitmap_zalloc(nr_ids, GFP_NOWAIT); if (!mask) { rcu_read_unlock(); return -ENOMEM; } if (!dev_maps || tc >= dev_maps->num_tc) goto out_no_maps; for (j = 0; j < nr_ids; j++) { int i, tci = j * dev_maps->num_tc + tc; struct xps_map *map; map = rcu_dereference(dev_maps->attr_map[tci]); if (!map) continue; for (i = map->len; i--;) { if (map->queues[i] == index) { __set_bit(j, mask); break; } } } out_no_maps: rcu_read_unlock(); len = bitmap_print_to_pagebuf(false, buf, mask, nr_ids); bitmap_free(mask); return len < PAGE_SIZE ? len : -EINVAL; } static ssize_t xps_cpus_show(struct kobject *kobj, struct attribute *attr, struct netdev_queue *queue, char *buf) { struct net_device *dev = queue->dev; unsigned int index; int len, tc, ret; if (!netif_is_multiqueue(dev)) return -ENOENT; index = get_netdev_queue_index(queue); ret = sysfs_rtnl_lock(kobj, attr, queue->dev); if (ret) return ret; /* If queue belongs to subordinate dev use its map */ dev = netdev_get_tx_queue(dev, index)->sb_dev ? : dev; tc = netdev_txq_to_tc(dev, index); if (tc < 0) { rtnl_unlock(); return -EINVAL; } /* Increase the net device refcnt to make sure it won't be freed while * xps_queue_show is running. */ dev_hold(dev); rtnl_unlock(); len = xps_queue_show(dev, index, tc, buf, XPS_CPUS); dev_put(dev); return len; } static ssize_t xps_cpus_store(struct kobject *kobj, struct attribute *attr, struct netdev_queue *queue, const char *buf, size_t len) { struct net_device *dev = queue->dev; unsigned int index; cpumask_var_t mask; int err; if (!netif_is_multiqueue(dev)) return -ENOENT; if (!capable(CAP_NET_ADMIN)) return -EPERM; if (!alloc_cpumask_var(&mask, GFP_KERNEL)) return -ENOMEM; index = get_netdev_queue_index(queue); err = bitmap_parse(buf, len, cpumask_bits(mask), nr_cpumask_bits); if (err) { free_cpumask_var(mask); return err; } err = sysfs_rtnl_lock(kobj, attr, dev); if (err) { free_cpumask_var(mask); return err; } err = netif_set_xps_queue(dev, mask, index); rtnl_unlock(); free_cpumask_var(mask); return err ? : len; } static struct netdev_queue_attribute xps_cpus_attribute __ro_after_init = __ATTR_RW(xps_cpus); static ssize_t xps_rxqs_show(struct kobject *kobj, struct attribute *attr, struct netdev_queue *queue, char *buf) { struct net_device *dev = queue->dev; unsigned int index; int tc, ret; index = get_netdev_queue_index(queue); ret = sysfs_rtnl_lock(kobj, attr, dev); if (ret) return ret; tc = netdev_txq_to_tc(dev, index); /* Increase the net device refcnt to make sure it won't be freed while * xps_queue_show is running. */ dev_hold(dev); rtnl_unlock(); ret = tc >= 0 ? xps_queue_show(dev, index, tc, buf, XPS_RXQS) : -EINVAL; dev_put(dev); return ret; } static ssize_t xps_rxqs_store(struct kobject *kobj, struct attribute *attr, struct netdev_queue *queue, const char *buf, size_t len) { struct net_device *dev = queue->dev; struct net *net = dev_net(dev); unsigned long *mask; unsigned int index; int err; if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) return -EPERM; mask = bitmap_zalloc(dev->num_rx_queues, GFP_KERNEL); if (!mask) return -ENOMEM; index = get_netdev_queue_index(queue); err = bitmap_parse(buf, len, mask, dev->num_rx_queues); if (err) { bitmap_free(mask); return err; } err = sysfs_rtnl_lock(kobj, attr, dev); if (err) { bitmap_free(mask); return err; } cpus_read_lock(); err = __netif_set_xps_queue(dev, mask, index, XPS_RXQS); cpus_read_unlock(); rtnl_unlock(); bitmap_free(mask); return err ? : len; } static struct netdev_queue_attribute xps_rxqs_attribute __ro_after_init = __ATTR_RW(xps_rxqs); #endif /* CONFIG_XPS */ static struct attribute *netdev_queue_default_attrs[] __ro_after_init = { &queue_trans_timeout.attr, &queue_traffic_class.attr, #ifdef CONFIG_XPS &xps_cpus_attribute.attr, &xps_rxqs_attribute.attr, &queue_tx_maxrate.attr, #endif NULL }; ATTRIBUTE_GROUPS(netdev_queue_default); static void netdev_queue_release(struct kobject *kobj) { struct netdev_queue *queue = to_netdev_queue(kobj); memset(kobj, 0, sizeof(*kobj)); netdev_put(queue->dev, &queue->dev_tracker); } static const void *netdev_queue_namespace(const struct kobject *kobj) { struct netdev_queue *queue = to_netdev_queue(kobj); struct device *dev = &queue->dev->dev; const void *ns = NULL; if (dev->class && dev->class->namespace) ns = dev->class->namespace(dev); return ns; } static void netdev_queue_get_ownership(const struct kobject *kobj, kuid_t *uid, kgid_t *gid) { const struct net *net = netdev_queue_namespace(kobj); net_ns_get_ownership(net, uid, gid); } static const struct kobj_type netdev_queue_ktype = { .sysfs_ops = &netdev_queue_sysfs_ops, .release = netdev_queue_release, .namespace = netdev_queue_namespace, .get_ownership = netdev_queue_get_ownership, }; static bool netdev_uses_bql(const struct net_device *dev) { if (dev->lltx || (dev->priv_flags & IFF_NO_QUEUE)) return false; return IS_ENABLED(CONFIG_BQL); } static int netdev_queue_add_kobject(struct net_device *dev, int index) { struct netdev_queue *queue = dev->_tx + index; struct kobject *kobj = &queue->kobj; int error = 0; /* Tx queues are cleared in netdev_queue_release to allow later * re-registration. This is triggered when their kobj refcount is * dropped. * * If a queue is removed while both a read (or write) operation and a * the re-addition of the same queue are pending (waiting on rntl_lock) * it might happen that the re-addition will execute before the read, * making the initial removal to never happen (queue's kobj refcount * won't drop enough because of the pending read). In such rare case, * return to allow the removal operation to complete. */ if (unlikely(kobj->state_initialized)) { netdev_warn_once(dev, "Cannot re-add tx queues before their removal completed"); return -EAGAIN; } /* Kobject_put later will trigger netdev_queue_release call * which decreases dev refcount: Take that reference here */ netdev_hold(queue->dev, &queue->dev_tracker, GFP_KERNEL); kobj->kset = dev->queues_kset; error = kobject_init_and_add(kobj, &netdev_queue_ktype, NULL, "tx-%u", index); if (error) goto err; queue->groups = netdev_queue_default_groups; error = sysfs_create_groups(kobj, queue->groups); if (error) goto err; if (netdev_uses_bql(dev)) { error = sysfs_create_group(kobj, &dql_group); if (error) goto err_default_groups; } kobject_uevent(kobj, KOBJ_ADD); return 0; err_default_groups: sysfs_remove_groups(kobj, queue->groups); err: kobject_put(kobj); return error; } static int tx_queue_change_owner(struct net_device *ndev, int index, kuid_t kuid, kgid_t kgid) { struct netdev_queue *queue = ndev->_tx + index; struct kobject *kobj = &queue->kobj; int error; error = sysfs_change_owner(kobj, kuid, kgid); if (error) return error; if (netdev_uses_bql(ndev)) error = sysfs_group_change_owner(kobj, &dql_group, kuid, kgid); return error; } #endif /* CONFIG_SYSFS */ int netdev_queue_update_kobjects(struct net_device *dev, int old_num, int new_num) { #ifdef CONFIG_SYSFS int i; int error = 0; /* Tx queue kobjects are allowed to be updated when a device is being * unregistered, but solely to remove queues from qdiscs. Any path * adding queues should be fixed. */ WARN(dev->reg_state == NETREG_UNREGISTERING && new_num > old_num, "New queues can't be registered after device unregistration."); for (i = old_num; i < new_num; i++) { error = netdev_queue_add_kobject(dev, i); if (error) { new_num = old_num; break; } } while (--i >= new_num) { struct netdev_queue *queue = dev->_tx + i; if (!check_net(dev_net(dev))) queue->kobj.uevent_suppress = 1; if (netdev_uses_bql(dev)) sysfs_remove_group(&queue->kobj, &dql_group); sysfs_remove_groups(&queue->kobj, queue->groups); kobject_put(&queue->kobj); } return error; #else return 0; #endif /* CONFIG_SYSFS */ } static int net_tx_queue_change_owner(struct net_device *dev, int num, kuid_t kuid, kgid_t kgid) { #ifdef CONFIG_SYSFS int error = 0; int i; for (i = 0; i < num; i++) { error = tx_queue_change_owner(dev, i, kuid, kgid); if (error) break; } return error; #else return 0; #endif /* CONFIG_SYSFS */ } static int register_queue_kobjects(struct net_device *dev) { int error = 0, txq = 0, rxq = 0, real_rx = 0, real_tx = 0; #ifdef CONFIG_SYSFS dev->queues_kset = kset_create_and_add("queues", NULL, &dev->dev.kobj); if (!dev->queues_kset) return -ENOMEM; real_rx = dev->real_num_rx_queues; #endif real_tx = dev->real_num_tx_queues; error = net_rx_queue_update_kobjects(dev, 0, real_rx); if (error) goto error; rxq = real_rx; error = netdev_queue_update_kobjects(dev, 0, real_tx); if (error) goto error; txq = real_tx; return 0; error: netdev_queue_update_kobjects(dev, txq, 0); net_rx_queue_update_kobjects(dev, rxq, 0); #ifdef CONFIG_SYSFS kset_unregister(dev->queues_kset); #endif return error; } static int queue_change_owner(struct net_device *ndev, kuid_t kuid, kgid_t kgid) { int error = 0, real_rx = 0, real_tx = 0; #ifdef CONFIG_SYSFS if (ndev->queues_kset) { error = sysfs_change_owner(&ndev->queues_kset->kobj, kuid, kgid); if (error) return error; } real_rx = ndev->real_num_rx_queues; #endif real_tx = ndev->real_num_tx_queues; error = net_rx_queue_change_owner(ndev, real_rx, kuid, kgid); if (error) return error; error = net_tx_queue_change_owner(ndev, real_tx, kuid, kgid); if (error) return error; return 0; } static void remove_queue_kobjects(struct net_device *dev) { int real_rx = 0, real_tx = 0; #ifdef CONFIG_SYSFS real_rx = dev->real_num_rx_queues; #endif real_tx = dev->real_num_tx_queues; net_rx_queue_update_kobjects(dev, real_rx, 0); netdev_queue_update_kobjects(dev, real_tx, 0); netdev_lock_ops(dev); dev->real_num_rx_queues = 0; dev->real_num_tx_queues = 0; netdev_unlock_ops(dev); #ifdef CONFIG_SYSFS kset_unregister(dev->queues_kset); #endif } static bool net_current_may_mount(void) { struct net *net = current->nsproxy->net_ns; return ns_capable(net->user_ns, CAP_SYS_ADMIN); } static void *net_grab_current_ns(void) { struct net *ns = current->nsproxy->net_ns; #ifdef CONFIG_NET_NS if (ns) refcount_inc(&ns->passive); #endif return ns; } static const void *net_initial_ns(void) { return &init_net; } static const void *net_netlink_ns(struct sock *sk) { return sock_net(sk); } const struct kobj_ns_type_operations net_ns_type_operations = { .type = KOBJ_NS_TYPE_NET, .current_may_mount = net_current_may_mount, .grab_current_ns = net_grab_current_ns, .netlink_ns = net_netlink_ns, .initial_ns = net_initial_ns, .drop_ns = net_drop_ns, }; EXPORT_SYMBOL_GPL(net_ns_type_operations); static int netdev_uevent(const struct device *d, struct kobj_uevent_env *env) { const struct net_device *dev = to_net_dev(d); int retval; /* pass interface to uevent. */ retval = add_uevent_var(env, "INTERFACE=%s", dev->name); if (retval) goto exit; /* pass ifindex to uevent. * ifindex is useful as it won't change (interface name may change) * and is what RtNetlink uses natively. */ retval = add_uevent_var(env, "IFINDEX=%d", dev->ifindex); exit: return retval; } /* * netdev_release -- destroy and free a dead device. * Called when last reference to device kobject is gone. */ static void netdev_release(struct device *d) { struct net_device *dev = to_net_dev(d); BUG_ON(dev->reg_state != NETREG_RELEASED); /* no need to wait for rcu grace period: * device is dead and about to be freed. */ kfree(rcu_access_pointer(dev->ifalias)); kvfree(dev); } static const void *net_namespace(const struct device *d) { const struct net_device *dev = to_net_dev(d); return dev_net(dev); } static void net_get_ownership(const struct device *d, kuid_t *uid, kgid_t *gid) { const struct net_device *dev = to_net_dev(d); const struct net *net = dev_net(dev); net_ns_get_ownership(net, uid, gid); } static const struct class net_class = { .name = "net", .dev_release = netdev_release, .dev_groups = net_class_groups, .dev_uevent = netdev_uevent, .ns_type = &net_ns_type_operations, .namespace = net_namespace, .get_ownership = net_get_ownership, }; #ifdef CONFIG_OF static int of_dev_node_match(struct device *dev, const void *data) { for (; dev; dev = dev->parent) { if (dev->of_node == data) return 1; } return 0; } /* * of_find_net_device_by_node - lookup the net device for the device node * @np: OF device node * * Looks up the net_device structure corresponding with the device node. * If successful, returns a pointer to the net_device with the embedded * struct device refcount incremented by one, or NULL on failure. The * refcount must be dropped when done with the net_device. */ struct net_device *of_find_net_device_by_node(struct device_node *np) { struct device *dev; dev = class_find_device(&net_class, NULL, np, of_dev_node_match); if (!dev) return NULL; return to_net_dev(dev); } EXPORT_SYMBOL(of_find_net_device_by_node); #endif /* Delete sysfs entries but hold kobject reference until after all * netdev references are gone. */ void netdev_unregister_kobject(struct net_device *ndev) { struct device *dev = &ndev->dev; if (!check_net(dev_net(ndev))) dev_set_uevent_suppress(dev, 1); kobject_get(&dev->kobj); remove_queue_kobjects(ndev); pm_runtime_set_memalloc_noio(dev, false); device_del(dev); } /* Create sysfs entries for network device. */ int netdev_register_kobject(struct net_device *ndev) { struct device *dev = &ndev->dev; const struct attribute_group **groups = ndev->sysfs_groups; int error = 0; device_initialize(dev); dev->class = &net_class; dev->platform_data = ndev; dev->groups = groups; dev_set_name(dev, "%s", ndev->name); #ifdef CONFIG_SYSFS /* Allow for a device specific group */ if (*groups) groups++; *groups++ = &netstat_group; *groups++ = &netdev_phys_group; if (wireless_group_needed(ndev)) *groups++ = &wireless_group; #endif /* CONFIG_SYSFS */ error = device_add(dev); if (error) return error; error = register_queue_kobjects(ndev); if (error) { device_del(dev); return error; } pm_runtime_set_memalloc_noio(dev, true); return error; } /* Change owner for sysfs entries when moving network devices across network * namespaces owned by different user namespaces. */ int netdev_change_owner(struct net_device *ndev, const struct net *net_old, const struct net *net_new) { kuid_t old_uid = GLOBAL_ROOT_UID, new_uid = GLOBAL_ROOT_UID; kgid_t old_gid = GLOBAL_ROOT_GID, new_gid = GLOBAL_ROOT_GID; struct device *dev = &ndev->dev; int error; net_ns_get_ownership(net_old, &old_uid, &old_gid); net_ns_get_ownership(net_new, &new_uid, &new_gid); /* The network namespace was changed but the owning user namespace is * identical so there's no need to change the owner of sysfs entries. */ if (uid_eq(old_uid, new_uid) && gid_eq(old_gid, new_gid)) return 0; error = device_change_owner(dev, new_uid, new_gid); if (error) return error; error = queue_change_owner(ndev, new_uid, new_gid); if (error) return error; return 0; } int netdev_class_create_file_ns(const struct class_attribute *class_attr, const void *ns) { return class_create_file_ns(&net_class, class_attr, ns); } EXPORT_SYMBOL(netdev_class_create_file_ns); void netdev_class_remove_file_ns(const struct class_attribute *class_attr, const void *ns) { class_remove_file_ns(&net_class, class_attr, ns); } EXPORT_SYMBOL(netdev_class_remove_file_ns); int __init netdev_kobject_init(void) { kobj_ns_type_register(&net_ns_type_operations); return class_register(&net_class); } |
| 5 39 86 5 80 81 81 24 57 137 4 128 5 5 90 86 21 73 1 6 627 603 33 33 33 33 33 493 471 24 24 24 24 142 135 5 4 4 4 4 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 | // SPDX-License-Identifier: GPL-2.0-or-later /* * lwtunnel Infrastructure for light weight tunnels like mpls * * Authors: Roopa Prabhu, <roopa@cumulusnetworks.com> */ #include <linux/capability.h> #include <linux/module.h> #include <linux/types.h> #include <linux/kernel.h> #include <linux/slab.h> #include <linux/uaccess.h> #include <linux/skbuff.h> #include <linux/netdevice.h> #include <linux/lwtunnel.h> #include <linux/in.h> #include <linux/init.h> #include <linux/err.h> #include <net/lwtunnel.h> #include <net/rtnetlink.h> #include <net/ip6_fib.h> #include <net/rtnh.h> #include "dev.h" DEFINE_STATIC_KEY_FALSE(nf_hooks_lwtunnel_enabled); EXPORT_SYMBOL_GPL(nf_hooks_lwtunnel_enabled); #ifdef CONFIG_MODULES static const char *lwtunnel_encap_str(enum lwtunnel_encap_types encap_type) { /* Only lwt encaps implemented without using an interface for * the encap need to return a string here. */ switch (encap_type) { case LWTUNNEL_ENCAP_MPLS: return "MPLS"; case LWTUNNEL_ENCAP_ILA: return "ILA"; case LWTUNNEL_ENCAP_SEG6: return "SEG6"; case LWTUNNEL_ENCAP_BPF: return "BPF"; case LWTUNNEL_ENCAP_SEG6_LOCAL: return "SEG6LOCAL"; case LWTUNNEL_ENCAP_RPL: return "RPL"; case LWTUNNEL_ENCAP_IOAM6: return "IOAM6"; case LWTUNNEL_ENCAP_XFRM: /* module autoload not supported for encap type */ return NULL; case LWTUNNEL_ENCAP_IP6: case LWTUNNEL_ENCAP_IP: case LWTUNNEL_ENCAP_NONE: case __LWTUNNEL_ENCAP_MAX: /* should not have got here */ WARN_ON(1); break; } return NULL; } #endif /* CONFIG_MODULES */ struct lwtunnel_state *lwtunnel_state_alloc(int encap_len) { struct lwtunnel_state *lws; lws = kzalloc(sizeof(*lws) + encap_len, GFP_ATOMIC); return lws; } EXPORT_SYMBOL_GPL(lwtunnel_state_alloc); static const struct lwtunnel_encap_ops __rcu * lwtun_encaps[LWTUNNEL_ENCAP_MAX + 1] __read_mostly; int lwtunnel_encap_add_ops(const struct lwtunnel_encap_ops *ops, unsigned int num) { if (num > LWTUNNEL_ENCAP_MAX) return -ERANGE; return !cmpxchg((const struct lwtunnel_encap_ops **) &lwtun_encaps[num], NULL, ops) ? 0 : -1; } EXPORT_SYMBOL_GPL(lwtunnel_encap_add_ops); int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops *ops, unsigned int encap_type) { int ret; if (encap_type == LWTUNNEL_ENCAP_NONE || encap_type > LWTUNNEL_ENCAP_MAX) return -ERANGE; ret = (cmpxchg((const struct lwtunnel_encap_ops **) &lwtun_encaps[encap_type], ops, NULL) == ops) ? 0 : -1; synchronize_net(); return ret; } EXPORT_SYMBOL_GPL(lwtunnel_encap_del_ops); int lwtunnel_build_state(struct net *net, u16 encap_type, struct nlattr *encap, unsigned int family, const void *cfg, struct lwtunnel_state **lws, struct netlink_ext_ack *extack) { const struct lwtunnel_encap_ops *ops; bool found = false; int ret = -EINVAL; if (encap_type == LWTUNNEL_ENCAP_NONE || encap_type > LWTUNNEL_ENCAP_MAX) { NL_SET_ERR_MSG_ATTR(extack, encap, "Unknown LWT encapsulation type"); return ret; } ret = -EOPNOTSUPP; rcu_read_lock(); ops = rcu_dereference(lwtun_encaps[encap_type]); if (likely(ops && ops->build_state && try_module_get(ops->owner))) found = true; rcu_read_unlock(); if (found) { ret = ops->build_state(net, encap, family, cfg, lws, extack); if (ret) module_put(ops->owner); } else { /* don't rely on -EOPNOTSUPP to detect match as build_state * handlers could return it */ NL_SET_ERR_MSG_ATTR(extack, encap, "LWT encapsulation type not supported"); } return ret; } EXPORT_SYMBOL_GPL(lwtunnel_build_state); int lwtunnel_valid_encap_type(u16 encap_type, struct netlink_ext_ack *extack) { const struct lwtunnel_encap_ops *ops; int ret = -EINVAL; if (encap_type == LWTUNNEL_ENCAP_NONE || encap_type > LWTUNNEL_ENCAP_MAX) { NL_SET_ERR_MSG(extack, "Unknown lwt encapsulation type"); return ret; } ops = rcu_access_pointer(lwtun_encaps[encap_type]); #ifdef CONFIG_MODULES if (!ops) { const char *encap_type_str = lwtunnel_encap_str(encap_type); if (encap_type_str) { request_module("rtnl-lwt-%s", encap_type_str); ops = rcu_access_pointer(lwtun_encaps[encap_type]); } } #endif ret = ops ? 0 : -EOPNOTSUPP; if (ret < 0) NL_SET_ERR_MSG(extack, "lwt encapsulation type not supported"); return ret; } EXPORT_SYMBOL_GPL(lwtunnel_valid_encap_type); int lwtunnel_valid_encap_type_attr(struct nlattr *attr, int remaining, struct netlink_ext_ack *extack) { struct rtnexthop *rtnh = (struct rtnexthop *)attr; struct nlattr *nla_entype; struct nlattr *attrs; u16 encap_type; int attrlen; while (rtnh_ok(rtnh, remaining)) { attrlen = rtnh_attrlen(rtnh); if (attrlen > 0) { attrs = rtnh_attrs(rtnh); nla_entype = nla_find(attrs, attrlen, RTA_ENCAP_TYPE); if (nla_entype) { if (nla_len(nla_entype) < sizeof(u16)) { NL_SET_ERR_MSG(extack, "Invalid RTA_ENCAP_TYPE"); return -EINVAL; } encap_type = nla_get_u16(nla_entype); if (lwtunnel_valid_encap_type(encap_type, extack)) return -EOPNOTSUPP; } } rtnh = rtnh_next(rtnh, &remaining); } return 0; } EXPORT_SYMBOL_GPL(lwtunnel_valid_encap_type_attr); void lwtstate_free(struct lwtunnel_state *lws) { const struct lwtunnel_encap_ops *ops = lwtun_encaps[lws->type]; if (ops->destroy_state) { ops->destroy_state(lws); kfree_rcu(lws, rcu); } else { kfree(lws); } module_put(ops->owner); } EXPORT_SYMBOL_GPL(lwtstate_free); int lwtunnel_fill_encap(struct sk_buff *skb, struct lwtunnel_state *lwtstate, int encap_attr, int encap_type_attr) { const struct lwtunnel_encap_ops *ops; struct nlattr *nest; int ret; if (!lwtstate) return 0; if (lwtstate->type == LWTUNNEL_ENCAP_NONE || lwtstate->type > LWTUNNEL_ENCAP_MAX) return 0; nest = nla_nest_start_noflag(skb, encap_attr); if (!nest) return -EMSGSIZE; ret = -EOPNOTSUPP; rcu_read_lock(); ops = rcu_dereference(lwtun_encaps[lwtstate->type]); if (likely(ops && ops->fill_encap)) ret = ops->fill_encap(skb, lwtstate); rcu_read_unlock(); if (ret) goto nla_put_failure; nla_nest_end(skb, nest); ret = nla_put_u16(skb, encap_type_attr, lwtstate->type); if (ret) goto nla_put_failure; return 0; nla_put_failure: nla_nest_cancel(skb, nest); return (ret == -EOPNOTSUPP ? 0 : ret); } EXPORT_SYMBOL_GPL(lwtunnel_fill_encap); int lwtunnel_get_encap_size(struct lwtunnel_state *lwtstate) { const struct lwtunnel_encap_ops *ops; int ret = 0; if (!lwtstate) return 0; if (lwtstate->type == LWTUNNEL_ENCAP_NONE || lwtstate->type > LWTUNNEL_ENCAP_MAX) return 0; rcu_read_lock(); ops = rcu_dereference(lwtun_encaps[lwtstate->type]); if (likely(ops && ops->get_encap_size)) ret = nla_total_size(ops->get_encap_size(lwtstate)); rcu_read_unlock(); return ret; } EXPORT_SYMBOL_GPL(lwtunnel_get_encap_size); int lwtunnel_cmp_encap(struct lwtunnel_state *a, struct lwtunnel_state *b) { const struct lwtunnel_encap_ops *ops; int ret = 0; if (!a && !b) return 0; if (!a || !b) return 1; if (a->type != b->type) return 1; if (a->type == LWTUNNEL_ENCAP_NONE || a->type > LWTUNNEL_ENCAP_MAX) return 0; rcu_read_lock(); ops = rcu_dereference(lwtun_encaps[a->type]); if (likely(ops && ops->cmp_encap)) ret = ops->cmp_encap(a, b); rcu_read_unlock(); return ret; } EXPORT_SYMBOL_GPL(lwtunnel_cmp_encap); int lwtunnel_output(struct net *net, struct sock *sk, struct sk_buff *skb) { const struct lwtunnel_encap_ops *ops; struct lwtunnel_state *lwtstate; struct dst_entry *dst; int ret; local_bh_disable(); if (dev_xmit_recursion()) { net_crit_ratelimited("%s(): recursion limit reached on datapath\n", __func__); ret = -ENETDOWN; goto drop; } dst = skb_dst(skb); if (!dst) { ret = -EINVAL; goto drop; } lwtstate = dst->lwtstate; if (lwtstate->type == LWTUNNEL_ENCAP_NONE || lwtstate->type > LWTUNNEL_ENCAP_MAX) { ret = 0; goto out; } ret = -EOPNOTSUPP; rcu_read_lock(); ops = rcu_dereference(lwtun_encaps[lwtstate->type]); if (likely(ops && ops->output)) { dev_xmit_recursion_inc(); ret = ops->output(net, sk, skb); dev_xmit_recursion_dec(); } rcu_read_unlock(); if (ret == -EOPNOTSUPP) goto drop; goto out; drop: kfree_skb(skb); out: local_bh_enable(); return ret; } EXPORT_SYMBOL_GPL(lwtunnel_output); int lwtunnel_xmit(struct sk_buff *skb) { const struct lwtunnel_encap_ops *ops; struct lwtunnel_state *lwtstate; struct dst_entry *dst; int ret; local_bh_disable(); if (dev_xmit_recursion()) { net_crit_ratelimited("%s(): recursion limit reached on datapath\n", __func__); ret = -ENETDOWN; goto drop; } dst = skb_dst(skb); if (!dst) { ret = -EINVAL; goto drop; } lwtstate = dst->lwtstate; if (lwtstate->type == LWTUNNEL_ENCAP_NONE || lwtstate->type > LWTUNNEL_ENCAP_MAX) { ret = 0; goto out; } ret = -EOPNOTSUPP; rcu_read_lock(); ops = rcu_dereference(lwtun_encaps[lwtstate->type]); if (likely(ops && ops->xmit)) { dev_xmit_recursion_inc(); ret = ops->xmit(skb); dev_xmit_recursion_dec(); } rcu_read_unlock(); if (ret == -EOPNOTSUPP) goto drop; goto out; drop: kfree_skb(skb); out: local_bh_enable(); return ret; } EXPORT_SYMBOL_GPL(lwtunnel_xmit); int lwtunnel_input(struct sk_buff *skb) { const struct lwtunnel_encap_ops *ops; struct lwtunnel_state *lwtstate; struct dst_entry *dst; int ret; DEBUG_NET_WARN_ON_ONCE(!in_softirq()); if (dev_xmit_recursion()) { net_crit_ratelimited("%s(): recursion limit reached on datapath\n", __func__); ret = -ENETDOWN; goto drop; } dst = skb_dst(skb); if (!dst) { ret = -EINVAL; goto drop; } lwtstate = dst->lwtstate; if (lwtstate->type == LWTUNNEL_ENCAP_NONE || lwtstate->type > LWTUNNEL_ENCAP_MAX) return 0; ret = -EOPNOTSUPP; rcu_read_lock(); ops = rcu_dereference(lwtun_encaps[lwtstate->type]); if (likely(ops && ops->input)) { dev_xmit_recursion_inc(); ret = ops->input(skb); dev_xmit_recursion_dec(); } rcu_read_unlock(); if (ret == -EOPNOTSUPP) goto drop; return ret; drop: kfree_skb(skb); return ret; } EXPORT_SYMBOL_GPL(lwtunnel_input); |
| 1 1 1 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 | // SPDX-License-Identifier: GPL-2.0-only /* Copyright (C) 2011-2013 Jozsef Kadlecsik <kadlec@netfilter.org> */ /* Kernel module implementing an IP set type: the hash:net,iface type */ #include <linux/jhash.h> #include <linux/module.h> #include <linux/ip.h> #include <linux/skbuff.h> #include <linux/errno.h> #include <linux/random.h> #include <net/ip.h> #include <net/ipv6.h> #include <net/netlink.h> #include <linux/netfilter.h> #include <linux/netfilter_bridge.h> #include <linux/netfilter/ipset/pfxlen.h> #include <linux/netfilter/ipset/ip_set.h> #include <linux/netfilter/ipset/ip_set_hash.h> #define IPSET_TYPE_REV_MIN 0 /* 1 nomatch flag support added */ /* 2 /0 support added */ /* 3 Counters support added */ /* 4 Comments support added */ /* 5 Forceadd support added */ /* 6 skbinfo support added */ /* 7 interface wildcard support added */ #define IPSET_TYPE_REV_MAX 8 /* bucketsize, initval support added */ MODULE_LICENSE("GPL"); MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@netfilter.org>"); IP_SET_MODULE_DESC("hash:net,iface", IPSET_TYPE_REV_MIN, IPSET_TYPE_REV_MAX); MODULE_ALIAS("ip_set_hash:net,iface"); /* Type specific function prefix */ #define HTYPE hash_netiface #define IP_SET_HASH_WITH_NETS #define IP_SET_HASH_WITH_MULTI #define IP_SET_HASH_WITH_NET0 #define STRSCPY(a, b) strscpy(a, b, IFNAMSIZ) /* IPv4 variant */ struct hash_netiface4_elem_hashed { __be32 ip; u8 physdev; u8 cidr; u8 nomatch; u8 elem; }; /* Member elements */ struct hash_netiface4_elem { __be32 ip; u8 physdev; u8 cidr; u8 nomatch; u8 elem; u8 wildcard; char iface[IFNAMSIZ]; }; /* Common functions */ static bool hash_netiface4_data_equal(const struct hash_netiface4_elem *ip1, const struct hash_netiface4_elem *ip2, u32 *multi) { return ip1->ip == ip2->ip && ip1->cidr == ip2->cidr && (++*multi) && ip1->physdev == ip2->physdev && (ip1->wildcard ? strncmp(ip1->iface, ip2->iface, strlen(ip1->iface)) == 0 : strcmp(ip1->iface, ip2->iface) == 0); } static int hash_netiface4_do_data_match(const struct hash_netiface4_elem *elem) { return elem->nomatch ? -ENOTEMPTY : 1; } static void hash_netiface4_data_set_flags(struct hash_netiface4_elem *elem, u32 flags) { elem->nomatch = (flags >> 16) & IPSET_FLAG_NOMATCH; } static void hash_netiface4_data_reset_flags(struct hash_netiface4_elem *elem, u8 *flags) { swap(*flags, elem->nomatch); } static void hash_netiface4_data_netmask(struct hash_netiface4_elem *elem, u8 cidr) { elem->ip &= ip_set_netmask(cidr); elem->cidr = cidr; } static bool hash_netiface4_data_list(struct sk_buff *skb, const struct hash_netiface4_elem *data) { u32 flags = (data->physdev ? IPSET_FLAG_PHYSDEV : 0) | (data->wildcard ? IPSET_FLAG_IFACE_WILDCARD : 0); if (data->nomatch) flags |= IPSET_FLAG_NOMATCH; if (nla_put_ipaddr4(skb, IPSET_ATTR_IP, data->ip) || nla_put_u8(skb, IPSET_ATTR_CIDR, data->cidr) || nla_put_string(skb, IPSET_ATTR_IFACE, data->iface) || (flags && nla_put_net32(skb, IPSET_ATTR_CADT_FLAGS, htonl(flags)))) goto nla_put_failure; return false; nla_put_failure: return true; } static void hash_netiface4_data_next(struct hash_netiface4_elem *next, const struct hash_netiface4_elem *d) { next->ip = d->ip; } #define MTYPE hash_netiface4 #define HOST_MASK 32 #define HKEY_DATALEN sizeof(struct hash_netiface4_elem_hashed) #include "ip_set_hash_gen.h" #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER) static const char *get_physindev_name(const struct sk_buff *skb, struct net *net) { struct net_device *dev = nf_bridge_get_physindev(skb, net); return dev ? dev->name : NULL; } static const char *get_physoutdev_name(const struct sk_buff *skb) { struct net_device *dev = nf_bridge_get_physoutdev(skb); return dev ? dev->name : NULL; } #endif static int hash_netiface4_kadt(struct ip_set *set, const struct sk_buff *skb, const struct xt_action_param *par, enum ipset_adt adt, struct ip_set_adt_opt *opt) { struct hash_netiface4 *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_netiface4_elem e = { .cidr = INIT_CIDR(h->nets[0].cidr[0], HOST_MASK), .elem = 1, }; struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); if (adt == IPSET_TEST) e.cidr = HOST_MASK; ip4addrptr(skb, opt->flags & IPSET_DIM_ONE_SRC, &e.ip); e.ip &= ip_set_netmask(e.cidr); #define IFACE(dir) (par->state->dir ? par->state->dir->name : "") #define SRCDIR (opt->flags & IPSET_DIM_TWO_SRC) if (opt->cmdflags & IPSET_FLAG_PHYSDEV) { #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER) const char *eiface = SRCDIR ? get_physindev_name(skb, xt_net(par)) : get_physoutdev_name(skb); if (!eiface) return -EINVAL; STRSCPY(e.iface, eiface); e.physdev = 1; #endif } else { STRSCPY(e.iface, SRCDIR ? IFACE(in) : IFACE(out)); } if (strlen(e.iface) == 0) return -EINVAL; return adtfn(set, &e, &ext, &opt->ext, opt->cmdflags); } static int hash_netiface4_uadt(struct ip_set *set, struct nlattr *tb[], enum ipset_adt adt, u32 *lineno, u32 flags, bool retried) { struct hash_netiface4 *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_netiface4_elem e = { .cidr = HOST_MASK, .elem = 1 }; struct ip_set_ext ext = IP_SET_INIT_UEXT(set); u32 ip = 0, ip_to = 0, i = 0; int ret; if (tb[IPSET_ATTR_LINENO]) *lineno = nla_get_u32(tb[IPSET_ATTR_LINENO]); if (unlikely(!tb[IPSET_ATTR_IP] || !tb[IPSET_ATTR_IFACE] || !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS))) return -IPSET_ERR_PROTOCOL; ret = ip_set_get_hostipaddr4(tb[IPSET_ATTR_IP], &ip); if (ret) return ret; ret = ip_set_get_extensions(set, tb, &ext); if (ret) return ret; if (tb[IPSET_ATTR_CIDR]) { e.cidr = nla_get_u8(tb[IPSET_ATTR_CIDR]); if (e.cidr > HOST_MASK) return -IPSET_ERR_INVALID_CIDR; } nla_strscpy(e.iface, tb[IPSET_ATTR_IFACE], IFNAMSIZ); if (tb[IPSET_ATTR_CADT_FLAGS]) { u32 cadt_flags = ip_set_get_h32(tb[IPSET_ATTR_CADT_FLAGS]); if (cadt_flags & IPSET_FLAG_PHYSDEV) e.physdev = 1; if (cadt_flags & IPSET_FLAG_NOMATCH) flags |= (IPSET_FLAG_NOMATCH << 16); if (cadt_flags & IPSET_FLAG_IFACE_WILDCARD) e.wildcard = 1; } if (adt == IPSET_TEST || !tb[IPSET_ATTR_IP_TO]) { e.ip = htonl(ip & ip_set_hostmask(e.cidr)); ret = adtfn(set, &e, &ext, &ext, flags); return ip_set_enomatch(ret, flags, adt, set) ? -ret : ip_set_eexist(ret, flags) ? 0 : ret; } if (tb[IPSET_ATTR_IP_TO]) { ret = ip_set_get_hostipaddr4(tb[IPSET_ATTR_IP_TO], &ip_to); if (ret) return ret; if (ip_to < ip) swap(ip, ip_to); if (ip + UINT_MAX == ip_to) return -IPSET_ERR_HASH_RANGE; } else { ip_set_mask_from_to(ip, ip_to, e.cidr); } if (retried) ip = ntohl(h->next.ip); do { i++; e.ip = htonl(ip); if (i > IPSET_MAX_RANGE) { hash_netiface4_data_next(&h->next, &e); return -ERANGE; } ip = ip_set_range_to_cidr(ip, ip_to, &e.cidr); ret = adtfn(set, &e, &ext, &ext, flags); if (ret && !ip_set_eexist(ret, flags)) return ret; ret = 0; } while (ip++ < ip_to); return ret; } /* IPv6 variant */ struct hash_netiface6_elem_hashed { union nf_inet_addr ip; u8 physdev; u8 cidr; u8 nomatch; u8 elem; }; struct hash_netiface6_elem { union nf_inet_addr ip; u8 physdev; u8 cidr; u8 nomatch; u8 elem; u8 wildcard; char iface[IFNAMSIZ]; }; /* Common functions */ static bool hash_netiface6_data_equal(const struct hash_netiface6_elem *ip1, const struct hash_netiface6_elem *ip2, u32 *multi) { return ipv6_addr_equal(&ip1->ip.in6, &ip2->ip.in6) && ip1->cidr == ip2->cidr && (++*multi) && ip1->physdev == ip2->physdev && (ip1->wildcard ? strncmp(ip1->iface, ip2->iface, strlen(ip1->iface)) == 0 : strcmp(ip1->iface, ip2->iface) == 0); } static int hash_netiface6_do_data_match(const struct hash_netiface6_elem *elem) { return elem->nomatch ? -ENOTEMPTY : 1; } static void hash_netiface6_data_set_flags(struct hash_netiface6_elem *elem, u32 flags) { elem->nomatch = (flags >> 16) & IPSET_FLAG_NOMATCH; } static void hash_netiface6_data_reset_flags(struct hash_netiface6_elem *elem, u8 *flags) { swap(*flags, elem->nomatch); } static void hash_netiface6_data_netmask(struct hash_netiface6_elem *elem, u8 cidr) { ip6_netmask(&elem->ip, cidr); elem->cidr = cidr; } static bool hash_netiface6_data_list(struct sk_buff *skb, const struct hash_netiface6_elem *data) { u32 flags = (data->physdev ? IPSET_FLAG_PHYSDEV : 0) | (data->wildcard ? IPSET_FLAG_IFACE_WILDCARD : 0); if (data->nomatch) flags |= IPSET_FLAG_NOMATCH; if (nla_put_ipaddr6(skb, IPSET_ATTR_IP, &data->ip.in6) || nla_put_u8(skb, IPSET_ATTR_CIDR, data->cidr) || nla_put_string(skb, IPSET_ATTR_IFACE, data->iface) || (flags && nla_put_net32(skb, IPSET_ATTR_CADT_FLAGS, htonl(flags)))) goto nla_put_failure; return false; nla_put_failure: return true; } static void hash_netiface6_data_next(struct hash_netiface6_elem *next, const struct hash_netiface6_elem *d) { } #undef MTYPE #undef HOST_MASK #define MTYPE hash_netiface6 #define HOST_MASK 128 #define HKEY_DATALEN sizeof(struct hash_netiface6_elem_hashed) #define IP_SET_EMIT_CREATE #include "ip_set_hash_gen.h" static int hash_netiface6_kadt(struct ip_set *set, const struct sk_buff *skb, const struct xt_action_param *par, enum ipset_adt adt, struct ip_set_adt_opt *opt) { struct hash_netiface6 *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_netiface6_elem e = { .cidr = INIT_CIDR(h->nets[0].cidr[0], HOST_MASK), .elem = 1, }; struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); if (adt == IPSET_TEST) e.cidr = HOST_MASK; ip6addrptr(skb, opt->flags & IPSET_DIM_ONE_SRC, &e.ip.in6); ip6_netmask(&e.ip, e.cidr); if (opt->cmdflags & IPSET_FLAG_PHYSDEV) { #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER) const char *eiface = SRCDIR ? get_physindev_name(skb, xt_net(par)) : get_physoutdev_name(skb); if (!eiface) return -EINVAL; STRSCPY(e.iface, eiface); e.physdev = 1; #endif } else { STRSCPY(e.iface, SRCDIR ? IFACE(in) : IFACE(out)); } if (strlen(e.iface) == 0) return -EINVAL; return adtfn(set, &e, &ext, &opt->ext, opt->cmdflags); } static int hash_netiface6_uadt(struct ip_set *set, struct nlattr *tb[], enum ipset_adt adt, u32 *lineno, u32 flags, bool retried) { ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_netiface6_elem e = { .cidr = HOST_MASK, .elem = 1 }; struct ip_set_ext ext = IP_SET_INIT_UEXT(set); int ret; if (tb[IPSET_ATTR_LINENO]) *lineno = nla_get_u32(tb[IPSET_ATTR_LINENO]); if (unlikely(!tb[IPSET_ATTR_IP] || !tb[IPSET_ATTR_IFACE] || !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS))) return -IPSET_ERR_PROTOCOL; if (unlikely(tb[IPSET_ATTR_IP_TO])) return -IPSET_ERR_HASH_RANGE_UNSUPPORTED; ret = ip_set_get_ipaddr6(tb[IPSET_ATTR_IP], &e.ip); if (ret) return ret; ret = ip_set_get_extensions(set, tb, &ext); if (ret) return ret; if (tb[IPSET_ATTR_CIDR]) { e.cidr = nla_get_u8(tb[IPSET_ATTR_CIDR]); if (e.cidr > HOST_MASK) return -IPSET_ERR_INVALID_CIDR; } ip6_netmask(&e.ip, e.cidr); nla_strscpy(e.iface, tb[IPSET_ATTR_IFACE], IFNAMSIZ); if (tb[IPSET_ATTR_CADT_FLAGS]) { u32 cadt_flags = ip_set_get_h32(tb[IPSET_ATTR_CADT_FLAGS]); if (cadt_flags & IPSET_FLAG_PHYSDEV) e.physdev = 1; if (cadt_flags & IPSET_FLAG_NOMATCH) flags |= (IPSET_FLAG_NOMATCH << 16); if (cadt_flags & IPSET_FLAG_IFACE_WILDCARD) e.wildcard = 1; } ret = adtfn(set, &e, &ext, &ext, flags); return ip_set_enomatch(ret, flags, adt, set) ? -ret : ip_set_eexist(ret, flags) ? 0 : ret; } static struct ip_set_type hash_netiface_type __read_mostly = { .name = "hash:net,iface", .protocol = IPSET_PROTOCOL, .features = IPSET_TYPE_IP | IPSET_TYPE_IFACE | IPSET_TYPE_NOMATCH, .dimension = IPSET_DIM_TWO, .family = NFPROTO_UNSPEC, .revision_min = IPSET_TYPE_REV_MIN, .revision_max = IPSET_TYPE_REV_MAX, .create_flags[IPSET_TYPE_REV_MAX] = IPSET_CREATE_FLAG_BUCKETSIZE, .create = hash_netiface_create, .create_policy = { [IPSET_ATTR_HASHSIZE] = { .type = NLA_U32 }, [IPSET_ATTR_MAXELEM] = { .type = NLA_U32 }, [IPSET_ATTR_INITVAL] = { .type = NLA_U32 }, [IPSET_ATTR_BUCKETSIZE] = { .type = NLA_U8 }, [IPSET_ATTR_RESIZE] = { .type = NLA_U8 }, [IPSET_ATTR_PROTO] = { .type = NLA_U8 }, [IPSET_ATTR_TIMEOUT] = { .type = NLA_U32 }, [IPSET_ATTR_CADT_FLAGS] = { .type = NLA_U32 }, }, .adt_policy = { [IPSET_ATTR_IP] = { .type = NLA_NESTED }, [IPSET_ATTR_IP_TO] = { .type = NLA_NESTED }, [IPSET_ATTR_IFACE] = { .type = NLA_NUL_STRING, .len = IFNAMSIZ - 1 }, [IPSET_ATTR_CADT_FLAGS] = { .type = NLA_U32 }, [IPSET_ATTR_CIDR] = { .type = NLA_U8 }, [IPSET_ATTR_TIMEOUT] = { .type = NLA_U32 }, [IPSET_ATTR_LINENO] = { .type = NLA_U32 }, [IPSET_ATTR_BYTES] = { .type = NLA_U64 }, [IPSET_ATTR_PACKETS] = { .type = NLA_U64 }, [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING, .len = IPSET_MAX_COMMENT_SIZE }, [IPSET_ATTR_SKBMARK] = { .type = NLA_U64 }, [IPSET_ATTR_SKBPRIO] = { .type = NLA_U32 }, [IPSET_ATTR_SKBQUEUE] = { .type = NLA_U16 }, }, .me = THIS_MODULE, }; static int __init hash_netiface_init(void) { return ip_set_type_register(&hash_netiface_type); } static void __exit hash_netiface_fini(void) { rcu_barrier(); ip_set_type_unregister(&hash_netiface_type); } module_init(hash_netiface_init); module_exit(hash_netiface_fini); |
| 100 100 2742 2743 2742 2752 2741 4 2752 2743 2731 18 1612 1547 2766 206 5948 516 3605 3522 271 1126 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 | // SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) /* * Copyright (C) 2017-2024 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved. * Copyright Matt Mackall <mpm@selenic.com>, 2003, 2004, 2005 * Copyright Theodore Ts'o, 1994, 1995, 1996, 1997, 1998, 1999. All rights reserved. * * This driver produces cryptographically secure pseudorandom data. It is divided * into roughly six sections, each with a section header: * * - Initialization and readiness waiting. * - Fast key erasure RNG, the "crng". * - Entropy accumulation and extraction routines. * - Entropy collection routines. * - Userspace reader/writer interfaces. * - Sysctl interface. * * The high level overview is that there is one input pool, into which * various pieces of data are hashed. Prior to initialization, some of that * data is then "credited" as having a certain number of bits of entropy. * When enough bits of entropy are available, the hash is finalized and * handed as a key to a stream cipher that expands it indefinitely for * various consumers. This key is periodically refreshed as the various * entropy collectors, described below, add data to the input pool. */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include <linux/utsname.h> #include <linux/module.h> #include <linux/kernel.h> #include <linux/major.h> #include <linux/string.h> #include <linux/fcntl.h> #include <linux/slab.h> #include <linux/random.h> #include <linux/poll.h> #include <linux/init.h> #include <linux/fs.h> #include <linux/blkdev.h> #include <linux/interrupt.h> #include <linux/mm.h> #include <linux/nodemask.h> #include <linux/spinlock.h> #include <linux/kthread.h> #include <linux/percpu.h> #include <linux/ptrace.h> #include <linux/workqueue.h> #include <linux/irq.h> #include <linux/ratelimit.h> #include <linux/syscalls.h> #include <linux/completion.h> #include <linux/uuid.h> #include <linux/uaccess.h> #include <linux/suspend.h> #include <linux/siphash.h> #include <linux/sched/isolation.h> #include <crypto/chacha.h> #include <crypto/blake2s.h> #ifdef CONFIG_VDSO_GETRANDOM #include <vdso/getrandom.h> #include <vdso/datapage.h> #include <vdso/vsyscall.h> #endif #include <asm/archrandom.h> #include <asm/processor.h> #include <asm/irq.h> #include <asm/irq_regs.h> #include <asm/io.h> /********************************************************************* * * Initialization and readiness waiting. * * Much of the RNG infrastructure is devoted to various dependencies * being able to wait until the RNG has collected enough entropy and * is ready for safe consumption. * *********************************************************************/ /* * crng_init is protected by base_crng->lock, and only increases * its value (from empty->early->ready). */ static enum { CRNG_EMPTY = 0, /* Little to no entropy collected */ CRNG_EARLY = 1, /* At least POOL_EARLY_BITS collected */ CRNG_READY = 2 /* Fully initialized with POOL_READY_BITS collected */ } crng_init __read_mostly = CRNG_EMPTY; static DEFINE_STATIC_KEY_FALSE(crng_is_ready); #define crng_ready() (static_branch_likely(&crng_is_ready) || crng_init >= CRNG_READY) /* Various types of waiters for crng_init->CRNG_READY transition. */ static DECLARE_WAIT_QUEUE_HEAD(crng_init_wait); static struct fasync_struct *fasync; static ATOMIC_NOTIFIER_HEAD(random_ready_notifier); /* Control how we warn userspace. */ static struct ratelimit_state urandom_warning = RATELIMIT_STATE_INIT_FLAGS("urandom_warning", HZ, 3, RATELIMIT_MSG_ON_RELEASE); static int ratelimit_disable __read_mostly = IS_ENABLED(CONFIG_WARN_ALL_UNSEEDED_RANDOM); module_param_named(ratelimit_disable, ratelimit_disable, int, 0644); MODULE_PARM_DESC(ratelimit_disable, "Disable random ratelimit suppression"); /* * Returns whether or not the input pool has been seeded and thus guaranteed * to supply cryptographically secure random numbers. This applies to: the * /dev/urandom device, the get_random_bytes function, and the get_random_{u8, * u16,u32,u64,long} family of functions. * * Returns: true if the input pool has been seeded. * false if the input pool has not been seeded. */ bool rng_is_initialized(void) { return crng_ready(); } EXPORT_SYMBOL(rng_is_initialized); static void __cold crng_set_ready(struct work_struct *work) { static_branch_enable(&crng_is_ready); } /* Used by wait_for_random_bytes(), and considered an entropy collector, below. */ static void try_to_generate_entropy(void); /* * Wait for the input pool to be seeded and thus guaranteed to supply * cryptographically secure random numbers. This applies to: the /dev/urandom * device, the get_random_bytes function, and the get_random_{u8,u16,u32,u64, * long} family of functions. Using any of these functions without first * calling this function forfeits the guarantee of security. * * Returns: 0 if the input pool has been seeded. * -ERESTARTSYS if the function was interrupted by a signal. */ int wait_for_random_bytes(void) { while (!crng_ready()) { int ret; try_to_generate_entropy(); ret = wait_event_interruptible_timeout(crng_init_wait, crng_ready(), HZ); if (ret) return ret > 0 ? 0 : ret; } return 0; } EXPORT_SYMBOL(wait_for_random_bytes); /* * Add a callback function that will be invoked when the crng is initialised, * or immediately if it already has been. Only use this is you are absolutely * sure it is required. Most users should instead be able to test * `rng_is_initialized()` on demand, or make use of `get_random_bytes_wait()`. */ int __cold execute_with_initialized_rng(struct notifier_block *nb) { unsigned long flags; int ret = 0; spin_lock_irqsave(&random_ready_notifier.lock, flags); if (crng_ready()) nb->notifier_call(nb, 0, NULL); else ret = raw_notifier_chain_register((struct raw_notifier_head *)&random_ready_notifier.head, nb); spin_unlock_irqrestore(&random_ready_notifier.lock, flags); return ret; } #define warn_unseeded_randomness() \ if (IS_ENABLED(CONFIG_WARN_ALL_UNSEEDED_RANDOM) && !crng_ready()) \ printk_deferred(KERN_NOTICE "random: %s called from %pS with crng_init=%d\n", \ __func__, (void *)_RET_IP_, crng_init) /********************************************************************* * * Fast key erasure RNG, the "crng". * * These functions expand entropy from the entropy extractor into * long streams for external consumption using the "fast key erasure" * RNG described at <https://blog.cr.yp.to/20170723-random.html>. * * There are a few exported interfaces for use by other drivers: * * void get_random_bytes(void *buf, size_t len) * u8 get_random_u8() * u16 get_random_u16() * u32 get_random_u32() * u32 get_random_u32_below(u32 ceil) * u32 get_random_u32_above(u32 floor) * u32 get_random_u32_inclusive(u32 floor, u32 ceil) * u64 get_random_u64() * unsigned long get_random_long() * * These interfaces will return the requested number of random bytes * into the given buffer or as a return value. This is equivalent to * a read from /dev/urandom. The u8, u16, u32, u64, long family of * functions may be higher performance for one-off random integers, * because they do a bit of buffering and do not invoke reseeding * until the buffer is emptied. * *********************************************************************/ enum { CRNG_RESEED_START_INTERVAL = HZ, CRNG_RESEED_INTERVAL = 60 * HZ }; static struct { u8 key[CHACHA_KEY_SIZE] __aligned(__alignof__(long)); unsigned long generation; spinlock_t lock; } base_crng = { .lock = __SPIN_LOCK_UNLOCKED(base_crng.lock) }; struct crng { u8 key[CHACHA_KEY_SIZE]; unsigned long generation; local_lock_t lock; }; static DEFINE_PER_CPU(struct crng, crngs) = { .generation = ULONG_MAX, .lock = INIT_LOCAL_LOCK(crngs.lock), }; /* * Return the interval until the next reseeding, which is normally * CRNG_RESEED_INTERVAL, but during early boot, it is at an interval * proportional to the uptime. */ static unsigned int crng_reseed_interval(void) { static bool early_boot = true; if (unlikely(READ_ONCE(early_boot))) { time64_t uptime = ktime_get_seconds(); if (uptime >= CRNG_RESEED_INTERVAL / HZ * 2) WRITE_ONCE(early_boot, false); else return max_t(unsigned int, CRNG_RESEED_START_INTERVAL, (unsigned int)uptime / 2 * HZ); } return CRNG_RESEED_INTERVAL; } /* Used by crng_reseed() and crng_make_state() to extract a new seed from the input pool. */ static void extract_entropy(void *buf, size_t len); /* This extracts a new crng key from the input pool. */ static void crng_reseed(struct work_struct *work) { static DECLARE_DELAYED_WORK(next_reseed, crng_reseed); unsigned long flags; unsigned long next_gen; u8 key[CHACHA_KEY_SIZE]; /* Immediately schedule the next reseeding, so that it fires sooner rather than later. */ if (likely(system_unbound_wq)) queue_delayed_work(system_unbound_wq, &next_reseed, crng_reseed_interval()); extract_entropy(key, sizeof(key)); /* * We copy the new key into the base_crng, overwriting the old one, * and update the generation counter. We avoid hitting ULONG_MAX, * because the per-cpu crngs are initialized to ULONG_MAX, so this * forces new CPUs that come online to always initialize. */ spin_lock_irqsave(&base_crng.lock, flags); memcpy(base_crng.key, key, sizeof(base_crng.key)); next_gen = base_crng.generation + 1; if (next_gen == ULONG_MAX) ++next_gen; WRITE_ONCE(base_crng.generation, next_gen); #ifdef CONFIG_VDSO_GETRANDOM /* base_crng.generation's invalid value is ULONG_MAX, while * vdso_k_rng_data->generation's invalid value is 0, so add one to the * former to arrive at the latter. Use smp_store_release so that this * is ordered with the write above to base_crng.generation. Pairs with * the smp_rmb() before the syscall in the vDSO code. * * Cast to unsigned long for 32-bit architectures, since atomic 64-bit * operations are not supported on those architectures. This is safe * because base_crng.generation is a 32-bit value. On big-endian * architectures it will be stored in the upper 32 bits, but that's okay * because the vDSO side only checks whether the value changed, without * actually using or interpreting the value. */ smp_store_release((unsigned long *)&vdso_k_rng_data->generation, next_gen + 1); #endif if (!static_branch_likely(&crng_is_ready)) crng_init = CRNG_READY; spin_unlock_irqrestore(&base_crng.lock, flags); memzero_explicit(key, sizeof(key)); } /* * This generates a ChaCha block using the provided key, and then * immediately overwrites that key with half the block. It returns * the resultant ChaCha state to the user, along with the second * half of the block containing 32 bytes of random data that may * be used; random_data_len may not be greater than 32. * * The returned ChaCha state contains within it a copy of the old * key value, at index 4, so the state should always be zeroed out * immediately after using in order to maintain forward secrecy. * If the state cannot be erased in a timely manner, then it is * safer to set the random_data parameter to &chacha_state->x[4] * so that this function overwrites it before returning. */ static void crng_fast_key_erasure(u8 key[CHACHA_KEY_SIZE], struct chacha_state *chacha_state, u8 *random_data, size_t random_data_len) { u8 first_block[CHACHA_BLOCK_SIZE]; BUG_ON(random_data_len > 32); chacha_init_consts(chacha_state); memcpy(&chacha_state->x[4], key, CHACHA_KEY_SIZE); memset(&chacha_state->x[12], 0, sizeof(u32) * 4); chacha20_block(chacha_state, first_block); memcpy(key, first_block, CHACHA_KEY_SIZE); memcpy(random_data, first_block + CHACHA_KEY_SIZE, random_data_len); memzero_explicit(first_block, sizeof(first_block)); } /* * This function returns a ChaCha state that you may use for generating * random data. It also returns up to 32 bytes on its own of random data * that may be used; random_data_len may not be greater than 32. */ static void crng_make_state(struct chacha_state *chacha_state, u8 *random_data, size_t random_data_len) { unsigned long flags; struct crng *crng; BUG_ON(random_data_len > 32); /* * For the fast path, we check whether we're ready, unlocked first, and * then re-check once locked later. In the case where we're really not * ready, we do fast key erasure with the base_crng directly, extracting * when crng_init is CRNG_EMPTY. */ if (!crng_ready()) { bool ready; spin_lock_irqsave(&base_crng.lock, flags); ready = crng_ready(); if (!ready) { if (crng_init == CRNG_EMPTY) extract_entropy(base_crng.key, sizeof(base_crng.key)); crng_fast_key_erasure(base_crng.key, chacha_state, random_data, random_data_len); } spin_unlock_irqrestore(&base_crng.lock, flags); if (!ready) return; } local_lock_irqsave(&crngs.lock, flags); crng = raw_cpu_ptr(&crngs); /* * If our per-cpu crng is older than the base_crng, then it means * somebody reseeded the base_crng. In that case, we do fast key * erasure on the base_crng, and use its output as the new key * for our per-cpu crng. This brings us up to date with base_crng. */ if (unlikely(crng->generation != READ_ONCE(base_crng.generation))) { spin_lock(&base_crng.lock); crng_fast_key_erasure(base_crng.key, chacha_state, crng->key, sizeof(crng->key)); crng->generation = base_crng.generation; spin_unlock(&base_crng.lock); } /* * Finally, when we've made it this far, our per-cpu crng has an up * to date key, and we can do fast key erasure with it to produce * some random data and a ChaCha state for the caller. All other * branches of this function are "unlikely", so most of the time we * should wind up here immediately. */ crng_fast_key_erasure(crng->key, chacha_state, random_data, random_data_len); local_unlock_irqrestore(&crngs.lock, flags); } static void _get_random_bytes(void *buf, size_t len) { struct chacha_state chacha_state; u8 tmp[CHACHA_BLOCK_SIZE]; size_t first_block_len; if (!len) return; first_block_len = min_t(size_t, 32, len); crng_make_state(&chacha_state, buf, first_block_len); len -= first_block_len; buf += first_block_len; while (len) { if (len < CHACHA_BLOCK_SIZE) { chacha20_block(&chacha_state, tmp); memcpy(buf, tmp, len); memzero_explicit(tmp, sizeof(tmp)); break; } chacha20_block(&chacha_state, buf); if (unlikely(chacha_state.x[12] == 0)) ++chacha_state.x[13]; len -= CHACHA_BLOCK_SIZE; buf += CHACHA_BLOCK_SIZE; } chacha_zeroize_state(&chacha_state); } /* * This returns random bytes in arbitrary quantities. The quality of the * random bytes is good as /dev/urandom. In order to ensure that the * randomness provided by this function is okay, the function * wait_for_random_bytes() should be called and return 0 at least once * at any point prior. */ void get_random_bytes(void *buf, size_t len) { warn_unseeded_randomness(); _get_random_bytes(buf, len); } EXPORT_SYMBOL(get_random_bytes); static ssize_t get_random_bytes_user(struct iov_iter *iter) { struct chacha_state chacha_state; u8 block[CHACHA_BLOCK_SIZE]; size_t ret = 0, copied; if (unlikely(!iov_iter_count(iter))) return 0; /* * Immediately overwrite the ChaCha key at index 4 with random * bytes, in case userspace causes copy_to_iter() below to sleep * forever, so that we still retain forward secrecy in that case. */ crng_make_state(&chacha_state, (u8 *)&chacha_state.x[4], CHACHA_KEY_SIZE); /* * However, if we're doing a read of len <= 32, we don't need to * use chacha_state after, so we can simply return those bytes to * the user directly. */ if (iov_iter_count(iter) <= CHACHA_KEY_SIZE) { ret = copy_to_iter(&chacha_state.x[4], CHACHA_KEY_SIZE, iter); goto out_zero_chacha; } for (;;) { chacha20_block(&chacha_state, block); if (unlikely(chacha_state.x[12] == 0)) ++chacha_state.x[13]; copied = copy_to_iter(block, sizeof(block), iter); ret += copied; if (!iov_iter_count(iter) || copied != sizeof(block)) break; BUILD_BUG_ON(PAGE_SIZE % sizeof(block) != 0); if (ret % PAGE_SIZE == 0) { if (signal_pending(current)) break; cond_resched(); } } memzero_explicit(block, sizeof(block)); out_zero_chacha: chacha_zeroize_state(&chacha_state); return ret ? ret : -EFAULT; } /* * Batched entropy returns random integers. The quality of the random * number is good as /dev/urandom. In order to ensure that the randomness * provided by this function is okay, the function wait_for_random_bytes() * should be called and return 0 at least once at any point prior. */ #define DEFINE_BATCHED_ENTROPY(type) \ struct batch_ ##type { \ /* \ * We make this 1.5x a ChaCha block, so that we get the \ * remaining 32 bytes from fast key erasure, plus one full \ * block from the detached ChaCha state. We can increase \ * the size of this later if needed so long as we keep the \ * formula of (integer_blocks + 0.5) * CHACHA_BLOCK_SIZE. \ */ \ type entropy[CHACHA_BLOCK_SIZE * 3 / (2 * sizeof(type))]; \ local_lock_t lock; \ unsigned long generation; \ unsigned int position; \ }; \ \ static DEFINE_PER_CPU(struct batch_ ##type, batched_entropy_ ##type) = { \ .lock = INIT_LOCAL_LOCK(batched_entropy_ ##type.lock), \ .position = UINT_MAX \ }; \ \ type get_random_ ##type(void) \ { \ type ret; \ unsigned long flags; \ struct batch_ ##type *batch; \ unsigned long next_gen; \ \ warn_unseeded_randomness(); \ \ if (!crng_ready()) { \ _get_random_bytes(&ret, sizeof(ret)); \ return ret; \ } \ \ local_lock_irqsave(&batched_entropy_ ##type.lock, flags); \ batch = raw_cpu_ptr(&batched_entropy_##type); \ \ next_gen = READ_ONCE(base_crng.generation); \ if (batch->position >= ARRAY_SIZE(batch->entropy) || \ next_gen != batch->generation) { \ _get_random_bytes(batch->entropy, sizeof(batch->entropy)); \ batch->position = 0; \ batch->generation = next_gen; \ } \ \ ret = batch->entropy[batch->position]; \ batch->entropy[batch->position] = 0; \ ++batch->position; \ local_unlock_irqrestore(&batched_entropy_ ##type.lock, flags); \ return ret; \ } \ EXPORT_SYMBOL(get_random_ ##type); DEFINE_BATCHED_ENTROPY(u8) DEFINE_BATCHED_ENTROPY(u16) DEFINE_BATCHED_ENTROPY(u32) DEFINE_BATCHED_ENTROPY(u64) u32 __get_random_u32_below(u32 ceil) { /* * This is the slow path for variable ceil. It is still fast, most of * the time, by doing traditional reciprocal multiplication and * opportunistically comparing the lower half to ceil itself, before * falling back to computing a larger bound, and then rejecting samples * whose lower half would indicate a range indivisible by ceil. The use * of `-ceil % ceil` is analogous to `2^32 % ceil`, but is computable * in 32-bits. */ u32 rand = get_random_u32(); u64 mult; /* * This function is technically undefined for ceil == 0, and in fact * for the non-underscored constant version in the header, we build bug * on that. But for the non-constant case, it's convenient to have that * evaluate to being a straight call to get_random_u32(), so that * get_random_u32_inclusive() can work over its whole range without * undefined behavior. */ if (unlikely(!ceil)) return rand; mult = (u64)ceil * rand; if (unlikely((u32)mult < ceil)) { u32 bound = -ceil % ceil; while (unlikely((u32)mult < bound)) mult = (u64)ceil * get_random_u32(); } return mult >> 32; } EXPORT_SYMBOL(__get_random_u32_below); #ifdef CONFIG_SMP /* * This function is called when the CPU is coming up, with entry * CPUHP_RANDOM_PREPARE, which comes before CPUHP_WORKQUEUE_PREP. */ int __cold random_prepare_cpu(unsigned int cpu) { /* * When the cpu comes back online, immediately invalidate both * the per-cpu crng and all batches, so that we serve fresh * randomness. */ per_cpu_ptr(&crngs, cpu)->generation = ULONG_MAX; per_cpu_ptr(&batched_entropy_u8, cpu)->position = UINT_MAX; per_cpu_ptr(&batched_entropy_u16, cpu)->position = UINT_MAX; per_cpu_ptr(&batched_entropy_u32, cpu)->position = UINT_MAX; per_cpu_ptr(&batched_entropy_u64, cpu)->position = UINT_MAX; return 0; } #endif /********************************************************************** * * Entropy accumulation and extraction routines. * * Callers may add entropy via: * * static void mix_pool_bytes(const void *buf, size_t len) * * After which, if added entropy should be credited: * * static void credit_init_bits(size_t bits) * * Finally, extract entropy via: * * static void extract_entropy(void *buf, size_t len) * **********************************************************************/ enum { POOL_BITS = BLAKE2S_HASH_SIZE * 8, POOL_READY_BITS = POOL_BITS, /* When crng_init->CRNG_READY */ POOL_EARLY_BITS = POOL_READY_BITS / 2 /* When crng_init->CRNG_EARLY */ }; static struct { struct blake2s_state hash; spinlock_t lock; unsigned int init_bits; } input_pool = { .hash.h = { BLAKE2S_IV0 ^ (0x01010000 | BLAKE2S_HASH_SIZE), BLAKE2S_IV1, BLAKE2S_IV2, BLAKE2S_IV3, BLAKE2S_IV4, BLAKE2S_IV5, BLAKE2S_IV6, BLAKE2S_IV7 }, .hash.outlen = BLAKE2S_HASH_SIZE, .lock = __SPIN_LOCK_UNLOCKED(input_pool.lock), }; static void _mix_pool_bytes(const void *buf, size_t len) { blake2s_update(&input_pool.hash, buf, len); } /* * This function adds bytes into the input pool. It does not * update the initialization bit counter; the caller should call * credit_init_bits if this is appropriate. */ static void mix_pool_bytes(const void *buf, size_t len) { unsigned long flags; spin_lock_irqsave(&input_pool.lock, flags); _mix_pool_bytes(buf, len); spin_unlock_irqrestore(&input_pool.lock, flags); } /* * This is an HKDF-like construction for using the hashed collected entropy * as a PRF key, that's then expanded block-by-block. */ static void extract_entropy(void *buf, size_t len) { unsigned long flags; u8 seed[BLAKE2S_HASH_SIZE], next_key[BLAKE2S_HASH_SIZE]; struct { unsigned long rdseed[32 / sizeof(long)]; size_t counter; } block; size_t i, longs; for (i = 0; i < ARRAY_SIZE(block.rdseed);) { longs = arch_get_random_seed_longs(&block.rdseed[i], ARRAY_SIZE(block.rdseed) - i); if (longs) { i += longs; continue; } longs = arch_get_random_longs(&block.rdseed[i], ARRAY_SIZE(block.rdseed) - i); if (longs) { i += longs; continue; } block.rdseed[i++] = random_get_entropy(); } spin_lock_irqsave(&input_pool.lock, flags); /* seed = HASHPRF(last_key, entropy_input) */ blake2s_final(&input_pool.hash, seed); /* next_key = HASHPRF(seed, RDSEED || 0) */ block.counter = 0; blake2s(next_key, (u8 *)&block, seed, sizeof(next_key), sizeof(block), sizeof(seed)); blake2s_init_key(&input_pool.hash, BLAKE2S_HASH_SIZE, next_key, sizeof(next_key)); spin_unlock_irqrestore(&input_pool.lock, flags); memzero_explicit(next_key, sizeof(next_key)); while (len) { i = min_t(size_t, len, BLAKE2S_HASH_SIZE); /* output = HASHPRF(seed, RDSEED || ++counter) */ ++block.counter; blake2s(buf, (u8 *)&block, seed, i, sizeof(block), sizeof(seed)); len -= i; buf += i; } memzero_explicit(seed, sizeof(seed)); memzero_explicit(&block, sizeof(block)); } #define credit_init_bits(bits) if (!crng_ready()) _credit_init_bits(bits) static void __cold _credit_init_bits(size_t bits) { static DECLARE_WORK(set_ready, crng_set_ready); unsigned int new, orig, add; unsigned long flags; int m; if (!bits) return; add = min_t(size_t, bits, POOL_BITS); orig = READ_ONCE(input_pool.init_bits); do { new = min_t(unsigned int, POOL_BITS, orig + add); } while (!try_cmpxchg(&input_pool.init_bits, &orig, new)); if (orig < POOL_READY_BITS && new >= POOL_READY_BITS) { crng_reseed(NULL); /* Sets crng_init to CRNG_READY under base_crng.lock. */ if (static_key_initialized && system_unbound_wq) queue_work(system_unbound_wq, &set_ready); atomic_notifier_call_chain(&random_ready_notifier, 0, NULL); #ifdef CONFIG_VDSO_GETRANDOM WRITE_ONCE(vdso_k_rng_data->is_ready, true); #endif wake_up_interruptible(&crng_init_wait); kill_fasync(&fasync, SIGIO, POLL_IN); pr_notice("crng init done\n"); m = ratelimit_state_get_miss(&urandom_warning); if (m) pr_notice("%d urandom warning(s) missed due to ratelimiting\n", m); } else if (orig < POOL_EARLY_BITS && new >= POOL_EARLY_BITS) { spin_lock_irqsave(&base_crng.lock, flags); /* Check if crng_init is CRNG_EMPTY, to avoid race with crng_reseed(). */ if (crng_init == CRNG_EMPTY) { extract_entropy(base_crng.key, sizeof(base_crng.key)); crng_init = CRNG_EARLY; } spin_unlock_irqrestore(&base_crng.lock, flags); } } /********************************************************************** * * Entropy collection routines. * * The following exported functions are used for pushing entropy into * the above entropy accumulation routines: * * void add_device_randomness(const void *buf, size_t len); * void add_hwgenerator_randomness(const void *buf, size_t len, size_t entropy, bool sleep_after); * void add_bootloader_randomness(const void *buf, size_t len); * void add_vmfork_randomness(const void *unique_vm_id, size_t len); * void add_interrupt_randomness(int irq); * void add_input_randomness(unsigned int type, unsigned int code, unsigned int value); * void add_disk_randomness(struct gendisk *disk); * * add_device_randomness() adds data to the input pool that * is likely to differ between two devices (or possibly even per boot). * This would be things like MAC addresses or serial numbers, or the * read-out of the RTC. This does *not* credit any actual entropy to * the pool, but it initializes the pool to different values for devices * that might otherwise be identical and have very little entropy * available to them (particularly common in the embedded world). * * add_hwgenerator_randomness() is for true hardware RNGs, and will credit * entropy as specified by the caller. If the entropy pool is full it will * block until more entropy is needed. * * add_bootloader_randomness() is called by bootloader drivers, such as EFI * and device tree, and credits its input depending on whether or not the * command line option 'random.trust_bootloader'. * * add_vmfork_randomness() adds a unique (but not necessarily secret) ID * representing the current instance of a VM to the pool, without crediting, * and then force-reseeds the crng so that it takes effect immediately. * * add_interrupt_randomness() uses the interrupt timing as random * inputs to the entropy pool. Using the cycle counters and the irq source * as inputs, it feeds the input pool roughly once a second or after 64 * interrupts, crediting 1 bit of entropy for whichever comes first. * * add_input_randomness() uses the input layer interrupt timing, as well * as the event type information from the hardware. * * add_disk_randomness() uses what amounts to the seek time of block * layer request events, on a per-disk_devt basis, as input to the * entropy pool. Note that high-speed solid state drives with very low * seek times do not make for good sources of entropy, as their seek * times are usually fairly consistent. * * The last two routines try to estimate how many bits of entropy * to credit. They do this by keeping track of the first and second * order deltas of the event timings. * **********************************************************************/ static bool trust_cpu __initdata = true; static bool trust_bootloader __initdata = true; static int __init parse_trust_cpu(char *arg) { return kstrtobool(arg, &trust_cpu); } static int __init parse_trust_bootloader(char *arg) { return kstrtobool(arg, &trust_bootloader); } early_param("random.trust_cpu", parse_trust_cpu); early_param("random.trust_bootloader", parse_trust_bootloader); static int random_pm_notification(struct notifier_block *nb, unsigned long action, void *data) { unsigned long flags, entropy = random_get_entropy(); /* * Encode a representation of how long the system has been suspended, * in a way that is distinct from prior system suspends. */ ktime_t stamps[] = { ktime_get(), ktime_get_boottime(), ktime_get_real() }; spin_lock_irqsave(&input_pool.lock, flags); _mix_pool_bytes(&action, sizeof(action)); _mix_pool_bytes(stamps, sizeof(stamps)); _mix_pool_bytes(&entropy, sizeof(entropy)); spin_unlock_irqrestore(&input_pool.lock, flags); if (crng_ready() && (action == PM_RESTORE_PREPARE || (action == PM_POST_SUSPEND && !IS_ENABLED(CONFIG_PM_AUTOSLEEP) && !IS_ENABLED(CONFIG_PM_USERSPACE_AUTOSLEEP)))) { crng_reseed(NULL); pr_notice("crng reseeded on system resumption\n"); } return 0; } static struct notifier_block pm_notifier = { .notifier_call = random_pm_notification }; /* * This is called extremely early, before time keeping functionality is * available, but arch randomness is. Interrupts are not yet enabled. */ void __init random_init_early(const char *command_line) { unsigned long entropy[BLAKE2S_BLOCK_SIZE / sizeof(long)]; size_t i, longs, arch_bits; #if defined(LATENT_ENTROPY_PLUGIN) static const u8 compiletime_seed[BLAKE2S_BLOCK_SIZE] __initconst __latent_entropy; _mix_pool_bytes(compiletime_seed, sizeof(compiletime_seed)); #endif for (i = 0, arch_bits = sizeof(entropy) * 8; i < ARRAY_SIZE(entropy);) { longs = arch_get_random_seed_longs(entropy, ARRAY_SIZE(entropy) - i); if (longs) { _mix_pool_bytes(entropy, sizeof(*entropy) * longs); i += longs; continue; } longs = arch_get_random_longs(entropy, ARRAY_SIZE(entropy) - i); if (longs) { _mix_pool_bytes(entropy, sizeof(*entropy) * longs); i += longs; continue; } arch_bits -= sizeof(*entropy) * 8; ++i; } _mix_pool_bytes(init_utsname(), sizeof(*(init_utsname()))); _mix_pool_bytes(command_line, strlen(command_line)); /* Reseed if already seeded by earlier phases. */ if (crng_ready()) crng_reseed(NULL); else if (trust_cpu) _credit_init_bits(arch_bits); } /* * This is called a little bit after the prior function, and now there is * access to timestamps counters. Interrupts are not yet enabled. */ void __init random_init(void) { unsigned long entropy = random_get_entropy(); ktime_t now = ktime_get_real(); _mix_pool_bytes(&now, sizeof(now)); _mix_pool_bytes(&entropy, sizeof(entropy)); add_latent_entropy(); /* * If we were initialized by the cpu or bootloader before jump labels * or workqueues are initialized, then we should enable the static * branch here, where it's guaranteed that these have been initialized. */ if (!static_branch_likely(&crng_is_ready) && crng_init >= CRNG_READY) crng_set_ready(NULL); /* Reseed if already seeded by earlier phases. */ if (crng_ready()) crng_reseed(NULL); WARN_ON(register_pm_notifier(&pm_notifier)); WARN(!entropy, "Missing cycle counter and fallback timer; RNG " "entropy collection will consequently suffer."); } /* * Add device- or boot-specific data to the input pool to help * initialize it. * * None of this adds any entropy; it is meant to avoid the problem of * the entropy pool having similar initial state across largely * identical devices. */ void add_device_randomness(const void *buf, size_t len) { unsigned long entropy = random_get_entropy(); unsigned long flags; spin_lock_irqsave(&input_pool.lock, flags); _mix_pool_bytes(&entropy, sizeof(entropy)); _mix_pool_bytes(buf, len); spin_unlock_irqrestore(&input_pool.lock, flags); } EXPORT_SYMBOL(add_device_randomness); /* * Interface for in-kernel drivers of true hardware RNGs. Those devices * may produce endless random bits, so this function will sleep for * some amount of time after, if the sleep_after parameter is true. */ void add_hwgenerator_randomness(const void *buf, size_t len, size_t entropy, bool sleep_after) { mix_pool_bytes(buf, len); credit_init_bits(entropy); /* * Throttle writing to once every reseed interval, unless we're not yet * initialized or no entropy is credited. */ if (sleep_after && !kthread_should_stop() && (crng_ready() || !entropy)) schedule_timeout_interruptible(crng_reseed_interval()); } EXPORT_SYMBOL_GPL(add_hwgenerator_randomness); /* * Handle random seed passed by bootloader, and credit it depending * on the command line option 'random.trust_bootloader'. */ void __init add_bootloader_randomness(const void *buf, size_t len) { mix_pool_bytes(buf, len); if (trust_bootloader) credit_init_bits(len * 8); } #if IS_ENABLED(CONFIG_VMGENID) static BLOCKING_NOTIFIER_HEAD(vmfork_chain); /* * Handle a new unique VM ID, which is unique, not secret, so we * don't credit it, but we do immediately force a reseed after so * that it's used by the crng posthaste. */ void __cold add_vmfork_randomness(const void *unique_vm_id, size_t len) { add_device_randomness(unique_vm_id, len); if (crng_ready()) { crng_reseed(NULL); pr_notice("crng reseeded due to virtual machine fork\n"); } blocking_notifier_call_chain(&vmfork_chain, 0, NULL); } #if IS_MODULE(CONFIG_VMGENID) EXPORT_SYMBOL_GPL(add_vmfork_randomness); #endif int __cold register_random_vmfork_notifier(struct notifier_block *nb) { return blocking_notifier_chain_register(&vmfork_chain, nb); } EXPORT_SYMBOL_GPL(register_random_vmfork_notifier); int __cold unregister_random_vmfork_notifier(struct notifier_block *nb) { return blocking_notifier_chain_unregister(&vmfork_chain, nb); } EXPORT_SYMBOL_GPL(unregister_random_vmfork_notifier); #endif struct fast_pool { unsigned long pool[4]; unsigned long last; unsigned int count; struct timer_list mix; }; static void mix_interrupt_randomness(struct timer_list *work); static DEFINE_PER_CPU(struct fast_pool, irq_randomness) = { #ifdef CONFIG_64BIT #define FASTMIX_PERM SIPHASH_PERMUTATION .pool = { SIPHASH_CONST_0, SIPHASH_CONST_1, SIPHASH_CONST_2, SIPHASH_CONST_3 }, #else #define FASTMIX_PERM HSIPHASH_PERMUTATION .pool = { HSIPHASH_CONST_0, HSIPHASH_CONST_1, HSIPHASH_CONST_2, HSIPHASH_CONST_3 }, #endif .mix = __TIMER_INITIALIZER(mix_interrupt_randomness, 0) }; /* * This is [Half]SipHash-1-x, starting from an empty key. Because * the key is fixed, it assumes that its inputs are non-malicious, * and therefore this has no security on its own. s represents the * four-word SipHash state, while v represents a two-word input. */ static void fast_mix(unsigned long s[4], unsigned long v1, unsigned long v2) { s[3] ^= v1; FASTMIX_PERM(s[0], s[1], s[2], s[3]); s[0] ^= v1; s[3] ^= v2; FASTMIX_PERM(s[0], s[1], s[2], s[3]); s[0] ^= v2; } #ifdef CONFIG_SMP /* * This function is called when the CPU has just come online, with * entry CPUHP_AP_RANDOM_ONLINE, just after CPUHP_AP_WORKQUEUE_ONLINE. */ int __cold random_online_cpu(unsigned int cpu) { /* * During CPU shutdown and before CPU onlining, add_interrupt_ * randomness() may schedule mix_interrupt_randomness(), and * set the MIX_INFLIGHT flag. However, because the worker can * be scheduled on a different CPU during this period, that * flag will never be cleared. For that reason, we zero out * the flag here, which runs just after workqueues are onlined * for the CPU again. This also has the effect of setting the * irq randomness count to zero so that new accumulated irqs * are fresh. */ per_cpu_ptr(&irq_randomness, cpu)->count = 0; return 0; } #endif static void mix_interrupt_randomness(struct timer_list *work) { struct fast_pool *fast_pool = container_of(work, struct fast_pool, mix); /* * The size of the copied stack pool is explicitly 2 longs so that we * only ever ingest half of the siphash output each time, retaining * the other half as the next "key" that carries over. The entropy is * supposed to be sufficiently dispersed between bits so on average * we don't wind up "losing" some. */ unsigned long pool[2]; unsigned int count; /* Check to see if we're running on the wrong CPU due to hotplug. */ local_irq_disable(); if (fast_pool != this_cpu_ptr(&irq_randomness)) { local_irq_enable(); return; } /* * Copy the pool to the stack so that the mixer always has a * consistent view, before we reenable irqs again. */ memcpy(pool, fast_pool->pool, sizeof(pool)); count = fast_pool->count; fast_pool->count = 0; fast_pool->last = jiffies; local_irq_enable(); mix_pool_bytes(pool, sizeof(pool)); credit_init_bits(clamp_t(unsigned int, (count & U16_MAX) / 64, 1, sizeof(pool) * 8)); memzero_explicit(pool, sizeof(pool)); } void add_interrupt_randomness(int irq) { enum { MIX_INFLIGHT = 1U << 31 }; unsigned long entropy = random_get_entropy(); struct fast_pool *fast_pool = this_cpu_ptr(&irq_randomness); struct pt_regs *regs = get_irq_regs(); unsigned int new_count; fast_mix(fast_pool->pool, entropy, (regs ? instruction_pointer(regs) : _RET_IP_) ^ swab(irq)); new_count = ++fast_pool->count; if (new_count & MIX_INFLIGHT) return; if (new_count < 1024 && !time_is_before_jiffies(fast_pool->last + HZ)) return; fast_pool->count |= MIX_INFLIGHT; if (!timer_pending(&fast_pool->mix)) { fast_pool->mix.expires = jiffies; add_timer_on(&fast_pool->mix, raw_smp_processor_id()); } } EXPORT_SYMBOL_GPL(add_interrupt_randomness); /* There is one of these per entropy source */ struct timer_rand_state { unsigned long last_time; long last_delta, last_delta2; }; /* * This function adds entropy to the entropy "pool" by using timing * delays. It uses the timer_rand_state structure to make an estimate * of how many bits of entropy this call has added to the pool. The * value "num" is also added to the pool; it should somehow describe * the type of event that just happened. */ static void add_timer_randomness(struct timer_rand_state *state, unsigned int num) { unsigned long entropy = random_get_entropy(), now = jiffies, flags; long delta, delta2, delta3; unsigned int bits; /* * If we're in a hard IRQ, add_interrupt_randomness() will be called * sometime after, so mix into the fast pool. */ if (in_hardirq()) { fast_mix(this_cpu_ptr(&irq_randomness)->pool, entropy, num); } else { spin_lock_irqsave(&input_pool.lock, flags); _mix_pool_bytes(&entropy, sizeof(entropy)); _mix_pool_bytes(&num, sizeof(num)); spin_unlock_irqrestore(&input_pool.lock, flags); } if (crng_ready()) return; /* * Calculate number of bits of randomness we probably added. * We take into account the first, second and third-order deltas * in order to make our estimate. */ delta = now - READ_ONCE(state->last_time); WRITE_ONCE(state->last_time, now); delta2 = delta - READ_ONCE(state->last_delta); WRITE_ONCE(state->last_delta, delta); delta3 = delta2 - READ_ONCE(state->last_delta2); WRITE_ONCE(state->last_delta2, delta2); if (delta < 0) delta = -delta; if (delta2 < 0) delta2 = -delta2; if (delta3 < 0) delta3 = -delta3; if (delta > delta2) delta = delta2; if (delta > delta3) delta = delta3; /* * delta is now minimum absolute delta. Round down by 1 bit * on general principles, and limit entropy estimate to 11 bits. */ bits = min(fls(delta >> 1), 11); /* * As mentioned above, if we're in a hard IRQ, add_interrupt_randomness() * will run after this, which uses a different crediting scheme of 1 bit * per every 64 interrupts. In order to let that function do accounting * close to the one in this function, we credit a full 64/64 bit per bit, * and then subtract one to account for the extra one added. */ if (in_hardirq()) this_cpu_ptr(&irq_randomness)->count += max(1u, bits * 64) - 1; else _credit_init_bits(bits); } void add_input_randomness(unsigned int type, unsigned int code, unsigned int value) { static unsigned char last_value; static struct timer_rand_state input_timer_state = { INITIAL_JIFFIES }; /* Ignore autorepeat and the like. */ if (value == last_value) return; last_value = value; add_timer_randomness(&input_timer_state, (type << 4) ^ code ^ (code >> 4) ^ value); } EXPORT_SYMBOL_GPL(add_input_randomness); #ifdef CONFIG_BLOCK void add_disk_randomness(struct gendisk *disk) { if (!disk || !disk->random) return; /* First major is 1, so we get >= 0x200 here. */ add_timer_randomness(disk->random, 0x100 + disk_devt(disk)); } EXPORT_SYMBOL_GPL(add_disk_randomness); void __cold rand_initialize_disk(struct gendisk *disk) { struct timer_rand_state *state; /* * If kzalloc returns null, we just won't use that entropy * source. */ state = kzalloc(sizeof(struct timer_rand_state), GFP_KERNEL); if (state) { state->last_time = INITIAL_JIFFIES; disk->random = state; } } #endif struct entropy_timer_state { unsigned long entropy; struct timer_list timer; atomic_t samples; unsigned int samples_per_bit; }; /* * Each time the timer fires, we expect that we got an unpredictable jump in * the cycle counter. Even if the timer is running on another CPU, the timer * activity will be touching the stack of the CPU that is generating entropy. * * Note that we don't re-arm the timer in the timer itself - we are happy to be * scheduled away, since that just makes the load more complex, but we do not * want the timer to keep ticking unless the entropy loop is running. * * So the re-arming always happens in the entropy loop itself. */ static void __cold entropy_timer(struct timer_list *timer) { struct entropy_timer_state *state = container_of(timer, struct entropy_timer_state, timer); unsigned long entropy = random_get_entropy(); mix_pool_bytes(&entropy, sizeof(entropy)); if (atomic_inc_return(&state->samples) % state->samples_per_bit == 0) credit_init_bits(1); } /* * If we have an actual cycle counter, see if we can generate enough entropy * with timing noise. */ static void __cold try_to_generate_entropy(void) { enum { NUM_TRIAL_SAMPLES = 8192, MAX_SAMPLES_PER_BIT = HZ / 15 }; u8 stack_bytes[sizeof(struct entropy_timer_state) + SMP_CACHE_BYTES - 1]; struct entropy_timer_state *stack = PTR_ALIGN((void *)stack_bytes, SMP_CACHE_BYTES); unsigned int i, num_different = 0; unsigned long last = random_get_entropy(); int cpu = -1; for (i = 0; i < NUM_TRIAL_SAMPLES - 1; ++i) { stack->entropy = random_get_entropy(); if (stack->entropy != last) ++num_different; last = stack->entropy; } stack->samples_per_bit = DIV_ROUND_UP(NUM_TRIAL_SAMPLES, num_different + 1); if (stack->samples_per_bit > MAX_SAMPLES_PER_BIT) return; atomic_set(&stack->samples, 0); timer_setup_on_stack(&stack->timer, entropy_timer, 0); while (!crng_ready() && !signal_pending(current)) { /* * Check !timer_pending() and then ensure that any previous callback has finished * executing by checking timer_delete_sync_try(), before queueing the next one. */ if (!timer_pending(&stack->timer) && timer_delete_sync_try(&stack->timer) >= 0) { struct cpumask timer_cpus; unsigned int num_cpus; /* * Preemption must be disabled here, both to read the current CPU number * and to avoid scheduling a timer on a dead CPU. */ preempt_disable(); /* Only schedule callbacks on timer CPUs that are online. */ cpumask_and(&timer_cpus, housekeeping_cpumask(HK_TYPE_TIMER), cpu_online_mask); num_cpus = cpumask_weight(&timer_cpus); /* In very bizarre case of misconfiguration, fallback to all online. */ if (unlikely(num_cpus == 0)) { timer_cpus = *cpu_online_mask; num_cpus = cpumask_weight(&timer_cpus); } /* Basic CPU round-robin, which avoids the current CPU. */ do { cpu = cpumask_next(cpu, &timer_cpus); if (cpu >= nr_cpu_ids) cpu = cpumask_first(&timer_cpus); } while (cpu == smp_processor_id() && num_cpus > 1); /* Expiring the timer at `jiffies` means it's the next tick. */ stack->timer.expires = jiffies; add_timer_on(&stack->timer, cpu); preempt_enable(); } mix_pool_bytes(&stack->entropy, sizeof(stack->entropy)); schedule(); stack->entropy = random_get_entropy(); } mix_pool_bytes(&stack->entropy, sizeof(stack->entropy)); timer_delete_sync(&stack->timer); timer_destroy_on_stack(&stack->timer); } /********************************************************************** * * Userspace reader/writer interfaces. * * getrandom(2) is the primary modern interface into the RNG and should * be used in preference to anything else. * * Reading from /dev/random has the same functionality as calling * getrandom(2) with flags=0. In earlier versions, however, it had * vastly different semantics and should therefore be avoided, to * prevent backwards compatibility issues. * * Reading from /dev/urandom has the same functionality as calling * getrandom(2) with flags=GRND_INSECURE. Because it does not block * waiting for the RNG to be ready, it should not be used. * * Writing to either /dev/random or /dev/urandom adds entropy to * the input pool but does not credit it. * * Polling on /dev/random indicates when the RNG is initialized, on * the read side, and when it wants new entropy, on the write side. * * Both /dev/random and /dev/urandom have the same set of ioctls for * adding entropy, getting the entropy count, zeroing the count, and * reseeding the crng. * **********************************************************************/ SYSCALL_DEFINE3(getrandom, char __user *, ubuf, size_t, len, unsigned int, flags) { struct iov_iter iter; int ret; if (flags & ~(GRND_NONBLOCK | GRND_RANDOM | GRND_INSECURE)) return -EINVAL; /* * Requesting insecure and blocking randomness at the same time makes * no sense. */ if ((flags & (GRND_INSECURE | GRND_RANDOM)) == (GRND_INSECURE | GRND_RANDOM)) return -EINVAL; if (!crng_ready() && !(flags & GRND_INSECURE)) { if (flags & GRND_NONBLOCK) return -EAGAIN; ret = wait_for_random_bytes(); if (unlikely(ret)) return ret; } ret = import_ubuf(ITER_DEST, ubuf, len, &iter); if (unlikely(ret)) return ret; return get_random_bytes_user(&iter); } static __poll_t random_poll(struct file *file, poll_table *wait) { poll_wait(file, &crng_init_wait, wait); return crng_ready() ? EPOLLIN | EPOLLRDNORM : EPOLLOUT | EPOLLWRNORM; } static ssize_t write_pool_user(struct iov_iter *iter) { u8 block[BLAKE2S_BLOCK_SIZE]; ssize_t ret = 0; size_t copied; if (unlikely(!iov_iter_count(iter))) return 0; for (;;) { copied = copy_from_iter(block, sizeof(block), iter); ret += copied; mix_pool_bytes(block, copied); if (!iov_iter_count(iter) || copied != sizeof(block)) break; BUILD_BUG_ON(PAGE_SIZE % sizeof(block) != 0); if (ret % PAGE_SIZE == 0) { if (signal_pending(current)) break; cond_resched(); } } memzero_explicit(block, sizeof(block)); return ret ? ret : -EFAULT; } static ssize_t random_write_iter(struct kiocb *kiocb, struct iov_iter *iter) { return write_pool_user(iter); } static ssize_t urandom_read_iter(struct kiocb *kiocb, struct iov_iter *iter) { static int maxwarn = 10; /* * Opportunistically attempt to initialize the RNG on platforms that * have fast cycle counters, but don't (for now) require it to succeed. */ if (!crng_ready()) try_to_generate_entropy(); if (!crng_ready()) { if (!ratelimit_disable && maxwarn <= 0) ratelimit_state_inc_miss(&urandom_warning); else if (ratelimit_disable || __ratelimit(&urandom_warning)) { --maxwarn; pr_notice("%s: uninitialized urandom read (%zu bytes read)\n", current->comm, iov_iter_count(iter)); } } return get_random_bytes_user(iter); } static ssize_t random_read_iter(struct kiocb *kiocb, struct iov_iter *iter) { int ret; if (!crng_ready() && ((kiocb->ki_flags & (IOCB_NOWAIT | IOCB_NOIO)) || (kiocb->ki_filp->f_flags & O_NONBLOCK))) return -EAGAIN; ret = wait_for_random_bytes(); if (ret != 0) return ret; return get_random_bytes_user(iter); } static long random_ioctl(struct file *f, unsigned int cmd, unsigned long arg) { int __user *p = (int __user *)arg; int ent_count; switch (cmd) { case RNDGETENTCNT: /* Inherently racy, no point locking. */ if (put_user(input_pool.init_bits, p)) return -EFAULT; return 0; case RNDADDTOENTCNT: if (!capable(CAP_SYS_ADMIN)) return -EPERM; if (get_user(ent_count, p)) return -EFAULT; if (ent_count < 0) return -EINVAL; credit_init_bits(ent_count); return 0; case RNDADDENTROPY: { struct iov_iter iter; ssize_t ret; int len; if (!capable(CAP_SYS_ADMIN)) return -EPERM; if (get_user(ent_count, p++)) return -EFAULT; if (ent_count < 0) return -EINVAL; if (get_user(len, p++)) return -EFAULT; ret = import_ubuf(ITER_SOURCE, p, len, &iter); if (unlikely(ret)) return ret; ret = write_pool_user(&iter); if (unlikely(ret < 0)) return ret; /* Since we're crediting, enforce that it was all written into the pool. */ if (unlikely(ret != len)) return -EFAULT; credit_init_bits(ent_count); return 0; } case RNDZAPENTCNT: case RNDCLEARPOOL: /* No longer has any effect. */ if (!capable(CAP_SYS_ADMIN)) return -EPERM; return 0; case RNDRESEEDCRNG: if (!capable(CAP_SYS_ADMIN)) return -EPERM; if (!crng_ready()) return -ENODATA; crng_reseed(NULL); return 0; default: return -EINVAL; } } static int random_fasync(int fd, struct file *filp, int on) { return fasync_helper(fd, filp, on, &fasync); } const struct file_operations random_fops = { .read_iter = random_read_iter, .write_iter = random_write_iter, .poll = random_poll, .unlocked_ioctl = random_ioctl, .compat_ioctl = compat_ptr_ioctl, .fasync = random_fasync, .llseek = noop_llseek, .splice_read = copy_splice_read, .splice_write = iter_file_splice_write, }; const struct file_operations urandom_fops = { .read_iter = urandom_read_iter, .write_iter = random_write_iter, .unlocked_ioctl = random_ioctl, .compat_ioctl = compat_ptr_ioctl, .fasync = random_fasync, .llseek = noop_llseek, .splice_read = copy_splice_read, .splice_write = iter_file_splice_write, }; /******************************************************************** * * Sysctl interface. * * These are partly unused legacy knobs with dummy values to not break * userspace and partly still useful things. They are usually accessible * in /proc/sys/kernel/random/ and are as follows: * * - boot_id - a UUID representing the current boot. * * - uuid - a random UUID, different each time the file is read. * * - poolsize - the number of bits of entropy that the input pool can * hold, tied to the POOL_BITS constant. * * - entropy_avail - the number of bits of entropy currently in the * input pool. Always <= poolsize. * * - write_wakeup_threshold - the amount of entropy in the input pool * below which write polls to /dev/random will unblock, requesting * more entropy, tied to the POOL_READY_BITS constant. It is writable * to avoid breaking old userspaces, but writing to it does not * change any behavior of the RNG. * * - urandom_min_reseed_secs - fixed to the value CRNG_RESEED_INTERVAL. * It is writable to avoid breaking old userspaces, but writing * to it does not change any behavior of the RNG. * ********************************************************************/ #ifdef CONFIG_SYSCTL #include <linux/sysctl.h> static int sysctl_random_min_urandom_seed = CRNG_RESEED_INTERVAL / HZ; static int sysctl_random_write_wakeup_bits = POOL_READY_BITS; static int sysctl_poolsize = POOL_BITS; static u8 sysctl_bootid[UUID_SIZE]; /* * This function is used to return both the bootid UUID, and random * UUID. The difference is in whether table->data is NULL; if it is, * then a new UUID is generated and returned to the user. */ static int proc_do_uuid(const struct ctl_table *table, int write, void *buf, size_t *lenp, loff_t *ppos) { u8 tmp_uuid[UUID_SIZE], *uuid; char uuid_string[UUID_STRING_LEN + 1]; struct ctl_table fake_table = { .data = uuid_string, .maxlen = UUID_STRING_LEN }; if (write) return -EPERM; uuid = table->data; if (!uuid) { uuid = tmp_uuid; generate_random_uuid(uuid); } else { static DEFINE_SPINLOCK(bootid_spinlock); spin_lock(&bootid_spinlock); if (!uuid[8]) generate_random_uuid(uuid); spin_unlock(&bootid_spinlock); } snprintf(uuid_string, sizeof(uuid_string), "%pU", uuid); return proc_dostring(&fake_table, 0, buf, lenp, ppos); } /* The same as proc_dointvec, but writes don't change anything. */ static int proc_do_rointvec(const struct ctl_table *table, int write, void *buf, size_t *lenp, loff_t *ppos) { return write ? 0 : proc_dointvec(table, 0, buf, lenp, ppos); } static const struct ctl_table random_table[] = { { .procname = "poolsize", .data = &sysctl_poolsize, .maxlen = sizeof(int), .mode = 0444, .proc_handler = proc_dointvec, }, { .procname = "entropy_avail", .data = &input_pool.init_bits, .maxlen = sizeof(int), .mode = 0444, .proc_handler = proc_dointvec, }, { .procname = "write_wakeup_threshold", .data = &sysctl_random_write_wakeup_bits, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_do_rointvec, }, { .procname = "urandom_min_reseed_secs", .data = &sysctl_random_min_urandom_seed, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_do_rointvec, }, { .procname = "boot_id", .data = &sysctl_bootid, .mode = 0444, .proc_handler = proc_do_uuid, }, { .procname = "uuid", .mode = 0444, .proc_handler = proc_do_uuid, }, }; /* * random_init() is called before sysctl_init(), * so we cannot call register_sysctl_init() in random_init() */ static int __init random_sysctls_init(void) { register_sysctl_init("kernel/random", random_table); return 0; } device_initcall(random_sysctls_init); #endif |
| 1 1 1 1 1 1 1 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 | // SPDX-License-Identifier: GPL-2.0 #include <linux/types.h> #include <linux/kconfig.h> #include <linux/list.h> #include <linux/security.h> #include <linux/umh.h> #include <linux/sysctl.h> #include <linux/module.h> #include "fallback.h" #include "firmware.h" /* * firmware fallback mechanism */ /* * use small loading timeout for caching devices' firmware because all these * firmware images have been loaded successfully at lease once, also system is * ready for completing firmware loading now. The maximum size of firmware in * current distributions is about 2M bytes, so 10 secs should be enough. */ void fw_fallback_set_cache_timeout(void) { fw_fallback_config.old_timeout = __firmware_loading_timeout(); __fw_fallback_set_timeout(10); } /* Restores the timeout to the value last configured during normal operation */ void fw_fallback_set_default_timeout(void) { __fw_fallback_set_timeout(fw_fallback_config.old_timeout); } static long firmware_loading_timeout(void) { return __firmware_loading_timeout() > 0 ? __firmware_loading_timeout() * HZ : MAX_JIFFY_OFFSET; } static inline int fw_sysfs_wait_timeout(struct fw_priv *fw_priv, long timeout) { return __fw_state_wait_common(fw_priv, timeout); } static LIST_HEAD(pending_fw_head); void kill_pending_fw_fallback_reqs(bool kill_all) { struct fw_priv *fw_priv; struct fw_priv *next; mutex_lock(&fw_lock); list_for_each_entry_safe(fw_priv, next, &pending_fw_head, pending_list) { if (kill_all || !fw_priv->need_uevent) __fw_load_abort(fw_priv); } if (kill_all) fw_load_abort_all = true; mutex_unlock(&fw_lock); } /** * fw_load_sysfs_fallback() - load a firmware via the sysfs fallback mechanism * @fw_sysfs: firmware sysfs information for the firmware to load * @timeout: timeout to wait for the load * * In charge of constructing a sysfs fallback interface for firmware loading. **/ static int fw_load_sysfs_fallback(struct fw_sysfs *fw_sysfs, long timeout) { int retval = 0; struct device *f_dev = &fw_sysfs->dev; struct fw_priv *fw_priv = fw_sysfs->fw_priv; /* fall back on userspace loading */ if (!fw_priv->data) fw_priv->is_paged_buf = true; dev_set_uevent_suppress(f_dev, true); retval = device_add(f_dev); if (retval) { dev_err(f_dev, "%s: device_register failed\n", __func__); goto err_put_dev; } mutex_lock(&fw_lock); if (fw_load_abort_all || fw_state_is_aborted(fw_priv)) { mutex_unlock(&fw_lock); retval = -EINTR; goto out; } list_add(&fw_priv->pending_list, &pending_fw_head); mutex_unlock(&fw_lock); if (fw_priv->opt_flags & FW_OPT_UEVENT) { fw_priv->need_uevent = true; dev_set_uevent_suppress(f_dev, false); dev_dbg(f_dev, "firmware: requesting %s\n", fw_priv->fw_name); kobject_uevent(&fw_sysfs->dev.kobj, KOBJ_ADD); } else { timeout = MAX_JIFFY_OFFSET; } retval = fw_sysfs_wait_timeout(fw_priv, timeout); if (retval < 0 && retval != -ENOENT) { mutex_lock(&fw_lock); fw_load_abort(fw_sysfs); mutex_unlock(&fw_lock); } if (fw_state_is_aborted(fw_priv)) { if (retval == -ERESTARTSYS) retval = -EINTR; } else if (fw_priv->is_paged_buf && !fw_priv->data) retval = -ENOMEM; out: device_del(f_dev); err_put_dev: put_device(f_dev); return retval; } static int fw_load_from_user_helper(struct firmware *firmware, const char *name, struct device *device, u32 opt_flags) { struct fw_sysfs *fw_sysfs; long timeout; int ret; timeout = firmware_loading_timeout(); if (opt_flags & FW_OPT_NOWAIT) { timeout = usermodehelper_read_lock_wait(timeout); if (!timeout) { dev_dbg(device, "firmware: %s loading timed out\n", name); return -EBUSY; } } else { ret = usermodehelper_read_trylock(); if (WARN_ON(ret)) { dev_err(device, "firmware: %s will not be loaded\n", name); return ret; } } fw_sysfs = fw_create_instance(firmware, name, device, opt_flags); if (IS_ERR(fw_sysfs)) { ret = PTR_ERR(fw_sysfs); goto out_unlock; } fw_sysfs->fw_priv = firmware->priv; ret = fw_load_sysfs_fallback(fw_sysfs, timeout); if (!ret) ret = assign_fw(firmware, device); out_unlock: usermodehelper_read_unlock(); return ret; } static bool fw_force_sysfs_fallback(u32 opt_flags) { if (fw_fallback_config.force_sysfs_fallback) return true; if (!(opt_flags & FW_OPT_USERHELPER)) return false; return true; } static bool fw_run_sysfs_fallback(u32 opt_flags) { int ret; if (fw_fallback_config.ignore_sysfs_fallback) { pr_info_once("Ignoring firmware sysfs fallback due to sysctl knob\n"); return false; } if ((opt_flags & FW_OPT_NOFALLBACK_SYSFS)) return false; /* Also permit LSMs and IMA to fail firmware sysfs fallback */ ret = security_kernel_load_data(LOADING_FIRMWARE, true); if (ret < 0) return false; return fw_force_sysfs_fallback(opt_flags); } /** * firmware_fallback_sysfs() - use the fallback mechanism to find firmware * @fw: pointer to firmware image * @name: name of firmware file to look for * @device: device for which firmware is being loaded * @opt_flags: options to control firmware loading behaviour, as defined by * &enum fw_opt * @ret: return value from direct lookup which triggered the fallback mechanism * * This function is called if direct lookup for the firmware failed, it enables * a fallback mechanism through userspace by exposing a sysfs loading * interface. Userspace is in charge of loading the firmware through the sysfs * loading interface. This sysfs fallback mechanism may be disabled completely * on a system by setting the proc sysctl value ignore_sysfs_fallback to true. * If this is false we check if the internal API caller set the * @FW_OPT_NOFALLBACK_SYSFS flag, if so it would also disable the fallback * mechanism. A system may want to enforce the sysfs fallback mechanism at all * times, it can do this by setting ignore_sysfs_fallback to false and * force_sysfs_fallback to true. * Enabling force_sysfs_fallback is functionally equivalent to build a kernel * with CONFIG_FW_LOADER_USER_HELPER_FALLBACK. **/ int firmware_fallback_sysfs(struct firmware *fw, const char *name, struct device *device, u32 opt_flags, int ret) { if (!fw_run_sysfs_fallback(opt_flags)) return ret; if (!(opt_flags & FW_OPT_NO_WARN)) dev_warn(device, "Falling back to sysfs fallback for: %s\n", name); else dev_dbg(device, "Falling back to sysfs fallback for: %s\n", name); return fw_load_from_user_helper(fw, name, device, opt_flags); } |
| 3 278 121 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 | /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _LINUX_CGROUP_NAMESPACE_H #define _LINUX_CGROUP_NAMESPACE_H #include <linux/ns_common.h> struct cgroup_namespace { struct ns_common ns; struct user_namespace *user_ns; struct ucounts *ucounts; struct css_set *root_cset; }; extern struct cgroup_namespace init_cgroup_ns; #ifdef CONFIG_CGROUPS static inline struct cgroup_namespace *to_cg_ns(struct ns_common *ns) { return container_of(ns, struct cgroup_namespace, ns); } void free_cgroup_ns(struct cgroup_namespace *ns); struct cgroup_namespace *copy_cgroup_ns(u64 flags, struct user_namespace *user_ns, struct cgroup_namespace *old_ns); int cgroup_path_ns(struct cgroup *cgrp, char *buf, size_t buflen, struct cgroup_namespace *ns); static inline void get_cgroup_ns(struct cgroup_namespace *ns) { ns_ref_inc(ns); } static inline void put_cgroup_ns(struct cgroup_namespace *ns) { if (ns_ref_put(ns)) free_cgroup_ns(ns); } #else /* !CONFIG_CGROUPS */ static inline void free_cgroup_ns(struct cgroup_namespace *ns) { } static inline struct cgroup_namespace * copy_cgroup_ns(u64 flags, struct user_namespace *user_ns, struct cgroup_namespace *old_ns) { return old_ns; } static inline void get_cgroup_ns(struct cgroup_namespace *ns) { } static inline void put_cgroup_ns(struct cgroup_namespace *ns) { } #endif /* !CONFIG_CGROUPS */ #endif /* _LINUX_CGROUP_NAMESPACE_H */ |
| 3 10 10 10 10 10 10 10 10 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 | // SPDX-License-Identifier: GPL-2.0 /* * buffered writeback throttling. loosely based on CoDel. We can't drop * packets for IO scheduling, so the logic is something like this: * * - Monitor latencies in a defined window of time. * - If the minimum latency in the above window exceeds some target, increment * scaling step and scale down queue depth by a factor of 2x. The monitoring * window is then shrunk to 100 / sqrt(scaling step + 1). * - For any window where we don't have solid data on what the latencies * look like, retain status quo. * - If latencies look good, decrement scaling step. * - If we're only doing writes, allow the scaling step to go negative. This * will temporarily boost write performance, snapping back to a stable * scaling step of 0 if reads show up or the heavy writers finish. Unlike * positive scaling steps where we shrink the monitoring window, a negative * scaling step retains the default step==0 window size. * * Copyright (C) 2016 Jens Axboe * */ #include <linux/kernel.h> #include <linux/blk_types.h> #include <linux/slab.h> #include <linux/backing-dev.h> #include <linux/swap.h> #include "blk-stat.h" #include "blk-wbt.h" #include "blk-rq-qos.h" #include "elevator.h" #include "blk.h" #define CREATE_TRACE_POINTS #include <trace/events/wbt.h> enum wbt_flags { WBT_TRACKED = 1, /* write, tracked for throttling */ WBT_READ = 2, /* read */ WBT_SWAP = 4, /* write, from swap_writeout() */ WBT_DISCARD = 8, /* discard */ WBT_NR_BITS = 4, /* number of bits */ }; enum { WBT_RWQ_BG = 0, WBT_RWQ_SWAP, WBT_RWQ_DISCARD, WBT_NUM_RWQ, }; /* * If current state is WBT_STATE_ON/OFF_DEFAULT, it can be covered to any other * state, if current state is WBT_STATE_ON/OFF_MANUAL, it can only be covered * to WBT_STATE_OFF/ON_MANUAL. */ enum { WBT_STATE_ON_DEFAULT = 1, /* on by default */ WBT_STATE_ON_MANUAL = 2, /* on manually by sysfs */ WBT_STATE_OFF_DEFAULT = 3, /* off by default */ WBT_STATE_OFF_MANUAL = 4, /* off manually by sysfs */ }; struct rq_wb { /* * Settings that govern how we throttle */ unsigned int wb_background; /* background writeback */ unsigned int wb_normal; /* normal writeback */ short enable_state; /* WBT_STATE_* */ /* * Number of consecutive periods where we don't have enough * information to make a firm scale up/down decision. */ unsigned int unknown_cnt; u64 win_nsec; /* default window size */ u64 cur_win_nsec; /* current window size */ struct blk_stat_callback *cb; u64 sync_issue; void *sync_cookie; unsigned long last_issue; /* issue time of last read rq */ unsigned long last_comp; /* completion time of last read rq */ unsigned long min_lat_nsec; struct rq_qos rqos; struct rq_wait rq_wait[WBT_NUM_RWQ]; struct rq_depth rq_depth; }; static inline struct rq_wb *RQWB(struct rq_qos *rqos) { return container_of(rqos, struct rq_wb, rqos); } static inline void wbt_clear_state(struct request *rq) { rq->wbt_flags = 0; } static inline enum wbt_flags wbt_flags(struct request *rq) { return rq->wbt_flags; } static inline bool wbt_is_tracked(struct request *rq) { return rq->wbt_flags & WBT_TRACKED; } static inline bool wbt_is_read(struct request *rq) { return rq->wbt_flags & WBT_READ; } enum { /* * Default setting, we'll scale up (to 75% of QD max) or down (min 1) * from here depending on device stats */ RWB_DEF_DEPTH = 16, /* * 100msec window */ RWB_WINDOW_NSEC = 100 * 1000 * 1000ULL, /* * Disregard stats, if we don't meet this minimum */ RWB_MIN_WRITE_SAMPLES = 3, /* * If we have this number of consecutive windows without enough * information to scale up or down, slowly return to center state * (step == 0). */ RWB_UNKNOWN_BUMP = 5, }; static inline bool rwb_enabled(struct rq_wb *rwb) { return rwb && rwb->enable_state != WBT_STATE_OFF_DEFAULT && rwb->enable_state != WBT_STATE_OFF_MANUAL; } static void wb_timestamp(struct rq_wb *rwb, unsigned long *var) { if (rwb_enabled(rwb)) { const unsigned long cur = jiffies; if (cur != *var) *var = cur; } } /* * If a task was rate throttled in balance_dirty_pages() within the last * second or so, use that to indicate a higher cleaning rate. */ static bool wb_recent_wait(struct rq_wb *rwb) { struct backing_dev_info *bdi = rwb->rqos.disk->bdi; return time_before(jiffies, bdi->last_bdp_sleep + HZ); } static inline struct rq_wait *get_rq_wait(struct rq_wb *rwb, enum wbt_flags wb_acct) { if (wb_acct & WBT_SWAP) return &rwb->rq_wait[WBT_RWQ_SWAP]; else if (wb_acct & WBT_DISCARD) return &rwb->rq_wait[WBT_RWQ_DISCARD]; return &rwb->rq_wait[WBT_RWQ_BG]; } static void rwb_wake_all(struct rq_wb *rwb) { int i; for (i = 0; i < WBT_NUM_RWQ; i++) { struct rq_wait *rqw = &rwb->rq_wait[i]; if (wq_has_sleeper(&rqw->wait)) wake_up_all(&rqw->wait); } } static void wbt_rqw_done(struct rq_wb *rwb, struct rq_wait *rqw, enum wbt_flags wb_acct) { int inflight, limit; inflight = atomic_dec_return(&rqw->inflight); /* * For discards, our limit is always the background. For writes, if * the device does write back caching, drop further down before we * wake people up. */ if (wb_acct & WBT_DISCARD) limit = rwb->wb_background; else if (blk_queue_write_cache(rwb->rqos.disk->queue) && !wb_recent_wait(rwb)) limit = 0; else limit = rwb->wb_normal; /* * Don't wake anyone up if we are above the normal limit. */ if (inflight && inflight >= limit) return; if (wq_has_sleeper(&rqw->wait)) { int diff = limit - inflight; if (!inflight || diff >= rwb->wb_background / 2) wake_up_all(&rqw->wait); } } static void __wbt_done(struct rq_qos *rqos, enum wbt_flags wb_acct) { struct rq_wb *rwb = RQWB(rqos); struct rq_wait *rqw; if (!(wb_acct & WBT_TRACKED)) return; rqw = get_rq_wait(rwb, wb_acct); wbt_rqw_done(rwb, rqw, wb_acct); } /* * Called on completion of a request. Note that it's also called when * a request is merged, when the request gets freed. */ static void wbt_done(struct rq_qos *rqos, struct request *rq) { struct rq_wb *rwb = RQWB(rqos); if (!wbt_is_tracked(rq)) { if (wbt_is_read(rq)) { if (rwb->sync_cookie == rq) { rwb->sync_issue = 0; rwb->sync_cookie = NULL; } wb_timestamp(rwb, &rwb->last_comp); } } else { WARN_ON_ONCE(rq == rwb->sync_cookie); __wbt_done(rqos, wbt_flags(rq)); } wbt_clear_state(rq); } static inline bool stat_sample_valid(struct blk_rq_stat *stat) { /* * We need at least one read sample, and a minimum of * RWB_MIN_WRITE_SAMPLES. We require some write samples to know * that it's writes impacting us, and not just some sole read on * a device that is in a lower power state. */ return (stat[READ].nr_samples >= 1 && stat[WRITE].nr_samples >= RWB_MIN_WRITE_SAMPLES); } static u64 rwb_sync_issue_lat(struct rq_wb *rwb) { u64 issue = READ_ONCE(rwb->sync_issue); if (!issue || !rwb->sync_cookie) return 0; return blk_time_get_ns() - issue; } static inline unsigned int wbt_inflight(struct rq_wb *rwb) { unsigned int i, ret = 0; for (i = 0; i < WBT_NUM_RWQ; i++) ret += atomic_read(&rwb->rq_wait[i].inflight); return ret; } enum { LAT_OK = 1, LAT_UNKNOWN, LAT_UNKNOWN_WRITES, LAT_EXCEEDED, }; static int latency_exceeded(struct rq_wb *rwb, struct blk_rq_stat *stat) { struct backing_dev_info *bdi = rwb->rqos.disk->bdi; struct rq_depth *rqd = &rwb->rq_depth; u64 thislat; /* * If our stored sync issue exceeds the window size, or it * exceeds our min target AND we haven't logged any entries, * flag the latency as exceeded. wbt works off completion latencies, * but for a flooded device, a single sync IO can take a long time * to complete after being issued. If this time exceeds our * monitoring window AND we didn't see any other completions in that * window, then count that sync IO as a violation of the latency. */ thislat = rwb_sync_issue_lat(rwb); if (thislat > rwb->cur_win_nsec || (thislat > rwb->min_lat_nsec && !stat[READ].nr_samples)) { trace_wbt_lat(bdi, thislat); return LAT_EXCEEDED; } /* * No read/write mix, if stat isn't valid */ if (!stat_sample_valid(stat)) { /* * If we had writes in this stat window and the window is * current, we're only doing writes. If a task recently * waited or still has writes in flights, consider us doing * just writes as well. */ if (stat[WRITE].nr_samples || wb_recent_wait(rwb) || wbt_inflight(rwb)) return LAT_UNKNOWN_WRITES; return LAT_UNKNOWN; } /* * If the 'min' latency exceeds our target, step down. */ if (stat[READ].min > rwb->min_lat_nsec) { trace_wbt_lat(bdi, stat[READ].min); trace_wbt_stat(bdi, stat); return LAT_EXCEEDED; } if (rqd->scale_step) trace_wbt_stat(bdi, stat); return LAT_OK; } static void rwb_trace_step(struct rq_wb *rwb, const char *msg) { struct backing_dev_info *bdi = rwb->rqos.disk->bdi; struct rq_depth *rqd = &rwb->rq_depth; trace_wbt_step(bdi, msg, rqd->scale_step, rwb->cur_win_nsec, rwb->wb_background, rwb->wb_normal, rqd->max_depth); } static void calc_wb_limits(struct rq_wb *rwb) { if (rwb->min_lat_nsec == 0) { rwb->wb_normal = rwb->wb_background = 0; } else if (rwb->rq_depth.max_depth <= 2) { rwb->wb_normal = rwb->rq_depth.max_depth; rwb->wb_background = 1; } else { rwb->wb_normal = (rwb->rq_depth.max_depth + 1) / 2; rwb->wb_background = (rwb->rq_depth.max_depth + 3) / 4; } } static void scale_up(struct rq_wb *rwb) { if (!rq_depth_scale_up(&rwb->rq_depth)) return; calc_wb_limits(rwb); rwb->unknown_cnt = 0; rwb_wake_all(rwb); rwb_trace_step(rwb, tracepoint_string("scale up")); } static void scale_down(struct rq_wb *rwb, bool hard_throttle) { if (!rq_depth_scale_down(&rwb->rq_depth, hard_throttle)) return; calc_wb_limits(rwb); rwb->unknown_cnt = 0; rwb_trace_step(rwb, tracepoint_string("scale down")); } static void rwb_arm_timer(struct rq_wb *rwb) { struct rq_depth *rqd = &rwb->rq_depth; if (rqd->scale_step > 0) { /* * We should speed this up, using some variant of a fast * integer inverse square root calculation. Since we only do * this for every window expiration, it's not a huge deal, * though. */ rwb->cur_win_nsec = div_u64(rwb->win_nsec << 4, int_sqrt((rqd->scale_step + 1) << 8)); } else { /* * For step < 0, we don't want to increase/decrease the * window size. */ rwb->cur_win_nsec = rwb->win_nsec; } blk_stat_activate_nsecs(rwb->cb, rwb->cur_win_nsec); } static void wb_timer_fn(struct blk_stat_callback *cb) { struct rq_wb *rwb = cb->data; struct rq_depth *rqd = &rwb->rq_depth; unsigned int inflight = wbt_inflight(rwb); int status; if (!rwb->rqos.disk) return; status = latency_exceeded(rwb, cb->stat); trace_wbt_timer(rwb->rqos.disk->bdi, status, rqd->scale_step, inflight); /* * If we exceeded the latency target, step down. If we did not, * step one level up. If we don't know enough to say either exceeded * or ok, then don't do anything. */ switch (status) { case LAT_EXCEEDED: scale_down(rwb, true); break; case LAT_OK: scale_up(rwb); break; case LAT_UNKNOWN_WRITES: /* * We don't have a valid read/write sample, but we do have * writes going on. Allow step to go negative, to increase * write performance. */ scale_up(rwb); break; case LAT_UNKNOWN: if (++rwb->unknown_cnt < RWB_UNKNOWN_BUMP) break; /* * We get here when previously scaled reduced depth, and we * currently don't have a valid read/write sample. For that * case, slowly return to center state (step == 0). */ if (rqd->scale_step > 0) scale_up(rwb); else if (rqd->scale_step < 0) scale_down(rwb, false); break; default: break; } /* * Re-arm timer, if we have IO in flight */ if (rqd->scale_step || inflight) rwb_arm_timer(rwb); } static void wbt_update_limits(struct rq_wb *rwb) { struct rq_depth *rqd = &rwb->rq_depth; rqd->scale_step = 0; rqd->scaled_max = false; rq_depth_calc_max_depth(rqd); calc_wb_limits(rwb); rwb_wake_all(rwb); } bool wbt_disabled(struct request_queue *q) { struct rq_qos *rqos = wbt_rq_qos(q); return !rqos || !rwb_enabled(RQWB(rqos)); } u64 wbt_get_min_lat(struct request_queue *q) { struct rq_qos *rqos = wbt_rq_qos(q); if (!rqos) return 0; return RQWB(rqos)->min_lat_nsec; } void wbt_set_min_lat(struct request_queue *q, u64 val) { struct rq_qos *rqos = wbt_rq_qos(q); if (!rqos) return; RQWB(rqos)->min_lat_nsec = val; if (val) RQWB(rqos)->enable_state = WBT_STATE_ON_MANUAL; else RQWB(rqos)->enable_state = WBT_STATE_OFF_MANUAL; wbt_update_limits(RQWB(rqos)); } static bool close_io(struct rq_wb *rwb) { const unsigned long now = jiffies; return time_before(now, rwb->last_issue + HZ / 10) || time_before(now, rwb->last_comp + HZ / 10); } #define REQ_HIPRIO (REQ_SYNC | REQ_META | REQ_PRIO | REQ_SWAP) static inline unsigned int get_limit(struct rq_wb *rwb, blk_opf_t opf) { unsigned int limit; if ((opf & REQ_OP_MASK) == REQ_OP_DISCARD) return rwb->wb_background; /* * At this point we know it's a buffered write. If this is * swap trying to free memory, or REQ_SYNC is set, then * it's WB_SYNC_ALL writeback, and we'll use the max limit for * that. If the write is marked as a background write, then use * the idle limit, or go to normal if we haven't had competing * IO for a bit. */ if ((opf & REQ_HIPRIO) || wb_recent_wait(rwb)) limit = rwb->rq_depth.max_depth; else if ((opf & REQ_BACKGROUND) || close_io(rwb)) { /* * If less than 100ms since we completed unrelated IO, * limit us to half the depth for background writeback. */ limit = rwb->wb_background; } else limit = rwb->wb_normal; return limit; } struct wbt_wait_data { struct rq_wb *rwb; enum wbt_flags wb_acct; blk_opf_t opf; }; static bool wbt_inflight_cb(struct rq_wait *rqw, void *private_data) { struct wbt_wait_data *data = private_data; return rq_wait_inc_below(rqw, get_limit(data->rwb, data->opf)); } static void wbt_cleanup_cb(struct rq_wait *rqw, void *private_data) { struct wbt_wait_data *data = private_data; wbt_rqw_done(data->rwb, rqw, data->wb_acct); } /* * Block if we will exceed our limit, or if we are currently waiting for * the timer to kick off queuing again. */ static void __wbt_wait(struct rq_wb *rwb, enum wbt_flags wb_acct, blk_opf_t opf) { struct rq_wait *rqw = get_rq_wait(rwb, wb_acct); struct wbt_wait_data data = { .rwb = rwb, .wb_acct = wb_acct, .opf = opf, }; rq_qos_wait(rqw, &data, wbt_inflight_cb, wbt_cleanup_cb); } static inline bool wbt_should_throttle(struct bio *bio) { switch (bio_op(bio)) { case REQ_OP_WRITE: /* * Don't throttle WRITE_ODIRECT */ if ((bio->bi_opf & (REQ_SYNC | REQ_IDLE)) == (REQ_SYNC | REQ_IDLE)) return false; fallthrough; case REQ_OP_DISCARD: return true; default: return false; } } static enum wbt_flags bio_to_wbt_flags(struct rq_wb *rwb, struct bio *bio) { enum wbt_flags flags = 0; if (!rwb_enabled(rwb)) return 0; if (bio_op(bio) == REQ_OP_READ) { flags = WBT_READ; } else if (wbt_should_throttle(bio)) { if (bio->bi_opf & REQ_SWAP) flags |= WBT_SWAP; if (bio_op(bio) == REQ_OP_DISCARD) flags |= WBT_DISCARD; flags |= WBT_TRACKED; } return flags; } static void wbt_cleanup(struct rq_qos *rqos, struct bio *bio) { struct rq_wb *rwb = RQWB(rqos); enum wbt_flags flags = bio_to_wbt_flags(rwb, bio); __wbt_done(rqos, flags); } /* May sleep, if we have exceeded the writeback limits. */ static void wbt_wait(struct rq_qos *rqos, struct bio *bio) { struct rq_wb *rwb = RQWB(rqos); enum wbt_flags flags; flags = bio_to_wbt_flags(rwb, bio); if (!(flags & WBT_TRACKED)) { if (flags & WBT_READ) wb_timestamp(rwb, &rwb->last_issue); return; } __wbt_wait(rwb, flags, bio->bi_opf); if (!blk_stat_is_active(rwb->cb)) rwb_arm_timer(rwb); } static void wbt_track(struct rq_qos *rqos, struct request *rq, struct bio *bio) { struct rq_wb *rwb = RQWB(rqos); rq->wbt_flags |= bio_to_wbt_flags(rwb, bio); } static void wbt_issue(struct rq_qos *rqos, struct request *rq) { struct rq_wb *rwb = RQWB(rqos); if (!rwb_enabled(rwb)) return; /* * Track sync issue, in case it takes a long time to complete. Allows us * to react quicker, if a sync IO takes a long time to complete. Note * that this is just a hint. The request can go away when it completes, * so it's important we never dereference it. We only use the address to * compare with, which is why we store the sync_issue time locally. */ if (wbt_is_read(rq) && !rwb->sync_issue) { rwb->sync_cookie = rq; rwb->sync_issue = rq->io_start_time_ns; } } static void wbt_requeue(struct rq_qos *rqos, struct request *rq) { struct rq_wb *rwb = RQWB(rqos); if (!rwb_enabled(rwb)) return; if (rq == rwb->sync_cookie) { rwb->sync_issue = 0; rwb->sync_cookie = NULL; } } /* * Enable wbt if defaults are configured that way */ void wbt_enable_default(struct gendisk *disk) { struct request_queue *q = disk->queue; struct rq_qos *rqos; bool enable = IS_ENABLED(CONFIG_BLK_WBT_MQ); mutex_lock(&disk->rqos_state_mutex); if (blk_queue_disable_wbt(q)) enable = false; /* Throttling already enabled? */ rqos = wbt_rq_qos(q); if (rqos) { if (enable && RQWB(rqos)->enable_state == WBT_STATE_OFF_DEFAULT) RQWB(rqos)->enable_state = WBT_STATE_ON_DEFAULT; mutex_unlock(&disk->rqos_state_mutex); return; } mutex_unlock(&disk->rqos_state_mutex); /* Queue not registered? Maybe shutting down... */ if (!blk_queue_registered(q)) return; if (queue_is_mq(q) && enable) wbt_init(disk); } EXPORT_SYMBOL_GPL(wbt_enable_default); u64 wbt_default_latency_nsec(struct request_queue *q) { /* * We default to 2msec for non-rotational storage, and 75msec * for rotational storage. */ if (blk_queue_nonrot(q)) return 2000000ULL; else return 75000000ULL; } static int wbt_data_dir(const struct request *rq) { const enum req_op op = req_op(rq); if (op == REQ_OP_READ) return READ; else if (op_is_write(op)) return WRITE; /* don't account */ return -1; } static void wbt_queue_depth_changed(struct rq_qos *rqos) { RQWB(rqos)->rq_depth.queue_depth = blk_queue_depth(rqos->disk->queue); wbt_update_limits(RQWB(rqos)); } static void wbt_exit(struct rq_qos *rqos) { struct rq_wb *rwb = RQWB(rqos); blk_stat_remove_callback(rqos->disk->queue, rwb->cb); blk_stat_free_callback(rwb->cb); kfree(rwb); } /* * Disable wbt, if enabled by default. */ void wbt_disable_default(struct gendisk *disk) { struct rq_qos *rqos = wbt_rq_qos(disk->queue); struct rq_wb *rwb; if (!rqos) return; mutex_lock(&disk->rqos_state_mutex); rwb = RQWB(rqos); if (rwb->enable_state == WBT_STATE_ON_DEFAULT) { blk_stat_deactivate(rwb->cb); rwb->enable_state = WBT_STATE_OFF_DEFAULT; } mutex_unlock(&disk->rqos_state_mutex); } EXPORT_SYMBOL_GPL(wbt_disable_default); #ifdef CONFIG_BLK_DEBUG_FS static int wbt_curr_win_nsec_show(void *data, struct seq_file *m) { struct rq_qos *rqos = data; struct rq_wb *rwb = RQWB(rqos); seq_printf(m, "%llu\n", rwb->cur_win_nsec); return 0; } static int wbt_enabled_show(void *data, struct seq_file *m) { struct rq_qos *rqos = data; struct rq_wb *rwb = RQWB(rqos); seq_printf(m, "%d\n", rwb->enable_state); return 0; } static int wbt_id_show(void *data, struct seq_file *m) { struct rq_qos *rqos = data; seq_printf(m, "%u\n", rqos->id); return 0; } static int wbt_inflight_show(void *data, struct seq_file *m) { struct rq_qos *rqos = data; struct rq_wb *rwb = RQWB(rqos); int i; for (i = 0; i < WBT_NUM_RWQ; i++) seq_printf(m, "%d: inflight %d\n", i, atomic_read(&rwb->rq_wait[i].inflight)); return 0; } static int wbt_min_lat_nsec_show(void *data, struct seq_file *m) { struct rq_qos *rqos = data; struct rq_wb *rwb = RQWB(rqos); seq_printf(m, "%lu\n", rwb->min_lat_nsec); return 0; } static int wbt_unknown_cnt_show(void *data, struct seq_file *m) { struct rq_qos *rqos = data; struct rq_wb *rwb = RQWB(rqos); seq_printf(m, "%u\n", rwb->unknown_cnt); return 0; } static int wbt_normal_show(void *data, struct seq_file *m) { struct rq_qos *rqos = data; struct rq_wb *rwb = RQWB(rqos); seq_printf(m, "%u\n", rwb->wb_normal); return 0; } static int wbt_background_show(void *data, struct seq_file *m) { struct rq_qos *rqos = data; struct rq_wb *rwb = RQWB(rqos); seq_printf(m, "%u\n", rwb->wb_background); return 0; } static const struct blk_mq_debugfs_attr wbt_debugfs_attrs[] = { {"curr_win_nsec", 0400, wbt_curr_win_nsec_show}, {"enabled", 0400, wbt_enabled_show}, {"id", 0400, wbt_id_show}, {"inflight", 0400, wbt_inflight_show}, {"min_lat_nsec", 0400, wbt_min_lat_nsec_show}, {"unknown_cnt", 0400, wbt_unknown_cnt_show}, {"wb_normal", 0400, wbt_normal_show}, {"wb_background", 0400, wbt_background_show}, {}, }; #endif static const struct rq_qos_ops wbt_rqos_ops = { .throttle = wbt_wait, .issue = wbt_issue, .track = wbt_track, .requeue = wbt_requeue, .done = wbt_done, .cleanup = wbt_cleanup, .queue_depth_changed = wbt_queue_depth_changed, .exit = wbt_exit, #ifdef CONFIG_BLK_DEBUG_FS .debugfs_attrs = wbt_debugfs_attrs, #endif }; int wbt_init(struct gendisk *disk) { struct request_queue *q = disk->queue; struct rq_wb *rwb; int i; int ret; rwb = kzalloc(sizeof(*rwb), GFP_KERNEL); if (!rwb) return -ENOMEM; rwb->cb = blk_stat_alloc_callback(wb_timer_fn, wbt_data_dir, 2, rwb); if (!rwb->cb) { kfree(rwb); return -ENOMEM; } for (i = 0; i < WBT_NUM_RWQ; i++) rq_wait_init(&rwb->rq_wait[i]); rwb->last_comp = rwb->last_issue = jiffies; rwb->win_nsec = RWB_WINDOW_NSEC; rwb->enable_state = WBT_STATE_ON_DEFAULT; rwb->rq_depth.default_depth = RWB_DEF_DEPTH; rwb->min_lat_nsec = wbt_default_latency_nsec(q); rwb->rq_depth.queue_depth = blk_queue_depth(q); wbt_update_limits(rwb); /* * Assign rwb and add the stats callback. */ mutex_lock(&q->rq_qos_mutex); ret = rq_qos_add(&rwb->rqos, disk, RQ_QOS_WBT, &wbt_rqos_ops); mutex_unlock(&q->rq_qos_mutex); if (ret) goto err_free; blk_stat_add_callback(q, rwb->cb); return 0; err_free: blk_stat_free_callback(rwb->cb); kfree(rwb); return ret; } |
| 2047 2051 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 | // SPDX-License-Identifier: GPL-2.0-only /* * A generic implementation of binary search for the Linux kernel * * Copyright (C) 2008-2009 Ksplice, Inc. * Author: Tim Abbott <tabbott@ksplice.com> */ #include <linux/export.h> #include <linux/bsearch.h> #include <linux/kprobes.h> /* * bsearch - binary search an array of elements * @key: pointer to item being searched for * @base: pointer to first element to search * @num: number of elements * @size: size of each element * @cmp: pointer to comparison function * * This function does a binary search on the given array. The * contents of the array should already be in ascending sorted order * under the provided comparison function. * * Note that the key need not have the same type as the elements in * the array, e.g. key could be a string and the comparison function * could compare the string with the struct's name field. However, if * the key and elements in the array are of the same type, you can use * the same comparison function for both sort() and bsearch(). */ void *bsearch(const void *key, const void *base, size_t num, size_t size, cmp_func_t cmp) { return __inline_bsearch(key, base, num, size, cmp); } EXPORT_SYMBOL(bsearch); NOKPROBE_SYMBOL(bsearch); |
| 24 24 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 | // SPDX-License-Identifier: GPL-2.0-or-later /* Socket buffer accounting * * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved. * Written by David Howells (dhowells@redhat.com) */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include <linux/module.h> #include <linux/net.h> #include <linux/skbuff.h> #include <net/sock.h> #include <net/af_rxrpc.h> #include "ar-internal.h" #define select_skb_count(skb) (&rxrpc_n_rx_skbs) /* * Note the allocation or reception of a socket buffer. */ void rxrpc_new_skb(struct sk_buff *skb, enum rxrpc_skb_trace why) { int n = atomic_inc_return(select_skb_count(skb)); trace_rxrpc_skb(skb, refcount_read(&skb->users), n, why); } /* * Note the re-emergence of a socket buffer from a queue or buffer. */ void rxrpc_see_skb(struct sk_buff *skb, enum rxrpc_skb_trace why) { if (skb) { int n = atomic_read(select_skb_count(skb)); trace_rxrpc_skb(skb, refcount_read(&skb->users), n, why); } } /* * Note the addition of a ref on a socket buffer. */ void rxrpc_get_skb(struct sk_buff *skb, enum rxrpc_skb_trace why) { int n = atomic_inc_return(select_skb_count(skb)); trace_rxrpc_skb(skb, refcount_read(&skb->users), n, why); skb_get(skb); } /* * Note the dropping of a ref on a socket buffer by the core. */ void rxrpc_eaten_skb(struct sk_buff *skb, enum rxrpc_skb_trace why) { int n = atomic_inc_return(&rxrpc_n_rx_skbs); trace_rxrpc_skb(skb, 0, n, why); } /* * Note the destruction of a socket buffer. */ void rxrpc_free_skb(struct sk_buff *skb, enum rxrpc_skb_trace why) { if (skb) { int n = atomic_dec_return(select_skb_count(skb)); trace_rxrpc_skb(skb, refcount_read(&skb->users), n, why); consume_skb(skb); } } /* * Clear a queue of socket buffers. */ void rxrpc_purge_queue(struct sk_buff_head *list) { struct sk_buff *skb; while ((skb = skb_dequeue((list))) != NULL) { int n = atomic_dec_return(select_skb_count(skb)); trace_rxrpc_skb(skb, refcount_read(&skb->users), n, rxrpc_skb_put_purge); consume_skb(skb); } } |
| 54 54 33 33 2 19 21 16 3 1 1 2 16 16 2 2 14 16 14 4 12 4 2 12 12 12 12 12 45 88 1 1 125 11 115 27 4 23 21 21 21 21 21 21 21 21 33 33 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 | // SPDX-License-Identifier: GPL-2.0-or-later /* * IPv6 fragment reassembly for connection tracking * * Copyright (C)2004 USAGI/WIDE Project * * Author: * Yasuyuki Kozakai @USAGI <yasuyuki.kozakai@toshiba.co.jp> * * Based on: net/ipv6/reassembly.c */ #define pr_fmt(fmt) "IPv6-nf: " fmt #include <linux/errno.h> #include <linux/types.h> #include <linux/string.h> #include <linux/net.h> #include <linux/netdevice.h> #include <linux/ipv6.h> #include <linux/slab.h> #include <net/ipv6_frag.h> #include <net/netfilter/ipv6/nf_conntrack_ipv6.h> #include <linux/sysctl.h> #include <linux/netfilter.h> #include <linux/netfilter_ipv6.h> #include <linux/kernel.h> #include <linux/module.h> #include <net/netfilter/ipv6/nf_defrag_ipv6.h> #include <net/netns/generic.h> static const char nf_frags_cache_name[] = "nf-frags"; static unsigned int nf_frag_pernet_id __read_mostly; static struct inet_frags nf_frags; static struct nft_ct_frag6_pernet *nf_frag_pernet(struct net *net) { return net_generic(net, nf_frag_pernet_id); } #ifdef CONFIG_SYSCTL static struct ctl_table nf_ct_frag6_sysctl_table[] = { { .procname = "nf_conntrack_frag6_timeout", .maxlen = sizeof(unsigned int), .mode = 0644, .proc_handler = proc_dointvec_jiffies, }, { .procname = "nf_conntrack_frag6_low_thresh", .maxlen = sizeof(unsigned long), .mode = 0644, .proc_handler = proc_doulongvec_minmax, }, { .procname = "nf_conntrack_frag6_high_thresh", .maxlen = sizeof(unsigned long), .mode = 0644, .proc_handler = proc_doulongvec_minmax, }, }; static int nf_ct_frag6_sysctl_register(struct net *net) { struct nft_ct_frag6_pernet *nf_frag; struct ctl_table *table; struct ctl_table_header *hdr; table = nf_ct_frag6_sysctl_table; if (!net_eq(net, &init_net)) { table = kmemdup(table, sizeof(nf_ct_frag6_sysctl_table), GFP_KERNEL); if (table == NULL) goto err_alloc; } nf_frag = nf_frag_pernet(net); table[0].data = &nf_frag->fqdir->timeout; table[1].data = &nf_frag->fqdir->low_thresh; table[1].extra2 = &nf_frag->fqdir->high_thresh; table[2].data = &nf_frag->fqdir->high_thresh; table[2].extra1 = &nf_frag->fqdir->low_thresh; hdr = register_net_sysctl_sz(net, "net/netfilter", table, ARRAY_SIZE(nf_ct_frag6_sysctl_table)); if (hdr == NULL) goto err_reg; nf_frag->nf_frag_frags_hdr = hdr; return 0; err_reg: if (!net_eq(net, &init_net)) kfree(table); err_alloc: return -ENOMEM; } static void __net_exit nf_ct_frags6_sysctl_unregister(struct net *net) { struct nft_ct_frag6_pernet *nf_frag = nf_frag_pernet(net); const struct ctl_table *table; table = nf_frag->nf_frag_frags_hdr->ctl_table_arg; unregister_net_sysctl_table(nf_frag->nf_frag_frags_hdr); if (!net_eq(net, &init_net)) kfree(table); } #else static int nf_ct_frag6_sysctl_register(struct net *net) { return 0; } static void __net_exit nf_ct_frags6_sysctl_unregister(struct net *net) { } #endif static int nf_ct_frag6_reasm(struct frag_queue *fq, struct sk_buff *skb, struct sk_buff *prev_tail, struct net_device *dev, int *refs); static inline u8 ip6_frag_ecn(const struct ipv6hdr *ipv6h) { return 1 << (ipv6_get_dsfield(ipv6h) & INET_ECN_MASK); } static void nf_ct_frag6_expire(struct timer_list *t) { struct inet_frag_queue *frag = timer_container_of(frag, t, timer); struct frag_queue *fq; fq = container_of(frag, struct frag_queue, q); ip6frag_expire_frag_queue(fq->q.fqdir->net, fq); } /* Creation primitives. */ static struct frag_queue *fq_find(struct net *net, __be32 id, u32 user, const struct ipv6hdr *hdr, int iif) { struct nft_ct_frag6_pernet *nf_frag = nf_frag_pernet(net); struct frag_v6_compare_key key = { .id = id, .saddr = hdr->saddr, .daddr = hdr->daddr, .user = user, .iif = iif, }; struct inet_frag_queue *q; if (!(ipv6_addr_type(&hdr->daddr) & (IPV6_ADDR_MULTICAST | IPV6_ADDR_LINKLOCAL))) key.iif = 0; q = inet_frag_find(nf_frag->fqdir, &key); if (!q) return NULL; return container_of(q, struct frag_queue, q); } static int nf_ct_frag6_queue(struct frag_queue *fq, struct sk_buff *skb, const struct frag_hdr *fhdr, int nhoff, int *refs) { unsigned int payload_len; struct net_device *dev; struct sk_buff *prev; int offset, end, err; u8 ecn; if (fq->q.flags & INET_FRAG_COMPLETE) { pr_debug("Already completed\n"); goto err; } payload_len = ntohs(ipv6_hdr(skb)->payload_len); offset = ntohs(fhdr->frag_off) & ~0x7; end = offset + (payload_len - ((u8 *)(fhdr + 1) - (u8 *)(ipv6_hdr(skb) + 1))); if ((unsigned int)end > IPV6_MAXPLEN) { pr_debug("offset is too large.\n"); return -EINVAL; } ecn = ip6_frag_ecn(ipv6_hdr(skb)); if (skb->ip_summed == CHECKSUM_COMPLETE) { const unsigned char *nh = skb_network_header(skb); skb->csum = csum_sub(skb->csum, csum_partial(nh, (u8 *)(fhdr + 1) - nh, 0)); } /* Is this the final fragment? */ if (!(fhdr->frag_off & htons(IP6_MF))) { /* If we already have some bits beyond end * or have different end, the segment is corrupted. */ if (end < fq->q.len || ((fq->q.flags & INET_FRAG_LAST_IN) && end != fq->q.len)) { pr_debug("already received last fragment\n"); goto err; } fq->q.flags |= INET_FRAG_LAST_IN; fq->q.len = end; } else { /* Check if the fragment is rounded to 8 bytes. * Required by the RFC. */ if (end & 0x7) { /* RFC2460 says always send parameter problem in * this case. -DaveM */ pr_debug("end of fragment not rounded to 8 bytes.\n"); inet_frag_kill(&fq->q, refs); return -EPROTO; } if (end > fq->q.len) { /* Some bits beyond end -> corruption. */ if (fq->q.flags & INET_FRAG_LAST_IN) { pr_debug("last packet already reached.\n"); goto err; } fq->q.len = end; } } if (end == offset) goto err; /* Point into the IP datagram 'data' part. */ if (!pskb_pull(skb, (u8 *) (fhdr + 1) - skb->data)) { pr_debug("queue: message is too short.\n"); goto err; } if (pskb_trim_rcsum(skb, end - offset)) { pr_debug("Can't trim\n"); goto err; } /* Note : skb->rbnode and skb->dev share the same location. */ dev = skb->dev; /* Makes sure compiler wont do silly aliasing games */ barrier(); prev = fq->q.fragments_tail; err = inet_frag_queue_insert(&fq->q, skb, offset, end); if (err) { if (err == IPFRAG_DUP) { /* No error for duplicates, pretend they got queued. */ kfree_skb_reason(skb, SKB_DROP_REASON_DUP_FRAG); return -EINPROGRESS; } goto insert_error; } if (dev) fq->iif = dev->ifindex; fq->q.stamp = skb->tstamp; fq->q.tstamp_type = skb->tstamp_type; fq->q.meat += skb->len; fq->ecn |= ecn; if (payload_len > fq->q.max_size) fq->q.max_size = payload_len; add_frag_mem_limit(fq->q.fqdir, skb->truesize); /* The first fragment. * nhoffset is obtained from the first fragment, of course. */ if (offset == 0) { fq->nhoffset = nhoff; fq->q.flags |= INET_FRAG_FIRST_IN; } if (fq->q.flags == (INET_FRAG_FIRST_IN | INET_FRAG_LAST_IN) && fq->q.meat == fq->q.len) { unsigned long orefdst = skb->_skb_refdst; skb->_skb_refdst = 0UL; err = nf_ct_frag6_reasm(fq, skb, prev, dev, refs); skb->_skb_refdst = orefdst; /* After queue has assumed skb ownership, only 0 or * -EINPROGRESS must be returned. */ return err ? -EINPROGRESS : 0; } skb_dst_drop(skb); skb_orphan(skb); return -EINPROGRESS; insert_error: inet_frag_kill(&fq->q, refs); err: skb_dst_drop(skb); return -EINVAL; } /* * Check if this packet is complete. * * It is called with locked fq, and caller must check that * queue is eligible for reassembly i.e. it is not COMPLETE, * the last and the first frames arrived and all the bits are here. */ static int nf_ct_frag6_reasm(struct frag_queue *fq, struct sk_buff *skb, struct sk_buff *prev_tail, struct net_device *dev, int *refs) { void *reasm_data; int payload_len; u8 ecn; inet_frag_kill(&fq->q, refs); ecn = ip_frag_ecn_table[fq->ecn]; if (unlikely(ecn == 0xff)) goto err; reasm_data = inet_frag_reasm_prepare(&fq->q, skb, prev_tail); if (!reasm_data) goto err; payload_len = -skb_network_offset(skb) - sizeof(struct ipv6hdr) + fq->q.len - sizeof(struct frag_hdr); if (payload_len > IPV6_MAXPLEN) { net_dbg_ratelimited("nf_ct_frag6_reasm: payload len = %d\n", payload_len); goto err; } /* We have to remove fragment header from datagram and to relocate * header in order to calculate ICV correctly. */ skb_network_header(skb)[fq->nhoffset] = skb_transport_header(skb)[0]; memmove(skb->head + sizeof(struct frag_hdr), skb->head, (skb->data - skb->head) - sizeof(struct frag_hdr)); skb->mac_header += sizeof(struct frag_hdr); skb->network_header += sizeof(struct frag_hdr); skb_reset_transport_header(skb); inet_frag_reasm_finish(&fq->q, skb, reasm_data, false); skb->ignore_df = 1; skb->dev = dev; ipv6_hdr(skb)->payload_len = htons(payload_len); ipv6_change_dsfield(ipv6_hdr(skb), 0xff, ecn); IP6CB(skb)->frag_max_size = sizeof(struct ipv6hdr) + fq->q.max_size; IP6CB(skb)->flags |= IP6SKB_FRAGMENTED; /* Yes, and fold redundant checksum back. 8) */ if (skb->ip_summed == CHECKSUM_COMPLETE) skb->csum = csum_partial(skb_network_header(skb), skb_network_header_len(skb), skb->csum); fq->q.rb_fragments = RB_ROOT; fq->q.fragments_tail = NULL; fq->q.last_run_head = NULL; return 0; err: inet_frag_kill(&fq->q, refs); return -EINVAL; } /* * find the header just before Fragment Header. * * if success return 0 and set ... * (*prevhdrp): the value of "Next Header Field" in the header * just before Fragment Header. * (*prevhoff): the offset of "Next Header Field" in the header * just before Fragment Header. * (*fhoff) : the offset of Fragment Header. * * Based on ipv6_skip_hdr() in net/ipv6/exthdr.c * */ static int find_prev_fhdr(struct sk_buff *skb, u8 *prevhdrp, int *prevhoff, int *fhoff) { u8 nexthdr = ipv6_hdr(skb)->nexthdr; const int netoff = skb_network_offset(skb); u8 prev_nhoff = netoff + offsetof(struct ipv6hdr, nexthdr); int start = netoff + sizeof(struct ipv6hdr); int len = skb->len - start; u8 prevhdr = NEXTHDR_IPV6; while (nexthdr != NEXTHDR_FRAGMENT) { struct ipv6_opt_hdr hdr; int hdrlen; if (!ipv6_ext_hdr(nexthdr)) { return -1; } if (nexthdr == NEXTHDR_NONE) { pr_debug("next header is none\n"); return -1; } if (len < (int)sizeof(struct ipv6_opt_hdr)) { pr_debug("too short\n"); return -1; } if (skb_copy_bits(skb, start, &hdr, sizeof(hdr))) BUG(); if (nexthdr == NEXTHDR_AUTH) hdrlen = ipv6_authlen(&hdr); else hdrlen = ipv6_optlen(&hdr); prevhdr = nexthdr; prev_nhoff = start; nexthdr = hdr.nexthdr; len -= hdrlen; start += hdrlen; } if (len < 0) return -1; *prevhdrp = prevhdr; *prevhoff = prev_nhoff; *fhoff = start; return 0; } int nf_ct_frag6_gather(struct net *net, struct sk_buff *skb, u32 user) { u16 savethdr = skb->transport_header; u8 nexthdr = NEXTHDR_FRAGMENT; int fhoff, nhoff, ret; struct frag_hdr *fhdr; struct frag_queue *fq; struct ipv6hdr *hdr; int refs = 0; u8 prevhdr; /* Jumbo payload inhibits frag. header */ if (ipv6_hdr(skb)->payload_len == 0) { pr_debug("payload len = 0\n"); return 0; } if (find_prev_fhdr(skb, &prevhdr, &nhoff, &fhoff) < 0) return 0; /* Discard the first fragment if it does not include all headers * RFC 8200, Section 4.5 */ if (ipv6frag_thdr_truncated(skb, fhoff, &nexthdr)) { pr_debug("Drop incomplete fragment\n"); return 0; } if (!pskb_may_pull(skb, fhoff + sizeof(*fhdr))) return -ENOMEM; skb_set_transport_header(skb, fhoff); hdr = ipv6_hdr(skb); fhdr = (struct frag_hdr *)skb_transport_header(skb); rcu_read_lock(); fq = fq_find(net, fhdr->identification, user, hdr, skb->dev ? skb->dev->ifindex : 0); if (fq == NULL) { rcu_read_unlock(); pr_debug("Can't find and can't create new queue\n"); return -ENOMEM; } spin_lock_bh(&fq->q.lock); ret = nf_ct_frag6_queue(fq, skb, fhdr, nhoff, &refs); if (ret == -EPROTO) { skb->transport_header = savethdr; ret = 0; } spin_unlock_bh(&fq->q.lock); rcu_read_unlock(); inet_frag_putn(&fq->q, refs); return ret; } EXPORT_SYMBOL_GPL(nf_ct_frag6_gather); static int nf_ct_net_init(struct net *net) { struct nft_ct_frag6_pernet *nf_frag = nf_frag_pernet(net); int res; res = fqdir_init(&nf_frag->fqdir, &nf_frags, net); if (res < 0) return res; nf_frag->fqdir->high_thresh = IPV6_FRAG_HIGH_THRESH; nf_frag->fqdir->low_thresh = IPV6_FRAG_LOW_THRESH; nf_frag->fqdir->timeout = IPV6_FRAG_TIMEOUT; res = nf_ct_frag6_sysctl_register(net); if (res < 0) fqdir_exit(nf_frag->fqdir); return res; } static void nf_ct_net_pre_exit(struct net *net) { struct nft_ct_frag6_pernet *nf_frag = nf_frag_pernet(net); fqdir_pre_exit(nf_frag->fqdir); } static void nf_ct_net_exit(struct net *net) { struct nft_ct_frag6_pernet *nf_frag = nf_frag_pernet(net); nf_ct_frags6_sysctl_unregister(net); fqdir_exit(nf_frag->fqdir); } static struct pernet_operations nf_ct_net_ops = { .init = nf_ct_net_init, .pre_exit = nf_ct_net_pre_exit, .exit = nf_ct_net_exit, .id = &nf_frag_pernet_id, .size = sizeof(struct nft_ct_frag6_pernet), }; static const struct rhashtable_params nfct_rhash_params = { .head_offset = offsetof(struct inet_frag_queue, node), .hashfn = ip6frag_key_hashfn, .obj_hashfn = ip6frag_obj_hashfn, .obj_cmpfn = ip6frag_obj_cmpfn, .automatic_shrinking = true, }; int nf_ct_frag6_init(void) { int ret = 0; nf_frags.constructor = ip6frag_init; nf_frags.destructor = NULL; nf_frags.qsize = sizeof(struct frag_queue); nf_frags.frag_expire = nf_ct_frag6_expire; nf_frags.frags_cache_name = nf_frags_cache_name; nf_frags.rhash_params = nfct_rhash_params; ret = inet_frags_init(&nf_frags); if (ret) goto out; ret = register_pernet_subsys(&nf_ct_net_ops); if (ret) inet_frags_fini(&nf_frags); out: return ret; } void nf_ct_frag6_cleanup(void) { unregister_pernet_subsys(&nf_ct_net_ops); inet_frags_fini(&nf_frags); } |
| 11 11 1 1 2 2 5 4 1 1177 1176 190 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 | // SPDX-License-Identifier: GPL-2.0-only /* * Copyright (c) 2008-2011, Intel Corporation. * * Description: Data Center Bridging netlink interface * Author: Lucy Liu <lucy.liu@intel.com> */ #include <linux/netdevice.h> #include <linux/netlink.h> #include <linux/slab.h> #include <net/netlink.h> #include <net/rtnetlink.h> #include <linux/dcbnl.h> #include <net/dcbevent.h> #include <linux/rtnetlink.h> #include <linux/init.h> #include <net/sock.h> /* Data Center Bridging (DCB) is a collection of Ethernet enhancements * intended to allow network traffic with differing requirements * (highly reliable, no drops vs. best effort vs. low latency) to operate * and co-exist on Ethernet. Current DCB features are: * * Enhanced Transmission Selection (aka Priority Grouping [PG]) - provides a * framework for assigning bandwidth guarantees to traffic classes. * * Priority-based Flow Control (PFC) - provides a flow control mechanism which * can work independently for each 802.1p priority. * * Congestion Notification - provides a mechanism for end-to-end congestion * control for protocols which do not have built-in congestion management. * * More information about the emerging standards for these Ethernet features * can be found at: http://www.ieee802.org/1/pages/dcbridges.html * * This file implements an rtnetlink interface to allow configuration of DCB * features for capable devices. */ /**************** DCB attribute policies *************************************/ /* DCB netlink attributes policy */ static const struct nla_policy dcbnl_rtnl_policy[DCB_ATTR_MAX + 1] = { [DCB_ATTR_IFNAME] = {.type = NLA_NUL_STRING, .len = IFNAMSIZ - 1}, [DCB_ATTR_STATE] = {.type = NLA_U8}, [DCB_ATTR_PFC_CFG] = {.type = NLA_NESTED}, [DCB_ATTR_PG_CFG] = {.type = NLA_NESTED}, [DCB_ATTR_SET_ALL] = {.type = NLA_U8}, [DCB_ATTR_PERM_HWADDR] = {.type = NLA_FLAG}, [DCB_ATTR_CAP] = {.type = NLA_NESTED}, [DCB_ATTR_PFC_STATE] = {.type = NLA_U8}, [DCB_ATTR_BCN] = {.type = NLA_NESTED}, [DCB_ATTR_APP] = {.type = NLA_NESTED}, [DCB_ATTR_IEEE] = {.type = NLA_NESTED}, [DCB_ATTR_DCBX] = {.type = NLA_U8}, [DCB_ATTR_FEATCFG] = {.type = NLA_NESTED}, }; /* DCB priority flow control to User Priority nested attributes */ static const struct nla_policy dcbnl_pfc_up_nest[DCB_PFC_UP_ATTR_MAX + 1] = { [DCB_PFC_UP_ATTR_0] = {.type = NLA_U8}, [DCB_PFC_UP_ATTR_1] = {.type = NLA_U8}, [DCB_PFC_UP_ATTR_2] = {.type = NLA_U8}, [DCB_PFC_UP_ATTR_3] = {.type = NLA_U8}, [DCB_PFC_UP_ATTR_4] = {.type = NLA_U8}, [DCB_PFC_UP_ATTR_5] = {.type = NLA_U8}, [DCB_PFC_UP_ATTR_6] = {.type = NLA_U8}, [DCB_PFC_UP_ATTR_7] = {.type = NLA_U8}, [DCB_PFC_UP_ATTR_ALL] = {.type = NLA_FLAG}, }; /* DCB priority grouping nested attributes */ static const struct nla_policy dcbnl_pg_nest[DCB_PG_ATTR_MAX + 1] = { [DCB_PG_ATTR_TC_0] = {.type = NLA_NESTED}, [DCB_PG_ATTR_TC_1] = {.type = NLA_NESTED}, [DCB_PG_ATTR_TC_2] = {.type = NLA_NESTED}, [DCB_PG_ATTR_TC_3] = {.type = NLA_NESTED}, [DCB_PG_ATTR_TC_4] = {.type = NLA_NESTED}, [DCB_PG_ATTR_TC_5] = {.type = NLA_NESTED}, [DCB_PG_ATTR_TC_6] = {.type = NLA_NESTED}, [DCB_PG_ATTR_TC_7] = {.type = NLA_NESTED}, [DCB_PG_ATTR_TC_ALL] = {.type = NLA_NESTED}, [DCB_PG_ATTR_BW_ID_0] = {.type = NLA_U8}, [DCB_PG_ATTR_BW_ID_1] = {.type = NLA_U8}, [DCB_PG_ATTR_BW_ID_2] = {.type = NLA_U8}, [DCB_PG_ATTR_BW_ID_3] = {.type = NLA_U8}, [DCB_PG_ATTR_BW_ID_4] = {.type = NLA_U8}, [DCB_PG_ATTR_BW_ID_5] = {.type = NLA_U8}, [DCB_PG_ATTR_BW_ID_6] = {.type = NLA_U8}, [DCB_PG_ATTR_BW_ID_7] = {.type = NLA_U8}, [DCB_PG_ATTR_BW_ID_ALL] = {.type = NLA_FLAG}, }; /* DCB traffic class nested attributes. */ static const struct nla_policy dcbnl_tc_param_nest[DCB_TC_ATTR_PARAM_MAX + 1] = { [DCB_TC_ATTR_PARAM_PGID] = {.type = NLA_U8}, [DCB_TC_ATTR_PARAM_UP_MAPPING] = {.type = NLA_U8}, [DCB_TC_ATTR_PARAM_STRICT_PRIO] = {.type = NLA_U8}, [DCB_TC_ATTR_PARAM_BW_PCT] = {.type = NLA_U8}, [DCB_TC_ATTR_PARAM_ALL] = {.type = NLA_FLAG}, }; /* DCB capabilities nested attributes. */ static const struct nla_policy dcbnl_cap_nest[DCB_CAP_ATTR_MAX + 1] = { [DCB_CAP_ATTR_ALL] = {.type = NLA_FLAG}, [DCB_CAP_ATTR_PG] = {.type = NLA_U8}, [DCB_CAP_ATTR_PFC] = {.type = NLA_U8}, [DCB_CAP_ATTR_UP2TC] = {.type = NLA_U8}, [DCB_CAP_ATTR_PG_TCS] = {.type = NLA_U8}, [DCB_CAP_ATTR_PFC_TCS] = {.type = NLA_U8}, [DCB_CAP_ATTR_GSP] = {.type = NLA_U8}, [DCB_CAP_ATTR_BCN] = {.type = NLA_U8}, [DCB_CAP_ATTR_DCBX] = {.type = NLA_U8}, }; /* DCB capabilities nested attributes. */ static const struct nla_policy dcbnl_numtcs_nest[DCB_NUMTCS_ATTR_MAX + 1] = { [DCB_NUMTCS_ATTR_ALL] = {.type = NLA_FLAG}, [DCB_NUMTCS_ATTR_PG] = {.type = NLA_U8}, [DCB_NUMTCS_ATTR_PFC] = {.type = NLA_U8}, }; /* DCB BCN nested attributes. */ static const struct nla_policy dcbnl_bcn_nest[DCB_BCN_ATTR_MAX + 1] = { [DCB_BCN_ATTR_RP_0] = {.type = NLA_U8}, [DCB_BCN_ATTR_RP_1] = {.type = NLA_U8}, [DCB_BCN_ATTR_RP_2] = {.type = NLA_U8}, [DCB_BCN_ATTR_RP_3] = {.type = NLA_U8}, [DCB_BCN_ATTR_RP_4] = {.type = NLA_U8}, [DCB_BCN_ATTR_RP_5] = {.type = NLA_U8}, [DCB_BCN_ATTR_RP_6] = {.type = NLA_U8}, [DCB_BCN_ATTR_RP_7] = {.type = NLA_U8}, [DCB_BCN_ATTR_RP_ALL] = {.type = NLA_FLAG}, [DCB_BCN_ATTR_BCNA_0] = {.type = NLA_U32}, [DCB_BCN_ATTR_BCNA_1] = {.type = NLA_U32}, [DCB_BCN_ATTR_ALPHA] = {.type = NLA_U32}, [DCB_BCN_ATTR_BETA] = {.type = NLA_U32}, [DCB_BCN_ATTR_GD] = {.type = NLA_U32}, [DCB_BCN_ATTR_GI] = {.type = NLA_U32}, [DCB_BCN_ATTR_TMAX] = {.type = NLA_U32}, [DCB_BCN_ATTR_TD] = {.type = NLA_U32}, [DCB_BCN_ATTR_RMIN] = {.type = NLA_U32}, [DCB_BCN_ATTR_W] = {.type = NLA_U32}, [DCB_BCN_ATTR_RD] = {.type = NLA_U32}, [DCB_BCN_ATTR_RU] = {.type = NLA_U32}, [DCB_BCN_ATTR_WRTT] = {.type = NLA_U32}, [DCB_BCN_ATTR_RI] = {.type = NLA_U32}, [DCB_BCN_ATTR_C] = {.type = NLA_U32}, [DCB_BCN_ATTR_ALL] = {.type = NLA_FLAG}, }; /* DCB APP nested attributes. */ static const struct nla_policy dcbnl_app_nest[DCB_APP_ATTR_MAX + 1] = { [DCB_APP_ATTR_IDTYPE] = {.type = NLA_U8}, [DCB_APP_ATTR_ID] = {.type = NLA_U16}, [DCB_APP_ATTR_PRIORITY] = {.type = NLA_U8}, }; /* IEEE 802.1Qaz nested attributes. */ static const struct nla_policy dcbnl_ieee_policy[DCB_ATTR_IEEE_MAX + 1] = { [DCB_ATTR_IEEE_ETS] = {.len = sizeof(struct ieee_ets)}, [DCB_ATTR_IEEE_PFC] = {.len = sizeof(struct ieee_pfc)}, [DCB_ATTR_IEEE_APP_TABLE] = {.type = NLA_NESTED}, [DCB_ATTR_IEEE_MAXRATE] = {.len = sizeof(struct ieee_maxrate)}, [DCB_ATTR_IEEE_QCN] = {.len = sizeof(struct ieee_qcn)}, [DCB_ATTR_IEEE_QCN_STATS] = {.len = sizeof(struct ieee_qcn_stats)}, [DCB_ATTR_DCB_BUFFER] = {.len = sizeof(struct dcbnl_buffer)}, [DCB_ATTR_DCB_APP_TRUST_TABLE] = {.type = NLA_NESTED}, }; /* DCB number of traffic classes nested attributes. */ static const struct nla_policy dcbnl_featcfg_nest[DCB_FEATCFG_ATTR_MAX + 1] = { [DCB_FEATCFG_ATTR_ALL] = {.type = NLA_FLAG}, [DCB_FEATCFG_ATTR_PG] = {.type = NLA_U8}, [DCB_FEATCFG_ATTR_PFC] = {.type = NLA_U8}, [DCB_FEATCFG_ATTR_APP] = {.type = NLA_U8}, }; static LIST_HEAD(dcb_app_list); static LIST_HEAD(dcb_rewr_list); static DEFINE_SPINLOCK(dcb_lock); static enum ieee_attrs_app dcbnl_app_attr_type_get(u8 selector) { switch (selector) { case IEEE_8021QAZ_APP_SEL_ETHERTYPE: case IEEE_8021QAZ_APP_SEL_STREAM: case IEEE_8021QAZ_APP_SEL_DGRAM: case IEEE_8021QAZ_APP_SEL_ANY: case IEEE_8021QAZ_APP_SEL_DSCP: return DCB_ATTR_IEEE_APP; case DCB_APP_SEL_PCP: return DCB_ATTR_DCB_APP; default: return DCB_ATTR_IEEE_APP_UNSPEC; } } static bool dcbnl_app_attr_type_validate(enum ieee_attrs_app type) { switch (type) { case DCB_ATTR_IEEE_APP: case DCB_ATTR_DCB_APP: return true; default: return false; } } static bool dcbnl_app_selector_validate(enum ieee_attrs_app type, u8 selector) { return dcbnl_app_attr_type_get(selector) == type; } static struct sk_buff *dcbnl_newmsg(int type, u8 cmd, u32 port, u32 seq, u32 flags, struct nlmsghdr **nlhp) { struct sk_buff *skb; struct dcbmsg *dcb; struct nlmsghdr *nlh; skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); if (!skb) return NULL; nlh = nlmsg_put(skb, port, seq, type, sizeof(*dcb), flags); BUG_ON(!nlh); dcb = nlmsg_data(nlh); dcb->dcb_family = AF_UNSPEC; dcb->cmd = cmd; dcb->dcb_pad = 0; if (nlhp) *nlhp = nlh; return skb; } static int dcbnl_getstate(struct net_device *netdev, struct nlmsghdr *nlh, u32 seq, struct nlattr **tb, struct sk_buff *skb) { /* if (!tb[DCB_ATTR_STATE] || !netdev->dcbnl_ops->getstate) */ if (!netdev->dcbnl_ops->getstate) return -EOPNOTSUPP; return nla_put_u8(skb, DCB_ATTR_STATE, netdev->dcbnl_ops->getstate(netdev)); } static int dcbnl_getpfccfg(struct net_device *netdev, struct nlmsghdr *nlh, u32 seq, struct nlattr **tb, struct sk_buff *skb) { struct nlattr *data[DCB_PFC_UP_ATTR_MAX + 1], *nest; u8 value; int ret; int i; int getall = 0; if (!tb[DCB_ATTR_PFC_CFG]) return -EINVAL; if (!netdev->dcbnl_ops->getpfccfg) return -EOPNOTSUPP; ret = nla_parse_nested_deprecated(data, DCB_PFC_UP_ATTR_MAX, tb[DCB_ATTR_PFC_CFG], dcbnl_pfc_up_nest, NULL); if (ret) return ret; nest = nla_nest_start_noflag(skb, DCB_ATTR_PFC_CFG); if (!nest) return -EMSGSIZE; if (data[DCB_PFC_UP_ATTR_ALL]) getall = 1; for (i = DCB_PFC_UP_ATTR_0; i <= DCB_PFC_UP_ATTR_7; i++) { if (!getall && !data[i]) continue; netdev->dcbnl_ops->getpfccfg(netdev, i - DCB_PFC_UP_ATTR_0, &value); ret = nla_put_u8(skb, i, value); if (ret) { nla_nest_cancel(skb, nest); return ret; } } nla_nest_end(skb, nest); return 0; } static int dcbnl_getperm_hwaddr(struct net_device *netdev, struct nlmsghdr *nlh, u32 seq, struct nlattr **tb, struct sk_buff *skb) { u8 perm_addr[MAX_ADDR_LEN]; if (!netdev->dcbnl_ops->getpermhwaddr) return -EOPNOTSUPP; memset(perm_addr, 0, sizeof(perm_addr)); netdev->dcbnl_ops->getpermhwaddr(netdev, perm_addr); return nla_put(skb, DCB_ATTR_PERM_HWADDR, sizeof(perm_addr), perm_addr); } static int dcbnl_getcap(struct net_device *netdev, struct nlmsghdr *nlh, u32 seq, struct nlattr **tb, struct sk_buff *skb) { struct nlattr *data[DCB_CAP_ATTR_MAX + 1], *nest; u8 value; int ret; int i; int getall = 0; if (!tb[DCB_ATTR_CAP]) return -EINVAL; if (!netdev->dcbnl_ops->getcap) return -EOPNOTSUPP; ret = nla_parse_nested_deprecated(data, DCB_CAP_ATTR_MAX, tb[DCB_ATTR_CAP], dcbnl_cap_nest, NULL); if (ret) return ret; nest = nla_nest_start_noflag(skb, DCB_ATTR_CAP); if (!nest) return -EMSGSIZE; if (data[DCB_CAP_ATTR_ALL]) getall = 1; for (i = DCB_CAP_ATTR_ALL+1; i <= DCB_CAP_ATTR_MAX; i++) { if (!getall && !data[i]) continue; if (!netdev->dcbnl_ops->getcap(netdev, i, &value)) { ret = nla_put_u8(skb, i, value); if (ret) { nla_nest_cancel(skb, nest); return ret; } } } nla_nest_end(skb, nest); return 0; } static int dcbnl_getnumtcs(struct net_device *netdev, struct nlmsghdr *nlh, u32 seq, struct nlattr **tb, struct sk_buff *skb) { struct nlattr *data[DCB_NUMTCS_ATTR_MAX + 1], *nest; u8 value; int ret; int i; int getall = 0; if (!tb[DCB_ATTR_NUMTCS]) return -EINVAL; if (!netdev->dcbnl_ops->getnumtcs) return -EOPNOTSUPP; ret = nla_parse_nested_deprecated(data, DCB_NUMTCS_ATTR_MAX, tb[DCB_ATTR_NUMTCS], dcbnl_numtcs_nest, NULL); if (ret) return ret; nest = nla_nest_start_noflag(skb, DCB_ATTR_NUMTCS); if (!nest) return -EMSGSIZE; if (data[DCB_NUMTCS_ATTR_ALL]) getall = 1; for (i = DCB_NUMTCS_ATTR_ALL+1; i <= DCB_NUMTCS_ATTR_MAX; i++) { if (!getall && !data[i]) continue; ret = netdev->dcbnl_ops->getnumtcs(netdev, i, &value); if (!ret) { ret = nla_put_u8(skb, i, value); if (ret) { nla_nest_cancel(skb, nest); return ret; } } else return -EINVAL; } nla_nest_end(skb, nest); return 0; } static int dcbnl_setnumtcs(struct net_device *netdev, struct nlmsghdr *nlh, u32 seq, struct nlattr **tb, struct sk_buff *skb) { struct nlattr *data[DCB_NUMTCS_ATTR_MAX + 1]; int ret; u8 value; int i; if (!tb[DCB_ATTR_NUMTCS]) return -EINVAL; if (!netdev->dcbnl_ops->setnumtcs) return -EOPNOTSUPP; ret = nla_parse_nested_deprecated(data, DCB_NUMTCS_ATTR_MAX, tb[DCB_ATTR_NUMTCS], dcbnl_numtcs_nest, NULL); if (ret) return ret; for (i = DCB_NUMTCS_ATTR_ALL+1; i <= DCB_NUMTCS_ATTR_MAX; i++) { if (data[i] == NULL) continue; value = nla_get_u8(data[i]); ret = netdev->dcbnl_ops->setnumtcs(netdev, i, value); if (ret) break; } return nla_put_u8(skb, DCB_ATTR_NUMTCS, !!ret); } static int dcbnl_getpfcstate(struct net_device *netdev, struct nlmsghdr *nlh, u32 seq, struct nlattr **tb, struct sk_buff *skb) { if (!netdev->dcbnl_ops->getpfcstate) return -EOPNOTSUPP; return nla_put_u8(skb, DCB_ATTR_PFC_STATE, netdev->dcbnl_ops->getpfcstate(netdev)); } static int dcbnl_setpfcstate(struct net_device *netdev, struct nlmsghdr *nlh, u32 seq, struct nlattr **tb, struct sk_buff *skb) { u8 value; if (!tb[DCB_ATTR_PFC_STATE]) return -EINVAL; if (!netdev->dcbnl_ops->setpfcstate) return -EOPNOTSUPP; value = nla_get_u8(tb[DCB_ATTR_PFC_STATE]); netdev->dcbnl_ops->setpfcstate(netdev, value); return nla_put_u8(skb, DCB_ATTR_PFC_STATE, 0); } static int dcbnl_getapp(struct net_device *netdev, struct nlmsghdr *nlh, u32 seq, struct nlattr **tb, struct sk_buff *skb) { struct nlattr *app_nest; struct nlattr *app_tb[DCB_APP_ATTR_MAX + 1]; u16 id; u8 up, idtype; int ret; if (!tb[DCB_ATTR_APP]) return -EINVAL; ret = nla_parse_nested_deprecated(app_tb, DCB_APP_ATTR_MAX, tb[DCB_ATTR_APP], dcbnl_app_nest, NULL); if (ret) return ret; /* all must be non-null */ if ((!app_tb[DCB_APP_ATTR_IDTYPE]) || (!app_tb[DCB_APP_ATTR_ID])) return -EINVAL; /* either by eth type or by socket number */ idtype = nla_get_u8(app_tb[DCB_APP_ATTR_IDTYPE]); if ((idtype != DCB_APP_IDTYPE_ETHTYPE) && (idtype != DCB_APP_IDTYPE_PORTNUM)) return -EINVAL; id = nla_get_u16(app_tb[DCB_APP_ATTR_ID]); if (netdev->dcbnl_ops->getapp) { ret = netdev->dcbnl_ops->getapp(netdev, idtype, id); if (ret < 0) return ret; else up = ret; } else { struct dcb_app app = { .selector = idtype, .protocol = id, }; up = dcb_getapp(netdev, &app); } app_nest = nla_nest_start_noflag(skb, DCB_ATTR_APP); if (!app_nest) return -EMSGSIZE; ret = nla_put_u8(skb, DCB_APP_ATTR_IDTYPE, idtype); if (ret) goto out_cancel; ret = nla_put_u16(skb, DCB_APP_ATTR_ID, id); if (ret) goto out_cancel; ret = nla_put_u8(skb, DCB_APP_ATTR_PRIORITY, up); if (ret) goto out_cancel; nla_nest_end(skb, app_nest); return 0; out_cancel: nla_nest_cancel(skb, app_nest); return ret; } static int dcbnl_setapp(struct net_device *netdev, struct nlmsghdr *nlh, u32 seq, struct nlattr **tb, struct sk_buff *skb) { int ret; u16 id; u8 up, idtype; struct nlattr *app_tb[DCB_APP_ATTR_MAX + 1]; if (!tb[DCB_ATTR_APP]) return -EINVAL; ret = nla_parse_nested_deprecated(app_tb, DCB_APP_ATTR_MAX, tb[DCB_ATTR_APP], dcbnl_app_nest, NULL); if (ret) return ret; /* all must be non-null */ if ((!app_tb[DCB_APP_ATTR_IDTYPE]) || (!app_tb[DCB_APP_ATTR_ID]) || (!app_tb[DCB_APP_ATTR_PRIORITY])) return -EINVAL; /* either by eth type or by socket number */ idtype = nla_get_u8(app_tb[DCB_APP_ATTR_IDTYPE]); if ((idtype != DCB_APP_IDTYPE_ETHTYPE) && (idtype != DCB_APP_IDTYPE_PORTNUM)) return -EINVAL; id = nla_get_u16(app_tb[DCB_APP_ATTR_ID]); up = nla_get_u8(app_tb[DCB_APP_ATTR_PRIORITY]); if (netdev->dcbnl_ops->setapp) { ret = netdev->dcbnl_ops->setapp(netdev, idtype, id, up); if (ret < 0) return ret; } else { struct dcb_app app; app.selector = idtype; app.protocol = id; app.priority = up; ret = dcb_setapp(netdev, &app); } ret = nla_put_u8(skb, DCB_ATTR_APP, ret); dcbnl_cee_notify(netdev, RTM_SETDCB, DCB_CMD_SAPP, seq, 0); return ret; } static int __dcbnl_pg_getcfg(struct net_device *netdev, struct nlmsghdr *nlh, struct nlattr **tb, struct sk_buff *skb, int dir) { struct nlattr *pg_nest, *param_nest, *data; struct nlattr *pg_tb[DCB_PG_ATTR_MAX + 1]; struct nlattr *param_tb[DCB_TC_ATTR_PARAM_MAX + 1]; u8 prio, pgid, tc_pct, up_map; int ret; int getall = 0; int i; if (!tb[DCB_ATTR_PG_CFG]) return -EINVAL; if (!netdev->dcbnl_ops->getpgtccfgtx || !netdev->dcbnl_ops->getpgtccfgrx || !netdev->dcbnl_ops->getpgbwgcfgtx || !netdev->dcbnl_ops->getpgbwgcfgrx) return -EOPNOTSUPP; ret = nla_parse_nested_deprecated(pg_tb, DCB_PG_ATTR_MAX, tb[DCB_ATTR_PG_CFG], dcbnl_pg_nest, NULL); if (ret) return ret; pg_nest = nla_nest_start_noflag(skb, DCB_ATTR_PG_CFG); if (!pg_nest) return -EMSGSIZE; if (pg_tb[DCB_PG_ATTR_TC_ALL]) getall = 1; for (i = DCB_PG_ATTR_TC_0; i <= DCB_PG_ATTR_TC_7; i++) { if (!getall && !pg_tb[i]) continue; if (pg_tb[DCB_PG_ATTR_TC_ALL]) data = pg_tb[DCB_PG_ATTR_TC_ALL]; else data = pg_tb[i]; ret = nla_parse_nested_deprecated(param_tb, DCB_TC_ATTR_PARAM_MAX, data, dcbnl_tc_param_nest, NULL); if (ret) goto err_pg; param_nest = nla_nest_start_noflag(skb, i); if (!param_nest) goto err_pg; pgid = DCB_ATTR_VALUE_UNDEFINED; prio = DCB_ATTR_VALUE_UNDEFINED; tc_pct = DCB_ATTR_VALUE_UNDEFINED; up_map = DCB_ATTR_VALUE_UNDEFINED; if (dir) { /* Rx */ netdev->dcbnl_ops->getpgtccfgrx(netdev, i - DCB_PG_ATTR_TC_0, &prio, &pgid, &tc_pct, &up_map); } else { /* Tx */ netdev->dcbnl_ops->getpgtccfgtx(netdev, i - DCB_PG_ATTR_TC_0, &prio, &pgid, &tc_pct, &up_map); } if (param_tb[DCB_TC_ATTR_PARAM_PGID] || param_tb[DCB_TC_ATTR_PARAM_ALL]) { ret = nla_put_u8(skb, DCB_TC_ATTR_PARAM_PGID, pgid); if (ret) goto err_param; } if (param_tb[DCB_TC_ATTR_PARAM_UP_MAPPING] || param_tb[DCB_TC_ATTR_PARAM_ALL]) { ret = nla_put_u8(skb, DCB_TC_ATTR_PARAM_UP_MAPPING, up_map); if (ret) goto err_param; } if (param_tb[DCB_TC_ATTR_PARAM_STRICT_PRIO] || param_tb[DCB_TC_ATTR_PARAM_ALL]) { ret = nla_put_u8(skb, DCB_TC_ATTR_PARAM_STRICT_PRIO, prio); if (ret) goto err_param; } if (param_tb[DCB_TC_ATTR_PARAM_BW_PCT] || param_tb[DCB_TC_ATTR_PARAM_ALL]) { ret = nla_put_u8(skb, DCB_TC_ATTR_PARAM_BW_PCT, tc_pct); if (ret) goto err_param; } nla_nest_end(skb, param_nest); } if (pg_tb[DCB_PG_ATTR_BW_ID_ALL]) getall = 1; else getall = 0; for (i = DCB_PG_ATTR_BW_ID_0; i <= DCB_PG_ATTR_BW_ID_7; i++) { if (!getall && !pg_tb[i]) continue; tc_pct = DCB_ATTR_VALUE_UNDEFINED; if (dir) { /* Rx */ netdev->dcbnl_ops->getpgbwgcfgrx(netdev, i - DCB_PG_ATTR_BW_ID_0, &tc_pct); } else { /* Tx */ netdev->dcbnl_ops->getpgbwgcfgtx(netdev, i - DCB_PG_ATTR_BW_ID_0, &tc_pct); } ret = nla_put_u8(skb, i, tc_pct); if (ret) goto err_pg; } nla_nest_end(skb, pg_nest); return 0; err_param: nla_nest_cancel(skb, param_nest); err_pg: nla_nest_cancel(skb, pg_nest); return -EMSGSIZE; } static int dcbnl_pgtx_getcfg(struct net_device *netdev, struct nlmsghdr *nlh, u32 seq, struct nlattr **tb, struct sk_buff *skb) { return __dcbnl_pg_getcfg(netdev, nlh, tb, skb, 0); } static int dcbnl_pgrx_getcfg(struct net_device *netdev, struct nlmsghdr *nlh, u32 seq, struct nlattr **tb, struct sk_buff *skb) { return __dcbnl_pg_getcfg(netdev, nlh, tb, skb, 1); } static int dcbnl_setstate(struct net_device *netdev, struct nlmsghdr *nlh, u32 seq, struct nlattr **tb, struct sk_buff *skb) { u8 value; if (!tb[DCB_ATTR_STATE]) return -EINVAL; if (!netdev->dcbnl_ops->setstate) return -EOPNOTSUPP; value = nla_get_u8(tb[DCB_ATTR_STATE]); return nla_put_u8(skb, DCB_ATTR_STATE, netdev->dcbnl_ops->setstate(netdev, value)); } static int dcbnl_setpfccfg(struct net_device *netdev, struct nlmsghdr *nlh, u32 seq, struct nlattr **tb, struct sk_buff *skb) { struct nlattr *data[DCB_PFC_UP_ATTR_MAX + 1]; int i; int ret; u8 value; if (!tb[DCB_ATTR_PFC_CFG]) return -EINVAL; if (!netdev->dcbnl_ops->setpfccfg) return -EOPNOTSUPP; ret = nla_parse_nested_deprecated(data, DCB_PFC_UP_ATTR_MAX, tb[DCB_ATTR_PFC_CFG], dcbnl_pfc_up_nest, NULL); if (ret) return ret; for (i = DCB_PFC_UP_ATTR_0; i <= DCB_PFC_UP_ATTR_7; i++) { if (data[i] == NULL) continue; value = nla_get_u8(data[i]); netdev->dcbnl_ops->setpfccfg(netdev, data[i]->nla_type - DCB_PFC_UP_ATTR_0, value); } return nla_put_u8(skb, DCB_ATTR_PFC_CFG, 0); } static int dcbnl_setall(struct net_device *netdev, struct nlmsghdr *nlh, u32 seq, struct nlattr **tb, struct sk_buff *skb) { int ret; if (!tb[DCB_ATTR_SET_ALL]) return -EINVAL; if (!netdev->dcbnl_ops->setall) return -EOPNOTSUPP; ret = nla_put_u8(skb, DCB_ATTR_SET_ALL, netdev->dcbnl_ops->setall(netdev)); dcbnl_cee_notify(netdev, RTM_SETDCB, DCB_CMD_SET_ALL, seq, 0); return ret; } static int __dcbnl_pg_setcfg(struct net_device *netdev, struct nlmsghdr *nlh, u32 seq, struct nlattr **tb, struct sk_buff *skb, int dir) { struct nlattr *pg_tb[DCB_PG_ATTR_MAX + 1]; struct nlattr *param_tb[DCB_TC_ATTR_PARAM_MAX + 1]; int ret; int i; u8 pgid; u8 up_map; u8 prio; u8 tc_pct; if (!tb[DCB_ATTR_PG_CFG]) return -EINVAL; if (!netdev->dcbnl_ops->setpgtccfgtx || !netdev->dcbnl_ops->setpgtccfgrx || !netdev->dcbnl_ops->setpgbwgcfgtx || !netdev->dcbnl_ops->setpgbwgcfgrx) return -EOPNOTSUPP; ret = nla_parse_nested_deprecated(pg_tb, DCB_PG_ATTR_MAX, tb[DCB_ATTR_PG_CFG], dcbnl_pg_nest, NULL); if (ret) return ret; for (i = DCB_PG_ATTR_TC_0; i <= DCB_PG_ATTR_TC_7; i++) { if (!pg_tb[i]) continue; ret = nla_parse_nested_deprecated(param_tb, DCB_TC_ATTR_PARAM_MAX, pg_tb[i], dcbnl_tc_param_nest, NULL); if (ret) return ret; pgid = DCB_ATTR_VALUE_UNDEFINED; prio = DCB_ATTR_VALUE_UNDEFINED; tc_pct = DCB_ATTR_VALUE_UNDEFINED; up_map = DCB_ATTR_VALUE_UNDEFINED; if (param_tb[DCB_TC_ATTR_PARAM_STRICT_PRIO]) prio = nla_get_u8(param_tb[DCB_TC_ATTR_PARAM_STRICT_PRIO]); if (param_tb[DCB_TC_ATTR_PARAM_PGID]) pgid = nla_get_u8(param_tb[DCB_TC_ATTR_PARAM_PGID]); if (param_tb[DCB_TC_ATTR_PARAM_BW_PCT]) tc_pct = nla_get_u8(param_tb[DCB_TC_ATTR_PARAM_BW_PCT]); if (param_tb[DCB_TC_ATTR_PARAM_UP_MAPPING]) up_map = nla_get_u8(param_tb[DCB_TC_ATTR_PARAM_UP_MAPPING]); /* dir: Tx = 0, Rx = 1 */ if (dir) { /* Rx */ netdev->dcbnl_ops->setpgtccfgrx(netdev, i - DCB_PG_ATTR_TC_0, prio, pgid, tc_pct, up_map); } else { /* Tx */ netdev->dcbnl_ops->setpgtccfgtx(netdev, i - DCB_PG_ATTR_TC_0, prio, pgid, tc_pct, up_map); } } for (i = DCB_PG_ATTR_BW_ID_0; i <= DCB_PG_ATTR_BW_ID_7; i++) { if (!pg_tb[i]) continue; tc_pct = nla_get_u8(pg_tb[i]); /* dir: Tx = 0, Rx = 1 */ if (dir) { /* Rx */ netdev->dcbnl_ops->setpgbwgcfgrx(netdev, i - DCB_PG_ATTR_BW_ID_0, tc_pct); } else { /* Tx */ netdev->dcbnl_ops->setpgbwgcfgtx(netdev, i - DCB_PG_ATTR_BW_ID_0, tc_pct); } } return nla_put_u8(skb, DCB_ATTR_PG_CFG, 0); } static int dcbnl_pgtx_setcfg(struct net_device *netdev, struct nlmsghdr *nlh, u32 seq, struct nlattr **tb, struct sk_buff *skb) { return __dcbnl_pg_setcfg(netdev, nlh, seq, tb, skb, 0); } static int dcbnl_pgrx_setcfg(struct net_device *netdev, struct nlmsghdr *nlh, u32 seq, struct nlattr **tb, struct sk_buff *skb) { return __dcbnl_pg_setcfg(netdev, nlh, seq, tb, skb, 1); } static int dcbnl_bcn_getcfg(struct net_device *netdev, struct nlmsghdr *nlh, u32 seq, struct nlattr **tb, struct sk_buff *skb) { struct nlattr *bcn_nest; struct nlattr *bcn_tb[DCB_BCN_ATTR_MAX + 1]; u8 value_byte; u32 value_integer; int ret; bool getall = false; int i; if (!tb[DCB_ATTR_BCN]) return -EINVAL; if (!netdev->dcbnl_ops->getbcnrp || !netdev->dcbnl_ops->getbcncfg) return -EOPNOTSUPP; ret = nla_parse_nested_deprecated(bcn_tb, DCB_BCN_ATTR_MAX, tb[DCB_ATTR_BCN], dcbnl_bcn_nest, NULL); if (ret) return ret; bcn_nest = nla_nest_start_noflag(skb, DCB_ATTR_BCN); if (!bcn_nest) return -EMSGSIZE; if (bcn_tb[DCB_BCN_ATTR_ALL]) getall = true; for (i = DCB_BCN_ATTR_RP_0; i <= DCB_BCN_ATTR_RP_7; i++) { if (!getall && !bcn_tb[i]) continue; netdev->dcbnl_ops->getbcnrp(netdev, i - DCB_BCN_ATTR_RP_0, &value_byte); ret = nla_put_u8(skb, i, value_byte); if (ret) goto err_bcn; } for (i = DCB_BCN_ATTR_BCNA_0; i <= DCB_BCN_ATTR_RI; i++) { if (!getall && !bcn_tb[i]) continue; netdev->dcbnl_ops->getbcncfg(netdev, i, &value_integer); ret = nla_put_u32(skb, i, value_integer); if (ret) goto err_bcn; } nla_nest_end(skb, bcn_nest); return 0; err_bcn: nla_nest_cancel(skb, bcn_nest); return ret; } static int dcbnl_bcn_setcfg(struct net_device *netdev, struct nlmsghdr *nlh, u32 seq, struct nlattr **tb, struct sk_buff *skb) { struct nlattr *data[DCB_BCN_ATTR_MAX + 1]; int i; int ret; u8 value_byte; u32 value_int; if (!tb[DCB_ATTR_BCN]) return -EINVAL; if (!netdev->dcbnl_ops->setbcncfg || !netdev->dcbnl_ops->setbcnrp) return -EOPNOTSUPP; ret = nla_parse_nested_deprecated(data, DCB_BCN_ATTR_MAX, tb[DCB_ATTR_BCN], dcbnl_bcn_nest, NULL); if (ret) return ret; for (i = DCB_BCN_ATTR_RP_0; i <= DCB_BCN_ATTR_RP_7; i++) { if (data[i] == NULL) continue; value_byte = nla_get_u8(data[i]); netdev->dcbnl_ops->setbcnrp(netdev, data[i]->nla_type - DCB_BCN_ATTR_RP_0, value_byte); } for (i = DCB_BCN_ATTR_BCNA_0; i <= DCB_BCN_ATTR_RI; i++) { if (data[i] == NULL) continue; value_int = nla_get_u32(data[i]); netdev->dcbnl_ops->setbcncfg(netdev, i, value_int); } return nla_put_u8(skb, DCB_ATTR_BCN, 0); } static int dcbnl_build_peer_app(struct net_device *netdev, struct sk_buff* skb, int app_nested_type, int app_info_type, int app_entry_type) { struct dcb_peer_app_info info; struct dcb_app *table = NULL; const struct dcbnl_rtnl_ops *ops = netdev->dcbnl_ops; u16 app_count; int err; /** * retrieve the peer app configuration form the driver. If the driver * handlers fail exit without doing anything */ err = ops->peer_getappinfo(netdev, &info, &app_count); if (!err && app_count) { table = kmalloc_array(app_count, sizeof(struct dcb_app), GFP_KERNEL); if (!table) return -ENOMEM; err = ops->peer_getapptable(netdev, table); } if (!err) { u16 i; struct nlattr *app; /** * build the message, from here on the only possible failure * is due to the skb size */ err = -EMSGSIZE; app = nla_nest_start_noflag(skb, app_nested_type); if (!app) goto nla_put_failure; if (app_info_type && nla_put(skb, app_info_type, sizeof(info), &info)) goto nla_put_failure; for (i = 0; i < app_count; i++) { if (nla_put(skb, app_entry_type, sizeof(struct dcb_app), &table[i])) goto nla_put_failure; } nla_nest_end(skb, app); } err = 0; nla_put_failure: kfree(table); return err; } static int dcbnl_getapptrust(struct net_device *netdev, struct sk_buff *skb) { const struct dcbnl_rtnl_ops *ops = netdev->dcbnl_ops; enum ieee_attrs_app type; struct nlattr *apptrust; int nselectors, err, i; u8 *selectors; selectors = kzalloc(IEEE_8021QAZ_APP_SEL_MAX + 1, GFP_KERNEL); if (!selectors) return -ENOMEM; err = ops->dcbnl_getapptrust(netdev, selectors, &nselectors); if (err) { err = 0; goto out; } apptrust = nla_nest_start(skb, DCB_ATTR_DCB_APP_TRUST_TABLE); if (!apptrust) { err = -EMSGSIZE; goto out; } for (i = 0; i < nselectors; i++) { type = dcbnl_app_attr_type_get(selectors[i]); err = nla_put_u8(skb, type, selectors[i]); if (err) { nla_nest_cancel(skb, apptrust); goto out; } } nla_nest_end(skb, apptrust); out: kfree(selectors); return err; } /* Set or delete APP table or rewrite table entries. The APP struct is validated * and the appropriate callback function is called. */ static int dcbnl_app_table_setdel(struct nlattr *attr, struct net_device *netdev, int (*setdel)(struct net_device *dev, struct dcb_app *app)) { struct dcb_app *app_data; enum ieee_attrs_app type; struct nlattr *attr_itr; int rem, err; nla_for_each_nested(attr_itr, attr, rem) { type = nla_type(attr_itr); if (!dcbnl_app_attr_type_validate(type)) continue; if (nla_len(attr_itr) < sizeof(struct dcb_app)) return -ERANGE; app_data = nla_data(attr_itr); if (!dcbnl_app_selector_validate(type, app_data->selector)) return -EINVAL; err = setdel(netdev, app_data); if (err) return err; } return 0; } /* Handle IEEE 802.1Qaz/802.1Qau/802.1Qbb GET commands. */ static int dcbnl_ieee_fill(struct sk_buff *skb, struct net_device *netdev) { const struct dcbnl_rtnl_ops *ops = netdev->dcbnl_ops; struct nlattr *ieee, *app, *rewr; struct dcb_app_type *itr; int dcbx; int err; if (nla_put_string(skb, DCB_ATTR_IFNAME, netdev->name)) return -EMSGSIZE; ieee = nla_nest_start_noflag(skb, DCB_ATTR_IEEE); if (!ieee) return -EMSGSIZE; if (ops->ieee_getets) { struct ieee_ets ets; memset(&ets, 0, sizeof(ets)); err = ops->ieee_getets(netdev, &ets); if (!err && nla_put(skb, DCB_ATTR_IEEE_ETS, sizeof(ets), &ets)) return -EMSGSIZE; } if (ops->ieee_getmaxrate) { struct ieee_maxrate maxrate; memset(&maxrate, 0, sizeof(maxrate)); err = ops->ieee_getmaxrate(netdev, &maxrate); if (!err) { err = nla_put(skb, DCB_ATTR_IEEE_MAXRATE, sizeof(maxrate), &maxrate); if (err) return -EMSGSIZE; } } if (ops->ieee_getqcn) { struct ieee_qcn qcn; memset(&qcn, 0, sizeof(qcn)); err = ops->ieee_getqcn(netdev, &qcn); if (!err) { err = nla_put(skb, DCB_ATTR_IEEE_QCN, sizeof(qcn), &qcn); if (err) return -EMSGSIZE; } } if (ops->ieee_getqcnstats) { struct ieee_qcn_stats qcn_stats; memset(&qcn_stats, 0, sizeof(qcn_stats)); err = ops->ieee_getqcnstats(netdev, &qcn_stats); if (!err) { err = nla_put(skb, DCB_ATTR_IEEE_QCN_STATS, sizeof(qcn_stats), &qcn_stats); if (err) return -EMSGSIZE; } } if (ops->ieee_getpfc) { struct ieee_pfc pfc; memset(&pfc, 0, sizeof(pfc)); err = ops->ieee_getpfc(netdev, &pfc); if (!err && nla_put(skb, DCB_ATTR_IEEE_PFC, sizeof(pfc), &pfc)) return -EMSGSIZE; } if (ops->dcbnl_getbuffer) { struct dcbnl_buffer buffer; memset(&buffer, 0, sizeof(buffer)); err = ops->dcbnl_getbuffer(netdev, &buffer); if (!err && nla_put(skb, DCB_ATTR_DCB_BUFFER, sizeof(buffer), &buffer)) return -EMSGSIZE; } app = nla_nest_start_noflag(skb, DCB_ATTR_IEEE_APP_TABLE); if (!app) return -EMSGSIZE; spin_lock_bh(&dcb_lock); list_for_each_entry(itr, &dcb_app_list, list) { if (itr->ifindex == netdev->ifindex) { enum ieee_attrs_app type = dcbnl_app_attr_type_get(itr->app.selector); err = nla_put(skb, type, sizeof(itr->app), &itr->app); if (err) { spin_unlock_bh(&dcb_lock); return -EMSGSIZE; } } } if (netdev->dcbnl_ops->getdcbx) dcbx = netdev->dcbnl_ops->getdcbx(netdev); else dcbx = -EOPNOTSUPP; spin_unlock_bh(&dcb_lock); nla_nest_end(skb, app); rewr = nla_nest_start(skb, DCB_ATTR_DCB_REWR_TABLE); if (!rewr) return -EMSGSIZE; spin_lock_bh(&dcb_lock); list_for_each_entry(itr, &dcb_rewr_list, list) { if (itr->ifindex == netdev->ifindex) { enum ieee_attrs_app type = dcbnl_app_attr_type_get(itr->app.selector); err = nla_put(skb, type, sizeof(itr->app), &itr->app); if (err) { spin_unlock_bh(&dcb_lock); nla_nest_cancel(skb, rewr); return -EMSGSIZE; } } } spin_unlock_bh(&dcb_lock); nla_nest_end(skb, rewr); if (ops->dcbnl_getapptrust) { err = dcbnl_getapptrust(netdev, skb); if (err) return err; } /* get peer info if available */ if (ops->ieee_peer_getets) { struct ieee_ets ets; memset(&ets, 0, sizeof(ets)); err = ops->ieee_peer_getets(netdev, &ets); if (!err && nla_put(skb, DCB_ATTR_IEEE_PEER_ETS, sizeof(ets), &ets)) return -EMSGSIZE; } if (ops->ieee_peer_getpfc) { struct ieee_pfc pfc; memset(&pfc, 0, sizeof(pfc)); err = ops->ieee_peer_getpfc(netdev, &pfc); if (!err && nla_put(skb, DCB_ATTR_IEEE_PEER_PFC, sizeof(pfc), &pfc)) return -EMSGSIZE; } if (ops->peer_getappinfo && ops->peer_getapptable) { err = dcbnl_build_peer_app(netdev, skb, DCB_ATTR_IEEE_PEER_APP, DCB_ATTR_IEEE_APP_UNSPEC, DCB_ATTR_IEEE_APP); if (err) return -EMSGSIZE; } nla_nest_end(skb, ieee); if (dcbx >= 0) { err = nla_put_u8(skb, DCB_ATTR_DCBX, dcbx); if (err) return -EMSGSIZE; } return 0; } static int dcbnl_cee_pg_fill(struct sk_buff *skb, struct net_device *dev, int dir) { u8 pgid, up_map, prio, tc_pct; const struct dcbnl_rtnl_ops *ops = dev->dcbnl_ops; int i = dir ? DCB_ATTR_CEE_TX_PG : DCB_ATTR_CEE_RX_PG; struct nlattr *pg = nla_nest_start_noflag(skb, i); if (!pg) return -EMSGSIZE; for (i = DCB_PG_ATTR_TC_0; i <= DCB_PG_ATTR_TC_7; i++) { struct nlattr *tc_nest = nla_nest_start_noflag(skb, i); if (!tc_nest) return -EMSGSIZE; pgid = DCB_ATTR_VALUE_UNDEFINED; prio = DCB_ATTR_VALUE_UNDEFINED; tc_pct = DCB_ATTR_VALUE_UNDEFINED; up_map = DCB_ATTR_VALUE_UNDEFINED; if (!dir) ops->getpgtccfgrx(dev, i - DCB_PG_ATTR_TC_0, &prio, &pgid, &tc_pct, &up_map); else ops->getpgtccfgtx(dev, i - DCB_PG_ATTR_TC_0, &prio, &pgid, &tc_pct, &up_map); if (nla_put_u8(skb, DCB_TC_ATTR_PARAM_PGID, pgid) || nla_put_u8(skb, DCB_TC_ATTR_PARAM_UP_MAPPING, up_map) || nla_put_u8(skb, DCB_TC_ATTR_PARAM_STRICT_PRIO, prio) || nla_put_u8(skb, DCB_TC_ATTR_PARAM_BW_PCT, tc_pct)) return -EMSGSIZE; nla_nest_end(skb, tc_nest); } for (i = DCB_PG_ATTR_BW_ID_0; i <= DCB_PG_ATTR_BW_ID_7; i++) { tc_pct = DCB_ATTR_VALUE_UNDEFINED; if (!dir) ops->getpgbwgcfgrx(dev, i - DCB_PG_ATTR_BW_ID_0, &tc_pct); else ops->getpgbwgcfgtx(dev, i - DCB_PG_ATTR_BW_ID_0, &tc_pct); if (nla_put_u8(skb, i, tc_pct)) return -EMSGSIZE; } nla_nest_end(skb, pg); return 0; } static int dcbnl_cee_fill(struct sk_buff *skb, struct net_device *netdev) { struct nlattr *cee, *app; struct dcb_app_type *itr; const struct dcbnl_rtnl_ops *ops = netdev->dcbnl_ops; int dcbx, i, err = -EMSGSIZE; u8 value; if (nla_put_string(skb, DCB_ATTR_IFNAME, netdev->name)) goto nla_put_failure; cee = nla_nest_start_noflag(skb, DCB_ATTR_CEE); if (!cee) goto nla_put_failure; /* local pg */ if (ops->getpgtccfgtx && ops->getpgbwgcfgtx) { err = dcbnl_cee_pg_fill(skb, netdev, 1); if (err) goto nla_put_failure; } if (ops->getpgtccfgrx && ops->getpgbwgcfgrx) { err = dcbnl_cee_pg_fill(skb, netdev, 0); if (err) goto nla_put_failure; } /* local pfc */ if (ops->getpfccfg) { struct nlattr *pfc_nest = nla_nest_start_noflag(skb, DCB_ATTR_CEE_PFC); if (!pfc_nest) goto nla_put_failure; for (i = DCB_PFC_UP_ATTR_0; i <= DCB_PFC_UP_ATTR_7; i++) { ops->getpfccfg(netdev, i - DCB_PFC_UP_ATTR_0, &value); if (nla_put_u8(skb, i, value)) goto nla_put_failure; } nla_nest_end(skb, pfc_nest); } /* local app */ spin_lock_bh(&dcb_lock); app = nla_nest_start_noflag(skb, DCB_ATTR_CEE_APP_TABLE); if (!app) goto dcb_unlock; list_for_each_entry(itr, &dcb_app_list, list) { if (itr->ifindex == netdev->ifindex) { struct nlattr *app_nest = nla_nest_start_noflag(skb, DCB_ATTR_APP); if (!app_nest) goto dcb_unlock; err = nla_put_u8(skb, DCB_APP_ATTR_IDTYPE, itr->app.selector); if (err) goto dcb_unlock; err = nla_put_u16(skb, DCB_APP_ATTR_ID, itr->app.protocol); if (err) goto dcb_unlock; err = nla_put_u8(skb, DCB_APP_ATTR_PRIORITY, itr->app.priority); if (err) goto dcb_unlock; nla_nest_end(skb, app_nest); } } nla_nest_end(skb, app); if (netdev->dcbnl_ops->getdcbx) dcbx = netdev->dcbnl_ops->getdcbx(netdev); else dcbx = -EOPNOTSUPP; spin_unlock_bh(&dcb_lock); /* features flags */ if (ops->getfeatcfg) { struct nlattr *feat = nla_nest_start_noflag(skb, DCB_ATTR_CEE_FEAT); if (!feat) goto nla_put_failure; for (i = DCB_FEATCFG_ATTR_ALL + 1; i <= DCB_FEATCFG_ATTR_MAX; i++) if (!ops->getfeatcfg(netdev, i, &value) && nla_put_u8(skb, i, value)) goto nla_put_failure; nla_nest_end(skb, feat); } /* peer info if available */ if (ops->cee_peer_getpg) { struct cee_pg pg; memset(&pg, 0, sizeof(pg)); err = ops->cee_peer_getpg(netdev, &pg); if (!err && nla_put(skb, DCB_ATTR_CEE_PEER_PG, sizeof(pg), &pg)) goto nla_put_failure; } if (ops->cee_peer_getpfc) { struct cee_pfc pfc; memset(&pfc, 0, sizeof(pfc)); err = ops->cee_peer_getpfc(netdev, &pfc); if (!err && nla_put(skb, DCB_ATTR_CEE_PEER_PFC, sizeof(pfc), &pfc)) goto nla_put_failure; } if (ops->peer_getappinfo && ops->peer_getapptable) { err = dcbnl_build_peer_app(netdev, skb, DCB_ATTR_CEE_PEER_APP_TABLE, DCB_ATTR_CEE_PEER_APP_INFO, DCB_ATTR_CEE_PEER_APP); if (err) goto nla_put_failure; } nla_nest_end(skb, cee); /* DCBX state */ if (dcbx >= 0) { err = nla_put_u8(skb, DCB_ATTR_DCBX, dcbx); if (err) goto nla_put_failure; } return 0; dcb_unlock: spin_unlock_bh(&dcb_lock); nla_put_failure: err = -EMSGSIZE; return err; } static int dcbnl_notify(struct net_device *dev, int event, int cmd, u32 seq, u32 portid, int dcbx_ver) { struct net *net = dev_net(dev); struct sk_buff *skb; struct nlmsghdr *nlh; const struct dcbnl_rtnl_ops *ops = dev->dcbnl_ops; int err; if (!ops) return -EOPNOTSUPP; skb = dcbnl_newmsg(event, cmd, portid, seq, 0, &nlh); if (!skb) return -ENOMEM; if (dcbx_ver == DCB_CAP_DCBX_VER_IEEE) err = dcbnl_ieee_fill(skb, dev); else err = dcbnl_cee_fill(skb, dev); if (err < 0) { /* Report error to broadcast listeners */ nlmsg_free(skb); rtnl_set_sk_err(net, RTNLGRP_DCB, err); } else { /* End nlmsg and notify broadcast listeners */ nlmsg_end(skb, nlh); rtnl_notify(skb, net, 0, RTNLGRP_DCB, NULL, GFP_KERNEL); } return err; } int dcbnl_ieee_notify(struct net_device *dev, int event, int cmd, u32 seq, u32 portid) { return dcbnl_notify(dev, event, cmd, seq, portid, DCB_CAP_DCBX_VER_IEEE); } EXPORT_SYMBOL(dcbnl_ieee_notify); int dcbnl_cee_notify(struct net_device *dev, int event, int cmd, u32 seq, u32 portid) { return dcbnl_notify(dev, event, cmd, seq, portid, DCB_CAP_DCBX_VER_CEE); } EXPORT_SYMBOL(dcbnl_cee_notify); /* Handle IEEE 802.1Qaz/802.1Qau/802.1Qbb SET commands. * If any requested operation can not be completed * the entire msg is aborted and error value is returned. * No attempt is made to reconcile the case where only part of the * cmd can be completed. */ static int dcbnl_ieee_set(struct net_device *netdev, struct nlmsghdr *nlh, u32 seq, struct nlattr **tb, struct sk_buff *skb) { const struct dcbnl_rtnl_ops *ops = netdev->dcbnl_ops; struct nlattr *ieee[DCB_ATTR_IEEE_MAX + 1]; int prio; int err; if (!ops) return -EOPNOTSUPP; if (!tb[DCB_ATTR_IEEE]) return -EINVAL; err = nla_parse_nested_deprecated(ieee, DCB_ATTR_IEEE_MAX, tb[DCB_ATTR_IEEE], dcbnl_ieee_policy, NULL); if (err) return err; if (ieee[DCB_ATTR_IEEE_ETS] && ops->ieee_setets) { struct ieee_ets *ets = nla_data(ieee[DCB_ATTR_IEEE_ETS]); err = ops->ieee_setets(netdev, ets); if (err) goto err; } if (ieee[DCB_ATTR_IEEE_MAXRATE] && ops->ieee_setmaxrate) { struct ieee_maxrate *maxrate = nla_data(ieee[DCB_ATTR_IEEE_MAXRATE]); err = ops->ieee_setmaxrate(netdev, maxrate); if (err) goto err; } if (ieee[DCB_ATTR_IEEE_QCN] && ops->ieee_setqcn) { struct ieee_qcn *qcn = nla_data(ieee[DCB_ATTR_IEEE_QCN]); err = ops->ieee_setqcn(netdev, qcn); if (err) goto err; } if (ieee[DCB_ATTR_IEEE_PFC] && ops->ieee_setpfc) { struct ieee_pfc *pfc = nla_data(ieee[DCB_ATTR_IEEE_PFC]); err = ops->ieee_setpfc(netdev, pfc); if (err) goto err; } if (ieee[DCB_ATTR_DCB_BUFFER] && ops->dcbnl_setbuffer) { struct dcbnl_buffer *buffer = nla_data(ieee[DCB_ATTR_DCB_BUFFER]); for (prio = 0; prio < ARRAY_SIZE(buffer->prio2buffer); prio++) { if (buffer->prio2buffer[prio] >= DCBX_MAX_BUFFERS) { err = -EINVAL; goto err; } } err = ops->dcbnl_setbuffer(netdev, buffer); if (err) goto err; } if (ieee[DCB_ATTR_DCB_REWR_TABLE]) { err = dcbnl_app_table_setdel(ieee[DCB_ATTR_DCB_REWR_TABLE], netdev, ops->dcbnl_setrewr ?: dcb_setrewr); if (err) goto err; } if (ieee[DCB_ATTR_IEEE_APP_TABLE]) { err = dcbnl_app_table_setdel(ieee[DCB_ATTR_IEEE_APP_TABLE], netdev, ops->ieee_setapp ?: dcb_ieee_setapp); if (err) goto err; } if (ieee[DCB_ATTR_DCB_APP_TRUST_TABLE]) { u8 selectors[IEEE_8021QAZ_APP_SEL_MAX + 1] = {0}; struct nlattr *attr; int nselectors = 0; int rem; if (!ops->dcbnl_setapptrust) { err = -EOPNOTSUPP; goto err; } nla_for_each_nested(attr, ieee[DCB_ATTR_DCB_APP_TRUST_TABLE], rem) { enum ieee_attrs_app type = nla_type(attr); u8 selector; int i; if (!dcbnl_app_attr_type_validate(type) || nla_len(attr) != 1 || nselectors >= sizeof(selectors)) { err = -EINVAL; goto err; } selector = nla_get_u8(attr); if (!dcbnl_app_selector_validate(type, selector)) { err = -EINVAL; goto err; } /* Duplicate selector ? */ for (i = 0; i < nselectors; i++) { if (selectors[i] == selector) { err = -EINVAL; goto err; } } selectors[nselectors++] = selector; } err = ops->dcbnl_setapptrust(netdev, selectors, nselectors); if (err) goto err; } err: err = nla_put_u8(skb, DCB_ATTR_IEEE, err); dcbnl_ieee_notify(netdev, RTM_SETDCB, DCB_CMD_IEEE_SET, seq, 0); return err; } static int dcbnl_ieee_get(struct net_device *netdev, struct nlmsghdr *nlh, u32 seq, struct nlattr **tb, struct sk_buff *skb) { const struct dcbnl_rtnl_ops *ops = netdev->dcbnl_ops; if (!ops) return -EOPNOTSUPP; return dcbnl_ieee_fill(skb, netdev); } static int dcbnl_ieee_del(struct net_device *netdev, struct nlmsghdr *nlh, u32 seq, struct nlattr **tb, struct sk_buff *skb) { const struct dcbnl_rtnl_ops *ops = netdev->dcbnl_ops; struct nlattr *ieee[DCB_ATTR_IEEE_MAX + 1]; int err; if (!ops) return -EOPNOTSUPP; if (!tb[DCB_ATTR_IEEE]) return -EINVAL; err = nla_parse_nested_deprecated(ieee, DCB_ATTR_IEEE_MAX, tb[DCB_ATTR_IEEE], dcbnl_ieee_policy, NULL); if (err) return err; if (ieee[DCB_ATTR_IEEE_APP_TABLE]) { err = dcbnl_app_table_setdel(ieee[DCB_ATTR_IEEE_APP_TABLE], netdev, ops->ieee_delapp ?: dcb_ieee_delapp); if (err) goto err; } if (ieee[DCB_ATTR_DCB_REWR_TABLE]) { err = dcbnl_app_table_setdel(ieee[DCB_ATTR_DCB_REWR_TABLE], netdev, ops->dcbnl_delrewr ?: dcb_delrewr); if (err) goto err; } err: err = nla_put_u8(skb, DCB_ATTR_IEEE, err); dcbnl_ieee_notify(netdev, RTM_SETDCB, DCB_CMD_IEEE_DEL, seq, 0); return err; } /* DCBX configuration */ static int dcbnl_getdcbx(struct net_device *netdev, struct nlmsghdr *nlh, u32 seq, struct nlattr **tb, struct sk_buff *skb) { if (!netdev->dcbnl_ops->getdcbx) return -EOPNOTSUPP; return nla_put_u8(skb, DCB_ATTR_DCBX, netdev->dcbnl_ops->getdcbx(netdev)); } static int dcbnl_setdcbx(struct net_device *netdev, struct nlmsghdr *nlh, u32 seq, struct nlattr **tb, struct sk_buff *skb) { u8 value; if (!netdev->dcbnl_ops->setdcbx) return -EOPNOTSUPP; if (!tb[DCB_ATTR_DCBX]) return -EINVAL; value = nla_get_u8(tb[DCB_ATTR_DCBX]); return nla_put_u8(skb, DCB_ATTR_DCBX, netdev->dcbnl_ops->setdcbx(netdev, value)); } static int dcbnl_getfeatcfg(struct net_device *netdev, struct nlmsghdr *nlh, u32 seq, struct nlattr **tb, struct sk_buff *skb) { struct nlattr *data[DCB_FEATCFG_ATTR_MAX + 1], *nest; u8 value; int ret, i; int getall = 0; if (!netdev->dcbnl_ops->getfeatcfg) return -EOPNOTSUPP; if (!tb[DCB_ATTR_FEATCFG]) return -EINVAL; ret = nla_parse_nested_deprecated(data, DCB_FEATCFG_ATTR_MAX, tb[DCB_ATTR_FEATCFG], dcbnl_featcfg_nest, NULL); if (ret) return ret; nest = nla_nest_start_noflag(skb, DCB_ATTR_FEATCFG); if (!nest) return -EMSGSIZE; if (data[DCB_FEATCFG_ATTR_ALL]) getall = 1; for (i = DCB_FEATCFG_ATTR_ALL+1; i <= DCB_FEATCFG_ATTR_MAX; i++) { if (!getall && !data[i]) continue; ret = netdev->dcbnl_ops->getfeatcfg(netdev, i, &value); if (!ret) ret = nla_put_u8(skb, i, value); if (ret) { nla_nest_cancel(skb, nest); goto nla_put_failure; } } nla_nest_end(skb, nest); nla_put_failure: return ret; } static int dcbnl_setfeatcfg(struct net_device *netdev, struct nlmsghdr *nlh, u32 seq, struct nlattr **tb, struct sk_buff *skb) { struct nlattr *data[DCB_FEATCFG_ATTR_MAX + 1]; int ret, i; u8 value; if (!netdev->dcbnl_ops->setfeatcfg) return -ENOTSUPP; if (!tb[DCB_ATTR_FEATCFG]) return -EINVAL; ret = nla_parse_nested_deprecated(data, DCB_FEATCFG_ATTR_MAX, tb[DCB_ATTR_FEATCFG], dcbnl_featcfg_nest, NULL); if (ret) goto err; for (i = DCB_FEATCFG_ATTR_ALL+1; i <= DCB_FEATCFG_ATTR_MAX; i++) { if (data[i] == NULL) continue; value = nla_get_u8(data[i]); ret = netdev->dcbnl_ops->setfeatcfg(netdev, i, value); if (ret) goto err; } err: ret = nla_put_u8(skb, DCB_ATTR_FEATCFG, ret); return ret; } /* Handle CEE DCBX GET commands. */ static int dcbnl_cee_get(struct net_device *netdev, struct nlmsghdr *nlh, u32 seq, struct nlattr **tb, struct sk_buff *skb) { const struct dcbnl_rtnl_ops *ops = netdev->dcbnl_ops; if (!ops) return -EOPNOTSUPP; return dcbnl_cee_fill(skb, netdev); } struct reply_func { /* reply netlink message type */ int type; /* function to fill message contents */ int (*cb)(struct net_device *, struct nlmsghdr *, u32, struct nlattr **, struct sk_buff *); }; static const struct reply_func reply_funcs[DCB_CMD_MAX+1] = { [DCB_CMD_GSTATE] = { RTM_GETDCB, dcbnl_getstate }, [DCB_CMD_SSTATE] = { RTM_SETDCB, dcbnl_setstate }, [DCB_CMD_PFC_GCFG] = { RTM_GETDCB, dcbnl_getpfccfg }, [DCB_CMD_PFC_SCFG] = { RTM_SETDCB, dcbnl_setpfccfg }, [DCB_CMD_GPERM_HWADDR] = { RTM_GETDCB, dcbnl_getperm_hwaddr }, [DCB_CMD_GCAP] = { RTM_GETDCB, dcbnl_getcap }, [DCB_CMD_GNUMTCS] = { RTM_GETDCB, dcbnl_getnumtcs }, [DCB_CMD_SNUMTCS] = { RTM_SETDCB, dcbnl_setnumtcs }, [DCB_CMD_PFC_GSTATE] = { RTM_GETDCB, dcbnl_getpfcstate }, [DCB_CMD_PFC_SSTATE] = { RTM_SETDCB, dcbnl_setpfcstate }, [DCB_CMD_GAPP] = { RTM_GETDCB, dcbnl_getapp }, [DCB_CMD_SAPP] = { RTM_SETDCB, dcbnl_setapp }, [DCB_CMD_PGTX_GCFG] = { RTM_GETDCB, dcbnl_pgtx_getcfg }, [DCB_CMD_PGTX_SCFG] = { RTM_SETDCB, dcbnl_pgtx_setcfg }, [DCB_CMD_PGRX_GCFG] = { RTM_GETDCB, dcbnl_pgrx_getcfg }, [DCB_CMD_PGRX_SCFG] = { RTM_SETDCB, dcbnl_pgrx_setcfg }, [DCB_CMD_SET_ALL] = { RTM_SETDCB, dcbnl_setall }, [DCB_CMD_BCN_GCFG] = { RTM_GETDCB, dcbnl_bcn_getcfg }, [DCB_CMD_BCN_SCFG] = { RTM_SETDCB, dcbnl_bcn_setcfg }, [DCB_CMD_IEEE_GET] = { RTM_GETDCB, dcbnl_ieee_get }, [DCB_CMD_IEEE_SET] = { RTM_SETDCB, dcbnl_ieee_set }, [DCB_CMD_IEEE_DEL] = { RTM_SETDCB, dcbnl_ieee_del }, [DCB_CMD_GDCBX] = { RTM_GETDCB, dcbnl_getdcbx }, [DCB_CMD_SDCBX] = { RTM_SETDCB, dcbnl_setdcbx }, [DCB_CMD_GFEATCFG] = { RTM_GETDCB, dcbnl_getfeatcfg }, [DCB_CMD_SFEATCFG] = { RTM_SETDCB, dcbnl_setfeatcfg }, [DCB_CMD_CEE_GET] = { RTM_GETDCB, dcbnl_cee_get }, }; static int dcb_doit(struct sk_buff *skb, struct nlmsghdr *nlh, struct netlink_ext_ack *extack) { struct net *net = sock_net(skb->sk); struct net_device *netdev; struct dcbmsg *dcb = nlmsg_data(nlh); struct nlattr *tb[DCB_ATTR_MAX + 1]; u32 portid = NETLINK_CB(skb).portid; int ret = -EINVAL; struct sk_buff *reply_skb; struct nlmsghdr *reply_nlh = NULL; const struct reply_func *fn; if ((nlh->nlmsg_type == RTM_SETDCB) && !netlink_capable(skb, CAP_NET_ADMIN)) return -EPERM; ret = nlmsg_parse_deprecated(nlh, sizeof(*dcb), tb, DCB_ATTR_MAX, dcbnl_rtnl_policy, extack); if (ret < 0) return ret; if (dcb->cmd > DCB_CMD_MAX) return -EINVAL; /* check if a reply function has been defined for the command */ fn = &reply_funcs[dcb->cmd]; if (!fn->cb) return -EOPNOTSUPP; if (fn->type == RTM_SETDCB && !netlink_capable(skb, CAP_NET_ADMIN)) return -EPERM; if (!tb[DCB_ATTR_IFNAME]) return -EINVAL; netdev = __dev_get_by_name(net, nla_data(tb[DCB_ATTR_IFNAME])); if (!netdev) return -ENODEV; if (!netdev->dcbnl_ops) return -EOPNOTSUPP; reply_skb = dcbnl_newmsg(fn->type, dcb->cmd, portid, nlh->nlmsg_seq, nlh->nlmsg_flags, &reply_nlh); if (!reply_skb) return -ENOMEM; ret = fn->cb(netdev, nlh, nlh->nlmsg_seq, tb, reply_skb); if (ret < 0) { nlmsg_free(reply_skb); goto out; } nlmsg_end(reply_skb, reply_nlh); ret = rtnl_unicast(reply_skb, net, portid); out: return ret; } static struct dcb_app_type *dcb_rewr_lookup(const struct dcb_app *app, int ifindex, int proto) { struct dcb_app_type *itr; list_for_each_entry(itr, &dcb_rewr_list, list) { if (itr->app.selector == app->selector && itr->app.priority == app->priority && itr->ifindex == ifindex && ((proto == -1) || itr->app.protocol == proto)) return itr; } return NULL; } static struct dcb_app_type *dcb_app_lookup(const struct dcb_app *app, int ifindex, int prio) { struct dcb_app_type *itr; list_for_each_entry(itr, &dcb_app_list, list) { if (itr->app.selector == app->selector && itr->app.protocol == app->protocol && itr->ifindex == ifindex && ((prio == -1) || itr->app.priority == prio)) return itr; } return NULL; } static int dcb_app_add(struct list_head *list, const struct dcb_app *app, int ifindex) { struct dcb_app_type *entry; entry = kmalloc(sizeof(*entry), GFP_ATOMIC); if (!entry) return -ENOMEM; memcpy(&entry->app, app, sizeof(*app)); entry->ifindex = ifindex; list_add(&entry->list, list); return 0; } /** * dcb_getapp - retrieve the DCBX application user priority * @dev: network interface * @app: application to get user priority of * * On success returns a non-zero 802.1p user priority bitmap * otherwise returns 0 as the invalid user priority bitmap to * indicate an error. */ u8 dcb_getapp(struct net_device *dev, struct dcb_app *app) { struct dcb_app_type *itr; u8 prio = 0; spin_lock_bh(&dcb_lock); itr = dcb_app_lookup(app, dev->ifindex, -1); if (itr) prio = itr->app.priority; spin_unlock_bh(&dcb_lock); return prio; } EXPORT_SYMBOL(dcb_getapp); /** * dcb_setapp - add CEE dcb application data to app list * @dev: network interface * @new: application data to add * * Priority 0 is an invalid priority in CEE spec. This routine * removes applications from the app list if the priority is * set to zero. Priority is expected to be 8-bit 802.1p user priority bitmap */ int dcb_setapp(struct net_device *dev, struct dcb_app *new) { struct dcb_app_type *itr; struct dcb_app_type event; int err = 0; event.ifindex = dev->ifindex; memcpy(&event.app, new, sizeof(event.app)); if (dev->dcbnl_ops->getdcbx) event.dcbx = dev->dcbnl_ops->getdcbx(dev); spin_lock_bh(&dcb_lock); /* Search for existing match and replace */ itr = dcb_app_lookup(new, dev->ifindex, -1); if (itr) { if (new->priority) itr->app.priority = new->priority; else { list_del(&itr->list); kfree(itr); } goto out; } /* App type does not exist add new application type */ if (new->priority) err = dcb_app_add(&dcb_app_list, new, dev->ifindex); out: spin_unlock_bh(&dcb_lock); if (!err) call_dcbevent_notifiers(DCB_APP_EVENT, &event); return err; } EXPORT_SYMBOL(dcb_setapp); /** * dcb_ieee_getapp_mask - retrieve the IEEE DCB application priority * @dev: network interface * @app: where to store the retrieve application data * * Helper routine which on success returns a non-zero 802.1Qaz user * priority bitmap otherwise returns 0 to indicate the dcb_app was * not found in APP list. */ u8 dcb_ieee_getapp_mask(struct net_device *dev, struct dcb_app *app) { struct dcb_app_type *itr; u8 prio = 0; spin_lock_bh(&dcb_lock); itr = dcb_app_lookup(app, dev->ifindex, -1); if (itr) prio |= 1 << itr->app.priority; spin_unlock_bh(&dcb_lock); return prio; } EXPORT_SYMBOL(dcb_ieee_getapp_mask); /* Get protocol value from rewrite entry. */ u16 dcb_getrewr(struct net_device *dev, struct dcb_app *app) { struct dcb_app_type *itr; u16 proto = 0; spin_lock_bh(&dcb_lock); itr = dcb_rewr_lookup(app, dev->ifindex, -1); if (itr) proto = itr->app.protocol; spin_unlock_bh(&dcb_lock); return proto; } EXPORT_SYMBOL(dcb_getrewr); /* Add rewrite entry to the rewrite list. */ int dcb_setrewr(struct net_device *dev, struct dcb_app *new) { int err; spin_lock_bh(&dcb_lock); /* Search for existing match and abort if found. */ if (dcb_rewr_lookup(new, dev->ifindex, new->protocol)) { err = -EEXIST; goto out; } err = dcb_app_add(&dcb_rewr_list, new, dev->ifindex); out: spin_unlock_bh(&dcb_lock); return err; } EXPORT_SYMBOL(dcb_setrewr); /* Delete rewrite entry from the rewrite list. */ int dcb_delrewr(struct net_device *dev, struct dcb_app *del) { struct dcb_app_type *itr; int err = -ENOENT; spin_lock_bh(&dcb_lock); /* Search for existing match and remove it. */ itr = dcb_rewr_lookup(del, dev->ifindex, del->protocol); if (itr) { list_del(&itr->list); kfree(itr); err = 0; } spin_unlock_bh(&dcb_lock); return err; } EXPORT_SYMBOL(dcb_delrewr); /** * dcb_ieee_setapp - add IEEE dcb application data to app list * @dev: network interface * @new: application data to add * * This adds Application data to the list. Multiple application * entries may exists for the same selector and protocol as long * as the priorities are different. Priority is expected to be a * 3-bit unsigned integer */ int dcb_ieee_setapp(struct net_device *dev, struct dcb_app *new) { struct dcb_app_type event; int err = 0; event.ifindex = dev->ifindex; memcpy(&event.app, new, sizeof(event.app)); if (dev->dcbnl_ops->getdcbx) event.dcbx = dev->dcbnl_ops->getdcbx(dev); spin_lock_bh(&dcb_lock); /* Search for existing match and abort if found */ if (dcb_app_lookup(new, dev->ifindex, new->priority)) { err = -EEXIST; goto out; } err = dcb_app_add(&dcb_app_list, new, dev->ifindex); out: spin_unlock_bh(&dcb_lock); if (!err) call_dcbevent_notifiers(DCB_APP_EVENT, &event); return err; } EXPORT_SYMBOL(dcb_ieee_setapp); /** * dcb_ieee_delapp - delete IEEE dcb application data from list * @dev: network interface * @del: application data to delete * * This removes a matching APP data from the APP list */ int dcb_ieee_delapp(struct net_device *dev, struct dcb_app *del) { struct dcb_app_type *itr; struct dcb_app_type event; int err = -ENOENT; event.ifindex = dev->ifindex; memcpy(&event.app, del, sizeof(event.app)); if (dev->dcbnl_ops->getdcbx) event.dcbx = dev->dcbnl_ops->getdcbx(dev); spin_lock_bh(&dcb_lock); /* Search for existing match and remove it. */ if ((itr = dcb_app_lookup(del, dev->ifindex, del->priority))) { list_del(&itr->list); kfree(itr); err = 0; } spin_unlock_bh(&dcb_lock); if (!err) call_dcbevent_notifiers(DCB_APP_EVENT, &event); return err; } EXPORT_SYMBOL(dcb_ieee_delapp); /* dcb_getrewr_prio_pcp_mask_map - For a given device, find mapping from * priorities to the PCP and DEI values assigned to that priority. */ void dcb_getrewr_prio_pcp_mask_map(const struct net_device *dev, struct dcb_rewr_prio_pcp_map *p_map) { int ifindex = dev->ifindex; struct dcb_app_type *itr; u8 prio; memset(p_map->map, 0, sizeof(p_map->map)); spin_lock_bh(&dcb_lock); list_for_each_entry(itr, &dcb_rewr_list, list) { if (itr->ifindex == ifindex && itr->app.selector == DCB_APP_SEL_PCP && itr->app.protocol < 16 && itr->app.priority < IEEE_8021QAZ_MAX_TCS) { prio = itr->app.priority; p_map->map[prio] |= 1 << itr->app.protocol; } } spin_unlock_bh(&dcb_lock); } EXPORT_SYMBOL(dcb_getrewr_prio_pcp_mask_map); /* dcb_getrewr_prio_dscp_mask_map - For a given device, find mapping from * priorities to the DSCP values assigned to that priority. */ void dcb_getrewr_prio_dscp_mask_map(const struct net_device *dev, struct dcb_ieee_app_prio_map *p_map) { int ifindex = dev->ifindex; struct dcb_app_type *itr; u8 prio; memset(p_map->map, 0, sizeof(p_map->map)); spin_lock_bh(&dcb_lock); list_for_each_entry(itr, &dcb_rewr_list, list) { if (itr->ifindex == ifindex && itr->app.selector == IEEE_8021QAZ_APP_SEL_DSCP && itr->app.protocol < 64 && itr->app.priority < IEEE_8021QAZ_MAX_TCS) { prio = itr->app.priority; p_map->map[prio] |= 1ULL << itr->app.protocol; } } spin_unlock_bh(&dcb_lock); } EXPORT_SYMBOL(dcb_getrewr_prio_dscp_mask_map); /* * dcb_ieee_getapp_prio_dscp_mask_map - For a given device, find mapping from * priorities to the DSCP values assigned to that priority. Initialize p_map * such that each map element holds a bit mask of DSCP values configured for * that priority by APP entries. */ void dcb_ieee_getapp_prio_dscp_mask_map(const struct net_device *dev, struct dcb_ieee_app_prio_map *p_map) { int ifindex = dev->ifindex; struct dcb_app_type *itr; u8 prio; memset(p_map->map, 0, sizeof(p_map->map)); spin_lock_bh(&dcb_lock); list_for_each_entry(itr, &dcb_app_list, list) { if (itr->ifindex == ifindex && itr->app.selector == IEEE_8021QAZ_APP_SEL_DSCP && itr->app.protocol < 64 && itr->app.priority < IEEE_8021QAZ_MAX_TCS) { prio = itr->app.priority; p_map->map[prio] |= 1ULL << itr->app.protocol; } } spin_unlock_bh(&dcb_lock); } EXPORT_SYMBOL(dcb_ieee_getapp_prio_dscp_mask_map); /* * dcb_ieee_getapp_dscp_prio_mask_map - For a given device, find mapping from * DSCP values to the priorities assigned to that DSCP value. Initialize p_map * such that each map element holds a bit mask of priorities configured for a * given DSCP value by APP entries. */ void dcb_ieee_getapp_dscp_prio_mask_map(const struct net_device *dev, struct dcb_ieee_app_dscp_map *p_map) { int ifindex = dev->ifindex; struct dcb_app_type *itr; memset(p_map->map, 0, sizeof(p_map->map)); spin_lock_bh(&dcb_lock); list_for_each_entry(itr, &dcb_app_list, list) { if (itr->ifindex == ifindex && itr->app.selector == IEEE_8021QAZ_APP_SEL_DSCP && itr->app.protocol < 64 && itr->app.priority < IEEE_8021QAZ_MAX_TCS) p_map->map[itr->app.protocol] |= 1 << itr->app.priority; } spin_unlock_bh(&dcb_lock); } EXPORT_SYMBOL(dcb_ieee_getapp_dscp_prio_mask_map); /* * Per 802.1Q-2014, the selector value of 1 is used for matching on Ethernet * type, with valid PID values >= 1536. A special meaning is then assigned to * protocol value of 0: "default priority. For use when priority is not * otherwise specified". * * dcb_ieee_getapp_default_prio_mask - For a given device, find all APP entries * of the form {$PRIO, ETHERTYPE, 0} and construct a bit mask of all default * priorities set by these entries. */ u8 dcb_ieee_getapp_default_prio_mask(const struct net_device *dev) { int ifindex = dev->ifindex; struct dcb_app_type *itr; u8 mask = 0; spin_lock_bh(&dcb_lock); list_for_each_entry(itr, &dcb_app_list, list) { if (itr->ifindex == ifindex && itr->app.selector == IEEE_8021QAZ_APP_SEL_ETHERTYPE && itr->app.protocol == 0 && itr->app.priority < IEEE_8021QAZ_MAX_TCS) mask |= 1 << itr->app.priority; } spin_unlock_bh(&dcb_lock); return mask; } EXPORT_SYMBOL(dcb_ieee_getapp_default_prio_mask); static void dcbnl_flush_dev(struct net_device *dev) { struct dcb_app_type *itr, *tmp; spin_lock_bh(&dcb_lock); list_for_each_entry_safe(itr, tmp, &dcb_app_list, list) { if (itr->ifindex == dev->ifindex) { list_del(&itr->list); kfree(itr); } } spin_unlock_bh(&dcb_lock); } static int dcbnl_netdevice_event(struct notifier_block *nb, unsigned long event, void *ptr) { struct net_device *dev = netdev_notifier_info_to_dev(ptr); switch (event) { case NETDEV_UNREGISTER: if (!dev->dcbnl_ops) return NOTIFY_DONE; dcbnl_flush_dev(dev); return NOTIFY_OK; default: return NOTIFY_DONE; } } static struct notifier_block dcbnl_nb __read_mostly = { .notifier_call = dcbnl_netdevice_event, }; static const struct rtnl_msg_handler dcbnl_rtnl_msg_handlers[] __initconst = { {.msgtype = RTM_GETDCB, .doit = dcb_doit}, {.msgtype = RTM_SETDCB, .doit = dcb_doit}, }; static int __init dcbnl_init(void) { int err; err = register_netdevice_notifier(&dcbnl_nb); if (err) return err; rtnl_register_many(dcbnl_rtnl_msg_handlers); return 0; } device_initcall(dcbnl_init); |
| 19 16 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 | /* SPDX-License-Identifier: GPL-2.0-or-later */ /* * NET Generic infrastructure for Network protocols. * * Definitions for request_sock * * Authors: Arnaldo Carvalho de Melo <acme@conectiva.com.br> * * From code originally in include/net/tcp.h */ #ifndef _REQUEST_SOCK_H #define _REQUEST_SOCK_H #include <linux/slab.h> #include <linux/spinlock.h> #include <linux/types.h> #include <linux/bug.h> #include <linux/refcount.h> #include <net/sock.h> #include <net/rstreason.h> struct request_sock; struct sk_buff; struct dst_entry; struct proto; struct request_sock_ops { int family; unsigned int obj_size; struct kmem_cache *slab; char *slab_name; void (*send_ack)(const struct sock *sk, struct sk_buff *skb, struct request_sock *req); void (*send_reset)(const struct sock *sk, struct sk_buff *skb, enum sk_rst_reason reason); void (*destructor)(struct request_sock *req); void (*syn_ack_timeout)(const struct request_sock *req); }; struct saved_syn { u32 mac_hdrlen; u32 network_hdrlen; u32 tcp_hdrlen; u8 data[]; }; /* struct request_sock - mini sock to represent a connection request */ struct request_sock { struct sock_common __req_common; #define rsk_refcnt __req_common.skc_refcnt #define rsk_hash __req_common.skc_hash #define rsk_listener __req_common.skc_listener #define rsk_window_clamp __req_common.skc_window_clamp #define rsk_rcv_wnd __req_common.skc_rcv_wnd struct request_sock *dl_next; u16 mss; u8 num_retrans; /* number of retransmits */ u8 syncookie:1; /* True if * 1) tcpopts needs to be encoded in * TS of SYN+ACK * 2) ACK is validated by BPF kfunc. */ u8 num_timeout:7; /* number of timeouts */ u32 ts_recent; struct timer_list rsk_timer; const struct request_sock_ops *rsk_ops; struct sock *sk; struct saved_syn *saved_syn; u32 secid; u32 peer_secid; u32 timeout; }; static inline struct request_sock *inet_reqsk(const struct sock *sk) { return (struct request_sock *)sk; } static inline struct sock *req_to_sk(struct request_sock *req) { return (struct sock *)req; } /** * skb_steal_sock - steal a socket from an sk_buff * @skb: sk_buff to steal the socket from * @refcounted: is set to true if the socket is reference-counted * @prefetched: is set to true if the socket was assigned from bpf */ static inline struct sock *skb_steal_sock(struct sk_buff *skb, bool *refcounted, bool *prefetched) { struct sock *sk = skb->sk; if (!sk) { *prefetched = false; *refcounted = false; return NULL; } *prefetched = skb_sk_is_prefetched(skb); if (*prefetched) { #if IS_ENABLED(CONFIG_SYN_COOKIES) if (sk->sk_state == TCP_NEW_SYN_RECV && inet_reqsk(sk)->syncookie) { struct request_sock *req = inet_reqsk(sk); *refcounted = false; sk = req->rsk_listener; req->rsk_listener = NULL; return sk; } #endif *refcounted = sk_is_refcounted(sk); } else { *refcounted = true; } skb->destructor = NULL; skb->sk = NULL; return sk; } static inline void __reqsk_free(struct request_sock *req) { req->rsk_ops->destructor(req); if (req->rsk_listener) sock_put(req->rsk_listener); kfree(req->saved_syn); kmem_cache_free(req->rsk_ops->slab, req); } static inline void reqsk_free(struct request_sock *req) { DEBUG_NET_WARN_ON_ONCE(refcount_read(&req->rsk_refcnt) != 0); __reqsk_free(req); } static inline void reqsk_put(struct request_sock *req) { if (refcount_dec_and_test(&req->rsk_refcnt)) __reqsk_free(req); } /* * For a TCP Fast Open listener - * lock - protects the access to all the reqsk, which is co-owned by * the listener and the child socket. * qlen - pending TFO requests (still in TCP_SYN_RECV). * max_qlen - max TFO reqs allowed before TFO is disabled. * * XXX (TFO) - ideally these fields can be made as part of "listen_sock" * structure above. But there is some implementation difficulty due to * listen_sock being part of request_sock_queue hence will be freed when * a listener is stopped. But TFO related fields may continue to be * accessed even after a listener is closed, until its sk_refcnt drops * to 0 implying no more outstanding TFO reqs. One solution is to keep * listen_opt around until sk_refcnt drops to 0. But there is some other * complexity that needs to be resolved. E.g., a listener can be disabled * temporarily through shutdown()->tcp_disconnect(), and re-enabled later. */ struct fastopen_queue { struct request_sock *rskq_rst_head; /* Keep track of past TFO */ struct request_sock *rskq_rst_tail; /* requests that caused RST. * This is part of the defense * against spoofing attack. */ spinlock_t lock; int qlen; /* # of pending (TCP_SYN_RECV) reqs */ int max_qlen; /* != 0 iff TFO is currently enabled */ struct tcp_fastopen_context __rcu *ctx; /* cipher context for cookie */ }; /** struct request_sock_queue - queue of request_socks * * @rskq_accept_head - FIFO head of established children * @rskq_accept_tail - FIFO tail of established children * @rskq_defer_accept - User waits for some data after accept() * */ struct request_sock_queue { spinlock_t rskq_lock; u8 rskq_defer_accept; u8 synflood_warned; atomic_t qlen; atomic_t young; struct request_sock *rskq_accept_head; struct request_sock *rskq_accept_tail; struct fastopen_queue fastopenq; /* Check max_qlen != 0 to determine * if TFO is enabled. */ }; void reqsk_queue_alloc(struct request_sock_queue *queue); void reqsk_fastopen_remove(struct sock *sk, struct request_sock *req, bool reset); static inline bool reqsk_queue_empty(const struct request_sock_queue *queue) { return READ_ONCE(queue->rskq_accept_head) == NULL; } static inline struct request_sock *reqsk_queue_remove(struct request_sock_queue *queue, struct sock *parent) { struct request_sock *req; spin_lock_bh(&queue->rskq_lock); req = queue->rskq_accept_head; if (req) { sk_acceptq_removed(parent); WRITE_ONCE(queue->rskq_accept_head, req->dl_next); if (queue->rskq_accept_head == NULL) queue->rskq_accept_tail = NULL; } spin_unlock_bh(&queue->rskq_lock); return req; } static inline void reqsk_queue_removed(struct request_sock_queue *queue, const struct request_sock *req) { if (req->num_timeout == 0) atomic_dec(&queue->young); atomic_dec(&queue->qlen); } static inline void reqsk_queue_added(struct request_sock_queue *queue) { atomic_inc(&queue->young); atomic_inc(&queue->qlen); } static inline int reqsk_queue_len(const struct request_sock_queue *queue) { return atomic_read(&queue->qlen); } static inline int reqsk_queue_len_young(const struct request_sock_queue *queue) { return atomic_read(&queue->young); } /* RFC 7323 2.3 Using the Window Scale Option * The window field (SEG.WND) of every outgoing segment, with the * exception of <SYN> segments, MUST be right-shifted by * Rcv.Wind.Shift bits. * * This means the SEG.WND carried in SYNACK can not exceed 65535. * We use this property to harden TCP stack while in NEW_SYN_RECV state. */ static inline u32 tcp_synack_window(const struct request_sock *req) { return min(req->rsk_rcv_wnd, 65535U); } #endif /* _REQUEST_SOCK_H */ |
| 183 8 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 2447 2448 2449 2450 2451 2452 2453 2454 2455 2456 2457 2458 2459 2460 2461 2462 2463 2464 2465 2466 2467 2468 2469 2470 2471 2472 2473 2474 2475 2476 2477 2478 2479 2480 2481 2482 2483 2484 2485 2486 2487 2488 2489 2490 2491 2492 2493 2494 2495 2496 2497 2498 2499 2500 2501 2502 2503 2504 2505 2506 2507 2508 2509 2510 2511 2512 2513 2514 2515 2516 2517 2518 2519 2520 2521 2522 2523 2524 2525 2526 2527 2528 2529 2530 2531 2532 2533 2534 2535 2536 2537 2538 2539 2540 2541 2542 2543 2544 2545 2546 2547 2548 2549 2550 2551 2552 2553 2554 2555 2556 2557 2558 2559 2560 2561 2562 2563 2564 2565 2566 2567 2568 2569 2570 2571 2572 2573 2574 2575 2576 2577 2578 2579 2580 2581 2582 2583 2584 2585 2586 2587 2588 2589 2590 2591 2592 2593 2594 2595 2596 2597 2598 2599 2600 2601 2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658 2659 2660 2661 2662 2663 2664 2665 2666 2667 2668 2669 2670 2671 2672 2673 2674 2675 2676 2677 2678 2679 2680 2681 2682 2683 2684 2685 2686 2687 2688 2689 2690 2691 2692 2693 2694 2695 2696 2697 2698 2699 2700 2701 2702 2703 2704 2705 2706 2707 2708 2709 2710 2711 2712 2713 2714 2715 2716 2717 2718 2719 2720 2721 2722 2723 2724 2725 2726 2727 2728 2729 2730 2731 2732 2733 2734 2735 2736 2737 2738 2739 2740 2741 2742 2743 2744 2745 2746 2747 2748 2749 2750 2751 2752 2753 2754 2755 2756 2757 2758 2759 2760 2761 2762 2763 2764 2765 2766 2767 2768 2769 2770 2771 2772 2773 2774 2775 2776 2777 2778 2779 2780 2781 2782 2783 2784 2785 2786 2787 2788 2789 2790 2791 2792 2793 2794 2795 2796 2797 2798 2799 2800 2801 2802 2803 2804 2805 2806 2807 2808 2809 2810 2811 2812 2813 2814 2815 2816 2817 2818 2819 2820 2821 2822 2823 2824 2825 2826 2827 2828 2829 2830 2831 2832 2833 2834 2835 2836 2837 2838 2839 2840 2841 2842 2843 2844 2845 2846 2847 2848 2849 2850 2851 2852 2853 2854 2855 2856 2857 2858 2859 2860 2861 2862 2863 2864 2865 2866 2867 2868 2869 2870 2871 2872 2873 2874 2875 2876 2877 2878 2879 2880 2881 2882 2883 2884 2885 2886 2887 2888 2889 2890 2891 2892 2893 2894 2895 2896 2897 2898 2899 2900 2901 2902 2903 2904 2905 2906 2907 2908 2909 2910 2911 2912 2913 2914 2915 2916 2917 2918 2919 2920 2921 2922 2923 2924 2925 2926 2927 2928 2929 2930 2931 2932 2933 2934 2935 2936 2937 2938 2939 2940 2941 2942 2943 2944 2945 2946 2947 2948 2949 2950 2951 2952 2953 2954 2955 2956 2957 2958 2959 2960 2961 2962 2963 2964 2965 2966 2967 2968 2969 2970 2971 2972 2973 2974 2975 2976 2977 2978 2979 2980 2981 2982 2983 2984 2985 2986 2987 2988 2989 2990 2991 2992 2993 2994 2995 2996 2997 2998 2999 3000 3001 3002 3003 3004 3005 3006 3007 3008 3009 3010 3011 3012 3013 3014 3015 3016 3017 3018 3019 3020 3021 3022 3023 3024 3025 3026 3027 3028 3029 3030 3031 3032 3033 3034 3035 3036 | // SPDX-License-Identifier: GPL-2.0 /* * linux/kernel/sys.c * * Copyright (C) 1991, 1992 Linus Torvalds */ #include <linux/export.h> #include <linux/mm.h> #include <linux/mm_inline.h> #include <linux/utsname.h> #include <linux/mman.h> #include <linux/reboot.h> #include <linux/prctl.h> #include <linux/highuid.h> #include <linux/fs.h> #include <linux/kmod.h> #include <linux/ksm.h> #include <linux/perf_event.h> #include <linux/resource.h> #include <linux/kernel.h> #include <linux/workqueue.h> #include <linux/capability.h> #include <linux/device.h> #include <linux/key.h> #include <linux/times.h> #include <linux/posix-timers.h> #include <linux/security.h> #include <linux/random.h> #include <linux/suspend.h> #include <linux/tty.h> #include <linux/signal.h> #include <linux/cn_proc.h> #include <linux/getcpu.h> #include <linux/task_io_accounting_ops.h> #include <linux/seccomp.h> #include <linux/cpu.h> #include <linux/personality.h> #include <linux/ptrace.h> #include <linux/fs_struct.h> #include <linux/file.h> #include <linux/mount.h> #include <linux/gfp.h> #include <linux/syscore_ops.h> #include <linux/version.h> #include <linux/ctype.h> #include <linux/syscall_user_dispatch.h> #include <linux/compat.h> #include <linux/syscalls.h> #include <linux/kprobes.h> #include <linux/user_namespace.h> #include <linux/time_namespace.h> #include <linux/binfmts.h> #include <linux/futex.h> #include <linux/sched.h> #include <linux/sched/autogroup.h> #include <linux/sched/loadavg.h> #include <linux/sched/stat.h> #include <linux/sched/mm.h> #include <linux/sched/coredump.h> #include <linux/sched/task.h> #include <linux/sched/cputime.h> #include <linux/rcupdate.h> #include <linux/uidgid.h> #include <linux/cred.h> #include <linux/nospec.h> #include <linux/kmsg_dump.h> /* Move somewhere else to avoid recompiling? */ #include <generated/utsrelease.h> #include <linux/uaccess.h> #include <asm/io.h> #include <asm/unistd.h> #include <trace/events/task.h> #include "uid16.h" #ifndef SET_UNALIGN_CTL # define SET_UNALIGN_CTL(a, b) (-EINVAL) #endif #ifndef GET_UNALIGN_CTL # define GET_UNALIGN_CTL(a, b) (-EINVAL) #endif #ifndef SET_FPEMU_CTL # define SET_FPEMU_CTL(a, b) (-EINVAL) #endif #ifndef GET_FPEMU_CTL # define GET_FPEMU_CTL(a, b) (-EINVAL) #endif #ifndef SET_FPEXC_CTL # define SET_FPEXC_CTL(a, b) (-EINVAL) #endif #ifndef GET_FPEXC_CTL # define GET_FPEXC_CTL(a, b) (-EINVAL) #endif #ifndef GET_ENDIAN # define GET_ENDIAN(a, b) (-EINVAL) #endif #ifndef SET_ENDIAN # define SET_ENDIAN(a, b) (-EINVAL) #endif #ifndef GET_TSC_CTL # define GET_TSC_CTL(a) (-EINVAL) #endif #ifndef SET_TSC_CTL # define SET_TSC_CTL(a) (-EINVAL) #endif #ifndef GET_FP_MODE # define GET_FP_MODE(a) (-EINVAL) #endif #ifndef SET_FP_MODE # define SET_FP_MODE(a,b) (-EINVAL) #endif #ifndef SVE_SET_VL # define SVE_SET_VL(a) (-EINVAL) #endif #ifndef SVE_GET_VL # define SVE_GET_VL() (-EINVAL) #endif #ifndef SME_SET_VL # define SME_SET_VL(a) (-EINVAL) #endif #ifndef SME_GET_VL # define SME_GET_VL() (-EINVAL) #endif #ifndef PAC_RESET_KEYS # define PAC_RESET_KEYS(a, b) (-EINVAL) #endif #ifndef PAC_SET_ENABLED_KEYS # define PAC_SET_ENABLED_KEYS(a, b, c) (-EINVAL) #endif #ifndef PAC_GET_ENABLED_KEYS # define PAC_GET_ENABLED_KEYS(a) (-EINVAL) #endif #ifndef SET_TAGGED_ADDR_CTRL # define SET_TAGGED_ADDR_CTRL(a) (-EINVAL) #endif #ifndef GET_TAGGED_ADDR_CTRL # define GET_TAGGED_ADDR_CTRL() (-EINVAL) #endif #ifndef RISCV_V_SET_CONTROL # define RISCV_V_SET_CONTROL(a) (-EINVAL) #endif #ifndef RISCV_V_GET_CONTROL # define RISCV_V_GET_CONTROL() (-EINVAL) #endif #ifndef RISCV_SET_ICACHE_FLUSH_CTX # define RISCV_SET_ICACHE_FLUSH_CTX(a, b) (-EINVAL) #endif #ifndef PPC_GET_DEXCR_ASPECT # define PPC_GET_DEXCR_ASPECT(a, b) (-EINVAL) #endif #ifndef PPC_SET_DEXCR_ASPECT # define PPC_SET_DEXCR_ASPECT(a, b, c) (-EINVAL) #endif /* * this is where the system-wide overflow UID and GID are defined, for * architectures that now have 32-bit UID/GID but didn't in the past */ int overflowuid = DEFAULT_OVERFLOWUID; int overflowgid = DEFAULT_OVERFLOWGID; EXPORT_SYMBOL(overflowuid); EXPORT_SYMBOL(overflowgid); /* * the same as above, but for filesystems which can only store a 16-bit * UID and GID. as such, this is needed on all architectures */ int fs_overflowuid = DEFAULT_FS_OVERFLOWUID; int fs_overflowgid = DEFAULT_FS_OVERFLOWGID; EXPORT_SYMBOL(fs_overflowuid); EXPORT_SYMBOL(fs_overflowgid); static const struct ctl_table overflow_sysctl_table[] = { { .procname = "overflowuid", .data = &overflowuid, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_MAXOLDUID, }, { .procname = "overflowgid", .data = &overflowgid, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_MAXOLDUID, }, }; static int __init init_overflow_sysctl(void) { register_sysctl_init("kernel", overflow_sysctl_table); return 0; } postcore_initcall(init_overflow_sysctl); /* * Returns true if current's euid is same as p's uid or euid, * or has CAP_SYS_NICE to p's user_ns. * * Called with rcu_read_lock, creds are safe */ static bool set_one_prio_perm(struct task_struct *p) { const struct cred *cred = current_cred(), *pcred = __task_cred(p); if (uid_eq(pcred->uid, cred->euid) || uid_eq(pcred->euid, cred->euid)) return true; if (ns_capable(pcred->user_ns, CAP_SYS_NICE)) return true; return false; } /* * set the priority of a task * - the caller must hold the RCU read lock */ static int set_one_prio(struct task_struct *p, int niceval, int error) { int no_nice; if (!set_one_prio_perm(p)) { error = -EPERM; goto out; } if (niceval < task_nice(p) && !can_nice(p, niceval)) { error = -EACCES; goto out; } no_nice = security_task_setnice(p, niceval); if (no_nice) { error = no_nice; goto out; } if (error == -ESRCH) error = 0; set_user_nice(p, niceval); out: return error; } SYSCALL_DEFINE3(setpriority, int, which, int, who, int, niceval) { struct task_struct *g, *p; struct user_struct *user; const struct cred *cred = current_cred(); int error = -EINVAL; struct pid *pgrp; kuid_t uid; if (which > PRIO_USER || which < PRIO_PROCESS) goto out; /* normalize: avoid signed division (rounding problems) */ error = -ESRCH; if (niceval < MIN_NICE) niceval = MIN_NICE; if (niceval > MAX_NICE) niceval = MAX_NICE; rcu_read_lock(); switch (which) { case PRIO_PROCESS: if (who) p = find_task_by_vpid(who); else p = current; if (p) error = set_one_prio(p, niceval, error); break; case PRIO_PGRP: if (who) pgrp = find_vpid(who); else pgrp = task_pgrp(current); read_lock(&tasklist_lock); do_each_pid_thread(pgrp, PIDTYPE_PGID, p) { error = set_one_prio(p, niceval, error); } while_each_pid_thread(pgrp, PIDTYPE_PGID, p); read_unlock(&tasklist_lock); break; case PRIO_USER: uid = make_kuid(cred->user_ns, who); user = cred->user; if (!who) uid = cred->uid; else if (!uid_eq(uid, cred->uid)) { user = find_user(uid); if (!user) goto out_unlock; /* No processes for this user */ } for_each_process_thread(g, p) { if (uid_eq(task_uid(p), uid) && task_pid_vnr(p)) error = set_one_prio(p, niceval, error); } if (!uid_eq(uid, cred->uid)) free_uid(user); /* For find_user() */ break; } out_unlock: rcu_read_unlock(); out: return error; } /* * Ugh. To avoid negative return values, "getpriority()" will * not return the normal nice-value, but a negated value that * has been offset by 20 (ie it returns 40..1 instead of -20..19) * to stay compatible. */ SYSCALL_DEFINE2(getpriority, int, which, int, who) { struct task_struct *g, *p; struct user_struct *user; const struct cred *cred = current_cred(); long niceval, retval = -ESRCH; struct pid *pgrp; kuid_t uid; if (which > PRIO_USER || which < PRIO_PROCESS) return -EINVAL; rcu_read_lock(); switch (which) { case PRIO_PROCESS: if (who) p = find_task_by_vpid(who); else p = current; if (p) { niceval = nice_to_rlimit(task_nice(p)); if (niceval > retval) retval = niceval; } break; case PRIO_PGRP: if (who) pgrp = find_vpid(who); else pgrp = task_pgrp(current); read_lock(&tasklist_lock); do_each_pid_thread(pgrp, PIDTYPE_PGID, p) { niceval = nice_to_rlimit(task_nice(p)); if (niceval > retval) retval = niceval; } while_each_pid_thread(pgrp, PIDTYPE_PGID, p); read_unlock(&tasklist_lock); break; case PRIO_USER: uid = make_kuid(cred->user_ns, who); user = cred->user; if (!who) uid = cred->uid; else if (!uid_eq(uid, cred->uid)) { user = find_user(uid); if (!user) goto out_unlock; /* No processes for this user */ } for_each_process_thread(g, p) { if (uid_eq(task_uid(p), uid) && task_pid_vnr(p)) { niceval = nice_to_rlimit(task_nice(p)); if (niceval > retval) retval = niceval; } } if (!uid_eq(uid, cred->uid)) free_uid(user); /* for find_user() */ break; } out_unlock: rcu_read_unlock(); return retval; } /* * Unprivileged users may change the real gid to the effective gid * or vice versa. (BSD-style) * * If you set the real gid at all, or set the effective gid to a value not * equal to the real gid, then the saved gid is set to the new effective gid. * * This makes it possible for a setgid program to completely drop its * privileges, which is often a useful assertion to make when you are doing * a security audit over a program. * * The general idea is that a program which uses just setregid() will be * 100% compatible with BSD. A program which uses just setgid() will be * 100% compatible with POSIX with saved IDs. * * SMP: There are not races, the GIDs are checked only by filesystem * operations (as far as semantic preservation is concerned). */ #ifdef CONFIG_MULTIUSER long __sys_setregid(gid_t rgid, gid_t egid) { struct user_namespace *ns = current_user_ns(); const struct cred *old; struct cred *new; int retval; kgid_t krgid, kegid; krgid = make_kgid(ns, rgid); kegid = make_kgid(ns, egid); if ((rgid != (gid_t) -1) && !gid_valid(krgid)) return -EINVAL; if ((egid != (gid_t) -1) && !gid_valid(kegid)) return -EINVAL; new = prepare_creds(); if (!new) return -ENOMEM; old = current_cred(); retval = -EPERM; if (rgid != (gid_t) -1) { if (gid_eq(old->gid, krgid) || gid_eq(old->egid, krgid) || ns_capable_setid(old->user_ns, CAP_SETGID)) new->gid = krgid; else goto error; } if (egid != (gid_t) -1) { if (gid_eq(old->gid, kegid) || gid_eq(old->egid, kegid) || gid_eq(old->sgid, kegid) || ns_capable_setid(old->user_ns, CAP_SETGID)) new->egid = kegid; else goto error; } if (rgid != (gid_t) -1 || (egid != (gid_t) -1 && !gid_eq(kegid, old->gid))) new->sgid = new->egid; new->fsgid = new->egid; retval = security_task_fix_setgid(new, old, LSM_SETID_RE); if (retval < 0) goto error; return commit_creds(new); error: abort_creds(new); return retval; } SYSCALL_DEFINE2(setregid, gid_t, rgid, gid_t, egid) { return __sys_setregid(rgid, egid); } /* * setgid() is implemented like SysV w/ SAVED_IDS * * SMP: Same implicit races as above. */ long __sys_setgid(gid_t gid) { struct user_namespace *ns = current_user_ns(); const struct cred *old; struct cred *new; int retval; kgid_t kgid; kgid = make_kgid(ns, gid); if (!gid_valid(kgid)) return -EINVAL; new = prepare_creds(); if (!new) return -ENOMEM; old = current_cred(); retval = -EPERM; if (ns_capable_setid(old->user_ns, CAP_SETGID)) new->gid = new->egid = new->sgid = new->fsgid = kgid; else if (gid_eq(kgid, old->gid) || gid_eq(kgid, old->sgid)) new->egid = new->fsgid = kgid; else goto error; retval = security_task_fix_setgid(new, old, LSM_SETID_ID); if (retval < 0) goto error; return commit_creds(new); error: abort_creds(new); return retval; } SYSCALL_DEFINE1(setgid, gid_t, gid) { return __sys_setgid(gid); } /* * change the user struct in a credentials set to match the new UID */ static int set_user(struct cred *new) { struct user_struct *new_user; new_user = alloc_uid(new->uid); if (!new_user) return -EAGAIN; free_uid(new->user); new->user = new_user; return 0; } static void flag_nproc_exceeded(struct cred *new) { if (new->ucounts == current_ucounts()) return; /* * We don't fail in case of NPROC limit excess here because too many * poorly written programs don't check set*uid() return code, assuming * it never fails if called by root. We may still enforce NPROC limit * for programs doing set*uid()+execve() by harmlessly deferring the * failure to the execve() stage. */ if (is_rlimit_overlimit(new->ucounts, UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC)) && new->user != INIT_USER) current->flags |= PF_NPROC_EXCEEDED; else current->flags &= ~PF_NPROC_EXCEEDED; } /* * Unprivileged users may change the real uid to the effective uid * or vice versa. (BSD-style) * * If you set the real uid at all, or set the effective uid to a value not * equal to the real uid, then the saved uid is set to the new effective uid. * * This makes it possible for a setuid program to completely drop its * privileges, which is often a useful assertion to make when you are doing * a security audit over a program. * * The general idea is that a program which uses just setreuid() will be * 100% compatible with BSD. A program which uses just setuid() will be * 100% compatible with POSIX with saved IDs. */ long __sys_setreuid(uid_t ruid, uid_t euid) { struct user_namespace *ns = current_user_ns(); const struct cred *old; struct cred *new; int retval; kuid_t kruid, keuid; kruid = make_kuid(ns, ruid); keuid = make_kuid(ns, euid); if ((ruid != (uid_t) -1) && !uid_valid(kruid)) return -EINVAL; if ((euid != (uid_t) -1) && !uid_valid(keuid)) return -EINVAL; new = prepare_creds(); if (!new) return -ENOMEM; old = current_cred(); retval = -EPERM; if (ruid != (uid_t) -1) { new->uid = kruid; if (!uid_eq(old->uid, kruid) && !uid_eq(old->euid, kruid) && !ns_capable_setid(old->user_ns, CAP_SETUID)) goto error; } if (euid != (uid_t) -1) { new->euid = keuid; if (!uid_eq(old->uid, keuid) && !uid_eq(old->euid, keuid) && !uid_eq(old->suid, keuid) && !ns_capable_setid(old->user_ns, CAP_SETUID)) goto error; } if (!uid_eq(new->uid, old->uid)) { retval = set_user(new); if (retval < 0) goto error; } if (ruid != (uid_t) -1 || (euid != (uid_t) -1 && !uid_eq(keuid, old->uid))) new->suid = new->euid; new->fsuid = new->euid; retval = security_task_fix_setuid(new, old, LSM_SETID_RE); if (retval < 0) goto error; retval = set_cred_ucounts(new); if (retval < 0) goto error; flag_nproc_exceeded(new); return commit_creds(new); error: abort_creds(new); return retval; } SYSCALL_DEFINE2(setreuid, uid_t, ruid, uid_t, euid) { return __sys_setreuid(ruid, euid); } /* * setuid() is implemented like SysV with SAVED_IDS * * Note that SAVED_ID's is deficient in that a setuid root program * like sendmail, for example, cannot set its uid to be a normal * user and then switch back, because if you're root, setuid() sets * the saved uid too. If you don't like this, blame the bright people * in the POSIX committee and/or USG. Note that the BSD-style setreuid() * will allow a root program to temporarily drop privileges and be able to * regain them by swapping the real and effective uid. */ long __sys_setuid(uid_t uid) { struct user_namespace *ns = current_user_ns(); const struct cred *old; struct cred *new; int retval; kuid_t kuid; kuid = make_kuid(ns, uid); if (!uid_valid(kuid)) return -EINVAL; new = prepare_creds(); if (!new) return -ENOMEM; old = current_cred(); retval = -EPERM; if (ns_capable_setid(old->user_ns, CAP_SETUID)) { new->suid = new->uid = kuid; if (!uid_eq(kuid, old->uid)) { retval = set_user(new); if (retval < 0) goto error; } } else if (!uid_eq(kuid, old->uid) && !uid_eq(kuid, new->suid)) { goto error; } new->fsuid = new->euid = kuid; retval = security_task_fix_setuid(new, old, LSM_SETID_ID); if (retval < 0) goto error; retval = set_cred_ucounts(new); if (retval < 0) goto error; flag_nproc_exceeded(new); return commit_creds(new); error: abort_creds(new); return retval; } SYSCALL_DEFINE1(setuid, uid_t, uid) { return __sys_setuid(uid); } /* * This function implements a generic ability to update ruid, euid, * and suid. This allows you to implement the 4.4 compatible seteuid(). */ long __sys_setresuid(uid_t ruid, uid_t euid, uid_t suid) { struct user_namespace *ns = current_user_ns(); const struct cred *old; struct cred *new; int retval; kuid_t kruid, keuid, ksuid; bool ruid_new, euid_new, suid_new; kruid = make_kuid(ns, ruid); keuid = make_kuid(ns, euid); ksuid = make_kuid(ns, suid); if ((ruid != (uid_t) -1) && !uid_valid(kruid)) return -EINVAL; if ((euid != (uid_t) -1) && !uid_valid(keuid)) return -EINVAL; if ((suid != (uid_t) -1) && !uid_valid(ksuid)) return -EINVAL; old = current_cred(); /* check for no-op */ if ((ruid == (uid_t) -1 || uid_eq(kruid, old->uid)) && (euid == (uid_t) -1 || (uid_eq(keuid, old->euid) && uid_eq(keuid, old->fsuid))) && (suid == (uid_t) -1 || uid_eq(ksuid, old->suid))) return 0; ruid_new = ruid != (uid_t) -1 && !uid_eq(kruid, old->uid) && !uid_eq(kruid, old->euid) && !uid_eq(kruid, old->suid); euid_new = euid != (uid_t) -1 && !uid_eq(keuid, old->uid) && !uid_eq(keuid, old->euid) && !uid_eq(keuid, old->suid); suid_new = suid != (uid_t) -1 && !uid_eq(ksuid, old->uid) && !uid_eq(ksuid, old->euid) && !uid_eq(ksuid, old->suid); if ((ruid_new || euid_new || suid_new) && !ns_capable_setid(old->user_ns, CAP_SETUID)) return -EPERM; new = prepare_creds(); if (!new) return -ENOMEM; if (ruid != (uid_t) -1) { new->uid = kruid; if (!uid_eq(kruid, old->uid)) { retval = set_user(new); if (retval < 0) goto error; } } if (euid != (uid_t) -1) new->euid = keuid; if (suid != (uid_t) -1) new->suid = ksuid; new->fsuid = new->euid; retval = security_task_fix_setuid(new, old, LSM_SETID_RES); if (retval < 0) goto error; retval = set_cred_ucounts(new); if (retval < 0) goto error; flag_nproc_exceeded(new); return commit_creds(new); error: abort_creds(new); return retval; } SYSCALL_DEFINE3(setresuid, uid_t, ruid, uid_t, euid, uid_t, suid) { return __sys_setresuid(ruid, euid, suid); } SYSCALL_DEFINE3(getresuid, uid_t __user *, ruidp, uid_t __user *, euidp, uid_t __user *, suidp) { const struct cred *cred = current_cred(); int retval; uid_t ruid, euid, suid; ruid = from_kuid_munged(cred->user_ns, cred->uid); euid = from_kuid_munged(cred->user_ns, cred->euid); suid = from_kuid_munged(cred->user_ns, cred->suid); retval = put_user(ruid, ruidp); if (!retval) { retval = put_user(euid, euidp); if (!retval) return put_user(suid, suidp); } return retval; } /* * Same as above, but for rgid, egid, sgid. */ long __sys_setresgid(gid_t rgid, gid_t egid, gid_t sgid) { struct user_namespace *ns = current_user_ns(); const struct cred *old; struct cred *new; int retval; kgid_t krgid, kegid, ksgid; bool rgid_new, egid_new, sgid_new; krgid = make_kgid(ns, rgid); kegid = make_kgid(ns, egid); ksgid = make_kgid(ns, sgid); if ((rgid != (gid_t) -1) && !gid_valid(krgid)) return -EINVAL; if ((egid != (gid_t) -1) && !gid_valid(kegid)) return -EINVAL; if ((sgid != (gid_t) -1) && !gid_valid(ksgid)) return -EINVAL; old = current_cred(); /* check for no-op */ if ((rgid == (gid_t) -1 || gid_eq(krgid, old->gid)) && (egid == (gid_t) -1 || (gid_eq(kegid, old->egid) && gid_eq(kegid, old->fsgid))) && (sgid == (gid_t) -1 || gid_eq(ksgid, old->sgid))) return 0; rgid_new = rgid != (gid_t) -1 && !gid_eq(krgid, old->gid) && !gid_eq(krgid, old->egid) && !gid_eq(krgid, old->sgid); egid_new = egid != (gid_t) -1 && !gid_eq(kegid, old->gid) && !gid_eq(kegid, old->egid) && !gid_eq(kegid, old->sgid); sgid_new = sgid != (gid_t) -1 && !gid_eq(ksgid, old->gid) && !gid_eq(ksgid, old->egid) && !gid_eq(ksgid, old->sgid); if ((rgid_new || egid_new || sgid_new) && !ns_capable_setid(old->user_ns, CAP_SETGID)) return -EPERM; new = prepare_creds(); if (!new) return -ENOMEM; if (rgid != (gid_t) -1) new->gid = krgid; if (egid != (gid_t) -1) new->egid = kegid; if (sgid != (gid_t) -1) new->sgid = ksgid; new->fsgid = new->egid; retval = security_task_fix_setgid(new, old, LSM_SETID_RES); if (retval < 0) goto error; return commit_creds(new); error: abort_creds(new); return retval; } SYSCALL_DEFINE3(setresgid, gid_t, rgid, gid_t, egid, gid_t, sgid) { return __sys_setresgid(rgid, egid, sgid); } SYSCALL_DEFINE3(getresgid, gid_t __user *, rgidp, gid_t __user *, egidp, gid_t __user *, sgidp) { const struct cred *cred = current_cred(); int retval; gid_t rgid, egid, sgid; rgid = from_kgid_munged(cred->user_ns, cred->gid); egid = from_kgid_munged(cred->user_ns, cred->egid); sgid = from_kgid_munged(cred->user_ns, cred->sgid); retval = put_user(rgid, rgidp); if (!retval) { retval = put_user(egid, egidp); if (!retval) retval = put_user(sgid, sgidp); } return retval; } /* * "setfsuid()" sets the fsuid - the uid used for filesystem checks. This * is used for "access()" and for the NFS daemon (letting nfsd stay at * whatever uid it wants to). It normally shadows "euid", except when * explicitly set by setfsuid() or for access.. */ long __sys_setfsuid(uid_t uid) { const struct cred *old; struct cred *new; uid_t old_fsuid; kuid_t kuid; old = current_cred(); old_fsuid = from_kuid_munged(old->user_ns, old->fsuid); kuid = make_kuid(old->user_ns, uid); if (!uid_valid(kuid)) return old_fsuid; new = prepare_creds(); if (!new) return old_fsuid; if (uid_eq(kuid, old->uid) || uid_eq(kuid, old->euid) || uid_eq(kuid, old->suid) || uid_eq(kuid, old->fsuid) || ns_capable_setid(old->user_ns, CAP_SETUID)) { if (!uid_eq(kuid, old->fsuid)) { new->fsuid = kuid; if (security_task_fix_setuid(new, old, LSM_SETID_FS) == 0) goto change_okay; } } abort_creds(new); return old_fsuid; change_okay: commit_creds(new); return old_fsuid; } SYSCALL_DEFINE1(setfsuid, uid_t, uid) { return __sys_setfsuid(uid); } /* * Samma pÃ¥ svenska.. */ long __sys_setfsgid(gid_t gid) { const struct cred *old; struct cred *new; gid_t old_fsgid; kgid_t kgid; old = current_cred(); old_fsgid = from_kgid_munged(old->user_ns, old->fsgid); kgid = make_kgid(old->user_ns, gid); if (!gid_valid(kgid)) return old_fsgid; new = prepare_creds(); if (!new) return old_fsgid; if (gid_eq(kgid, old->gid) || gid_eq(kgid, old->egid) || gid_eq(kgid, old->sgid) || gid_eq(kgid, old->fsgid) || ns_capable_setid(old->user_ns, CAP_SETGID)) { if (!gid_eq(kgid, old->fsgid)) { new->fsgid = kgid; if (security_task_fix_setgid(new,old,LSM_SETID_FS) == 0) goto change_okay; } } abort_creds(new); return old_fsgid; change_okay: commit_creds(new); return old_fsgid; } SYSCALL_DEFINE1(setfsgid, gid_t, gid) { return __sys_setfsgid(gid); } #endif /* CONFIG_MULTIUSER */ /** * sys_getpid - return the thread group id of the current process * * Note, despite the name, this returns the tgid not the pid. The tgid and * the pid are identical unless CLONE_THREAD was specified on clone() in * which case the tgid is the same in all threads of the same group. * * This is SMP safe as current->tgid does not change. */ SYSCALL_DEFINE0(getpid) { return task_tgid_vnr(current); } /* Thread ID - the internal kernel "pid" */ SYSCALL_DEFINE0(gettid) { return task_pid_vnr(current); } /* * Accessing ->real_parent is not SMP-safe, it could * change from under us. However, we can use a stale * value of ->real_parent under rcu_read_lock(), see * release_task()->call_rcu(delayed_put_task_struct). */ SYSCALL_DEFINE0(getppid) { int pid; rcu_read_lock(); pid = task_tgid_vnr(rcu_dereference(current->real_parent)); rcu_read_unlock(); return pid; } SYSCALL_DEFINE0(getuid) { /* Only we change this so SMP safe */ return from_kuid_munged(current_user_ns(), current_uid()); } SYSCALL_DEFINE0(geteuid) { /* Only we change this so SMP safe */ return from_kuid_munged(current_user_ns(), current_euid()); } SYSCALL_DEFINE0(getgid) { /* Only we change this so SMP safe */ return from_kgid_munged(current_user_ns(), current_gid()); } SYSCALL_DEFINE0(getegid) { /* Only we change this so SMP safe */ return from_kgid_munged(current_user_ns(), current_egid()); } static void do_sys_times(struct tms *tms) { u64 tgutime, tgstime, cutime, cstime; thread_group_cputime_adjusted(current, &tgutime, &tgstime); cutime = current->signal->cutime; cstime = current->signal->cstime; tms->tms_utime = nsec_to_clock_t(tgutime); tms->tms_stime = nsec_to_clock_t(tgstime); tms->tms_cutime = nsec_to_clock_t(cutime); tms->tms_cstime = nsec_to_clock_t(cstime); } SYSCALL_DEFINE1(times, struct tms __user *, tbuf) { if (tbuf) { struct tms tmp; do_sys_times(&tmp); if (copy_to_user(tbuf, &tmp, sizeof(struct tms))) return -EFAULT; } force_successful_syscall_return(); return (long) jiffies_64_to_clock_t(get_jiffies_64()); } #ifdef CONFIG_COMPAT static compat_clock_t clock_t_to_compat_clock_t(clock_t x) { return compat_jiffies_to_clock_t(clock_t_to_jiffies(x)); } COMPAT_SYSCALL_DEFINE1(times, struct compat_tms __user *, tbuf) { if (tbuf) { struct tms tms; struct compat_tms tmp; do_sys_times(&tms); /* Convert our struct tms to the compat version. */ tmp.tms_utime = clock_t_to_compat_clock_t(tms.tms_utime); tmp.tms_stime = clock_t_to_compat_clock_t(tms.tms_stime); tmp.tms_cutime = clock_t_to_compat_clock_t(tms.tms_cutime); tmp.tms_cstime = clock_t_to_compat_clock_t(tms.tms_cstime); if (copy_to_user(tbuf, &tmp, sizeof(tmp))) return -EFAULT; } force_successful_syscall_return(); return compat_jiffies_to_clock_t(jiffies); } #endif /* * This needs some heavy checking ... * I just haven't the stomach for it. I also don't fully * understand sessions/pgrp etc. Let somebody who does explain it. * * OK, I think I have the protection semantics right.... this is really * only important on a multi-user system anyway, to make sure one user * can't send a signal to a process owned by another. -TYT, 12/12/91 * * !PF_FORKNOEXEC check to conform completely to POSIX. */ SYSCALL_DEFINE2(setpgid, pid_t, pid, pid_t, pgid) { struct task_struct *p; struct task_struct *group_leader = current->group_leader; struct pid *pids[PIDTYPE_MAX] = { 0 }; struct pid *pgrp; int err; if (!pid) pid = task_pid_vnr(group_leader); if (!pgid) pgid = pid; if (pgid < 0) return -EINVAL; rcu_read_lock(); /* From this point forward we keep holding onto the tasklist lock * so that our parent does not change from under us. -DaveM */ write_lock_irq(&tasklist_lock); err = -ESRCH; p = find_task_by_vpid(pid); if (!p) goto out; err = -EINVAL; if (!thread_group_leader(p)) goto out; if (same_thread_group(p->real_parent, group_leader)) { err = -EPERM; if (task_session(p) != task_session(group_leader)) goto out; err = -EACCES; if (!(p->flags & PF_FORKNOEXEC)) goto out; } else { err = -ESRCH; if (p != group_leader) goto out; } err = -EPERM; if (p->signal->leader) goto out; pgrp = task_pid(p); if (pgid != pid) { struct task_struct *g; pgrp = find_vpid(pgid); g = pid_task(pgrp, PIDTYPE_PGID); if (!g || task_session(g) != task_session(group_leader)) goto out; } err = security_task_setpgid(p, pgid); if (err) goto out; if (task_pgrp(p) != pgrp) change_pid(pids, p, PIDTYPE_PGID, pgrp); err = 0; out: /* All paths lead to here, thus we are safe. -DaveM */ write_unlock_irq(&tasklist_lock); rcu_read_unlock(); free_pids(pids); return err; } static int do_getpgid(pid_t pid) { struct task_struct *p; struct pid *grp; int retval; rcu_read_lock(); if (!pid) grp = task_pgrp(current); else { retval = -ESRCH; p = find_task_by_vpid(pid); if (!p) goto out; grp = task_pgrp(p); if (!grp) goto out; retval = security_task_getpgid(p); if (retval) goto out; } retval = pid_vnr(grp); out: rcu_read_unlock(); return retval; } SYSCALL_DEFINE1(getpgid, pid_t, pid) { return do_getpgid(pid); } #ifdef __ARCH_WANT_SYS_GETPGRP SYSCALL_DEFINE0(getpgrp) { return do_getpgid(0); } #endif SYSCALL_DEFINE1(getsid, pid_t, pid) { struct task_struct *p; struct pid *sid; int retval; rcu_read_lock(); if (!pid) sid = task_session(current); else { retval = -ESRCH; p = find_task_by_vpid(pid); if (!p) goto out; sid = task_session(p); if (!sid) goto out; retval = security_task_getsid(p); if (retval) goto out; } retval = pid_vnr(sid); out: rcu_read_unlock(); return retval; } static void set_special_pids(struct pid **pids, struct pid *pid) { struct task_struct *curr = current->group_leader; if (task_session(curr) != pid) change_pid(pids, curr, PIDTYPE_SID, pid); if (task_pgrp(curr) != pid) change_pid(pids, curr, PIDTYPE_PGID, pid); } int ksys_setsid(void) { struct task_struct *group_leader = current->group_leader; struct pid *sid = task_pid(group_leader); struct pid *pids[PIDTYPE_MAX] = { 0 }; pid_t session = pid_vnr(sid); int err = -EPERM; write_lock_irq(&tasklist_lock); /* Fail if I am already a session leader */ if (group_leader->signal->leader) goto out; /* Fail if a process group id already exists that equals the * proposed session id. */ if (pid_task(sid, PIDTYPE_PGID)) goto out; group_leader->signal->leader = 1; set_special_pids(pids, sid); proc_clear_tty(group_leader); err = session; out: write_unlock_irq(&tasklist_lock); free_pids(pids); if (err > 0) { proc_sid_connector(group_leader); sched_autogroup_create_attach(group_leader); } return err; } SYSCALL_DEFINE0(setsid) { return ksys_setsid(); } DECLARE_RWSEM(uts_sem); #ifdef COMPAT_UTS_MACHINE #define override_architecture(name) \ (personality(current->personality) == PER_LINUX32 && \ copy_to_user(name->machine, COMPAT_UTS_MACHINE, \ sizeof(COMPAT_UTS_MACHINE))) #else #define override_architecture(name) 0 #endif /* * Work around broken programs that cannot handle "Linux 3.0". * Instead we map 3.x to 2.6.40+x, so e.g. 3.0 would be 2.6.40 * And we map 4.x and later versions to 2.6.60+x, so 4.0/5.0/6.0/... would be * 2.6.60. */ static int override_release(char __user *release, size_t len) { int ret = 0; if (current->personality & UNAME26) { const char *rest = UTS_RELEASE; char buf[65] = { 0 }; int ndots = 0; unsigned v; size_t copy; while (*rest) { if (*rest == '.' && ++ndots >= 3) break; if (!isdigit(*rest) && *rest != '.') break; rest++; } v = LINUX_VERSION_PATCHLEVEL + 60; copy = clamp_t(size_t, len, 1, sizeof(buf)); copy = scnprintf(buf, copy, "2.6.%u%s", v, rest); ret = copy_to_user(release, buf, copy + 1); } return ret; } SYSCALL_DEFINE1(newuname, struct new_utsname __user *, name) { struct new_utsname tmp; down_read(&uts_sem); memcpy(&tmp, utsname(), sizeof(tmp)); up_read(&uts_sem); if (copy_to_user(name, &tmp, sizeof(tmp))) return -EFAULT; if (override_release(name->release, sizeof(name->release))) return -EFAULT; if (override_architecture(name)) return -EFAULT; return 0; } #ifdef __ARCH_WANT_SYS_OLD_UNAME /* * Old cruft */ SYSCALL_DEFINE1(uname, struct old_utsname __user *, name) { struct old_utsname tmp; if (!name) return -EFAULT; down_read(&uts_sem); memcpy(&tmp, utsname(), sizeof(tmp)); up_read(&uts_sem); if (copy_to_user(name, &tmp, sizeof(tmp))) return -EFAULT; if (override_release(name->release, sizeof(name->release))) return -EFAULT; if (override_architecture(name)) return -EFAULT; return 0; } SYSCALL_DEFINE1(olduname, struct oldold_utsname __user *, name) { struct oldold_utsname tmp; if (!name) return -EFAULT; memset(&tmp, 0, sizeof(tmp)); down_read(&uts_sem); memcpy(&tmp.sysname, &utsname()->sysname, __OLD_UTS_LEN); memcpy(&tmp.nodename, &utsname()->nodename, __OLD_UTS_LEN); memcpy(&tmp.release, &utsname()->release, __OLD_UTS_LEN); memcpy(&tmp.version, &utsname()->version, __OLD_UTS_LEN); memcpy(&tmp.machine, &utsname()->machine, __OLD_UTS_LEN); up_read(&uts_sem); if (copy_to_user(name, &tmp, sizeof(tmp))) return -EFAULT; if (override_architecture(name)) return -EFAULT; if (override_release(name->release, sizeof(name->release))) return -EFAULT; return 0; } #endif SYSCALL_DEFINE2(sethostname, char __user *, name, int, len) { int errno; char tmp[__NEW_UTS_LEN]; if (!ns_capable(current->nsproxy->uts_ns->user_ns, CAP_SYS_ADMIN)) return -EPERM; if (len < 0 || len > __NEW_UTS_LEN) return -EINVAL; errno = -EFAULT; if (!copy_from_user(tmp, name, len)) { struct new_utsname *u; add_device_randomness(tmp, len); down_write(&uts_sem); u = utsname(); memcpy(u->nodename, tmp, len); memset(u->nodename + len, 0, sizeof(u->nodename) - len); errno = 0; uts_proc_notify(UTS_PROC_HOSTNAME); up_write(&uts_sem); } return errno; } #ifdef __ARCH_WANT_SYS_GETHOSTNAME SYSCALL_DEFINE2(gethostname, char __user *, name, int, len) { int i; struct new_utsname *u; char tmp[__NEW_UTS_LEN + 1]; if (len < 0) return -EINVAL; down_read(&uts_sem); u = utsname(); i = 1 + strlen(u->nodename); if (i > len) i = len; memcpy(tmp, u->nodename, i); up_read(&uts_sem); if (copy_to_user(name, tmp, i)) return -EFAULT; return 0; } #endif /* * Only setdomainname; getdomainname can be implemented by calling * uname() */ SYSCALL_DEFINE2(setdomainname, char __user *, name, int, len) { int errno; char tmp[__NEW_UTS_LEN]; if (!ns_capable(current->nsproxy->uts_ns->user_ns, CAP_SYS_ADMIN)) return -EPERM; if (len < 0 || len > __NEW_UTS_LEN) return -EINVAL; errno = -EFAULT; if (!copy_from_user(tmp, name, len)) { struct new_utsname *u; add_device_randomness(tmp, len); down_write(&uts_sem); u = utsname(); memcpy(u->domainname, tmp, len); memset(u->domainname + len, 0, sizeof(u->domainname) - len); errno = 0; uts_proc_notify(UTS_PROC_DOMAINNAME); up_write(&uts_sem); } return errno; } /* make sure you are allowed to change @tsk limits before calling this */ static int do_prlimit(struct task_struct *tsk, unsigned int resource, struct rlimit *new_rlim, struct rlimit *old_rlim) { struct rlimit *rlim; int retval = 0; if (resource >= RLIM_NLIMITS) return -EINVAL; resource = array_index_nospec(resource, RLIM_NLIMITS); if (new_rlim) { if (new_rlim->rlim_cur > new_rlim->rlim_max) return -EINVAL; if (resource == RLIMIT_NOFILE && new_rlim->rlim_max > sysctl_nr_open) return -EPERM; } /* Holding a refcount on tsk protects tsk->signal from disappearing. */ rlim = tsk->signal->rlim + resource; task_lock(tsk->group_leader); if (new_rlim) { /* * Keep the capable check against init_user_ns until cgroups can * contain all limits. */ if (new_rlim->rlim_max > rlim->rlim_max && !capable(CAP_SYS_RESOURCE)) retval = -EPERM; if (!retval) retval = security_task_setrlimit(tsk, resource, new_rlim); } if (!retval) { if (old_rlim) *old_rlim = *rlim; if (new_rlim) *rlim = *new_rlim; } task_unlock(tsk->group_leader); /* * RLIMIT_CPU handling. Arm the posix CPU timer if the limit is not * infinite. In case of RLIM_INFINITY the posix CPU timer code * ignores the rlimit. */ if (!retval && new_rlim && resource == RLIMIT_CPU && new_rlim->rlim_cur != RLIM_INFINITY && IS_ENABLED(CONFIG_POSIX_TIMERS)) { /* * update_rlimit_cpu can fail if the task is exiting, but there * may be other tasks in the thread group that are not exiting, * and they need their cpu timers adjusted. * * The group_leader is the last task to be released, so if we * cannot update_rlimit_cpu on it, then the entire process is * exiting and we do not need to update at all. */ update_rlimit_cpu(tsk->group_leader, new_rlim->rlim_cur); } return retval; } SYSCALL_DEFINE2(getrlimit, unsigned int, resource, struct rlimit __user *, rlim) { struct rlimit value; int ret; ret = do_prlimit(current, resource, NULL, &value); if (!ret) ret = copy_to_user(rlim, &value, sizeof(*rlim)) ? -EFAULT : 0; return ret; } #ifdef CONFIG_COMPAT COMPAT_SYSCALL_DEFINE2(setrlimit, unsigned int, resource, struct compat_rlimit __user *, rlim) { struct rlimit r; struct compat_rlimit r32; if (copy_from_user(&r32, rlim, sizeof(struct compat_rlimit))) return -EFAULT; if (r32.rlim_cur == COMPAT_RLIM_INFINITY) r.rlim_cur = RLIM_INFINITY; else r.rlim_cur = r32.rlim_cur; if (r32.rlim_max == COMPAT_RLIM_INFINITY) r.rlim_max = RLIM_INFINITY; else r.rlim_max = r32.rlim_max; return do_prlimit(current, resource, &r, NULL); } COMPAT_SYSCALL_DEFINE2(getrlimit, unsigned int, resource, struct compat_rlimit __user *, rlim) { struct rlimit r; int ret; ret = do_prlimit(current, resource, NULL, &r); if (!ret) { struct compat_rlimit r32; if (r.rlim_cur > COMPAT_RLIM_INFINITY) r32.rlim_cur = COMPAT_RLIM_INFINITY; else r32.rlim_cur = r.rlim_cur; if (r.rlim_max > COMPAT_RLIM_INFINITY) r32.rlim_max = COMPAT_RLIM_INFINITY; else r32.rlim_max = r.rlim_max; if (copy_to_user(rlim, &r32, sizeof(struct compat_rlimit))) return -EFAULT; } return ret; } #endif #ifdef __ARCH_WANT_SYS_OLD_GETRLIMIT /* * Back compatibility for getrlimit. Needed for some apps. */ SYSCALL_DEFINE2(old_getrlimit, unsigned int, resource, struct rlimit __user *, rlim) { struct rlimit x; if (resource >= RLIM_NLIMITS) return -EINVAL; resource = array_index_nospec(resource, RLIM_NLIMITS); task_lock(current->group_leader); x = current->signal->rlim[resource]; task_unlock(current->group_leader); if (x.rlim_cur > 0x7FFFFFFF) x.rlim_cur = 0x7FFFFFFF; if (x.rlim_max > 0x7FFFFFFF) x.rlim_max = 0x7FFFFFFF; return copy_to_user(rlim, &x, sizeof(x)) ? -EFAULT : 0; } #ifdef CONFIG_COMPAT COMPAT_SYSCALL_DEFINE2(old_getrlimit, unsigned int, resource, struct compat_rlimit __user *, rlim) { struct rlimit r; if (resource >= RLIM_NLIMITS) return -EINVAL; resource = array_index_nospec(resource, RLIM_NLIMITS); task_lock(current->group_leader); r = current->signal->rlim[resource]; task_unlock(current->group_leader); if (r.rlim_cur > 0x7FFFFFFF) r.rlim_cur = 0x7FFFFFFF; if (r.rlim_max > 0x7FFFFFFF) r.rlim_max = 0x7FFFFFFF; if (put_user(r.rlim_cur, &rlim->rlim_cur) || put_user(r.rlim_max, &rlim->rlim_max)) return -EFAULT; return 0; } #endif #endif static inline bool rlim64_is_infinity(__u64 rlim64) { #if BITS_PER_LONG < 64 return rlim64 >= ULONG_MAX; #else return rlim64 == RLIM64_INFINITY; #endif } static void rlim_to_rlim64(const struct rlimit *rlim, struct rlimit64 *rlim64) { if (rlim->rlim_cur == RLIM_INFINITY) rlim64->rlim_cur = RLIM64_INFINITY; else rlim64->rlim_cur = rlim->rlim_cur; if (rlim->rlim_max == RLIM_INFINITY) rlim64->rlim_max = RLIM64_INFINITY; else rlim64->rlim_max = rlim->rlim_max; } static void rlim64_to_rlim(const struct rlimit64 *rlim64, struct rlimit *rlim) { if (rlim64_is_infinity(rlim64->rlim_cur)) rlim->rlim_cur = RLIM_INFINITY; else rlim->rlim_cur = (unsigned long)rlim64->rlim_cur; if (rlim64_is_infinity(rlim64->rlim_max)) rlim->rlim_max = RLIM_INFINITY; else rlim->rlim_max = (unsigned long)rlim64->rlim_max; } /* rcu lock must be held */ static int check_prlimit_permission(struct task_struct *task, unsigned int flags) { const struct cred *cred = current_cred(), *tcred; bool id_match; if (current == task) return 0; tcred = __task_cred(task); id_match = (uid_eq(cred->uid, tcred->euid) && uid_eq(cred->uid, tcred->suid) && uid_eq(cred->uid, tcred->uid) && gid_eq(cred->gid, tcred->egid) && gid_eq(cred->gid, tcred->sgid) && gid_eq(cred->gid, tcred->gid)); if (!id_match && !ns_capable(tcred->user_ns, CAP_SYS_RESOURCE)) return -EPERM; return security_task_prlimit(cred, tcred, flags); } SYSCALL_DEFINE4(prlimit64, pid_t, pid, unsigned int, resource, const struct rlimit64 __user *, new_rlim, struct rlimit64 __user *, old_rlim) { struct rlimit64 old64, new64; struct rlimit old, new; struct task_struct *tsk; unsigned int checkflags = 0; bool need_tasklist; int ret; if (old_rlim) checkflags |= LSM_PRLIMIT_READ; if (new_rlim) { if (copy_from_user(&new64, new_rlim, sizeof(new64))) return -EFAULT; rlim64_to_rlim(&new64, &new); checkflags |= LSM_PRLIMIT_WRITE; } rcu_read_lock(); tsk = pid ? find_task_by_vpid(pid) : current; if (!tsk) { rcu_read_unlock(); return -ESRCH; } ret = check_prlimit_permission(tsk, checkflags); if (ret) { rcu_read_unlock(); return ret; } get_task_struct(tsk); rcu_read_unlock(); need_tasklist = !same_thread_group(tsk, current); if (need_tasklist) { /* * Ensure we can't race with group exit or de_thread(), * so tsk->group_leader can't be freed or changed until * read_unlock(tasklist_lock) below. */ read_lock(&tasklist_lock); if (!pid_alive(tsk)) ret = -ESRCH; } if (!ret) { ret = do_prlimit(tsk, resource, new_rlim ? &new : NULL, old_rlim ? &old : NULL); } if (need_tasklist) read_unlock(&tasklist_lock); if (!ret && old_rlim) { rlim_to_rlim64(&old, &old64); if (copy_to_user(old_rlim, &old64, sizeof(old64))) ret = -EFAULT; } put_task_struct(tsk); return ret; } SYSCALL_DEFINE2(setrlimit, unsigned int, resource, struct rlimit __user *, rlim) { struct rlimit new_rlim; if (copy_from_user(&new_rlim, rlim, sizeof(*rlim))) return -EFAULT; return do_prlimit(current, resource, &new_rlim, NULL); } /* * It would make sense to put struct rusage in the task_struct, * except that would make the task_struct be *really big*. After * task_struct gets moved into malloc'ed memory, it would * make sense to do this. It will make moving the rest of the information * a lot simpler! (Which we're not doing right now because we're not * measuring them yet). * * When sampling multiple threads for RUSAGE_SELF, under SMP we might have * races with threads incrementing their own counters. But since word * reads are atomic, we either get new values or old values and we don't * care which for the sums. We always take the siglock to protect reading * the c* fields from p->signal from races with exit.c updating those * fields when reaping, so a sample either gets all the additions of a * given child after it's reaped, or none so this sample is before reaping. * * Locking: * We need to take the siglock for CHILDEREN, SELF and BOTH * for the cases current multithreaded, non-current single threaded * non-current multithreaded. Thread traversal is now safe with * the siglock held. * Strictly speaking, we donot need to take the siglock if we are current and * single threaded, as no one else can take our signal_struct away, no one * else can reap the children to update signal->c* counters, and no one else * can race with the signal-> fields. If we do not take any lock, the * signal-> fields could be read out of order while another thread was just * exiting. So we should place a read memory barrier when we avoid the lock. * On the writer side, write memory barrier is implied in __exit_signal * as __exit_signal releases the siglock spinlock after updating the signal-> * fields. But we don't do this yet to keep things simple. * */ static void accumulate_thread_rusage(struct task_struct *t, struct rusage *r) { r->ru_nvcsw += t->nvcsw; r->ru_nivcsw += t->nivcsw; r->ru_minflt += t->min_flt; r->ru_majflt += t->maj_flt; r->ru_inblock += task_io_get_inblock(t); r->ru_oublock += task_io_get_oublock(t); } void getrusage(struct task_struct *p, int who, struct rusage *r) { struct task_struct *t; unsigned long flags; u64 tgutime, tgstime, utime, stime; unsigned long maxrss; struct mm_struct *mm; struct signal_struct *sig = p->signal; unsigned int seq = 0; retry: memset(r, 0, sizeof(*r)); utime = stime = 0; maxrss = 0; if (who == RUSAGE_THREAD) { task_cputime_adjusted(current, &utime, &stime); accumulate_thread_rusage(p, r); maxrss = sig->maxrss; goto out_thread; } flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq); switch (who) { case RUSAGE_BOTH: case RUSAGE_CHILDREN: utime = sig->cutime; stime = sig->cstime; r->ru_nvcsw = sig->cnvcsw; r->ru_nivcsw = sig->cnivcsw; r->ru_minflt = sig->cmin_flt; r->ru_majflt = sig->cmaj_flt; r->ru_inblock = sig->cinblock; r->ru_oublock = sig->coublock; maxrss = sig->cmaxrss; if (who == RUSAGE_CHILDREN) break; fallthrough; case RUSAGE_SELF: r->ru_nvcsw += sig->nvcsw; r->ru_nivcsw += sig->nivcsw; r->ru_minflt += sig->min_flt; r->ru_majflt += sig->maj_flt; r->ru_inblock += sig->inblock; r->ru_oublock += sig->oublock; if (maxrss < sig->maxrss) maxrss = sig->maxrss; rcu_read_lock(); __for_each_thread(sig, t) accumulate_thread_rusage(t, r); rcu_read_unlock(); break; default: BUG(); } if (need_seqretry(&sig->stats_lock, seq)) { seq = 1; goto retry; } done_seqretry_irqrestore(&sig->stats_lock, seq, flags); if (who == RUSAGE_CHILDREN) goto out_children; thread_group_cputime_adjusted(p, &tgutime, &tgstime); utime += tgutime; stime += tgstime; out_thread: mm = get_task_mm(p); if (mm) { setmax_mm_hiwater_rss(&maxrss, mm); mmput(mm); } out_children: r->ru_maxrss = maxrss * (PAGE_SIZE / 1024); /* convert pages to KBs */ r->ru_utime = ns_to_kernel_old_timeval(utime); r->ru_stime = ns_to_kernel_old_timeval(stime); } SYSCALL_DEFINE2(getrusage, int, who, struct rusage __user *, ru) { struct rusage r; if (who != RUSAGE_SELF && who != RUSAGE_CHILDREN && who != RUSAGE_THREAD) return -EINVAL; getrusage(current, who, &r); return copy_to_user(ru, &r, sizeof(r)) ? -EFAULT : 0; } #ifdef CONFIG_COMPAT COMPAT_SYSCALL_DEFINE2(getrusage, int, who, struct compat_rusage __user *, ru) { struct rusage r; if (who != RUSAGE_SELF && who != RUSAGE_CHILDREN && who != RUSAGE_THREAD) return -EINVAL; getrusage(current, who, &r); return put_compat_rusage(&r, ru); } #endif SYSCALL_DEFINE1(umask, int, mask) { mask = xchg(¤t->fs->umask, mask & S_IRWXUGO); return mask; } static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd) { CLASS(fd, exe)(fd); struct inode *inode; int err; if (fd_empty(exe)) return -EBADF; inode = file_inode(fd_file(exe)); /* * Because the original mm->exe_file points to executable file, make * sure that this one is executable as well, to avoid breaking an * overall picture. */ if (!S_ISREG(inode->i_mode) || path_noexec(&fd_file(exe)->f_path)) return -EACCES; err = file_permission(fd_file(exe), MAY_EXEC); if (err) return err; return replace_mm_exe_file(mm, fd_file(exe)); } /* * Check arithmetic relations of passed addresses. * * WARNING: we don't require any capability here so be very careful * in what is allowed for modification from userspace. */ static int validate_prctl_map_addr(struct prctl_mm_map *prctl_map) { unsigned long mmap_max_addr = TASK_SIZE; int error = -EINVAL, i; static const unsigned char offsets[] = { offsetof(struct prctl_mm_map, start_code), offsetof(struct prctl_mm_map, end_code), offsetof(struct prctl_mm_map, start_data), offsetof(struct prctl_mm_map, end_data), offsetof(struct prctl_mm_map, start_brk), offsetof(struct prctl_mm_map, brk), offsetof(struct prctl_mm_map, start_stack), offsetof(struct prctl_mm_map, arg_start), offsetof(struct prctl_mm_map, arg_end), offsetof(struct prctl_mm_map, env_start), offsetof(struct prctl_mm_map, env_end), }; /* * Make sure the members are not somewhere outside * of allowed address space. */ for (i = 0; i < ARRAY_SIZE(offsets); i++) { u64 val = *(u64 *)((char *)prctl_map + offsets[i]); if ((unsigned long)val >= mmap_max_addr || (unsigned long)val < mmap_min_addr) goto out; } /* * Make sure the pairs are ordered. */ #define __prctl_check_order(__m1, __op, __m2) \ ((unsigned long)prctl_map->__m1 __op \ (unsigned long)prctl_map->__m2) ? 0 : -EINVAL error = __prctl_check_order(start_code, <, end_code); error |= __prctl_check_order(start_data,<=, end_data); error |= __prctl_check_order(start_brk, <=, brk); error |= __prctl_check_order(arg_start, <=, arg_end); error |= __prctl_check_order(env_start, <=, env_end); if (error) goto out; #undef __prctl_check_order error = -EINVAL; /* * Neither we should allow to override limits if they set. */ if (check_data_rlimit(rlimit(RLIMIT_DATA), prctl_map->brk, prctl_map->start_brk, prctl_map->end_data, prctl_map->start_data)) goto out; error = 0; out: return error; } #ifdef CONFIG_CHECKPOINT_RESTORE static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data_size) { struct prctl_mm_map prctl_map = { .exe_fd = (u32)-1, }; unsigned long user_auxv[AT_VECTOR_SIZE]; struct mm_struct *mm = current->mm; int error; BUILD_BUG_ON(sizeof(user_auxv) != sizeof(mm->saved_auxv)); BUILD_BUG_ON(sizeof(struct prctl_mm_map) > 256); if (opt == PR_SET_MM_MAP_SIZE) return put_user((unsigned int)sizeof(prctl_map), (unsigned int __user *)addr); if (data_size != sizeof(prctl_map)) return -EINVAL; if (copy_from_user(&prctl_map, addr, sizeof(prctl_map))) return -EFAULT; error = validate_prctl_map_addr(&prctl_map); if (error) return error; if (prctl_map.auxv_size) { /* * Someone is trying to cheat the auxv vector. */ if (!prctl_map.auxv || prctl_map.auxv_size > sizeof(mm->saved_auxv)) return -EINVAL; memset(user_auxv, 0, sizeof(user_auxv)); if (copy_from_user(user_auxv, (const void __user *)prctl_map.auxv, prctl_map.auxv_size)) return -EFAULT; /* Last entry must be AT_NULL as specification requires */ user_auxv[AT_VECTOR_SIZE - 2] = AT_NULL; user_auxv[AT_VECTOR_SIZE - 1] = AT_NULL; } if (prctl_map.exe_fd != (u32)-1) { /* * Check if the current user is checkpoint/restore capable. * At the time of this writing, it checks for CAP_SYS_ADMIN * or CAP_CHECKPOINT_RESTORE. * Note that a user with access to ptrace can masquerade an * arbitrary program as any executable, even setuid ones. * This may have implications in the tomoyo subsystem. */ if (!checkpoint_restore_ns_capable(current_user_ns())) return -EPERM; error = prctl_set_mm_exe_file(mm, prctl_map.exe_fd); if (error) return error; } /* * arg_lock protects concurrent updates but we still need mmap_lock for * read to exclude races with sys_brk. */ mmap_read_lock(mm); /* * We don't validate if these members are pointing to * real present VMAs because application may have correspond * VMAs already unmapped and kernel uses these members for statistics * output in procfs mostly, except * * - @start_brk/@brk which are used in do_brk_flags but kernel lookups * for VMAs when updating these members so anything wrong written * here cause kernel to swear at userspace program but won't lead * to any problem in kernel itself */ spin_lock(&mm->arg_lock); mm->start_code = prctl_map.start_code; mm->end_code = prctl_map.end_code; mm->start_data = prctl_map.start_data; mm->end_data = prctl_map.end_data; mm->start_brk = prctl_map.start_brk; mm->brk = prctl_map.brk; mm->start_stack = prctl_map.start_stack; mm->arg_start = prctl_map.arg_start; mm->arg_end = prctl_map.arg_end; mm->env_start = prctl_map.env_start; mm->env_end = prctl_map.env_end; spin_unlock(&mm->arg_lock); /* * Note this update of @saved_auxv is lockless thus * if someone reads this member in procfs while we're * updating -- it may get partly updated results. It's * known and acceptable trade off: we leave it as is to * not introduce additional locks here making the kernel * more complex. */ if (prctl_map.auxv_size) memcpy(mm->saved_auxv, user_auxv, sizeof(user_auxv)); mmap_read_unlock(mm); return 0; } #endif /* CONFIG_CHECKPOINT_RESTORE */ static int prctl_set_auxv(struct mm_struct *mm, unsigned long addr, unsigned long len) { /* * This doesn't move the auxiliary vector itself since it's pinned to * mm_struct, but it permits filling the vector with new values. It's * up to the caller to provide sane values here, otherwise userspace * tools which use this vector might be unhappy. */ unsigned long user_auxv[AT_VECTOR_SIZE] = {}; if (len > sizeof(user_auxv)) return -EINVAL; if (copy_from_user(user_auxv, (const void __user *)addr, len)) return -EFAULT; /* Make sure the last entry is always AT_NULL */ user_auxv[AT_VECTOR_SIZE - 2] = 0; user_auxv[AT_VECTOR_SIZE - 1] = 0; BUILD_BUG_ON(sizeof(user_auxv) != sizeof(mm->saved_auxv)); task_lock(current); memcpy(mm->saved_auxv, user_auxv, len); task_unlock(current); return 0; } static int prctl_set_mm(int opt, unsigned long addr, unsigned long arg4, unsigned long arg5) { struct mm_struct *mm = current->mm; struct prctl_mm_map prctl_map = { .auxv = NULL, .auxv_size = 0, .exe_fd = -1, }; struct vm_area_struct *vma; int error; if (arg5 || (arg4 && (opt != PR_SET_MM_AUXV && opt != PR_SET_MM_MAP && opt != PR_SET_MM_MAP_SIZE))) return -EINVAL; #ifdef CONFIG_CHECKPOINT_RESTORE if (opt == PR_SET_MM_MAP || opt == PR_SET_MM_MAP_SIZE) return prctl_set_mm_map(opt, (const void __user *)addr, arg4); #endif if (!capable(CAP_SYS_RESOURCE)) return -EPERM; if (opt == PR_SET_MM_EXE_FILE) return prctl_set_mm_exe_file(mm, (unsigned int)addr); if (opt == PR_SET_MM_AUXV) return prctl_set_auxv(mm, addr, arg4); if (addr >= TASK_SIZE || addr < mmap_min_addr) return -EINVAL; error = -EINVAL; /* * arg_lock protects concurrent updates of arg boundaries, we need * mmap_lock for a) concurrent sys_brk, b) finding VMA for addr * validation. */ mmap_read_lock(mm); vma = find_vma(mm, addr); spin_lock(&mm->arg_lock); prctl_map.start_code = mm->start_code; prctl_map.end_code = mm->end_code; prctl_map.start_data = mm->start_data; prctl_map.end_data = mm->end_data; prctl_map.start_brk = mm->start_brk; prctl_map.brk = mm->brk; prctl_map.start_stack = mm->start_stack; prctl_map.arg_start = mm->arg_start; prctl_map.arg_end = mm->arg_end; prctl_map.env_start = mm->env_start; prctl_map.env_end = mm->env_end; switch (opt) { case PR_SET_MM_START_CODE: prctl_map.start_code = addr; break; case PR_SET_MM_END_CODE: prctl_map.end_code = addr; break; case PR_SET_MM_START_DATA: prctl_map.start_data = addr; break; case PR_SET_MM_END_DATA: prctl_map.end_data = addr; break; case PR_SET_MM_START_STACK: prctl_map.start_stack = addr; break; case PR_SET_MM_START_BRK: prctl_map.start_brk = addr; break; case PR_SET_MM_BRK: prctl_map.brk = addr; break; case PR_SET_MM_ARG_START: prctl_map.arg_start = addr; break; case PR_SET_MM_ARG_END: prctl_map.arg_end = addr; break; case PR_SET_MM_ENV_START: prctl_map.env_start = addr; break; case PR_SET_MM_ENV_END: prctl_map.env_end = addr; break; default: goto out; } error = validate_prctl_map_addr(&prctl_map); if (error) goto out; switch (opt) { /* * If command line arguments and environment * are placed somewhere else on stack, we can * set them up here, ARG_START/END to setup * command line arguments and ENV_START/END * for environment. */ case PR_SET_MM_START_STACK: case PR_SET_MM_ARG_START: case PR_SET_MM_ARG_END: case PR_SET_MM_ENV_START: case PR_SET_MM_ENV_END: if (!vma) { error = -EFAULT; goto out; } } mm->start_code = prctl_map.start_code; mm->end_code = prctl_map.end_code; mm->start_data = prctl_map.start_data; mm->end_data = prctl_map.end_data; mm->start_brk = prctl_map.start_brk; mm->brk = prctl_map.brk; mm->start_stack = prctl_map.start_stack; mm->arg_start = prctl_map.arg_start; mm->arg_end = prctl_map.arg_end; mm->env_start = prctl_map.env_start; mm->env_end = prctl_map.env_end; error = 0; out: spin_unlock(&mm->arg_lock); mmap_read_unlock(mm); return error; } #ifdef CONFIG_CHECKPOINT_RESTORE static int prctl_get_tid_address(struct task_struct *me, int __user * __user *tid_addr) { return put_user(me->clear_child_tid, tid_addr); } #else static int prctl_get_tid_address(struct task_struct *me, int __user * __user *tid_addr) { return -EINVAL; } #endif static int propagate_has_child_subreaper(struct task_struct *p, void *data) { /* * If task has has_child_subreaper - all its descendants * already have these flag too and new descendants will * inherit it on fork, skip them. * * If we've found child_reaper - skip descendants in * it's subtree as they will never get out pidns. */ if (p->signal->has_child_subreaper || is_child_reaper(task_pid(p))) return 0; p->signal->has_child_subreaper = 1; return 1; } int __weak arch_prctl_spec_ctrl_get(struct task_struct *t, unsigned long which) { return -EINVAL; } int __weak arch_prctl_spec_ctrl_set(struct task_struct *t, unsigned long which, unsigned long ctrl) { return -EINVAL; } int __weak arch_get_shadow_stack_status(struct task_struct *t, unsigned long __user *status) { return -EINVAL; } int __weak arch_set_shadow_stack_status(struct task_struct *t, unsigned long status) { return -EINVAL; } int __weak arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status) { return -EINVAL; } #define PR_IO_FLUSHER (PF_MEMALLOC_NOIO | PF_LOCAL_THROTTLE) static int prctl_set_vma(unsigned long opt, unsigned long addr, unsigned long size, unsigned long arg) { int error; switch (opt) { case PR_SET_VMA_ANON_NAME: error = set_anon_vma_name(addr, size, (const char __user *)arg); break; default: error = -EINVAL; } return error; } static inline unsigned long get_current_mdwe(void) { unsigned long ret = 0; if (mm_flags_test(MMF_HAS_MDWE, current->mm)) ret |= PR_MDWE_REFUSE_EXEC_GAIN; if (mm_flags_test(MMF_HAS_MDWE_NO_INHERIT, current->mm)) ret |= PR_MDWE_NO_INHERIT; return ret; } static inline int prctl_set_mdwe(unsigned long bits, unsigned long arg3, unsigned long arg4, unsigned long arg5) { unsigned long current_bits; if (arg3 || arg4 || arg5) return -EINVAL; if (bits & ~(PR_MDWE_REFUSE_EXEC_GAIN | PR_MDWE_NO_INHERIT)) return -EINVAL; /* NO_INHERIT only makes sense with REFUSE_EXEC_GAIN */ if (bits & PR_MDWE_NO_INHERIT && !(bits & PR_MDWE_REFUSE_EXEC_GAIN)) return -EINVAL; /* * EOPNOTSUPP might be more appropriate here in principle, but * existing userspace depends on EINVAL specifically. */ if (!arch_memory_deny_write_exec_supported()) return -EINVAL; current_bits = get_current_mdwe(); if (current_bits && current_bits != bits) return -EPERM; /* Cannot unset the flags */ if (bits & PR_MDWE_NO_INHERIT) mm_flags_set(MMF_HAS_MDWE_NO_INHERIT, current->mm); if (bits & PR_MDWE_REFUSE_EXEC_GAIN) mm_flags_set(MMF_HAS_MDWE, current->mm); return 0; } static inline int prctl_get_mdwe(unsigned long arg2, unsigned long arg3, unsigned long arg4, unsigned long arg5) { if (arg2 || arg3 || arg4 || arg5) return -EINVAL; return get_current_mdwe(); } static int prctl_get_auxv(void __user *addr, unsigned long len) { struct mm_struct *mm = current->mm; unsigned long size = min_t(unsigned long, sizeof(mm->saved_auxv), len); if (size && copy_to_user(addr, mm->saved_auxv, size)) return -EFAULT; return sizeof(mm->saved_auxv); } static int prctl_get_thp_disable(unsigned long arg2, unsigned long arg3, unsigned long arg4, unsigned long arg5) { struct mm_struct *mm = current->mm; if (arg2 || arg3 || arg4 || arg5) return -EINVAL; /* If disabled, we return "1 | flags", otherwise 0. */ if (mm_flags_test(MMF_DISABLE_THP_COMPLETELY, mm)) return 1; else if (mm_flags_test(MMF_DISABLE_THP_EXCEPT_ADVISED, mm)) return 1 | PR_THP_DISABLE_EXCEPT_ADVISED; return 0; } static int prctl_set_thp_disable(bool thp_disable, unsigned long flags, unsigned long arg4, unsigned long arg5) { struct mm_struct *mm = current->mm; if (arg4 || arg5) return -EINVAL; /* Flags are only allowed when disabling. */ if ((!thp_disable && flags) || (flags & ~PR_THP_DISABLE_EXCEPT_ADVISED)) return -EINVAL; if (mmap_write_lock_killable(current->mm)) return -EINTR; if (thp_disable) { if (flags & PR_THP_DISABLE_EXCEPT_ADVISED) { mm_flags_clear(MMF_DISABLE_THP_COMPLETELY, mm); mm_flags_set(MMF_DISABLE_THP_EXCEPT_ADVISED, mm); } else { mm_flags_set(MMF_DISABLE_THP_COMPLETELY, mm); mm_flags_clear(MMF_DISABLE_THP_EXCEPT_ADVISED, mm); } } else { mm_flags_clear(MMF_DISABLE_THP_COMPLETELY, mm); mm_flags_clear(MMF_DISABLE_THP_EXCEPT_ADVISED, mm); } mmap_write_unlock(current->mm); return 0; } SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, unsigned long, arg4, unsigned long, arg5) { struct task_struct *me = current; unsigned char comm[sizeof(me->comm)]; long error; error = security_task_prctl(option, arg2, arg3, arg4, arg5); if (error != -ENOSYS) return error; error = 0; switch (option) { case PR_SET_PDEATHSIG: if (!valid_signal(arg2)) { error = -EINVAL; break; } /* * Ensure that either: * * 1. Subsequent getppid() calls reflect the parent process having died. * 2. forget_original_parent() will send the new me->pdeath_signal. * * Also prevent the read of me->pdeath_signal from being a data race. */ read_lock(&tasklist_lock); me->pdeath_signal = arg2; read_unlock(&tasklist_lock); break; case PR_GET_PDEATHSIG: error = put_user(me->pdeath_signal, (int __user *)arg2); break; case PR_GET_DUMPABLE: error = get_dumpable(me->mm); break; case PR_SET_DUMPABLE: if (arg2 != SUID_DUMP_DISABLE && arg2 != SUID_DUMP_USER) { error = -EINVAL; break; } set_dumpable(me->mm, arg2); break; case PR_SET_UNALIGN: error = SET_UNALIGN_CTL(me, arg2); break; case PR_GET_UNALIGN: error = GET_UNALIGN_CTL(me, arg2); break; case PR_SET_FPEMU: error = SET_FPEMU_CTL(me, arg2); break; case PR_GET_FPEMU: error = GET_FPEMU_CTL(me, arg2); break; case PR_SET_FPEXC: error = SET_FPEXC_CTL(me, arg2); break; case PR_GET_FPEXC: error = GET_FPEXC_CTL(me, arg2); break; case PR_GET_TIMING: error = PR_TIMING_STATISTICAL; break; case PR_SET_TIMING: if (arg2 != PR_TIMING_STATISTICAL) error = -EINVAL; break; case PR_SET_NAME: comm[sizeof(me->comm) - 1] = 0; if (strncpy_from_user(comm, (char __user *)arg2, sizeof(me->comm) - 1) < 0) return -EFAULT; set_task_comm(me, comm); proc_comm_connector(me); break; case PR_GET_NAME: get_task_comm(comm, me); if (copy_to_user((char __user *)arg2, comm, sizeof(comm))) return -EFAULT; break; case PR_GET_ENDIAN: error = GET_ENDIAN(me, arg2); break; case PR_SET_ENDIAN: error = SET_ENDIAN(me, arg2); break; case PR_GET_SECCOMP: error = prctl_get_seccomp(); break; case PR_SET_SECCOMP: error = prctl_set_seccomp(arg2, (char __user *)arg3); break; case PR_GET_TSC: error = GET_TSC_CTL(arg2); break; case PR_SET_TSC: error = SET_TSC_CTL(arg2); break; case PR_TASK_PERF_EVENTS_DISABLE: error = perf_event_task_disable(); break; case PR_TASK_PERF_EVENTS_ENABLE: error = perf_event_task_enable(); break; case PR_GET_TIMERSLACK: if (current->timer_slack_ns > ULONG_MAX) error = ULONG_MAX; else error = current->timer_slack_ns; break; case PR_SET_TIMERSLACK: if (rt_or_dl_task_policy(current)) break; if (arg2 <= 0) current->timer_slack_ns = current->default_timer_slack_ns; else current->timer_slack_ns = arg2; break; case PR_MCE_KILL: if (arg4 | arg5) return -EINVAL; switch (arg2) { case PR_MCE_KILL_CLEAR: if (arg3 != 0) return -EINVAL; current->flags &= ~PF_MCE_PROCESS; break; case PR_MCE_KILL_SET: current->flags |= PF_MCE_PROCESS; if (arg3 == PR_MCE_KILL_EARLY) current->flags |= PF_MCE_EARLY; else if (arg3 == PR_MCE_KILL_LATE) current->flags &= ~PF_MCE_EARLY; else if (arg3 == PR_MCE_KILL_DEFAULT) current->flags &= ~(PF_MCE_EARLY|PF_MCE_PROCESS); else return -EINVAL; break; default: return -EINVAL; } break; case PR_MCE_KILL_GET: if (arg2 | arg3 | arg4 | arg5) return -EINVAL; if (current->flags & PF_MCE_PROCESS) error = (current->flags & PF_MCE_EARLY) ? PR_MCE_KILL_EARLY : PR_MCE_KILL_LATE; else error = PR_MCE_KILL_DEFAULT; break; case PR_SET_MM: error = prctl_set_mm(arg2, arg3, arg4, arg5); break; case PR_GET_TID_ADDRESS: error = prctl_get_tid_address(me, (int __user * __user *)arg2); break; case PR_SET_CHILD_SUBREAPER: me->signal->is_child_subreaper = !!arg2; if (!arg2) break; walk_process_tree(me, propagate_has_child_subreaper, NULL); break; case PR_GET_CHILD_SUBREAPER: error = put_user(me->signal->is_child_subreaper, (int __user *)arg2); break; case PR_SET_NO_NEW_PRIVS: if (arg2 != 1 || arg3 || arg4 || arg5) return -EINVAL; task_set_no_new_privs(current); break; case PR_GET_NO_NEW_PRIVS: if (arg2 || arg3 || arg4 || arg5) return -EINVAL; return task_no_new_privs(current) ? 1 : 0; case PR_GET_THP_DISABLE: error = prctl_get_thp_disable(arg2, arg3, arg4, arg5); break; case PR_SET_THP_DISABLE: error = prctl_set_thp_disable(arg2, arg3, arg4, arg5); break; case PR_MPX_ENABLE_MANAGEMENT: case PR_MPX_DISABLE_MANAGEMENT: /* No longer implemented: */ return -EINVAL; case PR_SET_FP_MODE: error = SET_FP_MODE(me, arg2); break; case PR_GET_FP_MODE: error = GET_FP_MODE(me); break; case PR_SVE_SET_VL: error = SVE_SET_VL(arg2); break; case PR_SVE_GET_VL: error = SVE_GET_VL(); break; case PR_SME_SET_VL: error = SME_SET_VL(arg2); break; case PR_SME_GET_VL: error = SME_GET_VL(); break; case PR_GET_SPECULATION_CTRL: if (arg3 || arg4 || arg5) return -EINVAL; error = arch_prctl_spec_ctrl_get(me, arg2); break; case PR_SET_SPECULATION_CTRL: if (arg4 || arg5) return -EINVAL; error = arch_prctl_spec_ctrl_set(me, arg2, arg3); break; case PR_PAC_RESET_KEYS: if (arg3 || arg4 || arg5) return -EINVAL; error = PAC_RESET_KEYS(me, arg2); break; case PR_PAC_SET_ENABLED_KEYS: if (arg4 || arg5) return -EINVAL; error = PAC_SET_ENABLED_KEYS(me, arg2, arg3); break; case PR_PAC_GET_ENABLED_KEYS: if (arg2 || arg3 || arg4 || arg5) return -EINVAL; error = PAC_GET_ENABLED_KEYS(me); break; case PR_SET_TAGGED_ADDR_CTRL: if (arg3 || arg4 || arg5) return -EINVAL; error = SET_TAGGED_ADDR_CTRL(arg2); break; case PR_GET_TAGGED_ADDR_CTRL: if (arg2 || arg3 || arg4 || arg5) return -EINVAL; error = GET_TAGGED_ADDR_CTRL(); break; case PR_SET_IO_FLUSHER: if (!capable(CAP_SYS_RESOURCE)) return -EPERM; if (arg3 || arg4 || arg5) return -EINVAL; if (arg2 == 1) current->flags |= PR_IO_FLUSHER; else if (!arg2) current->flags &= ~PR_IO_FLUSHER; else return -EINVAL; break; case PR_GET_IO_FLUSHER: if (!capable(CAP_SYS_RESOURCE)) return -EPERM; if (arg2 || arg3 || arg4 || arg5) return -EINVAL; error = (current->flags & PR_IO_FLUSHER) == PR_IO_FLUSHER; break; case PR_SET_SYSCALL_USER_DISPATCH: error = set_syscall_user_dispatch(arg2, arg3, arg4, (char __user *) arg5); break; #ifdef CONFIG_SCHED_CORE case PR_SCHED_CORE: error = sched_core_share_pid(arg2, arg3, arg4, arg5); break; #endif case PR_SET_MDWE: error = prctl_set_mdwe(arg2, arg3, arg4, arg5); break; case PR_GET_MDWE: error = prctl_get_mdwe(arg2, arg3, arg4, arg5); break; case PR_PPC_GET_DEXCR: if (arg3 || arg4 || arg5) return -EINVAL; error = PPC_GET_DEXCR_ASPECT(me, arg2); break; case PR_PPC_SET_DEXCR: if (arg4 || arg5) return -EINVAL; error = PPC_SET_DEXCR_ASPECT(me, arg2, arg3); break; case PR_SET_VMA: error = prctl_set_vma(arg2, arg3, arg4, arg5); break; case PR_GET_AUXV: if (arg4 || arg5) return -EINVAL; error = prctl_get_auxv((void __user *)arg2, arg3); break; #ifdef CONFIG_KSM case PR_SET_MEMORY_MERGE: if (arg3 || arg4 || arg5) return -EINVAL; if (mmap_write_lock_killable(me->mm)) return -EINTR; if (arg2) error = ksm_enable_merge_any(me->mm); else error = ksm_disable_merge_any(me->mm); mmap_write_unlock(me->mm); break; case PR_GET_MEMORY_MERGE: if (arg2 || arg3 || arg4 || arg5) return -EINVAL; error = !!mm_flags_test(MMF_VM_MERGE_ANY, me->mm); break; #endif case PR_RISCV_V_SET_CONTROL: error = RISCV_V_SET_CONTROL(arg2); break; case PR_RISCV_V_GET_CONTROL: error = RISCV_V_GET_CONTROL(); break; case PR_RISCV_SET_ICACHE_FLUSH_CTX: error = RISCV_SET_ICACHE_FLUSH_CTX(arg2, arg3); break; case PR_GET_SHADOW_STACK_STATUS: if (arg3 || arg4 || arg5) return -EINVAL; error = arch_get_shadow_stack_status(me, (unsigned long __user *) arg2); break; case PR_SET_SHADOW_STACK_STATUS: if (arg3 || arg4 || arg5) return -EINVAL; error = arch_set_shadow_stack_status(me, arg2); break; case PR_LOCK_SHADOW_STACK_STATUS: if (arg3 || arg4 || arg5) return -EINVAL; error = arch_lock_shadow_stack_status(me, arg2); break; case PR_TIMER_CREATE_RESTORE_IDS: if (arg3 || arg4 || arg5) return -EINVAL; error = posixtimer_create_prctl(arg2); break; case PR_FUTEX_HASH: error = futex_hash_prctl(arg2, arg3, arg4); break; default: trace_task_prctl_unknown(option, arg2, arg3, arg4, arg5); error = -EINVAL; break; } return error; } SYSCALL_DEFINE3(getcpu, unsigned __user *, cpup, unsigned __user *, nodep, struct getcpu_cache __user *, unused) { int err = 0; int cpu = raw_smp_processor_id(); if (cpup) err |= put_user(cpu, cpup); if (nodep) err |= put_user(cpu_to_node(cpu), nodep); return err ? -EFAULT : 0; } /** * do_sysinfo - fill in sysinfo struct * @info: pointer to buffer to fill */ static int do_sysinfo(struct sysinfo *info) { unsigned long mem_total, sav_total; unsigned int mem_unit, bitcount; struct timespec64 tp; memset(info, 0, sizeof(struct sysinfo)); ktime_get_boottime_ts64(&tp); timens_add_boottime(&tp); info->uptime = tp.tv_sec + (tp.tv_nsec ? 1 : 0); get_avenrun(info->loads, 0, SI_LOAD_SHIFT - FSHIFT); info->procs = nr_threads; si_meminfo(info); si_swapinfo(info); /* * If the sum of all the available memory (i.e. ram + swap) * is less than can be stored in a 32 bit unsigned long then * we can be binary compatible with 2.2.x kernels. If not, * well, in that case 2.2.x was broken anyways... * * -Erik Andersen <andersee@debian.org> */ mem_total = info->totalram + info->totalswap; if (mem_total < info->totalram || mem_total < info->totalswap) goto out; bitcount = 0; mem_unit = info->mem_unit; while (mem_unit > 1) { bitcount++; mem_unit >>= 1; sav_total = mem_total; mem_total <<= 1; if (mem_total < sav_total) goto out; } /* * If mem_total did not overflow, multiply all memory values by * info->mem_unit and set it to 1. This leaves things compatible * with 2.2.x, and also retains compatibility with earlier 2.4.x * kernels... */ info->mem_unit = 1; info->totalram <<= bitcount; info->freeram <<= bitcount; info->sharedram <<= bitcount; info->bufferram <<= bitcount; info->totalswap <<= bitcount; info->freeswap <<= bitcount; info->totalhigh <<= bitcount; info->freehigh <<= bitcount; out: return 0; } SYSCALL_DEFINE1(sysinfo, struct sysinfo __user *, info) { struct sysinfo val; do_sysinfo(&val); if (copy_to_user(info, &val, sizeof(struct sysinfo))) return -EFAULT; return 0; } #ifdef CONFIG_COMPAT struct compat_sysinfo { s32 uptime; u32 loads[3]; u32 totalram; u32 freeram; u32 sharedram; u32 bufferram; u32 totalswap; u32 freeswap; u16 procs; u16 pad; u32 totalhigh; u32 freehigh; u32 mem_unit; char _f[20-2*sizeof(u32)-sizeof(int)]; }; COMPAT_SYSCALL_DEFINE1(sysinfo, struct compat_sysinfo __user *, info) { struct sysinfo s; struct compat_sysinfo s_32; do_sysinfo(&s); /* Check to see if any memory value is too large for 32-bit and scale * down if needed */ if (upper_32_bits(s.totalram) || upper_32_bits(s.totalswap)) { int bitcount = 0; while (s.mem_unit < PAGE_SIZE) { s.mem_unit <<= 1; bitcount++; } s.totalram >>= bitcount; s.freeram >>= bitcount; s.sharedram >>= bitcount; s.bufferram >>= bitcount; s.totalswap >>= bitcount; s.freeswap >>= bitcount; s.totalhigh >>= bitcount; s.freehigh >>= bitcount; } memset(&s_32, 0, sizeof(s_32)); s_32.uptime = s.uptime; s_32.loads[0] = s.loads[0]; s_32.loads[1] = s.loads[1]; s_32.loads[2] = s.loads[2]; s_32.totalram = s.totalram; s_32.freeram = s.freeram; s_32.sharedram = s.sharedram; s_32.bufferram = s.bufferram; s_32.totalswap = s.totalswap; s_32.freeswap = s.freeswap; s_32.procs = s.procs; s_32.totalhigh = s.totalhigh; s_32.freehigh = s.freehigh; s_32.mem_unit = s.mem_unit; if (copy_to_user(info, &s_32, sizeof(s_32))) return -EFAULT; return 0; } #endif /* CONFIG_COMPAT */ |
| 5 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 | // SPDX-License-Identifier: GPL-2.0 /* Copyright (C) B.A.T.M.A.N. contributors: * * Marek Lindner */ #include "gateway_common.h" #include "main.h" #include <linux/atomic.h> #include <linux/byteorder/generic.h> #include <linux/stddef.h> #include <linux/types.h> #include <uapi/linux/batadv_packet.h> #include <uapi/linux/batman_adv.h> #include "gateway_client.h" #include "tvlv.h" /** * batadv_gw_tvlv_container_update() - update the gw tvlv container after * gateway setting change * @bat_priv: the bat priv with all the mesh interface information */ void batadv_gw_tvlv_container_update(struct batadv_priv *bat_priv) { struct batadv_tvlv_gateway_data gw; u32 down, up; char gw_mode; gw_mode = atomic_read(&bat_priv->gw.mode); switch (gw_mode) { case BATADV_GW_MODE_OFF: case BATADV_GW_MODE_CLIENT: batadv_tvlv_container_unregister(bat_priv, BATADV_TVLV_GW, 1); break; case BATADV_GW_MODE_SERVER: down = atomic_read(&bat_priv->gw.bandwidth_down); up = atomic_read(&bat_priv->gw.bandwidth_up); gw.bandwidth_down = htonl(down); gw.bandwidth_up = htonl(up); batadv_tvlv_container_register(bat_priv, BATADV_TVLV_GW, 1, &gw, sizeof(gw)); break; } } /** * batadv_gw_tvlv_ogm_handler_v1() - process incoming gateway tvlv container * @bat_priv: the bat priv with all the mesh interface information * @orig: the orig_node of the ogm * @flags: flags indicating the tvlv state (see batadv_tvlv_handler_flags) * @tvlv_value: tvlv buffer containing the gateway data * @tvlv_value_len: tvlv buffer length */ static void batadv_gw_tvlv_ogm_handler_v1(struct batadv_priv *bat_priv, struct batadv_orig_node *orig, u8 flags, void *tvlv_value, u16 tvlv_value_len) { struct batadv_tvlv_gateway_data gateway, *gateway_ptr; /* only fetch the tvlv value if the handler wasn't called via the * CIFNOTFND flag and if there is data to fetch */ if (flags & BATADV_TVLV_HANDLER_OGM_CIFNOTFND || tvlv_value_len < sizeof(gateway)) { gateway.bandwidth_down = 0; gateway.bandwidth_up = 0; } else { gateway_ptr = tvlv_value; gateway.bandwidth_down = gateway_ptr->bandwidth_down; gateway.bandwidth_up = gateway_ptr->bandwidth_up; if (gateway.bandwidth_down == 0 || gateway.bandwidth_up == 0) { gateway.bandwidth_down = 0; gateway.bandwidth_up = 0; } } batadv_gw_node_update(bat_priv, orig, &gateway); /* restart gateway selection */ if (gateway.bandwidth_down != 0 && atomic_read(&bat_priv->gw.mode) == BATADV_GW_MODE_CLIENT) batadv_gw_check_election(bat_priv, orig); } /** * batadv_gw_init() - initialise the gateway handling internals * @bat_priv: the bat priv with all the mesh interface information */ void batadv_gw_init(struct batadv_priv *bat_priv) { if (bat_priv->algo_ops->gw.init_sel_class) bat_priv->algo_ops->gw.init_sel_class(bat_priv); else atomic_set(&bat_priv->gw.sel_class, 1); batadv_tvlv_handler_register(bat_priv, batadv_gw_tvlv_ogm_handler_v1, NULL, NULL, BATADV_TVLV_GW, 1, BATADV_TVLV_HANDLER_OGM_CIFNOTFND); } /** * batadv_gw_free() - free the gateway handling internals * @bat_priv: the bat priv with all the mesh interface information */ void batadv_gw_free(struct batadv_priv *bat_priv) { batadv_tvlv_container_unregister(bat_priv, BATADV_TVLV_GW, 1); batadv_tvlv_handler_unregister(bat_priv, BATADV_TVLV_GW, 1); } |
| 19 19 16 19 19 19 19 16 19 1 19 19 19 19 19 1 19 19 19 19 19 19 19 19 19 19 19 19 19 19 1 18 18 16 16 19 19 19 3 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 | /* * Non-physical true random number generator based on timing jitter -- * Jitter RNG standalone code. * * Copyright Stephan Mueller <smueller@chronox.de>, 2015 - 2023 * * Design * ====== * * See https://www.chronox.de/jent.html * * License * ======= * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, and the entire permission notice in its entirety, * including the disclaimer of warranties. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. The name of the author may not be used to endorse or promote * products derived from this software without specific prior * written permission. * * ALTERNATIVELY, this product may be distributed under the terms of * the GNU General Public License, in which case the provisions of the GPL2 are * required INSTEAD OF the above restrictions. (This clause is * necessary due to a potential bad interaction between the GPL and * the restrictions contained in a BSD-style copyright.) * * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, ALL OF * WHICH ARE HEREBY DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT * OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE * USE OF THIS SOFTWARE, EVEN IF NOT ADVISED OF THE POSSIBILITY OF SUCH * DAMAGE. */ /* * This Jitterentropy RNG is based on the jitterentropy library * version 3.4.0 provided at https://www.chronox.de/jent.html */ #ifdef __OPTIMIZE__ #error "The CPU Jitter random number generator must not be compiled with optimizations. See documentation. Use the compiler switch -O0 for compiling jitterentropy.c." #endif typedef unsigned long long __u64; typedef long long __s64; typedef unsigned int __u32; typedef unsigned char u8; #define NULL ((void *) 0) /* The entropy pool */ struct rand_data { /* SHA3-256 is used as conditioner */ #define DATA_SIZE_BITS 256 /* all data values that are vital to maintain the security * of the RNG are marked as SENSITIVE. A user must not * access that information while the RNG executes its loops to * calculate the next random value. */ void *hash_state; /* SENSITIVE hash state entropy pool */ __u64 prev_time; /* SENSITIVE Previous time stamp */ __u64 last_delta; /* SENSITIVE stuck test */ __s64 last_delta2; /* SENSITIVE stuck test */ unsigned int flags; /* Flags used to initialize */ unsigned int osr; /* Oversample rate */ #define JENT_MEMORY_ACCESSLOOPS 128 #define JENT_MEMORY_SIZE \ (CONFIG_CRYPTO_JITTERENTROPY_MEMORY_BLOCKS * \ CONFIG_CRYPTO_JITTERENTROPY_MEMORY_BLOCKSIZE) unsigned char *mem; /* Memory access location with size of * memblocks * memblocksize */ unsigned int memlocation; /* Pointer to byte in *mem */ unsigned int memblocks; /* Number of memory blocks in *mem */ unsigned int memblocksize; /* Size of one memory block in bytes */ unsigned int memaccessloops; /* Number of memory accesses per random * bit generation */ /* Repetition Count Test */ unsigned int rct_count; /* Number of stuck values */ /* Adaptive Proportion Test cutoff values */ unsigned int apt_cutoff; /* Intermittent health test failure */ unsigned int apt_cutoff_permanent; /* Permanent health test failure */ #define JENT_APT_WINDOW_SIZE 512 /* Data window size */ /* LSB of time stamp to process */ #define JENT_APT_LSB 16 #define JENT_APT_WORD_MASK (JENT_APT_LSB - 1) unsigned int apt_observations; /* Number of collected observations */ unsigned int apt_count; /* APT counter */ unsigned int apt_base; /* APT base reference */ unsigned int health_failure; /* Record health failure */ unsigned int apt_base_set:1; /* APT base reference set? */ }; /* Flags that can be used to initialize the RNG */ #define JENT_DISABLE_MEMORY_ACCESS (1<<2) /* Disable memory access for more * entropy, saves MEMORY_SIZE RAM for * entropy collector */ /* -- error codes for init function -- */ #define JENT_ENOTIME 1 /* Timer service not available */ #define JENT_ECOARSETIME 2 /* Timer too coarse for RNG */ #define JENT_ENOMONOTONIC 3 /* Timer is not monotonic increasing */ #define JENT_EVARVAR 5 /* Timer does not produce variations of * variations (2nd derivation of time is * zero). */ #define JENT_ESTUCK 8 /* Too many stuck results during init. */ #define JENT_EHEALTH 9 /* Health test failed during initialization */ #define JENT_ERCT 10 /* RCT failed during initialization */ #define JENT_EHASH 11 /* Hash self test failed */ #define JENT_EMEM 12 /* Can't allocate memory for initialization */ #define JENT_RCT_FAILURE 1 /* Failure in RCT health test. */ #define JENT_APT_FAILURE 2 /* Failure in APT health test. */ #define JENT_PERMANENT_FAILURE_SHIFT 16 #define JENT_PERMANENT_FAILURE(x) (x << JENT_PERMANENT_FAILURE_SHIFT) #define JENT_RCT_FAILURE_PERMANENT JENT_PERMANENT_FAILURE(JENT_RCT_FAILURE) #define JENT_APT_FAILURE_PERMANENT JENT_PERMANENT_FAILURE(JENT_APT_FAILURE) /* * The output n bits can receive more than n bits of min entropy, of course, * but the fixed output of the conditioning function can only asymptotically * approach the output size bits of min entropy, not attain that bound. Random * maps will tend to have output collisions, which reduces the creditable * output entropy (that is what SP 800-90B Section 3.1.5.1.2 attempts to bound). * * The value "64" is justified in Appendix A.4 of the current 90C draft, * and aligns with NIST's in "epsilon" definition in this document, which is * that a string can be considered "full entropy" if you can bound the min * entropy in each bit of output to at least 1-epsilon, where epsilon is * required to be <= 2^(-32). */ #define JENT_ENTROPY_SAFETY_FACTOR 64 #include <linux/array_size.h> #include <linux/fips.h> #include <linux/minmax.h> #include "jitterentropy.h" /*************************************************************************** * Adaptive Proportion Test * * This test complies with SP800-90B section 4.4.2. ***************************************************************************/ /* * See the SP 800-90B comment #10b for the corrected cutoff for the SP 800-90B * APT. * https://www.untruth.org/~josh/sp80090b/UL%20SP800-90B-final%20comments%20v1.9%2020191212.pdf * In the syntax of R, this is C = 2 + qbinom(1 − 2^(−30), 511, 2^(-1/osr)). * (The original formula wasn't correct because the first symbol must * necessarily have been observed, so there is no chance of observing 0 of these * symbols.) * * For the alpha < 2^-53, R cannot be used as it uses a float data type without * arbitrary precision. A SageMath script is used to calculate those cutoff * values. * * For any value above 14, this yields the maximal allowable value of 512 * (by FIPS 140-2 IG 7.19 Resolution # 16, we cannot choose a cutoff value that * renders the test unable to fail). */ static const unsigned int jent_apt_cutoff_lookup[15] = { 325, 422, 459, 477, 488, 494, 499, 502, 505, 507, 508, 509, 510, 511, 512 }; static const unsigned int jent_apt_cutoff_permanent_lookup[15] = { 355, 447, 479, 494, 502, 507, 510, 512, 512, 512, 512, 512, 512, 512, 512 }; static void jent_apt_init(struct rand_data *ec, unsigned int osr) { /* * Establish the apt_cutoff based on the presumed entropy rate of * 1/osr. */ if (osr >= ARRAY_SIZE(jent_apt_cutoff_lookup)) { ec->apt_cutoff = jent_apt_cutoff_lookup[ ARRAY_SIZE(jent_apt_cutoff_lookup) - 1]; ec->apt_cutoff_permanent = jent_apt_cutoff_permanent_lookup[ ARRAY_SIZE(jent_apt_cutoff_permanent_lookup) - 1]; } else { ec->apt_cutoff = jent_apt_cutoff_lookup[osr - 1]; ec->apt_cutoff_permanent = jent_apt_cutoff_permanent_lookup[osr - 1]; } } /* * Reset the APT counter * * @ec [in] Reference to entropy collector */ static void jent_apt_reset(struct rand_data *ec, unsigned int delta_masked) { /* Reset APT counter */ ec->apt_count = 0; ec->apt_base = delta_masked; ec->apt_observations = 0; } /* * Insert a new entropy event into APT * * @ec [in] Reference to entropy collector * @delta_masked [in] Masked time delta to process */ static void jent_apt_insert(struct rand_data *ec, unsigned int delta_masked) { /* Initialize the base reference */ if (!ec->apt_base_set) { ec->apt_base = delta_masked; ec->apt_base_set = 1; return; } if (delta_masked == ec->apt_base) { ec->apt_count++; /* Note, ec->apt_count starts with one. */ if (ec->apt_count >= ec->apt_cutoff_permanent) ec->health_failure |= JENT_APT_FAILURE_PERMANENT; else if (ec->apt_count >= ec->apt_cutoff) ec->health_failure |= JENT_APT_FAILURE; } ec->apt_observations++; if (ec->apt_observations >= JENT_APT_WINDOW_SIZE) jent_apt_reset(ec, delta_masked); } /*************************************************************************** * Stuck Test and its use as Repetition Count Test * * The Jitter RNG uses an enhanced version of the Repetition Count Test * (RCT) specified in SP800-90B section 4.4.1. Instead of counting identical * back-to-back values, the input to the RCT is the counting of the stuck * values during the generation of one Jitter RNG output block. * * The RCT is applied with an alpha of 2^{-30} compliant to FIPS 140-2 IG 9.8. * * During the counting operation, the Jitter RNG always calculates the RCT * cut-off value of C. If that value exceeds the allowed cut-off value, * the Jitter RNG output block will be calculated completely but discarded at * the end. The caller of the Jitter RNG is informed with an error code. ***************************************************************************/ /* * Repetition Count Test as defined in SP800-90B section 4.4.1 * * @ec [in] Reference to entropy collector * @stuck [in] Indicator whether the value is stuck */ static void jent_rct_insert(struct rand_data *ec, int stuck) { if (stuck) { ec->rct_count++; /* * The cutoff value is based on the following consideration: * alpha = 2^-30 or 2^-60 as recommended in SP800-90B. * In addition, we require an entropy value H of 1/osr as this * is the minimum entropy required to provide full entropy. * Note, we collect (DATA_SIZE_BITS + ENTROPY_SAFETY_FACTOR)*osr * deltas for inserting them into the entropy pool which should * then have (close to) DATA_SIZE_BITS bits of entropy in the * conditioned output. * * Note, ec->rct_count (which equals to value B in the pseudo * code of SP800-90B section 4.4.1) starts with zero. Hence * we need to subtract one from the cutoff value as calculated * following SP800-90B. Thus C = ceil(-log_2(alpha)/H) = 30*osr * or 60*osr. */ if ((unsigned int)ec->rct_count >= (60 * ec->osr)) { ec->rct_count = -1; ec->health_failure |= JENT_RCT_FAILURE_PERMANENT; } else if ((unsigned int)ec->rct_count >= (30 * ec->osr)) { ec->rct_count = -1; ec->health_failure |= JENT_RCT_FAILURE; } } else { /* Reset RCT */ ec->rct_count = 0; } } static inline __u64 jent_delta(__u64 prev, __u64 next) { #define JENT_UINT64_MAX (__u64)(~((__u64) 0)) return (prev < next) ? (next - prev) : (JENT_UINT64_MAX - prev + 1 + next); } /* * Stuck test by checking the: * 1st derivative of the jitter measurement (time delta) * 2nd derivative of the jitter measurement (delta of time deltas) * 3rd derivative of the jitter measurement (delta of delta of time deltas) * * All values must always be non-zero. * * @ec [in] Reference to entropy collector * @current_delta [in] Jitter time delta * * @return * 0 jitter measurement not stuck (good bit) * 1 jitter measurement stuck (reject bit) */ static int jent_stuck(struct rand_data *ec, __u64 current_delta) { __u64 delta2 = jent_delta(ec->last_delta, current_delta); __u64 delta3 = jent_delta(ec->last_delta2, delta2); ec->last_delta = current_delta; ec->last_delta2 = delta2; /* * Insert the result of the comparison of two back-to-back time * deltas. */ jent_apt_insert(ec, current_delta); if (!current_delta || !delta2 || !delta3) { /* RCT with a stuck bit */ jent_rct_insert(ec, 1); return 1; } /* RCT with a non-stuck bit */ jent_rct_insert(ec, 0); return 0; } /* * Report any health test failures * * @ec [in] Reference to entropy collector * * @return a bitmask indicating which tests failed * 0 No health test failure * 1 RCT failure * 2 APT failure * 1<<JENT_PERMANENT_FAILURE_SHIFT RCT permanent failure * 2<<JENT_PERMANENT_FAILURE_SHIFT APT permanent failure */ static unsigned int jent_health_failure(struct rand_data *ec) { /* Test is only enabled in FIPS mode */ if (!fips_enabled) return 0; return ec->health_failure; } /*************************************************************************** * Noise sources ***************************************************************************/ /* * Update of the loop count used for the next round of * an entropy collection. * * Input: * @bits is the number of low bits of the timer to consider * @min is the number of bits we shift the timer value to the right at * the end to make sure we have a guaranteed minimum value * * @return Newly calculated loop counter */ static __u64 jent_loop_shuffle(unsigned int bits, unsigned int min) { __u64 time = 0; __u64 shuffle = 0; unsigned int i = 0; unsigned int mask = (1<<bits) - 1; jent_get_nstime(&time); /* * We fold the time value as much as possible to ensure that as many * bits of the time stamp are included as possible. */ for (i = 0; ((DATA_SIZE_BITS + bits - 1) / bits) > i; i++) { shuffle ^= time & mask; time = time >> bits; } /* * We add a lower boundary value to ensure we have a minimum * RNG loop count. */ return (shuffle + (1<<min)); } /* * CPU Jitter noise source -- this is the noise source based on the CPU * execution time jitter * * This function injects the individual bits of the time value into the * entropy pool using a hash. * * ec [in] entropy collector * time [in] time stamp to be injected * stuck [in] Is the time stamp identified as stuck? * * Output: * updated hash context in the entropy collector or error code */ static int jent_condition_data(struct rand_data *ec, __u64 time, int stuck) { #define SHA3_HASH_LOOP (1<<3) struct { int rct_count; unsigned int apt_observations; unsigned int apt_count; unsigned int apt_base; } addtl = { ec->rct_count, ec->apt_observations, ec->apt_count, ec->apt_base }; return jent_hash_time(ec->hash_state, time, (u8 *)&addtl, sizeof(addtl), SHA3_HASH_LOOP, stuck); } /* * Memory Access noise source -- this is a noise source based on variations in * memory access times * * This function performs memory accesses which will add to the timing * variations due to an unknown amount of CPU wait states that need to be * added when accessing memory. The memory size should be larger than the L1 * caches as outlined in the documentation and the associated testing. * * The L1 cache has a very high bandwidth, albeit its access rate is usually * slower than accessing CPU registers. Therefore, L1 accesses only add minimal * variations as the CPU has hardly to wait. Starting with L2, significant * variations are added because L2 typically does not belong to the CPU any more * and therefore a wider range of CPU wait states is necessary for accesses. * L3 and real memory accesses have even a wider range of wait states. However, * to reliably access either L3 or memory, the ec->mem memory must be quite * large which is usually not desirable. * * @ec [in] Reference to the entropy collector with the memory access data -- if * the reference to the memory block to be accessed is NULL, this noise * source is disabled * @loop_cnt [in] if a value not equal to 0 is set, use the given value * number of loops to perform the LFSR */ static void jent_memaccess(struct rand_data *ec, __u64 loop_cnt) { unsigned int wrap = 0; __u64 i = 0; #define MAX_ACC_LOOP_BIT 7 #define MIN_ACC_LOOP_BIT 0 __u64 acc_loop_cnt = jent_loop_shuffle(MAX_ACC_LOOP_BIT, MIN_ACC_LOOP_BIT); if (NULL == ec || NULL == ec->mem) return; wrap = ec->memblocksize * ec->memblocks; /* * testing purposes -- allow test app to set the counter, not * needed during runtime */ if (loop_cnt) acc_loop_cnt = loop_cnt; for (i = 0; i < (ec->memaccessloops + acc_loop_cnt); i++) { unsigned char *tmpval = ec->mem + ec->memlocation; /* * memory access: just add 1 to one byte, * wrap at 255 -- memory access implies read * from and write to memory location */ *tmpval = (*tmpval + 1) & 0xff; /* * Addition of memblocksize - 1 to pointer * with wrap around logic to ensure that every * memory location is hit evenly */ ec->memlocation = ec->memlocation + ec->memblocksize - 1; ec->memlocation = ec->memlocation % wrap; } } /*************************************************************************** * Start of entropy processing logic ***************************************************************************/ /* * This is the heart of the entropy generation: calculate time deltas and * use the CPU jitter in the time deltas. The jitter is injected into the * entropy pool. * * WARNING: ensure that ->prev_time is primed before using the output * of this function! This can be done by calling this function * and not using its result. * * @ec [in] Reference to entropy collector * * @return result of stuck test */ static int jent_measure_jitter(struct rand_data *ec, __u64 *ret_current_delta) { __u64 time = 0; __u64 current_delta = 0; int stuck; /* Invoke one noise source before time measurement to add variations */ jent_memaccess(ec, 0); /* * Get time stamp and calculate time delta to previous * invocation to measure the timing variations */ jent_get_nstime(&time); current_delta = jent_delta(ec->prev_time, time); ec->prev_time = time; /* Check whether we have a stuck measurement. */ stuck = jent_stuck(ec, current_delta); /* Now call the next noise sources which also injects the data */ if (jent_condition_data(ec, current_delta, stuck)) stuck = 1; /* return the raw entropy value */ if (ret_current_delta) *ret_current_delta = current_delta; return stuck; } /* * Generator of one 64 bit random number * Function fills rand_data->hash_state * * @ec [in] Reference to entropy collector */ static void jent_gen_entropy(struct rand_data *ec) { unsigned int k = 0, safety_factor = 0; if (fips_enabled) safety_factor = JENT_ENTROPY_SAFETY_FACTOR; /* priming of the ->prev_time value */ jent_measure_jitter(ec, NULL); while (!jent_health_failure(ec)) { /* If a stuck measurement is received, repeat measurement */ if (jent_measure_jitter(ec, NULL)) continue; /* * We multiply the loop value with ->osr to obtain the * oversampling rate requested by the caller */ if (++k >= ((DATA_SIZE_BITS + safety_factor) * ec->osr)) break; } } /* * Entry function: Obtain entropy for the caller. * * This function invokes the entropy gathering logic as often to generate * as many bytes as requested by the caller. The entropy gathering logic * creates 64 bit per invocation. * * This function truncates the last 64 bit entropy value output to the exact * size specified by the caller. * * @ec [in] Reference to entropy collector * @data [in] pointer to buffer for storing random data -- buffer must already * exist * @len [in] size of the buffer, specifying also the requested number of random * in bytes * * @return 0 when request is fulfilled or an error * * The following error codes can occur: * -1 entropy_collector is NULL or the generation failed * -2 Intermittent health failure * -3 Permanent health failure */ int jent_read_entropy(struct rand_data *ec, unsigned char *data, unsigned int len) { unsigned char *p = data; if (!ec) return -1; while (len > 0) { unsigned int tocopy, health_test_result; jent_gen_entropy(ec); health_test_result = jent_health_failure(ec); if (health_test_result > JENT_PERMANENT_FAILURE_SHIFT) { /* * At this point, the Jitter RNG instance is considered * as a failed instance. There is no rerun of the * startup test any more, because the caller * is assumed to not further use this instance. */ return -3; } else if (health_test_result) { /* * Perform startup health tests and return permanent * error if it fails. */ if (jent_entropy_init(0, 0, NULL, ec)) { /* Mark the permanent error */ ec->health_failure &= JENT_RCT_FAILURE_PERMANENT | JENT_APT_FAILURE_PERMANENT; return -3; } return -2; } tocopy = min(DATA_SIZE_BITS / 8, len); if (jent_read_random_block(ec->hash_state, p, tocopy)) return -1; len -= tocopy; p += tocopy; } return 0; } /*************************************************************************** * Initialization logic ***************************************************************************/ struct rand_data *jent_entropy_collector_alloc(unsigned int osr, unsigned int flags, void *hash_state) { struct rand_data *entropy_collector; entropy_collector = jent_zalloc(sizeof(struct rand_data)); if (!entropy_collector) return NULL; if (!(flags & JENT_DISABLE_MEMORY_ACCESS)) { /* Allocate memory for adding variations based on memory * access */ entropy_collector->mem = jent_kvzalloc(JENT_MEMORY_SIZE); if (!entropy_collector->mem) { jent_zfree(entropy_collector); return NULL; } entropy_collector->memblocksize = CONFIG_CRYPTO_JITTERENTROPY_MEMORY_BLOCKSIZE; entropy_collector->memblocks = CONFIG_CRYPTO_JITTERENTROPY_MEMORY_BLOCKS; entropy_collector->memaccessloops = JENT_MEMORY_ACCESSLOOPS; } /* verify and set the oversampling rate */ if (osr == 0) osr = 1; /* H_submitter = 1 / osr */ entropy_collector->osr = osr; entropy_collector->flags = flags; entropy_collector->hash_state = hash_state; /* Initialize the APT */ jent_apt_init(entropy_collector, osr); /* fill the data pad with non-zero values */ jent_gen_entropy(entropy_collector); return entropy_collector; } void jent_entropy_collector_free(struct rand_data *entropy_collector) { jent_kvzfree(entropy_collector->mem, JENT_MEMORY_SIZE); entropy_collector->mem = NULL; jent_zfree(entropy_collector); } int jent_entropy_init(unsigned int osr, unsigned int flags, void *hash_state, struct rand_data *p_ec) { /* * If caller provides an allocated ec, reuse it which implies that the * health test entropy data is used to further still the available * entropy pool. */ struct rand_data *ec = p_ec; int i, time_backwards = 0, ret = 0, ec_free = 0; unsigned int health_test_result; if (!ec) { ec = jent_entropy_collector_alloc(osr, flags, hash_state); if (!ec) return JENT_EMEM; ec_free = 1; } else { /* Reset the APT */ jent_apt_reset(ec, 0); /* Ensure that a new APT base is obtained */ ec->apt_base_set = 0; /* Reset the RCT */ ec->rct_count = 0; /* Reset intermittent, leave permanent health test result */ ec->health_failure &= (~JENT_RCT_FAILURE); ec->health_failure &= (~JENT_APT_FAILURE); } /* We could perform statistical tests here, but the problem is * that we only have a few loop counts to do testing. These * loop counts may show some slight skew and we produce * false positives. * * Moreover, only old systems show potentially problematic * jitter entropy that could potentially be caught here. But * the RNG is intended for hardware that is available or widely * used, but not old systems that are long out of favor. Thus, * no statistical tests. */ /* * We could add a check for system capabilities such as clock_getres or * check for CONFIG_X86_TSC, but it does not make much sense as the * following sanity checks verify that we have a high-resolution * timer. */ /* * TESTLOOPCOUNT needs some loops to identify edge systems. 100 is * definitely too little. * * SP800-90B requires at least 1024 initial test cycles. */ #define TESTLOOPCOUNT 1024 #define CLEARCACHE 100 for (i = 0; (TESTLOOPCOUNT + CLEARCACHE) > i; i++) { __u64 start_time = 0, end_time = 0, delta = 0; /* Invoke core entropy collection logic */ jent_measure_jitter(ec, &delta); end_time = ec->prev_time; start_time = ec->prev_time - delta; /* test whether timer works */ if (!start_time || !end_time) { ret = JENT_ENOTIME; goto out; } /* * test whether timer is fine grained enough to provide * delta even when called shortly after each other -- this * implies that we also have a high resolution timer */ if (!delta || (end_time == start_time)) { ret = JENT_ECOARSETIME; goto out; } /* * up to here we did not modify any variable that will be * evaluated later, but we already performed some work. Thus we * already have had an impact on the caches, branch prediction, * etc. with the goal to clear it to get the worst case * measurements. */ if (i < CLEARCACHE) continue; /* test whether we have an increasing timer */ if (!(end_time > start_time)) time_backwards++; } /* * we allow up to three times the time running backwards. * CLOCK_REALTIME is affected by adjtime and NTP operations. Thus, * if such an operation just happens to interfere with our test, it * should not fail. The value of 3 should cover the NTP case being * performed during our test run. */ if (time_backwards > 3) { ret = JENT_ENOMONOTONIC; goto out; } /* Did we encounter a health test failure? */ health_test_result = jent_health_failure(ec); if (health_test_result) { ret = (health_test_result & JENT_RCT_FAILURE) ? JENT_ERCT : JENT_EHEALTH; goto out; } out: if (ec_free) jent_entropy_collector_free(ec); return ret; } |
| 154 199 34 23 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 | /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _NF_CONNTRACK_EXTEND_H #define _NF_CONNTRACK_EXTEND_H #include <linux/slab.h> #include <net/netfilter/nf_conntrack.h> enum nf_ct_ext_id { NF_CT_EXT_HELPER, #if IS_ENABLED(CONFIG_NF_NAT) NF_CT_EXT_NAT, #endif NF_CT_EXT_SEQADJ, NF_CT_EXT_ACCT, #ifdef CONFIG_NF_CONNTRACK_EVENTS NF_CT_EXT_ECACHE, #endif #ifdef CONFIG_NF_CONNTRACK_TIMESTAMP NF_CT_EXT_TSTAMP, #endif #ifdef CONFIG_NF_CONNTRACK_TIMEOUT NF_CT_EXT_TIMEOUT, #endif #ifdef CONFIG_NF_CONNTRACK_LABELS NF_CT_EXT_LABELS, #endif #if IS_ENABLED(CONFIG_NETFILTER_SYNPROXY) NF_CT_EXT_SYNPROXY, #endif #if IS_ENABLED(CONFIG_NET_ACT_CT) NF_CT_EXT_ACT_CT, #endif NF_CT_EXT_NUM, }; /* Extensions: optional stuff which isn't permanently in struct. */ struct nf_ct_ext { u8 offset[NF_CT_EXT_NUM]; u8 len; unsigned int gen_id; char data[] __aligned(8); }; static inline bool __nf_ct_ext_exist(const struct nf_ct_ext *ext, u8 id) { return !!ext->offset[id]; } static inline bool nf_ct_ext_exist(const struct nf_conn *ct, u8 id) { return (ct->ext && __nf_ct_ext_exist(ct->ext, id)); } void *__nf_ct_ext_find(const struct nf_ct_ext *ext, u8 id); static inline void *nf_ct_ext_find(const struct nf_conn *ct, u8 id) { struct nf_ct_ext *ext = ct->ext; if (!ext || !__nf_ct_ext_exist(ext, id)) return NULL; if (unlikely(ext->gen_id)) return __nf_ct_ext_find(ext, id); return (void *)ct->ext + ct->ext->offset[id]; } /* Add this type, returns pointer to data or NULL. */ void *nf_ct_ext_add(struct nf_conn *ct, enum nf_ct_ext_id id, gfp_t gfp); /* ext genid. if ext->id != ext_genid, extensions cannot be used * anymore unless conntrack has CONFIRMED bit set. */ extern atomic_t nf_conntrack_ext_genid; void nf_ct_ext_bump_genid(void); #endif /* _NF_CONNTRACK_EXTEND_H */ |
| 2 2 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 | // SPDX-License-Identifier: GPL-2.0-only #include <linux/ethtool.h> #include <linux/sfp.h> #include "netlink.h" #include "common.h" struct eeprom_req_info { struct ethnl_req_info base; u32 offset; u32 length; u8 page; u8 bank; u8 i2c_address; }; struct eeprom_reply_data { struct ethnl_reply_data base; u32 length; u8 *data; }; #define MODULE_EEPROM_REQINFO(__req_base) \ container_of(__req_base, struct eeprom_req_info, base) #define MODULE_EEPROM_REPDATA(__reply_base) \ container_of(__reply_base, struct eeprom_reply_data, base) static int fallback_set_params(struct eeprom_req_info *request, struct ethtool_modinfo *modinfo, struct ethtool_eeprom *eeprom) { u32 offset = request->offset; u32 length = request->length; if (request->page) offset = request->page * ETH_MODULE_EEPROM_PAGE_LEN + offset; if (modinfo->type == ETH_MODULE_SFF_8472 && request->i2c_address == 0x51) offset += ETH_MODULE_EEPROM_PAGE_LEN * 2; if (offset >= modinfo->eeprom_len) return -EINVAL; eeprom->cmd = ETHTOOL_GMODULEEEPROM; eeprom->len = length; eeprom->offset = offset; return 0; } static int eeprom_fallback(struct eeprom_req_info *request, struct eeprom_reply_data *reply) { struct net_device *dev = reply->base.dev; struct ethtool_modinfo modinfo = {0}; struct ethtool_eeprom eeprom = {0}; u8 *data; int err; modinfo.cmd = ETHTOOL_GMODULEINFO; err = ethtool_get_module_info_call(dev, &modinfo); if (err < 0) return err; err = fallback_set_params(request, &modinfo, &eeprom); if (err < 0) return err; data = kmalloc(eeprom.len, GFP_KERNEL); if (!data) return -ENOMEM; err = ethtool_get_module_eeprom_call(dev, &eeprom, data); if (err < 0) goto err_out; reply->data = data; reply->length = eeprom.len; return 0; err_out: kfree(data); return err; } static int get_module_eeprom_by_page(struct net_device *dev, struct ethtool_module_eeprom *page_data, struct netlink_ext_ack *extack) { const struct ethtool_ops *ops = dev->ethtool_ops; if (dev->ethtool->module_fw_flash_in_progress) { NL_SET_ERR_MSG(extack, "Module firmware flashing is in progress"); return -EBUSY; } if (dev->sfp_bus) return sfp_get_module_eeprom_by_page(dev->sfp_bus, page_data, extack); if (ops->get_module_eeprom_by_page) return ops->get_module_eeprom_by_page(dev, page_data, extack); return -EOPNOTSUPP; } static int eeprom_prepare_data(const struct ethnl_req_info *req_base, struct ethnl_reply_data *reply_base, const struct genl_info *info) { struct eeprom_reply_data *reply = MODULE_EEPROM_REPDATA(reply_base); struct eeprom_req_info *request = MODULE_EEPROM_REQINFO(req_base); struct ethtool_module_eeprom page_data = {0}; struct net_device *dev = reply_base->dev; int ret; page_data.offset = request->offset; page_data.length = request->length; page_data.i2c_address = request->i2c_address; page_data.page = request->page; page_data.bank = request->bank; page_data.data = kmalloc(page_data.length, GFP_KERNEL); if (!page_data.data) return -ENOMEM; ret = ethnl_ops_begin(dev); if (ret) goto err_free; ret = get_module_eeprom_by_page(dev, &page_data, info->extack); if (ret < 0) goto err_ops; reply->length = ret; reply->data = page_data.data; ethnl_ops_complete(dev); return 0; err_ops: ethnl_ops_complete(dev); err_free: kfree(page_data.data); if (ret == -EOPNOTSUPP) return eeprom_fallback(request, reply); return ret; } static int eeprom_parse_request(struct ethnl_req_info *req_info, struct nlattr **tb, struct netlink_ext_ack *extack) { struct eeprom_req_info *request = MODULE_EEPROM_REQINFO(req_info); if (!tb[ETHTOOL_A_MODULE_EEPROM_OFFSET] || !tb[ETHTOOL_A_MODULE_EEPROM_LENGTH] || !tb[ETHTOOL_A_MODULE_EEPROM_PAGE] || !tb[ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS]) return -EINVAL; request->i2c_address = nla_get_u8(tb[ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS]); request->offset = nla_get_u32(tb[ETHTOOL_A_MODULE_EEPROM_OFFSET]); request->length = nla_get_u32(tb[ETHTOOL_A_MODULE_EEPROM_LENGTH]); /* The following set of conditions limit the API to only dump 1/2 * EEPROM page without crossing low page boundary located at offset 128. * This means user may only request dumps of length limited to 128 from * either low 128 bytes or high 128 bytes. * For pages higher than 0 only high 128 bytes are accessible. */ request->page = nla_get_u8(tb[ETHTOOL_A_MODULE_EEPROM_PAGE]); if (request->page && request->offset < ETH_MODULE_EEPROM_PAGE_LEN) { NL_SET_ERR_MSG_ATTR(extack, tb[ETHTOOL_A_MODULE_EEPROM_PAGE], "reading from lower half page is allowed for page 0 only"); return -EINVAL; } if (request->offset < ETH_MODULE_EEPROM_PAGE_LEN && request->offset + request->length > ETH_MODULE_EEPROM_PAGE_LEN) { NL_SET_ERR_MSG_ATTR(extack, tb[ETHTOOL_A_MODULE_EEPROM_LENGTH], "reading cross half page boundary is illegal"); return -EINVAL; } else if (request->offset + request->length > ETH_MODULE_EEPROM_PAGE_LEN * 2) { NL_SET_ERR_MSG_ATTR(extack, tb[ETHTOOL_A_MODULE_EEPROM_LENGTH], "reading cross page boundary is illegal"); return -EINVAL; } if (tb[ETHTOOL_A_MODULE_EEPROM_BANK]) request->bank = nla_get_u8(tb[ETHTOOL_A_MODULE_EEPROM_BANK]); return 0; } static int eeprom_reply_size(const struct ethnl_req_info *req_base, const struct ethnl_reply_data *reply_base) { const struct eeprom_req_info *request = MODULE_EEPROM_REQINFO(req_base); return nla_total_size(sizeof(u8) * request->length); /* _EEPROM_DATA */ } static int eeprom_fill_reply(struct sk_buff *skb, const struct ethnl_req_info *req_base, const struct ethnl_reply_data *reply_base) { struct eeprom_reply_data *reply = MODULE_EEPROM_REPDATA(reply_base); return nla_put(skb, ETHTOOL_A_MODULE_EEPROM_DATA, reply->length, reply->data); } static void eeprom_cleanup_data(struct ethnl_reply_data *reply_base) { struct eeprom_reply_data *reply = MODULE_EEPROM_REPDATA(reply_base); kfree(reply->data); } const struct ethnl_request_ops ethnl_module_eeprom_request_ops = { .request_cmd = ETHTOOL_MSG_MODULE_EEPROM_GET, .reply_cmd = ETHTOOL_MSG_MODULE_EEPROM_GET_REPLY, .hdr_attr = ETHTOOL_A_MODULE_EEPROM_HEADER, .req_info_size = sizeof(struct eeprom_req_info), .reply_data_size = sizeof(struct eeprom_reply_data), .parse_request = eeprom_parse_request, .prepare_data = eeprom_prepare_data, .reply_size = eeprom_reply_size, .fill_reply = eeprom_fill_reply, .cleanup_data = eeprom_cleanup_data, }; const struct nla_policy ethnl_module_eeprom_get_policy[] = { [ETHTOOL_A_MODULE_EEPROM_HEADER] = NLA_POLICY_NESTED(ethnl_header_policy), [ETHTOOL_A_MODULE_EEPROM_OFFSET] = NLA_POLICY_MAX(NLA_U32, ETH_MODULE_EEPROM_PAGE_LEN * 2 - 1), [ETHTOOL_A_MODULE_EEPROM_LENGTH] = NLA_POLICY_RANGE(NLA_U32, 1, ETH_MODULE_EEPROM_PAGE_LEN), [ETHTOOL_A_MODULE_EEPROM_PAGE] = { .type = NLA_U8 }, [ETHTOOL_A_MODULE_EEPROM_BANK] = { .type = NLA_U8 }, [ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS] = NLA_POLICY_RANGE(NLA_U8, 0, ETH_MODULE_MAX_I2C_ADDRESS), }; |
| 10 10 10 10 8 10 8 10 10 8 10 10 10 10 10 10 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 | // SPDX-License-Identifier: GPL-2.0-only /* * Copyright (C) 2025 Christoph Hellwig */ #include <linux/blk-integrity.h> #include <linux/blk-mq-dma.h> #include "blk.h" struct phys_vec { phys_addr_t paddr; u32 len; }; static bool __blk_map_iter_next(struct blk_map_iter *iter) { if (iter->iter.bi_size) return true; if (!iter->bio || !iter->bio->bi_next) return false; iter->bio = iter->bio->bi_next; if (iter->is_integrity) { iter->iter = bio_integrity(iter->bio)->bip_iter; iter->bvecs = bio_integrity(iter->bio)->bip_vec; } else { iter->iter = iter->bio->bi_iter; iter->bvecs = iter->bio->bi_io_vec; } return true; } static bool blk_map_iter_next(struct request *req, struct blk_map_iter *iter, struct phys_vec *vec) { unsigned int max_size; struct bio_vec bv; if (!iter->iter.bi_size) return false; bv = mp_bvec_iter_bvec(iter->bvecs, iter->iter); vec->paddr = bvec_phys(&bv); max_size = get_max_segment_size(&req->q->limits, vec->paddr, UINT_MAX); bv.bv_len = min(bv.bv_len, max_size); bvec_iter_advance_single(iter->bvecs, &iter->iter, bv.bv_len); /* * If we are entirely done with this bi_io_vec entry, check if the next * one could be merged into it. This typically happens when moving to * the next bio, but some callers also don't pack bvecs tight. */ while (!iter->iter.bi_size || !iter->iter.bi_bvec_done) { struct bio_vec next; if (!__blk_map_iter_next(iter)) break; next = mp_bvec_iter_bvec(iter->bvecs, iter->iter); if (bv.bv_len + next.bv_len > max_size || !biovec_phys_mergeable(req->q, &bv, &next)) break; bv.bv_len += next.bv_len; bvec_iter_advance_single(iter->bvecs, &iter->iter, next.bv_len); } vec->len = bv.bv_len; return true; } /* * The IOVA-based DMA API wants to be able to coalesce at the minimal IOMMU page * size granularity (which is guaranteed to be <= PAGE_SIZE and usually 4k), so * we need to ensure our segments are aligned to this as well. * * Note that there is no point in using the slightly more complicated IOVA based * path for single segment mappings. */ static inline bool blk_can_dma_map_iova(struct request *req, struct device *dma_dev) { return !((queue_virt_boundary(req->q) + 1) & dma_get_merge_boundary(dma_dev)); } static bool blk_dma_map_bus(struct blk_dma_iter *iter, struct phys_vec *vec) { iter->addr = pci_p2pdma_bus_addr_map(&iter->p2pdma, vec->paddr); iter->len = vec->len; return true; } static bool blk_dma_map_direct(struct request *req, struct device *dma_dev, struct blk_dma_iter *iter, struct phys_vec *vec) { iter->addr = dma_map_page(dma_dev, phys_to_page(vec->paddr), offset_in_page(vec->paddr), vec->len, rq_dma_dir(req)); if (dma_mapping_error(dma_dev, iter->addr)) { iter->status = BLK_STS_RESOURCE; return false; } iter->len = vec->len; return true; } static bool blk_rq_dma_map_iova(struct request *req, struct device *dma_dev, struct dma_iova_state *state, struct blk_dma_iter *iter, struct phys_vec *vec) { enum dma_data_direction dir = rq_dma_dir(req); unsigned int mapped = 0; int error; iter->addr = state->addr; iter->len = dma_iova_size(state); do { error = dma_iova_link(dma_dev, state, vec->paddr, mapped, vec->len, dir, 0); if (error) break; mapped += vec->len; } while (blk_map_iter_next(req, &iter->iter, vec)); error = dma_iova_sync(dma_dev, state, 0, mapped); if (error) { iter->status = errno_to_blk_status(error); return false; } return true; } static inline void blk_rq_map_iter_init(struct request *rq, struct blk_map_iter *iter) { struct bio *bio = rq->bio; if (rq->rq_flags & RQF_SPECIAL_PAYLOAD) { *iter = (struct blk_map_iter) { .bvecs = &rq->special_vec, .iter = { .bi_size = rq->special_vec.bv_len, } }; } else if (bio) { *iter = (struct blk_map_iter) { .bio = bio, .bvecs = bio->bi_io_vec, .iter = bio->bi_iter, }; } else { /* the internal flush request may not have bio attached */ *iter = (struct blk_map_iter) {}; } } static bool blk_dma_map_iter_start(struct request *req, struct device *dma_dev, struct dma_iova_state *state, struct blk_dma_iter *iter, unsigned int total_len) { struct phys_vec vec; memset(&iter->p2pdma, 0, sizeof(iter->p2pdma)); iter->status = BLK_STS_OK; /* * Grab the first segment ASAP because we'll need it to check for P2P * transfers. */ if (!blk_map_iter_next(req, &iter->iter, &vec)) return false; switch (pci_p2pdma_state(&iter->p2pdma, dma_dev, phys_to_page(vec.paddr))) { case PCI_P2PDMA_MAP_BUS_ADDR: if (iter->iter.is_integrity) bio_integrity(req->bio)->bip_flags |= BIP_P2P_DMA; else req->cmd_flags |= REQ_P2PDMA; return blk_dma_map_bus(iter, &vec); case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE: /* * P2P transfers through the host bridge are treated the * same as non-P2P transfers below and during unmap. */ case PCI_P2PDMA_MAP_NONE: break; default: iter->status = BLK_STS_INVAL; return false; } if (blk_can_dma_map_iova(req, dma_dev) && dma_iova_try_alloc(dma_dev, state, vec.paddr, total_len)) return blk_rq_dma_map_iova(req, dma_dev, state, iter, &vec); return blk_dma_map_direct(req, dma_dev, iter, &vec); } /** * blk_rq_dma_map_iter_start - map the first DMA segment for a request * @req: request to map * @dma_dev: device to map to * @state: DMA IOVA state * @iter: block layer DMA iterator * * Start DMA mapping @req to @dma_dev. @state and @iter are provided by the * caller and don't need to be initialized. @state needs to be stored for use * at unmap time, @iter is only needed at map time. * * Returns %false if there is no segment to map, including due to an error, or * %true ft it did map a segment. * * If a segment was mapped, the DMA address for it is returned in @iter.addr and * the length in @iter.len. If no segment was mapped the status code is * returned in @iter.status. * * The caller can call blk_rq_dma_map_coalesce() to check if further segments * need to be mapped after this, or go straight to blk_rq_dma_map_iter_next() * to try to map the following segments. */ bool blk_rq_dma_map_iter_start(struct request *req, struct device *dma_dev, struct dma_iova_state *state, struct blk_dma_iter *iter) { blk_rq_map_iter_init(req, &iter->iter); return blk_dma_map_iter_start(req, dma_dev, state, iter, blk_rq_payload_bytes(req)); } EXPORT_SYMBOL_GPL(blk_rq_dma_map_iter_start); /** * blk_rq_dma_map_iter_next - map the next DMA segment for a request * @req: request to map * @dma_dev: device to map to * @state: DMA IOVA state * @iter: block layer DMA iterator * * Iterate to the next mapping after a previous call to * blk_rq_dma_map_iter_start(). See there for a detailed description of the * arguments. * * Returns %false if there is no segment to map, including due to an error, or * %true ft it did map a segment. * * If a segment was mapped, the DMA address for it is returned in @iter.addr and * the length in @iter.len. If no segment was mapped the status code is * returned in @iter.status. */ bool blk_rq_dma_map_iter_next(struct request *req, struct device *dma_dev, struct dma_iova_state *state, struct blk_dma_iter *iter) { struct phys_vec vec; if (!blk_map_iter_next(req, &iter->iter, &vec)) return false; if (iter->p2pdma.map == PCI_P2PDMA_MAP_BUS_ADDR) return blk_dma_map_bus(iter, &vec); return blk_dma_map_direct(req, dma_dev, iter, &vec); } EXPORT_SYMBOL_GPL(blk_rq_dma_map_iter_next); static inline struct scatterlist * blk_next_sg(struct scatterlist **sg, struct scatterlist *sglist) { if (!*sg) return sglist; /* * If the driver previously mapped a shorter list, we could see a * termination bit prematurely unless it fully inits the sg table * on each mapping. We KNOW that there must be more entries here * or the driver would be buggy, so force clear the termination bit * to avoid doing a full sg_init_table() in drivers for each command. */ sg_unmark_end(*sg); return sg_next(*sg); } /* * Map a request to scatterlist, return number of sg entries setup. Caller * must make sure sg can hold rq->nr_phys_segments entries. */ int __blk_rq_map_sg(struct request *rq, struct scatterlist *sglist, struct scatterlist **last_sg) { struct blk_map_iter iter; struct phys_vec vec; int nsegs = 0; blk_rq_map_iter_init(rq, &iter); while (blk_map_iter_next(rq, &iter, &vec)) { *last_sg = blk_next_sg(last_sg, sglist); sg_set_page(*last_sg, phys_to_page(vec.paddr), vec.len, offset_in_page(vec.paddr)); nsegs++; } if (*last_sg) sg_mark_end(*last_sg); /* * Something must have been wrong if the figured number of * segment is bigger than number of req's physical segments */ WARN_ON(nsegs > blk_rq_nr_phys_segments(rq)); return nsegs; } EXPORT_SYMBOL(__blk_rq_map_sg); #ifdef CONFIG_BLK_DEV_INTEGRITY /** * blk_rq_integrity_dma_map_iter_start - map the first integrity DMA segment * for a request * @req: request to map * @dma_dev: device to map to * @state: DMA IOVA state * @iter: block layer DMA iterator * * Start DMA mapping @req integrity data to @dma_dev. @state and @iter are * provided by the caller and don't need to be initialized. @state needs to be * stored for use at unmap time, @iter is only needed at map time. * * Returns %false if there is no segment to map, including due to an error, or * %true if it did map a segment. * * If a segment was mapped, the DMA address for it is returned in @iter.addr * and the length in @iter.len. If no segment was mapped the status code is * returned in @iter.status. * * The caller can call blk_rq_dma_map_coalesce() to check if further segments * need to be mapped after this, or go straight to blk_rq_dma_map_iter_next() * to try to map the following segments. */ bool blk_rq_integrity_dma_map_iter_start(struct request *req, struct device *dma_dev, struct dma_iova_state *state, struct blk_dma_iter *iter) { unsigned len = bio_integrity_bytes(&req->q->limits.integrity, blk_rq_sectors(req)); struct bio *bio = req->bio; iter->iter = (struct blk_map_iter) { .bio = bio, .iter = bio_integrity(bio)->bip_iter, .bvecs = bio_integrity(bio)->bip_vec, .is_integrity = true, }; return blk_dma_map_iter_start(req, dma_dev, state, iter, len); } EXPORT_SYMBOL_GPL(blk_rq_integrity_dma_map_iter_start); /** * blk_rq_integrity_dma_map_iter_start - map the next integrity DMA segment for * a request * @req: request to map * @dma_dev: device to map to * @state: DMA IOVA state * @iter: block layer DMA iterator * * Iterate to the next integrity mapping after a previous call to * blk_rq_integrity_dma_map_iter_start(). See there for a detailed description * of the arguments. * * Returns %false if there is no segment to map, including due to an error, or * %true if it did map a segment. * * If a segment was mapped, the DMA address for it is returned in @iter.addr and * the length in @iter.len. If no segment was mapped the status code is * returned in @iter.status. */ bool blk_rq_integrity_dma_map_iter_next(struct request *req, struct device *dma_dev, struct blk_dma_iter *iter) { struct phys_vec vec; if (!blk_map_iter_next(req, &iter->iter, &vec)) return false; if (iter->p2pdma.map == PCI_P2PDMA_MAP_BUS_ADDR) return blk_dma_map_bus(iter, &vec); return blk_dma_map_direct(req, dma_dev, iter, &vec); } EXPORT_SYMBOL_GPL(blk_rq_integrity_dma_map_iter_next); /** * blk_rq_map_integrity_sg - Map integrity metadata into a scatterlist * @rq: request to map * @sglist: target scatterlist * * Description: Map the integrity vectors in request into a * scatterlist. The scatterlist must be big enough to hold all * elements. I.e. sized using blk_rq_count_integrity_sg() or * rq->nr_integrity_segments. */ int blk_rq_map_integrity_sg(struct request *rq, struct scatterlist *sglist) { struct request_queue *q = rq->q; struct scatterlist *sg = NULL; struct bio *bio = rq->bio; unsigned int segments = 0; struct phys_vec vec; struct blk_map_iter iter = { .bio = bio, .iter = bio_integrity(bio)->bip_iter, .bvecs = bio_integrity(bio)->bip_vec, .is_integrity = true, }; while (blk_map_iter_next(rq, &iter, &vec)) { sg = blk_next_sg(&sg, sglist); sg_set_page(sg, phys_to_page(vec.paddr), vec.len, offset_in_page(vec.paddr)); segments++; } if (sg) sg_mark_end(sg); /* * Something must have been wrong if the figured number of segment * is bigger than number of req's physical integrity segments */ BUG_ON(segments > rq->nr_integrity_segments); BUG_ON(segments > queue_max_integrity_segments(q)); return segments; } EXPORT_SYMBOL(blk_rq_map_integrity_sg); #endif |
| 33 33 6 2 4 22 6 19 21 6 17 11 7 4 14 3 12 8 2 8 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 | // SPDX-License-Identifier: GPL-2.0-only /* * Copyright (c) 2013 Patrick McHardy <kaber@trash.net> */ #include <linux/module.h> #include <linux/skbuff.h> #include <linux/unaligned.h> #include <net/tcp.h> #include <net/netns/generic.h> #include <linux/proc_fs.h> #include <linux/netfilter_ipv6.h> #include <linux/netfilter/nf_synproxy.h> #include <net/netfilter/nf_conntrack.h> #include <net/netfilter/nf_conntrack_ecache.h> #include <net/netfilter/nf_conntrack_extend.h> #include <net/netfilter/nf_conntrack_seqadj.h> #include <net/netfilter/nf_conntrack_synproxy.h> #include <net/netfilter/nf_conntrack_zones.h> #include <net/netfilter/nf_synproxy.h> unsigned int synproxy_net_id; EXPORT_SYMBOL_GPL(synproxy_net_id); bool synproxy_parse_options(const struct sk_buff *skb, unsigned int doff, const struct tcphdr *th, struct synproxy_options *opts) { int length = (th->doff * 4) - sizeof(*th); u8 buf[40], *ptr; if (unlikely(length < 0)) return false; ptr = skb_header_pointer(skb, doff + sizeof(*th), length, buf); if (ptr == NULL) return false; opts->options = 0; while (length > 0) { int opcode = *ptr++; int opsize; switch (opcode) { case TCPOPT_EOL: return true; case TCPOPT_NOP: length--; continue; default: if (length < 2) return true; opsize = *ptr++; if (opsize < 2) return true; if (opsize > length) return true; switch (opcode) { case TCPOPT_MSS: if (opsize == TCPOLEN_MSS) { opts->mss_option = get_unaligned_be16(ptr); opts->options |= NF_SYNPROXY_OPT_MSS; } break; case TCPOPT_WINDOW: if (opsize == TCPOLEN_WINDOW) { opts->wscale = *ptr; if (opts->wscale > TCP_MAX_WSCALE) opts->wscale = TCP_MAX_WSCALE; opts->options |= NF_SYNPROXY_OPT_WSCALE; } break; case TCPOPT_TIMESTAMP: if (opsize == TCPOLEN_TIMESTAMP) { opts->tsval = get_unaligned_be32(ptr); opts->tsecr = get_unaligned_be32(ptr + 4); opts->options |= NF_SYNPROXY_OPT_TIMESTAMP; } break; case TCPOPT_SACK_PERM: if (opsize == TCPOLEN_SACK_PERM) opts->options |= NF_SYNPROXY_OPT_SACK_PERM; break; } ptr += opsize - 2; length -= opsize; } } return true; } EXPORT_SYMBOL_GPL(synproxy_parse_options); static unsigned int synproxy_options_size(const struct synproxy_options *opts) { unsigned int size = 0; if (opts->options & NF_SYNPROXY_OPT_MSS) size += TCPOLEN_MSS_ALIGNED; if (opts->options & NF_SYNPROXY_OPT_TIMESTAMP) size += TCPOLEN_TSTAMP_ALIGNED; else if (opts->options & NF_SYNPROXY_OPT_SACK_PERM) size += TCPOLEN_SACKPERM_ALIGNED; if (opts->options & NF_SYNPROXY_OPT_WSCALE) size += TCPOLEN_WSCALE_ALIGNED; return size; } static void synproxy_build_options(struct tcphdr *th, const struct synproxy_options *opts) { __be32 *ptr = (__be32 *)(th + 1); u8 options = opts->options; if (options & NF_SYNPROXY_OPT_MSS) *ptr++ = htonl((TCPOPT_MSS << 24) | (TCPOLEN_MSS << 16) | opts->mss_option); if (options & NF_SYNPROXY_OPT_TIMESTAMP) { if (options & NF_SYNPROXY_OPT_SACK_PERM) *ptr++ = htonl((TCPOPT_SACK_PERM << 24) | (TCPOLEN_SACK_PERM << 16) | (TCPOPT_TIMESTAMP << 8) | TCPOLEN_TIMESTAMP); else *ptr++ = htonl((TCPOPT_NOP << 24) | (TCPOPT_NOP << 16) | (TCPOPT_TIMESTAMP << 8) | TCPOLEN_TIMESTAMP); *ptr++ = htonl(opts->tsval); *ptr++ = htonl(opts->tsecr); } else if (options & NF_SYNPROXY_OPT_SACK_PERM) *ptr++ = htonl((TCPOPT_NOP << 24) | (TCPOPT_NOP << 16) | (TCPOPT_SACK_PERM << 8) | TCPOLEN_SACK_PERM); if (options & NF_SYNPROXY_OPT_WSCALE) *ptr++ = htonl((TCPOPT_NOP << 24) | (TCPOPT_WINDOW << 16) | (TCPOLEN_WINDOW << 8) | opts->wscale); } void synproxy_init_timestamp_cookie(const struct nf_synproxy_info *info, struct synproxy_options *opts) { opts->tsecr = opts->tsval; opts->tsval = tcp_clock_ms() & ~0x3f; if (opts->options & NF_SYNPROXY_OPT_WSCALE) { opts->tsval |= opts->wscale; opts->wscale = info->wscale; } else opts->tsval |= 0xf; if (opts->options & NF_SYNPROXY_OPT_SACK_PERM) opts->tsval |= 1 << 4; if (opts->options & NF_SYNPROXY_OPT_ECN) opts->tsval |= 1 << 5; } EXPORT_SYMBOL_GPL(synproxy_init_timestamp_cookie); static void synproxy_check_timestamp_cookie(struct synproxy_options *opts) { opts->wscale = opts->tsecr & 0xf; if (opts->wscale != 0xf) opts->options |= NF_SYNPROXY_OPT_WSCALE; opts->options |= opts->tsecr & (1 << 4) ? NF_SYNPROXY_OPT_SACK_PERM : 0; opts->options |= opts->tsecr & (1 << 5) ? NF_SYNPROXY_OPT_ECN : 0; } static unsigned int synproxy_tstamp_adjust(struct sk_buff *skb, unsigned int protoff, struct tcphdr *th, struct nf_conn *ct, enum ip_conntrack_info ctinfo, const struct nf_conn_synproxy *synproxy) { unsigned int optoff, optend; __be32 *ptr, old; if (synproxy->tsoff == 0) return 1; optoff = protoff + sizeof(struct tcphdr); optend = protoff + th->doff * 4; if (skb_ensure_writable(skb, optend)) return 0; while (optoff < optend) { unsigned char *op = skb->data + optoff; switch (op[0]) { case TCPOPT_EOL: return 1; case TCPOPT_NOP: optoff++; continue; default: if (optoff + 1 == optend || optoff + op[1] > optend || op[1] < 2) return 0; if (op[0] == TCPOPT_TIMESTAMP && op[1] == TCPOLEN_TIMESTAMP) { if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY) { ptr = (__be32 *)&op[2]; old = *ptr; *ptr = htonl(ntohl(*ptr) - synproxy->tsoff); } else { ptr = (__be32 *)&op[6]; old = *ptr; *ptr = htonl(ntohl(*ptr) + synproxy->tsoff); } inet_proto_csum_replace4(&th->check, skb, old, *ptr, false); return 1; } optoff += op[1]; } } return 1; } #ifdef CONFIG_PROC_FS static void *synproxy_cpu_seq_start(struct seq_file *seq, loff_t *pos) { struct synproxy_net *snet = synproxy_pernet(seq_file_net(seq)); int cpu; if (*pos == 0) return SEQ_START_TOKEN; for (cpu = *pos - 1; cpu < nr_cpu_ids; cpu++) { if (!cpu_possible(cpu)) continue; *pos = cpu + 1; return per_cpu_ptr(snet->stats, cpu); } return NULL; } static void *synproxy_cpu_seq_next(struct seq_file *seq, void *v, loff_t *pos) { struct synproxy_net *snet = synproxy_pernet(seq_file_net(seq)); int cpu; for (cpu = *pos; cpu < nr_cpu_ids; cpu++) { if (!cpu_possible(cpu)) continue; *pos = cpu + 1; return per_cpu_ptr(snet->stats, cpu); } (*pos)++; return NULL; } static void synproxy_cpu_seq_stop(struct seq_file *seq, void *v) { return; } static int synproxy_cpu_seq_show(struct seq_file *seq, void *v) { struct synproxy_stats *stats = v; if (v == SEQ_START_TOKEN) { seq_puts(seq, "entries\t\tsyn_received\t" "cookie_invalid\tcookie_valid\t" "cookie_retrans\tconn_reopened\n"); return 0; } seq_printf(seq, "%08x\t%08x\t%08x\t%08x\t%08x\t%08x\n", 0, stats->syn_received, stats->cookie_invalid, stats->cookie_valid, stats->cookie_retrans, stats->conn_reopened); return 0; } static const struct seq_operations synproxy_cpu_seq_ops = { .start = synproxy_cpu_seq_start, .next = synproxy_cpu_seq_next, .stop = synproxy_cpu_seq_stop, .show = synproxy_cpu_seq_show, }; static int __net_init synproxy_proc_init(struct net *net) { if (!proc_create_net("synproxy", 0444, net->proc_net_stat, &synproxy_cpu_seq_ops, sizeof(struct seq_net_private))) return -ENOMEM; return 0; } static void __net_exit synproxy_proc_exit(struct net *net) { remove_proc_entry("synproxy", net->proc_net_stat); } #else static int __net_init synproxy_proc_init(struct net *net) { return 0; } static void __net_exit synproxy_proc_exit(struct net *net) { return; } #endif /* CONFIG_PROC_FS */ static int __net_init synproxy_net_init(struct net *net) { struct synproxy_net *snet = synproxy_pernet(net); struct nf_conn *ct; int err = -ENOMEM; ct = nf_ct_tmpl_alloc(net, &nf_ct_zone_dflt, GFP_KERNEL); if (!ct) goto err1; if (!nfct_seqadj_ext_add(ct)) goto err2; if (!nfct_synproxy_ext_add(ct)) goto err2; __set_bit(IPS_CONFIRMED_BIT, &ct->status); snet->tmpl = ct; snet->stats = alloc_percpu(struct synproxy_stats); if (snet->stats == NULL) goto err2; err = synproxy_proc_init(net); if (err < 0) goto err3; return 0; err3: free_percpu(snet->stats); err2: nf_ct_tmpl_free(ct); err1: return err; } static void __net_exit synproxy_net_exit(struct net *net) { struct synproxy_net *snet = synproxy_pernet(net); nf_ct_put(snet->tmpl); synproxy_proc_exit(net); free_percpu(snet->stats); } static struct pernet_operations synproxy_net_ops = { .init = synproxy_net_init, .exit = synproxy_net_exit, .id = &synproxy_net_id, .size = sizeof(struct synproxy_net), }; static int __init synproxy_core_init(void) { return register_pernet_subsys(&synproxy_net_ops); } static void __exit synproxy_core_exit(void) { unregister_pernet_subsys(&synproxy_net_ops); } module_init(synproxy_core_init); module_exit(synproxy_core_exit); static struct iphdr * synproxy_build_ip(struct net *net, struct sk_buff *skb, __be32 saddr, __be32 daddr) { struct iphdr *iph; skb_reset_network_header(skb); iph = skb_put(skb, sizeof(*iph)); iph->version = 4; iph->ihl = sizeof(*iph) / 4; iph->tos = 0; iph->id = 0; iph->frag_off = htons(IP_DF); iph->ttl = READ_ONCE(net->ipv4.sysctl_ip_default_ttl); iph->protocol = IPPROTO_TCP; iph->check = 0; iph->saddr = saddr; iph->daddr = daddr; return iph; } static void synproxy_send_tcp(struct net *net, const struct sk_buff *skb, struct sk_buff *nskb, struct nf_conntrack *nfct, enum ip_conntrack_info ctinfo, struct iphdr *niph, struct tcphdr *nth, unsigned int tcp_hdr_size) { nth->check = ~tcp_v4_check(tcp_hdr_size, niph->saddr, niph->daddr, 0); nskb->ip_summed = CHECKSUM_PARTIAL; nskb->csum_start = (unsigned char *)nth - nskb->head; nskb->csum_offset = offsetof(struct tcphdr, check); skb_dst_set_noref(nskb, skb_dst(skb)); nskb->protocol = htons(ETH_P_IP); if (ip_route_me_harder(net, nskb->sk, nskb, RTN_UNSPEC)) goto free_nskb; if (nfct) { nf_ct_set(nskb, (struct nf_conn *)nfct, ctinfo); nf_conntrack_get(nfct); } ip_local_out(net, nskb->sk, nskb); return; free_nskb: kfree_skb(nskb); } void synproxy_send_client_synack(struct net *net, const struct sk_buff *skb, const struct tcphdr *th, const struct synproxy_options *opts) { struct sk_buff *nskb; struct iphdr *iph, *niph; struct tcphdr *nth; unsigned int tcp_hdr_size; u16 mss = opts->mss_encode; iph = ip_hdr(skb); tcp_hdr_size = sizeof(*nth) + synproxy_options_size(opts); nskb = alloc_skb(sizeof(*niph) + tcp_hdr_size + MAX_TCP_HEADER, GFP_ATOMIC); if (!nskb) return; skb_reserve(nskb, MAX_TCP_HEADER); niph = synproxy_build_ip(net, nskb, iph->daddr, iph->saddr); skb_reset_transport_header(nskb); nth = skb_put(nskb, tcp_hdr_size); nth->source = th->dest; nth->dest = th->source; nth->seq = htonl(__cookie_v4_init_sequence(iph, th, &mss)); nth->ack_seq = htonl(ntohl(th->seq) + 1); tcp_flag_word(nth) = TCP_FLAG_SYN | TCP_FLAG_ACK; if (opts->options & NF_SYNPROXY_OPT_ECN) tcp_flag_word(nth) |= TCP_FLAG_ECE; nth->doff = tcp_hdr_size / 4; nth->window = 0; nth->check = 0; nth->urg_ptr = 0; synproxy_build_options(nth, opts); synproxy_send_tcp(net, skb, nskb, skb_nfct(skb), IP_CT_ESTABLISHED_REPLY, niph, nth, tcp_hdr_size); } EXPORT_SYMBOL_GPL(synproxy_send_client_synack); static void synproxy_send_server_syn(struct net *net, const struct sk_buff *skb, const struct tcphdr *th, const struct synproxy_options *opts, u32 recv_seq) { struct synproxy_net *snet = synproxy_pernet(net); struct sk_buff *nskb; struct iphdr *iph, *niph; struct tcphdr *nth; unsigned int tcp_hdr_size; iph = ip_hdr(skb); tcp_hdr_size = sizeof(*nth) + synproxy_options_size(opts); nskb = alloc_skb(sizeof(*niph) + tcp_hdr_size + MAX_TCP_HEADER, GFP_ATOMIC); if (!nskb) return; skb_reserve(nskb, MAX_TCP_HEADER); niph = synproxy_build_ip(net, nskb, iph->saddr, iph->daddr); skb_reset_transport_header(nskb); nth = skb_put(nskb, tcp_hdr_size); nth->source = th->source; nth->dest = th->dest; nth->seq = htonl(recv_seq - 1); /* ack_seq is used to relay our ISN to the synproxy hook to initialize * sequence number translation once a connection tracking entry exists. */ nth->ack_seq = htonl(ntohl(th->ack_seq) - 1); tcp_flag_word(nth) = TCP_FLAG_SYN; if (opts->options & NF_SYNPROXY_OPT_ECN) tcp_flag_word(nth) |= TCP_FLAG_ECE | TCP_FLAG_CWR; nth->doff = tcp_hdr_size / 4; nth->window = th->window; nth->check = 0; nth->urg_ptr = 0; synproxy_build_options(nth, opts); synproxy_send_tcp(net, skb, nskb, &snet->tmpl->ct_general, IP_CT_NEW, niph, nth, tcp_hdr_size); } static void synproxy_send_server_ack(struct net *net, const struct ip_ct_tcp *state, const struct sk_buff *skb, const struct tcphdr *th, const struct synproxy_options *opts) { struct sk_buff *nskb; struct iphdr *iph, *niph; struct tcphdr *nth; unsigned int tcp_hdr_size; iph = ip_hdr(skb); tcp_hdr_size = sizeof(*nth) + synproxy_options_size(opts); nskb = alloc_skb(sizeof(*niph) + tcp_hdr_size + MAX_TCP_HEADER, GFP_ATOMIC); if (!nskb) return; skb_reserve(nskb, MAX_TCP_HEADER); niph = synproxy_build_ip(net, nskb, iph->daddr, iph->saddr); skb_reset_transport_header(nskb); nth = skb_put(nskb, tcp_hdr_size); nth->source = th->dest; nth->dest = th->source; nth->seq = htonl(ntohl(th->ack_seq)); nth->ack_seq = htonl(ntohl(th->seq) + 1); tcp_flag_word(nth) = TCP_FLAG_ACK; nth->doff = tcp_hdr_size / 4; nth->window = htons(state->seen[IP_CT_DIR_ORIGINAL].td_maxwin); nth->check = 0; nth->urg_ptr = 0; synproxy_build_options(nth, opts); synproxy_send_tcp(net, skb, nskb, NULL, 0, niph, nth, tcp_hdr_size); } static void synproxy_send_client_ack(struct net *net, const struct sk_buff *skb, const struct tcphdr *th, const struct synproxy_options *opts) { struct sk_buff *nskb; struct iphdr *iph, *niph; struct tcphdr *nth; unsigned int tcp_hdr_size; iph = ip_hdr(skb); tcp_hdr_size = sizeof(*nth) + synproxy_options_size(opts); nskb = alloc_skb(sizeof(*niph) + tcp_hdr_size + MAX_TCP_HEADER, GFP_ATOMIC); if (!nskb) return; skb_reserve(nskb, MAX_TCP_HEADER); niph = synproxy_build_ip(net, nskb, iph->saddr, iph->daddr); skb_reset_transport_header(nskb); nth = skb_put(nskb, tcp_hdr_size); nth->source = th->source; nth->dest = th->dest; nth->seq = htonl(ntohl(th->seq) + 1); nth->ack_seq = th->ack_seq; tcp_flag_word(nth) = TCP_FLAG_ACK; nth->doff = tcp_hdr_size / 4; nth->window = htons(ntohs(th->window) >> opts->wscale); nth->check = 0; nth->urg_ptr = 0; synproxy_build_options(nth, opts); synproxy_send_tcp(net, skb, nskb, skb_nfct(skb), IP_CT_ESTABLISHED_REPLY, niph, nth, tcp_hdr_size); } bool synproxy_recv_client_ack(struct net *net, const struct sk_buff *skb, const struct tcphdr *th, struct synproxy_options *opts, u32 recv_seq) { struct synproxy_net *snet = synproxy_pernet(net); int mss; mss = __cookie_v4_check(ip_hdr(skb), th); if (mss == 0) { this_cpu_inc(snet->stats->cookie_invalid); return false; } this_cpu_inc(snet->stats->cookie_valid); opts->mss_option = mss; opts->options |= NF_SYNPROXY_OPT_MSS; if (opts->options & NF_SYNPROXY_OPT_TIMESTAMP) synproxy_check_timestamp_cookie(opts); synproxy_send_server_syn(net, skb, th, opts, recv_seq); return true; } EXPORT_SYMBOL_GPL(synproxy_recv_client_ack); unsigned int ipv4_synproxy_hook(void *priv, struct sk_buff *skb, const struct nf_hook_state *nhs) { struct net *net = nhs->net; struct synproxy_net *snet = synproxy_pernet(net); enum ip_conntrack_info ctinfo; struct nf_conn *ct; struct nf_conn_synproxy *synproxy; struct synproxy_options opts = {}; const struct ip_ct_tcp *state; struct tcphdr *th, _th; unsigned int thoff; ct = nf_ct_get(skb, &ctinfo); if (!ct) return NF_ACCEPT; synproxy = nfct_synproxy(ct); if (!synproxy) return NF_ACCEPT; if (nf_is_loopback_packet(skb) || ip_hdr(skb)->protocol != IPPROTO_TCP) return NF_ACCEPT; thoff = ip_hdrlen(skb); th = skb_header_pointer(skb, thoff, sizeof(_th), &_th); if (!th) return NF_DROP; state = &ct->proto.tcp; switch (state->state) { case TCP_CONNTRACK_CLOSE: if (th->rst && CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL) { nf_ct_seqadj_init(ct, ctinfo, synproxy->isn - ntohl(th->seq) + 1); break; } if (!th->syn || th->ack || CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL) break; /* Reopened connection - reset the sequence number and timestamp * adjustments, they will get initialized once the connection is * reestablished. */ nf_ct_seqadj_init(ct, ctinfo, 0); synproxy->tsoff = 0; this_cpu_inc(snet->stats->conn_reopened); fallthrough; case TCP_CONNTRACK_SYN_SENT: if (!synproxy_parse_options(skb, thoff, th, &opts)) return NF_DROP; if (!th->syn && th->ack && CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) { /* Keep-Alives are sent with SEG.SEQ = SND.NXT-1, * therefore we need to add 1 to make the SYN sequence * number match the one of first SYN. */ if (synproxy_recv_client_ack(net, skb, th, &opts, ntohl(th->seq) + 1)) { this_cpu_inc(snet->stats->cookie_retrans); consume_skb(skb); return NF_STOLEN; } else { return NF_DROP; } } synproxy->isn = ntohl(th->ack_seq); if (opts.options & NF_SYNPROXY_OPT_TIMESTAMP) synproxy->its = opts.tsecr; nf_conntrack_event_cache(IPCT_SYNPROXY, ct); break; case TCP_CONNTRACK_SYN_RECV: if (!th->syn || !th->ack) break; if (!synproxy_parse_options(skb, thoff, th, &opts)) return NF_DROP; if (opts.options & NF_SYNPROXY_OPT_TIMESTAMP) { synproxy->tsoff = opts.tsval - synproxy->its; nf_conntrack_event_cache(IPCT_SYNPROXY, ct); } opts.options &= ~(NF_SYNPROXY_OPT_MSS | NF_SYNPROXY_OPT_WSCALE | NF_SYNPROXY_OPT_SACK_PERM); swap(opts.tsval, opts.tsecr); synproxy_send_server_ack(net, state, skb, th, &opts); nf_ct_seqadj_init(ct, ctinfo, synproxy->isn - ntohl(th->seq)); nf_conntrack_event_cache(IPCT_SEQADJ, ct); swap(opts.tsval, opts.tsecr); synproxy_send_client_ack(net, skb, th, &opts); consume_skb(skb); return NF_STOLEN; default: break; } synproxy_tstamp_adjust(skb, thoff, th, ct, ctinfo, synproxy); return NF_ACCEPT; } EXPORT_SYMBOL_GPL(ipv4_synproxy_hook); static const struct nf_hook_ops ipv4_synproxy_ops[] = { { .hook = ipv4_synproxy_hook, .pf = NFPROTO_IPV4, .hooknum = NF_INET_LOCAL_IN, .priority = NF_IP_PRI_CONNTRACK_CONFIRM - 1, }, { .hook = ipv4_synproxy_hook, .pf = NFPROTO_IPV4, .hooknum = NF_INET_POST_ROUTING, .priority = NF_IP_PRI_CONNTRACK_CONFIRM - 1, }, }; int nf_synproxy_ipv4_init(struct synproxy_net *snet, struct net *net) { int err; if (snet->hook_ref4 == 0) { err = nf_register_net_hooks(net, ipv4_synproxy_ops, ARRAY_SIZE(ipv4_synproxy_ops)); if (err) return err; } snet->hook_ref4++; return 0; } EXPORT_SYMBOL_GPL(nf_synproxy_ipv4_init); void nf_synproxy_ipv4_fini(struct synproxy_net *snet, struct net *net) { snet->hook_ref4--; if (snet->hook_ref4 == 0) nf_unregister_net_hooks(net, ipv4_synproxy_ops, ARRAY_SIZE(ipv4_synproxy_ops)); } EXPORT_SYMBOL_GPL(nf_synproxy_ipv4_fini); #if IS_ENABLED(CONFIG_IPV6) static struct ipv6hdr * synproxy_build_ip_ipv6(struct net *net, struct sk_buff *skb, const struct in6_addr *saddr, const struct in6_addr *daddr) { struct ipv6hdr *iph; skb_reset_network_header(skb); iph = skb_put(skb, sizeof(*iph)); ip6_flow_hdr(iph, 0, 0); iph->hop_limit = READ_ONCE(net->ipv6.devconf_all->hop_limit); iph->nexthdr = IPPROTO_TCP; iph->saddr = *saddr; iph->daddr = *daddr; return iph; } static void synproxy_send_tcp_ipv6(struct net *net, const struct sk_buff *skb, struct sk_buff *nskb, struct nf_conntrack *nfct, enum ip_conntrack_info ctinfo, struct ipv6hdr *niph, struct tcphdr *nth, unsigned int tcp_hdr_size) { struct dst_entry *dst; struct flowi6 fl6; int err; nth->check = ~tcp_v6_check(tcp_hdr_size, &niph->saddr, &niph->daddr, 0); nskb->ip_summed = CHECKSUM_PARTIAL; nskb->csum_start = (unsigned char *)nth - nskb->head; nskb->csum_offset = offsetof(struct tcphdr, check); memset(&fl6, 0, sizeof(fl6)); fl6.flowi6_proto = IPPROTO_TCP; fl6.saddr = niph->saddr; fl6.daddr = niph->daddr; fl6.fl6_sport = nth->source; fl6.fl6_dport = nth->dest; security_skb_classify_flow((struct sk_buff *)skb, flowi6_to_flowi_common(&fl6)); err = nf_ip6_route(net, &dst, flowi6_to_flowi(&fl6), false); if (err) { goto free_nskb; } dst = xfrm_lookup(net, dst, flowi6_to_flowi(&fl6), NULL, 0); if (IS_ERR(dst)) goto free_nskb; skb_dst_set(nskb, dst); if (nfct) { nf_ct_set(nskb, (struct nf_conn *)nfct, ctinfo); nf_conntrack_get(nfct); } ip6_local_out(net, nskb->sk, nskb); return; free_nskb: kfree_skb(nskb); } void synproxy_send_client_synack_ipv6(struct net *net, const struct sk_buff *skb, const struct tcphdr *th, const struct synproxy_options *opts) { struct sk_buff *nskb; struct ipv6hdr *iph, *niph; struct tcphdr *nth; unsigned int tcp_hdr_size; u16 mss = opts->mss_encode; iph = ipv6_hdr(skb); tcp_hdr_size = sizeof(*nth) + synproxy_options_size(opts); nskb = alloc_skb(sizeof(*niph) + tcp_hdr_size + MAX_TCP_HEADER, GFP_ATOMIC); if (!nskb) return; skb_reserve(nskb, MAX_TCP_HEADER); niph = synproxy_build_ip_ipv6(net, nskb, &iph->daddr, &iph->saddr); skb_reset_transport_header(nskb); nth = skb_put(nskb, tcp_hdr_size); nth->source = th->dest; nth->dest = th->source; nth->seq = htonl(nf_ipv6_cookie_init_sequence(iph, th, &mss)); nth->ack_seq = htonl(ntohl(th->seq) + 1); tcp_flag_word(nth) = TCP_FLAG_SYN | TCP_FLAG_ACK; if (opts->options & NF_SYNPROXY_OPT_ECN) tcp_flag_word(nth) |= TCP_FLAG_ECE; nth->doff = tcp_hdr_size / 4; nth->window = 0; nth->check = 0; nth->urg_ptr = 0; synproxy_build_options(nth, opts); synproxy_send_tcp_ipv6(net, skb, nskb, skb_nfct(skb), IP_CT_ESTABLISHED_REPLY, niph, nth, tcp_hdr_size); } EXPORT_SYMBOL_GPL(synproxy_send_client_synack_ipv6); static void synproxy_send_server_syn_ipv6(struct net *net, const struct sk_buff *skb, const struct tcphdr *th, const struct synproxy_options *opts, u32 recv_seq) { struct synproxy_net *snet = synproxy_pernet(net); struct sk_buff *nskb; struct ipv6hdr *iph, *niph; struct tcphdr *nth; unsigned int tcp_hdr_size; iph = ipv6_hdr(skb); tcp_hdr_size = sizeof(*nth) + synproxy_options_size(opts); nskb = alloc_skb(sizeof(*niph) + tcp_hdr_size + MAX_TCP_HEADER, GFP_ATOMIC); if (!nskb) return; skb_reserve(nskb, MAX_TCP_HEADER); niph = synproxy_build_ip_ipv6(net, nskb, &iph->saddr, &iph->daddr); skb_reset_transport_header(nskb); nth = skb_put(nskb, tcp_hdr_size); nth->source = th->source; nth->dest = th->dest; nth->seq = htonl(recv_seq - 1); /* ack_seq is used to relay our ISN to the synproxy hook to initialize * sequence number translation once a connection tracking entry exists. */ nth->ack_seq = htonl(ntohl(th->ack_seq) - 1); tcp_flag_word(nth) = TCP_FLAG_SYN; if (opts->options & NF_SYNPROXY_OPT_ECN) tcp_flag_word(nth) |= TCP_FLAG_ECE | TCP_FLAG_CWR; nth->doff = tcp_hdr_size / 4; nth->window = th->window; nth->check = 0; nth->urg_ptr = 0; synproxy_build_options(nth, opts); synproxy_send_tcp_ipv6(net, skb, nskb, &snet->tmpl->ct_general, IP_CT_NEW, niph, nth, tcp_hdr_size); } static void synproxy_send_server_ack_ipv6(struct net *net, const struct ip_ct_tcp *state, const struct sk_buff *skb, const struct tcphdr *th, const struct synproxy_options *opts) { struct sk_buff *nskb; struct ipv6hdr *iph, *niph; struct tcphdr *nth; unsigned int tcp_hdr_size; iph = ipv6_hdr(skb); tcp_hdr_size = sizeof(*nth) + synproxy_options_size(opts); nskb = alloc_skb(sizeof(*niph) + tcp_hdr_size + MAX_TCP_HEADER, GFP_ATOMIC); if (!nskb) return; skb_reserve(nskb, MAX_TCP_HEADER); niph = synproxy_build_ip_ipv6(net, nskb, &iph->daddr, &iph->saddr); skb_reset_transport_header(nskb); nth = skb_put(nskb, tcp_hdr_size); nth->source = th->dest; nth->dest = th->source; nth->seq = htonl(ntohl(th->ack_seq)); nth->ack_seq = htonl(ntohl(th->seq) + 1); tcp_flag_word(nth) = TCP_FLAG_ACK; nth->doff = tcp_hdr_size / 4; nth->window = htons(state->seen[IP_CT_DIR_ORIGINAL].td_maxwin); nth->check = 0; nth->urg_ptr = 0; synproxy_build_options(nth, opts); synproxy_send_tcp_ipv6(net, skb, nskb, NULL, 0, niph, nth, tcp_hdr_size); } static void synproxy_send_client_ack_ipv6(struct net *net, const struct sk_buff *skb, const struct tcphdr *th, const struct synproxy_options *opts) { struct sk_buff *nskb; struct ipv6hdr *iph, *niph; struct tcphdr *nth; unsigned int tcp_hdr_size; iph = ipv6_hdr(skb); tcp_hdr_size = sizeof(*nth) + synproxy_options_size(opts); nskb = alloc_skb(sizeof(*niph) + tcp_hdr_size + MAX_TCP_HEADER, GFP_ATOMIC); if (!nskb) return; skb_reserve(nskb, MAX_TCP_HEADER); niph = synproxy_build_ip_ipv6(net, nskb, &iph->saddr, &iph->daddr); skb_reset_transport_header(nskb); nth = skb_put(nskb, tcp_hdr_size); nth->source = th->source; nth->dest = th->dest; nth->seq = htonl(ntohl(th->seq) + 1); nth->ack_seq = th->ack_seq; tcp_flag_word(nth) = TCP_FLAG_ACK; nth->doff = tcp_hdr_size / 4; nth->window = htons(ntohs(th->window) >> opts->wscale); nth->check = 0; nth->urg_ptr = 0; synproxy_build_options(nth, opts); synproxy_send_tcp_ipv6(net, skb, nskb, skb_nfct(skb), IP_CT_ESTABLISHED_REPLY, niph, nth, tcp_hdr_size); } bool synproxy_recv_client_ack_ipv6(struct net *net, const struct sk_buff *skb, const struct tcphdr *th, struct synproxy_options *opts, u32 recv_seq) { struct synproxy_net *snet = synproxy_pernet(net); int mss; mss = nf_cookie_v6_check(ipv6_hdr(skb), th); if (mss == 0) { this_cpu_inc(snet->stats->cookie_invalid); return false; } this_cpu_inc(snet->stats->cookie_valid); opts->mss_option = mss; opts->options |= NF_SYNPROXY_OPT_MSS; if (opts->options & NF_SYNPROXY_OPT_TIMESTAMP) synproxy_check_timestamp_cookie(opts); synproxy_send_server_syn_ipv6(net, skb, th, opts, recv_seq); return true; } EXPORT_SYMBOL_GPL(synproxy_recv_client_ack_ipv6); unsigned int ipv6_synproxy_hook(void *priv, struct sk_buff *skb, const struct nf_hook_state *nhs) { struct net *net = nhs->net; struct synproxy_net *snet = synproxy_pernet(net); enum ip_conntrack_info ctinfo; struct nf_conn *ct; struct nf_conn_synproxy *synproxy; struct synproxy_options opts = {}; const struct ip_ct_tcp *state; struct tcphdr *th, _th; __be16 frag_off; u8 nexthdr; int thoff; ct = nf_ct_get(skb, &ctinfo); if (!ct) return NF_ACCEPT; synproxy = nfct_synproxy(ct); if (!synproxy) return NF_ACCEPT; if (nf_is_loopback_packet(skb)) return NF_ACCEPT; nexthdr = ipv6_hdr(skb)->nexthdr; thoff = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), &nexthdr, &frag_off); if (thoff < 0 || nexthdr != IPPROTO_TCP) return NF_ACCEPT; th = skb_header_pointer(skb, thoff, sizeof(_th), &_th); if (!th) return NF_DROP; state = &ct->proto.tcp; switch (state->state) { case TCP_CONNTRACK_CLOSE: if (th->rst && CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL) { nf_ct_seqadj_init(ct, ctinfo, synproxy->isn - ntohl(th->seq) + 1); break; } if (!th->syn || th->ack || CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL) break; /* Reopened connection - reset the sequence number and timestamp * adjustments, they will get initialized once the connection is * reestablished. */ nf_ct_seqadj_init(ct, ctinfo, 0); synproxy->tsoff = 0; this_cpu_inc(snet->stats->conn_reopened); fallthrough; case TCP_CONNTRACK_SYN_SENT: if (!synproxy_parse_options(skb, thoff, th, &opts)) return NF_DROP; if (!th->syn && th->ack && CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) { /* Keep-Alives are sent with SEG.SEQ = SND.NXT-1, * therefore we need to add 1 to make the SYN sequence * number match the one of first SYN. */ if (synproxy_recv_client_ack_ipv6(net, skb, th, &opts, ntohl(th->seq) + 1)) { this_cpu_inc(snet->stats->cookie_retrans); consume_skb(skb); return NF_STOLEN; } else { return NF_DROP; } } synproxy->isn = ntohl(th->ack_seq); if (opts.options & NF_SYNPROXY_OPT_TIMESTAMP) synproxy->its = opts.tsecr; nf_conntrack_event_cache(IPCT_SYNPROXY, ct); break; case TCP_CONNTRACK_SYN_RECV: if (!th->syn || !th->ack) break; if (!synproxy_parse_options(skb, thoff, th, &opts)) return NF_DROP; if (opts.options & NF_SYNPROXY_OPT_TIMESTAMP) { synproxy->tsoff = opts.tsval - synproxy->its; nf_conntrack_event_cache(IPCT_SYNPROXY, ct); } opts.options &= ~(NF_SYNPROXY_OPT_MSS | NF_SYNPROXY_OPT_WSCALE | NF_SYNPROXY_OPT_SACK_PERM); swap(opts.tsval, opts.tsecr); synproxy_send_server_ack_ipv6(net, state, skb, th, &opts); nf_ct_seqadj_init(ct, ctinfo, synproxy->isn - ntohl(th->seq)); nf_conntrack_event_cache(IPCT_SEQADJ, ct); swap(opts.tsval, opts.tsecr); synproxy_send_client_ack_ipv6(net, skb, th, &opts); consume_skb(skb); return NF_STOLEN; default: break; } synproxy_tstamp_adjust(skb, thoff, th, ct, ctinfo, synproxy); return NF_ACCEPT; } EXPORT_SYMBOL_GPL(ipv6_synproxy_hook); static const struct nf_hook_ops ipv6_synproxy_ops[] = { { .hook = ipv6_synproxy_hook, .pf = NFPROTO_IPV6, .hooknum = NF_INET_LOCAL_IN, .priority = NF_IP_PRI_CONNTRACK_CONFIRM - 1, }, { .hook = ipv6_synproxy_hook, .pf = NFPROTO_IPV6, .hooknum = NF_INET_POST_ROUTING, .priority = NF_IP_PRI_CONNTRACK_CONFIRM - 1, }, }; int nf_synproxy_ipv6_init(struct synproxy_net *snet, struct net *net) { int err; if (snet->hook_ref6 == 0) { err = nf_register_net_hooks(net, ipv6_synproxy_ops, ARRAY_SIZE(ipv6_synproxy_ops)); if (err) return err; } snet->hook_ref6++; return 0; } EXPORT_SYMBOL_GPL(nf_synproxy_ipv6_init); void nf_synproxy_ipv6_fini(struct synproxy_net *snet, struct net *net) { snet->hook_ref6--; if (snet->hook_ref6 == 0) nf_unregister_net_hooks(net, ipv6_synproxy_ops, ARRAY_SIZE(ipv6_synproxy_ops)); } EXPORT_SYMBOL_GPL(nf_synproxy_ipv6_fini); #endif /* CONFIG_IPV6 */ MODULE_LICENSE("GPL"); MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>"); MODULE_DESCRIPTION("nftables SYNPROXY expression support"); |
| 61 61 13 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 | /* * Copyright (c) 2004 Topspin Communications. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU * General Public License (GPL) Version 2, available from the file * COPYING in the main directory of this source tree, or the * OpenIB.org BSD license below: * * Redistribution and use in source and binary forms, with or * without modification, are permitted provided that the following * conditions are met: * * - Redistributions of source code must retain the above * copyright notice, this list of conditions and the following * disclaimer. * * - Redistributions in binary form must reproduce the above * copyright notice, this list of conditions and the following * disclaimer in the documentation and/or other materials * provided with the distribution. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. */ #ifndef _CORE_PRIV_H #define _CORE_PRIV_H #include <linux/list.h> #include <linux/spinlock.h> #include <linux/cgroup_rdma.h> #include <net/net_namespace.h> #include <net/netns/generic.h> #include <rdma/ib_verbs.h> #include <rdma/opa_addr.h> #include <rdma/ib_mad.h> #include <rdma/restrack.h> #include "mad_priv.h" #include "restrack.h" /* Total number of ports combined across all struct ib_devices's */ #define RDMA_MAX_PORTS 8192 struct pkey_index_qp_list { struct list_head pkey_index_list; u16 pkey_index; /* Lock to hold while iterating the qp_list. */ spinlock_t qp_list_lock; struct list_head qp_list; }; /** * struct rdma_dev_net - rdma net namespace metadata for a net * @nl_sock: Pointer to netlink socket * @net: Pointer to owner net namespace * @id: xarray id to identify the net namespace. */ struct rdma_dev_net { struct sock *nl_sock; possible_net_t net; u32 id; }; extern const struct attribute_group ib_dev_attr_group; extern bool ib_devices_shared_netns; extern unsigned int rdma_dev_net_id; static inline struct rdma_dev_net *rdma_net_to_dev_net(struct net *net) { return net_generic(net, rdma_dev_net_id); } int ib_device_rename(struct ib_device *ibdev, const char *name); int ib_device_set_dim(struct ib_device *ibdev, u8 use_dim); typedef void (*roce_netdev_callback)(struct ib_device *device, u32 port, struct net_device *idev, void *cookie); typedef bool (*roce_netdev_filter)(struct ib_device *device, u32 port, struct net_device *idev, void *cookie); struct net_device *ib_device_get_netdev(struct ib_device *ib_dev, u32 port); void ib_enum_roce_netdev(struct ib_device *ib_dev, roce_netdev_filter filter, void *filter_cookie, roce_netdev_callback cb, void *cookie); void ib_enum_all_roce_netdevs(roce_netdev_filter filter, void *filter_cookie, roce_netdev_callback cb, void *cookie); typedef int (*nldev_callback)(struct ib_device *device, struct sk_buff *skb, struct netlink_callback *cb, unsigned int idx); int ib_enum_all_devs(nldev_callback nldev_cb, struct sk_buff *skb, struct netlink_callback *cb); struct ib_client_nl_info { struct sk_buff *nl_msg; struct device *cdev; u32 port; u64 abi; }; int ib_get_client_nl_info(struct ib_device *ibdev, const char *client_name, struct ib_client_nl_info *res); enum ib_cache_gid_default_mode { IB_CACHE_GID_DEFAULT_MODE_SET, IB_CACHE_GID_DEFAULT_MODE_DELETE }; int ib_cache_gid_parse_type_str(const char *buf); const char *ib_cache_gid_type_str(enum ib_gid_type gid_type); void ib_cache_gid_set_default_gid(struct ib_device *ib_dev, u32 port, struct net_device *ndev, unsigned long gid_type_mask, enum ib_cache_gid_default_mode mode); int ib_cache_gid_add(struct ib_device *ib_dev, u32 port, union ib_gid *gid, struct ib_gid_attr *attr); int ib_cache_gid_del(struct ib_device *ib_dev, u32 port, union ib_gid *gid, struct ib_gid_attr *attr); int ib_cache_gid_del_all_netdev_gids(struct ib_device *ib_dev, u32 port, struct net_device *ndev); int roce_gid_mgmt_init(void); void roce_gid_mgmt_cleanup(void); unsigned long roce_gid_type_mask_support(struct ib_device *ib_dev, u32 port); int ib_cache_setup_one(struct ib_device *device); void ib_cache_cleanup_one(struct ib_device *device); void ib_cache_release_one(struct ib_device *device); void ib_dispatch_event_clients(struct ib_event *event); #ifdef CONFIG_CGROUP_RDMA void ib_device_register_rdmacg(struct ib_device *device); void ib_device_unregister_rdmacg(struct ib_device *device); int ib_rdmacg_try_charge(struct ib_rdmacg_object *cg_obj, struct ib_device *device, enum rdmacg_resource_type resource_index); void ib_rdmacg_uncharge(struct ib_rdmacg_object *cg_obj, struct ib_device *device, enum rdmacg_resource_type resource_index); #else static inline void ib_device_register_rdmacg(struct ib_device *device) { } static inline void ib_device_unregister_rdmacg(struct ib_device *device) { } static inline int ib_rdmacg_try_charge(struct ib_rdmacg_object *cg_obj, struct ib_device *device, enum rdmacg_resource_type resource_index) { return 0; } static inline void ib_rdmacg_uncharge(struct ib_rdmacg_object *cg_obj, struct ib_device *device, enum rdmacg_resource_type resource_index) { } #endif static inline bool rdma_is_upper_dev_rcu(struct net_device *dev, struct net_device *upper) { return netdev_has_upper_dev_all_rcu(dev, upper); } int addr_init(void); void addr_cleanup(void); int ib_mad_init(void); void ib_mad_cleanup(void); int ib_sa_init(void); void ib_sa_cleanup(void); void rdma_nl_init(void); void rdma_nl_exit(void); int ib_nl_handle_resolve_resp(struct sk_buff *skb, struct nlmsghdr *nlh, struct netlink_ext_ack *extack); int ib_nl_handle_set_timeout(struct sk_buff *skb, struct nlmsghdr *nlh, struct netlink_ext_ack *extack); int ib_nl_handle_ip_res_resp(struct sk_buff *skb, struct nlmsghdr *nlh, struct netlink_ext_ack *extack); void ib_get_cached_subnet_prefix(struct ib_device *device, u32 port_num, u64 *sn_pfx); #ifdef CONFIG_SECURITY_INFINIBAND void ib_security_release_port_pkey_list(struct ib_device *device); void ib_security_cache_change(struct ib_device *device, u32 port_num, u64 subnet_prefix); int ib_security_modify_qp(struct ib_qp *qp, struct ib_qp_attr *qp_attr, int qp_attr_mask, struct ib_udata *udata); int ib_create_qp_security(struct ib_qp *qp, struct ib_device *dev); void ib_destroy_qp_security_begin(struct ib_qp_security *sec); void ib_destroy_qp_security_abort(struct ib_qp_security *sec); void ib_destroy_qp_security_end(struct ib_qp_security *sec); int ib_open_shared_qp_security(struct ib_qp *qp, struct ib_device *dev); void ib_close_shared_qp_security(struct ib_qp_security *sec); int ib_mad_agent_security_setup(struct ib_mad_agent *agent, enum ib_qp_type qp_type); void ib_mad_agent_security_cleanup(struct ib_mad_agent *agent); int ib_mad_enforce_security(struct ib_mad_agent_private *map, u16 pkey_index); void ib_mad_agent_security_change(void); #else static inline void ib_security_release_port_pkey_list(struct ib_device *device) { } static inline void ib_security_cache_change(struct ib_device *device, u32 port_num, u64 subnet_prefix) { } static inline int ib_security_modify_qp(struct ib_qp *qp, struct ib_qp_attr *qp_attr, int qp_attr_mask, struct ib_udata *udata) { return qp->device->ops.modify_qp(qp->real_qp, qp_attr, qp_attr_mask, udata); } static inline int ib_create_qp_security(struct ib_qp *qp, struct ib_device *dev) { return 0; } static inline void ib_destroy_qp_security_begin(struct ib_qp_security *sec) { } static inline void ib_destroy_qp_security_abort(struct ib_qp_security *sec) { } static inline void ib_destroy_qp_security_end(struct ib_qp_security *sec) { } static inline int ib_open_shared_qp_security(struct ib_qp *qp, struct ib_device *dev) { return 0; } static inline void ib_close_shared_qp_security(struct ib_qp_security *sec) { } static inline int ib_mad_agent_security_setup(struct ib_mad_agent *agent, enum ib_qp_type qp_type) { return 0; } static inline void ib_mad_agent_security_cleanup(struct ib_mad_agent *agent) { } static inline int ib_mad_enforce_security(struct ib_mad_agent_private *map, u16 pkey_index) { return 0; } static inline void ib_mad_agent_security_change(void) { } #endif struct ib_device *ib_device_get_by_index(const struct net *net, u32 index); /* RDMA device netlink */ void nldev_init(void); void nldev_exit(void); struct ib_qp *ib_create_qp_user(struct ib_device *dev, struct ib_pd *pd, struct ib_qp_init_attr *attr, struct ib_udata *udata, struct ib_uqp_object *uobj, const char *caller); void ib_qp_usecnt_inc(struct ib_qp *qp); void ib_qp_usecnt_dec(struct ib_qp *qp); struct rdma_dev_addr; int rdma_addr_find_l2_eth_by_grh(const union ib_gid *sgid, const union ib_gid *dgid, u8 *dmac, const struct ib_gid_attr *sgid_attr, int *hoplimit); void rdma_copy_src_l2_addr(struct rdma_dev_addr *dev_addr, const struct net_device *dev); struct sa_path_rec; int roce_resolve_route_from_path(struct sa_path_rec *rec, const struct ib_gid_attr *attr); struct net_device *rdma_read_gid_attr_ndev_rcu(const struct ib_gid_attr *attr); void ib_free_port_attrs(struct ib_core_device *coredev); int ib_setup_port_attrs(struct ib_core_device *coredev); struct rdma_hw_stats *ib_get_hw_stats_port(struct ib_device *ibdev, u32 port_num); void ib_device_release_hw_stats(struct hw_stats_device_data *data); int ib_setup_device_attrs(struct ib_device *ibdev); int rdma_compatdev_set(u8 enable); int ib_port_register_client_groups(struct ib_device *ibdev, u32 port_num, const struct attribute_group **groups); void ib_port_unregister_client_groups(struct ib_device *ibdev, u32 port_num, const struct attribute_group **groups); int ib_device_set_netns_put(struct sk_buff *skb, struct ib_device *dev, u32 ns_fd); int rdma_nl_net_init(struct rdma_dev_net *rnet); void rdma_nl_net_exit(struct rdma_dev_net *rnet); struct rdma_umap_priv { struct vm_area_struct *vma; struct list_head list; struct rdma_user_mmap_entry *entry; }; void rdma_umap_priv_init(struct rdma_umap_priv *priv, struct vm_area_struct *vma, struct rdma_user_mmap_entry *entry); void ib_cq_pool_cleanup(struct ib_device *dev); bool rdma_nl_get_privileged_qkey(void); #endif /* _CORE_PRIV_H */ |
| 544 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 | // SPDX-License-Identifier: GPL-2.0 /* * Dynamic byte queue limits. See include/linux/dynamic_queue_limits.h * * Copyright (c) 2011, Tom Herbert <therbert@google.com> */ #include <linux/types.h> #include <linux/kernel.h> #include <linux/jiffies.h> #include <linux/dynamic_queue_limits.h> #include <linux/compiler.h> #include <linux/export.h> #include <trace/events/napi.h> #define POSDIFF(A, B) ((int)((A) - (B)) > 0 ? (A) - (B) : 0) #define AFTER_EQ(A, B) ((int)((A) - (B)) >= 0) static void dql_check_stall(struct dql *dql, unsigned short stall_thrs) { unsigned long now; if (!stall_thrs) return; now = jiffies; /* Check for a potential stall */ if (time_after_eq(now, dql->last_reap + stall_thrs)) { unsigned long hist_head, t, start, end; /* We are trying to detect a period of at least @stall_thrs * jiffies without any Tx completions, but during first half * of which some Tx was posted. */ dqs_again: hist_head = READ_ONCE(dql->history_head); /* pairs with smp_wmb() in dql_queued() */ smp_rmb(); /* Get the previous entry in the ring buffer, which is the * oldest sample. */ start = (hist_head - DQL_HIST_LEN + 1) * BITS_PER_LONG; /* Advance start to continue from the last reap time */ if (time_before(start, dql->last_reap + 1)) start = dql->last_reap + 1; /* Newest sample we should have already seen a completion for */ end = hist_head * BITS_PER_LONG + (BITS_PER_LONG - 1); /* Shrink the search space to [start, (now - start_thrs/2)] if * `end` is beyond the stall zone */ if (time_before(now, end + stall_thrs / 2)) end = now - stall_thrs / 2; /* Search for the queued time in [t, end] */ for (t = start; time_before_eq(t, end); t++) if (test_bit(t % (DQL_HIST_LEN * BITS_PER_LONG), dql->history)) break; /* Variable t contains the time of the queue */ if (!time_before_eq(t, end)) goto no_stall; /* The ring buffer was modified in the meantime, retry */ if (hist_head != READ_ONCE(dql->history_head)) goto dqs_again; dql->stall_cnt++; dql->stall_max = max_t(unsigned short, dql->stall_max, now - t); trace_dql_stall_detected(dql->stall_thrs, now - t, dql->last_reap, dql->history_head, now, dql->history); } no_stall: dql->last_reap = now; } /* Records completed count and recalculates the queue limit */ void dql_completed(struct dql *dql, unsigned int count) { unsigned int inprogress, prev_inprogress, limit; unsigned int ovlimit, completed, num_queued; unsigned short stall_thrs; bool all_prev_completed; num_queued = READ_ONCE(dql->num_queued); /* Read stall_thrs in advance since it belongs to the same (first) * cache line as ->num_queued. This way, dql_check_stall() does not * need to touch the first cache line again later, reducing the window * of possible false sharing. */ stall_thrs = READ_ONCE(dql->stall_thrs); /* Can't complete more than what's in queue */ BUG_ON(count > num_queued - dql->num_completed); completed = dql->num_completed + count; limit = dql->limit; ovlimit = POSDIFF(num_queued - dql->num_completed, limit); inprogress = num_queued - completed; prev_inprogress = dql->prev_num_queued - dql->num_completed; all_prev_completed = AFTER_EQ(completed, dql->prev_num_queued); if ((ovlimit && !inprogress) || (dql->prev_ovlimit && all_prev_completed)) { /* * Queue considered starved if: * - The queue was over-limit in the last interval, * and there is no more data in the queue. * OR * - The queue was over-limit in the previous interval and * when enqueuing it was possible that all queued data * had been consumed. This covers the case when queue * may have becomes starved between completion processing * running and next time enqueue was scheduled. * * When queue is starved increase the limit by the amount * of bytes both sent and completed in the last interval, * plus any previous over-limit. */ limit += POSDIFF(completed, dql->prev_num_queued) + dql->prev_ovlimit; dql->slack_start_time = jiffies; dql->lowest_slack = UINT_MAX; } else if (inprogress && prev_inprogress && !all_prev_completed) { /* * Queue was not starved, check if the limit can be decreased. * A decrease is only considered if the queue has been busy in * the whole interval (the check above). * * If there is slack, the amount of excess data queued above * the amount needed to prevent starvation, the queue limit * can be decreased. To avoid hysteresis we consider the * minimum amount of slack found over several iterations of the * completion routine. */ unsigned int slack, slack_last_objs; /* * Slack is the maximum of * - The queue limit plus previous over-limit minus twice * the number of objects completed. Note that two times * number of completed bytes is a basis for an upper bound * of the limit. * - Portion of objects in the last queuing operation that * was not part of non-zero previous over-limit. That is * "round down" by non-overlimit portion of the last * queueing operation. */ slack = POSDIFF(limit + dql->prev_ovlimit, 2 * (completed - dql->num_completed)); slack_last_objs = dql->prev_ovlimit ? POSDIFF(dql->prev_last_obj_cnt, dql->prev_ovlimit) : 0; slack = max(slack, slack_last_objs); if (slack < dql->lowest_slack) dql->lowest_slack = slack; if (time_after(jiffies, dql->slack_start_time + dql->slack_hold_time)) { limit = POSDIFF(limit, dql->lowest_slack); dql->slack_start_time = jiffies; dql->lowest_slack = UINT_MAX; } } /* Enforce bounds on limit */ limit = clamp(limit, dql->min_limit, dql->max_limit); if (limit != dql->limit) { dql->limit = limit; ovlimit = 0; } dql->adj_limit = limit + completed; dql->prev_ovlimit = ovlimit; dql->prev_last_obj_cnt = READ_ONCE(dql->last_obj_cnt); dql->num_completed = completed; dql->prev_num_queued = num_queued; dql_check_stall(dql, stall_thrs); } EXPORT_SYMBOL(dql_completed); void dql_reset(struct dql *dql) { /* Reset all dynamic values */ dql->limit = dql->min_limit; dql->num_queued = 0; dql->num_completed = 0; dql->last_obj_cnt = 0; dql->prev_num_queued = 0; dql->prev_last_obj_cnt = 0; dql->prev_ovlimit = 0; dql->lowest_slack = UINT_MAX; dql->slack_start_time = jiffies; dql->last_reap = jiffies; dql->history_head = jiffies / BITS_PER_LONG; memset(dql->history, 0, sizeof(dql->history)); } EXPORT_SYMBOL(dql_reset); void dql_init(struct dql *dql, unsigned int hold_time) { dql->max_limit = DQL_MAX_LIMIT; dql->min_limit = 0; dql->slack_hold_time = hold_time; dql->stall_thrs = 0; dql_reset(dql); } EXPORT_SYMBOL(dql_init); |
| 566 114 666 94 47 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 | /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _LINUX_RANDOM_H #define _LINUX_RANDOM_H #include <linux/bug.h> #include <linux/kernel.h> #include <linux/list.h> #include <uapi/linux/random.h> struct notifier_block; void add_device_randomness(const void *buf, size_t len); void __init add_bootloader_randomness(const void *buf, size_t len); void add_input_randomness(unsigned int type, unsigned int code, unsigned int value) __latent_entropy; void add_interrupt_randomness(int irq) __latent_entropy; void add_hwgenerator_randomness(const void *buf, size_t len, size_t entropy, bool sleep_after); static inline void add_latent_entropy(void) { #if defined(LATENT_ENTROPY_PLUGIN) && !defined(__CHECKER__) add_device_randomness((const void *)&latent_entropy, sizeof(latent_entropy)); #else add_device_randomness(NULL, 0); #endif } #if IS_ENABLED(CONFIG_VMGENID) void add_vmfork_randomness(const void *unique_vm_id, size_t len); int register_random_vmfork_notifier(struct notifier_block *nb); int unregister_random_vmfork_notifier(struct notifier_block *nb); #else static inline int register_random_vmfork_notifier(struct notifier_block *nb) { return 0; } static inline int unregister_random_vmfork_notifier(struct notifier_block *nb) { return 0; } #endif void get_random_bytes(void *buf, size_t len); u8 get_random_u8(void); u16 get_random_u16(void); u32 get_random_u32(void); u64 get_random_u64(void); static inline unsigned long get_random_long(void) { #if BITS_PER_LONG == 64 return get_random_u64(); #else return get_random_u32(); #endif } u32 __get_random_u32_below(u32 ceil); /* * Returns a random integer in the interval [0, ceil), with uniform * distribution, suitable for all uses. Fastest when ceil is a constant, but * still fast for variable ceil as well. */ static inline u32 get_random_u32_below(u32 ceil) { if (!__builtin_constant_p(ceil)) return __get_random_u32_below(ceil); /* * For the fast path, below, all operations on ceil are precomputed by * the compiler, so this incurs no overhead for checking pow2, doing * divisions, or branching based on integer size. The resultant * algorithm does traditional reciprocal multiplication (typically * optimized by the compiler into shifts and adds), rejecting samples * whose lower half would indicate a range indivisible by ceil. */ BUILD_BUG_ON_MSG(!ceil, "get_random_u32_below() must take ceil > 0"); if (ceil <= 1) return 0; for (;;) { if (ceil <= 1U << 8) { u32 mult = ceil * get_random_u8(); if (likely(is_power_of_2(ceil) || (u8)mult >= (1U << 8) % ceil)) return mult >> 8; } else if (ceil <= 1U << 16) { u32 mult = ceil * get_random_u16(); if (likely(is_power_of_2(ceil) || (u16)mult >= (1U << 16) % ceil)) return mult >> 16; } else { u64 mult = (u64)ceil * get_random_u32(); if (likely(is_power_of_2(ceil) || (u32)mult >= -ceil % ceil)) return mult >> 32; } } } /* * Returns a random integer in the interval (floor, U32_MAX], with uniform * distribution, suitable for all uses. Fastest when floor is a constant, but * still fast for variable floor as well. */ static inline u32 get_random_u32_above(u32 floor) { BUILD_BUG_ON_MSG(__builtin_constant_p(floor) && floor == U32_MAX, "get_random_u32_above() must take floor < U32_MAX"); return floor + 1 + get_random_u32_below(U32_MAX - floor); } /* * Returns a random integer in the interval [floor, ceil], with uniform * distribution, suitable for all uses. Fastest when floor and ceil are * constant, but still fast for variable floor and ceil as well. */ static inline u32 get_random_u32_inclusive(u32 floor, u32 ceil) { BUILD_BUG_ON_MSG(__builtin_constant_p(floor) && __builtin_constant_p(ceil) && (floor > ceil || ceil - floor == U32_MAX), "get_random_u32_inclusive() must take floor <= ceil"); return floor + get_random_u32_below(ceil - floor + 1); } void __init random_init_early(const char *command_line); void __init random_init(void); bool rng_is_initialized(void); int wait_for_random_bytes(void); int execute_with_initialized_rng(struct notifier_block *nb); /* Calls wait_for_random_bytes() and then calls get_random_bytes(buf, nbytes). * Returns the result of the call to wait_for_random_bytes. */ static inline int get_random_bytes_wait(void *buf, size_t nbytes) { int ret = wait_for_random_bytes(); get_random_bytes(buf, nbytes); return ret; } #define declare_get_random_var_wait(name, ret_type) \ static inline int get_random_ ## name ## _wait(ret_type *out) { \ int ret = wait_for_random_bytes(); \ if (unlikely(ret)) \ return ret; \ *out = get_random_ ## name(); \ return 0; \ } declare_get_random_var_wait(u8, u8) declare_get_random_var_wait(u16, u16) declare_get_random_var_wait(u32, u32) declare_get_random_var_wait(u64, u32) declare_get_random_var_wait(long, unsigned long) #undef declare_get_random_var #ifdef CONFIG_SMP int random_prepare_cpu(unsigned int cpu); int random_online_cpu(unsigned int cpu); #endif #ifndef MODULE extern const struct file_operations random_fops, urandom_fops; #endif #endif /* _LINUX_RANDOM_H */ |
| 615 12463 11667 334 136 1876 71 8 11177 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 | /* SPDX-License-Identifier: GPL-2.0-or-later */ /* * Berkeley style UIO structures - Alan Cox 1994. */ #ifndef __LINUX_UIO_H #define __LINUX_UIO_H #include <linux/kernel.h> #include <linux/mm_types.h> #include <linux/ucopysize.h> #include <uapi/linux/uio.h> struct page; struct folio_queue; typedef unsigned int __bitwise iov_iter_extraction_t; struct kvec { void *iov_base; /* and that should *never* hold a userland pointer */ size_t iov_len; }; enum iter_type { /* iter types */ ITER_UBUF, ITER_IOVEC, ITER_BVEC, ITER_KVEC, ITER_FOLIOQ, ITER_XARRAY, ITER_DISCARD, }; #define ITER_SOURCE 1 // == WRITE #define ITER_DEST 0 // == READ struct iov_iter_state { size_t iov_offset; size_t count; unsigned long nr_segs; }; struct iov_iter { u8 iter_type; bool nofault; bool data_source; size_t iov_offset; /* * Hack alert: overlay ubuf_iovec with iovec + count, so * that the members resolve correctly regardless of the type * of iterator used. This means that you can use: * * &iter->__ubuf_iovec or iter->__iov * * interchangably for the user_backed cases, hence simplifying * some of the cases that need to deal with both. */ union { /* * This really should be a const, but we cannot do that without * also modifying any of the zero-filling iter init functions. * Leave it non-const for now, but it should be treated as such. */ struct iovec __ubuf_iovec; struct { union { /* use iter_iov() to get the current vec */ const struct iovec *__iov; const struct kvec *kvec; const struct bio_vec *bvec; const struct folio_queue *folioq; struct xarray *xarray; void __user *ubuf; }; size_t count; }; }; union { unsigned long nr_segs; u8 folioq_slot; loff_t xarray_start; }; }; typedef __u16 uio_meta_flags_t; struct uio_meta { uio_meta_flags_t flags; u16 app_tag; u64 seed; struct iov_iter iter; }; static inline const struct iovec *iter_iov(const struct iov_iter *iter) { if (iter->iter_type == ITER_UBUF) return (const struct iovec *) &iter->__ubuf_iovec; return iter->__iov; } #define iter_iov_addr(iter) (iter_iov(iter)->iov_base + (iter)->iov_offset) static inline size_t iter_iov_len(const struct iov_iter *i) { if (i->iter_type == ITER_UBUF) return i->count; return iter_iov(i)->iov_len - i->iov_offset; } static inline enum iter_type iov_iter_type(const struct iov_iter *i) { return i->iter_type; } static inline void iov_iter_save_state(struct iov_iter *iter, struct iov_iter_state *state) { state->iov_offset = iter->iov_offset; state->count = iter->count; state->nr_segs = iter->nr_segs; } static inline bool iter_is_ubuf(const struct iov_iter *i) { return iov_iter_type(i) == ITER_UBUF; } static inline bool iter_is_iovec(const struct iov_iter *i) { return iov_iter_type(i) == ITER_IOVEC; } static inline bool iov_iter_is_kvec(const struct iov_iter *i) { return iov_iter_type(i) == ITER_KVEC; } static inline bool iov_iter_is_bvec(const struct iov_iter *i) { return iov_iter_type(i) == ITER_BVEC; } static inline bool iov_iter_is_discard(const struct iov_iter *i) { return iov_iter_type(i) == ITER_DISCARD; } static inline bool iov_iter_is_folioq(const struct iov_iter *i) { return iov_iter_type(i) == ITER_FOLIOQ; } static inline bool iov_iter_is_xarray(const struct iov_iter *i) { return iov_iter_type(i) == ITER_XARRAY; } static inline unsigned char iov_iter_rw(const struct iov_iter *i) { return i->data_source ? WRITE : READ; } static inline bool user_backed_iter(const struct iov_iter *i) { return iter_is_ubuf(i) || iter_is_iovec(i); } /* * Total number of bytes covered by an iovec. * * NOTE that it is not safe to use this function until all the iovec's * segment lengths have been validated. Because the individual lengths can * overflow a size_t when added together. */ static inline size_t iov_length(const struct iovec *iov, unsigned long nr_segs) { unsigned long seg; size_t ret = 0; for (seg = 0; seg < nr_segs; seg++) ret += iov[seg].iov_len; return ret; } void iov_iter_advance(struct iov_iter *i, size_t bytes); void iov_iter_revert(struct iov_iter *i, size_t bytes); size_t fault_in_iov_iter_readable(const struct iov_iter *i, size_t bytes); size_t fault_in_iov_iter_writeable(const struct iov_iter *i, size_t bytes); size_t iov_iter_single_seg_count(const struct iov_iter *i); size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes, struct iov_iter *i); size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes, struct iov_iter *i); size_t copy_folio_from_iter_atomic(struct folio *folio, size_t offset, size_t bytes, struct iov_iter *i); size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i); size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i); size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i); static inline size_t copy_folio_to_iter(struct folio *folio, size_t offset, size_t bytes, struct iov_iter *i) { return copy_page_to_iter(&folio->page, offset, bytes, i); } static inline size_t copy_folio_from_iter(struct folio *folio, size_t offset, size_t bytes, struct iov_iter *i) { return copy_page_from_iter(&folio->page, offset, bytes, i); } size_t copy_page_to_iter_nofault(struct page *page, unsigned offset, size_t bytes, struct iov_iter *i); static __always_inline __must_check size_t copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i) { if (check_copy_size(addr, bytes, true)) return _copy_to_iter(addr, bytes, i); return 0; } static __always_inline __must_check size_t copy_from_iter(void *addr, size_t bytes, struct iov_iter *i) { if (check_copy_size(addr, bytes, false)) return _copy_from_iter(addr, bytes, i); return 0; } static __always_inline __must_check bool copy_to_iter_full(const void *addr, size_t bytes, struct iov_iter *i) { size_t copied = copy_to_iter(addr, bytes, i); if (likely(copied == bytes)) return true; iov_iter_revert(i, copied); return false; } static __always_inline __must_check bool copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i) { size_t copied = copy_from_iter(addr, bytes, i); if (likely(copied == bytes)) return true; iov_iter_revert(i, copied); return false; } static __always_inline __must_check size_t copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i) { if (check_copy_size(addr, bytes, false)) return _copy_from_iter_nocache(addr, bytes, i); return 0; } static __always_inline __must_check bool copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i) { size_t copied = copy_from_iter_nocache(addr, bytes, i); if (likely(copied == bytes)) return true; iov_iter_revert(i, copied); return false; } #ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE /* * Note, users like pmem that depend on the stricter semantics of * _copy_from_iter_flushcache() than _copy_from_iter_nocache() must check for * IS_ENABLED(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) before assuming that the * destination is flushed from the cache on return. */ size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i); #else #define _copy_from_iter_flushcache _copy_from_iter_nocache #endif #ifdef CONFIG_ARCH_HAS_COPY_MC size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i); #else #define _copy_mc_to_iter _copy_to_iter #endif size_t iov_iter_zero(size_t bytes, struct iov_iter *); unsigned long iov_iter_alignment(const struct iov_iter *i); unsigned long iov_iter_gap_alignment(const struct iov_iter *i); void iov_iter_init(struct iov_iter *i, unsigned int direction, const struct iovec *iov, unsigned long nr_segs, size_t count); void iov_iter_kvec(struct iov_iter *i, unsigned int direction, const struct kvec *kvec, unsigned long nr_segs, size_t count); void iov_iter_bvec(struct iov_iter *i, unsigned int direction, const struct bio_vec *bvec, unsigned long nr_segs, size_t count); void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count); void iov_iter_folio_queue(struct iov_iter *i, unsigned int direction, const struct folio_queue *folioq, unsigned int first_slot, unsigned int offset, size_t count); void iov_iter_xarray(struct iov_iter *i, unsigned int direction, struct xarray *xarray, loff_t start, size_t count); ssize_t iov_iter_get_pages2(struct iov_iter *i, struct page **pages, size_t maxsize, unsigned maxpages, size_t *start); ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i, struct page ***pages, size_t maxsize, size_t *start); int iov_iter_npages(const struct iov_iter *i, int maxpages); void iov_iter_restore(struct iov_iter *i, struct iov_iter_state *state); const void *dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags); static inline size_t iov_iter_count(const struct iov_iter *i) { return i->count; } /* * Cap the iov_iter by given limit; note that the second argument is * *not* the new size - it's upper limit for such. Passing it a value * greater than the amount of data in iov_iter is fine - it'll just do * nothing in that case. */ static inline void iov_iter_truncate(struct iov_iter *i, u64 count) { /* * count doesn't have to fit in size_t - comparison extends both * operands to u64 here and any value that would be truncated by * conversion in assignement is by definition greater than all * values of size_t, including old i->count. */ if (i->count > count) i->count = count; } /* * reexpand a previously truncated iterator; count must be no more than how much * we had shrunk it. */ static inline void iov_iter_reexpand(struct iov_iter *i, size_t count) { i->count = count; } static inline int iov_iter_npages_cap(struct iov_iter *i, int maxpages, size_t max_bytes) { size_t shorted = 0; int npages; if (iov_iter_count(i) > max_bytes) { shorted = iov_iter_count(i) - max_bytes; iov_iter_truncate(i, max_bytes); } npages = iov_iter_npages(i, maxpages); if (shorted) iov_iter_reexpand(i, iov_iter_count(i) + shorted); return npages; } struct iovec *iovec_from_user(const struct iovec __user *uvector, unsigned long nr_segs, unsigned long fast_segs, struct iovec *fast_iov, bool compat); ssize_t import_iovec(int type, const struct iovec __user *uvec, unsigned nr_segs, unsigned fast_segs, struct iovec **iovp, struct iov_iter *i); ssize_t __import_iovec(int type, const struct iovec __user *uvec, unsigned nr_segs, unsigned fast_segs, struct iovec **iovp, struct iov_iter *i, bool compat); int import_ubuf(int type, void __user *buf, size_t len, struct iov_iter *i); static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction, void __user *buf, size_t count) { WARN_ON(direction & ~(READ | WRITE)); *i = (struct iov_iter) { .iter_type = ITER_UBUF, .data_source = direction, .ubuf = buf, .count = count, .nr_segs = 1 }; } /* Flags for iov_iter_get/extract_pages*() */ /* Allow P2PDMA on the extracted pages */ #define ITER_ALLOW_P2PDMA ((__force iov_iter_extraction_t)0x01) ssize_t iov_iter_extract_pages(struct iov_iter *i, struct page ***pages, size_t maxsize, unsigned int maxpages, iov_iter_extraction_t extraction_flags, size_t *offset0); /** * iov_iter_extract_will_pin - Indicate how pages from the iterator will be retained * @iter: The iterator * * Examine the iterator and indicate by returning true or false as to how, if * at all, pages extracted from the iterator will be retained by the extraction * function. * * %true indicates that the pages will have a pin placed in them that the * caller must unpin. This is must be done for DMA/async DIO to force fork() * to forcibly copy a page for the child (the parent must retain the original * page). * * %false indicates that no measures are taken and that it's up to the caller * to retain the pages. */ static inline bool iov_iter_extract_will_pin(const struct iov_iter *iter) { return user_backed_iter(iter); } struct sg_table; ssize_t extract_iter_to_sg(struct iov_iter *iter, size_t len, struct sg_table *sgtable, unsigned int sg_max, iov_iter_extraction_t extraction_flags); #endif |
| 32 6 2 20 4 19 2 2 21 2 7 5 2 30 7 19 4 4 4 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 | // SPDX-License-Identifier: GPL-2.0-only /* * Copyright (c) 2016 Pablo Neira Ayuso <pablo@netfilter.org> */ #include <linux/kernel.h> #include <linux/init.h> #include <linux/module.h> #include <linux/atomic.h> #include <linux/netlink.h> #include <linux/netfilter.h> #include <linux/netfilter/nf_tables.h> #include <net/netfilter/nf_tables.h> struct nft_quota { atomic64_t quota; unsigned long flags; atomic64_t *consumed; }; static inline bool nft_overquota(struct nft_quota *priv, const struct sk_buff *skb, bool *report) { u64 consumed = atomic64_add_return(skb->len, priv->consumed); u64 quota = atomic64_read(&priv->quota); if (report) *report = consumed >= quota; return consumed > quota; } static inline bool nft_quota_invert(struct nft_quota *priv) { return priv->flags & NFT_QUOTA_F_INV; } static inline void nft_quota_do_eval(struct nft_quota *priv, struct nft_regs *regs, const struct nft_pktinfo *pkt) { if (nft_overquota(priv, pkt->skb, NULL) ^ nft_quota_invert(priv)) regs->verdict.code = NFT_BREAK; } static const struct nla_policy nft_quota_policy[NFTA_QUOTA_MAX + 1] = { [NFTA_QUOTA_BYTES] = { .type = NLA_U64 }, [NFTA_QUOTA_FLAGS] = { .type = NLA_U32 }, [NFTA_QUOTA_CONSUMED] = { .type = NLA_U64 }, }; #define NFT_QUOTA_DEPLETED_BIT 1 /* From NFT_QUOTA_F_DEPLETED. */ static void nft_quota_obj_eval(struct nft_object *obj, struct nft_regs *regs, const struct nft_pktinfo *pkt) { struct nft_quota *priv = nft_obj_data(obj); bool overquota, report; overquota = nft_overquota(priv, pkt->skb, &report); if (overquota ^ nft_quota_invert(priv)) regs->verdict.code = NFT_BREAK; if (report && !test_and_set_bit(NFT_QUOTA_DEPLETED_BIT, &priv->flags)) nft_obj_notify(nft_net(pkt), obj->key.table, obj, 0, 0, NFT_MSG_NEWOBJ, 0, nft_pf(pkt), 0, GFP_ATOMIC); } static int nft_quota_do_init(const struct nlattr * const tb[], struct nft_quota *priv) { unsigned long flags = 0; u64 quota, consumed = 0; if (!tb[NFTA_QUOTA_BYTES]) return -EINVAL; quota = be64_to_cpu(nla_get_be64(tb[NFTA_QUOTA_BYTES])); if (quota > S64_MAX) return -EOVERFLOW; if (tb[NFTA_QUOTA_CONSUMED]) { consumed = be64_to_cpu(nla_get_be64(tb[NFTA_QUOTA_CONSUMED])); if (consumed > quota) return -EINVAL; } if (tb[NFTA_QUOTA_FLAGS]) { flags = ntohl(nla_get_be32(tb[NFTA_QUOTA_FLAGS])); if (flags & ~NFT_QUOTA_F_INV) return -EINVAL; if (flags & NFT_QUOTA_F_DEPLETED) return -EOPNOTSUPP; } priv->consumed = kmalloc(sizeof(*priv->consumed), GFP_KERNEL_ACCOUNT); if (!priv->consumed) return -ENOMEM; atomic64_set(&priv->quota, quota); priv->flags = flags; atomic64_set(priv->consumed, consumed); return 0; } static void nft_quota_do_destroy(const struct nft_ctx *ctx, struct nft_quota *priv) { kfree(priv->consumed); } static int nft_quota_obj_init(const struct nft_ctx *ctx, const struct nlattr * const tb[], struct nft_object *obj) { struct nft_quota *priv = nft_obj_data(obj); return nft_quota_do_init(tb, priv); } static void nft_quota_obj_update(struct nft_object *obj, struct nft_object *newobj) { struct nft_quota *newpriv = nft_obj_data(newobj); struct nft_quota *priv = nft_obj_data(obj); u64 newquota; newquota = atomic64_read(&newpriv->quota); atomic64_set(&priv->quota, newquota); priv->flags = newpriv->flags; } static int nft_quota_do_dump(struct sk_buff *skb, struct nft_quota *priv, bool reset) { u64 consumed, consumed_cap, quota; u32 flags = priv->flags; /* Since we inconditionally increment consumed quota for each packet * that we see, don't go over the quota boundary in what we send to * userspace. */ consumed = atomic64_read(priv->consumed); quota = atomic64_read(&priv->quota); if (consumed >= quota) { consumed_cap = quota; flags |= NFT_QUOTA_F_DEPLETED; } else { consumed_cap = consumed; } if (nla_put_be64(skb, NFTA_QUOTA_BYTES, cpu_to_be64(quota), NFTA_QUOTA_PAD) || nla_put_be64(skb, NFTA_QUOTA_CONSUMED, cpu_to_be64(consumed_cap), NFTA_QUOTA_PAD) || nla_put_be32(skb, NFTA_QUOTA_FLAGS, htonl(flags))) goto nla_put_failure; if (reset) { atomic64_sub(consumed, priv->consumed); clear_bit(NFT_QUOTA_DEPLETED_BIT, &priv->flags); } return 0; nla_put_failure: return -1; } static int nft_quota_obj_dump(struct sk_buff *skb, struct nft_object *obj, bool reset) { struct nft_quota *priv = nft_obj_data(obj); return nft_quota_do_dump(skb, priv, reset); } static void nft_quota_obj_destroy(const struct nft_ctx *ctx, struct nft_object *obj) { struct nft_quota *priv = nft_obj_data(obj); return nft_quota_do_destroy(ctx, priv); } static struct nft_object_type nft_quota_obj_type; static const struct nft_object_ops nft_quota_obj_ops = { .type = &nft_quota_obj_type, .size = sizeof(struct nft_quota), .init = nft_quota_obj_init, .destroy = nft_quota_obj_destroy, .eval = nft_quota_obj_eval, .dump = nft_quota_obj_dump, .update = nft_quota_obj_update, }; static struct nft_object_type nft_quota_obj_type __read_mostly = { .type = NFT_OBJECT_QUOTA, .ops = &nft_quota_obj_ops, .maxattr = NFTA_QUOTA_MAX, .policy = nft_quota_policy, .owner = THIS_MODULE, }; static void nft_quota_eval(const struct nft_expr *expr, struct nft_regs *regs, const struct nft_pktinfo *pkt) { struct nft_quota *priv = nft_expr_priv(expr); nft_quota_do_eval(priv, regs, pkt); } static int nft_quota_init(const struct nft_ctx *ctx, const struct nft_expr *expr, const struct nlattr * const tb[]) { struct nft_quota *priv = nft_expr_priv(expr); return nft_quota_do_init(tb, priv); } static int nft_quota_dump(struct sk_buff *skb, const struct nft_expr *expr, bool reset) { struct nft_quota *priv = nft_expr_priv(expr); return nft_quota_do_dump(skb, priv, reset); } static void nft_quota_destroy(const struct nft_ctx *ctx, const struct nft_expr *expr) { struct nft_quota *priv = nft_expr_priv(expr); return nft_quota_do_destroy(ctx, priv); } static int nft_quota_clone(struct nft_expr *dst, const struct nft_expr *src, gfp_t gfp) { struct nft_quota *priv_dst = nft_expr_priv(dst); struct nft_quota *priv_src = nft_expr_priv(src); priv_dst->quota = priv_src->quota; priv_dst->flags = priv_src->flags; priv_dst->consumed = kmalloc(sizeof(*priv_dst->consumed), gfp); if (!priv_dst->consumed) return -ENOMEM; *priv_dst->consumed = *priv_src->consumed; return 0; } static struct nft_expr_type nft_quota_type; static const struct nft_expr_ops nft_quota_ops = { .type = &nft_quota_type, .size = NFT_EXPR_SIZE(sizeof(struct nft_quota)), .eval = nft_quota_eval, .init = nft_quota_init, .destroy = nft_quota_destroy, .clone = nft_quota_clone, .dump = nft_quota_dump, .reduce = NFT_REDUCE_READONLY, }; static struct nft_expr_type nft_quota_type __read_mostly = { .name = "quota", .ops = &nft_quota_ops, .policy = nft_quota_policy, .maxattr = NFTA_QUOTA_MAX, .flags = NFT_EXPR_STATEFUL, .owner = THIS_MODULE, }; static int __init nft_quota_module_init(void) { int err; err = nft_register_obj(&nft_quota_obj_type); if (err < 0) return err; err = nft_register_expr(&nft_quota_type); if (err < 0) goto err1; return 0; err1: nft_unregister_obj(&nft_quota_obj_type); return err; } static void __exit nft_quota_module_exit(void) { nft_unregister_expr(&nft_quota_type); nft_unregister_obj(&nft_quota_obj_type); } module_init(nft_quota_module_init); module_exit(nft_quota_module_exit); MODULE_LICENSE("GPL"); MODULE_AUTHOR("Pablo Neira Ayuso <pablo@netfilter.org>"); MODULE_ALIAS_NFT_EXPR("quota"); MODULE_ALIAS_NFT_OBJ(NFT_OBJECT_QUOTA); MODULE_DESCRIPTION("Netfilter nftables quota module"); |
| 88 88 5 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 | /* SPDX-License-Identifier: GPL-2.0-only */ /* * kernfs.h - pseudo filesystem decoupled from vfs locking */ #ifndef __LINUX_KERNFS_H #define __LINUX_KERNFS_H #include <linux/err.h> #include <linux/list.h> #include <linux/mutex.h> #include <linux/idr.h> #include <linux/lockdep.h> #include <linux/rbtree.h> #include <linux/atomic.h> #include <linux/bug.h> #include <linux/types.h> #include <linux/uidgid.h> #include <linux/wait.h> #include <linux/rwsem.h> #include <linux/cache.h> struct file; struct dentry; struct iattr; struct seq_file; struct vm_area_struct; struct vm_operations_struct; struct super_block; struct file_system_type; struct poll_table_struct; struct fs_context; struct kernfs_fs_context; struct kernfs_open_node; struct kernfs_iattrs; /* * NR_KERNFS_LOCK_BITS determines size (NR_KERNFS_LOCKS) of hash * table of locks. * Having a small hash table would impact scalability, since * more and more kernfs_node objects will end up using same lock * and having a very large hash table would waste memory. * * At the moment size of hash table of locks is being set based on * the number of CPUs as follows: * * NR_CPU NR_KERNFS_LOCK_BITS NR_KERNFS_LOCKS * 1 1 2 * 2-3 2 4 * 4-7 4 16 * 8-15 6 64 * 16-31 8 256 * 32 and more 10 1024 * * The above relation between NR_CPU and number of locks is based * on some internal experimentation which involved booting qemu * with different values of smp, performing some sysfs operations * on all CPUs and observing how increase in number of locks impacts * completion time of these sysfs operations on each CPU. */ #ifdef CONFIG_SMP #define NR_KERNFS_LOCK_BITS (2 * (ilog2(NR_CPUS < 32 ? NR_CPUS : 32))) #else #define NR_KERNFS_LOCK_BITS 1 #endif #define NR_KERNFS_LOCKS (1 << NR_KERNFS_LOCK_BITS) /* * There's one kernfs_open_file for each open file and one kernfs_open_node * for each kernfs_node with one or more open files. * * filp->private_data points to seq_file whose ->private points to * kernfs_open_file. * * kernfs_open_files are chained at kernfs_open_node->files, which is * protected by kernfs_global_locks.open_file_mutex[i]. * * To reduce possible contention in sysfs access, arising due to single * locks, use an array of locks (e.g. open_file_mutex) and use kernfs_node * object address as hash keys to get the index of these locks. * * Hashed mutexes are safe to use here because operations using these don't * rely on global exclusion. * * In future we intend to replace other global locks with hashed ones as well. * kernfs_global_locks acts as a holder for all such hash tables. */ struct kernfs_global_locks { struct mutex open_file_mutex[NR_KERNFS_LOCKS]; }; enum kernfs_node_type { KERNFS_DIR = 0x0001, KERNFS_FILE = 0x0002, KERNFS_LINK = 0x0004, }; #define KERNFS_TYPE_MASK 0x000f #define KERNFS_FLAG_MASK ~KERNFS_TYPE_MASK #define KERNFS_MAX_USER_XATTRS 128 #define KERNFS_USER_XATTR_SIZE_LIMIT (128 << 10) enum kernfs_node_flag { KERNFS_ACTIVATED = 0x0010, KERNFS_NS = 0x0020, KERNFS_HAS_SEQ_SHOW = 0x0040, KERNFS_HAS_MMAP = 0x0080, KERNFS_LOCKDEP = 0x0100, KERNFS_HIDDEN = 0x0200, KERNFS_SUICIDAL = 0x0400, KERNFS_SUICIDED = 0x0800, KERNFS_EMPTY_DIR = 0x1000, KERNFS_HAS_RELEASE = 0x2000, KERNFS_REMOVING = 0x4000, }; /* @flags for kernfs_create_root() */ enum kernfs_root_flag { /* * kernfs_nodes are created in the deactivated state and invisible. * They require explicit kernfs_activate() to become visible. This * can be used to make related nodes become visible atomically * after all nodes are created successfully. */ KERNFS_ROOT_CREATE_DEACTIVATED = 0x0001, /* * For regular files, if the opener has CAP_DAC_OVERRIDE, open(2) * succeeds regardless of the RW permissions. sysfs had an extra * layer of enforcement where open(2) fails with -EACCES regardless * of CAP_DAC_OVERRIDE if the permission doesn't have the * respective read or write access at all (none of S_IRUGO or * S_IWUGO) or the respective operation isn't implemented. The * following flag enables that behavior. */ KERNFS_ROOT_EXTRA_OPEN_PERM_CHECK = 0x0002, /* * The filesystem supports exportfs operation, so userspace can use * fhandle to access nodes of the fs. */ KERNFS_ROOT_SUPPORT_EXPORTOP = 0x0004, /* * Support user xattrs to be written to nodes rooted at this root. */ KERNFS_ROOT_SUPPORT_USER_XATTR = 0x0008, /* * Renames must not change the parent node. */ KERNFS_ROOT_INVARIANT_PARENT = 0x0010, }; /* type-specific structures for kernfs_node union members */ struct kernfs_elem_dir { unsigned long subdirs; /* children rbtree starts here and goes through kn->rb */ struct rb_root children; /* * The kernfs hierarchy this directory belongs to. This fits * better directly in kernfs_node but is here to save space. */ struct kernfs_root *root; /* * Monotonic revision counter, used to identify if a directory * node has changed during negative dentry revalidation. */ unsigned long rev; }; struct kernfs_elem_symlink { struct kernfs_node *target_kn; }; struct kernfs_elem_attr { const struct kernfs_ops *ops; struct kernfs_open_node __rcu *open; loff_t size; struct kernfs_node *notify_next; /* for kernfs_notify() */ }; /* * kernfs_node - the building block of kernfs hierarchy. Each and every * kernfs node is represented by single kernfs_node. Most fields are * private to kernfs and shouldn't be accessed directly by kernfs users. * * As long as count reference is held, the kernfs_node itself is * accessible. Dereferencing elem or any other outer entity requires * active reference. */ struct kernfs_node { atomic_t count; atomic_t active; #ifdef CONFIG_DEBUG_LOCK_ALLOC struct lockdep_map dep_map; #endif /* * Use kernfs_get_parent() and kernfs_name/path() instead of * accessing the following two fields directly. If the node is * never moved to a different parent, it is safe to access the * parent directly. */ struct kernfs_node __rcu *__parent; const char __rcu *name; struct rb_node rb; const void *ns; /* namespace tag */ unsigned int hash; /* ns + name hash */ unsigned short flags; umode_t mode; union { struct kernfs_elem_dir dir; struct kernfs_elem_symlink symlink; struct kernfs_elem_attr attr; }; /* * 64bit unique ID. On 64bit ino setups, id is the ino. On 32bit, * the low 32bits are ino and upper generation. */ u64 id; void *priv; struct kernfs_iattrs *iattr; struct rcu_head rcu; }; /* * kernfs_syscall_ops may be specified on kernfs_create_root() to support * syscalls. These optional callbacks are invoked on the matching syscalls * and can perform any kernfs operations which don't necessarily have to be * the exact operation requested. An active reference is held for each * kernfs_node parameter. */ struct kernfs_syscall_ops { int (*show_options)(struct seq_file *sf, struct kernfs_root *root); int (*mkdir)(struct kernfs_node *parent, const char *name, umode_t mode); int (*rmdir)(struct kernfs_node *kn); int (*rename)(struct kernfs_node *kn, struct kernfs_node *new_parent, const char *new_name); int (*show_path)(struct seq_file *sf, struct kernfs_node *kn, struct kernfs_root *root); }; struct kernfs_node *kernfs_root_to_node(struct kernfs_root *root); struct kernfs_open_file { /* published fields */ struct kernfs_node *kn; struct file *file; struct seq_file *seq_file; void *priv; /* private fields, do not use outside kernfs proper */ struct mutex mutex; struct mutex prealloc_mutex; int event; struct list_head list; char *prealloc_buf; size_t atomic_write_len; bool mmapped:1; bool released:1; const struct vm_operations_struct *vm_ops; }; struct kernfs_ops { /* * Optional open/release methods. Both are called with * @of->seq_file populated. */ int (*open)(struct kernfs_open_file *of); void (*release)(struct kernfs_open_file *of); /* * Read is handled by either seq_file or raw_read(). * * If seq_show() is present, seq_file path is active. Other seq * operations are optional and if not implemented, the behavior is * equivalent to single_open(). @sf->private points to the * associated kernfs_open_file. * * read() is bounced through kernel buffer and a read larger than * PAGE_SIZE results in partial operation of PAGE_SIZE. */ int (*seq_show)(struct seq_file *sf, void *v); void *(*seq_start)(struct seq_file *sf, loff_t *ppos); void *(*seq_next)(struct seq_file *sf, void *v, loff_t *ppos); void (*seq_stop)(struct seq_file *sf, void *v); ssize_t (*read)(struct kernfs_open_file *of, char *buf, size_t bytes, loff_t off); /* * write() is bounced through kernel buffer. If atomic_write_len * is not set, a write larger than PAGE_SIZE results in partial * operations of PAGE_SIZE chunks. If atomic_write_len is set, * writes upto the specified size are executed atomically but * larger ones are rejected with -E2BIG. */ size_t atomic_write_len; /* * "prealloc" causes a buffer to be allocated at open for * all read/write requests. As ->seq_show uses seq_read() * which does its own allocation, it is incompatible with * ->prealloc. Provide ->read and ->write with ->prealloc. */ bool prealloc; ssize_t (*write)(struct kernfs_open_file *of, char *buf, size_t bytes, loff_t off); __poll_t (*poll)(struct kernfs_open_file *of, struct poll_table_struct *pt); int (*mmap)(struct kernfs_open_file *of, struct vm_area_struct *vma); loff_t (*llseek)(struct kernfs_open_file *of, loff_t offset, int whence); }; /* * The kernfs superblock creation/mount parameter context. */ struct kernfs_fs_context { struct kernfs_root *root; /* Root of the hierarchy being mounted */ void *ns_tag; /* Namespace tag of the mount (or NULL) */ unsigned long magic; /* File system specific magic number */ /* The following are set/used by kernfs_mount() */ bool new_sb_created; /* Set to T if we allocated a new sb */ }; #ifdef CONFIG_KERNFS static inline enum kernfs_node_type kernfs_type(struct kernfs_node *kn) { return kn->flags & KERNFS_TYPE_MASK; } static inline ino_t kernfs_id_ino(u64 id) { /* id is ino if ino_t is 64bit; otherwise, low 32bits */ if (sizeof(ino_t) >= sizeof(u64)) return id; else return (u32)id; } static inline u32 kernfs_id_gen(u64 id) { /* gen is fixed at 1 if ino_t is 64bit; otherwise, high 32bits */ if (sizeof(ino_t) >= sizeof(u64)) return 1; else return id >> 32; } static inline ino_t kernfs_ino(struct kernfs_node *kn) { return kernfs_id_ino(kn->id); } static inline ino_t kernfs_gen(struct kernfs_node *kn) { return kernfs_id_gen(kn->id); } /** * kernfs_enable_ns - enable namespace under a directory * @kn: directory of interest, should be empty * * This is to be called right after @kn is created to enable namespace * under it. All children of @kn must have non-NULL namespace tags and * only the ones which match the super_block's tag will be visible. */ static inline void kernfs_enable_ns(struct kernfs_node *kn) { WARN_ON_ONCE(kernfs_type(kn) != KERNFS_DIR); WARN_ON_ONCE(!RB_EMPTY_ROOT(&kn->dir.children)); kn->flags |= KERNFS_NS; } /** * kernfs_ns_enabled - test whether namespace is enabled * @kn: the node to test * * Test whether namespace filtering is enabled for the children of @ns. */ static inline bool kernfs_ns_enabled(struct kernfs_node *kn) { return kn->flags & KERNFS_NS; } int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen); int kernfs_path_from_node(struct kernfs_node *kn_to, struct kernfs_node *kn_from, char *buf, size_t buflen); void pr_cont_kernfs_name(struct kernfs_node *kn); void pr_cont_kernfs_path(struct kernfs_node *kn); struct kernfs_node *kernfs_get_parent(struct kernfs_node *kn); struct kernfs_node *kernfs_find_and_get_ns(struct kernfs_node *parent, const char *name, const void *ns); struct kernfs_node *kernfs_walk_and_get_ns(struct kernfs_node *parent, const char *path, const void *ns); void kernfs_get(struct kernfs_node *kn); void kernfs_put(struct kernfs_node *kn); struct kernfs_node *kernfs_node_from_dentry(struct dentry *dentry); struct kernfs_root *kernfs_root_from_sb(struct super_block *sb); struct inode *kernfs_get_inode(struct super_block *sb, struct kernfs_node *kn); struct dentry *kernfs_node_dentry(struct kernfs_node *kn, struct super_block *sb); struct kernfs_root *kernfs_create_root(struct kernfs_syscall_ops *scops, unsigned int flags, void *priv); void kernfs_destroy_root(struct kernfs_root *root); unsigned int kernfs_root_flags(struct kernfs_node *kn); struct kernfs_node *kernfs_create_dir_ns(struct kernfs_node *parent, const char *name, umode_t mode, kuid_t uid, kgid_t gid, void *priv, const void *ns); struct kernfs_node *kernfs_create_empty_dir(struct kernfs_node *parent, const char *name); struct kernfs_node *__kernfs_create_file(struct kernfs_node *parent, const char *name, umode_t mode, kuid_t uid, kgid_t gid, loff_t size, const struct kernfs_ops *ops, void *priv, const void *ns, struct lock_class_key *key); struct kernfs_node *kernfs_create_link(struct kernfs_node *parent, const char *name, struct kernfs_node *target); void kernfs_activate(struct kernfs_node *kn); void kernfs_show(struct kernfs_node *kn, bool show); void kernfs_remove(struct kernfs_node *kn); void kernfs_break_active_protection(struct kernfs_node *kn); void kernfs_unbreak_active_protection(struct kernfs_node *kn); bool kernfs_remove_self(struct kernfs_node *kn); int kernfs_remove_by_name_ns(struct kernfs_node *parent, const char *name, const void *ns); int kernfs_rename_ns(struct kernfs_node *kn, struct kernfs_node *new_parent, const char *new_name, const void *new_ns); int kernfs_setattr(struct kernfs_node *kn, const struct iattr *iattr); __poll_t kernfs_generic_poll(struct kernfs_open_file *of, struct poll_table_struct *pt); void kernfs_notify(struct kernfs_node *kn); int kernfs_xattr_get(struct kernfs_node *kn, const char *name, void *value, size_t size); int kernfs_xattr_set(struct kernfs_node *kn, const char *name, const void *value, size_t size, int flags); const void *kernfs_super_ns(struct super_block *sb); int kernfs_get_tree(struct fs_context *fc); void kernfs_free_fs_context(struct fs_context *fc); void kernfs_kill_sb(struct super_block *sb); void kernfs_init(void); struct kernfs_node *kernfs_find_and_get_node_by_id(struct kernfs_root *root, u64 id); #else /* CONFIG_KERNFS */ static inline enum kernfs_node_type kernfs_type(struct kernfs_node *kn) { return 0; } /* whatever */ static inline void kernfs_enable_ns(struct kernfs_node *kn) { } static inline bool kernfs_ns_enabled(struct kernfs_node *kn) { return false; } static inline int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen) { return -ENOSYS; } static inline int kernfs_path_from_node(struct kernfs_node *root_kn, struct kernfs_node *kn, char *buf, size_t buflen) { return -ENOSYS; } static inline void pr_cont_kernfs_name(struct kernfs_node *kn) { } static inline void pr_cont_kernfs_path(struct kernfs_node *kn) { } static inline struct kernfs_node *kernfs_get_parent(struct kernfs_node *kn) { return NULL; } static inline struct kernfs_node * kernfs_find_and_get_ns(struct kernfs_node *parent, const char *name, const void *ns) { return NULL; } static inline struct kernfs_node * kernfs_walk_and_get_ns(struct kernfs_node *parent, const char *path, const void *ns) { return NULL; } static inline void kernfs_get(struct kernfs_node *kn) { } static inline void kernfs_put(struct kernfs_node *kn) { } static inline struct kernfs_node *kernfs_node_from_dentry(struct dentry *dentry) { return NULL; } static inline struct kernfs_root *kernfs_root_from_sb(struct super_block *sb) { return NULL; } static inline struct inode * kernfs_get_inode(struct super_block *sb, struct kernfs_node *kn) { return NULL; } static inline struct kernfs_root * kernfs_create_root(struct kernfs_syscall_ops *scops, unsigned int flags, void *priv) { return ERR_PTR(-ENOSYS); } static inline void kernfs_destroy_root(struct kernfs_root *root) { } static inline unsigned int kernfs_root_flags(struct kernfs_node *kn) { return 0; } static inline struct kernfs_node * kernfs_create_dir_ns(struct kernfs_node *parent, const char *name, umode_t mode, kuid_t uid, kgid_t gid, void *priv, const void *ns) { return ERR_PTR(-ENOSYS); } static inline struct kernfs_node * __kernfs_create_file(struct kernfs_node *parent, const char *name, umode_t mode, kuid_t uid, kgid_t gid, loff_t size, const struct kernfs_ops *ops, void *priv, const void *ns, struct lock_class_key *key) { return ERR_PTR(-ENOSYS); } static inline struct kernfs_node * kernfs_create_link(struct kernfs_node *parent, const char *name, struct kernfs_node *target) { return ERR_PTR(-ENOSYS); } static inline void kernfs_activate(struct kernfs_node *kn) { } static inline void kernfs_remove(struct kernfs_node *kn) { } static inline bool kernfs_remove_self(struct kernfs_node *kn) { return false; } static inline int kernfs_remove_by_name_ns(struct kernfs_node *kn, const char *name, const void *ns) { return -ENOSYS; } static inline int kernfs_rename_ns(struct kernfs_node *kn, struct kernfs_node *new_parent, const char *new_name, const void *new_ns) { return -ENOSYS; } static inline int kernfs_setattr(struct kernfs_node *kn, const struct iattr *iattr) { return -ENOSYS; } static inline __poll_t kernfs_generic_poll(struct kernfs_open_file *of, struct poll_table_struct *pt) { return -ENOSYS; } static inline void kernfs_notify(struct kernfs_node *kn) { } static inline int kernfs_xattr_get(struct kernfs_node *kn, const char *name, void *value, size_t size) { return -ENOSYS; } static inline int kernfs_xattr_set(struct kernfs_node *kn, const char *name, const void *value, size_t size, int flags) { return -ENOSYS; } static inline const void *kernfs_super_ns(struct super_block *sb) { return NULL; } static inline int kernfs_get_tree(struct fs_context *fc) { return -ENOSYS; } static inline void kernfs_free_fs_context(struct fs_context *fc) { } static inline void kernfs_kill_sb(struct super_block *sb) { } static inline void kernfs_init(void) { } #endif /* CONFIG_KERNFS */ /** * kernfs_path - build full path of a given node * @kn: kernfs_node of interest * @buf: buffer to copy @kn's name into * @buflen: size of @buf * * If @kn is NULL result will be "(null)". * * Returns the length of the full path. If the full length is equal to or * greater than @buflen, @buf contains the truncated path with the trailing * '\0'. On error, -errno is returned. */ static inline int kernfs_path(struct kernfs_node *kn, char *buf, size_t buflen) { return kernfs_path_from_node(kn, NULL, buf, buflen); } static inline struct kernfs_node * kernfs_find_and_get(struct kernfs_node *kn, const char *name) { return kernfs_find_and_get_ns(kn, name, NULL); } static inline struct kernfs_node * kernfs_walk_and_get(struct kernfs_node *kn, const char *path) { return kernfs_walk_and_get_ns(kn, path, NULL); } static inline struct kernfs_node * kernfs_create_dir(struct kernfs_node *parent, const char *name, umode_t mode, void *priv) { return kernfs_create_dir_ns(parent, name, mode, GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, priv, NULL); } static inline int kernfs_remove_by_name(struct kernfs_node *parent, const char *name) { return kernfs_remove_by_name_ns(parent, name, NULL); } static inline int kernfs_rename(struct kernfs_node *kn, struct kernfs_node *new_parent, const char *new_name) { return kernfs_rename_ns(kn, new_parent, new_name, NULL); } #endif /* __LINUX_KERNFS_H */ |
| 33 33 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 | // SPDX-License-Identifier: GPL-2.0-only /* * This is the 1999 rewrite of IP Firewalling, aiming for kernel 2.3.x. * * Copyright (C) 1999 Paul `Rusty' Russell & Michael J. Neuling * Copyright (C) 2000-2004 Netfilter Core Team <coreteam@netfilter.org> */ #include <linux/module.h> #include <linux/moduleparam.h> #include <linux/netfilter_ipv4/ip_tables.h> #include <linux/slab.h> #include <net/ip.h> MODULE_LICENSE("GPL"); MODULE_AUTHOR("Netfilter Core Team <coreteam@netfilter.org>"); MODULE_DESCRIPTION("iptables filter table"); #define FILTER_VALID_HOOKS ((1 << NF_INET_LOCAL_IN) | \ (1 << NF_INET_FORWARD) | \ (1 << NF_INET_LOCAL_OUT)) static const struct xt_table packet_filter = { .name = "filter", .valid_hooks = FILTER_VALID_HOOKS, .me = THIS_MODULE, .af = NFPROTO_IPV4, .priority = NF_IP_PRI_FILTER, }; static struct nf_hook_ops *filter_ops __read_mostly; /* Default to forward because I got too much mail already. */ static bool forward __read_mostly = true; module_param(forward, bool, 0000); static int iptable_filter_table_init(struct net *net) { struct ipt_replace *repl; int err; repl = ipt_alloc_initial_table(&packet_filter); if (repl == NULL) return -ENOMEM; /* Entry 1 is the FORWARD hook */ ((struct ipt_standard *)repl->entries)[1].target.verdict = forward ? -NF_ACCEPT - 1 : NF_DROP - 1; err = ipt_register_table(net, &packet_filter, repl, filter_ops); kfree(repl); return err; } static int __net_init iptable_filter_net_init(struct net *net) { if (!forward) return iptable_filter_table_init(net); return 0; } static void __net_exit iptable_filter_net_pre_exit(struct net *net) { ipt_unregister_table_pre_exit(net, "filter"); } static void __net_exit iptable_filter_net_exit(struct net *net) { ipt_unregister_table_exit(net, "filter"); } static struct pernet_operations iptable_filter_net_ops = { .init = iptable_filter_net_init, .pre_exit = iptable_filter_net_pre_exit, .exit = iptable_filter_net_exit, }; static int __init iptable_filter_init(void) { int ret = xt_register_template(&packet_filter, iptable_filter_table_init); if (ret < 0) return ret; filter_ops = xt_hook_ops_alloc(&packet_filter, ipt_do_table); if (IS_ERR(filter_ops)) { xt_unregister_template(&packet_filter); return PTR_ERR(filter_ops); } ret = register_pernet_subsys(&iptable_filter_net_ops); if (ret < 0) { xt_unregister_template(&packet_filter); kfree(filter_ops); return ret; } return 0; } static void __exit iptable_filter_fini(void) { unregister_pernet_subsys(&iptable_filter_net_ops); xt_unregister_template(&packet_filter); kfree(filter_ops); } module_init(iptable_filter_init); module_exit(iptable_filter_fini); |
| 41 87 87 309 281 77 16 66 77 37 833 4 4 733 738 738 13 30 10 21 30 273 273 985 983 625 380 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 | /* SPDX-License-Identifier: GPL-2.0-or-later */ /* * include/net/l3mdev.h - L3 master device API * Copyright (c) 2015 Cumulus Networks * Copyright (c) 2015 David Ahern <dsa@cumulusnetworks.com> */ #ifndef _NET_L3MDEV_H_ #define _NET_L3MDEV_H_ #include <net/dst.h> #include <net/fib_rules.h> enum l3mdev_type { L3MDEV_TYPE_UNSPEC, L3MDEV_TYPE_VRF, __L3MDEV_TYPE_MAX }; #define L3MDEV_TYPE_MAX (__L3MDEV_TYPE_MAX - 1) typedef int (*lookup_by_table_id_t)(struct net *net, u32 table_d); /** * struct l3mdev_ops - l3mdev operations * * @l3mdev_fib_table: Get FIB table id to use for lookups * * @l3mdev_l3_rcv: Hook in L3 receive path * * @l3mdev_l3_out: Hook in L3 output path * * @l3mdev_link_scope_lookup: IPv6 lookup for linklocal and mcast destinations */ struct l3mdev_ops { u32 (*l3mdev_fib_table)(const struct net_device *dev); struct sk_buff * (*l3mdev_l3_rcv)(struct net_device *dev, struct sk_buff *skb, u16 proto); struct sk_buff * (*l3mdev_l3_out)(struct net_device *dev, struct sock *sk, struct sk_buff *skb, u16 proto); /* IPv6 ops */ struct dst_entry * (*l3mdev_link_scope_lookup)(const struct net_device *dev, struct flowi6 *fl6); }; #ifdef CONFIG_NET_L3_MASTER_DEV int l3mdev_table_lookup_register(enum l3mdev_type l3type, lookup_by_table_id_t fn); void l3mdev_table_lookup_unregister(enum l3mdev_type l3type, lookup_by_table_id_t fn); int l3mdev_ifindex_lookup_by_table_id(enum l3mdev_type l3type, struct net *net, u32 table_id); int l3mdev_fib_rule_match(struct net *net, struct flowi *fl, struct fib_lookup_arg *arg); static inline bool l3mdev_fib_rule_iif_match(const struct flowi *fl, int iifindex) { return !(fl->flowi_flags & FLOWI_FLAG_L3MDEV_OIF) && fl->flowi_l3mdev == iifindex; } static inline bool l3mdev_fib_rule_oif_match(const struct flowi *fl, int oifindex) { return fl->flowi_flags & FLOWI_FLAG_L3MDEV_OIF && fl->flowi_l3mdev == oifindex; } void l3mdev_update_flow(struct net *net, struct flowi *fl); int l3mdev_master_ifindex_rcu(const struct net_device *dev); static inline int l3mdev_master_ifindex(struct net_device *dev) { int ifindex; rcu_read_lock(); ifindex = l3mdev_master_ifindex_rcu(dev); rcu_read_unlock(); return ifindex; } static inline int l3mdev_master_ifindex_by_index(struct net *net, int ifindex) { struct net_device *dev; int rc = 0; if (ifindex) { rcu_read_lock(); dev = dev_get_by_index_rcu(net, ifindex); if (dev) rc = l3mdev_master_ifindex_rcu(dev); rcu_read_unlock(); } return rc; } static inline struct net_device *l3mdev_master_dev_rcu(const struct net_device *_dev) { /* netdev_master_upper_dev_get_rcu calls * list_first_or_null_rcu to walk the upper dev list. * list_first_or_null_rcu does not handle a const arg. We aren't * making changes, just want the master device from that list so * typecast to remove the const */ struct net_device *dev = (struct net_device *)_dev; struct net_device *master; if (!dev) return NULL; if (netif_is_l3_master(dev)) master = dev; else if (netif_is_l3_slave(dev)) master = netdev_master_upper_dev_get_rcu(dev); else master = NULL; return master; } int l3mdev_master_upper_ifindex_by_index_rcu(struct net *net, int ifindex); static inline int l3mdev_master_upper_ifindex_by_index(struct net *net, int ifindex) { rcu_read_lock(); ifindex = l3mdev_master_upper_ifindex_by_index_rcu(net, ifindex); rcu_read_unlock(); return ifindex; } u32 l3mdev_fib_table_rcu(const struct net_device *dev); u32 l3mdev_fib_table_by_index(struct net *net, int ifindex); static inline u32 l3mdev_fib_table(const struct net_device *dev) { u32 tb_id; rcu_read_lock(); tb_id = l3mdev_fib_table_rcu(dev); rcu_read_unlock(); return tb_id; } static inline bool netif_index_is_l3_master(struct net *net, int ifindex) { struct net_device *dev; bool rc = false; if (ifindex == 0) return false; rcu_read_lock(); dev = dev_get_by_index_rcu(net, ifindex); if (dev) rc = netif_is_l3_master(dev); rcu_read_unlock(); return rc; } struct dst_entry *l3mdev_link_scope_lookup(struct net *net, struct flowi6 *fl6); static inline struct sk_buff *l3mdev_l3_rcv(struct sk_buff *skb, u16 proto) { struct net_device *master = NULL; if (netif_is_l3_slave(skb->dev)) master = netdev_master_upper_dev_get_rcu(skb->dev); else if (netif_is_l3_master(skb->dev) || netif_has_l3_rx_handler(skb->dev)) master = skb->dev; if (master && master->l3mdev_ops->l3mdev_l3_rcv) skb = master->l3mdev_ops->l3mdev_l3_rcv(master, skb, proto); return skb; } static inline struct sk_buff *l3mdev_ip_rcv(struct sk_buff *skb) { return l3mdev_l3_rcv(skb, AF_INET); } static inline struct sk_buff *l3mdev_ip6_rcv(struct sk_buff *skb) { return l3mdev_l3_rcv(skb, AF_INET6); } static inline struct sk_buff *l3mdev_l3_out(struct sock *sk, struct sk_buff *skb, u16 proto) { struct net_device *dev = skb_dst(skb)->dev; if (netif_is_l3_slave(dev)) { struct net_device *master; rcu_read_lock(); master = netdev_master_upper_dev_get_rcu(dev); if (master && master->l3mdev_ops->l3mdev_l3_out) skb = master->l3mdev_ops->l3mdev_l3_out(master, sk, skb, proto); rcu_read_unlock(); } return skb; } static inline struct sk_buff *l3mdev_ip_out(struct sock *sk, struct sk_buff *skb) { return l3mdev_l3_out(sk, skb, AF_INET); } static inline struct sk_buff *l3mdev_ip6_out(struct sock *sk, struct sk_buff *skb) { return l3mdev_l3_out(sk, skb, AF_INET6); } #else static inline int l3mdev_master_ifindex_rcu(const struct net_device *dev) { return 0; } static inline int l3mdev_master_ifindex(struct net_device *dev) { return 0; } static inline int l3mdev_master_ifindex_by_index(struct net *net, int ifindex) { return 0; } static inline int l3mdev_master_upper_ifindex_by_index_rcu(struct net *net, int ifindex) { return 0; } static inline int l3mdev_master_upper_ifindex_by_index(struct net *net, int ifindex) { return 0; } static inline struct net_device *l3mdev_master_dev_rcu(const struct net_device *dev) { return NULL; } static inline u32 l3mdev_fib_table_rcu(const struct net_device *dev) { return 0; } static inline u32 l3mdev_fib_table(const struct net_device *dev) { return 0; } static inline u32 l3mdev_fib_table_by_index(struct net *net, int ifindex) { return 0; } static inline bool netif_index_is_l3_master(struct net *net, int ifindex) { return false; } static inline struct dst_entry *l3mdev_link_scope_lookup(struct net *net, struct flowi6 *fl6) { return NULL; } static inline struct sk_buff *l3mdev_ip_rcv(struct sk_buff *skb) { return skb; } static inline struct sk_buff *l3mdev_ip6_rcv(struct sk_buff *skb) { return skb; } static inline struct sk_buff *l3mdev_ip_out(struct sock *sk, struct sk_buff *skb) { return skb; } static inline struct sk_buff *l3mdev_ip6_out(struct sock *sk, struct sk_buff *skb) { return skb; } static inline int l3mdev_table_lookup_register(enum l3mdev_type l3type, lookup_by_table_id_t fn) { return -EOPNOTSUPP; } static inline void l3mdev_table_lookup_unregister(enum l3mdev_type l3type, lookup_by_table_id_t fn) { } static inline int l3mdev_ifindex_lookup_by_table_id(enum l3mdev_type l3type, struct net *net, u32 table_id) { return -ENODEV; } static inline int l3mdev_fib_rule_match(struct net *net, struct flowi *fl, struct fib_lookup_arg *arg) { return 1; } static inline bool l3mdev_fib_rule_iif_match(const struct flowi *fl, int iifindex) { return false; } static inline bool l3mdev_fib_rule_oif_match(const struct flowi *fl, int oifindex) { return false; } static inline void l3mdev_update_flow(struct net *net, struct flowi *fl) { } #endif #endif /* _NET_L3MDEV_H_ */ |
| 25 25 25 25 4 4 4 3 3 3 7 7 7 8 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 | /* SPDX-License-Identifier: GPL-2.0-only */ /* * Copyright (c) 2008, Intel Corporation. * * Author: Alexander Duyck <alexander.h.duyck@intel.com> */ #ifndef __NET_TC_SKBEDIT_H #define __NET_TC_SKBEDIT_H #include <net/act_api.h> #include <linux/tc_act/tc_skbedit.h> struct tcf_skbedit_params { int action; u32 flags; u32 priority; u32 mark; u32 mask; u16 queue_mapping; u16 mapping_mod; u16 ptype; struct rcu_head rcu; }; struct tcf_skbedit { struct tc_action common; struct tcf_skbedit_params __rcu *params; }; #define to_skbedit(a) ((struct tcf_skbedit *)a) /* Return true iff action is the one identified by FLAG. */ static inline bool is_tcf_skbedit_with_flag(const struct tc_action *a, u32 flag) { #ifdef CONFIG_NET_CLS_ACT u32 flags; if (a->ops && a->ops->id == TCA_ID_SKBEDIT) { rcu_read_lock(); flags = rcu_dereference(to_skbedit(a)->params)->flags; rcu_read_unlock(); return flags == flag; } #endif return false; } /* Return true iff action is mark */ static inline bool is_tcf_skbedit_mark(const struct tc_action *a) { return is_tcf_skbedit_with_flag(a, SKBEDIT_F_MARK); } static inline u32 tcf_skbedit_mark(const struct tc_action *a) { u32 mark; rcu_read_lock(); mark = rcu_dereference(to_skbedit(a)->params)->mark; rcu_read_unlock(); return mark; } /* Return true iff action is ptype */ static inline bool is_tcf_skbedit_ptype(const struct tc_action *a) { return is_tcf_skbedit_with_flag(a, SKBEDIT_F_PTYPE); } static inline u32 tcf_skbedit_ptype(const struct tc_action *a) { u16 ptype; rcu_read_lock(); ptype = rcu_dereference(to_skbedit(a)->params)->ptype; rcu_read_unlock(); return ptype; } /* Return true iff action is priority */ static inline bool is_tcf_skbedit_priority(const struct tc_action *a) { return is_tcf_skbedit_with_flag(a, SKBEDIT_F_PRIORITY); } static inline u32 tcf_skbedit_priority(const struct tc_action *a) { u32 priority; rcu_read_lock(); priority = rcu_dereference(to_skbedit(a)->params)->priority; rcu_read_unlock(); return priority; } static inline u16 tcf_skbedit_rx_queue_mapping(const struct tc_action *a) { u16 rx_queue; rcu_read_lock(); rx_queue = rcu_dereference(to_skbedit(a)->params)->queue_mapping; rcu_read_unlock(); return rx_queue; } /* Return true iff action is queue_mapping */ static inline bool is_tcf_skbedit_queue_mapping(const struct tc_action *a) { return is_tcf_skbedit_with_flag(a, SKBEDIT_F_QUEUE_MAPPING); } /* Return true if action is on ingress traffic */ static inline bool is_tcf_skbedit_ingress(u32 flags) { return flags & TCA_ACT_FLAGS_AT_INGRESS; } static inline bool is_tcf_skbedit_tx_queue_mapping(const struct tc_action *a) { return is_tcf_skbedit_queue_mapping(a) && !is_tcf_skbedit_ingress(a->tcfa_flags); } static inline bool is_tcf_skbedit_rx_queue_mapping(const struct tc_action *a) { return is_tcf_skbedit_queue_mapping(a) && is_tcf_skbedit_ingress(a->tcfa_flags); } /* Return true iff action is inheritdsfield */ static inline bool is_tcf_skbedit_inheritdsfield(const struct tc_action *a) { return is_tcf_skbedit_with_flag(a, SKBEDIT_F_INHERITDSFIELD); } #endif /* __NET_TC_SKBEDIT_H */ |
| 9 9 9 9 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 | // SPDX-License-Identifier: GPL-2.0-or-later /* * Crypto API support for SHA-1 and HMAC-SHA1 * * Copyright (c) Alan Smithee. * Copyright (c) Andrew McDonald <andrew@mcdonald.org.uk> * Copyright (c) Jean-Francois Dive <jef@linuxbe.org> * Copyright 2025 Google LLC */ #include <crypto/internal/hash.h> #include <crypto/sha1.h> #include <linux/kernel.h> #include <linux/module.h> /* * Export and import functions. crypto_shash wants a particular format that * matches that used by some legacy drivers. It currently is the same as the * library SHA context, except the value in bytecount must be block-aligned and * the remainder must be stored in an extra u8 appended to the struct. */ #define SHA1_SHASH_STATE_SIZE (sizeof(struct sha1_ctx) + 1) static_assert(sizeof(struct sha1_ctx) == sizeof(struct sha1_state)); static_assert(offsetof(struct sha1_ctx, state) == offsetof(struct sha1_state, state)); static_assert(offsetof(struct sha1_ctx, bytecount) == offsetof(struct sha1_state, count)); static_assert(offsetof(struct sha1_ctx, buf) == offsetof(struct sha1_state, buffer)); static int __crypto_sha1_export(const struct sha1_ctx *ctx0, void *out) { struct sha1_ctx ctx = *ctx0; unsigned int partial; u8 *p = out; partial = ctx.bytecount % SHA1_BLOCK_SIZE; ctx.bytecount -= partial; memcpy(p, &ctx, sizeof(ctx)); p += sizeof(ctx); *p = partial; return 0; } static int __crypto_sha1_import(struct sha1_ctx *ctx, const void *in) { const u8 *p = in; memcpy(ctx, p, sizeof(*ctx)); p += sizeof(*ctx); ctx->bytecount += *p; return 0; } static int __crypto_sha1_export_core(const struct sha1_ctx *ctx, void *out) { memcpy(out, ctx, offsetof(struct sha1_ctx, buf)); return 0; } static int __crypto_sha1_import_core(struct sha1_ctx *ctx, const void *in) { memcpy(ctx, in, offsetof(struct sha1_ctx, buf)); return 0; } const u8 sha1_zero_message_hash[SHA1_DIGEST_SIZE] = { 0xda, 0x39, 0xa3, 0xee, 0x5e, 0x6b, 0x4b, 0x0d, 0x32, 0x55, 0xbf, 0xef, 0x95, 0x60, 0x18, 0x90, 0xaf, 0xd8, 0x07, 0x09 }; EXPORT_SYMBOL_GPL(sha1_zero_message_hash); #define SHA1_CTX(desc) ((struct sha1_ctx *)shash_desc_ctx(desc)) static int crypto_sha1_init(struct shash_desc *desc) { sha1_init(SHA1_CTX(desc)); return 0; } static int crypto_sha1_update(struct shash_desc *desc, const u8 *data, unsigned int len) { sha1_update(SHA1_CTX(desc), data, len); return 0; } static int crypto_sha1_final(struct shash_desc *desc, u8 *out) { sha1_final(SHA1_CTX(desc), out); return 0; } static int crypto_sha1_digest(struct shash_desc *desc, const u8 *data, unsigned int len, u8 *out) { sha1(data, len, out); return 0; } static int crypto_sha1_export(struct shash_desc *desc, void *out) { return __crypto_sha1_export(SHA1_CTX(desc), out); } static int crypto_sha1_import(struct shash_desc *desc, const void *in) { return __crypto_sha1_import(SHA1_CTX(desc), in); } static int crypto_sha1_export_core(struct shash_desc *desc, void *out) { return __crypto_sha1_export_core(SHA1_CTX(desc), out); } static int crypto_sha1_import_core(struct shash_desc *desc, const void *in) { return __crypto_sha1_import_core(SHA1_CTX(desc), in); } #define HMAC_SHA1_KEY(tfm) ((struct hmac_sha1_key *)crypto_shash_ctx(tfm)) #define HMAC_SHA1_CTX(desc) ((struct hmac_sha1_ctx *)shash_desc_ctx(desc)) static int crypto_hmac_sha1_setkey(struct crypto_shash *tfm, const u8 *raw_key, unsigned int keylen) { hmac_sha1_preparekey(HMAC_SHA1_KEY(tfm), raw_key, keylen); return 0; } static int crypto_hmac_sha1_init(struct shash_desc *desc) { hmac_sha1_init(HMAC_SHA1_CTX(desc), HMAC_SHA1_KEY(desc->tfm)); return 0; } static int crypto_hmac_sha1_update(struct shash_desc *desc, const u8 *data, unsigned int len) { hmac_sha1_update(HMAC_SHA1_CTX(desc), data, len); return 0; } static int crypto_hmac_sha1_final(struct shash_desc *desc, u8 *out) { hmac_sha1_final(HMAC_SHA1_CTX(desc), out); return 0; } static int crypto_hmac_sha1_digest(struct shash_desc *desc, const u8 *data, unsigned int len, u8 *out) { hmac_sha1(HMAC_SHA1_KEY(desc->tfm), data, len, out); return 0; } static int crypto_hmac_sha1_export(struct shash_desc *desc, void *out) { return __crypto_sha1_export(&HMAC_SHA1_CTX(desc)->sha_ctx, out); } static int crypto_hmac_sha1_import(struct shash_desc *desc, const void *in) { struct hmac_sha1_ctx *ctx = HMAC_SHA1_CTX(desc); ctx->ostate = HMAC_SHA1_KEY(desc->tfm)->ostate; return __crypto_sha1_import(&ctx->sha_ctx, in); } static int crypto_hmac_sha1_export_core(struct shash_desc *desc, void *out) { return __crypto_sha1_export_core(&HMAC_SHA1_CTX(desc)->sha_ctx, out); } static int crypto_hmac_sha1_import_core(struct shash_desc *desc, const void *in) { struct hmac_sha1_ctx *ctx = HMAC_SHA1_CTX(desc); ctx->ostate = HMAC_SHA1_KEY(desc->tfm)->ostate; return __crypto_sha1_import_core(&ctx->sha_ctx, in); } static struct shash_alg algs[] = { { .base.cra_name = "sha1", .base.cra_driver_name = "sha1-lib", .base.cra_priority = 300, .base.cra_blocksize = SHA1_BLOCK_SIZE, .base.cra_module = THIS_MODULE, .digestsize = SHA1_DIGEST_SIZE, .init = crypto_sha1_init, .update = crypto_sha1_update, .final = crypto_sha1_final, .digest = crypto_sha1_digest, .export = crypto_sha1_export, .import = crypto_sha1_import, .export_core = crypto_sha1_export_core, .import_core = crypto_sha1_import_core, .descsize = sizeof(struct sha1_ctx), .statesize = SHA1_SHASH_STATE_SIZE, }, { .base.cra_name = "hmac(sha1)", .base.cra_driver_name = "hmac-sha1-lib", .base.cra_priority = 300, .base.cra_blocksize = SHA1_BLOCK_SIZE, .base.cra_ctxsize = sizeof(struct hmac_sha1_key), .base.cra_module = THIS_MODULE, .digestsize = SHA1_DIGEST_SIZE, .setkey = crypto_hmac_sha1_setkey, .init = crypto_hmac_sha1_init, .update = crypto_hmac_sha1_update, .final = crypto_hmac_sha1_final, .digest = crypto_hmac_sha1_digest, .export = crypto_hmac_sha1_export, .import = crypto_hmac_sha1_import, .export_core = crypto_hmac_sha1_export_core, .import_core = crypto_hmac_sha1_import_core, .descsize = sizeof(struct hmac_sha1_ctx), .statesize = SHA1_SHASH_STATE_SIZE, }, }; static int __init crypto_sha1_mod_init(void) { return crypto_register_shashes(algs, ARRAY_SIZE(algs)); } module_init(crypto_sha1_mod_init); static void __exit crypto_sha1_mod_exit(void) { crypto_unregister_shashes(algs, ARRAY_SIZE(algs)); } module_exit(crypto_sha1_mod_exit); MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("Crypto API support for SHA-1 and HMAC-SHA1"); MODULE_ALIAS_CRYPTO("sha1"); MODULE_ALIAS_CRYPTO("sha1-lib"); MODULE_ALIAS_CRYPTO("hmac(sha1)"); MODULE_ALIAS_CRYPTO("hmac-sha1-lib"); |
| 17 17 17 7 7 7 7 7 7 7 7 7 7 24 24 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 | // SPDX-License-Identifier: GPL-2.0 /* * Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved. */ #include "peer.h" #include "device.h" #include "queueing.h" #include "timers.h" #include "peerlookup.h" #include "noise.h" #include <linux/kref.h> #include <linux/lockdep.h> #include <linux/rcupdate.h> #include <linux/list.h> static struct kmem_cache *peer_cache; static atomic64_t peer_counter = ATOMIC64_INIT(0); struct wg_peer *wg_peer_create(struct wg_device *wg, const u8 public_key[NOISE_PUBLIC_KEY_LEN], const u8 preshared_key[NOISE_SYMMETRIC_KEY_LEN]) { struct wg_peer *peer; int ret = -ENOMEM; lockdep_assert_held(&wg->device_update_lock); if (wg->num_peers >= MAX_PEERS_PER_DEVICE) return ERR_PTR(ret); peer = kmem_cache_zalloc(peer_cache, GFP_KERNEL); if (unlikely(!peer)) return ERR_PTR(ret); if (unlikely(dst_cache_init(&peer->endpoint_cache, GFP_KERNEL))) goto err; peer->device = wg; wg_noise_handshake_init(&peer->handshake, &wg->static_identity, public_key, preshared_key, peer); peer->internal_id = atomic64_inc_return(&peer_counter); peer->serial_work_cpu = nr_cpumask_bits; wg_cookie_init(&peer->latest_cookie); wg_timers_init(peer); wg_cookie_checker_precompute_peer_keys(peer); spin_lock_init(&peer->keypairs.keypair_update_lock); INIT_WORK(&peer->transmit_handshake_work, wg_packet_handshake_send_worker); INIT_WORK(&peer->transmit_packet_work, wg_packet_tx_worker); wg_prev_queue_init(&peer->tx_queue); wg_prev_queue_init(&peer->rx_queue); rwlock_init(&peer->endpoint_lock); kref_init(&peer->refcount); skb_queue_head_init(&peer->staged_packet_queue); wg_noise_reset_last_sent_handshake(&peer->last_sent_handshake); set_bit(NAPI_STATE_NO_BUSY_POLL, &peer->napi.state); netif_napi_add(wg->dev, &peer->napi, wg_packet_rx_poll); napi_enable(&peer->napi); list_add_tail(&peer->peer_list, &wg->peer_list); INIT_LIST_HEAD(&peer->allowedips_list); wg_pubkey_hashtable_add(wg->peer_hashtable, peer); ++wg->num_peers; pr_debug("%s: Peer %llu created\n", wg->dev->name, peer->internal_id); return peer; err: kmem_cache_free(peer_cache, peer); return ERR_PTR(ret); } struct wg_peer *wg_peer_get_maybe_zero(struct wg_peer *peer) { RCU_LOCKDEP_WARN(!rcu_read_lock_bh_held(), "Taking peer reference without holding the RCU read lock"); if (unlikely(!peer || !kref_get_unless_zero(&peer->refcount))) return NULL; return peer; } static void peer_make_dead(struct wg_peer *peer) { /* Remove from configuration-time lookup structures. */ list_del_init(&peer->peer_list); wg_allowedips_remove_by_peer(&peer->device->peer_allowedips, peer, &peer->device->device_update_lock); wg_pubkey_hashtable_remove(peer->device->peer_hashtable, peer); /* Mark as dead, so that we don't allow jumping contexts after. */ WRITE_ONCE(peer->is_dead, true); /* The caller must now synchronize_net() for this to take effect. */ } static void peer_remove_after_dead(struct wg_peer *peer) { WARN_ON(!peer->is_dead); /* No more keypairs can be created for this peer, since is_dead protects * add_new_keypair, so we can now destroy existing ones. */ wg_noise_keypairs_clear(&peer->keypairs); /* Destroy all ongoing timers that were in-flight at the beginning of * this function. */ wg_timers_stop(peer); /* The transition between packet encryption/decryption queues isn't * guarded by is_dead, but each reference's life is strictly bounded by * two generations: once for parallel crypto and once for serial * ingestion, so we can simply flush twice, and be sure that we no * longer have references inside these queues. */ /* a) For encrypt/decrypt. */ flush_workqueue(peer->device->packet_crypt_wq); /* b.1) For send (but not receive, since that's napi). */ flush_workqueue(peer->device->packet_crypt_wq); /* b.2.1) For receive (but not send, since that's wq). */ napi_disable(&peer->napi); /* b.2.1) It's now safe to remove the napi struct, which must be done * here from process context. */ netif_napi_del(&peer->napi); /* Ensure any workstructs we own (like transmit_handshake_work or * clear_peer_work) no longer are in use. */ flush_workqueue(peer->device->handshake_send_wq); /* After the above flushes, a peer might still be active in a few * different contexts: 1) from xmit(), before hitting is_dead and * returning, 2) from wg_packet_consume_data(), before hitting is_dead * and returning, 3) from wg_receive_handshake_packet() after a point * where it has processed an incoming handshake packet, but where * all calls to pass it off to timers fails because of is_dead. We won't * have new references in (1) eventually, because we're removed from * allowedips; we won't have new references in (2) eventually, because * wg_index_hashtable_lookup will always return NULL, since we removed * all existing keypairs and no more can be created; we won't have new * references in (3) eventually, because we're removed from the pubkey * hash table, which allows for a maximum of one handshake response, * via the still-uncleared index hashtable entry, but not more than one, * and in wg_cookie_message_consume, the lookup eventually gets a peer * with a refcount of zero, so no new reference is taken. */ --peer->device->num_peers; wg_peer_put(peer); } /* We have a separate "remove" function make sure that all active places where * a peer is currently operating will eventually come to an end and not pass * their reference onto another context. */ void wg_peer_remove(struct wg_peer *peer) { if (unlikely(!peer)) return; lockdep_assert_held(&peer->device->device_update_lock); peer_make_dead(peer); synchronize_net(); peer_remove_after_dead(peer); } void wg_peer_remove_all(struct wg_device *wg) { struct wg_peer *peer, *temp; LIST_HEAD(dead_peers); lockdep_assert_held(&wg->device_update_lock); /* Avoid having to traverse individually for each one. */ wg_allowedips_free(&wg->peer_allowedips, &wg->device_update_lock); list_for_each_entry_safe(peer, temp, &wg->peer_list, peer_list) { peer_make_dead(peer); list_add_tail(&peer->peer_list, &dead_peers); } synchronize_net(); list_for_each_entry_safe(peer, temp, &dead_peers, peer_list) peer_remove_after_dead(peer); } static void rcu_release(struct rcu_head *rcu) { struct wg_peer *peer = container_of(rcu, struct wg_peer, rcu); dst_cache_destroy(&peer->endpoint_cache); WARN_ON(wg_prev_queue_peek(&peer->tx_queue) || wg_prev_queue_peek(&peer->rx_queue)); /* The final zeroing takes care of clearing any remaining handshake key * material and other potentially sensitive information. */ memzero_explicit(peer, sizeof(*peer)); kmem_cache_free(peer_cache, peer); } static void kref_release(struct kref *refcount) { struct wg_peer *peer = container_of(refcount, struct wg_peer, refcount); pr_debug("%s: Peer %llu (%pISpfsc) destroyed\n", peer->device->dev->name, peer->internal_id, &peer->endpoint.addr); /* Remove ourself from dynamic runtime lookup structures, now that the * last reference is gone. */ wg_index_hashtable_remove(peer->device->index_hashtable, &peer->handshake.entry); /* Remove any lingering packets that didn't have a chance to be * transmitted. */ wg_packet_purge_staged_packets(peer); /* Free the memory used. */ call_rcu(&peer->rcu, rcu_release); } void wg_peer_put(struct wg_peer *peer) { if (unlikely(!peer)) return; kref_put(&peer->refcount, kref_release); } int __init wg_peer_init(void) { peer_cache = KMEM_CACHE(wg_peer, 0); return peer_cache ? 0 : -ENOMEM; } void wg_peer_uninit(void) { kmem_cache_destroy(peer_cache); } |
| 138 469 47 4973 744 28 358 4415 501 47 68 180 60 94 37 863 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 | /* * Performance events: * * Copyright (C) 2008-2009, Thomas Gleixner <tglx@linutronix.de> * Copyright (C) 2008-2011, Red Hat, Inc., Ingo Molnar * Copyright (C) 2008-2011, Red Hat, Inc., Peter Zijlstra * * Data type definitions, declarations, prototypes. * * Started by: Thomas Gleixner and Ingo Molnar * * For licencing details see kernel-base/COPYING */ #ifndef _LINUX_PERF_EVENT_H #define _LINUX_PERF_EVENT_H #include <uapi/linux/perf_event.h> #include <uapi/linux/bpf_perf_event.h> /* * Kernel-internal data types and definitions: */ #ifdef CONFIG_PERF_EVENTS # include <asm/perf_event.h> # include <asm/local64.h> #endif #ifdef CONFIG_HAVE_HW_BREAKPOINT # include <linux/rhashtable-types.h> # include <asm/hw_breakpoint.h> #endif #include <linux/list.h> #include <linux/mutex.h> #include <linux/rculist.h> #include <linux/rcupdate.h> #include <linux/spinlock.h> #include <linux/hrtimer.h> #include <linux/fs.h> #include <linux/pid_namespace.h> #include <linux/workqueue.h> #include <linux/ftrace.h> #include <linux/cpu.h> #include <linux/irq_work.h> #include <linux/static_key.h> #include <linux/jump_label_ratelimit.h> #include <linux/atomic.h> #include <linux/sysfs.h> #include <linux/perf_regs.h> #include <linux/cgroup.h> #include <linux/refcount.h> #include <linux/security.h> #include <linux/static_call.h> #include <linux/lockdep.h> #include <asm/local.h> struct perf_callchain_entry { u64 nr; u64 ip[]; /* /proc/sys/kernel/perf_event_max_stack */ }; struct perf_callchain_entry_ctx { struct perf_callchain_entry *entry; u32 max_stack; u32 nr; short contexts; bool contexts_maxed; }; typedef unsigned long (*perf_copy_f)(void *dst, const void *src, unsigned long off, unsigned long len); struct perf_raw_frag { union { struct perf_raw_frag *next; unsigned long pad; }; perf_copy_f copy; void *data; u32 size; } __packed; struct perf_raw_record { struct perf_raw_frag frag; u32 size; }; static __always_inline bool perf_raw_frag_last(const struct perf_raw_frag *frag) { return frag->pad < sizeof(u64); } /* * branch stack layout: * nr: number of taken branches stored in entries[] * hw_idx: The low level index of raw branch records * for the most recent branch. * -1ULL means invalid/unknown. * * Note that nr can vary from sample to sample * branches (to, from) are stored from most recent * to least recent, i.e., entries[0] contains the most * recent branch. * The entries[] is an abstraction of raw branch records, * which may not be stored in age order in HW, e.g. Intel LBR. * The hw_idx is to expose the low level index of raw * branch record for the most recent branch aka entries[0]. * The hw_idx index is between -1 (unknown) and max depth, * which can be retrieved in /sys/devices/cpu/caps/branches. * For the architectures whose raw branch records are * already stored in age order, the hw_idx should be 0. */ struct perf_branch_stack { u64 nr; u64 hw_idx; struct perf_branch_entry entries[]; }; struct task_struct; /* * extra PMU register associated with an event */ struct hw_perf_event_extra { u64 config; /* register value */ unsigned int reg; /* register address or index */ int alloc; /* extra register already allocated */ int idx; /* index in shared_regs->regs[] */ }; /** * hw_perf_event::flag values * * PERF_EVENT_FLAG_ARCH bits are reserved for architecture-specific * usage. */ #define PERF_EVENT_FLAG_ARCH 0x0fffffff #define PERF_EVENT_FLAG_USER_READ_CNT 0x80000000 static_assert((PERF_EVENT_FLAG_USER_READ_CNT & PERF_EVENT_FLAG_ARCH) == 0); /** * struct hw_perf_event - performance event hardware details: */ struct hw_perf_event { #ifdef CONFIG_PERF_EVENTS union { struct { /* hardware */ u64 config; u64 config1; u64 last_tag; u64 dyn_constraint; unsigned long config_base; unsigned long event_base; int event_base_rdpmc; int idx; int last_cpu; int flags; struct hw_perf_event_extra extra_reg; struct hw_perf_event_extra branch_reg; }; struct { /* aux / Intel-PT */ u64 aux_config; /* * For AUX area events, aux_paused cannot be a state * flag because it can be updated asynchronously to * state. */ unsigned int aux_paused; }; struct { /* software */ struct hrtimer hrtimer; }; struct { /* tracepoint */ /* for tp_event->class */ struct list_head tp_list; }; struct { /* amd_power */ u64 pwr_acc; u64 ptsc; }; #ifdef CONFIG_HAVE_HW_BREAKPOINT struct { /* breakpoint */ /* * Crufty hack to avoid the chicken and egg * problem hw_breakpoint has with context * creation and event initalization. */ struct arch_hw_breakpoint info; struct rhlist_head bp_list; }; #endif struct { /* amd_iommu */ u8 iommu_bank; u8 iommu_cntr; u16 padding; u64 conf; u64 conf1; }; }; /* * If the event is a per task event, this will point to the task in * question. See the comment in perf_event_alloc(). */ struct task_struct *target; /* * PMU would store hardware filter configuration * here. */ void *addr_filters; /* Last sync'ed generation of filters */ unsigned long addr_filters_gen; /* * hw_perf_event::state flags; used to track the PERF_EF_* state. */ /* the counter is stopped */ #define PERF_HES_STOPPED 0x01 /* event->count up-to-date */ #define PERF_HES_UPTODATE 0x02 #define PERF_HES_ARCH 0x04 int state; /* * The last observed hardware counter value, updated with a * local64_cmpxchg() such that pmu::read() can be called nested. */ local64_t prev_count; /* * The period to start the next sample with. */ u64 sample_period; union { struct { /* Sampling */ /* * The period we started this sample with. */ u64 last_period; /* * However much is left of the current period; * note that this is a full 64bit value and * allows for generation of periods longer * than hardware might allow. */ local64_t period_left; }; struct { /* Topdown events counting for context switch */ u64 saved_metric; u64 saved_slots; }; }; /* * State for throttling the event, see __perf_event_overflow() and * perf_adjust_freq_unthr_context(). */ u64 interrupts_seq; u64 interrupts; /* * State for freq target events, see __perf_event_overflow() and * perf_adjust_freq_unthr_context(). */ u64 freq_time_stamp; u64 freq_count_stamp; #endif /* CONFIG_PERF_EVENTS */ }; struct perf_event; struct perf_event_pmu_context; /* * Common implementation detail of pmu::{start,commit,cancel}_txn */ /* txn to add/schedule event on PMU */ #define PERF_PMU_TXN_ADD 0x1 /* txn to read event group from PMU */ #define PERF_PMU_TXN_READ 0x2 /** * pmu::capabilities flags */ #define PERF_PMU_CAP_NO_INTERRUPT 0x0001 #define PERF_PMU_CAP_NO_NMI 0x0002 #define PERF_PMU_CAP_AUX_NO_SG 0x0004 #define PERF_PMU_CAP_EXTENDED_REGS 0x0008 #define PERF_PMU_CAP_EXCLUSIVE 0x0010 #define PERF_PMU_CAP_ITRACE 0x0020 #define PERF_PMU_CAP_NO_EXCLUDE 0x0040 #define PERF_PMU_CAP_AUX_OUTPUT 0x0080 #define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100 #define PERF_PMU_CAP_AUX_PAUSE 0x0200 #define PERF_PMU_CAP_AUX_PREFER_LARGE 0x0400 /** * pmu::scope */ enum perf_pmu_scope { PERF_PMU_SCOPE_NONE = 0, PERF_PMU_SCOPE_CORE, PERF_PMU_SCOPE_DIE, PERF_PMU_SCOPE_CLUSTER, PERF_PMU_SCOPE_PKG, PERF_PMU_SCOPE_SYS_WIDE, PERF_PMU_MAX_SCOPE, }; struct perf_output_handle; #define PMU_NULL_DEV ((void *)(~0UL)) /** * struct pmu - generic performance monitoring unit */ struct pmu { struct list_head entry; spinlock_t events_lock; struct list_head events; struct module *module; struct device *dev; struct device *parent; const struct attribute_group **attr_groups; const struct attribute_group **attr_update; const char *name; int type; /* * various common per-pmu feature flags */ int capabilities; /* * PMU scope */ unsigned int scope; struct perf_cpu_pmu_context * __percpu *cpu_pmu_context; atomic_t exclusive_cnt; /* < 0: cpu; > 0: tsk */ int task_ctx_nr; int hrtimer_interval_ms; /* number of address filters this PMU can do */ unsigned int nr_addr_filters; /* * Fully disable/enable this PMU, can be used to protect from the PMI * as well as for lazy/batch writing of the MSRs. */ void (*pmu_enable) (struct pmu *pmu); /* optional */ void (*pmu_disable) (struct pmu *pmu); /* optional */ /* * Try and initialize the event for this PMU. * * Returns: * -ENOENT -- @event is not for this PMU * * -ENODEV -- @event is for this PMU but PMU not present * -EBUSY -- @event is for this PMU but PMU temporarily unavailable * -EINVAL -- @event is for this PMU but @event is not valid * -EOPNOTSUPP -- @event is for this PMU, @event is valid, but not supported * -EACCES -- @event is for this PMU, @event is valid, but no privileges * * 0 -- @event is for this PMU and valid * * Other error return values are allowed. */ int (*event_init) (struct perf_event *event); /* * Notification that the event was mapped or unmapped. Called * in the context of the mapping task. */ void (*event_mapped) (struct perf_event *event, struct mm_struct *mm); /* optional */ void (*event_unmapped) (struct perf_event *event, struct mm_struct *mm); /* optional */ /* * Flags for ->add()/->del()/ ->start()/->stop(). There are * matching hw_perf_event::state flags. */ /* start the counter when adding */ #define PERF_EF_START 0x01 /* reload the counter when starting */ #define PERF_EF_RELOAD 0x02 /* update the counter when stopping */ #define PERF_EF_UPDATE 0x04 /* AUX area event, pause tracing */ #define PERF_EF_PAUSE 0x08 /* AUX area event, resume tracing */ #define PERF_EF_RESUME 0x10 /* * Adds/Removes a counter to/from the PMU, can be done inside a * transaction, see the ->*_txn() methods. * * The add/del callbacks will reserve all hardware resources required * to service the event, this includes any counter constraint * scheduling etc. * * Called with IRQs disabled and the PMU disabled on the CPU the event * is on. * * ->add() called without PERF_EF_START should result in the same state * as ->add() followed by ->stop(). * * ->del() must always PERF_EF_UPDATE stop an event. If it calls * ->stop() that must deal with already being stopped without * PERF_EF_UPDATE. */ int (*add) (struct perf_event *event, int flags); void (*del) (struct perf_event *event, int flags); /* * Starts/Stops a counter present on the PMU. * * The PMI handler should stop the counter when perf_event_overflow() * returns !0. ->start() will be used to continue. * * Also used to change the sample period. * * Called with IRQs disabled and the PMU disabled on the CPU the event * is on -- will be called from NMI context with the PMU generates * NMIs. * * ->stop() with PERF_EF_UPDATE will read the counter and update * period/count values like ->read() would. * * ->start() with PERF_EF_RELOAD will reprogram the counter * value, must be preceded by a ->stop() with PERF_EF_UPDATE. * * ->stop() with PERF_EF_PAUSE will stop as simply as possible. Will not * overlap another ->stop() with PERF_EF_PAUSE nor ->start() with * PERF_EF_RESUME. * * ->start() with PERF_EF_RESUME will start as simply as possible but * only if the counter is not otherwise stopped. Will not overlap * another ->start() with PERF_EF_RESUME nor ->stop() with * PERF_EF_PAUSE. * * Notably, PERF_EF_PAUSE/PERF_EF_RESUME *can* be concurrent with other * ->stop()/->start() invocations, just not itself. */ void (*start) (struct perf_event *event, int flags); void (*stop) (struct perf_event *event, int flags); /* * Updates the counter value of the event. * * For sampling capable PMUs this will also update the software period * hw_perf_event::period_left field. */ void (*read) (struct perf_event *event); /* * Group events scheduling is treated as a transaction, add * group events as a whole and perform one schedulability test. * If the test fails, roll back the whole group * * Start the transaction, after this ->add() doesn't need to * do schedulability tests. * * Optional. */ void (*start_txn) (struct pmu *pmu, unsigned int txn_flags); /* * If ->start_txn() disabled the ->add() schedulability test * then ->commit_txn() is required to perform one. On success * the transaction is closed. On error the transaction is kept * open until ->cancel_txn() is called. * * Optional. */ int (*commit_txn) (struct pmu *pmu); /* * Will cancel the transaction, assumes ->del() is called * for each successful ->add() during the transaction. * * Optional. */ void (*cancel_txn) (struct pmu *pmu); /* * Will return the value for perf_event_mmap_page::index for this event, * if no implementation is provided it will default to 0 (see * perf_event_idx_default). */ int (*event_idx) (struct perf_event *event); /*optional */ /* * context-switches callback */ void (*sched_task) (struct perf_event_pmu_context *pmu_ctx, struct task_struct *task, bool sched_in); /* * Kmem cache of PMU specific data */ struct kmem_cache *task_ctx_cache; /* * Set up pmu-private data structures for an AUX area */ void *(*setup_aux) (struct perf_event *event, void **pages, int nr_pages, bool overwrite); /* optional */ /* * Free pmu-private AUX data structures */ void (*free_aux) (void *aux); /* optional */ /* * Take a snapshot of the AUX buffer without touching the event * state, so that preempting ->start()/->stop() callbacks does * not interfere with their logic. Called in PMI context. * * Returns the size of AUX data copied to the output handle. * * Optional. */ long (*snapshot_aux) (struct perf_event *event, struct perf_output_handle *handle, unsigned long size); /* * Validate address range filters: make sure the HW supports the * requested configuration and number of filters; return 0 if the * supplied filters are valid, -errno otherwise. * * Runs in the context of the ioctl()ing process and is not serialized * with the rest of the PMU callbacks. */ int (*addr_filters_validate) (struct list_head *filters); /* optional */ /* * Synchronize address range filter configuration: * translate hw-agnostic filters into hardware configuration in * event::hw::addr_filters. * * Runs as a part of filter sync sequence that is done in ->start() * callback by calling perf_event_addr_filters_sync(). * * May (and should) traverse event::addr_filters::list, for which its * caller provides necessary serialization. */ void (*addr_filters_sync) (struct perf_event *event); /* optional */ /* * Check if event can be used for aux_output purposes for * events of this PMU. * * Runs from perf_event_open(). Should return 0 for "no match" * or non-zero for "match". */ int (*aux_output_match) (struct perf_event *event); /* optional */ /* * Skip programming this PMU on the given CPU. Typically needed for * big.LITTLE things. */ bool (*filter) (struct pmu *pmu, int cpu); /* optional */ /* * Check period value for PERF_EVENT_IOC_PERIOD ioctl. */ int (*check_period) (struct perf_event *event, u64 value); /* optional */ }; enum perf_addr_filter_action_t { PERF_ADDR_FILTER_ACTION_STOP = 0, PERF_ADDR_FILTER_ACTION_START, PERF_ADDR_FILTER_ACTION_FILTER, }; /** * struct perf_addr_filter - address range filter definition * @entry: event's filter list linkage * @path: object file's path for file-based filters * @offset: filter range offset * @size: filter range size (size==0 means single address trigger) * @action: filter/start/stop * * This is a hardware-agnostic filter configuration as specified by the user. */ struct perf_addr_filter { struct list_head entry; struct path path; unsigned long offset; unsigned long size; enum perf_addr_filter_action_t action; }; /** * struct perf_addr_filters_head - container for address range filters * @list: list of filters for this event * @lock: spinlock that serializes accesses to the @list and event's * (and its children's) filter generations. * @nr_file_filters: number of file-based filters * * A child event will use parent's @list (and therefore @lock), so they are * bundled together; see perf_event_addr_filters(). */ struct perf_addr_filters_head { struct list_head list; raw_spinlock_t lock; unsigned int nr_file_filters; }; struct perf_addr_filter_range { unsigned long start; unsigned long size; }; /* * The normal states are: * * ACTIVE --. * ^ | * | | * sched_{in,out}() | * | | * v | * ,---> INACTIVE --+ <-. * | | | * | {dis,en}able() * sched_in() | | * | OFF <--' --+ * | | * `---> ERROR ------' * * That is: * * sched_in: INACTIVE -> {ACTIVE,ERROR} * sched_out: ACTIVE -> INACTIVE * disable: {ACTIVE,INACTIVE} -> OFF * enable: {OFF,ERROR} -> INACTIVE * * Where {OFF,ERROR} are disabled states. * * Then we have the {EXIT,REVOKED,DEAD} states which are various shades of * defunct events: * * - EXIT means task that the even was assigned to died, but child events * still live, and further children can still be created. But the event * itself will never be active again. It can only transition to * {REVOKED,DEAD}; * * - REVOKED means the PMU the event was associated with is gone; all * functionality is stopped but the event is still alive. Can only * transition to DEAD; * * - DEAD event really is DYING tearing down state and freeing bits. * */ enum perf_event_state { PERF_EVENT_STATE_DEAD = -5, PERF_EVENT_STATE_REVOKED = -4, /* pmu gone, must not touch */ PERF_EVENT_STATE_EXIT = -3, /* task died, still inherit */ PERF_EVENT_STATE_ERROR = -2, /* scheduling error, can enable */ PERF_EVENT_STATE_OFF = -1, PERF_EVENT_STATE_INACTIVE = 0, PERF_EVENT_STATE_ACTIVE = 1, }; struct file; struct perf_sample_data; typedef void (*perf_overflow_handler_t)(struct perf_event *, struct perf_sample_data *, struct pt_regs *regs); /* * Event capabilities. For event_caps and groups caps. * * PERF_EV_CAP_SOFTWARE: Is a software event. * PERF_EV_CAP_READ_ACTIVE_PKG: A CPU event (or cgroup event) that can be read * from any CPU in the package where it is active. * PERF_EV_CAP_SIBLING: An event with this flag must be a group sibling and * cannot be a group leader. If an event with this flag is detached from the * group it is scheduled out and moved into an unrecoverable ERROR state. * PERF_EV_CAP_READ_SCOPE: A CPU event that can be read from any CPU of the * PMU scope where it is active. */ #define PERF_EV_CAP_SOFTWARE BIT(0) #define PERF_EV_CAP_READ_ACTIVE_PKG BIT(1) #define PERF_EV_CAP_SIBLING BIT(2) #define PERF_EV_CAP_READ_SCOPE BIT(3) #define SWEVENT_HLIST_BITS 8 #define SWEVENT_HLIST_SIZE (1 << SWEVENT_HLIST_BITS) struct swevent_hlist { struct hlist_head heads[SWEVENT_HLIST_SIZE]; struct rcu_head rcu_head; }; #define PERF_ATTACH_CONTEXT 0x0001 #define PERF_ATTACH_GROUP 0x0002 #define PERF_ATTACH_TASK 0x0004 #define PERF_ATTACH_TASK_DATA 0x0008 #define PERF_ATTACH_GLOBAL_DATA 0x0010 #define PERF_ATTACH_SCHED_CB 0x0020 #define PERF_ATTACH_CHILD 0x0040 #define PERF_ATTACH_EXCLUSIVE 0x0080 #define PERF_ATTACH_CALLCHAIN 0x0100 #define PERF_ATTACH_ITRACE 0x0200 struct bpf_prog; struct perf_cgroup; struct perf_buffer; struct pmu_event_list { raw_spinlock_t lock; struct list_head list; }; /* * event->sibling_list is modified whole holding both ctx->lock and ctx->mutex * as such iteration must hold either lock. However, since ctx->lock is an IRQ * safe lock, and is only held by the CPU doing the modification, having IRQs * disabled is sufficient since it will hold-off the IPIs. */ #ifdef CONFIG_PROVE_LOCKING # define lockdep_assert_event_ctx(event) \ WARN_ON_ONCE(__lockdep_enabled && \ (this_cpu_read(hardirqs_enabled) && \ lockdep_is_held(&(event)->ctx->mutex) != LOCK_STATE_HELD)) #else # define lockdep_assert_event_ctx(event) #endif #define for_each_sibling_event(sibling, event) \ lockdep_assert_event_ctx(event); \ if ((event)->group_leader == (event)) \ list_for_each_entry((sibling), &(event)->sibling_list, sibling_list) /** * struct perf_event - performance event kernel representation: */ struct perf_event { #ifdef CONFIG_PERF_EVENTS /* * entry onto perf_event_context::event_list; * modifications require ctx->lock * RCU safe iterations. */ struct list_head event_entry; /* * Locked for modification by both ctx->mutex and ctx->lock; holding * either sufficies for read. */ struct list_head sibling_list; struct list_head active_list; /* * Node on the pinned or flexible tree located at the event context; */ struct rb_node group_node; u64 group_index; /* * We need storage to track the entries in perf_pmu_migrate_context; we * cannot use the event_entry because of RCU and we want to keep the * group in tact which avoids us using the other two entries. */ struct list_head migrate_entry; struct hlist_node hlist_entry; struct list_head active_entry; int nr_siblings; /* Not serialized. Only written during event initialization. */ int event_caps; /* The cumulative AND of all event_caps for events in this group. */ int group_caps; unsigned int group_generation; struct perf_event *group_leader; /* * event->pmu will always point to pmu in which this event belongs. * Whereas event->pmu_ctx->pmu may point to other pmu when group of * different pmu events is created. */ struct pmu *pmu; void *pmu_private; enum perf_event_state state; unsigned int attach_state; local64_t count; atomic64_t child_count; /* * These are the total time in nanoseconds that the event * has been enabled (i.e. eligible to run, and the task has * been scheduled in, if this is a per-task event) * and running (scheduled onto the CPU), respectively. */ u64 total_time_enabled; u64 total_time_running; u64 tstamp; struct perf_event_attr attr; u16 header_size; u16 id_header_size; u16 read_size; struct hw_perf_event hw; struct perf_event_context *ctx; /* * event->pmu_ctx points to perf_event_pmu_context in which the event * is added. This pmu_ctx can be of other pmu for sw event when that * sw event is part of a group which also contains non-sw events. */ struct perf_event_pmu_context *pmu_ctx; atomic_long_t refcount; /* * These accumulate total time (in nanoseconds) that children * events have been enabled and running, respectively. */ atomic64_t child_total_time_enabled; atomic64_t child_total_time_running; /* * Protect attach/detach and child_list: */ struct mutex child_mutex; struct list_head child_list; struct perf_event *parent; int oncpu; int cpu; struct list_head owner_entry; struct task_struct *owner; /* mmap bits */ struct mutex mmap_mutex; refcount_t mmap_count; struct perf_buffer *rb; struct list_head rb_entry; unsigned long rcu_batches; int rcu_pending; /* poll related */ wait_queue_head_t waitq; struct fasync_struct *fasync; /* delayed work for NMIs and such */ unsigned int pending_wakeup; unsigned int pending_kill; unsigned int pending_disable; unsigned long pending_addr; /* SIGTRAP */ struct irq_work pending_irq; struct irq_work pending_disable_irq; struct callback_head pending_task; unsigned int pending_work; atomic_t event_limit; /* address range filters */ struct perf_addr_filters_head addr_filters; /* vma address array for file-based filders */ struct perf_addr_filter_range *addr_filter_ranges; unsigned long addr_filters_gen; /* for aux_output events */ struct perf_event *aux_event; void (*destroy)(struct perf_event *); struct rcu_head rcu_head; struct pid_namespace *ns; u64 id; atomic64_t lost_samples; u64 (*clock)(void); perf_overflow_handler_t overflow_handler; void *overflow_handler_context; struct bpf_prog *prog; u64 bpf_cookie; #ifdef CONFIG_EVENT_TRACING struct trace_event_call *tp_event; struct event_filter *filter; # ifdef CONFIG_FUNCTION_TRACER struct ftrace_ops ftrace_ops; # endif #endif #ifdef CONFIG_CGROUP_PERF struct perf_cgroup *cgrp; /* cgroup event is attach to */ #endif #ifdef CONFIG_SECURITY void *security; #endif struct list_head sb_list; struct list_head pmu_list; /* * Certain events gets forwarded to another pmu internally by over- * writing kernel copy of event->attr.type without user being aware * of it. event->orig_type contains original 'type' requested by * user. */ u32 orig_type; #endif /* CONFIG_PERF_EVENTS */ }; /* * ,-----------------------[1:n]------------------------. * V V * perf_event_context <-[1:n]-> perf_event_pmu_context <-[1:n]- perf_event * | | * `--[n:1]-> pmu <-[1:n]--' * * * struct perf_event_pmu_context lifetime is refcount based and RCU freed * (similar to perf_event_context). Locking is as if it were a member of * perf_event_context; specifically: * * modification, both: ctx->mutex && ctx->lock * reading, either: ctx->mutex || ctx->lock * * There is one exception to this; namely put_pmu_ctx() isn't always called * with ctx->mutex held; this means that as long as we can guarantee the epc * has events the above rules hold. * * Specificially, sys_perf_event_open()'s group_leader case depends on * ctx->mutex pinning the configuration. Since we hold a reference on * group_leader (through the filedesc) it can't go away, therefore it's * associated pmu_ctx must exist and cannot change due to ctx->mutex. * * perf_event holds a refcount on perf_event_context * perf_event holds a refcount on perf_event_pmu_context */ struct perf_event_pmu_context { struct pmu *pmu; struct perf_event_context *ctx; struct list_head pmu_ctx_entry; struct list_head pinned_active; struct list_head flexible_active; /* Used to identify the per-cpu perf_event_pmu_context */ unsigned int embedded : 1; unsigned int nr_events; unsigned int nr_cgroups; unsigned int nr_freq; atomic_t refcount; /* event <-> epc */ struct rcu_head rcu_head; /* * Set when one or more (plausibly active) event can't be scheduled * due to pmu overcommit or pmu constraints, except tolerant to * events not necessary to be active due to scheduling constraints, * such as cgroups. */ int rotate_necessary; }; static inline bool perf_pmu_ctx_is_active(struct perf_event_pmu_context *epc) { return !list_empty(&epc->flexible_active) || !list_empty(&epc->pinned_active); } struct perf_event_groups { struct rb_root tree; u64 index; }; /** * struct perf_event_context - event context structure * * Used as a container for task events and CPU events as well: */ struct perf_event_context { /* * Protect the states of the events in the list, * nr_active, and the list: */ raw_spinlock_t lock; /* * Protect the list of events. Locking either mutex or lock * is sufficient to ensure the list doesn't change; to change * the list you need to lock both the mutex and the spinlock. */ struct mutex mutex; struct list_head pmu_ctx_list; struct perf_event_groups pinned_groups; struct perf_event_groups flexible_groups; struct list_head event_list; int nr_events; int nr_user; int is_active; int nr_stat; int nr_freq; int rotate_disable; refcount_t refcount; /* event <-> ctx */ struct task_struct *task; /* * Context clock, runs when context enabled. */ u64 time; u64 timestamp; u64 timeoffset; /* * These fields let us detect when two contexts have both * been cloned (inherited) from a common ancestor. */ struct perf_event_context *parent_ctx; u64 parent_gen; u64 generation; int pin_count; #ifdef CONFIG_CGROUP_PERF int nr_cgroups; /* cgroup evts */ #endif struct rcu_head rcu_head; /* * The count of events for which using the switch-out fast path * should be avoided. * * Sum (event->pending_work + events with * (attr->inherit && (attr->sample_type & PERF_SAMPLE_READ))) * * The SIGTRAP is targeted at ctx->task, as such it won't do changing * that until the signal is delivered. */ local_t nr_no_switch_fast; }; /** * struct perf_ctx_data - PMU specific data for a task * @rcu_head: To avoid the race on free PMU specific data * @refcount: To track users * @global: To track system-wide users * @ctx_cache: Kmem cache of PMU specific data * @data: PMU specific data * * Currently, the struct is only used in Intel LBR call stack mode to * save/restore the call stack of a task on context switches. * * The rcu_head is used to prevent the race on free the data. * The data only be allocated when Intel LBR call stack mode is enabled. * The data will be freed when the mode is disabled. * The content of the data will only be accessed in context switch, which * should be protected by rcu_read_lock(). * * Because of the alignment requirement of Intel Arch LBR, the Kmem cache * is used to allocate the PMU specific data. The ctx_cache is to track * the Kmem cache. * * Careful: Struct perf_ctx_data is added as a pointer in struct task_struct. * When system-wide Intel LBR call stack mode is enabled, a buffer with * constant size will be allocated for each task. * Also, system memory consumption can further grow when the size of * struct perf_ctx_data enlarges. */ struct perf_ctx_data { struct rcu_head rcu_head; refcount_t refcount; int global; struct kmem_cache *ctx_cache; void *data; }; struct perf_cpu_pmu_context { struct perf_event_pmu_context epc; struct perf_event_pmu_context *task_epc; struct list_head sched_cb_entry; int sched_cb_usage; int active_oncpu; int exclusive; int pmu_disable_count; raw_spinlock_t hrtimer_lock; struct hrtimer hrtimer; ktime_t hrtimer_interval; unsigned int hrtimer_active; }; /** * struct perf_event_cpu_context - per cpu event context structure */ struct perf_cpu_context { struct perf_event_context ctx; struct perf_event_context *task_ctx; int online; #ifdef CONFIG_CGROUP_PERF struct perf_cgroup *cgrp; #endif /* * Per-CPU storage for iterators used in visit_groups_merge. The default * storage is of size 2 to hold the CPU and any CPU event iterators. */ int heap_size; struct perf_event **heap; struct perf_event *heap_default[2]; }; struct perf_output_handle { struct perf_event *event; struct perf_buffer *rb; unsigned long wakeup; unsigned long size; union { u64 flags; /* perf_output*() */ u64 aux_flags; /* perf_aux_output*() */ struct { u64 skip_read : 1; }; }; union { void *addr; unsigned long head; }; int page; }; struct bpf_perf_event_data_kern { bpf_user_pt_regs_t *regs; struct perf_sample_data *data; struct perf_event *event; }; #ifdef CONFIG_CGROUP_PERF /* * perf_cgroup_info keeps track of time_enabled for a cgroup. * This is a per-cpu dynamically allocated data structure. */ struct perf_cgroup_info { u64 time; u64 timestamp; u64 timeoffset; int active; }; struct perf_cgroup { struct cgroup_subsys_state css; struct perf_cgroup_info __percpu *info; }; /* * Must ensure cgroup is pinned (css_get) before calling * this function. In other words, we cannot call this function * if there is no cgroup event for the current CPU context. */ static inline struct perf_cgroup * perf_cgroup_from_task(struct task_struct *task, struct perf_event_context *ctx) { return container_of(task_css_check(task, perf_event_cgrp_id, ctx ? lockdep_is_held(&ctx->lock) : true), struct perf_cgroup, css); } #endif /* CONFIG_CGROUP_PERF */ #ifdef CONFIG_PERF_EVENTS extern struct perf_event_context *perf_cpu_task_ctx(void); extern void *perf_aux_output_begin(struct perf_output_handle *handle, struct perf_event *event); extern void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size); extern int perf_aux_output_skip(struct perf_output_handle *handle, unsigned long size); extern void *perf_get_aux(struct perf_output_handle *handle); extern void perf_aux_output_flag(struct perf_output_handle *handle, u64 flags); extern void perf_event_itrace_started(struct perf_event *event); extern int perf_pmu_register(struct pmu *pmu, const char *name, int type); extern int perf_pmu_unregister(struct pmu *pmu); extern void __perf_event_task_sched_in(struct task_struct *prev, struct task_struct *task); extern void __perf_event_task_sched_out(struct task_struct *prev, struct task_struct *next); extern int perf_event_init_task(struct task_struct *child, u64 clone_flags); extern void perf_event_exit_task(struct task_struct *child); extern void perf_event_free_task(struct task_struct *task); extern void perf_event_delayed_put(struct task_struct *task); extern struct file *perf_event_get(unsigned int fd); extern const struct perf_event *perf_get_event(struct file *file); extern const struct perf_event_attr *perf_event_attrs(struct perf_event *event); extern void perf_event_print_debug(void); extern void perf_pmu_disable(struct pmu *pmu); extern void perf_pmu_enable(struct pmu *pmu); extern void perf_sched_cb_dec(struct pmu *pmu); extern void perf_sched_cb_inc(struct pmu *pmu); extern int perf_event_task_disable(void); extern int perf_event_task_enable(void); extern void perf_pmu_resched(struct pmu *pmu); extern int perf_event_refresh(struct perf_event *event, int refresh); extern void perf_event_update_userpage(struct perf_event *event); extern int perf_event_release_kernel(struct perf_event *event); extern struct perf_event * perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu, struct task_struct *task, perf_overflow_handler_t callback, void *context); extern void perf_pmu_migrate_context(struct pmu *pmu, int src_cpu, int dst_cpu); extern int perf_event_read_local(struct perf_event *event, u64 *value, u64 *enabled, u64 *running); extern u64 perf_event_read_value(struct perf_event *event, u64 *enabled, u64 *running); extern struct perf_callchain_entry *perf_callchain(struct perf_event *event, struct pt_regs *regs); static inline bool branch_sample_no_flags(const struct perf_event *event) { return event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_NO_FLAGS; } static inline bool branch_sample_no_cycles(const struct perf_event *event) { return event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_NO_CYCLES; } static inline bool branch_sample_type(const struct perf_event *event) { return event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_TYPE_SAVE; } static inline bool branch_sample_hw_index(const struct perf_event *event) { return event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX; } static inline bool branch_sample_priv(const struct perf_event *event) { return event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_PRIV_SAVE; } static inline bool branch_sample_counters(const struct perf_event *event) { return event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_COUNTERS; } static inline bool branch_sample_call_stack(const struct perf_event *event) { return event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_CALL_STACK; } struct perf_sample_data { /* * Fields set by perf_sample_data_init() unconditionally, * group so as to minimize the cachelines touched. */ u64 sample_flags; u64 period; u64 dyn_size; /* * Fields commonly set by __perf_event_header__init_id(), * group so as to minimize the cachelines touched. */ u64 type; struct { u32 pid; u32 tid; } tid_entry; u64 time; u64 id; struct { u32 cpu; u32 reserved; } cpu_entry; /* * The other fields, optionally {set,used} by * perf_{prepare,output}_sample(). */ u64 ip; struct perf_callchain_entry *callchain; struct perf_raw_record *raw; struct perf_branch_stack *br_stack; u64 *br_stack_cntr; union perf_sample_weight weight; union perf_mem_data_src data_src; u64 txn; struct perf_regs regs_user; struct perf_regs regs_intr; u64 stack_user_size; u64 stream_id; u64 cgroup; u64 addr; u64 phys_addr; u64 data_page_size; u64 code_page_size; u64 aux_size; } ____cacheline_aligned; /* default value for data source */ #define PERF_MEM_NA (PERF_MEM_S(OP, NA) |\ PERF_MEM_S(LVL, NA) |\ PERF_MEM_S(SNOOP, NA) |\ PERF_MEM_S(LOCK, NA) |\ PERF_MEM_S(TLB, NA) |\ PERF_MEM_S(LVLNUM, NA)) static inline void perf_sample_data_init(struct perf_sample_data *data, u64 addr, u64 period) { /* remaining struct members initialized in perf_prepare_sample() */ data->sample_flags = PERF_SAMPLE_PERIOD; data->period = period; data->dyn_size = 0; if (addr) { data->addr = addr; data->sample_flags |= PERF_SAMPLE_ADDR; } } static inline void perf_sample_save_callchain(struct perf_sample_data *data, struct perf_event *event, struct pt_regs *regs) { int size = 1; if (!(event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)) return; if (WARN_ON_ONCE(data->sample_flags & PERF_SAMPLE_CALLCHAIN)) return; data->callchain = perf_callchain(event, regs); size += data->callchain->nr; data->dyn_size += size * sizeof(u64); data->sample_flags |= PERF_SAMPLE_CALLCHAIN; } static inline void perf_sample_save_raw_data(struct perf_sample_data *data, struct perf_event *event, struct perf_raw_record *raw) { struct perf_raw_frag *frag = &raw->frag; u32 sum = 0; int size; if (!(event->attr.sample_type & PERF_SAMPLE_RAW)) return; if (WARN_ON_ONCE(data->sample_flags & PERF_SAMPLE_RAW)) return; do { sum += frag->size; if (perf_raw_frag_last(frag)) break; frag = frag->next; } while (1); size = round_up(sum + sizeof(u32), sizeof(u64)); raw->size = size - sizeof(u32); frag->pad = raw->size - sum; data->raw = raw; data->dyn_size += size; data->sample_flags |= PERF_SAMPLE_RAW; } static inline bool has_branch_stack(struct perf_event *event) { return event->attr.sample_type & PERF_SAMPLE_BRANCH_STACK; } static inline void perf_sample_save_brstack(struct perf_sample_data *data, struct perf_event *event, struct perf_branch_stack *brs, u64 *brs_cntr) { int size = sizeof(u64); /* nr */ if (!has_branch_stack(event)) return; if (WARN_ON_ONCE(data->sample_flags & PERF_SAMPLE_BRANCH_STACK)) return; if (branch_sample_hw_index(event)) size += sizeof(u64); brs->nr = min_t(u16, event->attr.sample_max_stack, brs->nr); size += brs->nr * sizeof(struct perf_branch_entry); /* * The extension space for counters is appended after the * struct perf_branch_stack. It is used to store the occurrences * of events of each branch. */ if (brs_cntr) size += brs->nr * sizeof(u64); data->br_stack = brs; data->br_stack_cntr = brs_cntr; data->dyn_size += size; data->sample_flags |= PERF_SAMPLE_BRANCH_STACK; } static inline u32 perf_sample_data_size(struct perf_sample_data *data, struct perf_event *event) { u32 size = sizeof(struct perf_event_header); size += event->header_size + event->id_header_size; size += data->dyn_size; return size; } /* * Clear all bitfields in the perf_branch_entry. * The to and from fields are not cleared because they are * systematically modified by caller. */ static inline void perf_clear_branch_entry_bitfields(struct perf_branch_entry *br) { br->mispred = 0; br->predicted = 0; br->in_tx = 0; br->abort = 0; br->cycles = 0; br->type = 0; br->spec = PERF_BR_SPEC_NA; br->reserved = 0; } extern void perf_output_sample(struct perf_output_handle *handle, struct perf_event_header *header, struct perf_sample_data *data, struct perf_event *event); extern void perf_prepare_sample(struct perf_sample_data *data, struct perf_event *event, struct pt_regs *regs); extern void perf_prepare_header(struct perf_event_header *header, struct perf_sample_data *data, struct perf_event *event, struct pt_regs *regs); extern int perf_event_overflow(struct perf_event *event, struct perf_sample_data *data, struct pt_regs *regs); extern void perf_event_output_forward(struct perf_event *event, struct perf_sample_data *data, struct pt_regs *regs); extern void perf_event_output_backward(struct perf_event *event, struct perf_sample_data *data, struct pt_regs *regs); extern int perf_event_output(struct perf_event *event, struct perf_sample_data *data, struct pt_regs *regs); static inline bool is_default_overflow_handler(struct perf_event *event) { perf_overflow_handler_t overflow_handler = event->overflow_handler; if (likely(overflow_handler == perf_event_output_forward)) return true; if (unlikely(overflow_handler == perf_event_output_backward)) return true; return false; } extern void perf_event_header__init_id(struct perf_event_header *header, struct perf_sample_data *data, struct perf_event *event); extern void perf_event__output_id_sample(struct perf_event *event, struct perf_output_handle *handle, struct perf_sample_data *sample); extern void perf_log_lost_samples(struct perf_event *event, u64 lost); static inline bool event_has_any_exclude_flag(struct perf_event *event) { struct perf_event_attr *attr = &event->attr; return attr->exclude_idle || attr->exclude_user || attr->exclude_kernel || attr->exclude_hv || attr->exclude_guest || attr->exclude_host; } static inline bool is_sampling_event(struct perf_event *event) { return event->attr.sample_period != 0; } /* * Return 1 for a software event, 0 for a hardware event */ static inline int is_software_event(struct perf_event *event) { return event->event_caps & PERF_EV_CAP_SOFTWARE; } /* * Return 1 for event in sw context, 0 for event in hw context */ static inline int in_software_context(struct perf_event *event) { return event->pmu_ctx->pmu->task_ctx_nr == perf_sw_context; } static inline int is_exclusive_pmu(struct pmu *pmu) { return pmu->capabilities & PERF_PMU_CAP_EXCLUSIVE; } extern struct static_key perf_swevent_enabled[PERF_COUNT_SW_MAX]; extern void ___perf_sw_event(u32, u64, struct pt_regs *, u64); extern void __perf_sw_event(u32, u64, struct pt_regs *, u64); #ifndef perf_arch_fetch_caller_regs static inline void perf_arch_fetch_caller_regs(struct pt_regs *regs, unsigned long ip) { } #endif /* * When generating a perf sample in-line, instead of from an interrupt / * exception, we lack a pt_regs. This is typically used from software events * like: SW_CONTEXT_SWITCHES, SW_MIGRATIONS and the tie-in with tracepoints. * * We typically don't need a full set, but (for x86) do require: * - ip for PERF_SAMPLE_IP * - cs for user_mode() tests * - sp for PERF_SAMPLE_CALLCHAIN * - eflags for MISC bits and CALLCHAIN (see: perf_hw_regs()) * * NOTE: assumes @regs is otherwise already 0 filled; this is important for * things like PERF_SAMPLE_REGS_INTR. */ static inline void perf_fetch_caller_regs(struct pt_regs *regs) { perf_arch_fetch_caller_regs(regs, CALLER_ADDR0); } static __always_inline void perf_sw_event(u32 event_id, u64 nr, struct pt_regs *regs, u64 addr) { if (static_key_false(&perf_swevent_enabled[event_id])) __perf_sw_event(event_id, nr, regs, addr); } DECLARE_PER_CPU(struct pt_regs, __perf_regs[4]); /* * 'Special' version for the scheduler, it hard assumes no recursion, * which is guaranteed by us not actually scheduling inside other swevents * because those disable preemption. */ static __always_inline void __perf_sw_event_sched(u32 event_id, u64 nr, u64 addr) { struct pt_regs *regs = this_cpu_ptr(&__perf_regs[0]); perf_fetch_caller_regs(regs); ___perf_sw_event(event_id, nr, regs, addr); } extern struct static_key_false perf_sched_events; static __always_inline bool __perf_sw_enabled(int swevt) { return static_key_false(&perf_swevent_enabled[swevt]); } static inline void perf_event_task_migrate(struct task_struct *task) { if (__perf_sw_enabled(PERF_COUNT_SW_CPU_MIGRATIONS)) task->sched_migrated = 1; } static inline void perf_event_task_sched_in(struct task_struct *prev, struct task_struct *task) { if (static_branch_unlikely(&perf_sched_events)) __perf_event_task_sched_in(prev, task); if (__perf_sw_enabled(PERF_COUNT_SW_CPU_MIGRATIONS) && task->sched_migrated) { __perf_sw_event_sched(PERF_COUNT_SW_CPU_MIGRATIONS, 1, 0); task->sched_migrated = 0; } } static inline void perf_event_task_sched_out(struct task_struct *prev, struct task_struct *next) { if (__perf_sw_enabled(PERF_COUNT_SW_CONTEXT_SWITCHES)) __perf_sw_event_sched(PERF_COUNT_SW_CONTEXT_SWITCHES, 1, 0); #ifdef CONFIG_CGROUP_PERF if (__perf_sw_enabled(PERF_COUNT_SW_CGROUP_SWITCHES) && perf_cgroup_from_task(prev, NULL) != perf_cgroup_from_task(next, NULL)) __perf_sw_event_sched(PERF_COUNT_SW_CGROUP_SWITCHES, 1, 0); #endif if (static_branch_unlikely(&perf_sched_events)) __perf_event_task_sched_out(prev, next); } extern void perf_event_mmap(struct vm_area_struct *vma); extern void perf_event_ksymbol(u16 ksym_type, u64 addr, u32 len, bool unregister, const char *sym); extern void perf_event_bpf_event(struct bpf_prog *prog, enum perf_bpf_event_type type, u16 flags); #define PERF_GUEST_ACTIVE 0x01 #define PERF_GUEST_USER 0x02 struct perf_guest_info_callbacks { unsigned int (*state)(void); unsigned long (*get_ip)(void); unsigned int (*handle_intel_pt_intr)(void); }; #ifdef CONFIG_GUEST_PERF_EVENTS extern struct perf_guest_info_callbacks __rcu *perf_guest_cbs; DECLARE_STATIC_CALL(__perf_guest_state, *perf_guest_cbs->state); DECLARE_STATIC_CALL(__perf_guest_get_ip, *perf_guest_cbs->get_ip); DECLARE_STATIC_CALL(__perf_guest_handle_intel_pt_intr, *perf_guest_cbs->handle_intel_pt_intr); static inline unsigned int perf_guest_state(void) { return static_call(__perf_guest_state)(); } static inline unsigned long perf_guest_get_ip(void) { return static_call(__perf_guest_get_ip)(); } static inline unsigned int perf_guest_handle_intel_pt_intr(void) { return static_call(__perf_guest_handle_intel_pt_intr)(); } extern void perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *cbs); extern void perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *cbs); #else /* !CONFIG_GUEST_PERF_EVENTS: */ static inline unsigned int perf_guest_state(void) { return 0; } static inline unsigned long perf_guest_get_ip(void) { return 0; } static inline unsigned int perf_guest_handle_intel_pt_intr(void) { return 0; } #endif /* !CONFIG_GUEST_PERF_EVENTS */ extern void perf_event_exec(void); extern void perf_event_comm(struct task_struct *tsk, bool exec); extern void perf_event_namespaces(struct task_struct *tsk); extern void perf_event_fork(struct task_struct *tsk); extern void perf_event_text_poke(const void *addr, const void *old_bytes, size_t old_len, const void *new_bytes, size_t new_len); /* Callchains */ DECLARE_PER_CPU(struct perf_callchain_entry, perf_callchain_entry); extern void perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs); extern void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs); extern struct perf_callchain_entry * get_perf_callchain(struct pt_regs *regs, bool kernel, bool user, u32 max_stack, bool crosstask, bool add_mark); extern int get_callchain_buffers(int max_stack); extern void put_callchain_buffers(void); extern struct perf_callchain_entry *get_callchain_entry(int *rctx); extern void put_callchain_entry(int rctx); extern int sysctl_perf_event_max_stack; extern int sysctl_perf_event_max_contexts_per_stack; static inline int perf_callchain_store_context(struct perf_callchain_entry_ctx *ctx, u64 ip) { if (ctx->contexts < sysctl_perf_event_max_contexts_per_stack) { struct perf_callchain_entry *entry = ctx->entry; entry->ip[entry->nr++] = ip; ++ctx->contexts; return 0; } else { ctx->contexts_maxed = true; return -1; /* no more room, stop walking the stack */ } } static inline int perf_callchain_store(struct perf_callchain_entry_ctx *ctx, u64 ip) { if (ctx->nr < ctx->max_stack && !ctx->contexts_maxed) { struct perf_callchain_entry *entry = ctx->entry; entry->ip[entry->nr++] = ip; ++ctx->nr; return 0; } else { return -1; /* no more room, stop walking the stack */ } } extern int sysctl_perf_event_paranoid; extern int sysctl_perf_event_sample_rate; extern void perf_sample_event_took(u64 sample_len_ns); /* Access to perf_event_open(2) syscall. */ #define PERF_SECURITY_OPEN 0 /* Finer grained perf_event_open(2) access control. */ #define PERF_SECURITY_CPU 1 #define PERF_SECURITY_KERNEL 2 #define PERF_SECURITY_TRACEPOINT 3 static inline int perf_is_paranoid(void) { return sysctl_perf_event_paranoid > -1; } extern int perf_allow_kernel(void); static inline int perf_allow_cpu(void) { if (sysctl_perf_event_paranoid > 0 && !perfmon_capable()) return -EACCES; return security_perf_event_open(PERF_SECURITY_CPU); } static inline int perf_allow_tracepoint(void) { if (sysctl_perf_event_paranoid > -1 && !perfmon_capable()) return -EPERM; return security_perf_event_open(PERF_SECURITY_TRACEPOINT); } extern int perf_exclude_event(struct perf_event *event, struct pt_regs *regs); extern void perf_event_init(void); extern void perf_tp_event(u16 event_type, u64 count, void *record, int entry_size, struct pt_regs *regs, struct hlist_head *head, int rctx, struct task_struct *task); extern void perf_bp_event(struct perf_event *event, void *data); extern unsigned long perf_misc_flags(struct perf_event *event, struct pt_regs *regs); extern unsigned long perf_instruction_pointer(struct perf_event *event, struct pt_regs *regs); #ifndef perf_arch_misc_flags # define perf_arch_misc_flags(regs) \ (user_mode(regs) ? PERF_RECORD_MISC_USER : PERF_RECORD_MISC_KERNEL) # define perf_arch_instruction_pointer(regs) instruction_pointer(regs) #endif #ifndef perf_arch_bpf_user_pt_regs # define perf_arch_bpf_user_pt_regs(regs) regs #endif #ifndef perf_arch_guest_misc_flags static inline unsigned long perf_arch_guest_misc_flags(struct pt_regs *regs) { unsigned long guest_state = perf_guest_state(); if (!(guest_state & PERF_GUEST_ACTIVE)) return 0; if (guest_state & PERF_GUEST_USER) return PERF_RECORD_MISC_GUEST_USER; else return PERF_RECORD_MISC_GUEST_KERNEL; } # define perf_arch_guest_misc_flags(regs) perf_arch_guest_misc_flags(regs) #endif static inline bool needs_branch_stack(struct perf_event *event) { return event->attr.branch_sample_type != 0; } static inline bool has_aux(struct perf_event *event) { return event->pmu && event->pmu->setup_aux; } static inline bool has_aux_action(struct perf_event *event) { return event->attr.aux_sample_size || event->attr.aux_pause || event->attr.aux_resume; } static inline bool is_write_backward(struct perf_event *event) { return !!event->attr.write_backward; } static inline bool has_addr_filter(struct perf_event *event) { return event->pmu->nr_addr_filters; } /* * An inherited event uses parent's filters */ static inline struct perf_addr_filters_head * perf_event_addr_filters(struct perf_event *event) { struct perf_addr_filters_head *ifh = &event->addr_filters; if (event->parent) ifh = &event->parent->addr_filters; return ifh; } static inline struct fasync_struct **perf_event_fasync(struct perf_event *event) { /* Only the parent has fasync state */ if (event->parent) event = event->parent; return &event->fasync; } extern void perf_event_addr_filters_sync(struct perf_event *event); extern void perf_report_aux_output_id(struct perf_event *event, u64 hw_id); extern int perf_output_begin(struct perf_output_handle *handle, struct perf_sample_data *data, struct perf_event *event, unsigned int size); extern int perf_output_begin_forward(struct perf_output_handle *handle, struct perf_sample_data *data, struct perf_event *event, unsigned int size); extern int perf_output_begin_backward(struct perf_output_handle *handle, struct perf_sample_data *data, struct perf_event *event, unsigned int size); extern void perf_output_end(struct perf_output_handle *handle); extern unsigned int perf_output_copy(struct perf_output_handle *handle, const void *buf, unsigned int len); extern unsigned int perf_output_skip(struct perf_output_handle *handle, unsigned int len); extern long perf_output_copy_aux(struct perf_output_handle *aux_handle, struct perf_output_handle *handle, unsigned long from, unsigned long to); extern int perf_swevent_get_recursion_context(void); extern void perf_swevent_put_recursion_context(int rctx); extern u64 perf_swevent_set_period(struct perf_event *event); extern void perf_event_enable(struct perf_event *event); extern void perf_event_disable(struct perf_event *event); extern void perf_event_disable_local(struct perf_event *event); extern void perf_event_disable_inatomic(struct perf_event *event); extern void perf_event_task_tick(void); extern int perf_event_account_interrupt(struct perf_event *event); extern int perf_event_period(struct perf_event *event, u64 value); extern u64 perf_event_pause(struct perf_event *event, bool reset); #else /* !CONFIG_PERF_EVENTS: */ static inline void * perf_aux_output_begin(struct perf_output_handle *handle, struct perf_event *event) { return NULL; } static inline void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size) { } static inline int perf_aux_output_skip(struct perf_output_handle *handle, unsigned long size) { return -EINVAL; } static inline void * perf_get_aux(struct perf_output_handle *handle) { return NULL; } static inline void perf_event_task_migrate(struct task_struct *task) { } static inline void perf_event_task_sched_in(struct task_struct *prev, struct task_struct *task) { } static inline void perf_event_task_sched_out(struct task_struct *prev, struct task_struct *next) { } static inline int perf_event_init_task(struct task_struct *child, u64 clone_flags) { return 0; } static inline void perf_event_exit_task(struct task_struct *child) { } static inline void perf_event_free_task(struct task_struct *task) { } static inline void perf_event_delayed_put(struct task_struct *task) { } static inline struct file *perf_event_get(unsigned int fd) { return ERR_PTR(-EINVAL); } static inline const struct perf_event *perf_get_event(struct file *file) { return ERR_PTR(-EINVAL); } static inline const struct perf_event_attr *perf_event_attrs(struct perf_event *event) { return ERR_PTR(-EINVAL); } static inline int perf_event_read_local(struct perf_event *event, u64 *value, u64 *enabled, u64 *running) { return -EINVAL; } static inline void perf_event_print_debug(void) { } static inline int perf_event_task_disable(void) { return -EINVAL; } static inline int perf_event_task_enable(void) { return -EINVAL; } static inline int perf_event_refresh(struct perf_event *event, int refresh) { return -EINVAL; } static inline void perf_sw_event(u32 event_id, u64 nr, struct pt_regs *regs, u64 addr) { } static inline void perf_bp_event(struct perf_event *event, void *data) { } static inline void perf_event_mmap(struct vm_area_struct *vma) { } typedef int (perf_ksymbol_get_name_f)(char *name, int name_len, void *data); static inline void perf_event_ksymbol(u16 ksym_type, u64 addr, u32 len, bool unregister, const char *sym) { } static inline void perf_event_bpf_event(struct bpf_prog *prog, enum perf_bpf_event_type type, u16 flags) { } static inline void perf_event_exec(void) { } static inline void perf_event_comm(struct task_struct *tsk, bool exec) { } static inline void perf_event_namespaces(struct task_struct *tsk) { } static inline void perf_event_fork(struct task_struct *tsk) { } static inline void perf_event_text_poke(const void *addr, const void *old_bytes, size_t old_len, const void *new_bytes, size_t new_len) { } static inline void perf_event_init(void) { } static inline int perf_swevent_get_recursion_context(void) { return -1; } static inline void perf_swevent_put_recursion_context(int rctx) { } static inline u64 perf_swevent_set_period(struct perf_event *event) { return 0; } static inline void perf_event_enable(struct perf_event *event) { } static inline void perf_event_disable(struct perf_event *event) { } static inline int __perf_event_disable(void *info) { return -1; } static inline void perf_event_task_tick(void) { } static inline int perf_event_release_kernel(struct perf_event *event) { return 0; } static inline int perf_event_period(struct perf_event *event, u64 value) { return -EINVAL; } static inline u64 perf_event_pause(struct perf_event *event, bool reset) { return 0; } static inline int perf_exclude_event(struct perf_event *event, struct pt_regs *regs) { return 0; } #endif /* !CONFIG_PERF_EVENTS */ #if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_INTEL) extern void perf_restore_debug_store(void); #else static inline void perf_restore_debug_store(void) { } #endif #define perf_output_put(handle, x) perf_output_copy((handle), &(x), sizeof(x)) struct perf_pmu_events_attr { struct device_attribute attr; u64 id; const char *event_str; }; struct perf_pmu_events_ht_attr { struct device_attribute attr; u64 id; const char *event_str_ht; const char *event_str_noht; }; struct perf_pmu_events_hybrid_attr { struct device_attribute attr; u64 id; const char *event_str; u64 pmu_type; }; struct perf_pmu_format_hybrid_attr { struct device_attribute attr; u64 pmu_type; }; ssize_t perf_event_sysfs_show(struct device *dev, struct device_attribute *attr, char *page); #define PMU_EVENT_ATTR(_name, _var, _id, _show) \ static struct perf_pmu_events_attr _var = { \ .attr = __ATTR(_name, 0444, _show, NULL), \ .id = _id, \ }; #define PMU_EVENT_ATTR_STRING(_name, _var, _str) \ static struct perf_pmu_events_attr _var = { \ .attr = __ATTR(_name, 0444, perf_event_sysfs_show, NULL), \ .id = 0, \ .event_str = _str, \ }; #define PMU_EVENT_ATTR_ID(_name, _show, _id) \ (&((struct perf_pmu_events_attr[]) { \ { .attr = __ATTR(_name, 0444, _show, NULL), \ .id = _id, } \ })[0].attr.attr) #define PMU_FORMAT_ATTR_SHOW(_name, _format) \ static ssize_t \ _name##_show(struct device *dev, \ struct device_attribute *attr, \ char *page) \ { \ BUILD_BUG_ON(sizeof(_format) >= PAGE_SIZE); \ return sprintf(page, _format "\n"); \ } \ #define PMU_FORMAT_ATTR(_name, _format) \ PMU_FORMAT_ATTR_SHOW(_name, _format) \ \ static struct device_attribute format_attr_##_name = __ATTR_RO(_name) /* Performance counter hotplug functions */ #ifdef CONFIG_PERF_EVENTS extern int perf_event_init_cpu(unsigned int cpu); extern int perf_event_exit_cpu(unsigned int cpu); #else # define perf_event_init_cpu NULL # define perf_event_exit_cpu NULL #endif extern void arch_perf_update_userpage(struct perf_event *event, struct perf_event_mmap_page *userpg, u64 now); /* * Snapshot branch stack on software events. * * Branch stack can be very useful in understanding software events. For * example, when a long function, e.g. sys_perf_event_open, returns an * errno, it is not obvious why the function failed. Branch stack could * provide very helpful information in this type of scenarios. * * On software event, it is necessary to stop the hardware branch recorder * fast. Otherwise, the hardware register/buffer will be flushed with * entries of the triggering event. Therefore, static call is used to * stop the hardware recorder. */ /* * cnt is the number of entries allocated for entries. * Return number of entries copied to . */ typedef int (perf_snapshot_branch_stack_t)(struct perf_branch_entry *entries, unsigned int cnt); DECLARE_STATIC_CALL(perf_snapshot_branch_stack, perf_snapshot_branch_stack_t); #ifndef PERF_NEEDS_LOPWR_CB static inline void perf_lopwr_cb(bool mode) { } #endif #endif /* _LINUX_PERF_EVENT_H */ |
| 132 51 48 47 23 22 166 165 164 132 129 215 51 166 102 103 103 102 64 14 14 262 264 264 130 131 44 43 44 26 26 44 44 539 184 356 6 183 184 185 37 384 384 146 4 145 145 145 106 109 108 145 34 112 48 39 137 26 2 24 111 30 18 2 18 14 14 14 12 10 3 11 6 10 5 12 12 10 6 3 3 7 7 3 5 14 10 14 5 48 46 20 22 12 3 16 14 20 39 43 43 14 14 15 43 43 26 26 19 19 19 20 11 11 145 144 112 48 33 7 41 41 16 144 143 145 8 7 141 159 26 85 10 10 5 304 67 47 21 402 146 147 1 62 4 13 4 9 56 23 24 56 140 147 23 145 4 39 108 9 147 147 147 9 147 4 47 38 45 2 4 35 49 41 8 38 8 15 37 10 3 7 30 20 17 2 11 4 1 5 4 1 14 1 15 4 11 21 11 15 15 15 5 5 11 21 21 21 11 11 24 24 24 23 20 7 18 16 15 23 22 21 7 5 20 98 320 320 66 53 320 320 320 7 4 320 8 424 425 41 41 41 282 282 194 98 127 1 282 282 127 383 144 241 383 384 384 384 23 16 350 282 184 125 383 382 6 339 14 14 14 3 10 1 14 98 2 120 159 12 146 157 158 159 44 13 31 31 44 44 13 13 16 17 17 6 12 17 2 1 16 10 50 4 51 51 34 10 51 47 3 3 51 51 2 2 47 88 21 68 89 88 293 223 84 2 3 64 26 51 51 4 51 4 9 80 283 214 40 47 4 88 88 4 45 1235 1239 910 13 1 1238 283 89 4 89 43 47 44 45 44 26 18 1195 1196 1198 1191 161 117 267 188 1194 45 1156 123 5 5 4 8 82 52 26 4 118 98 150 34 16 275 82 195 118 180 115 12 12 64 64 64 64 64 64 64 7 64 64 53 52 12 47 15 1 16 16 16 16 9 4 5 12 5 5 33 33 33 33 147 147 22 22 1 3 1 35 40 18 26 39 6 16 19 21 2 11 25 25 16 1 1 28 59 7 53 20 5 4 2 6 4 59 59 2 46 18 24 24 26 26 26 16 16 14 14 12 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 2447 2448 2449 2450 2451 2452 2453 2454 2455 2456 2457 2458 2459 2460 2461 2462 2463 2464 2465 2466 2467 2468 2469 2470 2471 2472 2473 2474 2475 2476 2477 2478 2479 2480 2481 2482 2483 2484 2485 2486 2487 2488 2489 2490 2491 2492 2493 2494 2495 2496 2497 2498 2499 2500 2501 2502 2503 2504 2505 2506 2507 2508 2509 2510 2511 2512 2513 2514 2515 2516 2517 2518 2519 2520 2521 2522 2523 2524 2525 2526 2527 2528 2529 2530 2531 2532 2533 2534 2535 2536 2537 2538 2539 2540 2541 2542 2543 2544 2545 2546 2547 2548 2549 2550 2551 2552 2553 2554 2555 2556 2557 2558 2559 2560 2561 2562 2563 2564 2565 2566 2567 2568 2569 2570 2571 2572 2573 2574 2575 2576 2577 2578 2579 2580 2581 2582 2583 2584 2585 2586 2587 2588 2589 2590 2591 2592 2593 2594 2595 2596 2597 2598 2599 2600 2601 2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658 2659 2660 2661 2662 2663 2664 2665 2666 2667 2668 2669 2670 2671 2672 2673 2674 2675 2676 2677 2678 2679 2680 2681 2682 2683 2684 2685 2686 2687 2688 2689 2690 2691 2692 2693 2694 2695 2696 2697 2698 2699 2700 2701 2702 2703 2704 2705 2706 2707 2708 2709 2710 2711 2712 2713 2714 2715 2716 2717 2718 2719 2720 2721 2722 2723 2724 2725 2726 2727 2728 2729 2730 2731 2732 2733 2734 2735 2736 2737 2738 2739 2740 2741 2742 2743 2744 2745 2746 2747 2748 2749 2750 2751 2752 2753 2754 2755 2756 2757 2758 2759 2760 2761 2762 2763 2764 2765 2766 2767 2768 2769 2770 2771 2772 2773 2774 2775 2776 2777 2778 2779 2780 2781 2782 2783 2784 2785 2786 2787 2788 2789 2790 2791 2792 2793 2794 2795 2796 2797 2798 2799 2800 2801 2802 2803 2804 2805 2806 2807 2808 2809 2810 2811 2812 2813 2814 2815 2816 2817 2818 2819 2820 2821 2822 2823 2824 2825 2826 2827 2828 2829 2830 2831 2832 2833 2834 2835 2836 2837 2838 2839 2840 2841 2842 2843 2844 2845 2846 2847 2848 2849 2850 2851 2852 2853 2854 2855 2856 2857 2858 2859 2860 2861 2862 2863 2864 2865 2866 2867 2868 2869 2870 2871 2872 2873 2874 2875 2876 2877 2878 2879 2880 2881 2882 2883 2884 2885 2886 2887 2888 2889 2890 2891 2892 2893 2894 2895 2896 2897 2898 2899 2900 2901 2902 2903 2904 2905 2906 2907 2908 2909 2910 2911 2912 2913 2914 2915 2916 2917 2918 2919 2920 2921 2922 2923 2924 2925 2926 2927 2928 2929 2930 2931 2932 2933 2934 2935 2936 2937 2938 2939 2940 2941 2942 2943 2944 2945 2946 2947 2948 2949 2950 2951 2952 2953 2954 2955 2956 2957 2958 2959 2960 2961 2962 2963 2964 2965 2966 2967 2968 2969 2970 2971 2972 2973 2974 2975 2976 2977 2978 2979 2980 2981 2982 2983 2984 2985 2986 2987 2988 2989 2990 2991 2992 2993 2994 2995 2996 2997 2998 2999 3000 3001 3002 3003 3004 3005 3006 3007 3008 3009 3010 3011 3012 3013 3014 3015 3016 3017 3018 3019 3020 3021 3022 3023 3024 3025 3026 3027 3028 3029 3030 3031 3032 3033 3034 3035 3036 3037 3038 3039 3040 3041 3042 3043 3044 3045 3046 3047 3048 3049 3050 3051 3052 3053 3054 3055 3056 3057 3058 3059 3060 3061 3062 3063 3064 3065 3066 3067 3068 3069 3070 3071 3072 3073 3074 3075 3076 3077 3078 3079 3080 3081 3082 3083 3084 3085 3086 3087 3088 3089 3090 3091 3092 3093 3094 3095 3096 3097 3098 3099 3100 3101 3102 3103 3104 3105 3106 3107 3108 3109 3110 3111 3112 3113 3114 3115 3116 3117 3118 3119 3120 3121 3122 3123 3124 3125 3126 3127 3128 3129 3130 3131 3132 3133 3134 3135 3136 3137 3138 3139 3140 3141 3142 3143 3144 3145 3146 3147 3148 3149 3150 3151 3152 3153 3154 3155 3156 3157 3158 3159 3160 3161 3162 3163 3164 3165 3166 3167 3168 3169 3170 3171 3172 3173 3174 3175 3176 3177 3178 3179 3180 3181 3182 3183 3184 3185 3186 3187 3188 3189 3190 3191 3192 3193 3194 3195 3196 3197 3198 3199 3200 3201 3202 3203 3204 3205 3206 3207 3208 3209 3210 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220 3221 3222 3223 3224 3225 3226 3227 3228 3229 3230 3231 3232 3233 3234 3235 3236 3237 3238 3239 3240 3241 3242 3243 3244 3245 3246 3247 3248 3249 3250 3251 3252 3253 3254 3255 3256 3257 3258 3259 3260 3261 3262 3263 3264 3265 3266 3267 3268 3269 3270 3271 3272 3273 3274 3275 3276 3277 3278 3279 3280 3281 3282 3283 3284 3285 3286 3287 3288 3289 3290 3291 3292 3293 3294 3295 3296 3297 3298 3299 3300 3301 3302 3303 3304 3305 3306 3307 3308 3309 3310 3311 3312 3313 3314 3315 3316 3317 3318 3319 3320 3321 3322 3323 3324 3325 3326 3327 3328 3329 3330 3331 3332 3333 3334 3335 3336 3337 3338 3339 3340 3341 3342 3343 3344 3345 3346 3347 3348 3349 3350 3351 3352 3353 3354 3355 3356 3357 3358 3359 3360 3361 3362 3363 3364 3365 3366 3367 3368 3369 3370 3371 3372 3373 3374 3375 3376 3377 3378 3379 3380 3381 3382 3383 3384 3385 3386 3387 3388 3389 3390 3391 3392 3393 3394 3395 3396 3397 3398 3399 3400 3401 3402 3403 3404 3405 3406 3407 3408 3409 3410 3411 3412 3413 3414 3415 3416 3417 3418 3419 3420 3421 3422 3423 3424 3425 3426 3427 3428 3429 3430 3431 3432 3433 3434 3435 3436 3437 3438 3439 3440 3441 3442 3443 3444 3445 3446 3447 3448 3449 3450 3451 3452 3453 3454 3455 3456 3457 3458 3459 3460 3461 3462 3463 3464 3465 3466 3467 3468 3469 3470 3471 3472 3473 3474 3475 3476 3477 3478 3479 3480 3481 3482 3483 3484 3485 3486 3487 3488 3489 3490 3491 3492 3493 3494 3495 3496 3497 3498 3499 3500 3501 3502 3503 3504 3505 3506 3507 3508 3509 3510 3511 3512 3513 3514 3515 3516 3517 3518 3519 3520 3521 3522 3523 3524 3525 3526 3527 3528 3529 3530 3531 3532 3533 3534 3535 3536 3537 3538 3539 3540 3541 3542 3543 3544 3545 3546 3547 3548 3549 3550 3551 3552 3553 3554 3555 3556 3557 3558 3559 3560 3561 3562 3563 3564 3565 3566 3567 3568 3569 3570 3571 3572 3573 3574 3575 3576 3577 3578 3579 3580 3581 3582 3583 3584 3585 3586 3587 3588 3589 3590 3591 3592 3593 3594 3595 3596 3597 3598 3599 3600 3601 3602 3603 3604 3605 3606 3607 3608 3609 3610 3611 3612 3613 3614 3615 3616 3617 3618 3619 3620 3621 3622 3623 3624 3625 3626 3627 3628 3629 3630 3631 3632 3633 3634 3635 3636 3637 3638 3639 3640 3641 3642 3643 3644 3645 3646 3647 3648 3649 3650 3651 3652 3653 3654 3655 3656 3657 3658 3659 3660 3661 3662 3663 3664 3665 3666 3667 3668 3669 3670 3671 3672 3673 3674 3675 3676 3677 3678 3679 3680 3681 3682 3683 3684 3685 3686 3687 3688 3689 3690 3691 3692 3693 3694 3695 3696 3697 3698 3699 3700 3701 3702 3703 3704 3705 3706 3707 3708 3709 3710 3711 3712 3713 3714 3715 3716 3717 3718 3719 3720 3721 3722 3723 3724 3725 3726 3727 3728 3729 3730 3731 3732 3733 3734 3735 3736 3737 3738 3739 3740 3741 3742 3743 3744 3745 3746 3747 3748 3749 3750 3751 3752 3753 3754 3755 3756 3757 3758 3759 3760 3761 3762 3763 3764 3765 3766 3767 3768 3769 3770 3771 3772 3773 3774 3775 3776 3777 3778 3779 3780 3781 3782 3783 3784 3785 3786 3787 3788 3789 3790 3791 3792 3793 3794 3795 3796 3797 3798 3799 3800 3801 3802 3803 3804 3805 3806 3807 3808 3809 3810 3811 3812 3813 3814 3815 3816 3817 3818 3819 3820 3821 3822 3823 3824 3825 3826 3827 3828 3829 3830 3831 3832 3833 3834 3835 3836 3837 3838 3839 3840 3841 3842 3843 3844 3845 3846 3847 3848 3849 3850 3851 3852 3853 3854 3855 3856 3857 3858 3859 3860 3861 3862 3863 3864 3865 3866 3867 3868 3869 3870 3871 3872 3873 3874 3875 3876 3877 3878 3879 3880 3881 3882 3883 3884 3885 3886 3887 3888 3889 3890 3891 3892 3893 3894 3895 3896 3897 3898 3899 3900 3901 3902 3903 3904 3905 3906 3907 3908 3909 3910 3911 3912 3913 3914 3915 3916 3917 3918 3919 3920 3921 3922 3923 3924 3925 3926 3927 3928 3929 3930 3931 3932 3933 3934 3935 3936 3937 3938 3939 3940 3941 3942 3943 3944 3945 3946 3947 3948 3949 3950 3951 3952 3953 3954 3955 3956 3957 3958 3959 3960 3961 3962 3963 3964 3965 3966 3967 3968 3969 3970 3971 3972 3973 3974 3975 3976 3977 3978 3979 3980 3981 3982 3983 3984 3985 3986 3987 3988 3989 3990 3991 3992 3993 3994 3995 3996 3997 3998 3999 4000 4001 4002 4003 4004 4005 4006 4007 4008 4009 4010 4011 4012 4013 4014 4015 4016 4017 4018 4019 4020 4021 4022 4023 4024 4025 4026 4027 4028 4029 4030 4031 4032 4033 4034 4035 4036 4037 4038 4039 4040 4041 4042 4043 4044 4045 4046 4047 4048 4049 4050 4051 4052 4053 4054 4055 4056 4057 4058 4059 4060 4061 4062 4063 4064 4065 4066 4067 4068 4069 4070 4071 4072 4073 4074 4075 4076 4077 4078 4079 4080 4081 4082 4083 4084 4085 4086 4087 4088 4089 4090 4091 4092 4093 4094 4095 4096 4097 4098 4099 4100 4101 4102 4103 4104 4105 4106 4107 4108 4109 4110 4111 4112 4113 4114 4115 4116 4117 4118 4119 4120 4121 4122 4123 4124 4125 4126 4127 4128 4129 4130 4131 4132 4133 4134 4135 4136 4137 4138 4139 4140 4141 4142 4143 4144 4145 4146 4147 4148 4149 4150 4151 4152 4153 4154 4155 4156 4157 4158 4159 4160 4161 4162 4163 4164 4165 4166 4167 4168 4169 4170 4171 4172 4173 4174 4175 4176 4177 4178 4179 4180 4181 4182 4183 4184 4185 4186 4187 4188 4189 4190 4191 4192 4193 4194 4195 4196 4197 4198 4199 4200 4201 4202 4203 4204 4205 4206 4207 4208 4209 4210 4211 4212 4213 4214 4215 4216 4217 4218 4219 4220 4221 4222 4223 4224 4225 4226 4227 4228 4229 4230 4231 4232 4233 4234 4235 4236 4237 4238 4239 4240 4241 4242 4243 4244 4245 4246 4247 4248 4249 4250 4251 4252 4253 4254 4255 4256 4257 4258 4259 4260 4261 4262 4263 4264 4265 4266 4267 4268 4269 4270 4271 4272 4273 4274 4275 4276 4277 4278 4279 4280 4281 4282 4283 4284 4285 4286 4287 4288 4289 4290 4291 4292 4293 4294 4295 4296 4297 4298 4299 4300 4301 4302 4303 4304 4305 4306 4307 4308 4309 4310 4311 4312 4313 4314 4315 4316 4317 4318 4319 4320 4321 4322 4323 4324 4325 4326 4327 4328 4329 4330 4331 4332 4333 4334 4335 4336 4337 4338 4339 4340 4341 4342 4343 4344 4345 4346 4347 4348 4349 4350 4351 4352 4353 4354 4355 4356 4357 4358 4359 4360 4361 4362 4363 4364 4365 4366 4367 4368 4369 4370 4371 4372 4373 4374 4375 4376 4377 4378 4379 4380 4381 4382 4383 4384 4385 4386 4387 4388 4389 4390 4391 4392 4393 4394 4395 4396 4397 4398 4399 4400 4401 4402 4403 4404 4405 4406 4407 4408 4409 4410 4411 4412 4413 4414 4415 4416 4417 4418 4419 4420 4421 4422 4423 4424 4425 4426 4427 4428 4429 4430 4431 4432 4433 4434 4435 4436 4437 4438 4439 4440 4441 4442 4443 4444 4445 4446 4447 4448 4449 4450 4451 4452 4453 4454 4455 4456 4457 4458 4459 4460 4461 4462 4463 4464 4465 4466 4467 4468 4469 4470 4471 4472 4473 4474 4475 4476 4477 4478 4479 4480 4481 4482 4483 4484 4485 4486 4487 4488 4489 4490 4491 4492 4493 4494 4495 4496 4497 4498 4499 4500 4501 4502 4503 4504 4505 4506 4507 4508 4509 4510 4511 4512 4513 4514 4515 4516 4517 4518 4519 4520 4521 4522 4523 4524 4525 4526 4527 4528 4529 4530 4531 4532 4533 4534 4535 4536 4537 4538 4539 4540 4541 4542 4543 4544 4545 4546 4547 4548 4549 4550 4551 4552 4553 4554 4555 4556 4557 4558 4559 4560 4561 4562 4563 4564 4565 4566 4567 4568 4569 4570 4571 4572 4573 4574 4575 4576 4577 4578 4579 4580 4581 4582 4583 4584 4585 4586 4587 4588 4589 4590 4591 4592 4593 4594 4595 4596 4597 4598 4599 4600 4601 4602 4603 4604 4605 4606 4607 4608 4609 4610 4611 4612 4613 4614 4615 4616 4617 4618 4619 4620 4621 4622 4623 4624 4625 4626 4627 4628 4629 4630 4631 4632 4633 4634 4635 4636 4637 4638 4639 4640 4641 4642 4643 4644 4645 4646 4647 4648 4649 4650 4651 4652 4653 4654 4655 4656 4657 4658 4659 4660 4661 4662 4663 4664 4665 4666 4667 4668 4669 4670 4671 4672 4673 4674 4675 4676 4677 4678 4679 4680 4681 4682 4683 4684 4685 4686 4687 4688 4689 4690 4691 4692 4693 4694 4695 4696 4697 4698 4699 4700 4701 4702 4703 4704 4705 4706 4707 4708 4709 4710 4711 4712 4713 4714 4715 4716 4717 | // SPDX-License-Identifier: GPL-2.0-only /* * xfrm_policy.c * * Changes: * Mitsuru KANDA @USAGI * Kazunori MIYAZAWA @USAGI * Kunihiro Ishiguro <kunihiro@ipinfusion.com> * IPv6 support * Kazunori MIYAZAWA @USAGI * YOSHIFUJI Hideaki * Split up af-specific portion * Derek Atkins <derek@ihtfp.com> Add the post_input processor * */ #include <linux/err.h> #include <linux/slab.h> #include <linux/kmod.h> #include <linux/list.h> #include <linux/spinlock.h> #include <linux/workqueue.h> #include <linux/notifier.h> #include <linux/netdevice.h> #include <linux/netfilter.h> #include <linux/module.h> #include <linux/cache.h> #include <linux/cpu.h> #include <linux/audit.h> #include <linux/rhashtable.h> #include <linux/if_tunnel.h> #include <linux/icmp.h> #include <net/dst.h> #include <net/flow.h> #include <net/inet_ecn.h> #include <net/xfrm.h> #include <net/ip.h> #include <net/gre.h> #if IS_ENABLED(CONFIG_IPV6_MIP6) #include <net/mip6.h> #endif #ifdef CONFIG_XFRM_STATISTICS #include <net/snmp.h> #endif #ifdef CONFIG_XFRM_ESPINTCP #include <net/espintcp.h> #endif #include <net/inet_dscp.h> #include "xfrm_hash.h" #define XFRM_QUEUE_TMO_MIN ((unsigned)(HZ/10)) #define XFRM_QUEUE_TMO_MAX ((unsigned)(60*HZ)) #define XFRM_MAX_QUEUE_LEN 100 struct xfrm_flo { struct dst_entry *dst_orig; u8 flags; }; /* prefixes smaller than this are stored in lists, not trees. */ #define INEXACT_PREFIXLEN_IPV4 16 #define INEXACT_PREFIXLEN_IPV6 48 struct xfrm_pol_inexact_node { struct rb_node node; union { xfrm_address_t addr; struct rcu_head rcu; }; u8 prefixlen; struct rb_root root; /* the policies matching this node, can be empty list */ struct hlist_head hhead; }; /* xfrm inexact policy search tree: * xfrm_pol_inexact_bin = hash(dir,type,family,if_id); * | * +---- root_d: sorted by daddr:prefix * | | * | xfrm_pol_inexact_node * | | * | +- root: sorted by saddr/prefix * | | | * | | xfrm_pol_inexact_node * | | | * | | + root: unused * | | | * | | + hhead: saddr:daddr policies * | | * | +- coarse policies and all any:daddr policies * | * +---- root_s: sorted by saddr:prefix * | | * | xfrm_pol_inexact_node * | | * | + root: unused * | | * | + hhead: saddr:any policies * | * +---- coarse policies and all any:any policies * * Lookups return four candidate lists: * 1. any:any list from top-level xfrm_pol_inexact_bin * 2. any:daddr list from daddr tree * 3. saddr:daddr list from 2nd level daddr tree * 4. saddr:any list from saddr tree * * This result set then needs to be searched for the policy with * the lowest priority. If two candidates have the same priority, the * struct xfrm_policy pos member with the lower number is used. * * This replicates previous single-list-search algorithm which would * return first matching policy in the (ordered-by-priority) list. */ struct xfrm_pol_inexact_key { possible_net_t net; u32 if_id; u16 family; u8 dir, type; }; struct xfrm_pol_inexact_bin { struct xfrm_pol_inexact_key k; struct rhash_head head; /* list containing '*:*' policies */ struct hlist_head hhead; seqcount_spinlock_t count; /* tree sorted by daddr/prefix */ struct rb_root root_d; /* tree sorted by saddr/prefix */ struct rb_root root_s; /* slow path below */ struct list_head inexact_bins; struct rcu_head rcu; }; enum xfrm_pol_inexact_candidate_type { XFRM_POL_CAND_BOTH, XFRM_POL_CAND_SADDR, XFRM_POL_CAND_DADDR, XFRM_POL_CAND_ANY, XFRM_POL_CAND_MAX, }; struct xfrm_pol_inexact_candidates { struct hlist_head *res[XFRM_POL_CAND_MAX]; }; struct xfrm_flow_keys { struct flow_dissector_key_basic basic; struct flow_dissector_key_control control; union { struct flow_dissector_key_ipv4_addrs ipv4; struct flow_dissector_key_ipv6_addrs ipv6; } addrs; struct flow_dissector_key_ip ip; struct flow_dissector_key_icmp icmp; struct flow_dissector_key_ports ports; struct flow_dissector_key_keyid gre; }; static struct flow_dissector xfrm_session_dissector __ro_after_init; static DEFINE_SPINLOCK(xfrm_if_cb_lock); static struct xfrm_if_cb const __rcu *xfrm_if_cb __read_mostly; static DEFINE_SPINLOCK(xfrm_policy_afinfo_lock); static struct xfrm_policy_afinfo const __rcu *xfrm_policy_afinfo[AF_INET6 + 1] __read_mostly; static struct kmem_cache *xfrm_dst_cache __ro_after_init; static struct rhashtable xfrm_policy_inexact_table; static const struct rhashtable_params xfrm_pol_inexact_params; static void xfrm_init_pmtu(struct xfrm_dst **bundle, int nr); static int stale_bundle(struct dst_entry *dst); static int xfrm_bundle_ok(struct xfrm_dst *xdst); static void xfrm_policy_queue_process(struct timer_list *t); static void __xfrm_policy_link(struct xfrm_policy *pol, int dir); static struct xfrm_policy *__xfrm_policy_unlink(struct xfrm_policy *pol, int dir); static struct xfrm_pol_inexact_bin * xfrm_policy_inexact_lookup(struct net *net, u8 type, u16 family, u8 dir, u32 if_id); static struct xfrm_pol_inexact_bin * xfrm_policy_inexact_lookup_rcu(struct net *net, u8 type, u16 family, u8 dir, u32 if_id); static struct xfrm_policy * xfrm_policy_insert_list(struct hlist_head *chain, struct xfrm_policy *policy, bool excl); static bool xfrm_policy_find_inexact_candidates(struct xfrm_pol_inexact_candidates *cand, struct xfrm_pol_inexact_bin *b, const xfrm_address_t *saddr, const xfrm_address_t *daddr); static inline bool xfrm_pol_hold_rcu(struct xfrm_policy *policy) { return refcount_inc_not_zero(&policy->refcnt); } static inline bool __xfrm4_selector_match(const struct xfrm_selector *sel, const struct flowi *fl) { const struct flowi4 *fl4 = &fl->u.ip4; return addr4_match(fl4->daddr, sel->daddr.a4, sel->prefixlen_d) && addr4_match(fl4->saddr, sel->saddr.a4, sel->prefixlen_s) && !((xfrm_flowi_dport(fl, &fl4->uli) ^ sel->dport) & sel->dport_mask) && !((xfrm_flowi_sport(fl, &fl4->uli) ^ sel->sport) & sel->sport_mask) && (fl4->flowi4_proto == sel->proto || !sel->proto) && (fl4->flowi4_oif == sel->ifindex || !sel->ifindex); } static inline bool __xfrm6_selector_match(const struct xfrm_selector *sel, const struct flowi *fl) { const struct flowi6 *fl6 = &fl->u.ip6; return addr_match(&fl6->daddr, &sel->daddr, sel->prefixlen_d) && addr_match(&fl6->saddr, &sel->saddr, sel->prefixlen_s) && !((xfrm_flowi_dport(fl, &fl6->uli) ^ sel->dport) & sel->dport_mask) && !((xfrm_flowi_sport(fl, &fl6->uli) ^ sel->sport) & sel->sport_mask) && (fl6->flowi6_proto == sel->proto || !sel->proto) && (fl6->flowi6_oif == sel->ifindex || !sel->ifindex); } bool xfrm_selector_match(const struct xfrm_selector *sel, const struct flowi *fl, unsigned short family) { switch (family) { case AF_INET: return __xfrm4_selector_match(sel, fl); case AF_INET6: return __xfrm6_selector_match(sel, fl); } return false; } static const struct xfrm_policy_afinfo *xfrm_policy_get_afinfo(unsigned short family) { const struct xfrm_policy_afinfo *afinfo; if (unlikely(family >= ARRAY_SIZE(xfrm_policy_afinfo))) return NULL; rcu_read_lock(); afinfo = rcu_dereference(xfrm_policy_afinfo[family]); if (unlikely(!afinfo)) rcu_read_unlock(); return afinfo; } /* Called with rcu_read_lock(). */ static const struct xfrm_if_cb *xfrm_if_get_cb(void) { return rcu_dereference(xfrm_if_cb); } struct dst_entry *__xfrm_dst_lookup(int family, const struct xfrm_dst_lookup_params *params) { const struct xfrm_policy_afinfo *afinfo; struct dst_entry *dst; afinfo = xfrm_policy_get_afinfo(family); if (unlikely(afinfo == NULL)) return ERR_PTR(-EAFNOSUPPORT); dst = afinfo->dst_lookup(params); rcu_read_unlock(); return dst; } EXPORT_SYMBOL(__xfrm_dst_lookup); static inline struct dst_entry *xfrm_dst_lookup(struct xfrm_state *x, dscp_t dscp, int oif, xfrm_address_t *prev_saddr, xfrm_address_t *prev_daddr, int family, u32 mark) { struct xfrm_dst_lookup_params params; struct net *net = xs_net(x); xfrm_address_t *saddr = &x->props.saddr; xfrm_address_t *daddr = &x->id.daddr; struct dst_entry *dst; if (x->type->flags & XFRM_TYPE_LOCAL_COADDR) { saddr = x->coaddr; daddr = prev_daddr; } if (x->type->flags & XFRM_TYPE_REMOTE_COADDR) { saddr = prev_saddr; daddr = x->coaddr; } params.net = net; params.saddr = saddr; params.daddr = daddr; params.dscp = dscp; params.oif = oif; params.mark = mark; params.ipproto = x->id.proto; if (x->encap) { switch (x->encap->encap_type) { case UDP_ENCAP_ESPINUDP: params.ipproto = IPPROTO_UDP; params.uli.ports.sport = x->encap->encap_sport; params.uli.ports.dport = x->encap->encap_dport; break; case TCP_ENCAP_ESPINTCP: params.ipproto = IPPROTO_TCP; params.uli.ports.sport = x->encap->encap_sport; params.uli.ports.dport = x->encap->encap_dport; break; } } dst = __xfrm_dst_lookup(family, ¶ms); if (!IS_ERR(dst)) { if (prev_saddr != saddr) memcpy(prev_saddr, saddr, sizeof(*prev_saddr)); if (prev_daddr != daddr) memcpy(prev_daddr, daddr, sizeof(*prev_daddr)); } return dst; } static inline unsigned long make_jiffies(long secs) { if (secs >= (MAX_SCHEDULE_TIMEOUT-1)/HZ) return MAX_SCHEDULE_TIMEOUT-1; else return secs*HZ; } static void xfrm_policy_timer(struct timer_list *t) { struct xfrm_policy *xp = timer_container_of(xp, t, timer); time64_t now = ktime_get_real_seconds(); time64_t next = TIME64_MAX; int warn = 0; int dir; read_lock(&xp->lock); if (unlikely(xp->walk.dead)) goto out; dir = xfrm_policy_id2dir(xp->index); if (xp->lft.hard_add_expires_seconds) { time64_t tmo = xp->lft.hard_add_expires_seconds + xp->curlft.add_time - now; if (tmo <= 0) goto expired; if (tmo < next) next = tmo; } if (xp->lft.hard_use_expires_seconds) { time64_t tmo = xp->lft.hard_use_expires_seconds + (READ_ONCE(xp->curlft.use_time) ? : xp->curlft.add_time) - now; if (tmo <= 0) goto expired; if (tmo < next) next = tmo; } if (xp->lft.soft_add_expires_seconds) { time64_t tmo = xp->lft.soft_add_expires_seconds + xp->curlft.add_time - now; if (tmo <= 0) { warn = 1; tmo = XFRM_KM_TIMEOUT; } if (tmo < next) next = tmo; } if (xp->lft.soft_use_expires_seconds) { time64_t tmo = xp->lft.soft_use_expires_seconds + (READ_ONCE(xp->curlft.use_time) ? : xp->curlft.add_time) - now; if (tmo <= 0) { warn = 1; tmo = XFRM_KM_TIMEOUT; } if (tmo < next) next = tmo; } if (warn) km_policy_expired(xp, dir, 0, 0); if (next != TIME64_MAX && !mod_timer(&xp->timer, jiffies + make_jiffies(next))) xfrm_pol_hold(xp); out: read_unlock(&xp->lock); xfrm_pol_put(xp); return; expired: read_unlock(&xp->lock); if (!xfrm_policy_delete(xp, dir)) km_policy_expired(xp, dir, 1, 0); xfrm_pol_put(xp); } /* Allocate xfrm_policy. Not used here, it is supposed to be used by pfkeyv2 * SPD calls. */ struct xfrm_policy *xfrm_policy_alloc(struct net *net, gfp_t gfp) { struct xfrm_policy *policy; policy = kzalloc(sizeof(struct xfrm_policy), gfp); if (policy) { write_pnet(&policy->xp_net, net); INIT_LIST_HEAD(&policy->walk.all); INIT_HLIST_HEAD(&policy->state_cache_list); INIT_HLIST_NODE(&policy->bydst); INIT_HLIST_NODE(&policy->byidx); rwlock_init(&policy->lock); refcount_set(&policy->refcnt, 1); skb_queue_head_init(&policy->polq.hold_queue); timer_setup(&policy->timer, xfrm_policy_timer, 0); timer_setup(&policy->polq.hold_timer, xfrm_policy_queue_process, 0); } return policy; } EXPORT_SYMBOL(xfrm_policy_alloc); static void xfrm_policy_destroy_rcu(struct rcu_head *head) { struct xfrm_policy *policy = container_of(head, struct xfrm_policy, rcu); security_xfrm_policy_free(policy->security); kfree(policy); } /* Destroy xfrm_policy: descendant resources must be released to this moment. */ void xfrm_policy_destroy(struct xfrm_policy *policy) { BUG_ON(!policy->walk.dead); if (timer_delete(&policy->timer) || timer_delete(&policy->polq.hold_timer)) BUG(); xfrm_dev_policy_free(policy); call_rcu(&policy->rcu, xfrm_policy_destroy_rcu); } EXPORT_SYMBOL(xfrm_policy_destroy); /* Rule must be locked. Release descendant resources, announce * entry dead. The rule must be unlinked from lists to the moment. */ static void xfrm_policy_kill(struct xfrm_policy *policy) { struct net *net = xp_net(policy); struct xfrm_state *x; xfrm_dev_policy_delete(policy); write_lock_bh(&policy->lock); policy->walk.dead = 1; write_unlock_bh(&policy->lock); atomic_inc(&policy->genid); if (timer_delete(&policy->polq.hold_timer)) xfrm_pol_put(policy); skb_queue_purge(&policy->polq.hold_queue); if (timer_delete(&policy->timer)) xfrm_pol_put(policy); /* XXX: Flush state cache */ spin_lock_bh(&net->xfrm.xfrm_state_lock); hlist_for_each_entry_rcu(x, &policy->state_cache_list, state_cache) { hlist_del_init_rcu(&x->state_cache); } spin_unlock_bh(&net->xfrm.xfrm_state_lock); xfrm_pol_put(policy); } static unsigned int xfrm_policy_hashmax __read_mostly = 1 * 1024 * 1024; static inline unsigned int idx_hash(struct net *net, u32 index) { return __idx_hash(index, net->xfrm.policy_idx_hmask); } /* calculate policy hash thresholds */ static void __get_hash_thresh(struct net *net, unsigned short family, int dir, u8 *dbits, u8 *sbits) { switch (family) { case AF_INET: *dbits = net->xfrm.policy_bydst[dir].dbits4; *sbits = net->xfrm.policy_bydst[dir].sbits4; break; case AF_INET6: *dbits = net->xfrm.policy_bydst[dir].dbits6; *sbits = net->xfrm.policy_bydst[dir].sbits6; break; default: *dbits = 0; *sbits = 0; } } static struct hlist_head *policy_hash_bysel(struct net *net, const struct xfrm_selector *sel, unsigned short family, int dir) { unsigned int hmask = net->xfrm.policy_bydst[dir].hmask; unsigned int hash; u8 dbits; u8 sbits; __get_hash_thresh(net, family, dir, &dbits, &sbits); hash = __sel_hash(sel, family, hmask, dbits, sbits); if (hash == hmask + 1) return NULL; return rcu_dereference_check(net->xfrm.policy_bydst[dir].table, lockdep_is_held(&net->xfrm.xfrm_policy_lock)) + hash; } static struct hlist_head *policy_hash_direct(struct net *net, const xfrm_address_t *daddr, const xfrm_address_t *saddr, unsigned short family, int dir) { unsigned int hmask = net->xfrm.policy_bydst[dir].hmask; unsigned int hash; u8 dbits; u8 sbits; __get_hash_thresh(net, family, dir, &dbits, &sbits); hash = __addr_hash(daddr, saddr, family, hmask, dbits, sbits); return rcu_dereference_check(net->xfrm.policy_bydst[dir].table, lockdep_is_held(&net->xfrm.xfrm_policy_lock)) + hash; } static void xfrm_dst_hash_transfer(struct net *net, struct hlist_head *list, struct hlist_head *ndsttable, unsigned int nhashmask, int dir) { struct hlist_node *tmp, *entry0 = NULL; struct xfrm_policy *pol; unsigned int h0 = 0; u8 dbits; u8 sbits; redo: hlist_for_each_entry_safe(pol, tmp, list, bydst) { unsigned int h; __get_hash_thresh(net, pol->family, dir, &dbits, &sbits); h = __addr_hash(&pol->selector.daddr, &pol->selector.saddr, pol->family, nhashmask, dbits, sbits); if (!entry0 || pol->xdo.type == XFRM_DEV_OFFLOAD_PACKET) { hlist_del_rcu(&pol->bydst); hlist_add_head_rcu(&pol->bydst, ndsttable + h); h0 = h; } else { if (h != h0) continue; hlist_del_rcu(&pol->bydst); hlist_add_behind_rcu(&pol->bydst, entry0); } entry0 = &pol->bydst; } if (!hlist_empty(list)) { entry0 = NULL; goto redo; } } static void xfrm_idx_hash_transfer(struct hlist_head *list, struct hlist_head *nidxtable, unsigned int nhashmask) { struct hlist_node *tmp; struct xfrm_policy *pol; hlist_for_each_entry_safe(pol, tmp, list, byidx) { unsigned int h; h = __idx_hash(pol->index, nhashmask); hlist_add_head(&pol->byidx, nidxtable+h); } } static unsigned long xfrm_new_hash_mask(unsigned int old_hmask) { return ((old_hmask + 1) << 1) - 1; } static void xfrm_bydst_resize(struct net *net, int dir) { unsigned int hmask = net->xfrm.policy_bydst[dir].hmask; unsigned int nhashmask = xfrm_new_hash_mask(hmask); unsigned int nsize = (nhashmask + 1) * sizeof(struct hlist_head); struct hlist_head *ndst = xfrm_hash_alloc(nsize); struct hlist_head *odst; int i; if (!ndst) return; spin_lock_bh(&net->xfrm.xfrm_policy_lock); write_seqcount_begin(&net->xfrm.xfrm_policy_hash_generation); odst = rcu_dereference_protected(net->xfrm.policy_bydst[dir].table, lockdep_is_held(&net->xfrm.xfrm_policy_lock)); for (i = hmask; i >= 0; i--) xfrm_dst_hash_transfer(net, odst + i, ndst, nhashmask, dir); rcu_assign_pointer(net->xfrm.policy_bydst[dir].table, ndst); net->xfrm.policy_bydst[dir].hmask = nhashmask; write_seqcount_end(&net->xfrm.xfrm_policy_hash_generation); spin_unlock_bh(&net->xfrm.xfrm_policy_lock); synchronize_rcu(); xfrm_hash_free(odst, (hmask + 1) * sizeof(struct hlist_head)); } static void xfrm_byidx_resize(struct net *net) { unsigned int hmask = net->xfrm.policy_idx_hmask; unsigned int nhashmask = xfrm_new_hash_mask(hmask); unsigned int nsize = (nhashmask + 1) * sizeof(struct hlist_head); struct hlist_head *oidx = net->xfrm.policy_byidx; struct hlist_head *nidx = xfrm_hash_alloc(nsize); int i; if (!nidx) return; spin_lock_bh(&net->xfrm.xfrm_policy_lock); for (i = hmask; i >= 0; i--) xfrm_idx_hash_transfer(oidx + i, nidx, nhashmask); net->xfrm.policy_byidx = nidx; net->xfrm.policy_idx_hmask = nhashmask; spin_unlock_bh(&net->xfrm.xfrm_policy_lock); xfrm_hash_free(oidx, (hmask + 1) * sizeof(struct hlist_head)); } static inline int xfrm_bydst_should_resize(struct net *net, int dir, int *total) { unsigned int cnt = net->xfrm.policy_count[dir]; unsigned int hmask = net->xfrm.policy_bydst[dir].hmask; if (total) *total += cnt; if ((hmask + 1) < xfrm_policy_hashmax && cnt > hmask) return 1; return 0; } static inline int xfrm_byidx_should_resize(struct net *net, int total) { unsigned int hmask = net->xfrm.policy_idx_hmask; if ((hmask + 1) < xfrm_policy_hashmax && total > hmask) return 1; return 0; } void xfrm_spd_getinfo(struct net *net, struct xfrmk_spdinfo *si) { si->incnt = net->xfrm.policy_count[XFRM_POLICY_IN]; si->outcnt = net->xfrm.policy_count[XFRM_POLICY_OUT]; si->fwdcnt = net->xfrm.policy_count[XFRM_POLICY_FWD]; si->inscnt = net->xfrm.policy_count[XFRM_POLICY_IN+XFRM_POLICY_MAX]; si->outscnt = net->xfrm.policy_count[XFRM_POLICY_OUT+XFRM_POLICY_MAX]; si->fwdscnt = net->xfrm.policy_count[XFRM_POLICY_FWD+XFRM_POLICY_MAX]; si->spdhcnt = net->xfrm.policy_idx_hmask; si->spdhmcnt = xfrm_policy_hashmax; } EXPORT_SYMBOL(xfrm_spd_getinfo); static DEFINE_MUTEX(hash_resize_mutex); static void xfrm_hash_resize(struct work_struct *work) { struct net *net = container_of(work, struct net, xfrm.policy_hash_work); int dir, total; mutex_lock(&hash_resize_mutex); total = 0; for (dir = 0; dir < XFRM_POLICY_MAX; dir++) { if (xfrm_bydst_should_resize(net, dir, &total)) xfrm_bydst_resize(net, dir); } if (xfrm_byidx_should_resize(net, total)) xfrm_byidx_resize(net); mutex_unlock(&hash_resize_mutex); } /* Make sure *pol can be inserted into fastbin. * Useful to check that later insert requests will be successful * (provided xfrm_policy_lock is held throughout). */ static struct xfrm_pol_inexact_bin * xfrm_policy_inexact_alloc_bin(const struct xfrm_policy *pol, u8 dir) { struct xfrm_pol_inexact_bin *bin, *prev; struct xfrm_pol_inexact_key k = { .family = pol->family, .type = pol->type, .dir = dir, .if_id = pol->if_id, }; struct net *net = xp_net(pol); lockdep_assert_held(&net->xfrm.xfrm_policy_lock); write_pnet(&k.net, net); bin = rhashtable_lookup_fast(&xfrm_policy_inexact_table, &k, xfrm_pol_inexact_params); if (bin) return bin; bin = kzalloc(sizeof(*bin), GFP_ATOMIC); if (!bin) return NULL; bin->k = k; INIT_HLIST_HEAD(&bin->hhead); bin->root_d = RB_ROOT; bin->root_s = RB_ROOT; seqcount_spinlock_init(&bin->count, &net->xfrm.xfrm_policy_lock); prev = rhashtable_lookup_get_insert_key(&xfrm_policy_inexact_table, &bin->k, &bin->head, xfrm_pol_inexact_params); if (!prev) { list_add(&bin->inexact_bins, &net->xfrm.inexact_bins); return bin; } kfree(bin); return IS_ERR(prev) ? NULL : prev; } static bool xfrm_pol_inexact_addr_use_any_list(const xfrm_address_t *addr, int family, u8 prefixlen) { if (xfrm_addr_any(addr, family)) return true; if (family == AF_INET6 && prefixlen < INEXACT_PREFIXLEN_IPV6) return true; if (family == AF_INET && prefixlen < INEXACT_PREFIXLEN_IPV4) return true; return false; } static bool xfrm_policy_inexact_insert_use_any_list(const struct xfrm_policy *policy) { const xfrm_address_t *addr; bool saddr_any, daddr_any; u8 prefixlen; addr = &policy->selector.saddr; prefixlen = policy->selector.prefixlen_s; saddr_any = xfrm_pol_inexact_addr_use_any_list(addr, policy->family, prefixlen); addr = &policy->selector.daddr; prefixlen = policy->selector.prefixlen_d; daddr_any = xfrm_pol_inexact_addr_use_any_list(addr, policy->family, prefixlen); return saddr_any && daddr_any; } static void xfrm_pol_inexact_node_init(struct xfrm_pol_inexact_node *node, const xfrm_address_t *addr, u8 prefixlen) { node->addr = *addr; node->prefixlen = prefixlen; } static struct xfrm_pol_inexact_node * xfrm_pol_inexact_node_alloc(const xfrm_address_t *addr, u8 prefixlen) { struct xfrm_pol_inexact_node *node; node = kzalloc(sizeof(*node), GFP_ATOMIC); if (node) xfrm_pol_inexact_node_init(node, addr, prefixlen); return node; } static int xfrm_policy_addr_delta(const xfrm_address_t *a, const xfrm_address_t *b, u8 prefixlen, u16 family) { u32 ma, mb, mask; unsigned int pdw, pbi; int delta = 0; switch (family) { case AF_INET: if (prefixlen == 0) return 0; mask = ~0U << (32 - prefixlen); ma = ntohl(a->a4) & mask; mb = ntohl(b->a4) & mask; if (ma < mb) delta = -1; else if (ma > mb) delta = 1; break; case AF_INET6: pdw = prefixlen >> 5; pbi = prefixlen & 0x1f; if (pdw) { delta = memcmp(a->a6, b->a6, pdw << 2); if (delta) return delta; } if (pbi) { mask = ~0U << (32 - pbi); ma = ntohl(a->a6[pdw]) & mask; mb = ntohl(b->a6[pdw]) & mask; if (ma < mb) delta = -1; else if (ma > mb) delta = 1; } break; default: break; } return delta; } static void xfrm_policy_inexact_list_reinsert(struct net *net, struct xfrm_pol_inexact_node *n, u16 family) { unsigned int matched_s, matched_d; struct xfrm_policy *policy, *p; matched_s = 0; matched_d = 0; list_for_each_entry_reverse(policy, &net->xfrm.policy_all, walk.all) { struct hlist_node *newpos = NULL; bool matches_s, matches_d; if (policy->walk.dead || !policy->bydst_reinsert) continue; WARN_ON_ONCE(policy->family != family); policy->bydst_reinsert = false; hlist_for_each_entry(p, &n->hhead, bydst) { if (policy->priority > p->priority) newpos = &p->bydst; else if (policy->priority == p->priority && policy->pos > p->pos) newpos = &p->bydst; else break; } if (newpos && policy->xdo.type != XFRM_DEV_OFFLOAD_PACKET) hlist_add_behind_rcu(&policy->bydst, newpos); else hlist_add_head_rcu(&policy->bydst, &n->hhead); /* paranoia checks follow. * Check that the reinserted policy matches at least * saddr or daddr for current node prefix. * * Matching both is fine, matching saddr in one policy * (but not daddr) and then matching only daddr in another * is a bug. */ matches_s = xfrm_policy_addr_delta(&policy->selector.saddr, &n->addr, n->prefixlen, family) == 0; matches_d = xfrm_policy_addr_delta(&policy->selector.daddr, &n->addr, n->prefixlen, family) == 0; if (matches_s && matches_d) continue; WARN_ON_ONCE(!matches_s && !matches_d); if (matches_s) matched_s++; if (matches_d) matched_d++; WARN_ON_ONCE(matched_s && matched_d); } } static void xfrm_policy_inexact_node_reinsert(struct net *net, struct xfrm_pol_inexact_node *n, struct rb_root *new, u16 family) { struct xfrm_pol_inexact_node *node; struct rb_node **p, *parent; /* we should not have another subtree here */ WARN_ON_ONCE(!RB_EMPTY_ROOT(&n->root)); restart: parent = NULL; p = &new->rb_node; while (*p) { u8 prefixlen; int delta; parent = *p; node = rb_entry(*p, struct xfrm_pol_inexact_node, node); prefixlen = min(node->prefixlen, n->prefixlen); delta = xfrm_policy_addr_delta(&n->addr, &node->addr, prefixlen, family); if (delta < 0) { p = &parent->rb_left; } else if (delta > 0) { p = &parent->rb_right; } else { bool same_prefixlen = node->prefixlen == n->prefixlen; struct xfrm_policy *tmp; hlist_for_each_entry(tmp, &n->hhead, bydst) { tmp->bydst_reinsert = true; hlist_del_rcu(&tmp->bydst); } node->prefixlen = prefixlen; xfrm_policy_inexact_list_reinsert(net, node, family); if (same_prefixlen) { kfree_rcu(n, rcu); return; } rb_erase(*p, new); kfree_rcu(n, rcu); n = node; goto restart; } } rb_link_node_rcu(&n->node, parent, p); rb_insert_color(&n->node, new); } /* merge nodes v and n */ static void xfrm_policy_inexact_node_merge(struct net *net, struct xfrm_pol_inexact_node *v, struct xfrm_pol_inexact_node *n, u16 family) { struct xfrm_pol_inexact_node *node; struct xfrm_policy *tmp; struct rb_node *rnode; /* To-be-merged node v has a subtree. * * Dismantle it and insert its nodes to n->root. */ while ((rnode = rb_first(&v->root)) != NULL) { node = rb_entry(rnode, struct xfrm_pol_inexact_node, node); rb_erase(&node->node, &v->root); xfrm_policy_inexact_node_reinsert(net, node, &n->root, family); } hlist_for_each_entry(tmp, &v->hhead, bydst) { tmp->bydst_reinsert = true; hlist_del_rcu(&tmp->bydst); } xfrm_policy_inexact_list_reinsert(net, n, family); } static struct xfrm_pol_inexact_node * xfrm_policy_inexact_insert_node(struct net *net, struct rb_root *root, xfrm_address_t *addr, u16 family, u8 prefixlen, u8 dir) { struct xfrm_pol_inexact_node *cached = NULL; struct rb_node **p, *parent = NULL; struct xfrm_pol_inexact_node *node; p = &root->rb_node; while (*p) { int delta; parent = *p; node = rb_entry(*p, struct xfrm_pol_inexact_node, node); delta = xfrm_policy_addr_delta(addr, &node->addr, node->prefixlen, family); if (delta == 0 && prefixlen >= node->prefixlen) { WARN_ON_ONCE(cached); /* ipsec policies got lost */ return node; } if (delta < 0) p = &parent->rb_left; else p = &parent->rb_right; if (prefixlen < node->prefixlen) { delta = xfrm_policy_addr_delta(addr, &node->addr, prefixlen, family); if (delta) continue; /* This node is a subnet of the new prefix. It needs * to be removed and re-inserted with the smaller * prefix and all nodes that are now also covered * by the reduced prefixlen. */ rb_erase(&node->node, root); if (!cached) { xfrm_pol_inexact_node_init(node, addr, prefixlen); cached = node; } else { /* This node also falls within the new * prefixlen. Merge the to-be-reinserted * node and this one. */ xfrm_policy_inexact_node_merge(net, node, cached, family); kfree_rcu(node, rcu); } /* restart */ p = &root->rb_node; parent = NULL; } } node = cached; if (!node) { node = xfrm_pol_inexact_node_alloc(addr, prefixlen); if (!node) return NULL; } rb_link_node_rcu(&node->node, parent, p); rb_insert_color(&node->node, root); return node; } static void xfrm_policy_inexact_gc_tree(struct rb_root *r, bool rm) { struct xfrm_pol_inexact_node *node; struct rb_node *rn = rb_first(r); while (rn) { node = rb_entry(rn, struct xfrm_pol_inexact_node, node); xfrm_policy_inexact_gc_tree(&node->root, rm); rn = rb_next(rn); if (!hlist_empty(&node->hhead) || !RB_EMPTY_ROOT(&node->root)) { WARN_ON_ONCE(rm); continue; } rb_erase(&node->node, r); kfree_rcu(node, rcu); } } static void __xfrm_policy_inexact_prune_bin(struct xfrm_pol_inexact_bin *b, bool net_exit) { write_seqcount_begin(&b->count); xfrm_policy_inexact_gc_tree(&b->root_d, net_exit); xfrm_policy_inexact_gc_tree(&b->root_s, net_exit); write_seqcount_end(&b->count); if (!RB_EMPTY_ROOT(&b->root_d) || !RB_EMPTY_ROOT(&b->root_s) || !hlist_empty(&b->hhead)) { WARN_ON_ONCE(net_exit); return; } if (rhashtable_remove_fast(&xfrm_policy_inexact_table, &b->head, xfrm_pol_inexact_params) == 0) { list_del(&b->inexact_bins); kfree_rcu(b, rcu); } } static void xfrm_policy_inexact_prune_bin(struct xfrm_pol_inexact_bin *b) { struct net *net = read_pnet(&b->k.net); spin_lock_bh(&net->xfrm.xfrm_policy_lock); __xfrm_policy_inexact_prune_bin(b, false); spin_unlock_bh(&net->xfrm.xfrm_policy_lock); } static void __xfrm_policy_inexact_flush(struct net *net) { struct xfrm_pol_inexact_bin *bin, *t; lockdep_assert_held(&net->xfrm.xfrm_policy_lock); list_for_each_entry_safe(bin, t, &net->xfrm.inexact_bins, inexact_bins) __xfrm_policy_inexact_prune_bin(bin, false); } static struct hlist_head * xfrm_policy_inexact_alloc_chain(struct xfrm_pol_inexact_bin *bin, struct xfrm_policy *policy, u8 dir) { struct xfrm_pol_inexact_node *n; struct net *net; net = xp_net(policy); lockdep_assert_held(&net->xfrm.xfrm_policy_lock); if (xfrm_policy_inexact_insert_use_any_list(policy)) return &bin->hhead; if (xfrm_pol_inexact_addr_use_any_list(&policy->selector.daddr, policy->family, policy->selector.prefixlen_d)) { write_seqcount_begin(&bin->count); n = xfrm_policy_inexact_insert_node(net, &bin->root_s, &policy->selector.saddr, policy->family, policy->selector.prefixlen_s, dir); write_seqcount_end(&bin->count); if (!n) return NULL; return &n->hhead; } /* daddr is fixed */ write_seqcount_begin(&bin->count); n = xfrm_policy_inexact_insert_node(net, &bin->root_d, &policy->selector.daddr, policy->family, policy->selector.prefixlen_d, dir); write_seqcount_end(&bin->count); if (!n) return NULL; /* saddr is wildcard */ if (xfrm_pol_inexact_addr_use_any_list(&policy->selector.saddr, policy->family, policy->selector.prefixlen_s)) return &n->hhead; write_seqcount_begin(&bin->count); n = xfrm_policy_inexact_insert_node(net, &n->root, &policy->selector.saddr, policy->family, policy->selector.prefixlen_s, dir); write_seqcount_end(&bin->count); if (!n) return NULL; return &n->hhead; } static struct xfrm_policy * xfrm_policy_inexact_insert(struct xfrm_policy *policy, u8 dir, int excl) { struct xfrm_pol_inexact_bin *bin; struct xfrm_policy *delpol; struct hlist_head *chain; struct net *net; bin = xfrm_policy_inexact_alloc_bin(policy, dir); if (!bin) return ERR_PTR(-ENOMEM); net = xp_net(policy); lockdep_assert_held(&net->xfrm.xfrm_policy_lock); chain = xfrm_policy_inexact_alloc_chain(bin, policy, dir); if (!chain) { __xfrm_policy_inexact_prune_bin(bin, false); return ERR_PTR(-ENOMEM); } delpol = xfrm_policy_insert_list(chain, policy, excl); if (delpol && excl) { __xfrm_policy_inexact_prune_bin(bin, false); return ERR_PTR(-EEXIST); } if (delpol) __xfrm_policy_inexact_prune_bin(bin, false); return delpol; } static bool xfrm_policy_is_dead_or_sk(const struct xfrm_policy *policy) { int dir; if (policy->walk.dead) return true; dir = xfrm_policy_id2dir(policy->index); return dir >= XFRM_POLICY_MAX; } static void xfrm_hash_rebuild(struct work_struct *work) { struct net *net = container_of(work, struct net, xfrm.policy_hthresh.work); struct xfrm_policy *pol; struct xfrm_policy *policy; struct hlist_head *chain; struct hlist_node *newpos; int dir; unsigned seq; u8 lbits4, rbits4, lbits6, rbits6; mutex_lock(&hash_resize_mutex); /* read selector prefixlen thresholds */ do { seq = read_seqbegin(&net->xfrm.policy_hthresh.lock); lbits4 = net->xfrm.policy_hthresh.lbits4; rbits4 = net->xfrm.policy_hthresh.rbits4; lbits6 = net->xfrm.policy_hthresh.lbits6; rbits6 = net->xfrm.policy_hthresh.rbits6; } while (read_seqretry(&net->xfrm.policy_hthresh.lock, seq)); spin_lock_bh(&net->xfrm.xfrm_policy_lock); write_seqcount_begin(&net->xfrm.xfrm_policy_hash_generation); /* make sure that we can insert the indirect policies again before * we start with destructive action. */ list_for_each_entry(policy, &net->xfrm.policy_all, walk.all) { struct xfrm_pol_inexact_bin *bin; u8 dbits, sbits; if (xfrm_policy_is_dead_or_sk(policy)) continue; dir = xfrm_policy_id2dir(policy->index); if ((dir & XFRM_POLICY_MASK) == XFRM_POLICY_OUT) { if (policy->family == AF_INET) { dbits = rbits4; sbits = lbits4; } else { dbits = rbits6; sbits = lbits6; } } else { if (policy->family == AF_INET) { dbits = lbits4; sbits = rbits4; } else { dbits = lbits6; sbits = rbits6; } } if (policy->selector.prefixlen_d < dbits || policy->selector.prefixlen_s < sbits) continue; bin = xfrm_policy_inexact_alloc_bin(policy, dir); if (!bin) goto out_unlock; if (!xfrm_policy_inexact_alloc_chain(bin, policy, dir)) goto out_unlock; } for (dir = 0; dir < XFRM_POLICY_MAX; dir++) { if ((dir & XFRM_POLICY_MASK) == XFRM_POLICY_OUT) { /* dir out => dst = remote, src = local */ net->xfrm.policy_bydst[dir].dbits4 = rbits4; net->xfrm.policy_bydst[dir].sbits4 = lbits4; net->xfrm.policy_bydst[dir].dbits6 = rbits6; net->xfrm.policy_bydst[dir].sbits6 = lbits6; } else { /* dir in/fwd => dst = local, src = remote */ net->xfrm.policy_bydst[dir].dbits4 = lbits4; net->xfrm.policy_bydst[dir].sbits4 = rbits4; net->xfrm.policy_bydst[dir].dbits6 = lbits6; net->xfrm.policy_bydst[dir].sbits6 = rbits6; } } /* re-insert all policies by order of creation */ list_for_each_entry_reverse(policy, &net->xfrm.policy_all, walk.all) { if (xfrm_policy_is_dead_or_sk(policy)) continue; hlist_del_rcu(&policy->bydst); newpos = NULL; dir = xfrm_policy_id2dir(policy->index); chain = policy_hash_bysel(net, &policy->selector, policy->family, dir); if (!chain) { void *p = xfrm_policy_inexact_insert(policy, dir, 0); WARN_ONCE(IS_ERR(p), "reinsert: %ld\n", PTR_ERR(p)); continue; } hlist_for_each_entry(pol, chain, bydst) { if (policy->priority >= pol->priority) newpos = &pol->bydst; else break; } if (newpos && policy->xdo.type != XFRM_DEV_OFFLOAD_PACKET) hlist_add_behind_rcu(&policy->bydst, newpos); else hlist_add_head_rcu(&policy->bydst, chain); } out_unlock: __xfrm_policy_inexact_flush(net); write_seqcount_end(&net->xfrm.xfrm_policy_hash_generation); spin_unlock_bh(&net->xfrm.xfrm_policy_lock); mutex_unlock(&hash_resize_mutex); } void xfrm_policy_hash_rebuild(struct net *net) { schedule_work(&net->xfrm.policy_hthresh.work); } EXPORT_SYMBOL(xfrm_policy_hash_rebuild); /* Generate new index... KAME seems to generate them ordered by cost * of an absolute inpredictability of ordering of rules. This will not pass. */ static u32 xfrm_gen_index(struct net *net, int dir, u32 index) { for (;;) { struct hlist_head *list; struct xfrm_policy *p; u32 idx; int found; if (!index) { idx = (net->xfrm.idx_generator | dir); net->xfrm.idx_generator += 8; } else { idx = index; index = 0; } if (idx == 0) idx = 8; list = net->xfrm.policy_byidx + idx_hash(net, idx); found = 0; hlist_for_each_entry(p, list, byidx) { if (p->index == idx) { found = 1; break; } } if (!found) return idx; } } static inline int selector_cmp(struct xfrm_selector *s1, struct xfrm_selector *s2) { u32 *p1 = (u32 *) s1; u32 *p2 = (u32 *) s2; int len = sizeof(struct xfrm_selector) / sizeof(u32); int i; for (i = 0; i < len; i++) { if (p1[i] != p2[i]) return 1; } return 0; } static void xfrm_policy_requeue(struct xfrm_policy *old, struct xfrm_policy *new) { struct xfrm_policy_queue *pq = &old->polq; struct sk_buff_head list; if (skb_queue_empty(&pq->hold_queue)) return; __skb_queue_head_init(&list); spin_lock_bh(&pq->hold_queue.lock); skb_queue_splice_init(&pq->hold_queue, &list); if (timer_delete(&pq->hold_timer)) xfrm_pol_put(old); spin_unlock_bh(&pq->hold_queue.lock); pq = &new->polq; spin_lock_bh(&pq->hold_queue.lock); skb_queue_splice(&list, &pq->hold_queue); pq->timeout = XFRM_QUEUE_TMO_MIN; if (!mod_timer(&pq->hold_timer, jiffies)) xfrm_pol_hold(new); spin_unlock_bh(&pq->hold_queue.lock); } static inline bool xfrm_policy_mark_match(const struct xfrm_mark *mark, struct xfrm_policy *pol) { return mark->v == pol->mark.v && mark->m == pol->mark.m; } static u32 xfrm_pol_bin_key(const void *data, u32 len, u32 seed) { const struct xfrm_pol_inexact_key *k = data; u32 a = k->type << 24 | k->dir << 16 | k->family; return jhash_3words(a, k->if_id, net_hash_mix(read_pnet(&k->net)), seed); } static u32 xfrm_pol_bin_obj(const void *data, u32 len, u32 seed) { const struct xfrm_pol_inexact_bin *b = data; return xfrm_pol_bin_key(&b->k, 0, seed); } static int xfrm_pol_bin_cmp(struct rhashtable_compare_arg *arg, const void *ptr) { const struct xfrm_pol_inexact_key *key = arg->key; const struct xfrm_pol_inexact_bin *b = ptr; int ret; if (!net_eq(read_pnet(&b->k.net), read_pnet(&key->net))) return -1; ret = b->k.dir ^ key->dir; if (ret) return ret; ret = b->k.type ^ key->type; if (ret) return ret; ret = b->k.family ^ key->family; if (ret) return ret; return b->k.if_id ^ key->if_id; } static const struct rhashtable_params xfrm_pol_inexact_params = { .head_offset = offsetof(struct xfrm_pol_inexact_bin, head), .hashfn = xfrm_pol_bin_key, .obj_hashfn = xfrm_pol_bin_obj, .obj_cmpfn = xfrm_pol_bin_cmp, .automatic_shrinking = true, }; static struct xfrm_policy *xfrm_policy_insert_list(struct hlist_head *chain, struct xfrm_policy *policy, bool excl) { struct xfrm_policy *pol, *newpos = NULL, *delpol = NULL; hlist_for_each_entry(pol, chain, bydst) { if (pol->type == policy->type && pol->if_id == policy->if_id && !selector_cmp(&pol->selector, &policy->selector) && xfrm_policy_mark_match(&policy->mark, pol) && xfrm_sec_ctx_match(pol->security, policy->security) && !WARN_ON(delpol)) { if (excl) return ERR_PTR(-EEXIST); delpol = pol; if (policy->priority > pol->priority) continue; } else if (policy->priority >= pol->priority) { newpos = pol; continue; } if (delpol) break; } if (newpos && policy->xdo.type != XFRM_DEV_OFFLOAD_PACKET) hlist_add_behind_rcu(&policy->bydst, &newpos->bydst); else /* Packet offload policies enter to the head * to speed-up lookups. */ hlist_add_head_rcu(&policy->bydst, chain); return delpol; } int xfrm_policy_insert(int dir, struct xfrm_policy *policy, int excl) { struct net *net = xp_net(policy); struct xfrm_policy *delpol; struct hlist_head *chain; /* Sanitize mark before store */ policy->mark.v &= policy->mark.m; spin_lock_bh(&net->xfrm.xfrm_policy_lock); chain = policy_hash_bysel(net, &policy->selector, policy->family, dir); if (chain) delpol = xfrm_policy_insert_list(chain, policy, excl); else delpol = xfrm_policy_inexact_insert(policy, dir, excl); if (IS_ERR(delpol)) { spin_unlock_bh(&net->xfrm.xfrm_policy_lock); return PTR_ERR(delpol); } __xfrm_policy_link(policy, dir); /* After previous checking, family can either be AF_INET or AF_INET6 */ if (policy->family == AF_INET) rt_genid_bump_ipv4(net); else rt_genid_bump_ipv6(net); if (delpol) { xfrm_policy_requeue(delpol, policy); __xfrm_policy_unlink(delpol, dir); } policy->index = delpol ? delpol->index : xfrm_gen_index(net, dir, policy->index); hlist_add_head(&policy->byidx, net->xfrm.policy_byidx+idx_hash(net, policy->index)); policy->curlft.add_time = ktime_get_real_seconds(); policy->curlft.use_time = 0; if (!mod_timer(&policy->timer, jiffies + HZ)) xfrm_pol_hold(policy); spin_unlock_bh(&net->xfrm.xfrm_policy_lock); if (delpol) xfrm_policy_kill(delpol); else if (xfrm_bydst_should_resize(net, dir, NULL)) schedule_work(&net->xfrm.policy_hash_work); return 0; } EXPORT_SYMBOL(xfrm_policy_insert); static struct xfrm_policy * __xfrm_policy_bysel_ctx(struct hlist_head *chain, const struct xfrm_mark *mark, u32 if_id, u8 type, int dir, struct xfrm_selector *sel, struct xfrm_sec_ctx *ctx) { struct xfrm_policy *pol; if (!chain) return NULL; hlist_for_each_entry(pol, chain, bydst) { if (pol->type == type && pol->if_id == if_id && xfrm_policy_mark_match(mark, pol) && !selector_cmp(sel, &pol->selector) && xfrm_sec_ctx_match(ctx, pol->security)) return pol; } return NULL; } struct xfrm_policy * xfrm_policy_bysel_ctx(struct net *net, const struct xfrm_mark *mark, u32 if_id, u8 type, int dir, struct xfrm_selector *sel, struct xfrm_sec_ctx *ctx, int delete, int *err) { struct xfrm_pol_inexact_bin *bin = NULL; struct xfrm_policy *pol, *ret = NULL; struct hlist_head *chain; *err = 0; spin_lock_bh(&net->xfrm.xfrm_policy_lock); chain = policy_hash_bysel(net, sel, sel->family, dir); if (!chain) { struct xfrm_pol_inexact_candidates cand; int i; bin = xfrm_policy_inexact_lookup(net, type, sel->family, dir, if_id); if (!bin) { spin_unlock_bh(&net->xfrm.xfrm_policy_lock); return NULL; } if (!xfrm_policy_find_inexact_candidates(&cand, bin, &sel->saddr, &sel->daddr)) { spin_unlock_bh(&net->xfrm.xfrm_policy_lock); return NULL; } pol = NULL; for (i = 0; i < ARRAY_SIZE(cand.res); i++) { struct xfrm_policy *tmp; tmp = __xfrm_policy_bysel_ctx(cand.res[i], mark, if_id, type, dir, sel, ctx); if (!tmp) continue; if (!pol || tmp->pos < pol->pos) pol = tmp; } } else { pol = __xfrm_policy_bysel_ctx(chain, mark, if_id, type, dir, sel, ctx); } if (pol) { xfrm_pol_hold(pol); if (delete) { *err = security_xfrm_policy_delete(pol->security); if (*err) { spin_unlock_bh(&net->xfrm.xfrm_policy_lock); return pol; } __xfrm_policy_unlink(pol, dir); } ret = pol; } spin_unlock_bh(&net->xfrm.xfrm_policy_lock); if (ret && delete) xfrm_policy_kill(ret); if (bin && delete) xfrm_policy_inexact_prune_bin(bin); return ret; } EXPORT_SYMBOL(xfrm_policy_bysel_ctx); struct xfrm_policy * xfrm_policy_byid(struct net *net, const struct xfrm_mark *mark, u32 if_id, u8 type, int dir, u32 id, int delete, int *err) { struct xfrm_policy *pol, *ret; struct hlist_head *chain; *err = -ENOENT; if (xfrm_policy_id2dir(id) != dir) return NULL; *err = 0; spin_lock_bh(&net->xfrm.xfrm_policy_lock); chain = net->xfrm.policy_byidx + idx_hash(net, id); ret = NULL; hlist_for_each_entry(pol, chain, byidx) { if (pol->type == type && pol->index == id && pol->if_id == if_id && xfrm_policy_mark_match(mark, pol)) { xfrm_pol_hold(pol); if (delete) { *err = security_xfrm_policy_delete( pol->security); if (*err) { spin_unlock_bh(&net->xfrm.xfrm_policy_lock); return pol; } __xfrm_policy_unlink(pol, dir); } ret = pol; break; } } spin_unlock_bh(&net->xfrm.xfrm_policy_lock); if (ret && delete) xfrm_policy_kill(ret); return ret; } EXPORT_SYMBOL(xfrm_policy_byid); #ifdef CONFIG_SECURITY_NETWORK_XFRM static inline int xfrm_policy_flush_secctx_check(struct net *net, u8 type, bool task_valid) { struct xfrm_policy *pol; int err = 0; list_for_each_entry(pol, &net->xfrm.policy_all, walk.all) { if (pol->walk.dead || xfrm_policy_id2dir(pol->index) >= XFRM_POLICY_MAX || pol->type != type) continue; err = security_xfrm_policy_delete(pol->security); if (err) { xfrm_audit_policy_delete(pol, 0, task_valid); return err; } } return err; } static inline int xfrm_dev_policy_flush_secctx_check(struct net *net, struct net_device *dev, bool task_valid) { struct xfrm_policy *pol; int err = 0; list_for_each_entry(pol, &net->xfrm.policy_all, walk.all) { if (pol->walk.dead || xfrm_policy_id2dir(pol->index) >= XFRM_POLICY_MAX || pol->xdo.dev != dev) continue; err = security_xfrm_policy_delete(pol->security); if (err) { xfrm_audit_policy_delete(pol, 0, task_valid); return err; } } return err; } #else static inline int xfrm_policy_flush_secctx_check(struct net *net, u8 type, bool task_valid) { return 0; } static inline int xfrm_dev_policy_flush_secctx_check(struct net *net, struct net_device *dev, bool task_valid) { return 0; } #endif int xfrm_policy_flush(struct net *net, u8 type, bool task_valid) { int dir, err = 0, cnt = 0; struct xfrm_policy *pol; spin_lock_bh(&net->xfrm.xfrm_policy_lock); err = xfrm_policy_flush_secctx_check(net, type, task_valid); if (err) goto out; again: list_for_each_entry(pol, &net->xfrm.policy_all, walk.all) { if (pol->walk.dead) continue; dir = xfrm_policy_id2dir(pol->index); if (dir >= XFRM_POLICY_MAX || pol->type != type) continue; __xfrm_policy_unlink(pol, dir); spin_unlock_bh(&net->xfrm.xfrm_policy_lock); cnt++; xfrm_audit_policy_delete(pol, 1, task_valid); xfrm_policy_kill(pol); spin_lock_bh(&net->xfrm.xfrm_policy_lock); goto again; } if (cnt) __xfrm_policy_inexact_flush(net); else err = -ESRCH; out: spin_unlock_bh(&net->xfrm.xfrm_policy_lock); return err; } EXPORT_SYMBOL(xfrm_policy_flush); int xfrm_dev_policy_flush(struct net *net, struct net_device *dev, bool task_valid) { int dir, err = 0, cnt = 0; struct xfrm_policy *pol; spin_lock_bh(&net->xfrm.xfrm_policy_lock); err = xfrm_dev_policy_flush_secctx_check(net, dev, task_valid); if (err) goto out; again: list_for_each_entry(pol, &net->xfrm.policy_all, walk.all) { if (pol->walk.dead) continue; dir = xfrm_policy_id2dir(pol->index); if (dir >= XFRM_POLICY_MAX || pol->xdo.dev != dev) continue; __xfrm_policy_unlink(pol, dir); spin_unlock_bh(&net->xfrm.xfrm_policy_lock); cnt++; xfrm_audit_policy_delete(pol, 1, task_valid); xfrm_policy_kill(pol); spin_lock_bh(&net->xfrm.xfrm_policy_lock); goto again; } if (cnt) __xfrm_policy_inexact_flush(net); else err = -ESRCH; out: spin_unlock_bh(&net->xfrm.xfrm_policy_lock); return err; } EXPORT_SYMBOL(xfrm_dev_policy_flush); int xfrm_policy_walk(struct net *net, struct xfrm_policy_walk *walk, int (*func)(struct xfrm_policy *, int, int, void*), void *data) { struct xfrm_policy *pol; struct xfrm_policy_walk_entry *x; int error = 0; if (walk->type >= XFRM_POLICY_TYPE_MAX && walk->type != XFRM_POLICY_TYPE_ANY) return -EINVAL; if (list_empty(&walk->walk.all) && walk->seq != 0) return 0; spin_lock_bh(&net->xfrm.xfrm_policy_lock); if (list_empty(&walk->walk.all)) x = list_first_entry(&net->xfrm.policy_all, struct xfrm_policy_walk_entry, all); else x = list_first_entry(&walk->walk.all, struct xfrm_policy_walk_entry, all); list_for_each_entry_from(x, &net->xfrm.policy_all, all) { if (x->dead) continue; pol = container_of(x, struct xfrm_policy, walk); if (walk->type != XFRM_POLICY_TYPE_ANY && walk->type != pol->type) continue; error = func(pol, xfrm_policy_id2dir(pol->index), walk->seq, data); if (error) { list_move_tail(&walk->walk.all, &x->all); goto out; } walk->seq++; } if (walk->seq == 0) { error = -ENOENT; goto out; } list_del_init(&walk->walk.all); out: spin_unlock_bh(&net->xfrm.xfrm_policy_lock); return error; } EXPORT_SYMBOL(xfrm_policy_walk); void xfrm_policy_walk_init(struct xfrm_policy_walk *walk, u8 type) { INIT_LIST_HEAD(&walk->walk.all); walk->walk.dead = 1; walk->type = type; walk->seq = 0; } EXPORT_SYMBOL(xfrm_policy_walk_init); void xfrm_policy_walk_done(struct xfrm_policy_walk *walk, struct net *net) { if (list_empty(&walk->walk.all)) return; spin_lock_bh(&net->xfrm.xfrm_policy_lock); /*FIXME where is net? */ list_del(&walk->walk.all); spin_unlock_bh(&net->xfrm.xfrm_policy_lock); } EXPORT_SYMBOL(xfrm_policy_walk_done); /* * Find policy to apply to this flow. * * Returns 0 if policy found, else an -errno. */ static int xfrm_policy_match(const struct xfrm_policy *pol, const struct flowi *fl, u8 type, u16 family, u32 if_id) { const struct xfrm_selector *sel = &pol->selector; int ret = -ESRCH; bool match; if (pol->family != family || pol->if_id != if_id || (fl->flowi_mark & pol->mark.m) != pol->mark.v || pol->type != type) return ret; match = xfrm_selector_match(sel, fl, family); if (match) ret = security_xfrm_policy_lookup(pol->security, fl->flowi_secid); return ret; } static struct xfrm_pol_inexact_node * xfrm_policy_lookup_inexact_addr(const struct rb_root *r, seqcount_spinlock_t *count, const xfrm_address_t *addr, u16 family) { const struct rb_node *parent; int seq; again: seq = read_seqcount_begin(count); parent = rcu_dereference_raw(r->rb_node); while (parent) { struct xfrm_pol_inexact_node *node; int delta; node = rb_entry(parent, struct xfrm_pol_inexact_node, node); delta = xfrm_policy_addr_delta(addr, &node->addr, node->prefixlen, family); if (delta < 0) { parent = rcu_dereference_raw(parent->rb_left); continue; } else if (delta > 0) { parent = rcu_dereference_raw(parent->rb_right); continue; } return node; } if (read_seqcount_retry(count, seq)) goto again; return NULL; } static bool xfrm_policy_find_inexact_candidates(struct xfrm_pol_inexact_candidates *cand, struct xfrm_pol_inexact_bin *b, const xfrm_address_t *saddr, const xfrm_address_t *daddr) { struct xfrm_pol_inexact_node *n; u16 family; if (!b) return false; family = b->k.family; memset(cand, 0, sizeof(*cand)); cand->res[XFRM_POL_CAND_ANY] = &b->hhead; n = xfrm_policy_lookup_inexact_addr(&b->root_d, &b->count, daddr, family); if (n) { cand->res[XFRM_POL_CAND_DADDR] = &n->hhead; n = xfrm_policy_lookup_inexact_addr(&n->root, &b->count, saddr, family); if (n) cand->res[XFRM_POL_CAND_BOTH] = &n->hhead; } n = xfrm_policy_lookup_inexact_addr(&b->root_s, &b->count, saddr, family); if (n) cand->res[XFRM_POL_CAND_SADDR] = &n->hhead; return true; } static struct xfrm_pol_inexact_bin * xfrm_policy_inexact_lookup_rcu(struct net *net, u8 type, u16 family, u8 dir, u32 if_id) { struct xfrm_pol_inexact_key k = { .family = family, .type = type, .dir = dir, .if_id = if_id, }; write_pnet(&k.net, net); return rhashtable_lookup(&xfrm_policy_inexact_table, &k, xfrm_pol_inexact_params); } static struct xfrm_pol_inexact_bin * xfrm_policy_inexact_lookup(struct net *net, u8 type, u16 family, u8 dir, u32 if_id) { struct xfrm_pol_inexact_bin *bin; lockdep_assert_held(&net->xfrm.xfrm_policy_lock); rcu_read_lock(); bin = xfrm_policy_inexact_lookup_rcu(net, type, family, dir, if_id); rcu_read_unlock(); return bin; } static struct xfrm_policy * __xfrm_policy_eval_candidates(struct hlist_head *chain, struct xfrm_policy *prefer, const struct flowi *fl, u8 type, u16 family, u32 if_id) { u32 priority = prefer ? prefer->priority : ~0u; struct xfrm_policy *pol; if (!chain) return NULL; hlist_for_each_entry_rcu(pol, chain, bydst) { int err; if (pol->priority > priority) break; err = xfrm_policy_match(pol, fl, type, family, if_id); if (err) { if (err != -ESRCH) return ERR_PTR(err); continue; } if (prefer) { /* matches. Is it older than *prefer? */ if (pol->priority == priority && prefer->pos < pol->pos) return prefer; } return pol; } return NULL; } static struct xfrm_policy * xfrm_policy_eval_candidates(struct xfrm_pol_inexact_candidates *cand, struct xfrm_policy *prefer, const struct flowi *fl, u8 type, u16 family, u32 if_id) { struct xfrm_policy *tmp; int i; for (i = 0; i < ARRAY_SIZE(cand->res); i++) { tmp = __xfrm_policy_eval_candidates(cand->res[i], prefer, fl, type, family, if_id); if (!tmp) continue; if (IS_ERR(tmp)) return tmp; prefer = tmp; } return prefer; } static struct xfrm_policy *xfrm_policy_lookup_bytype(struct net *net, u8 type, const struct flowi *fl, u16 family, u8 dir, u32 if_id) { struct xfrm_pol_inexact_candidates cand; const xfrm_address_t *daddr, *saddr; struct xfrm_pol_inexact_bin *bin; struct xfrm_policy *pol, *ret; struct hlist_head *chain; unsigned int sequence; int err; daddr = xfrm_flowi_daddr(fl, family); saddr = xfrm_flowi_saddr(fl, family); if (unlikely(!daddr || !saddr)) return NULL; rcu_read_lock(); retry: do { sequence = read_seqcount_begin(&net->xfrm.xfrm_policy_hash_generation); chain = policy_hash_direct(net, daddr, saddr, family, dir); } while (read_seqcount_retry(&net->xfrm.xfrm_policy_hash_generation, sequence)); ret = NULL; hlist_for_each_entry_rcu(pol, chain, bydst) { err = xfrm_policy_match(pol, fl, type, family, if_id); if (err) { if (err == -ESRCH) continue; else { ret = ERR_PTR(err); goto fail; } } else { ret = pol; break; } } if (ret && ret->xdo.type == XFRM_DEV_OFFLOAD_PACKET) goto skip_inexact; bin = xfrm_policy_inexact_lookup_rcu(net, type, family, dir, if_id); if (!bin || !xfrm_policy_find_inexact_candidates(&cand, bin, saddr, daddr)) goto skip_inexact; pol = xfrm_policy_eval_candidates(&cand, ret, fl, type, family, if_id); if (pol) { ret = pol; if (IS_ERR(pol)) goto fail; } skip_inexact: if (read_seqcount_retry(&net->xfrm.xfrm_policy_hash_generation, sequence)) goto retry; if (ret && !xfrm_pol_hold_rcu(ret)) goto retry; fail: rcu_read_unlock(); return ret; } static struct xfrm_policy *xfrm_policy_lookup(struct net *net, const struct flowi *fl, u16 family, u8 dir, u32 if_id) { #ifdef CONFIG_XFRM_SUB_POLICY struct xfrm_policy *pol; pol = xfrm_policy_lookup_bytype(net, XFRM_POLICY_TYPE_SUB, fl, family, dir, if_id); if (pol != NULL) return pol; #endif return xfrm_policy_lookup_bytype(net, XFRM_POLICY_TYPE_MAIN, fl, family, dir, if_id); } static struct xfrm_policy *xfrm_sk_policy_lookup(const struct sock *sk, int dir, const struct flowi *fl, u16 family, u32 if_id) { struct xfrm_policy *pol; rcu_read_lock(); again: pol = rcu_dereference(sk->sk_policy[dir]); if (pol != NULL) { bool match; int err = 0; if (pol->family != family) { pol = NULL; goto out; } match = xfrm_selector_match(&pol->selector, fl, family); if (match) { if ((READ_ONCE(sk->sk_mark) & pol->mark.m) != pol->mark.v || pol->if_id != if_id) { pol = NULL; goto out; } err = security_xfrm_policy_lookup(pol->security, fl->flowi_secid); if (!err) { if (!xfrm_pol_hold_rcu(pol)) goto again; } else if (err == -ESRCH) { pol = NULL; } else { pol = ERR_PTR(err); } } else pol = NULL; } out: rcu_read_unlock(); return pol; } static u32 xfrm_gen_pos_slow(struct net *net) { struct xfrm_policy *policy; u32 i = 0; /* oldest entry is last in list */ list_for_each_entry_reverse(policy, &net->xfrm.policy_all, walk.all) { if (!xfrm_policy_is_dead_or_sk(policy)) policy->pos = ++i; } return i; } static u32 xfrm_gen_pos(struct net *net) { const struct xfrm_policy *policy; u32 i = 0; /* most recently added policy is at the head of the list */ list_for_each_entry(policy, &net->xfrm.policy_all, walk.all) { if (xfrm_policy_is_dead_or_sk(policy)) continue; if (policy->pos == UINT_MAX) return xfrm_gen_pos_slow(net); i = policy->pos + 1; break; } return i; } static void __xfrm_policy_link(struct xfrm_policy *pol, int dir) { struct net *net = xp_net(pol); switch (dir) { case XFRM_POLICY_IN: case XFRM_POLICY_FWD: case XFRM_POLICY_OUT: pol->pos = xfrm_gen_pos(net); break; } list_add(&pol->walk.all, &net->xfrm.policy_all); net->xfrm.policy_count[dir]++; xfrm_pol_hold(pol); } static struct xfrm_policy *__xfrm_policy_unlink(struct xfrm_policy *pol, int dir) { struct net *net = xp_net(pol); if (list_empty(&pol->walk.all)) return NULL; /* Socket policies are not hashed. */ if (!hlist_unhashed(&pol->bydst)) { hlist_del_rcu(&pol->bydst); hlist_del(&pol->byidx); } list_del_init(&pol->walk.all); net->xfrm.policy_count[dir]--; return pol; } static void xfrm_sk_policy_link(struct xfrm_policy *pol, int dir) { __xfrm_policy_link(pol, XFRM_POLICY_MAX + dir); } static void xfrm_sk_policy_unlink(struct xfrm_policy *pol, int dir) { __xfrm_policy_unlink(pol, XFRM_POLICY_MAX + dir); } int xfrm_policy_delete(struct xfrm_policy *pol, int dir) { struct net *net = xp_net(pol); spin_lock_bh(&net->xfrm.xfrm_policy_lock); pol = __xfrm_policy_unlink(pol, dir); spin_unlock_bh(&net->xfrm.xfrm_policy_lock); if (pol) { xfrm_policy_kill(pol); return 0; } return -ENOENT; } EXPORT_SYMBOL(xfrm_policy_delete); int xfrm_sk_policy_insert(struct sock *sk, int dir, struct xfrm_policy *pol) { struct net *net = sock_net(sk); struct xfrm_policy *old_pol; #ifdef CONFIG_XFRM_SUB_POLICY if (pol && pol->type != XFRM_POLICY_TYPE_MAIN) return -EINVAL; #endif spin_lock_bh(&net->xfrm.xfrm_policy_lock); old_pol = rcu_dereference_protected(sk->sk_policy[dir], lockdep_is_held(&net->xfrm.xfrm_policy_lock)); if (pol) { pol->curlft.add_time = ktime_get_real_seconds(); pol->index = xfrm_gen_index(net, XFRM_POLICY_MAX+dir, 0); xfrm_sk_policy_link(pol, dir); } rcu_assign_pointer(sk->sk_policy[dir], pol); if (old_pol) { if (pol) xfrm_policy_requeue(old_pol, pol); /* Unlinking succeeds always. This is the only function * allowed to delete or replace socket policy. */ xfrm_sk_policy_unlink(old_pol, dir); } spin_unlock_bh(&net->xfrm.xfrm_policy_lock); if (old_pol) { xfrm_policy_kill(old_pol); } return 0; } static struct xfrm_policy *clone_policy(const struct xfrm_policy *old, int dir) { struct xfrm_policy *newp = xfrm_policy_alloc(xp_net(old), GFP_ATOMIC); struct net *net = xp_net(old); if (newp) { newp->selector = old->selector; if (security_xfrm_policy_clone(old->security, &newp->security)) { kfree(newp); return NULL; /* ENOMEM */ } newp->lft = old->lft; newp->curlft = old->curlft; newp->mark = old->mark; newp->if_id = old->if_id; newp->action = old->action; newp->flags = old->flags; newp->xfrm_nr = old->xfrm_nr; newp->index = old->index; newp->type = old->type; newp->family = old->family; memcpy(newp->xfrm_vec, old->xfrm_vec, newp->xfrm_nr*sizeof(struct xfrm_tmpl)); spin_lock_bh(&net->xfrm.xfrm_policy_lock); xfrm_sk_policy_link(newp, dir); spin_unlock_bh(&net->xfrm.xfrm_policy_lock); xfrm_pol_put(newp); } return newp; } int __xfrm_sk_clone_policy(struct sock *sk, const struct sock *osk) { const struct xfrm_policy *p; struct xfrm_policy *np; int i, ret = 0; rcu_read_lock(); for (i = 0; i < 2; i++) { p = rcu_dereference(osk->sk_policy[i]); if (p) { np = clone_policy(p, i); if (unlikely(!np)) { ret = -ENOMEM; break; } rcu_assign_pointer(sk->sk_policy[i], np); } } rcu_read_unlock(); return ret; } static int xfrm_get_saddr(unsigned short family, xfrm_address_t *saddr, const struct xfrm_dst_lookup_params *params) { int err; const struct xfrm_policy_afinfo *afinfo = xfrm_policy_get_afinfo(family); if (unlikely(afinfo == NULL)) return -EINVAL; err = afinfo->get_saddr(saddr, params); rcu_read_unlock(); return err; } /* Resolve list of templates for the flow, given policy. */ static int xfrm_tmpl_resolve_one(struct xfrm_policy *policy, const struct flowi *fl, struct xfrm_state **xfrm, unsigned short family) { struct net *net = xp_net(policy); int nx; int i, error; xfrm_address_t *daddr = xfrm_flowi_daddr(fl, family); xfrm_address_t *saddr = xfrm_flowi_saddr(fl, family); xfrm_address_t tmp; for (nx = 0, i = 0; i < policy->xfrm_nr; i++) { struct xfrm_state *x; xfrm_address_t *remote = daddr; xfrm_address_t *local = saddr; struct xfrm_tmpl *tmpl = &policy->xfrm_vec[i]; if (tmpl->mode == XFRM_MODE_TUNNEL || tmpl->mode == XFRM_MODE_IPTFS || tmpl->mode == XFRM_MODE_BEET) { remote = &tmpl->id.daddr; local = &tmpl->saddr; if (xfrm_addr_any(local, tmpl->encap_family)) { struct xfrm_dst_lookup_params params; memset(¶ms, 0, sizeof(params)); params.net = net; params.oif = fl->flowi_oif; params.daddr = remote; error = xfrm_get_saddr(tmpl->encap_family, &tmp, ¶ms); if (error) goto fail; local = &tmp; } } x = xfrm_state_find(remote, local, fl, tmpl, policy, &error, family, policy->if_id); if (x && x->dir && x->dir != XFRM_SA_DIR_OUT) { XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTSTATEDIRERROR); xfrm_state_put(x); error = -EINVAL; goto fail; } if (x && x->km.state == XFRM_STATE_VALID) { xfrm[nx++] = x; daddr = remote; saddr = local; continue; } if (x) { error = (x->km.state == XFRM_STATE_ERROR ? -EINVAL : -EAGAIN); xfrm_state_put(x); } else if (error == -ESRCH) { error = -EAGAIN; } if (!tmpl->optional) goto fail; } return nx; fail: for (nx--; nx >= 0; nx--) xfrm_state_put(xfrm[nx]); return error; } static int xfrm_tmpl_resolve(struct xfrm_policy **pols, int npols, const struct flowi *fl, struct xfrm_state **xfrm, unsigned short family) { struct xfrm_state *tp[XFRM_MAX_DEPTH]; struct xfrm_state **tpp = (npols > 1) ? tp : xfrm; int cnx = 0; int error; int ret; int i; for (i = 0; i < npols; i++) { if (cnx + pols[i]->xfrm_nr >= XFRM_MAX_DEPTH) { error = -ENOBUFS; goto fail; } ret = xfrm_tmpl_resolve_one(pols[i], fl, &tpp[cnx], family); if (ret < 0) { error = ret; goto fail; } else cnx += ret; } /* found states are sorted for outbound processing */ if (npols > 1) xfrm_state_sort(xfrm, tpp, cnx, family); return cnx; fail: for (cnx--; cnx >= 0; cnx--) xfrm_state_put(tpp[cnx]); return error; } static dscp_t xfrm_get_dscp(const struct flowi *fl, int family) { if (family == AF_INET) return fl->u.ip4.flowi4_dscp; return 0; } static inline struct xfrm_dst *xfrm_alloc_dst(struct net *net, int family) { const struct xfrm_policy_afinfo *afinfo = xfrm_policy_get_afinfo(family); struct dst_ops *dst_ops; struct xfrm_dst *xdst; if (!afinfo) return ERR_PTR(-EINVAL); switch (family) { case AF_INET: dst_ops = &net->xfrm.xfrm4_dst_ops; break; #if IS_ENABLED(CONFIG_IPV6) case AF_INET6: dst_ops = &net->xfrm.xfrm6_dst_ops; break; #endif default: BUG(); } xdst = dst_alloc(dst_ops, NULL, DST_OBSOLETE_NONE, 0); if (likely(xdst)) { memset_after(xdst, 0, u.dst); } else xdst = ERR_PTR(-ENOBUFS); rcu_read_unlock(); return xdst; } static void xfrm_init_path(struct xfrm_dst *path, struct dst_entry *dst, int nfheader_len) { if (dst->ops->family == AF_INET6) { path->path_cookie = rt6_get_cookie(dst_rt6_info(dst)); path->u.rt6.rt6i_nfheader_len = nfheader_len; } } static inline int xfrm_fill_dst(struct xfrm_dst *xdst, struct net_device *dev, const struct flowi *fl) { const struct xfrm_policy_afinfo *afinfo = xfrm_policy_get_afinfo(xdst->u.dst.ops->family); int err; if (!afinfo) return -EINVAL; err = afinfo->fill_dst(xdst, dev, fl); rcu_read_unlock(); return err; } /* Allocate chain of dst_entry's, attach known xfrm's, calculate * all the metrics... Shortly, bundle a bundle. */ static struct dst_entry *xfrm_bundle_create(struct xfrm_policy *policy, struct xfrm_state **xfrm, struct xfrm_dst **bundle, int nx, const struct flowi *fl, struct dst_entry *dst) { const struct xfrm_state_afinfo *afinfo; const struct xfrm_mode *inner_mode; struct net *net = xp_net(policy); unsigned long now = jiffies; struct net_device *dev; struct xfrm_dst *xdst_prev = NULL; struct xfrm_dst *xdst0 = NULL; int i = 0; int err; int header_len = 0; int nfheader_len = 0; int trailer_len = 0; int family = policy->selector.family; xfrm_address_t saddr, daddr; dscp_t dscp; xfrm_flowi_addr_get(fl, &saddr, &daddr, family); dscp = xfrm_get_dscp(fl, family); dst_hold(dst); for (; i < nx; i++) { struct xfrm_dst *xdst = xfrm_alloc_dst(net, family); struct dst_entry *dst1 = &xdst->u.dst; err = PTR_ERR(xdst); if (IS_ERR(xdst)) { dst_release(dst); goto put_states; } bundle[i] = xdst; if (!xdst_prev) xdst0 = xdst; else /* Ref count is taken during xfrm_alloc_dst() * No need to do dst_clone() on dst1 */ xfrm_dst_set_child(xdst_prev, &xdst->u.dst); if (xfrm[i]->sel.family == AF_UNSPEC) { inner_mode = xfrm_ip2inner_mode(xfrm[i], xfrm_af2proto(family)); if (!inner_mode) { err = -EAFNOSUPPORT; dst_release(dst); goto put_states; } } else inner_mode = &xfrm[i]->inner_mode; xdst->route = dst; dst_copy_metrics(dst1, dst); if (xfrm[i]->props.mode != XFRM_MODE_TRANSPORT) { __u32 mark = 0; int oif; if (xfrm[i]->props.smark.v || xfrm[i]->props.smark.m) mark = xfrm_smark_get(fl->flowi_mark, xfrm[i]); if (xfrm[i]->xso.type != XFRM_DEV_OFFLOAD_PACKET) family = xfrm[i]->props.family; oif = fl->flowi_oif ? : fl->flowi_l3mdev; dst = xfrm_dst_lookup(xfrm[i], dscp, oif, &saddr, &daddr, family, mark); err = PTR_ERR(dst); if (IS_ERR(dst)) goto put_states; } else dst_hold(dst); dst1->xfrm = xfrm[i]; xdst->xfrm_genid = xfrm[i]->genid; dst1->obsolete = DST_OBSOLETE_FORCE_CHK; dst1->lastuse = now; dst1->input = dst_discard; if (xfrm[i]->mode_cbs && xfrm[i]->mode_cbs->output) { dst1->output = xfrm[i]->mode_cbs->output; } else { rcu_read_lock(); afinfo = xfrm_state_afinfo_get_rcu(inner_mode->family); if (likely(afinfo)) dst1->output = afinfo->output; else dst1->output = dst_discard_out; rcu_read_unlock(); } xdst_prev = xdst; header_len += xfrm[i]->props.header_len; if (xfrm[i]->type->flags & XFRM_TYPE_NON_FRAGMENT) nfheader_len += xfrm[i]->props.header_len; trailer_len += xfrm[i]->props.trailer_len; } xfrm_dst_set_child(xdst_prev, dst); xdst0->path = dst; err = -ENODEV; dev = dst->dev; if (!dev) goto free_dst; xfrm_init_path(xdst0, dst, nfheader_len); xfrm_init_pmtu(bundle, nx); for (xdst_prev = xdst0; xdst_prev != (struct xfrm_dst *)dst; xdst_prev = (struct xfrm_dst *) xfrm_dst_child(&xdst_prev->u.dst)) { err = xfrm_fill_dst(xdst_prev, dev, fl); if (err) goto free_dst; xdst_prev->u.dst.header_len = header_len; xdst_prev->u.dst.trailer_len = trailer_len; header_len -= xdst_prev->u.dst.xfrm->props.header_len; trailer_len -= xdst_prev->u.dst.xfrm->props.trailer_len; } return &xdst0->u.dst; put_states: for (; i < nx; i++) xfrm_state_put(xfrm[i]); free_dst: if (xdst0) dst_release_immediate(&xdst0->u.dst); return ERR_PTR(err); } static int xfrm_expand_policies(const struct flowi *fl, u16 family, struct xfrm_policy **pols, int *num_pols, int *num_xfrms) { int i; if (*num_pols == 0 || !pols[0]) { *num_pols = 0; *num_xfrms = 0; return 0; } if (IS_ERR(pols[0])) { *num_pols = 0; return PTR_ERR(pols[0]); } *num_xfrms = pols[0]->xfrm_nr; #ifdef CONFIG_XFRM_SUB_POLICY if (pols[0]->action == XFRM_POLICY_ALLOW && pols[0]->type != XFRM_POLICY_TYPE_MAIN) { pols[1] = xfrm_policy_lookup_bytype(xp_net(pols[0]), XFRM_POLICY_TYPE_MAIN, fl, family, XFRM_POLICY_OUT, pols[0]->if_id); if (pols[1]) { if (IS_ERR(pols[1])) { xfrm_pols_put(pols, *num_pols); *num_pols = 0; return PTR_ERR(pols[1]); } (*num_pols)++; (*num_xfrms) += pols[1]->xfrm_nr; } } #endif for (i = 0; i < *num_pols; i++) { if (pols[i]->action != XFRM_POLICY_ALLOW) { *num_xfrms = -1; break; } } return 0; } static struct xfrm_dst * xfrm_resolve_and_create_bundle(struct xfrm_policy **pols, int num_pols, const struct flowi *fl, u16 family, struct dst_entry *dst_orig) { struct net *net = xp_net(pols[0]); struct xfrm_state *xfrm[XFRM_MAX_DEPTH]; struct xfrm_dst *bundle[XFRM_MAX_DEPTH]; struct xfrm_dst *xdst; struct dst_entry *dst; int err; /* Try to instantiate a bundle */ err = xfrm_tmpl_resolve(pols, num_pols, fl, xfrm, family); if (err <= 0) { if (err == 0) return NULL; if (err != -EAGAIN) XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTPOLERROR); return ERR_PTR(err); } dst = xfrm_bundle_create(pols[0], xfrm, bundle, err, fl, dst_orig); if (IS_ERR(dst)) { XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTBUNDLEGENERROR); return ERR_CAST(dst); } xdst = (struct xfrm_dst *)dst; xdst->num_xfrms = err; xdst->num_pols = num_pols; memcpy(xdst->pols, pols, sizeof(struct xfrm_policy *) * num_pols); xdst->policy_genid = atomic_read(&pols[0]->genid); return xdst; } static void xfrm_policy_queue_process(struct timer_list *t) { struct sk_buff *skb; struct sock *sk; struct dst_entry *dst; struct xfrm_policy *pol = timer_container_of(pol, t, polq.hold_timer); struct net *net = xp_net(pol); struct xfrm_policy_queue *pq = &pol->polq; struct flowi fl; struct sk_buff_head list; __u32 skb_mark; spin_lock(&pq->hold_queue.lock); skb = skb_peek(&pq->hold_queue); if (!skb) { spin_unlock(&pq->hold_queue.lock); goto out; } dst = skb_dst(skb); sk = skb->sk; /* Fixup the mark to support VTI. */ skb_mark = skb->mark; skb->mark = pol->mark.v; xfrm_decode_session(net, skb, &fl, dst->ops->family); skb->mark = skb_mark; spin_unlock(&pq->hold_queue.lock); dst_hold(xfrm_dst_path(dst)); dst = xfrm_lookup(net, xfrm_dst_path(dst), &fl, sk, XFRM_LOOKUP_QUEUE); if (IS_ERR(dst)) goto purge_queue; if (dst->flags & DST_XFRM_QUEUE) { dst_release(dst); if (pq->timeout >= XFRM_QUEUE_TMO_MAX) goto purge_queue; pq->timeout = pq->timeout << 1; if (!mod_timer(&pq->hold_timer, jiffies + pq->timeout)) xfrm_pol_hold(pol); goto out; } dst_release(dst); __skb_queue_head_init(&list); spin_lock(&pq->hold_queue.lock); pq->timeout = 0; skb_queue_splice_init(&pq->hold_queue, &list); spin_unlock(&pq->hold_queue.lock); while (!skb_queue_empty(&list)) { skb = __skb_dequeue(&list); /* Fixup the mark to support VTI. */ skb_mark = skb->mark; skb->mark = pol->mark.v; xfrm_decode_session(net, skb, &fl, skb_dst(skb)->ops->family); skb->mark = skb_mark; dst_hold(xfrm_dst_path(skb_dst(skb))); dst = xfrm_lookup(net, xfrm_dst_path(skb_dst(skb)), &fl, skb->sk, 0); if (IS_ERR(dst)) { kfree_skb(skb); continue; } nf_reset_ct(skb); skb_dst_drop(skb); skb_dst_set(skb, dst); dst_output(net, skb_to_full_sk(skb), skb); } out: xfrm_pol_put(pol); return; purge_queue: pq->timeout = 0; skb_queue_purge(&pq->hold_queue); xfrm_pol_put(pol); } static int xdst_queue_output(struct net *net, struct sock *sk, struct sk_buff *skb) { unsigned long sched_next; struct dst_entry *dst = skb_dst(skb); struct xfrm_dst *xdst = (struct xfrm_dst *) dst; struct xfrm_policy *pol = xdst->pols[0]; struct xfrm_policy_queue *pq = &pol->polq; if (unlikely(skb_fclone_busy(sk, skb))) { kfree_skb(skb); return 0; } if (pq->hold_queue.qlen > XFRM_MAX_QUEUE_LEN) { kfree_skb(skb); return -EAGAIN; } skb_dst_force(skb); spin_lock_bh(&pq->hold_queue.lock); if (!pq->timeout) pq->timeout = XFRM_QUEUE_TMO_MIN; sched_next = jiffies + pq->timeout; if (timer_delete(&pq->hold_timer)) { if (time_before(pq->hold_timer.expires, sched_next)) sched_next = pq->hold_timer.expires; xfrm_pol_put(pol); } __skb_queue_tail(&pq->hold_queue, skb); if (!mod_timer(&pq->hold_timer, sched_next)) xfrm_pol_hold(pol); spin_unlock_bh(&pq->hold_queue.lock); return 0; } static struct xfrm_dst *xfrm_create_dummy_bundle(struct net *net, struct xfrm_flo *xflo, const struct flowi *fl, int num_xfrms, u16 family) { int err; struct net_device *dev; struct dst_entry *dst; struct dst_entry *dst1; struct xfrm_dst *xdst; xdst = xfrm_alloc_dst(net, family); if (IS_ERR(xdst)) return xdst; if (!(xflo->flags & XFRM_LOOKUP_QUEUE) || net->xfrm.sysctl_larval_drop || num_xfrms <= 0) return xdst; dst = xflo->dst_orig; dst1 = &xdst->u.dst; dst_hold(dst); xdst->route = dst; dst_copy_metrics(dst1, dst); dst1->obsolete = DST_OBSOLETE_FORCE_CHK; dst1->flags |= DST_XFRM_QUEUE; dst1->lastuse = jiffies; dst1->input = dst_discard; dst1->output = xdst_queue_output; dst_hold(dst); xfrm_dst_set_child(xdst, dst); xdst->path = dst; xfrm_init_path((struct xfrm_dst *)dst1, dst, 0); err = -ENODEV; dev = dst->dev; if (!dev) goto free_dst; err = xfrm_fill_dst(xdst, dev, fl); if (err) goto free_dst; out: return xdst; free_dst: dst_release(dst1); xdst = ERR_PTR(err); goto out; } static struct xfrm_dst *xfrm_bundle_lookup(struct net *net, const struct flowi *fl, u16 family, u8 dir, struct xfrm_flo *xflo, u32 if_id) { struct xfrm_policy *pols[XFRM_POLICY_TYPE_MAX]; int num_pols = 0, num_xfrms = 0, err; struct xfrm_dst *xdst; /* Resolve policies to use if we couldn't get them from * previous cache entry */ num_pols = 1; pols[0] = xfrm_policy_lookup(net, fl, family, dir, if_id); err = xfrm_expand_policies(fl, family, pols, &num_pols, &num_xfrms); if (err < 0) goto inc_error; if (num_pols == 0) return NULL; if (num_xfrms <= 0) goto make_dummy_bundle; xdst = xfrm_resolve_and_create_bundle(pols, num_pols, fl, family, xflo->dst_orig); if (IS_ERR(xdst)) { err = PTR_ERR(xdst); if (err == -EREMOTE) { xfrm_pols_put(pols, num_pols); return NULL; } if (err != -EAGAIN) goto error; goto make_dummy_bundle; } else if (xdst == NULL) { num_xfrms = 0; goto make_dummy_bundle; } return xdst; make_dummy_bundle: /* We found policies, but there's no bundles to instantiate: * either because the policy blocks, has no transformations or * we could not build template (no xfrm_states).*/ xdst = xfrm_create_dummy_bundle(net, xflo, fl, num_xfrms, family); if (IS_ERR(xdst)) { xfrm_pols_put(pols, num_pols); return ERR_CAST(xdst); } xdst->num_pols = num_pols; xdst->num_xfrms = num_xfrms; memcpy(xdst->pols, pols, sizeof(struct xfrm_policy *) * num_pols); return xdst; inc_error: XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTPOLERROR); error: xfrm_pols_put(pols, num_pols); return ERR_PTR(err); } static struct dst_entry *make_blackhole(struct net *net, u16 family, struct dst_entry *dst_orig) { const struct xfrm_policy_afinfo *afinfo = xfrm_policy_get_afinfo(family); struct dst_entry *ret; if (!afinfo) { dst_release(dst_orig); return ERR_PTR(-EINVAL); } else { ret = afinfo->blackhole_route(net, dst_orig); } rcu_read_unlock(); return ret; } /* Finds/creates a bundle for given flow and if_id * * At the moment we eat a raw IP route. Mostly to speed up lookups * on interfaces with disabled IPsec. * * xfrm_lookup uses an if_id of 0 by default, and is provided for * compatibility */ struct dst_entry *xfrm_lookup_with_ifid(struct net *net, struct dst_entry *dst_orig, const struct flowi *fl, const struct sock *sk, int flags, u32 if_id) { struct xfrm_policy *pols[XFRM_POLICY_TYPE_MAX]; struct xfrm_dst *xdst; struct dst_entry *dst, *route; u16 family = dst_orig->ops->family; u8 dir = XFRM_POLICY_OUT; int i, err, num_pols, num_xfrms = 0, drop_pols = 0; dst = NULL; xdst = NULL; route = NULL; sk = sk_const_to_full_sk(sk); if (sk && sk->sk_policy[XFRM_POLICY_OUT]) { num_pols = 1; pols[0] = xfrm_sk_policy_lookup(sk, XFRM_POLICY_OUT, fl, family, if_id); err = xfrm_expand_policies(fl, family, pols, &num_pols, &num_xfrms); if (err < 0) goto dropdst; if (num_pols) { if (num_xfrms <= 0) { drop_pols = num_pols; goto no_transform; } xdst = xfrm_resolve_and_create_bundle( pols, num_pols, fl, family, dst_orig); if (IS_ERR(xdst)) { xfrm_pols_put(pols, num_pols); err = PTR_ERR(xdst); if (err == -EREMOTE) goto nopol; goto dropdst; } else if (xdst == NULL) { num_xfrms = 0; drop_pols = num_pols; goto no_transform; } route = xdst->route; } } if (xdst == NULL) { struct xfrm_flo xflo; xflo.dst_orig = dst_orig; xflo.flags = flags; /* To accelerate a bit... */ if (!if_id && ((dst_orig->flags & DST_NOXFRM) || !net->xfrm.policy_count[XFRM_POLICY_OUT])) goto nopol; xdst = xfrm_bundle_lookup(net, fl, family, dir, &xflo, if_id); if (xdst == NULL) goto nopol; if (IS_ERR(xdst)) { err = PTR_ERR(xdst); goto dropdst; } num_pols = xdst->num_pols; num_xfrms = xdst->num_xfrms; memcpy(pols, xdst->pols, sizeof(struct xfrm_policy *) * num_pols); route = xdst->route; } dst = &xdst->u.dst; if (route == NULL && num_xfrms > 0) { /* The only case when xfrm_bundle_lookup() returns a * bundle with null route, is when the template could * not be resolved. It means policies are there, but * bundle could not be created, since we don't yet * have the xfrm_state's. We need to wait for KM to * negotiate new SA's or bail out with error.*/ if (net->xfrm.sysctl_larval_drop) { XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTNOSTATES); err = -EREMOTE; goto error; } err = -EAGAIN; XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTNOSTATES); goto error; } no_transform: if (num_pols == 0) goto nopol; if ((flags & XFRM_LOOKUP_ICMP) && !(pols[0]->flags & XFRM_POLICY_ICMP)) { err = -ENOENT; goto error; } for (i = 0; i < num_pols; i++) WRITE_ONCE(pols[i]->curlft.use_time, ktime_get_real_seconds()); if (num_xfrms < 0) { /* Prohibit the flow */ XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTPOLBLOCK); err = -EPERM; goto error; } else if (num_xfrms > 0) { /* Flow transformed */ dst_release(dst_orig); } else { /* Flow passes untransformed */ dst_release(dst); dst = dst_orig; } ok: xfrm_pols_put(pols, drop_pols); if (dst->xfrm && (dst->xfrm->props.mode == XFRM_MODE_TUNNEL || dst->xfrm->props.mode == XFRM_MODE_IPTFS)) dst->flags |= DST_XFRM_TUNNEL; return dst; nopol: if ((!dst_orig->dev || !(dst_orig->dev->flags & IFF_LOOPBACK)) && net->xfrm.policy_default[dir] == XFRM_USERPOLICY_BLOCK) { err = -EPERM; goto error; } if (!(flags & XFRM_LOOKUP_ICMP)) { dst = dst_orig; goto ok; } err = -ENOENT; error: dst_release(dst); dropdst: if (!(flags & XFRM_LOOKUP_KEEP_DST_REF)) dst_release(dst_orig); xfrm_pols_put(pols, drop_pols); return ERR_PTR(err); } EXPORT_SYMBOL(xfrm_lookup_with_ifid); /* Main function: finds/creates a bundle for given flow. * * At the moment we eat a raw IP route. Mostly to speed up lookups * on interfaces with disabled IPsec. */ struct dst_entry *xfrm_lookup(struct net *net, struct dst_entry *dst_orig, const struct flowi *fl, const struct sock *sk, int flags) { return xfrm_lookup_with_ifid(net, dst_orig, fl, sk, flags, 0); } EXPORT_SYMBOL(xfrm_lookup); /* Callers of xfrm_lookup_route() must ensure a call to dst_output(). * Otherwise we may send out blackholed packets. */ struct dst_entry *xfrm_lookup_route(struct net *net, struct dst_entry *dst_orig, const struct flowi *fl, const struct sock *sk, int flags) { struct dst_entry *dst = xfrm_lookup(net, dst_orig, fl, sk, flags | XFRM_LOOKUP_QUEUE | XFRM_LOOKUP_KEEP_DST_REF); if (PTR_ERR(dst) == -EREMOTE) return make_blackhole(net, dst_orig->ops->family, dst_orig); if (IS_ERR(dst)) dst_release(dst_orig); return dst; } EXPORT_SYMBOL(xfrm_lookup_route); static inline int xfrm_secpath_reject(int idx, struct sk_buff *skb, const struct flowi *fl) { struct sec_path *sp = skb_sec_path(skb); struct xfrm_state *x; if (!sp || idx < 0 || idx >= sp->len) return 0; x = sp->xvec[idx]; if (!x->type->reject) return 0; return x->type->reject(x, skb, fl); } /* When skb is transformed back to its "native" form, we have to * check policy restrictions. At the moment we make this in maximally * stupid way. Shame on me. :-) Of course, connected sockets must * have policy cached at them. */ static inline int xfrm_state_ok(const struct xfrm_tmpl *tmpl, const struct xfrm_state *x, unsigned short family, u32 if_id) { if (xfrm_state_kern(x)) return tmpl->optional && !xfrm_state_addr_cmp(tmpl, x, tmpl->encap_family); return x->id.proto == tmpl->id.proto && (x->id.spi == tmpl->id.spi || !tmpl->id.spi) && (x->props.reqid == tmpl->reqid || !tmpl->reqid) && x->props.mode == tmpl->mode && (tmpl->allalgs || (tmpl->aalgos & (1<<x->props.aalgo)) || !(xfrm_id_proto_match(tmpl->id.proto, IPSEC_PROTO_ANY))) && !(x->props.mode != XFRM_MODE_TRANSPORT && xfrm_state_addr_cmp(tmpl, x, family)) && (if_id == 0 || if_id == x->if_id); } /* * 0 or more than 0 is returned when validation is succeeded (either bypass * because of optional transport mode, or next index of the matched secpath * state with the template. * -1 is returned when no matching template is found. * Otherwise "-2 - errored_index" is returned. */ static inline int xfrm_policy_ok(const struct xfrm_tmpl *tmpl, const struct sec_path *sp, int start, unsigned short family, u32 if_id) { int idx = start; if (tmpl->optional) { if (tmpl->mode == XFRM_MODE_TRANSPORT) return start; } else start = -1; for (; idx < sp->len; idx++) { if (xfrm_state_ok(tmpl, sp->xvec[idx], family, if_id)) return ++idx; if (sp->xvec[idx]->props.mode != XFRM_MODE_TRANSPORT) { if (idx < sp->verified_cnt) { /* Secpath entry previously verified, consider optional and * continue searching */ continue; } if (start == -1) start = -2-idx; break; } } return start; } static void decode_session4(const struct xfrm_flow_keys *flkeys, struct flowi *fl, bool reverse) { struct flowi4 *fl4 = &fl->u.ip4; memset(fl4, 0, sizeof(struct flowi4)); if (reverse) { fl4->saddr = flkeys->addrs.ipv4.dst; fl4->daddr = flkeys->addrs.ipv4.src; fl4->fl4_sport = flkeys->ports.dst; fl4->fl4_dport = flkeys->ports.src; } else { fl4->saddr = flkeys->addrs.ipv4.src; fl4->daddr = flkeys->addrs.ipv4.dst; fl4->fl4_sport = flkeys->ports.src; fl4->fl4_dport = flkeys->ports.dst; } switch (flkeys->basic.ip_proto) { case IPPROTO_GRE: fl4->fl4_gre_key = flkeys->gre.keyid; break; case IPPROTO_ICMP: fl4->fl4_icmp_type = flkeys->icmp.type; fl4->fl4_icmp_code = flkeys->icmp.code; break; } fl4->flowi4_proto = flkeys->basic.ip_proto; fl4->flowi4_dscp = inet_dsfield_to_dscp(flkeys->ip.tos); } #if IS_ENABLED(CONFIG_IPV6) static void decode_session6(const struct xfrm_flow_keys *flkeys, struct flowi *fl, bool reverse) { struct flowi6 *fl6 = &fl->u.ip6; memset(fl6, 0, sizeof(struct flowi6)); if (reverse) { fl6->saddr = flkeys->addrs.ipv6.dst; fl6->daddr = flkeys->addrs.ipv6.src; fl6->fl6_sport = flkeys->ports.dst; fl6->fl6_dport = flkeys->ports.src; } else { fl6->saddr = flkeys->addrs.ipv6.src; fl6->daddr = flkeys->addrs.ipv6.dst; fl6->fl6_sport = flkeys->ports.src; fl6->fl6_dport = flkeys->ports.dst; } switch (flkeys->basic.ip_proto) { case IPPROTO_GRE: fl6->fl6_gre_key = flkeys->gre.keyid; break; case IPPROTO_ICMPV6: fl6->fl6_icmp_type = flkeys->icmp.type; fl6->fl6_icmp_code = flkeys->icmp.code; break; } fl6->flowi6_proto = flkeys->basic.ip_proto; } #endif int __xfrm_decode_session(struct net *net, struct sk_buff *skb, struct flowi *fl, unsigned int family, int reverse) { struct xfrm_flow_keys flkeys; memset(&flkeys, 0, sizeof(flkeys)); __skb_flow_dissect(net, skb, &xfrm_session_dissector, &flkeys, NULL, 0, 0, 0, FLOW_DISSECTOR_F_STOP_AT_ENCAP); switch (family) { case AF_INET: decode_session4(&flkeys, fl, reverse); break; #if IS_ENABLED(CONFIG_IPV6) case AF_INET6: decode_session6(&flkeys, fl, reverse); break; #endif default: return -EAFNOSUPPORT; } fl->flowi_mark = skb->mark; if (reverse) { fl->flowi_oif = skb->skb_iif; } else { int oif = 0; if (skb_dst(skb) && skb_dst(skb)->dev) oif = skb_dst(skb)->dev->ifindex; fl->flowi_oif = oif; } return security_xfrm_decode_session(skb, &fl->flowi_secid); } EXPORT_SYMBOL(__xfrm_decode_session); static inline int secpath_has_nontransport(const struct sec_path *sp, int k, int *idxp) { for (; k < sp->len; k++) { if (sp->xvec[k]->props.mode != XFRM_MODE_TRANSPORT) { *idxp = k; return 1; } } return 0; } static bool icmp_err_packet(const struct flowi *fl, unsigned short family) { const struct flowi4 *fl4 = &fl->u.ip4; if (family == AF_INET && fl4->flowi4_proto == IPPROTO_ICMP && (fl4->fl4_icmp_type == ICMP_DEST_UNREACH || fl4->fl4_icmp_type == ICMP_TIME_EXCEEDED)) return true; #if IS_ENABLED(CONFIG_IPV6) if (family == AF_INET6) { const struct flowi6 *fl6 = &fl->u.ip6; if (fl6->flowi6_proto == IPPROTO_ICMPV6 && (fl6->fl6_icmp_type == ICMPV6_DEST_UNREACH || fl6->fl6_icmp_type == ICMPV6_PKT_TOOBIG || fl6->fl6_icmp_type == ICMPV6_TIME_EXCEED)) return true; } #endif return false; } static bool xfrm_icmp_flow_decode(struct sk_buff *skb, unsigned short family, const struct flowi *fl, struct flowi *fl1) { bool ret = true; struct sk_buff *newskb = skb_clone(skb, GFP_ATOMIC); int hl = family == AF_INET ? (sizeof(struct iphdr) + sizeof(struct icmphdr)) : (sizeof(struct ipv6hdr) + sizeof(struct icmp6hdr)); if (!newskb) return true; if (!pskb_pull(newskb, hl)) goto out; skb_reset_network_header(newskb); if (xfrm_decode_session_reverse(dev_net(skb->dev), newskb, fl1, family) < 0) goto out; fl1->flowi_oif = fl->flowi_oif; fl1->flowi_mark = fl->flowi_mark; fl1->flowi_dscp = fl->flowi_dscp; nf_nat_decode_session(newskb, fl1, family); ret = false; out: consume_skb(newskb); return ret; } static bool xfrm_selector_inner_icmp_match(struct sk_buff *skb, unsigned short family, const struct xfrm_selector *sel, const struct flowi *fl) { bool ret = false; if (icmp_err_packet(fl, family)) { struct flowi fl1; if (xfrm_icmp_flow_decode(skb, family, fl, &fl1)) return ret; ret = xfrm_selector_match(sel, &fl1, family); } return ret; } static inline struct xfrm_policy *xfrm_in_fwd_icmp(struct sk_buff *skb, const struct flowi *fl, unsigned short family, u32 if_id) { struct xfrm_policy *pol = NULL; if (icmp_err_packet(fl, family)) { struct flowi fl1; struct net *net = dev_net(skb->dev); if (xfrm_icmp_flow_decode(skb, family, fl, &fl1)) return pol; pol = xfrm_policy_lookup(net, &fl1, family, XFRM_POLICY_FWD, if_id); if (IS_ERR(pol)) pol = NULL; } return pol; } static inline struct dst_entry *xfrm_out_fwd_icmp(struct sk_buff *skb, struct flowi *fl, unsigned short family, struct dst_entry *dst) { if (icmp_err_packet(fl, family)) { struct net *net = dev_net(skb->dev); struct dst_entry *dst2; struct flowi fl1; if (xfrm_icmp_flow_decode(skb, family, fl, &fl1)) return dst; dst_hold(dst); dst2 = xfrm_lookup(net, dst, &fl1, NULL, (XFRM_LOOKUP_QUEUE | XFRM_LOOKUP_ICMP)); if (IS_ERR(dst2)) return dst; if (dst2->xfrm) { dst_release(dst); dst = dst2; } else { dst_release(dst2); } } return dst; } int __xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb, unsigned short family) { struct net *net = dev_net(skb->dev); struct xfrm_policy *pol; struct xfrm_policy *pols[XFRM_POLICY_TYPE_MAX]; int npols = 0; int xfrm_nr; int pi; int reverse; struct flowi fl; int xerr_idx = -1; const struct xfrm_if_cb *ifcb; struct sec_path *sp; u32 if_id = 0; rcu_read_lock(); ifcb = xfrm_if_get_cb(); if (ifcb) { struct xfrm_if_decode_session_result r; if (ifcb->decode_session(skb, family, &r)) { if_id = r.if_id; net = r.net; } } rcu_read_unlock(); reverse = dir & ~XFRM_POLICY_MASK; dir &= XFRM_POLICY_MASK; if (__xfrm_decode_session(net, skb, &fl, family, reverse) < 0) { XFRM_INC_STATS(net, LINUX_MIB_XFRMINHDRERROR); return 0; } nf_nat_decode_session(skb, &fl, family); /* First, check used SA against their selectors. */ sp = skb_sec_path(skb); if (sp) { int i; for (i = sp->len - 1; i >= 0; i--) { struct xfrm_state *x = sp->xvec[i]; int ret = 0; if (!xfrm_selector_match(&x->sel, &fl, family)) { ret = 1; if (x->props.flags & XFRM_STATE_ICMP && xfrm_selector_inner_icmp_match(skb, family, &x->sel, &fl)) ret = 0; if (ret) { XFRM_INC_STATS(net, LINUX_MIB_XFRMINSTATEMISMATCH); return 0; } } } } pol = NULL; sk = sk_to_full_sk(sk); if (sk && sk->sk_policy[dir]) { pol = xfrm_sk_policy_lookup(sk, dir, &fl, family, if_id); if (IS_ERR(pol)) { XFRM_INC_STATS(net, LINUX_MIB_XFRMINPOLERROR); return 0; } } if (!pol) pol = xfrm_policy_lookup(net, &fl, family, dir, if_id); if (IS_ERR(pol)) { XFRM_INC_STATS(net, LINUX_MIB_XFRMINPOLERROR); return 0; } if (!pol && dir == XFRM_POLICY_FWD) pol = xfrm_in_fwd_icmp(skb, &fl, family, if_id); if (!pol) { const bool is_crypto_offload = sp && (xfrm_input_state(skb)->xso.type == XFRM_DEV_OFFLOAD_CRYPTO); if (net->xfrm.policy_default[dir] == XFRM_USERPOLICY_BLOCK) { XFRM_INC_STATS(net, LINUX_MIB_XFRMINNOPOLS); return 0; } if (sp && secpath_has_nontransport(sp, 0, &xerr_idx) && !is_crypto_offload) { xfrm_secpath_reject(xerr_idx, skb, &fl); XFRM_INC_STATS(net, LINUX_MIB_XFRMINNOPOLS); return 0; } return 1; } /* This lockless write can happen from different cpus. */ WRITE_ONCE(pol->curlft.use_time, ktime_get_real_seconds()); pols[0] = pol; npols++; #ifdef CONFIG_XFRM_SUB_POLICY if (pols[0]->type != XFRM_POLICY_TYPE_MAIN) { pols[1] = xfrm_policy_lookup_bytype(net, XFRM_POLICY_TYPE_MAIN, &fl, family, XFRM_POLICY_IN, if_id); if (pols[1]) { if (IS_ERR(pols[1])) { XFRM_INC_STATS(net, LINUX_MIB_XFRMINPOLERROR); xfrm_pol_put(pols[0]); return 0; } /* This write can happen from different cpus. */ WRITE_ONCE(pols[1]->curlft.use_time, ktime_get_real_seconds()); npols++; } } #endif if (pol->action == XFRM_POLICY_ALLOW) { static struct sec_path dummy; struct xfrm_tmpl *tp[XFRM_MAX_DEPTH]; struct xfrm_tmpl *stp[XFRM_MAX_DEPTH]; struct xfrm_tmpl **tpp = tp; int ti = 0; int i, k; sp = skb_sec_path(skb); if (!sp) sp = &dummy; for (pi = 0; pi < npols; pi++) { if (pols[pi] != pol && pols[pi]->action != XFRM_POLICY_ALLOW) { XFRM_INC_STATS(net, LINUX_MIB_XFRMINPOLBLOCK); goto reject; } if (ti + pols[pi]->xfrm_nr >= XFRM_MAX_DEPTH) { XFRM_INC_STATS(net, LINUX_MIB_XFRMINBUFFERERROR); goto reject_error; } for (i = 0; i < pols[pi]->xfrm_nr; i++) tpp[ti++] = &pols[pi]->xfrm_vec[i]; } xfrm_nr = ti; if (npols > 1) { xfrm_tmpl_sort(stp, tpp, xfrm_nr, family); tpp = stp; } /* For each tunnel xfrm, find the first matching tmpl. * For each tmpl before that, find corresponding xfrm. * Order is _important_. Later we will implement * some barriers, but at the moment barriers * are implied between each two transformations. * Upon success, marks secpath entries as having been * verified to allow them to be skipped in future policy * checks (e.g. nested tunnels). */ for (i = xfrm_nr-1, k = 0; i >= 0; i--) { k = xfrm_policy_ok(tpp[i], sp, k, family, if_id); if (k < 0) { if (k < -1) /* "-2 - errored_index" returned */ xerr_idx = -(2+k); XFRM_INC_STATS(net, LINUX_MIB_XFRMINTMPLMISMATCH); goto reject; } } if (secpath_has_nontransport(sp, k, &xerr_idx)) { XFRM_INC_STATS(net, LINUX_MIB_XFRMINTMPLMISMATCH); goto reject; } xfrm_pols_put(pols, npols); sp->verified_cnt = k; return 1; } XFRM_INC_STATS(net, LINUX_MIB_XFRMINPOLBLOCK); reject: xfrm_secpath_reject(xerr_idx, skb, &fl); reject_error: xfrm_pols_put(pols, npols); return 0; } EXPORT_SYMBOL(__xfrm_policy_check); int __xfrm_route_forward(struct sk_buff *skb, unsigned short family) { struct net *net = dev_net(skb->dev); struct flowi fl; struct dst_entry *dst; int res = 1; if (xfrm_decode_session(net, skb, &fl, family) < 0) { XFRM_INC_STATS(net, LINUX_MIB_XFRMFWDHDRERROR); return 0; } skb_dst_force(skb); dst = skb_dst(skb); if (!dst) { XFRM_INC_STATS(net, LINUX_MIB_XFRMFWDHDRERROR); return 0; } /* ignore return value from skb_dstref_steal, xfrm_lookup takes * care of dropping the refcnt if needed. */ skb_dstref_steal(skb); dst = xfrm_lookup(net, dst, &fl, NULL, XFRM_LOOKUP_QUEUE); if (IS_ERR(dst)) { res = 0; dst = NULL; } if (dst && !dst->xfrm) dst = xfrm_out_fwd_icmp(skb, &fl, family, dst); skb_dst_set(skb, dst); return res; } EXPORT_SYMBOL(__xfrm_route_forward); /* Optimize later using cookies and generation ids. */ static struct dst_entry *xfrm_dst_check(struct dst_entry *dst, u32 cookie) { /* Code (such as __xfrm4_bundle_create()) sets dst->obsolete * to DST_OBSOLETE_FORCE_CHK to force all XFRM destinations to * get validated by dst_ops->check on every use. We do this * because when a normal route referenced by an XFRM dst is * obsoleted we do not go looking around for all parent * referencing XFRM dsts so that we can invalidate them. It * is just too much work. Instead we make the checks here on * every use. For example: * * XFRM dst A --> IPv4 dst X * * X is the "xdst->route" of A (X is also the "dst->path" of A * in this example). If X is marked obsolete, "A" will not * notice. That's what we are validating here via the * stale_bundle() check. * * When a dst is removed from the fib tree, DST_OBSOLETE_DEAD will * be marked on it. * This will force stale_bundle() to fail on any xdst bundle with * this dst linked in it. */ if (READ_ONCE(dst->obsolete) < 0 && !stale_bundle(dst)) return dst; return NULL; } static int stale_bundle(struct dst_entry *dst) { return !xfrm_bundle_ok((struct xfrm_dst *)dst); } void xfrm_dst_ifdown(struct dst_entry *dst, struct net_device *dev) { while ((dst = xfrm_dst_child(dst)) && dst->xfrm && dst->dev == dev) { dst->dev = blackhole_netdev; dev_hold(dst->dev); dev_put(dev); } } EXPORT_SYMBOL(xfrm_dst_ifdown); static void xfrm_link_failure(struct sk_buff *skb) { /* Impossible. Such dst must be popped before reaches point of failure. */ } static void xfrm_negative_advice(struct sock *sk, struct dst_entry *dst) { if (READ_ONCE(dst->obsolete)) sk_dst_reset(sk); } static void xfrm_init_pmtu(struct xfrm_dst **bundle, int nr) { while (nr--) { struct xfrm_dst *xdst = bundle[nr]; u32 pmtu, route_mtu_cached; struct dst_entry *dst; dst = &xdst->u.dst; pmtu = dst_mtu(xfrm_dst_child(dst)); xdst->child_mtu_cached = pmtu; pmtu = xfrm_state_mtu(dst->xfrm, pmtu); route_mtu_cached = dst_mtu(xdst->route); xdst->route_mtu_cached = route_mtu_cached; if (pmtu > route_mtu_cached) pmtu = route_mtu_cached; dst_metric_set(dst, RTAX_MTU, pmtu); } } /* Check that the bundle accepts the flow and its components are * still valid. */ static int xfrm_bundle_ok(struct xfrm_dst *first) { struct xfrm_dst *bundle[XFRM_MAX_DEPTH]; struct dst_entry *dst = &first->u.dst; struct xfrm_dst *xdst; int start_from, nr; u32 mtu; if (!dst_check(xfrm_dst_path(dst), ((struct xfrm_dst *)dst)->path_cookie) || (dst->dev && !netif_running(dst->dev))) return 0; if (dst->flags & DST_XFRM_QUEUE) return 1; start_from = nr = 0; do { struct xfrm_dst *xdst = (struct xfrm_dst *)dst; if (dst->xfrm->km.state != XFRM_STATE_VALID) return 0; if (xdst->xfrm_genid != dst->xfrm->genid) return 0; if (xdst->num_pols > 0 && xdst->policy_genid != atomic_read(&xdst->pols[0]->genid)) return 0; bundle[nr++] = xdst; mtu = dst_mtu(xfrm_dst_child(dst)); if (xdst->child_mtu_cached != mtu) { start_from = nr; xdst->child_mtu_cached = mtu; } if (!dst_check(xdst->route, xdst->route_cookie)) return 0; mtu = dst_mtu(xdst->route); if (xdst->route_mtu_cached != mtu) { start_from = nr; xdst->route_mtu_cached = mtu; } dst = xfrm_dst_child(dst); } while (dst->xfrm); if (likely(!start_from)) return 1; xdst = bundle[start_from - 1]; mtu = xdst->child_mtu_cached; while (start_from--) { dst = &xdst->u.dst; mtu = xfrm_state_mtu(dst->xfrm, mtu); if (mtu > xdst->route_mtu_cached) mtu = xdst->route_mtu_cached; dst_metric_set(dst, RTAX_MTU, mtu); if (!start_from) break; xdst = bundle[start_from - 1]; xdst->child_mtu_cached = mtu; } return 1; } static unsigned int xfrm_default_advmss(const struct dst_entry *dst) { return dst_metric_advmss(xfrm_dst_path(dst)); } static unsigned int xfrm_mtu(const struct dst_entry *dst) { unsigned int mtu = dst_metric_raw(dst, RTAX_MTU); return mtu ? : dst_mtu(xfrm_dst_path(dst)); } static const void *xfrm_get_dst_nexthop(const struct dst_entry *dst, const void *daddr) { while (dst->xfrm) { const struct xfrm_state *xfrm = dst->xfrm; dst = xfrm_dst_child(dst); if (xfrm->props.mode == XFRM_MODE_TRANSPORT) continue; if (xfrm->type->flags & XFRM_TYPE_REMOTE_COADDR) daddr = xfrm->coaddr; else if (!(xfrm->type->flags & XFRM_TYPE_LOCAL_COADDR)) daddr = &xfrm->id.daddr; } return daddr; } static struct neighbour *xfrm_neigh_lookup(const struct dst_entry *dst, struct sk_buff *skb, const void *daddr) { const struct dst_entry *path = xfrm_dst_path(dst); if (!skb) daddr = xfrm_get_dst_nexthop(dst, daddr); return path->ops->neigh_lookup(path, skb, daddr); } static void xfrm_confirm_neigh(const struct dst_entry *dst, const void *daddr) { const struct dst_entry *path = xfrm_dst_path(dst); daddr = xfrm_get_dst_nexthop(dst, daddr); path->ops->confirm_neigh(path, daddr); } int xfrm_policy_register_afinfo(const struct xfrm_policy_afinfo *afinfo, int family) { int err = 0; if (WARN_ON(family >= ARRAY_SIZE(xfrm_policy_afinfo))) return -EAFNOSUPPORT; spin_lock(&xfrm_policy_afinfo_lock); if (unlikely(xfrm_policy_afinfo[family] != NULL)) err = -EEXIST; else { struct dst_ops *dst_ops = afinfo->dst_ops; if (likely(dst_ops->kmem_cachep == NULL)) dst_ops->kmem_cachep = xfrm_dst_cache; if (likely(dst_ops->check == NULL)) dst_ops->check = xfrm_dst_check; if (likely(dst_ops->default_advmss == NULL)) dst_ops->default_advmss = xfrm_default_advmss; if (likely(dst_ops->mtu == NULL)) dst_ops->mtu = xfrm_mtu; if (likely(dst_ops->negative_advice == NULL)) dst_ops->negative_advice = xfrm_negative_advice; if (likely(dst_ops->link_failure == NULL)) dst_ops->link_failure = xfrm_link_failure; if (likely(dst_ops->neigh_lookup == NULL)) dst_ops->neigh_lookup = xfrm_neigh_lookup; if (likely(!dst_ops->confirm_neigh)) dst_ops->confirm_neigh = xfrm_confirm_neigh; rcu_assign_pointer(xfrm_policy_afinfo[family], afinfo); } spin_unlock(&xfrm_policy_afinfo_lock); return err; } EXPORT_SYMBOL(xfrm_policy_register_afinfo); void xfrm_policy_unregister_afinfo(const struct xfrm_policy_afinfo *afinfo) { struct dst_ops *dst_ops = afinfo->dst_ops; int i; for (i = 0; i < ARRAY_SIZE(xfrm_policy_afinfo); i++) { if (xfrm_policy_afinfo[i] != afinfo) continue; RCU_INIT_POINTER(xfrm_policy_afinfo[i], NULL); break; } synchronize_rcu(); dst_ops->kmem_cachep = NULL; dst_ops->check = NULL; dst_ops->negative_advice = NULL; dst_ops->link_failure = NULL; } EXPORT_SYMBOL(xfrm_policy_unregister_afinfo); void xfrm_if_register_cb(const struct xfrm_if_cb *ifcb) { spin_lock(&xfrm_if_cb_lock); rcu_assign_pointer(xfrm_if_cb, ifcb); spin_unlock(&xfrm_if_cb_lock); } EXPORT_SYMBOL(xfrm_if_register_cb); void xfrm_if_unregister_cb(void) { RCU_INIT_POINTER(xfrm_if_cb, NULL); synchronize_rcu(); } EXPORT_SYMBOL(xfrm_if_unregister_cb); #ifdef CONFIG_XFRM_STATISTICS static int __net_init xfrm_statistics_init(struct net *net) { int rv; net->mib.xfrm_statistics = alloc_percpu(struct linux_xfrm_mib); if (!net->mib.xfrm_statistics) return -ENOMEM; rv = xfrm_proc_init(net); if (rv < 0) free_percpu(net->mib.xfrm_statistics); return rv; } static void xfrm_statistics_fini(struct net *net) { xfrm_proc_fini(net); free_percpu(net->mib.xfrm_statistics); } #else static int __net_init xfrm_statistics_init(struct net *net) { return 0; } static void xfrm_statistics_fini(struct net *net) { } #endif static int __net_init xfrm_policy_init(struct net *net) { unsigned int hmask, sz; int dir, err; if (net_eq(net, &init_net)) { xfrm_dst_cache = KMEM_CACHE(xfrm_dst, SLAB_HWCACHE_ALIGN | SLAB_PANIC); err = rhashtable_init(&xfrm_policy_inexact_table, &xfrm_pol_inexact_params); BUG_ON(err); } hmask = 8 - 1; sz = (hmask+1) * sizeof(struct hlist_head); net->xfrm.policy_byidx = xfrm_hash_alloc(sz); if (!net->xfrm.policy_byidx) goto out_byidx; net->xfrm.policy_idx_hmask = hmask; for (dir = 0; dir < XFRM_POLICY_MAX; dir++) { struct xfrm_policy_hash *htab; net->xfrm.policy_count[dir] = 0; net->xfrm.policy_count[XFRM_POLICY_MAX + dir] = 0; htab = &net->xfrm.policy_bydst[dir]; htab->table = xfrm_hash_alloc(sz); if (!htab->table) goto out_bydst; htab->hmask = hmask; htab->dbits4 = 32; htab->sbits4 = 32; htab->dbits6 = 128; htab->sbits6 = 128; } net->xfrm.policy_hthresh.lbits4 = 32; net->xfrm.policy_hthresh.rbits4 = 32; net->xfrm.policy_hthresh.lbits6 = 128; net->xfrm.policy_hthresh.rbits6 = 128; seqlock_init(&net->xfrm.policy_hthresh.lock); INIT_LIST_HEAD(&net->xfrm.policy_all); INIT_LIST_HEAD(&net->xfrm.inexact_bins); INIT_WORK(&net->xfrm.policy_hash_work, xfrm_hash_resize); INIT_WORK(&net->xfrm.policy_hthresh.work, xfrm_hash_rebuild); return 0; out_bydst: for (dir--; dir >= 0; dir--) { struct xfrm_policy_hash *htab; htab = &net->xfrm.policy_bydst[dir]; xfrm_hash_free(htab->table, sz); } xfrm_hash_free(net->xfrm.policy_byidx, sz); out_byidx: return -ENOMEM; } static void xfrm_policy_fini(struct net *net) { struct xfrm_pol_inexact_bin *b, *t; unsigned int sz; int dir; flush_work(&net->xfrm.policy_hash_work); #ifdef CONFIG_XFRM_SUB_POLICY xfrm_policy_flush(net, XFRM_POLICY_TYPE_SUB, false); #endif xfrm_policy_flush(net, XFRM_POLICY_TYPE_MAIN, false); WARN_ON(!list_empty(&net->xfrm.policy_all)); for (dir = 0; dir < XFRM_POLICY_MAX; dir++) { struct xfrm_policy_hash *htab; htab = &net->xfrm.policy_bydst[dir]; sz = (htab->hmask + 1) * sizeof(struct hlist_head); WARN_ON(!hlist_empty(htab->table)); xfrm_hash_free(htab->table, sz); } sz = (net->xfrm.policy_idx_hmask + 1) * sizeof(struct hlist_head); WARN_ON(!hlist_empty(net->xfrm.policy_byidx)); xfrm_hash_free(net->xfrm.policy_byidx, sz); spin_lock_bh(&net->xfrm.xfrm_policy_lock); list_for_each_entry_safe(b, t, &net->xfrm.inexact_bins, inexact_bins) __xfrm_policy_inexact_prune_bin(b, true); spin_unlock_bh(&net->xfrm.xfrm_policy_lock); } static int __net_init xfrm_net_init(struct net *net) { int rv; /* Initialize the per-net locks here */ spin_lock_init(&net->xfrm.xfrm_state_lock); spin_lock_init(&net->xfrm.xfrm_policy_lock); seqcount_spinlock_init(&net->xfrm.xfrm_policy_hash_generation, &net->xfrm.xfrm_policy_lock); mutex_init(&net->xfrm.xfrm_cfg_mutex); net->xfrm.policy_default[XFRM_POLICY_IN] = XFRM_USERPOLICY_ACCEPT; net->xfrm.policy_default[XFRM_POLICY_FWD] = XFRM_USERPOLICY_ACCEPT; net->xfrm.policy_default[XFRM_POLICY_OUT] = XFRM_USERPOLICY_ACCEPT; rv = xfrm_statistics_init(net); if (rv < 0) goto out_statistics; rv = xfrm_state_init(net); if (rv < 0) goto out_state; rv = xfrm_policy_init(net); if (rv < 0) goto out_policy; rv = xfrm_sysctl_init(net); if (rv < 0) goto out_sysctl; rv = xfrm_nat_keepalive_net_init(net); if (rv < 0) goto out_nat_keepalive; return 0; out_nat_keepalive: xfrm_sysctl_fini(net); out_sysctl: xfrm_policy_fini(net); out_policy: xfrm_state_fini(net); out_state: xfrm_statistics_fini(net); out_statistics: return rv; } static void __net_exit xfrm_net_exit(struct net *net) { xfrm_nat_keepalive_net_fini(net); xfrm_sysctl_fini(net); xfrm_policy_fini(net); xfrm_state_fini(net); xfrm_statistics_fini(net); } static struct pernet_operations __net_initdata xfrm_net_ops = { .init = xfrm_net_init, .exit = xfrm_net_exit, }; static const struct flow_dissector_key xfrm_flow_dissector_keys[] = { { .key_id = FLOW_DISSECTOR_KEY_CONTROL, .offset = offsetof(struct xfrm_flow_keys, control), }, { .key_id = FLOW_DISSECTOR_KEY_BASIC, .offset = offsetof(struct xfrm_flow_keys, basic), }, { .key_id = FLOW_DISSECTOR_KEY_IPV4_ADDRS, .offset = offsetof(struct xfrm_flow_keys, addrs.ipv4), }, { .key_id = FLOW_DISSECTOR_KEY_IPV6_ADDRS, .offset = offsetof(struct xfrm_flow_keys, addrs.ipv6), }, { .key_id = FLOW_DISSECTOR_KEY_PORTS, .offset = offsetof(struct xfrm_flow_keys, ports), }, { .key_id = FLOW_DISSECTOR_KEY_GRE_KEYID, .offset = offsetof(struct xfrm_flow_keys, gre), }, { .key_id = FLOW_DISSECTOR_KEY_IP, .offset = offsetof(struct xfrm_flow_keys, ip), }, { .key_id = FLOW_DISSECTOR_KEY_ICMP, .offset = offsetof(struct xfrm_flow_keys, icmp), }, }; void __init xfrm_init(void) { skb_flow_dissector_init(&xfrm_session_dissector, xfrm_flow_dissector_keys, ARRAY_SIZE(xfrm_flow_dissector_keys)); register_pernet_subsys(&xfrm_net_ops); xfrm_dev_init(); xfrm_input_init(); #ifdef CONFIG_XFRM_ESPINTCP espintcp_init(); #endif register_xfrm_state_bpf(); xfrm_nat_keepalive_init(AF_INET); } #ifdef CONFIG_AUDITSYSCALL static void xfrm_audit_common_policyinfo(struct xfrm_policy *xp, struct audit_buffer *audit_buf) { struct xfrm_sec_ctx *ctx = xp->security; struct xfrm_selector *sel = &xp->selector; if (ctx) audit_log_format(audit_buf, " sec_alg=%u sec_doi=%u sec_obj=%s", ctx->ctx_alg, ctx->ctx_doi, ctx->ctx_str); switch (sel->family) { case AF_INET: audit_log_format(audit_buf, " src=%pI4", &sel->saddr.a4); if (sel->prefixlen_s != 32) audit_log_format(audit_buf, " src_prefixlen=%d", sel->prefixlen_s); audit_log_format(audit_buf, " dst=%pI4", &sel->daddr.a4); if (sel->prefixlen_d != 32) audit_log_format(audit_buf, " dst_prefixlen=%d", sel->prefixlen_d); break; case AF_INET6: audit_log_format(audit_buf, " src=%pI6", sel->saddr.a6); if (sel->prefixlen_s != 128) audit_log_format(audit_buf, " src_prefixlen=%d", sel->prefixlen_s); audit_log_format(audit_buf, " dst=%pI6", sel->daddr.a6); if (sel->prefixlen_d != 128) audit_log_format(audit_buf, " dst_prefixlen=%d", sel->prefixlen_d); break; } } void xfrm_audit_policy_add(struct xfrm_policy *xp, int result, bool task_valid) { struct audit_buffer *audit_buf; audit_buf = xfrm_audit_start("SPD-add"); if (audit_buf == NULL) return; xfrm_audit_helper_usrinfo(task_valid, audit_buf); audit_log_format(audit_buf, " res=%u", result); xfrm_audit_common_policyinfo(xp, audit_buf); audit_log_end(audit_buf); } EXPORT_SYMBOL_GPL(xfrm_audit_policy_add); void xfrm_audit_policy_delete(struct xfrm_policy *xp, int result, bool task_valid) { struct audit_buffer *audit_buf; audit_buf = xfrm_audit_start("SPD-delete"); if (audit_buf == NULL) return; xfrm_audit_helper_usrinfo(task_valid, audit_buf); audit_log_format(audit_buf, " res=%u", result); xfrm_audit_common_policyinfo(xp, audit_buf); audit_log_end(audit_buf); } EXPORT_SYMBOL_GPL(xfrm_audit_policy_delete); #endif #ifdef CONFIG_XFRM_MIGRATE static struct xfrm_policy *xfrm_migrate_policy_find(const struct xfrm_selector *sel, u8 dir, u8 type, struct net *net, u32 if_id) { struct xfrm_policy *pol; struct flowi fl; memset(&fl, 0, sizeof(fl)); fl.flowi_proto = sel->proto; switch (sel->family) { case AF_INET: fl.u.ip4.saddr = sel->saddr.a4; fl.u.ip4.daddr = sel->daddr.a4; if (sel->proto == IPSEC_ULPROTO_ANY) break; fl.u.flowi4_oif = sel->ifindex; fl.u.ip4.fl4_sport = sel->sport; fl.u.ip4.fl4_dport = sel->dport; break; case AF_INET6: fl.u.ip6.saddr = sel->saddr.in6; fl.u.ip6.daddr = sel->daddr.in6; if (sel->proto == IPSEC_ULPROTO_ANY) break; fl.u.flowi6_oif = sel->ifindex; fl.u.ip6.fl4_sport = sel->sport; fl.u.ip6.fl4_dport = sel->dport; break; default: return ERR_PTR(-EAFNOSUPPORT); } rcu_read_lock(); pol = xfrm_policy_lookup_bytype(net, type, &fl, sel->family, dir, if_id); if (IS_ERR_OR_NULL(pol)) goto out_unlock; if (!xfrm_pol_hold_rcu(pol)) pol = NULL; out_unlock: rcu_read_unlock(); return pol; } static int migrate_tmpl_match(const struct xfrm_migrate *m, const struct xfrm_tmpl *t) { int match = 0; if (t->mode == m->mode && t->id.proto == m->proto && (m->reqid == 0 || t->reqid == m->reqid)) { switch (t->mode) { case XFRM_MODE_TUNNEL: case XFRM_MODE_BEET: case XFRM_MODE_IPTFS: if (xfrm_addr_equal(&t->id.daddr, &m->old_daddr, m->old_family) && xfrm_addr_equal(&t->saddr, &m->old_saddr, m->old_family)) { match = 1; } break; case XFRM_MODE_TRANSPORT: /* in case of transport mode, template does not store any IP addresses, hence we just compare mode and protocol */ match = 1; break; default: break; } } return match; } /* update endpoint address(es) of template(s) */ static int xfrm_policy_migrate(struct xfrm_policy *pol, struct xfrm_migrate *m, int num_migrate, struct netlink_ext_ack *extack) { struct xfrm_migrate *mp; int i, j, n = 0; write_lock_bh(&pol->lock); if (unlikely(pol->walk.dead)) { /* target policy has been deleted */ NL_SET_ERR_MSG(extack, "Target policy not found"); write_unlock_bh(&pol->lock); return -ENOENT; } for (i = 0; i < pol->xfrm_nr; i++) { for (j = 0, mp = m; j < num_migrate; j++, mp++) { if (!migrate_tmpl_match(mp, &pol->xfrm_vec[i])) continue; n++; if (pol->xfrm_vec[i].mode != XFRM_MODE_TUNNEL && pol->xfrm_vec[i].mode != XFRM_MODE_BEET && pol->xfrm_vec[i].mode != XFRM_MODE_IPTFS) continue; /* update endpoints */ memcpy(&pol->xfrm_vec[i].id.daddr, &mp->new_daddr, sizeof(pol->xfrm_vec[i].id.daddr)); memcpy(&pol->xfrm_vec[i].saddr, &mp->new_saddr, sizeof(pol->xfrm_vec[i].saddr)); pol->xfrm_vec[i].encap_family = mp->new_family; /* flush bundles */ atomic_inc(&pol->genid); } } write_unlock_bh(&pol->lock); if (!n) return -ENODATA; return 0; } static int xfrm_migrate_check(const struct xfrm_migrate *m, int num_migrate, struct netlink_ext_ack *extack) { int i, j; if (num_migrate < 1 || num_migrate > XFRM_MAX_DEPTH) { NL_SET_ERR_MSG(extack, "Invalid number of SAs to migrate, must be 0 < num <= XFRM_MAX_DEPTH (6)"); return -EINVAL; } for (i = 0; i < num_migrate; i++) { if (xfrm_addr_any(&m[i].new_daddr, m[i].new_family) || xfrm_addr_any(&m[i].new_saddr, m[i].new_family)) { NL_SET_ERR_MSG(extack, "Addresses in the MIGRATE attribute's list cannot be null"); return -EINVAL; } /* check if there is any duplicated entry */ for (j = i + 1; j < num_migrate; j++) { if (!memcmp(&m[i].old_daddr, &m[j].old_daddr, sizeof(m[i].old_daddr)) && !memcmp(&m[i].old_saddr, &m[j].old_saddr, sizeof(m[i].old_saddr)) && m[i].proto == m[j].proto && m[i].mode == m[j].mode && m[i].reqid == m[j].reqid && m[i].old_family == m[j].old_family) { NL_SET_ERR_MSG(extack, "Entries in the MIGRATE attribute's list must be unique"); return -EINVAL; } } } return 0; } int xfrm_migrate(const struct xfrm_selector *sel, u8 dir, u8 type, struct xfrm_migrate *m, int num_migrate, struct xfrm_kmaddress *k, struct net *net, struct xfrm_encap_tmpl *encap, u32 if_id, struct netlink_ext_ack *extack, struct xfrm_user_offload *xuo) { int i, err, nx_cur = 0, nx_new = 0; struct xfrm_policy *pol = NULL; struct xfrm_state *x, *xc; struct xfrm_state *x_cur[XFRM_MAX_DEPTH]; struct xfrm_state *x_new[XFRM_MAX_DEPTH]; struct xfrm_migrate *mp; /* Stage 0 - sanity checks */ err = xfrm_migrate_check(m, num_migrate, extack); if (err < 0) goto out; if (dir >= XFRM_POLICY_MAX) { NL_SET_ERR_MSG(extack, "Invalid policy direction"); err = -EINVAL; goto out; } /* Stage 1 - find policy */ pol = xfrm_migrate_policy_find(sel, dir, type, net, if_id); if (IS_ERR_OR_NULL(pol)) { NL_SET_ERR_MSG(extack, "Target policy not found"); err = IS_ERR(pol) ? PTR_ERR(pol) : -ENOENT; goto out; } /* Stage 2 - find and update state(s) */ for (i = 0, mp = m; i < num_migrate; i++, mp++) { if ((x = xfrm_migrate_state_find(mp, net, if_id))) { x_cur[nx_cur] = x; nx_cur++; xc = xfrm_state_migrate(x, mp, encap, net, xuo, extack); if (xc) { x_new[nx_new] = xc; nx_new++; } else { err = -ENODATA; goto restore_state; } } } /* Stage 3 - update policy */ err = xfrm_policy_migrate(pol, m, num_migrate, extack); if (err < 0) goto restore_state; /* Stage 4 - delete old state(s) */ if (nx_cur) { xfrm_states_put(x_cur, nx_cur); xfrm_states_delete(x_cur, nx_cur); } /* Stage 5 - announce */ km_migrate(sel, dir, type, m, num_migrate, k, encap); xfrm_pol_put(pol); return 0; out: return err; restore_state: if (pol) xfrm_pol_put(pol); if (nx_cur) xfrm_states_put(x_cur, nx_cur); if (nx_new) xfrm_states_delete(x_new, nx_new); return err; } EXPORT_SYMBOL(xfrm_migrate); #endif |
| 20 2 1 2 15 2 2 2 3 1 9 6 3 9 1 9 1 7 6 1 8 8 5 3 8 9 9 9 9 9 9 9 33 33 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 | // SPDX-License-Identifier: GPL-2.0-only /* * net/sched/act_sample.c - Packet sampling tc action * Copyright (c) 2017 Yotam Gigi <yotamg@mellanox.com> */ #include <linux/types.h> #include <linux/kernel.h> #include <linux/string.h> #include <linux/errno.h> #include <linux/skbuff.h> #include <linux/rtnetlink.h> #include <linux/module.h> #include <linux/init.h> #include <linux/gfp.h> #include <net/net_namespace.h> #include <net/netlink.h> #include <net/pkt_sched.h> #include <linux/tc_act/tc_sample.h> #include <net/tc_act/tc_sample.h> #include <net/psample.h> #include <net/pkt_cls.h> #include <net/tc_wrapper.h> #include <linux/if_arp.h> static struct tc_action_ops act_sample_ops; static const struct nla_policy sample_policy[TCA_SAMPLE_MAX + 1] = { [TCA_SAMPLE_PARMS] = { .len = sizeof(struct tc_sample) }, [TCA_SAMPLE_RATE] = { .type = NLA_U32 }, [TCA_SAMPLE_TRUNC_SIZE] = { .type = NLA_U32 }, [TCA_SAMPLE_PSAMPLE_GROUP] = { .type = NLA_U32 }, }; static int tcf_sample_init(struct net *net, struct nlattr *nla, struct nlattr *est, struct tc_action **a, struct tcf_proto *tp, u32 flags, struct netlink_ext_ack *extack) { struct tc_action_net *tn = net_generic(net, act_sample_ops.net_id); bool bind = flags & TCA_ACT_FLAGS_BIND; struct nlattr *tb[TCA_SAMPLE_MAX + 1]; struct psample_group *psample_group; u32 psample_group_num, rate, index; struct tcf_chain *goto_ch = NULL; struct tc_sample *parm; struct tcf_sample *s; bool exists = false; int ret, err; if (!nla) return -EINVAL; ret = nla_parse_nested_deprecated(tb, TCA_SAMPLE_MAX, nla, sample_policy, NULL); if (ret < 0) return ret; if (!tb[TCA_SAMPLE_PARMS]) return -EINVAL; parm = nla_data(tb[TCA_SAMPLE_PARMS]); index = parm->index; err = tcf_idr_check_alloc(tn, &index, a, bind); if (err < 0) return err; exists = err; if (exists && bind) return ACT_P_BOUND; if (!exists) { ret = tcf_idr_create(tn, index, est, a, &act_sample_ops, bind, true, flags); if (ret) { tcf_idr_cleanup(tn, index); return ret; } ret = ACT_P_CREATED; } else if (!(flags & TCA_ACT_FLAGS_REPLACE)) { tcf_idr_release(*a, bind); return -EEXIST; } if (!tb[TCA_SAMPLE_RATE] || !tb[TCA_SAMPLE_PSAMPLE_GROUP]) { NL_SET_ERR_MSG(extack, "sample rate and group are required"); err = -EINVAL; goto release_idr; } err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, extack); if (err < 0) goto release_idr; rate = nla_get_u32(tb[TCA_SAMPLE_RATE]); if (!rate) { NL_SET_ERR_MSG(extack, "invalid sample rate"); err = -EINVAL; goto put_chain; } psample_group_num = nla_get_u32(tb[TCA_SAMPLE_PSAMPLE_GROUP]); psample_group = psample_group_get(net, psample_group_num); if (!psample_group) { err = -ENOMEM; goto put_chain; } s = to_sample(*a); spin_lock_bh(&s->tcf_lock); goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch); s->rate = rate; s->psample_group_num = psample_group_num; psample_group = rcu_replace_pointer(s->psample_group, psample_group, lockdep_is_held(&s->tcf_lock)); if (tb[TCA_SAMPLE_TRUNC_SIZE]) { s->truncate = true; s->trunc_size = nla_get_u32(tb[TCA_SAMPLE_TRUNC_SIZE]); } spin_unlock_bh(&s->tcf_lock); if (psample_group) psample_group_put(psample_group); if (goto_ch) tcf_chain_put_by_act(goto_ch); return ret; put_chain: if (goto_ch) tcf_chain_put_by_act(goto_ch); release_idr: tcf_idr_release(*a, bind); return err; } static void tcf_sample_cleanup(struct tc_action *a) { struct tcf_sample *s = to_sample(a); struct psample_group *psample_group; /* last reference to action, no need to lock */ psample_group = rcu_dereference_protected(s->psample_group, 1); RCU_INIT_POINTER(s->psample_group, NULL); if (psample_group) psample_group_put(psample_group); } static bool tcf_sample_dev_ok_push(struct net_device *dev) { switch (dev->type) { case ARPHRD_TUNNEL: case ARPHRD_TUNNEL6: case ARPHRD_SIT: case ARPHRD_IPGRE: case ARPHRD_IP6GRE: case ARPHRD_VOID: case ARPHRD_NONE: return false; default: return true; } } TC_INDIRECT_SCOPE int tcf_sample_act(struct sk_buff *skb, const struct tc_action *a, struct tcf_result *res) { struct tcf_sample *s = to_sample(a); struct psample_group *psample_group; u8 cookie_data[TC_COOKIE_MAX_SIZE]; struct psample_metadata md = {}; struct tc_cookie *user_cookie; int retval; tcf_lastuse_update(&s->tcf_tm); bstats_update(this_cpu_ptr(s->common.cpu_bstats), skb); retval = READ_ONCE(s->tcf_action); psample_group = rcu_dereference_bh(s->psample_group); /* randomly sample packets according to rate */ if (psample_group && (get_random_u32_below(s->rate) == 0)) { if (!skb_at_tc_ingress(skb)) { md.in_ifindex = skb->skb_iif; md.out_ifindex = skb->dev->ifindex; } else { md.in_ifindex = skb->dev->ifindex; } /* on ingress, the mac header gets popped, so push it back */ if (skb_at_tc_ingress(skb) && tcf_sample_dev_ok_push(skb->dev)) skb_push(skb, skb->mac_len); rcu_read_lock(); user_cookie = rcu_dereference(a->user_cookie); if (user_cookie) { memcpy(cookie_data, user_cookie->data, user_cookie->len); md.user_cookie = cookie_data; md.user_cookie_len = user_cookie->len; } rcu_read_unlock(); md.trunc_size = s->truncate ? s->trunc_size : skb->len; psample_sample_packet(psample_group, skb, s->rate, &md); if (skb_at_tc_ingress(skb) && tcf_sample_dev_ok_push(skb->dev)) skb_pull(skb, skb->mac_len); } return retval; } static void tcf_sample_stats_update(struct tc_action *a, u64 bytes, u64 packets, u64 drops, u64 lastuse, bool hw) { struct tcf_sample *s = to_sample(a); struct tcf_t *tm = &s->tcf_tm; tcf_action_update_stats(a, bytes, packets, drops, hw); tm->lastuse = max_t(u64, tm->lastuse, lastuse); } static int tcf_sample_dump(struct sk_buff *skb, struct tc_action *a, int bind, int ref) { unsigned char *b = skb_tail_pointer(skb); struct tcf_sample *s = to_sample(a); struct tc_sample opt = { .index = s->tcf_index, .refcnt = refcount_read(&s->tcf_refcnt) - ref, .bindcnt = atomic_read(&s->tcf_bindcnt) - bind, }; struct tcf_t t; spin_lock_bh(&s->tcf_lock); opt.action = s->tcf_action; if (nla_put(skb, TCA_SAMPLE_PARMS, sizeof(opt), &opt)) goto nla_put_failure; tcf_tm_dump(&t, &s->tcf_tm); if (nla_put_64bit(skb, TCA_SAMPLE_TM, sizeof(t), &t, TCA_SAMPLE_PAD)) goto nla_put_failure; if (nla_put_u32(skb, TCA_SAMPLE_RATE, s->rate)) goto nla_put_failure; if (s->truncate) if (nla_put_u32(skb, TCA_SAMPLE_TRUNC_SIZE, s->trunc_size)) goto nla_put_failure; if (nla_put_u32(skb, TCA_SAMPLE_PSAMPLE_GROUP, s->psample_group_num)) goto nla_put_failure; spin_unlock_bh(&s->tcf_lock); return skb->len; nla_put_failure: spin_unlock_bh(&s->tcf_lock); nlmsg_trim(skb, b); return -1; } static void tcf_psample_group_put(void *priv) { struct psample_group *group = priv; psample_group_put(group); } static struct psample_group * tcf_sample_get_group(const struct tc_action *a, tc_action_priv_destructor *destructor) { struct tcf_sample *s = to_sample(a); struct psample_group *group; group = rcu_dereference_protected(s->psample_group, lockdep_is_held(&s->tcf_lock)); if (group) { psample_group_take(group); *destructor = tcf_psample_group_put; } return group; } static void tcf_offload_sample_get_group(struct flow_action_entry *entry, const struct tc_action *act) { entry->sample.psample_group = act->ops->get_psample_group(act, &entry->destructor); entry->destructor_priv = entry->sample.psample_group; } static int tcf_sample_offload_act_setup(struct tc_action *act, void *entry_data, u32 *index_inc, bool bind, struct netlink_ext_ack *extack) { if (bind) { struct flow_action_entry *entry = entry_data; entry->id = FLOW_ACTION_SAMPLE; entry->sample.trunc_size = tcf_sample_trunc_size(act); entry->sample.truncate = tcf_sample_truncate(act); entry->sample.rate = tcf_sample_rate(act); tcf_offload_sample_get_group(entry, act); *index_inc = 1; } else { struct flow_offload_action *fl_action = entry_data; fl_action->id = FLOW_ACTION_SAMPLE; } return 0; } static struct tc_action_ops act_sample_ops = { .kind = "sample", .id = TCA_ID_SAMPLE, .owner = THIS_MODULE, .act = tcf_sample_act, .stats_update = tcf_sample_stats_update, .dump = tcf_sample_dump, .init = tcf_sample_init, .cleanup = tcf_sample_cleanup, .get_psample_group = tcf_sample_get_group, .offload_act_setup = tcf_sample_offload_act_setup, .size = sizeof(struct tcf_sample), }; MODULE_ALIAS_NET_ACT("sample"); static __net_init int sample_init_net(struct net *net) { struct tc_action_net *tn = net_generic(net, act_sample_ops.net_id); return tc_action_net_init(net, tn, &act_sample_ops); } static void __net_exit sample_exit_net(struct list_head *net_list) { tc_action_net_exit(net_list, act_sample_ops.net_id); } static struct pernet_operations sample_net_ops = { .init = sample_init_net, .exit_batch = sample_exit_net, .id = &act_sample_ops.net_id, .size = sizeof(struct tc_action_net), }; static int __init sample_init_module(void) { return tcf_register_action(&act_sample_ops, &sample_net_ops); } static void __exit sample_cleanup_module(void) { tcf_unregister_action(&act_sample_ops, &sample_net_ops); } module_init(sample_init_module); module_exit(sample_cleanup_module); MODULE_AUTHOR("Yotam Gigi <yotam.gi@gmail.com>"); MODULE_DESCRIPTION("Packet sampling action"); MODULE_LICENSE("GPL v2"); |
| 19 19 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 | /* SPDX-License-Identifier: GPL-2.0-or-later */ /* * Copyright (C) 2001 Momchil Velikov * Portions Copyright (C) 2001 Christoph Hellwig * Copyright (C) 2006 Nick Piggin * Copyright (C) 2012 Konstantin Khlebnikov */ #ifndef _LINUX_RADIX_TREE_H #define _LINUX_RADIX_TREE_H #include <linux/bitops.h> #include <linux/gfp_types.h> #include <linux/list.h> #include <linux/lockdep.h> #include <linux/math.h> #include <linux/percpu.h> #include <linux/preempt.h> #include <linux/rcupdate.h> #include <linux/spinlock.h> #include <linux/types.h> #include <linux/xarray.h> #include <linux/local_lock.h> /* Keep unconverted code working */ #define radix_tree_root xarray #define radix_tree_node xa_node struct radix_tree_preload { local_lock_t lock; unsigned nr; /* nodes->parent points to next preallocated node */ struct radix_tree_node *nodes; }; DECLARE_PER_CPU(struct radix_tree_preload, radix_tree_preloads); /* * The bottom two bits of the slot determine how the remaining bits in the * slot are interpreted: * * 00 - data pointer * 10 - internal entry * x1 - value entry * * The internal entry may be a pointer to the next level in the tree, a * sibling entry, or an indicator that the entry in this slot has been moved * to another location in the tree and the lookup should be restarted. While * NULL fits the 'data pointer' pattern, it means that there is no entry in * the tree for this index (no matter what level of the tree it is found at). * This means that storing a NULL entry in the tree is the same as deleting * the entry from the tree. */ #define RADIX_TREE_ENTRY_MASK 3UL #define RADIX_TREE_INTERNAL_NODE 2UL static inline bool radix_tree_is_internal_node(void *ptr) { return ((unsigned long)ptr & RADIX_TREE_ENTRY_MASK) == RADIX_TREE_INTERNAL_NODE; } /*** radix-tree API starts here ***/ #define RADIX_TREE_MAP_SHIFT XA_CHUNK_SHIFT #define RADIX_TREE_MAP_SIZE (1UL << RADIX_TREE_MAP_SHIFT) #define RADIX_TREE_MAP_MASK (RADIX_TREE_MAP_SIZE-1) #define RADIX_TREE_MAX_TAGS XA_MAX_MARKS #define RADIX_TREE_TAG_LONGS XA_MARK_LONGS #define RADIX_TREE_INDEX_BITS (8 /* CHAR_BIT */ * sizeof(unsigned long)) #define RADIX_TREE_MAX_PATH (DIV_ROUND_UP(RADIX_TREE_INDEX_BITS, \ RADIX_TREE_MAP_SHIFT)) /* The IDR tag is stored in the low bits of xa_flags */ #define ROOT_IS_IDR ((__force gfp_t)4) /* The top bits of xa_flags are used to store the root tags */ #define ROOT_TAG_SHIFT (__GFP_BITS_SHIFT) #define RADIX_TREE_INIT(name, mask) XARRAY_INIT(name, mask) #define RADIX_TREE(name, mask) \ struct radix_tree_root name = RADIX_TREE_INIT(name, mask) #define INIT_RADIX_TREE(root, mask) xa_init_flags(root, mask) static inline bool radix_tree_empty(const struct radix_tree_root *root) { return root->xa_head == NULL; } /** * struct radix_tree_iter - radix tree iterator state * * @index: index of current slot * @next_index: one beyond the last index for this chunk * @tags: bit-mask for tag-iterating * @node: node that contains current slot * * This radix tree iterator works in terms of "chunks" of slots. A chunk is a * subinterval of slots contained within one radix tree leaf node. It is * described by a pointer to its first slot and a struct radix_tree_iter * which holds the chunk's position in the tree and its size. For tagged * iteration radix_tree_iter also holds the slots' bit-mask for one chosen * radix tree tag. */ struct radix_tree_iter { unsigned long index; unsigned long next_index; unsigned long tags; struct radix_tree_node *node; }; /** * Radix-tree synchronization * * The radix-tree API requires that users provide all synchronisation (with * specific exceptions, noted below). * * Synchronization of access to the data items being stored in the tree, and * management of their lifetimes must be completely managed by API users. * * For API usage, in general, * - any function _modifying_ the tree or tags (inserting or deleting * items, setting or clearing tags) must exclude other modifications, and * exclude any functions reading the tree. * - any function _reading_ the tree or tags (looking up items or tags, * gang lookups) must exclude modifications to the tree, but may occur * concurrently with other readers. * * The notable exceptions to this rule are the following functions: * __radix_tree_lookup * radix_tree_lookup * radix_tree_lookup_slot * radix_tree_tag_get * radix_tree_gang_lookup * radix_tree_gang_lookup_tag * radix_tree_gang_lookup_tag_slot * radix_tree_tagged * * The first 7 functions are able to be called locklessly, using RCU. The * caller must ensure calls to these functions are made within rcu_read_lock() * regions. Other readers (lock-free or otherwise) and modifications may be * running concurrently. * * It is still required that the caller manage the synchronization and lifetimes * of the items. So if RCU lock-free lookups are used, typically this would mean * that the items have their own locks, or are amenable to lock-free access; and * that the items are freed by RCU (or only freed after having been deleted from * the radix tree *and* a synchronize_rcu() grace period). * * (Note, rcu_assign_pointer and rcu_dereference are not needed to control * access to data items when inserting into or looking up from the radix tree) * * Note that the value returned by radix_tree_tag_get() may not be relied upon * if only the RCU read lock is held. Functions to set/clear tags and to * delete nodes running concurrently with it may affect its result such that * two consecutive reads in the same locked section may return different * values. If reliability is required, modification functions must also be * excluded from concurrency. * * radix_tree_tagged is able to be called without locking or RCU. */ /** * radix_tree_deref_slot - dereference a slot * @slot: slot pointer, returned by radix_tree_lookup_slot * * For use with radix_tree_lookup_slot(). Caller must hold tree at least read * locked across slot lookup and dereference. Not required if write lock is * held (ie. items cannot be concurrently inserted). * * radix_tree_deref_retry must be used to confirm validity of the pointer if * only the read lock is held. * * Return: entry stored in that slot. */ static inline void *radix_tree_deref_slot(void __rcu **slot) { return rcu_dereference(*slot); } /** * radix_tree_deref_slot_protected - dereference a slot with tree lock held * @slot: slot pointer, returned by radix_tree_lookup_slot * * Similar to radix_tree_deref_slot. The caller does not hold the RCU read * lock but it must hold the tree lock to prevent parallel updates. * * Return: entry stored in that slot. */ static inline void *radix_tree_deref_slot_protected(void __rcu **slot, spinlock_t *treelock) { return rcu_dereference_protected(*slot, lockdep_is_held(treelock)); } /** * radix_tree_deref_retry - check radix_tree_deref_slot * @arg: pointer returned by radix_tree_deref_slot * Returns: 0 if retry is not required, otherwise retry is required * * radix_tree_deref_retry must be used with radix_tree_deref_slot. */ static inline int radix_tree_deref_retry(void *arg) { return unlikely(radix_tree_is_internal_node(arg)); } /** * radix_tree_exception - radix_tree_deref_slot returned either exception? * @arg: value returned by radix_tree_deref_slot * Returns: 0 if well-aligned pointer, non-0 if either kind of exception. */ static inline int radix_tree_exception(void *arg) { return unlikely((unsigned long)arg & RADIX_TREE_ENTRY_MASK); } int radix_tree_insert(struct radix_tree_root *, unsigned long index, void *); void *__radix_tree_lookup(const struct radix_tree_root *, unsigned long index, struct radix_tree_node **nodep, void __rcu ***slotp); void *radix_tree_lookup(const struct radix_tree_root *, unsigned long); void __rcu **radix_tree_lookup_slot(const struct radix_tree_root *, unsigned long index); void __radix_tree_replace(struct radix_tree_root *, struct radix_tree_node *, void __rcu **slot, void *entry); void radix_tree_iter_replace(struct radix_tree_root *, const struct radix_tree_iter *, void __rcu **slot, void *entry); void radix_tree_replace_slot(struct radix_tree_root *, void __rcu **slot, void *entry); void radix_tree_iter_delete(struct radix_tree_root *, struct radix_tree_iter *iter, void __rcu **slot); void *radix_tree_delete_item(struct radix_tree_root *, unsigned long, void *); void *radix_tree_delete(struct radix_tree_root *, unsigned long); unsigned int radix_tree_gang_lookup(const struct radix_tree_root *, void **results, unsigned long first_index, unsigned int max_items); int radix_tree_preload(gfp_t gfp_mask); int radix_tree_maybe_preload(gfp_t gfp_mask); void radix_tree_init(void); void *radix_tree_tag_set(struct radix_tree_root *, unsigned long index, unsigned int tag); void *radix_tree_tag_clear(struct radix_tree_root *, unsigned long index, unsigned int tag); int radix_tree_tag_get(const struct radix_tree_root *, unsigned long index, unsigned int tag); void radix_tree_iter_tag_clear(struct radix_tree_root *, const struct radix_tree_iter *iter, unsigned int tag); unsigned int radix_tree_gang_lookup_tag(const struct radix_tree_root *, void **results, unsigned long first_index, unsigned int max_items, unsigned int tag); unsigned int radix_tree_gang_lookup_tag_slot(const struct radix_tree_root *, void __rcu ***results, unsigned long first_index, unsigned int max_items, unsigned int tag); int radix_tree_tagged(const struct radix_tree_root *, unsigned int tag); static inline void radix_tree_preload_end(void) { local_unlock(&radix_tree_preloads.lock); } void __rcu **idr_get_free(struct radix_tree_root *root, struct radix_tree_iter *iter, gfp_t gfp, unsigned long max); enum { RADIX_TREE_ITER_TAG_MASK = 0x0f, /* tag index in lower nybble */ RADIX_TREE_ITER_TAGGED = 0x10, /* lookup tagged slots */ RADIX_TREE_ITER_CONTIG = 0x20, /* stop at first hole */ }; /** * radix_tree_iter_init - initialize radix tree iterator * * @iter: pointer to iterator state * @start: iteration starting index * Returns: NULL */ static __always_inline void __rcu ** radix_tree_iter_init(struct radix_tree_iter *iter, unsigned long start) { /* * Leave iter->tags uninitialized. radix_tree_next_chunk() will fill it * in the case of a successful tagged chunk lookup. If the lookup was * unsuccessful or non-tagged then nobody cares about ->tags. * * Set index to zero to bypass next_index overflow protection. * See the comment in radix_tree_next_chunk() for details. */ iter->index = 0; iter->next_index = start; return NULL; } /** * radix_tree_next_chunk - find next chunk of slots for iteration * * @root: radix tree root * @iter: iterator state * @flags: RADIX_TREE_ITER_* flags and tag index * Returns: pointer to chunk first slot, or NULL if there no more left * * This function looks up the next chunk in the radix tree starting from * @iter->next_index. It returns a pointer to the chunk's first slot. * Also it fills @iter with data about chunk: position in the tree (index), * its end (next_index), and constructs a bit mask for tagged iterating (tags). */ void __rcu **radix_tree_next_chunk(const struct radix_tree_root *, struct radix_tree_iter *iter, unsigned flags); /** * radix_tree_iter_lookup - look up an index in the radix tree * @root: radix tree root * @iter: iterator state * @index: key to look up * * If @index is present in the radix tree, this function returns the slot * containing it and updates @iter to describe the entry. If @index is not * present, it returns NULL. */ static inline void __rcu ** radix_tree_iter_lookup(const struct radix_tree_root *root, struct radix_tree_iter *iter, unsigned long index) { radix_tree_iter_init(iter, index); return radix_tree_next_chunk(root, iter, RADIX_TREE_ITER_CONTIG); } /** * radix_tree_iter_retry - retry this chunk of the iteration * @iter: iterator state * * If we iterate over a tree protected only by the RCU lock, a race * against deletion or creation may result in seeing a slot for which * radix_tree_deref_retry() returns true. If so, call this function * and continue the iteration. */ static inline __must_check void __rcu **radix_tree_iter_retry(struct radix_tree_iter *iter) { iter->next_index = iter->index; iter->tags = 0; return NULL; } static inline unsigned long __radix_tree_iter_add(struct radix_tree_iter *iter, unsigned long slots) { return iter->index + slots; } /** * radix_tree_iter_resume - resume iterating when the chunk may be invalid * @slot: pointer to current slot * @iter: iterator state * Returns: New slot pointer * * If the iterator needs to release then reacquire a lock, the chunk may * have been invalidated by an insertion or deletion. Call this function * before releasing the lock to continue the iteration from the next index. */ void __rcu **__must_check radix_tree_iter_resume(void __rcu **slot, struct radix_tree_iter *iter); /** * radix_tree_chunk_size - get current chunk size * * @iter: pointer to radix tree iterator * Returns: current chunk size */ static __always_inline long radix_tree_chunk_size(struct radix_tree_iter *iter) { return iter->next_index - iter->index; } /** * radix_tree_next_slot - find next slot in chunk * * @slot: pointer to current slot * @iter: pointer to iterator state * @flags: RADIX_TREE_ITER_*, should be constant * Returns: pointer to next slot, or NULL if there no more left * * This function updates @iter->index in the case of a successful lookup. * For tagged lookup it also eats @iter->tags. * * There are several cases where 'slot' can be passed in as NULL to this * function. These cases result from the use of radix_tree_iter_resume() or * radix_tree_iter_retry(). In these cases we don't end up dereferencing * 'slot' because either: * a) we are doing tagged iteration and iter->tags has been set to 0, or * b) we are doing non-tagged iteration, and iter->index and iter->next_index * have been set up so that radix_tree_chunk_size() returns 1 or 0. */ static __always_inline void __rcu **radix_tree_next_slot(void __rcu **slot, struct radix_tree_iter *iter, unsigned flags) { if (flags & RADIX_TREE_ITER_TAGGED) { iter->tags >>= 1; if (unlikely(!iter->tags)) return NULL; if (likely(iter->tags & 1ul)) { iter->index = __radix_tree_iter_add(iter, 1); slot++; goto found; } if (!(flags & RADIX_TREE_ITER_CONTIG)) { unsigned offset = __ffs(iter->tags); iter->tags >>= offset++; iter->index = __radix_tree_iter_add(iter, offset); slot += offset; goto found; } } else { long count = radix_tree_chunk_size(iter); while (--count > 0) { slot++; iter->index = __radix_tree_iter_add(iter, 1); if (likely(*slot)) goto found; if (flags & RADIX_TREE_ITER_CONTIG) { /* forbid switching to the next chunk */ iter->next_index = 0; break; } } } return NULL; found: return slot; } /** * radix_tree_for_each_slot - iterate over non-empty slots * * @slot: the void** variable for pointer to slot * @root: the struct radix_tree_root pointer * @iter: the struct radix_tree_iter pointer * @start: iteration starting index * * @slot points to radix tree slot, @iter->index contains its index. */ #define radix_tree_for_each_slot(slot, root, iter, start) \ for (slot = radix_tree_iter_init(iter, start) ; \ slot || (slot = radix_tree_next_chunk(root, iter, 0)) ; \ slot = radix_tree_next_slot(slot, iter, 0)) /** * radix_tree_for_each_tagged - iterate over tagged slots * * @slot: the void** variable for pointer to slot * @root: the struct radix_tree_root pointer * @iter: the struct radix_tree_iter pointer * @start: iteration starting index * @tag: tag index * * @slot points to radix tree slot, @iter->index contains its index. */ #define radix_tree_for_each_tagged(slot, root, iter, start, tag) \ for (slot = radix_tree_iter_init(iter, start) ; \ slot || (slot = radix_tree_next_chunk(root, iter, \ RADIX_TREE_ITER_TAGGED | tag)) ; \ slot = radix_tree_next_slot(slot, iter, \ RADIX_TREE_ITER_TAGGED | tag)) #endif /* _LINUX_RADIX_TREE_H */ |
| 249 254 322 232 306 37 144 144 143 23 509 528 529 529 528 510 510 510 509 509 246 299 506 3 32 503 508 5 509 509 185 186 185 182 22 27 172 728 729 730 742 2 1 281 569 569 570 569 3 729 730 730 12 12 12 12 1 2 1 4 4 7 3 4 7 6 6 6 6 6 1 6 4 4 6 304 305 306 305 306 154 127 154 154 118 5 155 155 1 153 164 1 160 4 49 116 4 152 3 5 150 155 1 154 1 13 11 11 8 5 2 11 2 9 13 13 13 13 13 13 13 13 13 2 11 13 444 38 524 253 242 488 528 490 446 444 211 393 443 159 358 442 2 439 441 440 372 528 526 528 218 42 491 221 527 527 525 526 185 460 136 354 409 56 216 215 99 214 526 23 527 526 1 19 19 19 1 19 245 347 347 348 328 283 251 71 67 105 283 148 10 10 10 4 10 10 53 53 46 26 10 1 10 53 7 3 7 2 222 107 1 1 1 1 9 6 3 2 53 47 44 1 1 2 1 1 135 3 18 18 9 11 11 11 18 129 145 145 18 15 127 4 2 10 1 3 6 11 11 11 2 59 59 5 2 3 41 8 4 2 39 5 36 41 41 41 7 2 26 2 2 22 2 13 13 13 13 13 9 26 1 3 24 24 20 1 19 41 1 38 39 39 15 15 15 11 1 13 14 13 14 13 2 10 13 36 1 35 82 82 81 82 82 32 32 32 32 91 7 32 32 50 9 95 91 7 6 1 14 56 32 14 1 2 7 2 11 1 1 9 132 14 11 36 95 16 12 1 16 16 161 161 18 156 16 1177 1178 1140 41 484 69 22 27 295 22 7 34 5 68 105 101 64 14 128 58 30 36 18 126 126 105 29 4 51 50 1 43 165 43 43 43 43 410 410 409 40 232 291 401 404 404 292 292 294 402 231 290 403 404 292 404 293 293 294 293 294 294 62 278 232 232 54 54 53 54 123 142 142 128 31 33 45 45 45 45 45 45 11 4 4 4 7 45 1 45 45 55 8 11 1 1 9 7 3 2 3 3 5 5 5 3 2 5 5 5 3 2 2 3 2 3 2 5 2 5 45 1 1 43 36 5 37 4 41 31 2 2 6 3 5 2 38 2 38 36 5 5 38 58 301 68 342 3 344 343 340 4 5 9 345 345 3 346 4 807 800 6 6 809 2 2 808 7 7 7 7 7 15 7 1 6 7 6 6 3 6 2 15 15 14 15 15 3 15 15 15 7 6 2 1 13 14 14 13 11 1 11 5 5 10 10 11 337 1225 1223 1220 1225 1224 5 2 1224 1252 4 2 19 30 1224 2 2 1253 1169 1168 1293 1294 299 1236 2 2 2 2 140 3 7 54 94 1 115 46 140 7 140 13 14 14 14 14 14 6 14 14 126 126 338 35 313 337 36 39 39 39 306 306 304 5 307 307 77 558 558 557 33 33 556 556 556 282 33 33 33 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 2447 2448 2449 2450 2451 2452 2453 2454 2455 2456 2457 2458 2459 2460 2461 2462 2463 2464 2465 2466 2467 2468 2469 2470 2471 2472 2473 2474 2475 2476 2477 2478 2479 2480 2481 2482 2483 2484 2485 2486 2487 2488 2489 2490 2491 2492 2493 2494 2495 2496 2497 2498 2499 2500 2501 2502 2503 2504 2505 2506 2507 2508 2509 2510 2511 2512 2513 2514 2515 2516 2517 2518 2519 2520 2521 2522 2523 2524 2525 2526 2527 2528 2529 2530 2531 2532 2533 2534 2535 2536 2537 2538 2539 2540 2541 2542 2543 2544 2545 2546 2547 2548 2549 2550 2551 2552 2553 2554 2555 2556 2557 2558 2559 2560 2561 2562 2563 2564 2565 2566 2567 2568 2569 2570 2571 2572 2573 2574 2575 2576 2577 2578 2579 2580 2581 2582 2583 2584 2585 2586 2587 2588 2589 2590 2591 2592 2593 2594 2595 2596 2597 2598 2599 2600 2601 2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658 2659 2660 2661 2662 2663 2664 2665 2666 2667 2668 2669 2670 2671 2672 2673 2674 2675 2676 2677 2678 2679 2680 2681 2682 2683 2684 2685 2686 2687 2688 2689 2690 2691 2692 2693 2694 2695 2696 2697 2698 2699 2700 2701 2702 2703 2704 2705 2706 2707 2708 2709 2710 2711 2712 2713 2714 2715 2716 2717 2718 2719 2720 2721 2722 2723 2724 2725 2726 2727 2728 2729 2730 2731 2732 2733 2734 2735 2736 2737 2738 2739 2740 2741 2742 2743 2744 2745 2746 2747 2748 2749 2750 2751 2752 2753 2754 2755 2756 2757 2758 2759 2760 2761 2762 2763 2764 2765 2766 2767 2768 2769 2770 2771 2772 2773 2774 2775 2776 2777 2778 2779 2780 2781 2782 2783 2784 2785 2786 2787 2788 2789 2790 2791 2792 2793 2794 2795 2796 2797 2798 2799 2800 2801 2802 2803 2804 2805 2806 2807 2808 2809 2810 2811 2812 2813 2814 2815 2816 2817 2818 2819 2820 2821 2822 2823 2824 2825 2826 2827 2828 2829 2830 2831 2832 2833 2834 2835 2836 2837 2838 2839 2840 2841 2842 2843 2844 2845 2846 2847 2848 2849 2850 2851 2852 2853 2854 2855 2856 2857 2858 2859 2860 2861 2862 2863 2864 2865 2866 2867 2868 2869 2870 2871 2872 2873 2874 2875 2876 2877 2878 2879 2880 2881 2882 2883 2884 2885 2886 2887 2888 2889 2890 2891 2892 2893 2894 2895 2896 2897 2898 2899 2900 2901 2902 2903 2904 2905 2906 2907 2908 2909 2910 2911 2912 2913 2914 2915 2916 2917 2918 2919 2920 2921 2922 2923 2924 2925 2926 2927 2928 2929 2930 2931 2932 2933 2934 2935 2936 2937 2938 2939 2940 2941 2942 2943 2944 2945 2946 2947 2948 2949 2950 2951 2952 2953 2954 2955 2956 2957 2958 2959 2960 2961 2962 2963 2964 2965 2966 2967 2968 2969 2970 2971 2972 2973 2974 2975 2976 2977 2978 2979 2980 2981 2982 2983 2984 2985 2986 2987 2988 2989 2990 2991 2992 2993 2994 2995 2996 2997 2998 2999 3000 3001 3002 3003 3004 3005 3006 3007 3008 3009 3010 3011 3012 3013 3014 3015 3016 3017 3018 3019 3020 3021 3022 3023 3024 3025 3026 3027 3028 3029 3030 3031 3032 3033 3034 3035 3036 3037 3038 3039 3040 3041 3042 3043 3044 3045 3046 3047 3048 3049 3050 3051 3052 3053 3054 3055 3056 3057 3058 3059 3060 3061 3062 3063 3064 3065 3066 3067 3068 3069 3070 3071 3072 3073 3074 3075 3076 3077 3078 3079 3080 3081 3082 3083 3084 3085 3086 3087 3088 3089 3090 3091 3092 3093 3094 3095 3096 3097 3098 3099 3100 3101 3102 3103 3104 3105 3106 3107 3108 3109 3110 3111 3112 3113 3114 3115 3116 3117 3118 3119 3120 3121 3122 3123 3124 3125 3126 3127 3128 3129 3130 3131 3132 3133 3134 3135 3136 3137 3138 3139 3140 3141 3142 3143 3144 3145 3146 3147 3148 3149 3150 3151 3152 3153 3154 3155 3156 3157 3158 3159 3160 3161 3162 3163 3164 3165 3166 3167 3168 3169 3170 3171 3172 3173 3174 3175 3176 3177 3178 3179 3180 3181 3182 3183 3184 3185 3186 3187 3188 3189 3190 3191 3192 3193 3194 3195 3196 3197 3198 3199 3200 3201 3202 3203 3204 3205 3206 3207 3208 3209 3210 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220 3221 3222 3223 3224 3225 3226 3227 3228 3229 3230 3231 3232 3233 3234 3235 3236 3237 3238 3239 3240 3241 3242 3243 3244 3245 3246 3247 3248 3249 3250 3251 3252 3253 3254 3255 3256 3257 3258 3259 3260 3261 3262 3263 3264 3265 3266 3267 3268 3269 3270 3271 3272 3273 3274 3275 3276 3277 3278 3279 3280 3281 3282 3283 3284 3285 3286 3287 3288 3289 3290 3291 3292 3293 3294 3295 3296 3297 3298 3299 3300 3301 3302 3303 3304 3305 3306 3307 3308 3309 3310 3311 3312 3313 3314 3315 3316 3317 3318 3319 3320 3321 3322 3323 3324 3325 3326 3327 3328 3329 3330 3331 3332 3333 3334 3335 3336 3337 3338 3339 3340 3341 3342 3343 3344 3345 3346 3347 3348 3349 3350 3351 3352 3353 3354 3355 3356 3357 3358 3359 3360 3361 3362 3363 3364 3365 3366 3367 3368 3369 3370 3371 3372 3373 3374 3375 3376 3377 3378 3379 3380 3381 3382 3383 3384 3385 3386 3387 3388 3389 3390 3391 3392 3393 3394 3395 3396 3397 3398 3399 3400 3401 3402 3403 3404 3405 3406 3407 3408 3409 3410 3411 3412 3413 3414 3415 3416 3417 3418 3419 3420 3421 3422 3423 3424 3425 3426 3427 3428 3429 3430 3431 3432 3433 3434 3435 3436 3437 3438 3439 3440 3441 3442 3443 3444 3445 3446 3447 3448 3449 3450 3451 3452 3453 3454 3455 3456 3457 3458 3459 3460 3461 3462 3463 3464 3465 3466 3467 3468 3469 3470 3471 3472 3473 3474 3475 3476 3477 3478 3479 3480 3481 3482 3483 3484 3485 3486 3487 3488 3489 3490 3491 3492 3493 3494 3495 3496 3497 3498 3499 3500 3501 3502 3503 3504 3505 3506 3507 3508 3509 3510 3511 3512 3513 3514 3515 3516 3517 3518 3519 3520 3521 3522 3523 3524 3525 3526 3527 3528 3529 3530 3531 3532 3533 3534 3535 3536 3537 3538 3539 3540 3541 3542 3543 3544 3545 3546 3547 3548 3549 3550 3551 3552 3553 3554 3555 3556 3557 3558 3559 3560 3561 3562 3563 3564 3565 3566 3567 3568 3569 3570 3571 3572 3573 3574 3575 3576 3577 3578 3579 3580 3581 3582 3583 3584 3585 3586 3587 3588 3589 3590 3591 3592 3593 3594 3595 3596 3597 3598 3599 3600 3601 3602 3603 3604 3605 3606 3607 3608 3609 3610 3611 3612 3613 3614 3615 3616 3617 3618 3619 3620 3621 3622 3623 3624 3625 3626 3627 3628 3629 3630 3631 3632 3633 3634 3635 3636 3637 3638 3639 3640 3641 3642 3643 3644 3645 3646 3647 3648 3649 3650 3651 3652 3653 3654 3655 3656 3657 3658 3659 3660 3661 3662 3663 3664 3665 3666 3667 3668 3669 3670 3671 3672 3673 3674 3675 3676 3677 3678 3679 3680 3681 3682 3683 3684 3685 3686 3687 3688 3689 3690 3691 3692 3693 3694 3695 3696 3697 3698 3699 3700 3701 3702 3703 3704 3705 3706 3707 3708 3709 3710 3711 3712 3713 3714 3715 3716 3717 3718 3719 3720 3721 3722 3723 3724 3725 3726 3727 3728 3729 3730 3731 3732 3733 3734 3735 3736 3737 3738 3739 3740 3741 3742 3743 3744 3745 3746 3747 3748 3749 3750 3751 3752 3753 3754 3755 3756 3757 3758 3759 3760 3761 3762 3763 3764 3765 3766 3767 3768 3769 3770 3771 3772 3773 3774 3775 3776 3777 3778 3779 3780 3781 3782 3783 3784 3785 3786 3787 3788 3789 3790 3791 3792 3793 3794 3795 3796 3797 3798 3799 3800 3801 3802 3803 3804 3805 3806 3807 3808 3809 3810 3811 3812 3813 3814 3815 3816 3817 3818 3819 3820 3821 3822 3823 3824 3825 3826 3827 3828 3829 3830 3831 3832 3833 3834 3835 3836 3837 3838 3839 3840 3841 3842 3843 3844 3845 3846 3847 3848 3849 3850 3851 3852 3853 3854 3855 3856 3857 3858 3859 3860 3861 3862 3863 3864 3865 3866 3867 3868 3869 3870 3871 3872 3873 3874 3875 3876 3877 3878 3879 3880 3881 3882 3883 3884 3885 3886 3887 3888 3889 3890 3891 3892 3893 3894 3895 3896 3897 3898 3899 3900 3901 3902 3903 3904 3905 3906 3907 3908 3909 3910 3911 3912 3913 3914 3915 3916 3917 3918 3919 3920 3921 3922 3923 3924 3925 3926 3927 3928 3929 3930 3931 3932 3933 3934 3935 3936 3937 3938 3939 3940 3941 3942 3943 3944 3945 3946 3947 3948 3949 3950 3951 3952 3953 3954 3955 3956 3957 3958 3959 3960 3961 3962 3963 3964 3965 3966 3967 3968 3969 3970 3971 3972 3973 3974 3975 3976 3977 3978 3979 3980 3981 3982 3983 3984 3985 3986 3987 3988 3989 3990 3991 3992 3993 3994 3995 3996 3997 3998 3999 4000 4001 4002 4003 4004 4005 4006 4007 4008 4009 4010 4011 4012 4013 4014 4015 4016 4017 4018 4019 4020 4021 4022 4023 4024 4025 4026 4027 4028 4029 4030 4031 4032 4033 4034 4035 4036 4037 4038 4039 4040 4041 4042 4043 4044 4045 4046 4047 4048 4049 4050 4051 4052 4053 4054 4055 4056 4057 4058 4059 4060 4061 4062 4063 4064 4065 4066 4067 4068 4069 4070 4071 4072 4073 4074 4075 4076 4077 4078 4079 4080 4081 4082 4083 4084 4085 4086 4087 4088 4089 4090 4091 4092 4093 4094 4095 4096 4097 4098 4099 4100 4101 4102 4103 4104 4105 4106 4107 4108 4109 4110 4111 4112 4113 4114 4115 4116 4117 4118 4119 4120 4121 4122 4123 4124 4125 4126 4127 4128 4129 4130 4131 4132 4133 4134 4135 4136 4137 4138 4139 4140 4141 4142 4143 4144 4145 4146 4147 4148 4149 4150 4151 4152 4153 4154 4155 4156 4157 4158 4159 4160 4161 4162 4163 4164 4165 4166 4167 4168 4169 4170 4171 4172 4173 4174 4175 4176 4177 4178 4179 4180 4181 4182 4183 4184 4185 4186 4187 4188 4189 4190 4191 4192 4193 4194 4195 4196 4197 4198 4199 4200 4201 4202 4203 4204 4205 4206 4207 4208 4209 4210 4211 4212 4213 4214 4215 4216 4217 4218 4219 4220 4221 4222 4223 4224 4225 4226 4227 4228 4229 4230 4231 4232 4233 4234 4235 4236 4237 4238 4239 4240 4241 4242 4243 4244 4245 4246 4247 4248 4249 4250 4251 4252 4253 4254 4255 4256 4257 4258 4259 4260 4261 4262 4263 4264 4265 4266 4267 4268 4269 4270 4271 4272 4273 4274 4275 4276 4277 4278 4279 4280 4281 4282 4283 4284 4285 4286 4287 4288 4289 4290 4291 4292 4293 4294 4295 4296 4297 4298 4299 4300 4301 4302 4303 4304 4305 4306 4307 4308 4309 4310 4311 4312 4313 4314 4315 4316 4317 4318 4319 4320 4321 4322 4323 4324 4325 4326 4327 4328 4329 4330 4331 4332 4333 4334 4335 4336 4337 4338 4339 4340 4341 4342 4343 4344 4345 4346 4347 4348 4349 4350 4351 4352 4353 4354 4355 4356 4357 4358 4359 4360 4361 4362 4363 4364 4365 4366 4367 4368 4369 4370 4371 4372 4373 4374 4375 4376 4377 4378 4379 4380 4381 4382 4383 4384 4385 4386 4387 4388 4389 4390 4391 4392 4393 4394 4395 4396 4397 4398 4399 4400 4401 4402 4403 4404 4405 4406 4407 4408 4409 4410 4411 4412 4413 4414 4415 4416 4417 4418 4419 4420 4421 4422 4423 4424 4425 4426 4427 4428 4429 4430 4431 4432 4433 4434 4435 4436 4437 4438 4439 4440 4441 4442 4443 4444 4445 4446 4447 4448 4449 4450 4451 4452 4453 4454 4455 4456 4457 4458 4459 4460 4461 4462 4463 4464 4465 4466 4467 4468 4469 4470 4471 4472 4473 4474 4475 4476 4477 4478 4479 4480 4481 4482 4483 4484 4485 4486 4487 4488 4489 4490 4491 4492 4493 4494 4495 4496 4497 4498 4499 4500 4501 4502 4503 4504 4505 4506 4507 4508 4509 4510 4511 4512 4513 4514 4515 4516 4517 4518 4519 4520 4521 4522 4523 4524 4525 4526 4527 4528 4529 4530 4531 4532 4533 4534 4535 4536 4537 4538 4539 4540 4541 4542 4543 4544 4545 4546 4547 4548 4549 4550 4551 4552 4553 4554 4555 4556 4557 4558 4559 4560 4561 4562 4563 4564 4565 4566 4567 4568 4569 4570 4571 4572 4573 4574 4575 4576 4577 4578 4579 4580 4581 4582 4583 4584 4585 4586 4587 4588 4589 4590 4591 4592 4593 4594 4595 4596 4597 4598 4599 4600 4601 4602 4603 4604 4605 4606 4607 4608 4609 4610 4611 4612 4613 4614 4615 4616 4617 4618 4619 4620 4621 4622 4623 4624 4625 4626 4627 4628 4629 4630 4631 4632 4633 4634 4635 4636 4637 4638 4639 4640 4641 4642 4643 4644 4645 4646 4647 4648 4649 4650 4651 4652 4653 4654 4655 4656 4657 4658 4659 4660 4661 4662 4663 4664 4665 4666 4667 4668 4669 4670 4671 4672 4673 4674 4675 4676 4677 4678 4679 4680 4681 4682 4683 4684 4685 4686 4687 4688 4689 4690 4691 4692 4693 4694 4695 4696 4697 4698 4699 4700 4701 4702 4703 4704 4705 4706 4707 4708 4709 4710 4711 4712 4713 4714 4715 4716 4717 4718 4719 4720 4721 4722 4723 4724 4725 4726 4727 4728 4729 4730 4731 4732 4733 4734 4735 4736 4737 4738 4739 4740 4741 4742 4743 4744 4745 4746 4747 4748 4749 4750 4751 4752 4753 4754 4755 4756 4757 4758 4759 4760 4761 4762 4763 4764 4765 4766 4767 4768 4769 4770 4771 4772 4773 4774 4775 4776 4777 4778 4779 4780 4781 4782 4783 4784 4785 4786 4787 4788 4789 4790 4791 4792 4793 4794 4795 4796 4797 4798 4799 4800 4801 4802 4803 4804 4805 4806 4807 4808 4809 4810 4811 4812 4813 4814 4815 4816 4817 4818 4819 4820 4821 4822 4823 4824 4825 4826 4827 4828 4829 4830 4831 4832 4833 4834 4835 4836 4837 4838 4839 4840 4841 4842 4843 4844 4845 4846 4847 4848 4849 4850 4851 4852 4853 4854 4855 4856 4857 4858 4859 4860 4861 4862 4863 4864 4865 4866 4867 4868 4869 4870 4871 4872 4873 4874 4875 4876 4877 4878 4879 4880 4881 4882 4883 4884 4885 4886 4887 4888 4889 4890 4891 4892 4893 4894 4895 4896 4897 4898 4899 4900 4901 4902 4903 4904 4905 4906 4907 4908 4909 4910 4911 4912 4913 4914 4915 4916 4917 4918 4919 4920 4921 4922 4923 4924 4925 4926 4927 4928 4929 4930 4931 4932 4933 4934 4935 4936 4937 4938 4939 4940 4941 4942 4943 4944 4945 4946 4947 4948 4949 4950 4951 4952 4953 4954 4955 4956 4957 4958 4959 4960 4961 4962 4963 4964 4965 4966 4967 4968 4969 4970 4971 4972 4973 4974 4975 4976 4977 4978 4979 4980 4981 4982 4983 4984 4985 4986 4987 4988 4989 4990 4991 4992 4993 4994 4995 4996 4997 4998 4999 5000 5001 5002 5003 5004 5005 5006 5007 5008 5009 5010 5011 5012 5013 5014 5015 5016 5017 5018 5019 5020 5021 5022 5023 5024 5025 5026 5027 5028 5029 5030 5031 5032 5033 5034 5035 5036 5037 5038 5039 5040 5041 5042 5043 5044 5045 5046 5047 5048 5049 5050 5051 5052 5053 5054 5055 5056 5057 5058 5059 5060 5061 5062 5063 5064 5065 5066 5067 5068 5069 5070 5071 5072 5073 5074 5075 5076 5077 5078 5079 5080 5081 5082 5083 5084 5085 5086 5087 5088 5089 5090 5091 5092 5093 5094 5095 5096 5097 5098 5099 5100 5101 5102 5103 5104 5105 5106 5107 5108 5109 5110 5111 5112 5113 5114 5115 5116 5117 5118 5119 5120 5121 5122 5123 5124 5125 5126 5127 5128 5129 5130 5131 5132 5133 5134 5135 5136 5137 5138 5139 5140 5141 5142 5143 5144 5145 5146 5147 5148 5149 5150 5151 5152 5153 5154 5155 5156 5157 5158 5159 5160 5161 5162 5163 5164 5165 5166 5167 5168 5169 5170 5171 5172 5173 5174 5175 5176 5177 5178 5179 5180 5181 5182 5183 5184 5185 5186 5187 5188 5189 5190 5191 5192 5193 5194 5195 5196 5197 5198 5199 5200 5201 5202 5203 5204 5205 5206 5207 5208 5209 5210 5211 5212 5213 5214 5215 5216 5217 5218 5219 5220 5221 5222 5223 5224 5225 5226 5227 5228 5229 5230 5231 5232 5233 5234 5235 5236 5237 5238 5239 5240 5241 5242 5243 5244 5245 5246 5247 5248 5249 5250 5251 5252 5253 5254 5255 5256 5257 5258 5259 5260 5261 5262 5263 5264 5265 5266 5267 5268 5269 5270 5271 5272 5273 5274 5275 5276 5277 5278 5279 5280 5281 5282 5283 5284 5285 5286 5287 5288 5289 5290 5291 5292 5293 5294 5295 5296 5297 5298 5299 5300 5301 5302 5303 5304 5305 5306 5307 5308 5309 5310 5311 5312 5313 5314 5315 5316 5317 5318 5319 5320 5321 5322 5323 5324 5325 5326 5327 5328 5329 5330 5331 5332 5333 5334 5335 5336 5337 5338 5339 5340 5341 5342 5343 5344 5345 5346 5347 5348 5349 5350 5351 5352 5353 5354 5355 5356 5357 5358 5359 5360 5361 5362 5363 5364 5365 5366 5367 5368 5369 5370 5371 5372 5373 5374 5375 5376 5377 5378 5379 5380 5381 5382 5383 5384 5385 5386 5387 5388 5389 5390 5391 5392 5393 5394 5395 5396 5397 5398 5399 5400 5401 5402 5403 5404 5405 5406 5407 5408 5409 5410 5411 5412 5413 5414 5415 5416 5417 5418 5419 5420 5421 5422 5423 5424 5425 5426 5427 5428 5429 5430 5431 5432 5433 5434 5435 5436 5437 5438 5439 5440 5441 5442 5443 5444 5445 5446 5447 5448 5449 5450 5451 5452 5453 5454 5455 5456 5457 5458 5459 5460 5461 5462 5463 5464 5465 5466 5467 5468 5469 5470 5471 5472 5473 5474 5475 5476 5477 5478 5479 5480 5481 5482 5483 5484 5485 5486 5487 5488 5489 5490 5491 5492 5493 5494 5495 5496 5497 5498 5499 5500 5501 5502 5503 5504 5505 5506 5507 5508 5509 5510 5511 5512 5513 5514 5515 5516 5517 5518 5519 5520 5521 5522 5523 5524 5525 5526 5527 5528 5529 5530 5531 5532 5533 5534 5535 5536 5537 5538 5539 5540 5541 5542 5543 5544 5545 5546 5547 5548 5549 5550 5551 5552 5553 5554 5555 5556 5557 5558 5559 5560 5561 5562 5563 5564 5565 5566 5567 5568 5569 5570 5571 5572 5573 5574 5575 5576 5577 5578 5579 5580 5581 5582 5583 5584 5585 5586 5587 5588 5589 5590 5591 5592 5593 5594 5595 5596 5597 5598 5599 5600 5601 5602 5603 5604 5605 5606 5607 5608 5609 5610 5611 5612 5613 5614 5615 5616 5617 5618 5619 5620 5621 5622 5623 5624 5625 5626 5627 5628 5629 5630 5631 5632 5633 5634 5635 5636 5637 5638 5639 5640 5641 5642 5643 5644 5645 5646 5647 5648 5649 5650 5651 5652 5653 5654 5655 5656 5657 5658 5659 5660 5661 5662 5663 5664 5665 5666 5667 5668 5669 5670 5671 5672 5673 5674 5675 5676 5677 5678 5679 5680 5681 5682 5683 5684 5685 5686 5687 5688 5689 5690 5691 5692 5693 5694 5695 5696 5697 5698 5699 5700 5701 5702 5703 5704 5705 5706 5707 5708 5709 5710 5711 5712 5713 5714 5715 5716 5717 5718 5719 5720 5721 5722 5723 5724 5725 5726 5727 5728 5729 5730 5731 5732 5733 5734 5735 5736 5737 5738 5739 5740 5741 5742 5743 5744 5745 5746 5747 5748 5749 5750 5751 5752 5753 5754 5755 5756 5757 5758 5759 5760 5761 5762 5763 5764 5765 5766 5767 5768 5769 5770 5771 5772 5773 5774 5775 5776 5777 5778 5779 5780 5781 5782 5783 5784 5785 5786 5787 5788 5789 5790 5791 5792 5793 5794 5795 5796 5797 5798 5799 5800 5801 5802 5803 5804 5805 5806 5807 5808 5809 5810 5811 5812 5813 5814 5815 5816 5817 5818 5819 5820 5821 5822 5823 5824 5825 5826 5827 5828 5829 5830 5831 5832 5833 5834 5835 5836 5837 5838 5839 5840 5841 5842 5843 5844 5845 5846 5847 5848 5849 5850 5851 5852 5853 5854 5855 5856 5857 5858 5859 5860 5861 5862 5863 5864 5865 5866 5867 5868 5869 5870 5871 5872 5873 5874 5875 5876 5877 5878 5879 5880 5881 5882 5883 5884 5885 5886 5887 5888 5889 5890 5891 5892 5893 5894 5895 5896 5897 5898 5899 5900 5901 5902 5903 5904 5905 5906 5907 5908 5909 5910 5911 5912 5913 5914 5915 5916 5917 5918 5919 5920 5921 5922 5923 5924 5925 5926 5927 5928 5929 5930 5931 5932 5933 5934 5935 5936 5937 5938 5939 5940 5941 5942 5943 5944 5945 5946 5947 5948 5949 5950 5951 5952 5953 5954 5955 5956 5957 5958 5959 5960 5961 5962 5963 5964 5965 5966 5967 5968 5969 5970 5971 5972 5973 5974 5975 5976 5977 5978 5979 5980 5981 5982 5983 5984 5985 5986 5987 5988 5989 5990 5991 5992 5993 5994 5995 5996 5997 5998 5999 6000 6001 6002 6003 6004 6005 6006 6007 6008 6009 6010 6011 6012 6013 6014 6015 6016 6017 6018 6019 6020 6021 6022 6023 6024 6025 6026 6027 6028 6029 6030 6031 6032 6033 6034 6035 6036 6037 6038 6039 6040 6041 6042 6043 6044 6045 6046 6047 6048 6049 6050 6051 6052 6053 6054 6055 6056 6057 6058 6059 6060 6061 6062 6063 6064 6065 6066 6067 6068 6069 6070 6071 6072 6073 6074 6075 6076 6077 6078 6079 6080 6081 6082 6083 6084 6085 6086 6087 6088 6089 6090 6091 6092 6093 6094 6095 6096 6097 6098 6099 6100 6101 6102 6103 6104 6105 6106 6107 6108 6109 6110 6111 6112 6113 6114 6115 6116 6117 6118 6119 6120 6121 6122 6123 6124 6125 6126 6127 6128 6129 6130 6131 6132 6133 6134 6135 6136 6137 6138 6139 6140 6141 6142 6143 6144 6145 6146 6147 6148 6149 6150 6151 6152 6153 6154 6155 6156 6157 6158 6159 6160 6161 6162 6163 6164 6165 6166 6167 6168 6169 6170 6171 6172 6173 6174 6175 6176 6177 6178 6179 6180 6181 6182 6183 6184 6185 6186 6187 6188 6189 6190 6191 6192 6193 6194 6195 6196 6197 6198 6199 6200 6201 6202 6203 6204 6205 6206 6207 6208 6209 6210 6211 6212 6213 6214 6215 6216 6217 6218 6219 6220 6221 6222 6223 6224 6225 6226 6227 6228 6229 6230 6231 6232 6233 6234 6235 6236 6237 6238 6239 6240 6241 6242 6243 6244 6245 6246 6247 6248 6249 6250 6251 6252 6253 6254 6255 6256 6257 6258 6259 6260 6261 6262 6263 6264 6265 6266 6267 6268 6269 6270 6271 6272 6273 6274 6275 6276 6277 6278 6279 6280 6281 6282 6283 6284 6285 6286 6287 6288 6289 6290 6291 6292 6293 6294 6295 6296 6297 6298 6299 6300 6301 6302 6303 6304 6305 6306 6307 6308 6309 6310 6311 6312 6313 6314 6315 6316 6317 6318 6319 6320 6321 6322 6323 6324 6325 6326 6327 6328 6329 6330 6331 6332 6333 6334 6335 6336 6337 6338 6339 6340 6341 6342 6343 6344 6345 6346 6347 6348 6349 6350 6351 6352 6353 6354 6355 6356 6357 6358 6359 6360 6361 6362 6363 6364 6365 6366 6367 6368 6369 6370 6371 6372 6373 6374 6375 6376 6377 6378 6379 6380 6381 6382 6383 6384 6385 6386 6387 6388 6389 6390 6391 6392 6393 6394 6395 6396 6397 6398 6399 6400 6401 6402 6403 6404 6405 6406 6407 6408 6409 6410 6411 6412 6413 6414 6415 6416 6417 6418 6419 6420 6421 6422 6423 6424 6425 6426 6427 6428 6429 6430 6431 6432 6433 6434 6435 6436 6437 6438 6439 6440 6441 6442 6443 6444 6445 6446 6447 6448 6449 6450 6451 6452 6453 6454 6455 6456 6457 6458 6459 6460 6461 6462 6463 6464 6465 6466 6467 6468 6469 6470 6471 6472 6473 6474 6475 6476 6477 6478 6479 6480 6481 6482 6483 6484 6485 6486 6487 6488 6489 6490 6491 6492 6493 6494 6495 6496 6497 6498 6499 6500 6501 6502 6503 6504 6505 6506 6507 6508 6509 6510 6511 6512 6513 6514 6515 6516 6517 6518 6519 6520 6521 6522 6523 6524 6525 6526 6527 6528 6529 6530 6531 6532 6533 6534 6535 6536 6537 6538 6539 6540 6541 6542 6543 6544 6545 6546 6547 6548 6549 6550 6551 6552 6553 6554 6555 6556 6557 6558 6559 6560 6561 6562 6563 6564 6565 6566 6567 6568 6569 6570 6571 6572 6573 6574 6575 6576 6577 6578 6579 6580 6581 6582 6583 6584 6585 6586 6587 6588 6589 6590 6591 6592 6593 6594 6595 6596 6597 6598 6599 6600 6601 6602 6603 6604 6605 6606 6607 6608 6609 6610 6611 6612 6613 6614 6615 6616 6617 6618 6619 6620 6621 6622 6623 6624 6625 6626 6627 6628 6629 6630 6631 6632 6633 6634 6635 6636 6637 6638 6639 6640 6641 6642 6643 6644 6645 6646 6647 6648 6649 6650 6651 6652 6653 6654 6655 6656 6657 6658 6659 6660 6661 6662 6663 6664 6665 6666 6667 6668 6669 6670 6671 6672 6673 6674 6675 6676 6677 6678 6679 6680 6681 6682 6683 6684 6685 6686 6687 6688 6689 6690 6691 6692 6693 6694 6695 6696 6697 6698 6699 6700 6701 6702 6703 6704 6705 6706 6707 6708 6709 6710 6711 6712 6713 6714 6715 6716 6717 6718 6719 6720 6721 6722 6723 6724 6725 6726 6727 6728 6729 6730 6731 6732 6733 6734 6735 6736 6737 6738 6739 6740 6741 6742 6743 6744 6745 6746 6747 6748 6749 6750 6751 6752 6753 6754 6755 6756 6757 6758 6759 6760 6761 6762 6763 6764 6765 6766 6767 6768 6769 6770 6771 6772 6773 6774 6775 6776 6777 6778 6779 6780 6781 6782 6783 6784 6785 6786 6787 6788 6789 6790 6791 6792 6793 6794 6795 6796 6797 6798 6799 6800 6801 6802 6803 6804 6805 6806 6807 6808 6809 6810 6811 6812 6813 6814 6815 6816 6817 6818 6819 6820 6821 6822 6823 6824 6825 6826 6827 6828 6829 6830 6831 6832 6833 6834 6835 6836 6837 6838 6839 6840 6841 6842 6843 6844 6845 6846 6847 6848 6849 6850 6851 6852 6853 6854 6855 6856 6857 6858 6859 6860 6861 6862 6863 6864 6865 6866 6867 6868 6869 6870 6871 6872 6873 6874 6875 6876 6877 6878 6879 6880 6881 6882 6883 6884 6885 6886 6887 6888 6889 6890 6891 6892 6893 6894 6895 6896 6897 6898 6899 6900 6901 6902 6903 6904 6905 6906 6907 6908 6909 6910 6911 6912 6913 6914 6915 6916 6917 6918 6919 6920 6921 6922 6923 6924 6925 6926 6927 6928 6929 6930 6931 6932 6933 6934 6935 6936 6937 6938 6939 6940 6941 6942 6943 6944 6945 6946 6947 6948 6949 6950 6951 6952 6953 6954 6955 6956 6957 6958 6959 6960 6961 6962 6963 6964 6965 6966 6967 6968 6969 6970 6971 6972 6973 6974 6975 6976 6977 6978 6979 6980 6981 6982 6983 6984 6985 6986 6987 6988 6989 6990 6991 6992 6993 6994 6995 6996 6997 6998 6999 7000 7001 7002 7003 7004 7005 7006 7007 7008 7009 7010 7011 7012 7013 7014 7015 7016 7017 7018 7019 7020 7021 7022 7023 7024 7025 7026 7027 7028 7029 7030 7031 7032 7033 7034 7035 7036 7037 7038 7039 7040 7041 7042 7043 7044 7045 7046 7047 7048 7049 7050 7051 7052 7053 7054 7055 7056 7057 7058 7059 7060 7061 7062 7063 7064 7065 7066 7067 7068 7069 7070 7071 7072 7073 7074 7075 7076 7077 7078 7079 7080 7081 7082 7083 7084 7085 7086 7087 7088 7089 7090 7091 7092 7093 7094 7095 7096 7097 7098 7099 7100 7101 7102 7103 7104 7105 7106 7107 7108 7109 7110 7111 7112 7113 7114 7115 7116 7117 7118 7119 7120 7121 7122 7123 7124 7125 7126 7127 7128 7129 7130 7131 7132 7133 7134 7135 7136 7137 7138 7139 7140 7141 7142 7143 7144 7145 7146 7147 7148 7149 7150 7151 7152 7153 7154 7155 7156 7157 7158 7159 7160 7161 7162 7163 7164 7165 7166 7167 7168 7169 7170 7171 7172 7173 7174 7175 7176 7177 7178 7179 7180 7181 7182 7183 7184 7185 7186 7187 7188 7189 7190 7191 7192 7193 7194 7195 7196 7197 7198 7199 7200 7201 7202 7203 7204 7205 7206 7207 7208 7209 7210 7211 7212 7213 7214 7215 7216 7217 7218 7219 7220 7221 7222 7223 7224 7225 7226 7227 7228 7229 7230 7231 7232 7233 7234 7235 7236 7237 7238 7239 7240 7241 7242 7243 7244 7245 7246 7247 7248 7249 7250 7251 7252 7253 7254 7255 7256 7257 7258 7259 7260 7261 7262 7263 7264 7265 7266 7267 7268 7269 7270 7271 7272 7273 7274 7275 7276 7277 7278 7279 7280 7281 7282 7283 7284 7285 7286 7287 7288 7289 7290 7291 7292 7293 7294 7295 7296 7297 7298 7299 7300 7301 7302 7303 7304 7305 7306 7307 7308 7309 7310 7311 7312 7313 7314 7315 7316 7317 7318 7319 7320 7321 7322 7323 7324 7325 7326 7327 7328 7329 7330 7331 7332 7333 7334 7335 7336 7337 7338 7339 7340 7341 7342 7343 7344 7345 7346 7347 7348 7349 7350 7351 7352 7353 7354 7355 7356 7357 7358 7359 7360 7361 7362 7363 7364 7365 7366 7367 7368 7369 7370 7371 7372 7373 7374 7375 7376 7377 7378 7379 7380 7381 7382 7383 7384 7385 7386 7387 7388 7389 7390 7391 7392 7393 7394 7395 7396 7397 7398 7399 7400 7401 7402 7403 7404 7405 7406 7407 7408 7409 7410 7411 7412 7413 7414 7415 7416 7417 7418 7419 7420 7421 7422 7423 7424 7425 7426 7427 7428 7429 7430 7431 7432 7433 7434 7435 7436 7437 7438 7439 7440 7441 7442 7443 7444 7445 7446 7447 7448 7449 7450 7451 7452 7453 7454 7455 7456 7457 7458 7459 7460 7461 7462 7463 7464 7465 7466 7467 7468 7469 7470 7471 7472 7473 7474 7475 7476 7477 7478 7479 7480 7481 7482 7483 7484 7485 7486 7487 7488 7489 7490 7491 7492 7493 7494 7495 7496 7497 7498 7499 7500 7501 7502 7503 7504 7505 7506 7507 7508 7509 7510 7511 7512 7513 7514 7515 7516 7517 7518 7519 7520 7521 7522 7523 7524 7525 7526 7527 7528 7529 7530 7531 7532 7533 7534 7535 7536 7537 7538 7539 7540 7541 7542 7543 7544 7545 7546 7547 7548 7549 7550 7551 7552 7553 7554 7555 7556 7557 7558 7559 7560 7561 7562 7563 7564 7565 7566 7567 7568 7569 7570 7571 7572 7573 7574 7575 7576 7577 7578 7579 7580 7581 7582 7583 7584 7585 7586 7587 7588 7589 7590 7591 7592 7593 7594 7595 7596 7597 7598 7599 7600 7601 7602 7603 7604 7605 7606 7607 7608 7609 7610 7611 7612 7613 7614 7615 7616 7617 7618 7619 7620 7621 | // SPDX-License-Identifier: GPL-2.0-or-later /* * IPv6 Address [auto]configuration * Linux INET6 implementation * * Authors: * Pedro Roque <roque@di.fc.ul.pt> * Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> */ /* * Changes: * * Janos Farkas : delete timer on ifdown * <chexum@bankinf.banki.hu> * Andi Kleen : kill double kfree on module * unload. * Maciej W. Rozycki : FDDI support * sekiya@USAGI : Don't send too many RS * packets. * yoshfuji@USAGI : Fixed interval between DAD * packets. * YOSHIFUJI Hideaki @USAGI : improved accuracy of * address validation timer. * YOSHIFUJI Hideaki @USAGI : Privacy Extensions (RFC3041) * support. * Yuji SEKIYA @USAGI : Don't assign a same IPv6 * address on a same interface. * YOSHIFUJI Hideaki @USAGI : ARCnet support * YOSHIFUJI Hideaki @USAGI : convert /proc/net/if_inet6 to * seq_file. * YOSHIFUJI Hideaki @USAGI : improved source address * selection; consider scope, * status etc. */ #define pr_fmt(fmt) "IPv6: " fmt #include <linux/errno.h> #include <linux/types.h> #include <linux/kernel.h> #include <linux/sched/signal.h> #include <linux/socket.h> #include <linux/sockios.h> #include <linux/net.h> #include <linux/inet.h> #include <linux/in6.h> #include <linux/netdevice.h> #include <linux/if_addr.h> #include <linux/if_arp.h> #include <linux/if_arcnet.h> #include <linux/if_infiniband.h> #include <linux/route.h> #include <linux/inetdevice.h> #include <linux/init.h> #include <linux/slab.h> #ifdef CONFIG_SYSCTL #include <linux/sysctl.h> #endif #include <linux/capability.h> #include <linux/delay.h> #include <linux/notifier.h> #include <linux/string.h> #include <linux/hash.h> #include <net/ip_tunnels.h> #include <net/net_namespace.h> #include <net/sock.h> #include <net/snmp.h> #include <net/6lowpan.h> #include <net/firewire.h> #include <net/ipv6.h> #include <net/protocol.h> #include <net/ndisc.h> #include <net/ip6_route.h> #include <net/addrconf.h> #include <net/tcp.h> #include <net/ip.h> #include <net/netlink.h> #include <net/pkt_sched.h> #include <net/l3mdev.h> #include <net/netdev_lock.h> #include <linux/if_tunnel.h> #include <linux/rtnetlink.h> #include <linux/netconf.h> #include <linux/random.h> #include <linux/uaccess.h> #include <linux/unaligned.h> #include <linux/proc_fs.h> #include <linux/seq_file.h> #include <linux/export.h> #include <linux/ioam6.h> #define IPV6_MAX_STRLEN \ sizeof("ffff:ffff:ffff:ffff:ffff:ffff:255.255.255.255") static inline u32 cstamp_delta(unsigned long cstamp) { return (cstamp - INITIAL_JIFFIES) * 100UL / HZ; } static inline s32 rfc3315_s14_backoff_init(s32 irt) { /* multiply 'initial retransmission time' by 0.9 .. 1.1 */ u64 tmp = get_random_u32_inclusive(900000, 1100000) * (u64)irt; do_div(tmp, 1000000); return (s32)tmp; } static inline s32 rfc3315_s14_backoff_update(s32 rt, s32 mrt) { /* multiply 'retransmission timeout' by 1.9 .. 2.1 */ u64 tmp = get_random_u32_inclusive(1900000, 2100000) * (u64)rt; do_div(tmp, 1000000); if ((s32)tmp > mrt) { /* multiply 'maximum retransmission time' by 0.9 .. 1.1 */ tmp = get_random_u32_inclusive(900000, 1100000) * (u64)mrt; do_div(tmp, 1000000); } return (s32)tmp; } #ifdef CONFIG_SYSCTL static int addrconf_sysctl_register(struct inet6_dev *idev); static void addrconf_sysctl_unregister(struct inet6_dev *idev); #else static inline int addrconf_sysctl_register(struct inet6_dev *idev) { return 0; } static inline void addrconf_sysctl_unregister(struct inet6_dev *idev) { } #endif static void ipv6_gen_rnd_iid(struct in6_addr *addr); static int ipv6_generate_eui64(u8 *eui, struct net_device *dev); static int ipv6_count_addresses(const struct inet6_dev *idev); static int ipv6_generate_stable_address(struct in6_addr *addr, u8 dad_count, const struct inet6_dev *idev); #define IN6_ADDR_HSIZE_SHIFT 8 #define IN6_ADDR_HSIZE (1 << IN6_ADDR_HSIZE_SHIFT) static void addrconf_verify(struct net *net); static void addrconf_verify_rtnl(struct net *net); static struct workqueue_struct *addrconf_wq; static void addrconf_join_anycast(struct inet6_ifaddr *ifp); static void addrconf_leave_anycast(struct inet6_ifaddr *ifp); static void addrconf_type_change(struct net_device *dev, unsigned long event); static int addrconf_ifdown(struct net_device *dev, bool unregister); static struct fib6_info *addrconf_get_prefix_route(const struct in6_addr *pfx, int plen, const struct net_device *dev, u32 flags, u32 noflags, bool no_gw); static void addrconf_dad_start(struct inet6_ifaddr *ifp); static void addrconf_dad_work(struct work_struct *w); static void addrconf_dad_completed(struct inet6_ifaddr *ifp, bool bump_id, bool send_na); static void addrconf_dad_run(struct inet6_dev *idev, bool restart); static void addrconf_rs_timer(struct timer_list *t); static void __ipv6_ifa_notify(int event, struct inet6_ifaddr *ifa); static void ipv6_ifa_notify(int event, struct inet6_ifaddr *ifa); static void inet6_prefix_notify(int event, struct inet6_dev *idev, struct prefix_info *pinfo); static struct ipv6_devconf ipv6_devconf __read_mostly = { .forwarding = 0, .hop_limit = IPV6_DEFAULT_HOPLIMIT, .mtu6 = IPV6_MIN_MTU, .accept_ra = 1, .accept_redirects = 1, .autoconf = 1, .force_mld_version = 0, .mldv1_unsolicited_report_interval = 10 * HZ, .mldv2_unsolicited_report_interval = HZ, .dad_transmits = 1, .rtr_solicits = MAX_RTR_SOLICITATIONS, .rtr_solicit_interval = RTR_SOLICITATION_INTERVAL, .rtr_solicit_max_interval = RTR_SOLICITATION_MAX_INTERVAL, .rtr_solicit_delay = MAX_RTR_SOLICITATION_DELAY, .use_tempaddr = 0, .temp_valid_lft = TEMP_VALID_LIFETIME, .temp_prefered_lft = TEMP_PREFERRED_LIFETIME, .regen_min_advance = REGEN_MIN_ADVANCE, .regen_max_retry = REGEN_MAX_RETRY, .max_desync_factor = MAX_DESYNC_FACTOR, .max_addresses = IPV6_MAX_ADDRESSES, .accept_ra_defrtr = 1, .ra_defrtr_metric = IP6_RT_PRIO_USER, .accept_ra_from_local = 0, .accept_ra_min_hop_limit= 1, .accept_ra_min_lft = 0, .accept_ra_pinfo = 1, #ifdef CONFIG_IPV6_ROUTER_PREF .accept_ra_rtr_pref = 1, .rtr_probe_interval = 60 * HZ, #ifdef CONFIG_IPV6_ROUTE_INFO .accept_ra_rt_info_min_plen = 0, .accept_ra_rt_info_max_plen = 0, #endif #endif .proxy_ndp = 0, .accept_source_route = 0, /* we do not accept RH0 by default. */ .disable_ipv6 = 0, .accept_dad = 0, .suppress_frag_ndisc = 1, .accept_ra_mtu = 1, .stable_secret = { .initialized = false, }, .use_oif_addrs_only = 0, .ignore_routes_with_linkdown = 0, .keep_addr_on_down = 0, .seg6_enabled = 0, #ifdef CONFIG_IPV6_SEG6_HMAC .seg6_require_hmac = 0, #endif .enhanced_dad = 1, .addr_gen_mode = IN6_ADDR_GEN_MODE_EUI64, .disable_policy = 0, .rpl_seg_enabled = 0, .ioam6_enabled = 0, .ioam6_id = IOAM6_DEFAULT_IF_ID, .ioam6_id_wide = IOAM6_DEFAULT_IF_ID_WIDE, .ndisc_evict_nocarrier = 1, .ra_honor_pio_life = 0, .ra_honor_pio_pflag = 0, .force_forwarding = 0, }; static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = { .forwarding = 0, .hop_limit = IPV6_DEFAULT_HOPLIMIT, .mtu6 = IPV6_MIN_MTU, .accept_ra = 1, .accept_redirects = 1, .autoconf = 1, .force_mld_version = 0, .mldv1_unsolicited_report_interval = 10 * HZ, .mldv2_unsolicited_report_interval = HZ, .dad_transmits = 1, .rtr_solicits = MAX_RTR_SOLICITATIONS, .rtr_solicit_interval = RTR_SOLICITATION_INTERVAL, .rtr_solicit_max_interval = RTR_SOLICITATION_MAX_INTERVAL, .rtr_solicit_delay = MAX_RTR_SOLICITATION_DELAY, .use_tempaddr = 0, .temp_valid_lft = TEMP_VALID_LIFETIME, .temp_prefered_lft = TEMP_PREFERRED_LIFETIME, .regen_min_advance = REGEN_MIN_ADVANCE, .regen_max_retry = REGEN_MAX_RETRY, .max_desync_factor = MAX_DESYNC_FACTOR, .max_addresses = IPV6_MAX_ADDRESSES, .accept_ra_defrtr = 1, .ra_defrtr_metric = IP6_RT_PRIO_USER, .accept_ra_from_local = 0, .accept_ra_min_hop_limit= 1, .accept_ra_min_lft = 0, .accept_ra_pinfo = 1, #ifdef CONFIG_IPV6_ROUTER_PREF .accept_ra_rtr_pref = 1, .rtr_probe_interval = 60 * HZ, #ifdef CONFIG_IPV6_ROUTE_INFO .accept_ra_rt_info_min_plen = 0, .accept_ra_rt_info_max_plen = 0, #endif #endif .proxy_ndp = 0, .accept_source_route = 0, /* we do not accept RH0 by default. */ .disable_ipv6 = 0, .accept_dad = 1, .suppress_frag_ndisc = 1, .accept_ra_mtu = 1, .stable_secret = { .initialized = false, }, .use_oif_addrs_only = 0, .ignore_routes_with_linkdown = 0, .keep_addr_on_down = 0, .seg6_enabled = 0, #ifdef CONFIG_IPV6_SEG6_HMAC .seg6_require_hmac = 0, #endif .enhanced_dad = 1, .addr_gen_mode = IN6_ADDR_GEN_MODE_EUI64, .disable_policy = 0, .rpl_seg_enabled = 0, .ioam6_enabled = 0, .ioam6_id = IOAM6_DEFAULT_IF_ID, .ioam6_id_wide = IOAM6_DEFAULT_IF_ID_WIDE, .ndisc_evict_nocarrier = 1, .ra_honor_pio_life = 0, .ra_honor_pio_pflag = 0, .force_forwarding = 0, }; /* Check if link is ready: is it up and is a valid qdisc available */ static inline bool addrconf_link_ready(const struct net_device *dev) { return netif_oper_up(dev) && !qdisc_tx_is_noop(dev); } static void addrconf_del_rs_timer(struct inet6_dev *idev) { if (timer_delete(&idev->rs_timer)) __in6_dev_put(idev); } static void addrconf_del_dad_work(struct inet6_ifaddr *ifp) { if (cancel_delayed_work(&ifp->dad_work)) __in6_ifa_put(ifp); } static void addrconf_mod_rs_timer(struct inet6_dev *idev, unsigned long when) { if (!mod_timer(&idev->rs_timer, jiffies + when)) in6_dev_hold(idev); } static void addrconf_mod_dad_work(struct inet6_ifaddr *ifp, unsigned long delay) { in6_ifa_hold(ifp); if (mod_delayed_work(addrconf_wq, &ifp->dad_work, delay)) in6_ifa_put(ifp); } static int snmp6_alloc_dev(struct inet6_dev *idev) { int i; idev->stats.ipv6 = alloc_percpu_gfp(struct ipstats_mib, GFP_KERNEL_ACCOUNT); if (!idev->stats.ipv6) goto err_ip; for_each_possible_cpu(i) { struct ipstats_mib *addrconf_stats; addrconf_stats = per_cpu_ptr(idev->stats.ipv6, i); u64_stats_init(&addrconf_stats->syncp); } idev->stats.icmpv6dev = kzalloc(sizeof(struct icmpv6_mib_device), GFP_KERNEL); if (!idev->stats.icmpv6dev) goto err_icmp; idev->stats.icmpv6msgdev = kzalloc(sizeof(struct icmpv6msg_mib_device), GFP_KERNEL_ACCOUNT); if (!idev->stats.icmpv6msgdev) goto err_icmpmsg; return 0; err_icmpmsg: kfree(idev->stats.icmpv6dev); err_icmp: free_percpu(idev->stats.ipv6); err_ip: return -ENOMEM; } static struct inet6_dev *ipv6_add_dev(struct net_device *dev) { struct inet6_dev *ndev; int err = -ENOMEM; ASSERT_RTNL(); netdev_ops_assert_locked(dev); if (dev->mtu < IPV6_MIN_MTU && dev != blackhole_netdev) return ERR_PTR(-EINVAL); ndev = kzalloc(sizeof(*ndev), GFP_KERNEL_ACCOUNT); if (!ndev) return ERR_PTR(err); rwlock_init(&ndev->lock); ndev->dev = dev; INIT_LIST_HEAD(&ndev->addr_list); timer_setup(&ndev->rs_timer, addrconf_rs_timer, 0); memcpy(&ndev->cnf, dev_net(dev)->ipv6.devconf_dflt, sizeof(ndev->cnf)); if (ndev->cnf.stable_secret.initialized) ndev->cnf.addr_gen_mode = IN6_ADDR_GEN_MODE_STABLE_PRIVACY; ndev->cnf.mtu6 = dev->mtu; ndev->ra_mtu = 0; ndev->nd_parms = neigh_parms_alloc(dev, &nd_tbl); if (!ndev->nd_parms) { kfree(ndev); return ERR_PTR(err); } if (ndev->cnf.forwarding) netif_disable_lro(dev); /* We refer to the device */ netdev_hold(dev, &ndev->dev_tracker, GFP_KERNEL); if (snmp6_alloc_dev(ndev) < 0) { netdev_dbg(dev, "%s: cannot allocate memory for statistics\n", __func__); neigh_parms_release(&nd_tbl, ndev->nd_parms); netdev_put(dev, &ndev->dev_tracker); kfree(ndev); return ERR_PTR(err); } if (dev != blackhole_netdev) { if (snmp6_register_dev(ndev) < 0) { netdev_dbg(dev, "%s: cannot create /proc/net/dev_snmp6/%s\n", __func__, dev->name); goto err_release; } } /* One reference from device. */ refcount_set(&ndev->refcnt, 1); if (dev->flags & (IFF_NOARP | IFF_LOOPBACK)) ndev->cnf.accept_dad = -1; #if IS_ENABLED(CONFIG_IPV6_SIT) if (dev->type == ARPHRD_SIT && (dev->priv_flags & IFF_ISATAP)) { pr_info("%s: Disabled Multicast RS\n", dev->name); ndev->cnf.rtr_solicits = 0; } #endif INIT_LIST_HEAD(&ndev->tempaddr_list); ndev->desync_factor = U32_MAX; if ((dev->flags&IFF_LOOPBACK) || dev->type == ARPHRD_TUNNEL || dev->type == ARPHRD_TUNNEL6 || dev->type == ARPHRD_SIT || dev->type == ARPHRD_NONE) { ndev->cnf.use_tempaddr = -1; } ndev->token = in6addr_any; if (netif_running(dev) && addrconf_link_ready(dev)) ndev->if_flags |= IF_READY; ipv6_mc_init_dev(ndev); ndev->tstamp = jiffies; if (dev != blackhole_netdev) { err = addrconf_sysctl_register(ndev); if (err) { ipv6_mc_destroy_dev(ndev); snmp6_unregister_dev(ndev); goto err_release; } } /* protected by rtnl_lock */ rcu_assign_pointer(dev->ip6_ptr, ndev); if (dev != blackhole_netdev) { /* Join interface-local all-node multicast group */ ipv6_dev_mc_inc(dev, &in6addr_interfacelocal_allnodes); /* Join all-node multicast group */ ipv6_dev_mc_inc(dev, &in6addr_linklocal_allnodes); /* Join all-router multicast group if forwarding is set */ if (ndev->cnf.forwarding && (dev->flags & IFF_MULTICAST)) ipv6_dev_mc_inc(dev, &in6addr_linklocal_allrouters); } return ndev; err_release: neigh_parms_release(&nd_tbl, ndev->nd_parms); ndev->dead = 1; in6_dev_finish_destroy(ndev); return ERR_PTR(err); } static struct inet6_dev *ipv6_find_idev(struct net_device *dev) { struct inet6_dev *idev; ASSERT_RTNL(); idev = __in6_dev_get(dev); if (!idev) { idev = ipv6_add_dev(dev); if (IS_ERR(idev)) return idev; } if (dev->flags&IFF_UP) ipv6_mc_up(idev); return idev; } static int inet6_netconf_msgsize_devconf(int type) { int size = NLMSG_ALIGN(sizeof(struct netconfmsg)) + nla_total_size(4); /* NETCONFA_IFINDEX */ bool all = false; if (type == NETCONFA_ALL) all = true; if (all || type == NETCONFA_FORWARDING) size += nla_total_size(4); #ifdef CONFIG_IPV6_MROUTE if (all || type == NETCONFA_MC_FORWARDING) size += nla_total_size(4); #endif if (all || type == NETCONFA_PROXY_NEIGH) size += nla_total_size(4); if (all || type == NETCONFA_IGNORE_ROUTES_WITH_LINKDOWN) size += nla_total_size(4); return size; } static int inet6_netconf_fill_devconf(struct sk_buff *skb, int ifindex, struct ipv6_devconf *devconf, u32 portid, u32 seq, int event, unsigned int flags, int type) { struct nlmsghdr *nlh; struct netconfmsg *ncm; bool all = false; nlh = nlmsg_put(skb, portid, seq, event, sizeof(struct netconfmsg), flags); if (!nlh) return -EMSGSIZE; if (type == NETCONFA_ALL) all = true; ncm = nlmsg_data(nlh); ncm->ncm_family = AF_INET6; if (nla_put_s32(skb, NETCONFA_IFINDEX, ifindex) < 0) goto nla_put_failure; if (!devconf) goto out; if ((all || type == NETCONFA_FORWARDING) && nla_put_s32(skb, NETCONFA_FORWARDING, READ_ONCE(devconf->forwarding)) < 0) goto nla_put_failure; #ifdef CONFIG_IPV6_MROUTE if ((all || type == NETCONFA_MC_FORWARDING) && nla_put_s32(skb, NETCONFA_MC_FORWARDING, atomic_read(&devconf->mc_forwarding)) < 0) goto nla_put_failure; #endif if ((all || type == NETCONFA_PROXY_NEIGH) && nla_put_s32(skb, NETCONFA_PROXY_NEIGH, READ_ONCE(devconf->proxy_ndp)) < 0) goto nla_put_failure; if ((all || type == NETCONFA_IGNORE_ROUTES_WITH_LINKDOWN) && nla_put_s32(skb, NETCONFA_IGNORE_ROUTES_WITH_LINKDOWN, READ_ONCE(devconf->ignore_routes_with_linkdown)) < 0) goto nla_put_failure; out: nlmsg_end(skb, nlh); return 0; nla_put_failure: nlmsg_cancel(skb, nlh); return -EMSGSIZE; } void inet6_netconf_notify_devconf(struct net *net, int event, int type, int ifindex, struct ipv6_devconf *devconf) { struct sk_buff *skb; int err = -ENOBUFS; skb = nlmsg_new(inet6_netconf_msgsize_devconf(type), GFP_KERNEL); if (!skb) goto errout; err = inet6_netconf_fill_devconf(skb, ifindex, devconf, 0, 0, event, 0, type); if (err < 0) { /* -EMSGSIZE implies BUG in inet6_netconf_msgsize_devconf() */ WARN_ON(err == -EMSGSIZE); kfree_skb(skb); goto errout; } rtnl_notify(skb, net, 0, RTNLGRP_IPV6_NETCONF, NULL, GFP_KERNEL); return; errout: rtnl_set_sk_err(net, RTNLGRP_IPV6_NETCONF, err); } static const struct nla_policy devconf_ipv6_policy[NETCONFA_MAX+1] = { [NETCONFA_IFINDEX] = { .len = sizeof(int) }, [NETCONFA_FORWARDING] = { .len = sizeof(int) }, [NETCONFA_PROXY_NEIGH] = { .len = sizeof(int) }, [NETCONFA_IGNORE_ROUTES_WITH_LINKDOWN] = { .len = sizeof(int) }, }; static int inet6_netconf_valid_get_req(struct sk_buff *skb, const struct nlmsghdr *nlh, struct nlattr **tb, struct netlink_ext_ack *extack) { int i, err; if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(struct netconfmsg))) { NL_SET_ERR_MSG_MOD(extack, "Invalid header for netconf get request"); return -EINVAL; } if (!netlink_strict_get_check(skb)) return nlmsg_parse_deprecated(nlh, sizeof(struct netconfmsg), tb, NETCONFA_MAX, devconf_ipv6_policy, extack); err = nlmsg_parse_deprecated_strict(nlh, sizeof(struct netconfmsg), tb, NETCONFA_MAX, devconf_ipv6_policy, extack); if (err) return err; for (i = 0; i <= NETCONFA_MAX; i++) { if (!tb[i]) continue; switch (i) { case NETCONFA_IFINDEX: break; default: NL_SET_ERR_MSG_MOD(extack, "Unsupported attribute in netconf get request"); return -EINVAL; } } return 0; } static int inet6_netconf_get_devconf(struct sk_buff *in_skb, struct nlmsghdr *nlh, struct netlink_ext_ack *extack) { struct net *net = sock_net(in_skb->sk); struct nlattr *tb[NETCONFA_MAX+1]; struct inet6_dev *in6_dev = NULL; struct net_device *dev = NULL; struct sk_buff *skb; struct ipv6_devconf *devconf; int ifindex; int err; err = inet6_netconf_valid_get_req(in_skb, nlh, tb, extack); if (err < 0) return err; if (!tb[NETCONFA_IFINDEX]) return -EINVAL; err = -EINVAL; ifindex = nla_get_s32(tb[NETCONFA_IFINDEX]); switch (ifindex) { case NETCONFA_IFINDEX_ALL: devconf = net->ipv6.devconf_all; break; case NETCONFA_IFINDEX_DEFAULT: devconf = net->ipv6.devconf_dflt; break; default: dev = dev_get_by_index(net, ifindex); if (!dev) return -EINVAL; in6_dev = in6_dev_get(dev); if (!in6_dev) goto errout; devconf = &in6_dev->cnf; break; } err = -ENOBUFS; skb = nlmsg_new(inet6_netconf_msgsize_devconf(NETCONFA_ALL), GFP_KERNEL); if (!skb) goto errout; err = inet6_netconf_fill_devconf(skb, ifindex, devconf, NETLINK_CB(in_skb).portid, nlh->nlmsg_seq, RTM_NEWNETCONF, 0, NETCONFA_ALL); if (err < 0) { /* -EMSGSIZE implies BUG in inet6_netconf_msgsize_devconf() */ WARN_ON(err == -EMSGSIZE); kfree_skb(skb); goto errout; } err = rtnl_unicast(skb, net, NETLINK_CB(in_skb).portid); errout: if (in6_dev) in6_dev_put(in6_dev); dev_put(dev); return err; } /* Combine dev_addr_genid and dev_base_seq to detect changes. */ static u32 inet6_base_seq(const struct net *net) { u32 res = atomic_read(&net->ipv6.dev_addr_genid) + READ_ONCE(net->dev_base_seq); /* Must not return 0 (see nl_dump_check_consistent()). * Chose a value far away from 0. */ if (!res) res = 0x80000000; return res; } static int inet6_netconf_dump_devconf(struct sk_buff *skb, struct netlink_callback *cb) { const struct nlmsghdr *nlh = cb->nlh; struct net *net = sock_net(skb->sk); struct { unsigned long ifindex; unsigned int all_default; } *ctx = (void *)cb->ctx; struct net_device *dev; struct inet6_dev *idev; int err = 0; if (cb->strict_check) { struct netlink_ext_ack *extack = cb->extack; struct netconfmsg *ncm; if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*ncm))) { NL_SET_ERR_MSG_MOD(extack, "Invalid header for netconf dump request"); return -EINVAL; } if (nlmsg_attrlen(nlh, sizeof(*ncm))) { NL_SET_ERR_MSG_MOD(extack, "Invalid data after header in netconf dump request"); return -EINVAL; } } rcu_read_lock(); for_each_netdev_dump(net, dev, ctx->ifindex) { idev = __in6_dev_get(dev); if (!idev) continue; err = inet6_netconf_fill_devconf(skb, dev->ifindex, &idev->cnf, NETLINK_CB(cb->skb).portid, nlh->nlmsg_seq, RTM_NEWNETCONF, NLM_F_MULTI, NETCONFA_ALL); if (err < 0) goto done; } if (ctx->all_default == 0) { err = inet6_netconf_fill_devconf(skb, NETCONFA_IFINDEX_ALL, net->ipv6.devconf_all, NETLINK_CB(cb->skb).portid, nlh->nlmsg_seq, RTM_NEWNETCONF, NLM_F_MULTI, NETCONFA_ALL); if (err < 0) goto done; ctx->all_default++; } if (ctx->all_default == 1) { err = inet6_netconf_fill_devconf(skb, NETCONFA_IFINDEX_DEFAULT, net->ipv6.devconf_dflt, NETLINK_CB(cb->skb).portid, nlh->nlmsg_seq, RTM_NEWNETCONF, NLM_F_MULTI, NETCONFA_ALL); if (err < 0) goto done; ctx->all_default++; } done: rcu_read_unlock(); return err; } #ifdef CONFIG_SYSCTL static void dev_forward_change(struct inet6_dev *idev) { struct net_device *dev; struct inet6_ifaddr *ifa; LIST_HEAD(tmp_addr_list); if (!idev) return; dev = idev->dev; if (idev->cnf.forwarding) dev_disable_lro(dev); if (dev->flags & IFF_MULTICAST) { if (idev->cnf.forwarding) { ipv6_dev_mc_inc(dev, &in6addr_linklocal_allrouters); ipv6_dev_mc_inc(dev, &in6addr_interfacelocal_allrouters); ipv6_dev_mc_inc(dev, &in6addr_sitelocal_allrouters); } else { ipv6_dev_mc_dec(dev, &in6addr_linklocal_allrouters); ipv6_dev_mc_dec(dev, &in6addr_interfacelocal_allrouters); ipv6_dev_mc_dec(dev, &in6addr_sitelocal_allrouters); } } read_lock_bh(&idev->lock); list_for_each_entry(ifa, &idev->addr_list, if_list) { if (ifa->flags&IFA_F_TENTATIVE) continue; list_add_tail(&ifa->if_list_aux, &tmp_addr_list); } read_unlock_bh(&idev->lock); while (!list_empty(&tmp_addr_list)) { ifa = list_first_entry(&tmp_addr_list, struct inet6_ifaddr, if_list_aux); list_del(&ifa->if_list_aux); if (idev->cnf.forwarding) addrconf_join_anycast(ifa); else addrconf_leave_anycast(ifa); } inet6_netconf_notify_devconf(dev_net(dev), RTM_NEWNETCONF, NETCONFA_FORWARDING, dev->ifindex, &idev->cnf); } static void addrconf_forward_change(struct net *net, __s32 newf) { struct net_device *dev; struct inet6_dev *idev; for_each_netdev(net, dev) { idev = __in6_dev_get_rtnl_net(dev); if (idev) { int changed = (!idev->cnf.forwarding) ^ (!newf); /* Disabling all.forwarding sets 0 to force_forwarding for all interfaces */ if (newf == 0) WRITE_ONCE(idev->cnf.force_forwarding, 0); WRITE_ONCE(idev->cnf.forwarding, newf); if (changed) dev_forward_change(idev); } } } static int addrconf_fixup_forwarding(const struct ctl_table *table, int *p, int newf) { struct net *net = (struct net *)table->extra2; int old; if (!rtnl_net_trylock(net)) return restart_syscall(); old = *p; WRITE_ONCE(*p, newf); if (p == &net->ipv6.devconf_dflt->forwarding) { if ((!newf) ^ (!old)) inet6_netconf_notify_devconf(net, RTM_NEWNETCONF, NETCONFA_FORWARDING, NETCONFA_IFINDEX_DEFAULT, net->ipv6.devconf_dflt); rtnl_net_unlock(net); return 0; } if (p == &net->ipv6.devconf_all->forwarding) { int old_dflt = net->ipv6.devconf_dflt->forwarding; WRITE_ONCE(net->ipv6.devconf_dflt->forwarding, newf); if ((!newf) ^ (!old_dflt)) inet6_netconf_notify_devconf(net, RTM_NEWNETCONF, NETCONFA_FORWARDING, NETCONFA_IFINDEX_DEFAULT, net->ipv6.devconf_dflt); addrconf_forward_change(net, newf); if ((!newf) ^ (!old)) inet6_netconf_notify_devconf(net, RTM_NEWNETCONF, NETCONFA_FORWARDING, NETCONFA_IFINDEX_ALL, net->ipv6.devconf_all); } else if ((!newf) ^ (!old)) dev_forward_change((struct inet6_dev *)table->extra1); rtnl_net_unlock(net); if (newf) rt6_purge_dflt_routers(net); return 1; } static void addrconf_linkdown_change(struct net *net, __s32 newf) { struct net_device *dev; struct inet6_dev *idev; for_each_netdev(net, dev) { idev = __in6_dev_get_rtnl_net(dev); if (idev) { int changed = (!idev->cnf.ignore_routes_with_linkdown) ^ (!newf); WRITE_ONCE(idev->cnf.ignore_routes_with_linkdown, newf); if (changed) inet6_netconf_notify_devconf(dev_net(dev), RTM_NEWNETCONF, NETCONFA_IGNORE_ROUTES_WITH_LINKDOWN, dev->ifindex, &idev->cnf); } } } static int addrconf_fixup_linkdown(const struct ctl_table *table, int *p, int newf) { struct net *net = (struct net *)table->extra2; int old; if (!rtnl_net_trylock(net)) return restart_syscall(); old = *p; WRITE_ONCE(*p, newf); if (p == &net->ipv6.devconf_dflt->ignore_routes_with_linkdown) { if ((!newf) ^ (!old)) inet6_netconf_notify_devconf(net, RTM_NEWNETCONF, NETCONFA_IGNORE_ROUTES_WITH_LINKDOWN, NETCONFA_IFINDEX_DEFAULT, net->ipv6.devconf_dflt); rtnl_net_unlock(net); return 0; } if (p == &net->ipv6.devconf_all->ignore_routes_with_linkdown) { WRITE_ONCE(net->ipv6.devconf_dflt->ignore_routes_with_linkdown, newf); addrconf_linkdown_change(net, newf); if ((!newf) ^ (!old)) inet6_netconf_notify_devconf(net, RTM_NEWNETCONF, NETCONFA_IGNORE_ROUTES_WITH_LINKDOWN, NETCONFA_IFINDEX_ALL, net->ipv6.devconf_all); } rtnl_net_unlock(net); return 1; } #endif /* Nobody refers to this ifaddr, destroy it */ void inet6_ifa_finish_destroy(struct inet6_ifaddr *ifp) { WARN_ON(!hlist_unhashed(&ifp->addr_lst)); #ifdef NET_REFCNT_DEBUG pr_debug("%s\n", __func__); #endif in6_dev_put(ifp->idev); if (cancel_delayed_work(&ifp->dad_work)) pr_notice("delayed DAD work was pending while freeing ifa=%p\n", ifp); if (ifp->state != INET6_IFADDR_STATE_DEAD) { pr_warn("Freeing alive inet6 address %p\n", ifp); return; } kfree_rcu(ifp, rcu); } static void ipv6_link_dev_addr(struct inet6_dev *idev, struct inet6_ifaddr *ifp) { struct list_head *p; int ifp_scope = ipv6_addr_src_scope(&ifp->addr); /* * Each device address list is sorted in order of scope - * global before linklocal. */ list_for_each(p, &idev->addr_list) { struct inet6_ifaddr *ifa = list_entry(p, struct inet6_ifaddr, if_list); if (ifp_scope >= ipv6_addr_src_scope(&ifa->addr)) break; } list_add_tail_rcu(&ifp->if_list, p); } static u32 inet6_addr_hash(const struct net *net, const struct in6_addr *addr) { u32 val = __ipv6_addr_jhash(addr, net_hash_mix(net)); return hash_32(val, IN6_ADDR_HSIZE_SHIFT); } static bool ipv6_chk_same_addr(struct net *net, const struct in6_addr *addr, struct net_device *dev, unsigned int hash) { struct inet6_ifaddr *ifp; hlist_for_each_entry(ifp, &net->ipv6.inet6_addr_lst[hash], addr_lst) { if (ipv6_addr_equal(&ifp->addr, addr)) { if (!dev || ifp->idev->dev == dev) return true; } } return false; } static int ipv6_add_addr_hash(struct net_device *dev, struct inet6_ifaddr *ifa) { struct net *net = dev_net(dev); unsigned int hash = inet6_addr_hash(net, &ifa->addr); int err = 0; spin_lock_bh(&net->ipv6.addrconf_hash_lock); /* Ignore adding duplicate addresses on an interface */ if (ipv6_chk_same_addr(net, &ifa->addr, dev, hash)) { netdev_dbg(dev, "ipv6_add_addr: already assigned\n"); err = -EEXIST; } else { hlist_add_head_rcu(&ifa->addr_lst, &net->ipv6.inet6_addr_lst[hash]); } spin_unlock_bh(&net->ipv6.addrconf_hash_lock); return err; } /* On success it returns ifp with increased reference count */ static struct inet6_ifaddr * ipv6_add_addr(struct inet6_dev *idev, struct ifa6_config *cfg, bool can_block, struct netlink_ext_ack *extack) { gfp_t gfp_flags = can_block ? GFP_KERNEL : GFP_ATOMIC; int addr_type = ipv6_addr_type(cfg->pfx); struct net *net = dev_net(idev->dev); struct inet6_ifaddr *ifa = NULL; struct fib6_info *f6i = NULL; int err = 0; if (addr_type == IPV6_ADDR_ANY) { NL_SET_ERR_MSG_MOD(extack, "Invalid address"); return ERR_PTR(-EADDRNOTAVAIL); } else if (addr_type & IPV6_ADDR_MULTICAST && !(cfg->ifa_flags & IFA_F_MCAUTOJOIN)) { NL_SET_ERR_MSG_MOD(extack, "Cannot assign multicast address without \"IFA_F_MCAUTOJOIN\" flag"); return ERR_PTR(-EADDRNOTAVAIL); } else if (!(idev->dev->flags & IFF_LOOPBACK) && !netif_is_l3_master(idev->dev) && addr_type & IPV6_ADDR_LOOPBACK) { NL_SET_ERR_MSG_MOD(extack, "Cannot assign loopback address on this device"); return ERR_PTR(-EADDRNOTAVAIL); } if (idev->dead) { NL_SET_ERR_MSG_MOD(extack, "device is going away"); err = -ENODEV; goto out; } if (idev->cnf.disable_ipv6) { NL_SET_ERR_MSG_MOD(extack, "IPv6 is disabled on this device"); err = -EACCES; goto out; } /* validator notifier needs to be blocking; * do not call in atomic context */ if (can_block) { struct in6_validator_info i6vi = { .i6vi_addr = *cfg->pfx, .i6vi_dev = idev, .extack = extack, }; err = inet6addr_validator_notifier_call_chain(NETDEV_UP, &i6vi); err = notifier_to_errno(err); if (err < 0) goto out; } ifa = kzalloc(sizeof(*ifa), gfp_flags | __GFP_ACCOUNT); if (!ifa) { err = -ENOBUFS; goto out; } f6i = addrconf_f6i_alloc(net, idev, cfg->pfx, false, gfp_flags, extack); if (IS_ERR(f6i)) { err = PTR_ERR(f6i); f6i = NULL; goto out; } neigh_parms_data_state_setall(idev->nd_parms); ifa->addr = *cfg->pfx; if (cfg->peer_pfx) ifa->peer_addr = *cfg->peer_pfx; spin_lock_init(&ifa->lock); INIT_DELAYED_WORK(&ifa->dad_work, addrconf_dad_work); INIT_HLIST_NODE(&ifa->addr_lst); ifa->scope = cfg->scope; ifa->prefix_len = cfg->plen; ifa->rt_priority = cfg->rt_priority; ifa->flags = cfg->ifa_flags; ifa->ifa_proto = cfg->ifa_proto; /* No need to add the TENTATIVE flag for addresses with NODAD */ if (!(cfg->ifa_flags & IFA_F_NODAD)) ifa->flags |= IFA_F_TENTATIVE; ifa->valid_lft = cfg->valid_lft; ifa->prefered_lft = cfg->preferred_lft; ifa->cstamp = ifa->tstamp = jiffies; ifa->tokenized = false; ifa->rt = f6i; ifa->idev = idev; in6_dev_hold(idev); /* For caller */ refcount_set(&ifa->refcnt, 1); rcu_read_lock(); err = ipv6_add_addr_hash(idev->dev, ifa); if (err < 0) { rcu_read_unlock(); goto out; } write_lock_bh(&idev->lock); /* Add to inet6_dev unicast addr list. */ ipv6_link_dev_addr(idev, ifa); if (ifa->flags&IFA_F_TEMPORARY) { list_add(&ifa->tmp_list, &idev->tempaddr_list); in6_ifa_hold(ifa); } in6_ifa_hold(ifa); write_unlock_bh(&idev->lock); rcu_read_unlock(); inet6addr_notifier_call_chain(NETDEV_UP, ifa); out: if (unlikely(err < 0)) { fib6_info_release(f6i); if (ifa) { if (ifa->idev) in6_dev_put(ifa->idev); kfree(ifa); } ifa = ERR_PTR(err); } return ifa; } enum cleanup_prefix_rt_t { CLEANUP_PREFIX_RT_NOP, /* no cleanup action for prefix route */ CLEANUP_PREFIX_RT_DEL, /* delete the prefix route */ CLEANUP_PREFIX_RT_EXPIRE, /* update the lifetime of the prefix route */ }; /* * Check, whether the prefix for ifp would still need a prefix route * after deleting ifp. The function returns one of the CLEANUP_PREFIX_RT_* * constants. * * 1) we don't purge prefix if address was not permanent. * prefix is managed by its own lifetime. * 2) we also don't purge, if the address was IFA_F_NOPREFIXROUTE. * 3) if there are no addresses, delete prefix. * 4) if there are still other permanent address(es), * corresponding prefix is still permanent. * 5) if there are still other addresses with IFA_F_NOPREFIXROUTE, * don't purge the prefix, assume user space is managing it. * 6) otherwise, update prefix lifetime to the * longest valid lifetime among the corresponding * addresses on the device. * Note: subsequent RA will update lifetime. **/ static enum cleanup_prefix_rt_t check_cleanup_prefix_route(struct inet6_ifaddr *ifp, unsigned long *expires) { struct inet6_ifaddr *ifa; struct inet6_dev *idev = ifp->idev; unsigned long lifetime; enum cleanup_prefix_rt_t action = CLEANUP_PREFIX_RT_DEL; *expires = jiffies; list_for_each_entry(ifa, &idev->addr_list, if_list) { if (ifa == ifp) continue; if (ifa->prefix_len != ifp->prefix_len || !ipv6_prefix_equal(&ifa->addr, &ifp->addr, ifp->prefix_len)) continue; if (ifa->flags & (IFA_F_PERMANENT | IFA_F_NOPREFIXROUTE)) return CLEANUP_PREFIX_RT_NOP; action = CLEANUP_PREFIX_RT_EXPIRE; spin_lock(&ifa->lock); lifetime = addrconf_timeout_fixup(ifa->valid_lft, HZ); /* * Note: Because this address is * not permanent, lifetime < * LONG_MAX / HZ here. */ if (time_before(*expires, ifa->tstamp + lifetime * HZ)) *expires = ifa->tstamp + lifetime * HZ; spin_unlock(&ifa->lock); } return action; } static void cleanup_prefix_route(struct inet6_ifaddr *ifp, unsigned long expires, bool del_rt, bool del_peer) { struct fib6_table *table; struct fib6_info *f6i; f6i = addrconf_get_prefix_route(del_peer ? &ifp->peer_addr : &ifp->addr, ifp->prefix_len, ifp->idev->dev, 0, RTF_DEFAULT, true); if (f6i) { if (del_rt) ip6_del_rt(dev_net(ifp->idev->dev), f6i, false); else { if (!(f6i->fib6_flags & RTF_EXPIRES)) { table = f6i->fib6_table; spin_lock_bh(&table->tb6_lock); fib6_set_expires(f6i, expires); fib6_add_gc_list(f6i); spin_unlock_bh(&table->tb6_lock); } fib6_info_release(f6i); } } } /* This function wants to get referenced ifp and releases it before return */ static void ipv6_del_addr(struct inet6_ifaddr *ifp) { enum cleanup_prefix_rt_t action = CLEANUP_PREFIX_RT_NOP; struct net *net = dev_net(ifp->idev->dev); unsigned long expires; int state; ASSERT_RTNL(); spin_lock_bh(&ifp->lock); state = ifp->state; ifp->state = INET6_IFADDR_STATE_DEAD; spin_unlock_bh(&ifp->lock); if (state == INET6_IFADDR_STATE_DEAD) goto out; spin_lock_bh(&net->ipv6.addrconf_hash_lock); hlist_del_init_rcu(&ifp->addr_lst); spin_unlock_bh(&net->ipv6.addrconf_hash_lock); write_lock_bh(&ifp->idev->lock); if (ifp->flags&IFA_F_TEMPORARY) { list_del(&ifp->tmp_list); if (ifp->ifpub) { in6_ifa_put(ifp->ifpub); ifp->ifpub = NULL; } __in6_ifa_put(ifp); } if (ifp->flags & IFA_F_PERMANENT && !(ifp->flags & IFA_F_NOPREFIXROUTE)) action = check_cleanup_prefix_route(ifp, &expires); list_del_rcu(&ifp->if_list); __in6_ifa_put(ifp); write_unlock_bh(&ifp->idev->lock); addrconf_del_dad_work(ifp); ipv6_ifa_notify(RTM_DELADDR, ifp); inet6addr_notifier_call_chain(NETDEV_DOWN, ifp); if (action != CLEANUP_PREFIX_RT_NOP) { cleanup_prefix_route(ifp, expires, action == CLEANUP_PREFIX_RT_DEL, false); } /* clean up prefsrc entries */ rt6_remove_prefsrc(ifp); out: in6_ifa_put(ifp); } static unsigned long ipv6_get_regen_advance(const struct inet6_dev *idev) { return READ_ONCE(idev->cnf.regen_min_advance) + READ_ONCE(idev->cnf.regen_max_retry) * READ_ONCE(idev->cnf.dad_transmits) * max(NEIGH_VAR(idev->nd_parms, RETRANS_TIME), HZ/100) / HZ; } static int ipv6_create_tempaddr(struct inet6_ifaddr *ifp, bool block) { struct inet6_dev *idev = ifp->idev; unsigned long tmp_tstamp, age; unsigned long regen_advance; unsigned long now = jiffies; u32 if_public_preferred_lft; s32 cnf_temp_preferred_lft; struct inet6_ifaddr *ift; struct ifa6_config cfg; long max_desync_factor; struct in6_addr addr; int ret = 0; write_lock_bh(&idev->lock); retry: in6_dev_hold(idev); if (READ_ONCE(idev->cnf.use_tempaddr) <= 0) { write_unlock_bh(&idev->lock); pr_info("%s: use_tempaddr is disabled\n", __func__); in6_dev_put(idev); ret = -1; goto out; } spin_lock_bh(&ifp->lock); if (ifp->regen_count++ >= READ_ONCE(idev->cnf.regen_max_retry)) { WRITE_ONCE(idev->cnf.use_tempaddr, -1); /*XXX*/ spin_unlock_bh(&ifp->lock); write_unlock_bh(&idev->lock); pr_warn("%s: regeneration time exceeded - disabled temporary address support\n", __func__); in6_dev_put(idev); ret = -1; goto out; } in6_ifa_hold(ifp); memcpy(addr.s6_addr, ifp->addr.s6_addr, 8); ipv6_gen_rnd_iid(&addr); age = (now - ifp->tstamp) / HZ; regen_advance = ipv6_get_regen_advance(idev); /* recalculate max_desync_factor each time and update * idev->desync_factor if it's larger */ cnf_temp_preferred_lft = READ_ONCE(idev->cnf.temp_prefered_lft); max_desync_factor = min_t(long, READ_ONCE(idev->cnf.max_desync_factor), cnf_temp_preferred_lft - regen_advance); if (unlikely(idev->desync_factor > max_desync_factor)) { if (max_desync_factor > 0) { get_random_bytes(&idev->desync_factor, sizeof(idev->desync_factor)); idev->desync_factor %= max_desync_factor; } else { idev->desync_factor = 0; } } if_public_preferred_lft = ifp->prefered_lft; memset(&cfg, 0, sizeof(cfg)); cfg.valid_lft = min_t(__u32, ifp->valid_lft, READ_ONCE(idev->cnf.temp_valid_lft) + age); cfg.preferred_lft = cnf_temp_preferred_lft + age - idev->desync_factor; cfg.preferred_lft = min_t(__u32, if_public_preferred_lft, cfg.preferred_lft); cfg.preferred_lft = min_t(__u32, cfg.valid_lft, cfg.preferred_lft); cfg.plen = ifp->prefix_len; tmp_tstamp = ifp->tstamp; spin_unlock_bh(&ifp->lock); write_unlock_bh(&idev->lock); /* From RFC 4941: * * A temporary address is created only if this calculated Preferred * Lifetime is greater than REGEN_ADVANCE time units. In * particular, an implementation must not create a temporary address * with a zero Preferred Lifetime. * * ... * * When creating a temporary address, the lifetime values MUST be * derived from the corresponding prefix as follows: * * ... * * * Its Preferred Lifetime is the lower of the Preferred Lifetime * of the public address or TEMP_PREFERRED_LIFETIME - * DESYNC_FACTOR. * * To comply with the RFC's requirements, clamp the preferred lifetime * to a minimum of regen_advance, unless that would exceed valid_lft or * ifp->prefered_lft. * * Use age calculation as in addrconf_verify to avoid unnecessary * temporary addresses being generated. */ age = (now - tmp_tstamp + ADDRCONF_TIMER_FUZZ_MINUS) / HZ; if (cfg.preferred_lft <= regen_advance + age) { cfg.preferred_lft = regen_advance + age + 1; if (cfg.preferred_lft > cfg.valid_lft || cfg.preferred_lft > if_public_preferred_lft) { in6_ifa_put(ifp); in6_dev_put(idev); ret = -1; goto out; } } cfg.ifa_flags = IFA_F_TEMPORARY; /* set in addrconf_prefix_rcv() */ if (ifp->flags & IFA_F_OPTIMISTIC) cfg.ifa_flags |= IFA_F_OPTIMISTIC; cfg.pfx = &addr; cfg.scope = ipv6_addr_scope(cfg.pfx); ift = ipv6_add_addr(idev, &cfg, block, NULL); if (IS_ERR(ift)) { in6_ifa_put(ifp); in6_dev_put(idev); pr_info("%s: retry temporary address regeneration\n", __func__); write_lock_bh(&idev->lock); goto retry; } spin_lock_bh(&ift->lock); ift->ifpub = ifp; ift->cstamp = now; ift->tstamp = tmp_tstamp; spin_unlock_bh(&ift->lock); addrconf_dad_start(ift); in6_ifa_put(ift); in6_dev_put(idev); out: return ret; } /* * Choose an appropriate source address (RFC3484) */ enum { IPV6_SADDR_RULE_INIT = 0, IPV6_SADDR_RULE_LOCAL, IPV6_SADDR_RULE_SCOPE, IPV6_SADDR_RULE_PREFERRED, #ifdef CONFIG_IPV6_MIP6 IPV6_SADDR_RULE_HOA, #endif IPV6_SADDR_RULE_OIF, IPV6_SADDR_RULE_LABEL, IPV6_SADDR_RULE_PRIVACY, IPV6_SADDR_RULE_ORCHID, IPV6_SADDR_RULE_PREFIX, #ifdef CONFIG_IPV6_OPTIMISTIC_DAD IPV6_SADDR_RULE_NOT_OPTIMISTIC, #endif IPV6_SADDR_RULE_MAX }; struct ipv6_saddr_score { int rule; int addr_type; struct inet6_ifaddr *ifa; DECLARE_BITMAP(scorebits, IPV6_SADDR_RULE_MAX); int scopedist; int matchlen; }; struct ipv6_saddr_dst { const struct in6_addr *addr; int ifindex; int scope; int label; unsigned int prefs; }; static inline int ipv6_saddr_preferred(int type) { if (type & (IPV6_ADDR_MAPPED|IPV6_ADDR_COMPATv4|IPV6_ADDR_LOOPBACK)) return 1; return 0; } static bool ipv6_use_optimistic_addr(const struct net *net, const struct inet6_dev *idev) { #ifdef CONFIG_IPV6_OPTIMISTIC_DAD if (!idev) return false; if (!READ_ONCE(net->ipv6.devconf_all->optimistic_dad) && !READ_ONCE(idev->cnf.optimistic_dad)) return false; if (!READ_ONCE(net->ipv6.devconf_all->use_optimistic) && !READ_ONCE(idev->cnf.use_optimistic)) return false; return true; #else return false; #endif } static bool ipv6_allow_optimistic_dad(const struct net *net, const struct inet6_dev *idev) { #ifdef CONFIG_IPV6_OPTIMISTIC_DAD if (!idev) return false; if (!READ_ONCE(net->ipv6.devconf_all->optimistic_dad) && !READ_ONCE(idev->cnf.optimistic_dad)) return false; return true; #else return false; #endif } static int ipv6_get_saddr_eval(struct net *net, struct ipv6_saddr_score *score, struct ipv6_saddr_dst *dst, int i) { int ret; if (i <= score->rule) { switch (i) { case IPV6_SADDR_RULE_SCOPE: ret = score->scopedist; break; case IPV6_SADDR_RULE_PREFIX: ret = score->matchlen; break; default: ret = !!test_bit(i, score->scorebits); } goto out; } switch (i) { case IPV6_SADDR_RULE_INIT: /* Rule 0: remember if hiscore is not ready yet */ ret = !!score->ifa; break; case IPV6_SADDR_RULE_LOCAL: /* Rule 1: Prefer same address */ ret = ipv6_addr_equal(&score->ifa->addr, dst->addr); break; case IPV6_SADDR_RULE_SCOPE: /* Rule 2: Prefer appropriate scope * * ret * ^ * -1 | d 15 * ---+--+-+---> scope * | * | d is scope of the destination. * B-d | \ * | \ <- smaller scope is better if * B-15 | \ if scope is enough for destination. * | ret = B - scope (-1 <= scope >= d <= 15). * d-C-1 | / * |/ <- greater is better * -C / if scope is not enough for destination. * /| ret = scope - C (-1 <= d < scope <= 15). * * d - C - 1 < B -15 (for all -1 <= d <= 15). * C > d + 14 - B >= 15 + 14 - B = 29 - B. * Assume B = 0 and we get C > 29. */ ret = __ipv6_addr_src_scope(score->addr_type); if (ret >= dst->scope) ret = -ret; else ret -= 128; /* 30 is enough */ score->scopedist = ret; break; case IPV6_SADDR_RULE_PREFERRED: { /* Rule 3: Avoid deprecated and optimistic addresses */ u8 avoid = IFA_F_DEPRECATED; if (!ipv6_use_optimistic_addr(net, score->ifa->idev)) avoid |= IFA_F_OPTIMISTIC; ret = ipv6_saddr_preferred(score->addr_type) || !(score->ifa->flags & avoid); break; } #ifdef CONFIG_IPV6_MIP6 case IPV6_SADDR_RULE_HOA: { /* Rule 4: Prefer home address */ int prefhome = !(dst->prefs & IPV6_PREFER_SRC_COA); ret = !(score->ifa->flags & IFA_F_HOMEADDRESS) ^ prefhome; break; } #endif case IPV6_SADDR_RULE_OIF: /* Rule 5: Prefer outgoing interface */ ret = (!dst->ifindex || dst->ifindex == score->ifa->idev->dev->ifindex); break; case IPV6_SADDR_RULE_LABEL: /* Rule 6: Prefer matching label */ ret = ipv6_addr_label(net, &score->ifa->addr, score->addr_type, score->ifa->idev->dev->ifindex) == dst->label; break; case IPV6_SADDR_RULE_PRIVACY: { /* Rule 7: Prefer public address * Note: prefer temporary address if use_tempaddr >= 2 */ int preftmp = dst->prefs & (IPV6_PREFER_SRC_PUBLIC|IPV6_PREFER_SRC_TMP) ? !!(dst->prefs & IPV6_PREFER_SRC_TMP) : READ_ONCE(score->ifa->idev->cnf.use_tempaddr) >= 2; ret = (!(score->ifa->flags & IFA_F_TEMPORARY)) ^ preftmp; break; } case IPV6_SADDR_RULE_ORCHID: /* Rule 8-: Prefer ORCHID vs ORCHID or * non-ORCHID vs non-ORCHID */ ret = !(ipv6_addr_orchid(&score->ifa->addr) ^ ipv6_addr_orchid(dst->addr)); break; case IPV6_SADDR_RULE_PREFIX: /* Rule 8: Use longest matching prefix */ ret = ipv6_addr_diff(&score->ifa->addr, dst->addr); if (ret > score->ifa->prefix_len) ret = score->ifa->prefix_len; score->matchlen = ret; break; #ifdef CONFIG_IPV6_OPTIMISTIC_DAD case IPV6_SADDR_RULE_NOT_OPTIMISTIC: /* Optimistic addresses still have lower precedence than other * preferred addresses. */ ret = !(score->ifa->flags & IFA_F_OPTIMISTIC); break; #endif default: ret = 0; } if (ret) __set_bit(i, score->scorebits); score->rule = i; out: return ret; } static int __ipv6_dev_get_saddr(struct net *net, struct ipv6_saddr_dst *dst, struct inet6_dev *idev, struct ipv6_saddr_score *scores, int hiscore_idx) { struct ipv6_saddr_score *score = &scores[1 - hiscore_idx], *hiscore = &scores[hiscore_idx]; list_for_each_entry_rcu(score->ifa, &idev->addr_list, if_list) { int i; /* * - Tentative Address (RFC2462 section 5.4) * - A tentative address is not considered * "assigned to an interface" in the traditional * sense, unless it is also flagged as optimistic. * - Candidate Source Address (section 4) * - In any case, anycast addresses, multicast * addresses, and the unspecified address MUST * NOT be included in a candidate set. */ if ((score->ifa->flags & IFA_F_TENTATIVE) && (!(score->ifa->flags & IFA_F_OPTIMISTIC))) continue; score->addr_type = __ipv6_addr_type(&score->ifa->addr); if (unlikely(score->addr_type == IPV6_ADDR_ANY || score->addr_type & IPV6_ADDR_MULTICAST)) { net_dbg_ratelimited("ADDRCONF: unspecified / multicast address assigned as unicast address on %s", idev->dev->name); continue; } score->rule = -1; bitmap_zero(score->scorebits, IPV6_SADDR_RULE_MAX); for (i = 0; i < IPV6_SADDR_RULE_MAX; i++) { int minihiscore, miniscore; minihiscore = ipv6_get_saddr_eval(net, hiscore, dst, i); miniscore = ipv6_get_saddr_eval(net, score, dst, i); if (minihiscore > miniscore) { if (i == IPV6_SADDR_RULE_SCOPE && score->scopedist > 0) { /* * special case: * each remaining entry * has too small (not enough) * scope, because ifa entries * are sorted by their scope * values. */ goto out; } break; } else if (minihiscore < miniscore) { swap(hiscore, score); hiscore_idx = 1 - hiscore_idx; /* restore our iterator */ score->ifa = hiscore->ifa; break; } } } out: return hiscore_idx; } static int ipv6_get_saddr_master(struct net *net, const struct net_device *dst_dev, const struct net_device *master, struct ipv6_saddr_dst *dst, struct ipv6_saddr_score *scores, int hiscore_idx) { struct inet6_dev *idev; idev = __in6_dev_get(dst_dev); if (idev) hiscore_idx = __ipv6_dev_get_saddr(net, dst, idev, scores, hiscore_idx); idev = __in6_dev_get(master); if (idev) hiscore_idx = __ipv6_dev_get_saddr(net, dst, idev, scores, hiscore_idx); return hiscore_idx; } int ipv6_dev_get_saddr(struct net *net, const struct net_device *dst_dev, const struct in6_addr *daddr, unsigned int prefs, struct in6_addr *saddr) { struct ipv6_saddr_score scores[2], *hiscore; struct ipv6_saddr_dst dst; struct inet6_dev *idev; struct net_device *dev; int dst_type; bool use_oif_addr = false; int hiscore_idx = 0; int ret = 0; dst_type = __ipv6_addr_type(daddr); dst.addr = daddr; dst.ifindex = dst_dev ? dst_dev->ifindex : 0; dst.scope = __ipv6_addr_src_scope(dst_type); dst.label = ipv6_addr_label(net, daddr, dst_type, dst.ifindex); dst.prefs = prefs; scores[hiscore_idx].rule = -1; scores[hiscore_idx].ifa = NULL; rcu_read_lock(); /* Candidate Source Address (section 4) * - multicast and link-local destination address, * the set of candidate source address MUST only * include addresses assigned to interfaces * belonging to the same link as the outgoing * interface. * (- For site-local destination addresses, the * set of candidate source addresses MUST only * include addresses assigned to interfaces * belonging to the same site as the outgoing * interface.) * - "It is RECOMMENDED that the candidate source addresses * be the set of unicast addresses assigned to the * interface that will be used to send to the destination * (the 'outgoing' interface)." (RFC 6724) */ if (dst_dev) { idev = __in6_dev_get(dst_dev); if ((dst_type & IPV6_ADDR_MULTICAST) || dst.scope <= IPV6_ADDR_SCOPE_LINKLOCAL || (idev && READ_ONCE(idev->cnf.use_oif_addrs_only))) { use_oif_addr = true; } } if (use_oif_addr) { if (idev) hiscore_idx = __ipv6_dev_get_saddr(net, &dst, idev, scores, hiscore_idx); } else { const struct net_device *master; int master_idx = 0; /* if dst_dev exists and is enslaved to an L3 device, then * prefer addresses from dst_dev and then the master over * any other enslaved devices in the L3 domain. */ master = l3mdev_master_dev_rcu(dst_dev); if (master) { master_idx = master->ifindex; hiscore_idx = ipv6_get_saddr_master(net, dst_dev, master, &dst, scores, hiscore_idx); if (scores[hiscore_idx].ifa && scores[hiscore_idx].scopedist >= 0) goto out; } for_each_netdev_rcu(net, dev) { /* only consider addresses on devices in the * same L3 domain */ if (l3mdev_master_ifindex_rcu(dev) != master_idx) continue; idev = __in6_dev_get(dev); if (!idev) continue; hiscore_idx = __ipv6_dev_get_saddr(net, &dst, idev, scores, hiscore_idx); } } out: hiscore = &scores[hiscore_idx]; if (!hiscore->ifa) ret = -EADDRNOTAVAIL; else *saddr = hiscore->ifa->addr; rcu_read_unlock(); return ret; } EXPORT_SYMBOL(ipv6_dev_get_saddr); static int __ipv6_get_lladdr(struct inet6_dev *idev, struct in6_addr *addr, u32 banned_flags) { struct inet6_ifaddr *ifp; int err = -EADDRNOTAVAIL; list_for_each_entry_reverse(ifp, &idev->addr_list, if_list) { if (ifp->scope > IFA_LINK) break; if (ifp->scope == IFA_LINK && !(ifp->flags & banned_flags)) { *addr = ifp->addr; err = 0; break; } } return err; } int ipv6_get_lladdr(struct net_device *dev, struct in6_addr *addr, u32 banned_flags) { struct inet6_dev *idev; int err = -EADDRNOTAVAIL; rcu_read_lock(); idev = __in6_dev_get(dev); if (idev) { read_lock_bh(&idev->lock); err = __ipv6_get_lladdr(idev, addr, banned_flags); read_unlock_bh(&idev->lock); } rcu_read_unlock(); return err; } static int ipv6_count_addresses(const struct inet6_dev *idev) { const struct inet6_ifaddr *ifp; int cnt = 0; rcu_read_lock(); list_for_each_entry_rcu(ifp, &idev->addr_list, if_list) cnt++; rcu_read_unlock(); return cnt; } int ipv6_chk_addr(struct net *net, const struct in6_addr *addr, const struct net_device *dev, int strict) { return ipv6_chk_addr_and_flags(net, addr, dev, !dev, strict, IFA_F_TENTATIVE); } EXPORT_SYMBOL(ipv6_chk_addr); /* device argument is used to find the L3 domain of interest. If * skip_dev_check is set, then the ifp device is not checked against * the passed in dev argument. So the 2 cases for addresses checks are: * 1. does the address exist in the L3 domain that dev is part of * (skip_dev_check = true), or * * 2. does the address exist on the specific device * (skip_dev_check = false) */ static struct net_device * __ipv6_chk_addr_and_flags(struct net *net, const struct in6_addr *addr, const struct net_device *dev, bool skip_dev_check, int strict, u32 banned_flags) { unsigned int hash = inet6_addr_hash(net, addr); struct net_device *l3mdev, *ndev; struct inet6_ifaddr *ifp; u32 ifp_flags; rcu_read_lock(); l3mdev = l3mdev_master_dev_rcu(dev); if (skip_dev_check) dev = NULL; hlist_for_each_entry_rcu(ifp, &net->ipv6.inet6_addr_lst[hash], addr_lst) { ndev = ifp->idev->dev; if (l3mdev_master_dev_rcu(ndev) != l3mdev) continue; /* Decouple optimistic from tentative for evaluation here. * Ban optimistic addresses explicitly, when required. */ ifp_flags = (ifp->flags&IFA_F_OPTIMISTIC) ? (ifp->flags&~IFA_F_TENTATIVE) : ifp->flags; if (ipv6_addr_equal(&ifp->addr, addr) && !(ifp_flags&banned_flags) && (!dev || ndev == dev || !(ifp->scope&(IFA_LINK|IFA_HOST) || strict))) { rcu_read_unlock(); return ndev; } } rcu_read_unlock(); return NULL; } int ipv6_chk_addr_and_flags(struct net *net, const struct in6_addr *addr, const struct net_device *dev, bool skip_dev_check, int strict, u32 banned_flags) { return __ipv6_chk_addr_and_flags(net, addr, dev, skip_dev_check, strict, banned_flags) ? 1 : 0; } EXPORT_SYMBOL(ipv6_chk_addr_and_flags); /* Compares an address/prefix_len with addresses on device @dev. * If one is found it returns true. */ bool ipv6_chk_custom_prefix(const struct in6_addr *addr, const unsigned int prefix_len, struct net_device *dev) { const struct inet6_ifaddr *ifa; const struct inet6_dev *idev; bool ret = false; rcu_read_lock(); idev = __in6_dev_get(dev); if (idev) { list_for_each_entry_rcu(ifa, &idev->addr_list, if_list) { ret = ipv6_prefix_equal(addr, &ifa->addr, prefix_len); if (ret) break; } } rcu_read_unlock(); return ret; } EXPORT_SYMBOL(ipv6_chk_custom_prefix); int ipv6_chk_prefix(const struct in6_addr *addr, struct net_device *dev) { const struct inet6_ifaddr *ifa; const struct inet6_dev *idev; int onlink; onlink = 0; rcu_read_lock(); idev = __in6_dev_get(dev); if (idev) { list_for_each_entry_rcu(ifa, &idev->addr_list, if_list) { onlink = ipv6_prefix_equal(addr, &ifa->addr, ifa->prefix_len); if (onlink) break; } } rcu_read_unlock(); return onlink; } EXPORT_SYMBOL(ipv6_chk_prefix); /** * ipv6_dev_find - find the first device with a given source address. * @net: the net namespace * @addr: the source address * @dev: used to find the L3 domain of interest * * The caller should be protected by RCU, or RTNL. */ struct net_device *ipv6_dev_find(struct net *net, const struct in6_addr *addr, struct net_device *dev) { return __ipv6_chk_addr_and_flags(net, addr, dev, !dev, 1, IFA_F_TENTATIVE); } EXPORT_SYMBOL(ipv6_dev_find); struct inet6_ifaddr *ipv6_get_ifaddr(struct net *net, const struct in6_addr *addr, struct net_device *dev, int strict) { unsigned int hash = inet6_addr_hash(net, addr); struct inet6_ifaddr *ifp, *result = NULL; rcu_read_lock(); hlist_for_each_entry_rcu(ifp, &net->ipv6.inet6_addr_lst[hash], addr_lst) { if (ipv6_addr_equal(&ifp->addr, addr)) { if (!dev || ifp->idev->dev == dev || !(ifp->scope&(IFA_LINK|IFA_HOST) || strict)) { if (in6_ifa_hold_safe(ifp)) { result = ifp; break; } } } } rcu_read_unlock(); return result; } /* Gets referenced address, destroys ifaddr */ static void addrconf_dad_stop(struct inet6_ifaddr *ifp, int dad_failed) { if (dad_failed) ifp->flags |= IFA_F_DADFAILED; if (ifp->flags&IFA_F_TEMPORARY) { struct inet6_ifaddr *ifpub; spin_lock_bh(&ifp->lock); ifpub = ifp->ifpub; if (ifpub) { in6_ifa_hold(ifpub); spin_unlock_bh(&ifp->lock); ipv6_create_tempaddr(ifpub, true); in6_ifa_put(ifpub); } else { spin_unlock_bh(&ifp->lock); } ipv6_del_addr(ifp); } else if (ifp->flags&IFA_F_PERMANENT || !dad_failed) { spin_lock_bh(&ifp->lock); addrconf_del_dad_work(ifp); ifp->flags |= IFA_F_TENTATIVE; if (dad_failed) ifp->flags &= ~IFA_F_OPTIMISTIC; spin_unlock_bh(&ifp->lock); if (dad_failed) ipv6_ifa_notify(0, ifp); in6_ifa_put(ifp); } else { ipv6_del_addr(ifp); } } static int addrconf_dad_end(struct inet6_ifaddr *ifp) { int err = -ENOENT; spin_lock_bh(&ifp->lock); if (ifp->state == INET6_IFADDR_STATE_DAD) { ifp->state = INET6_IFADDR_STATE_POSTDAD; err = 0; } spin_unlock_bh(&ifp->lock); return err; } void addrconf_dad_failure(struct sk_buff *skb, struct inet6_ifaddr *ifp) { struct inet6_dev *idev = ifp->idev; struct net *net = dev_net(idev->dev); int max_addresses; if (addrconf_dad_end(ifp)) { in6_ifa_put(ifp); return; } net_info_ratelimited("%s: IPv6 duplicate address %pI6c used by %pM detected!\n", ifp->idev->dev->name, &ifp->addr, eth_hdr(skb)->h_source); spin_lock_bh(&ifp->lock); if (ifp->flags & IFA_F_STABLE_PRIVACY) { struct in6_addr new_addr; struct inet6_ifaddr *ifp2; int retries = ifp->stable_privacy_retry + 1; struct ifa6_config cfg = { .pfx = &new_addr, .plen = ifp->prefix_len, .ifa_flags = ifp->flags, .valid_lft = ifp->valid_lft, .preferred_lft = ifp->prefered_lft, .scope = ifp->scope, }; if (retries > net->ipv6.sysctl.idgen_retries) { net_info_ratelimited("%s: privacy stable address generation failed because of DAD conflicts!\n", ifp->idev->dev->name); goto errdad; } new_addr = ifp->addr; if (ipv6_generate_stable_address(&new_addr, retries, idev)) goto errdad; spin_unlock_bh(&ifp->lock); max_addresses = READ_ONCE(idev->cnf.max_addresses); if (max_addresses && ipv6_count_addresses(idev) >= max_addresses) goto lock_errdad; net_info_ratelimited("%s: generating new stable privacy address because of DAD conflict\n", ifp->idev->dev->name); ifp2 = ipv6_add_addr(idev, &cfg, false, NULL); if (IS_ERR(ifp2)) goto lock_errdad; spin_lock_bh(&ifp2->lock); ifp2->stable_privacy_retry = retries; ifp2->state = INET6_IFADDR_STATE_PREDAD; spin_unlock_bh(&ifp2->lock); addrconf_mod_dad_work(ifp2, net->ipv6.sysctl.idgen_delay); in6_ifa_put(ifp2); lock_errdad: spin_lock_bh(&ifp->lock); } errdad: /* transition from _POSTDAD to _ERRDAD */ ifp->state = INET6_IFADDR_STATE_ERRDAD; spin_unlock_bh(&ifp->lock); addrconf_mod_dad_work(ifp, 0); in6_ifa_put(ifp); } /* Join to solicited addr multicast group. */ void addrconf_join_solict(struct net_device *dev, const struct in6_addr *addr) { struct in6_addr maddr; if (READ_ONCE(dev->flags) & (IFF_LOOPBACK | IFF_NOARP)) return; addrconf_addr_solict_mult(addr, &maddr); ipv6_dev_mc_inc(dev, &maddr); } void addrconf_leave_solict(struct inet6_dev *idev, const struct in6_addr *addr) { struct in6_addr maddr; if (READ_ONCE(idev->dev->flags) & (IFF_LOOPBACK | IFF_NOARP)) return; addrconf_addr_solict_mult(addr, &maddr); __ipv6_dev_mc_dec(idev, &maddr); } static void addrconf_join_anycast(struct inet6_ifaddr *ifp) { struct in6_addr addr; if (ifp->prefix_len >= 127) /* RFC 6164 */ return; ipv6_addr_prefix(&addr, &ifp->addr, ifp->prefix_len); if (ipv6_addr_any(&addr)) return; __ipv6_dev_ac_inc(ifp->idev, &addr); } static void addrconf_leave_anycast(struct inet6_ifaddr *ifp) { struct in6_addr addr; if (ifp->prefix_len >= 127) /* RFC 6164 */ return; ipv6_addr_prefix(&addr, &ifp->addr, ifp->prefix_len); if (ipv6_addr_any(&addr)) return; __ipv6_dev_ac_dec(ifp->idev, &addr); } static int addrconf_ifid_6lowpan(u8 *eui, struct net_device *dev) { switch (dev->addr_len) { case ETH_ALEN: memcpy(eui, dev->dev_addr, 3); eui[3] = 0xFF; eui[4] = 0xFE; memcpy(eui + 5, dev->dev_addr + 3, 3); break; case EUI64_ADDR_LEN: memcpy(eui, dev->dev_addr, EUI64_ADDR_LEN); eui[0] ^= 2; break; default: return -1; } return 0; } static int addrconf_ifid_ieee1394(u8 *eui, struct net_device *dev) { const union fwnet_hwaddr *ha; if (dev->addr_len != FWNET_ALEN) return -1; ha = (const union fwnet_hwaddr *)dev->dev_addr; memcpy(eui, &ha->uc.uniq_id, sizeof(ha->uc.uniq_id)); eui[0] ^= 2; return 0; } static int addrconf_ifid_arcnet(u8 *eui, struct net_device *dev) { /* XXX: inherit EUI-64 from other interface -- yoshfuji */ if (dev->addr_len != ARCNET_ALEN) return -1; memset(eui, 0, 7); eui[7] = *(u8 *)dev->dev_addr; return 0; } static int addrconf_ifid_infiniband(u8 *eui, struct net_device *dev) { if (dev->addr_len != INFINIBAND_ALEN) return -1; memcpy(eui, dev->dev_addr + 12, 8); eui[0] |= 2; return 0; } static int __ipv6_isatap_ifid(u8 *eui, __be32 addr) { if (addr == 0) return -1; eui[0] = (ipv4_is_zeronet(addr) || ipv4_is_private_10(addr) || ipv4_is_loopback(addr) || ipv4_is_linklocal_169(addr) || ipv4_is_private_172(addr) || ipv4_is_test_192(addr) || ipv4_is_anycast_6to4(addr) || ipv4_is_private_192(addr) || ipv4_is_test_198(addr) || ipv4_is_multicast(addr) || ipv4_is_lbcast(addr)) ? 0x00 : 0x02; eui[1] = 0; eui[2] = 0x5E; eui[3] = 0xFE; memcpy(eui + 4, &addr, 4); return 0; } static int addrconf_ifid_sit(u8 *eui, struct net_device *dev) { if (dev->priv_flags & IFF_ISATAP) return __ipv6_isatap_ifid(eui, *(__be32 *)dev->dev_addr); return -1; } static int addrconf_ifid_gre(u8 *eui, struct net_device *dev) { return __ipv6_isatap_ifid(eui, *(__be32 *)dev->dev_addr); } static int addrconf_ifid_ip6tnl(u8 *eui, struct net_device *dev) { memcpy(eui, dev->perm_addr, 3); memcpy(eui + 5, dev->perm_addr + 3, 3); eui[3] = 0xFF; eui[4] = 0xFE; eui[0] ^= 2; return 0; } static int ipv6_generate_eui64(u8 *eui, struct net_device *dev) { switch (dev->type) { case ARPHRD_ETHER: case ARPHRD_FDDI: return addrconf_ifid_eui48(eui, dev); case ARPHRD_ARCNET: return addrconf_ifid_arcnet(eui, dev); case ARPHRD_INFINIBAND: return addrconf_ifid_infiniband(eui, dev); case ARPHRD_SIT: return addrconf_ifid_sit(eui, dev); case ARPHRD_IPGRE: case ARPHRD_TUNNEL: return addrconf_ifid_gre(eui, dev); case ARPHRD_6LOWPAN: return addrconf_ifid_6lowpan(eui, dev); case ARPHRD_IEEE1394: return addrconf_ifid_ieee1394(eui, dev); case ARPHRD_TUNNEL6: case ARPHRD_IP6GRE: case ARPHRD_RAWIP: return addrconf_ifid_ip6tnl(eui, dev); } return -1; } static int ipv6_inherit_eui64(u8 *eui, struct inet6_dev *idev) { int err = -1; struct inet6_ifaddr *ifp; read_lock_bh(&idev->lock); list_for_each_entry_reverse(ifp, &idev->addr_list, if_list) { if (ifp->scope > IFA_LINK) break; if (ifp->scope == IFA_LINK && !(ifp->flags&IFA_F_TENTATIVE)) { memcpy(eui, ifp->addr.s6_addr+8, 8); err = 0; break; } } read_unlock_bh(&idev->lock); return err; } /* Generation of a randomized Interface Identifier * draft-ietf-6man-rfc4941bis, Section 3.3.1 */ static void ipv6_gen_rnd_iid(struct in6_addr *addr) { regen: get_random_bytes(&addr->s6_addr[8], 8); /* <draft-ietf-6man-rfc4941bis-08.txt>, Section 3.3.1: * check if generated address is not inappropriate: * * - Reserved IPv6 Interface Identifiers * - XXX: already assigned to an address on the device */ /* Subnet-router anycast: 0000:0000:0000:0000 */ if (!(addr->s6_addr32[2] | addr->s6_addr32[3])) goto regen; /* IANA Ethernet block: 0200:5EFF:FE00:0000-0200:5EFF:FE00:5212 * Proxy Mobile IPv6: 0200:5EFF:FE00:5213 * IANA Ethernet block: 0200:5EFF:FE00:5214-0200:5EFF:FEFF:FFFF */ if (ntohl(addr->s6_addr32[2]) == 0x02005eff && (ntohl(addr->s6_addr32[3]) & 0Xff000000) == 0xfe000000) goto regen; /* Reserved subnet anycast addresses */ if (ntohl(addr->s6_addr32[2]) == 0xfdffffff && ntohl(addr->s6_addr32[3]) >= 0Xffffff80) goto regen; } /* * Add prefix route. */ static void addrconf_prefix_route(struct in6_addr *pfx, int plen, u32 metric, struct net_device *dev, unsigned long expires, u32 flags, gfp_t gfp_flags) { struct fib6_config cfg = { .fc_table = l3mdev_fib_table(dev) ? : RT6_TABLE_PREFIX, .fc_metric = metric ? : IP6_RT_PRIO_ADDRCONF, .fc_ifindex = dev->ifindex, .fc_expires = expires, .fc_dst_len = plen, .fc_flags = RTF_UP | flags, .fc_nlinfo.nl_net = dev_net(dev), .fc_protocol = RTPROT_KERNEL, .fc_type = RTN_UNICAST, }; cfg.fc_dst = *pfx; /* Prevent useless cloning on PtP SIT. This thing is done here expecting that the whole class of non-broadcast devices need not cloning. */ #if IS_ENABLED(CONFIG_IPV6_SIT) if (dev->type == ARPHRD_SIT && (dev->flags & IFF_POINTOPOINT)) cfg.fc_flags |= RTF_NONEXTHOP; #endif ip6_route_add(&cfg, gfp_flags, NULL); } static struct fib6_info *addrconf_get_prefix_route(const struct in6_addr *pfx, int plen, const struct net_device *dev, u32 flags, u32 noflags, bool no_gw) { struct fib6_node *fn; struct fib6_info *rt = NULL; struct fib6_table *table; u32 tb_id = l3mdev_fib_table(dev) ? : RT6_TABLE_PREFIX; table = fib6_get_table(dev_net(dev), tb_id); if (!table) return NULL; rcu_read_lock(); fn = fib6_locate(&table->tb6_root, pfx, plen, NULL, 0, true); if (!fn) goto out; for_each_fib6_node_rt_rcu(fn) { /* prefix routes only use builtin fib6_nh */ if (rt->nh) continue; if (rt->fib6_nh->fib_nh_dev->ifindex != dev->ifindex) continue; if (no_gw && rt->fib6_nh->fib_nh_gw_family) continue; if ((rt->fib6_flags & flags) != flags) continue; if ((rt->fib6_flags & noflags) != 0) continue; if (!fib6_info_hold_safe(rt)) continue; break; } out: rcu_read_unlock(); return rt; } /* Create "default" multicast route to the interface */ static void addrconf_add_mroute(struct net_device *dev) { struct fib6_config cfg = { .fc_table = l3mdev_fib_table(dev) ? : RT6_TABLE_LOCAL, .fc_metric = IP6_RT_PRIO_ADDRCONF, .fc_ifindex = dev->ifindex, .fc_dst_len = 8, .fc_flags = RTF_UP, .fc_type = RTN_MULTICAST, .fc_nlinfo.nl_net = dev_net(dev), .fc_protocol = RTPROT_KERNEL, }; ipv6_addr_set(&cfg.fc_dst, htonl(0xFF000000), 0, 0, 0); ip6_route_add(&cfg, GFP_KERNEL, NULL); } static struct inet6_dev *addrconf_add_dev(struct net_device *dev) { struct inet6_dev *idev; ASSERT_RTNL(); idev = ipv6_find_idev(dev); if (IS_ERR(idev)) return idev; if (idev->cnf.disable_ipv6) return ERR_PTR(-EACCES); /* Add default multicast route */ if (!(dev->flags & IFF_LOOPBACK) && !netif_is_l3_master(dev)) addrconf_add_mroute(dev); return idev; } static void delete_tempaddrs(struct inet6_dev *idev, struct inet6_ifaddr *ifp) { struct inet6_ifaddr *ift, *tmp; write_lock_bh(&idev->lock); list_for_each_entry_safe(ift, tmp, &idev->tempaddr_list, tmp_list) { if (ift->ifpub != ifp) continue; in6_ifa_hold(ift); write_unlock_bh(&idev->lock); ipv6_del_addr(ift); write_lock_bh(&idev->lock); } write_unlock_bh(&idev->lock); } static void manage_tempaddrs(struct inet6_dev *idev, struct inet6_ifaddr *ifp, __u32 valid_lft, __u32 prefered_lft, bool create, unsigned long now) { u32 flags; struct inet6_ifaddr *ift; read_lock_bh(&idev->lock); /* update all temporary addresses in the list */ list_for_each_entry(ift, &idev->tempaddr_list, tmp_list) { int age, max_valid, max_prefered; if (ifp != ift->ifpub) continue; /* RFC 4941 section 3.3: * If a received option will extend the lifetime of a public * address, the lifetimes of temporary addresses should * be extended, subject to the overall constraint that no * temporary addresses should ever remain "valid" or "preferred" * for a time longer than (TEMP_VALID_LIFETIME) or * (TEMP_PREFERRED_LIFETIME - DESYNC_FACTOR), respectively. */ age = (now - ift->cstamp) / HZ; max_valid = READ_ONCE(idev->cnf.temp_valid_lft) - age; if (max_valid < 0) max_valid = 0; max_prefered = READ_ONCE(idev->cnf.temp_prefered_lft) - idev->desync_factor - age; if (max_prefered < 0) max_prefered = 0; if (valid_lft > max_valid) valid_lft = max_valid; if (prefered_lft > max_prefered) prefered_lft = max_prefered; spin_lock(&ift->lock); flags = ift->flags; ift->valid_lft = valid_lft; ift->prefered_lft = prefered_lft; ift->tstamp = now; if (prefered_lft > 0) ift->flags &= ~IFA_F_DEPRECATED; spin_unlock(&ift->lock); if (!(flags&IFA_F_TENTATIVE)) ipv6_ifa_notify(0, ift); } /* Also create a temporary address if it's enabled but no temporary * address currently exists. * However, we get called with valid_lft == 0, prefered_lft == 0, create == false * as part of cleanup (ie. deleting the mngtmpaddr). * We don't want that to result in creating a new temporary ip address. */ if (list_empty(&idev->tempaddr_list) && (valid_lft || prefered_lft)) create = true; if (create && READ_ONCE(idev->cnf.use_tempaddr) > 0) { /* When a new public address is created as described * in [ADDRCONF], also create a new temporary address. */ read_unlock_bh(&idev->lock); ipv6_create_tempaddr(ifp, false); } else { read_unlock_bh(&idev->lock); } } static bool is_addr_mode_generate_stable(struct inet6_dev *idev) { return idev->cnf.addr_gen_mode == IN6_ADDR_GEN_MODE_STABLE_PRIVACY || idev->cnf.addr_gen_mode == IN6_ADDR_GEN_MODE_RANDOM; } int addrconf_prefix_rcv_add_addr(struct net *net, struct net_device *dev, const struct prefix_info *pinfo, struct inet6_dev *in6_dev, const struct in6_addr *addr, int addr_type, u32 addr_flags, bool sllao, bool tokenized, __u32 valid_lft, u32 prefered_lft) { struct inet6_ifaddr *ifp = ipv6_get_ifaddr(net, addr, dev, 1); int create = 0, update_lft = 0; if (!ifp && valid_lft) { int max_addresses = READ_ONCE(in6_dev->cnf.max_addresses); struct ifa6_config cfg = { .pfx = addr, .plen = pinfo->prefix_len, .ifa_flags = addr_flags, .valid_lft = valid_lft, .preferred_lft = prefered_lft, .scope = addr_type & IPV6_ADDR_SCOPE_MASK, .ifa_proto = IFAPROT_KERNEL_RA }; #ifdef CONFIG_IPV6_OPTIMISTIC_DAD if ((READ_ONCE(net->ipv6.devconf_all->optimistic_dad) || READ_ONCE(in6_dev->cnf.optimistic_dad)) && !net->ipv6.devconf_all->forwarding && sllao) cfg.ifa_flags |= IFA_F_OPTIMISTIC; #endif /* Do not allow to create too much of autoconfigured * addresses; this would be too easy way to crash kernel. */ if (!max_addresses || ipv6_count_addresses(in6_dev) < max_addresses) ifp = ipv6_add_addr(in6_dev, &cfg, false, NULL); if (IS_ERR_OR_NULL(ifp)) return -1; create = 1; spin_lock_bh(&ifp->lock); ifp->flags |= IFA_F_MANAGETEMPADDR; ifp->cstamp = jiffies; ifp->tokenized = tokenized; spin_unlock_bh(&ifp->lock); addrconf_dad_start(ifp); } if (ifp) { u32 flags; unsigned long now; u32 stored_lft; /* update lifetime (RFC2462 5.5.3 e) */ spin_lock_bh(&ifp->lock); now = jiffies; if (ifp->valid_lft > (now - ifp->tstamp) / HZ) stored_lft = ifp->valid_lft - (now - ifp->tstamp) / HZ; else stored_lft = 0; /* RFC4862 Section 5.5.3e: * "Note that the preferred lifetime of the * corresponding address is always reset to * the Preferred Lifetime in the received * Prefix Information option, regardless of * whether the valid lifetime is also reset or * ignored." * * So we should always update prefered_lft here. */ update_lft = !create && stored_lft; if (update_lft && !READ_ONCE(in6_dev->cnf.ra_honor_pio_life)) { const u32 minimum_lft = min_t(u32, stored_lft, MIN_VALID_LIFETIME); valid_lft = max(valid_lft, minimum_lft); } if (update_lft) { ifp->valid_lft = valid_lft; ifp->prefered_lft = prefered_lft; WRITE_ONCE(ifp->tstamp, now); flags = ifp->flags; ifp->flags &= ~IFA_F_DEPRECATED; spin_unlock_bh(&ifp->lock); if (!(flags&IFA_F_TENTATIVE)) ipv6_ifa_notify(0, ifp); } else spin_unlock_bh(&ifp->lock); manage_tempaddrs(in6_dev, ifp, valid_lft, prefered_lft, create, now); in6_ifa_put(ifp); addrconf_verify(net); } return 0; } EXPORT_SYMBOL_GPL(addrconf_prefix_rcv_add_addr); void addrconf_prefix_rcv(struct net_device *dev, u8 *opt, int len, bool sllao) { struct prefix_info *pinfo; struct fib6_table *table; __u32 valid_lft; __u32 prefered_lft; int addr_type, err; u32 addr_flags = 0; struct inet6_dev *in6_dev; struct net *net = dev_net(dev); bool ignore_autoconf = false; pinfo = (struct prefix_info *) opt; if (len < sizeof(struct prefix_info)) { netdev_dbg(dev, "addrconf: prefix option too short\n"); return; } /* * Validation checks ([ADDRCONF], page 19) */ addr_type = ipv6_addr_type(&pinfo->prefix); if (addr_type & (IPV6_ADDR_MULTICAST|IPV6_ADDR_LINKLOCAL)) return; valid_lft = ntohl(pinfo->valid); prefered_lft = ntohl(pinfo->prefered); if (prefered_lft > valid_lft) { net_warn_ratelimited("addrconf: prefix option has invalid lifetime\n"); return; } in6_dev = in6_dev_get(dev); if (!in6_dev) { net_dbg_ratelimited("addrconf: device %s not configured\n", dev->name); return; } if (valid_lft != 0 && valid_lft < in6_dev->cnf.accept_ra_min_lft) goto put; /* * Two things going on here: * 1) Add routes for on-link prefixes * 2) Configure prefixes with the auto flag set */ if (pinfo->onlink) { struct fib6_info *rt; unsigned long rt_expires; /* Avoid arithmetic overflow. Really, we could * save rt_expires in seconds, likely valid_lft, * but it would require division in fib gc, that it * not good. */ if (HZ > USER_HZ) rt_expires = addrconf_timeout_fixup(valid_lft, HZ); else rt_expires = addrconf_timeout_fixup(valid_lft, USER_HZ); if (addrconf_finite_timeout(rt_expires)) rt_expires *= HZ; rt = addrconf_get_prefix_route(&pinfo->prefix, pinfo->prefix_len, dev, RTF_ADDRCONF | RTF_PREFIX_RT, RTF_DEFAULT, true); if (rt) { /* Autoconf prefix route */ if (valid_lft == 0) { ip6_del_rt(net, rt, false); rt = NULL; } else { table = rt->fib6_table; spin_lock_bh(&table->tb6_lock); if (addrconf_finite_timeout(rt_expires)) { /* not infinity */ fib6_set_expires(rt, jiffies + rt_expires); fib6_add_gc_list(rt); } else { fib6_clean_expires(rt); fib6_remove_gc_list(rt); } spin_unlock_bh(&table->tb6_lock); } } else if (valid_lft) { clock_t expires = 0; int flags = RTF_ADDRCONF | RTF_PREFIX_RT; if (addrconf_finite_timeout(rt_expires)) { /* not infinity */ flags |= RTF_EXPIRES; expires = jiffies_to_clock_t(rt_expires); } addrconf_prefix_route(&pinfo->prefix, pinfo->prefix_len, 0, dev, expires, flags, GFP_ATOMIC); } fib6_info_release(rt); } /* Try to figure out our local address for this prefix */ ignore_autoconf = READ_ONCE(in6_dev->cnf.ra_honor_pio_pflag) && pinfo->preferpd; if (pinfo->autoconf && in6_dev->cnf.autoconf && !ignore_autoconf) { struct in6_addr addr; bool tokenized = false, dev_addr_generated = false; if (pinfo->prefix_len == 64) { memcpy(&addr, &pinfo->prefix, 8); if (!ipv6_addr_any(&in6_dev->token)) { read_lock_bh(&in6_dev->lock); memcpy(addr.s6_addr + 8, in6_dev->token.s6_addr + 8, 8); read_unlock_bh(&in6_dev->lock); tokenized = true; } else if (is_addr_mode_generate_stable(in6_dev) && !ipv6_generate_stable_address(&addr, 0, in6_dev)) { addr_flags |= IFA_F_STABLE_PRIVACY; goto ok; } else if (ipv6_generate_eui64(addr.s6_addr + 8, dev) && ipv6_inherit_eui64(addr.s6_addr + 8, in6_dev)) { goto put; } else { dev_addr_generated = true; } goto ok; } net_dbg_ratelimited("IPv6 addrconf: prefix with wrong length %d\n", pinfo->prefix_len); goto put; ok: err = addrconf_prefix_rcv_add_addr(net, dev, pinfo, in6_dev, &addr, addr_type, addr_flags, sllao, tokenized, valid_lft, prefered_lft); if (err) goto put; /* Ignore error case here because previous prefix add addr was * successful which will be notified. */ ndisc_ops_prefix_rcv_add_addr(net, dev, pinfo, in6_dev, &addr, addr_type, addr_flags, sllao, tokenized, valid_lft, prefered_lft, dev_addr_generated); } inet6_prefix_notify(RTM_NEWPREFIX, in6_dev, pinfo); put: in6_dev_put(in6_dev); } static int addrconf_set_sit_dstaddr(struct net *net, struct net_device *dev, struct in6_ifreq *ireq) { struct ip_tunnel_parm_kern p = { }; int err; if (!(ipv6_addr_type(&ireq->ifr6_addr) & IPV6_ADDR_COMPATv4)) return -EADDRNOTAVAIL; p.iph.daddr = ireq->ifr6_addr.s6_addr32[3]; p.iph.version = 4; p.iph.ihl = 5; p.iph.protocol = IPPROTO_IPV6; p.iph.ttl = 64; if (!dev->netdev_ops->ndo_tunnel_ctl) return -EOPNOTSUPP; err = dev->netdev_ops->ndo_tunnel_ctl(dev, &p, SIOCADDTUNNEL); if (err) return err; dev = __dev_get_by_name(net, p.name); if (!dev) return -ENOBUFS; return dev_open(dev, NULL); } /* * Set destination address. * Special case for SIT interfaces where we create a new "virtual" * device. */ int addrconf_set_dstaddr(struct net *net, void __user *arg) { struct net_device *dev; struct in6_ifreq ireq; int err = -ENODEV; if (!IS_ENABLED(CONFIG_IPV6_SIT)) return -ENODEV; if (copy_from_user(&ireq, arg, sizeof(struct in6_ifreq))) return -EFAULT; rtnl_net_lock(net); dev = __dev_get_by_index(net, ireq.ifr6_ifindex); if (dev && dev->type == ARPHRD_SIT) err = addrconf_set_sit_dstaddr(net, dev, &ireq); rtnl_net_unlock(net); return err; } static int ipv6_mc_config(struct sock *sk, bool join, const struct in6_addr *addr, int ifindex) { int ret; ASSERT_RTNL(); lock_sock(sk); if (join) ret = ipv6_sock_mc_join(sk, ifindex, addr); else ret = ipv6_sock_mc_drop(sk, ifindex, addr); release_sock(sk); return ret; } /* * Manual configuration of address on an interface */ static int inet6_addr_add(struct net *net, struct net_device *dev, struct ifa6_config *cfg, clock_t expires, u32 flags, struct netlink_ext_ack *extack) { struct inet6_ifaddr *ifp; struct inet6_dev *idev; ASSERT_RTNL_NET(net); if (cfg->plen > 128) { NL_SET_ERR_MSG_MOD(extack, "Invalid prefix length"); return -EINVAL; } if (cfg->ifa_flags & IFA_F_MANAGETEMPADDR && cfg->plen != 64) { NL_SET_ERR_MSG_MOD(extack, "address with \"mngtmpaddr\" flag must have a prefix length of 64"); return -EINVAL; } idev = addrconf_add_dev(dev); if (IS_ERR(idev)) { NL_SET_ERR_MSG_MOD(extack, "IPv6 is disabled on this device"); return PTR_ERR(idev); } if (cfg->ifa_flags & IFA_F_MCAUTOJOIN) { int ret = ipv6_mc_config(net->ipv6.mc_autojoin_sk, true, cfg->pfx, dev->ifindex); if (ret < 0) { NL_SET_ERR_MSG_MOD(extack, "Multicast auto join failed"); return ret; } } cfg->scope = ipv6_addr_scope(cfg->pfx); ifp = ipv6_add_addr(idev, cfg, true, extack); if (!IS_ERR(ifp)) { if (!(cfg->ifa_flags & IFA_F_NOPREFIXROUTE)) { addrconf_prefix_route(&ifp->addr, ifp->prefix_len, ifp->rt_priority, dev, expires, flags, GFP_KERNEL); } /* Send a netlink notification if DAD is enabled and * optimistic flag is not set */ if (!(ifp->flags & (IFA_F_OPTIMISTIC | IFA_F_NODAD))) ipv6_ifa_notify(0, ifp); /* * Note that section 3.1 of RFC 4429 indicates * that the Optimistic flag should not be set for * manually configured addresses */ addrconf_dad_start(ifp); if (cfg->ifa_flags & IFA_F_MANAGETEMPADDR) manage_tempaddrs(idev, ifp, cfg->valid_lft, cfg->preferred_lft, true, jiffies); in6_ifa_put(ifp); addrconf_verify_rtnl(net); return 0; } else if (cfg->ifa_flags & IFA_F_MCAUTOJOIN) { ipv6_mc_config(net->ipv6.mc_autojoin_sk, false, cfg->pfx, dev->ifindex); } return PTR_ERR(ifp); } static int inet6_addr_del(struct net *net, int ifindex, u32 ifa_flags, const struct in6_addr *pfx, unsigned int plen, struct netlink_ext_ack *extack) { struct inet6_ifaddr *ifp; struct inet6_dev *idev; struct net_device *dev; if (plen > 128) { NL_SET_ERR_MSG_MOD(extack, "Invalid prefix length"); return -EINVAL; } dev = __dev_get_by_index(net, ifindex); if (!dev) { NL_SET_ERR_MSG_MOD(extack, "Unable to find the interface"); return -ENODEV; } idev = __in6_dev_get_rtnl_net(dev); if (!idev) { NL_SET_ERR_MSG_MOD(extack, "IPv6 is disabled on this device"); return -ENXIO; } read_lock_bh(&idev->lock); list_for_each_entry(ifp, &idev->addr_list, if_list) { if (ifp->prefix_len == plen && ipv6_addr_equal(pfx, &ifp->addr)) { in6_ifa_hold(ifp); read_unlock_bh(&idev->lock); ipv6_del_addr(ifp); if (!(ifp->flags & IFA_F_TEMPORARY) && (ifp->flags & IFA_F_MANAGETEMPADDR)) delete_tempaddrs(idev, ifp); addrconf_verify_rtnl(net); if (ipv6_addr_is_multicast(pfx)) { ipv6_mc_config(net->ipv6.mc_autojoin_sk, false, pfx, dev->ifindex); } return 0; } } read_unlock_bh(&idev->lock); NL_SET_ERR_MSG_MOD(extack, "address not found"); return -EADDRNOTAVAIL; } int addrconf_add_ifaddr(struct net *net, void __user *arg) { struct ifa6_config cfg = { .ifa_flags = IFA_F_PERMANENT, .preferred_lft = INFINITY_LIFE_TIME, .valid_lft = INFINITY_LIFE_TIME, }; struct net_device *dev; struct in6_ifreq ireq; int err; if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) return -EPERM; if (copy_from_user(&ireq, arg, sizeof(struct in6_ifreq))) return -EFAULT; cfg.pfx = &ireq.ifr6_addr; cfg.plen = ireq.ifr6_prefixlen; rtnl_net_lock(net); dev = __dev_get_by_index(net, ireq.ifr6_ifindex); if (dev) { netdev_lock_ops(dev); err = inet6_addr_add(net, dev, &cfg, 0, 0, NULL); netdev_unlock_ops(dev); } else { err = -ENODEV; } rtnl_net_unlock(net); return err; } int addrconf_del_ifaddr(struct net *net, void __user *arg) { struct in6_ifreq ireq; int err; if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) return -EPERM; if (copy_from_user(&ireq, arg, sizeof(struct in6_ifreq))) return -EFAULT; rtnl_net_lock(net); err = inet6_addr_del(net, ireq.ifr6_ifindex, 0, &ireq.ifr6_addr, ireq.ifr6_prefixlen, NULL); rtnl_net_unlock(net); return err; } static void add_addr(struct inet6_dev *idev, const struct in6_addr *addr, int plen, int scope, u8 proto) { struct inet6_ifaddr *ifp; struct ifa6_config cfg = { .pfx = addr, .plen = plen, .ifa_flags = IFA_F_PERMANENT, .valid_lft = INFINITY_LIFE_TIME, .preferred_lft = INFINITY_LIFE_TIME, .scope = scope, .ifa_proto = proto }; ifp = ipv6_add_addr(idev, &cfg, true, NULL); if (!IS_ERR(ifp)) { spin_lock_bh(&ifp->lock); ifp->flags &= ~IFA_F_TENTATIVE; spin_unlock_bh(&ifp->lock); rt_genid_bump_ipv6(dev_net(idev->dev)); ipv6_ifa_notify(RTM_NEWADDR, ifp); in6_ifa_put(ifp); } } #if IS_ENABLED(CONFIG_IPV6_SIT) || IS_ENABLED(CONFIG_NET_IPGRE) static void add_v4_addrs(struct inet6_dev *idev) { struct in6_addr addr; struct net_device *dev; struct net *net = dev_net(idev->dev); int scope, plen; u32 pflags = 0; ASSERT_RTNL(); memset(&addr, 0, sizeof(struct in6_addr)); memcpy(&addr.s6_addr32[3], idev->dev->dev_addr, 4); if (!(idev->dev->flags & IFF_POINTOPOINT) && idev->dev->type == ARPHRD_SIT) { scope = IPV6_ADDR_COMPATv4; plen = 96; pflags |= RTF_NONEXTHOP; } else { if (idev->cnf.addr_gen_mode == IN6_ADDR_GEN_MODE_NONE) return; addr.s6_addr32[0] = htonl(0xfe800000); scope = IFA_LINK; plen = 64; } if (addr.s6_addr32[3]) { add_addr(idev, &addr, plen, scope, IFAPROT_UNSPEC); addrconf_prefix_route(&addr, plen, 0, idev->dev, 0, pflags, GFP_KERNEL); return; } for_each_netdev(net, dev) { struct in_device *in_dev = __in_dev_get_rtnl(dev); if (in_dev && (dev->flags & IFF_UP)) { struct in_ifaddr *ifa; int flag = scope; in_dev_for_each_ifa_rtnl(ifa, in_dev) { addr.s6_addr32[3] = ifa->ifa_local; if (ifa->ifa_scope == RT_SCOPE_LINK) continue; if (ifa->ifa_scope >= RT_SCOPE_HOST) { if (idev->dev->flags&IFF_POINTOPOINT) continue; flag |= IFA_HOST; } add_addr(idev, &addr, plen, flag, IFAPROT_UNSPEC); addrconf_prefix_route(&addr, plen, 0, idev->dev, 0, pflags, GFP_KERNEL); } } } } #endif static void init_loopback(struct net_device *dev) { struct inet6_dev *idev; /* ::1 */ ASSERT_RTNL(); idev = ipv6_find_idev(dev); if (IS_ERR(idev)) { pr_debug("%s: add_dev failed\n", __func__); return; } add_addr(idev, &in6addr_loopback, 128, IFA_HOST, IFAPROT_KERNEL_LO); } void addrconf_add_linklocal(struct inet6_dev *idev, const struct in6_addr *addr, u32 flags) { struct ifa6_config cfg = { .pfx = addr, .plen = 64, .ifa_flags = flags | IFA_F_PERMANENT, .valid_lft = INFINITY_LIFE_TIME, .preferred_lft = INFINITY_LIFE_TIME, .scope = IFA_LINK, .ifa_proto = IFAPROT_KERNEL_LL }; struct inet6_ifaddr *ifp; #ifdef CONFIG_IPV6_OPTIMISTIC_DAD if ((READ_ONCE(dev_net(idev->dev)->ipv6.devconf_all->optimistic_dad) || READ_ONCE(idev->cnf.optimistic_dad)) && !dev_net(idev->dev)->ipv6.devconf_all->forwarding) cfg.ifa_flags |= IFA_F_OPTIMISTIC; #endif ifp = ipv6_add_addr(idev, &cfg, true, NULL); if (!IS_ERR(ifp)) { addrconf_prefix_route(&ifp->addr, ifp->prefix_len, 0, idev->dev, 0, 0, GFP_ATOMIC); addrconf_dad_start(ifp); in6_ifa_put(ifp); } } EXPORT_SYMBOL_GPL(addrconf_add_linklocal); static bool ipv6_reserved_interfaceid(struct in6_addr address) { if ((address.s6_addr32[2] | address.s6_addr32[3]) == 0) return true; if (address.s6_addr32[2] == htonl(0x02005eff) && ((address.s6_addr32[3] & htonl(0xfe000000)) == htonl(0xfe000000))) return true; if (address.s6_addr32[2] == htonl(0xfdffffff) && ((address.s6_addr32[3] & htonl(0xffffff80)) == htonl(0xffffff80))) return true; return false; } static int ipv6_generate_stable_address(struct in6_addr *address, u8 dad_count, const struct inet6_dev *idev) { static DEFINE_SPINLOCK(lock); static __u32 digest[SHA1_DIGEST_WORDS]; static __u32 workspace[SHA1_WORKSPACE_WORDS]; static union { char __data[SHA1_BLOCK_SIZE]; struct { struct in6_addr secret; __be32 prefix[2]; unsigned char hwaddr[MAX_ADDR_LEN]; u8 dad_count; } __packed; } data; struct in6_addr secret; struct in6_addr temp; struct net *net = dev_net(idev->dev); BUILD_BUG_ON(sizeof(data.__data) != sizeof(data)); if (idev->cnf.stable_secret.initialized) secret = idev->cnf.stable_secret.secret; else if (net->ipv6.devconf_dflt->stable_secret.initialized) secret = net->ipv6.devconf_dflt->stable_secret.secret; else return -1; retry: spin_lock_bh(&lock); sha1_init_raw(digest); memset(&data, 0, sizeof(data)); memset(workspace, 0, sizeof(workspace)); memcpy(data.hwaddr, idev->dev->perm_addr, idev->dev->addr_len); data.prefix[0] = address->s6_addr32[0]; data.prefix[1] = address->s6_addr32[1]; data.secret = secret; data.dad_count = dad_count; sha1_transform(digest, data.__data, workspace); temp = *address; temp.s6_addr32[2] = (__force __be32)digest[0]; temp.s6_addr32[3] = (__force __be32)digest[1]; spin_unlock_bh(&lock); if (ipv6_reserved_interfaceid(temp)) { dad_count++; if (dad_count > dev_net(idev->dev)->ipv6.sysctl.idgen_retries) return -1; goto retry; } *address = temp; return 0; } static void ipv6_gen_mode_random_init(struct inet6_dev *idev) { struct ipv6_stable_secret *s = &idev->cnf.stable_secret; if (s->initialized) return; s = &idev->cnf.stable_secret; get_random_bytes(&s->secret, sizeof(s->secret)); s->initialized = true; } static void addrconf_addr_gen(struct inet6_dev *idev, bool prefix_route) { struct in6_addr addr; /* no link local addresses on L3 master devices */ if (netif_is_l3_master(idev->dev)) return; /* no link local addresses on devices flagged as slaves */ if (idev->dev->priv_flags & IFF_NO_ADDRCONF) return; ipv6_addr_set(&addr, htonl(0xFE800000), 0, 0, 0); switch (idev->cnf.addr_gen_mode) { case IN6_ADDR_GEN_MODE_RANDOM: ipv6_gen_mode_random_init(idev); fallthrough; case IN6_ADDR_GEN_MODE_STABLE_PRIVACY: if (!ipv6_generate_stable_address(&addr, 0, idev)) addrconf_add_linklocal(idev, &addr, IFA_F_STABLE_PRIVACY); else if (prefix_route) addrconf_prefix_route(&addr, 64, 0, idev->dev, 0, 0, GFP_KERNEL); break; case IN6_ADDR_GEN_MODE_EUI64: /* addrconf_add_linklocal also adds a prefix_route and we * only need to care about prefix routes if ipv6_generate_eui64 * couldn't generate one. */ if (ipv6_generate_eui64(addr.s6_addr + 8, idev->dev) == 0) addrconf_add_linklocal(idev, &addr, 0); else if (prefix_route) addrconf_prefix_route(&addr, 64, 0, idev->dev, 0, 0, GFP_KERNEL); break; case IN6_ADDR_GEN_MODE_NONE: default: /* will not add any link local address */ break; } } static void addrconf_dev_config(struct net_device *dev) { struct inet6_dev *idev; ASSERT_RTNL(); if ((dev->type != ARPHRD_ETHER) && (dev->type != ARPHRD_FDDI) && (dev->type != ARPHRD_ARCNET) && (dev->type != ARPHRD_INFINIBAND) && (dev->type != ARPHRD_IEEE1394) && (dev->type != ARPHRD_TUNNEL6) && (dev->type != ARPHRD_6LOWPAN) && (dev->type != ARPHRD_IP6GRE) && (dev->type != ARPHRD_TUNNEL) && (dev->type != ARPHRD_NONE) && (dev->type != ARPHRD_RAWIP)) { /* Alas, we support only Ethernet autoconfiguration. */ idev = __in6_dev_get(dev); if (!IS_ERR_OR_NULL(idev) && dev->flags & IFF_UP && dev->flags & IFF_MULTICAST) ipv6_mc_up(idev); return; } idev = addrconf_add_dev(dev); if (IS_ERR(idev)) return; /* this device type has no EUI support */ if (dev->type == ARPHRD_NONE && idev->cnf.addr_gen_mode == IN6_ADDR_GEN_MODE_EUI64) WRITE_ONCE(idev->cnf.addr_gen_mode, IN6_ADDR_GEN_MODE_RANDOM); addrconf_addr_gen(idev, false); } #if IS_ENABLED(CONFIG_IPV6_SIT) static void addrconf_sit_config(struct net_device *dev) { struct inet6_dev *idev; ASSERT_RTNL(); /* * Configure the tunnel with one of our IPv4 * addresses... we should configure all of * our v4 addrs in the tunnel */ idev = ipv6_find_idev(dev); if (IS_ERR(idev)) { pr_debug("%s: add_dev failed\n", __func__); return; } if (dev->priv_flags & IFF_ISATAP) { addrconf_addr_gen(idev, false); return; } add_v4_addrs(idev); if (dev->flags&IFF_POINTOPOINT) addrconf_add_mroute(dev); } #endif #if IS_ENABLED(CONFIG_NET_IPGRE) static void addrconf_gre_config(struct net_device *dev) { struct inet6_dev *idev; ASSERT_RTNL(); idev = addrconf_add_dev(dev); if (IS_ERR(idev)) return; /* Generate the IPv6 link-local address using addrconf_addr_gen(), * unless we have an IPv4 GRE device not bound to an IP address and * which is in EUI64 mode (as __ipv6_isatap_ifid() would fail in this * case). Such devices fall back to add_v4_addrs() instead. */ if (!(*(__be32 *)dev->dev_addr == 0 && idev->cnf.addr_gen_mode == IN6_ADDR_GEN_MODE_EUI64)) { addrconf_addr_gen(idev, true); return; } add_v4_addrs(idev); } #endif static void addrconf_init_auto_addrs(struct net_device *dev) { switch (dev->type) { #if IS_ENABLED(CONFIG_IPV6_SIT) case ARPHRD_SIT: addrconf_sit_config(dev); break; #endif #if IS_ENABLED(CONFIG_NET_IPGRE) case ARPHRD_IPGRE: addrconf_gre_config(dev); break; #endif case ARPHRD_LOOPBACK: init_loopback(dev); break; default: addrconf_dev_config(dev); break; } } static int fixup_permanent_addr(struct net *net, struct inet6_dev *idev, struct inet6_ifaddr *ifp) { /* !fib6_node means the host route was removed from the * FIB, for example, if 'lo' device is taken down. In that * case regenerate the host route. */ if (!ifp->rt || !ifp->rt->fib6_node) { struct fib6_info *f6i, *prev; f6i = addrconf_f6i_alloc(net, idev, &ifp->addr, false, GFP_ATOMIC, NULL); if (IS_ERR(f6i)) return PTR_ERR(f6i); /* ifp->rt can be accessed outside of rtnl */ spin_lock(&ifp->lock); prev = ifp->rt; ifp->rt = f6i; spin_unlock(&ifp->lock); fib6_info_release(prev); } if (!(ifp->flags & IFA_F_NOPREFIXROUTE)) { addrconf_prefix_route(&ifp->addr, ifp->prefix_len, ifp->rt_priority, idev->dev, 0, 0, GFP_ATOMIC); } if (ifp->state == INET6_IFADDR_STATE_PREDAD) addrconf_dad_start(ifp); return 0; } static void addrconf_permanent_addr(struct net *net, struct net_device *dev) { struct inet6_ifaddr *ifp, *tmp; struct inet6_dev *idev; idev = __in6_dev_get(dev); if (!idev) return; write_lock_bh(&idev->lock); list_for_each_entry_safe(ifp, tmp, &idev->addr_list, if_list) { if ((ifp->flags & IFA_F_PERMANENT) && fixup_permanent_addr(net, idev, ifp) < 0) { write_unlock_bh(&idev->lock); in6_ifa_hold(ifp); ipv6_del_addr(ifp); write_lock_bh(&idev->lock); net_info_ratelimited("%s: Failed to add prefix route for address %pI6c; dropping\n", idev->dev->name, &ifp->addr); } } write_unlock_bh(&idev->lock); } static int addrconf_notify(struct notifier_block *this, unsigned long event, void *ptr) { struct net_device *dev = netdev_notifier_info_to_dev(ptr); struct netdev_notifier_change_info *change_info; struct netdev_notifier_changeupper_info *info; struct inet6_dev *idev = __in6_dev_get(dev); struct net *net = dev_net(dev); int run_pending = 0; int err; switch (event) { case NETDEV_REGISTER: if (!idev && dev->mtu >= IPV6_MIN_MTU) { idev = ipv6_add_dev(dev); if (IS_ERR(idev)) return notifier_from_errno(PTR_ERR(idev)); } break; case NETDEV_CHANGEMTU: /* if MTU under IPV6_MIN_MTU stop IPv6 on this interface. */ if (dev->mtu < IPV6_MIN_MTU) { addrconf_ifdown(dev, dev != net->loopback_dev); break; } if (idev) { rt6_mtu_change(dev, dev->mtu); WRITE_ONCE(idev->cnf.mtu6, dev->mtu); break; } /* allocate new idev */ idev = ipv6_add_dev(dev); if (IS_ERR(idev)) break; /* device is still not ready */ if (!(idev->if_flags & IF_READY)) break; run_pending = 1; fallthrough; case NETDEV_UP: case NETDEV_CHANGE: if (idev && idev->cnf.disable_ipv6) break; if (dev->priv_flags & IFF_NO_ADDRCONF) { if (event == NETDEV_UP && !IS_ERR_OR_NULL(idev) && dev->flags & IFF_UP && dev->flags & IFF_MULTICAST) ipv6_mc_up(idev); break; } if (event == NETDEV_UP) { /* restore routes for permanent addresses */ addrconf_permanent_addr(net, dev); if (!addrconf_link_ready(dev)) { /* device is not ready yet. */ pr_debug("ADDRCONF(NETDEV_UP): %s: link is not ready\n", dev->name); break; } if (!idev && dev->mtu >= IPV6_MIN_MTU) idev = ipv6_add_dev(dev); if (!IS_ERR_OR_NULL(idev)) { idev->if_flags |= IF_READY; run_pending = 1; } } else if (event == NETDEV_CHANGE) { if (!addrconf_link_ready(dev)) { /* device is still not ready. */ rt6_sync_down_dev(dev, event); break; } if (!IS_ERR_OR_NULL(idev)) { if (idev->if_flags & IF_READY) { /* device is already configured - * but resend MLD reports, we might * have roamed and need to update * multicast snooping switches */ ipv6_mc_up(idev); change_info = ptr; if (change_info->flags_changed & IFF_NOARP) addrconf_dad_run(idev, true); rt6_sync_up(dev, RTNH_F_LINKDOWN); break; } idev->if_flags |= IF_READY; } pr_debug("ADDRCONF(NETDEV_CHANGE): %s: link becomes ready\n", dev->name); run_pending = 1; } addrconf_init_auto_addrs(dev); if (!IS_ERR_OR_NULL(idev)) { if (run_pending) addrconf_dad_run(idev, false); /* Device has an address by now */ rt6_sync_up(dev, RTNH_F_DEAD); /* * If the MTU changed during the interface down, * when the interface up, the changed MTU must be * reflected in the idev as well as routers. */ if (idev->cnf.mtu6 != dev->mtu && dev->mtu >= IPV6_MIN_MTU) { rt6_mtu_change(dev, dev->mtu); WRITE_ONCE(idev->cnf.mtu6, dev->mtu); } WRITE_ONCE(idev->tstamp, jiffies); inet6_ifinfo_notify(RTM_NEWLINK, idev); /* * If the changed mtu during down is lower than * IPV6_MIN_MTU stop IPv6 on this interface. */ if (dev->mtu < IPV6_MIN_MTU) addrconf_ifdown(dev, dev != net->loopback_dev); } break; case NETDEV_DOWN: case NETDEV_UNREGISTER: /* * Remove all addresses from this interface. */ addrconf_ifdown(dev, event != NETDEV_DOWN); break; case NETDEV_CHANGENAME: if (idev) { snmp6_unregister_dev(idev); addrconf_sysctl_unregister(idev); err = addrconf_sysctl_register(idev); if (err) return notifier_from_errno(err); err = snmp6_register_dev(idev); if (err) { addrconf_sysctl_unregister(idev); return notifier_from_errno(err); } } break; case NETDEV_PRE_TYPE_CHANGE: case NETDEV_POST_TYPE_CHANGE: if (idev) addrconf_type_change(dev, event); break; case NETDEV_CHANGEUPPER: info = ptr; /* flush all routes if dev is linked to or unlinked from * an L3 master device (e.g., VRF) */ if (info->upper_dev && netif_is_l3_master(info->upper_dev)) addrconf_ifdown(dev, false); } return NOTIFY_OK; } /* * addrconf module should be notified of a device going up */ static struct notifier_block ipv6_dev_notf = { .notifier_call = addrconf_notify, .priority = ADDRCONF_NOTIFY_PRIORITY, }; static void addrconf_type_change(struct net_device *dev, unsigned long event) { struct inet6_dev *idev; ASSERT_RTNL(); idev = __in6_dev_get(dev); if (event == NETDEV_POST_TYPE_CHANGE) ipv6_mc_remap(idev); else if (event == NETDEV_PRE_TYPE_CHANGE) ipv6_mc_unmap(idev); } static bool addr_is_local(const struct in6_addr *addr) { return ipv6_addr_type(addr) & (IPV6_ADDR_LINKLOCAL | IPV6_ADDR_LOOPBACK); } static int addrconf_ifdown(struct net_device *dev, bool unregister) { unsigned long event = unregister ? NETDEV_UNREGISTER : NETDEV_DOWN; struct net *net = dev_net(dev); struct inet6_dev *idev; struct inet6_ifaddr *ifa; LIST_HEAD(tmp_addr_list); bool keep_addr = false; bool was_ready; int state, i; ASSERT_RTNL(); rt6_disable_ip(dev, event); idev = __in6_dev_get(dev); if (!idev) return -ENODEV; /* * Step 1: remove reference to ipv6 device from parent device. * Do not dev_put! */ if (unregister) { WRITE_ONCE(idev->dead, 1); /* protected by rtnl_lock */ RCU_INIT_POINTER(dev->ip6_ptr, NULL); /* Step 1.5: remove snmp6 entry */ snmp6_unregister_dev(idev); } /* combine the user config with event to determine if permanent * addresses are to be removed from address hash table */ if (!unregister && !idev->cnf.disable_ipv6) { /* aggregate the system setting and interface setting */ int _keep_addr = READ_ONCE(net->ipv6.devconf_all->keep_addr_on_down); if (!_keep_addr) _keep_addr = READ_ONCE(idev->cnf.keep_addr_on_down); keep_addr = (_keep_addr > 0); } /* Step 2: clear hash table */ for (i = 0; i < IN6_ADDR_HSIZE; i++) { struct hlist_head *h = &net->ipv6.inet6_addr_lst[i]; spin_lock_bh(&net->ipv6.addrconf_hash_lock); restart: hlist_for_each_entry_rcu(ifa, h, addr_lst) { if (ifa->idev == idev) { addrconf_del_dad_work(ifa); /* combined flag + permanent flag decide if * address is retained on a down event */ if (!keep_addr || !(ifa->flags & IFA_F_PERMANENT) || addr_is_local(&ifa->addr)) { hlist_del_init_rcu(&ifa->addr_lst); goto restart; } } } spin_unlock_bh(&net->ipv6.addrconf_hash_lock); } write_lock_bh(&idev->lock); addrconf_del_rs_timer(idev); /* Step 2: clear flags for stateless addrconf, repeated down * detection */ was_ready = idev->if_flags & IF_READY; if (!unregister) idev->if_flags &= ~(IF_RS_SENT|IF_RA_RCVD|IF_READY); /* Step 3: clear tempaddr list */ while (!list_empty(&idev->tempaddr_list)) { ifa = list_first_entry(&idev->tempaddr_list, struct inet6_ifaddr, tmp_list); list_del(&ifa->tmp_list); write_unlock_bh(&idev->lock); spin_lock_bh(&ifa->lock); if (ifa->ifpub) { in6_ifa_put(ifa->ifpub); ifa->ifpub = NULL; } spin_unlock_bh(&ifa->lock); in6_ifa_put(ifa); write_lock_bh(&idev->lock); } list_for_each_entry(ifa, &idev->addr_list, if_list) list_add_tail(&ifa->if_list_aux, &tmp_addr_list); write_unlock_bh(&idev->lock); while (!list_empty(&tmp_addr_list)) { struct fib6_info *rt = NULL; bool keep; ifa = list_first_entry(&tmp_addr_list, struct inet6_ifaddr, if_list_aux); list_del(&ifa->if_list_aux); addrconf_del_dad_work(ifa); keep = keep_addr && (ifa->flags & IFA_F_PERMANENT) && !addr_is_local(&ifa->addr); spin_lock_bh(&ifa->lock); if (keep) { /* set state to skip the notifier below */ state = INET6_IFADDR_STATE_DEAD; ifa->state = INET6_IFADDR_STATE_PREDAD; if (!(ifa->flags & IFA_F_NODAD)) ifa->flags |= IFA_F_TENTATIVE; rt = ifa->rt; ifa->rt = NULL; } else { state = ifa->state; ifa->state = INET6_IFADDR_STATE_DEAD; } spin_unlock_bh(&ifa->lock); if (rt) ip6_del_rt(net, rt, false); if (state != INET6_IFADDR_STATE_DEAD) { __ipv6_ifa_notify(RTM_DELADDR, ifa); inet6addr_notifier_call_chain(NETDEV_DOWN, ifa); } else { if (idev->cnf.forwarding) addrconf_leave_anycast(ifa); addrconf_leave_solict(ifa->idev, &ifa->addr); } if (!keep) { write_lock_bh(&idev->lock); list_del_rcu(&ifa->if_list); write_unlock_bh(&idev->lock); in6_ifa_put(ifa); } } /* Step 5: Discard anycast and multicast list */ if (unregister) { ipv6_ac_destroy_dev(idev); ipv6_mc_destroy_dev(idev); } else if (was_ready) { ipv6_mc_down(idev); } WRITE_ONCE(idev->tstamp, jiffies); idev->ra_mtu = 0; /* Last: Shot the device (if unregistered) */ if (unregister) { addrconf_sysctl_unregister(idev); neigh_parms_release(&nd_tbl, idev->nd_parms); neigh_ifdown(&nd_tbl, dev); in6_dev_put(idev); } return 0; } static void addrconf_rs_timer(struct timer_list *t) { struct inet6_dev *idev = timer_container_of(idev, t, rs_timer); struct net_device *dev = idev->dev; struct in6_addr lladdr; int rtr_solicits; write_lock(&idev->lock); if (idev->dead || !(idev->if_flags & IF_READY)) goto out; if (!ipv6_accept_ra(idev)) goto out; /* Announcement received after solicitation was sent */ if (idev->if_flags & IF_RA_RCVD) goto out; rtr_solicits = READ_ONCE(idev->cnf.rtr_solicits); if (idev->rs_probes++ < rtr_solicits || rtr_solicits < 0) { write_unlock(&idev->lock); if (!ipv6_get_lladdr(dev, &lladdr, IFA_F_TENTATIVE)) ndisc_send_rs(dev, &lladdr, &in6addr_linklocal_allrouters); else goto put; write_lock(&idev->lock); idev->rs_interval = rfc3315_s14_backoff_update( idev->rs_interval, READ_ONCE(idev->cnf.rtr_solicit_max_interval)); /* The wait after the last probe can be shorter */ addrconf_mod_rs_timer(idev, (idev->rs_probes == READ_ONCE(idev->cnf.rtr_solicits)) ? READ_ONCE(idev->cnf.rtr_solicit_delay) : idev->rs_interval); } else { /* * Note: we do not support deprecated "all on-link" * assumption any longer. */ pr_debug("%s: no IPv6 routers present\n", idev->dev->name); } out: write_unlock(&idev->lock); put: in6_dev_put(idev); } /* * Duplicate Address Detection */ static void addrconf_dad_kick(struct inet6_ifaddr *ifp) { struct inet6_dev *idev = ifp->idev; unsigned long rand_num; u64 nonce; if (ifp->flags & IFA_F_OPTIMISTIC) rand_num = 0; else rand_num = get_random_u32_below( READ_ONCE(idev->cnf.rtr_solicit_delay) ? : 1); nonce = 0; if (READ_ONCE(idev->cnf.enhanced_dad) || READ_ONCE(dev_net(idev->dev)->ipv6.devconf_all->enhanced_dad)) { do get_random_bytes(&nonce, 6); while (nonce == 0); } ifp->dad_nonce = nonce; ifp->dad_probes = READ_ONCE(idev->cnf.dad_transmits); addrconf_mod_dad_work(ifp, rand_num); } static void addrconf_dad_begin(struct inet6_ifaddr *ifp) { struct inet6_dev *idev = ifp->idev; struct net_device *dev = idev->dev; bool bump_id, notify = false; struct net *net; addrconf_join_solict(dev, &ifp->addr); read_lock_bh(&idev->lock); spin_lock(&ifp->lock); if (ifp->state == INET6_IFADDR_STATE_DEAD) goto out; net = dev_net(dev); if (dev->flags&(IFF_NOARP|IFF_LOOPBACK) || (READ_ONCE(net->ipv6.devconf_all->accept_dad) < 1 && READ_ONCE(idev->cnf.accept_dad) < 1) || !(ifp->flags&IFA_F_TENTATIVE) || ifp->flags & IFA_F_NODAD) { bool send_na = false; if (ifp->flags & IFA_F_TENTATIVE && !(ifp->flags & IFA_F_OPTIMISTIC)) send_na = true; bump_id = ifp->flags & IFA_F_TENTATIVE; ifp->flags &= ~(IFA_F_TENTATIVE|IFA_F_OPTIMISTIC|IFA_F_DADFAILED); spin_unlock(&ifp->lock); read_unlock_bh(&idev->lock); addrconf_dad_completed(ifp, bump_id, send_na); return; } if (!(idev->if_flags & IF_READY)) { spin_unlock(&ifp->lock); read_unlock_bh(&idev->lock); /* * If the device is not ready: * - keep it tentative if it is a permanent address. * - otherwise, kill it. */ in6_ifa_hold(ifp); addrconf_dad_stop(ifp, 0); return; } /* * Optimistic nodes can start receiving * Frames right away */ if (ifp->flags & IFA_F_OPTIMISTIC) { ip6_ins_rt(net, ifp->rt); if (ipv6_use_optimistic_addr(net, idev)) { /* Because optimistic nodes can use this address, * notify listeners. If DAD fails, RTM_DELADDR is sent. */ notify = true; } } addrconf_dad_kick(ifp); out: spin_unlock(&ifp->lock); read_unlock_bh(&idev->lock); if (notify) ipv6_ifa_notify(RTM_NEWADDR, ifp); } static void addrconf_dad_start(struct inet6_ifaddr *ifp) { bool begin_dad = false; spin_lock_bh(&ifp->lock); if (ifp->state != INET6_IFADDR_STATE_DEAD) { ifp->state = INET6_IFADDR_STATE_PREDAD; begin_dad = true; } spin_unlock_bh(&ifp->lock); if (begin_dad) addrconf_mod_dad_work(ifp, 0); } static void addrconf_dad_work(struct work_struct *w) { struct inet6_ifaddr *ifp = container_of(to_delayed_work(w), struct inet6_ifaddr, dad_work); struct inet6_dev *idev = ifp->idev; bool bump_id, disable_ipv6 = false; struct in6_addr mcaddr; struct net *net; enum { DAD_PROCESS, DAD_BEGIN, DAD_ABORT, } action = DAD_PROCESS; net = dev_net(idev->dev); rtnl_net_lock(net); spin_lock_bh(&ifp->lock); if (ifp->state == INET6_IFADDR_STATE_PREDAD) { action = DAD_BEGIN; ifp->state = INET6_IFADDR_STATE_DAD; } else if (ifp->state == INET6_IFADDR_STATE_ERRDAD) { action = DAD_ABORT; ifp->state = INET6_IFADDR_STATE_POSTDAD; if ((READ_ONCE(net->ipv6.devconf_all->accept_dad) > 1 || READ_ONCE(idev->cnf.accept_dad) > 1) && !idev->cnf.disable_ipv6 && !(ifp->flags & IFA_F_STABLE_PRIVACY)) { struct in6_addr addr; addr.s6_addr32[0] = htonl(0xfe800000); addr.s6_addr32[1] = 0; if (!ipv6_generate_eui64(addr.s6_addr + 8, idev->dev) && ipv6_addr_equal(&ifp->addr, &addr)) { /* DAD failed for link-local based on MAC */ WRITE_ONCE(idev->cnf.disable_ipv6, 1); pr_info("%s: IPv6 being disabled!\n", ifp->idev->dev->name); disable_ipv6 = true; } } } spin_unlock_bh(&ifp->lock); if (action == DAD_BEGIN) { addrconf_dad_begin(ifp); goto out; } else if (action == DAD_ABORT) { in6_ifa_hold(ifp); addrconf_dad_stop(ifp, 1); if (disable_ipv6) addrconf_ifdown(idev->dev, false); goto out; } if (!ifp->dad_probes && addrconf_dad_end(ifp)) goto out; write_lock_bh(&idev->lock); if (idev->dead || !(idev->if_flags & IF_READY)) { write_unlock_bh(&idev->lock); goto out; } spin_lock(&ifp->lock); if (ifp->state == INET6_IFADDR_STATE_DEAD) { spin_unlock(&ifp->lock); write_unlock_bh(&idev->lock); goto out; } if (ifp->dad_probes == 0) { bool send_na = false; /* * DAD was successful */ if (ifp->flags & IFA_F_TENTATIVE && !(ifp->flags & IFA_F_OPTIMISTIC)) send_na = true; bump_id = ifp->flags & IFA_F_TENTATIVE; ifp->flags &= ~(IFA_F_TENTATIVE|IFA_F_OPTIMISTIC|IFA_F_DADFAILED); spin_unlock(&ifp->lock); write_unlock_bh(&idev->lock); addrconf_dad_completed(ifp, bump_id, send_na); goto out; } ifp->dad_probes--; addrconf_mod_dad_work(ifp, max(NEIGH_VAR(ifp->idev->nd_parms, RETRANS_TIME), HZ/100)); spin_unlock(&ifp->lock); write_unlock_bh(&idev->lock); /* send a neighbour solicitation for our addr */ addrconf_addr_solict_mult(&ifp->addr, &mcaddr); ndisc_send_ns(ifp->idev->dev, &ifp->addr, &mcaddr, &in6addr_any, ifp->dad_nonce); out: in6_ifa_put(ifp); rtnl_net_unlock(net); } /* ifp->idev must be at least read locked */ static bool ipv6_lonely_lladdr(struct inet6_ifaddr *ifp) { struct inet6_ifaddr *ifpiter; struct inet6_dev *idev = ifp->idev; list_for_each_entry_reverse(ifpiter, &idev->addr_list, if_list) { if (ifpiter->scope > IFA_LINK) break; if (ifp != ifpiter && ifpiter->scope == IFA_LINK && (ifpiter->flags & (IFA_F_PERMANENT|IFA_F_TENTATIVE| IFA_F_OPTIMISTIC|IFA_F_DADFAILED)) == IFA_F_PERMANENT) return false; } return true; } static void addrconf_dad_completed(struct inet6_ifaddr *ifp, bool bump_id, bool send_na) { struct net_device *dev = ifp->idev->dev; struct in6_addr lladdr; bool send_rs, send_mld; addrconf_del_dad_work(ifp); /* * Configure the address for reception. Now it is valid. */ ipv6_ifa_notify(RTM_NEWADDR, ifp); /* If added prefix is link local and we are prepared to process router advertisements, start sending router solicitations. */ read_lock_bh(&ifp->idev->lock); send_mld = ifp->scope == IFA_LINK && ipv6_lonely_lladdr(ifp); send_rs = send_mld && ipv6_accept_ra(ifp->idev) && READ_ONCE(ifp->idev->cnf.rtr_solicits) != 0 && (dev->flags & IFF_LOOPBACK) == 0 && (dev->type != ARPHRD_TUNNEL) && !netif_is_team_port(dev); read_unlock_bh(&ifp->idev->lock); /* While dad is in progress mld report's source address is in6_addrany. * Resend with proper ll now. */ if (send_mld) ipv6_mc_dad_complete(ifp->idev); /* send unsolicited NA if enabled */ if (send_na && (READ_ONCE(ifp->idev->cnf.ndisc_notify) || READ_ONCE(dev_net(dev)->ipv6.devconf_all->ndisc_notify))) { ndisc_send_na(dev, &in6addr_linklocal_allnodes, &ifp->addr, /*router=*/ !!ifp->idev->cnf.forwarding, /*solicited=*/ false, /*override=*/ true, /*inc_opt=*/ true); } if (send_rs) { /* * If a host as already performed a random delay * [...] as part of DAD [...] there is no need * to delay again before sending the first RS */ if (ipv6_get_lladdr(dev, &lladdr, IFA_F_TENTATIVE)) return; ndisc_send_rs(dev, &lladdr, &in6addr_linklocal_allrouters); write_lock_bh(&ifp->idev->lock); spin_lock(&ifp->lock); ifp->idev->rs_interval = rfc3315_s14_backoff_init( READ_ONCE(ifp->idev->cnf.rtr_solicit_interval)); ifp->idev->rs_probes = 1; ifp->idev->if_flags |= IF_RS_SENT; addrconf_mod_rs_timer(ifp->idev, ifp->idev->rs_interval); spin_unlock(&ifp->lock); write_unlock_bh(&ifp->idev->lock); } if (bump_id) rt_genid_bump_ipv6(dev_net(dev)); /* Make sure that a new temporary address will be created * before this temporary address becomes deprecated. */ if (ifp->flags & IFA_F_TEMPORARY) addrconf_verify_rtnl(dev_net(dev)); } static void addrconf_dad_run(struct inet6_dev *idev, bool restart) { struct inet6_ifaddr *ifp; read_lock_bh(&idev->lock); list_for_each_entry(ifp, &idev->addr_list, if_list) { spin_lock(&ifp->lock); if ((ifp->flags & IFA_F_TENTATIVE && ifp->state == INET6_IFADDR_STATE_DAD) || restart) { if (restart) ifp->state = INET6_IFADDR_STATE_PREDAD; addrconf_dad_kick(ifp); } spin_unlock(&ifp->lock); } read_unlock_bh(&idev->lock); } #ifdef CONFIG_PROC_FS struct if6_iter_state { struct seq_net_private p; int bucket; int offset; }; static struct inet6_ifaddr *if6_get_first(struct seq_file *seq, loff_t pos) { struct if6_iter_state *state = seq->private; struct net *net = seq_file_net(seq); struct inet6_ifaddr *ifa = NULL; int p = 0; /* initial bucket if pos is 0 */ if (pos == 0) { state->bucket = 0; state->offset = 0; } for (; state->bucket < IN6_ADDR_HSIZE; ++state->bucket) { hlist_for_each_entry_rcu(ifa, &net->ipv6.inet6_addr_lst[state->bucket], addr_lst) { /* sync with offset */ if (p < state->offset) { p++; continue; } return ifa; } /* prepare for next bucket */ state->offset = 0; p = 0; } return NULL; } static struct inet6_ifaddr *if6_get_next(struct seq_file *seq, struct inet6_ifaddr *ifa) { struct if6_iter_state *state = seq->private; struct net *net = seq_file_net(seq); hlist_for_each_entry_continue_rcu(ifa, addr_lst) { state->offset++; return ifa; } state->offset = 0; while (++state->bucket < IN6_ADDR_HSIZE) { hlist_for_each_entry_rcu(ifa, &net->ipv6.inet6_addr_lst[state->bucket], addr_lst) { return ifa; } } return NULL; } static void *if6_seq_start(struct seq_file *seq, loff_t *pos) __acquires(rcu) { rcu_read_lock(); return if6_get_first(seq, *pos); } static void *if6_seq_next(struct seq_file *seq, void *v, loff_t *pos) { struct inet6_ifaddr *ifa; ifa = if6_get_next(seq, v); ++*pos; return ifa; } static void if6_seq_stop(struct seq_file *seq, void *v) __releases(rcu) { rcu_read_unlock(); } static int if6_seq_show(struct seq_file *seq, void *v) { struct inet6_ifaddr *ifp = (struct inet6_ifaddr *)v; seq_printf(seq, "%pi6 %02x %02x %02x %02x %8s\n", &ifp->addr, ifp->idev->dev->ifindex, ifp->prefix_len, ifp->scope, (u8) ifp->flags, ifp->idev->dev->name); return 0; } static const struct seq_operations if6_seq_ops = { .start = if6_seq_start, .next = if6_seq_next, .show = if6_seq_show, .stop = if6_seq_stop, }; static int __net_init if6_proc_net_init(struct net *net) { if (!proc_create_net("if_inet6", 0444, net->proc_net, &if6_seq_ops, sizeof(struct if6_iter_state))) return -ENOMEM; return 0; } static void __net_exit if6_proc_net_exit(struct net *net) { remove_proc_entry("if_inet6", net->proc_net); } static struct pernet_operations if6_proc_net_ops = { .init = if6_proc_net_init, .exit = if6_proc_net_exit, }; int __init if6_proc_init(void) { return register_pernet_subsys(&if6_proc_net_ops); } void if6_proc_exit(void) { unregister_pernet_subsys(&if6_proc_net_ops); } #endif /* CONFIG_PROC_FS */ #if IS_ENABLED(CONFIG_IPV6_MIP6) /* Check if address is a home address configured on any interface. */ int ipv6_chk_home_addr(struct net *net, const struct in6_addr *addr) { unsigned int hash = inet6_addr_hash(net, addr); struct inet6_ifaddr *ifp = NULL; int ret = 0; rcu_read_lock(); hlist_for_each_entry_rcu(ifp, &net->ipv6.inet6_addr_lst[hash], addr_lst) { if (ipv6_addr_equal(&ifp->addr, addr) && (ifp->flags & IFA_F_HOMEADDRESS)) { ret = 1; break; } } rcu_read_unlock(); return ret; } #endif /* RFC6554 has some algorithm to avoid loops in segment routing by * checking if the segments contains any of a local interface address. * * Quote: * * To detect loops in the SRH, a router MUST determine if the SRH * includes multiple addresses assigned to any interface on that router. * If such addresses appear more than once and are separated by at least * one address not assigned to that router. */ int ipv6_chk_rpl_srh_loop(struct net *net, const struct in6_addr *segs, unsigned char nsegs) { const struct in6_addr *addr; int i, ret = 0, found = 0; struct inet6_ifaddr *ifp; bool separated = false; unsigned int hash; bool hash_found; rcu_read_lock(); for (i = 0; i < nsegs; i++) { addr = &segs[i]; hash = inet6_addr_hash(net, addr); hash_found = false; hlist_for_each_entry_rcu(ifp, &net->ipv6.inet6_addr_lst[hash], addr_lst) { if (ipv6_addr_equal(&ifp->addr, addr)) { hash_found = true; break; } } if (hash_found) { if (found > 1 && separated) { ret = 1; break; } separated = false; found++; } else { separated = true; } } rcu_read_unlock(); return ret; } /* * Periodic address status verification */ static void addrconf_verify_rtnl(struct net *net) { unsigned long now, next, next_sec, next_sched; struct inet6_ifaddr *ifp; int i; ASSERT_RTNL(); rcu_read_lock_bh(); now = jiffies; next = round_jiffies_up(now + ADDR_CHECK_FREQUENCY); cancel_delayed_work(&net->ipv6.addr_chk_work); for (i = 0; i < IN6_ADDR_HSIZE; i++) { restart: hlist_for_each_entry_rcu_bh(ifp, &net->ipv6.inet6_addr_lst[i], addr_lst) { unsigned long age; /* When setting preferred_lft to a value not zero or * infinity, while valid_lft is infinity * IFA_F_PERMANENT has a non-infinity life time. */ if ((ifp->flags & IFA_F_PERMANENT) && (ifp->prefered_lft == INFINITY_LIFE_TIME)) continue; spin_lock(&ifp->lock); /* We try to batch several events at once. */ age = (now - ifp->tstamp + ADDRCONF_TIMER_FUZZ_MINUS) / HZ; if ((ifp->flags&IFA_F_TEMPORARY) && !(ifp->flags&IFA_F_TENTATIVE) && ifp->prefered_lft != INFINITY_LIFE_TIME && !ifp->regen_count && ifp->ifpub) { /* This is a non-regenerated temporary addr. */ unsigned long regen_advance = ipv6_get_regen_advance(ifp->idev); if (age + regen_advance >= ifp->prefered_lft) { struct inet6_ifaddr *ifpub = ifp->ifpub; if (time_before(ifp->tstamp + ifp->prefered_lft * HZ, next)) next = ifp->tstamp + ifp->prefered_lft * HZ; ifp->regen_count++; in6_ifa_hold(ifp); in6_ifa_hold(ifpub); spin_unlock(&ifp->lock); spin_lock(&ifpub->lock); ifpub->regen_count = 0; spin_unlock(&ifpub->lock); rcu_read_unlock_bh(); ipv6_create_tempaddr(ifpub, true); in6_ifa_put(ifpub); in6_ifa_put(ifp); rcu_read_lock_bh(); goto restart; } else if (time_before(ifp->tstamp + ifp->prefered_lft * HZ - regen_advance * HZ, next)) next = ifp->tstamp + ifp->prefered_lft * HZ - regen_advance * HZ; } if (ifp->valid_lft != INFINITY_LIFE_TIME && age >= ifp->valid_lft) { spin_unlock(&ifp->lock); in6_ifa_hold(ifp); rcu_read_unlock_bh(); ipv6_del_addr(ifp); rcu_read_lock_bh(); goto restart; } else if (ifp->prefered_lft == INFINITY_LIFE_TIME) { spin_unlock(&ifp->lock); continue; } else if (age >= ifp->prefered_lft) { /* jiffies - ifp->tstamp > age >= ifp->prefered_lft */ int deprecate = 0; if (!(ifp->flags&IFA_F_DEPRECATED)) { deprecate = 1; ifp->flags |= IFA_F_DEPRECATED; } if ((ifp->valid_lft != INFINITY_LIFE_TIME) && (time_before(ifp->tstamp + ifp->valid_lft * HZ, next))) next = ifp->tstamp + ifp->valid_lft * HZ; spin_unlock(&ifp->lock); if (deprecate) { in6_ifa_hold(ifp); ipv6_ifa_notify(0, ifp); in6_ifa_put(ifp); goto restart; } } else { /* ifp->prefered_lft <= ifp->valid_lft */ if (time_before(ifp->tstamp + ifp->prefered_lft * HZ, next)) next = ifp->tstamp + ifp->prefered_lft * HZ; spin_unlock(&ifp->lock); } } } next_sec = round_jiffies_up(next); next_sched = next; /* If rounded timeout is accurate enough, accept it. */ if (time_before(next_sec, next + ADDRCONF_TIMER_FUZZ)) next_sched = next_sec; /* And minimum interval is ADDRCONF_TIMER_FUZZ_MAX. */ if (time_before(next_sched, jiffies + ADDRCONF_TIMER_FUZZ_MAX)) next_sched = jiffies + ADDRCONF_TIMER_FUZZ_MAX; pr_debug("now = %lu, schedule = %lu, rounded schedule = %lu => %lu\n", now, next, next_sec, next_sched); mod_delayed_work(addrconf_wq, &net->ipv6.addr_chk_work, next_sched - now); rcu_read_unlock_bh(); } static void addrconf_verify_work(struct work_struct *w) { struct net *net = container_of(to_delayed_work(w), struct net, ipv6.addr_chk_work); rtnl_net_lock(net); addrconf_verify_rtnl(net); rtnl_net_unlock(net); } static void addrconf_verify(struct net *net) { mod_delayed_work(addrconf_wq, &net->ipv6.addr_chk_work, 0); } static struct in6_addr *extract_addr(struct nlattr *addr, struct nlattr *local, struct in6_addr **peer_pfx) { struct in6_addr *pfx = NULL; *peer_pfx = NULL; if (addr) pfx = nla_data(addr); if (local) { if (pfx && nla_memcmp(local, pfx, sizeof(*pfx))) *peer_pfx = pfx; pfx = nla_data(local); } return pfx; } static const struct nla_policy ifa_ipv6_policy[IFA_MAX+1] = { [IFA_ADDRESS] = { .len = sizeof(struct in6_addr) }, [IFA_LOCAL] = { .len = sizeof(struct in6_addr) }, [IFA_CACHEINFO] = { .len = sizeof(struct ifa_cacheinfo) }, [IFA_FLAGS] = { .len = sizeof(u32) }, [IFA_RT_PRIORITY] = { .len = sizeof(u32) }, [IFA_TARGET_NETNSID] = { .type = NLA_S32 }, [IFA_PROTO] = { .type = NLA_U8 }, }; static int inet6_rtm_deladdr(struct sk_buff *skb, struct nlmsghdr *nlh, struct netlink_ext_ack *extack) { struct net *net = sock_net(skb->sk); struct ifaddrmsg *ifm; struct nlattr *tb[IFA_MAX+1]; struct in6_addr *pfx, *peer_pfx; u32 ifa_flags; int err; err = nlmsg_parse_deprecated(nlh, sizeof(*ifm), tb, IFA_MAX, ifa_ipv6_policy, extack); if (err < 0) return err; ifm = nlmsg_data(nlh); pfx = extract_addr(tb[IFA_ADDRESS], tb[IFA_LOCAL], &peer_pfx); if (!pfx) return -EINVAL; ifa_flags = nla_get_u32_default(tb[IFA_FLAGS], ifm->ifa_flags); /* We ignore other flags so far. */ ifa_flags &= IFA_F_MANAGETEMPADDR; rtnl_net_lock(net); err = inet6_addr_del(net, ifm->ifa_index, ifa_flags, pfx, ifm->ifa_prefixlen, extack); rtnl_net_unlock(net); return err; } static int modify_prefix_route(struct net *net, struct inet6_ifaddr *ifp, unsigned long expires, u32 flags, bool modify_peer) { struct fib6_table *table; struct fib6_info *f6i; u32 prio; f6i = addrconf_get_prefix_route(modify_peer ? &ifp->peer_addr : &ifp->addr, ifp->prefix_len, ifp->idev->dev, 0, RTF_DEFAULT, true); if (!f6i) return -ENOENT; prio = ifp->rt_priority ? : IP6_RT_PRIO_ADDRCONF; if (f6i->fib6_metric != prio) { /* delete old one */ ip6_del_rt(dev_net(ifp->idev->dev), f6i, false); /* add new one */ addrconf_prefix_route(modify_peer ? &ifp->peer_addr : &ifp->addr, ifp->prefix_len, ifp->rt_priority, ifp->idev->dev, expires, flags, GFP_KERNEL); return 0; } if (f6i != net->ipv6.fib6_null_entry) { table = f6i->fib6_table; spin_lock_bh(&table->tb6_lock); if (!(flags & RTF_EXPIRES)) { fib6_clean_expires(f6i); fib6_remove_gc_list(f6i); } else { fib6_set_expires(f6i, expires); fib6_add_gc_list(f6i); } spin_unlock_bh(&table->tb6_lock); } fib6_info_release(f6i); return 0; } static int inet6_addr_modify(struct net *net, struct inet6_ifaddr *ifp, struct ifa6_config *cfg, clock_t expires, u32 flags) { bool was_managetempaddr; bool new_peer = false; bool had_prefixroute; ASSERT_RTNL_NET(net); if (cfg->ifa_flags & IFA_F_MANAGETEMPADDR && (ifp->flags & IFA_F_TEMPORARY || ifp->prefix_len != 64)) return -EINVAL; if (!(ifp->flags & IFA_F_TENTATIVE) || ifp->flags & IFA_F_DADFAILED) cfg->ifa_flags &= ~IFA_F_OPTIMISTIC; if (cfg->peer_pfx && memcmp(&ifp->peer_addr, cfg->peer_pfx, sizeof(struct in6_addr))) { if (!ipv6_addr_any(&ifp->peer_addr)) cleanup_prefix_route(ifp, expires, true, true); new_peer = true; } spin_lock_bh(&ifp->lock); was_managetempaddr = ifp->flags & IFA_F_MANAGETEMPADDR; had_prefixroute = ifp->flags & IFA_F_PERMANENT && !(ifp->flags & IFA_F_NOPREFIXROUTE); ifp->flags &= ~(IFA_F_DEPRECATED | IFA_F_PERMANENT | IFA_F_NODAD | IFA_F_HOMEADDRESS | IFA_F_MANAGETEMPADDR | IFA_F_NOPREFIXROUTE); ifp->flags |= cfg->ifa_flags; WRITE_ONCE(ifp->tstamp, jiffies); WRITE_ONCE(ifp->valid_lft, cfg->valid_lft); WRITE_ONCE(ifp->prefered_lft, cfg->preferred_lft); WRITE_ONCE(ifp->ifa_proto, cfg->ifa_proto); if (cfg->rt_priority && cfg->rt_priority != ifp->rt_priority) WRITE_ONCE(ifp->rt_priority, cfg->rt_priority); if (new_peer) ifp->peer_addr = *cfg->peer_pfx; spin_unlock_bh(&ifp->lock); if (!(ifp->flags&IFA_F_TENTATIVE)) ipv6_ifa_notify(0, ifp); if (!(cfg->ifa_flags & IFA_F_NOPREFIXROUTE)) { int rc = -ENOENT; if (had_prefixroute) rc = modify_prefix_route(net, ifp, expires, flags, false); /* prefix route could have been deleted; if so restore it */ if (rc == -ENOENT) { addrconf_prefix_route(&ifp->addr, ifp->prefix_len, ifp->rt_priority, ifp->idev->dev, expires, flags, GFP_KERNEL); } if (had_prefixroute && !ipv6_addr_any(&ifp->peer_addr)) rc = modify_prefix_route(net, ifp, expires, flags, true); if (rc == -ENOENT && !ipv6_addr_any(&ifp->peer_addr)) { addrconf_prefix_route(&ifp->peer_addr, ifp->prefix_len, ifp->rt_priority, ifp->idev->dev, expires, flags, GFP_KERNEL); } } else if (had_prefixroute) { enum cleanup_prefix_rt_t action; unsigned long rt_expires; write_lock_bh(&ifp->idev->lock); action = check_cleanup_prefix_route(ifp, &rt_expires); write_unlock_bh(&ifp->idev->lock); if (action != CLEANUP_PREFIX_RT_NOP) { cleanup_prefix_route(ifp, rt_expires, action == CLEANUP_PREFIX_RT_DEL, false); } } if (was_managetempaddr || ifp->flags & IFA_F_MANAGETEMPADDR) { if (was_managetempaddr && !(ifp->flags & IFA_F_MANAGETEMPADDR)) delete_tempaddrs(ifp->idev, ifp); else manage_tempaddrs(ifp->idev, ifp, cfg->valid_lft, cfg->preferred_lft, !was_managetempaddr, jiffies); } addrconf_verify_rtnl(net); return 0; } static int inet6_rtm_newaddr(struct sk_buff *skb, struct nlmsghdr *nlh, struct netlink_ext_ack *extack) { struct net *net = sock_net(skb->sk); struct nlattr *tb[IFA_MAX+1]; struct in6_addr *peer_pfx; struct inet6_ifaddr *ifa; struct net_device *dev; struct inet6_dev *idev; struct ifa6_config cfg; struct ifaddrmsg *ifm; unsigned long timeout; clock_t expires; u32 flags; int err; err = nlmsg_parse_deprecated(nlh, sizeof(*ifm), tb, IFA_MAX, ifa_ipv6_policy, extack); if (err < 0) return err; memset(&cfg, 0, sizeof(cfg)); ifm = nlmsg_data(nlh); cfg.pfx = extract_addr(tb[IFA_ADDRESS], tb[IFA_LOCAL], &peer_pfx); if (!cfg.pfx) return -EINVAL; cfg.peer_pfx = peer_pfx; cfg.plen = ifm->ifa_prefixlen; if (tb[IFA_RT_PRIORITY]) cfg.rt_priority = nla_get_u32(tb[IFA_RT_PRIORITY]); if (tb[IFA_PROTO]) cfg.ifa_proto = nla_get_u8(tb[IFA_PROTO]); cfg.ifa_flags = nla_get_u32_default(tb[IFA_FLAGS], ifm->ifa_flags); /* We ignore other flags so far. */ cfg.ifa_flags &= IFA_F_NODAD | IFA_F_HOMEADDRESS | IFA_F_MANAGETEMPADDR | IFA_F_NOPREFIXROUTE | IFA_F_MCAUTOJOIN | IFA_F_OPTIMISTIC; cfg.ifa_flags |= IFA_F_PERMANENT; cfg.valid_lft = INFINITY_LIFE_TIME; cfg.preferred_lft = INFINITY_LIFE_TIME; expires = 0; flags = 0; if (tb[IFA_CACHEINFO]) { struct ifa_cacheinfo *ci; ci = nla_data(tb[IFA_CACHEINFO]); cfg.valid_lft = ci->ifa_valid; cfg.preferred_lft = ci->ifa_prefered; if (!cfg.valid_lft || cfg.preferred_lft > cfg.valid_lft) { NL_SET_ERR_MSG_MOD(extack, "address lifetime invalid"); return -EINVAL; } timeout = addrconf_timeout_fixup(cfg.valid_lft, HZ); if (addrconf_finite_timeout(timeout)) { cfg.ifa_flags &= ~IFA_F_PERMANENT; cfg.valid_lft = timeout; expires = jiffies_to_clock_t(timeout * HZ); flags = RTF_EXPIRES; } timeout = addrconf_timeout_fixup(cfg.preferred_lft, HZ); if (addrconf_finite_timeout(timeout)) { if (timeout == 0) cfg.ifa_flags |= IFA_F_DEPRECATED; cfg.preferred_lft = timeout; } } rtnl_net_lock(net); dev = __dev_get_by_index(net, ifm->ifa_index); if (!dev) { NL_SET_ERR_MSG_MOD(extack, "Unable to find the interface"); err = -ENODEV; goto unlock_rtnl; } netdev_lock_ops(dev); idev = ipv6_find_idev(dev); if (IS_ERR(idev)) { err = PTR_ERR(idev); goto unlock; } if (!ipv6_allow_optimistic_dad(net, idev)) cfg.ifa_flags &= ~IFA_F_OPTIMISTIC; if (cfg.ifa_flags & IFA_F_NODAD && cfg.ifa_flags & IFA_F_OPTIMISTIC) { NL_SET_ERR_MSG(extack, "IFA_F_NODAD and IFA_F_OPTIMISTIC are mutually exclusive"); err = -EINVAL; goto unlock; } ifa = ipv6_get_ifaddr(net, cfg.pfx, dev, 1); if (!ifa) { /* * It would be best to check for !NLM_F_CREATE here but * userspace already relies on not having to provide this. */ err = inet6_addr_add(net, dev, &cfg, expires, flags, extack); goto unlock; } if (nlh->nlmsg_flags & NLM_F_EXCL || !(nlh->nlmsg_flags & NLM_F_REPLACE)) { NL_SET_ERR_MSG_MOD(extack, "address already assigned"); err = -EEXIST; } else { err = inet6_addr_modify(net, ifa, &cfg, expires, flags); } in6_ifa_put(ifa); unlock: netdev_unlock_ops(dev); unlock_rtnl: rtnl_net_unlock(net); return err; } static void put_ifaddrmsg(struct nlmsghdr *nlh, u8 prefixlen, u32 flags, u8 scope, int ifindex) { struct ifaddrmsg *ifm; ifm = nlmsg_data(nlh); ifm->ifa_family = AF_INET6; ifm->ifa_prefixlen = prefixlen; ifm->ifa_flags = flags; ifm->ifa_scope = scope; ifm->ifa_index = ifindex; } static int put_cacheinfo(struct sk_buff *skb, unsigned long cstamp, unsigned long tstamp, u32 preferred, u32 valid) { struct ifa_cacheinfo ci; ci.cstamp = cstamp_delta(cstamp); ci.tstamp = cstamp_delta(tstamp); ci.ifa_prefered = preferred; ci.ifa_valid = valid; return nla_put(skb, IFA_CACHEINFO, sizeof(ci), &ci); } static inline int rt_scope(int ifa_scope) { if (ifa_scope & IFA_HOST) return RT_SCOPE_HOST; else if (ifa_scope & IFA_LINK) return RT_SCOPE_LINK; else if (ifa_scope & IFA_SITE) return RT_SCOPE_SITE; else return RT_SCOPE_UNIVERSE; } static inline int inet6_ifaddr_msgsize(void) { return NLMSG_ALIGN(sizeof(struct ifaddrmsg)) + nla_total_size(16) /* IFA_LOCAL */ + nla_total_size(16) /* IFA_ADDRESS */ + nla_total_size(sizeof(struct ifa_cacheinfo)) + nla_total_size(4) /* IFA_FLAGS */ + nla_total_size(1) /* IFA_PROTO */ + nla_total_size(4) /* IFA_RT_PRIORITY */; } static int inet6_fill_ifaddr(struct sk_buff *skb, const struct inet6_ifaddr *ifa, struct inet6_fill_args *args) { struct nlmsghdr *nlh; u32 preferred, valid; u32 flags, priority; u8 proto; nlh = nlmsg_put(skb, args->portid, args->seq, args->event, sizeof(struct ifaddrmsg), args->flags); if (!nlh) return -EMSGSIZE; flags = READ_ONCE(ifa->flags); put_ifaddrmsg(nlh, ifa->prefix_len, ifa->flags, rt_scope(ifa->scope), ifa->idev->dev->ifindex); if (args->netnsid >= 0 && nla_put_s32(skb, IFA_TARGET_NETNSID, args->netnsid)) goto error; preferred = READ_ONCE(ifa->prefered_lft); valid = READ_ONCE(ifa->valid_lft); if (!((flags & IFA_F_PERMANENT) && (preferred == INFINITY_LIFE_TIME))) { if (preferred != INFINITY_LIFE_TIME) { long tval = (jiffies - READ_ONCE(ifa->tstamp)) / HZ; if (preferred > tval) preferred -= tval; else preferred = 0; if (valid != INFINITY_LIFE_TIME) { if (valid > tval) valid -= tval; else valid = 0; } } } else { preferred = INFINITY_LIFE_TIME; valid = INFINITY_LIFE_TIME; } if (!ipv6_addr_any(&ifa->peer_addr)) { if (nla_put_in6_addr(skb, IFA_LOCAL, &ifa->addr) < 0 || nla_put_in6_addr(skb, IFA_ADDRESS, &ifa->peer_addr) < 0) goto error; } else { if (nla_put_in6_addr(skb, IFA_ADDRESS, &ifa->addr) < 0) goto error; } priority = READ_ONCE(ifa->rt_priority); if (priority && nla_put_u32(skb, IFA_RT_PRIORITY, priority)) goto error; if (put_cacheinfo(skb, ifa->cstamp, READ_ONCE(ifa->tstamp), preferred, valid) < 0) goto error; if (nla_put_u32(skb, IFA_FLAGS, flags) < 0) goto error; proto = READ_ONCE(ifa->ifa_proto); if (proto && nla_put_u8(skb, IFA_PROTO, proto)) goto error; nlmsg_end(skb, nlh); return 0; error: nlmsg_cancel(skb, nlh); return -EMSGSIZE; } int inet6_fill_ifmcaddr(struct sk_buff *skb, const struct ifmcaddr6 *ifmca, struct inet6_fill_args *args) { int ifindex = ifmca->idev->dev->ifindex; u8 scope = RT_SCOPE_UNIVERSE; struct nlmsghdr *nlh; if (!args->force_rt_scope_universe && ipv6_addr_scope(&ifmca->mca_addr) & IFA_SITE) scope = RT_SCOPE_SITE; nlh = nlmsg_put(skb, args->portid, args->seq, args->event, sizeof(struct ifaddrmsg), args->flags); if (!nlh) return -EMSGSIZE; if (args->netnsid >= 0 && nla_put_s32(skb, IFA_TARGET_NETNSID, args->netnsid)) { nlmsg_cancel(skb, nlh); return -EMSGSIZE; } put_ifaddrmsg(nlh, 128, IFA_F_PERMANENT, scope, ifindex); if (nla_put_in6_addr(skb, IFA_MULTICAST, &ifmca->mca_addr) < 0 || put_cacheinfo(skb, ifmca->mca_cstamp, READ_ONCE(ifmca->mca_tstamp), INFINITY_LIFE_TIME, INFINITY_LIFE_TIME) < 0) { nlmsg_cancel(skb, nlh); return -EMSGSIZE; } nlmsg_end(skb, nlh); return 0; } int inet6_fill_ifacaddr(struct sk_buff *skb, const struct ifacaddr6 *ifaca, struct inet6_fill_args *args) { struct net_device *dev = fib6_info_nh_dev(ifaca->aca_rt); int ifindex = dev ? dev->ifindex : 1; u8 scope = RT_SCOPE_UNIVERSE; struct nlmsghdr *nlh; if (ipv6_addr_scope(&ifaca->aca_addr) & IFA_SITE) scope = RT_SCOPE_SITE; nlh = nlmsg_put(skb, args->portid, args->seq, args->event, sizeof(struct ifaddrmsg), args->flags); if (!nlh) return -EMSGSIZE; if (args->netnsid >= 0 && nla_put_s32(skb, IFA_TARGET_NETNSID, args->netnsid)) { nlmsg_cancel(skb, nlh); return -EMSGSIZE; } put_ifaddrmsg(nlh, 128, IFA_F_PERMANENT, scope, ifindex); if (nla_put_in6_addr(skb, IFA_ANYCAST, &ifaca->aca_addr) < 0 || put_cacheinfo(skb, ifaca->aca_cstamp, READ_ONCE(ifaca->aca_tstamp), INFINITY_LIFE_TIME, INFINITY_LIFE_TIME) < 0) { nlmsg_cancel(skb, nlh); return -EMSGSIZE; } nlmsg_end(skb, nlh); return 0; } /* called with rcu_read_lock() */ static int in6_dump_addrs(const struct inet6_dev *idev, struct sk_buff *skb, struct netlink_callback *cb, int *s_ip_idx, struct inet6_fill_args *fillargs) { const struct ifmcaddr6 *ifmca; const struct ifacaddr6 *ifaca; int ip_idx = 0; int err = 0; switch (fillargs->type) { case UNICAST_ADDR: { const struct inet6_ifaddr *ifa; fillargs->event = RTM_NEWADDR; /* unicast address incl. temp addr */ list_for_each_entry_rcu(ifa, &idev->addr_list, if_list) { if (ip_idx < *s_ip_idx) goto next; err = inet6_fill_ifaddr(skb, ifa, fillargs); if (err < 0) break; nl_dump_check_consistent(cb, nlmsg_hdr(skb)); next: ip_idx++; } break; } case MULTICAST_ADDR: fillargs->event = RTM_GETMULTICAST; /* multicast address */ for (ifmca = rcu_dereference(idev->mc_list); ifmca; ifmca = rcu_dereference(ifmca->next), ip_idx++) { if (ip_idx < *s_ip_idx) continue; err = inet6_fill_ifmcaddr(skb, ifmca, fillargs); if (err < 0) break; } break; case ANYCAST_ADDR: fillargs->event = RTM_GETANYCAST; /* anycast address */ for (ifaca = rcu_dereference(idev->ac_list); ifaca; ifaca = rcu_dereference(ifaca->aca_next), ip_idx++) { if (ip_idx < *s_ip_idx) continue; err = inet6_fill_ifacaddr(skb, ifaca, fillargs); if (err < 0) break; } break; default: break; } *s_ip_idx = err ? ip_idx : 0; return err; } static int inet6_valid_dump_ifaddr_req(const struct nlmsghdr *nlh, struct inet6_fill_args *fillargs, struct net **tgt_net, struct sock *sk, struct netlink_callback *cb) { struct netlink_ext_ack *extack = cb->extack; struct nlattr *tb[IFA_MAX+1]; struct ifaddrmsg *ifm; int err, i; ifm = nlmsg_payload(nlh, sizeof(*ifm)); if (!ifm) { NL_SET_ERR_MSG_MOD(extack, "Invalid header for address dump request"); return -EINVAL; } if (ifm->ifa_prefixlen || ifm->ifa_flags || ifm->ifa_scope) { NL_SET_ERR_MSG_MOD(extack, "Invalid values in header for address dump request"); return -EINVAL; } fillargs->ifindex = ifm->ifa_index; if (fillargs->ifindex) { cb->answer_flags |= NLM_F_DUMP_FILTERED; fillargs->flags |= NLM_F_DUMP_FILTERED; } err = nlmsg_parse_deprecated_strict(nlh, sizeof(*ifm), tb, IFA_MAX, ifa_ipv6_policy, extack); if (err < 0) return err; for (i = 0; i <= IFA_MAX; ++i) { if (!tb[i]) continue; if (i == IFA_TARGET_NETNSID) { struct net *net; fillargs->netnsid = nla_get_s32(tb[i]); net = rtnl_get_net_ns_capable(sk, fillargs->netnsid); if (IS_ERR(net)) { fillargs->netnsid = -1; NL_SET_ERR_MSG_MOD(extack, "Invalid target network namespace id"); return PTR_ERR(net); } *tgt_net = net; } else { NL_SET_ERR_MSG_MOD(extack, "Unsupported attribute in dump request"); return -EINVAL; } } return 0; } static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb, enum addr_type_t type) { struct net *tgt_net = sock_net(skb->sk); const struct nlmsghdr *nlh = cb->nlh; struct inet6_fill_args fillargs = { .portid = NETLINK_CB(cb->skb).portid, .seq = cb->nlh->nlmsg_seq, .flags = NLM_F_MULTI, .netnsid = -1, .type = type, .force_rt_scope_universe = false, }; struct { unsigned long ifindex; int ip_idx; } *ctx = (void *)cb->ctx; struct net_device *dev; struct inet6_dev *idev; int err = 0; rcu_read_lock(); if (cb->strict_check) { err = inet6_valid_dump_ifaddr_req(nlh, &fillargs, &tgt_net, skb->sk, cb); if (err < 0) goto done; err = 0; if (fillargs.ifindex) { dev = dev_get_by_index_rcu(tgt_net, fillargs.ifindex); if (!dev) { err = -ENODEV; goto done; } idev = __in6_dev_get(dev); if (idev) err = in6_dump_addrs(idev, skb, cb, &ctx->ip_idx, &fillargs); goto done; } } cb->seq = inet6_base_seq(tgt_net); for_each_netdev_dump(tgt_net, dev, ctx->ifindex) { idev = __in6_dev_get(dev); if (!idev) continue; err = in6_dump_addrs(idev, skb, cb, &ctx->ip_idx, &fillargs); if (err < 0) goto done; } done: rcu_read_unlock(); if (fillargs.netnsid >= 0) put_net(tgt_net); return err; } static int inet6_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb) { enum addr_type_t type = UNICAST_ADDR; return inet6_dump_addr(skb, cb, type); } static int inet6_dump_ifmcaddr(struct sk_buff *skb, struct netlink_callback *cb) { enum addr_type_t type = MULTICAST_ADDR; return inet6_dump_addr(skb, cb, type); } static int inet6_dump_ifacaddr(struct sk_buff *skb, struct netlink_callback *cb) { enum addr_type_t type = ANYCAST_ADDR; return inet6_dump_addr(skb, cb, type); } static int inet6_rtm_valid_getaddr_req(struct sk_buff *skb, const struct nlmsghdr *nlh, struct nlattr **tb, struct netlink_ext_ack *extack) { struct ifaddrmsg *ifm; int i, err; ifm = nlmsg_payload(nlh, sizeof(*ifm)); if (!ifm) { NL_SET_ERR_MSG_MOD(extack, "Invalid header for get address request"); return -EINVAL; } if (!netlink_strict_get_check(skb)) return nlmsg_parse_deprecated(nlh, sizeof(*ifm), tb, IFA_MAX, ifa_ipv6_policy, extack); if (ifm->ifa_prefixlen || ifm->ifa_flags || ifm->ifa_scope) { NL_SET_ERR_MSG_MOD(extack, "Invalid values in header for get address request"); return -EINVAL; } err = nlmsg_parse_deprecated_strict(nlh, sizeof(*ifm), tb, IFA_MAX, ifa_ipv6_policy, extack); if (err) return err; for (i = 0; i <= IFA_MAX; i++) { if (!tb[i]) continue; switch (i) { case IFA_TARGET_NETNSID: case IFA_ADDRESS: case IFA_LOCAL: break; default: NL_SET_ERR_MSG_MOD(extack, "Unsupported attribute in get address request"); return -EINVAL; } } return 0; } static int inet6_rtm_getaddr(struct sk_buff *in_skb, struct nlmsghdr *nlh, struct netlink_ext_ack *extack) { struct net *tgt_net = sock_net(in_skb->sk); struct inet6_fill_args fillargs = { .portid = NETLINK_CB(in_skb).portid, .seq = nlh->nlmsg_seq, .event = RTM_NEWADDR, .flags = 0, .netnsid = -1, .force_rt_scope_universe = false, }; struct ifaddrmsg *ifm; struct nlattr *tb[IFA_MAX+1]; struct in6_addr *addr = NULL, *peer; struct net_device *dev = NULL; struct inet6_ifaddr *ifa; struct sk_buff *skb; int err; err = inet6_rtm_valid_getaddr_req(in_skb, nlh, tb, extack); if (err < 0) return err; if (tb[IFA_TARGET_NETNSID]) { fillargs.netnsid = nla_get_s32(tb[IFA_TARGET_NETNSID]); tgt_net = rtnl_get_net_ns_capable(NETLINK_CB(in_skb).sk, fillargs.netnsid); if (IS_ERR(tgt_net)) return PTR_ERR(tgt_net); } addr = extract_addr(tb[IFA_ADDRESS], tb[IFA_LOCAL], &peer); if (!addr) { err = -EINVAL; goto errout; } ifm = nlmsg_data(nlh); if (ifm->ifa_index) dev = dev_get_by_index(tgt_net, ifm->ifa_index); ifa = ipv6_get_ifaddr(tgt_net, addr, dev, 1); if (!ifa) { err = -EADDRNOTAVAIL; goto errout; } skb = nlmsg_new(inet6_ifaddr_msgsize(), GFP_KERNEL); if (!skb) { err = -ENOBUFS; goto errout_ifa; } err = inet6_fill_ifaddr(skb, ifa, &fillargs); if (err < 0) { /* -EMSGSIZE implies BUG in inet6_ifaddr_msgsize() */ WARN_ON(err == -EMSGSIZE); kfree_skb(skb); goto errout_ifa; } err = rtnl_unicast(skb, tgt_net, NETLINK_CB(in_skb).portid); errout_ifa: in6_ifa_put(ifa); errout: dev_put(dev); if (fillargs.netnsid >= 0) put_net(tgt_net); return err; } static void inet6_ifa_notify(int event, struct inet6_ifaddr *ifa) { struct sk_buff *skb; struct net *net = dev_net(ifa->idev->dev); struct inet6_fill_args fillargs = { .portid = 0, .seq = 0, .event = event, .flags = 0, .netnsid = -1, .force_rt_scope_universe = false, }; int err = -ENOBUFS; skb = nlmsg_new(inet6_ifaddr_msgsize(), GFP_ATOMIC); if (!skb) goto errout; err = inet6_fill_ifaddr(skb, ifa, &fillargs); if (err < 0) { /* -EMSGSIZE implies BUG in inet6_ifaddr_msgsize() */ WARN_ON(err == -EMSGSIZE); kfree_skb(skb); goto errout; } rtnl_notify(skb, net, 0, RTNLGRP_IPV6_IFADDR, NULL, GFP_ATOMIC); return; errout: rtnl_set_sk_err(net, RTNLGRP_IPV6_IFADDR, err); } static void ipv6_store_devconf(const struct ipv6_devconf *cnf, __s32 *array, int bytes) { BUG_ON(bytes < (DEVCONF_MAX * 4)); memset(array, 0, bytes); array[DEVCONF_FORWARDING] = READ_ONCE(cnf->forwarding); array[DEVCONF_HOPLIMIT] = READ_ONCE(cnf->hop_limit); array[DEVCONF_MTU6] = READ_ONCE(cnf->mtu6); array[DEVCONF_ACCEPT_RA] = READ_ONCE(cnf->accept_ra); array[DEVCONF_ACCEPT_REDIRECTS] = READ_ONCE(cnf->accept_redirects); array[DEVCONF_AUTOCONF] = READ_ONCE(cnf->autoconf); array[DEVCONF_DAD_TRANSMITS] = READ_ONCE(cnf->dad_transmits); array[DEVCONF_RTR_SOLICITS] = READ_ONCE(cnf->rtr_solicits); array[DEVCONF_RTR_SOLICIT_INTERVAL] = jiffies_to_msecs(READ_ONCE(cnf->rtr_solicit_interval)); array[DEVCONF_RTR_SOLICIT_MAX_INTERVAL] = jiffies_to_msecs(READ_ONCE(cnf->rtr_solicit_max_interval)); array[DEVCONF_RTR_SOLICIT_DELAY] = jiffies_to_msecs(READ_ONCE(cnf->rtr_solicit_delay)); array[DEVCONF_FORCE_MLD_VERSION] = READ_ONCE(cnf->force_mld_version); array[DEVCONF_MLDV1_UNSOLICITED_REPORT_INTERVAL] = jiffies_to_msecs(READ_ONCE(cnf->mldv1_unsolicited_report_interval)); array[DEVCONF_MLDV2_UNSOLICITED_REPORT_INTERVAL] = jiffies_to_msecs(READ_ONCE(cnf->mldv2_unsolicited_report_interval)); array[DEVCONF_USE_TEMPADDR] = READ_ONCE(cnf->use_tempaddr); array[DEVCONF_TEMP_VALID_LFT] = READ_ONCE(cnf->temp_valid_lft); array[DEVCONF_TEMP_PREFERED_LFT] = READ_ONCE(cnf->temp_prefered_lft); array[DEVCONF_REGEN_MAX_RETRY] = READ_ONCE(cnf->regen_max_retry); array[DEVCONF_MAX_DESYNC_FACTOR] = READ_ONCE(cnf->max_desync_factor); array[DEVCONF_MAX_ADDRESSES] = READ_ONCE(cnf->max_addresses); array[DEVCONF_ACCEPT_RA_DEFRTR] = READ_ONCE(cnf->accept_ra_defrtr); array[DEVCONF_RA_DEFRTR_METRIC] = READ_ONCE(cnf->ra_defrtr_metric); array[DEVCONF_ACCEPT_RA_MIN_HOP_LIMIT] = READ_ONCE(cnf->accept_ra_min_hop_limit); array[DEVCONF_ACCEPT_RA_PINFO] = READ_ONCE(cnf->accept_ra_pinfo); #ifdef CONFIG_IPV6_ROUTER_PREF array[DEVCONF_ACCEPT_RA_RTR_PREF] = READ_ONCE(cnf->accept_ra_rtr_pref); array[DEVCONF_RTR_PROBE_INTERVAL] = jiffies_to_msecs(READ_ONCE(cnf->rtr_probe_interval)); #ifdef CONFIG_IPV6_ROUTE_INFO array[DEVCONF_ACCEPT_RA_RT_INFO_MIN_PLEN] = READ_ONCE(cnf->accept_ra_rt_info_min_plen); array[DEVCONF_ACCEPT_RA_RT_INFO_MAX_PLEN] = READ_ONCE(cnf->accept_ra_rt_info_max_plen); #endif #endif array[DEVCONF_PROXY_NDP] = READ_ONCE(cnf->proxy_ndp); array[DEVCONF_ACCEPT_SOURCE_ROUTE] = READ_ONCE(cnf->accept_source_route); #ifdef CONFIG_IPV6_OPTIMISTIC_DAD array[DEVCONF_OPTIMISTIC_DAD] = READ_ONCE(cnf->optimistic_dad); array[DEVCONF_USE_OPTIMISTIC] = READ_ONCE(cnf->use_optimistic); #endif #ifdef CONFIG_IPV6_MROUTE array[DEVCONF_MC_FORWARDING] = atomic_read(&cnf->mc_forwarding); #endif array[DEVCONF_DISABLE_IPV6] = READ_ONCE(cnf->disable_ipv6); array[DEVCONF_ACCEPT_DAD] = READ_ONCE(cnf->accept_dad); array[DEVCONF_FORCE_TLLAO] = READ_ONCE(cnf->force_tllao); array[DEVCONF_NDISC_NOTIFY] = READ_ONCE(cnf->ndisc_notify); array[DEVCONF_SUPPRESS_FRAG_NDISC] = READ_ONCE(cnf->suppress_frag_ndisc); array[DEVCONF_ACCEPT_RA_FROM_LOCAL] = READ_ONCE(cnf->accept_ra_from_local); array[DEVCONF_ACCEPT_RA_MTU] = READ_ONCE(cnf->accept_ra_mtu); array[DEVCONF_IGNORE_ROUTES_WITH_LINKDOWN] = READ_ONCE(cnf->ignore_routes_with_linkdown); /* we omit DEVCONF_STABLE_SECRET for now */ array[DEVCONF_USE_OIF_ADDRS_ONLY] = READ_ONCE(cnf->use_oif_addrs_only); array[DEVCONF_DROP_UNICAST_IN_L2_MULTICAST] = READ_ONCE(cnf->drop_unicast_in_l2_multicast); array[DEVCONF_DROP_UNSOLICITED_NA] = READ_ONCE(cnf->drop_unsolicited_na); array[DEVCONF_KEEP_ADDR_ON_DOWN] = READ_ONCE(cnf->keep_addr_on_down); array[DEVCONF_SEG6_ENABLED] = READ_ONCE(cnf->seg6_enabled); #ifdef CONFIG_IPV6_SEG6_HMAC array[DEVCONF_SEG6_REQUIRE_HMAC] = READ_ONCE(cnf->seg6_require_hmac); #endif array[DEVCONF_ENHANCED_DAD] = READ_ONCE(cnf->enhanced_dad); array[DEVCONF_ADDR_GEN_MODE] = READ_ONCE(cnf->addr_gen_mode); array[DEVCONF_DISABLE_POLICY] = READ_ONCE(cnf->disable_policy); array[DEVCONF_NDISC_TCLASS] = READ_ONCE(cnf->ndisc_tclass); array[DEVCONF_RPL_SEG_ENABLED] = READ_ONCE(cnf->rpl_seg_enabled); array[DEVCONF_IOAM6_ENABLED] = READ_ONCE(cnf->ioam6_enabled); array[DEVCONF_IOAM6_ID] = READ_ONCE(cnf->ioam6_id); array[DEVCONF_IOAM6_ID_WIDE] = READ_ONCE(cnf->ioam6_id_wide); array[DEVCONF_NDISC_EVICT_NOCARRIER] = READ_ONCE(cnf->ndisc_evict_nocarrier); array[DEVCONF_ACCEPT_UNTRACKED_NA] = READ_ONCE(cnf->accept_untracked_na); array[DEVCONF_ACCEPT_RA_MIN_LFT] = READ_ONCE(cnf->accept_ra_min_lft); array[DEVCONF_FORCE_FORWARDING] = READ_ONCE(cnf->force_forwarding); } static inline size_t inet6_ifla6_size(void) { return nla_total_size(4) /* IFLA_INET6_FLAGS */ + nla_total_size(sizeof(struct ifla_cacheinfo)) + nla_total_size(DEVCONF_MAX * 4) /* IFLA_INET6_CONF */ + nla_total_size(IPSTATS_MIB_MAX * 8) /* IFLA_INET6_STATS */ + nla_total_size(ICMP6_MIB_MAX * 8) /* IFLA_INET6_ICMP6STATS */ + nla_total_size(sizeof(struct in6_addr)) /* IFLA_INET6_TOKEN */ + nla_total_size(1) /* IFLA_INET6_ADDR_GEN_MODE */ + nla_total_size(4) /* IFLA_INET6_RA_MTU */ + 0; } static inline size_t inet6_if_nlmsg_size(void) { return NLMSG_ALIGN(sizeof(struct ifinfomsg)) + nla_total_size(IFNAMSIZ) /* IFLA_IFNAME */ + nla_total_size(MAX_ADDR_LEN) /* IFLA_ADDRESS */ + nla_total_size(4) /* IFLA_MTU */ + nla_total_size(4) /* IFLA_LINK */ + nla_total_size(1) /* IFLA_OPERSTATE */ + nla_total_size(inet6_ifla6_size()); /* IFLA_PROTINFO */ } static inline void __snmp6_fill_statsdev(u64 *stats, atomic_long_t *mib, int bytes) { int i; int pad = bytes - sizeof(u64) * ICMP6_MIB_MAX; BUG_ON(pad < 0); /* Use put_unaligned() because stats may not be aligned for u64. */ put_unaligned(ICMP6_MIB_MAX, &stats[0]); for (i = 1; i < ICMP6_MIB_MAX; i++) put_unaligned(atomic_long_read(&mib[i]), &stats[i]); memset(&stats[ICMP6_MIB_MAX], 0, pad); } static inline void __snmp6_fill_stats64(u64 *stats, void __percpu *mib, int bytes, size_t syncpoff) { int i, c; u64 buff[IPSTATS_MIB_MAX]; int pad = bytes - sizeof(u64) * IPSTATS_MIB_MAX; BUG_ON(pad < 0); memset(buff, 0, sizeof(buff)); buff[0] = IPSTATS_MIB_MAX; for_each_possible_cpu(c) { for (i = 1; i < IPSTATS_MIB_MAX; i++) buff[i] += snmp_get_cpu_field64(mib, c, i, syncpoff); } memcpy(stats, buff, IPSTATS_MIB_MAX * sizeof(u64)); memset(&stats[IPSTATS_MIB_MAX], 0, pad); } static void snmp6_fill_stats(u64 *stats, struct inet6_dev *idev, int attrtype, int bytes) { switch (attrtype) { case IFLA_INET6_STATS: __snmp6_fill_stats64(stats, idev->stats.ipv6, bytes, offsetof(struct ipstats_mib, syncp)); break; case IFLA_INET6_ICMP6STATS: __snmp6_fill_statsdev(stats, idev->stats.icmpv6dev->mibs, bytes); break; } } static int inet6_fill_ifla6_stats_attrs(struct sk_buff *skb, struct inet6_dev *idev) { struct nlattr *nla; nla = nla_reserve(skb, IFLA_INET6_STATS, IPSTATS_MIB_MAX * sizeof(u64)); if (!nla) goto nla_put_failure; snmp6_fill_stats(nla_data(nla), idev, IFLA_INET6_STATS, nla_len(nla)); nla = nla_reserve(skb, IFLA_INET6_ICMP6STATS, ICMP6_MIB_MAX * sizeof(u64)); if (!nla) goto nla_put_failure; snmp6_fill_stats(nla_data(nla), idev, IFLA_INET6_ICMP6STATS, nla_len(nla)); return 0; nla_put_failure: return -EMSGSIZE; } static int inet6_fill_ifla6_attrs(struct sk_buff *skb, struct inet6_dev *idev, u32 ext_filter_mask) { struct ifla_cacheinfo ci; struct nlattr *nla; u32 ra_mtu; if (nla_put_u32(skb, IFLA_INET6_FLAGS, READ_ONCE(idev->if_flags))) goto nla_put_failure; ci.max_reasm_len = IPV6_MAXPLEN; ci.tstamp = cstamp_delta(READ_ONCE(idev->tstamp)); ci.reachable_time = jiffies_to_msecs(idev->nd_parms->reachable_time); ci.retrans_time = jiffies_to_msecs(NEIGH_VAR(idev->nd_parms, RETRANS_TIME)); if (nla_put(skb, IFLA_INET6_CACHEINFO, sizeof(ci), &ci)) goto nla_put_failure; nla = nla_reserve(skb, IFLA_INET6_CONF, DEVCONF_MAX * sizeof(s32)); if (!nla) goto nla_put_failure; ipv6_store_devconf(&idev->cnf, nla_data(nla), nla_len(nla)); /* XXX - MC not implemented */ if (!(ext_filter_mask & RTEXT_FILTER_SKIP_STATS)) { if (inet6_fill_ifla6_stats_attrs(skb, idev) < 0) goto nla_put_failure; } nla = nla_reserve(skb, IFLA_INET6_TOKEN, sizeof(struct in6_addr)); if (!nla) goto nla_put_failure; read_lock_bh(&idev->lock); memcpy(nla_data(nla), idev->token.s6_addr, nla_len(nla)); read_unlock_bh(&idev->lock); if (nla_put_u8(skb, IFLA_INET6_ADDR_GEN_MODE, READ_ONCE(idev->cnf.addr_gen_mode))) goto nla_put_failure; ra_mtu = READ_ONCE(idev->ra_mtu); if (ra_mtu && nla_put_u32(skb, IFLA_INET6_RA_MTU, ra_mtu)) goto nla_put_failure; return 0; nla_put_failure: return -EMSGSIZE; } static size_t inet6_get_link_af_size(const struct net_device *dev, u32 ext_filter_mask) { if (!__in6_dev_get(dev)) return 0; return inet6_ifla6_size(); } static int inet6_fill_link_af(struct sk_buff *skb, const struct net_device *dev, u32 ext_filter_mask) { struct inet6_dev *idev = __in6_dev_get(dev); if (!idev) return -ENODATA; if (inet6_fill_ifla6_attrs(skb, idev, ext_filter_mask) < 0) return -EMSGSIZE; return 0; } static int inet6_set_iftoken(struct inet6_dev *idev, struct in6_addr *token, struct netlink_ext_ack *extack) { struct inet6_ifaddr *ifp; struct net_device *dev = idev->dev; bool clear_token, update_rs = false; struct in6_addr ll_addr; ASSERT_RTNL(); if (!token) return -EINVAL; if (dev->flags & IFF_LOOPBACK) { NL_SET_ERR_MSG_MOD(extack, "Device is loopback"); return -EINVAL; } if (dev->flags & IFF_NOARP) { NL_SET_ERR_MSG_MOD(extack, "Device does not do neighbour discovery"); return -EINVAL; } if (!ipv6_accept_ra(idev)) { NL_SET_ERR_MSG_MOD(extack, "Router advertisement is disabled on device"); return -EINVAL; } if (READ_ONCE(idev->cnf.rtr_solicits) == 0) { NL_SET_ERR_MSG(extack, "Router solicitation is disabled on device"); return -EINVAL; } write_lock_bh(&idev->lock); BUILD_BUG_ON(sizeof(token->s6_addr) != 16); memcpy(idev->token.s6_addr + 8, token->s6_addr + 8, 8); write_unlock_bh(&idev->lock); clear_token = ipv6_addr_any(token); if (clear_token) goto update_lft; if (!idev->dead && (idev->if_flags & IF_READY) && !ipv6_get_lladdr(dev, &ll_addr, IFA_F_TENTATIVE | IFA_F_OPTIMISTIC)) { /* If we're not ready, then normal ifup will take care * of this. Otherwise, we need to request our rs here. */ ndisc_send_rs(dev, &ll_addr, &in6addr_linklocal_allrouters); update_rs = true; } update_lft: write_lock_bh(&idev->lock); if (update_rs) { idev->if_flags |= IF_RS_SENT; idev->rs_interval = rfc3315_s14_backoff_init( READ_ONCE(idev->cnf.rtr_solicit_interval)); idev->rs_probes = 1; addrconf_mod_rs_timer(idev, idev->rs_interval); } /* Well, that's kinda nasty ... */ list_for_each_entry(ifp, &idev->addr_list, if_list) { spin_lock(&ifp->lock); if (ifp->tokenized) { ifp->valid_lft = 0; ifp->prefered_lft = 0; } spin_unlock(&ifp->lock); } write_unlock_bh(&idev->lock); inet6_ifinfo_notify(RTM_NEWLINK, idev); addrconf_verify_rtnl(dev_net(dev)); return 0; } static const struct nla_policy inet6_af_policy[IFLA_INET6_MAX + 1] = { [IFLA_INET6_ADDR_GEN_MODE] = { .type = NLA_U8 }, [IFLA_INET6_TOKEN] = { .len = sizeof(struct in6_addr) }, [IFLA_INET6_RA_MTU] = { .type = NLA_REJECT, .reject_message = "IFLA_INET6_RA_MTU can not be set" }, }; static int check_addr_gen_mode(int mode) { if (mode != IN6_ADDR_GEN_MODE_EUI64 && mode != IN6_ADDR_GEN_MODE_NONE && mode != IN6_ADDR_GEN_MODE_STABLE_PRIVACY && mode != IN6_ADDR_GEN_MODE_RANDOM) return -EINVAL; return 1; } static int check_stable_privacy(struct inet6_dev *idev, struct net *net, int mode) { if (mode == IN6_ADDR_GEN_MODE_STABLE_PRIVACY && !idev->cnf.stable_secret.initialized && !net->ipv6.devconf_dflt->stable_secret.initialized) return -EINVAL; return 1; } static int inet6_validate_link_af(const struct net_device *dev, const struct nlattr *nla, struct netlink_ext_ack *extack) { struct nlattr *tb[IFLA_INET6_MAX + 1]; struct inet6_dev *idev = NULL; int err; if (dev) { idev = __in6_dev_get(dev); if (!idev) return -EAFNOSUPPORT; } err = nla_parse_nested_deprecated(tb, IFLA_INET6_MAX, nla, inet6_af_policy, extack); if (err) return err; if (!tb[IFLA_INET6_TOKEN] && !tb[IFLA_INET6_ADDR_GEN_MODE]) return -EINVAL; if (tb[IFLA_INET6_ADDR_GEN_MODE]) { u8 mode = nla_get_u8(tb[IFLA_INET6_ADDR_GEN_MODE]); if (check_addr_gen_mode(mode) < 0) return -EINVAL; if (dev && check_stable_privacy(idev, dev_net(dev), mode) < 0) return -EINVAL; } return 0; } static int inet6_set_link_af(struct net_device *dev, const struct nlattr *nla, struct netlink_ext_ack *extack) { struct inet6_dev *idev = __in6_dev_get(dev); struct nlattr *tb[IFLA_INET6_MAX + 1]; int err; if (!idev) return -EAFNOSUPPORT; if (nla_parse_nested_deprecated(tb, IFLA_INET6_MAX, nla, NULL, NULL) < 0) return -EINVAL; if (tb[IFLA_INET6_TOKEN]) { err = inet6_set_iftoken(idev, nla_data(tb[IFLA_INET6_TOKEN]), extack); if (err) return err; } if (tb[IFLA_INET6_ADDR_GEN_MODE]) { u8 mode = nla_get_u8(tb[IFLA_INET6_ADDR_GEN_MODE]); WRITE_ONCE(idev->cnf.addr_gen_mode, mode); } return 0; } static int inet6_fill_ifinfo(struct sk_buff *skb, struct inet6_dev *idev, u32 portid, u32 seq, int event, unsigned int flags) { struct net_device *dev = idev->dev; struct ifinfomsg *hdr; struct nlmsghdr *nlh; int ifindex, iflink; void *protoinfo; nlh = nlmsg_put(skb, portid, seq, event, sizeof(*hdr), flags); if (!nlh) return -EMSGSIZE; hdr = nlmsg_data(nlh); hdr->ifi_family = AF_INET6; hdr->__ifi_pad = 0; hdr->ifi_type = dev->type; ifindex = READ_ONCE(dev->ifindex); hdr->ifi_index = ifindex; hdr->ifi_flags = netif_get_flags(dev); hdr->ifi_change = 0; iflink = dev_get_iflink(dev); if (nla_put_string(skb, IFLA_IFNAME, dev->name) || (dev->addr_len && nla_put(skb, IFLA_ADDRESS, dev->addr_len, dev->dev_addr)) || nla_put_u32(skb, IFLA_MTU, READ_ONCE(dev->mtu)) || (ifindex != iflink && nla_put_u32(skb, IFLA_LINK, iflink)) || nla_put_u8(skb, IFLA_OPERSTATE, netif_running(dev) ? READ_ONCE(dev->operstate) : IF_OPER_DOWN)) goto nla_put_failure; protoinfo = nla_nest_start_noflag(skb, IFLA_PROTINFO); if (!protoinfo) goto nla_put_failure; if (inet6_fill_ifla6_attrs(skb, idev, 0) < 0) goto nla_put_failure; nla_nest_end(skb, protoinfo); nlmsg_end(skb, nlh); return 0; nla_put_failure: nlmsg_cancel(skb, nlh); return -EMSGSIZE; } static int inet6_valid_dump_ifinfo(const struct nlmsghdr *nlh, struct netlink_ext_ack *extack) { struct ifinfomsg *ifm; ifm = nlmsg_payload(nlh, sizeof(*ifm)); if (!ifm) { NL_SET_ERR_MSG_MOD(extack, "Invalid header for link dump request"); return -EINVAL; } if (nlmsg_attrlen(nlh, sizeof(*ifm))) { NL_SET_ERR_MSG_MOD(extack, "Invalid data after header"); return -EINVAL; } if (ifm->__ifi_pad || ifm->ifi_type || ifm->ifi_flags || ifm->ifi_change || ifm->ifi_index) { NL_SET_ERR_MSG_MOD(extack, "Invalid values in header for dump request"); return -EINVAL; } return 0; } static int inet6_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb) { struct net *net = sock_net(skb->sk); struct { unsigned long ifindex; } *ctx = (void *)cb->ctx; struct net_device *dev; struct inet6_dev *idev; int err; /* only requests using strict checking can pass data to * influence the dump */ if (cb->strict_check) { err = inet6_valid_dump_ifinfo(cb->nlh, cb->extack); if (err < 0) return err; } err = 0; rcu_read_lock(); for_each_netdev_dump(net, dev, ctx->ifindex) { idev = __in6_dev_get(dev); if (!idev) continue; err = inet6_fill_ifinfo(skb, idev, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, RTM_NEWLINK, NLM_F_MULTI); if (err < 0) break; } rcu_read_unlock(); return err; } void inet6_ifinfo_notify(int event, struct inet6_dev *idev) { struct sk_buff *skb; struct net *net = dev_net(idev->dev); int err = -ENOBUFS; skb = nlmsg_new(inet6_if_nlmsg_size(), GFP_ATOMIC); if (!skb) goto errout; err = inet6_fill_ifinfo(skb, idev, 0, 0, event, 0); if (err < 0) { /* -EMSGSIZE implies BUG in inet6_if_nlmsg_size() */ WARN_ON(err == -EMSGSIZE); kfree_skb(skb); goto errout; } rtnl_notify(skb, net, 0, RTNLGRP_IPV6_IFINFO, NULL, GFP_ATOMIC); return; errout: rtnl_set_sk_err(net, RTNLGRP_IPV6_IFINFO, err); } static inline size_t inet6_prefix_nlmsg_size(void) { return NLMSG_ALIGN(sizeof(struct prefixmsg)) + nla_total_size(sizeof(struct in6_addr)) + nla_total_size(sizeof(struct prefix_cacheinfo)); } static int inet6_fill_prefix(struct sk_buff *skb, struct inet6_dev *idev, struct prefix_info *pinfo, u32 portid, u32 seq, int event, unsigned int flags) { struct prefixmsg *pmsg; struct nlmsghdr *nlh; struct prefix_cacheinfo ci; nlh = nlmsg_put(skb, portid, seq, event, sizeof(*pmsg), flags); if (!nlh) return -EMSGSIZE; pmsg = nlmsg_data(nlh); pmsg->prefix_family = AF_INET6; pmsg->prefix_pad1 = 0; pmsg->prefix_pad2 = 0; pmsg->prefix_ifindex = idev->dev->ifindex; pmsg->prefix_len = pinfo->prefix_len; pmsg->prefix_type = pinfo->type; pmsg->prefix_pad3 = 0; pmsg->prefix_flags = pinfo->flags; if (nla_put(skb, PREFIX_ADDRESS, sizeof(pinfo->prefix), &pinfo->prefix)) goto nla_put_failure; ci.preferred_time = ntohl(pinfo->prefered); ci.valid_time = ntohl(pinfo->valid); if (nla_put(skb, PREFIX_CACHEINFO, sizeof(ci), &ci)) goto nla_put_failure; nlmsg_end(skb, nlh); return 0; nla_put_failure: nlmsg_cancel(skb, nlh); return -EMSGSIZE; } static void inet6_prefix_notify(int event, struct inet6_dev *idev, struct prefix_info *pinfo) { struct sk_buff *skb; struct net *net = dev_net(idev->dev); int err = -ENOBUFS; skb = nlmsg_new(inet6_prefix_nlmsg_size(), GFP_ATOMIC); if (!skb) goto errout; err = inet6_fill_prefix(skb, idev, pinfo, 0, 0, event, 0); if (err < 0) { /* -EMSGSIZE implies BUG in inet6_prefix_nlmsg_size() */ WARN_ON(err == -EMSGSIZE); kfree_skb(skb); goto errout; } rtnl_notify(skb, net, 0, RTNLGRP_IPV6_PREFIX, NULL, GFP_ATOMIC); return; errout: rtnl_set_sk_err(net, RTNLGRP_IPV6_PREFIX, err); } static void __ipv6_ifa_notify(int event, struct inet6_ifaddr *ifp) { struct net *net = dev_net(ifp->idev->dev); if (event) ASSERT_RTNL(); inet6_ifa_notify(event ? : RTM_NEWADDR, ifp); switch (event) { case RTM_NEWADDR: /* * If the address was optimistic we inserted the route at the * start of our DAD process, so we don't need to do it again. * If the device was taken down in the middle of the DAD * cycle there is a race where we could get here without a * host route, so nothing to insert. That will be fixed when * the device is brought up. */ if (ifp->rt && !rcu_access_pointer(ifp->rt->fib6_node)) { ip6_ins_rt(net, ifp->rt); } else if (!ifp->rt && (ifp->idev->dev->flags & IFF_UP)) { pr_warn("BUG: Address %pI6c on device %s is missing its host route.\n", &ifp->addr, ifp->idev->dev->name); } if (ifp->idev->cnf.forwarding) addrconf_join_anycast(ifp); if (!ipv6_addr_any(&ifp->peer_addr)) addrconf_prefix_route(&ifp->peer_addr, 128, ifp->rt_priority, ifp->idev->dev, 0, 0, GFP_ATOMIC); break; case RTM_DELADDR: if (ifp->idev->cnf.forwarding) addrconf_leave_anycast(ifp); addrconf_leave_solict(ifp->idev, &ifp->addr); if (!ipv6_addr_any(&ifp->peer_addr)) { struct fib6_info *rt; rt = addrconf_get_prefix_route(&ifp->peer_addr, 128, ifp->idev->dev, 0, 0, false); if (rt) ip6_del_rt(net, rt, false); } if (ifp->rt) { ip6_del_rt(net, ifp->rt, false); ifp->rt = NULL; } rt_genid_bump_ipv6(net); break; } atomic_inc(&net->ipv6.dev_addr_genid); } static void ipv6_ifa_notify(int event, struct inet6_ifaddr *ifp) { if (likely(ifp->idev->dead == 0)) __ipv6_ifa_notify(event, ifp); } #ifdef CONFIG_SYSCTL static int addrconf_sysctl_forward(const struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *ppos) { int *valp = ctl->data; int val = *valp; loff_t pos = *ppos; struct ctl_table lctl; int ret; /* * ctl->data points to idev->cnf.forwarding, we should * not modify it until we get the rtnl lock. */ lctl = *ctl; lctl.data = &val; ret = proc_dointvec(&lctl, write, buffer, lenp, ppos); if (write) ret = addrconf_fixup_forwarding(ctl, valp, val); if (ret) *ppos = pos; return ret; } static int addrconf_sysctl_mtu(const struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *ppos) { struct inet6_dev *idev = ctl->extra1; int min_mtu = IPV6_MIN_MTU; struct ctl_table lctl; lctl = *ctl; lctl.extra1 = &min_mtu; lctl.extra2 = idev ? &idev->dev->mtu : NULL; return proc_dointvec_minmax(&lctl, write, buffer, lenp, ppos); } static void dev_disable_change(struct inet6_dev *idev) { struct netdev_notifier_info info; if (!idev || !idev->dev) return; netdev_notifier_info_init(&info, idev->dev); if (idev->cnf.disable_ipv6) addrconf_notify(NULL, NETDEV_DOWN, &info); else addrconf_notify(NULL, NETDEV_UP, &info); } static void addrconf_disable_change(struct net *net, __s32 newf) { struct net_device *dev; struct inet6_dev *idev; for_each_netdev(net, dev) { idev = __in6_dev_get_rtnl_net(dev); if (idev) { int changed = (!idev->cnf.disable_ipv6) ^ (!newf); WRITE_ONCE(idev->cnf.disable_ipv6, newf); if (changed) dev_disable_change(idev); } } } static int addrconf_disable_ipv6(const struct ctl_table *table, int *p, int newf) { struct net *net = (struct net *)table->extra2; int old; if (p == &net->ipv6.devconf_dflt->disable_ipv6) { WRITE_ONCE(*p, newf); return 0; } if (!rtnl_net_trylock(net)) return restart_syscall(); old = *p; WRITE_ONCE(*p, newf); if (p == &net->ipv6.devconf_all->disable_ipv6) { WRITE_ONCE(net->ipv6.devconf_dflt->disable_ipv6, newf); addrconf_disable_change(net, newf); } else if ((!newf) ^ (!old)) { dev_disable_change((struct inet6_dev *)table->extra1); } rtnl_net_unlock(net); return 0; } static int addrconf_sysctl_disable(const struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *ppos) { int *valp = ctl->data; int val = *valp; loff_t pos = *ppos; struct ctl_table lctl; int ret; /* * ctl->data points to idev->cnf.disable_ipv6, we should * not modify it until we get the rtnl lock. */ lctl = *ctl; lctl.data = &val; ret = proc_dointvec(&lctl, write, buffer, lenp, ppos); if (write) ret = addrconf_disable_ipv6(ctl, valp, val); if (ret) *ppos = pos; return ret; } static int addrconf_sysctl_proxy_ndp(const struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *ppos) { int *valp = ctl->data; int ret; int old, new; old = *valp; ret = proc_dointvec(ctl, write, buffer, lenp, ppos); new = *valp; if (write && old != new) { struct net *net = ctl->extra2; if (!rtnl_net_trylock(net)) return restart_syscall(); if (valp == &net->ipv6.devconf_dflt->proxy_ndp) { inet6_netconf_notify_devconf(net, RTM_NEWNETCONF, NETCONFA_PROXY_NEIGH, NETCONFA_IFINDEX_DEFAULT, net->ipv6.devconf_dflt); } else if (valp == &net->ipv6.devconf_all->proxy_ndp) { inet6_netconf_notify_devconf(net, RTM_NEWNETCONF, NETCONFA_PROXY_NEIGH, NETCONFA_IFINDEX_ALL, net->ipv6.devconf_all); } else { struct inet6_dev *idev = ctl->extra1; inet6_netconf_notify_devconf(net, RTM_NEWNETCONF, NETCONFA_PROXY_NEIGH, idev->dev->ifindex, &idev->cnf); } rtnl_net_unlock(net); } return ret; } static int addrconf_sysctl_addr_gen_mode(const struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *ppos) { int ret = 0; u32 new_val; struct inet6_dev *idev = (struct inet6_dev *)ctl->extra1; struct net *net = (struct net *)ctl->extra2; struct ctl_table tmp = { .data = &new_val, .maxlen = sizeof(new_val), .mode = ctl->mode, }; if (!rtnl_net_trylock(net)) return restart_syscall(); new_val = *((u32 *)ctl->data); ret = proc_douintvec(&tmp, write, buffer, lenp, ppos); if (ret != 0) goto out; if (write) { if (check_addr_gen_mode(new_val) < 0) { ret = -EINVAL; goto out; } if (idev) { if (check_stable_privacy(idev, net, new_val) < 0) { ret = -EINVAL; goto out; } if (idev->cnf.addr_gen_mode != new_val) { WRITE_ONCE(idev->cnf.addr_gen_mode, new_val); netdev_lock_ops(idev->dev); addrconf_init_auto_addrs(idev->dev); netdev_unlock_ops(idev->dev); } } else if (&net->ipv6.devconf_all->addr_gen_mode == ctl->data) { struct net_device *dev; WRITE_ONCE(net->ipv6.devconf_dflt->addr_gen_mode, new_val); for_each_netdev(net, dev) { idev = __in6_dev_get_rtnl_net(dev); if (idev && idev->cnf.addr_gen_mode != new_val) { WRITE_ONCE(idev->cnf.addr_gen_mode, new_val); netdev_lock_ops(idev->dev); addrconf_init_auto_addrs(idev->dev); netdev_unlock_ops(idev->dev); } } } WRITE_ONCE(*((u32 *)ctl->data), new_val); } out: rtnl_net_unlock(net); return ret; } static int addrconf_sysctl_stable_secret(const struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *ppos) { int err; struct in6_addr addr; char str[IPV6_MAX_STRLEN]; struct ctl_table lctl = *ctl; struct net *net = ctl->extra2; struct ipv6_stable_secret *secret = ctl->data; if (&net->ipv6.devconf_all->stable_secret == ctl->data) return -EIO; lctl.maxlen = IPV6_MAX_STRLEN; lctl.data = str; if (!rtnl_net_trylock(net)) return restart_syscall(); if (!write && !secret->initialized) { err = -EIO; goto out; } err = snprintf(str, sizeof(str), "%pI6", &secret->secret); if (err >= sizeof(str)) { err = -EIO; goto out; } err = proc_dostring(&lctl, write, buffer, lenp, ppos); if (err || !write) goto out; if (in6_pton(str, -1, addr.in6_u.u6_addr8, -1, NULL) != 1) { err = -EIO; goto out; } secret->initialized = true; secret->secret = addr; if (&net->ipv6.devconf_dflt->stable_secret == ctl->data) { struct net_device *dev; for_each_netdev(net, dev) { struct inet6_dev *idev = __in6_dev_get_rtnl_net(dev); if (idev) { WRITE_ONCE(idev->cnf.addr_gen_mode, IN6_ADDR_GEN_MODE_STABLE_PRIVACY); } } } else { struct inet6_dev *idev = ctl->extra1; WRITE_ONCE(idev->cnf.addr_gen_mode, IN6_ADDR_GEN_MODE_STABLE_PRIVACY); } out: rtnl_net_unlock(net); return err; } static int addrconf_sysctl_ignore_routes_with_linkdown(const struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *ppos) { int *valp = ctl->data; int val = *valp; loff_t pos = *ppos; struct ctl_table lctl; int ret; /* ctl->data points to idev->cnf.ignore_routes_when_linkdown * we should not modify it until we get the rtnl lock. */ lctl = *ctl; lctl.data = &val; ret = proc_dointvec(&lctl, write, buffer, lenp, ppos); if (write) ret = addrconf_fixup_linkdown(ctl, valp, val); if (ret) *ppos = pos; return ret; } static void addrconf_set_nopolicy(struct rt6_info *rt, int action) { if (rt) { if (action) rt->dst.flags |= DST_NOPOLICY; else rt->dst.flags &= ~DST_NOPOLICY; } } static void addrconf_disable_policy_idev(struct inet6_dev *idev, int val) { struct inet6_ifaddr *ifa; read_lock_bh(&idev->lock); list_for_each_entry(ifa, &idev->addr_list, if_list) { spin_lock(&ifa->lock); if (ifa->rt) { /* host routes only use builtin fib6_nh */ struct fib6_nh *nh = ifa->rt->fib6_nh; int cpu; rcu_read_lock(); ifa->rt->dst_nopolicy = val ? true : false; if (nh->rt6i_pcpu) { for_each_possible_cpu(cpu) { struct rt6_info **rtp; rtp = per_cpu_ptr(nh->rt6i_pcpu, cpu); addrconf_set_nopolicy(*rtp, val); } } rcu_read_unlock(); } spin_unlock(&ifa->lock); } read_unlock_bh(&idev->lock); } static int addrconf_disable_policy(const struct ctl_table *ctl, int *valp, int val) { struct net *net = (struct net *)ctl->extra2; struct inet6_dev *idev; if (valp == &net->ipv6.devconf_dflt->disable_policy) { WRITE_ONCE(*valp, val); return 0; } if (!rtnl_net_trylock(net)) return restart_syscall(); WRITE_ONCE(*valp, val); if (valp == &net->ipv6.devconf_all->disable_policy) { struct net_device *dev; for_each_netdev(net, dev) { idev = __in6_dev_get_rtnl_net(dev); if (idev) addrconf_disable_policy_idev(idev, val); } } else { idev = (struct inet6_dev *)ctl->extra1; addrconf_disable_policy_idev(idev, val); } rtnl_net_unlock(net); return 0; } static int addrconf_sysctl_disable_policy(const struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *ppos) { int *valp = ctl->data; int val = *valp; loff_t pos = *ppos; struct ctl_table lctl; int ret; lctl = *ctl; lctl.data = &val; ret = proc_dointvec(&lctl, write, buffer, lenp, ppos); if (write && (*valp != val)) ret = addrconf_disable_policy(ctl, valp, val); if (ret) *ppos = pos; return ret; } static void addrconf_force_forward_change(struct net *net, __s32 newf) { struct net_device *dev; struct inet6_dev *idev; for_each_netdev(net, dev) { idev = __in6_dev_get_rtnl_net(dev); if (idev) { int changed = (!idev->cnf.force_forwarding) ^ (!newf); WRITE_ONCE(idev->cnf.force_forwarding, newf); if (changed) inet6_netconf_notify_devconf(dev_net(dev), RTM_NEWNETCONF, NETCONFA_FORCE_FORWARDING, dev->ifindex, &idev->cnf); } } } static int addrconf_sysctl_force_forwarding(const struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *ppos) { struct inet6_dev *idev = ctl->extra1; struct ctl_table tmp_ctl = *ctl; struct net *net = ctl->extra2; int *valp = ctl->data; int new_val = *valp; int old_val = *valp; loff_t pos = *ppos; int ret; tmp_ctl.extra1 = SYSCTL_ZERO; tmp_ctl.extra2 = SYSCTL_ONE; tmp_ctl.data = &new_val; ret = proc_douintvec_minmax(&tmp_ctl, write, buffer, lenp, ppos); if (write && old_val != new_val) { if (!rtnl_net_trylock(net)) return restart_syscall(); WRITE_ONCE(*valp, new_val); if (valp == &net->ipv6.devconf_dflt->force_forwarding) { inet6_netconf_notify_devconf(net, RTM_NEWNETCONF, NETCONFA_FORCE_FORWARDING, NETCONFA_IFINDEX_DEFAULT, net->ipv6.devconf_dflt); } else if (valp == &net->ipv6.devconf_all->force_forwarding) { inet6_netconf_notify_devconf(net, RTM_NEWNETCONF, NETCONFA_FORCE_FORWARDING, NETCONFA_IFINDEX_ALL, net->ipv6.devconf_all); addrconf_force_forward_change(net, new_val); } else { inet6_netconf_notify_devconf(net, RTM_NEWNETCONF, NETCONFA_FORCE_FORWARDING, idev->dev->ifindex, &idev->cnf); } rtnl_net_unlock(net); } if (ret) *ppos = pos; return ret; } static int minus_one = -1; static const int two_five_five = 255; static u32 ioam6_if_id_max = U16_MAX; static const struct ctl_table addrconf_sysctl[] = { { .procname = "forwarding", .data = &ipv6_devconf.forwarding, .maxlen = sizeof(int), .mode = 0644, .proc_handler = addrconf_sysctl_forward, }, { .procname = "hop_limit", .data = &ipv6_devconf.hop_limit, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = (void *)SYSCTL_ONE, .extra2 = (void *)&two_five_five, }, { .procname = "mtu", .data = &ipv6_devconf.mtu6, .maxlen = sizeof(int), .mode = 0644, .proc_handler = addrconf_sysctl_mtu, }, { .procname = "accept_ra", .data = &ipv6_devconf.accept_ra, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "accept_redirects", .data = &ipv6_devconf.accept_redirects, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "autoconf", .data = &ipv6_devconf.autoconf, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "dad_transmits", .data = &ipv6_devconf.dad_transmits, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "router_solicitations", .data = &ipv6_devconf.rtr_solicits, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = &minus_one, }, { .procname = "router_solicitation_interval", .data = &ipv6_devconf.rtr_solicit_interval, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_jiffies, }, { .procname = "router_solicitation_max_interval", .data = &ipv6_devconf.rtr_solicit_max_interval, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_jiffies, }, { .procname = "router_solicitation_delay", .data = &ipv6_devconf.rtr_solicit_delay, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_jiffies, }, { .procname = "force_mld_version", .data = &ipv6_devconf.force_mld_version, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "mldv1_unsolicited_report_interval", .data = &ipv6_devconf.mldv1_unsolicited_report_interval, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_ms_jiffies, }, { .procname = "mldv2_unsolicited_report_interval", .data = &ipv6_devconf.mldv2_unsolicited_report_interval, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_ms_jiffies, }, { .procname = "use_tempaddr", .data = &ipv6_devconf.use_tempaddr, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "temp_valid_lft", .data = &ipv6_devconf.temp_valid_lft, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "temp_prefered_lft", .data = &ipv6_devconf.temp_prefered_lft, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "regen_min_advance", .data = &ipv6_devconf.regen_min_advance, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "regen_max_retry", .data = &ipv6_devconf.regen_max_retry, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "max_desync_factor", .data = &ipv6_devconf.max_desync_factor, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "max_addresses", .data = &ipv6_devconf.max_addresses, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "accept_ra_defrtr", .data = &ipv6_devconf.accept_ra_defrtr, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "ra_defrtr_metric", .data = &ipv6_devconf.ra_defrtr_metric, .maxlen = sizeof(u32), .mode = 0644, .proc_handler = proc_douintvec_minmax, .extra1 = (void *)SYSCTL_ONE, }, { .procname = "accept_ra_min_hop_limit", .data = &ipv6_devconf.accept_ra_min_hop_limit, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "accept_ra_min_lft", .data = &ipv6_devconf.accept_ra_min_lft, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "accept_ra_pinfo", .data = &ipv6_devconf.accept_ra_pinfo, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "ra_honor_pio_life", .data = &ipv6_devconf.ra_honor_pio_life, .maxlen = sizeof(u8), .mode = 0644, .proc_handler = proc_dou8vec_minmax, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE, }, { .procname = "ra_honor_pio_pflag", .data = &ipv6_devconf.ra_honor_pio_pflag, .maxlen = sizeof(u8), .mode = 0644, .proc_handler = proc_dou8vec_minmax, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE, }, #ifdef CONFIG_IPV6_ROUTER_PREF { .procname = "accept_ra_rtr_pref", .data = &ipv6_devconf.accept_ra_rtr_pref, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "router_probe_interval", .data = &ipv6_devconf.rtr_probe_interval, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_jiffies, }, #ifdef CONFIG_IPV6_ROUTE_INFO { .procname = "accept_ra_rt_info_min_plen", .data = &ipv6_devconf.accept_ra_rt_info_min_plen, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "accept_ra_rt_info_max_plen", .data = &ipv6_devconf.accept_ra_rt_info_max_plen, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, #endif #endif { .procname = "proxy_ndp", .data = &ipv6_devconf.proxy_ndp, .maxlen = sizeof(int), .mode = 0644, .proc_handler = addrconf_sysctl_proxy_ndp, }, { .procname = "accept_source_route", .data = &ipv6_devconf.accept_source_route, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, #ifdef CONFIG_IPV6_OPTIMISTIC_DAD { .procname = "optimistic_dad", .data = &ipv6_devconf.optimistic_dad, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "use_optimistic", .data = &ipv6_devconf.use_optimistic, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, #endif #ifdef CONFIG_IPV6_MROUTE { .procname = "mc_forwarding", .data = &ipv6_devconf.mc_forwarding, .maxlen = sizeof(int), .mode = 0444, .proc_handler = proc_dointvec, }, #endif { .procname = "disable_ipv6", .data = &ipv6_devconf.disable_ipv6, .maxlen = sizeof(int), .mode = 0644, .proc_handler = addrconf_sysctl_disable, }, { .procname = "accept_dad", .data = &ipv6_devconf.accept_dad, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "force_tllao", .data = &ipv6_devconf.force_tllao, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec }, { .procname = "ndisc_notify", .data = &ipv6_devconf.ndisc_notify, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec }, { .procname = "suppress_frag_ndisc", .data = &ipv6_devconf.suppress_frag_ndisc, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec }, { .procname = "accept_ra_from_local", .data = &ipv6_devconf.accept_ra_from_local, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "accept_ra_mtu", .data = &ipv6_devconf.accept_ra_mtu, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "stable_secret", .data = &ipv6_devconf.stable_secret, .maxlen = IPV6_MAX_STRLEN, .mode = 0600, .proc_handler = addrconf_sysctl_stable_secret, }, { .procname = "use_oif_addrs_only", .data = &ipv6_devconf.use_oif_addrs_only, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "ignore_routes_with_linkdown", .data = &ipv6_devconf.ignore_routes_with_linkdown, .maxlen = sizeof(int), .mode = 0644, .proc_handler = addrconf_sysctl_ignore_routes_with_linkdown, }, { .procname = "drop_unicast_in_l2_multicast", .data = &ipv6_devconf.drop_unicast_in_l2_multicast, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "drop_unsolicited_na", .data = &ipv6_devconf.drop_unsolicited_na, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "keep_addr_on_down", .data = &ipv6_devconf.keep_addr_on_down, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "seg6_enabled", .data = &ipv6_devconf.seg6_enabled, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, #ifdef CONFIG_IPV6_SEG6_HMAC { .procname = "seg6_require_hmac", .data = &ipv6_devconf.seg6_require_hmac, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, #endif { .procname = "enhanced_dad", .data = &ipv6_devconf.enhanced_dad, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "addr_gen_mode", .data = &ipv6_devconf.addr_gen_mode, .maxlen = sizeof(int), .mode = 0644, .proc_handler = addrconf_sysctl_addr_gen_mode, }, { .procname = "disable_policy", .data = &ipv6_devconf.disable_policy, .maxlen = sizeof(int), .mode = 0644, .proc_handler = addrconf_sysctl_disable_policy, }, { .procname = "ndisc_tclass", .data = &ipv6_devconf.ndisc_tclass, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = (void *)SYSCTL_ZERO, .extra2 = (void *)&two_five_five, }, { .procname = "rpl_seg_enabled", .data = &ipv6_devconf.rpl_seg_enabled, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE, }, { .procname = "ioam6_enabled", .data = &ipv6_devconf.ioam6_enabled, .maxlen = sizeof(u8), .mode = 0644, .proc_handler = proc_dou8vec_minmax, .extra1 = (void *)SYSCTL_ZERO, .extra2 = (void *)SYSCTL_ONE, }, { .procname = "ioam6_id", .data = &ipv6_devconf.ioam6_id, .maxlen = sizeof(u32), .mode = 0644, .proc_handler = proc_douintvec_minmax, .extra1 = (void *)SYSCTL_ZERO, .extra2 = (void *)&ioam6_if_id_max, }, { .procname = "ioam6_id_wide", .data = &ipv6_devconf.ioam6_id_wide, .maxlen = sizeof(u32), .mode = 0644, .proc_handler = proc_douintvec, }, { .procname = "ndisc_evict_nocarrier", .data = &ipv6_devconf.ndisc_evict_nocarrier, .maxlen = sizeof(u8), .mode = 0644, .proc_handler = proc_dou8vec_minmax, .extra1 = (void *)SYSCTL_ZERO, .extra2 = (void *)SYSCTL_ONE, }, { .procname = "accept_untracked_na", .data = &ipv6_devconf.accept_untracked_na, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_TWO, }, { .procname = "force_forwarding", .data = &ipv6_devconf.force_forwarding, .maxlen = sizeof(int), .mode = 0644, .proc_handler = addrconf_sysctl_force_forwarding, }, }; static int __addrconf_sysctl_register(struct net *net, char *dev_name, struct inet6_dev *idev, struct ipv6_devconf *p) { size_t table_size = ARRAY_SIZE(addrconf_sysctl); int i, ifindex; struct ctl_table *table; char path[sizeof("net/ipv6/conf/") + IFNAMSIZ]; table = kmemdup(addrconf_sysctl, sizeof(addrconf_sysctl), GFP_KERNEL_ACCOUNT); if (!table) goto out; for (i = 0; i < table_size; i++) { table[i].data += (char *)p - (char *)&ipv6_devconf; /* If one of these is already set, then it is not safe to * overwrite either of them: this makes proc_dointvec_minmax * usable. */ if (!table[i].extra1 && !table[i].extra2) { table[i].extra1 = idev; /* embedded; no ref */ table[i].extra2 = net; } } snprintf(path, sizeof(path), "net/ipv6/conf/%s", dev_name); p->sysctl_header = register_net_sysctl_sz(net, path, table, table_size); if (!p->sysctl_header) goto free; if (!strcmp(dev_name, "all")) ifindex = NETCONFA_IFINDEX_ALL; else if (!strcmp(dev_name, "default")) ifindex = NETCONFA_IFINDEX_DEFAULT; else ifindex = idev->dev->ifindex; inet6_netconf_notify_devconf(net, RTM_NEWNETCONF, NETCONFA_ALL, ifindex, p); return 0; free: kfree(table); out: return -ENOBUFS; } static void __addrconf_sysctl_unregister(struct net *net, struct ipv6_devconf *p, int ifindex) { const struct ctl_table *table; if (!p->sysctl_header) return; table = p->sysctl_header->ctl_table_arg; unregister_net_sysctl_table(p->sysctl_header); p->sysctl_header = NULL; kfree(table); inet6_netconf_notify_devconf(net, RTM_DELNETCONF, 0, ifindex, NULL); } static int addrconf_sysctl_register(struct inet6_dev *idev) { int err; if (!sysctl_dev_name_is_allowed(idev->dev->name)) return -EINVAL; err = neigh_sysctl_register(idev->dev, idev->nd_parms, &ndisc_ifinfo_sysctl_change); if (err) return err; err = __addrconf_sysctl_register(dev_net(idev->dev), idev->dev->name, idev, &idev->cnf); if (err) neigh_sysctl_unregister(idev->nd_parms); return err; } static void addrconf_sysctl_unregister(struct inet6_dev *idev) { __addrconf_sysctl_unregister(dev_net(idev->dev), &idev->cnf, idev->dev->ifindex); neigh_sysctl_unregister(idev->nd_parms); } #endif static int __net_init addrconf_init_net(struct net *net) { int err = -ENOMEM; struct ipv6_devconf *all, *dflt; spin_lock_init(&net->ipv6.addrconf_hash_lock); INIT_DEFERRABLE_WORK(&net->ipv6.addr_chk_work, addrconf_verify_work); net->ipv6.inet6_addr_lst = kcalloc(IN6_ADDR_HSIZE, sizeof(struct hlist_head), GFP_KERNEL); if (!net->ipv6.inet6_addr_lst) goto err_alloc_addr; all = kmemdup(&ipv6_devconf, sizeof(ipv6_devconf), GFP_KERNEL); if (!all) goto err_alloc_all; dflt = kmemdup(&ipv6_devconf_dflt, sizeof(ipv6_devconf_dflt), GFP_KERNEL); if (!dflt) goto err_alloc_dflt; if (!net_eq(net, &init_net)) { switch (net_inherit_devconf()) { case 1: /* copy from init_net */ memcpy(all, init_net.ipv6.devconf_all, sizeof(ipv6_devconf)); memcpy(dflt, init_net.ipv6.devconf_dflt, sizeof(ipv6_devconf_dflt)); break; case 3: /* copy from the current netns */ memcpy(all, current->nsproxy->net_ns->ipv6.devconf_all, sizeof(ipv6_devconf)); memcpy(dflt, current->nsproxy->net_ns->ipv6.devconf_dflt, sizeof(ipv6_devconf_dflt)); break; case 0: case 2: /* use compiled values */ break; } } /* these will be inherited by all namespaces */ dflt->autoconf = ipv6_defaults.autoconf; dflt->disable_ipv6 = ipv6_defaults.disable_ipv6; dflt->stable_secret.initialized = false; all->stable_secret.initialized = false; net->ipv6.devconf_all = all; net->ipv6.devconf_dflt = dflt; #ifdef CONFIG_SYSCTL err = __addrconf_sysctl_register(net, "all", NULL, all); if (err < 0) goto err_reg_all; err = __addrconf_sysctl_register(net, "default", NULL, dflt); if (err < 0) goto err_reg_dflt; #endif return 0; #ifdef CONFIG_SYSCTL err_reg_dflt: __addrconf_sysctl_unregister(net, all, NETCONFA_IFINDEX_ALL); err_reg_all: kfree(dflt); net->ipv6.devconf_dflt = NULL; #endif err_alloc_dflt: kfree(all); net->ipv6.devconf_all = NULL; err_alloc_all: kfree(net->ipv6.inet6_addr_lst); err_alloc_addr: return err; } static void __net_exit addrconf_exit_net(struct net *net) { int i; #ifdef CONFIG_SYSCTL __addrconf_sysctl_unregister(net, net->ipv6.devconf_dflt, NETCONFA_IFINDEX_DEFAULT); __addrconf_sysctl_unregister(net, net->ipv6.devconf_all, NETCONFA_IFINDEX_ALL); #endif kfree(net->ipv6.devconf_dflt); net->ipv6.devconf_dflt = NULL; kfree(net->ipv6.devconf_all); net->ipv6.devconf_all = NULL; cancel_delayed_work_sync(&net->ipv6.addr_chk_work); /* * Check hash table, then free it. */ for (i = 0; i < IN6_ADDR_HSIZE; i++) WARN_ON_ONCE(!hlist_empty(&net->ipv6.inet6_addr_lst[i])); kfree(net->ipv6.inet6_addr_lst); net->ipv6.inet6_addr_lst = NULL; } static struct pernet_operations addrconf_ops = { .init = addrconf_init_net, .exit = addrconf_exit_net, }; static struct rtnl_af_ops inet6_ops __read_mostly = { .family = AF_INET6, .fill_link_af = inet6_fill_link_af, .get_link_af_size = inet6_get_link_af_size, .validate_link_af = inet6_validate_link_af, .set_link_af = inet6_set_link_af, }; static const struct rtnl_msg_handler addrconf_rtnl_msg_handlers[] __initconst_or_module = { {.owner = THIS_MODULE, .protocol = PF_INET6, .msgtype = RTM_GETLINK, .dumpit = inet6_dump_ifinfo, .flags = RTNL_FLAG_DUMP_UNLOCKED}, {.owner = THIS_MODULE, .protocol = PF_INET6, .msgtype = RTM_NEWADDR, .doit = inet6_rtm_newaddr, .flags = RTNL_FLAG_DOIT_PERNET}, {.owner = THIS_MODULE, .protocol = PF_INET6, .msgtype = RTM_DELADDR, .doit = inet6_rtm_deladdr, .flags = RTNL_FLAG_DOIT_PERNET}, {.owner = THIS_MODULE, .protocol = PF_INET6, .msgtype = RTM_GETADDR, .doit = inet6_rtm_getaddr, .dumpit = inet6_dump_ifaddr, .flags = RTNL_FLAG_DOIT_UNLOCKED | RTNL_FLAG_DUMP_UNLOCKED}, {.owner = THIS_MODULE, .protocol = PF_INET6, .msgtype = RTM_GETMULTICAST, .dumpit = inet6_dump_ifmcaddr, .flags = RTNL_FLAG_DUMP_UNLOCKED}, {.owner = THIS_MODULE, .protocol = PF_INET6, .msgtype = RTM_GETANYCAST, .dumpit = inet6_dump_ifacaddr, .flags = RTNL_FLAG_DUMP_UNLOCKED}, {.owner = THIS_MODULE, .protocol = PF_INET6, .msgtype = RTM_GETNETCONF, .doit = inet6_netconf_get_devconf, .dumpit = inet6_netconf_dump_devconf, .flags = RTNL_FLAG_DOIT_UNLOCKED | RTNL_FLAG_DUMP_UNLOCKED}, }; /* * Init / cleanup code */ int __init addrconf_init(void) { struct inet6_dev *idev; int err; err = ipv6_addr_label_init(); if (err < 0) { pr_crit("%s: cannot initialize default policy table: %d\n", __func__, err); goto out; } err = register_pernet_subsys(&addrconf_ops); if (err < 0) goto out_addrlabel; /* All works using addrconf_wq need to lock rtnl. */ addrconf_wq = create_singlethread_workqueue("ipv6_addrconf"); if (!addrconf_wq) { err = -ENOMEM; goto out_nowq; } rtnl_net_lock(&init_net); idev = ipv6_add_dev(blackhole_netdev); rtnl_net_unlock(&init_net); if (IS_ERR(idev)) { err = PTR_ERR(idev); goto errlo; } ip6_route_init_special_entries(); register_netdevice_notifier(&ipv6_dev_notf); addrconf_verify(&init_net); err = rtnl_af_register(&inet6_ops); if (err) goto erraf; err = rtnl_register_many(addrconf_rtnl_msg_handlers); if (err) goto errout; err = ipv6_addr_label_rtnl_register(); if (err < 0) goto errout; return 0; errout: rtnl_unregister_all(PF_INET6); rtnl_af_unregister(&inet6_ops); erraf: unregister_netdevice_notifier(&ipv6_dev_notf); errlo: destroy_workqueue(addrconf_wq); out_nowq: unregister_pernet_subsys(&addrconf_ops); out_addrlabel: ipv6_addr_label_cleanup(); out: return err; } void addrconf_cleanup(void) { struct net_device *dev; unregister_netdevice_notifier(&ipv6_dev_notf); unregister_pernet_subsys(&addrconf_ops); ipv6_addr_label_cleanup(); rtnl_af_unregister(&inet6_ops); rtnl_net_lock(&init_net); /* clean dev list */ for_each_netdev(&init_net, dev) { if (!__in6_dev_get_rtnl_net(dev)) continue; addrconf_ifdown(dev, true); } addrconf_ifdown(init_net.loopback_dev, true); rtnl_net_unlock(&init_net); destroy_workqueue(addrconf_wq); } |
| 1 25 2 24 1 1 9 5 8 1 4 4 14 14 1 14 5 11 8 12 3 12 1 8 3 3 3 3 3 3 3 3 49 49 2 48 2 46 45 18 27 1 26 26 26 26 25 26 19 7 5 5 33 33 33 33 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 | // SPDX-License-Identifier: GPL-2.0-or-later /* * IPv6 fragment reassembly * Linux INET6 implementation * * Authors: * Pedro Roque <roque@di.fc.ul.pt> * * Based on: net/ipv4/ip_fragment.c */ /* * Fixes: * Andi Kleen Make it work with multiple hosts. * More RFC compliance. * * Horst von Brand Add missing #include <linux/string.h> * Alexey Kuznetsov SMP races, threading, cleanup. * Patrick McHardy LRU queue of frag heads for evictor. * Mitsuru KANDA @USAGI Register inet6_protocol{}. * David Stevens and * YOSHIFUJI,H. @USAGI Always remove fragment header to * calculate ICV correctly. */ #define pr_fmt(fmt) "IPv6: " fmt #include <linux/errno.h> #include <linux/types.h> #include <linux/string.h> #include <linux/socket.h> #include <linux/sockios.h> #include <linux/jiffies.h> #include <linux/net.h> #include <linux/list.h> #include <linux/netdevice.h> #include <linux/in6.h> #include <linux/ipv6.h> #include <linux/icmpv6.h> #include <linux/random.h> #include <linux/jhash.h> #include <linux/skbuff.h> #include <linux/slab.h> #include <linux/export.h> #include <linux/tcp.h> #include <linux/udp.h> #include <net/sock.h> #include <net/snmp.h> #include <net/ipv6.h> #include <net/ip6_route.h> #include <net/protocol.h> #include <net/transp_v6.h> #include <net/rawv6.h> #include <net/ndisc.h> #include <net/addrconf.h> #include <net/ipv6_frag.h> #include <net/inet_ecn.h> static const char ip6_frag_cache_name[] = "ip6-frags"; static u8 ip6_frag_ecn(const struct ipv6hdr *ipv6h) { return 1 << (ipv6_get_dsfield(ipv6h) & INET_ECN_MASK); } static struct inet_frags ip6_frags; static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *skb, struct sk_buff *prev_tail, struct net_device *dev, int *refs); static void ip6_frag_expire(struct timer_list *t) { struct inet_frag_queue *frag = timer_container_of(frag, t, timer); struct frag_queue *fq; fq = container_of(frag, struct frag_queue, q); ip6frag_expire_frag_queue(fq->q.fqdir->net, fq); } static struct frag_queue * fq_find(struct net *net, __be32 id, const struct ipv6hdr *hdr, int iif) { struct frag_v6_compare_key key = { .id = id, .saddr = hdr->saddr, .daddr = hdr->daddr, .user = IP6_DEFRAG_LOCAL_DELIVER, .iif = iif, }; struct inet_frag_queue *q; if (!(ipv6_addr_type(&hdr->daddr) & (IPV6_ADDR_MULTICAST | IPV6_ADDR_LINKLOCAL))) key.iif = 0; q = inet_frag_find(net->ipv6.fqdir, &key); if (!q) return NULL; return container_of(q, struct frag_queue, q); } static int ip6_frag_queue(struct net *net, struct frag_queue *fq, struct sk_buff *skb, struct frag_hdr *fhdr, int nhoff, u32 *prob_offset, int *refs) { int offset, end, fragsize; struct sk_buff *prev_tail; struct net_device *dev; int err = -ENOENT; SKB_DR(reason); u8 ecn; /* If reassembly is already done, @skb must be a duplicate frag. */ if (fq->q.flags & INET_FRAG_COMPLETE) { SKB_DR_SET(reason, DUP_FRAG); goto err; } err = -EINVAL; offset = ntohs(fhdr->frag_off) & ~0x7; end = offset + (ntohs(ipv6_hdr(skb)->payload_len) - ((u8 *)(fhdr + 1) - (u8 *)(ipv6_hdr(skb) + 1))); if ((unsigned int)end > IPV6_MAXPLEN) { *prob_offset = (u8 *)&fhdr->frag_off - skb_network_header(skb); /* note that if prob_offset is set, the skb is freed elsewhere, * we do not free it here. */ return -1; } ecn = ip6_frag_ecn(ipv6_hdr(skb)); if (skb->ip_summed == CHECKSUM_COMPLETE) { const unsigned char *nh = skb_network_header(skb); skb->csum = csum_sub(skb->csum, csum_partial(nh, (u8 *)(fhdr + 1) - nh, 0)); } /* Is this the final fragment? */ if (!(fhdr->frag_off & htons(IP6_MF))) { /* If we already have some bits beyond end * or have different end, the segment is corrupted. */ if (end < fq->q.len || ((fq->q.flags & INET_FRAG_LAST_IN) && end != fq->q.len)) goto discard_fq; fq->q.flags |= INET_FRAG_LAST_IN; fq->q.len = end; } else { /* Check if the fragment is rounded to 8 bytes. * Required by the RFC. */ if (end & 0x7) { /* RFC2460 says always send parameter problem in * this case. -DaveM */ *prob_offset = offsetof(struct ipv6hdr, payload_len); return -1; } if (end > fq->q.len) { /* Some bits beyond end -> corruption. */ if (fq->q.flags & INET_FRAG_LAST_IN) goto discard_fq; fq->q.len = end; } } if (end == offset) goto discard_fq; err = -ENOMEM; /* Point into the IP datagram 'data' part. */ if (!pskb_pull(skb, (u8 *) (fhdr + 1) - skb->data)) goto discard_fq; err = pskb_trim_rcsum(skb, end - offset); if (err) goto discard_fq; /* Note : skb->rbnode and skb->dev share the same location. */ dev = skb->dev; /* Makes sure compiler wont do silly aliasing games */ barrier(); prev_tail = fq->q.fragments_tail; err = inet_frag_queue_insert(&fq->q, skb, offset, end); if (err) goto insert_error; if (dev) fq->iif = dev->ifindex; fq->q.stamp = skb->tstamp; fq->q.tstamp_type = skb->tstamp_type; fq->q.meat += skb->len; fq->ecn |= ecn; add_frag_mem_limit(fq->q.fqdir, skb->truesize); fragsize = -skb_network_offset(skb) + skb->len; if (fragsize > fq->q.max_size) fq->q.max_size = fragsize; /* The first fragment. * nhoffset is obtained from the first fragment, of course. */ if (offset == 0) { fq->nhoffset = nhoff; fq->q.flags |= INET_FRAG_FIRST_IN; } if (fq->q.flags == (INET_FRAG_FIRST_IN | INET_FRAG_LAST_IN) && fq->q.meat == fq->q.len) { unsigned long orefdst = skb->_skb_refdst; skb->_skb_refdst = 0UL; err = ip6_frag_reasm(fq, skb, prev_tail, dev, refs); skb->_skb_refdst = orefdst; return err; } skb_dst_drop(skb); return -EINPROGRESS; insert_error: if (err == IPFRAG_DUP) { SKB_DR_SET(reason, DUP_FRAG); err = -EINVAL; goto err; } err = -EINVAL; __IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)), IPSTATS_MIB_REASM_OVERLAPS); discard_fq: inet_frag_kill(&fq->q, refs); __IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)), IPSTATS_MIB_REASMFAILS); err: kfree_skb_reason(skb, reason); return err; } /* * Check if this packet is complete. * * It is called with locked fq, and caller must check that * queue is eligible for reassembly i.e. it is not COMPLETE, * the last and the first frames arrived and all the bits are here. */ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *skb, struct sk_buff *prev_tail, struct net_device *dev, int *refs) { struct net *net = fq->q.fqdir->net; unsigned int nhoff; void *reasm_data; int payload_len; u8 ecn; inet_frag_kill(&fq->q, refs); ecn = ip_frag_ecn_table[fq->ecn]; if (unlikely(ecn == 0xff)) goto out_fail; reasm_data = inet_frag_reasm_prepare(&fq->q, skb, prev_tail); if (!reasm_data) goto out_oom; payload_len = -skb_network_offset(skb) - sizeof(struct ipv6hdr) + fq->q.len - sizeof(struct frag_hdr); if (payload_len > IPV6_MAXPLEN) goto out_oversize; /* We have to remove fragment header from datagram and to relocate * header in order to calculate ICV correctly. */ nhoff = fq->nhoffset; skb_network_header(skb)[nhoff] = skb_transport_header(skb)[0]; memmove(skb->head + sizeof(struct frag_hdr), skb->head, (skb->data - skb->head) - sizeof(struct frag_hdr)); if (skb_mac_header_was_set(skb)) skb->mac_header += sizeof(struct frag_hdr); skb->network_header += sizeof(struct frag_hdr); skb_reset_transport_header(skb); inet_frag_reasm_finish(&fq->q, skb, reasm_data, true); skb->dev = dev; ipv6_hdr(skb)->payload_len = htons(payload_len); ipv6_change_dsfield(ipv6_hdr(skb), 0xff, ecn); IP6CB(skb)->nhoff = nhoff; IP6CB(skb)->flags |= IP6SKB_FRAGMENTED; IP6CB(skb)->frag_max_size = fq->q.max_size; /* Yes, and fold redundant checksum back. 8) */ skb_postpush_rcsum(skb, skb_network_header(skb), skb_network_header_len(skb)); __IP6_INC_STATS(net, __in6_dev_stats_get(dev, skb), IPSTATS_MIB_REASMOKS); fq->q.rb_fragments = RB_ROOT; fq->q.fragments_tail = NULL; fq->q.last_run_head = NULL; return 1; out_oversize: net_dbg_ratelimited("ip6_frag_reasm: payload len = %d\n", payload_len); goto out_fail; out_oom: net_dbg_ratelimited("ip6_frag_reasm: no memory for reassembly\n"); out_fail: __IP6_INC_STATS(net, __in6_dev_stats_get(dev, skb), IPSTATS_MIB_REASMFAILS); inet_frag_kill(&fq->q, refs); return -1; } static int ipv6_frag_rcv(struct sk_buff *skb) { const struct ipv6hdr *hdr = ipv6_hdr(skb); struct net *net = skb_dst_dev_net(skb); struct frag_hdr *fhdr; struct frag_queue *fq; u8 nexthdr; int iif; if (IP6CB(skb)->flags & IP6SKB_FRAGMENTED) goto fail_hdr; __IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)), IPSTATS_MIB_REASMREQDS); /* Jumbo payload inhibits frag. header */ if (hdr->payload_len == 0) goto fail_hdr; if (!pskb_may_pull(skb, (skb_transport_offset(skb) + sizeof(struct frag_hdr)))) goto fail_hdr; hdr = ipv6_hdr(skb); fhdr = (struct frag_hdr *)skb_transport_header(skb); if (!(fhdr->frag_off & htons(IP6_OFFSET | IP6_MF))) { /* It is not a fragmented frame */ skb->transport_header += sizeof(struct frag_hdr); __IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)), IPSTATS_MIB_REASMOKS); IP6CB(skb)->nhoff = (u8 *)fhdr - skb_network_header(skb); IP6CB(skb)->flags |= IP6SKB_FRAGMENTED; IP6CB(skb)->frag_max_size = ntohs(hdr->payload_len) + sizeof(struct ipv6hdr); return 1; } /* RFC 8200, Section 4.5 Fragment Header: * If the first fragment does not include all headers through an * Upper-Layer header, then that fragment should be discarded and * an ICMP Parameter Problem, Code 3, message should be sent to * the source of the fragment, with the Pointer field set to zero. */ nexthdr = hdr->nexthdr; if (ipv6frag_thdr_truncated(skb, skb_network_offset(skb) + sizeof(struct ipv6hdr), &nexthdr)) { __IP6_INC_STATS(net, __in6_dev_get_safely(skb->dev), IPSTATS_MIB_INHDRERRORS); icmpv6_param_prob(skb, ICMPV6_HDR_INCOMP, 0); return -1; } iif = skb->dev ? skb->dev->ifindex : 0; rcu_read_lock(); fq = fq_find(net, fhdr->identification, hdr, iif); if (fq) { u32 prob_offset = 0; int ret, refs = 0; spin_lock(&fq->q.lock); fq->iif = iif; ret = ip6_frag_queue(net, fq, skb, fhdr, IP6CB(skb)->nhoff, &prob_offset, &refs); spin_unlock(&fq->q.lock); rcu_read_unlock(); inet_frag_putn(&fq->q, refs); if (prob_offset) { __IP6_INC_STATS(net, __in6_dev_get_safely(skb->dev), IPSTATS_MIB_INHDRERRORS); /* icmpv6_param_prob() calls kfree_skb(skb) */ icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, prob_offset); } return ret; } rcu_read_unlock(); __IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)), IPSTATS_MIB_REASMFAILS); kfree_skb(skb); return -1; fail_hdr: __IP6_INC_STATS(net, __in6_dev_get_safely(skb->dev), IPSTATS_MIB_INHDRERRORS); icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, skb_network_header_len(skb)); return -1; } static const struct inet6_protocol frag_protocol = { .handler = ipv6_frag_rcv, .flags = INET6_PROTO_NOPOLICY, }; #ifdef CONFIG_SYSCTL static struct ctl_table ip6_frags_ns_ctl_table[] = { { .procname = "ip6frag_high_thresh", .maxlen = sizeof(unsigned long), .mode = 0644, .proc_handler = proc_doulongvec_minmax, }, { .procname = "ip6frag_low_thresh", .maxlen = sizeof(unsigned long), .mode = 0644, .proc_handler = proc_doulongvec_minmax, }, { .procname = "ip6frag_time", .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_jiffies, }, }; /* secret interval has been deprecated */ static int ip6_frags_secret_interval_unused; static struct ctl_table ip6_frags_ctl_table[] = { { .procname = "ip6frag_secret_interval", .data = &ip6_frags_secret_interval_unused, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_jiffies, }, }; static int __net_init ip6_frags_ns_sysctl_register(struct net *net) { struct ctl_table *table; struct ctl_table_header *hdr; table = ip6_frags_ns_ctl_table; if (!net_eq(net, &init_net)) { table = kmemdup(table, sizeof(ip6_frags_ns_ctl_table), GFP_KERNEL); if (!table) goto err_alloc; } table[0].data = &net->ipv6.fqdir->high_thresh; table[0].extra1 = &net->ipv6.fqdir->low_thresh; table[1].data = &net->ipv6.fqdir->low_thresh; table[1].extra2 = &net->ipv6.fqdir->high_thresh; table[2].data = &net->ipv6.fqdir->timeout; hdr = register_net_sysctl_sz(net, "net/ipv6", table, ARRAY_SIZE(ip6_frags_ns_ctl_table)); if (!hdr) goto err_reg; net->ipv6.sysctl.frags_hdr = hdr; return 0; err_reg: if (!net_eq(net, &init_net)) kfree(table); err_alloc: return -ENOMEM; } static void __net_exit ip6_frags_ns_sysctl_unregister(struct net *net) { const struct ctl_table *table; table = net->ipv6.sysctl.frags_hdr->ctl_table_arg; unregister_net_sysctl_table(net->ipv6.sysctl.frags_hdr); if (!net_eq(net, &init_net)) kfree(table); } static struct ctl_table_header *ip6_ctl_header; static int ip6_frags_sysctl_register(void) { ip6_ctl_header = register_net_sysctl(&init_net, "net/ipv6", ip6_frags_ctl_table); return ip6_ctl_header == NULL ? -ENOMEM : 0; } static void ip6_frags_sysctl_unregister(void) { unregister_net_sysctl_table(ip6_ctl_header); } #else static int ip6_frags_ns_sysctl_register(struct net *net) { return 0; } static void ip6_frags_ns_sysctl_unregister(struct net *net) { } static int ip6_frags_sysctl_register(void) { return 0; } static void ip6_frags_sysctl_unregister(void) { } #endif static int __net_init ipv6_frags_init_net(struct net *net) { int res; res = fqdir_init(&net->ipv6.fqdir, &ip6_frags, net); if (res < 0) return res; net->ipv6.fqdir->high_thresh = IPV6_FRAG_HIGH_THRESH; net->ipv6.fqdir->low_thresh = IPV6_FRAG_LOW_THRESH; net->ipv6.fqdir->timeout = IPV6_FRAG_TIMEOUT; res = ip6_frags_ns_sysctl_register(net); if (res < 0) fqdir_exit(net->ipv6.fqdir); return res; } static void __net_exit ipv6_frags_pre_exit_net(struct net *net) { fqdir_pre_exit(net->ipv6.fqdir); } static void __net_exit ipv6_frags_exit_net(struct net *net) { ip6_frags_ns_sysctl_unregister(net); fqdir_exit(net->ipv6.fqdir); } static struct pernet_operations ip6_frags_ops = { .init = ipv6_frags_init_net, .pre_exit = ipv6_frags_pre_exit_net, .exit = ipv6_frags_exit_net, }; static const struct rhashtable_params ip6_rhash_params = { .head_offset = offsetof(struct inet_frag_queue, node), .hashfn = ip6frag_key_hashfn, .obj_hashfn = ip6frag_obj_hashfn, .obj_cmpfn = ip6frag_obj_cmpfn, .automatic_shrinking = true, }; int __init ipv6_frag_init(void) { int ret; ip6_frags.constructor = ip6frag_init; ip6_frags.destructor = NULL; ip6_frags.qsize = sizeof(struct frag_queue); ip6_frags.frag_expire = ip6_frag_expire; ip6_frags.frags_cache_name = ip6_frag_cache_name; ip6_frags.rhash_params = ip6_rhash_params; ret = inet_frags_init(&ip6_frags); if (ret) goto out; ret = inet6_add_protocol(&frag_protocol, IPPROTO_FRAGMENT); if (ret) goto err_protocol; ret = ip6_frags_sysctl_register(); if (ret) goto err_sysctl; ret = register_pernet_subsys(&ip6_frags_ops); if (ret) goto err_pernet; out: return ret; err_pernet: ip6_frags_sysctl_unregister(); err_sysctl: inet6_del_protocol(&frag_protocol, IPPROTO_FRAGMENT); err_protocol: inet_frags_fini(&ip6_frags); goto out; } void ipv6_frag_exit(void) { ip6_frags_sysctl_unregister(); unregister_pernet_subsys(&ip6_frags_ops); inet6_del_protocol(&frag_protocol, IPPROTO_FRAGMENT); inet_frags_fini(&ip6_frags); } |
| 2 1 10 10 7 3 10 1 2 2 2 2 1 2 2 2 4 3 10 10 10 10 1 6 5 1 1 4 4 3 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 | // SPDX-License-Identifier: GPL-2.0-only /* Copyright (C) 2003-2013 Jozsef Kadlecsik <kadlec@netfilter.org> */ /* Kernel module implementing an IP set type: the hash:ip,port type */ #include <linux/jhash.h> #include <linux/module.h> #include <linux/ip.h> #include <linux/skbuff.h> #include <linux/errno.h> #include <linux/random.h> #include <net/ip.h> #include <net/ipv6.h> #include <net/netlink.h> #include <net/tcp.h> #include <linux/netfilter.h> #include <linux/netfilter/ipset/pfxlen.h> #include <linux/netfilter/ipset/ip_set.h> #include <linux/netfilter/ipset/ip_set_getport.h> #include <linux/netfilter/ipset/ip_set_hash.h> #define IPSET_TYPE_REV_MIN 0 /* 1 SCTP and UDPLITE support added */ /* 2 Counters support added */ /* 3 Comments support added */ /* 4 Forceadd support added */ /* 5 skbinfo support added */ /* 6 bucketsize, initval support added */ #define IPSET_TYPE_REV_MAX 7 /* bitmask support added */ MODULE_LICENSE("GPL"); MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@netfilter.org>"); IP_SET_MODULE_DESC("hash:ip,port", IPSET_TYPE_REV_MIN, IPSET_TYPE_REV_MAX); MODULE_ALIAS("ip_set_hash:ip,port"); /* Type specific function prefix */ #define HTYPE hash_ipport #define IP_SET_HASH_WITH_NETMASK #define IP_SET_HASH_WITH_BITMASK /* IPv4 variant */ /* Member elements */ struct hash_ipport4_elem { __be32 ip; __be16 port; u8 proto; u8 padding; }; /* Common functions */ static bool hash_ipport4_data_equal(const struct hash_ipport4_elem *ip1, const struct hash_ipport4_elem *ip2, u32 *multi) { return ip1->ip == ip2->ip && ip1->port == ip2->port && ip1->proto == ip2->proto; } static bool hash_ipport4_data_list(struct sk_buff *skb, const struct hash_ipport4_elem *data) { if (nla_put_ipaddr4(skb, IPSET_ATTR_IP, data->ip) || nla_put_net16(skb, IPSET_ATTR_PORT, data->port) || nla_put_u8(skb, IPSET_ATTR_PROTO, data->proto)) goto nla_put_failure; return false; nla_put_failure: return true; } static void hash_ipport4_data_next(struct hash_ipport4_elem *next, const struct hash_ipport4_elem *d) { next->ip = d->ip; next->port = d->port; } #define MTYPE hash_ipport4 #define HOST_MASK 32 #include "ip_set_hash_gen.h" static int hash_ipport4_kadt(struct ip_set *set, const struct sk_buff *skb, const struct xt_action_param *par, enum ipset_adt adt, struct ip_set_adt_opt *opt) { ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_ipport4_elem e = { .ip = 0 }; struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); const struct MTYPE *h = set->data; if (!ip_set_get_ip4_port(skb, opt->flags & IPSET_DIM_TWO_SRC, &e.port, &e.proto)) return -EINVAL; ip4addrptr(skb, opt->flags & IPSET_DIM_ONE_SRC, &e.ip); e.ip &= h->bitmask.ip; if (e.ip == 0) return -EINVAL; return adtfn(set, &e, &ext, &opt->ext, opt->cmdflags); } static int hash_ipport4_uadt(struct ip_set *set, struct nlattr *tb[], enum ipset_adt adt, u32 *lineno, u32 flags, bool retried) { struct hash_ipport4 *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_ipport4_elem e = { .ip = 0 }; struct ip_set_ext ext = IP_SET_INIT_UEXT(set); u32 ip, ip_to = 0, p = 0, port, port_to, i = 0; bool with_ports = false; int ret; if (tb[IPSET_ATTR_LINENO]) *lineno = nla_get_u32(tb[IPSET_ATTR_LINENO]); if (unlikely(!tb[IPSET_ATTR_IP] || !ip_set_attr_netorder(tb, IPSET_ATTR_PORT) || !ip_set_optattr_netorder(tb, IPSET_ATTR_PORT_TO))) return -IPSET_ERR_PROTOCOL; ret = ip_set_get_ipaddr4(tb[IPSET_ATTR_IP], &e.ip); if (ret) return ret; ret = ip_set_get_extensions(set, tb, &ext); if (ret) return ret; e.ip &= h->bitmask.ip; if (e.ip == 0) return -EINVAL; e.port = nla_get_be16(tb[IPSET_ATTR_PORT]); if (tb[IPSET_ATTR_PROTO]) { e.proto = nla_get_u8(tb[IPSET_ATTR_PROTO]); with_ports = ip_set_proto_with_ports(e.proto); if (e.proto == 0) return -IPSET_ERR_INVALID_PROTO; } else { return -IPSET_ERR_MISSING_PROTO; } if (!(with_ports || e.proto == IPPROTO_ICMP)) e.port = 0; if (adt == IPSET_TEST || !(tb[IPSET_ATTR_IP_TO] || tb[IPSET_ATTR_CIDR] || tb[IPSET_ATTR_PORT_TO])) { ret = adtfn(set, &e, &ext, &ext, flags); return ip_set_eexist(ret, flags) ? 0 : ret; } ip_to = ip = ntohl(e.ip); if (tb[IPSET_ATTR_IP_TO]) { ret = ip_set_get_hostipaddr4(tb[IPSET_ATTR_IP_TO], &ip_to); if (ret) return ret; if (ip > ip_to) swap(ip, ip_to); } else if (tb[IPSET_ATTR_CIDR]) { u8 cidr = nla_get_u8(tb[IPSET_ATTR_CIDR]); if (!cidr || cidr > HOST_MASK) return -IPSET_ERR_INVALID_CIDR; ip_set_mask_from_to(ip, ip_to, cidr); } port_to = port = ntohs(e.port); if (with_ports && tb[IPSET_ATTR_PORT_TO]) { port_to = ip_set_get_h16(tb[IPSET_ATTR_PORT_TO]); if (port > port_to) swap(port, port_to); } if (retried) ip = ntohl(h->next.ip); for (; ip <= ip_to; ip++) { p = retried && ip == ntohl(h->next.ip) ? ntohs(h->next.port) : port; for (; p <= port_to; p++, i++) { e.ip = htonl(ip); e.port = htons(p); if (i > IPSET_MAX_RANGE) { hash_ipport4_data_next(&h->next, &e); return -ERANGE; } ret = adtfn(set, &e, &ext, &ext, flags); if (ret && !ip_set_eexist(ret, flags)) return ret; ret = 0; } } return ret; } /* IPv6 variant */ struct hash_ipport6_elem { union nf_inet_addr ip; __be16 port; u8 proto; u8 padding; }; /* Common functions */ static bool hash_ipport6_data_equal(const struct hash_ipport6_elem *ip1, const struct hash_ipport6_elem *ip2, u32 *multi) { return ipv6_addr_equal(&ip1->ip.in6, &ip2->ip.in6) && ip1->port == ip2->port && ip1->proto == ip2->proto; } static bool hash_ipport6_data_list(struct sk_buff *skb, const struct hash_ipport6_elem *data) { if (nla_put_ipaddr6(skb, IPSET_ATTR_IP, &data->ip.in6) || nla_put_net16(skb, IPSET_ATTR_PORT, data->port) || nla_put_u8(skb, IPSET_ATTR_PROTO, data->proto)) goto nla_put_failure; return false; nla_put_failure: return true; } static void hash_ipport6_data_next(struct hash_ipport6_elem *next, const struct hash_ipport6_elem *d) { next->port = d->port; } #undef MTYPE #undef HOST_MASK #define MTYPE hash_ipport6 #define HOST_MASK 128 #define IP_SET_EMIT_CREATE #include "ip_set_hash_gen.h" static int hash_ipport6_kadt(struct ip_set *set, const struct sk_buff *skb, const struct xt_action_param *par, enum ipset_adt adt, struct ip_set_adt_opt *opt) { ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_ipport6_elem e = { .ip = { .all = { 0 } } }; struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); const struct MTYPE *h = set->data; if (!ip_set_get_ip6_port(skb, opt->flags & IPSET_DIM_TWO_SRC, &e.port, &e.proto)) return -EINVAL; ip6addrptr(skb, opt->flags & IPSET_DIM_ONE_SRC, &e.ip.in6); nf_inet_addr_mask_inplace(&e.ip, &h->bitmask); if (ipv6_addr_any(&e.ip.in6)) return -EINVAL; return adtfn(set, &e, &ext, &opt->ext, opt->cmdflags); } static int hash_ipport6_uadt(struct ip_set *set, struct nlattr *tb[], enum ipset_adt adt, u32 *lineno, u32 flags, bool retried) { const struct hash_ipport6 *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_ipport6_elem e = { .ip = { .all = { 0 } } }; struct ip_set_ext ext = IP_SET_INIT_UEXT(set); u32 port, port_to; bool with_ports = false; int ret; if (tb[IPSET_ATTR_LINENO]) *lineno = nla_get_u32(tb[IPSET_ATTR_LINENO]); if (unlikely(!tb[IPSET_ATTR_IP] || !ip_set_attr_netorder(tb, IPSET_ATTR_PORT) || !ip_set_optattr_netorder(tb, IPSET_ATTR_PORT_TO))) return -IPSET_ERR_PROTOCOL; if (unlikely(tb[IPSET_ATTR_IP_TO])) return -IPSET_ERR_HASH_RANGE_UNSUPPORTED; if (unlikely(tb[IPSET_ATTR_CIDR])) { u8 cidr = nla_get_u8(tb[IPSET_ATTR_CIDR]); if (cidr != HOST_MASK) return -IPSET_ERR_INVALID_CIDR; } ret = ip_set_get_ipaddr6(tb[IPSET_ATTR_IP], &e.ip); if (ret) return ret; ret = ip_set_get_extensions(set, tb, &ext); if (ret) return ret; nf_inet_addr_mask_inplace(&e.ip, &h->bitmask); if (ipv6_addr_any(&e.ip.in6)) return -EINVAL; e.port = nla_get_be16(tb[IPSET_ATTR_PORT]); if (tb[IPSET_ATTR_PROTO]) { e.proto = nla_get_u8(tb[IPSET_ATTR_PROTO]); with_ports = ip_set_proto_with_ports(e.proto); if (e.proto == 0) return -IPSET_ERR_INVALID_PROTO; } else { return -IPSET_ERR_MISSING_PROTO; } if (!(with_ports || e.proto == IPPROTO_ICMPV6)) e.port = 0; if (adt == IPSET_TEST || !with_ports || !tb[IPSET_ATTR_PORT_TO]) { ret = adtfn(set, &e, &ext, &ext, flags); return ip_set_eexist(ret, flags) ? 0 : ret; } port = ntohs(e.port); port_to = ip_set_get_h16(tb[IPSET_ATTR_PORT_TO]); if (port > port_to) swap(port, port_to); if (retried) port = ntohs(h->next.port); for (; port <= port_to; port++) { e.port = htons(port); ret = adtfn(set, &e, &ext, &ext, flags); if (ret && !ip_set_eexist(ret, flags)) return ret; ret = 0; } return ret; } static struct ip_set_type hash_ipport_type __read_mostly = { .name = "hash:ip,port", .protocol = IPSET_PROTOCOL, .features = IPSET_TYPE_IP | IPSET_TYPE_PORT, .dimension = IPSET_DIM_TWO, .family = NFPROTO_UNSPEC, .revision_min = IPSET_TYPE_REV_MIN, .revision_max = IPSET_TYPE_REV_MAX, .create_flags[IPSET_TYPE_REV_MAX] = IPSET_CREATE_FLAG_BUCKETSIZE, .create = hash_ipport_create, .create_policy = { [IPSET_ATTR_HASHSIZE] = { .type = NLA_U32 }, [IPSET_ATTR_MAXELEM] = { .type = NLA_U32 }, [IPSET_ATTR_INITVAL] = { .type = NLA_U32 }, [IPSET_ATTR_BUCKETSIZE] = { .type = NLA_U8 }, [IPSET_ATTR_RESIZE] = { .type = NLA_U8 }, [IPSET_ATTR_PROTO] = { .type = NLA_U8 }, [IPSET_ATTR_TIMEOUT] = { .type = NLA_U32 }, [IPSET_ATTR_CADT_FLAGS] = { .type = NLA_U32 }, [IPSET_ATTR_NETMASK] = { .type = NLA_U8 }, [IPSET_ATTR_BITMASK] = { .type = NLA_NESTED }, }, .adt_policy = { [IPSET_ATTR_IP] = { .type = NLA_NESTED }, [IPSET_ATTR_IP_TO] = { .type = NLA_NESTED }, [IPSET_ATTR_PORT] = { .type = NLA_U16 }, [IPSET_ATTR_PORT_TO] = { .type = NLA_U16 }, [IPSET_ATTR_CIDR] = { .type = NLA_U8 }, [IPSET_ATTR_PROTO] = { .type = NLA_U8 }, [IPSET_ATTR_TIMEOUT] = { .type = NLA_U32 }, [IPSET_ATTR_LINENO] = { .type = NLA_U32 }, [IPSET_ATTR_BYTES] = { .type = NLA_U64 }, [IPSET_ATTR_PACKETS] = { .type = NLA_U64 }, [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING, .len = IPSET_MAX_COMMENT_SIZE }, [IPSET_ATTR_SKBMARK] = { .type = NLA_U64 }, [IPSET_ATTR_SKBPRIO] = { .type = NLA_U32 }, [IPSET_ATTR_SKBQUEUE] = { .type = NLA_U16 }, }, .me = THIS_MODULE, }; static int __init hash_ipport_init(void) { return ip_set_type_register(&hash_ipport_type); } static void __exit hash_ipport_fini(void) { rcu_barrier(); ip_set_type_unregister(&hash_ipport_type); } module_init(hash_ipport_init); module_exit(hash_ipport_fini); |
| 24408 507 24346 24342 16373 696 694 696 20726 1199 1198 1199 4573 5518 19107 4770 4579 5522 19103 19026 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 | /* SPDX-License-Identifier: GPL-2.0-only */ /* * AppArmor security module * * This file contains AppArmor contexts used to associate "labels" to objects. * * Copyright (C) 1998-2008 Novell/SUSE * Copyright 2009-2010 Canonical Ltd. */ #ifndef __AA_CONTEXT_H #define __AA_CONTEXT_H #include <linux/cred.h> #include <linux/slab.h> #include <linux/sched.h> #include "label.h" #include "policy_ns.h" #include "task.h" static inline struct aa_label *cred_label(const struct cred *cred) { struct aa_label **blob = cred->security + apparmor_blob_sizes.lbs_cred; AA_BUG(!blob); return *blob; } static inline void set_cred_label(const struct cred *cred, struct aa_label *label) { struct aa_label **blob = cred->security + apparmor_blob_sizes.lbs_cred; AA_BUG(!blob); *blob = label; } /** * aa_cred_raw_label - obtain cred's label * @cred: cred to obtain label from (NOT NULL) * * Returns: confining label * * does NOT increment reference count */ static inline struct aa_label *aa_cred_raw_label(const struct cred *cred) { struct aa_label *label = cred_label(cred); AA_BUG(!label); return label; } /** * aa_get_newest_cred_label - obtain the newest label on a cred * @cred: cred to obtain label from (NOT NULL) * * Returns: newest version of confining label */ static inline struct aa_label *aa_get_newest_cred_label(const struct cred *cred) { return aa_get_newest_label(aa_cred_raw_label(cred)); } static inline struct aa_label *aa_get_newest_cred_label_condref(const struct cred *cred, bool *needput) { struct aa_label *l = aa_cred_raw_label(cred); if (unlikely(label_is_stale(l))) { *needput = true; return aa_get_newest_label(l); } *needput = false; return l; } static inline void aa_put_label_condref(struct aa_label *l, bool needput) { if (unlikely(needput)) aa_put_label(l); } /** * aa_current_raw_label - find the current tasks confining label * * Returns: up to date confining label or the ns unconfined label (NOT NULL) * * This fn will not update the tasks cred to the most up to date version * of the label so it is safe to call when inside of locks. */ static inline struct aa_label *aa_current_raw_label(void) { return aa_cred_raw_label(current_cred()); } /** * aa_get_current_label - get the newest version of the current tasks label * * Returns: newest version of confining label (NOT NULL) * * This fn will not update the tasks cred, so it is safe inside of locks * * The returned reference must be put with aa_put_label() */ static inline struct aa_label *aa_get_current_label(void) { struct aa_label *l = aa_current_raw_label(); if (label_is_stale(l)) return aa_get_newest_label(l); return aa_get_label(l); } /** * __end_current_label_crit_section - end crit section begun with __begin_... * @label: label obtained from __begin_current_label_crit_section * @needput: output: bool set by __begin_current_label_crit_section * * Returns: label to use for this crit section */ static inline void __end_current_label_crit_section(struct aa_label *label, bool needput) { if (unlikely(needput)) aa_put_label(label); } /** * end_current_label_crit_section - put a reference found with begin_current_label.. * @label: label reference to put * * Should only be used with a reference obtained with * begin_current_label_crit_section and never used in situations where the * task cred may be updated */ static inline void end_current_label_crit_section(struct aa_label *label) { if (label != aa_current_raw_label()) aa_put_label(label); } /** * __begin_current_label_crit_section - current's confining label * @needput: store whether the label needs to be put when ending crit section * * Returns: up to date confining label or the ns unconfined label (NOT NULL) * * safe to call inside locks * * The returned reference must be put with __end_current_label_crit_section() * This must NOT be used if the task cred could be updated within the * critical section between __begin_current_label_crit_section() .. * __end_current_label_crit_section() */ static inline struct aa_label *__begin_current_label_crit_section(bool *needput) { struct aa_label *label = aa_current_raw_label(); if (label_is_stale(label)) { *needput = true; return aa_get_newest_label(label); } *needput = false; return label; } /** * begin_current_label_crit_section - current's confining label and update it * * Returns: up to date confining label or the ns unconfined label (NOT NULL) * * Not safe to call inside locks * * The returned reference must be put with end_current_label_crit_section() * This must NOT be used if the task cred could be updated within the * critical section between begin_current_label_crit_section() .. * end_current_label_crit_section() */ static inline struct aa_label *begin_current_label_crit_section(void) { struct aa_label *label = aa_current_raw_label(); might_sleep(); if (label_is_stale(label)) { label = aa_get_newest_label(label); if (aa_replace_current_label(label) == 0) /* task cred will keep the reference */ aa_put_label(label); } return label; } static inline struct aa_ns *aa_get_current_ns(void) { struct aa_label *label; struct aa_ns *ns; bool needput; label = __begin_current_label_crit_section(&needput); ns = aa_get_ns(labels_ns(label)); __end_current_label_crit_section(label, needput); return ns; } #endif /* __AA_CONTEXT_H */ |
| 115 17 438 118 1 15 12 1 1 21 17 8 1 1 89 10 17 16 46 3 2 1 6 2 8 2 3 3 34 34 8 8 16 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 | /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _INET_ECN_H_ #define _INET_ECN_H_ #include <linux/ip.h> #include <linux/skbuff.h> #include <linux/if_vlan.h> #include <net/inet_sock.h> #include <net/dsfield.h> #include <net/checksum.h> enum { INET_ECN_NOT_ECT = 0, INET_ECN_ECT_1 = 1, INET_ECN_ECT_0 = 2, INET_ECN_CE = 3, INET_ECN_MASK = 3, }; extern int sysctl_tunnel_ecn_log; static inline int INET_ECN_is_ce(__u8 dsfield) { return (dsfield & INET_ECN_MASK) == INET_ECN_CE; } static inline int INET_ECN_is_not_ect(__u8 dsfield) { return (dsfield & INET_ECN_MASK) == INET_ECN_NOT_ECT; } static inline int INET_ECN_is_capable(__u8 dsfield) { return dsfield & INET_ECN_ECT_0; } /* * RFC 3168 9.1.1 * The full-functionality option for ECN encapsulation is to copy the * ECN codepoint of the inside header to the outside header on * encapsulation if the inside header is not-ECT or ECT, and to set the * ECN codepoint of the outside header to ECT(0) if the ECN codepoint of * the inside header is CE. */ static inline __u8 INET_ECN_encapsulate(__u8 outer, __u8 inner) { outer &= ~INET_ECN_MASK; outer |= !INET_ECN_is_ce(inner) ? (inner & INET_ECN_MASK) : INET_ECN_ECT_0; return outer; } static inline void INET_ECN_xmit(struct sock *sk) { inet_sk(sk)->tos |= INET_ECN_ECT_0; if (inet6_sk(sk) != NULL) inet6_sk(sk)->tclass |= INET_ECN_ECT_0; } static inline void INET_ECN_dontxmit(struct sock *sk) { inet_sk(sk)->tos &= ~INET_ECN_MASK; if (inet6_sk(sk) != NULL) inet6_sk(sk)->tclass &= ~INET_ECN_MASK; } #define IP6_ECN_flow_init(label) do { \ (label) &= ~htonl(INET_ECN_MASK << 20); \ } while (0) #define IP6_ECN_flow_xmit(sk, label) do { \ if (INET_ECN_is_capable(inet6_sk(sk)->tclass)) \ (label) |= htonl(INET_ECN_ECT_0 << 20); \ } while (0) static inline int IP_ECN_set_ce(struct iphdr *iph) { u32 ecn = (iph->tos + 1) & INET_ECN_MASK; __be16 check_add; /* * After the last operation we have (in binary): * INET_ECN_NOT_ECT => 01 * INET_ECN_ECT_1 => 10 * INET_ECN_ECT_0 => 11 * INET_ECN_CE => 00 */ if (!(ecn & 2)) return !ecn; /* * The following gives us: * INET_ECN_ECT_1 => check += htons(0xFFFD) * INET_ECN_ECT_0 => check += htons(0xFFFE) */ check_add = (__force __be16)((__force u16)htons(0xFFFB) + (__force u16)htons(ecn)); iph->check = csum16_add(iph->check, check_add); iph->tos |= INET_ECN_CE; return 1; } static inline int IP_ECN_set_ect1(struct iphdr *iph) { if ((iph->tos & INET_ECN_MASK) != INET_ECN_ECT_0) return 0; iph->check = csum16_add(iph->check, htons(0x1)); iph->tos ^= INET_ECN_MASK; return 1; } static inline void IP_ECN_clear(struct iphdr *iph) { iph->tos &= ~INET_ECN_MASK; } static inline void ipv4_copy_dscp(unsigned int dscp, struct iphdr *inner) { dscp &= ~INET_ECN_MASK; ipv4_change_dsfield(inner, INET_ECN_MASK, dscp); } struct ipv6hdr; /* Note: * IP_ECN_set_ce() has to tweak IPV4 checksum when setting CE, * meaning both changes have no effect on skb->csum if/when CHECKSUM_COMPLETE * In IPv6 case, no checksum compensates the change in IPv6 header, * so we have to update skb->csum. */ static inline int IP6_ECN_set_ce(struct sk_buff *skb, struct ipv6hdr *iph) { __be32 from, to; if (INET_ECN_is_not_ect(ipv6_get_dsfield(iph))) return 0; from = *(__be32 *)iph; to = from | htonl(INET_ECN_CE << 20); *(__be32 *)iph = to; if (skb->ip_summed == CHECKSUM_COMPLETE) skb->csum = csum_add(csum_sub(skb->csum, (__force __wsum)from), (__force __wsum)to); return 1; } static inline int IP6_ECN_set_ect1(struct sk_buff *skb, struct ipv6hdr *iph) { __be32 from, to; if ((ipv6_get_dsfield(iph) & INET_ECN_MASK) != INET_ECN_ECT_0) return 0; from = *(__be32 *)iph; to = from ^ htonl(INET_ECN_MASK << 20); *(__be32 *)iph = to; if (skb->ip_summed == CHECKSUM_COMPLETE) skb->csum = csum_add(csum_sub(skb->csum, (__force __wsum)from), (__force __wsum)to); return 1; } static inline void ipv6_copy_dscp(unsigned int dscp, struct ipv6hdr *inner) { dscp &= ~INET_ECN_MASK; ipv6_change_dsfield(inner, INET_ECN_MASK, dscp); } static inline int INET_ECN_set_ce(struct sk_buff *skb) { switch (skb_protocol(skb, true)) { case cpu_to_be16(ETH_P_IP): if (skb_network_header(skb) + sizeof(struct iphdr) <= skb_tail_pointer(skb)) return IP_ECN_set_ce(ip_hdr(skb)); break; case cpu_to_be16(ETH_P_IPV6): if (skb_network_header(skb) + sizeof(struct ipv6hdr) <= skb_tail_pointer(skb)) return IP6_ECN_set_ce(skb, ipv6_hdr(skb)); break; } return 0; } static inline int skb_get_dsfield(struct sk_buff *skb) { switch (skb_protocol(skb, true)) { case cpu_to_be16(ETH_P_IP): if (!pskb_network_may_pull(skb, sizeof(struct iphdr))) break; return ipv4_get_dsfield(ip_hdr(skb)); case cpu_to_be16(ETH_P_IPV6): if (!pskb_network_may_pull(skb, sizeof(struct ipv6hdr))) break; return ipv6_get_dsfield(ipv6_hdr(skb)); } return -1; } static inline int INET_ECN_set_ect1(struct sk_buff *skb) { switch (skb_protocol(skb, true)) { case cpu_to_be16(ETH_P_IP): if (skb_network_header(skb) + sizeof(struct iphdr) <= skb_tail_pointer(skb)) return IP_ECN_set_ect1(ip_hdr(skb)); break; case cpu_to_be16(ETH_P_IPV6): if (skb_network_header(skb) + sizeof(struct ipv6hdr) <= skb_tail_pointer(skb)) return IP6_ECN_set_ect1(skb, ipv6_hdr(skb)); break; } return 0; } /* * RFC 6040 4.2 * To decapsulate the inner header at the tunnel egress, a compliant * tunnel egress MUST set the outgoing ECN field to the codepoint at the * intersection of the appropriate arriving inner header (row) and outer * header (column) in Figure 4 * * +---------+------------------------------------------------+ * |Arriving | Arriving Outer Header | * | Inner +---------+------------+------------+------------+ * | Header | Not-ECT | ECT(0) | ECT(1) | CE | * +---------+---------+------------+------------+------------+ * | Not-ECT | Not-ECT |Not-ECT(!!!)|Not-ECT(!!!)| <drop>(!!!)| * | ECT(0) | ECT(0) | ECT(0) | ECT(1) | CE | * | ECT(1) | ECT(1) | ECT(1) (!) | ECT(1) | CE | * | CE | CE | CE | CE(!!!)| CE | * +---------+---------+------------+------------+------------+ * * Figure 4: New IP in IP Decapsulation Behaviour * * returns 0 on success * 1 if something is broken and should be logged (!!! above) * 2 if packet should be dropped */ static inline int __INET_ECN_decapsulate(__u8 outer, __u8 inner, bool *set_ce) { if (INET_ECN_is_not_ect(inner)) { switch (outer & INET_ECN_MASK) { case INET_ECN_NOT_ECT: return 0; case INET_ECN_ECT_0: case INET_ECN_ECT_1: return 1; case INET_ECN_CE: return 2; } } *set_ce = INET_ECN_is_ce(outer); return 0; } static inline int INET_ECN_decapsulate(struct sk_buff *skb, __u8 outer, __u8 inner) { bool set_ce = false; int rc; rc = __INET_ECN_decapsulate(outer, inner, &set_ce); if (!rc) { if (set_ce) INET_ECN_set_ce(skb); else if ((outer & INET_ECN_MASK) == INET_ECN_ECT_1) INET_ECN_set_ect1(skb); } return rc; } static inline int IP_ECN_decapsulate(const struct iphdr *oiph, struct sk_buff *skb) { __u8 inner; switch (skb_protocol(skb, true)) { case htons(ETH_P_IP): inner = ip_hdr(skb)->tos; break; case htons(ETH_P_IPV6): inner = ipv6_get_dsfield(ipv6_hdr(skb)); break; default: return 0; } return INET_ECN_decapsulate(skb, oiph->tos, inner); } static inline int IP6_ECN_decapsulate(const struct ipv6hdr *oipv6h, struct sk_buff *skb) { __u8 inner; switch (skb_protocol(skb, true)) { case htons(ETH_P_IP): inner = ip_hdr(skb)->tos; break; case htons(ETH_P_IPV6): inner = ipv6_get_dsfield(ipv6_hdr(skb)); break; default: return 0; } return INET_ECN_decapsulate(skb, ipv6_get_dsfield(oipv6h), inner); } #endif |
| 6 10 3 3 1 1 1 2 1 1 1 2 3 2 20 3 1 2 1 1 4 4 4 1 3 2 1 6 5 5 6 6 6 1 5 1 3 5 5 3 2 3 2 1 1 2 3 1 2 4 4 1 2 1 1 4 4 4 8 8 5 3 2 2 2 5 2 2 1 4 2 2 3 3 3 1 1 2 3 3 2 1 2 1 2 2 2 10 9 2 8 10 4 4 6 5 6 4 4 2 3 6 4 4 4 4 1 1 4 1 1 3 3 3 6 4 2 2 1 4 4 2 1 1 2 2 2 2 1 1 1 3 3 3 1 2 2 2 1 2 2 2 2 2 1 1 2 2 1 1 1 1 2 1 1 1 1 3 1 2 2 2 2 2 2 2 2 2 2 2 3 3 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 7 2 7 5 2 5 17 17 17 1 6 9 16 14 2 1 2 14 9 5 13 10 10 7 7 3 3 1 2 2 2 2 1 1 1 7 15 4 23 6 6 20 21 21 2 20 21 5 20 9 9 1 3 9 1 11 2 1 1 1 1 1 12 1 11 2 1 11 11 11 1 11 2 2 2 3 3 3 1 1 1 1 1 1 5 1 1 3 3 3 1 2 1 1 1 1 1 1 1 1 3 3 1 2 1 1 3 3 1 2 2 2 2 2 2 2 1 1 1 69 69 59 47 10 2 1 37 31 68 158 3 95 74 79 80 159 2 143 105 3 146 13 158 158 160 159 3 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 2447 2448 2449 2450 2451 2452 2453 2454 2455 2456 2457 2458 2459 2460 2461 2462 2463 2464 2465 2466 2467 2468 2469 2470 2471 2472 2473 2474 2475 2476 2477 2478 2479 2480 2481 2482 2483 2484 2485 2486 2487 2488 2489 2490 2491 2492 2493 2494 2495 2496 2497 2498 2499 2500 2501 2502 2503 2504 2505 2506 2507 2508 2509 2510 2511 2512 2513 2514 2515 2516 2517 2518 2519 2520 2521 2522 2523 2524 2525 2526 2527 2528 2529 2530 2531 2532 2533 2534 2535 2536 2537 2538 2539 2540 2541 2542 2543 2544 2545 2546 2547 2548 2549 2550 2551 2552 2553 2554 2555 2556 2557 2558 2559 2560 2561 2562 2563 2564 2565 2566 2567 2568 2569 2570 2571 2572 2573 2574 2575 2576 2577 2578 2579 2580 2581 2582 2583 2584 2585 2586 2587 2588 2589 2590 2591 2592 2593 2594 2595 2596 2597 2598 2599 2600 2601 2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658 2659 2660 2661 2662 2663 2664 2665 2666 2667 2668 2669 2670 2671 2672 2673 2674 2675 2676 2677 2678 2679 2680 2681 2682 2683 2684 2685 2686 2687 2688 2689 2690 2691 2692 2693 2694 2695 2696 2697 2698 2699 2700 2701 2702 2703 2704 2705 2706 2707 2708 2709 2710 2711 2712 2713 2714 2715 2716 2717 2718 2719 2720 2721 2722 2723 2724 2725 2726 2727 2728 2729 2730 2731 2732 2733 2734 2735 2736 2737 2738 2739 2740 2741 2742 2743 2744 2745 2746 2747 2748 2749 2750 2751 2752 2753 2754 2755 2756 2757 2758 2759 2760 2761 2762 2763 2764 2765 2766 2767 2768 2769 2770 2771 2772 2773 2774 2775 2776 2777 2778 2779 2780 2781 2782 2783 2784 2785 2786 2787 2788 2789 2790 2791 2792 2793 2794 2795 2796 2797 2798 2799 2800 2801 2802 2803 2804 2805 2806 2807 2808 2809 2810 2811 2812 2813 2814 2815 2816 2817 2818 2819 2820 2821 2822 2823 2824 2825 2826 2827 2828 2829 2830 2831 2832 2833 2834 2835 2836 2837 2838 2839 2840 2841 2842 2843 2844 2845 2846 2847 2848 2849 2850 2851 2852 2853 2854 2855 2856 2857 2858 2859 2860 2861 2862 2863 2864 2865 2866 2867 2868 2869 2870 2871 2872 2873 2874 2875 2876 2877 2878 2879 2880 2881 2882 2883 2884 2885 2886 2887 2888 2889 2890 2891 2892 2893 2894 2895 2896 2897 2898 2899 2900 2901 2902 2903 2904 2905 2906 2907 2908 2909 2910 2911 2912 2913 2914 2915 2916 2917 2918 2919 2920 2921 2922 2923 2924 2925 2926 2927 2928 2929 2930 2931 2932 2933 2934 2935 2936 2937 2938 2939 2940 2941 2942 2943 2944 2945 2946 2947 2948 2949 2950 2951 2952 2953 2954 2955 2956 2957 2958 2959 2960 2961 2962 2963 2964 2965 2966 2967 2968 2969 2970 2971 2972 2973 2974 2975 2976 2977 2978 2979 2980 2981 2982 2983 2984 2985 2986 2987 2988 2989 2990 2991 2992 2993 2994 2995 2996 2997 2998 2999 3000 3001 3002 3003 3004 3005 3006 3007 3008 3009 3010 3011 3012 3013 3014 3015 3016 3017 3018 3019 3020 3021 3022 3023 3024 3025 3026 3027 3028 3029 3030 3031 3032 3033 3034 3035 3036 3037 3038 3039 3040 3041 3042 3043 3044 3045 3046 3047 3048 3049 3050 3051 3052 3053 3054 3055 3056 3057 3058 3059 3060 3061 3062 3063 3064 3065 3066 3067 3068 3069 3070 3071 3072 3073 3074 3075 3076 3077 3078 3079 3080 3081 3082 3083 3084 3085 3086 3087 3088 3089 3090 3091 3092 3093 3094 3095 3096 3097 3098 3099 3100 3101 3102 3103 3104 3105 3106 3107 3108 3109 3110 3111 3112 3113 3114 3115 3116 3117 3118 3119 3120 3121 3122 3123 3124 3125 3126 3127 3128 3129 3130 3131 3132 3133 3134 3135 3136 3137 3138 3139 3140 3141 3142 3143 3144 3145 3146 3147 3148 3149 3150 3151 3152 3153 3154 3155 3156 3157 3158 3159 3160 3161 3162 3163 3164 3165 3166 3167 3168 3169 3170 3171 3172 3173 3174 3175 3176 3177 3178 3179 3180 3181 3182 3183 3184 3185 3186 3187 3188 3189 3190 3191 3192 3193 3194 3195 3196 3197 3198 3199 3200 3201 3202 3203 3204 3205 3206 3207 3208 3209 3210 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220 3221 3222 3223 3224 3225 3226 3227 3228 3229 3230 3231 3232 3233 3234 3235 3236 3237 3238 3239 3240 3241 3242 3243 3244 3245 3246 3247 3248 3249 3250 3251 3252 3253 3254 3255 3256 3257 3258 3259 3260 3261 3262 3263 3264 3265 3266 3267 3268 3269 3270 3271 3272 3273 3274 3275 3276 3277 3278 3279 3280 3281 3282 3283 3284 3285 3286 3287 3288 3289 3290 3291 3292 3293 3294 3295 3296 3297 3298 3299 3300 3301 3302 3303 3304 3305 3306 3307 3308 3309 3310 3311 3312 3313 3314 3315 3316 3317 3318 3319 3320 3321 3322 3323 3324 3325 3326 3327 3328 3329 3330 3331 3332 3333 3334 3335 3336 3337 3338 3339 3340 3341 3342 3343 3344 3345 3346 3347 3348 3349 3350 3351 3352 3353 3354 3355 3356 3357 3358 3359 3360 3361 3362 3363 3364 3365 3366 3367 3368 3369 3370 3371 3372 3373 3374 3375 3376 3377 3378 3379 3380 3381 3382 3383 3384 3385 3386 3387 3388 3389 3390 3391 3392 3393 3394 3395 3396 3397 3398 3399 3400 3401 3402 3403 3404 3405 3406 3407 3408 3409 3410 3411 3412 3413 3414 3415 3416 3417 3418 3419 3420 3421 3422 3423 3424 3425 3426 3427 3428 3429 3430 3431 3432 3433 3434 3435 3436 3437 3438 3439 3440 3441 3442 3443 3444 3445 3446 3447 3448 3449 3450 3451 3452 3453 3454 3455 3456 3457 3458 3459 3460 3461 3462 3463 3464 3465 3466 3467 3468 3469 3470 3471 3472 3473 3474 3475 3476 3477 3478 3479 3480 3481 3482 3483 3484 3485 3486 3487 3488 3489 3490 3491 3492 3493 3494 3495 3496 3497 3498 3499 3500 3501 3502 3503 3504 3505 3506 3507 3508 3509 3510 3511 3512 3513 3514 3515 3516 3517 3518 3519 3520 3521 3522 3523 3524 3525 3526 3527 3528 3529 3530 3531 3532 3533 3534 3535 3536 3537 3538 3539 3540 3541 3542 3543 3544 3545 3546 3547 3548 3549 3550 3551 3552 3553 3554 3555 3556 3557 3558 3559 3560 3561 3562 3563 3564 3565 3566 3567 3568 3569 3570 3571 3572 3573 3574 3575 3576 3577 3578 3579 3580 3581 3582 3583 3584 3585 3586 3587 3588 3589 3590 3591 3592 3593 3594 3595 3596 3597 3598 3599 3600 3601 3602 3603 3604 3605 3606 3607 3608 3609 3610 3611 3612 3613 3614 3615 3616 3617 3618 3619 3620 3621 3622 3623 3624 3625 3626 3627 3628 3629 3630 3631 3632 3633 3634 3635 3636 3637 3638 3639 3640 3641 3642 3643 3644 3645 3646 3647 3648 3649 3650 3651 3652 3653 3654 3655 3656 3657 3658 3659 3660 3661 3662 3663 3664 3665 3666 3667 3668 3669 3670 3671 3672 3673 3674 3675 3676 3677 3678 3679 3680 3681 3682 3683 3684 3685 3686 3687 3688 3689 3690 3691 3692 3693 3694 3695 3696 3697 3698 3699 3700 3701 3702 3703 3704 3705 3706 3707 3708 3709 3710 3711 3712 3713 3714 3715 3716 3717 3718 3719 3720 3721 3722 3723 3724 3725 3726 3727 3728 3729 3730 3731 3732 3733 3734 3735 3736 3737 3738 3739 3740 3741 3742 3743 3744 3745 3746 3747 3748 3749 3750 3751 3752 3753 3754 3755 3756 3757 3758 3759 3760 3761 3762 3763 3764 3765 3766 3767 3768 3769 3770 3771 3772 3773 3774 3775 3776 3777 3778 3779 3780 3781 3782 3783 3784 3785 3786 3787 3788 3789 3790 3791 3792 3793 3794 3795 3796 3797 3798 3799 3800 3801 3802 3803 3804 3805 3806 3807 3808 3809 3810 3811 3812 3813 3814 3815 3816 3817 3818 3819 3820 3821 3822 3823 3824 3825 3826 3827 3828 3829 3830 3831 3832 3833 3834 3835 3836 3837 3838 3839 3840 3841 3842 3843 3844 3845 3846 3847 3848 3849 3850 3851 3852 3853 3854 3855 3856 3857 3858 3859 3860 3861 3862 3863 3864 3865 3866 3867 3868 3869 3870 3871 3872 3873 3874 3875 3876 3877 3878 3879 3880 3881 3882 3883 3884 3885 3886 3887 3888 3889 3890 3891 3892 3893 3894 3895 3896 3897 3898 3899 3900 3901 3902 3903 3904 3905 3906 3907 3908 3909 3910 3911 3912 3913 3914 3915 3916 3917 3918 3919 3920 3921 3922 3923 3924 3925 3926 3927 3928 3929 3930 3931 3932 3933 3934 3935 3936 3937 3938 3939 3940 3941 3942 3943 3944 3945 3946 3947 3948 3949 3950 3951 3952 3953 3954 3955 3956 3957 3958 3959 3960 3961 3962 3963 3964 3965 3966 3967 3968 3969 3970 3971 3972 3973 3974 3975 3976 3977 3978 3979 3980 3981 3982 3983 3984 3985 3986 3987 3988 3989 3990 3991 3992 3993 3994 3995 3996 3997 3998 3999 4000 4001 4002 4003 4004 4005 4006 4007 4008 4009 4010 4011 4012 4013 4014 4015 4016 4017 4018 4019 4020 4021 4022 4023 4024 4025 4026 4027 4028 4029 4030 4031 4032 4033 4034 4035 4036 4037 4038 4039 4040 4041 4042 4043 4044 4045 4046 4047 4048 4049 4050 4051 4052 4053 4054 4055 4056 4057 4058 4059 4060 4061 4062 4063 4064 4065 4066 4067 4068 4069 4070 4071 4072 4073 4074 4075 4076 4077 4078 4079 4080 4081 4082 4083 4084 4085 4086 4087 4088 4089 4090 4091 4092 4093 4094 4095 4096 4097 4098 4099 4100 4101 4102 4103 4104 4105 4106 4107 4108 4109 4110 4111 4112 4113 4114 4115 4116 4117 4118 4119 4120 4121 4122 4123 4124 4125 4126 4127 4128 4129 4130 4131 4132 4133 4134 4135 4136 4137 4138 4139 4140 4141 4142 4143 4144 4145 4146 4147 4148 4149 4150 4151 4152 4153 4154 4155 4156 4157 4158 4159 4160 4161 4162 4163 4164 4165 4166 4167 4168 4169 4170 4171 4172 4173 4174 4175 4176 4177 4178 4179 4180 4181 4182 4183 4184 4185 4186 4187 4188 4189 4190 4191 4192 4193 4194 4195 4196 4197 4198 4199 4200 4201 4202 4203 4204 4205 4206 4207 4208 4209 4210 4211 4212 4213 4214 4215 4216 4217 4218 4219 4220 4221 4222 4223 4224 4225 4226 4227 4228 4229 4230 4231 4232 4233 4234 4235 4236 4237 4238 4239 4240 4241 4242 4243 4244 4245 4246 4247 4248 4249 4250 4251 4252 4253 4254 4255 4256 4257 4258 4259 4260 4261 4262 4263 4264 4265 4266 4267 4268 4269 4270 4271 4272 4273 4274 4275 4276 4277 4278 4279 4280 4281 4282 4283 4284 4285 4286 4287 4288 4289 4290 4291 4292 4293 4294 4295 4296 4297 4298 4299 4300 4301 4302 4303 4304 4305 4306 4307 4308 4309 4310 4311 4312 4313 4314 4315 4316 4317 4318 4319 4320 4321 4322 4323 4324 4325 4326 4327 4328 4329 4330 4331 4332 4333 4334 4335 4336 4337 4338 4339 4340 4341 4342 4343 4344 4345 4346 4347 4348 4349 4350 4351 4352 4353 4354 4355 4356 4357 4358 4359 4360 4361 4362 4363 4364 4365 4366 4367 4368 4369 4370 4371 4372 4373 4374 4375 4376 4377 4378 4379 4380 4381 4382 4383 4384 4385 4386 4387 4388 4389 4390 4391 4392 4393 4394 4395 4396 4397 4398 4399 4400 4401 4402 4403 4404 4405 4406 4407 4408 4409 4410 4411 4412 4413 4414 4415 4416 4417 4418 4419 4420 4421 4422 4423 4424 4425 4426 4427 4428 4429 4430 4431 4432 4433 4434 4435 4436 4437 4438 4439 4440 4441 4442 4443 4444 4445 4446 4447 4448 4449 4450 4451 4452 4453 4454 4455 4456 4457 4458 4459 4460 4461 4462 4463 4464 4465 4466 4467 4468 4469 4470 4471 4472 4473 4474 4475 4476 4477 4478 4479 4480 4481 4482 4483 4484 4485 4486 4487 4488 4489 4490 4491 4492 4493 4494 4495 4496 4497 4498 4499 4500 4501 4502 4503 4504 4505 4506 4507 4508 4509 4510 4511 4512 4513 4514 4515 4516 4517 4518 4519 4520 4521 4522 4523 4524 4525 4526 4527 4528 4529 4530 4531 4532 4533 4534 4535 4536 4537 4538 4539 4540 4541 4542 4543 4544 4545 4546 4547 4548 4549 4550 4551 4552 4553 4554 4555 4556 4557 4558 4559 4560 4561 4562 4563 4564 4565 4566 4567 4568 4569 4570 4571 4572 4573 4574 4575 4576 4577 4578 4579 4580 4581 4582 4583 4584 4585 4586 4587 4588 4589 4590 4591 4592 4593 4594 4595 4596 4597 4598 4599 4600 4601 4602 4603 4604 4605 4606 4607 4608 4609 4610 4611 4612 4613 4614 4615 4616 4617 4618 4619 4620 4621 4622 4623 4624 4625 4626 4627 4628 4629 4630 4631 4632 4633 4634 4635 4636 4637 4638 4639 4640 4641 4642 4643 4644 4645 4646 4647 4648 4649 4650 4651 4652 4653 4654 4655 4656 4657 4658 4659 4660 4661 4662 4663 4664 4665 4666 4667 4668 4669 4670 4671 4672 4673 4674 4675 4676 4677 4678 4679 4680 4681 4682 4683 4684 4685 4686 4687 4688 4689 4690 4691 4692 4693 4694 4695 4696 4697 4698 4699 4700 4701 4702 4703 4704 4705 4706 4707 4708 4709 4710 4711 4712 4713 4714 4715 4716 4717 4718 4719 4720 4721 4722 4723 4724 4725 4726 4727 4728 4729 4730 4731 4732 4733 4734 4735 4736 4737 4738 4739 4740 4741 4742 4743 4744 4745 4746 4747 4748 4749 4750 4751 4752 4753 4754 4755 4756 4757 4758 4759 4760 4761 4762 4763 4764 4765 4766 4767 4768 4769 4770 4771 4772 4773 4774 4775 4776 4777 4778 4779 4780 4781 4782 4783 4784 4785 4786 4787 4788 4789 4790 4791 4792 4793 4794 4795 4796 4797 4798 4799 4800 4801 4802 4803 4804 4805 4806 4807 4808 4809 4810 4811 4812 4813 4814 4815 4816 4817 4818 4819 4820 4821 4822 4823 4824 4825 4826 4827 4828 4829 4830 4831 4832 4833 4834 4835 4836 4837 4838 4839 4840 4841 4842 4843 4844 4845 4846 4847 4848 4849 4850 4851 4852 4853 4854 4855 4856 4857 4858 4859 4860 4861 4862 4863 4864 4865 4866 4867 4868 4869 4870 4871 4872 4873 4874 4875 4876 4877 4878 4879 4880 4881 4882 4883 4884 4885 4886 4887 4888 4889 4890 4891 4892 4893 4894 4895 4896 4897 4898 4899 4900 4901 4902 4903 4904 4905 4906 4907 4908 4909 4910 4911 4912 4913 4914 4915 4916 4917 4918 4919 4920 4921 4922 4923 4924 4925 4926 4927 4928 4929 4930 4931 4932 4933 4934 4935 4936 4937 4938 4939 4940 4941 4942 4943 4944 4945 4946 4947 4948 4949 4950 4951 4952 4953 4954 4955 4956 4957 4958 4959 4960 4961 4962 4963 4964 4965 4966 4967 4968 4969 4970 4971 4972 4973 4974 4975 4976 4977 4978 4979 4980 4981 4982 4983 4984 4985 4986 4987 4988 4989 4990 4991 4992 4993 4994 4995 4996 4997 4998 4999 5000 5001 5002 5003 5004 5005 5006 5007 5008 5009 5010 5011 5012 5013 5014 5015 5016 5017 5018 5019 5020 5021 5022 5023 5024 5025 5026 5027 5028 5029 5030 5031 5032 5033 5034 5035 5036 5037 5038 5039 5040 5041 5042 5043 5044 5045 5046 5047 5048 5049 5050 5051 5052 5053 5054 5055 5056 5057 5058 5059 5060 5061 5062 5063 5064 5065 5066 5067 5068 5069 5070 5071 5072 5073 5074 5075 5076 5077 5078 5079 5080 5081 5082 5083 5084 5085 5086 5087 5088 5089 5090 5091 5092 5093 5094 5095 5096 5097 5098 5099 5100 5101 5102 5103 5104 5105 5106 5107 5108 5109 5110 5111 5112 5113 5114 5115 5116 5117 5118 5119 5120 5121 5122 5123 5124 5125 5126 5127 5128 5129 5130 5131 5132 5133 5134 5135 5136 5137 5138 5139 5140 5141 5142 5143 5144 5145 5146 5147 5148 5149 5150 5151 5152 5153 5154 5155 5156 5157 5158 5159 5160 5161 5162 5163 5164 5165 5166 5167 5168 5169 5170 5171 5172 5173 5174 5175 5176 5177 5178 5179 5180 5181 5182 5183 5184 5185 5186 5187 5188 5189 5190 5191 5192 5193 5194 5195 5196 5197 5198 5199 5200 5201 5202 5203 5204 5205 5206 5207 5208 5209 5210 5211 5212 5213 5214 5215 5216 5217 5218 5219 5220 5221 5222 5223 5224 5225 5226 5227 5228 5229 5230 5231 5232 5233 5234 5235 5236 5237 5238 5239 5240 5241 5242 5243 5244 5245 5246 5247 5248 5249 5250 5251 5252 5253 5254 5255 5256 5257 5258 5259 5260 5261 5262 5263 5264 5265 5266 5267 5268 5269 5270 5271 5272 5273 5274 5275 5276 5277 5278 5279 5280 5281 5282 5283 5284 5285 5286 5287 5288 5289 5290 5291 5292 5293 5294 5295 5296 5297 5298 5299 5300 5301 5302 5303 5304 5305 5306 5307 5308 5309 5310 5311 5312 5313 5314 5315 5316 5317 5318 5319 5320 5321 5322 5323 5324 5325 5326 5327 5328 5329 5330 5331 5332 5333 5334 5335 5336 5337 5338 5339 5340 5341 5342 5343 5344 5345 5346 5347 5348 5349 5350 5351 5352 5353 5354 5355 5356 5357 5358 5359 5360 5361 5362 5363 5364 5365 5366 5367 5368 5369 5370 5371 5372 5373 5374 5375 5376 5377 5378 5379 5380 5381 5382 5383 5384 5385 5386 5387 5388 5389 5390 5391 5392 5393 5394 5395 5396 5397 5398 5399 5400 5401 5402 5403 5404 5405 5406 5407 5408 5409 5410 5411 5412 5413 5414 5415 5416 5417 5418 5419 5420 5421 5422 5423 5424 5425 5426 5427 5428 5429 5430 5431 5432 5433 5434 5435 5436 5437 5438 5439 5440 5441 5442 5443 5444 5445 5446 5447 5448 5449 5450 5451 5452 5453 5454 5455 5456 5457 5458 5459 5460 5461 5462 5463 5464 5465 5466 5467 5468 5469 5470 5471 5472 5473 5474 5475 5476 5477 5478 5479 5480 5481 5482 5483 5484 5485 5486 5487 5488 5489 5490 5491 5492 5493 5494 5495 5496 5497 5498 5499 5500 5501 5502 5503 5504 5505 5506 5507 5508 5509 5510 5511 5512 5513 5514 5515 5516 5517 5518 5519 5520 5521 5522 5523 5524 5525 5526 5527 5528 5529 5530 5531 5532 5533 5534 5535 5536 5537 5538 5539 5540 5541 5542 5543 5544 5545 5546 5547 5548 5549 5550 5551 5552 5553 5554 5555 5556 5557 5558 5559 5560 5561 5562 5563 5564 5565 5566 5567 5568 5569 5570 5571 5572 5573 5574 5575 5576 5577 5578 5579 5580 5581 5582 5583 5584 5585 5586 5587 5588 5589 5590 5591 5592 5593 5594 5595 5596 5597 5598 5599 5600 5601 5602 5603 5604 5605 5606 5607 5608 5609 5610 5611 5612 5613 5614 5615 5616 5617 5618 5619 5620 5621 5622 5623 5624 5625 5626 5627 5628 5629 5630 5631 5632 5633 5634 5635 5636 5637 5638 5639 5640 5641 5642 5643 5644 5645 5646 5647 5648 5649 5650 5651 5652 5653 5654 5655 5656 5657 5658 5659 5660 5661 5662 5663 5664 5665 5666 5667 5668 5669 5670 5671 5672 5673 5674 5675 5676 5677 5678 5679 5680 5681 5682 5683 5684 5685 5686 5687 5688 5689 5690 5691 5692 5693 5694 5695 5696 5697 5698 5699 5700 5701 5702 5703 5704 5705 5706 5707 5708 5709 5710 5711 5712 5713 5714 5715 5716 5717 5718 5719 5720 5721 5722 5723 5724 5725 5726 5727 5728 5729 5730 5731 5732 5733 5734 5735 5736 5737 5738 5739 5740 5741 5742 5743 5744 5745 5746 5747 5748 5749 5750 5751 5752 5753 5754 5755 5756 5757 5758 5759 5760 5761 5762 5763 5764 5765 5766 5767 5768 5769 5770 5771 5772 5773 5774 5775 5776 5777 5778 5779 5780 5781 5782 5783 5784 5785 5786 5787 5788 5789 5790 5791 5792 5793 5794 5795 5796 5797 5798 5799 5800 5801 5802 5803 5804 5805 5806 5807 5808 5809 5810 5811 5812 5813 5814 5815 5816 5817 5818 5819 5820 5821 5822 5823 5824 5825 5826 5827 5828 5829 5830 5831 5832 5833 5834 5835 5836 5837 5838 5839 5840 5841 5842 5843 5844 5845 5846 5847 5848 5849 5850 5851 5852 5853 5854 5855 5856 5857 5858 5859 5860 5861 5862 5863 5864 5865 5866 5867 5868 5869 5870 5871 5872 5873 5874 5875 5876 5877 5878 5879 5880 5881 5882 5883 5884 5885 5886 5887 5888 5889 5890 5891 5892 5893 5894 5895 5896 5897 5898 5899 5900 5901 5902 5903 5904 5905 5906 5907 5908 5909 5910 5911 5912 5913 5914 5915 5916 5917 5918 5919 5920 5921 5922 5923 5924 5925 5926 5927 5928 5929 5930 5931 5932 5933 5934 5935 5936 5937 5938 5939 5940 5941 5942 5943 5944 5945 5946 5947 5948 5949 5950 5951 5952 5953 5954 5955 5956 5957 5958 5959 5960 5961 5962 5963 5964 5965 5966 5967 5968 5969 5970 5971 5972 5973 5974 5975 5976 5977 5978 5979 5980 5981 5982 5983 5984 5985 5986 5987 5988 5989 5990 5991 5992 5993 5994 5995 5996 5997 5998 5999 6000 6001 6002 6003 6004 6005 6006 6007 6008 6009 6010 6011 6012 6013 6014 6015 6016 6017 6018 6019 6020 6021 6022 6023 6024 6025 6026 6027 6028 6029 6030 6031 6032 6033 6034 6035 6036 6037 6038 6039 6040 6041 6042 6043 6044 6045 6046 6047 6048 6049 6050 6051 6052 6053 6054 6055 6056 6057 6058 6059 6060 6061 6062 6063 6064 6065 6066 6067 6068 6069 6070 6071 6072 6073 6074 6075 6076 6077 6078 6079 6080 6081 6082 6083 6084 6085 6086 6087 6088 6089 6090 6091 6092 6093 6094 6095 6096 6097 6098 6099 6100 6101 6102 6103 6104 6105 6106 6107 6108 6109 6110 6111 6112 6113 6114 6115 6116 6117 6118 6119 6120 6121 6122 6123 6124 6125 6126 6127 6128 6129 6130 6131 6132 6133 6134 6135 6136 6137 6138 6139 6140 6141 6142 6143 6144 6145 6146 6147 6148 6149 6150 6151 6152 6153 6154 6155 6156 6157 6158 6159 6160 6161 6162 6163 6164 6165 6166 6167 6168 6169 6170 6171 6172 6173 6174 6175 6176 6177 6178 6179 6180 6181 6182 6183 6184 6185 6186 6187 6188 6189 6190 6191 6192 6193 6194 6195 6196 6197 6198 6199 6200 6201 6202 6203 6204 6205 6206 6207 6208 6209 6210 6211 6212 6213 6214 6215 6216 6217 6218 6219 6220 6221 6222 6223 6224 6225 6226 6227 6228 6229 6230 6231 6232 6233 6234 6235 6236 6237 6238 6239 6240 6241 6242 6243 6244 6245 6246 6247 6248 6249 6250 6251 6252 6253 6254 6255 6256 6257 6258 6259 6260 6261 6262 6263 6264 6265 6266 6267 6268 6269 6270 6271 6272 6273 6274 6275 6276 6277 6278 6279 6280 6281 6282 6283 6284 6285 6286 6287 6288 6289 6290 6291 6292 6293 6294 6295 6296 6297 6298 6299 6300 6301 6302 6303 6304 6305 6306 6307 6308 6309 6310 6311 6312 6313 6314 6315 6316 6317 6318 6319 6320 6321 6322 6323 6324 6325 6326 6327 6328 6329 6330 6331 6332 6333 6334 6335 6336 6337 6338 6339 6340 6341 6342 6343 6344 6345 6346 6347 6348 6349 6350 6351 6352 6353 6354 6355 6356 6357 6358 6359 6360 6361 6362 6363 6364 6365 6366 6367 6368 6369 6370 6371 6372 6373 6374 6375 6376 6377 6378 6379 6380 6381 6382 6383 6384 6385 6386 6387 6388 6389 6390 6391 6392 6393 6394 6395 6396 6397 6398 6399 6400 6401 6402 6403 6404 6405 6406 6407 6408 6409 6410 6411 6412 6413 6414 6415 6416 6417 6418 6419 6420 6421 6422 6423 6424 6425 6426 6427 6428 6429 6430 6431 6432 6433 6434 6435 6436 6437 6438 6439 6440 6441 6442 6443 6444 6445 6446 6447 6448 6449 6450 6451 6452 6453 6454 6455 6456 6457 6458 6459 6460 6461 6462 6463 6464 6465 6466 6467 6468 6469 6470 6471 6472 6473 6474 6475 6476 6477 6478 6479 6480 6481 6482 6483 6484 6485 6486 6487 6488 6489 6490 6491 6492 6493 6494 6495 6496 6497 6498 6499 6500 6501 6502 6503 6504 6505 6506 6507 6508 6509 6510 6511 6512 6513 6514 6515 6516 6517 6518 6519 6520 6521 6522 6523 6524 6525 6526 6527 6528 6529 6530 6531 6532 6533 6534 6535 6536 6537 6538 6539 6540 6541 6542 6543 6544 6545 6546 6547 6548 6549 6550 6551 6552 6553 6554 6555 6556 6557 6558 6559 6560 6561 6562 6563 6564 6565 6566 6567 6568 6569 6570 6571 6572 6573 6574 6575 6576 6577 6578 6579 6580 6581 6582 6583 6584 6585 6586 6587 6588 6589 6590 6591 6592 6593 6594 6595 6596 6597 6598 6599 6600 6601 6602 6603 6604 6605 6606 6607 6608 6609 6610 6611 6612 6613 6614 6615 6616 6617 6618 6619 6620 6621 6622 6623 6624 6625 6626 6627 6628 6629 6630 6631 6632 6633 6634 6635 6636 6637 6638 6639 6640 6641 6642 6643 6644 6645 6646 6647 6648 6649 6650 6651 6652 6653 6654 6655 6656 6657 6658 6659 6660 6661 6662 6663 6664 6665 6666 6667 6668 6669 6670 6671 6672 6673 6674 6675 6676 6677 6678 6679 6680 6681 6682 6683 6684 6685 6686 6687 6688 6689 6690 6691 6692 6693 6694 6695 6696 6697 6698 6699 6700 6701 6702 6703 6704 6705 6706 6707 6708 6709 6710 6711 6712 6713 6714 6715 6716 6717 6718 6719 6720 6721 6722 6723 6724 6725 6726 6727 6728 6729 6730 6731 6732 6733 6734 6735 6736 6737 6738 6739 6740 6741 6742 6743 6744 6745 6746 6747 6748 6749 6750 6751 6752 6753 6754 6755 6756 6757 6758 6759 6760 6761 6762 6763 6764 6765 6766 6767 6768 6769 6770 6771 6772 6773 6774 6775 6776 6777 6778 6779 6780 6781 6782 6783 6784 6785 6786 6787 6788 6789 6790 6791 6792 6793 6794 6795 6796 6797 6798 6799 6800 6801 6802 6803 6804 6805 6806 6807 6808 6809 6810 6811 6812 6813 6814 6815 6816 6817 6818 6819 6820 6821 6822 6823 6824 6825 6826 6827 6828 6829 6830 6831 6832 6833 6834 6835 6836 6837 6838 6839 6840 6841 6842 6843 6844 6845 6846 6847 6848 6849 6850 6851 6852 6853 6854 6855 6856 6857 6858 6859 6860 6861 6862 6863 6864 6865 6866 6867 6868 6869 6870 6871 6872 6873 6874 6875 6876 6877 6878 6879 6880 6881 6882 6883 6884 6885 6886 6887 6888 6889 6890 6891 6892 6893 6894 6895 6896 6897 6898 6899 6900 6901 6902 6903 6904 6905 6906 6907 6908 6909 6910 6911 6912 6913 6914 6915 6916 6917 6918 6919 6920 6921 6922 6923 6924 6925 6926 6927 6928 6929 6930 6931 6932 6933 6934 6935 6936 6937 6938 6939 6940 6941 6942 6943 6944 6945 6946 6947 6948 6949 6950 6951 6952 6953 6954 6955 6956 6957 6958 6959 6960 6961 6962 6963 6964 6965 6966 6967 6968 6969 6970 6971 6972 6973 6974 6975 6976 6977 6978 6979 6980 6981 6982 6983 6984 6985 6986 6987 6988 6989 6990 6991 6992 6993 6994 6995 6996 6997 6998 6999 7000 7001 7002 7003 7004 7005 7006 7007 7008 7009 7010 7011 7012 7013 7014 7015 7016 7017 7018 7019 7020 7021 7022 7023 7024 7025 7026 7027 7028 7029 7030 7031 7032 7033 7034 7035 7036 7037 7038 7039 7040 7041 7042 7043 7044 7045 7046 7047 7048 7049 7050 7051 7052 7053 7054 7055 7056 7057 7058 7059 7060 7061 7062 7063 7064 7065 7066 7067 7068 7069 7070 7071 7072 7073 7074 7075 7076 7077 7078 7079 7080 7081 7082 7083 7084 7085 7086 7087 7088 7089 7090 7091 7092 7093 7094 7095 7096 7097 7098 7099 7100 7101 7102 7103 7104 7105 7106 7107 7108 7109 7110 7111 7112 7113 7114 7115 7116 7117 7118 7119 7120 7121 7122 7123 7124 7125 7126 7127 7128 7129 7130 7131 7132 7133 7134 7135 7136 7137 7138 7139 7140 7141 7142 7143 7144 7145 7146 7147 7148 7149 7150 7151 7152 7153 7154 7155 7156 7157 7158 7159 7160 7161 7162 7163 7164 7165 7166 7167 7168 7169 7170 7171 7172 7173 7174 7175 7176 7177 7178 7179 7180 7181 7182 7183 7184 7185 7186 7187 7188 7189 7190 7191 7192 7193 7194 7195 7196 7197 7198 7199 7200 7201 7202 7203 7204 7205 7206 7207 7208 7209 7210 7211 7212 7213 7214 7215 7216 7217 7218 7219 7220 7221 7222 7223 7224 7225 7226 7227 7228 7229 7230 7231 7232 7233 7234 7235 7236 7237 7238 7239 7240 7241 7242 7243 7244 7245 7246 7247 7248 7249 7250 7251 7252 7253 7254 7255 7256 7257 7258 7259 7260 7261 7262 7263 7264 7265 7266 7267 7268 7269 7270 7271 7272 7273 7274 7275 7276 7277 7278 7279 7280 7281 7282 7283 7284 7285 7286 7287 7288 7289 7290 7291 7292 7293 7294 7295 7296 7297 7298 7299 7300 7301 7302 7303 7304 7305 7306 7307 7308 7309 7310 7311 7312 7313 7314 7315 7316 7317 7318 7319 7320 7321 7322 7323 7324 7325 7326 7327 7328 7329 7330 7331 7332 7333 7334 7335 7336 7337 7338 7339 7340 7341 7342 7343 7344 7345 7346 7347 7348 7349 7350 7351 7352 7353 7354 7355 7356 7357 7358 7359 7360 7361 7362 7363 7364 7365 7366 7367 7368 7369 7370 7371 7372 7373 7374 7375 7376 7377 7378 7379 7380 7381 7382 7383 7384 7385 7386 7387 7388 7389 7390 7391 7392 7393 7394 7395 7396 7397 7398 7399 7400 7401 7402 7403 7404 7405 7406 7407 7408 7409 7410 7411 7412 7413 7414 7415 7416 7417 7418 7419 7420 7421 7422 7423 7424 7425 7426 7427 7428 7429 7430 7431 7432 7433 7434 7435 7436 7437 7438 7439 7440 7441 7442 7443 7444 7445 7446 7447 7448 7449 7450 7451 7452 7453 7454 7455 7456 7457 7458 7459 7460 7461 7462 7463 7464 7465 7466 7467 7468 7469 7470 7471 7472 7473 7474 7475 7476 7477 7478 7479 7480 7481 7482 7483 7484 7485 7486 7487 7488 7489 7490 7491 7492 7493 7494 7495 7496 7497 7498 7499 7500 7501 7502 7503 7504 7505 7506 7507 7508 7509 7510 7511 7512 7513 7514 7515 7516 7517 7518 7519 7520 7521 7522 7523 7524 7525 7526 7527 7528 7529 7530 7531 7532 7533 7534 7535 7536 7537 7538 7539 7540 7541 7542 7543 7544 7545 7546 7547 7548 7549 7550 7551 7552 7553 7554 7555 7556 7557 7558 7559 7560 7561 7562 7563 7564 7565 7566 7567 7568 7569 7570 7571 7572 7573 7574 7575 7576 7577 7578 7579 7580 7581 7582 7583 7584 7585 7586 7587 7588 7589 7590 7591 7592 7593 7594 7595 7596 7597 7598 7599 7600 7601 7602 7603 7604 7605 7606 7607 7608 7609 7610 7611 7612 7613 7614 7615 7616 7617 7618 7619 7620 7621 7622 7623 7624 7625 7626 7627 7628 7629 7630 7631 7632 7633 7634 7635 7636 7637 | /* BlueZ - Bluetooth protocol stack for Linux Copyright (c) 2000-2001, 2010, Code Aurora Forum. All rights reserved. Copyright 2023-2024 NXP Written 2000,2001 by Maxim Krasnyansky <maxk@qualcomm.com> This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License version 2 as published by the Free Software Foundation; THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. IN NO EVENT SHALL THE COPYRIGHT HOLDER(S) AND AUTHOR(S) BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. ALL LIABILITY, INCLUDING LIABILITY FOR INFRINGEMENT OF ANY PATENTS, COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS, RELATING TO USE OF THIS SOFTWARE IS DISCLAIMED. */ /* Bluetooth HCI event handling. */ #include <linux/unaligned.h> #include <linux/crypto.h> #include <crypto/algapi.h> #include <net/bluetooth/bluetooth.h> #include <net/bluetooth/hci_core.h> #include <net/bluetooth/mgmt.h> #include "hci_debugfs.h" #include "hci_codec.h" #include "smp.h" #include "msft.h" #include "eir.h" #define ZERO_KEY "\x00\x00\x00\x00\x00\x00\x00\x00" \ "\x00\x00\x00\x00\x00\x00\x00\x00" /* Handle HCI Event packets */ static void *hci_ev_skb_pull(struct hci_dev *hdev, struct sk_buff *skb, u8 ev, size_t len) { void *data; data = skb_pull_data(skb, len); if (!data) bt_dev_err(hdev, "Malformed Event: 0x%2.2x", ev); return data; } static void *hci_cc_skb_pull(struct hci_dev *hdev, struct sk_buff *skb, u16 op, size_t len) { void *data; data = skb_pull_data(skb, len); if (!data) bt_dev_err(hdev, "Malformed Command Complete: 0x%4.4x", op); return data; } static void *hci_le_ev_skb_pull(struct hci_dev *hdev, struct sk_buff *skb, u8 ev, size_t len) { void *data; data = skb_pull_data(skb, len); if (!data) bt_dev_err(hdev, "Malformed LE Event: 0x%2.2x", ev); return data; } static u8 hci_cc_inquiry_cancel(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_status *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); /* It is possible that we receive Inquiry Complete event right * before we receive Inquiry Cancel Command Complete event, in * which case the latter event should have status of Command * Disallowed. This should not be treated as error, since * we actually achieve what Inquiry Cancel wants to achieve, * which is to end the last Inquiry session. */ if (rp->status == HCI_ERROR_COMMAND_DISALLOWED && !test_bit(HCI_INQUIRY, &hdev->flags)) { bt_dev_warn(hdev, "Ignoring error of Inquiry Cancel command"); rp->status = 0x00; } if (rp->status) return rp->status; clear_bit(HCI_INQUIRY, &hdev->flags); smp_mb__after_atomic(); /* wake_up_bit advises about this barrier */ wake_up_bit(&hdev->flags, HCI_INQUIRY); hci_dev_lock(hdev); /* Set discovery state to stopped if we're not doing LE active * scanning. */ if (!hci_dev_test_flag(hdev, HCI_LE_SCAN) || hdev->le_scan_type != LE_SCAN_ACTIVE) hci_discovery_set_state(hdev, DISCOVERY_STOPPED); hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_periodic_inq(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_status *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; hci_dev_set_flag(hdev, HCI_PERIODIC_INQ); return rp->status; } static u8 hci_cc_exit_periodic_inq(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_status *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; hci_dev_clear_flag(hdev, HCI_PERIODIC_INQ); return rp->status; } static u8 hci_cc_remote_name_req_cancel(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_remote_name_req_cancel *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); return rp->status; } static u8 hci_cc_role_discovery(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_role_discovery *rp = data; struct hci_conn *conn; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; hci_dev_lock(hdev); conn = hci_conn_hash_lookup_handle(hdev, __le16_to_cpu(rp->handle)); if (conn) conn->role = rp->role; hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_read_link_policy(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_read_link_policy *rp = data; struct hci_conn *conn; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; hci_dev_lock(hdev); conn = hci_conn_hash_lookup_handle(hdev, __le16_to_cpu(rp->handle)); if (conn) conn->link_policy = __le16_to_cpu(rp->policy); hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_write_link_policy(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_write_link_policy *rp = data; struct hci_conn *conn; void *sent; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; sent = hci_sent_cmd_data(hdev, HCI_OP_WRITE_LINK_POLICY); if (!sent) return rp->status; hci_dev_lock(hdev); conn = hci_conn_hash_lookup_handle(hdev, __le16_to_cpu(rp->handle)); if (conn) conn->link_policy = get_unaligned_le16(sent + 2); hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_read_def_link_policy(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_read_def_link_policy *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; hdev->link_policy = __le16_to_cpu(rp->policy); return rp->status; } static u8 hci_cc_write_def_link_policy(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_status *rp = data; void *sent; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; sent = hci_sent_cmd_data(hdev, HCI_OP_WRITE_DEF_LINK_POLICY); if (!sent) return rp->status; hdev->link_policy = get_unaligned_le16(sent); return rp->status; } static u8 hci_cc_reset(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_status *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); clear_bit(HCI_RESET, &hdev->flags); if (rp->status) return rp->status; /* Reset all non-persistent flags */ hci_dev_clear_volatile_flags(hdev); hci_discovery_set_state(hdev, DISCOVERY_STOPPED); hdev->inq_tx_power = HCI_TX_POWER_INVALID; hdev->adv_tx_power = HCI_TX_POWER_INVALID; memset(hdev->adv_data, 0, sizeof(hdev->adv_data)); hdev->adv_data_len = 0; memset(hdev->scan_rsp_data, 0, sizeof(hdev->scan_rsp_data)); hdev->scan_rsp_data_len = 0; hdev->le_scan_type = LE_SCAN_PASSIVE; hdev->ssp_debug_mode = 0; hci_bdaddr_list_clear(&hdev->le_accept_list); hci_bdaddr_list_clear(&hdev->le_resolv_list); return rp->status; } static u8 hci_cc_read_stored_link_key(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_read_stored_link_key *rp = data; struct hci_cp_read_stored_link_key *sent; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); sent = hci_sent_cmd_data(hdev, HCI_OP_READ_STORED_LINK_KEY); if (!sent) return rp->status; if (!rp->status && sent->read_all == 0x01) { hdev->stored_max_keys = le16_to_cpu(rp->max_keys); hdev->stored_num_keys = le16_to_cpu(rp->num_keys); } return rp->status; } static u8 hci_cc_delete_stored_link_key(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_delete_stored_link_key *rp = data; u16 num_keys; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; num_keys = le16_to_cpu(rp->num_keys); if (num_keys <= hdev->stored_num_keys) hdev->stored_num_keys -= num_keys; else hdev->stored_num_keys = 0; return rp->status; } static u8 hci_cc_write_local_name(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_status *rp = data; void *sent; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); sent = hci_sent_cmd_data(hdev, HCI_OP_WRITE_LOCAL_NAME); if (!sent) return rp->status; hci_dev_lock(hdev); if (hci_dev_test_flag(hdev, HCI_MGMT)) mgmt_set_local_name_complete(hdev, sent, rp->status); else if (!rp->status) memcpy(hdev->dev_name, sent, HCI_MAX_NAME_LENGTH); hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_read_local_name(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_read_local_name *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; if (hci_dev_test_flag(hdev, HCI_SETUP) || hci_dev_test_flag(hdev, HCI_CONFIG)) memcpy(hdev->dev_name, rp->name, HCI_MAX_NAME_LENGTH); return rp->status; } static u8 hci_cc_write_auth_enable(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_status *rp = data; void *sent; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); sent = hci_sent_cmd_data(hdev, HCI_OP_WRITE_AUTH_ENABLE); if (!sent) return rp->status; hci_dev_lock(hdev); if (!rp->status) { __u8 param = *((__u8 *) sent); if (param == AUTH_ENABLED) set_bit(HCI_AUTH, &hdev->flags); else clear_bit(HCI_AUTH, &hdev->flags); } if (hci_dev_test_flag(hdev, HCI_MGMT)) mgmt_auth_enable_complete(hdev, rp->status); hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_write_encrypt_mode(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_status *rp = data; __u8 param; void *sent; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; sent = hci_sent_cmd_data(hdev, HCI_OP_WRITE_ENCRYPT_MODE); if (!sent) return rp->status; param = *((__u8 *) sent); if (param) set_bit(HCI_ENCRYPT, &hdev->flags); else clear_bit(HCI_ENCRYPT, &hdev->flags); return rp->status; } static u8 hci_cc_write_scan_enable(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_status *rp = data; __u8 param; void *sent; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); sent = hci_sent_cmd_data(hdev, HCI_OP_WRITE_SCAN_ENABLE); if (!sent) return rp->status; param = *((__u8 *) sent); hci_dev_lock(hdev); if (rp->status) { hdev->discov_timeout = 0; goto done; } if (param & SCAN_INQUIRY) set_bit(HCI_ISCAN, &hdev->flags); else clear_bit(HCI_ISCAN, &hdev->flags); if (param & SCAN_PAGE) set_bit(HCI_PSCAN, &hdev->flags); else clear_bit(HCI_PSCAN, &hdev->flags); done: hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_set_event_filter(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_status *rp = data; struct hci_cp_set_event_filter *cp; void *sent; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; sent = hci_sent_cmd_data(hdev, HCI_OP_SET_EVENT_FLT); if (!sent) return rp->status; cp = (struct hci_cp_set_event_filter *)sent; if (cp->flt_type == HCI_FLT_CLEAR_ALL) hci_dev_clear_flag(hdev, HCI_EVENT_FILTER_CONFIGURED); else hci_dev_set_flag(hdev, HCI_EVENT_FILTER_CONFIGURED); return rp->status; } static u8 hci_cc_read_class_of_dev(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_read_class_of_dev *rp = data; if (WARN_ON(!hdev)) return HCI_ERROR_UNSPECIFIED; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; memcpy(hdev->dev_class, rp->dev_class, 3); bt_dev_dbg(hdev, "class 0x%.2x%.2x%.2x", hdev->dev_class[2], hdev->dev_class[1], hdev->dev_class[0]); return rp->status; } static u8 hci_cc_write_class_of_dev(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_status *rp = data; void *sent; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); sent = hci_sent_cmd_data(hdev, HCI_OP_WRITE_CLASS_OF_DEV); if (!sent) return rp->status; hci_dev_lock(hdev); if (!rp->status) memcpy(hdev->dev_class, sent, 3); if (hci_dev_test_flag(hdev, HCI_MGMT)) mgmt_set_class_of_dev_complete(hdev, sent, rp->status); hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_read_voice_setting(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_read_voice_setting *rp = data; __u16 setting; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; setting = __le16_to_cpu(rp->voice_setting); if (hdev->voice_setting == setting) return rp->status; hdev->voice_setting = setting; bt_dev_dbg(hdev, "voice setting 0x%4.4x", setting); if (hdev->notify) hdev->notify(hdev, HCI_NOTIFY_VOICE_SETTING); return rp->status; } static u8 hci_cc_write_voice_setting(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_status *rp = data; __u16 setting; void *sent; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; sent = hci_sent_cmd_data(hdev, HCI_OP_WRITE_VOICE_SETTING); if (!sent) return rp->status; setting = get_unaligned_le16(sent); if (hdev->voice_setting == setting) return rp->status; hdev->voice_setting = setting; bt_dev_dbg(hdev, "voice setting 0x%4.4x", setting); if (hdev->notify) hdev->notify(hdev, HCI_NOTIFY_VOICE_SETTING); return rp->status; } static u8 hci_cc_read_num_supported_iac(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_read_num_supported_iac *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; hdev->num_iac = rp->num_iac; bt_dev_dbg(hdev, "num iac %d", hdev->num_iac); return rp->status; } static u8 hci_cc_write_ssp_mode(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_status *rp = data; struct hci_cp_write_ssp_mode *sent; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); sent = hci_sent_cmd_data(hdev, HCI_OP_WRITE_SSP_MODE); if (!sent) return rp->status; hci_dev_lock(hdev); if (!rp->status) { if (sent->mode) hdev->features[1][0] |= LMP_HOST_SSP; else hdev->features[1][0] &= ~LMP_HOST_SSP; } if (!rp->status) { if (sent->mode) hci_dev_set_flag(hdev, HCI_SSP_ENABLED); else hci_dev_clear_flag(hdev, HCI_SSP_ENABLED); } hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_write_sc_support(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_status *rp = data; struct hci_cp_write_sc_support *sent; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); sent = hci_sent_cmd_data(hdev, HCI_OP_WRITE_SC_SUPPORT); if (!sent) return rp->status; hci_dev_lock(hdev); if (!rp->status) { if (sent->support) hdev->features[1][0] |= LMP_HOST_SC; else hdev->features[1][0] &= ~LMP_HOST_SC; } if (!hci_dev_test_flag(hdev, HCI_MGMT) && !rp->status) { if (sent->support) hci_dev_set_flag(hdev, HCI_SC_ENABLED); else hci_dev_clear_flag(hdev, HCI_SC_ENABLED); } hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_read_local_version(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_read_local_version *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; if (hci_dev_test_flag(hdev, HCI_SETUP) || hci_dev_test_flag(hdev, HCI_CONFIG)) { hdev->hci_ver = rp->hci_ver; hdev->hci_rev = __le16_to_cpu(rp->hci_rev); hdev->lmp_ver = rp->lmp_ver; hdev->manufacturer = __le16_to_cpu(rp->manufacturer); hdev->lmp_subver = __le16_to_cpu(rp->lmp_subver); } return rp->status; } static u8 hci_cc_read_enc_key_size(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_read_enc_key_size *rp = data; struct hci_conn *conn; u16 handle; u8 status = rp->status; bt_dev_dbg(hdev, "status 0x%2.2x", status); handle = le16_to_cpu(rp->handle); hci_dev_lock(hdev); conn = hci_conn_hash_lookup_handle(hdev, handle); if (!conn) { status = 0xFF; goto done; } /* While unexpected, the read_enc_key_size command may fail. The most * secure approach is to then assume the key size is 0 to force a * disconnection. */ if (status) { bt_dev_err(hdev, "failed to read key size for handle %u", handle); conn->enc_key_size = 0; } else { u8 *key_enc_size = hci_conn_key_enc_size(conn); conn->enc_key_size = rp->key_size; status = 0; /* Attempt to check if the key size is too small or if it has * been downgraded from the last time it was stored as part of * the link_key. */ if (conn->enc_key_size < hdev->min_enc_key_size || (key_enc_size && conn->enc_key_size < *key_enc_size)) { /* As slave role, the conn->state has been set to * BT_CONNECTED and l2cap conn req might not be received * yet, at this moment the l2cap layer almost does * nothing with the non-zero status. * So we also clear encrypt related bits, and then the * handler of l2cap conn req will get the right secure * state at a later time. */ status = HCI_ERROR_AUTH_FAILURE; clear_bit(HCI_CONN_ENCRYPT, &conn->flags); clear_bit(HCI_CONN_AES_CCM, &conn->flags); } /* Update the key encryption size with the connection one */ if (key_enc_size && *key_enc_size != conn->enc_key_size) *key_enc_size = conn->enc_key_size; } hci_encrypt_cfm(conn, status); done: hci_dev_unlock(hdev); return status; } static u8 hci_cc_read_local_commands(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_read_local_commands *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; if (hci_dev_test_flag(hdev, HCI_SETUP) || hci_dev_test_flag(hdev, HCI_CONFIG)) memcpy(hdev->commands, rp->commands, sizeof(hdev->commands)); return rp->status; } static u8 hci_cc_read_auth_payload_timeout(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_read_auth_payload_to *rp = data; struct hci_conn *conn; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; hci_dev_lock(hdev); conn = hci_conn_hash_lookup_handle(hdev, __le16_to_cpu(rp->handle)); if (conn) conn->auth_payload_timeout = __le16_to_cpu(rp->timeout); hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_write_auth_payload_timeout(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_write_auth_payload_to *rp = data; struct hci_conn *conn; void *sent; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); sent = hci_sent_cmd_data(hdev, HCI_OP_WRITE_AUTH_PAYLOAD_TO); if (!sent) return rp->status; hci_dev_lock(hdev); conn = hci_conn_hash_lookup_handle(hdev, __le16_to_cpu(rp->handle)); if (!conn) { rp->status = 0xff; goto unlock; } if (!rp->status) conn->auth_payload_timeout = get_unaligned_le16(sent + 2); unlock: hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_read_local_features(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_read_local_features *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; memcpy(hdev->features, rp->features, 8); /* Adjust default settings according to features * supported by device. */ if (hdev->features[0][0] & LMP_3SLOT) hdev->pkt_type |= (HCI_DM3 | HCI_DH3); if (hdev->features[0][0] & LMP_5SLOT) hdev->pkt_type |= (HCI_DM5 | HCI_DH5); if (hdev->features[0][1] & LMP_HV2) { hdev->pkt_type |= (HCI_HV2); hdev->esco_type |= (ESCO_HV2); } if (hdev->features[0][1] & LMP_HV3) { hdev->pkt_type |= (HCI_HV3); hdev->esco_type |= (ESCO_HV3); } if (lmp_esco_capable(hdev)) hdev->esco_type |= (ESCO_EV3); if (hdev->features[0][4] & LMP_EV4) hdev->esco_type |= (ESCO_EV4); if (hdev->features[0][4] & LMP_EV5) hdev->esco_type |= (ESCO_EV5); if (hdev->features[0][5] & LMP_EDR_ESCO_2M) hdev->esco_type |= (ESCO_2EV3); if (hdev->features[0][5] & LMP_EDR_ESCO_3M) hdev->esco_type |= (ESCO_3EV3); if (hdev->features[0][5] & LMP_EDR_3S_ESCO) hdev->esco_type |= (ESCO_2EV5 | ESCO_3EV5); return rp->status; } static u8 hci_cc_read_local_ext_features(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_read_local_ext_features *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; if (hdev->max_page < rp->max_page) { if (hci_test_quirk(hdev, HCI_QUIRK_BROKEN_LOCAL_EXT_FEATURES_PAGE_2)) bt_dev_warn(hdev, "broken local ext features page 2"); else hdev->max_page = rp->max_page; } if (rp->page < HCI_MAX_PAGES) memcpy(hdev->features[rp->page], rp->features, 8); return rp->status; } static u8 hci_cc_read_buffer_size(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_read_buffer_size *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; hdev->acl_mtu = __le16_to_cpu(rp->acl_mtu); hdev->sco_mtu = rp->sco_mtu; hdev->acl_pkts = __le16_to_cpu(rp->acl_max_pkt); hdev->sco_pkts = __le16_to_cpu(rp->sco_max_pkt); if (hci_test_quirk(hdev, HCI_QUIRK_FIXUP_BUFFER_SIZE)) { hdev->sco_mtu = 64; hdev->sco_pkts = 8; } if (!read_voice_setting_capable(hdev)) hdev->sco_pkts = 0; hdev->acl_cnt = hdev->acl_pkts; hdev->sco_cnt = hdev->sco_pkts; BT_DBG("%s acl mtu %d:%d sco mtu %d:%d", hdev->name, hdev->acl_mtu, hdev->acl_pkts, hdev->sco_mtu, hdev->sco_pkts); if (!hdev->acl_mtu || !hdev->acl_pkts) return HCI_ERROR_INVALID_PARAMETERS; return rp->status; } static u8 hci_cc_read_bd_addr(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_read_bd_addr *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; if (test_bit(HCI_INIT, &hdev->flags)) bacpy(&hdev->bdaddr, &rp->bdaddr); if (hci_dev_test_flag(hdev, HCI_SETUP)) bacpy(&hdev->setup_addr, &rp->bdaddr); return rp->status; } static u8 hci_cc_read_local_pairing_opts(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_read_local_pairing_opts *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; if (hci_dev_test_flag(hdev, HCI_SETUP) || hci_dev_test_flag(hdev, HCI_CONFIG)) { hdev->pairing_opts = rp->pairing_opts; hdev->max_enc_key_size = rp->max_key_size; } return rp->status; } static u8 hci_cc_read_page_scan_activity(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_read_page_scan_activity *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; if (test_bit(HCI_INIT, &hdev->flags)) { hdev->page_scan_interval = __le16_to_cpu(rp->interval); hdev->page_scan_window = __le16_to_cpu(rp->window); } return rp->status; } static u8 hci_cc_write_page_scan_activity(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_status *rp = data; struct hci_cp_write_page_scan_activity *sent; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; sent = hci_sent_cmd_data(hdev, HCI_OP_WRITE_PAGE_SCAN_ACTIVITY); if (!sent) return rp->status; hdev->page_scan_interval = __le16_to_cpu(sent->interval); hdev->page_scan_window = __le16_to_cpu(sent->window); return rp->status; } static u8 hci_cc_read_page_scan_type(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_read_page_scan_type *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; if (test_bit(HCI_INIT, &hdev->flags)) hdev->page_scan_type = rp->type; return rp->status; } static u8 hci_cc_write_page_scan_type(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_status *rp = data; u8 *type; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; type = hci_sent_cmd_data(hdev, HCI_OP_WRITE_PAGE_SCAN_TYPE); if (type) hdev->page_scan_type = *type; return rp->status; } static u8 hci_cc_read_clock(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_read_clock *rp = data; struct hci_cp_read_clock *cp; struct hci_conn *conn; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; hci_dev_lock(hdev); cp = hci_sent_cmd_data(hdev, HCI_OP_READ_CLOCK); if (!cp) goto unlock; if (cp->which == 0x00) { hdev->clock = le32_to_cpu(rp->clock); goto unlock; } conn = hci_conn_hash_lookup_handle(hdev, __le16_to_cpu(rp->handle)); if (conn) { conn->clock = le32_to_cpu(rp->clock); conn->clock_accuracy = le16_to_cpu(rp->accuracy); } unlock: hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_read_inq_rsp_tx_power(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_read_inq_rsp_tx_power *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; hdev->inq_tx_power = rp->tx_power; return rp->status; } static u8 hci_cc_read_def_err_data_reporting(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_read_def_err_data_reporting *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; hdev->err_data_reporting = rp->err_data_reporting; return rp->status; } static u8 hci_cc_write_def_err_data_reporting(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_status *rp = data; struct hci_cp_write_def_err_data_reporting *cp; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; cp = hci_sent_cmd_data(hdev, HCI_OP_WRITE_DEF_ERR_DATA_REPORTING); if (!cp) return rp->status; hdev->err_data_reporting = cp->err_data_reporting; return rp->status; } static u8 hci_cc_pin_code_reply(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_pin_code_reply *rp = data; struct hci_cp_pin_code_reply *cp; struct hci_conn *conn; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); hci_dev_lock(hdev); if (hci_dev_test_flag(hdev, HCI_MGMT)) mgmt_pin_code_reply_complete(hdev, &rp->bdaddr, rp->status); if (rp->status) goto unlock; cp = hci_sent_cmd_data(hdev, HCI_OP_PIN_CODE_REPLY); if (!cp) goto unlock; conn = hci_conn_hash_lookup_ba(hdev, ACL_LINK, &cp->bdaddr); if (conn) conn->pin_length = cp->pin_len; unlock: hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_pin_code_neg_reply(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_pin_code_neg_reply *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); hci_dev_lock(hdev); if (hci_dev_test_flag(hdev, HCI_MGMT)) mgmt_pin_code_neg_reply_complete(hdev, &rp->bdaddr, rp->status); hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_le_read_buffer_size(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_le_read_buffer_size *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; hdev->le_mtu = __le16_to_cpu(rp->le_mtu); hdev->le_pkts = rp->le_max_pkt; hdev->le_cnt = hdev->le_pkts; BT_DBG("%s le mtu %d:%d", hdev->name, hdev->le_mtu, hdev->le_pkts); if (hdev->le_mtu && hdev->le_mtu < HCI_MIN_LE_MTU) return HCI_ERROR_INVALID_PARAMETERS; return rp->status; } static u8 hci_cc_le_read_local_features(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_le_read_local_features *rp = data; BT_DBG("%s status 0x%2.2x", hdev->name, rp->status); if (rp->status) return rp->status; memcpy(hdev->le_features, rp->features, 8); return rp->status; } static u8 hci_cc_le_read_adv_tx_power(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_le_read_adv_tx_power *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; hdev->adv_tx_power = rp->tx_power; return rp->status; } static u8 hci_cc_user_confirm_reply(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_user_confirm_reply *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); hci_dev_lock(hdev); if (hci_dev_test_flag(hdev, HCI_MGMT)) mgmt_user_confirm_reply_complete(hdev, &rp->bdaddr, ACL_LINK, 0, rp->status); hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_user_confirm_neg_reply(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_user_confirm_reply *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); hci_dev_lock(hdev); if (hci_dev_test_flag(hdev, HCI_MGMT)) mgmt_user_confirm_neg_reply_complete(hdev, &rp->bdaddr, ACL_LINK, 0, rp->status); hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_user_passkey_reply(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_user_confirm_reply *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); hci_dev_lock(hdev); if (hci_dev_test_flag(hdev, HCI_MGMT)) mgmt_user_passkey_reply_complete(hdev, &rp->bdaddr, ACL_LINK, 0, rp->status); hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_user_passkey_neg_reply(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_user_confirm_reply *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); hci_dev_lock(hdev); if (hci_dev_test_flag(hdev, HCI_MGMT)) mgmt_user_passkey_neg_reply_complete(hdev, &rp->bdaddr, ACL_LINK, 0, rp->status); hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_read_local_oob_data(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_read_local_oob_data *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); return rp->status; } static u8 hci_cc_read_local_oob_ext_data(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_read_local_oob_ext_data *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); return rp->status; } static u8 hci_cc_le_set_random_addr(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_status *rp = data; bdaddr_t *sent; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; sent = hci_sent_cmd_data(hdev, HCI_OP_LE_SET_RANDOM_ADDR); if (!sent) return rp->status; hci_dev_lock(hdev); bacpy(&hdev->random_addr, sent); if (!bacmp(&hdev->rpa, sent)) { hci_dev_clear_flag(hdev, HCI_RPA_EXPIRED); queue_delayed_work(hdev->workqueue, &hdev->rpa_expired, secs_to_jiffies(hdev->rpa_timeout)); } hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_le_set_default_phy(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_status *rp = data; struct hci_cp_le_set_default_phy *cp; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; cp = hci_sent_cmd_data(hdev, HCI_OP_LE_SET_DEFAULT_PHY); if (!cp) return rp->status; hci_dev_lock(hdev); hdev->le_tx_def_phys = cp->tx_phys; hdev->le_rx_def_phys = cp->rx_phys; hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_le_set_adv_set_random_addr(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_status *rp = data; struct hci_cp_le_set_adv_set_rand_addr *cp; struct adv_info *adv; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; cp = hci_sent_cmd_data(hdev, HCI_OP_LE_SET_ADV_SET_RAND_ADDR); /* Update only in case the adv instance since handle 0x00 shall be using * HCI_OP_LE_SET_RANDOM_ADDR since that allows both extended and * non-extended adverting. */ if (!cp || !cp->handle) return rp->status; hci_dev_lock(hdev); adv = hci_find_adv_instance(hdev, cp->handle); if (adv) { bacpy(&adv->random_addr, &cp->bdaddr); if (!bacmp(&hdev->rpa, &cp->bdaddr)) { adv->rpa_expired = false; queue_delayed_work(hdev->workqueue, &adv->rpa_expired_cb, secs_to_jiffies(hdev->rpa_timeout)); } } hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_le_remove_adv_set(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_status *rp = data; u8 *instance; int err; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; instance = hci_sent_cmd_data(hdev, HCI_OP_LE_REMOVE_ADV_SET); if (!instance) return rp->status; hci_dev_lock(hdev); err = hci_remove_adv_instance(hdev, *instance); if (!err) mgmt_advertising_removed(hci_skb_sk(hdev->sent_cmd), hdev, *instance); hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_le_clear_adv_sets(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_status *rp = data; struct adv_info *adv, *n; int err; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; if (!hci_sent_cmd_data(hdev, HCI_OP_LE_CLEAR_ADV_SETS)) return rp->status; hci_dev_lock(hdev); list_for_each_entry_safe(adv, n, &hdev->adv_instances, list) { u8 instance = adv->instance; err = hci_remove_adv_instance(hdev, instance); if (!err) mgmt_advertising_removed(hci_skb_sk(hdev->sent_cmd), hdev, instance); } hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_le_read_transmit_power(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_le_read_transmit_power *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; hdev->min_le_tx_power = rp->min_le_tx_power; hdev->max_le_tx_power = rp->max_le_tx_power; return rp->status; } static u8 hci_cc_le_set_privacy_mode(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_status *rp = data; struct hci_cp_le_set_privacy_mode *cp; struct hci_conn_params *params; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; cp = hci_sent_cmd_data(hdev, HCI_OP_LE_SET_PRIVACY_MODE); if (!cp) return rp->status; hci_dev_lock(hdev); params = hci_conn_params_lookup(hdev, &cp->bdaddr, cp->bdaddr_type); if (params) WRITE_ONCE(params->privacy_mode, cp->mode); hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_le_set_adv_enable(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_status *rp = data; __u8 *sent; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; sent = hci_sent_cmd_data(hdev, HCI_OP_LE_SET_ADV_ENABLE); if (!sent) return rp->status; hci_dev_lock(hdev); /* If we're doing connection initiation as peripheral. Set a * timeout in case something goes wrong. */ if (*sent) { struct hci_conn *conn; hci_dev_set_flag(hdev, HCI_LE_ADV); conn = hci_lookup_le_connect(hdev); if (conn) queue_delayed_work(hdev->workqueue, &conn->le_conn_timeout, conn->conn_timeout); } else { hci_dev_clear_flag(hdev, HCI_LE_ADV); } hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_le_set_ext_adv_enable(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_cp_le_set_ext_adv_enable *cp; struct hci_cp_ext_adv_set *set; struct adv_info *adv = NULL, *n; struct hci_ev_status *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; cp = hci_sent_cmd_data(hdev, HCI_OP_LE_SET_EXT_ADV_ENABLE); if (!cp) return rp->status; set = (void *)cp->data; hci_dev_lock(hdev); if (cp->num_of_sets) adv = hci_find_adv_instance(hdev, set->handle); if (cp->enable) { struct hci_conn *conn; hci_dev_set_flag(hdev, HCI_LE_ADV); if (adv) adv->enabled = true; else if (!set->handle) hci_dev_set_flag(hdev, HCI_LE_ADV_0); conn = hci_lookup_le_connect(hdev); if (conn) queue_delayed_work(hdev->workqueue, &conn->le_conn_timeout, conn->conn_timeout); } else { if (cp->num_of_sets) { if (adv) adv->enabled = false; else if (!set->handle) hci_dev_clear_flag(hdev, HCI_LE_ADV_0); /* If just one instance was disabled check if there are * any other instance enabled before clearing HCI_LE_ADV */ list_for_each_entry_safe(adv, n, &hdev->adv_instances, list) { if (adv->enabled) goto unlock; } } else { /* All instances shall be considered disabled */ list_for_each_entry_safe(adv, n, &hdev->adv_instances, list) adv->enabled = false; } hci_dev_clear_flag(hdev, HCI_LE_ADV); } unlock: hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_le_set_scan_param(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_cp_le_set_scan_param *cp; struct hci_ev_status *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; cp = hci_sent_cmd_data(hdev, HCI_OP_LE_SET_SCAN_PARAM); if (!cp) return rp->status; hci_dev_lock(hdev); hdev->le_scan_type = cp->type; hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_le_set_ext_scan_param(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_cp_le_set_ext_scan_params *cp; struct hci_ev_status *rp = data; struct hci_cp_le_scan_phy_params *phy_param; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; cp = hci_sent_cmd_data(hdev, HCI_OP_LE_SET_EXT_SCAN_PARAMS); if (!cp) return rp->status; phy_param = (void *)cp->data; hci_dev_lock(hdev); hdev->le_scan_type = phy_param->type; hci_dev_unlock(hdev); return rp->status; } static bool has_pending_adv_report(struct hci_dev *hdev) { struct discovery_state *d = &hdev->discovery; return bacmp(&d->last_adv_addr, BDADDR_ANY); } static void clear_pending_adv_report(struct hci_dev *hdev) { struct discovery_state *d = &hdev->discovery; bacpy(&d->last_adv_addr, BDADDR_ANY); d->last_adv_data_len = 0; } static void store_pending_adv_report(struct hci_dev *hdev, bdaddr_t *bdaddr, u8 bdaddr_type, s8 rssi, u32 flags, u8 *data, u8 len) { struct discovery_state *d = &hdev->discovery; if (len > max_adv_len(hdev)) return; bacpy(&d->last_adv_addr, bdaddr); d->last_adv_addr_type = bdaddr_type; d->last_adv_rssi = rssi; d->last_adv_flags = flags; memcpy(d->last_adv_data, data, len); d->last_adv_data_len = len; } static void le_set_scan_enable_complete(struct hci_dev *hdev, u8 enable) { hci_dev_lock(hdev); switch (enable) { case LE_SCAN_ENABLE: hci_dev_set_flag(hdev, HCI_LE_SCAN); if (hdev->le_scan_type == LE_SCAN_ACTIVE) { clear_pending_adv_report(hdev); hci_discovery_set_state(hdev, DISCOVERY_FINDING); } break; case LE_SCAN_DISABLE: /* We do this here instead of when setting DISCOVERY_STOPPED * since the latter would potentially require waiting for * inquiry to stop too. */ if (has_pending_adv_report(hdev)) { struct discovery_state *d = &hdev->discovery; mgmt_device_found(hdev, &d->last_adv_addr, LE_LINK, d->last_adv_addr_type, NULL, d->last_adv_rssi, d->last_adv_flags, d->last_adv_data, d->last_adv_data_len, NULL, 0, 0); } /* Cancel this timer so that we don't try to disable scanning * when it's already disabled. */ cancel_delayed_work(&hdev->le_scan_disable); hci_dev_clear_flag(hdev, HCI_LE_SCAN); /* The HCI_LE_SCAN_INTERRUPTED flag indicates that we * interrupted scanning due to a connect request. Mark * therefore discovery as stopped. */ if (hci_dev_test_and_clear_flag(hdev, HCI_LE_SCAN_INTERRUPTED)) hci_discovery_set_state(hdev, DISCOVERY_STOPPED); else if (!hci_dev_test_flag(hdev, HCI_LE_ADV) && hdev->discovery.state == DISCOVERY_FINDING) queue_work(hdev->workqueue, &hdev->reenable_adv_work); break; default: bt_dev_err(hdev, "use of reserved LE_Scan_Enable param %d", enable); break; } hci_dev_unlock(hdev); } static u8 hci_cc_le_set_scan_enable(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_cp_le_set_scan_enable *cp; struct hci_ev_status *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; cp = hci_sent_cmd_data(hdev, HCI_OP_LE_SET_SCAN_ENABLE); if (!cp) return rp->status; le_set_scan_enable_complete(hdev, cp->enable); return rp->status; } static u8 hci_cc_le_set_ext_scan_enable(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_cp_le_set_ext_scan_enable *cp; struct hci_ev_status *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; cp = hci_sent_cmd_data(hdev, HCI_OP_LE_SET_EXT_SCAN_ENABLE); if (!cp) return rp->status; le_set_scan_enable_complete(hdev, cp->enable); return rp->status; } static u8 hci_cc_le_read_num_adv_sets(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_le_read_num_supported_adv_sets *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x No of Adv sets %u", rp->status, rp->num_of_sets); if (rp->status) return rp->status; hdev->le_num_of_adv_sets = rp->num_of_sets; return rp->status; } static u8 hci_cc_le_read_accept_list_size(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_le_read_accept_list_size *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x size %u", rp->status, rp->size); if (rp->status) return rp->status; hdev->le_accept_list_size = rp->size; return rp->status; } static u8 hci_cc_le_clear_accept_list(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_status *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; hci_dev_lock(hdev); hci_bdaddr_list_clear(&hdev->le_accept_list); hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_le_add_to_accept_list(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_cp_le_add_to_accept_list *sent; struct hci_ev_status *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; sent = hci_sent_cmd_data(hdev, HCI_OP_LE_ADD_TO_ACCEPT_LIST); if (!sent) return rp->status; hci_dev_lock(hdev); hci_bdaddr_list_add(&hdev->le_accept_list, &sent->bdaddr, sent->bdaddr_type); hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_le_del_from_accept_list(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_cp_le_del_from_accept_list *sent; struct hci_ev_status *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; sent = hci_sent_cmd_data(hdev, HCI_OP_LE_DEL_FROM_ACCEPT_LIST); if (!sent) return rp->status; hci_dev_lock(hdev); hci_bdaddr_list_del(&hdev->le_accept_list, &sent->bdaddr, sent->bdaddr_type); hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_le_read_supported_states(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_le_read_supported_states *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; memcpy(hdev->le_states, rp->le_states, 8); return rp->status; } static u8 hci_cc_le_read_def_data_len(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_le_read_def_data_len *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; hdev->le_def_tx_len = le16_to_cpu(rp->tx_len); hdev->le_def_tx_time = le16_to_cpu(rp->tx_time); return rp->status; } static u8 hci_cc_le_write_def_data_len(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_cp_le_write_def_data_len *sent; struct hci_ev_status *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; sent = hci_sent_cmd_data(hdev, HCI_OP_LE_WRITE_DEF_DATA_LEN); if (!sent) return rp->status; hdev->le_def_tx_len = le16_to_cpu(sent->tx_len); hdev->le_def_tx_time = le16_to_cpu(sent->tx_time); return rp->status; } static u8 hci_cc_le_add_to_resolv_list(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_cp_le_add_to_resolv_list *sent; struct hci_ev_status *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; sent = hci_sent_cmd_data(hdev, HCI_OP_LE_ADD_TO_RESOLV_LIST); if (!sent) return rp->status; hci_dev_lock(hdev); hci_bdaddr_list_add_with_irk(&hdev->le_resolv_list, &sent->bdaddr, sent->bdaddr_type, sent->peer_irk, sent->local_irk); hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_le_del_from_resolv_list(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_cp_le_del_from_resolv_list *sent; struct hci_ev_status *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; sent = hci_sent_cmd_data(hdev, HCI_OP_LE_DEL_FROM_RESOLV_LIST); if (!sent) return rp->status; hci_dev_lock(hdev); hci_bdaddr_list_del_with_irk(&hdev->le_resolv_list, &sent->bdaddr, sent->bdaddr_type); hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_le_clear_resolv_list(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_status *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; hci_dev_lock(hdev); hci_bdaddr_list_clear(&hdev->le_resolv_list); hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_le_read_resolv_list_size(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_le_read_resolv_list_size *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x size %u", rp->status, rp->size); if (rp->status) return rp->status; hdev->le_resolv_list_size = rp->size; return rp->status; } static u8 hci_cc_le_set_addr_resolution_enable(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_status *rp = data; __u8 *sent; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; sent = hci_sent_cmd_data(hdev, HCI_OP_LE_SET_ADDR_RESOLV_ENABLE); if (!sent) return rp->status; hci_dev_lock(hdev); if (*sent) hci_dev_set_flag(hdev, HCI_LL_RPA_RESOLUTION); else hci_dev_clear_flag(hdev, HCI_LL_RPA_RESOLUTION); hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_le_read_max_data_len(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_le_read_max_data_len *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; hdev->le_max_tx_len = le16_to_cpu(rp->tx_len); hdev->le_max_tx_time = le16_to_cpu(rp->tx_time); hdev->le_max_rx_len = le16_to_cpu(rp->rx_len); hdev->le_max_rx_time = le16_to_cpu(rp->rx_time); return rp->status; } static u8 hci_cc_write_le_host_supported(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_cp_write_le_host_supported *sent; struct hci_ev_status *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; sent = hci_sent_cmd_data(hdev, HCI_OP_WRITE_LE_HOST_SUPPORTED); if (!sent) return rp->status; hci_dev_lock(hdev); if (sent->le) { hdev->features[1][0] |= LMP_HOST_LE; hci_dev_set_flag(hdev, HCI_LE_ENABLED); } else { hdev->features[1][0] &= ~LMP_HOST_LE; hci_dev_clear_flag(hdev, HCI_LE_ENABLED); hci_dev_clear_flag(hdev, HCI_ADVERTISING); } if (sent->simul) hdev->features[1][0] |= LMP_HOST_LE_BREDR; else hdev->features[1][0] &= ~LMP_HOST_LE_BREDR; hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_set_adv_param(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_cp_le_set_adv_param *cp; struct hci_ev_status *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; cp = hci_sent_cmd_data(hdev, HCI_OP_LE_SET_ADV_PARAM); if (!cp) return rp->status; hci_dev_lock(hdev); hdev->adv_addr_type = cp->own_address_type; hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_read_rssi(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_read_rssi *rp = data; struct hci_conn *conn; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; hci_dev_lock(hdev); conn = hci_conn_hash_lookup_handle(hdev, __le16_to_cpu(rp->handle)); if (conn) conn->rssi = rp->rssi; hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_read_tx_power(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_cp_read_tx_power *sent; struct hci_rp_read_tx_power *rp = data; struct hci_conn *conn; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; sent = hci_sent_cmd_data(hdev, HCI_OP_READ_TX_POWER); if (!sent) return rp->status; hci_dev_lock(hdev); conn = hci_conn_hash_lookup_handle(hdev, __le16_to_cpu(rp->handle)); if (!conn) goto unlock; switch (sent->type) { case 0x00: conn->tx_power = rp->tx_power; break; case 0x01: conn->max_tx_power = rp->tx_power; break; } unlock: hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_write_ssp_debug_mode(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_status *rp = data; u8 *mode; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; mode = hci_sent_cmd_data(hdev, HCI_OP_WRITE_SSP_DEBUG_MODE); if (mode) hdev->ssp_debug_mode = *mode; return rp->status; } static void hci_cs_inquiry(struct hci_dev *hdev, __u8 status) { bt_dev_dbg(hdev, "status 0x%2.2x", status); if (status) return; if (hci_sent_cmd_data(hdev, HCI_OP_INQUIRY)) set_bit(HCI_INQUIRY, &hdev->flags); } static void hci_cs_create_conn(struct hci_dev *hdev, __u8 status) { struct hci_cp_create_conn *cp; struct hci_conn *conn; bt_dev_dbg(hdev, "status 0x%2.2x", status); cp = hci_sent_cmd_data(hdev, HCI_OP_CREATE_CONN); if (!cp) return; hci_dev_lock(hdev); conn = hci_conn_hash_lookup_ba(hdev, ACL_LINK, &cp->bdaddr); bt_dev_dbg(hdev, "bdaddr %pMR hcon %p", &cp->bdaddr, conn); if (status) { if (conn && conn->state == BT_CONNECT) { conn->state = BT_CLOSED; hci_connect_cfm(conn, status); hci_conn_del(conn); } } else { if (!conn) { conn = hci_conn_add_unset(hdev, ACL_LINK, &cp->bdaddr, HCI_ROLE_MASTER); if (IS_ERR(conn)) bt_dev_err(hdev, "connection err: %ld", PTR_ERR(conn)); } } hci_dev_unlock(hdev); } static void hci_cs_add_sco(struct hci_dev *hdev, __u8 status) { struct hci_cp_add_sco *cp; struct hci_conn *acl; struct hci_link *link; __u16 handle; bt_dev_dbg(hdev, "status 0x%2.2x", status); if (!status) return; cp = hci_sent_cmd_data(hdev, HCI_OP_ADD_SCO); if (!cp) return; handle = __le16_to_cpu(cp->handle); bt_dev_dbg(hdev, "handle 0x%4.4x", handle); hci_dev_lock(hdev); acl = hci_conn_hash_lookup_handle(hdev, handle); if (acl) { link = list_first_entry_or_null(&acl->link_list, struct hci_link, list); if (link && link->conn) { link->conn->state = BT_CLOSED; hci_connect_cfm(link->conn, status); hci_conn_del(link->conn); } } hci_dev_unlock(hdev); } static void hci_cs_auth_requested(struct hci_dev *hdev, __u8 status) { struct hci_cp_auth_requested *cp; struct hci_conn *conn; bt_dev_dbg(hdev, "status 0x%2.2x", status); if (!status) return; cp = hci_sent_cmd_data(hdev, HCI_OP_AUTH_REQUESTED); if (!cp) return; hci_dev_lock(hdev); conn = hci_conn_hash_lookup_handle(hdev, __le16_to_cpu(cp->handle)); if (conn) { if (conn->state == BT_CONFIG) { hci_connect_cfm(conn, status); hci_conn_drop(conn); } } hci_dev_unlock(hdev); } static void hci_cs_set_conn_encrypt(struct hci_dev *hdev, __u8 status) { struct hci_cp_set_conn_encrypt *cp; struct hci_conn *conn; bt_dev_dbg(hdev, "status 0x%2.2x", status); if (!status) return; cp = hci_sent_cmd_data(hdev, HCI_OP_SET_CONN_ENCRYPT); if (!cp) return; hci_dev_lock(hdev); conn = hci_conn_hash_lookup_handle(hdev, __le16_to_cpu(cp->handle)); if (conn) { if (conn->state == BT_CONFIG) { hci_connect_cfm(conn, status); hci_conn_drop(conn); } } hci_dev_unlock(hdev); } static int hci_outgoing_auth_needed(struct hci_dev *hdev, struct hci_conn *conn) { if (conn->state != BT_CONFIG || !conn->out) return 0; if (conn->pending_sec_level == BT_SECURITY_SDP) return 0; /* Only request authentication for SSP connections or non-SSP * devices with sec_level MEDIUM or HIGH or if MITM protection * is requested. */ if (!hci_conn_ssp_enabled(conn) && !(conn->auth_type & 0x01) && conn->pending_sec_level != BT_SECURITY_FIPS && conn->pending_sec_level != BT_SECURITY_HIGH && conn->pending_sec_level != BT_SECURITY_MEDIUM) return 0; return 1; } static int hci_resolve_name(struct hci_dev *hdev, struct inquiry_entry *e) { struct hci_cp_remote_name_req cp; memset(&cp, 0, sizeof(cp)); bacpy(&cp.bdaddr, &e->data.bdaddr); cp.pscan_rep_mode = e->data.pscan_rep_mode; cp.pscan_mode = e->data.pscan_mode; cp.clock_offset = e->data.clock_offset; return hci_send_cmd(hdev, HCI_OP_REMOTE_NAME_REQ, sizeof(cp), &cp); } static bool hci_resolve_next_name(struct hci_dev *hdev) { struct discovery_state *discov = &hdev->discovery; struct inquiry_entry *e; if (list_empty(&discov->resolve)) return false; /* We should stop if we already spent too much time resolving names. */ if (time_after(jiffies, discov->name_resolve_timeout)) { bt_dev_warn_ratelimited(hdev, "Name resolve takes too long."); return false; } e = hci_inquiry_cache_lookup_resolve(hdev, BDADDR_ANY, NAME_NEEDED); if (!e) return false; if (hci_resolve_name(hdev, e) == 0) { e->name_state = NAME_PENDING; return true; } return false; } static void hci_check_pending_name(struct hci_dev *hdev, struct hci_conn *conn, bdaddr_t *bdaddr, u8 *name, u8 name_len) { struct discovery_state *discov = &hdev->discovery; struct inquiry_entry *e; /* Update the mgmt connected state if necessary. Be careful with * conn objects that exist but are not (yet) connected however. * Only those in BT_CONFIG or BT_CONNECTED states can be * considered connected. */ if (conn && (conn->state == BT_CONFIG || conn->state == BT_CONNECTED)) mgmt_device_connected(hdev, conn, name, name_len); if (discov->state == DISCOVERY_STOPPED) return; if (discov->state == DISCOVERY_STOPPING) goto discov_complete; if (discov->state != DISCOVERY_RESOLVING) return; e = hci_inquiry_cache_lookup_resolve(hdev, bdaddr, NAME_PENDING); /* If the device was not found in a list of found devices names of which * are pending. there is no need to continue resolving a next name as it * will be done upon receiving another Remote Name Request Complete * Event */ if (!e) return; list_del(&e->list); e->name_state = name ? NAME_KNOWN : NAME_NOT_KNOWN; mgmt_remote_name(hdev, bdaddr, ACL_LINK, 0x00, e->data.rssi, name, name_len); if (hci_resolve_next_name(hdev)) return; discov_complete: hci_discovery_set_state(hdev, DISCOVERY_STOPPED); } static void hci_cs_remote_name_req(struct hci_dev *hdev, __u8 status) { struct hci_cp_remote_name_req *cp; struct hci_conn *conn; bt_dev_dbg(hdev, "status 0x%2.2x", status); /* If successful wait for the name req complete event before * checking for the need to do authentication */ if (!status) return; cp = hci_sent_cmd_data(hdev, HCI_OP_REMOTE_NAME_REQ); if (!cp) return; hci_dev_lock(hdev); conn = hci_conn_hash_lookup_ba(hdev, ACL_LINK, &cp->bdaddr); if (hci_dev_test_flag(hdev, HCI_MGMT)) hci_check_pending_name(hdev, conn, &cp->bdaddr, NULL, 0); if (!conn) goto unlock; if (!hci_outgoing_auth_needed(hdev, conn)) goto unlock; if (!test_and_set_bit(HCI_CONN_AUTH_PEND, &conn->flags)) { struct hci_cp_auth_requested auth_cp; set_bit(HCI_CONN_AUTH_INITIATOR, &conn->flags); auth_cp.handle = __cpu_to_le16(conn->handle); hci_send_cmd(hdev, HCI_OP_AUTH_REQUESTED, sizeof(auth_cp), &auth_cp); } unlock: hci_dev_unlock(hdev); } static void hci_cs_read_remote_features(struct hci_dev *hdev, __u8 status) { struct hci_cp_read_remote_features *cp; struct hci_conn *conn; bt_dev_dbg(hdev, "status 0x%2.2x", status); if (!status) return; cp = hci_sent_cmd_data(hdev, HCI_OP_READ_REMOTE_FEATURES); if (!cp) return; hci_dev_lock(hdev); conn = hci_conn_hash_lookup_handle(hdev, __le16_to_cpu(cp->handle)); if (conn) { if (conn->state == BT_CONFIG) { hci_connect_cfm(conn, status); hci_conn_drop(conn); } } hci_dev_unlock(hdev); } static void hci_cs_read_remote_ext_features(struct hci_dev *hdev, __u8 status) { struct hci_cp_read_remote_ext_features *cp; struct hci_conn *conn; bt_dev_dbg(hdev, "status 0x%2.2x", status); if (!status) return; cp = hci_sent_cmd_data(hdev, HCI_OP_READ_REMOTE_EXT_FEATURES); if (!cp) return; hci_dev_lock(hdev); conn = hci_conn_hash_lookup_handle(hdev, __le16_to_cpu(cp->handle)); if (conn) { if (conn->state == BT_CONFIG) { hci_connect_cfm(conn, status); hci_conn_drop(conn); } } hci_dev_unlock(hdev); } static void hci_setup_sync_conn_status(struct hci_dev *hdev, __u16 handle, __u8 status) { struct hci_conn *acl; struct hci_link *link; bt_dev_dbg(hdev, "handle 0x%4.4x status 0x%2.2x", handle, status); hci_dev_lock(hdev); acl = hci_conn_hash_lookup_handle(hdev, handle); if (acl) { link = list_first_entry_or_null(&acl->link_list, struct hci_link, list); if (link && link->conn) { link->conn->state = BT_CLOSED; hci_connect_cfm(link->conn, status); hci_conn_del(link->conn); } } hci_dev_unlock(hdev); } static void hci_cs_setup_sync_conn(struct hci_dev *hdev, __u8 status) { struct hci_cp_setup_sync_conn *cp; bt_dev_dbg(hdev, "status 0x%2.2x", status); if (!status) return; cp = hci_sent_cmd_data(hdev, HCI_OP_SETUP_SYNC_CONN); if (!cp) return; hci_setup_sync_conn_status(hdev, __le16_to_cpu(cp->handle), status); } static void hci_cs_enhanced_setup_sync_conn(struct hci_dev *hdev, __u8 status) { struct hci_cp_enhanced_setup_sync_conn *cp; bt_dev_dbg(hdev, "status 0x%2.2x", status); if (!status) return; cp = hci_sent_cmd_data(hdev, HCI_OP_ENHANCED_SETUP_SYNC_CONN); if (!cp) return; hci_setup_sync_conn_status(hdev, __le16_to_cpu(cp->handle), status); } static void hci_cs_sniff_mode(struct hci_dev *hdev, __u8 status) { struct hci_cp_sniff_mode *cp; struct hci_conn *conn; bt_dev_dbg(hdev, "status 0x%2.2x", status); if (!status) return; cp = hci_sent_cmd_data(hdev, HCI_OP_SNIFF_MODE); if (!cp) return; hci_dev_lock(hdev); conn = hci_conn_hash_lookup_handle(hdev, __le16_to_cpu(cp->handle)); if (conn) { clear_bit(HCI_CONN_MODE_CHANGE_PEND, &conn->flags); if (test_and_clear_bit(HCI_CONN_SCO_SETUP_PEND, &conn->flags)) hci_sco_setup(conn, status); } hci_dev_unlock(hdev); } static void hci_cs_exit_sniff_mode(struct hci_dev *hdev, __u8 status) { struct hci_cp_exit_sniff_mode *cp; struct hci_conn *conn; bt_dev_dbg(hdev, "status 0x%2.2x", status); if (!status) return; cp = hci_sent_cmd_data(hdev, HCI_OP_EXIT_SNIFF_MODE); if (!cp) return; hci_dev_lock(hdev); conn = hci_conn_hash_lookup_handle(hdev, __le16_to_cpu(cp->handle)); if (conn) { clear_bit(HCI_CONN_MODE_CHANGE_PEND, &conn->flags); if (test_and_clear_bit(HCI_CONN_SCO_SETUP_PEND, &conn->flags)) hci_sco_setup(conn, status); } hci_dev_unlock(hdev); } static void hci_cs_disconnect(struct hci_dev *hdev, u8 status) { struct hci_cp_disconnect *cp; struct hci_conn_params *params; struct hci_conn *conn; bool mgmt_conn; bt_dev_dbg(hdev, "status 0x%2.2x", status); /* Wait for HCI_EV_DISCONN_COMPLETE if status 0x00 and not suspended * otherwise cleanup the connection immediately. */ if (!status && !hdev->suspended) return; cp = hci_sent_cmd_data(hdev, HCI_OP_DISCONNECT); if (!cp) return; hci_dev_lock(hdev); conn = hci_conn_hash_lookup_handle(hdev, __le16_to_cpu(cp->handle)); if (!conn) goto unlock; if (status && status != HCI_ERROR_UNKNOWN_CONN_ID) { mgmt_disconnect_failed(hdev, &conn->dst, conn->type, conn->dst_type, status); if (conn->type == LE_LINK && conn->role == HCI_ROLE_SLAVE) { hdev->cur_adv_instance = conn->adv_instance; hci_enable_advertising(hdev); } /* Inform sockets conn is gone before we delete it */ hci_disconn_cfm(conn, HCI_ERROR_UNSPECIFIED); goto done; } /* During suspend, mark connection as closed immediately * since we might not receive HCI_EV_DISCONN_COMPLETE */ if (hdev->suspended) conn->state = BT_CLOSED; mgmt_conn = test_and_clear_bit(HCI_CONN_MGMT_CONNECTED, &conn->flags); if (conn->type == ACL_LINK) { if (test_and_clear_bit(HCI_CONN_FLUSH_KEY, &conn->flags)) hci_remove_link_key(hdev, &conn->dst); } params = hci_conn_params_lookup(hdev, &conn->dst, conn->dst_type); if (params) { switch (params->auto_connect) { case HCI_AUTO_CONN_LINK_LOSS: if (cp->reason != HCI_ERROR_CONNECTION_TIMEOUT) break; fallthrough; case HCI_AUTO_CONN_DIRECT: case HCI_AUTO_CONN_ALWAYS: hci_pend_le_list_del_init(params); hci_pend_le_list_add(params, &hdev->pend_le_conns); break; default: break; } } mgmt_device_disconnected(hdev, &conn->dst, conn->type, conn->dst_type, cp->reason, mgmt_conn); hci_disconn_cfm(conn, cp->reason); done: /* If the disconnection failed for any reason, the upper layer * does not retry to disconnect in current implementation. * Hence, we need to do some basic cleanup here and re-enable * advertising if necessary. */ hci_conn_del(conn); unlock: hci_dev_unlock(hdev); } static u8 ev_bdaddr_type(struct hci_dev *hdev, u8 type, bool *resolved) { /* When using controller based address resolution, then the new * address types 0x02 and 0x03 are used. These types need to be * converted back into either public address or random address type */ switch (type) { case ADDR_LE_DEV_PUBLIC_RESOLVED: if (resolved) *resolved = true; return ADDR_LE_DEV_PUBLIC; case ADDR_LE_DEV_RANDOM_RESOLVED: if (resolved) *resolved = true; return ADDR_LE_DEV_RANDOM; } if (resolved) *resolved = false; return type; } static void cs_le_create_conn(struct hci_dev *hdev, bdaddr_t *peer_addr, u8 peer_addr_type, u8 own_address_type, u8 filter_policy) { struct hci_conn *conn; conn = hci_conn_hash_lookup_le(hdev, peer_addr, peer_addr_type); if (!conn) return; own_address_type = ev_bdaddr_type(hdev, own_address_type, NULL); /* Store the initiator and responder address information which * is needed for SMP. These values will not change during the * lifetime of the connection. */ conn->init_addr_type = own_address_type; if (own_address_type == ADDR_LE_DEV_RANDOM) bacpy(&conn->init_addr, &hdev->random_addr); else bacpy(&conn->init_addr, &hdev->bdaddr); conn->resp_addr_type = peer_addr_type; bacpy(&conn->resp_addr, peer_addr); } static void hci_cs_le_create_conn(struct hci_dev *hdev, u8 status) { struct hci_cp_le_create_conn *cp; bt_dev_dbg(hdev, "status 0x%2.2x", status); /* All connection failure handling is taken care of by the * hci_conn_failed function which is triggered by the HCI * request completion callbacks used for connecting. */ if (status) return; cp = hci_sent_cmd_data(hdev, HCI_OP_LE_CREATE_CONN); if (!cp) return; hci_dev_lock(hdev); cs_le_create_conn(hdev, &cp->peer_addr, cp->peer_addr_type, cp->own_address_type, cp->filter_policy); hci_dev_unlock(hdev); } static void hci_cs_le_ext_create_conn(struct hci_dev *hdev, u8 status) { struct hci_cp_le_ext_create_conn *cp; bt_dev_dbg(hdev, "status 0x%2.2x", status); /* All connection failure handling is taken care of by the * hci_conn_failed function which is triggered by the HCI * request completion callbacks used for connecting. */ if (status) return; cp = hci_sent_cmd_data(hdev, HCI_OP_LE_EXT_CREATE_CONN); if (!cp) return; hci_dev_lock(hdev); cs_le_create_conn(hdev, &cp->peer_addr, cp->peer_addr_type, cp->own_addr_type, cp->filter_policy); hci_dev_unlock(hdev); } static void hci_cs_le_read_remote_features(struct hci_dev *hdev, u8 status) { struct hci_cp_le_read_remote_features *cp; struct hci_conn *conn; bt_dev_dbg(hdev, "status 0x%2.2x", status); if (!status) return; cp = hci_sent_cmd_data(hdev, HCI_OP_LE_READ_REMOTE_FEATURES); if (!cp) return; hci_dev_lock(hdev); conn = hci_conn_hash_lookup_handle(hdev, __le16_to_cpu(cp->handle)); if (conn) { if (conn->state == BT_CONFIG) { hci_connect_cfm(conn, status); hci_conn_drop(conn); } } hci_dev_unlock(hdev); } static void hci_cs_le_start_enc(struct hci_dev *hdev, u8 status) { struct hci_cp_le_start_enc *cp; struct hci_conn *conn; bt_dev_dbg(hdev, "status 0x%2.2x", status); if (!status) return; hci_dev_lock(hdev); cp = hci_sent_cmd_data(hdev, HCI_OP_LE_START_ENC); if (!cp) goto unlock; conn = hci_conn_hash_lookup_handle(hdev, __le16_to_cpu(cp->handle)); if (!conn) goto unlock; if (conn->state != BT_CONNECTED) goto unlock; hci_disconnect(conn, HCI_ERROR_AUTH_FAILURE); hci_conn_drop(conn); unlock: hci_dev_unlock(hdev); } static void hci_cs_switch_role(struct hci_dev *hdev, u8 status) { struct hci_cp_switch_role *cp; struct hci_conn *conn; BT_DBG("%s status 0x%2.2x", hdev->name, status); if (!status) return; cp = hci_sent_cmd_data(hdev, HCI_OP_SWITCH_ROLE); if (!cp) return; hci_dev_lock(hdev); conn = hci_conn_hash_lookup_ba(hdev, ACL_LINK, &cp->bdaddr); if (conn) clear_bit(HCI_CONN_RSWITCH_PEND, &conn->flags); hci_dev_unlock(hdev); } static void hci_inquiry_complete_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_status *ev = data; struct discovery_state *discov = &hdev->discovery; struct inquiry_entry *e; bt_dev_dbg(hdev, "status 0x%2.2x", ev->status); if (!test_and_clear_bit(HCI_INQUIRY, &hdev->flags)) return; smp_mb__after_atomic(); /* wake_up_bit advises about this barrier */ wake_up_bit(&hdev->flags, HCI_INQUIRY); if (!hci_dev_test_flag(hdev, HCI_MGMT)) return; hci_dev_lock(hdev); if (discov->state != DISCOVERY_FINDING) goto unlock; if (list_empty(&discov->resolve)) { /* When BR/EDR inquiry is active and no LE scanning is in * progress, then change discovery state to indicate completion. * * When running LE scanning and BR/EDR inquiry simultaneously * and the LE scan already finished, then change the discovery * state to indicate completion. */ if (!hci_dev_test_flag(hdev, HCI_LE_SCAN) || !hci_test_quirk(hdev, HCI_QUIRK_SIMULTANEOUS_DISCOVERY)) hci_discovery_set_state(hdev, DISCOVERY_STOPPED); goto unlock; } e = hci_inquiry_cache_lookup_resolve(hdev, BDADDR_ANY, NAME_NEEDED); if (e && hci_resolve_name(hdev, e) == 0) { e->name_state = NAME_PENDING; hci_discovery_set_state(hdev, DISCOVERY_RESOLVING); discov->name_resolve_timeout = jiffies + NAME_RESOLVE_DURATION; } else { /* When BR/EDR inquiry is active and no LE scanning is in * progress, then change discovery state to indicate completion. * * When running LE scanning and BR/EDR inquiry simultaneously * and the LE scan already finished, then change the discovery * state to indicate completion. */ if (!hci_dev_test_flag(hdev, HCI_LE_SCAN) || !hci_test_quirk(hdev, HCI_QUIRK_SIMULTANEOUS_DISCOVERY)) hci_discovery_set_state(hdev, DISCOVERY_STOPPED); } unlock: hci_dev_unlock(hdev); } static void hci_inquiry_result_evt(struct hci_dev *hdev, void *edata, struct sk_buff *skb) { struct hci_ev_inquiry_result *ev = edata; struct inquiry_data data; int i; if (!hci_ev_skb_pull(hdev, skb, HCI_EV_INQUIRY_RESULT, flex_array_size(ev, info, ev->num))) return; bt_dev_dbg(hdev, "num %d", ev->num); if (!ev->num) return; if (hci_dev_test_flag(hdev, HCI_PERIODIC_INQ)) return; hci_dev_lock(hdev); for (i = 0; i < ev->num; i++) { struct inquiry_info *info = &ev->info[i]; u32 flags; bacpy(&data.bdaddr, &info->bdaddr); data.pscan_rep_mode = info->pscan_rep_mode; data.pscan_period_mode = info->pscan_period_mode; data.pscan_mode = info->pscan_mode; memcpy(data.dev_class, info->dev_class, 3); data.clock_offset = info->clock_offset; data.rssi = HCI_RSSI_INVALID; data.ssp_mode = 0x00; flags = hci_inquiry_cache_update(hdev, &data, false); mgmt_device_found(hdev, &info->bdaddr, ACL_LINK, 0x00, info->dev_class, HCI_RSSI_INVALID, flags, NULL, 0, NULL, 0, 0); } hci_dev_unlock(hdev); } static int hci_read_enc_key_size(struct hci_dev *hdev, struct hci_conn *conn) { struct hci_cp_read_enc_key_size cp; u8 *key_enc_size = hci_conn_key_enc_size(conn); if (!read_key_size_capable(hdev)) { conn->enc_key_size = HCI_LINK_KEY_SIZE; return -EOPNOTSUPP; } bt_dev_dbg(hdev, "hcon %p", conn); memset(&cp, 0, sizeof(cp)); cp.handle = cpu_to_le16(conn->handle); /* If the key enc_size is already known, use it as conn->enc_key_size, * otherwise use hdev->min_enc_key_size so the likes of * l2cap_check_enc_key_size don't fail while waiting for * HCI_OP_READ_ENC_KEY_SIZE response. */ if (key_enc_size && *key_enc_size) conn->enc_key_size = *key_enc_size; else conn->enc_key_size = hdev->min_enc_key_size; return hci_send_cmd(hdev, HCI_OP_READ_ENC_KEY_SIZE, sizeof(cp), &cp); } static void hci_conn_complete_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_conn_complete *ev = data; struct hci_conn *conn; u8 status = ev->status; bt_dev_dbg(hdev, "status 0x%2.2x", status); hci_dev_lock(hdev); /* Check for existing connection: * * 1. If it doesn't exist then it must be receiver/slave role. * 2. If it does exist confirm that it is connecting/BT_CONNECT in case * of initiator/master role since there could be a collision where * either side is attempting to connect or something like a fuzzing * testing is trying to play tricks to destroy the hcon object before * it even attempts to connect (e.g. hcon->state == BT_OPEN). */ conn = hci_conn_hash_lookup_ba(hdev, ev->link_type, &ev->bdaddr); if (!conn || (conn->role == HCI_ROLE_MASTER && conn->state != BT_CONNECT)) { /* In case of error status and there is no connection pending * just unlock as there is nothing to cleanup. */ if (ev->status) goto unlock; /* Connection may not exist if auto-connected. Check the bredr * allowlist to see if this device is allowed to auto connect. * If link is an ACL type, create a connection class * automatically. * * Auto-connect will only occur if the event filter is * programmed with a given address. Right now, event filter is * only used during suspend. */ if (ev->link_type == ACL_LINK && hci_bdaddr_list_lookup_with_flags(&hdev->accept_list, &ev->bdaddr, BDADDR_BREDR)) { conn = hci_conn_add_unset(hdev, ev->link_type, &ev->bdaddr, HCI_ROLE_SLAVE); if (IS_ERR(conn)) { bt_dev_err(hdev, "connection err: %ld", PTR_ERR(conn)); goto unlock; } } else { if (ev->link_type != SCO_LINK) goto unlock; conn = hci_conn_hash_lookup_ba(hdev, ESCO_LINK, &ev->bdaddr); if (!conn) goto unlock; conn->type = SCO_LINK; } } /* The HCI_Connection_Complete event is only sent once per connection. * Processing it more than once per connection can corrupt kernel memory. * * As the connection handle is set here for the first time, it indicates * whether the connection is already set up. */ if (!HCI_CONN_HANDLE_UNSET(conn->handle)) { bt_dev_err(hdev, "Ignoring HCI_Connection_Complete for existing connection"); goto unlock; } if (!status) { status = hci_conn_set_handle(conn, __le16_to_cpu(ev->handle)); if (status) goto done; if (conn->type == ACL_LINK) { conn->state = BT_CONFIG; hci_conn_hold(conn); if (!conn->out && !hci_conn_ssp_enabled(conn) && !hci_find_link_key(hdev, &ev->bdaddr)) conn->disc_timeout = HCI_PAIRING_TIMEOUT; else conn->disc_timeout = HCI_DISCONN_TIMEOUT; } else conn->state = BT_CONNECTED; hci_debugfs_create_conn(conn); hci_conn_add_sysfs(conn); if (test_bit(HCI_AUTH, &hdev->flags)) set_bit(HCI_CONN_AUTH, &conn->flags); if (test_bit(HCI_ENCRYPT, &hdev->flags)) set_bit(HCI_CONN_ENCRYPT, &conn->flags); /* "Link key request" completed ahead of "connect request" completes */ if (ev->encr_mode == 1 && !test_bit(HCI_CONN_ENCRYPT, &conn->flags) && ev->link_type == ACL_LINK) { struct link_key *key; key = hci_find_link_key(hdev, &ev->bdaddr); if (key) { set_bit(HCI_CONN_ENCRYPT, &conn->flags); hci_read_enc_key_size(hdev, conn); hci_encrypt_cfm(conn, ev->status); } } /* Get remote features */ if (conn->type == ACL_LINK) { struct hci_cp_read_remote_features cp; cp.handle = ev->handle; hci_send_cmd(hdev, HCI_OP_READ_REMOTE_FEATURES, sizeof(cp), &cp); hci_update_scan(hdev); } /* Set packet type for incoming connection */ if (!conn->out && hdev->hci_ver < BLUETOOTH_VER_2_0) { struct hci_cp_change_conn_ptype cp; cp.handle = ev->handle; cp.pkt_type = cpu_to_le16(conn->pkt_type); hci_send_cmd(hdev, HCI_OP_CHANGE_CONN_PTYPE, sizeof(cp), &cp); } } if (conn->type == ACL_LINK) hci_sco_setup(conn, ev->status); done: if (status) { hci_conn_failed(conn, status); } else if (ev->link_type == SCO_LINK) { switch (conn->setting & SCO_AIRMODE_MASK) { case SCO_AIRMODE_CVSD: if (hdev->notify) hdev->notify(hdev, HCI_NOTIFY_ENABLE_SCO_CVSD); break; } hci_connect_cfm(conn, status); } unlock: hci_dev_unlock(hdev); } static void hci_reject_conn(struct hci_dev *hdev, bdaddr_t *bdaddr) { struct hci_cp_reject_conn_req cp; bacpy(&cp.bdaddr, bdaddr); cp.reason = HCI_ERROR_REJ_BAD_ADDR; hci_send_cmd(hdev, HCI_OP_REJECT_CONN_REQ, sizeof(cp), &cp); } static void hci_conn_request_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_conn_request *ev = data; int mask = hdev->link_mode; struct inquiry_entry *ie; struct hci_conn *conn; __u8 flags = 0; bt_dev_dbg(hdev, "bdaddr %pMR type 0x%x", &ev->bdaddr, ev->link_type); /* Reject incoming connection from device with same BD ADDR against * CVE-2020-26555 */ if (hdev && !bacmp(&hdev->bdaddr, &ev->bdaddr)) { bt_dev_dbg(hdev, "Reject connection with same BD_ADDR %pMR\n", &ev->bdaddr); hci_reject_conn(hdev, &ev->bdaddr); return; } mask |= hci_proto_connect_ind(hdev, &ev->bdaddr, ev->link_type, &flags); if (!(mask & HCI_LM_ACCEPT)) { hci_reject_conn(hdev, &ev->bdaddr); return; } hci_dev_lock(hdev); if (hci_bdaddr_list_lookup(&hdev->reject_list, &ev->bdaddr, BDADDR_BREDR)) { hci_reject_conn(hdev, &ev->bdaddr); goto unlock; } /* Require HCI_CONNECTABLE or an accept list entry to accept the * connection. These features are only touched through mgmt so * only do the checks if HCI_MGMT is set. */ if (hci_dev_test_flag(hdev, HCI_MGMT) && !hci_dev_test_flag(hdev, HCI_CONNECTABLE) && !hci_bdaddr_list_lookup_with_flags(&hdev->accept_list, &ev->bdaddr, BDADDR_BREDR)) { hci_reject_conn(hdev, &ev->bdaddr); goto unlock; } /* Connection accepted */ ie = hci_inquiry_cache_lookup(hdev, &ev->bdaddr); if (ie) memcpy(ie->data.dev_class, ev->dev_class, 3); conn = hci_conn_hash_lookup_ba(hdev, ev->link_type, &ev->bdaddr); if (!conn) { conn = hci_conn_add_unset(hdev, ev->link_type, &ev->bdaddr, HCI_ROLE_SLAVE); if (IS_ERR(conn)) { bt_dev_err(hdev, "connection err: %ld", PTR_ERR(conn)); goto unlock; } } memcpy(conn->dev_class, ev->dev_class, 3); hci_dev_unlock(hdev); if (ev->link_type == ACL_LINK || (!(flags & HCI_PROTO_DEFER) && !lmp_esco_capable(hdev))) { struct hci_cp_accept_conn_req cp; conn->state = BT_CONNECT; bacpy(&cp.bdaddr, &ev->bdaddr); if (lmp_rswitch_capable(hdev) && (mask & HCI_LM_MASTER)) cp.role = 0x00; /* Become central */ else cp.role = 0x01; /* Remain peripheral */ hci_send_cmd(hdev, HCI_OP_ACCEPT_CONN_REQ, sizeof(cp), &cp); } else if (!(flags & HCI_PROTO_DEFER)) { struct hci_cp_accept_sync_conn_req cp; conn->state = BT_CONNECT; bacpy(&cp.bdaddr, &ev->bdaddr); cp.pkt_type = cpu_to_le16(conn->pkt_type); cp.tx_bandwidth = cpu_to_le32(0x00001f40); cp.rx_bandwidth = cpu_to_le32(0x00001f40); cp.max_latency = cpu_to_le16(0xffff); cp.content_format = cpu_to_le16(hdev->voice_setting); cp.retrans_effort = 0xff; hci_send_cmd(hdev, HCI_OP_ACCEPT_SYNC_CONN_REQ, sizeof(cp), &cp); } else { conn->state = BT_CONNECT2; hci_connect_cfm(conn, 0); } return; unlock: hci_dev_unlock(hdev); } static u8 hci_to_mgmt_reason(u8 err) { switch (err) { case HCI_ERROR_CONNECTION_TIMEOUT: return MGMT_DEV_DISCONN_TIMEOUT; case HCI_ERROR_REMOTE_USER_TERM: case HCI_ERROR_REMOTE_LOW_RESOURCES: case HCI_ERROR_REMOTE_POWER_OFF: return MGMT_DEV_DISCONN_REMOTE; case HCI_ERROR_LOCAL_HOST_TERM: return MGMT_DEV_DISCONN_LOCAL_HOST; default: return MGMT_DEV_DISCONN_UNKNOWN; } } static void hci_disconn_complete_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_disconn_complete *ev = data; u8 reason; struct hci_conn_params *params; struct hci_conn *conn; bool mgmt_connected; bt_dev_dbg(hdev, "status 0x%2.2x", ev->status); hci_dev_lock(hdev); conn = hci_conn_hash_lookup_handle(hdev, __le16_to_cpu(ev->handle)); if (!conn) goto unlock; if (ev->status) { mgmt_disconnect_failed(hdev, &conn->dst, conn->type, conn->dst_type, ev->status); goto unlock; } conn->state = BT_CLOSED; mgmt_connected = test_and_clear_bit(HCI_CONN_MGMT_CONNECTED, &conn->flags); if (test_bit(HCI_CONN_AUTH_FAILURE, &conn->flags)) reason = MGMT_DEV_DISCONN_AUTH_FAILURE; else reason = hci_to_mgmt_reason(ev->reason); mgmt_device_disconnected(hdev, &conn->dst, conn->type, conn->dst_type, reason, mgmt_connected); if (conn->type == ACL_LINK) { if (test_and_clear_bit(HCI_CONN_FLUSH_KEY, &conn->flags)) hci_remove_link_key(hdev, &conn->dst); hci_update_scan(hdev); } /* Re-enable passive scanning if disconnected device is marked * as auto-connectable. */ if (conn->type == LE_LINK) { params = hci_conn_params_lookup(hdev, &conn->dst, conn->dst_type); if (params) { switch (params->auto_connect) { case HCI_AUTO_CONN_LINK_LOSS: if (ev->reason != HCI_ERROR_CONNECTION_TIMEOUT) break; fallthrough; case HCI_AUTO_CONN_DIRECT: case HCI_AUTO_CONN_ALWAYS: hci_pend_le_list_del_init(params); hci_pend_le_list_add(params, &hdev->pend_le_conns); hci_update_passive_scan(hdev); break; default: break; } } } hci_disconn_cfm(conn, ev->reason); /* Re-enable advertising if necessary, since it might * have been disabled by the connection. From the * HCI_LE_Set_Advertise_Enable command description in * the core specification (v4.0): * "The Controller shall continue advertising until the Host * issues an LE_Set_Advertise_Enable command with * Advertising_Enable set to 0x00 (Advertising is disabled) * or until a connection is created or until the Advertising * is timed out due to Directed Advertising." */ if (conn->type == LE_LINK && conn->role == HCI_ROLE_SLAVE) { hdev->cur_adv_instance = conn->adv_instance; hci_enable_advertising(hdev); } hci_conn_del(conn); unlock: hci_dev_unlock(hdev); } static void hci_auth_complete_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_auth_complete *ev = data; struct hci_conn *conn; bt_dev_dbg(hdev, "status 0x%2.2x", ev->status); hci_dev_lock(hdev); conn = hci_conn_hash_lookup_handle(hdev, __le16_to_cpu(ev->handle)); if (!conn) goto unlock; if (!ev->status) { clear_bit(HCI_CONN_AUTH_FAILURE, &conn->flags); set_bit(HCI_CONN_AUTH, &conn->flags); conn->sec_level = conn->pending_sec_level; } else { if (ev->status == HCI_ERROR_PIN_OR_KEY_MISSING) set_bit(HCI_CONN_AUTH_FAILURE, &conn->flags); mgmt_auth_failed(conn, ev->status); } clear_bit(HCI_CONN_AUTH_PEND, &conn->flags); if (conn->state == BT_CONFIG) { if (!ev->status && hci_conn_ssp_enabled(conn)) { struct hci_cp_set_conn_encrypt cp; cp.handle = ev->handle; cp.encrypt = 0x01; hci_send_cmd(hdev, HCI_OP_SET_CONN_ENCRYPT, sizeof(cp), &cp); } else { conn->state = BT_CONNECTED; hci_connect_cfm(conn, ev->status); hci_conn_drop(conn); } } else { hci_auth_cfm(conn, ev->status); hci_conn_hold(conn); conn->disc_timeout = HCI_DISCONN_TIMEOUT; hci_conn_drop(conn); } if (test_bit(HCI_CONN_ENCRYPT_PEND, &conn->flags)) { if (!ev->status) { struct hci_cp_set_conn_encrypt cp; cp.handle = ev->handle; cp.encrypt = 0x01; hci_send_cmd(hdev, HCI_OP_SET_CONN_ENCRYPT, sizeof(cp), &cp); } else { clear_bit(HCI_CONN_ENCRYPT_PEND, &conn->flags); hci_encrypt_cfm(conn, ev->status); } } unlock: hci_dev_unlock(hdev); } static void hci_remote_name_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_remote_name *ev = data; struct hci_conn *conn; bt_dev_dbg(hdev, "status 0x%2.2x", ev->status); hci_dev_lock(hdev); conn = hci_conn_hash_lookup_ba(hdev, ACL_LINK, &ev->bdaddr); if (!hci_dev_test_flag(hdev, HCI_MGMT)) goto check_auth; if (ev->status == 0) hci_check_pending_name(hdev, conn, &ev->bdaddr, ev->name, strnlen(ev->name, HCI_MAX_NAME_LENGTH)); else hci_check_pending_name(hdev, conn, &ev->bdaddr, NULL, 0); check_auth: if (!conn) goto unlock; if (!hci_outgoing_auth_needed(hdev, conn)) goto unlock; if (!test_and_set_bit(HCI_CONN_AUTH_PEND, &conn->flags)) { struct hci_cp_auth_requested cp; set_bit(HCI_CONN_AUTH_INITIATOR, &conn->flags); cp.handle = __cpu_to_le16(conn->handle); hci_send_cmd(hdev, HCI_OP_AUTH_REQUESTED, sizeof(cp), &cp); } unlock: hci_dev_unlock(hdev); } static void hci_encrypt_change_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_encrypt_change *ev = data; struct hci_conn *conn; bt_dev_dbg(hdev, "status 0x%2.2x", ev->status); hci_dev_lock(hdev); conn = hci_conn_hash_lookup_handle(hdev, __le16_to_cpu(ev->handle)); if (!conn) goto unlock; if (!ev->status) { if (ev->encrypt) { /* Encryption implies authentication */ set_bit(HCI_CONN_AUTH, &conn->flags); set_bit(HCI_CONN_ENCRYPT, &conn->flags); conn->sec_level = conn->pending_sec_level; /* P-256 authentication key implies FIPS */ if (conn->key_type == HCI_LK_AUTH_COMBINATION_P256) set_bit(HCI_CONN_FIPS, &conn->flags); if ((conn->type == ACL_LINK && ev->encrypt == 0x02) || conn->type == LE_LINK) set_bit(HCI_CONN_AES_CCM, &conn->flags); } else { clear_bit(HCI_CONN_ENCRYPT, &conn->flags); clear_bit(HCI_CONN_AES_CCM, &conn->flags); } } /* We should disregard the current RPA and generate a new one * whenever the encryption procedure fails. */ if (ev->status && conn->type == LE_LINK) { hci_dev_set_flag(hdev, HCI_RPA_EXPIRED); hci_adv_instances_set_rpa_expired(hdev, true); } clear_bit(HCI_CONN_ENCRYPT_PEND, &conn->flags); /* Check link security requirements are met */ if (!hci_conn_check_link_mode(conn)) ev->status = HCI_ERROR_AUTH_FAILURE; if (ev->status && conn->state == BT_CONNECTED) { if (ev->status == HCI_ERROR_PIN_OR_KEY_MISSING) set_bit(HCI_CONN_AUTH_FAILURE, &conn->flags); /* Notify upper layers so they can cleanup before * disconnecting. */ hci_encrypt_cfm(conn, ev->status); hci_disconnect(conn, HCI_ERROR_AUTH_FAILURE); hci_conn_drop(conn); goto unlock; } /* Try reading the encryption key size for encrypted ACL links */ if (!ev->status && ev->encrypt && conn->type == ACL_LINK) { if (hci_read_enc_key_size(hdev, conn)) goto notify; goto unlock; } /* We skip the WRITE_AUTH_PAYLOAD_TIMEOUT for ATS2851 based controllers * to avoid unexpected SMP command errors when pairing. */ if (hci_test_quirk(hdev, HCI_QUIRK_BROKEN_WRITE_AUTH_PAYLOAD_TIMEOUT)) goto notify; /* Set the default Authenticated Payload Timeout after * an LE Link is established. As per Core Spec v5.0, Vol 2, Part B * Section 3.3, the HCI command WRITE_AUTH_PAYLOAD_TIMEOUT should be * sent when the link is active and Encryption is enabled, the conn * type can be either LE or ACL and controller must support LMP Ping. * Ensure for AES-CCM encryption as well. */ if (test_bit(HCI_CONN_ENCRYPT, &conn->flags) && test_bit(HCI_CONN_AES_CCM, &conn->flags) && ((conn->type == ACL_LINK && lmp_ping_capable(hdev)) || (conn->type == LE_LINK && (hdev->le_features[0] & HCI_LE_PING)))) { struct hci_cp_write_auth_payload_to cp; cp.handle = cpu_to_le16(conn->handle); cp.timeout = cpu_to_le16(hdev->auth_payload_timeout); if (hci_send_cmd(conn->hdev, HCI_OP_WRITE_AUTH_PAYLOAD_TO, sizeof(cp), &cp)) bt_dev_err(hdev, "write auth payload timeout failed"); } notify: hci_encrypt_cfm(conn, ev->status); unlock: hci_dev_unlock(hdev); } static void hci_change_link_key_complete_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_change_link_key_complete *ev = data; struct hci_conn *conn; bt_dev_dbg(hdev, "status 0x%2.2x", ev->status); hci_dev_lock(hdev); conn = hci_conn_hash_lookup_handle(hdev, __le16_to_cpu(ev->handle)); if (conn) { if (!ev->status) set_bit(HCI_CONN_SECURE, &conn->flags); clear_bit(HCI_CONN_AUTH_PEND, &conn->flags); hci_key_change_cfm(conn, ev->status); } hci_dev_unlock(hdev); } static void hci_remote_features_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_remote_features *ev = data; struct hci_conn *conn; bt_dev_dbg(hdev, "status 0x%2.2x", ev->status); hci_dev_lock(hdev); conn = hci_conn_hash_lookup_handle(hdev, __le16_to_cpu(ev->handle)); if (!conn) goto unlock; if (!ev->status) memcpy(conn->features[0], ev->features, 8); if (conn->state != BT_CONFIG) goto unlock; if (!ev->status && lmp_ext_feat_capable(hdev) && lmp_ext_feat_capable(conn)) { struct hci_cp_read_remote_ext_features cp; cp.handle = ev->handle; cp.page = 0x01; hci_send_cmd(hdev, HCI_OP_READ_REMOTE_EXT_FEATURES, sizeof(cp), &cp); goto unlock; } if (!ev->status) { struct hci_cp_remote_name_req cp; memset(&cp, 0, sizeof(cp)); bacpy(&cp.bdaddr, &conn->dst); cp.pscan_rep_mode = 0x02; hci_send_cmd(hdev, HCI_OP_REMOTE_NAME_REQ, sizeof(cp), &cp); } else { mgmt_device_connected(hdev, conn, NULL, 0); } if (!hci_outgoing_auth_needed(hdev, conn)) { conn->state = BT_CONNECTED; hci_connect_cfm(conn, ev->status); hci_conn_drop(conn); } unlock: hci_dev_unlock(hdev); } static inline void handle_cmd_cnt_and_timer(struct hci_dev *hdev, u8 ncmd) { cancel_delayed_work(&hdev->cmd_timer); rcu_read_lock(); if (!test_bit(HCI_RESET, &hdev->flags)) { if (ncmd) { cancel_delayed_work(&hdev->ncmd_timer); atomic_set(&hdev->cmd_cnt, 1); } else { if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) queue_delayed_work(hdev->workqueue, &hdev->ncmd_timer, HCI_NCMD_TIMEOUT); } } rcu_read_unlock(); } static u8 hci_cc_le_read_buffer_size_v2(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_le_read_buffer_size_v2 *rp = data; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; hdev->le_mtu = __le16_to_cpu(rp->acl_mtu); hdev->le_pkts = rp->acl_max_pkt; hdev->iso_mtu = __le16_to_cpu(rp->iso_mtu); hdev->iso_pkts = rp->iso_max_pkt; hdev->le_cnt = hdev->le_pkts; hdev->iso_cnt = hdev->iso_pkts; BT_DBG("%s acl mtu %d:%d iso mtu %d:%d", hdev->name, hdev->acl_mtu, hdev->acl_pkts, hdev->iso_mtu, hdev->iso_pkts); if (hdev->le_mtu && hdev->le_mtu < HCI_MIN_LE_MTU) return HCI_ERROR_INVALID_PARAMETERS; return rp->status; } static void hci_unbound_cis_failed(struct hci_dev *hdev, u8 cig, u8 status) { struct hci_conn *conn, *tmp; lockdep_assert_held(&hdev->lock); list_for_each_entry_safe(conn, tmp, &hdev->conn_hash.list, list) { if (conn->type != CIS_LINK || conn->state == BT_OPEN || conn->iso_qos.ucast.cig != cig) continue; if (HCI_CONN_HANDLE_UNSET(conn->handle)) hci_conn_failed(conn, status); } } static u8 hci_cc_le_set_cig_params(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_le_set_cig_params *rp = data; struct hci_cp_le_set_cig_params *cp; struct hci_conn *conn; u8 status = rp->status; bool pending = false; int i; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); cp = hci_sent_cmd_data(hdev, HCI_OP_LE_SET_CIG_PARAMS); if (!rp->status && (!cp || rp->num_handles != cp->num_cis || rp->cig_id != cp->cig_id)) { bt_dev_err(hdev, "unexpected Set CIG Parameters response data"); status = HCI_ERROR_UNSPECIFIED; } hci_dev_lock(hdev); /* BLUETOOTH CORE SPECIFICATION Version 5.4 | Vol 4, Part E page 2554 * * If the Status return parameter is non-zero, then the state of the CIG * and its CIS configurations shall not be changed by the command. If * the CIG did not already exist, it shall not be created. */ if (status) { /* Keep current configuration, fail only the unbound CIS */ hci_unbound_cis_failed(hdev, rp->cig_id, status); goto unlock; } /* BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 4, Part E page 2553 * * If the Status return parameter is zero, then the Controller shall * set the Connection_Handle arrayed return parameter to the connection * handle(s) corresponding to the CIS configurations specified in * the CIS_IDs command parameter, in the same order. */ for (i = 0; i < rp->num_handles; ++i) { conn = hci_conn_hash_lookup_cis(hdev, NULL, 0, rp->cig_id, cp->cis[i].cis_id); if (!conn || !bacmp(&conn->dst, BDADDR_ANY)) continue; if (conn->state != BT_BOUND && conn->state != BT_CONNECT) continue; if (hci_conn_set_handle(conn, __le16_to_cpu(rp->handle[i]))) continue; if (conn->state == BT_CONNECT) pending = true; } unlock: if (pending) hci_le_create_cis_pending(hdev); hci_dev_unlock(hdev); return rp->status; } static u8 hci_cc_le_setup_iso_path(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_rp_le_setup_iso_path *rp = data; struct hci_cp_le_setup_iso_path *cp; struct hci_conn *conn; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); cp = hci_sent_cmd_data(hdev, HCI_OP_LE_SETUP_ISO_PATH); if (!cp) return rp->status; hci_dev_lock(hdev); conn = hci_conn_hash_lookup_handle(hdev, __le16_to_cpu(cp->handle)); if (!conn) goto unlock; if (rp->status) { hci_connect_cfm(conn, rp->status); hci_conn_del(conn); goto unlock; } switch (cp->direction) { /* Input (Host to Controller) */ case 0x00: /* Only confirm connection if output only */ if (conn->iso_qos.ucast.out.sdu && !conn->iso_qos.ucast.in.sdu) hci_connect_cfm(conn, rp->status); break; /* Output (Controller to Host) */ case 0x01: /* Confirm connection since conn->iso_qos is always configured * last. */ hci_connect_cfm(conn, rp->status); /* Notify device connected in case it is a BIG Sync */ if (!rp->status && test_bit(HCI_CONN_BIG_SYNC, &conn->flags)) mgmt_device_connected(hdev, conn, NULL, 0); break; } unlock: hci_dev_unlock(hdev); return rp->status; } static void hci_cs_le_create_big(struct hci_dev *hdev, u8 status) { bt_dev_dbg(hdev, "status 0x%2.2x", status); } static u8 hci_cc_set_per_adv_param(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_status *rp = data; struct hci_cp_le_set_per_adv_params *cp; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; cp = hci_sent_cmd_data(hdev, HCI_OP_LE_SET_PER_ADV_PARAMS); if (!cp) return rp->status; /* TODO: set the conn state */ return rp->status; } static u8 hci_cc_le_set_per_adv_enable(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_status *rp = data; struct hci_cp_le_set_per_adv_enable *cp; struct adv_info *adv = NULL, *n; u8 per_adv_cnt = 0; bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); if (rp->status) return rp->status; cp = hci_sent_cmd_data(hdev, HCI_OP_LE_SET_PER_ADV_ENABLE); if (!cp) return rp->status; hci_dev_lock(hdev); adv = hci_find_adv_instance(hdev, cp->handle); if (cp->enable) { hci_dev_set_flag(hdev, HCI_LE_PER_ADV); if (adv) adv->periodic_enabled = true; } else { if (adv) adv->periodic_enabled = false; /* If just one instance was disabled check if there are * any other instance enabled before clearing HCI_LE_PER_ADV. * The current periodic adv instance will be marked as * disabled once extended advertising is also disabled. */ list_for_each_entry_safe(adv, n, &hdev->adv_instances, list) { if (adv->periodic && adv->enabled) per_adv_cnt++; } if (per_adv_cnt > 1) goto unlock; hci_dev_clear_flag(hdev, HCI_LE_PER_ADV); } unlock: hci_dev_unlock(hdev); return rp->status; } #define HCI_CC_VL(_op, _func, _min, _max) \ { \ .op = _op, \ .func = _func, \ .min_len = _min, \ .max_len = _max, \ } #define HCI_CC(_op, _func, _len) \ HCI_CC_VL(_op, _func, _len, _len) #define HCI_CC_STATUS(_op, _func) \ HCI_CC(_op, _func, sizeof(struct hci_ev_status)) static const struct hci_cc { u16 op; u8 (*func)(struct hci_dev *hdev, void *data, struct sk_buff *skb); u16 min_len; u16 max_len; } hci_cc_table[] = { HCI_CC_STATUS(HCI_OP_INQUIRY_CANCEL, hci_cc_inquiry_cancel), HCI_CC_STATUS(HCI_OP_PERIODIC_INQ, hci_cc_periodic_inq), HCI_CC_STATUS(HCI_OP_EXIT_PERIODIC_INQ, hci_cc_exit_periodic_inq), HCI_CC(HCI_OP_REMOTE_NAME_REQ_CANCEL, hci_cc_remote_name_req_cancel, sizeof(struct hci_rp_remote_name_req_cancel)), HCI_CC(HCI_OP_ROLE_DISCOVERY, hci_cc_role_discovery, sizeof(struct hci_rp_role_discovery)), HCI_CC(HCI_OP_READ_LINK_POLICY, hci_cc_read_link_policy, sizeof(struct hci_rp_read_link_policy)), HCI_CC(HCI_OP_WRITE_LINK_POLICY, hci_cc_write_link_policy, sizeof(struct hci_rp_write_link_policy)), HCI_CC(HCI_OP_READ_DEF_LINK_POLICY, hci_cc_read_def_link_policy, sizeof(struct hci_rp_read_def_link_policy)), HCI_CC_STATUS(HCI_OP_WRITE_DEF_LINK_POLICY, hci_cc_write_def_link_policy), HCI_CC_STATUS(HCI_OP_RESET, hci_cc_reset), HCI_CC(HCI_OP_READ_STORED_LINK_KEY, hci_cc_read_stored_link_key, sizeof(struct hci_rp_read_stored_link_key)), HCI_CC(HCI_OP_DELETE_STORED_LINK_KEY, hci_cc_delete_stored_link_key, sizeof(struct hci_rp_delete_stored_link_key)), HCI_CC_STATUS(HCI_OP_WRITE_LOCAL_NAME, hci_cc_write_local_name), HCI_CC(HCI_OP_READ_LOCAL_NAME, hci_cc_read_local_name, sizeof(struct hci_rp_read_local_name)), HCI_CC_STATUS(HCI_OP_WRITE_AUTH_ENABLE, hci_cc_write_auth_enable), HCI_CC_STATUS(HCI_OP_WRITE_ENCRYPT_MODE, hci_cc_write_encrypt_mode), HCI_CC_STATUS(HCI_OP_WRITE_SCAN_ENABLE, hci_cc_write_scan_enable), HCI_CC_STATUS(HCI_OP_SET_EVENT_FLT, hci_cc_set_event_filter), HCI_CC(HCI_OP_READ_CLASS_OF_DEV, hci_cc_read_class_of_dev, sizeof(struct hci_rp_read_class_of_dev)), HCI_CC_STATUS(HCI_OP_WRITE_CLASS_OF_DEV, hci_cc_write_class_of_dev), HCI_CC(HCI_OP_READ_VOICE_SETTING, hci_cc_read_voice_setting, sizeof(struct hci_rp_read_voice_setting)), HCI_CC_STATUS(HCI_OP_WRITE_VOICE_SETTING, hci_cc_write_voice_setting), HCI_CC(HCI_OP_READ_NUM_SUPPORTED_IAC, hci_cc_read_num_supported_iac, sizeof(struct hci_rp_read_num_supported_iac)), HCI_CC_STATUS(HCI_OP_WRITE_SSP_MODE, hci_cc_write_ssp_mode), HCI_CC_STATUS(HCI_OP_WRITE_SC_SUPPORT, hci_cc_write_sc_support), HCI_CC(HCI_OP_READ_AUTH_PAYLOAD_TO, hci_cc_read_auth_payload_timeout, sizeof(struct hci_rp_read_auth_payload_to)), HCI_CC(HCI_OP_WRITE_AUTH_PAYLOAD_TO, hci_cc_write_auth_payload_timeout, sizeof(struct hci_rp_write_auth_payload_to)), HCI_CC(HCI_OP_READ_LOCAL_VERSION, hci_cc_read_local_version, sizeof(struct hci_rp_read_local_version)), HCI_CC(HCI_OP_READ_LOCAL_COMMANDS, hci_cc_read_local_commands, sizeof(struct hci_rp_read_local_commands)), HCI_CC(HCI_OP_READ_LOCAL_FEATURES, hci_cc_read_local_features, sizeof(struct hci_rp_read_local_features)), HCI_CC(HCI_OP_READ_LOCAL_EXT_FEATURES, hci_cc_read_local_ext_features, sizeof(struct hci_rp_read_local_ext_features)), HCI_CC(HCI_OP_READ_BUFFER_SIZE, hci_cc_read_buffer_size, sizeof(struct hci_rp_read_buffer_size)), HCI_CC(HCI_OP_READ_BD_ADDR, hci_cc_read_bd_addr, sizeof(struct hci_rp_read_bd_addr)), HCI_CC(HCI_OP_READ_LOCAL_PAIRING_OPTS, hci_cc_read_local_pairing_opts, sizeof(struct hci_rp_read_local_pairing_opts)), HCI_CC(HCI_OP_READ_PAGE_SCAN_ACTIVITY, hci_cc_read_page_scan_activity, sizeof(struct hci_rp_read_page_scan_activity)), HCI_CC_STATUS(HCI_OP_WRITE_PAGE_SCAN_ACTIVITY, hci_cc_write_page_scan_activity), HCI_CC(HCI_OP_READ_PAGE_SCAN_TYPE, hci_cc_read_page_scan_type, sizeof(struct hci_rp_read_page_scan_type)), HCI_CC_STATUS(HCI_OP_WRITE_PAGE_SCAN_TYPE, hci_cc_write_page_scan_type), HCI_CC(HCI_OP_READ_CLOCK, hci_cc_read_clock, sizeof(struct hci_rp_read_clock)), HCI_CC(HCI_OP_READ_ENC_KEY_SIZE, hci_cc_read_enc_key_size, sizeof(struct hci_rp_read_enc_key_size)), HCI_CC(HCI_OP_READ_INQ_RSP_TX_POWER, hci_cc_read_inq_rsp_tx_power, sizeof(struct hci_rp_read_inq_rsp_tx_power)), HCI_CC(HCI_OP_READ_DEF_ERR_DATA_REPORTING, hci_cc_read_def_err_data_reporting, sizeof(struct hci_rp_read_def_err_data_reporting)), HCI_CC_STATUS(HCI_OP_WRITE_DEF_ERR_DATA_REPORTING, hci_cc_write_def_err_data_reporting), HCI_CC(HCI_OP_PIN_CODE_REPLY, hci_cc_pin_code_reply, sizeof(struct hci_rp_pin_code_reply)), HCI_CC(HCI_OP_PIN_CODE_NEG_REPLY, hci_cc_pin_code_neg_reply, sizeof(struct hci_rp_pin_code_neg_reply)), HCI_CC(HCI_OP_READ_LOCAL_OOB_DATA, hci_cc_read_local_oob_data, sizeof(struct hci_rp_read_local_oob_data)), HCI_CC(HCI_OP_READ_LOCAL_OOB_EXT_DATA, hci_cc_read_local_oob_ext_data, sizeof(struct hci_rp_read_local_oob_ext_data)), HCI_CC(HCI_OP_LE_READ_BUFFER_SIZE, hci_cc_le_read_buffer_size, sizeof(struct hci_rp_le_read_buffer_size)), HCI_CC(HCI_OP_LE_READ_LOCAL_FEATURES, hci_cc_le_read_local_features, sizeof(struct hci_rp_le_read_local_features)), HCI_CC(HCI_OP_LE_READ_ADV_TX_POWER, hci_cc_le_read_adv_tx_power, sizeof(struct hci_rp_le_read_adv_tx_power)), HCI_CC(HCI_OP_USER_CONFIRM_REPLY, hci_cc_user_confirm_reply, sizeof(struct hci_rp_user_confirm_reply)), HCI_CC(HCI_OP_USER_CONFIRM_NEG_REPLY, hci_cc_user_confirm_neg_reply, sizeof(struct hci_rp_user_confirm_reply)), HCI_CC(HCI_OP_USER_PASSKEY_REPLY, hci_cc_user_passkey_reply, sizeof(struct hci_rp_user_confirm_reply)), HCI_CC(HCI_OP_USER_PASSKEY_NEG_REPLY, hci_cc_user_passkey_neg_reply, sizeof(struct hci_rp_user_confirm_reply)), HCI_CC_STATUS(HCI_OP_LE_SET_RANDOM_ADDR, hci_cc_le_set_random_addr), HCI_CC_STATUS(HCI_OP_LE_SET_ADV_ENABLE, hci_cc_le_set_adv_enable), HCI_CC_STATUS(HCI_OP_LE_SET_SCAN_PARAM, hci_cc_le_set_scan_param), HCI_CC_STATUS(HCI_OP_LE_SET_SCAN_ENABLE, hci_cc_le_set_scan_enable), HCI_CC(HCI_OP_LE_READ_ACCEPT_LIST_SIZE, hci_cc_le_read_accept_list_size, sizeof(struct hci_rp_le_read_accept_list_size)), HCI_CC_STATUS(HCI_OP_LE_CLEAR_ACCEPT_LIST, hci_cc_le_clear_accept_list), HCI_CC_STATUS(HCI_OP_LE_ADD_TO_ACCEPT_LIST, hci_cc_le_add_to_accept_list), HCI_CC_STATUS(HCI_OP_LE_DEL_FROM_ACCEPT_LIST, hci_cc_le_del_from_accept_list), HCI_CC(HCI_OP_LE_READ_SUPPORTED_STATES, hci_cc_le_read_supported_states, sizeof(struct hci_rp_le_read_supported_states)), HCI_CC(HCI_OP_LE_READ_DEF_DATA_LEN, hci_cc_le_read_def_data_len, sizeof(struct hci_rp_le_read_def_data_len)), HCI_CC_STATUS(HCI_OP_LE_WRITE_DEF_DATA_LEN, hci_cc_le_write_def_data_len), HCI_CC_STATUS(HCI_OP_LE_ADD_TO_RESOLV_LIST, hci_cc_le_add_to_resolv_list), HCI_CC_STATUS(HCI_OP_LE_DEL_FROM_RESOLV_LIST, hci_cc_le_del_from_resolv_list), HCI_CC_STATUS(HCI_OP_LE_CLEAR_RESOLV_LIST, hci_cc_le_clear_resolv_list), HCI_CC(HCI_OP_LE_READ_RESOLV_LIST_SIZE, hci_cc_le_read_resolv_list_size, sizeof(struct hci_rp_le_read_resolv_list_size)), HCI_CC_STATUS(HCI_OP_LE_SET_ADDR_RESOLV_ENABLE, hci_cc_le_set_addr_resolution_enable), HCI_CC(HCI_OP_LE_READ_MAX_DATA_LEN, hci_cc_le_read_max_data_len, sizeof(struct hci_rp_le_read_max_data_len)), HCI_CC_STATUS(HCI_OP_WRITE_LE_HOST_SUPPORTED, hci_cc_write_le_host_supported), HCI_CC_STATUS(HCI_OP_LE_SET_ADV_PARAM, hci_cc_set_adv_param), HCI_CC(HCI_OP_READ_RSSI, hci_cc_read_rssi, sizeof(struct hci_rp_read_rssi)), HCI_CC(HCI_OP_READ_TX_POWER, hci_cc_read_tx_power, sizeof(struct hci_rp_read_tx_power)), HCI_CC_STATUS(HCI_OP_WRITE_SSP_DEBUG_MODE, hci_cc_write_ssp_debug_mode), HCI_CC_STATUS(HCI_OP_LE_SET_EXT_SCAN_PARAMS, hci_cc_le_set_ext_scan_param), HCI_CC_STATUS(HCI_OP_LE_SET_EXT_SCAN_ENABLE, hci_cc_le_set_ext_scan_enable), HCI_CC_STATUS(HCI_OP_LE_SET_DEFAULT_PHY, hci_cc_le_set_default_phy), HCI_CC(HCI_OP_LE_READ_NUM_SUPPORTED_ADV_SETS, hci_cc_le_read_num_adv_sets, sizeof(struct hci_rp_le_read_num_supported_adv_sets)), HCI_CC_STATUS(HCI_OP_LE_SET_EXT_ADV_ENABLE, hci_cc_le_set_ext_adv_enable), HCI_CC_STATUS(HCI_OP_LE_SET_ADV_SET_RAND_ADDR, hci_cc_le_set_adv_set_random_addr), HCI_CC_STATUS(HCI_OP_LE_REMOVE_ADV_SET, hci_cc_le_remove_adv_set), HCI_CC_STATUS(HCI_OP_LE_CLEAR_ADV_SETS, hci_cc_le_clear_adv_sets), HCI_CC_STATUS(HCI_OP_LE_SET_PER_ADV_PARAMS, hci_cc_set_per_adv_param), HCI_CC_STATUS(HCI_OP_LE_SET_PER_ADV_ENABLE, hci_cc_le_set_per_adv_enable), HCI_CC(HCI_OP_LE_READ_TRANSMIT_POWER, hci_cc_le_read_transmit_power, sizeof(struct hci_rp_le_read_transmit_power)), HCI_CC_STATUS(HCI_OP_LE_SET_PRIVACY_MODE, hci_cc_le_set_privacy_mode), HCI_CC(HCI_OP_LE_READ_BUFFER_SIZE_V2, hci_cc_le_read_buffer_size_v2, sizeof(struct hci_rp_le_read_buffer_size_v2)), HCI_CC_VL(HCI_OP_LE_SET_CIG_PARAMS, hci_cc_le_set_cig_params, sizeof(struct hci_rp_le_set_cig_params), HCI_MAX_EVENT_SIZE), HCI_CC(HCI_OP_LE_SETUP_ISO_PATH, hci_cc_le_setup_iso_path, sizeof(struct hci_rp_le_setup_iso_path)), }; static u8 hci_cc_func(struct hci_dev *hdev, const struct hci_cc *cc, struct sk_buff *skb) { void *data; if (skb->len < cc->min_len) { bt_dev_err(hdev, "unexpected cc 0x%4.4x length: %u < %u", cc->op, skb->len, cc->min_len); return HCI_ERROR_UNSPECIFIED; } /* Just warn if the length is over max_len size it still be possible to * partially parse the cc so leave to callback to decide if that is * acceptable. */ if (skb->len > cc->max_len) bt_dev_warn(hdev, "unexpected cc 0x%4.4x length: %u > %u", cc->op, skb->len, cc->max_len); data = hci_cc_skb_pull(hdev, skb, cc->op, cc->min_len); if (!data) return HCI_ERROR_UNSPECIFIED; return cc->func(hdev, data, skb); } static void hci_cmd_complete_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb, u16 *opcode, u8 *status, hci_req_complete_t *req_complete, hci_req_complete_skb_t *req_complete_skb) { struct hci_ev_cmd_complete *ev = data; int i; *opcode = __le16_to_cpu(ev->opcode); bt_dev_dbg(hdev, "opcode 0x%4.4x", *opcode); for (i = 0; i < ARRAY_SIZE(hci_cc_table); i++) { if (hci_cc_table[i].op == *opcode) { *status = hci_cc_func(hdev, &hci_cc_table[i], skb); break; } } if (i == ARRAY_SIZE(hci_cc_table)) { /* Unknown opcode, assume byte 0 contains the status, so * that e.g. __hci_cmd_sync() properly returns errors * for vendor specific commands send by HCI drivers. * If a vendor doesn't actually follow this convention we may * need to introduce a vendor CC table in order to properly set * the status. */ *status = skb->data[0]; } handle_cmd_cnt_and_timer(hdev, ev->ncmd); hci_req_cmd_complete(hdev, *opcode, *status, req_complete, req_complete_skb); if (hci_dev_test_flag(hdev, HCI_CMD_PENDING)) { bt_dev_err(hdev, "unexpected event for opcode 0x%4.4x", *opcode); return; } if (atomic_read(&hdev->cmd_cnt) && !skb_queue_empty(&hdev->cmd_q)) queue_work(hdev->workqueue, &hdev->cmd_work); } static void hci_cs_le_create_cis(struct hci_dev *hdev, u8 status) { struct hci_cp_le_create_cis *cp; bool pending = false; int i; bt_dev_dbg(hdev, "status 0x%2.2x", status); if (!status) return; cp = hci_sent_cmd_data(hdev, HCI_OP_LE_CREATE_CIS); if (!cp) return; hci_dev_lock(hdev); /* Remove connection if command failed */ for (i = 0; i < cp->num_cis; i++) { struct hci_conn *conn; u16 handle; handle = __le16_to_cpu(cp->cis[i].cis_handle); conn = hci_conn_hash_lookup_handle(hdev, handle); if (conn) { if (test_and_clear_bit(HCI_CONN_CREATE_CIS, &conn->flags)) pending = true; conn->state = BT_CLOSED; hci_connect_cfm(conn, status); hci_conn_del(conn); } } cp->num_cis = 0; if (pending) hci_le_create_cis_pending(hdev); hci_dev_unlock(hdev); } #define HCI_CS(_op, _func) \ { \ .op = _op, \ .func = _func, \ } static const struct hci_cs { u16 op; void (*func)(struct hci_dev *hdev, __u8 status); } hci_cs_table[] = { HCI_CS(HCI_OP_INQUIRY, hci_cs_inquiry), HCI_CS(HCI_OP_CREATE_CONN, hci_cs_create_conn), HCI_CS(HCI_OP_DISCONNECT, hci_cs_disconnect), HCI_CS(HCI_OP_ADD_SCO, hci_cs_add_sco), HCI_CS(HCI_OP_AUTH_REQUESTED, hci_cs_auth_requested), HCI_CS(HCI_OP_SET_CONN_ENCRYPT, hci_cs_set_conn_encrypt), HCI_CS(HCI_OP_REMOTE_NAME_REQ, hci_cs_remote_name_req), HCI_CS(HCI_OP_READ_REMOTE_FEATURES, hci_cs_read_remote_features), HCI_CS(HCI_OP_READ_REMOTE_EXT_FEATURES, hci_cs_read_remote_ext_features), HCI_CS(HCI_OP_SETUP_SYNC_CONN, hci_cs_setup_sync_conn), HCI_CS(HCI_OP_ENHANCED_SETUP_SYNC_CONN, hci_cs_enhanced_setup_sync_conn), HCI_CS(HCI_OP_SNIFF_MODE, hci_cs_sniff_mode), HCI_CS(HCI_OP_EXIT_SNIFF_MODE, hci_cs_exit_sniff_mode), HCI_CS(HCI_OP_SWITCH_ROLE, hci_cs_switch_role), HCI_CS(HCI_OP_LE_CREATE_CONN, hci_cs_le_create_conn), HCI_CS(HCI_OP_LE_READ_REMOTE_FEATURES, hci_cs_le_read_remote_features), HCI_CS(HCI_OP_LE_START_ENC, hci_cs_le_start_enc), HCI_CS(HCI_OP_LE_EXT_CREATE_CONN, hci_cs_le_ext_create_conn), HCI_CS(HCI_OP_LE_CREATE_CIS, hci_cs_le_create_cis), HCI_CS(HCI_OP_LE_CREATE_BIG, hci_cs_le_create_big), }; static void hci_cmd_status_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb, u16 *opcode, u8 *status, hci_req_complete_t *req_complete, hci_req_complete_skb_t *req_complete_skb) { struct hci_ev_cmd_status *ev = data; int i; *opcode = __le16_to_cpu(ev->opcode); *status = ev->status; bt_dev_dbg(hdev, "opcode 0x%4.4x", *opcode); for (i = 0; i < ARRAY_SIZE(hci_cs_table); i++) { if (hci_cs_table[i].op == *opcode) { hci_cs_table[i].func(hdev, ev->status); break; } } handle_cmd_cnt_and_timer(hdev, ev->ncmd); /* Indicate request completion if the command failed. Also, if * we're not waiting for a special event and we get a success * command status we should try to flag the request as completed * (since for this kind of commands there will not be a command * complete event). */ if (ev->status || (hdev->req_skb && !hci_skb_event(hdev->req_skb))) { hci_req_cmd_complete(hdev, *opcode, ev->status, req_complete, req_complete_skb); if (hci_dev_test_flag(hdev, HCI_CMD_PENDING)) { bt_dev_err(hdev, "unexpected event for opcode 0x%4.4x", *opcode); return; } } if (atomic_read(&hdev->cmd_cnt) && !skb_queue_empty(&hdev->cmd_q)) queue_work(hdev->workqueue, &hdev->cmd_work); } static void hci_hardware_error_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_hardware_error *ev = data; bt_dev_dbg(hdev, "code 0x%2.2x", ev->code); hdev->hw_error_code = ev->code; queue_work(hdev->req_workqueue, &hdev->error_reset); } static void hci_role_change_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_role_change *ev = data; struct hci_conn *conn; bt_dev_dbg(hdev, "status 0x%2.2x", ev->status); hci_dev_lock(hdev); conn = hci_conn_hash_lookup_ba(hdev, ACL_LINK, &ev->bdaddr); if (conn) { if (!ev->status) conn->role = ev->role; clear_bit(HCI_CONN_RSWITCH_PEND, &conn->flags); hci_role_switch_cfm(conn, ev->status, ev->role); } hci_dev_unlock(hdev); } static void hci_num_comp_pkts_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_num_comp_pkts *ev = data; int i; if (!hci_ev_skb_pull(hdev, skb, HCI_EV_NUM_COMP_PKTS, flex_array_size(ev, handles, ev->num))) return; bt_dev_dbg(hdev, "num %d", ev->num); hci_dev_lock(hdev); for (i = 0; i < ev->num; i++) { struct hci_comp_pkts_info *info = &ev->handles[i]; struct hci_conn *conn; __u16 handle, count; unsigned int i; handle = __le16_to_cpu(info->handle); count = __le16_to_cpu(info->count); conn = hci_conn_hash_lookup_handle(hdev, handle); if (!conn) continue; /* Check if there is really enough packets outstanding before * attempting to decrease the sent counter otherwise it could * underflow.. */ if (conn->sent >= count) { conn->sent -= count; } else { bt_dev_warn(hdev, "hcon %p sent %u < count %u", conn, conn->sent, count); conn->sent = 0; } for (i = 0; i < count; ++i) hci_conn_tx_dequeue(conn); switch (conn->type) { case ACL_LINK: hdev->acl_cnt += count; if (hdev->acl_cnt > hdev->acl_pkts) hdev->acl_cnt = hdev->acl_pkts; break; case LE_LINK: if (hdev->le_pkts) { hdev->le_cnt += count; if (hdev->le_cnt > hdev->le_pkts) hdev->le_cnt = hdev->le_pkts; } else { hdev->acl_cnt += count; if (hdev->acl_cnt > hdev->acl_pkts) hdev->acl_cnt = hdev->acl_pkts; } break; case SCO_LINK: case ESCO_LINK: hdev->sco_cnt += count; if (hdev->sco_cnt > hdev->sco_pkts) hdev->sco_cnt = hdev->sco_pkts; break; case CIS_LINK: case BIS_LINK: case PA_LINK: hdev->iso_cnt += count; if (hdev->iso_cnt > hdev->iso_pkts) hdev->iso_cnt = hdev->iso_pkts; break; default: bt_dev_err(hdev, "unknown type %d conn %p", conn->type, conn); break; } } queue_work(hdev->workqueue, &hdev->tx_work); hci_dev_unlock(hdev); } static void hci_mode_change_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_mode_change *ev = data; struct hci_conn *conn; bt_dev_dbg(hdev, "status 0x%2.2x", ev->status); hci_dev_lock(hdev); conn = hci_conn_hash_lookup_handle(hdev, __le16_to_cpu(ev->handle)); if (conn) { conn->mode = ev->mode; if (!test_and_clear_bit(HCI_CONN_MODE_CHANGE_PEND, &conn->flags)) { if (conn->mode == HCI_CM_ACTIVE) set_bit(HCI_CONN_POWER_SAVE, &conn->flags); else clear_bit(HCI_CONN_POWER_SAVE, &conn->flags); } if (test_and_clear_bit(HCI_CONN_SCO_SETUP_PEND, &conn->flags)) hci_sco_setup(conn, ev->status); } hci_dev_unlock(hdev); } static void hci_pin_code_request_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_pin_code_req *ev = data; struct hci_conn *conn; bt_dev_dbg(hdev, ""); hci_dev_lock(hdev); conn = hci_conn_hash_lookup_ba(hdev, ACL_LINK, &ev->bdaddr); if (!conn) goto unlock; if (conn->state == BT_CONNECTED) { hci_conn_hold(conn); conn->disc_timeout = HCI_PAIRING_TIMEOUT; hci_conn_drop(conn); } if (!hci_dev_test_flag(hdev, HCI_BONDABLE) && !test_bit(HCI_CONN_AUTH_INITIATOR, &conn->flags)) { hci_send_cmd(hdev, HCI_OP_PIN_CODE_NEG_REPLY, sizeof(ev->bdaddr), &ev->bdaddr); } else if (hci_dev_test_flag(hdev, HCI_MGMT)) { u8 secure; if (conn->pending_sec_level == BT_SECURITY_HIGH) secure = 1; else secure = 0; mgmt_pin_code_request(hdev, &ev->bdaddr, secure); } unlock: hci_dev_unlock(hdev); } static void conn_set_key(struct hci_conn *conn, u8 key_type, u8 pin_len) { if (key_type == HCI_LK_CHANGED_COMBINATION) return; conn->pin_length = pin_len; conn->key_type = key_type; switch (key_type) { case HCI_LK_LOCAL_UNIT: case HCI_LK_REMOTE_UNIT: case HCI_LK_DEBUG_COMBINATION: return; case HCI_LK_COMBINATION: if (pin_len == 16) conn->pending_sec_level = BT_SECURITY_HIGH; else conn->pending_sec_level = BT_SECURITY_MEDIUM; break; case HCI_LK_UNAUTH_COMBINATION_P192: case HCI_LK_UNAUTH_COMBINATION_P256: conn->pending_sec_level = BT_SECURITY_MEDIUM; break; case HCI_LK_AUTH_COMBINATION_P192: conn->pending_sec_level = BT_SECURITY_HIGH; break; case HCI_LK_AUTH_COMBINATION_P256: conn->pending_sec_level = BT_SECURITY_FIPS; break; } } static void hci_link_key_request_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_link_key_req *ev = data; struct hci_cp_link_key_reply cp; struct hci_conn *conn; struct link_key *key; bt_dev_dbg(hdev, ""); if (!hci_dev_test_flag(hdev, HCI_MGMT)) return; hci_dev_lock(hdev); key = hci_find_link_key(hdev, &ev->bdaddr); if (!key) { bt_dev_dbg(hdev, "link key not found for %pMR", &ev->bdaddr); goto not_found; } bt_dev_dbg(hdev, "found key type %u for %pMR", key->type, &ev->bdaddr); conn = hci_conn_hash_lookup_ba(hdev, ACL_LINK, &ev->bdaddr); if (conn) { clear_bit(HCI_CONN_NEW_LINK_KEY, &conn->flags); if ((key->type == HCI_LK_UNAUTH_COMBINATION_P192 || key->type == HCI_LK_UNAUTH_COMBINATION_P256) && conn->auth_type != 0xff && (conn->auth_type & 0x01)) { bt_dev_dbg(hdev, "ignoring unauthenticated key"); goto not_found; } if (key->type == HCI_LK_COMBINATION && key->pin_len < 16 && (conn->pending_sec_level == BT_SECURITY_HIGH || conn->pending_sec_level == BT_SECURITY_FIPS)) { bt_dev_dbg(hdev, "ignoring key unauthenticated for high security"); goto not_found; } conn_set_key(conn, key->type, key->pin_len); } bacpy(&cp.bdaddr, &ev->bdaddr); memcpy(cp.link_key, key->val, HCI_LINK_KEY_SIZE); hci_send_cmd(hdev, HCI_OP_LINK_KEY_REPLY, sizeof(cp), &cp); hci_dev_unlock(hdev); return; not_found: hci_send_cmd(hdev, HCI_OP_LINK_KEY_NEG_REPLY, 6, &ev->bdaddr); hci_dev_unlock(hdev); } static void hci_link_key_notify_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_link_key_notify *ev = data; struct hci_conn *conn; struct link_key *key; bool persistent; u8 pin_len = 0; bt_dev_dbg(hdev, ""); hci_dev_lock(hdev); conn = hci_conn_hash_lookup_ba(hdev, ACL_LINK, &ev->bdaddr); if (!conn) goto unlock; /* Ignore NULL link key against CVE-2020-26555 */ if (!crypto_memneq(ev->link_key, ZERO_KEY, HCI_LINK_KEY_SIZE)) { bt_dev_dbg(hdev, "Ignore NULL link key (ZERO KEY) for %pMR", &ev->bdaddr); hci_disconnect(conn, HCI_ERROR_AUTH_FAILURE); hci_conn_drop(conn); goto unlock; } hci_conn_hold(conn); conn->disc_timeout = HCI_DISCONN_TIMEOUT; hci_conn_drop(conn); set_bit(HCI_CONN_NEW_LINK_KEY, &conn->flags); conn_set_key(conn, ev->key_type, conn->pin_length); if (!hci_dev_test_flag(hdev, HCI_MGMT)) goto unlock; key = hci_add_link_key(hdev, conn, &ev->bdaddr, ev->link_key, ev->key_type, pin_len, &persistent); if (!key) goto unlock; /* Update connection information since adding the key will have * fixed up the type in the case of changed combination keys. */ if (ev->key_type == HCI_LK_CHANGED_COMBINATION) conn_set_key(conn, key->type, key->pin_len); mgmt_new_link_key(hdev, key, persistent); /* Keep debug keys around only if the HCI_KEEP_DEBUG_KEYS flag * is set. If it's not set simply remove the key from the kernel * list (we've still notified user space about it but with * store_hint being 0). */ if (key->type == HCI_LK_DEBUG_COMBINATION && !hci_dev_test_flag(hdev, HCI_KEEP_DEBUG_KEYS)) { list_del_rcu(&key->list); kfree_rcu(key, rcu); goto unlock; } if (persistent) clear_bit(HCI_CONN_FLUSH_KEY, &conn->flags); else set_bit(HCI_CONN_FLUSH_KEY, &conn->flags); unlock: hci_dev_unlock(hdev); } static void hci_clock_offset_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_clock_offset *ev = data; struct hci_conn *conn; bt_dev_dbg(hdev, "status 0x%2.2x", ev->status); hci_dev_lock(hdev); conn = hci_conn_hash_lookup_handle(hdev, __le16_to_cpu(ev->handle)); if (conn && !ev->status) { struct inquiry_entry *ie; ie = hci_inquiry_cache_lookup(hdev, &conn->dst); if (ie) { ie->data.clock_offset = ev->clock_offset; ie->timestamp = jiffies; } } hci_dev_unlock(hdev); } static void hci_pkt_type_change_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_pkt_type_change *ev = data; struct hci_conn *conn; bt_dev_dbg(hdev, "status 0x%2.2x", ev->status); hci_dev_lock(hdev); conn = hci_conn_hash_lookup_handle(hdev, __le16_to_cpu(ev->handle)); if (conn && !ev->status) conn->pkt_type = __le16_to_cpu(ev->pkt_type); hci_dev_unlock(hdev); } static void hci_pscan_rep_mode_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_pscan_rep_mode *ev = data; struct inquiry_entry *ie; bt_dev_dbg(hdev, ""); hci_dev_lock(hdev); ie = hci_inquiry_cache_lookup(hdev, &ev->bdaddr); if (ie) { ie->data.pscan_rep_mode = ev->pscan_rep_mode; ie->timestamp = jiffies; } hci_dev_unlock(hdev); } static void hci_inquiry_result_with_rssi_evt(struct hci_dev *hdev, void *edata, struct sk_buff *skb) { struct hci_ev_inquiry_result_rssi *ev = edata; struct inquiry_data data; int i; bt_dev_dbg(hdev, "num_rsp %d", ev->num); if (!ev->num) return; if (hci_dev_test_flag(hdev, HCI_PERIODIC_INQ)) return; hci_dev_lock(hdev); if (skb->len == array_size(ev->num, sizeof(struct inquiry_info_rssi_pscan))) { struct inquiry_info_rssi_pscan *info; for (i = 0; i < ev->num; i++) { u32 flags; info = hci_ev_skb_pull(hdev, skb, HCI_EV_INQUIRY_RESULT_WITH_RSSI, sizeof(*info)); if (!info) { bt_dev_err(hdev, "Malformed HCI Event: 0x%2.2x", HCI_EV_INQUIRY_RESULT_WITH_RSSI); goto unlock; } bacpy(&data.bdaddr, &info->bdaddr); data.pscan_rep_mode = info->pscan_rep_mode; data.pscan_period_mode = info->pscan_period_mode; data.pscan_mode = info->pscan_mode; memcpy(data.dev_class, info->dev_class, 3); data.clock_offset = info->clock_offset; data.rssi = info->rssi; data.ssp_mode = 0x00; flags = hci_inquiry_cache_update(hdev, &data, false); mgmt_device_found(hdev, &info->bdaddr, ACL_LINK, 0x00, info->dev_class, info->rssi, flags, NULL, 0, NULL, 0, 0); } } else if (skb->len == array_size(ev->num, sizeof(struct inquiry_info_rssi))) { struct inquiry_info_rssi *info; for (i = 0; i < ev->num; i++) { u32 flags; info = hci_ev_skb_pull(hdev, skb, HCI_EV_INQUIRY_RESULT_WITH_RSSI, sizeof(*info)); if (!info) { bt_dev_err(hdev, "Malformed HCI Event: 0x%2.2x", HCI_EV_INQUIRY_RESULT_WITH_RSSI); goto unlock; } bacpy(&data.bdaddr, &info->bdaddr); data.pscan_rep_mode = info->pscan_rep_mode; data.pscan_period_mode = info->pscan_period_mode; data.pscan_mode = 0x00; memcpy(data.dev_class, info->dev_class, 3); data.clock_offset = info->clock_offset; data.rssi = info->rssi; data.ssp_mode = 0x00; flags = hci_inquiry_cache_update(hdev, &data, false); mgmt_device_found(hdev, &info->bdaddr, ACL_LINK, 0x00, info->dev_class, info->rssi, flags, NULL, 0, NULL, 0, 0); } } else { bt_dev_err(hdev, "Malformed HCI Event: 0x%2.2x", HCI_EV_INQUIRY_RESULT_WITH_RSSI); } unlock: hci_dev_unlock(hdev); } static void hci_remote_ext_features_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_remote_ext_features *ev = data; struct hci_conn *conn; bt_dev_dbg(hdev, "status 0x%2.2x", ev->status); hci_dev_lock(hdev); conn = hci_conn_hash_lookup_handle(hdev, __le16_to_cpu(ev->handle)); if (!conn) goto unlock; if (ev->page < HCI_MAX_PAGES) memcpy(conn->features[ev->page], ev->features, 8); if (!ev->status && ev->page == 0x01) { struct inquiry_entry *ie; ie = hci_inquiry_cache_lookup(hdev, &conn->dst); if (ie) ie->data.ssp_mode = (ev->features[0] & LMP_HOST_SSP); if (ev->features[0] & LMP_HOST_SSP) { set_bit(HCI_CONN_SSP_ENABLED, &conn->flags); } else { /* It is mandatory by the Bluetooth specification that * Extended Inquiry Results are only used when Secure * Simple Pairing is enabled, but some devices violate * this. * * To make these devices work, the internal SSP * enabled flag needs to be cleared if the remote host * features do not indicate SSP support */ clear_bit(HCI_CONN_SSP_ENABLED, &conn->flags); } if (ev->features[0] & LMP_HOST_SC) set_bit(HCI_CONN_SC_ENABLED, &conn->flags); } if (conn->state != BT_CONFIG) goto unlock; if (!ev->status && !test_bit(HCI_CONN_MGMT_CONNECTED, &conn->flags)) { struct hci_cp_remote_name_req cp; memset(&cp, 0, sizeof(cp)); bacpy(&cp.bdaddr, &conn->dst); cp.pscan_rep_mode = 0x02; hci_send_cmd(hdev, HCI_OP_REMOTE_NAME_REQ, sizeof(cp), &cp); } else { mgmt_device_connected(hdev, conn, NULL, 0); } if (!hci_outgoing_auth_needed(hdev, conn)) { conn->state = BT_CONNECTED; hci_connect_cfm(conn, ev->status); hci_conn_drop(conn); } unlock: hci_dev_unlock(hdev); } static void hci_sync_conn_complete_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_sync_conn_complete *ev = data; struct hci_conn *conn; u8 status = ev->status; switch (ev->link_type) { case SCO_LINK: case ESCO_LINK: break; default: /* As per Core 5.3 Vol 4 Part E 7.7.35 (p.2219), Link_Type * for HCI_Synchronous_Connection_Complete is limited to * either SCO or eSCO */ bt_dev_err(hdev, "Ignoring connect complete event for invalid link type"); return; } bt_dev_dbg(hdev, "status 0x%2.2x", status); hci_dev_lock(hdev); conn = hci_conn_hash_lookup_ba(hdev, ev->link_type, &ev->bdaddr); if (!conn) { if (ev->link_type == ESCO_LINK) goto unlock; /* When the link type in the event indicates SCO connection * and lookup of the connection object fails, then check * if an eSCO connection object exists. * * The core limits the synchronous connections to either * SCO or eSCO. The eSCO connection is preferred and tried * to be setup first and until successfully established, * the link type will be hinted as eSCO. */ conn = hci_conn_hash_lookup_ba(hdev, ESCO_LINK, &ev->bdaddr); if (!conn) goto unlock; } /* The HCI_Synchronous_Connection_Complete event is only sent once per connection. * Processing it more than once per connection can corrupt kernel memory. * * As the connection handle is set here for the first time, it indicates * whether the connection is already set up. */ if (!HCI_CONN_HANDLE_UNSET(conn->handle)) { bt_dev_err(hdev, "Ignoring HCI_Sync_Conn_Complete event for existing connection"); goto unlock; } switch (status) { case 0x00: status = hci_conn_set_handle(conn, __le16_to_cpu(ev->handle)); if (status) { conn->state = BT_CLOSED; break; } conn->state = BT_CONNECTED; conn->type = ev->link_type; hci_debugfs_create_conn(conn); hci_conn_add_sysfs(conn); break; case 0x10: /* Connection Accept Timeout */ case 0x0d: /* Connection Rejected due to Limited Resources */ case 0x11: /* Unsupported Feature or Parameter Value */ case 0x1c: /* SCO interval rejected */ case 0x1a: /* Unsupported Remote Feature */ case 0x1e: /* Invalid LMP Parameters */ case 0x1f: /* Unspecified error */ case 0x20: /* Unsupported LMP Parameter value */ if (conn->out) { conn->pkt_type = (hdev->esco_type & SCO_ESCO_MASK) | (hdev->esco_type & EDR_ESCO_MASK); if (hci_setup_sync(conn, conn->parent->handle)) goto unlock; } fallthrough; default: conn->state = BT_CLOSED; break; } bt_dev_dbg(hdev, "SCO connected with air mode: %02x", ev->air_mode); /* Notify only in case of SCO over HCI transport data path which * is zero and non-zero value shall be non-HCI transport data path */ if (conn->codec.data_path == 0 && hdev->notify) { switch (ev->air_mode) { case 0x02: hdev->notify(hdev, HCI_NOTIFY_ENABLE_SCO_CVSD); break; case 0x03: hdev->notify(hdev, HCI_NOTIFY_ENABLE_SCO_TRANSP); break; } } hci_connect_cfm(conn, status); if (status) hci_conn_del(conn); unlock: hci_dev_unlock(hdev); } static inline size_t eir_get_length(u8 *eir, size_t eir_len) { size_t parsed = 0; while (parsed < eir_len) { u8 field_len = eir[0]; if (field_len == 0) return parsed; parsed += field_len + 1; eir += field_len + 1; } return eir_len; } static void hci_extended_inquiry_result_evt(struct hci_dev *hdev, void *edata, struct sk_buff *skb) { struct hci_ev_ext_inquiry_result *ev = edata; struct inquiry_data data; size_t eir_len; int i; if (!hci_ev_skb_pull(hdev, skb, HCI_EV_EXTENDED_INQUIRY_RESULT, flex_array_size(ev, info, ev->num))) return; bt_dev_dbg(hdev, "num %d", ev->num); if (!ev->num) return; if (hci_dev_test_flag(hdev, HCI_PERIODIC_INQ)) return; hci_dev_lock(hdev); for (i = 0; i < ev->num; i++) { struct extended_inquiry_info *info = &ev->info[i]; u32 flags; bool name_known; bacpy(&data.bdaddr, &info->bdaddr); data.pscan_rep_mode = info->pscan_rep_mode; data.pscan_period_mode = info->pscan_period_mode; data.pscan_mode = 0x00; memcpy(data.dev_class, info->dev_class, 3); data.clock_offset = info->clock_offset; data.rssi = info->rssi; data.ssp_mode = 0x01; if (hci_dev_test_flag(hdev, HCI_MGMT)) name_known = eir_get_data(info->data, sizeof(info->data), EIR_NAME_COMPLETE, NULL); else name_known = true; flags = hci_inquiry_cache_update(hdev, &data, name_known); eir_len = eir_get_length(info->data, sizeof(info->data)); mgmt_device_found(hdev, &info->bdaddr, ACL_LINK, 0x00, info->dev_class, info->rssi, flags, info->data, eir_len, NULL, 0, 0); } hci_dev_unlock(hdev); } static void hci_key_refresh_complete_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_key_refresh_complete *ev = data; struct hci_conn *conn; bt_dev_dbg(hdev, "status 0x%2.2x handle 0x%4.4x", ev->status, __le16_to_cpu(ev->handle)); hci_dev_lock(hdev); conn = hci_conn_hash_lookup_handle(hdev, __le16_to_cpu(ev->handle)); if (!conn) goto unlock; /* For BR/EDR the necessary steps are taken through the * auth_complete event. */ if (conn->type != LE_LINK) goto unlock; if (!ev->status) conn->sec_level = conn->pending_sec_level; clear_bit(HCI_CONN_ENCRYPT_PEND, &conn->flags); if (ev->status && conn->state == BT_CONNECTED) { hci_disconnect(conn, HCI_ERROR_AUTH_FAILURE); hci_conn_drop(conn); goto unlock; } if (conn->state == BT_CONFIG) { if (!ev->status) conn->state = BT_CONNECTED; hci_connect_cfm(conn, ev->status); hci_conn_drop(conn); } else { hci_auth_cfm(conn, ev->status); hci_conn_hold(conn); conn->disc_timeout = HCI_DISCONN_TIMEOUT; hci_conn_drop(conn); } unlock: hci_dev_unlock(hdev); } static u8 hci_get_auth_req(struct hci_conn *conn) { /* If remote requests no-bonding follow that lead */ if (conn->remote_auth == HCI_AT_NO_BONDING || conn->remote_auth == HCI_AT_NO_BONDING_MITM) return conn->remote_auth | (conn->auth_type & 0x01); /* If both remote and local have enough IO capabilities, require * MITM protection */ if (conn->remote_cap != HCI_IO_NO_INPUT_OUTPUT && conn->io_capability != HCI_IO_NO_INPUT_OUTPUT) return conn->remote_auth | 0x01; /* No MITM protection possible so ignore remote requirement */ return (conn->remote_auth & ~0x01) | (conn->auth_type & 0x01); } static u8 bredr_oob_data_present(struct hci_conn *conn) { struct hci_dev *hdev = conn->hdev; struct oob_data *data; data = hci_find_remote_oob_data(hdev, &conn->dst, BDADDR_BREDR); if (!data) return 0x00; if (bredr_sc_enabled(hdev)) { /* When Secure Connections is enabled, then just * return the present value stored with the OOB * data. The stored value contains the right present * information. However it can only be trusted when * not in Secure Connection Only mode. */ if (!hci_dev_test_flag(hdev, HCI_SC_ONLY)) return data->present; /* When Secure Connections Only mode is enabled, then * the P-256 values are required. If they are not * available, then do not declare that OOB data is * present. */ if (!crypto_memneq(data->rand256, ZERO_KEY, 16) || !crypto_memneq(data->hash256, ZERO_KEY, 16)) return 0x00; return 0x02; } /* When Secure Connections is not enabled or actually * not supported by the hardware, then check that if * P-192 data values are present. */ if (!crypto_memneq(data->rand192, ZERO_KEY, 16) || !crypto_memneq(data->hash192, ZERO_KEY, 16)) return 0x00; return 0x01; } static void hci_io_capa_request_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_io_capa_request *ev = data; struct hci_conn *conn; bt_dev_dbg(hdev, ""); hci_dev_lock(hdev); conn = hci_conn_hash_lookup_ba(hdev, ACL_LINK, &ev->bdaddr); if (!conn || !hci_dev_test_flag(hdev, HCI_SSP_ENABLED)) goto unlock; /* Assume remote supports SSP since it has triggered this event */ set_bit(HCI_CONN_SSP_ENABLED, &conn->flags); hci_conn_hold(conn); if (!hci_dev_test_flag(hdev, HCI_MGMT)) goto unlock; /* Allow pairing if we're pairable, the initiators of the * pairing or if the remote is not requesting bonding. */ if (hci_dev_test_flag(hdev, HCI_BONDABLE) || test_bit(HCI_CONN_AUTH_INITIATOR, &conn->flags) || (conn->remote_auth & ~0x01) == HCI_AT_NO_BONDING) { struct hci_cp_io_capability_reply cp; bacpy(&cp.bdaddr, &ev->bdaddr); /* Change the IO capability from KeyboardDisplay * to DisplayYesNo as it is not supported by BT spec. */ cp.capability = (conn->io_capability == 0x04) ? HCI_IO_DISPLAY_YESNO : conn->io_capability; /* If we are initiators, there is no remote information yet */ if (conn->remote_auth == 0xff) { /* Request MITM protection if our IO caps allow it * except for the no-bonding case. */ if (conn->io_capability != HCI_IO_NO_INPUT_OUTPUT && conn->auth_type != HCI_AT_NO_BONDING) conn->auth_type |= 0x01; } else { conn->auth_type = hci_get_auth_req(conn); } /* If we're not bondable, force one of the non-bondable * authentication requirement values. */ if (!hci_dev_test_flag(hdev, HCI_BONDABLE)) conn->auth_type &= HCI_AT_NO_BONDING_MITM; cp.authentication = conn->auth_type; cp.oob_data = bredr_oob_data_present(conn); hci_send_cmd(hdev, HCI_OP_IO_CAPABILITY_REPLY, sizeof(cp), &cp); } else { struct hci_cp_io_capability_neg_reply cp; bacpy(&cp.bdaddr, &ev->bdaddr); cp.reason = HCI_ERROR_PAIRING_NOT_ALLOWED; hci_send_cmd(hdev, HCI_OP_IO_CAPABILITY_NEG_REPLY, sizeof(cp), &cp); } unlock: hci_dev_unlock(hdev); } static void hci_io_capa_reply_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_io_capa_reply *ev = data; struct hci_conn *conn; bt_dev_dbg(hdev, ""); hci_dev_lock(hdev); conn = hci_conn_hash_lookup_ba(hdev, ACL_LINK, &ev->bdaddr); if (!conn) goto unlock; conn->remote_cap = ev->capability; conn->remote_auth = ev->authentication; unlock: hci_dev_unlock(hdev); } static void hci_user_confirm_request_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_user_confirm_req *ev = data; int loc_mitm, rem_mitm, confirm_hint = 0; struct hci_conn *conn; bt_dev_dbg(hdev, ""); hci_dev_lock(hdev); if (!hci_dev_test_flag(hdev, HCI_MGMT)) goto unlock; conn = hci_conn_hash_lookup_ba(hdev, ACL_LINK, &ev->bdaddr); if (!conn) goto unlock; loc_mitm = (conn->auth_type & 0x01); rem_mitm = (conn->remote_auth & 0x01); /* If we require MITM but the remote device can't provide that * (it has NoInputNoOutput) then reject the confirmation * request. We check the security level here since it doesn't * necessarily match conn->auth_type. */ if (conn->pending_sec_level > BT_SECURITY_MEDIUM && conn->remote_cap == HCI_IO_NO_INPUT_OUTPUT) { bt_dev_dbg(hdev, "Rejecting request: remote device can't provide MITM"); hci_send_cmd(hdev, HCI_OP_USER_CONFIRM_NEG_REPLY, sizeof(ev->bdaddr), &ev->bdaddr); goto unlock; } /* If no side requires MITM protection; use JUST_CFM method */ if ((!loc_mitm || conn->remote_cap == HCI_IO_NO_INPUT_OUTPUT) && (!rem_mitm || conn->io_capability == HCI_IO_NO_INPUT_OUTPUT)) { /* If we're not the initiator of request authorization and the * local IO capability is not NoInputNoOutput, use JUST_WORKS * method (mgmt_user_confirm with confirm_hint set to 1). */ if (!test_bit(HCI_CONN_AUTH_PEND, &conn->flags) && conn->io_capability != HCI_IO_NO_INPUT_OUTPUT) { bt_dev_dbg(hdev, "Confirming auto-accept as acceptor"); confirm_hint = 1; goto confirm; } /* If there already exists link key in local host, leave the * decision to user space since the remote device could be * legitimate or malicious. */ if (hci_find_link_key(hdev, &ev->bdaddr)) { bt_dev_dbg(hdev, "Local host already has link key"); confirm_hint = 1; goto confirm; } BT_DBG("Auto-accept of user confirmation with %ums delay", hdev->auto_accept_delay); if (hdev->auto_accept_delay > 0) { int delay = msecs_to_jiffies(hdev->auto_accept_delay); queue_delayed_work(conn->hdev->workqueue, &conn->auto_accept_work, delay); goto unlock; } hci_send_cmd(hdev, HCI_OP_USER_CONFIRM_REPLY, sizeof(ev->bdaddr), &ev->bdaddr); goto unlock; } confirm: mgmt_user_confirm_request(hdev, &ev->bdaddr, ACL_LINK, 0, le32_to_cpu(ev->passkey), confirm_hint); unlock: hci_dev_unlock(hdev); } static void hci_user_passkey_request_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_user_passkey_req *ev = data; bt_dev_dbg(hdev, ""); if (hci_dev_test_flag(hdev, HCI_MGMT)) mgmt_user_passkey_request(hdev, &ev->bdaddr, ACL_LINK, 0); } static void hci_user_passkey_notify_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_user_passkey_notify *ev = data; struct hci_conn *conn; bt_dev_dbg(hdev, ""); conn = hci_conn_hash_lookup_ba(hdev, ACL_LINK, &ev->bdaddr); if (!conn) return; conn->passkey_notify = __le32_to_cpu(ev->passkey); conn->passkey_entered = 0; if (hci_dev_test_flag(hdev, HCI_MGMT)) mgmt_user_passkey_notify(hdev, &conn->dst, conn->type, conn->dst_type, conn->passkey_notify, conn->passkey_entered); } static void hci_keypress_notify_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_keypress_notify *ev = data; struct hci_conn *conn; bt_dev_dbg(hdev, ""); conn = hci_conn_hash_lookup_ba(hdev, ACL_LINK, &ev->bdaddr); if (!conn) return; switch (ev->type) { case HCI_KEYPRESS_STARTED: conn->passkey_entered = 0; return; case HCI_KEYPRESS_ENTERED: conn->passkey_entered++; break; case HCI_KEYPRESS_ERASED: conn->passkey_entered--; break; case HCI_KEYPRESS_CLEARED: conn->passkey_entered = 0; break; case HCI_KEYPRESS_COMPLETED: return; } if (hci_dev_test_flag(hdev, HCI_MGMT)) mgmt_user_passkey_notify(hdev, &conn->dst, conn->type, conn->dst_type, conn->passkey_notify, conn->passkey_entered); } static void hci_simple_pair_complete_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_simple_pair_complete *ev = data; struct hci_conn *conn; bt_dev_dbg(hdev, ""); hci_dev_lock(hdev); conn = hci_conn_hash_lookup_ba(hdev, ACL_LINK, &ev->bdaddr); if (!conn || !hci_conn_ssp_enabled(conn)) goto unlock; /* Reset the authentication requirement to unknown */ conn->remote_auth = 0xff; /* To avoid duplicate auth_failed events to user space we check * the HCI_CONN_AUTH_PEND flag which will be set if we * initiated the authentication. A traditional auth_complete * event gets always produced as initiator and is also mapped to * the mgmt_auth_failed event */ if (!test_bit(HCI_CONN_AUTH_PEND, &conn->flags) && ev->status) mgmt_auth_failed(conn, ev->status); hci_conn_drop(conn); unlock: hci_dev_unlock(hdev); } static void hci_remote_host_features_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_remote_host_features *ev = data; struct inquiry_entry *ie; struct hci_conn *conn; bt_dev_dbg(hdev, ""); hci_dev_lock(hdev); conn = hci_conn_hash_lookup_ba(hdev, ACL_LINK, &ev->bdaddr); if (conn) memcpy(conn->features[1], ev->features, 8); ie = hci_inquiry_cache_lookup(hdev, &ev->bdaddr); if (ie) ie->data.ssp_mode = (ev->features[0] & LMP_HOST_SSP); hci_dev_unlock(hdev); } static void hci_remote_oob_data_request_evt(struct hci_dev *hdev, void *edata, struct sk_buff *skb) { struct hci_ev_remote_oob_data_request *ev = edata; struct oob_data *data; bt_dev_dbg(hdev, ""); hci_dev_lock(hdev); if (!hci_dev_test_flag(hdev, HCI_MGMT)) goto unlock; data = hci_find_remote_oob_data(hdev, &ev->bdaddr, BDADDR_BREDR); if (!data) { struct hci_cp_remote_oob_data_neg_reply cp; bacpy(&cp.bdaddr, &ev->bdaddr); hci_send_cmd(hdev, HCI_OP_REMOTE_OOB_DATA_NEG_REPLY, sizeof(cp), &cp); goto unlock; } if (bredr_sc_enabled(hdev)) { struct hci_cp_remote_oob_ext_data_reply cp; bacpy(&cp.bdaddr, &ev->bdaddr); if (hci_dev_test_flag(hdev, HCI_SC_ONLY)) { memset(cp.hash192, 0, sizeof(cp.hash192)); memset(cp.rand192, 0, sizeof(cp.rand192)); } else { memcpy(cp.hash192, data->hash192, sizeof(cp.hash192)); memcpy(cp.rand192, data->rand192, sizeof(cp.rand192)); } memcpy(cp.hash256, data->hash256, sizeof(cp.hash256)); memcpy(cp.rand256, data->rand256, sizeof(cp.rand256)); hci_send_cmd(hdev, HCI_OP_REMOTE_OOB_EXT_DATA_REPLY, sizeof(cp), &cp); } else { struct hci_cp_remote_oob_data_reply cp; bacpy(&cp.bdaddr, &ev->bdaddr); memcpy(cp.hash, data->hash192, sizeof(cp.hash)); memcpy(cp.rand, data->rand192, sizeof(cp.rand)); hci_send_cmd(hdev, HCI_OP_REMOTE_OOB_DATA_REPLY, sizeof(cp), &cp); } unlock: hci_dev_unlock(hdev); } static void le_conn_update_addr(struct hci_conn *conn, bdaddr_t *bdaddr, u8 bdaddr_type, bdaddr_t *local_rpa) { if (conn->out) { conn->dst_type = bdaddr_type; conn->resp_addr_type = bdaddr_type; bacpy(&conn->resp_addr, bdaddr); /* Check if the controller has set a Local RPA then it must be * used instead or hdev->rpa. */ if (local_rpa && bacmp(local_rpa, BDADDR_ANY)) { conn->init_addr_type = ADDR_LE_DEV_RANDOM; bacpy(&conn->init_addr, local_rpa); } else if (hci_dev_test_flag(conn->hdev, HCI_PRIVACY)) { conn->init_addr_type = ADDR_LE_DEV_RANDOM; bacpy(&conn->init_addr, &conn->hdev->rpa); } else { hci_copy_identity_address(conn->hdev, &conn->init_addr, &conn->init_addr_type); } } else { conn->resp_addr_type = conn->hdev->adv_addr_type; /* Check if the controller has set a Local RPA then it must be * used instead or hdev->rpa. */ if (local_rpa && bacmp(local_rpa, BDADDR_ANY)) { conn->resp_addr_type = ADDR_LE_DEV_RANDOM; bacpy(&conn->resp_addr, local_rpa); } else if (conn->hdev->adv_addr_type == ADDR_LE_DEV_RANDOM) { /* In case of ext adv, resp_addr will be updated in * Adv Terminated event. */ if (!ext_adv_capable(conn->hdev)) bacpy(&conn->resp_addr, &conn->hdev->random_addr); } else { bacpy(&conn->resp_addr, &conn->hdev->bdaddr); } conn->init_addr_type = bdaddr_type; bacpy(&conn->init_addr, bdaddr); /* For incoming connections, set the default minimum * and maximum connection interval. They will be used * to check if the parameters are in range and if not * trigger the connection update procedure. */ conn->le_conn_min_interval = conn->hdev->le_conn_min_interval; conn->le_conn_max_interval = conn->hdev->le_conn_max_interval; } } static void le_conn_complete_evt(struct hci_dev *hdev, u8 status, bdaddr_t *bdaddr, u8 bdaddr_type, bdaddr_t *local_rpa, u8 role, u16 handle, u16 interval, u16 latency, u16 supervision_timeout) { struct hci_conn_params *params; struct hci_conn *conn; struct smp_irk *irk; u8 addr_type; hci_dev_lock(hdev); /* All controllers implicitly stop advertising in the event of a * connection, so ensure that the state bit is cleared. */ hci_dev_clear_flag(hdev, HCI_LE_ADV); /* Check for existing connection: * * 1. If it doesn't exist then use the role to create a new object. * 2. If it does exist confirm that it is connecting/BT_CONNECT in case * of initiator/master role since there could be a collision where * either side is attempting to connect or something like a fuzzing * testing is trying to play tricks to destroy the hcon object before * it even attempts to connect (e.g. hcon->state == BT_OPEN). */ conn = hci_conn_hash_lookup_role(hdev, LE_LINK, role, bdaddr); if (!conn || (conn->role == HCI_ROLE_MASTER && conn->state != BT_CONNECT)) { /* In case of error status and there is no connection pending * just unlock as there is nothing to cleanup. */ if (status) goto unlock; conn = hci_conn_add_unset(hdev, LE_LINK, bdaddr, role); if (IS_ERR(conn)) { bt_dev_err(hdev, "connection err: %ld", PTR_ERR(conn)); goto unlock; } conn->dst_type = bdaddr_type; /* If we didn't have a hci_conn object previously * but we're in central role this must be something * initiated using an accept list. Since accept list based * connections are not "first class citizens" we don't * have full tracking of them. Therefore, we go ahead * with a "best effort" approach of determining the * initiator address based on the HCI_PRIVACY flag. */ if (conn->out) { conn->resp_addr_type = bdaddr_type; bacpy(&conn->resp_addr, bdaddr); if (hci_dev_test_flag(hdev, HCI_PRIVACY)) { conn->init_addr_type = ADDR_LE_DEV_RANDOM; bacpy(&conn->init_addr, &hdev->rpa); } else { hci_copy_identity_address(hdev, &conn->init_addr, &conn->init_addr_type); } } } else { cancel_delayed_work(&conn->le_conn_timeout); } /* The HCI_LE_Connection_Complete event is only sent once per connection. * Processing it more than once per connection can corrupt kernel memory. * * As the connection handle is set here for the first time, it indicates * whether the connection is already set up. */ if (!HCI_CONN_HANDLE_UNSET(conn->handle)) { bt_dev_err(hdev, "Ignoring HCI_Connection_Complete for existing connection"); goto unlock; } le_conn_update_addr(conn, bdaddr, bdaddr_type, local_rpa); /* Lookup the identity address from the stored connection * address and address type. * * When establishing connections to an identity address, the * connection procedure will store the resolvable random * address first. Now if it can be converted back into the * identity address, start using the identity address from * now on. */ irk = hci_get_irk(hdev, &conn->dst, conn->dst_type); if (irk) { bacpy(&conn->dst, &irk->bdaddr); conn->dst_type = irk->addr_type; } conn->dst_type = ev_bdaddr_type(hdev, conn->dst_type, NULL); /* All connection failure handling is taken care of by the * hci_conn_failed function which is triggered by the HCI * request completion callbacks used for connecting. */ if (status || hci_conn_set_handle(conn, handle)) goto unlock; /* Drop the connection if it has been aborted */ if (test_bit(HCI_CONN_CANCEL, &conn->flags)) { hci_conn_drop(conn); goto unlock; } if (conn->dst_type == ADDR_LE_DEV_PUBLIC) addr_type = BDADDR_LE_PUBLIC; else addr_type = BDADDR_LE_RANDOM; /* Drop the connection if the device is blocked */ if (hci_bdaddr_list_lookup(&hdev->reject_list, &conn->dst, addr_type)) { hci_conn_drop(conn); goto unlock; } mgmt_device_connected(hdev, conn, NULL, 0); conn->sec_level = BT_SECURITY_LOW; conn->state = BT_CONFIG; /* Store current advertising instance as connection advertising instance * when software rotation is in use so it can be re-enabled when * disconnected. */ if (!ext_adv_capable(hdev)) conn->adv_instance = hdev->cur_adv_instance; conn->le_conn_interval = interval; conn->le_conn_latency = latency; conn->le_supv_timeout = supervision_timeout; hci_debugfs_create_conn(conn); hci_conn_add_sysfs(conn); /* The remote features procedure is defined for central * role only. So only in case of an initiated connection * request the remote features. * * If the local controller supports peripheral-initiated features * exchange, then requesting the remote features in peripheral * role is possible. Otherwise just transition into the * connected state without requesting the remote features. */ if (conn->out || (hdev->le_features[0] & HCI_LE_PERIPHERAL_FEATURES)) { struct hci_cp_le_read_remote_features cp; cp.handle = __cpu_to_le16(conn->handle); hci_send_cmd(hdev, HCI_OP_LE_READ_REMOTE_FEATURES, sizeof(cp), &cp); hci_conn_hold(conn); } else { conn->state = BT_CONNECTED; hci_connect_cfm(conn, status); } params = hci_pend_le_action_lookup(&hdev->pend_le_conns, &conn->dst, conn->dst_type); if (params) { hci_pend_le_list_del_init(params); if (params->conn) { hci_conn_drop(params->conn); hci_conn_put(params->conn); params->conn = NULL; } } unlock: hci_update_passive_scan(hdev); hci_dev_unlock(hdev); } static void hci_le_conn_complete_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_le_conn_complete *ev = data; bt_dev_dbg(hdev, "status 0x%2.2x", ev->status); le_conn_complete_evt(hdev, ev->status, &ev->bdaddr, ev->bdaddr_type, NULL, ev->role, le16_to_cpu(ev->handle), le16_to_cpu(ev->interval), le16_to_cpu(ev->latency), le16_to_cpu(ev->supervision_timeout)); } static void hci_le_enh_conn_complete_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_le_enh_conn_complete *ev = data; bt_dev_dbg(hdev, "status 0x%2.2x", ev->status); le_conn_complete_evt(hdev, ev->status, &ev->bdaddr, ev->bdaddr_type, &ev->local_rpa, ev->role, le16_to_cpu(ev->handle), le16_to_cpu(ev->interval), le16_to_cpu(ev->latency), le16_to_cpu(ev->supervision_timeout)); } static void hci_le_ext_adv_term_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_evt_le_ext_adv_set_term *ev = data; struct hci_conn *conn; struct adv_info *adv, *n; bt_dev_dbg(hdev, "status 0x%2.2x", ev->status); /* The Bluetooth Core 5.3 specification clearly states that this event * shall not be sent when the Host disables the advertising set. So in * case of HCI_ERROR_CANCELLED_BY_HOST, just ignore the event. * * When the Host disables an advertising set, all cleanup is done via * its command callback and not needed to be duplicated here. */ if (ev->status == HCI_ERROR_CANCELLED_BY_HOST) { bt_dev_warn_ratelimited(hdev, "Unexpected advertising set terminated event"); return; } hci_dev_lock(hdev); adv = hci_find_adv_instance(hdev, ev->handle); if (ev->status) { if (!adv) goto unlock; /* Remove advertising as it has been terminated */ hci_remove_adv_instance(hdev, ev->handle); mgmt_advertising_removed(NULL, hdev, ev->handle); list_for_each_entry_safe(adv, n, &hdev->adv_instances, list) { if (adv->enabled) goto unlock; } /* We are no longer advertising, clear HCI_LE_ADV */ hci_dev_clear_flag(hdev, HCI_LE_ADV); goto unlock; } if (adv) adv->enabled = false; conn = hci_conn_hash_lookup_handle(hdev, __le16_to_cpu(ev->conn_handle)); if (conn) { /* Store handle in the connection so the correct advertising * instance can be re-enabled when disconnected. */ conn->adv_instance = ev->handle; if (hdev->adv_addr_type != ADDR_LE_DEV_RANDOM || bacmp(&conn->resp_addr, BDADDR_ANY)) goto unlock; if (!ev->handle) { bacpy(&conn->resp_addr, &hdev->random_addr); goto unlock; } if (adv) bacpy(&conn->resp_addr, &adv->random_addr); } unlock: hci_dev_unlock(hdev); } static void hci_le_conn_update_complete_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_le_conn_update_complete *ev = data; struct hci_conn *conn; bt_dev_dbg(hdev, "status 0x%2.2x", ev->status); if (ev->status) return; hci_dev_lock(hdev); conn = hci_conn_hash_lookup_handle(hdev, __le16_to_cpu(ev->handle)); if (conn) { conn->le_conn_interval = le16_to_cpu(ev->interval); conn->le_conn_latency = le16_to_cpu(ev->latency); conn->le_supv_timeout = le16_to_cpu(ev->supervision_timeout); } hci_dev_unlock(hdev); } /* This function requires the caller holds hdev->lock */ static struct hci_conn *check_pending_le_conn(struct hci_dev *hdev, bdaddr_t *addr, u8 addr_type, bool addr_resolved, u8 adv_type, u8 phy, u8 sec_phy) { struct hci_conn *conn; struct hci_conn_params *params; /* If the event is not connectable don't proceed further */ if (adv_type != LE_ADV_IND && adv_type != LE_ADV_DIRECT_IND) return NULL; /* Ignore if the device is blocked or hdev is suspended */ if (hci_bdaddr_list_lookup(&hdev->reject_list, addr, addr_type) || hdev->suspended) return NULL; /* Most controller will fail if we try to create new connections * while we have an existing one in peripheral role. */ if (hdev->conn_hash.le_num_peripheral > 0 && (hci_test_quirk(hdev, HCI_QUIRK_BROKEN_LE_STATES) || !(hdev->le_states[3] & 0x10))) return NULL; /* If we're not connectable only connect devices that we have in * our pend_le_conns list. */ params = hci_pend_le_action_lookup(&hdev->pend_le_conns, addr, addr_type); if (!params) return NULL; if (!params->explicit_connect) { switch (params->auto_connect) { case HCI_AUTO_CONN_DIRECT: /* Only devices advertising with ADV_DIRECT_IND are * triggering a connection attempt. This is allowing * incoming connections from peripheral devices. */ if (adv_type != LE_ADV_DIRECT_IND) return NULL; break; case HCI_AUTO_CONN_ALWAYS: /* Devices advertising with ADV_IND or ADV_DIRECT_IND * are triggering a connection attempt. This means * that incoming connections from peripheral device are * accepted and also outgoing connections to peripheral * devices are established when found. */ break; default: return NULL; } } conn = hci_connect_le(hdev, addr, addr_type, addr_resolved, BT_SECURITY_LOW, hdev->def_le_autoconnect_timeout, HCI_ROLE_MASTER, phy, sec_phy); if (!IS_ERR(conn)) { /* If HCI_AUTO_CONN_EXPLICIT is set, conn is already owned * by higher layer that tried to connect, if no then * store the pointer since we don't really have any * other owner of the object besides the params that * triggered it. This way we can abort the connection if * the parameters get removed and keep the reference * count consistent once the connection is established. */ if (!params->explicit_connect) params->conn = hci_conn_get(conn); return conn; } switch (PTR_ERR(conn)) { case -EBUSY: /* If hci_connect() returns -EBUSY it means there is already * an LE connection attempt going on. Since controllers don't * support more than one connection attempt at the time, we * don't consider this an error case. */ break; default: BT_DBG("Failed to connect: err %ld", PTR_ERR(conn)); return NULL; } return NULL; } static void process_adv_report(struct hci_dev *hdev, u8 type, bdaddr_t *bdaddr, u8 bdaddr_type, bdaddr_t *direct_addr, u8 direct_addr_type, u8 phy, u8 sec_phy, s8 rssi, u8 *data, u8 len, bool ext_adv, bool ctl_time, u64 instant) { struct discovery_state *d = &hdev->discovery; struct smp_irk *irk; struct hci_conn *conn; bool match, bdaddr_resolved; u32 flags; u8 *ptr; switch (type) { case LE_ADV_IND: case LE_ADV_DIRECT_IND: case LE_ADV_SCAN_IND: case LE_ADV_NONCONN_IND: case LE_ADV_SCAN_RSP: break; default: bt_dev_err_ratelimited(hdev, "unknown advertising packet " "type: 0x%02x", type); return; } if (len > max_adv_len(hdev)) { bt_dev_err_ratelimited(hdev, "adv larger than maximum supported"); return; } /* Find the end of the data in case the report contains padded zero * bytes at the end causing an invalid length value. * * When data is NULL, len is 0 so there is no need for extra ptr * check as 'ptr < data + 0' is already false in such case. */ for (ptr = data; ptr < data + len && *ptr; ptr += *ptr + 1) { if (ptr + 1 + *ptr > data + len) break; } /* Adjust for actual length. This handles the case when remote * device is advertising with incorrect data length. */ len = ptr - data; /* If the direct address is present, then this report is from * a LE Direct Advertising Report event. In that case it is * important to see if the address is matching the local * controller address. * * If local privacy is not enable the controller shall not be * generating such event since according to its documentation it is only * valid for filter_policy 0x02 and 0x03, but the fact that it did * generate LE Direct Advertising Report means it is probably broken and * won't generate any other event which can potentially break * auto-connect logic so in case local privacy is not enable this * ignores the direct_addr so it works as a regular report. */ if (!hci_dev_test_flag(hdev, HCI_MESH) && direct_addr && hci_dev_test_flag(hdev, HCI_PRIVACY)) { direct_addr_type = ev_bdaddr_type(hdev, direct_addr_type, &bdaddr_resolved); /* Only resolvable random addresses are valid for these * kind of reports and others can be ignored. */ if (!hci_bdaddr_is_rpa(direct_addr, direct_addr_type)) return; /* If the local IRK of the controller does not match * with the resolvable random address provided, then * this report can be ignored. */ if (!smp_irk_matches(hdev, hdev->irk, direct_addr)) return; } /* Check if we need to convert to identity address */ irk = hci_get_irk(hdev, bdaddr, bdaddr_type); if (irk) { bdaddr = &irk->bdaddr; bdaddr_type = irk->addr_type; } bdaddr_type = ev_bdaddr_type(hdev, bdaddr_type, &bdaddr_resolved); /* Check if we have been requested to connect to this device. * * direct_addr is set only for directed advertising reports (it is NULL * for advertising reports) and is already verified to be RPA above. */ conn = check_pending_le_conn(hdev, bdaddr, bdaddr_type, bdaddr_resolved, type, phy, sec_phy); if (!ext_adv && conn && type == LE_ADV_IND && len <= max_adv_len(hdev)) { /* Store report for later inclusion by * mgmt_device_connected */ memcpy(conn->le_adv_data, data, len); conn->le_adv_data_len = len; } if (type == LE_ADV_NONCONN_IND || type == LE_ADV_SCAN_IND) flags = MGMT_DEV_FOUND_NOT_CONNECTABLE; else flags = 0; /* All scan results should be sent up for Mesh systems */ if (hci_dev_test_flag(hdev, HCI_MESH)) { mgmt_device_found(hdev, bdaddr, LE_LINK, bdaddr_type, NULL, rssi, flags, data, len, NULL, 0, instant); return; } /* Passive scanning shouldn't trigger any device found events, * except for devices marked as CONN_REPORT for which we do send * device found events, or advertisement monitoring requested. */ if (hdev->le_scan_type == LE_SCAN_PASSIVE) { if (type == LE_ADV_DIRECT_IND) return; if (!hci_pend_le_action_lookup(&hdev->pend_le_reports, bdaddr, bdaddr_type) && idr_is_empty(&hdev->adv_monitors_idr)) return; mgmt_device_found(hdev, bdaddr, LE_LINK, bdaddr_type, NULL, rssi, flags, data, len, NULL, 0, 0); return; } /* When receiving a scan response, then there is no way to * know if the remote device is connectable or not. However * since scan responses are merged with a previously seen * advertising report, the flags field from that report * will be used. * * In the unlikely case that a controller just sends a scan * response event that doesn't match the pending report, then * it is marked as a standalone SCAN_RSP. */ if (type == LE_ADV_SCAN_RSP) flags = MGMT_DEV_FOUND_SCAN_RSP; /* If there's nothing pending either store the data from this * event or send an immediate device found event if the data * should not be stored for later. */ if (!has_pending_adv_report(hdev)) { /* If the report will trigger a SCAN_REQ store it for * later merging. */ if (!ext_adv && (type == LE_ADV_IND || type == LE_ADV_SCAN_IND)) { store_pending_adv_report(hdev, bdaddr, bdaddr_type, rssi, flags, data, len); return; } mgmt_device_found(hdev, bdaddr, LE_LINK, bdaddr_type, NULL, rssi, flags, data, len, NULL, 0, 0); return; } /* Check if the pending report is for the same device as the new one */ match = (!bacmp(bdaddr, &d->last_adv_addr) && bdaddr_type == d->last_adv_addr_type); /* If the pending data doesn't match this report or this isn't a * scan response (e.g. we got a duplicate ADV_IND) then force * sending of the pending data. */ if (type != LE_ADV_SCAN_RSP || !match) { /* Send out whatever is in the cache, but skip duplicates */ if (!match) mgmt_device_found(hdev, &d->last_adv_addr, LE_LINK, d->last_adv_addr_type, NULL, d->last_adv_rssi, d->last_adv_flags, d->last_adv_data, d->last_adv_data_len, NULL, 0, 0); /* If the new report will trigger a SCAN_REQ store it for * later merging. */ if (!ext_adv && (type == LE_ADV_IND || type == LE_ADV_SCAN_IND)) { store_pending_adv_report(hdev, bdaddr, bdaddr_type, rssi, flags, data, len); return; } /* The advertising reports cannot be merged, so clear * the pending report and send out a device found event. */ clear_pending_adv_report(hdev); mgmt_device_found(hdev, bdaddr, LE_LINK, bdaddr_type, NULL, rssi, flags, data, len, NULL, 0, 0); return; } /* If we get here we've got a pending ADV_IND or ADV_SCAN_IND and * the new event is a SCAN_RSP. We can therefore proceed with * sending a merged device found event. */ mgmt_device_found(hdev, &d->last_adv_addr, LE_LINK, d->last_adv_addr_type, NULL, rssi, d->last_adv_flags, d->last_adv_data, d->last_adv_data_len, data, len, 0); clear_pending_adv_report(hdev); } static void hci_le_adv_report_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_le_advertising_report *ev = data; u64 instant = jiffies; if (!ev->num) return; hci_dev_lock(hdev); while (ev->num--) { struct hci_ev_le_advertising_info *info; s8 rssi; info = hci_le_ev_skb_pull(hdev, skb, HCI_EV_LE_ADVERTISING_REPORT, sizeof(*info)); if (!info) break; if (!hci_le_ev_skb_pull(hdev, skb, HCI_EV_LE_ADVERTISING_REPORT, info->length + 1)) break; if (info->length <= max_adv_len(hdev)) { rssi = info->data[info->length]; process_adv_report(hdev, info->type, &info->bdaddr, info->bdaddr_type, NULL, 0, HCI_ADV_PHY_1M, 0, rssi, info->data, info->length, false, false, instant); } else { bt_dev_err(hdev, "Dropping invalid advertising data"); } } hci_dev_unlock(hdev); } static u8 ext_evt_type_to_legacy(struct hci_dev *hdev, u16 evt_type) { u16 pdu_type = evt_type & ~LE_EXT_ADV_DATA_STATUS_MASK; if (!pdu_type) return LE_ADV_NONCONN_IND; if (evt_type & LE_EXT_ADV_LEGACY_PDU) { switch (evt_type) { case LE_LEGACY_ADV_IND: return LE_ADV_IND; case LE_LEGACY_ADV_DIRECT_IND: return LE_ADV_DIRECT_IND; case LE_LEGACY_ADV_SCAN_IND: return LE_ADV_SCAN_IND; case LE_LEGACY_NONCONN_IND: return LE_ADV_NONCONN_IND; case LE_LEGACY_SCAN_RSP_ADV: case LE_LEGACY_SCAN_RSP_ADV_SCAN: return LE_ADV_SCAN_RSP; } goto invalid; } if (evt_type & LE_EXT_ADV_CONN_IND) { if (evt_type & LE_EXT_ADV_DIRECT_IND) return LE_ADV_DIRECT_IND; return LE_ADV_IND; } if (evt_type & LE_EXT_ADV_SCAN_RSP) return LE_ADV_SCAN_RSP; if (evt_type & LE_EXT_ADV_SCAN_IND) return LE_ADV_SCAN_IND; if (evt_type & LE_EXT_ADV_DIRECT_IND) return LE_ADV_NONCONN_IND; invalid: bt_dev_err_ratelimited(hdev, "Unknown advertising packet type: 0x%02x", evt_type); return LE_ADV_INVALID; } static void hci_le_ext_adv_report_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_le_ext_adv_report *ev = data; u64 instant = jiffies; if (!ev->num) return; hci_dev_lock(hdev); while (ev->num--) { struct hci_ev_le_ext_adv_info *info; u8 legacy_evt_type; u16 evt_type; info = hci_le_ev_skb_pull(hdev, skb, HCI_EV_LE_EXT_ADV_REPORT, sizeof(*info)); if (!info) break; if (!hci_le_ev_skb_pull(hdev, skb, HCI_EV_LE_EXT_ADV_REPORT, info->length)) break; evt_type = __le16_to_cpu(info->type) & LE_EXT_ADV_EVT_TYPE_MASK; legacy_evt_type = ext_evt_type_to_legacy(hdev, evt_type); if (hci_test_quirk(hdev, HCI_QUIRK_FIXUP_LE_EXT_ADV_REPORT_PHY)) { info->primary_phy &= 0x1f; info->secondary_phy &= 0x1f; } /* Check if PA Sync is pending and if the hci_conn SID has not * been set update it. */ if (hci_dev_test_flag(hdev, HCI_PA_SYNC)) { struct hci_conn *conn; conn = hci_conn_hash_lookup_create_pa_sync(hdev); if (conn && conn->sid == HCI_SID_INVALID) conn->sid = info->sid; } if (legacy_evt_type != LE_ADV_INVALID) { process_adv_report(hdev, legacy_evt_type, &info->bdaddr, info->bdaddr_type, NULL, 0, info->primary_phy, info->secondary_phy, info->rssi, info->data, info->length, !(evt_type & LE_EXT_ADV_LEGACY_PDU), false, instant); } } hci_dev_unlock(hdev); } static int hci_le_pa_term_sync(struct hci_dev *hdev, __le16 handle) { struct hci_cp_le_pa_term_sync cp; memset(&cp, 0, sizeof(cp)); cp.handle = handle; return hci_send_cmd(hdev, HCI_OP_LE_PA_TERM_SYNC, sizeof(cp), &cp); } static void hci_le_pa_sync_established_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_le_pa_sync_established *ev = data; int mask = hdev->link_mode; __u8 flags = 0; struct hci_conn *pa_sync, *conn; bt_dev_dbg(hdev, "status 0x%2.2x", ev->status); hci_dev_lock(hdev); hci_dev_clear_flag(hdev, HCI_PA_SYNC); conn = hci_conn_hash_lookup_create_pa_sync(hdev); if (!conn) { bt_dev_err(hdev, "Unable to find connection for dst %pMR sid 0x%2.2x", &ev->bdaddr, ev->sid); goto unlock; } clear_bit(HCI_CONN_CREATE_PA_SYNC, &conn->flags); conn->sync_handle = le16_to_cpu(ev->handle); conn->sid = HCI_SID_INVALID; mask |= hci_proto_connect_ind(hdev, &ev->bdaddr, PA_LINK, &flags); if (!(mask & HCI_LM_ACCEPT)) { hci_le_pa_term_sync(hdev, ev->handle); goto unlock; } if (!(flags & HCI_PROTO_DEFER)) goto unlock; /* Add connection to indicate PA sync event */ pa_sync = hci_conn_add_unset(hdev, PA_LINK, BDADDR_ANY, HCI_ROLE_SLAVE); if (IS_ERR(pa_sync)) goto unlock; pa_sync->sync_handle = le16_to_cpu(ev->handle); if (ev->status) { set_bit(HCI_CONN_PA_SYNC_FAILED, &pa_sync->flags); /* Notify iso layer */ hci_connect_cfm(pa_sync, ev->status); } unlock: hci_dev_unlock(hdev); } static void hci_le_per_adv_report_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_le_per_adv_report *ev = data; int mask = hdev->link_mode; __u8 flags = 0; struct hci_conn *pa_sync; bt_dev_dbg(hdev, "sync_handle 0x%4.4x", le16_to_cpu(ev->sync_handle)); hci_dev_lock(hdev); mask |= hci_proto_connect_ind(hdev, BDADDR_ANY, PA_LINK, &flags); if (!(mask & HCI_LM_ACCEPT)) goto unlock; if (!(flags & HCI_PROTO_DEFER)) goto unlock; pa_sync = hci_conn_hash_lookup_pa_sync_handle (hdev, le16_to_cpu(ev->sync_handle)); if (!pa_sync) goto unlock; if (ev->data_status == LE_PA_DATA_COMPLETE && !test_and_set_bit(HCI_CONN_PA_SYNC, &pa_sync->flags)) { /* Notify iso layer */ hci_connect_cfm(pa_sync, 0); /* Notify MGMT layer */ mgmt_device_connected(hdev, pa_sync, NULL, 0); } unlock: hci_dev_unlock(hdev); } static void hci_le_remote_feat_complete_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_le_remote_feat_complete *ev = data; struct hci_conn *conn; bt_dev_dbg(hdev, "status 0x%2.2x", ev->status); hci_dev_lock(hdev); conn = hci_conn_hash_lookup_handle(hdev, __le16_to_cpu(ev->handle)); if (conn) { if (!ev->status) memcpy(conn->features[0], ev->features, 8); if (conn->state == BT_CONFIG) { __u8 status; /* If the local controller supports peripheral-initiated * features exchange, but the remote controller does * not, then it is possible that the error code 0x1a * for unsupported remote feature gets returned. * * In this specific case, allow the connection to * transition into connected state and mark it as * successful. */ if (!conn->out && ev->status == HCI_ERROR_UNSUPPORTED_REMOTE_FEATURE && (hdev->le_features[0] & HCI_LE_PERIPHERAL_FEATURES)) status = 0x00; else status = ev->status; conn->state = BT_CONNECTED; hci_connect_cfm(conn, status); hci_conn_drop(conn); } } hci_dev_unlock(hdev); } static void hci_le_ltk_request_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_le_ltk_req *ev = data; struct hci_cp_le_ltk_reply cp; struct hci_cp_le_ltk_neg_reply neg; struct hci_conn *conn; struct smp_ltk *ltk; bt_dev_dbg(hdev, "handle 0x%4.4x", __le16_to_cpu(ev->handle)); hci_dev_lock(hdev); conn = hci_conn_hash_lookup_handle(hdev, __le16_to_cpu(ev->handle)); if (conn == NULL) goto not_found; ltk = hci_find_ltk(hdev, &conn->dst, conn->dst_type, conn->role); if (!ltk) goto not_found; if (smp_ltk_is_sc(ltk)) { /* With SC both EDiv and Rand are set to zero */ if (ev->ediv || ev->rand) goto not_found; } else { /* For non-SC keys check that EDiv and Rand match */ if (ev->ediv != ltk->ediv || ev->rand != ltk->rand) goto not_found; } memcpy(cp.ltk, ltk->val, ltk->enc_size); memset(cp.ltk + ltk->enc_size, 0, sizeof(cp.ltk) - ltk->enc_size); cp.handle = cpu_to_le16(conn->handle); conn->pending_sec_level = smp_ltk_sec_level(ltk); conn->enc_key_size = ltk->enc_size; hci_send_cmd(hdev, HCI_OP_LE_LTK_REPLY, sizeof(cp), &cp); /* Ref. Bluetooth Core SPEC pages 1975 and 2004. STK is a * temporary key used to encrypt a connection following * pairing. It is used during the Encrypted Session Setup to * distribute the keys. Later, security can be re-established * using a distributed LTK. */ if (ltk->type == SMP_STK) { set_bit(HCI_CONN_STK_ENCRYPT, &conn->flags); list_del_rcu(<k->list); kfree_rcu(ltk, rcu); } else { clear_bit(HCI_CONN_STK_ENCRYPT, &conn->flags); } hci_dev_unlock(hdev); return; not_found: neg.handle = ev->handle; hci_send_cmd(hdev, HCI_OP_LE_LTK_NEG_REPLY, sizeof(neg), &neg); hci_dev_unlock(hdev); } static void send_conn_param_neg_reply(struct hci_dev *hdev, u16 handle, u8 reason) { struct hci_cp_le_conn_param_req_neg_reply cp; cp.handle = cpu_to_le16(handle); cp.reason = reason; hci_send_cmd(hdev, HCI_OP_LE_CONN_PARAM_REQ_NEG_REPLY, sizeof(cp), &cp); } static void hci_le_remote_conn_param_req_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_le_remote_conn_param_req *ev = data; struct hci_cp_le_conn_param_req_reply cp; struct hci_conn *hcon; u16 handle, min, max, latency, timeout; bt_dev_dbg(hdev, "handle 0x%4.4x", __le16_to_cpu(ev->handle)); handle = le16_to_cpu(ev->handle); min = le16_to_cpu(ev->interval_min); max = le16_to_cpu(ev->interval_max); latency = le16_to_cpu(ev->latency); timeout = le16_to_cpu(ev->timeout); hcon = hci_conn_hash_lookup_handle(hdev, handle); if (!hcon || hcon->state != BT_CONNECTED) return send_conn_param_neg_reply(hdev, handle, HCI_ERROR_UNKNOWN_CONN_ID); if (max > hcon->le_conn_max_interval) return send_conn_param_neg_reply(hdev, handle, HCI_ERROR_INVALID_LL_PARAMS); if (hci_check_conn_params(min, max, latency, timeout)) return send_conn_param_neg_reply(hdev, handle, HCI_ERROR_INVALID_LL_PARAMS); if (hcon->role == HCI_ROLE_MASTER) { struct hci_conn_params *params; u8 store_hint; hci_dev_lock(hdev); params = hci_conn_params_lookup(hdev, &hcon->dst, hcon->dst_type); if (params) { params->conn_min_interval = min; params->conn_max_interval = max; params->conn_latency = latency; params->supervision_timeout = timeout; store_hint = 0x01; } else { store_hint = 0x00; } hci_dev_unlock(hdev); mgmt_new_conn_param(hdev, &hcon->dst, hcon->dst_type, store_hint, min, max, latency, timeout); } cp.handle = ev->handle; cp.interval_min = ev->interval_min; cp.interval_max = ev->interval_max; cp.latency = ev->latency; cp.timeout = ev->timeout; cp.min_ce_len = 0; cp.max_ce_len = 0; hci_send_cmd(hdev, HCI_OP_LE_CONN_PARAM_REQ_REPLY, sizeof(cp), &cp); } static void hci_le_direct_adv_report_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_le_direct_adv_report *ev = data; u64 instant = jiffies; int i; if (!hci_le_ev_skb_pull(hdev, skb, HCI_EV_LE_DIRECT_ADV_REPORT, flex_array_size(ev, info, ev->num))) return; if (!ev->num) return; hci_dev_lock(hdev); for (i = 0; i < ev->num; i++) { struct hci_ev_le_direct_adv_info *info = &ev->info[i]; process_adv_report(hdev, info->type, &info->bdaddr, info->bdaddr_type, &info->direct_addr, info->direct_addr_type, HCI_ADV_PHY_1M, 0, info->rssi, NULL, 0, false, false, instant); } hci_dev_unlock(hdev); } static void hci_le_phy_update_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_ev_le_phy_update_complete *ev = data; struct hci_conn *conn; bt_dev_dbg(hdev, "status 0x%2.2x", ev->status); if (ev->status) return; hci_dev_lock(hdev); conn = hci_conn_hash_lookup_handle(hdev, __le16_to_cpu(ev->handle)); if (!conn) goto unlock; conn->le_tx_phy = ev->tx_phy; conn->le_rx_phy = ev->rx_phy; unlock: hci_dev_unlock(hdev); } static void hci_le_cis_established_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_evt_le_cis_established *ev = data; struct hci_conn *conn; struct bt_iso_qos *qos; bool pending = false; u16 handle = __le16_to_cpu(ev->handle); u32 c_sdu_interval, p_sdu_interval; bt_dev_dbg(hdev, "status 0x%2.2x", ev->status); hci_dev_lock(hdev); conn = hci_conn_hash_lookup_handle(hdev, handle); if (!conn) { bt_dev_err(hdev, "Unable to find connection with handle 0x%4.4x", handle); goto unlock; } if (conn->type != CIS_LINK) { bt_dev_err(hdev, "Invalid connection link type handle 0x%4.4x", handle); goto unlock; } qos = &conn->iso_qos; pending = test_and_clear_bit(HCI_CONN_CREATE_CIS, &conn->flags); /* BLUETOOTH CORE SPECIFICATION Version 5.4 | Vol 6, Part G * page 3075: * Transport_Latency_C_To_P = CIG_Sync_Delay + (FT_C_To_P) × * ISO_Interval + SDU_Interval_C_To_P * ... * SDU_Interval = (CIG_Sync_Delay + (FT) x ISO_Interval) - * Transport_Latency */ c_sdu_interval = (get_unaligned_le24(ev->cig_sync_delay) + (ev->c_ft * le16_to_cpu(ev->interval) * 1250)) - get_unaligned_le24(ev->c_latency); p_sdu_interval = (get_unaligned_le24(ev->cig_sync_delay) + (ev->p_ft * le16_to_cpu(ev->interval) * 1250)) - get_unaligned_le24(ev->p_latency); switch (conn->role) { case HCI_ROLE_SLAVE: qos->ucast.in.interval = c_sdu_interval; qos->ucast.out.interval = p_sdu_interval; /* Convert Transport Latency (us) to Latency (msec) */ qos->ucast.in.latency = DIV_ROUND_CLOSEST(get_unaligned_le24(ev->c_latency), 1000); qos->ucast.out.latency = DIV_ROUND_CLOSEST(get_unaligned_le24(ev->p_latency), 1000); qos->ucast.in.sdu = ev->c_bn ? le16_to_cpu(ev->c_mtu) : 0; qos->ucast.out.sdu = ev->p_bn ? le16_to_cpu(ev->p_mtu) : 0; qos->ucast.in.phy = ev->c_phy; qos->ucast.out.phy = ev->p_phy; break; case HCI_ROLE_MASTER: qos->ucast.in.interval = p_sdu_interval; qos->ucast.out.interval = c_sdu_interval; /* Convert Transport Latency (us) to Latency (msec) */ qos->ucast.out.latency = DIV_ROUND_CLOSEST(get_unaligned_le24(ev->c_latency), 1000); qos->ucast.in.latency = DIV_ROUND_CLOSEST(get_unaligned_le24(ev->p_latency), 1000); qos->ucast.out.sdu = ev->c_bn ? le16_to_cpu(ev->c_mtu) : 0; qos->ucast.in.sdu = ev->p_bn ? le16_to_cpu(ev->p_mtu) : 0; qos->ucast.out.phy = ev->c_phy; qos->ucast.in.phy = ev->p_phy; break; } if (!ev->status) { conn->state = BT_CONNECTED; hci_debugfs_create_conn(conn); hci_conn_add_sysfs(conn); hci_iso_setup_path(conn); goto unlock; } conn->state = BT_CLOSED; hci_connect_cfm(conn, ev->status); hci_conn_del(conn); unlock: if (pending) hci_le_create_cis_pending(hdev); hci_dev_unlock(hdev); } static void hci_le_reject_cis(struct hci_dev *hdev, __le16 handle) { struct hci_cp_le_reject_cis cp; memset(&cp, 0, sizeof(cp)); cp.handle = handle; cp.reason = HCI_ERROR_REJ_BAD_ADDR; hci_send_cmd(hdev, HCI_OP_LE_REJECT_CIS, sizeof(cp), &cp); } static void hci_le_accept_cis(struct hci_dev *hdev, __le16 handle) { struct hci_cp_le_accept_cis cp; memset(&cp, 0, sizeof(cp)); cp.handle = handle; hci_send_cmd(hdev, HCI_OP_LE_ACCEPT_CIS, sizeof(cp), &cp); } static void hci_le_cis_req_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_evt_le_cis_req *ev = data; u16 acl_handle, cis_handle; struct hci_conn *acl, *cis; int mask; __u8 flags = 0; acl_handle = __le16_to_cpu(ev->acl_handle); cis_handle = __le16_to_cpu(ev->cis_handle); bt_dev_dbg(hdev, "acl 0x%4.4x handle 0x%4.4x cig 0x%2.2x cis 0x%2.2x", acl_handle, cis_handle, ev->cig_id, ev->cis_id); hci_dev_lock(hdev); acl = hci_conn_hash_lookup_handle(hdev, acl_handle); if (!acl) goto unlock; mask = hci_proto_connect_ind(hdev, &acl->dst, CIS_LINK, &flags); if (!(mask & HCI_LM_ACCEPT)) { hci_le_reject_cis(hdev, ev->cis_handle); goto unlock; } cis = hci_conn_hash_lookup_handle(hdev, cis_handle); if (!cis) { cis = hci_conn_add(hdev, CIS_LINK, &acl->dst, HCI_ROLE_SLAVE, cis_handle); if (IS_ERR(cis)) { hci_le_reject_cis(hdev, ev->cis_handle); goto unlock; } } cis->iso_qos.ucast.cig = ev->cig_id; cis->iso_qos.ucast.cis = ev->cis_id; if (!(flags & HCI_PROTO_DEFER)) { hci_le_accept_cis(hdev, ev->cis_handle); } else { cis->state = BT_CONNECT2; hci_connect_cfm(cis, 0); } unlock: hci_dev_unlock(hdev); } static int hci_iso_term_big_sync(struct hci_dev *hdev, void *data) { u8 handle = PTR_UINT(data); return hci_le_terminate_big_sync(hdev, handle, HCI_ERROR_LOCAL_HOST_TERM); } static void hci_le_create_big_complete_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_evt_le_create_big_complete *ev = data; struct hci_conn *conn; __u8 i = 0; BT_DBG("%s status 0x%2.2x", hdev->name, ev->status); if (!hci_le_ev_skb_pull(hdev, skb, HCI_EVT_LE_CREATE_BIG_COMPLETE, flex_array_size(ev, bis_handle, ev->num_bis))) return; hci_dev_lock(hdev); /* Connect all BISes that are bound to the BIG */ while ((conn = hci_conn_hash_lookup_big_state(hdev, ev->handle, BT_BOUND, HCI_ROLE_MASTER))) { if (ev->status) { hci_connect_cfm(conn, ev->status); hci_conn_del(conn); continue; } if (hci_conn_set_handle(conn, __le16_to_cpu(ev->bis_handle[i++]))) continue; conn->state = BT_CONNECTED; set_bit(HCI_CONN_BIG_CREATED, &conn->flags); hci_debugfs_create_conn(conn); hci_conn_add_sysfs(conn); hci_iso_setup_path(conn); } if (!ev->status && !i) /* If no BISes have been connected for the BIG, * terminate. This is in case all bound connections * have been closed before the BIG creation * has completed. */ hci_cmd_sync_queue(hdev, hci_iso_term_big_sync, UINT_PTR(ev->handle), NULL); hci_dev_unlock(hdev); } static void hci_le_big_sync_established_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_evt_le_big_sync_established *ev = data; struct hci_conn *bis, *conn; int i; bt_dev_dbg(hdev, "status 0x%2.2x", ev->status); if (!hci_le_ev_skb_pull(hdev, skb, HCI_EVT_LE_BIG_SYNC_ESTABLISHED, flex_array_size(ev, bis, ev->num_bis))) return; hci_dev_lock(hdev); conn = hci_conn_hash_lookup_big_sync_pend(hdev, ev->handle, ev->num_bis); if (!conn) { bt_dev_err(hdev, "Unable to find connection for big 0x%2.2x", ev->handle); goto unlock; } clear_bit(HCI_CONN_CREATE_BIG_SYNC, &conn->flags); conn->num_bis = 0; memset(conn->bis, 0, sizeof(conn->num_bis)); for (i = 0; i < ev->num_bis; i++) { u16 handle = le16_to_cpu(ev->bis[i]); __le32 interval; bis = hci_conn_hash_lookup_handle(hdev, handle); if (!bis) { if (handle > HCI_CONN_HANDLE_MAX) { bt_dev_dbg(hdev, "ignore too large handle %u", handle); continue; } bis = hci_conn_add(hdev, BIS_LINK, BDADDR_ANY, HCI_ROLE_SLAVE, handle); if (IS_ERR(bis)) continue; } if (ev->status != 0x42) { /* Mark PA sync as established */ set_bit(HCI_CONN_PA_SYNC, &bis->flags); /* Reset cleanup callback of PA Sync so it doesn't * terminate the sync when deleting the connection. */ conn->cleanup = NULL; } bis->sync_handle = conn->sync_handle; bis->iso_qos.bcast.big = ev->handle; memset(&interval, 0, sizeof(interval)); memcpy(&interval, ev->latency, sizeof(ev->latency)); bis->iso_qos.bcast.in.interval = le32_to_cpu(interval); /* Convert ISO Interval (1.25 ms slots) to latency (ms) */ bis->iso_qos.bcast.in.latency = le16_to_cpu(ev->interval) * 125 / 100; bis->iso_qos.bcast.in.sdu = le16_to_cpu(ev->max_pdu); if (!ev->status) { bis->state = BT_CONNECTED; set_bit(HCI_CONN_BIG_SYNC, &bis->flags); hci_debugfs_create_conn(bis); hci_conn_add_sysfs(bis); hci_iso_setup_path(bis); } } /* In case BIG sync failed, notify each failed connection to * the user after all hci connections have been added */ if (ev->status) for (i = 0; i < ev->num_bis; i++) { u16 handle = le16_to_cpu(ev->bis[i]); bis = hci_conn_hash_lookup_handle(hdev, handle); if (!bis) continue; set_bit(HCI_CONN_BIG_SYNC_FAILED, &bis->flags); hci_connect_cfm(bis, ev->status); } unlock: hci_dev_unlock(hdev); } static void hci_le_big_sync_lost_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_evt_le_big_sync_lost *ev = data; struct hci_conn *bis, *conn; bool mgmt_conn; bt_dev_dbg(hdev, "big handle 0x%2.2x", ev->handle); hci_dev_lock(hdev); /* Delete the pa sync connection */ bis = hci_conn_hash_lookup_pa_sync_big_handle(hdev, ev->handle); if (bis) { conn = hci_conn_hash_lookup_pa_sync_handle(hdev, bis->sync_handle); if (conn) hci_conn_del(conn); } /* Delete each bis connection */ while ((bis = hci_conn_hash_lookup_big_state(hdev, ev->handle, BT_CONNECTED, HCI_ROLE_SLAVE))) { mgmt_conn = test_and_clear_bit(HCI_CONN_MGMT_CONNECTED, &bis->flags); mgmt_device_disconnected(hdev, &bis->dst, bis->type, bis->dst_type, ev->reason, mgmt_conn); clear_bit(HCI_CONN_BIG_SYNC, &bis->flags); hci_disconn_cfm(bis, ev->reason); hci_conn_del(bis); } hci_dev_unlock(hdev); } static void hci_le_big_info_adv_report_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb) { struct hci_evt_le_big_info_adv_report *ev = data; int mask = hdev->link_mode; __u8 flags = 0; struct hci_conn *pa_sync; bt_dev_dbg(hdev, "sync_handle 0x%4.4x", le16_to_cpu(ev->sync_handle)); hci_dev_lock(hdev); mask |= hci_proto_connect_ind(hdev, BDADDR_ANY, BIS_LINK, &flags); if (!(mask & HCI_LM_ACCEPT)) goto unlock; if (!(flags & HCI_PROTO_DEFER)) goto unlock; pa_sync = hci_conn_hash_lookup_pa_sync_handle (hdev, le16_to_cpu(ev->sync_handle)); if (!pa_sync) goto unlock; pa_sync->iso_qos.bcast.encryption = ev->encryption; /* Notify iso layer */ hci_connect_cfm(pa_sync, 0); unlock: hci_dev_unlock(hdev); } #define HCI_LE_EV_VL(_op, _func, _min_len, _max_len) \ [_op] = { \ .func = _func, \ .min_len = _min_len, \ .max_len = _max_len, \ } #define HCI_LE_EV(_op, _func, _len) \ HCI_LE_EV_VL(_op, _func, _len, _len) #define HCI_LE_EV_STATUS(_op, _func) \ HCI_LE_EV(_op, _func, sizeof(struct hci_ev_status)) /* Entries in this table shall have their position according to the subevent * opcode they handle so the use of the macros above is recommend since it does * attempt to initialize at its proper index using Designated Initializers that * way events without a callback function can be omitted. */ static const struct hci_le_ev { void (*func)(struct hci_dev *hdev, void *data, struct sk_buff *skb); u16 min_len; u16 max_len; } hci_le_ev_table[U8_MAX + 1] = { /* [0x01 = HCI_EV_LE_CONN_COMPLETE] */ HCI_LE_EV(HCI_EV_LE_CONN_COMPLETE, hci_le_conn_complete_evt, sizeof(struct hci_ev_le_conn_complete)), /* [0x02 = HCI_EV_LE_ADVERTISING_REPORT] */ HCI_LE_EV_VL(HCI_EV_LE_ADVERTISING_REPORT, hci_le_adv_report_evt, sizeof(struct hci_ev_le_advertising_report), HCI_MAX_EVENT_SIZE), /* [0x03 = HCI_EV_LE_CONN_UPDATE_COMPLETE] */ HCI_LE_EV(HCI_EV_LE_CONN_UPDATE_COMPLETE, hci_le_conn_update_complete_evt, sizeof(struct hci_ev_le_conn_update_complete)), /* [0x04 = HCI_EV_LE_REMOTE_FEAT_COMPLETE] */ HCI_LE_EV(HCI_EV_LE_REMOTE_FEAT_COMPLETE, hci_le_remote_feat_complete_evt, sizeof(struct hci_ev_le_remote_feat_complete)), /* [0x05 = HCI_EV_LE_LTK_REQ] */ HCI_LE_EV(HCI_EV_LE_LTK_REQ, hci_le_ltk_request_evt, sizeof(struct hci_ev_le_ltk_req)), /* [0x06 = HCI_EV_LE_REMOTE_CONN_PARAM_REQ] */ HCI_LE_EV(HCI_EV_LE_REMOTE_CONN_PARAM_REQ, hci_le_remote_conn_param_req_evt, sizeof(struct hci_ev_le_remote_conn_param_req)), /* [0x0a = HCI_EV_LE_ENHANCED_CONN_COMPLETE] */ HCI_LE_EV(HCI_EV_LE_ENHANCED_CONN_COMPLETE, hci_le_enh_conn_complete_evt, sizeof(struct hci_ev_le_enh_conn_complete)), /* [0x0b = HCI_EV_LE_DIRECT_ADV_REPORT] */ HCI_LE_EV_VL(HCI_EV_LE_DIRECT_ADV_REPORT, hci_le_direct_adv_report_evt, sizeof(struct hci_ev_le_direct_adv_report), HCI_MAX_EVENT_SIZE), /* [0x0c = HCI_EV_LE_PHY_UPDATE_COMPLETE] */ HCI_LE_EV(HCI_EV_LE_PHY_UPDATE_COMPLETE, hci_le_phy_update_evt, sizeof(struct hci_ev_le_phy_update_complete)), /* [0x0d = HCI_EV_LE_EXT_ADV_REPORT] */ HCI_LE_EV_VL(HCI_EV_LE_EXT_ADV_REPORT, hci_le_ext_adv_report_evt, sizeof(struct hci_ev_le_ext_adv_report), HCI_MAX_EVENT_SIZE), /* [0x0e = HCI_EV_LE_PA_SYNC_ESTABLISHED] */ HCI_LE_EV(HCI_EV_LE_PA_SYNC_ESTABLISHED, hci_le_pa_sync_established_evt, sizeof(struct hci_ev_le_pa_sync_established)), /* [0x0f = HCI_EV_LE_PER_ADV_REPORT] */ HCI_LE_EV_VL(HCI_EV_LE_PER_ADV_REPORT, hci_le_per_adv_report_evt, sizeof(struct hci_ev_le_per_adv_report), HCI_MAX_EVENT_SIZE), /* [0x12 = HCI_EV_LE_EXT_ADV_SET_TERM] */ HCI_LE_EV(HCI_EV_LE_EXT_ADV_SET_TERM, hci_le_ext_adv_term_evt, sizeof(struct hci_evt_le_ext_adv_set_term)), /* [0x19 = HCI_EVT_LE_CIS_ESTABLISHED] */ HCI_LE_EV(HCI_EVT_LE_CIS_ESTABLISHED, hci_le_cis_established_evt, sizeof(struct hci_evt_le_cis_established)), /* [0x1a = HCI_EVT_LE_CIS_REQ] */ HCI_LE_EV(HCI_EVT_LE_CIS_REQ, hci_le_cis_req_evt, sizeof(struct hci_evt_le_cis_req)), /* [0x1b = HCI_EVT_LE_CREATE_BIG_COMPLETE] */ HCI_LE_EV_VL(HCI_EVT_LE_CREATE_BIG_COMPLETE, hci_le_create_big_complete_evt, sizeof(struct hci_evt_le_create_big_complete), HCI_MAX_EVENT_SIZE), /* [0x1d = HCI_EV_LE_BIG_SYNC_ESTABLISHED] */ HCI_LE_EV_VL(HCI_EVT_LE_BIG_SYNC_ESTABLISHED, hci_le_big_sync_established_evt, sizeof(struct hci_evt_le_big_sync_established), HCI_MAX_EVENT_SIZE), /* [0x1e = HCI_EVT_LE_BIG_SYNC_LOST] */ HCI_LE_EV_VL(HCI_EVT_LE_BIG_SYNC_LOST, hci_le_big_sync_lost_evt, sizeof(struct hci_evt_le_big_sync_lost), HCI_MAX_EVENT_SIZE), /* [0x22 = HCI_EVT_LE_BIG_INFO_ADV_REPORT] */ HCI_LE_EV_VL(HCI_EVT_LE_BIG_INFO_ADV_REPORT, hci_le_big_info_adv_report_evt, sizeof(struct hci_evt_le_big_info_adv_report), HCI_MAX_EVENT_SIZE), }; static void hci_le_meta_evt(struct hci_dev *hdev, void *data, struct sk_buff *skb, u16 *opcode, u8 *status, hci_req_complete_t *req_complete, hci_req_complete_skb_t *req_complete_skb) { struct hci_ev_le_meta *ev = data; const struct hci_le_ev *subev; bt_dev_dbg(hdev, "subevent 0x%2.2x", ev->subevent); /* Only match event if command OGF is for LE */ if (hdev->req_skb && (hci_opcode_ogf(hci_skb_opcode(hdev->req_skb)) == 0x08 || hci_skb_opcode(hdev->req_skb) == HCI_OP_NOP) && hci_skb_event(hdev->req_skb) == ev->subevent) { *opcode = hci_skb_opcode(hdev->req_skb); hci_req_cmd_complete(hdev, *opcode, 0x00, req_complete, req_complete_skb); } subev = &hci_le_ev_table[ev->subevent]; if (!subev->func) return; if (skb->len < subev->min_len) { bt_dev_err(hdev, "unexpected subevent 0x%2.2x length: %u < %u", ev->subevent, skb->len, subev->min_len); return; } /* Just warn if the length is over max_len size it still be * possible to partially parse the event so leave to callback to * decide if that is acceptable. */ if (skb->len > subev->max_len) bt_dev_warn(hdev, "unexpected subevent 0x%2.2x length: %u > %u", ev->subevent, skb->len, subev->max_len); data = hci_le_ev_skb_pull(hdev, skb, ev->subevent, subev->min_len); if (!data) return; subev->func(hdev, data, skb); } static bool hci_get_cmd_complete(struct hci_dev *hdev, u16 opcode, u8 event, struct sk_buff *skb) { struct hci_ev_cmd_complete *ev; struct hci_event_hdr *hdr; if (!skb) return false; hdr = hci_ev_skb_pull(hdev, skb, event, sizeof(*hdr)); if (!hdr) return false; if (event) { if (hdr->evt != event) return false; return true; } /* Check if request ended in Command Status - no way to retrieve * any extra parameters in this case. */ if (hdr->evt == HCI_EV_CMD_STATUS) return false; if (hdr->evt != HCI_EV_CMD_COMPLETE) { bt_dev_err(hdev, "last event is not cmd complete (0x%2.2x)", hdr->evt); return false; } ev = hci_cc_skb_pull(hdev, skb, opcode, sizeof(*ev)); if (!ev) return false; if (opcode != __le16_to_cpu(ev->opcode)) { BT_DBG("opcode doesn't match (0x%2.2x != 0x%2.2x)", opcode, __le16_to_cpu(ev->opcode)); return false; } return true; } static void hci_store_wake_reason(struct hci_dev *hdev, u8 event, struct sk_buff *skb) { struct hci_ev_le_advertising_info *adv; struct hci_ev_le_direct_adv_info *direct_adv; struct hci_ev_le_ext_adv_info *ext_adv; const struct hci_ev_conn_complete *conn_complete = (void *)skb->data; const struct hci_ev_conn_request *conn_request = (void *)skb->data; hci_dev_lock(hdev); /* If we are currently suspended and this is the first BT event seen, * save the wake reason associated with the event. */ if (!hdev->suspended || hdev->wake_reason) goto unlock; /* Default to remote wake. Values for wake_reason are documented in the * Bluez mgmt api docs. */ hdev->wake_reason = MGMT_WAKE_REASON_REMOTE_WAKE; /* Once configured for remote wakeup, we should only wake up for * reconnections. It's useful to see which device is waking us up so * keep track of the bdaddr of the connection event that woke us up. */ if (event == HCI_EV_CONN_REQUEST) { bacpy(&hdev->wake_addr, &conn_request->bdaddr); hdev->wake_addr_type = BDADDR_BREDR; } else if (event == HCI_EV_CONN_COMPLETE) { bacpy(&hdev->wake_addr, &conn_complete->bdaddr); hdev->wake_addr_type = BDADDR_BREDR; } else if (event == HCI_EV_LE_META) { struct hci_ev_le_meta *le_ev = (void *)skb->data; u8 subevent = le_ev->subevent; u8 *ptr = &skb->data[sizeof(*le_ev)]; u8 num_reports = *ptr; if ((subevent == HCI_EV_LE_ADVERTISING_REPORT || subevent == HCI_EV_LE_DIRECT_ADV_REPORT || subevent == HCI_EV_LE_EXT_ADV_REPORT) && num_reports) { adv = (void *)(ptr + 1); direct_adv = (void *)(ptr + 1); ext_adv = (void *)(ptr + 1); switch (subevent) { case HCI_EV_LE_ADVERTISING_REPORT: bacpy(&hdev->wake_addr, &adv->bdaddr); hdev->wake_addr_type = adv->bdaddr_type; break; case HCI_EV_LE_DIRECT_ADV_REPORT: bacpy(&hdev->wake_addr, &direct_adv->bdaddr); hdev->wake_addr_type = direct_adv->bdaddr_type; break; case HCI_EV_LE_EXT_ADV_REPORT: bacpy(&hdev->wake_addr, &ext_adv->bdaddr); hdev->wake_addr_type = ext_adv->bdaddr_type; break; } } } else { hdev->wake_reason = MGMT_WAKE_REASON_UNEXPECTED; } unlock: hci_dev_unlock(hdev); } #define HCI_EV_VL(_op, _func, _min_len, _max_len) \ [_op] = { \ .req = false, \ .func = _func, \ .min_len = _min_len, \ .max_len = _max_len, \ } #define HCI_EV(_op, _func, _len) \ HCI_EV_VL(_op, _func, _len, _len) #define HCI_EV_STATUS(_op, _func) \ HCI_EV(_op, _func, sizeof(struct hci_ev_status)) #define HCI_EV_REQ_VL(_op, _func, _min_len, _max_len) \ [_op] = { \ .req = true, \ .func_req = _func, \ .min_len = _min_len, \ .max_len = _max_len, \ } #define HCI_EV_REQ(_op, _func, _len) \ HCI_EV_REQ_VL(_op, _func, _len, _len) /* Entries in this table shall have their position according to the event opcode * they handle so the use of the macros above is recommend since it does attempt * to initialize at its proper index using Designated Initializers that way * events without a callback function don't have entered. */ static const struct hci_ev { bool req; union { void (*func)(struct hci_dev *hdev, void *data, struct sk_buff *skb); void (*func_req)(struct hci_dev *hdev, void *data, struct sk_buff *skb, u16 *opcode, u8 *status, hci_req_complete_t *req_complete, hci_req_complete_skb_t *req_complete_skb); }; u16 min_len; u16 max_len; } hci_ev_table[U8_MAX + 1] = { /* [0x01 = HCI_EV_INQUIRY_COMPLETE] */ HCI_EV_STATUS(HCI_EV_INQUIRY_COMPLETE, hci_inquiry_complete_evt), /* [0x02 = HCI_EV_INQUIRY_RESULT] */ HCI_EV_VL(HCI_EV_INQUIRY_RESULT, hci_inquiry_result_evt, sizeof(struct hci_ev_inquiry_result), HCI_MAX_EVENT_SIZE), /* [0x03 = HCI_EV_CONN_COMPLETE] */ HCI_EV(HCI_EV_CONN_COMPLETE, hci_conn_complete_evt, sizeof(struct hci_ev_conn_complete)), /* [0x04 = HCI_EV_CONN_REQUEST] */ HCI_EV(HCI_EV_CONN_REQUEST, hci_conn_request_evt, sizeof(struct hci_ev_conn_request)), /* [0x05 = HCI_EV_DISCONN_COMPLETE] */ HCI_EV(HCI_EV_DISCONN_COMPLETE, hci_disconn_complete_evt, sizeof(struct hci_ev_disconn_complete)), /* [0x06 = HCI_EV_AUTH_COMPLETE] */ HCI_EV(HCI_EV_AUTH_COMPLETE, hci_auth_complete_evt, sizeof(struct hci_ev_auth_complete)), /* [0x07 = HCI_EV_REMOTE_NAME] */ HCI_EV(HCI_EV_REMOTE_NAME, hci_remote_name_evt, sizeof(struct hci_ev_remote_name)), /* [0x08 = HCI_EV_ENCRYPT_CHANGE] */ HCI_EV(HCI_EV_ENCRYPT_CHANGE, hci_encrypt_change_evt, sizeof(struct hci_ev_encrypt_change)), /* [0x09 = HCI_EV_CHANGE_LINK_KEY_COMPLETE] */ HCI_EV(HCI_EV_CHANGE_LINK_KEY_COMPLETE, hci_change_link_key_complete_evt, sizeof(struct hci_ev_change_link_key_complete)), /* [0x0b = HCI_EV_REMOTE_FEATURES] */ HCI_EV(HCI_EV_REMOTE_FEATURES, hci_remote_features_evt, sizeof(struct hci_ev_remote_features)), /* [0x0e = HCI_EV_CMD_COMPLETE] */ HCI_EV_REQ_VL(HCI_EV_CMD_COMPLETE, hci_cmd_complete_evt, sizeof(struct hci_ev_cmd_complete), HCI_MAX_EVENT_SIZE), /* [0x0f = HCI_EV_CMD_STATUS] */ HCI_EV_REQ(HCI_EV_CMD_STATUS, hci_cmd_status_evt, sizeof(struct hci_ev_cmd_status)), /* [0x10 = HCI_EV_CMD_STATUS] */ HCI_EV(HCI_EV_HARDWARE_ERROR, hci_hardware_error_evt, sizeof(struct hci_ev_hardware_error)), /* [0x12 = HCI_EV_ROLE_CHANGE] */ HCI_EV(HCI_EV_ROLE_CHANGE, hci_role_change_evt, sizeof(struct hci_ev_role_change)), /* [0x13 = HCI_EV_NUM_COMP_PKTS] */ HCI_EV_VL(HCI_EV_NUM_COMP_PKTS, hci_num_comp_pkts_evt, sizeof(struct hci_ev_num_comp_pkts), HCI_MAX_EVENT_SIZE), /* [0x14 = HCI_EV_MODE_CHANGE] */ HCI_EV(HCI_EV_MODE_CHANGE, hci_mode_change_evt, sizeof(struct hci_ev_mode_change)), /* [0x16 = HCI_EV_PIN_CODE_REQ] */ HCI_EV(HCI_EV_PIN_CODE_REQ, hci_pin_code_request_evt, sizeof(struct hci_ev_pin_code_req)), /* [0x17 = HCI_EV_LINK_KEY_REQ] */ HCI_EV(HCI_EV_LINK_KEY_REQ, hci_link_key_request_evt, sizeof(struct hci_ev_link_key_req)), /* [0x18 = HCI_EV_LINK_KEY_NOTIFY] */ HCI_EV(HCI_EV_LINK_KEY_NOTIFY, hci_link_key_notify_evt, sizeof(struct hci_ev_link_key_notify)), /* [0x1c = HCI_EV_CLOCK_OFFSET] */ HCI_EV(HCI_EV_CLOCK_OFFSET, hci_clock_offset_evt, sizeof(struct hci_ev_clock_offset)), /* [0x1d = HCI_EV_PKT_TYPE_CHANGE] */ HCI_EV(HCI_EV_PKT_TYPE_CHANGE, hci_pkt_type_change_evt, sizeof(struct hci_ev_pkt_type_change)), /* [0x20 = HCI_EV_PSCAN_REP_MODE] */ HCI_EV(HCI_EV_PSCAN_REP_MODE, hci_pscan_rep_mode_evt, sizeof(struct hci_ev_pscan_rep_mode)), /* [0x22 = HCI_EV_INQUIRY_RESULT_WITH_RSSI] */ HCI_EV_VL(HCI_EV_INQUIRY_RESULT_WITH_RSSI, hci_inquiry_result_with_rssi_evt, sizeof(struct hci_ev_inquiry_result_rssi), HCI_MAX_EVENT_SIZE), /* [0x23 = HCI_EV_REMOTE_EXT_FEATURES] */ HCI_EV(HCI_EV_REMOTE_EXT_FEATURES, hci_remote_ext_features_evt, sizeof(struct hci_ev_remote_ext_features)), /* [0x2c = HCI_EV_SYNC_CONN_COMPLETE] */ HCI_EV(HCI_EV_SYNC_CONN_COMPLETE, hci_sync_conn_complete_evt, sizeof(struct hci_ev_sync_conn_complete)), /* [0x2f = HCI_EV_EXTENDED_INQUIRY_RESULT] */ HCI_EV_VL(HCI_EV_EXTENDED_INQUIRY_RESULT, hci_extended_inquiry_result_evt, sizeof(struct hci_ev_ext_inquiry_result), HCI_MAX_EVENT_SIZE), /* [0x30 = HCI_EV_KEY_REFRESH_COMPLETE] */ HCI_EV(HCI_EV_KEY_REFRESH_COMPLETE, hci_key_refresh_complete_evt, sizeof(struct hci_ev_key_refresh_complete)), /* [0x31 = HCI_EV_IO_CAPA_REQUEST] */ HCI_EV(HCI_EV_IO_CAPA_REQUEST, hci_io_capa_request_evt, sizeof(struct hci_ev_io_capa_request)), /* [0x32 = HCI_EV_IO_CAPA_REPLY] */ HCI_EV(HCI_EV_IO_CAPA_REPLY, hci_io_capa_reply_evt, sizeof(struct hci_ev_io_capa_reply)), /* [0x33 = HCI_EV_USER_CONFIRM_REQUEST] */ HCI_EV(HCI_EV_USER_CONFIRM_REQUEST, hci_user_confirm_request_evt, sizeof(struct hci_ev_user_confirm_req)), /* [0x34 = HCI_EV_USER_PASSKEY_REQUEST] */ HCI_EV(HCI_EV_USER_PASSKEY_REQUEST, hci_user_passkey_request_evt, sizeof(struct hci_ev_user_passkey_req)), /* [0x35 = HCI_EV_REMOTE_OOB_DATA_REQUEST] */ HCI_EV(HCI_EV_REMOTE_OOB_DATA_REQUEST, hci_remote_oob_data_request_evt, sizeof(struct hci_ev_remote_oob_data_request)), /* [0x36 = HCI_EV_SIMPLE_PAIR_COMPLETE] */ HCI_EV(HCI_EV_SIMPLE_PAIR_COMPLETE, hci_simple_pair_complete_evt, sizeof(struct hci_ev_simple_pair_complete)), /* [0x3b = HCI_EV_USER_PASSKEY_NOTIFY] */ HCI_EV(HCI_EV_USER_PASSKEY_NOTIFY, hci_user_passkey_notify_evt, sizeof(struct hci_ev_user_passkey_notify)), /* [0x3c = HCI_EV_KEYPRESS_NOTIFY] */ HCI_EV(HCI_EV_KEYPRESS_NOTIFY, hci_keypress_notify_evt, sizeof(struct hci_ev_keypress_notify)), /* [0x3d = HCI_EV_REMOTE_HOST_FEATURES] */ HCI_EV(HCI_EV_REMOTE_HOST_FEATURES, hci_remote_host_features_evt, sizeof(struct hci_ev_remote_host_features)), /* [0x3e = HCI_EV_LE_META] */ HCI_EV_REQ_VL(HCI_EV_LE_META, hci_le_meta_evt, sizeof(struct hci_ev_le_meta), HCI_MAX_EVENT_SIZE), /* [0xff = HCI_EV_VENDOR] */ HCI_EV_VL(HCI_EV_VENDOR, msft_vendor_evt, 0, HCI_MAX_EVENT_SIZE), }; static void hci_event_func(struct hci_dev *hdev, u8 event, struct sk_buff *skb, u16 *opcode, u8 *status, hci_req_complete_t *req_complete, hci_req_complete_skb_t *req_complete_skb) { const struct hci_ev *ev = &hci_ev_table[event]; void *data; if (!ev->func) return; if (skb->len < ev->min_len) { bt_dev_err(hdev, "unexpected event 0x%2.2x length: %u < %u", event, skb->len, ev->min_len); return; } /* Just warn if the length is over max_len size it still be * possible to partially parse the event so leave to callback to * decide if that is acceptable. */ if (skb->len > ev->max_len) bt_dev_warn_ratelimited(hdev, "unexpected event 0x%2.2x length: %u > %u", event, skb->len, ev->max_len); data = hci_ev_skb_pull(hdev, skb, event, ev->min_len); if (!data) return; if (ev->req) ev->func_req(hdev, data, skb, opcode, status, req_complete, req_complete_skb); else ev->func(hdev, data, skb); } void hci_event_packet(struct hci_dev *hdev, struct sk_buff *skb) { struct hci_event_hdr *hdr = (void *) skb->data; hci_req_complete_t req_complete = NULL; hci_req_complete_skb_t req_complete_skb = NULL; struct sk_buff *orig_skb = NULL; u8 status = 0, event, req_evt = 0; u16 opcode = HCI_OP_NOP; if (skb->len < sizeof(*hdr)) { bt_dev_err(hdev, "Malformed HCI Event"); goto done; } hci_dev_lock(hdev); kfree_skb(hdev->recv_event); hdev->recv_event = skb_clone(skb, GFP_KERNEL); hci_dev_unlock(hdev); event = hdr->evt; if (!event) { bt_dev_warn(hdev, "Received unexpected HCI Event 0x%2.2x", event); goto done; } /* Only match event if command OGF is not for LE */ if (hdev->req_skb && hci_opcode_ogf(hci_skb_opcode(hdev->req_skb)) != 0x08 && hci_skb_event(hdev->req_skb) == event) { hci_req_cmd_complete(hdev, hci_skb_opcode(hdev->req_skb), status, &req_complete, &req_complete_skb); req_evt = event; } /* If it looks like we might end up having to call * req_complete_skb, store a pristine copy of the skb since the * various handlers may modify the original one through * skb_pull() calls, etc. */ if (req_complete_skb || event == HCI_EV_CMD_STATUS || event == HCI_EV_CMD_COMPLETE) orig_skb = skb_clone(skb, GFP_KERNEL); skb_pull(skb, HCI_EVENT_HDR_SIZE); /* Store wake reason if we're suspended */ hci_store_wake_reason(hdev, event, skb); bt_dev_dbg(hdev, "event 0x%2.2x", event); hci_event_func(hdev, event, skb, &opcode, &status, &req_complete, &req_complete_skb); if (req_complete) { req_complete(hdev, status, opcode); } else if (req_complete_skb) { if (!hci_get_cmd_complete(hdev, opcode, req_evt, orig_skb)) { kfree_skb(orig_skb); orig_skb = NULL; } req_complete_skb(hdev, status, opcode, orig_skb); } done: kfree_skb(orig_skb); kfree_skb(skb); hdev->stat.evt_rx++; } |
| 249 1 2 47 2 47 2 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 | /* SPDX-License-Identifier: GPL-2.0-or-later */ /* * INET An implementation of the TCP/IP protocol suite for the LINUX * operating system. INET is implemented using the BSD Socket * interface as the means of communication with the user level. * * Definitions for the UDP protocol. * * Version: @(#)udp.h 1.0.2 04/28/93 * * Author: Fred N. van Kempen, <waltje@uWalt.NL.Mugnet.ORG> */ #ifndef _LINUX_UDP_H #define _LINUX_UDP_H #include <net/inet_sock.h> #include <linux/skbuff.h> #include <net/netns/hash.h> #include <uapi/linux/udp.h> static inline struct udphdr *udp_hdr(const struct sk_buff *skb) { return (struct udphdr *)skb_transport_header(skb); } #define UDP_HTABLE_SIZE_MIN_PERNET 128 #define UDP_HTABLE_SIZE_MIN (IS_ENABLED(CONFIG_BASE_SMALL) ? 128 : 256) #define UDP_HTABLE_SIZE_MAX 65536 static inline u32 udp_hashfn(const struct net *net, u32 num, u32 mask) { return (num + net_hash_mix(net)) & mask; } enum { UDP_FLAGS_CORK, /* Cork is required */ UDP_FLAGS_NO_CHECK6_TX, /* Send zero UDP6 checksums on TX? */ UDP_FLAGS_NO_CHECK6_RX, /* Allow zero UDP6 checksums on RX? */ UDP_FLAGS_GRO_ENABLED, /* Request GRO aggregation */ UDP_FLAGS_ACCEPT_FRAGLIST, UDP_FLAGS_ACCEPT_L4, UDP_FLAGS_ENCAP_ENABLED, /* This socket enabled encap */ UDP_FLAGS_UDPLITE_SEND_CC, /* set via udplite setsockopt */ UDP_FLAGS_UDPLITE_RECV_CC, /* set via udplite setsockopt */ }; /* per NUMA structure for lockless producer usage. */ struct udp_prod_queue { struct llist_head ll_root ____cacheline_aligned_in_smp; atomic_t rmem_alloc; }; struct udp_sock { /* inet_sock has to be the first member */ struct inet_sock inet; #define udp_port_hash inet.sk.__sk_common.skc_u16hashes[0] #define udp_portaddr_hash inet.sk.__sk_common.skc_u16hashes[1] #define udp_portaddr_node inet.sk.__sk_common.skc_portaddr_node unsigned long udp_flags; int pending; /* Any pending frames ? */ __u8 encap_type; /* Is this an Encapsulation socket? */ #if !IS_ENABLED(CONFIG_BASE_SMALL) /* For UDP 4-tuple hash */ __u16 udp_lrpa_hash; struct hlist_nulls_node udp_lrpa_node; #endif /* * Following member retains the information to create a UDP header * when the socket is uncorked. */ __u16 len; /* total length of pending frames */ __u16 gso_size; /* * Fields specific to UDP-Lite. */ __u16 pcslen; __u16 pcrlen; /* * For encapsulation sockets. */ int (*encap_rcv)(struct sock *sk, struct sk_buff *skb); void (*encap_err_rcv)(struct sock *sk, struct sk_buff *skb, int err, __be16 port, u32 info, u8 *payload); int (*encap_err_lookup)(struct sock *sk, struct sk_buff *skb); void (*encap_destroy)(struct sock *sk); /* GRO functions for UDP socket */ struct sk_buff * (*gro_receive)(struct sock *sk, struct list_head *head, struct sk_buff *skb); int (*gro_complete)(struct sock *sk, struct sk_buff *skb, int nhoff); struct udp_prod_queue *udp_prod_queue; /* udp_recvmsg try to use this before splicing sk_receive_queue */ struct sk_buff_head reader_queue ____cacheline_aligned_in_smp; /* This field is dirtied by udp_recvmsg() */ int forward_deficit; /* This fields follows rcvbuf value, and is touched by udp_recvmsg */ int forward_threshold; /* Cache friendly copy of sk->sk_peek_off >= 0 */ bool peeking_with_offset; /* * Accounting for the tunnel GRO fastpath. * Unprotected by compilers guard, as it uses space available in * the last UDP socket cacheline. */ struct hlist_node tunnel_list; struct numa_drop_counters drop_counters; }; #define udp_test_bit(nr, sk) \ test_bit(UDP_FLAGS_##nr, &udp_sk(sk)->udp_flags) #define udp_set_bit(nr, sk) \ set_bit(UDP_FLAGS_##nr, &udp_sk(sk)->udp_flags) #define udp_test_and_set_bit(nr, sk) \ test_and_set_bit(UDP_FLAGS_##nr, &udp_sk(sk)->udp_flags) #define udp_clear_bit(nr, sk) \ clear_bit(UDP_FLAGS_##nr, &udp_sk(sk)->udp_flags) #define udp_assign_bit(nr, sk, val) \ assign_bit(UDP_FLAGS_##nr, &udp_sk(sk)->udp_flags, val) #define UDP_MAX_SEGMENTS (1 << 7UL) #define udp_sk(ptr) container_of_const(ptr, struct udp_sock, inet.sk) static inline int udp_set_peek_off(struct sock *sk, int val) { sk_set_peek_off(sk, val); WRITE_ONCE(udp_sk(sk)->peeking_with_offset, val >= 0); return 0; } static inline void udp_set_no_check6_tx(struct sock *sk, bool val) { udp_assign_bit(NO_CHECK6_TX, sk, val); } static inline void udp_set_no_check6_rx(struct sock *sk, bool val) { udp_assign_bit(NO_CHECK6_RX, sk, val); } static inline bool udp_get_no_check6_tx(const struct sock *sk) { return udp_test_bit(NO_CHECK6_TX, sk); } static inline bool udp_get_no_check6_rx(const struct sock *sk) { return udp_test_bit(NO_CHECK6_RX, sk); } static inline void udp_cmsg_recv(struct msghdr *msg, struct sock *sk, struct sk_buff *skb) { int gso_size; if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4) { gso_size = skb_shinfo(skb)->gso_size; put_cmsg(msg, SOL_UDP, UDP_GRO, sizeof(gso_size), &gso_size); } } DECLARE_STATIC_KEY_FALSE(udp_encap_needed_key); #if IS_ENABLED(CONFIG_IPV6) DECLARE_STATIC_KEY_FALSE(udpv6_encap_needed_key); #endif static inline bool udp_encap_needed(void) { if (static_branch_unlikely(&udp_encap_needed_key)) return true; #if IS_ENABLED(CONFIG_IPV6) if (static_branch_unlikely(&udpv6_encap_needed_key)) return true; #endif return false; } static inline bool udp_unexpected_gso(struct sock *sk, struct sk_buff *skb) { if (!skb_is_gso(skb)) return false; if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4 && !udp_test_bit(ACCEPT_L4, sk)) return true; if (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST && !udp_test_bit(ACCEPT_FRAGLIST, sk)) return true; /* GSO packets lacking the SKB_GSO_UDP_TUNNEL/_CSUM bits might still * land in a tunnel as the socket check in udp_gro_receive cannot be * foolproof. */ if (udp_encap_needed() && READ_ONCE(udp_sk(sk)->encap_rcv) && !(skb_shinfo(skb)->gso_type & (SKB_GSO_UDP_TUNNEL | SKB_GSO_UDP_TUNNEL_CSUM))) return true; return false; } static inline void udp_allow_gso(struct sock *sk) { udp_set_bit(ACCEPT_L4, sk); udp_set_bit(ACCEPT_FRAGLIST, sk); } #define udp_portaddr_for_each_entry(__sk, list) \ hlist_for_each_entry(__sk, list, __sk_common.skc_portaddr_node) #define udp_portaddr_for_each_entry_from(__sk) \ hlist_for_each_entry_from(__sk, __sk_common.skc_portaddr_node) #define udp_portaddr_for_each_entry_rcu(__sk, list) \ hlist_for_each_entry_rcu(__sk, list, __sk_common.skc_portaddr_node) #if !IS_ENABLED(CONFIG_BASE_SMALL) #define udp_lrpa_for_each_entry_rcu(__up, node, list) \ hlist_nulls_for_each_entry_rcu(__up, node, list, udp_lrpa_node) #endif #define IS_UDPLITE(__sk) (__sk->sk_protocol == IPPROTO_UDPLITE) static inline struct sock *udp_tunnel_sk(const struct net *net, bool is_ipv6) { #if IS_ENABLED(CONFIG_NET_UDP_TUNNEL) return rcu_dereference(net->ipv4.udp_tunnel_gro[is_ipv6].sk); #else return NULL; #endif } #endif /* _LINUX_UDP_H */ |
| 137 137 138 137 137 1 138 139 139 138 6 6 6 2 18 18 10 8 18 18 6 6 6 7 14 2 6 6 17 7 10 61 2 6 3 60 52 7 78 78 17 3 6 47 5 4 14 29 14 14 14 14 19 21 1 11 32 14 7 7 4 14 31 13 17 21 14 17 2 2 17 14 19 27 17 5 12 12 1 1 5 1 6 1 6 1 1 4 1 5 5 11 33 33 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 | // SPDX-License-Identifier: GPL-2.0-or-later /* * INET An implementation of the TCP/IP protocol suite for the LINUX * operating system. INET is implemented using the BSD Socket * interface as the means of communication with the user level. * * "Ping" sockets * * Based on ipv4/udp.c code. * * Authors: Vasiliy Kulikov / Openwall (for Linux 2.6), * Pavel Kankovsky (for Linux 2.4.32) * * Pavel gave all rights to bugs to Vasiliy, * none of the bugs are Pavel's now. */ #include <linux/uaccess.h> #include <linux/types.h> #include <linux/fcntl.h> #include <linux/socket.h> #include <linux/sockios.h> #include <linux/in.h> #include <linux/errno.h> #include <linux/timer.h> #include <linux/mm.h> #include <linux/inet.h> #include <linux/netdevice.h> #include <net/snmp.h> #include <net/ip.h> #include <net/icmp.h> #include <net/protocol.h> #include <linux/skbuff.h> #include <linux/proc_fs.h> #include <linux/export.h> #include <linux/bpf-cgroup.h> #include <net/sock.h> #include <net/ping.h> #include <net/udp.h> #include <net/route.h> #include <net/inet_common.h> #include <net/checksum.h> #if IS_ENABLED(CONFIG_IPV6) #include <linux/in6.h> #include <linux/icmpv6.h> #include <net/addrconf.h> #include <net/ipv6.h> #include <net/transp_v6.h> #endif struct ping_table { struct hlist_head hash[PING_HTABLE_SIZE]; spinlock_t lock; }; static struct ping_table ping_table; struct pingv6_ops pingv6_ops; EXPORT_IPV6_MOD_GPL(pingv6_ops); static inline u32 ping_hashfn(const struct net *net, u32 num, u32 mask) { u32 res = (num + net_hash_mix(net)) & mask; pr_debug("hash(%u) = %u\n", num, res); return res; } static inline struct hlist_head *ping_hashslot(struct ping_table *table, struct net *net, unsigned int num) { return &table->hash[ping_hashfn(net, num, PING_HTABLE_MASK)]; } int ping_get_port(struct sock *sk, unsigned short ident) { struct net *net = sock_net(sk); struct inet_sock *isk, *isk2; struct hlist_head *hlist; struct sock *sk2 = NULL; isk = inet_sk(sk); spin_lock(&ping_table.lock); if (ident == 0) { u16 result = net->ipv4.ping_port_rover + 1; u32 i; for (i = 0; i < (1L << 16); i++, result++) { if (!result) continue; /* avoid zero */ hlist = ping_hashslot(&ping_table, net, result); sk_for_each(sk2, hlist) { if (!net_eq(sock_net(sk2), net)) continue; isk2 = inet_sk(sk2); if (isk2->inet_num == result) goto next_port; } /* found */ net->ipv4.ping_port_rover = ident = result; break; next_port: ; } if (i >= (1L << 16)) goto fail; } else { hlist = ping_hashslot(&ping_table, net, ident); sk_for_each(sk2, hlist) { if (!net_eq(sock_net(sk2), net)) continue; isk2 = inet_sk(sk2); /* BUG? Why is this reuse and not reuseaddr? ping.c * doesn't turn off SO_REUSEADDR, and it doesn't expect * that other ping processes can steal its packets. */ if ((isk2->inet_num == ident) && (sk2 != sk) && (!sk2->sk_reuse || !sk->sk_reuse)) goto fail; } } pr_debug("found port/ident = %d\n", ident); isk->inet_num = ident; if (sk_unhashed(sk)) { pr_debug("was not hashed\n"); sk_add_node_rcu(sk, hlist); sock_set_flag(sk, SOCK_RCU_FREE); sock_prot_inuse_add(net, sk->sk_prot, 1); } spin_unlock(&ping_table.lock); return 0; fail: spin_unlock(&ping_table.lock); return -EADDRINUSE; } EXPORT_IPV6_MOD_GPL(ping_get_port); void ping_unhash(struct sock *sk) { struct inet_sock *isk = inet_sk(sk); pr_debug("ping_unhash(isk=%p,isk->num=%u)\n", isk, isk->inet_num); spin_lock(&ping_table.lock); if (sk_del_node_init_rcu(sk)) { isk->inet_num = 0; isk->inet_sport = 0; sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1); } spin_unlock(&ping_table.lock); } EXPORT_IPV6_MOD_GPL(ping_unhash); /* Called under rcu_read_lock() */ static struct sock *ping_lookup(struct net *net, struct sk_buff *skb, u16 ident) { struct hlist_head *hslot = ping_hashslot(&ping_table, net, ident); struct sock *sk = NULL; struct inet_sock *isk; int dif, sdif; if (skb->protocol == htons(ETH_P_IP)) { dif = inet_iif(skb); sdif = inet_sdif(skb); pr_debug("try to find: num = %d, daddr = %pI4, dif = %d\n", (int)ident, &ip_hdr(skb)->daddr, dif); #if IS_ENABLED(CONFIG_IPV6) } else if (skb->protocol == htons(ETH_P_IPV6)) { dif = inet6_iif(skb); sdif = inet6_sdif(skb); pr_debug("try to find: num = %d, daddr = %pI6c, dif = %d\n", (int)ident, &ipv6_hdr(skb)->daddr, dif); #endif } else { return NULL; } sk_for_each_rcu(sk, hslot) { if (!net_eq(sock_net(sk), net)) continue; isk = inet_sk(sk); pr_debug("iterate\n"); if (isk->inet_num != ident) continue; if (skb->protocol == htons(ETH_P_IP) && sk->sk_family == AF_INET) { pr_debug("found: %p: num=%d, daddr=%pI4, dif=%d\n", sk, (int) isk->inet_num, &isk->inet_rcv_saddr, sk->sk_bound_dev_if); if (isk->inet_rcv_saddr && isk->inet_rcv_saddr != ip_hdr(skb)->daddr) continue; #if IS_ENABLED(CONFIG_IPV6) } else if (skb->protocol == htons(ETH_P_IPV6) && sk->sk_family == AF_INET6) { pr_debug("found: %p: num=%d, daddr=%pI6c, dif=%d\n", sk, (int) isk->inet_num, &sk->sk_v6_rcv_saddr, sk->sk_bound_dev_if); if (!ipv6_addr_any(&sk->sk_v6_rcv_saddr) && !ipv6_addr_equal(&sk->sk_v6_rcv_saddr, &ipv6_hdr(skb)->daddr)) continue; #endif } else { continue; } if (sk->sk_bound_dev_if && sk->sk_bound_dev_if != dif && sk->sk_bound_dev_if != sdif) continue; goto exit; } sk = NULL; exit: return sk; } static void inet_get_ping_group_range_net(struct net *net, kgid_t *low, kgid_t *high) { kgid_t *data = net->ipv4.ping_group_range.range; unsigned int seq; do { seq = read_seqbegin(&net->ipv4.ping_group_range.lock); *low = data[0]; *high = data[1]; } while (read_seqretry(&net->ipv4.ping_group_range.lock, seq)); } int ping_init_sock(struct sock *sk) { struct net *net = sock_net(sk); kgid_t group = current_egid(); struct group_info *group_info; int i; kgid_t low, high; int ret = 0; if (sk->sk_family == AF_INET6) sk->sk_ipv6only = 1; inet_get_ping_group_range_net(net, &low, &high); if (gid_lte(low, group) && gid_lte(group, high)) return 0; group_info = get_current_groups(); for (i = 0; i < group_info->ngroups; i++) { kgid_t gid = group_info->gid[i]; if (gid_lte(low, gid) && gid_lte(gid, high)) goto out_release_group; } ret = -EACCES; out_release_group: put_group_info(group_info); return ret; } EXPORT_IPV6_MOD_GPL(ping_init_sock); void ping_close(struct sock *sk, long timeout) { pr_debug("ping_close(sk=%p,sk->num=%u)\n", inet_sk(sk), inet_sk(sk)->inet_num); pr_debug("isk->refcnt = %d\n", refcount_read(&sk->sk_refcnt)); sk_common_release(sk); } EXPORT_IPV6_MOD_GPL(ping_close); static int ping_pre_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len) { /* This check is replicated from __ip4_datagram_connect() and * intended to prevent BPF program called below from accessing bytes * that are out of the bound specified by user in addr_len. */ if (addr_len < sizeof(struct sockaddr_in)) return -EINVAL; return BPF_CGROUP_RUN_PROG_INET4_CONNECT_LOCK(sk, uaddr, &addr_len); } /* Checks the bind address and possibly modifies sk->sk_bound_dev_if. */ static int ping_check_bind_addr(struct sock *sk, struct inet_sock *isk, struct sockaddr *uaddr, int addr_len) { struct net *net = sock_net(sk); if (sk->sk_family == AF_INET) { struct sockaddr_in *addr = (struct sockaddr_in *) uaddr; u32 tb_id = RT_TABLE_LOCAL; int chk_addr_ret; if (addr_len < sizeof(*addr)) return -EINVAL; if (addr->sin_family != AF_INET && !(addr->sin_family == AF_UNSPEC && addr->sin_addr.s_addr == htonl(INADDR_ANY))) return -EAFNOSUPPORT; pr_debug("ping_check_bind_addr(sk=%p,addr=%pI4,port=%d)\n", sk, &addr->sin_addr.s_addr, ntohs(addr->sin_port)); if (addr->sin_addr.s_addr == htonl(INADDR_ANY)) return 0; tb_id = l3mdev_fib_table_by_index(net, sk->sk_bound_dev_if) ? : tb_id; chk_addr_ret = inet_addr_type_table(net, addr->sin_addr.s_addr, tb_id); if (chk_addr_ret == RTN_MULTICAST || chk_addr_ret == RTN_BROADCAST || (chk_addr_ret != RTN_LOCAL && !inet_can_nonlocal_bind(net, isk))) return -EADDRNOTAVAIL; #if IS_ENABLED(CONFIG_IPV6) } else if (sk->sk_family == AF_INET6) { struct sockaddr_in6 *addr = (struct sockaddr_in6 *) uaddr; int addr_type, scoped, has_addr; struct net_device *dev = NULL; if (addr_len < sizeof(*addr)) return -EINVAL; if (addr->sin6_family != AF_INET6) return -EAFNOSUPPORT; pr_debug("ping_check_bind_addr(sk=%p,addr=%pI6c,port=%d)\n", sk, addr->sin6_addr.s6_addr, ntohs(addr->sin6_port)); addr_type = ipv6_addr_type(&addr->sin6_addr); scoped = __ipv6_addr_needs_scope_id(addr_type); if ((addr_type != IPV6_ADDR_ANY && !(addr_type & IPV6_ADDR_UNICAST)) || (scoped && !addr->sin6_scope_id)) return -EINVAL; rcu_read_lock(); if (addr->sin6_scope_id) { dev = dev_get_by_index_rcu(net, addr->sin6_scope_id); if (!dev) { rcu_read_unlock(); return -ENODEV; } } if (!dev && sk->sk_bound_dev_if) { dev = dev_get_by_index_rcu(net, sk->sk_bound_dev_if); if (!dev) { rcu_read_unlock(); return -ENODEV; } } has_addr = pingv6_ops.ipv6_chk_addr(net, &addr->sin6_addr, dev, scoped); rcu_read_unlock(); if (!(ipv6_can_nonlocal_bind(net, isk) || has_addr || addr_type == IPV6_ADDR_ANY)) return -EADDRNOTAVAIL; if (scoped) sk->sk_bound_dev_if = addr->sin6_scope_id; #endif } else { return -EAFNOSUPPORT; } return 0; } static void ping_set_saddr(struct sock *sk, struct sockaddr *saddr) { if (saddr->sa_family == AF_INET) { struct inet_sock *isk = inet_sk(sk); struct sockaddr_in *addr = (struct sockaddr_in *) saddr; isk->inet_rcv_saddr = isk->inet_saddr = addr->sin_addr.s_addr; #if IS_ENABLED(CONFIG_IPV6) } else if (saddr->sa_family == AF_INET6) { struct sockaddr_in6 *addr = (struct sockaddr_in6 *) saddr; struct ipv6_pinfo *np = inet6_sk(sk); sk->sk_v6_rcv_saddr = np->saddr = addr->sin6_addr; #endif } } /* * We need our own bind because there are no privileged id's == local ports. * Moreover, we don't allow binding to multi- and broadcast addresses. */ int ping_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len) { struct inet_sock *isk = inet_sk(sk); unsigned short snum; int err; int dif = sk->sk_bound_dev_if; err = ping_check_bind_addr(sk, isk, uaddr, addr_len); if (err) return err; lock_sock(sk); err = -EINVAL; if (isk->inet_num != 0) goto out; err = -EADDRINUSE; snum = ntohs(((struct sockaddr_in *)uaddr)->sin_port); if (ping_get_port(sk, snum) != 0) { /* Restore possibly modified sk->sk_bound_dev_if by ping_check_bind_addr(). */ sk->sk_bound_dev_if = dif; goto out; } ping_set_saddr(sk, uaddr); pr_debug("after bind(): num = %hu, dif = %d\n", isk->inet_num, sk->sk_bound_dev_if); err = 0; if (sk->sk_family == AF_INET && isk->inet_rcv_saddr) sk->sk_userlocks |= SOCK_BINDADDR_LOCK; #if IS_ENABLED(CONFIG_IPV6) if (sk->sk_family == AF_INET6 && !ipv6_addr_any(&sk->sk_v6_rcv_saddr)) sk->sk_userlocks |= SOCK_BINDADDR_LOCK; #endif if (snum) sk->sk_userlocks |= SOCK_BINDPORT_LOCK; isk->inet_sport = htons(isk->inet_num); isk->inet_daddr = 0; isk->inet_dport = 0; #if IS_ENABLED(CONFIG_IPV6) if (sk->sk_family == AF_INET6) memset(&sk->sk_v6_daddr, 0, sizeof(sk->sk_v6_daddr)); #endif sk_dst_reset(sk); out: release_sock(sk); pr_debug("ping_v4_bind -> %d\n", err); return err; } EXPORT_IPV6_MOD_GPL(ping_bind); /* * Is this a supported type of ICMP message? */ static inline int ping_supported(int family, int type, int code) { return (family == AF_INET && type == ICMP_ECHO && code == 0) || (family == AF_INET && type == ICMP_EXT_ECHO && code == 0) || (family == AF_INET6 && type == ICMPV6_ECHO_REQUEST && code == 0) || (family == AF_INET6 && type == ICMPV6_EXT_ECHO_REQUEST && code == 0); } /* * This routine is called by the ICMP module when it gets some * sort of error condition. */ void ping_err(struct sk_buff *skb, int offset, u32 info) { int family; struct icmphdr *icmph; struct inet_sock *inet_sock; int type; int code; struct net *net = dev_net(skb->dev); struct sock *sk; int harderr; int err; if (skb->protocol == htons(ETH_P_IP)) { family = AF_INET; type = icmp_hdr(skb)->type; code = icmp_hdr(skb)->code; icmph = (struct icmphdr *)(skb->data + offset); } else if (skb->protocol == htons(ETH_P_IPV6)) { family = AF_INET6; type = icmp6_hdr(skb)->icmp6_type; code = icmp6_hdr(skb)->icmp6_code; icmph = (struct icmphdr *) (skb->data + offset); } else { BUG(); } /* We assume the packet has already been checked by icmp_unreach */ if (!ping_supported(family, icmph->type, icmph->code)) return; pr_debug("ping_err(proto=0x%x,type=%d,code=%d,id=%04x,seq=%04x)\n", skb->protocol, type, code, ntohs(icmph->un.echo.id), ntohs(icmph->un.echo.sequence)); sk = ping_lookup(net, skb, ntohs(icmph->un.echo.id)); if (!sk) { pr_debug("no socket, dropping\n"); return; /* No socket for error */ } pr_debug("err on socket %p\n", sk); err = 0; harderr = 0; inet_sock = inet_sk(sk); if (skb->protocol == htons(ETH_P_IP)) { switch (type) { default: case ICMP_TIME_EXCEEDED: err = EHOSTUNREACH; break; case ICMP_SOURCE_QUENCH: /* This is not a real error but ping wants to see it. * Report it with some fake errno. */ err = EREMOTEIO; break; case ICMP_PARAMETERPROB: err = EPROTO; harderr = 1; break; case ICMP_DEST_UNREACH: if (code == ICMP_FRAG_NEEDED) { /* Path MTU discovery */ ipv4_sk_update_pmtu(skb, sk, info); if (READ_ONCE(inet_sock->pmtudisc) != IP_PMTUDISC_DONT) { err = EMSGSIZE; harderr = 1; break; } goto out; } err = EHOSTUNREACH; if (code <= NR_ICMP_UNREACH) { harderr = icmp_err_convert[code].fatal; err = icmp_err_convert[code].errno; } break; case ICMP_REDIRECT: /* See ICMP_SOURCE_QUENCH */ ipv4_sk_redirect(skb, sk); err = EREMOTEIO; break; } #if IS_ENABLED(CONFIG_IPV6) } else if (skb->protocol == htons(ETH_P_IPV6)) { harderr = pingv6_ops.icmpv6_err_convert(type, code, &err); #endif } /* * RFC1122: OK. Passes ICMP errors back to application, as per * 4.1.3.3. */ if ((family == AF_INET && !inet_test_bit(RECVERR, sk)) || (family == AF_INET6 && !inet6_test_bit(RECVERR6, sk))) { if (!harderr || sk->sk_state != TCP_ESTABLISHED) goto out; } else { if (family == AF_INET) { ip_icmp_error(sk, skb, err, 0 /* no remote port */, info, (u8 *)icmph); #if IS_ENABLED(CONFIG_IPV6) } else if (family == AF_INET6) { pingv6_ops.ipv6_icmp_error(sk, skb, err, 0, info, (u8 *)icmph); #endif } } sk->sk_err = err; sk_error_report(sk); out: return; } EXPORT_IPV6_MOD_GPL(ping_err); /* * Copy and checksum an ICMP Echo packet from user space into a buffer * starting from the payload. */ int ping_getfrag(void *from, char *to, int offset, int fraglen, int odd, struct sk_buff *skb) { struct pingfakehdr *pfh = from; if (!csum_and_copy_from_iter_full(to, fraglen, &pfh->wcheck, &pfh->msg->msg_iter)) return -EFAULT; #if IS_ENABLED(CONFIG_IPV6) /* For IPv6, checksum each skb as we go along, as expected by * icmpv6_push_pending_frames. For IPv4, accumulate the checksum in * wcheck, it will be finalized in ping_v4_push_pending_frames. */ if (pfh->family == AF_INET6) { skb->csum = csum_block_add(skb->csum, pfh->wcheck, odd); skb->ip_summed = CHECKSUM_NONE; pfh->wcheck = 0; } #endif return 0; } EXPORT_IPV6_MOD_GPL(ping_getfrag); static int ping_v4_push_pending_frames(struct sock *sk, struct pingfakehdr *pfh, struct flowi4 *fl4) { struct sk_buff *skb = skb_peek(&sk->sk_write_queue); if (!skb) return 0; pfh->wcheck = csum_partial((char *)&pfh->icmph, sizeof(struct icmphdr), pfh->wcheck); pfh->icmph.checksum = csum_fold(pfh->wcheck); memcpy(icmp_hdr(skb), &pfh->icmph, sizeof(struct icmphdr)); skb->ip_summed = CHECKSUM_NONE; return ip_push_pending_frames(sk, fl4); } int ping_common_sendmsg(int family, struct msghdr *msg, size_t len, void *user_icmph, size_t icmph_len) { u8 type, code; if (len > 0xFFFF) return -EMSGSIZE; /* Must have at least a full ICMP header. */ if (len < icmph_len) return -EINVAL; /* * Check the flags. */ /* Mirror BSD error message compatibility */ if (msg->msg_flags & MSG_OOB) return -EOPNOTSUPP; /* * Fetch the ICMP header provided by the userland. * iovec is modified! The ICMP header is consumed. */ if (memcpy_from_msg(user_icmph, msg, icmph_len)) return -EFAULT; if (family == AF_INET) { type = ((struct icmphdr *) user_icmph)->type; code = ((struct icmphdr *) user_icmph)->code; #if IS_ENABLED(CONFIG_IPV6) } else if (family == AF_INET6) { type = ((struct icmp6hdr *) user_icmph)->icmp6_type; code = ((struct icmp6hdr *) user_icmph)->icmp6_code; #endif } else { BUG(); } if (!ping_supported(family, type, code)) return -EINVAL; return 0; } EXPORT_IPV6_MOD_GPL(ping_common_sendmsg); static int ping_v4_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) { struct net *net = sock_net(sk); struct flowi4 fl4; struct inet_sock *inet = inet_sk(sk); struct ipcm_cookie ipc; struct icmphdr user_icmph; struct pingfakehdr pfh; struct rtable *rt = NULL; struct ip_options_data opt_copy; int free = 0; __be32 saddr, daddr, faddr; u8 scope; int err; pr_debug("ping_v4_sendmsg(sk=%p,sk->num=%u)\n", inet, inet->inet_num); err = ping_common_sendmsg(AF_INET, msg, len, &user_icmph, sizeof(user_icmph)); if (err) return err; /* * Get and verify the address. */ if (msg->msg_name) { DECLARE_SOCKADDR(struct sockaddr_in *, usin, msg->msg_name); if (msg->msg_namelen < sizeof(*usin)) return -EINVAL; if (usin->sin_family != AF_INET) return -EAFNOSUPPORT; daddr = usin->sin_addr.s_addr; /* no remote port */ } else { if (sk->sk_state != TCP_ESTABLISHED) return -EDESTADDRREQ; daddr = inet->inet_daddr; /* no remote port */ } ipcm_init_sk(&ipc, inet); if (msg->msg_controllen) { err = ip_cmsg_send(sk, msg, &ipc, false); if (unlikely(err)) { kfree(ipc.opt); return err; } if (ipc.opt) free = 1; } if (!ipc.opt) { struct ip_options_rcu *inet_opt; rcu_read_lock(); inet_opt = rcu_dereference(inet->inet_opt); if (inet_opt) { memcpy(&opt_copy, inet_opt, sizeof(*inet_opt) + inet_opt->opt.optlen); ipc.opt = &opt_copy.opt; } rcu_read_unlock(); } saddr = ipc.addr; ipc.addr = faddr = daddr; if (ipc.opt && ipc.opt->opt.srr) { if (!daddr) { err = -EINVAL; goto out_free; } faddr = ipc.opt->opt.faddr; } scope = ip_sendmsg_scope(inet, &ipc, msg); if (ipv4_is_multicast(daddr)) { if (!ipc.oif || netif_index_is_l3_master(sock_net(sk), ipc.oif)) ipc.oif = READ_ONCE(inet->mc_index); if (!saddr) saddr = READ_ONCE(inet->mc_addr); } else if (!ipc.oif) ipc.oif = READ_ONCE(inet->uc_index); flowi4_init_output(&fl4, ipc.oif, ipc.sockc.mark, ipc.tos & INET_DSCP_MASK, scope, sk->sk_protocol, inet_sk_flowi_flags(sk), faddr, saddr, 0, 0, sk_uid(sk)); fl4.fl4_icmp_type = user_icmph.type; fl4.fl4_icmp_code = user_icmph.code; security_sk_classify_flow(sk, flowi4_to_flowi_common(&fl4)); rt = ip_route_output_flow(net, &fl4, sk); if (IS_ERR(rt)) { err = PTR_ERR(rt); rt = NULL; if (err == -ENETUNREACH) IP_INC_STATS(net, IPSTATS_MIB_OUTNOROUTES); goto out; } err = -EACCES; if ((rt->rt_flags & RTCF_BROADCAST) && !sock_flag(sk, SOCK_BROADCAST)) goto out; if (msg->msg_flags & MSG_CONFIRM) goto do_confirm; back_from_confirm: if (!ipc.addr) ipc.addr = fl4.daddr; lock_sock(sk); pfh.icmph.type = user_icmph.type; /* already checked */ pfh.icmph.code = user_icmph.code; /* ditto */ pfh.icmph.checksum = 0; pfh.icmph.un.echo.id = inet->inet_sport; pfh.icmph.un.echo.sequence = user_icmph.un.echo.sequence; pfh.msg = msg; pfh.wcheck = 0; pfh.family = AF_INET; err = ip_append_data(sk, &fl4, ping_getfrag, &pfh, len, sizeof(struct icmphdr), &ipc, &rt, msg->msg_flags); if (err) ip_flush_pending_frames(sk); else err = ping_v4_push_pending_frames(sk, &pfh, &fl4); release_sock(sk); out: ip_rt_put(rt); out_free: if (free) kfree(ipc.opt); if (!err) { icmp_out_count(sock_net(sk), user_icmph.type); return len; } return err; do_confirm: if (msg->msg_flags & MSG_PROBE) dst_confirm_neigh(&rt->dst, &fl4.daddr); if (!(msg->msg_flags & MSG_PROBE) || len) goto back_from_confirm; err = 0; goto out; } int ping_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int flags, int *addr_len) { struct inet_sock *isk = inet_sk(sk); int family = sk->sk_family; struct sk_buff *skb; int copied, err; pr_debug("ping_recvmsg(sk=%p,sk->num=%u)\n", isk, isk->inet_num); err = -EOPNOTSUPP; if (flags & MSG_OOB) goto out; if (flags & MSG_ERRQUEUE) return inet_recv_error(sk, msg, len, addr_len); skb = skb_recv_datagram(sk, flags, &err); if (!skb) goto out; copied = skb->len; if (copied > len) { msg->msg_flags |= MSG_TRUNC; copied = len; } /* Don't bother checking the checksum */ err = skb_copy_datagram_msg(skb, 0, msg, copied); if (err) goto done; sock_recv_timestamp(msg, sk, skb); /* Copy the address and add cmsg data. */ if (family == AF_INET) { DECLARE_SOCKADDR(struct sockaddr_in *, sin, msg->msg_name); if (sin) { sin->sin_family = AF_INET; sin->sin_port = 0 /* skb->h.uh->source */; sin->sin_addr.s_addr = ip_hdr(skb)->saddr; memset(sin->sin_zero, 0, sizeof(sin->sin_zero)); *addr_len = sizeof(*sin); } if (inet_cmsg_flags(isk)) ip_cmsg_recv(msg, skb); #if IS_ENABLED(CONFIG_IPV6) } else if (family == AF_INET6) { struct ipv6hdr *ip6 = ipv6_hdr(skb); DECLARE_SOCKADDR(struct sockaddr_in6 *, sin6, msg->msg_name); if (sin6) { sin6->sin6_family = AF_INET6; sin6->sin6_port = 0; sin6->sin6_addr = ip6->saddr; sin6->sin6_flowinfo = 0; if (inet6_test_bit(SNDFLOW, sk)) sin6->sin6_flowinfo = ip6_flowinfo(ip6); sin6->sin6_scope_id = ipv6_iface_scope_id(&sin6->sin6_addr, inet6_iif(skb)); *addr_len = sizeof(*sin6); } if (inet6_sk(sk)->rxopt.all) pingv6_ops.ip6_datagram_recv_common_ctl(sk, msg, skb); if (skb->protocol == htons(ETH_P_IPV6) && inet6_sk(sk)->rxopt.all) pingv6_ops.ip6_datagram_recv_specific_ctl(sk, msg, skb); else if (skb->protocol == htons(ETH_P_IP) && inet_cmsg_flags(isk)) ip_cmsg_recv(msg, skb); #endif } else { BUG(); } err = copied; done: skb_free_datagram(sk, skb); out: pr_debug("ping_recvmsg -> %d\n", err); return err; } EXPORT_IPV6_MOD_GPL(ping_recvmsg); static enum skb_drop_reason __ping_queue_rcv_skb(struct sock *sk, struct sk_buff *skb) { enum skb_drop_reason reason; pr_debug("ping_queue_rcv_skb(sk=%p,sk->num=%d,skb=%p)\n", inet_sk(sk), inet_sk(sk)->inet_num, skb); if (sock_queue_rcv_skb_reason(sk, skb, &reason) < 0) { sk_skb_reason_drop(sk, skb, reason); pr_debug("ping_queue_rcv_skb -> failed\n"); return reason; } return SKB_NOT_DROPPED_YET; } int ping_queue_rcv_skb(struct sock *sk, struct sk_buff *skb) { return __ping_queue_rcv_skb(sk, skb) ? -1 : 0; } EXPORT_IPV6_MOD_GPL(ping_queue_rcv_skb); /* * All we need to do is get the socket. */ enum skb_drop_reason ping_rcv(struct sk_buff *skb) { struct net *net = dev_net(skb->dev); struct icmphdr *icmph = icmp_hdr(skb); struct sock *sk; /* We assume the packet has already been checked by icmp_rcv */ pr_debug("ping_rcv(skb=%p,id=%04x,seq=%04x)\n", skb, ntohs(icmph->un.echo.id), ntohs(icmph->un.echo.sequence)); /* Push ICMP header back */ skb_push(skb, skb->data - (u8 *)icmph); sk = ping_lookup(net, skb, ntohs(icmph->un.echo.id)); if (sk) return __ping_queue_rcv_skb(sk, skb); kfree_skb_reason(skb, SKB_DROP_REASON_NO_SOCKET); return SKB_DROP_REASON_NO_SOCKET; } EXPORT_IPV6_MOD_GPL(ping_rcv); struct proto ping_prot = { .name = "PING", .owner = THIS_MODULE, .init = ping_init_sock, .close = ping_close, .pre_connect = ping_pre_connect, .connect = ip4_datagram_connect, .disconnect = __udp_disconnect, .setsockopt = ip_setsockopt, .getsockopt = ip_getsockopt, .sendmsg = ping_v4_sendmsg, .recvmsg = ping_recvmsg, .bind = ping_bind, .backlog_rcv = ping_queue_rcv_skb, .release_cb = ip4_datagram_release_cb, .unhash = ping_unhash, .get_port = ping_get_port, .put_port = ping_unhash, .obj_size = sizeof(struct inet_sock), }; EXPORT_IPV6_MOD(ping_prot); #ifdef CONFIG_PROC_FS static struct sock *ping_get_first(struct seq_file *seq, int start) { struct sock *sk; struct ping_iter_state *state = seq->private; struct net *net = seq_file_net(seq); for (state->bucket = start; state->bucket < PING_HTABLE_SIZE; ++state->bucket) { struct hlist_head *hslot; hslot = &ping_table.hash[state->bucket]; if (hlist_empty(hslot)) continue; sk_for_each(sk, hslot) { if (net_eq(sock_net(sk), net) && sk->sk_family == state->family) goto found; } } sk = NULL; found: return sk; } static struct sock *ping_get_next(struct seq_file *seq, struct sock *sk) { struct ping_iter_state *state = seq->private; struct net *net = seq_file_net(seq); do { sk = sk_next(sk); } while (sk && (!net_eq(sock_net(sk), net))); if (!sk) return ping_get_first(seq, state->bucket + 1); return sk; } static struct sock *ping_get_idx(struct seq_file *seq, loff_t pos) { struct sock *sk = ping_get_first(seq, 0); if (sk) while (pos && (sk = ping_get_next(seq, sk)) != NULL) --pos; return pos ? NULL : sk; } void *ping_seq_start(struct seq_file *seq, loff_t *pos, sa_family_t family) __acquires(ping_table.lock) { struct ping_iter_state *state = seq->private; state->bucket = 0; state->family = family; spin_lock(&ping_table.lock); return *pos ? ping_get_idx(seq, *pos-1) : SEQ_START_TOKEN; } EXPORT_IPV6_MOD_GPL(ping_seq_start); static void *ping_v4_seq_start(struct seq_file *seq, loff_t *pos) { return ping_seq_start(seq, pos, AF_INET); } void *ping_seq_next(struct seq_file *seq, void *v, loff_t *pos) { struct sock *sk; if (v == SEQ_START_TOKEN) sk = ping_get_idx(seq, 0); else sk = ping_get_next(seq, v); ++*pos; return sk; } EXPORT_IPV6_MOD_GPL(ping_seq_next); void ping_seq_stop(struct seq_file *seq, void *v) __releases(ping_table.lock) { spin_unlock(&ping_table.lock); } EXPORT_IPV6_MOD_GPL(ping_seq_stop); static void ping_v4_format_sock(struct sock *sp, struct seq_file *f, int bucket) { struct inet_sock *inet = inet_sk(sp); __be32 dest = inet->inet_daddr; __be32 src = inet->inet_rcv_saddr; __u16 destp = ntohs(inet->inet_dport); __u16 srcp = ntohs(inet->inet_sport); seq_printf(f, "%5d: %08X:%04X %08X:%04X" " %02X %08X:%08X %02X:%08lX %08X %5u %8d %lu %d %pK %u", bucket, src, srcp, dest, destp, sp->sk_state, sk_wmem_alloc_get(sp), sk_rmem_alloc_get(sp), 0, 0L, 0, from_kuid_munged(seq_user_ns(f), sk_uid(sp)), 0, sock_i_ino(sp), refcount_read(&sp->sk_refcnt), sp, sk_drops_read(sp)); } static int ping_v4_seq_show(struct seq_file *seq, void *v) { seq_setwidth(seq, 127); if (v == SEQ_START_TOKEN) seq_puts(seq, " sl local_address rem_address st tx_queue " "rx_queue tr tm->when retrnsmt uid timeout " "inode ref pointer drops"); else { struct ping_iter_state *state = seq->private; ping_v4_format_sock(v, seq, state->bucket); } seq_pad(seq, '\n'); return 0; } static const struct seq_operations ping_v4_seq_ops = { .start = ping_v4_seq_start, .show = ping_v4_seq_show, .next = ping_seq_next, .stop = ping_seq_stop, }; static int __net_init ping_v4_proc_init_net(struct net *net) { if (!proc_create_net("icmp", 0444, net->proc_net, &ping_v4_seq_ops, sizeof(struct ping_iter_state))) return -ENOMEM; net->ipv4.ping_port_rover = get_random_u16(); return 0; } static void __net_exit ping_v4_proc_exit_net(struct net *net) { remove_proc_entry("icmp", net->proc_net); } static struct pernet_operations ping_v4_net_ops = { .init = ping_v4_proc_init_net, .exit = ping_v4_proc_exit_net, }; int __init ping_proc_init(void) { return register_pernet_subsys(&ping_v4_net_ops); } void ping_proc_exit(void) { unregister_pernet_subsys(&ping_v4_net_ops); } #endif void __init ping_init(void) { int i; for (i = 0; i < PING_HTABLE_SIZE; i++) INIT_HLIST_HEAD(&ping_table.hash[i]); spin_lock_init(&ping_table.lock); } |
| 1 1 1 1 1 3 1 1 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 | // SPDX-License-Identifier: GPL-2.0-or-later /* * sch_plug.c Queue traffic until an explicit release command * * There are two ways to use this qdisc: * 1. A simple "instantaneous" plug/unplug operation, by issuing an alternating * sequence of TCQ_PLUG_BUFFER & TCQ_PLUG_RELEASE_INDEFINITE commands. * * 2. For network output buffering (a.k.a output commit) functionality. * Output commit property is commonly used by applications using checkpoint * based fault-tolerance to ensure that the checkpoint from which a system * is being restored is consistent w.r.t outside world. * * Consider for e.g. Remus - a Virtual Machine checkpointing system, * wherein a VM is checkpointed, say every 50ms. The checkpoint is replicated * asynchronously to the backup host, while the VM continues executing the * next epoch speculatively. * * The following is a typical sequence of output buffer operations: * 1.At epoch i, start_buffer(i) * 2. At end of epoch i (i.e. after 50ms): * 2.1 Stop VM and take checkpoint(i). * 2.2 start_buffer(i+1) and Resume VM * 3. While speculatively executing epoch(i+1), asynchronously replicate * checkpoint(i) to backup host. * 4. When checkpoint_ack(i) is received from backup, release_buffer(i) * Thus, this Qdisc would receive the following sequence of commands: * TCQ_PLUG_BUFFER (epoch i) * .. TCQ_PLUG_BUFFER (epoch i+1) * ....TCQ_PLUG_RELEASE_ONE (epoch i) * ......TCQ_PLUG_BUFFER (epoch i+2) * ........ */ #include <linux/module.h> #include <linux/types.h> #include <linux/kernel.h> #include <linux/errno.h> #include <linux/netdevice.h> #include <linux/skbuff.h> #include <net/pkt_sched.h> /* * State of the queue, when used for network output buffering: * * plug(i+1) plug(i) head * ------------------+--------------------+----------------> * | | * | | * pkts_current_epoch| pkts_last_epoch |pkts_to_release * ----------------->|<--------+--------->|+---------------> * v v * */ struct plug_sched_data { /* If true, the dequeue function releases all packets * from head to end of the queue. The queue turns into * a pass-through queue for newly arriving packets. */ bool unplug_indefinite; bool throttled; /* Queue Limit in bytes */ u32 limit; /* Number of packets (output) from the current speculatively * executing epoch. */ u32 pkts_current_epoch; /* Number of packets corresponding to the recently finished * epoch. These will be released when we receive a * TCQ_PLUG_RELEASE_ONE command. This command is typically * issued after committing a checkpoint at the target. */ u32 pkts_last_epoch; /* * Number of packets from the head of the queue, that can * be released (committed checkpoint). */ u32 pkts_to_release; }; static int plug_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free) { struct plug_sched_data *q = qdisc_priv(sch); if (likely(sch->qstats.backlog + skb->len <= q->limit)) { if (!q->unplug_indefinite) q->pkts_current_epoch++; return qdisc_enqueue_tail(skb, sch); } return qdisc_drop(skb, sch, to_free); } static struct sk_buff *plug_dequeue(struct Qdisc *sch) { struct plug_sched_data *q = qdisc_priv(sch); if (q->throttled) return NULL; if (!q->unplug_indefinite) { if (!q->pkts_to_release) { /* No more packets to dequeue. Block the queue * and wait for the next release command. */ q->throttled = true; return NULL; } q->pkts_to_release--; } return qdisc_dequeue_head(sch); } static int plug_init(struct Qdisc *sch, struct nlattr *opt, struct netlink_ext_ack *extack) { struct plug_sched_data *q = qdisc_priv(sch); q->pkts_current_epoch = 0; q->pkts_last_epoch = 0; q->pkts_to_release = 0; q->unplug_indefinite = false; if (opt == NULL) { q->limit = qdisc_dev(sch)->tx_queue_len * psched_mtu(qdisc_dev(sch)); } else { struct tc_plug_qopt *ctl = nla_data(opt); if (nla_len(opt) < sizeof(*ctl)) return -EINVAL; q->limit = ctl->limit; } q->throttled = true; return 0; } /* Receives 4 types of messages: * TCQ_PLUG_BUFFER: Inset a plug into the queue and * buffer any incoming packets * TCQ_PLUG_RELEASE_ONE: Dequeue packets from queue head * to beginning of the next plug. * TCQ_PLUG_RELEASE_INDEFINITE: Dequeue all packets from queue. * Stop buffering packets until the next TCQ_PLUG_BUFFER * command is received (just act as a pass-thru queue). * TCQ_PLUG_LIMIT: Increase/decrease queue size */ static int plug_change(struct Qdisc *sch, struct nlattr *opt, struct netlink_ext_ack *extack) { struct plug_sched_data *q = qdisc_priv(sch); struct tc_plug_qopt *msg; msg = nla_data(opt); if (nla_len(opt) < sizeof(*msg)) return -EINVAL; switch (msg->action) { case TCQ_PLUG_BUFFER: /* Save size of the current buffer */ q->pkts_last_epoch = q->pkts_current_epoch; q->pkts_current_epoch = 0; if (q->unplug_indefinite) q->throttled = true; q->unplug_indefinite = false; break; case TCQ_PLUG_RELEASE_ONE: /* Add packets from the last complete buffer to the * packets to be released set. */ q->pkts_to_release += q->pkts_last_epoch; q->pkts_last_epoch = 0; q->throttled = false; netif_schedule_queue(sch->dev_queue); break; case TCQ_PLUG_RELEASE_INDEFINITE: q->unplug_indefinite = true; q->pkts_to_release = 0; q->pkts_last_epoch = 0; q->pkts_current_epoch = 0; q->throttled = false; netif_schedule_queue(sch->dev_queue); break; case TCQ_PLUG_LIMIT: /* Limit is supplied in bytes */ q->limit = msg->limit; break; default: return -EINVAL; } return 0; } static struct Qdisc_ops plug_qdisc_ops __read_mostly = { .id = "plug", .priv_size = sizeof(struct plug_sched_data), .enqueue = plug_enqueue, .dequeue = plug_dequeue, .peek = qdisc_peek_dequeued, .init = plug_init, .change = plug_change, .reset = qdisc_reset_queue, .owner = THIS_MODULE, }; MODULE_ALIAS_NET_SCH("plug"); static int __init plug_module_init(void) { return register_qdisc(&plug_qdisc_ops); } static void __exit plug_module_exit(void) { unregister_qdisc(&plug_qdisc_ops); } module_init(plug_module_init) module_exit(plug_module_exit) MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("Qdisc to plug and unplug traffic via netlink control"); |
| 14 14 14 14 14 14 14 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 | // SPDX-License-Identifier: GPL-2.0-or-later /* RSA asymmetric public-key algorithm [RFC3447] * * Copyright (c) 2015, Intel Corporation * Authors: Tadeusz Struk <tadeusz.struk@intel.com> */ #include <linux/fips.h> #include <linux/module.h> #include <linux/mpi.h> #include <crypto/internal/rsa.h> #include <crypto/internal/akcipher.h> #include <crypto/akcipher.h> #include <crypto/algapi.h> struct rsa_mpi_key { MPI n; MPI e; MPI d; MPI p; MPI q; MPI dp; MPI dq; MPI qinv; }; static int rsa_check_payload(MPI x, MPI n) { MPI n1; if (mpi_cmp_ui(x, 1) <= 0) return -EINVAL; n1 = mpi_alloc(0); if (!n1) return -ENOMEM; if (mpi_sub_ui(n1, n, 1) || mpi_cmp(x, n1) >= 0) { mpi_free(n1); return -EINVAL; } mpi_free(n1); return 0; } /* * RSAEP function [RFC3447 sec 5.1.1] * c = m^e mod n; */ static int _rsa_enc(const struct rsa_mpi_key *key, MPI c, MPI m) { /* * Even though (1) in RFC3447 only requires 0 <= m <= n - 1, we are * slightly more conservative and require 1 < m < n - 1. This is in line * with SP 800-56Br2, Section 7.1.1. */ if (rsa_check_payload(m, key->n)) return -EINVAL; /* (2) c = m^e mod n */ return mpi_powm(c, m, key->e, key->n); } /* * RSADP function [RFC3447 sec 5.1.2] * m_1 = c^dP mod p; * m_2 = c^dQ mod q; * h = (m_1 - m_2) * qInv mod p; * m = m_2 + q * h; */ static int _rsa_dec_crt(const struct rsa_mpi_key *key, MPI m_or_m1_or_h, MPI c) { MPI m2, m12_or_qh; int ret = -ENOMEM; /* * Even though (1) in RFC3447 only requires 0 <= c <= n - 1, we are * slightly more conservative and require 1 < c < n - 1. This is in line * with SP 800-56Br2, Section 7.1.2. */ if (rsa_check_payload(c, key->n)) return -EINVAL; m2 = mpi_alloc(0); m12_or_qh = mpi_alloc(0); if (!m2 || !m12_or_qh) goto err_free_mpi; /* (2i) m_1 = c^dP mod p */ ret = mpi_powm(m_or_m1_or_h, c, key->dp, key->p); if (ret) goto err_free_mpi; /* (2i) m_2 = c^dQ mod q */ ret = mpi_powm(m2, c, key->dq, key->q); if (ret) goto err_free_mpi; /* (2iii) h = (m_1 - m_2) * qInv mod p */ ret = mpi_sub(m12_or_qh, m_or_m1_or_h, m2) ?: mpi_mulm(m_or_m1_or_h, m12_or_qh, key->qinv, key->p); /* (2iv) m = m_2 + q * h */ ret = ret ?: mpi_mul(m12_or_qh, key->q, m_or_m1_or_h) ?: mpi_addm(m_or_m1_or_h, m2, m12_or_qh, key->n); err_free_mpi: mpi_free(m12_or_qh); mpi_free(m2); return ret; } static inline struct rsa_mpi_key *rsa_get_key(struct crypto_akcipher *tfm) { return akcipher_tfm_ctx(tfm); } static int rsa_enc(struct akcipher_request *req) { struct crypto_akcipher *tfm = crypto_akcipher_reqtfm(req); const struct rsa_mpi_key *pkey = rsa_get_key(tfm); MPI m, c = mpi_alloc(0); int ret = 0; int sign; if (!c) return -ENOMEM; if (unlikely(!pkey->n || !pkey->e)) { ret = -EINVAL; goto err_free_c; } ret = -ENOMEM; m = mpi_read_raw_from_sgl(req->src, req->src_len); if (!m) goto err_free_c; ret = _rsa_enc(pkey, c, m); if (ret) goto err_free_m; ret = mpi_write_to_sgl(c, req->dst, req->dst_len, &sign); if (ret) goto err_free_m; if (sign < 0) ret = -EBADMSG; err_free_m: mpi_free(m); err_free_c: mpi_free(c); return ret; } static int rsa_dec(struct akcipher_request *req) { struct crypto_akcipher *tfm = crypto_akcipher_reqtfm(req); const struct rsa_mpi_key *pkey = rsa_get_key(tfm); MPI c, m = mpi_alloc(0); int ret = 0; int sign; if (!m) return -ENOMEM; if (unlikely(!pkey->n || !pkey->d)) { ret = -EINVAL; goto err_free_m; } ret = -ENOMEM; c = mpi_read_raw_from_sgl(req->src, req->src_len); if (!c) goto err_free_m; ret = _rsa_dec_crt(pkey, m, c); if (ret) goto err_free_c; ret = mpi_write_to_sgl(m, req->dst, req->dst_len, &sign); if (ret) goto err_free_c; if (sign < 0) ret = -EBADMSG; err_free_c: mpi_free(c); err_free_m: mpi_free(m); return ret; } static void rsa_free_mpi_key(struct rsa_mpi_key *key) { mpi_free(key->d); mpi_free(key->e); mpi_free(key->n); mpi_free(key->p); mpi_free(key->q); mpi_free(key->dp); mpi_free(key->dq); mpi_free(key->qinv); key->d = NULL; key->e = NULL; key->n = NULL; key->p = NULL; key->q = NULL; key->dp = NULL; key->dq = NULL; key->qinv = NULL; } static int rsa_check_key_length(unsigned int len) { switch (len) { case 512: case 1024: case 1536: if (fips_enabled) return -EINVAL; fallthrough; case 2048: case 3072: case 4096: return 0; } return -EINVAL; } static int rsa_check_exponent_fips(MPI e) { MPI e_max = NULL; int err; /* check if odd */ if (!mpi_test_bit(e, 0)) { return -EINVAL; } /* check if 2^16 < e < 2^256. */ if (mpi_cmp_ui(e, 65536) <= 0) { return -EINVAL; } e_max = mpi_alloc(0); if (!e_max) return -ENOMEM; err = mpi_set_bit(e_max, 256); if (err) { mpi_free(e_max); return err; } if (mpi_cmp(e, e_max) >= 0) { mpi_free(e_max); return -EINVAL; } mpi_free(e_max); return 0; } static int rsa_set_pub_key(struct crypto_akcipher *tfm, const void *key, unsigned int keylen) { struct rsa_mpi_key *mpi_key = akcipher_tfm_ctx(tfm); struct rsa_key raw_key = {0}; int ret; /* Free the old MPI key if any */ rsa_free_mpi_key(mpi_key); ret = rsa_parse_pub_key(&raw_key, key, keylen); if (ret) return ret; mpi_key->e = mpi_read_raw_data(raw_key.e, raw_key.e_sz); if (!mpi_key->e) goto err; mpi_key->n = mpi_read_raw_data(raw_key.n, raw_key.n_sz); if (!mpi_key->n) goto err; if (rsa_check_key_length(mpi_get_size(mpi_key->n) << 3)) { rsa_free_mpi_key(mpi_key); return -EINVAL; } if (fips_enabled && rsa_check_exponent_fips(mpi_key->e)) { rsa_free_mpi_key(mpi_key); return -EINVAL; } return 0; err: rsa_free_mpi_key(mpi_key); return -ENOMEM; } static int rsa_set_priv_key(struct crypto_akcipher *tfm, const void *key, unsigned int keylen) { struct rsa_mpi_key *mpi_key = akcipher_tfm_ctx(tfm); struct rsa_key raw_key = {0}; int ret; /* Free the old MPI key if any */ rsa_free_mpi_key(mpi_key); ret = rsa_parse_priv_key(&raw_key, key, keylen); if (ret) return ret; mpi_key->d = mpi_read_raw_data(raw_key.d, raw_key.d_sz); if (!mpi_key->d) goto err; mpi_key->e = mpi_read_raw_data(raw_key.e, raw_key.e_sz); if (!mpi_key->e) goto err; mpi_key->n = mpi_read_raw_data(raw_key.n, raw_key.n_sz); if (!mpi_key->n) goto err; mpi_key->p = mpi_read_raw_data(raw_key.p, raw_key.p_sz); if (!mpi_key->p) goto err; mpi_key->q = mpi_read_raw_data(raw_key.q, raw_key.q_sz); if (!mpi_key->q) goto err; mpi_key->dp = mpi_read_raw_data(raw_key.dp, raw_key.dp_sz); if (!mpi_key->dp) goto err; mpi_key->dq = mpi_read_raw_data(raw_key.dq, raw_key.dq_sz); if (!mpi_key->dq) goto err; mpi_key->qinv = mpi_read_raw_data(raw_key.qinv, raw_key.qinv_sz); if (!mpi_key->qinv) goto err; if (rsa_check_key_length(mpi_get_size(mpi_key->n) << 3)) { rsa_free_mpi_key(mpi_key); return -EINVAL; } if (fips_enabled && rsa_check_exponent_fips(mpi_key->e)) { rsa_free_mpi_key(mpi_key); return -EINVAL; } return 0; err: rsa_free_mpi_key(mpi_key); return -ENOMEM; } static unsigned int rsa_max_size(struct crypto_akcipher *tfm) { struct rsa_mpi_key *pkey = akcipher_tfm_ctx(tfm); return mpi_get_size(pkey->n); } static void rsa_exit_tfm(struct crypto_akcipher *tfm) { struct rsa_mpi_key *pkey = akcipher_tfm_ctx(tfm); rsa_free_mpi_key(pkey); } static struct akcipher_alg rsa = { .encrypt = rsa_enc, .decrypt = rsa_dec, .set_priv_key = rsa_set_priv_key, .set_pub_key = rsa_set_pub_key, .max_size = rsa_max_size, .exit = rsa_exit_tfm, .base = { .cra_name = "rsa", .cra_driver_name = "rsa-generic", .cra_priority = 100, .cra_module = THIS_MODULE, .cra_ctxsize = sizeof(struct rsa_mpi_key), }, }; static int __init rsa_init(void) { int err; err = crypto_register_akcipher(&rsa); if (err) return err; err = crypto_register_template(&rsa_pkcs1pad_tmpl); if (err) goto err_unregister_rsa; err = crypto_register_template(&rsassa_pkcs1_tmpl); if (err) goto err_unregister_rsa_pkcs1pad; return 0; err_unregister_rsa_pkcs1pad: crypto_unregister_template(&rsa_pkcs1pad_tmpl); err_unregister_rsa: crypto_unregister_akcipher(&rsa); return err; } static void __exit rsa_exit(void) { crypto_unregister_template(&rsassa_pkcs1_tmpl); crypto_unregister_template(&rsa_pkcs1pad_tmpl); crypto_unregister_akcipher(&rsa); } module_init(rsa_init); module_exit(rsa_exit); MODULE_ALIAS_CRYPTO("rsa"); MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("RSA generic algorithm"); |
| 33 33 521 520 521 520 756 20 5 16 13 98 315 316 315 315 19 19 5 5 5 5 2 5 5 2 2 2 2 2 2 2 5 5 5 5 5 5 3 19 10 9 9 4 5 5 9 18 19 19 358 1 359 80 1 13 66 65 65 64 65 80 80 80 80 6 6 828 174 174 173 669 676 563 564 559 427 426 427 241 329 753 189 190 28 190 190 1 1 243 828 554 554 555 556 208 425 208 207 324 831 828 830 394 621 65 65 65 3 3 3 3 3 3 3 3 1 1 3 5 5 5 4 1 3 3 3 3 3 3 3 3 1 10 5 5 1 2 3 5 843 120 706 145 1 225 275 213 61 495 112 564 460 439 439 833 608 172 103 31 215 829 1000 996 996 1011 771 3 99 406 22 174 44 448 735 9 144 27 2 15 21 73 2 48 11 60 318 15 27 29 282 470 497 249 847 18 18 18 930 354 4 760 8 1 7 8 8 8 8 5 3 8 8 8 8 8 8 3 5 5 8 11 11 11 11 11 11 11 22 21 22 20 2 26 26 26 26 23 23 23 23 2 21 19 4 20 2 20 2 22 21 20 2 19 2 22 2 10 10 8 6 8 5 7 10 2 11 12 12 4 8 8 16 394 32 33 33 33 33 33 33 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 2447 2448 2449 2450 2451 2452 2453 2454 2455 2456 2457 2458 2459 2460 2461 2462 2463 2464 2465 2466 2467 2468 2469 2470 2471 2472 2473 2474 2475 2476 2477 2478 2479 2480 2481 2482 2483 2484 2485 2486 2487 2488 2489 2490 2491 2492 2493 2494 2495 2496 2497 2498 2499 2500 2501 2502 2503 2504 2505 2506 2507 2508 2509 2510 2511 2512 2513 2514 2515 2516 2517 2518 2519 2520 2521 2522 2523 2524 2525 2526 2527 2528 2529 2530 2531 2532 2533 2534 2535 2536 2537 2538 2539 2540 2541 2542 2543 2544 2545 2546 2547 2548 2549 2550 2551 2552 2553 2554 2555 2556 2557 2558 2559 2560 2561 2562 2563 2564 2565 2566 2567 2568 2569 2570 2571 2572 2573 2574 2575 2576 2577 2578 2579 2580 2581 2582 2583 2584 2585 2586 2587 2588 2589 2590 2591 2592 2593 2594 2595 2596 2597 2598 2599 2600 2601 2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658 2659 2660 2661 2662 2663 2664 2665 2666 2667 2668 2669 2670 2671 2672 2673 2674 2675 2676 2677 2678 2679 2680 2681 2682 2683 2684 2685 2686 2687 2688 2689 2690 2691 2692 2693 2694 2695 2696 2697 2698 2699 2700 2701 2702 2703 2704 2705 2706 2707 2708 2709 2710 2711 2712 2713 2714 2715 2716 2717 2718 2719 2720 2721 2722 2723 2724 2725 2726 2727 2728 2729 2730 2731 2732 2733 2734 2735 2736 2737 2738 2739 2740 2741 2742 2743 2744 2745 2746 2747 2748 2749 2750 2751 2752 2753 2754 2755 2756 2757 2758 2759 2760 2761 2762 2763 2764 2765 2766 2767 2768 2769 2770 2771 2772 2773 2774 2775 2776 2777 2778 2779 2780 2781 2782 2783 2784 2785 2786 2787 2788 2789 2790 2791 2792 2793 2794 2795 2796 2797 2798 2799 2800 2801 2802 2803 2804 2805 2806 2807 2808 2809 2810 2811 2812 2813 2814 2815 2816 2817 2818 2819 2820 2821 2822 2823 2824 2825 2826 2827 2828 2829 2830 2831 2832 2833 2834 2835 2836 2837 2838 2839 2840 2841 2842 2843 2844 2845 2846 2847 2848 2849 2850 2851 2852 2853 2854 2855 2856 2857 2858 2859 2860 2861 2862 2863 2864 2865 2866 2867 2868 2869 2870 2871 2872 2873 2874 2875 2876 2877 2878 2879 2880 2881 2882 2883 2884 2885 2886 2887 2888 2889 2890 2891 2892 2893 2894 2895 2896 2897 2898 2899 2900 2901 2902 2903 2904 2905 2906 2907 2908 2909 2910 2911 2912 2913 2914 2915 2916 2917 2918 2919 2920 2921 2922 2923 2924 2925 2926 2927 2928 2929 2930 2931 2932 2933 2934 2935 2936 2937 2938 2939 2940 2941 2942 2943 2944 2945 2946 2947 2948 2949 2950 2951 2952 2953 2954 2955 2956 2957 2958 2959 2960 2961 2962 2963 2964 2965 2966 2967 2968 2969 2970 2971 2972 2973 2974 2975 2976 2977 2978 2979 2980 2981 2982 2983 2984 2985 2986 2987 2988 2989 2990 2991 2992 2993 2994 2995 2996 2997 2998 2999 3000 3001 3002 3003 3004 3005 3006 3007 3008 3009 3010 3011 3012 3013 3014 3015 3016 3017 3018 3019 3020 3021 3022 3023 3024 3025 3026 3027 3028 3029 3030 3031 3032 3033 3034 3035 3036 3037 3038 3039 3040 3041 3042 3043 3044 3045 3046 3047 3048 3049 3050 3051 3052 3053 3054 3055 3056 3057 3058 3059 3060 3061 3062 3063 3064 3065 3066 3067 3068 3069 3070 3071 3072 3073 3074 3075 3076 3077 3078 3079 3080 3081 3082 3083 3084 3085 3086 3087 3088 3089 3090 3091 3092 3093 3094 3095 3096 3097 3098 3099 3100 3101 3102 3103 3104 3105 3106 3107 3108 3109 3110 3111 3112 3113 3114 3115 3116 3117 3118 3119 3120 3121 3122 3123 3124 3125 3126 3127 3128 3129 3130 3131 3132 3133 3134 3135 3136 3137 3138 3139 3140 3141 3142 3143 3144 3145 3146 3147 3148 3149 3150 3151 3152 3153 3154 3155 3156 3157 3158 3159 3160 3161 3162 3163 3164 3165 3166 3167 3168 3169 3170 3171 3172 3173 3174 3175 3176 3177 3178 3179 3180 3181 3182 3183 3184 3185 3186 3187 3188 3189 3190 3191 3192 3193 3194 3195 3196 3197 3198 3199 3200 3201 3202 3203 3204 3205 3206 3207 3208 3209 3210 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220 3221 3222 3223 3224 3225 3226 3227 3228 3229 3230 3231 3232 3233 3234 3235 3236 3237 3238 3239 3240 3241 3242 3243 3244 3245 3246 3247 3248 3249 3250 3251 3252 3253 3254 3255 3256 3257 3258 3259 3260 3261 3262 3263 3264 3265 3266 3267 3268 3269 3270 3271 3272 3273 3274 3275 3276 3277 3278 3279 3280 3281 3282 3283 3284 3285 3286 3287 3288 3289 3290 3291 3292 3293 3294 3295 3296 3297 3298 3299 3300 3301 3302 3303 3304 3305 3306 3307 3308 3309 3310 3311 3312 3313 3314 3315 3316 3317 3318 3319 3320 3321 3322 3323 3324 3325 3326 3327 3328 3329 3330 3331 3332 3333 3334 3335 3336 3337 3338 3339 3340 3341 3342 3343 3344 3345 3346 3347 3348 3349 3350 3351 3352 3353 3354 3355 3356 3357 3358 3359 3360 3361 3362 3363 3364 3365 3366 3367 3368 3369 3370 3371 3372 3373 3374 3375 3376 3377 3378 3379 3380 3381 3382 3383 3384 3385 3386 3387 3388 3389 3390 3391 3392 3393 3394 3395 3396 3397 3398 3399 3400 3401 3402 3403 3404 3405 3406 3407 3408 3409 3410 3411 3412 3413 3414 3415 3416 3417 3418 3419 3420 3421 3422 3423 3424 3425 3426 3427 3428 3429 3430 3431 3432 3433 3434 3435 3436 3437 3438 3439 3440 3441 3442 3443 3444 3445 3446 3447 3448 3449 3450 3451 3452 3453 3454 3455 3456 3457 3458 3459 3460 3461 3462 3463 3464 3465 3466 3467 3468 3469 3470 3471 3472 3473 3474 3475 3476 3477 3478 3479 3480 3481 3482 3483 3484 3485 3486 3487 3488 3489 3490 3491 3492 3493 3494 3495 3496 3497 3498 3499 3500 3501 3502 3503 3504 3505 3506 3507 3508 3509 3510 3511 3512 3513 3514 3515 3516 3517 3518 3519 3520 3521 3522 3523 3524 3525 3526 3527 3528 3529 3530 3531 3532 3533 3534 3535 3536 3537 3538 3539 3540 3541 3542 3543 3544 3545 3546 3547 3548 3549 3550 3551 3552 3553 3554 3555 3556 3557 3558 3559 3560 3561 3562 3563 3564 3565 3566 3567 3568 3569 3570 3571 3572 3573 3574 3575 3576 3577 3578 3579 3580 3581 3582 3583 3584 3585 3586 3587 3588 3589 3590 3591 3592 3593 3594 3595 3596 3597 3598 3599 3600 3601 3602 3603 3604 3605 3606 3607 3608 3609 3610 3611 3612 3613 3614 3615 3616 3617 3618 3619 3620 3621 3622 3623 3624 3625 3626 3627 3628 3629 3630 3631 3632 3633 3634 3635 3636 3637 3638 3639 3640 3641 3642 3643 3644 3645 3646 3647 3648 3649 3650 3651 3652 3653 3654 3655 3656 3657 3658 3659 3660 3661 3662 3663 3664 3665 3666 3667 3668 3669 3670 3671 3672 3673 3674 3675 3676 3677 3678 3679 3680 3681 3682 3683 3684 3685 3686 3687 3688 3689 3690 3691 3692 3693 3694 3695 3696 3697 3698 3699 3700 3701 3702 3703 3704 3705 3706 3707 3708 3709 3710 3711 3712 3713 3714 3715 3716 3717 3718 3719 3720 3721 3722 3723 3724 3725 3726 3727 3728 3729 3730 3731 3732 3733 3734 3735 3736 3737 3738 3739 3740 3741 3742 3743 3744 3745 3746 3747 3748 3749 3750 3751 3752 3753 3754 3755 3756 3757 3758 3759 3760 3761 3762 3763 3764 3765 3766 3767 3768 3769 3770 3771 3772 3773 3774 3775 3776 3777 3778 3779 3780 3781 3782 3783 3784 3785 3786 3787 3788 3789 3790 3791 3792 3793 3794 | // SPDX-License-Identifier: GPL-2.0-or-later /* * INET An implementation of the TCP/IP protocol suite for the LINUX * operating system. INET is implemented using the BSD Socket * interface as the means of communication with the user level. * * ROUTE - implementation of the IP router. * * Authors: Ross Biro * Fred N. van Kempen, <waltje@uWalt.NL.Mugnet.ORG> * Alan Cox, <gw4pts@gw4pts.ampr.org> * Linus Torvalds, <Linus.Torvalds@helsinki.fi> * Alexey Kuznetsov, <kuznet@ms2.inr.ac.ru> * * Fixes: * Alan Cox : Verify area fixes. * Alan Cox : cli() protects routing changes * Rui Oliveira : ICMP routing table updates * (rco@di.uminho.pt) Routing table insertion and update * Linus Torvalds : Rewrote bits to be sensible * Alan Cox : Added BSD route gw semantics * Alan Cox : Super /proc >4K * Alan Cox : MTU in route table * Alan Cox : MSS actually. Also added the window * clamper. * Sam Lantinga : Fixed route matching in rt_del() * Alan Cox : Routing cache support. * Alan Cox : Removed compatibility cruft. * Alan Cox : RTF_REJECT support. * Alan Cox : TCP irtt support. * Jonathan Naylor : Added Metric support. * Miquel van Smoorenburg : BSD API fixes. * Miquel van Smoorenburg : Metrics. * Alan Cox : Use __u32 properly * Alan Cox : Aligned routing errors more closely with BSD * our system is still very different. * Alan Cox : Faster /proc handling * Alexey Kuznetsov : Massive rework to support tree based routing, * routing caches and better behaviour. * * Olaf Erb : irtt wasn't being copied right. * Bjorn Ekwall : Kerneld route support. * Alan Cox : Multicast fixed (I hope) * Pavel Krauz : Limited broadcast fixed * Mike McLagan : Routing by source * Alexey Kuznetsov : End of old history. Split to fib.c and * route.c and rewritten from scratch. * Andi Kleen : Load-limit warning messages. * Vitaly E. Lavrov : Transparent proxy revived after year coma. * Vitaly E. Lavrov : Race condition in ip_route_input_slow. * Tobias Ringstrom : Uninitialized res.type in ip_route_output_slow. * Vladimir V. Ivanov : IP rule info (flowid) is really useful. * Marc Boucher : routing by fwmark * Robert Olsson : Added rt_cache statistics * Arnaldo C. Melo : Convert proc stuff to seq_file * Eric Dumazet : hashed spinlocks and rt_check_expire() fixes. * Ilia Sotnikov : Ignore TOS on PMTUD and Redirect * Ilia Sotnikov : Removed TOS from hash calculations */ #define pr_fmt(fmt) "IPv4: " fmt #include <linux/module.h> #include <linux/bitops.h> #include <linux/kernel.h> #include <linux/mm.h> #include <linux/memblock.h> #include <linux/socket.h> #include <linux/errno.h> #include <linux/in.h> #include <linux/inet.h> #include <linux/netdevice.h> #include <linux/proc_fs.h> #include <linux/init.h> #include <linux/skbuff.h> #include <linux/inetdevice.h> #include <linux/igmp.h> #include <linux/pkt_sched.h> #include <linux/mroute.h> #include <linux/netfilter_ipv4.h> #include <linux/random.h> #include <linux/rcupdate.h> #include <linux/slab.h> #include <linux/jhash.h> #include <net/dst.h> #include <net/dst_metadata.h> #include <net/flow.h> #include <net/inet_dscp.h> #include <net/net_namespace.h> #include <net/ip.h> #include <net/route.h> #include <net/inetpeer.h> #include <net/sock.h> #include <net/ip_fib.h> #include <net/nexthop.h> #include <net/tcp.h> #include <net/icmp.h> #include <net/xfrm.h> #include <net/lwtunnel.h> #include <net/netevent.h> #include <net/rtnetlink.h> #ifdef CONFIG_SYSCTL #include <linux/sysctl.h> #endif #include <net/secure_seq.h> #include <net/ip_tunnels.h> #include "fib_lookup.h" #define RT_GC_TIMEOUT (300*HZ) #define DEFAULT_MIN_PMTU (512 + 20 + 20) #define DEFAULT_MTU_EXPIRES (10 * 60 * HZ) #define DEFAULT_MIN_ADVMSS 256 static int ip_rt_max_size; static int ip_rt_redirect_number __read_mostly = 9; static int ip_rt_redirect_load __read_mostly = HZ / 50; static int ip_rt_redirect_silence __read_mostly = ((HZ / 50) << (9 + 1)); static int ip_rt_error_cost __read_mostly = HZ; static int ip_rt_error_burst __read_mostly = 5 * HZ; static int ip_rt_gc_timeout __read_mostly = RT_GC_TIMEOUT; /* * Interface to generic destination cache. */ INDIRECT_CALLABLE_SCOPE struct dst_entry *ipv4_dst_check(struct dst_entry *dst, u32 cookie); static unsigned int ipv4_default_advmss(const struct dst_entry *dst); INDIRECT_CALLABLE_SCOPE unsigned int ipv4_mtu(const struct dst_entry *dst); static void ipv4_negative_advice(struct sock *sk, struct dst_entry *dst); static void ipv4_link_failure(struct sk_buff *skb); static void ip_rt_update_pmtu(struct dst_entry *dst, struct sock *sk, struct sk_buff *skb, u32 mtu, bool confirm_neigh); static void ip_do_redirect(struct dst_entry *dst, struct sock *sk, struct sk_buff *skb); static void ipv4_dst_destroy(struct dst_entry *dst); static u32 *ipv4_cow_metrics(struct dst_entry *dst, unsigned long old) { WARN_ON(1); return NULL; } static struct neighbour *ipv4_neigh_lookup(const struct dst_entry *dst, struct sk_buff *skb, const void *daddr); static void ipv4_confirm_neigh(const struct dst_entry *dst, const void *daddr); static struct dst_ops ipv4_dst_ops = { .family = AF_INET, .check = ipv4_dst_check, .default_advmss = ipv4_default_advmss, .mtu = ipv4_mtu, .cow_metrics = ipv4_cow_metrics, .destroy = ipv4_dst_destroy, .negative_advice = ipv4_negative_advice, .link_failure = ipv4_link_failure, .update_pmtu = ip_rt_update_pmtu, .redirect = ip_do_redirect, .local_out = __ip_local_out, .neigh_lookup = ipv4_neigh_lookup, .confirm_neigh = ipv4_confirm_neigh, }; #define ECN_OR_COST(class) TC_PRIO_##class const __u8 ip_tos2prio[16] = { TC_PRIO_BESTEFFORT, ECN_OR_COST(BESTEFFORT), TC_PRIO_BESTEFFORT, ECN_OR_COST(BESTEFFORT), TC_PRIO_BULK, ECN_OR_COST(BULK), TC_PRIO_BULK, ECN_OR_COST(BULK), TC_PRIO_INTERACTIVE, ECN_OR_COST(INTERACTIVE), TC_PRIO_INTERACTIVE, ECN_OR_COST(INTERACTIVE), TC_PRIO_INTERACTIVE_BULK, ECN_OR_COST(INTERACTIVE_BULK), TC_PRIO_INTERACTIVE_BULK, ECN_OR_COST(INTERACTIVE_BULK) }; EXPORT_SYMBOL(ip_tos2prio); static DEFINE_PER_CPU(struct rt_cache_stat, rt_cache_stat); #ifndef CONFIG_PREEMPT_RT #define RT_CACHE_STAT_INC(field) raw_cpu_inc(rt_cache_stat.field) #else #define RT_CACHE_STAT_INC(field) this_cpu_inc(rt_cache_stat.field) #endif #ifdef CONFIG_PROC_FS static void *rt_cache_seq_start(struct seq_file *seq, loff_t *pos) { if (*pos) return NULL; return SEQ_START_TOKEN; } static void *rt_cache_seq_next(struct seq_file *seq, void *v, loff_t *pos) { ++*pos; return NULL; } static void rt_cache_seq_stop(struct seq_file *seq, void *v) { } static int rt_cache_seq_show(struct seq_file *seq, void *v) { if (v == SEQ_START_TOKEN) seq_printf(seq, "%-127s\n", "Iface\tDestination\tGateway \tFlags\t\tRefCnt\tUse\t" "Metric\tSource\t\tMTU\tWindow\tIRTT\tTOS\tHHRef\t" "HHUptod\tSpecDst"); return 0; } static const struct seq_operations rt_cache_seq_ops = { .start = rt_cache_seq_start, .next = rt_cache_seq_next, .stop = rt_cache_seq_stop, .show = rt_cache_seq_show, }; static void *rt_cpu_seq_start(struct seq_file *seq, loff_t *pos) { int cpu; if (*pos == 0) return SEQ_START_TOKEN; for (cpu = *pos-1; cpu < nr_cpu_ids; ++cpu) { if (!cpu_possible(cpu)) continue; *pos = cpu+1; return &per_cpu(rt_cache_stat, cpu); } return NULL; } static void *rt_cpu_seq_next(struct seq_file *seq, void *v, loff_t *pos) { int cpu; for (cpu = *pos; cpu < nr_cpu_ids; ++cpu) { if (!cpu_possible(cpu)) continue; *pos = cpu+1; return &per_cpu(rt_cache_stat, cpu); } (*pos)++; return NULL; } static void rt_cpu_seq_stop(struct seq_file *seq, void *v) { } static int rt_cpu_seq_show(struct seq_file *seq, void *v) { struct rt_cache_stat *st = v; if (v == SEQ_START_TOKEN) { seq_puts(seq, "entries in_hit in_slow_tot in_slow_mc in_no_route in_brd in_martian_dst in_martian_src out_hit out_slow_tot out_slow_mc gc_total gc_ignored gc_goal_miss gc_dst_overflow in_hlist_search out_hlist_search\n"); return 0; } seq_printf(seq, "%08x %08x %08x %08x %08x %08x %08x " "%08x %08x %08x %08x %08x %08x " "%08x %08x %08x %08x\n", dst_entries_get_slow(&ipv4_dst_ops), 0, /* st->in_hit */ st->in_slow_tot, st->in_slow_mc, st->in_no_route, st->in_brd, st->in_martian_dst, st->in_martian_src, 0, /* st->out_hit */ st->out_slow_tot, st->out_slow_mc, 0, /* st->gc_total */ 0, /* st->gc_ignored */ 0, /* st->gc_goal_miss */ 0, /* st->gc_dst_overflow */ 0, /* st->in_hlist_search */ 0 /* st->out_hlist_search */ ); return 0; } static const struct seq_operations rt_cpu_seq_ops = { .start = rt_cpu_seq_start, .next = rt_cpu_seq_next, .stop = rt_cpu_seq_stop, .show = rt_cpu_seq_show, }; #ifdef CONFIG_IP_ROUTE_CLASSID static int rt_acct_proc_show(struct seq_file *m, void *v) { struct ip_rt_acct *dst, *src; unsigned int i, j; dst = kcalloc(256, sizeof(struct ip_rt_acct), GFP_KERNEL); if (!dst) return -ENOMEM; for_each_possible_cpu(i) { src = (struct ip_rt_acct *)per_cpu_ptr(ip_rt_acct, i); for (j = 0; j < 256; j++) { dst[j].o_bytes += src[j].o_bytes; dst[j].o_packets += src[j].o_packets; dst[j].i_bytes += src[j].i_bytes; dst[j].i_packets += src[j].i_packets; } } seq_write(m, dst, 256 * sizeof(struct ip_rt_acct)); kfree(dst); return 0; } #endif static int __net_init ip_rt_do_proc_init(struct net *net) { struct proc_dir_entry *pde; pde = proc_create_seq("rt_cache", 0444, net->proc_net, &rt_cache_seq_ops); if (!pde) goto err1; pde = proc_create_seq("rt_cache", 0444, net->proc_net_stat, &rt_cpu_seq_ops); if (!pde) goto err2; #ifdef CONFIG_IP_ROUTE_CLASSID pde = proc_create_single("rt_acct", 0, net->proc_net, rt_acct_proc_show); if (!pde) goto err3; #endif return 0; #ifdef CONFIG_IP_ROUTE_CLASSID err3: remove_proc_entry("rt_cache", net->proc_net_stat); #endif err2: remove_proc_entry("rt_cache", net->proc_net); err1: return -ENOMEM; } static void __net_exit ip_rt_do_proc_exit(struct net *net) { remove_proc_entry("rt_cache", net->proc_net_stat); remove_proc_entry("rt_cache", net->proc_net); #ifdef CONFIG_IP_ROUTE_CLASSID remove_proc_entry("rt_acct", net->proc_net); #endif } static struct pernet_operations ip_rt_proc_ops __net_initdata = { .init = ip_rt_do_proc_init, .exit = ip_rt_do_proc_exit, }; static int __init ip_rt_proc_init(void) { return register_pernet_subsys(&ip_rt_proc_ops); } #else static inline int ip_rt_proc_init(void) { return 0; } #endif /* CONFIG_PROC_FS */ static inline bool rt_is_expired(const struct rtable *rth) { bool res; rcu_read_lock(); res = rth->rt_genid != rt_genid_ipv4(dev_net_rcu(rth->dst.dev)); rcu_read_unlock(); return res; } void rt_cache_flush(struct net *net) { rt_genid_bump_ipv4(net); } static struct neighbour *ipv4_neigh_lookup(const struct dst_entry *dst, struct sk_buff *skb, const void *daddr) { const struct rtable *rt = container_of(dst, struct rtable, dst); struct net_device *dev; struct neighbour *n; rcu_read_lock(); dev = dst_dev_rcu(dst); if (likely(rt->rt_gw_family == AF_INET)) { n = ip_neigh_gw4(dev, rt->rt_gw4); } else if (rt->rt_gw_family == AF_INET6) { n = ip_neigh_gw6(dev, &rt->rt_gw6); } else { __be32 pkey; pkey = skb ? ip_hdr(skb)->daddr : *((__be32 *) daddr); n = ip_neigh_gw4(dev, pkey); } if (!IS_ERR(n) && !refcount_inc_not_zero(&n->refcnt)) n = NULL; rcu_read_unlock(); return n; } static void ipv4_confirm_neigh(const struct dst_entry *dst, const void *daddr) { const struct rtable *rt = container_of(dst, struct rtable, dst); struct net_device *dev = dst_dev(dst); const __be32 *pkey = daddr; if (rt->rt_gw_family == AF_INET) { pkey = (const __be32 *)&rt->rt_gw4; } else if (rt->rt_gw_family == AF_INET6) { return __ipv6_confirm_neigh_stub(dev, &rt->rt_gw6); } else if (!daddr || (rt->rt_flags & (RTCF_MULTICAST | RTCF_BROADCAST | RTCF_LOCAL))) { return; } __ipv4_confirm_neigh(dev, *(__force u32 *)pkey); } /* Hash tables of size 2048..262144 depending on RAM size. * Each bucket uses 8 bytes. */ static u32 ip_idents_mask __read_mostly; static atomic_t *ip_idents __read_mostly; static u32 *ip_tstamps __read_mostly; /* In order to protect privacy, we add a perturbation to identifiers * if one generator is seldom used. This makes hard for an attacker * to infer how many packets were sent between two points in time. */ static u32 ip_idents_reserve(u32 hash, int segs) { u32 bucket, old, now = (u32)jiffies; atomic_t *p_id; u32 *p_tstamp; u32 delta = 0; bucket = hash & ip_idents_mask; p_tstamp = ip_tstamps + bucket; p_id = ip_idents + bucket; old = READ_ONCE(*p_tstamp); if (old != now && cmpxchg(p_tstamp, old, now) == old) delta = get_random_u32_below(now - old); /* If UBSAN reports an error there, please make sure your compiler * supports -fno-strict-overflow before reporting it that was a bug * in UBSAN, and it has been fixed in GCC-8. */ return atomic_add_return(segs + delta, p_id) - segs; } void __ip_select_ident(struct net *net, struct iphdr *iph, int segs) { u32 hash, id; /* Note the following code is not safe, but this is okay. */ if (unlikely(siphash_key_is_zero(&net->ipv4.ip_id_key))) get_random_bytes(&net->ipv4.ip_id_key, sizeof(net->ipv4.ip_id_key)); hash = siphash_3u32((__force u32)iph->daddr, (__force u32)iph->saddr, iph->protocol, &net->ipv4.ip_id_key); id = ip_idents_reserve(hash, segs); iph->id = htons(id); } EXPORT_SYMBOL(__ip_select_ident); static void __build_flow_key(const struct net *net, struct flowi4 *fl4, const struct sock *sk, const struct iphdr *iph, int oif, __u8 tos, u8 prot, u32 mark, int flow_flags) { __u8 scope = RT_SCOPE_UNIVERSE; if (sk) { oif = sk->sk_bound_dev_if; mark = READ_ONCE(sk->sk_mark); tos = ip_sock_rt_tos(sk); scope = ip_sock_rt_scope(sk); prot = inet_test_bit(HDRINCL, sk) ? IPPROTO_RAW : sk->sk_protocol; } flowi4_init_output(fl4, oif, mark, tos & INET_DSCP_MASK, scope, prot, flow_flags, iph->daddr, iph->saddr, 0, 0, sock_net_uid(net, sk)); } static void build_skb_flow_key(struct flowi4 *fl4, const struct sk_buff *skb, const struct sock *sk) { const struct net *net = dev_net(skb->dev); const struct iphdr *iph = ip_hdr(skb); int oif = skb->dev->ifindex; u8 prot = iph->protocol; u32 mark = skb->mark; __u8 tos = iph->tos; __build_flow_key(net, fl4, sk, iph, oif, tos, prot, mark, 0); } static void build_sk_flow_key(struct flowi4 *fl4, const struct sock *sk) { const struct inet_sock *inet = inet_sk(sk); const struct ip_options_rcu *inet_opt; __be32 daddr = inet->inet_daddr; rcu_read_lock(); inet_opt = rcu_dereference(inet->inet_opt); if (inet_opt && inet_opt->opt.srr) daddr = inet_opt->opt.faddr; flowi4_init_output(fl4, sk->sk_bound_dev_if, READ_ONCE(sk->sk_mark), ip_sock_rt_tos(sk), ip_sock_rt_scope(sk), inet_test_bit(HDRINCL, sk) ? IPPROTO_RAW : sk->sk_protocol, inet_sk_flowi_flags(sk), daddr, inet->inet_saddr, 0, 0, sk_uid(sk)); rcu_read_unlock(); } static void ip_rt_build_flow_key(struct flowi4 *fl4, const struct sock *sk, const struct sk_buff *skb) { if (skb) build_skb_flow_key(fl4, skb, sk); else build_sk_flow_key(fl4, sk); } static DEFINE_SPINLOCK(fnhe_lock); static void fnhe_flush_routes(struct fib_nh_exception *fnhe) { struct rtable *rt; rt = rcu_dereference(fnhe->fnhe_rth_input); if (rt) { RCU_INIT_POINTER(fnhe->fnhe_rth_input, NULL); dst_dev_put(&rt->dst); dst_release(&rt->dst); } rt = rcu_dereference(fnhe->fnhe_rth_output); if (rt) { RCU_INIT_POINTER(fnhe->fnhe_rth_output, NULL); dst_dev_put(&rt->dst); dst_release(&rt->dst); } } static void fnhe_remove_oldest(struct fnhe_hash_bucket *hash) { struct fib_nh_exception __rcu **fnhe_p, **oldest_p; struct fib_nh_exception *fnhe, *oldest = NULL; for (fnhe_p = &hash->chain; ; fnhe_p = &fnhe->fnhe_next) { fnhe = rcu_dereference_protected(*fnhe_p, lockdep_is_held(&fnhe_lock)); if (!fnhe) break; if (!oldest || time_before(fnhe->fnhe_stamp, oldest->fnhe_stamp)) { oldest = fnhe; oldest_p = fnhe_p; } } fnhe_flush_routes(oldest); *oldest_p = oldest->fnhe_next; kfree_rcu(oldest, rcu); } static u32 fnhe_hashfun(__be32 daddr) { static siphash_aligned_key_t fnhe_hash_key; u64 hval; net_get_random_once(&fnhe_hash_key, sizeof(fnhe_hash_key)); hval = siphash_1u32((__force u32)daddr, &fnhe_hash_key); return hash_64(hval, FNHE_HASH_SHIFT); } static void fill_route_from_fnhe(struct rtable *rt, struct fib_nh_exception *fnhe) { rt->rt_pmtu = fnhe->fnhe_pmtu; rt->rt_mtu_locked = fnhe->fnhe_mtu_locked; rt->dst.expires = fnhe->fnhe_expires; if (fnhe->fnhe_gw) { rt->rt_flags |= RTCF_REDIRECTED; rt->rt_uses_gateway = 1; rt->rt_gw_family = AF_INET; rt->rt_gw4 = fnhe->fnhe_gw; } } static void update_or_create_fnhe(struct fib_nh_common *nhc, __be32 daddr, __be32 gw, u32 pmtu, bool lock, unsigned long expires) { struct fnhe_hash_bucket *hash; struct fib_nh_exception *fnhe; struct rtable *rt; u32 genid, hval; unsigned int i; int depth; genid = fnhe_genid(dev_net(nhc->nhc_dev)); hval = fnhe_hashfun(daddr); spin_lock_bh(&fnhe_lock); hash = rcu_dereference(nhc->nhc_exceptions); if (!hash) { hash = kcalloc(FNHE_HASH_SIZE, sizeof(*hash), GFP_ATOMIC); if (!hash) goto out_unlock; rcu_assign_pointer(nhc->nhc_exceptions, hash); } hash += hval; depth = 0; for (fnhe = rcu_dereference(hash->chain); fnhe; fnhe = rcu_dereference(fnhe->fnhe_next)) { if (fnhe->fnhe_daddr == daddr) break; depth++; } if (fnhe) { if (fnhe->fnhe_genid != genid) fnhe->fnhe_genid = genid; if (gw) fnhe->fnhe_gw = gw; if (pmtu) { fnhe->fnhe_pmtu = pmtu; fnhe->fnhe_mtu_locked = lock; } fnhe->fnhe_expires = max(1UL, expires); /* Update all cached dsts too */ rt = rcu_dereference(fnhe->fnhe_rth_input); if (rt) fill_route_from_fnhe(rt, fnhe); rt = rcu_dereference(fnhe->fnhe_rth_output); if (rt) fill_route_from_fnhe(rt, fnhe); } else { /* Randomize max depth to avoid some side channels attacks. */ int max_depth = FNHE_RECLAIM_DEPTH + get_random_u32_below(FNHE_RECLAIM_DEPTH); while (depth > max_depth) { fnhe_remove_oldest(hash); depth--; } fnhe = kzalloc(sizeof(*fnhe), GFP_ATOMIC); if (!fnhe) goto out_unlock; fnhe->fnhe_next = hash->chain; fnhe->fnhe_genid = genid; fnhe->fnhe_daddr = daddr; fnhe->fnhe_gw = gw; fnhe->fnhe_pmtu = pmtu; fnhe->fnhe_mtu_locked = lock; fnhe->fnhe_expires = max(1UL, expires); rcu_assign_pointer(hash->chain, fnhe); /* Exception created; mark the cached routes for the nexthop * stale, so anyone caching it rechecks if this exception * applies to them. */ rt = rcu_dereference(nhc->nhc_rth_input); if (rt) WRITE_ONCE(rt->dst.obsolete, DST_OBSOLETE_KILL); for_each_possible_cpu(i) { struct rtable __rcu **prt; prt = per_cpu_ptr(nhc->nhc_pcpu_rth_output, i); rt = rcu_dereference(*prt); if (rt) WRITE_ONCE(rt->dst.obsolete, DST_OBSOLETE_KILL); } } fnhe->fnhe_stamp = jiffies; out_unlock: spin_unlock_bh(&fnhe_lock); } static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flowi4 *fl4, bool kill_route) { __be32 new_gw = icmp_hdr(skb)->un.gateway; __be32 old_gw = ip_hdr(skb)->saddr; struct net_device *dev = skb->dev; struct in_device *in_dev; struct fib_result res; struct neighbour *n; struct net *net; switch (icmp_hdr(skb)->code & 7) { case ICMP_REDIR_NET: case ICMP_REDIR_NETTOS: case ICMP_REDIR_HOST: case ICMP_REDIR_HOSTTOS: break; default: return; } if (rt->rt_gw_family != AF_INET || rt->rt_gw4 != old_gw) return; in_dev = __in_dev_get_rcu(dev); if (!in_dev) return; net = dev_net(dev); if (new_gw == old_gw || !IN_DEV_RX_REDIRECTS(in_dev) || ipv4_is_multicast(new_gw) || ipv4_is_lbcast(new_gw) || ipv4_is_zeronet(new_gw)) goto reject_redirect; if (!IN_DEV_SHARED_MEDIA(in_dev)) { if (!inet_addr_onlink(in_dev, new_gw, old_gw)) goto reject_redirect; if (IN_DEV_SEC_REDIRECTS(in_dev) && ip_fib_check_default(new_gw, dev)) goto reject_redirect; } else { if (inet_addr_type(net, new_gw) != RTN_UNICAST) goto reject_redirect; } n = __ipv4_neigh_lookup(rt->dst.dev, (__force u32)new_gw); if (!n) n = neigh_create(&arp_tbl, &new_gw, rt->dst.dev); if (!IS_ERR(n)) { if (!(READ_ONCE(n->nud_state) & NUD_VALID)) { neigh_event_send(n, NULL); } else { if (fib_lookup(net, fl4, &res, 0) == 0) { struct fib_nh_common *nhc; fib_select_path(net, &res, fl4, skb); nhc = FIB_RES_NHC(res); update_or_create_fnhe(nhc, fl4->daddr, new_gw, 0, false, jiffies + ip_rt_gc_timeout); } if (kill_route) WRITE_ONCE(rt->dst.obsolete, DST_OBSOLETE_KILL); call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n); } neigh_release(n); } return; reject_redirect: #ifdef CONFIG_IP_ROUTE_VERBOSE if (IN_DEV_LOG_MARTIANS(in_dev)) { const struct iphdr *iph = (const struct iphdr *) skb->data; __be32 daddr = iph->daddr; __be32 saddr = iph->saddr; net_info_ratelimited("Redirect from %pI4 on %s about %pI4 ignored\n" " Advised path = %pI4 -> %pI4\n", &old_gw, dev->name, &new_gw, &saddr, &daddr); } #endif ; } static void ip_do_redirect(struct dst_entry *dst, struct sock *sk, struct sk_buff *skb) { struct rtable *rt; struct flowi4 fl4; const struct iphdr *iph = (const struct iphdr *) skb->data; struct net *net = dev_net(skb->dev); int oif = skb->dev->ifindex; u8 prot = iph->protocol; u32 mark = skb->mark; __u8 tos = iph->tos; rt = dst_rtable(dst); __build_flow_key(net, &fl4, sk, iph, oif, tos, prot, mark, 0); __ip_do_redirect(rt, skb, &fl4, true); } static void ipv4_negative_advice(struct sock *sk, struct dst_entry *dst) { struct rtable *rt = dst_rtable(dst); if ((READ_ONCE(dst->obsolete) > 0) || (rt->rt_flags & RTCF_REDIRECTED) || READ_ONCE(rt->dst.expires)) sk_dst_reset(sk); } /* * Algorithm: * 1. The first ip_rt_redirect_number redirects are sent * with exponential backoff, then we stop sending them at all, * assuming that the host ignores our redirects. * 2. If we did not see packets requiring redirects * during ip_rt_redirect_silence, we assume that the host * forgot redirected route and start to send redirects again. * * This algorithm is much cheaper and more intelligent than dumb load limiting * in icmp.c. * * NOTE. Do not forget to inhibit load limiting for redirects (redundant) * and "frag. need" (breaks PMTU discovery) in icmp.c. */ void ip_rt_send_redirect(struct sk_buff *skb) { struct rtable *rt = skb_rtable(skb); struct in_device *in_dev; struct inet_peer *peer; struct net *net; int log_martians; int vif; rcu_read_lock(); in_dev = __in_dev_get_rcu(rt->dst.dev); if (!in_dev || !IN_DEV_TX_REDIRECTS(in_dev)) { rcu_read_unlock(); return; } log_martians = IN_DEV_LOG_MARTIANS(in_dev); vif = l3mdev_master_ifindex_rcu(rt->dst.dev); net = dev_net(rt->dst.dev); peer = inet_getpeer_v4(net->ipv4.peers, ip_hdr(skb)->saddr, vif); if (!peer) { rcu_read_unlock(); icmp_send(skb, ICMP_REDIRECT, ICMP_REDIR_HOST, rt_nexthop(rt, ip_hdr(skb)->daddr)); return; } /* No redirected packets during ip_rt_redirect_silence; * reset the algorithm. */ if (time_after(jiffies, peer->rate_last + ip_rt_redirect_silence)) { peer->rate_tokens = 0; peer->n_redirects = 0; } /* Too many ignored redirects; do not send anything * set dst.rate_last to the last seen redirected packet. */ if (peer->n_redirects >= ip_rt_redirect_number) { peer->rate_last = jiffies; goto out_unlock; } /* Check for load limit; set rate_last to the latest sent * redirect. */ if (peer->n_redirects == 0 || time_after(jiffies, (peer->rate_last + (ip_rt_redirect_load << peer->n_redirects)))) { __be32 gw = rt_nexthop(rt, ip_hdr(skb)->daddr); icmp_send(skb, ICMP_REDIRECT, ICMP_REDIR_HOST, gw); peer->rate_last = jiffies; ++peer->n_redirects; if (IS_ENABLED(CONFIG_IP_ROUTE_VERBOSE) && log_martians && peer->n_redirects == ip_rt_redirect_number) net_warn_ratelimited("host %pI4/if%d ignores redirects for %pI4 to %pI4\n", &ip_hdr(skb)->saddr, inet_iif(skb), &ip_hdr(skb)->daddr, &gw); } out_unlock: rcu_read_unlock(); } static int ip_error(struct sk_buff *skb) { struct rtable *rt = skb_rtable(skb); struct net_device *dev = skb->dev; struct in_device *in_dev; struct inet_peer *peer; unsigned long now; struct net *net; SKB_DR(reason); bool send; int code; if (netif_is_l3_master(skb->dev)) { dev = __dev_get_by_index(dev_net(skb->dev), IPCB(skb)->iif); if (!dev) goto out; } in_dev = __in_dev_get_rcu(dev); /* IP on this device is disabled. */ if (!in_dev) goto out; net = dev_net(rt->dst.dev); if (!IN_DEV_FORWARD(in_dev)) { switch (rt->dst.error) { case EHOSTUNREACH: SKB_DR_SET(reason, IP_INADDRERRORS); __IP_INC_STATS(net, IPSTATS_MIB_INADDRERRORS); break; case ENETUNREACH: SKB_DR_SET(reason, IP_INNOROUTES); __IP_INC_STATS(net, IPSTATS_MIB_INNOROUTES); break; } goto out; } switch (rt->dst.error) { case EINVAL: default: goto out; case EHOSTUNREACH: code = ICMP_HOST_UNREACH; break; case ENETUNREACH: code = ICMP_NET_UNREACH; SKB_DR_SET(reason, IP_INNOROUTES); __IP_INC_STATS(net, IPSTATS_MIB_INNOROUTES); break; case EACCES: code = ICMP_PKT_FILTERED; break; } rcu_read_lock(); peer = inet_getpeer_v4(net->ipv4.peers, ip_hdr(skb)->saddr, l3mdev_master_ifindex_rcu(skb->dev)); send = true; if (peer) { now = jiffies; peer->rate_tokens += now - peer->rate_last; if (peer->rate_tokens > ip_rt_error_burst) peer->rate_tokens = ip_rt_error_burst; peer->rate_last = now; if (peer->rate_tokens >= ip_rt_error_cost) peer->rate_tokens -= ip_rt_error_cost; else send = false; } rcu_read_unlock(); if (send) icmp_send(skb, ICMP_DEST_UNREACH, code, 0); out: kfree_skb_reason(skb, reason); return 0; } static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu) { struct dst_entry *dst = &rt->dst; struct fib_result res; bool lock = false; struct net *net; u32 old_mtu; if (ip_mtu_locked(dst)) return; old_mtu = ipv4_mtu(dst); if (old_mtu < mtu) return; rcu_read_lock(); net = dst_dev_net_rcu(dst); if (mtu < net->ipv4.ip_rt_min_pmtu) { lock = true; mtu = min(old_mtu, net->ipv4.ip_rt_min_pmtu); } if (rt->rt_pmtu == mtu && !lock && time_before(jiffies, READ_ONCE(dst->expires) - net->ipv4.ip_rt_mtu_expires / 2)) goto out; if (fib_lookup(net, fl4, &res, 0) == 0) { struct fib_nh_common *nhc; fib_select_path(net, &res, fl4, NULL); #ifdef CONFIG_IP_ROUTE_MULTIPATH if (fib_info_num_path(res.fi) > 1) { int nhsel; for (nhsel = 0; nhsel < fib_info_num_path(res.fi); nhsel++) { nhc = fib_info_nhc(res.fi, nhsel); update_or_create_fnhe(nhc, fl4->daddr, 0, mtu, lock, jiffies + net->ipv4.ip_rt_mtu_expires); } goto out; } #endif /* CONFIG_IP_ROUTE_MULTIPATH */ nhc = FIB_RES_NHC(res); update_or_create_fnhe(nhc, fl4->daddr, 0, mtu, lock, jiffies + net->ipv4.ip_rt_mtu_expires); } out: rcu_read_unlock(); } static void ip_rt_update_pmtu(struct dst_entry *dst, struct sock *sk, struct sk_buff *skb, u32 mtu, bool confirm_neigh) { struct rtable *rt = dst_rtable(dst); struct flowi4 fl4; ip_rt_build_flow_key(&fl4, sk, skb); /* Don't make lookup fail for bridged encapsulations */ if (skb && netif_is_any_bridge_port(skb->dev)) fl4.flowi4_oif = 0; __ip_rt_update_pmtu(rt, &fl4, mtu); } void ipv4_update_pmtu(struct sk_buff *skb, struct net *net, u32 mtu, int oif, u8 protocol) { const struct iphdr *iph = (const struct iphdr *)skb->data; struct flowi4 fl4; struct rtable *rt; u32 mark = IP4_REPLY_MARK(net, skb->mark); __build_flow_key(net, &fl4, NULL, iph, oif, iph->tos, protocol, mark, 0); rt = __ip_route_output_key(net, &fl4); if (!IS_ERR(rt)) { __ip_rt_update_pmtu(rt, &fl4, mtu); ip_rt_put(rt); } } EXPORT_SYMBOL_GPL(ipv4_update_pmtu); static void __ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu) { const struct iphdr *iph = (const struct iphdr *)skb->data; struct flowi4 fl4; struct rtable *rt; __build_flow_key(sock_net(sk), &fl4, sk, iph, 0, 0, 0, 0, 0); if (!fl4.flowi4_mark) fl4.flowi4_mark = IP4_REPLY_MARK(sock_net(sk), skb->mark); rt = __ip_route_output_key(sock_net(sk), &fl4); if (!IS_ERR(rt)) { __ip_rt_update_pmtu(rt, &fl4, mtu); ip_rt_put(rt); } } void ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu) { const struct iphdr *iph = (const struct iphdr *)skb->data; struct flowi4 fl4; struct rtable *rt; struct dst_entry *odst = NULL; bool new = false; struct net *net = sock_net(sk); bh_lock_sock(sk); if (!ip_sk_accept_pmtu(sk)) goto out; odst = sk_dst_get(sk); if (sock_owned_by_user(sk) || !odst) { __ipv4_sk_update_pmtu(skb, sk, mtu); goto out; } __build_flow_key(net, &fl4, sk, iph, 0, 0, 0, 0, 0); rt = dst_rtable(odst); if (READ_ONCE(odst->obsolete) && !odst->ops->check(odst, 0)) { rt = ip_route_output_flow(sock_net(sk), &fl4, sk); if (IS_ERR(rt)) goto out; new = true; } __ip_rt_update_pmtu(dst_rtable(xfrm_dst_path(&rt->dst)), &fl4, mtu); if (!dst_check(&rt->dst, 0)) { if (new) dst_release(&rt->dst); rt = ip_route_output_flow(sock_net(sk), &fl4, sk); if (IS_ERR(rt)) goto out; new = true; } if (new) sk_dst_set(sk, &rt->dst); out: bh_unlock_sock(sk); dst_release(odst); } EXPORT_SYMBOL_GPL(ipv4_sk_update_pmtu); void ipv4_redirect(struct sk_buff *skb, struct net *net, int oif, u8 protocol) { const struct iphdr *iph = (const struct iphdr *)skb->data; struct flowi4 fl4; struct rtable *rt; __build_flow_key(net, &fl4, NULL, iph, oif, iph->tos, protocol, 0, 0); rt = __ip_route_output_key(net, &fl4); if (!IS_ERR(rt)) { __ip_do_redirect(rt, skb, &fl4, false); ip_rt_put(rt); } } EXPORT_SYMBOL_GPL(ipv4_redirect); void ipv4_sk_redirect(struct sk_buff *skb, struct sock *sk) { const struct iphdr *iph = (const struct iphdr *)skb->data; struct flowi4 fl4; struct rtable *rt; struct net *net = sock_net(sk); __build_flow_key(net, &fl4, sk, iph, 0, 0, 0, 0, 0); rt = __ip_route_output_key(net, &fl4); if (!IS_ERR(rt)) { __ip_do_redirect(rt, skb, &fl4, false); ip_rt_put(rt); } } EXPORT_SYMBOL_GPL(ipv4_sk_redirect); INDIRECT_CALLABLE_SCOPE struct dst_entry *ipv4_dst_check(struct dst_entry *dst, u32 cookie) { struct rtable *rt = dst_rtable(dst); /* All IPV4 dsts are created with ->obsolete set to the value * DST_OBSOLETE_FORCE_CHK which forces validation calls down * into this function always. * * When a PMTU/redirect information update invalidates a route, * this is indicated by setting obsolete to DST_OBSOLETE_KILL or * DST_OBSOLETE_DEAD. */ if (READ_ONCE(dst->obsolete) != DST_OBSOLETE_FORCE_CHK || rt_is_expired(rt)) return NULL; return dst; } EXPORT_INDIRECT_CALLABLE(ipv4_dst_check); static void ipv4_send_dest_unreach(struct sk_buff *skb) { struct inet_skb_parm parm; struct net_device *dev; int res; /* Recompile ip options since IPCB may not be valid anymore. * Also check we have a reasonable ipv4 header. */ if (!pskb_network_may_pull(skb, sizeof(struct iphdr)) || ip_hdr(skb)->version != 4 || ip_hdr(skb)->ihl < 5) return; memset(&parm, 0, sizeof(parm)); if (ip_hdr(skb)->ihl > 5) { if (!pskb_network_may_pull(skb, ip_hdr(skb)->ihl * 4)) return; parm.opt.optlen = ip_hdr(skb)->ihl * 4 - sizeof(struct iphdr); rcu_read_lock(); dev = skb->dev ? skb->dev : skb_rtable(skb)->dst.dev; res = __ip_options_compile(dev_net(dev), &parm.opt, skb, NULL); rcu_read_unlock(); if (res) return; } __icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, 0, &parm); } static void ipv4_link_failure(struct sk_buff *skb) { struct rtable *rt; ipv4_send_dest_unreach(skb); rt = skb_rtable(skb); if (rt) dst_set_expires(&rt->dst, 0); } static int ip_rt_bug(struct net *net, struct sock *sk, struct sk_buff *skb) { pr_debug("%s: %pI4 -> %pI4, %s\n", __func__, &ip_hdr(skb)->saddr, &ip_hdr(skb)->daddr, skb->dev ? skb->dev->name : "?"); kfree_skb(skb); WARN_ON(1); return 0; } /* * We do not cache source address of outgoing interface, * because it is used only by IP RR, TS and SRR options, * so that it out of fast path. * * BTW remember: "addr" is allowed to be not aligned * in IP options! */ void ip_rt_get_source(u8 *addr, struct sk_buff *skb, struct rtable *rt) { __be32 src; if (rt_is_output_route(rt)) src = ip_hdr(skb)->saddr; else { struct fib_result res; struct iphdr *iph = ip_hdr(skb); struct flowi4 fl4 = { .daddr = iph->daddr, .saddr = iph->saddr, .flowi4_dscp = ip4h_dscp(iph), .flowi4_oif = rt->dst.dev->ifindex, .flowi4_iif = skb->dev->ifindex, .flowi4_mark = skb->mark, }; rcu_read_lock(); if (fib_lookup(dev_net(rt->dst.dev), &fl4, &res, 0) == 0) src = fib_result_prefsrc(dev_net(rt->dst.dev), &res); else src = inet_select_addr(rt->dst.dev, rt_nexthop(rt, iph->daddr), RT_SCOPE_UNIVERSE); rcu_read_unlock(); } memcpy(addr, &src, 4); } #ifdef CONFIG_IP_ROUTE_CLASSID static void set_class_tag(struct rtable *rt, u32 tag) { if (!(rt->dst.tclassid & 0xFFFF)) rt->dst.tclassid |= tag & 0xFFFF; if (!(rt->dst.tclassid & 0xFFFF0000)) rt->dst.tclassid |= tag & 0xFFFF0000; } #endif static unsigned int ipv4_default_advmss(const struct dst_entry *dst) { unsigned int header_size = sizeof(struct tcphdr) + sizeof(struct iphdr); unsigned int advmss; struct net *net; rcu_read_lock(); net = dst_dev_net_rcu(dst); advmss = max_t(unsigned int, ipv4_mtu(dst) - header_size, net->ipv4.ip_rt_min_advmss); rcu_read_unlock(); return min(advmss, IPV4_MAX_PMTU - header_size); } INDIRECT_CALLABLE_SCOPE unsigned int ipv4_mtu(const struct dst_entry *dst) { return ip_dst_mtu_maybe_forward(dst, false); } EXPORT_INDIRECT_CALLABLE(ipv4_mtu); static void ip_del_fnhe(struct fib_nh_common *nhc, __be32 daddr) { struct fnhe_hash_bucket *hash; struct fib_nh_exception *fnhe, __rcu **fnhe_p; u32 hval = fnhe_hashfun(daddr); spin_lock_bh(&fnhe_lock); hash = rcu_dereference_protected(nhc->nhc_exceptions, lockdep_is_held(&fnhe_lock)); hash += hval; fnhe_p = &hash->chain; fnhe = rcu_dereference_protected(*fnhe_p, lockdep_is_held(&fnhe_lock)); while (fnhe) { if (fnhe->fnhe_daddr == daddr) { rcu_assign_pointer(*fnhe_p, rcu_dereference_protected( fnhe->fnhe_next, lockdep_is_held(&fnhe_lock))); /* set fnhe_daddr to 0 to ensure it won't bind with * new dsts in rt_bind_exception(). */ fnhe->fnhe_daddr = 0; fnhe_flush_routes(fnhe); kfree_rcu(fnhe, rcu); break; } fnhe_p = &fnhe->fnhe_next; fnhe = rcu_dereference_protected(fnhe->fnhe_next, lockdep_is_held(&fnhe_lock)); } spin_unlock_bh(&fnhe_lock); } static struct fib_nh_exception *find_exception(struct fib_nh_common *nhc, __be32 daddr) { struct fnhe_hash_bucket *hash = rcu_dereference(nhc->nhc_exceptions); struct fib_nh_exception *fnhe; u32 hval; if (!hash) return NULL; hval = fnhe_hashfun(daddr); for (fnhe = rcu_dereference(hash[hval].chain); fnhe; fnhe = rcu_dereference(fnhe->fnhe_next)) { if (fnhe->fnhe_daddr == daddr) { if (fnhe->fnhe_expires && time_after(jiffies, fnhe->fnhe_expires)) { ip_del_fnhe(nhc, daddr); break; } return fnhe; } } return NULL; } /* MTU selection: * 1. mtu on route is locked - use it * 2. mtu from nexthop exception * 3. mtu from egress device */ u32 ip_mtu_from_fib_result(struct fib_result *res, __be32 daddr) { struct fib_nh_common *nhc = res->nhc; struct net_device *dev = nhc->nhc_dev; struct fib_info *fi = res->fi; u32 mtu = 0; if (READ_ONCE(dev_net(dev)->ipv4.sysctl_ip_fwd_use_pmtu) || fi->fib_metrics->metrics[RTAX_LOCK - 1] & (1 << RTAX_MTU)) mtu = fi->fib_mtu; if (likely(!mtu)) { struct fib_nh_exception *fnhe; fnhe = find_exception(nhc, daddr); if (fnhe && !time_after_eq(jiffies, fnhe->fnhe_expires)) mtu = fnhe->fnhe_pmtu; } if (likely(!mtu)) mtu = min(READ_ONCE(dev->mtu), IP_MAX_MTU); return mtu - lwtunnel_headroom(nhc->nhc_lwtstate, mtu); } static bool rt_bind_exception(struct rtable *rt, struct fib_nh_exception *fnhe, __be32 daddr, const bool do_cache) { bool ret = false; spin_lock_bh(&fnhe_lock); if (daddr == fnhe->fnhe_daddr) { struct rtable __rcu **porig; struct rtable *orig; int genid = fnhe_genid(dev_net(rt->dst.dev)); if (rt_is_input_route(rt)) porig = &fnhe->fnhe_rth_input; else porig = &fnhe->fnhe_rth_output; orig = rcu_dereference(*porig); if (fnhe->fnhe_genid != genid) { fnhe->fnhe_genid = genid; fnhe->fnhe_gw = 0; fnhe->fnhe_pmtu = 0; fnhe->fnhe_expires = 0; fnhe->fnhe_mtu_locked = false; fnhe_flush_routes(fnhe); orig = NULL; } fill_route_from_fnhe(rt, fnhe); if (!rt->rt_gw4) { rt->rt_gw4 = daddr; rt->rt_gw_family = AF_INET; } if (do_cache) { dst_hold(&rt->dst); rcu_assign_pointer(*porig, rt); if (orig) { dst_dev_put(&orig->dst); dst_release(&orig->dst); } ret = true; } fnhe->fnhe_stamp = jiffies; } spin_unlock_bh(&fnhe_lock); return ret; } static bool rt_cache_route(struct fib_nh_common *nhc, struct rtable *rt) { struct rtable *orig, *prev, **p; bool ret = true; if (rt_is_input_route(rt)) { p = (struct rtable **)&nhc->nhc_rth_input; } else { p = (struct rtable **)raw_cpu_ptr(nhc->nhc_pcpu_rth_output); } orig = *p; /* hold dst before doing cmpxchg() to avoid race condition * on this dst */ dst_hold(&rt->dst); prev = cmpxchg(p, orig, rt); if (prev == orig) { if (orig) { rt_add_uncached_list(orig); dst_release(&orig->dst); } } else { dst_release(&rt->dst); ret = false; } return ret; } struct uncached_list { spinlock_t lock; struct list_head head; }; static DEFINE_PER_CPU_ALIGNED(struct uncached_list, rt_uncached_list); void rt_add_uncached_list(struct rtable *rt) { struct uncached_list *ul = raw_cpu_ptr(&rt_uncached_list); rt->dst.rt_uncached_list = ul; spin_lock_bh(&ul->lock); list_add_tail(&rt->dst.rt_uncached, &ul->head); spin_unlock_bh(&ul->lock); } void rt_del_uncached_list(struct rtable *rt) { if (!list_empty(&rt->dst.rt_uncached)) { struct uncached_list *ul = rt->dst.rt_uncached_list; spin_lock_bh(&ul->lock); list_del_init(&rt->dst.rt_uncached); spin_unlock_bh(&ul->lock); } } static void ipv4_dst_destroy(struct dst_entry *dst) { ip_dst_metrics_put(dst); rt_del_uncached_list(dst_rtable(dst)); } void rt_flush_dev(struct net_device *dev) { struct rtable *rt, *safe; int cpu; for_each_possible_cpu(cpu) { struct uncached_list *ul = &per_cpu(rt_uncached_list, cpu); if (list_empty(&ul->head)) continue; spin_lock_bh(&ul->lock); list_for_each_entry_safe(rt, safe, &ul->head, dst.rt_uncached) { if (rt->dst.dev != dev) continue; rt->dst.dev = blackhole_netdev; netdev_ref_replace(dev, blackhole_netdev, &rt->dst.dev_tracker, GFP_ATOMIC); list_del_init(&rt->dst.rt_uncached); } spin_unlock_bh(&ul->lock); } } static bool rt_cache_valid(const struct rtable *rt) { return rt && READ_ONCE(rt->dst.obsolete) == DST_OBSOLETE_FORCE_CHK && !rt_is_expired(rt); } static void rt_set_nexthop(struct rtable *rt, __be32 daddr, const struct fib_result *res, struct fib_nh_exception *fnhe, struct fib_info *fi, u16 type, u32 itag, const bool do_cache) { bool cached = false; if (fi) { struct fib_nh_common *nhc = FIB_RES_NHC(*res); if (nhc->nhc_gw_family && nhc->nhc_scope == RT_SCOPE_LINK) { rt->rt_uses_gateway = 1; rt->rt_gw_family = nhc->nhc_gw_family; /* only INET and INET6 are supported */ if (likely(nhc->nhc_gw_family == AF_INET)) rt->rt_gw4 = nhc->nhc_gw.ipv4; else rt->rt_gw6 = nhc->nhc_gw.ipv6; } ip_dst_init_metrics(&rt->dst, fi->fib_metrics); #ifdef CONFIG_IP_ROUTE_CLASSID if (nhc->nhc_family == AF_INET) { struct fib_nh *nh; nh = container_of(nhc, struct fib_nh, nh_common); rt->dst.tclassid = nh->nh_tclassid; } #endif rt->dst.lwtstate = lwtstate_get(nhc->nhc_lwtstate); if (unlikely(fnhe)) cached = rt_bind_exception(rt, fnhe, daddr, do_cache); else if (do_cache) cached = rt_cache_route(nhc, rt); if (unlikely(!cached)) { /* Routes we intend to cache in nexthop exception or * FIB nexthop have the DST_NOCACHE bit clear. * However, if we are unsuccessful at storing this * route into the cache we really need to set it. */ if (!rt->rt_gw4) { rt->rt_gw_family = AF_INET; rt->rt_gw4 = daddr; } rt_add_uncached_list(rt); } } else rt_add_uncached_list(rt); #ifdef CONFIG_IP_ROUTE_CLASSID #ifdef CONFIG_IP_MULTIPLE_TABLES set_class_tag(rt, res->tclassid); #endif set_class_tag(rt, itag); #endif } struct rtable *rt_dst_alloc(struct net_device *dev, unsigned int flags, u16 type, bool noxfrm) { struct rtable *rt; rt = dst_alloc(&ipv4_dst_ops, dev, DST_OBSOLETE_FORCE_CHK, (noxfrm ? DST_NOXFRM : 0)); if (rt) { rt->rt_genid = rt_genid_ipv4(dev_net(dev)); rt->rt_flags = flags; rt->rt_type = type; rt->rt_is_input = 0; rt->rt_iif = 0; rt->rt_pmtu = 0; rt->rt_mtu_locked = 0; rt->rt_uses_gateway = 0; rt->rt_gw_family = 0; rt->rt_gw4 = 0; rt->dst.output = ip_output; if (flags & RTCF_LOCAL) rt->dst.input = ip_local_deliver; } return rt; } EXPORT_SYMBOL(rt_dst_alloc); struct rtable *rt_dst_clone(struct net_device *dev, struct rtable *rt) { struct rtable *new_rt; new_rt = dst_alloc(&ipv4_dst_ops, dev, DST_OBSOLETE_FORCE_CHK, rt->dst.flags); if (new_rt) { new_rt->rt_genid = rt_genid_ipv4(dev_net(dev)); new_rt->rt_flags = rt->rt_flags; new_rt->rt_type = rt->rt_type; new_rt->rt_is_input = rt->rt_is_input; new_rt->rt_iif = rt->rt_iif; new_rt->rt_pmtu = rt->rt_pmtu; new_rt->rt_mtu_locked = rt->rt_mtu_locked; new_rt->rt_gw_family = rt->rt_gw_family; if (rt->rt_gw_family == AF_INET) new_rt->rt_gw4 = rt->rt_gw4; else if (rt->rt_gw_family == AF_INET6) new_rt->rt_gw6 = rt->rt_gw6; new_rt->dst.input = READ_ONCE(rt->dst.input); new_rt->dst.output = READ_ONCE(rt->dst.output); new_rt->dst.error = rt->dst.error; new_rt->dst.lastuse = jiffies; new_rt->dst.lwtstate = lwtstate_get(rt->dst.lwtstate); } return new_rt; } EXPORT_SYMBOL(rt_dst_clone); /* called in rcu_read_lock() section */ enum skb_drop_reason ip_mc_validate_source(struct sk_buff *skb, __be32 daddr, __be32 saddr, dscp_t dscp, struct net_device *dev, struct in_device *in_dev, u32 *itag) { enum skb_drop_reason reason; /* Primary sanity checks. */ if (!in_dev) return SKB_DROP_REASON_NOT_SPECIFIED; if (ipv4_is_multicast(saddr) || ipv4_is_lbcast(saddr)) return SKB_DROP_REASON_IP_INVALID_SOURCE; if (skb->protocol != htons(ETH_P_IP)) return SKB_DROP_REASON_INVALID_PROTO; if (ipv4_is_loopback(saddr) && !IN_DEV_ROUTE_LOCALNET(in_dev)) return SKB_DROP_REASON_IP_LOCALNET; if (ipv4_is_zeronet(saddr)) { if (!ipv4_is_local_multicast(daddr) && ip_hdr(skb)->protocol != IPPROTO_IGMP) return SKB_DROP_REASON_IP_INVALID_SOURCE; } else { reason = fib_validate_source_reason(skb, saddr, 0, dscp, 0, dev, in_dev, itag); if (reason) return reason; } return SKB_NOT_DROPPED_YET; } /* called in rcu_read_lock() section */ static enum skb_drop_reason ip_route_input_mc(struct sk_buff *skb, __be32 daddr, __be32 saddr, dscp_t dscp, struct net_device *dev, int our) { struct in_device *in_dev = __in_dev_get_rcu(dev); unsigned int flags = RTCF_MULTICAST; enum skb_drop_reason reason; struct rtable *rth; u32 itag = 0; reason = ip_mc_validate_source(skb, daddr, saddr, dscp, dev, in_dev, &itag); if (reason) return reason; if (our) flags |= RTCF_LOCAL; if (IN_DEV_ORCONF(in_dev, NOPOLICY)) IPCB(skb)->flags |= IPSKB_NOPOLICY; rth = rt_dst_alloc(dev_net(dev)->loopback_dev, flags, RTN_MULTICAST, false); if (!rth) return SKB_DROP_REASON_NOMEM; #ifdef CONFIG_IP_ROUTE_CLASSID rth->dst.tclassid = itag; #endif rth->dst.output = ip_rt_bug; rth->rt_is_input= 1; #ifdef CONFIG_IP_MROUTE if (!ipv4_is_local_multicast(daddr) && IN_DEV_MFORWARD(in_dev)) rth->dst.input = ip_mr_input; #endif RT_CACHE_STAT_INC(in_slow_mc); skb_dst_drop(skb); skb_dst_set(skb, &rth->dst); return SKB_NOT_DROPPED_YET; } static void ip_handle_martian_source(struct net_device *dev, struct in_device *in_dev, struct sk_buff *skb, __be32 daddr, __be32 saddr) { RT_CACHE_STAT_INC(in_martian_src); #ifdef CONFIG_IP_ROUTE_VERBOSE if (IN_DEV_LOG_MARTIANS(in_dev) && net_ratelimit()) { /* * RFC1812 recommendation, if source is martian, * the only hint is MAC header. */ pr_warn("martian source %pI4 from %pI4, on dev %s\n", &daddr, &saddr, dev->name); if (dev->hard_header_len && skb_mac_header_was_set(skb)) { print_hex_dump(KERN_WARNING, "ll header: ", DUMP_PREFIX_OFFSET, 16, 1, skb_mac_header(skb), dev->hard_header_len, false); } } #endif } /* called in rcu_read_lock() section */ static enum skb_drop_reason __mkroute_input(struct sk_buff *skb, const struct fib_result *res, struct in_device *in_dev, __be32 daddr, __be32 saddr, dscp_t dscp) { enum skb_drop_reason reason = SKB_DROP_REASON_NOT_SPECIFIED; struct fib_nh_common *nhc = FIB_RES_NHC(*res); struct net_device *dev = nhc->nhc_dev; struct fib_nh_exception *fnhe; struct rtable *rth; int err; struct in_device *out_dev; bool do_cache; u32 itag = 0; /* get a working reference to the output device */ out_dev = __in_dev_get_rcu(dev); if (!out_dev) { net_crit_ratelimited("Bug in ip_route_input_slow(). Please report.\n"); return reason; } err = fib_validate_source(skb, saddr, daddr, dscp, FIB_RES_OIF(*res), in_dev->dev, in_dev, &itag); if (err < 0) { reason = -err; ip_handle_martian_source(in_dev->dev, in_dev, skb, daddr, saddr); goto cleanup; } do_cache = res->fi && !itag; if (out_dev == in_dev && err && IN_DEV_TX_REDIRECTS(out_dev) && skb->protocol == htons(ETH_P_IP)) { __be32 gw; gw = nhc->nhc_gw_family == AF_INET ? nhc->nhc_gw.ipv4 : 0; if (IN_DEV_SHARED_MEDIA(out_dev) || inet_addr_onlink(out_dev, saddr, gw)) IPCB(skb)->flags |= IPSKB_DOREDIRECT; } if (skb->protocol != htons(ETH_P_IP)) { /* Not IP (i.e. ARP). Do not create route, if it is * invalid for proxy arp. DNAT routes are always valid. * * Proxy arp feature have been extended to allow, ARP * replies back to the same interface, to support * Private VLAN switch technologies. See arp.c. */ if (out_dev == in_dev && IN_DEV_PROXY_ARP_PVLAN(in_dev) == 0) { reason = SKB_DROP_REASON_ARP_PVLAN_DISABLE; goto cleanup; } } if (IN_DEV_ORCONF(in_dev, NOPOLICY)) IPCB(skb)->flags |= IPSKB_NOPOLICY; fnhe = find_exception(nhc, daddr); if (do_cache) { if (fnhe) rth = rcu_dereference(fnhe->fnhe_rth_input); else rth = rcu_dereference(nhc->nhc_rth_input); if (rt_cache_valid(rth)) { skb_dst_set_noref(skb, &rth->dst); goto out; } } rth = rt_dst_alloc(out_dev->dev, 0, res->type, IN_DEV_ORCONF(out_dev, NOXFRM)); if (!rth) { reason = SKB_DROP_REASON_NOMEM; goto cleanup; } rth->rt_is_input = 1; RT_CACHE_STAT_INC(in_slow_tot); rth->dst.input = ip_forward; rt_set_nexthop(rth, daddr, res, fnhe, res->fi, res->type, itag, do_cache); lwtunnel_set_redirect(&rth->dst); skb_dst_set(skb, &rth->dst); out: reason = SKB_NOT_DROPPED_YET; cleanup: return reason; } #ifdef CONFIG_IP_ROUTE_MULTIPATH /* To make ICMP packets follow the right flow, the multipath hash is * calculated from the inner IP addresses. */ static void ip_multipath_l3_keys(const struct sk_buff *skb, struct flow_keys *hash_keys) { const struct iphdr *outer_iph = ip_hdr(skb); const struct iphdr *key_iph = outer_iph; const struct iphdr *inner_iph; const struct icmphdr *icmph; struct iphdr _inner_iph; struct icmphdr _icmph; if (likely(outer_iph->protocol != IPPROTO_ICMP)) goto out; if (unlikely((outer_iph->frag_off & htons(IP_OFFSET)) != 0)) goto out; icmph = skb_header_pointer(skb, outer_iph->ihl * 4, sizeof(_icmph), &_icmph); if (!icmph) goto out; if (!icmp_is_err(icmph->type)) goto out; inner_iph = skb_header_pointer(skb, outer_iph->ihl * 4 + sizeof(_icmph), sizeof(_inner_iph), &_inner_iph); if (!inner_iph) goto out; key_iph = inner_iph; out: hash_keys->addrs.v4addrs.src = key_iph->saddr; hash_keys->addrs.v4addrs.dst = key_iph->daddr; } static u32 fib_multipath_custom_hash_outer(const struct net *net, const struct sk_buff *skb, bool *p_has_inner) { u32 hash_fields = READ_ONCE(net->ipv4.sysctl_fib_multipath_hash_fields); struct flow_keys keys, hash_keys; if (!(hash_fields & FIB_MULTIPATH_HASH_FIELD_OUTER_MASK)) return 0; memset(&hash_keys, 0, sizeof(hash_keys)); skb_flow_dissect_flow_keys(skb, &keys, FLOW_DISSECTOR_F_STOP_AT_ENCAP); hash_keys.control.addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS; if (hash_fields & FIB_MULTIPATH_HASH_FIELD_SRC_IP) hash_keys.addrs.v4addrs.src = keys.addrs.v4addrs.src; if (hash_fields & FIB_MULTIPATH_HASH_FIELD_DST_IP) hash_keys.addrs.v4addrs.dst = keys.addrs.v4addrs.dst; if (hash_fields & FIB_MULTIPATH_HASH_FIELD_IP_PROTO) hash_keys.basic.ip_proto = keys.basic.ip_proto; if (hash_fields & FIB_MULTIPATH_HASH_FIELD_SRC_PORT) hash_keys.ports.src = keys.ports.src; if (hash_fields & FIB_MULTIPATH_HASH_FIELD_DST_PORT) hash_keys.ports.dst = keys.ports.dst; *p_has_inner = !!(keys.control.flags & FLOW_DIS_ENCAPSULATION); return fib_multipath_hash_from_keys(net, &hash_keys); } static u32 fib_multipath_custom_hash_inner(const struct net *net, const struct sk_buff *skb, bool has_inner) { u32 hash_fields = READ_ONCE(net->ipv4.sysctl_fib_multipath_hash_fields); struct flow_keys keys, hash_keys; /* We assume the packet carries an encapsulation, but if none was * encountered during dissection of the outer flow, then there is no * point in calling the flow dissector again. */ if (!has_inner) return 0; if (!(hash_fields & FIB_MULTIPATH_HASH_FIELD_INNER_MASK)) return 0; memset(&hash_keys, 0, sizeof(hash_keys)); skb_flow_dissect_flow_keys(skb, &keys, 0); if (!(keys.control.flags & FLOW_DIS_ENCAPSULATION)) return 0; if (keys.control.addr_type == FLOW_DISSECTOR_KEY_IPV4_ADDRS) { hash_keys.control.addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS; if (hash_fields & FIB_MULTIPATH_HASH_FIELD_INNER_SRC_IP) hash_keys.addrs.v4addrs.src = keys.addrs.v4addrs.src; if (hash_fields & FIB_MULTIPATH_HASH_FIELD_INNER_DST_IP) hash_keys.addrs.v4addrs.dst = keys.addrs.v4addrs.dst; } else if (keys.control.addr_type == FLOW_DISSECTOR_KEY_IPV6_ADDRS) { hash_keys.control.addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS; if (hash_fields & FIB_MULTIPATH_HASH_FIELD_INNER_SRC_IP) hash_keys.addrs.v6addrs.src = keys.addrs.v6addrs.src; if (hash_fields & FIB_MULTIPATH_HASH_FIELD_INNER_DST_IP) hash_keys.addrs.v6addrs.dst = keys.addrs.v6addrs.dst; if (hash_fields & FIB_MULTIPATH_HASH_FIELD_INNER_FLOWLABEL) hash_keys.tags.flow_label = keys.tags.flow_label; } if (hash_fields & FIB_MULTIPATH_HASH_FIELD_INNER_IP_PROTO) hash_keys.basic.ip_proto = keys.basic.ip_proto; if (hash_fields & FIB_MULTIPATH_HASH_FIELD_INNER_SRC_PORT) hash_keys.ports.src = keys.ports.src; if (hash_fields & FIB_MULTIPATH_HASH_FIELD_INNER_DST_PORT) hash_keys.ports.dst = keys.ports.dst; return fib_multipath_hash_from_keys(net, &hash_keys); } static u32 fib_multipath_custom_hash_skb(const struct net *net, const struct sk_buff *skb) { u32 mhash, mhash_inner; bool has_inner = true; mhash = fib_multipath_custom_hash_outer(net, skb, &has_inner); mhash_inner = fib_multipath_custom_hash_inner(net, skb, has_inner); return jhash_2words(mhash, mhash_inner, 0); } static u32 fib_multipath_custom_hash_fl4(const struct net *net, const struct flowi4 *fl4) { u32 hash_fields = READ_ONCE(net->ipv4.sysctl_fib_multipath_hash_fields); struct flow_keys hash_keys; if (!(hash_fields & FIB_MULTIPATH_HASH_FIELD_OUTER_MASK)) return 0; memset(&hash_keys, 0, sizeof(hash_keys)); hash_keys.control.addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS; if (hash_fields & FIB_MULTIPATH_HASH_FIELD_SRC_IP) hash_keys.addrs.v4addrs.src = fl4->saddr; if (hash_fields & FIB_MULTIPATH_HASH_FIELD_DST_IP) hash_keys.addrs.v4addrs.dst = fl4->daddr; if (hash_fields & FIB_MULTIPATH_HASH_FIELD_IP_PROTO) hash_keys.basic.ip_proto = fl4->flowi4_proto; if (hash_fields & FIB_MULTIPATH_HASH_FIELD_SRC_PORT) { if (fl4->flowi4_flags & FLOWI_FLAG_ANY_SPORT) hash_keys.ports.src = (__force __be16)get_random_u16(); else hash_keys.ports.src = fl4->fl4_sport; } if (hash_fields & FIB_MULTIPATH_HASH_FIELD_DST_PORT) hash_keys.ports.dst = fl4->fl4_dport; return fib_multipath_hash_from_keys(net, &hash_keys); } /* if skb is set it will be used and fl4 can be NULL */ int fib_multipath_hash(const struct net *net, const struct flowi4 *fl4, const struct sk_buff *skb, struct flow_keys *flkeys) { u32 multipath_hash = fl4 ? fl4->flowi4_multipath_hash : 0; struct flow_keys hash_keys; u32 mhash = 0; switch (READ_ONCE(net->ipv4.sysctl_fib_multipath_hash_policy)) { case 0: memset(&hash_keys, 0, sizeof(hash_keys)); hash_keys.control.addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS; if (skb) { ip_multipath_l3_keys(skb, &hash_keys); } else { hash_keys.addrs.v4addrs.src = fl4->saddr; hash_keys.addrs.v4addrs.dst = fl4->daddr; } mhash = fib_multipath_hash_from_keys(net, &hash_keys); break; case 1: /* skb is currently provided only when forwarding */ if (skb) { unsigned int flag = FLOW_DISSECTOR_F_STOP_AT_ENCAP; struct flow_keys keys; /* short-circuit if we already have L4 hash present */ if (skb->l4_hash) return skb_get_hash_raw(skb) >> 1; memset(&hash_keys, 0, sizeof(hash_keys)); if (!flkeys) { skb_flow_dissect_flow_keys(skb, &keys, flag); flkeys = &keys; } hash_keys.control.addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS; hash_keys.addrs.v4addrs.src = flkeys->addrs.v4addrs.src; hash_keys.addrs.v4addrs.dst = flkeys->addrs.v4addrs.dst; hash_keys.ports.src = flkeys->ports.src; hash_keys.ports.dst = flkeys->ports.dst; hash_keys.basic.ip_proto = flkeys->basic.ip_proto; } else { memset(&hash_keys, 0, sizeof(hash_keys)); hash_keys.control.addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS; hash_keys.addrs.v4addrs.src = fl4->saddr; hash_keys.addrs.v4addrs.dst = fl4->daddr; if (fl4->flowi4_flags & FLOWI_FLAG_ANY_SPORT) hash_keys.ports.src = (__force __be16)get_random_u16(); else hash_keys.ports.src = fl4->fl4_sport; hash_keys.ports.dst = fl4->fl4_dport; hash_keys.basic.ip_proto = fl4->flowi4_proto; } mhash = fib_multipath_hash_from_keys(net, &hash_keys); break; case 2: memset(&hash_keys, 0, sizeof(hash_keys)); /* skb is currently provided only when forwarding */ if (skb) { struct flow_keys keys; skb_flow_dissect_flow_keys(skb, &keys, 0); /* Inner can be v4 or v6 */ if (keys.control.addr_type == FLOW_DISSECTOR_KEY_IPV4_ADDRS) { hash_keys.control.addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS; hash_keys.addrs.v4addrs.src = keys.addrs.v4addrs.src; hash_keys.addrs.v4addrs.dst = keys.addrs.v4addrs.dst; } else if (keys.control.addr_type == FLOW_DISSECTOR_KEY_IPV6_ADDRS) { hash_keys.control.addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS; hash_keys.addrs.v6addrs.src = keys.addrs.v6addrs.src; hash_keys.addrs.v6addrs.dst = keys.addrs.v6addrs.dst; hash_keys.tags.flow_label = keys.tags.flow_label; hash_keys.basic.ip_proto = keys.basic.ip_proto; } else { /* Same as case 0 */ hash_keys.control.addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS; ip_multipath_l3_keys(skb, &hash_keys); } } else { /* Same as case 0 */ hash_keys.control.addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS; hash_keys.addrs.v4addrs.src = fl4->saddr; hash_keys.addrs.v4addrs.dst = fl4->daddr; } mhash = fib_multipath_hash_from_keys(net, &hash_keys); break; case 3: if (skb) mhash = fib_multipath_custom_hash_skb(net, skb); else mhash = fib_multipath_custom_hash_fl4(net, fl4); break; } if (multipath_hash) mhash = jhash_2words(mhash, multipath_hash, 0); return mhash >> 1; } #endif /* CONFIG_IP_ROUTE_MULTIPATH */ static enum skb_drop_reason ip_mkroute_input(struct sk_buff *skb, struct fib_result *res, struct in_device *in_dev, __be32 daddr, __be32 saddr, dscp_t dscp, struct flow_keys *hkeys) { #ifdef CONFIG_IP_ROUTE_MULTIPATH if (res->fi && fib_info_num_path(res->fi) > 1) { int h = fib_multipath_hash(res->fi->fib_net, NULL, skb, hkeys); fib_select_multipath(res, h, NULL); IPCB(skb)->flags |= IPSKB_MULTIPATH; } #endif /* create a routing cache entry */ return __mkroute_input(skb, res, in_dev, daddr, saddr, dscp); } /* Implements all the saddr-related checks as ip_route_input_slow(), * assuming daddr is valid and the destination is not a local broadcast one. * Uses the provided hint instead of performing a route lookup. */ enum skb_drop_reason ip_route_use_hint(struct sk_buff *skb, __be32 daddr, __be32 saddr, dscp_t dscp, struct net_device *dev, const struct sk_buff *hint) { enum skb_drop_reason reason = SKB_DROP_REASON_NOT_SPECIFIED; struct in_device *in_dev = __in_dev_get_rcu(dev); struct rtable *rt = skb_rtable(hint); struct net *net = dev_net(dev); u32 tag = 0; if (!in_dev) return reason; if (ipv4_is_multicast(saddr) || ipv4_is_lbcast(saddr)) { reason = SKB_DROP_REASON_IP_INVALID_SOURCE; goto martian_source; } if (ipv4_is_zeronet(saddr)) { reason = SKB_DROP_REASON_IP_INVALID_SOURCE; goto martian_source; } if (ipv4_is_loopback(saddr) && !IN_DEV_NET_ROUTE_LOCALNET(in_dev, net)) { reason = SKB_DROP_REASON_IP_LOCALNET; goto martian_source; } if (!(rt->rt_flags & RTCF_LOCAL)) goto skip_validate_source; reason = fib_validate_source_reason(skb, saddr, daddr, dscp, 0, dev, in_dev, &tag); if (reason) goto martian_source; skip_validate_source: skb_dst_copy(skb, hint); return SKB_NOT_DROPPED_YET; martian_source: ip_handle_martian_source(dev, in_dev, skb, daddr, saddr); return reason; } /* get device for dst_alloc with local routes */ static struct net_device *ip_rt_get_dev(struct net *net, const struct fib_result *res) { struct fib_nh_common *nhc = res->fi ? res->nhc : NULL; struct net_device *dev = NULL; if (nhc) dev = l3mdev_master_dev_rcu(nhc->nhc_dev); return dev ? : net->loopback_dev; } /* * NOTE. We drop all the packets that has local source * addresses, because every properly looped back packet * must have correct destination already attached by output routine. * Changes in the enforced policies must be applied also to * ip_route_use_hint(). * * Such approach solves two big problems: * 1. Not simplex devices are handled properly. * 2. IP spoofing attempts are filtered with 100% of guarantee. * called with rcu_read_lock() */ static enum skb_drop_reason ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr, dscp_t dscp, struct net_device *dev, struct fib_result *res) { enum skb_drop_reason reason = SKB_DROP_REASON_NOT_SPECIFIED; struct in_device *in_dev = __in_dev_get_rcu(dev); struct flow_keys *flkeys = NULL, _flkeys; struct net *net = dev_net(dev); struct ip_tunnel_info *tun_info; int err = -EINVAL; unsigned int flags = 0; u32 itag = 0; struct rtable *rth; struct flowi4 fl4; bool do_cache = true; /* IP on this device is disabled. */ if (!in_dev) goto out; /* Check for the most weird martians, which can be not detected * by fib_lookup. */ tun_info = skb_tunnel_info(skb); if (tun_info && !(tun_info->mode & IP_TUNNEL_INFO_TX)) fl4.flowi4_tun_key.tun_id = tun_info->key.tun_id; else fl4.flowi4_tun_key.tun_id = 0; skb_dst_drop(skb); if (ipv4_is_multicast(saddr) || ipv4_is_lbcast(saddr)) { reason = SKB_DROP_REASON_IP_INVALID_SOURCE; goto martian_source; } res->fi = NULL; res->table = NULL; if (ipv4_is_lbcast(daddr) || (saddr == 0 && daddr == 0)) goto brd_input; /* Accept zero addresses only to limited broadcast; * I even do not know to fix it or not. Waiting for complains :-) */ if (ipv4_is_zeronet(saddr)) { reason = SKB_DROP_REASON_IP_INVALID_SOURCE; goto martian_source; } if (ipv4_is_zeronet(daddr)) { reason = SKB_DROP_REASON_IP_INVALID_DEST; goto martian_destination; } /* Following code try to avoid calling IN_DEV_NET_ROUTE_LOCALNET(), * and call it once if daddr or/and saddr are loopback addresses */ if (ipv4_is_loopback(daddr)) { if (!IN_DEV_NET_ROUTE_LOCALNET(in_dev, net)) { reason = SKB_DROP_REASON_IP_LOCALNET; goto martian_destination; } } else if (ipv4_is_loopback(saddr)) { if (!IN_DEV_NET_ROUTE_LOCALNET(in_dev, net)) { reason = SKB_DROP_REASON_IP_LOCALNET; goto martian_source; } } /* * Now we are ready to route packet. */ fl4.flowi4_l3mdev = 0; fl4.flowi4_oif = 0; fl4.flowi4_iif = dev->ifindex; fl4.flowi4_mark = skb->mark; fl4.flowi4_dscp = dscp; fl4.flowi4_scope = RT_SCOPE_UNIVERSE; fl4.flowi4_flags = 0; fl4.daddr = daddr; fl4.saddr = saddr; fl4.flowi4_uid = sock_net_uid(net, NULL); fl4.flowi4_multipath_hash = 0; if (fib4_rules_early_flow_dissect(net, skb, &fl4, &_flkeys)) { flkeys = &_flkeys; } else { fl4.flowi4_proto = 0; fl4.fl4_sport = 0; fl4.fl4_dport = 0; } err = fib_lookup(net, &fl4, res, 0); if (err != 0) { if (!IN_DEV_FORWARD(in_dev)) err = -EHOSTUNREACH; goto no_route; } if (res->type == RTN_BROADCAST) { if (IN_DEV_BFORWARD(in_dev)) goto make_route; /* not do cache if bc_forwarding is enabled */ if (IPV4_DEVCONF_ALL_RO(net, BC_FORWARDING)) do_cache = false; goto brd_input; } err = -EINVAL; if (res->type == RTN_LOCAL) { reason = fib_validate_source_reason(skb, saddr, daddr, dscp, 0, dev, in_dev, &itag); if (reason) goto martian_source; goto local_input; } if (!IN_DEV_FORWARD(in_dev)) { err = -EHOSTUNREACH; goto no_route; } if (res->type != RTN_UNICAST) { reason = SKB_DROP_REASON_IP_INVALID_DEST; goto martian_destination; } make_route: reason = ip_mkroute_input(skb, res, in_dev, daddr, saddr, dscp, flkeys); out: return reason; brd_input: if (skb->protocol != htons(ETH_P_IP)) { reason = SKB_DROP_REASON_INVALID_PROTO; goto out; } if (!ipv4_is_zeronet(saddr)) { reason = fib_validate_source_reason(skb, saddr, 0, dscp, 0, dev, in_dev, &itag); if (reason) goto martian_source; } flags |= RTCF_BROADCAST; res->type = RTN_BROADCAST; RT_CACHE_STAT_INC(in_brd); local_input: if (IN_DEV_ORCONF(in_dev, NOPOLICY)) IPCB(skb)->flags |= IPSKB_NOPOLICY; do_cache &= res->fi && !itag; if (do_cache) { struct fib_nh_common *nhc = FIB_RES_NHC(*res); rth = rcu_dereference(nhc->nhc_rth_input); if (rt_cache_valid(rth)) { skb_dst_set_noref(skb, &rth->dst); reason = SKB_NOT_DROPPED_YET; goto out; } } rth = rt_dst_alloc(ip_rt_get_dev(net, res), flags | RTCF_LOCAL, res->type, false); if (!rth) goto e_nobufs; rth->dst.output= ip_rt_bug; #ifdef CONFIG_IP_ROUTE_CLASSID rth->dst.tclassid = itag; #endif rth->rt_is_input = 1; RT_CACHE_STAT_INC(in_slow_tot); if (res->type == RTN_UNREACHABLE) { rth->dst.input= ip_error; rth->dst.error= -err; rth->rt_flags &= ~RTCF_LOCAL; } if (do_cache) { struct fib_nh_common *nhc = FIB_RES_NHC(*res); rth->dst.lwtstate = lwtstate_get(nhc->nhc_lwtstate); if (lwtunnel_input_redirect(rth->dst.lwtstate)) { WARN_ON(rth->dst.input == lwtunnel_input); rth->dst.lwtstate->orig_input = rth->dst.input; rth->dst.input = lwtunnel_input; } if (unlikely(!rt_cache_route(nhc, rth))) rt_add_uncached_list(rth); } skb_dst_set(skb, &rth->dst); reason = SKB_NOT_DROPPED_YET; goto out; no_route: RT_CACHE_STAT_INC(in_no_route); res->type = RTN_UNREACHABLE; res->fi = NULL; res->table = NULL; goto local_input; /* * Do not cache martian addresses: they should be logged (RFC1812) */ martian_destination: RT_CACHE_STAT_INC(in_martian_dst); #ifdef CONFIG_IP_ROUTE_VERBOSE if (IN_DEV_LOG_MARTIANS(in_dev)) net_warn_ratelimited("martian destination %pI4 from %pI4, dev %s\n", &daddr, &saddr, dev->name); #endif goto out; e_nobufs: reason = SKB_DROP_REASON_NOMEM; goto out; martian_source: ip_handle_martian_source(dev, in_dev, skb, daddr, saddr); goto out; } /* called with rcu_read_lock held */ static enum skb_drop_reason ip_route_input_rcu(struct sk_buff *skb, __be32 daddr, __be32 saddr, dscp_t dscp, struct net_device *dev, struct fib_result *res) { /* Multicast recognition logic is moved from route cache to here. * The problem was that too many Ethernet cards have broken/missing * hardware multicast filters :-( As result the host on multicasting * network acquires a lot of useless route cache entries, sort of * SDR messages from all the world. Now we try to get rid of them. * Really, provided software IP multicast filter is organized * reasonably (at least, hashed), it does not result in a slowdown * comparing with route cache reject entries. * Note, that multicast routers are not affected, because * route cache entry is created eventually. */ if (ipv4_is_multicast(daddr)) { enum skb_drop_reason reason = SKB_DROP_REASON_NOT_SPECIFIED; struct in_device *in_dev = __in_dev_get_rcu(dev); int our = 0; if (!in_dev) return reason; our = ip_check_mc_rcu(in_dev, daddr, saddr, ip_hdr(skb)->protocol); /* check l3 master if no match yet */ if (!our && netif_is_l3_slave(dev)) { struct in_device *l3_in_dev; l3_in_dev = __in_dev_get_rcu(skb->dev); if (l3_in_dev) our = ip_check_mc_rcu(l3_in_dev, daddr, saddr, ip_hdr(skb)->protocol); } if (our #ifdef CONFIG_IP_MROUTE || (!ipv4_is_local_multicast(daddr) && IN_DEV_MFORWARD(in_dev)) #endif ) { reason = ip_route_input_mc(skb, daddr, saddr, dscp, dev, our); } return reason; } return ip_route_input_slow(skb, daddr, saddr, dscp, dev, res); } enum skb_drop_reason ip_route_input_noref(struct sk_buff *skb, __be32 daddr, __be32 saddr, dscp_t dscp, struct net_device *dev) { enum skb_drop_reason reason; struct fib_result res; rcu_read_lock(); reason = ip_route_input_rcu(skb, daddr, saddr, dscp, dev, &res); rcu_read_unlock(); return reason; } EXPORT_SYMBOL(ip_route_input_noref); /* called with rcu_read_lock() */ static struct rtable *__mkroute_output(const struct fib_result *res, const struct flowi4 *fl4, int orig_oif, struct net_device *dev_out, unsigned int flags) { struct fib_info *fi = res->fi; struct fib_nh_exception *fnhe; struct in_device *in_dev; u16 type = res->type; struct rtable *rth; bool do_cache; in_dev = __in_dev_get_rcu(dev_out); if (!in_dev) return ERR_PTR(-EINVAL); if (likely(!IN_DEV_ROUTE_LOCALNET(in_dev))) if (ipv4_is_loopback(fl4->saddr) && !(dev_out->flags & IFF_LOOPBACK) && !netif_is_l3_master(dev_out)) return ERR_PTR(-EINVAL); if (ipv4_is_lbcast(fl4->daddr)) { type = RTN_BROADCAST; /* reset fi to prevent gateway resolution */ fi = NULL; } else if (ipv4_is_multicast(fl4->daddr)) { type = RTN_MULTICAST; } else if (ipv4_is_zeronet(fl4->daddr)) { return ERR_PTR(-EINVAL); } if (dev_out->flags & IFF_LOOPBACK) flags |= RTCF_LOCAL; do_cache = true; if (type == RTN_BROADCAST) { flags |= RTCF_BROADCAST | RTCF_LOCAL; } else if (type == RTN_MULTICAST) { flags |= RTCF_MULTICAST | RTCF_LOCAL; if (!ip_check_mc_rcu(in_dev, fl4->daddr, fl4->saddr, fl4->flowi4_proto)) flags &= ~RTCF_LOCAL; else do_cache = false; /* If multicast route do not exist use * default one, but do not gateway in this case. * Yes, it is hack. */ if (fi && res->prefixlen < 4) fi = NULL; } else if ((type == RTN_LOCAL) && (orig_oif != 0) && (orig_oif != dev_out->ifindex)) { /* For local routes that require a particular output interface * we do not want to cache the result. Caching the result * causes incorrect behaviour when there are multiple source * addresses on the interface, the end result being that if the * intended recipient is waiting on that interface for the * packet he won't receive it because it will be delivered on * the loopback interface and the IP_PKTINFO ipi_ifindex will * be set to the loopback interface as well. */ do_cache = false; } fnhe = NULL; do_cache &= fi != NULL; if (fi) { struct fib_nh_common *nhc = FIB_RES_NHC(*res); struct rtable __rcu **prth; fnhe = find_exception(nhc, fl4->daddr); if (!do_cache) goto add; if (fnhe) { prth = &fnhe->fnhe_rth_output; } else { if (unlikely(fl4->flowi4_flags & FLOWI_FLAG_KNOWN_NH && !(nhc->nhc_gw_family && nhc->nhc_scope == RT_SCOPE_LINK))) { do_cache = false; goto add; } prth = raw_cpu_ptr(nhc->nhc_pcpu_rth_output); } rth = rcu_dereference(*prth); if (rt_cache_valid(rth) && dst_hold_safe(&rth->dst)) return rth; } add: rth = rt_dst_alloc(dev_out, flags, type, IN_DEV_ORCONF(in_dev, NOXFRM)); if (!rth) return ERR_PTR(-ENOBUFS); rth->rt_iif = orig_oif; RT_CACHE_STAT_INC(out_slow_tot); if (flags & (RTCF_BROADCAST | RTCF_MULTICAST)) { if (flags & RTCF_LOCAL && !(dev_out->flags & IFF_LOOPBACK)) { rth->dst.output = ip_mc_output; RT_CACHE_STAT_INC(out_slow_mc); } #ifdef CONFIG_IP_MROUTE if (type == RTN_MULTICAST) { if (IN_DEV_MFORWARD(in_dev) && !ipv4_is_local_multicast(fl4->daddr)) { rth->dst.input = ip_mr_input; rth->dst.output = ip_mr_output; } } #endif } rt_set_nexthop(rth, fl4->daddr, res, fnhe, fi, type, 0, do_cache); lwtunnel_set_redirect(&rth->dst); return rth; } /* * Major route resolver routine. */ struct rtable *ip_route_output_key_hash(struct net *net, struct flowi4 *fl4, const struct sk_buff *skb) { struct fib_result res = { .type = RTN_UNSPEC, .fi = NULL, .table = NULL, .tclassid = 0, }; struct rtable *rth; fl4->flowi4_iif = LOOPBACK_IFINDEX; rcu_read_lock(); rth = ip_route_output_key_hash_rcu(net, fl4, &res, skb); rcu_read_unlock(); return rth; } EXPORT_SYMBOL_GPL(ip_route_output_key_hash); struct rtable *ip_route_output_key_hash_rcu(struct net *net, struct flowi4 *fl4, struct fib_result *res, const struct sk_buff *skb) { struct net_device *dev_out = NULL; int orig_oif = fl4->flowi4_oif; unsigned int flags = 0; struct rtable *rth; int err; if (fl4->saddr) { if (ipv4_is_multicast(fl4->saddr) || ipv4_is_lbcast(fl4->saddr)) { rth = ERR_PTR(-EINVAL); goto out; } rth = ERR_PTR(-ENETUNREACH); /* I removed check for oif == dev_out->oif here. * It was wrong for two reasons: * 1. ip_dev_find(net, saddr) can return wrong iface, if saddr * is assigned to multiple interfaces. * 2. Moreover, we are allowed to send packets with saddr * of another iface. --ANK */ if (fl4->flowi4_oif == 0 && (ipv4_is_multicast(fl4->daddr) || ipv4_is_lbcast(fl4->daddr))) { /* It is equivalent to inet_addr_type(saddr) == RTN_LOCAL */ dev_out = __ip_dev_find(net, fl4->saddr, false); if (!dev_out) goto out; /* Special hack: user can direct multicasts * and limited broadcast via necessary interface * without fiddling with IP_MULTICAST_IF or IP_PKTINFO. * This hack is not just for fun, it allows * vic,vat and friends to work. * They bind socket to loopback, set ttl to zero * and expect that it will work. * From the viewpoint of routing cache they are broken, * because we are not allowed to build multicast path * with loopback source addr (look, routing cache * cannot know, that ttl is zero, so that packet * will not leave this host and route is valid). * Luckily, this hack is good workaround. */ fl4->flowi4_oif = dev_out->ifindex; goto make_route; } if (!(fl4->flowi4_flags & FLOWI_FLAG_ANYSRC)) { /* It is equivalent to inet_addr_type(saddr) == RTN_LOCAL */ if (!__ip_dev_find(net, fl4->saddr, false)) goto out; } } if (fl4->flowi4_oif) { dev_out = dev_get_by_index_rcu(net, fl4->flowi4_oif); rth = ERR_PTR(-ENODEV); if (!dev_out) goto out; /* RACE: Check return value of inet_select_addr instead. */ if (!(dev_out->flags & IFF_UP) || !__in_dev_get_rcu(dev_out)) { rth = ERR_PTR(-ENETUNREACH); goto out; } if (ipv4_is_local_multicast(fl4->daddr) || ipv4_is_lbcast(fl4->daddr) || fl4->flowi4_proto == IPPROTO_IGMP) { if (!fl4->saddr) fl4->saddr = inet_select_addr(dev_out, 0, RT_SCOPE_LINK); goto make_route; } if (!fl4->saddr) { if (ipv4_is_multicast(fl4->daddr)) fl4->saddr = inet_select_addr(dev_out, 0, fl4->flowi4_scope); else if (!fl4->daddr) fl4->saddr = inet_select_addr(dev_out, 0, RT_SCOPE_HOST); } } if (!fl4->daddr) { fl4->daddr = fl4->saddr; if (!fl4->daddr) fl4->daddr = fl4->saddr = htonl(INADDR_LOOPBACK); dev_out = net->loopback_dev; fl4->flowi4_oif = LOOPBACK_IFINDEX; res->type = RTN_LOCAL; flags |= RTCF_LOCAL; goto make_route; } err = fib_lookup(net, fl4, res, 0); if (err) { res->fi = NULL; res->table = NULL; if (fl4->flowi4_oif && (ipv4_is_multicast(fl4->daddr) || !fl4->flowi4_l3mdev)) { /* Apparently, routing tables are wrong. Assume, * that the destination is on link. * * WHY? DW. * Because we are allowed to send to iface * even if it has NO routes and NO assigned * addresses. When oif is specified, routing * tables are looked up with only one purpose: * to catch if destination is gatewayed, rather than * direct. Moreover, if MSG_DONTROUTE is set, * we send packet, ignoring both routing tables * and ifaddr state. --ANK * * * We could make it even if oif is unknown, * likely IPv6, but we do not. */ if (fl4->saddr == 0) fl4->saddr = inet_select_addr(dev_out, 0, RT_SCOPE_LINK); res->type = RTN_UNICAST; goto make_route; } rth = ERR_PTR(err); goto out; } if (res->type == RTN_LOCAL) { if (!fl4->saddr) { if (res->fi->fib_prefsrc) fl4->saddr = res->fi->fib_prefsrc; else fl4->saddr = fl4->daddr; } /* L3 master device is the loopback for that domain */ dev_out = l3mdev_master_dev_rcu(FIB_RES_DEV(*res)) ? : net->loopback_dev; /* make sure orig_oif points to fib result device even * though packet rx/tx happens over loopback or l3mdev */ orig_oif = FIB_RES_OIF(*res); fl4->flowi4_oif = dev_out->ifindex; flags |= RTCF_LOCAL; goto make_route; } fib_select_path(net, res, fl4, skb); dev_out = FIB_RES_DEV(*res); make_route: rth = __mkroute_output(res, fl4, orig_oif, dev_out, flags); out: return rth; } static struct dst_ops ipv4_dst_blackhole_ops = { .family = AF_INET, .default_advmss = ipv4_default_advmss, .neigh_lookup = ipv4_neigh_lookup, .check = dst_blackhole_check, .cow_metrics = dst_blackhole_cow_metrics, .update_pmtu = dst_blackhole_update_pmtu, .redirect = dst_blackhole_redirect, .mtu = dst_blackhole_mtu, }; struct dst_entry *ipv4_blackhole_route(struct net *net, struct dst_entry *dst_orig) { struct rtable *ort = dst_rtable(dst_orig); struct rtable *rt; rt = dst_alloc(&ipv4_dst_blackhole_ops, NULL, DST_OBSOLETE_DEAD, 0); if (rt) { struct dst_entry *new = &rt->dst; new->__use = 1; new->input = dst_discard; new->output = dst_discard_out; new->dev = net->loopback_dev; netdev_hold(new->dev, &new->dev_tracker, GFP_ATOMIC); rt->rt_is_input = ort->rt_is_input; rt->rt_iif = ort->rt_iif; rt->rt_pmtu = ort->rt_pmtu; rt->rt_mtu_locked = ort->rt_mtu_locked; rt->rt_genid = rt_genid_ipv4(net); rt->rt_flags = ort->rt_flags; rt->rt_type = ort->rt_type; rt->rt_uses_gateway = ort->rt_uses_gateway; rt->rt_gw_family = ort->rt_gw_family; if (rt->rt_gw_family == AF_INET) rt->rt_gw4 = ort->rt_gw4; else if (rt->rt_gw_family == AF_INET6) rt->rt_gw6 = ort->rt_gw6; } dst_release(dst_orig); return rt ? &rt->dst : ERR_PTR(-ENOMEM); } struct rtable *ip_route_output_flow(struct net *net, struct flowi4 *flp4, const struct sock *sk) { struct rtable *rt = __ip_route_output_key(net, flp4); if (IS_ERR(rt)) return rt; if (flp4->flowi4_proto) { flp4->flowi4_oif = rt->dst.dev->ifindex; rt = dst_rtable(xfrm_lookup_route(net, &rt->dst, flowi4_to_flowi(flp4), sk, 0)); } return rt; } EXPORT_SYMBOL_GPL(ip_route_output_flow); /* called with rcu_read_lock held */ static int rt_fill_info(struct net *net, __be32 dst, __be32 src, struct rtable *rt, u32 table_id, dscp_t dscp, struct flowi4 *fl4, struct sk_buff *skb, u32 portid, u32 seq, unsigned int flags) { struct rtmsg *r; struct nlmsghdr *nlh; unsigned long expires = 0; u32 error; u32 metrics[RTAX_MAX]; nlh = nlmsg_put(skb, portid, seq, RTM_NEWROUTE, sizeof(*r), flags); if (!nlh) return -EMSGSIZE; r = nlmsg_data(nlh); r->rtm_family = AF_INET; r->rtm_dst_len = 32; r->rtm_src_len = 0; r->rtm_tos = inet_dscp_to_dsfield(dscp); r->rtm_table = table_id < 256 ? table_id : RT_TABLE_COMPAT; if (nla_put_u32(skb, RTA_TABLE, table_id)) goto nla_put_failure; r->rtm_type = rt->rt_type; r->rtm_scope = RT_SCOPE_UNIVERSE; r->rtm_protocol = RTPROT_UNSPEC; r->rtm_flags = (rt->rt_flags & ~0xFFFF) | RTM_F_CLONED; if (rt->rt_flags & RTCF_NOTIFY) r->rtm_flags |= RTM_F_NOTIFY; if (IPCB(skb)->flags & IPSKB_DOREDIRECT) r->rtm_flags |= RTCF_DOREDIRECT; if (nla_put_in_addr(skb, RTA_DST, dst)) goto nla_put_failure; if (src) { r->rtm_src_len = 32; if (nla_put_in_addr(skb, RTA_SRC, src)) goto nla_put_failure; } if (rt->dst.dev && nla_put_u32(skb, RTA_OIF, rt->dst.dev->ifindex)) goto nla_put_failure; if (lwtunnel_fill_encap(skb, rt->dst.lwtstate, RTA_ENCAP, RTA_ENCAP_TYPE) < 0) goto nla_put_failure; #ifdef CONFIG_IP_ROUTE_CLASSID if (rt->dst.tclassid && nla_put_u32(skb, RTA_FLOW, rt->dst.tclassid)) goto nla_put_failure; #endif if (fl4 && !rt_is_input_route(rt) && fl4->saddr != src) { if (nla_put_in_addr(skb, RTA_PREFSRC, fl4->saddr)) goto nla_put_failure; } if (rt->rt_uses_gateway) { if (rt->rt_gw_family == AF_INET && nla_put_in_addr(skb, RTA_GATEWAY, rt->rt_gw4)) { goto nla_put_failure; } else if (rt->rt_gw_family == AF_INET6) { int alen = sizeof(struct in6_addr); struct nlattr *nla; struct rtvia *via; nla = nla_reserve(skb, RTA_VIA, alen + 2); if (!nla) goto nla_put_failure; via = nla_data(nla); via->rtvia_family = AF_INET6; memcpy(via->rtvia_addr, &rt->rt_gw6, alen); } } expires = READ_ONCE(rt->dst.expires); if (expires) { unsigned long now = jiffies; if (time_before(now, expires)) expires -= now; else expires = 0; } memcpy(metrics, dst_metrics_ptr(&rt->dst), sizeof(metrics)); if (rt->rt_pmtu && expires) metrics[RTAX_MTU - 1] = rt->rt_pmtu; if (rt->rt_mtu_locked && expires) metrics[RTAX_LOCK - 1] |= BIT(RTAX_MTU); if (rtnetlink_put_metrics(skb, metrics) < 0) goto nla_put_failure; if (fl4) { if (fl4->flowi4_mark && nla_put_u32(skb, RTA_MARK, fl4->flowi4_mark)) goto nla_put_failure; if (!uid_eq(fl4->flowi4_uid, INVALID_UID) && nla_put_u32(skb, RTA_UID, from_kuid_munged(current_user_ns(), fl4->flowi4_uid))) goto nla_put_failure; if (rt_is_input_route(rt)) { #ifdef CONFIG_IP_MROUTE if (ipv4_is_multicast(dst) && !ipv4_is_local_multicast(dst) && IPV4_DEVCONF_ALL_RO(net, MC_FORWARDING)) { int err = ipmr_get_route(net, skb, fl4->saddr, fl4->daddr, r, portid); if (err <= 0) { if (err == 0) return 0; goto nla_put_failure; } } else #endif if (nla_put_u32(skb, RTA_IIF, fl4->flowi4_iif)) goto nla_put_failure; } } error = rt->dst.error; if (rtnl_put_cacheinfo(skb, &rt->dst, 0, expires, error) < 0) goto nla_put_failure; nlmsg_end(skb, nlh); return 0; nla_put_failure: nlmsg_cancel(skb, nlh); return -EMSGSIZE; } static int fnhe_dump_bucket(struct net *net, struct sk_buff *skb, struct netlink_callback *cb, u32 table_id, struct fnhe_hash_bucket *bucket, int genid, int *fa_index, int fa_start, unsigned int flags) { int i; for (i = 0; i < FNHE_HASH_SIZE; i++) { struct fib_nh_exception *fnhe; for (fnhe = rcu_dereference(bucket[i].chain); fnhe; fnhe = rcu_dereference(fnhe->fnhe_next)) { struct rtable *rt; int err; if (*fa_index < fa_start) goto next; if (fnhe->fnhe_genid != genid) goto next; if (fnhe->fnhe_expires && time_after(jiffies, fnhe->fnhe_expires)) goto next; rt = rcu_dereference(fnhe->fnhe_rth_input); if (!rt) rt = rcu_dereference(fnhe->fnhe_rth_output); if (!rt) goto next; err = rt_fill_info(net, fnhe->fnhe_daddr, 0, rt, table_id, 0, NULL, skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, flags); if (err) return err; next: (*fa_index)++; } } return 0; } int fib_dump_info_fnhe(struct sk_buff *skb, struct netlink_callback *cb, u32 table_id, struct fib_info *fi, int *fa_index, int fa_start, unsigned int flags) { struct net *net = sock_net(cb->skb->sk); int nhsel, genid = fnhe_genid(net); for (nhsel = 0; nhsel < fib_info_num_path(fi); nhsel++) { struct fib_nh_common *nhc = fib_info_nhc(fi, nhsel); struct fnhe_hash_bucket *bucket; int err; if (nhc->nhc_flags & RTNH_F_DEAD) continue; rcu_read_lock(); bucket = rcu_dereference(nhc->nhc_exceptions); err = 0; if (bucket) err = fnhe_dump_bucket(net, skb, cb, table_id, bucket, genid, fa_index, fa_start, flags); rcu_read_unlock(); if (err) return err; } return 0; } static struct sk_buff *inet_rtm_getroute_build_skb(__be32 src, __be32 dst, u8 ip_proto, __be16 sport, __be16 dport) { struct sk_buff *skb; struct iphdr *iph; skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL); if (!skb) return NULL; /* Reserve room for dummy headers, this skb can pass * through good chunk of routing engine. */ skb_reset_mac_header(skb); skb_reset_network_header(skb); skb->protocol = htons(ETH_P_IP); iph = skb_put(skb, sizeof(struct iphdr)); iph->protocol = ip_proto; iph->saddr = src; iph->daddr = dst; iph->version = 0x4; iph->frag_off = 0; iph->ihl = 0x5; skb_set_transport_header(skb, skb->len); switch (iph->protocol) { case IPPROTO_UDP: { struct udphdr *udph; udph = skb_put_zero(skb, sizeof(struct udphdr)); udph->source = sport; udph->dest = dport; udph->len = htons(sizeof(struct udphdr)); udph->check = 0; break; } case IPPROTO_TCP: { struct tcphdr *tcph; tcph = skb_put_zero(skb, sizeof(struct tcphdr)); tcph->source = sport; tcph->dest = dport; tcph->doff = sizeof(struct tcphdr) / 4; tcph->rst = 1; tcph->check = ~tcp_v4_check(sizeof(struct tcphdr), src, dst, 0); break; } case IPPROTO_ICMP: { struct icmphdr *icmph; icmph = skb_put_zero(skb, sizeof(struct icmphdr)); icmph->type = ICMP_ECHO; icmph->code = 0; } } return skb; } static int inet_rtm_valid_getroute_req(struct sk_buff *skb, const struct nlmsghdr *nlh, struct nlattr **tb, struct netlink_ext_ack *extack) { struct rtmsg *rtm; int i, err; rtm = nlmsg_payload(nlh, sizeof(*rtm)); if (!rtm) { NL_SET_ERR_MSG(extack, "ipv4: Invalid header for route get request"); return -EINVAL; } if (!netlink_strict_get_check(skb)) return nlmsg_parse_deprecated(nlh, sizeof(*rtm), tb, RTA_MAX, rtm_ipv4_policy, extack); if ((rtm->rtm_src_len && rtm->rtm_src_len != 32) || (rtm->rtm_dst_len && rtm->rtm_dst_len != 32) || rtm->rtm_table || rtm->rtm_protocol || rtm->rtm_scope || rtm->rtm_type) { NL_SET_ERR_MSG(extack, "ipv4: Invalid values in header for route get request"); return -EINVAL; } if (rtm->rtm_flags & ~(RTM_F_NOTIFY | RTM_F_LOOKUP_TABLE | RTM_F_FIB_MATCH)) { NL_SET_ERR_MSG(extack, "ipv4: Unsupported rtm_flags for route get request"); return -EINVAL; } err = nlmsg_parse_deprecated_strict(nlh, sizeof(*rtm), tb, RTA_MAX, rtm_ipv4_policy, extack); if (err) return err; if ((tb[RTA_SRC] && !rtm->rtm_src_len) || (tb[RTA_DST] && !rtm->rtm_dst_len)) { NL_SET_ERR_MSG(extack, "ipv4: rtm_src_len and rtm_dst_len must be 32 for IPv4"); return -EINVAL; } for (i = 0; i <= RTA_MAX; i++) { if (!tb[i]) continue; switch (i) { case RTA_IIF: case RTA_OIF: case RTA_SRC: case RTA_DST: case RTA_IP_PROTO: case RTA_SPORT: case RTA_DPORT: case RTA_MARK: case RTA_UID: break; default: NL_SET_ERR_MSG(extack, "ipv4: Unsupported attribute in route get request"); return -EINVAL; } } return 0; } static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh, struct netlink_ext_ack *extack) { struct net *net = sock_net(in_skb->sk); struct nlattr *tb[RTA_MAX+1]; u32 table_id = RT_TABLE_MAIN; __be16 sport = 0, dport = 0; struct fib_result res = {}; u8 ip_proto = IPPROTO_UDP; struct rtable *rt = NULL; struct sk_buff *skb; struct rtmsg *rtm; struct flowi4 fl4 = {}; __be32 dst = 0; __be32 src = 0; dscp_t dscp; kuid_t uid; u32 iif; int err; int mark; err = inet_rtm_valid_getroute_req(in_skb, nlh, tb, extack); if (err < 0) return err; rtm = nlmsg_data(nlh); src = nla_get_in_addr_default(tb[RTA_SRC], 0); dst = nla_get_in_addr_default(tb[RTA_DST], 0); iif = nla_get_u32_default(tb[RTA_IIF], 0); mark = nla_get_u32_default(tb[RTA_MARK], 0); dscp = inet_dsfield_to_dscp(rtm->rtm_tos); if (tb[RTA_UID]) uid = make_kuid(current_user_ns(), nla_get_u32(tb[RTA_UID])); else uid = (iif ? INVALID_UID : current_uid()); if (tb[RTA_IP_PROTO]) { err = rtm_getroute_parse_ip_proto(tb[RTA_IP_PROTO], &ip_proto, AF_INET, extack); if (err) return err; } if (tb[RTA_SPORT]) sport = nla_get_be16(tb[RTA_SPORT]); if (tb[RTA_DPORT]) dport = nla_get_be16(tb[RTA_DPORT]); skb = inet_rtm_getroute_build_skb(src, dst, ip_proto, sport, dport); if (!skb) return -ENOBUFS; fl4.daddr = dst; fl4.saddr = src; fl4.flowi4_dscp = dscp; fl4.flowi4_oif = nla_get_u32_default(tb[RTA_OIF], 0); fl4.flowi4_mark = mark; fl4.flowi4_uid = uid; if (sport) fl4.fl4_sport = sport; if (dport) fl4.fl4_dport = dport; fl4.flowi4_proto = ip_proto; rcu_read_lock(); if (iif) { struct net_device *dev; dev = dev_get_by_index_rcu(net, iif); if (!dev) { err = -ENODEV; goto errout_rcu; } fl4.flowi4_iif = iif; /* for rt_fill_info */ skb->dev = dev; skb->mark = mark; err = ip_route_input_rcu(skb, dst, src, dscp, dev, &res) ? -EINVAL : 0; rt = skb_rtable(skb); if (err == 0 && rt->dst.error) err = -rt->dst.error; } else { fl4.flowi4_iif = LOOPBACK_IFINDEX; skb->dev = net->loopback_dev; rt = ip_route_output_key_hash_rcu(net, &fl4, &res, skb); err = 0; if (IS_ERR(rt)) err = PTR_ERR(rt); else skb_dst_set(skb, &rt->dst); } if (err) goto errout_rcu; if (rtm->rtm_flags & RTM_F_NOTIFY) rt->rt_flags |= RTCF_NOTIFY; if (rtm->rtm_flags & RTM_F_LOOKUP_TABLE) table_id = res.table ? res.table->tb_id : 0; /* reset skb for netlink reply msg */ skb_trim(skb, 0); skb_reset_network_header(skb); skb_reset_transport_header(skb); skb_reset_mac_header(skb); if (rtm->rtm_flags & RTM_F_FIB_MATCH) { struct fib_rt_info fri; if (!res.fi) { err = fib_props[res.type].error; if (!err) err = -EHOSTUNREACH; goto errout_rcu; } fri.fi = res.fi; fri.tb_id = table_id; fri.dst = res.prefix; fri.dst_len = res.prefixlen; fri.dscp = res.dscp; fri.type = rt->rt_type; fri.offload = 0; fri.trap = 0; fri.offload_failed = 0; if (res.fa_head) { struct fib_alias *fa; hlist_for_each_entry_rcu(fa, res.fa_head, fa_list) { u8 slen = 32 - fri.dst_len; if (fa->fa_slen == slen && fa->tb_id == fri.tb_id && fa->fa_dscp == fri.dscp && fa->fa_info == res.fi && fa->fa_type == fri.type) { fri.offload = READ_ONCE(fa->offload); fri.trap = READ_ONCE(fa->trap); fri.offload_failed = READ_ONCE(fa->offload_failed); break; } } } err = fib_dump_info(skb, NETLINK_CB(in_skb).portid, nlh->nlmsg_seq, RTM_NEWROUTE, &fri, 0); } else { err = rt_fill_info(net, dst, src, rt, table_id, res.dscp, &fl4, skb, NETLINK_CB(in_skb).portid, nlh->nlmsg_seq, 0); } if (err < 0) goto errout_rcu; rcu_read_unlock(); err = rtnl_unicast(skb, net, NETLINK_CB(in_skb).portid); errout_free: return err; errout_rcu: rcu_read_unlock(); kfree_skb(skb); goto errout_free; } void ip_rt_multicast_event(struct in_device *in_dev) { rt_cache_flush(dev_net(in_dev->dev)); } #ifdef CONFIG_SYSCTL static int ip_rt_gc_interval __read_mostly = 60 * HZ; static int ip_rt_gc_min_interval __read_mostly = HZ / 2; static int ip_rt_gc_elasticity __read_mostly = 8; static int ip_min_valid_pmtu __read_mostly = IPV4_MIN_MTU; static int ipv4_sysctl_rtcache_flush(const struct ctl_table *__ctl, int write, void *buffer, size_t *lenp, loff_t *ppos) { struct net *net = (struct net *)__ctl->extra1; if (write) { rt_cache_flush(net); fnhe_genid_bump(net); return 0; } return -EINVAL; } static struct ctl_table ipv4_route_table[] = { { .procname = "gc_thresh", .data = &ipv4_dst_ops.gc_thresh, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "max_size", .data = &ip_rt_max_size, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { /* Deprecated. Use gc_min_interval_ms */ .procname = "gc_min_interval", .data = &ip_rt_gc_min_interval, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_jiffies, }, { .procname = "gc_min_interval_ms", .data = &ip_rt_gc_min_interval, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_ms_jiffies, }, { .procname = "gc_timeout", .data = &ip_rt_gc_timeout, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_jiffies, }, { .procname = "gc_interval", .data = &ip_rt_gc_interval, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_jiffies, }, { .procname = "redirect_load", .data = &ip_rt_redirect_load, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "redirect_number", .data = &ip_rt_redirect_number, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "redirect_silence", .data = &ip_rt_redirect_silence, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "error_cost", .data = &ip_rt_error_cost, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "error_burst", .data = &ip_rt_error_burst, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, { .procname = "gc_elasticity", .data = &ip_rt_gc_elasticity, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, }; static const char ipv4_route_flush_procname[] = "flush"; static struct ctl_table ipv4_route_netns_table[] = { { .procname = ipv4_route_flush_procname, .maxlen = sizeof(int), .mode = 0200, .proc_handler = ipv4_sysctl_rtcache_flush, }, { .procname = "min_pmtu", .data = &init_net.ipv4.ip_rt_min_pmtu, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = &ip_min_valid_pmtu, }, { .procname = "mtu_expires", .data = &init_net.ipv4.ip_rt_mtu_expires, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_jiffies, }, { .procname = "min_adv_mss", .data = &init_net.ipv4.ip_rt_min_advmss, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, }; static __net_init int sysctl_route_net_init(struct net *net) { struct ctl_table *tbl; size_t table_size = ARRAY_SIZE(ipv4_route_netns_table); tbl = ipv4_route_netns_table; if (!net_eq(net, &init_net)) { int i; tbl = kmemdup(tbl, sizeof(ipv4_route_netns_table), GFP_KERNEL); if (!tbl) goto err_dup; /* Don't export non-whitelisted sysctls to unprivileged users */ if (net->user_ns != &init_user_ns) { if (tbl[0].procname != ipv4_route_flush_procname) table_size = 0; } /* Update the variables to point into the current struct net * except for the first element flush */ for (i = 1; i < table_size; i++) tbl[i].data += (void *)net - (void *)&init_net; } tbl[0].extra1 = net; net->ipv4.route_hdr = register_net_sysctl_sz(net, "net/ipv4/route", tbl, table_size); if (!net->ipv4.route_hdr) goto err_reg; return 0; err_reg: if (tbl != ipv4_route_netns_table) kfree(tbl); err_dup: return -ENOMEM; } static __net_exit void sysctl_route_net_exit(struct net *net) { const struct ctl_table *tbl; tbl = net->ipv4.route_hdr->ctl_table_arg; unregister_net_sysctl_table(net->ipv4.route_hdr); BUG_ON(tbl == ipv4_route_netns_table); kfree(tbl); } static __net_initdata struct pernet_operations sysctl_route_ops = { .init = sysctl_route_net_init, .exit = sysctl_route_net_exit, }; #endif static __net_init int netns_ip_rt_init(struct net *net) { /* Set default value for namespaceified sysctls */ net->ipv4.ip_rt_min_pmtu = DEFAULT_MIN_PMTU; net->ipv4.ip_rt_mtu_expires = DEFAULT_MTU_EXPIRES; net->ipv4.ip_rt_min_advmss = DEFAULT_MIN_ADVMSS; return 0; } static struct pernet_operations __net_initdata ip_rt_ops = { .init = netns_ip_rt_init, }; static __net_init int rt_genid_init(struct net *net) { atomic_set(&net->ipv4.rt_genid, 0); atomic_set(&net->fnhe_genid, 0); atomic_set(&net->ipv4.dev_addr_genid, get_random_u32()); return 0; } static __net_initdata struct pernet_operations rt_genid_ops = { .init = rt_genid_init, }; static int __net_init ipv4_inetpeer_init(struct net *net) { struct inet_peer_base *bp = kmalloc(sizeof(*bp), GFP_KERNEL); if (!bp) return -ENOMEM; inet_peer_base_init(bp); net->ipv4.peers = bp; return 0; } static void __net_exit ipv4_inetpeer_exit(struct net *net) { struct inet_peer_base *bp = net->ipv4.peers; net->ipv4.peers = NULL; inetpeer_invalidate_tree(bp); kfree(bp); } static __net_initdata struct pernet_operations ipv4_inetpeer_ops = { .init = ipv4_inetpeer_init, .exit = ipv4_inetpeer_exit, }; #ifdef CONFIG_IP_ROUTE_CLASSID struct ip_rt_acct __percpu *ip_rt_acct __read_mostly; #endif /* CONFIG_IP_ROUTE_CLASSID */ static const struct rtnl_msg_handler ip_rt_rtnl_msg_handlers[] __initconst = { {.protocol = PF_INET, .msgtype = RTM_GETROUTE, .doit = inet_rtm_getroute, .flags = RTNL_FLAG_DOIT_UNLOCKED}, }; int __init ip_rt_init(void) { void *idents_hash; int cpu; /* For modern hosts, this will use 2 MB of memory */ idents_hash = alloc_large_system_hash("IP idents", sizeof(*ip_idents) + sizeof(*ip_tstamps), 0, 16, /* one bucket per 64 KB */ HASH_ZERO, NULL, &ip_idents_mask, 2048, 256*1024); ip_idents = idents_hash; get_random_bytes(ip_idents, (ip_idents_mask + 1) * sizeof(*ip_idents)); ip_tstamps = idents_hash + (ip_idents_mask + 1) * sizeof(*ip_idents); for_each_possible_cpu(cpu) { struct uncached_list *ul = &per_cpu(rt_uncached_list, cpu); INIT_LIST_HEAD(&ul->head); spin_lock_init(&ul->lock); } #ifdef CONFIG_IP_ROUTE_CLASSID ip_rt_acct = __alloc_percpu(256 * sizeof(struct ip_rt_acct), __alignof__(struct ip_rt_acct)); if (!ip_rt_acct) panic("IP: failed to allocate ip_rt_acct\n"); #endif ipv4_dst_ops.kmem_cachep = KMEM_CACHE(rtable, SLAB_HWCACHE_ALIGN | SLAB_PANIC); ipv4_dst_blackhole_ops.kmem_cachep = ipv4_dst_ops.kmem_cachep; if (dst_entries_init(&ipv4_dst_ops) < 0) panic("IP: failed to allocate ipv4_dst_ops counter\n"); if (dst_entries_init(&ipv4_dst_blackhole_ops) < 0) panic("IP: failed to allocate ipv4_dst_blackhole_ops counter\n"); ipv4_dst_ops.gc_thresh = ~0; ip_rt_max_size = INT_MAX; devinet_init(); ip_fib_init(); if (ip_rt_proc_init()) pr_err("Unable to create route proc files\n"); #ifdef CONFIG_XFRM xfrm_init(); xfrm4_init(); #endif rtnl_register_many(ip_rt_rtnl_msg_handlers); #ifdef CONFIG_SYSCTL register_pernet_subsys(&sysctl_route_ops); #endif register_pernet_subsys(&ip_rt_ops); register_pernet_subsys(&rt_genid_ops); register_pernet_subsys(&ipv4_inetpeer_ops); return 0; } #ifdef CONFIG_SYSCTL /* * We really need to sanitize the damn ipv4 init order, then all * this nonsense will go away. */ void __init ip_static_sysctl_init(void) { register_net_sysctl(&init_net, "net/ipv4/route", ipv4_route_table); } #endif |
| 68 6 4 58 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 | // SPDX-License-Identifier: GPL-2.0-or-later /* * IP Payload Compression Protocol (IPComp) - RFC3173. * * Copyright (c) 2003 James Morris <jmorris@intercode.com.au> * Copyright (c) 2003-2025 Herbert Xu <herbert@gondor.apana.org.au> * * Todo: * - Tunable compression parameters. * - Compression stats. * - Adaptive compression. */ #include <crypto/acompress.h> #include <linux/err.h> #include <linux/module.h> #include <linux/skbuff_ref.h> #include <linux/slab.h> #include <net/ipcomp.h> #include <net/xfrm.h> #define IPCOMP_SCRATCH_SIZE 65400 struct ipcomp_skb_cb { struct xfrm_skb_cb xfrm; struct acomp_req *req; }; struct ipcomp_data { u16 threshold; struct crypto_acomp *tfm; }; struct ipcomp_req_extra { struct xfrm_state *x; struct scatterlist sg[]; }; static inline struct ipcomp_skb_cb *ipcomp_cb(struct sk_buff *skb) { struct ipcomp_skb_cb *cb = (void *)skb->cb; BUILD_BUG_ON(sizeof(*cb) > sizeof(skb->cb)); return cb; } static int ipcomp_post_acomp(struct sk_buff *skb, int err, int hlen) { struct acomp_req *req = ipcomp_cb(skb)->req; struct ipcomp_req_extra *extra; struct scatterlist *dsg; int len, dlen; if (unlikely(err)) goto out_free_req; extra = acomp_request_extra(req); dsg = extra->sg; dlen = req->dlen; pskb_trim_unique(skb, 0); __skb_put(skb, hlen); /* Only update truesize on input. */ if (!hlen) skb->truesize += dlen; skb->data_len = dlen; skb->len += dlen; do { skb_frag_t *frag; struct page *page; frag = skb_shinfo(skb)->frags + skb_shinfo(skb)->nr_frags; page = sg_page(dsg); dsg = sg_next(dsg); len = PAGE_SIZE; if (dlen < len) len = dlen; skb_frag_fill_page_desc(frag, page, 0, len); skb_shinfo(skb)->nr_frags++; } while ((dlen -= len)); for (; dsg; dsg = sg_next(dsg)) __free_page(sg_page(dsg)); out_free_req: acomp_request_free(req); return err; } static int ipcomp_input_done2(struct sk_buff *skb, int err) { struct ip_comp_hdr *ipch = ip_comp_hdr(skb); const int plen = skb->len; skb->transport_header = skb->network_header + sizeof(*ipch); return ipcomp_post_acomp(skb, err, 0) ?: skb->len < (plen + sizeof(ip_comp_hdr)) ? -EINVAL : ipch->nexthdr; } static void ipcomp_input_done(void *data, int err) { struct sk_buff *skb = data; xfrm_input_resume(skb, ipcomp_input_done2(skb, err)); } static struct acomp_req *ipcomp_setup_req(struct xfrm_state *x, struct sk_buff *skb, int minhead, int dlen) { const int dnfrags = min(MAX_SKB_FRAGS, 16); struct ipcomp_data *ipcd = x->data; struct ipcomp_req_extra *extra; struct scatterlist *sg, *dsg; const int plen = skb->len; struct crypto_acomp *tfm; struct acomp_req *req; int nfrags; int total; int err; int i; ipcomp_cb(skb)->req = NULL; do { struct sk_buff *trailer; if (skb->len > PAGE_SIZE) { if (skb_linearize_cow(skb)) return ERR_PTR(-ENOMEM); nfrags = 1; break; } if (!skb_cloned(skb) && skb_headlen(skb) >= minhead) { if (!skb_is_nonlinear(skb)) { nfrags = 1; break; } else if (!skb_has_frag_list(skb)) { nfrags = skb_shinfo(skb)->nr_frags; nfrags++; break; } } nfrags = skb_cow_data(skb, skb_headlen(skb) < minhead ? minhead - skb_headlen(skb) : 0, &trailer); if (nfrags < 0) return ERR_PTR(nfrags); } while (0); tfm = ipcd->tfm; req = acomp_request_alloc_extra( tfm, sizeof(*extra) + sizeof(*sg) * (nfrags + dnfrags), GFP_ATOMIC); ipcomp_cb(skb)->req = req; if (!req) return ERR_PTR(-ENOMEM); extra = acomp_request_extra(req); extra->x = x; dsg = extra->sg; sg = dsg + dnfrags; sg_init_table(sg, nfrags); err = skb_to_sgvec(skb, sg, 0, plen); if (unlikely(err < 0)) return ERR_PTR(err); sg_init_table(dsg, dnfrags); total = 0; for (i = 0; i < dnfrags && total < dlen; i++) { struct page *page; page = alloc_page(GFP_ATOMIC); if (!page) break; sg_set_page(dsg + i, page, PAGE_SIZE, 0); total += PAGE_SIZE; } if (!i) return ERR_PTR(-ENOMEM); sg_mark_end(dsg + i - 1); dlen = min(dlen, total); acomp_request_set_params(req, sg, dsg, plen, dlen); return req; } static int ipcomp_decompress(struct xfrm_state *x, struct sk_buff *skb) { struct acomp_req *req; int err; req = ipcomp_setup_req(x, skb, 0, IPCOMP_SCRATCH_SIZE); err = PTR_ERR(req); if (IS_ERR(req)) goto out; acomp_request_set_callback(req, 0, ipcomp_input_done, skb); err = crypto_acomp_decompress(req); if (err == -EINPROGRESS) return err; out: return ipcomp_input_done2(skb, err); } int ipcomp_input(struct xfrm_state *x, struct sk_buff *skb) { struct ip_comp_hdr *ipch __maybe_unused; if (!pskb_may_pull(skb, sizeof(*ipch))) return -EINVAL; skb->ip_summed = CHECKSUM_NONE; /* Remove ipcomp header and decompress original payload */ __skb_pull(skb, sizeof(*ipch)); return ipcomp_decompress(x, skb); } EXPORT_SYMBOL_GPL(ipcomp_input); static int ipcomp_output_push(struct sk_buff *skb) { skb_push(skb, -skb_network_offset(skb)); return 0; } static int ipcomp_output_done2(struct xfrm_state *x, struct sk_buff *skb, int err) { struct ip_comp_hdr *ipch; err = ipcomp_post_acomp(skb, err, sizeof(*ipch)); if (err) goto out_ok; /* Install ipcomp header, convert into ipcomp datagram. */ ipch = ip_comp_hdr(skb); ipch->nexthdr = *skb_mac_header(skb); ipch->flags = 0; ipch->cpi = htons((u16 )ntohl(x->id.spi)); *skb_mac_header(skb) = IPPROTO_COMP; out_ok: return ipcomp_output_push(skb); } static void ipcomp_output_done(void *data, int err) { struct ipcomp_req_extra *extra; struct sk_buff *skb = data; struct acomp_req *req; req = ipcomp_cb(skb)->req; extra = acomp_request_extra(req); xfrm_output_resume(skb_to_full_sk(skb), skb, ipcomp_output_done2(extra->x, skb, err)); } static int ipcomp_compress(struct xfrm_state *x, struct sk_buff *skb) { struct ip_comp_hdr *ipch __maybe_unused; struct acomp_req *req; int err; req = ipcomp_setup_req(x, skb, sizeof(*ipch), skb->len - sizeof(*ipch)); err = PTR_ERR(req); if (IS_ERR(req)) goto out; acomp_request_set_callback(req, 0, ipcomp_output_done, skb); err = crypto_acomp_compress(req); if (err == -EINPROGRESS) return err; out: return ipcomp_output_done2(x, skb, err); } int ipcomp_output(struct xfrm_state *x, struct sk_buff *skb) { struct ipcomp_data *ipcd = x->data; if (skb->len < ipcd->threshold) { /* Don't bother compressing */ return ipcomp_output_push(skb); } return ipcomp_compress(x, skb); } EXPORT_SYMBOL_GPL(ipcomp_output); static void ipcomp_free_data(struct ipcomp_data *ipcd) { crypto_free_acomp(ipcd->tfm); } void ipcomp_destroy(struct xfrm_state *x) { struct ipcomp_data *ipcd = x->data; if (!ipcd) return; ipcomp_free_data(ipcd); kfree(ipcd); } EXPORT_SYMBOL_GPL(ipcomp_destroy); int ipcomp_init_state(struct xfrm_state *x, struct netlink_ext_ack *extack) { int err; struct ipcomp_data *ipcd; struct xfrm_algo_desc *calg_desc; err = -EINVAL; if (!x->calg) { NL_SET_ERR_MSG(extack, "Missing required compression algorithm"); goto out; } if (x->encap) { NL_SET_ERR_MSG(extack, "IPComp is not compatible with encapsulation"); goto out; } err = -ENOMEM; ipcd = kzalloc(sizeof(*ipcd), GFP_KERNEL); if (!ipcd) goto out; ipcd->tfm = crypto_alloc_acomp(x->calg->alg_name, 0, 0); if (IS_ERR(ipcd->tfm)) goto error; calg_desc = xfrm_calg_get_byname(x->calg->alg_name, 0); BUG_ON(!calg_desc); ipcd->threshold = calg_desc->uinfo.comp.threshold; x->data = ipcd; err = 0; out: return err; error: ipcomp_free_data(ipcd); kfree(ipcd); goto out; } EXPORT_SYMBOL_GPL(ipcomp_init_state); MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("IP Payload Compression Protocol (IPComp) - RFC3173"); MODULE_AUTHOR("James Morris <jmorris@intercode.com.au>"); |
| 57 10 2 2 32 10 4 24 5 15 28 32 4 4 12 12 12 35 13 22 3 22 9 21 1 10 3 56 34 1 21 1 2 2 2 2 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 | // SPDX-License-Identifier: GPL-2.0-only /* * Copyright (c) 2008-2009 Patrick McHardy <kaber@trash.net> * * Development of this code funded by Astaro AG (http://www.astaro.com/) */ #include <linux/kernel.h> #include <linux/init.h> #include <linux/module.h> #include <linux/spinlock.h> #include <linux/netlink.h> #include <linux/netfilter.h> #include <linux/netfilter/nf_tables.h> #include <net/netfilter/nf_tables.h> struct nft_limit { spinlock_t lock; u64 last; u64 tokens; }; struct nft_limit_priv { struct nft_limit *limit; u64 tokens_max; u64 rate; u64 nsecs; u32 burst; bool invert; }; static inline bool nft_limit_eval(struct nft_limit_priv *priv, u64 cost) { u64 now, tokens; s64 delta; spin_lock_bh(&priv->limit->lock); now = ktime_get_ns(); tokens = priv->limit->tokens + now - priv->limit->last; if (tokens > priv->tokens_max) tokens = priv->tokens_max; priv->limit->last = now; delta = tokens - cost; if (delta >= 0) { priv->limit->tokens = delta; spin_unlock_bh(&priv->limit->lock); return priv->invert; } priv->limit->tokens = tokens; spin_unlock_bh(&priv->limit->lock); return !priv->invert; } /* Use same default as in iptables. */ #define NFT_LIMIT_PKT_BURST_DEFAULT 5 static int nft_limit_init(struct nft_limit_priv *priv, const struct nlattr * const tb[], bool pkts) { u64 unit, tokens, rate_with_burst; bool invert = false; if (tb[NFTA_LIMIT_RATE] == NULL || tb[NFTA_LIMIT_UNIT] == NULL) return -EINVAL; priv->rate = be64_to_cpu(nla_get_be64(tb[NFTA_LIMIT_RATE])); if (priv->rate == 0) return -EINVAL; unit = be64_to_cpu(nla_get_be64(tb[NFTA_LIMIT_UNIT])); if (check_mul_overflow(unit, NSEC_PER_SEC, &priv->nsecs)) return -EOVERFLOW; if (tb[NFTA_LIMIT_BURST]) priv->burst = ntohl(nla_get_be32(tb[NFTA_LIMIT_BURST])); if (pkts && priv->burst == 0) priv->burst = NFT_LIMIT_PKT_BURST_DEFAULT; if (check_add_overflow(priv->rate, priv->burst, &rate_with_burst)) return -EOVERFLOW; if (pkts) { u64 tmp = div64_u64(priv->nsecs, priv->rate); if (check_mul_overflow(tmp, priv->burst, &tokens)) return -EOVERFLOW; } else { u64 tmp; /* The token bucket size limits the number of tokens can be * accumulated. tokens_max specifies the bucket size. * tokens_max = unit * (rate + burst) / rate. */ if (check_mul_overflow(priv->nsecs, rate_with_burst, &tmp)) return -EOVERFLOW; tokens = div64_u64(tmp, priv->rate); } if (tb[NFTA_LIMIT_FLAGS]) { u32 flags = ntohl(nla_get_be32(tb[NFTA_LIMIT_FLAGS])); if (flags & ~NFT_LIMIT_F_INV) return -EOPNOTSUPP; if (flags & NFT_LIMIT_F_INV) invert = true; } priv->limit = kmalloc(sizeof(*priv->limit), GFP_KERNEL_ACCOUNT); if (!priv->limit) return -ENOMEM; priv->limit->tokens = tokens; priv->tokens_max = priv->limit->tokens; priv->invert = invert; priv->limit->last = ktime_get_ns(); spin_lock_init(&priv->limit->lock); return 0; } static int nft_limit_dump(struct sk_buff *skb, const struct nft_limit_priv *priv, enum nft_limit_type type) { u32 flags = priv->invert ? NFT_LIMIT_F_INV : 0; u64 secs = div_u64(priv->nsecs, NSEC_PER_SEC); if (nla_put_be64(skb, NFTA_LIMIT_RATE, cpu_to_be64(priv->rate), NFTA_LIMIT_PAD) || nla_put_be64(skb, NFTA_LIMIT_UNIT, cpu_to_be64(secs), NFTA_LIMIT_PAD) || nla_put_be32(skb, NFTA_LIMIT_BURST, htonl(priv->burst)) || nla_put_be32(skb, NFTA_LIMIT_TYPE, htonl(type)) || nla_put_be32(skb, NFTA_LIMIT_FLAGS, htonl(flags))) goto nla_put_failure; return 0; nla_put_failure: return -1; } static void nft_limit_destroy(const struct nft_ctx *ctx, const struct nft_limit_priv *priv) { kfree(priv->limit); } static int nft_limit_clone(struct nft_limit_priv *priv_dst, const struct nft_limit_priv *priv_src, gfp_t gfp) { priv_dst->tokens_max = priv_src->tokens_max; priv_dst->rate = priv_src->rate; priv_dst->nsecs = priv_src->nsecs; priv_dst->burst = priv_src->burst; priv_dst->invert = priv_src->invert; priv_dst->limit = kmalloc(sizeof(*priv_dst->limit), gfp); if (!priv_dst->limit) return -ENOMEM; spin_lock_init(&priv_dst->limit->lock); priv_dst->limit->tokens = priv_src->tokens_max; priv_dst->limit->last = ktime_get_ns(); return 0; } struct nft_limit_priv_pkts { struct nft_limit_priv limit; u64 cost; }; static void nft_limit_pkts_eval(const struct nft_expr *expr, struct nft_regs *regs, const struct nft_pktinfo *pkt) { struct nft_limit_priv_pkts *priv = nft_expr_priv(expr); if (nft_limit_eval(&priv->limit, priv->cost)) regs->verdict.code = NFT_BREAK; } static const struct nla_policy nft_limit_policy[NFTA_LIMIT_MAX + 1] = { [NFTA_LIMIT_RATE] = { .type = NLA_U64 }, [NFTA_LIMIT_UNIT] = { .type = NLA_U64 }, [NFTA_LIMIT_BURST] = { .type = NLA_U32 }, [NFTA_LIMIT_TYPE] = { .type = NLA_U32 }, [NFTA_LIMIT_FLAGS] = { .type = NLA_U32 }, }; static int nft_limit_pkts_init(const struct nft_ctx *ctx, const struct nft_expr *expr, const struct nlattr * const tb[]) { struct nft_limit_priv_pkts *priv = nft_expr_priv(expr); int err; err = nft_limit_init(&priv->limit, tb, true); if (err < 0) return err; priv->cost = div64_u64(priv->limit.nsecs, priv->limit.rate); return 0; } static int nft_limit_pkts_dump(struct sk_buff *skb, const struct nft_expr *expr, bool reset) { const struct nft_limit_priv_pkts *priv = nft_expr_priv(expr); return nft_limit_dump(skb, &priv->limit, NFT_LIMIT_PKTS); } static void nft_limit_pkts_destroy(const struct nft_ctx *ctx, const struct nft_expr *expr) { const struct nft_limit_priv_pkts *priv = nft_expr_priv(expr); nft_limit_destroy(ctx, &priv->limit); } static int nft_limit_pkts_clone(struct nft_expr *dst, const struct nft_expr *src, gfp_t gfp) { struct nft_limit_priv_pkts *priv_dst = nft_expr_priv(dst); struct nft_limit_priv_pkts *priv_src = nft_expr_priv(src); priv_dst->cost = priv_src->cost; return nft_limit_clone(&priv_dst->limit, &priv_src->limit, gfp); } static struct nft_expr_type nft_limit_type; static const struct nft_expr_ops nft_limit_pkts_ops = { .type = &nft_limit_type, .size = NFT_EXPR_SIZE(sizeof(struct nft_limit_priv_pkts)), .eval = nft_limit_pkts_eval, .init = nft_limit_pkts_init, .destroy = nft_limit_pkts_destroy, .clone = nft_limit_pkts_clone, .dump = nft_limit_pkts_dump, .reduce = NFT_REDUCE_READONLY, }; static void nft_limit_bytes_eval(const struct nft_expr *expr, struct nft_regs *regs, const struct nft_pktinfo *pkt) { struct nft_limit_priv *priv = nft_expr_priv(expr); u64 cost = div64_u64(priv->nsecs * pkt->skb->len, priv->rate); if (nft_limit_eval(priv, cost)) regs->verdict.code = NFT_BREAK; } static int nft_limit_bytes_init(const struct nft_ctx *ctx, const struct nft_expr *expr, const struct nlattr * const tb[]) { struct nft_limit_priv *priv = nft_expr_priv(expr); return nft_limit_init(priv, tb, false); } static int nft_limit_bytes_dump(struct sk_buff *skb, const struct nft_expr *expr, bool reset) { const struct nft_limit_priv *priv = nft_expr_priv(expr); return nft_limit_dump(skb, priv, NFT_LIMIT_PKT_BYTES); } static void nft_limit_bytes_destroy(const struct nft_ctx *ctx, const struct nft_expr *expr) { const struct nft_limit_priv *priv = nft_expr_priv(expr); nft_limit_destroy(ctx, priv); } static int nft_limit_bytes_clone(struct nft_expr *dst, const struct nft_expr *src, gfp_t gfp) { struct nft_limit_priv *priv_dst = nft_expr_priv(dst); struct nft_limit_priv *priv_src = nft_expr_priv(src); return nft_limit_clone(priv_dst, priv_src, gfp); } static const struct nft_expr_ops nft_limit_bytes_ops = { .type = &nft_limit_type, .size = NFT_EXPR_SIZE(sizeof(struct nft_limit_priv)), .eval = nft_limit_bytes_eval, .init = nft_limit_bytes_init, .dump = nft_limit_bytes_dump, .clone = nft_limit_bytes_clone, .destroy = nft_limit_bytes_destroy, .reduce = NFT_REDUCE_READONLY, }; static const struct nft_expr_ops * nft_limit_select_ops(const struct nft_ctx *ctx, const struct nlattr * const tb[]) { if (tb[NFTA_LIMIT_TYPE] == NULL) return &nft_limit_pkts_ops; switch (ntohl(nla_get_be32(tb[NFTA_LIMIT_TYPE]))) { case NFT_LIMIT_PKTS: return &nft_limit_pkts_ops; case NFT_LIMIT_PKT_BYTES: return &nft_limit_bytes_ops; } return ERR_PTR(-EOPNOTSUPP); } static struct nft_expr_type nft_limit_type __read_mostly = { .name = "limit", .select_ops = nft_limit_select_ops, .policy = nft_limit_policy, .maxattr = NFTA_LIMIT_MAX, .flags = NFT_EXPR_STATEFUL, .owner = THIS_MODULE, }; static void nft_limit_obj_pkts_eval(struct nft_object *obj, struct nft_regs *regs, const struct nft_pktinfo *pkt) { struct nft_limit_priv_pkts *priv = nft_obj_data(obj); if (nft_limit_eval(&priv->limit, priv->cost)) regs->verdict.code = NFT_BREAK; } static int nft_limit_obj_pkts_init(const struct nft_ctx *ctx, const struct nlattr * const tb[], struct nft_object *obj) { struct nft_limit_priv_pkts *priv = nft_obj_data(obj); int err; err = nft_limit_init(&priv->limit, tb, true); if (err < 0) return err; priv->cost = div64_u64(priv->limit.nsecs, priv->limit.rate); return 0; } static int nft_limit_obj_pkts_dump(struct sk_buff *skb, struct nft_object *obj, bool reset) { const struct nft_limit_priv_pkts *priv = nft_obj_data(obj); return nft_limit_dump(skb, &priv->limit, NFT_LIMIT_PKTS); } static void nft_limit_obj_pkts_destroy(const struct nft_ctx *ctx, struct nft_object *obj) { struct nft_limit_priv_pkts *priv = nft_obj_data(obj); nft_limit_destroy(ctx, &priv->limit); } static struct nft_object_type nft_limit_obj_type; static const struct nft_object_ops nft_limit_obj_pkts_ops = { .type = &nft_limit_obj_type, .size = NFT_EXPR_SIZE(sizeof(struct nft_limit_priv_pkts)), .init = nft_limit_obj_pkts_init, .destroy = nft_limit_obj_pkts_destroy, .eval = nft_limit_obj_pkts_eval, .dump = nft_limit_obj_pkts_dump, }; static void nft_limit_obj_bytes_eval(struct nft_object *obj, struct nft_regs *regs, const struct nft_pktinfo *pkt) { struct nft_limit_priv *priv = nft_obj_data(obj); u64 cost = div64_u64(priv->nsecs * pkt->skb->len, priv->rate); if (nft_limit_eval(priv, cost)) regs->verdict.code = NFT_BREAK; } static int nft_limit_obj_bytes_init(const struct nft_ctx *ctx, const struct nlattr * const tb[], struct nft_object *obj) { struct nft_limit_priv *priv = nft_obj_data(obj); return nft_limit_init(priv, tb, false); } static int nft_limit_obj_bytes_dump(struct sk_buff *skb, struct nft_object *obj, bool reset) { const struct nft_limit_priv *priv = nft_obj_data(obj); return nft_limit_dump(skb, priv, NFT_LIMIT_PKT_BYTES); } static void nft_limit_obj_bytes_destroy(const struct nft_ctx *ctx, struct nft_object *obj) { struct nft_limit_priv *priv = nft_obj_data(obj); nft_limit_destroy(ctx, priv); } static struct nft_object_type nft_limit_obj_type; static const struct nft_object_ops nft_limit_obj_bytes_ops = { .type = &nft_limit_obj_type, .size = sizeof(struct nft_limit_priv), .init = nft_limit_obj_bytes_init, .destroy = nft_limit_obj_bytes_destroy, .eval = nft_limit_obj_bytes_eval, .dump = nft_limit_obj_bytes_dump, }; static const struct nft_object_ops * nft_limit_obj_select_ops(const struct nft_ctx *ctx, const struct nlattr * const tb[]) { if (!tb[NFTA_LIMIT_TYPE]) return &nft_limit_obj_pkts_ops; switch (ntohl(nla_get_be32(tb[NFTA_LIMIT_TYPE]))) { case NFT_LIMIT_PKTS: return &nft_limit_obj_pkts_ops; case NFT_LIMIT_PKT_BYTES: return &nft_limit_obj_bytes_ops; } return ERR_PTR(-EOPNOTSUPP); } static struct nft_object_type nft_limit_obj_type __read_mostly = { .select_ops = nft_limit_obj_select_ops, .type = NFT_OBJECT_LIMIT, .maxattr = NFTA_LIMIT_MAX, .policy = nft_limit_policy, .owner = THIS_MODULE, }; static int __init nft_limit_module_init(void) { int err; err = nft_register_obj(&nft_limit_obj_type); if (err < 0) return err; err = nft_register_expr(&nft_limit_type); if (err < 0) goto err1; return 0; err1: nft_unregister_obj(&nft_limit_obj_type); return err; } static void __exit nft_limit_module_exit(void) { nft_unregister_expr(&nft_limit_type); nft_unregister_obj(&nft_limit_obj_type); } module_init(nft_limit_module_init); module_exit(nft_limit_module_exit); MODULE_LICENSE("GPL"); MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>"); MODULE_ALIAS_NFT_EXPR("limit"); MODULE_ALIAS_NFT_OBJ(NFT_OBJECT_LIMIT); MODULE_DESCRIPTION("nftables limit expression support"); |
| 227 24 9 33 560 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 | /* SPDX-License-Identifier: GPL-2.0-or-later */ /* Internal procfs definitions * * Copyright (C) 2004 Red Hat, Inc. All Rights Reserved. * Written by David Howells (dhowells@redhat.com) */ #include <linux/proc_fs.h> #include <linux/proc_ns.h> #include <linux/refcount.h> #include <linux/spinlock.h> #include <linux/atomic.h> #include <linux/binfmts.h> #include <linux/sched/coredump.h> #include <linux/sched/task.h> #include <linux/mm.h> struct ctl_table_header; struct mempolicy; /* * This is not completely implemented yet. The idea is to * create an in-memory tree (like the actual /proc filesystem * tree) of these proc_dir_entries, so that we can dynamically * add new files to /proc. * * parent/subdir are used for the directory structure (every /proc file has a * parent, but "subdir" is empty for all non-directory entries). * subdir_node is used to build the rb tree "subdir" of the parent. */ struct proc_dir_entry { /* * number of callers into module in progress; * negative -> it's going away RSN */ atomic_t in_use; refcount_t refcnt; struct list_head pde_openers; /* who did ->open, but not ->release */ /* protects ->pde_openers and all struct pde_opener instances */ spinlock_t pde_unload_lock; struct completion *pde_unload_completion; const struct inode_operations *proc_iops; union { const struct proc_ops *proc_ops; const struct file_operations *proc_dir_ops; }; union { const struct seq_operations *seq_ops; int (*single_show)(struct seq_file *, void *); }; proc_write_t write; void *data; unsigned int state_size; unsigned int low_ino; nlink_t nlink; kuid_t uid; kgid_t gid; loff_t size; struct proc_dir_entry *parent; struct rb_root subdir; struct rb_node subdir_node; char *name; umode_t mode; u8 flags; u8 namelen; char inline_name[]; } __randomize_layout; #define SIZEOF_PDE ( \ sizeof(struct proc_dir_entry) < 128 ? 128 : \ sizeof(struct proc_dir_entry) < 192 ? 192 : \ sizeof(struct proc_dir_entry) < 256 ? 256 : \ sizeof(struct proc_dir_entry) < 512 ? 512 : \ 0) #define SIZEOF_PDE_INLINE_NAME (SIZEOF_PDE - sizeof(struct proc_dir_entry)) static inline bool pde_is_permanent(const struct proc_dir_entry *pde) { return pde->flags & PROC_ENTRY_PERMANENT; } static inline void pde_make_permanent(struct proc_dir_entry *pde) { pde->flags |= PROC_ENTRY_PERMANENT; } static inline bool pde_has_proc_read_iter(const struct proc_dir_entry *pde) { return pde->flags & PROC_ENTRY_proc_read_iter; } static inline bool pde_has_proc_compat_ioctl(const struct proc_dir_entry *pde) { #ifdef CONFIG_COMPAT return pde->flags & PROC_ENTRY_proc_compat_ioctl; #else return false; #endif } static inline bool pde_has_proc_lseek(const struct proc_dir_entry *pde) { return pde->flags & PROC_ENTRY_proc_lseek; } extern struct kmem_cache *proc_dir_entry_cache; void pde_free(struct proc_dir_entry *pde); union proc_op { int (*proc_get_link)(struct dentry *, struct path *); int (*proc_show)(struct seq_file *m, struct pid_namespace *ns, struct pid *pid, struct task_struct *task); int lsmid; }; struct proc_inode { struct pid *pid; unsigned int fd; union proc_op op; struct proc_dir_entry *pde; struct ctl_table_header *sysctl; const struct ctl_table *sysctl_entry; struct hlist_node sibling_inodes; const struct proc_ns_operations *ns_ops; struct inode vfs_inode; } __randomize_layout; /* * General functions */ static inline struct proc_inode *PROC_I(const struct inode *inode) { return container_of(inode, struct proc_inode, vfs_inode); } static inline struct proc_dir_entry *PDE(const struct inode *inode) { return PROC_I(inode)->pde; } static inline struct pid *proc_pid(const struct inode *inode) { return PROC_I(inode)->pid; } static inline struct task_struct *get_proc_task(const struct inode *inode) { return get_pid_task(proc_pid(inode), PIDTYPE_PID); } void task_dump_owner(struct task_struct *task, umode_t mode, kuid_t *ruid, kgid_t *rgid); unsigned name_to_int(const struct qstr *qstr); /* * Offset of the first process in the /proc root directory.. */ #define FIRST_PROCESS_ENTRY 256 /* Worst case buffer size needed for holding an integer. */ #define PROC_NUMBUF 13 #ifdef CONFIG_PAGE_MAPCOUNT /** * folio_precise_page_mapcount() - Number of mappings of this folio page. * @folio: The folio. * @page: The page. * * The number of present user page table entries that reference this page * as tracked via the RMAP: either referenced directly (PTE) or as part of * a larger area that covers this page (e.g., PMD). * * Use this function only for the calculation of existing statistics * (USS, PSS, mapcount_max) and for debugging purposes (/proc/kpagecount). * * Do not add new users. * * Returns: The number of mappings of this folio page. 0 for * folios that are not mapped to user space or are not tracked via the RMAP * (e.g., shared zeropage). */ static inline int folio_precise_page_mapcount(struct folio *folio, struct page *page) { int mapcount = atomic_read(&page->_mapcount) + 1; if (page_mapcount_is_type(mapcount)) mapcount = 0; if (folio_test_large(folio)) mapcount += folio_entire_mapcount(folio); return mapcount; } #else /* !CONFIG_PAGE_MAPCOUNT */ static inline int folio_precise_page_mapcount(struct folio *folio, struct page *page) { BUILD_BUG(); } #endif /* CONFIG_PAGE_MAPCOUNT */ /** * folio_average_page_mapcount() - Average number of mappings per page in this * folio * @folio: The folio. * * The average number of user page table entries that reference each page in * this folio as tracked via the RMAP: either referenced directly (PTE) or * as part of a larger area that covers this page (e.g., PMD). * * The average is calculated by rounding to the nearest integer; however, * to avoid duplicated code in current callers, the average is at least * 1 if any page of the folio is mapped. * * Returns: The average number of mappings per page in this folio. */ static inline int folio_average_page_mapcount(struct folio *folio) { int mapcount, entire_mapcount, avg; if (!folio_test_large(folio)) return atomic_read(&folio->_mapcount) + 1; mapcount = folio_large_mapcount(folio); if (unlikely(mapcount <= 0)) return 0; entire_mapcount = folio_entire_mapcount(folio); if (mapcount <= entire_mapcount) return entire_mapcount; mapcount -= entire_mapcount; /* Round to closest integer ... */ avg = ((unsigned int)mapcount + folio_large_nr_pages(folio) / 2) >> folio_large_order(folio); /* ... but return at least 1. */ return max_t(int, avg + entire_mapcount, 1); } /* * array.c */ extern const struct file_operations proc_tid_children_operations; extern void proc_task_name(struct seq_file *m, struct task_struct *p, bool escape); extern int proc_tid_stat(struct seq_file *, struct pid_namespace *, struct pid *, struct task_struct *); extern int proc_tgid_stat(struct seq_file *, struct pid_namespace *, struct pid *, struct task_struct *); extern int proc_pid_status(struct seq_file *, struct pid_namespace *, struct pid *, struct task_struct *); extern int proc_pid_statm(struct seq_file *, struct pid_namespace *, struct pid *, struct task_struct *); /* * base.c */ extern const struct dentry_operations pid_dentry_operations; extern int pid_getattr(struct mnt_idmap *, const struct path *, struct kstat *, u32, unsigned int); extern int proc_setattr(struct mnt_idmap *, struct dentry *, struct iattr *); extern void proc_pid_evict_inode(struct proc_inode *); extern struct inode *proc_pid_make_inode(struct super_block *, struct task_struct *, umode_t); extern void pid_update_inode(struct task_struct *, struct inode *); extern int pid_delete_dentry(const struct dentry *); extern int proc_pid_readdir(struct file *, struct dir_context *); struct dentry *proc_pid_lookup(struct dentry *, unsigned int); extern loff_t mem_lseek(struct file *, loff_t, int); /* Lookups */ typedef struct dentry *instantiate_t(struct dentry *, struct task_struct *, const void *); bool proc_fill_cache(struct file *, struct dir_context *, const char *, unsigned int, instantiate_t, struct task_struct *, const void *); /* * generic.c */ struct proc_dir_entry *proc_create_reg(const char *name, umode_t mode, struct proc_dir_entry **parent, void *data); struct proc_dir_entry *proc_register(struct proc_dir_entry *dir, struct proc_dir_entry *dp); extern struct dentry *proc_lookup(struct inode *, struct dentry *, unsigned int); struct dentry *proc_lookup_de(struct inode *, struct dentry *, struct proc_dir_entry *); extern int proc_readdir(struct file *, struct dir_context *); int proc_readdir_de(struct file *, struct dir_context *, struct proc_dir_entry *); static inline void pde_get(struct proc_dir_entry *pde) { refcount_inc(&pde->refcnt); } extern void pde_put(struct proc_dir_entry *); static inline bool is_empty_pde(const struct proc_dir_entry *pde) { return S_ISDIR(pde->mode) && !pde->proc_iops; } extern ssize_t proc_simple_write(struct file *, const char __user *, size_t, loff_t *); /* * inode.c */ struct pde_opener { struct list_head lh; struct file *file; bool closing; struct completion *c; } __randomize_layout; extern const struct inode_operations proc_link_inode_operations; extern const struct inode_operations proc_pid_link_inode_operations; extern const struct super_operations proc_sops; void proc_init_kmemcache(void); void proc_invalidate_siblings_dcache(struct hlist_head *inodes, spinlock_t *lock); void set_proc_pid_nlink(void); extern struct inode *proc_get_inode(struct super_block *, struct proc_dir_entry *); extern void proc_entry_rundown(struct proc_dir_entry *); /* * proc_namespaces.c */ extern const struct inode_operations proc_ns_dir_inode_operations; extern const struct file_operations proc_ns_dir_operations; /* * proc_net.c */ extern const struct file_operations proc_net_operations; extern const struct inode_operations proc_net_inode_operations; #ifdef CONFIG_NET extern int proc_net_init(void); #else static inline int proc_net_init(void) { return 0; } #endif /* * proc_self.c */ extern int proc_setup_self(struct super_block *); /* * proc_thread_self.c */ extern int proc_setup_thread_self(struct super_block *); extern void proc_thread_self_init(void); /* * proc_sysctl.c */ #ifdef CONFIG_PROC_SYSCTL extern int proc_sys_init(void); extern void proc_sys_evict_inode(struct inode *inode, struct ctl_table_header *head); #else static inline void proc_sys_init(void) { } static inline void proc_sys_evict_inode(struct inode *inode, struct ctl_table_header *head) { } #endif /* * proc_tty.c */ #ifdef CONFIG_TTY extern void proc_tty_init(void); #else static inline void proc_tty_init(void) {} #endif /* * root.c */ extern struct proc_dir_entry proc_root; extern void proc_self_init(void); /* * task_[no]mmu.c */ struct mem_size_stats; struct proc_maps_locking_ctx { struct mm_struct *mm; #ifdef CONFIG_PER_VMA_LOCK bool mmap_locked; struct vm_area_struct *locked_vma; #endif }; struct proc_maps_private { struct inode *inode; struct task_struct *task; struct vma_iterator iter; loff_t last_pos; struct proc_maps_locking_ctx lock_ctx; #ifdef CONFIG_NUMA struct mempolicy *task_mempolicy; #endif } __randomize_layout; struct mm_struct *proc_mem_open(struct inode *inode, unsigned int mode); extern const struct file_operations proc_pid_maps_operations; extern const struct file_operations proc_pid_numa_maps_operations; extern const struct file_operations proc_pid_smaps_operations; extern const struct file_operations proc_pid_smaps_rollup_operations; extern const struct file_operations proc_clear_refs_operations; extern const struct file_operations proc_pagemap_operations; extern unsigned long task_vsize(struct mm_struct *); extern unsigned long task_statm(struct mm_struct *, unsigned long *, unsigned long *, unsigned long *, unsigned long *); extern void task_mem(struct seq_file *, struct mm_struct *); extern const struct dentry_operations proc_net_dentry_ops; static inline void pde_force_lookup(struct proc_dir_entry *pde) { /* /proc/net/ entries can be changed under us by setns(CLONE_NEWNET) */ pde->flags |= PROC_ENTRY_FORCE_LOOKUP; } /* * Add a new procfs dentry that can't serve as a mountpoint. That should * encompass anything that is ephemeral and can just disappear while the * process is still around. */ static inline struct dentry *proc_splice_unmountable(struct inode *inode, struct dentry *dentry, const struct dentry_operations *d_ops) { dont_mount(dentry); return d_splice_alias_ops(inode, dentry, d_ops); } |
| 33 33 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 | // SPDX-License-Identifier: GPL-2.0 #include <linux/sysctl.h> #include <net/lwtunnel.h> #include <net/netfilter/nf_hooks_lwtunnel.h> #include <linux/netfilter.h> #include "nf_internals.h" static inline int nf_hooks_lwtunnel_get(void) { if (static_branch_unlikely(&nf_hooks_lwtunnel_enabled)) return 1; else return 0; } static inline int nf_hooks_lwtunnel_set(int enable) { if (static_branch_unlikely(&nf_hooks_lwtunnel_enabled)) { if (!enable) return -EBUSY; } else if (enable) { static_branch_enable(&nf_hooks_lwtunnel_enabled); } return 0; } #ifdef CONFIG_SYSCTL int nf_hooks_lwtunnel_sysctl_handler(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { int proc_nf_hooks_lwtunnel_enabled = 0; struct ctl_table tmp = { .procname = table->procname, .data = &proc_nf_hooks_lwtunnel_enabled, .maxlen = sizeof(int), .mode = table->mode, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE, }; int ret; if (!write) proc_nf_hooks_lwtunnel_enabled = nf_hooks_lwtunnel_get(); ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos); if (write && ret == 0) ret = nf_hooks_lwtunnel_set(proc_nf_hooks_lwtunnel_enabled); return ret; } EXPORT_SYMBOL_GPL(nf_hooks_lwtunnel_sysctl_handler); static struct ctl_table nf_lwtunnel_sysctl_table[] = { { .procname = "nf_hooks_lwtunnel", .data = NULL, .maxlen = sizeof(int), .mode = 0644, .proc_handler = nf_hooks_lwtunnel_sysctl_handler, }, }; static int __net_init nf_lwtunnel_net_init(struct net *net) { struct ctl_table_header *hdr; struct ctl_table *table; table = nf_lwtunnel_sysctl_table; if (!net_eq(net, &init_net)) { table = kmemdup(nf_lwtunnel_sysctl_table, sizeof(nf_lwtunnel_sysctl_table), GFP_KERNEL); if (!table) goto err_alloc; } hdr = register_net_sysctl_sz(net, "net/netfilter", table, ARRAY_SIZE(nf_lwtunnel_sysctl_table)); if (!hdr) goto err_reg; net->nf.nf_lwtnl_dir_header = hdr; return 0; err_reg: if (!net_eq(net, &init_net)) kfree(table); err_alloc: return -ENOMEM; } static void __net_exit nf_lwtunnel_net_exit(struct net *net) { const struct ctl_table *table; table = net->nf.nf_lwtnl_dir_header->ctl_table_arg; unregister_net_sysctl_table(net->nf.nf_lwtnl_dir_header); if (!net_eq(net, &init_net)) kfree(table); } static struct pernet_operations nf_lwtunnel_net_ops = { .init = nf_lwtunnel_net_init, .exit = nf_lwtunnel_net_exit, }; int __init netfilter_lwtunnel_init(void) { return register_pernet_subsys(&nf_lwtunnel_net_ops); } void netfilter_lwtunnel_fini(void) { unregister_pernet_subsys(&nf_lwtunnel_net_ops); } #else int __init netfilter_lwtunnel_init(void) { return 0; } void netfilter_lwtunnel_fini(void) {} #endif /* CONFIG_SYSCTL */ |
| 4554 9486 94 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 | /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _KERNEL_EVENTS_INTERNAL_H #define _KERNEL_EVENTS_INTERNAL_H #include <linux/hardirq.h> #include <linux/uaccess.h> #include <linux/refcount.h> /* Buffer handling */ #define RING_BUFFER_WRITABLE 0x01 struct perf_buffer { refcount_t refcount; struct rcu_head rcu_head; #ifdef CONFIG_PERF_USE_VMALLOC struct work_struct work; int page_order; /* allocation order */ #endif int nr_pages; /* nr of data pages */ int overwrite; /* can overwrite itself */ int paused; /* can write into ring buffer */ atomic_t poll; /* POLL_ for wakeups */ local_t head; /* write position */ unsigned int nest; /* nested writers */ local_t events; /* event limit */ local_t wakeup; /* wakeup stamp */ local_t lost; /* nr records lost */ long watermark; /* wakeup watermark */ long aux_watermark; /* poll crap */ spinlock_t event_lock; struct list_head event_list; refcount_t mmap_count; unsigned long mmap_locked; struct user_struct *mmap_user; /* AUX area */ struct mutex aux_mutex; long aux_head; unsigned int aux_nest; long aux_wakeup; /* last aux_watermark boundary crossed by aux_head */ unsigned long aux_pgoff; int aux_nr_pages; int aux_overwrite; refcount_t aux_mmap_count; unsigned long aux_mmap_locked; void (*free_aux)(void *); refcount_t aux_refcount; int aux_in_sampling; int aux_in_pause_resume; void **aux_pages; void *aux_priv; struct perf_event_mmap_page *user_page; void *data_pages[]; }; extern void rb_free(struct perf_buffer *rb); static inline void rb_free_rcu(struct rcu_head *rcu_head) { struct perf_buffer *rb; rb = container_of(rcu_head, struct perf_buffer, rcu_head); rb_free(rb); } static inline void rb_toggle_paused(struct perf_buffer *rb, bool pause) { if (!pause && rb->nr_pages) rb->paused = 0; else rb->paused = 1; } extern struct perf_buffer * rb_alloc(int nr_pages, long watermark, int cpu, int flags); extern void perf_event_wakeup(struct perf_event *event); extern int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event, pgoff_t pgoff, int nr_pages, long watermark, int flags); extern void rb_free_aux(struct perf_buffer *rb); extern struct perf_buffer *ring_buffer_get(struct perf_event *event); extern void ring_buffer_put(struct perf_buffer *rb); static inline bool rb_has_aux(struct perf_buffer *rb) { return !!rb->aux_nr_pages; } void perf_event_aux_event(struct perf_event *event, unsigned long head, unsigned long size, u64 flags); extern struct page * perf_mmap_to_page(struct perf_buffer *rb, unsigned long pgoff); #ifdef CONFIG_PERF_USE_VMALLOC /* * Back perf_mmap() with vmalloc memory. * * Required for architectures that have d-cache aliasing issues. */ static inline int page_order(struct perf_buffer *rb) { return rb->page_order; } #else static inline int page_order(struct perf_buffer *rb) { return 0; } #endif static inline int data_page_nr(struct perf_buffer *rb) { return rb->nr_pages << page_order(rb); } static inline unsigned long perf_data_size(struct perf_buffer *rb) { return rb->nr_pages << (PAGE_SHIFT + page_order(rb)); } static inline unsigned long perf_aux_size(struct perf_buffer *rb) { return (unsigned long)rb->aux_nr_pages << PAGE_SHIFT; } #define __DEFINE_OUTPUT_COPY_BODY(advance_buf, memcpy_func, ...) \ { \ unsigned long size, written; \ \ do { \ size = min(handle->size, len); \ written = memcpy_func(__VA_ARGS__); \ written = size - written; \ \ len -= written; \ handle->addr += written; \ if (advance_buf) \ buf += written; \ handle->size -= written; \ if (!handle->size) { \ struct perf_buffer *rb = handle->rb; \ \ handle->page++; \ handle->page &= rb->nr_pages - 1; \ handle->addr = rb->data_pages[handle->page]; \ handle->size = PAGE_SIZE << page_order(rb); \ } \ } while (len && written == size); \ \ return len; \ } #define DEFINE_OUTPUT_COPY(func_name, memcpy_func) \ static inline unsigned long \ func_name(struct perf_output_handle *handle, \ const void *buf, unsigned long len) \ __DEFINE_OUTPUT_COPY_BODY(true, memcpy_func, handle->addr, buf, size) static inline unsigned long __output_custom(struct perf_output_handle *handle, perf_copy_f copy_func, const void *buf, unsigned long len) { unsigned long orig_len = len; __DEFINE_OUTPUT_COPY_BODY(false, copy_func, handle->addr, buf, orig_len - len, size) } static inline unsigned long memcpy_common(void *dst, const void *src, unsigned long n) { memcpy(dst, src, n); return 0; } DEFINE_OUTPUT_COPY(__output_copy, memcpy_common) static inline unsigned long memcpy_skip(void *dst, const void *src, unsigned long n) { return 0; } DEFINE_OUTPUT_COPY(__output_skip, memcpy_skip) #ifndef arch_perf_out_copy_user #define arch_perf_out_copy_user arch_perf_out_copy_user static inline unsigned long arch_perf_out_copy_user(void *dst, const void *src, unsigned long n) { unsigned long ret; pagefault_disable(); ret = __copy_from_user_inatomic(dst, src, n); pagefault_enable(); return ret; } #endif DEFINE_OUTPUT_COPY(__output_copy_user, arch_perf_out_copy_user) static inline int get_recursion_context(u8 *recursion) { unsigned char rctx = interrupt_context_level(); if (recursion[rctx]) return -1; recursion[rctx]++; barrier(); return rctx; } static inline void put_recursion_context(u8 *recursion, unsigned char rctx) { barrier(); recursion[rctx]--; } #ifdef CONFIG_HAVE_PERF_USER_STACK_DUMP static inline bool arch_perf_have_user_stack_dump(void) { return true; } #define perf_user_stack_pointer(regs) user_stack_pointer(regs) #else static inline bool arch_perf_have_user_stack_dump(void) { return false; } #define perf_user_stack_pointer(regs) 0 #endif /* CONFIG_HAVE_PERF_USER_STACK_DUMP */ #endif /* _KERNEL_EVENTS_INTERNAL_H */ |
| 4075 357 2695 630 207 418 487 2 1005 205 12 5 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 | /* SPDX-License-Identifier: GPL-2.0 */ #ifndef __LINUX_SPINLOCK_H #define __LINUX_SPINLOCK_H #define __LINUX_INSIDE_SPINLOCK_H /* * include/linux/spinlock.h - generic spinlock/rwlock declarations * * here's the role of the various spinlock/rwlock related include files: * * on SMP builds: * * asm/spinlock_types.h: contains the arch_spinlock_t/arch_rwlock_t and the * initializers * * linux/spinlock_types_raw: * The raw types and initializers * linux/spinlock_types.h: * defines the generic type and initializers * * asm/spinlock.h: contains the arch_spin_*()/etc. lowlevel * implementations, mostly inline assembly code * * (also included on UP-debug builds:) * * linux/spinlock_api_smp.h: * contains the prototypes for the _spin_*() APIs. * * linux/spinlock.h: builds the final spin_*() APIs. * * on UP builds: * * linux/spinlock_type_up.h: * contains the generic, simplified UP spinlock type. * (which is an empty structure on non-debug builds) * * linux/spinlock_types_raw: * The raw RT types and initializers * linux/spinlock_types.h: * defines the generic type and initializers * * linux/spinlock_up.h: * contains the arch_spin_*()/etc. version of UP * builds. (which are NOPs on non-debug, non-preempt * builds) * * (included on UP-non-debug builds:) * * linux/spinlock_api_up.h: * builds the _spin_*() APIs. * * linux/spinlock.h: builds the final spin_*() APIs. */ #include <linux/typecheck.h> #include <linux/preempt.h> #include <linux/linkage.h> #include <linux/compiler.h> #include <linux/irqflags.h> #include <linux/thread_info.h> #include <linux/stringify.h> #include <linux/bottom_half.h> #include <linux/lockdep.h> #include <linux/cleanup.h> #include <asm/barrier.h> #include <asm/mmiowb.h> /* * Must define these before including other files, inline functions need them */ #define LOCK_SECTION_NAME ".text..lock."KBUILD_BASENAME #define LOCK_SECTION_START(extra) \ ".subsection 1\n\t" \ extra \ ".ifndef " LOCK_SECTION_NAME "\n\t" \ LOCK_SECTION_NAME ":\n\t" \ ".endif\n" #define LOCK_SECTION_END \ ".previous\n\t" #define __lockfunc __section(".spinlock.text") /* * Pull the arch_spinlock_t and arch_rwlock_t definitions: */ #include <linux/spinlock_types.h> /* * Pull the arch_spin*() functions/declarations (UP-nondebug doesn't need them): */ #ifdef CONFIG_SMP # include <asm/spinlock.h> #else # include <linux/spinlock_up.h> #endif #ifdef CONFIG_DEBUG_SPINLOCK extern void __raw_spin_lock_init(raw_spinlock_t *lock, const char *name, struct lock_class_key *key, short inner); # define raw_spin_lock_init(lock) \ do { \ static struct lock_class_key __key; \ \ __raw_spin_lock_init((lock), #lock, &__key, LD_WAIT_SPIN); \ } while (0) #else # define raw_spin_lock_init(lock) \ do { *(lock) = __RAW_SPIN_LOCK_UNLOCKED(lock); } while (0) #endif #define raw_spin_is_locked(lock) arch_spin_is_locked(&(lock)->raw_lock) #ifdef arch_spin_is_contended #define raw_spin_is_contended(lock) arch_spin_is_contended(&(lock)->raw_lock) #else #define raw_spin_is_contended(lock) (((void)(lock), 0)) #endif /*arch_spin_is_contended*/ /* * smp_mb__after_spinlock() provides the equivalent of a full memory barrier * between program-order earlier lock acquisitions and program-order later * memory accesses. * * This guarantees that the following two properties hold: * * 1) Given the snippet: * * { X = 0; Y = 0; } * * CPU0 CPU1 * * WRITE_ONCE(X, 1); WRITE_ONCE(Y, 1); * spin_lock(S); smp_mb(); * smp_mb__after_spinlock(); r1 = READ_ONCE(X); * r0 = READ_ONCE(Y); * spin_unlock(S); * * it is forbidden that CPU0 does not observe CPU1's store to Y (r0 = 0) * and CPU1 does not observe CPU0's store to X (r1 = 0); see the comments * preceding the call to smp_mb__after_spinlock() in __schedule() and in * try_to_wake_up(). * * 2) Given the snippet: * * { X = 0; Y = 0; } * * CPU0 CPU1 CPU2 * * spin_lock(S); spin_lock(S); r1 = READ_ONCE(Y); * WRITE_ONCE(X, 1); smp_mb__after_spinlock(); smp_rmb(); * spin_unlock(S); r0 = READ_ONCE(X); r2 = READ_ONCE(X); * WRITE_ONCE(Y, 1); * spin_unlock(S); * * it is forbidden that CPU0's critical section executes before CPU1's * critical section (r0 = 1), CPU2 observes CPU1's store to Y (r1 = 1) * and CPU2 does not observe CPU0's store to X (r2 = 0); see the comments * preceding the calls to smp_rmb() in try_to_wake_up() for similar * snippets but "projected" onto two CPUs. * * Property (2) upgrades the lock to an RCsc lock. * * Since most load-store architectures implement ACQUIRE with an smp_mb() after * the LL/SC loop, they need no further barriers. Similarly all our TSO * architectures imply an smp_mb() for each atomic instruction and equally don't * need more. * * Architectures that can implement ACQUIRE better need to take care. */ #ifndef smp_mb__after_spinlock #define smp_mb__after_spinlock() kcsan_mb() #endif #ifdef CONFIG_DEBUG_SPINLOCK extern void do_raw_spin_lock(raw_spinlock_t *lock) __acquires(lock); extern int do_raw_spin_trylock(raw_spinlock_t *lock); extern void do_raw_spin_unlock(raw_spinlock_t *lock) __releases(lock); #else static inline void do_raw_spin_lock(raw_spinlock_t *lock) __acquires(lock) { __acquire(lock); arch_spin_lock(&lock->raw_lock); mmiowb_spin_lock(); } static inline int do_raw_spin_trylock(raw_spinlock_t *lock) { int ret = arch_spin_trylock(&(lock)->raw_lock); if (ret) mmiowb_spin_lock(); return ret; } static inline void do_raw_spin_unlock(raw_spinlock_t *lock) __releases(lock) { mmiowb_spin_unlock(); arch_spin_unlock(&lock->raw_lock); __release(lock); } #endif /* * Define the various spin_lock methods. Note we define these * regardless of whether CONFIG_SMP or CONFIG_PREEMPTION are set. The * various methods are defined as nops in the case they are not * required. */ #define raw_spin_trylock(lock) __cond_lock(lock, _raw_spin_trylock(lock)) #define raw_spin_lock(lock) _raw_spin_lock(lock) #ifdef CONFIG_DEBUG_LOCK_ALLOC # define raw_spin_lock_nested(lock, subclass) \ _raw_spin_lock_nested(lock, subclass) # define raw_spin_lock_nest_lock(lock, nest_lock) \ do { \ typecheck(struct lockdep_map *, &(nest_lock)->dep_map);\ _raw_spin_lock_nest_lock(lock, &(nest_lock)->dep_map); \ } while (0) #else /* * Always evaluate the 'subclass' argument to avoid that the compiler * warns about set-but-not-used variables when building with * CONFIG_DEBUG_LOCK_ALLOC=n and with W=1. */ # define raw_spin_lock_nested(lock, subclass) \ _raw_spin_lock(((void)(subclass), (lock))) # define raw_spin_lock_nest_lock(lock, nest_lock) _raw_spin_lock(lock) #endif #if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) #define raw_spin_lock_irqsave(lock, flags) \ do { \ typecheck(unsigned long, flags); \ flags = _raw_spin_lock_irqsave(lock); \ } while (0) #ifdef CONFIG_DEBUG_LOCK_ALLOC #define raw_spin_lock_irqsave_nested(lock, flags, subclass) \ do { \ typecheck(unsigned long, flags); \ flags = _raw_spin_lock_irqsave_nested(lock, subclass); \ } while (0) #else #define raw_spin_lock_irqsave_nested(lock, flags, subclass) \ do { \ typecheck(unsigned long, flags); \ flags = _raw_spin_lock_irqsave(lock); \ } while (0) #endif #else #define raw_spin_lock_irqsave(lock, flags) \ do { \ typecheck(unsigned long, flags); \ _raw_spin_lock_irqsave(lock, flags); \ } while (0) #define raw_spin_lock_irqsave_nested(lock, flags, subclass) \ raw_spin_lock_irqsave(lock, flags) #endif #define raw_spin_lock_irq(lock) _raw_spin_lock_irq(lock) #define raw_spin_lock_bh(lock) _raw_spin_lock_bh(lock) #define raw_spin_unlock(lock) _raw_spin_unlock(lock) #define raw_spin_unlock_irq(lock) _raw_spin_unlock_irq(lock) #define raw_spin_unlock_irqrestore(lock, flags) \ do { \ typecheck(unsigned long, flags); \ _raw_spin_unlock_irqrestore(lock, flags); \ } while (0) #define raw_spin_unlock_bh(lock) _raw_spin_unlock_bh(lock) #define raw_spin_trylock_bh(lock) \ __cond_lock(lock, _raw_spin_trylock_bh(lock)) #define raw_spin_trylock_irq(lock) \ ({ \ local_irq_disable(); \ raw_spin_trylock(lock) ? \ 1 : ({ local_irq_enable(); 0; }); \ }) #define raw_spin_trylock_irqsave(lock, flags) \ ({ \ local_irq_save(flags); \ raw_spin_trylock(lock) ? \ 1 : ({ local_irq_restore(flags); 0; }); \ }) #ifndef CONFIG_PREEMPT_RT /* Include rwlock functions for !RT */ #include <linux/rwlock.h> #endif /* * Pull the _spin_*()/_read_*()/_write_*() functions/declarations: */ #if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) # include <linux/spinlock_api_smp.h> #else # include <linux/spinlock_api_up.h> #endif /* Non PREEMPT_RT kernel, map to raw spinlocks: */ #ifndef CONFIG_PREEMPT_RT /* * Map the spin_lock functions to the raw variants for PREEMPT_RT=n */ static __always_inline raw_spinlock_t *spinlock_check(spinlock_t *lock) { return &lock->rlock; } #ifdef CONFIG_DEBUG_SPINLOCK # define spin_lock_init(lock) \ do { \ static struct lock_class_key __key; \ \ __raw_spin_lock_init(spinlock_check(lock), \ #lock, &__key, LD_WAIT_CONFIG); \ } while (0) #else # define spin_lock_init(_lock) \ do { \ spinlock_check(_lock); \ *(_lock) = __SPIN_LOCK_UNLOCKED(_lock); \ } while (0) #endif static __always_inline void spin_lock(spinlock_t *lock) { raw_spin_lock(&lock->rlock); } static __always_inline void spin_lock_bh(spinlock_t *lock) { raw_spin_lock_bh(&lock->rlock); } static __always_inline int spin_trylock(spinlock_t *lock) { return raw_spin_trylock(&lock->rlock); } #define spin_lock_nested(lock, subclass) \ do { \ raw_spin_lock_nested(spinlock_check(lock), subclass); \ } while (0) #define spin_lock_nest_lock(lock, nest_lock) \ do { \ raw_spin_lock_nest_lock(spinlock_check(lock), nest_lock); \ } while (0) static __always_inline void spin_lock_irq(spinlock_t *lock) { raw_spin_lock_irq(&lock->rlock); } #define spin_lock_irqsave(lock, flags) \ do { \ raw_spin_lock_irqsave(spinlock_check(lock), flags); \ } while (0) #define spin_lock_irqsave_nested(lock, flags, subclass) \ do { \ raw_spin_lock_irqsave_nested(spinlock_check(lock), flags, subclass); \ } while (0) static __always_inline void spin_unlock(spinlock_t *lock) { raw_spin_unlock(&lock->rlock); } static __always_inline void spin_unlock_bh(spinlock_t *lock) { raw_spin_unlock_bh(&lock->rlock); } static __always_inline void spin_unlock_irq(spinlock_t *lock) { raw_spin_unlock_irq(&lock->rlock); } static __always_inline void spin_unlock_irqrestore(spinlock_t *lock, unsigned long flags) { raw_spin_unlock_irqrestore(&lock->rlock, flags); } static __always_inline int spin_trylock_bh(spinlock_t *lock) { return raw_spin_trylock_bh(&lock->rlock); } static __always_inline int spin_trylock_irq(spinlock_t *lock) { return raw_spin_trylock_irq(&lock->rlock); } #define spin_trylock_irqsave(lock, flags) \ ({ \ raw_spin_trylock_irqsave(spinlock_check(lock), flags); \ }) /** * spin_is_locked() - Check whether a spinlock is locked. * @lock: Pointer to the spinlock. * * This function is NOT required to provide any memory ordering * guarantees; it could be used for debugging purposes or, when * additional synchronization is needed, accompanied with other * constructs (memory barriers) enforcing the synchronization. * * Returns: 1 if @lock is locked, 0 otherwise. * * Note that the function only tells you that the spinlock is * seen to be locked, not that it is locked on your CPU. * * Further, on CONFIG_SMP=n builds with CONFIG_DEBUG_SPINLOCK=n, * the return value is always 0 (see include/linux/spinlock_up.h). * Therefore you should not rely heavily on the return value. */ static __always_inline int spin_is_locked(spinlock_t *lock) { return raw_spin_is_locked(&lock->rlock); } static __always_inline int spin_is_contended(spinlock_t *lock) { return raw_spin_is_contended(&lock->rlock); } #define assert_spin_locked(lock) assert_raw_spin_locked(&(lock)->rlock) #else /* !CONFIG_PREEMPT_RT */ # include <linux/spinlock_rt.h> #endif /* CONFIG_PREEMPT_RT */ /* * Does a critical section need to be broken due to another * task waiting?: (technically does not depend on CONFIG_PREEMPTION, * but a general need for low latency) */ static inline int spin_needbreak(spinlock_t *lock) { if (!preempt_model_preemptible()) return 0; return spin_is_contended(lock); } /* * Check if a rwlock is contended. * Returns non-zero if there is another task waiting on the rwlock. * Returns zero if the lock is not contended or the system / underlying * rwlock implementation does not support contention detection. * Technically does not depend on CONFIG_PREEMPTION, but a general need * for low latency. */ static inline int rwlock_needbreak(rwlock_t *lock) { if (!preempt_model_preemptible()) return 0; return rwlock_is_contended(lock); } /* * Pull the atomic_t declaration: * (asm-mips/atomic.h needs above definitions) */ #include <linux/atomic.h> /** * atomic_dec_and_lock - lock on reaching reference count zero * @atomic: the atomic counter * @lock: the spinlock in question * * Decrements @atomic by 1. If the result is 0, returns true and locks * @lock. Returns false for all other cases. */ extern int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock); #define atomic_dec_and_lock(atomic, lock) \ __cond_lock(lock, _atomic_dec_and_lock(atomic, lock)) extern int _atomic_dec_and_lock_irqsave(atomic_t *atomic, spinlock_t *lock, unsigned long *flags); #define atomic_dec_and_lock_irqsave(atomic, lock, flags) \ __cond_lock(lock, _atomic_dec_and_lock_irqsave(atomic, lock, &(flags))) extern int _atomic_dec_and_raw_lock(atomic_t *atomic, raw_spinlock_t *lock); #define atomic_dec_and_raw_lock(atomic, lock) \ __cond_lock(lock, _atomic_dec_and_raw_lock(atomic, lock)) extern int _atomic_dec_and_raw_lock_irqsave(atomic_t *atomic, raw_spinlock_t *lock, unsigned long *flags); #define atomic_dec_and_raw_lock_irqsave(atomic, lock, flags) \ __cond_lock(lock, _atomic_dec_and_raw_lock_irqsave(atomic, lock, &(flags))) int __alloc_bucket_spinlocks(spinlock_t **locks, unsigned int *lock_mask, size_t max_size, unsigned int cpu_mult, gfp_t gfp, const char *name, struct lock_class_key *key); #define alloc_bucket_spinlocks(locks, lock_mask, max_size, cpu_mult, gfp) \ ({ \ static struct lock_class_key key; \ int ret; \ \ ret = __alloc_bucket_spinlocks(locks, lock_mask, max_size, \ cpu_mult, gfp, #locks, &key); \ ret; \ }) void free_bucket_spinlocks(spinlock_t *locks); DEFINE_LOCK_GUARD_1(raw_spinlock, raw_spinlock_t, raw_spin_lock(_T->lock), raw_spin_unlock(_T->lock)) DEFINE_LOCK_GUARD_1_COND(raw_spinlock, _try, raw_spin_trylock(_T->lock)) DEFINE_LOCK_GUARD_1(raw_spinlock_nested, raw_spinlock_t, raw_spin_lock_nested(_T->lock, SINGLE_DEPTH_NESTING), raw_spin_unlock(_T->lock)) DEFINE_LOCK_GUARD_1(raw_spinlock_irq, raw_spinlock_t, raw_spin_lock_irq(_T->lock), raw_spin_unlock_irq(_T->lock)) DEFINE_LOCK_GUARD_1_COND(raw_spinlock_irq, _try, raw_spin_trylock_irq(_T->lock)) DEFINE_LOCK_GUARD_1(raw_spinlock_bh, raw_spinlock_t, raw_spin_lock_bh(_T->lock), raw_spin_unlock_bh(_T->lock)) DEFINE_LOCK_GUARD_1_COND(raw_spinlock_bh, _try, raw_spin_trylock_bh(_T->lock)) DEFINE_LOCK_GUARD_1(raw_spinlock_irqsave, raw_spinlock_t, raw_spin_lock_irqsave(_T->lock, _T->flags), raw_spin_unlock_irqrestore(_T->lock, _T->flags), unsigned long flags) DEFINE_LOCK_GUARD_1_COND(raw_spinlock_irqsave, _try, raw_spin_trylock_irqsave(_T->lock, _T->flags)) DEFINE_LOCK_GUARD_1(spinlock, spinlock_t, spin_lock(_T->lock), spin_unlock(_T->lock)) DEFINE_LOCK_GUARD_1_COND(spinlock, _try, spin_trylock(_T->lock)) DEFINE_LOCK_GUARD_1(spinlock_irq, spinlock_t, spin_lock_irq(_T->lock), spin_unlock_irq(_T->lock)) DEFINE_LOCK_GUARD_1_COND(spinlock_irq, _try, spin_trylock_irq(_T->lock)) DEFINE_LOCK_GUARD_1(spinlock_bh, spinlock_t, spin_lock_bh(_T->lock), spin_unlock_bh(_T->lock)) DEFINE_LOCK_GUARD_1_COND(spinlock_bh, _try, spin_trylock_bh(_T->lock)) DEFINE_LOCK_GUARD_1(spinlock_irqsave, spinlock_t, spin_lock_irqsave(_T->lock, _T->flags), spin_unlock_irqrestore(_T->lock, _T->flags), unsigned long flags) DEFINE_LOCK_GUARD_1_COND(spinlock_irqsave, _try, spin_trylock_irqsave(_T->lock, _T->flags)) DEFINE_LOCK_GUARD_1(read_lock, rwlock_t, read_lock(_T->lock), read_unlock(_T->lock)) DEFINE_LOCK_GUARD_1(read_lock_irq, rwlock_t, read_lock_irq(_T->lock), read_unlock_irq(_T->lock)) DEFINE_LOCK_GUARD_1(read_lock_irqsave, rwlock_t, read_lock_irqsave(_T->lock, _T->flags), read_unlock_irqrestore(_T->lock, _T->flags), unsigned long flags) DEFINE_LOCK_GUARD_1(write_lock, rwlock_t, write_lock(_T->lock), write_unlock(_T->lock)) DEFINE_LOCK_GUARD_1(write_lock_irq, rwlock_t, write_lock_irq(_T->lock), write_unlock_irq(_T->lock)) DEFINE_LOCK_GUARD_1(write_lock_irqsave, rwlock_t, write_lock_irqsave(_T->lock, _T->flags), write_unlock_irqrestore(_T->lock, _T->flags), unsigned long flags) #undef __LINUX_INSIDE_SPINLOCK_H #endif /* __LINUX_SPINLOCK_H */ |
| 10 4 5 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 | /* SPDX-License-Identifier: GPL-2.0 */ #ifndef __LINUX_MROUTE6_H #define __LINUX_MROUTE6_H #include <linux/pim.h> #include <linux/skbuff.h> /* for struct sk_buff_head */ #include <net/net_namespace.h> #include <uapi/linux/mroute6.h> #include <linux/mroute_base.h> #include <linux/sockptr.h> #include <net/fib_rules.h> #ifdef CONFIG_IPV6_MROUTE static inline int ip6_mroute_opt(int opt) { return (opt >= MRT6_BASE) && (opt <= MRT6_MAX); } #else static inline int ip6_mroute_opt(int opt) { return 0; } #endif struct sock; #ifdef CONFIG_IPV6_MROUTE extern int ip6_mroute_setsockopt(struct sock *, int, sockptr_t, unsigned int); extern int ip6_mroute_getsockopt(struct sock *, int, sockptr_t, sockptr_t); extern int ip6_mr_input(struct sk_buff *skb); extern int ip6mr_compat_ioctl(struct sock *sk, unsigned int cmd, void __user *arg); extern int ip6_mr_init(void); extern int ip6_mr_output(struct net *net, struct sock *sk, struct sk_buff *skb); extern void ip6_mr_cleanup(void); int ip6mr_ioctl(struct sock *sk, int cmd, void *arg); #else static inline int ip6_mroute_setsockopt(struct sock *sock, int optname, sockptr_t optval, unsigned int optlen) { return -ENOPROTOOPT; } static inline int ip6_mroute_getsockopt(struct sock *sock, int optname, sockptr_t optval, sockptr_t optlen) { return -ENOPROTOOPT; } static inline int ip6mr_ioctl(struct sock *sk, int cmd, void *arg) { return -ENOIOCTLCMD; } static inline int ip6_mr_init(void) { return 0; } static inline int ip6_mr_output(struct net *net, struct sock *sk, struct sk_buff *skb) { return ip6_output(net, sk, skb); } static inline void ip6_mr_cleanup(void) { return; } #endif #ifdef CONFIG_IPV6_MROUTE_MULTIPLE_TABLES bool ip6mr_rule_default(const struct fib_rule *rule); #else static inline bool ip6mr_rule_default(const struct fib_rule *rule) { return true; } #endif #define VIFF_STATIC 0x8000 struct mfc6_cache_cmp_arg { struct in6_addr mf6c_mcastgrp; struct in6_addr mf6c_origin; }; struct mfc6_cache { struct mr_mfc _c; union { struct { struct in6_addr mf6c_mcastgrp; struct in6_addr mf6c_origin; }; struct mfc6_cache_cmp_arg cmparg; }; }; #define MFC_ASSERT_THRESH (3*HZ) /* Maximal freq. of asserts */ struct rtmsg; extern int ip6mr_get_route(struct net *net, struct sk_buff *skb, struct rtmsg *rtm, u32 portid); #ifdef CONFIG_IPV6_MROUTE bool mroute6_is_socket(struct net *net, struct sk_buff *skb); extern int ip6mr_sk_done(struct sock *sk); static inline int ip6mr_sk_ioctl(struct sock *sk, unsigned int cmd, void __user *arg) { switch (cmd) { /* These userspace buffers will be consumed by ip6mr_ioctl() */ case SIOCGETMIFCNT_IN6: { struct sioc_mif_req6 buffer; return sock_ioctl_inout(sk, cmd, arg, &buffer, sizeof(buffer)); } case SIOCGETSGCNT_IN6: { struct sioc_sg_req6 buffer; return sock_ioctl_inout(sk, cmd, arg, &buffer, sizeof(buffer)); } } return 1; } #else static inline bool mroute6_is_socket(struct net *net, struct sk_buff *skb) { return false; } static inline int ip6mr_sk_done(struct sock *sk) { return 0; } static inline int ip6mr_sk_ioctl(struct sock *sk, unsigned int cmd, void __user *arg) { return 1; } #endif #endif |
| 1006 813 809 284 624 812 626 284 283 804 974 977 60 922 974 66 67 67 67 354 351 157 266 354 267 157 157 350 349 3 3 1242 1235 24 1235 1238 24 1237 1243 1243 1243 1243 9 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 | // SPDX-License-Identifier: GPL-2.0-only /* * jump label support * * Copyright (C) 2009 Jason Baron <jbaron@redhat.com> * Copyright (C) 2011 Peter Zijlstra * */ #include <linux/memory.h> #include <linux/uaccess.h> #include <linux/module.h> #include <linux/list.h> #include <linux/slab.h> #include <linux/sort.h> #include <linux/err.h> #include <linux/static_key.h> #include <linux/jump_label_ratelimit.h> #include <linux/bug.h> #include <linux/cpu.h> #include <asm/sections.h> /* mutex to protect coming/going of the jump_label table */ static DEFINE_MUTEX(jump_label_mutex); void jump_label_lock(void) { mutex_lock(&jump_label_mutex); } void jump_label_unlock(void) { mutex_unlock(&jump_label_mutex); } static int jump_label_cmp(const void *a, const void *b) { const struct jump_entry *jea = a; const struct jump_entry *jeb = b; /* * Entrires are sorted by key. */ if (jump_entry_key(jea) < jump_entry_key(jeb)) return -1; if (jump_entry_key(jea) > jump_entry_key(jeb)) return 1; /* * In the batching mode, entries should also be sorted by the code * inside the already sorted list of entries, enabling a bsearch in * the vector. */ if (jump_entry_code(jea) < jump_entry_code(jeb)) return -1; if (jump_entry_code(jea) > jump_entry_code(jeb)) return 1; return 0; } static void jump_label_swap(void *a, void *b, int size) { long delta = (unsigned long)a - (unsigned long)b; struct jump_entry *jea = a; struct jump_entry *jeb = b; struct jump_entry tmp = *jea; jea->code = jeb->code - delta; jea->target = jeb->target - delta; jea->key = jeb->key - delta; jeb->code = tmp.code + delta; jeb->target = tmp.target + delta; jeb->key = tmp.key + delta; } static void jump_label_sort_entries(struct jump_entry *start, struct jump_entry *stop) { unsigned long size; void *swapfn = NULL; if (IS_ENABLED(CONFIG_HAVE_ARCH_JUMP_LABEL_RELATIVE)) swapfn = jump_label_swap; size = (((unsigned long)stop - (unsigned long)start) / sizeof(struct jump_entry)); sort(start, size, sizeof(struct jump_entry), jump_label_cmp, swapfn); } static void jump_label_update(struct static_key *key); /* * There are similar definitions for the !CONFIG_JUMP_LABEL case in jump_label.h. * The use of 'atomic_read()' requires atomic.h and its problematic for some * kernel headers such as kernel.h and others. Since static_key_count() is not * used in the branch statements as it is for the !CONFIG_JUMP_LABEL case its ok * to have it be a function here. Similarly, for 'static_key_enable()' and * 'static_key_disable()', which require bug.h. This should allow jump_label.h * to be included from most/all places for CONFIG_JUMP_LABEL. */ int static_key_count(struct static_key *key) { /* * -1 means the first static_key_slow_inc() is in progress. * static_key_enabled() must return true, so return 1 here. */ int n = atomic_read(&key->enabled); return n >= 0 ? n : 1; } EXPORT_SYMBOL_GPL(static_key_count); /* * static_key_fast_inc_not_disabled - adds a user for a static key * @key: static key that must be already enabled * * The caller must make sure that the static key can't get disabled while * in this function. It doesn't patch jump labels, only adds a user to * an already enabled static key. * * Returns true if the increment was done. Unlike refcount_t the ref counter * is not saturated, but will fail to increment on overflow. */ bool static_key_fast_inc_not_disabled(struct static_key *key) { int v; STATIC_KEY_CHECK_USE(key); /* * Negative key->enabled has a special meaning: it sends * static_key_slow_inc/dec() down the slow path, and it is non-zero * so it counts as "enabled" in jump_label_update(). * * The INT_MAX overflow condition is either used by the networking * code to reset or detected in the slow path of * static_key_slow_inc_cpuslocked(). */ v = atomic_read(&key->enabled); do { if (v <= 0 || v == INT_MAX) return false; } while (!likely(atomic_try_cmpxchg(&key->enabled, &v, v + 1))); return true; } EXPORT_SYMBOL_GPL(static_key_fast_inc_not_disabled); bool static_key_slow_inc_cpuslocked(struct static_key *key) { lockdep_assert_cpus_held(); /* * Careful if we get concurrent static_key_slow_inc/dec() calls; * later calls must wait for the first one to _finish_ the * jump_label_update() process. At the same time, however, * the jump_label_update() call below wants to see * static_key_enabled(&key) for jumps to be updated properly. */ if (static_key_fast_inc_not_disabled(key)) return true; guard(mutex)(&jump_label_mutex); /* Try to mark it as 'enabling in progress. */ if (!atomic_cmpxchg(&key->enabled, 0, -1)) { jump_label_update(key); /* * Ensure that when static_key_fast_inc_not_disabled() or * static_key_dec_not_one() observe the positive value, * they must also observe all the text changes. */ atomic_set_release(&key->enabled, 1); } else { /* * While holding the mutex this should never observe * anything else than a value >= 1 and succeed */ if (WARN_ON_ONCE(!static_key_fast_inc_not_disabled(key))) return false; } return true; } bool static_key_slow_inc(struct static_key *key) { bool ret; cpus_read_lock(); ret = static_key_slow_inc_cpuslocked(key); cpus_read_unlock(); return ret; } EXPORT_SYMBOL_GPL(static_key_slow_inc); void static_key_enable_cpuslocked(struct static_key *key) { STATIC_KEY_CHECK_USE(key); lockdep_assert_cpus_held(); if (atomic_read(&key->enabled) > 0) { WARN_ON_ONCE(atomic_read(&key->enabled) != 1); return; } jump_label_lock(); if (atomic_read(&key->enabled) == 0) { atomic_set(&key->enabled, -1); jump_label_update(key); /* * See static_key_slow_inc(). */ atomic_set_release(&key->enabled, 1); } jump_label_unlock(); } EXPORT_SYMBOL_GPL(static_key_enable_cpuslocked); void static_key_enable(struct static_key *key) { cpus_read_lock(); static_key_enable_cpuslocked(key); cpus_read_unlock(); } EXPORT_SYMBOL_GPL(static_key_enable); void static_key_disable_cpuslocked(struct static_key *key) { STATIC_KEY_CHECK_USE(key); lockdep_assert_cpus_held(); if (atomic_read(&key->enabled) != 1) { WARN_ON_ONCE(atomic_read(&key->enabled) != 0); return; } jump_label_lock(); if (atomic_cmpxchg(&key->enabled, 1, 0) == 1) jump_label_update(key); jump_label_unlock(); } EXPORT_SYMBOL_GPL(static_key_disable_cpuslocked); void static_key_disable(struct static_key *key) { cpus_read_lock(); static_key_disable_cpuslocked(key); cpus_read_unlock(); } EXPORT_SYMBOL_GPL(static_key_disable); static bool static_key_dec_not_one(struct static_key *key) { int v; /* * Go into the slow path if key::enabled is less than or equal than * one. One is valid to shut down the key, anything less than one * is an imbalance, which is handled at the call site. * * That includes the special case of '-1' which is set in * static_key_slow_inc_cpuslocked(), but that's harmless as it is * fully serialized in the slow path below. By the time this task * acquires the jump label lock the value is back to one and the * retry under the lock must succeed. */ v = atomic_read(&key->enabled); do { /* * Warn about the '-1' case though; since that means a * decrement is concurrent with a first (0->1) increment. IOW * people are trying to disable something that wasn't yet fully * enabled. This suggests an ordering problem on the user side. */ WARN_ON_ONCE(v < 0); /* * Warn about underflow, and lie about success in an attempt to * not make things worse. */ if (WARN_ON_ONCE(v == 0)) return true; if (v <= 1) return false; } while (!likely(atomic_try_cmpxchg(&key->enabled, &v, v - 1))); return true; } static void __static_key_slow_dec_cpuslocked(struct static_key *key) { lockdep_assert_cpus_held(); int val; if (static_key_dec_not_one(key)) return; guard(mutex)(&jump_label_mutex); val = atomic_read(&key->enabled); /* * It should be impossible to observe -1 with jump_label_mutex held, * see static_key_slow_inc_cpuslocked(). */ if (WARN_ON_ONCE(val == -1)) return; /* * Cannot already be 0, something went sideways. */ if (WARN_ON_ONCE(val == 0)) return; if (atomic_dec_and_test(&key->enabled)) jump_label_update(key); } static void __static_key_slow_dec(struct static_key *key) { cpus_read_lock(); __static_key_slow_dec_cpuslocked(key); cpus_read_unlock(); } void jump_label_update_timeout(struct work_struct *work) { struct static_key_deferred *key = container_of(work, struct static_key_deferred, work.work); __static_key_slow_dec(&key->key); } EXPORT_SYMBOL_GPL(jump_label_update_timeout); void static_key_slow_dec(struct static_key *key) { STATIC_KEY_CHECK_USE(key); __static_key_slow_dec(key); } EXPORT_SYMBOL_GPL(static_key_slow_dec); void static_key_slow_dec_cpuslocked(struct static_key *key) { STATIC_KEY_CHECK_USE(key); __static_key_slow_dec_cpuslocked(key); } void __static_key_slow_dec_deferred(struct static_key *key, struct delayed_work *work, unsigned long timeout) { STATIC_KEY_CHECK_USE(key); if (static_key_dec_not_one(key)) return; schedule_delayed_work(work, timeout); } EXPORT_SYMBOL_GPL(__static_key_slow_dec_deferred); void __static_key_deferred_flush(void *key, struct delayed_work *work) { STATIC_KEY_CHECK_USE(key); flush_delayed_work(work); } EXPORT_SYMBOL_GPL(__static_key_deferred_flush); void jump_label_rate_limit(struct static_key_deferred *key, unsigned long rl) { STATIC_KEY_CHECK_USE(key); key->timeout = rl; INIT_DELAYED_WORK(&key->work, jump_label_update_timeout); } EXPORT_SYMBOL_GPL(jump_label_rate_limit); static int addr_conflict(struct jump_entry *entry, void *start, void *end) { if (jump_entry_code(entry) <= (unsigned long)end && jump_entry_code(entry) + jump_entry_size(entry) > (unsigned long)start) return 1; return 0; } static int __jump_label_text_reserved(struct jump_entry *iter_start, struct jump_entry *iter_stop, void *start, void *end, bool init) { struct jump_entry *iter; iter = iter_start; while (iter < iter_stop) { if (init || !jump_entry_is_init(iter)) { if (addr_conflict(iter, start, end)) return 1; } iter++; } return 0; } #ifndef arch_jump_label_transform_static static void arch_jump_label_transform_static(struct jump_entry *entry, enum jump_label_type type) { /* nothing to do on most architectures */ } #endif static inline struct jump_entry *static_key_entries(struct static_key *key) { WARN_ON_ONCE(key->type & JUMP_TYPE_LINKED); return (struct jump_entry *)(key->type & ~JUMP_TYPE_MASK); } static inline bool static_key_type(struct static_key *key) { return key->type & JUMP_TYPE_TRUE; } static inline bool static_key_linked(struct static_key *key) { return key->type & JUMP_TYPE_LINKED; } static inline void static_key_clear_linked(struct static_key *key) { key->type &= ~JUMP_TYPE_LINKED; } static inline void static_key_set_linked(struct static_key *key) { key->type |= JUMP_TYPE_LINKED; } /*** * A 'struct static_key' uses a union such that it either points directly * to a table of 'struct jump_entry' or to a linked list of modules which in * turn point to 'struct jump_entry' tables. * * The two lower bits of the pointer are used to keep track of which pointer * type is in use and to store the initial branch direction, we use an access * function which preserves these bits. */ static void static_key_set_entries(struct static_key *key, struct jump_entry *entries) { unsigned long type; WARN_ON_ONCE((unsigned long)entries & JUMP_TYPE_MASK); type = key->type & JUMP_TYPE_MASK; key->entries = entries; key->type |= type; } static enum jump_label_type jump_label_type(struct jump_entry *entry) { struct static_key *key = jump_entry_key(entry); bool enabled = static_key_enabled(key); bool branch = jump_entry_is_branch(entry); /* See the comment in linux/jump_label.h */ return enabled ^ branch; } static bool jump_label_can_update(struct jump_entry *entry, bool init) { /* * Cannot update code that was in an init text area. */ if (!init && jump_entry_is_init(entry)) return false; if (!kernel_text_address(jump_entry_code(entry))) { /* * This skips patching built-in __exit, which * is part of init_section_contains() but is * not part of kernel_text_address(). * * Skipping built-in __exit is fine since it * will never be executed. */ WARN_ONCE(!jump_entry_is_init(entry), "can't patch jump_label at %pS", (void *)jump_entry_code(entry)); return false; } return true; } #ifndef HAVE_JUMP_LABEL_BATCH static void __jump_label_update(struct static_key *key, struct jump_entry *entry, struct jump_entry *stop, bool init) { for (; (entry < stop) && (jump_entry_key(entry) == key); entry++) { if (jump_label_can_update(entry, init)) arch_jump_label_transform(entry, jump_label_type(entry)); } } #else static void __jump_label_update(struct static_key *key, struct jump_entry *entry, struct jump_entry *stop, bool init) { for (; (entry < stop) && (jump_entry_key(entry) == key); entry++) { if (!jump_label_can_update(entry, init)) continue; if (!arch_jump_label_transform_queue(entry, jump_label_type(entry))) { /* * Queue is full: Apply the current queue and try again. */ arch_jump_label_transform_apply(); BUG_ON(!arch_jump_label_transform_queue(entry, jump_label_type(entry))); } } arch_jump_label_transform_apply(); } #endif void __init jump_label_init(void) { struct jump_entry *iter_start = __start___jump_table; struct jump_entry *iter_stop = __stop___jump_table; struct static_key *key = NULL; struct jump_entry *iter; /* * Since we are initializing the static_key.enabled field with * with the 'raw' int values (to avoid pulling in atomic.h) in * jump_label.h, let's make sure that is safe. There are only two * cases to check since we initialize to 0 or 1. */ BUILD_BUG_ON((int)ATOMIC_INIT(0) != 0); BUILD_BUG_ON((int)ATOMIC_INIT(1) != 1); if (static_key_initialized) return; cpus_read_lock(); jump_label_lock(); jump_label_sort_entries(iter_start, iter_stop); for (iter = iter_start; iter < iter_stop; iter++) { struct static_key *iterk; bool in_init; /* rewrite NOPs */ if (jump_label_type(iter) == JUMP_LABEL_NOP) arch_jump_label_transform_static(iter, JUMP_LABEL_NOP); in_init = init_section_contains((void *)jump_entry_code(iter), 1); jump_entry_set_init(iter, in_init); iterk = jump_entry_key(iter); if (iterk == key) continue; key = iterk; static_key_set_entries(key, iter); } static_key_initialized = true; jump_label_unlock(); cpus_read_unlock(); } static inline bool static_key_sealed(struct static_key *key) { return (key->type & JUMP_TYPE_LINKED) && !(key->type & ~JUMP_TYPE_MASK); } static inline void static_key_seal(struct static_key *key) { unsigned long type = key->type & JUMP_TYPE_TRUE; key->type = JUMP_TYPE_LINKED | type; } void jump_label_init_ro(void) { struct jump_entry *iter_start = __start___jump_table; struct jump_entry *iter_stop = __stop___jump_table; struct jump_entry *iter; if (WARN_ON_ONCE(!static_key_initialized)) return; cpus_read_lock(); jump_label_lock(); for (iter = iter_start; iter < iter_stop; iter++) { struct static_key *iterk = jump_entry_key(iter); if (!is_kernel_ro_after_init((unsigned long)iterk)) continue; if (static_key_sealed(iterk)) continue; static_key_seal(iterk); } jump_label_unlock(); cpus_read_unlock(); } #ifdef CONFIG_MODULES enum jump_label_type jump_label_init_type(struct jump_entry *entry) { struct static_key *key = jump_entry_key(entry); bool type = static_key_type(key); bool branch = jump_entry_is_branch(entry); /* See the comment in linux/jump_label.h */ return type ^ branch; } struct static_key_mod { struct static_key_mod *next; struct jump_entry *entries; struct module *mod; }; static inline struct static_key_mod *static_key_mod(struct static_key *key) { WARN_ON_ONCE(!static_key_linked(key)); return (struct static_key_mod *)(key->type & ~JUMP_TYPE_MASK); } /*** * key->type and key->next are the same via union. * This sets key->next and preserves the type bits. * * See additional comments above static_key_set_entries(). */ static void static_key_set_mod(struct static_key *key, struct static_key_mod *mod) { unsigned long type; WARN_ON_ONCE((unsigned long)mod & JUMP_TYPE_MASK); type = key->type & JUMP_TYPE_MASK; key->next = mod; key->type |= type; } static int __jump_label_mod_text_reserved(void *start, void *end) { struct module *mod; int ret; scoped_guard(rcu) { mod = __module_text_address((unsigned long)start); WARN_ON_ONCE(__module_text_address((unsigned long)end) != mod); if (!try_module_get(mod)) mod = NULL; } if (!mod) return 0; ret = __jump_label_text_reserved(mod->jump_entries, mod->jump_entries + mod->num_jump_entries, start, end, mod->state == MODULE_STATE_COMING); module_put(mod); return ret; } static void __jump_label_mod_update(struct static_key *key) { struct static_key_mod *mod; for (mod = static_key_mod(key); mod; mod = mod->next) { struct jump_entry *stop; struct module *m; /* * NULL if the static_key is defined in a module * that does not use it */ if (!mod->entries) continue; m = mod->mod; if (!m) stop = __stop___jump_table; else stop = m->jump_entries + m->num_jump_entries; __jump_label_update(key, mod->entries, stop, m && m->state == MODULE_STATE_COMING); } } static int jump_label_add_module(struct module *mod) { struct jump_entry *iter_start = mod->jump_entries; struct jump_entry *iter_stop = iter_start + mod->num_jump_entries; struct jump_entry *iter; struct static_key *key = NULL; struct static_key_mod *jlm, *jlm2; /* if the module doesn't have jump label entries, just return */ if (iter_start == iter_stop) return 0; jump_label_sort_entries(iter_start, iter_stop); for (iter = iter_start; iter < iter_stop; iter++) { struct static_key *iterk; bool in_init; in_init = within_module_init(jump_entry_code(iter), mod); jump_entry_set_init(iter, in_init); iterk = jump_entry_key(iter); if (iterk == key) continue; key = iterk; if (within_module((unsigned long)key, mod)) { static_key_set_entries(key, iter); continue; } /* * If the key was sealed at init, then there's no need to keep a * reference to its module entries - just patch them now and be * done with it. */ if (static_key_sealed(key)) goto do_poke; jlm = kzalloc(sizeof(struct static_key_mod), GFP_KERNEL); if (!jlm) return -ENOMEM; if (!static_key_linked(key)) { jlm2 = kzalloc(sizeof(struct static_key_mod), GFP_KERNEL); if (!jlm2) { kfree(jlm); return -ENOMEM; } scoped_guard(rcu) jlm2->mod = __module_address((unsigned long)key); jlm2->entries = static_key_entries(key); jlm2->next = NULL; static_key_set_mod(key, jlm2); static_key_set_linked(key); } jlm->mod = mod; jlm->entries = iter; jlm->next = static_key_mod(key); static_key_set_mod(key, jlm); static_key_set_linked(key); /* Only update if we've changed from our initial state */ do_poke: if (jump_label_type(iter) != jump_label_init_type(iter)) __jump_label_update(key, iter, iter_stop, true); } return 0; } static void jump_label_del_module(struct module *mod) { struct jump_entry *iter_start = mod->jump_entries; struct jump_entry *iter_stop = iter_start + mod->num_jump_entries; struct jump_entry *iter; struct static_key *key = NULL; struct static_key_mod *jlm, **prev; for (iter = iter_start; iter < iter_stop; iter++) { if (jump_entry_key(iter) == key) continue; key = jump_entry_key(iter); if (within_module((unsigned long)key, mod)) continue; /* No @jlm allocated because key was sealed at init. */ if (static_key_sealed(key)) continue; /* No memory during module load */ if (WARN_ON(!static_key_linked(key))) continue; prev = &key->next; jlm = static_key_mod(key); while (jlm && jlm->mod != mod) { prev = &jlm->next; jlm = jlm->next; } /* No memory during module load */ if (WARN_ON(!jlm)) continue; if (prev == &key->next) static_key_set_mod(key, jlm->next); else *prev = jlm->next; kfree(jlm); jlm = static_key_mod(key); /* if only one etry is left, fold it back into the static_key */ if (jlm->next == NULL) { static_key_set_entries(key, jlm->entries); static_key_clear_linked(key); kfree(jlm); } } } static int jump_label_module_notify(struct notifier_block *self, unsigned long val, void *data) { struct module *mod = data; int ret = 0; cpus_read_lock(); jump_label_lock(); switch (val) { case MODULE_STATE_COMING: ret = jump_label_add_module(mod); if (ret) { WARN(1, "Failed to allocate memory: jump_label may not work properly.\n"); jump_label_del_module(mod); } break; case MODULE_STATE_GOING: jump_label_del_module(mod); break; } jump_label_unlock(); cpus_read_unlock(); return notifier_from_errno(ret); } static struct notifier_block jump_label_module_nb = { .notifier_call = jump_label_module_notify, .priority = 1, /* higher than tracepoints */ }; static __init int jump_label_init_module(void) { return register_module_notifier(&jump_label_module_nb); } early_initcall(jump_label_init_module); #endif /* CONFIG_MODULES */ /*** * jump_label_text_reserved - check if addr range is reserved * @start: start text addr * @end: end text addr * * checks if the text addr located between @start and @end * overlaps with any of the jump label patch addresses. Code * that wants to modify kernel text should first verify that * it does not overlap with any of the jump label addresses. * Caller must hold jump_label_mutex. * * returns 1 if there is an overlap, 0 otherwise */ int jump_label_text_reserved(void *start, void *end) { bool init = system_state < SYSTEM_RUNNING; int ret = __jump_label_text_reserved(__start___jump_table, __stop___jump_table, start, end, init); if (ret) return ret; #ifdef CONFIG_MODULES ret = __jump_label_mod_text_reserved(start, end); #endif return ret; } static void jump_label_update(struct static_key *key) { struct jump_entry *stop = __stop___jump_table; bool init = system_state < SYSTEM_RUNNING; struct jump_entry *entry; #ifdef CONFIG_MODULES struct module *mod; if (static_key_linked(key)) { __jump_label_mod_update(key); return; } scoped_guard(rcu) { mod = __module_address((unsigned long)key); if (mod) { stop = mod->jump_entries + mod->num_jump_entries; init = mod->state == MODULE_STATE_COMING; } } #endif entry = static_key_entries(key); /* if there are no users, entry can be NULL */ if (entry) __jump_label_update(key, entry, stop, init); } #ifdef CONFIG_STATIC_KEYS_SELFTEST static DEFINE_STATIC_KEY_TRUE(sk_true); static DEFINE_STATIC_KEY_FALSE(sk_false); static __init int jump_label_test(void) { int i; for (i = 0; i < 2; i++) { WARN_ON(static_key_enabled(&sk_true.key) != true); WARN_ON(static_key_enabled(&sk_false.key) != false); WARN_ON(!static_branch_likely(&sk_true)); WARN_ON(!static_branch_unlikely(&sk_true)); WARN_ON(static_branch_likely(&sk_false)); WARN_ON(static_branch_unlikely(&sk_false)); static_branch_disable(&sk_true); static_branch_enable(&sk_false); WARN_ON(static_key_enabled(&sk_true.key) == true); WARN_ON(static_key_enabled(&sk_false.key) == false); WARN_ON(static_branch_likely(&sk_true)); WARN_ON(static_branch_unlikely(&sk_true)); WARN_ON(!static_branch_likely(&sk_false)); WARN_ON(!static_branch_unlikely(&sk_false)); static_branch_enable(&sk_true); static_branch_disable(&sk_false); } return 0; } early_initcall(jump_label_test); #endif /* STATIC_KEYS_SELFTEST */ |
| 33 33 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 | // SPDX-License-Identifier: GPL-2.0-or-later /* Copyright 2020 NXP */ #include <linux/module.h> #include <linux/types.h> #include <linux/kernel.h> #include <linux/string.h> #include <linux/errno.h> #include <linux/skbuff.h> #include <linux/rtnetlink.h> #include <linux/init.h> #include <linux/slab.h> #include <net/act_api.h> #include <net/netlink.h> #include <net/pkt_cls.h> #include <net/tc_act/tc_gate.h> #include <net/tc_wrapper.h> static struct tc_action_ops act_gate_ops; static ktime_t gate_get_time(struct tcf_gate *gact) { ktime_t mono = ktime_get(); switch (gact->tk_offset) { case TK_OFFS_MAX: return mono; default: return ktime_mono_to_any(mono, gact->tk_offset); } return KTIME_MAX; } static void gate_get_start_time(struct tcf_gate *gact, ktime_t *start) { struct tcf_gate_params *param = &gact->param; ktime_t now, base, cycle; u64 n; base = ns_to_ktime(param->tcfg_basetime); now = gate_get_time(gact); if (ktime_after(base, now)) { *start = base; return; } cycle = param->tcfg_cycletime; n = div64_u64(ktime_sub_ns(now, base), cycle); *start = ktime_add_ns(base, (n + 1) * cycle); } static void gate_start_timer(struct tcf_gate *gact, ktime_t start) { ktime_t expires; expires = hrtimer_get_expires(&gact->hitimer); if (expires == 0) expires = KTIME_MAX; start = min_t(ktime_t, start, expires); hrtimer_start(&gact->hitimer, start, HRTIMER_MODE_ABS_SOFT); } static enum hrtimer_restart gate_timer_func(struct hrtimer *timer) { struct tcf_gate *gact = container_of(timer, struct tcf_gate, hitimer); struct tcf_gate_params *p = &gact->param; struct tcfg_gate_entry *next; ktime_t close_time, now; spin_lock(&gact->tcf_lock); next = gact->next_entry; /* cycle start, clear pending bit, clear total octets */ gact->current_gate_status = next->gate_state ? GATE_ACT_GATE_OPEN : 0; gact->current_entry_octets = 0; gact->current_max_octets = next->maxoctets; gact->current_close_time = ktime_add_ns(gact->current_close_time, next->interval); close_time = gact->current_close_time; if (list_is_last(&next->list, &p->entries)) next = list_first_entry(&p->entries, struct tcfg_gate_entry, list); else next = list_next_entry(next, list); now = gate_get_time(gact); if (ktime_after(now, close_time)) { ktime_t cycle, base; u64 n; cycle = p->tcfg_cycletime; base = ns_to_ktime(p->tcfg_basetime); n = div64_u64(ktime_sub_ns(now, base), cycle); close_time = ktime_add_ns(base, (n + 1) * cycle); } gact->next_entry = next; hrtimer_set_expires(&gact->hitimer, close_time); spin_unlock(&gact->tcf_lock); return HRTIMER_RESTART; } TC_INDIRECT_SCOPE int tcf_gate_act(struct sk_buff *skb, const struct tc_action *a, struct tcf_result *res) { struct tcf_gate *gact = to_gate(a); int action = READ_ONCE(gact->tcf_action); tcf_lastuse_update(&gact->tcf_tm); tcf_action_update_bstats(&gact->common, skb); spin_lock(&gact->tcf_lock); if (unlikely(gact->current_gate_status & GATE_ACT_PENDING)) { spin_unlock(&gact->tcf_lock); return action; } if (!(gact->current_gate_status & GATE_ACT_GATE_OPEN)) { spin_unlock(&gact->tcf_lock); goto drop; } if (gact->current_max_octets >= 0) { gact->current_entry_octets += qdisc_pkt_len(skb); if (gact->current_entry_octets > gact->current_max_octets) { spin_unlock(&gact->tcf_lock); goto overlimit; } } spin_unlock(&gact->tcf_lock); return action; overlimit: tcf_action_inc_overlimit_qstats(&gact->common); drop: tcf_action_inc_drop_qstats(&gact->common); return TC_ACT_SHOT; } static const struct nla_policy entry_policy[TCA_GATE_ENTRY_MAX + 1] = { [TCA_GATE_ENTRY_INDEX] = { .type = NLA_U32 }, [TCA_GATE_ENTRY_GATE] = { .type = NLA_FLAG }, [TCA_GATE_ENTRY_INTERVAL] = { .type = NLA_U32 }, [TCA_GATE_ENTRY_IPV] = { .type = NLA_S32 }, [TCA_GATE_ENTRY_MAX_OCTETS] = { .type = NLA_S32 }, }; static const struct nla_policy gate_policy[TCA_GATE_MAX + 1] = { [TCA_GATE_PARMS] = NLA_POLICY_EXACT_LEN(sizeof(struct tc_gate)), [TCA_GATE_PRIORITY] = { .type = NLA_S32 }, [TCA_GATE_ENTRY_LIST] = { .type = NLA_NESTED }, [TCA_GATE_BASE_TIME] = { .type = NLA_U64 }, [TCA_GATE_CYCLE_TIME] = { .type = NLA_U64 }, [TCA_GATE_CYCLE_TIME_EXT] = { .type = NLA_U64 }, [TCA_GATE_FLAGS] = { .type = NLA_U32 }, [TCA_GATE_CLOCKID] = { .type = NLA_S32 }, }; static int fill_gate_entry(struct nlattr **tb, struct tcfg_gate_entry *entry, struct netlink_ext_ack *extack) { u32 interval = 0; entry->gate_state = nla_get_flag(tb[TCA_GATE_ENTRY_GATE]); if (tb[TCA_GATE_ENTRY_INTERVAL]) interval = nla_get_u32(tb[TCA_GATE_ENTRY_INTERVAL]); if (interval == 0) { NL_SET_ERR_MSG(extack, "Invalid interval for schedule entry"); return -EINVAL; } entry->interval = interval; entry->ipv = nla_get_s32_default(tb[TCA_GATE_ENTRY_IPV], -1); entry->maxoctets = nla_get_s32_default(tb[TCA_GATE_ENTRY_MAX_OCTETS], -1); return 0; } static int parse_gate_entry(struct nlattr *n, struct tcfg_gate_entry *entry, int index, struct netlink_ext_ack *extack) { struct nlattr *tb[TCA_GATE_ENTRY_MAX + 1] = { }; int err; err = nla_parse_nested(tb, TCA_GATE_ENTRY_MAX, n, entry_policy, extack); if (err < 0) { NL_SET_ERR_MSG(extack, "Could not parse nested entry"); return -EINVAL; } entry->index = index; return fill_gate_entry(tb, entry, extack); } static void release_entry_list(struct list_head *entries) { struct tcfg_gate_entry *entry, *e; list_for_each_entry_safe(entry, e, entries, list) { list_del(&entry->list); kfree(entry); } } static int parse_gate_list(struct nlattr *list_attr, struct tcf_gate_params *sched, struct netlink_ext_ack *extack) { struct tcfg_gate_entry *entry; struct nlattr *n; int err, rem; int i = 0; if (!list_attr) return -EINVAL; nla_for_each_nested(n, list_attr, rem) { if (nla_type(n) != TCA_GATE_ONE_ENTRY) { NL_SET_ERR_MSG(extack, "Attribute isn't type 'entry'"); continue; } entry = kzalloc(sizeof(*entry), GFP_ATOMIC); if (!entry) { NL_SET_ERR_MSG(extack, "Not enough memory for entry"); err = -ENOMEM; goto release_list; } err = parse_gate_entry(n, entry, i, extack); if (err < 0) { kfree(entry); goto release_list; } list_add_tail(&entry->list, &sched->entries); i++; } sched->num_entries = i; return i; release_list: release_entry_list(&sched->entries); return err; } static void gate_setup_timer(struct tcf_gate *gact, u64 basetime, enum tk_offsets tko, s32 clockid, bool do_init) { if (!do_init) { if (basetime == gact->param.tcfg_basetime && tko == gact->tk_offset && clockid == gact->param.tcfg_clockid) return; spin_unlock_bh(&gact->tcf_lock); hrtimer_cancel(&gact->hitimer); spin_lock_bh(&gact->tcf_lock); } gact->param.tcfg_basetime = basetime; gact->param.tcfg_clockid = clockid; gact->tk_offset = tko; hrtimer_setup(&gact->hitimer, gate_timer_func, clockid, HRTIMER_MODE_ABS_SOFT); } static int tcf_gate_init(struct net *net, struct nlattr *nla, struct nlattr *est, struct tc_action **a, struct tcf_proto *tp, u32 flags, struct netlink_ext_ack *extack) { struct tc_action_net *tn = net_generic(net, act_gate_ops.net_id); enum tk_offsets tk_offset = TK_OFFS_TAI; bool bind = flags & TCA_ACT_FLAGS_BIND; struct nlattr *tb[TCA_GATE_MAX + 1]; struct tcf_chain *goto_ch = NULL; u64 cycletime = 0, basetime = 0; struct tcf_gate_params *p; s32 clockid = CLOCK_TAI; struct tcf_gate *gact; struct tc_gate *parm; int ret = 0, err; u32 gflags = 0; s32 prio = -1; ktime_t start; u32 index; if (!nla) return -EINVAL; err = nla_parse_nested(tb, TCA_GATE_MAX, nla, gate_policy, extack); if (err < 0) return err; if (!tb[TCA_GATE_PARMS]) return -EINVAL; if (tb[TCA_GATE_CLOCKID]) { clockid = nla_get_s32(tb[TCA_GATE_CLOCKID]); switch (clockid) { case CLOCK_REALTIME: tk_offset = TK_OFFS_REAL; break; case CLOCK_MONOTONIC: tk_offset = TK_OFFS_MAX; break; case CLOCK_BOOTTIME: tk_offset = TK_OFFS_BOOT; break; case CLOCK_TAI: tk_offset = TK_OFFS_TAI; break; default: NL_SET_ERR_MSG(extack, "Invalid 'clockid'"); return -EINVAL; } } parm = nla_data(tb[TCA_GATE_PARMS]); index = parm->index; err = tcf_idr_check_alloc(tn, &index, a, bind); if (err < 0) return err; if (err && bind) return ACT_P_BOUND; if (!err) { ret = tcf_idr_create_from_flags(tn, index, est, a, &act_gate_ops, bind, flags); if (ret) { tcf_idr_cleanup(tn, index); return ret; } ret = ACT_P_CREATED; } else if (!(flags & TCA_ACT_FLAGS_REPLACE)) { tcf_idr_release(*a, bind); return -EEXIST; } if (tb[TCA_GATE_PRIORITY]) prio = nla_get_s32(tb[TCA_GATE_PRIORITY]); if (tb[TCA_GATE_BASE_TIME]) basetime = nla_get_u64(tb[TCA_GATE_BASE_TIME]); if (tb[TCA_GATE_FLAGS]) gflags = nla_get_u32(tb[TCA_GATE_FLAGS]); gact = to_gate(*a); if (ret == ACT_P_CREATED) INIT_LIST_HEAD(&gact->param.entries); err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, extack); if (err < 0) goto release_idr; spin_lock_bh(&gact->tcf_lock); p = &gact->param; if (tb[TCA_GATE_CYCLE_TIME]) cycletime = nla_get_u64(tb[TCA_GATE_CYCLE_TIME]); if (tb[TCA_GATE_ENTRY_LIST]) { err = parse_gate_list(tb[TCA_GATE_ENTRY_LIST], p, extack); if (err < 0) goto chain_put; } if (!cycletime) { struct tcfg_gate_entry *entry; ktime_t cycle = 0; list_for_each_entry(entry, &p->entries, list) cycle = ktime_add_ns(cycle, entry->interval); cycletime = cycle; if (!cycletime) { err = -EINVAL; goto chain_put; } } p->tcfg_cycletime = cycletime; if (tb[TCA_GATE_CYCLE_TIME_EXT]) p->tcfg_cycletime_ext = nla_get_u64(tb[TCA_GATE_CYCLE_TIME_EXT]); gate_setup_timer(gact, basetime, tk_offset, clockid, ret == ACT_P_CREATED); p->tcfg_priority = prio; p->tcfg_flags = gflags; gate_get_start_time(gact, &start); gact->current_close_time = start; gact->current_gate_status = GATE_ACT_GATE_OPEN | GATE_ACT_PENDING; gact->next_entry = list_first_entry(&p->entries, struct tcfg_gate_entry, list); goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch); gate_start_timer(gact, start); spin_unlock_bh(&gact->tcf_lock); if (goto_ch) tcf_chain_put_by_act(goto_ch); return ret; chain_put: spin_unlock_bh(&gact->tcf_lock); if (goto_ch) tcf_chain_put_by_act(goto_ch); release_idr: /* action is not inserted in any list: it's safe to init hitimer * without taking tcf_lock. */ if (ret == ACT_P_CREATED) gate_setup_timer(gact, gact->param.tcfg_basetime, gact->tk_offset, gact->param.tcfg_clockid, true); tcf_idr_release(*a, bind); return err; } static void tcf_gate_cleanup(struct tc_action *a) { struct tcf_gate *gact = to_gate(a); struct tcf_gate_params *p; p = &gact->param; hrtimer_cancel(&gact->hitimer); release_entry_list(&p->entries); } static int dumping_entry(struct sk_buff *skb, struct tcfg_gate_entry *entry) { struct nlattr *item; item = nla_nest_start_noflag(skb, TCA_GATE_ONE_ENTRY); if (!item) return -ENOSPC; if (nla_put_u32(skb, TCA_GATE_ENTRY_INDEX, entry->index)) goto nla_put_failure; if (entry->gate_state && nla_put_flag(skb, TCA_GATE_ENTRY_GATE)) goto nla_put_failure; if (nla_put_u32(skb, TCA_GATE_ENTRY_INTERVAL, entry->interval)) goto nla_put_failure; if (nla_put_s32(skb, TCA_GATE_ENTRY_MAX_OCTETS, entry->maxoctets)) goto nla_put_failure; if (nla_put_s32(skb, TCA_GATE_ENTRY_IPV, entry->ipv)) goto nla_put_failure; return nla_nest_end(skb, item); nla_put_failure: nla_nest_cancel(skb, item); return -1; } static int tcf_gate_dump(struct sk_buff *skb, struct tc_action *a, int bind, int ref) { unsigned char *b = skb_tail_pointer(skb); struct tcf_gate *gact = to_gate(a); struct tc_gate opt = { .index = gact->tcf_index, .refcnt = refcount_read(&gact->tcf_refcnt) - ref, .bindcnt = atomic_read(&gact->tcf_bindcnt) - bind, }; struct tcfg_gate_entry *entry; struct tcf_gate_params *p; struct nlattr *entry_list; struct tcf_t t; spin_lock_bh(&gact->tcf_lock); opt.action = gact->tcf_action; p = &gact->param; if (nla_put(skb, TCA_GATE_PARMS, sizeof(opt), &opt)) goto nla_put_failure; if (nla_put_u64_64bit(skb, TCA_GATE_BASE_TIME, p->tcfg_basetime, TCA_GATE_PAD)) goto nla_put_failure; if (nla_put_u64_64bit(skb, TCA_GATE_CYCLE_TIME, p->tcfg_cycletime, TCA_GATE_PAD)) goto nla_put_failure; if (nla_put_u64_64bit(skb, TCA_GATE_CYCLE_TIME_EXT, p->tcfg_cycletime_ext, TCA_GATE_PAD)) goto nla_put_failure; if (nla_put_s32(skb, TCA_GATE_CLOCKID, p->tcfg_clockid)) goto nla_put_failure; if (nla_put_u32(skb, TCA_GATE_FLAGS, p->tcfg_flags)) goto nla_put_failure; if (nla_put_s32(skb, TCA_GATE_PRIORITY, p->tcfg_priority)) goto nla_put_failure; entry_list = nla_nest_start_noflag(skb, TCA_GATE_ENTRY_LIST); if (!entry_list) goto nla_put_failure; list_for_each_entry(entry, &p->entries, list) { if (dumping_entry(skb, entry) < 0) goto nla_put_failure; } nla_nest_end(skb, entry_list); tcf_tm_dump(&t, &gact->tcf_tm); if (nla_put_64bit(skb, TCA_GATE_TM, sizeof(t), &t, TCA_GATE_PAD)) goto nla_put_failure; spin_unlock_bh(&gact->tcf_lock); return skb->len; nla_put_failure: spin_unlock_bh(&gact->tcf_lock); nlmsg_trim(skb, b); return -1; } static void tcf_gate_stats_update(struct tc_action *a, u64 bytes, u64 packets, u64 drops, u64 lastuse, bool hw) { struct tcf_gate *gact = to_gate(a); struct tcf_t *tm = &gact->tcf_tm; tcf_action_update_stats(a, bytes, packets, drops, hw); tm->lastuse = max_t(u64, tm->lastuse, lastuse); } static size_t tcf_gate_get_fill_size(const struct tc_action *act) { return nla_total_size(sizeof(struct tc_gate)); } static void tcf_gate_entry_destructor(void *priv) { struct action_gate_entry *oe = priv; kfree(oe); } static int tcf_gate_get_entries(struct flow_action_entry *entry, const struct tc_action *act) { entry->gate.entries = tcf_gate_get_list(act); if (!entry->gate.entries) return -EINVAL; entry->destructor = tcf_gate_entry_destructor; entry->destructor_priv = entry->gate.entries; return 0; } static int tcf_gate_offload_act_setup(struct tc_action *act, void *entry_data, u32 *index_inc, bool bind, struct netlink_ext_ack *extack) { int err; if (bind) { struct flow_action_entry *entry = entry_data; entry->id = FLOW_ACTION_GATE; entry->gate.prio = tcf_gate_prio(act); entry->gate.basetime = tcf_gate_basetime(act); entry->gate.cycletime = tcf_gate_cycletime(act); entry->gate.cycletimeext = tcf_gate_cycletimeext(act); entry->gate.num_entries = tcf_gate_num_entries(act); err = tcf_gate_get_entries(entry, act); if (err) return err; *index_inc = 1; } else { struct flow_offload_action *fl_action = entry_data; fl_action->id = FLOW_ACTION_GATE; } return 0; } static struct tc_action_ops act_gate_ops = { .kind = "gate", .id = TCA_ID_GATE, .owner = THIS_MODULE, .act = tcf_gate_act, .dump = tcf_gate_dump, .init = tcf_gate_init, .cleanup = tcf_gate_cleanup, .stats_update = tcf_gate_stats_update, .get_fill_size = tcf_gate_get_fill_size, .offload_act_setup = tcf_gate_offload_act_setup, .size = sizeof(struct tcf_gate), }; MODULE_ALIAS_NET_ACT("gate"); static __net_init int gate_init_net(struct net *net) { struct tc_action_net *tn = net_generic(net, act_gate_ops.net_id); return tc_action_net_init(net, tn, &act_gate_ops); } static void __net_exit gate_exit_net(struct list_head *net_list) { tc_action_net_exit(net_list, act_gate_ops.net_id); } static struct pernet_operations gate_net_ops = { .init = gate_init_net, .exit_batch = gate_exit_net, .id = &act_gate_ops.net_id, .size = sizeof(struct tc_action_net), }; static int __init gate_init_module(void) { return tcf_register_action(&act_gate_ops, &gate_net_ops); } static void __exit gate_cleanup_module(void) { tcf_unregister_action(&act_gate_ops, &gate_net_ops); } module_init(gate_init_module); module_exit(gate_cleanup_module); MODULE_DESCRIPTION("TC gate action"); MODULE_LICENSE("GPL v2"); |
| 14 14 17 33 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 | // SPDX-License-Identifier: GPL-2.0-or-later /* * UDPLITEv6 An implementation of the UDP-Lite protocol over IPv6. * See also net/ipv4/udplite.c * * Authors: Gerrit Renker <gerrit@erg.abdn.ac.uk> * * Changes: * Fixes: */ #define pr_fmt(fmt) "UDPLite6: " fmt #include <linux/export.h> #include <linux/proc_fs.h> #include "udp_impl.h" static int udplitev6_sk_init(struct sock *sk) { udpv6_init_sock(sk); pr_warn_once("UDP-Lite is deprecated and scheduled to be removed in 2025, " "please contact the netdev mailing list\n"); return 0; } static int udplitev6_rcv(struct sk_buff *skb) { return __udp6_lib_rcv(skb, &udplite_table, IPPROTO_UDPLITE); } static int udplitev6_err(struct sk_buff *skb, struct inet6_skb_parm *opt, u8 type, u8 code, int offset, __be32 info) { return __udp6_lib_err(skb, opt, type, code, offset, info, &udplite_table); } static const struct inet6_protocol udplitev6_protocol = { .handler = udplitev6_rcv, .err_handler = udplitev6_err, .flags = INET6_PROTO_NOPOLICY|INET6_PROTO_FINAL, }; struct proto udplitev6_prot = { .name = "UDPLITEv6", .owner = THIS_MODULE, .close = udp_lib_close, .connect = ip6_datagram_connect, .disconnect = udp_disconnect, .ioctl = udp_ioctl, .init = udplitev6_sk_init, .destroy = udpv6_destroy_sock, .setsockopt = udpv6_setsockopt, .getsockopt = udpv6_getsockopt, .sendmsg = udpv6_sendmsg, .recvmsg = udpv6_recvmsg, .hash = udp_lib_hash, .unhash = udp_lib_unhash, .rehash = udp_v6_rehash, .get_port = udp_v6_get_port, .memory_allocated = &net_aligned_data.udp_memory_allocated, .per_cpu_fw_alloc = &udp_memory_per_cpu_fw_alloc, .sysctl_mem = sysctl_udp_mem, .sysctl_wmem_offset = offsetof(struct net, ipv4.sysctl_udp_wmem_min), .sysctl_rmem_offset = offsetof(struct net, ipv4.sysctl_udp_rmem_min), .obj_size = sizeof(struct udp6_sock), .ipv6_pinfo_offset = offsetof(struct udp6_sock, inet6), .h.udp_table = &udplite_table, }; static struct inet_protosw udplite6_protosw = { .type = SOCK_DGRAM, .protocol = IPPROTO_UDPLITE, .prot = &udplitev6_prot, .ops = &inet6_dgram_ops, .flags = INET_PROTOSW_PERMANENT, }; int __init udplitev6_init(void) { int ret; ret = inet6_add_protocol(&udplitev6_protocol, IPPROTO_UDPLITE); if (ret) goto out; ret = inet6_register_protosw(&udplite6_protosw); if (ret) goto out_udplitev6_protocol; out: return ret; out_udplitev6_protocol: inet6_del_protocol(&udplitev6_protocol, IPPROTO_UDPLITE); goto out; } void udplitev6_exit(void) { inet6_unregister_protosw(&udplite6_protosw); inet6_del_protocol(&udplitev6_protocol, IPPROTO_UDPLITE); } #ifdef CONFIG_PROC_FS static struct udp_seq_afinfo udplite6_seq_afinfo = { .family = AF_INET6, .udp_table = &udplite_table, }; static int __net_init udplite6_proc_init_net(struct net *net) { if (!proc_create_net_data("udplite6", 0444, net->proc_net, &udp6_seq_ops, sizeof(struct udp_iter_state), &udplite6_seq_afinfo)) return -ENOMEM; return 0; } static void __net_exit udplite6_proc_exit_net(struct net *net) { remove_proc_entry("udplite6", net->proc_net); } static struct pernet_operations udplite6_net_ops = { .init = udplite6_proc_init_net, .exit = udplite6_proc_exit_net, }; int __init udplite6_proc_init(void) { return register_pernet_subsys(&udplite6_net_ops); } void udplite6_proc_exit(void) { unregister_pernet_subsys(&udplite6_net_ops); } #endif |
| 25526 25534 6700 6692 112 6 11 90 11 32 56 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 | // SPDX-License-Identifier: GPL-2.0 #include <linux/compiler.h> #include <linux/errno.h> #include <linux/export.h> #include <linux/fault-inject-usercopy.h> #include <linux/instrumented.h> #include <linux/kernel.h> #include <linux/nospec.h> #include <linux/string.h> #include <linux/uaccess.h> #include <linux/wordpart.h> /* out-of-line parts */ #if !defined(INLINE_COPY_FROM_USER) || defined(CONFIG_RUST) unsigned long _copy_from_user(void *to, const void __user *from, unsigned long n) { return _inline_copy_from_user(to, from, n); } EXPORT_SYMBOL(_copy_from_user); #endif #if !defined(INLINE_COPY_TO_USER) || defined(CONFIG_RUST) unsigned long _copy_to_user(void __user *to, const void *from, unsigned long n) { return _inline_copy_to_user(to, from, n); } EXPORT_SYMBOL(_copy_to_user); #endif /** * check_zeroed_user: check if a userspace buffer only contains zero bytes * @from: Source address, in userspace. * @size: Size of buffer. * * This is effectively shorthand for "memchr_inv(from, 0, size) == NULL" for * userspace addresses (and is more efficient because we don't care where the * first non-zero byte is). * * Returns: * * 0: There were non-zero bytes present in the buffer. * * 1: The buffer was full of zero bytes. * * -EFAULT: access to userspace failed. */ int check_zeroed_user(const void __user *from, size_t size) { unsigned long val; uintptr_t align = (uintptr_t) from % sizeof(unsigned long); if (unlikely(size == 0)) return 1; from -= align; size += align; if (!user_read_access_begin(from, size)) return -EFAULT; unsafe_get_user(val, (unsigned long __user *) from, err_fault); if (align) val &= ~aligned_byte_mask(align); while (size > sizeof(unsigned long)) { if (unlikely(val)) goto done; from += sizeof(unsigned long); size -= sizeof(unsigned long); unsafe_get_user(val, (unsigned long __user *) from, err_fault); } if (size < sizeof(unsigned long)) val &= aligned_byte_mask(size); done: user_read_access_end(); return (val == 0); err_fault: user_read_access_end(); return -EFAULT; } EXPORT_SYMBOL(check_zeroed_user); |
| 14 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 | /* SPDX-License-Identifier: GPL-2.0-or-later */ /* * RSA internal helpers * * Copyright (c) 2015, Intel Corporation * Authors: Tadeusz Struk <tadeusz.struk@intel.com> */ #ifndef _RSA_HELPER_ #define _RSA_HELPER_ #include <linux/types.h> #include <crypto/akcipher.h> /** * rsa_key - RSA key structure * @n : RSA modulus raw byte stream * @e : RSA public exponent raw byte stream * @d : RSA private exponent raw byte stream * @p : RSA prime factor p of n raw byte stream * @q : RSA prime factor q of n raw byte stream * @dp : RSA exponent d mod (p - 1) raw byte stream * @dq : RSA exponent d mod (q - 1) raw byte stream * @qinv : RSA CRT coefficient q^(-1) mod p raw byte stream * @n_sz : length in bytes of RSA modulus n * @e_sz : length in bytes of RSA public exponent * @d_sz : length in bytes of RSA private exponent * @p_sz : length in bytes of p field * @q_sz : length in bytes of q field * @dp_sz : length in bytes of dp field * @dq_sz : length in bytes of dq field * @qinv_sz : length in bytes of qinv field */ struct rsa_key { const u8 *n; const u8 *e; const u8 *d; const u8 *p; const u8 *q; const u8 *dp; const u8 *dq; const u8 *qinv; size_t n_sz; size_t e_sz; size_t d_sz; size_t p_sz; size_t q_sz; size_t dp_sz; size_t dq_sz; size_t qinv_sz; }; int rsa_parse_pub_key(struct rsa_key *rsa_key, const void *key, unsigned int key_len); int rsa_parse_priv_key(struct rsa_key *rsa_key, const void *key, unsigned int key_len); #define RSA_PUB (true) #define RSA_PRIV (false) static inline int rsa_set_key(struct crypto_akcipher *child, unsigned int *key_size, bool is_pubkey, const void *key, unsigned int keylen) { int err; *key_size = 0; if (is_pubkey) err = crypto_akcipher_set_pub_key(child, key, keylen); else err = crypto_akcipher_set_priv_key(child, key, keylen); if (err) return err; /* Find out new modulus size from rsa implementation */ err = crypto_akcipher_maxsize(child); if (err > PAGE_SIZE) return -ENOTSUPP; *key_size = err; return 0; } extern struct crypto_template rsa_pkcs1pad_tmpl; extern struct crypto_template rsassa_pkcs1_tmpl; #endif |
| 261 3733 3730 261 3732 3736 10 2402 2400 3729 3736 3729 2406 3733 3734 3732 3736 3795 3796 3982 307 3734 259 3664 3731 3733 1652 1653 1653 1653 1653 1649 1653 261 3611 3733 626 870 869 869 870 3734 3729 3733 3736 3736 1241 1238 3731 261 3732 150 3732 3734 632 1241 1612 1 3733 3732 33 97 97 97 97 97 1 97 97 1 1480 46 1454 3797 3791 3800 3799 1423 41 41 41 1037 41 41 41 41 41 41 1 1 1038 1038 490 1038 1036 97 97 1037 1038 1 1 1 1 41 1037 1038 1038 1038 940 97 998 41 2401 2401 1712 1038 251 251 250 392 43 393 393 251 251 102 247 2 2 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 | // SPDX-License-Identifier: GPL-2.0-only /* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */ #include <linux/bpf_verifier.h> #include <linux/hashtable.h> #include <linux/jhash.h> #include <linux/slab.h> /* * This file implements live stack slots analysis. After accumulating * stack usage data, the analysis answers queries about whether a * particular stack slot may be read by an instruction or any of it's * successors. This data is consumed by the verifier states caching * mechanism to decide which stack slots are important when looking for a * visited state corresponding to the current state. * * The analysis is call chain sensitive, meaning that data is collected * and queried for tuples (call chain, subprogram instruction index). * Such sensitivity allows identifying if some subprogram call always * leads to writes in the caller's stack. * * The basic idea is as follows: * - As the verifier accumulates a set of visited states, the analysis instance * accumulates a conservative estimate of stack slots that can be read * or must be written for each visited tuple (call chain, instruction index). * - If several states happen to visit the same instruction with the same * call chain, stack usage information for the corresponding tuple is joined: * - "may_read" set represents a union of all possibly read slots * (any slot in "may_read" set might be read at or after the instruction); * - "must_write" set represents an intersection of all possibly written slots * (any slot in "must_write" set is guaranteed to be written by the instruction). * - The analysis is split into two phases: * - read and write marks accumulation; * - read and write marks propagation. * - The propagation phase is a textbook live variable data flow analysis: * * state[cc, i].live_after = U [state[cc, s].live_before for s in insn_successors(i)] * state[cc, i].live_before = * (state[cc, i].live_after / state[cc, i].must_write) U state[i].may_read * * Where: * - `U` stands for set union * - `/` stands for set difference; * - `cc` stands for a call chain; * - `i` and `s` are instruction indexes; * * The above equations are computed for each call chain and instruction * index until state stops changing. * - Additionally, in order to transfer "must_write" information from a * subprogram to call instructions invoking this subprogram, * the "must_write_acc" set is tracked for each (cc, i) tuple. * A set of stack slots that are guaranteed to be written by this * instruction or any of its successors (within the subprogram). * The equation for "must_write_acc" propagation looks as follows: * * state[cc, i].must_write_acc = * ∩ [state[cc, s].must_write_acc for s in insn_successors(i)] * U state[cc, i].must_write * * (An intersection of all "must_write_acc" for instruction successors * plus all "must_write" slots for the instruction itself). * - After the propagation phase completes for a subprogram, information from * (cc, 0) tuple (subprogram entry) is transferred to the caller's call chain: * - "must_write_acc" set is intersected with the call site's "must_write" set; * - "may_read" set is added to the call site's "may_read" set. * - Any live stack queries must be taken after the propagation phase. * - Accumulation and propagation phases can be entered multiple times, * at any point in time: * - "may_read" set only grows; * - "must_write" set only shrinks; * - for each visited verifier state with zero branches, all relevant * read and write marks are already recorded by the analysis instance. * * Technically, the analysis is facilitated by the following data structures: * - Call chain: for given verifier state, the call chain is a tuple of call * instruction indexes leading to the current subprogram plus the subprogram * entry point index. * - Function instance: for a given call chain, for each instruction in * the current subprogram, a mapping between instruction index and a * set of "may_read", "must_write" and other marks accumulated for this * instruction. * - A hash table mapping call chains to function instances. */ struct callchain { u32 callsites[MAX_CALL_FRAMES]; /* instruction pointer for each frame */ /* cached subprog_info[*].start for functions owning the frames: * - sp_starts[curframe] used to get insn relative index within current function; * - sp_starts[0..current-1] used for fast callchain_frame_up(). */ u32 sp_starts[MAX_CALL_FRAMES]; u32 curframe; /* depth of callsites and sp_starts arrays */ }; struct per_frame_masks { u64 may_read; /* stack slots that may be read by this instruction */ u64 must_write; /* stack slots written by this instruction */ u64 must_write_acc; /* stack slots written by this instruction and its successors */ u64 live_before; /* stack slots that may be read by this insn and its successors */ }; /* * A function instance created for a specific callchain. * Encapsulates read and write marks for each instruction in the function. * Marks are tracked for each frame in the callchain. */ struct func_instance { struct hlist_node hl_node; struct callchain callchain; u32 insn_cnt; /* cached number of insns in the function */ bool updated; bool must_write_dropped; /* Per frame, per instruction masks, frames allocated lazily. */ struct per_frame_masks *frames[MAX_CALL_FRAMES]; /* For each instruction a flag telling if "must_write" had been initialized for it. */ bool *must_write_set; }; struct live_stack_query { struct func_instance *instances[MAX_CALL_FRAMES]; /* valid in range [0..curframe] */ u32 curframe; u32 insn_idx; }; struct bpf_liveness { DECLARE_HASHTABLE(func_instances, 8); /* maps callchain to func_instance */ struct live_stack_query live_stack_query; /* cache to avoid repetitive ht lookups */ /* Cached instance corresponding to env->cur_state, avoids per-instruction ht lookup */ struct func_instance *cur_instance; /* * Below fields are used to accumulate stack write marks for instruction at * @write_insn_idx before submitting the marks to @cur_instance. */ u64 write_masks_acc[MAX_CALL_FRAMES]; u32 write_insn_idx; }; /* Compute callchain corresponding to state @st at depth @frameno */ static void compute_callchain(struct bpf_verifier_env *env, struct bpf_verifier_state *st, struct callchain *callchain, u32 frameno) { struct bpf_subprog_info *subprog_info = env->subprog_info; u32 i; memset(callchain, 0, sizeof(*callchain)); for (i = 0; i <= frameno; i++) { callchain->sp_starts[i] = subprog_info[st->frame[i]->subprogno].start; if (i < st->curframe) callchain->callsites[i] = st->frame[i + 1]->callsite; } callchain->curframe = frameno; callchain->callsites[callchain->curframe] = callchain->sp_starts[callchain->curframe]; } static u32 hash_callchain(struct callchain *callchain) { return jhash2(callchain->callsites, callchain->curframe, 0); } static bool same_callsites(struct callchain *a, struct callchain *b) { int i; if (a->curframe != b->curframe) return false; for (i = a->curframe; i >= 0; i--) if (a->callsites[i] != b->callsites[i]) return false; return true; } /* * Find existing or allocate new function instance corresponding to @callchain. * Instances are accumulated in env->liveness->func_instances and persist * until the end of the verification process. */ static struct func_instance *__lookup_instance(struct bpf_verifier_env *env, struct callchain *callchain) { struct bpf_liveness *liveness = env->liveness; struct bpf_subprog_info *subprog; struct func_instance *result; u32 subprog_sz, size, key; key = hash_callchain(callchain); hash_for_each_possible(liveness->func_instances, result, hl_node, key) if (same_callsites(&result->callchain, callchain)) return result; subprog = bpf_find_containing_subprog(env, callchain->sp_starts[callchain->curframe]); subprog_sz = (subprog + 1)->start - subprog->start; size = sizeof(struct func_instance); result = kvzalloc(size, GFP_KERNEL_ACCOUNT); if (!result) return ERR_PTR(-ENOMEM); result->must_write_set = kvcalloc(subprog_sz, sizeof(*result->must_write_set), GFP_KERNEL_ACCOUNT); if (!result->must_write_set) { kvfree(result); return ERR_PTR(-ENOMEM); } memcpy(&result->callchain, callchain, sizeof(*callchain)); result->insn_cnt = subprog_sz; hash_add(liveness->func_instances, &result->hl_node, key); return result; } static struct func_instance *lookup_instance(struct bpf_verifier_env *env, struct bpf_verifier_state *st, u32 frameno) { struct callchain callchain; compute_callchain(env, st, &callchain, frameno); return __lookup_instance(env, &callchain); } int bpf_stack_liveness_init(struct bpf_verifier_env *env) { env->liveness = kvzalloc(sizeof(*env->liveness), GFP_KERNEL_ACCOUNT); if (!env->liveness) return -ENOMEM; hash_init(env->liveness->func_instances); return 0; } void bpf_stack_liveness_free(struct bpf_verifier_env *env) { struct func_instance *instance; struct hlist_node *tmp; int bkt, i; if (!env->liveness) return; hash_for_each_safe(env->liveness->func_instances, bkt, tmp, instance, hl_node) { for (i = 0; i <= instance->callchain.curframe; i++) kvfree(instance->frames[i]); kvfree(instance->must_write_set); kvfree(instance); } kvfree(env->liveness); } /* * Convert absolute instruction index @insn_idx to an index relative * to start of the function corresponding to @instance. */ static int relative_idx(struct func_instance *instance, u32 insn_idx) { return insn_idx - instance->callchain.sp_starts[instance->callchain.curframe]; } static struct per_frame_masks *get_frame_masks(struct func_instance *instance, u32 frame, u32 insn_idx) { if (!instance->frames[frame]) return NULL; return &instance->frames[frame][relative_idx(instance, insn_idx)]; } static struct per_frame_masks *alloc_frame_masks(struct bpf_verifier_env *env, struct func_instance *instance, u32 frame, u32 insn_idx) { struct per_frame_masks *arr; if (!instance->frames[frame]) { arr = kvcalloc(instance->insn_cnt, sizeof(*arr), GFP_KERNEL_ACCOUNT); instance->frames[frame] = arr; if (!arr) return ERR_PTR(-ENOMEM); } return get_frame_masks(instance, frame, insn_idx); } void bpf_reset_live_stack_callchain(struct bpf_verifier_env *env) { env->liveness->cur_instance = NULL; } /* If @env->liveness->cur_instance is null, set it to instance corresponding to @env->cur_state. */ static int ensure_cur_instance(struct bpf_verifier_env *env) { struct bpf_liveness *liveness = env->liveness; struct func_instance *instance; if (liveness->cur_instance) return 0; instance = lookup_instance(env, env->cur_state, env->cur_state->curframe); if (IS_ERR(instance)) return PTR_ERR(instance); liveness->cur_instance = instance; return 0; } /* Accumulate may_read masks for @frame at @insn_idx */ static int mark_stack_read(struct bpf_verifier_env *env, struct func_instance *instance, u32 frame, u32 insn_idx, u64 mask) { struct per_frame_masks *masks; u64 new_may_read; masks = alloc_frame_masks(env, instance, frame, insn_idx); if (IS_ERR(masks)) return PTR_ERR(masks); new_may_read = masks->may_read | mask; if (new_may_read != masks->may_read && ((new_may_read | masks->live_before) != masks->live_before)) instance->updated = true; masks->may_read |= mask; return 0; } int bpf_mark_stack_read(struct bpf_verifier_env *env, u32 frame, u32 insn_idx, u64 mask) { int err; err = ensure_cur_instance(env); err = err ?: mark_stack_read(env, env->liveness->cur_instance, frame, insn_idx, mask); return err; } static void reset_stack_write_marks(struct bpf_verifier_env *env, struct func_instance *instance, u32 insn_idx) { struct bpf_liveness *liveness = env->liveness; int i; liveness->write_insn_idx = insn_idx; for (i = 0; i <= instance->callchain.curframe; i++) liveness->write_masks_acc[i] = 0; } int bpf_reset_stack_write_marks(struct bpf_verifier_env *env, u32 insn_idx) { struct bpf_liveness *liveness = env->liveness; int err; err = ensure_cur_instance(env); if (err) return err; reset_stack_write_marks(env, liveness->cur_instance, insn_idx); return 0; } void bpf_mark_stack_write(struct bpf_verifier_env *env, u32 frame, u64 mask) { env->liveness->write_masks_acc[frame] |= mask; } static int commit_stack_write_marks(struct bpf_verifier_env *env, struct func_instance *instance) { struct bpf_liveness *liveness = env->liveness; u32 idx, frame, curframe, old_must_write; struct per_frame_masks *masks; u64 mask; if (!instance) return 0; curframe = instance->callchain.curframe; idx = relative_idx(instance, liveness->write_insn_idx); for (frame = 0; frame <= curframe; frame++) { mask = liveness->write_masks_acc[frame]; /* avoid allocating frames for zero masks */ if (mask == 0 && !instance->must_write_set[idx]) continue; masks = alloc_frame_masks(env, instance, frame, liveness->write_insn_idx); if (IS_ERR(masks)) return PTR_ERR(masks); old_must_write = masks->must_write; /* * If instruction at this callchain is seen for a first time, set must_write equal * to @mask. Otherwise take intersection with the previous value. */ if (instance->must_write_set[idx]) mask &= old_must_write; if (old_must_write != mask) { masks->must_write = mask; instance->updated = true; } if (old_must_write & ~mask) instance->must_write_dropped = true; } instance->must_write_set[idx] = true; liveness->write_insn_idx = 0; return 0; } /* * Merge stack writes marks in @env->liveness->write_masks_acc * with information already in @env->liveness->cur_instance. */ int bpf_commit_stack_write_marks(struct bpf_verifier_env *env) { return commit_stack_write_marks(env, env->liveness->cur_instance); } static char *fmt_callchain(struct bpf_verifier_env *env, struct callchain *callchain) { char *buf_end = env->tmp_str_buf + sizeof(env->tmp_str_buf); char *buf = env->tmp_str_buf; int i; buf += snprintf(buf, buf_end - buf, "("); for (i = 0; i <= callchain->curframe; i++) buf += snprintf(buf, buf_end - buf, "%s%d", i ? "," : "", callchain->callsites[i]); snprintf(buf, buf_end - buf, ")"); return env->tmp_str_buf; } static void log_mask_change(struct bpf_verifier_env *env, struct callchain *callchain, char *pfx, u32 frame, u32 insn_idx, u64 old, u64 new) { u64 changed_bits = old ^ new; u64 new_ones = new & changed_bits; u64 new_zeros = ~new & changed_bits; if (!changed_bits) return; bpf_log(&env->log, "%s frame %d insn %d ", fmt_callchain(env, callchain), frame, insn_idx); if (new_ones) { bpf_fmt_stack_mask(env->tmp_str_buf, sizeof(env->tmp_str_buf), new_ones); bpf_log(&env->log, "+%s %s ", pfx, env->tmp_str_buf); } if (new_zeros) { bpf_fmt_stack_mask(env->tmp_str_buf, sizeof(env->tmp_str_buf), new_zeros); bpf_log(&env->log, "-%s %s", pfx, env->tmp_str_buf); } bpf_log(&env->log, "\n"); } int bpf_jmp_offset(struct bpf_insn *insn) { u8 code = insn->code; if (code == (BPF_JMP32 | BPF_JA)) return insn->imm; return insn->off; } __diag_push(); __diag_ignore_all("-Woverride-init", "Allow field initialization overrides for opcode_info_tbl"); inline int bpf_insn_successors(struct bpf_prog *prog, u32 idx, u32 succ[2]) { static const struct opcode_info { bool can_jump; bool can_fallthrough; } opcode_info_tbl[256] = { [0 ... 255] = {.can_jump = false, .can_fallthrough = true}, #define _J(code, ...) \ [BPF_JMP | code] = __VA_ARGS__, \ [BPF_JMP32 | code] = __VA_ARGS__ _J(BPF_EXIT, {.can_jump = false, .can_fallthrough = false}), _J(BPF_JA, {.can_jump = true, .can_fallthrough = false}), _J(BPF_JEQ, {.can_jump = true, .can_fallthrough = true}), _J(BPF_JNE, {.can_jump = true, .can_fallthrough = true}), _J(BPF_JLT, {.can_jump = true, .can_fallthrough = true}), _J(BPF_JLE, {.can_jump = true, .can_fallthrough = true}), _J(BPF_JGT, {.can_jump = true, .can_fallthrough = true}), _J(BPF_JGE, {.can_jump = true, .can_fallthrough = true}), _J(BPF_JSGT, {.can_jump = true, .can_fallthrough = true}), _J(BPF_JSGE, {.can_jump = true, .can_fallthrough = true}), _J(BPF_JSLT, {.can_jump = true, .can_fallthrough = true}), _J(BPF_JSLE, {.can_jump = true, .can_fallthrough = true}), _J(BPF_JCOND, {.can_jump = true, .can_fallthrough = true}), _J(BPF_JSET, {.can_jump = true, .can_fallthrough = true}), #undef _J }; struct bpf_insn *insn = &prog->insnsi[idx]; const struct opcode_info *opcode_info; int i = 0, insn_sz; opcode_info = &opcode_info_tbl[BPF_CLASS(insn->code) | BPF_OP(insn->code)]; insn_sz = bpf_is_ldimm64(insn) ? 2 : 1; if (opcode_info->can_fallthrough) succ[i++] = idx + insn_sz; if (opcode_info->can_jump) succ[i++] = idx + bpf_jmp_offset(insn) + 1; return i; } __diag_pop(); static struct func_instance *get_outer_instance(struct bpf_verifier_env *env, struct func_instance *instance) { struct callchain callchain = instance->callchain; /* Adjust @callchain to represent callchain one frame up */ callchain.callsites[callchain.curframe] = 0; callchain.sp_starts[callchain.curframe] = 0; callchain.curframe--; callchain.callsites[callchain.curframe] = callchain.sp_starts[callchain.curframe]; return __lookup_instance(env, &callchain); } static u32 callchain_subprog_start(struct callchain *callchain) { return callchain->sp_starts[callchain->curframe]; } /* * Transfer @may_read and @must_write_acc marks from the first instruction of @instance, * to the call instruction in function instance calling @instance. */ static int propagate_to_outer_instance(struct bpf_verifier_env *env, struct func_instance *instance) { struct callchain *callchain = &instance->callchain; u32 this_subprog_start, callsite, frame; struct func_instance *outer_instance; struct per_frame_masks *insn; int err; this_subprog_start = callchain_subprog_start(callchain); outer_instance = get_outer_instance(env, instance); callsite = callchain->callsites[callchain->curframe - 1]; reset_stack_write_marks(env, outer_instance, callsite); for (frame = 0; frame < callchain->curframe; frame++) { insn = get_frame_masks(instance, frame, this_subprog_start); if (!insn) continue; bpf_mark_stack_write(env, frame, insn->must_write_acc); err = mark_stack_read(env, outer_instance, frame, callsite, insn->live_before); if (err) return err; } commit_stack_write_marks(env, outer_instance); return 0; } static inline bool update_insn(struct bpf_verifier_env *env, struct func_instance *instance, u32 frame, u32 insn_idx) { struct bpf_insn_aux_data *aux = env->insn_aux_data; u64 new_before, new_after, must_write_acc; struct per_frame_masks *insn, *succ_insn; u32 succ_num, s, succ[2]; bool changed; succ_num = bpf_insn_successors(env->prog, insn_idx, succ); if (unlikely(succ_num == 0)) return false; changed = false; insn = get_frame_masks(instance, frame, insn_idx); new_before = 0; new_after = 0; /* * New "must_write_acc" is an intersection of all "must_write_acc" * of successors plus all "must_write" slots of instruction itself. */ must_write_acc = U64_MAX; for (s = 0; s < succ_num; ++s) { succ_insn = get_frame_masks(instance, frame, succ[s]); new_after |= succ_insn->live_before; must_write_acc &= succ_insn->must_write_acc; } must_write_acc |= insn->must_write; /* * New "live_before" is a union of all "live_before" of successors * minus slots written by instruction plus slots read by instruction. */ new_before = (new_after & ~insn->must_write) | insn->may_read; changed |= new_before != insn->live_before; changed |= must_write_acc != insn->must_write_acc; if (unlikely(env->log.level & BPF_LOG_LEVEL2) && (insn->may_read || insn->must_write || insn_idx == callchain_subprog_start(&instance->callchain) || aux[insn_idx].prune_point)) { log_mask_change(env, &instance->callchain, "live", frame, insn_idx, insn->live_before, new_before); log_mask_change(env, &instance->callchain, "written", frame, insn_idx, insn->must_write_acc, must_write_acc); } insn->live_before = new_before; insn->must_write_acc = must_write_acc; return changed; } /* Fixed-point computation of @live_before and @must_write_acc marks */ static int update_instance(struct bpf_verifier_env *env, struct func_instance *instance) { u32 i, frame, po_start, po_end, cnt, this_subprog_start; struct callchain *callchain = &instance->callchain; int *insn_postorder = env->cfg.insn_postorder; struct bpf_subprog_info *subprog; struct per_frame_masks *insn; bool changed; int err; this_subprog_start = callchain_subprog_start(callchain); /* * If must_write marks were updated must_write_acc needs to be reset * (to account for the case when new must_write sets became smaller). */ if (instance->must_write_dropped) { for (frame = 0; frame <= callchain->curframe; frame++) { if (!instance->frames[frame]) continue; for (i = 0; i < instance->insn_cnt; i++) { insn = get_frame_masks(instance, frame, this_subprog_start + i); insn->must_write_acc = 0; } } } subprog = bpf_find_containing_subprog(env, this_subprog_start); po_start = subprog->postorder_start; po_end = (subprog + 1)->postorder_start; cnt = 0; /* repeat until fixed point is reached */ do { cnt++; changed = false; for (frame = 0; frame <= instance->callchain.curframe; frame++) { if (!instance->frames[frame]) continue; for (i = po_start; i < po_end; i++) changed |= update_insn(env, instance, frame, insn_postorder[i]); } } while (changed); if (env->log.level & BPF_LOG_LEVEL2) bpf_log(&env->log, "%s live stack update done in %d iterations\n", fmt_callchain(env, callchain), cnt); /* transfer marks accumulated for outer frames to outer func instance (caller) */ if (callchain->curframe > 0) { err = propagate_to_outer_instance(env, instance); if (err) return err; } return 0; } /* * Prepare all callchains within @env->cur_state for querying. * This function should be called after each verifier.c:pop_stack() * and whenever verifier.c:do_check_insn() processes subprogram exit. * This would guarantee that visited verifier states with zero branches * have their bpf_mark_stack_{read,write}() effects propagated in * @env->liveness. */ int bpf_update_live_stack(struct bpf_verifier_env *env) { struct func_instance *instance; int err, frame; bpf_reset_live_stack_callchain(env); for (frame = env->cur_state->curframe; frame >= 0; --frame) { instance = lookup_instance(env, env->cur_state, frame); if (IS_ERR(instance)) return PTR_ERR(instance); if (instance->updated) { err = update_instance(env, instance); if (err) return err; instance->updated = false; instance->must_write_dropped = false; } } return 0; } static bool is_live_before(struct func_instance *instance, u32 insn_idx, u32 frameno, u32 spi) { struct per_frame_masks *masks; masks = get_frame_masks(instance, frameno, insn_idx); return masks && (masks->live_before & BIT(spi)); } int bpf_live_stack_query_init(struct bpf_verifier_env *env, struct bpf_verifier_state *st) { struct live_stack_query *q = &env->liveness->live_stack_query; struct func_instance *instance; u32 frame; memset(q, 0, sizeof(*q)); for (frame = 0; frame <= st->curframe; frame++) { instance = lookup_instance(env, st, frame); if (IS_ERR(instance)) return PTR_ERR(instance); q->instances[frame] = instance; } q->curframe = st->curframe; q->insn_idx = st->insn_idx; return 0; } bool bpf_stack_slot_alive(struct bpf_verifier_env *env, u32 frameno, u32 spi) { /* * Slot is alive if it is read before q->st->insn_idx in current func instance, * or if for some outer func instance: * - alive before callsite if callsite calls callback, otherwise * - alive after callsite */ struct live_stack_query *q = &env->liveness->live_stack_query; struct func_instance *instance, *curframe_instance; u32 i, callsite; bool alive; curframe_instance = q->instances[q->curframe]; if (is_live_before(curframe_instance, q->insn_idx, frameno, spi)) return true; for (i = frameno; i < q->curframe; i++) { callsite = curframe_instance->callchain.callsites[i]; instance = q->instances[i]; alive = bpf_calls_callback(env, callsite) ? is_live_before(instance, callsite, frameno, spi) : is_live_before(instance, callsite + 1, frameno, spi); if (alive) return true; } return false; } |
| 1 1 1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 | // SPDX-License-Identifier: GPL-2.0-only /* Kernel module to match connection tracking information. */ /* (C) 1999-2001 Paul `Rusty' Russell * (C) 2002-2005 Netfilter Core Team <coreteam@netfilter.org> */ #include <linux/module.h> #include <linux/skbuff.h> #include <net/netfilter/nf_conntrack.h> #include <linux/netfilter/x_tables.h> #include <linux/netfilter/xt_state.h> MODULE_LICENSE("GPL"); MODULE_AUTHOR("Rusty Russell <rusty@rustcorp.com.au>"); MODULE_DESCRIPTION("ip[6]_tables connection tracking state match module"); MODULE_ALIAS("ipt_state"); MODULE_ALIAS("ip6t_state"); static bool state_mt(const struct sk_buff *skb, struct xt_action_param *par) { const struct xt_state_info *sinfo = par->matchinfo; enum ip_conntrack_info ctinfo; unsigned int statebit; struct nf_conn *ct = nf_ct_get(skb, &ctinfo); if (ct) statebit = XT_STATE_BIT(ctinfo); else if (ctinfo == IP_CT_UNTRACKED) statebit = XT_STATE_UNTRACKED; else statebit = XT_STATE_INVALID; return (sinfo->statemask & statebit); } static int state_mt_check(const struct xt_mtchk_param *par) { int ret; ret = nf_ct_netns_get(par->net, par->family); if (ret < 0) pr_info_ratelimited("cannot load conntrack support for proto=%u\n", par->family); return ret; } static void state_mt_destroy(const struct xt_mtdtor_param *par) { nf_ct_netns_put(par->net, par->family); } static struct xt_match state_mt_reg __read_mostly = { .name = "state", .family = NFPROTO_UNSPEC, .checkentry = state_mt_check, .match = state_mt, .destroy = state_mt_destroy, .matchsize = sizeof(struct xt_state_info), .me = THIS_MODULE, }; static int __init state_mt_init(void) { return xt_register_match(&state_mt_reg); } static void __exit state_mt_exit(void) { xt_unregister_match(&state_mt_reg); } module_init(state_mt_init); module_exit(state_mt_exit); |
| 140 252 117 217 217 3 62 133 9 9 87 47 20 16 9 15 2 6 56 13 56 56 27 38 7 23 2 4 4 5 13 6 7 4 1 2 4 2 11 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 | // SPDX-License-Identifier: GPL-2.0-only /* * IPv6 library code, needed by static components when full IPv6 support is * not configured or static. */ #include <linux/export.h> #include <net/ipv6.h> /* * find out if nexthdr is a well-known extension header or a protocol */ bool ipv6_ext_hdr(u8 nexthdr) { /* * find out if nexthdr is an extension header or a protocol */ return (nexthdr == NEXTHDR_HOP) || (nexthdr == NEXTHDR_ROUTING) || (nexthdr == NEXTHDR_FRAGMENT) || (nexthdr == NEXTHDR_AUTH) || (nexthdr == NEXTHDR_NONE) || (nexthdr == NEXTHDR_DEST); } EXPORT_SYMBOL(ipv6_ext_hdr); /* * Skip any extension headers. This is used by the ICMP module. * * Note that strictly speaking this conflicts with RFC 2460 4.0: * ...The contents and semantics of each extension header determine whether * or not to proceed to the next header. Therefore, extension headers must * be processed strictly in the order they appear in the packet; a * receiver must not, for example, scan through a packet looking for a * particular kind of extension header and process that header prior to * processing all preceding ones. * * We do exactly this. This is a protocol bug. We can't decide after a * seeing an unknown discard-with-error flavour TLV option if it's a * ICMP error message or not (errors should never be send in reply to * ICMP error messages). * * But I see no other way to do this. This might need to be reexamined * when Linux implements ESP (and maybe AUTH) headers. * --AK * * This function parses (probably truncated) exthdr set "hdr". * "nexthdrp" initially points to some place, * where type of the first header can be found. * * It skips all well-known exthdrs, and returns pointer to the start * of unparsable area i.e. the first header with unknown type. * If it is not NULL *nexthdr is updated by type/protocol of this header. * * NOTES: - if packet terminated with NEXTHDR_NONE it returns NULL. * - it may return pointer pointing beyond end of packet, * if the last recognized header is truncated in the middle. * - if packet is truncated, so that all parsed headers are skipped, * it returns NULL. * - First fragment header is skipped, not-first ones * are considered as unparsable. * - Reports the offset field of the final fragment header so it is * possible to tell whether this is a first fragment, later fragment, * or not fragmented. * - ESP is unparsable for now and considered like * normal payload protocol. * - Note also special handling of AUTH header. Thanks to IPsec wizards. * * --ANK (980726) */ int ipv6_skip_exthdr(const struct sk_buff *skb, int start, u8 *nexthdrp, __be16 *frag_offp) { u8 nexthdr = *nexthdrp; *frag_offp = 0; while (ipv6_ext_hdr(nexthdr)) { struct ipv6_opt_hdr _hdr, *hp; int hdrlen; if (nexthdr == NEXTHDR_NONE) return -1; hp = skb_header_pointer(skb, start, sizeof(_hdr), &_hdr); if (!hp) return -1; if (nexthdr == NEXTHDR_FRAGMENT) { __be16 _frag_off, *fp; fp = skb_header_pointer(skb, start+offsetof(struct frag_hdr, frag_off), sizeof(_frag_off), &_frag_off); if (!fp) return -1; *frag_offp = *fp; if (ntohs(*frag_offp) & ~0x7) break; hdrlen = 8; } else if (nexthdr == NEXTHDR_AUTH) hdrlen = ipv6_authlen(hp); else hdrlen = ipv6_optlen(hp); nexthdr = hp->nexthdr; start += hdrlen; } *nexthdrp = nexthdr; return start; } EXPORT_SYMBOL(ipv6_skip_exthdr); int ipv6_find_tlv(const struct sk_buff *skb, int offset, int type) { const unsigned char *nh = skb_network_header(skb); int packet_len = skb_tail_pointer(skb) - skb_network_header(skb); struct ipv6_opt_hdr *hdr; int len; if (offset + 2 > packet_len) goto bad; hdr = (struct ipv6_opt_hdr *)(nh + offset); len = ((hdr->hdrlen + 1) << 3); if (offset + len > packet_len) goto bad; offset += 2; len -= 2; while (len > 0) { int opttype = nh[offset]; int optlen; if (opttype == type) return offset; switch (opttype) { case IPV6_TLV_PAD1: optlen = 1; break; default: if (len < 2) goto bad; optlen = nh[offset + 1] + 2; if (optlen > len) goto bad; break; } offset += optlen; len -= optlen; } /* not_found */ bad: return -1; } EXPORT_SYMBOL_GPL(ipv6_find_tlv); /* * find the offset to specified header or the protocol number of last header * if target < 0. "last header" is transport protocol header, ESP, or * "No next header". * * Note that *offset is used as input/output parameter, and if it is not zero, * then it must be a valid offset to an inner IPv6 header. This can be used * to explore inner IPv6 header, eg. ICMPv6 error messages. * * If target header is found, its offset is set in *offset and return protocol * number. Otherwise, return -1. * * If the first fragment doesn't contain the final protocol header or * NEXTHDR_NONE it is considered invalid. * * Note that non-1st fragment is special case that "the protocol number * of last header" is "next header" field in Fragment header. In this case, * *offset is meaningless and fragment offset is stored in *fragoff if fragoff * isn't NULL. * * if flags is not NULL and it's a fragment, then the frag flag * IP6_FH_F_FRAG will be set. If it's an AH header, the * IP6_FH_F_AUTH flag is set and target < 0, then this function will * stop at the AH header. If IP6_FH_F_SKIP_RH flag was passed, then this * function will skip all those routing headers, where segements_left was 0. */ int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset, int target, unsigned short *fragoff, int *flags) { unsigned int start = skb_network_offset(skb) + sizeof(struct ipv6hdr); u8 nexthdr = ipv6_hdr(skb)->nexthdr; bool found; if (fragoff) *fragoff = 0; if (*offset) { struct ipv6hdr _ip6, *ip6; ip6 = skb_header_pointer(skb, *offset, sizeof(_ip6), &_ip6); if (!ip6 || (ip6->version != 6)) return -EBADMSG; start = *offset + sizeof(struct ipv6hdr); nexthdr = ip6->nexthdr; } do { struct ipv6_opt_hdr _hdr, *hp; unsigned int hdrlen; found = (nexthdr == target); if ((!ipv6_ext_hdr(nexthdr)) || nexthdr == NEXTHDR_NONE) { if (target < 0 || found) break; return -ENOENT; } hp = skb_header_pointer(skb, start, sizeof(_hdr), &_hdr); if (!hp) return -EBADMSG; if (nexthdr == NEXTHDR_ROUTING) { struct ipv6_rt_hdr _rh, *rh; rh = skb_header_pointer(skb, start, sizeof(_rh), &_rh); if (!rh) return -EBADMSG; if (flags && (*flags & IP6_FH_F_SKIP_RH) && rh->segments_left == 0) found = false; } if (nexthdr == NEXTHDR_FRAGMENT) { unsigned short _frag_off; __be16 *fp; if (flags) /* Indicate that this is a fragment */ *flags |= IP6_FH_F_FRAG; fp = skb_header_pointer(skb, start+offsetof(struct frag_hdr, frag_off), sizeof(_frag_off), &_frag_off); if (!fp) return -EBADMSG; _frag_off = ntohs(*fp) & ~0x7; if (_frag_off) { if (target < 0 && ((!ipv6_ext_hdr(hp->nexthdr)) || hp->nexthdr == NEXTHDR_NONE)) { if (fragoff) *fragoff = _frag_off; return hp->nexthdr; } if (!found) return -ENOENT; if (fragoff) *fragoff = _frag_off; break; } hdrlen = 8; } else if (nexthdr == NEXTHDR_AUTH) { if (flags && (*flags & IP6_FH_F_AUTH) && (target < 0)) break; hdrlen = ipv6_authlen(hp); } else hdrlen = ipv6_optlen(hp); if (!found) { nexthdr = hp->nexthdr; start += hdrlen; } } while (!found); *offset = start; return nexthdr; } EXPORT_SYMBOL(ipv6_find_hdr); |
| 75 75 74 4 75 75 75 11 11 11 11 11 16 16 16 16 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 | // SPDX-License-Identifier: GPL-2.0-only /* * LED Triggers Core * * Copyright 2005-2007 Openedhand Ltd. * * Author: Richard Purdie <rpurdie@openedhand.com> */ #include <linux/export.h> #include <linux/kernel.h> #include <linux/list.h> #include <linux/spinlock.h> #include <linux/device.h> #include <linux/timer.h> #include <linux/rwsem.h> #include <linux/leds.h> #include <linux/slab.h> #include <linux/mm.h> #include "leds.h" /* * Nests outside led_cdev->trigger_lock */ static DECLARE_RWSEM(triggers_list_lock); static LIST_HEAD(trigger_list); /* Used by LED Class */ static inline bool trigger_relevant(struct led_classdev *led_cdev, struct led_trigger *trig) { return !trig->trigger_type || trig->trigger_type == led_cdev->trigger_type; } ssize_t led_trigger_write(struct file *filp, struct kobject *kobj, const struct bin_attribute *bin_attr, char *buf, loff_t pos, size_t count) { struct device *dev = kobj_to_dev(kobj); struct led_classdev *led_cdev = dev_get_drvdata(dev); struct led_trigger *trig; int ret = count; mutex_lock(&led_cdev->led_access); if (led_sysfs_is_disabled(led_cdev)) { ret = -EBUSY; goto unlock; } if (sysfs_streq(buf, "none")) { led_trigger_remove(led_cdev); goto unlock; } if (sysfs_streq(buf, "default")) { led_trigger_set_default(led_cdev); goto unlock; } down_read(&triggers_list_lock); list_for_each_entry(trig, &trigger_list, next_trig) { if (sysfs_streq(buf, trig->name) && trigger_relevant(led_cdev, trig)) { down_write(&led_cdev->trigger_lock); led_trigger_set(led_cdev, trig); up_write(&led_cdev->trigger_lock); up_read(&triggers_list_lock); goto unlock; } } /* we come here only if buf matches no trigger */ ret = -EINVAL; up_read(&triggers_list_lock); unlock: mutex_unlock(&led_cdev->led_access); return ret; } EXPORT_SYMBOL_GPL(led_trigger_write); __printf(3, 4) static int led_trigger_snprintf(char *buf, ssize_t size, const char *fmt, ...) { va_list args; int i; va_start(args, fmt); if (size <= 0) i = vsnprintf(NULL, 0, fmt, args); else i = vscnprintf(buf, size, fmt, args); va_end(args); return i; } static int led_trigger_format(char *buf, size_t size, struct led_classdev *led_cdev) { struct led_trigger *trig; int len = led_trigger_snprintf(buf, size, "%s", led_cdev->trigger ? "none" : "[none]"); if (led_cdev->default_trigger) len += led_trigger_snprintf(buf + len, size - len, " default"); list_for_each_entry(trig, &trigger_list, next_trig) { bool hit; if (!trigger_relevant(led_cdev, trig)) continue; hit = led_cdev->trigger && !strcmp(led_cdev->trigger->name, trig->name); len += led_trigger_snprintf(buf + len, size - len, " %s%s%s", hit ? "[" : "", trig->name, hit ? "]" : ""); } len += led_trigger_snprintf(buf + len, size - len, "\n"); return len; } /* * It was stupid to create 10000 cpu triggers, but we are stuck with it now. * Don't make that mistake again. We work around it here by creating binary * attribute, which is not limited by length. This is _not_ good design, do not * copy it. */ ssize_t led_trigger_read(struct file *filp, struct kobject *kobj, const struct bin_attribute *attr, char *buf, loff_t pos, size_t count) { struct device *dev = kobj_to_dev(kobj); struct led_classdev *led_cdev = dev_get_drvdata(dev); void *data; int len; down_read(&triggers_list_lock); down_read(&led_cdev->trigger_lock); len = led_trigger_format(NULL, 0, led_cdev); data = kvmalloc(len + 1, GFP_KERNEL); if (!data) { up_read(&led_cdev->trigger_lock); up_read(&triggers_list_lock); return -ENOMEM; } len = led_trigger_format(data, len + 1, led_cdev); up_read(&led_cdev->trigger_lock); up_read(&triggers_list_lock); len = memory_read_from_buffer(buf, count, &pos, data, len); kvfree(data); return len; } EXPORT_SYMBOL_GPL(led_trigger_read); /* Caller must ensure led_cdev->trigger_lock held */ int led_trigger_set(struct led_classdev *led_cdev, struct led_trigger *trig) { char *event = NULL; char *envp[2]; const char *name; int ret; if (!led_cdev->trigger && !trig) return 0; name = trig ? trig->name : "none"; event = kasprintf(GFP_KERNEL, "TRIGGER=%s", name); /* Remove any existing trigger */ if (led_cdev->trigger) { spin_lock(&led_cdev->trigger->leddev_list_lock); list_del_rcu(&led_cdev->trig_list); spin_unlock(&led_cdev->trigger->leddev_list_lock); /* ensure it's no longer visible on the led_cdevs list */ synchronize_rcu(); cancel_work_sync(&led_cdev->set_brightness_work); led_stop_software_blink(led_cdev); device_remove_groups(led_cdev->dev, led_cdev->trigger->groups); if (led_cdev->trigger->deactivate) led_cdev->trigger->deactivate(led_cdev); led_cdev->trigger = NULL; led_cdev->trigger_data = NULL; led_cdev->activated = false; led_cdev->flags &= ~LED_INIT_DEFAULT_TRIGGER; led_set_brightness(led_cdev, LED_OFF); } if (trig) { spin_lock(&trig->leddev_list_lock); list_add_tail_rcu(&led_cdev->trig_list, &trig->led_cdevs); spin_unlock(&trig->leddev_list_lock); led_cdev->trigger = trig; /* * Some activate() calls use led_trigger_event() to initialize * the brightness of the LED for which the trigger is being set. * Ensure the led_cdev is visible on trig->led_cdevs for this. */ synchronize_rcu(); /* * If "set brightness to 0" is pending in workqueue, * we don't want that to be reordered after ->activate() */ flush_work(&led_cdev->set_brightness_work); ret = 0; if (trig->activate) ret = trig->activate(led_cdev); else led_set_brightness(led_cdev, trig->brightness); if (ret) goto err_activate; ret = device_add_groups(led_cdev->dev, trig->groups); if (ret) { dev_err(led_cdev->dev, "Failed to add trigger attributes\n"); goto err_add_groups; } } if (event) { envp[0] = event; envp[1] = NULL; if (kobject_uevent_env(&led_cdev->dev->kobj, KOBJ_CHANGE, envp)) dev_err(led_cdev->dev, "%s: Error sending uevent\n", __func__); kfree(event); } return 0; err_add_groups: if (trig->deactivate) trig->deactivate(led_cdev); err_activate: spin_lock(&led_cdev->trigger->leddev_list_lock); list_del_rcu(&led_cdev->trig_list); spin_unlock(&led_cdev->trigger->leddev_list_lock); synchronize_rcu(); led_cdev->trigger = NULL; led_cdev->trigger_data = NULL; led_set_brightness(led_cdev, LED_OFF); kfree(event); return ret; } EXPORT_SYMBOL_GPL(led_trigger_set); void led_trigger_remove(struct led_classdev *led_cdev) { down_write(&led_cdev->trigger_lock); led_trigger_set(led_cdev, NULL); up_write(&led_cdev->trigger_lock); } EXPORT_SYMBOL_GPL(led_trigger_remove); static bool led_match_default_trigger(struct led_classdev *led_cdev, struct led_trigger *trig) { if (!strcmp(led_cdev->default_trigger, trig->name) && trigger_relevant(led_cdev, trig)) { led_cdev->flags |= LED_INIT_DEFAULT_TRIGGER; led_trigger_set(led_cdev, trig); return true; } return false; } void led_trigger_set_default(struct led_classdev *led_cdev) { struct led_trigger *trig; bool found = false; if (!led_cdev->default_trigger) return; if (!strcmp(led_cdev->default_trigger, "none")) { led_trigger_remove(led_cdev); return; } down_read(&triggers_list_lock); down_write(&led_cdev->trigger_lock); list_for_each_entry(trig, &trigger_list, next_trig) { found = led_match_default_trigger(led_cdev, trig); if (found) break; } up_write(&led_cdev->trigger_lock); up_read(&triggers_list_lock); /* * If default trigger wasn't found, maybe trigger module isn't loaded yet. * Once loaded it will re-probe with all led_cdev's. */ if (!found) request_module_nowait("ledtrig:%s", led_cdev->default_trigger); } EXPORT_SYMBOL_GPL(led_trigger_set_default); /* LED Trigger Interface */ int led_trigger_register(struct led_trigger *trig) { struct led_classdev *led_cdev; struct led_trigger *_trig; spin_lock_init(&trig->leddev_list_lock); INIT_LIST_HEAD(&trig->led_cdevs); down_write(&triggers_list_lock); /* Make sure the trigger's name isn't already in use */ list_for_each_entry(_trig, &trigger_list, next_trig) { if (!strcmp(_trig->name, trig->name) && (trig->trigger_type == _trig->trigger_type || !trig->trigger_type || !_trig->trigger_type)) { up_write(&triggers_list_lock); return -EEXIST; } } /* Add to the list of led triggers */ list_add_tail(&trig->next_trig, &trigger_list); up_write(&triggers_list_lock); /* Register with any LEDs that have this as a default trigger */ down_read(&leds_list_lock); list_for_each_entry(led_cdev, &leds_list, node) { down_write(&led_cdev->trigger_lock); if (!led_cdev->trigger && led_cdev->default_trigger) led_match_default_trigger(led_cdev, trig); up_write(&led_cdev->trigger_lock); } up_read(&leds_list_lock); return 0; } EXPORT_SYMBOL_GPL(led_trigger_register); void led_trigger_unregister(struct led_trigger *trig) { struct led_classdev *led_cdev; if (list_empty_careful(&trig->next_trig)) return; /* Remove from the list of led triggers */ down_write(&triggers_list_lock); list_del_init(&trig->next_trig); up_write(&triggers_list_lock); /* Remove anyone actively using this trigger */ down_read(&leds_list_lock); list_for_each_entry(led_cdev, &leds_list, node) { down_write(&led_cdev->trigger_lock); if (led_cdev->trigger == trig) led_trigger_set(led_cdev, NULL); up_write(&led_cdev->trigger_lock); } up_read(&leds_list_lock); } EXPORT_SYMBOL_GPL(led_trigger_unregister); static void devm_led_trigger_release(struct device *dev, void *res) { led_trigger_unregister(*(struct led_trigger **)res); } int devm_led_trigger_register(struct device *dev, struct led_trigger *trig) { struct led_trigger **dr; int rc; dr = devres_alloc(devm_led_trigger_release, sizeof(*dr), GFP_KERNEL); if (!dr) return -ENOMEM; *dr = trig; rc = led_trigger_register(trig); if (rc) devres_free(dr); else devres_add(dev, dr); return rc; } EXPORT_SYMBOL_GPL(devm_led_trigger_register); /* Simple LED Trigger Interface */ void led_trigger_event(struct led_trigger *trig, enum led_brightness brightness) { struct led_classdev *led_cdev; if (!trig) return; trig->brightness = brightness; rcu_read_lock(); list_for_each_entry_rcu(led_cdev, &trig->led_cdevs, trig_list) led_set_brightness(led_cdev, brightness); rcu_read_unlock(); } EXPORT_SYMBOL_GPL(led_trigger_event); void led_mc_trigger_event(struct led_trigger *trig, unsigned int *intensity_value, unsigned int num_colors, enum led_brightness brightness) { struct led_classdev *led_cdev; if (!trig) return; rcu_read_lock(); list_for_each_entry_rcu(led_cdev, &trig->led_cdevs, trig_list) { if (!(led_cdev->flags & LED_MULTI_COLOR)) continue; led_mc_set_brightness(led_cdev, intensity_value, num_colors, brightness); } rcu_read_unlock(); } EXPORT_SYMBOL_GPL(led_mc_trigger_event); static void led_trigger_blink_setup(struct led_trigger *trig, unsigned long delay_on, unsigned long delay_off, int oneshot, int invert) { struct led_classdev *led_cdev; if (!trig) return; rcu_read_lock(); list_for_each_entry_rcu(led_cdev, &trig->led_cdevs, trig_list) { if (oneshot) led_blink_set_oneshot(led_cdev, &delay_on, &delay_off, invert); else led_blink_set_nosleep(led_cdev, delay_on, delay_off); } rcu_read_unlock(); } void led_trigger_blink(struct led_trigger *trig, unsigned long delay_on, unsigned long delay_off) { led_trigger_blink_setup(trig, delay_on, delay_off, 0, 0); } EXPORT_SYMBOL_GPL(led_trigger_blink); void led_trigger_blink_oneshot(struct led_trigger *trig, unsigned long delay_on, unsigned long delay_off, int invert) { led_trigger_blink_setup(trig, delay_on, delay_off, 1, invert); } EXPORT_SYMBOL_GPL(led_trigger_blink_oneshot); void led_trigger_register_simple(const char *name, struct led_trigger **tp) { struct led_trigger *trig; int err; trig = kzalloc(sizeof(struct led_trigger), GFP_KERNEL); if (trig) { trig->name = name; err = led_trigger_register(trig); if (err < 0) { kfree(trig); trig = NULL; pr_warn("LED trigger %s failed to register (%d)\n", name, err); } } else { pr_warn("LED trigger %s failed to register (no memory)\n", name); } *tp = trig; } EXPORT_SYMBOL_GPL(led_trigger_register_simple); void led_trigger_unregister_simple(struct led_trigger *trig) { if (trig) led_trigger_unregister(trig); kfree(trig); } EXPORT_SYMBOL_GPL(led_trigger_unregister_simple); |
| 5 1178 1178 1164 6 8 17 21 1 1 3 3 3 3 3 1 3 5 5 50 9 51 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 | // SPDX-License-Identifier: GPL-2.0 /* Copyright 2011-2014 Autronica Fire and Security AS * * Author(s): * 2011-2014 Arvid Brodin, arvid.brodin@alten.se * * Event handling for HSR and PRP devices. */ #include <linux/netdevice.h> #include <net/rtnetlink.h> #include <linux/rculist.h> #include <linux/timer.h> #include <linux/etherdevice.h> #include "hsr_main.h" #include "hsr_device.h" #include "hsr_netlink.h" #include "hsr_framereg.h" #include "hsr_slave.h" static bool hsr_slave_empty(struct hsr_priv *hsr) { struct hsr_port *port; hsr_for_each_port_rtnl(hsr, port) if (port->type != HSR_PT_MASTER) return false; return true; } static int hsr_netdev_notify(struct notifier_block *nb, unsigned long event, void *ptr) { struct hsr_port *port, *master; struct net_device *dev; struct hsr_priv *hsr; LIST_HEAD(list_kill); int mtu_max; int res; dev = netdev_notifier_info_to_dev(ptr); port = hsr_port_get_rtnl(dev); if (!port) { if (!is_hsr_master(dev)) return NOTIFY_DONE; /* Not an HSR device */ hsr = netdev_priv(dev); port = hsr_port_get_hsr(hsr, HSR_PT_MASTER); if (!port) { /* Resend of notification concerning removed device? */ return NOTIFY_DONE; } } else { hsr = port->hsr; } switch (event) { case NETDEV_UP: /* Administrative state DOWN */ case NETDEV_DOWN: /* Administrative state UP */ case NETDEV_CHANGE: /* Link (carrier) state changes */ hsr_check_carrier_and_operstate(hsr); break; case NETDEV_CHANGENAME: if (is_hsr_master(dev)) hsr_debugfs_rename(dev); break; case NETDEV_CHANGEADDR: if (port->type == HSR_PT_MASTER) { /* This should not happen since there's no * ndo_set_mac_address() for HSR devices - i.e. not * supported. */ break; } master = hsr_port_get_hsr(hsr, HSR_PT_MASTER); if (port->type == HSR_PT_SLAVE_A) { eth_hw_addr_set(master->dev, dev->dev_addr); call_netdevice_notifiers(NETDEV_CHANGEADDR, master->dev); if (hsr->prot_version == PRP_V1) { port = hsr_port_get_hsr(hsr, HSR_PT_SLAVE_B); if (port) { eth_hw_addr_set(port->dev, dev->dev_addr); call_netdevice_notifiers(NETDEV_CHANGEADDR, port->dev); } } } /* Make sure we recognize frames from ourselves in hsr_rcv() */ port = hsr_port_get_hsr(hsr, HSR_PT_SLAVE_B); res = hsr_create_self_node(hsr, master->dev->dev_addr, port ? port->dev->dev_addr : master->dev->dev_addr); if (res) netdev_warn(master->dev, "Could not update HSR node address.\n"); break; case NETDEV_CHANGEMTU: if (port->type == HSR_PT_MASTER) break; /* Handled in ndo_change_mtu() */ mtu_max = hsr_get_max_mtu(port->hsr); master = hsr_port_get_hsr(port->hsr, HSR_PT_MASTER); WRITE_ONCE(master->dev->mtu, mtu_max); break; case NETDEV_UNREGISTER: if (!is_hsr_master(dev)) { master = hsr_port_get_hsr(port->hsr, HSR_PT_MASTER); hsr_del_port(port); if (hsr_slave_empty(master->hsr)) { const struct rtnl_link_ops *ops; ops = master->dev->rtnl_link_ops; ops->dellink(master->dev, &list_kill); unregister_netdevice_many(&list_kill); } } break; case NETDEV_PRE_TYPE_CHANGE: /* HSR works only on Ethernet devices. Refuse slave to change * its type. */ return NOTIFY_BAD; } return NOTIFY_DONE; } struct hsr_port *hsr_port_get_hsr(struct hsr_priv *hsr, enum hsr_port_type pt) { struct hsr_port *port; hsr_for_each_port_rtnl(hsr, port) if (port->type == pt) return port; return NULL; } int hsr_get_version(struct net_device *dev, enum hsr_version *ver) { struct hsr_priv *hsr; hsr = netdev_priv(dev); *ver = hsr->prot_version; return 0; } EXPORT_SYMBOL(hsr_get_version); static struct notifier_block hsr_nb = { .notifier_call = hsr_netdev_notify, /* Slave event notifications */ }; static int __init hsr_init(void) { int err; BUILD_BUG_ON(sizeof(struct hsr_tag) != HSR_HLEN); err = register_netdevice_notifier(&hsr_nb); if (err) return err; err = hsr_netlink_init(); if (err) { unregister_netdevice_notifier(&hsr_nb); return err; } return 0; } static void __exit hsr_exit(void) { hsr_netlink_exit(); hsr_debugfs_remove_root(); unregister_netdevice_notifier(&hsr_nb); } module_init(hsr_init); module_exit(hsr_exit); MODULE_DESCRIPTION("High-availability Seamless Redundancy (HSR) driver"); MODULE_LICENSE("GPL"); |
| 1417 651 768 1422 14 14 14 14 14 14 14 14 1380 1378 934 523 1405 2 1 1 1406 26 1382 355 1146 1374 1412 98 652 761 955 539 1383 1415 64 64 65 2 21 45 64 64 65 65 5 1 1 1 2 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 | // SPDX-License-Identifier: GPL-2.0 /* * linux/fs/read_write.c * * Copyright (C) 1991, 1992 Linus Torvalds */ #include <linux/slab.h> #include <linux/stat.h> #include <linux/sched/xacct.h> #include <linux/fcntl.h> #include <linux/file.h> #include <linux/uio.h> #include <linux/fsnotify.h> #include <linux/security.h> #include <linux/export.h> #include <linux/syscalls.h> #include <linux/pagemap.h> #include <linux/splice.h> #include <linux/compat.h> #include <linux/mount.h> #include <linux/fs.h> #include "internal.h" #include <linux/uaccess.h> #include <asm/unistd.h> const struct file_operations generic_ro_fops = { .llseek = generic_file_llseek, .read_iter = generic_file_read_iter, .mmap_prepare = generic_file_readonly_mmap_prepare, .splice_read = filemap_splice_read, }; EXPORT_SYMBOL(generic_ro_fops); static inline bool unsigned_offsets(struct file *file) { return file->f_op->fop_flags & FOP_UNSIGNED_OFFSET; } /** * vfs_setpos_cookie - update the file offset for lseek and reset cookie * @file: file structure in question * @offset: file offset to seek to * @maxsize: maximum file size * @cookie: cookie to reset * * Update the file offset to the value specified by @offset if the given * offset is valid and it is not equal to the current file offset and * reset the specified cookie to indicate that a seek happened. * * Return the specified offset on success and -EINVAL on invalid offset. */ static loff_t vfs_setpos_cookie(struct file *file, loff_t offset, loff_t maxsize, u64 *cookie) { if (offset < 0 && !unsigned_offsets(file)) return -EINVAL; if (offset > maxsize) return -EINVAL; if (offset != file->f_pos) { file->f_pos = offset; if (cookie) *cookie = 0; } return offset; } /** * vfs_setpos - update the file offset for lseek * @file: file structure in question * @offset: file offset to seek to * @maxsize: maximum file size * * This is a low-level filesystem helper for updating the file offset to * the value specified by @offset if the given offset is valid and it is * not equal to the current file offset. * * Return the specified offset on success and -EINVAL on invalid offset. */ loff_t vfs_setpos(struct file *file, loff_t offset, loff_t maxsize) { return vfs_setpos_cookie(file, offset, maxsize, NULL); } EXPORT_SYMBOL(vfs_setpos); /** * must_set_pos - check whether f_pos has to be updated * @file: file to seek on * @offset: offset to use * @whence: type of seek operation * @eof: end of file * * Check whether f_pos needs to be updated and update @offset according * to @whence. * * Return: 0 if f_pos doesn't need to be updated, 1 if f_pos has to be * updated, and negative error code on failure. */ static int must_set_pos(struct file *file, loff_t *offset, int whence, loff_t eof) { switch (whence) { case SEEK_END: *offset += eof; break; case SEEK_CUR: /* * Here we special-case the lseek(fd, 0, SEEK_CUR) * position-querying operation. Avoid rewriting the "same" * f_pos value back to the file because a concurrent read(), * write() or lseek() might have altered it */ if (*offset == 0) { *offset = file->f_pos; return 0; } break; case SEEK_DATA: /* * In the generic case the entire file is data, so as long as * offset isn't at the end of the file then the offset is data. */ if ((unsigned long long)*offset >= eof) return -ENXIO; break; case SEEK_HOLE: /* * There is a virtual hole at the end of the file, so as long as * offset isn't i_size or larger, return i_size. */ if ((unsigned long long)*offset >= eof) return -ENXIO; *offset = eof; break; } return 1; } /** * generic_file_llseek_size - generic llseek implementation for regular files * @file: file structure to seek on * @offset: file offset to seek to * @whence: type of seek * @maxsize: max size of this file in file system * @eof: offset used for SEEK_END position * * This is a variant of generic_file_llseek that allows passing in a custom * maximum file size and a custom EOF position, for e.g. hashed directories * * Synchronization: * SEEK_SET and SEEK_END are unsynchronized (but atomic on 64bit platforms) * SEEK_CUR is synchronized against other SEEK_CURs, but not read/writes. * read/writes behave like SEEK_SET against seeks. */ loff_t generic_file_llseek_size(struct file *file, loff_t offset, int whence, loff_t maxsize, loff_t eof) { int ret; ret = must_set_pos(file, &offset, whence, eof); if (ret < 0) return ret; if (ret == 0) return offset; if (whence == SEEK_CUR) { /* * If the file requires locking via f_pos_lock we know * that mutual exclusion for SEEK_CUR on the same file * is guaranteed. If the file isn't locked, we take * f_lock to protect against f_pos races with other * SEEK_CURs. */ if (file_seek_cur_needs_f_lock(file)) { guard(spinlock)(&file->f_lock); return vfs_setpos(file, file->f_pos + offset, maxsize); } return vfs_setpos(file, file->f_pos + offset, maxsize); } return vfs_setpos(file, offset, maxsize); } EXPORT_SYMBOL(generic_file_llseek_size); /** * generic_llseek_cookie - versioned llseek implementation * @file: file structure to seek on * @offset: file offset to seek to * @whence: type of seek * @cookie: cookie to update * * See generic_file_llseek for a general description and locking assumptions. * * In contrast to generic_file_llseek, this function also resets a * specified cookie to indicate a seek took place. */ loff_t generic_llseek_cookie(struct file *file, loff_t offset, int whence, u64 *cookie) { struct inode *inode = file->f_mapping->host; loff_t maxsize = inode->i_sb->s_maxbytes; loff_t eof = i_size_read(inode); int ret; if (WARN_ON_ONCE(!cookie)) return -EINVAL; /* * Require that this is only used for directories that guarantee * synchronization between readdir and seek so that an update to * @cookie is correctly synchronized with concurrent readdir. */ if (WARN_ON_ONCE(!(file->f_mode & FMODE_ATOMIC_POS))) return -EINVAL; ret = must_set_pos(file, &offset, whence, eof); if (ret < 0) return ret; if (ret == 0) return offset; /* No need to hold f_lock because we know that f_pos_lock is held. */ if (whence == SEEK_CUR) return vfs_setpos_cookie(file, file->f_pos + offset, maxsize, cookie); return vfs_setpos_cookie(file, offset, maxsize, cookie); } EXPORT_SYMBOL(generic_llseek_cookie); /** * generic_file_llseek - generic llseek implementation for regular files * @file: file structure to seek on * @offset: file offset to seek to * @whence: type of seek * * This is a generic implementation of ->llseek useable for all normal local * filesystems. It just updates the file offset to the value specified by * @offset and @whence. */ loff_t generic_file_llseek(struct file *file, loff_t offset, int whence) { struct inode *inode = file->f_mapping->host; return generic_file_llseek_size(file, offset, whence, inode->i_sb->s_maxbytes, i_size_read(inode)); } EXPORT_SYMBOL(generic_file_llseek); /** * fixed_size_llseek - llseek implementation for fixed-sized devices * @file: file structure to seek on * @offset: file offset to seek to * @whence: type of seek * @size: size of the file * */ loff_t fixed_size_llseek(struct file *file, loff_t offset, int whence, loff_t size) { switch (whence) { case SEEK_SET: case SEEK_CUR: case SEEK_END: return generic_file_llseek_size(file, offset, whence, size, size); default: return -EINVAL; } } EXPORT_SYMBOL(fixed_size_llseek); /** * no_seek_end_llseek - llseek implementation for fixed-sized devices * @file: file structure to seek on * @offset: file offset to seek to * @whence: type of seek * */ loff_t no_seek_end_llseek(struct file *file, loff_t offset, int whence) { switch (whence) { case SEEK_SET: case SEEK_CUR: return generic_file_llseek_size(file, offset, whence, OFFSET_MAX, 0); default: return -EINVAL; } } EXPORT_SYMBOL(no_seek_end_llseek); /** * no_seek_end_llseek_size - llseek implementation for fixed-sized devices * @file: file structure to seek on * @offset: file offset to seek to * @whence: type of seek * @size: maximal offset allowed * */ loff_t no_seek_end_llseek_size(struct file *file, loff_t offset, int whence, loff_t size) { switch (whence) { case SEEK_SET: case SEEK_CUR: return generic_file_llseek_size(file, offset, whence, size, 0); default: return -EINVAL; } } EXPORT_SYMBOL(no_seek_end_llseek_size); /** * noop_llseek - No Operation Performed llseek implementation * @file: file structure to seek on * @offset: file offset to seek to * @whence: type of seek * * This is an implementation of ->llseek useable for the rare special case when * userspace expects the seek to succeed but the (device) file is actually not * able to perform the seek. In this case you use noop_llseek() instead of * falling back to the default implementation of ->llseek. */ loff_t noop_llseek(struct file *file, loff_t offset, int whence) { return file->f_pos; } EXPORT_SYMBOL(noop_llseek); loff_t default_llseek(struct file *file, loff_t offset, int whence) { struct inode *inode = file_inode(file); loff_t retval; retval = inode_lock_killable(inode); if (retval) return retval; switch (whence) { case SEEK_END: offset += i_size_read(inode); break; case SEEK_CUR: if (offset == 0) { retval = file->f_pos; goto out; } offset += file->f_pos; break; case SEEK_DATA: /* * In the generic case the entire file is data, so as * long as offset isn't at the end of the file then the * offset is data. */ if (offset >= inode->i_size) { retval = -ENXIO; goto out; } break; case SEEK_HOLE: /* * There is a virtual hole at the end of the file, so * as long as offset isn't i_size or larger, return * i_size. */ if (offset >= inode->i_size) { retval = -ENXIO; goto out; } offset = inode->i_size; break; } retval = -EINVAL; if (offset >= 0 || unsigned_offsets(file)) { if (offset != file->f_pos) file->f_pos = offset; retval = offset; } out: inode_unlock(inode); return retval; } EXPORT_SYMBOL(default_llseek); loff_t vfs_llseek(struct file *file, loff_t offset, int whence) { if (!(file->f_mode & FMODE_LSEEK)) return -ESPIPE; return file->f_op->llseek(file, offset, whence); } EXPORT_SYMBOL(vfs_llseek); static off_t ksys_lseek(unsigned int fd, off_t offset, unsigned int whence) { off_t retval; CLASS(fd_pos, f)(fd); if (fd_empty(f)) return -EBADF; retval = -EINVAL; if (whence <= SEEK_MAX) { loff_t res = vfs_llseek(fd_file(f), offset, whence); retval = res; if (res != (loff_t)retval) retval = -EOVERFLOW; /* LFS: should only happen on 32 bit platforms */ } return retval; } SYSCALL_DEFINE3(lseek, unsigned int, fd, off_t, offset, unsigned int, whence) { return ksys_lseek(fd, offset, whence); } #ifdef CONFIG_COMPAT COMPAT_SYSCALL_DEFINE3(lseek, unsigned int, fd, compat_off_t, offset, unsigned int, whence) { return ksys_lseek(fd, offset, whence); } #endif #if !defined(CONFIG_64BIT) || defined(CONFIG_COMPAT) || \ defined(__ARCH_WANT_SYS_LLSEEK) SYSCALL_DEFINE5(llseek, unsigned int, fd, unsigned long, offset_high, unsigned long, offset_low, loff_t __user *, result, unsigned int, whence) { int retval; CLASS(fd_pos, f)(fd); loff_t offset; if (fd_empty(f)) return -EBADF; if (whence > SEEK_MAX) return -EINVAL; offset = vfs_llseek(fd_file(f), ((loff_t) offset_high << 32) | offset_low, whence); retval = (int)offset; if (offset >= 0) { retval = -EFAULT; if (!copy_to_user(result, &offset, sizeof(offset))) retval = 0; } return retval; } #endif int rw_verify_area(int read_write, struct file *file, const loff_t *ppos, size_t count) { int mask = read_write == READ ? MAY_READ : MAY_WRITE; int ret; if (unlikely((ssize_t) count < 0)) return -EINVAL; if (ppos) { loff_t pos = *ppos; if (unlikely(pos < 0)) { if (!unsigned_offsets(file)) return -EINVAL; if (count >= -pos) /* both values are in 0..LLONG_MAX */ return -EOVERFLOW; } else if (unlikely((loff_t) (pos + count) < 0)) { if (!unsigned_offsets(file)) return -EINVAL; } } ret = security_file_permission(file, mask); if (ret) return ret; return fsnotify_file_area_perm(file, mask, ppos, count); } EXPORT_SYMBOL(rw_verify_area); static ssize_t new_sync_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos) { struct kiocb kiocb; struct iov_iter iter; ssize_t ret; init_sync_kiocb(&kiocb, filp); kiocb.ki_pos = (ppos ? *ppos : 0); iov_iter_ubuf(&iter, ITER_DEST, buf, len); ret = filp->f_op->read_iter(&kiocb, &iter); BUG_ON(ret == -EIOCBQUEUED); if (ppos) *ppos = kiocb.ki_pos; return ret; } static int warn_unsupported(struct file *file, const char *op) { pr_warn_ratelimited( "kernel %s not supported for file %pD4 (pid: %d comm: %.20s)\n", op, file, current->pid, current->comm); return -EINVAL; } ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos) { struct kvec iov = { .iov_base = buf, .iov_len = min_t(size_t, count, MAX_RW_COUNT), }; struct kiocb kiocb; struct iov_iter iter; ssize_t ret; if (WARN_ON_ONCE(!(file->f_mode & FMODE_READ))) return -EINVAL; if (!(file->f_mode & FMODE_CAN_READ)) return -EINVAL; /* * Also fail if ->read_iter and ->read are both wired up as that * implies very convoluted semantics. */ if (unlikely(!file->f_op->read_iter || file->f_op->read)) return warn_unsupported(file, "read"); init_sync_kiocb(&kiocb, file); kiocb.ki_pos = pos ? *pos : 0; iov_iter_kvec(&iter, ITER_DEST, &iov, 1, iov.iov_len); ret = file->f_op->read_iter(&kiocb, &iter); if (ret > 0) { if (pos) *pos = kiocb.ki_pos; fsnotify_access(file); add_rchar(current, ret); } inc_syscr(current); return ret; } ssize_t kernel_read(struct file *file, void *buf, size_t count, loff_t *pos) { ssize_t ret; ret = rw_verify_area(READ, file, pos, count); if (ret) return ret; return __kernel_read(file, buf, count, pos); } EXPORT_SYMBOL(kernel_read); ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos) { ssize_t ret; if (!(file->f_mode & FMODE_READ)) return -EBADF; if (!(file->f_mode & FMODE_CAN_READ)) return -EINVAL; if (unlikely(!access_ok(buf, count))) return -EFAULT; ret = rw_verify_area(READ, file, pos, count); if (ret) return ret; if (count > MAX_RW_COUNT) count = MAX_RW_COUNT; if (file->f_op->read) ret = file->f_op->read(file, buf, count, pos); else if (file->f_op->read_iter) ret = new_sync_read(file, buf, count, pos); else ret = -EINVAL; if (ret > 0) { fsnotify_access(file); add_rchar(current, ret); } inc_syscr(current); return ret; } static ssize_t new_sync_write(struct file *filp, const char __user *buf, size_t len, loff_t *ppos) { struct kiocb kiocb; struct iov_iter iter; ssize_t ret; init_sync_kiocb(&kiocb, filp); kiocb.ki_pos = (ppos ? *ppos : 0); iov_iter_ubuf(&iter, ITER_SOURCE, (void __user *)buf, len); ret = filp->f_op->write_iter(&kiocb, &iter); BUG_ON(ret == -EIOCBQUEUED); if (ret > 0 && ppos) *ppos = kiocb.ki_pos; return ret; } /* caller is responsible for file_start_write/file_end_write */ ssize_t __kernel_write_iter(struct file *file, struct iov_iter *from, loff_t *pos) { struct kiocb kiocb; ssize_t ret; if (WARN_ON_ONCE(!(file->f_mode & FMODE_WRITE))) return -EBADF; if (!(file->f_mode & FMODE_CAN_WRITE)) return -EINVAL; /* * Also fail if ->write_iter and ->write are both wired up as that * implies very convoluted semantics. */ if (unlikely(!file->f_op->write_iter || file->f_op->write)) return warn_unsupported(file, "write"); init_sync_kiocb(&kiocb, file); kiocb.ki_pos = pos ? *pos : 0; ret = file->f_op->write_iter(&kiocb, from); if (ret > 0) { if (pos) *pos = kiocb.ki_pos; fsnotify_modify(file); add_wchar(current, ret); } inc_syscw(current); return ret; } /* caller is responsible for file_start_write/file_end_write */ ssize_t __kernel_write(struct file *file, const void *buf, size_t count, loff_t *pos) { struct kvec iov = { .iov_base = (void *)buf, .iov_len = min_t(size_t, count, MAX_RW_COUNT), }; struct iov_iter iter; iov_iter_kvec(&iter, ITER_SOURCE, &iov, 1, iov.iov_len); return __kernel_write_iter(file, &iter, pos); } /* * This "EXPORT_SYMBOL_GPL()" is more of a "EXPORT_SYMBOL_DONTUSE()", * but autofs is one of the few internal kernel users that actually * wants this _and_ can be built as a module. So we need to export * this symbol for autofs, even though it really isn't appropriate * for any other kernel modules. */ EXPORT_SYMBOL_GPL(__kernel_write); ssize_t kernel_write(struct file *file, const void *buf, size_t count, loff_t *pos) { ssize_t ret; ret = rw_verify_area(WRITE, file, pos, count); if (ret) return ret; file_start_write(file); ret = __kernel_write(file, buf, count, pos); file_end_write(file); return ret; } EXPORT_SYMBOL(kernel_write); ssize_t vfs_write(struct file *file, const char __user *buf, size_t count, loff_t *pos) { ssize_t ret; if (!(file->f_mode & FMODE_WRITE)) return -EBADF; if (!(file->f_mode & FMODE_CAN_WRITE)) return -EINVAL; if (unlikely(!access_ok(buf, count))) return -EFAULT; ret = rw_verify_area(WRITE, file, pos, count); if (ret) return ret; if (count > MAX_RW_COUNT) count = MAX_RW_COUNT; file_start_write(file); if (file->f_op->write) ret = file->f_op->write(file, buf, count, pos); else if (file->f_op->write_iter) ret = new_sync_write(file, buf, count, pos); else ret = -EINVAL; if (ret > 0) { fsnotify_modify(file); add_wchar(current, ret); } inc_syscw(current); file_end_write(file); return ret; } /* file_ppos returns &file->f_pos or NULL if file is stream */ static inline loff_t *file_ppos(struct file *file) { return file->f_mode & FMODE_STREAM ? NULL : &file->f_pos; } ssize_t ksys_read(unsigned int fd, char __user *buf, size_t count) { CLASS(fd_pos, f)(fd); ssize_t ret = -EBADF; if (!fd_empty(f)) { loff_t pos, *ppos = file_ppos(fd_file(f)); if (ppos) { pos = *ppos; ppos = &pos; } ret = vfs_read(fd_file(f), buf, count, ppos); if (ret >= 0 && ppos) fd_file(f)->f_pos = pos; } return ret; } SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count) { return ksys_read(fd, buf, count); } ssize_t ksys_write(unsigned int fd, const char __user *buf, size_t count) { CLASS(fd_pos, f)(fd); ssize_t ret = -EBADF; if (!fd_empty(f)) { loff_t pos, *ppos = file_ppos(fd_file(f)); if (ppos) { pos = *ppos; ppos = &pos; } ret = vfs_write(fd_file(f), buf, count, ppos); if (ret >= 0 && ppos) fd_file(f)->f_pos = pos; } return ret; } SYSCALL_DEFINE3(write, unsigned int, fd, const char __user *, buf, size_t, count) { return ksys_write(fd, buf, count); } ssize_t ksys_pread64(unsigned int fd, char __user *buf, size_t count, loff_t pos) { if (pos < 0) return -EINVAL; CLASS(fd, f)(fd); if (fd_empty(f)) return -EBADF; if (fd_file(f)->f_mode & FMODE_PREAD) return vfs_read(fd_file(f), buf, count, &pos); return -ESPIPE; } SYSCALL_DEFINE4(pread64, unsigned int, fd, char __user *, buf, size_t, count, loff_t, pos) { return ksys_pread64(fd, buf, count, pos); } #if defined(CONFIG_COMPAT) && defined(__ARCH_WANT_COMPAT_PREAD64) COMPAT_SYSCALL_DEFINE5(pread64, unsigned int, fd, char __user *, buf, size_t, count, compat_arg_u64_dual(pos)) { return ksys_pread64(fd, buf, count, compat_arg_u64_glue(pos)); } #endif ssize_t ksys_pwrite64(unsigned int fd, const char __user *buf, size_t count, loff_t pos) { if (pos < 0) return -EINVAL; CLASS(fd, f)(fd); if (fd_empty(f)) return -EBADF; if (fd_file(f)->f_mode & FMODE_PWRITE) return vfs_write(fd_file(f), buf, count, &pos); return -ESPIPE; } SYSCALL_DEFINE4(pwrite64, unsigned int, fd, const char __user *, buf, size_t, count, loff_t, pos) { return ksys_pwrite64(fd, buf, count, pos); } #if defined(CONFIG_COMPAT) && defined(__ARCH_WANT_COMPAT_PWRITE64) COMPAT_SYSCALL_DEFINE5(pwrite64, unsigned int, fd, const char __user *, buf, size_t, count, compat_arg_u64_dual(pos)) { return ksys_pwrite64(fd, buf, count, compat_arg_u64_glue(pos)); } #endif static ssize_t do_iter_readv_writev(struct file *filp, struct iov_iter *iter, loff_t *ppos, int type, rwf_t flags) { struct kiocb kiocb; ssize_t ret; init_sync_kiocb(&kiocb, filp); ret = kiocb_set_rw_flags(&kiocb, flags, type); if (ret) return ret; kiocb.ki_pos = (ppos ? *ppos : 0); if (type == READ) ret = filp->f_op->read_iter(&kiocb, iter); else ret = filp->f_op->write_iter(&kiocb, iter); BUG_ON(ret == -EIOCBQUEUED); if (ppos) *ppos = kiocb.ki_pos; return ret; } /* Do it by hand, with file-ops */ static ssize_t do_loop_readv_writev(struct file *filp, struct iov_iter *iter, loff_t *ppos, int type, rwf_t flags) { ssize_t ret = 0; if (flags & ~RWF_HIPRI) return -EOPNOTSUPP; while (iov_iter_count(iter)) { ssize_t nr; if (type == READ) { nr = filp->f_op->read(filp, iter_iov_addr(iter), iter_iov_len(iter), ppos); } else { nr = filp->f_op->write(filp, iter_iov_addr(iter), iter_iov_len(iter), ppos); } if (nr < 0) { if (!ret) ret = nr; break; } ret += nr; if (nr != iter_iov_len(iter)) break; iov_iter_advance(iter, nr); } return ret; } ssize_t vfs_iocb_iter_read(struct file *file, struct kiocb *iocb, struct iov_iter *iter) { size_t tot_len; ssize_t ret = 0; if (!file->f_op->read_iter) return -EINVAL; if (!(file->f_mode & FMODE_READ)) return -EBADF; if (!(file->f_mode & FMODE_CAN_READ)) return -EINVAL; tot_len = iov_iter_count(iter); if (!tot_len) goto out; ret = rw_verify_area(READ, file, &iocb->ki_pos, tot_len); if (ret < 0) return ret; ret = file->f_op->read_iter(iocb, iter); out: if (ret >= 0) fsnotify_access(file); return ret; } EXPORT_SYMBOL(vfs_iocb_iter_read); ssize_t vfs_iter_read(struct file *file, struct iov_iter *iter, loff_t *ppos, rwf_t flags) { size_t tot_len; ssize_t ret = 0; if (!file->f_op->read_iter) return -EINVAL; if (!(file->f_mode & FMODE_READ)) return -EBADF; if (!(file->f_mode & FMODE_CAN_READ)) return -EINVAL; tot_len = iov_iter_count(iter); if (!tot_len) goto out; ret = rw_verify_area(READ, file, ppos, tot_len); if (ret < 0) return ret; ret = do_iter_readv_writev(file, iter, ppos, READ, flags); out: if (ret >= 0) fsnotify_access(file); return ret; } EXPORT_SYMBOL(vfs_iter_read); /* * Caller is responsible for calling kiocb_end_write() on completion * if async iocb was queued. */ ssize_t vfs_iocb_iter_write(struct file *file, struct kiocb *iocb, struct iov_iter *iter) { size_t tot_len; ssize_t ret = 0; if (!file->f_op->write_iter) return -EINVAL; if (!(file->f_mode & FMODE_WRITE)) return -EBADF; if (!(file->f_mode & FMODE_CAN_WRITE)) return -EINVAL; tot_len = iov_iter_count(iter); if (!tot_len) return 0; ret = rw_verify_area(WRITE, file, &iocb->ki_pos, tot_len); if (ret < 0) return ret; kiocb_start_write(iocb); ret = file->f_op->write_iter(iocb, iter); if (ret != -EIOCBQUEUED) kiocb_end_write(iocb); if (ret > 0) fsnotify_modify(file); return ret; } EXPORT_SYMBOL(vfs_iocb_iter_write); ssize_t vfs_iter_write(struct file *file, struct iov_iter *iter, loff_t *ppos, rwf_t flags) { size_t tot_len; ssize_t ret; if (!(file->f_mode & FMODE_WRITE)) return -EBADF; if (!(file->f_mode & FMODE_CAN_WRITE)) return -EINVAL; if (!file->f_op->write_iter) return -EINVAL; tot_len = iov_iter_count(iter); if (!tot_len) return 0; ret = rw_verify_area(WRITE, file, ppos, tot_len); if (ret < 0) return ret; file_start_write(file); ret = do_iter_readv_writev(file, iter, ppos, WRITE, flags); if (ret > 0) fsnotify_modify(file); file_end_write(file); return ret; } EXPORT_SYMBOL(vfs_iter_write); static ssize_t vfs_readv(struct file *file, const struct iovec __user *vec, unsigned long vlen, loff_t *pos, rwf_t flags) { struct iovec iovstack[UIO_FASTIOV]; struct iovec *iov = iovstack; struct iov_iter iter; size_t tot_len; ssize_t ret = 0; if (!(file->f_mode & FMODE_READ)) return -EBADF; if (!(file->f_mode & FMODE_CAN_READ)) return -EINVAL; ret = import_iovec(ITER_DEST, vec, vlen, ARRAY_SIZE(iovstack), &iov, &iter); if (ret < 0) return ret; tot_len = iov_iter_count(&iter); if (!tot_len) goto out; ret = rw_verify_area(READ, file, pos, tot_len); if (ret < 0) goto out; if (file->f_op->read_iter) ret = do_iter_readv_writev(file, &iter, pos, READ, flags); else ret = do_loop_readv_writev(file, &iter, pos, READ, flags); out: if (ret >= 0) fsnotify_access(file); kfree(iov); return ret; } static ssize_t vfs_writev(struct file *file, const struct iovec __user *vec, unsigned long vlen, loff_t *pos, rwf_t flags) { struct iovec iovstack[UIO_FASTIOV]; struct iovec *iov = iovstack; struct iov_iter iter; size_t tot_len; ssize_t ret = 0; if (!(file->f_mode & FMODE_WRITE)) return -EBADF; if (!(file->f_mode & FMODE_CAN_WRITE)) return -EINVAL; ret = import_iovec(ITER_SOURCE, vec, vlen, ARRAY_SIZE(iovstack), &iov, &iter); if (ret < 0) return ret; tot_len = iov_iter_count(&iter); if (!tot_len) goto out; ret = rw_verify_area(WRITE, file, pos, tot_len); if (ret < 0) goto out; file_start_write(file); if (file->f_op->write_iter) ret = do_iter_readv_writev(file, &iter, pos, WRITE, flags); else ret = do_loop_readv_writev(file, &iter, pos, WRITE, flags); if (ret > 0) fsnotify_modify(file); file_end_write(file); out: kfree(iov); return ret; } static ssize_t do_readv(unsigned long fd, const struct iovec __user *vec, unsigned long vlen, rwf_t flags) { CLASS(fd_pos, f)(fd); ssize_t ret = -EBADF; if (!fd_empty(f)) { loff_t pos, *ppos = file_ppos(fd_file(f)); if (ppos) { pos = *ppos; ppos = &pos; } ret = vfs_readv(fd_file(f), vec, vlen, ppos, flags); if (ret >= 0 && ppos) fd_file(f)->f_pos = pos; } if (ret > 0) add_rchar(current, ret); inc_syscr(current); return ret; } static ssize_t do_writev(unsigned long fd, const struct iovec __user *vec, unsigned long vlen, rwf_t flags) { CLASS(fd_pos, f)(fd); ssize_t ret = -EBADF; if (!fd_empty(f)) { loff_t pos, *ppos = file_ppos(fd_file(f)); if (ppos) { pos = *ppos; ppos = &pos; } ret = vfs_writev(fd_file(f), vec, vlen, ppos, flags); if (ret >= 0 && ppos) fd_file(f)->f_pos = pos; } if (ret > 0) add_wchar(current, ret); inc_syscw(current); return ret; } static inline loff_t pos_from_hilo(unsigned long high, unsigned long low) { #define HALF_LONG_BITS (BITS_PER_LONG / 2) return (((loff_t)high << HALF_LONG_BITS) << HALF_LONG_BITS) | low; } static ssize_t do_preadv(unsigned long fd, const struct iovec __user *vec, unsigned long vlen, loff_t pos, rwf_t flags) { ssize_t ret = -EBADF; if (pos < 0) return -EINVAL; CLASS(fd, f)(fd); if (!fd_empty(f)) { ret = -ESPIPE; if (fd_file(f)->f_mode & FMODE_PREAD) ret = vfs_readv(fd_file(f), vec, vlen, &pos, flags); } if (ret > 0) add_rchar(current, ret); inc_syscr(current); return ret; } static ssize_t do_pwritev(unsigned long fd, const struct iovec __user *vec, unsigned long vlen, loff_t pos, rwf_t flags) { ssize_t ret = -EBADF; if (pos < 0) return -EINVAL; CLASS(fd, f)(fd); if (!fd_empty(f)) { ret = -ESPIPE; if (fd_file(f)->f_mode & FMODE_PWRITE) ret = vfs_writev(fd_file(f), vec, vlen, &pos, flags); } if (ret > 0) add_wchar(current, ret); inc_syscw(current); return ret; } SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec, unsigned long, vlen) { return do_readv(fd, vec, vlen, 0); } SYSCALL_DEFINE3(writev, unsigned long, fd, const struct iovec __user *, vec, unsigned long, vlen) { return do_writev(fd, vec, vlen, 0); } SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec, unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h) { loff_t pos = pos_from_hilo(pos_h, pos_l); return do_preadv(fd, vec, vlen, pos, 0); } SYSCALL_DEFINE6(preadv2, unsigned long, fd, const struct iovec __user *, vec, unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h, rwf_t, flags) { loff_t pos = pos_from_hilo(pos_h, pos_l); if (pos == -1) return do_readv(fd, vec, vlen, flags); return do_preadv(fd, vec, vlen, pos, flags); } SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec, unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h) { loff_t pos = pos_from_hilo(pos_h, pos_l); return do_pwritev(fd, vec, vlen, pos, 0); } SYSCALL_DEFINE6(pwritev2, unsigned long, fd, const struct iovec __user *, vec, unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h, rwf_t, flags) { loff_t pos = pos_from_hilo(pos_h, pos_l); if (pos == -1) return do_writev(fd, vec, vlen, flags); return do_pwritev(fd, vec, vlen, pos, flags); } /* * Various compat syscalls. Note that they all pretend to take a native * iovec - import_iovec will properly treat those as compat_iovecs based on * in_compat_syscall(). */ #ifdef CONFIG_COMPAT #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64 COMPAT_SYSCALL_DEFINE4(preadv64, unsigned long, fd, const struct iovec __user *, vec, unsigned long, vlen, loff_t, pos) { return do_preadv(fd, vec, vlen, pos, 0); } #endif COMPAT_SYSCALL_DEFINE5(preadv, compat_ulong_t, fd, const struct iovec __user *, vec, compat_ulong_t, vlen, u32, pos_low, u32, pos_high) { loff_t pos = ((loff_t)pos_high << 32) | pos_low; return do_preadv(fd, vec, vlen, pos, 0); } #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64V2 COMPAT_SYSCALL_DEFINE5(preadv64v2, unsigned long, fd, const struct iovec __user *, vec, unsigned long, vlen, loff_t, pos, rwf_t, flags) { if (pos == -1) return do_readv(fd, vec, vlen, flags); return do_preadv(fd, vec, vlen, pos, flags); } #endif COMPAT_SYSCALL_DEFINE6(preadv2, compat_ulong_t, fd, const struct iovec __user *, vec, compat_ulong_t, vlen, u32, pos_low, u32, pos_high, rwf_t, flags) { loff_t pos = ((loff_t)pos_high << 32) | pos_low; if (pos == -1) return do_readv(fd, vec, vlen, flags); return do_preadv(fd, vec, vlen, pos, flags); } #ifdef __ARCH_WANT_COMPAT_SYS_PWRITEV64 COMPAT_SYSCALL_DEFINE4(pwritev64, unsigned long, fd, const struct iovec __user *, vec, unsigned long, vlen, loff_t, pos) { return do_pwritev(fd, vec, vlen, pos, 0); } #endif COMPAT_SYSCALL_DEFINE5(pwritev, compat_ulong_t, fd, const struct iovec __user *,vec, compat_ulong_t, vlen, u32, pos_low, u32, pos_high) { loff_t pos = ((loff_t)pos_high << 32) | pos_low; return do_pwritev(fd, vec, vlen, pos, 0); } #ifdef __ARCH_WANT_COMPAT_SYS_PWRITEV64V2 COMPAT_SYSCALL_DEFINE5(pwritev64v2, unsigned long, fd, const struct iovec __user *, vec, unsigned long, vlen, loff_t, pos, rwf_t, flags) { if (pos == -1) return do_writev(fd, vec, vlen, flags); return do_pwritev(fd, vec, vlen, pos, flags); } #endif COMPAT_SYSCALL_DEFINE6(pwritev2, compat_ulong_t, fd, const struct iovec __user *,vec, compat_ulong_t, vlen, u32, pos_low, u32, pos_high, rwf_t, flags) { loff_t pos = ((loff_t)pos_high << 32) | pos_low; if (pos == -1) return do_writev(fd, vec, vlen, flags); return do_pwritev(fd, vec, vlen, pos, flags); } #endif /* CONFIG_COMPAT */ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos, size_t count, loff_t max) { struct inode *in_inode, *out_inode; struct pipe_inode_info *opipe; loff_t pos; loff_t out_pos; ssize_t retval; int fl; /* * Get input file, and verify that it is ok.. */ CLASS(fd, in)(in_fd); if (fd_empty(in)) return -EBADF; if (!(fd_file(in)->f_mode & FMODE_READ)) return -EBADF; if (!ppos) { pos = fd_file(in)->f_pos; } else { pos = *ppos; if (!(fd_file(in)->f_mode & FMODE_PREAD)) return -ESPIPE; } retval = rw_verify_area(READ, fd_file(in), &pos, count); if (retval < 0) return retval; if (count > MAX_RW_COUNT) count = MAX_RW_COUNT; /* * Get output file, and verify that it is ok.. */ CLASS(fd, out)(out_fd); if (fd_empty(out)) return -EBADF; if (!(fd_file(out)->f_mode & FMODE_WRITE)) return -EBADF; in_inode = file_inode(fd_file(in)); out_inode = file_inode(fd_file(out)); out_pos = fd_file(out)->f_pos; if (!max) max = min(in_inode->i_sb->s_maxbytes, out_inode->i_sb->s_maxbytes); if (unlikely(pos + count > max)) { if (pos >= max) return -EOVERFLOW; count = max - pos; } fl = 0; #if 0 /* * We need to debate whether we can enable this or not. The * man page documents EAGAIN return for the output at least, * and the application is arguably buggy if it doesn't expect * EAGAIN on a non-blocking file descriptor. */ if (fd_file(in)->f_flags & O_NONBLOCK) fl = SPLICE_F_NONBLOCK; #endif opipe = get_pipe_info(fd_file(out), true); if (!opipe) { retval = rw_verify_area(WRITE, fd_file(out), &out_pos, count); if (retval < 0) return retval; retval = do_splice_direct(fd_file(in), &pos, fd_file(out), &out_pos, count, fl); } else { if (fd_file(out)->f_flags & O_NONBLOCK) fl |= SPLICE_F_NONBLOCK; retval = splice_file_to_pipe(fd_file(in), opipe, &pos, count, fl); } if (retval > 0) { add_rchar(current, retval); add_wchar(current, retval); fsnotify_access(fd_file(in)); fsnotify_modify(fd_file(out)); fd_file(out)->f_pos = out_pos; if (ppos) *ppos = pos; else fd_file(in)->f_pos = pos; } inc_syscr(current); inc_syscw(current); if (pos > max) retval = -EOVERFLOW; return retval; } SYSCALL_DEFINE4(sendfile, int, out_fd, int, in_fd, off_t __user *, offset, size_t, count) { loff_t pos; off_t off; ssize_t ret; if (offset) { if (unlikely(get_user(off, offset))) return -EFAULT; pos = off; ret = do_sendfile(out_fd, in_fd, &pos, count, MAX_NON_LFS); if (unlikely(put_user(pos, offset))) return -EFAULT; return ret; } return do_sendfile(out_fd, in_fd, NULL, count, 0); } SYSCALL_DEFINE4(sendfile64, int, out_fd, int, in_fd, loff_t __user *, offset, size_t, count) { loff_t pos; ssize_t ret; if (offset) { if (unlikely(copy_from_user(&pos, offset, sizeof(loff_t)))) return -EFAULT; ret = do_sendfile(out_fd, in_fd, &pos, count, 0); if (unlikely(put_user(pos, offset))) return -EFAULT; return ret; } return do_sendfile(out_fd, in_fd, NULL, count, 0); } #ifdef CONFIG_COMPAT COMPAT_SYSCALL_DEFINE4(sendfile, int, out_fd, int, in_fd, compat_off_t __user *, offset, compat_size_t, count) { loff_t pos; off_t off; ssize_t ret; if (offset) { if (unlikely(get_user(off, offset))) return -EFAULT; pos = off; ret = do_sendfile(out_fd, in_fd, &pos, count, MAX_NON_LFS); if (unlikely(put_user(pos, offset))) return -EFAULT; return ret; } return do_sendfile(out_fd, in_fd, NULL, count, 0); } COMPAT_SYSCALL_DEFINE4(sendfile64, int, out_fd, int, in_fd, compat_loff_t __user *, offset, compat_size_t, count) { loff_t pos; ssize_t ret; if (offset) { if (unlikely(copy_from_user(&pos, offset, sizeof(loff_t)))) return -EFAULT; ret = do_sendfile(out_fd, in_fd, &pos, count, 0); if (unlikely(put_user(pos, offset))) return -EFAULT; return ret; } return do_sendfile(out_fd, in_fd, NULL, count, 0); } #endif /* * Performs necessary checks before doing a file copy * * Can adjust amount of bytes to copy via @req_count argument. * Returns appropriate error code that caller should return or * zero in case the copy should be allowed. */ static int generic_copy_file_checks(struct file *file_in, loff_t pos_in, struct file *file_out, loff_t pos_out, size_t *req_count, unsigned int flags) { struct inode *inode_in = file_inode(file_in); struct inode *inode_out = file_inode(file_out); uint64_t count = *req_count; loff_t size_in; int ret; ret = generic_file_rw_checks(file_in, file_out); if (ret) return ret; /* * We allow some filesystems to handle cross sb copy, but passing * a file of the wrong filesystem type to filesystem driver can result * in an attempt to dereference the wrong type of ->private_data, so * avoid doing that until we really have a good reason. * * nfs and cifs define several different file_system_type structures * and several different sets of file_operations, but they all end up * using the same ->copy_file_range() function pointer. */ if (flags & COPY_FILE_SPLICE) { /* cross sb splice is allowed */ } else if (file_out->f_op->copy_file_range) { if (file_in->f_op->copy_file_range != file_out->f_op->copy_file_range) return -EXDEV; } else if (file_inode(file_in)->i_sb != file_inode(file_out)->i_sb) { return -EXDEV; } /* Don't touch certain kinds of inodes */ if (IS_IMMUTABLE(inode_out)) return -EPERM; if (IS_SWAPFILE(inode_in) || IS_SWAPFILE(inode_out)) return -ETXTBSY; /* Ensure offsets don't wrap. */ if (pos_in + count < pos_in || pos_out + count < pos_out) return -EOVERFLOW; /* Shorten the copy to EOF */ size_in = i_size_read(inode_in); if (pos_in >= size_in) count = 0; else count = min(count, size_in - (uint64_t)pos_in); ret = generic_write_check_limits(file_out, pos_out, &count); if (ret) return ret; /* Don't allow overlapped copying within the same file. */ if (inode_in == inode_out && pos_out + count > pos_in && pos_out < pos_in + count) return -EINVAL; *req_count = count; return 0; } /* * copy_file_range() differs from regular file read and write in that it * specifically allows return partial success. When it does so is up to * the copy_file_range method. */ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, struct file *file_out, loff_t pos_out, size_t len, unsigned int flags) { ssize_t ret; bool splice = flags & COPY_FILE_SPLICE; bool samesb = file_inode(file_in)->i_sb == file_inode(file_out)->i_sb; if (flags & ~COPY_FILE_SPLICE) return -EINVAL; ret = generic_copy_file_checks(file_in, pos_in, file_out, pos_out, &len, flags); if (unlikely(ret)) return ret; ret = rw_verify_area(READ, file_in, &pos_in, len); if (unlikely(ret)) return ret; ret = rw_verify_area(WRITE, file_out, &pos_out, len); if (unlikely(ret)) return ret; if (len == 0) return 0; /* * Make sure return value doesn't overflow in 32bit compat mode. Also * limit the size for all cases except when calling ->copy_file_range(). */ if (splice || !file_out->f_op->copy_file_range || in_compat_syscall()) len = min_t(size_t, MAX_RW_COUNT, len); file_start_write(file_out); /* * Cloning is supported by more file systems, so we implement copy on * same sb using clone, but for filesystems where both clone and copy * are supported (e.g. nfs,cifs), we only call the copy method. */ if (!splice && file_out->f_op->copy_file_range) { ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out, pos_out, len, flags); } else if (!splice && file_in->f_op->remap_file_range && samesb) { ret = file_in->f_op->remap_file_range(file_in, pos_in, file_out, pos_out, len, REMAP_FILE_CAN_SHORTEN); /* fallback to splice */ if (ret <= 0) splice = true; } else if (samesb) { /* Fallback to splice for same sb copy for backward compat */ splice = true; } file_end_write(file_out); if (!splice) goto done; /* * We can get here for same sb copy of filesystems that do not implement * ->copy_file_range() in case filesystem does not support clone or in * case filesystem supports clone but rejected the clone request (e.g. * because it was not block aligned). * * In both cases, fall back to kernel copy so we are able to maintain a * consistent story about which filesystems support copy_file_range() * and which filesystems do not, that will allow userspace tools to * make consistent desicions w.r.t using copy_file_range(). * * We also get here if caller (e.g. nfsd) requested COPY_FILE_SPLICE * for server-side-copy between any two sb. * * In any case, we call do_splice_direct() and not splice_file_range(), * without file_start_write() held, to avoid possible deadlocks related * to splicing from input file, while file_start_write() is held on * the output file on a different sb. */ ret = do_splice_direct(file_in, &pos_in, file_out, &pos_out, len, 0); done: if (ret > 0) { fsnotify_access(file_in); add_rchar(current, ret); fsnotify_modify(file_out); add_wchar(current, ret); } inc_syscr(current); inc_syscw(current); return ret; } EXPORT_SYMBOL(vfs_copy_file_range); SYSCALL_DEFINE6(copy_file_range, int, fd_in, loff_t __user *, off_in, int, fd_out, loff_t __user *, off_out, size_t, len, unsigned int, flags) { loff_t pos_in; loff_t pos_out; ssize_t ret = -EBADF; CLASS(fd, f_in)(fd_in); if (fd_empty(f_in)) return -EBADF; CLASS(fd, f_out)(fd_out); if (fd_empty(f_out)) return -EBADF; if (off_in) { if (copy_from_user(&pos_in, off_in, sizeof(loff_t))) return -EFAULT; } else { pos_in = fd_file(f_in)->f_pos; } if (off_out) { if (copy_from_user(&pos_out, off_out, sizeof(loff_t))) return -EFAULT; } else { pos_out = fd_file(f_out)->f_pos; } if (flags != 0) return -EINVAL; ret = vfs_copy_file_range(fd_file(f_in), pos_in, fd_file(f_out), pos_out, len, flags); if (ret > 0) { pos_in += ret; pos_out += ret; if (off_in) { if (copy_to_user(off_in, &pos_in, sizeof(loff_t))) ret = -EFAULT; } else { fd_file(f_in)->f_pos = pos_in; } if (off_out) { if (copy_to_user(off_out, &pos_out, sizeof(loff_t))) ret = -EFAULT; } else { fd_file(f_out)->f_pos = pos_out; } } return ret; } /* * Don't operate on ranges the page cache doesn't support, and don't exceed the * LFS limits. If pos is under the limit it becomes a short access. If it * exceeds the limit we return -EFBIG. */ int generic_write_check_limits(struct file *file, loff_t pos, loff_t *count) { struct inode *inode = file->f_mapping->host; loff_t max_size = inode->i_sb->s_maxbytes; loff_t limit = rlimit(RLIMIT_FSIZE); if (limit != RLIM_INFINITY) { if (pos >= limit) { send_sig(SIGXFSZ, current, 0); return -EFBIG; } *count = min(*count, limit - pos); } if (!(file->f_flags & O_LARGEFILE)) max_size = MAX_NON_LFS; if (unlikely(pos >= max_size)) return -EFBIG; *count = min(*count, max_size - pos); return 0; } EXPORT_SYMBOL_GPL(generic_write_check_limits); /* Like generic_write_checks(), but takes size of write instead of iter. */ int generic_write_checks_count(struct kiocb *iocb, loff_t *count) { struct file *file = iocb->ki_filp; struct inode *inode = file->f_mapping->host; if (IS_SWAPFILE(inode)) return -ETXTBSY; if (!*count) return 0; if (iocb->ki_flags & IOCB_APPEND) iocb->ki_pos = i_size_read(inode); if ((iocb->ki_flags & IOCB_NOWAIT) && !((iocb->ki_flags & IOCB_DIRECT) || (file->f_op->fop_flags & FOP_BUFFER_WASYNC))) return -EINVAL; return generic_write_check_limits(iocb->ki_filp, iocb->ki_pos, count); } EXPORT_SYMBOL(generic_write_checks_count); /* * Performs necessary checks before doing a write * * Can adjust writing position or amount of bytes to write. * Returns appropriate error code that caller should return or * zero in case that write should be allowed. */ ssize_t generic_write_checks(struct kiocb *iocb, struct iov_iter *from) { loff_t count = iov_iter_count(from); int ret; ret = generic_write_checks_count(iocb, &count); if (ret) return ret; iov_iter_truncate(from, count); return iov_iter_count(from); } EXPORT_SYMBOL(generic_write_checks); /* * Performs common checks before doing a file copy/clone * from @file_in to @file_out. */ int generic_file_rw_checks(struct file *file_in, struct file *file_out) { struct inode *inode_in = file_inode(file_in); struct inode *inode_out = file_inode(file_out); /* Don't copy dirs, pipes, sockets... */ if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode)) return -EISDIR; if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode)) return -EINVAL; if (!(file_in->f_mode & FMODE_READ) || !(file_out->f_mode & FMODE_WRITE) || (file_out->f_flags & O_APPEND)) return -EBADF; return 0; } int generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter) { size_t len = iov_iter_count(iter); if (!iter_is_ubuf(iter)) return -EINVAL; if (!is_power_of_2(len)) return -EINVAL; if (!IS_ALIGNED(iocb->ki_pos, len)) return -EINVAL; if (!(iocb->ki_flags & IOCB_DIRECT)) return -EOPNOTSUPP; return 0; } EXPORT_SYMBOL_GPL(generic_atomic_write_valid); |
| 17 17 17 17 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 | /* * Copyright (c) 2006 Intel Corporation. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU * General Public License (GPL) Version 2, available from the file * COPYING in the main directory of this source tree, or the * OpenIB.org BSD license below: * * Redistribution and use in source and binary forms, with or * without modification, are permitted provided that the following * conditions are met: * * - Redistributions of source code must retain the above * copyright notice, this list of conditions and the following * disclaimer. * * - Redistributions in binary form must reproduce the above * copyright notice, this list of conditions and the following * disclaimer in the documentation and/or other materials * provided with the distribution. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. */ #include <linux/completion.h> #include <linux/dma-mapping.h> #include <linux/err.h> #include <linux/interrupt.h> #include <linux/export.h> #include <linux/slab.h> #include <linux/bitops.h> #include <linux/random.h> #include <rdma/ib_cache.h> #include "sa.h" static int mcast_add_one(struct ib_device *device); static void mcast_remove_one(struct ib_device *device, void *client_data); static struct ib_client mcast_client = { .name = "ib_multicast", .add = mcast_add_one, .remove = mcast_remove_one }; static struct ib_sa_client sa_client; static struct workqueue_struct *mcast_wq; static union ib_gid mgid0; struct mcast_device; struct mcast_port { struct mcast_device *dev; spinlock_t lock; struct rb_root table; refcount_t refcount; struct completion comp; u32 port_num; }; struct mcast_device { struct ib_device *device; struct ib_event_handler event_handler; int start_port; int end_port; struct mcast_port port[]; }; enum mcast_state { MCAST_JOINING, MCAST_MEMBER, MCAST_ERROR, }; enum mcast_group_state { MCAST_IDLE, MCAST_BUSY, MCAST_GROUP_ERROR, MCAST_PKEY_EVENT }; enum { MCAST_INVALID_PKEY_INDEX = 0xFFFF }; struct mcast_member; struct mcast_group { struct ib_sa_mcmember_rec rec; struct rb_node node; struct mcast_port *port; spinlock_t lock; struct work_struct work; struct list_head pending_list; struct list_head active_list; struct mcast_member *last_join; int members[NUM_JOIN_MEMBERSHIP_TYPES]; atomic_t refcount; enum mcast_group_state state; struct ib_sa_query *query; u16 pkey_index; u8 leave_state; int retries; }; struct mcast_member { struct ib_sa_multicast multicast; struct ib_sa_client *client; struct mcast_group *group; struct list_head list; enum mcast_state state; refcount_t refcount; struct completion comp; }; static void join_handler(int status, struct ib_sa_mcmember_rec *rec, void *context); static void leave_handler(int status, struct ib_sa_mcmember_rec *rec, void *context); static struct mcast_group *mcast_find(struct mcast_port *port, union ib_gid *mgid) { struct rb_node *node = port->table.rb_node; struct mcast_group *group; int ret; while (node) { group = rb_entry(node, struct mcast_group, node); ret = memcmp(mgid->raw, group->rec.mgid.raw, sizeof *mgid); if (!ret) return group; if (ret < 0) node = node->rb_left; else node = node->rb_right; } return NULL; } static struct mcast_group *mcast_insert(struct mcast_port *port, struct mcast_group *group, int allow_duplicates) { struct rb_node **link = &port->table.rb_node; struct rb_node *parent = NULL; struct mcast_group *cur_group; int ret; while (*link) { parent = *link; cur_group = rb_entry(parent, struct mcast_group, node); ret = memcmp(group->rec.mgid.raw, cur_group->rec.mgid.raw, sizeof group->rec.mgid); if (ret < 0) link = &(*link)->rb_left; else if (ret > 0) link = &(*link)->rb_right; else if (allow_duplicates) link = &(*link)->rb_left; else return cur_group; } rb_link_node(&group->node, parent, link); rb_insert_color(&group->node, &port->table); return NULL; } static void deref_port(struct mcast_port *port) { if (refcount_dec_and_test(&port->refcount)) complete(&port->comp); } static void release_group(struct mcast_group *group) { struct mcast_port *port = group->port; unsigned long flags; spin_lock_irqsave(&port->lock, flags); if (atomic_dec_and_test(&group->refcount)) { rb_erase(&group->node, &port->table); spin_unlock_irqrestore(&port->lock, flags); kfree(group); deref_port(port); } else spin_unlock_irqrestore(&port->lock, flags); } static void deref_member(struct mcast_member *member) { if (refcount_dec_and_test(&member->refcount)) complete(&member->comp); } static void queue_join(struct mcast_member *member) { struct mcast_group *group = member->group; unsigned long flags; spin_lock_irqsave(&group->lock, flags); list_add_tail(&member->list, &group->pending_list); if (group->state == MCAST_IDLE) { group->state = MCAST_BUSY; atomic_inc(&group->refcount); queue_work(mcast_wq, &group->work); } spin_unlock_irqrestore(&group->lock, flags); } /* * A multicast group has four types of members: full member, non member, * sendonly non member and sendonly full member. * We need to keep track of the number of members of each * type based on their join state. Adjust the number of members the belong to * the specified join states. */ static void adjust_membership(struct mcast_group *group, u8 join_state, int inc) { int i; for (i = 0; i < NUM_JOIN_MEMBERSHIP_TYPES; i++, join_state >>= 1) if (join_state & 0x1) group->members[i] += inc; } /* * If a multicast group has zero members left for a particular join state, but * the group is still a member with the SA, we need to leave that join state. * Determine which join states we still belong to, but that do not have any * active members. */ static u8 get_leave_state(struct mcast_group *group) { u8 leave_state = 0; int i; for (i = 0; i < NUM_JOIN_MEMBERSHIP_TYPES; i++) if (!group->members[i]) leave_state |= (0x1 << i); return leave_state & group->rec.join_state; } static int check_selector(ib_sa_comp_mask comp_mask, ib_sa_comp_mask selector_mask, ib_sa_comp_mask value_mask, u8 selector, u8 src_value, u8 dst_value) { int err; if (!(comp_mask & selector_mask) || !(comp_mask & value_mask)) return 0; switch (selector) { case IB_SA_GT: err = (src_value <= dst_value); break; case IB_SA_LT: err = (src_value >= dst_value); break; case IB_SA_EQ: err = (src_value != dst_value); break; default: err = 0; break; } return err; } static int cmp_rec(struct ib_sa_mcmember_rec *src, struct ib_sa_mcmember_rec *dst, ib_sa_comp_mask comp_mask) { /* MGID must already match */ if (comp_mask & IB_SA_MCMEMBER_REC_PORT_GID && memcmp(&src->port_gid, &dst->port_gid, sizeof src->port_gid)) return -EINVAL; if (comp_mask & IB_SA_MCMEMBER_REC_QKEY && src->qkey != dst->qkey) return -EINVAL; if (comp_mask & IB_SA_MCMEMBER_REC_MLID && src->mlid != dst->mlid) return -EINVAL; if (check_selector(comp_mask, IB_SA_MCMEMBER_REC_MTU_SELECTOR, IB_SA_MCMEMBER_REC_MTU, dst->mtu_selector, src->mtu, dst->mtu)) return -EINVAL; if (comp_mask & IB_SA_MCMEMBER_REC_TRAFFIC_CLASS && src->traffic_class != dst->traffic_class) return -EINVAL; if (comp_mask & IB_SA_MCMEMBER_REC_PKEY && src->pkey != dst->pkey) return -EINVAL; if (check_selector(comp_mask, IB_SA_MCMEMBER_REC_RATE_SELECTOR, IB_SA_MCMEMBER_REC_RATE, dst->rate_selector, src->rate, dst->rate)) return -EINVAL; if (check_selector(comp_mask, IB_SA_MCMEMBER_REC_PACKET_LIFE_TIME_SELECTOR, IB_SA_MCMEMBER_REC_PACKET_LIFE_TIME, dst->packet_life_time_selector, src->packet_life_time, dst->packet_life_time)) return -EINVAL; if (comp_mask & IB_SA_MCMEMBER_REC_SL && src->sl != dst->sl) return -EINVAL; if (comp_mask & IB_SA_MCMEMBER_REC_FLOW_LABEL && src->flow_label != dst->flow_label) return -EINVAL; if (comp_mask & IB_SA_MCMEMBER_REC_HOP_LIMIT && src->hop_limit != dst->hop_limit) return -EINVAL; if (comp_mask & IB_SA_MCMEMBER_REC_SCOPE && src->scope != dst->scope) return -EINVAL; /* join_state checked separately, proxy_join ignored */ return 0; } static int send_join(struct mcast_group *group, struct mcast_member *member) { struct mcast_port *port = group->port; int ret; group->last_join = member; ret = ib_sa_mcmember_rec_query(&sa_client, port->dev->device, port->port_num, IB_MGMT_METHOD_SET, &member->multicast.rec, member->multicast.comp_mask, 3000, GFP_KERNEL, join_handler, group, &group->query); return (ret > 0) ? 0 : ret; } static int send_leave(struct mcast_group *group, u8 leave_state) { struct mcast_port *port = group->port; struct ib_sa_mcmember_rec rec; int ret; rec = group->rec; rec.join_state = leave_state; group->leave_state = leave_state; ret = ib_sa_mcmember_rec_query(&sa_client, port->dev->device, port->port_num, IB_SA_METHOD_DELETE, &rec, IB_SA_MCMEMBER_REC_MGID | IB_SA_MCMEMBER_REC_PORT_GID | IB_SA_MCMEMBER_REC_JOIN_STATE, 3000, GFP_KERNEL, leave_handler, group, &group->query); return (ret > 0) ? 0 : ret; } static void join_group(struct mcast_group *group, struct mcast_member *member, u8 join_state) { member->state = MCAST_MEMBER; adjust_membership(group, join_state, 1); group->rec.join_state |= join_state; member->multicast.rec = group->rec; member->multicast.rec.join_state = join_state; list_move(&member->list, &group->active_list); } static int fail_join(struct mcast_group *group, struct mcast_member *member, int status) { spin_lock_irq(&group->lock); list_del_init(&member->list); spin_unlock_irq(&group->lock); return member->multicast.callback(status, &member->multicast); } static void process_group_error(struct mcast_group *group) { struct mcast_member *member; int ret = 0; u16 pkey_index; if (group->state == MCAST_PKEY_EVENT) ret = ib_find_pkey(group->port->dev->device, group->port->port_num, be16_to_cpu(group->rec.pkey), &pkey_index); spin_lock_irq(&group->lock); if (group->state == MCAST_PKEY_EVENT && !ret && group->pkey_index == pkey_index) goto out; while (!list_empty(&group->active_list)) { member = list_entry(group->active_list.next, struct mcast_member, list); refcount_inc(&member->refcount); list_del_init(&member->list); adjust_membership(group, member->multicast.rec.join_state, -1); member->state = MCAST_ERROR; spin_unlock_irq(&group->lock); ret = member->multicast.callback(-ENETRESET, &member->multicast); deref_member(member); if (ret) ib_sa_free_multicast(&member->multicast); spin_lock_irq(&group->lock); } group->rec.join_state = 0; out: group->state = MCAST_BUSY; spin_unlock_irq(&group->lock); } static void mcast_work_handler(struct work_struct *work) { struct mcast_group *group; struct mcast_member *member; struct ib_sa_multicast *multicast; int status, ret; u8 join_state; group = container_of(work, typeof(*group), work); retest: spin_lock_irq(&group->lock); while (!list_empty(&group->pending_list) || (group->state != MCAST_BUSY)) { if (group->state != MCAST_BUSY) { spin_unlock_irq(&group->lock); process_group_error(group); goto retest; } member = list_entry(group->pending_list.next, struct mcast_member, list); multicast = &member->multicast; join_state = multicast->rec.join_state; refcount_inc(&member->refcount); if (join_state == (group->rec.join_state & join_state)) { status = cmp_rec(&group->rec, &multicast->rec, multicast->comp_mask); if (!status) join_group(group, member, join_state); else list_del_init(&member->list); spin_unlock_irq(&group->lock); ret = multicast->callback(status, multicast); } else { spin_unlock_irq(&group->lock); status = send_join(group, member); if (!status) { deref_member(member); return; } ret = fail_join(group, member, status); } deref_member(member); if (ret) ib_sa_free_multicast(&member->multicast); spin_lock_irq(&group->lock); } join_state = get_leave_state(group); if (join_state) { group->rec.join_state &= ~join_state; spin_unlock_irq(&group->lock); if (send_leave(group, join_state)) goto retest; } else { group->state = MCAST_IDLE; spin_unlock_irq(&group->lock); release_group(group); } } /* * Fail a join request if it is still active - at the head of the pending queue. */ static void process_join_error(struct mcast_group *group, int status) { struct mcast_member *member; int ret; spin_lock_irq(&group->lock); member = list_entry(group->pending_list.next, struct mcast_member, list); if (group->last_join == member) { refcount_inc(&member->refcount); list_del_init(&member->list); spin_unlock_irq(&group->lock); ret = member->multicast.callback(status, &member->multicast); deref_member(member); if (ret) ib_sa_free_multicast(&member->multicast); } else spin_unlock_irq(&group->lock); } static void join_handler(int status, struct ib_sa_mcmember_rec *rec, void *context) { struct mcast_group *group = context; u16 pkey_index = MCAST_INVALID_PKEY_INDEX; if (status) process_join_error(group, status); else { int mgids_changed, is_mgid0; if (ib_find_pkey(group->port->dev->device, group->port->port_num, be16_to_cpu(rec->pkey), &pkey_index)) pkey_index = MCAST_INVALID_PKEY_INDEX; spin_lock_irq(&group->port->lock); if (group->state == MCAST_BUSY && group->pkey_index == MCAST_INVALID_PKEY_INDEX) group->pkey_index = pkey_index; mgids_changed = memcmp(&rec->mgid, &group->rec.mgid, sizeof(group->rec.mgid)); group->rec = *rec; if (mgids_changed) { rb_erase(&group->node, &group->port->table); is_mgid0 = !memcmp(&mgid0, &group->rec.mgid, sizeof(mgid0)); mcast_insert(group->port, group, is_mgid0); } spin_unlock_irq(&group->port->lock); } mcast_work_handler(&group->work); } static void leave_handler(int status, struct ib_sa_mcmember_rec *rec, void *context) { struct mcast_group *group = context; if (status && group->retries > 0 && !send_leave(group, group->leave_state)) group->retries--; else mcast_work_handler(&group->work); } static struct mcast_group *acquire_group(struct mcast_port *port, union ib_gid *mgid, gfp_t gfp_mask) { struct mcast_group *group, *cur_group; unsigned long flags; int is_mgid0; is_mgid0 = !memcmp(&mgid0, mgid, sizeof mgid0); if (!is_mgid0) { spin_lock_irqsave(&port->lock, flags); group = mcast_find(port, mgid); if (group) goto found; spin_unlock_irqrestore(&port->lock, flags); } group = kzalloc(sizeof *group, gfp_mask); if (!group) return NULL; group->retries = 3; group->port = port; group->rec.mgid = *mgid; group->pkey_index = MCAST_INVALID_PKEY_INDEX; INIT_LIST_HEAD(&group->pending_list); INIT_LIST_HEAD(&group->active_list); INIT_WORK(&group->work, mcast_work_handler); spin_lock_init(&group->lock); spin_lock_irqsave(&port->lock, flags); cur_group = mcast_insert(port, group, is_mgid0); if (cur_group) { kfree(group); group = cur_group; } else refcount_inc(&port->refcount); found: atomic_inc(&group->refcount); spin_unlock_irqrestore(&port->lock, flags); return group; } /* * We serialize all join requests to a single group to make our lives much * easier. Otherwise, two users could try to join the same group * simultaneously, with different configurations, one could leave while the * join is in progress, etc., which makes locking around error recovery * difficult. */ struct ib_sa_multicast * ib_sa_join_multicast(struct ib_sa_client *client, struct ib_device *device, u32 port_num, struct ib_sa_mcmember_rec *rec, ib_sa_comp_mask comp_mask, gfp_t gfp_mask, int (*callback)(int status, struct ib_sa_multicast *multicast), void *context) { struct mcast_device *dev; struct mcast_member *member; struct ib_sa_multicast *multicast; int ret; dev = ib_get_client_data(device, &mcast_client); if (!dev) return ERR_PTR(-ENODEV); member = kmalloc(sizeof *member, gfp_mask); if (!member) return ERR_PTR(-ENOMEM); ib_sa_client_get(client); member->client = client; member->multicast.rec = *rec; member->multicast.comp_mask = comp_mask; member->multicast.callback = callback; member->multicast.context = context; init_completion(&member->comp); refcount_set(&member->refcount, 1); member->state = MCAST_JOINING; member->group = acquire_group(&dev->port[port_num - dev->start_port], &rec->mgid, gfp_mask); if (!member->group) { ret = -ENOMEM; goto err; } /* * The user will get the multicast structure in their callback. They * could then free the multicast structure before we can return from * this routine. So we save the pointer to return before queuing * any callback. */ multicast = &member->multicast; queue_join(member); return multicast; err: ib_sa_client_put(client); kfree(member); return ERR_PTR(ret); } EXPORT_SYMBOL(ib_sa_join_multicast); void ib_sa_free_multicast(struct ib_sa_multicast *multicast) { struct mcast_member *member; struct mcast_group *group; member = container_of(multicast, struct mcast_member, multicast); group = member->group; spin_lock_irq(&group->lock); if (member->state == MCAST_MEMBER) adjust_membership(group, multicast->rec.join_state, -1); list_del_init(&member->list); if (group->state == MCAST_IDLE) { group->state = MCAST_BUSY; spin_unlock_irq(&group->lock); /* Continue to hold reference on group until callback */ queue_work(mcast_wq, &group->work); } else { spin_unlock_irq(&group->lock); release_group(group); } deref_member(member); wait_for_completion(&member->comp); ib_sa_client_put(member->client); kfree(member); } EXPORT_SYMBOL(ib_sa_free_multicast); int ib_sa_get_mcmember_rec(struct ib_device *device, u32 port_num, union ib_gid *mgid, struct ib_sa_mcmember_rec *rec) { struct mcast_device *dev; struct mcast_port *port; struct mcast_group *group; unsigned long flags; int ret = 0; dev = ib_get_client_data(device, &mcast_client); if (!dev) return -ENODEV; port = &dev->port[port_num - dev->start_port]; spin_lock_irqsave(&port->lock, flags); group = mcast_find(port, mgid); if (group) *rec = group->rec; else ret = -EADDRNOTAVAIL; spin_unlock_irqrestore(&port->lock, flags); return ret; } EXPORT_SYMBOL(ib_sa_get_mcmember_rec); /** * ib_init_ah_from_mcmember - Initialize AH attribute from multicast * member record and gid of the device. * @device: RDMA device * @port_num: Port of the rdma device to consider * @rec: Multicast member record to use * @ndev: Optional netdevice, applicable only for RoCE * @gid_type: GID type to consider * @ah_attr: AH attribute to fillup on successful completion * * ib_init_ah_from_mcmember() initializes AH attribute based on multicast * member record and other device properties. On success the caller is * responsible to call rdma_destroy_ah_attr on the ah_attr. Returns 0 on * success or appropriate error code. * */ int ib_init_ah_from_mcmember(struct ib_device *device, u32 port_num, struct ib_sa_mcmember_rec *rec, struct net_device *ndev, enum ib_gid_type gid_type, struct rdma_ah_attr *ah_attr) { const struct ib_gid_attr *sgid_attr; /* GID table is not based on the netdevice for IB link layer, * so ignore ndev during search. */ if (rdma_protocol_ib(device, port_num)) ndev = NULL; else if (!rdma_protocol_roce(device, port_num)) return -EINVAL; sgid_attr = rdma_find_gid_by_port(device, &rec->port_gid, gid_type, port_num, ndev); if (IS_ERR(sgid_attr)) return PTR_ERR(sgid_attr); memset(ah_attr, 0, sizeof(*ah_attr)); ah_attr->type = rdma_ah_find_type(device, port_num); rdma_ah_set_dlid(ah_attr, be16_to_cpu(rec->mlid)); rdma_ah_set_sl(ah_attr, rec->sl); rdma_ah_set_port_num(ah_attr, port_num); rdma_ah_set_static_rate(ah_attr, rec->rate); rdma_move_grh_sgid_attr(ah_attr, &rec->mgid, be32_to_cpu(rec->flow_label), rec->hop_limit, rec->traffic_class, sgid_attr); return 0; } EXPORT_SYMBOL(ib_init_ah_from_mcmember); static void mcast_groups_event(struct mcast_port *port, enum mcast_group_state state) { struct mcast_group *group; struct rb_node *node; unsigned long flags; spin_lock_irqsave(&port->lock, flags); for (node = rb_first(&port->table); node; node = rb_next(node)) { group = rb_entry(node, struct mcast_group, node); spin_lock(&group->lock); if (group->state == MCAST_IDLE) { atomic_inc(&group->refcount); queue_work(mcast_wq, &group->work); } if (group->state != MCAST_GROUP_ERROR) group->state = state; spin_unlock(&group->lock); } spin_unlock_irqrestore(&port->lock, flags); } static void mcast_event_handler(struct ib_event_handler *handler, struct ib_event *event) { struct mcast_device *dev; int index; dev = container_of(handler, struct mcast_device, event_handler); if (!rdma_cap_ib_mcast(dev->device, event->element.port_num)) return; index = event->element.port_num - dev->start_port; switch (event->event) { case IB_EVENT_PORT_ERR: case IB_EVENT_LID_CHANGE: case IB_EVENT_CLIENT_REREGISTER: mcast_groups_event(&dev->port[index], MCAST_GROUP_ERROR); break; case IB_EVENT_PKEY_CHANGE: mcast_groups_event(&dev->port[index], MCAST_PKEY_EVENT); break; default: break; } } static int mcast_add_one(struct ib_device *device) { struct mcast_device *dev; struct mcast_port *port; int i; int count = 0; dev = kmalloc(struct_size(dev, port, device->phys_port_cnt), GFP_KERNEL); if (!dev) return -ENOMEM; dev->start_port = rdma_start_port(device); dev->end_port = rdma_end_port(device); for (i = 0; i <= dev->end_port - dev->start_port; i++) { if (!rdma_cap_ib_mcast(device, dev->start_port + i)) continue; port = &dev->port[i]; port->dev = dev; port->port_num = dev->start_port + i; spin_lock_init(&port->lock); port->table = RB_ROOT; init_completion(&port->comp); refcount_set(&port->refcount, 1); ++count; } if (!count) { kfree(dev); return -EOPNOTSUPP; } dev->device = device; ib_set_client_data(device, &mcast_client, dev); INIT_IB_EVENT_HANDLER(&dev->event_handler, device, mcast_event_handler); ib_register_event_handler(&dev->event_handler); return 0; } static void mcast_remove_one(struct ib_device *device, void *client_data) { struct mcast_device *dev = client_data; struct mcast_port *port; int i; ib_unregister_event_handler(&dev->event_handler); flush_workqueue(mcast_wq); for (i = 0; i <= dev->end_port - dev->start_port; i++) { if (rdma_cap_ib_mcast(device, dev->start_port + i)) { port = &dev->port[i]; deref_port(port); wait_for_completion(&port->comp); } } kfree(dev); } int mcast_init(void) { int ret; mcast_wq = alloc_ordered_workqueue("ib_mcast", WQ_MEM_RECLAIM); if (!mcast_wq) return -ENOMEM; ib_sa_register_client(&sa_client); ret = ib_register_client(&mcast_client); if (ret) goto err; return 0; err: ib_sa_unregister_client(&sa_client); destroy_workqueue(mcast_wq); return ret; } void mcast_cleanup(void) { ib_unregister_client(&mcast_client); ib_sa_unregister_client(&sa_client); destroy_workqueue(mcast_wq); } |
| 528 5700 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 | /* SPDX-License-Identifier: GPL-2.0 */ /* * Filesystem access notification for Linux * * Copyright (C) 2008 Red Hat, Inc., Eric Paris <eparis@redhat.com> */ #ifndef __LINUX_FSNOTIFY_BACKEND_H #define __LINUX_FSNOTIFY_BACKEND_H #ifdef __KERNEL__ #include <linux/idr.h> /* inotify uses this */ #include <linux/fs.h> /* struct inode */ #include <linux/list.h> #include <linux/path.h> /* struct path */ #include <linux/spinlock.h> #include <linux/types.h> #include <linux/atomic.h> #include <linux/user_namespace.h> #include <linux/refcount.h> #include <linux/mempool.h> #include <linux/sched/mm.h> /* * IN_* from inotfy.h lines up EXACTLY with FS_*, this is so we can easily * convert between them. dnotify only needs conversion at watch creation * so no perf loss there. fanotify isn't defined yet, so it can use the * wholes if it needs more events. */ #define FS_ACCESS 0x00000001 /* File was accessed */ #define FS_MODIFY 0x00000002 /* File was modified */ #define FS_ATTRIB 0x00000004 /* Metadata changed */ #define FS_CLOSE_WRITE 0x00000008 /* Writable file was closed */ #define FS_CLOSE_NOWRITE 0x00000010 /* Unwritable file closed */ #define FS_OPEN 0x00000020 /* File was opened */ #define FS_MOVED_FROM 0x00000040 /* File was moved from X */ #define FS_MOVED_TO 0x00000080 /* File was moved to Y */ #define FS_CREATE 0x00000100 /* Subfile was created */ #define FS_DELETE 0x00000200 /* Subfile was deleted */ #define FS_DELETE_SELF 0x00000400 /* Self was deleted */ #define FS_MOVE_SELF 0x00000800 /* Self was moved */ #define FS_OPEN_EXEC 0x00001000 /* File was opened for exec */ #define FS_UNMOUNT 0x00002000 /* inode on umount fs */ #define FS_Q_OVERFLOW 0x00004000 /* Event queued overflowed */ #define FS_ERROR 0x00008000 /* Filesystem Error (fanotify) */ /* * FS_IN_IGNORED overloads FS_ERROR. It is only used internally by inotify * which does not support FS_ERROR. */ #define FS_IN_IGNORED 0x00008000 /* last inotify event here */ #define FS_OPEN_PERM 0x00010000 /* open event in an permission hook */ #define FS_ACCESS_PERM 0x00020000 /* access event in a permissions hook */ #define FS_OPEN_EXEC_PERM 0x00040000 /* open/exec event in a permission hook */ /* #define FS_DIR_MODIFY 0x00080000 */ /* Deprecated (reserved) */ #define FS_PRE_ACCESS 0x00100000 /* Pre-content access hook */ #define FS_MNT_ATTACH 0x01000000 /* Mount was attached */ #define FS_MNT_DETACH 0x02000000 /* Mount was detached */ #define FS_MNT_MOVE (FS_MNT_ATTACH | FS_MNT_DETACH) /* * Set on inode mark that cares about things that happen to its children. * Always set for dnotify and inotify. * Set on inode/sb/mount marks that care about parent/name info. */ #define FS_EVENT_ON_CHILD 0x08000000 #define FS_RENAME 0x10000000 /* File was renamed */ #define FS_DN_MULTISHOT 0x20000000 /* dnotify multishot */ #define FS_ISDIR 0x40000000 /* event occurred against dir */ #define FS_MOVE (FS_MOVED_FROM | FS_MOVED_TO) /* * Directory entry modification events - reported only to directory * where entry is modified and not to a watching parent. * The watching parent may get an FS_ATTRIB|FS_EVENT_ON_CHILD event * when a directory entry inside a child subdir changes. */ #define ALL_FSNOTIFY_DIRENT_EVENTS (FS_CREATE | FS_DELETE | FS_MOVE | FS_RENAME) /* Mount namespace events */ #define FSNOTIFY_MNT_EVENTS (FS_MNT_ATTACH | FS_MNT_DETACH) /* Content events can be used to inspect file content */ #define FSNOTIFY_CONTENT_PERM_EVENTS (FS_OPEN_PERM | FS_OPEN_EXEC_PERM | \ FS_ACCESS_PERM) /* Pre-content events can be used to fill file content */ #define FSNOTIFY_PRE_CONTENT_EVENTS (FS_PRE_ACCESS) #define ALL_FSNOTIFY_PERM_EVENTS (FSNOTIFY_CONTENT_PERM_EVENTS | \ FSNOTIFY_PRE_CONTENT_EVENTS) /* * This is a list of all events that may get sent to a parent that is watching * with flag FS_EVENT_ON_CHILD based on fs event on a child of that directory. */ #define FS_EVENTS_POSS_ON_CHILD (ALL_FSNOTIFY_PERM_EVENTS | \ FS_ACCESS | FS_MODIFY | FS_ATTRIB | \ FS_CLOSE_WRITE | FS_CLOSE_NOWRITE | \ FS_OPEN | FS_OPEN_EXEC) /* * This is a list of all events that may get sent with the parent inode as the * @to_tell argument of fsnotify(). * It may include events that can be sent to an inode/sb/mount mark, but cannot * be sent to a parent watching children. */ #define FS_EVENTS_POSS_TO_PARENT (FS_EVENTS_POSS_ON_CHILD) /* Events that can be reported to backends */ #define ALL_FSNOTIFY_EVENTS (ALL_FSNOTIFY_DIRENT_EVENTS | \ FSNOTIFY_MNT_EVENTS | \ FS_EVENTS_POSS_ON_CHILD | \ FS_DELETE_SELF | FS_MOVE_SELF | \ FS_UNMOUNT | FS_Q_OVERFLOW | FS_IN_IGNORED | \ FS_ERROR) /* Extra flags that may be reported with event or control handling of events */ #define ALL_FSNOTIFY_FLAGS (FS_ISDIR | FS_EVENT_ON_CHILD | FS_DN_MULTISHOT) #define ALL_FSNOTIFY_BITS (ALL_FSNOTIFY_EVENTS | ALL_FSNOTIFY_FLAGS) struct fsnotify_group; struct fsnotify_event; struct fsnotify_mark; struct fsnotify_event_private_data; struct fsnotify_fname; struct fsnotify_iter_info; struct mem_cgroup; /* * Each group much define these ops. The fsnotify infrastructure will call * these operations for each relevant group. * * handle_event - main call for a group to handle an fs event * @group: group to notify * @mask: event type and flags * @data: object that event happened on * @data_type: type of object for fanotify_data_XXX() accessors * @dir: optional directory associated with event - * if @file_name is not NULL, this is the directory that * @file_name is relative to * @file_name: optional file name associated with event * @cookie: inotify rename cookie * @iter_info: array of marks from this group that are interested in the event * * handle_inode_event - simple variant of handle_event() for groups that only * have inode marks and don't have ignore mask * @mark: mark to notify * @mask: event type and flags * @inode: inode that event happened on * @dir: optional directory associated with event - * if @file_name is not NULL, this is the directory that * @file_name is relative to. * Either @inode or @dir must be non-NULL. * @file_name: optional file name associated with event * @cookie: inotify rename cookie * * free_group_priv - called when a group refcnt hits 0 to clean up the private union * freeing_mark - called when a mark is being destroyed for some reason. The group * MUST be holding a reference on each mark and that reference must be * dropped in this function. inotify uses this function to send * userspace messages that marks have been removed. */ struct fsnotify_ops { int (*handle_event)(struct fsnotify_group *group, u32 mask, const void *data, int data_type, struct inode *dir, const struct qstr *file_name, u32 cookie, struct fsnotify_iter_info *iter_info); int (*handle_inode_event)(struct fsnotify_mark *mark, u32 mask, struct inode *inode, struct inode *dir, const struct qstr *file_name, u32 cookie); void (*free_group_priv)(struct fsnotify_group *group); void (*freeing_mark)(struct fsnotify_mark *mark, struct fsnotify_group *group); void (*free_event)(struct fsnotify_group *group, struct fsnotify_event *event); /* called on final put+free to free memory */ void (*free_mark)(struct fsnotify_mark *mark); }; /* * all of the information about the original object we want to now send to * a group. If you want to carry more info from the accessing task to the * listener this structure is where you need to be adding fields. */ struct fsnotify_event { struct list_head list; }; /* * fsnotify group priorities. * Events are sent in order from highest priority to lowest priority. */ enum fsnotify_group_prio { FSNOTIFY_PRIO_NORMAL = 0, /* normal notifiers, no permissions */ FSNOTIFY_PRIO_CONTENT, /* fanotify permission events */ FSNOTIFY_PRIO_PRE_CONTENT, /* fanotify pre-content events */ __FSNOTIFY_PRIO_NUM }; /* * A group is a "thing" that wants to receive notification about filesystem * events. The mask holds the subset of event types this group cares about. * refcnt on a group is up to the implementor and at any moment if it goes 0 * everything will be cleaned up. */ struct fsnotify_group { const struct fsnotify_ops *ops; /* how this group handles things */ /* * How the refcnt is used is up to each group. When the refcnt hits 0 * fsnotify will clean up all of the resources associated with this group. * As an example, the dnotify group will always have a refcnt=1 and that * will never change. Inotify, on the other hand, has a group per * inotify_init() and the refcnt will hit 0 only when that fd has been * closed. */ refcount_t refcnt; /* things with interest in this group */ /* needed to send notification to userspace */ spinlock_t notification_lock; /* protect the notification_list */ struct list_head notification_list; /* list of event_holder this group needs to send to userspace */ wait_queue_head_t notification_waitq; /* read() on the notification file blocks on this waitq */ unsigned int q_len; /* events on the queue */ unsigned int max_events; /* maximum events allowed on the list */ enum fsnotify_group_prio priority; /* priority for sending events */ bool shutdown; /* group is being shut down, don't queue more events */ #define FSNOTIFY_GROUP_USER 0x01 /* user allocated group */ #define FSNOTIFY_GROUP_DUPS 0x02 /* allow multiple marks per object */ int flags; unsigned int owner_flags; /* stored flags of mark_mutex owner */ /* stores all fastpath marks assoc with this group so they can be cleaned on unregister */ struct mutex mark_mutex; /* protect marks_list */ atomic_t user_waits; /* Number of tasks waiting for user * response */ struct list_head marks_list; /* all inode marks for this group */ struct fasync_struct *fsn_fa; /* async notification */ struct fsnotify_event *overflow_event; /* Event we queue when the * notification list is too * full */ struct mem_cgroup *memcg; /* memcg to charge allocations */ struct user_namespace *user_ns; /* user ns where group was created */ /* groups can define private fields here or use the void *private */ union { void *private; #ifdef CONFIG_INOTIFY_USER struct inotify_group_private_data { spinlock_t idr_lock; struct idr idr; struct ucounts *ucounts; } inotify_data; #endif #ifdef CONFIG_FANOTIFY struct fanotify_group_private_data { /* Hash table of events for merge */ struct hlist_head *merge_hash; /* allows a group to block waiting for a userspace response */ struct list_head access_list; wait_queue_head_t access_waitq; int flags; /* flags from fanotify_init() */ int f_flags; /* event_f_flags from fanotify_init() */ struct ucounts *ucounts; mempool_t error_events_pool; /* chained on perm_group_list */ struct list_head perm_grp_list; } fanotify_data; #endif /* CONFIG_FANOTIFY */ }; }; /* * These helpers are used to prevent deadlock when reclaiming inodes with * evictable marks of the same group that is allocating a new mark. */ static inline void fsnotify_group_lock(struct fsnotify_group *group) { mutex_lock(&group->mark_mutex); group->owner_flags = memalloc_nofs_save(); } static inline void fsnotify_group_unlock(struct fsnotify_group *group) { memalloc_nofs_restore(group->owner_flags); mutex_unlock(&group->mark_mutex); } static inline void fsnotify_group_assert_locked(struct fsnotify_group *group) { WARN_ON_ONCE(!mutex_is_locked(&group->mark_mutex)); WARN_ON_ONCE(!(current->flags & PF_MEMALLOC_NOFS)); } /* When calling fsnotify tell it if the data is a path or inode */ enum fsnotify_data_type { FSNOTIFY_EVENT_NONE, FSNOTIFY_EVENT_FILE_RANGE, FSNOTIFY_EVENT_PATH, FSNOTIFY_EVENT_INODE, FSNOTIFY_EVENT_DENTRY, FSNOTIFY_EVENT_MNT, FSNOTIFY_EVENT_ERROR, }; struct fs_error_report { int error; struct inode *inode; struct super_block *sb; }; struct file_range { const struct path *path; loff_t pos; size_t count; }; static inline const struct path *file_range_path(const struct file_range *range) { return range->path; } struct fsnotify_mnt { const struct mnt_namespace *ns; u64 mnt_id; }; static inline struct inode *fsnotify_data_inode(const void *data, int data_type) { switch (data_type) { case FSNOTIFY_EVENT_INODE: return (struct inode *)data; case FSNOTIFY_EVENT_DENTRY: return d_inode(data); case FSNOTIFY_EVENT_PATH: return d_inode(((const struct path *)data)->dentry); case FSNOTIFY_EVENT_FILE_RANGE: return d_inode(file_range_path(data)->dentry); case FSNOTIFY_EVENT_ERROR: return ((struct fs_error_report *)data)->inode; default: return NULL; } } static inline struct dentry *fsnotify_data_dentry(const void *data, int data_type) { switch (data_type) { case FSNOTIFY_EVENT_DENTRY: /* Non const is needed for dget() */ return (struct dentry *)data; case FSNOTIFY_EVENT_PATH: return ((const struct path *)data)->dentry; case FSNOTIFY_EVENT_FILE_RANGE: return file_range_path(data)->dentry; default: return NULL; } } static inline const struct path *fsnotify_data_path(const void *data, int data_type) { switch (data_type) { case FSNOTIFY_EVENT_PATH: return data; case FSNOTIFY_EVENT_FILE_RANGE: return file_range_path(data); default: return NULL; } } static inline struct super_block *fsnotify_data_sb(const void *data, int data_type) { switch (data_type) { case FSNOTIFY_EVENT_INODE: return ((struct inode *)data)->i_sb; case FSNOTIFY_EVENT_DENTRY: return ((struct dentry *)data)->d_sb; case FSNOTIFY_EVENT_PATH: return ((const struct path *)data)->dentry->d_sb; case FSNOTIFY_EVENT_FILE_RANGE: return file_range_path(data)->dentry->d_sb; case FSNOTIFY_EVENT_ERROR: return ((struct fs_error_report *) data)->sb; default: return NULL; } } static inline const struct fsnotify_mnt *fsnotify_data_mnt(const void *data, int data_type) { switch (data_type) { case FSNOTIFY_EVENT_MNT: return data; default: return NULL; } } static inline u64 fsnotify_data_mnt_id(const void *data, int data_type) { const struct fsnotify_mnt *mnt_data = fsnotify_data_mnt(data, data_type); return mnt_data ? mnt_data->mnt_id : 0; } static inline struct fs_error_report *fsnotify_data_error_report( const void *data, int data_type) { switch (data_type) { case FSNOTIFY_EVENT_ERROR: return (struct fs_error_report *) data; default: return NULL; } } static inline const struct file_range *fsnotify_data_file_range( const void *data, int data_type) { switch (data_type) { case FSNOTIFY_EVENT_FILE_RANGE: return (struct file_range *)data; default: return NULL; } } /* * Index to merged marks iterator array that correlates to a type of watch. * The type of watched object can be deduced from the iterator type, but not * the other way around, because an event can match different watched objects * of the same object type. * For example, both parent and child are watching an object of type inode. */ enum fsnotify_iter_type { FSNOTIFY_ITER_TYPE_INODE, FSNOTIFY_ITER_TYPE_VFSMOUNT, FSNOTIFY_ITER_TYPE_SB, FSNOTIFY_ITER_TYPE_PARENT, FSNOTIFY_ITER_TYPE_INODE2, FSNOTIFY_ITER_TYPE_MNTNS, FSNOTIFY_ITER_TYPE_COUNT }; /* The type of object that a mark is attached to */ enum fsnotify_obj_type { FSNOTIFY_OBJ_TYPE_ANY = -1, FSNOTIFY_OBJ_TYPE_INODE, FSNOTIFY_OBJ_TYPE_VFSMOUNT, FSNOTIFY_OBJ_TYPE_SB, FSNOTIFY_OBJ_TYPE_MNTNS, FSNOTIFY_OBJ_TYPE_COUNT, FSNOTIFY_OBJ_TYPE_DETACHED = FSNOTIFY_OBJ_TYPE_COUNT }; static inline bool fsnotify_valid_obj_type(unsigned int obj_type) { return (obj_type < FSNOTIFY_OBJ_TYPE_COUNT); } struct fsnotify_iter_info { struct fsnotify_mark *marks[FSNOTIFY_ITER_TYPE_COUNT]; struct fsnotify_group *current_group; unsigned int report_mask; int srcu_idx; }; static inline bool fsnotify_iter_should_report_type( struct fsnotify_iter_info *iter_info, int iter_type) { return (iter_info->report_mask & (1U << iter_type)); } static inline void fsnotify_iter_set_report_type( struct fsnotify_iter_info *iter_info, int iter_type) { iter_info->report_mask |= (1U << iter_type); } static inline struct fsnotify_mark *fsnotify_iter_mark( struct fsnotify_iter_info *iter_info, int iter_type) { if (fsnotify_iter_should_report_type(iter_info, iter_type)) return iter_info->marks[iter_type]; return NULL; } static inline int fsnotify_iter_step(struct fsnotify_iter_info *iter, int type, struct fsnotify_mark **markp) { while (type < FSNOTIFY_ITER_TYPE_COUNT) { *markp = fsnotify_iter_mark(iter, type); if (*markp) break; type++; } return type; } #define FSNOTIFY_ITER_FUNCS(name, NAME) \ static inline struct fsnotify_mark *fsnotify_iter_##name##_mark( \ struct fsnotify_iter_info *iter_info) \ { \ return fsnotify_iter_mark(iter_info, FSNOTIFY_ITER_TYPE_##NAME); \ } FSNOTIFY_ITER_FUNCS(inode, INODE) FSNOTIFY_ITER_FUNCS(parent, PARENT) FSNOTIFY_ITER_FUNCS(vfsmount, VFSMOUNT) FSNOTIFY_ITER_FUNCS(sb, SB) #define fsnotify_foreach_iter_type(type) \ for (type = 0; type < FSNOTIFY_ITER_TYPE_COUNT; type++) #define fsnotify_foreach_iter_mark_type(iter, mark, type) \ for (type = 0; \ type = fsnotify_iter_step(iter, type, &mark), \ type < FSNOTIFY_ITER_TYPE_COUNT; \ type++) /* * Inode/vfsmount/sb point to this structure which tracks all marks attached to * the inode/vfsmount/sb. The reference to inode/vfsmount/sb is held by this * structure. We destroy this structure when there are no more marks attached * to it. The structure is protected by fsnotify_mark_srcu. */ struct fsnotify_mark_connector { spinlock_t lock; unsigned char type; /* Type of object [lock] */ unsigned char prio; /* Highest priority group */ #define FSNOTIFY_CONN_FLAG_IS_WATCHED 0x01 #define FSNOTIFY_CONN_FLAG_HAS_IREF 0x02 unsigned short flags; /* flags [lock] */ union { /* Object pointer [lock] */ void *obj; /* Used listing heads to free after srcu period expires */ struct fsnotify_mark_connector *destroy_next; }; struct hlist_head list; }; /* * Container for per-sb fsnotify state (sb marks and more). * Attached lazily on first marked object on the sb and freed when killing sb. */ struct fsnotify_sb_info { struct fsnotify_mark_connector __rcu *sb_marks; /* * Number of inode/mount/sb objects that are being watched in this sb. * Note that inodes objects are currently double-accounted. * * The value in watched_objects[prio] is the number of objects that are * watched by groups of priority >= prio, so watched_objects[0] is the * total number of watched objects in this sb. */ atomic_long_t watched_objects[__FSNOTIFY_PRIO_NUM]; }; static inline struct fsnotify_sb_info *fsnotify_sb_info(struct super_block *sb) { #ifdef CONFIG_FSNOTIFY return READ_ONCE(sb->s_fsnotify_info); #else return NULL; #endif } static inline atomic_long_t *fsnotify_sb_watched_objects(struct super_block *sb) { return &fsnotify_sb_info(sb)->watched_objects[0]; } /* * A mark is simply an object attached to an in core inode which allows an * fsnotify listener to indicate they are either no longer interested in events * of a type matching mask or only interested in those events. * * These are flushed when an inode is evicted from core and may be flushed * when the inode is modified (as seen by fsnotify_access). Some fsnotify * users (such as dnotify) will flush these when the open fd is closed and not * at inode eviction or modification. * * Text in brackets is showing the lock(s) protecting modifications of a * particular entry. obj_lock means either inode->i_lock or * mnt->mnt_root->d_lock depending on the mark type. */ struct fsnotify_mark { /* Mask this mark is for [mark->lock, group->mark_mutex] */ __u32 mask; /* We hold one for presence in g_list. Also one ref for each 'thing' * in kernel that found and may be using this mark. */ refcount_t refcnt; /* Group this mark is for. Set on mark creation, stable until last ref * is dropped */ struct fsnotify_group *group; /* List of marks by group->marks_list. Also reused for queueing * mark into destroy_list when it's waiting for the end of SRCU period * before it can be freed. [group->mark_mutex] */ struct list_head g_list; /* Protects inode / mnt pointers, flags, masks */ spinlock_t lock; /* List of marks for inode / vfsmount [connector->lock, mark ref] */ struct hlist_node obj_list; /* Head of list of marks for an object [mark ref] */ struct fsnotify_mark_connector *connector; /* Events types and flags to ignore [mark->lock, group->mark_mutex] */ __u32 ignore_mask; /* General fsnotify mark flags */ #define FSNOTIFY_MARK_FLAG_ALIVE 0x0001 #define FSNOTIFY_MARK_FLAG_ATTACHED 0x0002 /* inotify mark flags */ #define FSNOTIFY_MARK_FLAG_EXCL_UNLINK 0x0010 #define FSNOTIFY_MARK_FLAG_IN_ONESHOT 0x0020 /* fanotify mark flags */ #define FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY 0x0100 #define FSNOTIFY_MARK_FLAG_NO_IREF 0x0200 #define FSNOTIFY_MARK_FLAG_HAS_IGNORE_FLAGS 0x0400 #define FSNOTIFY_MARK_FLAG_HAS_FSID 0x0800 #define FSNOTIFY_MARK_FLAG_WEAK_FSID 0x1000 unsigned int flags; /* flags [mark->lock] */ }; #ifdef CONFIG_FSNOTIFY /* called from the vfs helpers */ /* main fsnotify call to send events */ extern int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir, const struct qstr *name, struct inode *inode, u32 cookie); extern int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data, int data_type); extern void __fsnotify_inode_delete(struct inode *inode); extern void __fsnotify_vfsmount_delete(struct vfsmount *mnt); extern void fsnotify_sb_delete(struct super_block *sb); extern void __fsnotify_mntns_delete(struct mnt_namespace *mntns); extern void fsnotify_sb_free(struct super_block *sb); extern u32 fsnotify_get_cookie(void); extern void fsnotify_mnt(__u32 mask, struct mnt_namespace *ns, struct vfsmount *mnt); static inline __u32 fsnotify_parent_needed_mask(__u32 mask) { /* FS_EVENT_ON_CHILD is set on marks that want parent/name info */ if (!(mask & FS_EVENT_ON_CHILD)) return 0; /* * This object might be watched by a mark that cares about parent/name * info, does it care about the specific set of events that can be * reported with parent/name info? */ return mask & FS_EVENTS_POSS_TO_PARENT; } static inline int fsnotify_inode_watches_children(struct inode *inode) { __u32 parent_mask = READ_ONCE(inode->i_fsnotify_mask); /* FS_EVENT_ON_CHILD is set if the inode may care */ if (!(parent_mask & FS_EVENT_ON_CHILD)) return 0; /* this inode might care about child events, does it care about the * specific set of events that can happen on a child? */ return parent_mask & FS_EVENTS_POSS_ON_CHILD; } /* * Update the dentry with a flag indicating the interest of its parent to receive * filesystem events when those events happens to this dentry->d_inode. */ static inline void fsnotify_update_flags(struct dentry *dentry) { assert_spin_locked(&dentry->d_lock); /* * Serialisation of setting PARENT_WATCHED on the dentries is provided * by d_lock. If inotify_inode_watched changes after we have taken * d_lock, the following fsnotify_set_children_dentry_flags call will * find our entry, so it will spin until we complete here, and update * us with the new state. */ if (fsnotify_inode_watches_children(dentry->d_parent->d_inode)) dentry->d_flags |= DCACHE_FSNOTIFY_PARENT_WATCHED; else dentry->d_flags &= ~DCACHE_FSNOTIFY_PARENT_WATCHED; } /* called from fsnotify listeners, such as fanotify or dnotify */ /* create a new group */ extern struct fsnotify_group *fsnotify_alloc_group( const struct fsnotify_ops *ops, int flags); /* get reference to a group */ extern void fsnotify_get_group(struct fsnotify_group *group); /* drop reference on a group from fsnotify_alloc_group */ extern void fsnotify_put_group(struct fsnotify_group *group); /* group destruction begins, stop queuing new events */ extern void fsnotify_group_stop_queueing(struct fsnotify_group *group); /* destroy group */ extern void fsnotify_destroy_group(struct fsnotify_group *group); /* fasync handler function */ extern int fsnotify_fasync(int fd, struct file *file, int on); /* Free event from memory */ extern void fsnotify_destroy_event(struct fsnotify_group *group, struct fsnotify_event *event); /* attach the event to the group notification queue */ extern int fsnotify_insert_event(struct fsnotify_group *group, struct fsnotify_event *event, int (*merge)(struct fsnotify_group *, struct fsnotify_event *), void (*insert)(struct fsnotify_group *, struct fsnotify_event *)); static inline int fsnotify_add_event(struct fsnotify_group *group, struct fsnotify_event *event, int (*merge)(struct fsnotify_group *, struct fsnotify_event *)) { return fsnotify_insert_event(group, event, merge, NULL); } /* Queue overflow event to a notification group */ static inline void fsnotify_queue_overflow(struct fsnotify_group *group) { fsnotify_add_event(group, group->overflow_event, NULL); } static inline bool fsnotify_is_overflow_event(u32 mask) { return mask & FS_Q_OVERFLOW; } static inline bool fsnotify_notify_queue_is_empty(struct fsnotify_group *group) { assert_spin_locked(&group->notification_lock); return list_empty(&group->notification_list); } extern bool fsnotify_notify_queue_is_empty(struct fsnotify_group *group); /* return, but do not dequeue the first event on the notification queue */ extern struct fsnotify_event *fsnotify_peek_first_event(struct fsnotify_group *group); /* return AND dequeue the first event on the notification queue */ extern struct fsnotify_event *fsnotify_remove_first_event(struct fsnotify_group *group); /* Remove event queued in the notification list */ extern void fsnotify_remove_queued_event(struct fsnotify_group *group, struct fsnotify_event *event); /* functions used to manipulate the marks attached to inodes */ /* * Canonical "ignore mask" including event flags. * * Note the subtle semantic difference from the legacy ->ignored_mask. * ->ignored_mask traditionally only meant which events should be ignored, * while ->ignore_mask also includes flags regarding the type of objects on * which events should be ignored. */ static inline __u32 fsnotify_ignore_mask(struct fsnotify_mark *mark) { __u32 ignore_mask = mark->ignore_mask; /* The event flags in ignore mask take effect */ if (mark->flags & FSNOTIFY_MARK_FLAG_HAS_IGNORE_FLAGS) return ignore_mask; /* * Legacy behavior: * - Always ignore events on dir * - Ignore events on child if parent is watching children */ ignore_mask |= FS_ISDIR; ignore_mask &= ~FS_EVENT_ON_CHILD; ignore_mask |= mark->mask & FS_EVENT_ON_CHILD; return ignore_mask; } /* Legacy ignored_mask - only event types to ignore */ static inline __u32 fsnotify_ignored_events(struct fsnotify_mark *mark) { return mark->ignore_mask & ALL_FSNOTIFY_EVENTS; } /* * Check if mask (or ignore mask) should be applied depending if victim is a * directory and whether it is reported to a watching parent. */ static inline bool fsnotify_mask_applicable(__u32 mask, bool is_dir, int iter_type) { /* Should mask be applied to a directory? */ if (is_dir && !(mask & FS_ISDIR)) return false; /* Should mask be applied to a child? */ if (iter_type == FSNOTIFY_ITER_TYPE_PARENT && !(mask & FS_EVENT_ON_CHILD)) return false; return true; } /* * Effective ignore mask taking into account if event victim is a * directory and whether it is reported to a watching parent. */ static inline __u32 fsnotify_effective_ignore_mask(struct fsnotify_mark *mark, bool is_dir, int iter_type) { __u32 ignore_mask = fsnotify_ignored_events(mark); if (!ignore_mask) return 0; /* For non-dir and non-child, no need to consult the event flags */ if (!is_dir && iter_type != FSNOTIFY_ITER_TYPE_PARENT) return ignore_mask; ignore_mask = fsnotify_ignore_mask(mark); if (!fsnotify_mask_applicable(ignore_mask, is_dir, iter_type)) return 0; return ignore_mask & ALL_FSNOTIFY_EVENTS; } /* Get mask for calculating object interest taking ignore mask into account */ static inline __u32 fsnotify_calc_mask(struct fsnotify_mark *mark) { __u32 mask = mark->mask; if (!fsnotify_ignored_events(mark)) return mask; /* Interest in FS_MODIFY may be needed for clearing ignore mask */ if (!(mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY)) mask |= FS_MODIFY; /* * If mark is interested in ignoring events on children, the object must * show interest in those events for fsnotify_parent() to notice it. */ return mask | mark->ignore_mask; } /* Get mask of events for a list of marks */ extern __u32 fsnotify_conn_mask(struct fsnotify_mark_connector *conn); /* Calculate mask of events for a list of marks */ extern void fsnotify_recalc_mask(struct fsnotify_mark_connector *conn); extern void fsnotify_init_mark(struct fsnotify_mark *mark, struct fsnotify_group *group); /* Find mark belonging to given group in the list of marks */ struct fsnotify_mark *fsnotify_find_mark(void *obj, unsigned int obj_type, struct fsnotify_group *group); /* attach the mark to the object */ int fsnotify_add_mark(struct fsnotify_mark *mark, void *obj, unsigned int obj_type, int add_flags); int fsnotify_add_mark_locked(struct fsnotify_mark *mark, void *obj, unsigned int obj_type, int add_flags); /* attach the mark to the inode */ static inline int fsnotify_add_inode_mark(struct fsnotify_mark *mark, struct inode *inode, int add_flags) { return fsnotify_add_mark(mark, inode, FSNOTIFY_OBJ_TYPE_INODE, add_flags); } static inline int fsnotify_add_inode_mark_locked(struct fsnotify_mark *mark, struct inode *inode, int add_flags) { return fsnotify_add_mark_locked(mark, inode, FSNOTIFY_OBJ_TYPE_INODE, add_flags); } static inline struct fsnotify_mark *fsnotify_find_inode_mark( struct inode *inode, struct fsnotify_group *group) { return fsnotify_find_mark(inode, FSNOTIFY_OBJ_TYPE_INODE, group); } /* given a group and a mark, flag mark to be freed when all references are dropped */ extern void fsnotify_destroy_mark(struct fsnotify_mark *mark, struct fsnotify_group *group); /* detach mark from inode / mount list, group list, drop inode reference */ extern void fsnotify_detach_mark(struct fsnotify_mark *mark); /* free mark */ extern void fsnotify_free_mark(struct fsnotify_mark *mark); /* Wait until all marks queued for destruction are destroyed */ extern void fsnotify_wait_marks_destroyed(void); /* Clear all of the marks of a group attached to a given object type */ extern void fsnotify_clear_marks_by_group(struct fsnotify_group *group, unsigned int obj_type); extern void fsnotify_get_mark(struct fsnotify_mark *mark); extern void fsnotify_put_mark(struct fsnotify_mark *mark); extern void fsnotify_finish_user_wait(struct fsnotify_iter_info *iter_info); extern bool fsnotify_prepare_user_wait(struct fsnotify_iter_info *iter_info); static inline void fsnotify_init_event(struct fsnotify_event *event) { INIT_LIST_HEAD(&event->list); } int fsnotify_pre_content(const struct path *path, const loff_t *ppos, size_t count); #else static inline int fsnotify_pre_content(const struct path *path, const loff_t *ppos, size_t count) { return 0; } static inline int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir, const struct qstr *name, struct inode *inode, u32 cookie) { return 0; } static inline int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data, int data_type) { return 0; } static inline void __fsnotify_inode_delete(struct inode *inode) {} static inline void __fsnotify_vfsmount_delete(struct vfsmount *mnt) {} static inline void fsnotify_sb_delete(struct super_block *sb) {} static inline void __fsnotify_mntns_delete(struct mnt_namespace *mntns) {} static inline void fsnotify_sb_free(struct super_block *sb) {} static inline void fsnotify_update_flags(struct dentry *dentry) {} static inline u32 fsnotify_get_cookie(void) { return 0; } static inline void fsnotify_unmount_inodes(struct super_block *sb) {} static inline void fsnotify_mnt(__u32 mask, struct mnt_namespace *ns, struct vfsmount *mnt) {} #endif /* CONFIG_FSNOTIFY */ #endif /* __KERNEL __ */ #endif /* __LINUX_FSNOTIFY_BACKEND_H */ |
| 5 4 1 5 5 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 | // SPDX-License-Identifier: GPL-2.0 #include <linux/module.h> #include <linux/netfilter/nf_tables.h> #include <net/netfilter/nf_nat.h> #include <net/netfilter/nf_tables.h> #include <net/netfilter/nf_tables_ipv4.h> #include <net/netfilter/nf_tables_ipv6.h> static unsigned int nft_nat_do_chain(void *priv, struct sk_buff *skb, const struct nf_hook_state *state) { struct nft_pktinfo pkt; nft_set_pktinfo(&pkt, skb, state); switch (state->pf) { #ifdef CONFIG_NF_TABLES_IPV4 case NFPROTO_IPV4: nft_set_pktinfo_ipv4(&pkt); break; #endif #ifdef CONFIG_NF_TABLES_IPV6 case NFPROTO_IPV6: nft_set_pktinfo_ipv6(&pkt); break; #endif default: break; } return nft_do_chain(&pkt, priv); } #ifdef CONFIG_NF_TABLES_IPV4 static const struct nft_chain_type nft_chain_nat_ipv4 = { .name = "nat", .type = NFT_CHAIN_T_NAT, .family = NFPROTO_IPV4, .owner = THIS_MODULE, .hook_mask = (1 << NF_INET_PRE_ROUTING) | (1 << NF_INET_POST_ROUTING) | (1 << NF_INET_LOCAL_OUT) | (1 << NF_INET_LOCAL_IN), .hooks = { [NF_INET_PRE_ROUTING] = nft_nat_do_chain, [NF_INET_POST_ROUTING] = nft_nat_do_chain, [NF_INET_LOCAL_OUT] = nft_nat_do_chain, [NF_INET_LOCAL_IN] = nft_nat_do_chain, }, .ops_register = nf_nat_ipv4_register_fn, .ops_unregister = nf_nat_ipv4_unregister_fn, }; #endif #ifdef CONFIG_NF_TABLES_IPV6 static const struct nft_chain_type nft_chain_nat_ipv6 = { .name = "nat", .type = NFT_CHAIN_T_NAT, .family = NFPROTO_IPV6, .owner = THIS_MODULE, .hook_mask = (1 << NF_INET_PRE_ROUTING) | (1 << NF_INET_POST_ROUTING) | (1 << NF_INET_LOCAL_OUT) | (1 << NF_INET_LOCAL_IN), .hooks = { [NF_INET_PRE_ROUTING] = nft_nat_do_chain, [NF_INET_POST_ROUTING] = nft_nat_do_chain, [NF_INET_LOCAL_OUT] = nft_nat_do_chain, [NF_INET_LOCAL_IN] = nft_nat_do_chain, }, .ops_register = nf_nat_ipv6_register_fn, .ops_unregister = nf_nat_ipv6_unregister_fn, }; #endif #ifdef CONFIG_NF_TABLES_INET static int nft_nat_inet_reg(struct net *net, const struct nf_hook_ops *ops) { return nf_nat_inet_register_fn(net, ops); } static void nft_nat_inet_unreg(struct net *net, const struct nf_hook_ops *ops) { nf_nat_inet_unregister_fn(net, ops); } static const struct nft_chain_type nft_chain_nat_inet = { .name = "nat", .type = NFT_CHAIN_T_NAT, .family = NFPROTO_INET, .owner = THIS_MODULE, .hook_mask = (1 << NF_INET_PRE_ROUTING) | (1 << NF_INET_LOCAL_IN) | (1 << NF_INET_LOCAL_OUT) | (1 << NF_INET_POST_ROUTING), .hooks = { [NF_INET_PRE_ROUTING] = nft_nat_do_chain, [NF_INET_LOCAL_IN] = nft_nat_do_chain, [NF_INET_LOCAL_OUT] = nft_nat_do_chain, [NF_INET_POST_ROUTING] = nft_nat_do_chain, }, .ops_register = nft_nat_inet_reg, .ops_unregister = nft_nat_inet_unreg, }; #endif static int __init nft_chain_nat_init(void) { #ifdef CONFIG_NF_TABLES_IPV6 nft_register_chain_type(&nft_chain_nat_ipv6); #endif #ifdef CONFIG_NF_TABLES_IPV4 nft_register_chain_type(&nft_chain_nat_ipv4); #endif #ifdef CONFIG_NF_TABLES_INET nft_register_chain_type(&nft_chain_nat_inet); #endif return 0; } static void __exit nft_chain_nat_exit(void) { #ifdef CONFIG_NF_TABLES_IPV4 nft_unregister_chain_type(&nft_chain_nat_ipv4); #endif #ifdef CONFIG_NF_TABLES_IPV6 nft_unregister_chain_type(&nft_chain_nat_ipv6); #endif #ifdef CONFIG_NF_TABLES_INET nft_unregister_chain_type(&nft_chain_nat_inet); #endif } module_init(nft_chain_nat_init); module_exit(nft_chain_nat_exit); MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("nftables network address translation support"); #ifdef CONFIG_NF_TABLES_IPV4 MODULE_ALIAS_NFT_CHAIN(AF_INET, "nat"); #endif #ifdef CONFIG_NF_TABLES_IPV6 MODULE_ALIAS_NFT_CHAIN(AF_INET6, "nat"); #endif #ifdef CONFIG_NF_TABLES_INET MODULE_ALIAS_NFT_CHAIN(1, "nat"); /* NFPROTO_INET */ #endif |
| 6777 6777 18 6794 10154 2850 8629 2905 39 39 484 4539 4542 39 3 502 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 | /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _LINUX_PGTABLE_H #define _LINUX_PGTABLE_H #include <linux/pfn.h> #include <asm/pgtable.h> #define PMD_ORDER (PMD_SHIFT - PAGE_SHIFT) #define PUD_ORDER (PUD_SHIFT - PAGE_SHIFT) #ifndef __ASSEMBLY__ #ifdef CONFIG_MMU #include <linux/mm_types.h> #include <linux/bug.h> #include <linux/errno.h> #include <asm-generic/pgtable_uffd.h> #include <linux/page_table_check.h> #if 5 - defined(__PAGETABLE_P4D_FOLDED) - defined(__PAGETABLE_PUD_FOLDED) - \ defined(__PAGETABLE_PMD_FOLDED) != CONFIG_PGTABLE_LEVELS #error CONFIG_PGTABLE_LEVELS is not consistent with __PAGETABLE_{P4D,PUD,PMD}_FOLDED #endif /* * On almost all architectures and configurations, 0 can be used as the * upper ceiling to free_pgtables(): on many architectures it has the same * effect as using TASK_SIZE. However, there is one configuration which * must impose a more careful limit, to avoid freeing kernel pgtables. */ #ifndef USER_PGTABLES_CEILING #define USER_PGTABLES_CEILING 0UL #endif /* * This defines the first usable user address. Platforms * can override its value with custom FIRST_USER_ADDRESS * defined in their respective <asm/pgtable.h>. */ #ifndef FIRST_USER_ADDRESS #define FIRST_USER_ADDRESS 0UL #endif /* * This defines the generic helper for accessing PMD page * table page. Although platforms can still override this * via their respective <asm/pgtable.h>. */ #ifndef pmd_pgtable #define pmd_pgtable(pmd) pmd_page(pmd) #endif #define pmd_folio(pmd) page_folio(pmd_page(pmd)) /* * A page table page can be thought of an array like this: pXd_t[PTRS_PER_PxD] * * The pXx_index() functions return the index of the entry in the page * table page which would control the given virtual address * * As these functions may be used by the same code for different levels of * the page table folding, they are always available, regardless of * CONFIG_PGTABLE_LEVELS value. For the folded levels they simply return 0 * because in such cases PTRS_PER_PxD equals 1. */ static inline unsigned long pte_index(unsigned long address) { return (address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1); } #ifndef pmd_index static inline unsigned long pmd_index(unsigned long address) { return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1); } #define pmd_index pmd_index #endif #ifndef pud_index static inline unsigned long pud_index(unsigned long address) { return (address >> PUD_SHIFT) & (PTRS_PER_PUD - 1); } #define pud_index pud_index #endif #ifndef pgd_index /* Must be a compile-time constant, so implement it as a macro */ #define pgd_index(a) (((a) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1)) #endif #ifndef kernel_pte_init static inline void kernel_pte_init(void *addr) { } #define kernel_pte_init kernel_pte_init #endif #ifndef pmd_init static inline void pmd_init(void *addr) { } #define pmd_init pmd_init #endif #ifndef pud_init static inline void pud_init(void *addr) { } #define pud_init pud_init #endif #ifndef pte_offset_kernel static inline pte_t *pte_offset_kernel(pmd_t *pmd, unsigned long address) { return (pte_t *)pmd_page_vaddr(*pmd) + pte_index(address); } #define pte_offset_kernel pte_offset_kernel #endif #ifdef CONFIG_HIGHPTE #define __pte_map(pmd, address) \ ((pte_t *)kmap_local_page(pmd_page(*(pmd))) + pte_index((address))) #define pte_unmap(pte) do { \ kunmap_local((pte)); \ rcu_read_unlock(); \ } while (0) #else static inline pte_t *__pte_map(pmd_t *pmd, unsigned long address) { return pte_offset_kernel(pmd, address); } static inline void pte_unmap(pte_t *pte) { rcu_read_unlock(); } #endif void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable); /* Find an entry in the second-level page table.. */ #ifndef pmd_offset static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address) { return pud_pgtable(*pud) + pmd_index(address); } #define pmd_offset pmd_offset #endif #ifndef pud_offset static inline pud_t *pud_offset(p4d_t *p4d, unsigned long address) { return p4d_pgtable(*p4d) + pud_index(address); } #define pud_offset pud_offset #endif static inline pgd_t *pgd_offset_pgd(pgd_t *pgd, unsigned long address) { return (pgd + pgd_index(address)); }; /* * a shortcut to get a pgd_t in a given mm */ #ifndef pgd_offset #define pgd_offset(mm, address) pgd_offset_pgd((mm)->pgd, (address)) #endif /* * a shortcut which implies the use of the kernel's pgd, instead * of a process's */ #define pgd_offset_k(address) pgd_offset(&init_mm, (address)) /* * In many cases it is known that a virtual address is mapped at PMD or PTE * level, so instead of traversing all the page table levels, we can get a * pointer to the PMD entry in user or kernel page table or translate a virtual * address to the pointer in the PTE in the kernel page tables with simple * helpers. */ static inline pmd_t *pmd_off(struct mm_struct *mm, unsigned long va) { return pmd_offset(pud_offset(p4d_offset(pgd_offset(mm, va), va), va), va); } static inline pmd_t *pmd_off_k(unsigned long va) { return pmd_offset(pud_offset(p4d_offset(pgd_offset_k(va), va), va), va); } static inline pte_t *virt_to_kpte(unsigned long vaddr) { pmd_t *pmd = pmd_off_k(vaddr); return pmd_none(*pmd) ? NULL : pte_offset_kernel(pmd, vaddr); } #ifndef pmd_young static inline int pmd_young(pmd_t pmd) { return 0; } #endif #ifndef pmd_dirty static inline int pmd_dirty(pmd_t pmd) { return 0; } #endif /* * A facility to provide lazy MMU batching. This allows PTE updates and * page invalidations to be delayed until a call to leave lazy MMU mode * is issued. Some architectures may benefit from doing this, and it is * beneficial for both shadow and direct mode hypervisors, which may batch * the PTE updates which happen during this window. Note that using this * interface requires that read hazards be removed from the code. A read * hazard could result in the direct mode hypervisor case, since the actual * write to the page tables may not yet have taken place, so reads though * a raw PTE pointer after it has been modified are not guaranteed to be * up to date. * * In the general case, no lock is guaranteed to be held between entry and exit * of the lazy mode. So the implementation must assume preemption may be enabled * and cpu migration is possible; it must take steps to be robust against this. * (In practice, for user PTE updates, the appropriate page table lock(s) are * held, but for kernel PTE updates, no lock is held). Nesting is not permitted * and the mode cannot be used in interrupt context. */ #ifndef __HAVE_ARCH_ENTER_LAZY_MMU_MODE static inline void arch_enter_lazy_mmu_mode(void) {} static inline void arch_leave_lazy_mmu_mode(void) {} static inline void arch_flush_lazy_mmu_mode(void) {} #endif #ifndef pte_batch_hint /** * pte_batch_hint - Number of pages that can be added to batch without scanning. * @ptep: Page table pointer for the entry. * @pte: Page table entry. * * Some architectures know that a set of contiguous ptes all map the same * contiguous memory with the same permissions. In this case, it can provide a * hint to aid pte batching without the core code needing to scan every pte. * * An architecture implementation may ignore the PTE accessed state. Further, * the dirty state must apply atomically to all the PTEs described by the hint. * * May be overridden by the architecture, else pte_batch_hint is always 1. */ static inline unsigned int pte_batch_hint(pte_t *ptep, pte_t pte) { return 1; } #endif #ifndef pte_advance_pfn static inline pte_t pte_advance_pfn(pte_t pte, unsigned long nr) { return __pte(pte_val(pte) + (nr << PFN_PTE_SHIFT)); } #endif #define pte_next_pfn(pte) pte_advance_pfn(pte, 1) #ifndef set_ptes /** * set_ptes - Map consecutive pages to a contiguous range of addresses. * @mm: Address space to map the pages into. * @addr: Address to map the first page at. * @ptep: Page table pointer for the first entry. * @pte: Page table entry for the first page. * @nr: Number of pages to map. * * When nr==1, initial state of pte may be present or not present, and new state * may be present or not present. When nr>1, initial state of all ptes must be * not present, and new state must be present. * * May be overridden by the architecture, or the architecture can define * set_pte() and PFN_PTE_SHIFT. * * Context: The caller holds the page table lock. The pages all belong * to the same folio. The PTEs are all in the same PMD. */ static inline void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte, unsigned int nr) { page_table_check_ptes_set(mm, ptep, pte, nr); for (;;) { set_pte(ptep, pte); if (--nr == 0) break; ptep++; pte = pte_next_pfn(pte); } } #endif #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1) #ifndef __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS extern int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address, pte_t *ptep, pte_t entry, int dirty); #endif # |