// SPDX-License-Identifier: GPL-2.0
#include <linux/kernel.h>
#include <linux/errno.h>
#include <linux/fs.h>
#include <linux/file.h>
#include <linux/fdtable.h>
#include <linux/fsnotify.h>
#include <linux/namei.h>
#include <linux/pipe_fs_i.h>
#include <linux/watch_queue.h>
#include <linux/io_uring.h>

#include <uapi/linux/io_uring.h>

#include "../fs/internal.h"

#include "filetable.h"
#include "io_uring.h"
#include "rsrc.h"
#include "openclose.h"

struct io_open {
	struct file *file;
	int dfd;
	u32 file_slot;
	struct filename *filename;
	struct open_how how;
	unsigned long nofile;
};

struct io_close {
	struct file *file;
	int fd;
	u32 file_slot;
};

struct io_fixed_install {
	struct file *file;
	unsigned int o_flags;
};

static bool io_openat_force_async(struct io_open *open)
{
	/*
	 * Don't bother trying for O_TRUNC, O_CREAT, or O_TMPFILE open,
	 * it'll always -EAGAIN. Note that we test for __O_TMPFILE because
	 * O_TMPFILE includes O_DIRECTORY, which isn't a flag we need to
	 * force async for.
*/ return open->how.flags & (O_TRUNC | O_CREAT | __O_TMPFILE); } static int __io_openat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) { struct io_open *open = io_kiocb_to_cmd(req, struct io_open); const char __user *fname; int ret; if (unlikely(sqe->buf_index)) return -EINVAL; if (unlikely(req->flags & REQ_F_FIXED_FILE)) return -EBADF; /* open.how should be already initialised */ if (!(open->how.flags & O_PATH) && force_o_largefile()) open->how.flags |= O_LARGEFILE; open->dfd = READ_ONCE(sqe->fd); fname = u64_to_user_ptr(READ_ONCE(sqe->addr)); open->filename = getname(fname); if (IS_ERR(open->filename)) { ret = PTR_ERR(open->filename); open->filename = NULL; return ret; } open->file_slot = READ_ONCE(sqe->file_index); if (open->file_slot && (open->how.flags & O_CLOEXEC)) return -EINVAL; open->nofile = rlimit(RLIMIT_NOFILE); req->flags |= REQ_F_NEED_CLEANUP; if (io_openat_force_async(open)) req->flags |= REQ_F_FORCE_ASYNC; return 0; } int io_openat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) { struct io_open *open = io_kiocb_to_cmd(req, struct io_open); u64 mode = READ_ONCE(sqe->len); u64 flags = READ_ONCE(sqe->open_flags); open->how = build_open_how(flags, mode); return __io_openat_prep(req, sqe); } int io_openat2_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) { struct io_open *open = io_kiocb_to_cmd(req, struct io_open); struct open_how __user *how; size_t len; int ret; how = u64_to_user_ptr(READ_ONCE(sqe->addr2)); len = READ_ONCE(sqe->len); if (len < OPEN_HOW_SIZE_VER0) return -EINVAL; ret = copy_struct_from_user(&open->how, sizeof(open->how), how, len); if (ret) return ret; return __io_openat_prep(req, sqe); } int io_openat2(struct io_kiocb *req, unsigned int issue_flags) { struct io_open *open = io_kiocb_to_cmd(req, struct io_open); struct open_flags op; struct file *file; bool resolve_nonblock, nonblock_set; bool fixed = !!open->file_slot; int ret; ret = build_open_flags(&open->how, &op); if (ret) goto err; nonblock_set = op.open_flag & O_NONBLOCK; resolve_nonblock = open->how.resolve & RESOLVE_CACHED; if (issue_flags & IO_URING_F_NONBLOCK) { WARN_ON_ONCE(io_openat_force_async(open)); op.lookup_flags |= LOOKUP_CACHED; op.open_flag |= O_NONBLOCK; } if (!fixed) { ret = __get_unused_fd_flags(open->how.flags, open->nofile); if (ret < 0) goto err; } file = do_filp_open(open->dfd, open->filename, &op); if (IS_ERR(file)) { /* * We could hang on to this 'fd' on retrying, but seems like * marginal gain for something that is now known to be a slower * path. So just put it, and we'll get a new one when we retry. 
*/ if (!fixed) put_unused_fd(ret); ret = PTR_ERR(file); /* only retry if RESOLVE_CACHED wasn't already set by application */ if (ret == -EAGAIN && (!resolve_nonblock && (issue_flags & IO_URING_F_NONBLOCK))) return -EAGAIN; goto err; } if ((issue_flags & IO_URING_F_NONBLOCK) && !nonblock_set) file->f_flags &= ~O_NONBLOCK; if (!fixed) fd_install(ret, file); else ret = io_fixed_fd_install(req, issue_flags, file, open->file_slot); err: putname(open->filename); req->flags &= ~REQ_F_NEED_CLEANUP; if (ret < 0) req_set_fail(req); io_req_set_res(req, ret, 0); return IOU_COMPLETE; } int io_openat(struct io_kiocb *req, unsigned int issue_flags) { return io_openat2(req, issue_flags); } void io_open_cleanup(struct io_kiocb *req) { struct io_open *open = io_kiocb_to_cmd(req, struct io_open); if (open->filename) putname(open->filename); } int __io_close_fixed(struct io_ring_ctx *ctx, unsigned int issue_flags, unsigned int offset) { int ret; io_ring_submit_lock(ctx, issue_flags); ret = io_fixed_fd_remove(ctx, offset); io_ring_submit_unlock(ctx, issue_flags); return ret; } static inline int io_close_fixed(struct io_kiocb *req, unsigned int issue_flags) { struct io_close *close = io_kiocb_to_cmd(req, struct io_close); return __io_close_fixed(req->ctx, issue_flags, close->file_slot - 1); } int io_close_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) { struct io_close *close = io_kiocb_to_cmd(req, struct io_close); if (sqe->off || sqe->addr || sqe->len || sqe->rw_flags || sqe->buf_index) return -EINVAL; if (req->flags & REQ_F_FIXED_FILE) return -EBADF; close->fd = READ_ONCE(sqe->fd); close->file_slot = READ_ONCE(sqe->file_index); if (close->file_slot && close->fd) return -EINVAL; return 0; } int io_close(struct io_kiocb *req, unsigned int issue_flags) { struct files_struct *files = current->files; struct io_close *close = io_kiocb_to_cmd(req, struct io_close); struct file *file; int ret = -EBADF; if (close->file_slot) { ret = io_close_fixed(req, issue_flags); goto err; } spin_lock(&files->file_lock); file = files_lookup_fd_locked(files, close->fd); if (!file || io_is_uring_fops(file)) { spin_unlock(&files->file_lock); goto err; } /* if the file has a flush method, be safe and punt to async */ if (file->f_op->flush && (issue_flags & IO_URING_F_NONBLOCK)) { spin_unlock(&files->file_lock); return -EAGAIN; } file = file_close_fd_locked(files, close->fd); spin_unlock(&files->file_lock); if (!file) goto err; /* No ->flush() or already async, safely close from here */ ret = filp_close(file, current->files); err: if (ret < 0) req_set_fail(req); io_req_set_res(req, ret, 0); return IOU_COMPLETE; } int io_install_fixed_fd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) { struct io_fixed_install *ifi; unsigned int flags; if (sqe->off || sqe->addr || sqe->len || sqe->buf_index || sqe->splice_fd_in || sqe->addr3) return -EINVAL; /* must be a fixed file */ if (!(req->flags & REQ_F_FIXED_FILE)) return -EBADF; flags = READ_ONCE(sqe->install_fd_flags); if (flags & ~IORING_FIXED_FD_NO_CLOEXEC) return -EINVAL; /* ensure the task's creds are used when installing/receiving fds */ if (req->flags & REQ_F_CREDS) return -EPERM; /* default to O_CLOEXEC, disable if IORING_FIXED_FD_NO_CLOEXEC is set */ ifi = io_kiocb_to_cmd(req, struct io_fixed_install); ifi->o_flags = O_CLOEXEC; if (flags & IORING_FIXED_FD_NO_CLOEXEC) ifi->o_flags = 0; return 0; } int io_install_fixed_fd(struct io_kiocb *req, unsigned int issue_flags) { struct io_fixed_install *ifi; int ret; ifi = io_kiocb_to_cmd(req, struct io_fixed_install); ret 
= receive_fd(req->file, NULL, ifi->o_flags); if (ret < 0) req_set_fail(req); io_req_set_res(req, ret, 0); return IOU_COMPLETE; } struct io_pipe { struct file *file; int __user *fds; int flags; int file_slot; unsigned long nofile; }; int io_pipe_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) { struct io_pipe *p = io_kiocb_to_cmd(req, struct io_pipe); if (sqe->fd || sqe->off || sqe->addr3) return -EINVAL; p->fds = u64_to_user_ptr(READ_ONCE(sqe->addr)); p->flags = READ_ONCE(sqe->pipe_flags); if (p->flags & ~(O_CLOEXEC | O_NONBLOCK | O_DIRECT | O_NOTIFICATION_PIPE)) return -EINVAL; p->file_slot = READ_ONCE(sqe->file_index); p->nofile = rlimit(RLIMIT_NOFILE); return 0; } static int io_pipe_fixed(struct io_kiocb *req, struct file **files, unsigned int issue_flags) { struct io_pipe *p = io_kiocb_to_cmd(req, struct io_pipe); struct io_ring_ctx *ctx = req->ctx; int ret, fds[2] = { -1, -1 }; int slot = p->file_slot; if (p->flags & O_CLOEXEC) return -EINVAL; io_ring_submit_lock(ctx, issue_flags); ret = __io_fixed_fd_install(ctx, files[0], slot); if (ret < 0) goto err; fds[0] = ret; files[0] = NULL; /* * If a specific slot is given, next one will be used for * the write side. */ if (slot != IORING_FILE_INDEX_ALLOC) slot++; ret = __io_fixed_fd_install(ctx, files[1], slot); if (ret < 0) goto err; fds[1] = ret; files[1] = NULL; io_ring_submit_unlock(ctx, issue_flags); if (!copy_to_user(p->fds, fds, sizeof(fds))) return 0; ret = -EFAULT; io_ring_submit_lock(ctx, issue_flags); err: if (fds[0] != -1) io_fixed_fd_remove(ctx, fds[0]); if (fds[1] != -1) io_fixed_fd_remove(ctx, fds[1]); io_ring_submit_unlock(ctx, issue_flags); return ret; } static int io_pipe_fd(struct io_kiocb *req, struct file **files) { struct io_pipe *p = io_kiocb_to_cmd(req, struct io_pipe); int ret, fds[2] = { -1, -1 }; ret = __get_unused_fd_flags(p->flags, p->nofile); if (ret < 0) goto err; fds[0] = ret; ret = __get_unused_fd_flags(p->flags, p->nofile); if (ret < 0) goto err; fds[1] = ret; if (!copy_to_user(p->fds, fds, sizeof(fds))) { fd_install(fds[0], files[0]); fd_install(fds[1], files[1]); return 0; } ret = -EFAULT; err: if (fds[0] != -1) put_unused_fd(fds[0]); if (fds[1] != -1) put_unused_fd(fds[1]); return ret; } int io_pipe(struct io_kiocb *req, unsigned int issue_flags) { struct io_pipe *p = io_kiocb_to_cmd(req, struct io_pipe); struct file *files[2]; int ret; ret = create_pipe_files(files, p->flags); if (ret) return ret; if (!!p->file_slot) ret = io_pipe_fixed(req, files, issue_flags); else ret = io_pipe_fd(req, files); io_req_set_res(req, ret, 0); if (!ret) return IOU_COMPLETE; req_set_fail(req); if (files[0]) fput(files[0]); if (files[1]) fput(files[1]); return ret; } |
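The open/close opcodes implemented above are normally driven from user space through liburing. The following is an illustrative sketch, not part of the kernel file: it queues an IORING_OP_OPENAT and then an IORING_OP_CLOSE request and reads the results from the completion queue. The path /tmp/example.txt is a placeholder.

#include <liburing.h>
#include <fcntl.h>
#include <stdio.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	int fd;

	if (io_uring_queue_init(8, &ring, 0) < 0)
		return 1;

	/* Queue an asynchronous openat(); serviced by io_openat() above. */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_openat(sqe, AT_FDCWD, "/tmp/example.txt", O_RDONLY, 0);
	io_uring_submit(&ring);

	io_uring_wait_cqe(&ring, &cqe);
	fd = cqe->res;			/* fd on success, -errno on failure */
	io_uring_cqe_seen(&ring, cqe);

	if (fd >= 0) {
		/* Queue an asynchronous close(); serviced by io_close() above. */
		sqe = io_uring_get_sqe(&ring);
		io_uring_prep_close(sqe, fd);
		io_uring_submit(&ring);
		io_uring_wait_cqe(&ring, &cqe);
		printf("close result: %d\n", cqe->res);
		io_uring_cqe_seen(&ring, cqe);
	}

	io_uring_queue_exit(&ring);
	return 0;
}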
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _MM_SWAP_TABLE_H
#define _MM_SWAP_TABLE_H

#include <linux/rcupdate.h>
#include <linux/atomic.h>
#include "swap.h"

/* A typical flat array in each cluster as swap table */
struct swap_table {
	atomic_long_t entries[SWAPFILE_CLUSTER];
};

#define SWP_TABLE_USE_PAGE (sizeof(struct swap_table) == PAGE_SIZE)

/*
 * A swap table entry represents the status of a swap slot on a swap
 * (physical or virtual) device. The swap table in each cluster is a
 * 1:1 map of the swap slots in this cluster.
 *
 * Each swap table entry could be a pointer (folio), a XA_VALUE
 * (shadow), or NULL.
 */

/*
 * Helpers for casting one type of info into a swap table entry.
 */
static inline unsigned long null_to_swp_tb(void)
{
	BUILD_BUG_ON(sizeof(unsigned long) != sizeof(atomic_long_t));
	return 0;
}

static inline unsigned long folio_to_swp_tb(struct folio *folio)
{
	BUILD_BUG_ON(sizeof(unsigned long) != sizeof(void *));
	return (unsigned long)folio;
}

static inline unsigned long shadow_swp_to_tb(void *shadow)
{
	BUILD_BUG_ON((BITS_PER_XA_VALUE + 1) !=
		     BITS_PER_BYTE * sizeof(unsigned long));
	VM_WARN_ON_ONCE(shadow && !xa_is_value(shadow));
	return (unsigned long)shadow;
}

/*
 * Helpers for swap table entry type checking.
 */
static inline bool swp_tb_is_null(unsigned long swp_tb)
{
	return !swp_tb;
}

static inline bool swp_tb_is_folio(unsigned long swp_tb)
{
	return !xa_is_value((void *)swp_tb) && !swp_tb_is_null(swp_tb);
}

static inline bool swp_tb_is_shadow(unsigned long swp_tb)
{
	return xa_is_value((void *)swp_tb);
}

/*
 * Helpers for retrieving info from swap table.
 */
static inline struct folio *swp_tb_to_folio(unsigned long swp_tb)
{
	VM_WARN_ON(!swp_tb_is_folio(swp_tb));
	return (void *)swp_tb;
}

static inline void *swp_tb_to_shadow(unsigned long swp_tb)
{
	VM_WARN_ON(!swp_tb_is_shadow(swp_tb));
	return (void *)swp_tb;
}

/*
 * Helpers for accessing or modifying the swap table of a cluster,
 * the swap cluster must be locked.
 */
static inline void __swap_table_set(struct swap_cluster_info *ci,
				    unsigned int off, unsigned long swp_tb)
{
	atomic_long_t *table = rcu_dereference_protected(ci->table, true);

	lockdep_assert_held(&ci->lock);
	VM_WARN_ON_ONCE(off >= SWAPFILE_CLUSTER);
	atomic_long_set(&table[off], swp_tb);
}

static inline unsigned long __swap_table_xchg(struct swap_cluster_info *ci,
					      unsigned int off, unsigned long swp_tb)
{
	atomic_long_t *table = rcu_dereference_protected(ci->table, true);

	lockdep_assert_held(&ci->lock);
	VM_WARN_ON_ONCE(off >= SWAPFILE_CLUSTER);
	/* Ordering is guaranteed by cluster lock, relax */
	return atomic_long_xchg_relaxed(&table[off], swp_tb);
}

static inline unsigned long __swap_table_get(struct swap_cluster_info *ci,
					     unsigned int off)
{
	atomic_long_t *table;

	VM_WARN_ON_ONCE(off >= SWAPFILE_CLUSTER);
	table = rcu_dereference_check(ci->table, lockdep_is_held(&ci->lock));

	return atomic_long_read(&table[off]);
}

static inline unsigned long swap_table_get(struct swap_cluster_info *ci,
					   unsigned int off)
{
	atomic_long_t *table;
	unsigned long swp_tb;

	rcu_read_lock();
	table = rcu_dereference(ci->table);
	swp_tb = table ? atomic_long_read(&table[off]) : null_to_swp_tb();
	rcu_read_unlock();

	return swp_tb;
}
#endif
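For context, the folio/shadow distinction above rests on the XArray value-entry convention: a shadow is stored as an xa_mk_value()-style word with its low bit set, while a folio pointer is word-aligned with the low bit clear. Below is a standalone user-space sketch of that encoding, illustrative only; the mk_value()/is_value()/to_value() helpers are hypothetical stand-ins for the kernel's xa_mk_value()/xa_is_value()/xa_to_value().

#include <assert.h>
#include <stdio.h>

/* Value entries are stored shifted left by one with the low bit set. */
static unsigned long mk_value(unsigned long v)
{
	return (v << 1) | 1;
}

static int is_value(unsigned long entry)
{
	return entry & 1;
}

static unsigned long to_value(unsigned long entry)
{
	return entry >> 1;
}

int main(void)
{
	int folio_like;			/* stand-in for a struct folio */
	unsigned long ptr_entry = (unsigned long)&folio_like;
	unsigned long shadow_entry = mk_value(0x1234);

	/* A pointer is word-aligned, so its low bit is clear ... */
	assert(!is_value(ptr_entry) && ptr_entry != 0);
	/* ... while a shadow "value" entry always has the low bit set. */
	assert(is_value(shadow_entry));
	printf("shadow value: 0x%lx\n", to_value(shadow_entry));
	return 0;
}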
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_PID_H
#define _LINUX_PID_H

#include <linux/pid_types.h>
#include <linux/rculist.h>
#include <linux/rcupdate.h>
#include <linux/refcount.h>
#include <linux/sched.h>
#include <linux/wait.h>

/*
 * What is struct pid?
 *
 * A struct pid is the kernel's internal notion of a process identifier.
 * It refers to individual tasks, process groups, and sessions. While
 * there are processes attached to it the struct pid lives in a hash
 * table, so it and then the processes that it refers to can be found
 * quickly from the numeric pid value. The attached processes may be
 * quickly accessed by following pointers from struct pid.
 *
 * Storing pid_t values in the kernel and referring to them later has a
 * problem. The process originally with that pid may have exited and the
 * pid allocator wrapped, and another process could have come along
 * and been assigned that pid.
 *
 * Referring to user space processes by holding a reference to struct
 * task_struct has a problem. When the user space process exits
 * the now useless task_struct is still kept. A task_struct plus a
 * stack consumes around 10K of low kernel memory. More precisely
 * this is THREAD_SIZE + sizeof(struct task_struct). By comparison
 * a struct pid is about 64 bytes.
 *
 * Holding a reference to struct pid solves both of these problems.
 * It is small so holding a reference does not consume a lot of
 * resources, and since a new struct pid is allocated when the numeric pid
 * value is reused (when pids wrap around) we don't mistakenly refer to new
 * processes.
 */

/*
 * struct upid is used to get the id of the struct pid, as it is
 * seen in particular namespace. Later the struct pid is found with
 * find_pid_ns() using the int nr and struct pid_namespace *ns.
*/ #define RESERVED_PIDS 300 struct pidfs_attr; struct upid { int nr; struct pid_namespace *ns; }; struct pid { refcount_t count; unsigned int level; spinlock_t lock; struct { u64 ino; struct rb_node pidfs_node; struct dentry *stashed; struct pidfs_attr *attr; }; /* lists of tasks that use this pid */ struct hlist_head tasks[PIDTYPE_MAX]; struct hlist_head inodes; /* wait queue for pidfd notifications */ wait_queue_head_t wait_pidfd; struct rcu_head rcu; struct upid numbers[]; }; extern seqcount_spinlock_t pidmap_lock_seq; extern struct pid init_struct_pid; struct file; struct pid *pidfd_pid(const struct file *file); struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags); struct task_struct *pidfd_get_task(int pidfd, unsigned int *flags); int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret_file); void do_notify_pidfd(struct task_struct *task); static inline struct pid *get_pid(struct pid *pid) { if (pid) refcount_inc(&pid->count); return pid; } extern void put_pid(struct pid *pid); extern struct task_struct *pid_task(struct pid *pid, enum pid_type); static inline bool pid_has_task(struct pid *pid, enum pid_type type) { return !hlist_empty(&pid->tasks[type]); } extern struct task_struct *get_pid_task(struct pid *pid, enum pid_type); extern struct pid *get_task_pid(struct task_struct *task, enum pid_type type); /* * these helpers must be called with the tasklist_lock write-held. */ extern void attach_pid(struct task_struct *task, enum pid_type); void detach_pid(struct pid **pids, struct task_struct *task, enum pid_type); void change_pid(struct pid **pids, struct task_struct *task, enum pid_type, struct pid *pid); extern void exchange_tids(struct task_struct *task, struct task_struct *old); extern void transfer_pid(struct task_struct *old, struct task_struct *new, enum pid_type); /* * look up a PID in the hash table. Must be called with the tasklist_lock * or rcu_read_lock() held. * * find_pid_ns() finds the pid in the namespace specified * find_vpid() finds the pid by its virtual id, i.e. in the current namespace * * see also find_task_by_vpid() set in include/linux/sched.h */ extern struct pid *find_pid_ns(int nr, struct pid_namespace *ns); extern struct pid *find_vpid(int nr); /* * Lookup a PID in the hash table, and return with it's count elevated. */ extern struct pid *find_get_pid(int nr); extern struct pid *find_ge_pid(int nr, struct pid_namespace *); extern struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid, size_t set_tid_size); extern void free_pid(struct pid *pid); void free_pids(struct pid **pids); extern void disable_pid_allocation(struct pid_namespace *ns); /* * ns_of_pid() returns the pid namespace in which the specified pid was * allocated. * * NOTE: * ns_of_pid() is expected to be called for a process (task) that has * an attached 'struct pid' (see attach_pid(), detach_pid()) i.e @pid * is expected to be non-NULL. If @pid is NULL, caller should handle * the resulting NULL pid-ns. */ static inline struct pid_namespace *ns_of_pid(struct pid *pid) { struct pid_namespace *ns = NULL; if (pid) ns = pid->numbers[pid->level].ns; return ns; } /* * is_child_reaper returns true if the pid is the init process * of the current namespace. As this one could be checked before * pid_ns->child_reaper is assigned in copy_process, we check * with the pid number. 
*/ static inline bool is_child_reaper(struct pid *pid) { return pid->numbers[pid->level].nr == 1; } /* * the helpers to get the pid's id seen from different namespaces * * pid_nr() : global id, i.e. the id seen from the init namespace; * pid_vnr() : virtual id, i.e. the id seen from the pid namespace of * current. * pid_nr_ns() : id seen from the ns specified. * * see also task_xid_nr() etc in include/linux/sched.h */ static inline pid_t pid_nr(struct pid *pid) { pid_t nr = 0; if (pid) nr = pid->numbers[0].nr; return nr; } pid_t pid_nr_ns(struct pid *pid, struct pid_namespace *ns); pid_t pid_vnr(struct pid *pid); #define do_each_pid_task(pid, type, task) \ do { \ if ((pid) != NULL) \ hlist_for_each_entry_rcu((task), \ &(pid)->tasks[type], pid_links[type]) { /* * Both old and new leaders may be attached to * the same pid in the middle of de_thread(). */ #define while_each_pid_task(pid, type, task) \ if (type == PIDTYPE_PID) \ break; \ } \ } while (0) #define do_each_pid_thread(pid, type, task) \ do_each_pid_task(pid, type, task) { \ struct task_struct *tg___ = task; \ for_each_thread(tg___, task) { #define while_each_pid_thread(pid, type, task) \ } \ task = tg___; \ } while_each_pid_task(pid, type, task) static inline struct pid *task_pid(struct task_struct *task) { return task->thread_pid; } /* * the helpers to get the task's different pids as they are seen * from various namespaces * * task_xid_nr() : global id, i.e. the id seen from the init namespace; * task_xid_vnr() : virtual id, i.e. the id seen from the pid namespace of * current. * task_xid_nr_ns() : id seen from the ns specified; * * see also pid_nr() etc in include/linux/pid.h */ pid_t __task_pid_nr_ns(struct task_struct *task, enum pid_type type, struct pid_namespace *ns); static inline pid_t task_pid_nr(struct task_struct *tsk) { return tsk->pid; } static inline pid_t task_pid_nr_ns(struct task_struct *tsk, struct pid_namespace *ns) { return __task_pid_nr_ns(tsk, PIDTYPE_PID, ns); } static inline pid_t task_pid_vnr(struct task_struct *tsk) { return __task_pid_nr_ns(tsk, PIDTYPE_PID, NULL); } static inline pid_t task_tgid_nr(struct task_struct *tsk) { return tsk->tgid; } /** * pid_alive - check that a task structure is not stale * @p: Task structure to be checked. * * Test if a process is not yet dead (at most zombie state) * If pid_alive fails, then pointers within the task structure * can be stale and must not be dereferenced. * * Return: 1 if the process is alive. 0 otherwise. 
*/ static inline int pid_alive(const struct task_struct *p) { return p->thread_pid != NULL; } static inline pid_t task_pgrp_nr_ns(struct task_struct *tsk, struct pid_namespace *ns) { return __task_pid_nr_ns(tsk, PIDTYPE_PGID, ns); } static inline pid_t task_pgrp_vnr(struct task_struct *tsk) { return __task_pid_nr_ns(tsk, PIDTYPE_PGID, NULL); } static inline pid_t task_session_nr_ns(struct task_struct *tsk, struct pid_namespace *ns) { return __task_pid_nr_ns(tsk, PIDTYPE_SID, ns); } static inline pid_t task_session_vnr(struct task_struct *tsk) { return __task_pid_nr_ns(tsk, PIDTYPE_SID, NULL); } static inline pid_t task_tgid_nr_ns(struct task_struct *tsk, struct pid_namespace *ns) { return __task_pid_nr_ns(tsk, PIDTYPE_TGID, ns); } static inline pid_t task_tgid_vnr(struct task_struct *tsk) { return __task_pid_nr_ns(tsk, PIDTYPE_TGID, NULL); } static inline pid_t task_ppid_nr_ns(const struct task_struct *tsk, struct pid_namespace *ns) { pid_t pid = 0; rcu_read_lock(); if (pid_alive(tsk)) pid = task_tgid_nr_ns(rcu_dereference(tsk->real_parent), ns); rcu_read_unlock(); return pid; } static inline pid_t task_ppid_nr(const struct task_struct *tsk) { return task_ppid_nr_ns(tsk, &init_pid_ns); } /* Obsolete, do not use: */ static inline pid_t task_pgrp_nr(struct task_struct *tsk) { return task_pgrp_nr_ns(tsk, &init_pid_ns); } /** * is_global_init - check if a task structure is init. Since init * is free to have sub-threads we need to check tgid. * @tsk: Task structure to be checked. * * Check if a task structure is the first user space task the kernel created. * * Return: 1 if the task structure is init. 0 otherwise. */ static inline int is_global_init(struct task_struct *tsk) { return task_tgid_nr(tsk) == 1; } #endif /* _LINUX_PID_H */ |
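As a usage note (not part of this header), the declarations above compose into a common lookup pattern: take a reference on the struct pid, resolve it to a task under RCU, and drop the pid reference when done. A hedged sketch, where example_get_task_by_nr() is a hypothetical helper:

#include <linux/pid.h>
#include <linux/rcupdate.h>
#include <linux/sched/task.h>

/* Resolve a numeric pid (in the current pid namespace) to a pinned
 * task_struct, or NULL. The caller must put_task_struct() the result. */
static struct task_struct *example_get_task_by_nr(pid_t nr)
{
	struct task_struct *task = NULL;
	struct pid *pid;

	pid = find_get_pid(nr);		/* takes a reference on struct pid */
	if (!pid)
		return NULL;

	rcu_read_lock();
	task = pid_task(pid, PIDTYPE_PID);
	if (task)
		get_task_struct(task);	/* pin the task before leaving RCU */
	rcu_read_unlock();

	put_pid(pid);
	return task;
}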
// SPDX-License-Identifier: GPL-2.0-or-later
/*
 * v4l2-spi - SPI helpers for Video4Linux2
 */

#include <linux/module.h>
#include <linux/spi/spi.h>
#include <media/v4l2-common.h>
#include <media/v4l2-device.h>

void v4l2_spi_subdev_unregister(struct v4l2_subdev *sd)
{
	struct spi_device *spi = v4l2_get_subdevdata(sd);

	if (spi && !spi->dev.of_node && !spi->dev.fwnode)
		spi_unregister_device(spi);
}

void v4l2_spi_subdev_init(struct v4l2_subdev *sd, struct spi_device *spi,
			  const struct v4l2_subdev_ops *ops)
{
	v4l2_subdev_init(sd, ops);
	sd->flags |= V4L2_SUBDEV_FL_IS_SPI;
	/* the owner is the same as the spi_device's driver owner */
	sd->owner = spi->dev.driver->owner;
	sd->dev = &spi->dev;
	/* spi_device and v4l2_subdev point to one another */
	v4l2_set_subdevdata(sd, spi);
	spi_set_drvdata(spi, sd);
	/* initialize name */
	snprintf(sd->name, sizeof(sd->name), "%s %s", spi->dev.driver->name,
		 dev_name(&spi->dev));
}
EXPORT_SYMBOL_GPL(v4l2_spi_subdev_init);

struct v4l2_subdev *v4l2_spi_new_subdev(struct v4l2_device *v4l2_dev,
					struct spi_controller *ctlr,
					struct spi_board_info *info)
{
	struct v4l2_subdev *sd = NULL;
	struct spi_device *spi = NULL;

	if (!v4l2_dev)
		return NULL;
	if (info->modalias[0])
		request_module(info->modalias);

	spi = spi_new_device(ctlr, info);

	if (!spi || !spi->dev.driver)
		goto error;

	if (!try_module_get(spi->dev.driver->owner))
		goto error;

	sd = spi_get_drvdata(spi);

	/*
	 * Register with the v4l2_device which increases the module's
	 * use count as well.
	 */
	if (__v4l2_device_register_subdev(v4l2_dev, sd, sd->owner))
		sd = NULL;

	/* Decrease the module use count to match the first try_module_get. */
	module_put(spi->dev.driver->owner);

error:
	/*
	 * If we have a client but no subdev, then something went wrong and
	 * we must unregister the client.
	 */
	if (!sd)
		spi_unregister_device(spi);

	return sd;
}
EXPORT_SYMBOL_GPL(v4l2_spi_new_subdev);
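A hedged sketch of the driver side these helpers expect, illustrative only; the foo_* names and the empty ops table are hypothetical. The SPI sub-device driver embeds a v4l2_subdev in its state and binds the two objects in probe with v4l2_spi_subdev_init(); a bridge driver can later pick the subdev up, for example via v4l2_spi_new_subdev().

#include <linux/module.h>
#include <linux/spi/spi.h>
#include <media/v4l2-device.h>

struct foo_state {
	struct v4l2_subdev sd;
};

static const struct v4l2_subdev_ops foo_subdev_ops = {
	/* core/video ops would go here */
};

static int foo_probe(struct spi_device *spi)
{
	struct foo_state *state;

	state = devm_kzalloc(&spi->dev, sizeof(*state), GFP_KERNEL);
	if (!state)
		return -ENOMEM;

	/* Ties sd <-> spi together and fills in sd->name/owner/dev. */
	v4l2_spi_subdev_init(&state->sd, spi, &foo_subdev_ops);

	return 0;
}

static struct spi_driver foo_driver = {
	.driver = {
		.name = "foo-sensor",
	},
	.probe = foo_probe,
};
module_spi_driver(foo_driver);

MODULE_LICENSE("GPL");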
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_RCULIST_NULLS_H
#define _LINUX_RCULIST_NULLS_H

#ifdef __KERNEL__

/*
 * RCU-protected list version
 */
#include <linux/list_nulls.h>
#include <linux/rcupdate.h>

/**
 * hlist_nulls_del_init_rcu - deletes entry from hash list with re-initialization
 * @n: the element to delete from the hash list.
 *
 * Note: hlist_nulls_unhashed() on the node return true after this. It is
 * useful for RCU based read lockfree traversal if the writer side
 * must know if the list entry is still hashed or already unhashed.
 *
 * In particular, it means that we can not poison the forward pointers
 * that may still be used for walking the hash list and we can only
 * zero the pprev pointer so list_unhashed() will return true after
 * this.
 *
 * The caller must take whatever precautions are necessary (such as
 * holding appropriate locks) to avoid racing with another
 * list-mutation primitive, such as hlist_nulls_add_head_rcu() or
 * hlist_nulls_del_rcu(), running on this same list. However, it is
 * perfectly legal to run concurrently with the _rcu list-traversal
 * primitives, such as hlist_nulls_for_each_entry_rcu().
 */
static inline void hlist_nulls_del_init_rcu(struct hlist_nulls_node *n)
{
	if (!hlist_nulls_unhashed(n)) {
		__hlist_nulls_del(n);
		WRITE_ONCE(n->pprev, NULL);
	}
}

/**
 * hlist_nulls_first_rcu - returns the first element of the hash list.
 * @head: the head of the list.
 */
#define hlist_nulls_first_rcu(head) \
	(*((struct hlist_nulls_node __rcu __force **)&(head)->first))

/**
 * hlist_nulls_next_rcu - returns the element of the list after @node.
 * @node: element of the list.
 */
#define hlist_nulls_next_rcu(node) \
	(*((struct hlist_nulls_node __rcu __force **)&(node)->next))

/**
 * hlist_nulls_del_rcu - deletes entry from hash list without re-initialization
 * @n: the element to delete from the hash list.
 *
 * Note: hlist_nulls_unhashed() on entry does not return true after this,
 * the entry is in an undefined state. It is useful for RCU based
 * lockfree traversal.
 *
 * In particular, it means that we can not poison the forward
 * pointers that may still be used for walking the hash list.
 *
 * The caller must take whatever precautions are necessary
 * (such as holding appropriate locks) to avoid racing
 * with another list-mutation primitive, such as hlist_nulls_add_head_rcu()
 * or hlist_nulls_del_rcu(), running on this same list.
 * However, it is perfectly legal to run concurrently with
 * the _rcu list-traversal primitives, such as
 * hlist_nulls_for_each_entry().
 */
static inline void hlist_nulls_del_rcu(struct hlist_nulls_node *n)
{
	__hlist_nulls_del(n);
	WRITE_ONCE(n->pprev, LIST_POISON2);
}

/**
 * hlist_nulls_add_head_rcu
 * @n: the element to add to the hash list.
 * @h: the list to add to.
 *
 * Description:
 * Adds the specified element to the specified hlist_nulls,
 * while permitting racing traversals.
* * The caller must take whatever precautions are necessary * (such as holding appropriate locks) to avoid racing * with another list-mutation primitive, such as hlist_nulls_add_head_rcu() * or hlist_nulls_del_rcu(), running on this same list. * However, it is perfectly legal to run concurrently with * the _rcu list-traversal primitives, such as * hlist_nulls_for_each_entry_rcu(), used to prevent memory-consistency * problems on Alpha CPUs. Regardless of the type of CPU, the * list-traversal primitive must be guarded by rcu_read_lock(). */ static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n, struct hlist_nulls_head *h) { struct hlist_nulls_node *first = h->first; WRITE_ONCE(n->next, first); WRITE_ONCE(n->pprev, &h->first); rcu_assign_pointer(hlist_nulls_first_rcu(h), n); if (!is_a_nulls(first)) WRITE_ONCE(first->pprev, &n->next); } /** * hlist_nulls_add_tail_rcu * @n: the element to add to the hash list. * @h: the list to add to. * * Description: * Adds the specified element to the specified hlist_nulls, * while permitting racing traversals. * * The caller must take whatever precautions are necessary * (such as holding appropriate locks) to avoid racing * with another list-mutation primitive, such as hlist_nulls_add_head_rcu() * or hlist_nulls_del_rcu(), running on this same list. * However, it is perfectly legal to run concurrently with * the _rcu list-traversal primitives, such as * hlist_nulls_for_each_entry_rcu(), used to prevent memory-consistency * problems on Alpha CPUs. Regardless of the type of CPU, the * list-traversal primitive must be guarded by rcu_read_lock(). */ static inline void hlist_nulls_add_tail_rcu(struct hlist_nulls_node *n, struct hlist_nulls_head *h) { struct hlist_nulls_node *i, *last = NULL; /* Note: write side code, so rcu accessors are not needed. */ for (i = h->first; !is_a_nulls(i); i = i->next) last = i; if (last) { WRITE_ONCE(n->next, last->next); n->pprev = &last->next; rcu_assign_pointer(hlist_nulls_next_rcu(last), n); } else { hlist_nulls_add_head_rcu(n, h); } } /* after that hlist_nulls_del will work */ static inline void hlist_nulls_add_fake(struct hlist_nulls_node *n) { n->pprev = &n->next; n->next = (struct hlist_nulls_node *)NULLS_MARKER(NULL); } /** * hlist_nulls_for_each_entry_rcu - iterate over rcu list of given type * @tpos: the type * to use as a loop cursor. * @pos: the &struct hlist_nulls_node to use as a loop cursor. * @head: the head of the list. * @member: the name of the hlist_nulls_node within the struct. * * The barrier() is needed to make sure compiler doesn't cache first element [1], * as this loop can be restarted [2] * [1] Documentation/memory-barriers.txt around line 1533 * [2] Documentation/RCU/rculist_nulls.rst around line 146 */ #define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member) \ for (({barrier();}), \ pos = rcu_dereference_raw(hlist_nulls_first_rcu(head)); \ (!is_a_nulls(pos)) && \ ({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \ pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos))) /** * hlist_nulls_for_each_entry_safe - * iterate over list of given type safe against removal of list entry * @tpos: the type * to use as a loop cursor. * @pos: the &struct hlist_nulls_node to use as a loop cursor. * @head: the head of the list. * @member: the name of the hlist_nulls_node within the struct. 
 */
#define hlist_nulls_for_each_entry_safe(tpos, pos, head, member)		\
	for (({barrier();}),							\
	     pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));		\
	     (!is_a_nulls(pos)) &&						\
		({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member);	\
		   pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)); 1; });)

#endif
#endif
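The lookup pattern these macros are designed for is described in Documentation/RCU/rculist_nulls.rst, which the iteration macro above references. A hedged sketch of the read side follows; struct object, its fields, and object_lookup() are hypothetical. The nulls value at the end of a chain encodes which hash bucket the walk finished on, so a reader that raced with an object being moved between chains can detect the mismatch and restart.

#include <linux/rculist_nulls.h>
#include <linux/refcount.h>

/* Hypothetical hashed object protected by RCU + nulls lists. */
struct object {
	struct hlist_nulls_node obj_node;
	refcount_t refcnt;
	u32 key;
};

static struct object *object_lookup(struct hlist_nulls_head *head,
				    u32 key, unsigned long slot)
{
	struct hlist_nulls_node *node;
	struct object *obj;

	rcu_read_lock();
begin:
	hlist_nulls_for_each_entry_rcu(obj, node, head, obj_node) {
		if (obj->key == key) {
			if (!refcount_inc_not_zero(&obj->refcnt))
				goto begin;	/* being freed, retry */
			rcu_read_unlock();
			return obj;
		}
	}
	/*
	 * The walk ended on a nulls marker. If it is not the marker of
	 * the bucket we started on, the object may have been moved to
	 * another chain while we walked; restart the lookup.
	 */
	if (get_nulls_value(node) != slot)
		goto begin;
	rcu_read_unlock();
	return NULL;
}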
// SPDX-License-Identifier: GPL-2.0-or-later
/*
 *  Handle bridge arp/nd proxy/suppress
 *
 *  Copyright (C) 2017 Cumulus Networks
 *  Copyright (c) 2017 Roopa Prabhu <roopa@cumulusnetworks.com>
 *
 *  Authors:
 *	Roopa Prabhu <roopa@cumulusnetworks.com>
 */

#include <linux/kernel.h>
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <linux/neighbour.h>
#include <net/arp.h>
#include <linux/if_vlan.h>
#include <linux/inetdevice.h>
#include <net/addrconf.h>
#include <net/ipv6_stubs.h>
#if IS_ENABLED(CONFIG_IPV6)
#include <net/ip6_checksum.h>
#endif

#include "br_private.h"

void br_recalculate_neigh_suppress_enabled(struct net_bridge *br)
{
	struct net_bridge_port *p;
	bool neigh_suppress = false;

	list_for_each_entry(p, &br->port_list, list) {
		if (p->flags & (BR_NEIGH_SUPPRESS | BR_NEIGH_VLAN_SUPPRESS)) {
			neigh_suppress = true;
			break;
		}
	}

	br_opt_toggle(br, BROPT_NEIGH_SUPPRESS_ENABLED, neigh_suppress);
}

#if IS_ENABLED(CONFIG_INET)
static void br_arp_send(struct net_bridge *br, struct net_bridge_port *p,
			struct net_device *dev, __be32 dest_ip, __be32 src_ip,
			const unsigned char *dest_hw,
			const unsigned char *src_hw,
			const unsigned char *target_hw,
			__be16 vlan_proto, u16 vlan_tci)
{
	struct net_bridge_vlan_group *vg;
	struct sk_buff *skb;
	u16 pvid;

	netdev_dbg(dev, "arp send dev %s dst %pI4 dst_hw %pM src %pI4 src_hw %pM\n",
		   dev->name, &dest_ip, dest_hw, &src_ip, src_hw);

	if (!vlan_tci) {
		arp_send(ARPOP_REPLY, ETH_P_ARP, dest_ip, dev, src_ip,
			 dest_hw, src_hw, target_hw);
		return;
	}

	skb = arp_create(ARPOP_REPLY, ETH_P_ARP, dest_ip, dev, src_ip, dest_hw,
src_hw, target_hw); if (!skb) return; if (p) vg = nbp_vlan_group_rcu(p); else vg = br_vlan_group_rcu(br); pvid = br_get_pvid(vg); if (pvid == (vlan_tci & VLAN_VID_MASK)) vlan_tci = 0; if (vlan_tci) __vlan_hwaccel_put_tag(skb, vlan_proto, vlan_tci); if (p) { arp_xmit(skb); } else { skb_reset_mac_header(skb); __skb_pull(skb, skb_network_offset(skb)); skb->ip_summed = CHECKSUM_UNNECESSARY; skb->pkt_type = PACKET_HOST; netif_rx(skb); } } static int br_chk_addr_ip(struct net_device *dev, struct netdev_nested_priv *priv) { __be32 ip = *(__be32 *)priv->data; struct in_device *in_dev; __be32 addr = 0; in_dev = __in_dev_get_rcu(dev); if (in_dev) addr = inet_confirm_addr(dev_net(dev), in_dev, 0, ip, RT_SCOPE_HOST); if (addr == ip) return 1; return 0; } static bool br_is_local_ip(struct net_device *dev, __be32 ip) { struct netdev_nested_priv priv = { .data = (void *)&ip, }; if (br_chk_addr_ip(dev, &priv)) return true; /* check if ip is configured on upper dev */ if (netdev_walk_all_upper_dev_rcu(dev, br_chk_addr_ip, &priv)) return true; return false; } void br_do_proxy_suppress_arp(struct sk_buff *skb, struct net_bridge *br, u16 vid, struct net_bridge_port *p) { struct net_device *dev = br->dev; struct net_device *vlandev = dev; struct neighbour *n; struct arphdr *parp; u8 *arpptr, *sha; __be32 sip, tip; BR_INPUT_SKB_CB(skb)->proxyarp_replied = 0; if ((dev->flags & IFF_NOARP) || !pskb_may_pull(skb, arp_hdr_len(dev))) return; parp = arp_hdr(skb); if (parp->ar_pro != htons(ETH_P_IP) || parp->ar_hln != dev->addr_len || parp->ar_pln != 4) return; arpptr = (u8 *)parp + sizeof(struct arphdr); sha = arpptr; arpptr += dev->addr_len; /* sha */ memcpy(&sip, arpptr, sizeof(sip)); arpptr += sizeof(sip); arpptr += dev->addr_len; /* tha */ memcpy(&tip, arpptr, sizeof(tip)); if (ipv4_is_loopback(tip) || ipv4_is_multicast(tip)) return; if (br_opt_get(br, BROPT_NEIGH_SUPPRESS_ENABLED)) { if (br_is_neigh_suppress_enabled(p, vid)) return; if (is_unicast_ether_addr(eth_hdr(skb)->h_dest) && parp->ar_op == htons(ARPOP_REQUEST)) return; if (parp->ar_op != htons(ARPOP_RREQUEST) && parp->ar_op != htons(ARPOP_RREPLY) && (ipv4_is_zeronet(sip) || sip == tip)) { /* prevent flooding to neigh suppress ports */ BR_INPUT_SKB_CB(skb)->proxyarp_replied = 1; return; } } if (parp->ar_op != htons(ARPOP_REQUEST)) return; if (vid != 0) { vlandev = __vlan_find_dev_deep_rcu(br->dev, skb->vlan_proto, vid); if (!vlandev) return; } if (br_opt_get(br, BROPT_NEIGH_SUPPRESS_ENABLED) && br_is_local_ip(vlandev, tip)) { /* its our local ip, so don't proxy reply * and don't forward to neigh suppress ports */ BR_INPUT_SKB_CB(skb)->proxyarp_replied = 1; return; } n = neigh_lookup(&arp_tbl, &tip, vlandev); if (n) { struct net_bridge_fdb_entry *f; if (!(READ_ONCE(n->nud_state) & NUD_VALID)) { neigh_release(n); return; } f = br_fdb_find_rcu(br, n->ha, vid); if (f) { bool replied = false; if ((p && (p->flags & BR_PROXYARP)) || (f->dst && (f->dst->flags & BR_PROXYARP_WIFI)) || br_is_neigh_suppress_enabled(f->dst, vid)) { if (!vid) br_arp_send(br, p, skb->dev, sip, tip, sha, n->ha, sha, 0, 0); else br_arp_send(br, p, skb->dev, sip, tip, sha, n->ha, sha, skb->vlan_proto, skb_vlan_tag_get(skb)); replied = true; } /* If we have replied or as long as we know the * mac, indicate to arp replied */ if (replied || br_opt_get(br, BROPT_NEIGH_SUPPRESS_ENABLED)) BR_INPUT_SKB_CB(skb)->proxyarp_replied = 1; } neigh_release(n); } } #endif #if IS_ENABLED(CONFIG_IPV6) struct nd_msg *br_is_nd_neigh_msg(const struct sk_buff *skb, struct nd_msg *msg) { struct nd_msg *m; m = 
skb_header_pointer(skb, skb_network_offset(skb) + sizeof(struct ipv6hdr), sizeof(*msg), msg); if (!m) return NULL; if (m->icmph.icmp6_code != 0 || (m->icmph.icmp6_type != NDISC_NEIGHBOUR_SOLICITATION && m->icmph.icmp6_type != NDISC_NEIGHBOUR_ADVERTISEMENT)) return NULL; return m; } static void br_nd_send(struct net_bridge *br, struct net_bridge_port *p, struct sk_buff *request, struct neighbour *n, __be16 vlan_proto, u16 vlan_tci, struct nd_msg *ns) { struct net_device *dev = request->dev; struct net_bridge_vlan_group *vg; struct sk_buff *reply; struct nd_msg *na; struct ipv6hdr *pip6; int na_olen = 8; /* opt hdr + ETH_ALEN for target */ int ns_olen; int i, len; u8 *daddr; u16 pvid; if (!dev) return; len = LL_RESERVED_SPACE(dev) + sizeof(struct ipv6hdr) + sizeof(*na) + na_olen + dev->needed_tailroom; reply = alloc_skb(len, GFP_ATOMIC); if (!reply) return; reply->protocol = htons(ETH_P_IPV6); reply->dev = dev; skb_reserve(reply, LL_RESERVED_SPACE(dev)); skb_push(reply, sizeof(struct ethhdr)); skb_set_mac_header(reply, 0); daddr = eth_hdr(request)->h_source; /* Do we need option processing ? */ ns_olen = request->len - (skb_network_offset(request) + sizeof(struct ipv6hdr)) - sizeof(*ns); for (i = 0; i < ns_olen - 1; i += (ns->opt[i + 1] << 3)) { if (!ns->opt[i + 1]) { kfree_skb(reply); return; } if (ns->opt[i] == ND_OPT_SOURCE_LL_ADDR) { daddr = ns->opt + i + sizeof(struct nd_opt_hdr); break; } } /* Ethernet header */ ether_addr_copy(eth_hdr(reply)->h_dest, daddr); ether_addr_copy(eth_hdr(reply)->h_source, n->ha); eth_hdr(reply)->h_proto = htons(ETH_P_IPV6); reply->protocol = htons(ETH_P_IPV6); skb_pull(reply, sizeof(struct ethhdr)); skb_set_network_header(reply, 0); skb_put(reply, sizeof(struct ipv6hdr)); /* IPv6 header */ pip6 = ipv6_hdr(reply); memset(pip6, 0, sizeof(struct ipv6hdr)); pip6->version = 6; pip6->priority = ipv6_hdr(request)->priority; pip6->nexthdr = IPPROTO_ICMPV6; pip6->hop_limit = 255; pip6->daddr = ipv6_hdr(request)->saddr; pip6->saddr = *(struct in6_addr *)n->primary_key; skb_pull(reply, sizeof(struct ipv6hdr)); skb_set_transport_header(reply, 0); na = (struct nd_msg *)skb_put(reply, sizeof(*na) + na_olen); /* Neighbor Advertisement */ memset(na, 0, sizeof(*na) + na_olen); na->icmph.icmp6_type = NDISC_NEIGHBOUR_ADVERTISEMENT; na->icmph.icmp6_router = (n->flags & NTF_ROUTER) ? 
1 : 0; na->icmph.icmp6_override = 1; na->icmph.icmp6_solicited = 1; na->target = ns->target; ether_addr_copy(&na->opt[2], n->ha); na->opt[0] = ND_OPT_TARGET_LL_ADDR; na->opt[1] = na_olen >> 3; na->icmph.icmp6_cksum = csum_ipv6_magic(&pip6->saddr, &pip6->daddr, sizeof(*na) + na_olen, IPPROTO_ICMPV6, csum_partial(na, sizeof(*na) + na_olen, 0)); pip6->payload_len = htons(sizeof(*na) + na_olen); skb_push(reply, sizeof(struct ipv6hdr)); skb_push(reply, sizeof(struct ethhdr)); reply->ip_summed = CHECKSUM_UNNECESSARY; if (p) vg = nbp_vlan_group_rcu(p); else vg = br_vlan_group_rcu(br); pvid = br_get_pvid(vg); if (pvid == (vlan_tci & VLAN_VID_MASK)) vlan_tci = 0; if (vlan_tci) __vlan_hwaccel_put_tag(reply, vlan_proto, vlan_tci); netdev_dbg(dev, "nd send dev %s dst %pI6 dst_hw %pM src %pI6 src_hw %pM\n", dev->name, &pip6->daddr, daddr, &pip6->saddr, n->ha); if (p) { dev_queue_xmit(reply); } else { skb_reset_mac_header(reply); __skb_pull(reply, skb_network_offset(reply)); reply->ip_summed = CHECKSUM_UNNECESSARY; reply->pkt_type = PACKET_HOST; netif_rx(reply); } } static int br_chk_addr_ip6(struct net_device *dev, struct netdev_nested_priv *priv) { struct in6_addr *addr = (struct in6_addr *)priv->data; if (ipv6_chk_addr(dev_net(dev), addr, dev, 0)) return 1; return 0; } static bool br_is_local_ip6(struct net_device *dev, struct in6_addr *addr) { struct netdev_nested_priv priv = { .data = (void *)addr, }; if (br_chk_addr_ip6(dev, &priv)) return true; /* check if ip is configured on upper dev */ if (netdev_walk_all_upper_dev_rcu(dev, br_chk_addr_ip6, &priv)) return true; return false; } void br_do_suppress_nd(struct sk_buff *skb, struct net_bridge *br, u16 vid, struct net_bridge_port *p, struct nd_msg *msg) { struct net_device *dev = br->dev; struct net_device *vlandev = NULL; struct in6_addr *saddr, *daddr; struct ipv6hdr *iphdr; struct neighbour *n; BR_INPUT_SKB_CB(skb)->proxyarp_replied = 0; if (br_is_neigh_suppress_enabled(p, vid)) return; if (is_unicast_ether_addr(eth_hdr(skb)->h_dest) && msg->icmph.icmp6_type == NDISC_NEIGHBOUR_SOLICITATION) return; if (msg->icmph.icmp6_type == NDISC_NEIGHBOUR_ADVERTISEMENT && !msg->icmph.icmp6_solicited) { /* prevent flooding to neigh suppress ports */ BR_INPUT_SKB_CB(skb)->proxyarp_replied = 1; return; } if (msg->icmph.icmp6_type != NDISC_NEIGHBOUR_SOLICITATION) return; iphdr = ipv6_hdr(skb); saddr = &iphdr->saddr; daddr = &iphdr->daddr; if (ipv6_addr_any(saddr) || !ipv6_addr_cmp(saddr, daddr)) { /* prevent flooding to neigh suppress ports */ BR_INPUT_SKB_CB(skb)->proxyarp_replied = 1; return; } if (vid != 0) { /* build neigh table lookup on the vlan device */ vlandev = __vlan_find_dev_deep_rcu(br->dev, skb->vlan_proto, vid); if (!vlandev) return; } else { vlandev = dev; } if (br_is_local_ip6(vlandev, &msg->target)) { /* its our own ip, so don't proxy reply * and don't forward to arp suppress ports */ BR_INPUT_SKB_CB(skb)->proxyarp_replied = 1; return; } n = neigh_lookup(ipv6_stub->nd_tbl, &msg->target, vlandev); if (n) { struct net_bridge_fdb_entry *f; if (!(READ_ONCE(n->nud_state) & NUD_VALID)) { neigh_release(n); return; } f = br_fdb_find_rcu(br, n->ha, vid); if (f) { bool replied = false; if (br_is_neigh_suppress_enabled(f->dst, vid)) { if (vid != 0) br_nd_send(br, p, skb, n, skb->vlan_proto, skb_vlan_tag_get(skb), msg); else br_nd_send(br, p, skb, n, 0, 0, msg); replied = true; } /* If we have replied or as long as we know the * mac, indicate to NEIGH_SUPPRESS ports that we * have replied */ if (replied || br_opt_get(br, BROPT_NEIGH_SUPPRESS_ENABLED)) 
				BR_INPUT_SKB_CB(skb)->proxyarp_replied = 1;
		}
		neigh_release(n);
	}
}
#endif

bool br_is_neigh_suppress_enabled(const struct net_bridge_port *p, u16 vid)
{
	if (!p)
		return false;

	if (!vid)
		return !!(p->flags & BR_NEIGH_SUPPRESS);

	if (p->flags & BR_NEIGH_VLAN_SUPPRESS) {
		struct net_bridge_vlan_group *vg = nbp_vlan_group_rcu(p);
		struct net_bridge_vlan *v;

		v = br_vlan_find(vg, vid);
		if (!v)
			return false;
		return !!(v->priv_flags & BR_VLFLAG_NEIGH_SUPPRESS_ENABLED);
	} else {
		return !!(p->flags & BR_NEIGH_SUPPRESS);
	}
}
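Illustrative aside, not part of the file above: both br_arp_send() and br_nd_send() apply the same rule when deciding whether a generated reply keeps its VLAN tag, namely that a reply belonging to the egress PVID goes out untagged and anything else is re-tagged with __vlan_hwaccel_put_tag(). A minimal sketch of that decision, where the helper name br_reply_vlan_tci() is hypothetical:

/* Return the VLAN tag a proxied reply should carry, or 0 for untagged. */
static u16 br_reply_vlan_tci(u16 pvid, u16 vlan_tci)
{
	/* A reply for the port/bridge PVID is sent untagged. */
	if (pvid == (vlan_tci & VLAN_VID_MASK))
		return 0;
	return vlan_tci;
}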
// SPDX-License-Identifier: GPL-2.0-or-later
/*
 * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 *
 * HID driver for NVIDIA SHIELD peripherals.
 */

#include <linux/hid.h>
#include <linux/idr.h>
#include <linux/input-event-codes.h>
#include <linux/input.h>
#include <linux/jiffies.h>
#include <linux/leds.h>
#include <linux/module.h>
#include <linux/power_supply.h>
#include <linux/spinlock.h>
#include <linux/timer.h>
#include <linux/workqueue.h>

#include "hid-ids.h"

#define NOT_INIT_STR "NOT INITIALIZED"
#define android_map_key(c) hid_map_usage(hi, usage, bit, max, EV_KEY, (c))

enum {
	HID_USAGE_ANDROID_PLAYPAUSE_BTN = 0xcd, /* Double-tap volume slider */
	HID_USAGE_ANDROID_VOLUMEUP_BTN = 0xe9,
	HID_USAGE_ANDROID_VOLUMEDOWN_BTN = 0xea,
	HID_USAGE_ANDROID_SEARCH_BTN = 0x221, /* NVIDIA btn on Thunderstrike */
	HID_USAGE_ANDROID_HOME_BTN = 0x223,
	HID_USAGE_ANDROID_BACK_BTN = 0x224,
};

enum {
	SHIELD_FW_VERSION_INITIALIZED = 0,
	SHIELD_BOARD_INFO_INITIALIZED,
	SHIELD_BATTERY_STATS_INITIALIZED,
	SHIELD_CHARGER_STATE_INITIALIZED,
};

enum {
	THUNDERSTRIKE_FW_VERSION_UPDATE = 0,
	THUNDERSTRIKE_BOARD_INFO_UPDATE,
	THUNDERSTRIKE_HAPTICS_UPDATE,
	THUNDERSTRIKE_LED_UPDATE,
	THUNDERSTRIKE_POWER_SUPPLY_STATS_UPDATE,
};

enum {
	THUNDERSTRIKE_HOSTCMD_REPORT_SIZE = 33,
	THUNDERSTRIKE_HOSTCMD_REQ_REPORT_ID = 0x4,
	THUNDERSTRIKE_HOSTCMD_RESP_REPORT_ID = 0x3,
};

enum {
	THUNDERSTRIKE_HOSTCMD_ID_FW_VERSION = 1,
	THUNDERSTRIKE_HOSTCMD_ID_LED = 6,
	THUNDERSTRIKE_HOSTCMD_ID_BATTERY,
	THUNDERSTRIKE_HOSTCMD_ID_BOARD_INFO = 16,
	THUNDERSTRIKE_HOSTCMD_ID_USB_INIT = 53,
	THUNDERSTRIKE_HOSTCMD_ID_HAPTICS = 57,
	THUNDERSTRIKE_HOSTCMD_ID_CHARGER,
};

struct power_supply_dev {
	struct power_supply *psy;
	struct power_supply_desc desc;
};

struct thunderstrike_psy_prop_values {
	int voltage_min;
	int voltage_now;
	int voltage_avg;
	int voltage_boot;
	int capacity;
	int status;
	int charge_type;
	int temp;
};

static const enum power_supply_property thunderstrike_battery_props[] = {
	POWER_SUPPLY_PROP_STATUS,
	POWER_SUPPLY_PROP_CHARGE_TYPE,
	POWER_SUPPLY_PROP_PRESENT,
	POWER_SUPPLY_PROP_VOLTAGE_MIN,
	POWER_SUPPLY_PROP_VOLTAGE_MAX_DESIGN,
	POWER_SUPPLY_PROP_VOLTAGE_MIN_DESIGN,
	POWER_SUPPLY_PROP_VOLTAGE_NOW,
	POWER_SUPPLY_PROP_VOLTAGE_AVG,
	POWER_SUPPLY_PROP_VOLTAGE_BOOT,
	POWER_SUPPLY_PROP_CAPACITY,
	POWER_SUPPLY_PROP_SCOPE,
	POWER_SUPPLY_PROP_TEMP,
	POWER_SUPPLY_PROP_TEMP_MIN,
	POWER_SUPPLY_PROP_TEMP_MAX,
	POWER_SUPPLY_PROP_TEMP_ALERT_MIN,
	POWER_SUPPLY_PROP_TEMP_ALERT_MAX,
};

enum thunderstrike_led_state {
THUNDERSTRIKE_LED_OFF = 1, THUNDERSTRIKE_LED_ON = 8, } __packed; static_assert(sizeof(enum thunderstrike_led_state) == 1); struct thunderstrike_hostcmd_battery { __le16 voltage_avg; u8 reserved_at_10; __le16 thermistor; __le16 voltage_min; __le16 voltage_boot; __le16 voltage_now; u8 capacity; } __packed; enum thunderstrike_charger_type { THUNDERSTRIKE_CHARGER_TYPE_NONE = 0, THUNDERSTRIKE_CHARGER_TYPE_TRICKLE, THUNDERSTRIKE_CHARGER_TYPE_NORMAL, } __packed; static_assert(sizeof(enum thunderstrike_charger_type) == 1); enum thunderstrike_charger_state { THUNDERSTRIKE_CHARGER_STATE_UNKNOWN = 0, THUNDERSTRIKE_CHARGER_STATE_DISABLED, THUNDERSTRIKE_CHARGER_STATE_CHARGING, THUNDERSTRIKE_CHARGER_STATE_FULL, THUNDERSTRIKE_CHARGER_STATE_FAILED = 8, } __packed; static_assert(sizeof(enum thunderstrike_charger_state) == 1); struct thunderstrike_hostcmd_charger { u8 connected; enum thunderstrike_charger_type type; enum thunderstrike_charger_state state; } __packed; struct thunderstrike_hostcmd_board_info { __le16 revision; __le16 serial[7]; } __packed; struct thunderstrike_hostcmd_haptics { u8 motor_left; u8 motor_right; } __packed; struct thunderstrike_hostcmd_resp_report { u8 report_id; /* THUNDERSTRIKE_HOSTCMD_RESP_REPORT_ID */ u8 cmd_id; u8 reserved_at_10; union { struct thunderstrike_hostcmd_board_info board_info; struct thunderstrike_hostcmd_haptics motors; __le16 fw_version; enum thunderstrike_led_state led_state; struct thunderstrike_hostcmd_battery battery; struct thunderstrike_hostcmd_charger charger; u8 payload[30]; } __packed; } __packed; static_assert(sizeof(struct thunderstrike_hostcmd_resp_report) == THUNDERSTRIKE_HOSTCMD_REPORT_SIZE); struct thunderstrike_hostcmd_req_report { u8 report_id; /* THUNDERSTRIKE_HOSTCMD_REQ_REPORT_ID */ u8 cmd_id; u8 reserved_at_10; union { struct __packed { u8 update; enum thunderstrike_led_state state; } led; struct __packed { u8 update; struct thunderstrike_hostcmd_haptics motors; } haptics; } __packed; u8 reserved_at_30[27]; } __packed; static_assert(sizeof(struct thunderstrike_hostcmd_req_report) == THUNDERSTRIKE_HOSTCMD_REPORT_SIZE); /* Common struct for shield accessories. */ struct shield_device { struct hid_device *hdev; struct power_supply_dev battery_dev; unsigned long initialized_flags; const char *codename; u16 fw_version; struct { u16 revision; char serial_number[15]; } board_info; }; /* * Non-trivial to uniquely identify Thunderstrike controllers at initialization * time. Use an ID allocator to help with this. 
*/ static DEFINE_IDA(thunderstrike_ida); struct thunderstrike { struct shield_device base; int id; /* Sub-devices */ struct input_dev *haptics_dev; struct led_classdev led_dev; /* Resources */ void *req_report_dmabuf; unsigned long update_flags; struct thunderstrike_hostcmd_haptics haptics_val; spinlock_t haptics_update_lock; u8 led_state : 1; enum thunderstrike_led_state led_value; struct thunderstrike_psy_prop_values psy_stats; spinlock_t psy_stats_lock; struct timer_list psy_stats_timer; struct work_struct hostcmd_req_work; }; static inline void thunderstrike_hostcmd_req_report_init( struct thunderstrike_hostcmd_req_report *report, u8 cmd_id) { memset(report, 0, sizeof(*report)); report->report_id = THUNDERSTRIKE_HOSTCMD_REQ_REPORT_ID; report->cmd_id = cmd_id; } static inline void shield_strrev(char *dest, size_t len, u16 rev) { dest[0] = ('A' - 1) + (rev >> 8); snprintf(&dest[1], len - 1, "%02X", 0xff & rev); } static struct input_dev *shield_allocate_input_dev(struct hid_device *hdev, const char *name_suffix) { struct input_dev *idev; idev = input_allocate_device(); if (!idev) goto err_device; idev->id.bustype = hdev->bus; idev->id.vendor = hdev->vendor; idev->id.product = hdev->product; idev->id.version = hdev->version; idev->uniq = hdev->uniq; idev->name = devm_kasprintf(&hdev->dev, GFP_KERNEL, "%s %s", hdev->name, name_suffix); if (!idev->name) goto err_name; input_set_drvdata(idev, hdev); return idev; err_name: input_free_device(idev); err_device: return ERR_PTR(-ENOMEM); } static struct input_dev *shield_haptics_create( struct shield_device *dev, int (*play_effect)(struct input_dev *, void *, struct ff_effect *)) { struct input_dev *haptics; int ret; if (!IS_ENABLED(CONFIG_NVIDIA_SHIELD_FF)) return NULL; haptics = shield_allocate_input_dev(dev->hdev, "Haptics"); if (IS_ERR(haptics)) return haptics; input_set_capability(haptics, EV_FF, FF_RUMBLE); ret = input_ff_create_memless(haptics, NULL, play_effect); if (ret) goto err; ret = input_register_device(haptics); if (ret) goto err; return haptics; err: input_free_device(haptics); return ERR_PTR(ret); } static inline void thunderstrike_send_hostcmd_request(struct thunderstrike *ts) { struct thunderstrike_hostcmd_req_report *report = ts->req_report_dmabuf; struct shield_device *shield_dev = &ts->base; int ret; ret = hid_hw_raw_request(shield_dev->hdev, report->report_id, ts->req_report_dmabuf, THUNDERSTRIKE_HOSTCMD_REPORT_SIZE, HID_OUTPUT_REPORT, HID_REQ_SET_REPORT); if (ret < 0) { hid_err(shield_dev->hdev, "Failed to output Thunderstrike HOSTCMD request HID report due to %pe\n", ERR_PTR(ret)); } } static void thunderstrike_hostcmd_req_work_handler(struct work_struct *work) { struct thunderstrike *ts = container_of(work, struct thunderstrike, hostcmd_req_work); struct thunderstrike_hostcmd_req_report *report; unsigned long flags; report = ts->req_report_dmabuf; if (test_and_clear_bit(THUNDERSTRIKE_FW_VERSION_UPDATE, &ts->update_flags)) { thunderstrike_hostcmd_req_report_init( report, THUNDERSTRIKE_HOSTCMD_ID_FW_VERSION); thunderstrike_send_hostcmd_request(ts); } if (test_and_clear_bit(THUNDERSTRIKE_LED_UPDATE, &ts->update_flags)) { thunderstrike_hostcmd_req_report_init(report, THUNDERSTRIKE_HOSTCMD_ID_LED); report->led.update = 1; report->led.state = ts->led_value; thunderstrike_send_hostcmd_request(ts); } if (test_and_clear_bit(THUNDERSTRIKE_POWER_SUPPLY_STATS_UPDATE, &ts->update_flags)) { thunderstrike_hostcmd_req_report_init( report, THUNDERSTRIKE_HOSTCMD_ID_BATTERY); thunderstrike_send_hostcmd_request(ts); 
thunderstrike_hostcmd_req_report_init( report, THUNDERSTRIKE_HOSTCMD_ID_CHARGER); thunderstrike_send_hostcmd_request(ts); } if (test_and_clear_bit(THUNDERSTRIKE_BOARD_INFO_UPDATE, &ts->update_flags)) { thunderstrike_hostcmd_req_report_init( report, THUNDERSTRIKE_HOSTCMD_ID_BOARD_INFO); thunderstrike_send_hostcmd_request(ts); } if (test_and_clear_bit(THUNDERSTRIKE_HAPTICS_UPDATE, &ts->update_flags)) { thunderstrike_hostcmd_req_report_init( report, THUNDERSTRIKE_HOSTCMD_ID_HAPTICS); report->haptics.update = 1; spin_lock_irqsave(&ts->haptics_update_lock, flags); report->haptics.motors = ts->haptics_val; spin_unlock_irqrestore(&ts->haptics_update_lock, flags); thunderstrike_send_hostcmd_request(ts); } } static inline void thunderstrike_request_firmware_version(struct thunderstrike *ts) { set_bit(THUNDERSTRIKE_FW_VERSION_UPDATE, &ts->update_flags); schedule_work(&ts->hostcmd_req_work); } static inline void thunderstrike_request_board_info(struct thunderstrike *ts) { set_bit(THUNDERSTRIKE_BOARD_INFO_UPDATE, &ts->update_flags); schedule_work(&ts->hostcmd_req_work); } static inline int thunderstrike_update_haptics(struct thunderstrike *ts, struct thunderstrike_hostcmd_haptics *motors) { unsigned long flags; spin_lock_irqsave(&ts->haptics_update_lock, flags); ts->haptics_val = *motors; spin_unlock_irqrestore(&ts->haptics_update_lock, flags); set_bit(THUNDERSTRIKE_HAPTICS_UPDATE, &ts->update_flags); schedule_work(&ts->hostcmd_req_work); return 0; } static int thunderstrike_play_effect(struct input_dev *idev, void *data, struct ff_effect *effect) { struct hid_device *hdev = input_get_drvdata(idev); struct thunderstrike_hostcmd_haptics motors; struct shield_device *shield_dev; struct thunderstrike *ts; if (effect->type != FF_RUMBLE) return 0; shield_dev = hid_get_drvdata(hdev); ts = container_of(shield_dev, struct thunderstrike, base); /* Thunderstrike motor values range from 0 to 32 inclusively */ motors.motor_left = effect->u.rumble.strong_magnitude / 2047; motors.motor_right = effect->u.rumble.weak_magnitude / 2047; hid_dbg(hdev, "Thunderstrike FF_RUMBLE request, left: %u right: %u\n", motors.motor_left, motors.motor_right); return thunderstrike_update_haptics(ts, &motors); } static enum led_brightness thunderstrike_led_get_brightness(struct led_classdev *led) { struct hid_device *hdev = to_hid_device(led->dev->parent); struct shield_device *shield_dev = hid_get_drvdata(hdev); struct thunderstrike *ts; ts = container_of(shield_dev, struct thunderstrike, base); return ts->led_state; } static void thunderstrike_led_set_brightness(struct led_classdev *led, enum led_brightness value) { struct hid_device *hdev = to_hid_device(led->dev->parent); struct shield_device *shield_dev = hid_get_drvdata(hdev); struct thunderstrike *ts; ts = container_of(shield_dev, struct thunderstrike, base); switch (value) { case LED_OFF: ts->led_value = THUNDERSTRIKE_LED_OFF; break; default: ts->led_value = THUNDERSTRIKE_LED_ON; break; } set_bit(THUNDERSTRIKE_LED_UPDATE, &ts->update_flags); schedule_work(&ts->hostcmd_req_work); } static int thunderstrike_battery_get_property(struct power_supply *psy, enum power_supply_property psp, union power_supply_propval *val) { struct shield_device *shield_dev = power_supply_get_drvdata(psy); struct thunderstrike_psy_prop_values prop_values; struct thunderstrike *ts; int ret = 0; ts = container_of(shield_dev, struct thunderstrike, base); spin_lock(&ts->psy_stats_lock); prop_values = ts->psy_stats; spin_unlock(&ts->psy_stats_lock); switch (psp) { case POWER_SUPPLY_PROP_STATUS: val->intval 
= prop_values.status; break; case POWER_SUPPLY_PROP_CHARGE_TYPE: val->intval = prop_values.charge_type; break; case POWER_SUPPLY_PROP_PRESENT: val->intval = 1; break; case POWER_SUPPLY_PROP_VOLTAGE_MIN: val->intval = prop_values.voltage_min; break; case POWER_SUPPLY_PROP_VOLTAGE_MAX_DESIGN: val->intval = 2900000; /* 2.9 V */ break; case POWER_SUPPLY_PROP_VOLTAGE_MIN_DESIGN: val->intval = 2200000; /* 2.2 V */ break; case POWER_SUPPLY_PROP_VOLTAGE_NOW: val->intval = prop_values.voltage_now; break; case POWER_SUPPLY_PROP_VOLTAGE_AVG: val->intval = prop_values.voltage_avg; break; case POWER_SUPPLY_PROP_VOLTAGE_BOOT: val->intval = prop_values.voltage_boot; break; case POWER_SUPPLY_PROP_CAPACITY: val->intval = prop_values.capacity; break; case POWER_SUPPLY_PROP_SCOPE: val->intval = POWER_SUPPLY_SCOPE_DEVICE; break; case POWER_SUPPLY_PROP_TEMP: val->intval = prop_values.temp; break; case POWER_SUPPLY_PROP_TEMP_MIN: val->intval = 0; /* 0 C */ break; case POWER_SUPPLY_PROP_TEMP_MAX: val->intval = 400; /* 40 C */ break; case POWER_SUPPLY_PROP_TEMP_ALERT_MIN: val->intval = 15; /* 1.5 C */ break; case POWER_SUPPLY_PROP_TEMP_ALERT_MAX: val->intval = 380; /* 38 C */ break; default: ret = -EINVAL; break; } return ret; } static inline void thunderstrike_request_psy_stats(struct thunderstrike *ts) { set_bit(THUNDERSTRIKE_POWER_SUPPLY_STATS_UPDATE, &ts->update_flags); schedule_work(&ts->hostcmd_req_work); } static void thunderstrike_psy_stats_timer_handler(struct timer_list *timer) { struct thunderstrike *ts = container_of(timer, struct thunderstrike, psy_stats_timer); thunderstrike_request_psy_stats(ts); /* Query battery statistics from device every five minutes */ mod_timer(timer, jiffies + 300 * HZ); } static void thunderstrike_parse_fw_version_payload(struct shield_device *shield_dev, __le16 fw_version) { shield_dev->fw_version = le16_to_cpu(fw_version); set_bit(SHIELD_FW_VERSION_INITIALIZED, &shield_dev->initialized_flags); hid_dbg(shield_dev->hdev, "Thunderstrike firmware version 0x%04X\n", shield_dev->fw_version); } static void thunderstrike_parse_board_info_payload(struct shield_device *shield_dev, struct thunderstrike_hostcmd_board_info *board_info) { char board_revision_str[4]; int i; shield_dev->board_info.revision = le16_to_cpu(board_info->revision); for (i = 0; i < 7; ++i) { u16 val = le16_to_cpu(board_info->serial[i]); shield_dev->board_info.serial_number[2 * i] = val & 0xFF; shield_dev->board_info.serial_number[2 * i + 1] = val >> 8; } shield_dev->board_info.serial_number[14] = '\0'; set_bit(SHIELD_BOARD_INFO_INITIALIZED, &shield_dev->initialized_flags); shield_strrev(board_revision_str, 4, shield_dev->board_info.revision); hid_dbg(shield_dev->hdev, "Thunderstrike BOARD_REVISION_%s (0x%04X) S/N: %s\n", board_revision_str, shield_dev->board_info.revision, shield_dev->board_info.serial_number); } static inline void thunderstrike_parse_haptics_payload(struct shield_device *shield_dev, struct thunderstrike_hostcmd_haptics *haptics) { hid_dbg(shield_dev->hdev, "Thunderstrike haptics HOSTCMD response, left: %u right: %u\n", haptics->motor_left, haptics->motor_right); } static void thunderstrike_parse_led_payload(struct shield_device *shield_dev, enum thunderstrike_led_state led_state) { struct thunderstrike *ts = container_of(shield_dev, struct thunderstrike, base); switch (led_state) { case THUNDERSTRIKE_LED_OFF: ts->led_state = 0; break; case THUNDERSTRIKE_LED_ON: ts->led_state = 1; break; } hid_dbg(shield_dev->hdev, "Thunderstrike led HOSTCMD response, 0x%02X\n", led_state); } static void 
thunderstrike_parse_battery_payload( struct shield_device *shield_dev, struct thunderstrike_hostcmd_battery *battery) { struct thunderstrike *ts = container_of(shield_dev, struct thunderstrike, base); u16 hostcmd_voltage_boot = le16_to_cpu(battery->voltage_boot); u16 hostcmd_voltage_avg = le16_to_cpu(battery->voltage_avg); u16 hostcmd_voltage_min = le16_to_cpu(battery->voltage_min); u16 hostcmd_voltage_now = le16_to_cpu(battery->voltage_now); u16 hostcmd_thermistor = le16_to_cpu(battery->thermistor); int voltage_boot, voltage_avg, voltage_min, voltage_now; struct hid_device *hdev = shield_dev->hdev; u8 capacity = battery->capacity; int temp; /* Convert thunderstrike device values to µV and tenths of degree Celsius */ voltage_boot = hostcmd_voltage_boot * 1000; voltage_avg = hostcmd_voltage_avg * 1000; voltage_min = hostcmd_voltage_min * 1000; voltage_now = hostcmd_voltage_now * 1000; temp = (1378 - (int)hostcmd_thermistor) * 10 / 19; /* Copy converted values */ spin_lock(&ts->psy_stats_lock); ts->psy_stats.voltage_boot = voltage_boot; ts->psy_stats.voltage_avg = voltage_avg; ts->psy_stats.voltage_min = voltage_min; ts->psy_stats.voltage_now = voltage_now; ts->psy_stats.capacity = capacity; ts->psy_stats.temp = temp; spin_unlock(&ts->psy_stats_lock); set_bit(SHIELD_BATTERY_STATS_INITIALIZED, &shield_dev->initialized_flags); hid_dbg(hdev, "Thunderstrike battery HOSTCMD response, voltage_avg: %u voltage_now: %u\n", hostcmd_voltage_avg, hostcmd_voltage_now); hid_dbg(hdev, "Thunderstrike battery HOSTCMD response, voltage_boot: %u voltage_min: %u\n", hostcmd_voltage_boot, hostcmd_voltage_min); hid_dbg(hdev, "Thunderstrike battery HOSTCMD response, thermistor: %u\n", hostcmd_thermistor); hid_dbg(hdev, "Thunderstrike battery HOSTCMD response, capacity: %u%%\n", capacity); } static void thunderstrike_parse_charger_payload( struct shield_device *shield_dev, struct thunderstrike_hostcmd_charger *charger) { struct thunderstrike *ts = container_of(shield_dev, struct thunderstrike, base); int charge_type = POWER_SUPPLY_CHARGE_TYPE_UNKNOWN; struct hid_device *hdev = shield_dev->hdev; int status = POWER_SUPPLY_STATUS_UNKNOWN; switch (charger->type) { case THUNDERSTRIKE_CHARGER_TYPE_NONE: charge_type = POWER_SUPPLY_CHARGE_TYPE_NONE; break; case THUNDERSTRIKE_CHARGER_TYPE_TRICKLE: charge_type = POWER_SUPPLY_CHARGE_TYPE_TRICKLE; break; case THUNDERSTRIKE_CHARGER_TYPE_NORMAL: charge_type = POWER_SUPPLY_CHARGE_TYPE_STANDARD; break; default: hid_warn(hdev, "Unhandled Thunderstrike charger HOSTCMD type, %u\n", charger->type); break; } switch (charger->state) { case THUNDERSTRIKE_CHARGER_STATE_UNKNOWN: status = POWER_SUPPLY_STATUS_UNKNOWN; break; case THUNDERSTRIKE_CHARGER_STATE_DISABLED: /* Indicates charger is disconnected */ break; case THUNDERSTRIKE_CHARGER_STATE_CHARGING: status = POWER_SUPPLY_STATUS_CHARGING; break; case THUNDERSTRIKE_CHARGER_STATE_FULL: status = POWER_SUPPLY_STATUS_FULL; break; case THUNDERSTRIKE_CHARGER_STATE_FAILED: status = POWER_SUPPLY_STATUS_NOT_CHARGING; hid_err(hdev, "Thunderstrike device failed to charge\n"); break; default: hid_warn(hdev, "Unhandled Thunderstrike charger HOSTCMD state, %u\n", charger->state); break; } if (!charger->connected) status = POWER_SUPPLY_STATUS_DISCHARGING; spin_lock(&ts->psy_stats_lock); ts->psy_stats.charge_type = charge_type; ts->psy_stats.status = status; spin_unlock(&ts->psy_stats_lock); set_bit(SHIELD_CHARGER_STATE_INITIALIZED, &shield_dev->initialized_flags); hid_dbg(hdev, "Thunderstrike charger HOSTCMD response, connected: %u, type: %u, state: 
%u\n", charger->connected, charger->type, charger->state); } static inline void thunderstrike_device_init_info(struct shield_device *shield_dev) { struct thunderstrike *ts = container_of(shield_dev, struct thunderstrike, base); if (!test_bit(SHIELD_FW_VERSION_INITIALIZED, &shield_dev->initialized_flags)) thunderstrike_request_firmware_version(ts); if (!test_bit(SHIELD_BOARD_INFO_INITIALIZED, &shield_dev->initialized_flags)) thunderstrike_request_board_info(ts); if (!test_bit(SHIELD_BATTERY_STATS_INITIALIZED, &shield_dev->initialized_flags) || !test_bit(SHIELD_CHARGER_STATE_INITIALIZED, &shield_dev->initialized_flags)) thunderstrike_psy_stats_timer_handler(&ts->psy_stats_timer); } static int thunderstrike_parse_report(struct shield_device *shield_dev, struct hid_report *report, u8 *data, int size) { struct thunderstrike_hostcmd_resp_report *hostcmd_resp_report; struct hid_device *hdev = shield_dev->hdev; switch (report->id) { case THUNDERSTRIKE_HOSTCMD_RESP_REPORT_ID: if (size != THUNDERSTRIKE_HOSTCMD_REPORT_SIZE) { hid_err(hdev, "Encountered Thunderstrike HOSTCMD HID report with unexpected size %d\n", size); return -EINVAL; } hostcmd_resp_report = (struct thunderstrike_hostcmd_resp_report *)data; switch (hostcmd_resp_report->cmd_id) { case THUNDERSTRIKE_HOSTCMD_ID_FW_VERSION: thunderstrike_parse_fw_version_payload( shield_dev, hostcmd_resp_report->fw_version); break; case THUNDERSTRIKE_HOSTCMD_ID_LED: thunderstrike_parse_led_payload(shield_dev, hostcmd_resp_report->led_state); break; case THUNDERSTRIKE_HOSTCMD_ID_BATTERY: thunderstrike_parse_battery_payload(shield_dev, &hostcmd_resp_report->battery); break; case THUNDERSTRIKE_HOSTCMD_ID_BOARD_INFO: thunderstrike_parse_board_info_payload( shield_dev, &hostcmd_resp_report->board_info); break; case THUNDERSTRIKE_HOSTCMD_ID_HAPTICS: thunderstrike_parse_haptics_payload( shield_dev, &hostcmd_resp_report->motors); break; case THUNDERSTRIKE_HOSTCMD_ID_USB_INIT: /* May block HOSTCMD requests till received initially */ thunderstrike_device_init_info(shield_dev); break; case THUNDERSTRIKE_HOSTCMD_ID_CHARGER: /* May block HOSTCMD requests till received initially */ thunderstrike_device_init_info(shield_dev); thunderstrike_parse_charger_payload( shield_dev, &hostcmd_resp_report->charger); break; default: hid_warn(hdev, "Unhandled Thunderstrike HOSTCMD id %d\n", hostcmd_resp_report->cmd_id); return -ENOENT; } break; default: return 0; } return 0; } static inline int thunderstrike_led_create(struct thunderstrike *ts) { struct led_classdev *led = &ts->led_dev; led->name = devm_kasprintf(&ts->base.hdev->dev, GFP_KERNEL, "thunderstrike%d:blue:led", ts->id); if (!led->name) return -ENOMEM; led->max_brightness = 1; led->flags = LED_CORE_SUSPENDRESUME | LED_RETAIN_AT_SHUTDOWN; led->brightness_get = &thunderstrike_led_get_brightness; led->brightness_set = &thunderstrike_led_set_brightness; return led_classdev_register(&ts->base.hdev->dev, led); } static inline int thunderstrike_psy_create(struct shield_device *shield_dev) { struct thunderstrike *ts = container_of(shield_dev, struct thunderstrike, base); struct power_supply_config psy_cfg = { .drv_data = shield_dev, }; struct hid_device *hdev = shield_dev->hdev; int ret; /* * Set an initial capacity and temperature value to avoid prematurely * triggering alerts. Will be replaced by values queried from initial * HOSTCMD requests. 
*/ ts->psy_stats.capacity = 100; ts->psy_stats.temp = 182; shield_dev->battery_dev.desc.properties = thunderstrike_battery_props; shield_dev->battery_dev.desc.num_properties = ARRAY_SIZE(thunderstrike_battery_props); shield_dev->battery_dev.desc.get_property = thunderstrike_battery_get_property; shield_dev->battery_dev.desc.type = POWER_SUPPLY_TYPE_BATTERY; shield_dev->battery_dev.desc.name = devm_kasprintf(&ts->base.hdev->dev, GFP_KERNEL, "thunderstrike_%d", ts->id); if (!shield_dev->battery_dev.desc.name) return -ENOMEM; shield_dev->battery_dev.psy = power_supply_register( &hdev->dev, &shield_dev->battery_dev.desc, &psy_cfg); if (IS_ERR(shield_dev->battery_dev.psy)) { hid_err(hdev, "Failed to register Thunderstrike battery device\n"); return PTR_ERR(shield_dev->battery_dev.psy); } ret = power_supply_powers(shield_dev->battery_dev.psy, &hdev->dev); if (ret) { hid_err(hdev, "Failed to associate battery device to Thunderstrike\n"); goto err; } return 0; err: power_supply_unregister(shield_dev->battery_dev.psy); return ret; } static struct shield_device *thunderstrike_create(struct hid_device *hdev) { struct shield_device *shield_dev; struct thunderstrike *ts; int ret; ts = devm_kzalloc(&hdev->dev, sizeof(*ts), GFP_KERNEL); if (!ts) return ERR_PTR(-ENOMEM); ts->req_report_dmabuf = devm_kzalloc( &hdev->dev, THUNDERSTRIKE_HOSTCMD_REPORT_SIZE, GFP_KERNEL); if (!ts->req_report_dmabuf) return ERR_PTR(-ENOMEM); shield_dev = &ts->base; shield_dev->hdev = hdev; shield_dev->codename = "Thunderstrike"; spin_lock_init(&ts->haptics_update_lock); spin_lock_init(&ts->psy_stats_lock); INIT_WORK(&ts->hostcmd_req_work, thunderstrike_hostcmd_req_work_handler); hid_set_drvdata(hdev, shield_dev); ts->id = ida_alloc(&thunderstrike_ida, GFP_KERNEL); if (ts->id < 0) return ERR_PTR(ts->id); ts->haptics_dev = shield_haptics_create(shield_dev, thunderstrike_play_effect); if (IS_ERR(ts->haptics_dev)) { hid_err(hdev, "Failed to create Thunderstrike haptics instance\n"); ret = PTR_ERR(ts->haptics_dev); goto err_id; } ret = thunderstrike_psy_create(shield_dev); if (ret) { hid_err(hdev, "Failed to create Thunderstrike power supply instance\n"); goto err_haptics; } ret = thunderstrike_led_create(ts); if (ret) { hid_err(hdev, "Failed to create Thunderstrike LED instance\n"); goto err_psy; } timer_setup(&ts->psy_stats_timer, thunderstrike_psy_stats_timer_handler, 0); hid_info(hdev, "Registered Thunderstrike controller\n"); return shield_dev; err_psy: power_supply_unregister(shield_dev->battery_dev.psy); err_haptics: if (ts->haptics_dev) input_unregister_device(ts->haptics_dev); err_id: ida_free(&thunderstrike_ida, ts->id); return ERR_PTR(ret); } static void thunderstrike_destroy(struct thunderstrike *ts) { led_classdev_unregister(&ts->led_dev); power_supply_unregister(ts->base.battery_dev.psy); if (ts->haptics_dev) input_unregister_device(ts->haptics_dev); ida_free(&thunderstrike_ida, ts->id); } static int android_input_mapping(struct hid_device *hdev, struct hid_input *hi, struct hid_field *field, struct hid_usage *usage, unsigned long **bit, int *max) { if ((usage->hid & HID_USAGE_PAGE) != HID_UP_CONSUMER) return 0; switch (usage->hid & HID_USAGE) { case HID_USAGE_ANDROID_PLAYPAUSE_BTN: android_map_key(KEY_PLAYPAUSE); break; case HID_USAGE_ANDROID_VOLUMEUP_BTN: android_map_key(KEY_VOLUMEUP); break; case HID_USAGE_ANDROID_VOLUMEDOWN_BTN: android_map_key(KEY_VOLUMEDOWN); break; case HID_USAGE_ANDROID_SEARCH_BTN: android_map_key(BTN_Z); break; case HID_USAGE_ANDROID_HOME_BTN: android_map_key(BTN_MODE); break; case 
HID_USAGE_ANDROID_BACK_BTN: android_map_key(BTN_SELECT); break; default: return 0; } return 1; } static ssize_t firmware_version_show(struct device *dev, struct device_attribute *attr, char *buf) { struct hid_device *hdev = to_hid_device(dev); struct shield_device *shield_dev; int ret; shield_dev = hid_get_drvdata(hdev); if (test_bit(SHIELD_FW_VERSION_INITIALIZED, &shield_dev->initialized_flags)) ret = sysfs_emit(buf, "0x%04X\n", shield_dev->fw_version); else ret = sysfs_emit(buf, NOT_INIT_STR "\n"); return ret; } static DEVICE_ATTR_RO(firmware_version); static ssize_t hardware_version_show(struct device *dev, struct device_attribute *attr, char *buf) { struct hid_device *hdev = to_hid_device(dev); struct shield_device *shield_dev; char board_revision_str[4]; int ret; shield_dev = hid_get_drvdata(hdev); if (test_bit(SHIELD_BOARD_INFO_INITIALIZED, &shield_dev->initialized_flags)) { shield_strrev(board_revision_str, 4, shield_dev->board_info.revision); ret = sysfs_emit(buf, "%s BOARD_REVISION_%s (0x%04X)\n", shield_dev->codename, board_revision_str, shield_dev->board_info.revision); } else ret = sysfs_emit(buf, NOT_INIT_STR "\n"); return ret; } static DEVICE_ATTR_RO(hardware_version); static ssize_t serial_number_show(struct device *dev, struct device_attribute *attr, char *buf) { struct hid_device *hdev = to_hid_device(dev); struct shield_device *shield_dev; int ret; shield_dev = hid_get_drvdata(hdev); if (test_bit(SHIELD_BOARD_INFO_INITIALIZED, &shield_dev->initialized_flags)) ret = sysfs_emit(buf, "%s\n", shield_dev->board_info.serial_number); else ret = sysfs_emit(buf, NOT_INIT_STR "\n"); return ret; } static DEVICE_ATTR_RO(serial_number); static struct attribute *shield_device_attrs[] = { &dev_attr_firmware_version.attr, &dev_attr_hardware_version.attr, &dev_attr_serial_number.attr, NULL, }; ATTRIBUTE_GROUPS(shield_device); static int shield_raw_event(struct hid_device *hdev, struct hid_report *report, u8 *data, int size) { struct shield_device *dev = hid_get_drvdata(hdev); return thunderstrike_parse_report(dev, report, data, size); } static int shield_probe(struct hid_device *hdev, const struct hid_device_id *id) { struct shield_device *shield_dev = NULL; struct thunderstrike *ts; int ret; ret = hid_parse(hdev); if (ret) { hid_err(hdev, "Parse failed\n"); return ret; } switch (id->product) { case USB_DEVICE_ID_NVIDIA_THUNDERSTRIKE_CONTROLLER: shield_dev = thunderstrike_create(hdev); break; } if (unlikely(!shield_dev)) { hid_err(hdev, "Failed to identify SHIELD device\n"); return -ENODEV; } if (IS_ERR(shield_dev)) { hid_err(hdev, "Failed to create SHIELD device\n"); return PTR_ERR(shield_dev); } ts = container_of(shield_dev, struct thunderstrike, base); ret = hid_hw_start(hdev, HID_CONNECT_HIDINPUT); if (ret) { hid_err(hdev, "Failed to start HID device\n"); goto err_ts_create; } ret = hid_hw_open(hdev); if (ret) { hid_err(hdev, "Failed to open HID device\n"); goto err_stop; } thunderstrike_device_init_info(shield_dev); return ret; err_stop: hid_hw_stop(hdev); err_ts_create: thunderstrike_destroy(ts); return ret; } static void shield_remove(struct hid_device *hdev) { struct shield_device *dev = hid_get_drvdata(hdev); struct thunderstrike *ts; ts = container_of(dev, struct thunderstrike, base); hid_hw_close(hdev); thunderstrike_destroy(ts); timer_delete_sync(&ts->psy_stats_timer); cancel_work_sync(&ts->hostcmd_req_work); hid_hw_stop(hdev); } static const struct hid_device_id shield_devices[] = { { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_NVIDIA, USB_DEVICE_ID_NVIDIA_THUNDERSTRIKE_CONTROLLER) }, 
{ HID_USB_DEVICE(USB_VENDOR_ID_NVIDIA, USB_DEVICE_ID_NVIDIA_THUNDERSTRIKE_CONTROLLER) }, { } }; MODULE_DEVICE_TABLE(hid, shield_devices); static struct hid_driver shield_driver = { .name = "shield", .id_table = shield_devices, .input_mapping = android_input_mapping, .probe = shield_probe, .remove = shield_remove, .raw_event = shield_raw_event, .driver = { .dev_groups = shield_device_groups, }, }; module_hid_driver(shield_driver); MODULE_AUTHOR("Rahul Rameshbabu <rrameshbabu@nvidia.com>"); MODULE_DESCRIPTION("HID Driver for NVIDIA SHIELD peripherals."); MODULE_LICENSE("GPL"); |
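/*
 * A minimal standalone sketch (not part of the driver above) illustrating two
 * conversions the Thunderstrike HOSTCMD code performs: ff_effect rumble
 * magnitudes (0..65535) are scaled down to the controller's 0..32 motor range,
 * and the raw thermistor reading is mapped to tenths of a degree Celsius using
 * the same constants as thunderstrike_parse_battery_payload(). The helper
 * names below are made up for illustration only.
 */
#include <stdio.h>

/* Map an ff_effect rumble magnitude (0..65535) to a 0..32 motor value. */
static unsigned int sketch_scale_rumble(unsigned int magnitude)
{
	return magnitude / 2047; /* 65535 / 2047 == 32 */
}

/* Map a raw thermistor reading to tenths of a degree Celsius. */
static int sketch_thermistor_to_decidegrees(int thermistor)
{
	return (1378 - thermistor) * 10 / 19;
}

int main(void)
{
	printf("full rumble -> motor %u\n", sketch_scale_rumble(65535));
	printf("thermistor 1000 -> %d tenths of a degree C\n",
	       sketch_thermistor_to_decidegrees(1000));
	return 0;
}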
// SPDX-License-Identifier: GPL-2.0 /* * thermal.c - sysfs interface of thermal devices * * Copyright (C) 2016
Eduardo Valentin <edubezval@gmail.com> * * Highly based on original thermal_core.c * Copyright (C) 2008 Intel Corp * Copyright (C) 2008 Zhang Rui <rui.zhang@intel.com> * Copyright (C) 2008 Sujith Thomas <sujith.thomas@intel.com> */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include <linux/container_of.h> #include <linux/sysfs.h> #include <linux/device.h> #include <linux/err.h> #include <linux/slab.h> #include <linux/string.h> #include <linux/jiffies.h> #include "thermal_core.h" /* sys I/F for thermal zone */ static ssize_t type_show(struct device *dev, struct device_attribute *attr, char *buf) { struct thermal_zone_device *tz = to_thermal_zone(dev); return sprintf(buf, "%s\n", tz->type); } static ssize_t temp_show(struct device *dev, struct device_attribute *attr, char *buf) { struct thermal_zone_device *tz = to_thermal_zone(dev); int temperature, ret; ret = thermal_zone_get_temp(tz, &temperature); if (!ret) return sprintf(buf, "%d\n", temperature); if (ret == -EAGAIN) return -ENODATA; return ret; } static ssize_t mode_show(struct device *dev, struct device_attribute *attr, char *buf) { struct thermal_zone_device *tz = to_thermal_zone(dev); guard(thermal_zone)(tz); if (tz->mode == THERMAL_DEVICE_ENABLED) return sprintf(buf, "enabled\n"); return sprintf(buf, "disabled\n"); } static ssize_t mode_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) { struct thermal_zone_device *tz = to_thermal_zone(dev); int result; if (!strncmp(buf, "enabled", sizeof("enabled") - 1)) result = thermal_zone_device_enable(tz); else if (!strncmp(buf, "disabled", sizeof("disabled") - 1)) result = thermal_zone_device_disable(tz); else result = -EINVAL; if (result) return result; return count; } #define thermal_trip_of_attr(_ptr_, _attr_) \ ({ \ struct thermal_trip_desc *td; \ \ td = container_of(_ptr_, struct thermal_trip_desc, \ trip_attrs._attr_.attr); \ &td->trip; \ }) static ssize_t trip_point_type_show(struct device *dev, struct device_attribute *attr, char *buf) { struct thermal_trip *trip = thermal_trip_of_attr(attr, type); return sprintf(buf, "%s\n", thermal_trip_type_name(trip->type)); } static ssize_t trip_point_temp_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) { struct thermal_trip *trip = thermal_trip_of_attr(attr, temp); struct thermal_zone_device *tz = to_thermal_zone(dev); int temp; if (kstrtoint(buf, 10, &temp)) return -EINVAL; guard(thermal_zone)(tz); if (temp == trip->temperature) return count; /* Arrange the condition to avoid integer overflows. 
*/ if (temp != THERMAL_TEMP_INVALID && temp <= trip->hysteresis + THERMAL_TEMP_INVALID) return -EINVAL; if (tz->ops.set_trip_temp) { int ret; ret = tz->ops.set_trip_temp(tz, trip, temp); if (ret) return ret; } thermal_zone_set_trip_temp(tz, trip, temp); __thermal_zone_device_update(tz, THERMAL_TRIP_CHANGED); return count; } static ssize_t trip_point_temp_show(struct device *dev, struct device_attribute *attr, char *buf) { struct thermal_trip *trip = thermal_trip_of_attr(attr, temp); return sprintf(buf, "%d\n", READ_ONCE(trip->temperature)); } static ssize_t trip_point_hyst_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) { struct thermal_trip *trip = thermal_trip_of_attr(attr, hyst); struct thermal_zone_device *tz = to_thermal_zone(dev); int hyst; if (kstrtoint(buf, 10, &hyst) || hyst < 0) return -EINVAL; guard(thermal_zone)(tz); if (hyst == trip->hysteresis) return count; /* * Allow the hysteresis to be updated when the temperature is invalid * to allow user space to avoid having to adjust hysteresis after a * valid temperature has been set, but in that case just change the * value and do nothing else. */ if (trip->temperature == THERMAL_TEMP_INVALID) { WRITE_ONCE(trip->hysteresis, hyst); return count; } if (trip->temperature - hyst <= THERMAL_TEMP_INVALID) return -EINVAL; thermal_zone_set_trip_hyst(tz, trip, hyst); __thermal_zone_device_update(tz, THERMAL_TRIP_CHANGED); return count; } static ssize_t trip_point_hyst_show(struct device *dev, struct device_attribute *attr, char *buf) { struct thermal_trip *trip = thermal_trip_of_attr(attr, hyst); return sprintf(buf, "%d\n", READ_ONCE(trip->hysteresis)); } static ssize_t policy_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) { struct thermal_zone_device *tz = to_thermal_zone(dev); char name[THERMAL_NAME_LENGTH]; int ret; snprintf(name, sizeof(name), "%s", buf); ret = thermal_zone_device_set_policy(tz, name); if (!ret) ret = count; return ret; } static ssize_t policy_show(struct device *dev, struct device_attribute *devattr, char *buf) { struct thermal_zone_device *tz = to_thermal_zone(dev); return sprintf(buf, "%s\n", tz->governor->name); } static ssize_t available_policies_show(struct device *dev, struct device_attribute *devattr, char *buf) { return thermal_build_list_of_policies(buf); } #if (IS_ENABLED(CONFIG_THERMAL_EMULATION)) static ssize_t emul_temp_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) { struct thermal_zone_device *tz = to_thermal_zone(dev); int temperature; if (kstrtoint(buf, 10, &temperature)) return -EINVAL; guard(thermal_zone)(tz); if (tz->ops.set_emul_temp) { int ret; ret = tz->ops.set_emul_temp(tz, temperature); if (ret) return ret; } else { tz->emul_temperature = temperature; } __thermal_zone_device_update(tz, THERMAL_EVENT_UNSPECIFIED); return count; } static DEVICE_ATTR_WO(emul_temp); #endif static ssize_t sustainable_power_show(struct device *dev, struct device_attribute *devattr, char *buf) { struct thermal_zone_device *tz = to_thermal_zone(dev); if (tz->tzp) return sprintf(buf, "%u\n", tz->tzp->sustainable_power); else return -EIO; } static ssize_t sustainable_power_store(struct device *dev, struct device_attribute *devattr, const char *buf, size_t count) { struct thermal_zone_device *tz = to_thermal_zone(dev); u32 sustainable_power; if (!tz->tzp) return -EIO; if (kstrtou32(buf, 10, &sustainable_power)) return -EINVAL; tz->tzp->sustainable_power = sustainable_power; return count; } #define 
create_s32_tzp_attr(name) \ static ssize_t \ name##_show(struct device *dev, struct device_attribute *devattr, \ char *buf) \ { \ struct thermal_zone_device *tz = to_thermal_zone(dev); \ \ if (tz->tzp) \ return sprintf(buf, "%d\n", tz->tzp->name); \ else \ return -EIO; \ } \ \ static ssize_t \ name##_store(struct device *dev, struct device_attribute *devattr, \ const char *buf, size_t count) \ { \ struct thermal_zone_device *tz = to_thermal_zone(dev); \ s32 value; \ \ if (!tz->tzp) \ return -EIO; \ \ if (kstrtos32(buf, 10, &value)) \ return -EINVAL; \ \ tz->tzp->name = value; \ \ return count; \ } \ static DEVICE_ATTR_RW(name) create_s32_tzp_attr(k_po); create_s32_tzp_attr(k_pu); create_s32_tzp_attr(k_i); create_s32_tzp_attr(k_d); create_s32_tzp_attr(integral_cutoff); create_s32_tzp_attr(slope); create_s32_tzp_attr(offset); #undef create_s32_tzp_attr /* * These are thermal zone device attributes that will always be present. * All the attributes created for tzp (create_s32_tzp_attr) also are always * present on the sysfs interface. */ static DEVICE_ATTR_RO(type); static DEVICE_ATTR_RO(temp); static DEVICE_ATTR_RW(policy); static DEVICE_ATTR_RO(available_policies); static DEVICE_ATTR_RW(sustainable_power); /* These thermal zone device attributes are created based on conditions */ static DEVICE_ATTR_RW(mode); /* These attributes are unconditionally added to a thermal zone */ static struct attribute *thermal_zone_dev_attrs[] = { &dev_attr_type.attr, &dev_attr_temp.attr, #if (IS_ENABLED(CONFIG_THERMAL_EMULATION)) &dev_attr_emul_temp.attr, #endif &dev_attr_policy.attr, &dev_attr_available_policies.attr, &dev_attr_sustainable_power.attr, &dev_attr_k_po.attr, &dev_attr_k_pu.attr, &dev_attr_k_i.attr, &dev_attr_k_d.attr, &dev_attr_integral_cutoff.attr, &dev_attr_slope.attr, &dev_attr_offset.attr, NULL, }; static const struct attribute_group thermal_zone_attribute_group = { .attrs = thermal_zone_dev_attrs, }; static struct attribute *thermal_zone_mode_attrs[] = { &dev_attr_mode.attr, NULL, }; static const struct attribute_group thermal_zone_mode_attribute_group = { .attrs = thermal_zone_mode_attrs, }; static const struct attribute_group *thermal_zone_attribute_groups[] = { &thermal_zone_attribute_group, &thermal_zone_mode_attribute_group, /* This is not NULL terminated as we create the group dynamically */ }; /** * create_trip_attrs() - create attributes for trip points * @tz: the thermal zone device * * helper function to instantiate sysfs entries for every trip * point and its properties of a struct thermal_zone_device. * * Return: 0 on success, the proper error value otherwise. 
*/ static int create_trip_attrs(struct thermal_zone_device *tz) { struct thermal_trip_desc *td; struct attribute **attrs; int i; attrs = kcalloc(tz->num_trips * 3 + 1, sizeof(*attrs), GFP_KERNEL); if (!attrs) return -ENOMEM; i = 0; for_each_trip_desc(tz, td) { struct thermal_trip_attrs *trip_attrs = &td->trip_attrs; /* create trip type attribute */ snprintf(trip_attrs->type.name, THERMAL_NAME_LENGTH, "trip_point_%d_type", i); sysfs_attr_init(&trip_attrs->type.attr.attr); trip_attrs->type.attr.attr.name = trip_attrs->type.name; trip_attrs->type.attr.attr.mode = S_IRUGO; trip_attrs->type.attr.show = trip_point_type_show; attrs[i] = &trip_attrs->type.attr.attr; /* create trip temp attribute */ snprintf(trip_attrs->temp.name, THERMAL_NAME_LENGTH, "trip_point_%d_temp", i); sysfs_attr_init(&trip_attrs->temp.attr.attr); trip_attrs->temp.attr.attr.name = trip_attrs->temp.name; trip_attrs->temp.attr.attr.mode = S_IRUGO; trip_attrs->temp.attr.show = trip_point_temp_show; if (td->trip.flags & THERMAL_TRIP_FLAG_RW_TEMP) { trip_attrs->temp.attr.attr.mode |= S_IWUSR; trip_attrs->temp.attr.store = trip_point_temp_store; } attrs[i + tz->num_trips] = &trip_attrs->temp.attr.attr; snprintf(trip_attrs->hyst.name, THERMAL_NAME_LENGTH, "trip_point_%d_hyst", i); sysfs_attr_init(&trip_attrs->hyst.attr.attr); trip_attrs->hyst.attr.attr.name = trip_attrs->hyst.name; trip_attrs->hyst.attr.attr.mode = S_IRUGO; trip_attrs->hyst.attr.show = trip_point_hyst_show; if (td->trip.flags & THERMAL_TRIP_FLAG_RW_HYST) { trip_attrs->hyst.attr.attr.mode |= S_IWUSR; trip_attrs->hyst.attr.store = trip_point_hyst_store; } attrs[i + 2 * tz->num_trips] = &trip_attrs->hyst.attr.attr; i++; } attrs[tz->num_trips * 3] = NULL; tz->trips_attribute_group.attrs = attrs; return 0; } /** * destroy_trip_attrs() - destroy attributes for trip points * @tz: the thermal zone device * * helper function to free resources allocated by create_trip_attrs() */ static void destroy_trip_attrs(struct thermal_zone_device *tz) { if (tz) kfree(tz->trips_attribute_group.attrs); } int thermal_zone_create_device_groups(struct thermal_zone_device *tz) { const struct attribute_group **groups; int i, size, result; /* we need one extra for trips and the NULL to terminate the array */ size = ARRAY_SIZE(thermal_zone_attribute_groups) + 2; /* This also takes care of API requirement to be NULL terminated */ groups = kcalloc(size, sizeof(*groups), GFP_KERNEL); if (!groups) return -ENOMEM; for (i = 0; i < size - 2; i++) groups[i] = thermal_zone_attribute_groups[i]; if (tz->num_trips) { result = create_trip_attrs(tz); if (result) { kfree(groups); return result; } groups[size - 2] = &tz->trips_attribute_group; } tz->device.groups = groups; return 0; } void thermal_zone_destroy_device_groups(struct thermal_zone_device *tz) { if (!tz) return; if (tz->num_trips) destroy_trip_attrs(tz); kfree(tz->device.groups); } /* sys I/F for cooling device */ static ssize_t cdev_type_show(struct device *dev, struct device_attribute *attr, char *buf) { struct thermal_cooling_device *cdev = to_cooling_device(dev); return sprintf(buf, "%s\n", cdev->type); } static ssize_t max_state_show(struct device *dev, struct device_attribute *attr, char *buf) { struct thermal_cooling_device *cdev = to_cooling_device(dev); return sprintf(buf, "%ld\n", cdev->max_state); } static ssize_t cur_state_show(struct device *dev, struct device_attribute *attr, char *buf) { struct thermal_cooling_device *cdev = to_cooling_device(dev); unsigned long state; int ret; ret = cdev->ops->get_cur_state(cdev, &state); if (ret) 
return ret; return sprintf(buf, "%ld\n", state); } static ssize_t cur_state_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) { struct thermal_cooling_device *cdev = to_cooling_device(dev); unsigned long state; int result; if (sscanf(buf, "%ld\n", &state) != 1) return -EINVAL; if ((long)state < 0) return -EINVAL; /* Requested state should be less than max_state + 1 */ if (state > cdev->max_state) return -EINVAL; guard(cooling_dev)(cdev); result = cdev->ops->set_cur_state(cdev, state); if (result) return result; thermal_cooling_device_stats_update(cdev, state); return count; } static struct device_attribute dev_attr_cdev_type = __ATTR(type, 0444, cdev_type_show, NULL); static DEVICE_ATTR_RO(max_state); static DEVICE_ATTR_RW(cur_state); static struct attribute *cooling_device_attrs[] = { &dev_attr_cdev_type.attr, &dev_attr_max_state.attr, &dev_attr_cur_state.attr, NULL, }; static const struct attribute_group cooling_device_attr_group = { .attrs = cooling_device_attrs, }; static const struct attribute_group *cooling_device_attr_groups[] = { &cooling_device_attr_group, NULL, /* Space allocated for cooling_device_stats_attr_group */ NULL, }; #ifdef CONFIG_THERMAL_STATISTICS struct cooling_dev_stats { spinlock_t lock; unsigned int total_trans; unsigned long state; ktime_t last_time; ktime_t *time_in_state; unsigned int *trans_table; }; static void update_time_in_state(struct cooling_dev_stats *stats) { ktime_t now = ktime_get(), delta; delta = ktime_sub(now, stats->last_time); stats->time_in_state[stats->state] = ktime_add(stats->time_in_state[stats->state], delta); stats->last_time = now; } void thermal_cooling_device_stats_update(struct thermal_cooling_device *cdev, unsigned long new_state) { struct cooling_dev_stats *stats = cdev->stats; lockdep_assert_held(&cdev->lock); if (!stats) return; spin_lock(&stats->lock); if (stats->state == new_state) goto unlock; update_time_in_state(stats); stats->trans_table[stats->state * (cdev->max_state + 1) + new_state]++; stats->state = new_state; stats->total_trans++; unlock: spin_unlock(&stats->lock); } static ssize_t total_trans_show(struct device *dev, struct device_attribute *attr, char *buf) { struct thermal_cooling_device *cdev = to_cooling_device(dev); struct cooling_dev_stats *stats; int ret; guard(cooling_dev)(cdev); stats = cdev->stats; if (!stats) return 0; spin_lock(&stats->lock); ret = sprintf(buf, "%u\n", stats->total_trans); spin_unlock(&stats->lock); return ret; } static ssize_t time_in_state_ms_show(struct device *dev, struct device_attribute *attr, char *buf) { struct thermal_cooling_device *cdev = to_cooling_device(dev); struct cooling_dev_stats *stats; ssize_t len = 0; int i; guard(cooling_dev)(cdev); stats = cdev->stats; if (!stats) return 0; spin_lock(&stats->lock); update_time_in_state(stats); for (i = 0; i <= cdev->max_state; i++) { len += sprintf(buf + len, "state%u\t%llu\n", i, ktime_to_ms(stats->time_in_state[i])); } spin_unlock(&stats->lock); return len; } static ssize_t reset_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) { struct thermal_cooling_device *cdev = to_cooling_device(dev); struct cooling_dev_stats *stats; int i, states; guard(cooling_dev)(cdev); stats = cdev->stats; if (!stats) return count; states = cdev->max_state + 1; spin_lock(&stats->lock); stats->total_trans = 0; stats->last_time = ktime_get(); memset(stats->trans_table, 0, states * states * sizeof(*stats->trans_table)); for (i = 0; i < states; i++) stats->time_in_state[i] = 
ktime_set(0, 0); spin_unlock(&stats->lock); return count; } static ssize_t trans_table_show(struct device *dev, struct device_attribute *attr, char *buf) { struct thermal_cooling_device *cdev = to_cooling_device(dev); struct cooling_dev_stats *stats; ssize_t len = 0; int i, j; guard(cooling_dev)(cdev); stats = cdev->stats; if (!stats) return -ENODATA; len += snprintf(buf + len, PAGE_SIZE - len, " From : To\n"); len += snprintf(buf + len, PAGE_SIZE - len, " : "); for (i = 0; i <= cdev->max_state; i++) { if (len >= PAGE_SIZE) break; len += snprintf(buf + len, PAGE_SIZE - len, "state%2u ", i); } if (len >= PAGE_SIZE) return PAGE_SIZE; len += snprintf(buf + len, PAGE_SIZE - len, "\n"); for (i = 0; i <= cdev->max_state; i++) { if (len >= PAGE_SIZE) break; len += snprintf(buf + len, PAGE_SIZE - len, "state%2u:", i); for (j = 0; j <= cdev->max_state; j++) { if (len >= PAGE_SIZE) break; len += snprintf(buf + len, PAGE_SIZE - len, "%8u ", stats->trans_table[i * (cdev->max_state + 1) + j]); } if (len >= PAGE_SIZE) break; len += snprintf(buf + len, PAGE_SIZE - len, "\n"); } if (len >= PAGE_SIZE) { pr_warn_once("Thermal transition table exceeds PAGE_SIZE. Disabling\n"); len = -EFBIG; } return len; } static DEVICE_ATTR_RO(total_trans); static DEVICE_ATTR_RO(time_in_state_ms); static DEVICE_ATTR_WO(reset); static DEVICE_ATTR_RO(trans_table); static struct attribute *cooling_device_stats_attrs[] = { &dev_attr_total_trans.attr, &dev_attr_time_in_state_ms.attr, &dev_attr_reset.attr, &dev_attr_trans_table.attr, NULL }; static const struct attribute_group cooling_device_stats_attr_group = { .attrs = cooling_device_stats_attrs, .name = "stats" }; static void cooling_device_stats_setup(struct thermal_cooling_device *cdev) { const struct attribute_group *stats_attr_group = NULL; struct cooling_dev_stats *stats; /* Total number of states is highest state + 1 */ unsigned long states = cdev->max_state + 1; int var; var = sizeof(*stats); var += sizeof(*stats->time_in_state) * states; var += sizeof(*stats->trans_table) * states * states; stats = kzalloc(var, GFP_KERNEL); if (!stats) goto out; stats->time_in_state = (ktime_t *)(stats + 1); stats->trans_table = (unsigned int *)(stats->time_in_state + states); cdev->stats = stats; stats->last_time = ktime_get(); spin_lock_init(&stats->lock); stats_attr_group = &cooling_device_stats_attr_group; out: /* Fill the empty slot left in cooling_device_attr_groups */ var = ARRAY_SIZE(cooling_device_attr_groups) - 2; cooling_device_attr_groups[var] = stats_attr_group; } static void cooling_device_stats_destroy(struct thermal_cooling_device *cdev) { kfree(cdev->stats); cdev->stats = NULL; } #else static inline void cooling_device_stats_setup(struct thermal_cooling_device *cdev) {} static inline void cooling_device_stats_destroy(struct thermal_cooling_device *cdev) {} #endif /* CONFIG_THERMAL_STATISTICS */ void thermal_cooling_device_setup_sysfs(struct thermal_cooling_device *cdev) { cooling_device_stats_setup(cdev); cdev->device.groups = cooling_device_attr_groups; } void thermal_cooling_device_destroy_sysfs(struct thermal_cooling_device *cdev) { cooling_device_stats_destroy(cdev); } void thermal_cooling_device_stats_reinit(struct thermal_cooling_device *cdev) { lockdep_assert_held(&cdev->lock); cooling_device_stats_destroy(cdev); cooling_device_stats_setup(cdev); } /* these helper will be used only at the time of bindig */ ssize_t trip_point_show(struct device *dev, struct device_attribute *attr, char *buf) { struct thermal_zone_device *tz = to_thermal_zone(dev); struct 
thermal_instance *instance; instance = container_of(attr, struct thermal_instance, attr); return sprintf(buf, "%d\n", thermal_zone_trip_id(tz, instance->trip)); } ssize_t weight_show(struct device *dev, struct device_attribute *attr, char *buf) { struct thermal_instance *instance; instance = container_of(attr, struct thermal_instance, weight_attr); return sprintf(buf, "%d\n", instance->weight); } ssize_t weight_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) { struct thermal_zone_device *tz = to_thermal_zone(dev); struct thermal_instance *instance; int ret, weight; ret = kstrtoint(buf, 0, &weight); if (ret) return ret; instance = container_of(attr, struct thermal_instance, weight_attr); /* Don't race with governors using the 'weight' value */ guard(thermal_zone)(tz); instance->weight = weight; thermal_governor_update_tz(tz, THERMAL_INSTANCE_WEIGHT_CHANGED); return count; } |
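/*
 * A minimal standalone sketch (not kernel code) of the bookkeeping done by
 * thermal_cooling_device_stats_update() above: transitions between cooling
 * states are counted in a (max_state + 1) x (max_state + 1) matrix stored as
 * a flat array and indexed as old_state * (max_state + 1) + new_state. The
 * names here are made up for illustration only.
 */
#include <stdio.h>
#include <stdlib.h>

struct sketch_stats {
	unsigned long max_state;
	unsigned long state;       /* current cooling state */
	unsigned int *trans_table; /* (max_state + 1)^2 entries */
};

static void sketch_record_transition(struct sketch_stats *s, unsigned long new_state)
{
	if (s->state == new_state)
		return;
	s->trans_table[s->state * (s->max_state + 1) + new_state]++;
	s->state = new_state;
}

int main(void)
{
	struct sketch_stats s = { .max_state = 3, .state = 0 };
	unsigned long states = s.max_state + 1;

	s.trans_table = calloc(states * states, sizeof(*s.trans_table));
	if (!s.trans_table)
		return 1;

	sketch_record_transition(&s, 2);
	sketch_record_transition(&s, 1);

	printf("0 -> 2 transitions: %u\n", s.trans_table[0 * states + 2]);
	printf("2 -> 1 transitions: %u\n", s.trans_table[2 * states + 1]);
	free(s.trans_table);
	return 0;
}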
// SPDX-License-Identifier: GPL-2.0-only
/* IP tables module for matching the value of the IPv4/IPv6 DSCP field
 *
 * (C) 2002 by Harald Welte <laforge@netfilter.org>
 */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <linux/module.h>
#include <linux/skbuff.h>
#include <linux/ip.h>
#include <linux/ipv6.h>
#include <net/dsfield.h>
#include <linux/netfilter/x_tables.h>
#include <linux/netfilter/xt_dscp.h>

MODULE_AUTHOR("Harald Welte <laforge@netfilter.org>");
MODULE_DESCRIPTION("Xtables: DSCP/TOS field match");
MODULE_LICENSE("GPL");
MODULE_ALIAS("ipt_dscp");
MODULE_ALIAS("ip6t_dscp");
MODULE_ALIAS("ipt_tos");
MODULE_ALIAS("ip6t_tos");

static bool dscp_mt(const struct sk_buff *skb, struct xt_action_param *par)
{
	const struct xt_dscp_info *info = par->matchinfo;
	u_int8_t dscp = ipv4_get_dsfield(ip_hdr(skb)) >> XT_DSCP_SHIFT;

	return (dscp == info->dscp) ^ !!info->invert;
}

static bool dscp_mt6(const struct sk_buff *skb, struct xt_action_param *par)
{
	const struct xt_dscp_info *info = par->matchinfo;
	u_int8_t dscp = ipv6_get_dsfield(ipv6_hdr(skb)) >> XT_DSCP_SHIFT;

	return (dscp == info->dscp) ^ !!info->invert;
}

static int dscp_mt_check(const struct xt_mtchk_param *par)
{
	const struct xt_dscp_info *info = par->matchinfo;

	if (info->dscp > XT_DSCP_MAX)
		return -EDOM;

	return 0;
}

static bool tos_mt(const struct sk_buff *skb, struct xt_action_param *par)
{
	const struct xt_tos_match_info *info = par->matchinfo;

	if (xt_family(par) == NFPROTO_IPV4)
		return ((ip_hdr(skb)->tos & info->tos_mask) ==
			info->tos_value) ^ !!info->invert;
	else
		return ((ipv6_get_dsfield(ipv6_hdr(skb)) & info->tos_mask) ==
			info->tos_value) ^ !!info->invert;
}

static struct xt_match dscp_mt_reg[] __read_mostly = {
	{
		.name = "dscp",
		.family = NFPROTO_IPV4,
		.checkentry = dscp_mt_check,
		.match = dscp_mt,
		.matchsize = sizeof(struct xt_dscp_info),
		.me = THIS_MODULE,
	},
	{
		.name = "dscp",
		.family = NFPROTO_IPV6,
		.checkentry = dscp_mt_check,
		.match = dscp_mt6,
		.matchsize = sizeof(struct xt_dscp_info),
		.me = THIS_MODULE,
	},
	{
		.name = "tos",
		.revision = 1,
		.family = NFPROTO_IPV4,
		.match = tos_mt,
		.matchsize = sizeof(struct xt_tos_match_info),
		.me = THIS_MODULE,
	},
	{
		.name = "tos",
		.revision = 1,
		.family = NFPROTO_IPV6,
		.match = tos_mt,
		.matchsize = sizeof(struct xt_tos_match_info),
		.me = THIS_MODULE,
	},
};

static int __init dscp_mt_init(void)
{
	return xt_register_matches(dscp_mt_reg, ARRAY_SIZE(dscp_mt_reg));
}

static void __exit dscp_mt_exit(void)
{
	xt_unregister_matches(dscp_mt_reg, ARRAY_SIZE(dscp_mt_reg));
}

module_init(dscp_mt_init);
module_exit(dscp_mt_exit);
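/*
 * A minimal standalone sketch (not kernel code) of the match logic above: the
 * DSCP value is the upper six bits of the IPv4 TOS / IPv6 traffic-class byte,
 * and the "invert" flag flips the result via XOR. The shift constant below
 * mirrors the uapi xt_dscp.h definition as an assumption for illustration;
 * all names here are made up.
 */
#include <stdbool.h>
#include <stdio.h>

#define SKETCH_DSCP_SHIFT 2 /* DS field: DSCP in bits 7..2, ECN in bits 1..0 */

static bool sketch_dscp_match(unsigned char dsfield, unsigned char wanted_dscp,
			      bool invert)
{
	unsigned char dscp = dsfield >> SKETCH_DSCP_SHIFT;

	return (dscp == wanted_dscp) ^ invert;
}

int main(void)
{
	/* DS field 0xb8 is DSCP 46 (EF) with the ECN bits clear. */
	printf("match EF: %d\n", sketch_dscp_match(0xb8, 46, false));
	printf("inverted: %d\n", sketch_dscp_match(0xb8, 46, true));
	return 0;
}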
// SPDX-License-Identifier: GPL-2.0-or-later
/*
 * PF_INET6 socket protocol family
 * Linux INET6 implementation
 *
 * Authors:
 * Pedro Roque <roque@di.fc.ul.pt>
 *
 * Adapted from linux/net/ipv4/af_inet.c
 *
 * Fixes:
 * piggy, Karl Knutson : Socket protocol table
 * Hideaki YOSHIFUJI : sin6_scope_id support
 * Arnaldo Melo : check proc_net_create return, cleanups
 */

#define pr_fmt(fmt) "IPv6: " fmt

#include <linux/module.h>
#include <linux/capability.h>
#include <linux/errno.h>
#include <linux/types.h>
#include <linux/socket.h>
#include <linux/in.h>
#include <linux/kernel.h>
#include <linux/timer.h>
#include <linux/string.h>
#include <linux/sockios.h>
#include <linux/net.h>
#include <linux/fcntl.h>
#include <linux/mm.h>
#include <linux/interrupt.h>
#include <linux/proc_fs.h>
#include <linux/stat.h>
#include <linux/init.h>
#include <linux/slab.h>
#include <linux/inet.h>
#include <linux/netdevice.h>
#include <linux/icmpv6.h>
#include <linux/netfilter_ipv6.h>
#include <net/ip.h>
#include <net/ipv6.h>
#include <net/udp.h>
#include <net/udplite.h>
#include <net/tcp.h>
#include <net/ping.h>
#include <net/protocol.h>
#include <net/inet_common.h>
#include <net/route.h>
#include <net/transp_v6.h>
#include <net/ip6_route.h> #include <net/addrconf.h> #include <net/ipv6_stubs.h> #include <net/ndisc.h> #ifdef CONFIG_IPV6_TUNNEL #include <net/ip6_tunnel.h> #endif #include <net/calipso.h> #include <net/seg6.h> #include <net/rpl.h> #include <net/compat.h> #include <net/xfrm.h> #include <net/ioam6.h> #include <net/rawv6.h> #include <net/rps.h> #include <linux/uaccess.h> #include <linux/mroute6.h> #include "ip6_offload.h" MODULE_AUTHOR("Cast of dozens"); MODULE_DESCRIPTION("IPv6 protocol stack for Linux"); MODULE_LICENSE("GPL"); /* The inetsw6 table contains everything that inet6_create needs to * build a new socket. */ static struct list_head inetsw6[SOCK_MAX]; static DEFINE_SPINLOCK(inetsw6_lock); struct ipv6_params ipv6_defaults = { .disable_ipv6 = 0, .autoconf = 1, }; static int disable_ipv6_mod; module_param_named(disable, disable_ipv6_mod, int, 0444); MODULE_PARM_DESC(disable, "Disable IPv6 module such that it is non-functional"); module_param_named(disable_ipv6, ipv6_defaults.disable_ipv6, int, 0444); MODULE_PARM_DESC(disable_ipv6, "Disable IPv6 on all interfaces"); module_param_named(autoconf, ipv6_defaults.autoconf, int, 0444); MODULE_PARM_DESC(autoconf, "Enable IPv6 address autoconfiguration on all interfaces"); bool ipv6_mod_enabled(void) { return disable_ipv6_mod == 0; } EXPORT_SYMBOL_GPL(ipv6_mod_enabled); static struct ipv6_pinfo *inet6_sk_generic(struct sock *sk) { const int offset = sk->sk_prot->ipv6_pinfo_offset; return (struct ipv6_pinfo *)(((u8 *)sk) + offset); } void inet6_sock_destruct(struct sock *sk) { inet6_cleanup_sock(sk); inet_sock_destruct(sk); } EXPORT_SYMBOL_GPL(inet6_sock_destruct); static int inet6_create(struct net *net, struct socket *sock, int protocol, int kern) { struct inet_sock *inet; struct ipv6_pinfo *np; struct sock *sk; struct inet_protosw *answer; struct proto *answer_prot; unsigned char answer_flags; int try_loading_module = 0; int err; if (protocol < 0 || protocol >= IPPROTO_MAX) return -EINVAL; /* Look for the requested type/protocol pair. */ lookup_protocol: err = -ESOCKTNOSUPPORT; rcu_read_lock(); list_for_each_entry_rcu(answer, &inetsw6[sock->type], list) { err = 0; /* Check the non-wild match. */ if (protocol == answer->protocol) { if (protocol != IPPROTO_IP) break; } else { /* Check for the two wild cases. */ if (IPPROTO_IP == protocol) { protocol = answer->protocol; break; } if (IPPROTO_IP == answer->protocol) break; } err = -EPROTONOSUPPORT; } if (err) { if (try_loading_module < 2) { rcu_read_unlock(); /* * Be more specific, e.g. net-pf-10-proto-132-type-1 * (net-pf-PF_INET6-proto-IPPROTO_SCTP-type-SOCK_STREAM) */ if (++try_loading_module == 1) request_module("net-pf-%d-proto-%d-type-%d", PF_INET6, protocol, sock->type); /* * Fall back to generic, e.g. 
net-pf-10-proto-132 * (net-pf-PF_INET6-proto-IPPROTO_SCTP) */ else request_module("net-pf-%d-proto-%d", PF_INET6, protocol); goto lookup_protocol; } else goto out_rcu_unlock; } err = -EPERM; if (sock->type == SOCK_RAW && !kern && !ns_capable(net->user_ns, CAP_NET_RAW)) goto out_rcu_unlock; sock->ops = answer->ops; answer_prot = answer->prot; answer_flags = answer->flags; rcu_read_unlock(); WARN_ON(!answer_prot->slab); err = -ENOBUFS; sk = sk_alloc(net, PF_INET6, GFP_KERNEL, answer_prot, kern); if (!sk) goto out; sock_init_data(sock, sk); err = 0; if (INET_PROTOSW_REUSE & answer_flags) sk->sk_reuse = SK_CAN_REUSE; if (INET_PROTOSW_ICSK & answer_flags) inet_init_csk_locks(sk); inet = inet_sk(sk); inet_assign_bit(IS_ICSK, sk, INET_PROTOSW_ICSK & answer_flags); if (SOCK_RAW == sock->type) { inet->inet_num = protocol; if (IPPROTO_RAW == protocol) inet_set_bit(HDRINCL, sk); } sk->sk_destruct = inet6_sock_destruct; sk->sk_family = PF_INET6; sk->sk_protocol = protocol; sk->sk_backlog_rcv = answer->prot->backlog_rcv; inet_sk(sk)->pinet6 = np = inet6_sk_generic(sk); np->hop_limit = -1; np->mcast_hops = IPV6_DEFAULT_MCASTHOPS; inet6_set_bit(MC6_LOOP, sk); inet6_set_bit(MC6_ALL, sk); np->pmtudisc = IPV6_PMTUDISC_WANT; inet6_assign_bit(REPFLOW, sk, net->ipv6.sysctl.flowlabel_reflect & FLOWLABEL_REFLECT_ESTABLISHED); sk->sk_ipv6only = net->ipv6.sysctl.bindv6only; sk->sk_txrehash = READ_ONCE(net->core.sysctl_txrehash); /* Init the ipv4 part of the socket since we can have sockets * using v6 API for ipv4. */ inet->uc_ttl = -1; inet_set_bit(MC_LOOP, sk); inet->mc_ttl = 1; inet->mc_index = 0; RCU_INIT_POINTER(inet->mc_list, NULL); inet->rcv_tos = 0; if (READ_ONCE(net->ipv4.sysctl_ip_no_pmtu_disc)) inet->pmtudisc = IP_PMTUDISC_DONT; else inet->pmtudisc = IP_PMTUDISC_WANT; if (inet->inet_num) { /* It assumes that any protocol which allows * the user to assign a number at socket * creation time automatically shares. */ inet->inet_sport = htons(inet->inet_num); err = sk->sk_prot->hash(sk); if (err) goto out_sk_release; } if (sk->sk_prot->init) { err = sk->sk_prot->init(sk); if (err) goto out_sk_release; } if (!kern) { err = BPF_CGROUP_RUN_PROG_INET_SOCK(sk); if (err) goto out_sk_release; } out: return err; out_rcu_unlock: rcu_read_unlock(); goto out; out_sk_release: sk_common_release(sk); sock->sk = NULL; goto out; } static int __inet6_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len, u32 flags) { struct sockaddr_in6 *addr = (struct sockaddr_in6 *)uaddr; struct inet_sock *inet = inet_sk(sk); struct ipv6_pinfo *np = inet6_sk(sk); struct net *net = sock_net(sk); __be32 v4addr = 0; unsigned short snum; bool saved_ipv6only; int addr_type = 0; int err = 0; if (addr->sin6_family != AF_INET6) return -EAFNOSUPPORT; addr_type = ipv6_addr_type(&addr->sin6_addr); if ((addr_type & IPV6_ADDR_MULTICAST) && sk->sk_type == SOCK_STREAM) return -EINVAL; snum = ntohs(addr->sin6_port); if (!(flags & BIND_NO_CAP_NET_BIND_SERVICE) && snum && inet_port_requires_bind_service(net, snum) && !ns_capable(net->user_ns, CAP_NET_BIND_SERVICE)) return -EACCES; if (flags & BIND_WITH_LOCK) lock_sock(sk); /* Check these errors (active socket, double bind). */ if (sk->sk_state != TCP_CLOSE || inet->inet_num) { err = -EINVAL; goto out; } /* Check if the address belongs to the host. 
*/ if (addr_type == IPV6_ADDR_MAPPED) { struct net_device *dev = NULL; int chk_addr_ret; /* Binding to v4-mapped address on a v6-only socket * makes no sense */ if (ipv6_only_sock(sk)) { err = -EINVAL; goto out; } rcu_read_lock(); if (sk->sk_bound_dev_if) { dev = dev_get_by_index_rcu(net, sk->sk_bound_dev_if); if (!dev) { err = -ENODEV; goto out_unlock; } } /* Reproduce AF_INET checks to make the bindings consistent */ v4addr = addr->sin6_addr.s6_addr32[3]; chk_addr_ret = inet_addr_type_dev_table(net, dev, v4addr); rcu_read_unlock(); if (!inet_addr_valid_or_nonlocal(net, inet, v4addr, chk_addr_ret)) { err = -EADDRNOTAVAIL; goto out; } } else { if (addr_type != IPV6_ADDR_ANY) { struct net_device *dev = NULL; rcu_read_lock(); if (__ipv6_addr_needs_scope_id(addr_type)) { if (addr_len >= sizeof(struct sockaddr_in6) && addr->sin6_scope_id) { /* Override any existing binding, if another one * is supplied by user. */ sk->sk_bound_dev_if = addr->sin6_scope_id; } /* Binding to link-local address requires an interface */ if (!sk->sk_bound_dev_if) { err = -EINVAL; goto out_unlock; } } if (sk->sk_bound_dev_if) { dev = dev_get_by_index_rcu(net, sk->sk_bound_dev_if); if (!dev) { err = -ENODEV; goto out_unlock; } } /* ipv4 addr of the socket is invalid. Only the * unspecified and mapped address have a v4 equivalent. */ v4addr = LOOPBACK4_IPV6; if (!(addr_type & IPV6_ADDR_MULTICAST)) { if (!ipv6_can_nonlocal_bind(net, inet) && !ipv6_chk_addr(net, &addr->sin6_addr, dev, 0)) { err = -EADDRNOTAVAIL; goto out_unlock; } } rcu_read_unlock(); } } inet->inet_rcv_saddr = v4addr; inet->inet_saddr = v4addr; sk->sk_v6_rcv_saddr = addr->sin6_addr; if (!(addr_type & IPV6_ADDR_MULTICAST)) np->saddr = addr->sin6_addr; saved_ipv6only = sk->sk_ipv6only; if (addr_type != IPV6_ADDR_ANY && addr_type != IPV6_ADDR_MAPPED) sk->sk_ipv6only = 1; /* Make sure we are allowed to bind here. */ if (snum || !(inet_test_bit(BIND_ADDRESS_NO_PORT, sk) || (flags & BIND_FORCE_ADDRESS_NO_PORT))) { err = sk->sk_prot->get_port(sk, snum); if (err) { sk->sk_ipv6only = saved_ipv6only; inet_reset_saddr(sk); goto out; } if (!(flags & BIND_FROM_BPF)) { err = BPF_CGROUP_RUN_PROG_INET6_POST_BIND(sk); if (err) { sk->sk_ipv6only = saved_ipv6only; inet_reset_saddr(sk); if (sk->sk_prot->put_port) sk->sk_prot->put_port(sk); goto out; } } } if (addr_type != IPV6_ADDR_ANY) sk->sk_userlocks |= SOCK_BINDADDR_LOCK; if (snum) sk->sk_userlocks |= SOCK_BINDPORT_LOCK; inet->inet_sport = htons(inet->inet_num); inet->inet_dport = 0; inet->inet_daddr = 0; out: if (flags & BIND_WITH_LOCK) release_sock(sk); return err; out_unlock: rcu_read_unlock(); goto out; } int inet6_bind_sk(struct sock *sk, struct sockaddr *uaddr, int addr_len) { u32 flags = BIND_WITH_LOCK; const struct proto *prot; int err = 0; /* IPV6_ADDRFORM can change sk->sk_prot under us. */ prot = READ_ONCE(sk->sk_prot); /* If the socket has its own bind function then use it. */ if (prot->bind) return prot->bind(sk, uaddr, addr_len); if (addr_len < SIN6_LEN_RFC2133) return -EINVAL; /* BPF prog is run before any checks are done so that if the prog * changes context in a wrong way it will be caught. 
*/ err = BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr, &addr_len, CGROUP_INET6_BIND, &flags); if (err) return err; return __inet6_bind(sk, uaddr, addr_len, flags); } /* bind for INET6 API */ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) { return inet6_bind_sk(sock->sk, uaddr, addr_len); } EXPORT_SYMBOL(inet6_bind); int inet6_release(struct socket *sock) { struct sock *sk = sock->sk; if (!sk) return -EINVAL; /* Free mc lists */ ipv6_sock_mc_close(sk); /* Free ac lists */ ipv6_sock_ac_close(sk); return inet_release(sock); } EXPORT_SYMBOL(inet6_release); void inet6_cleanup_sock(struct sock *sk) { struct ipv6_pinfo *np = inet6_sk(sk); struct sk_buff *skb; struct ipv6_txoptions *opt; /* Release rx options */ skb = xchg(&np->pktoptions, NULL); kfree_skb(skb); skb = xchg(&np->rxpmtu, NULL); kfree_skb(skb); /* Free flowlabels */ fl6_free_socklist(sk); /* Free tx options */ opt = unrcu_pointer(xchg(&np->opt, NULL)); if (opt) { atomic_sub(opt->tot_len, &sk->sk_omem_alloc); txopt_put(opt); } } EXPORT_SYMBOL_GPL(inet6_cleanup_sock); /* * This does both peername and sockname. */ int inet6_getname(struct socket *sock, struct sockaddr *uaddr, int peer) { struct sockaddr_in6 *sin = (struct sockaddr_in6 *)uaddr; int sin_addr_len = sizeof(*sin); struct sock *sk = sock->sk; struct inet_sock *inet = inet_sk(sk); struct ipv6_pinfo *np = inet6_sk(sk); sin->sin6_family = AF_INET6; sin->sin6_flowinfo = 0; sin->sin6_scope_id = 0; lock_sock(sk); if (peer) { if (!inet->inet_dport || (((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_SYN_SENT)) && peer == 1)) { release_sock(sk); return -ENOTCONN; } sin->sin6_port = inet->inet_dport; sin->sin6_addr = sk->sk_v6_daddr; if (inet6_test_bit(SNDFLOW, sk)) sin->sin6_flowinfo = np->flow_label; BPF_CGROUP_RUN_SA_PROG(sk, (struct sockaddr *)sin, &sin_addr_len, CGROUP_INET6_GETPEERNAME); } else { if (ipv6_addr_any(&sk->sk_v6_rcv_saddr)) sin->sin6_addr = np->saddr; else sin->sin6_addr = sk->sk_v6_rcv_saddr; sin->sin6_port = inet->inet_sport; BPF_CGROUP_RUN_SA_PROG(sk, (struct sockaddr *)sin, &sin_addr_len, CGROUP_INET6_GETSOCKNAME); } sin->sin6_scope_id = ipv6_iface_scope_id(&sin->sin6_addr, sk->sk_bound_dev_if); release_sock(sk); return sin_addr_len; } EXPORT_SYMBOL(inet6_getname); int inet6_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) { void __user *argp = (void __user *)arg; struct sock *sk = sock->sk; struct net *net = sock_net(sk); const struct proto *prot; switch (cmd) { case SIOCADDRT: case SIOCDELRT: { struct in6_rtmsg rtmsg; if (copy_from_user(&rtmsg, argp, sizeof(rtmsg))) return -EFAULT; return ipv6_route_ioctl(net, cmd, &rtmsg); } case SIOCSIFADDR: return addrconf_add_ifaddr(net, argp); case SIOCDIFADDR: return addrconf_del_ifaddr(net, argp); case SIOCSIFDSTADDR: return addrconf_set_dstaddr(net, argp); default: /* IPV6_ADDRFORM can change sk->sk_prot under us. 
*/ prot = READ_ONCE(sk->sk_prot); if (!prot->ioctl) return -ENOIOCTLCMD; return sk_ioctl(sk, cmd, (void __user *)arg); } /*NOTREACHED*/ return 0; } EXPORT_SYMBOL(inet6_ioctl); #ifdef CONFIG_COMPAT struct compat_in6_rtmsg { struct in6_addr rtmsg_dst; struct in6_addr rtmsg_src; struct in6_addr rtmsg_gateway; u32 rtmsg_type; u16 rtmsg_dst_len; u16 rtmsg_src_len; u32 rtmsg_metric; u32 rtmsg_info; u32 rtmsg_flags; s32 rtmsg_ifindex; }; static int inet6_compat_routing_ioctl(struct sock *sk, unsigned int cmd, struct compat_in6_rtmsg __user *ur) { struct in6_rtmsg rt; if (copy_from_user(&rt.rtmsg_dst, &ur->rtmsg_dst, 3 * sizeof(struct in6_addr)) || get_user(rt.rtmsg_type, &ur->rtmsg_type) || get_user(rt.rtmsg_dst_len, &ur->rtmsg_dst_len) || get_user(rt.rtmsg_src_len, &ur->rtmsg_src_len) || get_user(rt.rtmsg_metric, &ur->rtmsg_metric) || get_user(rt.rtmsg_info, &ur->rtmsg_info) || get_user(rt.rtmsg_flags, &ur->rtmsg_flags) || get_user(rt.rtmsg_ifindex, &ur->rtmsg_ifindex)) return -EFAULT; return ipv6_route_ioctl(sock_net(sk), cmd, &rt); } int inet6_compat_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) { void __user *argp = compat_ptr(arg); struct sock *sk = sock->sk; switch (cmd) { case SIOCADDRT: case SIOCDELRT: return inet6_compat_routing_ioctl(sk, cmd, argp); default: return -ENOIOCTLCMD; } } EXPORT_SYMBOL_GPL(inet6_compat_ioctl); #endif /* CONFIG_COMPAT */ INDIRECT_CALLABLE_DECLARE(int udpv6_sendmsg(struct sock *, struct msghdr *, size_t)); int inet6_sendmsg(struct socket *sock, struct msghdr *msg, size_t size) { struct sock *sk = sock->sk; const struct proto *prot; if (unlikely(inet_send_prepare(sk))) return -EAGAIN; /* IPV6_ADDRFORM can change sk->sk_prot under us. */ prot = READ_ONCE(sk->sk_prot); return INDIRECT_CALL_2(prot->sendmsg, tcp_sendmsg, udpv6_sendmsg, sk, msg, size); } INDIRECT_CALLABLE_DECLARE(int udpv6_recvmsg(struct sock *, struct msghdr *, size_t, int, int *)); int inet6_recvmsg(struct socket *sock, struct msghdr *msg, size_t size, int flags) { struct sock *sk = sock->sk; const struct proto *prot; int addr_len = 0; int err; if (likely(!(flags & MSG_ERRQUEUE))) sock_rps_record_flow(sk); /* IPV6_ADDRFORM can change sk->sk_prot under us. 
*/ prot = READ_ONCE(sk->sk_prot); err = INDIRECT_CALL_2(prot->recvmsg, tcp_recvmsg, udpv6_recvmsg, sk, msg, size, flags, &addr_len); if (err >= 0) msg->msg_namelen = addr_len; return err; } const struct proto_ops inet6_stream_ops = { .family = PF_INET6, .owner = THIS_MODULE, .release = inet6_release, .bind = inet6_bind, .connect = inet_stream_connect, /* ok */ .socketpair = sock_no_socketpair, /* a do nothing */ .accept = inet_accept, /* ok */ .getname = inet6_getname, .poll = tcp_poll, /* ok */ .ioctl = inet6_ioctl, /* must change */ .gettstamp = sock_gettstamp, .listen = inet_listen, /* ok */ .shutdown = inet_shutdown, /* ok */ .setsockopt = sock_common_setsockopt, /* ok */ .getsockopt = sock_common_getsockopt, /* ok */ .sendmsg = inet6_sendmsg, /* retpoline's sake */ .recvmsg = inet6_recvmsg, /* retpoline's sake */ #ifdef CONFIG_MMU .mmap = tcp_mmap, #endif .splice_eof = inet_splice_eof, .sendmsg_locked = tcp_sendmsg_locked, .splice_read = tcp_splice_read, .set_peek_off = sk_set_peek_off, .read_sock = tcp_read_sock, .read_skb = tcp_read_skb, .peek_len = tcp_peek_len, #ifdef CONFIG_COMPAT .compat_ioctl = inet6_compat_ioctl, #endif .set_rcvlowat = tcp_set_rcvlowat, }; EXPORT_SYMBOL_GPL(inet6_stream_ops); const struct proto_ops inet6_dgram_ops = { .family = PF_INET6, .owner = THIS_MODULE, .release = inet6_release, .bind = inet6_bind, .connect = inet_dgram_connect, /* ok */ .socketpair = sock_no_socketpair, /* a do nothing */ .accept = sock_no_accept, /* a do nothing */ .getname = inet6_getname, .poll = udp_poll, /* ok */ .ioctl = inet6_ioctl, /* must change */ .gettstamp = sock_gettstamp, .listen = sock_no_listen, /* ok */ .shutdown = inet_shutdown, /* ok */ .setsockopt = sock_common_setsockopt, /* ok */ .getsockopt = sock_common_getsockopt, /* ok */ .sendmsg = inet6_sendmsg, /* retpoline's sake */ .recvmsg = inet6_recvmsg, /* retpoline's sake */ .read_skb = udp_read_skb, .mmap = sock_no_mmap, .set_peek_off = udp_set_peek_off, #ifdef CONFIG_COMPAT .compat_ioctl = inet6_compat_ioctl, #endif }; static const struct net_proto_family inet6_family_ops = { .family = PF_INET6, .create = inet6_create, .owner = THIS_MODULE, }; int inet6_register_protosw(struct inet_protosw *p) { struct list_head *lh; struct inet_protosw *answer; struct list_head *last_perm; int protocol = p->protocol; int ret; spin_lock_bh(&inetsw6_lock); ret = -EINVAL; if (p->type >= SOCK_MAX) goto out_illegal; /* If we are trying to override a permanent protocol, bail. */ answer = NULL; ret = -EPERM; last_perm = &inetsw6[p->type]; list_for_each(lh, &inetsw6[p->type]) { answer = list_entry(lh, struct inet_protosw, list); /* Check only the non-wild match. */ if (INET_PROTOSW_PERMANENT & answer->flags) { if (protocol == answer->protocol) break; last_perm = lh; } answer = NULL; } if (answer) goto out_permanent; /* Add the new entry after the last permanent entry if any, so that * the new entry does not override a permanent entry when matched with * a wild-card protocol. But it is allowed to override any existing * non-permanent entry. This means that when we remove this entry, the * system automatically returns to the old behavior. 
*/ list_add_rcu(&p->list, last_perm); ret = 0; out: spin_unlock_bh(&inetsw6_lock); return ret; out_permanent: pr_err("Attempt to override permanent protocol %d\n", protocol); goto out; out_illegal: pr_err("Ignoring attempt to register invalid socket type %d\n", p->type); goto out; } EXPORT_SYMBOL(inet6_register_protosw); void inet6_unregister_protosw(struct inet_protosw *p) { if (INET_PROTOSW_PERMANENT & p->flags) { pr_err("Attempt to unregister permanent protocol %d\n", p->protocol); } else { spin_lock_bh(&inetsw6_lock); list_del_rcu(&p->list); spin_unlock_bh(&inetsw6_lock); synchronize_net(); } } EXPORT_SYMBOL(inet6_unregister_protosw); int inet6_sk_rebuild_header(struct sock *sk) { struct ipv6_pinfo *np = inet6_sk(sk); struct dst_entry *dst; dst = __sk_dst_check(sk, np->dst_cookie); if (!dst) { struct inet_sock *inet = inet_sk(sk); struct in6_addr *final_p, final; struct flowi6 fl6; memset(&fl6, 0, sizeof(fl6)); fl6.flowi6_proto = sk->sk_protocol; fl6.daddr = sk->sk_v6_daddr; fl6.saddr = np->saddr; fl6.flowlabel = np->flow_label; fl6.flowi6_oif = sk->sk_bound_dev_if; fl6.flowi6_mark = sk->sk_mark; fl6.fl6_dport = inet->inet_dport; fl6.fl6_sport = inet->inet_sport; fl6.flowi6_uid = sk_uid(sk); security_sk_classify_flow(sk, flowi6_to_flowi_common(&fl6)); rcu_read_lock(); final_p = fl6_update_dst(&fl6, rcu_dereference(np->opt), &final); rcu_read_unlock(); dst = ip6_dst_lookup_flow(sock_net(sk), sk, &fl6, final_p); if (IS_ERR(dst)) { sk->sk_route_caps = 0; WRITE_ONCE(sk->sk_err_soft, -PTR_ERR(dst)); return PTR_ERR(dst); } ip6_dst_store(sk, dst, false, false); } return 0; } EXPORT_SYMBOL_GPL(inet6_sk_rebuild_header); bool ipv6_opt_accepted(const struct sock *sk, const struct sk_buff *skb, const struct inet6_skb_parm *opt) { const struct ipv6_pinfo *np = inet6_sk(sk); if (np->rxopt.all) { if (((opt->flags & IP6SKB_HOPBYHOP) && (np->rxopt.bits.hopopts || np->rxopt.bits.ohopopts)) || (ip6_flowinfo((struct ipv6hdr *) skb_network_header(skb)) && np->rxopt.bits.rxflow) || (opt->srcrt && (np->rxopt.bits.srcrt || np->rxopt.bits.osrcrt)) || ((opt->dst1 || opt->dst0) && (np->rxopt.bits.dstopts || np->rxopt.bits.odstopts))) return true; } return false; } static struct packet_type ipv6_packet_type __read_mostly = { .type = cpu_to_be16(ETH_P_IPV6), .func = ipv6_rcv, .list_func = ipv6_list_rcv, }; static int __init ipv6_packet_init(void) { dev_add_pack(&ipv6_packet_type); return 0; } static void ipv6_packet_cleanup(void) { dev_remove_pack(&ipv6_packet_type); } static int __net_init ipv6_init_mibs(struct net *net) { int i; net->mib.udp_stats_in6 = alloc_percpu(struct udp_mib); if (!net->mib.udp_stats_in6) return -ENOMEM; net->mib.udplite_stats_in6 = alloc_percpu(struct udp_mib); if (!net->mib.udplite_stats_in6) goto err_udplite_mib; net->mib.ipv6_statistics = alloc_percpu(struct ipstats_mib); if (!net->mib.ipv6_statistics) goto err_ip_mib; for_each_possible_cpu(i) { struct ipstats_mib *af_inet6_stats; af_inet6_stats = per_cpu_ptr(net->mib.ipv6_statistics, i); u64_stats_init(&af_inet6_stats->syncp); } net->mib.icmpv6_statistics = alloc_percpu(struct icmpv6_mib); if (!net->mib.icmpv6_statistics) goto err_icmp_mib; net->mib.icmpv6msg_statistics = kzalloc(sizeof(struct icmpv6msg_mib), GFP_KERNEL); if (!net->mib.icmpv6msg_statistics) goto err_icmpmsg_mib; return 0; err_icmpmsg_mib: free_percpu(net->mib.icmpv6_statistics); err_icmp_mib: free_percpu(net->mib.ipv6_statistics); err_ip_mib: free_percpu(net->mib.udplite_stats_in6); err_udplite_mib: free_percpu(net->mib.udp_stats_in6); return -ENOMEM; } static void 
ipv6_cleanup_mibs(struct net *net) { free_percpu(net->mib.udp_stats_in6); free_percpu(net->mib.udplite_stats_in6); free_percpu(net->mib.ipv6_statistics); free_percpu(net->mib.icmpv6_statistics); kfree(net->mib.icmpv6msg_statistics); } static int __net_init inet6_net_init(struct net *net) { int err = 0; net->ipv6.sysctl.bindv6only = 0; net->ipv6.sysctl.icmpv6_time = 1*HZ; net->ipv6.sysctl.icmpv6_echo_ignore_all = 0; net->ipv6.sysctl.icmpv6_echo_ignore_multicast = 0; net->ipv6.sysctl.icmpv6_echo_ignore_anycast = 0; net->ipv6.sysctl.icmpv6_error_anycast_as_unicast = 0; /* By default, rate limit error messages. * Except for pmtu discovery, it would break it. * proc_do_large_bitmap needs pointer to the bitmap. */ bitmap_set(net->ipv6.sysctl.icmpv6_ratemask, 0, ICMPV6_ERRMSG_MAX + 1); bitmap_clear(net->ipv6.sysctl.icmpv6_ratemask, ICMPV6_PKT_TOOBIG, 1); net->ipv6.sysctl.icmpv6_ratemask_ptr = net->ipv6.sysctl.icmpv6_ratemask; net->ipv6.sysctl.flowlabel_consistency = 1; net->ipv6.sysctl.auto_flowlabels = IP6_DEFAULT_AUTO_FLOW_LABELS; net->ipv6.sysctl.idgen_retries = 3; net->ipv6.sysctl.idgen_delay = 1 * HZ; net->ipv6.sysctl.flowlabel_state_ranges = 0; net->ipv6.sysctl.max_dst_opts_cnt = IP6_DEFAULT_MAX_DST_OPTS_CNT; net->ipv6.sysctl.max_hbh_opts_cnt = IP6_DEFAULT_MAX_HBH_OPTS_CNT; net->ipv6.sysctl.max_dst_opts_len = IP6_DEFAULT_MAX_DST_OPTS_LEN; net->ipv6.sysctl.max_hbh_opts_len = IP6_DEFAULT_MAX_HBH_OPTS_LEN; net->ipv6.sysctl.fib_notify_on_flag_change = 0; atomic_set(&net->ipv6.fib6_sernum, 1); net->ipv6.sysctl.ioam6_id = IOAM6_DEFAULT_ID; net->ipv6.sysctl.ioam6_id_wide = IOAM6_DEFAULT_ID_WIDE; err = ipv6_init_mibs(net); if (err) return err; #ifdef CONFIG_PROC_FS err = udp6_proc_init(net); if (err) goto out; err = tcp6_proc_init(net); if (err) goto proc_tcp6_fail; err = ac6_proc_init(net); if (err) goto proc_ac6_fail; #endif return err; #ifdef CONFIG_PROC_FS proc_ac6_fail: tcp6_proc_exit(net); proc_tcp6_fail: udp6_proc_exit(net); out: ipv6_cleanup_mibs(net); return err; #endif } static void __net_exit inet6_net_exit(struct net *net) { #ifdef CONFIG_PROC_FS udp6_proc_exit(net); tcp6_proc_exit(net); ac6_proc_exit(net); #endif ipv6_cleanup_mibs(net); } static struct pernet_operations inet6_net_ops = { .init = inet6_net_init, .exit = inet6_net_exit, }; static int ipv6_route_input(struct sk_buff *skb) { ip6_route_input(skb); return skb_dst(skb)->error; } static const struct ipv6_stub ipv6_stub_impl = { .ipv6_sock_mc_join = ipv6_sock_mc_join, .ipv6_sock_mc_drop = ipv6_sock_mc_drop, .ipv6_dst_lookup_flow = ip6_dst_lookup_flow, .ipv6_route_input = ipv6_route_input, .fib6_get_table = fib6_get_table, .fib6_table_lookup = fib6_table_lookup, .fib6_lookup = fib6_lookup, .fib6_select_path = fib6_select_path, .ip6_mtu_from_fib6 = ip6_mtu_from_fib6, .fib6_nh_init = fib6_nh_init, .fib6_nh_release = fib6_nh_release, .fib6_nh_release_dsts = fib6_nh_release_dsts, .fib6_update_sernum = fib6_update_sernum_stub, .fib6_rt_update = fib6_rt_update, .ip6_del_rt = ip6_del_rt, .udpv6_encap_enable = udpv6_encap_enable, .ndisc_send_na = ndisc_send_na, #if IS_ENABLED(CONFIG_XFRM) .xfrm6_local_rxpmtu = xfrm6_local_rxpmtu, .xfrm6_udp_encap_rcv = xfrm6_udp_encap_rcv, .xfrm6_gro_udp_encap_rcv = xfrm6_gro_udp_encap_rcv, .xfrm6_rcv_encap = xfrm6_rcv_encap, #endif .nd_tbl = &nd_tbl, .ipv6_fragment = ip6_fragment, .ipv6_dev_find = ipv6_dev_find, .ip6_xmit = ip6_xmit, }; static const struct ipv6_bpf_stub ipv6_bpf_stub_impl = { .inet6_bind = __inet6_bind, .udp6_lib_lookup = __udp6_lib_lookup, .ipv6_setsockopt = do_ipv6_setsockopt, 
.ipv6_getsockopt = do_ipv6_getsockopt, .ipv6_dev_get_saddr = ipv6_dev_get_saddr, }; static int __init inet6_init(void) { struct list_head *r; int err = 0; sock_skb_cb_check_size(sizeof(struct inet6_skb_parm)); /* Register the socket-side information for inet6_create. */ for (r = &inetsw6[0]; r < &inetsw6[SOCK_MAX]; ++r) INIT_LIST_HEAD(r); raw_hashinfo_init(&raw_v6_hashinfo); if (disable_ipv6_mod) { pr_info("Loaded, but administratively disabled, reboot required to enable\n"); goto out; } err = proto_register(&tcpv6_prot, 1); if (err) goto out; err = proto_register(&udpv6_prot, 1); if (err) goto out_unregister_tcp_proto; err = proto_register(&udplitev6_prot, 1); if (err) goto out_unregister_udp_proto; err = proto_register(&rawv6_prot, 1); if (err) goto out_unregister_udplite_proto; err = proto_register(&pingv6_prot, 1); if (err) goto out_unregister_raw_proto; /* We MUST register RAW sockets before we create the ICMP6, * IGMP6, or NDISC control sockets. */ err = rawv6_init(); if (err) goto out_unregister_ping_proto; /* Register the family here so that the init calls below will * be able to create sockets. (?? is this dangerous ??) */ err = sock_register(&inet6_family_ops); if (err) goto out_sock_register_fail; /* * ipngwg API draft makes clear that the correct semantics * for TCP and UDP is to consider one TCP and UDP instance * in a host available by both INET and INET6 APIs and * able to communicate via both network protocols. */ err = register_pernet_subsys(&inet6_net_ops); if (err) goto register_pernet_fail; err = ip6_mr_init(); if (err) goto ipmr_fail; err = icmpv6_init(); if (err) goto icmp_fail; err = ndisc_init(); if (err) goto ndisc_fail; err = igmp6_init(); if (err) goto igmp_fail; err = ipv6_netfilter_init(); if (err) goto netfilter_fail; /* Create /proc/foo6 entries. */ #ifdef CONFIG_PROC_FS err = -ENOMEM; if (raw6_proc_init()) goto proc_raw6_fail; if (udplite6_proc_init()) goto proc_udplite6_fail; if (ipv6_misc_proc_init()) goto proc_misc6_fail; if (if6_proc_init()) goto proc_if6_fail; #endif err = ip6_route_init(); if (err) goto ip6_route_fail; err = ndisc_late_init(); if (err) goto ndisc_late_fail; err = ip6_flowlabel_init(); if (err) goto ip6_flowlabel_fail; err = ipv6_anycast_init(); if (err) goto ipv6_anycast_fail; err = addrconf_init(); if (err) goto addrconf_fail; /* Init v6 extension headers. */ err = ipv6_exthdrs_init(); if (err) goto ipv6_exthdrs_fail; err = ipv6_frag_init(); if (err) goto ipv6_frag_fail; /* Init v6 transport protocols. 
*/ err = udpv6_init(); if (err) goto udpv6_fail; err = udplitev6_init(); if (err) goto udplitev6_fail; err = udpv6_offload_init(); if (err) goto udpv6_offload_fail; err = tcpv6_init(); if (err) goto tcpv6_fail; err = ipv6_packet_init(); if (err) goto ipv6_packet_fail; err = pingv6_init(); if (err) goto pingv6_fail; err = calipso_init(); if (err) goto calipso_fail; err = seg6_init(); if (err) goto seg6_fail; err = rpl_init(); if (err) goto rpl_fail; err = ioam6_init(); if (err) goto ioam6_fail; err = igmp6_late_init(); if (err) goto igmp6_late_err; #ifdef CONFIG_SYSCTL err = ipv6_sysctl_register(); if (err) goto sysctl_fail; #endif /* ensure that ipv6 stubs are visible only after ipv6 is ready */ wmb(); ipv6_stub = &ipv6_stub_impl; ipv6_bpf_stub = &ipv6_bpf_stub_impl; out: return err; #ifdef CONFIG_SYSCTL sysctl_fail: igmp6_late_cleanup(); #endif igmp6_late_err: ioam6_exit(); ioam6_fail: rpl_exit(); rpl_fail: seg6_exit(); seg6_fail: calipso_exit(); calipso_fail: pingv6_exit(); pingv6_fail: ipv6_packet_cleanup(); ipv6_packet_fail: tcpv6_exit(); tcpv6_fail: udpv6_offload_exit(); udpv6_offload_fail: udplitev6_exit(); udplitev6_fail: udpv6_exit(); udpv6_fail: ipv6_frag_exit(); ipv6_frag_fail: ipv6_exthdrs_exit(); ipv6_exthdrs_fail: addrconf_cleanup(); addrconf_fail: ipv6_anycast_cleanup(); ipv6_anycast_fail: ip6_flowlabel_cleanup(); ip6_flowlabel_fail: ndisc_late_cleanup(); ndisc_late_fail: ip6_route_cleanup(); ip6_route_fail: #ifdef CONFIG_PROC_FS if6_proc_exit(); proc_if6_fail: ipv6_misc_proc_exit(); proc_misc6_fail: udplite6_proc_exit(); proc_udplite6_fail: raw6_proc_exit(); proc_raw6_fail: #endif ipv6_netfilter_fini(); netfilter_fail: igmp6_cleanup(); igmp_fail: ndisc_cleanup(); ndisc_fail: icmpv6_cleanup(); icmp_fail: ip6_mr_cleanup(); ipmr_fail: unregister_pernet_subsys(&inet6_net_ops); register_pernet_fail: sock_unregister(PF_INET6); rtnl_unregister_all(PF_INET6); out_sock_register_fail: rawv6_exit(); out_unregister_ping_proto: proto_unregister(&pingv6_prot); out_unregister_raw_proto: proto_unregister(&rawv6_prot); out_unregister_udplite_proto: proto_unregister(&udplitev6_prot); out_unregister_udp_proto: proto_unregister(&udpv6_prot); out_unregister_tcp_proto: proto_unregister(&tcpv6_prot); goto out; } module_init(inet6_init); MODULE_ALIAS_NETPROTO(PF_INET6); |
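/*
 * Illustrative userspace sketch (not part of af_inet6.c): exercises the
 * inet6_create() and __inet6_bind() paths above through the standard
 * socket API. The interface name "eth0", the port, and the link-local
 * address are placeholder assumptions, not values from the kernel code.
 */
#include <arpa/inet.h>
#include <net/if.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	struct sockaddr_in6 addr;
	int one = 1;
	int fd;

	/* AF_INET6 + SOCK_STREAM is matched against inetsw6[] by inet6_create(). */
	fd = socket(AF_INET6, SOCK_STREAM, 0);
	if (fd < 0) {
		perror("socket");
		return 1;
	}

	/* Sets sk->sk_ipv6only, which __inet6_bind() checks for v4-mapped binds. */
	setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &one, sizeof(one));

	memset(&addr, 0, sizeof(addr));
	addr.sin6_family = AF_INET6;
	addr.sin6_port = htons(8080);
	inet_pton(AF_INET6, "fe80::1", &addr.sin6_addr);
	/*
	 * A link-local bind needs a scope id; __inet6_bind() copies it into
	 * sk->sk_bound_dev_if and returns -EINVAL when it is missing.
	 */
	addr.sin6_scope_id = if_nametoindex("eth0");

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		perror("bind");

	close(fd);
	return 0;
}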
/* SPDX-License-Identifier: GPL-2.0 */ #undef TRACE_SYSTEM #define TRACE_SYSTEM maple_tree #if !defined(_TRACE_MM_H) || defined(TRACE_HEADER_MULTI_READ) #define _TRACE_MM_H #include <linux/tracepoint.h> struct ma_state; TRACE_EVENT(ma_op, TP_PROTO(const char *fn, struct ma_state *mas), TP_ARGS(fn, mas), TP_STRUCT__entry( __field(const char *, fn) __field(unsigned long, min) __field(unsigned long, max) __field(unsigned long, index) __field(unsigned long, last) __field(void *, node) ), TP_fast_assign( __entry->fn = fn; __entry->min = mas->min; __entry->max = mas->max; __entry->index = mas->index; __entry->last = mas->last; __entry->node = mas->node; ), TP_printk("%s\tNode: %p (%lu %lu) range: %lu-%lu", __entry->fn, (void *) __entry->node, (unsigned long) __entry->min, (unsigned long) __entry->max, (unsigned long) __entry->index, (unsigned long) __entry->last ) ) TRACE_EVENT(ma_read, TP_PROTO(const char *fn, struct ma_state *mas), TP_ARGS(fn, mas), TP_STRUCT__entry( __field(const char *, fn) __field(unsigned long, min) __field(unsigned long, max) __field(unsigned long, index) __field(unsigned long, last) __field(void *, node) ), TP_fast_assign( __entry->fn = fn; __entry->min = mas->min; __entry->max = mas->max; __entry->index = mas->index; __entry->last = mas->last; __entry->node = mas->node; ), TP_printk("%s\tNode: %p (%lu %lu) range: %lu-%lu", __entry->fn, (void *) __entry->node, (unsigned long) __entry->min, (unsigned long) __entry->max, (unsigned long) __entry->index, (unsigned long) __entry->last ) ) TRACE_EVENT(ma_write, TP_PROTO(const char *fn, struct ma_state *mas, unsigned long piv, void *val), TP_ARGS(fn, mas, piv, val), TP_STRUCT__entry( __field(const char *, fn) __field(unsigned long, min) __field(unsigned long, max) __field(unsigned long, index) __field(unsigned long, last) __field(unsigned long, piv) __field(void *, val) __field(void *, node) ), TP_fast_assign( __entry->fn = fn; __entry->min = mas->min; __entry->max = mas->max; __entry->index = mas->index; __entry->last = mas->last; __entry->piv = piv; __entry->val = val; __entry->node = mas->node; ), TP_printk("%s\tNode %p (%lu %lu) range:%lu-%lu piv (%lu) val %p", __entry->fn, (void *) __entry->node, (unsigned long) __entry->min, (unsigned long) __entry->max, (unsigned long) __entry->index, (unsigned long) __entry->last, (unsigned long) __entry->piv, (void *) __entry->val ) ) #endif /* _TRACE_MM_H */ /* This part must be outside protection */ #include <trace/define_trace.h>
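/*
 * Illustrative sketch (not taken from the kernel tree): how the
 * TRACE_EVENT() definitions above are instantiated and emitted. Exactly
 * one compilation unit defines CREATE_TRACE_POINTS before including the
 * header (assuming it is installed at the usual <trace/events/> path);
 * callers then use the generated trace_<name>() helpers. The function
 * below is hypothetical.
 */
#define CREATE_TRACE_POINTS
#include <trace/events/maple_tree.h>

static void example_store(struct ma_state *mas, unsigned long piv, void *val)
{
	/* Compiles to a static-key-guarded no-op unless the event is enabled. */
	trace_ma_write(__func__, mas, piv, val);
}

/*
 * With TRACE_SYSTEM set to "maple_tree", the events can then be enabled
 * at runtime from tracefs, e.g. via
 * /sys/kernel/tracing/events/maple_tree/ma_write/enable.
 */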
/* SPDX-License-Identifier: GPL-2.0-only * * Virtual memory framebuffer access for drawing routines * * Copyright (C) 2025 Zsolt Kajtar (soci@c64.rulez.org) */ /* keeps track of a bit address in framebuffer memory */ struct fb_address { void *address; int bits; }; /* initialize the bit address pointer to the beginning of the frame buffer */ static inline struct fb_address fb_address_init(struct fb_info *p) { void *base = p->screen_buffer; struct fb_address ptr; ptr.address = PTR_ALIGN_DOWN(base, BITS_PER_LONG / BITS_PER_BYTE); ptr.bits = (base - ptr.address) * BITS_PER_BYTE; return ptr; } /* framebuffer write access */ static inline void fb_write_offset(unsigned long val, int offset, const struct fb_address *dst) { unsigned long *mem = dst->address; mem[offset] = val; } /* framebuffer read access */ static inline unsigned long fb_read_offset(int offset, const struct fb_address *src) { unsigned long *mem = src->address; return mem[offset]; }
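/*
 * Illustrative sketch (not part of the header above): how a drawing
 * routine might combine the helpers. It assumes <linux/fb.h> and the
 * header above are included and, for simplicity, that screen_buffer is
 * already long-aligned (dst.bits == 0); real code masks the partial
 * first word using the bits field. The names below are invented.
 */
static void example_fill_words(struct fb_info *info, unsigned long pattern,
			       unsigned long mask, int words)
{
	struct fb_address dst = fb_address_init(info);
	unsigned long old;
	int i;

	/* Whole-word stores for the body of the fill. */
	for (i = 1; i < words; i++)
		fb_write_offset(pattern, i, &dst);

	/* Read-modify-write of word 0, as a masked drawing op would do. */
	old = fb_read_offset(0, &dst);
	fb_write_offset((old & ~mask) | (pattern & mask), 0, &dst);
}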
// SPDX-License-Identifier: GPL-2.0 /* * OF NUMA Parsing support. * * Copyright (C) 2015 - 2016 Cavium Inc. */ #define pr_fmt(fmt) "OF: NUMA: " fmt #include <linux/of.h> #include <linux/of_address.h> #include <linux/nodemask.h> #include <linux/numa_memblks.h> #include <asm/numa.h> /* * Even though we connect cpus to numa domains later in SMP * init, we need to know the node ids now for all cpus. */ static void __init of_numa_parse_cpu_nodes(void) { u32 nid; int r; struct device_node *np; for_each_of_cpu_node(np) { r = of_property_read_u32(np, "numa-node-id", &nid); if (r) continue; pr_debug("CPU on %u\n", nid); if (nid >= MAX_NUMNODES) pr_warn("Node id %u exceeds maximum value\n", nid); else node_set(nid, numa_nodes_parsed); } } static int __init of_numa_parse_memory_nodes(void) { struct device_node *np = NULL; struct resource rsrc; u32 nid; int i, r = -EINVAL; for_each_node_by_type(np, "memory") { r = of_property_read_u32(np, "numa-node-id", &nid); if (r == -EINVAL) /* * property doesn't exist if -EINVAL, continue * looking for more memory nodes with * "numa-node-id" property */ continue; if (nid >= MAX_NUMNODES) { pr_warn("Node id %u exceeds maximum value\n", nid); r = -EINVAL; } for (i = 0; !r && !of_address_to_resource(np, i, &rsrc); i++) { r = numa_add_memblk(nid, rsrc.start, rsrc.end + 1); if (!r) node_set(nid, numa_nodes_parsed); } if (!i || r) { of_node_put(np); pr_err("bad property in memory node\n"); return r ? 
: -EINVAL; } } return r; } static int __init of_numa_parse_distance_map_v1(struct device_node *map) { const __be32 *matrix; int entry_count; int i; pr_info("parsing numa-distance-map-v1\n"); matrix = of_get_property(map, "distance-matrix", NULL); if (!matrix) { pr_err("No distance-matrix property in distance-map\n"); return -EINVAL; } entry_count = of_property_count_u32_elems(map, "distance-matrix"); if (entry_count <= 0) { pr_err("Invalid distance-matrix\n"); return -EINVAL; } for (i = 0; i + 2 < entry_count; i += 3) { u32 nodea, nodeb, distance; nodea = of_read_number(matrix, 1); matrix++; nodeb = of_read_number(matrix, 1); matrix++; distance = of_read_number(matrix, 1); matrix++; if ((nodea == nodeb && distance != LOCAL_DISTANCE) || (nodea != nodeb && distance <= LOCAL_DISTANCE)) { pr_err("Invalid distance[node%d -> node%d] = %d\n", nodea, nodeb, distance); return -EINVAL; } node_set(nodea, numa_nodes_parsed); numa_set_distance(nodea, nodeb, distance); /* Set default distance of node B->A same as A->B */ if (nodeb > nodea) numa_set_distance(nodeb, nodea, distance); } return 0; } static int __init of_numa_parse_distance_map(void) { int ret = 0; struct device_node *np; np = of_find_compatible_node(NULL, NULL, "numa-distance-map-v1"); if (np) ret = of_numa_parse_distance_map_v1(np); of_node_put(np); return ret; } int of_node_to_nid(struct device_node *device) { struct device_node *np; u32 nid; int r = -ENODATA; np = of_node_get(device); while (np) { r = of_property_read_u32(np, "numa-node-id", &nid); /* * -EINVAL indicates the property was not found, and * we walk up the tree trying to find a parent with a * "numa-node-id". Any other type of error indicates * a bad device tree and we give up. */ if (r != -EINVAL) break; np = of_get_next_parent(np); } if (np && r) pr_warn("Invalid \"numa-node-id\" property in node %pOFn\n", np); of_node_put(np); /* * If numa=off passed on command line, or with a defective * device tree, the nid may not be in the set of possible * nodes. Check for this case and return NUMA_NO_NODE. */ if (!r && nid < MAX_NUMNODES && node_possible(nid)) return nid; return NUMA_NO_NODE; } int __init of_numa_init(void) { int r; of_numa_parse_cpu_nodes(); r = of_numa_parse_memory_nodes(); if (r) return r; return of_numa_parse_distance_map(); } |
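/*
 * Illustrative sketch (not part of of_numa.c): a hypothetical platform
 * driver using of_node_to_nid() so that its allocation lands on the
 * NUMA node that the device tree's "numa-node-id" property assigns to
 * the device. The function name and buffer size are invented.
 */
#include <linux/of.h>
#include <linux/platform_device.h>
#include <linux/slab.h>

static int example_probe(struct platform_device *pdev)
{
	int nid = of_node_to_nid(pdev->dev.of_node);
	void *buf;

	/*
	 * of_node_to_nid() returns NUMA_NO_NODE when no node id is found,
	 * which kzalloc_node() treats as "no preference".
	 */
	buf = kzalloc_node(4096, GFP_KERNEL, nid);
	if (!buf)
		return -ENOMEM;

	platform_set_drvdata(pdev, buf);
	return 0;
}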
// SPDX-License-Identifier: GPL-2.0-only /* * Kernel-based Virtual Machine driver for Linux * cpuid support routines * * derived from arch/x86/kvm/x86.c * * Copyright 2011 Red Hat, Inc. and/or its affiliates. 
* Copyright IBM Corporation, 2008 */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include <linux/kvm_host.h> #include "linux/lockdep.h" #include <linux/export.h> #include <linux/vmalloc.h> #include <linux/uaccess.h> #include <linux/sched/stat.h> #include <asm/processor.h> #include <asm/user.h> #include <asm/fpu/xstate.h> #include <asm/sgx.h> #include <asm/cpuid/api.h> #include "cpuid.h" #include "lapic.h" #include "mmu.h" #include "trace.h" #include "pmu.h" #include "xen.h" /* * Unlike "struct cpuinfo_x86.x86_capability", kvm_cpu_caps doesn't need to be * aligned to sizeof(unsigned long) because it's not accessed via bitops. */ u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly; EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_cpu_caps); struct cpuid_xstate_sizes { u32 eax; u32 ebx; u32 ecx; }; static struct cpuid_xstate_sizes xstate_sizes[XFEATURE_MAX] __ro_after_init; void __init kvm_init_xstate_sizes(void) { u32 ign; int i; for (i = XFEATURE_YMM; i < ARRAY_SIZE(xstate_sizes); i++) { struct cpuid_xstate_sizes *xs = &xstate_sizes[i]; cpuid_count(0xD, i, &xs->eax, &xs->ebx, &xs->ecx, &ign); } } u32 xstate_required_size(u64 xstate_bv, bool compacted) { u32 ret = XSAVE_HDR_SIZE + XSAVE_HDR_OFFSET; int i; xstate_bv &= XFEATURE_MASK_EXTEND; for (i = XFEATURE_YMM; i < ARRAY_SIZE(xstate_sizes) && xstate_bv; i++) { struct cpuid_xstate_sizes *xs = &xstate_sizes[i]; u32 offset; if (!(xstate_bv & BIT_ULL(i))) continue; /* ECX[1]: 64B alignment in compacted form */ if (compacted) offset = (xs->ecx & 0x2) ? ALIGN(ret, 64) : ret; else offset = xs->ebx; ret = max(ret, offset + xs->eax); xstate_bv &= ~BIT_ULL(i); } return ret; } struct kvm_cpuid_entry2 *kvm_find_cpuid_entry2( struct kvm_cpuid_entry2 *entries, int nent, u32 function, u64 index) { struct kvm_cpuid_entry2 *e; int i; /* * KVM has a semi-arbitrary rule that querying the guest's CPUID model * with IRQs disabled is disallowed. The CPUID model can legitimately * have over one hundred entries, i.e. the lookup is slow, and IRQs are * typically disabled in KVM only when KVM is in a performance critical * path, e.g. the core VM-Enter/VM-Exit run loop. Nothing will break * if this rule is violated, this assertion is purely to flag potential * performance issues. If this fires, consider moving the lookup out * of the hotpath, e.g. by caching information during CPUID updates. */ lockdep_assert_irqs_enabled(); for (i = 0; i < nent; i++) { e = &entries[i]; if (e->function != function) continue; /* * If the index isn't significant, use the first entry with a * matching function. It's userspace's responsibility to not * provide "duplicate" entries in all cases. */ if (!(e->flags & KVM_CPUID_FLAG_SIGNIFCANT_INDEX) || e->index == index) return e; /* * Similarly, use the first matching entry if KVM is doing a * lookup (as opposed to emulating CPUID) for a function that's * architecturally defined as not having a significant index. */ if (index == KVM_CPUID_INDEX_NOT_SIGNIFICANT) { /* * Direct lookups from KVM should not diverge from what * KVM defines internally (the architectural behavior). */ WARN_ON_ONCE(cpuid_function_is_indexed(function)); return e; } } return NULL; } EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_find_cpuid_entry2); static int kvm_check_cpuid(struct kvm_vcpu *vcpu) { struct kvm_cpuid_entry2 *best; u64 xfeatures; /* * The existing code assumes virtual address is 48-bit or 57-bit in the * canonical address checks; exit if it is ever changed. 
*/ best = kvm_find_cpuid_entry(vcpu, 0x80000008); if (best) { int vaddr_bits = (best->eax & 0xff00) >> 8; if (vaddr_bits != 48 && vaddr_bits != 57 && vaddr_bits != 0) return -EINVAL; } /* * Exposing dynamic xfeatures to the guest requires additional * enabling in the FPU, e.g. to expand the guest XSAVE state size. */ best = kvm_find_cpuid_entry_index(vcpu, 0xd, 0); if (!best) return 0; xfeatures = best->eax | ((u64)best->edx << 32); xfeatures &= XFEATURE_MASK_USER_DYNAMIC; if (!xfeatures) return 0; return fpu_enable_guest_xfd_features(&vcpu->arch.guest_fpu, xfeatures); } static u32 kvm_apply_cpuid_pv_features_quirk(struct kvm_vcpu *vcpu); static void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu); /* Check whether the supplied CPUID data is equal to what is already set for the vCPU. */ static int kvm_cpuid_check_equal(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2, int nent) { struct kvm_cpuid_entry2 *orig; int i; /* * Apply runtime CPUID updates to the incoming CPUID entries to avoid * false positives due mismatches on KVM-owned feature flags. * * Note! @e2 and @nent track the _old_ CPUID entries! */ kvm_update_cpuid_runtime(vcpu); kvm_apply_cpuid_pv_features_quirk(vcpu); if (nent != vcpu->arch.cpuid_nent) return -EINVAL; for (i = 0; i < nent; i++) { orig = &vcpu->arch.cpuid_entries[i]; if (e2[i].function != orig->function || e2[i].index != orig->index || e2[i].flags != orig->flags || e2[i].eax != orig->eax || e2[i].ebx != orig->ebx || e2[i].ecx != orig->ecx || e2[i].edx != orig->edx) return -EINVAL; } return 0; } static struct kvm_hypervisor_cpuid kvm_get_hypervisor_cpuid(struct kvm_vcpu *vcpu, const char *sig) { struct kvm_hypervisor_cpuid cpuid = {}; struct kvm_cpuid_entry2 *entry; u32 base; for_each_possible_cpuid_base_hypervisor(base) { entry = kvm_find_cpuid_entry(vcpu, base); if (entry) { u32 signature[3]; signature[0] = entry->ebx; signature[1] = entry->ecx; signature[2] = entry->edx; if (!memcmp(signature, sig, sizeof(signature))) { cpuid.base = base; cpuid.limit = entry->eax; break; } } } return cpuid; } static u32 kvm_apply_cpuid_pv_features_quirk(struct kvm_vcpu *vcpu) { struct kvm_hypervisor_cpuid kvm_cpuid; struct kvm_cpuid_entry2 *best; kvm_cpuid = kvm_get_hypervisor_cpuid(vcpu, KVM_SIGNATURE); if (!kvm_cpuid.base) return 0; best = kvm_find_cpuid_entry(vcpu, kvm_cpuid.base | KVM_CPUID_FEATURES); if (!best) return 0; if (kvm_hlt_in_guest(vcpu->kvm)) best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT); return best->eax; } /* * Calculate guest's supported XCR0 taking into account guest CPUID data and * KVM's supported XCR0 (comprised of host's XCR0 and KVM_SUPPORTED_XCR0). 
*/ static u64 cpuid_get_supported_xcr0(struct kvm_vcpu *vcpu) { struct kvm_cpuid_entry2 *best; best = kvm_find_cpuid_entry_index(vcpu, 0xd, 0); if (!best) return 0; return (best->eax | ((u64)best->edx << 32)) & kvm_caps.supported_xcr0; } static u64 cpuid_get_supported_xss(struct kvm_vcpu *vcpu) { struct kvm_cpuid_entry2 *best; best = kvm_find_cpuid_entry_index(vcpu, 0xd, 1); if (!best) return 0; return (best->ecx | ((u64)best->edx << 32)) & kvm_caps.supported_xss; } static __always_inline void kvm_update_feature_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *entry, unsigned int x86_feature, bool has_feature) { cpuid_entry_change(entry, x86_feature, has_feature); guest_cpu_cap_change(vcpu, x86_feature, has_feature); } static void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu) { struct kvm_cpuid_entry2 *best; vcpu->arch.cpuid_dynamic_bits_dirty = false; best = kvm_find_cpuid_entry(vcpu, 1); if (best) { kvm_update_feature_runtime(vcpu, best, X86_FEATURE_OSXSAVE, kvm_is_cr4_bit_set(vcpu, X86_CR4_OSXSAVE)); kvm_update_feature_runtime(vcpu, best, X86_FEATURE_APIC, vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE); if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)) kvm_update_feature_runtime(vcpu, best, X86_FEATURE_MWAIT, vcpu->arch.ia32_misc_enable_msr & MSR_IA32_MISC_ENABLE_MWAIT); } best = kvm_find_cpuid_entry_index(vcpu, 7, 0); if (best) kvm_update_feature_runtime(vcpu, best, X86_FEATURE_OSPKE, kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE)); best = kvm_find_cpuid_entry_index(vcpu, 0xD, 0); if (best) best->ebx = xstate_required_size(vcpu->arch.xcr0, false); best = kvm_find_cpuid_entry_index(vcpu, 0xD, 1); if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) || cpuid_entry_has(best, X86_FEATURE_XSAVEC))) best->ebx = xstate_required_size(vcpu->arch.xcr0 | vcpu->arch.ia32_xss, true); } static bool kvm_cpuid_has_hyperv(struct kvm_vcpu *vcpu) { #ifdef CONFIG_KVM_HYPERV struct kvm_cpuid_entry2 *entry; entry = kvm_find_cpuid_entry(vcpu, HYPERV_CPUID_INTERFACE); return entry && entry->eax == HYPERV_CPUID_SIGNATURE_EAX; #else return false; #endif } static bool guest_cpuid_is_amd_or_hygon(struct kvm_vcpu *vcpu) { struct kvm_cpuid_entry2 *entry; entry = kvm_find_cpuid_entry(vcpu, 0); if (!entry) return false; return is_guest_vendor_amd(entry->ebx, entry->ecx, entry->edx) || is_guest_vendor_hygon(entry->ebx, entry->ecx, entry->edx); } /* * This isn't truly "unsafe", but except for the cpu_caps initialization code, * all register lookups should use __cpuid_entry_get_reg(), which provides * compile-time validation of the input. */ static u32 cpuid_get_reg_unsafe(struct kvm_cpuid_entry2 *entry, u32 reg) { switch (reg) { case CPUID_EAX: return entry->eax; case CPUID_EBX: return entry->ebx; case CPUID_ECX: return entry->ecx; case CPUID_EDX: return entry->edx; default: WARN_ON_ONCE(1); return 0; } } static int cpuid_func_emulated(struct kvm_cpuid_entry2 *entry, u32 func, bool include_partially_emulated); void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) { struct kvm_lapic *apic = vcpu->arch.apic; struct kvm_cpuid_entry2 *best; struct kvm_cpuid_entry2 *entry; bool allow_gbpages; int i; memset(vcpu->arch.cpu_caps, 0, sizeof(vcpu->arch.cpu_caps)); BUILD_BUG_ON(ARRAY_SIZE(reverse_cpuid) != NR_KVM_CPU_CAPS); /* * Reset guest capabilities to userspace's guest CPUID definition, i.e. * honor userspace's definition for features that don't require KVM or * hardware management/support (or that KVM simply doesn't care about). 
*/ for (i = 0; i < NR_KVM_CPU_CAPS; i++) { const struct cpuid_reg cpuid = reverse_cpuid[i]; struct kvm_cpuid_entry2 emulated; if (!cpuid.function) continue; entry = kvm_find_cpuid_entry_index(vcpu, cpuid.function, cpuid.index); if (!entry) continue; cpuid_func_emulated(&emulated, cpuid.function, true); /* * A vCPU has a feature if it's supported by KVM and is enabled * in guest CPUID. Note, this includes features that are * supported by KVM but aren't advertised to userspace! */ vcpu->arch.cpu_caps[i] = kvm_cpu_caps[i] | cpuid_get_reg_unsafe(&emulated, cpuid.reg); vcpu->arch.cpu_caps[i] &= cpuid_get_reg_unsafe(entry, cpuid.reg); } kvm_update_cpuid_runtime(vcpu); /* * If TDP is enabled, let the guest use GBPAGES if they're supported in * hardware. The hardware page walker doesn't let KVM disable GBPAGES, * i.e. won't treat them as reserved, and KVM doesn't redo the GVA->GPA * walk for performance and complexity reasons. Not to mention KVM * _can't_ solve the problem because GVA->GPA walks aren't visible to * KVM once a TDP translation is installed. Mimic hardware behavior so * that KVM's is at least consistent, i.e. doesn't randomly inject #PF. * If TDP is disabled, honor *only* guest CPUID as KVM has full control * and can install smaller shadow pages if the host lacks 1GiB support. */ allow_gbpages = tdp_enabled ? boot_cpu_has(X86_FEATURE_GBPAGES) : guest_cpu_cap_has(vcpu, X86_FEATURE_GBPAGES); guest_cpu_cap_change(vcpu, X86_FEATURE_GBPAGES, allow_gbpages); best = kvm_find_cpuid_entry(vcpu, 1); if (best && apic) { if (cpuid_entry_has(best, X86_FEATURE_TSC_DEADLINE_TIMER)) apic->lapic_timer.timer_mode_mask = 3 << 17; else apic->lapic_timer.timer_mode_mask = 1 << 17; kvm_apic_set_version(vcpu); } vcpu->arch.guest_supported_xcr0 = cpuid_get_supported_xcr0(vcpu); vcpu->arch.guest_supported_xss = cpuid_get_supported_xss(vcpu); vcpu->arch.pv_cpuid.features = kvm_apply_cpuid_pv_features_quirk(vcpu); vcpu->arch.is_amd_compatible = guest_cpuid_is_amd_or_hygon(vcpu); vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu); vcpu->arch.reserved_gpa_bits = kvm_vcpu_reserved_gpa_bits_raw(vcpu); kvm_pmu_refresh(vcpu); #define __kvm_cpu_cap_has(UNUSED_, f) kvm_cpu_cap_has(f) vcpu->arch.cr4_guest_rsvd_bits = __cr4_reserved_bits(__kvm_cpu_cap_has, UNUSED_) | __cr4_reserved_bits(guest_cpu_cap_has, vcpu); #undef __kvm_cpu_cap_has kvm_hv_set_cpuid(vcpu, kvm_cpuid_has_hyperv(vcpu)); /* Invoke the vendor callback only after the above state is updated. */ kvm_x86_call(vcpu_after_set_cpuid)(vcpu); /* * Except for the MMU, which needs to do its thing any vendor specific * adjustments to the reserved GPA bits. */ kvm_mmu_after_set_cpuid(vcpu); kvm_make_request(KVM_REQ_RECALC_INTERCEPTS, vcpu); } int cpuid_query_maxphyaddr(struct kvm_vcpu *vcpu) { struct kvm_cpuid_entry2 *best; best = kvm_find_cpuid_entry(vcpu, 0x80000000); if (!best || best->eax < 0x80000008) goto not_found; best = kvm_find_cpuid_entry(vcpu, 0x80000008); if (best) return best->eax & 0xff; not_found: return 36; } int cpuid_query_maxguestphyaddr(struct kvm_vcpu *vcpu) { struct kvm_cpuid_entry2 *best; best = kvm_find_cpuid_entry(vcpu, 0x80000000); if (!best || best->eax < 0x80000008) goto not_found; best = kvm_find_cpuid_entry(vcpu, 0x80000008); if (best) return (best->eax >> 16) & 0xff; not_found: return 0; } /* * This "raw" version returns the reserved GPA bits without any adjustments for * encryption technologies that usurp bits. The raw mask should be used if and * only if hardware does _not_ strip the usurped bits, e.g. in virtual MTRRs. 
*/ u64 kvm_vcpu_reserved_gpa_bits_raw(struct kvm_vcpu *vcpu) { return rsvd_bits(cpuid_maxphyaddr(vcpu), 63); } static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2, int nent) { u32 vcpu_caps[NR_KVM_CPU_CAPS]; int r; /* * Swap the existing (old) entries with the incoming (new) entries in * order to massage the new entries, e.g. to account for dynamic bits * that KVM controls, without clobbering the current guest CPUID, which * KVM needs to preserve in order to unwind on failure. * * Similarly, save the vCPU's current cpu_caps so that the capabilities * can be updated alongside the CPUID entries when performing runtime * updates. Full initialization is done if and only if the vCPU hasn't * run, i.e. only if userspace is potentially changing CPUID features. */ swap(vcpu->arch.cpuid_entries, e2); swap(vcpu->arch.cpuid_nent, nent); memcpy(vcpu_caps, vcpu->arch.cpu_caps, sizeof(vcpu_caps)); BUILD_BUG_ON(sizeof(vcpu_caps) != sizeof(vcpu->arch.cpu_caps)); /* * KVM does not correctly handle changing guest CPUID after KVM_RUN, as * MAXPHYADDR, GBPAGES support, AMD reserved bit behavior, etc.. aren't * tracked in kvm_mmu_page_role. As a result, KVM may miss guest page * faults due to reusing SPs/SPTEs. In practice no sane VMM mucks with * the core vCPU model on the fly. It would've been better to forbid any * KVM_SET_CPUID{,2} calls after KVM_RUN altogether but unfortunately * some VMMs (e.g. QEMU) reuse vCPU fds for CPU hotplug/unplug and do * KVM_SET_CPUID{,2} again. To support this legacy behavior, check * whether the supplied CPUID data is equal to what's already set. */ if (kvm_vcpu_has_run(vcpu)) { r = kvm_cpuid_check_equal(vcpu, e2, nent); if (r) goto err; goto success; } #ifdef CONFIG_KVM_HYPERV if (kvm_cpuid_has_hyperv(vcpu)) { r = kvm_hv_vcpu_init(vcpu); if (r) goto err; } #endif r = kvm_check_cpuid(vcpu); if (r) goto err; #ifdef CONFIG_KVM_XEN vcpu->arch.xen.cpuid = kvm_get_hypervisor_cpuid(vcpu, XEN_SIGNATURE); #endif kvm_vcpu_after_set_cpuid(vcpu); success: kvfree(e2); return 0; err: memcpy(vcpu->arch.cpu_caps, vcpu_caps, sizeof(vcpu_caps)); swap(vcpu->arch.cpuid_entries, e2); swap(vcpu->arch.cpuid_nent, nent); return r; } /* when an old userspace process fills a new kernel module */ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid *cpuid, struct kvm_cpuid_entry __user *entries) { int r, i; struct kvm_cpuid_entry *e = NULL; struct kvm_cpuid_entry2 *e2 = NULL; if (cpuid->nent > KVM_MAX_CPUID_ENTRIES) return -E2BIG; if (cpuid->nent) { e = vmemdup_array_user(entries, cpuid->nent, sizeof(*e)); if (IS_ERR(e)) return PTR_ERR(e); e2 = kvmalloc_array(cpuid->nent, sizeof(*e2), GFP_KERNEL_ACCOUNT); if (!e2) { r = -ENOMEM; goto out_free_cpuid; } } for (i = 0; i < cpuid->nent; i++) { e2[i].function = e[i].function; e2[i].eax = e[i].eax; e2[i].ebx = e[i].ebx; e2[i].ecx = e[i].ecx; e2[i].edx = e[i].edx; e2[i].index = 0; e2[i].flags = 0; e2[i].padding[0] = 0; e2[i].padding[1] = 0; e2[i].padding[2] = 0; } r = kvm_set_cpuid(vcpu, e2, cpuid->nent); if (r) kvfree(e2); out_free_cpuid: kvfree(e); return r; } int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid, struct kvm_cpuid_entry2 __user *entries) { struct kvm_cpuid_entry2 *e2 = NULL; int r; if (cpuid->nent > KVM_MAX_CPUID_ENTRIES) return -E2BIG; if (cpuid->nent) { e2 = vmemdup_array_user(entries, cpuid->nent, sizeof(*e2)); if (IS_ERR(e2)) return PTR_ERR(e2); } r = kvm_set_cpuid(vcpu, e2, cpuid->nent); if (r) kvfree(e2); return r; } int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu *vcpu, 
struct kvm_cpuid2 *cpuid, struct kvm_cpuid_entry2 __user *entries) { if (cpuid->nent < vcpu->arch.cpuid_nent) return -E2BIG; if (vcpu->arch.cpuid_dynamic_bits_dirty) kvm_update_cpuid_runtime(vcpu); if (copy_to_user(entries, vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent * sizeof(struct kvm_cpuid_entry2))) return -EFAULT; cpuid->nent = vcpu->arch.cpuid_nent; return 0; } static __always_inline u32 raw_cpuid_get(struct cpuid_reg cpuid) { struct kvm_cpuid_entry2 entry; u32 base; /* * KVM only supports features defined by Intel (0x0), AMD (0x80000000), * and Centaur (0xc0000000). WARN if a feature for new vendor base is * defined, as this and other code would need to be updated. */ base = cpuid.function & 0xffff0000; if (WARN_ON_ONCE(base && base != 0x80000000 && base != 0xc0000000)) return 0; if (cpuid_eax(base) < cpuid.function) return 0; cpuid_count(cpuid.function, cpuid.index, &entry.eax, &entry.ebx, &entry.ecx, &entry.edx); return *__cpuid_entry_get_reg(&entry, cpuid.reg); } /* * For kernel-defined leafs, mask KVM's supported feature set with the kernel's * capabilities as well as raw CPUID. For KVM-defined leafs, consult only raw * CPUID, as KVM is the one and only authority (in the kernel). */ #define kvm_cpu_cap_init(leaf, feature_initializers...) \ do { \ const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32); \ const u32 __maybe_unused kvm_cpu_cap_init_in_progress = leaf; \ const u32 *kernel_cpu_caps = boot_cpu_data.x86_capability; \ u32 kvm_cpu_cap_passthrough = 0; \ u32 kvm_cpu_cap_synthesized = 0; \ u32 kvm_cpu_cap_emulated = 0; \ u32 kvm_cpu_cap_features = 0; \ \ feature_initializers \ \ kvm_cpu_caps[leaf] = kvm_cpu_cap_features; \ \ if (leaf < NCAPINTS) \ kvm_cpu_caps[leaf] &= kernel_cpu_caps[leaf]; \ \ kvm_cpu_caps[leaf] |= kvm_cpu_cap_passthrough; \ kvm_cpu_caps[leaf] &= (raw_cpuid_get(cpuid) | \ kvm_cpu_cap_synthesized); \ kvm_cpu_caps[leaf] |= kvm_cpu_cap_emulated; \ } while (0) /* * Assert that the feature bit being declared, e.g. via F(), is in the CPUID * word that's being initialized. Exempt 0x8000_0001.EDX usage of 0x1.EDX * features, as AMD duplicated many 0x1.EDX features into 0x8000_0001.EDX. */ #define KVM_VALIDATE_CPU_CAP_USAGE(name) \ do { \ u32 __leaf = __feature_leaf(X86_FEATURE_##name); \ \ BUILD_BUG_ON(__leaf != kvm_cpu_cap_init_in_progress); \ } while (0) #define F(name) \ ({ \ KVM_VALIDATE_CPU_CAP_USAGE(name); \ kvm_cpu_cap_features |= feature_bit(name); \ }) /* Scattered Flag - For features that are scattered by cpufeatures.h. */ #define SCATTERED_F(name) \ ({ \ BUILD_BUG_ON(X86_FEATURE_##name >= MAX_CPU_FEATURES); \ KVM_VALIDATE_CPU_CAP_USAGE(name); \ if (boot_cpu_has(X86_FEATURE_##name)) \ F(name); \ }) /* Features that KVM supports only on 64-bit kernels. */ #define X86_64_F(name) \ ({ \ KVM_VALIDATE_CPU_CAP_USAGE(name); \ if (IS_ENABLED(CONFIG_X86_64)) \ F(name); \ }) /* * Emulated Feature - For features that KVM emulates in software irrespective * of host CPU/kernel support. */ #define EMULATED_F(name) \ ({ \ kvm_cpu_cap_emulated |= feature_bit(name); \ F(name); \ }) /* * Synthesized Feature - For features that are synthesized into boot_cpu_data, * i.e. may not be present in the raw CPUID, but can still be advertised to * userspace. Primarily used for mitigation related feature flags. */ #define SYNTHESIZED_F(name) \ ({ \ kvm_cpu_cap_synthesized |= feature_bit(name); \ F(name); \ }) /* * Passthrough Feature - For features that KVM supports based purely on raw * hardware CPUID, i.e. 
that KVM virtualizes even if the host kernel doesn't * use the feature. Simply force set the feature in KVM's capabilities, raw * CPUID support will be factored in by kvm_cpu_cap_mask(). */ #define PASSTHROUGH_F(name) \ ({ \ kvm_cpu_cap_passthrough |= feature_bit(name); \ F(name); \ }) /* * Aliased Features - For features in 0x8000_0001.EDX that are duplicates of * identical 0x1.EDX features, and thus are aliased from 0x1 to 0x8000_0001. */ #define ALIASED_1_EDX_F(name) \ ({ \ BUILD_BUG_ON(__feature_leaf(X86_FEATURE_##name) != CPUID_1_EDX); \ BUILD_BUG_ON(kvm_cpu_cap_init_in_progress != CPUID_8000_0001_EDX); \ kvm_cpu_cap_features |= feature_bit(name); \ }) /* * Vendor Features - For features that KVM supports, but are added in later * because they require additional vendor enabling. */ #define VENDOR_F(name) \ ({ \ KVM_VALIDATE_CPU_CAP_USAGE(name); \ }) /* * Runtime Features - For features that KVM dynamically sets/clears at runtime, * e.g. when CR4 changes, but which are never advertised to userspace. */ #define RUNTIME_F(name) \ ({ \ KVM_VALIDATE_CPU_CAP_USAGE(name); \ }) /* * Undefine the MSR bit macro to avoid token concatenation issues when * processing X86_FEATURE_SPEC_CTRL_SSBD. */ #undef SPEC_CTRL_SSBD /* DS is defined by ptrace-abi.h on 32-bit builds. */ #undef DS void kvm_set_cpu_caps(void) { memset(kvm_cpu_caps, 0, sizeof(kvm_cpu_caps)); BUILD_BUG_ON(sizeof(kvm_cpu_caps) - (NKVMCAPINTS * sizeof(*kvm_cpu_caps)) > sizeof(boot_cpu_data.x86_capability)); kvm_cpu_cap_init(CPUID_1_ECX, F(XMM3), F(PCLMULQDQ), VENDOR_F(DTES64), /* * NOTE: MONITOR (and MWAIT) are emulated as NOP, but *not* * advertised to guests via CPUID! MWAIT is also technically a * runtime flag thanks to IA32_MISC_ENABLES; mark it as such so * that KVM is aware that it's a known, unadvertised flag. */ RUNTIME_F(MWAIT), /* DS-CPL */ VENDOR_F(VMX), /* SMX, EST */ /* TM2 */ F(SSSE3), /* CNXT-ID */ /* Reserved */ F(FMA), F(CX16), /* xTPR Update */ F(PDCM), F(PCID), /* Reserved, DCA */ F(XMM4_1), F(XMM4_2), EMULATED_F(X2APIC), F(MOVBE), F(POPCNT), EMULATED_F(TSC_DEADLINE_TIMER), F(AES), F(XSAVE), RUNTIME_F(OSXSAVE), F(AVX), F(F16C), F(RDRAND), EMULATED_F(HYPERVISOR), ); kvm_cpu_cap_init(CPUID_1_EDX, F(FPU), F(VME), F(DE), F(PSE), F(TSC), F(MSR), F(PAE), F(MCE), F(CX8), F(APIC), /* Reserved */ F(SEP), F(MTRR), F(PGE), F(MCA), F(CMOV), F(PAT), F(PSE36), /* PSN */ F(CLFLUSH), /* Reserved */ VENDOR_F(DS), /* ACPI */ F(MMX), F(FXSR), F(XMM), F(XMM2), F(SELFSNOOP), /* HTT, TM, Reserved, PBE */ ); kvm_cpu_cap_init(CPUID_7_0_EBX, F(FSGSBASE), EMULATED_F(TSC_ADJUST), F(SGX), F(BMI1), F(HLE), F(AVX2), F(FDP_EXCPTN_ONLY), F(SMEP), F(BMI2), F(ERMS), F(INVPCID), F(RTM), F(ZERO_FCS_FDS), VENDOR_F(MPX), F(AVX512F), F(AVX512DQ), F(RDSEED), F(ADX), F(SMAP), F(AVX512IFMA), F(CLFLUSHOPT), F(CLWB), VENDOR_F(INTEL_PT), F(AVX512PF), F(AVX512ER), F(AVX512CD), F(SHA_NI), F(AVX512BW), F(AVX512VL), ); kvm_cpu_cap_init(CPUID_7_ECX, F(AVX512VBMI), PASSTHROUGH_F(LA57), F(PKU), RUNTIME_F(OSPKE), F(RDPID), F(AVX512_VPOPCNTDQ), F(UMIP), F(AVX512_VBMI2), F(GFNI), F(VAES), F(VPCLMULQDQ), F(AVX512_VNNI), F(AVX512_BITALG), F(CLDEMOTE), F(MOVDIRI), F(MOVDIR64B), VENDOR_F(WAITPKG), F(SGX_LC), F(BUS_LOCK_DETECT), X86_64_F(SHSTK), ); /* * PKU not yet implemented for shadow paging and requires OSPKE * to be set on the host. Clear it if that is not the case */ if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE)) kvm_cpu_cap_clear(X86_FEATURE_PKU); /* * Shadow Stacks aren't implemented in the Shadow MMU. 
Shadow Stack * accesses require "magic" Writable=0,Dirty=1 protection, which KVM * doesn't know how to emulate or map. */ if (!tdp_enabled) kvm_cpu_cap_clear(X86_FEATURE_SHSTK); kvm_cpu_cap_init(CPUID_7_EDX, F(AVX512_4VNNIW), F(AVX512_4FMAPS), F(SPEC_CTRL), F(SPEC_CTRL_SSBD), EMULATED_F(ARCH_CAPABILITIES), F(INTEL_STIBP), F(MD_CLEAR), F(AVX512_VP2INTERSECT), F(FSRM), F(SERIALIZE), F(TSXLDTRK), F(AVX512_FP16), F(AMX_TILE), F(AMX_INT8), F(AMX_BF16), F(FLUSH_L1D), F(IBT), ); /* * Disable support for IBT and SHSTK if KVM is configured to emulate * accesses to reserved GPAs, as KVM's emulator doesn't support IBT or * SHSTK, nor does KVM handle Shadow Stack #PFs (see above). */ if (allow_smaller_maxphyaddr) { kvm_cpu_cap_clear(X86_FEATURE_SHSTK); kvm_cpu_cap_clear(X86_FEATURE_IBT); } if (boot_cpu_has(X86_FEATURE_AMD_IBPB_RET) && boot_cpu_has(X86_FEATURE_AMD_IBPB) && boot_cpu_has(X86_FEATURE_AMD_IBRS)) kvm_cpu_cap_set(X86_FEATURE_SPEC_CTRL); if (boot_cpu_has(X86_FEATURE_STIBP)) kvm_cpu_cap_set(X86_FEATURE_INTEL_STIBP); if (boot_cpu_has(X86_FEATURE_AMD_SSBD)) kvm_cpu_cap_set(X86_FEATURE_SPEC_CTRL_SSBD); kvm_cpu_cap_init(CPUID_7_1_EAX, F(SHA512), F(SM3), F(SM4), F(AVX_VNNI), F(AVX512_BF16), F(CMPCCXADD), F(FZRM), F(FSRS), F(FSRC), F(WRMSRNS), X86_64_F(LKGS), F(AMX_FP16), F(AVX_IFMA), F(LAM), ); kvm_cpu_cap_init(CPUID_7_1_ECX, SCATTERED_F(MSR_IMM), ); kvm_cpu_cap_init(CPUID_7_1_EDX, F(AVX_VNNI_INT8), F(AVX_NE_CONVERT), F(AMX_COMPLEX), F(AVX_VNNI_INT16), F(PREFETCHITI), F(AVX10), ); kvm_cpu_cap_init(CPUID_7_2_EDX, F(INTEL_PSFD), F(IPRED_CTRL), F(RRSBA_CTRL), F(DDPD_U), F(BHI_CTRL), F(MCDT_NO), ); kvm_cpu_cap_init(CPUID_D_1_EAX, F(XSAVEOPT), F(XSAVEC), F(XGETBV1), F(XSAVES), X86_64_F(XFD), ); kvm_cpu_cap_init(CPUID_12_EAX, SCATTERED_F(SGX1), SCATTERED_F(SGX2), SCATTERED_F(SGX_EDECCSSA), ); kvm_cpu_cap_init(CPUID_24_0_EBX, F(AVX10_128), F(AVX10_256), F(AVX10_512), ); kvm_cpu_cap_init(CPUID_8000_0001_ECX, F(LAHF_LM), F(CMP_LEGACY), VENDOR_F(SVM), /* ExtApicSpace */ F(CR8_LEGACY), F(ABM), F(SSE4A), F(MISALIGNSSE), F(3DNOWPREFETCH), F(OSVW), /* IBS */ F(XOP), /* SKINIT, WDT, LWP */ F(FMA4), F(TBM), F(TOPOEXT), VENDOR_F(PERFCTR_CORE), ); kvm_cpu_cap_init(CPUID_8000_0001_EDX, ALIASED_1_EDX_F(FPU), ALIASED_1_EDX_F(VME), ALIASED_1_EDX_F(DE), ALIASED_1_EDX_F(PSE), ALIASED_1_EDX_F(TSC), ALIASED_1_EDX_F(MSR), ALIASED_1_EDX_F(PAE), ALIASED_1_EDX_F(MCE), ALIASED_1_EDX_F(CX8), ALIASED_1_EDX_F(APIC), /* Reserved */ F(SYSCALL), ALIASED_1_EDX_F(MTRR), ALIASED_1_EDX_F(PGE), ALIASED_1_EDX_F(MCA), ALIASED_1_EDX_F(CMOV), ALIASED_1_EDX_F(PAT), ALIASED_1_EDX_F(PSE36), /* Reserved */ F(NX), /* Reserved */ F(MMXEXT), ALIASED_1_EDX_F(MMX), ALIASED_1_EDX_F(FXSR), F(FXSR_OPT), X86_64_F(GBPAGES), F(RDTSCP), /* Reserved */ X86_64_F(LM), F(3DNOWEXT), F(3DNOW), ); if (!tdp_enabled && IS_ENABLED(CONFIG_X86_64)) kvm_cpu_cap_set(X86_FEATURE_GBPAGES); kvm_cpu_cap_init(CPUID_8000_0007_EDX, SCATTERED_F(CONSTANT_TSC), ); kvm_cpu_cap_init(CPUID_8000_0008_EBX, F(CLZERO), F(XSAVEERPTR), F(WBNOINVD), F(AMD_IBPB), F(AMD_IBRS), F(AMD_SSBD), F(VIRT_SSBD), F(AMD_SSB_NO), F(AMD_STIBP), F(AMD_STIBP_ALWAYS_ON), F(AMD_IBRS_SAME_MODE), F(AMD_PSFD), F(AMD_IBPB_RET), ); /* * AMD has separate bits for each SPEC_CTRL bit. * arch/x86/kernel/cpu/bugs.c is kind enough to * record that in cpufeatures so use them. 
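 * I.e. the kernel's vendor-agnostic IBPB/IBRS/STIBP/SPEC_CTRL_SSBD feature flags are translated below into the corresponding AMD_* bits advertised in CPUID.0x8000_0008.EBX.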
*/ if (boot_cpu_has(X86_FEATURE_IBPB)) { kvm_cpu_cap_set(X86_FEATURE_AMD_IBPB); if (boot_cpu_has(X86_FEATURE_SPEC_CTRL) && !boot_cpu_has_bug(X86_BUG_EIBRS_PBRSB)) kvm_cpu_cap_set(X86_FEATURE_AMD_IBPB_RET); } if (boot_cpu_has(X86_FEATURE_IBRS)) kvm_cpu_cap_set(X86_FEATURE_AMD_IBRS); if (boot_cpu_has(X86_FEATURE_STIBP)) kvm_cpu_cap_set(X86_FEATURE_AMD_STIBP); if (boot_cpu_has(X86_FEATURE_SPEC_CTRL_SSBD)) kvm_cpu_cap_set(X86_FEATURE_AMD_SSBD); if (!boot_cpu_has_bug(X86_BUG_SPEC_STORE_BYPASS)) kvm_cpu_cap_set(X86_FEATURE_AMD_SSB_NO); /* * The preference is to use SPEC CTRL MSR instead of the * VIRT_SPEC MSR. */ if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) && !boot_cpu_has(X86_FEATURE_AMD_SSBD)) kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD); /* All SVM features required additional vendor module enabling. */ kvm_cpu_cap_init(CPUID_8000_000A_EDX, VENDOR_F(NPT), VENDOR_F(VMCBCLEAN), VENDOR_F(FLUSHBYASID), VENDOR_F(NRIPS), VENDOR_F(TSCRATEMSR), VENDOR_F(V_VMSAVE_VMLOAD), VENDOR_F(LBRV), VENDOR_F(PAUSEFILTER), VENDOR_F(PFTHRESHOLD), VENDOR_F(VGIF), VENDOR_F(VNMI), VENDOR_F(SVME_ADDR_CHK), ); kvm_cpu_cap_init(CPUID_8000_001F_EAX, VENDOR_F(SME), VENDOR_F(SEV), /* VM_PAGE_FLUSH */ VENDOR_F(SEV_ES), F(SME_COHERENT), ); kvm_cpu_cap_init(CPUID_8000_0021_EAX, F(NO_NESTED_DATA_BP), F(WRMSR_XX_BASE_NS), /* * Synthesize "LFENCE is serializing" into the AMD-defined entry * in KVM's supported CPUID, i.e. if the feature is reported as * supported by the kernel. LFENCE_RDTSC was a Linux-defined * synthetic feature long before AMD joined the bandwagon, e.g. * LFENCE is serializing on most CPUs that support SSE2. On * CPUs that don't support AMD's leaf, ANDing with the raw host * CPUID will drop the flags, and reporting support in AMD's * leaf can make it easier for userspace to detect the feature. */ SYNTHESIZED_F(LFENCE_RDTSC), /* SmmPgCfgLock */ /* 4: Resv */ SYNTHESIZED_F(VERW_CLEAR), F(NULL_SEL_CLR_BASE), /* UpperAddressIgnore */ F(AUTOIBRS), F(PREFETCHI), EMULATED_F(NO_SMM_CTL_MSR), /* PrefetchCtlMsr */ /* GpOnUserCpuid */ /* EPSF */ SYNTHESIZED_F(SBPB), SYNTHESIZED_F(IBPB_BRTYPE), SYNTHESIZED_F(SRSO_NO), F(SRSO_USER_KERNEL_NO), ); kvm_cpu_cap_init(CPUID_8000_0021_ECX, SYNTHESIZED_F(TSA_SQ_NO), SYNTHESIZED_F(TSA_L1_NO), ); kvm_cpu_cap_init(CPUID_8000_0022_EAX, F(PERFMON_V2), ); if (!static_cpu_has_bug(X86_BUG_NULL_SEG)) kvm_cpu_cap_set(X86_FEATURE_NULL_SEL_CLR_BASE); kvm_cpu_cap_init(CPUID_C000_0001_EDX, F(XSTORE), F(XSTORE_EN), F(XCRYPT), F(XCRYPT_EN), F(ACE2), F(ACE2_EN), F(PHE), F(PHE_EN), F(PMM), F(PMM_EN), ); /* * Hide RDTSCP and RDPID if either feature is reported as supported but * probing MSR_TSC_AUX failed. This is purely a sanity check and * should never happen, but the guest will likely crash if RDTSCP or * RDPID is misreported, and KVM has botched MSR_TSC_AUX emulation in * the past. For example, the sanity check may fire if this instance of * KVM is running as L1 on top of an older, broken KVM. 
*/ if (WARN_ON((kvm_cpu_cap_has(X86_FEATURE_RDTSCP) || kvm_cpu_cap_has(X86_FEATURE_RDPID)) && !kvm_is_supported_user_return_msr(MSR_TSC_AUX))) { kvm_cpu_cap_clear(X86_FEATURE_RDTSCP); kvm_cpu_cap_clear(X86_FEATURE_RDPID); } } EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_cpu_caps); #undef F #undef SCATTERED_F #undef X86_64_F #undef EMULATED_F #undef SYNTHESIZED_F #undef PASSTHROUGH_F #undef ALIASED_1_EDX_F #undef VENDOR_F #undef RUNTIME_F struct kvm_cpuid_array { struct kvm_cpuid_entry2 *entries; int maxnent; int nent; }; static struct kvm_cpuid_entry2 *get_next_cpuid(struct kvm_cpuid_array *array) { if (array->nent >= array->maxnent) return NULL; return &array->entries[array->nent++]; } static struct kvm_cpuid_entry2 *do_host_cpuid(struct kvm_cpuid_array *array, u32 function, u32 index) { struct kvm_cpuid_entry2 *entry = get_next_cpuid(array); if (!entry) return NULL; memset(entry, 0, sizeof(*entry)); entry->function = function; entry->index = index; switch (function & 0xC0000000) { case 0x40000000: /* Hypervisor leaves are always synthesized by __do_cpuid_func. */ return entry; case 0x80000000: /* * 0x80000021 is sometimes synthesized by __do_cpuid_func, which * would result in out-of-bounds calls to do_host_cpuid. */ { static int max_cpuid_80000000; if (!READ_ONCE(max_cpuid_80000000)) WRITE_ONCE(max_cpuid_80000000, cpuid_eax(0x80000000)); if (function > READ_ONCE(max_cpuid_80000000)) return entry; } break; default: break; } cpuid_count(entry->function, entry->index, &entry->eax, &entry->ebx, &entry->ecx, &entry->edx); if (cpuid_function_is_indexed(function)) entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX; return entry; } static int cpuid_func_emulated(struct kvm_cpuid_entry2 *entry, u32 func, bool include_partially_emulated) { memset(entry, 0, sizeof(*entry)); entry->function = func; entry->index = 0; entry->flags = 0; switch (func) { case 0: entry->eax = 7; return 1; case 1: entry->ecx = feature_bit(MOVBE); /* * KVM allows userspace to enumerate MONITOR+MWAIT support to * the guest, but the MWAIT feature flag is never advertised * to userspace because MONITOR+MWAIT aren't virtualized by * hardware, can't be faithfully emulated in software (KVM * emulates them as NOPs), and allowing the guest to execute * them natively requires enabling a per-VM capability. */ if (include_partially_emulated) entry->ecx |= feature_bit(MWAIT); return 1; case 7: entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX; entry->eax = 0; if (kvm_cpu_cap_has(X86_FEATURE_RDTSCP)) entry->ecx = feature_bit(RDPID); return 1; default: return 0; } } static int __do_cpuid_func_emulated(struct kvm_cpuid_array *array, u32 func) { if (array->nent >= array->maxnent) return -E2BIG; array->nent += cpuid_func_emulated(&array->entries[array->nent], func, false); return 0; } static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function) { struct kvm_cpuid_entry2 *entry; int r, i, max_idx; /* all calls to cpuid_count() should be made on the same cpu */ get_cpu(); r = -E2BIG; entry = do_host_cpuid(array, function, 0); if (!entry) goto out; switch (function) { case 0: /* Limited to the highest leaf implemented in KVM. */ entry->eax = min(entry->eax, 0x24U); break; case 1: cpuid_entry_override(entry, CPUID_1_EDX); cpuid_entry_override(entry, CPUID_1_ECX); break; case 2: /* * On ancient CPUs, function 2 entries are STATEFUL. That is, * CPUID(function=2, index=0) may return different results each * time, with the least-significant byte in EAX enumerating the * number of times software should do CPUID(2, 0). * * Modern CPUs, i.e. 
every CPU KVM has *ever* run on are less * idiotic. Intel's SDM states that EAX & 0xff "will always * return 01H. Software should ignore this value and not * interpret it as an informational descriptor", while AMD's * APM states that CPUID(2) is reserved. * * WARN if a frankenstein CPU that supports virtualization and * a stateful CPUID.0x2 is encountered. */ WARN_ON_ONCE((entry->eax & 0xff) > 1); break; /* functions 4 and 0x8000001d have additional index. */ case 4: case 0x8000001d: /* * Read entries until the cache type in the previous entry is * zero, i.e. indicates an invalid entry. */ for (i = 1; entry->eax & 0x1f; ++i) { entry = do_host_cpuid(array, function, i); if (!entry) goto out; } break; case 6: /* Thermal management */ entry->eax = 0x4; /* allow ARAT */ entry->ebx = 0; entry->ecx = 0; entry->edx = 0; break; /* function 7 has additional index. */ case 7: max_idx = entry->eax = min(entry->eax, 2u); cpuid_entry_override(entry, CPUID_7_0_EBX); cpuid_entry_override(entry, CPUID_7_ECX); cpuid_entry_override(entry, CPUID_7_EDX); /* KVM only supports up to 0x7.2, capped above via min(). */ if (max_idx >= 1) { entry = do_host_cpuid(array, function, 1); if (!entry) goto out; cpuid_entry_override(entry, CPUID_7_1_EAX); cpuid_entry_override(entry, CPUID_7_1_ECX); cpuid_entry_override(entry, CPUID_7_1_EDX); entry->ebx = 0; } if (max_idx >= 2) { entry = do_host_cpuid(array, function, 2); if (!entry) goto out; cpuid_entry_override(entry, CPUID_7_2_EDX); entry->ecx = 0; entry->ebx = 0; entry->eax = 0; } break; case 0xa: { /* Architectural Performance Monitoring */ union cpuid10_eax eax = { }; union cpuid10_edx edx = { }; if (!enable_pmu || !static_cpu_has(X86_FEATURE_ARCH_PERFMON)) { entry->eax = entry->ebx = entry->ecx = entry->edx = 0; break; } eax.split.version_id = kvm_pmu_cap.version; eax.split.num_counters = kvm_pmu_cap.num_counters_gp; eax.split.bit_width = kvm_pmu_cap.bit_width_gp; eax.split.mask_length = kvm_pmu_cap.events_mask_len; edx.split.num_counters_fixed = kvm_pmu_cap.num_counters_fixed; edx.split.bit_width_fixed = kvm_pmu_cap.bit_width_fixed; if (kvm_pmu_cap.version) edx.split.anythread_deprecated = 1; entry->eax = eax.full; entry->ebx = kvm_pmu_cap.events_mask; entry->ecx = 0; entry->edx = edx.full; break; } case 0x1f: case 0xb: /* * No topology; a valid topology is indicated by the presence * of subleaf 1. */ entry->eax = entry->ebx = entry->ecx = 0; break; case 0xd: { u64 permitted_xcr0 = kvm_get_filtered_xcr0(); u64 permitted_xss = kvm_caps.supported_xss; entry->eax &= permitted_xcr0; entry->ebx = xstate_required_size(permitted_xcr0, false); entry->ecx = entry->ebx; entry->edx &= permitted_xcr0 >> 32; if (!permitted_xcr0) break; entry = do_host_cpuid(array, function, 1); if (!entry) goto out; cpuid_entry_override(entry, CPUID_D_1_EAX); if (entry->eax & (feature_bit(XSAVES) | feature_bit(XSAVEC))) entry->ebx = xstate_required_size(permitted_xcr0 | permitted_xss, true); else { WARN_ON_ONCE(permitted_xss != 0); entry->ebx = 0; } entry->ecx &= permitted_xss; entry->edx &= permitted_xss >> 32; for (i = 2; i < 64; ++i) { bool s_state; if (permitted_xcr0 & BIT_ULL(i)) s_state = false; else if (permitted_xss & BIT_ULL(i)) s_state = true; else continue; entry = do_host_cpuid(array, function, i); if (!entry) goto out; /* * The supported check above should have filtered out * invalid sub-leafs. Only valid sub-leafs should * reach this point, and they should have a non-zero * save state size. 
Furthermore, check whether the * processor agrees with permitted_xcr0/permitted_xss * on whether this is an XCR0- or IA32_XSS-managed area. */ if (WARN_ON_ONCE(!entry->eax || (entry->ecx & 0x1) != s_state)) { --array->nent; continue; } if (!kvm_cpu_cap_has(X86_FEATURE_XFD)) entry->ecx &= ~BIT_ULL(2); entry->edx = 0; } break; } case 0x12: /* Intel SGX */ if (!kvm_cpu_cap_has(X86_FEATURE_SGX)) { entry->eax = entry->ebx = entry->ecx = entry->edx = 0; break; } /* * Index 0: Sub-features, MISCSELECT (a.k.a extended features) * and max enclave sizes. The SGX sub-features and MISCSELECT * are restricted by kernel and KVM capabilities (like most * feature flags), while enclave size is unrestricted. */ cpuid_entry_override(entry, CPUID_12_EAX); entry->ebx &= SGX_MISC_EXINFO; entry = do_host_cpuid(array, function, 1); if (!entry) goto out; /* * Index 1: SECS.ATTRIBUTES. ATTRIBUTES are restricted a la * feature flags. Advertise all supported flags, including * privileged attributes that require explicit opt-in from * userspace. ATTRIBUTES.XFRM is not adjusted as userspace is * expected to derive it from supported XCR0. */ entry->eax &= SGX_ATTR_PRIV_MASK | SGX_ATTR_UNPRIV_MASK; entry->ebx &= 0; break; /* Intel PT */ case 0x14: if (!kvm_cpu_cap_has(X86_FEATURE_INTEL_PT)) { entry->eax = entry->ebx = entry->ecx = entry->edx = 0; break; } for (i = 1, max_idx = entry->eax; i <= max_idx; ++i) { if (!do_host_cpuid(array, function, i)) goto out; } break; /* Intel AMX TILE */ case 0x1d: if (!kvm_cpu_cap_has(X86_FEATURE_AMX_TILE)) { entry->eax = entry->ebx = entry->ecx = entry->edx = 0; break; } for (i = 1, max_idx = entry->eax; i <= max_idx; ++i) { if (!do_host_cpuid(array, function, i)) goto out; } break; case 0x1e: /* TMUL information */ if (!kvm_cpu_cap_has(X86_FEATURE_AMX_TILE)) { entry->eax = entry->ebx = entry->ecx = entry->edx = 0; break; } break; case 0x24: { u8 avx10_version; if (!kvm_cpu_cap_has(X86_FEATURE_AVX10)) { entry->eax = entry->ebx = entry->ecx = entry->edx = 0; break; } /* * The AVX10 version is encoded in EBX[7:0]. Note, the version * is guaranteed to be >=1 if AVX10 is supported. Note #2, the * version needs to be captured before overriding EBX features! */ avx10_version = min_t(u8, entry->ebx & 0xff, 1); cpuid_entry_override(entry, CPUID_24_0_EBX); entry->ebx |= avx10_version; entry->eax = 0; entry->ecx = 0; entry->edx = 0; break; } case KVM_CPUID_SIGNATURE: { const u32 *sigptr = (const u32 *)KVM_SIGNATURE; entry->eax = KVM_CPUID_FEATURES; entry->ebx = sigptr[0]; entry->ecx = sigptr[1]; entry->edx = sigptr[2]; break; } case KVM_CPUID_FEATURES: entry->eax = (1 << KVM_FEATURE_CLOCKSOURCE) | (1 << KVM_FEATURE_NOP_IO_DELAY) | (1 << KVM_FEATURE_CLOCKSOURCE2) | (1 << KVM_FEATURE_ASYNC_PF) | (1 << KVM_FEATURE_PV_EOI) | (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) | (1 << KVM_FEATURE_PV_UNHALT) | (1 << KVM_FEATURE_PV_TLB_FLUSH) | (1 << KVM_FEATURE_ASYNC_PF_VMEXIT) | (1 << KVM_FEATURE_PV_SEND_IPI) | (1 << KVM_FEATURE_POLL_CONTROL) | (1 << KVM_FEATURE_PV_SCHED_YIELD) | (1 << KVM_FEATURE_ASYNC_PF_INT); if (sched_info_on()) entry->eax |= (1 << KVM_FEATURE_STEAL_TIME); entry->ebx = 0; entry->ecx = 0; entry->edx = 0; break; case 0x80000000: entry->eax = min(entry->eax, 0x80000022); /* * Serializing LFENCE is reported in a multitude of ways, and * NullSegClearsBase is not reported in CPUID on Zen2; help * userspace by providing the CPUID leaf ourselves. * * However, only do it if the host has CPUID leaf 0x8000001d. 
* QEMU thinks that it can query the host blindly for that * CPUID leaf if KVM reports that it supports 0x8000001d or * above. The processor merrily returns values from the * highest Intel leaf which QEMU tries to use as the guest's * 0x8000001d. Even worse, this can result in an infinite * loop if said highest leaf has no subleaves indexed by ECX. */ if (entry->eax >= 0x8000001d && (static_cpu_has(X86_FEATURE_LFENCE_RDTSC) || !static_cpu_has_bug(X86_BUG_NULL_SEG))) entry->eax = max(entry->eax, 0x80000021); break; case 0x80000001: entry->ebx &= ~GENMASK(27, 16); cpuid_entry_override(entry, CPUID_8000_0001_EDX); cpuid_entry_override(entry, CPUID_8000_0001_ECX); break; case 0x80000005: /* Pass host L1 cache and TLB info. */ break; case 0x80000006: /* Drop reserved bits, pass host L2 cache and TLB info. */ entry->edx &= ~GENMASK(17, 16); break; case 0x80000007: /* Advanced power management */ cpuid_entry_override(entry, CPUID_8000_0007_EDX); /* mask against host */ entry->edx &= boot_cpu_data.x86_power; entry->eax = entry->ebx = entry->ecx = 0; break; case 0x80000008: { /* * GuestPhysAddrSize (EAX[23:16]) is intended for software * use. * * KVM's ABI is to report the effective MAXPHYADDR for the * guest in PhysAddrSize (phys_as), and the maximum * *addressable* GPA in GuestPhysAddrSize (g_phys_as). * * GuestPhysAddrSize is valid if and only if TDP is enabled, * in which case the max GPA that can be addressed by KVM may * be less than the max GPA that can be legally generated by * the guest, e.g. if MAXPHYADDR>48 but the CPU doesn't * support 5-level TDP. */ unsigned int virt_as = max((entry->eax >> 8) & 0xff, 48U); unsigned int phys_as, g_phys_as; /* * If TDP (NPT) is disabled use the adjusted host MAXPHYADDR as * the guest operates in the same PA space as the host, i.e. * reductions in MAXPHYADDR for memory encryption affect shadow * paging, too. * * If TDP is enabled, use the raw bare metal MAXPHYADDR as * reductions to the HPAs do not affect GPAs. The max * addressable GPA is the same as the max effective GPA, except * that it's capped at 48 bits if 5-level TDP isn't supported * (hardware processes bits 51:48 only when walking the fifth * level page table). */ if (!tdp_enabled) { phys_as = boot_cpu_data.x86_phys_bits; g_phys_as = 0; } else { phys_as = entry->eax & 0xff; g_phys_as = phys_as; if (kvm_mmu_get_max_tdp_level() < 5) g_phys_as = min(g_phys_as, 48U); } entry->eax = phys_as | (virt_as << 8) | (g_phys_as << 16); entry->ecx &= ~(GENMASK(31, 16) | GENMASK(11, 8)); entry->edx = 0; cpuid_entry_override(entry, CPUID_8000_0008_EBX); break; } case 0x8000000A: if (!kvm_cpu_cap_has(X86_FEATURE_SVM)) { entry->eax = entry->ebx = entry->ecx = entry->edx = 0; break; } entry->eax = 1; /* SVM revision 1 */ entry->ebx = 8; /* Lets support 8 ASIDs in case we add proper ASID emulation to nested SVM */ entry->ecx = 0; /* Reserved */ cpuid_entry_override(entry, CPUID_8000_000A_EDX); break; case 0x80000019: entry->ecx = entry->edx = 0; break; case 0x8000001a: entry->eax &= GENMASK(2, 0); entry->ebx = entry->ecx = entry->edx = 0; break; case 0x8000001e: /* Do not return host topology information. */ entry->eax = entry->ebx = entry->ecx = 0; entry->edx = 0; /* reserved */ break; case 0x8000001F: if (!kvm_cpu_cap_has(X86_FEATURE_SEV)) { entry->eax = entry->ebx = entry->ecx = entry->edx = 0; } else { cpuid_entry_override(entry, CPUID_8000_001F_EAX); /* Clear NumVMPL since KVM does not support VMPL. 
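 * (NumVMPL lives in CPUID.0x8000_001F.EBX[15:12]; the mask below also clears the reserved bits 31:16.)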
*/ entry->ebx &= ~GENMASK(31, 12); /* * Enumerate '0' for "PA bits reduction", the adjusted * MAXPHYADDR is enumerated directly (see 0x80000008). */ entry->ebx &= ~GENMASK(11, 6); } break; case 0x80000020: entry->eax = entry->ebx = entry->ecx = entry->edx = 0; break; case 0x80000021: entry->ebx = entry->edx = 0; cpuid_entry_override(entry, CPUID_8000_0021_EAX); cpuid_entry_override(entry, CPUID_8000_0021_ECX); break; /* AMD Extended Performance Monitoring and Debug */ case 0x80000022: { union cpuid_0x80000022_ebx ebx = { }; entry->ecx = entry->edx = 0; if (!enable_pmu || !kvm_cpu_cap_has(X86_FEATURE_PERFMON_V2)) { entry->eax = entry->ebx = 0; break; } cpuid_entry_override(entry, CPUID_8000_0022_EAX); ebx.split.num_core_pmc = kvm_pmu_cap.num_counters_gp; entry->ebx = ebx.full; break; } /*Add support for Centaur's CPUID instruction*/ case 0xC0000000: /*Just support up to 0xC0000004 now*/ entry->eax = min(entry->eax, 0xC0000004); break; case 0xC0000001: cpuid_entry_override(entry, CPUID_C000_0001_EDX); break; case 3: /* Processor serial number */ case 5: /* MONITOR/MWAIT */ case 0xC0000002: case 0xC0000003: case 0xC0000004: default: entry->eax = entry->ebx = entry->ecx = entry->edx = 0; break; } r = 0; out: put_cpu(); return r; } static int do_cpuid_func(struct kvm_cpuid_array *array, u32 func, unsigned int type) { if (type == KVM_GET_EMULATED_CPUID) return __do_cpuid_func_emulated(array, func); return __do_cpuid_func(array, func); } #define CENTAUR_CPUID_SIGNATURE 0xC0000000 static int get_cpuid_func(struct kvm_cpuid_array *array, u32 func, unsigned int type) { u32 limit; int r; if (func == CENTAUR_CPUID_SIGNATURE && boot_cpu_data.x86_vendor != X86_VENDOR_CENTAUR && boot_cpu_data.x86_vendor != X86_VENDOR_ZHAOXIN) return 0; r = do_cpuid_func(array, func, type); if (r) return r; limit = array->entries[array->nent - 1].eax; for (func = func + 1; func <= limit; ++func) { r = do_cpuid_func(array, func, type); if (r) break; } return r; } static bool sanity_check_entries(struct kvm_cpuid_entry2 __user *entries, __u32 num_entries, unsigned int ioctl_type) { int i; __u32 pad[3]; if (ioctl_type != KVM_GET_EMULATED_CPUID) return false; /* * We want to make sure that ->padding is being passed clean from * userspace in case we want to use it for something in the future. * * Sadly, this wasn't enforced for KVM_GET_SUPPORTED_CPUID and so we * have to give ourselves satisfied only with the emulated side. /me * sheds a tear. 
*/ for (i = 0; i < num_entries; i++) { if (copy_from_user(pad, entries[i].padding, sizeof(pad))) return true; if (pad[0] || pad[1] || pad[2]) return true; } return false; } int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid, struct kvm_cpuid_entry2 __user *entries, unsigned int type) { static const u32 funcs[] = { 0, 0x80000000, CENTAUR_CPUID_SIGNATURE, KVM_CPUID_SIGNATURE, }; struct kvm_cpuid_array array = { .nent = 0, }; int r, i; if (cpuid->nent < 1) return -E2BIG; if (cpuid->nent > KVM_MAX_CPUID_ENTRIES) cpuid->nent = KVM_MAX_CPUID_ENTRIES; if (sanity_check_entries(entries, cpuid->nent, type)) return -EINVAL; array.entries = kvcalloc(cpuid->nent, sizeof(struct kvm_cpuid_entry2), GFP_KERNEL); if (!array.entries) return -ENOMEM; array.maxnent = cpuid->nent; for (i = 0; i < ARRAY_SIZE(funcs); i++) { r = get_cpuid_func(&array, funcs[i], type); if (r) goto out_free; } cpuid->nent = array.nent; if (copy_to_user(entries, array.entries, array.nent * sizeof(struct kvm_cpuid_entry2))) r = -EFAULT; out_free: kvfree(array.entries); return r; } /* * Intel CPUID semantics treats any query for an out-of-range leaf as if the * highest basic leaf (i.e. CPUID.0H:EAX) were requested. AMD CPUID semantics * returns all zeroes for any undefined leaf, whether or not the leaf is in * range. Centaur/VIA follows Intel semantics. * * A leaf is considered out-of-range if its function is higher than the maximum * supported leaf of its associated class or if its associated class does not * exist. * * There are three primary classes to be considered, with their respective * ranges described as "<base> - <top>[,<base2> - <top2>] inclusive. A primary * class exists if a guest CPUID entry for its <base> leaf exists. For a given * class, CPUID.<base>.EAX contains the max supported leaf for the class. * * - Basic: 0x00000000 - 0x3fffffff, 0x50000000 - 0x7fffffff * - Hypervisor: 0x40000000 - 0x4fffffff * - Extended: 0x80000000 - 0xbfffffff * - Centaur: 0xc0000000 - 0xcfffffff * * The Hypervisor class is further subdivided into sub-classes that each act as * their own independent class associated with a 0x100 byte range. E.g. if Qemu * is advertising support for both HyperV and KVM, the resulting Hypervisor * CPUID sub-classes are: * * - HyperV: 0x40000000 - 0x400000ff * - KVM: 0x40000100 - 0x400001ff */ static struct kvm_cpuid_entry2 * get_out_of_range_cpuid_entry(struct kvm_vcpu *vcpu, u32 *fn_ptr, u32 index) { struct kvm_cpuid_entry2 *basic, *class; u32 function = *fn_ptr; basic = kvm_find_cpuid_entry(vcpu, 0); if (!basic) return NULL; if (is_guest_vendor_amd(basic->ebx, basic->ecx, basic->edx) || is_guest_vendor_hygon(basic->ebx, basic->ecx, basic->edx)) return NULL; if (function >= 0x40000000 && function <= 0x4fffffff) class = kvm_find_cpuid_entry(vcpu, function & 0xffffff00); else if (function >= 0xc0000000) class = kvm_find_cpuid_entry(vcpu, 0xc0000000); else class = kvm_find_cpuid_entry(vcpu, function & 0x80000000); if (class && function <= class->eax) return NULL; /* * Leaf specific adjustments are also applied when redirecting to the * max basic entry, e.g. if the max basic leaf is 0xb but there is no * entry for CPUID.0xb.index (see below), then the output value for EDX * needs to be pulled from CPUID.0xb.1. */ *fn_ptr = basic->eax; /* * The class does not exist or the requested function is out of range; * the effective CPUID entry is the max basic leaf. Note, the index of * the original requested leaf is observed! 
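 * E.g. if the guest's max basic leaf is 0xd, an out-of-range query of CPUID.0x3fffffff with ECX=2 is serviced from the guest's CPUID.0xd.2 entry, if one exists.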
*/ return kvm_find_cpuid_entry_index(vcpu, basic->eax, index); } bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx, u32 *ecx, u32 *edx, bool exact_only) { u32 orig_function = *eax, function = *eax, index = *ecx; struct kvm_cpuid_entry2 *entry; bool exact, used_max_basic = false; if (vcpu->arch.cpuid_dynamic_bits_dirty) kvm_update_cpuid_runtime(vcpu); entry = kvm_find_cpuid_entry_index(vcpu, function, index); exact = !!entry; if (!entry && !exact_only) { entry = get_out_of_range_cpuid_entry(vcpu, &function, index); used_max_basic = !!entry; } if (entry) { *eax = entry->eax; *ebx = entry->ebx; *ecx = entry->ecx; *edx = entry->edx; if (function == 7 && index == 0) { u64 data; if ((*ebx & (feature_bit(RTM) | feature_bit(HLE))) && !kvm_msr_read(vcpu, MSR_IA32_TSX_CTRL, &data) && (data & TSX_CTRL_CPUID_CLEAR)) *ebx &= ~(feature_bit(RTM) | feature_bit(HLE)); } else if (function == 0x80000007) { if (kvm_hv_invtsc_suppressed(vcpu)) *edx &= ~feature_bit(CONSTANT_TSC); } else if (IS_ENABLED(CONFIG_KVM_XEN) && kvm_xen_is_tsc_leaf(vcpu, function)) { /* * Update guest TSC frequency information if necessary. * Ignore failures, there is no sane value that can be * provided if KVM can't get the TSC frequency. */ if (kvm_check_request(KVM_REQ_CLOCK_UPDATE, vcpu)) kvm_guest_time_update(vcpu); if (index == 1) { *ecx = vcpu->arch.pvclock_tsc_mul; *edx = vcpu->arch.pvclock_tsc_shift; } else if (index == 2) { *eax = vcpu->arch.hw_tsc_khz; } } } else { *eax = *ebx = *ecx = *edx = 0; /* * When leaf 0BH or 1FH is defined, CL is pass-through * and EDX is always the x2APIC ID, even for undefined * subleaves. Index 1 will exist iff the leaf is * implemented, so we pass through CL iff leaf 1 * exists. EDX can be copied from any existing index. */ if (function == 0xb || function == 0x1f) { entry = kvm_find_cpuid_entry_index(vcpu, function, 1); if (entry) { *ecx = index & 0xff; *edx = entry->edx; } } } trace_kvm_cpuid(orig_function, index, *eax, *ebx, *ecx, *edx, exact, used_max_basic); return exact; } EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_cpuid); int kvm_emulate_cpuid(struct kvm_vcpu *vcpu) { u32 eax, ebx, ecx, edx; if (cpuid_fault_enabled(vcpu) && !kvm_require_cpl(vcpu, 0)) return 1; eax = kvm_rax_read(vcpu); ecx = kvm_rcx_read(vcpu); kvm_cpuid(vcpu, &eax, &ebx, &ecx, &edx, false); kvm_rax_write(vcpu, eax); kvm_rbx_write(vcpu, ebx); kvm_rcx_write(vcpu, ecx); kvm_rdx_write(vcpu, edx); return kvm_skip_emulated_instruction(vcpu); } EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_cpuid); |
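/*
 * Illustrative only: a minimal userspace sketch (not part of the kernel
 * sources above) of how a VMM typically drives the ioctls implemented here,
 * i.e. KVM_GET_SUPPORTED_CPUID on the /dev/kvm fd followed by KVM_SET_CPUID2
 * on a vCPU fd. kvm_fd and vcpu_fd are assumed to have been obtained via
 * open("/dev/kvm") and KVM_CREATE_VM/KVM_CREATE_VCPU, error handling is
 * omitted, and the entry count of 128 is an arbitrary assumption (the ioctl
 * fails with E2BIG if the count is too small).
 */
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static void example_set_vcpu_cpuid(int kvm_fd, int vcpu_fd)
{
	struct kvm_cpuid2 *cpuid;

	/* struct kvm_cpuid2 ends in a flexible array of kvm_cpuid_entry2. */
	cpuid = calloc(1, sizeof(*cpuid) + 128 * sizeof(struct kvm_cpuid_entry2));
	cpuid->nent = 128;

	/* Ask KVM which CPUID bits it can virtualize or emulate for a guest. */
	ioctl(kvm_fd, KVM_GET_SUPPORTED_CPUID, cpuid);

	/* A real VMM would trim/adjust the entries before handing them over. */
	ioctl(vcpu_fd, KVM_SET_CPUID2, cpuid);

	free(cpuid);
}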
// SPDX-License-Identifier: GPL-2.0-or-later /* * CIPSO - Commercial IP Security Option * * This is an implementation of
the CIPSO 2.2 protocol as specified in * draft-ietf-cipso-ipsecurity-01.txt with additional tag types as found in * FIPS-188. While CIPSO never became a full IETF RFC standard many vendors * have chosen to adopt the protocol and over the years it has become a * de-facto standard for labeled networking. * * The CIPSO draft specification can be found in the kernel's Documentation * directory as well as the following URL: * https://tools.ietf.org/id/draft-ietf-cipso-ipsecurity-01.txt * The FIPS-188 specification can be found at the following URL: * https://www.itl.nist.gov/fipspubs/fip188.htm * * Author: Paul Moore <paul.moore@hp.com> */ /* * (c) Copyright Hewlett-Packard Development Company, L.P., 2006, 2008 */ #include <linux/init.h> #include <linux/types.h> #include <linux/rcupdate.h> #include <linux/list.h> #include <linux/spinlock.h> #include <linux/string.h> #include <linux/jhash.h> #include <linux/audit.h> #include <linux/slab.h> #include <net/ip.h> #include <net/icmp.h> #include <net/tcp.h> #include <net/netlabel.h> #include <net/cipso_ipv4.h> #include <linux/atomic.h> #include <linux/bug.h> #include <linux/unaligned.h> /* List of available DOI definitions */ /* XXX - This currently assumes a minimal number of different DOIs in use, * if in practice there are a lot of different DOIs this list should * probably be turned into a hash table or something similar so we * can do quick lookups. */ static DEFINE_SPINLOCK(cipso_v4_doi_list_lock); static LIST_HEAD(cipso_v4_doi_list); /* Label mapping cache */ int cipso_v4_cache_enabled = 1; int cipso_v4_cache_bucketsize = 10; #define CIPSO_V4_CACHE_BUCKETBITS 7 #define CIPSO_V4_CACHE_BUCKETS (1 << CIPSO_V4_CACHE_BUCKETBITS) #define CIPSO_V4_CACHE_REORDERLIMIT 10 struct cipso_v4_map_cache_bkt { spinlock_t lock; u32 size; struct list_head list; }; struct cipso_v4_map_cache_entry { u32 hash; unsigned char *key; size_t key_len; struct netlbl_lsm_cache *lsm_data; u32 activity; struct list_head list; }; static struct cipso_v4_map_cache_bkt *cipso_v4_cache; /* Restricted bitmap (tag #1) flags */ int cipso_v4_rbm_optfmt; int cipso_v4_rbm_strictvalid = 1; /* * Protocol Constants */ /* Maximum size of the CIPSO IP option, derived from the fact that the maximum * IPv4 header size is 60 bytes and the base IPv4 header is 20 bytes long. */ #define CIPSO_V4_OPT_LEN_MAX 40 /* Length of the base CIPSO option, this includes the option type (1 byte), the * option length (1 byte), and the DOI (4 bytes). */ #define CIPSO_V4_HDR_LEN 6 /* Base length of the restrictive category bitmap tag (tag #1). */ #define CIPSO_V4_TAG_RBM_BLEN 4 /* Base length of the enumerated category tag (tag #2). */ #define CIPSO_V4_TAG_ENUM_BLEN 4 /* Base length of the ranged categories bitmap tag (tag #5). */ #define CIPSO_V4_TAG_RNG_BLEN 4 /* The maximum number of category ranges permitted in the ranged category tag * (tag #5). You may note that the IETF draft states that the maximum number * of category ranges is 7, but if the low end of the last category range is * zero then it is possible to fit 8 category ranges because the zero should * be omitted. */ #define CIPSO_V4_TAG_RNG_CAT_MAX 8 /* Base length of the local tag (non-standard tag). 
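 * The tag body carries a host-local 32-bit secid in host byte order, so it is meaningful only to the local host and is not a stable on-the-wire format.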
* Tag definition (may change between kernel versions) * * 0 8 16 24 32 * +----------+----------+----------+----------+ * | 10000000 | 00000110 | 32-bit secid value | * +----------+----------+----------+----------+ * | in (host byte order)| * +----------+----------+ * */ #define CIPSO_V4_TAG_LOC_BLEN 6 /* * Helper Functions */ /** * cipso_v4_cache_entry_free - Frees a cache entry * @entry: the entry to free * * Description: * This function frees the memory associated with a cache entry including the * LSM cache data if there are no longer any users, i.e. reference count == 0. * */ static void cipso_v4_cache_entry_free(struct cipso_v4_map_cache_entry *entry) { if (entry->lsm_data) netlbl_secattr_cache_free(entry->lsm_data); kfree(entry->key); kfree(entry); } /** * cipso_v4_map_cache_hash - Hashing function for the CIPSO cache * @key: the hash key * @key_len: the length of the key in bytes * * Description: * The CIPSO tag hashing function. Returns a 32-bit hash value. * */ static u32 cipso_v4_map_cache_hash(const unsigned char *key, u32 key_len) { return jhash(key, key_len, 0); } /* * Label Mapping Cache Functions */ /** * cipso_v4_cache_init - Initialize the CIPSO cache * * Description: * Initializes the CIPSO label mapping cache, this function should be called * before any of the other functions defined in this file. Returns zero on * success, negative values on error. * */ static int __init cipso_v4_cache_init(void) { u32 iter; cipso_v4_cache = kcalloc(CIPSO_V4_CACHE_BUCKETS, sizeof(struct cipso_v4_map_cache_bkt), GFP_KERNEL); if (!cipso_v4_cache) return -ENOMEM; for (iter = 0; iter < CIPSO_V4_CACHE_BUCKETS; iter++) { spin_lock_init(&cipso_v4_cache[iter].lock); cipso_v4_cache[iter].size = 0; INIT_LIST_HEAD(&cipso_v4_cache[iter].list); } return 0; } /** * cipso_v4_cache_invalidate - Invalidates the current CIPSO cache * * Description: * Invalidates and frees any entries in the CIPSO cache. * */ void cipso_v4_cache_invalidate(void) { struct cipso_v4_map_cache_entry *entry, *tmp_entry; u32 iter; for (iter = 0; iter < CIPSO_V4_CACHE_BUCKETS; iter++) { spin_lock_bh(&cipso_v4_cache[iter].lock); list_for_each_entry_safe(entry, tmp_entry, &cipso_v4_cache[iter].list, list) { list_del(&entry->list); cipso_v4_cache_entry_free(entry); } cipso_v4_cache[iter].size = 0; spin_unlock_bh(&cipso_v4_cache[iter].lock); } } /** * cipso_v4_cache_check - Check the CIPSO cache for a label mapping * @key: the buffer to check * @key_len: buffer length in bytes * @secattr: the security attribute struct to use * * Description: * This function checks the cache to see if a label mapping already exists for * the given key. If there is a match then the cache is adjusted and the * @secattr struct is populated with the correct LSM security attributes. The * cache is adjusted in the following manner if the entry is not already the * first in the cache bucket: * * 1. The cache entry's activity counter is incremented * 2. The previous (higher ranking) entry's activity counter is decremented * 3. If the difference between the two activity counters is geater than * CIPSO_V4_CACHE_REORDERLIMIT the two entries are swapped * * Returns zero on success, -ENOENT for a cache miss, and other negative values * on error. 
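 * For example, with the default CIPSO_V4_CACHE_REORDERLIMIT of 10, an entry whose activity counter climbs more than 10 above its predecessor's is moved in front of that predecessor, so frequently matched labels drift toward the head of the bucket.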
* */ static int cipso_v4_cache_check(const unsigned char *key, u32 key_len, struct netlbl_lsm_secattr *secattr) { u32 bkt; struct cipso_v4_map_cache_entry *entry; struct cipso_v4_map_cache_entry *prev_entry = NULL; u32 hash; if (!READ_ONCE(cipso_v4_cache_enabled)) return -ENOENT; hash = cipso_v4_map_cache_hash(key, key_len); bkt = hash & (CIPSO_V4_CACHE_BUCKETS - 1); spin_lock_bh(&cipso_v4_cache[bkt].lock); list_for_each_entry(entry, &cipso_v4_cache[bkt].list, list) { if (entry->hash == hash && entry->key_len == key_len && memcmp(entry->key, key, key_len) == 0) { entry->activity += 1; refcount_inc(&entry->lsm_data->refcount); secattr->cache = entry->lsm_data; secattr->flags |= NETLBL_SECATTR_CACHE; secattr->type = NETLBL_NLTYPE_CIPSOV4; if (!prev_entry) { spin_unlock_bh(&cipso_v4_cache[bkt].lock); return 0; } if (prev_entry->activity > 0) prev_entry->activity -= 1; if (entry->activity > prev_entry->activity && entry->activity - prev_entry->activity > CIPSO_V4_CACHE_REORDERLIMIT) { __list_del(entry->list.prev, entry->list.next); __list_add(&entry->list, prev_entry->list.prev, &prev_entry->list); } spin_unlock_bh(&cipso_v4_cache[bkt].lock); return 0; } prev_entry = entry; } spin_unlock_bh(&cipso_v4_cache[bkt].lock); return -ENOENT; } /** * cipso_v4_cache_add - Add an entry to the CIPSO cache * @cipso_ptr: pointer to CIPSO IP option * @secattr: the packet's security attributes * * Description: * Add a new entry into the CIPSO label mapping cache. Add the new entry to * head of the cache bucket's list, if the cache bucket is out of room remove * the last entry in the list first. It is important to note that there is * currently no checking for duplicate keys. Returns zero on success, * negative values on failure. * */ int cipso_v4_cache_add(const unsigned char *cipso_ptr, const struct netlbl_lsm_secattr *secattr) { int bkt_size = READ_ONCE(cipso_v4_cache_bucketsize); int ret_val = -EPERM; u32 bkt; struct cipso_v4_map_cache_entry *entry = NULL; struct cipso_v4_map_cache_entry *old_entry = NULL; u32 cipso_ptr_len; if (!READ_ONCE(cipso_v4_cache_enabled) || bkt_size <= 0) return 0; cipso_ptr_len = cipso_ptr[1]; entry = kzalloc(sizeof(*entry), GFP_ATOMIC); if (!entry) return -ENOMEM; entry->key = kmemdup(cipso_ptr, cipso_ptr_len, GFP_ATOMIC); if (!entry->key) { ret_val = -ENOMEM; goto cache_add_failure; } entry->key_len = cipso_ptr_len; entry->hash = cipso_v4_map_cache_hash(cipso_ptr, cipso_ptr_len); refcount_inc(&secattr->cache->refcount); entry->lsm_data = secattr->cache; bkt = entry->hash & (CIPSO_V4_CACHE_BUCKETS - 1); spin_lock_bh(&cipso_v4_cache[bkt].lock); if (cipso_v4_cache[bkt].size < bkt_size) { list_add(&entry->list, &cipso_v4_cache[bkt].list); cipso_v4_cache[bkt].size += 1; } else { old_entry = list_entry(cipso_v4_cache[bkt].list.prev, struct cipso_v4_map_cache_entry, list); list_del(&old_entry->list); list_add(&entry->list, &cipso_v4_cache[bkt].list); cipso_v4_cache_entry_free(old_entry); } spin_unlock_bh(&cipso_v4_cache[bkt].lock); return 0; cache_add_failure: if (entry) cipso_v4_cache_entry_free(entry); return ret_val; } /* * DOI List Functions */ /** * cipso_v4_doi_search - Searches for a DOI definition * @doi: the DOI to search for * * Description: * Search the DOI definition list for a DOI definition with a DOI value that * matches @doi. The caller is responsible for calling rcu_read_[un]lock(). * Returns a pointer to the DOI definition on success and NULL on failure. 
*/ static struct cipso_v4_doi *cipso_v4_doi_search(u32 doi) { struct cipso_v4_doi *iter; list_for_each_entry_rcu(iter, &cipso_v4_doi_list, list) if (iter->doi == doi && refcount_read(&iter->refcount)) return iter; return NULL; } /** * cipso_v4_doi_add - Add a new DOI to the CIPSO protocol engine * @doi_def: the DOI structure * @audit_info: NetLabel audit information * * Description: * The caller defines a new DOI for use by the CIPSO engine and calls this * function to add it to the list of acceptable domains. The caller must * ensure that the mapping table specified in @doi_def->map meets all of the * requirements of the mapping type (see cipso_ipv4.h for details). Returns * zero on success and non-zero on failure. * */ int cipso_v4_doi_add(struct cipso_v4_doi *doi_def, struct netlbl_audit *audit_info) { int ret_val = -EINVAL; u32 iter; u32 doi; u32 doi_type; struct audit_buffer *audit_buf; doi = doi_def->doi; doi_type = doi_def->type; if (doi_def->doi == CIPSO_V4_DOI_UNKNOWN) goto doi_add_return; for (iter = 0; iter < CIPSO_V4_TAG_MAXCNT; iter++) { switch (doi_def->tags[iter]) { case CIPSO_V4_TAG_RBITMAP: break; case CIPSO_V4_TAG_RANGE: case CIPSO_V4_TAG_ENUM: if (doi_def->type != CIPSO_V4_MAP_PASS) goto doi_add_return; break; case CIPSO_V4_TAG_LOCAL: if (doi_def->type != CIPSO_V4_MAP_LOCAL) goto doi_add_return; break; case CIPSO_V4_TAG_INVALID: if (iter == 0) goto doi_add_return; break; default: goto doi_add_return; } } refcount_set(&doi_def->refcount, 1); spin_lock(&cipso_v4_doi_list_lock); if (cipso_v4_doi_search(doi_def->doi)) { spin_unlock(&cipso_v4_doi_list_lock); ret_val = -EEXIST; goto doi_add_return; } list_add_tail_rcu(&doi_def->list, &cipso_v4_doi_list); spin_unlock(&cipso_v4_doi_list_lock); ret_val = 0; doi_add_return: audit_buf = netlbl_audit_start(AUDIT_MAC_CIPSOV4_ADD, audit_info); if (audit_buf) { const char *type_str; switch (doi_type) { case CIPSO_V4_MAP_TRANS: type_str = "trans"; break; case CIPSO_V4_MAP_PASS: type_str = "pass"; break; case CIPSO_V4_MAP_LOCAL: type_str = "local"; break; default: type_str = "(unknown)"; } audit_log_format(audit_buf, " cipso_doi=%u cipso_type=%s res=%u", doi, type_str, ret_val == 0 ? 1 : 0); audit_log_end(audit_buf); } return ret_val; } /** * cipso_v4_doi_free - Frees a DOI definition * @doi_def: the DOI definition * * Description: * This function frees all of the memory associated with a DOI definition. * */ void cipso_v4_doi_free(struct cipso_v4_doi *doi_def) { if (!doi_def) return; switch (doi_def->type) { case CIPSO_V4_MAP_TRANS: kfree(doi_def->map.std->lvl.cipso); kfree(doi_def->map.std->lvl.local); kfree(doi_def->map.std->cat.cipso); kfree(doi_def->map.std->cat.local); kfree(doi_def->map.std); break; } kfree(doi_def); } /** * cipso_v4_doi_free_rcu - Frees a DOI definition via the RCU pointer * @entry: the entry's RCU field * * Description: * This function is designed to be used as a callback to the call_rcu() * function so that the memory allocated to the DOI definition can be released * safely. * */ static void cipso_v4_doi_free_rcu(struct rcu_head *entry) { struct cipso_v4_doi *doi_def; doi_def = container_of(entry, struct cipso_v4_doi, rcu); cipso_v4_doi_free(doi_def); } /** * cipso_v4_doi_remove - Remove an existing DOI from the CIPSO protocol engine * @doi: the DOI value * @audit_info: NetLabel audit information * * Description: * Removes a DOI definition from the CIPSO engine. The NetLabel routines will * be called to release their own LSM domain mappings as well as our own * domain list. 
Returns zero on success and negative values on failure. * */ int cipso_v4_doi_remove(u32 doi, struct netlbl_audit *audit_info) { int ret_val; struct cipso_v4_doi *doi_def; struct audit_buffer *audit_buf; spin_lock(&cipso_v4_doi_list_lock); doi_def = cipso_v4_doi_search(doi); if (!doi_def) { spin_unlock(&cipso_v4_doi_list_lock); ret_val = -ENOENT; goto doi_remove_return; } list_del_rcu(&doi_def->list); spin_unlock(&cipso_v4_doi_list_lock); cipso_v4_doi_putdef(doi_def); ret_val = 0; doi_remove_return: audit_buf = netlbl_audit_start(AUDIT_MAC_CIPSOV4_DEL, audit_info); if (audit_buf) { audit_log_format(audit_buf, " cipso_doi=%u res=%u", doi, ret_val == 0 ? 1 : 0); audit_log_end(audit_buf); } return ret_val; } /** * cipso_v4_doi_getdef - Returns a reference to a valid DOI definition * @doi: the DOI value * * Description: * Searches for a valid DOI definition and if one is found it is returned to * the caller. Otherwise NULL is returned. The caller must ensure that * rcu_read_lock() is held while accessing the returned definition and the DOI * definition reference count is decremented when the caller is done. * */ struct cipso_v4_doi *cipso_v4_doi_getdef(u32 doi) { struct cipso_v4_doi *doi_def; rcu_read_lock(); doi_def = cipso_v4_doi_search(doi); if (!doi_def) goto doi_getdef_return; if (!refcount_inc_not_zero(&doi_def->refcount)) doi_def = NULL; doi_getdef_return: rcu_read_unlock(); return doi_def; } /** * cipso_v4_doi_putdef - Releases a reference for the given DOI definition * @doi_def: the DOI definition * * Description: * Releases a DOI definition reference obtained from cipso_v4_doi_getdef(). * */ void cipso_v4_doi_putdef(struct cipso_v4_doi *doi_def) { if (!doi_def) return; if (!refcount_dec_and_test(&doi_def->refcount)) return; cipso_v4_cache_invalidate(); call_rcu(&doi_def->rcu, cipso_v4_doi_free_rcu); } /** * cipso_v4_doi_walk - Iterate through the DOI definitions * @skip_cnt: skip past this number of DOI definitions, updated * @callback: callback for each DOI definition * @cb_arg: argument for the callback function * * Description: * Iterate over the DOI definition list, skipping the first @skip_cnt entries. * For each entry call @callback, if @callback returns a negative value stop * 'walking' through the list and return. Updates the value in @skip_cnt upon * return. Returns zero on success, negative values on failure. * */ int cipso_v4_doi_walk(u32 *skip_cnt, int (*callback) (struct cipso_v4_doi *doi_def, void *arg), void *cb_arg) { int ret_val = -ENOENT; u32 doi_cnt = 0; struct cipso_v4_doi *iter_doi; rcu_read_lock(); list_for_each_entry_rcu(iter_doi, &cipso_v4_doi_list, list) if (refcount_read(&iter_doi->refcount) > 0) { if (doi_cnt++ < *skip_cnt) continue; ret_val = callback(iter_doi, cb_arg); if (ret_val < 0) { doi_cnt--; goto doi_walk_return; } } doi_walk_return: rcu_read_unlock(); *skip_cnt = doi_cnt; return ret_val; } /* * Label Mapping Functions */ /** * cipso_v4_map_lvl_valid - Checks to see if the given level is understood * @doi_def: the DOI definition * @level: the level to check * * Description: * Checks the given level against the given DOI definition and returns a * negative value if the level does not have a valid mapping and a zero value * if the level is defined by the DOI. 
* */ static int cipso_v4_map_lvl_valid(const struct cipso_v4_doi *doi_def, u8 level) { switch (doi_def->type) { case CIPSO_V4_MAP_PASS: return 0; case CIPSO_V4_MAP_TRANS: if ((level < doi_def->map.std->lvl.cipso_size) && (doi_def->map.std->lvl.cipso[level] < CIPSO_V4_INV_LVL)) return 0; break; } return -EFAULT; } /** * cipso_v4_map_lvl_hton - Perform a level mapping from the host to the network * @doi_def: the DOI definition * @host_lvl: the host MLS level * @net_lvl: the network/CIPSO MLS level * * Description: * Perform a label mapping to translate a local MLS level to the correct * CIPSO level using the given DOI definition. Returns zero on success, * negative values otherwise. * */ static int cipso_v4_map_lvl_hton(const struct cipso_v4_doi *doi_def, u32 host_lvl, u32 *net_lvl) { switch (doi_def->type) { case CIPSO_V4_MAP_PASS: *net_lvl = host_lvl; return 0; case CIPSO_V4_MAP_TRANS: if (host_lvl < doi_def->map.std->lvl.local_size && doi_def->map.std->lvl.local[host_lvl] < CIPSO_V4_INV_LVL) { *net_lvl = doi_def->map.std->lvl.local[host_lvl]; return 0; } return -EPERM; } return -EINVAL; } /** * cipso_v4_map_lvl_ntoh - Perform a level mapping from the network to the host * @doi_def: the DOI definition * @net_lvl: the network/CIPSO MLS level * @host_lvl: the host MLS level * * Description: * Perform a label mapping to translate a CIPSO level to the correct local MLS * level using the given DOI definition. Returns zero on success, negative * values otherwise. * */ static int cipso_v4_map_lvl_ntoh(const struct cipso_v4_doi *doi_def, u32 net_lvl, u32 *host_lvl) { struct cipso_v4_std_map_tbl *map_tbl; switch (doi_def->type) { case CIPSO_V4_MAP_PASS: *host_lvl = net_lvl; return 0; case CIPSO_V4_MAP_TRANS: map_tbl = doi_def->map.std; if (net_lvl < map_tbl->lvl.cipso_size && map_tbl->lvl.cipso[net_lvl] < CIPSO_V4_INV_LVL) { *host_lvl = doi_def->map.std->lvl.cipso[net_lvl]; return 0; } return -EPERM; } return -EINVAL; } /** * cipso_v4_map_cat_rbm_valid - Checks to see if the category bitmap is valid * @doi_def: the DOI definition * @bitmap: category bitmap * @bitmap_len: bitmap length in bytes * * Description: * Checks the given category bitmap against the given DOI definition and * returns a negative value if any of the categories in the bitmap do not have * a valid mapping and a zero value if all of the categories are valid. * */ static int cipso_v4_map_cat_rbm_valid(const struct cipso_v4_doi *doi_def, const unsigned char *bitmap, u32 bitmap_len) { int cat = -1; u32 bitmap_len_bits = bitmap_len * 8; u32 cipso_cat_size; u32 *cipso_array; switch (doi_def->type) { case CIPSO_V4_MAP_PASS: return 0; case CIPSO_V4_MAP_TRANS: cipso_cat_size = doi_def->map.std->cat.cipso_size; cipso_array = doi_def->map.std->cat.cipso; for (;;) { cat = netlbl_bitmap_walk(bitmap, bitmap_len_bits, cat + 1, 1); if (cat < 0) break; if (cat >= cipso_cat_size || cipso_array[cat] >= CIPSO_V4_INV_CAT) return -EFAULT; } if (cat == -1) return 0; break; } return -EFAULT; } /** * cipso_v4_map_cat_rbm_hton - Perform a category mapping from host to network * @doi_def: the DOI definition * @secattr: the security attributes * @net_cat: the zero'd out category bitmap in network/CIPSO format * @net_cat_len: the length of the CIPSO bitmap in bytes * * Description: * Perform a label mapping to translate a local MLS category bitmap to the * correct CIPSO bitmap using the given DOI definition. Returns the minimum * size in bytes of the network bitmap on success, negative values otherwise. 
* */ static int cipso_v4_map_cat_rbm_hton(const struct cipso_v4_doi *doi_def, const struct netlbl_lsm_secattr *secattr, unsigned char *net_cat, u32 net_cat_len) { int host_spot = -1; u32 net_spot = CIPSO_V4_INV_CAT; u32 net_spot_max = 0; u32 net_clen_bits = net_cat_len * 8; u32 host_cat_size = 0; u32 *host_cat_array = NULL; if (doi_def->type == CIPSO_V4_MAP_TRANS) { host_cat_size = doi_def->map.std->cat.local_size; host_cat_array = doi_def->map.std->cat.local; } for (;;) { host_spot = netlbl_catmap_walk(secattr->attr.mls.cat, host_spot + 1); if (host_spot < 0) break; switch (doi_def->type) { case CIPSO_V4_MAP_PASS: net_spot = host_spot; break; case CIPSO_V4_MAP_TRANS: if (host_spot >= host_cat_size) return -EPERM; net_spot = host_cat_array[host_spot]; if (net_spot >= CIPSO_V4_INV_CAT) return -EPERM; break; } if (net_spot >= net_clen_bits) return -ENOSPC; netlbl_bitmap_setbit(net_cat, net_spot, 1); if (net_spot > net_spot_max) net_spot_max = net_spot; } if (++net_spot_max % 8) return net_spot_max / 8 + 1; return net_spot_max / 8; } /** * cipso_v4_map_cat_rbm_ntoh - Perform a category mapping from network to host * @doi_def: the DOI definition * @net_cat: the category bitmap in network/CIPSO format * @net_cat_len: the length of the CIPSO bitmap in bytes * @secattr: the security attributes * * Description: * Perform a label mapping to translate a CIPSO bitmap to the correct local * MLS category bitmap using the given DOI definition. Returns zero on * success, negative values on failure. * */ static int cipso_v4_map_cat_rbm_ntoh(const struct cipso_v4_doi *doi_def, const unsigned char *net_cat, u32 net_cat_len, struct netlbl_lsm_secattr *secattr) { int ret_val; int net_spot = -1; u32 host_spot = CIPSO_V4_INV_CAT; u32 net_clen_bits = net_cat_len * 8; u32 net_cat_size = 0; u32 *net_cat_array = NULL; if (doi_def->type == CIPSO_V4_MAP_TRANS) { net_cat_size = doi_def->map.std->cat.cipso_size; net_cat_array = doi_def->map.std->cat.cipso; } for (;;) { net_spot = netlbl_bitmap_walk(net_cat, net_clen_bits, net_spot + 1, 1); if (net_spot < 0) return 0; switch (doi_def->type) { case CIPSO_V4_MAP_PASS: host_spot = net_spot; break; case CIPSO_V4_MAP_TRANS: if (net_spot >= net_cat_size) return -EPERM; host_spot = net_cat_array[net_spot]; if (host_spot >= CIPSO_V4_INV_CAT) return -EPERM; break; } ret_val = netlbl_catmap_setbit(&secattr->attr.mls.cat, host_spot, GFP_ATOMIC); if (ret_val != 0) return ret_val; } return -EINVAL; } /** * cipso_v4_map_cat_enum_valid - Checks to see if the categories are valid * @doi_def: the DOI definition * @enumcat: category list * @enumcat_len: length of the category list in bytes * * Description: * Checks the given categories against the given DOI definition and returns a * negative value if any of the categories do not have a valid mapping and a * zero value if all of the categories are valid. 
* */ static int cipso_v4_map_cat_enum_valid(const struct cipso_v4_doi *doi_def, const unsigned char *enumcat, u32 enumcat_len) { u16 cat; int cat_prev = -1; u32 iter; if (doi_def->type != CIPSO_V4_MAP_PASS || enumcat_len & 0x01) return -EFAULT; for (iter = 0; iter < enumcat_len; iter += 2) { cat = get_unaligned_be16(&enumcat[iter]); if (cat <= cat_prev) return -EFAULT; cat_prev = cat; } return 0; } /** * cipso_v4_map_cat_enum_hton - Perform a category mapping from host to network * @doi_def: the DOI definition * @secattr: the security attributes * @net_cat: the zero'd out category list in network/CIPSO format * @net_cat_len: the length of the CIPSO category list in bytes * * Description: * Perform a label mapping to translate a local MLS category bitmap to the * correct CIPSO category list using the given DOI definition. Returns the * size in bytes of the network category bitmap on success, negative values * otherwise. * */ static int cipso_v4_map_cat_enum_hton(const struct cipso_v4_doi *doi_def, const struct netlbl_lsm_secattr *secattr, unsigned char *net_cat, u32 net_cat_len) { int cat = -1; u32 cat_iter = 0; for (;;) { cat = netlbl_catmap_walk(secattr->attr.mls.cat, cat + 1); if (cat < 0) break; if ((cat_iter + 2) > net_cat_len) return -ENOSPC; *((__be16 *)&net_cat[cat_iter]) = htons(cat); cat_iter += 2; } return cat_iter; } /** * cipso_v4_map_cat_enum_ntoh - Perform a category mapping from network to host * @doi_def: the DOI definition * @net_cat: the category list in network/CIPSO format * @net_cat_len: the length of the CIPSO bitmap in bytes * @secattr: the security attributes * * Description: * Perform a label mapping to translate a CIPSO category list to the correct * local MLS category bitmap using the given DOI definition. Returns zero on * success, negative values on failure. * */ static int cipso_v4_map_cat_enum_ntoh(const struct cipso_v4_doi *doi_def, const unsigned char *net_cat, u32 net_cat_len, struct netlbl_lsm_secattr *secattr) { int ret_val; u32 iter; for (iter = 0; iter < net_cat_len; iter += 2) { ret_val = netlbl_catmap_setbit(&secattr->attr.mls.cat, get_unaligned_be16(&net_cat[iter]), GFP_ATOMIC); if (ret_val != 0) return ret_val; } return 0; } /** * cipso_v4_map_cat_rng_valid - Checks to see if the categories are valid * @doi_def: the DOI definition * @rngcat: category list * @rngcat_len: length of the category list in bytes * * Description: * Checks the given categories against the given DOI definition and returns a * negative value if any of the categories do not have a valid mapping and a * zero value if all of the categories are valid. 
* */ static int cipso_v4_map_cat_rng_valid(const struct cipso_v4_doi *doi_def, const unsigned char *rngcat, u32 rngcat_len) { u16 cat_high; u16 cat_low; u32 cat_prev = CIPSO_V4_MAX_REM_CATS + 1; u32 iter; if (doi_def->type != CIPSO_V4_MAP_PASS || rngcat_len & 0x01) return -EFAULT; for (iter = 0; iter < rngcat_len; iter += 4) { cat_high = get_unaligned_be16(&rngcat[iter]); if ((iter + 4) <= rngcat_len) cat_low = get_unaligned_be16(&rngcat[iter + 2]); else cat_low = 0; if (cat_high > cat_prev) return -EFAULT; cat_prev = cat_low; } return 0; } /** * cipso_v4_map_cat_rng_hton - Perform a category mapping from host to network * @doi_def: the DOI definition * @secattr: the security attributes * @net_cat: the zero'd out category list in network/CIPSO format * @net_cat_len: the length of the CIPSO category list in bytes * * Description: * Perform a label mapping to translate a local MLS category bitmap to the * correct CIPSO category list using the given DOI definition. Returns the * size in bytes of the network category bitmap on success, negative values * otherwise. * */ static int cipso_v4_map_cat_rng_hton(const struct cipso_v4_doi *doi_def, const struct netlbl_lsm_secattr *secattr, unsigned char *net_cat, u32 net_cat_len) { int iter = -1; u16 array[CIPSO_V4_TAG_RNG_CAT_MAX * 2]; u32 array_cnt = 0; u32 cat_size = 0; /* make sure we don't overflow the 'array[]' variable */ if (net_cat_len > (CIPSO_V4_OPT_LEN_MAX - CIPSO_V4_HDR_LEN - CIPSO_V4_TAG_RNG_BLEN)) return -ENOSPC; for (;;) { iter = netlbl_catmap_walk(secattr->attr.mls.cat, iter + 1); if (iter < 0) break; cat_size += (iter == 0 ? 0 : sizeof(u16)); if (cat_size > net_cat_len) return -ENOSPC; array[array_cnt++] = iter; iter = netlbl_catmap_walkrng(secattr->attr.mls.cat, iter); if (iter < 0) return -EFAULT; cat_size += sizeof(u16); if (cat_size > net_cat_len) return -ENOSPC; array[array_cnt++] = iter; } for (iter = 0; array_cnt > 0;) { *((__be16 *)&net_cat[iter]) = htons(array[--array_cnt]); iter += 2; array_cnt--; if (array[array_cnt] != 0) { *((__be16 *)&net_cat[iter]) = htons(array[array_cnt]); iter += 2; } } return cat_size; } /** * cipso_v4_map_cat_rng_ntoh - Perform a category mapping from network to host * @doi_def: the DOI definition * @net_cat: the category list in network/CIPSO format * @net_cat_len: the length of the CIPSO bitmap in bytes * @secattr: the security attributes * * Description: * Perform a label mapping to translate a CIPSO category list to the correct * local MLS category bitmap using the given DOI definition. Returns zero on * success, negative values on failure. * */ static int cipso_v4_map_cat_rng_ntoh(const struct cipso_v4_doi *doi_def, const unsigned char *net_cat, u32 net_cat_len, struct netlbl_lsm_secattr *secattr) { int ret_val; u32 net_iter; u16 cat_low; u16 cat_high; for (net_iter = 0; net_iter < net_cat_len; net_iter += 4) { cat_high = get_unaligned_be16(&net_cat[net_iter]); if ((net_iter + 4) <= net_cat_len) cat_low = get_unaligned_be16(&net_cat[net_iter + 2]); else cat_low = 0; ret_val = netlbl_catmap_setrng(&secattr->attr.mls.cat, cat_low, cat_high, GFP_ATOMIC); if (ret_val != 0) return ret_val; } return 0; } /* * Protocol Handling Functions */ /** * cipso_v4_gentag_hdr - Generate a CIPSO option header * @doi_def: the DOI definition * @len: the total tag length in bytes, not including this header * @buf: the CIPSO option buffer * * Description: * Write a CIPSO header into the beginning of @buffer. 
* */ static void cipso_v4_gentag_hdr(const struct cipso_v4_doi *doi_def, unsigned char *buf, u32 len) { buf[0] = IPOPT_CIPSO; buf[1] = CIPSO_V4_HDR_LEN + len; put_unaligned_be32(doi_def->doi, &buf[2]); } /** * cipso_v4_gentag_rbm - Generate a CIPSO restricted bitmap tag (type #1) * @doi_def: the DOI definition * @secattr: the security attributes * @buffer: the option buffer * @buffer_len: length of buffer in bytes * * Description: * Generate a CIPSO option using the restricted bitmap tag, tag type #1. The * actual buffer length may be larger than the indicated size due to * translation between host and network category bitmaps. Returns the size of * the tag on success, negative values on failure. * */ static int cipso_v4_gentag_rbm(const struct cipso_v4_doi *doi_def, const struct netlbl_lsm_secattr *secattr, unsigned char *buffer, u32 buffer_len) { int ret_val; u32 tag_len; u32 level; if ((secattr->flags & NETLBL_SECATTR_MLS_LVL) == 0) return -EPERM; ret_val = cipso_v4_map_lvl_hton(doi_def, secattr->attr.mls.lvl, &level); if (ret_val != 0) return ret_val; if (secattr->flags & NETLBL_SECATTR_MLS_CAT) { ret_val = cipso_v4_map_cat_rbm_hton(doi_def, secattr, &buffer[4], buffer_len - 4); if (ret_val < 0) return ret_val; /* This will send packets using the "optimized" format when * possible as specified in section 3.4.2.6 of the * CIPSO draft. */ if (READ_ONCE(cipso_v4_rbm_optfmt) && ret_val > 0 && ret_val <= 10) tag_len = 14; else tag_len = 4 + ret_val; } else tag_len = 4; buffer[0] = CIPSO_V4_TAG_RBITMAP; buffer[1] = tag_len; buffer[3] = level; return tag_len; } /** * cipso_v4_parsetag_rbm - Parse a CIPSO restricted bitmap tag * @doi_def: the DOI definition * @tag: the CIPSO tag * @secattr: the security attributes * * Description: * Parse a CIPSO restricted bitmap tag (tag type #1) and return the security * attributes in @secattr. Return zero on success, negatives values on * failure. * */ static int cipso_v4_parsetag_rbm(const struct cipso_v4_doi *doi_def, const unsigned char *tag, struct netlbl_lsm_secattr *secattr) { int ret_val; u8 tag_len = tag[1]; u32 level; ret_val = cipso_v4_map_lvl_ntoh(doi_def, tag[3], &level); if (ret_val != 0) return ret_val; secattr->attr.mls.lvl = level; secattr->flags |= NETLBL_SECATTR_MLS_LVL; if (tag_len > 4) { ret_val = cipso_v4_map_cat_rbm_ntoh(doi_def, &tag[4], tag_len - 4, secattr); if (ret_val != 0) { netlbl_catmap_free(secattr->attr.mls.cat); return ret_val; } if (secattr->attr.mls.cat) secattr->flags |= NETLBL_SECATTR_MLS_CAT; } return 0; } /** * cipso_v4_gentag_enum - Generate a CIPSO enumerated tag (type #2) * @doi_def: the DOI definition * @secattr: the security attributes * @buffer: the option buffer * @buffer_len: length of buffer in bytes * * Description: * Generate a CIPSO option using the enumerated tag, tag type #2. Returns the * size of the tag on success, negative values on failure. 
* */ static int cipso_v4_gentag_enum(const struct cipso_v4_doi *doi_def, const struct netlbl_lsm_secattr *secattr, unsigned char *buffer, u32 buffer_len) { int ret_val; u32 tag_len; u32 level; if (!(secattr->flags & NETLBL_SECATTR_MLS_LVL)) return -EPERM; ret_val = cipso_v4_map_lvl_hton(doi_def, secattr->attr.mls.lvl, &level); if (ret_val != 0) return ret_val; if (secattr->flags & NETLBL_SECATTR_MLS_CAT) { ret_val = cipso_v4_map_cat_enum_hton(doi_def, secattr, &buffer[4], buffer_len - 4); if (ret_val < 0) return ret_val; tag_len = 4 + ret_val; } else tag_len = 4; buffer[0] = CIPSO_V4_TAG_ENUM; buffer[1] = tag_len; buffer[3] = level; return tag_len; } /** * cipso_v4_parsetag_enum - Parse a CIPSO enumerated tag * @doi_def: the DOI definition * @tag: the CIPSO tag * @secattr: the security attributes * * Description: * Parse a CIPSO enumerated tag (tag type #2) and return the security * attributes in @secattr. Return zero on success, negatives values on * failure. * */ static int cipso_v4_parsetag_enum(const struct cipso_v4_doi *doi_def, const unsigned char *tag, struct netlbl_lsm_secattr *secattr) { int ret_val; u8 tag_len = tag[1]; u32 level; ret_val = cipso_v4_map_lvl_ntoh(doi_def, tag[3], &level); if (ret_val != 0) return ret_val; secattr->attr.mls.lvl = level; secattr->flags |= NETLBL_SECATTR_MLS_LVL; if (tag_len > 4) { ret_val = cipso_v4_map_cat_enum_ntoh(doi_def, &tag[4], tag_len - 4, secattr); if (ret_val != 0) { netlbl_catmap_free(secattr->attr.mls.cat); return ret_val; } secattr->flags |= NETLBL_SECATTR_MLS_CAT; } return 0; } /** * cipso_v4_gentag_rng - Generate a CIPSO ranged tag (type #5) * @doi_def: the DOI definition * @secattr: the security attributes * @buffer: the option buffer * @buffer_len: length of buffer in bytes * * Description: * Generate a CIPSO option using the ranged tag, tag type #5. Returns the * size of the tag on success, negative values on failure. * */ static int cipso_v4_gentag_rng(const struct cipso_v4_doi *doi_def, const struct netlbl_lsm_secattr *secattr, unsigned char *buffer, u32 buffer_len) { int ret_val; u32 tag_len; u32 level; if (!(secattr->flags & NETLBL_SECATTR_MLS_LVL)) return -EPERM; ret_val = cipso_v4_map_lvl_hton(doi_def, secattr->attr.mls.lvl, &level); if (ret_val != 0) return ret_val; if (secattr->flags & NETLBL_SECATTR_MLS_CAT) { ret_val = cipso_v4_map_cat_rng_hton(doi_def, secattr, &buffer[4], buffer_len - 4); if (ret_val < 0) return ret_val; tag_len = 4 + ret_val; } else tag_len = 4; buffer[0] = CIPSO_V4_TAG_RANGE; buffer[1] = tag_len; buffer[3] = level; return tag_len; } /** * cipso_v4_parsetag_rng - Parse a CIPSO ranged tag * @doi_def: the DOI definition * @tag: the CIPSO tag * @secattr: the security attributes * * Description: * Parse a CIPSO ranged tag (tag type #5) and return the security attributes * in @secattr. Return zero on success, negatives values on failure. 
* */ static int cipso_v4_parsetag_rng(const struct cipso_v4_doi *doi_def, const unsigned char *tag, struct netlbl_lsm_secattr *secattr) { int ret_val; u8 tag_len = tag[1]; u32 level; ret_val = cipso_v4_map_lvl_ntoh(doi_def, tag[3], &level); if (ret_val != 0) return ret_val; secattr->attr.mls.lvl = level; secattr->flags |= NETLBL_SECATTR_MLS_LVL; if (tag_len > 4) { ret_val = cipso_v4_map_cat_rng_ntoh(doi_def, &tag[4], tag_len - 4, secattr); if (ret_val != 0) { netlbl_catmap_free(secattr->attr.mls.cat); return ret_val; } if (secattr->attr.mls.cat) secattr->flags |= NETLBL_SECATTR_MLS_CAT; } return 0; } /** * cipso_v4_gentag_loc - Generate a CIPSO local tag (non-standard) * @doi_def: the DOI definition * @secattr: the security attributes * @buffer: the option buffer * @buffer_len: length of buffer in bytes * * Description: * Generate a CIPSO option using the local tag. Returns the size of the tag * on success, negative values on failure. * */ static int cipso_v4_gentag_loc(const struct cipso_v4_doi *doi_def, const struct netlbl_lsm_secattr *secattr, unsigned char *buffer, u32 buffer_len) { if (!(secattr->flags & NETLBL_SECATTR_SECID)) return -EPERM; buffer[0] = CIPSO_V4_TAG_LOCAL; buffer[1] = CIPSO_V4_TAG_LOC_BLEN; *(u32 *)&buffer[2] = secattr->attr.secid; return CIPSO_V4_TAG_LOC_BLEN; } /** * cipso_v4_parsetag_loc - Parse a CIPSO local tag * @doi_def: the DOI definition * @tag: the CIPSO tag * @secattr: the security attributes * * Description: * Parse a CIPSO local tag and return the security attributes in @secattr. * Return zero on success, negatives values on failure. * */ static int cipso_v4_parsetag_loc(const struct cipso_v4_doi *doi_def, const unsigned char *tag, struct netlbl_lsm_secattr *secattr) { secattr->attr.secid = *(u32 *)&tag[2]; secattr->flags |= NETLBL_SECATTR_SECID; return 0; } /** * cipso_v4_optptr - Find the CIPSO option in the packet * @skb: the packet * * Description: * Parse the packet's IP header looking for a CIPSO option. Returns a pointer * to the start of the CIPSO option on success, NULL if one is not found. * */ unsigned char *cipso_v4_optptr(const struct sk_buff *skb) { const struct iphdr *iph = ip_hdr(skb); unsigned char *optptr = (unsigned char *)&(ip_hdr(skb)[1]); int optlen; int taglen; for (optlen = iph->ihl*4 - sizeof(struct iphdr); optlen > 1; ) { switch (optptr[0]) { case IPOPT_END: return NULL; case IPOPT_NOOP: taglen = 1; break; default: taglen = optptr[1]; } if (!taglen || taglen > optlen) return NULL; if (optptr[0] == IPOPT_CIPSO) return optptr; optlen -= taglen; optptr += taglen; } return NULL; } /** * cipso_v4_validate - Validate a CIPSO option * @skb: the packet * @option: the start of the option, on error it is set to point to the error * * Description: * This routine is called to validate a CIPSO option, it checks all of the * fields to ensure that they are at least valid, see the draft snippet below * for details. If the option is valid then a zero value is returned and * the value of @option is unchanged. If the option is invalid then a * non-zero value is returned and @option is adjusted to point to the * offending portion of the option. From the IETF draft ... * * "If any field within the CIPSO options, such as the DOI identifier, is not * recognized the IP datagram is discarded and an ICMP 'parameter problem' * (type 12) is generated and returned. The ICMP code field is set to 'bad * parameter' (code 0) and the pointer is set to the start of the CIPSO field * that is unrecognized." 
* */ int cipso_v4_validate(const struct sk_buff *skb, unsigned char **option) { unsigned char *opt = *option; unsigned char *tag; unsigned char opt_iter; unsigned char err_offset = 0; u8 opt_len; u8 tag_len; struct cipso_v4_doi *doi_def = NULL; u32 tag_iter; /* caller already checks for length values that are too large */ opt_len = opt[1]; if (opt_len < 8) { err_offset = 1; goto validate_return; } rcu_read_lock(); doi_def = cipso_v4_doi_search(get_unaligned_be32(&opt[2])); if (!doi_def) { err_offset = 2; goto validate_return_locked; } opt_iter = CIPSO_V4_HDR_LEN; tag = opt + opt_iter; while (opt_iter < opt_len) { for (tag_iter = 0; doi_def->tags[tag_iter] != tag[0];) if (doi_def->tags[tag_iter] == CIPSO_V4_TAG_INVALID || ++tag_iter == CIPSO_V4_TAG_MAXCNT) { err_offset = opt_iter; goto validate_return_locked; } if (opt_iter + 1 == opt_len) { err_offset = opt_iter; goto validate_return_locked; } tag_len = tag[1]; if (tag_len > (opt_len - opt_iter)) { err_offset = opt_iter + 1; goto validate_return_locked; } switch (tag[0]) { case CIPSO_V4_TAG_RBITMAP: if (tag_len < CIPSO_V4_TAG_RBM_BLEN) { err_offset = opt_iter + 1; goto validate_return_locked; } /* We are already going to do all the verification * necessary at the socket layer so from our point of * view it is safe to turn these checks off (and less * work), however, the CIPSO draft says we should do * all the CIPSO validations here but it doesn't * really specify _exactly_ what we need to validate * ... so, just make it a sysctl tunable. */ if (READ_ONCE(cipso_v4_rbm_strictvalid)) { if (cipso_v4_map_lvl_valid(doi_def, tag[3]) < 0) { err_offset = opt_iter + 3; goto validate_return_locked; } if (tag_len > CIPSO_V4_TAG_RBM_BLEN && cipso_v4_map_cat_rbm_valid(doi_def, &tag[4], tag_len - 4) < 0) { err_offset = opt_iter + 4; goto validate_return_locked; } } break; case CIPSO_V4_TAG_ENUM: if (tag_len < CIPSO_V4_TAG_ENUM_BLEN) { err_offset = opt_iter + 1; goto validate_return_locked; } if (cipso_v4_map_lvl_valid(doi_def, tag[3]) < 0) { err_offset = opt_iter + 3; goto validate_return_locked; } if (tag_len > CIPSO_V4_TAG_ENUM_BLEN && cipso_v4_map_cat_enum_valid(doi_def, &tag[4], tag_len - 4) < 0) { err_offset = opt_iter + 4; goto validate_return_locked; } break; case CIPSO_V4_TAG_RANGE: if (tag_len < CIPSO_V4_TAG_RNG_BLEN) { err_offset = opt_iter + 1; goto validate_return_locked; } if (cipso_v4_map_lvl_valid(doi_def, tag[3]) < 0) { err_offset = opt_iter + 3; goto validate_return_locked; } if (tag_len > CIPSO_V4_TAG_RNG_BLEN && cipso_v4_map_cat_rng_valid(doi_def, &tag[4], tag_len - 4) < 0) { err_offset = opt_iter + 4; goto validate_return_locked; } break; case CIPSO_V4_TAG_LOCAL: /* This is a non-standard tag that we only allow for * local connections, so if the incoming interface is * not the loopback device drop the packet. Further, * there is no legitimate reason for setting this from * userspace so reject it if skb is NULL. 
*/ if (!skb || !(skb->dev->flags & IFF_LOOPBACK)) { err_offset = opt_iter; goto validate_return_locked; } if (tag_len != CIPSO_V4_TAG_LOC_BLEN) { err_offset = opt_iter + 1; goto validate_return_locked; } break; default: err_offset = opt_iter; goto validate_return_locked; } tag += tag_len; opt_iter += tag_len; } validate_return_locked: rcu_read_unlock(); validate_return: *option = opt + err_offset; return err_offset; } /** * cipso_v4_error - Send the correct response for a bad packet * @skb: the packet * @error: the error code * @gateway: CIPSO gateway flag * * Description: * Based on the error code given in @error, send an ICMP error message back to * the originating host. From the IETF draft ... * * "If the contents of the CIPSO [option] are valid but the security label is * outside of the configured host or port label range, the datagram is * discarded and an ICMP 'destination unreachable' (type 3) is generated and * returned. The code field of the ICMP is set to 'communication with * destination network administratively prohibited' (code 9) or to * 'communication with destination host administratively prohibited' * (code 10). The value of the code is dependent on whether the originator * of the ICMP message is acting as a CIPSO host or a CIPSO gateway. The * recipient of the ICMP message MUST be able to handle either value. The * same procedure is performed if a CIPSO [option] can not be added to an * IP packet because it is too large to fit in the IP options area." * * "If the error is triggered by receipt of an ICMP message, the message is * discarded and no response is permitted (consistent with general ICMP * processing rules)." * */ void cipso_v4_error(struct sk_buff *skb, int error, u32 gateway) { struct inet_skb_parm parm; int res; if (ip_hdr(skb)->protocol == IPPROTO_ICMP || error != -EACCES) return; /* * We might be called above the IP layer, * so we can not use icmp_send and IPCB here. */ memset(&parm, 0, sizeof(parm)); parm.opt.optlen = ip_hdr(skb)->ihl * 4 - sizeof(struct iphdr); rcu_read_lock(); res = __ip_options_compile(dev_net(skb->dev), &parm.opt, skb, NULL); rcu_read_unlock(); if (res) return; if (gateway) __icmp_send(skb, ICMP_DEST_UNREACH, ICMP_NET_ANO, 0, &parm); else __icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_ANO, 0, &parm); } /** * cipso_v4_genopt - Generate a CIPSO option * @buf: the option buffer * @buf_len: the size of opt_buf * @doi_def: the CIPSO DOI to use * @secattr: the security attributes * * Description: * Generate a CIPSO option using the DOI definition and security attributes * passed to the function. Returns the length of the option on success and * negative values on failure. * */ static int cipso_v4_genopt(unsigned char *buf, u32 buf_len, const struct cipso_v4_doi *doi_def, const struct netlbl_lsm_secattr *secattr) { int ret_val; u32 iter; if (buf_len <= CIPSO_V4_HDR_LEN) return -ENOSPC; /* XXX - This code assumes only one tag per CIPSO option which isn't * really a good assumption to make but since we only support the MAC * tags right now it is a safe assumption. 
*/ iter = 0; do { memset(buf, 0, buf_len); switch (doi_def->tags[iter]) { case CIPSO_V4_TAG_RBITMAP: ret_val = cipso_v4_gentag_rbm(doi_def, secattr, &buf[CIPSO_V4_HDR_LEN], buf_len - CIPSO_V4_HDR_LEN); break; case CIPSO_V4_TAG_ENUM: ret_val = cipso_v4_gentag_enum(doi_def, secattr, &buf[CIPSO_V4_HDR_LEN], buf_len - CIPSO_V4_HDR_LEN); break; case CIPSO_V4_TAG_RANGE: ret_val = cipso_v4_gentag_rng(doi_def, secattr, &buf[CIPSO_V4_HDR_LEN], buf_len - CIPSO_V4_HDR_LEN); break; case CIPSO_V4_TAG_LOCAL: ret_val = cipso_v4_gentag_loc(doi_def, secattr, &buf[CIPSO_V4_HDR_LEN], buf_len - CIPSO_V4_HDR_LEN); break; default: return -EPERM; } iter++; } while (ret_val < 0 && iter < CIPSO_V4_TAG_MAXCNT && doi_def->tags[iter] != CIPSO_V4_TAG_INVALID); if (ret_val < 0) return ret_val; cipso_v4_gentag_hdr(doi_def, buf, ret_val); return CIPSO_V4_HDR_LEN + ret_val; } static int cipso_v4_get_actual_opt_len(const unsigned char *data, int len) { int iter = 0, optlen = 0; /* determining the new total option length is tricky because of * the padding necessary, the only thing i can think to do at * this point is walk the options one-by-one, skipping the * padding at the end to determine the actual option size and * from there we can determine the new total option length */ while (iter < len) { if (data[iter] == IPOPT_END) { break; } else if (data[iter] == IPOPT_NOP) { iter++; } else { iter += data[iter + 1]; optlen = iter; } } return optlen; } /** * cipso_v4_sock_setattr - Add a CIPSO option to a socket * @sk: the socket * @doi_def: the CIPSO DOI to use * @secattr: the specific security attributes of the socket * @sk_locked: true if caller holds the socket lock * * Description: * Set the CIPSO option on the given socket using the DOI definition and * security attributes passed to the function. This function requires * exclusive access to @sk, which means it either needs to be in the * process of being created or locked. Returns zero on success and negative * values on failure. * */ int cipso_v4_sock_setattr(struct sock *sk, const struct cipso_v4_doi *doi_def, const struct netlbl_lsm_secattr *secattr, bool sk_locked) { int ret_val = -EPERM; unsigned char *buf = NULL; u32 buf_len; u32 opt_len; struct ip_options_rcu *old, *opt = NULL; struct inet_sock *sk_inet; struct inet_connection_sock *sk_conn; /* In the case of sock_create_lite(), the sock->sk field is not * defined yet but it is not a problem as the only users of these * "lite" PF_INET sockets are functions which do an accept() call * afterwards so we will label the socket as part of the accept(). */ if (!sk) return 0; /* We allocate the maximum CIPSO option size here so we are probably * being a little wasteful, but it makes our life _much_ easier later * on and after all we are only talking about 40 bytes. */ buf_len = CIPSO_V4_OPT_LEN_MAX; buf = kmalloc(buf_len, GFP_ATOMIC); if (!buf) { ret_val = -ENOMEM; goto socket_setattr_failure; } ret_val = cipso_v4_genopt(buf, buf_len, doi_def, secattr); if (ret_val < 0) goto socket_setattr_failure; buf_len = ret_val; /* We can't use ip_options_get() directly because it makes a call to * ip_options_get_alloc() which allocates memory with GFP_KERNEL and * we won't always have CAP_NET_RAW even though we _always_ want to * set the IPOPT_CIPSO option. 
*/ opt_len = (buf_len + 3) & ~3; opt = kzalloc(sizeof(*opt) + opt_len, GFP_ATOMIC); if (!opt) { ret_val = -ENOMEM; goto socket_setattr_failure; } memcpy(opt->opt.__data, buf, buf_len); opt->opt.optlen = opt_len; opt->opt.cipso = sizeof(struct iphdr); kfree(buf); buf = NULL; sk_inet = inet_sk(sk); old = rcu_dereference_protected(sk_inet->inet_opt, sk_locked); if (inet_test_bit(IS_ICSK, sk)) { sk_conn = inet_csk(sk); if (old) sk_conn->icsk_ext_hdr_len -= old->opt.optlen; sk_conn->icsk_ext_hdr_len += opt->opt.optlen; sk_conn->icsk_sync_mss(sk, sk_conn->icsk_pmtu_cookie); } rcu_assign_pointer(sk_inet->inet_opt, opt); if (old) kfree_rcu(old, rcu); return 0; socket_setattr_failure: kfree(buf); kfree(opt); return ret_val; } /** * cipso_v4_req_setattr - Add a CIPSO option to a connection request socket * @req: the connection request socket * @doi_def: the CIPSO DOI to use * @secattr: the specific security attributes of the socket * * Description: * Set the CIPSO option on the given socket using the DOI definition and * security attributes passed to the function. Returns zero on success and * negative values on failure. * */ int cipso_v4_req_setattr(struct request_sock *req, const struct cipso_v4_doi *doi_def, const struct netlbl_lsm_secattr *secattr) { int ret_val = -EPERM; unsigned char *buf = NULL; u32 buf_len; u32 opt_len; struct ip_options_rcu *opt = NULL; struct inet_request_sock *req_inet; /* We allocate the maximum CIPSO option size here so we are probably * being a little wasteful, but it makes our life _much_ easier later * on and after all we are only talking about 40 bytes. */ buf_len = CIPSO_V4_OPT_LEN_MAX; buf = kmalloc(buf_len, GFP_ATOMIC); if (!buf) { ret_val = -ENOMEM; goto req_setattr_failure; } ret_val = cipso_v4_genopt(buf, buf_len, doi_def, secattr); if (ret_val < 0) goto req_setattr_failure; buf_len = ret_val; /* We can't use ip_options_get() directly because it makes a call to * ip_options_get_alloc() which allocates memory with GFP_KERNEL and * we won't always have CAP_NET_RAW even though we _always_ want to * set the IPOPT_CIPSO option. */ opt_len = (buf_len + 3) & ~3; opt = kzalloc(sizeof(*opt) + opt_len, GFP_ATOMIC); if (!opt) { ret_val = -ENOMEM; goto req_setattr_failure; } memcpy(opt->opt.__data, buf, buf_len); opt->opt.optlen = opt_len; opt->opt.cipso = sizeof(struct iphdr); kfree(buf); buf = NULL; req_inet = inet_rsk(req); opt = unrcu_pointer(xchg(&req_inet->ireq_opt, RCU_INITIALIZER(opt))); if (opt) kfree_rcu(opt, rcu); return 0; req_setattr_failure: kfree(buf); kfree(opt); return ret_val; } /** * cipso_v4_delopt - Delete the CIPSO option from a set of IP options * @opt_ptr: IP option pointer * * Description: * Deletes the CIPSO IP option from a set of IP options and makes the necessary * adjustments to the IP option structure. Returns zero on success, negative * values on failure. 
* */ static int cipso_v4_delopt(struct ip_options_rcu __rcu **opt_ptr) { struct ip_options_rcu *opt = rcu_dereference_protected(*opt_ptr, 1); int hdr_delta = 0; if (!opt || opt->opt.cipso == 0) return 0; if (opt->opt.srr || opt->opt.rr || opt->opt.ts || opt->opt.router_alert) { u8 cipso_len; u8 cipso_off; unsigned char *cipso_ptr; int optlen_new; cipso_off = opt->opt.cipso - sizeof(struct iphdr); cipso_ptr = &opt->opt.__data[cipso_off]; cipso_len = cipso_ptr[1]; if (opt->opt.srr > opt->opt.cipso) opt->opt.srr -= cipso_len; if (opt->opt.rr > opt->opt.cipso) opt->opt.rr -= cipso_len; if (opt->opt.ts > opt->opt.cipso) opt->opt.ts -= cipso_len; if (opt->opt.router_alert > opt->opt.cipso) opt->opt.router_alert -= cipso_len; opt->opt.cipso = 0; memmove(cipso_ptr, cipso_ptr + cipso_len, opt->opt.optlen - cipso_off - cipso_len); optlen_new = cipso_v4_get_actual_opt_len(opt->opt.__data, opt->opt.optlen); hdr_delta = opt->opt.optlen; opt->opt.optlen = (optlen_new + 3) & ~3; hdr_delta -= opt->opt.optlen; } else { /* only the cipso option was present on the socket so we can * remove the entire option struct */ *opt_ptr = NULL; hdr_delta = opt->opt.optlen; kfree_rcu(opt, rcu); } return hdr_delta; } /** * cipso_v4_sock_delattr - Delete the CIPSO option from a socket * @sk: the socket * * Description: * Removes the CIPSO option from a socket, if present. * */ void cipso_v4_sock_delattr(struct sock *sk) { struct inet_sock *sk_inet; int hdr_delta; sk_inet = inet_sk(sk); hdr_delta = cipso_v4_delopt(&sk_inet->inet_opt); if (inet_test_bit(IS_ICSK, sk) && hdr_delta > 0) { struct inet_connection_sock *sk_conn = inet_csk(sk); sk_conn->icsk_ext_hdr_len -= hdr_delta; sk_conn->icsk_sync_mss(sk, sk_conn->icsk_pmtu_cookie); } } /** * cipso_v4_req_delattr - Delete the CIPSO option from a request socket * @req: the request socket * * Description: * Removes the CIPSO option from a request socket, if present. * */ void cipso_v4_req_delattr(struct request_sock *req) { cipso_v4_delopt(&inet_rsk(req)->ireq_opt); } /** * cipso_v4_getattr - Helper function for the cipso_v4_*_getattr functions * @cipso: the CIPSO v4 option * @secattr: the security attributes * * Description: * Inspect @cipso and return the security attributes in @secattr. Returns zero * on success and negative values on failure. * */ int cipso_v4_getattr(const unsigned char *cipso, struct netlbl_lsm_secattr *secattr) { int ret_val = -ENOMSG; u32 doi; struct cipso_v4_doi *doi_def; if (cipso_v4_cache_check(cipso, cipso[1], secattr) == 0) return 0; doi = get_unaligned_be32(&cipso[2]); rcu_read_lock(); doi_def = cipso_v4_doi_search(doi); if (!doi_def) goto getattr_return; /* XXX - This code assumes only one tag per CIPSO option which isn't * really a good assumption to make but since we only support the MAC * tags right now it is a safe assumption. 
*/ switch (cipso[6]) { case CIPSO_V4_TAG_RBITMAP: ret_val = cipso_v4_parsetag_rbm(doi_def, &cipso[6], secattr); break; case CIPSO_V4_TAG_ENUM: ret_val = cipso_v4_parsetag_enum(doi_def, &cipso[6], secattr); break; case CIPSO_V4_TAG_RANGE: ret_val = cipso_v4_parsetag_rng(doi_def, &cipso[6], secattr); break; case CIPSO_V4_TAG_LOCAL: ret_val = cipso_v4_parsetag_loc(doi_def, &cipso[6], secattr); break; } if (ret_val == 0) secattr->type = NETLBL_NLTYPE_CIPSOV4; getattr_return: rcu_read_unlock(); return ret_val; } /** * cipso_v4_sock_getattr - Get the security attributes from a sock * @sk: the sock * @secattr: the security attributes * * Description: * Query @sk to see if there is a CIPSO option attached to the sock and if * there is return the CIPSO security attributes in @secattr. This function * requires that @sk be locked, or privately held, but it does not do any * locking itself. Returns zero on success and negative values on failure. * */ int cipso_v4_sock_getattr(struct sock *sk, struct netlbl_lsm_secattr *secattr) { struct ip_options_rcu *opt; int res = -ENOMSG; rcu_read_lock(); opt = rcu_dereference(inet_sk(sk)->inet_opt); if (opt && opt->opt.cipso) res = cipso_v4_getattr(opt->opt.__data + opt->opt.cipso - sizeof(struct iphdr), secattr); rcu_read_unlock(); return res; } /** * cipso_v4_skbuff_setattr - Set the CIPSO option on a packet * @skb: the packet * @doi_def: the DOI structure * @secattr: the security attributes * * Description: * Set the CIPSO option on the given packet based on the security attributes. * Returns a pointer to the IP header on success and NULL on failure. * */ int cipso_v4_skbuff_setattr(struct sk_buff *skb, const struct cipso_v4_doi *doi_def, const struct netlbl_lsm_secattr *secattr) { int ret_val; struct iphdr *iph; struct ip_options *opt = &IPCB(skb)->opt; unsigned char buf[CIPSO_V4_OPT_LEN_MAX]; u32 buf_len = CIPSO_V4_OPT_LEN_MAX; u32 opt_len; int len_delta; ret_val = cipso_v4_genopt(buf, buf_len, doi_def, secattr); if (ret_val < 0) return ret_val; buf_len = ret_val; opt_len = (buf_len + 3) & ~3; /* we overwrite any existing options to ensure that we have enough * room for the CIPSO option, the reason is that we _need_ to guarantee * that the security label is applied to the packet - we do the same * thing when using the socket options and it hasn't caused a problem, * if we need to we can always revisit this choice later */ len_delta = opt_len - opt->optlen; /* if we don't ensure enough headroom we could panic on the skb_push() * call below so make sure we have enough, we are also "mangling" the * packet so we should probably do a copy-on-write call anyway */ ret_val = skb_cow(skb, skb_headroom(skb) + len_delta); if (ret_val < 0) return ret_val; if (len_delta > 0) { /* we assume that the header + opt->optlen have already been * "pushed" in ip_options_build() or similar */ iph = ip_hdr(skb); skb_push(skb, len_delta); memmove((char *)iph - len_delta, iph, iph->ihl << 2); skb_reset_network_header(skb); iph = ip_hdr(skb); } else if (len_delta < 0) { iph = ip_hdr(skb); memset(iph + 1, IPOPT_NOP, opt->optlen); } else iph = ip_hdr(skb); if (opt->optlen > 0) memset(opt, 0, sizeof(*opt)); opt->optlen = opt_len; opt->cipso = sizeof(struct iphdr); opt->is_changed = 1; /* we have to do the following because we are being called from a * netfilter hook which means the packet already has had the header * fields populated and the checksum calculated - yes this means we * are doing more work than needed but we do it to keep the core * stack clean and tidy */ memcpy(iph + 1, 
buf, buf_len); if (opt_len > buf_len) memset((char *)(iph + 1) + buf_len, 0, opt_len - buf_len); if (len_delta != 0) { iph->ihl = 5 + (opt_len >> 2); iph_set_totlen(iph, skb->len); } ip_send_check(iph); return 0; } /** * cipso_v4_skbuff_delattr - Delete any CIPSO options from a packet * @skb: the packet * * Description: * Removes any and all CIPSO options from the given packet. Returns zero on * success, negative values on failure. * */ int cipso_v4_skbuff_delattr(struct sk_buff *skb) { int ret_val, cipso_len, hdr_len_actual, new_hdr_len_actual, new_hdr_len, hdr_len_delta; struct iphdr *iph; struct ip_options *opt = &IPCB(skb)->opt; unsigned char *cipso_ptr; if (opt->cipso == 0) return 0; /* since we are changing the packet we should make a copy */ ret_val = skb_cow(skb, skb_headroom(skb)); if (ret_val < 0) return ret_val; iph = ip_hdr(skb); cipso_ptr = (unsigned char *)iph + opt->cipso; cipso_len = cipso_ptr[1]; hdr_len_actual = sizeof(struct iphdr) + cipso_v4_get_actual_opt_len((unsigned char *)(iph + 1), opt->optlen); new_hdr_len_actual = hdr_len_actual - cipso_len; new_hdr_len = (new_hdr_len_actual + 3) & ~3; hdr_len_delta = (iph->ihl << 2) - new_hdr_len; /* 1. shift any options after CIPSO to the left */ memmove(cipso_ptr, cipso_ptr + cipso_len, new_hdr_len_actual - opt->cipso); /* 2. move the whole IP header to its new place */ memmove((unsigned char *)iph + hdr_len_delta, iph, new_hdr_len_actual); /* 3. adjust the skb layout */ skb_pull(skb, hdr_len_delta); skb_reset_network_header(skb); iph = ip_hdr(skb); /* 4. re-fill new padding with IPOPT_END (may now be longer) */ memset((unsigned char *)iph + new_hdr_len_actual, IPOPT_END, new_hdr_len - new_hdr_len_actual); opt->optlen -= hdr_len_delta; opt->cipso = 0; opt->is_changed = 1; if (hdr_len_delta != 0) { iph->ihl = new_hdr_len >> 2; iph_set_totlen(iph, skb->len); } ip_send_check(iph); return 0; } /* * Setup Functions */ /** * cipso_v4_init - Initialize the CIPSO module * * Description: * Initialize the CIPSO module and prepare it for use. Returns zero on success * and negative values on failure. * */ static int __init cipso_v4_init(void) { int ret_val; ret_val = cipso_v4_cache_init(); if (ret_val != 0) panic("Failed to initialize the CIPSO/IPv4 cache (%d)\n", ret_val); return 0; } subsys_initcall(cipso_v4_init); |
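/*
 * For illustration of the option layout the tag generators above produce: a
 * CIPSO option is a 6 byte header (option type 134/IPOPT_CIPSO, total option
 * length, 32-bit DOI in network byte order) followed by a single tag, e.g.
 * the restricted bitmap tag #1 (tag type, tag length, a zero alignment
 * octet, the sensitivity level, then an MSB-first category bitmap). The
 * standalone userspace sketch below mirrors that layout only for
 * demonstration; it is not part of the kernel sources and the cipso_demo_*
 * names are made up for this example.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CIPSO_DEMO_OPT_TYPE	134	/* IPOPT_CIPSO */
#define CIPSO_DEMO_HDR_LEN	6
#define CIPSO_DEMO_TAG_RBITMAP	1

/* Set category bit @cat in an MSB-first bitmap, mirroring what the kernel's
 * netlbl_bitmap_setbit() does for tag #1. */
static void cipso_demo_setbit(uint8_t *bitmap, unsigned int cat)
{
	bitmap[cat / 8] |= 0x80 >> (cat % 8);
}

/* Build "header + restricted bitmap tag" into @buf, return the length. */
static int cipso_demo_gen_rbm(uint8_t *buf, size_t buf_len, uint32_t doi,
			      uint8_t level, const unsigned int *cats,
			      size_t cat_cnt)
{
	unsigned int max_cat = 0;
	size_t bitmap_len, tag_len, i;

	for (i = 0; i < cat_cnt; i++)
		if (cats[i] > max_cat)
			max_cat = cats[i];
	bitmap_len = cat_cnt ? (max_cat / 8) + 1 : 0;
	tag_len = 4 + bitmap_len;
	if (CIPSO_DEMO_HDR_LEN + tag_len > buf_len)
		return -1;

	memset(buf, 0, CIPSO_DEMO_HDR_LEN + tag_len);
	buf[0] = CIPSO_DEMO_OPT_TYPE;
	buf[1] = CIPSO_DEMO_HDR_LEN + tag_len;	/* total option length */
	buf[2] = doi >> 24;			/* DOI, big endian */
	buf[3] = doi >> 16;
	buf[4] = doi >> 8;
	buf[5] = doi;
	buf[6] = CIPSO_DEMO_TAG_RBITMAP;	/* tag type */
	buf[7] = tag_len;			/* tag length */
	/* buf[8] stays zero (alignment octet) */
	buf[9] = level;				/* sensitivity level */
	for (i = 0; i < cat_cnt; i++)
		cipso_demo_setbit(&buf[10], cats[i]);

	return CIPSO_DEMO_HDR_LEN + tag_len;
}

int main(void)
{
	static const unsigned int cats[] = { 0, 5, 13 };
	uint8_t buf[40];
	int i, len;

	len = cipso_demo_gen_rbm(buf, sizeof(buf), 3, 2, cats, 3);
	if (len < 0)
		return 1;
	for (i = 0; i < len; i++)
		printf("%02x ", buf[i]);
	printf("\n");	/* 86 0c 00 00 00 03 01 06 00 02 84 04 */
	return 0;
}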
// SPDX-License-Identifier: GPL-2.0
#include <linux/compat.h>
#include <linux/errno.h>
#include <linux/sched.h>
#include <linux/sched/mm.h>
#include <linux/syscalls.h>
#include <linux/mm.h>
#include <linux/fs.h>
#include <linux/smp.h>
#include <linux/sem.h>
#include <linux/msg.h>
#include <linux/shm.h>
#include <linux/stat.h>
#include <linux/mman.h>
#include <linux/file.h>
#include <linux/utsname.h>
#include <linux/personality.h>
#include <linux/random.h>
#include <linux/uaccess.h>
#include <linux/elf.h>
#include <linux/hugetlb.h>

#include <asm/elf.h>
#include <asm/ia32.h>

/*
 * Align a virtual address to avoid aliasing in the I$ on AMD F15h.
 */
static unsigned long get_align_mask(struct file *filp)
{
	if (filp && is_file_hugepages(filp))
		return huge_page_mask_align(filp);
	/* handle 32- and 64-bit case with a single conditional */
	if (va_align.flags < 0 || !(va_align.flags & (2 - mmap_is_ia32())))
		return 0;

	if (!(current->flags & PF_RANDOMIZE))
		return 0;

	return va_align.mask;
}

/*
 * To avoid aliasing in the I$ on AMD F15h, the bits defined by the
 * va_align.bits, [12:upper_bit), are set to a random value instead of
 * zeroing them. This random value is computed once per boot. This form
 * of ASLR is known as "per-boot ASLR".
 *
 * To achieve this, the random value is added to the info.align_offset
 * value before calling vm_unmapped_area() or ORed directly to the
 * address.
 */
static unsigned long get_align_bits(void)
{
	return va_align.bits & get_align_mask(NULL);
}

static int __init control_va_addr_alignment(char *str)
{
	/* guard against enabling this on other CPU families */
	if (va_align.flags < 0)
		return 1;

	if (*str == 0)
		return 1;

	if (!strcmp(str, "32"))
		va_align.flags = ALIGN_VA_32;
	else if (!strcmp(str, "64"))
		va_align.flags = ALIGN_VA_64;
	else if (!strcmp(str, "off"))
		va_align.flags = 0;
	else if (!strcmp(str, "on"))
		va_align.flags = ALIGN_VA_32 | ALIGN_VA_64;
	else
		pr_warn("invalid option value: 'align_va_addr=%s'\n", str);

	return 1;
}
__setup("align_va_addr=", control_va_addr_alignment);

SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
		unsigned long, prot, unsigned long, flags,
		unsigned long, fd, unsigned long, off)
{
	if (off & ~PAGE_MASK)
		return -EINVAL;

	return ksys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
}

static void find_start_end(unsigned long addr, unsigned long flags,
		unsigned long *begin, unsigned long *end)
{
	if (!in_32bit_syscall() && (flags & MAP_32BIT)) {
		/* This is usually needed to map code in the small model, so
		   it needs to be in the first 31 bits. Limit it to that.
		   This means we need to move the unmapped base down for this
		   case.
This can give conflicts with the heap, but we assume that glibc malloc knows how to fall back to mmap. Give it 1GB of playground for now. -AK */ *begin = 0x40000000; *end = 0x80000000; if (current->flags & PF_RANDOMIZE) { *begin = randomize_page(*begin, 0x02000000); } return; } *begin = get_mmap_base(1); if (in_32bit_syscall()) *end = task_size_32bit(); else *end = task_size_64bit(addr > DEFAULT_MAP_WINDOW); } static inline unsigned long stack_guard_placement(vm_flags_t vm_flags) { if (vm_flags & VM_SHADOW_STACK) return PAGE_SIZE; return 0; } unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags, vm_flags_t vm_flags) { struct mm_struct *mm = current->mm; struct vm_area_struct *vma; struct vm_unmapped_area_info info = {}; unsigned long begin, end; if (flags & MAP_FIXED) return addr; find_start_end(addr, flags, &begin, &end); if (len > end) return -ENOMEM; if (addr) { addr = PAGE_ALIGN(addr); vma = find_vma(mm, addr); if (end - len >= addr && (!vma || addr + len <= vm_start_gap(vma))) return addr; } info.length = len; info.low_limit = begin; info.high_limit = end; if (!(filp && is_file_hugepages(filp))) { info.align_offset = pgoff << PAGE_SHIFT; info.start_gap = stack_guard_placement(vm_flags); } if (filp) { info.align_mask = get_align_mask(filp); info.align_offset += get_align_bits(); } return vm_unmapped_area(&info); } unsigned long arch_get_unmapped_area_topdown(struct file *filp, unsigned long addr0, unsigned long len, unsigned long pgoff, unsigned long flags, vm_flags_t vm_flags) { struct vm_area_struct *vma; struct mm_struct *mm = current->mm; unsigned long addr = addr0; struct vm_unmapped_area_info info = {}; /* requested length too big for entire address space */ if (len > TASK_SIZE) return -ENOMEM; /* No address checking. See comment at mmap_address_hint_valid() */ if (flags & MAP_FIXED) return addr; /* for MAP_32BIT mappings we force the legacy mmap base */ if (!in_32bit_syscall() && (flags & MAP_32BIT)) goto bottomup; /* requesting a specific address */ if (addr) { addr &= PAGE_MASK; if (!mmap_address_hint_valid(addr, len)) goto get_unmapped_area; vma = find_vma(mm, addr); if (!vma || addr + len <= vm_start_gap(vma)) return addr; } get_unmapped_area: info.flags = VM_UNMAPPED_AREA_TOPDOWN; info.length = len; if (!in_32bit_syscall() && (flags & MAP_ABOVE4G)) info.low_limit = SZ_4G; else info.low_limit = PAGE_SIZE; info.high_limit = get_mmap_base(0); if (!(filp && is_file_hugepages(filp))) { info.start_gap = stack_guard_placement(vm_flags); info.align_offset = pgoff << PAGE_SHIFT; } /* * If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area * in the full address space. * * !in_32bit_syscall() check to avoid high addresses for x32 * (and make it no op on native i386). */ if (addr > DEFAULT_MAP_WINDOW && !in_32bit_syscall()) info.high_limit += TASK_SIZE_MAX - DEFAULT_MAP_WINDOW; if (filp) { info.align_mask = get_align_mask(filp); info.align_offset += get_align_bits(); } addr = vm_unmapped_area(&info); if (!(addr & ~PAGE_MASK)) return addr; VM_BUG_ON(addr != -ENOMEM); bottomup: /* * A failed mmap() very likely causes application failure, * so fall back to the bottom-up function here. This scenario * can happen with large stack limits and large mmap() * allocations. */ return arch_get_unmapped_area(filp, addr0, len, pgoff, flags, 0); } |
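/*
 * For reference, the MAP_32BIT handling in find_start_end() above can be
 * observed from userspace: on x86-64, anonymous mappings requested with
 * MAP_32BIT normally come back inside the 1GB window starting at 0x40000000
 * (the exact address may be randomized within that window when ASLR is
 * enabled). The small test program below is an illustrative sketch only and
 * is not part of the kernel tree.
 */
#define _GNU_SOURCE	/* for MAP_32BIT */
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 1 << 20;	/* 1 MiB */
	void *lo = mmap(NULL, len, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
	void *hi = mmap(NULL, len, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (lo == MAP_FAILED || hi == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	/* The MAP_32BIT mapping lands below 2GB; the default one is usually
	 * placed much higher by the top-down path. */
	printf("MAP_32BIT mapping at %p\n", lo);
	printf("default   mapping at %p\n", hi);

	munmap(lo, len);
	munmap(hi, len);
	return 0;
}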
// SPDX-License-Identifier: GPL-2.0 /* Sysctl interface for parport devices.
* * Authors: David Campbell * Tim Waugh <tim@cyberelk.demon.co.uk> * Philip Blundell <philb@gnu.org> * Andrea Arcangeli * Riccardo Facchetti <fizban@tin.it> * * based on work by Grant Guenther <grant@torque.net> * and Philip Blundell * * Cleaned up include files - Russell King <linux@arm.uk.linux.org> */ #include <linux/string.h> #include <linux/init.h> #include <linux/module.h> #include <linux/errno.h> #include <linux/kernel.h> #include <linux/slab.h> #include <linux/parport.h> #include <linux/ctype.h> #include <linux/sysctl.h> #include <linux/device.h> #include <linux/uaccess.h> #if defined(CONFIG_SYSCTL) && defined(CONFIG_PROC_FS) #define PARPORT_MIN_TIMESLICE_VALUE 1ul #define PARPORT_MAX_TIMESLICE_VALUE ((unsigned long) HZ) #define PARPORT_MIN_SPINTIME_VALUE 1 #define PARPORT_MAX_SPINTIME_VALUE 1000 static int do_active_device(const struct ctl_table *table, int write, void *result, size_t *lenp, loff_t *ppos) { struct parport *port = (struct parport *)table->extra1; char buffer[256]; struct pardevice *dev; int len = 0; if (write) /* can't happen anyway */ return -EACCES; if (*ppos) { *lenp = 0; return 0; } for (dev = port->devices; dev ; dev = dev->next) { if(dev == port->cad) { len += scnprintf(buffer, sizeof(buffer), "%s\n", dev->name); } } if(!len) { len += scnprintf(buffer, sizeof(buffer), "%s\n", "none"); } if (len > *lenp) len = *lenp; else *lenp = len; *ppos += len; memcpy(result, buffer, len); return 0; } #ifdef CONFIG_PARPORT_1284 static int do_autoprobe(const struct ctl_table *table, int write, void *result, size_t *lenp, loff_t *ppos) { struct parport_device_info *info = table->extra2; const char *str; char buffer[256]; int len = 0; if (write) /* permissions stop this */ return -EACCES; if (*ppos) { *lenp = 0; return 0; } if ((str = info->class_name) != NULL) len += scnprintf (buffer + len, sizeof(buffer) - len, "CLASS:%s;\n", str); if ((str = info->model) != NULL) len += scnprintf (buffer + len, sizeof(buffer) - len, "MODEL:%s;\n", str); if ((str = info->mfr) != NULL) len += scnprintf (buffer + len, sizeof(buffer) - len, "MANUFACTURER:%s;\n", str); if ((str = info->description) != NULL) len += scnprintf (buffer + len, sizeof(buffer) - len, "DESCRIPTION:%s;\n", str); if ((str = info->cmdset) != NULL) len += scnprintf (buffer + len, sizeof(buffer) - len, "COMMAND SET:%s;\n", str); if (len > *lenp) len = *lenp; else *lenp = len; *ppos += len; memcpy(result, buffer, len); return 0; } #endif /* IEEE1284.3 support. 
*/ static int do_hardware_base_addr(const struct ctl_table *table, int write, void *result, size_t *lenp, loff_t *ppos) { struct parport *port = (struct parport *)table->extra1; char buffer[64]; int len = 0; if (*ppos) { *lenp = 0; return 0; } if (write) /* permissions prevent this anyway */ return -EACCES; len += scnprintf (buffer, sizeof(buffer), "%lu\t%lu\n", port->base, port->base_hi); if (len > *lenp) len = *lenp; else *lenp = len; *ppos += len; memcpy(result, buffer, len); return 0; } static int do_hardware_irq(const struct ctl_table *table, int write, void *result, size_t *lenp, loff_t *ppos) { struct parport *port = (struct parport *)table->extra1; char buffer[20]; int len = 0; if (*ppos) { *lenp = 0; return 0; } if (write) /* permissions prevent this anyway */ return -EACCES; len += scnprintf (buffer, sizeof(buffer), "%d\n", port->irq); if (len > *lenp) len = *lenp; else *lenp = len; *ppos += len; memcpy(result, buffer, len); return 0; } static int do_hardware_dma(const struct ctl_table *table, int write, void *result, size_t *lenp, loff_t *ppos) { struct parport *port = (struct parport *)table->extra1; char buffer[20]; int len = 0; if (*ppos) { *lenp = 0; return 0; } if (write) /* permissions prevent this anyway */ return -EACCES; len += scnprintf (buffer, sizeof(buffer), "%d\n", port->dma); if (len > *lenp) len = *lenp; else *lenp = len; *ppos += len; memcpy(result, buffer, len); return 0; } static int do_hardware_modes(const struct ctl_table *table, int write, void *result, size_t *lenp, loff_t *ppos) { struct parport *port = (struct parport *)table->extra1; char buffer[40]; int len = 0; if (*ppos) { *lenp = 0; return 0; } if (write) /* permissions prevent this anyway */ return -EACCES; { #define printmode(x) \ do { \ if (port->modes & PARPORT_MODE_##x) \ len += scnprintf(buffer + len, sizeof(buffer) - len, "%s%s", f++ ? 
"," : "", #x); \ } while (0) int f = 0; printmode(PCSPP); printmode(TRISTATE); printmode(COMPAT); printmode(EPP); printmode(ECP); printmode(DMA); #undef printmode } buffer[len++] = '\n'; if (len > *lenp) len = *lenp; else *lenp = len; *ppos += len; memcpy(result, buffer, len); return 0; } static const unsigned long parport_min_timeslice_value = PARPORT_MIN_TIMESLICE_VALUE; static const unsigned long parport_max_timeslice_value = PARPORT_MAX_TIMESLICE_VALUE; static const int parport_min_spintime_value = PARPORT_MIN_SPINTIME_VALUE; static const int parport_max_spintime_value = PARPORT_MAX_SPINTIME_VALUE; struct parport_sysctl_table { struct ctl_table_header *port_header; struct ctl_table_header *devices_header; #ifdef CONFIG_PARPORT_1284 struct ctl_table vars[10]; #else struct ctl_table vars[5]; #endif /* IEEE 1284 support */ struct ctl_table device_dir[1]; }; static const struct parport_sysctl_table parport_sysctl_template = { .port_header = NULL, .devices_header = NULL, { { .procname = "spintime", .data = NULL, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = (void*) &parport_min_spintime_value, .extra2 = (void*) &parport_max_spintime_value }, { .procname = "base-addr", .data = NULL, .maxlen = 0, .mode = 0444, .proc_handler = do_hardware_base_addr }, { .procname = "irq", .data = NULL, .maxlen = 0, .mode = 0444, .proc_handler = do_hardware_irq }, { .procname = "dma", .data = NULL, .maxlen = 0, .mode = 0444, .proc_handler = do_hardware_dma }, { .procname = "modes", .data = NULL, .maxlen = 0, .mode = 0444, .proc_handler = do_hardware_modes }, #ifdef CONFIG_PARPORT_1284 { .procname = "autoprobe", .data = NULL, .maxlen = 0, .mode = 0444, .proc_handler = do_autoprobe }, { .procname = "autoprobe0", .data = NULL, .maxlen = 0, .mode = 0444, .proc_handler = do_autoprobe }, { .procname = "autoprobe1", .data = NULL, .maxlen = 0, .mode = 0444, .proc_handler = do_autoprobe }, { .procname = "autoprobe2", .data = NULL, .maxlen = 0, .mode = 0444, .proc_handler = do_autoprobe }, { .procname = "autoprobe3", .data = NULL, .maxlen = 0, .mode = 0444, .proc_handler = do_autoprobe }, #endif /* IEEE 1284 support */ }, { { .procname = "active", .data = NULL, .maxlen = 0, .mode = 0444, .proc_handler = do_active_device }, }, }; struct parport_device_sysctl_table { struct ctl_table_header *sysctl_header; struct ctl_table vars[1]; struct ctl_table device_dir[1]; }; static const struct parport_device_sysctl_table parport_device_sysctl_template = { .sysctl_header = NULL, { { .procname = "timeslice", .data = NULL, .maxlen = sizeof(unsigned long), .mode = 0644, .proc_handler = proc_doulongvec_ms_jiffies_minmax, .extra1 = (void*) &parport_min_timeslice_value, .extra2 = (void*) &parport_max_timeslice_value }, }, { { .procname = NULL, .data = NULL, .maxlen = 0, .mode = 0555, }, } }; struct parport_default_sysctl_table { struct ctl_table_header *sysctl_header; struct ctl_table vars[2]; }; static struct parport_default_sysctl_table parport_default_sysctl_table = { .sysctl_header = NULL, { { .procname = "timeslice", .data = &parport_default_timeslice, .maxlen = sizeof(parport_default_timeslice), .mode = 0644, .proc_handler = proc_doulongvec_ms_jiffies_minmax, .extra1 = (void*) &parport_min_timeslice_value, .extra2 = (void*) &parport_max_timeslice_value }, { .procname = "spintime", .data = &parport_default_spintime, .maxlen = sizeof(parport_default_spintime), .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = (void*) &parport_min_spintime_value, .extra2 = (void*) 
&parport_max_spintime_value }, } }; int parport_proc_register(struct parport *port) { struct parport_sysctl_table *t; char *tmp_dir_path; int i, err = 0; t = kmemdup(&parport_sysctl_template, sizeof(*t), GFP_KERNEL); if (t == NULL) return -ENOMEM; t->device_dir[0].extra1 = port; t->vars[0].data = &port->spintime; for (i = 0; i < 5; i++) { t->vars[i].extra1 = port; #ifdef CONFIG_PARPORT_1284 t->vars[5 + i].extra2 = &port->probe_info[i]; #endif /* IEEE 1284 support */ } tmp_dir_path = kasprintf(GFP_KERNEL, "dev/parport/%s/devices", port->name); if (!tmp_dir_path) { err = -ENOMEM; goto exit_free_t; } t->devices_header = register_sysctl(tmp_dir_path, t->device_dir); if (t->devices_header == NULL) { err = -ENOENT; goto exit_free_tmp_dir_path; } kfree(tmp_dir_path); tmp_dir_path = kasprintf(GFP_KERNEL, "dev/parport/%s", port->name); if (!tmp_dir_path) { err = -ENOMEM; goto unregister_devices_h; } t->port_header = register_sysctl(tmp_dir_path, t->vars); if (t->port_header == NULL) { err = -ENOENT; goto unregister_devices_h; } port->sysctl_table = t; kfree(tmp_dir_path); return 0; unregister_devices_h: unregister_sysctl_table(t->devices_header); exit_free_tmp_dir_path: kfree(tmp_dir_path); exit_free_t: kfree(t); return err; } int parport_proc_unregister(struct parport *port) { if (port->sysctl_table) { struct parport_sysctl_table *t = port->sysctl_table; port->sysctl_table = NULL; unregister_sysctl_table(t->devices_header); unregister_sysctl_table(t->port_header); kfree(t); } return 0; } int parport_device_proc_register(struct pardevice *device) { struct parport_device_sysctl_table *t; struct parport * port = device->port; char *tmp_dir_path; int err = 0; t = kmemdup(&parport_device_sysctl_template, sizeof(*t), GFP_KERNEL); if (t == NULL) return -ENOMEM; /* Allocate a buffer for two paths: dev/parport/PORT/devices/DEVICE. */ tmp_dir_path = kasprintf(GFP_KERNEL, "dev/parport/%s/devices/%s", port->name, device->name); if (!tmp_dir_path) { err = -ENOMEM; goto exit_free_t; } t->vars[0].data = &device->timeslice; t->sysctl_header = register_sysctl(tmp_dir_path, t->vars); if (t->sysctl_header == NULL) { kfree(t); t = NULL; } device->sysctl_table = t; kfree(tmp_dir_path); return 0; exit_free_t: kfree(t); return err; } int parport_device_proc_unregister(struct pardevice *device) { if (device->sysctl_table) { struct parport_device_sysctl_table *t = device->sysctl_table; device->sysctl_table = NULL; unregister_sysctl_table(t->sysctl_header); kfree(t); } return 0; } static int __init parport_default_proc_register(void) { int ret; parport_default_sysctl_table.sysctl_header = register_sysctl("dev/parport/default", parport_default_sysctl_table.vars); if (!parport_default_sysctl_table.sysctl_header) return -ENOMEM; ret = parport_bus_init(); if (ret) { unregister_sysctl_table(parport_default_sysctl_table. sysctl_header); return ret; } return 0; } static void __exit parport_default_proc_unregister(void) { if (parport_default_sysctl_table.sysctl_header) { unregister_sysctl_table(parport_default_sysctl_table. 
sysctl_header); parport_default_sysctl_table.sysctl_header = NULL; } parport_bus_exit(); } #else /* no sysctl or no procfs*/ int parport_proc_register(struct parport *pp) { return 0; } int parport_proc_unregister(struct parport *pp) { return 0; } int parport_device_proc_register(struct pardevice *device) { return 0; } int parport_device_proc_unregister(struct pardevice *device) { return 0; } static int __init parport_default_proc_register (void) { return parport_bus_init(); } static void __exit parport_default_proc_unregister (void) { parport_bus_exit(); } #endif subsys_initcall(parport_default_proc_register) module_exit(parport_default_proc_unregister) |
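The sysctl tables registered above surface under /proc/sys/dev/parport/ when CONFIG_SYSCTL and CONFIG_PROC_FS are enabled. The following is a small userspace sketch, not part of the driver; the parport0 directory name is an assumption and exists only once a port has been registered via parport_proc_register().

/* Userspace sketch: read a few of the nodes created by this file. */
#include <stdio.h>

static void show(const char *path)
{
	char buf[128];
	FILE *f = fopen(path, "r");

	if (!f) {
		perror(path);
		return;
	}
	if (fgets(buf, sizeof(buf), f))
		printf("%-45s %s", path, buf);
	fclose(f);
}

int main(void)
{
	show("/proc/sys/dev/parport/default/timeslice");
	show("/proc/sys/dev/parport/default/spintime");
	show("/proc/sys/dev/parport/parport0/irq");
	show("/proc/sys/dev/parport/parport0/dma");
	show("/proc/sys/dev/parport/parport0/modes");
	return 0;
}

The timeslice and spintime entries are writable (mode 0644) within the PARPORT_MIN/MAX bounds defined at the top of the file; the hardware nodes are read-only, and their handlers reject writes with -EACCES.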
// SPDX-License-Identifier: GPL-2.0-only /* Copyright (C) 2009 Red Hat, Inc. * Copyright (C) 2006 Rusty Russell IBM Corporation * * Author: Michael S. Tsirkin <mst@redhat.com> * * Inspiration, some code, and most witty comments come from * Documentation/virtual/lguest/lguest.c, by Rusty Russell * * Generic code for virtio server in host kernel. */ #include <linux/eventfd.h> #include <linux/vhost.h> #include <linux/uio.h> #include <linux/mm.h> #include <linux/miscdevice.h> #include <linux/mutex.h> #include <linux/poll.h> #include <linux/file.h> #include <linux/highmem.h> #include <linux/slab.h> #include <linux/vmalloc.h> #include <linux/kthread.h> #include <linux/cgroup.h> #include <linux/module.h> #include <linux/sort.h> #include <linux/sched/mm.h> #include <linux/sched/signal.h> #include <linux/sched/vhost_task.h> #include <linux/interval_tree_generic.h> #include <linux/nospec.h> #include <linux/kcov.h> #include "vhost.h" static ushort max_mem_regions = 64; module_param(max_mem_regions, ushort, 0444); MODULE_PARM_DESC(max_mem_regions, "Maximum number of memory regions in memory map. (default: 64)"); static int max_iotlb_entries = 2048; module_param(max_iotlb_entries, int, 0444); MODULE_PARM_DESC(max_iotlb_entries, "Maximum number of iotlb entries.
(default: 2048)"); static bool fork_from_owner_default = VHOST_FORK_OWNER_TASK; #ifdef CONFIG_VHOST_ENABLE_FORK_OWNER_CONTROL module_param(fork_from_owner_default, bool, 0444); MODULE_PARM_DESC(fork_from_owner_default, "Set task mode as the default(default: Y)"); #endif enum { VHOST_MEMORY_F_LOG = 0x1, }; #define vhost_used_event(vq) ((__virtio16 __user *)&vq->avail->ring[vq->num]) #define vhost_avail_event(vq) ((__virtio16 __user *)&vq->used->ring[vq->num]) #ifdef CONFIG_VHOST_CROSS_ENDIAN_LEGACY static void vhost_disable_cross_endian(struct vhost_virtqueue *vq) { vq->user_be = !virtio_legacy_is_little_endian(); } static void vhost_enable_cross_endian_big(struct vhost_virtqueue *vq) { vq->user_be = true; } static void vhost_enable_cross_endian_little(struct vhost_virtqueue *vq) { vq->user_be = false; } static long vhost_set_vring_endian(struct vhost_virtqueue *vq, int __user *argp) { struct vhost_vring_state s; if (vq->private_data) return -EBUSY; if (copy_from_user(&s, argp, sizeof(s))) return -EFAULT; if (s.num != VHOST_VRING_LITTLE_ENDIAN && s.num != VHOST_VRING_BIG_ENDIAN) return -EINVAL; if (s.num == VHOST_VRING_BIG_ENDIAN) vhost_enable_cross_endian_big(vq); else vhost_enable_cross_endian_little(vq); return 0; } static long vhost_get_vring_endian(struct vhost_virtqueue *vq, u32 idx, int __user *argp) { struct vhost_vring_state s = { .index = idx, .num = vq->user_be }; if (copy_to_user(argp, &s, sizeof(s))) return -EFAULT; return 0; } static void vhost_init_is_le(struct vhost_virtqueue *vq) { /* Note for legacy virtio: user_be is initialized at reset time * according to the host endianness. If userspace does not set an * explicit endianness, the default behavior is native endian, as * expected by legacy virtio. */ vq->is_le = vhost_has_feature(vq, VIRTIO_F_VERSION_1) || !vq->user_be; } #else static void vhost_disable_cross_endian(struct vhost_virtqueue *vq) { } static long vhost_set_vring_endian(struct vhost_virtqueue *vq, int __user *argp) { return -ENOIOCTLCMD; } static long vhost_get_vring_endian(struct vhost_virtqueue *vq, u32 idx, int __user *argp) { return -ENOIOCTLCMD; } static void vhost_init_is_le(struct vhost_virtqueue *vq) { vq->is_le = vhost_has_feature(vq, VIRTIO_F_VERSION_1) || virtio_legacy_is_little_endian(); } #endif /* CONFIG_VHOST_CROSS_ENDIAN_LEGACY */ static void vhost_reset_is_le(struct vhost_virtqueue *vq) { vhost_init_is_le(vq); } struct vhost_flush_struct { struct vhost_work work; struct completion wait_event; }; static void vhost_flush_work(struct vhost_work *work) { struct vhost_flush_struct *s; s = container_of(work, struct vhost_flush_struct, work); complete(&s->wait_event); } static void vhost_poll_func(struct file *file, wait_queue_head_t *wqh, poll_table *pt) { struct vhost_poll *poll; poll = container_of(pt, struct vhost_poll, table); poll->wqh = wqh; add_wait_queue(wqh, &poll->wait); } static int vhost_poll_wakeup(wait_queue_entry_t *wait, unsigned mode, int sync, void *key) { struct vhost_poll *poll = container_of(wait, struct vhost_poll, wait); struct vhost_work *work = &poll->work; if (!(key_to_poll(key) & poll->mask)) return 0; if (!poll->dev->use_worker) work->fn(work); else vhost_poll_queue(poll); return 0; } void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn) { clear_bit(VHOST_WORK_QUEUED, &work->flags); work->fn = fn; } EXPORT_SYMBOL_GPL(vhost_work_init); /* Init poll structure */ void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn, __poll_t mask, struct vhost_dev *dev, struct vhost_virtqueue *vq) { 
init_waitqueue_func_entry(&poll->wait, vhost_poll_wakeup); init_poll_funcptr(&poll->table, vhost_poll_func); poll->mask = mask; poll->dev = dev; poll->wqh = NULL; poll->vq = vq; vhost_work_init(&poll->work, fn); } EXPORT_SYMBOL_GPL(vhost_poll_init); /* Start polling a file. We add ourselves to file's wait queue. The caller must * keep a reference to a file until after vhost_poll_stop is called. */ int vhost_poll_start(struct vhost_poll *poll, struct file *file) { __poll_t mask; if (poll->wqh) return 0; mask = vfs_poll(file, &poll->table); if (mask) vhost_poll_wakeup(&poll->wait, 0, 0, poll_to_key(mask)); if (mask & EPOLLERR) { vhost_poll_stop(poll); return -EINVAL; } return 0; } EXPORT_SYMBOL_GPL(vhost_poll_start); /* Stop polling a file. After this function returns, it becomes safe to drop the * file reference. You must also flush afterwards. */ void vhost_poll_stop(struct vhost_poll *poll) { if (poll->wqh) { remove_wait_queue(poll->wqh, &poll->wait); poll->wqh = NULL; } } EXPORT_SYMBOL_GPL(vhost_poll_stop); static void vhost_worker_queue(struct vhost_worker *worker, struct vhost_work *work) { if (!test_and_set_bit(VHOST_WORK_QUEUED, &work->flags)) { /* We can only add the work to the list after we're * sure it was not in the list. * test_and_set_bit() implies a memory barrier. */ llist_add(&work->node, &worker->work_list); worker->ops->wakeup(worker); } } bool vhost_vq_work_queue(struct vhost_virtqueue *vq, struct vhost_work *work) { struct vhost_worker *worker; bool queued = false; rcu_read_lock(); worker = rcu_dereference(vq->worker); if (worker) { queued = true; vhost_worker_queue(worker, work); } rcu_read_unlock(); return queued; } EXPORT_SYMBOL_GPL(vhost_vq_work_queue); /** * __vhost_worker_flush - flush a worker * @worker: worker to flush * * The worker's flush_mutex must be held. */ static void __vhost_worker_flush(struct vhost_worker *worker) { struct vhost_flush_struct flush; if (!worker->attachment_cnt || worker->killed) return; init_completion(&flush.wait_event); vhost_work_init(&flush.work, vhost_flush_work); vhost_worker_queue(worker, &flush.work); /* * Drop mutex in case our worker is killed and it needs to take the * mutex to force cleanup. 
*/ mutex_unlock(&worker->mutex); wait_for_completion(&flush.wait_event); mutex_lock(&worker->mutex); } static void vhost_worker_flush(struct vhost_worker *worker) { mutex_lock(&worker->mutex); __vhost_worker_flush(worker); mutex_unlock(&worker->mutex); } void vhost_dev_flush(struct vhost_dev *dev) { struct vhost_worker *worker; unsigned long i; xa_for_each(&dev->worker_xa, i, worker) vhost_worker_flush(worker); } EXPORT_SYMBOL_GPL(vhost_dev_flush); /* A lockless hint for busy polling code to exit the loop */ bool vhost_vq_has_work(struct vhost_virtqueue *vq) { struct vhost_worker *worker; bool has_work = false; rcu_read_lock(); worker = rcu_dereference(vq->worker); if (worker && !llist_empty(&worker->work_list)) has_work = true; rcu_read_unlock(); return has_work; } EXPORT_SYMBOL_GPL(vhost_vq_has_work); void vhost_poll_queue(struct vhost_poll *poll) { vhost_vq_work_queue(poll->vq, &poll->work); } EXPORT_SYMBOL_GPL(vhost_poll_queue); static void __vhost_vq_meta_reset(struct vhost_virtqueue *vq) { int j; for (j = 0; j < VHOST_NUM_ADDRS; j++) vq->meta_iotlb[j] = NULL; } static void vhost_vq_meta_reset(struct vhost_dev *d) { int i; for (i = 0; i < d->nvqs; ++i) __vhost_vq_meta_reset(d->vqs[i]); } static void vhost_vring_call_reset(struct vhost_vring_call *call_ctx) { call_ctx->ctx = NULL; memset(&call_ctx->producer, 0x0, sizeof(struct irq_bypass_producer)); } bool vhost_vq_is_setup(struct vhost_virtqueue *vq) { return vq->avail && vq->desc && vq->used && vhost_vq_access_ok(vq); } EXPORT_SYMBOL_GPL(vhost_vq_is_setup); static void vhost_vq_reset(struct vhost_dev *dev, struct vhost_virtqueue *vq) { vq->num = 1; vq->desc = NULL; vq->avail = NULL; vq->used = NULL; vq->last_avail_idx = 0; vq->next_avail_head = 0; vq->avail_idx = 0; vq->last_used_idx = 0; vq->signalled_used = 0; vq->signalled_used_valid = false; vq->used_flags = 0; vq->log_used = false; vq->log_addr = -1ull; vq->private_data = NULL; virtio_features_zero(vq->acked_features_array); vq->acked_backend_features = 0; vq->log_base = NULL; vq->error_ctx = NULL; vq->kick = NULL; vq->log_ctx = NULL; vhost_disable_cross_endian(vq); vhost_reset_is_le(vq); vq->busyloop_timeout = 0; vq->umem = NULL; vq->iotlb = NULL; rcu_assign_pointer(vq->worker, NULL); vhost_vring_call_reset(&vq->call_ctx); __vhost_vq_meta_reset(vq); } static int vhost_run_work_kthread_list(void *data) { struct vhost_worker *worker = data; struct vhost_work *work, *work_next; struct vhost_dev *dev = worker->dev; struct llist_node *node; kthread_use_mm(dev->mm); for (;;) { /* mb paired w/ kthread_stop */ set_current_state(TASK_INTERRUPTIBLE); if (kthread_should_stop()) { __set_current_state(TASK_RUNNING); break; } node = llist_del_all(&worker->work_list); if (!node) schedule(); node = llist_reverse_order(node); /* make sure flag is seen after deletion */ smp_wmb(); llist_for_each_entry_safe(work, work_next, node, node) { clear_bit(VHOST_WORK_QUEUED, &work->flags); __set_current_state(TASK_RUNNING); kcov_remote_start_common(worker->kcov_handle); work->fn(work); kcov_remote_stop(); cond_resched(); } } kthread_unuse_mm(dev->mm); return 0; } static bool vhost_run_work_list(void *data) { struct vhost_worker *worker = data; struct vhost_work *work, *work_next; struct llist_node *node; node = llist_del_all(&worker->work_list); if (node) { __set_current_state(TASK_RUNNING); node = llist_reverse_order(node); /* make sure flag is seen after deletion */ smp_wmb(); llist_for_each_entry_safe(work, work_next, node, node) { clear_bit(VHOST_WORK_QUEUED, &work->flags); 
kcov_remote_start_common(worker->kcov_handle); work->fn(work); kcov_remote_stop(); cond_resched(); } } return !!node; } static void vhost_worker_killed(void *data) { struct vhost_worker *worker = data; struct vhost_dev *dev = worker->dev; struct vhost_virtqueue *vq; int i, attach_cnt = 0; mutex_lock(&worker->mutex); worker->killed = true; for (i = 0; i < dev->nvqs; i++) { vq = dev->vqs[i]; mutex_lock(&vq->mutex); if (worker == rcu_dereference_check(vq->worker, lockdep_is_held(&vq->mutex))) { rcu_assign_pointer(vq->worker, NULL); attach_cnt++; } mutex_unlock(&vq->mutex); } worker->attachment_cnt -= attach_cnt; if (attach_cnt) synchronize_rcu(); /* * Finish vhost_worker_flush calls and any other works that snuck in * before the synchronize_rcu. */ vhost_run_work_list(worker); mutex_unlock(&worker->mutex); } static void vhost_vq_free_iovecs(struct vhost_virtqueue *vq) { kfree(vq->indirect); vq->indirect = NULL; kfree(vq->log); vq->log = NULL; kfree(vq->heads); vq->heads = NULL; kfree(vq->nheads); vq->nheads = NULL; } /* Helper to allocate iovec buffers for all vqs. */ static long vhost_dev_alloc_iovecs(struct vhost_dev *dev) { struct vhost_virtqueue *vq; int i; for (i = 0; i < dev->nvqs; ++i) { vq = dev->vqs[i]; vq->indirect = kmalloc_array(UIO_MAXIOV, sizeof(*vq->indirect), GFP_KERNEL); vq->log = kmalloc_array(dev->iov_limit, sizeof(*vq->log), GFP_KERNEL); vq->heads = kmalloc_array(dev->iov_limit, sizeof(*vq->heads), GFP_KERNEL); vq->nheads = kmalloc_array(dev->iov_limit, sizeof(*vq->nheads), GFP_KERNEL); if (!vq->indirect || !vq->log || !vq->heads || !vq->nheads) goto err_nomem; } return 0; err_nomem: for (; i >= 0; --i) vhost_vq_free_iovecs(dev->vqs[i]); return -ENOMEM; } static void vhost_dev_free_iovecs(struct vhost_dev *dev) { int i; for (i = 0; i < dev->nvqs; ++i) vhost_vq_free_iovecs(dev->vqs[i]); } bool vhost_exceeds_weight(struct vhost_virtqueue *vq, int pkts, int total_len) { struct vhost_dev *dev = vq->dev; if ((dev->byte_weight && total_len >= dev->byte_weight) || pkts >= dev->weight) { vhost_poll_queue(&vq->poll); return true; } return false; } EXPORT_SYMBOL_GPL(vhost_exceeds_weight); static size_t vhost_get_avail_size(struct vhost_virtqueue *vq, unsigned int num) { size_t event __maybe_unused = vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0; return size_add(struct_size(vq->avail, ring, num), event); } static size_t vhost_get_used_size(struct vhost_virtqueue *vq, unsigned int num) { size_t event __maybe_unused = vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX) ? 
2 : 0; return size_add(struct_size(vq->used, ring, num), event); } static size_t vhost_get_desc_size(struct vhost_virtqueue *vq, unsigned int num) { return sizeof(*vq->desc) * num; } void vhost_dev_init(struct vhost_dev *dev, struct vhost_virtqueue **vqs, int nvqs, int iov_limit, int weight, int byte_weight, bool use_worker, int (*msg_handler)(struct vhost_dev *dev, u32 asid, struct vhost_iotlb_msg *msg)) { struct vhost_virtqueue *vq; int i; dev->vqs = vqs; dev->nvqs = nvqs; mutex_init(&dev->mutex); dev->log_ctx = NULL; dev->umem = NULL; dev->iotlb = NULL; dev->mm = NULL; dev->iov_limit = iov_limit; dev->weight = weight; dev->byte_weight = byte_weight; dev->use_worker = use_worker; dev->msg_handler = msg_handler; dev->fork_owner = fork_from_owner_default; init_waitqueue_head(&dev->wait); INIT_LIST_HEAD(&dev->read_list); INIT_LIST_HEAD(&dev->pending_list); spin_lock_init(&dev->iotlb_lock); xa_init_flags(&dev->worker_xa, XA_FLAGS_ALLOC); for (i = 0; i < dev->nvqs; ++i) { vq = dev->vqs[i]; vq->log = NULL; vq->indirect = NULL; vq->heads = NULL; vq->nheads = NULL; vq->dev = dev; mutex_init(&vq->mutex); vhost_vq_reset(dev, vq); if (vq->handle_kick) vhost_poll_init(&vq->poll, vq->handle_kick, EPOLLIN, dev, vq); } } EXPORT_SYMBOL_GPL(vhost_dev_init); /* Caller should have device mutex */ long vhost_dev_check_owner(struct vhost_dev *dev) { /* Are you the owner? If not, I don't think you mean to do that */ return dev->mm == current->mm ? 0 : -EPERM; } EXPORT_SYMBOL_GPL(vhost_dev_check_owner); struct vhost_attach_cgroups_struct { struct vhost_work work; struct task_struct *owner; int ret; }; static void vhost_attach_cgroups_work(struct vhost_work *work) { struct vhost_attach_cgroups_struct *s; s = container_of(work, struct vhost_attach_cgroups_struct, work); s->ret = cgroup_attach_task_all(s->owner, current); } static int vhost_attach_task_to_cgroups(struct vhost_worker *worker) { struct vhost_attach_cgroups_struct attach; int saved_cnt; attach.owner = current; vhost_work_init(&attach.work, vhost_attach_cgroups_work); vhost_worker_queue(worker, &attach.work); mutex_lock(&worker->mutex); /* * Bypass attachment_cnt check in __vhost_worker_flush: * Temporarily change it to INT_MAX to bypass the check */ saved_cnt = worker->attachment_cnt; worker->attachment_cnt = INT_MAX; __vhost_worker_flush(worker); worker->attachment_cnt = saved_cnt; mutex_unlock(&worker->mutex); return attach.ret; } /* Caller should have device mutex */ bool vhost_dev_has_owner(struct vhost_dev *dev) { return dev->mm; } EXPORT_SYMBOL_GPL(vhost_dev_has_owner); static void vhost_attach_mm(struct vhost_dev *dev) { /* No owner, become one */ if (dev->use_worker) { dev->mm = get_task_mm(current); } else { /* vDPA device does not use worker thread, so there's * no need to hold the address space for mm. This helps * to avoid deadlock in the case of mmap() which may * hold the refcnt of the file and depends on release * method to remove vma. 
*/ dev->mm = current->mm; mmgrab(dev->mm); } } static void vhost_detach_mm(struct vhost_dev *dev) { if (!dev->mm) return; if (dev->use_worker) mmput(dev->mm); else mmdrop(dev->mm); dev->mm = NULL; } static void vhost_worker_destroy(struct vhost_dev *dev, struct vhost_worker *worker) { if (!worker) return; WARN_ON(!llist_empty(&worker->work_list)); xa_erase(&dev->worker_xa, worker->id); worker->ops->stop(worker); kfree(worker); } static void vhost_workers_free(struct vhost_dev *dev) { struct vhost_worker *worker; unsigned long i; if (!dev->use_worker) return; for (i = 0; i < dev->nvqs; i++) rcu_assign_pointer(dev->vqs[i]->worker, NULL); /* * Free the default worker we created and cleanup workers userspace * created but couldn't clean up (it forgot or crashed). */ xa_for_each(&dev->worker_xa, i, worker) vhost_worker_destroy(dev, worker); xa_destroy(&dev->worker_xa); } static void vhost_task_wakeup(struct vhost_worker *worker) { return vhost_task_wake(worker->vtsk); } static void vhost_kthread_wakeup(struct vhost_worker *worker) { wake_up_process(worker->kthread_task); } static void vhost_task_do_stop(struct vhost_worker *worker) { return vhost_task_stop(worker->vtsk); } static void vhost_kthread_do_stop(struct vhost_worker *worker) { kthread_stop(worker->kthread_task); } static int vhost_task_worker_create(struct vhost_worker *worker, struct vhost_dev *dev, const char *name) { struct vhost_task *vtsk; u32 id; int ret; vtsk = vhost_task_create(vhost_run_work_list, vhost_worker_killed, worker, name); if (IS_ERR(vtsk)) return PTR_ERR(vtsk); worker->vtsk = vtsk; vhost_task_start(vtsk); ret = xa_alloc(&dev->worker_xa, &id, worker, xa_limit_32b, GFP_KERNEL); if (ret < 0) { vhost_task_do_stop(worker); return ret; } worker->id = id; return 0; } static int vhost_kthread_worker_create(struct vhost_worker *worker, struct vhost_dev *dev, const char *name) { struct task_struct *task; u32 id; int ret; task = kthread_create(vhost_run_work_kthread_list, worker, "%s", name); if (IS_ERR(task)) return PTR_ERR(task); worker->kthread_task = task; wake_up_process(task); ret = xa_alloc(&dev->worker_xa, &id, worker, xa_limit_32b, GFP_KERNEL); if (ret < 0) goto stop_worker; ret = vhost_attach_task_to_cgroups(worker); if (ret) goto stop_worker; worker->id = id; return 0; stop_worker: vhost_kthread_do_stop(worker); return ret; } static const struct vhost_worker_ops kthread_ops = { .create = vhost_kthread_worker_create, .stop = vhost_kthread_do_stop, .wakeup = vhost_kthread_wakeup, }; static const struct vhost_worker_ops vhost_task_ops = { .create = vhost_task_worker_create, .stop = vhost_task_do_stop, .wakeup = vhost_task_wakeup, }; static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev) { struct vhost_worker *worker; char name[TASK_COMM_LEN]; int ret; const struct vhost_worker_ops *ops = dev->fork_owner ? 
&vhost_task_ops : &kthread_ops; worker = kzalloc(sizeof(*worker), GFP_KERNEL_ACCOUNT); if (!worker) return NULL; worker->dev = dev; worker->ops = ops; snprintf(name, sizeof(name), "vhost-%d", current->pid); mutex_init(&worker->mutex); init_llist_head(&worker->work_list); worker->kcov_handle = kcov_common_handle(); ret = ops->create(worker, dev, name); if (ret < 0) goto free_worker; return worker; free_worker: kfree(worker); return NULL; } /* Caller must have device mutex */ static void __vhost_vq_attach_worker(struct vhost_virtqueue *vq, struct vhost_worker *worker) { struct vhost_worker *old_worker; mutex_lock(&worker->mutex); if (worker->killed) { mutex_unlock(&worker->mutex); return; } mutex_lock(&vq->mutex); old_worker = rcu_dereference_check(vq->worker, lockdep_is_held(&vq->mutex)); rcu_assign_pointer(vq->worker, worker); worker->attachment_cnt++; if (!old_worker) { mutex_unlock(&vq->mutex); mutex_unlock(&worker->mutex); return; } mutex_unlock(&vq->mutex); mutex_unlock(&worker->mutex); /* * Take the worker mutex to make sure we see the work queued from * device wide flushes which doesn't use RCU for execution. */ mutex_lock(&old_worker->mutex); if (old_worker->killed) { mutex_unlock(&old_worker->mutex); return; } /* * We don't want to call synchronize_rcu for every vq during setup * because it will slow down VM startup. If we haven't done * VHOST_SET_VRING_KICK and not done the driver specific * SET_ENDPOINT/RUNNING then we can skip the sync since there will * not be any works queued for scsi and net. */ mutex_lock(&vq->mutex); if (!vhost_vq_get_backend(vq) && !vq->kick) { mutex_unlock(&vq->mutex); old_worker->attachment_cnt--; mutex_unlock(&old_worker->mutex); /* * vsock can queue anytime after VHOST_VSOCK_SET_GUEST_CID. * Warn if it adds support for multiple workers but forgets to * handle the early queueing case. */ WARN_ON(!old_worker->attachment_cnt && !llist_empty(&old_worker->work_list)); return; } mutex_unlock(&vq->mutex); /* Make sure new vq queue/flush/poll calls see the new worker */ synchronize_rcu(); /* Make sure whatever was queued gets run */ __vhost_worker_flush(old_worker); old_worker->attachment_cnt--; mutex_unlock(&old_worker->mutex); } /* Caller must have device mutex */ static int vhost_vq_attach_worker(struct vhost_virtqueue *vq, struct vhost_vring_worker *info) { unsigned long index = info->worker_id; struct vhost_dev *dev = vq->dev; struct vhost_worker *worker; if (!dev->use_worker) return -EINVAL; worker = xa_find(&dev->worker_xa, &index, UINT_MAX, XA_PRESENT); if (!worker || worker->id != info->worker_id) return -ENODEV; __vhost_vq_attach_worker(vq, worker); return 0; } /* Caller must have device mutex */ static int vhost_new_worker(struct vhost_dev *dev, struct vhost_worker_state *info) { struct vhost_worker *worker; worker = vhost_worker_create(dev); if (!worker) return -ENOMEM; info->worker_id = worker->id; return 0; } /* Caller must have device mutex */ static int vhost_free_worker(struct vhost_dev *dev, struct vhost_worker_state *info) { unsigned long index = info->worker_id; struct vhost_worker *worker; worker = xa_find(&dev->worker_xa, &index, UINT_MAX, XA_PRESENT); if (!worker || worker->id != info->worker_id) return -ENODEV; mutex_lock(&worker->mutex); if (worker->attachment_cnt || worker->killed) { mutex_unlock(&worker->mutex); return -EBUSY; } /* * A flush might have raced and snuck in before attachment_cnt was set * to zero. Make sure flushes are flushed from the queue before * freeing. 
*/ __vhost_worker_flush(worker); mutex_unlock(&worker->mutex); vhost_worker_destroy(dev, worker); return 0; } static int vhost_get_vq_from_user(struct vhost_dev *dev, void __user *argp, struct vhost_virtqueue **vq, u32 *id) { u32 __user *idxp = argp; u32 idx; long r; r = get_user(idx, idxp); if (r < 0) return r; if (idx >= dev->nvqs) return -ENOBUFS; idx = array_index_nospec(idx, dev->nvqs); *vq = dev->vqs[idx]; *id = idx; return 0; } /* Caller must have device mutex */ long vhost_worker_ioctl(struct vhost_dev *dev, unsigned int ioctl, void __user *argp) { struct vhost_vring_worker ring_worker; struct vhost_worker_state state; struct vhost_worker *worker; struct vhost_virtqueue *vq; long ret; u32 idx; if (!dev->use_worker) return -EINVAL; if (!vhost_dev_has_owner(dev)) return -EINVAL; ret = vhost_dev_check_owner(dev); if (ret) return ret; switch (ioctl) { /* dev worker ioctls */ case VHOST_NEW_WORKER: /* * vhost_tasks will account for worker threads under the parent's * NPROC value but kthreads do not. To avoid userspace overflowing * the system with worker threads fork_owner must be true. */ if (!dev->fork_owner) return -EFAULT; ret = vhost_new_worker(dev, &state); if (!ret && copy_to_user(argp, &state, sizeof(state))) ret = -EFAULT; return ret; case VHOST_FREE_WORKER: if (copy_from_user(&state, argp, sizeof(state))) return -EFAULT; return vhost_free_worker(dev, &state); /* vring worker ioctls */ case VHOST_ATTACH_VRING_WORKER: case VHOST_GET_VRING_WORKER: break; default: return -ENOIOCTLCMD; } ret = vhost_get_vq_from_user(dev, argp, &vq, &idx); if (ret) return ret; switch (ioctl) { case VHOST_ATTACH_VRING_WORKER: if (copy_from_user(&ring_worker, argp, sizeof(ring_worker))) { ret = -EFAULT; break; } ret = vhost_vq_attach_worker(vq, &ring_worker); break; case VHOST_GET_VRING_WORKER: worker = rcu_dereference_check(vq->worker, lockdep_is_held(&dev->mutex)); if (!worker) { ret = -EINVAL; break; } ring_worker.index = idx; ring_worker.worker_id = worker->id; if (copy_to_user(argp, &ring_worker, sizeof(ring_worker))) ret = -EFAULT; break; default: ret = -ENOIOCTLCMD; break; } return ret; } EXPORT_SYMBOL_GPL(vhost_worker_ioctl); /* Caller should have device mutex */ long vhost_dev_set_owner(struct vhost_dev *dev) { struct vhost_worker *worker; int err, i; /* Is there an owner already? */ if (vhost_dev_has_owner(dev)) { err = -EBUSY; goto err_mm; } vhost_attach_mm(dev); err = vhost_dev_alloc_iovecs(dev); if (err) goto err_iovecs; if (dev->use_worker) { /* * This should be done last, because vsock can queue work * before VHOST_SET_OWNER so it simplifies the failure path * below since we don't have to worry about vsock queueing * while we free the worker. 
*/ worker = vhost_worker_create(dev); if (!worker) { err = -ENOMEM; goto err_worker; } for (i = 0; i < dev->nvqs; i++) __vhost_vq_attach_worker(dev->vqs[i], worker); } return 0; err_worker: vhost_dev_free_iovecs(dev); err_iovecs: vhost_detach_mm(dev); err_mm: return err; } EXPORT_SYMBOL_GPL(vhost_dev_set_owner); static struct vhost_iotlb *iotlb_alloc(void) { return vhost_iotlb_alloc(max_iotlb_entries, VHOST_IOTLB_FLAG_RETIRE); } struct vhost_iotlb *vhost_dev_reset_owner_prepare(void) { return iotlb_alloc(); } EXPORT_SYMBOL_GPL(vhost_dev_reset_owner_prepare); /* Caller should have device mutex */ void vhost_dev_reset_owner(struct vhost_dev *dev, struct vhost_iotlb *umem) { int i; vhost_dev_cleanup(dev); dev->fork_owner = fork_from_owner_default; dev->umem = umem; /* We don't need VQ locks below since vhost_dev_cleanup makes sure * VQs aren't running. */ for (i = 0; i < dev->nvqs; ++i) dev->vqs[i]->umem = umem; } EXPORT_SYMBOL_GPL(vhost_dev_reset_owner); void vhost_dev_stop(struct vhost_dev *dev) { int i; for (i = 0; i < dev->nvqs; ++i) { if (dev->vqs[i]->kick && dev->vqs[i]->handle_kick) vhost_poll_stop(&dev->vqs[i]->poll); } vhost_dev_flush(dev); } EXPORT_SYMBOL_GPL(vhost_dev_stop); void vhost_clear_msg(struct vhost_dev *dev) { struct vhost_msg_node *node, *n; spin_lock(&dev->iotlb_lock); list_for_each_entry_safe(node, n, &dev->read_list, node) { list_del(&node->node); kfree(node); } list_for_each_entry_safe(node, n, &dev->pending_list, node) { list_del(&node->node); kfree(node); } spin_unlock(&dev->iotlb_lock); } EXPORT_SYMBOL_GPL(vhost_clear_msg); void vhost_dev_cleanup(struct vhost_dev *dev) { int i; for (i = 0; i < dev->nvqs; ++i) { if (dev->vqs[i]->error_ctx) eventfd_ctx_put(dev->vqs[i]->error_ctx); if (dev->vqs[i]->kick) fput(dev->vqs[i]->kick); if (dev->vqs[i]->call_ctx.ctx) eventfd_ctx_put(dev->vqs[i]->call_ctx.ctx); vhost_vq_reset(dev, dev->vqs[i]); } vhost_dev_free_iovecs(dev); if (dev->log_ctx) eventfd_ctx_put(dev->log_ctx); dev->log_ctx = NULL; /* No one will access memory at this point */ vhost_iotlb_free(dev->umem); dev->umem = NULL; vhost_iotlb_free(dev->iotlb); dev->iotlb = NULL; vhost_clear_msg(dev); wake_up_interruptible_poll(&dev->wait, EPOLLIN | EPOLLRDNORM); vhost_workers_free(dev); vhost_detach_mm(dev); } EXPORT_SYMBOL_GPL(vhost_dev_cleanup); static bool log_access_ok(void __user *log_base, u64 addr, unsigned long sz) { u64 a = addr / VHOST_PAGE_SIZE / 8; /* Make sure 64 bit math will not overflow. */ if (a > ULONG_MAX - (unsigned long)log_base || a + (unsigned long)log_base > ULONG_MAX) return false; return access_ok(log_base + a, (sz + VHOST_PAGE_SIZE * 8 - 1) / VHOST_PAGE_SIZE / 8); } /* Make sure 64 bit math will not overflow. */ static bool vhost_overflow(u64 uaddr, u64 size) { if (uaddr > ULONG_MAX || size > ULONG_MAX) return true; if (!size) return false; return uaddr > ULONG_MAX - size + 1; } /* Caller should have vq mutex and device mutex. 
*/ static bool vq_memory_access_ok(void __user *log_base, struct vhost_iotlb *umem, int log_all) { struct vhost_iotlb_map *map; if (!umem) return false; list_for_each_entry(map, &umem->list, link) { unsigned long a = map->addr; if (vhost_overflow(map->addr, map->size)) return false; if (!access_ok((void __user *)a, map->size)) return false; else if (log_all && !log_access_ok(log_base, map->start, map->size)) return false; } return true; } static inline void __user *vhost_vq_meta_fetch(struct vhost_virtqueue *vq, u64 addr, unsigned int size, int type) { const struct vhost_iotlb_map *map = vq->meta_iotlb[type]; if (!map) return NULL; return (void __user *)(uintptr_t)(map->addr + addr - map->start); } /* Can we switch to this memory table? */ /* Caller should have device mutex but not vq mutex */ static bool memory_access_ok(struct vhost_dev *d, struct vhost_iotlb *umem, int log_all) { int i; for (i = 0; i < d->nvqs; ++i) { bool ok; bool log; mutex_lock(&d->vqs[i]->mutex); log = log_all || vhost_has_feature(d->vqs[i], VHOST_F_LOG_ALL); /* If ring is inactive, will check when it's enabled. */ if (d->vqs[i]->private_data) ok = vq_memory_access_ok(d->vqs[i]->log_base, umem, log); else ok = true; mutex_unlock(&d->vqs[i]->mutex); if (!ok) return false; } return true; } static int translate_desc(struct vhost_virtqueue *vq, u64 addr, u32 len, struct iovec iov[], int iov_size, int access); static int vhost_copy_to_user(struct vhost_virtqueue *vq, void __user *to, const void *from, unsigned size) { int ret; if (!vq->iotlb) return __copy_to_user(to, from, size); else { /* This function should be called after iotlb * prefetch, which means we're sure that all vq * could be access through iotlb. So -EAGAIN should * not happen in this case. */ struct iov_iter t; void __user *uaddr = vhost_vq_meta_fetch(vq, (u64)(uintptr_t)to, size, VHOST_ADDR_USED); if (uaddr) return __copy_to_user(uaddr, from, size); ret = translate_desc(vq, (u64)(uintptr_t)to, size, vq->iotlb_iov, ARRAY_SIZE(vq->iotlb_iov), VHOST_ACCESS_WO); if (ret < 0) goto out; iov_iter_init(&t, ITER_DEST, vq->iotlb_iov, ret, size); ret = copy_to_iter(from, size, &t); if (ret == size) ret = 0; } out: return ret; } static int vhost_copy_from_user(struct vhost_virtqueue *vq, void *to, void __user *from, unsigned size) { int ret; if (!vq->iotlb) return __copy_from_user(to, from, size); else { /* This function should be called after iotlb * prefetch, which means we're sure that vq * could be access through iotlb. So -EAGAIN should * not happen in this case. 
*/ void __user *uaddr = vhost_vq_meta_fetch(vq, (u64)(uintptr_t)from, size, VHOST_ADDR_DESC); struct iov_iter f; if (uaddr) return __copy_from_user(to, uaddr, size); ret = translate_desc(vq, (u64)(uintptr_t)from, size, vq->iotlb_iov, ARRAY_SIZE(vq->iotlb_iov), VHOST_ACCESS_RO); if (ret < 0) { vq_err(vq, "IOTLB translation failure: uaddr " "%p size 0x%llx\n", from, (unsigned long long) size); goto out; } iov_iter_init(&f, ITER_SOURCE, vq->iotlb_iov, ret, size); ret = copy_from_iter(to, size, &f); if (ret == size) ret = 0; } out: return ret; } static void __user *__vhost_get_user_slow(struct vhost_virtqueue *vq, void __user *addr, unsigned int size, int type) { int ret; ret = translate_desc(vq, (u64)(uintptr_t)addr, size, vq->iotlb_iov, ARRAY_SIZE(vq->iotlb_iov), VHOST_ACCESS_RO); if (ret < 0) { vq_err(vq, "IOTLB translation failure: uaddr " "%p size 0x%llx\n", addr, (unsigned long long) size); return NULL; } if (ret != 1 || vq->iotlb_iov[0].iov_len != size) { vq_err(vq, "Non atomic userspace memory access: uaddr " "%p size 0x%llx\n", addr, (unsigned long long) size); return NULL; } return vq->iotlb_iov[0].iov_base; } /* This function should be called after iotlb * prefetch, which means we're sure that vq * could be access through iotlb. So -EAGAIN should * not happen in this case. */ static inline void __user *__vhost_get_user(struct vhost_virtqueue *vq, void __user *addr, unsigned int size, int type) { void __user *uaddr = vhost_vq_meta_fetch(vq, (u64)(uintptr_t)addr, size, type); if (uaddr) return uaddr; return __vhost_get_user_slow(vq, addr, size, type); } #define vhost_put_user(vq, x, ptr) \ ({ \ int ret; \ if (!vq->iotlb) { \ ret = __put_user(x, ptr); \ } else { \ __typeof__(ptr) to = \ (__typeof__(ptr)) __vhost_get_user(vq, ptr, \ sizeof(*ptr), VHOST_ADDR_USED); \ if (to != NULL) \ ret = __put_user(x, to); \ else \ ret = -EFAULT; \ } \ ret; \ }) static inline int vhost_put_avail_event(struct vhost_virtqueue *vq) { return vhost_put_user(vq, cpu_to_vhost16(vq, vq->avail_idx), vhost_avail_event(vq)); } static inline int vhost_put_used(struct vhost_virtqueue *vq, struct vring_used_elem *head, int idx, int count) { return vhost_copy_to_user(vq, vq->used->ring + idx, head, count * sizeof(*head)); } static inline int vhost_put_used_flags(struct vhost_virtqueue *vq) { return vhost_put_user(vq, cpu_to_vhost16(vq, vq->used_flags), &vq->used->flags); } static inline int vhost_put_used_idx(struct vhost_virtqueue *vq) { return vhost_put_user(vq, cpu_to_vhost16(vq, vq->last_used_idx), &vq->used->idx); } #define vhost_get_user(vq, x, ptr, type) \ ({ \ int ret; \ if (!vq->iotlb) { \ ret = __get_user(x, ptr); \ } else { \ __typeof__(ptr) from = \ (__typeof__(ptr)) __vhost_get_user(vq, ptr, \ sizeof(*ptr), \ type); \ if (from != NULL) \ ret = __get_user(x, from); \ else \ ret = -EFAULT; \ } \ ret; \ }) #define vhost_get_avail(vq, x, ptr) \ vhost_get_user(vq, x, ptr, VHOST_ADDR_AVAIL) #define vhost_get_used(vq, x, ptr) \ vhost_get_user(vq, x, ptr, VHOST_ADDR_USED) static void vhost_dev_lock_vqs(struct vhost_dev *d) { int i = 0; for (i = 0; i < d->nvqs; ++i) mutex_lock_nested(&d->vqs[i]->mutex, i); } static void vhost_dev_unlock_vqs(struct vhost_dev *d) { int i = 0; for (i = 0; i < d->nvqs; ++i) mutex_unlock(&d->vqs[i]->mutex); } static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq) { __virtio16 idx; int r; r = vhost_get_avail(vq, idx, &vq->avail->idx); if (unlikely(r < 0)) { vq_err(vq, "Failed to access available index at %p (%d)\n", &vq->avail->idx, r); return r; } /* Check it isn't doing 
very strange thing with available indexes */ vq->avail_idx = vhost16_to_cpu(vq, idx); if (unlikely((u16)(vq->avail_idx - vq->last_avail_idx) > vq->num)) { vq_err(vq, "Invalid available index change from %u to %u", vq->last_avail_idx, vq->avail_idx); return -EINVAL; } /* We're done if there is nothing new */ if (vq->avail_idx == vq->last_avail_idx) return 0; /* * We updated vq->avail_idx so we need a memory barrier between * the index read above and the caller reading avail ring entries. */ smp_rmb(); return 1; } static inline int vhost_get_avail_head(struct vhost_virtqueue *vq, __virtio16 *head, int idx) { return vhost_get_avail(vq, *head, &vq->avail->ring[idx & (vq->num - 1)]); } static inline int vhost_get_avail_flags(struct vhost_virtqueue *vq, __virtio16 *flags) { return vhost_get_avail(vq, *flags, &vq->avail->flags); } static inline int vhost_get_used_event(struct vhost_virtqueue *vq, __virtio16 *event) { return vhost_get_avail(vq, *event, vhost_used_event(vq)); } static inline int vhost_get_used_idx(struct vhost_virtqueue *vq, __virtio16 *idx) { return vhost_get_used(vq, *idx, &vq->used->idx); } static inline int vhost_get_desc(struct vhost_virtqueue *vq, struct vring_desc *desc, int idx) { return vhost_copy_from_user(vq, desc, vq->desc + idx, sizeof(*desc)); } static void vhost_iotlb_notify_vq(struct vhost_dev *d, struct vhost_iotlb_msg *msg) { struct vhost_msg_node *node, *n; spin_lock(&d->iotlb_lock); list_for_each_entry_safe(node, n, &d->pending_list, node) { struct vhost_iotlb_msg *vq_msg = &node->msg.iotlb; if (msg->iova <= vq_msg->iova && msg->iova + msg->size - 1 >= vq_msg->iova && vq_msg->type == VHOST_IOTLB_MISS) { vhost_poll_queue(&node->vq->poll); list_del(&node->node); kfree(node); } } spin_unlock(&d->iotlb_lock); } static bool umem_access_ok(u64 uaddr, u64 size, int access) { unsigned long a = uaddr; /* Make sure 64 bit math will not overflow. */ if (vhost_overflow(uaddr, size)) return false; if ((access & VHOST_ACCESS_RO) && !access_ok((void __user *)a, size)) return false; if ((access & VHOST_ACCESS_WO) && !access_ok((void __user *)a, size)) return false; return true; } static int vhost_process_iotlb_msg(struct vhost_dev *dev, u32 asid, struct vhost_iotlb_msg *msg) { int ret = 0; if (asid != 0) return -EINVAL; mutex_lock(&dev->mutex); vhost_dev_lock_vqs(dev); switch (msg->type) { case VHOST_IOTLB_UPDATE: if (!dev->iotlb) { ret = -EFAULT; break; } if (!umem_access_ok(msg->uaddr, msg->size, msg->perm)) { ret = -EFAULT; break; } vhost_vq_meta_reset(dev); if (vhost_iotlb_add_range(dev->iotlb, msg->iova, msg->iova + msg->size - 1, msg->uaddr, msg->perm)) { ret = -ENOMEM; break; } vhost_iotlb_notify_vq(dev, msg); break; case VHOST_IOTLB_INVALIDATE: if (!dev->iotlb) { ret = -EFAULT; break; } vhost_vq_meta_reset(dev); vhost_iotlb_del_range(dev->iotlb, msg->iova, msg->iova + msg->size - 1); break; default: ret = -EINVAL; break; } vhost_dev_unlock_vqs(dev); mutex_unlock(&dev->mutex); return ret; } ssize_t vhost_chr_write_iter(struct vhost_dev *dev, struct iov_iter *from) { struct vhost_iotlb_msg msg; size_t offset; int type, ret; u32 asid = 0; ret = copy_from_iter(&type, sizeof(type), from); if (ret != sizeof(type)) { ret = -EINVAL; goto done; } switch (type) { case VHOST_IOTLB_MSG: /* There maybe a hole after type for V1 message type, * so skip it here. 
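 * The "hole" is the implicit padding the compiler may insert between
 * the 'type' field and the iotlb payload of struct vhost_msg, which is
 * why the skip below is offsetof(struct vhost_msg, iotlb) - sizeof(int).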
*/ offset = offsetof(struct vhost_msg, iotlb) - sizeof(int); break; case VHOST_IOTLB_MSG_V2: if (vhost_backend_has_feature(dev->vqs[0], VHOST_BACKEND_F_IOTLB_ASID)) { ret = copy_from_iter(&asid, sizeof(asid), from); if (ret != sizeof(asid)) { ret = -EINVAL; goto done; } offset = 0; } else offset = sizeof(__u32); break; default: ret = -EINVAL; goto done; } iov_iter_advance(from, offset); ret = copy_from_iter(&msg, sizeof(msg), from); if (ret != sizeof(msg)) { ret = -EINVAL; goto done; } if (msg.type == VHOST_IOTLB_UPDATE && msg.size == 0) { ret = -EINVAL; goto done; } if (dev->msg_handler) ret = dev->msg_handler(dev, asid, &msg); else ret = vhost_process_iotlb_msg(dev, asid, &msg); if (ret) { ret = -EFAULT; goto done; } ret = (type == VHOST_IOTLB_MSG) ? sizeof(struct vhost_msg) : sizeof(struct vhost_msg_v2); done: return ret; } EXPORT_SYMBOL(vhost_chr_write_iter); __poll_t vhost_chr_poll(struct file *file, struct vhost_dev *dev, poll_table *wait) { __poll_t mask = 0; poll_wait(file, &dev->wait, wait); if (!list_empty(&dev->read_list)) mask |= EPOLLIN | EPOLLRDNORM; return mask; } EXPORT_SYMBOL(vhost_chr_poll); ssize_t vhost_chr_read_iter(struct vhost_dev *dev, struct iov_iter *to, int noblock) { DEFINE_WAIT(wait); struct vhost_msg_node *node; ssize_t ret = 0; unsigned size = sizeof(struct vhost_msg); if (iov_iter_count(to) < size) return 0; while (1) { if (!noblock) prepare_to_wait(&dev->wait, &wait, TASK_INTERRUPTIBLE); node = vhost_dequeue_msg(dev, &dev->read_list); if (node) break; if (noblock) { ret = -EAGAIN; break; } if (signal_pending(current)) { ret = -ERESTARTSYS; break; } if (!dev->iotlb) { ret = -EBADFD; break; } schedule(); } if (!noblock) finish_wait(&dev->wait, &wait); if (node) { struct vhost_iotlb_msg *msg; void *start = &node->msg; switch (node->msg.type) { case VHOST_IOTLB_MSG: size = sizeof(node->msg); msg = &node->msg.iotlb; break; case VHOST_IOTLB_MSG_V2: size = sizeof(node->msg_v2); msg = &node->msg_v2.iotlb; break; default: BUG(); break; } ret = copy_to_iter(start, size, to); if (ret != size || msg->type != VHOST_IOTLB_MISS) { kfree(node); return ret; } vhost_enqueue_msg(dev, &dev->pending_list, node); } return ret; } EXPORT_SYMBOL_GPL(vhost_chr_read_iter); static int vhost_iotlb_miss(struct vhost_virtqueue *vq, u64 iova, int access) { struct vhost_dev *dev = vq->dev; struct vhost_msg_node *node; struct vhost_iotlb_msg *msg; bool v2 = vhost_backend_has_feature(vq, VHOST_BACKEND_F_IOTLB_MSG_V2); node = vhost_new_msg(vq, v2 ? VHOST_IOTLB_MSG_V2 : VHOST_IOTLB_MSG); if (!node) return -ENOMEM; if (v2) { node->msg_v2.type = VHOST_IOTLB_MSG_V2; msg = &node->msg_v2.iotlb; } else { msg = &node->msg.iotlb; } msg->type = VHOST_IOTLB_MISS; msg->iova = iova; msg->perm = access; vhost_enqueue_msg(dev, &dev->read_list, node); return 0; } static bool vq_access_ok(struct vhost_virtqueue *vq, unsigned int num, vring_desc_t __user *desc, vring_avail_t __user *avail, vring_used_t __user *used) { /* If an IOTLB device is present, the vring addresses are * GIOVAs. Access validation occurs at prefetch time. */ if (vq->iotlb) return true; return access_ok(desc, vhost_get_desc_size(vq, num)) && access_ok(avail, vhost_get_avail_size(vq, num)) && access_ok(used, vhost_get_used_size(vq, num)); } static void vhost_vq_meta_update(struct vhost_virtqueue *vq, const struct vhost_iotlb_map *map, int type) { int access = (type == VHOST_ADDR_USED) ? 
VHOST_ACCESS_WO : VHOST_ACCESS_RO; if (likely(map->perm & access)) vq->meta_iotlb[type] = map; } static bool iotlb_access_ok(struct vhost_virtqueue *vq, int access, u64 addr, u64 len, int type) { const struct vhost_iotlb_map *map; struct vhost_iotlb *umem = vq->iotlb; u64 s = 0, size, orig_addr = addr, last = addr + len - 1; if (vhost_vq_meta_fetch(vq, addr, len, type)) return true; while (len > s) { map = vhost_iotlb_itree_first(umem, addr, last); if (map == NULL || map->start > addr) { vhost_iotlb_miss(vq, addr, access); return false; } else if (!(map->perm & access)) { /* Report the possible access violation by * request another translation from userspace. */ return false; } size = map->size - addr + map->start; if (orig_addr == addr && size >= len) vhost_vq_meta_update(vq, map, type); s += size; addr += size; } return true; } int vq_meta_prefetch(struct vhost_virtqueue *vq) { unsigned int num = vq->num; if (!vq->iotlb) return 1; return iotlb_access_ok(vq, VHOST_MAP_RO, (u64)(uintptr_t)vq->desc, vhost_get_desc_size(vq, num), VHOST_ADDR_DESC) && iotlb_access_ok(vq, VHOST_MAP_RO, (u64)(uintptr_t)vq->avail, vhost_get_avail_size(vq, num), VHOST_ADDR_AVAIL) && iotlb_access_ok(vq, VHOST_MAP_WO, (u64)(uintptr_t)vq->used, vhost_get_used_size(vq, num), VHOST_ADDR_USED); } EXPORT_SYMBOL_GPL(vq_meta_prefetch); /* Can we log writes? */ /* Caller should have device mutex but not vq mutex */ bool vhost_log_access_ok(struct vhost_dev *dev) { return memory_access_ok(dev, dev->umem, 1); } EXPORT_SYMBOL_GPL(vhost_log_access_ok); static bool vq_log_used_access_ok(struct vhost_virtqueue *vq, void __user *log_base, bool log_used, u64 log_addr) { /* If an IOTLB device is present, log_addr is a GIOVA that * will never be logged by log_used(). */ if (vq->iotlb) return true; return !log_used || log_access_ok(log_base, log_addr, vhost_get_used_size(vq, vq->num)); } /* Verify access for write logging. */ /* Caller should have vq mutex and device mutex */ static bool vq_log_access_ok(struct vhost_virtqueue *vq, void __user *log_base) { return vq_memory_access_ok(log_base, vq->umem, vhost_has_feature(vq, VHOST_F_LOG_ALL)) && vq_log_used_access_ok(vq, log_base, vq->log_used, vq->log_addr); } /* Can we start vq? 
*/ /* Caller should have vq mutex and device mutex */ bool vhost_vq_access_ok(struct vhost_virtqueue *vq) { if (!vq_log_access_ok(vq, vq->log_base)) return false; return vq_access_ok(vq, vq->num, vq->desc, vq->avail, vq->used); } EXPORT_SYMBOL_GPL(vhost_vq_access_ok); static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m) { struct vhost_memory mem, *newmem; struct vhost_memory_region *region; struct vhost_iotlb *newumem, *oldumem; unsigned long size = offsetof(struct vhost_memory, regions); int i; if (copy_from_user(&mem, m, size)) return -EFAULT; if (mem.padding) return -EOPNOTSUPP; if (mem.nregions > max_mem_regions) return -E2BIG; newmem = kvzalloc(struct_size(newmem, regions, mem.nregions), GFP_KERNEL); if (!newmem) return -ENOMEM; memcpy(newmem, &mem, size); if (copy_from_user(newmem->regions, m->regions, flex_array_size(newmem, regions, mem.nregions))) { kvfree(newmem); return -EFAULT; } newumem = iotlb_alloc(); if (!newumem) { kvfree(newmem); return -ENOMEM; } for (region = newmem->regions; region < newmem->regions + mem.nregions; region++) { if (vhost_iotlb_add_range(newumem, region->guest_phys_addr, region->guest_phys_addr + region->memory_size - 1, region->userspace_addr, VHOST_MAP_RW)) goto err; } if (!memory_access_ok(d, newumem, 0)) goto err; oldumem = d->umem; d->umem = newumem; /* All memory accesses are done under some VQ mutex. */ for (i = 0; i < d->nvqs; ++i) { mutex_lock(&d->vqs[i]->mutex); d->vqs[i]->umem = newumem; mutex_unlock(&d->vqs[i]->mutex); } kvfree(newmem); vhost_iotlb_free(oldumem); return 0; err: vhost_iotlb_free(newumem); kvfree(newmem); return -EFAULT; } static long vhost_vring_set_num(struct vhost_dev *d, struct vhost_virtqueue *vq, void __user *argp) { struct vhost_vring_state s; /* Resizing ring with an active backend? * You don't want to do that. */ if (vq->private_data) return -EBUSY; if (copy_from_user(&s, argp, sizeof s)) return -EFAULT; if (!s.num || s.num > 0xffff || (s.num & (s.num - 1))) return -EINVAL; vq->num = s.num; return 0; } static long vhost_vring_set_addr(struct vhost_dev *d, struct vhost_virtqueue *vq, void __user *argp) { struct vhost_vring_addr a; if (copy_from_user(&a, argp, sizeof a)) return -EFAULT; if (a.flags & ~(0x1 << VHOST_VRING_F_LOG)) return -EOPNOTSUPP; /* For 32bit, verify that the top 32bits of the user data are set to zero. */ if ((u64)(unsigned long)a.desc_user_addr != a.desc_user_addr || (u64)(unsigned long)a.used_user_addr != a.used_user_addr || (u64)(unsigned long)a.avail_user_addr != a.avail_user_addr) return -EFAULT; /* Make sure it's safe to cast pointers to vring types. */ BUILD_BUG_ON(__alignof__ *vq->avail > VRING_AVAIL_ALIGN_SIZE); BUILD_BUG_ON(__alignof__ *vq->used > VRING_USED_ALIGN_SIZE); if ((a.avail_user_addr & (VRING_AVAIL_ALIGN_SIZE - 1)) || (a.used_user_addr & (VRING_USED_ALIGN_SIZE - 1)) || (a.log_guest_addr & (VRING_USED_ALIGN_SIZE - 1))) return -EINVAL; /* We only verify access here if backend is configured. * If it is not, we don't as size might not have been setup. * We will verify when backend is configured. */ if (vq->private_data) { if (!vq_access_ok(vq, vq->num, (void __user *)(unsigned long)a.desc_user_addr, (void __user *)(unsigned long)a.avail_user_addr, (void __user *)(unsigned long)a.used_user_addr)) return -EINVAL; /* Also validate log access for used ring if enabled. 
*/ if (!vq_log_used_access_ok(vq, vq->log_base, a.flags & (0x1 << VHOST_VRING_F_LOG), a.log_guest_addr)) return -EINVAL; } vq->log_used = !!(a.flags & (0x1 << VHOST_VRING_F_LOG)); vq->desc = (void __user *)(unsigned long)a.desc_user_addr; vq->avail = (void __user *)(unsigned long)a.avail_user_addr; vq->log_addr = a.log_guest_addr; vq->used = (void __user *)(unsigned long)a.used_user_addr; return 0; } static long vhost_vring_set_num_addr(struct vhost_dev *d, struct vhost_virtqueue *vq, unsigned int ioctl, void __user *argp) { long r; mutex_lock(&vq->mutex); switch (ioctl) { case VHOST_SET_VRING_NUM: r = vhost_vring_set_num(d, vq, argp); break; case VHOST_SET_VRING_ADDR: r = vhost_vring_set_addr(d, vq, argp); break; default: BUG(); } mutex_unlock(&vq->mutex); return r; } long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *argp) { struct file *eventfp, *filep = NULL; bool pollstart = false, pollstop = false; struct eventfd_ctx *ctx = NULL; struct vhost_virtqueue *vq; struct vhost_vring_state s; struct vhost_vring_file f; u32 idx; long r; r = vhost_get_vq_from_user(d, argp, &vq, &idx); if (r < 0) return r; if (ioctl == VHOST_SET_VRING_NUM || ioctl == VHOST_SET_VRING_ADDR) { return vhost_vring_set_num_addr(d, vq, ioctl, argp); } mutex_lock(&vq->mutex); switch (ioctl) { case VHOST_SET_VRING_BASE: /* Moving base with an active backend? * You don't want to do that. */ if (vq->private_data) { r = -EBUSY; break; } if (copy_from_user(&s, argp, sizeof s)) { r = -EFAULT; break; } if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED)) { vq->next_avail_head = vq->last_avail_idx = s.num & 0xffff; vq->last_used_idx = (s.num >> 16) & 0xffff; } else { if (s.num > 0xffff) { r = -EINVAL; break; } vq->next_avail_head = vq->last_avail_idx = s.num; } /* Forget the cached index value. */ vq->avail_idx = vq->last_avail_idx; break; case VHOST_GET_VRING_BASE: s.index = idx; if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED)) s.num = (u32)vq->last_avail_idx | ((u32)vq->last_used_idx << 16); else s.num = vq->last_avail_idx; if (copy_to_user(argp, &s, sizeof s)) r = -EFAULT; break; case VHOST_SET_VRING_KICK: if (copy_from_user(&f, argp, sizeof f)) { r = -EFAULT; break; } eventfp = f.fd == VHOST_FILE_UNBIND ? NULL : eventfd_fget(f.fd); if (IS_ERR(eventfp)) { r = PTR_ERR(eventfp); break; } if (eventfp != vq->kick) { pollstop = (filep = vq->kick) != NULL; pollstart = (vq->kick = eventfp) != NULL; } else filep = eventfp; break; case VHOST_SET_VRING_CALL: if (copy_from_user(&f, argp, sizeof f)) { r = -EFAULT; break; } ctx = f.fd == VHOST_FILE_UNBIND ? NULL : eventfd_ctx_fdget(f.fd); if (IS_ERR(ctx)) { r = PTR_ERR(ctx); break; } swap(ctx, vq->call_ctx.ctx); break; case VHOST_SET_VRING_ERR: if (copy_from_user(&f, argp, sizeof f)) { r = -EFAULT; break; } ctx = f.fd == VHOST_FILE_UNBIND ? 
NULL : eventfd_ctx_fdget(f.fd); if (IS_ERR(ctx)) { r = PTR_ERR(ctx); break; } swap(ctx, vq->error_ctx); break; case VHOST_SET_VRING_ENDIAN: r = vhost_set_vring_endian(vq, argp); break; case VHOST_GET_VRING_ENDIAN: r = vhost_get_vring_endian(vq, idx, argp); break; case VHOST_SET_VRING_BUSYLOOP_TIMEOUT: if (copy_from_user(&s, argp, sizeof(s))) { r = -EFAULT; break; } vq->busyloop_timeout = s.num; break; case VHOST_GET_VRING_BUSYLOOP_TIMEOUT: s.index = idx; s.num = vq->busyloop_timeout; if (copy_to_user(argp, &s, sizeof(s))) r = -EFAULT; break; default: r = -ENOIOCTLCMD; } if (pollstop && vq->handle_kick) vhost_poll_stop(&vq->poll); if (!IS_ERR_OR_NULL(ctx)) eventfd_ctx_put(ctx); if (filep) fput(filep); if (pollstart && vq->handle_kick) r = vhost_poll_start(&vq->poll, vq->kick); mutex_unlock(&vq->mutex); if (pollstop && vq->handle_kick) vhost_dev_flush(vq->poll.dev); return r; } EXPORT_SYMBOL_GPL(vhost_vring_ioctl); int vhost_init_device_iotlb(struct vhost_dev *d) { struct vhost_iotlb *niotlb, *oiotlb; int i; niotlb = iotlb_alloc(); if (!niotlb) return -ENOMEM; oiotlb = d->iotlb; d->iotlb = niotlb; for (i = 0; i < d->nvqs; ++i) { struct vhost_virtqueue *vq = d->vqs[i]; mutex_lock(&vq->mutex); vq->iotlb = niotlb; __vhost_vq_meta_reset(vq); mutex_unlock(&vq->mutex); } vhost_iotlb_free(oiotlb); return 0; } EXPORT_SYMBOL_GPL(vhost_init_device_iotlb); /* Caller must have device mutex */ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *argp) { struct eventfd_ctx *ctx; u64 p; long r; int i, fd; /* If you are not the owner, you can become one */ if (ioctl == VHOST_SET_OWNER) { r = vhost_dev_set_owner(d); goto done; } #ifdef CONFIG_VHOST_ENABLE_FORK_OWNER_CONTROL if (ioctl == VHOST_SET_FORK_FROM_OWNER) { /* Only allow modification before owner is set */ if (vhost_dev_has_owner(d)) { r = -EBUSY; goto done; } u8 fork_owner_val; if (get_user(fork_owner_val, (u8 __user *)argp)) { r = -EFAULT; goto done; } if (fork_owner_val != VHOST_FORK_OWNER_TASK && fork_owner_val != VHOST_FORK_OWNER_KTHREAD) { r = -EINVAL; goto done; } d->fork_owner = !!fork_owner_val; r = 0; goto done; } if (ioctl == VHOST_GET_FORK_FROM_OWNER) { u8 fork_owner_val = d->fork_owner; if (fork_owner_val != VHOST_FORK_OWNER_TASK && fork_owner_val != VHOST_FORK_OWNER_KTHREAD) { r = -EINVAL; goto done; } if (put_user(fork_owner_val, (u8 __user *)argp)) { r = -EFAULT; goto done; } r = 0; goto done; } #endif /* You must be the owner to do anything else */ r = vhost_dev_check_owner(d); if (r) goto done; switch (ioctl) { case VHOST_SET_MEM_TABLE: r = vhost_set_memory(d, argp); break; case VHOST_SET_LOG_BASE: if (copy_from_user(&p, argp, sizeof p)) { r = -EFAULT; break; } if ((u64)(unsigned long)p != p) { r = -EFAULT; break; } for (i = 0; i < d->nvqs; ++i) { struct vhost_virtqueue *vq; void __user *base = (void __user *)(unsigned long)p; vq = d->vqs[i]; mutex_lock(&vq->mutex); /* If ring is inactive, will check when it's enabled. */ if (vq->private_data && !vq_log_access_ok(vq, base)) r = -EFAULT; else vq->log_base = base; mutex_unlock(&vq->mutex); } break; case VHOST_SET_LOG_FD: r = get_user(fd, (int __user *)argp); if (r < 0) break; ctx = fd == VHOST_FILE_UNBIND ? 
NULL : eventfd_ctx_fdget(fd); if (IS_ERR(ctx)) { r = PTR_ERR(ctx); break; } swap(ctx, d->log_ctx); for (i = 0; i < d->nvqs; ++i) { mutex_lock(&d->vqs[i]->mutex); d->vqs[i]->log_ctx = d->log_ctx; mutex_unlock(&d->vqs[i]->mutex); } if (ctx) eventfd_ctx_put(ctx); break; default: r = -ENOIOCTLCMD; break; } done: return r; } EXPORT_SYMBOL_GPL(vhost_dev_ioctl); /* TODO: This is really inefficient. We need something like get_user() * (instruction directly accesses the data, with an exception table entry * returning -EFAULT). See Documentation/arch/x86/exception-tables.rst. */ static int set_bit_to_user(int nr, void __user *addr) { unsigned long log = (unsigned long)addr; struct page *page; void *base; int bit = nr + (log % PAGE_SIZE) * 8; int r; r = pin_user_pages_fast(log, 1, FOLL_WRITE, &page); if (r < 0) return r; BUG_ON(r != 1); base = kmap_atomic(page); set_bit(bit, base); kunmap_atomic(base); unpin_user_pages_dirty_lock(&page, 1, true); return 0; } static int log_write(void __user *log_base, u64 write_address, u64 write_length) { u64 write_page = write_address / VHOST_PAGE_SIZE; int r; if (!write_length) return 0; write_length += write_address % VHOST_PAGE_SIZE; for (;;) { u64 base = (u64)(unsigned long)log_base; u64 log = base + write_page / 8; int bit = write_page % 8; if ((u64)(unsigned long)log != log) return -EFAULT; r = set_bit_to_user(bit, (void __user *)(unsigned long)log); if (r < 0) return r; if (write_length <= VHOST_PAGE_SIZE) break; write_length -= VHOST_PAGE_SIZE; write_page += 1; } return r; } static int log_write_hva(struct vhost_virtqueue *vq, u64 hva, u64 len) { struct vhost_iotlb *umem = vq->umem; struct vhost_iotlb_map *u; u64 start, end, l, min; int r; bool hit = false; while (len) { min = len; /* More than one GPAs can be mapped into a single HVA. So * iterate all possible umems here to be safe. */ list_for_each_entry(u, &umem->list, link) { if (u->addr > hva - 1 + len || u->addr - 1 + u->size < hva) continue; start = max(u->addr, hva); end = min(u->addr - 1 + u->size, hva - 1 + len); l = end - start + 1; r = log_write(vq->log_base, u->start + start - u->addr, l); if (r < 0) return r; hit = true; min = min(l, min); } if (!hit) return -EFAULT; len -= min; hva += min; } return 0; } static int log_used(struct vhost_virtqueue *vq, u64 used_offset, u64 len) { struct iovec *iov = vq->log_iov; int i, ret; if (!vq->iotlb) return log_write(vq->log_base, vq->log_addr + used_offset, len); ret = translate_desc(vq, (uintptr_t)vq->used + used_offset, len, iov, 64, VHOST_ACCESS_WO); if (ret < 0) return ret; for (i = 0; i < ret; i++) { ret = log_write_hva(vq, (uintptr_t)iov[i].iov_base, iov[i].iov_len); if (ret) return ret; } return 0; } /* * vhost_log_write() - Log in dirty page bitmap * @vq: vhost virtqueue. * @log: Array of dirty memory in GPA. * @log_num: Size of vhost_log arrary. * @len: The total length of memory buffer to log in the dirty bitmap. * Some drivers may only partially use pages shared via the last * vring descriptor (i.e. vhost-net RX buffer). * Use (len == U64_MAX) to indicate the driver would log all * pages of vring descriptors. * @iov: Array of dirty memory in HVA. * @count: Size of iovec array. */ int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log, unsigned int log_num, u64 len, struct iovec *iov, int count) { int i, r; /* Make sure data written is seen before log. 
*/ smp_wmb(); if (vq->iotlb) { for (i = 0; i < count; i++) { r = log_write_hva(vq, (uintptr_t)iov[i].iov_base, iov[i].iov_len); if (r < 0) return r; } return 0; } for (i = 0; i < log_num; ++i) { u64 l = min(log[i].len, len); r = log_write(vq->log_base, log[i].addr, l); if (r < 0) return r; if (len != U64_MAX) len -= l; } if (vq->log_ctx) eventfd_signal(vq->log_ctx); return 0; } EXPORT_SYMBOL_GPL(vhost_log_write); static int vhost_update_used_flags(struct vhost_virtqueue *vq) { void __user *used; if (vhost_put_used_flags(vq)) return -EFAULT; if (unlikely(vq->log_used)) { /* Make sure the flag is seen before log. */ smp_wmb(); /* Log used flag write. */ used = &vq->used->flags; log_used(vq, (used - (void __user *)vq->used), sizeof vq->used->flags); if (vq->log_ctx) eventfd_signal(vq->log_ctx); } return 0; } static int vhost_update_avail_event(struct vhost_virtqueue *vq) { if (vhost_put_avail_event(vq)) return -EFAULT; if (unlikely(vq->log_used)) { void __user *used; /* Make sure the event is seen before log. */ smp_wmb(); /* Log avail event write */ used = vhost_avail_event(vq); log_used(vq, (used - (void __user *)vq->used), sizeof *vhost_avail_event(vq)); if (vq->log_ctx) eventfd_signal(vq->log_ctx); } return 0; } int vhost_vq_init_access(struct vhost_virtqueue *vq) { __virtio16 last_used_idx; int r; bool is_le = vq->is_le; if (!vq->private_data) return 0; vhost_init_is_le(vq); r = vhost_update_used_flags(vq); if (r) goto err; vq->signalled_used_valid = false; if (!vq->iotlb && !access_ok(&vq->used->idx, sizeof vq->used->idx)) { r = -EFAULT; goto err; } r = vhost_get_used_idx(vq, &last_used_idx); if (r) { vq_err(vq, "Can't access used idx at %p\n", &vq->used->idx); goto err; } vq->last_used_idx = vhost16_to_cpu(vq, last_used_idx); return 0; err: vq->is_le = is_le; return r; } EXPORT_SYMBOL_GPL(vhost_vq_init_access); static int translate_desc(struct vhost_virtqueue *vq, u64 addr, u32 len, struct iovec iov[], int iov_size, int access) { const struct vhost_iotlb_map *map; struct vhost_dev *dev = vq->dev; struct vhost_iotlb *umem = dev->iotlb ? dev->iotlb : dev->umem; struct iovec *_iov; u64 s = 0, last = addr + len - 1; int ret = 0; while ((u64)len > s) { u64 size; if (unlikely(ret >= iov_size)) { ret = -ENOBUFS; break; } map = vhost_iotlb_itree_first(umem, addr, last); if (map == NULL || map->start > addr) { if (umem != dev->iotlb) { ret = -EFAULT; break; } ret = -EAGAIN; break; } else if (!(map->perm & access)) { ret = -EPERM; break; } _iov = iov + ret; size = map->size - addr + map->start; _iov->iov_len = min((u64)len - s, size); _iov->iov_base = (void __user *)(unsigned long) (map->addr + addr - map->start); s += size; addr += size; ++ret; } if (ret == -EAGAIN) vhost_iotlb_miss(vq, addr, access); return ret; } /* Each buffer in the virtqueues is actually a chain of descriptors. This * function returns the next descriptor in the chain, * or -1U if we're at the end. */ static unsigned next_desc(struct vhost_virtqueue *vq, struct vring_desc *desc) { unsigned int next; /* If this descriptor says it doesn't chain, we're done. */ if (!(desc->flags & cpu_to_vhost16(vq, VRING_DESC_F_NEXT))) return -1U; /* Check they're not leading us off end of descriptors. 
*/ next = vhost16_to_cpu(vq, READ_ONCE(desc->next)); return next; } static int get_indirect(struct vhost_virtqueue *vq, struct iovec iov[], unsigned int iov_size, unsigned int *out_num, unsigned int *in_num, struct vhost_log *log, unsigned int *log_num, struct vring_desc *indirect) { struct vring_desc desc; unsigned int i = 0, count, found = 0; u32 len = vhost32_to_cpu(vq, indirect->len); struct iov_iter from; int ret, access; /* Sanity check */ if (unlikely(len % sizeof desc)) { vq_err(vq, "Invalid length in indirect descriptor: " "len 0x%llx not multiple of 0x%zx\n", (unsigned long long)len, sizeof desc); return -EINVAL; } ret = translate_desc(vq, vhost64_to_cpu(vq, indirect->addr), len, vq->indirect, UIO_MAXIOV, VHOST_ACCESS_RO); if (unlikely(ret < 0)) { if (ret != -EAGAIN) vq_err(vq, "Translation failure %d in indirect.\n", ret); return ret; } iov_iter_init(&from, ITER_SOURCE, vq->indirect, ret, len); count = len / sizeof desc; /* Buffers are chained via a 16 bit next field, so * we can have at most 2^16 of these. */ if (unlikely(count > USHRT_MAX + 1)) { vq_err(vq, "Indirect buffer length too big: %d\n", indirect->len); return -E2BIG; } do { unsigned iov_count = *in_num + *out_num; if (unlikely(++found > count)) { vq_err(vq, "Loop detected: last one at %u " "indirect size %u\n", i, count); return -EINVAL; } if (unlikely(!copy_from_iter_full(&desc, sizeof(desc), &from))) { vq_err(vq, "Failed indirect descriptor: idx %d, %zx\n", i, (size_t)vhost64_to_cpu(vq, indirect->addr) + i * sizeof desc); return -EINVAL; } if (unlikely(desc.flags & cpu_to_vhost16(vq, VRING_DESC_F_INDIRECT))) { vq_err(vq, "Nested indirect descriptor: idx %d, %zx\n", i, (size_t)vhost64_to_cpu(vq, indirect->addr) + i * sizeof desc); return -EINVAL; } if (desc.flags & cpu_to_vhost16(vq, VRING_DESC_F_WRITE)) access = VHOST_ACCESS_WO; else access = VHOST_ACCESS_RO; ret = translate_desc(vq, vhost64_to_cpu(vq, desc.addr), vhost32_to_cpu(vq, desc.len), iov + iov_count, iov_size - iov_count, access); if (unlikely(ret < 0)) { if (ret != -EAGAIN) vq_err(vq, "Translation failure %d indirect idx %d\n", ret, i); return ret; } /* If this is an input descriptor, increment that count. */ if (access == VHOST_ACCESS_WO) { *in_num += ret; if (unlikely(log && ret)) { log[*log_num].addr = vhost64_to_cpu(vq, desc.addr); log[*log_num].len = vhost32_to_cpu(vq, desc.len); ++*log_num; } } else { /* If it's an output descriptor, they're all supposed * to come before any input descriptors. */ if (unlikely(*in_num)) { vq_err(vq, "Indirect descriptor " "has out after in: idx %d\n", i); return -EINVAL; } *out_num += ret; } } while ((i = next_desc(vq, &desc)) != -1); return 0; } /* This looks in the virtqueue and for the first available buffer, and converts * it to an iovec for convenient access. Since descriptors consist of some * number of output then some number of input descriptors, it's actually two * iovecs, but we pack them into one and note how many of each there were. * * This function returns the descriptor number found, or vq->num (which is * never a valid descriptor number) if none was found. A negative code is * returned on error. 
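 *
 * Callers typically compare the return value against vq->num to detect
 * "no buffer available" and treat any negative value as an error to be
 * propagated.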
*/ int vhost_get_vq_desc(struct vhost_virtqueue *vq, struct iovec iov[], unsigned int iov_size, unsigned int *out_num, unsigned int *in_num, struct vhost_log *log, unsigned int *log_num) { bool in_order = vhost_has_feature(vq, VIRTIO_F_IN_ORDER); struct vring_desc desc; unsigned int i, head, found = 0; u16 last_avail_idx = vq->last_avail_idx; __virtio16 ring_head; int ret, access, c = 0; if (vq->avail_idx == vq->last_avail_idx) { ret = vhost_get_avail_idx(vq); if (unlikely(ret < 0)) return ret; if (!ret) return vq->num; } if (in_order) head = vq->next_avail_head & (vq->num - 1); else { /* Grab the next descriptor number they're * advertising, and increment the index we've seen. */ if (unlikely(vhost_get_avail_head(vq, &ring_head, last_avail_idx))) { vq_err(vq, "Failed to read head: idx %d address %p\n", last_avail_idx, &vq->avail->ring[last_avail_idx % vq->num]); return -EFAULT; } head = vhost16_to_cpu(vq, ring_head); } /* If their number is silly, that's an error. */ if (unlikely(head >= vq->num)) { vq_err(vq, "Guest says index %u > %u is available", head, vq->num); return -EINVAL; } /* When we start there are none of either input nor output. */ *out_num = *in_num = 0; if (unlikely(log)) *log_num = 0; i = head; do { unsigned iov_count = *in_num + *out_num; if (unlikely(i >= vq->num)) { vq_err(vq, "Desc index is %u > %u, head = %u", i, vq->num, head); return -EINVAL; } if (unlikely(++found > vq->num)) { vq_err(vq, "Loop detected: last one at %u " "vq size %u head %u\n", i, vq->num, head); return -EINVAL; } ret = vhost_get_desc(vq, &desc, i); if (unlikely(ret)) { vq_err(vq, "Failed to get descriptor: idx %d addr %p\n", i, vq->desc + i); return -EFAULT; } if (desc.flags & cpu_to_vhost16(vq, VRING_DESC_F_INDIRECT)) { ret = get_indirect(vq, iov, iov_size, out_num, in_num, log, log_num, &desc); if (unlikely(ret < 0)) { if (ret != -EAGAIN) vq_err(vq, "Failure detected " "in indirect descriptor at idx %d\n", i); return ret; } ++c; continue; } if (desc.flags & cpu_to_vhost16(vq, VRING_DESC_F_WRITE)) access = VHOST_ACCESS_WO; else access = VHOST_ACCESS_RO; ret = translate_desc(vq, vhost64_to_cpu(vq, desc.addr), vhost32_to_cpu(vq, desc.len), iov + iov_count, iov_size - iov_count, access); if (unlikely(ret < 0)) { if (ret != -EAGAIN) vq_err(vq, "Translation failure %d descriptor idx %d\n", ret, i); return ret; } if (access == VHOST_ACCESS_WO) { /* If this is an input descriptor, * increment that count. */ *in_num += ret; if (unlikely(log && ret)) { log[*log_num].addr = vhost64_to_cpu(vq, desc.addr); log[*log_num].len = vhost32_to_cpu(vq, desc.len); ++*log_num; } } else { /* If it's an output descriptor, they're all supposed * to come before any input descriptors. */ if (unlikely(*in_num)) { vq_err(vq, "Descriptor has out after in: " "idx %d\n", i); return -EINVAL; } *out_num += ret; } ++c; } while ((i = next_desc(vq, &desc)) != -1); /* On success, increment avail index. */ vq->last_avail_idx++; vq->next_avail_head += c; /* Assume notifications from guest are disabled at this point, * if they aren't we would need to update avail_event index. */ BUG_ON(!(vq->used_flags & VRING_USED_F_NO_NOTIFY)); return head; } EXPORT_SYMBOL_GPL(vhost_get_vq_desc); /* Reverse the effect of vhost_get_vq_desc. Useful for error handling. */ void vhost_discard_vq_desc(struct vhost_virtqueue *vq, int n) { vq->last_avail_idx -= n; } EXPORT_SYMBOL_GPL(vhost_discard_vq_desc); /* After we've used one of their buffers, we tell them about it. We'll then * want to notify the guest, using eventfd. 
*/ int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len) { struct vring_used_elem heads = { cpu_to_vhost32(vq, head), cpu_to_vhost32(vq, len) }; u16 nheads = 1; return vhost_add_used_n(vq, &heads, &nheads, 1); } EXPORT_SYMBOL_GPL(vhost_add_used); static int __vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads, unsigned count) { vring_used_elem_t __user *used; u16 old, new; int start; start = vq->last_used_idx & (vq->num - 1); used = vq->used->ring + start; if (vhost_put_used(vq, heads, start, count)) { vq_err(vq, "Failed to write used"); return -EFAULT; } if (unlikely(vq->log_used)) { /* Make sure data is seen before log. */ smp_wmb(); /* Log used ring entry write. */ log_used(vq, ((void __user *)used - (void __user *)vq->used), count * sizeof *used); } old = vq->last_used_idx; new = (vq->last_used_idx += count); /* If the driver never bothers to signal in a very long while, * used index might wrap around. If that happens, invalidate * signalled_used index we stored. TODO: make sure driver * signals at least once in 2^16 and remove this. */ if (unlikely((u16)(new - vq->signalled_used) < (u16)(new - old))) vq->signalled_used_valid = false; return 0; } static int vhost_add_used_n_ooo(struct vhost_virtqueue *vq, struct vring_used_elem *heads, unsigned count) { int start, n, r; start = vq->last_used_idx & (vq->num - 1); n = vq->num - start; if (n < count) { r = __vhost_add_used_n(vq, heads, n); if (r < 0) return r; heads += n; count -= n; } return __vhost_add_used_n(vq, heads, count); } static int vhost_add_used_n_in_order(struct vhost_virtqueue *vq, struct vring_used_elem *heads, const u16 *nheads, unsigned count) { vring_used_elem_t __user *used; u16 old, new = vq->last_used_idx; int start, i; if (!nheads) return -EINVAL; start = vq->last_used_idx & (vq->num - 1); used = vq->used->ring + start; for (i = 0; i < count; i++) { if (vhost_put_used(vq, &heads[i], start, 1)) { vq_err(vq, "Failed to write used"); return -EFAULT; } start += nheads[i]; new += nheads[i]; if (start >= vq->num) start -= vq->num; } if (unlikely(vq->log_used)) { /* Make sure data is seen before log. */ smp_wmb(); /* Log used ring entry write. */ log_used(vq, ((void __user *)used - (void __user *)vq->used), (vq->num - start) * sizeof *used); if (start + count > vq->num) log_used(vq, 0, (start + count - vq->num) * sizeof *used); } old = vq->last_used_idx; vq->last_used_idx = new; /* If the driver never bothers to signal in a very long while, * used index might wrap around. If that happens, invalidate * signalled_used index we stored. TODO: make sure driver * signals at least once in 2^16 and remove this. */ if (unlikely((u16)(new - vq->signalled_used) < (u16)(new - old))) vq->signalled_used_valid = false; return 0; } /* After we've used one of their buffers, we tell them about it. We'll then * want to notify the guest, using eventfd. */ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads, u16 *nheads, unsigned count) { bool in_order = vhost_has_feature(vq, VIRTIO_F_IN_ORDER); int r; if (!in_order || !nheads) r = vhost_add_used_n_ooo(vq, heads, count); else r = vhost_add_used_n_in_order(vq, heads, nheads, count); if (r < 0) return r; /* Make sure buffer is written before we update index. */ smp_wmb(); if (vhost_put_used_idx(vq)) { vq_err(vq, "Failed to increment used idx"); return -EFAULT; } if (unlikely(vq->log_used)) { /* Make sure used idx is seen before log. */ smp_wmb(); /* Log used index update. 
*/ log_used(vq, offsetof(struct vring_used, idx), sizeof vq->used->idx); if (vq->log_ctx) eventfd_signal(vq->log_ctx); } return r; } EXPORT_SYMBOL_GPL(vhost_add_used_n); static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq) { __u16 old, new; __virtio16 event; bool v; /* Flush out used index updates. This is paired * with the barrier that the Guest executes when enabling * interrupts. */ smp_mb(); if (vhost_has_feature(vq, VIRTIO_F_NOTIFY_ON_EMPTY) && unlikely(vq->avail_idx == vq->last_avail_idx)) return true; if (!vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX)) { __virtio16 flags; if (vhost_get_avail_flags(vq, &flags)) { vq_err(vq, "Failed to get flags"); return true; } return !(flags & cpu_to_vhost16(vq, VRING_AVAIL_F_NO_INTERRUPT)); } old = vq->signalled_used; v = vq->signalled_used_valid; new = vq->signalled_used = vq->last_used_idx; vq->signalled_used_valid = true; if (unlikely(!v)) return true; if (vhost_get_used_event(vq, &event)) { vq_err(vq, "Failed to get used event idx"); return true; } return vring_need_event(vhost16_to_cpu(vq, event), new, old); } /* This actually signals the guest, using eventfd. */ void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq) { /* Signal the Guest tell them we used something up. */ if (vq->call_ctx.ctx && vhost_notify(dev, vq)) eventfd_signal(vq->call_ctx.ctx); } EXPORT_SYMBOL_GPL(vhost_signal); /* And here's the combo meal deal. Supersize me! */ void vhost_add_used_and_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq, unsigned int head, int len) { vhost_add_used(vq, head, len); vhost_signal(dev, vq); } EXPORT_SYMBOL_GPL(vhost_add_used_and_signal); /* multi-buffer version of vhost_add_used_and_signal */ void vhost_add_used_and_signal_n(struct vhost_dev *dev, struct vhost_virtqueue *vq, struct vring_used_elem *heads, u16 *nheads, unsigned count) { vhost_add_used_n(vq, heads, nheads, count); vhost_signal(dev, vq); } EXPORT_SYMBOL_GPL(vhost_add_used_and_signal_n); /* return true if we're sure that available ring is empty */ bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq) { int r; if (vq->avail_idx != vq->last_avail_idx) return false; r = vhost_get_avail_idx(vq); /* Note: we treat error as non-empty here */ return r == 0; } EXPORT_SYMBOL_GPL(vhost_vq_avail_empty); /* OK, now we need to know about added descriptors. */ bool vhost_enable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq) { int r; if (!(vq->used_flags & VRING_USED_F_NO_NOTIFY)) return false; vq->used_flags &= ~VRING_USED_F_NO_NOTIFY; if (!vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX)) { r = vhost_update_used_flags(vq); if (r) { vq_err(vq, "Failed to enable notification at %p: %d\n", &vq->used->flags, r); return false; } } else { r = vhost_update_avail_event(vq); if (r) { vq_err(vq, "Failed to update avail event index at %p: %d\n", vhost_avail_event(vq), r); return false; } } /* They could have slipped one in as we were doing that: make * sure it's written, then check again. */ smp_mb(); r = vhost_get_avail_idx(vq); /* Note: we treat error as empty here */ if (unlikely(r < 0)) return false; return r; } EXPORT_SYMBOL_GPL(vhost_enable_notify); /* We don't need to be notified again. 
 */
void vhost_disable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
{
	int r;

	if (vq->used_flags & VRING_USED_F_NO_NOTIFY)
		return;
	vq->used_flags |= VRING_USED_F_NO_NOTIFY;
	if (!vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX)) {
		r = vhost_update_used_flags(vq);
		if (r)
			vq_err(vq, "Failed to disable notification at %p: %d\n",
			       &vq->used->flags, r);
	}
}
EXPORT_SYMBOL_GPL(vhost_disable_notify);

/* Create a new message. */
struct vhost_msg_node *vhost_new_msg(struct vhost_virtqueue *vq, int type)
{
	/* Make sure all padding within the structure is initialized. */
	struct vhost_msg_node *node = kzalloc(sizeof(*node), GFP_KERNEL);

	if (!node)
		return NULL;

	node->vq = vq;
	node->msg.type = type;
	return node;
}
EXPORT_SYMBOL_GPL(vhost_new_msg);

void vhost_enqueue_msg(struct vhost_dev *dev, struct list_head *head,
		       struct vhost_msg_node *node)
{
	spin_lock(&dev->iotlb_lock);
	list_add_tail(&node->node, head);
	spin_unlock(&dev->iotlb_lock);

	wake_up_interruptible_poll(&dev->wait, EPOLLIN | EPOLLRDNORM);
}
EXPORT_SYMBOL_GPL(vhost_enqueue_msg);

struct vhost_msg_node *vhost_dequeue_msg(struct vhost_dev *dev,
					 struct list_head *head)
{
	struct vhost_msg_node *node = NULL;

	spin_lock(&dev->iotlb_lock);
	if (!list_empty(head)) {
		node = list_first_entry(head, struct vhost_msg_node,
					node);
		list_del(&node->node);
	}
	spin_unlock(&dev->iotlb_lock);

	return node;
}
EXPORT_SYMBOL_GPL(vhost_dequeue_msg);

void vhost_set_backend_features(struct vhost_dev *dev, u64 features)
{
	struct vhost_virtqueue *vq;
	int i;

	mutex_lock(&dev->mutex);
	for (i = 0; i < dev->nvqs; ++i) {
		vq = dev->vqs[i];
		mutex_lock(&vq->mutex);
		vq->acked_backend_features = features;
		mutex_unlock(&vq->mutex);
	}
	mutex_unlock(&dev->mutex);
}
EXPORT_SYMBOL_GPL(vhost_set_backend_features);

static int __init vhost_init(void)
{
	return 0;
}

static void __exit vhost_exit(void)
{
}

module_init(vhost_init);
module_exit(vhost_exit);

MODULE_VERSION("0.0.1");
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Michael S. Tsirkin");
MODULE_DESCRIPTION("Host kernel accelerator for virtio");
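/*
 * Usage sketch (illustration only, not part of vhost.c): how userspace
 * drives the worker ioctls implemented by vhost_worker_ioctl() above.
 * Assumptions: the running kernel exposes VHOST_NEW_WORKER and
 * VHOST_ATTACH_VRING_WORKER through <linux/vhost.h>, /dev/vhost-net is
 * available, and the device is left in the default task-based
 * fork_owner mode (otherwise VHOST_NEW_WORKER is rejected, as checked
 * above). The vring index 0 and the helper name are arbitrary; error
 * handling is reduced to early returns.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

/* Create one extra worker and make vring 'vring_idx' use it. */
static int attach_private_worker(int vhost_fd, unsigned int vring_idx)
{
	struct vhost_worker_state state = { 0 };
	struct vhost_vring_worker vring_worker = { 0 };

	/* Ownership must be established before any worker ioctl. */
	if (ioctl(vhost_fd, VHOST_SET_OWNER))
		return -1;

	/* Ask the kernel for a new worker; it returns the worker id. */
	if (ioctl(vhost_fd, VHOST_NEW_WORKER, &state))
		return -1;

	/* Point the chosen virtqueue at the new worker. */
	vring_worker.index = vring_idx;
	vring_worker.worker_id = state.worker_id;
	if (ioctl(vhost_fd, VHOST_ATTACH_VRING_WORKER, &vring_worker))
		return -1;

	printf("vring %u now served by worker %u\n",
	       vring_idx, state.worker_id);
	return 0;
}

int main(void)
{
	int vhost_fd = open("/dev/vhost-net", O_RDWR);

	if (vhost_fd < 0)
		return 1;
	if (attach_private_worker(vhost_fd, 0))
		perror("vhost worker setup");
	close(vhost_fd);
	return 0;
}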
// SPDX-License-Identifier: GPL-2.0
/*
 * Qualcomm Serial USB driver
 *
 * Copyright (c) 2008 QUALCOMM Incorporated.
 * Copyright (c) 2009 Greg Kroah-Hartman <gregkh@suse.de>
 * Copyright (c) 2009 Novell Inc.
*/ #include <linux/tty.h> #include <linux/tty_flip.h> #include <linux/module.h> #include <linux/usb.h> #include <linux/usb/serial.h> #include <linux/slab.h> #include "usb-wwan.h" #define DRIVER_AUTHOR "Qualcomm Inc" #define DRIVER_DESC "Qualcomm USB Serial driver" #define QUECTEL_EC20_PID 0x9215 /* standard device layouts supported by this driver */ enum qcserial_layouts { QCSERIAL_G2K = 0, /* Gobi 2000 */ QCSERIAL_G1K = 1, /* Gobi 1000 */ QCSERIAL_SWI = 2, /* Sierra Wireless */ QCSERIAL_HWI = 3, /* Huawei */ }; #define DEVICE_G1K(v, p) \ USB_DEVICE(v, p), .driver_info = QCSERIAL_G1K #define DEVICE_SWI(v, p) \ USB_DEVICE(v, p), .driver_info = QCSERIAL_SWI #define DEVICE_HWI(v, p) \ USB_DEVICE(v, p), .driver_info = QCSERIAL_HWI static const struct usb_device_id id_table[] = { /* Gobi 1000 devices */ {DEVICE_G1K(0x05c6, 0x9211)}, /* Acer Gobi QDL device */ {DEVICE_G1K(0x05c6, 0x9212)}, /* Acer Gobi Modem Device */ {DEVICE_G1K(0x03f0, 0x1f1d)}, /* HP un2400 Gobi Modem Device */ {DEVICE_G1K(0x03f0, 0x201d)}, /* HP un2400 Gobi QDL Device */ {DEVICE_G1K(0x04da, 0x250d)}, /* Panasonic Gobi Modem device */ {DEVICE_G1K(0x04da, 0x250c)}, /* Panasonic Gobi QDL device */ {DEVICE_G1K(0x413c, 0x8172)}, /* Dell Gobi Modem device */ {DEVICE_G1K(0x413c, 0x8171)}, /* Dell Gobi QDL device */ {DEVICE_G1K(0x1410, 0xa001)}, /* Novatel/Verizon USB-1000 */ {DEVICE_G1K(0x1410, 0xa002)}, /* Novatel Gobi Modem device */ {DEVICE_G1K(0x1410, 0xa003)}, /* Novatel Gobi Modem device */ {DEVICE_G1K(0x1410, 0xa004)}, /* Novatel Gobi Modem device */ {DEVICE_G1K(0x1410, 0xa005)}, /* Novatel Gobi Modem device */ {DEVICE_G1K(0x1410, 0xa006)}, /* Novatel Gobi Modem device */ {DEVICE_G1K(0x1410, 0xa007)}, /* Novatel Gobi Modem device */ {DEVICE_G1K(0x1410, 0xa008)}, /* Novatel Gobi QDL device */ {DEVICE_G1K(0x0b05, 0x1776)}, /* Asus Gobi Modem device */ {DEVICE_G1K(0x0b05, 0x1774)}, /* Asus Gobi QDL device */ {DEVICE_G1K(0x19d2, 0xfff3)}, /* ONDA Gobi Modem device */ {DEVICE_G1K(0x19d2, 0xfff2)}, /* ONDA Gobi QDL device */ {DEVICE_G1K(0x1557, 0x0a80)}, /* OQO Gobi QDL device */ {DEVICE_G1K(0x05c6, 0x9001)}, /* Generic Gobi Modem device */ {DEVICE_G1K(0x05c6, 0x9002)}, /* Generic Gobi Modem device */ {DEVICE_G1K(0x05c6, 0x9202)}, /* Generic Gobi Modem device */ {DEVICE_G1K(0x05c6, 0x9203)}, /* Generic Gobi Modem device */ {DEVICE_G1K(0x05c6, 0x9222)}, /* Generic Gobi Modem device */ {DEVICE_G1K(0x05c6, 0x9008)}, /* Generic Gobi QDL device */ {DEVICE_G1K(0x05c6, 0x9009)}, /* Generic Gobi Modem device */ {DEVICE_G1K(0x05c6, 0x9201)}, /* Generic Gobi QDL device */ {DEVICE_G1K(0x05c6, 0x9221)}, /* Generic Gobi QDL device */ {DEVICE_G1K(0x05c6, 0x9231)}, /* Generic Gobi QDL device */ {DEVICE_G1K(0x1f45, 0x0001)}, /* Unknown Gobi QDL device */ {DEVICE_G1K(0x1bc7, 0x900e)}, /* Telit Gobi QDL device */ /* Gobi 2000 devices */ {USB_DEVICE(0x1410, 0xa010)}, /* Novatel Gobi 2000 QDL device */ {USB_DEVICE(0x1410, 0xa011)}, /* Novatel Gobi 2000 QDL device */ {USB_DEVICE(0x1410, 0xa012)}, /* Novatel Gobi 2000 QDL device */ {USB_DEVICE(0x1410, 0xa013)}, /* Novatel Gobi 2000 QDL device */ {USB_DEVICE(0x1410, 0xa014)}, /* Novatel Gobi 2000 QDL device */ {USB_DEVICE(0x413c, 0x8185)}, /* Dell Gobi 2000 QDL device (N0218, VU936) */ {USB_DEVICE(0x413c, 0x8186)}, /* Dell Gobi 2000 Modem device (N0218, VU936) */ {USB_DEVICE(0x05c6, 0x9208)}, /* Generic Gobi 2000 QDL device */ {USB_DEVICE(0x05c6, 0x920b)}, /* Generic Gobi 2000 Modem device */ {USB_DEVICE(0x05c6, 0x9224)}, /* Sony Gobi 2000 QDL device (N0279, VU730) */ {USB_DEVICE(0x05c6, 0x9225)}, /* 
Sony Gobi 2000 Modem device (N0279, VU730) */ {USB_DEVICE(0x05c6, 0x9244)}, /* Samsung Gobi 2000 QDL device (VL176) */ {USB_DEVICE(0x05c6, 0x9245)}, /* Samsung Gobi 2000 Modem device (VL176) */ {USB_DEVICE(0x03f0, 0x241d)}, /* HP Gobi 2000 QDL device (VP412) */ {USB_DEVICE(0x03f0, 0x251d)}, /* HP Gobi 2000 Modem device (VP412) */ {USB_DEVICE(0x05c6, 0x9214)}, /* Acer Gobi 2000 QDL device (VP413) */ {USB_DEVICE(0x05c6, 0x9215)}, /* Acer Gobi 2000 Modem device (VP413) */ {USB_DEVICE(0x05c6, 0x9264)}, /* Asus Gobi 2000 QDL device (VR305) */ {USB_DEVICE(0x05c6, 0x9265)}, /* Asus Gobi 2000 Modem device (VR305) */ {USB_DEVICE(0x05c6, 0x9234)}, /* Top Global Gobi 2000 QDL device (VR306) */ {USB_DEVICE(0x05c6, 0x9235)}, /* Top Global Gobi 2000 Modem device (VR306) */ {USB_DEVICE(0x05c6, 0x9274)}, /* iRex Technologies Gobi 2000 QDL device (VR307) */ {USB_DEVICE(0x05c6, 0x9275)}, /* iRex Technologies Gobi 2000 Modem device (VR307) */ {USB_DEVICE(0x1199, 0x9000)}, /* Sierra Wireless Gobi 2000 QDL device (VT773) */ {USB_DEVICE(0x1199, 0x9001)}, /* Sierra Wireless Gobi 2000 Modem device (VT773) */ {USB_DEVICE(0x1199, 0x9002)}, /* Sierra Wireless Gobi 2000 Modem device (VT773) */ {USB_DEVICE(0x1199, 0x9003)}, /* Sierra Wireless Gobi 2000 Modem device (VT773) */ {USB_DEVICE(0x1199, 0x9004)}, /* Sierra Wireless Gobi 2000 Modem device (VT773) */ {USB_DEVICE(0x1199, 0x9005)}, /* Sierra Wireless Gobi 2000 Modem device (VT773) */ {USB_DEVICE(0x1199, 0x9006)}, /* Sierra Wireless Gobi 2000 Modem device (VT773) */ {USB_DEVICE(0x1199, 0x9007)}, /* Sierra Wireless Gobi 2000 Modem device (VT773) */ {USB_DEVICE(0x1199, 0x9008)}, /* Sierra Wireless Gobi 2000 Modem device (VT773) */ {USB_DEVICE(0x1199, 0x9009)}, /* Sierra Wireless Gobi 2000 Modem device (VT773) */ {USB_DEVICE(0x1199, 0x900a)}, /* Sierra Wireless Gobi 2000 Modem device (VT773) */ {USB_DEVICE(0x1199, 0x9011)}, /* Sierra Wireless Gobi 2000 Modem device (MC8305) */ {USB_DEVICE(0x16d8, 0x8001)}, /* CMDTech Gobi 2000 QDL device (VU922) */ {USB_DEVICE(0x16d8, 0x8002)}, /* CMDTech Gobi 2000 Modem device (VU922) */ {USB_DEVICE(0x05c6, 0x9204)}, /* Gobi 2000 QDL device */ {USB_DEVICE(0x05c6, 0x9205)}, /* Gobi 2000 Modem device */ /* Gobi 3000 devices */ {USB_DEVICE(0x03f0, 0x371d)}, /* HP un2430 Gobi 3000 QDL */ {USB_DEVICE(0x05c6, 0x920c)}, /* Gobi 3000 QDL */ {USB_DEVICE(0x05c6, 0x920d)}, /* Gobi 3000 Composite */ {USB_DEVICE(0x1410, 0xa020)}, /* Novatel Gobi 3000 QDL */ {USB_DEVICE(0x1410, 0xa021)}, /* Novatel Gobi 3000 Composite */ {USB_DEVICE(0x413c, 0x8193)}, /* Dell Gobi 3000 QDL */ {USB_DEVICE(0x413c, 0x8194)}, /* Dell Gobi 3000 Composite */ {USB_DEVICE(0x413c, 0x81a6)}, /* Dell DW5570 QDL (MC8805) */ {USB_DEVICE(0x1199, 0x68a4)}, /* Sierra Wireless QDL */ {USB_DEVICE(0x1199, 0x68a5)}, /* Sierra Wireless Modem */ {USB_DEVICE(0x1199, 0x68a8)}, /* Sierra Wireless QDL */ {USB_DEVICE(0x1199, 0x68a9)}, /* Sierra Wireless Modem */ {USB_DEVICE(0x1199, 0x9010)}, /* Sierra Wireless Gobi 3000 QDL */ {USB_DEVICE(0x1199, 0x9012)}, /* Sierra Wireless Gobi 3000 QDL */ {USB_DEVICE(0x1199, 0x9013)}, /* Sierra Wireless Gobi 3000 Modem device (MC8355) */ {USB_DEVICE(0x1199, 0x9014)}, /* Sierra Wireless Gobi 3000 QDL */ {USB_DEVICE(0x1199, 0x9015)}, /* Sierra Wireless Gobi 3000 Modem device */ {USB_DEVICE(0x1199, 0x9018)}, /* Sierra Wireless Gobi 3000 QDL */ {USB_DEVICE(0x1199, 0x9019)}, /* Sierra Wireless Gobi 3000 Modem device */ {USB_DEVICE(0x1199, 0x901b)}, /* Sierra Wireless MC7770 */ {USB_DEVICE(0x12D1, 0x14F0)}, /* Sony Gobi 3000 QDL */ {USB_DEVICE(0x12D1, 
0x14F1)}, /* Sony Gobi 3000 Composite */ {USB_DEVICE(0x0AF0, 0x8120)}, /* Option GTM681W */ /* non-Gobi Sierra Wireless devices */ {DEVICE_SWI(0x03f0, 0x4e1d)}, /* HP lt4111 LTE/EV-DO/HSPA+ Gobi 4G Module */ {DEVICE_SWI(0x0f3d, 0x68a2)}, /* Sierra Wireless MC7700 */ {DEVICE_SWI(0x114f, 0x68a2)}, /* Sierra Wireless MC7750 */ {DEVICE_SWI(0x1199, 0x68a2)}, /* Sierra Wireless MC7710 */ {DEVICE_SWI(0x1199, 0x68c0)}, /* Sierra Wireless MC7304/MC7354 */ {DEVICE_SWI(0x1199, 0x901c)}, /* Sierra Wireless EM7700 */ {DEVICE_SWI(0x1199, 0x901e)}, /* Sierra Wireless EM7355 QDL */ {DEVICE_SWI(0x1199, 0x901f)}, /* Sierra Wireless EM7355 */ {DEVICE_SWI(0x1199, 0x9040)}, /* Sierra Wireless Modem */ {DEVICE_SWI(0x1199, 0x9041)}, /* Sierra Wireless MC7305/MC7355 */ {DEVICE_SWI(0x1199, 0x9051)}, /* Netgear AirCard 340U */ {DEVICE_SWI(0x1199, 0x9053)}, /* Sierra Wireless Modem */ {DEVICE_SWI(0x1199, 0x9054)}, /* Sierra Wireless Modem */ {DEVICE_SWI(0x1199, 0x9055)}, /* Netgear AirCard 341U */ {DEVICE_SWI(0x1199, 0x9056)}, /* Sierra Wireless Modem */ {DEVICE_SWI(0x1199, 0x9060)}, /* Sierra Wireless Modem */ {DEVICE_SWI(0x1199, 0x9061)}, /* Sierra Wireless Modem */ {DEVICE_SWI(0x1199, 0x9062)}, /* Sierra Wireless EM7305 QDL */ {DEVICE_SWI(0x1199, 0x9063)}, /* Sierra Wireless EM7305 */ {DEVICE_SWI(0x1199, 0x9070)}, /* Sierra Wireless MC74xx */ {DEVICE_SWI(0x1199, 0x9071)}, /* Sierra Wireless MC74xx */ {DEVICE_SWI(0x1199, 0x9078)}, /* Sierra Wireless EM74xx */ {DEVICE_SWI(0x1199, 0x9079)}, /* Sierra Wireless EM74xx */ {DEVICE_SWI(0x1199, 0x907a)}, /* Sierra Wireless EM74xx QDL */ {DEVICE_SWI(0x1199, 0x907b)}, /* Sierra Wireless EM74xx */ {DEVICE_SWI(0x1199, 0x9090)}, /* Sierra Wireless EM7565 QDL */ {DEVICE_SWI(0x1199, 0x9091)}, /* Sierra Wireless EM7565 */ {DEVICE_SWI(0x1199, 0x90d2)}, /* Sierra Wireless EM9191 QDL */ {DEVICE_SWI(0x1199, 0x90e4)}, /* Sierra Wireless EM86xx QDL*/ {DEVICE_SWI(0x1199, 0x90e5)}, /* Sierra Wireless EM86xx */ {DEVICE_SWI(0x1199, 0xc080)}, /* Sierra Wireless EM7590 QDL */ {DEVICE_SWI(0x1199, 0xc081)}, /* Sierra Wireless EM7590 */ {DEVICE_SWI(0x413c, 0x81a2)}, /* Dell Wireless 5806 Gobi(TM) 4G LTE Mobile Broadband Card */ {DEVICE_SWI(0x413c, 0x81a3)}, /* Dell Wireless 5570 HSPA+ (42Mbps) Mobile Broadband Card */ {DEVICE_SWI(0x413c, 0x81a4)}, /* Dell Wireless 5570e HSPA+ (42Mbps) Mobile Broadband Card */ {DEVICE_SWI(0x413c, 0x81a8)}, /* Dell Wireless 5808 Gobi(TM) 4G LTE Mobile Broadband Card */ {DEVICE_SWI(0x413c, 0x81a9)}, /* Dell Wireless 5808e Gobi(TM) 4G LTE Mobile Broadband Card */ {DEVICE_SWI(0x413c, 0x81b1)}, /* Dell Wireless 5809e Gobi(TM) 4G LTE Mobile Broadband Card */ {DEVICE_SWI(0x413c, 0x81b3)}, /* Dell Wireless 5809e Gobi(TM) 4G LTE Mobile Broadband Card (rev3) */ {DEVICE_SWI(0x413c, 0x81b5)}, /* Dell Wireless 5811e QDL */ {DEVICE_SWI(0x413c, 0x81b6)}, /* Dell Wireless 5811e QDL */ {DEVICE_SWI(0x413c, 0x81c2)}, /* Dell Wireless 5811e */ {DEVICE_SWI(0x413c, 0x81cb)}, /* Dell Wireless 5816e QDL */ {DEVICE_SWI(0x413c, 0x81cc)}, /* Dell Wireless 5816e */ {DEVICE_SWI(0x413c, 0x81cf)}, /* Dell Wireless 5819 */ {DEVICE_SWI(0x413c, 0x81d0)}, /* Dell Wireless 5819 */ {DEVICE_SWI(0x413c, 0x81d1)}, /* Dell Wireless 5818 */ {DEVICE_SWI(0x413c, 0x81d2)}, /* Dell Wireless 5818 */ {DEVICE_SWI(0x413c, 0x8217)}, /* Dell Wireless DW5826e */ {DEVICE_SWI(0x413c, 0x8218)}, /* Dell Wireless DW5826e QDL */ /* Huawei devices */ {DEVICE_HWI(0x03f0, 0x581d)}, /* HP lt4112 LTE/HSPA+ Gobi 4G Modem (Huawei me906e) */ { } /* Terminating entry */ }; MODULE_DEVICE_TABLE(usb, id_table); static int 
handle_quectel_ec20(struct device *dev, int ifnum) { int altsetting = 0; /* * Quectel EC20 Mini PCIe LTE module layout: * 0: DM/DIAG (use libqcdm from ModemManager for communication) * 1: NMEA * 2: AT-capable modem port * 3: Modem interface * 4: NDIS */ switch (ifnum) { case 0: dev_dbg(dev, "Quectel EC20 DM/DIAG interface found\n"); break; case 1: dev_dbg(dev, "Quectel EC20 NMEA GPS interface found\n"); break; case 2: case 3: dev_dbg(dev, "Quectel EC20 Modem port found\n"); break; case 4: /* Don't claim the QMI/net interface */ altsetting = -1; break; } return altsetting; } static int qcprobe(struct usb_serial *serial, const struct usb_device_id *id) { struct usb_host_interface *intf = serial->interface->cur_altsetting; struct device *dev = &serial->dev->dev; int retval = -ENODEV; __u8 nintf; __u8 ifnum; int altsetting = -1; bool sendsetup = false; /* we only support vendor specific functions */ if (intf->desc.bInterfaceClass != USB_CLASS_VENDOR_SPEC) goto done; nintf = serial->dev->actconfig->desc.bNumInterfaces; dev_dbg(dev, "Num Interfaces = %d\n", nintf); ifnum = intf->desc.bInterfaceNumber; dev_dbg(dev, "This Interface = %d\n", ifnum); if (nintf == 1) { /* QDL mode */ /* Gobi 2000 has a single altsetting, older ones have two */ if (serial->interface->num_altsetting == 2) intf = usb_altnum_to_altsetting(serial->interface, 1); else if (serial->interface->num_altsetting > 2) goto done; if (intf && intf->desc.bNumEndpoints == 2 && usb_endpoint_is_bulk_in(&intf->endpoint[0].desc) && usb_endpoint_is_bulk_out(&intf->endpoint[1].desc)) { dev_dbg(dev, "QDL port found\n"); if (serial->interface->num_altsetting == 1) retval = 0; /* Success */ else altsetting = 1; } goto done; } /* default to enabling interface */ altsetting = 0; /* * Composite mode; don't bind to the QMI/net interface as that * gets handled by other drivers. 
*/ switch (id->driver_info) { case QCSERIAL_G1K: /* * Gobi 1K USB layout: * 0: DM/DIAG (use libqcdm from ModemManager for communication) * 1: serial port (doesn't respond) * 2: AT-capable modem port * 3: QMI/net */ if (nintf < 3 || nintf > 4) { dev_err(dev, "unknown number of interfaces: %d\n", nintf); altsetting = -1; goto done; } if (ifnum == 0) { dev_dbg(dev, "Gobi 1K DM/DIAG interface found\n"); altsetting = 1; } else if (ifnum == 2) dev_dbg(dev, "Modem port found\n"); else altsetting = -1; break; case QCSERIAL_G2K: /* handle non-standard layouts */ if (nintf == 5 && id->idProduct == QUECTEL_EC20_PID) { altsetting = handle_quectel_ec20(dev, ifnum); goto done; } /* * Gobi 2K+ USB layout: * 0: QMI/net * 1: DM/DIAG (use libqcdm from ModemManager for communication) * 2: AT-capable modem port * 3: NMEA */ if (nintf < 3 || nintf > 4) { dev_err(dev, "unknown number of interfaces: %d\n", nintf); altsetting = -1; goto done; } switch (ifnum) { case 0: /* Don't claim the QMI/net interface */ altsetting = -1; break; case 1: dev_dbg(dev, "Gobi 2K+ DM/DIAG interface found\n"); break; case 2: dev_dbg(dev, "Modem port found\n"); break; case 3: /* * NMEA (serial line 9600 8N1) * # echo "\$GPS_START" > /dev/ttyUSBx * # echo "\$GPS_STOP" > /dev/ttyUSBx */ dev_dbg(dev, "Gobi 2K+ NMEA GPS interface found\n"); break; } break; case QCSERIAL_SWI: /* * Sierra Wireless layout: * 0: DM/DIAG (use libqcdm from ModemManager for communication) * 2: NMEA * 3: AT-capable modem port * 8: QMI/net */ switch (ifnum) { case 0: dev_dbg(dev, "DM/DIAG interface found\n"); break; case 2: dev_dbg(dev, "NMEA GPS interface found\n"); sendsetup = true; break; case 3: dev_dbg(dev, "Modem port found\n"); sendsetup = true; break; default: /* don't claim any unsupported interface */ altsetting = -1; break; } break; case QCSERIAL_HWI: /* * Huawei devices map functions by subclass + protocol * instead of interface numbers. The protocol identify * a specific function, while the subclass indicate a * specific firmware source * * This is a list of functions known to be non-serial. 
The rest * are assumed to be serial and will be handled by this driver */ switch (intf->desc.bInterfaceProtocol) { /* QMI combined (qmi_wwan) */ case 0x07: case 0x37: case 0x67: /* QMI data (qmi_wwan) */ case 0x08: case 0x38: case 0x68: /* QMI control (qmi_wwan) */ case 0x09: case 0x39: case 0x69: /* NCM like (huawei_cdc_ncm) */ case 0x16: case 0x46: case 0x76: altsetting = -1; break; default: dev_dbg(dev, "Huawei type serial port found (%02x/%02x/%02x)\n", intf->desc.bInterfaceClass, intf->desc.bInterfaceSubClass, intf->desc.bInterfaceProtocol); } break; default: dev_err(dev, "unsupported device layout type: %lu\n", id->driver_info); break; } done: if (altsetting >= 0) { retval = usb_set_interface(serial->dev, ifnum, altsetting); if (retval < 0) { dev_err(dev, "Could not set interface, error %d\n", retval); retval = -ENODEV; } } if (!retval) usb_set_serial_data(serial, (void *)(unsigned long)sendsetup); return retval; } static int qc_attach(struct usb_serial *serial) { struct usb_wwan_intf_private *data; bool sendsetup; data = kzalloc(sizeof(*data), GFP_KERNEL); if (!data) return -ENOMEM; sendsetup = !!(unsigned long)(usb_get_serial_data(serial)); if (sendsetup) data->use_send_setup = 1; spin_lock_init(&data->susp_lock); usb_set_serial_data(serial, data); return 0; } static void qc_release(struct usb_serial *serial) { struct usb_wwan_intf_private *priv = usb_get_serial_data(serial); usb_set_serial_data(serial, NULL); kfree(priv); } static struct usb_serial_driver qcdevice = { .driver = { .name = "qcserial", }, .description = "Qualcomm USB modem", .id_table = id_table, .num_ports = 1, .probe = qcprobe, .open = usb_wwan_open, .close = usb_wwan_close, .dtr_rts = usb_wwan_dtr_rts, .write = usb_wwan_write, .write_room = usb_wwan_write_room, .chars_in_buffer = usb_wwan_chars_in_buffer, .tiocmget = usb_wwan_tiocmget, .tiocmset = usb_wwan_tiocmset, .attach = qc_attach, .release = qc_release, .port_probe = usb_wwan_port_probe, .port_remove = usb_wwan_port_remove, #ifdef CONFIG_PM .suspend = usb_wwan_suspend, .resume = usb_wwan_resume, #endif }; static struct usb_serial_driver * const serial_drivers[] = { &qcdevice, NULL }; module_usb_serial_driver(serial_drivers, id_table); MODULE_AUTHOR(DRIVER_AUTHOR); MODULE_DESCRIPTION(DRIVER_DESC); MODULE_LICENSE("GPL v2");
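The qcprobe() routine above chooses an altsetting (or declines the interface) purely from the interface number and the layout type carried in driver_info. As an illustration only, here is a small stand-alone C sketch, with hypothetical names, that mirrors the QCSERIAL_SWI branch: interfaces 0, 2 and 3 are treated as serial ports (2 and 3 additionally request DTR/RTS setup), and anything else, such as the QMI/net function, is left for other drivers.

/*
 * Illustrative sketch only -- not part of qcserial.c. It mirrors the
 * QCSERIAL_SWI switch in qcprobe(): interface 0 (DM/DIAG), 2 (NMEA) and
 * 3 (modem) are claimed, anything else (e.g. 8: QMI/net) is skipped.
 * swi_claim_interface() is a hypothetical helper name.
 */
#include <stdbool.h>
#include <stdio.h>

static int swi_claim_interface(int ifnum, bool *sendsetup)
{
	*sendsetup = false;
	switch (ifnum) {
	case 0:				/* DM/DIAG */
		return 0;
	case 2:				/* NMEA */
	case 3:				/* AT-capable modem port */
		*sendsetup = true;	/* driver sends DTR/RTS setup */
		return 0;
	default:			/* QMI/net etc., handled elsewhere */
		return -1;
	}
}

int main(void)
{
	for (int ifnum = 0; ifnum <= 8; ifnum++) {
		bool sendsetup;
		int alt = swi_claim_interface(ifnum, &sendsetup);

		printf("ifnum %d: %s%s\n", ifnum,
		       alt < 0 ? "not claimed" : "serial port",
		       sendsetup ? " (sendsetup)" : "");
	}
	return 0;
}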
// SPDX-License-Identifier: GPL-2.0+ /* * Nano River Technologies viperboard GPIO lib driver * * (C) 2012 by Lemonage GmbH * Author: Lars Poeschel <poeschel@lemonage.de> * All rights reserved.
*/ #include <linux/kernel.h> #include <linux/errno.h> #include <linux/module.h> #include <linux/slab.h> #include <linux/types.h> #include <linux/mutex.h> #include <linux/platform_device.h> #include <linux/usb.h> #include <linux/gpio/driver.h> #include <linux/mfd/viperboard.h> #define VPRBRD_GPIOA_CLK_1MHZ 0 #define VPRBRD_GPIOA_CLK_100KHZ 1 #define VPRBRD_GPIOA_CLK_10KHZ 2 #define VPRBRD_GPIOA_CLK_1KHZ 3 #define VPRBRD_GPIOA_CLK_100HZ 4 #define VPRBRD_GPIOA_CLK_10HZ 5 #define VPRBRD_GPIOA_FREQ_DEFAULT 1000 #define VPRBRD_GPIOA_CMD_CONT 0x00 #define VPRBRD_GPIOA_CMD_PULSE 0x01 #define VPRBRD_GPIOA_CMD_PWM 0x02 #define VPRBRD_GPIOA_CMD_SETOUT 0x03 #define VPRBRD_GPIOA_CMD_SETIN 0x04 #define VPRBRD_GPIOA_CMD_SETINT 0x05 #define VPRBRD_GPIOA_CMD_GETIN 0x06 #define VPRBRD_GPIOB_CMD_SETDIR 0x00 #define VPRBRD_GPIOB_CMD_SETVAL 0x01 struct vprbrd_gpioa_msg { u8 cmd; u8 clk; u8 offset; u8 t1; u8 t2; u8 invert; u8 pwmlevel; u8 outval; u8 risefall; u8 answer; u8 __fill; } __packed; struct vprbrd_gpiob_msg { u8 cmd; u16 val; u16 mask; } __packed; struct vprbrd_gpio { struct gpio_chip gpioa; /* gpio a related things */ u32 gpioa_out; u32 gpioa_val; struct gpio_chip gpiob; /* gpio b related things */ u32 gpiob_out; u32 gpiob_val; struct vprbrd *vb; }; /* gpioa sampling clock module parameter */ static unsigned char gpioa_clk; static unsigned int gpioa_freq = VPRBRD_GPIOA_FREQ_DEFAULT; module_param(gpioa_freq, uint, 0); MODULE_PARM_DESC(gpioa_freq, "gpio-a sampling freq in Hz (default is 1000Hz) valid values: 10, 100, 1000, 10000, 100000, 1000000"); /* ----- begin of gipo a chip -------------------------------------------- */ static int vprbrd_gpioa_get(struct gpio_chip *chip, unsigned int offset) { int ret, answer, error = 0; struct vprbrd_gpio *gpio = gpiochip_get_data(chip); struct vprbrd *vb = gpio->vb; struct vprbrd_gpioa_msg *gamsg = (struct vprbrd_gpioa_msg *)vb->buf; /* if io is set to output, just return the saved value */ if (gpio->gpioa_out & (1 << offset)) return !!(gpio->gpioa_val & (1 << offset)); mutex_lock(&vb->lock); gamsg->cmd = VPRBRD_GPIOA_CMD_GETIN; gamsg->clk = 0x00; gamsg->offset = offset; gamsg->t1 = 0x00; gamsg->t2 = 0x00; gamsg->invert = 0x00; gamsg->pwmlevel = 0x00; gamsg->outval = 0x00; gamsg->risefall = 0x00; gamsg->answer = 0x00; gamsg->__fill = 0x00; ret = usb_control_msg(vb->usb_dev, usb_sndctrlpipe(vb->usb_dev, 0), VPRBRD_USB_REQUEST_GPIOA, VPRBRD_USB_TYPE_OUT, 0x0000, 0x0000, gamsg, sizeof(struct vprbrd_gpioa_msg), VPRBRD_USB_TIMEOUT_MS); if (ret != sizeof(struct vprbrd_gpioa_msg)) error = -EREMOTEIO; ret = usb_control_msg(vb->usb_dev, usb_rcvctrlpipe(vb->usb_dev, 0), VPRBRD_USB_REQUEST_GPIOA, VPRBRD_USB_TYPE_IN, 0x0000, 0x0000, gamsg, sizeof(struct vprbrd_gpioa_msg), VPRBRD_USB_TIMEOUT_MS); answer = gamsg->answer & 0x01; mutex_unlock(&vb->lock); if (ret != sizeof(struct vprbrd_gpioa_msg)) error = -EREMOTEIO; if (error) return error; return answer; } static int vprbrd_gpioa_set(struct gpio_chip *chip, unsigned int offset, int value) { int ret = 0; struct vprbrd_gpio *gpio = gpiochip_get_data(chip); struct vprbrd *vb = gpio->vb; struct vprbrd_gpioa_msg *gamsg = (struct vprbrd_gpioa_msg *)vb->buf; if (!(gpio->gpioa_out & (1 << offset))) return 0; if (value) gpio->gpioa_val |= (1 << offset); else gpio->gpioa_val &= ~(1 << offset); mutex_lock(&vb->lock); gamsg->cmd = VPRBRD_GPIOA_CMD_SETOUT; gamsg->clk = 0x00; gamsg->offset = offset; gamsg->t1 = 0x00; gamsg->t2 = 0x00; gamsg->invert = 0x00; gamsg->pwmlevel = 0x00; gamsg->outval = value; gamsg->risefall = 0x00; gamsg->answer 
= 0x00; gamsg->__fill = 0x00; ret = usb_control_msg(vb->usb_dev, usb_sndctrlpipe(vb->usb_dev, 0), VPRBRD_USB_REQUEST_GPIOA, VPRBRD_USB_TYPE_OUT, 0x0000, 0x0000, gamsg, sizeof(struct vprbrd_gpioa_msg), VPRBRD_USB_TIMEOUT_MS); mutex_unlock(&vb->lock); if (ret != sizeof(struct vprbrd_gpioa_msg)) { dev_err(chip->parent, "usb error setting pin value\n"); return -EREMOTEIO; } return 0; } static int vprbrd_gpioa_direction_input(struct gpio_chip *chip, unsigned int offset) { int ret; struct vprbrd_gpio *gpio = gpiochip_get_data(chip); struct vprbrd *vb = gpio->vb; struct vprbrd_gpioa_msg *gamsg = (struct vprbrd_gpioa_msg *)vb->buf; gpio->gpioa_out &= ~(1 << offset); mutex_lock(&vb->lock); gamsg->cmd = VPRBRD_GPIOA_CMD_SETIN; gamsg->clk = gpioa_clk; gamsg->offset = offset; gamsg->t1 = 0x00; gamsg->t2 = 0x00; gamsg->invert = 0x00; gamsg->pwmlevel = 0x00; gamsg->outval = 0x00; gamsg->risefall = 0x00; gamsg->answer = 0x00; gamsg->__fill = 0x00; ret = usb_control_msg(vb->usb_dev, usb_sndctrlpipe(vb->usb_dev, 0), VPRBRD_USB_REQUEST_GPIOA, VPRBRD_USB_TYPE_OUT, 0x0000, 0x0000, gamsg, sizeof(struct vprbrd_gpioa_msg), VPRBRD_USB_TIMEOUT_MS); mutex_unlock(&vb->lock); if (ret != sizeof(struct vprbrd_gpioa_msg)) return -EREMOTEIO; return 0; } static int vprbrd_gpioa_direction_output(struct gpio_chip *chip, unsigned int offset, int value) { int ret; struct vprbrd_gpio *gpio = gpiochip_get_data(chip); struct vprbrd *vb = gpio->vb; struct vprbrd_gpioa_msg *gamsg = (struct vprbrd_gpioa_msg *)vb->buf; gpio->gpioa_out |= (1 << offset); if (value) gpio->gpioa_val |= (1 << offset); else gpio->gpioa_val &= ~(1 << offset); mutex_lock(&vb->lock); gamsg->cmd = VPRBRD_GPIOA_CMD_SETOUT; gamsg->clk = 0x00; gamsg->offset = offset; gamsg->t1 = 0x00; gamsg->t2 = 0x00; gamsg->invert = 0x00; gamsg->pwmlevel = 0x00; gamsg->outval = value; gamsg->risefall = 0x00; gamsg->answer = 0x00; gamsg->__fill = 0x00; ret = usb_control_msg(vb->usb_dev, usb_sndctrlpipe(vb->usb_dev, 0), VPRBRD_USB_REQUEST_GPIOA, VPRBRD_USB_TYPE_OUT, 0x0000, 0x0000, gamsg, sizeof(struct vprbrd_gpioa_msg), VPRBRD_USB_TIMEOUT_MS); mutex_unlock(&vb->lock); if (ret != sizeof(struct vprbrd_gpioa_msg)) return -EREMOTEIO; return 0; } /* ----- end of gpio a chip ---------------------------------------------- */ /* ----- begin of gipo b chip -------------------------------------------- */ static int vprbrd_gpiob_setdir(struct vprbrd *vb, unsigned int offset, unsigned int dir) { struct vprbrd_gpiob_msg *gbmsg = (struct vprbrd_gpiob_msg *)vb->buf; int ret; gbmsg->cmd = VPRBRD_GPIOB_CMD_SETDIR; gbmsg->val = cpu_to_be16(dir << offset); gbmsg->mask = cpu_to_be16(0x0001 << offset); ret = usb_control_msg(vb->usb_dev, usb_sndctrlpipe(vb->usb_dev, 0), VPRBRD_USB_REQUEST_GPIOB, VPRBRD_USB_TYPE_OUT, 0x0000, 0x0000, gbmsg, sizeof(struct vprbrd_gpiob_msg), VPRBRD_USB_TIMEOUT_MS); if (ret != sizeof(struct vprbrd_gpiob_msg)) return -EREMOTEIO; return 0; } static int vprbrd_gpiob_get(struct gpio_chip *chip, unsigned int offset) { int ret; u16 val; struct vprbrd_gpio *gpio = gpiochip_get_data(chip); struct vprbrd *vb = gpio->vb; struct vprbrd_gpiob_msg *gbmsg = (struct vprbrd_gpiob_msg *)vb->buf; /* if io is set to output, just return the saved value */ if (gpio->gpiob_out & (1 << offset)) return gpio->gpiob_val & (1 << offset); mutex_lock(&vb->lock); ret = usb_control_msg(vb->usb_dev, usb_rcvctrlpipe(vb->usb_dev, 0), VPRBRD_USB_REQUEST_GPIOB, VPRBRD_USB_TYPE_IN, 0x0000, 0x0000, gbmsg, sizeof(struct vprbrd_gpiob_msg), VPRBRD_USB_TIMEOUT_MS); val = gbmsg->val; mutex_unlock(&vb->lock); if 
(ret != sizeof(struct vprbrd_gpiob_msg)) return ret; /* cache the read values */ gpio->gpiob_val = be16_to_cpu(val); return (gpio->gpiob_val >> offset) & 0x1; } static int vprbrd_gpiob_set(struct gpio_chip *chip, unsigned int offset, int value) { int ret; struct vprbrd_gpio *gpio = gpiochip_get_data(chip); struct vprbrd *vb = gpio->vb; struct vprbrd_gpiob_msg *gbmsg = (struct vprbrd_gpiob_msg *)vb->buf; if (!(gpio->gpiob_out & (1 << offset))) return 0; if (value) gpio->gpiob_val |= (1 << offset); else gpio->gpiob_val &= ~(1 << offset); mutex_lock(&vb->lock); gbmsg->cmd = VPRBRD_GPIOB_CMD_SETVAL; gbmsg->val = cpu_to_be16(value << offset); gbmsg->mask = cpu_to_be16(0x0001 << offset); ret = usb_control_msg(vb->usb_dev, usb_sndctrlpipe(vb->usb_dev, 0), VPRBRD_USB_REQUEST_GPIOB, VPRBRD_USB_TYPE_OUT, 0x0000, 0x0000, gbmsg, sizeof(struct vprbrd_gpiob_msg), VPRBRD_USB_TIMEOUT_MS); mutex_unlock(&vb->lock); if (ret != sizeof(struct vprbrd_gpiob_msg)) { dev_err(chip->parent, "usb error setting pin value\n"); return -EREMOTEIO; } return 0; } static int vprbrd_gpiob_direction_input(struct gpio_chip *chip, unsigned int offset) { int ret; struct vprbrd_gpio *gpio = gpiochip_get_data(chip); struct vprbrd *vb = gpio->vb; gpio->gpiob_out &= ~(1 << offset); mutex_lock(&vb->lock); ret = vprbrd_gpiob_setdir(vb, offset, 0); mutex_unlock(&vb->lock); if (ret) dev_err(chip->parent, "usb error setting pin to input\n"); return ret; } static int vprbrd_gpiob_direction_output(struct gpio_chip *chip, unsigned int offset, int value) { int ret; struct vprbrd_gpio *gpio = gpiochip_get_data(chip); struct vprbrd *vb = gpio->vb; gpio->gpiob_out |= (1 << offset); mutex_lock(&vb->lock); ret = vprbrd_gpiob_setdir(vb, offset, 1); mutex_unlock(&vb->lock); if (ret) { dev_err(chip->parent, "usb error setting pin to output\n"); return ret; } return vprbrd_gpiob_set(chip, offset, value); } /* ----- end of gpio b chip ---------------------------------------------- */ static int vprbrd_gpio_probe(struct platform_device *pdev) { struct vprbrd *vb = dev_get_drvdata(pdev->dev.parent); struct vprbrd_gpio *vb_gpio; int ret; vb_gpio = devm_kzalloc(&pdev->dev, sizeof(*vb_gpio), GFP_KERNEL); if (vb_gpio == NULL) return -ENOMEM; vb_gpio->vb = vb; /* registering gpio a */ vb_gpio->gpioa.label = "viperboard gpio a"; vb_gpio->gpioa.parent = &pdev->dev; vb_gpio->gpioa.owner = THIS_MODULE; vb_gpio->gpioa.base = -1; vb_gpio->gpioa.ngpio = 16; vb_gpio->gpioa.can_sleep = true; vb_gpio->gpioa.set = vprbrd_gpioa_set; vb_gpio->gpioa.get = vprbrd_gpioa_get; vb_gpio->gpioa.direction_input = vprbrd_gpioa_direction_input; vb_gpio->gpioa.direction_output = vprbrd_gpioa_direction_output; ret = devm_gpiochip_add_data(&pdev->dev, &vb_gpio->gpioa, vb_gpio); if (ret < 0) return ret; /* registering gpio b */ vb_gpio->gpiob.label = "viperboard gpio b"; vb_gpio->gpiob.parent = &pdev->dev; vb_gpio->gpiob.owner = THIS_MODULE; vb_gpio->gpiob.base = -1; vb_gpio->gpiob.ngpio = 16; vb_gpio->gpiob.can_sleep = true; vb_gpio->gpiob.set = vprbrd_gpiob_set; vb_gpio->gpiob.get = vprbrd_gpiob_get; vb_gpio->gpiob.direction_input = vprbrd_gpiob_direction_input; vb_gpio->gpiob.direction_output = vprbrd_gpiob_direction_output; return devm_gpiochip_add_data(&pdev->dev, &vb_gpio->gpiob, vb_gpio); } static struct platform_driver vprbrd_gpio_driver = { .driver.name = "viperboard-gpio", .probe = vprbrd_gpio_probe, }; static int __init vprbrd_gpio_init(void) { switch (gpioa_freq) { case 1000000: gpioa_clk = VPRBRD_GPIOA_CLK_1MHZ; break; case 100000: gpioa_clk = VPRBRD_GPIOA_CLK_100KHZ; 
break; case 10000: gpioa_clk = VPRBRD_GPIOA_CLK_10KHZ; break; case 1000: gpioa_clk = VPRBRD_GPIOA_CLK_1KHZ; break; case 100: gpioa_clk = VPRBRD_GPIOA_CLK_100HZ; break; case 10: gpioa_clk = VPRBRD_GPIOA_CLK_10HZ; break; default: pr_warn("invalid gpioa_freq (%d)\n", gpioa_freq); gpioa_clk = VPRBRD_GPIOA_CLK_1KHZ; } return platform_driver_register(&vprbrd_gpio_driver); } subsys_initcall(vprbrd_gpio_init); static void __exit vprbrd_gpio_exit(void) { platform_driver_unregister(&vprbrd_gpio_driver); } module_exit(vprbrd_gpio_exit); MODULE_AUTHOR("Lars Poeschel <poeschel@lemonage.de>"); MODULE_DESCRIPTION("GPIO driver for Nano River Techs Viperboard"); MODULE_LICENSE("GPL"); MODULE_ALIAS("platform:viperboard-gpio");
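For the GPIO B bank above, every direction or value change is sent as a three-field vprbrd_gpiob_msg whose val and mask words are big-endian on the wire, with exactly one bit set in the mask. The following stand-alone sketch (user-space, hypothetical helper names, assuming a little-endian host) shows how vprbrd_gpiob_set() effectively encodes a single-pin update.

/*
 * Illustrative only: a user-space sketch of how vprbrd_gpiob_set() encodes a
 * single-pin update into a vprbrd_gpiob_msg. The struct layout mirrors the
 * driver; the helpers here are hypothetical and not part of the kernel API.
 */
#include <stdint.h>
#include <stdio.h>

struct gpiob_msg {
	uint8_t  cmd;	/* VPRBRD_GPIOB_CMD_SETVAL == 0x01 */
	uint16_t val;	/* big-endian: new level shifted to the pin bit */
	uint16_t mask;	/* big-endian: only this pin is affected */
} __attribute__((packed));

static uint16_t to_be16(uint16_t v)
{
	/* byte swap, assuming a little-endian host */
	return (uint16_t)((v << 8) | (v >> 8));
}

static void encode_setval(struct gpiob_msg *m, unsigned int offset, int value)
{
	m->cmd  = 0x01;
	m->val  = to_be16((uint16_t)(value << offset));
	m->mask = to_be16((uint16_t)(1u << offset));
}

int main(void)
{
	struct gpiob_msg m;

	encode_setval(&m, 3, 1);	/* drive GPIO B pin 3 high */
	printf("cmd=%02x val=%04x mask=%04x\n", m.cmd, m.val, m.mask);
	return 0;
}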
/* SPDX-License-Identifier: GPL-2.0-or-later */ /* * Glue Code for the AVX/AES-NI/GFNI assembler implementation of the ARIA Cipher * * Copyright (c) 2022 Taehee Yoo <ap420073@gmail.com> */ #include <crypto/algapi.h> #include <crypto/aria.h> #include <linux/crypto.h> #include <linux/err.h> #include <linux/export.h> #include <linux/module.h> #include <linux/types.h> #include "ecb_cbc_helpers.h" #include "aria-avx.h" asmlinkage void aria_aesni_avx_encrypt_16way(const void *ctx, u8 *dst, const u8 *src); EXPORT_SYMBOL_GPL(aria_aesni_avx_encrypt_16way); asmlinkage void aria_aesni_avx_decrypt_16way(const void *ctx, u8 *dst, const u8 *src); EXPORT_SYMBOL_GPL(aria_aesni_avx_decrypt_16way); asmlinkage void aria_aesni_avx_ctr_crypt_16way(const void *ctx, u8 *dst, const u8 *src, u8 *keystream, u8 *iv); EXPORT_SYMBOL_GPL(aria_aesni_avx_ctr_crypt_16way); #ifdef CONFIG_AS_GFNI asmlinkage void aria_aesni_avx_gfni_encrypt_16way(const void *ctx, u8 *dst, const u8 *src); EXPORT_SYMBOL_GPL(aria_aesni_avx_gfni_encrypt_16way); asmlinkage void aria_aesni_avx_gfni_decrypt_16way(const void *ctx, u8 *dst, const u8 *src); EXPORT_SYMBOL_GPL(aria_aesni_avx_gfni_decrypt_16way); asmlinkage void aria_aesni_avx_gfni_ctr_crypt_16way(const void *ctx, u8 *dst, const u8 *src, u8 *keystream, u8 *iv); EXPORT_SYMBOL_GPL(aria_aesni_avx_gfni_ctr_crypt_16way); #endif /* CONFIG_AS_GFNI */ static struct aria_avx_ops aria_ops; struct aria_avx_request_ctx { u8 keystream[ARIA_AESNI_PARALLEL_BLOCK_SIZE]; }; static int ecb_do_encrypt(struct skcipher_request *req, const u32 *rkey) { ECB_WALK_START(req, ARIA_BLOCK_SIZE, ARIA_AESNI_PARALLEL_BLOCKS); ECB_BLOCK(ARIA_AESNI_PARALLEL_BLOCKS, aria_ops.aria_encrypt_16way); ECB_BLOCK(1, aria_encrypt); ECB_WALK_END(); } static int ecb_do_decrypt(struct skcipher_request *req, const u32 *rkey) { ECB_WALK_START(req, ARIA_BLOCK_SIZE, ARIA_AESNI_PARALLEL_BLOCKS); ECB_BLOCK(ARIA_AESNI_PARALLEL_BLOCKS, aria_ops.aria_decrypt_16way); ECB_BLOCK(1, aria_decrypt); ECB_WALK_END(); } static int aria_avx_ecb_encrypt(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); struct aria_ctx *ctx = crypto_skcipher_ctx(tfm); return ecb_do_encrypt(req, ctx->enc_key[0]); } static int aria_avx_ecb_decrypt(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); struct aria_ctx *ctx = crypto_skcipher_ctx(tfm); return ecb_do_decrypt(req, ctx->dec_key[0]); } static int aria_avx_set_key(struct crypto_skcipher *tfm, const u8 *key, unsigned int keylen) { return aria_set_key(&tfm->base, key, keylen); } static int aria_avx_ctr_encrypt(struct skcipher_request *req) { struct aria_avx_request_ctx *req_ctx = skcipher_request_ctx(req); struct crypto_skcipher *tfm =
crypto_skcipher_reqtfm(req); struct aria_ctx *ctx = crypto_skcipher_ctx(tfm); struct skcipher_walk walk; unsigned int nbytes; int err; err = skcipher_walk_virt(&walk, req, false); while ((nbytes = walk.nbytes) > 0) { const u8 *src = walk.src.virt.addr; u8 *dst = walk.dst.virt.addr; while (nbytes >= ARIA_AESNI_PARALLEL_BLOCK_SIZE) { kernel_fpu_begin(); aria_ops.aria_ctr_crypt_16way(ctx, dst, src, &req_ctx->keystream[0], walk.iv); kernel_fpu_end(); dst += ARIA_AESNI_PARALLEL_BLOCK_SIZE; src += ARIA_AESNI_PARALLEL_BLOCK_SIZE; nbytes -= ARIA_AESNI_PARALLEL_BLOCK_SIZE; } while (nbytes >= ARIA_BLOCK_SIZE) { memcpy(&req_ctx->keystream[0], walk.iv, ARIA_BLOCK_SIZE); crypto_inc(walk.iv, ARIA_BLOCK_SIZE); aria_encrypt(ctx, &req_ctx->keystream[0], &req_ctx->keystream[0]); crypto_xor_cpy(dst, src, &req_ctx->keystream[0], ARIA_BLOCK_SIZE); dst += ARIA_BLOCK_SIZE; src += ARIA_BLOCK_SIZE; nbytes -= ARIA_BLOCK_SIZE; } if (walk.nbytes == walk.total && nbytes > 0) { memcpy(&req_ctx->keystream[0], walk.iv, ARIA_BLOCK_SIZE); crypto_inc(walk.iv, ARIA_BLOCK_SIZE); aria_encrypt(ctx, &req_ctx->keystream[0], &req_ctx->keystream[0]); crypto_xor_cpy(dst, src, &req_ctx->keystream[0], nbytes); dst += nbytes; src += nbytes; nbytes = 0; } err = skcipher_walk_done(&walk, nbytes); } return err; } static int aria_avx_init_tfm(struct crypto_skcipher *tfm) { crypto_skcipher_set_reqsize(tfm, sizeof(struct aria_avx_request_ctx)); return 0; } static struct skcipher_alg aria_algs[] = { { .base.cra_name = "ecb(aria)", .base.cra_driver_name = "ecb-aria-avx", .base.cra_priority = 400, .base.cra_blocksize = ARIA_BLOCK_SIZE, .base.cra_ctxsize = sizeof(struct aria_ctx), .base.cra_module = THIS_MODULE, .min_keysize = ARIA_MIN_KEY_SIZE, .max_keysize = ARIA_MAX_KEY_SIZE, .setkey = aria_avx_set_key, .encrypt = aria_avx_ecb_encrypt, .decrypt = aria_avx_ecb_decrypt, }, { .base.cra_name = "ctr(aria)", .base.cra_driver_name = "ctr-aria-avx", .base.cra_priority = 400, .base.cra_blocksize = 1, .base.cra_ctxsize = sizeof(struct aria_ctx), .base.cra_module = THIS_MODULE, .min_keysize = ARIA_MIN_KEY_SIZE, .max_keysize = ARIA_MAX_KEY_SIZE, .ivsize = ARIA_BLOCK_SIZE, .chunksize = ARIA_BLOCK_SIZE, .walksize = 16 * ARIA_BLOCK_SIZE, .setkey = aria_avx_set_key, .encrypt = aria_avx_ctr_encrypt, .decrypt = aria_avx_ctr_encrypt, .init = aria_avx_init_tfm, } }; static int __init aria_avx_init(void) { const char *feature_name; if (!boot_cpu_has(X86_FEATURE_AVX) || !boot_cpu_has(X86_FEATURE_AES) || !boot_cpu_has(X86_FEATURE_OSXSAVE)) { pr_info("AVX or AES-NI instructions are not detected.\n"); return -ENODEV; } if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, &feature_name)) { pr_info("CPU feature '%s' is not supported.\n", feature_name); return -ENODEV; } if (boot_cpu_has(X86_FEATURE_GFNI) && IS_ENABLED(CONFIG_AS_GFNI)) { aria_ops.aria_encrypt_16way = aria_aesni_avx_gfni_encrypt_16way; aria_ops.aria_decrypt_16way = aria_aesni_avx_gfni_decrypt_16way; aria_ops.aria_ctr_crypt_16way = aria_aesni_avx_gfni_ctr_crypt_16way; } else { aria_ops.aria_encrypt_16way = aria_aesni_avx_encrypt_16way; aria_ops.aria_decrypt_16way = aria_aesni_avx_decrypt_16way; aria_ops.aria_ctr_crypt_16way = aria_aesni_avx_ctr_crypt_16way; } return crypto_register_skciphers(aria_algs, ARRAY_SIZE(aria_algs)); } static void __exit aria_avx_exit(void) { crypto_unregister_skciphers(aria_algs, ARRAY_SIZE(aria_algs)); } module_init(aria_avx_init); module_exit(aria_avx_exit); MODULE_LICENSE("GPL"); MODULE_AUTHOR("Taehee Yoo <ap420073@gmail.com>"); MODULE_DESCRIPTION("ARIA Cipher 
Algorithm, AVX/AES-NI/GFNI optimized"); MODULE_ALIAS_CRYPTO("aria"); MODULE_ALIAS_CRYPTO("aria-aesni-avx");
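In the CTR path above, aria_avx_ctr_encrypt() consumes 16-block chunks with the AVX helper under kernel_fpu_begin()/kernel_fpu_end(), then whole single blocks, and finally XORs a partial tail against one freshly generated keystream block while advancing the big-endian counter. The stand-alone sketch below illustrates just that tail step; toy_encrypt() is a placeholder for the real aria_encrypt() and all names here are made up for illustration.

/*
 * Illustrative only: CTR tail handling in the style of aria_avx_ctr_encrypt().
 * Encrypt the counter into a keystream block, XOR only the bytes that remain,
 * and bump the counter as a big-endian integer (like crypto_inc()).
 */
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE 16

static void toy_encrypt(uint8_t out[BLOCK_SIZE], const uint8_t in[BLOCK_SIZE])
{
	/* stand-in for aria_encrypt(ctx, out, in) */
	for (int i = 0; i < BLOCK_SIZE; i++)
		out[i] = in[i] ^ 0xA5;
}

static void ctr_inc(uint8_t ctr[BLOCK_SIZE])
{
	/* increment the counter as a big-endian integer */
	for (int i = BLOCK_SIZE - 1; i >= 0; i--)
		if (++ctr[i])
			break;
}

static void ctr_xor_tail(uint8_t *dst, const uint8_t *src, size_t nbytes,
			 uint8_t iv[BLOCK_SIZE])
{
	uint8_t keystream[BLOCK_SIZE];

	toy_encrypt(keystream, iv);		/* keystream = E(counter) */
	ctr_inc(iv);				/* counter moves on regardless */
	for (size_t i = 0; i < nbytes; i++)	/* XOR only the bytes we have */
		dst[i] = src[i] ^ keystream[i];
}

int main(void)
{
	uint8_t iv[BLOCK_SIZE] = { 0 };
	uint8_t src[5] = "tail";
	uint8_t dst[5];

	ctr_xor_tail(dst, src, sizeof(src), iv);
	printf("first tail byte: %02x\n", dst[0]);
	return 0;
}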
// SPDX-License-Identifier: GPL-2.0 /* * Basic worker thread pool for io_uring * * Copyright (C) 2019 Jens Axboe * */ #include <linux/kernel.h> #include <linux/init.h> #include <linux/errno.h> #include <linux/sched/signal.h> #include <linux/percpu.h> #include <linux/slab.h> #include <linux/rculist_nulls.h> #include <linux/cpu.h> #include <linux/cpuset.h> #include <linux/task_work.h> #include <linux/audit.h> #include <linux/mmu_context.h> #include <uapi/linux/io_uring.h> #include "io-wq.h" #include "slist.h"
#include "io_uring.h" #define WORKER_IDLE_TIMEOUT (5 * HZ) #define WORKER_INIT_LIMIT 3 enum { IO_WORKER_F_UP = 0, /* up and active */ IO_WORKER_F_RUNNING = 1, /* account as running */ IO_WORKER_F_FREE = 2, /* worker on free list */ }; enum { IO_WQ_BIT_EXIT = 0, /* wq exiting */ }; enum { IO_ACCT_STALLED_BIT = 0, /* stalled on hash */ }; /* * One for each thread in a wq pool */ struct io_worker { refcount_t ref; unsigned long flags; struct hlist_nulls_node nulls_node; struct list_head all_list; struct task_struct *task; struct io_wq *wq; struct io_wq_acct *acct; struct io_wq_work *cur_work; raw_spinlock_t lock; struct completion ref_done; unsigned long create_state; struct callback_head create_work; int init_retries; union { struct rcu_head rcu; struct delayed_work work; }; }; #if BITS_PER_LONG == 64 #define IO_WQ_HASH_ORDER 6 #else #define IO_WQ_HASH_ORDER 5 #endif #define IO_WQ_NR_HASH_BUCKETS (1u << IO_WQ_HASH_ORDER) struct io_wq_acct { /** * Protects access to the worker lists. */ raw_spinlock_t workers_lock; unsigned nr_workers; unsigned max_workers; atomic_t nr_running; /** * The list of free workers. Protected by #workers_lock * (write) and RCU (read). */ struct hlist_nulls_head free_list; /** * The list of all workers. Protected by #workers_lock * (write) and RCU (read). */ struct list_head all_list; raw_spinlock_t lock; struct io_wq_work_list work_list; unsigned long flags; }; enum { IO_WQ_ACCT_BOUND, IO_WQ_ACCT_UNBOUND, IO_WQ_ACCT_NR, }; /* * Per io_wq state */ struct io_wq { unsigned long state; struct io_wq_hash *hash; atomic_t worker_refs; struct completion worker_done; struct hlist_node cpuhp_node; struct task_struct *task; struct io_wq_acct acct[IO_WQ_ACCT_NR]; struct wait_queue_entry wait; struct io_wq_work *hash_tail[IO_WQ_NR_HASH_BUCKETS]; cpumask_var_t cpu_mask; }; static enum cpuhp_state io_wq_online; struct io_cb_cancel_data { work_cancel_fn *fn; void *data; int nr_running; int nr_pending; bool cancel_all; }; static bool create_io_worker(struct io_wq *wq, struct io_wq_acct *acct); static void io_wq_dec_running(struct io_worker *worker); static bool io_acct_cancel_pending_work(struct io_wq *wq, struct io_wq_acct *acct, struct io_cb_cancel_data *match); static void create_worker_cb(struct callback_head *cb); static void io_wq_cancel_tw_create(struct io_wq *wq); static inline unsigned int __io_get_work_hash(unsigned int work_flags) { return work_flags >> IO_WQ_HASH_SHIFT; } static inline unsigned int io_get_work_hash(struct io_wq_work *work) { return __io_get_work_hash(atomic_read(&work->flags)); } static bool io_worker_get(struct io_worker *worker) { return refcount_inc_not_zero(&worker->ref); } static void io_worker_release(struct io_worker *worker) { if (refcount_dec_and_test(&worker->ref)) complete(&worker->ref_done); } static inline struct io_wq_acct *io_get_acct(struct io_wq *wq, bool bound) { return &wq->acct[bound ? 
IO_WQ_ACCT_BOUND : IO_WQ_ACCT_UNBOUND]; } static inline struct io_wq_acct *io_work_get_acct(struct io_wq *wq, unsigned int work_flags) { return io_get_acct(wq, !(work_flags & IO_WQ_WORK_UNBOUND)); } static inline struct io_wq_acct *io_wq_get_acct(struct io_worker *worker) { return worker->acct; } static void io_worker_ref_put(struct io_wq *wq) { if (atomic_dec_and_test(&wq->worker_refs)) complete(&wq->worker_done); } bool io_wq_worker_stopped(void) { struct io_worker *worker = current->worker_private; if (WARN_ON_ONCE(!io_wq_current_is_worker())) return true; return test_bit(IO_WQ_BIT_EXIT, &worker->wq->state); } static void io_worker_cancel_cb(struct io_worker *worker) { struct io_wq_acct *acct = io_wq_get_acct(worker); struct io_wq *wq = worker->wq; atomic_dec(&acct->nr_running); raw_spin_lock(&acct->workers_lock); acct->nr_workers--; raw_spin_unlock(&acct->workers_lock); io_worker_ref_put(wq); clear_bit_unlock(0, &worker->create_state); io_worker_release(worker); } static bool io_task_worker_match(struct callback_head *cb, void *data) { struct io_worker *worker; if (cb->func != create_worker_cb) return false; worker = container_of(cb, struct io_worker, create_work); return worker == data; } static void io_worker_exit(struct io_worker *worker) { struct io_wq *wq = worker->wq; struct io_wq_acct *acct = io_wq_get_acct(worker); while (1) { struct callback_head *cb = task_work_cancel_match(wq->task, io_task_worker_match, worker); if (!cb) break; io_worker_cancel_cb(worker); } io_worker_release(worker); wait_for_completion(&worker->ref_done); raw_spin_lock(&acct->workers_lock); if (test_bit(IO_WORKER_F_FREE, &worker->flags)) hlist_nulls_del_rcu(&worker->nulls_node); list_del_rcu(&worker->all_list); raw_spin_unlock(&acct->workers_lock); io_wq_dec_running(worker); /* * this worker is a goner, clear ->worker_private to avoid any * inc/dec running calls that could happen as part of exit from * touching 'worker'. */ current->worker_private = NULL; kfree_rcu(worker, rcu); io_worker_ref_put(wq); do_exit(0); } static inline bool __io_acct_run_queue(struct io_wq_acct *acct) { return !test_bit(IO_ACCT_STALLED_BIT, &acct->flags) && !wq_list_empty(&acct->work_list); } /* * If there's work to do, returns true with acct->lock acquired. If not, * returns false with no lock held. */ static inline bool io_acct_run_queue(struct io_wq_acct *acct) __acquires(&acct->lock) { raw_spin_lock(&acct->lock); if (__io_acct_run_queue(acct)) return true; raw_spin_unlock(&acct->lock); return false; } /* * Check head of free list for an available worker. If one isn't available, * caller must create one. */ static bool io_acct_activate_free_worker(struct io_wq_acct *acct) __must_hold(RCU) { struct hlist_nulls_node *n; struct io_worker *worker; /* * Iterate free_list and see if we can find an idle worker to * activate. If a given worker is on the free_list but in the process * of exiting, keep trying. */ hlist_nulls_for_each_entry_rcu(worker, n, &acct->free_list, nulls_node) { if (!io_worker_get(worker)) continue; /* * If the worker is already running, it's either already * starting work or finishing work. In either case, if it does * to go sleep, we'll kick off a new task for this work anyway. */ wake_up_process(worker->task); io_worker_release(worker); return true; } return false; } /* * We need a worker. If we find a free one, we're good. If not, and we're * below the max number of workers, create one. 
*/ static bool io_wq_create_worker(struct io_wq *wq, struct io_wq_acct *acct) { /* * Most likely an attempt to queue unbounded work on an io_wq that * wasn't setup with any unbounded workers. */ if (unlikely(!acct->max_workers)) pr_warn_once("io-wq is not configured for unbound workers"); raw_spin_lock(&acct->workers_lock); if (acct->nr_workers >= acct->max_workers) { raw_spin_unlock(&acct->workers_lock); return true; } acct->nr_workers++; raw_spin_unlock(&acct->workers_lock); atomic_inc(&acct->nr_running); atomic_inc(&wq->worker_refs); return create_io_worker(wq, acct); } static void io_wq_inc_running(struct io_worker *worker) { struct io_wq_acct *acct = io_wq_get_acct(worker); atomic_inc(&acct->nr_running); } static void create_worker_cb(struct callback_head *cb) { struct io_worker *worker; struct io_wq *wq; struct io_wq_acct *acct; bool activated_free_worker, do_create = false; worker = container_of(cb, struct io_worker, create_work); wq = worker->wq; acct = worker->acct; rcu_read_lock(); activated_free_worker = io_acct_activate_free_worker(acct); rcu_read_unlock(); if (activated_free_worker) goto no_need_create; raw_spin_lock(&acct->workers_lock); if (acct->nr_workers < acct->max_workers) { acct->nr_workers++; do_create = true; } raw_spin_unlock(&acct->workers_lock); if (do_create) { create_io_worker(wq, acct); } else { no_need_create: atomic_dec(&acct->nr_running); io_worker_ref_put(wq); } clear_bit_unlock(0, &worker->create_state); io_worker_release(worker); } static bool io_queue_worker_create(struct io_worker *worker, struct io_wq_acct *acct, task_work_func_t func) { struct io_wq *wq = worker->wq; /* raced with exit, just ignore create call */ if (test_bit(IO_WQ_BIT_EXIT, &wq->state)) goto fail; if (!io_worker_get(worker)) goto fail; /* * create_state manages ownership of create_work/index. We should * only need one entry per worker, as the worker going to sleep * will trigger the condition, and waking will clear it once it * runs the task_work. */ if (test_bit(0, &worker->create_state) || test_and_set_bit_lock(0, &worker->create_state)) goto fail_release; atomic_inc(&wq->worker_refs); init_task_work(&worker->create_work, func); if (!task_work_add(wq->task, &worker->create_work, TWA_SIGNAL)) { /* * EXIT may have been set after checking it above, check after * adding the task_work and remove any creation item if it is * now set. wq exit does that too, but we can have added this * work item after we canceled in io_wq_exit_workers(). 
*/ if (test_bit(IO_WQ_BIT_EXIT, &wq->state)) io_wq_cancel_tw_create(wq); io_worker_ref_put(wq); return true; } io_worker_ref_put(wq); clear_bit_unlock(0, &worker->create_state); fail_release: io_worker_release(worker); fail: atomic_dec(&acct->nr_running); io_worker_ref_put(wq); return false; } /* Defer if current and next work are both hashed to the same chain */ static bool io_wq_hash_defer(struct io_wq_work *work, struct io_wq_acct *acct) { unsigned int hash, work_flags; struct io_wq_work *next; lockdep_assert_held(&acct->lock); work_flags = atomic_read(&work->flags); if (!__io_wq_is_hashed(work_flags)) return false; /* should not happen, io_acct_run_queue() said we had work */ if (wq_list_empty(&acct->work_list)) return true; hash = __io_get_work_hash(work_flags); next = container_of(acct->work_list.first, struct io_wq_work, list); work_flags = atomic_read(&next->flags); if (!__io_wq_is_hashed(work_flags)) return false; return hash == __io_get_work_hash(work_flags); } static void io_wq_dec_running(struct io_worker *worker) { struct io_wq_acct *acct = io_wq_get_acct(worker); struct io_wq *wq = worker->wq; if (!test_bit(IO_WORKER_F_UP, &worker->flags)) return; if (!atomic_dec_and_test(&acct->nr_running)) return; if (!worker->cur_work) return; if (!io_acct_run_queue(acct)) return; if (io_wq_hash_defer(worker->cur_work, acct)) { raw_spin_unlock(&acct->lock); return; } raw_spin_unlock(&acct->lock); atomic_inc(&acct->nr_running); atomic_inc(&wq->worker_refs); io_queue_worker_create(worker, acct, create_worker_cb); } /* * Worker will start processing some work. Move it to the busy list, if * it's currently on the freelist */ static void __io_worker_busy(struct io_wq_acct *acct, struct io_worker *worker) { if (test_bit(IO_WORKER_F_FREE, &worker->flags)) { clear_bit(IO_WORKER_F_FREE, &worker->flags); raw_spin_lock(&acct->workers_lock); hlist_nulls_del_init_rcu(&worker->nulls_node); raw_spin_unlock(&acct->workers_lock); } } /* * No work, worker going to sleep. Move to freelist. 
*/ static void __io_worker_idle(struct io_wq_acct *acct, struct io_worker *worker) __must_hold(acct->workers_lock) { if (!test_bit(IO_WORKER_F_FREE, &worker->flags)) { set_bit(IO_WORKER_F_FREE, &worker->flags); hlist_nulls_add_head_rcu(&worker->nulls_node, &acct->free_list); } } static bool io_wait_on_hash(struct io_wq *wq, unsigned int hash) { bool ret = false; spin_lock_irq(&wq->hash->wait.lock); if (list_empty(&wq->wait.entry)) { __add_wait_queue(&wq->hash->wait, &wq->wait); if (!test_bit(hash, &wq->hash->map)) { __set_current_state(TASK_RUNNING); list_del_init(&wq->wait.entry); ret = true; } } spin_unlock_irq(&wq->hash->wait.lock); return ret; } static struct io_wq_work *io_get_next_work(struct io_wq_acct *acct, struct io_wq *wq) __must_hold(acct->lock) { struct io_wq_work_node *node, *prev; struct io_wq_work *work, *tail; unsigned int stall_hash = -1U; wq_list_for_each(node, prev, &acct->work_list) { unsigned int work_flags; unsigned int hash; work = container_of(node, struct io_wq_work, list); /* not hashed, can run anytime */ work_flags = atomic_read(&work->flags); if (!__io_wq_is_hashed(work_flags)) { wq_list_del(&acct->work_list, node, prev); return work; } hash = __io_get_work_hash(work_flags); /* all items with this hash lie in [work, tail] */ tail = wq->hash_tail[hash]; /* hashed, can run if not already running */ if (!test_and_set_bit(hash, &wq->hash->map)) { wq->hash_tail[hash] = NULL; wq_list_cut(&acct->work_list, &tail->list, prev); return work; } if (stall_hash == -1U) stall_hash = hash; /* fast forward to a next hash, for-each will fix up @prev */ node = &tail->list; } if (stall_hash != -1U) { bool unstalled; /* * Set this before dropping the lock to avoid racing with new * work being added and clearing the stalled bit. */ set_bit(IO_ACCT_STALLED_BIT, &acct->flags); raw_spin_unlock(&acct->lock); unstalled = io_wait_on_hash(wq, stall_hash); raw_spin_lock(&acct->lock); if (unstalled) { clear_bit(IO_ACCT_STALLED_BIT, &acct->flags); if (wq_has_sleeper(&wq->hash->wait)) wake_up(&wq->hash->wait); } } return NULL; } static void io_assign_current_work(struct io_worker *worker, struct io_wq_work *work) { if (work) { io_run_task_work(); cond_resched(); } raw_spin_lock(&worker->lock); worker->cur_work = work; raw_spin_unlock(&worker->lock); } /* * Called with acct->lock held, drops it before returning */ static void io_worker_handle_work(struct io_wq_acct *acct, struct io_worker *worker) __releases(&acct->lock) { struct io_wq *wq = worker->wq; bool do_kill = test_bit(IO_WQ_BIT_EXIT, &wq->state); do { struct io_wq_work *work; /* * If we got some work, mark us as busy. If we didn't, but * the list isn't empty, it means we stalled on hashed work. * Mark us stalled so we don't keep looking for work when we * can't make progress, any work completion or insertion will * clear the stalled flag. */ work = io_get_next_work(acct, wq); if (work) { /* * Make sure cancelation can find this, even before * it becomes the active work. That avoids a window * where the work has been removed from our general * work list, but isn't yet discoverable as the * current work item for this worker. 
*/ raw_spin_lock(&worker->lock); worker->cur_work = work; raw_spin_unlock(&worker->lock); } raw_spin_unlock(&acct->lock); if (!work) break; __io_worker_busy(acct, worker); io_assign_current_work(worker, work); __set_current_state(TASK_RUNNING); /* handle a whole dependent link */ do { struct io_wq_work *next_hashed, *linked; unsigned int work_flags = atomic_read(&work->flags); unsigned int hash = __io_wq_is_hashed(work_flags) ? __io_get_work_hash(work_flags) : -1U; next_hashed = wq_next_work(work); if (do_kill && (work_flags & IO_WQ_WORK_UNBOUND)) atomic_or(IO_WQ_WORK_CANCEL, &work->flags); io_wq_submit_work(work); io_assign_current_work(worker, NULL); linked = io_wq_free_work(work); work = next_hashed; if (!work && linked && !io_wq_is_hashed(linked)) { work = linked; linked = NULL; } io_assign_current_work(worker, work); if (linked) io_wq_enqueue(wq, linked); if (hash != -1U && !next_hashed) { /* serialize hash clear with wake_up() */ spin_lock_irq(&wq->hash->wait.lock); clear_bit(hash, &wq->hash->map); clear_bit(IO_ACCT_STALLED_BIT, &acct->flags); spin_unlock_irq(&wq->hash->wait.lock); if (wq_has_sleeper(&wq->hash->wait)) wake_up(&wq->hash->wait); } } while (work); if (!__io_acct_run_queue(acct)) break; raw_spin_lock(&acct->lock); } while (1); } static int io_wq_worker(void *data) { struct io_worker *worker = data; struct io_wq_acct *acct = io_wq_get_acct(worker); struct io_wq *wq = worker->wq; bool exit_mask = false, last_timeout = false; char buf[TASK_COMM_LEN] = {}; set_mask_bits(&worker->flags, 0, BIT(IO_WORKER_F_UP) | BIT(IO_WORKER_F_RUNNING)); snprintf(buf, sizeof(buf), "iou-wrk-%d", wq->task->pid); set_task_comm(current, buf); while (!test_bit(IO_WQ_BIT_EXIT, &wq->state)) { long ret; set_current_state(TASK_INTERRUPTIBLE); /* * If we have work to do, io_acct_run_queue() returns with * the acct->lock held. If not, it will drop it. */ while (io_acct_run_queue(acct)) io_worker_handle_work(acct, worker); raw_spin_lock(&acct->workers_lock); /* * Last sleep timed out. Exit if we're not the last worker, * or if someone modified our affinity. */ if (last_timeout && (exit_mask || acct->nr_workers > 1)) { acct->nr_workers--; raw_spin_unlock(&acct->workers_lock); __set_current_state(TASK_RUNNING); break; } last_timeout = false; __io_worker_idle(acct, worker); raw_spin_unlock(&acct->workers_lock); if (io_run_task_work()) continue; ret = schedule_timeout(WORKER_IDLE_TIMEOUT); if (signal_pending(current)) { struct ksignal ksig; if (!get_signal(&ksig)) continue; break; } if (!ret) { last_timeout = true; exit_mask = !cpumask_test_cpu(raw_smp_processor_id(), wq->cpu_mask); } } if (test_bit(IO_WQ_BIT_EXIT, &wq->state) && io_acct_run_queue(acct)) io_worker_handle_work(acct, worker); io_worker_exit(worker); return 0; } /* * Called when a worker is scheduled in. Mark us as currently running. */ void io_wq_worker_running(struct task_struct *tsk) { struct io_worker *worker = tsk->worker_private; if (!worker) return; if (!test_bit(IO_WORKER_F_UP, &worker->flags)) return; if (test_bit(IO_WORKER_F_RUNNING, &worker->flags)) return; set_bit(IO_WORKER_F_RUNNING, &worker->flags); io_wq_inc_running(worker); } /* * Called when worker is going to sleep. If there are no workers currently * running and we have work pending, wake up a free one or create a new one. 
*/ void io_wq_worker_sleeping(struct task_struct *tsk) { struct io_worker *worker = tsk->worker_private; if (!worker) return; if (!test_bit(IO_WORKER_F_UP, &worker->flags)) return; if (!test_bit(IO_WORKER_F_RUNNING, &worker->flags)) return; clear_bit(IO_WORKER_F_RUNNING, &worker->flags); io_wq_dec_running(worker); } static void io_init_new_worker(struct io_wq *wq, struct io_wq_acct *acct, struct io_worker *worker, struct task_struct *tsk) { tsk->worker_private = worker; worker->task = tsk; set_cpus_allowed_ptr(tsk, wq->cpu_mask); raw_spin_lock(&acct->workers_lock); hlist_nulls_add_head_rcu(&worker->nulls_node, &acct->free_list); list_add_tail_rcu(&worker->all_list, &acct->all_list); set_bit(IO_WORKER_F_FREE, &worker->flags); raw_spin_unlock(&acct->workers_lock); wake_up_new_task(tsk); } static bool io_wq_work_match_all(struct io_wq_work *work, void *data) { return true; } static inline bool io_should_retry_thread(struct io_worker *worker, long err) { /* * Prevent perpetual task_work retry, if the task (or its group) is * exiting. */ if (fatal_signal_pending(current)) return false; if (worker->init_retries++ >= WORKER_INIT_LIMIT) return false; switch (err) { case -EAGAIN: case -ERESTARTSYS: case -ERESTARTNOINTR: case -ERESTARTNOHAND: return true; default: return false; } } static void queue_create_worker_retry(struct io_worker *worker) { /* * We only bother retrying because there's a chance that the * failure to create a worker is due to some temporary condition * in the forking task (e.g. outstanding signal); give the task * some time to clear that condition. */ schedule_delayed_work(&worker->work, msecs_to_jiffies(worker->init_retries * 5)); } static void create_worker_cont(struct callback_head *cb) { struct io_worker *worker; struct task_struct *tsk; struct io_wq *wq; struct io_wq_acct *acct; worker = container_of(cb, struct io_worker, create_work); clear_bit_unlock(0, &worker->create_state); wq = worker->wq; acct = io_wq_get_acct(worker); tsk = create_io_thread(io_wq_worker, worker, NUMA_NO_NODE); if (!IS_ERR(tsk)) { io_init_new_worker(wq, acct, worker, tsk); io_worker_release(worker); return; } else if (!io_should_retry_thread(worker, PTR_ERR(tsk))) { atomic_dec(&acct->nr_running); raw_spin_lock(&acct->workers_lock); acct->nr_workers--; if (!acct->nr_workers) { struct io_cb_cancel_data match = { .fn = io_wq_work_match_all, .cancel_all = true, }; raw_spin_unlock(&acct->workers_lock); while (io_acct_cancel_pending_work(wq, acct, &match)) ; } else { raw_spin_unlock(&acct->workers_lock); } io_worker_ref_put(wq); kfree(worker); return; } /* re-create attempts grab a new worker ref, drop the existing one */ io_worker_release(worker); queue_create_worker_retry(worker); } static void io_workqueue_create(struct work_struct *work) { struct io_worker *worker = container_of(work, struct io_worker, work.work); struct io_wq_acct *acct = io_wq_get_acct(worker); if (!io_queue_worker_create(worker, acct, create_worker_cont)) kfree(worker); } static bool create_io_worker(struct io_wq *wq, struct io_wq_acct *acct) { struct io_worker *worker; struct task_struct *tsk; __set_current_state(TASK_RUNNING); worker = kzalloc(sizeof(*worker), GFP_KERNEL); if (!worker) { fail: atomic_dec(&acct->nr_running); raw_spin_lock(&acct->workers_lock); acct->nr_workers--; raw_spin_unlock(&acct->workers_lock); io_worker_ref_put(wq); return false; } refcount_set(&worker->ref, 1); worker->wq = wq; worker->acct = acct; raw_spin_lock_init(&worker->lock); init_completion(&worker->ref_done); tsk = create_io_thread(io_wq_worker, 
worker, NUMA_NO_NODE); if (!IS_ERR(tsk)) { io_init_new_worker(wq, acct, worker, tsk); } else if (!io_should_retry_thread(worker, PTR_ERR(tsk))) { kfree(worker); goto fail; } else { INIT_DELAYED_WORK(&worker->work, io_workqueue_create); queue_create_worker_retry(worker); } return true; } /* * Iterate the passed in list and call the specific function for each * worker that isn't exiting */ static bool io_acct_for_each_worker(struct io_wq_acct *acct, bool (*func)(struct io_worker *, void *), void *data) { struct io_worker *worker; bool ret = false; list_for_each_entry_rcu(worker, &acct->all_list, all_list) { if (io_worker_get(worker)) { /* no task if node is/was offline */ if (worker->task) ret = func(worker, data); io_worker_release(worker); if (ret) break; } } return ret; } static bool io_wq_for_each_worker(struct io_wq *wq, bool (*func)(struct io_worker *, void *), void *data) { for (int i = 0; i < IO_WQ_ACCT_NR; i++) { if (!io_acct_for_each_worker(&wq->acct[i], func, data)) return false; } return true; } static bool io_wq_worker_wake(struct io_worker *worker, void *data) { __set_notify_signal(worker->task); wake_up_process(worker->task); return false; } static void io_run_cancel(struct io_wq_work *work, struct io_wq *wq) { do { atomic_or(IO_WQ_WORK_CANCEL, &work->flags); io_wq_submit_work(work); work = io_wq_free_work(work); } while (work); } static void io_wq_insert_work(struct io_wq *wq, struct io_wq_acct *acct, struct io_wq_work *work, unsigned int work_flags) { unsigned int hash; struct io_wq_work *tail; if (!__io_wq_is_hashed(work_flags)) { append: wq_list_add_tail(&work->list, &acct->work_list); return; } hash = __io_get_work_hash(work_flags); tail = wq->hash_tail[hash]; wq->hash_tail[hash] = work; if (!tail) goto append; wq_list_add_after(&work->list, &tail->list, &acct->work_list); } static bool io_wq_work_match_item(struct io_wq_work *work, void *data) { return work == data; } void io_wq_enqueue(struct io_wq *wq, struct io_wq_work *work) { unsigned int work_flags = atomic_read(&work->flags); struct io_wq_acct *acct = io_work_get_acct(wq, work_flags); struct io_cb_cancel_data match = { .fn = io_wq_work_match_item, .data = work, .cancel_all = false, }; bool do_create; /* * If io-wq is exiting for this task, or if the request has explicitly * been marked as one that should not get executed, cancel it here. */ if (test_bit(IO_WQ_BIT_EXIT, &wq->state) || (work_flags & IO_WQ_WORK_CANCEL)) { io_run_cancel(work, wq); return; } raw_spin_lock(&acct->lock); io_wq_insert_work(wq, acct, work, work_flags); clear_bit(IO_ACCT_STALLED_BIT, &acct->flags); raw_spin_unlock(&acct->lock); rcu_read_lock(); do_create = !io_acct_activate_free_worker(acct); rcu_read_unlock(); if (do_create && ((work_flags & IO_WQ_WORK_CONCURRENT) || !atomic_read(&acct->nr_running))) { bool did_create; did_create = io_wq_create_worker(wq, acct); if (likely(did_create)) return; raw_spin_lock(&acct->workers_lock); if (acct->nr_workers) { raw_spin_unlock(&acct->workers_lock); return; } raw_spin_unlock(&acct->workers_lock); /* fatal condition, failed to create the first worker */ io_acct_cancel_pending_work(wq, acct, &match); } } /* * Work items that hash to the same value will not be done in parallel. * Used to limit concurrent writes, generally hashed by inode. 
*/ void io_wq_hash_work(struct io_wq_work *work, void *val) { unsigned int bit; bit = hash_ptr(val, IO_WQ_HASH_ORDER); atomic_or(IO_WQ_WORK_HASHED | (bit << IO_WQ_HASH_SHIFT), &work->flags); } static bool __io_wq_worker_cancel(struct io_worker *worker, struct io_cb_cancel_data *match, struct io_wq_work *work) { if (work && match->fn(work, match->data)) { atomic_or(IO_WQ_WORK_CANCEL, &work->flags); __set_notify_signal(worker->task); return true; } return false; } static bool io_wq_worker_cancel(struct io_worker *worker, void *data) { struct io_cb_cancel_data *match = data; /* * Hold the lock to avoid ->cur_work going out of scope, caller * may dereference the passed in work. */ raw_spin_lock(&worker->lock); if (__io_wq_worker_cancel(worker, match, worker->cur_work)) match->nr_running++; raw_spin_unlock(&worker->lock); return match->nr_running && !match->cancel_all; } static inline void io_wq_remove_pending(struct io_wq *wq, struct io_wq_acct *acct, struct io_wq_work *work, struct io_wq_work_node *prev) { unsigned int hash = io_get_work_hash(work); struct io_wq_work *prev_work = NULL; if (io_wq_is_hashed(work) && work == wq->hash_tail[hash]) { if (prev) prev_work = container_of(prev, struct io_wq_work, list); if (prev_work && io_get_work_hash(prev_work) == hash) wq->hash_tail[hash] = prev_work; else wq->hash_tail[hash] = NULL; } wq_list_del(&acct->work_list, &work->list, prev); } static bool io_acct_cancel_pending_work(struct io_wq *wq, struct io_wq_acct *acct, struct io_cb_cancel_data *match) { struct io_wq_work_node *node, *prev; struct io_wq_work *work; raw_spin_lock(&acct->lock); wq_list_for_each(node, prev, &acct->work_list) { work = container_of(node, struct io_wq_work, list); if (!match->fn(work, match->data)) continue; io_wq_remove_pending(wq, acct, work, prev); raw_spin_unlock(&acct->lock); io_run_cancel(work, wq); match->nr_pending++; /* not safe to continue after unlock */ return true; } raw_spin_unlock(&acct->lock); return false; } static void io_wq_cancel_pending_work(struct io_wq *wq, struct io_cb_cancel_data *match) { int i; retry: for (i = 0; i < IO_WQ_ACCT_NR; i++) { struct io_wq_acct *acct = io_get_acct(wq, i == 0); if (io_acct_cancel_pending_work(wq, acct, match)) { if (match->cancel_all) goto retry; break; } } } static void io_acct_cancel_running_work(struct io_wq_acct *acct, struct io_cb_cancel_data *match) { raw_spin_lock(&acct->workers_lock); io_acct_for_each_worker(acct, io_wq_worker_cancel, match); raw_spin_unlock(&acct->workers_lock); } static void io_wq_cancel_running_work(struct io_wq *wq, struct io_cb_cancel_data *match) { rcu_read_lock(); for (int i = 0; i < IO_WQ_ACCT_NR; i++) io_acct_cancel_running_work(&wq->acct[i], match); rcu_read_unlock(); } enum io_wq_cancel io_wq_cancel_cb(struct io_wq *wq, work_cancel_fn *cancel, void *data, bool cancel_all) { struct io_cb_cancel_data match = { .fn = cancel, .data = data, .cancel_all = cancel_all, }; /* * First check pending list, if we're lucky we can just remove it * from there. CANCEL_OK means that the work is returned as-new, * no completion will be posted for it. * * Then check if a free (going busy) or busy worker has the work * currently running. If we find it there, we'll return CANCEL_RUNNING * as an indication that we attempt to signal cancellation. The * completion will run normally in this case. * * Do both of these while holding the acct->workers_lock, to ensure that * we'll find a work item regardless of state. 
*/ io_wq_cancel_pending_work(wq, &match); if (match.nr_pending && !match.cancel_all) return IO_WQ_CANCEL_OK; io_wq_cancel_running_work(wq, &match); if (match.nr_running && !match.cancel_all) return IO_WQ_CANCEL_RUNNING; if (match.nr_running) return IO_WQ_CANCEL_RUNNING; if (match.nr_pending) return IO_WQ_CANCEL_OK; return IO_WQ_CANCEL_NOTFOUND; } static int io_wq_hash_wake(struct wait_queue_entry *wait, unsigned mode, int sync, void *key) { struct io_wq *wq = container_of(wait, struct io_wq, wait); int i; list_del_init(&wait->entry); rcu_read_lock(); for (i = 0; i < IO_WQ_ACCT_NR; i++) { struct io_wq_acct *acct = &wq->acct[i]; if (test_and_clear_bit(IO_ACCT_STALLED_BIT, &acct->flags)) io_acct_activate_free_worker(acct); } rcu_read_unlock(); return 1; } struct io_wq *io_wq_create(unsigned bounded, struct io_wq_data *data) { int ret, i; struct io_wq *wq; if (WARN_ON_ONCE(!bounded)) return ERR_PTR(-EINVAL); wq = kzalloc(sizeof(struct io_wq), GFP_KERNEL); if (!wq) return ERR_PTR(-ENOMEM); refcount_inc(&data->hash->refs); wq->hash = data->hash; ret = -ENOMEM; if (!alloc_cpumask_var(&wq->cpu_mask, GFP_KERNEL)) goto err; cpuset_cpus_allowed(data->task, wq->cpu_mask); wq->acct[IO_WQ_ACCT_BOUND].max_workers = bounded; wq->acct[IO_WQ_ACCT_UNBOUND].max_workers = task_rlimit(current, RLIMIT_NPROC); INIT_LIST_HEAD(&wq->wait.entry); wq->wait.func = io_wq_hash_wake; for (i = 0; i < IO_WQ_ACCT_NR; i++) { struct io_wq_acct *acct = &wq->acct[i]; atomic_set(&acct->nr_running, 0); raw_spin_lock_init(&acct->workers_lock); INIT_HLIST_NULLS_HEAD(&acct->free_list, 0); INIT_LIST_HEAD(&acct->all_list); INIT_WQ_LIST(&acct->work_list); raw_spin_lock_init(&acct->lock); } wq->task = get_task_struct(data->task); atomic_set(&wq->worker_refs, 1); init_completion(&wq->worker_done); ret = cpuhp_state_add_instance_nocalls(io_wq_online, &wq->cpuhp_node); if (ret) { put_task_struct(wq->task); goto err; } return wq; err: io_wq_put_hash(data->hash); free_cpumask_var(wq->cpu_mask); kfree(wq); return ERR_PTR(ret); } static bool io_task_work_match(struct callback_head *cb, void *data) { struct io_worker *worker; if (cb->func != create_worker_cb && cb->func != create_worker_cont) return false; worker = container_of(cb, struct io_worker, create_work); return worker->wq == data; } void io_wq_exit_start(struct io_wq *wq) { set_bit(IO_WQ_BIT_EXIT, &wq->state); } static void io_wq_cancel_tw_create(struct io_wq *wq) { struct callback_head *cb; while ((cb = task_work_cancel_match(wq->task, io_task_work_match, wq)) != NULL) { struct io_worker *worker; worker = container_of(cb, struct io_worker, create_work); io_worker_cancel_cb(worker); /* * Only the worker continuation helper has worker allocated and * hence needs freeing. 
*/ if (cb->func == create_worker_cont) kfree(worker); } } static void io_wq_exit_workers(struct io_wq *wq) { if (!wq->task) return; io_wq_cancel_tw_create(wq); rcu_read_lock(); io_wq_for_each_worker(wq, io_wq_worker_wake, NULL); rcu_read_unlock(); io_worker_ref_put(wq); wait_for_completion(&wq->worker_done); spin_lock_irq(&wq->hash->wait.lock); list_del_init(&wq->wait.entry); spin_unlock_irq(&wq->hash->wait.lock); put_task_struct(wq->task); wq->task = NULL; } static void io_wq_destroy(struct io_wq *wq) { struct io_cb_cancel_data match = { .fn = io_wq_work_match_all, .cancel_all = true, }; cpuhp_state_remove_instance_nocalls(io_wq_online, &wq->cpuhp_node); io_wq_cancel_pending_work(wq, &match); free_cpumask_var(wq->cpu_mask); io_wq_put_hash(wq->hash); kfree(wq); } void io_wq_put_and_exit(struct io_wq *wq) { WARN_ON_ONCE(!test_bit(IO_WQ_BIT_EXIT, &wq->state)); io_wq_exit_workers(wq); io_wq_destroy(wq); } struct online_data { unsigned int cpu; bool online; }; static bool io_wq_worker_affinity(struct io_worker *worker, void *data) { struct online_data *od = data; if (od->online) cpumask_set_cpu(od->cpu, worker->wq->cpu_mask); else cpumask_clear_cpu(od->cpu, worker->wq->cpu_mask); return false; } static int __io_wq_cpu_online(struct io_wq *wq, unsigned int cpu, bool online) { struct online_data od = { .cpu = cpu, .online = online }; rcu_read_lock(); io_wq_for_each_worker(wq, io_wq_worker_affinity, &od); rcu_read_unlock(); return 0; } static int io_wq_cpu_online(unsigned int cpu, struct hlist_node *node) { struct io_wq *wq = hlist_entry_safe(node, struct io_wq, cpuhp_node); return __io_wq_cpu_online(wq, cpu, true); } static int io_wq_cpu_offline(unsigned int cpu, struct hlist_node *node) { struct io_wq *wq = hlist_entry_safe(node, struct io_wq, cpuhp_node); return __io_wq_cpu_online(wq, cpu, false); } int io_wq_cpu_affinity(struct io_uring_task *tctx, cpumask_var_t mask) { cpumask_var_t allowed_mask; int ret = 0; if (!tctx || !tctx->io_wq) return -EINVAL; if (!alloc_cpumask_var(&allowed_mask, GFP_KERNEL)) return -ENOMEM; rcu_read_lock(); cpuset_cpus_allowed(tctx->io_wq->task, allowed_mask); if (mask) { if (cpumask_subset(mask, allowed_mask)) cpumask_copy(tctx->io_wq->cpu_mask, mask); else ret = -EINVAL; } else { cpumask_copy(tctx->io_wq->cpu_mask, allowed_mask); } rcu_read_unlock(); free_cpumask_var(allowed_mask); return ret; } /* * Set max number of unbounded workers, returns old value. If new_count is 0, * then just return the old value. */ int io_wq_max_workers(struct io_wq *wq, int *new_count) { struct io_wq_acct *acct; int prev[IO_WQ_ACCT_NR]; int i; BUILD_BUG_ON((int) IO_WQ_ACCT_BOUND != (int) IO_WQ_BOUND); BUILD_BUG_ON((int) IO_WQ_ACCT_UNBOUND != (int) IO_WQ_UNBOUND); BUILD_BUG_ON((int) IO_WQ_ACCT_NR != 2); for (i = 0; i < IO_WQ_ACCT_NR; i++) { if (new_count[i] > task_rlimit(current, RLIMIT_NPROC)) new_count[i] = task_rlimit(current, RLIMIT_NPROC); } for (i = 0; i < IO_WQ_ACCT_NR; i++) prev[i] = 0; rcu_read_lock(); for (i = 0; i < IO_WQ_ACCT_NR; i++) { acct = &wq->acct[i]; raw_spin_lock(&acct->workers_lock); prev[i] = max_t(int, acct->max_workers, prev[i]); if (new_count[i]) acct->max_workers = new_count[i]; raw_spin_unlock(&acct->workers_lock); } rcu_read_unlock(); for (i = 0; i < IO_WQ_ACCT_NR; i++) new_count[i] = prev[i]; return 0; } static __init int io_wq_init(void) { int ret; ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "io-wq/online", io_wq_cpu_online, io_wq_cpu_offline); if (ret < 0) return ret; io_wq_online = ret; return 0; } subsys_initcall(io_wq_init); |
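/*
 * A minimal user-space sketch (not part of io-wq.c) of how the limits
 * enforced by io_wq_max_workers() above are typically adjusted from an
 * application. It assumes liburing and its
 * io_uring_register_iowq_max_workers() helper; the function name and the
 * error handling below are illustrative, not a definitive recipe.
 * vals[0] caps the bounded class, vals[1] the unbounded class; a zero
 * entry leaves that class unchanged, and on success the previous limits
 * are written back, mirroring the "return the old value" behaviour of
 * io_wq_max_workers().
 */
#include <liburing.h>
#include <stdio.h>

static int limit_iowq_workers(unsigned int bounded, unsigned int unbounded)
{
	struct io_uring ring;
	unsigned int vals[2] = { bounded, unbounded };
	int ret;

	ret = io_uring_queue_init(8, &ring, 0);
	if (ret < 0)
		return ret;

	/* vals[] is updated with the previous per-class limits on success */
	ret = io_uring_register_iowq_max_workers(&ring, vals);
	if (!ret)
		printf("previous limits: bounded=%u unbounded=%u\n",
		       vals[0], vals[1]);

	io_uring_queue_exit(&ring);
	return ret;
}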
#include <linux/gfp.h> #include <linux/initrd.h> #include <linux/ioport.h> #include <linux/swap.h> #include <linux/memblock.h> #include <linux/swapfile.h> #include <linux/swapops.h> #include <linux/kmemleak.h> #include <linux/sched/task.h> #include <linux/execmem.h> #include <asm/set_memory.h> #include <asm/cpu_device_id.h> #include <asm/e820/api.h> #include <asm/init.h> #include <asm/page.h> #include <asm/page_types.h> #include <asm/sections.h> #include <asm/setup.h> #include <asm/tlbflush.h> #include <asm/tlb.h> #include <asm/proto.h> #include <asm/dma.h> /* for MAX_DMA_PFN */ #include <asm/kaslr.h> #include <asm/hypervisor.h> #include <asm/cpufeature.h> #include <asm/pti.h> #include <asm/text-patching.h> #include <asm/memtype.h> #include <asm/paravirt.h> #include <asm/mmu_context.h> /* * We need to define the tracepoints somewhere, and tlb.c * is only compiled when SMP=y. */ #define CREATE_TRACE_POINTS #include <trace/events/tlb.h> #include "mm_internal.h" /* * Tables translating between page_cache_type_t and pte encoding. * * The default values are defined statically as minimal supported mode; * WC and WT fall back to UC-. pat_init() updates these values to support * more cache modes, WC and WT, when it is safe to do so. See pat_init() * for the details. Note, __early_ioremap() used during early boot-time * takes pgprot_t (pte encoding) and does not use these tables. * * Index into __cachemode2pte_tbl[] is the cachemode. * * Index into __pte2cachemode_tbl[] are the caching attribute bits of the pte * (_PAGE_PWT, _PAGE_PCD, _PAGE_PAT) at index bit positions 0, 1, 2. 
*/ static uint16_t __cachemode2pte_tbl[_PAGE_CACHE_MODE_NUM] = { [_PAGE_CACHE_MODE_WB ] = 0 | 0 , [_PAGE_CACHE_MODE_WC ] = 0 | _PAGE_PCD, [_PAGE_CACHE_MODE_UC_MINUS] = 0 | _PAGE_PCD, [_PAGE_CACHE_MODE_UC ] = _PAGE_PWT | _PAGE_PCD, [_PAGE_CACHE_MODE_WT ] = 0 | _PAGE_PCD, [_PAGE_CACHE_MODE_WP ] = 0 | _PAGE_PCD, }; unsigned long cachemode2protval(enum page_cache_mode pcm) { if (likely(pcm == 0)) return 0; return __cachemode2pte_tbl[pcm]; } EXPORT_SYMBOL(cachemode2protval); static uint8_t __pte2cachemode_tbl[8] = { [__pte2cm_idx( 0 | 0 | 0 )] = _PAGE_CACHE_MODE_WB, [__pte2cm_idx(_PAGE_PWT | 0 | 0 )] = _PAGE_CACHE_MODE_UC_MINUS, [__pte2cm_idx( 0 | _PAGE_PCD | 0 )] = _PAGE_CACHE_MODE_UC_MINUS, [__pte2cm_idx(_PAGE_PWT | _PAGE_PCD | 0 )] = _PAGE_CACHE_MODE_UC, [__pte2cm_idx( 0 | 0 | _PAGE_PAT)] = _PAGE_CACHE_MODE_WB, [__pte2cm_idx(_PAGE_PWT | 0 | _PAGE_PAT)] = _PAGE_CACHE_MODE_UC_MINUS, [__pte2cm_idx(0 | _PAGE_PCD | _PAGE_PAT)] = _PAGE_CACHE_MODE_UC_MINUS, [__pte2cm_idx(_PAGE_PWT | _PAGE_PCD | _PAGE_PAT)] = _PAGE_CACHE_MODE_UC, }; /* * Check that the write-protect PAT entry is set for write-protect. * To do this without making assumptions how PAT has been set up (Xen has * another layout than the kernel), translate the _PAGE_CACHE_MODE_WP cache * mode via the __cachemode2pte_tbl[] into protection bits (those protection * bits will select a cache mode of WP or better), and then translate the * protection bits back into the cache mode using __pte2cm_idx() and the * __pte2cachemode_tbl[] array. This will return the really used cache mode. */ bool x86_has_pat_wp(void) { uint16_t prot = __cachemode2pte_tbl[_PAGE_CACHE_MODE_WP]; return __pte2cachemode_tbl[__pte2cm_idx(prot)] == _PAGE_CACHE_MODE_WP; } enum page_cache_mode pgprot2cachemode(pgprot_t pgprot) { unsigned long masked; masked = pgprot_val(pgprot) & _PAGE_CACHE_MASK; if (likely(masked == 0)) return 0; return __pte2cachemode_tbl[__pte2cm_idx(masked)]; } static unsigned long __initdata pgt_buf_start; static unsigned long __initdata pgt_buf_end; static unsigned long __initdata pgt_buf_top; static unsigned long min_pfn_mapped; static bool __initdata can_use_brk_pgt = true; /* * Pages returned are already directly mapped. * * Changing that is likely to break Xen, see commit: * * 279b706 x86,xen: introduce x86_init.mapping.pagetable_reserve * * for detailed information. */ __ref void *alloc_low_pages(unsigned int num) { unsigned long pfn; int i; if (after_bootmem) { unsigned int order; order = get_order((unsigned long)num << PAGE_SHIFT); return (void *)__get_free_pages(GFP_ATOMIC | __GFP_ZERO, order); } if ((pgt_buf_end + num) > pgt_buf_top || !can_use_brk_pgt) { unsigned long ret = 0; if (min_pfn_mapped < max_pfn_mapped) { ret = memblock_phys_alloc_range( PAGE_SIZE * num, PAGE_SIZE, min_pfn_mapped << PAGE_SHIFT, max_pfn_mapped << PAGE_SHIFT); } if (!ret && can_use_brk_pgt) ret = __pa(extend_brk(PAGE_SIZE * num, PAGE_SIZE)); if (!ret) panic("alloc_low_pages: can not alloc memory"); pfn = ret >> PAGE_SHIFT; } else { pfn = pgt_buf_end; pgt_buf_end += num; } for (i = 0; i < num; i++) { void *adr; adr = __va((pfn + i) << PAGE_SHIFT); clear_page(adr); } return __va(pfn << PAGE_SHIFT); } /* * By default need to be able to allocate page tables below PGD firstly for * the 0-ISA_END_ADDRESS range and secondly for the initial PMD_SIZE mapping. * With KASLR memory randomization, depending on the machine e820 memory and the * PUD alignment, twice that many pages may be needed when KASLR memory * randomization is enabled. 
*/ #define INIT_PGD_PAGE_TABLES 4 #ifndef CONFIG_RANDOMIZE_MEMORY #define INIT_PGD_PAGE_COUNT (2 * INIT_PGD_PAGE_TABLES) #else #define INIT_PGD_PAGE_COUNT (4 * INIT_PGD_PAGE_TABLES) #endif #define INIT_PGT_BUF_SIZE (INIT_PGD_PAGE_COUNT * PAGE_SIZE) RESERVE_BRK(early_pgt_alloc, INIT_PGT_BUF_SIZE); void __init early_alloc_pgt_buf(void) { unsigned long tables = INIT_PGT_BUF_SIZE; phys_addr_t base; base = __pa(extend_brk(tables, PAGE_SIZE)); pgt_buf_start = base >> PAGE_SHIFT; pgt_buf_end = pgt_buf_start; pgt_buf_top = pgt_buf_start + (tables >> PAGE_SHIFT); } int after_bootmem; early_param_on_off("gbpages", "nogbpages", direct_gbpages, CONFIG_X86_DIRECT_GBPAGES); struct map_range { unsigned long start; unsigned long end; unsigned page_size_mask; }; static int page_size_mask; /* * Save some of cr4 feature set we're using (e.g. Pentium 4MB * enable and PPro Global page enable), so that any CPU's that boot * up after us can get the correct flags. Invoked on the boot CPU. */ static inline void cr4_set_bits_and_update_boot(unsigned long mask) { mmu_cr4_features |= mask; if (trampoline_cr4_features) *trampoline_cr4_features = mmu_cr4_features; cr4_set_bits(mask); } static void __init probe_page_size_mask(void) { /* * For pagealloc debugging, identity mapping will use small pages. * This will simplify cpa(), which otherwise needs to support splitting * large pages into small in interrupt context, etc. */ if (boot_cpu_has(X86_FEATURE_PSE) && !debug_pagealloc_enabled()) page_size_mask |= 1 << PG_LEVEL_2M; else direct_gbpages = 0; /* Enable PSE if available */ if (boot_cpu_has(X86_FEATURE_PSE)) cr4_set_bits_and_update_boot(X86_CR4_PSE); /* Enable PGE if available */ __supported_pte_mask &= ~_PAGE_GLOBAL; if (boot_cpu_has(X86_FEATURE_PGE)) { cr4_set_bits_and_update_boot(X86_CR4_PGE); __supported_pte_mask |= _PAGE_GLOBAL; } /* By the default is everything supported: */ __default_kernel_pte_mask = __supported_pte_mask; /* Except when with PTI where the kernel is mostly non-Global: */ if (cpu_feature_enabled(X86_FEATURE_PTI)) __default_kernel_pte_mask &= ~_PAGE_GLOBAL; /* Enable 1 GB linear kernel mappings if available: */ if (direct_gbpages && boot_cpu_has(X86_FEATURE_GBPAGES)) { printk(KERN_INFO "Using GB pages for direct mapping\n"); page_size_mask |= 1 << PG_LEVEL_1G; } else { direct_gbpages = 0; } } /* * INVLPG may not properly flush Global entries on * these CPUs. New microcode fixes the issue. */ static const struct x86_cpu_id invlpg_miss_ids[] = { X86_MATCH_VFM(INTEL_ALDERLAKE, 0x2e), X86_MATCH_VFM(INTEL_ALDERLAKE_L, 0x42c), X86_MATCH_VFM(INTEL_ATOM_GRACEMONT, 0x11), X86_MATCH_VFM(INTEL_RAPTORLAKE, 0x118), X86_MATCH_VFM(INTEL_RAPTORLAKE_P, 0x4117), X86_MATCH_VFM(INTEL_RAPTORLAKE_S, 0x2e), {} }; static void setup_pcid(void) { const struct x86_cpu_id *invlpg_miss_match; if (!IS_ENABLED(CONFIG_X86_64)) return; if (!boot_cpu_has(X86_FEATURE_PCID)) return; invlpg_miss_match = x86_match_cpu(invlpg_miss_ids); if (invlpg_miss_match && boot_cpu_data.microcode < invlpg_miss_match->driver_data) { pr_info("Incomplete global flushes, disabling PCID"); setup_clear_cpu_cap(X86_FEATURE_PCID); return; } if (boot_cpu_has(X86_FEATURE_PGE)) { /* * This can't be cr4_set_bits_and_update_boot() -- the * trampoline code can't handle CR4.PCIDE and it wouldn't * do any good anyway. Despite the name, * cr4_set_bits_and_update_boot() doesn't actually cause * the bits in question to remain set all the way through * the secondary boot asm. * * Instead, we brute-force it and set CR4.PCIDE manually in * start_secondary(). 
*/ cr4_set_bits(X86_CR4_PCIDE); } else { /* * flush_tlb_all(), as currently implemented, won't work if * PCID is on but PGE is not. Since that combination * doesn't exist on real hardware, there's no reason to try * to fully support it, but it's polite to avoid corrupting * data if we're on an improperly configured VM. */ setup_clear_cpu_cap(X86_FEATURE_PCID); } } #ifdef CONFIG_X86_32 #define NR_RANGE_MR 3 #else /* CONFIG_X86_64 */ #define NR_RANGE_MR 5 #endif static int __meminit save_mr(struct map_range *mr, int nr_range, unsigned long start_pfn, unsigned long end_pfn, unsigned long page_size_mask) { if (start_pfn < end_pfn) { if (nr_range >= NR_RANGE_MR) panic("run out of range for init_memory_mapping\n"); mr[nr_range].start = start_pfn<<PAGE_SHIFT; mr[nr_range].end = end_pfn<<PAGE_SHIFT; mr[nr_range].page_size_mask = page_size_mask; nr_range++; } return nr_range; } /* * adjust the page_size_mask for small range to go with * big page size instead small one if nearby are ram too. */ static void __ref adjust_range_page_size_mask(struct map_range *mr, int nr_range) { int i; for (i = 0; i < nr_range; i++) { if ((page_size_mask & (1<<PG_LEVEL_2M)) && !(mr[i].page_size_mask & (1<<PG_LEVEL_2M))) { unsigned long start = round_down(mr[i].start, PMD_SIZE); unsigned long end = round_up(mr[i].end, PMD_SIZE); #ifdef CONFIG_X86_32 if ((end >> PAGE_SHIFT) > max_low_pfn) continue; #endif if (memblock_is_region_memory(start, end - start)) mr[i].page_size_mask |= 1<<PG_LEVEL_2M; } if ((page_size_mask & (1<<PG_LEVEL_1G)) && !(mr[i].page_size_mask & (1<<PG_LEVEL_1G))) { unsigned long start = round_down(mr[i].start, PUD_SIZE); unsigned long end = round_up(mr[i].end, PUD_SIZE); if (memblock_is_region_memory(start, end - start)) mr[i].page_size_mask |= 1<<PG_LEVEL_1G; } } } static const char *page_size_string(struct map_range *mr) { static const char str_1g[] = "1G"; static const char str_2m[] = "2M"; static const char str_4m[] = "4M"; static const char str_4k[] = "4k"; if (mr->page_size_mask & (1<<PG_LEVEL_1G)) return str_1g; /* * 32-bit without PAE has a 4M large page size. * PG_LEVEL_2M is misnamed, but we can at least * print out the right size in the string. */ if (IS_ENABLED(CONFIG_X86_32) && !IS_ENABLED(CONFIG_X86_PAE) && mr->page_size_mask & (1<<PG_LEVEL_2M)) return str_4m; if (mr->page_size_mask & (1<<PG_LEVEL_2M)) return str_2m; return str_4k; } static int __meminit split_mem_range(struct map_range *mr, int nr_range, unsigned long start, unsigned long end) { unsigned long start_pfn, end_pfn, limit_pfn; unsigned long pfn; int i; limit_pfn = PFN_DOWN(end); /* head if not big page alignment ? */ pfn = start_pfn = PFN_DOWN(start); #ifdef CONFIG_X86_32 /* * Don't use a large page for the first 2/4MB of memory * because there are often fixed size MTRRs in there * and overlapping MTRRs into large pages can cause * slowdowns. 
*/ if (pfn == 0) end_pfn = PFN_DOWN(PMD_SIZE); else end_pfn = round_up(pfn, PFN_DOWN(PMD_SIZE)); #else /* CONFIG_X86_64 */ end_pfn = round_up(pfn, PFN_DOWN(PMD_SIZE)); #endif if (end_pfn > limit_pfn) end_pfn = limit_pfn; if (start_pfn < end_pfn) { nr_range = save_mr(mr, nr_range, start_pfn, end_pfn, 0); pfn = end_pfn; } /* big page (2M) range */ start_pfn = round_up(pfn, PFN_DOWN(PMD_SIZE)); #ifdef CONFIG_X86_32 end_pfn = round_down(limit_pfn, PFN_DOWN(PMD_SIZE)); #else /* CONFIG_X86_64 */ end_pfn = round_up(pfn, PFN_DOWN(PUD_SIZE)); if (end_pfn > round_down(limit_pfn, PFN_DOWN(PMD_SIZE))) end_pfn = round_down(limit_pfn, PFN_DOWN(PMD_SIZE)); #endif if (start_pfn < end_pfn) { nr_range = save_mr(mr, nr_range, start_pfn, end_pfn, page_size_mask & (1<<PG_LEVEL_2M)); pfn = end_pfn; } #ifdef CONFIG_X86_64 /* big page (1G) range */ start_pfn = round_up(pfn, PFN_DOWN(PUD_SIZE)); end_pfn = round_down(limit_pfn, PFN_DOWN(PUD_SIZE)); if (start_pfn < end_pfn) { nr_range = save_mr(mr, nr_range, start_pfn, end_pfn, page_size_mask & ((1<<PG_LEVEL_2M)|(1<<PG_LEVEL_1G))); pfn = end_pfn; } /* tail is not big page (1G) alignment */ start_pfn = round_up(pfn, PFN_DOWN(PMD_SIZE)); end_pfn = round_down(limit_pfn, PFN_DOWN(PMD_SIZE)); if (start_pfn < end_pfn) { nr_range = save_mr(mr, nr_range, start_pfn, end_pfn, page_size_mask & (1<<PG_LEVEL_2M)); pfn = end_pfn; } #endif /* tail is not big page (2M) alignment */ start_pfn = pfn; end_pfn = limit_pfn; nr_range = save_mr(mr, nr_range, start_pfn, end_pfn, 0); if (!after_bootmem) adjust_range_page_size_mask(mr, nr_range); /* try to merge same page size and continuous */ for (i = 0; nr_range > 1 && i < nr_range - 1; i++) { unsigned long old_start; if (mr[i].end != mr[i+1].start || mr[i].page_size_mask != mr[i+1].page_size_mask) continue; /* move it */ old_start = mr[i].start; memmove(&mr[i], &mr[i+1], (nr_range - 1 - i) * sizeof(struct map_range)); mr[i--].start = old_start; nr_range--; } for (i = 0; i < nr_range; i++) pr_debug(" [mem %#010lx-%#010lx] page %s\n", mr[i].start, mr[i].end - 1, page_size_string(&mr[i])); return nr_range; } struct range pfn_mapped[E820_MAX_ENTRIES]; int nr_pfn_mapped; static void add_pfn_range_mapped(unsigned long start_pfn, unsigned long end_pfn) { nr_pfn_mapped = add_range_with_merge(pfn_mapped, E820_MAX_ENTRIES, nr_pfn_mapped, start_pfn, end_pfn); nr_pfn_mapped = clean_sort_range(pfn_mapped, E820_MAX_ENTRIES); max_pfn_mapped = max(max_pfn_mapped, end_pfn); if (start_pfn < (1UL<<(32-PAGE_SHIFT))) max_low_pfn_mapped = max(max_low_pfn_mapped, min(end_pfn, 1UL<<(32-PAGE_SHIFT))); } bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn) { int i; for (i = 0; i < nr_pfn_mapped; i++) if ((start_pfn >= pfn_mapped[i].start) && (end_pfn <= pfn_mapped[i].end)) return true; return false; } /* * Setup the direct mapping of the physical memory at PAGE_OFFSET. * This runs before bootmem is initialized and gets pages directly from * the physical memory. To access them they are temporarily mapped. 
*/ unsigned long __ref init_memory_mapping(unsigned long start, unsigned long end, pgprot_t prot) { struct map_range mr[NR_RANGE_MR]; unsigned long ret = 0; int nr_range, i; pr_debug("init_memory_mapping: [mem %#010lx-%#010lx]\n", start, end - 1); memset(mr, 0, sizeof(mr)); nr_range = split_mem_range(mr, 0, start, end); for (i = 0; i < nr_range; i++) ret = kernel_physical_mapping_init(mr[i].start, mr[i].end, mr[i].page_size_mask, prot); add_pfn_range_mapped(start >> PAGE_SHIFT, ret >> PAGE_SHIFT); return ret >> PAGE_SHIFT; } /* * We need to iterate through the E820 memory map and create direct mappings * for only E820_TYPE_RAM and E820_KERN_RESERVED regions. We cannot simply * create direct mappings for all pfns from [0 to max_low_pfn) and * [4GB to max_pfn) because of possible memory holes in high addresses * that cannot be marked as UC by fixed/variable range MTRRs. * Depending on the alignment of E820 ranges, this may possibly result * in using smaller size (i.e. 4K instead of 2M or 1G) page tables. * * init_mem_mapping() calls init_range_memory_mapping() with big range. * That range would have hole in the middle or ends, and only ram parts * will be mapped in init_range_memory_mapping(). */ static unsigned long __init init_range_memory_mapping( unsigned long r_start, unsigned long r_end) { unsigned long start_pfn, end_pfn; unsigned long mapped_ram_size = 0; int i; for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, NULL) { u64 start = clamp_val(PFN_PHYS(start_pfn), r_start, r_end); u64 end = clamp_val(PFN_PHYS(end_pfn), r_start, r_end); if (start >= end) continue; /* * if it is overlapping with brk pgt, we need to * alloc pgt buf from memblock instead. */ can_use_brk_pgt = max(start, (u64)pgt_buf_end<<PAGE_SHIFT) >= min(end, (u64)pgt_buf_top<<PAGE_SHIFT); init_memory_mapping(start, end, PAGE_KERNEL); mapped_ram_size += end - start; can_use_brk_pgt = true; } return mapped_ram_size; } static unsigned long __init get_new_step_size(unsigned long step_size) { /* * Initial mapped size is PMD_SIZE (2M). * We can not set step_size to be PUD_SIZE (1G) yet. * In worse case, when we cross the 1G boundary, and * PG_LEVEL_2M is not set, we will need 1+1+512 pages (2M + 8k) * to map 1G range with PTE. Hence we use one less than the * difference of page table level shifts. * * Don't need to worry about overflow in the top-down case, on 32bit, * when step_size is 0, round_down() returns 0 for start, and that * turns it into 0x100000000ULL. * In the bottom-up case, round_up(x, 0) returns 0 though too, which * needs to be taken into consideration by the code below. */ return step_size << (PMD_SHIFT - PAGE_SHIFT - 1); } /** * memory_map_top_down - Map [map_start, map_end) top down * @map_start: start address of the target memory range * @map_end: end address of the target memory range * * This function will setup direct mapping for memory range * [map_start, map_end) in top-down. That said, the page tables * will be allocated at the end of the memory, and we map the * memory in top-down. */ static void __init memory_map_top_down(unsigned long map_start, unsigned long map_end) { unsigned long real_end, last_start; unsigned long step_size; unsigned long addr; unsigned long mapped_ram_size = 0; /* * Systems that have many reserved areas near top of the memory, * e.g. QEMU with less than 1G RAM and EFI enabled, or Xen, will * require lots of 4K mappings which may exhaust pgt_buf. 
* Start with top-most PMD_SIZE range aligned at PMD_SIZE to ensure * there is enough mapped memory that can be allocated from * memblock. */ addr = memblock_phys_alloc_range(PMD_SIZE, PMD_SIZE, map_start, map_end); if (!addr) { pr_warn("Failed to release memory for alloc_low_pages()"); real_end = max(map_start, ALIGN_DOWN(map_end, PMD_SIZE)); } else { memblock_phys_free(addr, PMD_SIZE); real_end = addr + PMD_SIZE; } /* step_size need to be small so pgt_buf from BRK could cover it */ step_size = PMD_SIZE; max_pfn_mapped = 0; /* will get exact value next */ min_pfn_mapped = real_end >> PAGE_SHIFT; last_start = real_end; /* * We start from the top (end of memory) and go to the bottom. * The memblock_find_in_range() gets us a block of RAM from the * end of RAM in [min_pfn_mapped, max_pfn_mapped) used as new pages * for page table. */ while (last_start > map_start) { unsigned long start; if (last_start > step_size) { start = round_down(last_start - 1, step_size); if (start < map_start) start = map_start; } else start = map_start; mapped_ram_size += init_range_memory_mapping(start, last_start); last_start = start; min_pfn_mapped = last_start >> PAGE_SHIFT; if (mapped_ram_size >= step_size) step_size = get_new_step_size(step_size); } if (real_end < map_end) init_range_memory_mapping(real_end, map_end); } /** * memory_map_bottom_up - Map [map_start, map_end) bottom up * @map_start: start address of the target memory range * @map_end: end address of the target memory range * * This function will setup direct mapping for memory range * [map_start, map_end) in bottom-up. Since we have limited the * bottom-up allocation above the kernel, the page tables will * be allocated just above the kernel and we map the memory * in [map_start, map_end) in bottom-up. */ static void __init memory_map_bottom_up(unsigned long map_start, unsigned long map_end) { unsigned long next, start; unsigned long mapped_ram_size = 0; /* step_size need to be small so pgt_buf from BRK could cover it */ unsigned long step_size = PMD_SIZE; start = map_start; min_pfn_mapped = start >> PAGE_SHIFT; /* * We start from the bottom (@map_start) and go to the top (@map_end). * The memblock_find_in_range() gets us a block of RAM from the * end of RAM in [min_pfn_mapped, max_pfn_mapped) used as new pages * for page table. */ while (start < map_end) { if (step_size && map_end - start > step_size) { next = round_up(start + 1, step_size); if (next > map_end) next = map_end; } else { next = map_end; } mapped_ram_size += init_range_memory_mapping(start, next); start = next; if (mapped_ram_size >= step_size) step_size = get_new_step_size(step_size); } } /* * The real mode trampoline, which is required for bootstrapping CPUs * occupies only a small area under the low 1MB. See reserve_real_mode() * for details. * * If KASLR is disabled the first PGD entry of the direct mapping is copied * to map the real mode trampoline. * * If KASLR is enabled, copy only the PUD which covers the low 1MB * area. This limits the randomization granularity to 1GB for both 4-level * and 5-level paging. */ static void __init init_trampoline(void) { #ifdef CONFIG_X86_64 /* * The code below will alias kernel page-tables in the user-range of the * address space, including the Global bit. So global TLB entries will * be created when using the trampoline page-table. 
*/ if (!kaslr_memory_enabled()) trampoline_pgd_entry = init_top_pgt[pgd_index(__PAGE_OFFSET)]; else init_trampoline_kaslr(); #endif } void __init init_mem_mapping(void) { unsigned long end; pti_check_boottime_disable(); probe_page_size_mask(); setup_pcid(); #ifdef CONFIG_X86_64 end = max_pfn << PAGE_SHIFT; #else end = max_low_pfn << PAGE_SHIFT; #endif /* the ISA range is always mapped regardless of memory holes */ init_memory_mapping(0, ISA_END_ADDRESS, PAGE_KERNEL); /* Init the trampoline, possibly with KASLR memory offset */ init_trampoline(); /* * If the allocation is in bottom-up direction, we setup direct mapping * in bottom-up, otherwise we setup direct mapping in top-down. */ if (memblock_bottom_up()) { unsigned long kernel_end = __pa_symbol(_end); /* * we need two separate calls here. This is because we want to * allocate page tables above the kernel. So we first map * [kernel_end, end) to make memory above the kernel be mapped * as soon as possible. And then use page tables allocated above * the kernel to map [ISA_END_ADDRESS, kernel_end). */ memory_map_bottom_up(kernel_end, end); memory_map_bottom_up(ISA_END_ADDRESS, kernel_end); } else { memory_map_top_down(ISA_END_ADDRESS, end); } #ifdef CONFIG_X86_64 if (max_pfn > max_low_pfn) { /* can we preserve max_low_pfn ?*/ max_low_pfn = max_pfn; } #else early_ioremap_page_table_range_init(); #endif load_cr3(swapper_pg_dir); __flush_tlb_all(); x86_init.hyper.init_mem_mapping(); early_memtest(0, max_pfn_mapped << PAGE_SHIFT); } /* * Initialize an mm_struct to be used during poking and a pointer to be used * during patching. */ void __init poking_init(void) { spinlock_t *ptl; pte_t *ptep; text_poke_mm = mm_alloc(); BUG_ON(!text_poke_mm); /* Xen PV guests need the PGD to be pinned. */ paravirt_enter_mmap(text_poke_mm); set_notrack_mm(text_poke_mm); /* * Randomize the poking address, but make sure that the following page * will be mapped at the same PMD. We need 2 pages, so find space for 3, * and adjust the address if the PMD ends after the first one. */ text_poke_mm_addr = TASK_UNMAPPED_BASE; if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) text_poke_mm_addr += (kaslr_get_random_long("Poking") & PAGE_MASK) % (TASK_SIZE - TASK_UNMAPPED_BASE - 3 * PAGE_SIZE); if (((text_poke_mm_addr + PAGE_SIZE) & ~PMD_MASK) == 0) text_poke_mm_addr += PAGE_SIZE; /* * We need to trigger the allocation of the page-tables that will be * needed for poking now. Later, poking may be performed in an atomic * section, which might cause allocation to fail. */ ptep = get_locked_pte(text_poke_mm, text_poke_mm_addr, &ptl); BUG_ON(!ptep); pte_unmap_unlock(ptep, ptl); } /* * devmem_is_allowed() checks to see if /dev/mem access to a certain address * is valid. The argument is a physical page number. * * On x86, access has to be given to the first megabyte of RAM because that * area traditionally contains BIOS code and data regions used by X, dosemu, * and similar apps. Since they map the entire memory range, the whole range * must be allowed (for mapping), but any areas that would otherwise be * disallowed are flagged as being "zero filled" instead of rejected. * Access has to be given to non-kernel-ram areas as well, these contain the * PCI mmio resources as well as potential bios/acpi data regions. */ int devmem_is_allowed(unsigned long pagenr) { if (region_intersects(PFN_PHYS(pagenr), PAGE_SIZE, IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE) != REGION_DISJOINT) { /* * For disallowed memory regions in the low 1MB range, * request that the page be shown as all zeros. 
*/ if (pagenr < 256) return 2; return 0; } /* * This must follow RAM test, since System RAM is considered a * restricted resource under CONFIG_STRICT_DEVMEM. */ if (iomem_is_exclusive(pagenr << PAGE_SHIFT)) { /* Low 1MB bypasses iomem restrictions. */ if (pagenr < 256) return 1; return 0; } return 1; } void free_init_pages(const char *what, unsigned long begin, unsigned long end) { unsigned long begin_aligned, end_aligned; /* Make sure boundaries are page aligned */ begin_aligned = PAGE_ALIGN(begin); end_aligned = end & PAGE_MASK; if (WARN_ON(begin_aligned != begin || end_aligned != end)) { begin = begin_aligned; end = end_aligned; } if (begin >= end) return; /* * If debugging page accesses then do not free this memory but * mark them not present - any buggy init-section access will * create a kernel page fault: */ if (debug_pagealloc_enabled()) { pr_info("debug: unmapping init [mem %#010lx-%#010lx]\n", begin, end - 1); /* * Inform kmemleak about the hole in the memory since the * corresponding pages will be unmapped. */ kmemleak_free_part((void *)begin, end - begin); set_memory_np(begin, (end - begin) >> PAGE_SHIFT); } else { /* * We just marked the kernel text read only above, now that * we are going to free part of that, we need to make that * writeable and non-executable first. */ set_memory_nx(begin, (end - begin) >> PAGE_SHIFT); set_memory_rw(begin, (end - begin) >> PAGE_SHIFT); free_reserved_area((void *)begin, (void *)end, POISON_FREE_INITMEM, what); } } /* * begin/end can be in the direct map or the "high kernel mapping" * used for the kernel image only. free_init_pages() will do the * right thing for either kind of address. */ void free_kernel_image_pages(const char *what, void *begin, void *end) { unsigned long begin_ul = (unsigned long)begin; unsigned long end_ul = (unsigned long)end; unsigned long len_pages = (end_ul - begin_ul) >> PAGE_SHIFT; free_init_pages(what, begin_ul, end_ul); /* * PTI maps some of the kernel into userspace. For performance, * this includes some kernel areas that do not contain secrets. * Those areas might be adjacent to the parts of the kernel image * being freed, which may contain secrets. Remove the "high kernel * image mapping" for these freed areas, ensuring they are not even * potentially vulnerable to Meltdown regardless of the specific * optimizations PTI is currently using. * * The "noalias" prevents unmapping the direct map alias which is * needed to access the freed pages. * * This is only valid for 64bit kernels. 32bit has only one mapping * which can't be treated in this way for obvious reasons. 
*/ if (IS_ENABLED(CONFIG_X86_64) && cpu_feature_enabled(X86_FEATURE_PTI)) set_memory_np_noalias(begin_ul, len_pages); } void __ref free_initmem(void) { e820__reallocate_tables(); mem_encrypt_free_decrypted_mem(); free_kernel_image_pages("unused kernel image (initmem)", &__init_begin, &__init_end); } #ifdef CONFIG_BLK_DEV_INITRD void __init free_initrd_mem(unsigned long start, unsigned long end) { /* * end could be not aligned, and We can not align that, * decompressor could be confused by aligned initrd_end * We already reserve the end partial page before in * - i386_start_kernel() * - x86_64_start_kernel() * - relocate_initrd() * So here We can do PAGE_ALIGN() safely to get partial page to be freed */ free_init_pages("initrd", start, PAGE_ALIGN(end)); } #endif void __init zone_sizes_init(void) { unsigned long max_zone_pfns[MAX_NR_ZONES]; memset(max_zone_pfns, 0, sizeof(max_zone_pfns)); #ifdef CONFIG_ZONE_DMA max_zone_pfns[ZONE_DMA] = min(MAX_DMA_PFN, max_low_pfn); #endif #ifdef CONFIG_ZONE_DMA32 max_zone_pfns[ZONE_DMA32] = min(MAX_DMA32_PFN, max_low_pfn); #endif max_zone_pfns[ZONE_NORMAL] = max_low_pfn; #ifdef CONFIG_HIGHMEM max_zone_pfns[ZONE_HIGHMEM] = max_pfn; #endif free_area_init(max_zone_pfns); } __visible DEFINE_PER_CPU_ALIGNED(struct tlb_state, cpu_tlbstate) = { .loaded_mm = &init_mm, .next_asid = 1, .cr4 = ~0UL, /* fail hard if we screw up cr4 shadow initialization */ }; #ifdef CONFIG_ADDRESS_MASKING DEFINE_PER_CPU(u64, tlbstate_untag_mask); EXPORT_PER_CPU_SYMBOL(tlbstate_untag_mask); #endif void update_cache_mode_entry(unsigned entry, enum page_cache_mode cache) { /* entry 0 MUST be WB (hardwired to speed up translations) */ BUG_ON(!entry && cache != _PAGE_CACHE_MODE_WB); __cachemode2pte_tbl[cache] = __cm_idx2pte(entry); __pte2cachemode_tbl[entry] = cache; } #ifdef CONFIG_SWAP unsigned long arch_max_swapfile_size(void) { unsigned long pages; pages = generic_max_swapfile_size(); if (boot_cpu_has_bug(X86_BUG_L1TF) && l1tf_mitigation != L1TF_MITIGATION_OFF) { /* Limit the swap file size to MAX_PA/2 for L1TF workaround */ unsigned long long l1tf_limit = l1tf_pfn_limit(); /* * We encode swap offsets also with 3 bits below those for pfn * which makes the usable limit higher. 
*/ #if CONFIG_PGTABLE_LEVELS > 2 l1tf_limit <<= PAGE_SHIFT - SWP_OFFSET_FIRST_BIT; #endif pages = min_t(unsigned long long, l1tf_limit, pages); } return pages; } #endif #ifdef CONFIG_EXECMEM static struct execmem_info execmem_info __ro_after_init; #ifdef CONFIG_ARCH_HAS_EXECMEM_ROX void execmem_fill_trapping_insns(void *ptr, size_t size) { memset(ptr, INT3_INSN_OPCODE, size); } #endif struct execmem_info __init *execmem_arch_setup(void) { unsigned long start, offset = 0; enum execmem_range_flags flags; pgprot_t pgprot; if (kaslr_enabled()) offset = get_random_u32_inclusive(1, 1024) * PAGE_SIZE; start = MODULES_VADDR + offset; if (IS_ENABLED(CONFIG_ARCH_HAS_EXECMEM_ROX) && cpu_feature_enabled(X86_FEATURE_PSE)) { pgprot = PAGE_KERNEL_ROX; flags = EXECMEM_KASAN_SHADOW | EXECMEM_ROX_CACHE; } else { pgprot = PAGE_KERNEL; flags = EXECMEM_KASAN_SHADOW; } execmem_info = (struct execmem_info){ .ranges = { [EXECMEM_MODULE_TEXT] = { .flags = flags, .start = start, .end = MODULES_END, .pgprot = pgprot, .alignment = MODULE_ALIGN, }, [EXECMEM_KPROBES] = { .flags = flags, .start = start, .end = MODULES_END, .pgprot = PAGE_KERNEL_ROX, .alignment = MODULE_ALIGN, }, [EXECMEM_FTRACE] = { .flags = flags, .start = start, .end = MODULES_END, .pgprot = pgprot, .alignment = MODULE_ALIGN, }, [EXECMEM_BPF] = { .flags = EXECMEM_KASAN_SHADOW, .start = start, .end = MODULES_END, .pgprot = PAGE_KERNEL, .alignment = MODULE_ALIGN, }, [EXECMEM_MODULE_DATA] = { .flags = EXECMEM_KASAN_SHADOW, .start = start, .end = MODULES_END, .pgprot = PAGE_KERNEL, .alignment = MODULE_ALIGN, }, }, }; return &execmem_info; } #endif /* CONFIG_EXECMEM */ |
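/*
 * A stand-alone user-space sketch of the head/body/tail splitting that
 * split_mem_range() above performs: a 4k-mapped head up to the first 2M
 * boundary, 2M-mapped runs up to and after the 1G-aligned middle, a
 * 1G-mapped middle, and a 4k tail. The rdown()/rup()/emit() helpers are
 * invented for the example, and the 32-bit, MTRR and range-merging
 * details of the real code are deliberately omitted. The output format
 * loosely mirrors the pr_debug() lines emitted by split_mem_range().
 */
#include <stdio.h>
#include <stdint.h>

#define SZ_2M	(2ULL << 20)
#define SZ_1G	(1ULL << 30)

static uint64_t rdown(uint64_t x, uint64_t a) { return x & ~(a - 1); }
static uint64_t rup(uint64_t x, uint64_t a) { return (x + a - 1) & ~(a - 1); }

static void emit(const char *sz, uint64_t s, uint64_t e)
{
	if (s < e)	/* skip empty sub-ranges */
		printf("  [mem %#llx-%#llx] page %s\n",
		       (unsigned long long)s, (unsigned long long)e - 1, sz);
}

static void split(uint64_t start, uint64_t end)
{
	uint64_t head_end = rup(start, SZ_2M) < end ? rup(start, SZ_2M) : end;
	uint64_t g_start = rup(head_end, SZ_1G);
	uint64_t g_end = rdown(end, SZ_1G);
	uint64_t tail_2m = rdown(end, SZ_2M);

	emit("4k", start, head_end);		/* unaligned head */
	if (g_start < g_end) {			/* a 1G-aligned middle exists */
		emit("2M", head_end, g_start);
		emit("1G", g_start, g_end);
		emit("2M", g_end, tail_2m);
	} else {
		emit("2M", head_end, tail_2m);
	}
	emit("4k", tail_2m > head_end ? tail_2m : head_end, end); /* tail */
}

int main(void)
{
	/* e.g. a range starting just below 2M and ending at 5G */
	split(0x1ff000ULL, 0x140000000ULL);
	return 0;
}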
// SPDX-License-Identifier: GPL-2.0-or-later /* */ #include <linux/gfp.h> #include <linux/init.h> #include <linux/ratelimit.h> #include <linux/usb.h> #include <linux/usb/audio.h> #include <linux/slab.h> #include <sound/core.h> #include <sound/pcm.h> #include <sound/pcm_params.h> #include "usbaudio.h" #include "helper.h" #include "card.h" #include "endpoint.h" #include "pcm.h" #include "clock.h" #include "quirks.h" enum { EP_STATE_STOPPED, EP_STATE_RUNNING, EP_STATE_STOPPING, }; /* interface refcounting */ struct snd_usb_iface_ref { unsigned char iface; bool need_setup; int opened; int altset; struct list_head list; }; /* clock refcounting */ struct snd_usb_clock_ref { unsigned char clock; atomic_t locked; int opened; int rate; bool need_setup; struct list_head list; }; /* * snd_usb_endpoint is a model that abstracts everything related to an * USB endpoint and its streaming. * * There are functions to activate and deactivate the streaming URBs and * optional callbacks to let the pcm logic handle the actual content of the * packets for playback and record. Thus, the bus streaming and the audio * handlers are fully decoupled. * * There are two different types of endpoints in audio applications. * * SND_USB_ENDPOINT_TYPE_DATA handles full audio data payload for both * inbound and outbound traffic. * * SND_USB_ENDPOINT_TYPE_SYNC endpoints are for inbound traffic only and * expect the payload to carry Q10.14 / Q16.16 formatted sync information * (3 or 4 bytes). * * Each endpoint has to be configured prior to being used by calling * snd_usb_endpoint_set_params(). * * The model incorporates a reference counting, so that multiple users * can call snd_usb_endpoint_start() and snd_usb_endpoint_stop(), and * only the first user will effectively start the URBs, and only the last * one to stop it will tear the URBs down again. 
*/ /* * convert a sampling rate into our full speed format (fs/1000 in Q16.16) * this will overflow at approx 524 kHz */ static inline unsigned get_usb_full_speed_rate(unsigned int rate) { return ((rate << 13) + 62) / 125; } /* * convert a sampling rate into USB high speed format (fs/8000 in Q16.16) * this will overflow at approx 4 MHz */ static inline unsigned get_usb_high_speed_rate(unsigned int rate) { return ((rate << 10) + 62) / 125; } /* * release a urb data */ static void release_urb_ctx(struct snd_urb_ctx *u) { if (u->urb && u->buffer_size) usb_free_coherent(u->ep->chip->dev, u->buffer_size, u->urb->transfer_buffer, u->urb->transfer_dma); usb_free_urb(u->urb); u->urb = NULL; u->buffer_size = 0; } static const char *usb_error_string(int err) { switch (err) { case -ENODEV: return "no device"; case -ENOENT: return "endpoint not enabled"; case -EPIPE: return "endpoint stalled"; case -ENOSPC: return "not enough bandwidth"; case -ESHUTDOWN: return "device disabled"; case -EHOSTUNREACH: return "device suspended"; case -EINVAL: case -EAGAIN: case -EFBIG: case -EMSGSIZE: return "internal error"; default: return "unknown error"; } } static inline bool ep_state_running(struct snd_usb_endpoint *ep) { return atomic_read(&ep->state) == EP_STATE_RUNNING; } static inline bool ep_state_update(struct snd_usb_endpoint *ep, int old, int new) { return atomic_try_cmpxchg(&ep->state, &old, new); } /** * snd_usb_endpoint_implicit_feedback_sink: Report endpoint usage type * * @ep: The snd_usb_endpoint * * Determine whether an endpoint is driven by an implicit feedback * data endpoint source. */ int snd_usb_endpoint_implicit_feedback_sink(struct snd_usb_endpoint *ep) { return ep->implicit_fb_sync && usb_pipeout(ep->pipe); } /* * Return the number of samples to be sent in the next packet * for streaming based on information derived from sync endpoints * * This won't be used for implicit feedback which takes the packet size * returned from the sync source */ static int slave_next_packet_size(struct snd_usb_endpoint *ep, unsigned int avail) { unsigned int phase; int ret; if (ep->fill_max) return ep->maxframesize; guard(spinlock_irqsave)(&ep->lock); phase = (ep->phase & 0xffff) + (ep->freqm << ep->datainterval); ret = min(phase >> 16, ep->maxframesize); if (avail && ret >= avail) ret = -EAGAIN; else ep->phase = phase; return ret; } /* * Return the number of samples to be sent in the next packet * for adaptive and synchronous endpoints */ static int next_packet_size(struct snd_usb_endpoint *ep, unsigned int avail) { unsigned int sample_accum; int ret; if (ep->fill_max) return ep->maxframesize; sample_accum = ep->sample_accum + ep->sample_rem; if (sample_accum >= ep->pps) { sample_accum -= ep->pps; ret = ep->packsize[1]; } else { ret = ep->packsize[0]; } if (avail && ret >= avail) ret = -EAGAIN; else ep->sample_accum = sample_accum; return ret; } /* * snd_usb_endpoint_next_packet_size: Return the number of samples to be sent * in the next packet * * If the size is equal or exceeds @avail, don't proceed but return -EAGAIN * Exception: @avail = 0 for skipping the check. 
*/ int snd_usb_endpoint_next_packet_size(struct snd_usb_endpoint *ep, struct snd_urb_ctx *ctx, int idx, unsigned int avail) { unsigned int packet; packet = ctx->packet_size[idx]; if (packet) { if (avail && packet >= avail) return -EAGAIN; return packet; } if (ep->sync_source) return slave_next_packet_size(ep, avail); else return next_packet_size(ep, avail); } static void call_retire_callback(struct snd_usb_endpoint *ep, struct urb *urb) { struct snd_usb_substream *data_subs; data_subs = READ_ONCE(ep->data_subs); if (data_subs && ep->retire_data_urb) ep->retire_data_urb(data_subs, urb); } static void retire_outbound_urb(struct snd_usb_endpoint *ep, struct snd_urb_ctx *urb_ctx) { call_retire_callback(ep, urb_ctx->urb); } static void snd_usb_handle_sync_urb(struct snd_usb_endpoint *ep, struct snd_usb_endpoint *sender, const struct urb *urb); static void retire_inbound_urb(struct snd_usb_endpoint *ep, struct snd_urb_ctx *urb_ctx) { struct urb *urb = urb_ctx->urb; struct snd_usb_endpoint *sync_sink; if (unlikely(ep->skip_packets > 0)) { ep->skip_packets--; return; } sync_sink = READ_ONCE(ep->sync_sink); if (sync_sink) snd_usb_handle_sync_urb(sync_sink, ep, urb); call_retire_callback(ep, urb); } static inline bool has_tx_length_quirk(struct snd_usb_audio *chip) { return chip->quirk_flags & QUIRK_FLAG_TX_LENGTH; } static void prepare_silent_urb(struct snd_usb_endpoint *ep, struct snd_urb_ctx *ctx) { struct urb *urb = ctx->urb; unsigned int offs = 0; unsigned int extra = 0; __le32 packet_length; int i; /* For tx_length_quirk, put packet length at start of packet */ if (has_tx_length_quirk(ep->chip)) extra = sizeof(packet_length); for (i = 0; i < ctx->packets; ++i) { unsigned int offset; unsigned int length; int counts; counts = snd_usb_endpoint_next_packet_size(ep, ctx, i, 0); length = counts * ep->stride; /* number of silent bytes */ offset = offs * ep->stride + extra * i; urb->iso_frame_desc[i].offset = offset; urb->iso_frame_desc[i].length = length + extra; if (extra) { packet_length = cpu_to_le32(length); memcpy(urb->transfer_buffer + offset, &packet_length, sizeof(packet_length)); } memset(urb->transfer_buffer + offset + extra, ep->silence_value, length); offs += counts; } urb->number_of_packets = ctx->packets; urb->transfer_buffer_length = offs * ep->stride + ctx->packets * extra; ctx->queued = 0; } /* * Prepare a PLAYBACK urb for submission to the bus. */ static int prepare_outbound_urb(struct snd_usb_endpoint *ep, struct snd_urb_ctx *ctx, bool in_stream_lock) { struct urb *urb = ctx->urb; unsigned char *cp = urb->transfer_buffer; struct snd_usb_substream *data_subs; urb->dev = ep->chip->dev; /* we need to set this at each time */ switch (ep->type) { case SND_USB_ENDPOINT_TYPE_DATA: data_subs = READ_ONCE(ep->data_subs); if (data_subs && ep->prepare_data_urb) return ep->prepare_data_urb(data_subs, urb, in_stream_lock); /* no data provider, so send silence */ prepare_silent_urb(ep, ctx); break; case SND_USB_ENDPOINT_TYPE_SYNC: if (snd_usb_get_speed(ep->chip->dev) >= USB_SPEED_HIGH) { /* * fill the length and offset of each urb descriptor. * the fixed 12.13 frequency is passed as 16.16 through the pipe. */ urb->iso_frame_desc[0].length = 4; urb->iso_frame_desc[0].offset = 0; cp[0] = ep->freqn; cp[1] = ep->freqn >> 8; cp[2] = ep->freqn >> 16; cp[3] = ep->freqn >> 24; } else { /* * fill the length and offset of each urb descriptor. * the fixed 10.14 frequency is passed through the pipe. 
*/ urb->iso_frame_desc[0].length = 3; urb->iso_frame_desc[0].offset = 0; cp[0] = ep->freqn >> 2; cp[1] = ep->freqn >> 10; cp[2] = ep->freqn >> 18; } break; } return 0; } /* * Prepare a CAPTURE or SYNC urb for submission to the bus. */ static int prepare_inbound_urb(struct snd_usb_endpoint *ep, struct snd_urb_ctx *urb_ctx) { int i, offs; struct urb *urb = urb_ctx->urb; urb->dev = ep->chip->dev; /* we need to set this at each time */ switch (ep->type) { case SND_USB_ENDPOINT_TYPE_DATA: offs = 0; for (i = 0; i < urb_ctx->packets; i++) { urb->iso_frame_desc[i].offset = offs; urb->iso_frame_desc[i].length = ep->curpacksize; offs += ep->curpacksize; } urb->transfer_buffer_length = offs; urb->number_of_packets = urb_ctx->packets; break; case SND_USB_ENDPOINT_TYPE_SYNC: urb->iso_frame_desc[0].length = min(4u, ep->syncmaxsize); urb->iso_frame_desc[0].offset = 0; break; } return 0; } /* notify an error as XRUN to the assigned PCM data substream */ static void notify_xrun(struct snd_usb_endpoint *ep) { struct snd_usb_substream *data_subs; struct snd_pcm_substream *psubs; data_subs = READ_ONCE(ep->data_subs); if (!data_subs) return; psubs = data_subs->pcm_substream; if (psubs && psubs->runtime && psubs->runtime->state == SNDRV_PCM_STATE_RUNNING) snd_pcm_stop_xrun(psubs); } static struct snd_usb_packet_info * next_packet_fifo_enqueue(struct snd_usb_endpoint *ep) { struct snd_usb_packet_info *p; p = ep->next_packet + (ep->next_packet_head + ep->next_packet_queued) % ARRAY_SIZE(ep->next_packet); ep->next_packet_queued++; return p; } static struct snd_usb_packet_info * next_packet_fifo_dequeue(struct snd_usb_endpoint *ep) { struct snd_usb_packet_info *p; p = ep->next_packet + ep->next_packet_head; ep->next_packet_head++; ep->next_packet_head %= ARRAY_SIZE(ep->next_packet); ep->next_packet_queued--; return p; } static void push_back_to_ready_list(struct snd_usb_endpoint *ep, struct snd_urb_ctx *ctx) { guard(spinlock_irqsave)(&ep->lock); list_add_tail(&ctx->ready_list, &ep->ready_playback_urbs); } /* * Send output urbs that have been prepared previously. URBs are dequeued * from ep->ready_playback_urbs and in case there aren't any available * or there are no packets that have been prepared, this function does * nothing. * * The reason why the functionality of sending and preparing URBs is separated * is that host controllers don't guarantee the order in which they return * inbound and outbound packets to their submitters. * * This function is used both for implicit feedback endpoints and in low- * latency playback mode. 
*/ int snd_usb_queue_pending_output_urbs(struct snd_usb_endpoint *ep, bool in_stream_lock) { bool implicit_fb = snd_usb_endpoint_implicit_feedback_sink(ep); while (ep_state_running(ep)) { struct snd_usb_packet_info *packet; struct snd_urb_ctx *ctx = NULL; int err, i; scoped_guard(spinlock_irqsave, &ep->lock) { if ((!implicit_fb || ep->next_packet_queued > 0) && !list_empty(&ep->ready_playback_urbs)) { /* take URB out of FIFO */ ctx = list_first_entry(&ep->ready_playback_urbs, struct snd_urb_ctx, ready_list); list_del_init(&ctx->ready_list); if (implicit_fb) packet = next_packet_fifo_dequeue(ep); } } if (ctx == NULL) break; /* copy over the length information */ if (implicit_fb) { for (i = 0; i < packet->packets; i++) ctx->packet_size[i] = packet->packet_size[i]; } /* call the data handler to fill in playback data */ err = prepare_outbound_urb(ep, ctx, in_stream_lock); /* can be stopped during prepare callback */ if (unlikely(!ep_state_running(ep))) break; if (err < 0) { /* push back to ready list again for -EAGAIN */ if (err == -EAGAIN) { push_back_to_ready_list(ep, ctx); break; } if (!in_stream_lock) notify_xrun(ep); return -EPIPE; } if (!atomic_read(&ep->chip->shutdown)) err = usb_submit_urb(ctx->urb, GFP_ATOMIC); else err = -ENODEV; if (err < 0) { if (!atomic_read(&ep->chip->shutdown)) { usb_audio_err(ep->chip, "Unable to submit urb #%d: %d at %s\n", ctx->index, err, __func__); if (!in_stream_lock) notify_xrun(ep); } return -EPIPE; } set_bit(ctx->index, &ep->active_mask); atomic_inc(&ep->submitted_urbs); } return 0; } /* * complete callback for urbs */ static void snd_complete_urb(struct urb *urb) { struct snd_urb_ctx *ctx = urb->context; struct snd_usb_endpoint *ep = ctx->ep; int err; if (unlikely(urb->status == -ENOENT || /* unlinked */ urb->status == -ENODEV || /* device removed */ urb->status == -ECONNRESET || /* unlinked */ urb->status == -ESHUTDOWN)) /* device disabled */ goto exit_clear; /* device disconnected */ if (unlikely(atomic_read(&ep->chip->shutdown))) goto exit_clear; if (unlikely(!ep_state_running(ep))) goto exit_clear; if (usb_pipeout(ep->pipe)) { retire_outbound_urb(ep, ctx); /* can be stopped during retire callback */ if (unlikely(!ep_state_running(ep))) goto exit_clear; /* in low-latency and implicit-feedback modes, push back the * URB to ready list at first, then process as much as possible */ if (ep->lowlatency_playback || snd_usb_endpoint_implicit_feedback_sink(ep)) { push_back_to_ready_list(ep, ctx); clear_bit(ctx->index, &ep->active_mask); snd_usb_queue_pending_output_urbs(ep, false); /* decrement at last, and check xrun */ if (atomic_dec_and_test(&ep->submitted_urbs) && !snd_usb_endpoint_implicit_feedback_sink(ep)) notify_xrun(ep); return; } /* in non-lowlatency mode, no error handling for prepare */ prepare_outbound_urb(ep, ctx, false); /* can be stopped during prepare callback */ if (unlikely(!ep_state_running(ep))) goto exit_clear; } else { retire_inbound_urb(ep, ctx); /* can be stopped during retire callback */ if (unlikely(!ep_state_running(ep))) goto exit_clear; prepare_inbound_urb(ep, ctx); } if (!atomic_read(&ep->chip->shutdown)) err = usb_submit_urb(urb, GFP_ATOMIC); else err = -ENODEV; if (err == 0) return; if (!atomic_read(&ep->chip->shutdown)) { usb_audio_err(ep->chip, "cannot submit urb (err = %d)\n", err); notify_xrun(ep); } exit_clear: clear_bit(ctx->index, &ep->active_mask); atomic_dec(&ep->submitted_urbs); } /* * Find or create a refcount object for the given interface * * The objects are released altogether in snd_usb_endpoint_free_all() */ 
static struct snd_usb_iface_ref * iface_ref_find(struct snd_usb_audio *chip, int iface) { struct snd_usb_iface_ref *ip; list_for_each_entry(ip, &chip->iface_ref_list, list) if (ip->iface == iface) return ip; ip = kzalloc(sizeof(*ip), GFP_KERNEL); if (!ip) return NULL; ip->iface = iface; list_add_tail(&ip->list, &chip->iface_ref_list); return ip; } /* Similarly, a refcount object for clock */ static struct snd_usb_clock_ref * clock_ref_find(struct snd_usb_audio *chip, int clock) { struct snd_usb_clock_ref *ref; list_for_each_entry(ref, &chip->clock_ref_list, list) if (ref->clock == clock) return ref; ref = kzalloc(sizeof(*ref), GFP_KERNEL); if (!ref) return NULL; ref->clock = clock; atomic_set(&ref->locked, 0); list_add_tail(&ref->list, &chip->clock_ref_list); return ref; } /* * Get the existing endpoint object corresponding EP * Returns NULL if not present. */ struct snd_usb_endpoint * snd_usb_get_endpoint(struct snd_usb_audio *chip, int ep_num) { struct snd_usb_endpoint *ep; list_for_each_entry(ep, &chip->ep_list, list) { if (ep->ep_num == ep_num) return ep; } return NULL; } #define ep_type_name(type) \ (type == SND_USB_ENDPOINT_TYPE_DATA ? "data" : "sync") /** * snd_usb_add_endpoint: Add an endpoint to an USB audio chip * * @chip: The chip * @ep_num: The number of the endpoint to use * @type: SND_USB_ENDPOINT_TYPE_DATA or SND_USB_ENDPOINT_TYPE_SYNC * * If the requested endpoint has not been added to the given chip before, * a new instance is created. * * Returns zero on success or a negative error code. * * New endpoints will be added to chip->ep_list and freed by * calling snd_usb_endpoint_free_all(). * * For SND_USB_ENDPOINT_TYPE_SYNC, the caller needs to guarantee that * bNumEndpoints > 1 beforehand. */ int snd_usb_add_endpoint(struct snd_usb_audio *chip, int ep_num, int type) { struct snd_usb_endpoint *ep; bool is_playback; ep = snd_usb_get_endpoint(chip, ep_num); if (ep) return 0; usb_audio_dbg(chip, "Creating new %s endpoint #%x\n", ep_type_name(type), ep_num); ep = kzalloc(sizeof(*ep), GFP_KERNEL); if (!ep) return -ENOMEM; ep->chip = chip; spin_lock_init(&ep->lock); ep->type = type; ep->ep_num = ep_num; INIT_LIST_HEAD(&ep->ready_playback_urbs); atomic_set(&ep->submitted_urbs, 0); is_playback = ((ep_num & USB_ENDPOINT_DIR_MASK) == USB_DIR_OUT); ep_num &= USB_ENDPOINT_NUMBER_MASK; if (is_playback) ep->pipe = usb_sndisocpipe(chip->dev, ep_num); else ep->pipe = usb_rcvisocpipe(chip->dev, ep_num); list_add_tail(&ep->list, &chip->ep_list); return 0; } /* Set up syncinterval and maxsyncsize for a sync EP */ static void endpoint_set_syncinterval(struct snd_usb_audio *chip, struct snd_usb_endpoint *ep) { struct usb_host_interface *alts; struct usb_endpoint_descriptor *desc; alts = snd_usb_get_host_interface(chip, ep->iface, ep->altsetting); if (!alts) return; desc = get_endpoint(alts, ep->ep_idx); if (desc->bLength >= USB_DT_ENDPOINT_AUDIO_SIZE && desc->bRefresh >= 1 && desc->bRefresh <= 9) ep->syncinterval = desc->bRefresh; else if (snd_usb_get_speed(chip->dev) == USB_SPEED_FULL) ep->syncinterval = 1; else if (desc->bInterval >= 1 && desc->bInterval <= 16) ep->syncinterval = desc->bInterval - 1; else ep->syncinterval = 3; ep->syncmaxsize = le16_to_cpu(desc->wMaxPacketSize); } static bool endpoint_compatible(struct snd_usb_endpoint *ep, const struct audioformat *fp, const struct snd_pcm_hw_params *params) { if (!ep->opened) return false; if (ep->cur_audiofmt != fp) return false; if (ep->cur_rate != params_rate(params) || ep->cur_format != params_format(params) || ep->cur_period_frames != 
params_period_size(params) || ep->cur_buffer_periods != params_periods(params)) return false; return true; } /* * Check whether the given fp and hw params are compatible with the current * setup of the target EP for implicit feedback sync */ bool snd_usb_endpoint_compatible(struct snd_usb_audio *chip, struct snd_usb_endpoint *ep, const struct audioformat *fp, const struct snd_pcm_hw_params *params) { guard(mutex)(&chip->mutex); return endpoint_compatible(ep, fp, params); } /* * snd_usb_endpoint_open: Open the endpoint * * Called from hw_params to assign the endpoint to the substream. * It's reference-counted, and only the first opener is allowed to set up * arbitrary parameters. The later opener must be compatible with the * former opened parameters. * The endpoint needs to be closed via snd_usb_endpoint_close() later. * * Note that this function doesn't configure the endpoint. The substream * needs to set it up later via snd_usb_endpoint_set_params() and * snd_usb_endpoint_prepare(). */ struct snd_usb_endpoint * snd_usb_endpoint_open(struct snd_usb_audio *chip, const struct audioformat *fp, const struct snd_pcm_hw_params *params, bool is_sync_ep, bool fixed_rate) { struct snd_usb_endpoint *ep; int ep_num = is_sync_ep ? fp->sync_ep : fp->endpoint; guard(mutex)(&chip->mutex); ep = snd_usb_get_endpoint(chip, ep_num); if (!ep) { usb_audio_err(chip, "Cannot find EP 0x%x to open\n", ep_num); return NULL; } if (!ep->opened) { if (is_sync_ep) { ep->iface = fp->sync_iface; ep->altsetting = fp->sync_altsetting; ep->ep_idx = fp->sync_ep_idx; } else { ep->iface = fp->iface; ep->altsetting = fp->altsetting; ep->ep_idx = fp->ep_idx; } usb_audio_dbg(chip, "Open EP 0x%x, iface=%d:%d, idx=%d\n", ep_num, ep->iface, ep->altsetting, ep->ep_idx); ep->iface_ref = iface_ref_find(chip, ep->iface); if (!ep->iface_ref) return NULL; if (fp->protocol != UAC_VERSION_1) { ep->clock_ref = clock_ref_find(chip, fp->clock); if (!ep->clock_ref) return NULL; ep->clock_ref->opened++; } ep->cur_audiofmt = fp; ep->cur_channels = fp->channels; ep->cur_rate = params_rate(params); ep->cur_format = params_format(params); ep->cur_frame_bytes = snd_pcm_format_physical_width(ep->cur_format) * ep->cur_channels / 8; ep->cur_period_frames = params_period_size(params); ep->cur_period_bytes = ep->cur_period_frames * ep->cur_frame_bytes; ep->cur_buffer_periods = params_periods(params); if (ep->type == SND_USB_ENDPOINT_TYPE_SYNC) endpoint_set_syncinterval(chip, ep); ep->implicit_fb_sync = fp->implicit_fb; ep->need_setup = true; ep->need_prepare = true; ep->fixed_rate = fixed_rate; usb_audio_dbg(chip, " channels=%d, rate=%d, format=%s, period_bytes=%d, periods=%d, implicit_fb=%d\n", ep->cur_channels, ep->cur_rate, snd_pcm_format_name(ep->cur_format), ep->cur_period_bytes, ep->cur_buffer_periods, ep->implicit_fb_sync); } else { if (WARN_ON(!ep->iface_ref)) return NULL; if (!endpoint_compatible(ep, fp, params)) { usb_audio_err(chip, "Incompatible EP setup for 0x%x\n", ep_num); return NULL; } usb_audio_dbg(chip, "Reopened EP 0x%x (count %d)\n", ep_num, ep->opened); } if (!ep->iface_ref->opened++) ep->iface_ref->need_setup = true; ep->opened++; return ep; } /* * snd_usb_endpoint_set_sync: Link data and sync endpoints * * Pass NULL to sync_ep to unlink again */ void snd_usb_endpoint_set_sync(struct snd_usb_audio *chip, struct snd_usb_endpoint *data_ep, struct snd_usb_endpoint *sync_ep) { data_ep->sync_source = sync_ep; } /* * Set data endpoint callbacks and the assigned data stream * * Called at PCM trigger and cleanups. 
* Pass NULL to deactivate each callback. */ void snd_usb_endpoint_set_callback(struct snd_usb_endpoint *ep, int (*prepare)(struct snd_usb_substream *subs, struct urb *urb, bool in_stream_lock), void (*retire)(struct snd_usb_substream *subs, struct urb *urb), struct snd_usb_substream *data_subs) { ep->prepare_data_urb = prepare; ep->retire_data_urb = retire; if (data_subs) ep->lowlatency_playback = data_subs->lowlatency_playback; else ep->lowlatency_playback = false; WRITE_ONCE(ep->data_subs, data_subs); } static int endpoint_set_interface(struct snd_usb_audio *chip, struct snd_usb_endpoint *ep, bool set) { int altset = set ? ep->altsetting : 0; int err; int retries = 0; const int max_retries = 5; if (ep->iface_ref->altset == altset) return 0; /* already disconnected? */ if (unlikely(atomic_read(&chip->shutdown))) return -ENODEV; usb_audio_dbg(chip, "Setting usb interface %d:%d for EP 0x%x\n", ep->iface, altset, ep->ep_num); retry: err = usb_set_interface(chip->dev, ep->iface, altset); if (err < 0) { if (err == -EPROTO && ++retries <= max_retries) { msleep(5 * (1 << (retries - 1))); goto retry; } usb_audio_err_ratelimited( chip, "%d:%d: usb_set_interface failed (%d)\n", ep->iface, altset, err); return err; } if (chip->quirk_flags & QUIRK_FLAG_IFACE_DELAY) msleep(50); ep->iface_ref->altset = altset; return 0; } /* * snd_usb_endpoint_close: Close the endpoint * * Unreference an endpoint previously opened via snd_usb_endpoint_open(). */ void snd_usb_endpoint_close(struct snd_usb_audio *chip, struct snd_usb_endpoint *ep) { guard(mutex)(&chip->mutex); usb_audio_dbg(chip, "Closing EP 0x%x (count %d)\n", ep->ep_num, ep->opened); if (!--ep->iface_ref->opened && !(chip->quirk_flags & QUIRK_FLAG_IFACE_SKIP_CLOSE)) endpoint_set_interface(chip, ep, false); if (!--ep->opened) { if (ep->clock_ref) { if (!--ep->clock_ref->opened) ep->clock_ref->rate = 0; } ep->iface = 0; ep->altsetting = 0; ep->cur_audiofmt = NULL; ep->cur_rate = 0; ep->iface_ref = NULL; ep->clock_ref = NULL; usb_audio_dbg(chip, "EP 0x%x closed\n", ep->ep_num); } } /* Prepare for suspending the EP, called from the main suspend handler */ void snd_usb_endpoint_suspend(struct snd_usb_endpoint *ep) { ep->need_prepare = true; if (ep->iface_ref) ep->iface_ref->need_setup = true; if (ep->clock_ref) ep->clock_ref->rate = 0; } /* * wait until all urbs are processed. */ static int wait_clear_urbs(struct snd_usb_endpoint *ep) { unsigned long end_time = jiffies + msecs_to_jiffies(1000); int alive; if (atomic_read(&ep->state) != EP_STATE_STOPPING) return 0; do { alive = atomic_read(&ep->submitted_urbs); if (!alive) break; schedule_timeout_uninterruptible(1); } while (time_before(jiffies, end_time)); if (alive) usb_audio_err(ep->chip, "timeout: still %d active urbs on EP #%x\n", alive, ep->ep_num); if (ep_state_update(ep, EP_STATE_STOPPING, EP_STATE_STOPPED)) { ep->sync_sink = NULL; snd_usb_endpoint_set_callback(ep, NULL, NULL, NULL); } return 0; } /* sync the pending stop operation; * this function itself doesn't trigger the stop operation */ void snd_usb_endpoint_sync_pending_stop(struct snd_usb_endpoint *ep) { if (ep) wait_clear_urbs(ep); } /* * Stop active urbs * * This function moves the EP to STOPPING state if it is currently RUNNING. 
*/ static int stop_urbs(struct snd_usb_endpoint *ep, bool force, bool keep_pending) { unsigned int i; if (!force && atomic_read(&ep->running)) return -EBUSY; if (!ep_state_update(ep, EP_STATE_RUNNING, EP_STATE_STOPPING)) return 0; scoped_guard(spinlock_irqsave, &ep->lock) { INIT_LIST_HEAD(&ep->ready_playback_urbs); ep->next_packet_head = 0; ep->next_packet_queued = 0; } if (keep_pending) return 0; for (i = 0; i < ep->nurbs; i++) { if (test_bit(i, &ep->active_mask)) { if (!test_and_set_bit(i, &ep->unlink_mask)) { struct urb *u = ep->urb[i].urb; usb_unlink_urb(u); } } } return 0; } /* * release an endpoint's urbs */ static int release_urbs(struct snd_usb_endpoint *ep, bool force) { int i, err; /* route incoming urbs to nirvana */ snd_usb_endpoint_set_callback(ep, NULL, NULL, NULL); /* stop and unlink urbs */ err = stop_urbs(ep, force, false); if (err) return err; wait_clear_urbs(ep); for (i = 0; i < ep->nurbs; i++) release_urb_ctx(&ep->urb[i]); usb_free_coherent(ep->chip->dev, SYNC_URBS * 4, ep->syncbuf, ep->sync_dma); ep->syncbuf = NULL; ep->nurbs = 0; return 0; } /* * configure a data endpoint */ static int data_ep_set_params(struct snd_usb_endpoint *ep) { struct snd_usb_audio *chip = ep->chip; unsigned int maxsize, minsize, packs_per_ms, max_packs_per_urb; unsigned int max_packs_per_period, urbs_per_period, urb_packs; unsigned int max_urbs, i; const struct audioformat *fmt = ep->cur_audiofmt; int frame_bits = ep->cur_frame_bytes * 8; int tx_length_quirk = (has_tx_length_quirk(chip) && usb_pipeout(ep->pipe)); usb_audio_dbg(chip, "Setting params for data EP 0x%x, pipe 0x%x\n", ep->ep_num, ep->pipe); if (ep->cur_format == SNDRV_PCM_FORMAT_DSD_U16_LE && fmt->dsd_dop) { /* * When operating in DSD DOP mode, the size of a sample frame * in hardware differs from the actual physical format width * because we need to make room for the DOP markers. */ frame_bits += ep->cur_channels << 3; } ep->datainterval = fmt->datainterval; ep->stride = frame_bits >> 3; switch (ep->cur_format) { case SNDRV_PCM_FORMAT_U8: ep->silence_value = 0x80; break; case SNDRV_PCM_FORMAT_DSD_U8: case SNDRV_PCM_FORMAT_DSD_U16_LE: case SNDRV_PCM_FORMAT_DSD_U32_LE: case SNDRV_PCM_FORMAT_DSD_U16_BE: case SNDRV_PCM_FORMAT_DSD_U32_BE: ep->silence_value = 0x69; break; default: ep->silence_value = 0; } /* assume max. frequency is 50% higher than nominal */ ep->freqmax = ep->freqn + (ep->freqn >> 1); /* Round up freqmax to nearest integer in order to calculate maximum * packet size, which must represent a whole number of frames. * This is accomplished by adding 0x0.ffff before converting the * Q16.16 format into integer. * In order to accurately calculate the maximum packet size when * the data interval is more than 1 (i.e. ep->datainterval > 0), * multiply by the data interval prior to rounding. For instance, * a freqmax of 41 kHz will result in a max packet size of 6 (5.125) * frames with a data interval of 1, but 11 (10.25) frames with a * data interval of 2. * (ep->freqmax << ep->datainterval overflows at 8.192 MHz for the * maximum datainterval value of 3, at USB full speed, higher for * USB high speed, noting that ep->freqmax is in units of * frames per packet in Q16.16 format.) */ maxsize = (((ep->freqmax << ep->datainterval) + 0xffff) >> 16) * (frame_bits >> 3); if (tx_length_quirk) maxsize += sizeof(__le32); /* Space for length descriptor */ /* but wMaxPacketSize might reduce this */ if (ep->maxpacksize && ep->maxpacksize < maxsize) { /* whatever fits into a max. 
size packet */ unsigned int data_maxsize = maxsize = ep->maxpacksize; if (tx_length_quirk) /* Need to remove the length descriptor to calc freq */ data_maxsize -= sizeof(__le32); ep->freqmax = (data_maxsize / (frame_bits >> 3)) << (16 - ep->datainterval); } if (ep->fill_max) ep->curpacksize = ep->maxpacksize; else ep->curpacksize = maxsize; if (snd_usb_get_speed(chip->dev) != USB_SPEED_FULL) { packs_per_ms = 8 >> ep->datainterval; max_packs_per_urb = MAX_PACKS_HS; } else { packs_per_ms = 1; max_packs_per_urb = MAX_PACKS; } if (ep->sync_source && !ep->implicit_fb_sync) max_packs_per_urb = min(max_packs_per_urb, 1U << ep->sync_source->syncinterval); max_packs_per_urb = max(1u, max_packs_per_urb >> ep->datainterval); /* * Capture endpoints need to use small URBs because there's no way * to tell in advance where the next period will end, and we don't * want the next URB to complete much after the period ends. * * Playback endpoints with implicit sync must use the same parameters * as their corresponding capture endpoint. */ if (usb_pipein(ep->pipe) || ep->implicit_fb_sync) { /* make capture URBs <= 1 ms and smaller than a period */ urb_packs = min(max_packs_per_urb, packs_per_ms); while (urb_packs > 1 && urb_packs * maxsize >= ep->cur_period_bytes) urb_packs >>= 1; ep->nurbs = MAX_URBS; /* * Playback endpoints without implicit sync are adjusted so that * a period fits as evenly as possible in the smallest number of * URBs. The total number of URBs is adjusted to the size of the * ALSA buffer, subject to the MAX_URBS and MAX_QUEUE limits. */ } else { /* determine how small a packet can be */ minsize = (ep->freqn >> (16 - ep->datainterval)) * (frame_bits >> 3); /* with sync from device, assume it can be 12% lower */ if (ep->sync_source) minsize -= minsize >> 3; minsize = max(minsize, 1u); /* how many packets will contain an entire ALSA period? */ max_packs_per_period = DIV_ROUND_UP(ep->cur_period_bytes, minsize); /* how many URBs will contain a period? */ urbs_per_period = DIV_ROUND_UP(max_packs_per_period, max_packs_per_urb); /* how many packets are needed in each URB? 
*/ urb_packs = DIV_ROUND_UP(max_packs_per_period, urbs_per_period); /* limit the number of frames in a single URB */ ep->max_urb_frames = DIV_ROUND_UP(ep->cur_period_frames, urbs_per_period); /* try to use enough URBs to contain an entire ALSA buffer */ max_urbs = min((unsigned) MAX_URBS, MAX_QUEUE * packs_per_ms / urb_packs); ep->nurbs = min(max_urbs, urbs_per_period * ep->cur_buffer_periods); } /* allocate and initialize data urbs */ for (i = 0; i < ep->nurbs; i++) { struct snd_urb_ctx *u = &ep->urb[i]; u->index = i; u->ep = ep; u->packets = urb_packs; u->buffer_size = maxsize * u->packets; if (fmt->fmt_type == UAC_FORMAT_TYPE_II) u->packets++; /* for transfer delimiter */ u->urb = usb_alloc_urb(u->packets, GFP_KERNEL); if (!u->urb) goto out_of_memory; u->urb->transfer_buffer = usb_alloc_coherent(chip->dev, u->buffer_size, GFP_KERNEL, &u->urb->transfer_dma); if (!u->urb->transfer_buffer) goto out_of_memory; u->urb->pipe = ep->pipe; u->urb->transfer_flags = URB_NO_TRANSFER_DMA_MAP; u->urb->interval = 1 << ep->datainterval; u->urb->context = u; u->urb->complete = snd_complete_urb; INIT_LIST_HEAD(&u->ready_list); } return 0; out_of_memory: release_urbs(ep, false); return -ENOMEM; } /* * configure a sync endpoint */ static int sync_ep_set_params(struct snd_usb_endpoint *ep) { struct snd_usb_audio *chip = ep->chip; int i; usb_audio_dbg(chip, "Setting params for sync EP 0x%x, pipe 0x%x\n", ep->ep_num, ep->pipe); ep->syncbuf = usb_alloc_coherent(chip->dev, SYNC_URBS * 4, GFP_KERNEL, &ep->sync_dma); if (!ep->syncbuf) return -ENOMEM; ep->nurbs = SYNC_URBS; for (i = 0; i < SYNC_URBS; i++) { struct snd_urb_ctx *u = &ep->urb[i]; u->index = i; u->ep = ep; u->packets = 1; u->urb = usb_alloc_urb(1, GFP_KERNEL); if (!u->urb) goto out_of_memory; u->urb->transfer_buffer = ep->syncbuf + i * 4; u->urb->transfer_dma = ep->sync_dma + i * 4; u->urb->transfer_buffer_length = 4; u->urb->pipe = ep->pipe; u->urb->transfer_flags = URB_NO_TRANSFER_DMA_MAP; u->urb->number_of_packets = 1; u->urb->interval = 1 << ep->syncinterval; u->urb->context = u; u->urb->complete = snd_complete_urb; } return 0; out_of_memory: release_urbs(ep, false); return -ENOMEM; } /* update the rate of the referred clock; return the actual rate */ static int update_clock_ref_rate(struct snd_usb_audio *chip, struct snd_usb_endpoint *ep) { struct snd_usb_clock_ref *clock = ep->clock_ref; int rate = ep->cur_rate; if (!clock || clock->rate == rate) return rate; if (clock->rate) { if (atomic_read(&clock->locked)) return clock->rate; if (clock->rate != rate) { usb_audio_err(chip, "Mismatched sample rate %d vs %d for EP 0x%x\n", clock->rate, rate, ep->ep_num); return clock->rate; } } clock->rate = rate; clock->need_setup = true; return rate; } /* * snd_usb_endpoint_set_params: configure an snd_usb_endpoint * * It's called either from hw_params callback. * Determine the number of URBs to be used on this endpoint. * An endpoint must be configured before it can be started. * An endpoint that is already running can not be reconfigured. 
*/ int snd_usb_endpoint_set_params(struct snd_usb_audio *chip, struct snd_usb_endpoint *ep) { const struct audioformat *fmt = ep->cur_audiofmt; int err; guard(mutex)(&chip->mutex); if (!ep->need_setup) return 0; /* release old buffers, if any */ err = release_urbs(ep, false); if (err < 0) return err; ep->datainterval = fmt->datainterval; ep->maxpacksize = fmt->maxpacksize; ep->fill_max = !!(fmt->attributes & UAC_EP_CS_ATTR_FILL_MAX); if (snd_usb_get_speed(chip->dev) == USB_SPEED_FULL) { ep->freqn = get_usb_full_speed_rate(ep->cur_rate); ep->pps = 1000 >> ep->datainterval; } else { ep->freqn = get_usb_high_speed_rate(ep->cur_rate); ep->pps = 8000 >> ep->datainterval; } ep->sample_rem = ep->cur_rate % ep->pps; ep->packsize[0] = ep->cur_rate / ep->pps; ep->packsize[1] = (ep->cur_rate + (ep->pps - 1)) / ep->pps; /* calculate the frequency in 16.16 format */ ep->freqm = ep->freqn; ep->freqshift = INT_MIN; ep->phase = 0; switch (ep->type) { case SND_USB_ENDPOINT_TYPE_DATA: err = data_ep_set_params(ep); break; case SND_USB_ENDPOINT_TYPE_SYNC: err = sync_ep_set_params(ep); break; default: err = -EINVAL; } usb_audio_dbg(chip, "Set up %d URBS, ret=%d\n", ep->nurbs, err); if (err < 0) return err; /* some unit conversions in runtime */ ep->maxframesize = ep->maxpacksize / ep->cur_frame_bytes; ep->curframesize = ep->curpacksize / ep->cur_frame_bytes; err = update_clock_ref_rate(chip, ep); if (err >= 0) { ep->need_setup = false; err = 0; } return err; } static int init_sample_rate(struct snd_usb_audio *chip, struct snd_usb_endpoint *ep) { struct snd_usb_clock_ref *clock = ep->clock_ref; int rate, err; rate = update_clock_ref_rate(chip, ep); if (rate < 0) return rate; if (clock && !clock->need_setup) return 0; if (!ep->fixed_rate) { err = snd_usb_init_sample_rate(chip, ep->cur_audiofmt, rate); if (err < 0) { if (clock) clock->rate = 0; /* reset rate */ return err; } } if (clock) clock->need_setup = false; return 0; } /* * snd_usb_endpoint_prepare: Prepare the endpoint * * This function sets up the EP to be fully usable state. * It's called either from prepare callback. * The function checks need_setup flag, and performs nothing unless needed, * so it's safe to call this multiple times. * * This returns zero if unchanged, 1 if the configuration has changed, * or a negative error code. */ int snd_usb_endpoint_prepare(struct snd_usb_audio *chip, struct snd_usb_endpoint *ep) { bool iface_first; int err = 0; guard(mutex)(&chip->mutex); if (WARN_ON(!ep->iface_ref)) return 0; if (!ep->need_prepare) return 0; /* If the interface has been already set up, just set EP parameters */ if (!ep->iface_ref->need_setup) { /* sample rate setup of UAC1 is per endpoint, and we need * to update at each EP configuration */ if (ep->cur_audiofmt->protocol == UAC_VERSION_1) { err = init_sample_rate(chip, ep); if (err < 0) return err; } goto done; } /* Need to deselect altsetting at first */ endpoint_set_interface(chip, ep, false); /* Some UAC1 devices (e.g. 
Yamaha THR10) need the host interface * to be set up before parameter setups */ iface_first = ep->cur_audiofmt->protocol == UAC_VERSION_1; /* Workaround for devices that require the interface setup at first like UAC1 */ if (chip->quirk_flags & QUIRK_FLAG_SET_IFACE_FIRST) iface_first = true; if (iface_first) { err = endpoint_set_interface(chip, ep, true); if (err < 0) return err; } err = snd_usb_init_pitch(chip, ep->cur_audiofmt); if (err < 0) return err; err = init_sample_rate(chip, ep); if (err < 0) return err; err = snd_usb_select_mode_quirk(chip, ep->cur_audiofmt); if (err < 0) return err; /* for UAC2/3, enable the interface altset here at last */ if (!iface_first) { err = endpoint_set_interface(chip, ep, true); if (err < 0) return err; } ep->iface_ref->need_setup = false; done: ep->need_prepare = false; return 1; } EXPORT_SYMBOL_GPL(snd_usb_endpoint_prepare); /* get the current rate set to the given clock by any endpoint */ int snd_usb_endpoint_get_clock_rate(struct snd_usb_audio *chip, int clock) { struct snd_usb_clock_ref *ref; int rate = 0; if (!clock) return 0; guard(mutex)(&chip->mutex); list_for_each_entry(ref, &chip->clock_ref_list, list) { if (ref->clock == clock) { rate = ref->rate; break; } } return rate; } /** * snd_usb_endpoint_start: start an snd_usb_endpoint * * @ep: the endpoint to start * * A call to this function will increment the running count of the endpoint. * In case it is not already running, the URBs for this endpoint will be * submitted. Otherwise, this function does nothing. * * Must be balanced to calls of snd_usb_endpoint_stop(). * * Returns an error if the URB submission failed, 0 in all other cases. */ int snd_usb_endpoint_start(struct snd_usb_endpoint *ep) { bool is_playback = usb_pipeout(ep->pipe); int err; unsigned int i; if (atomic_read(&ep->chip->shutdown)) return -EBADFD; if (ep->sync_source) WRITE_ONCE(ep->sync_source->sync_sink, ep); usb_audio_dbg(ep->chip, "Starting %s EP 0x%x (running %d)\n", ep_type_name(ep->type), ep->ep_num, atomic_read(&ep->running)); /* already running? */ if (atomic_inc_return(&ep->running) != 1) return 0; if (ep->clock_ref) atomic_inc(&ep->clock_ref->locked); ep->active_mask = 0; ep->unlink_mask = 0; ep->phase = 0; ep->sample_accum = 0; snd_usb_endpoint_start_quirk(ep); /* * If this endpoint has a data endpoint as implicit feedback source, * don't start the urbs here. Instead, mark them all as available, * wait for the record urbs to return and queue the playback urbs * from that context. 
*/ if (!ep_state_update(ep, EP_STATE_STOPPED, EP_STATE_RUNNING)) goto __error; if (snd_usb_endpoint_implicit_feedback_sink(ep) && !(ep->chip->quirk_flags & QUIRK_FLAG_PLAYBACK_FIRST)) { usb_audio_dbg(ep->chip, "No URB submission due to implicit fb sync\n"); i = 0; goto fill_rest; } for (i = 0; i < ep->nurbs; i++) { struct urb *urb = ep->urb[i].urb; if (snd_BUG_ON(!urb)) goto __error; if (is_playback) err = prepare_outbound_urb(ep, urb->context, true); else err = prepare_inbound_urb(ep, urb->context); if (err < 0) { /* stop filling at applptr */ if (err == -EAGAIN) break; usb_audio_dbg(ep->chip, "EP 0x%x: failed to prepare urb: %d\n", ep->ep_num, err); goto __error; } if (!atomic_read(&ep->chip->shutdown)) err = usb_submit_urb(urb, GFP_ATOMIC); else err = -ENODEV; if (err < 0) { if (!atomic_read(&ep->chip->shutdown)) usb_audio_err(ep->chip, "cannot submit urb %d, error %d: %s\n", i, err, usb_error_string(err)); goto __error; } set_bit(i, &ep->active_mask); atomic_inc(&ep->submitted_urbs); } if (!i) { usb_audio_dbg(ep->chip, "XRUN at starting EP 0x%x\n", ep->ep_num); goto __error; } usb_audio_dbg(ep->chip, "%d URBs submitted for EP 0x%x\n", i, ep->ep_num); fill_rest: /* put the remaining URBs to ready list */ if (is_playback) { for (; i < ep->nurbs; i++) push_back_to_ready_list(ep, ep->urb + i); } return 0; __error: snd_usb_endpoint_stop(ep, false); return -EPIPE; } /** * snd_usb_endpoint_stop: stop an snd_usb_endpoint * * @ep: the endpoint to stop (may be NULL) * @keep_pending: keep in-flight URBs * * A call to this function will decrement the running count of the endpoint. * In case the last user has requested the endpoint stop, the URBs will * actually be deactivated. * * Must be balanced to calls of snd_usb_endpoint_start(). * * The caller needs to synchronize the pending stop operation via * snd_usb_endpoint_sync_pending_stop(). */ void snd_usb_endpoint_stop(struct snd_usb_endpoint *ep, bool keep_pending) { if (!ep) return; usb_audio_dbg(ep->chip, "Stopping %s EP 0x%x (running %d)\n", ep_type_name(ep->type), ep->ep_num, atomic_read(&ep->running)); if (snd_BUG_ON(!atomic_read(&ep->running))) return; if (!atomic_dec_return(&ep->running)) { if (ep->sync_source) WRITE_ONCE(ep->sync_source->sync_sink, NULL); stop_urbs(ep, false, keep_pending); if (ep->clock_ref) atomic_dec(&ep->clock_ref->locked); if (ep->chip->quirk_flags & QUIRK_FLAG_FORCE_IFACE_RESET && usb_pipeout(ep->pipe)) { ep->need_prepare = true; if (ep->iface_ref) ep->iface_ref->need_setup = true; } } } /** * snd_usb_endpoint_release: Tear down an snd_usb_endpoint * * @ep: the endpoint to release * * This function does not care for the endpoint's running count but will tear * down all the streaming URBs immediately. 
*/ void snd_usb_endpoint_release(struct snd_usb_endpoint *ep) { release_urbs(ep, true); } /** * snd_usb_endpoint_free_all: Free the resources of an snd_usb_endpoint * @chip: The chip * * This frees all endpoints and their resources */ void snd_usb_endpoint_free_all(struct snd_usb_audio *chip) { struct snd_usb_endpoint *ep, *en; struct snd_usb_iface_ref *ip, *in; struct snd_usb_clock_ref *cp, *cn; list_for_each_entry_safe(ep, en, &chip->ep_list, list) kfree(ep); list_for_each_entry_safe(ip, in, &chip->iface_ref_list, list) kfree(ip); list_for_each_entry_safe(cp, cn, &chip->clock_ref_list, list) kfree(cp); } /* * snd_usb_handle_sync_urb: parse a USB sync packet * * @ep: the endpoint to handle the packet * @sender: the sending endpoint * @urb: the received packet * * This function is called from the context of an endpoint that received * the packet and is used to let another endpoint object handle the payload. */ static void snd_usb_handle_sync_urb(struct snd_usb_endpoint *ep, struct snd_usb_endpoint *sender, const struct urb *urb) { int shift; unsigned int f; unsigned long flags; snd_BUG_ON(ep == sender); /* * In case the endpoint is operating in implicit feedback mode, prepare * a new outbound URB that has the same layout as the received packet * and add it to the list of pending urbs. queue_pending_output_urbs() * will take care of them later. */ if (snd_usb_endpoint_implicit_feedback_sink(ep) && atomic_read(&ep->running)) { /* implicit feedback case */ int i, bytes = 0; struct snd_urb_ctx *in_ctx; struct snd_usb_packet_info *out_packet; in_ctx = urb->context; /* Count overall packet size */ for (i = 0; i < in_ctx->packets; i++) if (urb->iso_frame_desc[i].status == 0) bytes += urb->iso_frame_desc[i].actual_length; /* * skip empty packets. At least M-Audio's Fast Track Ultra stops * streaming once it receives a 0-byte OUT URB */ if (bytes == 0) return; spin_lock_irqsave(&ep->lock, flags); if (ep->next_packet_queued >= ARRAY_SIZE(ep->next_packet)) { spin_unlock_irqrestore(&ep->lock, flags); usb_audio_err(ep->chip, "next packet FIFO overflow EP 0x%x\n", ep->ep_num); notify_xrun(ep); return; } out_packet = next_packet_fifo_enqueue(ep); /* * Iterate through the inbound packet and prepare the lengths * for the output packet. The OUT packet we are about to send * will have the same amount of payload bytes per stride as the * IN packet we just received. Since the actual size is scaled * by the stride, use the sender stride to calculate the length * in case the number of channels differs between the implicitly * fed-back endpoint and the synchronizing endpoint. */ out_packet->packets = in_ctx->packets; for (i = 0; i < in_ctx->packets; i++) { if (urb->iso_frame_desc[i].status == 0) out_packet->packet_size[i] = urb->iso_frame_desc[i].actual_length / sender->stride; else out_packet->packet_size[i] = 0; } spin_unlock_irqrestore(&ep->lock, flags); snd_usb_queue_pending_output_urbs(ep, false); return; } /* * process after playback sync complete * * Full speed devices report feedback values in 10.14 format as samples * per frame, high speed devices in 16.16 format as samples per * microframe. * * Because the Audio Class 1 spec was written before USB 2.0, many high * speed devices use a wrong interpretation, some others use an * entirely different format. * * Therefore, we cannot predict what format any particular device uses * and must detect it automatically. 
*/ if (urb->iso_frame_desc[0].status != 0 || urb->iso_frame_desc[0].actual_length < 3) return; f = le32_to_cpup(urb->transfer_buffer); if (urb->iso_frame_desc[0].actual_length == 3) f &= 0x00ffffff; else f &= 0x0fffffff; if (f == 0) return; if (unlikely(sender->tenor_fb_quirk)) { /* * Devices based on Tenor 8802 chipsets (TEAC UD-H01 * and others) sometimes change the feedback value * by +/- 0x1.0000. */ if (f < ep->freqn - 0x8000) f += 0xf000; else if (f > ep->freqn + 0x8000) f -= 0xf000; } else if (unlikely(ep->freqshift == INT_MIN)) { /* * The first time we see a feedback value, determine its format * by shifting it left or right until it matches the nominal * frequency value. This assumes that the feedback does not * differ from the nominal value more than +50% or -25%. */ shift = 0; while (f < ep->freqn - ep->freqn / 4) { f <<= 1; shift++; } while (f > ep->freqn + ep->freqn / 2) { f >>= 1; shift--; } ep->freqshift = shift; } else if (ep->freqshift >= 0) f <<= ep->freqshift; else f >>= -ep->freqshift; if (likely(f >= ep->freqn - ep->freqn / 8 && f <= ep->freqmax)) { /* * If the frequency looks valid, set it. * This value is referred to in prepare_playback_urb(). */ guard(spinlock_irqsave)(&ep->lock); ep->freqm = f; } else { /* * Out of range; maybe the shift value is wrong. * Reset it so that we autodetect again the next time. */ ep->freqshift = INT_MIN; } } |
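/*
 * Editor's addition, an illustrative sketch only (not part of the driver
 * above): a small standalone program that mirrors the fixed-point rate
 * conversions explained earlier, to make the Q10.14 / Q16.16 arithmetic
 * concrete. The example_* helpers are local to this sketch and are merely
 * assumed to follow the same formulas as get_usb_full_speed_rate() and
 * get_usb_high_speed_rate(); they are not part of the driver API.
 */
#include <stdio.h>

/* frames per 1 ms USB frame in Q16.16: rate/1000 scaled by 65536 == (rate << 13) / 125 */
static unsigned int example_full_speed_rate(unsigned int rate)
{
	return ((rate << 13) + 62) / 125;
}

/* frames per 125 us microframe in Q16.16: rate/8000 scaled by 65536 == (rate << 10) / 125 */
static unsigned int example_high_speed_rate(unsigned int rate)
{
	return ((rate << 10) + 62) / 125;
}

int main(void)
{
	unsigned int rate = 48000;
	unsigned int fs = example_full_speed_rate(rate);	/* 0x300000 = 48.0 frames/frame */
	unsigned int hs = example_high_speed_rate(rate);	/* 0x060000 =  6.0 frames/microframe */

	/* The full speed value is what the driver shifts right by 2 when it
	 * sends the nominal rate in the 3-byte 10.14 sync format. */
	printf("full speed: 0x%06x -> %u.%04u frames per frame\n",
	       fs, fs >> 16, (fs & 0xffff) * 10000 / 65536);
	printf("high speed: 0x%06x -> %u.%04u frames per microframe\n",
	       hs, hs >> 16, (hs & 0xffff) * 10000 / 65536);
	return 0;
}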
// SPDX-License-Identifier: GPL-2.0-or-later /* * GRE over IPv4 demultiplexer driver * * Authors: Dmitry Kozlov (xeb@mail.ru) */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include <linux/module.h> #include <linux/if.h> #include <linux/icmp.h> #include <linux/kernel.h> #include <linux/kmod.h> #include <linux/skbuff.h> #include <linux/in.h> #include <linux/ip.h> #include <linux/netdevice.h> #include <linux/if_tunnel.h> #include <linux/spinlock.h> #include <net/protocol.h> #include <net/gre.h> #include <net/erspan.h> #include <net/icmp.h> #include <net/route.h> #include <net/xfrm.h> static const struct gre_protocol __rcu *gre_proto[GREPROTO_MAX] __read_mostly; int gre_add_protocol(const struct gre_protocol *proto, u8 version) { if (version >= GREPROTO_MAX) return -EINVAL; return (cmpxchg((const struct gre_protocol **)&gre_proto[version], NULL, proto) == NULL) ? 0 : -EBUSY; } EXPORT_SYMBOL_GPL(gre_add_protocol); int gre_del_protocol(const struct gre_protocol *proto, u8 version) { int ret; if (version >= GREPROTO_MAX) return -EINVAL; ret = (cmpxchg((const struct gre_protocol **)&gre_proto[version], proto, NULL) == proto) ? 0 : -EBUSY; if (ret) return ret; synchronize_rcu(); return 0; } EXPORT_SYMBOL_GPL(gre_del_protocol); /* Fills in tpi and returns header length to be pulled. * Note that caller must use pskb_may_pull() before pulling GRE header. */ int gre_parse_header(struct sk_buff *skb, struct tnl_ptk_info *tpi, bool *csum_err, __be16 proto, int nhs) { const struct gre_base_hdr *greh; __be32 *options; int hdr_len; if (unlikely(!pskb_may_pull(skb, nhs + sizeof(struct gre_base_hdr)))) return -EINVAL; greh = (struct gre_base_hdr *)(skb->data + nhs); if (unlikely(greh->flags & (GRE_VERSION | GRE_ROUTING))) return -EINVAL; gre_flags_to_tnl_flags(tpi->flags, greh->flags); hdr_len = gre_calc_hlen(tpi->flags); if (!pskb_may_pull(skb, nhs + hdr_len)) return -EINVAL; greh = (struct gre_base_hdr *)(skb->data + nhs); tpi->proto = greh->protocol; options = (__be32 *)(greh + 1); if (greh->flags & GRE_CSUM) { if (!skb_checksum_simple_validate(skb)) { skb_checksum_try_convert(skb, IPPROTO_GRE, null_compute_pseudo); } else if (csum_err) { *csum_err = true; return -EINVAL; } options++; } if (greh->flags & GRE_KEY) { tpi->key = *options; options++; } else { tpi->key = 0; } if (unlikely(greh->flags & GRE_SEQ)) { tpi->seq = *options; options++; } else { tpi->seq = 0; } /* WCCP version 1 and 2 protocol decoding. 
* - Change protocol to IPv4/IPv6 * - When dealing with WCCPv2, Skip extra 4 bytes in GRE header */ if (greh->flags == 0 && tpi->proto == htons(ETH_P_WCCP)) { u8 _val, *val; val = skb_header_pointer(skb, nhs + hdr_len, sizeof(_val), &_val); if (!val) return -EINVAL; tpi->proto = proto; if ((*val & 0xF0) != 0x40) hdr_len += 4; } tpi->hdr_len = hdr_len; /* ERSPAN ver 1 and 2 protocol sets GRE key field * to 0 and sets the configured key in the * inner erspan header field */ if ((greh->protocol == htons(ETH_P_ERSPAN) && hdr_len != 4) || greh->protocol == htons(ETH_P_ERSPAN2)) { struct erspan_base_hdr *ershdr; if (!pskb_may_pull(skb, nhs + hdr_len + sizeof(*ershdr))) return -EINVAL; ershdr = (struct erspan_base_hdr *)(skb->data + nhs + hdr_len); tpi->key = cpu_to_be32(get_session_id(ershdr)); } return hdr_len; } EXPORT_SYMBOL(gre_parse_header); static int gre_rcv(struct sk_buff *skb) { const struct gre_protocol *proto; u8 ver; int ret; if (!pskb_may_pull(skb, 12)) goto drop; ver = skb->data[1]&0x7f; if (ver >= GREPROTO_MAX) goto drop; rcu_read_lock(); proto = rcu_dereference(gre_proto[ver]); if (!proto || !proto->handler) goto drop_unlock; ret = proto->handler(skb); rcu_read_unlock(); return ret; drop_unlock: rcu_read_unlock(); drop: kfree_skb(skb); return NET_RX_DROP; } static int gre_err(struct sk_buff *skb, u32 info) { const struct gre_protocol *proto; const struct iphdr *iph = (const struct iphdr *)skb->data; u8 ver = skb->data[(iph->ihl<<2) + 1]&0x7f; int err = 0; if (ver >= GREPROTO_MAX) return -EINVAL; rcu_read_lock(); proto = rcu_dereference(gre_proto[ver]); if (proto && proto->err_handler) proto->err_handler(skb, info); else err = -EPROTONOSUPPORT; rcu_read_unlock(); return err; } static const struct net_protocol net_gre_protocol = { .handler = gre_rcv, .err_handler = gre_err, }; static int __init gre_init(void) { pr_info("GRE over IPv4 demultiplexer driver\n"); if (inet_add_protocol(&net_gre_protocol, IPPROTO_GRE) < 0) { pr_err("can't add protocol\n"); return -EAGAIN; } return 0; } static void __exit gre_exit(void) { inet_del_protocol(&net_gre_protocol, IPPROTO_GRE); } module_init(gre_init); module_exit(gre_exit); MODULE_DESCRIPTION("GRE over IPv4 demultiplexer driver"); MODULE_AUTHOR("D. Kozlov <xeb@mail.ru>"); MODULE_LICENSE("GPL"); |
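/*
 * Editor's addition, an illustrative sketch only (not part of gre_demux.c):
 * how a consumer of this demultiplexer would plug in. A handler registers a
 * struct gre_protocol for a GRE version slot and typically calls
 * gre_parse_header() from its receive path. All my_gre_* names are invented
 * for illustration; real users own specific version slots, and
 * gre_add_protocol() returns -EBUSY if the requested slot is already taken.
 */
#include <linux/module.h>
#include <linux/skbuff.h>
#include <net/gre.h>
#include <net/ip_tunnels.h>

static int my_gre_rcv(struct sk_buff *skb)
{
	struct tnl_ptk_info tpi;
	bool csum_err = false;
	int hdr_len;

	/* nhs == 0: the IPv4 layer already points skb->data at the GRE header */
	hdr_len = gre_parse_header(skb, &tpi, &csum_err, htons(ETH_P_IP), 0);
	if (hdr_len < 0)
		goto drop;

	/* A real tunnel would now pull hdr_len bytes (e.g. with
	 * iptunnel_pull_header()) and hand the inner packet to its device;
	 * this sketch simply drops everything it receives. */
drop:
	kfree_skb(skb);
	return NET_RX_DROP;
}

static const struct gre_protocol my_gre_proto = {
	.handler = my_gre_rcv,
};

static int __init my_gre_demo_init(void)
{
	/* version 0 is normally claimed by ip_gre, version 1 by PPTP */
	return gre_add_protocol(&my_gre_proto, GREPROTO_CISCO);
}

static void __exit my_gre_demo_exit(void)
{
	gre_del_protocol(&my_gre_proto, GREPROTO_CISCO);
}

module_init(my_gre_demo_init);
module_exit(my_gre_demo_exit);
MODULE_LICENSE("GPL");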
// SPDX-License-Identifier: GPL-2.0-or-later /* * net/sched/em_meta.c Metadata ematch * * Authors: Thomas Graf <tgraf@suug.ch> * * ========================================================================== * * The metadata ematch compares two meta objects where each object * represents either a meta value stored in the kernel or a static * value provided by userspace. The objects are not provided by * userspace itself but rather a definition providing the information * to build them. Every object is of a certain type which must be * equal to the object it is being compared to. * * The definition of an object consists of the type (meta type), an * identifier (meta id) and additional type specific information. * The meta id is either TCF_META_ID_VALUE for values provided by * userspace or an index into the meta operations table consisting of * function pointers to type specific meta data collectors returning * the value of the requested meta value. * * lvalue rvalue * +-----------+ +-----------+ * | type: INT | | type: INT | * def | id: DEV | | id: VALUE | * | data: | | data: 3 | * +-----------+ +-----------+ * | | * ---> meta_ops[INT][DEV](...) | * | | * ----------- | * V V * +-----------+ +-----------+ * | type: INT | | type: INT | * obj | id: DEV | | id: VALUE | * | data: 2 |<--data got filled out | data: 3 | * +-----------+ +-----------+ * | | * --------------> 2 equals 3 <-------------- * * This is a simplified schema; the complexity varies depending * on the meta type. Obviously, the length of the data must also * be provided for non-numeric types. * * Additionally, type dependent modifiers such as shift operators * or masks may be applied to extend the functionality. As of now, * the variable length type supports shifting the byte string to * the right, eating up any number of octets and thus supporting * wildcard interface name comparisons such as "ppp%" matching * ppp0..9. * * NOTE: Certain meta values depend on other subsystems and are * only available if that subsystem is enabled in the kernel. 
*/ #include <linux/slab.h> #include <linux/module.h> #include <linux/types.h> #include <linux/kernel.h> #include <linux/sched.h> #include <linux/sched/loadavg.h> #include <linux/string.h> #include <linux/skbuff.h> #include <linux/random.h> #include <linux/if_vlan.h> #include <linux/tc_ematch/tc_em_meta.h> #include <net/dst.h> #include <net/route.h> #include <net/pkt_cls.h> #include <net/sock.h> struct meta_obj { unsigned long value; unsigned int len; }; struct meta_value { struct tcf_meta_val hdr; unsigned long val; unsigned int len; }; struct meta_match { struct meta_value lvalue; struct meta_value rvalue; }; static inline int meta_id(struct meta_value *v) { return TCF_META_ID(v->hdr.kind); } static inline int meta_type(struct meta_value *v) { return TCF_META_TYPE(v->hdr.kind); } #define META_COLLECTOR(FUNC) static void meta_##FUNC(struct sk_buff *skb, \ struct tcf_pkt_info *info, struct meta_value *v, \ struct meta_obj *dst, int *err) /************************************************************************** * System status & misc **************************************************************************/ META_COLLECTOR(int_random) { get_random_bytes(&dst->value, sizeof(dst->value)); } static inline unsigned long fixed_loadavg(int load) { int rnd_load = load + (FIXED_1/200); int rnd_frac = ((rnd_load & (FIXED_1-1)) * 100) >> FSHIFT; return ((rnd_load >> FSHIFT) * 100) + rnd_frac; } META_COLLECTOR(int_loadavg_0) { dst->value = fixed_loadavg(avenrun[0]); } META_COLLECTOR(int_loadavg_1) { dst->value = fixed_loadavg(avenrun[1]); } META_COLLECTOR(int_loadavg_2) { dst->value = fixed_loadavg(avenrun[2]); } /************************************************************************** * Device names & indices **************************************************************************/ static inline int int_dev(struct net_device *dev, struct meta_obj *dst) { if (unlikely(dev == NULL)) return -1; dst->value = dev->ifindex; return 0; } static inline int var_dev(struct net_device *dev, struct meta_obj *dst) { if (unlikely(dev == NULL)) return -1; dst->value = (unsigned long) dev->name; dst->len = strlen(dev->name); return 0; } META_COLLECTOR(int_dev) { *err = int_dev(skb->dev, dst); } META_COLLECTOR(var_dev) { *err = var_dev(skb->dev, dst); } /************************************************************************** * vlan tag **************************************************************************/ META_COLLECTOR(int_vlan_tag) { unsigned short tag; if (skb_vlan_tag_present(skb)) dst->value = skb_vlan_tag_get(skb); else if (!__vlan_get_tag(skb, &tag)) dst->value = tag; else *err = -1; } /************************************************************************** * skb attributes **************************************************************************/ META_COLLECTOR(int_priority) { dst->value = skb->priority; } META_COLLECTOR(int_protocol) { /* Let userspace take care of the byte ordering */ dst->value = skb_protocol(skb, false); } META_COLLECTOR(int_pkttype) { dst->value = skb->pkt_type; } META_COLLECTOR(int_pktlen) { dst->value = skb->len; } META_COLLECTOR(int_datalen) { dst->value = skb->data_len; } META_COLLECTOR(int_maclen) { dst->value = skb->mac_len; } META_COLLECTOR(int_rxhash) { dst->value = skb_get_hash(skb); } /************************************************************************** * Netfilter **************************************************************************/ META_COLLECTOR(int_mark) { dst->value = skb->mark; } 
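/*
 * Editor's note, an illustrative sketch only (not part of em_meta.c): the
 * META_COLLECTOR() macro above hides the common collector signature. Written
 * out by hand, a collector such as meta_int_mark() expands to roughly the
 * function below; the name meta_int_mark_expanded is invented here purely to
 * show the shape without clashing with the real collector.
 */
static void meta_int_mark_expanded(struct sk_buff *skb,
				   struct tcf_pkt_info *info,
				   struct meta_value *v,
				   struct meta_obj *dst, int *err)
{
	/* integer collectors only fill dst->value; dst->len matters for the
	 * variable-length (TCF_META_TYPE_VAR) collectors, and *err is set to
	 * -1 when a value cannot be collected (e.g. no socket attached) */
	dst->value = skb->mark;
}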
/************************************************************************** * Traffic Control **************************************************************************/ META_COLLECTOR(int_tcindex) { dst->value = skb->tc_index; } /************************************************************************** * Routing **************************************************************************/ META_COLLECTOR(int_rtclassid) { if (unlikely(skb_dst(skb) == NULL)) *err = -1; else #ifdef CONFIG_IP_ROUTE_CLASSID dst->value = skb_dst(skb)->tclassid; #else dst->value = 0; #endif } META_COLLECTOR(int_rtiif) { if (unlikely(skb_rtable(skb) == NULL)) *err = -1; else dst->value = inet_iif(skb); } /************************************************************************** * Socket Attributes **************************************************************************/ #define skip_nonlocal(skb) \ (unlikely(skb->sk == NULL)) META_COLLECTOR(int_sk_family) { if (skip_nonlocal(skb)) { *err = -1; return; } dst->value = skb->sk->sk_family; } META_COLLECTOR(int_sk_state) { if (skip_nonlocal(skb)) { *err = -1; return; } dst->value = skb->sk->sk_state; } META_COLLECTOR(int_sk_reuse) { if (skip_nonlocal(skb)) { *err = -1; return; } dst->value = skb->sk->sk_reuse; } META_COLLECTOR(int_sk_bound_if) { if (skip_nonlocal(skb)) { *err = -1; return; } /* No error if bound_dev_if is 0, legal userspace check */ dst->value = skb->sk->sk_bound_dev_if; } META_COLLECTOR(var_sk_bound_if) { int bound_dev_if; if (skip_nonlocal(skb)) { *err = -1; return; } bound_dev_if = READ_ONCE(skb->sk->sk_bound_dev_if); if (bound_dev_if == 0) { dst->value = (unsigned long) "any"; dst->len = 3; } else { struct net_device *dev; rcu_read_lock(); dev = dev_get_by_index_rcu(sock_net(skb->sk), bound_dev_if); *err = var_dev(dev, dst); rcu_read_unlock(); } } META_COLLECTOR(int_sk_refcnt) { if (skip_nonlocal(skb)) { *err = -1; return; } dst->value = refcount_read(&skb->sk->sk_refcnt); } META_COLLECTOR(int_sk_rcvbuf) { const struct sock *sk = skb_to_full_sk(skb); if (!sk) { *err = -1; return; } dst->value = sk->sk_rcvbuf; } META_COLLECTOR(int_sk_shutdown) { const struct sock *sk = skb_to_full_sk(skb); if (!sk) { *err = -1; return; } dst->value = sk->sk_shutdown; } META_COLLECTOR(int_sk_proto) { const struct sock *sk = skb_to_full_sk(skb); if (!sk) { *err = -1; return; } dst->value = sk->sk_protocol; } META_COLLECTOR(int_sk_type) { const struct sock *sk = skb_to_full_sk(skb); if (!sk) { *err = -1; return; } dst->value = sk->sk_type; } META_COLLECTOR(int_sk_rmem_alloc) { const struct sock *sk = skb_to_full_sk(skb); if (!sk) { *err = -1; return; } dst->value = sk_rmem_alloc_get(sk); } META_COLLECTOR(int_sk_wmem_alloc) { const struct sock *sk = skb_to_full_sk(skb); if (!sk) { *err = -1; return; } dst->value = sk_wmem_alloc_get(sk); } META_COLLECTOR(int_sk_omem_alloc) { const struct sock *sk = skb_to_full_sk(skb); if (!sk) { *err = -1; return; } dst->value = atomic_read(&sk->sk_omem_alloc); } META_COLLECTOR(int_sk_rcv_qlen) { const struct sock *sk = skb_to_full_sk(skb); if (!sk) { *err = -1; return; } dst->value = sk->sk_receive_queue.qlen; } META_COLLECTOR(int_sk_snd_qlen) { const struct sock *sk = skb_to_full_sk(skb); if (!sk) { *err = -1; return; } dst->value = sk->sk_write_queue.qlen; } META_COLLECTOR(int_sk_wmem_queued) { const struct sock *sk = skb_to_full_sk(skb); if (!sk) { *err = -1; return; } dst->value = READ_ONCE(sk->sk_wmem_queued); } META_COLLECTOR(int_sk_fwd_alloc) { const struct sock *sk = skb_to_full_sk(skb); if (!sk) { *err = -1; return; } 
dst->value = READ_ONCE(sk->sk_forward_alloc); } META_COLLECTOR(int_sk_sndbuf) { const struct sock *sk = skb_to_full_sk(skb); if (!sk) { *err = -1; return; } dst->value = sk->sk_sndbuf; } META_COLLECTOR(int_sk_alloc) { const struct sock *sk = skb_to_full_sk(skb); if (!sk) { *err = -1; return; } dst->value = (__force int) sk->sk_allocation; } META_COLLECTOR(int_sk_hash) { if (skip_nonlocal(skb)) { *err = -1; return; } dst->value = skb->sk->sk_hash; } META_COLLECTOR(int_sk_lingertime) { const struct sock *sk = skb_to_full_sk(skb); if (!sk) { *err = -1; return; } dst->value = READ_ONCE(sk->sk_lingertime) / HZ; } META_COLLECTOR(int_sk_err_qlen) { const struct sock *sk = skb_to_full_sk(skb); if (!sk) { *err = -1; return; } dst->value = sk->sk_error_queue.qlen; } META_COLLECTOR(int_sk_ack_bl) { const struct sock *sk = skb_to_full_sk(skb); if (!sk) { *err = -1; return; } dst->value = READ_ONCE(sk->sk_ack_backlog); } META_COLLECTOR(int_sk_max_ack_bl) { const struct sock *sk = skb_to_full_sk(skb); if (!sk) { *err = -1; return; } dst->value = READ_ONCE(sk->sk_max_ack_backlog); } META_COLLECTOR(int_sk_prio) { const struct sock *sk = skb_to_full_sk(skb); if (!sk) { *err = -1; return; } dst->value = READ_ONCE(sk->sk_priority); } META_COLLECTOR(int_sk_rcvlowat) { const struct sock *sk = skb_to_full_sk(skb); if (!sk) { *err = -1; return; } dst->value = READ_ONCE(sk->sk_rcvlowat); } META_COLLECTOR(int_sk_rcvtimeo) { const struct sock *sk = skb_to_full_sk(skb); if (!sk) { *err = -1; return; } dst->value = READ_ONCE(sk->sk_rcvtimeo) / HZ; } META_COLLECTOR(int_sk_sndtimeo) { const struct sock *sk = skb_to_full_sk(skb); if (!sk) { *err = -1; return; } dst->value = READ_ONCE(sk->sk_sndtimeo) / HZ; } META_COLLECTOR(int_sk_sendmsg_off) { const struct sock *sk = skb_to_full_sk(skb); if (!sk) { *err = -1; return; } dst->value = sk->sk_frag.offset; } META_COLLECTOR(int_sk_write_pend) { const struct sock *sk = skb_to_full_sk(skb); if (!sk) { *err = -1; return; } dst->value = sk->sk_write_pending; } /************************************************************************** * Meta value collectors assignment table **************************************************************************/ struct meta_ops { void (*get)(struct sk_buff *, struct tcf_pkt_info *, struct meta_value *, struct meta_obj *, int *); }; #define META_ID(name) TCF_META_ID_##name #define META_FUNC(name) { .get = meta_##name } /* Meta value operations table listing all meta value collectors and * assigns them to a type and meta id. 
*/ static struct meta_ops __meta_ops[TCF_META_TYPE_MAX + 1][TCF_META_ID_MAX + 1] = { [TCF_META_TYPE_VAR] = { [META_ID(DEV)] = META_FUNC(var_dev), [META_ID(SK_BOUND_IF)] = META_FUNC(var_sk_bound_if), }, [TCF_META_TYPE_INT] = { [META_ID(RANDOM)] = META_FUNC(int_random), [META_ID(LOADAVG_0)] = META_FUNC(int_loadavg_0), [META_ID(LOADAVG_1)] = META_FUNC(int_loadavg_1), [META_ID(LOADAVG_2)] = META_FUNC(int_loadavg_2), [META_ID(DEV)] = META_FUNC(int_dev), [META_ID(PRIORITY)] = META_FUNC(int_priority), [META_ID(PROTOCOL)] = META_FUNC(int_protocol), [META_ID(PKTTYPE)] = META_FUNC(int_pkttype), [META_ID(PKTLEN)] = META_FUNC(int_pktlen), [META_ID(DATALEN)] = META_FUNC(int_datalen), [META_ID(MACLEN)] = META_FUNC(int_maclen), [META_ID(NFMARK)] = META_FUNC(int_mark), [META_ID(TCINDEX)] = META_FUNC(int_tcindex), [META_ID(RTCLASSID)] = META_FUNC(int_rtclassid), [META_ID(RTIIF)] = META_FUNC(int_rtiif), [META_ID(SK_FAMILY)] = META_FUNC(int_sk_family), [META_ID(SK_STATE)] = META_FUNC(int_sk_state), [META_ID(SK_REUSE)] = META_FUNC(int_sk_reuse), [META_ID(SK_BOUND_IF)] = META_FUNC(int_sk_bound_if), [META_ID(SK_REFCNT)] = META_FUNC(int_sk_refcnt), [META_ID(SK_RCVBUF)] = META_FUNC(int_sk_rcvbuf), [META_ID(SK_SNDBUF)] = META_FUNC(int_sk_sndbuf), [META_ID(SK_SHUTDOWN)] = META_FUNC(int_sk_shutdown), [META_ID(SK_PROTO)] = META_FUNC(int_sk_proto), [META_ID(SK_TYPE)] = META_FUNC(int_sk_type), [META_ID(SK_RMEM_ALLOC)] = META_FUNC(int_sk_rmem_alloc), [META_ID(SK_WMEM_ALLOC)] = META_FUNC(int_sk_wmem_alloc), [META_ID(SK_OMEM_ALLOC)] = META_FUNC(int_sk_omem_alloc), [META_ID(SK_WMEM_QUEUED)] = META_FUNC(int_sk_wmem_queued), [META_ID(SK_RCV_QLEN)] = META_FUNC(int_sk_rcv_qlen), [META_ID(SK_SND_QLEN)] = META_FUNC(int_sk_snd_qlen), [META_ID(SK_ERR_QLEN)] = META_FUNC(int_sk_err_qlen), [META_ID(SK_FORWARD_ALLOCS)] = META_FUNC(int_sk_fwd_alloc), [META_ID(SK_ALLOCS)] = META_FUNC(int_sk_alloc), [META_ID(SK_HASH)] = META_FUNC(int_sk_hash), [META_ID(SK_LINGERTIME)] = META_FUNC(int_sk_lingertime), [META_ID(SK_ACK_BACKLOG)] = META_FUNC(int_sk_ack_bl), [META_ID(SK_MAX_ACK_BACKLOG)] = META_FUNC(int_sk_max_ack_bl), [META_ID(SK_PRIO)] = META_FUNC(int_sk_prio), [META_ID(SK_RCVLOWAT)] = META_FUNC(int_sk_rcvlowat), [META_ID(SK_RCVTIMEO)] = META_FUNC(int_sk_rcvtimeo), [META_ID(SK_SNDTIMEO)] = META_FUNC(int_sk_sndtimeo), [META_ID(SK_SENDMSG_OFF)] = META_FUNC(int_sk_sendmsg_off), [META_ID(SK_WRITE_PENDING)] = META_FUNC(int_sk_write_pend), [META_ID(VLAN_TAG)] = META_FUNC(int_vlan_tag), [META_ID(RXHASH)] = META_FUNC(int_rxhash), } }; static inline struct meta_ops *meta_ops(struct meta_value *val) { return &__meta_ops[meta_type(val)][meta_id(val)]; } /************************************************************************** * Type specific operations for TCF_META_TYPE_VAR **************************************************************************/ static int meta_var_compare(struct meta_obj *a, struct meta_obj *b) { int r = a->len - b->len; if (r == 0) r = memcmp((void *) a->value, (void *) b->value, a->len); return r; } static int meta_var_change(struct meta_value *dst, struct nlattr *nla) { int len = nla_len(nla); dst->val = (unsigned long)kmemdup(nla_data(nla), len, GFP_KERNEL); if (dst->val == 0UL) return -ENOMEM; dst->len = len; return 0; } static void meta_var_destroy(struct meta_value *v) { kfree((void *) v->val); } static void meta_var_apply_extras(struct meta_value *v, struct meta_obj *dst) { int shift = v->hdr.shift; if (shift && shift < dst->len) dst->len -= shift; } static int meta_var_dump(struct sk_buff *skb, struct meta_value *v, int 
tlv) { if (v->val && v->len && nla_put(skb, tlv, v->len, (void *) v->val)) goto nla_put_failure; return 0; nla_put_failure: return -1; } /************************************************************************** * Type specific operations for TCF_META_TYPE_INT **************************************************************************/ static int meta_int_compare(struct meta_obj *a, struct meta_obj *b) { /* Let gcc optimize it, the unlikely is not really based on * some numbers but jump free code for mismatches seems * more logical. */ if (unlikely(a->value == b->value)) return 0; else if (a->value < b->value) return -1; else return 1; } static int meta_int_change(struct meta_value *dst, struct nlattr *nla) { if (nla_len(nla) >= sizeof(unsigned long)) { dst->val = *(unsigned long *) nla_data(nla); dst->len = sizeof(unsigned long); } else if (nla_len(nla) == sizeof(u32)) { dst->val = nla_get_u32(nla); dst->len = sizeof(u32); } else return -EINVAL; return 0; } static void meta_int_apply_extras(struct meta_value *v, struct meta_obj *dst) { if (v->hdr.shift) dst->value >>= v->hdr.shift; if (v->val) dst->value &= v->val; } static int meta_int_dump(struct sk_buff *skb, struct meta_value *v, int tlv) { if (v->len == sizeof(unsigned long)) { if (nla_put(skb, tlv, sizeof(unsigned long), &v->val)) goto nla_put_failure; } else if (v->len == sizeof(u32)) { if (nla_put_u32(skb, tlv, v->val)) goto nla_put_failure; } return 0; nla_put_failure: return -1; } /************************************************************************** * Type specific operations table **************************************************************************/ struct meta_type_ops { void (*destroy)(struct meta_value *); int (*compare)(struct meta_obj *, struct meta_obj *); int (*change)(struct meta_value *, struct nlattr *); void (*apply_extras)(struct meta_value *, struct meta_obj *); int (*dump)(struct sk_buff *, struct meta_value *, int); }; static const struct meta_type_ops __meta_type_ops[TCF_META_TYPE_MAX + 1] = { [TCF_META_TYPE_VAR] = { .destroy = meta_var_destroy, .compare = meta_var_compare, .change = meta_var_change, .apply_extras = meta_var_apply_extras, .dump = meta_var_dump }, [TCF_META_TYPE_INT] = { .compare = meta_int_compare, .change = meta_int_change, .apply_extras = meta_int_apply_extras, .dump = meta_int_dump } }; static inline const struct meta_type_ops *meta_type_ops(struct meta_value *v) { return &__meta_type_ops[meta_type(v)]; } /************************************************************************** * Core **************************************************************************/ static int meta_get(struct sk_buff *skb, struct tcf_pkt_info *info, struct meta_value *v, struct meta_obj *dst) { int err = 0; if (meta_id(v) == TCF_META_ID_VALUE) { dst->value = v->val; dst->len = v->len; return 0; } meta_ops(v)->get(skb, info, v, dst, &err); if (err < 0) return err; if (meta_type_ops(v)->apply_extras) meta_type_ops(v)->apply_extras(v, dst); return 0; } static int em_meta_match(struct sk_buff *skb, struct tcf_ematch *m, struct tcf_pkt_info *info) { int r; struct meta_match *meta = (struct meta_match *) m->data; struct meta_obj l_value, r_value; if (meta_get(skb, info, &meta->lvalue, &l_value) < 0 || meta_get(skb, info, &meta->rvalue, &r_value) < 0) return 0; r = meta_type_ops(&meta->lvalue)->compare(&l_value, &r_value); switch (meta->lvalue.hdr.op) { case TCF_EM_OPND_EQ: return !r; case TCF_EM_OPND_LT: return r < 0; case TCF_EM_OPND_GT: return r > 0; } return 0; } static void meta_delete(struct meta_match 
*meta) { if (meta) { const struct meta_type_ops *ops = meta_type_ops(&meta->lvalue); if (ops && ops->destroy) { ops->destroy(&meta->lvalue); ops->destroy(&meta->rvalue); } } kfree(meta); } static inline int meta_change_data(struct meta_value *dst, struct nlattr *nla) { if (nla) { if (nla_len(nla) == 0) return -EINVAL; return meta_type_ops(dst)->change(dst, nla); } return 0; } static inline int meta_is_supported(struct meta_value *val) { return !meta_id(val) || meta_ops(val)->get; } static const struct nla_policy meta_policy[TCA_EM_META_MAX + 1] = { [TCA_EM_META_HDR] = { .len = sizeof(struct tcf_meta_hdr) }, }; static int em_meta_change(struct net *net, void *data, int len, struct tcf_ematch *m) { int err; struct nlattr *tb[TCA_EM_META_MAX + 1]; struct tcf_meta_hdr *hdr; struct meta_match *meta = NULL; err = nla_parse_deprecated(tb, TCA_EM_META_MAX, data, len, meta_policy, NULL); if (err < 0) goto errout; err = -EINVAL; if (tb[TCA_EM_META_HDR] == NULL) goto errout; hdr = nla_data(tb[TCA_EM_META_HDR]); if (TCF_META_TYPE(hdr->left.kind) != TCF_META_TYPE(hdr->right.kind) || TCF_META_TYPE(hdr->left.kind) > TCF_META_TYPE_MAX || TCF_META_ID(hdr->left.kind) > TCF_META_ID_MAX || TCF_META_ID(hdr->right.kind) > TCF_META_ID_MAX) goto errout; meta = kzalloc(sizeof(*meta), GFP_KERNEL); if (meta == NULL) { err = -ENOMEM; goto errout; } memcpy(&meta->lvalue.hdr, &hdr->left, sizeof(hdr->left)); memcpy(&meta->rvalue.hdr, &hdr->right, sizeof(hdr->right)); if (!meta_is_supported(&meta->lvalue) || !meta_is_supported(&meta->rvalue)) { err = -EOPNOTSUPP; goto errout; } if (meta_change_data(&meta->lvalue, tb[TCA_EM_META_LVALUE]) < 0 || meta_change_data(&meta->rvalue, tb[TCA_EM_META_RVALUE]) < 0) goto errout; m->datalen = sizeof(*meta); m->data = (unsigned long) meta; err = 0; errout: if (err && meta) meta_delete(meta); return err; } static void em_meta_destroy(struct tcf_ematch *m) { if (m) meta_delete((struct meta_match *) m->data); } static int em_meta_dump(struct sk_buff *skb, struct tcf_ematch *em) { struct meta_match *meta = (struct meta_match *) em->data; struct tcf_meta_hdr hdr; const struct meta_type_ops *ops; memset(&hdr, 0, sizeof(hdr)); memcpy(&hdr.left, &meta->lvalue.hdr, sizeof(hdr.left)); memcpy(&hdr.right, &meta->rvalue.hdr, sizeof(hdr.right)); if (nla_put(skb, TCA_EM_META_HDR, sizeof(hdr), &hdr)) goto nla_put_failure; ops = meta_type_ops(&meta->lvalue); if (ops->dump(skb, &meta->lvalue, TCA_EM_META_LVALUE) < 0 || ops->dump(skb, &meta->rvalue, TCA_EM_META_RVALUE) < 0) goto nla_put_failure; return 0; nla_put_failure: return -1; } static struct tcf_ematch_ops em_meta_ops = { .kind = TCF_EM_META, .change = em_meta_change, .match = em_meta_match, .destroy = em_meta_destroy, .dump = em_meta_dump, .owner = THIS_MODULE, .link = LIST_HEAD_INIT(em_meta_ops.link) }; static int __init init_em_meta(void) { return tcf_em_register(&em_meta_ops); } static void __exit exit_em_meta(void) { tcf_em_unregister(&em_meta_ops); } MODULE_DESCRIPTION("ematch classifier for various internal kernel metadata, skb metadata and sk metadata"); MODULE_LICENSE("GPL"); module_init(init_em_meta); module_exit(exit_em_meta); MODULE_ALIAS_TCF_EMATCH(TCF_EM_META); |
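/*
 * Usage sketch (the command below assumes iproute2's ematch frontend and
 * the cls_basic classifier; the keyword spellings are iproute2's, not
 * defined in this file):
 *
 *	tc filter add dev eth0 parent 1: basic \
 *		match 'meta(priority eq 1)' classid 1:1
 *
 * Userspace encodes the two operands as a TCA_EM_META_HDR attribute plus
 * TCA_EM_META_LVALUE/TCA_EM_META_RVALUE payloads, which em_meta_change()
 * above parses and validates against __meta_ops[] via meta_is_supported().
 */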
// SPDX-License-Identifier: GPL-2.0-or-later
/*
 * net/sched/cls_route.c	ROUTE4 classifier.
 *
 * Authors:	Alexey Kuznetsov, <kuznet@ms2.inr.ac.ru>
 */

#include <linux/module.h>
#include <linux/slab.h>
#include <linux/types.h>
#include <linux/kernel.h>
#include <linux/string.h>
#include <linux/errno.h>
#include <linux/skbuff.h>
#include <net/dst.h>
#include <net/route.h>
#include <net/netlink.h>
#include <net/act_api.h>
#include <net/pkt_cls.h>
#include <net/tc_wrapper.h>

/*
 * 1. For now we assume that route tags < 256.
 *    It allows to use direct table lookups, instead of hash tables.
 * 2. For now we assume that "from TAG" and "fromdev DEV" statements
 *    are mutually exclusive.
 * 3.
"to TAG from ANY" has higher priority, than "to ANY from XXX" */ struct route4_fastmap { struct route4_filter *filter; u32 id; int iif; }; struct route4_head { struct route4_fastmap fastmap[16]; struct route4_bucket __rcu *table[256 + 1]; struct rcu_head rcu; }; struct route4_bucket { /* 16 FROM buckets + 16 IIF buckets + 1 wildcard bucket */ struct route4_filter __rcu *ht[16 + 16 + 1]; struct rcu_head rcu; }; struct route4_filter { struct route4_filter __rcu *next; u32 id; int iif; struct tcf_result res; struct tcf_exts exts; u32 handle; struct route4_bucket *bkt; struct tcf_proto *tp; struct rcu_work rwork; }; #define ROUTE4_FAILURE ((struct route4_filter *)(-1L)) static inline int route4_fastmap_hash(u32 id, int iif) { return id & 0xF; } static DEFINE_SPINLOCK(fastmap_lock); static void route4_reset_fastmap(struct route4_head *head) { spin_lock_bh(&fastmap_lock); memset(head->fastmap, 0, sizeof(head->fastmap)); spin_unlock_bh(&fastmap_lock); } static void route4_set_fastmap(struct route4_head *head, u32 id, int iif, struct route4_filter *f) { int h = route4_fastmap_hash(id, iif); /* fastmap updates must look atomic to aling id, iff, filter */ spin_lock_bh(&fastmap_lock); head->fastmap[h].id = id; head->fastmap[h].iif = iif; head->fastmap[h].filter = f; spin_unlock_bh(&fastmap_lock); } static inline int route4_hash_to(u32 id) { return id & 0xFF; } static inline int route4_hash_from(u32 id) { return (id >> 16) & 0xF; } static inline int route4_hash_iif(int iif) { return 16 + ((iif >> 16) & 0xF); } static inline int route4_hash_wild(void) { return 32; } #define ROUTE4_APPLY_RESULT() \ { \ *res = f->res; \ if (tcf_exts_has_actions(&f->exts)) { \ int r = tcf_exts_exec(skb, &f->exts, res); \ if (r < 0) { \ dont_cache = 1; \ continue; \ } \ return r; \ } else if (!dont_cache) \ route4_set_fastmap(head, id, iif, f); \ return 0; \ } TC_INDIRECT_SCOPE int route4_classify(struct sk_buff *skb, const struct tcf_proto *tp, struct tcf_result *res) { struct route4_head *head = rcu_dereference_bh(tp->root); struct dst_entry *dst; struct route4_bucket *b; struct route4_filter *f; u32 id, h; int iif, dont_cache = 0; dst = skb_dst(skb); if (!dst) goto failure; id = dst->tclassid; iif = inet_iif(skb); h = route4_fastmap_hash(id, iif); spin_lock(&fastmap_lock); if (id == head->fastmap[h].id && iif == head->fastmap[h].iif && (f = head->fastmap[h].filter) != NULL) { if (f == ROUTE4_FAILURE) { spin_unlock(&fastmap_lock); goto failure; } *res = f->res; spin_unlock(&fastmap_lock); return 0; } spin_unlock(&fastmap_lock); h = route4_hash_to(id); restart: b = rcu_dereference_bh(head->table[h]); if (b) { for (f = rcu_dereference_bh(b->ht[route4_hash_from(id)]); f; f = rcu_dereference_bh(f->next)) if (f->id == id) ROUTE4_APPLY_RESULT(); for (f = rcu_dereference_bh(b->ht[route4_hash_iif(iif)]); f; f = rcu_dereference_bh(f->next)) if (f->iif == iif) ROUTE4_APPLY_RESULT(); for (f = rcu_dereference_bh(b->ht[route4_hash_wild()]); f; f = rcu_dereference_bh(f->next)) ROUTE4_APPLY_RESULT(); } if (h < 256) { h = 256; id &= ~0xFFFF; goto restart; } if (!dont_cache) route4_set_fastmap(head, id, iif, ROUTE4_FAILURE); failure: return -1; } static inline u32 to_hash(u32 id) { u32 h = id & 0xFF; if (id & 0x8000) h += 256; return h; } static inline u32 from_hash(u32 id) { id &= 0xFFFF; if (id == 0xFFFF) return 32; if (!(id & 0x8000)) { if (id > 255) return 256; return id & 0xF; } return 16 + (id & 0xF); } static void *route4_get(struct tcf_proto *tp, u32 handle) { struct route4_head *head = rtnl_dereference(tp->root); struct 
route4_bucket *b; struct route4_filter *f; unsigned int h1, h2; h1 = to_hash(handle); if (h1 > 256) return NULL; h2 = from_hash(handle >> 16); if (h2 > 32) return NULL; b = rtnl_dereference(head->table[h1]); if (b) { for (f = rtnl_dereference(b->ht[h2]); f; f = rtnl_dereference(f->next)) if (f->handle == handle) return f; } return NULL; } static int route4_init(struct tcf_proto *tp) { struct route4_head *head; head = kzalloc(sizeof(struct route4_head), GFP_KERNEL); if (head == NULL) return -ENOBUFS; rcu_assign_pointer(tp->root, head); return 0; } static void __route4_delete_filter(struct route4_filter *f) { tcf_exts_destroy(&f->exts); tcf_exts_put_net(&f->exts); kfree(f); } static void route4_delete_filter_work(struct work_struct *work) { struct route4_filter *f = container_of(to_rcu_work(work), struct route4_filter, rwork); rtnl_lock(); __route4_delete_filter(f); rtnl_unlock(); } static void route4_queue_work(struct route4_filter *f) { tcf_queue_work(&f->rwork, route4_delete_filter_work); } static void route4_destroy(struct tcf_proto *tp, bool rtnl_held, struct netlink_ext_ack *extack) { struct route4_head *head = rtnl_dereference(tp->root); int h1, h2; if (head == NULL) return; for (h1 = 0; h1 <= 256; h1++) { struct route4_bucket *b; b = rtnl_dereference(head->table[h1]); if (b) { for (h2 = 0; h2 <= 32; h2++) { struct route4_filter *f; while ((f = rtnl_dereference(b->ht[h2])) != NULL) { struct route4_filter *next; next = rtnl_dereference(f->next); RCU_INIT_POINTER(b->ht[h2], next); tcf_unbind_filter(tp, &f->res); if (tcf_exts_get_net(&f->exts)) route4_queue_work(f); else __route4_delete_filter(f); } } RCU_INIT_POINTER(head->table[h1], NULL); kfree_rcu(b, rcu); } } kfree_rcu(head, rcu); } static int route4_delete(struct tcf_proto *tp, void *arg, bool *last, bool rtnl_held, struct netlink_ext_ack *extack) { struct route4_head *head = rtnl_dereference(tp->root); struct route4_filter *f = arg; struct route4_filter __rcu **fp; struct route4_filter *nf; struct route4_bucket *b; unsigned int h = 0; int i, h1; if (!head || !f) return -EINVAL; h = f->handle; b = f->bkt; fp = &b->ht[from_hash(h >> 16)]; for (nf = rtnl_dereference(*fp); nf; fp = &nf->next, nf = rtnl_dereference(*fp)) { if (nf == f) { /* unlink it */ RCU_INIT_POINTER(*fp, rtnl_dereference(f->next)); /* Remove any fastmap lookups that might ref filter * notice we unlink'd the filter so we can't get it * back in the fastmap. 
*/ route4_reset_fastmap(head); /* Delete it */ tcf_unbind_filter(tp, &f->res); tcf_exts_get_net(&f->exts); tcf_queue_work(&f->rwork, route4_delete_filter_work); /* Strip RTNL protected tree */ for (i = 0; i <= 32; i++) { struct route4_filter *rt; rt = rtnl_dereference(b->ht[i]); if (rt) goto out; } /* OK, session has no flows */ RCU_INIT_POINTER(head->table[to_hash(h)], NULL); kfree_rcu(b, rcu); break; } } out: *last = true; for (h1 = 0; h1 <= 256; h1++) { if (rcu_access_pointer(head->table[h1])) { *last = false; break; } } return 0; } static const struct nla_policy route4_policy[TCA_ROUTE4_MAX + 1] = { [TCA_ROUTE4_CLASSID] = { .type = NLA_U32 }, [TCA_ROUTE4_TO] = NLA_POLICY_MAX(NLA_U32, 0xFF), [TCA_ROUTE4_FROM] = NLA_POLICY_MAX(NLA_U32, 0xFF), [TCA_ROUTE4_IIF] = NLA_POLICY_MAX(NLA_U32, 0x7FFF), }; static int route4_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base, struct route4_filter *f, u32 handle, struct route4_head *head, struct nlattr **tb, struct nlattr *est, int new, u32 flags, struct netlink_ext_ack *extack) { u32 id = 0, to = 0, nhandle = 0x8000; struct route4_filter *fp; unsigned int h1; struct route4_bucket *b; int err; err = tcf_exts_validate(net, tp, tb, est, &f->exts, flags, extack); if (err < 0) return err; if (tb[TCA_ROUTE4_TO]) { if (new && handle & 0x8000) { NL_SET_ERR_MSG(extack, "Invalid handle"); return -EINVAL; } to = nla_get_u32(tb[TCA_ROUTE4_TO]); nhandle = to; } if (tb[TCA_ROUTE4_FROM] && tb[TCA_ROUTE4_IIF]) { NL_SET_ERR_MSG_ATTR(extack, tb[TCA_ROUTE4_FROM], "'from' and 'fromif' are mutually exclusive"); return -EINVAL; } if (tb[TCA_ROUTE4_FROM]) { id = nla_get_u32(tb[TCA_ROUTE4_FROM]); nhandle |= id << 16; } else if (tb[TCA_ROUTE4_IIF]) { id = nla_get_u32(tb[TCA_ROUTE4_IIF]); nhandle |= (id | 0x8000) << 16; } else nhandle |= 0xFFFF << 16; if (handle && new) { nhandle |= handle & 0x7F00; if (nhandle != handle) { NL_SET_ERR_MSG_FMT(extack, "Handle mismatch constructed: %x (expected: %x)", handle, nhandle); return -EINVAL; } } if (!nhandle) { NL_SET_ERR_MSG(extack, "Replacing with handle of 0 is invalid"); return -EINVAL; } h1 = to_hash(nhandle); b = rtnl_dereference(head->table[h1]); if (!b) { b = kzalloc(sizeof(struct route4_bucket), GFP_KERNEL); if (b == NULL) return -ENOBUFS; rcu_assign_pointer(head->table[h1], b); } else { unsigned int h2 = from_hash(nhandle >> 16); for (fp = rtnl_dereference(b->ht[h2]); fp; fp = rtnl_dereference(fp->next)) if (fp->handle == f->handle) return -EEXIST; } if (tb[TCA_ROUTE4_TO]) f->id = to; if (tb[TCA_ROUTE4_FROM]) f->id = to | id<<16; else if (tb[TCA_ROUTE4_IIF]) f->iif = id; f->handle = nhandle; f->bkt = b; f->tp = tp; if (tb[TCA_ROUTE4_CLASSID]) { f->res.classid = nla_get_u32(tb[TCA_ROUTE4_CLASSID]); tcf_bind_filter(tp, &f->res, base); } return 0; } static int route4_change(struct net *net, struct sk_buff *in_skb, struct tcf_proto *tp, unsigned long base, u32 handle, struct nlattr **tca, void **arg, u32 flags, struct netlink_ext_ack *extack) { struct route4_head *head = rtnl_dereference(tp->root); struct route4_filter __rcu **fp; struct route4_filter *fold, *f1, *pfp, *f = NULL; struct route4_bucket *b; struct nlattr *tb[TCA_ROUTE4_MAX + 1]; unsigned int h, th; int err; bool new = true; if (!handle) { NL_SET_ERR_MSG(extack, "Creating with handle of 0 is invalid"); return -EINVAL; } if (NL_REQ_ATTR_CHECK(extack, NULL, tca, TCA_OPTIONS)) { NL_SET_ERR_MSG_MOD(extack, "Missing options"); return -EINVAL; } err = nla_parse_nested_deprecated(tb, TCA_ROUTE4_MAX, tca[TCA_OPTIONS], route4_policy, NULL); if (err < 0) 
return err; fold = *arg; if (fold && fold->handle != handle) return -EINVAL; err = -ENOBUFS; f = kzalloc(sizeof(struct route4_filter), GFP_KERNEL); if (!f) goto errout; err = tcf_exts_init(&f->exts, net, TCA_ROUTE4_ACT, TCA_ROUTE4_POLICE); if (err < 0) goto errout; if (fold) { f->id = fold->id; f->iif = fold->iif; f->handle = fold->handle; f->tp = fold->tp; f->bkt = fold->bkt; new = false; } err = route4_set_parms(net, tp, base, f, handle, head, tb, tca[TCA_RATE], new, flags, extack); if (err < 0) goto errout; h = from_hash(f->handle >> 16); fp = &f->bkt->ht[h]; for (pfp = rtnl_dereference(*fp); (f1 = rtnl_dereference(*fp)) != NULL; fp = &f1->next) if (f->handle < f1->handle) break; tcf_block_netif_keep_dst(tp->chain->block); rcu_assign_pointer(f->next, f1); rcu_assign_pointer(*fp, f); if (fold) { th = to_hash(fold->handle); h = from_hash(fold->handle >> 16); b = rtnl_dereference(head->table[th]); if (b) { fp = &b->ht[h]; for (pfp = rtnl_dereference(*fp); pfp; fp = &pfp->next, pfp = rtnl_dereference(*fp)) { if (pfp == fold) { rcu_assign_pointer(*fp, fold->next); break; } } } } route4_reset_fastmap(head); *arg = f; if (fold) { tcf_unbind_filter(tp, &fold->res); tcf_exts_get_net(&fold->exts); tcf_queue_work(&fold->rwork, route4_delete_filter_work); } return 0; errout: if (f) tcf_exts_destroy(&f->exts); kfree(f); return err; } static void route4_walk(struct tcf_proto *tp, struct tcf_walker *arg, bool rtnl_held) { struct route4_head *head = rtnl_dereference(tp->root); unsigned int h, h1; if (head == NULL || arg->stop) return; for (h = 0; h <= 256; h++) { struct route4_bucket *b = rtnl_dereference(head->table[h]); if (b) { for (h1 = 0; h1 <= 32; h1++) { struct route4_filter *f; for (f = rtnl_dereference(b->ht[h1]); f; f = rtnl_dereference(f->next)) { if (!tc_cls_stats_dump(tp, arg, f)) return; } } } } } static int route4_dump(struct net *net, struct tcf_proto *tp, void *fh, struct sk_buff *skb, struct tcmsg *t, bool rtnl_held) { struct route4_filter *f = fh; struct nlattr *nest; u32 id; if (f == NULL) return skb->len; t->tcm_handle = f->handle; nest = nla_nest_start_noflag(skb, TCA_OPTIONS); if (nest == NULL) goto nla_put_failure; if (!(f->handle & 0x8000)) { id = f->id & 0xFF; if (nla_put_u32(skb, TCA_ROUTE4_TO, id)) goto nla_put_failure; } if (f->handle & 0x80000000) { if ((f->handle >> 16) != 0xFFFF && nla_put_u32(skb, TCA_ROUTE4_IIF, f->iif)) goto nla_put_failure; } else { id = f->id >> 16; if (nla_put_u32(skb, TCA_ROUTE4_FROM, id)) goto nla_put_failure; } if (f->res.classid && nla_put_u32(skb, TCA_ROUTE4_CLASSID, f->res.classid)) goto nla_put_failure; if (tcf_exts_dump(skb, &f->exts) < 0) goto nla_put_failure; nla_nest_end(skb, nest); if (tcf_exts_dump_stats(skb, &f->exts) < 0) goto nla_put_failure; return skb->len; nla_put_failure: nla_nest_cancel(skb, nest); return -1; } static void route4_bind_class(void *fh, u32 classid, unsigned long cl, void *q, unsigned long base) { struct route4_filter *f = fh; tc_cls_bind_class(classid, cl, q, &f->res, base); } static struct tcf_proto_ops cls_route4_ops __read_mostly = { .kind = "route", .classify = route4_classify, .init = route4_init, .destroy = route4_destroy, .get = route4_get, .change = route4_change, .delete = route4_delete, .walk = route4_walk, .dump = route4_dump, .bind_class = route4_bind_class, .owner = THIS_MODULE, }; MODULE_ALIAS_NET_CLS("route"); static int __init init_route4(void) { return register_tcf_proto_ops(&cls_route4_ops); } static void __exit exit_route4(void) { unregister_tcf_proto_ops(&cls_route4_ops); } 
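/*
 * Filter handle layout, as built by route4_set_parms() and decoded by
 * to_hash()/from_hash() above:
 *
 *	bits  0..7	"to" realm (0..255)
 *	bits  8..14	user-chosen bits (masked with 0x7F00)
 *	bit  15		set when no "to" realm was given (wildcard TO bucket)
 *	bits 16..31	"from" realm, or (iif value | 0x8000) for "fromdev";
 *			0xFFFF when neither was given
 *
 * e.g. a filter with "from 2 to 3" ends up with nhandle = (2 << 16) | 3.
 * (The "tc filter ... route from 2 to 3" spelling is iproute2's and is
 * shown only as an illustration.)
 */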
module_init(init_route4)
module_exit(exit_route4)
MODULE_DESCRIPTION("Routing table realm based TC classifier");
MODULE_LICENSE("GPL");
| 3807 3631 3900 21 1 16 1 2 1 2 21 21 16 52 72 4 1 56 3878 3879 3878 17 9 3882 15 3882 162 3573 3744 3743 3819 3704 3811 3811 3823 3827 4 4 4 3746 3739 3746 2 8 8 39 4 35 4 39 37 7 56 56 19 37 13 129 129 31 3 5 37 2 93 93 20 93 74 93 93 92 1 93 93 93 93 73 19 1 2 92 1 93 93 2 93 95 92 10 82 51 3 83 9 53 39 2 2 2 2 2 68 46 3 2 44 12 6 9 9 9 47 8 33 14 8 10 43 3 10 16 11 4 25 3 39 3 3 8 47 9 56 11 45 37 8 11 45 19 26 11 3 34 37 37 37 8 37 8 37 37 3821 3909 81 81 4 77 81 81 37 16 3 129 4 125 114 11 57 4 64 68 68 68 68 2 2 1 1 1 9 7 4 4 1 5 5 1 13 13 13 7 1 3719 13 3708 131 3809 3595 3595 3591 25 3521 83 3596 3842 45 197 3727 1 3694 113 3739 104 3844 3593 199 3730 3732 3603 27 3730 3597 3586 3585 25 3587 3584 3590 3588 3584 3519 80 3586 3586 3585 3582 3579 1635 2315 3702 3705 3706 3706 3703 2 3703 23 3705 3708 3707 3720 3720 3725 3711 1515 2210 1570 2153 1532 2189 3704 3710 3705 3704 3705 23 3732 3725 3885 44 3736 3730 44 3734 28 3734 26 3732 3173 570 3733 3735 1 28 2 2 3738 26 28 3728 1 1 28 1 3731 3733 30 1 3729 3738 24 22 8 18 5 6 11 3586 3586 3705 3707 3 82 82 81 3 2 1 1 82 82 82 1 82 82 2 81 24 24 13 7 6 7 6 12 3 3 6 24 24 24 24 4 22 19 6 19 19 19 24 24 24 24 7 7 7 19 5 2 3 3803 2 5 5 10 10 10 10 10 10 93 93 93 93 21 21 3632 9 3633 3629 3629 3714 7 3709 3718 3712 57 2 55 55 13 57 4 3569 3565 3571 3564 117 3570 3566 3729 3727 3726 3722 3718 3725 3730 3721 3735 2 3732 26 3732 2 16 3662 70 3732 3729 16 3172 569 3735 3731 7 3725 3735 3729 3 4 1 3720 8 3731 1 3728 7 3 1 1 3724 1451 2309 1970 349 3719 3447 265 21 3721 3726 1 39 7 69 3698 3706 11 11 11 4 7 11 10 1 3 1 6 3731 3580 3726 3595 3728 9 6 1 3570 3583 2 3584 4 3583 33 3735 3732 28 3723 3717 3264 393 76 3719 23 3705 3702 36 36 23 28 23 33 10 10 2 24 25 1 32 3805 3805 1 1 3805 12 5 3 3810 5 2 3808 6 7 3802 5 5 5 3808 6 3803 1 3805 6 3806 3 3808 3803 3808 3735 3810 3822 8 3 1 3815 2 2 3816 3743 3737 3810 3817 3819 13 13 13 13 21 1 1 21 8 13 21 21 2 9 9 5 4 8 9 4 4 1 8 8 8 13 11 8 3 8 8 6 8 8 8 8 6 2 109 109 109 93 38 11 31 31 7 7 26 26 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 
410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 
1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 
1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 2447 2448 2449 2450 2451 2452 2453 2454 2455 2456 2457 2458 2459 2460 2461 2462 2463 2464 2465 2466 2467 2468 2469 2470 2471 2472 2473 2474 2475 2476 2477 2478 2479 2480 2481 2482 2483 2484 2485 2486 2487 2488 2489 2490 2491 2492 2493 2494 2495 2496 2497 2498 2499 2500 2501 2502 2503 2504 2505 2506 2507 2508 2509 2510 2511 2512 2513 2514 2515 2516 2517 2518 2519 2520 2521 2522 2523 2524 2525 2526 2527 2528 2529 2530 2531 2532 2533 2534 2535 2536 2537 2538 2539 2540 2541 2542 2543 2544 2545 2546 2547 2548 2549 2550 2551 2552 2553 2554 2555 2556 2557 2558 2559 2560 2561 2562 2563 2564 2565 2566 2567 2568 2569 2570 2571 2572 2573 2574 2575 2576 2577 2578 2579 2580 2581 2582 2583 2584 2585 2586 2587 2588 2589 2590 2591 2592 2593 2594 2595 2596 2597 2598 2599 2600 2601 2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658 2659 2660 
2661 2662 2663 2664 2665 2666 2667 2668 2669 2670 2671 2672 2673 2674 2675 2676 2677 2678 2679 2680 2681 2682 2683 2684 2685 2686 2687 2688 2689 2690 2691 2692 2693 2694 2695 2696 2697 2698 2699 2700 2701 2702 2703 2704 2705 2706 2707 2708 2709 2710 2711 2712 2713 2714 2715 2716 2717 2718 2719 2720 2721 2722 2723 2724 2725 2726 2727 2728 2729 2730 2731 2732 2733 2734 2735 2736 2737 2738 2739 2740 2741 2742 2743 2744 2745 2746 2747 2748 2749 2750 2751 2752 2753 2754 2755 2756 2757 2758 2759 2760 2761 2762 2763 2764 2765 2766 2767 2768 2769 2770 2771 2772 2773 2774 2775 2776 2777 2778 2779 2780 2781 2782 2783 2784 2785 2786 2787 2788 2789 2790 2791 2792 2793 2794 2795 2796 2797 2798 2799 2800 2801 2802 2803 2804 2805 2806 2807 2808 2809 2810 2811 2812 2813 2814 2815 2816 2817 2818 2819 2820 2821 2822 2823 2824 2825 2826 2827 2828 2829 2830 2831 2832 2833 2834 2835 2836 2837 2838 2839 2840 2841 2842 2843 2844 2845 2846 2847 2848 2849 2850 2851 2852 2853 2854 2855 2856 2857 2858 2859 2860 2861 2862 2863 2864 2865 2866 2867 2868 2869 2870 2871 2872 2873 2874 2875 2876 2877 2878 2879 2880 2881 2882 2883 2884 2885 2886 2887 2888 2889 2890 2891 2892 2893 2894 2895 2896 2897 2898 2899 2900 2901 2902 2903 2904 2905 2906 2907 2908 2909 2910 2911 2912 2913 2914 2915 2916 2917 2918 2919 2920 2921 2922 2923 2924 2925 2926 2927 2928 2929 2930 2931 2932 2933 2934 2935 2936 2937 2938 2939 2940 2941 2942 2943 2944 2945 2946 2947 2948 2949 2950 2951 2952 2953 2954 2955 2956 2957 2958 2959 2960 2961 2962 2963 2964 2965 2966 2967 2968 2969 2970 2971 2972 2973 2974 2975 2976 2977 2978 2979 2980 2981 2982 2983 2984 2985 2986 2987 2988 2989 2990 2991 2992 2993 2994 2995 2996 2997 2998 2999 3000 3001 3002 3003 3004 3005 3006 3007 3008 3009 3010 3011 3012 3013 3014 3015 3016 3017 3018 3019 3020 3021 3022 3023 3024 3025 3026 3027 3028 3029 3030 3031 3032 3033 3034 3035 3036 3037 3038 3039 3040 3041 3042 3043 3044 3045 3046 3047 3048 3049 3050 3051 3052 3053 3054 3055 3056 3057 3058 3059 3060 3061 3062 3063 3064 3065 3066 3067 3068 3069 3070 3071 3072 3073 3074 3075 3076 3077 3078 3079 3080 3081 3082 3083 3084 3085 3086 3087 3088 3089 3090 3091 3092 3093 3094 3095 3096 3097 3098 3099 3100 3101 3102 3103 3104 3105 3106 3107 3108 3109 3110 3111 3112 3113 3114 3115 3116 3117 3118 3119 3120 3121 3122 3123 3124 3125 3126 3127 3128 3129 3130 3131 3132 3133 3134 3135 3136 3137 3138 3139 3140 3141 3142 3143 3144 3145 3146 3147 3148 3149 3150 3151 3152 3153 3154 3155 3156 3157 3158 3159 3160 3161 3162 3163 3164 3165 3166 3167 3168 3169 3170 3171 3172 3173 3174 3175 3176 3177 3178 3179 3180 3181 3182 3183 3184 3185 3186 3187 3188 3189 3190 3191 3192 3193 3194 3195 3196 3197 3198 3199 3200 3201 3202 3203 3204 3205 3206 3207 3208 3209 3210 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220 3221 3222 3223 3224 3225 3226 3227 3228 3229 3230 3231 3232 3233 3234 3235 3236 3237 3238 3239 3240 3241 3242 3243 3244 3245 3246 3247 3248 3249 3250 3251 3252 3253 3254 3255 3256 3257 3258 3259 3260 3261 3262 3263 3264 3265 3266 3267 3268 3269 3270 3271 3272 3273 3274 3275 3276 3277 3278 3279 3280 3281 3282 3283 3284 3285 3286 3287 3288 3289 3290 3291 3292 3293 3294 3295 3296 3297 3298 3299 3300 3301 3302 3303 3304 3305 3306 3307 3308 3309 3310 3311 3312 3313 3314 3315 3316 3317 3318 3319 3320 3321 3322 3323 3324 3325 3326 3327 3328 3329 3330 3331 3332 3333 3334 3335 3336 3337 3338 3339 3340 3341 3342 3343 3344 3345 3346 3347 3348 3349 3350 3351 3352 3353 3354 3355 3356 3357 3358 3359 3360 3361 3362 3363 3364 3365 3366 3367 3368 3369 3370 3371 
3372 3373 3374 3375 3376 3377 3378 3379 3380 3381 3382 3383 3384 3385 3386 3387 3388 3389 3390 3391 3392 3393 3394 3395 3396 3397 3398 3399 3400 3401 3402 3403 3404 3405 3406 3407 3408 3409 3410 3411 3412 3413 3414 3415 3416 3417 3418 3419 3420 3421 3422 3423 3424 3425 3426 3427 3428 3429 3430 3431 3432 3433 3434 3435 3436 3437 3438 3439 3440 3441 3442 3443 3444 3445 3446 3447 3448 3449 3450 3451 3452 3453 3454 3455 3456 3457 3458 3459 3460 3461 3462 3463 3464 3465 3466 3467 3468 3469 3470 3471 3472 3473 3474 3475 3476 3477 3478 3479 3480 3481 3482 3483 3484 3485 3486 3487 3488 3489 3490 3491 3492 3493 3494 3495 3496 3497 3498 3499 3500 3501 3502 3503 3504 3505 3506 3507 3508 3509 3510 3511 3512 3513 3514 3515 3516 3517 3518 3519 3520 3521 3522 3523 3524 3525 3526 3527 3528 3529 3530 3531 3532 3533 3534 3535 3536 3537 3538 3539 3540 3541 3542 3543 3544 3545 3546 3547 3548 3549 3550 3551 3552 3553 3554 3555 3556 3557 3558 3559 3560 3561 3562 3563 3564 3565 3566 3567 3568 3569 3570 3571 3572 3573 3574 3575 3576 3577 3578 3579 3580 3581 3582 3583 3584 3585 3586 3587 3588 3589 3590 3591 3592 3593 3594 3595 3596 3597 3598 3599 3600 3601 3602 3603 3604 3605 3606 3607 3608 3609 3610 3611 3612 3613 3614 3615 3616 3617 3618 3619 3620 3621 3622 3623 3624 3625 3626 3627 3628 3629 3630 3631 3632 3633 3634 3635 3636 3637 3638 3639 3640 3641 3642 3643 3644 3645 3646 3647 3648 3649 3650 3651 3652 3653 3654 3655 3656 3657 3658 3659 3660 3661 3662 3663 3664 3665 3666 3667 3668 3669 3670 3671 3672 3673 3674 3675 3676 3677 3678 3679 3680 3681 3682 3683 3684 3685 3686 3687 3688 3689 3690 3691 3692 3693 3694 3695 3696 3697 3698 3699 3700 3701 3702 3703 3704 3705 3706 3707 3708 3709 3710 3711 3712 3713 3714 3715 3716 3717 3718 3719 3720 3721 3722 3723 3724 3725 3726 3727 3728 3729 3730 3731 3732 3733 3734 3735 3736 3737 3738 3739 3740 3741 3742 3743 3744 3745 3746 3747 3748 3749 3750 3751 3752 3753 3754 3755 3756 3757 3758 3759 3760 3761 3762 3763 3764 3765 3766 3767 3768 3769 3770 3771 3772 3773 3774 3775 3776 3777 3778 3779 3780 3781 3782 3783 3784 3785 3786 3787 3788 3789 3790 3791 3792 3793 3794 3795 3796 3797 3798 3799 3800 3801 3802 3803 3804 3805 3806 3807 3808 3809 3810 3811 3812 3813 3814 3815 3816 3817 3818 3819 3820 3821 3822 3823 3824 3825 3826 3827 3828 3829 3830 3831 3832 3833 3834 3835 3836 3837 3838 3839 3840 3841 3842 3843 3844 3845 3846 3847 3848 3849 3850 3851 3852 3853 3854 3855 3856 3857 3858 3859 3860 3861 3862 3863 3864 3865 3866 3867 3868 3869 3870 3871 3872 3873 3874 3875 3876 3877 3878 3879 3880 3881 3882 3883 3884 3885 3886 3887 3888 3889 3890 3891 3892 3893 3894 3895 3896 3897 3898 3899 3900 3901 3902 3903 3904 3905 3906 3907 3908 3909 3910 3911 3912 3913 3914 3915 3916 3917 3918 3919 3920 3921 3922 3923 3924 3925 3926 3927 3928 3929 3930 3931 3932 3933 3934 3935 3936 3937 3938 3939 3940 3941 3942 3943 3944 3945 3946 3947 3948 3949 3950 3951 3952 3953 3954 3955 3956 3957 3958 3959 3960 3961 3962 3963 3964 3965 3966 3967 3968 3969 3970 3971 3972 3973 3974 3975 3976 3977 3978 3979 3980 3981 3982 3983 3984 3985 3986 3987 3988 3989 3990 3991 3992 3993 3994 3995 3996 3997 3998 3999 4000 4001 4002 4003 4004 4005 4006 4007 4008 4009 4010 4011 4012 4013 4014 4015 4016 4017 4018 4019 4020 4021 4022 4023 4024 4025 4026 4027 4028 4029 4030 4031 4032 4033 4034 4035 4036 4037 4038 4039 4040 4041 4042 4043 4044 4045 4046 4047 4048 4049 4050 4051 4052 4053 4054 4055 4056 4057 4058 4059 4060 4061 4062 4063 4064 4065 4066 4067 4068 4069 4070 4071 4072 4073 4074 4075 4076 4077 4078 4079 4080 4081 4082 
4083 4084 4085 4086 4087 4088 4089 4090 4091 4092 4093 4094 4095 4096 4097 4098 4099 4100 4101 4102 4103 4104 4105 4106 4107 4108 4109 4110 4111 4112 4113 4114 4115 4116 4117 4118 4119 4120 4121 4122 4123 4124 4125 4126 4127 4128 4129 4130 4131 4132 4133 4134 4135 4136 4137 4138 4139 4140 4141 4142 4143 4144 4145 4146 4147 4148 4149 4150 4151 4152 4153 4154 4155 4156 4157 4158 4159 4160 4161 4162 4163 4164 4165 4166 4167 4168 4169 4170 4171 4172 4173 4174 4175 4176 4177 4178 4179 4180 4181 4182 4183 4184 4185 4186 4187 4188 4189 4190 4191 4192 4193 4194 4195 4196 4197 4198 4199 4200 4201 4202 4203 4204 4205 4206 4207 4208 4209 4210 4211 4212 4213 4214 4215 4216 4217 4218 4219 4220 4221 4222 4223 4224 4225 4226 4227 4228 4229 4230 4231 4232 4233 4234 4235 4236 4237 4238 4239 4240 4241 4242 4243 4244 4245 4246 4247 4248 4249 4250 4251 4252 4253 4254 4255 4256 4257 4258 4259 4260 4261 4262 4263 4264 4265 4266 4267 4268 4269 4270 4271 4272 4273 4274 4275 4276 4277 4278 4279 4280 4281 4282 4283 4284 4285 4286 4287 4288 4289 4290 4291 4292 4293 4294 4295 4296 4297 4298 4299 4300 4301 4302 4303 4304 4305 4306 4307 4308 4309 4310 4311 4312 4313 4314 4315 4316 4317 4318 4319 4320 4321 4322 4323 4324 4325 4326 4327 4328 4329 4330 4331 4332 4333 4334 4335 4336 4337 4338 4339 4340 4341 4342 4343 4344 4345 4346 4347 4348 4349 4350 4351 4352 4353 4354 4355 4356 4357 4358 4359 4360 4361 4362 4363 4364 4365 4366 4367 4368 4369 4370 4371 4372 4373 4374 4375 4376 4377 4378 4379 4380 4381 4382 4383 4384 4385 4386 4387 4388 4389 4390 4391 4392 4393 4394 4395 4396 4397 4398 4399 4400 4401 4402 4403 4404 4405 4406 4407 4408 4409 4410 4411 4412 4413 4414 4415 4416 4417 4418 4419 4420 4421 4422 4423 4424 4425 4426 4427 4428 4429 4430 4431 4432 4433 4434 4435 4436 4437 4438 4439 4440 4441 4442 4443 4444 4445 4446 4447 4448 4449 4450 4451 4452 4453 4454 4455 4456 4457 4458 4459 4460 4461 4462 4463 4464 4465 4466 4467 4468 4469 4470 4471 4472 4473 4474 4475 4476 4477 4478 4479 4480 4481 4482 4483 4484 4485 4486 4487 4488 4489 4490 4491 4492 4493 4494 4495 4496 4497 4498 4499 4500 4501 4502 4503 4504 4505 4506 4507 4508 4509 4510 4511 4512 4513 4514 4515 4516 4517 4518 4519 4520 4521 4522 4523 4524 4525 4526 4527 4528 4529 4530 4531 4532 4533 4534 4535 4536 4537 4538 4539 4540 4541 4542 4543 4544 4545 4546 4547 4548 4549 4550 4551 4552 4553 4554 4555 4556 4557 4558 4559 4560 4561 4562 4563 4564 4565 4566 4567 4568 4569 4570 4571 4572 4573 4574 4575 4576 4577 4578 4579 4580 4581 4582 4583 4584 4585 4586 4587 4588 4589 4590 4591 4592 4593 4594 4595 4596 4597 4598 4599 4600 4601 4602 4603 4604 4605 4606 4607 4608 4609 4610 4611 4612 4613 4614 4615 4616 4617 4618 4619 4620 4621 4622 4623 4624 4625 4626 4627 4628 4629 4630 4631 4632 4633 4634 4635 4636 4637 4638 4639 4640 4641 4642 4643 4644 4645 4646 4647 4648 4649 4650 4651 4652 4653 4654 4655 4656 4657 4658 4659 4660 4661 4662 4663 4664 4665 4666 4667 4668 4669 4670 4671 4672 4673 4674 4675 4676 4677 4678 4679 4680 4681 4682 4683 4684 4685 4686 4687 4688 4689 4690 4691 4692 4693 4694 4695 4696 4697 4698 4699 4700 4701 4702 4703 4704 4705 4706 4707 4708 4709 4710 4711 4712 4713 4714 4715 4716 4717 4718 4719 4720 4721 4722 4723 4724 4725 4726 4727 4728 4729 4730 4731 4732 4733 4734 4735 4736 4737 4738 4739 4740 4741 4742 4743 4744 4745 4746 4747 4748 4749 4750 4751 4752 4753 4754 4755 4756 4757 4758 4759 4760 4761 4762 4763 4764 4765 4766 4767 4768 4769 4770 4771 4772 4773 4774 4775 4776 4777 4778 4779 4780 4781 4782 4783 4784 4785 4786 4787 4788 4789 4790 4791 4792 4793 
4794 4795 4796 4797 4798 4799 4800 4801 4802 4803 4804 4805 4806 4807 4808 4809 4810 4811 4812 4813 4814 4815 4816 4817 4818 4819 4820 4821 4822 4823 4824 4825 4826 4827 4828 4829 4830 4831 4832 4833 4834 4835 4836 4837 4838 4839 4840 4841 4842 4843 4844 4845 4846 4847 4848 4849 4850 4851 4852 4853 4854 4855 4856 4857 4858 4859 4860 4861 4862 4863 4864 4865 4866 4867 4868 4869 4870 4871 4872 4873 4874 4875 4876 4877 4878 4879 4880 4881 4882 4883 4884 4885 4886 4887 4888 4889 4890 4891 4892 4893 4894 4895 4896 4897 4898 4899 4900 4901 4902 4903 4904 4905 4906 4907 4908 4909 4910 4911 4912 4913 4914 4915 4916 4917 4918 4919 4920 4921 4922 4923 4924 4925 4926 4927 4928 4929 4930 4931 4932 4933 4934 4935 4936 4937 4938 4939 4940 4941 4942 4943 4944 4945 4946 4947 4948 4949 4950 4951 4952 4953 4954 4955 4956 4957 4958 4959 4960 4961 4962 4963 4964 4965 4966 4967 4968 4969 4970 4971 4972 4973 4974 4975 4976 4977 4978 4979 4980 4981 4982 4983 4984 4985 4986 4987 4988 4989 4990 4991 4992 4993 4994 4995 4996 4997 4998 4999 5000 5001 5002 5003 5004 5005 5006 5007 5008 5009 5010 5011 5012 5013 5014 5015 5016 5017 5018 5019 5020 5021 5022 5023 5024 5025 5026 5027 5028 5029 5030 5031 5032 5033 5034 5035 5036 5037 5038 5039 5040 5041 5042 5043 5044 5045 5046 5047 5048 5049 5050 5051 5052 5053 5054 5055 5056 5057 5058 5059 5060 5061 5062 5063 5064 5065 5066 5067 5068 5069 5070 5071 5072 5073 5074 5075 5076 5077 5078 5079 5080 5081 5082 5083 5084 5085 5086 5087 5088 5089 5090 5091 5092 5093 5094 5095 5096 5097 5098 5099 5100 5101 5102 5103 5104 5105 5106 5107 5108 5109 5110 5111 5112 5113 5114 5115 5116 5117 5118 5119 5120 5121 5122 5123 5124 5125 5126 5127 5128 5129 5130 5131 5132 5133 5134 5135 5136 5137 5138 5139 5140 5141 5142 5143 5144 5145 5146 5147 5148 5149 5150 5151 5152 5153 5154 5155 5156 5157 5158 5159 5160 5161 5162 5163 5164 5165 5166 5167 5168 5169 5170 5171 5172 5173 5174 5175 5176 5177 5178 5179 5180 5181 5182 5183 5184 5185 5186 5187 5188 5189 5190 5191 5192 5193 5194 5195 5196 5197 5198 5199 5200 5201 5202 5203 5204 5205 5206 5207 5208 5209 5210 5211 5212 5213 5214 5215 5216 5217 5218 5219 5220 5221 5222 5223 5224 5225 5226 5227 5228 5229 5230 5231 5232 5233 5234 5235 5236 5237 5238 5239 5240 5241 5242 5243 5244 5245 5246 5247 5248 5249 5250 5251 5252 5253 5254 5255 5256 5257 5258 5259 5260 5261 5262 5263 5264 5265 5266 5267 5268 5269 5270 5271 5272 5273 5274 5275 5276 5277 5278 5279 5280 5281 5282 5283 5284 5285 5286 5287 5288 5289 5290 5291 5292 5293 5294 5295 5296 5297 5298 5299 5300 5301 5302 5303 5304 5305 5306 5307 5308 5309 5310 5311 5312 5313 5314 5315 5316 5317 5318 5319 5320 5321 5322 5323 5324 5325 5326 5327 5328 5329 5330 5331 5332 5333 5334 5335 5336 5337 5338 5339 5340 5341 5342 5343 5344 5345 5346 5347 5348 5349 5350 5351 5352 5353 5354 5355 5356 5357 5358 5359 5360 5361 5362 5363 5364 5365 5366 5367 5368 5369 5370 5371 5372 5373 5374 5375 5376 5377 5378 5379 5380 5381 5382 5383 5384 5385 5386 5387 5388 5389 5390 5391 5392 5393 5394 5395 5396 5397 5398 5399 5400 5401 5402 5403 5404 5405 5406 5407 5408 5409 5410 5411 5412 5413 5414 5415 5416 5417 5418 5419 5420 5421 5422 5423 5424 5425 5426 5427 5428 5429 5430 5431 5432 5433 5434 5435 5436 5437 5438 5439 5440 5441 5442 5443 5444 5445 5446 5447 5448 5449 5450 5451 5452 5453 5454 5455 5456 5457 5458 5459 5460 5461 5462 5463 5464 5465 5466 5467 5468 5469 5470 5471 5472 5473 5474 5475 5476 5477 5478 5479 5480 5481 5482 5483 5484 5485 5486 5487 5488 5489 5490 5491 5492 5493 5494 5495 5496 5497 5498 5499 5500 5501 5502 5503 5504 
// SPDX-License-Identifier: GPL-2.0
/*
 * USB hub driver.
 *
 * (C) Copyright 1999 Linus Torvalds
 * (C) Copyright 1999 Johannes Erdfelt
 * (C) Copyright 1999 Gregory P. Smith
 * (C) Copyright 2001 Brad Hards (bhards@bigpond.net.au)
 *
 * Released under the GPLv2 only.
*/ #include <linux/kernel.h> #include <linux/errno.h> #include <linux/module.h> #include <linux/moduleparam.h> #include <linux/completion.h> #include <linux/sched/mm.h> #include <linux/list.h> #include <linux/slab.h> #include <linux/string_choices.h> #include <linux/kcov.h> #include <linux/ioctl.h> #include <linux/usb.h> #include <linux/usbdevice_fs.h> #include <linux/usb/hcd.h> #include <linux/usb/onboard_dev.h> #include <linux/usb/otg.h> #include <linux/usb/quirks.h> #include <linux/workqueue.h> #include <linux/mutex.h> #include <linux/random.h> #include <linux/pm_qos.h> #include <linux/kobject.h> #include <linux/bitfield.h> #include <linux/uaccess.h> #include <asm/byteorder.h> #include "hub.h" #include "phy.h" #include "otg_productlist.h" #define USB_VENDOR_GENESYS_LOGIC 0x05e3 #define USB_VENDOR_SMSC 0x0424 #define USB_PRODUCT_USB5534B 0x5534 #define USB_VENDOR_CYPRESS 0x04b4 #define USB_PRODUCT_CY7C65632 0x6570 #define USB_VENDOR_TEXAS_INSTRUMENTS 0x0451 #define USB_PRODUCT_TUSB8041_USB3 0x8140 #define USB_PRODUCT_TUSB8041_USB2 0x8142 #define USB_VENDOR_MICROCHIP 0x0424 #define USB_PRODUCT_USB4913 0x4913 #define USB_PRODUCT_USB4914 0x4914 #define USB_PRODUCT_USB4915 0x4915 #define HUB_QUIRK_CHECK_PORT_AUTOSUSPEND BIT(0) #define HUB_QUIRK_DISABLE_AUTOSUSPEND BIT(1) #define HUB_QUIRK_REDUCE_FRAME_INTR_BINTERVAL BIT(2) #define USB_TP_TRANSMISSION_DELAY 40 /* ns */ #define USB_TP_TRANSMISSION_DELAY_MAX 65535 /* ns */ #define USB_PING_RESPONSE_TIME 400 /* ns */ #define USB_REDUCE_FRAME_INTR_BINTERVAL 9 /* * The SET_ADDRESS request timeout will be 500 ms when * USB_QUIRK_SHORT_SET_ADDRESS_REQ_TIMEOUT quirk flag is set. */ #define USB_SHORT_SET_ADDRESS_REQ_TIMEOUT 500 /* ms */ /* * Give SS hubs 200ms time after wake to train downstream links before * assuming no port activity and allowing hub to runtime suspend back. */ #define USB_SS_PORT_U0_WAKE_TIME 200 /* ms */ /* Protect struct usb_device->state and ->children members * Note: Both are also protected by ->dev.sem, except that ->state can * change to USB_STATE_NOTATTACHED even when the semaphore isn't held. */ static DEFINE_SPINLOCK(device_state_lock); /* workqueue to process hub events */ static struct workqueue_struct *hub_wq; static void hub_event(struct work_struct *work); /* synchronize hub-port add/remove and peering operations */ DEFINE_MUTEX(usb_port_peer_mutex); /* cycle leds on hubs that aren't blinking for attention */ static bool blinkenlights; module_param(blinkenlights, bool, S_IRUGO); MODULE_PARM_DESC(blinkenlights, "true to cycle leds on hubs"); /* * Device SATA8000 FW1.0 from DATAST0R Technology Corp requires about * 10 seconds to send reply for the initial 64-byte descriptor request. */ /* define initial 64-byte descriptor request timeout in milliseconds */ static int initial_descriptor_timeout = USB_CTRL_GET_TIMEOUT; module_param(initial_descriptor_timeout, int, S_IRUGO|S_IWUSR); MODULE_PARM_DESC(initial_descriptor_timeout, "initial 64-byte descriptor request timeout in milliseconds " "(default 5000 - 5.0 seconds)"); /* * As of 2.6.10 we introduce a new USB device initialization scheme which * closely resembles the way Windows works. Hopefully it will be compatible * with a wider range of devices than the old scheme. However some previously * working devices may start giving rise to "device not accepting address" * errors; if that happens the user can try the old scheme by adjusting the * following module parameters. * * For maximum flexibility there are two boolean parameters to control the * hub driver's behavior. 
On the first initialization attempt, if the * "old_scheme_first" parameter is set then the old scheme will be used, * otherwise the new scheme is used. If that fails and "use_both_schemes" * is set, then the driver will make another attempt, using the other scheme. */ static bool old_scheme_first; module_param(old_scheme_first, bool, S_IRUGO | S_IWUSR); MODULE_PARM_DESC(old_scheme_first, "start with the old device initialization scheme"); static bool use_both_schemes = true; module_param(use_both_schemes, bool, S_IRUGO | S_IWUSR); MODULE_PARM_DESC(use_both_schemes, "try the other device initialization scheme if the " "first one fails"); /* Mutual exclusion for EHCI CF initialization. This interferes with * port reset on some companion controllers. */ DECLARE_RWSEM(ehci_cf_port_reset_rwsem); EXPORT_SYMBOL_GPL(ehci_cf_port_reset_rwsem); #define HUB_DEBOUNCE_TIMEOUT 2000 #define HUB_DEBOUNCE_STEP 25 #define HUB_DEBOUNCE_STABLE 100 static int usb_reset_and_verify_device(struct usb_device *udev); static int hub_port_disable(struct usb_hub *hub, int port1, int set_state); static bool hub_port_warm_reset_required(struct usb_hub *hub, int port1, u16 portstatus); static inline char *portspeed(struct usb_hub *hub, int portstatus) { if (hub_is_superspeedplus(hub->hdev)) return "10.0 Gb/s"; if (hub_is_superspeed(hub->hdev)) return "5.0 Gb/s"; if (portstatus & USB_PORT_STAT_HIGH_SPEED) return "480 Mb/s"; else if (portstatus & USB_PORT_STAT_LOW_SPEED) return "1.5 Mb/s"; else return "12 Mb/s"; } /* Note that hdev or one of its children must be locked! */ struct usb_hub *usb_hub_to_struct_hub(struct usb_device *hdev) { if (!hdev || !hdev->actconfig || !hdev->maxchild) return NULL; return usb_get_intfdata(hdev->actconfig->interface[0]); } int usb_device_supports_lpm(struct usb_device *udev) { /* Some devices have trouble with LPM */ if (udev->quirks & USB_QUIRK_NO_LPM) return 0; /* Skip if the device BOS descriptor couldn't be read */ if (!udev->bos) return 0; /* USB 2.1 (and greater) devices indicate LPM support through * their USB 2.0 Extended Capabilities BOS descriptor. */ if (udev->speed == USB_SPEED_HIGH || udev->speed == USB_SPEED_FULL) { if (udev->bos->ext_cap && (USB_LPM_SUPPORT & le32_to_cpu(udev->bos->ext_cap->bmAttributes))) return 1; return 0; } /* * According to the USB 3.0 spec, all USB 3.0 devices must support LPM. * However, there are some that don't, and they set the U1/U2 exit * latencies to zero. */ if (!udev->bos->ss_cap) { dev_info(&udev->dev, "No LPM exit latency info found, disabling LPM.\n"); return 0; } if (udev->bos->ss_cap->bU1devExitLat == 0 && udev->bos->ss_cap->bU2DevExitLat == 0) { if (udev->parent) dev_info(&udev->dev, "LPM exit latency is zeroed, disabling LPM.\n"); else dev_info(&udev->dev, "We don't know the algorithms for LPM for this host, disabling LPM.\n"); return 0; } if (!udev->parent || udev->parent->lpm_capable) return 1; return 0; } /* * Set the Maximum Exit Latency (MEL) for the host to wakup up the path from * U1/U2, send a PING to the device and receive a PING_RESPONSE. * See USB 3.1 section C.1.5.2 */ static void usb_set_lpm_mel(struct usb_device *udev, struct usb3_lpm_parameters *udev_lpm_params, unsigned int udev_exit_latency, struct usb_hub *hub, struct usb3_lpm_parameters *hub_lpm_params, unsigned int hub_exit_latency) { unsigned int total_mel; /* * tMEL1. time to transition path from host to device into U0. 
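 * (The udev/hub exit latencies passed in are the U1/U2 device exit latencies
 * in microseconds, hence the "* 1000" below; bHubHdrDecLat is expressed in
 * 0.1 us units, hence the "* 100".)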
* MEL for parent already contains the delay up to parent, so only add * the exit latency for the last link (pick the slower exit latency), * and the hub header decode latency. See USB 3.1 section C 2.2.1 * Store MEL in nanoseconds */ total_mel = hub_lpm_params->mel + max(udev_exit_latency, hub_exit_latency) * 1000 + hub->descriptor->u.ss.bHubHdrDecLat * 100; /* * tMEL2. Time to submit PING packet. Sum of tTPTransmissionDelay for * each link + wHubDelay for each hub. Add only for last link. * tMEL4, the time for PING_RESPONSE to traverse upstream is similar. * Multiply by 2 to include it as well. */ total_mel += (__le16_to_cpu(hub->descriptor->u.ss.wHubDelay) + USB_TP_TRANSMISSION_DELAY) * 2; /* * tMEL3, tPingResponse. Time taken by device to generate PING_RESPONSE * after receiving PING. Also add 2100ns as stated in USB 3.1 C 1.5.2.4 * to cover the delay if the PING_RESPONSE is queued behind a Max Packet * Size DP. * Note these delays should be added only once for the entire path, so * add them to the MEL of the device connected to the roothub. */ if (!hub->hdev->parent) total_mel += USB_PING_RESPONSE_TIME + 2100; udev_lpm_params->mel = total_mel; } /* * Set the maximum Device to Host Exit Latency (PEL) for the device to initiate * a transition from either U1 or U2. */ static void usb_set_lpm_pel(struct usb_device *udev, struct usb3_lpm_parameters *udev_lpm_params, unsigned int udev_exit_latency, struct usb_hub *hub, struct usb3_lpm_parameters *hub_lpm_params, unsigned int hub_exit_latency, unsigned int port_to_port_exit_latency) { unsigned int first_link_pel; unsigned int hub_pel; /* * First, the device sends an LFPS to transition the link between the * device and the parent hub into U0. The exit latency is the bigger of * the device exit latency or the hub exit latency. */ if (udev_exit_latency > hub_exit_latency) first_link_pel = udev_exit_latency * 1000; else first_link_pel = hub_exit_latency * 1000; /* * When the hub starts to receive the LFPS, there is a slight delay for * it to figure out that one of the ports is sending an LFPS. Then it * will forward the LFPS to its upstream link. The exit latency is the * delay, plus the PEL that we calculated for this hub. */ hub_pel = port_to_port_exit_latency * 1000 + hub_lpm_params->pel; /* * According to figure C-7 in the USB 3.0 spec, the PEL for this device * is the greater of the two exit latencies. */ if (first_link_pel > hub_pel) udev_lpm_params->pel = first_link_pel; else udev_lpm_params->pel = hub_pel; } /* * Set the System Exit Latency (SEL) to indicate the total worst-case time from * when a device initiates a transition to U0, until when it will receive the * first packet from the host controller. * * Section C.1.5.1 describes the four components to this: * - t1: device PEL * - t2: time for the ERDY to make it from the device to the host. * - t3: a host-specific delay to process the ERDY. * - t4: time for the packet to make it from the host to the device. * * t3 is specific to both the xHCI host and the platform the host is integrated * into. The Intel HW folks have said it's negligible, FIXME if a different * vendor says otherwise. */ static void usb_set_lpm_sel(struct usb_device *udev, struct usb3_lpm_parameters *udev_lpm_params) { struct usb_device *parent; unsigned int num_hubs; unsigned int total_sel; /* t1 = device PEL */ total_sel = udev_lpm_params->pel; /* How many external hubs are in between the device & the root port. 
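 * For example, a device sitting below two external hubs (num_hubs = 2)
 * gets t2 = 2100 + 250 = 2350 ns and t4 = 2 * 250 = 500 ns added on top
 * of its PEL.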
*/ for (parent = udev->parent, num_hubs = 0; parent->parent; parent = parent->parent) num_hubs++; /* t2 = 2.1us + 250ns * (num_hubs - 1) */ if (num_hubs > 0) total_sel += 2100 + 250 * (num_hubs - 1); /* t4 = 250ns * num_hubs */ total_sel += 250 * num_hubs; udev_lpm_params->sel = total_sel; } static void usb_set_lpm_parameters(struct usb_device *udev) { struct usb_hub *hub; unsigned int port_to_port_delay; unsigned int udev_u1_del; unsigned int udev_u2_del; unsigned int hub_u1_del; unsigned int hub_u2_del; if (!udev->lpm_capable || udev->speed < USB_SPEED_SUPER) return; /* Skip if the device BOS descriptor couldn't be read */ if (!udev->bos) return; hub = usb_hub_to_struct_hub(udev->parent); /* It doesn't take time to transition the roothub into U0, since it * doesn't have an upstream link. */ if (!hub) return; udev_u1_del = udev->bos->ss_cap->bU1devExitLat; udev_u2_del = le16_to_cpu(udev->bos->ss_cap->bU2DevExitLat); hub_u1_del = udev->parent->bos->ss_cap->bU1devExitLat; hub_u2_del = le16_to_cpu(udev->parent->bos->ss_cap->bU2DevExitLat); usb_set_lpm_mel(udev, &udev->u1_params, udev_u1_del, hub, &udev->parent->u1_params, hub_u1_del); usb_set_lpm_mel(udev, &udev->u2_params, udev_u2_del, hub, &udev->parent->u2_params, hub_u2_del); /* * Appendix C, section C.2.2.2, says that there is a slight delay from * when the parent hub notices the downstream port is trying to * transition to U0 to when the hub initiates a U0 transition on its * upstream port. The section says the delays are tPort2PortU1EL and * tPort2PortU2EL, but it doesn't define what they are. * * The hub chapter, sections 10.4.2.4 and 10.4.2.5 seem to be talking * about the same delays. Use the maximum delay calculations from those * sections. For U1, it's tHubPort2PortExitLat, which is 1us max. For * U2, it's tHubPort2PortExitLat + U2DevExitLat - U1DevExitLat. I * assume the device exit latencies they are talking about are the hub * exit latencies. * * What do we do if the U2 exit latency is less than the U1 exit * latency? It's possible, although not likely... */ port_to_port_delay = 1; usb_set_lpm_pel(udev, &udev->u1_params, udev_u1_del, hub, &udev->parent->u1_params, hub_u1_del, port_to_port_delay); if (hub_u2_del > hub_u1_del) port_to_port_delay = 1 + hub_u2_del - hub_u1_del; else port_to_port_delay = 1 + hub_u1_del; usb_set_lpm_pel(udev, &udev->u2_params, udev_u2_del, hub, &udev->parent->u2_params, hub_u2_del, port_to_port_delay); /* Now that we've got PEL, calculate SEL. */ usb_set_lpm_sel(udev, &udev->u1_params); usb_set_lpm_sel(udev, &udev->u2_params); } /* USB 2.0 spec Section 11.24.4.5 */ static int get_hub_descriptor(struct usb_device *hdev, struct usb_hub_descriptor *desc) { int i, ret, size; unsigned dtype; if (hub_is_superspeed(hdev)) { dtype = USB_DT_SS_HUB; size = USB_DT_SS_HUB_SIZE; } else { dtype = USB_DT_HUB; size = sizeof(struct usb_hub_descriptor); } for (i = 0; i < 3; i++) { ret = usb_control_msg(hdev, usb_rcvctrlpipe(hdev, 0), USB_REQ_GET_DESCRIPTOR, USB_DIR_IN | USB_RT_HUB, dtype << 8, 0, desc, size, USB_CTRL_GET_TIMEOUT); if (hub_is_superspeed(hdev)) { if (ret == size) return ret; } else if (ret >= USB_DT_HUB_NONVAR_SIZE + 2) { /* Make sure we have the DeviceRemovable field. 
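 * DeviceRemovable is a bitmap holding the reserved bit 0 plus one bit per
 * port, so it occupies bNbrPorts / 8 + 1 bytes after the fixed-size header.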
*/ size = USB_DT_HUB_NONVAR_SIZE + desc->bNbrPorts / 8 + 1; if (ret < size) return -EMSGSIZE; return ret; } } return -EINVAL; } /* * USB 2.0 spec Section 11.24.2.1 */ static int clear_hub_feature(struct usb_device *hdev, int feature) { return usb_control_msg(hdev, usb_sndctrlpipe(hdev, 0), USB_REQ_CLEAR_FEATURE, USB_RT_HUB, feature, 0, NULL, 0, 1000); } /* * USB 2.0 spec Section 11.24.2.2 */ int usb_clear_port_feature(struct usb_device *hdev, int port1, int feature) { return usb_control_msg(hdev, usb_sndctrlpipe(hdev, 0), USB_REQ_CLEAR_FEATURE, USB_RT_PORT, feature, port1, NULL, 0, 1000); } /* * USB 2.0 spec Section 11.24.2.13 */ static int set_port_feature(struct usb_device *hdev, int port1, int feature) { return usb_control_msg(hdev, usb_sndctrlpipe(hdev, 0), USB_REQ_SET_FEATURE, USB_RT_PORT, feature, port1, NULL, 0, 1000); } static char *to_led_name(int selector) { switch (selector) { case HUB_LED_AMBER: return "amber"; case HUB_LED_GREEN: return "green"; case HUB_LED_OFF: return "off"; case HUB_LED_AUTO: return "auto"; default: return "??"; } } /* * USB 2.0 spec Section 11.24.2.7.1.10 and table 11-7 * for info about using port indicators */ static void set_port_led(struct usb_hub *hub, int port1, int selector) { struct usb_port *port_dev = hub->ports[port1 - 1]; int status; status = set_port_feature(hub->hdev, (selector << 8) | port1, USB_PORT_FEAT_INDICATOR); dev_dbg(&port_dev->dev, "indicator %s status %d\n", to_led_name(selector), status); } #define LED_CYCLE_PERIOD ((2*HZ)/3) static void led_work(struct work_struct *work) { struct usb_hub *hub = container_of(work, struct usb_hub, leds.work); struct usb_device *hdev = hub->hdev; unsigned i; unsigned changed = 0; int cursor = -1; if (hdev->state != USB_STATE_CONFIGURED || hub->quiescing) return; for (i = 0; i < hdev->maxchild; i++) { unsigned selector, mode; /* 30%-50% duty cycle */ switch (hub->indicator[i]) { /* cycle marker */ case INDICATOR_CYCLE: cursor = i; selector = HUB_LED_AUTO; mode = INDICATOR_AUTO; break; /* blinking green = sw attention */ case INDICATOR_GREEN_BLINK: selector = HUB_LED_GREEN; mode = INDICATOR_GREEN_BLINK_OFF; break; case INDICATOR_GREEN_BLINK_OFF: selector = HUB_LED_OFF; mode = INDICATOR_GREEN_BLINK; break; /* blinking amber = hw attention */ case INDICATOR_AMBER_BLINK: selector = HUB_LED_AMBER; mode = INDICATOR_AMBER_BLINK_OFF; break; case INDICATOR_AMBER_BLINK_OFF: selector = HUB_LED_OFF; mode = INDICATOR_AMBER_BLINK; break; /* blink green/amber = reserved */ case INDICATOR_ALT_BLINK: selector = HUB_LED_GREEN; mode = INDICATOR_ALT_BLINK_OFF; break; case INDICATOR_ALT_BLINK_OFF: selector = HUB_LED_AMBER; mode = INDICATOR_ALT_BLINK; break; default: continue; } if (selector != HUB_LED_AUTO) changed = 1; set_port_led(hub, i + 1, selector); hub->indicator[i] = mode; } if (!changed && blinkenlights) { cursor++; cursor %= hdev->maxchild; set_port_led(hub, cursor + 1, HUB_LED_GREEN); hub->indicator[cursor] = INDICATOR_CYCLE; changed++; } if (changed) queue_delayed_work(system_power_efficient_wq, &hub->leds, LED_CYCLE_PERIOD); } /* use a short timeout for hub/port status fetches */ #define USB_STS_TIMEOUT 1000 #define USB_STS_RETRIES 5 /* * USB 2.0 spec Section 11.24.2.6 */ static int get_hub_status(struct usb_device *hdev, struct usb_hub_status *data) { int i, status = -ETIMEDOUT; for (i = 0; i < USB_STS_RETRIES && (status == -ETIMEDOUT || status == -EPIPE); i++) { status = usb_control_msg(hdev, usb_rcvctrlpipe(hdev, 0), USB_REQ_GET_STATUS, USB_DIR_IN | USB_RT_HUB, 0, 0, data, sizeof(*data), USB_STS_TIMEOUT); 
} return status; } /* * USB 2.0 spec Section 11.24.2.7 * USB 3.1 takes into use the wValue and wLength fields, spec Section 10.16.2.6 */ static int get_port_status(struct usb_device *hdev, int port1, void *data, u16 value, u16 length) { int i, status = -ETIMEDOUT; for (i = 0; i < USB_STS_RETRIES && (status == -ETIMEDOUT || status == -EPIPE); i++) { status = usb_control_msg(hdev, usb_rcvctrlpipe(hdev, 0), USB_REQ_GET_STATUS, USB_DIR_IN | USB_RT_PORT, value, port1, data, length, USB_STS_TIMEOUT); } return status; } static int hub_ext_port_status(struct usb_hub *hub, int port1, int type, u16 *status, u16 *change, u32 *ext_status) { int ret; int len = 4; if (type != HUB_PORT_STATUS) len = 8; mutex_lock(&hub->status_mutex); ret = get_port_status(hub->hdev, port1, &hub->status->port, type, len); if (ret < len) { if (ret != -ENODEV) dev_err(hub->intfdev, "%s failed (err = %d)\n", __func__, ret); if (ret >= 0) ret = -EIO; } else { *status = le16_to_cpu(hub->status->port.wPortStatus); *change = le16_to_cpu(hub->status->port.wPortChange); if (type != HUB_PORT_STATUS && ext_status) *ext_status = le32_to_cpu( hub->status->port.dwExtPortStatus); ret = 0; } mutex_unlock(&hub->status_mutex); /* * There is no need to lock status_mutex here, because status_mutex * protects hub->status, and the phy driver only checks the port * status without changing the status. */ if (!ret) { struct usb_device *hdev = hub->hdev; /* * Only roothub will be notified of connection changes, * since the USB PHY only cares about changes at the next * level. */ if (is_root_hub(hdev)) { struct usb_hcd *hcd = bus_to_hcd(hdev->bus); bool connect; bool connect_change; connect_change = *change & USB_PORT_STAT_C_CONNECTION; connect = *status & USB_PORT_STAT_CONNECTION; if (connect_change && connect) usb_phy_roothub_notify_connect(hcd->phy_roothub, port1 - 1); else if (connect_change) usb_phy_roothub_notify_disconnect(hcd->phy_roothub, port1 - 1); } } return ret; } int usb_hub_port_status(struct usb_hub *hub, int port1, u16 *status, u16 *change) { return hub_ext_port_status(hub, port1, HUB_PORT_STATUS, status, change, NULL); } static void hub_resubmit_irq_urb(struct usb_hub *hub) { unsigned long flags; int status; spin_lock_irqsave(&hub->irq_urb_lock, flags); if (hub->quiescing) { spin_unlock_irqrestore(&hub->irq_urb_lock, flags); return; } status = usb_submit_urb(hub->urb, GFP_ATOMIC); if (status && status != -ENODEV && status != -EPERM && status != -ESHUTDOWN) { dev_err(hub->intfdev, "resubmit --> %d\n", status); mod_timer(&hub->irq_urb_retry, jiffies + HZ); } spin_unlock_irqrestore(&hub->irq_urb_lock, flags); } static void hub_retry_irq_urb(struct timer_list *t) { struct usb_hub *hub = timer_container_of(hub, t, irq_urb_retry); hub_resubmit_irq_urb(hub); } static void kick_hub_wq(struct usb_hub *hub) { struct usb_interface *intf; if (hub->disconnected || work_pending(&hub->events)) return; /* * Suppress autosuspend until the event is proceed. * * Be careful and make sure that the symmetric operation is * always called. We are here only when there is no pending * work for this hub. Therefore put the interface either when * the new work is called or when it is canceled. 
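 * (The matching put happens in hub_event() after the work item has run, or
 * in the error path just below if queue_work() reports that the work was
 * already queued.)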
*/ intf = to_usb_interface(hub->intfdev); usb_autopm_get_interface_no_resume(intf); hub_get(hub); if (queue_work(hub_wq, &hub->events)) return; /* the work has already been scheduled */ usb_autopm_put_interface_async(intf); hub_put(hub); } void usb_kick_hub_wq(struct usb_device *hdev) { struct usb_hub *hub = usb_hub_to_struct_hub(hdev); if (hub) kick_hub_wq(hub); } /* * Let the USB core know that a USB 3.0 device has sent a Function Wake Device * Notification, which indicates it had initiated remote wakeup. * * USB 3.0 hubs do not report the port link state change from U3 to U0 when the * device initiates resume, so the USB core will not receive notice of the * resume through the normal hub interrupt URB. */ void usb_wakeup_notification(struct usb_device *hdev, unsigned int portnum) { struct usb_hub *hub; struct usb_port *port_dev; if (!hdev) return; hub = usb_hub_to_struct_hub(hdev); if (hub) { port_dev = hub->ports[portnum - 1]; if (port_dev && port_dev->child) pm_wakeup_event(&port_dev->child->dev, 0); set_bit(portnum, hub->wakeup_bits); kick_hub_wq(hub); } } EXPORT_SYMBOL_GPL(usb_wakeup_notification); /* completion function, fires on port status changes and various faults */ static void hub_irq(struct urb *urb) { struct usb_hub *hub = urb->context; int status = urb->status; unsigned i; unsigned long bits; switch (status) { case -ENOENT: /* synchronous unlink */ case -ECONNRESET: /* async unlink */ case -ESHUTDOWN: /* hardware going away */ return; default: /* presumably an error */ /* Cause a hub reset after 10 consecutive errors */ dev_dbg(hub->intfdev, "transfer --> %d\n", status); if ((++hub->nerrors < 10) || hub->error) goto resubmit; hub->error = status; fallthrough; /* let hub_wq handle things */ case 0: /* we got data: port status changed */ bits = 0; for (i = 0; i < urb->actual_length; ++i) bits |= ((unsigned long) ((*hub->buffer)[i])) << (i*8); hub->event_bits[0] = bits; break; } hub->nerrors = 0; /* Something happened, let hub_wq figure it out */ kick_hub_wq(hub); resubmit: hub_resubmit_irq_urb(hub); } /* USB 2.0 spec Section 11.24.2.3 */ static inline int hub_clear_tt_buffer(struct usb_device *hdev, u16 devinfo, u16 tt) { /* Need to clear both directions for control ep */ if (((devinfo >> 11) & USB_ENDPOINT_XFERTYPE_MASK) == USB_ENDPOINT_XFER_CONTROL) { int status = usb_control_msg(hdev, usb_sndctrlpipe(hdev, 0), HUB_CLEAR_TT_BUFFER, USB_RT_PORT, devinfo ^ 0x8000, tt, NULL, 0, 1000); if (status) return status; } return usb_control_msg(hdev, usb_sndctrlpipe(hdev, 0), HUB_CLEAR_TT_BUFFER, USB_RT_PORT, devinfo, tt, NULL, 0, 1000); } /* * enumeration blocks hub_wq for a long time. we use keventd instead, since * long blocking there is the exception, not the rule. accordingly, HCDs * talking to TTs must queue control transfers (not just bulk and iso), so * both can talk to the same hub concurrently. 
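 * (The clear work is queued from usb_hub_clear_tt_buffer() below via
 * schedule_work(), i.e. on the shared system workqueue.)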
*/ static void hub_tt_work(struct work_struct *work) { struct usb_hub *hub = container_of(work, struct usb_hub, tt.clear_work); unsigned long flags; spin_lock_irqsave(&hub->tt.lock, flags); while (!list_empty(&hub->tt.clear_list)) { struct list_head *next; struct usb_tt_clear *clear; struct usb_device *hdev = hub->hdev; const struct hc_driver *drv; int status; next = hub->tt.clear_list.next; clear = list_entry(next, struct usb_tt_clear, clear_list); list_del(&clear->clear_list); /* drop lock so HCD can concurrently report other TT errors */ spin_unlock_irqrestore(&hub->tt.lock, flags); status = hub_clear_tt_buffer(hdev, clear->devinfo, clear->tt); if (status && status != -ENODEV) dev_err(&hdev->dev, "clear tt %d (%04x) error %d\n", clear->tt, clear->devinfo, status); /* Tell the HCD, even if the operation failed */ drv = clear->hcd->driver; if (drv->clear_tt_buffer_complete) (drv->clear_tt_buffer_complete)(clear->hcd, clear->ep); kfree(clear); spin_lock_irqsave(&hub->tt.lock, flags); } spin_unlock_irqrestore(&hub->tt.lock, flags); } /** * usb_hub_set_port_power - control hub port's power state * @hdev: USB device belonging to the usb hub * @hub: target hub * @port1: port index * @set: expected status * * call this function to control port's power via setting or * clearing the port's PORT_POWER feature. * * Return: 0 if successful. A negative error code otherwise. */ int usb_hub_set_port_power(struct usb_device *hdev, struct usb_hub *hub, int port1, bool set) { int ret; if (set) ret = set_port_feature(hdev, port1, USB_PORT_FEAT_POWER); else ret = usb_clear_port_feature(hdev, port1, USB_PORT_FEAT_POWER); if (ret) return ret; if (set) set_bit(port1, hub->power_bits); else clear_bit(port1, hub->power_bits); return 0; } /** * usb_hub_clear_tt_buffer - clear control/bulk TT state in high speed hub * @urb: an URB associated with the failed or incomplete split transaction * * High speed HCDs use this to tell the hub driver that some split control or * bulk transaction failed in a way that requires clearing internal state of * a transaction translator. This is normally detected (and reported) from * interrupt context. * * It may not be possible for that hub to handle additional full (or low) * speed transactions until that state is fully cleared out. * * Return: 0 if successful. A negative error code otherwise. */ int usb_hub_clear_tt_buffer(struct urb *urb) { struct usb_device *udev = urb->dev; int pipe = urb->pipe; struct usb_tt *tt = udev->tt; unsigned long flags; struct usb_tt_clear *clear; /* we've got to cope with an arbitrary number of pending TT clears, * since each TT has "at least two" buffers that can need it (and * there can be many TTs per hub). even if they're uncommon. */ clear = kmalloc(sizeof *clear, GFP_ATOMIC); if (clear == NULL) { dev_err(&udev->dev, "can't save CLEAR_TT_BUFFER state\n"); /* FIXME recover somehow ... RESET_TT? */ return -ENOMEM; } /* info that CLEAR_TT_BUFFER needs */ clear->tt = tt->multi ? udev->ttport : 1; clear->devinfo = usb_pipeendpoint (pipe); clear->devinfo |= ((u16)udev->devaddr) << 4; clear->devinfo |= usb_pipecontrol(pipe) ? 
(USB_ENDPOINT_XFER_CONTROL << 11) : (USB_ENDPOINT_XFER_BULK << 11); if (usb_pipein(pipe)) clear->devinfo |= 1 << 15; /* info for completion callback */ clear->hcd = bus_to_hcd(udev->bus); clear->ep = urb->ep; /* tell keventd to clear state for this TT */ spin_lock_irqsave(&tt->lock, flags); list_add_tail(&clear->clear_list, &tt->clear_list); schedule_work(&tt->clear_work); spin_unlock_irqrestore(&tt->lock, flags); return 0; } EXPORT_SYMBOL_GPL(usb_hub_clear_tt_buffer); static void hub_power_on(struct usb_hub *hub, bool do_delay) { int port1; /* Enable power on each port. Some hubs have reserved values * of LPSM (> 2) in their descriptors, even though they are * USB 2.0 hubs. Some hubs do not implement port-power switching * but only emulate it. In all cases, the ports won't work * unless we send these messages to the hub. */ if (hub_is_port_power_switchable(hub)) dev_dbg(hub->intfdev, "enabling power on all ports\n"); else dev_dbg(hub->intfdev, "trying to enable port power on " "non-switchable hub\n"); for (port1 = 1; port1 <= hub->hdev->maxchild; port1++) if (test_bit(port1, hub->power_bits)) set_port_feature(hub->hdev, port1, USB_PORT_FEAT_POWER); else usb_clear_port_feature(hub->hdev, port1, USB_PORT_FEAT_POWER); if (do_delay) msleep(hub_power_on_good_delay(hub)); } static int hub_hub_status(struct usb_hub *hub, u16 *status, u16 *change) { int ret; mutex_lock(&hub->status_mutex); ret = get_hub_status(hub->hdev, &hub->status->hub); if (ret < 0) { if (ret != -ENODEV) dev_err(hub->intfdev, "%s failed (err = %d)\n", __func__, ret); } else { *status = le16_to_cpu(hub->status->hub.wHubStatus); *change = le16_to_cpu(hub->status->hub.wHubChange); ret = 0; } mutex_unlock(&hub->status_mutex); return ret; } static int hub_set_port_link_state(struct usb_hub *hub, int port1, unsigned int link_status) { return set_port_feature(hub->hdev, port1 | (link_status << 3), USB_PORT_FEAT_LINK_STATE); } /* * Disable a port and mark a logical connect-change event, so that some * time later hub_wq will disconnect() any existing usb_device on the port * and will re-enumerate if there actually is a device attached. */ static void hub_port_logical_disconnect(struct usb_hub *hub, int port1) { dev_dbg(&hub->ports[port1 - 1]->dev, "logical disconnect\n"); hub_port_disable(hub, port1, 1); /* FIXME let caller ask to power down the port: * - some devices won't enumerate without a VBUS power cycle * - SRP saves power that way * - ... new call, TBD ... * That's easy if this hub can switch power per-port, and * hub_wq reactivates the port later (timer, SRP, etc). * Powerdown must be optional, because of reset/DFU. */ set_bit(port1, hub->change_bits); kick_hub_wq(hub); } /** * usb_remove_device - disable a device's port on its parent hub * @udev: device to be disabled and removed * Context: @udev locked, must be able to sleep. * * After @udev's port has been disabled, hub_wq is notified and it will * see that the device has been disconnected. When the device is * physically unplugged and something is plugged in, the events will * be received and processed normally. * * Return: 0 if successful. A negative error code otherwise. 
*/ int usb_remove_device(struct usb_device *udev) { struct usb_hub *hub; struct usb_interface *intf; int ret; if (!udev->parent) /* Can't remove a root hub */ return -EINVAL; hub = usb_hub_to_struct_hub(udev->parent); intf = to_usb_interface(hub->intfdev); ret = usb_autopm_get_interface(intf); if (ret < 0) return ret; set_bit(udev->portnum, hub->removed_bits); hub_port_logical_disconnect(hub, udev->portnum); usb_autopm_put_interface(intf); return 0; } enum hub_activation_type { HUB_INIT, HUB_INIT2, HUB_INIT3, /* INITs must come first */ HUB_POST_RESET, HUB_RESUME, HUB_RESET_RESUME, }; static void hub_init_func2(struct work_struct *ws); static void hub_init_func3(struct work_struct *ws); static void hub_activate(struct usb_hub *hub, enum hub_activation_type type) { struct usb_device *hdev = hub->hdev; struct usb_hcd *hcd; int ret; int port1; int status; bool need_debounce_delay = false; unsigned delay; /* Continue a partial initialization */ if (type == HUB_INIT2 || type == HUB_INIT3) { device_lock(&hdev->dev); /* Was the hub disconnected while we were waiting? */ if (hub->disconnected) goto disconnected; if (type == HUB_INIT2) goto init2; goto init3; } hub_get(hub); /* The superspeed hub except for root hub has to use Hub Depth * value as an offset into the route string to locate the bits * it uses to determine the downstream port number. So hub driver * should send a set hub depth request to superspeed hub after * the superspeed hub is set configuration in initialization or * reset procedure. * * After a resume, port power should still be on. * For any other type of activation, turn it on. */ if (type != HUB_RESUME) { if (hdev->parent && hub_is_superspeed(hdev)) { ret = usb_control_msg(hdev, usb_sndctrlpipe(hdev, 0), HUB_SET_DEPTH, USB_RT_HUB, hdev->level - 1, 0, NULL, 0, USB_CTRL_SET_TIMEOUT); if (ret < 0) dev_err(hub->intfdev, "set hub depth failed\n"); } /* Speed up system boot by using a delayed_work for the * hub's initial power-up delays. This is pretty awkward * and the implementation looks like a home-brewed sort of * setjmp/longjmp, but it saves at least 100 ms for each * root hub (assuming usbcore is compiled into the kernel * rather than as a module). It adds up. * * This can't be done for HUB_RESUME or HUB_RESET_RESUME * because for those activation types the ports have to be * operational when we return. In theory this could be done * for HUB_POST_RESET, but it's easier not to. */ if (type == HUB_INIT) { delay = hub_power_on_good_delay(hub); hub_power_on(hub, false); INIT_DELAYED_WORK(&hub->init_work, hub_init_func2); queue_delayed_work(system_power_efficient_wq, &hub->init_work, msecs_to_jiffies(delay)); /* Suppress autosuspend until init is done */ usb_autopm_get_interface_no_resume( to_usb_interface(hub->intfdev)); return; /* Continues at init2: below */ } else if (type == HUB_RESET_RESUME) { /* The internal host controller state for the hub device * may be gone after a host power loss on system resume. * Update the device's info so the HW knows it's a hub. 
*/ hcd = bus_to_hcd(hdev->bus); if (hcd->driver->update_hub_device) { ret = hcd->driver->update_hub_device(hcd, hdev, &hub->tt, GFP_NOIO); if (ret < 0) { dev_err(hub->intfdev, "Host not accepting hub info update\n"); dev_err(hub->intfdev, "LS/FS devices and hubs may not work under this hub\n"); } } hub_power_on(hub, true); } else { hub_power_on(hub, true); } /* Give some time on remote wakeup to let links to transit to U0 */ } else if (hub_is_superspeed(hub->hdev)) msleep(20); init2: /* * Check each port and set hub->change_bits to let hub_wq know * which ports need attention. */ for (port1 = 1; port1 <= hdev->maxchild; ++port1) { struct usb_port *port_dev = hub->ports[port1 - 1]; struct usb_device *udev = port_dev->child; u16 portstatus, portchange; portstatus = portchange = 0; status = usb_hub_port_status(hub, port1, &portstatus, &portchange); if (status) goto abort; if (udev || (portstatus & USB_PORT_STAT_CONNECTION)) dev_dbg(&port_dev->dev, "status %04x change %04x\n", portstatus, portchange); /* * After anything other than HUB_RESUME (i.e., initialization * or any sort of reset), every port should be disabled. * Unconnected ports should likewise be disabled (paranoia), * and so should ports for which we have no usb_device. */ if ((portstatus & USB_PORT_STAT_ENABLE) && ( type != HUB_RESUME || !(portstatus & USB_PORT_STAT_CONNECTION) || !udev || udev->state == USB_STATE_NOTATTACHED)) { /* * USB3 protocol ports will automatically transition * to Enabled state when detect an USB3.0 device attach. * Do not disable USB3 protocol ports, just pretend * power was lost */ portstatus &= ~USB_PORT_STAT_ENABLE; if (!hub_is_superspeed(hdev)) usb_clear_port_feature(hdev, port1, USB_PORT_FEAT_ENABLE); } /* Make sure a warm-reset request is handled by port_event */ if (type == HUB_RESUME && hub_port_warm_reset_required(hub, port1, portstatus)) set_bit(port1, hub->event_bits); /* * Add debounce if USB3 link is in polling/link training state. * Link will automatically transition to Enabled state after * link training completes. */ if (hub_is_superspeed(hdev) && ((portstatus & USB_PORT_STAT_LINK_STATE) == USB_SS_PORT_LS_POLLING)) need_debounce_delay = true; /* Clear status-change flags; we'll debounce later */ if (portchange & USB_PORT_STAT_C_CONNECTION) { need_debounce_delay = true; usb_clear_port_feature(hub->hdev, port1, USB_PORT_FEAT_C_CONNECTION); } if (portchange & USB_PORT_STAT_C_ENABLE) { need_debounce_delay = true; usb_clear_port_feature(hub->hdev, port1, USB_PORT_FEAT_C_ENABLE); } if (portchange & USB_PORT_STAT_C_RESET) { need_debounce_delay = true; usb_clear_port_feature(hub->hdev, port1, USB_PORT_FEAT_C_RESET); } if ((portchange & USB_PORT_STAT_C_BH_RESET) && hub_is_superspeed(hub->hdev)) { need_debounce_delay = true; usb_clear_port_feature(hub->hdev, port1, USB_PORT_FEAT_C_BH_PORT_RESET); } /* We can forget about a "removed" device when there's a * physical disconnect or the connect status changes. */ if (!(portstatus & USB_PORT_STAT_CONNECTION) || (portchange & USB_PORT_STAT_C_CONNECTION)) clear_bit(port1, hub->removed_bits); if (!udev || udev->state == USB_STATE_NOTATTACHED) { /* Tell hub_wq to disconnect the device or * check for a new connection or over current condition. * Based on USB2.0 Spec Section 11.12.5, * C_PORT_OVER_CURRENT could be set while * PORT_OVER_CURRENT is not. So check for any of them. 
*/ if (udev || (portstatus & USB_PORT_STAT_CONNECTION) || (portchange & USB_PORT_STAT_C_CONNECTION) || (portstatus & USB_PORT_STAT_OVERCURRENT) || (portchange & USB_PORT_STAT_C_OVERCURRENT)) set_bit(port1, hub->change_bits); } else if (portstatus & USB_PORT_STAT_ENABLE) { bool port_resumed = (portstatus & USB_PORT_STAT_LINK_STATE) == USB_SS_PORT_LS_U0; /* The power session apparently survived the resume. * If there was an overcurrent or suspend change * (i.e., remote wakeup request), have hub_wq * take care of it. Look at the port link state * for USB 3.0 hubs, since they don't have a suspend * change bit, and they don't set the port link change * bit on device-initiated resume. */ if (portchange || (hub_is_superspeed(hub->hdev) && port_resumed)) set_bit(port1, hub->event_bits); } else if (udev->persist_enabled) { #ifdef CONFIG_PM udev->reset_resume = 1; #endif /* Don't set the change_bits when the device * was powered off. */ if (test_bit(port1, hub->power_bits)) set_bit(port1, hub->change_bits); } else { /* The power session is gone; tell hub_wq */ usb_set_device_state(udev, USB_STATE_NOTATTACHED); set_bit(port1, hub->change_bits); } } /* If no port-status-change flags were set, we don't need any * debouncing. If flags were set we can try to debounce the * ports all at once right now, instead of letting hub_wq do them * one at a time later on. * * If any port-status changes do occur during this delay, hub_wq * will see them later and handle them normally. */ if (need_debounce_delay) { delay = HUB_DEBOUNCE_STABLE; /* Don't do a long sleep inside a workqueue routine */ if (type == HUB_INIT2) { INIT_DELAYED_WORK(&hub->init_work, hub_init_func3); queue_delayed_work(system_power_efficient_wq, &hub->init_work, msecs_to_jiffies(delay)); device_unlock(&hdev->dev); return; /* Continues at init3: below */ } else { msleep(delay); } } init3: hub->quiescing = 0; status = usb_submit_urb(hub->urb, GFP_NOIO); if (status < 0) dev_err(hub->intfdev, "activate --> %d\n", status); if (hub->has_indicators && blinkenlights) queue_delayed_work(system_power_efficient_wq, &hub->leds, LED_CYCLE_PERIOD); /* Scan all ports that need attention */ kick_hub_wq(hub); abort: if (type == HUB_INIT2 || type == HUB_INIT3) { /* Allow autosuspend if it was suppressed */ disconnected: usb_autopm_put_interface_async(to_usb_interface(hub->intfdev)); device_unlock(&hdev->dev); } if (type == HUB_RESUME && hub_is_superspeed(hub->hdev)) { /* give usb3 downstream links training time after hub resume */ usb_autopm_get_interface_no_resume( to_usb_interface(hub->intfdev)); queue_delayed_work(system_power_efficient_wq, &hub->post_resume_work, msecs_to_jiffies(USB_SS_PORT_U0_WAKE_TIME)); return; } hub_put(hub); } /* Implement the continuations for the delays above */ static void hub_init_func2(struct work_struct *ws) { struct usb_hub *hub = container_of(ws, struct usb_hub, init_work.work); hub_activate(hub, HUB_INIT2); } static void hub_init_func3(struct work_struct *ws) { struct usb_hub *hub = container_of(ws, struct usb_hub, init_work.work); hub_activate(hub, HUB_INIT3); } static void hub_post_resume(struct work_struct *ws) { struct usb_hub *hub = container_of(ws, struct usb_hub, post_resume_work.work); usb_autopm_put_interface_async(to_usb_interface(hub->intfdev)); hub_put(hub); } enum hub_quiescing_type { HUB_DISCONNECT, HUB_PRE_RESET, HUB_SUSPEND }; static void hub_quiesce(struct usb_hub *hub, enum hub_quiescing_type type) { struct usb_device *hdev = hub->hdev; unsigned long flags; int i; /* hub_wq and related activity won't 
re-trigger */ spin_lock_irqsave(&hub->irq_urb_lock, flags); hub->quiescing = 1; spin_unlock_irqrestore(&hub->irq_urb_lock, flags); if (type != HUB_SUSPEND) { /* Disconnect all the children */ for (i = 0; i < hdev->maxchild; ++i) { if (hub->ports[i]->child) usb_disconnect(&hub->ports[i]->child); } } /* Stop hub_wq and related activity */ timer_delete_sync(&hub->irq_urb_retry); flush_delayed_work(&hub->post_resume_work); usb_kill_urb(hub->urb); if (hub->has_indicators) cancel_delayed_work_sync(&hub->leds); if (hub->tt.hub) flush_work(&hub->tt.clear_work); } static void hub_pm_barrier_for_all_ports(struct usb_hub *hub) { int i; for (i = 0; i < hub->hdev->maxchild; ++i) pm_runtime_barrier(&hub->ports[i]->dev); } /* caller has locked the hub device */ static int hub_pre_reset(struct usb_interface *intf) { struct usb_hub *hub = usb_get_intfdata(intf); hub_quiesce(hub, HUB_PRE_RESET); hub->in_reset = 1; hub_pm_barrier_for_all_ports(hub); return 0; } /* caller has locked the hub device */ static int hub_post_reset(struct usb_interface *intf) { struct usb_hub *hub = usb_get_intfdata(intf); hub->in_reset = 0; hub_pm_barrier_for_all_ports(hub); hub_activate(hub, HUB_POST_RESET); return 0; } static int hub_configure(struct usb_hub *hub, struct usb_endpoint_descriptor *endpoint) { struct usb_hcd *hcd; struct usb_device *hdev = hub->hdev; struct device *hub_dev = hub->intfdev; u16 hubstatus, hubchange; u16 wHubCharacteristics; unsigned int pipe; int maxp, ret, i; char *message = "out of memory"; unsigned unit_load; unsigned full_load; unsigned maxchild; hub->buffer = kmalloc(sizeof(*hub->buffer), GFP_KERNEL); if (!hub->buffer) { ret = -ENOMEM; goto fail; } hub->status = kmalloc(sizeof(*hub->status), GFP_KERNEL); if (!hub->status) { ret = -ENOMEM; goto fail; } mutex_init(&hub->status_mutex); hub->descriptor = kzalloc(sizeof(*hub->descriptor), GFP_KERNEL); if (!hub->descriptor) { ret = -ENOMEM; goto fail; } /* Request the entire hub descriptor. * hub->descriptor can handle USB_MAXCHILDREN ports, * but a (non-SS) hub can/will return fewer bytes here. */ ret = get_hub_descriptor(hdev, hub->descriptor); if (ret < 0) { message = "can't read hub descriptor"; goto fail; } maxchild = USB_MAXCHILDREN; if (hub_is_superspeed(hdev)) maxchild = min_t(unsigned, maxchild, USB_SS_MAXPORTS); if (hub->descriptor->bNbrPorts > maxchild) { message = "hub has too many ports!"; ret = -ENODEV; goto fail; } else if (hub->descriptor->bNbrPorts == 0) { message = "hub doesn't have any ports!"; ret = -ENODEV; goto fail; } /* * Accumulate wHubDelay + 40ns for every hub in the tree of devices. * The resulting value will be used for SetIsochDelay() request. 
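 * For example, a hub reporting wHubDelay = 300 ns whose parent has already
 * accumulated hub_delay = 400 ns stores 300 + 400 + 40 = 740 ns here,
 * capped at USB_TP_TRANSMISSION_DELAY_MAX.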
*/ if (hub_is_superspeed(hdev) || hub_is_superspeedplus(hdev)) { u32 delay = __le16_to_cpu(hub->descriptor->u.ss.wHubDelay); if (hdev->parent) delay += hdev->parent->hub_delay; delay += USB_TP_TRANSMISSION_DELAY; hdev->hub_delay = min_t(u32, delay, USB_TP_TRANSMISSION_DELAY_MAX); } maxchild = hub->descriptor->bNbrPorts; dev_info(hub_dev, "%d port%s detected\n", maxchild, str_plural(maxchild)); hub->ports = kcalloc(maxchild, sizeof(struct usb_port *), GFP_KERNEL); if (!hub->ports) { ret = -ENOMEM; goto fail; } wHubCharacteristics = le16_to_cpu(hub->descriptor->wHubCharacteristics); if (hub_is_superspeed(hdev)) { unit_load = 150; full_load = 900; } else { unit_load = 100; full_load = 500; } /* FIXME for USB 3.0, skip for now */ if ((wHubCharacteristics & HUB_CHAR_COMPOUND) && !(hub_is_superspeed(hdev))) { char portstr[USB_MAXCHILDREN + 1]; for (i = 0; i < maxchild; i++) portstr[i] = hub->descriptor->u.hs.DeviceRemovable [((i + 1) / 8)] & (1 << ((i + 1) % 8)) ? 'F' : 'R'; portstr[maxchild] = 0; dev_dbg(hub_dev, "compound device; port removable status: %s\n", portstr); } else dev_dbg(hub_dev, "standalone hub\n"); switch (wHubCharacteristics & HUB_CHAR_LPSM) { case HUB_CHAR_COMMON_LPSM: dev_dbg(hub_dev, "ganged power switching\n"); break; case HUB_CHAR_INDV_PORT_LPSM: dev_dbg(hub_dev, "individual port power switching\n"); break; case HUB_CHAR_NO_LPSM: case HUB_CHAR_LPSM: dev_dbg(hub_dev, "no power switching (usb 1.0)\n"); break; } switch (wHubCharacteristics & HUB_CHAR_OCPM) { case HUB_CHAR_COMMON_OCPM: dev_dbg(hub_dev, "global over-current protection\n"); break; case HUB_CHAR_INDV_PORT_OCPM: dev_dbg(hub_dev, "individual port over-current protection\n"); break; case HUB_CHAR_NO_OCPM: case HUB_CHAR_OCPM: dev_dbg(hub_dev, "no over-current protection\n"); break; } spin_lock_init(&hub->tt.lock); INIT_LIST_HEAD(&hub->tt.clear_list); INIT_WORK(&hub->tt.clear_work, hub_tt_work); switch (hdev->descriptor.bDeviceProtocol) { case USB_HUB_PR_FS: break; case USB_HUB_PR_HS_SINGLE_TT: dev_dbg(hub_dev, "Single TT\n"); hub->tt.hub = hdev; break; case USB_HUB_PR_HS_MULTI_TT: ret = usb_set_interface(hdev, 0, 1); if (ret == 0) { dev_dbg(hub_dev, "TT per port\n"); hub->tt.multi = 1; } else dev_err(hub_dev, "Using single TT (err %d)\n", ret); hub->tt.hub = hdev; break; case USB_HUB_PR_SS: /* USB 3.0 hubs don't have a TT */ break; default: dev_dbg(hub_dev, "Unrecognized hub protocol %d\n", hdev->descriptor.bDeviceProtocol); break; } /* Note 8 FS bit times == (8 bits / 12000000 bps) ~= 666ns */ switch (wHubCharacteristics & HUB_CHAR_TTTT) { case HUB_TTTT_8_BITS: if (hdev->descriptor.bDeviceProtocol != 0) { hub->tt.think_time = 666; dev_dbg(hub_dev, "TT requires at most %d " "FS bit times (%d ns)\n", 8, hub->tt.think_time); } break; case HUB_TTTT_16_BITS: hub->tt.think_time = 666 * 2; dev_dbg(hub_dev, "TT requires at most %d " "FS bit times (%d ns)\n", 16, hub->tt.think_time); break; case HUB_TTTT_24_BITS: hub->tt.think_time = 666 * 3; dev_dbg(hub_dev, "TT requires at most %d " "FS bit times (%d ns)\n", 24, hub->tt.think_time); break; case HUB_TTTT_32_BITS: hub->tt.think_time = 666 * 4; dev_dbg(hub_dev, "TT requires at most %d " "FS bit times (%d ns)\n", 32, hub->tt.think_time); break; } /* probe() zeroes hub->indicator[] */ if (wHubCharacteristics & HUB_CHAR_PORTIND) { hub->has_indicators = 1; dev_dbg(hub_dev, "Port indicators are supported\n"); } dev_dbg(hub_dev, "power on to power good time: %dms\n", hub->descriptor->bPwrOn2PwrGood * 2); /* power budgeting mostly matters with bus-powered hubs, * and battery-powered 
root hubs (may provide just 8 mA). */ ret = usb_get_std_status(hdev, USB_RECIP_DEVICE, 0, &hubstatus); if (ret) { message = "can't get hub status"; goto fail; } hcd = bus_to_hcd(hdev->bus); if (hdev == hdev->bus->root_hub) { if (hcd->power_budget > 0) hdev->bus_mA = hcd->power_budget; else hdev->bus_mA = full_load * maxchild; if (hdev->bus_mA >= full_load) hub->mA_per_port = full_load; else { hub->mA_per_port = hdev->bus_mA; hub->limited_power = 1; } } else if ((hubstatus & (1 << USB_DEVICE_SELF_POWERED)) == 0) { int remaining = hdev->bus_mA - hub->descriptor->bHubContrCurrent; dev_dbg(hub_dev, "hub controller current requirement: %dmA\n", hub->descriptor->bHubContrCurrent); hub->limited_power = 1; if (remaining < maxchild * unit_load) dev_warn(hub_dev, "insufficient power available " "to use all downstream ports\n"); hub->mA_per_port = unit_load; /* 7.2.1 */ } else { /* Self-powered external hub */ /* FIXME: What about battery-powered external hubs that * provide less current per port? */ hub->mA_per_port = full_load; } if (hub->mA_per_port < full_load) dev_dbg(hub_dev, "%umA bus power budget for each child\n", hub->mA_per_port); ret = hub_hub_status(hub, &hubstatus, &hubchange); if (ret < 0) { message = "can't get hub status"; goto fail; } /* local power status reports aren't always correct */ if (hdev->actconfig->desc.bmAttributes & USB_CONFIG_ATT_SELFPOWER) dev_dbg(hub_dev, "local power source is %s\n", (hubstatus & HUB_STATUS_LOCAL_POWER) ? "lost (inactive)" : "good"); if ((wHubCharacteristics & HUB_CHAR_OCPM) == 0) dev_dbg(hub_dev, "%sover-current condition exists\n", (hubstatus & HUB_STATUS_OVERCURRENT) ? "" : "no "); /* set up the interrupt endpoint * We use the EP's maxpacket size instead of (PORTS+1+7)/8 * bytes as USB2.0[11.12.3] says because some hubs are known * to send more data (and thus cause overflow). For root hubs, * maxpktsize is defined in hcd.c's fake endpoint descriptors * to be big enough for at least USB_MAXCHILDREN ports. */ pipe = usb_rcvintpipe(hdev, endpoint->bEndpointAddress); maxp = usb_maxpacket(hdev, pipe); if (maxp > sizeof(*hub->buffer)) maxp = sizeof(*hub->buffer); hub->urb = usb_alloc_urb(0, GFP_KERNEL); if (!hub->urb) { ret = -ENOMEM; goto fail; } usb_fill_int_urb(hub->urb, hdev, pipe, *hub->buffer, maxp, hub_irq, hub, endpoint->bInterval); /* maybe cycle the hub leds */ if (hub->has_indicators && blinkenlights) hub->indicator[0] = INDICATOR_CYCLE; mutex_lock(&usb_port_peer_mutex); for (i = 0; i < maxchild; i++) { ret = usb_hub_create_port_device(hub, i + 1); if (ret < 0) { dev_err(hub->intfdev, "couldn't create port%d device.\n", i + 1); break; } } hdev->maxchild = i; for (i = 0; i < hdev->maxchild; i++) { struct usb_port *port_dev = hub->ports[i]; pm_runtime_put(&port_dev->dev); } mutex_unlock(&usb_port_peer_mutex); if (ret < 0) goto fail; /* Update the HCD's internal representation of this hub before hub_wq * starts getting port status changes for devices under the hub. 
*/ if (hcd->driver->update_hub_device) { ret = hcd->driver->update_hub_device(hcd, hdev, &hub->tt, GFP_KERNEL); if (ret < 0) { message = "can't update HCD hub info"; goto fail; } } usb_hub_adjust_deviceremovable(hdev, hub->descriptor); hub_activate(hub, HUB_INIT); return 0; fail: dev_err(hub_dev, "config failed, %s (err %d)\n", message, ret); /* hub_disconnect() frees urb and descriptor */ return ret; } static void hub_release(struct kref *kref) { struct usb_hub *hub = container_of(kref, struct usb_hub, kref); usb_put_dev(hub->hdev); usb_put_intf(to_usb_interface(hub->intfdev)); kfree(hub); } void hub_get(struct usb_hub *hub) { kref_get(&hub->kref); } void hub_put(struct usb_hub *hub) { kref_put(&hub->kref, hub_release); } static unsigned highspeed_hubs; static void hub_disconnect(struct usb_interface *intf) { struct usb_hub *hub = usb_get_intfdata(intf); struct usb_device *hdev = interface_to_usbdev(intf); int port1; /* * Stop adding new hub events. We do not want to block here and thus * will not try to remove any pending work item. */ hub->disconnected = 1; /* Disconnect all children and quiesce the hub */ hub->error = 0; hub_quiesce(hub, HUB_DISCONNECT); mutex_lock(&usb_port_peer_mutex); /* Avoid races with recursively_mark_NOTATTACHED() */ spin_lock_irq(&device_state_lock); port1 = hdev->maxchild; hdev->maxchild = 0; usb_set_intfdata(intf, NULL); spin_unlock_irq(&device_state_lock); for (; port1 > 0; --port1) usb_hub_remove_port_device(hub, port1); mutex_unlock(&usb_port_peer_mutex); if (hub->hdev->speed == USB_SPEED_HIGH) highspeed_hubs--; usb_free_urb(hub->urb); kfree(hub->ports); kfree(hub->descriptor); kfree(hub->status); kfree(hub->buffer); pm_suspend_ignore_children(&intf->dev, false); if (hub->quirk_disable_autosuspend) usb_autopm_put_interface(intf); onboard_dev_destroy_pdevs(&hub->onboard_devs); hub_put(hub); } static bool hub_descriptor_is_sane(struct usb_host_interface *desc) { /* Some hubs have a subclass of 1, which AFAICT according to the */ /* specs is not defined, but it works */ if (desc->desc.bInterfaceSubClass != 0 && desc->desc.bInterfaceSubClass != 1) return false; /* Multiple endpoints? What kind of mutant ninja-hub is this? */ if (desc->desc.bNumEndpoints != 1) return false; /* If the first endpoint is not interrupt IN, we'd better punt! */ if (!usb_endpoint_is_int_in(&desc->endpoint[0].desc)) return false; return true; } static int hub_probe(struct usb_interface *intf, const struct usb_device_id *id) { struct usb_host_interface *desc; struct usb_device *hdev; struct usb_hub *hub; desc = intf->cur_altsetting; hdev = interface_to_usbdev(intf); /* * The USB 2.0 spec prohibits hubs from having more than one * configuration or interface, and we rely on this prohibition. * Refuse to accept a device that violates it. */ if (hdev->descriptor.bNumConfigurations > 1 || hdev->actconfig->desc.bNumInterfaces > 1) { dev_err(&intf->dev, "Invalid hub with more than one config or interface\n"); return -EINVAL; } /* * Set default autosuspend delay as 0 to speedup bus suspend, * based on the below considerations: * * - Unlike other drivers, the hub driver does not rely on the * autosuspend delay to provide enough time to handle a wakeup * event, and the submitted status URB is just to check future * change on hub downstream ports, so it is safe to do it. 
	 *
	 * - The patch might cause one or more auto suspend/resume cycles
	 *   for the very rare devices below when they are plugged into a
	 *   hub for the first time:
	 *
	 *	devices having trouble initializing, which disconnect
	 *	themselves from the bus and then reconnect a second
	 *	or so later
	 *
	 *	devices just for downloading firmware, which disconnect
	 *	themselves after completing it
	 *
	 *   For these quite rare devices, their drivers may change the
	 *   autosuspend delay of their parent hub in probe() to an
	 *   appropriate value to avoid the subtle problem if someone
	 *   does care about it.
	 *
	 * - The patch may cause one or more auto suspend/resume cycles on
	 *   the hub while 'lsusb' is running, but that is probably too
	 *   infrequent to worry about.
	 *
	 * - Changing the autosuspend delay of the hub avoids an unnecessary
	 *   autosuspend timer for the hub and may also decrease the power
	 *   consumption of the USB bus.
	 *
	 * - If the user has indicated that autosuspend should be prevented
	 *   by passing usbcore.autosuspend = -1, then keep autosuspend
	 *   disabled.
	 */
#ifdef CONFIG_PM
	if (hdev->dev.power.autosuspend_delay >= 0)
		pm_runtime_set_autosuspend_delay(&hdev->dev, 0);
#endif

	/*
	 * Hubs have proper suspend/resume support, except for root hubs
	 * where the controller driver doesn't have bus_suspend and
	 * bus_resume methods.
	 */
	if (hdev->parent) {		/* normal device */
		usb_enable_autosuspend(hdev);
	} else {			/* root hub */
		const struct hc_driver *drv = bus_to_hcd(hdev->bus)->driver;

		if (drv->bus_suspend && drv->bus_resume)
			usb_enable_autosuspend(hdev);
	}

	if (hdev->level == MAX_TOPO_LEVEL) {
		dev_err(&intf->dev,
			"Unsupported bus topology: hub nested too deep\n");
		return -E2BIG;
	}

#ifdef CONFIG_USB_OTG_DISABLE_EXTERNAL_HUB
	if (hdev->parent) {
		dev_warn(&intf->dev, "ignoring external hub\n");
		return -ENODEV;
	}
#endif

	if (!hub_descriptor_is_sane(desc)) {
		dev_err(&intf->dev, "bad descriptor, ignoring hub\n");
		return -EIO;
	}

	/* We found a hub */
	dev_info(&intf->dev, "USB hub found\n");

	hub = kzalloc(sizeof(*hub), GFP_KERNEL);
	if (!hub)
		return -ENOMEM;

	kref_init(&hub->kref);
	hub->intfdev = &intf->dev;
	hub->hdev = hdev;
	INIT_DELAYED_WORK(&hub->leds, led_work);
	INIT_DELAYED_WORK(&hub->init_work, NULL);
	INIT_DELAYED_WORK(&hub->post_resume_work, hub_post_resume);
	INIT_WORK(&hub->events, hub_event);
	INIT_LIST_HEAD(&hub->onboard_devs);
	spin_lock_init(&hub->irq_urb_lock);
	timer_setup(&hub->irq_urb_retry, hub_retry_irq_urb, 0);
	usb_get_intf(intf);
	usb_get_dev(hdev);

	usb_set_intfdata(intf, hub);
	intf->needs_remote_wakeup = 1;
	pm_suspend_ignore_children(&intf->dev, true);

	if (hdev->speed == USB_SPEED_HIGH)
		highspeed_hubs++;

	if (id->driver_info & HUB_QUIRK_CHECK_PORT_AUTOSUSPEND)
		hub->quirk_check_port_auto_suspend = 1;

	if (id->driver_info & HUB_QUIRK_DISABLE_AUTOSUSPEND) {
		hub->quirk_disable_autosuspend = 1;
		usb_autopm_get_interface_no_resume(intf);
	}

	if ((id->driver_info & HUB_QUIRK_REDUCE_FRAME_INTR_BINTERVAL) &&
	    desc->endpoint[0].desc.bInterval > USB_REDUCE_FRAME_INTR_BINTERVAL) {
		desc->endpoint[0].desc.bInterval =
			USB_REDUCE_FRAME_INTR_BINTERVAL;
		/* Tell the HCD about the interrupt ep's new bInterval */
		usb_set_interface(hdev, 0, 0);
	}

	if (hub_configure(hub, &desc->endpoint[0].desc) >= 0) {
		onboard_dev_create_pdevs(hdev, &hub->onboard_devs);

		return 0;
	}

	hub_disconnect(intf);
	return -ENODEV;
}

static int hub_ioctl(struct usb_interface *intf, unsigned int code,
		void *user_data)
{
	struct usb_device *hdev = interface_to_usbdev(intf);
	struct usb_hub *hub = usb_hub_to_struct_hub(hdev);

	/* assert ifno == 0 (part of hub spec) */
	switch (code) {
	case USBDEVFS_HUB_PORTINFO: {
		struct usbdevfs_hub_portinfo *info = user_data;
		int i;
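		/*
		 * Hedged illustration (not part of the driver): user space
		 * normally reaches this case through usbfs by wrapping
		 * USBDEVFS_HUB_PORTINFO in a USBDEVFS_IOCTL request against
		 * the hub's device node, roughly like:
		 *
		 *	struct usbdevfs_hub_portinfo pinfo;
		 *	struct usbdevfs_ioctl wrapper = {
		 *		.ifno = 0,
		 *		.ioctl_code = USBDEVFS_HUB_PORTINFO,
		 *		.data = &pinfo,
		 *	};
		 *	int fd = open("/dev/bus/usb/001/002", O_RDWR);
		 *
		 *	if (fd >= 0 && ioctl(fd, USBDEVFS_IOCTL, &wrapper) > 0)
		 *		printf("%d ports\n", pinfo.nports);
		 *
		 * The device path is only an example; on success the return
		 * value is nports + 1, and port[i] holds the devnum of the
		 * child on port i + 1 (0 if the port is empty).
		 */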
spin_lock_irq(&device_state_lock); if (hdev->devnum <= 0) info->nports = 0; else { info->nports = hdev->maxchild; for (i = 0; i < info->nports; i++) { if (hub->ports[i]->child == NULL) info->port[i] = 0; else info->port[i] = hub->ports[i]->child->devnum; } } spin_unlock_irq(&device_state_lock); return info->nports + 1; } default: return -ENOSYS; } } /* * Allow user programs to claim ports on a hub. When a device is attached * to one of these "claimed" ports, the program will "own" the device. */ static int find_port_owner(struct usb_device *hdev, unsigned port1, struct usb_dev_state ***ppowner) { struct usb_hub *hub = usb_hub_to_struct_hub(hdev); if (hdev->state == USB_STATE_NOTATTACHED) return -ENODEV; if (port1 == 0 || port1 > hdev->maxchild) return -EINVAL; /* Devices not managed by the hub driver * will always have maxchild equal to 0. */ *ppowner = &(hub->ports[port1 - 1]->port_owner); return 0; } /* In the following three functions, the caller must hold hdev's lock */ int usb_hub_claim_port(struct usb_device *hdev, unsigned port1, struct usb_dev_state *owner) { int rc; struct usb_dev_state **powner; rc = find_port_owner(hdev, port1, &powner); if (rc) return rc; if (*powner) return -EBUSY; *powner = owner; return rc; } EXPORT_SYMBOL_GPL(usb_hub_claim_port); int usb_hub_release_port(struct usb_device *hdev, unsigned port1, struct usb_dev_state *owner) { int rc; struct usb_dev_state **powner; rc = find_port_owner(hdev, port1, &powner); if (rc) return rc; if (*powner != owner) return -ENOENT; *powner = NULL; return rc; } EXPORT_SYMBOL_GPL(usb_hub_release_port); void usb_hub_release_all_ports(struct usb_device *hdev, struct usb_dev_state *owner) { struct usb_hub *hub = usb_hub_to_struct_hub(hdev); int n; for (n = 0; n < hdev->maxchild; n++) { if (hub->ports[n]->port_owner == owner) hub->ports[n]->port_owner = NULL; } } /* The caller must hold udev's lock */ bool usb_device_is_owned(struct usb_device *udev) { struct usb_hub *hub; if (udev->state == USB_STATE_NOTATTACHED || !udev->parent) return false; hub = usb_hub_to_struct_hub(udev->parent); return !!hub->ports[udev->portnum - 1]->port_owner; } static void update_port_device_state(struct usb_device *udev) { struct usb_hub *hub; struct usb_port *port_dev; if (udev->parent) { hub = usb_hub_to_struct_hub(udev->parent); /* * The Link Layer Validation System Driver (lvstest) * has a test step to unbind the hub before running the * rest of the procedure. This triggers hub_disconnect * which will set the hub's maxchild to 0, further * resulting in usb_hub_to_struct_hub returning NULL. */ if (hub) { port_dev = hub->ports[udev->portnum - 1]; WRITE_ONCE(port_dev->state, udev->state); sysfs_notify_dirent(port_dev->state_kn); } } } static void recursively_mark_NOTATTACHED(struct usb_device *udev) { struct usb_hub *hub = usb_hub_to_struct_hub(udev); int i; for (i = 0; i < udev->maxchild; ++i) { if (hub->ports[i]->child) recursively_mark_NOTATTACHED(hub->ports[i]->child); } if (udev->state == USB_STATE_SUSPENDED) udev->active_duration -= jiffies; udev->state = USB_STATE_NOTATTACHED; update_port_device_state(udev); } /** * usb_set_device_state - change a device's current state (usbcore, hcds) * @udev: pointer to device whose state should be changed * @new_state: new state value to be stored * * udev->state is _not_ fully protected by the device lock. Although * most transitions are made only while holding the lock, the state can * can change to USB_STATE_NOTATTACHED at almost any time. 
This * is so that devices can be marked as disconnected as soon as possible, * without having to wait for any semaphores to be released. As a result, * all changes to any device's state must be protected by the * device_state_lock spinlock. * * Once a device has been added to the device tree, all changes to its state * should be made using this routine. The state should _not_ be set directly. * * If udev->state is already USB_STATE_NOTATTACHED then no change is made. * Otherwise udev->state is set to new_state, and if new_state is * USB_STATE_NOTATTACHED then all of udev's descendants' states are also set * to USB_STATE_NOTATTACHED. */ void usb_set_device_state(struct usb_device *udev, enum usb_device_state new_state) { unsigned long flags; int wakeup = -1; spin_lock_irqsave(&device_state_lock, flags); if (udev->state == USB_STATE_NOTATTACHED) ; /* do nothing */ else if (new_state != USB_STATE_NOTATTACHED) { /* root hub wakeup capabilities are managed out-of-band * and may involve silicon errata ... ignore them here. */ if (udev->parent) { if (udev->state == USB_STATE_SUSPENDED || new_state == USB_STATE_SUSPENDED) ; /* No change to wakeup settings */ else if (new_state == USB_STATE_CONFIGURED) wakeup = (udev->quirks & USB_QUIRK_IGNORE_REMOTE_WAKEUP) ? 0 : udev->actconfig->desc.bmAttributes & USB_CONFIG_ATT_WAKEUP; else wakeup = 0; } if (udev->state == USB_STATE_SUSPENDED && new_state != USB_STATE_SUSPENDED) udev->active_duration -= jiffies; else if (new_state == USB_STATE_SUSPENDED && udev->state != USB_STATE_SUSPENDED) udev->active_duration += jiffies; udev->state = new_state; update_port_device_state(udev); } else recursively_mark_NOTATTACHED(udev); spin_unlock_irqrestore(&device_state_lock, flags); if (wakeup >= 0) device_set_wakeup_capable(&udev->dev, wakeup); } EXPORT_SYMBOL_GPL(usb_set_device_state); /* * Choose a device number. * * Device numbers are used as filenames in usbfs. On USB-1.1 and * USB-2.0 buses they are also used as device addresses, however on * USB-3.0 buses the address is assigned by the controller hardware * and it usually is not the same as the device number. * * Devices connected under xHCI are not as simple. The host controller * supports virtualization, so the hardware assigns device addresses and * the HCD must setup data structures before issuing a set address * command to the hardware. */ static void choose_devnum(struct usb_device *udev) { int devnum; struct usb_bus *bus = udev->bus; /* be safe when more hub events are proceed in parallel */ mutex_lock(&bus->devnum_next_mutex); /* Try to allocate the next devnum beginning at bus->devnum_next. */ devnum = find_next_zero_bit(bus->devmap, 128, bus->devnum_next); if (devnum >= 128) devnum = find_next_zero_bit(bus->devmap, 128, 1); bus->devnum_next = (devnum >= 127 ? 
1 : devnum + 1); if (devnum < 128) { set_bit(devnum, bus->devmap); udev->devnum = devnum; } mutex_unlock(&bus->devnum_next_mutex); } static void release_devnum(struct usb_device *udev) { if (udev->devnum > 0) { clear_bit(udev->devnum, udev->bus->devmap); udev->devnum = -1; } } static void update_devnum(struct usb_device *udev, int devnum) { udev->devnum = devnum; if (!udev->devaddr) udev->devaddr = (u8)devnum; } static void hub_free_dev(struct usb_device *udev) { struct usb_hcd *hcd = bus_to_hcd(udev->bus); /* Root hubs aren't real devices, so don't free HCD resources */ if (hcd->driver->free_dev && udev->parent) hcd->driver->free_dev(hcd, udev); } static void hub_disconnect_children(struct usb_device *udev) { struct usb_hub *hub = usb_hub_to_struct_hub(udev); int i; /* Free up all the children before we remove this device */ for (i = 0; i < udev->maxchild; i++) { if (hub->ports[i]->child) usb_disconnect(&hub->ports[i]->child); } } /** * usb_disconnect - disconnect a device (usbcore-internal) * @pdev: pointer to device being disconnected * * Context: task context, might sleep * * Something got disconnected. Get rid of it and all of its children. * * If *pdev is a normal device then the parent hub must already be locked. * If *pdev is a root hub then the caller must hold the usb_bus_idr_lock, * which protects the set of root hubs as well as the list of buses. * * Only hub drivers (including virtual root hub drivers for host * controllers) should ever call this. * * This call is synchronous, and may not be used in an interrupt context. */ void usb_disconnect(struct usb_device **pdev) { struct usb_port *port_dev = NULL; struct usb_device *udev = *pdev; struct usb_hub *hub = NULL; int port1 = 1; /* mark the device as inactive, so any further urb submissions for * this device (and any of its children) will fail immediately. * this quiesces everything except pending urbs. */ usb_set_device_state(udev, USB_STATE_NOTATTACHED); dev_info(&udev->dev, "USB disconnect, device number %d\n", udev->devnum); /* * Ensure that the pm runtime code knows that the USB device * is in the process of being disconnected. */ pm_runtime_barrier(&udev->dev); usb_lock_device(udev); hub_disconnect_children(udev); /* deallocate hcd/hardware state ... nuking all pending urbs and * cleaning up all state associated with the current configuration * so that the hardware is now fully quiesced. */ dev_dbg(&udev->dev, "unregistering device\n"); usb_disable_device(udev, 0); usb_hcd_synchronize_unlinks(udev); if (udev->parent) { port1 = udev->portnum; hub = usb_hub_to_struct_hub(udev->parent); port_dev = hub->ports[port1 - 1]; sysfs_remove_link(&udev->dev.kobj, "port"); sysfs_remove_link(&port_dev->dev.kobj, "device"); /* * As usb_port_runtime_resume() de-references udev, make * sure no resumes occur during removal */ if (!test_and_set_bit(port1, hub->child_usage_bits)) pm_runtime_get_sync(&port_dev->dev); typec_deattach(port_dev->connector, &udev->dev); } usb_remove_ep_devs(&udev->ep0); usb_unlock_device(udev); if (udev->usb4_link) device_link_del(udev->usb4_link); /* Unregister the device. The device driver is responsible * for de-configuring the device and invoking the remove-device * notifier chain (used by usbfs and possibly others). */ device_del(&udev->dev); /* Free the device number and delete the parent's children[] * (or root_hub) pointer. 
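	 *
	 * (Note: device numbers come from the per-bus devmap bitmap managed
	 * by choose_devnum()/release_devnum() above; the search starts at
	 * bit 1 and devnum_next wraps from 127 back to 1, so valid devnums
	 * are always in the range 1..127.)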
*/ release_devnum(udev); /* Avoid races with recursively_mark_NOTATTACHED() */ spin_lock_irq(&device_state_lock); *pdev = NULL; spin_unlock_irq(&device_state_lock); if (port_dev && test_and_clear_bit(port1, hub->child_usage_bits)) pm_runtime_put(&port_dev->dev); hub_free_dev(udev); put_device(&udev->dev); } #ifdef CONFIG_USB_ANNOUNCE_NEW_DEVICES static void show_string(struct usb_device *udev, char *id, char *string) { if (!string) return; dev_info(&udev->dev, "%s: %s\n", id, string); } static void announce_device(struct usb_device *udev) { u16 bcdDevice = le16_to_cpu(udev->descriptor.bcdDevice); dev_info(&udev->dev, "New USB device found, idVendor=%04x, idProduct=%04x, bcdDevice=%2x.%02x\n", le16_to_cpu(udev->descriptor.idVendor), le16_to_cpu(udev->descriptor.idProduct), bcdDevice >> 8, bcdDevice & 0xff); dev_info(&udev->dev, "New USB device strings: Mfr=%d, Product=%d, SerialNumber=%d\n", udev->descriptor.iManufacturer, udev->descriptor.iProduct, udev->descriptor.iSerialNumber); show_string(udev, "Product", udev->product); show_string(udev, "Manufacturer", udev->manufacturer); show_string(udev, "SerialNumber", udev->serial); } #else static inline void announce_device(struct usb_device *udev) { } #endif /** * usb_enumerate_device_otg - FIXME (usbcore-internal) * @udev: newly addressed device (in ADDRESS state) * * Finish enumeration for On-The-Go devices * * Return: 0 if successful. A negative error code otherwise. */ static int usb_enumerate_device_otg(struct usb_device *udev) { int err = 0; #ifdef CONFIG_USB_OTG /* * OTG-aware devices on OTG-capable root hubs may be able to use SRP, * to wake us after we've powered off VBUS; and HNP, switching roles * "host" to "peripheral". The OTG descriptor helps figure this out. */ if (!udev->bus->is_b_host && udev->config && udev->parent == udev->bus->root_hub) { struct usb_otg_descriptor *desc = NULL; struct usb_bus *bus = udev->bus; unsigned port1 = udev->portnum; /* descriptor may appear anywhere in config */ err = __usb_get_extra_descriptor(udev->rawdescriptors[0], le16_to_cpu(udev->config[0].desc.wTotalLength), USB_DT_OTG, (void **) &desc, sizeof(*desc)); if (err || !(desc->bmAttributes & USB_OTG_HNP)) return 0; dev_info(&udev->dev, "Dual-Role OTG device on %sHNP port\n", (port1 == bus->otg_port) ? "" : "non-"); /* enable HNP before suspend, it's simpler */ if (port1 == bus->otg_port) { bus->b_hnp_enable = 1; err = usb_control_msg(udev, usb_sndctrlpipe(udev, 0), USB_REQ_SET_FEATURE, 0, USB_DEVICE_B_HNP_ENABLE, 0, NULL, 0, USB_CTRL_SET_TIMEOUT); if (err < 0) { /* * OTG MESSAGE: report errors here, * customize to match your product. */ dev_err(&udev->dev, "can't set HNP mode: %d\n", err); bus->b_hnp_enable = 0; } } else if (desc->bLength == sizeof (struct usb_otg_descriptor)) { /* * We are operating on a legacy OTP device * These should be told that they are operating * on the wrong port if we have another port that does * support HNP */ if (bus->otg_port != 0) { /* Set a_alt_hnp_support for legacy otg device */ err = usb_control_msg(udev, usb_sndctrlpipe(udev, 0), USB_REQ_SET_FEATURE, 0, USB_DEVICE_A_ALT_HNP_SUPPORT, 0, NULL, 0, USB_CTRL_SET_TIMEOUT); if (err < 0) dev_err(&udev->dev, "set a_alt_hnp_support failed: %d\n", err); } } } #endif return err; } /** * usb_enumerate_device - Read device configs/intfs/otg (usbcore-internal) * @udev: newly addressed device (in ADDRESS state) * * This is only called by usb_new_device() -- all comments that apply there * apply here wrt to environment. 
* * If the device is WUSB and not authorized, we don't attempt to read * the string descriptors, as they will be errored out by the device * until it has been authorized. * * Return: 0 if successful. A negative error code otherwise. */ static int usb_enumerate_device(struct usb_device *udev) { int err; struct usb_hcd *hcd = bus_to_hcd(udev->bus); if (udev->config == NULL) { err = usb_get_configuration(udev); if (err < 0) { if (err != -ENODEV) dev_err(&udev->dev, "can't read configurations, error %d\n", err); return err; } } /* read the standard strings and cache them if present */ udev->product = usb_cache_string(udev, udev->descriptor.iProduct); udev->manufacturer = usb_cache_string(udev, udev->descriptor.iManufacturer); udev->serial = usb_cache_string(udev, udev->descriptor.iSerialNumber); err = usb_enumerate_device_otg(udev); if (err < 0) return err; if (IS_ENABLED(CONFIG_USB_OTG_PRODUCTLIST) && hcd->tpl_support && !is_targeted(udev)) { /* Maybe it can talk to us, though we can't talk to it. * (Includes HNP test device.) */ if (IS_ENABLED(CONFIG_USB_OTG) && (udev->bus->b_hnp_enable || udev->bus->is_b_host)) { err = usb_port_suspend(udev, PMSG_AUTO_SUSPEND); if (err < 0) dev_dbg(&udev->dev, "HNP fail, %d\n", err); } return -ENOTSUPP; } usb_detect_interface_quirks(udev); return 0; } static void set_usb_port_removable(struct usb_device *udev) { struct usb_device *hdev = udev->parent; struct usb_hub *hub; u8 port = udev->portnum; u16 wHubCharacteristics; bool removable = true; dev_set_removable(&udev->dev, DEVICE_REMOVABLE_UNKNOWN); if (!hdev) return; hub = usb_hub_to_struct_hub(udev->parent); /* * If the platform firmware has provided information about a port, * use that to determine whether it's removable. */ switch (hub->ports[udev->portnum - 1]->connect_type) { case USB_PORT_CONNECT_TYPE_HOT_PLUG: dev_set_removable(&udev->dev, DEVICE_REMOVABLE); return; case USB_PORT_CONNECT_TYPE_HARD_WIRED: case USB_PORT_NOT_USED: dev_set_removable(&udev->dev, DEVICE_FIXED); return; default: break; } /* * Otherwise, check whether the hub knows whether a port is removable * or not */ wHubCharacteristics = le16_to_cpu(hub->descriptor->wHubCharacteristics); if (!(wHubCharacteristics & HUB_CHAR_COMPOUND)) return; if (hub_is_superspeed(hdev)) { if (le16_to_cpu(hub->descriptor->u.ss.DeviceRemovable) & (1 << port)) removable = false; } else { if (hub->descriptor->u.hs.DeviceRemovable[port / 8] & (1 << (port % 8))) removable = false; } if (removable) dev_set_removable(&udev->dev, DEVICE_REMOVABLE); else dev_set_removable(&udev->dev, DEVICE_FIXED); } /** * usb_new_device - perform initial device setup (usbcore-internal) * @udev: newly addressed device (in ADDRESS state) * * This is called with devices which have been detected but not fully * enumerated. The device descriptor is available, but not descriptors * for any device configuration. The caller must have locked either * the parent hub (if udev is a normal device) or else the * usb_bus_idr_lock (if udev is a root hub). The parent's pointer to * udev has already been installed, but udev is not yet visible through * sysfs or other filesystem code. * * This call is synchronous, and may not be used in an interrupt context. * * Only the hub driver or root-hub registrar should ever call this. * * Return: Whether the device is configured properly or not. Zero if the * interface was registered with the driver core; else a negative errno * value. 
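 *
 * (Worked example of the usbfs numbering computed below: devnum 5 on bus 3
 * gets minor (3 - 1) * 128 + (5 - 1) = 260, so the character device is
 * MKDEV(USB_DEVICE_MAJOR, 260), which udev typically exposes as
 * /dev/bus/usb/003/005.)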
* */ int usb_new_device(struct usb_device *udev) { int err; if (udev->parent) { /* Initialize non-root-hub device wakeup to disabled; * device (un)configuration controls wakeup capable * sysfs power/wakeup controls wakeup enabled/disabled */ device_init_wakeup(&udev->dev, 0); } /* Tell the runtime-PM framework the device is active */ pm_runtime_set_active(&udev->dev); pm_runtime_get_noresume(&udev->dev); pm_runtime_use_autosuspend(&udev->dev); pm_runtime_enable(&udev->dev); /* By default, forbid autosuspend for all devices. It will be * allowed for hubs during binding. */ usb_disable_autosuspend(udev); err = usb_enumerate_device(udev); /* Read descriptors */ if (err < 0) goto fail; dev_dbg(&udev->dev, "udev %d, busnum %d, minor = %d\n", udev->devnum, udev->bus->busnum, (((udev->bus->busnum-1) * 128) + (udev->devnum-1))); /* export the usbdev device-node for libusb */ udev->dev.devt = MKDEV(USB_DEVICE_MAJOR, (((udev->bus->busnum-1) * 128) + (udev->devnum-1))); /* Tell the world! */ announce_device(udev); if (udev->serial) add_device_randomness(udev->serial, strlen(udev->serial)); if (udev->product) add_device_randomness(udev->product, strlen(udev->product)); if (udev->manufacturer) add_device_randomness(udev->manufacturer, strlen(udev->manufacturer)); device_enable_async_suspend(&udev->dev); /* check whether the hub or firmware marks this port as non-removable */ set_usb_port_removable(udev); /* Register the device. The device driver is responsible * for configuring the device and invoking the add-device * notifier chain (used by usbfs and possibly others). */ err = device_add(&udev->dev); if (err) { dev_err(&udev->dev, "can't device_add, error %d\n", err); goto fail; } /* Create link files between child device and usb port device. */ if (udev->parent) { struct usb_hub *hub = usb_hub_to_struct_hub(udev->parent); int port1 = udev->portnum; struct usb_port *port_dev = hub->ports[port1 - 1]; err = sysfs_create_link(&udev->dev.kobj, &port_dev->dev.kobj, "port"); if (err) goto out_del_dev; err = sysfs_create_link(&port_dev->dev.kobj, &udev->dev.kobj, "device"); if (err) { sysfs_remove_link(&udev->dev.kobj, "port"); goto out_del_dev; } if (!test_and_set_bit(port1, hub->child_usage_bits)) pm_runtime_get_sync(&port_dev->dev); typec_attach(port_dev->connector, &udev->dev); } (void) usb_create_ep_devs(&udev->dev, &udev->ep0, udev); usb_mark_last_busy(udev); pm_runtime_put_sync_autosuspend(&udev->dev); return err; out_del_dev: device_del(&udev->dev); fail: usb_set_device_state(udev, USB_STATE_NOTATTACHED); pm_runtime_disable(&udev->dev); pm_runtime_set_suspended(&udev->dev); return err; } /** * usb_deauthorize_device - deauthorize a device (usbcore-internal) * @usb_dev: USB device * * Move the USB device to a very basic state where interfaces are disabled * and the device is in fact unconfigured and unusable. * * We share a lock (that we have) with device_del(), so we need to * defer its call. * * Return: 0. 
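 *
 * (This is normally reached from sysfs: writing 0 to the device's
 * "authorized" attribute ends up here, while writing 1 goes through
 * usb_authorize_device() below.)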
*/ int usb_deauthorize_device(struct usb_device *usb_dev) { usb_lock_device(usb_dev); if (usb_dev->authorized == 0) goto out_unauthorized; usb_dev->authorized = 0; usb_set_configuration(usb_dev, -1); out_unauthorized: usb_unlock_device(usb_dev); return 0; } int usb_authorize_device(struct usb_device *usb_dev) { int result = 0, c; usb_lock_device(usb_dev); if (usb_dev->authorized == 1) goto out_authorized; result = usb_autoresume_device(usb_dev); if (result < 0) { dev_err(&usb_dev->dev, "can't autoresume for authorization: %d\n", result); goto error_autoresume; } usb_dev->authorized = 1; /* Choose and set the configuration. This registers the interfaces * with the driver core and lets interface drivers bind to them. */ c = usb_choose_configuration(usb_dev); if (c >= 0) { result = usb_set_configuration(usb_dev, c); if (result) { dev_err(&usb_dev->dev, "can't set config #%d, error %d\n", c, result); /* This need not be fatal. The user can try to * set other configurations. */ } } dev_info(&usb_dev->dev, "authorized to connect\n"); usb_autosuspend_device(usb_dev); error_autoresume: out_authorized: usb_unlock_device(usb_dev); /* complements locktree */ return result; } /** * get_port_ssp_rate - Match the extended port status to SSP rate * @hdev: The hub device * @ext_portstatus: extended port status * * Match the extended port status speed id to the SuperSpeed Plus sublink speed * capability attributes. Base on the number of connected lanes and speed, * return the corresponding enum usb_ssp_rate. */ static enum usb_ssp_rate get_port_ssp_rate(struct usb_device *hdev, u32 ext_portstatus) { struct usb_ssp_cap_descriptor *ssp_cap; u32 attr; u8 speed_id; u8 ssac; u8 lanes; int i; if (!hdev->bos) goto out; ssp_cap = hdev->bos->ssp_cap; if (!ssp_cap) goto out; speed_id = ext_portstatus & USB_EXT_PORT_STAT_RX_SPEED_ID; lanes = USB_EXT_PORT_RX_LANES(ext_portstatus) + 1; ssac = le32_to_cpu(ssp_cap->bmAttributes) & USB_SSP_SUBLINK_SPEED_ATTRIBS; for (i = 0; i <= ssac; i++) { u8 ssid; attr = le32_to_cpu(ssp_cap->bmSublinkSpeedAttr[i]); ssid = FIELD_GET(USB_SSP_SUBLINK_SPEED_SSID, attr); if (speed_id == ssid) { u16 mantissa; u8 lse; u8 type; /* * Note: currently asymmetric lane types are only * applicable for SSIC operate in SuperSpeed protocol */ type = FIELD_GET(USB_SSP_SUBLINK_SPEED_ST, attr); if (type == USB_SSP_SUBLINK_SPEED_ST_ASYM_RX || type == USB_SSP_SUBLINK_SPEED_ST_ASYM_TX) goto out; if (FIELD_GET(USB_SSP_SUBLINK_SPEED_LP, attr) != USB_SSP_SUBLINK_SPEED_LP_SSP) goto out; lse = FIELD_GET(USB_SSP_SUBLINK_SPEED_LSE, attr); mantissa = FIELD_GET(USB_SSP_SUBLINK_SPEED_LSM, attr); /* Convert to Gbps */ for (; lse < USB_SSP_SUBLINK_SPEED_LSE_GBPS; lse++) mantissa /= 1000; if (mantissa >= 10 && lanes == 1) return USB_SSP_GEN_2x1; if (mantissa >= 10 && lanes == 2) return USB_SSP_GEN_2x2; if (mantissa >= 5 && lanes == 2) return USB_SSP_GEN_1x2; goto out; } } out: return USB_SSP_GEN_UNKNOWN; } #ifdef CONFIG_USB_FEW_INIT_RETRIES #define PORT_RESET_TRIES 2 #define SET_ADDRESS_TRIES 1 #define GET_DESCRIPTOR_TRIES 1 #define GET_MAXPACKET0_TRIES 1 #define PORT_INIT_TRIES 4 #else #define PORT_RESET_TRIES 5 #define SET_ADDRESS_TRIES 2 #define GET_DESCRIPTOR_TRIES 2 #define GET_MAXPACKET0_TRIES 3 #define PORT_INIT_TRIES 4 #endif /* CONFIG_USB_FEW_INIT_RETRIES */ #define DETECT_DISCONNECT_TRIES 5 #define HUB_ROOT_RESET_TIME 60 /* times are in msec */ #define HUB_SHORT_RESET_TIME 10 #define HUB_BH_RESET_TIME 50 #define HUB_LONG_RESET_TIME 200 #define HUB_RESET_TIMEOUT 800 static bool use_new_scheme(struct usb_device 
*udev, int retry, struct usb_port *port_dev) { int old_scheme_first_port = (port_dev->quirks & USB_PORT_QUIRK_OLD_SCHEME) || old_scheme_first; /* * "New scheme" enumeration causes an extra state transition to be * exposed to an xhci host and causes USB3 devices to receive control * commands in the default state. This has been seen to cause * enumeration failures, so disable this enumeration scheme for USB3 * devices. */ if (udev->speed >= USB_SPEED_SUPER) return false; /* * If use_both_schemes is set, use the first scheme (whichever * it is) for the larger half of the retries, then use the other * scheme. Otherwise, use the first scheme for all the retries. */ if (use_both_schemes && retry >= (PORT_INIT_TRIES + 1) / 2) return old_scheme_first_port; /* Second half */ return !old_scheme_first_port; /* First half or all */ } /* Is a USB 3.0 port in the Inactive or Compliance Mode state? * Port warm reset is required to recover */ static bool hub_port_warm_reset_required(struct usb_hub *hub, int port1, u16 portstatus) { u16 link_state; if (!hub_is_superspeed(hub->hdev)) return false; if (test_bit(port1, hub->warm_reset_bits)) return true; link_state = portstatus & USB_PORT_STAT_LINK_STATE; return link_state == USB_SS_PORT_LS_SS_INACTIVE || link_state == USB_SS_PORT_LS_COMP_MOD; } static int hub_port_wait_reset(struct usb_hub *hub, int port1, struct usb_device *udev, unsigned int delay, bool warm) { int delay_time, ret; u16 portstatus; u16 portchange; u32 ext_portstatus = 0; for (delay_time = 0; delay_time < HUB_RESET_TIMEOUT; delay_time += delay) { /* wait to give the device a chance to reset */ msleep(delay); /* read and decode port status */ if (hub_is_superspeedplus(hub->hdev)) ret = hub_ext_port_status(hub, port1, HUB_EXT_PORT_STATUS, &portstatus, &portchange, &ext_portstatus); else ret = usb_hub_port_status(hub, port1, &portstatus, &portchange); if (ret < 0) return ret; /* * The port state is unknown until the reset completes. * * On top of that, some chips may require additional time * to re-establish a connection after the reset is complete, * so also wait for the connection to be re-established. */ if (!(portstatus & USB_PORT_STAT_RESET) && (portstatus & USB_PORT_STAT_CONNECTION)) break; /* switch to the long delay after two short delay failures */ if (delay_time >= 2 * HUB_SHORT_RESET_TIME) delay = HUB_LONG_RESET_TIME; dev_dbg(&hub->ports[port1 - 1]->dev, "not %sreset yet, waiting %dms\n", warm ? "warm " : "", delay); } if ((portstatus & USB_PORT_STAT_RESET)) return -EBUSY; if (hub_port_warm_reset_required(hub, port1, portstatus)) return -ENOTCONN; /* Device went away? */ if (!(portstatus & USB_PORT_STAT_CONNECTION)) return -ENOTCONN; /* Retry if connect change is set but status is still connected. * A USB 3.0 connection may bounce if multiple warm resets were issued, * but the device may have successfully re-connected. Ignore it. 
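	 *
	 * (In practice: on a USB 2.0 hub the pending connect change makes
	 * this function return -EAGAIN, so hub_port_reset() simply issues
	 * the reset again; on a USB 3.0 hub the bounce is ignored and the
	 * reset is treated as complete.)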
*/ if (!hub_is_superspeed(hub->hdev) && (portchange & USB_PORT_STAT_C_CONNECTION)) { usb_clear_port_feature(hub->hdev, port1, USB_PORT_FEAT_C_CONNECTION); return -EAGAIN; } if (!(portstatus & USB_PORT_STAT_ENABLE)) return -EBUSY; if (!udev) return 0; if (hub_is_superspeedplus(hub->hdev)) { /* extended portstatus Rx and Tx lane count are zero based */ udev->rx_lanes = USB_EXT_PORT_RX_LANES(ext_portstatus) + 1; udev->tx_lanes = USB_EXT_PORT_TX_LANES(ext_portstatus) + 1; udev->ssp_rate = get_port_ssp_rate(hub->hdev, ext_portstatus); } else { udev->rx_lanes = 1; udev->tx_lanes = 1; udev->ssp_rate = USB_SSP_GEN_UNKNOWN; } if (udev->ssp_rate != USB_SSP_GEN_UNKNOWN) udev->speed = USB_SPEED_SUPER_PLUS; else if (hub_is_superspeed(hub->hdev)) udev->speed = USB_SPEED_SUPER; else if (portstatus & USB_PORT_STAT_HIGH_SPEED) udev->speed = USB_SPEED_HIGH; else if (portstatus & USB_PORT_STAT_LOW_SPEED) udev->speed = USB_SPEED_LOW; else udev->speed = USB_SPEED_FULL; return 0; } /* Handle port reset and port warm(BH) reset (for USB3 protocol ports) */ static int hub_port_reset(struct usb_hub *hub, int port1, struct usb_device *udev, unsigned int delay, bool warm) { int i, status; u16 portchange, portstatus; struct usb_port *port_dev = hub->ports[port1 - 1]; int reset_recovery_time; if (!hub_is_superspeed(hub->hdev)) { if (warm) { dev_err(hub->intfdev, "only USB3 hub support " "warm reset\n"); return -EINVAL; } /* Block EHCI CF initialization during the port reset. * Some companion controllers don't like it when they mix. */ down_read(&ehci_cf_port_reset_rwsem); } else if (!warm) { /* * If the caller hasn't explicitly requested a warm reset, * double check and see if one is needed. */ if (usb_hub_port_status(hub, port1, &portstatus, &portchange) == 0) if (hub_port_warm_reset_required(hub, port1, portstatus)) warm = true; } clear_bit(port1, hub->warm_reset_bits); /* Reset the port */ for (i = 0; i < PORT_RESET_TRIES; i++) { status = set_port_feature(hub->hdev, port1, (warm ? USB_PORT_FEAT_BH_PORT_RESET : USB_PORT_FEAT_RESET)); if (status == -ENODEV) { ; /* The hub is gone */ } else if (status) { dev_err(&port_dev->dev, "cannot %sreset (err = %d)\n", warm ? "warm " : "", status); } else { status = hub_port_wait_reset(hub, port1, udev, delay, warm); if (status && status != -ENOTCONN && status != -ENODEV) dev_dbg(hub->intfdev, "port_wait_reset: err = %d\n", status); } /* * Check for disconnect or reset, and bail out after several * reset attempts to avoid warm reset loop. */ if (status == 0 || status == -ENOTCONN || status == -ENODEV || (status == -EBUSY && i == PORT_RESET_TRIES - 1)) { usb_clear_port_feature(hub->hdev, port1, USB_PORT_FEAT_C_RESET); if (!hub_is_superspeed(hub->hdev)) goto done; usb_clear_port_feature(hub->hdev, port1, USB_PORT_FEAT_C_BH_PORT_RESET); usb_clear_port_feature(hub->hdev, port1, USB_PORT_FEAT_C_PORT_LINK_STATE); if (udev) usb_clear_port_feature(hub->hdev, port1, USB_PORT_FEAT_C_CONNECTION); /* * If a USB 3.0 device migrates from reset to an error * state, re-issue the warm reset. */ if (usb_hub_port_status(hub, port1, &portstatus, &portchange) < 0) goto done; if (!hub_port_warm_reset_required(hub, port1, portstatus)) goto done; /* * If the port is in SS.Inactive or Compliance Mode, the * hot or warm reset failed. Try another warm reset. */ if (!warm) { dev_dbg(&port_dev->dev, "hot reset failed, warm reset\n"); warm = true; } } dev_dbg(&port_dev->dev, "not enabled, trying %sreset again...\n", warm ? "warm " : ""); delay = HUB_LONG_RESET_TIME; } dev_err(&port_dev->dev, "Cannot enable. 
Maybe the USB cable is bad?\n"); done: if (status == 0) { if (port_dev->quirks & USB_PORT_QUIRK_FAST_ENUM) usleep_range(10000, 12000); else { /* TRSTRCY = 10 ms; plus some extra */ reset_recovery_time = 10 + 40; /* Hub needs extra delay after resetting its port. */ if (hub->hdev->quirks & USB_QUIRK_HUB_SLOW_RESET) reset_recovery_time += 100; msleep(reset_recovery_time); } if (udev) { struct usb_hcd *hcd = bus_to_hcd(udev->bus); update_devnum(udev, 0); /* The xHC may think the device is already reset, * so ignore the status. */ if (hcd->driver->reset_device) hcd->driver->reset_device(hcd, udev); usb_set_device_state(udev, USB_STATE_DEFAULT); } } else { if (udev) usb_set_device_state(udev, USB_STATE_NOTATTACHED); } if (!hub_is_superspeed(hub->hdev)) up_read(&ehci_cf_port_reset_rwsem); return status; } /* * hub_port_stop_enumerate - stop USB enumeration or ignore port events * @hub: target hub * @port1: port num of the port * @retries: port retries number of hub_port_init() * * Return: * true: ignore port actions/events or give up connection attempts. * false: keep original behavior. * * This function will be based on retries to check whether the port which is * marked with early_stop attribute would stop enumeration or ignore events. * * Note: * This function didn't change anything if early_stop is not set, and it will * prevent all connection attempts when early_stop is set and the attempts of * the port are more than 1. */ static bool hub_port_stop_enumerate(struct usb_hub *hub, int port1, int retries) { struct usb_port *port_dev = hub->ports[port1 - 1]; if (port_dev->early_stop) { if (port_dev->ignore_event) return true; /* * We want unsuccessful attempts to fail quickly. * Since some devices may need one failure during * port initialization, we allow two tries but no * more. */ if (retries < 2) return false; port_dev->ignore_event = 1; } else port_dev->ignore_event = 0; return port_dev->ignore_event; } /* Check if a port is power on */ int usb_port_is_power_on(struct usb_hub *hub, unsigned int portstatus) { int ret = 0; if (hub_is_superspeed(hub->hdev)) { if (portstatus & USB_SS_PORT_STAT_POWER) ret = 1; } else { if (portstatus & USB_PORT_STAT_POWER) ret = 1; } return ret; } static void usb_lock_port(struct usb_port *port_dev) __acquires(&port_dev->status_lock) { mutex_lock(&port_dev->status_lock); __acquire(&port_dev->status_lock); } static void usb_unlock_port(struct usb_port *port_dev) __releases(&port_dev->status_lock) { mutex_unlock(&port_dev->status_lock); __release(&port_dev->status_lock); } #ifdef CONFIG_PM /* Check if a port is suspended(USB2.0 port) or in U3 state(USB3.0 port) */ static int port_is_suspended(struct usb_hub *hub, unsigned portstatus) { int ret = 0; if (hub_is_superspeed(hub->hdev)) { if ((portstatus & USB_PORT_STAT_LINK_STATE) == USB_SS_PORT_LS_U3) ret = 1; } else { if (portstatus & USB_PORT_STAT_SUSPEND) ret = 1; } return ret; } /* Determine whether the device on a port is ready for a normal resume, * is ready for a reset-resume, or should be disconnected. */ static int check_port_resume_type(struct usb_device *udev, struct usb_hub *hub, int port1, int status, u16 portchange, u16 portstatus) { struct usb_port *port_dev = hub->ports[port1 - 1]; int retries = 3; retry: /* Is a warm reset needed to recover the connection? */ if (status == 0 && udev->reset_resume && hub_port_warm_reset_required(hub, port1, portstatus)) { /* pass */; } /* Is the device still present? 
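	 *
	 * (Summary of the checks below: a failed status read, a port that is
	 * still suspended, or a port with power off means the device is
	 * treated as gone (-ENODEV); a missing connection is re-polled a few
	 * times before giving up; a disabled but still connected port falls
	 * back to reset-resume when persist is enabled.)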
*/ else if (status || port_is_suspended(hub, portstatus) || !usb_port_is_power_on(hub, portstatus)) { if (status >= 0) status = -ENODEV; } else if (!(portstatus & USB_PORT_STAT_CONNECTION)) { if (retries--) { usleep_range(200, 300); status = usb_hub_port_status(hub, port1, &portstatus, &portchange); goto retry; } status = -ENODEV; } /* Can't do a normal resume if the port isn't enabled, * so try a reset-resume instead. */ else if (!(portstatus & USB_PORT_STAT_ENABLE) && !udev->reset_resume) { if (udev->persist_enabled) udev->reset_resume = 1; else status = -ENODEV; } if (status) { dev_dbg(&port_dev->dev, "status %04x.%04x after resume, %d\n", portchange, portstatus, status); } else if (udev->reset_resume) { /* Late port handoff can set status-change bits */ if (portchange & USB_PORT_STAT_C_CONNECTION) usb_clear_port_feature(hub->hdev, port1, USB_PORT_FEAT_C_CONNECTION); if (portchange & USB_PORT_STAT_C_ENABLE) usb_clear_port_feature(hub->hdev, port1, USB_PORT_FEAT_C_ENABLE); /* * Whatever made this reset-resume necessary may have * turned on the port1 bit in hub->change_bits. But after * a successful reset-resume we want the bit to be clear; * if it was on it would indicate that something happened * following the reset-resume. */ clear_bit(port1, hub->change_bits); } return status; } int usb_disable_ltm(struct usb_device *udev) { struct usb_hcd *hcd = bus_to_hcd(udev->bus); /* Check if the roothub and device supports LTM. */ if (!usb_device_supports_ltm(hcd->self.root_hub) || !usb_device_supports_ltm(udev)) return 0; /* Clear Feature LTM Enable can only be sent if the device is * configured. */ if (!udev->actconfig) return 0; return usb_control_msg(udev, usb_sndctrlpipe(udev, 0), USB_REQ_CLEAR_FEATURE, USB_RECIP_DEVICE, USB_DEVICE_LTM_ENABLE, 0, NULL, 0, USB_CTRL_SET_TIMEOUT); } EXPORT_SYMBOL_GPL(usb_disable_ltm); void usb_enable_ltm(struct usb_device *udev) { struct usb_hcd *hcd = bus_to_hcd(udev->bus); /* Check if the roothub and device supports LTM. */ if (!usb_device_supports_ltm(hcd->self.root_hub) || !usb_device_supports_ltm(udev)) return; /* Set Feature LTM Enable can only be sent if the device is * configured. */ if (!udev->actconfig) return; usb_control_msg(udev, usb_sndctrlpipe(udev, 0), USB_REQ_SET_FEATURE, USB_RECIP_DEVICE, USB_DEVICE_LTM_ENABLE, 0, NULL, 0, USB_CTRL_SET_TIMEOUT); } EXPORT_SYMBOL_GPL(usb_enable_ltm); /* * usb_enable_remote_wakeup - enable remote wakeup for a device * @udev: target device * * For USB-2 devices: Set the device's remote wakeup feature. * * For USB-3 devices: Assume there's only one function on the device and * enable remote wake for the first interface. FIXME if the interface * association descriptor shows there's more than one function. */ static int usb_enable_remote_wakeup(struct usb_device *udev) { if (udev->speed < USB_SPEED_SUPER) return usb_control_msg(udev, usb_sndctrlpipe(udev, 0), USB_REQ_SET_FEATURE, USB_RECIP_DEVICE, USB_DEVICE_REMOTE_WAKEUP, 0, NULL, 0, USB_CTRL_SET_TIMEOUT); else return usb_control_msg(udev, usb_sndctrlpipe(udev, 0), USB_REQ_SET_FEATURE, USB_RECIP_INTERFACE, USB_INTRF_FUNC_SUSPEND, USB_INTRF_FUNC_SUSPEND_RW | USB_INTRF_FUNC_SUSPEND_LP, NULL, 0, USB_CTRL_SET_TIMEOUT); } /* * usb_disable_remote_wakeup - disable remote wakeup for a device * @udev: target device * * For USB-2 devices: Clear the device's remote wakeup feature. * * For USB-3 devices: Assume there's only one function on the device and * disable remote wake for the first interface. 
FIXME if the interface * association descriptor shows there's more than one function. */ static int usb_disable_remote_wakeup(struct usb_device *udev) { if (udev->speed < USB_SPEED_SUPER) return usb_control_msg(udev, usb_sndctrlpipe(udev, 0), USB_REQ_CLEAR_FEATURE, USB_RECIP_DEVICE, USB_DEVICE_REMOTE_WAKEUP, 0, NULL, 0, USB_CTRL_SET_TIMEOUT); else return usb_control_msg(udev, usb_sndctrlpipe(udev, 0), USB_REQ_SET_FEATURE, USB_RECIP_INTERFACE, USB_INTRF_FUNC_SUSPEND, 0, NULL, 0, USB_CTRL_SET_TIMEOUT); } /* Count of wakeup-enabled devices at or below udev */ unsigned usb_wakeup_enabled_descendants(struct usb_device *udev) { struct usb_hub *hub = usb_hub_to_struct_hub(udev); return udev->do_remote_wakeup + (hub ? hub->wakeup_enabled_descendants : 0); } EXPORT_SYMBOL_GPL(usb_wakeup_enabled_descendants); /* * usb_port_suspend - suspend a usb device's upstream port * @udev: device that's no longer in active use, not a root hub * Context: must be able to sleep; device not locked; pm locks held * * Suspends a USB device that isn't in active use, conserving power. * Devices may wake out of a suspend, if anything important happens, * using the remote wakeup mechanism. They may also be taken out of * suspend by the host, using usb_port_resume(). It's also routine * to disconnect devices while they are suspended. * * This only affects the USB hardware for a device; its interfaces * (and, for hubs, child devices) must already have been suspended. * * Selective port suspend reduces power; most suspended devices draw * less than 500 uA. It's also used in OTG, along with remote wakeup. * All devices below the suspended port are also suspended. * * Devices leave suspend state when the host wakes them up. Some devices * also support "remote wakeup", where the device can activate the USB * tree above them to deliver data, such as a keypress or packet. In * some cases, this wakes the USB host. * * Suspending OTG devices may trigger HNP, if that's been enabled * between a pair of dual-role devices. That will change roles, such * as from A-Host to A-Peripheral or from B-Host back to B-Peripheral. * * Devices on USB hub ports have only one "suspend" state, corresponding * to ACPI D2, "may cause the device to lose some context". * State transitions include: * * - suspend, resume ... when the VBUS power link stays live * - suspend, disconnect ... VBUS lost * * Once VBUS drop breaks the circuit, the port it's using has to go through * normal re-enumeration procedures, starting with enabling VBUS power. * Other than re-initializing the hub (plug/unplug, except for root hubs), * Linux (2.6) currently has NO mechanisms to initiate that: no hub_wq * timer, no SRP, no requests through sysfs. * * If Runtime PM isn't enabled or used, non-SuperSpeed devices may not get * suspended until their bus goes into global suspend (i.e., the root * hub is suspended). Nevertheless, we change @udev->state to * USB_STATE_SUSPENDED as this is the device's "logical" state. The actual * upstream port setting is stored in @udev->port_is_suspended. * * Returns 0 on success, else negative errno. */ int usb_port_suspend(struct usb_device *udev, pm_message_t msg) { struct usb_hub *hub = usb_hub_to_struct_hub(udev->parent); struct usb_port *port_dev = hub->ports[udev->portnum - 1]; int port1 = udev->portnum; int status; bool really_suspend = true; usb_lock_port(port_dev); /* enable remote wakeup when appropriate; this lets the device * wake up the upstream hub (including maybe the root hub). 
* * NOTE: OTG devices may issue remote wakeup (or SRP) even when * we don't explicitly enable it here. */ if (udev->do_remote_wakeup) { status = usb_enable_remote_wakeup(udev); if (status) { dev_dbg(&udev->dev, "won't remote wakeup, status %d\n", status); /* bail if autosuspend is requested */ if (PMSG_IS_AUTO(msg)) goto err_wakeup; } } /* disable USB2 hardware LPM */ usb_disable_usb2_hardware_lpm(udev); if (usb_disable_ltm(udev)) { dev_err(&udev->dev, "Failed to disable LTM before suspend\n"); status = -ENOMEM; if (PMSG_IS_AUTO(msg)) goto err_ltm; } /* see 7.1.7.6 */ if (hub_is_superspeed(hub->hdev)) status = hub_set_port_link_state(hub, port1, USB_SS_PORT_LS_U3); /* * For system suspend, we do not need to enable the suspend feature * on individual USB-2 ports. The devices will automatically go * into suspend a few ms after the root hub stops sending packets. * The USB 2.0 spec calls this "global suspend". * * However, many USB hubs have a bug: They don't relay wakeup requests * from a downstream port if the port's suspend feature isn't on. * Therefore we will turn on the suspend feature if udev or any of its * descendants is enabled for remote wakeup. */ else if (PMSG_IS_AUTO(msg) || usb_wakeup_enabled_descendants(udev) > 0) status = set_port_feature(hub->hdev, port1, USB_PORT_FEAT_SUSPEND); else { really_suspend = false; status = 0; } if (status) { /* Check if the port has been suspended for the timeout case * to prevent the suspended port from incorrect handling. */ if (status == -ETIMEDOUT) { int ret; u16 portstatus, portchange; portstatus = portchange = 0; ret = usb_hub_port_status(hub, port1, &portstatus, &portchange); dev_dbg(&port_dev->dev, "suspend timeout, status %04x\n", portstatus); if (ret == 0 && port_is_suspended(hub, portstatus)) { status = 0; goto suspend_done; } } dev_dbg(&port_dev->dev, "can't suspend, status %d\n", status); /* Try to enable USB3 LTM again */ usb_enable_ltm(udev); err_ltm: /* Try to enable USB2 hardware LPM again */ usb_enable_usb2_hardware_lpm(udev); if (udev->do_remote_wakeup) (void) usb_disable_remote_wakeup(udev); err_wakeup: /* System sleep transitions should never fail */ if (!PMSG_IS_AUTO(msg)) status = 0; } else { suspend_done: dev_dbg(&udev->dev, "usb %ssuspend, wakeup %d\n", (PMSG_IS_AUTO(msg) ? "auto-" : ""), udev->do_remote_wakeup); if (really_suspend) { udev->port_is_suspended = 1; /* device has up to 10 msec to fully suspend */ msleep(10); } usb_set_device_state(udev, USB_STATE_SUSPENDED); } if (status == 0 && !udev->do_remote_wakeup && udev->persist_enabled && test_and_clear_bit(port1, hub->child_usage_bits)) pm_runtime_put_sync(&port_dev->dev); usb_mark_last_busy(hub->hdev); usb_unlock_port(port_dev); return status; } /* * If the USB "suspend" state is in use (rather than "global suspend"), * many devices will be individually taken out of suspend state using * special "resume" signaling. This routine kicks in shortly after * hardware resume signaling is finished, either because of selective * resume (by host) or remote wakeup (by device) ... now see what changed * in the tree that's rooted at this device. * * If @udev->reset_resume is set then the device is reset before the * status check is done. */ static int finish_port_resume(struct usb_device *udev) { int status = 0; u16 devstatus = 0; /* caller owns the udev device lock */ dev_dbg(&udev->dev, "%s\n", udev->reset_resume ? "finish reset-resume" : "finish resume"); /* usb ch9 identifies four variants of SUSPENDED, based on what * state the device resumes to. 
	 * Linux currently won't see the first two on the host side; they'd
	 * be inside hub_port_init() during many timeouts, but hub_wq can't
	 * suspend until later.
	 */
	usb_set_device_state(udev, udev->actconfig
			? USB_STATE_CONFIGURED
			: USB_STATE_ADDRESS);

	/* 10.5.4.5 says not to reset a suspended port if the attached
	 * device is enabled for remote wakeup.  Hence the reset
	 * operation is carried out here, after the port has been
	 * resumed.
	 */
	if (udev->reset_resume) {
		/*
		 * If the device morphs or switches modes when it is reset,
		 * we don't want to perform a reset-resume.  We'll fail the
		 * resume, which will cause a logical disconnect, and then
		 * the device will be rediscovered.
		 */
 retry_reset_resume:
		if (udev->quirks & USB_QUIRK_RESET)
			status = -ENODEV;
		else
			status = usb_reset_and_verify_device(udev);
	}

	/* 10.5.4.5 says be sure devices in the tree are still there.
	 * For now let's assume the device didn't go crazy on resume,
	 * and device drivers will know about any resume quirks.
	 */
	if (status == 0) {
		devstatus = 0;
		status = usb_get_std_status(udev, USB_RECIP_DEVICE, 0,
				&devstatus);

		/* If a normal resume failed, try doing a reset-resume */
		if (status && !udev->reset_resume && udev->persist_enabled) {
			dev_dbg(&udev->dev, "retry with reset-resume\n");
			udev->reset_resume = 1;
			goto retry_reset_resume;
		}
	}

	if (status) {
		dev_dbg(&udev->dev, "gone after usb resume? status %d\n",
				status);
		/*
		 * There are a few quirky devices which violate the standard
		 * by claiming to have remote wakeup enabled after a reset,
		 * which crash if the feature is cleared, hence check for
		 * udev->reset_resume
		 */
	} else if (udev->actconfig && !udev->reset_resume) {
		if (udev->speed < USB_SPEED_SUPER) {
			if (devstatus & (1 << USB_DEVICE_REMOTE_WAKEUP))
				status = usb_disable_remote_wakeup(udev);
		} else {
			status = usb_get_std_status(udev, USB_RECIP_INTERFACE,
					0, &devstatus);
			if (!status && devstatus & (USB_INTRF_STAT_FUNC_RW_CAP
					| USB_INTRF_STAT_FUNC_RW))
				status = usb_disable_remote_wakeup(udev);
		}

		if (status)
			dev_dbg(&udev->dev,
				"disable remote wakeup, status %d\n",
				status);
		status = 0;
	}
	return status;
}

/*
 * Some SuperSpeed USB devices take a long time for link training.
 * The xHCI specification (4.19.4) says that when link training succeeds,
 * the port sets the CCS bit to 1, so if software reads the port status
 * before link training has completed it will not see the device as
 * present.  USB analyzer logs of such buggy devices show that in some
 * cases the device switches on its RX termination only after a long
 * delay following VBUS being enabled by the host.  In a few other cases
 * the device fails to negotiate link training on the first attempt.  So
 * far, devices have been reported that take as long as 2000 ms to train
 * the link after the host enables VBUS and termination.  The following
 * routine therefore implements a 2000 ms timeout for link training; if
 * the link trains before the timeout, the loop exits earlier.
 *
 * There are also some USB 2.0 hard-drive-based devices and USB 3.0 thumb
 * drives that, when plugged into a USB 2.0-only port, take a long
 * time to set CCS after VBUS is enabled.
 *
 * FIXME: If a device was connected before suspend, but was removed
 * while the system was asleep, then the loop in the following routine
 * will only exit at the timeout.
 *
 * This routine should only be called when persist is enabled.
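 *
 * (Timing sketch: the loop below polls the port status every 20 ms, so the
 * 2000 ms budget allows roughly 100 polls; a device that sets CCS after,
 * say, 700 ms ends the loop early.)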
*/ static int wait_for_connected(struct usb_device *udev, struct usb_hub *hub, int port1, u16 *portchange, u16 *portstatus) { int status = 0, delay_ms = 0; while (delay_ms < 2000) { if (status || *portstatus & USB_PORT_STAT_CONNECTION) break; if (!usb_port_is_power_on(hub, *portstatus)) { status = -ENODEV; break; } msleep(20); delay_ms += 20; status = usb_hub_port_status(hub, port1, portstatus, portchange); } dev_dbg(&udev->dev, "Waited %dms for CONNECT\n", delay_ms); return status; } /* * usb_port_resume - re-activate a suspended usb device's upstream port * @udev: device to re-activate, not a root hub * Context: must be able to sleep; device not locked; pm locks held * * This will re-activate the suspended device, increasing power usage * while letting drivers communicate again with its endpoints. * USB resume explicitly guarantees that the power session between * the host and the device is the same as it was when the device * suspended. * * If @udev->reset_resume is set then this routine won't check that the * port is still enabled. Furthermore, finish_port_resume() above will * reset @udev. The end result is that a broken power session can be * recovered and @udev will appear to persist across a loss of VBUS power. * * For example, if a host controller doesn't maintain VBUS suspend current * during a system sleep or is reset when the system wakes up, all the USB * power sessions below it will be broken. This is especially troublesome * for mass-storage devices containing mounted filesystems, since the * device will appear to have disconnected and all the memory mappings * to it will be lost. Using the USB_PERSIST facility, the device can be * made to appear as if it had not disconnected. * * This facility can be dangerous. Although usb_reset_and_verify_device() makes * every effort to insure that the same device is present after the * reset as before, it cannot provide a 100% guarantee. Furthermore it's * quite possible for a device to remain unaltered but its media to be * changed. If the user replaces a flash memory card while the system is * asleep, he will have only himself to blame when the filesystem on the * new card is corrupted and the system crashes. * * Returns 0 on success, else negative errno. */ int usb_port_resume(struct usb_device *udev, pm_message_t msg) { struct usb_hub *hub = usb_hub_to_struct_hub(udev->parent); struct usb_port *port_dev = hub->ports[udev->portnum - 1]; int port1 = udev->portnum; int status; u16 portchange, portstatus; if (!test_and_set_bit(port1, hub->child_usage_bits)) { status = pm_runtime_resume_and_get(&port_dev->dev); if (status < 0) { dev_dbg(&udev->dev, "can't resume usb port, status %d\n", status); return status; } } usb_lock_port(port_dev); /* Skip the initial Clear-Suspend step for a remote wakeup */ status = usb_hub_port_status(hub, port1, &portstatus, &portchange); if (status == 0 && !port_is_suspended(hub, portstatus)) { if (portchange & USB_PORT_STAT_C_SUSPEND) pm_wakeup_event(&udev->dev, 0); goto SuspendCleared; } /* see 7.1.7.7; affects power usage, but not budgeting */ if (hub_is_superspeed(hub->hdev)) status = hub_set_port_link_state(hub, port1, USB_SS_PORT_LS_U0); else status = usb_clear_port_feature(hub->hdev, port1, USB_PORT_FEAT_SUSPEND); if (status) { dev_dbg(&port_dev->dev, "can't resume, status %d\n", status); } else { /* drive resume for USB_RESUME_TIMEOUT msec */ dev_dbg(&udev->dev, "usb %sresume\n", (PMSG_IS_AUTO(msg) ? 
"auto-" : "")); msleep(USB_RESUME_TIMEOUT); /* Virtual root hubs can trigger on GET_PORT_STATUS to * stop resume signaling. Then finish the resume * sequence. */ status = usb_hub_port_status(hub, port1, &portstatus, &portchange); } SuspendCleared: if (status == 0) { udev->port_is_suspended = 0; if (hub_is_superspeed(hub->hdev)) { if (portchange & USB_PORT_STAT_C_LINK_STATE) usb_clear_port_feature(hub->hdev, port1, USB_PORT_FEAT_C_PORT_LINK_STATE); } else { if (portchange & USB_PORT_STAT_C_SUSPEND) usb_clear_port_feature(hub->hdev, port1, USB_PORT_FEAT_C_SUSPEND); } /* TRSMRCY = 10 msec */ msleep(10); } if (udev->persist_enabled) status = wait_for_connected(udev, hub, port1, &portchange, &portstatus); status = check_port_resume_type(udev, hub, port1, status, portchange, portstatus); if (status == 0) status = finish_port_resume(udev); if (status < 0) { dev_dbg(&udev->dev, "can't resume, status %d\n", status); hub_port_logical_disconnect(hub, port1); } else { /* Try to enable USB2 hardware LPM */ usb_enable_usb2_hardware_lpm(udev); /* Try to enable USB3 LTM */ usb_enable_ltm(udev); } usb_unlock_port(port_dev); return status; } int usb_remote_wakeup(struct usb_device *udev) { int status = 0; usb_lock_device(udev); if (udev->state == USB_STATE_SUSPENDED) { dev_dbg(&udev->dev, "usb %sresume\n", "wakeup-"); status = usb_autoresume_device(udev); if (status == 0) { /* Let the drivers do their thing, then... */ usb_autosuspend_device(udev); } } usb_unlock_device(udev); return status; } /* Returns 1 if there was a remote wakeup and a connect status change. */ static int hub_handle_remote_wakeup(struct usb_hub *hub, unsigned int port, u16 portstatus, u16 portchange) __must_hold(&port_dev->status_lock) { struct usb_port *port_dev = hub->ports[port - 1]; struct usb_device *hdev; struct usb_device *udev; int connect_change = 0; u16 link_state; int ret; hdev = hub->hdev; udev = port_dev->child; if (!hub_is_superspeed(hdev)) { if (!(portchange & USB_PORT_STAT_C_SUSPEND)) return 0; usb_clear_port_feature(hdev, port, USB_PORT_FEAT_C_SUSPEND); } else { link_state = portstatus & USB_PORT_STAT_LINK_STATE; if (!udev || udev->state != USB_STATE_SUSPENDED || (link_state != USB_SS_PORT_LS_U0 && link_state != USB_SS_PORT_LS_U1 && link_state != USB_SS_PORT_LS_U2)) return 0; } if (udev) { /* TRSMRCY = 10 msec */ msleep(10); usb_unlock_port(port_dev); ret = usb_remote_wakeup(udev); usb_lock_port(port_dev); if (ret < 0) connect_change = 1; } else { ret = -ENODEV; hub_port_disable(hub, port, 1); } dev_dbg(&port_dev->dev, "resume, status %d\n", ret); return connect_change; } static int check_ports_changed(struct usb_hub *hub) { int port1; for (port1 = 1; port1 <= hub->hdev->maxchild; ++port1) { u16 portstatus, portchange; int status; status = usb_hub_port_status(hub, port1, &portstatus, &portchange); if (!status && portchange) return 1; } return 0; } static int hub_suspend(struct usb_interface *intf, pm_message_t msg) { struct usb_hub *hub = usb_get_intfdata(intf); struct usb_device *hdev = hub->hdev; unsigned port1; /* * Warn if children aren't already suspended. * Also, add up the number of wakeup-enabled descendants. 
*/ hub->wakeup_enabled_descendants = 0; for (port1 = 1; port1 <= hdev->maxchild; port1++) { struct usb_port *port_dev = hub->ports[port1 - 1]; struct usb_device *udev = port_dev->child; if (udev && udev->can_submit) { dev_warn(&port_dev->dev, "device %s not suspended yet\n", dev_name(&udev->dev)); if (PMSG_IS_AUTO(msg)) return -EBUSY; } if (udev) hub->wakeup_enabled_descendants += usb_wakeup_enabled_descendants(udev); } if (hdev->do_remote_wakeup && hub->quirk_check_port_auto_suspend) { /* check if there are changes pending on hub ports */ if (check_ports_changed(hub)) { if (PMSG_IS_AUTO(msg)) return -EBUSY; pm_wakeup_event(&hdev->dev, 2000); } } if (hub_is_superspeed(hdev) && hdev->do_remote_wakeup) { /* Enable hub to send remote wakeup for all ports. */ for (port1 = 1; port1 <= hdev->maxchild; port1++) { set_port_feature(hdev, port1 | USB_PORT_FEAT_REMOTE_WAKE_CONNECT | USB_PORT_FEAT_REMOTE_WAKE_DISCONNECT | USB_PORT_FEAT_REMOTE_WAKE_OVER_CURRENT, USB_PORT_FEAT_REMOTE_WAKE_MASK); } } dev_dbg(&intf->dev, "%s\n", __func__); /* stop hub_wq and related activity */ hub_quiesce(hub, HUB_SUSPEND); return 0; } /* Report wakeup requests from the ports of a resuming root hub */ static void report_wakeup_requests(struct usb_hub *hub) { struct usb_device *hdev = hub->hdev; struct usb_device *udev; struct usb_hcd *hcd; unsigned long resuming_ports; int i; if (hdev->parent) return; /* Not a root hub */ hcd = bus_to_hcd(hdev->bus); if (hcd->driver->get_resuming_ports) { /* * The get_resuming_ports() method returns a bitmap (origin 0) * of ports which have started wakeup signaling but have not * yet finished resuming. During system resume we will * resume all the enabled ports, regardless of any wakeup * signals, which means the wakeup requests would be lost. * To prevent this, report them to the PM core here. */ resuming_ports = hcd->driver->get_resuming_ports(hcd); for (i = 0; i < hdev->maxchild; ++i) { if (test_bit(i, &resuming_ports)) { udev = hub->ports[i]->child; if (udev) pm_wakeup_event(&udev->dev, 0); } } } } static int hub_resume(struct usb_interface *intf) { struct usb_hub *hub = usb_get_intfdata(intf); dev_dbg(&intf->dev, "%s\n", __func__); hub_activate(hub, HUB_RESUME); /* * This should be called only for system resume, not runtime resume. * We can't tell the difference here, so some wakeup requests will be * reported at the wrong time or more than once. This shouldn't * matter much, so long as they do get reported. */ report_wakeup_requests(hub); return 0; } static int hub_reset_resume(struct usb_interface *intf) { struct usb_hub *hub = usb_get_intfdata(intf); dev_dbg(&intf->dev, "%s\n", __func__); hub_activate(hub, HUB_RESET_RESUME); return 0; } /** * usb_root_hub_lost_power - called by HCD if the root hub lost Vbus power * @rhdev: struct usb_device for the root hub * * The USB host controller driver calls this function when its root hub * is resumed and Vbus power has been interrupted or the controller * has been reset. The routine marks @rhdev as having lost power. * When the hub driver is resumed it will take notice and carry out * power-session recovery for all the "USB-PERSIST"-enabled child devices; * the others will be disconnected. 
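 *
 * As an illustrative sketch only (not taken from any particular HCD), a
 * host controller driver's resume path that cannot guarantee Vbus was
 * maintained might report the loss like this, where "vbus_maintained"
 * is a hypothetical flag the HCD would have to track itself:
 *
 *	if (!vbus_maintained)
 *		usb_root_hub_lost_power(hcd->self.root_hub);
 *
 * The hub driver then carries out the power-session recovery described
 * above when it is resumed.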
*/ void usb_root_hub_lost_power(struct usb_device *rhdev) { dev_notice(&rhdev->dev, "root hub lost power or was reset\n"); rhdev->reset_resume = 1; } EXPORT_SYMBOL_GPL(usb_root_hub_lost_power); static const char * const usb3_lpm_names[] = { "U0", "U1", "U2", "U3", }; /* * Send a Set SEL control transfer to the device, prior to enabling * device-initiated U1 or U2. This lets the device know the exit latencies from * the time the device initiates a U1 or U2 exit, to the time it will receive a * packet from the host. * * This function will fail if the SEL or PEL values for udev are greater than * the maximum allowed values for the link state to be enabled. */ static int usb_req_set_sel(struct usb_device *udev) { struct usb_set_sel_req *sel_values; unsigned long long u1_sel; unsigned long long u1_pel; unsigned long long u2_sel; unsigned long long u2_pel; int ret; if (!udev->parent || udev->speed < USB_SPEED_SUPER || !udev->lpm_capable) return 0; /* Convert SEL and PEL stored in ns to us */ u1_sel = DIV_ROUND_UP(udev->u1_params.sel, 1000); u1_pel = DIV_ROUND_UP(udev->u1_params.pel, 1000); u2_sel = DIV_ROUND_UP(udev->u2_params.sel, 1000); u2_pel = DIV_ROUND_UP(udev->u2_params.pel, 1000); /* * Make sure that the calculated SEL and PEL values for the link * state we're enabling aren't bigger than the max SEL/PEL * value that will fit in the SET SEL control transfer. * Otherwise the device would get an incorrect idea of the exit * latency for the link state, and could start a device-initiated * U1/U2 when the exit latencies are too high. */ if (u1_sel > USB3_LPM_MAX_U1_SEL_PEL || u1_pel > USB3_LPM_MAX_U1_SEL_PEL || u2_sel > USB3_LPM_MAX_U2_SEL_PEL || u2_pel > USB3_LPM_MAX_U2_SEL_PEL) { dev_dbg(&udev->dev, "Device-initiated U1/U2 disabled due to long SEL or PEL\n"); return -EINVAL; } /* * usb_enable_lpm() can be called as part of a failed device reset, * which may be initiated by an error path of a mass storage driver. * Therefore, use GFP_NOIO. */ sel_values = kmalloc(sizeof *(sel_values), GFP_NOIO); if (!sel_values) return -ENOMEM; sel_values->u1_sel = u1_sel; sel_values->u1_pel = u1_pel; sel_values->u2_sel = cpu_to_le16(u2_sel); sel_values->u2_pel = cpu_to_le16(u2_pel); ret = usb_control_msg(udev, usb_sndctrlpipe(udev, 0), USB_REQ_SET_SEL, USB_RECIP_DEVICE, 0, 0, sel_values, sizeof *(sel_values), USB_CTRL_SET_TIMEOUT); kfree(sel_values); if (ret > 0) udev->lpm_devinit_allow = 1; return ret; } /* * Enable or disable device-initiated U1 or U2 transitions. */ static int usb_set_device_initiated_lpm(struct usb_device *udev, enum usb3_link_state state, bool enable) { int ret; int feature; switch (state) { case USB3_LPM_U1: feature = USB_DEVICE_U1_ENABLE; break; case USB3_LPM_U2: feature = USB_DEVICE_U2_ENABLE; break; default: dev_warn(&udev->dev, "%s: Can't %s non-U1 or U2 state.\n", __func__, str_enable_disable(enable)); return -EINVAL; } if (udev->state != USB_STATE_CONFIGURED) { dev_dbg(&udev->dev, "%s: Can't %s %s state " "for unconfigured device.\n", __func__, str_enable_disable(enable), usb3_lpm_names[state]); return -EINVAL; } if (enable) { /* * Now send the control transfer to enable device-initiated LPM * for either U1 or U2. 
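 *
 * (As a worked example for usb_req_set_sel() above, with an illustrative
 * number: a stored u1_params.sel of 8125 ns is sent as
 * DIV_ROUND_UP(8125, 1000) = 9 us, and the SET_SEL request is only issued
 * when the converted values fit the USB3_LPM_MAX_U1_SEL_PEL /
 * USB3_LPM_MAX_U2_SEL_PEL limits. A successful SET_SEL is also what sets
 * lpm_devinit_allow, which usb_device_may_initiate_lpm() checks below.)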
*/ ret = usb_control_msg(udev, usb_sndctrlpipe(udev, 0), USB_REQ_SET_FEATURE, USB_RECIP_DEVICE, feature, 0, NULL, 0, USB_CTRL_SET_TIMEOUT); } else { ret = usb_control_msg(udev, usb_sndctrlpipe(udev, 0), USB_REQ_CLEAR_FEATURE, USB_RECIP_DEVICE, feature, 0, NULL, 0, USB_CTRL_SET_TIMEOUT); } if (ret < 0) { dev_warn(&udev->dev, "%s of device-initiated %s failed.\n", str_enable_disable(enable), usb3_lpm_names[state]); return -EBUSY; } return 0; } static int usb_set_lpm_timeout(struct usb_device *udev, enum usb3_link_state state, int timeout) { int ret; int feature; switch (state) { case USB3_LPM_U1: feature = USB_PORT_FEAT_U1_TIMEOUT; break; case USB3_LPM_U2: feature = USB_PORT_FEAT_U2_TIMEOUT; break; default: dev_warn(&udev->dev, "%s: Can't set timeout for non-U1 or U2 state.\n", __func__); return -EINVAL; } if (state == USB3_LPM_U1 && timeout > USB3_LPM_U1_MAX_TIMEOUT && timeout != USB3_LPM_DEVICE_INITIATED) { dev_warn(&udev->dev, "Failed to set %s timeout to 0x%x, " "which is a reserved value.\n", usb3_lpm_names[state], timeout); return -EINVAL; } ret = set_port_feature(udev->parent, USB_PORT_LPM_TIMEOUT(timeout) | udev->portnum, feature); if (ret < 0) { dev_warn(&udev->dev, "Failed to set %s timeout to 0x%x," "error code %i\n", usb3_lpm_names[state], timeout, ret); return -EBUSY; } if (state == USB3_LPM_U1) udev->u1_params.timeout = timeout; else udev->u2_params.timeout = timeout; return 0; } /* * Don't allow device intiated U1/U2 if device isn't in the configured state, * or the system exit latency + one bus interval is greater than the minimum * service interval of any active periodic endpoint. See USB 3.2 section 9.4.9 */ static bool usb_device_may_initiate_lpm(struct usb_device *udev, enum usb3_link_state state) { unsigned int sel; /* us */ int i, j; if (!udev->lpm_devinit_allow || !udev->actconfig) return false; if (state == USB3_LPM_U1) sel = DIV_ROUND_UP(udev->u1_params.sel, 1000); else if (state == USB3_LPM_U2) sel = DIV_ROUND_UP(udev->u2_params.sel, 1000); else return false; for (i = 0; i < udev->actconfig->desc.bNumInterfaces; i++) { struct usb_interface *intf; struct usb_endpoint_descriptor *desc; unsigned int interval; intf = udev->actconfig->interface[i]; if (!intf) continue; for (j = 0; j < intf->cur_altsetting->desc.bNumEndpoints; j++) { desc = &intf->cur_altsetting->endpoint[j].desc; if (usb_endpoint_xfer_int(desc) || usb_endpoint_xfer_isoc(desc)) { interval = (1 << (desc->bInterval - 1)) * 125; if (sel + 125 > interval) return false; } } } return true; } /* * Enable the hub-initiated U1/U2 idle timeouts, and enable device-initiated * U1/U2 entry. * * We will attempt to enable U1 or U2, but there are no guarantees that the * control transfers to set the hub timeout or enable device-initiated U1/U2 * will be successful. * * If the control transfer to enable device-initiated U1/U2 entry fails, then * hub-initiated U1/U2 will be disabled. * * If we cannot set the parent hub U1/U2 timeout, we attempt to let the xHCI * driver know about it. If that call fails, it should be harmless, and just * take up more slightly more bus bandwidth for unnecessary U1/U2 exit latency. 
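 *
 * As a worked example for usb_device_may_initiate_lpm() above (numbers
 * are illustrative): an active isochronous endpoint with bInterval = 4
 * has a service interval of (1 << (4 - 1)) * 125 = 1000 us, so
 * device-initiated entry is only permitted while sel + 125 <= 1000 us,
 * i.e. for an exit latency (SEL) of at most 875 us.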
 */
static int usb_enable_link_state(struct usb_hcd *hcd, struct usb_device *udev,
		enum usb3_link_state state)
{
	int timeout;
	__u8 u1_mel;
	__le16 u2_mel;

	/* Skip if the device BOS descriptor couldn't be read */
	if (!udev->bos)
		return -EINVAL;

	u1_mel = udev->bos->ss_cap->bU1devExitLat;
	u2_mel = udev->bos->ss_cap->bU2DevExitLat;

	/* If the device says it doesn't have *any* exit latency to come out of
	 * U1 or U2, it's probably lying. Assume it doesn't implement that link
	 * state.
	 */
	if ((state == USB3_LPM_U1 && u1_mel == 0) ||
			(state == USB3_LPM_U2 && u2_mel == 0))
		return -EINVAL;

	/* We allow the host controller to set the U1/U2 timeout internally
	 * first, so that it can change its schedule to account for the
	 * additional latency to send data to a device in a lower power
	 * link state.
	 */
	timeout = hcd->driver->enable_usb3_lpm_timeout(hcd, udev, state);

	/* xHCI host controller doesn't want to enable this LPM state. */
	if (timeout == 0)
		return -EINVAL;

	if (timeout < 0) {
		dev_warn(&udev->dev, "Could not enable %s link state, "
				"xHCI error %i.\n", usb3_lpm_names[state],
				timeout);
		return timeout;
	}

	if (usb_set_lpm_timeout(udev, state, timeout)) {
		/* If we can't set the parent hub U1/U2 timeout,
		 * device-initiated LPM won't be allowed either, so let the xHCI
		 * host know that this link state won't be enabled.
		 */
		hcd->driver->disable_usb3_lpm_timeout(hcd, udev, state);
		return -EBUSY;
	}

	if (state == USB3_LPM_U1)
		udev->usb3_lpm_u1_enabled = 1;
	else if (state == USB3_LPM_U2)
		udev->usb3_lpm_u2_enabled = 1;

	return 0;
}

/*
 * Disable the hub-initiated U1/U2 idle timeouts, and disable device-initiated
 * U1/U2 entry.
 *
 * If this function returns -EBUSY, the parent hub will still allow U1/U2 entry.
 * If zero is returned, the parent will not allow the link to go into U1/U2.
 *
 * If zero is returned, device-initiated U1/U2 entry may still be enabled, but
 * it won't have an effect on the bus link state because the parent hub will
 * still disallow device-initiated U1/U2 entry.
 *
 * If zero is returned, the xHCI host controller may still think U1/U2 entry is
 * possible. The result is that slightly more bus bandwidth will be taken up
 * (to account for U1/U2 exit latency), but it should be harmless.
 */
static int usb_disable_link_state(struct usb_hcd *hcd, struct usb_device *udev,
		enum usb3_link_state state)
{
	switch (state) {
	case USB3_LPM_U1:
	case USB3_LPM_U2:
		break;
	default:
		dev_warn(&udev->dev, "%s: Can't disable non-U1 or U2 state.\n",
				__func__);
		return -EINVAL;
	}

	if (usb_set_lpm_timeout(udev, state, 0))
		return -EBUSY;

	if (hcd->driver->disable_usb3_lpm_timeout(hcd, udev, state))
		dev_warn(&udev->dev, "Could not disable xHCI %s timeout, "
				"bus schedule bandwidth may be impacted.\n",
				usb3_lpm_names[state]);

	/* Once usb_set_lpm_timeout(0) returns 0, hub-initiated LPM is disabled
	 * and the hub will no longer allow the link to enter U1/U2, even if
	 * the device initiates it. Hence LPM is effectively disabled whenever
	 * the hub LPM timeout is set to 0, regardless of whether
	 * device-initiated LPM was also disabled.
	 */
	if (state == USB3_LPM_U1)
		udev->usb3_lpm_u1_enabled = 0;
	else if (state == USB3_LPM_U2)
		udev->usb3_lpm_u2_enabled = 0;

	return 0;
}

/*
 * Disable hub-initiated and device-initiated U1 and U2 entry.
 * Caller must own the bandwidth_mutex.
 *
 * This will call usb_enable_lpm() on failure, which will decrement
 * lpm_disable_count, and will re-enable LPM if lpm_disable_count reaches zero.
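 *
 * A minimal usage sketch (illustrative, not taken from a real caller):
 * code that cannot tolerate U1/U2 exit latency and does not already hold
 * the bandwidth_mutex would bracket its critical section with the
 * unlocked wrappers defined below:
 *
 *	if (usb_unlocked_disable_lpm(udev))
 *		return -EBUSY;
 *	... latency-sensitive transfers ...
 *	usb_unlocked_enable_lpm(udev);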
*/ int usb_disable_lpm(struct usb_device *udev) { struct usb_hcd *hcd; int err; if (!udev || !udev->parent || udev->speed < USB_SPEED_SUPER || !udev->lpm_capable || udev->state < USB_STATE_CONFIGURED) return 0; hcd = bus_to_hcd(udev->bus); if (!hcd || !hcd->driver->disable_usb3_lpm_timeout) return 0; udev->lpm_disable_count++; if ((udev->u1_params.timeout == 0 && udev->u2_params.timeout == 0)) return 0; /* If LPM is enabled, attempt to disable it. */ if (usb_disable_link_state(hcd, udev, USB3_LPM_U1)) goto disable_failed; if (usb_disable_link_state(hcd, udev, USB3_LPM_U2)) goto disable_failed; err = usb_set_device_initiated_lpm(udev, USB3_LPM_U1, false); if (!err) usb_set_device_initiated_lpm(udev, USB3_LPM_U2, false); return 0; disable_failed: udev->lpm_disable_count--; return -EBUSY; } EXPORT_SYMBOL_GPL(usb_disable_lpm); /* Grab the bandwidth_mutex before calling usb_disable_lpm() */ int usb_unlocked_disable_lpm(struct usb_device *udev) { struct usb_hcd *hcd = bus_to_hcd(udev->bus); int ret; if (!hcd) return -EINVAL; mutex_lock(hcd->bandwidth_mutex); ret = usb_disable_lpm(udev); mutex_unlock(hcd->bandwidth_mutex); return ret; } EXPORT_SYMBOL_GPL(usb_unlocked_disable_lpm); /* * Attempt to enable device-initiated and hub-initiated U1 and U2 entry. The * xHCI host policy may prevent U1 or U2 from being enabled. * * Other callers may have disabled link PM, so U1 and U2 entry will be disabled * until the lpm_disable_count drops to zero. Caller must own the * bandwidth_mutex. */ void usb_enable_lpm(struct usb_device *udev) { struct usb_hcd *hcd; struct usb_hub *hub; struct usb_port *port_dev; if (!udev || !udev->parent || udev->speed < USB_SPEED_SUPER || !udev->lpm_capable || udev->state < USB_STATE_CONFIGURED) return; udev->lpm_disable_count--; hcd = bus_to_hcd(udev->bus); /* Double check that we can both enable and disable LPM. * Device must be configured to accept set feature U1/U2 timeout. 
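 *
 * Note that the disable/enable paths are reference counted through
 * lpm_disable_count: usb_disable_lpm() increments it and usb_enable_lpm()
 * decrements it, so nested pairs are safe and U1/U2 is only re-enabled
 * below once the count has dropped back to zero.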
*/ if (!hcd || !hcd->driver->enable_usb3_lpm_timeout || !hcd->driver->disable_usb3_lpm_timeout) return; if (udev->lpm_disable_count > 0) return; hub = usb_hub_to_struct_hub(udev->parent); if (!hub) return; port_dev = hub->ports[udev->portnum - 1]; if (port_dev->usb3_lpm_u1_permit) if (usb_enable_link_state(hcd, udev, USB3_LPM_U1)) return; if (port_dev->usb3_lpm_u2_permit) if (usb_enable_link_state(hcd, udev, USB3_LPM_U2)) return; /* * Enable device initiated U1/U2 with a SetFeature(U1/U2_ENABLE) request * if system exit latency is short enough and device is configured */ if (usb_device_may_initiate_lpm(udev, USB3_LPM_U1)) { if (usb_set_device_initiated_lpm(udev, USB3_LPM_U1, true)) return; if (usb_device_may_initiate_lpm(udev, USB3_LPM_U2)) usb_set_device_initiated_lpm(udev, USB3_LPM_U2, true); } } EXPORT_SYMBOL_GPL(usb_enable_lpm); /* Grab the bandwidth_mutex before calling usb_enable_lpm() */ void usb_unlocked_enable_lpm(struct usb_device *udev) { struct usb_hcd *hcd = bus_to_hcd(udev->bus); if (!hcd) return; mutex_lock(hcd->bandwidth_mutex); usb_enable_lpm(udev); mutex_unlock(hcd->bandwidth_mutex); } EXPORT_SYMBOL_GPL(usb_unlocked_enable_lpm); /* usb3 devices use U3 for disabled, make sure remote wakeup is disabled */ static void hub_usb3_port_prepare_disable(struct usb_hub *hub, struct usb_port *port_dev) { struct usb_device *udev = port_dev->child; int ret; if (udev && udev->port_is_suspended && udev->do_remote_wakeup) { ret = hub_set_port_link_state(hub, port_dev->portnum, USB_SS_PORT_LS_U0); if (!ret) { msleep(USB_RESUME_TIMEOUT); ret = usb_disable_remote_wakeup(udev); } if (ret) dev_warn(&udev->dev, "Port disable: can't disable remote wake\n"); udev->do_remote_wakeup = 0; } } #else /* CONFIG_PM */ #define hub_suspend NULL #define hub_resume NULL #define hub_reset_resume NULL static inline void hub_usb3_port_prepare_disable(struct usb_hub *hub, struct usb_port *port_dev) { } int usb_disable_lpm(struct usb_device *udev) { return 0; } EXPORT_SYMBOL_GPL(usb_disable_lpm); void usb_enable_lpm(struct usb_device *udev) { } EXPORT_SYMBOL_GPL(usb_enable_lpm); int usb_unlocked_disable_lpm(struct usb_device *udev) { return 0; } EXPORT_SYMBOL_GPL(usb_unlocked_disable_lpm); void usb_unlocked_enable_lpm(struct usb_device *udev) { } EXPORT_SYMBOL_GPL(usb_unlocked_enable_lpm); int usb_disable_ltm(struct usb_device *udev) { return 0; } EXPORT_SYMBOL_GPL(usb_disable_ltm); void usb_enable_ltm(struct usb_device *udev) { } EXPORT_SYMBOL_GPL(usb_enable_ltm); static int hub_handle_remote_wakeup(struct usb_hub *hub, unsigned int port, u16 portstatus, u16 portchange) { return 0; } static int usb_req_set_sel(struct usb_device *udev) { return 0; } #endif /* CONFIG_PM */ /* * USB-3 does not have a similar link state as USB-2 that will avoid negotiating * a connection with a plugged-in cable but will signal the host when the cable * is unplugged. 
Disable remote wake and set link state to U3 for USB-3 devices */ static int hub_port_disable(struct usb_hub *hub, int port1, int set_state) { struct usb_port *port_dev = hub->ports[port1 - 1]; struct usb_device *hdev = hub->hdev; int ret = 0; if (!hub->error) { if (hub_is_superspeed(hub->hdev)) { hub_usb3_port_prepare_disable(hub, port_dev); ret = hub_set_port_link_state(hub, port_dev->portnum, USB_SS_PORT_LS_U3); } else { ret = usb_clear_port_feature(hdev, port1, USB_PORT_FEAT_ENABLE); } } if (port_dev->child && set_state) usb_set_device_state(port_dev->child, USB_STATE_NOTATTACHED); if (ret && ret != -ENODEV) dev_err(&port_dev->dev, "cannot disable (err = %d)\n", ret); return ret; } /* * usb_port_disable - disable a usb device's upstream port * @udev: device to disable * Context: @udev locked, must be able to sleep. * * Disables a USB device that isn't in active use. */ int usb_port_disable(struct usb_device *udev) { struct usb_hub *hub = usb_hub_to_struct_hub(udev->parent); return hub_port_disable(hub, udev->portnum, 0); } /* USB 2.0 spec, 7.1.7.3 / fig 7-29: * * Between connect detection and reset signaling there must be a delay * of 100ms at least for debounce and power-settling. The corresponding * timer shall restart whenever the downstream port detects a disconnect. * * Apparently there are some bluetooth and irda-dongles and a number of * low-speed devices for which this debounce period may last over a second. * Not covered by the spec - but easy to deal with. * * This implementation uses a 1500ms total debounce timeout; if the * connection isn't stable by then it returns -ETIMEDOUT. It checks * every 25ms for transient disconnects. When the port status has been * unchanged for 100ms it returns the port status. */ int hub_port_debounce(struct usb_hub *hub, int port1, bool must_be_connected) { int ret; u16 portchange, portstatus; unsigned connection = 0xffff; int total_time, stable_time = 0; struct usb_port *port_dev = hub->ports[port1 - 1]; for (total_time = 0; ; total_time += HUB_DEBOUNCE_STEP) { ret = usb_hub_port_status(hub, port1, &portstatus, &portchange); if (ret < 0) return ret; if (!(portchange & USB_PORT_STAT_C_CONNECTION) && (portstatus & USB_PORT_STAT_CONNECTION) == connection) { if (!must_be_connected || (connection == USB_PORT_STAT_CONNECTION)) stable_time += HUB_DEBOUNCE_STEP; if (stable_time >= HUB_DEBOUNCE_STABLE) break; } else { stable_time = 0; connection = portstatus & USB_PORT_STAT_CONNECTION; } if (portchange & USB_PORT_STAT_C_CONNECTION) { usb_clear_port_feature(hub->hdev, port1, USB_PORT_FEAT_C_CONNECTION); } if (total_time >= HUB_DEBOUNCE_TIMEOUT) break; msleep(HUB_DEBOUNCE_STEP); } dev_dbg(&port_dev->dev, "debounce total %dms stable %dms status 0x%x\n", total_time, stable_time, portstatus); if (stable_time < HUB_DEBOUNCE_STABLE) return -ETIMEDOUT; return portstatus; } void usb_ep0_reinit(struct usb_device *udev) { usb_disable_endpoint(udev, 0 + USB_DIR_IN, true); usb_disable_endpoint(udev, 0 + USB_DIR_OUT, true); usb_enable_endpoint(udev, &udev->ep0, true); } EXPORT_SYMBOL_GPL(usb_ep0_reinit); static int hub_set_address(struct usb_device *udev, int devnum) { int retval; unsigned int timeout_ms = USB_CTRL_SET_TIMEOUT; struct usb_hcd *hcd = bus_to_hcd(udev->bus); struct usb_hub *hub = usb_hub_to_struct_hub(udev->parent); if (hub->hdev->quirks & USB_QUIRK_SHORT_SET_ADDRESS_REQ_TIMEOUT) timeout_ms = USB_SHORT_SET_ADDRESS_REQ_TIMEOUT; /* * The host controller will choose the device address, * instead of the core having chosen it earlier */ if 
(!hcd->driver->address_device && devnum <= 1) return -EINVAL; if (udev->state == USB_STATE_ADDRESS) return 0; if (udev->state != USB_STATE_DEFAULT) return -EINVAL; if (hcd->driver->address_device) retval = hcd->driver->address_device(hcd, udev, timeout_ms); else retval = usb_control_msg(udev, usb_sndctrlpipe(udev, 0), USB_REQ_SET_ADDRESS, 0, devnum, 0, NULL, 0, timeout_ms); if (retval == 0) { update_devnum(udev, devnum); /* Device now using proper address. */ usb_set_device_state(udev, USB_STATE_ADDRESS); usb_ep0_reinit(udev); } return retval; } /* * There are reports of USB 3.0 devices that say they support USB 2.0 Link PM * when they're plugged into a USB 2.0 port, but they don't work when LPM is * enabled. * * Only enable USB 2.0 Link PM if the port is internal (hardwired), or the * device says it supports the new USB 2.0 Link PM errata by setting the BESL * support bit in the BOS descriptor. */ static void hub_set_initial_usb2_lpm_policy(struct usb_device *udev) { struct usb_hub *hub = usb_hub_to_struct_hub(udev->parent); int connect_type = USB_PORT_CONNECT_TYPE_UNKNOWN; if (!udev->usb2_hw_lpm_capable || !udev->bos) return; if (hub) connect_type = hub->ports[udev->portnum - 1]->connect_type; if ((udev->bos->ext_cap->bmAttributes & cpu_to_le32(USB_BESL_SUPPORT)) || connect_type == USB_PORT_CONNECT_TYPE_HARD_WIRED) { udev->usb2_hw_lpm_allowed = 1; usb_enable_usb2_hardware_lpm(udev); } } static int hub_enable_device(struct usb_device *udev) { struct usb_hcd *hcd = bus_to_hcd(udev->bus); if (!hcd->driver->enable_device) return 0; if (udev->state == USB_STATE_ADDRESS) return 0; if (udev->state != USB_STATE_DEFAULT) return -EINVAL; return hcd->driver->enable_device(hcd, udev); } /* * Get the bMaxPacketSize0 value during initialization by reading the * device's device descriptor. Since we don't already know this value, * the transfer is unsafe and it ignores I/O errors, only testing for * reasonable received values. * * For "old scheme" initialization, size will be 8 so we read just the * start of the device descriptor, which should work okay regardless of * the actual bMaxPacketSize0 value. For "new scheme" initialization, * size will be 64 (and buf will point to a sufficiently large buffer), * which might not be kosher according to the USB spec but it's what * Windows does and what many devices expect. * * Returns: bMaxPacketSize0 or a negative error code. */ static int get_bMaxPacketSize0(struct usb_device *udev, struct usb_device_descriptor *buf, int size, bool first_time) { int i, rc; /* * Retry on all errors; some devices are flakey. * 255 is for WUSB devices, we actually need to use * 512 (WUSB1.0[4.8.1]). */ for (i = 0; i < GET_MAXPACKET0_TRIES; ++i) { /* Start with invalid values in case the transfer fails */ buf->bDescriptorType = buf->bMaxPacketSize0 = 0; rc = usb_control_msg(udev, usb_rcvctrlpipe(udev, 0), USB_REQ_GET_DESCRIPTOR, USB_DIR_IN, USB_DT_DEVICE << 8, 0, buf, size, initial_descriptor_timeout); switch (buf->bMaxPacketSize0) { case 8: case 16: case 32: case 64: case 9: if (buf->bDescriptorType == USB_DT_DEVICE) { rc = buf->bMaxPacketSize0; break; } fallthrough; default: if (rc >= 0) rc = -EPROTO; break; } /* * Some devices time out if they are powered on * when already connected. They need a second * reset, so return early. But only on the first * attempt, lest we get into a time-out/reset loop. 
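 *
 * Also note that for SuperSpeed and faster devices bMaxPacketSize0 is an
 * exponent rather than a byte count: the accepted value 9 above means
 * 2^9 = 512 bytes, which is why hub_port_init() below converts it with
 * i = 1 << maxp0 before comparing it with the ep0 descriptor value.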
*/ if (rc > 0 || (rc == -ETIMEDOUT && first_time && udev->speed > USB_SPEED_FULL)) break; } return rc; } #define GET_DESCRIPTOR_BUFSIZE 64 /* Reset device, (re)assign address, get device descriptor. * Device connection must be stable, no more debouncing needed. * Returns device in USB_STATE_ADDRESS, except on error. * * If this is called for an already-existing device (as part of * usb_reset_and_verify_device), the caller must own the device lock and * the port lock. For a newly detected device that is not accessible * through any global pointers, it's not necessary to lock the device, * but it is still necessary to lock the port. * * For a newly detected device, @dev_descr must be NULL. The device * descriptor retrieved from the device will then be stored in * @udev->descriptor. For an already existing device, @dev_descr * must be non-NULL. The device descriptor will be stored there, * not in @udev->descriptor, because descriptors for registered * devices are meant to be immutable. */ static int hub_port_init(struct usb_hub *hub, struct usb_device *udev, int port1, int retry_counter, struct usb_device_descriptor *dev_descr) { struct usb_device *hdev = hub->hdev; struct usb_hcd *hcd = bus_to_hcd(hdev->bus); struct usb_port *port_dev = hub->ports[port1 - 1]; int retries, operations, retval, i; unsigned delay = HUB_SHORT_RESET_TIME; enum usb_device_speed oldspeed = udev->speed; const char *speed; int devnum = udev->devnum; const char *driver_name; bool do_new_scheme; const bool initial = !dev_descr; int maxp0; struct usb_device_descriptor *buf, *descr; buf = kmalloc(GET_DESCRIPTOR_BUFSIZE, GFP_NOIO); if (!buf) return -ENOMEM; /* root hub ports have a slightly longer reset period * (from USB 2.0 spec, section 7.1.7.5) */ if (!hdev->parent) { delay = HUB_ROOT_RESET_TIME; if (port1 == hdev->bus->otg_port) hdev->bus->b_hnp_enable = 0; } /* Some low speed devices have problems with the quick delay, so */ /* be a bit pessimistic with those devices. RHbug #23670 */ if (oldspeed == USB_SPEED_LOW) delay = HUB_LONG_RESET_TIME; /* Reset the device; full speed may morph to high speed */ /* FIXME a USB 2.0 device may morph into SuperSpeed on reset. */ retval = hub_port_reset(hub, port1, udev, delay, false); if (retval < 0) /* error or disconnect */ goto fail; /* success, speed is known */ retval = -ENODEV; /* Don't allow speed changes at reset, except usb 3.0 to faster */ if (oldspeed != USB_SPEED_UNKNOWN && oldspeed != udev->speed && !(oldspeed == USB_SPEED_SUPER && udev->speed > oldspeed)) { dev_dbg(&udev->dev, "device reset changed speed!\n"); goto fail; } oldspeed = udev->speed; if (initial) { /* USB 2.0 section 5.5.3 talks about ep0 maxpacket ... * it's fixed size except for full speed devices. */ switch (udev->speed) { case USB_SPEED_SUPER_PLUS: case USB_SPEED_SUPER: udev->ep0.desc.wMaxPacketSize = cpu_to_le16(512); break; case USB_SPEED_HIGH: /* fixed at 64 */ udev->ep0.desc.wMaxPacketSize = cpu_to_le16(64); break; case USB_SPEED_FULL: /* 8, 16, 32, or 64 */ /* to determine the ep0 maxpacket size, try to read * the device descriptor to get bMaxPacketSize0 and * then correct our initial guess. */ udev->ep0.desc.wMaxPacketSize = cpu_to_le16(64); break; case USB_SPEED_LOW: /* fixed at 8 */ udev->ep0.desc.wMaxPacketSize = cpu_to_le16(8); break; default: goto fail; } } speed = usb_speed_string(udev->speed); /* * The controller driver may be NULL if the controller device * is the middle device between platform device and roothub. 
* This middle device may not need a device driver due to * all hardware control can be at platform device driver, this * platform device is usually a dual-role USB controller device. */ if (udev->bus->controller->driver) driver_name = udev->bus->controller->driver->name; else driver_name = udev->bus->sysdev->driver->name; if (udev->speed < USB_SPEED_SUPER) dev_info(&udev->dev, "%s %s USB device number %d using %s\n", (initial ? "new" : "reset"), speed, devnum, driver_name); if (initial) { /* Set up TT records, if needed */ if (hdev->tt) { udev->tt = hdev->tt; udev->ttport = hdev->ttport; } else if (udev->speed != USB_SPEED_HIGH && hdev->speed == USB_SPEED_HIGH) { if (!hub->tt.hub) { dev_err(&udev->dev, "parent hub has no TT\n"); retval = -EINVAL; goto fail; } udev->tt = &hub->tt; udev->ttport = port1; } } /* Why interleave GET_DESCRIPTOR and SET_ADDRESS this way? * Because device hardware and firmware is sometimes buggy in * this area, and this is how Linux has done it for ages. * Change it cautiously. * * NOTE: If use_new_scheme() is true we will start by issuing * a 64-byte GET_DESCRIPTOR request. This is what Windows does, * so it may help with some non-standards-compliant devices. * Otherwise we start with SET_ADDRESS and then try to read the * first 8 bytes of the device descriptor to get the ep0 maxpacket * value. */ do_new_scheme = use_new_scheme(udev, retry_counter, port_dev); for (retries = 0; retries < GET_DESCRIPTOR_TRIES; (++retries, msleep(100))) { if (hub_port_stop_enumerate(hub, port1, retries)) { retval = -ENODEV; break; } if (do_new_scheme) { retval = hub_enable_device(udev); if (retval < 0) { dev_err(&udev->dev, "hub failed to enable device, error %d\n", retval); goto fail; } maxp0 = get_bMaxPacketSize0(udev, buf, GET_DESCRIPTOR_BUFSIZE, retries == 0); if (maxp0 > 0 && !initial && maxp0 != udev->descriptor.bMaxPacketSize0) { dev_err(&udev->dev, "device reset changed ep0 maxpacket size!\n"); retval = -ENODEV; goto fail; } retval = hub_port_reset(hub, port1, udev, delay, false); if (retval < 0) /* error or disconnect */ goto fail; if (oldspeed != udev->speed) { dev_dbg(&udev->dev, "device reset changed speed!\n"); retval = -ENODEV; goto fail; } if (maxp0 < 0) { if (maxp0 != -ENODEV) dev_err(&udev->dev, "device descriptor read/64, error %d\n", maxp0); retval = maxp0; continue; } } for (operations = 0; operations < SET_ADDRESS_TRIES; ++operations) { retval = hub_set_address(udev, devnum); if (retval >= 0) break; msleep(200); } if (retval < 0) { if (retval != -ENODEV) dev_err(&udev->dev, "device not accepting address %d, error %d\n", devnum, retval); goto fail; } if (udev->speed >= USB_SPEED_SUPER) { devnum = udev->devnum; dev_info(&udev->dev, "%s SuperSpeed%s%s USB device number %d using %s\n", (udev->config) ? "reset" : "new", (udev->speed == USB_SPEED_SUPER_PLUS) ? " Plus" : "", (udev->ssp_rate == USB_SSP_GEN_2x2) ? " Gen 2x2" : (udev->ssp_rate == USB_SSP_GEN_2x1) ? " Gen 2x1" : (udev->ssp_rate == USB_SSP_GEN_1x2) ? 
" Gen 1x2" : "", devnum, driver_name); } /* * cope with hardware quirkiness: * - let SET_ADDRESS settle, some device hardware wants it * - read ep0 maxpacket even for high and low speed, */ msleep(10); if (do_new_scheme) break; maxp0 = get_bMaxPacketSize0(udev, buf, 8, retries == 0); if (maxp0 < 0) { retval = maxp0; if (retval != -ENODEV) dev_err(&udev->dev, "device descriptor read/8, error %d\n", retval); } else { u32 delay; if (!initial && maxp0 != udev->descriptor.bMaxPacketSize0) { dev_err(&udev->dev, "device reset changed ep0 maxpacket size!\n"); retval = -ENODEV; goto fail; } delay = udev->parent->hub_delay; udev->hub_delay = min_t(u32, delay, USB_TP_TRANSMISSION_DELAY_MAX); retval = usb_set_isoch_delay(udev); if (retval) { dev_dbg(&udev->dev, "Failed set isoch delay, error %d\n", retval); retval = 0; } break; } } if (retval) goto fail; /* * Check the ep0 maxpacket guess and correct it if necessary. * maxp0 is the value stored in the device descriptor; * i is the value it encodes (logarithmic for SuperSpeed or greater). */ i = maxp0; if (udev->speed >= USB_SPEED_SUPER) { if (maxp0 <= 16) i = 1 << maxp0; else i = 0; /* Invalid */ } if (usb_endpoint_maxp(&udev->ep0.desc) == i) { ; /* Initial ep0 maxpacket guess is right */ } else if (((udev->speed == USB_SPEED_FULL || udev->speed == USB_SPEED_HIGH) && (i == 8 || i == 16 || i == 32 || i == 64)) || (udev->speed >= USB_SPEED_SUPER && i > 0)) { /* Initial guess is wrong; use the descriptor's value */ if (udev->speed == USB_SPEED_FULL) dev_dbg(&udev->dev, "ep0 maxpacket = %d\n", i); else dev_warn(&udev->dev, "Using ep0 maxpacket: %d\n", i); udev->ep0.desc.wMaxPacketSize = cpu_to_le16(i); usb_ep0_reinit(udev); } else { /* Initial guess is wrong and descriptor's value is invalid */ dev_err(&udev->dev, "Invalid ep0 maxpacket: %d\n", maxp0); retval = -EMSGSIZE; goto fail; } descr = usb_get_device_descriptor(udev); if (IS_ERR(descr)) { retval = PTR_ERR(descr); if (retval != -ENODEV) dev_err(&udev->dev, "device descriptor read/all, error %d\n", retval); goto fail; } if (initial) udev->descriptor = *descr; else *dev_descr = *descr; kfree(descr); /* * Some superspeed devices have finished the link training process * and attached to a superspeed hub port, but the device descriptor * got from those devices show they aren't superspeed devices. Warm * reset the port attached by the devices can fix them. 
*/ if ((udev->speed >= USB_SPEED_SUPER) && (le16_to_cpu(udev->descriptor.bcdUSB) < 0x0300)) { dev_err(&udev->dev, "got a wrong device descriptor, warm reset device\n"); hub_port_reset(hub, port1, udev, HUB_BH_RESET_TIME, true); retval = -EINVAL; goto fail; } usb_detect_quirks(udev); if (le16_to_cpu(udev->descriptor.bcdUSB) >= 0x0201) { retval = usb_get_bos_descriptor(udev); if (!retval) { udev->lpm_capable = usb_device_supports_lpm(udev); udev->lpm_disable_count = 1; usb_set_lpm_parameters(udev); usb_req_set_sel(udev); } } retval = 0; /* notify HCD that we have a device connected and addressed */ if (hcd->driver->update_device) hcd->driver->update_device(hcd, udev); hub_set_initial_usb2_lpm_policy(udev); fail: if (retval) { hub_port_disable(hub, port1, 0); update_devnum(udev, devnum); /* for disconnect processing */ } kfree(buf); return retval; } static void check_highspeed(struct usb_hub *hub, struct usb_device *udev, int port1) { struct usb_qualifier_descriptor *qual; int status; if (udev->quirks & USB_QUIRK_DEVICE_QUALIFIER) return; qual = kmalloc(sizeof *qual, GFP_KERNEL); if (qual == NULL) return; status = usb_get_descriptor(udev, USB_DT_DEVICE_QUALIFIER, 0, qual, sizeof *qual); if (status == sizeof *qual) { dev_info(&udev->dev, "not running at top speed; " "connect to a high speed hub\n"); /* hub LEDs are probably harder to miss than syslog */ if (hub->has_indicators) { hub->indicator[port1-1] = INDICATOR_GREEN_BLINK; queue_delayed_work(system_power_efficient_wq, &hub->leds, 0); } } kfree(qual); } static unsigned hub_power_remaining(struct usb_hub *hub) { struct usb_device *hdev = hub->hdev; int remaining; int port1; if (!hub->limited_power) return 0; remaining = hdev->bus_mA - hub->descriptor->bHubContrCurrent; for (port1 = 1; port1 <= hdev->maxchild; ++port1) { struct usb_port *port_dev = hub->ports[port1 - 1]; struct usb_device *udev = port_dev->child; unsigned unit_load; int delta; if (!udev) continue; if (hub_is_superspeed(udev)) unit_load = 150; else unit_load = 100; /* * Unconfigured devices may not use more than one unit load, * or 8mA for OTG ports */ if (udev->actconfig) delta = usb_get_max_power(udev, udev->actconfig); else if (port1 != udev->bus->otg_port || hdev->parent) delta = unit_load; else delta = 8; if (delta > hub->mA_per_port) dev_warn(&port_dev->dev, "%dmA is over %umA budget!\n", delta, hub->mA_per_port); remaining -= delta; } if (remaining < 0) { dev_warn(hub->intfdev, "%dmA over power budget!\n", -remaining); remaining = 0; } return remaining; } static int descriptors_changed(struct usb_device *udev, struct usb_device_descriptor *new_device_descriptor, struct usb_host_bos *old_bos) { int changed = 0; unsigned index; unsigned serial_len = 0; unsigned len; unsigned old_length; int length; char *buf; if (memcmp(&udev->descriptor, new_device_descriptor, sizeof(*new_device_descriptor)) != 0) return 1; if ((old_bos && !udev->bos) || (!old_bos && udev->bos)) return 1; if (udev->bos) { len = le16_to_cpu(udev->bos->desc->wTotalLength); if (len != le16_to_cpu(old_bos->desc->wTotalLength)) return 1; if (memcmp(udev->bos->desc, old_bos->desc, len)) return 1; } /* Since the idVendor, idProduct, and bcdDevice values in the * device descriptor haven't changed, we will assume the * Manufacturer and Product strings haven't changed either. * But the SerialNumber string could be different (e.g., a * different flash card of the same brand). 
*/ if (udev->serial) serial_len = strlen(udev->serial) + 1; len = serial_len; for (index = 0; index < udev->descriptor.bNumConfigurations; index++) { old_length = le16_to_cpu(udev->config[index].desc.wTotalLength); len = max(len, old_length); } buf = kmalloc(len, GFP_NOIO); if (!buf) /* assume the worst */ return 1; for (index = 0; index < udev->descriptor.bNumConfigurations; index++) { old_length = le16_to_cpu(udev->config[index].desc.wTotalLength); length = usb_get_descriptor(udev, USB_DT_CONFIG, index, buf, old_length); if (length != old_length) { dev_dbg(&udev->dev, "config index %d, error %d\n", index, length); changed = 1; break; } if (memcmp(buf, udev->rawdescriptors[index], old_length) != 0) { dev_dbg(&udev->dev, "config index %d changed (#%d)\n", index, ((struct usb_config_descriptor *) buf)-> bConfigurationValue); changed = 1; break; } } if (!changed && serial_len) { length = usb_string(udev, udev->descriptor.iSerialNumber, buf, serial_len); if (length + 1 != serial_len) { dev_dbg(&udev->dev, "serial string error %d\n", length); changed = 1; } else if (memcmp(buf, udev->serial, length) != 0) { dev_dbg(&udev->dev, "serial string changed\n"); changed = 1; } } kfree(buf); return changed; } static void hub_port_connect(struct usb_hub *hub, int port1, u16 portstatus, u16 portchange) { int status = -ENODEV; int i; unsigned unit_load; struct usb_device *hdev = hub->hdev; struct usb_hcd *hcd = bus_to_hcd(hdev->bus); struct usb_port *port_dev = hub->ports[port1 - 1]; struct usb_device *udev = port_dev->child; static int unreliable_port = -1; bool retry_locked; /* Disconnect any existing devices under this port */ if (udev) { if (hcd->usb_phy && !hdev->parent) usb_phy_notify_disconnect(hcd->usb_phy, udev->speed); usb_disconnect(&port_dev->child); } /* We can forget about a "removed" device when there's a physical * disconnect or the connect status changes. */ if (!(portstatus & USB_PORT_STAT_CONNECTION) || (portchange & USB_PORT_STAT_C_CONNECTION)) clear_bit(port1, hub->removed_bits); if (portchange & (USB_PORT_STAT_C_CONNECTION | USB_PORT_STAT_C_ENABLE)) { status = hub_port_debounce_be_stable(hub, port1); if (status < 0) { if (status != -ENODEV && port1 != unreliable_port && printk_ratelimit()) dev_err(&port_dev->dev, "connect-debounce failed\n"); portstatus &= ~USB_PORT_STAT_CONNECTION; unreliable_port = port1; } else { portstatus = status; } } /* Return now if debouncing failed or nothing is connected or * the device was "removed". */ if (!(portstatus & USB_PORT_STAT_CONNECTION) || test_bit(port1, hub->removed_bits)) { /* * maybe switch power back on (e.g. root hub was reset) * but only if the port isn't owned by someone else. 
*/ if (hub_is_port_power_switchable(hub) && !usb_port_is_power_on(hub, portstatus) && !port_dev->port_owner) set_port_feature(hdev, port1, USB_PORT_FEAT_POWER); if (portstatus & USB_PORT_STAT_ENABLE) goto done; return; } if (hub_is_superspeed(hub->hdev)) unit_load = 150; else unit_load = 100; status = 0; for (i = 0; i < PORT_INIT_TRIES; i++) { if (hub_port_stop_enumerate(hub, port1, i)) { status = -ENODEV; break; } usb_lock_port(port_dev); mutex_lock(hcd->address0_mutex); retry_locked = true; /* reallocate for each attempt, since references * to the previous one can escape in various ways */ udev = usb_alloc_dev(hdev, hdev->bus, port1); if (!udev) { dev_err(&port_dev->dev, "couldn't allocate usb_device\n"); mutex_unlock(hcd->address0_mutex); usb_unlock_port(port_dev); goto done; } usb_set_device_state(udev, USB_STATE_POWERED); udev->bus_mA = hub->mA_per_port; udev->level = hdev->level + 1; /* Devices connected to SuperSpeed hubs are USB 3.0 or later */ if (hub_is_superspeed(hub->hdev)) udev->speed = USB_SPEED_SUPER; else udev->speed = USB_SPEED_UNKNOWN; choose_devnum(udev); if (udev->devnum <= 0) { status = -ENOTCONN; /* Don't retry */ goto loop; } /* reset (non-USB 3.0 devices) and get descriptor */ status = hub_port_init(hub, udev, port1, i, NULL); if (status < 0) goto loop; mutex_unlock(hcd->address0_mutex); usb_unlock_port(port_dev); retry_locked = false; if (udev->quirks & USB_QUIRK_DELAY_INIT) msleep(2000); /* consecutive bus-powered hubs aren't reliable; they can * violate the voltage drop budget. if the new child has * a "powered" LED, users should notice we didn't enable it * (without reading syslog), even without per-port LEDs * on the parent. */ if (udev->descriptor.bDeviceClass == USB_CLASS_HUB && udev->bus_mA <= unit_load) { u16 devstat; status = usb_get_std_status(udev, USB_RECIP_DEVICE, 0, &devstat); if (status) { dev_dbg(&udev->dev, "get status %d ?\n", status); goto loop_disable; } if ((devstat & (1 << USB_DEVICE_SELF_POWERED)) == 0) { dev_err(&udev->dev, "can't connect bus-powered hub " "to this port\n"); if (hub->has_indicators) { hub->indicator[port1-1] = INDICATOR_AMBER_BLINK; queue_delayed_work( system_power_efficient_wq, &hub->leds, 0); } status = -ENOTCONN; /* Don't retry */ goto loop_disable; } } /* check for devices running slower than they could */ if (le16_to_cpu(udev->descriptor.bcdUSB) >= 0x0200 && udev->speed == USB_SPEED_FULL && highspeed_hubs != 0) check_highspeed(hub, udev, port1); /* Store the parent's children[] pointer. At this point * udev becomes globally accessible, although presumably * no one will look at it until hdev is unlocked. */ status = 0; mutex_lock(&usb_port_peer_mutex); /* We mustn't add new devices if the parent hub has * been disconnected; we would race with the * recursively_mark_NOTATTACHED() routine. 
*/ spin_lock_irq(&device_state_lock); if (hdev->state == USB_STATE_NOTATTACHED) status = -ENOTCONN; else port_dev->child = udev; spin_unlock_irq(&device_state_lock); mutex_unlock(&usb_port_peer_mutex); /* Run it through the hoops (find a driver, etc) */ if (!status) { status = usb_new_device(udev); if (status) { mutex_lock(&usb_port_peer_mutex); spin_lock_irq(&device_state_lock); port_dev->child = NULL; spin_unlock_irq(&device_state_lock); mutex_unlock(&usb_port_peer_mutex); } else { if (hcd->usb_phy && !hdev->parent) usb_phy_notify_connect(hcd->usb_phy, udev->speed); } } if (status) goto loop_disable; status = hub_power_remaining(hub); if (status) dev_dbg(hub->intfdev, "%dmA power budget left\n", status); return; loop_disable: hub_port_disable(hub, port1, 1); loop: usb_ep0_reinit(udev); release_devnum(udev); hub_free_dev(udev); if (retry_locked) { mutex_unlock(hcd->address0_mutex); usb_unlock_port(port_dev); } usb_put_dev(udev); if ((status == -ENOTCONN) || (status == -ENOTSUPP)) break; /* When halfway through our retry count, power-cycle the port */ if (i == (PORT_INIT_TRIES - 1) / 2) { dev_info(&port_dev->dev, "attempt power cycle\n"); usb_hub_set_port_power(hdev, hub, port1, false); msleep(2 * hub_power_on_good_delay(hub)); usb_hub_set_port_power(hdev, hub, port1, true); msleep(hub_power_on_good_delay(hub)); } } if (hub->hdev->parent || !hcd->driver->port_handed_over || !(hcd->driver->port_handed_over)(hcd, port1)) { if (status != -ENOTCONN && status != -ENODEV) dev_err(&port_dev->dev, "unable to enumerate USB device\n"); } done: hub_port_disable(hub, port1, 1); if (hcd->driver->relinquish_port && !hub->hdev->parent) { if (status != -ENOTCONN && status != -ENODEV) hcd->driver->relinquish_port(hcd, port1); } } /* Handle physical or logical connection change events. * This routine is called when: * a port connection-change occurs; * a port enable-change occurs (often caused by EMI); * usb_reset_and_verify_device() encounters changed descriptors (as from * a firmware download) * caller already locked the hub */ static void hub_port_connect_change(struct usb_hub *hub, int port1, u16 portstatus, u16 portchange) __must_hold(&port_dev->status_lock) { struct usb_port *port_dev = hub->ports[port1 - 1]; struct usb_device *udev = port_dev->child; struct usb_device_descriptor *descr; int status = -ENODEV; dev_dbg(&port_dev->dev, "status %04x, change %04x, %s\n", portstatus, portchange, portspeed(hub, portstatus)); if (hub->has_indicators) { set_port_led(hub, port1, HUB_LED_AUTO); hub->indicator[port1-1] = INDICATOR_AUTO; } #ifdef CONFIG_USB_OTG /* during HNP, don't repeat the debounce */ if (hub->hdev->bus->is_b_host) portchange &= ~(USB_PORT_STAT_C_CONNECTION | USB_PORT_STAT_C_ENABLE); #endif /* Try to resuscitate an existing device */ if ((portstatus & USB_PORT_STAT_CONNECTION) && udev && udev->state != USB_STATE_NOTATTACHED) { if (portstatus & USB_PORT_STAT_ENABLE) { /* * USB-3 connections are initialized automatically by * the hostcontroller hardware. Therefore check for * changed device descriptors before resuscitating the * device. 
*/ descr = usb_get_device_descriptor(udev); if (IS_ERR(descr)) { dev_dbg(&udev->dev, "can't read device descriptor %ld\n", PTR_ERR(descr)); } else { if (descriptors_changed(udev, descr, udev->bos)) { dev_dbg(&udev->dev, "device descriptor has changed\n"); } else { status = 0; /* Nothing to do */ } kfree(descr); } #ifdef CONFIG_PM } else if (udev->state == USB_STATE_SUSPENDED && udev->persist_enabled) { /* For a suspended device, treat this as a * remote wakeup event. */ usb_unlock_port(port_dev); status = usb_remote_wakeup(udev); usb_lock_port(port_dev); #endif } else { /* Don't resuscitate */; } } clear_bit(port1, hub->change_bits); /* successfully revalidated the connection */ if (status == 0) return; usb_unlock_port(port_dev); hub_port_connect(hub, port1, portstatus, portchange); usb_lock_port(port_dev); } /* Handle notifying userspace about hub over-current events */ static void port_over_current_notify(struct usb_port *port_dev) { char *envp[3] = { NULL, NULL, NULL }; struct device *hub_dev; char *port_dev_path; sysfs_notify(&port_dev->dev.kobj, NULL, "over_current_count"); hub_dev = port_dev->dev.parent; if (!hub_dev) return; port_dev_path = kobject_get_path(&port_dev->dev.kobj, GFP_KERNEL); if (!port_dev_path) return; envp[0] = kasprintf(GFP_KERNEL, "OVER_CURRENT_PORT=%s", port_dev_path); if (!envp[0]) goto exit; envp[1] = kasprintf(GFP_KERNEL, "OVER_CURRENT_COUNT=%u", port_dev->over_current_count); if (!envp[1]) goto exit; kobject_uevent_env(&hub_dev->kobj, KOBJ_CHANGE, envp); exit: kfree(envp[1]); kfree(envp[0]); kfree(port_dev_path); } static void port_event(struct usb_hub *hub, int port1) __must_hold(&port_dev->status_lock) { int connect_change; struct usb_port *port_dev = hub->ports[port1 - 1]; struct usb_device *udev = port_dev->child; struct usb_device *hdev = hub->hdev; u16 portstatus, portchange; int i = 0; int err; connect_change = test_bit(port1, hub->change_bits); clear_bit(port1, hub->event_bits); clear_bit(port1, hub->wakeup_bits); if (usb_hub_port_status(hub, port1, &portstatus, &portchange) < 0) return; if (portchange & USB_PORT_STAT_C_CONNECTION) { usb_clear_port_feature(hdev, port1, USB_PORT_FEAT_C_CONNECTION); connect_change = 1; } if (portchange & USB_PORT_STAT_C_ENABLE) { if (!connect_change) dev_dbg(&port_dev->dev, "enable change, status %08x\n", portstatus); usb_clear_port_feature(hdev, port1, USB_PORT_FEAT_C_ENABLE); /* * EM interference sometimes causes badly shielded USB devices * to be shutdown by the hub, this hack enables them again. * Works at least with mouse driver. 
*/ if (!(portstatus & USB_PORT_STAT_ENABLE) && !connect_change && udev) { dev_err(&port_dev->dev, "disabled by hub (EMI?), re-enabling...\n"); connect_change = 1; } } if (portchange & USB_PORT_STAT_C_OVERCURRENT) { u16 status = 0, unused; port_dev->over_current_count++; port_over_current_notify(port_dev); dev_dbg(&port_dev->dev, "over-current change #%u\n", port_dev->over_current_count); usb_clear_port_feature(hdev, port1, USB_PORT_FEAT_C_OVER_CURRENT); msleep(100); /* Cool down */ hub_power_on(hub, true); usb_hub_port_status(hub, port1, &status, &unused); if (status & USB_PORT_STAT_OVERCURRENT) dev_err(&port_dev->dev, "over-current condition\n"); } if (portchange & USB_PORT_STAT_C_RESET) { dev_dbg(&port_dev->dev, "reset change\n"); usb_clear_port_feature(hdev, port1, USB_PORT_FEAT_C_RESET); } if ((portchange & USB_PORT_STAT_C_BH_RESET) && hub_is_superspeed(hdev)) { dev_dbg(&port_dev->dev, "warm reset change\n"); usb_clear_port_feature(hdev, port1, USB_PORT_FEAT_C_BH_PORT_RESET); } if (portchange & USB_PORT_STAT_C_LINK_STATE) { dev_dbg(&port_dev->dev, "link state change\n"); usb_clear_port_feature(hdev, port1, USB_PORT_FEAT_C_PORT_LINK_STATE); } if (portchange & USB_PORT_STAT_C_CONFIG_ERROR) { dev_warn(&port_dev->dev, "config error\n"); usb_clear_port_feature(hdev, port1, USB_PORT_FEAT_C_PORT_CONFIG_ERROR); } /* skip port actions that require the port to be powered on */ if (!pm_runtime_active(&port_dev->dev)) return; /* skip port actions if ignore_event and early_stop are true */ if (port_dev->ignore_event && port_dev->early_stop) return; if (hub_handle_remote_wakeup(hub, port1, portstatus, portchange)) connect_change = 1; /* * Avoid trying to recover a USB3 SS.Inactive port with a warm reset if * the device was disconnected. A 12ms disconnect detect timer in * SS.Inactive state transitions the port to RxDetect automatically. * SS.Inactive link error state is common during device disconnect. */ while (hub_port_warm_reset_required(hub, port1, portstatus)) { if ((i++ < DETECT_DISCONNECT_TRIES) && udev) { u16 unused; msleep(20); usb_hub_port_status(hub, port1, &portstatus, &unused); dev_dbg(&port_dev->dev, "Wait for inactive link disconnect detect\n"); continue; } else if (!udev || !(portstatus & USB_PORT_STAT_CONNECTION) || udev->state == USB_STATE_NOTATTACHED) { dev_dbg(&port_dev->dev, "do warm reset, port only\n"); err = hub_port_reset(hub, port1, NULL, HUB_BH_RESET_TIME, true); if (!udev && err == -ENOTCONN) connect_change = 0; else if (err < 0) hub_port_disable(hub, port1, 1); } else { dev_dbg(&port_dev->dev, "do warm reset, full device\n"); usb_unlock_port(port_dev); usb_lock_device(udev); usb_reset_device(udev); usb_unlock_device(udev); usb_lock_port(port_dev); connect_change = 0; } break; } if (connect_change) hub_port_connect_change(hub, port1, portstatus, portchange); } static void hub_event(struct work_struct *work) { struct usb_device *hdev; struct usb_interface *intf; struct usb_hub *hub; struct device *hub_dev; u16 hubstatus; u16 hubchange; int i, ret; hub = container_of(work, struct usb_hub, events); hdev = hub->hdev; hub_dev = hub->intfdev; intf = to_usb_interface(hub_dev); kcov_remote_start_usb((u64)hdev->bus->busnum); dev_dbg(hub_dev, "state %d ports %d chg %04x evt %04x\n", hdev->state, hdev->maxchild, /* NOTE: expects max 15 ports... */ (u16) hub->change_bits[0], (u16) hub->event_bits[0]); /* Lock the device, then check to see if we were * disconnected while waiting for the lock to succeed. 
*/ usb_lock_device(hdev); if (unlikely(hub->disconnected)) goto out_hdev_lock; /* If the hub has died, clean up after it */ if (hdev->state == USB_STATE_NOTATTACHED) { hub->error = -ENODEV; hub_quiesce(hub, HUB_DISCONNECT); goto out_hdev_lock; } /* Autoresume */ ret = usb_autopm_get_interface(intf); if (ret) { dev_dbg(hub_dev, "Can't autoresume: %d\n", ret); goto out_hdev_lock; } /* If this is an inactive hub, do nothing */ if (hub->quiescing) goto out_autopm; if (hub->error) { dev_dbg(hub_dev, "resetting for error %d\n", hub->error); ret = usb_reset_device(hdev); if (ret) { dev_dbg(hub_dev, "error resetting hub: %d\n", ret); goto out_autopm; } hub->nerrors = 0; hub->error = 0; } /* deal with port status changes */ for (i = 1; i <= hdev->maxchild; i++) { struct usb_port *port_dev = hub->ports[i - 1]; if (test_bit(i, hub->event_bits) || test_bit(i, hub->change_bits) || test_bit(i, hub->wakeup_bits)) { /* * The get_noresume and barrier ensure that if * the port was in the process of resuming, we * flush that work and keep the port active for * the duration of the port_event(). However, * if the port is runtime pm suspended * (powered-off), we leave it in that state, run * an abbreviated port_event(), and move on. */ pm_runtime_get_noresume(&port_dev->dev); pm_runtime_barrier(&port_dev->dev); usb_lock_port(port_dev); port_event(hub, i); usb_unlock_port(port_dev); pm_runtime_put_sync(&port_dev->dev); } } /* deal with hub status changes */ if (test_and_clear_bit(0, hub->event_bits) == 0) ; /* do nothing */ else if (hub_hub_status(hub, &hubstatus, &hubchange) < 0) dev_err(hub_dev, "get_hub_status failed\n"); else { if (hubchange & HUB_CHANGE_LOCAL_POWER) { dev_dbg(hub_dev, "power change\n"); clear_hub_feature(hdev, C_HUB_LOCAL_POWER); if (hubstatus & HUB_STATUS_LOCAL_POWER) /* FIXME: Is this always true? 
*/ hub->limited_power = 1; else hub->limited_power = 0; } if (hubchange & HUB_CHANGE_OVERCURRENT) { u16 status = 0; u16 unused; dev_dbg(hub_dev, "over-current change\n"); clear_hub_feature(hdev, C_HUB_OVER_CURRENT); msleep(500); /* Cool down */ hub_power_on(hub, true); hub_hub_status(hub, &status, &unused); if (status & HUB_STATUS_OVERCURRENT) dev_err(hub_dev, "over-current condition\n"); } } out_autopm: /* Balance the usb_autopm_get_interface() above */ usb_autopm_put_interface_no_suspend(intf); out_hdev_lock: usb_unlock_device(hdev); /* Balance the stuff in kick_hub_wq() and allow autosuspend */ usb_autopm_put_interface(intf); hub_put(hub); kcov_remote_stop(); } static const struct usb_device_id hub_id_table[] = { { .match_flags = USB_DEVICE_ID_MATCH_VENDOR | USB_DEVICE_ID_MATCH_PRODUCT | USB_DEVICE_ID_MATCH_INT_CLASS, .idVendor = USB_VENDOR_SMSC, .idProduct = USB_PRODUCT_USB5534B, .bInterfaceClass = USB_CLASS_HUB, .driver_info = HUB_QUIRK_DISABLE_AUTOSUSPEND}, { .match_flags = USB_DEVICE_ID_MATCH_VENDOR | USB_DEVICE_ID_MATCH_PRODUCT, .idVendor = USB_VENDOR_CYPRESS, .idProduct = USB_PRODUCT_CY7C65632, .driver_info = HUB_QUIRK_DISABLE_AUTOSUSPEND}, { .match_flags = USB_DEVICE_ID_MATCH_VENDOR | USB_DEVICE_ID_MATCH_INT_CLASS, .idVendor = USB_VENDOR_GENESYS_LOGIC, .bInterfaceClass = USB_CLASS_HUB, .driver_info = HUB_QUIRK_CHECK_PORT_AUTOSUSPEND}, { .match_flags = USB_DEVICE_ID_MATCH_VENDOR | USB_DEVICE_ID_MATCH_PRODUCT, .idVendor = USB_VENDOR_TEXAS_INSTRUMENTS, .idProduct = USB_PRODUCT_TUSB8041_USB2, .driver_info = HUB_QUIRK_DISABLE_AUTOSUSPEND}, { .match_flags = USB_DEVICE_ID_MATCH_VENDOR | USB_DEVICE_ID_MATCH_PRODUCT, .idVendor = USB_VENDOR_TEXAS_INSTRUMENTS, .idProduct = USB_PRODUCT_TUSB8041_USB3, .driver_info = HUB_QUIRK_DISABLE_AUTOSUSPEND}, { .match_flags = USB_DEVICE_ID_MATCH_VENDOR | USB_DEVICE_ID_MATCH_PRODUCT, .idVendor = USB_VENDOR_MICROCHIP, .idProduct = USB_PRODUCT_USB4913, .driver_info = HUB_QUIRK_REDUCE_FRAME_INTR_BINTERVAL}, { .match_flags = USB_DEVICE_ID_MATCH_VENDOR | USB_DEVICE_ID_MATCH_PRODUCT, .idVendor = USB_VENDOR_MICROCHIP, .idProduct = USB_PRODUCT_USB4914, .driver_info = HUB_QUIRK_REDUCE_FRAME_INTR_BINTERVAL}, { .match_flags = USB_DEVICE_ID_MATCH_VENDOR | USB_DEVICE_ID_MATCH_PRODUCT, .idVendor = USB_VENDOR_MICROCHIP, .idProduct = USB_PRODUCT_USB4915, .driver_info = HUB_QUIRK_REDUCE_FRAME_INTR_BINTERVAL}, { .match_flags = USB_DEVICE_ID_MATCH_DEV_CLASS, .bDeviceClass = USB_CLASS_HUB}, { .match_flags = USB_DEVICE_ID_MATCH_INT_CLASS, .bInterfaceClass = USB_CLASS_HUB}, { } /* Terminating entry */ }; MODULE_DEVICE_TABLE(usb, hub_id_table); static struct usb_driver hub_driver = { .name = "hub", .probe = hub_probe, .disconnect = hub_disconnect, .suspend = hub_suspend, .resume = hub_resume, .reset_resume = hub_reset_resume, .pre_reset = hub_pre_reset, .post_reset = hub_post_reset, .unlocked_ioctl = hub_ioctl, .id_table = hub_id_table, .supports_autosuspend = 1, }; int usb_hub_init(void) { if (usb_register(&hub_driver) < 0) { printk(KERN_ERR "%s: can't register hub driver\n", usbcore_name); return -1; } /* * The workqueue needs to be freezable to avoid interfering with * USB-PERSIST port handover. Otherwise it might see that a full-speed * device was gone before the EHCI controller had handed its port * over to the companion full-speed controller. 
*/ hub_wq = alloc_workqueue("usb_hub_wq", WQ_FREEZABLE, 0); if (hub_wq) return 0; /* Fall through if kernel_thread failed */ usb_deregister(&hub_driver); pr_err("%s: can't allocate workqueue for usb hub\n", usbcore_name); return -1; } void usb_hub_cleanup(void) { destroy_workqueue(hub_wq); /* * Hub resources are freed for us by usb_deregister. It calls * usb_driver_purge on every device which in turn calls that * devices disconnect function if it is using this driver. * The hub_disconnect function takes care of releasing the * individual hub resources. -greg */ usb_deregister(&hub_driver); } /* usb_hub_cleanup() */ /** * hub_hc_release_resources - clear resources used by host controller * @udev: pointer to device being released * * Context: task context, might sleep * * Function releases the host controller resources in correct order before * making any operation on resuming usb device. The host controller resources * allocated for devices in tree should be released starting from the last * usb device in tree toward the root hub. This function is used only during * resuming device when usb device require reinitialization – that is, when * flag udev->reset_resume is set. * * This call is synchronous, and may not be used in an interrupt context. */ static void hub_hc_release_resources(struct usb_device *udev) { struct usb_hub *hub = usb_hub_to_struct_hub(udev); struct usb_hcd *hcd = bus_to_hcd(udev->bus); int i; /* Release up resources for all children before this device */ for (i = 0; i < udev->maxchild; i++) if (hub->ports[i]->child) hub_hc_release_resources(hub->ports[i]->child); if (hcd->driver->reset_device) hcd->driver->reset_device(hcd, udev); } /** * usb_reset_and_verify_device - perform a USB port reset to reinitialize a device * @udev: device to reset (not in SUSPENDED or NOTATTACHED state) * * WARNING - don't use this routine to reset a composite device * (one with multiple interfaces owned by separate drivers)! * Use usb_reset_device() instead. * * Do a port reset, reassign the device's address, and establish its * former operating configuration. If the reset fails, or the device's * descriptors change from their values before the reset, or the original * configuration and altsettings cannot be restored, a flag will be set * telling hub_wq to pretend the device has been disconnected and then * re-connected. All drivers will be unbound, and the device will be * re-enumerated and probed all over again. * * Return: 0 if the reset succeeded, -ENODEV if the device has been * flagged for logical disconnection, or some other negative error code * if the reset wasn't even attempted. * * Note: * The caller must own the device lock and the port lock, the latter is * taken by usb_reset_device(). For example, it's safe to use * usb_reset_device() from a driver probe() routine after downloading * new firmware. For calls that might not occur during probe(), drivers * should lock the device using usb_lock_device_for_reset(). * * Locking exception: This routine may also be called from within an * autoresume handler. Such usage won't conflict with other tasks * holding the device lock because these tasks should always call * usb_autopm_resume_device(), thereby preventing any unwanted * autoresume. The autoresume handler is expected to have already * acquired the port lock before calling this routine. 
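 *
 * A minimal sketch of the locked-reset pattern mentioned above, assuming
 * a driver that still holds a reference to its interface "intf" and is
 * running outside probe(), where the device lock is not already held:
 *
 *	ret = usb_lock_device_for_reset(udev, intf);
 *	if (ret == 0) {
 *		ret = usb_reset_device(udev);
 *		usb_unlock_device(udev);
 *	}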
*/ static int usb_reset_and_verify_device(struct usb_device *udev) { struct usb_device *parent_hdev = udev->parent; struct usb_hub *parent_hub; struct usb_hcd *hcd = bus_to_hcd(udev->bus); struct usb_device_descriptor descriptor; struct usb_interface *intf; struct usb_host_bos *bos; int i, j, ret = 0; int port1 = udev->portnum; if (udev->state == USB_STATE_NOTATTACHED || udev->state == USB_STATE_SUSPENDED) { dev_dbg(&udev->dev, "device reset not allowed in state %d\n", udev->state); return -EINVAL; } if (!parent_hdev) return -EISDIR; parent_hub = usb_hub_to_struct_hub(parent_hdev); /* Disable USB2 hardware LPM. * It will be re-enabled by the enumeration process. */ usb_disable_usb2_hardware_lpm(udev); bos = udev->bos; udev->bos = NULL; if (udev->reset_resume) hub_hc_release_resources(udev); mutex_lock(hcd->address0_mutex); for (i = 0; i < PORT_INIT_TRIES; ++i) { if (hub_port_stop_enumerate(parent_hub, port1, i)) { ret = -ENODEV; break; } /* ep0 maxpacket size may change; let the HCD know about it. * Other endpoints will be handled by re-enumeration. */ usb_ep0_reinit(udev); ret = hub_port_init(parent_hub, udev, port1, i, &descriptor); if (ret >= 0 || ret == -ENOTCONN || ret == -ENODEV) break; } mutex_unlock(hcd->address0_mutex); if (ret < 0) goto re_enumerate; /* Device might have changed firmware (DFU or similar) */ if (descriptors_changed(udev, &descriptor, bos)) { dev_info(&udev->dev, "device firmware changed\n"); goto re_enumerate; } /* Restore the device's previous configuration */ if (!udev->actconfig) goto done; /* * Some devices can't handle setting default altsetting 0 with a * Set-Interface request. Disable host-side endpoints of those * interfaces here. Enable and reset them back after the host has set * its internal endpoint structures during usb_hcd_alloc_bandwidth() */ for (i = 0; i < udev->actconfig->desc.bNumInterfaces; i++) { intf = udev->actconfig->interface[i]; if (intf->cur_altsetting->desc.bAlternateSetting == 0) usb_disable_interface(udev, intf, true); } mutex_lock(hcd->bandwidth_mutex); ret = usb_hcd_alloc_bandwidth(udev, udev->actconfig, NULL, NULL); if (ret < 0) { dev_warn(&udev->dev, "Busted HC? Not enough HCD resources for " "old configuration.\n"); mutex_unlock(hcd->bandwidth_mutex); goto re_enumerate; } ret = usb_control_msg(udev, usb_sndctrlpipe(udev, 0), USB_REQ_SET_CONFIGURATION, 0, udev->actconfig->desc.bConfigurationValue, 0, NULL, 0, USB_CTRL_SET_TIMEOUT); if (ret < 0) { dev_err(&udev->dev, "can't restore configuration #%d (error=%d)\n", udev->actconfig->desc.bConfigurationValue, ret); mutex_unlock(hcd->bandwidth_mutex); goto re_enumerate; } mutex_unlock(hcd->bandwidth_mutex); usb_set_device_state(udev, USB_STATE_CONFIGURED); /* Put interfaces back into the same altsettings as before. * Don't bother to send the Set-Interface request for interfaces * that were already in altsetting 0; besides being unnecessary, * many devices can't handle it. Instead just reset the host-side * endpoint state. */ for (i = 0; i < udev->actconfig->desc.bNumInterfaces; i++) { struct usb_host_config *config = udev->actconfig; struct usb_interface_descriptor *desc; intf = config->interface[i]; desc = &intf->cur_altsetting->desc; if (desc->bAlternateSetting == 0) { usb_enable_interface(udev, intf, true); ret = 0; } else { /* Let the bandwidth allocation function know that this * device has been reset, and it will have to use * alternate setting 0 as the current alternate setting.
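* (usb_hcd_alloc_bandwidth() checks intf->resetting_device and then treats alternate setting 0, rather than the previously installed one, as the currently active setting.)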
*/ intf->resetting_device = 1; ret = usb_set_interface(udev, desc->bInterfaceNumber, desc->bAlternateSetting); intf->resetting_device = 0; } if (ret < 0) { dev_err(&udev->dev, "failed to restore interface %d " "altsetting %d (error=%d)\n", desc->bInterfaceNumber, desc->bAlternateSetting, ret); goto re_enumerate; } /* Resetting also frees any allocated streams */ for (j = 0; j < intf->cur_altsetting->desc.bNumEndpoints; j++) intf->cur_altsetting->endpoint[j].streams = 0; } done: /* Now that the alt settings are re-installed, enable LTM and LPM. */ usb_enable_usb2_hardware_lpm(udev); usb_unlocked_enable_lpm(udev); usb_enable_ltm(udev); usb_release_bos_descriptor(udev); udev->bos = bos; return 0; re_enumerate: usb_release_bos_descriptor(udev); udev->bos = bos; hub_port_logical_disconnect(parent_hub, port1); return -ENODEV; } /** * usb_reset_device - warn interface drivers and perform a USB port reset * @udev: device to reset (not in NOTATTACHED state) * * Warns all drivers bound to registered interfaces (using their pre_reset * method), performs the port reset, and then lets the drivers know that * the reset is over (using their post_reset method). * * Return: The same as for usb_reset_and_verify_device(). * However, if a reset is already in progress (for instance, if a * driver doesn't have pre_reset() or post_reset() callbacks, and while * being unbound or re-bound during the ongoing reset its disconnect() * or probe() routine tries to perform a second, nested reset), the * routine returns -EINPROGRESS. * * Note: * The caller must own the device lock. For example, it's safe to use * this from a driver probe() routine after downloading new firmware. * For calls that might not occur during probe(), drivers should lock * the device using usb_lock_device_for_reset(). * * If an interface is currently being probed or disconnected, we assume * its driver knows how to handle resets. For all other interfaces, * if the driver doesn't have pre_reset and post_reset methods then * we attempt to unbind it and rebind afterward. */ int usb_reset_device(struct usb_device *udev) { int ret; int i; unsigned int noio_flag; struct usb_port *port_dev; struct usb_host_config *config = udev->actconfig; struct usb_hub *hub = usb_hub_to_struct_hub(udev->parent); if (udev->state == USB_STATE_NOTATTACHED) { dev_dbg(&udev->dev, "device reset not allowed in state %d\n", udev->state); return -EINVAL; } if (!udev->parent) { /* this requires hcd-specific logic; see ohci_restart() */ dev_dbg(&udev->dev, "%s for root hub!\n", __func__); return -EISDIR; } if (udev->reset_in_progress) return -EINPROGRESS; udev->reset_in_progress = 1; port_dev = hub->ports[udev->portnum - 1]; /* * Don't allocate memory with GFP_KERNEL in current * context to avoid possible deadlock if usb mass * storage interface or usbnet interface (iSCSI case) * is included in current configuration. The easiest * approach is to do it for every device reset, * because the device 'memalloc_noio' flag may not have * been set before resetting the usb device.
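* memalloc_noio_save() below marks the current task so that its allocations implicitly behave as GFP_NOIO until memalloc_noio_restore() is called.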
*/ noio_flag = memalloc_noio_save(); /* Prevent autosuspend during the reset */ usb_autoresume_device(udev); if (config) { for (i = 0; i < config->desc.bNumInterfaces; ++i) { struct usb_interface *cintf = config->interface[i]; struct usb_driver *drv; int unbind = 0; if (cintf->dev.driver) { drv = to_usb_driver(cintf->dev.driver); if (drv->pre_reset && drv->post_reset) unbind = (drv->pre_reset)(cintf); else if (cintf->condition == USB_INTERFACE_BOUND) unbind = 1; if (unbind) usb_forced_unbind_intf(cintf); } } } usb_lock_port(port_dev); ret = usb_reset_and_verify_device(udev); usb_unlock_port(port_dev); if (config) { for (i = config->desc.bNumInterfaces - 1; i >= 0; --i) { struct usb_interface *cintf = config->interface[i]; struct usb_driver *drv; int rebind = cintf->needs_binding; if (!rebind && cintf->dev.driver) { drv = to_usb_driver(cintf->dev.driver); if (drv->post_reset) rebind = (drv->post_reset)(cintf); else if (cintf->condition == USB_INTERFACE_BOUND) rebind = 1; if (rebind) cintf->needs_binding = 1; } } /* If the reset failed, hub_wq will unbind drivers later */ if (ret == 0) usb_unbind_and_rebind_marked_interfaces(udev); } usb_autosuspend_device(udev); memalloc_noio_restore(noio_flag); udev->reset_in_progress = 0; return ret; } EXPORT_SYMBOL_GPL(usb_reset_device); /** * usb_queue_reset_device - Reset a USB device from an atomic context * @iface: USB interface belonging to the device to reset * * This function can be used to reset a USB device from an atomic * context, where usb_reset_device() won't work (as it blocks). * * Doing a reset via this method is functionally equivalent to calling * usb_reset_device(), except for the fact that it is delayed to a * workqueue. This means that any drivers bound to other interfaces * might be unbound, as well as users from usbfs in user space. * * Corner cases: * * - Scheduling two resets at the same time from two different drivers * attached to two different interfaces of the same device is * possible; depending on how the driver attached to each interface * handles ->pre_reset(), the second reset might happen or not. * * - If the reset is delayed so long that the interface is unbound from * its driver, the reset will be skipped. * * - This function can be called during .probe(). It can also be called * during .disconnect(), but doing so is pointless because the reset * will not occur. If you really want to reset the device during * .disconnect(), call usb_reset_device() directly -- but watch out * for nested unbinding issues! */ void usb_queue_reset_device(struct usb_interface *iface) { if (schedule_work(&iface->reset_ws)) usb_get_intf(iface); } EXPORT_SYMBOL_GPL(usb_queue_reset_device); /** * usb_hub_find_child - Get the pointer of child device * attached to the port which is specified by @port1. * @hdev: USB device belonging to the usb hub * @port1: port num to indicate which port the child device * is attached to. * * USB drivers call this function to get hub's child device * pointer. * * Return: %NULL if input param is invalid and * child's usb_device pointer if non-NULL. 
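* Drivers that need to walk every port can use the usb_hub_for_each_child() helper, which is built on top of this function.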
*/ struct usb_device *usb_hub_find_child(struct usb_device *hdev, int port1) { struct usb_hub *hub = usb_hub_to_struct_hub(hdev); if (port1 < 1 || port1 > hdev->maxchild) return NULL; return hub->ports[port1 - 1]->child; } EXPORT_SYMBOL_GPL(usb_hub_find_child); void usb_hub_adjust_deviceremovable(struct usb_device *hdev, struct usb_hub_descriptor *desc) { struct usb_hub *hub = usb_hub_to_struct_hub(hdev); enum usb_port_connect_type connect_type; int i; if (!hub) return; if (!hub_is_superspeed(hdev)) { for (i = 1; i <= hdev->maxchild; i++) { struct usb_port *port_dev = hub->ports[i - 1]; connect_type = port_dev->connect_type; if (connect_type == USB_PORT_CONNECT_TYPE_HARD_WIRED) { u8 mask = 1 << (i%8); if (!(desc->u.hs.DeviceRemovable[i/8] & mask)) { dev_dbg(&port_dev->dev, "DeviceRemovable is changed to 1 according to platform information.\n"); desc->u.hs.DeviceRemovable[i/8] |= mask; } } } } else { u16 port_removable = le16_to_cpu(desc->u.ss.DeviceRemovable); for (i = 1; i <= hdev->maxchild; i++) { struct usb_port *port_dev = hub->ports[i - 1]; connect_type = port_dev->connect_type; if (connect_type == USB_PORT_CONNECT_TYPE_HARD_WIRED) { u16 mask = 1 << i; if (!(port_removable & mask)) { dev_dbg(&port_dev->dev, "DeviceRemovable is changed to 1 according to platform information.\n"); port_removable |= mask; } } } desc->u.ss.DeviceRemovable = cpu_to_le16(port_removable); } } #ifdef CONFIG_ACPI /** * usb_get_hub_port_acpi_handle - Get the usb port's acpi handle * @hdev: USB device belonging to the usb hub * @port1: port num of the port * * Return: Port's acpi handle if successful, %NULL if params are * invalid. */ acpi_handle usb_get_hub_port_acpi_handle(struct usb_device *hdev, int port1) { struct usb_hub *hub = usb_hub_to_struct_hub(hdev); if (!hub) return NULL; return ACPI_HANDLE(&hub->ports[port1 - 1]->dev); } #endif
// SPDX-License-Identifier: GPL-2.0-only /* * mm/page-writeback.c * * Copyright (C) 2002, Linus Torvalds. * Copyright (C) 2007 Red Hat, Inc., Peter Zijlstra * * Contains functions related to writing back dirty pages at the * address_space level. * * 10Apr2002 Andrew Morton * Initial version */ #include <linux/kernel.h> #include <linux/math64.h> #include <linux/export.h> #include <linux/spinlock.h> #include <linux/fs.h> #include <linux/mm.h> #include <linux/swap.h> #include <linux/slab.h> #include <linux/pagemap.h> #include <linux/writeback.h> #include <linux/init.h> #include <linux/backing-dev.h> #include <linux/task_io_accounting_ops.h> #include <linux/blkdev.h> #include <linux/mpage.h> #include <linux/rmap.h> #include <linux/percpu.h> #include <linux/smp.h> #include <linux/sysctl.h> #include <linux/cpu.h> #include <linux/syscalls.h> #include <linux/pagevec.h> #include <linux/timer.h> #include <linux/sched/rt.h> #include <linux/sched/signal.h> #include <linux/mm_inline.h> #include <linux/shmem_fs.h> #include <trace/events/writeback.h> #include "internal.h" /* * Sleep at most 200ms at a time in balance_dirty_pages(). */ #define MAX_PAUSE max(HZ/5, 1) /* * Try to keep balance_dirty_pages() call intervals higher than this many pages * by raising pause time to max_pause when falls below it. */ #define DIRTY_POLL_THRESH (128 >> (PAGE_SHIFT - 10)) /* * Estimate write bandwidth or update dirty limit at 200ms intervals. */ #define BANDWIDTH_INTERVAL max(HZ/5, 1) #define RATELIMIT_CALC_SHIFT 10 /* * After a CPU has dirtied this many pages, balance_dirty_pages_ratelimited * will look to see if it needs to force writeback or throttling.
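* With the default ratelimit_pages of 32 and 4 KiB pages, for example, that is roughly one check per 128 KiB dirtied by a CPU.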
*/ static long ratelimit_pages = 32; /* The following parameters are exported via /proc/sys/vm */ /* * Start background writeback (via writeback threads) at this percentage */ static int dirty_background_ratio = 10; /* * dirty_background_bytes starts at 0 (disabled) so that it is a function of * dirty_background_ratio * the amount of dirtyable memory */ static unsigned long dirty_background_bytes; /* * free highmem will not be subtracted from the total free memory * for calculating free ratios if vm_highmem_is_dirtyable is true */ static int vm_highmem_is_dirtyable; /* * The generator of dirty data starts writeback at this percentage */ static int vm_dirty_ratio = 20; /* * vm_dirty_bytes starts at 0 (disabled) so that it is a function of * vm_dirty_ratio * the amount of dirtyable memory */ static unsigned long vm_dirty_bytes; /* * The interval between `kupdate'-style writebacks */ unsigned int dirty_writeback_interval = 5 * 100; /* centiseconds */ EXPORT_SYMBOL_GPL(dirty_writeback_interval); /* * The longest time for which data is allowed to remain dirty */ unsigned int dirty_expire_interval = 30 * 100; /* centiseconds */ /* * Flag that puts the machine in "laptop mode". Doubles as a timeout in jiffies: * a full sync is triggered after this time elapses without any disk activity. */ int laptop_mode; EXPORT_SYMBOL(laptop_mode); /* End of sysctl-exported parameters */ struct wb_domain global_wb_domain; /* * Length of period for aging writeout fractions of bdis. This is an * arbitrarily chosen number. The longer the period, the slower fractions will * reflect changes in current writeout rate. */ #define VM_COMPLETIONS_PERIOD_LEN (3*HZ) #ifdef CONFIG_CGROUP_WRITEBACK #define GDTC_INIT(__wb) .wb = (__wb), \ .dom = &global_wb_domain, \ .wb_completions = &(__wb)->completions #define GDTC_INIT_NO_WB .dom = &global_wb_domain #define MDTC_INIT(__wb, __gdtc) .wb = (__wb), \ .dom = mem_cgroup_wb_domain(__wb), \ .wb_completions = &(__wb)->memcg_completions, \ .gdtc = __gdtc static bool mdtc_valid(struct dirty_throttle_control *dtc) { return dtc->dom; } static struct wb_domain *dtc_dom(struct dirty_throttle_control *dtc) { return dtc->dom; } static struct dirty_throttle_control *mdtc_gdtc(struct dirty_throttle_control *mdtc) { return mdtc->gdtc; } static struct fprop_local_percpu *wb_memcg_completions(struct bdi_writeback *wb) { return &wb->memcg_completions; } static void wb_min_max_ratio(struct bdi_writeback *wb, unsigned long *minp, unsigned long *maxp) { unsigned long this_bw = READ_ONCE(wb->avg_write_bandwidth); unsigned long tot_bw = atomic_long_read(&wb->bdi->tot_write_bandwidth); unsigned long long min = wb->bdi->min_ratio; unsigned long long max = wb->bdi->max_ratio; /* * @wb may already be clean by the time control reaches here and * the total may not include its bw. 
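* For example, a wb contributing half of the bdi's total write bandwidth is credited with half of the configured min_ratio below (and likewise for max_ratio when it is below 100%).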
*/ if (this_bw < tot_bw) { if (min) { min *= this_bw; min = div64_ul(min, tot_bw); } if (max < 100 * BDI_RATIO_SCALE) { max *= this_bw; max = div64_ul(max, tot_bw); } } *minp = min; *maxp = max; } #else /* CONFIG_CGROUP_WRITEBACK */ #define GDTC_INIT(__wb) .wb = (__wb), \ .wb_completions = &(__wb)->completions #define GDTC_INIT_NO_WB #define MDTC_INIT(__wb, __gdtc) static bool mdtc_valid(struct dirty_throttle_control *dtc) { return false; } static struct wb_domain *dtc_dom(struct dirty_throttle_control *dtc) { return &global_wb_domain; } static struct dirty_throttle_control *mdtc_gdtc(struct dirty_throttle_control *mdtc) { return NULL; } static struct fprop_local_percpu *wb_memcg_completions(struct bdi_writeback *wb) { return NULL; } static void wb_min_max_ratio(struct bdi_writeback *wb, unsigned long *minp, unsigned long *maxp) { *minp = wb->bdi->min_ratio; *maxp = wb->bdi->max_ratio; } #endif /* CONFIG_CGROUP_WRITEBACK */ /* * In a memory zone, there is a certain amount of pages we consider * available for the page cache, which is essentially the number of * free and reclaimable pages, minus some zone reserves to protect * lowmem and the ability to uphold the zone's watermarks without * requiring writeback. * * This number of dirtyable pages is the base value of which the * user-configurable dirty ratio is the effective number of pages that * are allowed to be actually dirtied. Per individual zone, or * globally by using the sum of dirtyable pages over all zones. * * Because the user is allowed to specify the dirty limit globally as * absolute number of bytes, calculating the per-zone dirty limit can * require translating the configured limit into a percentage of * global dirtyable memory first. */ /** * node_dirtyable_memory - number of dirtyable pages in a node * @pgdat: the node * * Return: the node's number of pages potentially available for dirty * page cache. This is the base value for the per-node dirty limits. */ static unsigned long node_dirtyable_memory(struct pglist_data *pgdat) { unsigned long nr_pages = 0; int z; for (z = 0; z < MAX_NR_ZONES; z++) { struct zone *zone = pgdat->node_zones + z; if (!populated_zone(zone)) continue; nr_pages += zone_page_state(zone, NR_FREE_PAGES); } /* * Pages reserved for the kernel should not be considered * dirtyable, to prevent a situation where reclaim has to * clean pages in order to balance the zones. */ nr_pages -= min(nr_pages, pgdat->totalreserve_pages); nr_pages += node_page_state(pgdat, NR_INACTIVE_FILE); nr_pages += node_page_state(pgdat, NR_ACTIVE_FILE); return nr_pages; } static unsigned long highmem_dirtyable_memory(unsigned long total) { #ifdef CONFIG_HIGHMEM int node; unsigned long x = 0; int i; for_each_node_state(node, N_HIGH_MEMORY) { for (i = ZONE_NORMAL + 1; i < MAX_NR_ZONES; i++) { struct zone *z; unsigned long nr_pages; if (!is_highmem_idx(i)) continue; z = &NODE_DATA(node)->node_zones[i]; if (!populated_zone(z)) continue; nr_pages = zone_page_state(z, NR_FREE_PAGES); /* watch for underflows */ nr_pages -= min(nr_pages, high_wmark_pages(z)); nr_pages += zone_page_state(z, NR_ZONE_INACTIVE_FILE); nr_pages += zone_page_state(z, NR_ZONE_ACTIVE_FILE); x += nr_pages; } } /* * Make sure that the number of highmem pages is never larger * than the number of the total dirtyable memory. This can only * occur in very strange VM situations but we want to make sure * that this does not occur. 
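* That is what the min(x, total) clamp below guarantees.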
*/ return min(x, total); #else return 0; #endif } /** * global_dirtyable_memory - number of globally dirtyable pages * * Return: the global number of pages potentially available for dirty * page cache. This is the base value for the global dirty limits. */ static unsigned long global_dirtyable_memory(void) { unsigned long x; x = global_zone_page_state(NR_FREE_PAGES); /* * Pages reserved for the kernel should not be considered * dirtyable, to prevent a situation where reclaim has to * clean pages in order to balance the zones. */ x -= min(x, totalreserve_pages); x += global_node_page_state(NR_INACTIVE_FILE); x += global_node_page_state(NR_ACTIVE_FILE); if (!vm_highmem_is_dirtyable) x -= highmem_dirtyable_memory(x); return x + 1; /* Ensure that we never return 0 */ } /** * domain_dirty_limits - calculate thresh and bg_thresh for a wb_domain * @dtc: dirty_throttle_control of interest * * Calculate @dtc->thresh and ->bg_thresh considering * vm_dirty_{bytes|ratio} and dirty_background_{bytes|ratio}. The caller * must ensure that @dtc->avail is set before calling this function. The * dirty limits will be lifted by 1/4 for real-time tasks. */ static void domain_dirty_limits(struct dirty_throttle_control *dtc) { const unsigned long available_memory = dtc->avail; struct dirty_throttle_control *gdtc = mdtc_gdtc(dtc); unsigned long bytes = vm_dirty_bytes; unsigned long bg_bytes = dirty_background_bytes; /* convert ratios to per-PAGE_SIZE for higher precision */ unsigned long ratio = (vm_dirty_ratio * PAGE_SIZE) / 100; unsigned long bg_ratio = (dirty_background_ratio * PAGE_SIZE) / 100; unsigned long thresh; unsigned long bg_thresh; struct task_struct *tsk; /* gdtc is !NULL iff @dtc is for memcg domain */ if (gdtc) { unsigned long global_avail = gdtc->avail; /* * The byte settings can't be applied directly to memcg * domains. Convert them to ratios by scaling against * globally available memory. As the ratios are in * per-PAGE_SIZE, they can be obtained by dividing bytes by * number of pages. */ if (bytes) ratio = min(DIV_ROUND_UP(bytes, global_avail), PAGE_SIZE); if (bg_bytes) bg_ratio = min(DIV_ROUND_UP(bg_bytes, global_avail), PAGE_SIZE); bytes = bg_bytes = 0; } if (bytes) thresh = DIV_ROUND_UP(bytes, PAGE_SIZE); else thresh = (ratio * available_memory) / PAGE_SIZE; if (bg_bytes) bg_thresh = DIV_ROUND_UP(bg_bytes, PAGE_SIZE); else bg_thresh = (bg_ratio * available_memory) / PAGE_SIZE; tsk = current; if (rt_or_dl_task(tsk)) { bg_thresh += bg_thresh / 4 + global_wb_domain.dirty_limit / 32; thresh += thresh / 4 + global_wb_domain.dirty_limit / 32; } /* * Dirty throttling logic assumes the limits in page units fit into * 32-bits. This gives 16TB dirty limits max which is hopefully enough. */ if (thresh > UINT_MAX) thresh = UINT_MAX; /* This makes sure bg_thresh is within 32-bits as well */ if (bg_thresh >= thresh) bg_thresh = thresh / 2; dtc->thresh = thresh; dtc->bg_thresh = bg_thresh; /* we should eventually report the domain in the TP */ if (!gdtc) trace_global_dirty_state(bg_thresh, thresh); } /** * global_dirty_limits - background-writeback and dirty-throttling thresholds * @pbackground: out parameter for bg_thresh * @pdirty: out parameter for thresh * * Calculate bg_thresh and thresh for global_wb_domain. See * domain_dirty_limits() for details. 
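* For example, with the default dirty_background_ratio of 10 and vm_dirty_ratio of 20 (and no *_bytes overrides), *pbackground comes to about 10% and *pdirty to about 20% of the globally dirtyable memory.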
*/ void global_dirty_limits(unsigned long *pbackground, unsigned long *pdirty) { struct dirty_throttle_control gdtc = { GDTC_INIT_NO_WB }; gdtc.avail = global_dirtyable_memory(); domain_dirty_limits(&gdtc); *pbackground = gdtc.bg_thresh; *pdirty = gdtc.thresh; } /** * node_dirty_limit - maximum number of dirty pages allowed in a node * @pgdat: the node * * Return: the maximum number of dirty pages allowed in a node, based * on the node's dirtyable memory. */ static unsigned long node_dirty_limit(struct pglist_data *pgdat) { unsigned long node_memory = node_dirtyable_memory(pgdat); struct task_struct *tsk = current; unsigned long dirty; if (vm_dirty_bytes) dirty = DIV_ROUND_UP(vm_dirty_bytes, PAGE_SIZE) * node_memory / global_dirtyable_memory(); else dirty = vm_dirty_ratio * node_memory / 100; if (rt_or_dl_task(tsk)) dirty += dirty / 4; /* * Dirty throttling logic assumes the limits in page units fit into * 32-bits. This gives 16TB dirty limits max which is hopefully enough. */ return min_t(unsigned long, dirty, UINT_MAX); } /** * node_dirty_ok - tells whether a node is within its dirty limits * @pgdat: the node to check * * Return: %true when the dirty pages in @pgdat are within the node's * dirty limit, %false if the limit is exceeded. */ bool node_dirty_ok(struct pglist_data *pgdat) { unsigned long limit = node_dirty_limit(pgdat); unsigned long nr_pages = 0; nr_pages += node_page_state(pgdat, NR_FILE_DIRTY); nr_pages += node_page_state(pgdat, NR_WRITEBACK); return nr_pages <= limit; } #ifdef CONFIG_SYSCTL static int dirty_background_ratio_handler(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { int ret; ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos); if (ret == 0 && write) dirty_background_bytes = 0; return ret; } static int dirty_background_bytes_handler(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { int ret; unsigned long old_bytes = dirty_background_bytes; ret = proc_doulongvec_minmax(table, write, buffer, lenp, ppos); if (ret == 0 && write) { if (DIV_ROUND_UP(dirty_background_bytes, PAGE_SIZE) > UINT_MAX) { dirty_background_bytes = old_bytes; return -ERANGE; } dirty_background_ratio = 0; } return ret; } static int dirty_ratio_handler(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { int old_ratio = vm_dirty_ratio; int ret; ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos); if (ret == 0 && write && vm_dirty_ratio != old_ratio) { vm_dirty_bytes = 0; writeback_set_ratelimit(); } return ret; } static int dirty_bytes_handler(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { unsigned long old_bytes = vm_dirty_bytes; int ret; ret = proc_doulongvec_minmax(table, write, buffer, lenp, ppos); if (ret == 0 && write && vm_dirty_bytes != old_bytes) { if (DIV_ROUND_UP(vm_dirty_bytes, PAGE_SIZE) > UINT_MAX) { vm_dirty_bytes = old_bytes; return -ERANGE; } writeback_set_ratelimit(); vm_dirty_ratio = 0; } return ret; } #endif static unsigned long wp_next_time(unsigned long cur_time) { cur_time += VM_COMPLETIONS_PERIOD_LEN; /* 0 has a special meaning... */ if (!cur_time) return 1; return cur_time; } static void wb_domain_writeout_add(struct wb_domain *dom, struct fprop_local_percpu *completions, unsigned int max_prop_frac, long nr) { __fprop_add_percpu_max(&dom->completions, completions, max_prop_frac, nr); /* First event after period switching was turned off? 
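* (dom->period_time == 0 means the aging timer is currently idle, so it is re-armed below.)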
*/ if (unlikely(!dom->period_time)) { /* * We can race with other wb_domain_writeout_add calls here but * it does not cause any harm since the resulting time when * timer will fire and what is in writeout_period_time will be * roughly the same. */ dom->period_time = wp_next_time(jiffies); mod_timer(&dom->period_timer, dom->period_time); } } /* * Increment @wb's writeout completion count and the global writeout * completion count. Called from __folio_end_writeback(). */ static inline void __wb_writeout_add(struct bdi_writeback *wb, long nr) { struct wb_domain *cgdom; wb_stat_mod(wb, WB_WRITTEN, nr); wb_domain_writeout_add(&global_wb_domain, &wb->completions, wb->bdi->max_prop_frac, nr); cgdom = mem_cgroup_wb_domain(wb); if (cgdom) wb_domain_writeout_add(cgdom, wb_memcg_completions(wb), wb->bdi->max_prop_frac, nr); } void wb_writeout_inc(struct bdi_writeback *wb) { unsigned long flags; local_irq_save(flags); __wb_writeout_add(wb, 1); local_irq_restore(flags); } EXPORT_SYMBOL_GPL(wb_writeout_inc); /* * On idle system, we can be called long after we scheduled because we use * deferred timers so count with missed periods. */ static void writeout_period(struct timer_list *t) { struct wb_domain *dom = timer_container_of(dom, t, period_timer); int miss_periods = (jiffies - dom->period_time) / VM_COMPLETIONS_PERIOD_LEN; if (fprop_new_period(&dom->completions, miss_periods + 1)) { dom->period_time = wp_next_time(dom->period_time + miss_periods * VM_COMPLETIONS_PERIOD_LEN); mod_timer(&dom->period_timer, dom->period_time); } else { /* * Aging has zeroed all fractions. Stop wasting CPU on period * updates. */ dom->period_time = 0; } } int wb_domain_init(struct wb_domain *dom, gfp_t gfp) { memset(dom, 0, sizeof(*dom)); spin_lock_init(&dom->lock); timer_setup(&dom->period_timer, writeout_period, TIMER_DEFERRABLE); dom->dirty_limit_tstamp = jiffies; return fprop_global_init(&dom->completions, gfp); } #ifdef CONFIG_CGROUP_WRITEBACK void wb_domain_exit(struct wb_domain *dom) { timer_delete_sync(&dom->period_timer); fprop_global_destroy(&dom->completions); } #endif /* * bdi_min_ratio keeps the sum of the minimum dirty shares of all * registered backing devices, which, for obvious reasons, can not * exceed 100%. 
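* The ratios are stored scaled by BDI_RATIO_SCALE, which is why the checks below compare against 100 * BDI_RATIO_SCALE rather than plain 100.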
*/ static unsigned int bdi_min_ratio; static int bdi_check_pages_limit(unsigned long pages) { unsigned long max_dirty_pages = global_dirtyable_memory(); if (pages > max_dirty_pages) return -EINVAL; return 0; } static unsigned long bdi_ratio_from_pages(unsigned long pages) { unsigned long background_thresh; unsigned long dirty_thresh; unsigned long ratio; global_dirty_limits(&background_thresh, &dirty_thresh); if (!dirty_thresh) return -EINVAL; ratio = div64_u64(pages * 100ULL * BDI_RATIO_SCALE, dirty_thresh); return ratio; } static u64 bdi_get_bytes(unsigned int ratio) { unsigned long background_thresh; unsigned long dirty_thresh; u64 bytes; global_dirty_limits(&background_thresh, &dirty_thresh); bytes = (dirty_thresh * PAGE_SIZE * ratio) / BDI_RATIO_SCALE / 100; return bytes; } static int __bdi_set_min_ratio(struct backing_dev_info *bdi, unsigned int min_ratio) { unsigned int delta; int ret = 0; if (min_ratio > 100 * BDI_RATIO_SCALE) return -EINVAL; spin_lock_bh(&bdi_lock); if (min_ratio > bdi->max_ratio) { ret = -EINVAL; } else { if (min_ratio < bdi->min_ratio) { delta = bdi->min_ratio - min_ratio; bdi_min_ratio -= delta; bdi->min_ratio = min_ratio; } else { delta = min_ratio - bdi->min_ratio; if (bdi_min_ratio + delta < 100 * BDI_RATIO_SCALE) { bdi_min_ratio += delta; bdi->min_ratio = min_ratio; } else { ret = -EINVAL; } } } spin_unlock_bh(&bdi_lock); return ret; } static int __bdi_set_max_ratio(struct backing_dev_info *bdi, unsigned int max_ratio) { int ret = 0; if (max_ratio > 100 * BDI_RATIO_SCALE) return -EINVAL; spin_lock_bh(&bdi_lock); if (bdi->min_ratio > max_ratio) { ret = -EINVAL; } else { bdi->max_ratio = max_ratio; bdi->max_prop_frac = (FPROP_FRAC_BASE * max_ratio) / (100 * BDI_RATIO_SCALE); } spin_unlock_bh(&bdi_lock); return ret; } int bdi_set_min_ratio_no_scale(struct backing_dev_info *bdi, unsigned int min_ratio) { return __bdi_set_min_ratio(bdi, min_ratio); } int bdi_set_max_ratio_no_scale(struct backing_dev_info *bdi, unsigned int max_ratio) { return __bdi_set_max_ratio(bdi, max_ratio); } int bdi_set_min_ratio(struct backing_dev_info *bdi, unsigned int min_ratio) { return __bdi_set_min_ratio(bdi, min_ratio * BDI_RATIO_SCALE); } int bdi_set_max_ratio(struct backing_dev_info *bdi, unsigned int max_ratio) { return __bdi_set_max_ratio(bdi, max_ratio * BDI_RATIO_SCALE); } EXPORT_SYMBOL(bdi_set_max_ratio); u64 bdi_get_min_bytes(struct backing_dev_info *bdi) { return bdi_get_bytes(bdi->min_ratio); } int bdi_set_min_bytes(struct backing_dev_info *bdi, u64 min_bytes) { int ret; unsigned long pages = min_bytes >> PAGE_SHIFT; long min_ratio; ret = bdi_check_pages_limit(pages); if (ret) return ret; min_ratio = bdi_ratio_from_pages(pages); if (min_ratio < 0) return min_ratio; return __bdi_set_min_ratio(bdi, min_ratio); } u64 bdi_get_max_bytes(struct backing_dev_info *bdi) { return bdi_get_bytes(bdi->max_ratio); } int bdi_set_max_bytes(struct backing_dev_info *bdi, u64 max_bytes) { int ret; unsigned long pages = max_bytes >> PAGE_SHIFT; long max_ratio; ret = bdi_check_pages_limit(pages); if (ret) return ret; max_ratio = bdi_ratio_from_pages(pages); if (max_ratio < 0) return max_ratio; return __bdi_set_max_ratio(bdi, max_ratio); } int bdi_set_strict_limit(struct backing_dev_info *bdi, unsigned int strict_limit) { if (strict_limit > 1) return -EINVAL; spin_lock_bh(&bdi_lock); if (strict_limit) bdi->capabilities |= BDI_CAP_STRICTLIMIT; else bdi->capabilities &= ~BDI_CAP_STRICTLIMIT; spin_unlock_bh(&bdi_lock); return 0; } static unsigned long dirty_freerun_ceiling(unsigned long thresh, 
unsigned long bg_thresh) { return (thresh + bg_thresh) / 2; } static unsigned long hard_dirty_limit(struct wb_domain *dom, unsigned long thresh) { return max(thresh, dom->dirty_limit); } /* * Memory which can be further allocated to a memcg domain is capped by * system-wide clean memory excluding the amount being used in the domain. */ static void mdtc_calc_avail(struct dirty_throttle_control *mdtc, unsigned long filepages, unsigned long headroom) { struct dirty_throttle_control *gdtc = mdtc_gdtc(mdtc); unsigned long clean = filepages - min(filepages, mdtc->dirty); unsigned long global_clean = gdtc->avail - min(gdtc->avail, gdtc->dirty); unsigned long other_clean = global_clean - min(global_clean, clean); mdtc->avail = filepages + min(headroom, other_clean); } static inline bool dtc_is_global(struct dirty_throttle_control *dtc) { return mdtc_gdtc(dtc) == NULL; } /* * Dirty background will ignore pages being written as we're trying to * decide whether to put more under writeback. */ static void domain_dirty_avail(struct dirty_throttle_control *dtc, bool include_writeback) { if (dtc_is_global(dtc)) { dtc->avail = global_dirtyable_memory(); dtc->dirty = global_node_page_state(NR_FILE_DIRTY); if (include_writeback) dtc->dirty += global_node_page_state(NR_WRITEBACK); } else { unsigned long filepages = 0, headroom = 0, writeback = 0; mem_cgroup_wb_stats(dtc->wb, &filepages, &headroom, &dtc->dirty, &writeback); if (include_writeback) dtc->dirty += writeback; mdtc_calc_avail(dtc, filepages, headroom); } } /** * __wb_calc_thresh - @wb's share of dirty threshold * @dtc: dirty_throttle_context of interest * @thresh: dirty throttling or dirty background threshold of wb_domain in @dtc * * Note that balance_dirty_pages() will only seriously take dirty throttling * threshold as a hard limit when sleeping max_pause per page is not enough * to keep the dirty pages under control. For example, when the device is * completely stalled due to some error conditions, or when there are 1000 * dd tasks writing to a slow 10MB/s USB key. * In the other normal situations, it acts more gently by throttling the tasks * more (rather than completely block them) when the wb dirty pages go high. * * It allocates high/low dirty limits to fast/slow devices, in order to prevent * - starving fast devices * - piling up dirty pages (that will take long time to sync) on slow devices * * The wb's share of dirty limit will be adapting to its throughput and * bounded by the bdi->min_ratio and/or bdi->max_ratio parameters, if set. * * Return: @wb's dirty limit in pages. For dirty throttling limit, the term * "dirty" in the context of dirty balancing includes all PG_dirty and * PG_writeback pages. */ static unsigned long __wb_calc_thresh(struct dirty_throttle_control *dtc, unsigned long thresh) { struct wb_domain *dom = dtc_dom(dtc); struct bdi_writeback *wb = dtc->wb; u64 wb_thresh; u64 wb_max_thresh; unsigned long numerator, denominator; unsigned long wb_min_ratio, wb_max_ratio; /* * Calculate this wb's share of the thresh ratio. */ fprop_fraction_percpu(&dom->completions, dtc->wb_completions, &numerator, &denominator); wb_thresh = (thresh * (100 * BDI_RATIO_SCALE - bdi_min_ratio)) / (100 * BDI_RATIO_SCALE); wb_thresh *= numerator; wb_thresh = div64_ul(wb_thresh, denominator); wb_min_max_ratio(wb, &wb_min_ratio, &wb_max_ratio); wb_thresh += (thresh * wb_min_ratio) / (100 * BDI_RATIO_SCALE); /* * It's very possible that wb_thresh is close to 0 not because the * device is slow, but that it has remained inactive for long time. 
* Honour such devices a reasonable good (hopefully IO efficient) * threshold, so that the occasional writes won't be blocked and active * writes can rampup the threshold quickly. */ if (thresh > dtc->dirty) { if (unlikely(wb->bdi->capabilities & BDI_CAP_STRICTLIMIT)) wb_thresh = max(wb_thresh, (thresh - dtc->dirty) / 100); else wb_thresh = max(wb_thresh, (thresh - dtc->dirty) / 8); } wb_max_thresh = thresh * wb_max_ratio / (100 * BDI_RATIO_SCALE); if (wb_thresh > wb_max_thresh) wb_thresh = wb_max_thresh; return wb_thresh; } unsigned long wb_calc_thresh(struct bdi_writeback *wb, unsigned long thresh) { struct dirty_throttle_control gdtc = { GDTC_INIT(wb) }; domain_dirty_avail(&gdtc, true); return __wb_calc_thresh(&gdtc, thresh); } unsigned long cgwb_calc_thresh(struct bdi_writeback *wb) { struct dirty_throttle_control gdtc = { GDTC_INIT_NO_WB }; struct dirty_throttle_control mdtc = { MDTC_INIT(wb, &gdtc) }; domain_dirty_avail(&gdtc, true); domain_dirty_avail(&mdtc, true); domain_dirty_limits(&mdtc); return __wb_calc_thresh(&mdtc, mdtc.thresh); } /* * setpoint - dirty 3 * f(dirty) := 1.0 + (----------------) * limit - setpoint * * it's a 3rd order polynomial that subjects to * * (1) f(freerun) = 2.0 => rampup dirty_ratelimit reasonably fast * (2) f(setpoint) = 1.0 => the balance point * (3) f(limit) = 0 => the hard limit * (4) df/dx <= 0 => negative feedback control * (5) the closer to setpoint, the smaller |df/dx| (and the reverse) * => fast response on large errors; small oscillation near setpoint */ static long long pos_ratio_polynom(unsigned long setpoint, unsigned long dirty, unsigned long limit) { long long pos_ratio; long x; x = div64_s64(((s64)setpoint - (s64)dirty) << RATELIMIT_CALC_SHIFT, (limit - setpoint) | 1); pos_ratio = x; pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT; pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT; pos_ratio += 1 << RATELIMIT_CALC_SHIFT; return clamp(pos_ratio, 0LL, 2LL << RATELIMIT_CALC_SHIFT); } /* * Dirty position control. * * (o) global/bdi setpoints * * We want the dirty pages be balanced around the global/wb setpoints. * When the number of dirty pages is higher/lower than the setpoint, the * dirty position control ratio (and hence task dirty ratelimit) will be * decreased/increased to bring the dirty pages back to the setpoint. * * pos_ratio = 1 << RATELIMIT_CALC_SHIFT * * if (dirty < setpoint) scale up pos_ratio * if (dirty > setpoint) scale down pos_ratio * * if (wb_dirty < wb_setpoint) scale up pos_ratio * if (wb_dirty > wb_setpoint) scale down pos_ratio * * task_ratelimit = dirty_ratelimit * pos_ratio >> RATELIMIT_CALC_SHIFT * * (o) global control line * * ^ pos_ratio * | * | |<===== global dirty control scope ======>| * 2.0 * * * * * * * * | .* * | . * * | . * * | . * * | . * * | . * * 1.0 ................................* * | . . * * | . . * * | . . * * | . . * * | . . * * 0 +------------.------------------.----------------------*-------------> * freerun^ setpoint^ limit^ dirty pages * * (o) wb control line * * ^ pos_ratio * | * | * * | * * | * * | * * | * |<=========== span ============>| * 1.0 .......................* * | . * * | . * * | . * * | . * * | . * * | . * * | . * * | . * * | . * * | . * * | . * * 1/4 ...............................................* * * * * * * * * * * * * | . . * | . . * | . . 
* 0 +----------------------.-------------------------------.-------------> * wb_setpoint^ x_intercept^ * * The wb control line won't drop below pos_ratio=1/4, so that wb_dirty can * be smoothly throttled down to normal if it starts high in situations like * - start writing to a slow SD card and a fast disk at the same time. The SD * card's wb_dirty may rush to many times higher than wb_setpoint. * - the wb dirty thresh drops quickly due to change of JBOD workload */ static void wb_position_ratio(struct dirty_throttle_control *dtc) { struct bdi_writeback *wb = dtc->wb; unsigned long write_bw = READ_ONCE(wb->avg_write_bandwidth); unsigned long freerun = dirty_freerun_ceiling(dtc->thresh, dtc->bg_thresh); unsigned long limit = dtc->limit = hard_dirty_limit(dtc_dom(dtc), dtc->thresh); unsigned long wb_thresh = dtc->wb_thresh; unsigned long x_intercept; unsigned long setpoint; /* dirty pages' target balance point */ unsigned long wb_setpoint; unsigned long span; long long pos_ratio; /* for scaling up/down the rate limit */ long x; dtc->pos_ratio = 0; if (unlikely(dtc->dirty >= limit)) return; /* * global setpoint * * See comment for pos_ratio_polynom(). */ setpoint = (freerun + limit) / 2; pos_ratio = pos_ratio_polynom(setpoint, dtc->dirty, limit); /* * The strictlimit feature is a tool preventing mistrusted filesystems * from growing a large number of dirty pages before throttling. For * such filesystems balance_dirty_pages always checks wb counters * against wb limits. Even if global "nr_dirty" is under "freerun". * This is especially important for fuse which sets bdi->max_ratio to * 1% by default. * * Here, in wb_position_ratio(), we calculate pos_ratio based on * two values: wb_dirty and wb_thresh. Let's consider an example: * total amount of RAM is 16GB, bdi->max_ratio is equal to 1%, global * limits are set by default to 10% and 20% (background and throttle). * Then wb_thresh is 1% of 20% of 16GB. This amounts to ~8K pages. * wb_calc_thresh(wb, bg_thresh) is about ~4K pages. wb_setpoint is * about ~6K pages (as the average of background and throttle wb * limits). The 3rd order polynomial will provide positive feedback if * wb_dirty is under wb_setpoint and vice versa. * * Note, that we cannot use global counters in these calculations * because we want to throttle process writing to a strictlimit wb * much earlier than global "freerun" is reached (~23MB vs. ~2.3GB * in the example above). */ if (unlikely(wb->bdi->capabilities & BDI_CAP_STRICTLIMIT)) { long long wb_pos_ratio; if (dtc->wb_dirty >= wb_thresh) return; wb_setpoint = dirty_freerun_ceiling(wb_thresh, dtc->wb_bg_thresh); if (wb_setpoint == 0 || wb_setpoint == wb_thresh) return; wb_pos_ratio = pos_ratio_polynom(wb_setpoint, dtc->wb_dirty, wb_thresh); /* * Typically, for strictlimit case, wb_setpoint << setpoint * and pos_ratio >> wb_pos_ratio. In the other words global * state ("dirty") is not limiting factor and we have to * make decision based on wb counters. But there is an * important case when global pos_ratio should get precedence: * global limits are exceeded (e.g. due to activities on other * wb's) while given strictlimit wb is below limit. * * "pos_ratio * wb_pos_ratio" would work for the case above, * but it would look too non-natural for the case of all * activity in the system coming from a single strictlimit wb * with bdi->max_ratio == 100%. * * Note that min() below somewhat changes the dynamics of the * control system. 
Normally, pos_ratio value can be well over 3 * (when globally we are at freerun and wb is well below wb * setpoint). Now the maximum pos_ratio in the same situation * is 2. We might want to tweak this if we observe the control * system is too slow to adapt. */ dtc->pos_ratio = min(pos_ratio, wb_pos_ratio); return; } /* * We have computed basic pos_ratio above based on global situation. If * the wb is over/under its share of dirty pages, we want to scale * pos_ratio further down/up. That is done by the following mechanism. */ /* * wb setpoint * * f(wb_dirty) := 1.0 + k * (wb_dirty - wb_setpoint) * * x_intercept - wb_dirty * := -------------------------- * x_intercept - wb_setpoint * * The main wb control line is a linear function that subjects to * * (1) f(wb_setpoint) = 1.0 * (2) k = - 1 / (8 * write_bw) (in single wb case) * or equally: x_intercept = wb_setpoint + 8 * write_bw * * For single wb case, the dirty pages are observed to fluctuate * regularly within range * [wb_setpoint - write_bw/2, wb_setpoint + write_bw/2] * for various filesystems, where (2) can yield in a reasonable 12.5% * fluctuation range for pos_ratio. * * For JBOD case, wb_thresh (not wb_dirty!) could fluctuate up to its * own size, so move the slope over accordingly and choose a slope that * yields 100% pos_ratio fluctuation on suddenly doubled wb_thresh. */ if (unlikely(wb_thresh > dtc->thresh)) wb_thresh = dtc->thresh; /* * scale global setpoint to wb's: * wb_setpoint = setpoint * wb_thresh / thresh */ x = div_u64((u64)wb_thresh << 16, dtc->thresh | 1); wb_setpoint = setpoint * (u64)x >> 16; /* * Use span=(8*write_bw) in single wb case as indicated by * (thresh - wb_thresh ~= 0) and transit to wb_thresh in JBOD case. * * wb_thresh thresh - wb_thresh * span = --------- * (8 * write_bw) + ------------------ * wb_thresh * thresh thresh */ span = (dtc->thresh - wb_thresh + 8 * write_bw) * (u64)x >> 16; x_intercept = wb_setpoint + span; if (dtc->wb_dirty < x_intercept - span / 4) { pos_ratio = div64_u64(pos_ratio * (x_intercept - dtc->wb_dirty), (x_intercept - wb_setpoint) | 1); } else pos_ratio /= 4; /* * wb reserve area, safeguard against dirty pool underrun and disk idle * It may push the desired control point of global dirty pages higher * than setpoint. */ x_intercept = wb_thresh / 2; if (dtc->wb_dirty < x_intercept) { if (dtc->wb_dirty > x_intercept / 8) pos_ratio = div_u64(pos_ratio * x_intercept, dtc->wb_dirty); else pos_ratio *= 8; } dtc->pos_ratio = pos_ratio; } static void wb_update_write_bandwidth(struct bdi_writeback *wb, unsigned long elapsed, unsigned long written) { const unsigned long period = roundup_pow_of_two(3 * HZ); unsigned long avg = wb->avg_write_bandwidth; unsigned long old = wb->write_bandwidth; u64 bw; /* * bw = written * HZ / elapsed * * bw * elapsed + write_bandwidth * (period - elapsed) * write_bandwidth = --------------------------------------------------- * period * * @written may have decreased due to folio_redirty_for_writepage(). * Avoid underflowing @bw calculation. 
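	 *
	 * Worked example (HZ, page counts and the previous estimate below are
	 * illustrative assumptions, not values taken from this code): with
	 * HZ=250, 256 pages written over elapsed=50 jiffies gives an
	 * instantaneous bw = 256 * 250 / 50 = 1280 pages/s.  Blending over
	 * period = roundup_pow_of_two(3 * HZ) = 1024 against a previous
	 * write_bandwidth of 1000 pages/s yields
	 * (1280 * 50 + 1000 * (1024 - 50)) / 1024 ~= 1013 pages/s, so a short
	 * burst only nudges the long-term estimate.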
*/ bw = written - min(written, wb->written_stamp); bw *= HZ; if (unlikely(elapsed > period)) { bw = div64_ul(bw, elapsed); avg = bw; goto out; } bw += (u64)wb->write_bandwidth * (period - elapsed); bw >>= ilog2(period); /* * one more level of smoothing, for filtering out sudden spikes */ if (avg > old && old >= (unsigned long)bw) avg -= (avg - old) >> 3; if (avg < old && old <= (unsigned long)bw) avg += (old - avg) >> 3; out: /* keep avg > 0 to guarantee that tot > 0 if there are dirty wbs */ avg = max(avg, 1LU); if (wb_has_dirty_io(wb)) { long delta = avg - wb->avg_write_bandwidth; WARN_ON_ONCE(atomic_long_add_return(delta, &wb->bdi->tot_write_bandwidth) <= 0); } wb->write_bandwidth = bw; WRITE_ONCE(wb->avg_write_bandwidth, avg); } static void update_dirty_limit(struct dirty_throttle_control *dtc) { struct wb_domain *dom = dtc_dom(dtc); unsigned long thresh = dtc->thresh; unsigned long limit = dom->dirty_limit; /* * Follow up in one step. */ if (limit < thresh) { limit = thresh; goto update; } /* * Follow down slowly. Use the higher one as the target, because thresh * may drop below dirty. This is exactly the reason to introduce * dom->dirty_limit which is guaranteed to lie above the dirty pages. */ thresh = max(thresh, dtc->dirty); if (limit > thresh) { limit -= (limit - thresh) >> 5; goto update; } return; update: dom->dirty_limit = limit; } static void domain_update_dirty_limit(struct dirty_throttle_control *dtc, unsigned long now) { struct wb_domain *dom = dtc_dom(dtc); /* * check locklessly first to optimize away locking for the most time */ if (time_before(now, dom->dirty_limit_tstamp + BANDWIDTH_INTERVAL)) return; spin_lock(&dom->lock); if (time_after_eq(now, dom->dirty_limit_tstamp + BANDWIDTH_INTERVAL)) { update_dirty_limit(dtc); dom->dirty_limit_tstamp = now; } spin_unlock(&dom->lock); } /* * Maintain wb->dirty_ratelimit, the base dirty throttle rate. * * Normal wb tasks will be curbed at or below it in long term. * Obviously it should be around (write_bw / N) when there are N dd tasks. */ static void wb_update_dirty_ratelimit(struct dirty_throttle_control *dtc, unsigned long dirtied, unsigned long elapsed) { struct bdi_writeback *wb = dtc->wb; unsigned long dirty = dtc->dirty; unsigned long freerun = dirty_freerun_ceiling(dtc->thresh, dtc->bg_thresh); unsigned long limit = hard_dirty_limit(dtc_dom(dtc), dtc->thresh); unsigned long setpoint = (freerun + limit) / 2; unsigned long write_bw = wb->avg_write_bandwidth; unsigned long dirty_ratelimit = wb->dirty_ratelimit; unsigned long dirty_rate; unsigned long task_ratelimit; unsigned long balanced_dirty_ratelimit; unsigned long step; unsigned long x; unsigned long shift; /* * The dirty rate will match the writeout rate in long term, except * when dirty pages are truncated by userspace or re-dirtied by FS. */ dirty_rate = (dirtied - wb->dirtied_stamp) * HZ / elapsed; /* * task_ratelimit reflects each dd's dirty rate for the past 200ms. */ task_ratelimit = (u64)dirty_ratelimit * dtc->pos_ratio >> RATELIMIT_CALC_SHIFT; task_ratelimit++; /* it helps rampup dirty_ratelimit from tiny values */ /* * A linear estimation of the "balanced" throttle rate. The theory is, * if there are N dd tasks, each throttled at task_ratelimit, the wb's * dirty_rate will be measured to be (N * task_ratelimit). So the below * formula will yield the balanced rate limit (write_bw / N). 
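	 *
	 * For instance (purely illustrative numbers): with write_bw = 100 MB/s
	 * and N = 4 tasks each currently throttled at task_ratelimit = 50 MB/s,
	 * the measured dirty_rate will be about 4 * 50 = 200 MB/s, so
	 * balanced_dirty_ratelimit = 50 * 100 / 200 = 25 MB/s, i.e. write_bw / N.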
* * Note that the expanded form is not a pure rate feedback: * rate_(i+1) = rate_(i) * (write_bw / dirty_rate) (1) * but also takes pos_ratio into account: * rate_(i+1) = rate_(i) * (write_bw / dirty_rate) * pos_ratio (2) * * (1) is not realistic because pos_ratio also takes part in balancing * the dirty rate. Consider the state * pos_ratio = 0.5 (3) * rate = 2 * (write_bw / N) (4) * If (1) is used, it will stuck in that state! Because each dd will * be throttled at * task_ratelimit = pos_ratio * rate = (write_bw / N) (5) * yielding * dirty_rate = N * task_ratelimit = write_bw (6) * put (6) into (1) we get * rate_(i+1) = rate_(i) (7) * * So we end up using (2) to always keep * rate_(i+1) ~= (write_bw / N) (8) * regardless of the value of pos_ratio. As long as (8) is satisfied, * pos_ratio is able to drive itself to 1.0, which is not only where * the dirty count meet the setpoint, but also where the slope of * pos_ratio is most flat and hence task_ratelimit is least fluctuated. */ balanced_dirty_ratelimit = div_u64((u64)task_ratelimit * write_bw, dirty_rate | 1); /* * balanced_dirty_ratelimit ~= (write_bw / N) <= write_bw */ if (unlikely(balanced_dirty_ratelimit > write_bw)) balanced_dirty_ratelimit = write_bw; /* * We could safely do this and return immediately: * * wb->dirty_ratelimit = balanced_dirty_ratelimit; * * However to get a more stable dirty_ratelimit, the below elaborated * code makes use of task_ratelimit to filter out singular points and * limit the step size. * * The below code essentially only uses the relative value of * * task_ratelimit - dirty_ratelimit * = (pos_ratio - 1) * dirty_ratelimit * * which reflects the direction and size of dirty position error. */ /* * dirty_ratelimit will follow balanced_dirty_ratelimit iff * task_ratelimit is on the same side of dirty_ratelimit, too. * For example, when * - dirty_ratelimit > balanced_dirty_ratelimit * - dirty_ratelimit > task_ratelimit (dirty pages are above setpoint) * lowering dirty_ratelimit will help meet both the position and rate * control targets. Otherwise, don't update dirty_ratelimit if it will * only help meet the rate target. After all, what the users ultimately * feel and care are stable dirty rate and small position error. * * |task_ratelimit - dirty_ratelimit| is used to limit the step size * and filter out the singular points of balanced_dirty_ratelimit. Which * keeps jumping around randomly and can even leap far away at times * due to the small 200ms estimation period of dirty_rate (we want to * keep that period small to reduce time lags). */ step = 0; /* * For strictlimit case, calculations above were based on wb counters * and limits (starting from pos_ratio = wb_position_ratio() and up to * balanced_dirty_ratelimit = task_ratelimit * write_bw / dirty_rate). * Hence, to calculate "step" properly, we have to use wb_dirty as * "dirty" and wb_setpoint as "setpoint". */ if (unlikely(wb->bdi->capabilities & BDI_CAP_STRICTLIMIT)) { dirty = dtc->wb_dirty; setpoint = (dtc->wb_thresh + dtc->wb_bg_thresh) / 2; } if (dirty < setpoint) { x = min3(wb->balanced_dirty_ratelimit, balanced_dirty_ratelimit, task_ratelimit); if (dirty_ratelimit < x) step = x - dirty_ratelimit; } else { x = max3(wb->balanced_dirty_ratelimit, balanced_dirty_ratelimit, task_ratelimit); if (dirty_ratelimit > x) step = dirty_ratelimit - x; } /* * Don't pursue 100% rate matching. It's impossible since the balanced * rate itself is constantly fluctuating. So decrease the track speed * when it gets close to the target. 
Helps eliminate pointless tremors. */ shift = dirty_ratelimit / (2 * step + 1); if (shift < BITS_PER_LONG) step = DIV_ROUND_UP(step >> shift, 8); else step = 0; if (dirty_ratelimit < balanced_dirty_ratelimit) dirty_ratelimit += step; else dirty_ratelimit -= step; WRITE_ONCE(wb->dirty_ratelimit, max(dirty_ratelimit, 1UL)); wb->balanced_dirty_ratelimit = balanced_dirty_ratelimit; trace_bdi_dirty_ratelimit(wb, dirty_rate, task_ratelimit); } static void __wb_update_bandwidth(struct dirty_throttle_control *gdtc, struct dirty_throttle_control *mdtc, bool update_ratelimit) { struct bdi_writeback *wb = gdtc->wb; unsigned long now = jiffies; unsigned long elapsed; unsigned long dirtied; unsigned long written; spin_lock(&wb->list_lock); /* * Lockless checks for elapsed time are racy and delayed update after * IO completion doesn't do it at all (to make sure written pages are * accounted reasonably quickly). Make sure elapsed >= 1 to avoid * division errors. */ elapsed = max(now - wb->bw_time_stamp, 1UL); dirtied = percpu_counter_read(&wb->stat[WB_DIRTIED]); written = percpu_counter_read(&wb->stat[WB_WRITTEN]); if (update_ratelimit) { domain_update_dirty_limit(gdtc, now); wb_update_dirty_ratelimit(gdtc, dirtied, elapsed); /* * @mdtc is always NULL if !CGROUP_WRITEBACK but the * compiler has no way to figure that out. Help it. */ if (IS_ENABLED(CONFIG_CGROUP_WRITEBACK) && mdtc) { domain_update_dirty_limit(mdtc, now); wb_update_dirty_ratelimit(mdtc, dirtied, elapsed); } } wb_update_write_bandwidth(wb, elapsed, written); wb->dirtied_stamp = dirtied; wb->written_stamp = written; WRITE_ONCE(wb->bw_time_stamp, now); spin_unlock(&wb->list_lock); } void wb_update_bandwidth(struct bdi_writeback *wb) { struct dirty_throttle_control gdtc = { GDTC_INIT(wb) }; __wb_update_bandwidth(&gdtc, NULL, false); } /* Interval after which we consider wb idle and don't estimate bandwidth */ #define WB_BANDWIDTH_IDLE_JIF (HZ) static void wb_bandwidth_estimate_start(struct bdi_writeback *wb) { unsigned long now = jiffies; unsigned long elapsed = now - READ_ONCE(wb->bw_time_stamp); if (elapsed > WB_BANDWIDTH_IDLE_JIF && !atomic_read(&wb->writeback_inodes)) { spin_lock(&wb->list_lock); wb->dirtied_stamp = wb_stat(wb, WB_DIRTIED); wb->written_stamp = wb_stat(wb, WB_WRITTEN); WRITE_ONCE(wb->bw_time_stamp, now); spin_unlock(&wb->list_lock); } } /* * After a task dirtied this many pages, balance_dirty_pages_ratelimited() * will look to see if it needs to start dirty throttling. * * If dirty_poll_interval is too low, big NUMA machines will call the expensive * global_zone_page_state() too often. So scale it near-sqrt to the safety margin * (the number of pages we may dirty without exceeding the dirty limits). */ static unsigned long dirty_poll_interval(unsigned long dirty, unsigned long thresh) { if (thresh > dirty) return 1UL << (ilog2(thresh - dirty) >> 1); return 1; } static unsigned long wb_max_pause(struct bdi_writeback *wb, unsigned long wb_dirty) { unsigned long bw = READ_ONCE(wb->avg_write_bandwidth); unsigned long t; /* * Limit pause time for small memory systems. If sleeping for too long * time, a small pool of dirty/writeback pages may go empty and disk go * idle. * * 8 serves as the safety ratio. 
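	 *
	 * E.g. (illustrative numbers): with HZ=250 the divisor
	 * roundup_pow_of_two(1 + HZ / 8) is 32, so for bw = 25600 pages/s and
	 * wb_dirty = 1600 pages the pause is capped at about
	 * 1600 / (1 + 25600 / 32) + 1 = 2 jiffies, roughly 1/8 of the time it
	 * would take to write the whole dirty pool out at the current bandwidth.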
*/ t = wb_dirty / (1 + bw / roundup_pow_of_two(1 + HZ / 8)); t++; return min_t(unsigned long, t, MAX_PAUSE); } static long wb_min_pause(struct bdi_writeback *wb, long max_pause, unsigned long task_ratelimit, unsigned long dirty_ratelimit, int *nr_dirtied_pause) { long hi = ilog2(READ_ONCE(wb->avg_write_bandwidth)); long lo = ilog2(READ_ONCE(wb->dirty_ratelimit)); long t; /* target pause */ long pause; /* estimated next pause */ int pages; /* target nr_dirtied_pause */ /* target for 10ms pause on 1-dd case */ t = max(1, HZ / 100); /* * Scale up pause time for concurrent dirtiers in order to reduce CPU * overheads. * * (N * 10ms) on 2^N concurrent tasks. */ if (hi > lo) t += (hi - lo) * (10 * HZ) / 1024; /* * This is a bit convoluted. We try to base the next nr_dirtied_pause * on the much more stable dirty_ratelimit. However the next pause time * will be computed based on task_ratelimit and the two rate limits may * depart considerably at some time. Especially if task_ratelimit goes * below dirty_ratelimit/2 and the target pause is max_pause, the next * pause time will be max_pause*2 _trimmed down_ to max_pause. As a * result task_ratelimit won't be executed faithfully, which could * eventually bring down dirty_ratelimit. * * We apply two rules to fix it up: * 1) try to estimate the next pause time and if necessary, use a lower * nr_dirtied_pause so as not to exceed max_pause. When this happens, * nr_dirtied_pause will be "dancing" with task_ratelimit. * 2) limit the target pause time to max_pause/2, so that the normal * small fluctuations of task_ratelimit won't trigger rule (1) and * nr_dirtied_pause will remain as stable as dirty_ratelimit. */ t = min(t, 1 + max_pause / 2); pages = dirty_ratelimit * t / roundup_pow_of_two(HZ); /* * Tiny nr_dirtied_pause is found to hurt I/O performance in the test * case fio-mmap-randwrite-64k, which does 16*{sync read, async write}. * When the 16 consecutive reads are often interrupted by some dirty * throttling pause during the async writes, cfq will go into idles * (deadline is fine). So push nr_dirtied_pause as high as possible * until reaches DIRTY_POLL_THRESH=32 pages. */ if (pages < DIRTY_POLL_THRESH) { t = max_pause; pages = dirty_ratelimit * t / roundup_pow_of_two(HZ); if (pages > DIRTY_POLL_THRESH) { pages = DIRTY_POLL_THRESH; t = HZ * DIRTY_POLL_THRESH / dirty_ratelimit; } } pause = HZ * pages / (task_ratelimit + 1); if (pause > max_pause) { t = max_pause; pages = task_ratelimit * t / roundup_pow_of_two(HZ); } *nr_dirtied_pause = pages; /* * The minimal pause time will normally be half the target pause time. */ return pages >= DIRTY_POLL_THRESH ? 1 + t / 2 : t; } static inline void wb_dirty_limits(struct dirty_throttle_control *dtc) { struct bdi_writeback *wb = dtc->wb; unsigned long wb_reclaimable; /* * wb_thresh is not treated as some limiting factor as * dirty_thresh, due to reasons * - in JBOD setup, wb_thresh can fluctuate a lot * - in a system with HDD and USB key, the USB key may somehow * go into state (wb_dirty >> wb_thresh) either because * wb_dirty starts high, or because wb_thresh drops low. * In this case we don't want to hard throttle the USB key * dirtiers for 100 seconds until wb_dirty drops under * wb_thresh. Instead the auxiliary wb control line in * wb_position_ratio() will let the dirtier task progress * at some rate <= (write_bw / 2) for bringing down wb_dirty. */ dtc->wb_thresh = __wb_calc_thresh(dtc, dtc->thresh); dtc->wb_bg_thresh = dtc->thresh ? 
div_u64((u64)dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0; /* * In order to avoid the stacked BDI deadlock we need * to ensure we accurately count the 'dirty' pages when * the threshold is low. * * Otherwise it would be possible to get thresh+n pages * reported dirty, even though there are thresh-m pages * actually dirty; with m+n sitting in the percpu * deltas. */ if (dtc->wb_thresh < 2 * wb_stat_error()) { wb_reclaimable = wb_stat_sum(wb, WB_RECLAIMABLE); dtc->wb_dirty = wb_reclaimable + wb_stat_sum(wb, WB_WRITEBACK); } else { wb_reclaimable = wb_stat(wb, WB_RECLAIMABLE); dtc->wb_dirty = wb_reclaimable + wb_stat(wb, WB_WRITEBACK); } } static unsigned long domain_poll_intv(struct dirty_throttle_control *dtc, bool strictlimit) { unsigned long dirty, thresh; if (strictlimit) { dirty = dtc->wb_dirty; thresh = dtc->wb_thresh; } else { dirty = dtc->dirty; thresh = dtc->thresh; } return dirty_poll_interval(dirty, thresh); } /* * Throttle it only when the background writeback cannot catch-up. This avoids * (excessively) small writeouts when the wb limits are ramping up in case of * !strictlimit. * * In strictlimit case make decision based on the wb counters and limits. Small * writeouts when the wb limits are ramping up are the price we consciously pay * for strictlimit-ing. */ static void domain_dirty_freerun(struct dirty_throttle_control *dtc, bool strictlimit) { unsigned long dirty, thresh, bg_thresh; if (unlikely(strictlimit)) { wb_dirty_limits(dtc); dirty = dtc->wb_dirty; thresh = dtc->wb_thresh; bg_thresh = dtc->wb_bg_thresh; } else { dirty = dtc->dirty; thresh = dtc->thresh; bg_thresh = dtc->bg_thresh; } dtc->freerun = dirty <= dirty_freerun_ceiling(thresh, bg_thresh); } static void balance_domain_limits(struct dirty_throttle_control *dtc, bool strictlimit) { domain_dirty_avail(dtc, true); domain_dirty_limits(dtc); domain_dirty_freerun(dtc, strictlimit); } static void wb_dirty_freerun(struct dirty_throttle_control *dtc, bool strictlimit) { dtc->freerun = false; /* was already handled in domain_dirty_freerun */ if (strictlimit) return; wb_dirty_limits(dtc); /* * LOCAL_THROTTLE tasks must not be throttled when below the per-wb * freerun ceiling. */ if (!(current->flags & PF_LOCAL_THROTTLE)) return; dtc->freerun = dtc->wb_dirty < dirty_freerun_ceiling(dtc->wb_thresh, dtc->wb_bg_thresh); } static inline void wb_dirty_exceeded(struct dirty_throttle_control *dtc, bool strictlimit) { dtc->dirty_exceeded = (dtc->wb_dirty > dtc->wb_thresh) && ((dtc->dirty > dtc->thresh) || strictlimit); } /* * The limits fields dirty_exceeded and pos_ratio won't be updated if wb is * in freerun state. Please don't use these invalid fields in freerun case. */ static void balance_wb_limits(struct dirty_throttle_control *dtc, bool strictlimit) { wb_dirty_freerun(dtc, strictlimit); if (dtc->freerun) return; wb_dirty_exceeded(dtc, strictlimit); wb_position_ratio(dtc); } /* * balance_dirty_pages() must be called by processes which are generating dirty * data. It looks at the number of dirty pages in the machine and will force * the caller to wait once crossing the (background_thresh + dirty_thresh) / 2. * If we're over `background_thresh' then the writeback threads are woken to * perform some writeout. 
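 *
 * For example, with the default vm.dirty_background_ratio=10 and
 * vm.dirty_ratio=20 on a machine with roughly 16GB of dirtyable memory
 * (illustrative figures), background_thresh is ~1.6GB and dirty_thresh
 * ~3.2GB: background writeback kicks in from ~1.6GB of dirty pages, and
 * dirtying tasks start to be paused once ~2.4GB have accumulated.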
*/ static int balance_dirty_pages(struct bdi_writeback *wb, unsigned long pages_dirtied, unsigned int flags) { struct dirty_throttle_control gdtc_stor = { GDTC_INIT(wb) }; struct dirty_throttle_control mdtc_stor = { MDTC_INIT(wb, &gdtc_stor) }; struct dirty_throttle_control * const gdtc = &gdtc_stor; struct dirty_throttle_control * const mdtc = mdtc_valid(&mdtc_stor) ? &mdtc_stor : NULL; struct dirty_throttle_control *sdtc; unsigned long nr_dirty; long period; long pause; long max_pause; long min_pause; int nr_dirtied_pause; unsigned long task_ratelimit; unsigned long dirty_ratelimit; struct backing_dev_info *bdi = wb->bdi; bool strictlimit = bdi->capabilities & BDI_CAP_STRICTLIMIT; unsigned long start_time = jiffies; int ret = 0; for (;;) { unsigned long now = jiffies; nr_dirty = global_node_page_state(NR_FILE_DIRTY); balance_domain_limits(gdtc, strictlimit); if (mdtc) { /* * If @wb belongs to !root memcg, repeat the same * basic calculations for the memcg domain. */ balance_domain_limits(mdtc, strictlimit); } /* * In laptop mode, we wait until hitting the higher threshold * before starting background writeout, and then write out all * the way down to the lower threshold. So slow writers cause * minimal disk activity. * * In normal mode, we start background writeout at the lower * background_thresh, to keep the amount of dirty memory low. */ if (!laptop_mode && nr_dirty > gdtc->bg_thresh && !writeback_in_progress(wb)) wb_start_background_writeback(wb); /* * If memcg domain is in effect, @dirty should be under * both global and memcg freerun ceilings. */ if (gdtc->freerun && (!mdtc || mdtc->freerun)) { unsigned long intv; unsigned long m_intv; free_running: intv = domain_poll_intv(gdtc, strictlimit); m_intv = ULONG_MAX; current->dirty_paused_when = now; current->nr_dirtied = 0; if (mdtc) m_intv = domain_poll_intv(mdtc, strictlimit); current->nr_dirtied_pause = min(intv, m_intv); break; } /* Start writeback even when in laptop mode */ if (unlikely(!writeback_in_progress(wb))) wb_start_background_writeback(wb); mem_cgroup_flush_foreign(wb); /* * Calculate global domain's pos_ratio and select the * global dtc by default. */ balance_wb_limits(gdtc, strictlimit); if (gdtc->freerun) goto free_running; sdtc = gdtc; if (mdtc) { /* * If memcg domain is in effect, calculate its * pos_ratio. @wb should satisfy constraints from * both global and memcg domains. Choose the one * w/ lower pos_ratio. 
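			 *
			 * E.g. if the global domain yields pos_ratio = 1.2
			 * while the memcg domain is over its share and yields
			 * 0.3 (illustrative values), the memcg dtc is chosen
			 * and throttling follows the stricter 0.3.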
*/ balance_wb_limits(mdtc, strictlimit); if (mdtc->freerun) goto free_running; if (mdtc->pos_ratio < gdtc->pos_ratio) sdtc = mdtc; } wb->dirty_exceeded = gdtc->dirty_exceeded || (mdtc && mdtc->dirty_exceeded); if (time_is_before_jiffies(READ_ONCE(wb->bw_time_stamp) + BANDWIDTH_INTERVAL)) __wb_update_bandwidth(gdtc, mdtc, true); /* throttle according to the chosen dtc */ dirty_ratelimit = READ_ONCE(wb->dirty_ratelimit); task_ratelimit = ((u64)dirty_ratelimit * sdtc->pos_ratio) >> RATELIMIT_CALC_SHIFT; max_pause = wb_max_pause(wb, sdtc->wb_dirty); min_pause = wb_min_pause(wb, max_pause, task_ratelimit, dirty_ratelimit, &nr_dirtied_pause); if (unlikely(task_ratelimit == 0)) { period = max_pause; pause = max_pause; goto pause; } period = HZ * pages_dirtied / task_ratelimit; pause = period; if (current->dirty_paused_when) pause -= now - current->dirty_paused_when; /* * For less than 1s think time (ext3/4 may block the dirtier * for up to 800ms from time to time on 1-HDD; so does xfs, * however at much less frequency), try to compensate it in * future periods by updating the virtual time; otherwise just * do a reset, as it may be a light dirtier. */ if (pause < min_pause) { trace_balance_dirty_pages(wb, sdtc, dirty_ratelimit, task_ratelimit, pages_dirtied, period, min(pause, 0L), start_time); if (pause < -HZ) { current->dirty_paused_when = now; current->nr_dirtied = 0; } else if (period) { current->dirty_paused_when += period; current->nr_dirtied = 0; } else if (current->nr_dirtied_pause <= pages_dirtied) current->nr_dirtied_pause += pages_dirtied; break; } if (unlikely(pause > max_pause)) { /* for occasional dropped task_ratelimit */ now += min(pause - max_pause, max_pause); pause = max_pause; } pause: trace_balance_dirty_pages(wb, sdtc, dirty_ratelimit, task_ratelimit, pages_dirtied, period, pause, start_time); if (flags & BDP_ASYNC) { ret = -EAGAIN; break; } __set_current_state(TASK_KILLABLE); bdi->last_bdp_sleep = jiffies; io_schedule_timeout(pause); current->dirty_paused_when = now + pause; current->nr_dirtied = 0; current->nr_dirtied_pause = nr_dirtied_pause; /* * This is typically equal to (dirty < thresh) and can also * keep "1000+ dd on a slow USB stick" under control. */ if (task_ratelimit) break; /* * In the case of an unresponsive NFS server and the NFS dirty * pages exceeds dirty_thresh, give the other good wb's a pipe * to go through, so that tasks on them still remain responsive. * * In theory 1 page is enough to keep the consumer-producer * pipe going: the flusher cleans 1 page => the task dirties 1 * more page. However wb_dirty has accounting errors. So use * the larger and more IO friendly wb_stat_error. */ if (sdtc->wb_dirty <= wb_stat_error()) break; if (fatal_signal_pending(current)) break; } return ret; } static DEFINE_PER_CPU(int, bdp_ratelimits); /* * Normal tasks are throttled by * loop { * dirty tsk->nr_dirtied_pause pages; * take a snap in balance_dirty_pages(); * } * However there is a worst case. If every task exit immediately when dirtied * (tsk->nr_dirtied_pause - 1) pages, balance_dirty_pages() will never be * called to throttle the page dirties. The solution is to save the not yet * throttled page dirties in dirty_throttle_leaks on task exit and charge them * randomly into the running tasks. This works well for the above worst case, * as the new task will pick up and accumulate the old task's leaked dirty * count and eventually get throttled. 
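 *
 * Concretely (illustrative numbers): if tsk->nr_dirtied_pause is 32 and a
 * stream of short-lived tasks each dirty 31 pages and exit, none of them
 * would ever reach balance_dirty_pages() on its own; their 31-page
 * remainders accumulate in dirty_throttle_leaks and get charged to later
 * tasks, which are then throttled as usual.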
*/ DEFINE_PER_CPU(int, dirty_throttle_leaks) = 0; /** * balance_dirty_pages_ratelimited_flags - Balance dirty memory state. * @mapping: address_space which was dirtied. * @flags: BDP flags. * * Processes which are dirtying memory should call in here once for each page * which was newly dirtied. The function will periodically check the system's * dirty state and will initiate writeback if needed. * * See balance_dirty_pages_ratelimited() for details. * * Return: If @flags contains BDP_ASYNC, it may return -EAGAIN to * indicate that memory is out of balance and the caller must wait * for I/O to complete. Otherwise, it will return 0 to indicate * that either memory was already in balance, or it was able to sleep * until the amount of dirty memory returned to balance. */ int balance_dirty_pages_ratelimited_flags(struct address_space *mapping, unsigned int flags) { struct inode *inode = mapping->host; struct backing_dev_info *bdi = inode_to_bdi(inode); struct bdi_writeback *wb = NULL; int ratelimit; int ret = 0; int *p; if (!(bdi->capabilities & BDI_CAP_WRITEBACK)) return ret; if (inode_cgwb_enabled(inode)) wb = wb_get_create_current(bdi, GFP_KERNEL); if (!wb) wb = &bdi->wb; ratelimit = current->nr_dirtied_pause; if (wb->dirty_exceeded) ratelimit = min(ratelimit, 32 >> (PAGE_SHIFT - 10)); preempt_disable(); /* * This prevents one CPU to accumulate too many dirtied pages without * calling into balance_dirty_pages(), which can happen when there are * 1000+ tasks, all of them start dirtying pages at exactly the same * time, hence all honoured too large initial task->nr_dirtied_pause. */ p = this_cpu_ptr(&bdp_ratelimits); if (unlikely(current->nr_dirtied >= ratelimit)) *p = 0; else if (unlikely(*p >= ratelimit_pages)) { *p = 0; ratelimit = 0; } /* * Pick up the dirtied pages by the exited tasks. This avoids lots of * short-lived tasks (eg. gcc invocations in a kernel build) escaping * the dirty throttling and livelock other long-run dirtiers. */ p = this_cpu_ptr(&dirty_throttle_leaks); if (*p > 0 && current->nr_dirtied < ratelimit) { unsigned long nr_pages_dirtied; nr_pages_dirtied = min(*p, ratelimit - current->nr_dirtied); *p -= nr_pages_dirtied; current->nr_dirtied += nr_pages_dirtied; } preempt_enable(); if (unlikely(current->nr_dirtied >= ratelimit)) ret = balance_dirty_pages(wb, current->nr_dirtied, flags); wb_put(wb); return ret; } EXPORT_SYMBOL_GPL(balance_dirty_pages_ratelimited_flags); /** * balance_dirty_pages_ratelimited - balance dirty memory state. * @mapping: address_space which was dirtied. * * Processes which are dirtying memory should call in here once for each page * which was newly dirtied. The function will periodically check the system's * dirty state and will initiate writeback if needed. * * Once we're over the dirty memory limit we decrease the ratelimiting * by a lot, to prevent individual processes from overshooting the limit * by (ratelimit_pages) each. */ void balance_dirty_pages_ratelimited(struct address_space *mapping) { balance_dirty_pages_ratelimited_flags(mapping, 0); } EXPORT_SYMBOL(balance_dirty_pages_ratelimited); /* * Similar to wb_dirty_limits, wb_bg_dirty_limits also calculates dirty * and thresh, but it's for background writeback. 
*/ static void wb_bg_dirty_limits(struct dirty_throttle_control *dtc) { struct bdi_writeback *wb = dtc->wb; dtc->wb_bg_thresh = __wb_calc_thresh(dtc, dtc->bg_thresh); if (dtc->wb_bg_thresh < 2 * wb_stat_error()) dtc->wb_dirty = wb_stat_sum(wb, WB_RECLAIMABLE); else dtc->wb_dirty = wb_stat(wb, WB_RECLAIMABLE); } static bool domain_over_bg_thresh(struct dirty_throttle_control *dtc) { domain_dirty_avail(dtc, false); domain_dirty_limits(dtc); if (dtc->dirty > dtc->bg_thresh) return true; wb_bg_dirty_limits(dtc); if (dtc->wb_dirty > dtc->wb_bg_thresh) return true; return false; } /** * wb_over_bg_thresh - does @wb need to be written back? * @wb: bdi_writeback of interest * * Determines whether background writeback should keep writing @wb or it's * clean enough. * * Return: %true if writeback should continue. */ bool wb_over_bg_thresh(struct bdi_writeback *wb) { struct dirty_throttle_control gdtc = { GDTC_INIT(wb) }; struct dirty_throttle_control mdtc = { MDTC_INIT(wb, &gdtc) }; if (domain_over_bg_thresh(&gdtc)) return true; if (mdtc_valid(&mdtc)) return domain_over_bg_thresh(&mdtc); return false; } #ifdef CONFIG_SYSCTL /* * sysctl handler for /proc/sys/vm/dirty_writeback_centisecs */ static int dirty_writeback_centisecs_handler(const struct ctl_table *table, int write, void *buffer, size_t *length, loff_t *ppos) { unsigned int old_interval = dirty_writeback_interval; int ret; ret = proc_dointvec(table, write, buffer, length, ppos); /* * Writing 0 to dirty_writeback_interval will disable periodic writeback * and a different non-zero value will wakeup the writeback threads. * wb_wakeup_delayed() would be more appropriate, but it's a pain to * iterate over all bdis and wbs. * The reason we do this is to make the change take effect immediately. */ if (!ret && write && dirty_writeback_interval && dirty_writeback_interval != old_interval) wakeup_flusher_threads(WB_REASON_PERIODIC); return ret; } #endif void laptop_mode_timer_fn(struct timer_list *t) { struct backing_dev_info *backing_dev_info = timer_container_of(backing_dev_info, t, laptop_mode_wb_timer); wakeup_flusher_threads_bdi(backing_dev_info, WB_REASON_LAPTOP_TIMER); } /* * We've spun up the disk and we're in laptop mode: schedule writeback * of all dirty data a few seconds from now. If the flush is already scheduled * then push it back - the user is still using the disk. */ void laptop_io_completion(struct backing_dev_info *info) { mod_timer(&info->laptop_mode_wb_timer, jiffies + laptop_mode); } /* * We're in laptop mode and we've just synced. The sync's writes will have * caused another writeback to be scheduled by laptop_io_completion. * Nothing needs to be written back anymore, so we unschedule the writeback. */ void laptop_sync_completion(void) { struct backing_dev_info *bdi; rcu_read_lock(); list_for_each_entry_rcu(bdi, &bdi_list, bdi_list) timer_delete(&bdi->laptop_mode_wb_timer); rcu_read_unlock(); } /* * If ratelimit_pages is too high then we can get into dirty-data overload * if a large number of processes all perform writes at the same time. * * Here we set ratelimit_pages to a level which ensures that when all CPUs are * dirtying in parallel, we cannot go more than 3% (1/32) over the dirty memory * thresholds. 
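 *
 * Example with assumed numbers: for dirty_thresh = 262144 pages (1GB of 4K
 * pages) and 8 online CPUs, ratelimit_pages = 262144 / (8 * 32) = 1024, so
 * even if every CPU dirties a full ratelimit_pages batch before calling
 * balance_dirty_pages(), the combined overshoot is at most 8 * 1024 = 8192
 * pages, i.e. 1/32 (~3%) of dirty_thresh.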
*/ void writeback_set_ratelimit(void) { struct wb_domain *dom = &global_wb_domain; unsigned long background_thresh; unsigned long dirty_thresh; global_dirty_limits(&background_thresh, &dirty_thresh); dom->dirty_limit = dirty_thresh; ratelimit_pages = dirty_thresh / (num_online_cpus() * 32); if (ratelimit_pages < 16) ratelimit_pages = 16; } static int page_writeback_cpu_online(unsigned int cpu) { writeback_set_ratelimit(); return 0; } #ifdef CONFIG_SYSCTL /* this is needed for the proc_doulongvec_minmax of vm_dirty_bytes */ static const unsigned long dirty_bytes_min = 2 * PAGE_SIZE; static const struct ctl_table vm_page_writeback_sysctls[] = { { .procname = "dirty_background_ratio", .data = &dirty_background_ratio, .maxlen = sizeof(dirty_background_ratio), .mode = 0644, .proc_handler = dirty_background_ratio_handler, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE_HUNDRED, }, { .procname = "dirty_background_bytes", .data = &dirty_background_bytes, .maxlen = sizeof(dirty_background_bytes), .mode = 0644, .proc_handler = dirty_background_bytes_handler, .extra1 = SYSCTL_LONG_ONE, }, { .procname = "dirty_ratio", .data = &vm_dirty_ratio, .maxlen = sizeof(vm_dirty_ratio), .mode = 0644, .proc_handler = dirty_ratio_handler, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE_HUNDRED, }, { .procname = "dirty_bytes", .data = &vm_dirty_bytes, .maxlen = sizeof(vm_dirty_bytes), .mode = 0644, .proc_handler = dirty_bytes_handler, .extra1 = (void *)&dirty_bytes_min, }, { .procname = "dirty_writeback_centisecs", .data = &dirty_writeback_interval, .maxlen = sizeof(dirty_writeback_interval), .mode = 0644, .proc_handler = dirty_writeback_centisecs_handler, }, { .procname = "dirty_expire_centisecs", .data = &dirty_expire_interval, .maxlen = sizeof(dirty_expire_interval), .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, }, #ifdef CONFIG_HIGHMEM { .procname = "highmem_is_dirtyable", .data = &vm_highmem_is_dirtyable, .maxlen = sizeof(vm_highmem_is_dirtyable), .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE, }, #endif { .procname = "laptop_mode", .data = &laptop_mode, .maxlen = sizeof(laptop_mode), .mode = 0644, .proc_handler = proc_dointvec_jiffies, }, }; #endif /* * Called early on to tune the page writeback dirty limits. * * We used to scale dirty pages according to how total memory * related to pages that could be allocated for buffers. * * However, that was when we used "dirty_ratio" to scale with * all memory, and we don't do that any more. "dirty_ratio" * is now applied to total non-HIGHPAGE memory, and as such we can't * get into the old insane situation any more where we had * large amounts of dirty pages compared to a small amount of * non-HIGHMEM memory. * * But we might still want to scale the dirty_ratio by how * much memory the box has.. */ void __init page_writeback_init(void) { BUG_ON(wb_domain_init(&global_wb_domain, GFP_KERNEL)); cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "mm/writeback:online", page_writeback_cpu_online, NULL); cpuhp_setup_state(CPUHP_MM_WRITEBACK_DEAD, "mm/writeback:dead", NULL, page_writeback_cpu_online); #ifdef CONFIG_SYSCTL register_sysctl_init("vm", vm_page_writeback_sysctls); #endif } /** * tag_pages_for_writeback - tag pages to be written by writeback * @mapping: address space structure to write * @start: starting page index * @end: ending page index (inclusive) * * This function scans the page range from @start to @end (inclusive) and tags * all pages that have DIRTY tag set with a special TOWRITE tag. 
The caller * can then use the TOWRITE tag to identify pages eligible for writeback. * This mechanism is used to avoid livelocking of writeback by a process * steadily creating new dirty pages in the file (thus it is important for this * function to be quick so that it can tag pages faster than a dirtying process * can create them). */ void tag_pages_for_writeback(struct address_space *mapping, pgoff_t start, pgoff_t end) { XA_STATE(xas, &mapping->i_pages, start); unsigned int tagged = 0; void *page; xas_lock_irq(&xas); xas_for_each_marked(&xas, page, end, PAGECACHE_TAG_DIRTY) { xas_set_mark(&xas, PAGECACHE_TAG_TOWRITE); if (++tagged % XA_CHECK_SCHED) continue; xas_pause(&xas); xas_unlock_irq(&xas); cond_resched(); xas_lock_irq(&xas); } xas_unlock_irq(&xas); } EXPORT_SYMBOL(tag_pages_for_writeback); static bool folio_prepare_writeback(struct address_space *mapping, struct writeback_control *wbc, struct folio *folio) { /* * Folio truncated or invalidated. We can freely skip it then, * even for data integrity operations: the folio has disappeared * concurrently, so there could be no real expectation of this * data integrity operation even if there is now a new, dirty * folio at the same pagecache index. */ if (unlikely(folio->mapping != mapping)) return false; /* * Did somebody else write it for us? */ if (!folio_test_dirty(folio)) return false; if (folio_test_writeback(folio)) { if (wbc->sync_mode == WB_SYNC_NONE) return false; folio_wait_writeback(folio); } BUG_ON(folio_test_writeback(folio)); if (!folio_clear_dirty_for_io(folio)) return false; return true; } static xa_mark_t wbc_to_tag(struct writeback_control *wbc) { if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages) return PAGECACHE_TAG_TOWRITE; return PAGECACHE_TAG_DIRTY; } static pgoff_t wbc_end(struct writeback_control *wbc) { if (wbc->range_cyclic) return -1; return wbc->range_end >> PAGE_SHIFT; } static struct folio *writeback_get_folio(struct address_space *mapping, struct writeback_control *wbc) { struct folio *folio; retry: folio = folio_batch_next(&wbc->fbatch); if (!folio) { folio_batch_release(&wbc->fbatch); cond_resched(); filemap_get_folios_tag(mapping, &wbc->index, wbc_end(wbc), wbc_to_tag(wbc), &wbc->fbatch); folio = folio_batch_next(&wbc->fbatch); if (!folio) return NULL; } folio_lock(folio); if (unlikely(!folio_prepare_writeback(mapping, wbc, folio))) { folio_unlock(folio); goto retry; } trace_wbc_writepage(wbc, inode_to_bdi(mapping->host)); return folio; } /** * writeback_iter - iterate folio of a mapping for writeback * @mapping: address space structure to write * @wbc: writeback context * @folio: previously iterated folio (%NULL to start) * @error: in-out pointer for writeback errors (see below) * * This function returns the next folio for the writeback operation described by * @wbc on @mapping and should be called in a while loop in the ->writepages * implementation. * * To start the writeback operation, %NULL is passed in the @folio argument, and * for every subsequent iteration the folio returned previously should be passed * back in. * * If there was an error in the per-folio writeback inside the writeback_iter() * loop, @error should be set to the error value. * * Once the writeback described in @wbc has finished, this function will return * %NULL and if there was an error in any iteration restore it to @error. * * Note: callers should not manually break out of the loop using break or goto * but must keep calling writeback_iter() until it returns %NULL. 
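 *
 * A minimal ->writepages() built on this iterator might look like the
 * sketch below, where write_one_folio() stands in for whatever the
 * filesystem does to write (and unlock) a single folio; it is not a real
 * helper:
 *
 *	struct folio *folio = NULL;
 *	int error = 0;
 *
 *	while ((folio = writeback_iter(mapping, wbc, folio, &error)))
 *		error = write_one_folio(folio, wbc);
 *	return error;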
* * Return: the folio to write or %NULL if the loop is done. */ struct folio *writeback_iter(struct address_space *mapping, struct writeback_control *wbc, struct folio *folio, int *error) { if (!folio) { folio_batch_init(&wbc->fbatch); wbc->saved_err = *error = 0; /* * For range cyclic writeback we remember where we stopped so * that we can continue where we stopped. * * For non-cyclic writeback we always start at the beginning of * the passed in range. */ if (wbc->range_cyclic) wbc->index = mapping->writeback_index; else wbc->index = wbc->range_start >> PAGE_SHIFT; /* * To avoid livelocks when other processes dirty new pages, we * first tag pages which should be written back and only then * start writing them. * * For data-integrity writeback we have to be careful so that we * do not miss some pages (e.g., because some other process has * cleared the TOWRITE tag we set). The rule we follow is that * TOWRITE tag can be cleared only by the process clearing the * DIRTY tag (and submitting the page for I/O). */ if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages) tag_pages_for_writeback(mapping, wbc->index, wbc_end(wbc)); } else { wbc->nr_to_write -= folio_nr_pages(folio); WARN_ON_ONCE(*error > 0); /* * For integrity writeback we have to keep going until we have * written all the folios we tagged for writeback above, even if * we run past wbc->nr_to_write or encounter errors. * We stash away the first error we encounter in wbc->saved_err * so that it can be retrieved when we're done. This is because * the file system may still have state to clear for each folio. * * For background writeback we exit as soon as we run past * wbc->nr_to_write or encounter the first error. */ if (wbc->sync_mode == WB_SYNC_ALL) { if (*error && !wbc->saved_err) wbc->saved_err = *error; } else { if (*error || wbc->nr_to_write <= 0) goto done; } } folio = writeback_get_folio(mapping, wbc); if (!folio) { /* * To avoid deadlocks between range_cyclic writeback and callers * that hold folios in writeback to aggregate I/O until * the writeback iteration finishes, we do not loop back to the * start of the file. Doing so causes a folio lock/folio * writeback access order inversion - we should only ever lock * multiple folios in ascending folio->index order, and looping * back to the start of the file violates that rule and causes * deadlocks. */ if (wbc->range_cyclic) mapping->writeback_index = 0; /* * Return the first error we encountered (if there was any) to * the caller. */ *error = wbc->saved_err; } return folio; done: if (wbc->range_cyclic) mapping->writeback_index = folio_next_index(folio); folio_batch_release(&wbc->fbatch); return NULL; } EXPORT_SYMBOL_GPL(writeback_iter); int do_writepages(struct address_space *mapping, struct writeback_control *wbc) { int ret; struct bdi_writeback *wb; if (wbc->nr_to_write <= 0) return 0; wb = inode_to_wb_wbc(mapping->host, wbc); wb_bandwidth_estimate_start(wb); while (1) { if (mapping->a_ops->writepages) ret = mapping->a_ops->writepages(mapping, wbc); else /* deal with chardevs and other special files */ ret = 0; if (ret != -ENOMEM || wbc->sync_mode != WB_SYNC_ALL) break; /* * Lacking an allocation context or the locality or writeback * state of any of the inode's pages, throttle based on * writeback activity on the local node. It's as good a * guess as any. 
*/ reclaim_throttle(NODE_DATA(numa_node_id()), VMSCAN_THROTTLE_WRITEBACK); } /* * Usually few pages are written by now from those we've just submitted * but if there's constant writeback being submitted, this makes sure * writeback bandwidth is updated once in a while. */ if (time_is_before_jiffies(READ_ONCE(wb->bw_time_stamp) + BANDWIDTH_INTERVAL)) wb_update_bandwidth(wb); return ret; } /* * For address_spaces which do not use buffers nor write back. */ bool noop_dirty_folio(struct address_space *mapping, struct folio *folio) { if (!folio_test_dirty(folio)) return !folio_test_set_dirty(folio); return false; } EXPORT_SYMBOL(noop_dirty_folio); /* * Helper function for set_page_dirty family. * * NOTE: This relies on being atomic wrt interrupts. */ static void folio_account_dirtied(struct folio *folio, struct address_space *mapping) { struct inode *inode = mapping->host; trace_writeback_dirty_folio(folio, mapping); if (mapping_can_writeback(mapping)) { struct bdi_writeback *wb; long nr = folio_nr_pages(folio); inode_attach_wb(inode, folio); wb = inode_to_wb(inode); __lruvec_stat_mod_folio(folio, NR_FILE_DIRTY, nr); __zone_stat_mod_folio(folio, NR_ZONE_WRITE_PENDING, nr); __node_stat_mod_folio(folio, NR_DIRTIED, nr); wb_stat_mod(wb, WB_RECLAIMABLE, nr); wb_stat_mod(wb, WB_DIRTIED, nr); task_io_account_write(nr * PAGE_SIZE); current->nr_dirtied += nr; __this_cpu_add(bdp_ratelimits, nr); mem_cgroup_track_foreign_dirty(folio, wb); } } /* * Helper function for deaccounting dirty page without writeback. * */ void folio_account_cleaned(struct folio *folio, struct bdi_writeback *wb) { long nr = folio_nr_pages(folio); lruvec_stat_mod_folio(folio, NR_FILE_DIRTY, -nr); zone_stat_mod_folio(folio, NR_ZONE_WRITE_PENDING, -nr); wb_stat_mod(wb, WB_RECLAIMABLE, -nr); task_io_account_cancelled_write(nr * PAGE_SIZE); } /* * Mark the folio dirty, and set it dirty in the page cache. * * If warn is true, then emit a warning if the folio is not uptodate and has * not been truncated. * * It is the caller's responsibility to prevent the folio from being truncated * while this function is in progress, although it may have been truncated * before this function is called. Most callers have the folio locked. * A few have the folio blocked from truncation through other means (e.g. * zap_vma_pages() has it mapped and is holding the page table lock). * When called from mark_buffer_dirty(), the filesystem should hold a * reference to the buffer_head that is being marked dirty, which causes * try_to_free_buffers() to fail. */ void __folio_mark_dirty(struct folio *folio, struct address_space *mapping, int warn) { unsigned long flags; /* * Shmem writeback relies on swap, and swap writeback is LRU based, * not using the dirty mark. */ VM_WARN_ON_ONCE(folio_test_swapcache(folio) || shmem_mapping(mapping)); xa_lock_irqsave(&mapping->i_pages, flags); if (folio->mapping) { /* Race with truncate? */ WARN_ON_ONCE(warn && !folio_test_uptodate(folio)); folio_account_dirtied(folio, mapping); __xa_set_mark(&mapping->i_pages, folio->index, PAGECACHE_TAG_DIRTY); } xa_unlock_irqrestore(&mapping->i_pages, flags); } /** * filemap_dirty_folio - Mark a folio dirty for filesystems which do not use buffer_heads. * @mapping: Address space this folio belongs to. * @folio: Folio to be marked as dirty. * * Filesystems which do not use buffer heads should call this function * from their dirty_folio address space operation. 
It ignores the * contents of folio_get_private(), so if the filesystem marks individual * blocks as dirty, the filesystem should handle that itself. * * This is also sometimes used by filesystems which use buffer_heads when * a single buffer is being dirtied: we want to set the folio dirty in * that case, but not all the buffers. This is a "bottom-up" dirtying, * whereas block_dirty_folio() is a "top-down" dirtying. * * The caller must ensure this doesn't race with truncation. Most will * simply hold the folio lock, but e.g. zap_pte_range() calls with the * folio mapped and the pte lock held, which also locks out truncation. */ bool filemap_dirty_folio(struct address_space *mapping, struct folio *folio) { if (folio_test_set_dirty(folio)) return false; __folio_mark_dirty(folio, mapping, !folio_test_private(folio)); if (mapping->host) { /* !PageAnon && !swapper_space */ __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); } return true; } EXPORT_SYMBOL(filemap_dirty_folio); /** * folio_redirty_for_writepage - Decline to write a dirty folio. * @wbc: The writeback control. * @folio: The folio. * * When a writepage implementation decides that it doesn't want to write * @folio for some reason, it should call this function, unlock @folio and * return 0. * * Return: True if we redirtied the folio. False if someone else dirtied * it first. */ bool folio_redirty_for_writepage(struct writeback_control *wbc, struct folio *folio) { struct address_space *mapping = folio->mapping; long nr = folio_nr_pages(folio); bool ret; wbc->pages_skipped += nr; ret = filemap_dirty_folio(mapping, folio); if (mapping && mapping_can_writeback(mapping)) { struct inode *inode = mapping->host; struct bdi_writeback *wb; struct wb_lock_cookie cookie = {}; wb = unlocked_inode_to_wb_begin(inode, &cookie); current->nr_dirtied -= nr; node_stat_mod_folio(folio, NR_DIRTIED, -nr); wb_stat_mod(wb, WB_DIRTIED, -nr); unlocked_inode_to_wb_end(inode, &cookie); } return ret; } EXPORT_SYMBOL(folio_redirty_for_writepage); /** * folio_mark_dirty - Mark a folio as being modified. * @folio: The folio. * * The folio may not be truncated while this function is running. * Holding the folio lock is sufficient to prevent truncation, but some * callers cannot acquire a sleeping lock. These callers instead hold * the page table lock for a page table which contains at least one page * in this folio. Truncation will block on the page table lock as it * unmaps pages before removing the folio from its mapping. * * Return: True if the folio was newly dirtied, false if it was already dirty. */ bool folio_mark_dirty(struct folio *folio) { struct address_space *mapping = folio_mapping(folio); if (likely(mapping)) { /* * readahead/folio_deactivate could remain * PG_readahead/PG_reclaim due to race with folio_end_writeback * About readahead, if the folio is written, the flags would be * reset. So no problem. * About folio_deactivate, if the folio is redirtied, * the flag will be reset. So no problem. but if the * folio is used by readahead it will confuse readahead * and make it restart the size rampup process. But it's * a trivial problem. */ if (folio_test_reclaim(folio)) folio_clear_reclaim(folio); return mapping->a_ops->dirty_folio(mapping, folio); } return noop_dirty_folio(mapping, folio); } EXPORT_SYMBOL(folio_mark_dirty); /* * folio_mark_dirty() is racy if the caller has no reference against * folio->mapping->host, and if the folio is unlocked. This is because another * CPU could truncate the folio off the mapping and then free the mapping. 
* * Usually, the folio _is_ locked, or the caller is a user-space process which * holds a reference on the inode by having an open file. * * In other cases, the folio should be locked before running folio_mark_dirty(). */ bool folio_mark_dirty_lock(struct folio *folio) { bool ret; folio_lock(folio); ret = folio_mark_dirty(folio); folio_unlock(folio); return ret; } EXPORT_SYMBOL(folio_mark_dirty_lock); /* * This cancels just the dirty bit on the kernel page itself, it does NOT * actually remove dirty bits on any mmap's that may be around. It also * leaves the page tagged dirty, so any sync activity will still find it on * the dirty lists, and in particular, clear_page_dirty_for_io() will still * look at the dirty bits in the VM. * * Doing this should *normally* only ever be done when a page is truncated, * and is not actually mapped anywhere at all. However, fs/buffer.c does * this when it notices that somebody has cleaned out all the buffers on a * page without actually doing it through the VM. Can you say "ext3 is * horribly ugly"? Thought you could. */ void __folio_cancel_dirty(struct folio *folio) { struct address_space *mapping = folio_mapping(folio); if (mapping_can_writeback(mapping)) { struct inode *inode = mapping->host; struct bdi_writeback *wb; struct wb_lock_cookie cookie = {}; wb = unlocked_inode_to_wb_begin(inode, &cookie); if (folio_test_clear_dirty(folio)) folio_account_cleaned(folio, wb); unlocked_inode_to_wb_end(inode, &cookie); } else { folio_clear_dirty(folio); } } EXPORT_SYMBOL(__folio_cancel_dirty); /* * Clear a folio's dirty flag, while caring for dirty memory accounting. * Returns true if the folio was previously dirty. * * This is for preparing to put the folio under writeout. We leave * the folio tagged as dirty in the xarray so that a concurrent * write-for-sync can discover it via a PAGECACHE_TAG_DIRTY walk. * The ->writepage implementation will run either folio_start_writeback() * or folio_mark_dirty(), at which stage we bring the folio's dirty flag * and xarray dirty tag back into sync. * * This incoherency between the folio's dirty flag and xarray tag is * unfortunate, but it only exists while the folio is locked. */ bool folio_clear_dirty_for_io(struct folio *folio) { struct address_space *mapping = folio_mapping(folio); bool ret = false; VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio); if (mapping && mapping_can_writeback(mapping)) { struct inode *inode = mapping->host; struct bdi_writeback *wb; struct wb_lock_cookie cookie = {}; /* * Yes, Virginia, this is indeed insane. * * We use this sequence to make sure that * (a) we account for dirty stats properly * (b) we tell the low-level filesystem to * mark the whole folio dirty if it was * dirty in a pagetable. Only to then * (c) clean the folio again and return 1 to * cause the writeback. * * This way we avoid all nasty races with the * dirty bit in multiple places and clearing * them concurrently from different threads. * * Note! Normally the "folio_mark_dirty(folio)" * has no effect on the actual dirty bit - since * that will already usually be set. But we * need the side effects, and it can help us * avoid races. * * We basically use the folio "master dirty bit" * as a serialization point for all the different * threads doing their things. */ if (folio_mkclean(folio)) folio_mark_dirty(folio); /* * We carefully synchronise fault handlers against * installing a dirty pte and marking the folio dirty * at this point. 
We do this by having them hold the * page lock while dirtying the folio, and folios are * always locked coming in here, so we get the desired * exclusion. */ wb = unlocked_inode_to_wb_begin(inode, &cookie); if (folio_test_clear_dirty(folio)) { long nr = folio_nr_pages(folio); lruvec_stat_mod_folio(folio, NR_FILE_DIRTY, -nr); zone_stat_mod_folio(folio, NR_ZONE_WRITE_PENDING, -nr); wb_stat_mod(wb, WB_RECLAIMABLE, -nr); ret = true; } unlocked_inode_to_wb_end(inode, &cookie); return ret; } return folio_test_clear_dirty(folio); } EXPORT_SYMBOL(folio_clear_dirty_for_io); static void wb_inode_writeback_start(struct bdi_writeback *wb) { atomic_inc(&wb->writeback_inodes); } static void wb_inode_writeback_end(struct bdi_writeback *wb) { unsigned long flags; atomic_dec(&wb->writeback_inodes); /* * Make sure estimate of writeback throughput gets updated after * writeback completed. We delay the update by BANDWIDTH_INTERVAL * (which is the interval other bandwidth updates use for batching) so * that if multiple inodes end writeback at a similar time, they get * batched into one bandwidth update. */ spin_lock_irqsave(&wb->work_lock, flags); if (test_bit(WB_registered, &wb->state)) queue_delayed_work(bdi_wq, &wb->bw_dwork, BANDWIDTH_INTERVAL); spin_unlock_irqrestore(&wb->work_lock, flags); } bool __folio_end_writeback(struct folio *folio) { long nr = folio_nr_pages(folio); struct address_space *mapping = folio_mapping(folio); bool ret; if (mapping && mapping_use_writeback_tags(mapping)) { struct inode *inode = mapping->host; struct bdi_writeback *wb; unsigned long flags; xa_lock_irqsave(&mapping->i_pages, flags); ret = folio_xor_flags_has_waiters(folio, 1 << PG_writeback); __xa_clear_mark(&mapping->i_pages, folio->index, PAGECACHE_TAG_WRITEBACK); wb = inode_to_wb(inode); wb_stat_mod(wb, WB_WRITEBACK, -nr); __wb_writeout_add(wb, nr); if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK)) { wb_inode_writeback_end(wb); if (mapping->host) sb_clear_inode_writeback(mapping->host); } xa_unlock_irqrestore(&mapping->i_pages, flags); } else { ret = folio_xor_flags_has_waiters(folio, 1 << PG_writeback); } lruvec_stat_mod_folio(folio, NR_WRITEBACK, -nr); zone_stat_mod_folio(folio, NR_ZONE_WRITE_PENDING, -nr); node_stat_mod_folio(folio, NR_WRITTEN, nr); return ret; } void __folio_start_writeback(struct folio *folio, bool keep_write) { long nr = folio_nr_pages(folio); struct address_space *mapping = folio_mapping(folio); int access_ret; VM_BUG_ON_FOLIO(folio_test_writeback(folio), folio); VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio); if (mapping && mapping_use_writeback_tags(mapping)) { XA_STATE(xas, &mapping->i_pages, folio->index); struct inode *inode = mapping->host; struct bdi_writeback *wb; unsigned long flags; bool on_wblist; xas_lock_irqsave(&xas, flags); xas_load(&xas); folio_test_set_writeback(folio); on_wblist = mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK); xas_set_mark(&xas, PAGECACHE_TAG_WRITEBACK); wb = inode_to_wb(inode); wb_stat_mod(wb, WB_WRITEBACK, nr); if (!on_wblist) { wb_inode_writeback_start(wb); /* * We can come through here when swapping anonymous * folios, so we don't necessarily have an inode to * track for sync. 
			 */
			if (mapping->host)
				sb_mark_inode_writeback(mapping->host);
		}
		if (!folio_test_dirty(folio))
			xas_clear_mark(&xas, PAGECACHE_TAG_DIRTY);
		if (!keep_write)
			xas_clear_mark(&xas, PAGECACHE_TAG_TOWRITE);
		xas_unlock_irqrestore(&xas, flags);
	} else {
		folio_test_set_writeback(folio);
	}
	lruvec_stat_mod_folio(folio, NR_WRITEBACK, nr);
	zone_stat_mod_folio(folio, NR_ZONE_WRITE_PENDING, nr);
	access_ret = arch_make_folio_accessible(folio);
	/*
	 * If writeback has been triggered on a page that cannot be made
	 * accessible, it is too late to recover here.
	 */
	VM_BUG_ON_FOLIO(access_ret != 0, folio);
}
EXPORT_SYMBOL(__folio_start_writeback);

/**
 * folio_wait_writeback - Wait for a folio to finish writeback.
 * @folio: The folio to wait for.
 *
 * If the folio is currently being written back to storage, wait for the
 * I/O to complete.
 *
 * Context: Sleeps. Must be called in process context and with
 * no spinlocks held. Caller should hold a reference on the folio.
 * If the folio is not locked, writeback may start again after writeback
 * has finished.
 */
void folio_wait_writeback(struct folio *folio)
{
	while (folio_test_writeback(folio)) {
		trace_folio_wait_writeback(folio, folio_mapping(folio));
		folio_wait_bit(folio, PG_writeback);
	}
}
EXPORT_SYMBOL_GPL(folio_wait_writeback);

/**
 * folio_wait_writeback_killable - Wait for a folio to finish writeback.
 * @folio: The folio to wait for.
 *
 * If the folio is currently being written back to storage, wait for the
 * I/O to complete or a fatal signal to arrive.
 *
 * Context: Sleeps. Must be called in process context and with
 * no spinlocks held. Caller should hold a reference on the folio.
 * If the folio is not locked, writeback may start again after writeback
 * has finished.
 * Return: 0 on success, -EINTR if we get a fatal signal while waiting.
 */
int folio_wait_writeback_killable(struct folio *folio)
{
	while (folio_test_writeback(folio)) {
		trace_folio_wait_writeback(folio, folio_mapping(folio));
		if (folio_wait_bit_killable(folio, PG_writeback))
			return -EINTR;
	}
	return 0;
}
EXPORT_SYMBOL_GPL(folio_wait_writeback_killable);

/**
 * folio_wait_stable() - wait for writeback to finish, if necessary.
 * @folio: The folio to wait on.
 *
 * This function determines if the given folio is related to a backing
 * device that requires folio contents to be held stable during writeback.
 * If so, then it will wait for any pending writeback to complete.
 *
 * Context: Sleeps. Must be called in process context and with
 * no spinlocks held. Caller should hold a reference on the folio.
 * If the folio is not locked, writeback may start again after writeback
 * has finished.
 */
void folio_wait_stable(struct folio *folio)
{
	if (mapping_stable_writes(folio_mapping(folio)))
		folio_wait_writeback(folio);
}
EXPORT_SYMBOL_GPL(folio_wait_stable);
// SPDX-License-Identifier: GPL-2.0 /* * Filesystem-level keyring for fscrypt * * Copyright 2019 Google LLC */ /* * This file implements management of fscrypt master keys in the * filesystem-level keyring, including the ioctls: * * - FS_IOC_ADD_ENCRYPTION_KEY * - FS_IOC_REMOVE_ENCRYPTION_KEY * - FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS * - FS_IOC_GET_ENCRYPTION_KEY_STATUS * * See the "User API" section of Documentation/filesystems/fscrypt.rst for more * information about these ioctls. */ #include <crypto/skcipher.h> #include <linux/export.h> #include <linux/key-type.h> #include <linux/once.h> #include <linux/random.h> #include <linux/seq_file.h> #include <linux/unaligned.h> #include "fscrypt_private.h" /* The master encryption keys for a filesystem (->s_master_keys) */ struct fscrypt_keyring { /* * Lock that protects ->key_hashtable. It does *not* protect the * fscrypt_master_key structs themselves. */ spinlock_t lock; /* Hash table that maps fscrypt_key_specifier to fscrypt_master_key */ struct hlist_head key_hashtable[128]; }; static void wipe_master_key_secret(struct fscrypt_master_key_secret *secret) { memzero_explicit(secret, sizeof(*secret)); } static void move_master_key_secret(struct fscrypt_master_key_secret *dst, struct fscrypt_master_key_secret *src) { memcpy(dst, src, sizeof(*dst)); memzero_explicit(src, sizeof(*src)); } static void fscrypt_free_master_key(struct rcu_head *head) { struct fscrypt_master_key *mk = container_of(head, struct fscrypt_master_key, mk_rcu_head); /* * The master key secret and any embedded subkeys should have already * been wiped when the last active reference to the fscrypt_master_key * struct was dropped; doing it here would be unnecessarily late. * Nevertheless, use kfree_sensitive() in case anything was missed.
*/ kfree_sensitive(mk); } void fscrypt_put_master_key(struct fscrypt_master_key *mk) { if (!refcount_dec_and_test(&mk->mk_struct_refs)) return; /* * No structural references left, so free ->mk_users, and also free the * fscrypt_master_key struct itself after an RCU grace period ensures * that concurrent keyring lookups can no longer find it. */ WARN_ON_ONCE(refcount_read(&mk->mk_active_refs) != 0); if (mk->mk_users) { /* Clear the keyring so the quota gets released right away. */ keyring_clear(mk->mk_users); key_put(mk->mk_users); mk->mk_users = NULL; } call_rcu(&mk->mk_rcu_head, fscrypt_free_master_key); } void fscrypt_put_master_key_activeref(struct super_block *sb, struct fscrypt_master_key *mk) { size_t i; if (!refcount_dec_and_test(&mk->mk_active_refs)) return; /* * No active references left, so complete the full removal of this * fscrypt_master_key struct by removing it from the keyring and * destroying any subkeys embedded in it. */ if (WARN_ON_ONCE(!sb->s_master_keys)) return; spin_lock(&sb->s_master_keys->lock); hlist_del_rcu(&mk->mk_node); spin_unlock(&sb->s_master_keys->lock); /* * ->mk_active_refs == 0 implies that ->mk_present is false and * ->mk_decrypted_inodes is empty. */ WARN_ON_ONCE(mk->mk_present); WARN_ON_ONCE(!list_empty(&mk->mk_decrypted_inodes)); for (i = 0; i <= FSCRYPT_MODE_MAX; i++) { fscrypt_destroy_prepared_key( sb, &mk->mk_direct_keys[i]); fscrypt_destroy_prepared_key( sb, &mk->mk_iv_ino_lblk_64_keys[i]); fscrypt_destroy_prepared_key( sb, &mk->mk_iv_ino_lblk_32_keys[i]); } memzero_explicit(&mk->mk_ino_hash_key, sizeof(mk->mk_ino_hash_key)); mk->mk_ino_hash_key_initialized = false; /* Drop the structural ref associated with the active refs. */ fscrypt_put_master_key(mk); } /* * This transitions the key state from present to incompletely removed, and then * potentially to absent (depending on whether inodes remain). */ static void fscrypt_initiate_key_removal(struct super_block *sb, struct fscrypt_master_key *mk) { WRITE_ONCE(mk->mk_present, false); wipe_master_key_secret(&mk->mk_secret); fscrypt_put_master_key_activeref(sb, mk); } static inline bool valid_key_spec(const struct fscrypt_key_specifier *spec) { if (spec->__reserved) return false; return master_key_spec_len(spec) != 0; } static int fscrypt_user_key_instantiate(struct key *key, struct key_preparsed_payload *prep) { /* * We just charge FSCRYPT_MAX_RAW_KEY_SIZE bytes to the user's key quota * for each key, regardless of the exact key size. The amount of memory * actually used is greater than the size of the raw key anyway. */ return key_payload_reserve(key, FSCRYPT_MAX_RAW_KEY_SIZE); } static void fscrypt_user_key_describe(const struct key *key, struct seq_file *m) { seq_puts(m, key->description); } /* * Type of key in ->mk_users. Each key of this type represents a particular * user who has added a particular master key. * * Note that the name of this key type really should be something like * ".fscrypt-user" instead of simply ".fscrypt". But the shorter name is chosen * mainly for simplicity of presentation in /proc/keys when read by a non-root * user. And it is expected to be rare that a key is actually added by multiple * users, since users should keep their encryption keys confidential. 
*/ static struct key_type key_type_fscrypt_user = { .name = ".fscrypt", .instantiate = fscrypt_user_key_instantiate, .describe = fscrypt_user_key_describe, }; #define FSCRYPT_MK_USERS_DESCRIPTION_SIZE \ (CONST_STRLEN("fscrypt-") + 2 * FSCRYPT_KEY_IDENTIFIER_SIZE + \ CONST_STRLEN("-users") + 1) #define FSCRYPT_MK_USER_DESCRIPTION_SIZE \ (2 * FSCRYPT_KEY_IDENTIFIER_SIZE + CONST_STRLEN(".uid.") + 10 + 1) static void format_mk_users_keyring_description( char description[FSCRYPT_MK_USERS_DESCRIPTION_SIZE], const u8 mk_identifier[FSCRYPT_KEY_IDENTIFIER_SIZE]) { sprintf(description, "fscrypt-%*phN-users", FSCRYPT_KEY_IDENTIFIER_SIZE, mk_identifier); } static void format_mk_user_description( char description[FSCRYPT_MK_USER_DESCRIPTION_SIZE], const u8 mk_identifier[FSCRYPT_KEY_IDENTIFIER_SIZE]) { sprintf(description, "%*phN.uid.%u", FSCRYPT_KEY_IDENTIFIER_SIZE, mk_identifier, __kuid_val(current_fsuid())); } /* Create ->s_master_keys if needed. Synchronized by fscrypt_add_key_mutex. */ static int allocate_filesystem_keyring(struct super_block *sb) { struct fscrypt_keyring *keyring; if (sb->s_master_keys) return 0; keyring = kzalloc(sizeof(*keyring), GFP_KERNEL); if (!keyring) return -ENOMEM; spin_lock_init(&keyring->lock); /* * Pairs with the smp_load_acquire() in fscrypt_find_master_key(). * I.e., here we publish ->s_master_keys with a RELEASE barrier so that * concurrent tasks can ACQUIRE it. */ smp_store_release(&sb->s_master_keys, keyring); return 0; } /* * Release all encryption keys that have been added to the filesystem, along * with the keyring that contains them. * * This is called at unmount time, after all potentially-encrypted inodes have * been evicted. The filesystem's underlying block device(s) are still * available at this time; this is important because after user file accesses * have been allowed, this function may need to evict keys from the keyslots of * an inline crypto engine, which requires the block device(s). */ void fscrypt_destroy_keyring(struct super_block *sb) { struct fscrypt_keyring *keyring = sb->s_master_keys; size_t i; if (!keyring) return; for (i = 0; i < ARRAY_SIZE(keyring->key_hashtable); i++) { struct hlist_head *bucket = &keyring->key_hashtable[i]; struct fscrypt_master_key *mk; struct hlist_node *tmp; hlist_for_each_entry_safe(mk, tmp, bucket, mk_node) { /* * Since all potentially-encrypted inodes were already * evicted, every key remaining in the keyring should * have an empty inode list, and should only still be in * the keyring due to the single active ref associated * with ->mk_present. There should be no structural * refs beyond the one associated with the active ref. */ WARN_ON_ONCE(refcount_read(&mk->mk_active_refs) != 1); WARN_ON_ONCE(refcount_read(&mk->mk_struct_refs) != 1); WARN_ON_ONCE(!mk->mk_present); fscrypt_initiate_key_removal(sb, mk); } } kfree_sensitive(keyring); sb->s_master_keys = NULL; } static struct hlist_head * fscrypt_mk_hash_bucket(struct fscrypt_keyring *keyring, const struct fscrypt_key_specifier *mk_spec) { /* * Since key specifiers should be "random" values, it is sufficient to * use a trivial hash function that just takes the first several bits of * the key specifier. */ unsigned long i = get_unaligned((unsigned long *)&mk_spec->u); return &keyring->key_hashtable[i % ARRAY_SIZE(keyring->key_hashtable)]; } /* * Find the specified master key struct in ->s_master_keys and take a structural * ref to it. 
The structural ref guarantees that the key struct continues to * exist, but it does *not* guarantee that ->s_master_keys continues to contain * the key struct. The structural ref needs to be dropped by * fscrypt_put_master_key(). Returns NULL if the key struct is not found. */ struct fscrypt_master_key * fscrypt_find_master_key(struct super_block *sb, const struct fscrypt_key_specifier *mk_spec) { struct fscrypt_keyring *keyring; struct hlist_head *bucket; struct fscrypt_master_key *mk; /* * Pairs with the smp_store_release() in allocate_filesystem_keyring(). * I.e., another task can publish ->s_master_keys concurrently, * executing a RELEASE barrier. We need to use smp_load_acquire() here * to safely ACQUIRE the memory the other task published. */ keyring = smp_load_acquire(&sb->s_master_keys); if (keyring == NULL) return NULL; /* No keyring yet, so no keys yet. */ bucket = fscrypt_mk_hash_bucket(keyring, mk_spec); rcu_read_lock(); switch (mk_spec->type) { case FSCRYPT_KEY_SPEC_TYPE_DESCRIPTOR: hlist_for_each_entry_rcu(mk, bucket, mk_node) { if (mk->mk_spec.type == FSCRYPT_KEY_SPEC_TYPE_DESCRIPTOR && memcmp(mk->mk_spec.u.descriptor, mk_spec->u.descriptor, FSCRYPT_KEY_DESCRIPTOR_SIZE) == 0 && refcount_inc_not_zero(&mk->mk_struct_refs)) goto out; } break; case FSCRYPT_KEY_SPEC_TYPE_IDENTIFIER: hlist_for_each_entry_rcu(mk, bucket, mk_node) { if (mk->mk_spec.type == FSCRYPT_KEY_SPEC_TYPE_IDENTIFIER && memcmp(mk->mk_spec.u.identifier, mk_spec->u.identifier, FSCRYPT_KEY_IDENTIFIER_SIZE) == 0 && refcount_inc_not_zero(&mk->mk_struct_refs)) goto out; } break; } mk = NULL; out: rcu_read_unlock(); return mk; } static int allocate_master_key_users_keyring(struct fscrypt_master_key *mk) { char description[FSCRYPT_MK_USERS_DESCRIPTION_SIZE]; struct key *keyring; format_mk_users_keyring_description(description, mk->mk_spec.u.identifier); keyring = keyring_alloc(description, GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, current_cred(), KEY_POS_SEARCH | KEY_USR_SEARCH | KEY_USR_READ | KEY_USR_VIEW, KEY_ALLOC_NOT_IN_QUOTA, NULL, NULL); if (IS_ERR(keyring)) return PTR_ERR(keyring); mk->mk_users = keyring; return 0; } /* * Find the current user's "key" in the master key's ->mk_users. * Returns ERR_PTR(-ENOKEY) if not found. */ static struct key *find_master_key_user(struct fscrypt_master_key *mk) { char description[FSCRYPT_MK_USER_DESCRIPTION_SIZE]; key_ref_t keyref; format_mk_user_description(description, mk->mk_spec.u.identifier); /* * We need to mark the keyring reference as "possessed" so that we * acquire permission to search it, via the KEY_POS_SEARCH permission. */ keyref = keyring_search(make_key_ref(mk->mk_users, true /*possessed*/), &key_type_fscrypt_user, description, false); if (IS_ERR(keyref)) { if (PTR_ERR(keyref) == -EAGAIN || /* not found */ PTR_ERR(keyref) == -EKEYREVOKED) /* recently invalidated */ keyref = ERR_PTR(-ENOKEY); return ERR_CAST(keyref); } return key_ref_to_ptr(keyref); } /* * Give the current user a "key" in ->mk_users. This charges the user's quota * and marks the master key as added by the current user, so that it cannot be * removed by another user with the key. Either ->mk_sem must be held for * write, or the master key must be still undergoing initialization. 
*/ static int add_master_key_user(struct fscrypt_master_key *mk) { char description[FSCRYPT_MK_USER_DESCRIPTION_SIZE]; struct key *mk_user; int err; format_mk_user_description(description, mk->mk_spec.u.identifier); mk_user = key_alloc(&key_type_fscrypt_user, description, current_fsuid(), current_gid(), current_cred(), KEY_POS_SEARCH | KEY_USR_VIEW, 0, NULL); if (IS_ERR(mk_user)) return PTR_ERR(mk_user); err = key_instantiate_and_link(mk_user, NULL, 0, mk->mk_users, NULL); key_put(mk_user); return err; } /* * Remove the current user's "key" from ->mk_users. * ->mk_sem must be held for write. * * Returns 0 if removed, -ENOKEY if not found, or another -errno code. */ static int remove_master_key_user(struct fscrypt_master_key *mk) { struct key *mk_user; int err; mk_user = find_master_key_user(mk); if (IS_ERR(mk_user)) return PTR_ERR(mk_user); err = key_unlink(mk->mk_users, mk_user); key_put(mk_user); return err; } /* * Allocate a new fscrypt_master_key, transfer the given secret over to it, and * insert it into sb->s_master_keys. */ static int add_new_master_key(struct super_block *sb, struct fscrypt_master_key_secret *secret, const struct fscrypt_key_specifier *mk_spec) { struct fscrypt_keyring *keyring = sb->s_master_keys; struct fscrypt_master_key *mk; int err; mk = kzalloc(sizeof(*mk), GFP_KERNEL); if (!mk) return -ENOMEM; init_rwsem(&mk->mk_sem); refcount_set(&mk->mk_struct_refs, 1); mk->mk_spec = *mk_spec; INIT_LIST_HEAD(&mk->mk_decrypted_inodes); spin_lock_init(&mk->mk_decrypted_inodes_lock); if (mk_spec->type == FSCRYPT_KEY_SPEC_TYPE_IDENTIFIER) { err = allocate_master_key_users_keyring(mk); if (err) goto out_put; err = add_master_key_user(mk); if (err) goto out_put; } move_master_key_secret(&mk->mk_secret, secret); mk->mk_present = true; refcount_set(&mk->mk_active_refs, 1); /* ->mk_present is true */ spin_lock(&keyring->lock); hlist_add_head_rcu(&mk->mk_node, fscrypt_mk_hash_bucket(keyring, mk_spec)); spin_unlock(&keyring->lock); return 0; out_put: fscrypt_put_master_key(mk); return err; } #define KEY_DEAD 1 static int add_existing_master_key(struct fscrypt_master_key *mk, struct fscrypt_master_key_secret *secret) { int err; /* * If the current user is already in ->mk_users, then there's nothing to * do. Otherwise, we need to add the user to ->mk_users. (Neither is * applicable for v1 policy keys, which have NULL ->mk_users.) */ if (mk->mk_users) { struct key *mk_user = find_master_key_user(mk); if (mk_user != ERR_PTR(-ENOKEY)) { if (IS_ERR(mk_user)) return PTR_ERR(mk_user); key_put(mk_user); return 0; } err = add_master_key_user(mk); if (err) return err; } /* If the key is incompletely removed, make it present again. */ if (!mk->mk_present) { if (!refcount_inc_not_zero(&mk->mk_active_refs)) { /* * Raced with the last active ref being dropped, so the * key has become, or is about to become, "absent". * Therefore, we need to allocate a new key struct. */ return KEY_DEAD; } move_master_key_secret(&mk->mk_secret, secret); WRITE_ONCE(mk->mk_present, true); } return 0; } static int do_add_master_key(struct super_block *sb, struct fscrypt_master_key_secret *secret, const struct fscrypt_key_specifier *mk_spec) { static DEFINE_MUTEX(fscrypt_add_key_mutex); struct fscrypt_master_key *mk; int err; mutex_lock(&fscrypt_add_key_mutex); /* serialize find + link */ mk = fscrypt_find_master_key(sb, mk_spec); if (!mk) { /* Didn't find the key in ->s_master_keys. Add it. 
*/ err = allocate_filesystem_keyring(sb); if (!err) err = add_new_master_key(sb, secret, mk_spec); } else { /* * Found the key in ->s_master_keys. Add the user to ->mk_users * if needed, and make the key "present" again if possible. */ down_write(&mk->mk_sem); err = add_existing_master_key(mk, secret); up_write(&mk->mk_sem); if (err == KEY_DEAD) { /* * We found a key struct, but it's already been fully * removed. Ignore the old struct and add a new one. * fscrypt_add_key_mutex means we don't need to worry * about concurrent adds. */ err = add_new_master_key(sb, secret, mk_spec); } fscrypt_put_master_key(mk); } mutex_unlock(&fscrypt_add_key_mutex); return err; } static int add_master_key(struct super_block *sb, struct fscrypt_master_key_secret *secret, struct fscrypt_key_specifier *key_spec) { int err; if (key_spec->type == FSCRYPT_KEY_SPEC_TYPE_IDENTIFIER) { u8 sw_secret[BLK_CRYPTO_SW_SECRET_SIZE]; u8 *kdf_key = secret->bytes; unsigned int kdf_key_size = secret->size; u8 keyid_kdf_ctx = HKDF_CONTEXT_KEY_IDENTIFIER_FOR_RAW_KEY; /* * For raw keys, the fscrypt master key is used directly as the * fscrypt KDF key. For hardware-wrapped keys, we have to pass * the master key to the hardware to derive the KDF key, which * is then only used to derive non-file-contents subkeys. */ if (secret->is_hw_wrapped) { err = fscrypt_derive_sw_secret(sb, secret->bytes, secret->size, sw_secret); if (err) return err; kdf_key = sw_secret; kdf_key_size = sizeof(sw_secret); /* * To avoid weird behavior if someone manages to * determine sw_secret and add it as a raw key, ensure * that hardware-wrapped keys and raw keys will have * different key identifiers by deriving their key * identifiers using different KDF contexts. */ keyid_kdf_ctx = HKDF_CONTEXT_KEY_IDENTIFIER_FOR_HW_WRAPPED_KEY; } fscrypt_init_hkdf(&secret->hkdf, kdf_key, kdf_key_size); /* * Now that the KDF context is initialized, the raw KDF key is * no longer needed. */ memzero_explicit(kdf_key, kdf_key_size); /* Calculate the key identifier */ fscrypt_hkdf_expand(&secret->hkdf, keyid_kdf_ctx, NULL, 0, key_spec->u.identifier, FSCRYPT_KEY_IDENTIFIER_SIZE); } return do_add_master_key(sb, secret, key_spec); } /* * Validate the size of an fscrypt master key being added. Note that this is * just an initial check, as we don't know which ciphers will be used yet. * There is a stricter size check later when the key is actually used by a file. */ static inline bool fscrypt_valid_key_size(size_t size, u32 add_key_flags) { u32 max_size = (add_key_flags & FSCRYPT_ADD_KEY_FLAG_HW_WRAPPED) ? 
FSCRYPT_MAX_HW_WRAPPED_KEY_SIZE : FSCRYPT_MAX_RAW_KEY_SIZE; return size >= FSCRYPT_MIN_KEY_SIZE && size <= max_size; } static int fscrypt_provisioning_key_preparse(struct key_preparsed_payload *prep) { const struct fscrypt_provisioning_key_payload *payload = prep->data; if (prep->datalen < sizeof(*payload)) return -EINVAL; if (!fscrypt_valid_key_size(prep->datalen - sizeof(*payload), payload->flags)) return -EINVAL; if (payload->type != FSCRYPT_KEY_SPEC_TYPE_DESCRIPTOR && payload->type != FSCRYPT_KEY_SPEC_TYPE_IDENTIFIER) return -EINVAL; if (payload->flags & ~FSCRYPT_ADD_KEY_FLAG_HW_WRAPPED) return -EINVAL; prep->payload.data[0] = kmemdup(payload, prep->datalen, GFP_KERNEL); if (!prep->payload.data[0]) return -ENOMEM; prep->quotalen = prep->datalen; return 0; } static void fscrypt_provisioning_key_free_preparse( struct key_preparsed_payload *prep) { kfree_sensitive(prep->payload.data[0]); } static void fscrypt_provisioning_key_describe(const struct key *key, struct seq_file *m) { seq_puts(m, key->description); if (key_is_positive(key)) { const struct fscrypt_provisioning_key_payload *payload = key->payload.data[0]; seq_printf(m, ": %u [%u]", key->datalen, payload->type); } } static void fscrypt_provisioning_key_destroy(struct key *key) { kfree_sensitive(key->payload.data[0]); } static struct key_type key_type_fscrypt_provisioning = { .name = "fscrypt-provisioning", .preparse = fscrypt_provisioning_key_preparse, .free_preparse = fscrypt_provisioning_key_free_preparse, .instantiate = generic_key_instantiate, .describe = fscrypt_provisioning_key_describe, .destroy = fscrypt_provisioning_key_destroy, }; /* * Retrieve the key from the Linux keyring key specified by 'key_id', and store * it into 'secret'. * * The key must be of type "fscrypt-provisioning" and must have the 'type' and * 'flags' field of the payload set to the given values, indicating that the key * is intended for use for the specified purpose. We don't use the "logon" key * type because there's no way to completely restrict the use of such keys; they * can be used by any kernel API that accepts "logon" keys and doesn't require a * specific service prefix. * * The ability to specify the key via Linux keyring key is intended for cases * where userspace needs to re-add keys after the filesystem is unmounted and * re-mounted. Most users should just provide the key directly instead. */ static int get_keyring_key(u32 key_id, u32 type, u32 flags, struct fscrypt_master_key_secret *secret) { key_ref_t ref; struct key *key; const struct fscrypt_provisioning_key_payload *payload; int err; ref = lookup_user_key(key_id, 0, KEY_NEED_SEARCH); if (IS_ERR(ref)) return PTR_ERR(ref); key = key_ref_to_ptr(ref); if (key->type != &key_type_fscrypt_provisioning) goto bad_key; payload = key->payload.data[0]; /* * Don't allow fscrypt v1 keys to be used as v2 keys and vice versa. * Similarly, don't allow hardware-wrapped keys to be used as * non-hardware-wrapped keys and vice versa. */ if (payload->type != type || payload->flags != flags) goto bad_key; secret->size = key->datalen - sizeof(*payload); memcpy(secret->bytes, payload->raw, secret->size); err = 0; goto out_put; bad_key: err = -EKEYREJECTED; out_put: key_ref_put(ref); return err; } /* * Add a master encryption key to the filesystem, causing all files which were * encrypted with it to appear "unlocked" (decrypted) when accessed. * * When adding a key for use by v1 encryption policies, this ioctl is * privileged, and userspace must provide the 'key_descriptor'. 
* * When adding a key for use by v2+ encryption policies, this ioctl is * unprivileged. This is needed, in general, to allow non-root users to use * encryption without encountering the visibility problems of process-subscribed * keyrings and the inability to properly remove keys. This works by having * each key identified by its cryptographically secure hash --- the * 'key_identifier'. The cryptographic hash ensures that a malicious user * cannot add the wrong key for a given identifier. Furthermore, each added key * is charged to the appropriate user's quota for the keyrings service, which * prevents a malicious user from adding too many keys. Finally, we forbid a * user from removing a key while other users have added it too, which prevents * a user who knows another user's key from causing a denial-of-service by * removing it at an inopportune time. (We tolerate that a user who knows a key * can prevent other users from removing it.) * * For more details, see the "FS_IOC_ADD_ENCRYPTION_KEY" section of * Documentation/filesystems/fscrypt.rst. */ int fscrypt_ioctl_add_key(struct file *filp, void __user *_uarg) { struct super_block *sb = file_inode(filp)->i_sb; struct fscrypt_add_key_arg __user *uarg = _uarg; struct fscrypt_add_key_arg arg; struct fscrypt_master_key_secret secret; int err; if (copy_from_user(&arg, uarg, sizeof(arg))) return -EFAULT; if (!valid_key_spec(&arg.key_spec)) return -EINVAL; if (memchr_inv(arg.__reserved, 0, sizeof(arg.__reserved))) return -EINVAL; /* * Only root can add keys that are identified by an arbitrary descriptor * rather than by a cryptographic hash --- since otherwise a malicious * user could add the wrong key. */ if (arg.key_spec.type == FSCRYPT_KEY_SPEC_TYPE_DESCRIPTOR && !capable(CAP_SYS_ADMIN)) return -EACCES; memset(&secret, 0, sizeof(secret)); if (arg.flags) { if (arg.flags & ~FSCRYPT_ADD_KEY_FLAG_HW_WRAPPED) return -EINVAL; if (arg.key_spec.type != FSCRYPT_KEY_SPEC_TYPE_IDENTIFIER) return -EINVAL; secret.is_hw_wrapped = true; } if (arg.key_id) { if (arg.raw_size != 0) return -EINVAL; err = get_keyring_key(arg.key_id, arg.key_spec.type, arg.flags, &secret); if (err) goto out_wipe_secret; } else { if (!fscrypt_valid_key_size(arg.raw_size, arg.flags)) return -EINVAL; secret.size = arg.raw_size; err = -EFAULT; if (copy_from_user(secret.bytes, uarg->raw, secret.size)) goto out_wipe_secret; } err = add_master_key(sb, &secret, &arg.key_spec); if (err) goto out_wipe_secret; /* Return the key identifier to userspace, if applicable */ err = -EFAULT; if (arg.key_spec.type == FSCRYPT_KEY_SPEC_TYPE_IDENTIFIER && copy_to_user(uarg->key_spec.u.identifier, arg.key_spec.u.identifier, FSCRYPT_KEY_IDENTIFIER_SIZE)) goto out_wipe_secret; err = 0; out_wipe_secret: wipe_master_key_secret(&secret); return err; } EXPORT_SYMBOL_GPL(fscrypt_ioctl_add_key); static void fscrypt_get_test_dummy_secret(struct fscrypt_master_key_secret *secret) { static u8 test_key[FSCRYPT_MAX_RAW_KEY_SIZE]; get_random_once(test_key, sizeof(test_key)); memset(secret, 0, sizeof(*secret)); secret->size = sizeof(test_key); memcpy(secret->bytes, test_key, sizeof(test_key)); } void fscrypt_get_test_dummy_key_identifier( u8 key_identifier[FSCRYPT_KEY_IDENTIFIER_SIZE]) { struct fscrypt_master_key_secret secret; fscrypt_get_test_dummy_secret(&secret); fscrypt_init_hkdf(&secret.hkdf, secret.bytes, secret.size); fscrypt_hkdf_expand(&secret.hkdf, HKDF_CONTEXT_KEY_IDENTIFIER_FOR_RAW_KEY, NULL, 0, key_identifier, FSCRYPT_KEY_IDENTIFIER_SIZE); wipe_master_key_secret(&secret); } /** * 
fscrypt_add_test_dummy_key() - add the test dummy encryption key * @sb: the filesystem instance to add the key to * @key_spec: the key specifier of the test dummy encryption key * * Add the key for the test_dummy_encryption mount option to the filesystem. To * prevent misuse of this mount option, a per-boot random key is used instead of * a hardcoded one. This makes it so that any encrypted files created using * this option won't be accessible after a reboot. * * Return: 0 on success, -errno on failure */ int fscrypt_add_test_dummy_key(struct super_block *sb, struct fscrypt_key_specifier *key_spec) { struct fscrypt_master_key_secret secret; int err; fscrypt_get_test_dummy_secret(&secret); err = add_master_key(sb, &secret, key_spec); wipe_master_key_secret(&secret); return err; } /* * Verify that the current user has added a master key with the given identifier * (returns -ENOKEY if not). This is needed to prevent a user from encrypting * their files using some other user's key which they don't actually know. * Cryptographically this isn't much of a problem, but the semantics of this * would be a bit weird, so it's best to just forbid it. * * The system administrator (CAP_FOWNER) can override this, which should be * enough for any use cases where encryption policies are being set using keys * that were chosen ahead of time but aren't available at the moment. * * Note that the key may have already removed by the time this returns, but * that's okay; we just care whether the key was there at some point. * * Return: 0 if the key is added, -ENOKEY if it isn't, or another -errno code */ int fscrypt_verify_key_added(struct super_block *sb, const u8 identifier[FSCRYPT_KEY_IDENTIFIER_SIZE]) { struct fscrypt_key_specifier mk_spec; struct fscrypt_master_key *mk; struct key *mk_user; int err; mk_spec.type = FSCRYPT_KEY_SPEC_TYPE_IDENTIFIER; memcpy(mk_spec.u.identifier, identifier, FSCRYPT_KEY_IDENTIFIER_SIZE); mk = fscrypt_find_master_key(sb, &mk_spec); if (!mk) { err = -ENOKEY; goto out; } down_read(&mk->mk_sem); mk_user = find_master_key_user(mk); if (IS_ERR(mk_user)) { err = PTR_ERR(mk_user); } else { key_put(mk_user); err = 0; } up_read(&mk->mk_sem); fscrypt_put_master_key(mk); out: if (err == -ENOKEY && capable(CAP_FOWNER)) err = 0; return err; } /* * Try to evict the inode's dentries from the dentry cache. If the inode is a * directory, then it can have at most one dentry; however, that dentry may be * pinned by child dentries, so first try to evict the children too. 
*/ static void shrink_dcache_inode(struct inode *inode) { struct dentry *dentry; if (S_ISDIR(inode->i_mode)) { dentry = d_find_any_alias(inode); if (dentry) { shrink_dcache_parent(dentry); dput(dentry); } } d_prune_aliases(inode); } static void evict_dentries_for_decrypted_inodes(struct fscrypt_master_key *mk) { struct fscrypt_inode_info *ci; struct inode *inode; struct inode *toput_inode = NULL; spin_lock(&mk->mk_decrypted_inodes_lock); list_for_each_entry(ci, &mk->mk_decrypted_inodes, ci_master_key_link) { inode = ci->ci_inode; spin_lock(&inode->i_lock); if (inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW)) { spin_unlock(&inode->i_lock); continue; } __iget(inode); spin_unlock(&inode->i_lock); spin_unlock(&mk->mk_decrypted_inodes_lock); shrink_dcache_inode(inode); iput(toput_inode); toput_inode = inode; spin_lock(&mk->mk_decrypted_inodes_lock); } spin_unlock(&mk->mk_decrypted_inodes_lock); iput(toput_inode); } static int check_for_busy_inodes(struct super_block *sb, struct fscrypt_master_key *mk) { struct list_head *pos; size_t busy_count = 0; unsigned long ino; char ino_str[50] = ""; spin_lock(&mk->mk_decrypted_inodes_lock); list_for_each(pos, &mk->mk_decrypted_inodes) busy_count++; if (busy_count == 0) { spin_unlock(&mk->mk_decrypted_inodes_lock); return 0; } { /* select an example file to show for debugging purposes */ struct inode *inode = list_first_entry(&mk->mk_decrypted_inodes, struct fscrypt_inode_info, ci_master_key_link)->ci_inode; ino = inode->i_ino; } spin_unlock(&mk->mk_decrypted_inodes_lock); /* If the inode is currently being created, ino may still be 0. */ if (ino) snprintf(ino_str, sizeof(ino_str), ", including ino %lu", ino); fscrypt_warn(NULL, "%s: %zu inode(s) still busy after removing key with %s %*phN%s", sb->s_id, busy_count, master_key_spec_type(&mk->mk_spec), master_key_spec_len(&mk->mk_spec), (u8 *)&mk->mk_spec.u, ino_str); return -EBUSY; } static int try_to_lock_encrypted_files(struct super_block *sb, struct fscrypt_master_key *mk) { int err1; int err2; /* * An inode can't be evicted while it is dirty or has dirty pages. * Thus, we first have to clean the inodes in ->mk_decrypted_inodes. * * Just do it the easy way: call sync_filesystem(). It's overkill, but * it works, and it's more important to minimize the amount of caches we * drop than the amount of data we sync. Also, unprivileged users can * already call sync_filesystem() via sys_syncfs() or sys_sync(). */ down_read(&sb->s_umount); err1 = sync_filesystem(sb); up_read(&sb->s_umount); /* If a sync error occurs, still try to evict as much as possible. */ /* * Inodes are pinned by their dentries, so we have to evict their * dentries. shrink_dcache_sb() would suffice, but would be overkill * and inappropriate for use by unprivileged users. So instead go * through the inodes' alias lists and try to evict each dentry. */ evict_dentries_for_decrypted_inodes(mk); /* * evict_dentries_for_decrypted_inodes() already iput() each inode in * the list; any inodes for which that dropped the last reference will * have been evicted due to fscrypt_drop_inode() detecting the key * removal and telling the VFS to evict the inode. So to finish, we * just need to check whether any inodes couldn't be evicted. */ err2 = check_for_busy_inodes(sb, mk); return err1 ?: err2; } /* * Try to remove an fscrypt master encryption key. * * FS_IOC_REMOVE_ENCRYPTION_KEY (all_users=false) removes the current user's * claim to the key, then removes the key itself if no other users have claims. 
* FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS (all_users=true) always removes the * key itself. * * To "remove the key itself", first we transition the key to the "incompletely * removed" state, so that no more inodes can be unlocked with it. Then we try * to evict all cached inodes that had been unlocked with the key. * * If all inodes were evicted, then we unlink the fscrypt_master_key from the * keyring. Otherwise it remains in the keyring in the "incompletely removed" * state where it tracks the list of remaining inodes. Userspace can execute * the ioctl again later to retry eviction, or alternatively can re-add the key. * * For more details, see the "Removing keys" section of * Documentation/filesystems/fscrypt.rst. */ static int do_remove_key(struct file *filp, void __user *_uarg, bool all_users) { struct super_block *sb = file_inode(filp)->i_sb; struct fscrypt_remove_key_arg __user *uarg = _uarg; struct fscrypt_remove_key_arg arg; struct fscrypt_master_key *mk; u32 status_flags = 0; int err; bool inodes_remain; if (copy_from_user(&arg, uarg, sizeof(arg))) return -EFAULT; if (!valid_key_spec(&arg.key_spec)) return -EINVAL; if (memchr_inv(arg.__reserved, 0, sizeof(arg.__reserved))) return -EINVAL; /* * Only root can add and remove keys that are identified by an arbitrary * descriptor rather than by a cryptographic hash. */ if (arg.key_spec.type == FSCRYPT_KEY_SPEC_TYPE_DESCRIPTOR && !capable(CAP_SYS_ADMIN)) return -EACCES; /* Find the key being removed. */ mk = fscrypt_find_master_key(sb, &arg.key_spec); if (!mk) return -ENOKEY; down_write(&mk->mk_sem); /* If relevant, remove current user's (or all users) claim to the key */ if (mk->mk_users && mk->mk_users->keys.nr_leaves_on_tree != 0) { if (all_users) err = keyring_clear(mk->mk_users); else err = remove_master_key_user(mk); if (err) { up_write(&mk->mk_sem); goto out_put_key; } if (mk->mk_users->keys.nr_leaves_on_tree != 0) { /* * Other users have still added the key too. We removed * the current user's claim to the key, but we still * can't remove the key itself. */ status_flags |= FSCRYPT_KEY_REMOVAL_STATUS_FLAG_OTHER_USERS; err = 0; up_write(&mk->mk_sem); goto out_put_key; } } /* No user claims remaining. Initiate removal of the key. */ err = -ENOKEY; if (mk->mk_present) { fscrypt_initiate_key_removal(sb, mk); err = 0; } inodes_remain = refcount_read(&mk->mk_active_refs) > 0; up_write(&mk->mk_sem); if (inodes_remain) { /* Some inodes still reference this key; try to evict them. */ err = try_to_lock_encrypted_files(sb, mk); if (err == -EBUSY) { status_flags |= FSCRYPT_KEY_REMOVAL_STATUS_FLAG_FILES_BUSY; err = 0; } } /* * We return 0 if we successfully did something: removed a claim to the * key, initiated removal of the key, or tried locking the files again. * Users need to check the informational status flags if they care * whether the key has been fully removed including all files locked. */ out_put_key: fscrypt_put_master_key(mk); if (err == 0) err = put_user(status_flags, &uarg->removal_status_flags); return err; } int fscrypt_ioctl_remove_key(struct file *filp, void __user *uarg) { return do_remove_key(filp, uarg, false); } EXPORT_SYMBOL_GPL(fscrypt_ioctl_remove_key); int fscrypt_ioctl_remove_key_all_users(struct file *filp, void __user *uarg) { if (!capable(CAP_SYS_ADMIN)) return -EACCES; return do_remove_key(filp, uarg, true); } EXPORT_SYMBOL_GPL(fscrypt_ioctl_remove_key_all_users); /* * Retrieve the status of an fscrypt master encryption key. 
* * We set ->status to indicate whether the key is absent, present, or * incompletely removed. (For an explanation of what these statuses mean and * how they are represented internally, see struct fscrypt_master_key.) This * field allows applications to easily determine the status of an encrypted * directory without using a hack such as trying to open a regular file in it * (which can confuse the "incompletely removed" status with absent or present). * * In addition, for v2 policy keys we allow applications to determine, via * ->status_flags and ->user_count, whether the key has been added by the * current user, by other users, or by both. Most applications should not need * this, since ordinarily only one user should know a given key. However, if a * secret key is shared by multiple users, applications may wish to add an * already-present key to prevent other users from removing it. This ioctl can * be used to check whether that really is the case before the work is done to * add the key --- which might e.g. require prompting the user for a passphrase. * * For more details, see the "FS_IOC_GET_ENCRYPTION_KEY_STATUS" section of * Documentation/filesystems/fscrypt.rst. */ int fscrypt_ioctl_get_key_status(struct file *filp, void __user *uarg) { struct super_block *sb = file_inode(filp)->i_sb; struct fscrypt_get_key_status_arg arg; struct fscrypt_master_key *mk; int err; if (copy_from_user(&arg, uarg, sizeof(arg))) return -EFAULT; if (!valid_key_spec(&arg.key_spec)) return -EINVAL; if (memchr_inv(arg.__reserved, 0, sizeof(arg.__reserved))) return -EINVAL; arg.status_flags = 0; arg.user_count = 0; memset(arg.__out_reserved, 0, sizeof(arg.__out_reserved)); mk = fscrypt_find_master_key(sb, &arg.key_spec); if (!mk) { arg.status = FSCRYPT_KEY_STATUS_ABSENT; err = 0; goto out; } down_read(&mk->mk_sem); if (!mk->mk_present) { arg.status = refcount_read(&mk->mk_active_refs) > 0 ? FSCRYPT_KEY_STATUS_INCOMPLETELY_REMOVED : FSCRYPT_KEY_STATUS_ABSENT /* raced with full removal */; err = 0; goto out_release_key; } arg.status = FSCRYPT_KEY_STATUS_PRESENT; if (mk->mk_users) { struct key *mk_user; arg.user_count = mk->mk_users->keys.nr_leaves_on_tree; mk_user = find_master_key_user(mk); if (!IS_ERR(mk_user)) { arg.status_flags |= FSCRYPT_KEY_STATUS_FLAG_ADDED_BY_SELF; key_put(mk_user); } else if (mk_user != ERR_PTR(-ENOKEY)) { err = PTR_ERR(mk_user); goto out_release_key; } } err = 0; out_release_key: up_read(&mk->mk_sem); fscrypt_put_master_key(mk); out: if (!err && copy_to_user(uarg, &arg, sizeof(arg))) err = -EFAULT; return err; } EXPORT_SYMBOL_GPL(fscrypt_ioctl_get_key_status); int __init fscrypt_init_keyring(void) { int err; err = register_key_type(&key_type_fscrypt_user); if (err) return err; err = register_key_type(&key_type_fscrypt_provisioning); if (err) goto err_unregister_fscrypt_user; return 0; err_unregister_fscrypt_user: unregister_key_type(&key_type_fscrypt_user); return err; } |
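/*
 * Illustrative userspace sketch (not part of this file): adding a v2 policy
 * key with FS_IOC_ADD_ENCRYPTION_KEY, the ioctl handled by
 * fscrypt_ioctl_add_key() above.  add_fscrypt_key() is a hypothetical helper
 * name and error handling is minimal; the UAPI types and constants come from
 * <linux/fscrypt.h>.  See Documentation/filesystems/fscrypt.rst for the full
 * interface.
 */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fscrypt.h>

static int add_fscrypt_key(const char *mountpoint,
			   const unsigned char *raw, size_t raw_size,
			   unsigned char identifier[FSCRYPT_KEY_IDENTIFIER_SIZE])
{
	struct fscrypt_add_key_arg *arg;
	int fd, ret;

	/* calloc() zeroes the reserved fields, which the kernel requires. */
	arg = calloc(1, sizeof(*arg) + raw_size);
	if (!arg)
		return -1;
	arg->key_spec.type = FSCRYPT_KEY_SPEC_TYPE_IDENTIFIER;
	arg->raw_size = raw_size;
	memcpy(arg->raw, raw, raw_size);

	fd = open(mountpoint, O_RDONLY);
	if (fd < 0) {
		free(arg);
		return -1;
	}
	ret = ioctl(fd, FS_IOC_ADD_ENCRYPTION_KEY, arg);
	if (ret == 0)
		/* The kernel reports back the computed key identifier. */
		memcpy(identifier, arg->key_spec.u.identifier,
		       FSCRYPT_KEY_IDENTIFIER_SIZE);
	close(fd);
	free(arg);
	return ret;
}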
// SPDX-License-Identifier: GPL-2.0 /* * shstk.c - Intel shadow stack support * * Copyright (c) 2021, Intel Corporation. * Yu-cheng Yu <yu-cheng.yu@intel.com> */ #include <linux/sched.h> #include <linux/bitops.h> #include <linux/types.h> #include <linux/mm.h> #include <linux/mman.h> #include <linux/slab.h> #include <linux/uaccess.h> #include <linux/sched/signal.h> #include <linux/compat.h> #include <linux/sizes.h> #include <linux/user.h> #include <linux/syscalls.h> #include <asm/msr.h> #include <asm/fpu/xstate.h> #include <asm/fpu/types.h> #include <asm/shstk.h> #include <asm/special_insns.h> #include <asm/fpu/api.h> #include <asm/prctl.h> #define SS_FRAME_SIZE 8 static bool features_enabled(unsigned long features) { return current->thread.features & features; } static void features_set(unsigned long features) { current->thread.features |= features; } static void features_clr(unsigned long features) { current->thread.features &= ~features; } /* * Create a restore token on the shadow stack. A token is always 8-byte * and aligned to 8.
*/ static int create_rstor_token(unsigned long ssp, unsigned long *token_addr) { unsigned long addr; /* Token must be aligned */ if (!IS_ALIGNED(ssp, 8)) return -EINVAL; addr = ssp - SS_FRAME_SIZE; /* * SSP is aligned, so reserved bits and mode bit are a zero, just mark * the token 64-bit. */ ssp |= BIT(0); if (write_user_shstk_64((u64 __user *)addr, (u64)ssp)) return -EFAULT; if (token_addr) *token_addr = addr; return 0; } /* * VM_SHADOW_STACK will have a guard page. This helps userspace protect * itself from attacks. The reasoning is as follows: * * The shadow stack pointer(SSP) is moved by CALL, RET, and INCSSPQ. The * INCSSP instruction can increment the shadow stack pointer. It is the * shadow stack analog of an instruction like: * * addq $0x80, %rsp * * However, there is one important difference between an ADD on %rsp * and INCSSP. In addition to modifying SSP, INCSSP also reads from the * memory of the first and last elements that were "popped". It can be * thought of as acting like this: * * READ_ONCE(ssp); // read+discard top element on stack * ssp += nr_to_pop * 8; // move the shadow stack * READ_ONCE(ssp-8); // read+discard last popped stack element * * The maximum distance INCSSP can move the SSP is 2040 bytes, before * it would read the memory. Therefore a single page gap will be enough * to prevent any operation from shifting the SSP to an adjacent stack, * since it would have to land in the gap at least once, causing a * fault. */ static unsigned long alloc_shstk(unsigned long addr, unsigned long size, unsigned long token_offset, bool set_res_tok) { int flags = MAP_ANONYMOUS | MAP_PRIVATE | MAP_ABOVE4G; struct mm_struct *mm = current->mm; unsigned long mapped_addr, unused; if (addr) flags |= MAP_FIXED_NOREPLACE; mmap_write_lock(mm); mapped_addr = do_mmap(NULL, addr, size, PROT_READ, flags, VM_SHADOW_STACK | VM_WRITE, 0, &unused, NULL); mmap_write_unlock(mm); if (!set_res_tok || IS_ERR_VALUE(mapped_addr)) goto out; if (create_rstor_token(mapped_addr + token_offset, NULL)) { vm_munmap(mapped_addr, size); return -EINVAL; } out: return mapped_addr; } static unsigned long adjust_shstk_size(unsigned long size) { if (size) return PAGE_ALIGN(size); return PAGE_ALIGN(min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G)); } static void unmap_shadow_stack(u64 base, u64 size) { int r; r = vm_munmap(base, size); /* * mmap_write_lock_killable() failed with -EINTR. This means * the process is about to die and have it's MM cleaned up. * This task shouldn't ever make it back to userspace. In this * case it is ok to leak a shadow stack, so just exit out. */ if (r == -EINTR) return; /* * For all other types of vm_munmap() failure, either the * system is out of memory or there is bug. 
*/ WARN_ON_ONCE(r); } static int shstk_setup(void) { struct thread_shstk *shstk = ¤t->thread.shstk; unsigned long addr, size; /* Already enabled */ if (features_enabled(ARCH_SHSTK_SHSTK)) return 0; /* Also not supported for 32 bit */ if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK) || in_ia32_syscall()) return -EOPNOTSUPP; size = adjust_shstk_size(0); addr = alloc_shstk(0, size, 0, false); if (IS_ERR_VALUE(addr)) return PTR_ERR((void *)addr); fpregs_lock_and_load(); wrmsrq(MSR_IA32_PL3_SSP, addr + size); wrmsrq(MSR_IA32_U_CET, CET_SHSTK_EN); fpregs_unlock(); shstk->base = addr; shstk->size = size; features_set(ARCH_SHSTK_SHSTK); return 0; } void reset_thread_features(void) { memset(¤t->thread.shstk, 0, sizeof(struct thread_shstk)); current->thread.features = 0; current->thread.features_locked = 0; } unsigned long shstk_alloc_thread_stack(struct task_struct *tsk, u64 clone_flags, unsigned long stack_size) { struct thread_shstk *shstk = &tsk->thread.shstk; unsigned long addr, size; /* * If shadow stack is not enabled on the new thread, skip any * switch to a new shadow stack. */ if (!features_enabled(ARCH_SHSTK_SHSTK)) return 0; /* * For CLONE_VFORK the child will share the parents shadow stack. * Make sure to clear the internal tracking of the thread shadow * stack so the freeing logic run for child knows to leave it alone. */ if (clone_flags & CLONE_VFORK) { shstk->base = 0; shstk->size = 0; return 0; } /* * For !CLONE_VM the child will use a copy of the parents shadow * stack. */ if (!(clone_flags & CLONE_VM)) return 0; size = adjust_shstk_size(stack_size); addr = alloc_shstk(0, size, 0, false); if (IS_ERR_VALUE(addr)) return addr; shstk->base = addr; shstk->size = size; return addr + size; } static unsigned long get_user_shstk_addr(void) { unsigned long long ssp; fpregs_lock_and_load(); rdmsrq(MSR_IA32_PL3_SSP, ssp); fpregs_unlock(); return ssp; } int shstk_pop(u64 *val) { int ret = 0; u64 ssp; if (!features_enabled(ARCH_SHSTK_SHSTK)) return -ENOTSUPP; fpregs_lock_and_load(); rdmsrq(MSR_IA32_PL3_SSP, ssp); if (val && get_user(*val, (__user u64 *)ssp)) ret = -EFAULT; else wrmsrq(MSR_IA32_PL3_SSP, ssp + SS_FRAME_SIZE); fpregs_unlock(); return ret; } int shstk_push(u64 val) { u64 ssp; int ret; if (!features_enabled(ARCH_SHSTK_SHSTK)) return -ENOTSUPP; fpregs_lock_and_load(); rdmsrq(MSR_IA32_PL3_SSP, ssp); ssp -= SS_FRAME_SIZE; ret = write_user_shstk_64((__user void *)ssp, val); if (!ret) wrmsrq(MSR_IA32_PL3_SSP, ssp); fpregs_unlock(); return ret; } #define SHSTK_DATA_BIT BIT(63) static int put_shstk_data(u64 __user *addr, u64 data) { if (WARN_ON_ONCE(data & SHSTK_DATA_BIT)) return -EINVAL; /* * Mark the high bit so that the sigframe can't be processed as a * return address. */ if (write_user_shstk_64(addr, data | SHSTK_DATA_BIT)) return -EFAULT; return 0; } static int get_shstk_data(unsigned long *data, unsigned long __user *addr) { unsigned long ldata; if (unlikely(get_user(ldata, addr))) return -EFAULT; if (!(ldata & SHSTK_DATA_BIT)) return -EINVAL; *data = ldata & ~SHSTK_DATA_BIT; return 0; } static int shstk_push_sigframe(unsigned long *ssp) { unsigned long target_ssp = *ssp; /* Token must be aligned */ if (!IS_ALIGNED(target_ssp, 8)) return -EINVAL; *ssp -= SS_FRAME_SIZE; if (put_shstk_data((void __user *)*ssp, target_ssp)) return -EFAULT; return 0; } static int shstk_pop_sigframe(unsigned long *ssp) { struct vm_area_struct *vma; unsigned long token_addr; bool need_to_check_vma; int err = 1; /* * It is possible for the SSP to be off the end of a shadow stack by 4 * or 8 bytes. 
If the shadow stack is at the start of a page or 4 bytes * before it, it might be this case, so check that the address being * read is actually shadow stack. */ if (!IS_ALIGNED(*ssp, 8)) return -EINVAL; need_to_check_vma = PAGE_ALIGN(*ssp) == *ssp; if (need_to_check_vma) mmap_read_lock_killable(current->mm); err = get_shstk_data(&token_addr, (unsigned long __user *)*ssp); if (unlikely(err)) goto out_err; if (need_to_check_vma) { vma = find_vma(current->mm, *ssp); if (!vma || !(vma->vm_flags & VM_SHADOW_STACK)) { err = -EFAULT; goto out_err; } mmap_read_unlock(current->mm); } /* Restore SSP aligned? */ if (unlikely(!IS_ALIGNED(token_addr, 8))) return -EINVAL; /* SSP in userspace? */ if (unlikely(token_addr >= TASK_SIZE_MAX)) return -EINVAL; *ssp = token_addr; return 0; out_err: if (need_to_check_vma) mmap_read_unlock(current->mm); return err; } int setup_signal_shadow_stack(struct ksignal *ksig) { void __user *restorer = ksig->ka.sa.sa_restorer; unsigned long ssp; int err; if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK) || !features_enabled(ARCH_SHSTK_SHSTK)) return 0; if (!restorer) return -EINVAL; ssp = get_user_shstk_addr(); if (unlikely(!ssp)) return -EINVAL; err = shstk_push_sigframe(&ssp); if (unlikely(err)) return err; /* Push restorer address */ ssp -= SS_FRAME_SIZE; err = write_user_shstk_64((u64 __user *)ssp, (u64)restorer); if (unlikely(err)) return -EFAULT; fpregs_lock_and_load(); wrmsrq(MSR_IA32_PL3_SSP, ssp); fpregs_unlock(); return 0; } int restore_signal_shadow_stack(void) { unsigned long ssp; int err; if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK) || !features_enabled(ARCH_SHSTK_SHSTK)) return 0; ssp = get_user_shstk_addr(); if (unlikely(!ssp)) return -EINVAL; err = shstk_pop_sigframe(&ssp); if (unlikely(err)) return err; fpregs_lock_and_load(); wrmsrq(MSR_IA32_PL3_SSP, ssp); fpregs_unlock(); return 0; } void shstk_free(struct task_struct *tsk) { struct thread_shstk *shstk = &tsk->thread.shstk; if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK) || !features_enabled(ARCH_SHSTK_SHSTK)) return; /* * When fork() with CLONE_VM fails, the child (tsk) already has a * shadow stack allocated, and exit_thread() calls this function to * free it. In this case the parent (current) and the child share * the same mm struct. */ if (!tsk->mm || tsk->mm != current->mm) return; /* * If shstk->base is NULL, then this task is not managing its * own shadow stack (CLONE_VFORK). So skip freeing it. */ if (!shstk->base) return; /* * shstk->base is NULL for CLONE_VFORK child tasks, and so is * normal. But size = 0 on a shstk->base is not normal and * indicated an attempt to free the thread shadow stack twice. * Warn about it. */ if (WARN_ON(!shstk->size)) return; unmap_shadow_stack(shstk->base, shstk->size); shstk->size = 0; } static int wrss_control(bool enable) { u64 msrval; if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) return -EOPNOTSUPP; /* * Only enable WRSS if shadow stack is enabled. If shadow stack is not * enabled, WRSS will already be disabled, so don't bother clearing it * when disabling. */ if (!features_enabled(ARCH_SHSTK_SHSTK)) return -EPERM; /* Already enabled/disabled? 
*/ if (features_enabled(ARCH_SHSTK_WRSS) == enable) return 0; fpregs_lock_and_load(); rdmsrq(MSR_IA32_U_CET, msrval); if (enable) { features_set(ARCH_SHSTK_WRSS); msrval |= CET_WRSS_EN; } else { features_clr(ARCH_SHSTK_WRSS); if (!(msrval & CET_WRSS_EN)) goto unlock; msrval &= ~CET_WRSS_EN; } wrmsrq(MSR_IA32_U_CET, msrval); unlock: fpregs_unlock(); return 0; } static int shstk_disable(void) { if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) return -EOPNOTSUPP; /* Already disabled? */ if (!features_enabled(ARCH_SHSTK_SHSTK)) return 0; fpregs_lock_and_load(); /* Disable WRSS too when disabling shadow stack */ wrmsrq(MSR_IA32_U_CET, 0); wrmsrq(MSR_IA32_PL3_SSP, 0); fpregs_unlock(); shstk_free(current); features_clr(ARCH_SHSTK_SHSTK | ARCH_SHSTK_WRSS); return 0; } SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsigned int, flags) { bool set_tok = flags & SHADOW_STACK_SET_TOKEN; unsigned long aligned_size; if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) return -EOPNOTSUPP; if (flags & ~SHADOW_STACK_SET_TOKEN) return -EINVAL; /* If there isn't space for a token */ if (set_tok && size < 8) return -ENOSPC; if (addr && addr < SZ_4G) return -ERANGE; /* * An overflow would result in attempting to write the restore token * to the wrong location. Not catastrophic, but just return the right * error code and block it. */ aligned_size = PAGE_ALIGN(size); if (aligned_size < size) return -EOVERFLOW; return alloc_shstk(addr, aligned_size, size, set_tok); } long shstk_prctl(struct task_struct *task, int option, unsigned long arg2) { unsigned long features = arg2; if (option == ARCH_SHSTK_STATUS) { return put_user(task->thread.features, (unsigned long __user *)arg2); } if (option == ARCH_SHSTK_LOCK) { task->thread.features_locked |= features; return 0; } /* Only allow via ptrace */ if (task != current) { if (option == ARCH_SHSTK_UNLOCK && IS_ENABLED(CONFIG_CHECKPOINT_RESTORE)) { task->thread.features_locked &= ~features; return 0; } return -EINVAL; } /* Do not allow to change locked features */ if (features & task->thread.features_locked) return -EPERM; /* Only support enabling/disabling one feature at a time. */ if (hweight_long(features) > 1) return -EINVAL; if (option == ARCH_SHSTK_DISABLE) { if (features & ARCH_SHSTK_WRSS) return wrss_control(false); if (features & ARCH_SHSTK_SHSTK) return shstk_disable(); return -EINVAL; } /* Handle ARCH_SHSTK_ENABLE */ if (features & ARCH_SHSTK_SHSTK) return shstk_setup(); if (features & ARCH_SHSTK_WRSS) return wrss_control(true); return -EINVAL; } int shstk_update_last_frame(unsigned long val) { unsigned long ssp; if (!features_enabled(ARCH_SHSTK_SHSTK)) return 0; ssp = get_user_shstk_addr(); return write_user_shstk_64((u64 __user *)ssp, (u64)val); } bool shstk_is_enabled(void) { return features_enabled(ARCH_SHSTK_SHSTK); } |
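/*
 * Illustrative userspace sketch (not part of this file): allocating a shadow
 * stack through the map_shadow_stack() syscall implemented above, requesting
 * a restore token at the top.  The fallback #defines below are assumptions
 * for older headers/libcs (the syscall number shown is the x86-64 one); with
 * current headers they come from <sys/syscall.h> and <asm/mman.h>.
 */
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <asm/mman.h>

#ifndef __NR_map_shadow_stack
#define __NR_map_shadow_stack 453		/* x86-64 syscall number */
#endif
#ifndef SHADOW_STACK_SET_TOKEN
#define SHADOW_STACK_SET_TOKEN (1ULL << 0)	/* ask for a restore token */
#endif

int main(void)
{
	unsigned long size = 4 * 4096;
	long base;

	/* addr == 0: let the kernel pick a spot (above 4G, per alloc_shstk()). */
	base = syscall(__NR_map_shadow_stack, 0UL, size, SHADOW_STACK_SET_TOKEN);
	if (base == -1) {
		perror("map_shadow_stack");	/* e.g. EOPNOTSUPP without CET */
		return 1;
	}
	/* create_rstor_token() placed the token 8 bytes below base + size. */
	printf("shadow stack at %#lx, restore token at %#lx\n",
	       base, base + size - 8);
	return 0;
}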
/* * videobuf2-v4l2.c - V4L2 driver helper framework * * Copyright (C) 2010 Samsung Electronics * * Author: Pawel Osciak <pawel@osciak.com> * Marek Szyprowski <m.szyprowski@samsung.com> * * The vb2_thread implementation was based on code from videobuf-dvb.c: * (c) 2004 Gerd Knorr <kraxel@bytesex.org> [SUSE Labs] * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation.
*/ #include <linux/device.h> #include <linux/err.h> #include <linux/freezer.h> #include <linux/kernel.h> #include <linux/kthread.h> #include <linux/mm.h> #include <linux/module.h> #include <linux/poll.h> #include <linux/sched.h> #include <linux/slab.h> #include <media/v4l2-common.h> #include <media/v4l2-dev.h> #include <media/v4l2-device.h> #include <media/v4l2-event.h> #include <media/v4l2-fh.h> #include <media/videobuf2-v4l2.h> static int debug; module_param(debug, int, 0644); #define dprintk(q, level, fmt, arg...) \ do { \ if (debug >= level) \ pr_info("vb2-v4l2: [%p] %s: " fmt, \ (q)->name, __func__, ## arg); \ } while (0) /* Flags that are set by us */ #define V4L2_BUFFER_MASK_FLAGS (V4L2_BUF_FLAG_MAPPED | V4L2_BUF_FLAG_QUEUED | \ V4L2_BUF_FLAG_DONE | V4L2_BUF_FLAG_ERROR | \ V4L2_BUF_FLAG_PREPARED | \ V4L2_BUF_FLAG_IN_REQUEST | \ V4L2_BUF_FLAG_REQUEST_FD | \ V4L2_BUF_FLAG_TIMESTAMP_MASK) /* Output buffer flags that should be passed on to the driver */ #define V4L2_BUFFER_OUT_FLAGS (V4L2_BUF_FLAG_PFRAME | \ V4L2_BUF_FLAG_BFRAME | \ V4L2_BUF_FLAG_KEYFRAME | \ V4L2_BUF_FLAG_TIMECODE | \ V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF) /* * __verify_planes_array() - verify that the planes array passed in struct * v4l2_buffer from userspace can be safely used */ static int __verify_planes_array(struct vb2_buffer *vb, const struct v4l2_buffer *b) { if (!V4L2_TYPE_IS_MULTIPLANAR(b->type)) return 0; /* Is memory for copying plane information present? */ if (b->m.planes == NULL) { dprintk(vb->vb2_queue, 1, "multi-planar buffer passed but planes array not provided\n"); return -EINVAL; } if (b->length < vb->num_planes || b->length > VB2_MAX_PLANES) { dprintk(vb->vb2_queue, 1, "incorrect planes array length, expected %d, got %d\n", vb->num_planes, b->length); return -EINVAL; } return 0; } static int __verify_planes_array_core(struct vb2_buffer *vb, const void *pb) { return __verify_planes_array(vb, pb); } /* * __verify_length() - Verify that the bytesused value for each plane fits in * the plane length and that the data offset doesn't exceed the bytesused value. */ static int __verify_length(struct vb2_buffer *vb, const struct v4l2_buffer *b) { unsigned int length; unsigned int bytesused; unsigned int plane; if (V4L2_TYPE_IS_CAPTURE(b->type)) return 0; if (V4L2_TYPE_IS_MULTIPLANAR(b->type)) { for (plane = 0; plane < vb->num_planes; ++plane) { length = (b->memory == VB2_MEMORY_USERPTR || b->memory == VB2_MEMORY_DMABUF) ? b->m.planes[plane].length : vb->planes[plane].length; bytesused = b->m.planes[plane].bytesused ? b->m.planes[plane].bytesused : length; if (b->m.planes[plane].bytesused > length) return -EINVAL; if (b->m.planes[plane].data_offset > 0 && b->m.planes[plane].data_offset >= bytesused) return -EINVAL; } } else { length = (b->memory == VB2_MEMORY_USERPTR) ? b->length : vb->planes[0].length; if (b->bytesused > length) return -EINVAL; } return 0; } /* * __init_vb2_v4l2_buffer() - initialize the vb2_v4l2_buffer struct */ static void __init_vb2_v4l2_buffer(struct vb2_buffer *vb) { struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb); vbuf->request_fd = -1; } static void __copy_timestamp(struct vb2_buffer *vb, const void *pb) { const struct v4l2_buffer *b = pb; struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb); struct vb2_queue *q = vb->vb2_queue; if (q->is_output) { /* * For output buffers copy the timestamp if needed, * and the timecode field and flag if needed. 
*/ if (q->copy_timestamp) vb->timestamp = v4l2_buffer_get_timestamp(b); vbuf->flags |= b->flags & V4L2_BUF_FLAG_TIMECODE; if (b->flags & V4L2_BUF_FLAG_TIMECODE) vbuf->timecode = b->timecode; } }; static void vb2_warn_zero_bytesused(struct vb2_buffer *vb) { static bool check_once; if (check_once) return; check_once = true; pr_warn("use of bytesused == 0 is deprecated and will be removed in the future,\n"); if (vb->vb2_queue->allow_zero_bytesused) pr_warn("use VIDIOC_DECODER_CMD(V4L2_DEC_CMD_STOP) instead.\n"); else pr_warn("use the actual size instead.\n"); } static int vb2_fill_vb2_v4l2_buffer(struct vb2_buffer *vb, struct v4l2_buffer *b) { struct vb2_queue *q = vb->vb2_queue; struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb); struct vb2_plane *planes = vbuf->planes; unsigned int plane; int ret; ret = __verify_length(vb, b); if (ret < 0) { dprintk(q, 1, "plane parameters verification failed: %d\n", ret); return ret; } if (b->field == V4L2_FIELD_ALTERNATE && q->is_output) { /* * If the format's field is ALTERNATE, then the buffer's field * should be either TOP or BOTTOM, not ALTERNATE since that * makes no sense. The driver has to know whether the * buffer represents a top or a bottom field in order to * program any DMA correctly. Using ALTERNATE is wrong, since * that just says that it is either a top or a bottom field, * but not which of the two it is. */ dprintk(q, 1, "the field is incorrectly set to ALTERNATE for an output buffer\n"); return -EINVAL; } vbuf->sequence = 0; vbuf->request_fd = -1; vbuf->is_held = false; if (V4L2_TYPE_IS_MULTIPLANAR(b->type)) { switch (b->memory) { case VB2_MEMORY_USERPTR: for (plane = 0; plane < vb->num_planes; ++plane) { planes[plane].m.userptr = b->m.planes[plane].m.userptr; planes[plane].length = b->m.planes[plane].length; } break; case VB2_MEMORY_DMABUF: for (plane = 0; plane < vb->num_planes; ++plane) { planes[plane].m.fd = b->m.planes[plane].m.fd; planes[plane].length = b->m.planes[plane].length; } break; default: for (plane = 0; plane < vb->num_planes; ++plane) { planes[plane].m.offset = vb->planes[plane].m.offset; planes[plane].length = vb->planes[plane].length; } break; } /* Fill in user-provided information for OUTPUT types */ if (V4L2_TYPE_IS_OUTPUT(b->type)) { /* * Will have to go up to b->length when API starts * accepting variable number of planes. * * If bytesused == 0 for the output buffer, then fall * back to the full buffer size. In that case * userspace clearly never bothered to set it and * it's a safe assumption that they really meant to * use the full plane sizes. * * Some drivers, e.g. old codec drivers, use bytesused == 0 * as a way to indicate that streaming is finished. * In that case, the driver should use the * allow_zero_bytesused flag to keep old userspace * applications working. */ for (plane = 0; plane < vb->num_planes; ++plane) { struct vb2_plane *pdst = &planes[plane]; struct v4l2_plane *psrc = &b->m.planes[plane]; if (psrc->bytesused == 0) vb2_warn_zero_bytesused(vb); if (vb->vb2_queue->allow_zero_bytesused) pdst->bytesused = psrc->bytesused; else pdst->bytesused = psrc->bytesused ? psrc->bytesused : pdst->length; pdst->data_offset = psrc->data_offset; } } } else { /* * Single-planar buffers do not use planes array, * so fill in relevant v4l2_buffer struct fields instead. * In vb2 we use our internal V4l2_planes struct for * single-planar buffers as well, for simplicity. * * If bytesused == 0 for the output buffer, then fall back * to the full buffer size as that's a sensible default. * * Some drivers, e.g. 
old codec drivers, use bytesused == 0 as * a way to indicate that streaming is finished. In that case, * the driver should use the allow_zero_bytesused flag to keep * old userspace applications working. */ switch (b->memory) { case VB2_MEMORY_USERPTR: planes[0].m.userptr = b->m.userptr; planes[0].length = b->length; break; case VB2_MEMORY_DMABUF: planes[0].m.fd = b->m.fd; planes[0].length = b->length; break; default: planes[0].m.offset = vb->planes[0].m.offset; planes[0].length = vb->planes[0].length; break; } planes[0].data_offset = 0; if (V4L2_TYPE_IS_OUTPUT(b->type)) { if (b->bytesused == 0) vb2_warn_zero_bytesused(vb); if (vb->vb2_queue->allow_zero_bytesused) planes[0].bytesused = b->bytesused; else planes[0].bytesused = b->bytesused ? b->bytesused : planes[0].length; } else planes[0].bytesused = 0; } /* Zero flags that we handle */ vbuf->flags = b->flags & ~V4L2_BUFFER_MASK_FLAGS; if (!vb->vb2_queue->copy_timestamp || V4L2_TYPE_IS_CAPTURE(b->type)) { /* * Non-COPY timestamps and non-OUTPUT queues will get * their timestamp and timestamp source flags from the * queue. */ vbuf->flags &= ~V4L2_BUF_FLAG_TSTAMP_SRC_MASK; } if (V4L2_TYPE_IS_OUTPUT(b->type)) { /* * For output buffers mask out the timecode flag: * this will be handled later in vb2_qbuf(). * The 'field' is valid metadata for this output buffer * and so that needs to be copied here. */ vbuf->flags &= ~V4L2_BUF_FLAG_TIMECODE; vbuf->field = b->field; if (!(q->subsystem_flags & VB2_V4L2_FL_SUPPORTS_M2M_HOLD_CAPTURE_BUF)) vbuf->flags &= ~V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF; } else { /* Zero any output buffer flags as this is a capture buffer */ vbuf->flags &= ~V4L2_BUFFER_OUT_FLAGS; /* Zero last flag, this is a signal from driver to userspace */ vbuf->flags &= ~V4L2_BUF_FLAG_LAST; } return 0; } static void set_buffer_cache_hints(struct vb2_queue *q, struct vb2_buffer *vb, struct v4l2_buffer *b) { if (!vb2_queue_allows_cache_hints(q)) { /* * Clear buffer cache flags if queue does not support user * space hints. That's to indicate to userspace that these * flags won't work. */ b->flags &= ~V4L2_BUF_FLAG_NO_CACHE_INVALIDATE; b->flags &= ~V4L2_BUF_FLAG_NO_CACHE_CLEAN; return; } if (b->flags & V4L2_BUF_FLAG_NO_CACHE_INVALIDATE) vb->skip_cache_sync_on_finish = 1; if (b->flags & V4L2_BUF_FLAG_NO_CACHE_CLEAN) vb->skip_cache_sync_on_prepare = 1; } static int vb2_queue_or_prepare_buf(struct vb2_queue *q, struct media_device *mdev, struct vb2_buffer *vb, struct v4l2_buffer *b, bool is_prepare, struct media_request **p_req) { const char *opname = is_prepare ? 
"prepare_buf" : "qbuf"; struct media_request *req; struct vb2_v4l2_buffer *vbuf; int ret; if (b->type != q->type) { dprintk(q, 1, "%s: invalid buffer type\n", opname); return -EINVAL; } if (b->memory != q->memory) { dprintk(q, 1, "%s: invalid memory type\n", opname); return -EINVAL; } vbuf = to_vb2_v4l2_buffer(vb); ret = __verify_planes_array(vb, b); if (ret) return ret; if (!is_prepare && (b->flags & V4L2_BUF_FLAG_REQUEST_FD) && vb->state != VB2_BUF_STATE_DEQUEUED) { dprintk(q, 1, "%s: buffer is not in dequeued state\n", opname); return -EINVAL; } if (!vb->prepared) { set_buffer_cache_hints(q, vb, b); /* Copy relevant information provided by the userspace */ memset(vbuf->planes, 0, sizeof(vbuf->planes[0]) * vb->num_planes); ret = vb2_fill_vb2_v4l2_buffer(vb, b); if (ret) return ret; } if (is_prepare) return 0; if (!(b->flags & V4L2_BUF_FLAG_REQUEST_FD)) { if (q->requires_requests) { dprintk(q, 1, "%s: queue requires requests\n", opname); return -EBADR; } if (q->uses_requests) { dprintk(q, 1, "%s: queue uses requests\n", opname); return -EBUSY; } return 0; } else if (!q->supports_requests) { dprintk(q, 1, "%s: queue does not support requests\n", opname); return -EBADR; } else if (q->uses_qbuf) { dprintk(q, 1, "%s: queue does not use requests\n", opname); return -EBUSY; } /* * For proper locking when queueing a request you need to be able * to lock access to the vb2 queue, so check that there is a lock * that we can use. In addition p_req must be non-NULL. */ if (WARN_ON(!q->lock || !p_req)) return -EINVAL; /* * Make sure this op is implemented by the driver. It's easy to forget * this callback, but is it important when canceling a buffer in a * queued request. */ if (WARN_ON(!q->ops->buf_request_complete)) return -EINVAL; /* * Make sure this op is implemented by the driver for the output queue. * It's easy to forget this callback, but is it important to correctly * validate the 'field' value at QBUF time. */ if (WARN_ON((q->type == V4L2_BUF_TYPE_VIDEO_OUTPUT || q->type == V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE) && !q->ops->buf_out_validate)) return -EINVAL; req = media_request_get_by_fd(mdev, b->request_fd); if (IS_ERR(req)) { dprintk(q, 1, "%s: invalid request_fd\n", opname); return PTR_ERR(req); } /* * Early sanity check. This is checked again when the buffer * is bound to the request in vb2_core_qbuf(). */ if (req->state != MEDIA_REQUEST_STATE_IDLE && req->state != MEDIA_REQUEST_STATE_UPDATING) { dprintk(q, 1, "%s: request is not idle\n", opname); media_request_put(req); return -EBUSY; } *p_req = req; vbuf->request_fd = b->request_fd; return 0; } /* * __fill_v4l2_buffer() - fill in a struct v4l2_buffer with information to be * returned to userspace */ static void __fill_v4l2_buffer(struct vb2_buffer *vb, void *pb) { struct v4l2_buffer *b = pb; struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb); struct vb2_queue *q = vb->vb2_queue; unsigned int plane; /* Copy back data such as timestamp, flags, etc. */ b->index = vb->index; b->type = vb->type; b->memory = vb->memory; b->bytesused = 0; b->flags = vbuf->flags; b->field = vbuf->field; v4l2_buffer_set_timestamp(b, vb->timestamp); b->timecode = vbuf->timecode; b->sequence = vbuf->sequence; b->reserved2 = 0; b->request_fd = 0; if (q->is_multiplanar) { /* * Fill in plane-related data if userspace provided an array * for it. The caller has already verified memory and size. 
*/ b->length = vb->num_planes; for (plane = 0; plane < vb->num_planes; ++plane) { struct v4l2_plane *pdst = &b->m.planes[plane]; struct vb2_plane *psrc = &vb->planes[plane]; pdst->bytesused = psrc->bytesused; pdst->length = psrc->length; if (q->memory == VB2_MEMORY_MMAP) pdst->m.mem_offset = psrc->m.offset; else if (q->memory == VB2_MEMORY_USERPTR) pdst->m.userptr = psrc->m.userptr; else if (q->memory == VB2_MEMORY_DMABUF) pdst->m.fd = psrc->m.fd; pdst->data_offset = psrc->data_offset; memset(pdst->reserved, 0, sizeof(pdst->reserved)); } } else { /* * We use length and offset in v4l2_planes array even for * single-planar buffers, but userspace does not. */ b->length = vb->planes[0].length; b->bytesused = vb->planes[0].bytesused; if (q->memory == VB2_MEMORY_MMAP) b->m.offset = vb->planes[0].m.offset; else if (q->memory == VB2_MEMORY_USERPTR) b->m.userptr = vb->planes[0].m.userptr; else if (q->memory == VB2_MEMORY_DMABUF) b->m.fd = vb->planes[0].m.fd; } /* * Clear any buffer state related flags. */ b->flags &= ~V4L2_BUFFER_MASK_FLAGS; b->flags |= q->timestamp_flags & V4L2_BUF_FLAG_TIMESTAMP_MASK; if (!q->copy_timestamp) { /* * For non-COPY timestamps, drop timestamp source bits * and obtain the timestamp source from the queue. */ b->flags &= ~V4L2_BUF_FLAG_TSTAMP_SRC_MASK; b->flags |= q->timestamp_flags & V4L2_BUF_FLAG_TSTAMP_SRC_MASK; } switch (vb->state) { case VB2_BUF_STATE_QUEUED: case VB2_BUF_STATE_ACTIVE: b->flags |= V4L2_BUF_FLAG_QUEUED; break; case VB2_BUF_STATE_IN_REQUEST: b->flags |= V4L2_BUF_FLAG_IN_REQUEST; break; case VB2_BUF_STATE_ERROR: b->flags |= V4L2_BUF_FLAG_ERROR; fallthrough; case VB2_BUF_STATE_DONE: b->flags |= V4L2_BUF_FLAG_DONE; break; case VB2_BUF_STATE_PREPARING: case VB2_BUF_STATE_DEQUEUED: /* nothing */ break; } if ((vb->state == VB2_BUF_STATE_DEQUEUED || vb->state == VB2_BUF_STATE_IN_REQUEST) && vb->synced && vb->prepared) b->flags |= V4L2_BUF_FLAG_PREPARED; if (vb2_buffer_in_use(q, vb)) b->flags |= V4L2_BUF_FLAG_MAPPED; if (vbuf->request_fd >= 0) { b->flags |= V4L2_BUF_FLAG_REQUEST_FD; b->request_fd = vbuf->request_fd; } } /* * __fill_vb2_buffer() - fill a vb2_buffer with information provided in a * v4l2_buffer by the userspace. It also verifies that struct * v4l2_buffer has a valid number of planes. */ static int __fill_vb2_buffer(struct vb2_buffer *vb, struct vb2_plane *planes) { struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb); unsigned int plane; if (!vb->vb2_queue->copy_timestamp) vb->timestamp = 0; for (plane = 0; plane < vb->num_planes; ++plane) { if (vb->vb2_queue->memory != VB2_MEMORY_MMAP) { planes[plane].m = vbuf->planes[plane].m; planes[plane].length = vbuf->planes[plane].length; } planes[plane].bytesused = vbuf->planes[plane].bytesused; planes[plane].data_offset = vbuf->planes[plane].data_offset; } return 0; } static const struct vb2_buf_ops v4l2_buf_ops = { .verify_planes_array = __verify_planes_array_core, .init_buffer = __init_vb2_v4l2_buffer, .fill_user_buffer = __fill_v4l2_buffer, .fill_vb2_buffer = __fill_vb2_buffer, .copy_timestamp = __copy_timestamp, }; struct vb2_buffer *vb2_find_buffer(struct vb2_queue *q, u64 timestamp) { unsigned int i; struct vb2_buffer *vb2; /* * This loop doesn't scale if there is a really large number of buffers. * Maybe something more efficient will be needed in this case. 
*/ for (i = 0; i < q->max_num_buffers; i++) { vb2 = vb2_get_buffer(q, i); if (!vb2) continue; if (vb2->copied_timestamp && vb2->timestamp == timestamp) return vb2; } return NULL; } EXPORT_SYMBOL_GPL(vb2_find_buffer); /* * vb2_querybuf() - query video buffer information * @q: vb2 queue * @b: buffer struct passed from userspace to vidioc_querybuf handler * in driver * * Should be called from vidioc_querybuf ioctl handler in driver. * This function will verify the passed v4l2_buffer structure and fill the * relevant information for the userspace. * * The return values from this function are intended to be directly returned * from vidioc_querybuf handler in driver. */ int vb2_querybuf(struct vb2_queue *q, struct v4l2_buffer *b) { struct vb2_buffer *vb; int ret; if (b->type != q->type) { dprintk(q, 1, "wrong buffer type\n"); return -EINVAL; } vb = vb2_get_buffer(q, b->index); if (!vb) { dprintk(q, 1, "can't find the requested buffer %u\n", b->index); return -EINVAL; } ret = __verify_planes_array(vb, b); if (!ret) vb2_core_querybuf(q, vb, b); return ret; } EXPORT_SYMBOL(vb2_querybuf); static void vb2_set_flags_and_caps(struct vb2_queue *q, u32 memory, u32 *flags, u32 *caps, u32 *max_num_bufs) { if (!q->allow_cache_hints || memory != V4L2_MEMORY_MMAP) { /* * This needs to clear V4L2_MEMORY_FLAG_NON_COHERENT only, * but in order to avoid bugs we zero out all bits. */ *flags = 0; } else { /* Clear all unknown flags. */ *flags &= V4L2_MEMORY_FLAG_NON_COHERENT; } *caps |= V4L2_BUF_CAP_SUPPORTS_ORPHANED_BUFS; if (q->io_modes & VB2_MMAP) *caps |= V4L2_BUF_CAP_SUPPORTS_MMAP; if (q->io_modes & VB2_USERPTR) *caps |= V4L2_BUF_CAP_SUPPORTS_USERPTR; if (q->io_modes & VB2_DMABUF) *caps |= V4L2_BUF_CAP_SUPPORTS_DMABUF; if (q->subsystem_flags & VB2_V4L2_FL_SUPPORTS_M2M_HOLD_CAPTURE_BUF) *caps |= V4L2_BUF_CAP_SUPPORTS_M2M_HOLD_CAPTURE_BUF; if (q->allow_cache_hints && q->io_modes & VB2_MMAP) *caps |= V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS; if (q->supports_requests) *caps |= V4L2_BUF_CAP_SUPPORTS_REQUESTS; if (max_num_bufs) { *max_num_bufs = q->max_num_buffers; *caps |= V4L2_BUF_CAP_SUPPORTS_MAX_NUM_BUFFERS; } } int vb2_reqbufs(struct vb2_queue *q, struct v4l2_requestbuffers *req) { int ret = vb2_verify_memory_type(q, req->memory, req->type); u32 flags = req->flags; vb2_set_flags_and_caps(q, req->memory, &flags, &req->capabilities, NULL); req->flags = flags; return ret ? ret : vb2_core_reqbufs(q, req->memory, req->flags, &req->count); } EXPORT_SYMBOL_GPL(vb2_reqbufs); int vb2_prepare_buf(struct vb2_queue *q, struct media_device *mdev, struct v4l2_buffer *b) { struct vb2_buffer *vb; int ret; if (vb2_fileio_is_active(q)) { dprintk(q, 1, "file io in progress\n"); return -EBUSY; } if (b->flags & V4L2_BUF_FLAG_REQUEST_FD) return -EINVAL; vb = vb2_get_buffer(q, b->index); if (!vb) { dprintk(q, 1, "can't find the requested buffer %u\n", b->index); return -EINVAL; } ret = vb2_queue_or_prepare_buf(q, mdev, vb, b, true, NULL); return ret ? ret : vb2_core_prepare_buf(q, vb, b); } EXPORT_SYMBOL_GPL(vb2_prepare_buf); int vb2_create_bufs(struct vb2_queue *q, struct v4l2_create_buffers *create) { unsigned requested_planes = 1; unsigned requested_sizes[VIDEO_MAX_PLANES]; struct v4l2_format *f = &create->format; int ret = vb2_verify_memory_type(q, create->memory, f->type); unsigned i; create->index = vb2_get_num_buffers(q); vb2_set_flags_and_caps(q, create->memory, &create->flags, &create->capabilities, &create->max_num_buffers); if (create->count == 0) return ret != -EBUSY ? 
ret : 0; switch (f->type) { case V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE: case V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE: requested_planes = f->fmt.pix_mp.num_planes; if (requested_planes == 0 || requested_planes > VIDEO_MAX_PLANES) return -EINVAL; for (i = 0; i < requested_planes; i++) requested_sizes[i] = f->fmt.pix_mp.plane_fmt[i].sizeimage; break; case V4L2_BUF_TYPE_VIDEO_CAPTURE: case V4L2_BUF_TYPE_VIDEO_OUTPUT: requested_sizes[0] = f->fmt.pix.sizeimage; break; case V4L2_BUF_TYPE_VBI_CAPTURE: case V4L2_BUF_TYPE_VBI_OUTPUT: requested_sizes[0] = f->fmt.vbi.samples_per_line * (f->fmt.vbi.count[0] + f->fmt.vbi.count[1]); break; case V4L2_BUF_TYPE_SLICED_VBI_CAPTURE: case V4L2_BUF_TYPE_SLICED_VBI_OUTPUT: requested_sizes[0] = f->fmt.sliced.io_size; break; case V4L2_BUF_TYPE_SDR_CAPTURE: case V4L2_BUF_TYPE_SDR_OUTPUT: requested_sizes[0] = f->fmt.sdr.buffersize; break; case V4L2_BUF_TYPE_META_CAPTURE: case V4L2_BUF_TYPE_META_OUTPUT: requested_sizes[0] = f->fmt.meta.buffersize; break; default: return -EINVAL; } for (i = 0; i < requested_planes; i++) if (requested_sizes[i] == 0) return -EINVAL; if (ret) return ret; return vb2_core_create_bufs(q, create->memory, create->flags, &create->count, requested_planes, requested_sizes, &create->index); } EXPORT_SYMBOL_GPL(vb2_create_bufs); int vb2_qbuf(struct vb2_queue *q, struct media_device *mdev, struct v4l2_buffer *b) { struct media_request *req = NULL; struct vb2_buffer *vb; int ret; if (vb2_fileio_is_active(q)) { dprintk(q, 1, "file io in progress\n"); return -EBUSY; } vb = vb2_get_buffer(q, b->index); if (!vb) { dprintk(q, 1, "can't find the requested buffer %u\n", b->index); return -EINVAL; } ret = vb2_queue_or_prepare_buf(q, mdev, vb, b, false, &req); if (ret) return ret; ret = vb2_core_qbuf(q, vb, b, req); if (req) media_request_put(req); return ret; } EXPORT_SYMBOL_GPL(vb2_qbuf); int vb2_dqbuf(struct vb2_queue *q, struct v4l2_buffer *b, bool nonblocking) { int ret; if (vb2_fileio_is_active(q)) { dprintk(q, 1, "file io in progress\n"); return -EBUSY; } if (b->type != q->type) { dprintk(q, 1, "invalid buffer type\n"); return -EINVAL; } ret = vb2_core_dqbuf(q, NULL, b, nonblocking); if (!q->is_output && b->flags & V4L2_BUF_FLAG_DONE && b->flags & V4L2_BUF_FLAG_LAST) q->last_buffer_dequeued = true; /* * After calling the VIDIOC_DQBUF V4L2_BUF_FLAG_DONE must be * cleared. 
*/ b->flags &= ~V4L2_BUF_FLAG_DONE; return ret; } EXPORT_SYMBOL_GPL(vb2_dqbuf); int vb2_streamon(struct vb2_queue *q, enum v4l2_buf_type type) { if (vb2_fileio_is_active(q)) { dprintk(q, 1, "file io in progress\n"); return -EBUSY; } return vb2_core_streamon(q, type); } EXPORT_SYMBOL_GPL(vb2_streamon); int vb2_streamoff(struct vb2_queue *q, enum v4l2_buf_type type) { if (vb2_fileio_is_active(q)) { dprintk(q, 1, "file io in progress\n"); return -EBUSY; } return vb2_core_streamoff(q, type); } EXPORT_SYMBOL_GPL(vb2_streamoff); int vb2_expbuf(struct vb2_queue *q, struct v4l2_exportbuffer *eb) { struct vb2_buffer *vb; vb = vb2_get_buffer(q, eb->index); if (!vb) { dprintk(q, 1, "can't find the requested buffer %u\n", eb->index); return -EINVAL; } return vb2_core_expbuf(q, &eb->fd, eb->type, vb, eb->plane, eb->flags); } EXPORT_SYMBOL_GPL(vb2_expbuf); int vb2_queue_init_name(struct vb2_queue *q, const char *name) { /* vb2_memory should match with v4l2_memory */ BUILD_BUG_ON(VB2_MEMORY_MMAP != (int)V4L2_MEMORY_MMAP); BUILD_BUG_ON(VB2_MEMORY_USERPTR != (int)V4L2_MEMORY_USERPTR); BUILD_BUG_ON(VB2_MEMORY_DMABUF != (int)V4L2_MEMORY_DMABUF); /* * Sanity check */ if (WARN_ON(!q) || WARN_ON(q->timestamp_flags & ~(V4L2_BUF_FLAG_TIMESTAMP_MASK | V4L2_BUF_FLAG_TSTAMP_SRC_MASK))) return -EINVAL; /* Warn that the driver should choose an appropriate timestamp type */ WARN_ON((q->timestamp_flags & V4L2_BUF_FLAG_TIMESTAMP_MASK) == V4L2_BUF_FLAG_TIMESTAMP_UNKNOWN); if (q->buf_struct_size == 0) q->buf_struct_size = sizeof(struct vb2_v4l2_buffer); q->buf_ops = &v4l2_buf_ops; q->is_multiplanar = V4L2_TYPE_IS_MULTIPLANAR(q->type); q->is_output = V4L2_TYPE_IS_OUTPUT(q->type); q->copy_timestamp = (q->timestamp_flags & V4L2_BUF_FLAG_TIMESTAMP_MASK) == V4L2_BUF_FLAG_TIMESTAMP_COPY; /* * For compatibility with vb1: if QBUF hasn't been called yet, then * return EPOLLERR as well. This only affects capture queues, output * queues will always initialize waiting_for_buffers to false. */ q->quirk_poll_must_check_waiting_for_buffers = true; if (name) strscpy(q->name, name, sizeof(q->name)); else q->name[0] = '\0'; return vb2_core_queue_init(q); } EXPORT_SYMBOL_GPL(vb2_queue_init_name); int vb2_queue_init(struct vb2_queue *q) { return vb2_queue_init_name(q, NULL); } EXPORT_SYMBOL_GPL(vb2_queue_init); void vb2_queue_release(struct vb2_queue *q) { vb2_core_queue_release(q); } EXPORT_SYMBOL_GPL(vb2_queue_release); int vb2_queue_change_type(struct vb2_queue *q, unsigned int type) { if (type == q->type) return 0; if (vb2_is_busy(q)) return -EBUSY; q->type = type; return 0; } EXPORT_SYMBOL_GPL(vb2_queue_change_type); __poll_t vb2_poll(struct vb2_queue *q, struct file *file, poll_table *wait) { struct v4l2_fh *fh = file_to_v4l2_fh(file); __poll_t res; res = vb2_core_poll(q, file, wait); poll_wait(file, &fh->wait, wait); if (v4l2_event_pending(fh)) res |= EPOLLPRI; return res; } EXPORT_SYMBOL_GPL(vb2_poll); /* * The following functions are not part of the vb2 core API, but are helper * functions that plug into struct v4l2_ioctl_ops, struct v4l2_file_operations * and struct vb2_ops. * They contain boilerplate code that most if not all drivers have to do * and so they simplify the driver code. 
*/ /* vb2 ioctl helpers */ int vb2_ioctl_remove_bufs(struct file *file, void *priv, struct v4l2_remove_buffers *d) { struct video_device *vdev = video_devdata(file); if (vdev->queue->type != d->type) return -EINVAL; if (d->count == 0) return 0; if (vb2_queue_is_busy(vdev->queue, file)) return -EBUSY; return vb2_core_remove_bufs(vdev->queue, d->index, d->count); } EXPORT_SYMBOL_GPL(vb2_ioctl_remove_bufs); int vb2_ioctl_reqbufs(struct file *file, void *priv, struct v4l2_requestbuffers *p) { struct video_device *vdev = video_devdata(file); int res = vb2_verify_memory_type(vdev->queue, p->memory, p->type); u32 flags = p->flags; vb2_set_flags_and_caps(vdev->queue, p->memory, &flags, &p->capabilities, NULL); p->flags = flags; if (res) return res; if (vb2_queue_is_busy(vdev->queue, file)) return -EBUSY; res = vb2_core_reqbufs(vdev->queue, p->memory, p->flags, &p->count); /* If count == 0, then the owner has released all buffers and he is no longer owner of the queue. Otherwise we have a new owner. */ if (res == 0) vdev->queue->owner = p->count ? file->private_data : NULL; return res; } EXPORT_SYMBOL_GPL(vb2_ioctl_reqbufs); int vb2_ioctl_create_bufs(struct file *file, void *priv, struct v4l2_create_buffers *p) { struct video_device *vdev = video_devdata(file); int res = vb2_verify_memory_type(vdev->queue, p->memory, p->format.type); p->index = vb2_get_num_buffers(vdev->queue); vb2_set_flags_and_caps(vdev->queue, p->memory, &p->flags, &p->capabilities, &p->max_num_buffers); /* * If count == 0, then just check if memory and type are valid. * Any -EBUSY result from vb2_verify_memory_type can be mapped to 0. */ if (p->count == 0) return res != -EBUSY ? res : 0; if (res) return res; if (vb2_queue_is_busy(vdev->queue, file)) return -EBUSY; res = vb2_create_bufs(vdev->queue, p); if (res == 0) vdev->queue->owner = file->private_data; return res; } EXPORT_SYMBOL_GPL(vb2_ioctl_create_bufs); int vb2_ioctl_prepare_buf(struct file *file, void *priv, struct v4l2_buffer *p) { struct video_device *vdev = video_devdata(file); if (vb2_queue_is_busy(vdev->queue, file)) return -EBUSY; return vb2_prepare_buf(vdev->queue, vdev->v4l2_dev->mdev, p); } EXPORT_SYMBOL_GPL(vb2_ioctl_prepare_buf); int vb2_ioctl_querybuf(struct file *file, void *priv, struct v4l2_buffer *p) { struct video_device *vdev = video_devdata(file); /* No need to call vb2_queue_is_busy(), anyone can query buffers. 
*/ return vb2_querybuf(vdev->queue, p); } EXPORT_SYMBOL_GPL(vb2_ioctl_querybuf); int vb2_ioctl_qbuf(struct file *file, void *priv, struct v4l2_buffer *p) { struct video_device *vdev = video_devdata(file); if (vb2_queue_is_busy(vdev->queue, file)) return -EBUSY; return vb2_qbuf(vdev->queue, vdev->v4l2_dev->mdev, p); } EXPORT_SYMBOL_GPL(vb2_ioctl_qbuf); int vb2_ioctl_dqbuf(struct file *file, void *priv, struct v4l2_buffer *p) { struct video_device *vdev = video_devdata(file); if (vb2_queue_is_busy(vdev->queue, file)) return -EBUSY; return vb2_dqbuf(vdev->queue, p, file->f_flags & O_NONBLOCK); } EXPORT_SYMBOL_GPL(vb2_ioctl_dqbuf); int vb2_ioctl_streamon(struct file *file, void *priv, enum v4l2_buf_type i) { struct video_device *vdev = video_devdata(file); if (vb2_queue_is_busy(vdev->queue, file)) return -EBUSY; return vb2_streamon(vdev->queue, i); } EXPORT_SYMBOL_GPL(vb2_ioctl_streamon); int vb2_ioctl_streamoff(struct file *file, void *priv, enum v4l2_buf_type i) { struct video_device *vdev = video_devdata(file); if (vb2_queue_is_busy(vdev->queue, file)) return -EBUSY; return vb2_streamoff(vdev->queue, i); } EXPORT_SYMBOL_GPL(vb2_ioctl_streamoff); int vb2_ioctl_expbuf(struct file *file, void *priv, struct v4l2_exportbuffer *p) { struct video_device *vdev = video_devdata(file); if (vb2_queue_is_busy(vdev->queue, file)) return -EBUSY; return vb2_expbuf(vdev->queue, p); } EXPORT_SYMBOL_GPL(vb2_ioctl_expbuf); /* v4l2_file_operations helpers */ int vb2_fop_mmap(struct file *file, struct vm_area_struct *vma) { struct video_device *vdev = video_devdata(file); return vb2_mmap(vdev->queue, vma); } EXPORT_SYMBOL_GPL(vb2_fop_mmap); int _vb2_fop_release(struct file *file, struct mutex *lock) { struct video_device *vdev = video_devdata(file); if (lock) mutex_lock(lock); if (!vdev->queue->owner || file->private_data == vdev->queue->owner) { vb2_queue_release(vdev->queue); vdev->queue->owner = NULL; } if (lock) mutex_unlock(lock); return v4l2_fh_release(file); } EXPORT_SYMBOL_GPL(_vb2_fop_release); int vb2_fop_release(struct file *file) { struct video_device *vdev = video_devdata(file); struct mutex *lock = vdev->queue->lock ? vdev->queue->lock : vdev->lock; return _vb2_fop_release(file, lock); } EXPORT_SYMBOL_GPL(vb2_fop_release); ssize_t vb2_fop_write(struct file *file, const char __user *buf, size_t count, loff_t *ppos) { struct video_device *vdev = video_devdata(file); struct mutex *lock = vdev->queue->lock ? vdev->queue->lock : vdev->lock; int err = -EBUSY; if (!(vdev->queue->io_modes & VB2_WRITE)) return -EINVAL; if (lock && mutex_lock_interruptible(lock)) return -ERESTARTSYS; if (vb2_queue_is_busy(vdev->queue, file)) goto exit; err = vb2_write(vdev->queue, buf, count, ppos, file->f_flags & O_NONBLOCK); if (vdev->queue->fileio) vdev->queue->owner = file->private_data; exit: if (lock) mutex_unlock(lock); return err; } EXPORT_SYMBOL_GPL(vb2_fop_write); ssize_t vb2_fop_read(struct file *file, char __user *buf, size_t count, loff_t *ppos) { struct video_device *vdev = video_devdata(file); struct mutex *lock = vdev->queue->lock ? 
vdev->queue->lock : vdev->lock; int err = -EBUSY; if (!(vdev->queue->io_modes & VB2_READ)) return -EINVAL; if (lock && mutex_lock_interruptible(lock)) return -ERESTARTSYS; if (vb2_queue_is_busy(vdev->queue, file)) goto exit; vdev->queue->owner = file->private_data; err = vb2_read(vdev->queue, buf, count, ppos, file->f_flags & O_NONBLOCK); if (!vdev->queue->fileio) vdev->queue->owner = NULL; exit: if (lock) mutex_unlock(lock); return err; } EXPORT_SYMBOL_GPL(vb2_fop_read); __poll_t vb2_fop_poll(struct file *file, poll_table *wait) { struct video_device *vdev = video_devdata(file); struct vb2_queue *q = vdev->queue; struct mutex *lock = q->lock ? q->lock : vdev->lock; __poll_t res; void *fileio; /* * If this helper doesn't know how to lock, then you shouldn't be using * it but you should write your own. */ WARN_ON(!lock); if (lock && mutex_lock_interruptible(lock)) return EPOLLERR; fileio = q->fileio; res = vb2_poll(vdev->queue, file, wait); /* If fileio was started, then we have a new queue owner. */ if (!fileio && q->fileio) q->owner = file->private_data; if (lock) mutex_unlock(lock); return res; } EXPORT_SYMBOL_GPL(vb2_fop_poll); #ifndef CONFIG_MMU unsigned long vb2_fop_get_unmapped_area(struct file *file, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags) { struct video_device *vdev = video_devdata(file); return vb2_get_unmapped_area(vdev->queue, addr, len, pgoff, flags); } EXPORT_SYMBOL_GPL(vb2_fop_get_unmapped_area); #endif void vb2_video_unregister_device(struct video_device *vdev) { /* Check if vdev was ever registered at all */ if (!vdev || !video_is_registered(vdev)) return; /* * Calling this function only makes sense if vdev->queue is set. * If it is NULL, then just call video_unregister_device() instead. */ WARN_ON(!vdev->queue); /* * Take a reference to the device since video_unregister_device() * calls device_unregister(), but we don't want that to release * the device since we want to clean up the queue first. */ get_device(&vdev->dev); video_unregister_device(vdev); if (vdev->queue) { struct mutex *lock = vdev->queue->lock ? vdev->queue->lock : vdev->lock; if (lock) mutex_lock(lock); vb2_queue_release(vdev->queue); vdev->queue->owner = NULL; if (lock) mutex_unlock(lock); } /* * Now we put the device, and in most cases this will release * everything. */ put_device(&vdev->dev); } EXPORT_SYMBOL_GPL(vb2_video_unregister_device); /* vb2_ops helpers. Only use if vq->lock is non-NULL. */ void vb2_ops_wait_prepare(struct vb2_queue *vq) { mutex_unlock(vq->lock); } EXPORT_SYMBOL_GPL(vb2_ops_wait_prepare); void vb2_ops_wait_finish(struct vb2_queue *vq) { mutex_lock(vq->lock); } EXPORT_SYMBOL_GPL(vb2_ops_wait_finish); /* * Note that this function is called during validation time and * thus the req_queue_mutex is held to ensure no request objects * can be added or deleted while validating. So there is no need * to protect the objects list. */ int vb2_request_validate(struct media_request *req) { struct media_request_object *obj; int ret = 0; if (!vb2_request_buffer_cnt(req)) return -ENOENT; list_for_each_entry(obj, &req->objects, list) { if (!obj->ops->prepare) continue; ret = obj->ops->prepare(obj); if (ret) break; } if (ret) { list_for_each_entry_continue_reverse(obj, &req->objects, list) if (obj->ops->unprepare) obj->ops->unprepare(obj); return ret; } return 0; } EXPORT_SYMBOL_GPL(vb2_request_validate); void vb2_request_queue(struct media_request *req) { struct media_request_object *obj, *obj_safe; /* * Queue all objects. 
Note that buffer objects are at the end of the * objects list, after all other object types. Once buffer objects * are queued, the driver might delete them immediately (if the driver * processes the buffer at once), so we have to use * list_for_each_entry_safe() to handle the case where the object we * queue is deleted. */ list_for_each_entry_safe(obj, obj_safe, &req->objects, list) if (obj->ops->queue) obj->ops->queue(obj); } EXPORT_SYMBOL_GPL(vb2_request_queue); MODULE_DESCRIPTION("Driver helper framework for Video for Linux 2"); MODULE_AUTHOR("Pawel Osciak <pawel@osciak.com>, Marek Szyprowski"); MODULE_LICENSE("GPL"); |
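/*
 * A hedged sketch (not part of videobuf2-v4l2.c) of how a driver typically
 * wires the vb2_ioctl_*() and vb2_fop_*() helpers above into its ops tables
 * and initialises its queue with vb2_queue_init_name(). The "mydrv" names,
 * the vmalloc memops, the capture buffer type and the serialising mutex are
 * illustrative choices; a real driver also fills in format ioctls and its
 * own vb2_ops callbacks.
 */
#include <media/v4l2-dev.h>
#include <media/v4l2-fh.h>
#include <media/v4l2-ioctl.h>
#include <media/videobuf2-v4l2.h>
#include <media/videobuf2-vmalloc.h>

static const struct v4l2_ioctl_ops mydrv_ioctl_ops = {
	.vidioc_reqbufs		= vb2_ioctl_reqbufs,
	.vidioc_create_bufs	= vb2_ioctl_create_bufs,
	.vidioc_prepare_buf	= vb2_ioctl_prepare_buf,
	.vidioc_querybuf	= vb2_ioctl_querybuf,
	.vidioc_qbuf		= vb2_ioctl_qbuf,
	.vidioc_dqbuf		= vb2_ioctl_dqbuf,
	.vidioc_expbuf		= vb2_ioctl_expbuf,
	.vidioc_streamon	= vb2_ioctl_streamon,
	.vidioc_streamoff	= vb2_ioctl_streamoff,
};

static const struct v4l2_file_operations mydrv_fops = {
	.owner		= THIS_MODULE,
	.open		= v4l2_fh_open,
	.release	= vb2_fop_release,
	.unlocked_ioctl	= video_ioctl2,
	.mmap		= vb2_fop_mmap,
	.poll		= vb2_fop_poll,
};

/* my_qops and my_mutex are assumed to be defined elsewhere in the driver. */
static int mydrv_init_queue(struct vb2_queue *q, void *drv_priv,
			    const struct vb2_ops *my_qops,
			    struct mutex *my_mutex)
{
	q->type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
	q->io_modes = VB2_MMAP | VB2_DMABUF | VB2_READ;
	q->drv_priv = drv_priv;
	q->buf_struct_size = sizeof(struct vb2_v4l2_buffer);
	q->ops = my_qops;
	q->mem_ops = &vb2_vmalloc_memops;
	q->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;
	q->lock = my_mutex;	/* lets the vb2_fop_*() helpers serialise for us */
	return vb2_queue_init_name(q, "mydrv");
}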
// SPDX-License-Identifier: GPL-2.0-or-later /* * ECB: Electronic CodeBook mode * * Copyright (c) 2006 Herbert Xu <herbert@gondor.apana.org.au> */ #include <crypto/internal/cipher.h> #include <crypto/internal/skcipher.h> #include <linux/err.h> #include <linux/init.h> #include <linux/kernel.h> #include <linux/module.h> #include <linux/slab.h> static int crypto_ecb_crypt(struct crypto_cipher *cipher, const u8 *src, u8 *dst, unsigned nbytes, bool final, void (*fn)(struct crypto_tfm *, u8 *, const u8 *)) { const unsigned int bsize = crypto_cipher_blocksize(cipher); while (nbytes >= bsize) { fn(crypto_cipher_tfm(cipher), dst, src); src += bsize; dst += bsize; nbytes -= bsize; } return nbytes && final ? -EINVAL : nbytes; } static int crypto_ecb_encrypt2(struct crypto_lskcipher *tfm, const u8 *src, u8 *dst, unsigned len, u8 *iv, u32 flags) { struct crypto_cipher **ctx = crypto_lskcipher_ctx(tfm); struct crypto_cipher *cipher = *ctx; return crypto_ecb_crypt(cipher, src, dst, len, flags & CRYPTO_LSKCIPHER_FLAG_FINAL, crypto_cipher_alg(cipher)->cia_encrypt); } static int crypto_ecb_decrypt2(struct crypto_lskcipher *tfm, const u8 *src, u8 *dst, unsigned len, u8 *iv, u32 flags) { struct crypto_cipher **ctx = crypto_lskcipher_ctx(tfm); struct crypto_cipher *cipher = *ctx; return crypto_ecb_crypt(cipher, src, dst, len, flags & CRYPTO_LSKCIPHER_FLAG_FINAL, crypto_cipher_alg(cipher)->cia_decrypt); } static int lskcipher_setkey_simple2(struct crypto_lskcipher *tfm, const u8 *key, unsigned int keylen) { struct crypto_cipher **ctx = crypto_lskcipher_ctx(tfm); struct crypto_cipher *cipher = *ctx; crypto_cipher_clear_flags(cipher, CRYPTO_TFM_REQ_MASK); crypto_cipher_set_flags(cipher, crypto_lskcipher_get_flags(tfm) & CRYPTO_TFM_REQ_MASK); return crypto_cipher_setkey(cipher, key, keylen); } static int lskcipher_init_tfm_simple2(struct crypto_lskcipher *tfm) { struct lskcipher_instance *inst = lskcipher_alg_instance(tfm); struct crypto_cipher **ctx = crypto_lskcipher_ctx(tfm); struct crypto_cipher_spawn *spawn; struct crypto_cipher *cipher; spawn = lskcipher_instance_ctx(inst); cipher = crypto_spawn_cipher(spawn); if (IS_ERR(cipher)) return PTR_ERR(cipher); *ctx = cipher; return 0; } static void lskcipher_exit_tfm_simple2(struct crypto_lskcipher *tfm) { struct crypto_cipher **ctx = crypto_lskcipher_ctx(tfm); crypto_free_cipher(*ctx); } static void lskcipher_free_instance_simple2(struct lskcipher_instance *inst) { crypto_drop_cipher(lskcipher_instance_ctx(inst)); kfree(inst); } static struct lskcipher_instance *lskcipher_alloc_instance_simple2( struct crypto_template *tmpl, struct rtattr **tb) { struct crypto_cipher_spawn *spawn; struct lskcipher_instance *inst; struct crypto_alg *cipher_alg; u32
mask; int err; err = crypto_check_attr_type(tb, CRYPTO_ALG_TYPE_LSKCIPHER, &mask); if (err) return ERR_PTR(err); inst = kzalloc(sizeof(*inst) + sizeof(*spawn), GFP_KERNEL); if (!inst) return ERR_PTR(-ENOMEM); spawn = lskcipher_instance_ctx(inst); err = crypto_grab_cipher(spawn, lskcipher_crypto_instance(inst), crypto_attr_alg_name(tb[1]), 0, mask); if (err) goto err_free_inst; cipher_alg = crypto_spawn_cipher_alg(spawn); err = crypto_inst_setname(lskcipher_crypto_instance(inst), tmpl->name, cipher_alg); if (err) goto err_free_inst; inst->free = lskcipher_free_instance_simple2; /* Default algorithm properties, can be overridden */ inst->alg.co.base.cra_blocksize = cipher_alg->cra_blocksize; inst->alg.co.base.cra_alignmask = cipher_alg->cra_alignmask; inst->alg.co.base.cra_priority = cipher_alg->cra_priority; inst->alg.co.min_keysize = cipher_alg->cra_cipher.cia_min_keysize; inst->alg.co.max_keysize = cipher_alg->cra_cipher.cia_max_keysize; inst->alg.co.ivsize = cipher_alg->cra_blocksize; /* Use struct crypto_cipher * by default, can be overridden */ inst->alg.co.base.cra_ctxsize = sizeof(struct crypto_cipher *); inst->alg.setkey = lskcipher_setkey_simple2; inst->alg.init = lskcipher_init_tfm_simple2; inst->alg.exit = lskcipher_exit_tfm_simple2; return inst; err_free_inst: lskcipher_free_instance_simple2(inst); return ERR_PTR(err); } static int crypto_ecb_create2(struct crypto_template *tmpl, struct rtattr **tb) { struct lskcipher_instance *inst; int err; inst = lskcipher_alloc_instance_simple2(tmpl, tb); if (IS_ERR(inst)) return PTR_ERR(inst); /* ECB mode doesn't take an IV */ inst->alg.co.ivsize = 0; inst->alg.encrypt = crypto_ecb_encrypt2; inst->alg.decrypt = crypto_ecb_decrypt2; err = lskcipher_register_instance(tmpl, inst); if (err) inst->free(inst); return err; } static int crypto_ecb_create(struct crypto_template *tmpl, struct rtattr **tb) { struct crypto_lskcipher_spawn *spawn; struct lskcipher_alg *cipher_alg; struct lskcipher_instance *inst; int err; inst = lskcipher_alloc_instance_simple(tmpl, tb); if (IS_ERR(inst)) { err = crypto_ecb_create2(tmpl, tb); return err; } spawn = lskcipher_instance_ctx(inst); cipher_alg = crypto_lskcipher_spawn_alg(spawn); /* ECB mode doesn't take an IV */ inst->alg.co.ivsize = 0; if (cipher_alg->co.ivsize) return -EINVAL; inst->alg.co.base.cra_ctxsize = cipher_alg->co.base.cra_ctxsize; inst->alg.setkey = cipher_alg->setkey; inst->alg.encrypt = cipher_alg->encrypt; inst->alg.decrypt = cipher_alg->decrypt; inst->alg.init = cipher_alg->init; inst->alg.exit = cipher_alg->exit; err = lskcipher_register_instance(tmpl, inst); if (err) inst->free(inst); return err; } static struct crypto_template crypto_ecb_tmpl = { .name = "ecb", .create = crypto_ecb_create, .module = THIS_MODULE, }; static int __init crypto_ecb_module_init(void) { return crypto_register_template(&crypto_ecb_tmpl); } static void __exit crypto_ecb_module_exit(void) { crypto_unregister_template(&crypto_ecb_tmpl); } module_init(crypto_ecb_module_init); module_exit(crypto_ecb_module_exit); MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("ECB block cipher mode of operation"); MODULE_ALIAS_CRYPTO("ecb"); MODULE_IMPORT_NS("CRYPTO_INTERNAL"); |
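/*
 * A hedged usage sketch (not part of ecb.c) showing how another kernel module
 * might use an instance produced by this template, e.g. "ecb(aes)", through
 * the generic skcipher request API. The function name and the all-0x41 test
 * key are made up for illustration; error handling is kept minimal.
 */
#include <crypto/skcipher.h>
#include <linux/crypto.h>
#include <linux/err.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>
#include <linux/string.h>

static int ecb_aes_encrypt_demo(void)
{
	u8 key[16];			/* arbitrary 128-bit AES test key */
	struct crypto_skcipher *tfm;
	struct skcipher_request *req;
	struct scatterlist sg;
	DECLARE_CRYPTO_WAIT(wait);
	u8 *buf;
	int err;

	tfm = crypto_alloc_skcipher("ecb(aes)", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	memset(key, 0x41, sizeof(key));
	err = crypto_skcipher_setkey(tfm, key, sizeof(key));
	if (err)
		goto out_free_tfm;

	/* One AES block; heap memory because it is fed to a scatterlist. */
	buf = kzalloc(16, GFP_KERNEL);
	if (!buf) {
		err = -ENOMEM;
		goto out_free_tfm;
	}

	req = skcipher_request_alloc(tfm, GFP_KERNEL);
	if (!req) {
		err = -ENOMEM;
		goto out_free_buf;
	}

	sg_init_one(&sg, buf, 16);
	skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG |
				      CRYPTO_TFM_REQ_MAY_SLEEP,
				      crypto_req_done, &wait);
	/* ECB takes no IV, hence the NULL iv argument. */
	skcipher_request_set_crypt(req, &sg, &sg, 16, NULL);
	err = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);

	skcipher_request_free(req);
out_free_buf:
	kfree(buf);
out_free_tfm:
	crypto_free_skcipher(tfm);
	return err;
}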
// SPDX-License-Identifier: GPL-2.0-only /* * linux/fs/exec.c * * Copyright (C) 1991, 1992 Linus Torvalds */ /* * #!-checking implemented by tytso. */ /* * Demand-loading implemented 01.12.91 - no need to read anything but * the header into memory. The inode of the executable is put into * "current->executable", and page faults do the actual loading. Clean. * * Once more I can proudly say that linux stood up to being changed: it * was less than 2 hours work to get demand-loading completely implemented. * * Demand loading changed July 1993 by Eric Youngdale. Use mmap instead, * current->executable is only used by the procfs. This allows a dispatch * table to check for several different types of binary formats. We keep * trying until we recognize the file or we run out of supported binary * formats.
*/ #include <linux/kernel_read_file.h> #include <linux/slab.h> #include <linux/file.h> #include <linux/fdtable.h> #include <linux/mm.h> #include <linux/stat.h> #include <linux/fcntl.h> #include <linux/swap.h> #include <linux/string.h> #include <linux/init.h> #include <linux/sched/mm.h> #include <linux/sched/coredump.h> #include <linux/sched/signal.h> #include <linux/sched/numa_balancing.h> #include <linux/sched/task.h> #include <linux/pagemap.h> #include <linux/perf_event.h> #include <linux/highmem.h> #include <linux/spinlock.h> #include <linux/key.h> #include <linux/personality.h> #include <linux/binfmts.h> #include <linux/utsname.h> #include <linux/pid_namespace.h> #include <linux/module.h> #include <linux/namei.h> #include <linux/mount.h> #include <linux/security.h> #include <linux/syscalls.h> #include <linux/tsacct_kern.h> #include <linux/cn_proc.h> #include <linux/audit.h> #include <linux/kmod.h> #include <linux/fsnotify.h> #include <linux/fs_struct.h> #include <linux/oom.h> #include <linux/compat.h> #include <linux/vmalloc.h> #include <linux/io_uring.h> #include <linux/syscall_user_dispatch.h> #include <linux/coredump.h> #include <linux/time_namespace.h> #include <linux/user_events.h> #include <linux/rseq.h> #include <linux/ksm.h> #include <linux/uaccess.h> #include <asm/mmu_context.h> #include <asm/tlb.h> #include <trace/events/task.h> #include "internal.h" #include <trace/events/sched.h> /* For vma exec functions. */ #include "../mm/internal.h" static int bprm_creds_from_file(struct linux_binprm *bprm); int suid_dumpable = 0; static LIST_HEAD(formats); static DEFINE_RWLOCK(binfmt_lock); void __register_binfmt(struct linux_binfmt * fmt, int insert) { write_lock(&binfmt_lock); insert ? list_add(&fmt->lh, &formats) : list_add_tail(&fmt->lh, &formats); write_unlock(&binfmt_lock); } EXPORT_SYMBOL(__register_binfmt); void unregister_binfmt(struct linux_binfmt * fmt) { write_lock(&binfmt_lock); list_del(&fmt->lh); write_unlock(&binfmt_lock); } EXPORT_SYMBOL(unregister_binfmt); static inline void put_binfmt(struct linux_binfmt * fmt) { module_put(fmt->module); } bool path_noexec(const struct path *path) { /* If it's an anonymous inode make sure that we catch any shenanigans. */ VFS_WARN_ON_ONCE(IS_ANON_FILE(d_inode(path->dentry)) && !(path->mnt->mnt_sb->s_iflags & SB_I_NOEXEC)); return (path->mnt->mnt_flags & MNT_NOEXEC) || (path->mnt->mnt_sb->s_iflags & SB_I_NOEXEC); } #ifdef CONFIG_MMU /* * The nascent bprm->mm is not visible until exec_mmap() but it can * use a lot of memory, account these pages in current->mm temporary * for oom_badness()->get_mm_rss(). Once exec succeeds or fails, we * change the counter back via acct_arg_size(0). */ static void acct_arg_size(struct linux_binprm *bprm, unsigned long pages) { struct mm_struct *mm = current->mm; long diff = (long)(pages - bprm->vma_pages); if (!mm || !diff) return; bprm->vma_pages = pages; add_mm_counter(mm, MM_ANONPAGES, diff); } static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos, int write) { struct page *page; struct vm_area_struct *vma = bprm->vma; struct mm_struct *mm = bprm->mm; int ret; /* * Avoid relying on expanding the stack down in GUP (which * does not work for STACK_GROWSUP anyway), and just do it * ahead of time. */ if (!mmap_read_lock_maybe_expand(mm, vma, pos, write)) return NULL; /* * We are doing an exec(). 'current' is the process * doing the exec and 'mm' is the new process's mm. */ ret = get_user_pages_remote(mm, pos, 1, write ? 
FOLL_WRITE : 0, &page, NULL); mmap_read_unlock(mm); if (ret <= 0) return NULL; if (write) acct_arg_size(bprm, vma_pages(vma)); return page; } static void put_arg_page(struct page *page) { put_page(page); } static void free_arg_pages(struct linux_binprm *bprm) { } static void flush_arg_page(struct linux_binprm *bprm, unsigned long pos, struct page *page) { flush_cache_page(bprm->vma, pos, page_to_pfn(page)); } static bool valid_arg_len(struct linux_binprm *bprm, long len) { return len <= MAX_ARG_STRLEN; } #else static inline void acct_arg_size(struct linux_binprm *bprm, unsigned long pages) { } static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos, int write) { struct page *page; page = bprm->page[pos / PAGE_SIZE]; if (!page && write) { page = alloc_page(GFP_HIGHUSER|__GFP_ZERO); if (!page) return NULL; bprm->page[pos / PAGE_SIZE] = page; } return page; } static void put_arg_page(struct page *page) { } static void free_arg_page(struct linux_binprm *bprm, int i) { if (bprm->page[i]) { __free_page(bprm->page[i]); bprm->page[i] = NULL; } } static void free_arg_pages(struct linux_binprm *bprm) { int i; for (i = 0; i < MAX_ARG_PAGES; i++) free_arg_page(bprm, i); } static void flush_arg_page(struct linux_binprm *bprm, unsigned long pos, struct page *page) { } static bool valid_arg_len(struct linux_binprm *bprm, long len) { return len <= bprm->p; } #endif /* CONFIG_MMU */ /* * Create a new mm_struct and populate it with a temporary stack * vm_area_struct. We don't have enough context at this point to set the stack * flags, permissions, and offset, so we use temporary values. We'll update * them later in setup_arg_pages(). */ static int bprm_mm_init(struct linux_binprm *bprm) { int err; struct mm_struct *mm = NULL; bprm->mm = mm = mm_alloc(); err = -ENOMEM; if (!mm) goto err; /* Save current stack limit for all calculations made during exec. */ task_lock(current->group_leader); bprm->rlim_stack = current->signal->rlim[RLIMIT_STACK]; task_unlock(current->group_leader); #ifndef CONFIG_MMU bprm->p = PAGE_SIZE * MAX_ARG_PAGES - sizeof(void *); #else err = create_init_stack_vma(bprm->mm, &bprm->vma, &bprm->p); if (err) goto err; #endif return 0; err: if (mm) { bprm->mm = NULL; mmdrop(mm); } return err; } struct user_arg_ptr { #ifdef CONFIG_COMPAT bool is_compat; #endif union { const char __user *const __user *native; #ifdef CONFIG_COMPAT const compat_uptr_t __user *compat; #endif } ptr; }; static const char __user *get_user_arg_ptr(struct user_arg_ptr argv, int nr) { const char __user *native; #ifdef CONFIG_COMPAT if (unlikely(argv.is_compat)) { compat_uptr_t compat; if (get_user(compat, argv.ptr.compat + nr)) return ERR_PTR(-EFAULT); return compat_ptr(compat); } #endif if (get_user(native, argv.ptr.native + nr)) return ERR_PTR(-EFAULT); return native; } /* * count() counts the number of strings in array ARGV. 
*/ static int count(struct user_arg_ptr argv, int max) { int i = 0; if (argv.ptr.native != NULL) { for (;;) { const char __user *p = get_user_arg_ptr(argv, i); if (!p) break; if (IS_ERR(p)) return -EFAULT; if (i >= max) return -E2BIG; ++i; if (fatal_signal_pending(current)) return -ERESTARTNOHAND; cond_resched(); } } return i; } static int count_strings_kernel(const char *const *argv) { int i; if (!argv) return 0; for (i = 0; argv[i]; ++i) { if (i >= MAX_ARG_STRINGS) return -E2BIG; if (fatal_signal_pending(current)) return -ERESTARTNOHAND; cond_resched(); } return i; } static inline int bprm_set_stack_limit(struct linux_binprm *bprm, unsigned long limit) { #ifdef CONFIG_MMU /* Avoid a pathological bprm->p. */ if (bprm->p < limit) return -E2BIG; bprm->argmin = bprm->p - limit; #endif return 0; } static inline bool bprm_hit_stack_limit(struct linux_binprm *bprm) { #ifdef CONFIG_MMU return bprm->p < bprm->argmin; #else return false; #endif } /* * Calculate bprm->argmin from: * - _STK_LIM * - ARG_MAX * - bprm->rlim_stack.rlim_cur * - bprm->argc * - bprm->envc * - bprm->p */ static int bprm_stack_limits(struct linux_binprm *bprm) { unsigned long limit, ptr_size; /* * Limit to 1/4 of the max stack size or 3/4 of _STK_LIM * (whichever is smaller) for the argv+env strings. * This ensures that: * - the remaining binfmt code will not run out of stack space, * - the program will have a reasonable amount of stack left * to work from. */ limit = _STK_LIM / 4 * 3; limit = min(limit, bprm->rlim_stack.rlim_cur / 4); /* * We've historically supported up to 32 pages (ARG_MAX) * of argument strings even with small stacks */ limit = max_t(unsigned long, limit, ARG_MAX); /* Reject totally pathological counts. */ if (bprm->argc < 0 || bprm->envc < 0) return -E2BIG; /* * We must account for the size of all the argv and envp pointers to * the argv and envp strings, since they will also take up space in * the stack. They aren't stored until much later when we can't * signal to the parent that the child has run out of stack space. * Instead, calculate it here so it's possible to fail gracefully. * * In the case of argc = 0, make sure there is space for adding a * empty string (which will bump argc to 1), to ensure confused * userspace programs don't start processing from argv[1], thinking * argc can never be 0, to keep them from walking envp by accident. * See do_execveat_common(). */ if (check_add_overflow(max(bprm->argc, 1), bprm->envc, &ptr_size) || check_mul_overflow(ptr_size, sizeof(void *), &ptr_size)) return -E2BIG; if (limit <= ptr_size) return -E2BIG; limit -= ptr_size; return bprm_set_stack_limit(bprm, limit); } /* * 'copy_strings()' copies argument/environment strings from the old * processes's memory to the new process's stack. The call to get_user_pages() * ensures the destination page is created and not swapped out. */ static int copy_strings(int argc, struct user_arg_ptr argv, struct linux_binprm *bprm) { struct page *kmapped_page = NULL; char *kaddr = NULL; unsigned long kpos = 0; int ret; while (argc-- > 0) { const char __user *str; int len; unsigned long pos; ret = -EFAULT; str = get_user_arg_ptr(argv, argc); if (IS_ERR(str)) goto out; len = strnlen_user(str, MAX_ARG_STRLEN); if (!len) goto out; ret = -E2BIG; if (!valid_arg_len(bprm, len)) goto out; /* We're going to work our way backwards. 
*/ pos = bprm->p; str += len; bprm->p -= len; if (bprm_hit_stack_limit(bprm)) goto out; while (len > 0) { int offset, bytes_to_copy; if (fatal_signal_pending(current)) { ret = -ERESTARTNOHAND; goto out; } cond_resched(); offset = pos % PAGE_SIZE; if (offset == 0) offset = PAGE_SIZE; bytes_to_copy = offset; if (bytes_to_copy > len) bytes_to_copy = len; offset -= bytes_to_copy; pos -= bytes_to_copy; str -= bytes_to_copy; len -= bytes_to_copy; if (!kmapped_page || kpos != (pos & PAGE_MASK)) { struct page *page; page = get_arg_page(bprm, pos, 1); if (!page) { ret = -E2BIG; goto out; } if (kmapped_page) { flush_dcache_page(kmapped_page); kunmap_local(kaddr); put_arg_page(kmapped_page); } kmapped_page = page; kaddr = kmap_local_page(kmapped_page); kpos = pos & PAGE_MASK; flush_arg_page(bprm, kpos, kmapped_page); } if (copy_from_user(kaddr+offset, str, bytes_to_copy)) { ret = -EFAULT; goto out; } } } ret = 0; out: if (kmapped_page) { flush_dcache_page(kmapped_page); kunmap_local(kaddr); put_arg_page(kmapped_page); } return ret; } /* * Copy and argument/environment string from the kernel to the processes stack. */ int copy_string_kernel(const char *arg, struct linux_binprm *bprm) { int len = strnlen(arg, MAX_ARG_STRLEN) + 1 /* terminating NUL */; unsigned long pos = bprm->p; if (len == 0) return -EFAULT; if (!valid_arg_len(bprm, len)) return -E2BIG; /* We're going to work our way backwards. */ arg += len; bprm->p -= len; if (bprm_hit_stack_limit(bprm)) return -E2BIG; while (len > 0) { unsigned int bytes_to_copy = min_t(unsigned int, len, min_not_zero(offset_in_page(pos), PAGE_SIZE)); struct page *page; pos -= bytes_to_copy; arg -= bytes_to_copy; len -= bytes_to_copy; page = get_arg_page(bprm, pos, 1); if (!page) return -E2BIG; flush_arg_page(bprm, pos & PAGE_MASK, page); memcpy_to_page(page, offset_in_page(pos), arg, bytes_to_copy); put_arg_page(page); } return 0; } EXPORT_SYMBOL(copy_string_kernel); static int copy_strings_kernel(int argc, const char *const *argv, struct linux_binprm *bprm) { while (argc-- > 0) { int ret = copy_string_kernel(argv[argc], bprm); if (ret < 0) return ret; if (fatal_signal_pending(current)) return -ERESTARTNOHAND; cond_resched(); } return 0; } #ifdef CONFIG_MMU /* * Finalizes the stack vm_area_struct. The flags and permissions are updated, * the stack is optionally relocated, and some extra space is added. */ int setup_arg_pages(struct linux_binprm *bprm, unsigned long stack_top, int executable_stack) { int ret; unsigned long stack_shift; struct mm_struct *mm = current->mm; struct vm_area_struct *vma = bprm->vma; struct vm_area_struct *prev = NULL; vm_flags_t vm_flags; unsigned long stack_base; unsigned long stack_size; unsigned long stack_expand; unsigned long rlim_stack; struct mmu_gather tlb; struct vma_iterator vmi; #ifdef CONFIG_STACK_GROWSUP /* Limit stack size */ stack_base = bprm->rlim_stack.rlim_max; stack_base = calc_max_stack_size(stack_base); /* Add space for stack randomization. */ if (current->flags & PF_RANDOMIZE) stack_base += (STACK_RND_MASK << PAGE_SHIFT); /* Make sure we didn't let the argument array grow too large. 
*/ if (vma->vm_end - vma->vm_start > stack_base) return -ENOMEM; stack_base = PAGE_ALIGN(stack_top - stack_base); stack_shift = vma->vm_start - stack_base; mm->arg_start = bprm->p - stack_shift; bprm->p = vma->vm_end - stack_shift; #else stack_top = arch_align_stack(stack_top); stack_top = PAGE_ALIGN(stack_top); if (unlikely(stack_top < mmap_min_addr) || unlikely(vma->vm_end - vma->vm_start >= stack_top - mmap_min_addr)) return -ENOMEM; stack_shift = vma->vm_end - stack_top; bprm->p -= stack_shift; mm->arg_start = bprm->p; #endif bprm->exec -= stack_shift; if (mmap_write_lock_killable(mm)) return -EINTR; vm_flags = VM_STACK_FLAGS; /* * Adjust stack execute permissions; explicitly enable for * EXSTACK_ENABLE_X, disable for EXSTACK_DISABLE_X and leave alone * (arch default) otherwise. */ if (unlikely(executable_stack == EXSTACK_ENABLE_X)) vm_flags |= VM_EXEC; else if (executable_stack == EXSTACK_DISABLE_X) vm_flags &= ~VM_EXEC; vm_flags |= mm->def_flags; vm_flags |= VM_STACK_INCOMPLETE_SETUP; vma_iter_init(&vmi, mm, vma->vm_start); tlb_gather_mmu(&tlb, mm); ret = mprotect_fixup(&vmi, &tlb, vma, &prev, vma->vm_start, vma->vm_end, vm_flags); tlb_finish_mmu(&tlb); if (ret) goto out_unlock; BUG_ON(prev != vma); if (unlikely(vm_flags & VM_EXEC)) { pr_warn_once("process '%pD4' started with executable stack\n", bprm->file); } /* Move stack pages down in memory. */ if (stack_shift) { /* * During bprm_mm_init(), we create a temporary stack at STACK_TOP_MAX. Once * the binfmt code determines where the new stack should reside, we shift it to * its final location. */ ret = relocate_vma_down(vma, stack_shift); if (ret) goto out_unlock; } /* mprotect_fixup is overkill to remove the temporary stack flags */ vm_flags_clear(vma, VM_STACK_INCOMPLETE_SETUP); stack_expand = 131072UL; /* randomly 32*4k (or 2*64k) pages */ stack_size = vma->vm_end - vma->vm_start; /* * Align this down to a page boundary as expand_stack * will align it up. */ rlim_stack = bprm->rlim_stack.rlim_cur & PAGE_MASK; stack_expand = min(rlim_stack, stack_size + stack_expand); #ifdef CONFIG_STACK_GROWSUP stack_base = vma->vm_start + stack_expand; #else stack_base = vma->vm_end - stack_expand; #endif current->mm->start_stack = bprm->p; ret = expand_stack_locked(vma, stack_base); if (ret) ret = -EFAULT; out_unlock: mmap_write_unlock(mm); return ret; } EXPORT_SYMBOL(setup_arg_pages); #else /* * Transfer the program arguments and environment from the holding pages * onto the stack. The provided stack pointer is adjusted accordingly. */ int transfer_args_to_stack(struct linux_binprm *bprm, unsigned long *sp_location) { unsigned long index, stop, sp; int ret = 0; stop = bprm->p >> PAGE_SHIFT; sp = *sp_location; for (index = MAX_ARG_PAGES - 1; index >= stop; index--) { unsigned int offset = index == stop ? bprm->p & ~PAGE_MASK : 0; char *src = kmap_local_page(bprm->page[index]) + offset; sp -= PAGE_SIZE - offset; if (copy_to_user((void *) sp, src, PAGE_SIZE - offset) != 0) ret = -EFAULT; kunmap_local(src); if (ret) goto out; } bprm->exec += *sp_location - MAX_ARG_PAGES * PAGE_SIZE; *sp_location = sp; out: return ret; } EXPORT_SYMBOL(transfer_args_to_stack); #endif /* CONFIG_MMU */ /* * On success, caller must call do_close_execat() on the returned * struct file to close it. 
*/ static struct file *do_open_execat(int fd, struct filename *name, int flags) { int err; struct file *file __free(fput) = NULL; struct open_flags open_exec_flags = { .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC, .acc_mode = MAY_EXEC, .intent = LOOKUP_OPEN, .lookup_flags = LOOKUP_FOLLOW, }; if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_EXECVE_CHECK)) != 0) return ERR_PTR(-EINVAL); if (flags & AT_SYMLINK_NOFOLLOW) open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW; if (flags & AT_EMPTY_PATH) open_exec_flags.lookup_flags |= LOOKUP_EMPTY; file = do_filp_open(fd, name, &open_exec_flags); if (IS_ERR(file)) return file; if (path_noexec(&file->f_path)) return ERR_PTR(-EACCES); /* * In the past the regular type check was here. It moved to may_open() in * 633fb6ac3980 ("exec: move S_ISREG() check earlier"). Since then it is * an invariant that all non-regular files error out before we get here. */ if (WARN_ON_ONCE(!S_ISREG(file_inode(file)->i_mode))) return ERR_PTR(-EACCES); err = exe_file_deny_write_access(file); if (err) return ERR_PTR(err); return no_free_ptr(file); } /** * open_exec - Open a path name for execution * * @name: path name to open with the intent of executing it. * * Returns ERR_PTR on failure or allocated struct file on success. * * As this is a wrapper for the internal do_open_execat(), callers * must call exe_file_allow_write_access() before fput() on release. Also see * do_close_execat(). */ struct file *open_exec(const char *name) { struct filename *filename = getname_kernel(name); struct file *f = ERR_CAST(filename); if (!IS_ERR(filename)) { f = do_open_execat(AT_FDCWD, filename, 0); putname(filename); } return f; } EXPORT_SYMBOL(open_exec); #if defined(CONFIG_BINFMT_FLAT) || defined(CONFIG_BINFMT_ELF_FDPIC) ssize_t read_code(struct file *file, unsigned long addr, loff_t pos, size_t len) { ssize_t res = vfs_read(file, (void __user *)addr, len, &pos); if (res > 0) flush_icache_user_range(addr, addr + len); return res; } EXPORT_SYMBOL(read_code); #endif /* * Maps the mm_struct mm into the current task struct. * On success, this function returns with exec_update_lock * held for writing. */ static int exec_mmap(struct mm_struct *mm) { struct task_struct *tsk; struct mm_struct *old_mm, *active_mm; int ret; /* Notify parent that we're no longer interested in the old VM */ tsk = current; old_mm = current->mm; exec_mm_release(tsk, old_mm); ret = down_write_killable(&tsk->signal->exec_update_lock); if (ret) return ret; if (old_mm) { /* * If there is a pending fatal signal perhaps a signal * whose default action is to create a coredump get * out and die instead of going through with the exec. */ ret = mmap_read_lock_killable(old_mm); if (ret) { up_write(&tsk->signal->exec_update_lock); return ret; } } task_lock(tsk); membarrier_exec_mmap(mm); local_irq_disable(); active_mm = tsk->active_mm; tsk->active_mm = mm; tsk->mm = mm; mm_init_cid(mm, tsk); /* * This prevents preemption while active_mm is being loaded and * it and mm are being updated, which could cause problems for * lazy tlb mm refcounting when these are updated by context * switches. Not all architectures can handle irqs off over * activate_mm yet. 
*/ if (!IS_ENABLED(CONFIG_ARCH_WANT_IRQS_OFF_ACTIVATE_MM)) local_irq_enable(); activate_mm(active_mm, mm); if (IS_ENABLED(CONFIG_ARCH_WANT_IRQS_OFF_ACTIVATE_MM)) local_irq_enable(); lru_gen_add_mm(mm); task_unlock(tsk); lru_gen_use_mm(mm); if (old_mm) { mmap_read_unlock(old_mm); BUG_ON(active_mm != old_mm); setmax_mm_hiwater_rss(&tsk->signal->maxrss, old_mm); mm_update_next_owner(old_mm); mmput(old_mm); return 0; } mmdrop_lazy_tlb(active_mm); return 0; } static int de_thread(struct task_struct *tsk) { struct signal_struct *sig = tsk->signal; struct sighand_struct *oldsighand = tsk->sighand; spinlock_t *lock = &oldsighand->siglock; if (thread_group_empty(tsk)) goto no_thread_group; /* * Kill all other threads in the thread group. */ spin_lock_irq(lock); if ((sig->flags & SIGNAL_GROUP_EXIT) || sig->group_exec_task) { /* * Another group action in progress, just * return so that the signal is processed. */ spin_unlock_irq(lock); return -EAGAIN; } sig->group_exec_task = tsk; sig->notify_count = zap_other_threads(tsk); if (!thread_group_leader(tsk)) sig->notify_count--; while (sig->notify_count) { __set_current_state(TASK_KILLABLE); spin_unlock_irq(lock); schedule(); if (__fatal_signal_pending(tsk)) goto killed; spin_lock_irq(lock); } spin_unlock_irq(lock); /* * At this point all other threads have exited, all we have to * do is to wait for the thread group leader to become inactive, * and to assume its PID: */ if (!thread_group_leader(tsk)) { struct task_struct *leader = tsk->group_leader; for (;;) { cgroup_threadgroup_change_begin(tsk); write_lock_irq(&tasklist_lock); /* * Do this under tasklist_lock to ensure that * exit_notify() can't miss ->group_exec_task */ sig->notify_count = -1; if (likely(leader->exit_state)) break; __set_current_state(TASK_KILLABLE); write_unlock_irq(&tasklist_lock); cgroup_threadgroup_change_end(tsk); schedule(); if (__fatal_signal_pending(tsk)) goto killed; } /* * The only record we have of the real-time age of a * process, regardless of execs it's done, is start_time. * All the past CPU time is accumulated in signal_struct * from sister threads now dead. But in this non-leader * exec, nothing survives from the original leader thread, * whose birth marks the true age of this process now. * When we take on its identity by switching to its PID, we * also take its birthdate (always earlier than our own). */ tsk->start_time = leader->start_time; tsk->start_boottime = leader->start_boottime; BUG_ON(!same_thread_group(leader, tsk)); /* * An exec() starts a new thread group with the * TGID of the previous thread group. Rehash the * two threads with a switched PID, and release * the former thread group leader: */ /* Become a process group leader with the old leader's pid. * The old leader becomes a thread of the this thread group. */ exchange_tids(tsk, leader); transfer_pid(leader, tsk, PIDTYPE_TGID); transfer_pid(leader, tsk, PIDTYPE_PGID); transfer_pid(leader, tsk, PIDTYPE_SID); list_replace_rcu(&leader->tasks, &tsk->tasks); list_replace_init(&leader->sibling, &tsk->sibling); tsk->group_leader = tsk; leader->group_leader = tsk; tsk->exit_signal = SIGCHLD; leader->exit_signal = -1; BUG_ON(leader->exit_state != EXIT_ZOMBIE); leader->exit_state = EXIT_DEAD; /* * We are going to release_task()->ptrace_unlink() silently, * the tracer can sleep in do_wait(). EXIT_DEAD guarantees * the tracer won't block again waiting for this thread. 
*/ if (unlikely(leader->ptrace)) __wake_up_parent(leader, leader->parent); write_unlock_irq(&tasklist_lock); cgroup_threadgroup_change_end(tsk); release_task(leader); } sig->group_exec_task = NULL; sig->notify_count = 0; no_thread_group: /* we have changed execution domain */ tsk->exit_signal = SIGCHLD; BUG_ON(!thread_group_leader(tsk)); return 0; killed: /* protects against exit_notify() and __exit_signal() */ read_lock(&tasklist_lock); sig->group_exec_task = NULL; sig->notify_count = 0; read_unlock(&tasklist_lock); return -EAGAIN; } /* * This function makes sure the current process has its own signal table, * so that flush_signal_handlers can later reset the handlers without * disturbing other processes. (Other processes might share the signal * table via the CLONE_SIGHAND option to clone().) */ static int unshare_sighand(struct task_struct *me) { struct sighand_struct *oldsighand = me->sighand; if (refcount_read(&oldsighand->count) != 1) { struct sighand_struct *newsighand; /* * This ->sighand is shared with the CLONE_SIGHAND * but not CLONE_THREAD task, switch to the new one. */ newsighand = kmem_cache_alloc(sighand_cachep, GFP_KERNEL); if (!newsighand) return -ENOMEM; refcount_set(&newsighand->count, 1); write_lock_irq(&tasklist_lock); spin_lock(&oldsighand->siglock); memcpy(newsighand->action, oldsighand->action, sizeof(newsighand->action)); rcu_assign_pointer(me->sighand, newsighand); spin_unlock(&oldsighand->siglock); write_unlock_irq(&tasklist_lock); __cleanup_sighand(oldsighand); } return 0; } /* * This is unlocked -- the string will always be NUL-terminated, but * may show overlapping contents if racing concurrent reads. */ void __set_task_comm(struct task_struct *tsk, const char *buf, bool exec) { size_t len = min(strlen(buf), sizeof(tsk->comm) - 1); trace_task_rename(tsk, buf); memcpy(tsk->comm, buf, len); memset(&tsk->comm[len], 0, sizeof(tsk->comm) - len); perf_event_comm(tsk, exec); } /* * Calling this is the point of no return. None of the failures will be * seen by userspace since either the process is already taking a fatal * signal (via de_thread() or coredump), or will have SEGV raised * (after exec_mmap()) by search_binary_handler (see below). */ int begin_new_exec(struct linux_binprm * bprm) { struct task_struct *me = current; int retval; /* Once we are committed compute the creds */ retval = bprm_creds_from_file(bprm); if (retval) return retval; /* * This tracepoint marks the point before flushing the old exec where * the current task is still unchanged, but errors are fatal (point of * no return). The later "sched_process_exec" tracepoint is called after * the current task has successfully switched to the new exec. */ trace_sched_prepare_exec(current, bprm); /* * Ensure all future errors are fatal. */ bprm->point_of_no_return = true; /* Make this the only thread in the thread group */ retval = de_thread(me); if (retval) goto out; /* see the comment in check_unsafe_exec() */ current->fs->in_exec = 0; /* * Cancel any io_uring activity across execve */ io_uring_task_cancel(); /* Ensure the files table is not shared. */ retval = unshare_files(); if (retval) goto out; /* * Must be called _before_ exec_mmap() as bprm->mm is * not visible until then. Doing it here also ensures * we don't race against replace_mm_exe_file(). 
*/ retval = set_mm_exe_file(bprm->mm, bprm->file); if (retval) goto out; /* If the binary is not readable then enforce mm->dumpable=0 */ would_dump(bprm, bprm->file); if (bprm->have_execfd) would_dump(bprm, bprm->executable); /* * Release all of the old mmap stuff */ acct_arg_size(bprm, 0); retval = exec_mmap(bprm->mm); if (retval) goto out; bprm->mm = NULL; retval = exec_task_namespaces(); if (retval) goto out_unlock; #ifdef CONFIG_POSIX_TIMERS spin_lock_irq(&me->sighand->siglock); posix_cpu_timers_exit(me); spin_unlock_irq(&me->sighand->siglock); exit_itimers(me); flush_itimer_signals(); #endif /* * Make the signal table private. */ retval = unshare_sighand(me); if (retval) goto out_unlock; me->flags &= ~(PF_RANDOMIZE | PF_FORKNOEXEC | PF_NOFREEZE | PF_NO_SETAFFINITY); flush_thread(); me->personality &= ~bprm->per_clear; clear_syscall_work_syscall_user_dispatch(me); /* * We have to apply CLOEXEC before we change whether the process is * dumpable (in setup_new_exec) to avoid a race with a process in userspace * trying to access the should-be-closed file descriptors of a process * undergoing exec(2). */ do_close_on_exec(me->files); if (bprm->secureexec) { /* Make sure parent cannot signal privileged process. */ me->pdeath_signal = 0; /* * For secureexec, reset the stack limit to sane default to * avoid bad behavior from the prior rlimits. This has to * happen before arch_pick_mmap_layout(), which examines * RLIMIT_STACK, but after the point of no return to avoid * needing to clean up the change on failure. */ if (bprm->rlim_stack.rlim_cur > _STK_LIM) bprm->rlim_stack.rlim_cur = _STK_LIM; } me->sas_ss_sp = me->sas_ss_size = 0; /* * Figure out dumpability. Note that this checking only of current * is wrong, but userspace depends on it. This should be testing * bprm->secureexec instead. */ if (bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP || !(uid_eq(current_euid(), current_uid()) && gid_eq(current_egid(), current_gid()))) set_dumpable(current->mm, suid_dumpable); else set_dumpable(current->mm, SUID_DUMP_USER); perf_event_exec(); /* * If the original filename was empty, alloc_bprm() made up a path * that will probably not be useful to admins running ps or similar. * Let's fix it up to be something reasonable. */ if (bprm->comm_from_dentry) { /* * Hold RCU lock to keep the name from being freed behind our back. * Use acquire semantics to make sure the terminating NUL from * __d_alloc() is seen. * * Note, we're deliberately sloppy here. We don't need to care about * detecting a concurrent rename and just want a terminated name. */ rcu_read_lock(); __set_task_comm(me, smp_load_acquire(&bprm->file->f_path.dentry->d_name.name), true); rcu_read_unlock(); } else { __set_task_comm(me, kbasename(bprm->filename), true); } /* An exec changes our domain. We are no longer part of the thread group */ WRITE_ONCE(me->self_exec_id, me->self_exec_id + 1); flush_signal_handlers(me, 0); retval = set_cred_ucounts(bprm->cred); if (retval < 0) goto out_unlock; /* * install the new credentials for this executable */ security_bprm_committing_creds(bprm); commit_creds(bprm->cred); bprm->cred = NULL; /* * Disable monitoring for regular users * when executing setuid binaries. Must * wait until new credentials are committed * by commit_creds() above */ if (get_dumpable(me->mm) != SUID_DUMP_USER) perf_event_exit_task(me); /* * cred_guard_mutex must be held at least to this point to prevent * ptrace_attach() from altering our determination of the task's * credentials; any time after this it may be unlocked. 
*/ security_bprm_committed_creds(bprm); /* Pass the opened binary to the interpreter. */ if (bprm->have_execfd) { retval = get_unused_fd_flags(0); if (retval < 0) goto out_unlock; fd_install(retval, bprm->executable); bprm->executable = NULL; bprm->execfd = retval; } return 0; out_unlock: up_write(&me->signal->exec_update_lock); if (!bprm->cred) mutex_unlock(&me->signal->cred_guard_mutex); out: return retval; } EXPORT_SYMBOL(begin_new_exec); void would_dump(struct linux_binprm *bprm, struct file *file) { struct inode *inode = file_inode(file); struct mnt_idmap *idmap = file_mnt_idmap(file); if (inode_permission(idmap, inode, MAY_READ) < 0) { struct user_namespace *old, *user_ns; bprm->interp_flags |= BINPRM_FLAGS_ENFORCE_NONDUMP; /* Ensure mm->user_ns contains the executable */ user_ns = old = bprm->mm->user_ns; while ((user_ns != &init_user_ns) && !privileged_wrt_inode_uidgid(user_ns, idmap, inode)) user_ns = user_ns->parent; if (old != user_ns) { bprm->mm->user_ns = get_user_ns(user_ns); put_user_ns(old); } } } EXPORT_SYMBOL(would_dump); void setup_new_exec(struct linux_binprm * bprm) { /* Setup things that can depend upon the personality */ struct task_struct *me = current; arch_pick_mmap_layout(me->mm, &bprm->rlim_stack); arch_setup_new_exec(); /* Set the new mm task size. We have to do that late because it may * depend on TIF_32BIT which is only updated in flush_thread() on * some architectures like powerpc */ me->mm->task_size = TASK_SIZE; up_write(&me->signal->exec_update_lock); mutex_unlock(&me->signal->cred_guard_mutex); } EXPORT_SYMBOL(setup_new_exec); /* Runs immediately before start_thread() takes over. */ void finalize_exec(struct linux_binprm *bprm) { /* Store any stack rlimit changes before starting thread. */ task_lock(current->group_leader); current->signal->rlim[RLIMIT_STACK] = bprm->rlim_stack; task_unlock(current->group_leader); } EXPORT_SYMBOL(finalize_exec); /* * Prepare credentials and lock ->cred_guard_mutex. * setup_new_exec() commits the new creds and drops the lock. * Or, if exec fails before, free_bprm() should release ->cred * and unlock. */ static int prepare_bprm_creds(struct linux_binprm *bprm) { if (mutex_lock_interruptible(¤t->signal->cred_guard_mutex)) return -ERESTARTNOINTR; bprm->cred = prepare_exec_creds(); if (likely(bprm->cred)) return 0; mutex_unlock(¤t->signal->cred_guard_mutex); return -ENOMEM; } /* Matches do_open_execat() */ static void do_close_execat(struct file *file) { if (!file) return; exe_file_allow_write_access(file); fput(file); } static void free_bprm(struct linux_binprm *bprm) { if (bprm->mm) { acct_arg_size(bprm, 0); mmput(bprm->mm); } free_arg_pages(bprm); if (bprm->cred) { /* in case exec fails before de_thread() succeeds */ current->fs->in_exec = 0; mutex_unlock(¤t->signal->cred_guard_mutex); abort_creds(bprm->cred); } do_close_execat(bprm->file); if (bprm->executable) fput(bprm->executable); /* If a binfmt changed the interp, free it. 
*/ if (bprm->interp != bprm->filename) kfree(bprm->interp); kfree(bprm->fdpath); kfree(bprm); } static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int flags) { struct linux_binprm *bprm; struct file *file; int retval = -ENOMEM; file = do_open_execat(fd, filename, flags); if (IS_ERR(file)) return ERR_CAST(file); bprm = kzalloc(sizeof(*bprm), GFP_KERNEL); if (!bprm) { do_close_execat(file); return ERR_PTR(-ENOMEM); } bprm->file = file; if (fd == AT_FDCWD || filename->name[0] == '/') { bprm->filename = filename->name; } else { if (filename->name[0] == '\0') { bprm->fdpath = kasprintf(GFP_KERNEL, "/dev/fd/%d", fd); bprm->comm_from_dentry = 1; } else { bprm->fdpath = kasprintf(GFP_KERNEL, "/dev/fd/%d/%s", fd, filename->name); } if (!bprm->fdpath) goto out_free; /* * Record that a name derived from an O_CLOEXEC fd will be * inaccessible after exec. This allows the code in exec to * choose to fail when the executable is not mmaped into the * interpreter and an open file descriptor is not passed to * the interpreter. This makes for a better user experience * than having the interpreter start and then immediately fail * when it finds the executable is inaccessible. */ if (get_close_on_exec(fd)) bprm->interp_flags |= BINPRM_FLAGS_PATH_INACCESSIBLE; bprm->filename = bprm->fdpath; } bprm->interp = bprm->filename; /* * At this point, security_file_open() has already been called (with * __FMODE_EXEC) and access control checks for AT_EXECVE_CHECK will * stop just after the security_bprm_creds_for_exec() call in * bprm_execve(). Indeed, the kernel should not try to parse the * content of the file with exec_binprm() nor change the calling * thread, which means that the following security functions will not * be called: * - security_bprm_check() * - security_bprm_creds_from_file() * - security_bprm_committing_creds() * - security_bprm_committed_creds() */ bprm->is_check = !!(flags & AT_EXECVE_CHECK); retval = bprm_mm_init(bprm); if (!retval) return bprm; out_free: free_bprm(bprm); return ERR_PTR(retval); } int bprm_change_interp(const char *interp, struct linux_binprm *bprm) { /* If a binfmt changed the interp, free it first. */ if (bprm->interp != bprm->filename) kfree(bprm->interp); bprm->interp = kstrdup(interp, GFP_KERNEL); if (!bprm->interp) return -ENOMEM; return 0; } EXPORT_SYMBOL(bprm_change_interp); /* * determine how safe it is to execute the proposed program * - the caller must hold ->cred_guard_mutex to protect against * PTRACE_ATTACH or seccomp thread-sync */ static void check_unsafe_exec(struct linux_binprm *bprm) { struct task_struct *p = current, *t; unsigned n_fs; if (p->ptrace) bprm->unsafe |= LSM_UNSAFE_PTRACE; /* * This isn't strictly necessary, but it makes it harder for LSMs to * mess up. */ if (task_no_new_privs(current)) bprm->unsafe |= LSM_UNSAFE_NO_NEW_PRIVS; /* * If another task is sharing our fs, we cannot safely * suid exec because the differently privileged task * will be able to manipulate the current directory, etc. * It would be nice to force an unshare instead... * * Otherwise we set fs->in_exec = 1 to deny clone(CLONE_FS) * from another sub-thread until de_thread() succeeds, this * state is protected by cred_guard_mutex we hold. 
*/ n_fs = 1; read_seqlock_excl(&p->fs->seq); rcu_read_lock(); for_other_threads(p, t) { if (t->fs == p->fs) n_fs++; } rcu_read_unlock(); /* "users" and "in_exec" locked for copy_fs() */ if (p->fs->users > n_fs) bprm->unsafe |= LSM_UNSAFE_SHARE; else p->fs->in_exec = 1; read_sequnlock_excl(&p->fs->seq); } static void bprm_fill_uid(struct linux_binprm *bprm, struct file *file) { /* Handle suid and sgid on files */ struct mnt_idmap *idmap; struct inode *inode = file_inode(file); unsigned int mode; vfsuid_t vfsuid; vfsgid_t vfsgid; int err; if (!mnt_may_suid(file->f_path.mnt)) return; if (task_no_new_privs(current)) return; mode = READ_ONCE(inode->i_mode); if (!(mode & (S_ISUID|S_ISGID))) return; idmap = file_mnt_idmap(file); /* Be careful if suid/sgid is set */ inode_lock(inode); /* Atomically reload and check mode/uid/gid now that lock held. */ mode = inode->i_mode; vfsuid = i_uid_into_vfsuid(idmap, inode); vfsgid = i_gid_into_vfsgid(idmap, inode); err = inode_permission(idmap, inode, MAY_EXEC); inode_unlock(inode); /* Did the exec bit vanish out from under us? Give up. */ if (err) return; /* We ignore suid/sgid if there are no mappings for them in the ns */ if (!vfsuid_has_mapping(bprm->cred->user_ns, vfsuid) || !vfsgid_has_mapping(bprm->cred->user_ns, vfsgid)) return; if (mode & S_ISUID) { bprm->per_clear |= PER_CLEAR_ON_SETID; bprm->cred->euid = vfsuid_into_kuid(vfsuid); } if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) { bprm->per_clear |= PER_CLEAR_ON_SETID; bprm->cred->egid = vfsgid_into_kgid(vfsgid); } } /* * Compute brpm->cred based upon the final binary. */ static int bprm_creds_from_file(struct linux_binprm *bprm) { /* Compute creds based on which file? */ struct file *file = bprm->execfd_creds ? bprm->executable : bprm->file; bprm_fill_uid(bprm, file); return security_bprm_creds_from_file(bprm, file); } /* * Fill the binprm structure from the inode. * Read the first BINPRM_BUF_SIZE bytes * * This may be called multiple times for binary chains (scripts for example). */ static int prepare_binprm(struct linux_binprm *bprm) { loff_t pos = 0; memset(bprm->buf, 0, BINPRM_BUF_SIZE); return kernel_read(bprm->file, bprm->buf, BINPRM_BUF_SIZE, &pos); } /* * Arguments are '\0' separated strings found at the location bprm->p * points to; chop off the first by relocating brpm->p to right after * the first '\0' encountered. 
*/ int remove_arg_zero(struct linux_binprm *bprm) { unsigned long offset; char *kaddr; struct page *page; if (!bprm->argc) return 0; do { offset = bprm->p & ~PAGE_MASK; page = get_arg_page(bprm, bprm->p, 0); if (!page) return -EFAULT; kaddr = kmap_local_page(page); for (; offset < PAGE_SIZE && kaddr[offset]; offset++, bprm->p++) ; kunmap_local(kaddr); put_arg_page(page); } while (offset == PAGE_SIZE); bprm->p++; bprm->argc--; return 0; } EXPORT_SYMBOL(remove_arg_zero); /* * cycle the list of binary formats handler, until one recognizes the image */ static int search_binary_handler(struct linux_binprm *bprm) { struct linux_binfmt *fmt; int retval; retval = prepare_binprm(bprm); if (retval < 0) return retval; retval = security_bprm_check(bprm); if (retval) return retval; read_lock(&binfmt_lock); list_for_each_entry(fmt, &formats, lh) { if (!try_module_get(fmt->module)) continue; read_unlock(&binfmt_lock); retval = fmt->load_binary(bprm); read_lock(&binfmt_lock); put_binfmt(fmt); if (bprm->point_of_no_return || (retval != -ENOEXEC)) { read_unlock(&binfmt_lock); return retval; } } read_unlock(&binfmt_lock); return -ENOEXEC; } /* binfmt handlers will call back into begin_new_exec() on success. */ static int exec_binprm(struct linux_binprm *bprm) { pid_t old_pid, old_vpid; int ret, depth; /* Need to fetch pid before load_binary changes it */ old_pid = current->pid; rcu_read_lock(); old_vpid = task_pid_nr_ns(current, task_active_pid_ns(current->parent)); rcu_read_unlock(); /* This allows 4 levels of binfmt rewrites before failing hard. */ for (depth = 0;; depth++) { struct file *exec; if (depth > 5) return -ELOOP; ret = search_binary_handler(bprm); if (ret < 0) return ret; if (!bprm->interpreter) break; exec = bprm->file; bprm->file = bprm->interpreter; bprm->interpreter = NULL; exe_file_allow_write_access(exec); if (unlikely(bprm->have_execfd)) { if (bprm->executable) { fput(exec); return -ENOEXEC; } bprm->executable = exec; } else fput(exec); } audit_bprm(bprm); trace_sched_process_exec(current, old_pid, bprm); ptrace_event(PTRACE_EVENT_EXEC, old_vpid); proc_exec_connector(current); return 0; } static int bprm_execve(struct linux_binprm *bprm) { int retval; retval = prepare_bprm_creds(bprm); if (retval) return retval; /* * Check for unsafe execution states before exec_binprm(), which * will call back into begin_new_exec(), into bprm_creds_from_file(), * where setuid-ness is evaluated. */ check_unsafe_exec(bprm); current->in_execve = 1; sched_mm_cid_before_execve(current); sched_exec(); /* Set the unchanging part of bprm->cred */ retval = security_bprm_creds_for_exec(bprm); if (retval || bprm->is_check) goto out; retval = exec_binprm(bprm); if (retval < 0) goto out; sched_mm_cid_after_execve(current); rseq_execve(current); /* execve succeeded */ current->in_execve = 0; user_events_execve(current); acct_update_integrals(current); task_numa_free(current, false); return retval; out: /* * If past the point of no return ensure the code never * returns to the userspace process. Use an existing fatal * signal if present otherwise terminate the process with * SIGSEGV. 
*/ if (bprm->point_of_no_return && !fatal_signal_pending(current)) force_fatal_sig(SIGSEGV); sched_mm_cid_after_execve(current); rseq_set_notify_resume(current); current->in_execve = 0; return retval; } static int do_execveat_common(int fd, struct filename *filename, struct user_arg_ptr argv, struct user_arg_ptr envp, int flags) { struct linux_binprm *bprm; int retval; if (IS_ERR(filename)) return PTR_ERR(filename); /* * We move the actual failure in case of RLIMIT_NPROC excess from * set*uid() to execve() because too many poorly written programs * don't check setuid() return code. Here we additionally recheck * whether NPROC limit is still exceeded. */ if ((current->flags & PF_NPROC_EXCEEDED) && is_rlimit_overlimit(current_ucounts(), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) { retval = -EAGAIN; goto out_ret; } /* We're below the limit (still or again), so we don't want to make * further execve() calls fail. */ current->flags &= ~PF_NPROC_EXCEEDED; bprm = alloc_bprm(fd, filename, flags); if (IS_ERR(bprm)) { retval = PTR_ERR(bprm); goto out_ret; } retval = count(argv, MAX_ARG_STRINGS); if (retval < 0) goto out_free; bprm->argc = retval; retval = count(envp, MAX_ARG_STRINGS); if (retval < 0) goto out_free; bprm->envc = retval; retval = bprm_stack_limits(bprm); if (retval < 0) goto out_free; retval = copy_string_kernel(bprm->filename, bprm); if (retval < 0) goto out_free; bprm->exec = bprm->p; retval = copy_strings(bprm->envc, envp, bprm); if (retval < 0) goto out_free; retval = copy_strings(bprm->argc, argv, bprm); if (retval < 0) goto out_free; /* * When argv is empty, add an empty string ("") as argv[0] to * ensure confused userspace programs that start processing * from argv[1] won't end up walking envp. See also * bprm_stack_limits(). */ if (bprm->argc == 0) { retval = copy_string_kernel("", bprm); if (retval < 0) goto out_free; bprm->argc = 1; pr_warn_once("process '%s' launched '%s' with NULL argv: empty string added\n", current->comm, bprm->filename); } retval = bprm_execve(bprm); out_free: free_bprm(bprm); out_ret: putname(filename); return retval; } int kernel_execve(const char *kernel_filename, const char *const *argv, const char *const *envp) { struct filename *filename; struct linux_binprm *bprm; int fd = AT_FDCWD; int retval; /* It is non-sense for kernel threads to call execve */ if (WARN_ON_ONCE(current->flags & PF_KTHREAD)) return -EINVAL; filename = getname_kernel(kernel_filename); if (IS_ERR(filename)) return PTR_ERR(filename); bprm = alloc_bprm(fd, filename, 0); if (IS_ERR(bprm)) { retval = PTR_ERR(bprm); goto out_ret; } retval = count_strings_kernel(argv); if (WARN_ON_ONCE(retval == 0)) retval = -EINVAL; if (retval < 0) goto out_free; bprm->argc = retval; retval = count_strings_kernel(envp); if (retval < 0) goto out_free; bprm->envc = retval; retval = bprm_stack_limits(bprm); if (retval < 0) goto out_free; retval = copy_string_kernel(bprm->filename, bprm); if (retval < 0) goto out_free; bprm->exec = bprm->p; retval = copy_strings_kernel(bprm->envc, envp, bprm); if (retval < 0) goto out_free; retval = copy_strings_kernel(bprm->argc, argv, bprm); if (retval < 0) goto out_free; retval = bprm_execve(bprm); out_free: free_bprm(bprm); out_ret: putname(filename); return retval; } static int do_execve(struct filename *filename, const char __user *const __user *__argv, const char __user *const __user *__envp) { struct user_arg_ptr argv = { .ptr.native = __argv }; struct user_arg_ptr envp = { .ptr.native = __envp }; return do_execveat_common(AT_FDCWD, filename, argv, envp, 0); } 
static int do_execveat(int fd, struct filename *filename, const char __user *const __user *__argv, const char __user *const __user *__envp, int flags) { struct user_arg_ptr argv = { .ptr.native = __argv }; struct user_arg_ptr envp = { .ptr.native = __envp }; return do_execveat_common(fd, filename, argv, envp, flags); } #ifdef CONFIG_COMPAT static int compat_do_execve(struct filename *filename, const compat_uptr_t __user *__argv, const compat_uptr_t __user *__envp) { struct user_arg_ptr argv = { .is_compat = true, .ptr.compat = __argv, }; struct user_arg_ptr envp = { .is_compat = true, .ptr.compat = __envp, }; return do_execveat_common(AT_FDCWD, filename, argv, envp, 0); } static int compat_do_execveat(int fd, struct filename *filename, const compat_uptr_t __user *__argv, const compat_uptr_t __user *__envp, int flags) { struct user_arg_ptr argv = { .is_compat = true, .ptr.compat = __argv, }; struct user_arg_ptr envp = { .is_compat = true, .ptr.compat = __envp, }; return do_execveat_common(fd, filename, argv, envp, flags); } #endif void set_binfmt(struct linux_binfmt *new) { struct mm_struct *mm = current->mm; if (mm->binfmt) module_put(mm->binfmt->module); mm->binfmt = new; if (new) __module_get(new->module); } EXPORT_SYMBOL(set_binfmt); /* * set_dumpable stores three-value SUID_DUMP_* into mm->flags. */ void set_dumpable(struct mm_struct *mm, int value) { if (WARN_ON((unsigned)value > SUID_DUMP_ROOT)) return; __mm_flags_set_mask_dumpable(mm, value); } SYSCALL_DEFINE3(execve, const char __user *, filename, const char __user *const __user *, argv, const char __user *const __user *, envp) { return do_execve(getname(filename), argv, envp); } SYSCALL_DEFINE5(execveat, int, fd, const char __user *, filename, const char __user *const __user *, argv, const char __user *const __user *, envp, int, flags) { return do_execveat(fd, getname_uflags(filename, flags), argv, envp, flags); } #ifdef CONFIG_COMPAT COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename, const compat_uptr_t __user *, argv, const compat_uptr_t __user *, envp) { return compat_do_execve(getname(filename), argv, envp); } COMPAT_SYSCALL_DEFINE5(execveat, int, fd, const char __user *, filename, const compat_uptr_t __user *, argv, const compat_uptr_t __user *, envp, int, flags) { return compat_do_execveat(fd, getname_uflags(filename, flags), argv, envp, flags); } #endif #ifdef CONFIG_SYSCTL static int proc_dointvec_minmax_coredump(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { int error = proc_dointvec_minmax(table, write, buffer, lenp, ppos); if (!error && !write) validate_coredump_safety(); return error; } static const struct ctl_table fs_exec_sysctls[] = { { .procname = "suid_dumpable", .data = &suid_dumpable, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_minmax_coredump, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_TWO, }, }; static int __init init_fs_exec_sysctls(void) { register_sysctl_init("fs", fs_exec_sysctls); return 0; } fs_initcall(init_fs_exec_sysctls); #endif /* CONFIG_SYSCTL */ #ifdef CONFIG_EXEC_KUNIT_TEST #include "tests/exec_kunit.c" #endif |
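/*
 * Editor's illustration, not part of fs/exec.c: a minimal user-space sketch
 * of the execveat(2) entry point defined above, executing an already-open
 * file descriptor via AT_EMPTY_PATH (the case alloc_bprm() names as
 * "/dev/fd/<fd>"). The binary path "/bin/true" is an assumption for the
 * example; raw syscall(2) is used because older libcs lack an execveat()
 * wrapper.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
	char *argv[] = { "true", NULL };
	char *envp[] = { NULL };
	int fd = open("/bin/true", O_RDONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* Empty pathname + AT_EMPTY_PATH means "execute the fd itself". */
	syscall(SYS_execveat, fd, "", argv, envp, AT_EMPTY_PATH);
	perror("execveat");		/* reached only if the exec failed */
	return 1;
}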
// SPDX-License-Identifier: GPL-2.0-only
/*
 *
 * Copyright (C) 2005 Mike Isely <isely@pobox.com>
 */

#include "pvrusb2-std.h"
#include "pvrusb2-debug.h"
#include <asm/string.h>
#include <linux/slab.h>

struct std_name {
	const char *name;
	v4l2_std_id id;
};

#define CSTD_PAL \
	(V4L2_STD_PAL_B| \
	 V4L2_STD_PAL_B1| \
	 V4L2_STD_PAL_G| \
	 V4L2_STD_PAL_H| \
	 V4L2_STD_PAL_I| \
	 V4L2_STD_PAL_D| \
	 V4L2_STD_PAL_D1| \
	 V4L2_STD_PAL_K| \
	 V4L2_STD_PAL_M| \
	 V4L2_STD_PAL_N| \
	 V4L2_STD_PAL_Nc| \
	 V4L2_STD_PAL_60)

#define CSTD_NTSC \
	(V4L2_STD_NTSC_M| \
	 V4L2_STD_NTSC_M_JP| \
	 V4L2_STD_NTSC_M_KR| \
	 V4L2_STD_NTSC_443)

#define CSTD_ATSC \
	(V4L2_STD_ATSC_8_VSB| \
	 V4L2_STD_ATSC_16_VSB)

#define CSTD_SECAM \
	(V4L2_STD_SECAM_B| \
	 V4L2_STD_SECAM_D| \
	 V4L2_STD_SECAM_G| \
	 V4L2_STD_SECAM_H| \
	 V4L2_STD_SECAM_K| \
	 V4L2_STD_SECAM_K1| \
	 V4L2_STD_SECAM_L| \
	 V4L2_STD_SECAM_LC)

#define TSTD_B   (V4L2_STD_PAL_B|V4L2_STD_SECAM_B)
#define TSTD_B1  (V4L2_STD_PAL_B1)
#define TSTD_D   (V4L2_STD_PAL_D|V4L2_STD_SECAM_D)
#define TSTD_D1  (V4L2_STD_PAL_D1)
#define TSTD_G   (V4L2_STD_PAL_G|V4L2_STD_SECAM_G)
#define TSTD_H   (V4L2_STD_PAL_H|V4L2_STD_SECAM_H)
#define TSTD_I   (V4L2_STD_PAL_I)
#define TSTD_K   (V4L2_STD_PAL_K|V4L2_STD_SECAM_K)
#define TSTD_K1  (V4L2_STD_SECAM_K1)
#define TSTD_L   (V4L2_STD_SECAM_L)
#define TSTD_M   (V4L2_STD_PAL_M|V4L2_STD_NTSC_M)
#define TSTD_N   (V4L2_STD_PAL_N)
#define TSTD_Nc  (V4L2_STD_PAL_Nc)
#define TSTD_60  (V4L2_STD_PAL_60)

#define CSTD_ALL (CSTD_PAL|CSTD_NTSC|CSTD_ATSC|CSTD_SECAM)

/* Mapping of standard bits to color system */
static const struct std_name std_groups[] = {
	{"PAL",CSTD_PAL},
	{"NTSC",CSTD_NTSC},
	{"SECAM",CSTD_SECAM},
	{"ATSC",CSTD_ATSC},
};

/* Mapping of standard bits to modulation system */
static const struct std_name std_items[] = {
	{"B",TSTD_B},
	{"B1",TSTD_B1},
	{"D",TSTD_D},
	{"D1",TSTD_D1},
	{"G",TSTD_G},
	{"H",TSTD_H},
	{"I",TSTD_I},
	{"K",TSTD_K},
	{"K1",TSTD_K1},
	{"L",TSTD_L},
	{"LC",V4L2_STD_SECAM_LC},
	{"M",TSTD_M},
	{"Mj",V4L2_STD_NTSC_M_JP},
	{"443",V4L2_STD_NTSC_443},
	{"Mk",V4L2_STD_NTSC_M_KR},
	{"N",TSTD_N},
	{"Nc",TSTD_Nc},
	{"60",TSTD_60},
	{"8VSB",V4L2_STD_ATSC_8_VSB},
	{"16VSB",V4L2_STD_ATSC_16_VSB},
};

// Search an array of std_name structures and return a pointer to the
// element with the matching name.
static const struct std_name *find_std_name(const struct std_name *arrPtr, unsigned int arrSize, const char *bufPtr, unsigned int bufSize) { unsigned int idx; const struct std_name *p; for (idx = 0; idx < arrSize; idx++) { p = arrPtr + idx; if (strlen(p->name) != bufSize) continue; if (!memcmp(bufPtr,p->name,bufSize)) return p; } return NULL; } int pvr2_std_str_to_id(v4l2_std_id *idPtr,const char *bufPtr, unsigned int bufSize) { v4l2_std_id id = 0; v4l2_std_id cmsk = 0; v4l2_std_id t; int mMode = 0; unsigned int cnt; char ch; const struct std_name *sp; while (bufSize) { if (!mMode) { cnt = 0; while ((cnt < bufSize) && (bufPtr[cnt] != '-')) cnt++; if (cnt >= bufSize) return 0; // No more characters sp = find_std_name(std_groups, ARRAY_SIZE(std_groups), bufPtr,cnt); if (!sp) return 0; // Illegal color system name cnt++; bufPtr += cnt; bufSize -= cnt; mMode = !0; cmsk = sp->id; continue; } cnt = 0; while (cnt < bufSize) { ch = bufPtr[cnt]; if (ch == ';') { mMode = 0; break; } if (ch == '/') break; cnt++; } sp = find_std_name(std_items, ARRAY_SIZE(std_items), bufPtr,cnt); if (!sp) return 0; // Illegal modulation system ID t = sp->id & cmsk; if (!t) return 0; // Specific color + modulation system illegal id |= t; if (cnt < bufSize) cnt++; bufPtr += cnt; bufSize -= cnt; } if (idPtr) *idPtr = id; return !0; } unsigned int pvr2_std_id_to_str(char *bufPtr, unsigned int bufSize, v4l2_std_id id) { unsigned int idx1,idx2; const struct std_name *ip,*gp; int gfl,cfl; unsigned int c1,c2; cfl = 0; c1 = 0; for (idx1 = 0; idx1 < ARRAY_SIZE(std_groups); idx1++) { gp = std_groups + idx1; gfl = 0; for (idx2 = 0; idx2 < ARRAY_SIZE(std_items); idx2++) { ip = std_items + idx2; if (!(gp->id & ip->id & id)) continue; if (!gfl) { if (cfl) { c2 = scnprintf(bufPtr,bufSize,";"); c1 += c2; bufSize -= c2; bufPtr += c2; } cfl = !0; c2 = scnprintf(bufPtr,bufSize, "%s-",gp->name); gfl = !0; } else { c2 = scnprintf(bufPtr,bufSize,"/"); } c1 += c2; bufSize -= c2; bufPtr += c2; c2 = scnprintf(bufPtr,bufSize, ip->name); c1 += c2; bufSize -= c2; bufPtr += c2; } } return c1; } v4l2_std_id pvr2_std_get_usable(void) { return CSTD_ALL; } |
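/*
 * Editor's illustration, not part of pvrusb2-std.c: pvr2_std_str_to_id()
 * above accepts strings of the form "<group>-<item>[/<item>...]", optionally
 * repeated with ';' separators, e.g. "PAL-B/G/I;NTSC-M". A stand-alone
 * sketch of just that tokenization (no V4L2 bit mapping; the input string is
 * an assumption for the example) follows.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
	char buf[] = "PAL-B/G/I;NTSC-M";	/* example input, same grammar */
	char *save_group, *save_item;
	char *group = strtok_r(buf, ";", &save_group);

	while (group) {
		char *dash = strchr(group, '-');

		if (!dash) {
			fprintf(stderr, "missing '-' in \"%s\"\n", group);
			return 1;
		}
		*dash = '\0';
		/* Items after the '-' are separated by '/'. */
		for (char *item = strtok_r(dash + 1, "/", &save_item);
		     item != NULL;
		     item = strtok_r(NULL, "/", &save_item))
			printf("group %s, item %s\n", group, item);
		group = strtok_r(NULL, ";", &save_group);
	}
	return 0;
}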
// SPDX-License-Identifier: GPL-2.0
#include <linux/init.h>
#include <linux/async.h>
#include <linux/export.h>
#include <linux/fs.h>
#include <linux/slab.h>
#include <linux/types.h>
#include <linux/fcntl.h>
#include <linux/delay.h>
#include <linux/string.h>
#include <linux/dirent.h>
#include <linux/syscalls.h>
#include <linux/utime.h>
#include <linux/file.h>
#include <linux/kstrtox.h>
#include <linux/memblock.h>
#include <linux/mm.h>
#include <linux/namei.h>
#include
<linux/init_syscalls.h> #include <linux/umh.h> #include <linux/security.h> #include <linux/overflow.h> #include "do_mounts.h" #include "initramfs_internal.h" static __initdata bool csum_present; static __initdata u32 io_csum; static ssize_t __init xwrite(struct file *file, const unsigned char *p, size_t count, loff_t *pos) { ssize_t out = 0; /* sys_write only can write MAX_RW_COUNT aka 2G-4K bytes at most */ while (count) { ssize_t rv = kernel_write(file, p, count, pos); if (rv < 0) { if (rv == -EINTR || rv == -EAGAIN) continue; return out ? out : rv; } else if (rv == 0) break; if (csum_present) { ssize_t i; for (i = 0; i < rv; i++) io_csum += p[i]; } p += rv; out += rv; count -= rv; } return out; } static __initdata char *message; static void __init error(char *x) { if (!message) message = x; } #define panic_show_mem(fmt, ...) \ ({ show_mem(); panic(fmt, ##__VA_ARGS__); }) /* link hash */ #define N_ALIGN(len) ((((len) + 1) & ~3) + 2) static __initdata struct hash { int ino, minor, major; umode_t mode; struct hash *next; char name[N_ALIGN(PATH_MAX)]; } *head[32]; static __initdata bool hardlink_seen; static inline int hash(int major, int minor, int ino) { unsigned long tmp = ino + minor + (major << 3); tmp += tmp >> 5; return tmp & 31; } static char __init *find_link(int major, int minor, int ino, umode_t mode, char *name) { struct hash **p, *q; for (p = head + hash(major, minor, ino); *p; p = &(*p)->next) { if ((*p)->ino != ino) continue; if ((*p)->minor != minor) continue; if ((*p)->major != major) continue; if (((*p)->mode ^ mode) & S_IFMT) continue; return (*p)->name; } q = kmalloc(sizeof(struct hash), GFP_KERNEL); if (!q) panic_show_mem("can't allocate link hash entry"); q->major = major; q->minor = minor; q->ino = ino; q->mode = mode; strscpy(q->name, name); q->next = NULL; *p = q; hardlink_seen = true; return NULL; } static void __init free_hash(void) { struct hash **p, *q; for (p = head; hardlink_seen && p < head + 32; p++) { while (*p) { q = *p; *p = q->next; kfree(q); } } hardlink_seen = false; } #ifdef CONFIG_INITRAMFS_PRESERVE_MTIME static void __init do_utime(char *filename, time64_t mtime) { struct timespec64 t[2] = { { .tv_sec = mtime }, { .tv_sec = mtime } }; init_utimes(filename, t); } static void __init do_utime_path(const struct path *path, time64_t mtime) { struct timespec64 t[2] = { { .tv_sec = mtime }, { .tv_sec = mtime } }; vfs_utimes(path, t); } static __initdata LIST_HEAD(dir_list); struct dir_entry { struct list_head list; time64_t mtime; char name[]; }; static void __init dir_add(const char *name, size_t nlen, time64_t mtime) { struct dir_entry *de; de = kmalloc(struct_size(de, name, nlen), GFP_KERNEL); if (!de) panic_show_mem("can't allocate dir_entry buffer"); INIT_LIST_HEAD(&de->list); strscpy(de->name, name, nlen); de->mtime = mtime; list_add(&de->list, &dir_list); } static void __init dir_utime(void) { struct dir_entry *de, *tmp; list_for_each_entry_safe(de, tmp, &dir_list, list) { list_del(&de->list); do_utime(de->name, de->mtime); kfree(de); } } #else static void __init do_utime(char *filename, time64_t mtime) {} static void __init do_utime_path(const struct path *path, time64_t mtime) {} static void __init dir_add(const char *name, size_t nlen, time64_t mtime) {} static void __init dir_utime(void) {} #endif static __initdata time64_t mtime; /* cpio header parsing */ static __initdata unsigned long ino, major, minor, nlink; static __initdata umode_t mode; static __initdata unsigned long body_len, name_len; static __initdata uid_t uid; static __initdata 
gid_t gid; static __initdata unsigned rdev; static __initdata u32 hdr_csum; static void __init parse_header(char *s) { unsigned long parsed[13]; int i; for (i = 0, s += 6; i < 13; i++, s += 8) parsed[i] = simple_strntoul(s, NULL, 16, 8); ino = parsed[0]; mode = parsed[1]; uid = parsed[2]; gid = parsed[3]; nlink = parsed[4]; mtime = parsed[5]; /* breaks in y2106 */ body_len = parsed[6]; major = parsed[7]; minor = parsed[8]; rdev = new_encode_dev(MKDEV(parsed[9], parsed[10])); name_len = parsed[11]; hdr_csum = parsed[12]; } /* FSM */ static __initdata enum state { Start, Collect, GotHeader, SkipIt, GotName, CopyFile, GotSymlink, Reset } state, next_state; static __initdata char *victim; static unsigned long byte_count __initdata; static __initdata loff_t this_header, next_header; static inline void __init eat(unsigned n) { victim += n; this_header += n; byte_count -= n; } static __initdata char *collected; static long remains __initdata; static __initdata char *collect; static void __init read_into(char *buf, unsigned size, enum state next) { if (byte_count >= size) { collected = victim; eat(size); state = next; } else { collect = collected = buf; remains = size; next_state = next; state = Collect; } } static __initdata char *header_buf, *symlink_buf, *name_buf; static int __init do_start(void) { read_into(header_buf, CPIO_HDRLEN, GotHeader); return 0; } static int __init do_collect(void) { unsigned long n = remains; if (byte_count < n) n = byte_count; memcpy(collect, victim, n); eat(n); collect += n; if ((remains -= n) != 0) return 1; state = next_state; return 0; } static int __init do_header(void) { if (!memcmp(collected, "070701", 6)) { csum_present = false; } else if (!memcmp(collected, "070702", 6)) { csum_present = true; } else { if (memcmp(collected, "070707", 6) == 0) error("incorrect cpio method used: use -H newc option"); else error("no cpio magic"); return 1; } parse_header(collected); next_header = this_header + N_ALIGN(name_len) + body_len; next_header = (next_header + 3) & ~3; state = SkipIt; if (name_len <= 0 || name_len > PATH_MAX) return 0; if (S_ISLNK(mode)) { if (body_len > PATH_MAX) return 0; collect = collected = symlink_buf; remains = N_ALIGN(name_len) + body_len; next_state = GotSymlink; state = Collect; return 0; } if (S_ISREG(mode) || !body_len) read_into(name_buf, N_ALIGN(name_len), GotName); return 0; } static int __init do_skip(void) { if (this_header + byte_count < next_header) { eat(byte_count); return 1; } else { eat(next_header - this_header); state = next_state; return 0; } } static int __init do_reset(void) { while (byte_count && *victim == '\0') eat(1); if (byte_count && (this_header & 3)) error("broken padding"); return 1; } static void __init clean_path(char *path, umode_t fmode) { struct kstat st; if (!init_stat(path, &st, AT_SYMLINK_NOFOLLOW) && (st.mode ^ fmode) & S_IFMT) { if (S_ISDIR(st.mode)) init_rmdir(path); else init_unlink(path); } } static int __init maybe_link(void) { if (nlink >= 2) { char *old = find_link(major, minor, ino, mode, collected); if (old) { clean_path(collected, 0); return (init_link(old, collected) < 0) ? 
-1 : 1; } } return 0; } static __initdata struct file *wfile; static __initdata loff_t wfile_pos; static int __init do_name(void) { state = SkipIt; next_state = Reset; /* name_len > 0 && name_len <= PATH_MAX checked in do_header */ if (collected[name_len - 1] != '\0') { pr_err("initramfs name without nulterm: %.*s\n", (int)name_len, collected); error("malformed archive"); return 1; } if (strcmp(collected, "TRAILER!!!") == 0) { free_hash(); return 0; } clean_path(collected, mode); if (S_ISREG(mode)) { int ml = maybe_link(); if (ml >= 0) { int openflags = O_WRONLY|O_CREAT|O_LARGEFILE; if (ml != 1) openflags |= O_TRUNC; wfile = filp_open(collected, openflags, mode); if (IS_ERR(wfile)) return 0; wfile_pos = 0; io_csum = 0; vfs_fchown(wfile, uid, gid); vfs_fchmod(wfile, mode); if (body_len) vfs_truncate(&wfile->f_path, body_len); state = CopyFile; } } else if (S_ISDIR(mode)) { init_mkdir(collected, mode); init_chown(collected, uid, gid, 0); init_chmod(collected, mode); dir_add(collected, name_len, mtime); } else if (S_ISBLK(mode) || S_ISCHR(mode) || S_ISFIFO(mode) || S_ISSOCK(mode)) { if (maybe_link() == 0) { init_mknod(collected, mode, rdev); init_chown(collected, uid, gid, 0); init_chmod(collected, mode); do_utime(collected, mtime); } } return 0; } static int __init do_copy(void) { if (byte_count >= body_len) { if (xwrite(wfile, victim, body_len, &wfile_pos) != body_len) error("write error"); do_utime_path(&wfile->f_path, mtime); fput(wfile); if (csum_present && io_csum != hdr_csum) error("bad data checksum"); eat(body_len); state = SkipIt; return 0; } else { if (xwrite(wfile, victim, byte_count, &wfile_pos) != byte_count) error("write error"); body_len -= byte_count; eat(byte_count); return 1; } } static int __init do_symlink(void) { if (collected[name_len - 1] != '\0') { pr_err("initramfs symlink without nulterm: %.*s\n", (int)name_len, collected); error("malformed archive"); return 1; } collected[N_ALIGN(name_len) + body_len] = '\0'; clean_path(collected, 0); init_symlink(collected + N_ALIGN(name_len), collected); init_chown(collected, uid, gid, AT_SYMLINK_NOFOLLOW); do_utime(collected, mtime); state = SkipIt; next_state = Reset; return 0; } static __initdata int (*actions[])(void) = { [Start] = do_start, [Collect] = do_collect, [GotHeader] = do_header, [SkipIt] = do_skip, [GotName] = do_name, [CopyFile] = do_copy, [GotSymlink] = do_symlink, [Reset] = do_reset, }; static long __init write_buffer(char *buf, unsigned long len) { byte_count = len; victim = buf; while (!actions[state]()) ; return len - byte_count; } static long __init flush_buffer(void *bufv, unsigned long len) { char *buf = bufv; long written; long origLen = len; if (message) return -1; while ((written = write_buffer(buf, len)) < len && !message) { char c = buf[written]; if (c == '0') { buf += written; len -= written; state = Start; } else if (c == 0) { buf += written; len -= written; state = Reset; } else error("junk within compressed archive"); } return origLen; } static unsigned long my_inptr __initdata; /* index of next byte to be processed in inbuf */ #include <linux/decompress/generic.h> /** * unpack_to_rootfs - decompress and extract an initramfs archive * @buf: input initramfs archive to extract * @len: length of initramfs data to process * * Returns: NULL for success or an error message string * * This symbol shouldn't be used externally. It's available for unit tests. 
*/ char * __init unpack_to_rootfs(char *buf, unsigned long len) { long written; decompress_fn decompress; const char *compress_name; struct { char header[CPIO_HDRLEN]; char symlink[PATH_MAX + N_ALIGN(PATH_MAX) + 1]; char name[N_ALIGN(PATH_MAX)]; } *bufs = kmalloc(sizeof(*bufs), GFP_KERNEL); if (!bufs) panic_show_mem("can't allocate buffers"); header_buf = bufs->header; symlink_buf = bufs->symlink; name_buf = bufs->name; state = Start; this_header = 0; message = NULL; while (!message && len) { loff_t saved_offset = this_header; if (*buf == '0' && !(this_header & 3)) { state = Start; written = write_buffer(buf, len); buf += written; len -= written; continue; } if (!*buf) { buf++; len--; this_header++; continue; } this_header = 0; decompress = decompress_method(buf, len, &compress_name); pr_debug("Detected %s compressed data\n", compress_name); if (decompress) { int res = decompress(buf, len, NULL, flush_buffer, NULL, &my_inptr, error); if (res) error("decompressor failed"); } else if (compress_name) { pr_err("compression method %s not configured\n", compress_name); error("decompressor failed"); } else error("invalid magic at start of compressed archive"); if (state != Reset) error("junk at the end of compressed archive"); this_header = saved_offset + my_inptr; buf += my_inptr; len -= my_inptr; } dir_utime(); /* free any hardlink state collected without optional TRAILER!!! */ free_hash(); kfree(bufs); return message; } static int __initdata do_retain_initrd; static int __init retain_initrd_param(char *str) { if (*str) return 0; do_retain_initrd = 1; return 1; } __setup("retain_initrd", retain_initrd_param); #ifdef CONFIG_ARCH_HAS_KEEPINITRD static int __init keepinitrd_setup(char *__unused) { do_retain_initrd = 1; return 1; } __setup("keepinitrd", keepinitrd_setup); #endif static bool __initdata initramfs_async = true; static int __init initramfs_async_setup(char *str) { return kstrtobool(str, &initramfs_async) == 0; } __setup("initramfs_async=", initramfs_async_setup); extern char __initramfs_start[]; extern unsigned long __initramfs_size; #include <linux/initrd.h> #include <linux/kexec.h> static BIN_ATTR(initrd, 0440, sysfs_bin_attr_simple_read, NULL, 0); void __init reserve_initrd_mem(void) { phys_addr_t start; unsigned long size; /* Ignore the virtul address computed during device tree parsing */ initrd_start = initrd_end = 0; if (!phys_initrd_size) return; /* * Round the memory region to page boundaries as per free_initrd_mem() * This allows us to detect whether the pages overlapping the initrd * are in use, but more importantly, reserves the entire set of pages * as we don't want these pages allocated for other purposes. 
*/ start = round_down(phys_initrd_start, PAGE_SIZE); size = phys_initrd_size + (phys_initrd_start - start); size = round_up(size, PAGE_SIZE); if (!memblock_is_region_memory(start, size)) { pr_err("INITRD: 0x%08llx+0x%08lx is not a memory region", (u64)start, size); goto disable; } if (memblock_is_region_reserved(start, size)) { pr_err("INITRD: 0x%08llx+0x%08lx overlaps in-use memory region\n", (u64)start, size); goto disable; } memblock_reserve(start, size); /* Now convert initrd to virtual addresses */ initrd_start = (unsigned long)__va(phys_initrd_start); initrd_end = initrd_start + phys_initrd_size; initrd_below_start_ok = 1; return; disable: pr_cont(" - disabling initrd\n"); initrd_start = 0; initrd_end = 0; } void __weak __init free_initrd_mem(unsigned long start, unsigned long end) { #ifdef CONFIG_ARCH_KEEP_MEMBLOCK unsigned long aligned_start = ALIGN_DOWN(start, PAGE_SIZE); unsigned long aligned_end = ALIGN(end, PAGE_SIZE); memblock_free((void *)aligned_start, aligned_end - aligned_start); #endif free_reserved_area((void *)start, (void *)end, POISON_FREE_INITMEM, "initrd"); } #ifdef CONFIG_CRASH_RESERVE static bool __init kexec_free_initrd(void) { unsigned long crashk_start = (unsigned long)__va(crashk_res.start); unsigned long crashk_end = (unsigned long)__va(crashk_res.end); /* * If the initrd region is overlapped with crashkernel reserved region, * free only memory that is not part of crashkernel region. */ if (initrd_start >= crashk_end || initrd_end <= crashk_start) return false; /* * Initialize initrd memory region since the kexec boot does not do. */ memset((void *)initrd_start, 0, initrd_end - initrd_start); if (initrd_start < crashk_start) free_initrd_mem(initrd_start, crashk_start); if (initrd_end > crashk_end) free_initrd_mem(crashk_end, initrd_end); return true; } #else static inline bool kexec_free_initrd(void) { return false; } #endif /* CONFIG_KEXEC_CORE */ #ifdef CONFIG_BLK_DEV_RAM static void __init populate_initrd_image(char *err) { ssize_t written; struct file *file; loff_t pos = 0; printk(KERN_INFO "rootfs image is not initramfs (%s); looks like an initrd\n", err); file = filp_open("/initrd.image", O_WRONLY|O_CREAT|O_LARGEFILE, 0700); if (IS_ERR(file)) return; written = xwrite(file, (char *)initrd_start, initrd_end - initrd_start, &pos); if (written != initrd_end - initrd_start) pr_err("/initrd.image: incomplete write (%zd != %ld)\n", written, initrd_end - initrd_start); fput(file); } #endif /* CONFIG_BLK_DEV_RAM */ static void __init do_populate_rootfs(void *unused, async_cookie_t cookie) { /* Load the built in initramfs */ char *err = unpack_to_rootfs(__initramfs_start, __initramfs_size); if (err) panic_show_mem("%s", err); /* Failed to decompress INTERNAL initramfs */ if (!initrd_start || IS_ENABLED(CONFIG_INITRAMFS_FORCE)) goto done; if (IS_ENABLED(CONFIG_BLK_DEV_RAM)) printk(KERN_INFO "Trying to unpack rootfs image as initramfs...\n"); else printk(KERN_INFO "Unpacking initramfs...\n"); err = unpack_to_rootfs((char *)initrd_start, initrd_end - initrd_start); if (err) { #ifdef CONFIG_BLK_DEV_RAM populate_initrd_image(err); #else printk(KERN_EMERG "Initramfs unpacking failed: %s\n", err); #endif } done: security_initramfs_populated(); /* * If the initrd region is overlapped with crashkernel reserved region, * free only memory that is not part of crashkernel region. 
*/ if (!do_retain_initrd && initrd_start && !kexec_free_initrd()) { free_initrd_mem(initrd_start, initrd_end); } else if (do_retain_initrd && initrd_start) { bin_attr_initrd.size = initrd_end - initrd_start; bin_attr_initrd.private = (void *)initrd_start; if (sysfs_create_bin_file(firmware_kobj, &bin_attr_initrd)) pr_err("Failed to create initrd sysfs file"); } initrd_start = 0; initrd_end = 0; init_flush_fput(); } static ASYNC_DOMAIN_EXCLUSIVE(initramfs_domain); static async_cookie_t initramfs_cookie; void wait_for_initramfs(void) { if (!initramfs_cookie) { /* * Something before rootfs_initcall wants to access * the filesystem/initramfs. Probably a bug. Make a * note, avoid deadlocking the machine, and let the * caller's access fail as it used to. */ pr_warn_once("wait_for_initramfs() called before rootfs_initcalls\n"); return; } async_synchronize_cookie_domain(initramfs_cookie + 1, &initramfs_domain); } EXPORT_SYMBOL_GPL(wait_for_initramfs); static int __init populate_rootfs(void) { initramfs_cookie = async_schedule_domain(do_populate_rootfs, NULL, &initramfs_domain); usermodehelper_enable(); if (!initramfs_async) wait_for_initramfs(); return 0; } rootfs_initcall(populate_rootfs); |
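/*
 * Editor's note: the parse_header()/do_header() logic above consumes the
 * 110-byte cpio "newc" header: a 6-character magic ("070701", or "070702"
 * when a data checksum is present) followed by thirteen fields of eight
 * ASCII hex digits each, in the same order as the parsed[] indices used
 * above. The following stand-alone user-space sketch (not kernel code;
 * all names and values here are illustrative) decodes such a header to
 * make that layout concrete.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NEWC_HDRLEN 110	/* 6-byte magic + 13 fields * 8 hex digits */

/* Decode one 8-digit ASCII hex field of a newc header. */
static unsigned long hex_field(const char *s)
{
	char tmp[9];

	memcpy(tmp, s, 8);
	tmp[8] = '\0';
	return strtoul(tmp, NULL, 16);
}

int main(void)
{
	/* Header of an empty regular file named "hi" (illustrative values). */
	static const char hdr[NEWC_HDRLEN + 1] =
		"070701" "00000001" "000081a4" "00000000" "00000000"
		"00000001" "00000000" "00000000" "00000000" "00000000"
		"00000000" "00000000" "00000003" "00000000";
	static const char * const names[13] = {
		"ino", "mode", "uid", "gid", "nlink", "mtime", "filesize",
		"devmajor", "devminor", "rdevmajor", "rdevminor",
		"namesize", "check",
	};
	int i;

	for (i = 0; i < 13; i++)
		printf("%-9s = %lu\n", names[i], hex_field(hdr + 6 + 8 * i));
	return 0;
}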
// SPDX-License-Identifier: GPL-2.0-only /* * Copyright 2003-2005 Devicescape Software, Inc.
* Copyright (c) 2006 Jiri Benc <jbenc@suse.cz> * Copyright 2007 Johannes Berg <johannes@sipsolutions.net> * Copyright 2013-2014 Intel Mobile Communications GmbH * Copyright(c) 2016 Intel Deutschland GmbH * Copyright (C) 2018 - 2023 Intel Corporation */ #include <linux/debugfs.h> #include <linux/ieee80211.h> #include "ieee80211_i.h" #include "debugfs.h" #include "debugfs_sta.h" #include "sta_info.h" #include "driver-ops.h" /* sta attributes */ #define STA_READ(name, field, format_string) \ static ssize_t sta_ ##name## _read(struct file *file, \ char __user *userbuf, \ size_t count, loff_t *ppos) \ { \ struct sta_info *sta = file->private_data; \ return mac80211_format_buffer(userbuf, count, ppos, \ format_string, sta->field); \ } #define STA_READ_D(name, field) STA_READ(name, field, "%d\n") #define STA_OPS(name) \ static const struct debugfs_short_fops sta_ ##name## _ops = { \ .read = sta_##name##_read, \ .llseek = generic_file_llseek, \ } #define STA_OPS_RW(name) \ static const struct debugfs_short_fops sta_ ##name## _ops = { \ .read = sta_##name##_read, \ .write = sta_##name##_write, \ .llseek = generic_file_llseek, \ } #define STA_FILE(name, field, format) \ STA_READ_##format(name, field) \ STA_OPS(name) STA_FILE(aid, sta.aid, D); static const char * const sta_flag_names[] = { #define FLAG(F) [WLAN_STA_##F] = #F FLAG(AUTH), FLAG(ASSOC), FLAG(PS_STA), FLAG(AUTHORIZED), FLAG(SHORT_PREAMBLE), FLAG(WDS), FLAG(CLEAR_PS_FILT), FLAG(MFP), FLAG(BLOCK_BA), FLAG(PS_DRIVER), FLAG(PSPOLL), FLAG(TDLS_PEER), FLAG(TDLS_PEER_AUTH), FLAG(TDLS_INITIATOR), FLAG(TDLS_CHAN_SWITCH), FLAG(TDLS_OFF_CHANNEL), FLAG(TDLS_WIDER_BW), FLAG(UAPSD), FLAG(SP), FLAG(4ADDR_EVENT), FLAG(INSERTED), FLAG(RATE_CONTROL), FLAG(TOFFSET_KNOWN), FLAG(MPSP_OWNER), FLAG(MPSP_RECIPIENT), FLAG(PS_DELIVER), FLAG(USES_ENCRYPTION), FLAG(DECAP_OFFLOAD), #undef FLAG }; static ssize_t sta_flags_read(struct file *file, char __user *userbuf, size_t count, loff_t *ppos) { char buf[16 * NUM_WLAN_STA_FLAGS], *pos = buf; char *end = buf + sizeof(buf) - 1; struct sta_info *sta = file->private_data; unsigned int flg; BUILD_BUG_ON(ARRAY_SIZE(sta_flag_names) != NUM_WLAN_STA_FLAGS); for (flg = 0; flg < NUM_WLAN_STA_FLAGS; flg++) { if (test_sta_flag(sta, flg)) pos += scnprintf(pos, end - pos, "%s\n", sta_flag_names[flg]); } return simple_read_from_buffer(userbuf, count, ppos, buf, strlen(buf)); } STA_OPS(flags); static ssize_t sta_num_ps_buf_frames_read(struct file *file, char __user *userbuf, size_t count, loff_t *ppos) { struct sta_info *sta = file->private_data; char buf[17*IEEE80211_NUM_ACS], *p = buf; int ac; for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) p += scnprintf(p, sizeof(buf)+buf-p, "AC%d: %d\n", ac, skb_queue_len(&sta->ps_tx_buf[ac]) + skb_queue_len(&sta->tx_filtered[ac])); return simple_read_from_buffer(userbuf, count, ppos, buf, p - buf); } STA_OPS(num_ps_buf_frames); static ssize_t sta_last_seq_ctrl_read(struct file *file, char __user *userbuf, size_t count, loff_t *ppos) { char buf[15*IEEE80211_NUM_TIDS], *p = buf; int i; struct sta_info *sta = file->private_data; for (i = 0; i < IEEE80211_NUM_TIDS; i++) p += scnprintf(p, sizeof(buf)+buf-p, "%x ", le16_to_cpu(sta->last_seq_ctrl[i])); p += scnprintf(p, sizeof(buf)+buf-p, "\n"); return simple_read_from_buffer(userbuf, count, ppos, buf, p - buf); } STA_OPS(last_seq_ctrl); #define AQM_TXQ_ENTRY_LEN 130 static ssize_t sta_aqm_read(struct file *file, char __user *userbuf, size_t count, loff_t *ppos) { struct sta_info *sta = file->private_data; struct ieee80211_local *local = sta->local; size_t 
bufsz = AQM_TXQ_ENTRY_LEN * (IEEE80211_NUM_TIDS + 2); char *buf = kzalloc(bufsz, GFP_KERNEL), *p = buf; struct txq_info *txqi; ssize_t rv; int i; if (!buf) return -ENOMEM; spin_lock_bh(&local->fq.lock); p += scnprintf(p, bufsz + buf - p, "tid ac backlog-bytes backlog-packets new-flows drops marks overlimit collisions tx-bytes tx-packets flags\n"); for (i = 0; i < ARRAY_SIZE(sta->sta.txq); i++) { if (!sta->sta.txq[i]) continue; txqi = to_txq_info(sta->sta.txq[i]); p += scnprintf(p, bufsz + buf - p, "%d %d %u %u %u %u %u %u %u %u %u 0x%lx(%s%s%s%s)\n", txqi->txq.tid, txqi->txq.ac, txqi->tin.backlog_bytes, txqi->tin.backlog_packets, txqi->tin.flows, txqi->cstats.drop_count, txqi->cstats.ecn_mark, txqi->tin.overlimit, txqi->tin.collisions, txqi->tin.tx_bytes, txqi->tin.tx_packets, txqi->flags, test_bit(IEEE80211_TXQ_STOP, &txqi->flags) ? "STOP" : "RUN", test_bit(IEEE80211_TXQ_AMPDU, &txqi->flags) ? " AMPDU" : "", test_bit(IEEE80211_TXQ_NO_AMSDU, &txqi->flags) ? " NO-AMSDU" : "", test_bit(IEEE80211_TXQ_DIRTY, &txqi->flags) ? " DIRTY" : ""); } spin_unlock_bh(&local->fq.lock); rv = simple_read_from_buffer(userbuf, count, ppos, buf, p - buf); kfree(buf); return rv; } STA_OPS(aqm); static ssize_t sta_airtime_read(struct file *file, char __user *userbuf, size_t count, loff_t *ppos) { struct sta_info *sta = file->private_data; struct ieee80211_local *local = sta->sdata->local; size_t bufsz = 400; char *buf = kzalloc(bufsz, GFP_KERNEL), *p = buf; u64 rx_airtime = 0, tx_airtime = 0; s32 deficit[IEEE80211_NUM_ACS]; ssize_t rv; int ac; if (!buf) return -ENOMEM; for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) { spin_lock_bh(&local->active_txq_lock[ac]); rx_airtime += sta->airtime[ac].rx_airtime; tx_airtime += sta->airtime[ac].tx_airtime; deficit[ac] = sta->airtime[ac].deficit; spin_unlock_bh(&local->active_txq_lock[ac]); } p += scnprintf(p, bufsz + buf - p, "RX: %llu us\nTX: %llu us\nWeight: %u\n" "Deficit: VO: %d us VI: %d us BE: %d us BK: %d us\n", rx_airtime, tx_airtime, sta->airtime_weight, deficit[0], deficit[1], deficit[2], deficit[3]); rv = simple_read_from_buffer(userbuf, count, ppos, buf, p - buf); kfree(buf); return rv; } static ssize_t sta_airtime_write(struct file *file, const char __user *userbuf, size_t count, loff_t *ppos) { struct sta_info *sta = file->private_data; struct ieee80211_local *local = sta->sdata->local; int ac; for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) { spin_lock_bh(&local->active_txq_lock[ac]); sta->airtime[ac].rx_airtime = 0; sta->airtime[ac].tx_airtime = 0; sta->airtime[ac].deficit = sta->airtime_weight; spin_unlock_bh(&local->active_txq_lock[ac]); } return count; } STA_OPS_RW(airtime); static ssize_t sta_aql_read(struct file *file, char __user *userbuf, size_t count, loff_t *ppos) { struct sta_info *sta = file->private_data; struct ieee80211_local *local = sta->sdata->local; size_t bufsz = 400; char *buf = kzalloc(bufsz, GFP_KERNEL), *p = buf; u32 q_depth[IEEE80211_NUM_ACS]; u32 q_limit_l[IEEE80211_NUM_ACS], q_limit_h[IEEE80211_NUM_ACS]; ssize_t rv; int ac; if (!buf) return -ENOMEM; for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) { spin_lock_bh(&local->active_txq_lock[ac]); q_limit_l[ac] = sta->airtime[ac].aql_limit_low; q_limit_h[ac] = sta->airtime[ac].aql_limit_high; spin_unlock_bh(&local->active_txq_lock[ac]); q_depth[ac] = atomic_read(&sta->airtime[ac].aql_tx_pending); } p += scnprintf(p, bufsz + buf - p, "Q depth: VO: %u us VI: %u us BE: %u us BK: %u us\n" "Q limit[low/high]: VO: %u/%u VI: %u/%u BE: %u/%u BK: %u/%u\n", q_depth[0], q_depth[1], q_depth[2], q_depth[3], 
q_limit_l[0], q_limit_h[0], q_limit_l[1], q_limit_h[1], q_limit_l[2], q_limit_h[2], q_limit_l[3], q_limit_h[3]); rv = simple_read_from_buffer(userbuf, count, ppos, buf, p - buf); kfree(buf); return rv; } static ssize_t sta_aql_write(struct file *file, const char __user *userbuf, size_t count, loff_t *ppos) { struct sta_info *sta = file->private_data; u32 ac, q_limit_l, q_limit_h; char _buf[100] = {}, *buf = _buf; if (count > sizeof(_buf)) return -EINVAL; if (copy_from_user(buf, userbuf, count)) return -EFAULT; buf[sizeof(_buf) - 1] = '\0'; if (sscanf(buf, "limit %u %u %u", &ac, &q_limit_l, &q_limit_h) != 3) return -EINVAL; if (ac >= IEEE80211_NUM_ACS) return -EINVAL; sta->airtime[ac].aql_limit_low = q_limit_l; sta->airtime[ac].aql_limit_high = q_limit_h; return count; } STA_OPS_RW(aql); static ssize_t sta_agg_status_do_read(struct wiphy *wiphy, struct file *file, char *buf, size_t bufsz, void *data) { struct sta_info *sta = data; char *p = buf; int i; struct tid_ampdu_rx *tid_rx; struct tid_ampdu_tx *tid_tx; p += scnprintf(p, bufsz + buf - p, "next dialog_token: %#02x\n", sta->ampdu_mlme.dialog_token_allocator + 1); p += scnprintf(p, bufsz + buf - p, "TID\t\tRX\tDTKN\tSSN\t\tTX\tDTKN\tpending\n"); for (i = 0; i < IEEE80211_NUM_TIDS; i++) { bool tid_rx_valid; tid_rx = wiphy_dereference(wiphy, sta->ampdu_mlme.tid_rx[i]); tid_tx = wiphy_dereference(wiphy, sta->ampdu_mlme.tid_tx[i]); tid_rx_valid = test_bit(i, sta->ampdu_mlme.agg_session_valid); p += scnprintf(p, bufsz + buf - p, "%02d", i); p += scnprintf(p, bufsz + buf - p, "\t\t%x", tid_rx_valid); p += scnprintf(p, bufsz + buf - p, "\t%#.2x", tid_rx_valid ? sta->ampdu_mlme.tid_rx_token[i] : 0); p += scnprintf(p, bufsz + buf - p, "\t%#.3x", tid_rx ? tid_rx->ssn : 0); p += scnprintf(p, bufsz + buf - p, "\t\t%x", !!tid_tx); p += scnprintf(p, bufsz + buf - p, "\t%#.2x", tid_tx ? tid_tx->dialog_token : 0); p += scnprintf(p, bufsz + buf - p, "\t%03d", tid_tx ? 
skb_queue_len(&tid_tx->pending) : 0); p += scnprintf(p, bufsz + buf - p, "\n"); } return p - buf; } static ssize_t sta_agg_status_read(struct file *file, char __user *userbuf, size_t count, loff_t *ppos) { struct sta_info *sta = file->private_data; struct wiphy *wiphy = sta->local->hw.wiphy; size_t bufsz = 71 + IEEE80211_NUM_TIDS * 40; char *buf = kmalloc(bufsz, GFP_KERNEL); ssize_t ret; if (!buf) return -ENOMEM; ret = wiphy_locked_debugfs_read(wiphy, file, buf, bufsz, userbuf, count, ppos, sta_agg_status_do_read, sta); kfree(buf); return ret; } static ssize_t sta_agg_status_do_write(struct wiphy *wiphy, struct file *file, char *buf, size_t count, void *data) { struct sta_info *sta = data; bool start, tx; unsigned long tid; char *pos = buf; int ret, timeout = 5000; buf = strsep(&pos, " "); if (!buf) return -EINVAL; if (!strcmp(buf, "tx")) tx = true; else if (!strcmp(buf, "rx")) tx = false; else return -EINVAL; buf = strsep(&pos, " "); if (!buf) return -EINVAL; if (!strcmp(buf, "start")) { start = true; if (!tx) return -EINVAL; } else if (!strcmp(buf, "stop")) { start = false; } else { return -EINVAL; } buf = strsep(&pos, " "); if (!buf) return -EINVAL; if (sscanf(buf, "timeout=%d", &timeout) == 1) { buf = strsep(&pos, " "); if (!buf || !tx || !start) return -EINVAL; } ret = kstrtoul(buf, 0, &tid); if (ret || tid >= IEEE80211_NUM_TIDS) return -EINVAL; if (tx) { if (start) ret = ieee80211_start_tx_ba_session(&sta->sta, tid, timeout); else ret = ieee80211_stop_tx_ba_session(&sta->sta, tid); } else { __ieee80211_stop_rx_ba_session(sta, tid, WLAN_BACK_RECIPIENT, 3, true); ret = 0; } return ret ?: count; } static ssize_t sta_agg_status_write(struct file *file, const char __user *userbuf, size_t count, loff_t *ppos) { struct sta_info *sta = file->private_data; struct wiphy *wiphy = sta->local->hw.wiphy; char _buf[26]; return wiphy_locked_debugfs_write(wiphy, file, _buf, sizeof(_buf), userbuf, count, sta_agg_status_do_write, sta); } STA_OPS_RW(agg_status); /* link sta attributes */ #define LINK_STA_OPS(name) \ static const struct debugfs_short_fops link_sta_ ##name## _ops = { \ .read = link_sta_##name##_read, \ .llseek = generic_file_llseek, \ } static ssize_t link_sta_addr_read(struct file *file, char __user *userbuf, size_t count, loff_t *ppos) { struct link_sta_info *link_sta = file->private_data; u8 mac[MAC_ADDR_STR_LEN + 2]; snprintf(mac, sizeof(mac), "%pM\n", link_sta->pub->addr); return simple_read_from_buffer(userbuf, count, ppos, mac, MAC_ADDR_STR_LEN + 1); } LINK_STA_OPS(addr); static ssize_t link_sta_ht_capa_read(struct file *file, char __user *userbuf, size_t count, loff_t *ppos) { #define PRINT_HT_CAP(_cond, _str) \ do { \ if (_cond) \ p += scnprintf(p, bufsz + buf - p, "\t" _str "\n"); \ } while (0) char *buf, *p; int i; ssize_t bufsz = 512; struct link_sta_info *link_sta = file->private_data; struct ieee80211_sta_ht_cap *htc = &link_sta->pub->ht_cap; ssize_t ret; buf = kzalloc(bufsz, GFP_KERNEL); if (!buf) return -ENOMEM; p = buf; p += scnprintf(p, bufsz + buf - p, "ht %ssupported\n", htc->ht_supported ? 
"" : "not "); if (htc->ht_supported) { p += scnprintf(p, bufsz + buf - p, "cap: %#.4x\n", htc->cap); PRINT_HT_CAP((htc->cap & BIT(0)), "RX LDPC"); PRINT_HT_CAP((htc->cap & BIT(1)), "HT20/HT40"); PRINT_HT_CAP(!(htc->cap & BIT(1)), "HT20"); PRINT_HT_CAP(((htc->cap >> 2) & 0x3) == 0, "Static SM Power Save"); PRINT_HT_CAP(((htc->cap >> 2) & 0x3) == 1, "Dynamic SM Power Save"); PRINT_HT_CAP(((htc->cap >> 2) & 0x3) == 3, "SM Power Save disabled"); PRINT_HT_CAP((htc->cap & BIT(4)), "RX Greenfield"); PRINT_HT_CAP((htc->cap & BIT(5)), "RX HT20 SGI"); PRINT_HT_CAP((htc->cap & BIT(6)), "RX HT40 SGI"); PRINT_HT_CAP((htc->cap & BIT(7)), "TX STBC"); PRINT_HT_CAP(((htc->cap >> 8) & 0x3) == 0, "No RX STBC"); PRINT_HT_CAP(((htc->cap >> 8) & 0x3) == 1, "RX STBC 1-stream"); PRINT_HT_CAP(((htc->cap >> 8) & 0x3) == 2, "RX STBC 2-streams"); PRINT_HT_CAP(((htc->cap >> 8) & 0x3) == 3, "RX STBC 3-streams"); PRINT_HT_CAP((htc->cap & BIT(10)), "HT Delayed Block Ack"); PRINT_HT_CAP(!(htc->cap & BIT(11)), "Max AMSDU length: " "3839 bytes"); PRINT_HT_CAP((htc->cap & BIT(11)), "Max AMSDU length: " "7935 bytes"); /* * For beacons and probe response this would mean the BSS * does or does not allow the usage of DSSS/CCK HT40. * Otherwise it means the STA does or does not use * DSSS/CCK HT40. */ PRINT_HT_CAP((htc->cap & BIT(12)), "DSSS/CCK HT40"); PRINT_HT_CAP(!(htc->cap & BIT(12)), "No DSSS/CCK HT40"); /* BIT(13) is reserved */ PRINT_HT_CAP((htc->cap & BIT(14)), "40 MHz Intolerant"); PRINT_HT_CAP((htc->cap & BIT(15)), "L-SIG TXOP protection"); p += scnprintf(p, bufsz + buf - p, "ampdu factor/density: %d/%d\n", htc->ampdu_factor, htc->ampdu_density); p += scnprintf(p, bufsz + buf - p, "MCS mask:"); for (i = 0; i < IEEE80211_HT_MCS_MASK_LEN; i++) p += scnprintf(p, bufsz + buf - p, " %.2x", htc->mcs.rx_mask[i]); p += scnprintf(p, bufsz + buf - p, "\n"); /* If not set this is meaningless */ if (le16_to_cpu(htc->mcs.rx_highest)) { p += scnprintf(p, bufsz + buf - p, "MCS rx highest: %d Mbps\n", le16_to_cpu(htc->mcs.rx_highest)); } p += scnprintf(p, bufsz + buf - p, "MCS tx params: %x\n", htc->mcs.tx_params); } ret = simple_read_from_buffer(userbuf, count, ppos, buf, p - buf); kfree(buf); return ret; } LINK_STA_OPS(ht_capa); static ssize_t link_sta_vht_capa_read(struct file *file, char __user *userbuf, size_t count, loff_t *ppos) { char *buf, *p; struct link_sta_info *link_sta = file->private_data; struct ieee80211_sta_vht_cap *vhtc = &link_sta->pub->vht_cap; ssize_t ret; ssize_t bufsz = 512; buf = kzalloc(bufsz, GFP_KERNEL); if (!buf) return -ENOMEM; p = buf; p += scnprintf(p, bufsz + buf - p, "VHT %ssupported\n", vhtc->vht_supported ? 
"" : "not "); if (vhtc->vht_supported) { p += scnprintf(p, bufsz + buf - p, "cap: %#.8x\n", vhtc->cap); #define PFLAG(a, b) \ do { \ if (vhtc->cap & IEEE80211_VHT_CAP_ ## a) \ p += scnprintf(p, bufsz + buf - p, \ "\t\t%s\n", b); \ } while (0) switch (vhtc->cap & 0x3) { case IEEE80211_VHT_CAP_MAX_MPDU_LENGTH_3895: p += scnprintf(p, bufsz + buf - p, "\t\tMAX-MPDU-3895\n"); break; case IEEE80211_VHT_CAP_MAX_MPDU_LENGTH_7991: p += scnprintf(p, bufsz + buf - p, "\t\tMAX-MPDU-7991\n"); break; case IEEE80211_VHT_CAP_MAX_MPDU_LENGTH_11454: p += scnprintf(p, bufsz + buf - p, "\t\tMAX-MPDU-11454\n"); break; default: p += scnprintf(p, bufsz + buf - p, "\t\tMAX-MPDU-UNKNOWN\n"); } switch (vhtc->cap & IEEE80211_VHT_CAP_SUPP_CHAN_WIDTH_MASK) { case 0: p += scnprintf(p, bufsz + buf - p, "\t\t80Mhz\n"); break; case IEEE80211_VHT_CAP_SUPP_CHAN_WIDTH_160MHZ: p += scnprintf(p, bufsz + buf - p, "\t\t160Mhz\n"); break; case IEEE80211_VHT_CAP_SUPP_CHAN_WIDTH_160_80PLUS80MHZ: p += scnprintf(p, bufsz + buf - p, "\t\t80+80Mhz\n"); break; default: p += scnprintf(p, bufsz + buf - p, "\t\tUNKNOWN-MHZ: 0x%x\n", (vhtc->cap >> 2) & 0x3); } PFLAG(RXLDPC, "RXLDPC"); PFLAG(SHORT_GI_80, "SHORT-GI-80"); PFLAG(SHORT_GI_160, "SHORT-GI-160"); PFLAG(TXSTBC, "TXSTBC"); p += scnprintf(p, bufsz + buf - p, "\t\tRXSTBC_%d\n", (vhtc->cap >> 8) & 0x7); PFLAG(SU_BEAMFORMER_CAPABLE, "SU-BEAMFORMER-CAPABLE"); PFLAG(SU_BEAMFORMEE_CAPABLE, "SU-BEAMFORMEE-CAPABLE"); p += scnprintf(p, bufsz + buf - p, "\t\tBEAMFORMEE-STS: 0x%x\n", (vhtc->cap & IEEE80211_VHT_CAP_BEAMFORMEE_STS_MASK) >> IEEE80211_VHT_CAP_BEAMFORMEE_STS_SHIFT); p += scnprintf(p, bufsz + buf - p, "\t\tSOUNDING-DIMENSIONS: 0x%x\n", (vhtc->cap & IEEE80211_VHT_CAP_SOUNDING_DIMENSIONS_MASK) >> IEEE80211_VHT_CAP_SOUNDING_DIMENSIONS_SHIFT); PFLAG(MU_BEAMFORMER_CAPABLE, "MU-BEAMFORMER-CAPABLE"); PFLAG(MU_BEAMFORMEE_CAPABLE, "MU-BEAMFORMEE-CAPABLE"); PFLAG(VHT_TXOP_PS, "TXOP-PS"); PFLAG(HTC_VHT, "HTC-VHT"); p += scnprintf(p, bufsz + buf - p, "\t\tMPDU-LENGTH-EXPONENT: 0x%x\n", (vhtc->cap & IEEE80211_VHT_CAP_MAX_A_MPDU_LENGTH_EXPONENT_MASK) >> IEEE80211_VHT_CAP_MAX_A_MPDU_LENGTH_EXPONENT_SHIFT); PFLAG(VHT_LINK_ADAPTATION_VHT_UNSOL_MFB, "LINK-ADAPTATION-VHT-UNSOL-MFB"); p += scnprintf(p, bufsz + buf - p, "\t\tLINK-ADAPTATION-VHT-MRQ-MFB: 0x%x\n", (vhtc->cap & IEEE80211_VHT_CAP_VHT_LINK_ADAPTATION_VHT_MRQ_MFB) >> 26); PFLAG(RX_ANTENNA_PATTERN, "RX-ANTENNA-PATTERN"); PFLAG(TX_ANTENNA_PATTERN, "TX-ANTENNA-PATTERN"); p += scnprintf(p, bufsz + buf - p, "RX MCS: %.4x\n", le16_to_cpu(vhtc->vht_mcs.rx_mcs_map)); if (vhtc->vht_mcs.rx_highest) p += scnprintf(p, bufsz + buf - p, "MCS RX highest: %d Mbps\n", le16_to_cpu(vhtc->vht_mcs.rx_highest)); p += scnprintf(p, bufsz + buf - p, "TX MCS: %.4x\n", le16_to_cpu(vhtc->vht_mcs.tx_mcs_map)); if (vhtc->vht_mcs.tx_highest) p += scnprintf(p, bufsz + buf - p, "MCS TX highest: %d Mbps\n", le16_to_cpu(vhtc->vht_mcs.tx_highest)); #undef PFLAG } ret = simple_read_from_buffer(userbuf, count, ppos, buf, p - buf); kfree(buf); return ret; } LINK_STA_OPS(vht_capa); static ssize_t link_sta_he_capa_read(struct file *file, char __user *userbuf, size_t count, loff_t *ppos) { char *buf, *p; size_t buf_sz = PAGE_SIZE; struct link_sta_info *link_sta = file->private_data; struct ieee80211_sta_he_cap *hec = &link_sta->pub->he_cap; struct ieee80211_he_mcs_nss_supp *nss = &hec->he_mcs_nss_supp; u8 ppe_size; u8 *cap; int i; ssize_t ret; buf = kmalloc(buf_sz, GFP_KERNEL); if (!buf) return -ENOMEM; p = buf; p += scnprintf(p, buf_sz + buf - p, "HE %ssupported\n", hec->has_he ? 
"" : "not "); if (!hec->has_he) goto out; cap = hec->he_cap_elem.mac_cap_info; p += scnprintf(p, buf_sz + buf - p, "MAC-CAP: %#.2x %#.2x %#.2x %#.2x %#.2x %#.2x\n", cap[0], cap[1], cap[2], cap[3], cap[4], cap[5]); #define PRINT(fmt, ...) \ p += scnprintf(p, buf_sz + buf - p, "\t\t" fmt "\n", \ ##__VA_ARGS__) #define PFLAG(t, n, a, b) \ do { \ if (cap[n] & IEEE80211_HE_##t##_CAP##n##_##a) \ PRINT("%s", b); \ } while (0) #define PFLAG_RANGE(t, i, n, s, m, off, fmt) \ do { \ u8 msk = IEEE80211_HE_##t##_CAP##i##_##n##_MASK; \ u8 idx = ((cap[i] & msk) >> (ffs(msk) - 1)) + off; \ PRINT(fmt, (s << idx) + (m * idx)); \ } while (0) #define PFLAG_RANGE_DEFAULT(t, i, n, s, m, off, fmt, a, b) \ do { \ if (cap[i] == IEEE80211_HE_##t ##_CAP##i##_##n##_##a) { \ PRINT("%s", b); \ break; \ } \ PFLAG_RANGE(t, i, n, s, m, off, fmt); \ } while (0) PFLAG(MAC, 0, HTC_HE, "HTC-HE"); PFLAG(MAC, 0, TWT_REQ, "TWT-REQ"); PFLAG(MAC, 0, TWT_RES, "TWT-RES"); PFLAG_RANGE_DEFAULT(MAC, 0, DYNAMIC_FRAG, 0, 1, 0, "DYNAMIC-FRAG-LEVEL-%d", NOT_SUPP, "NOT-SUPP"); PFLAG_RANGE_DEFAULT(MAC, 0, MAX_NUM_FRAG_MSDU, 1, 0, 0, "MAX-NUM-FRAG-MSDU-%d", UNLIMITED, "UNLIMITED"); PFLAG_RANGE_DEFAULT(MAC, 1, MIN_FRAG_SIZE, 128, 0, -1, "MIN-FRAG-SIZE-%d", UNLIMITED, "UNLIMITED"); PFLAG_RANGE_DEFAULT(MAC, 1, TF_MAC_PAD_DUR, 0, 8, 0, "TF-MAC-PAD-DUR-%dUS", MASK, "UNKNOWN"); PFLAG_RANGE(MAC, 1, MULTI_TID_AGG_RX_QOS, 0, 1, 1, "MULTI-TID-AGG-RX-QOS-%d"); if (cap[0] & IEEE80211_HE_MAC_CAP0_HTC_HE) { switch (((cap[2] << 1) | (cap[1] >> 7)) & 0x3) { case 0: PRINT("LINK-ADAPTATION-NO-FEEDBACK"); break; case 1: PRINT("LINK-ADAPTATION-RESERVED"); break; case 2: PRINT("LINK-ADAPTATION-UNSOLICITED-FEEDBACK"); break; case 3: PRINT("LINK-ADAPTATION-BOTH"); break; } } PFLAG(MAC, 2, ALL_ACK, "ALL-ACK"); PFLAG(MAC, 2, TRS, "TRS"); PFLAG(MAC, 2, BSR, "BSR"); PFLAG(MAC, 2, BCAST_TWT, "BCAST-TWT"); PFLAG(MAC, 2, 32BIT_BA_BITMAP, "32BIT-BA-BITMAP"); PFLAG(MAC, 2, MU_CASCADING, "MU-CASCADING"); PFLAG(MAC, 2, ACK_EN, "ACK-EN"); PFLAG(MAC, 3, OMI_CONTROL, "OMI-CONTROL"); PFLAG(MAC, 3, OFDMA_RA, "OFDMA-RA"); switch (cap[3] & IEEE80211_HE_MAC_CAP3_MAX_AMPDU_LEN_EXP_MASK) { case IEEE80211_HE_MAC_CAP3_MAX_AMPDU_LEN_EXP_EXT_0: PRINT("MAX-AMPDU-LEN-EXP-USE-EXT-0"); break; case IEEE80211_HE_MAC_CAP3_MAX_AMPDU_LEN_EXP_EXT_1: PRINT("MAX-AMPDU-LEN-EXP-VHT-EXT-1"); break; case IEEE80211_HE_MAC_CAP3_MAX_AMPDU_LEN_EXP_EXT_2: PRINT("MAX-AMPDU-LEN-EXP-VHT-EXT-2"); break; case IEEE80211_HE_MAC_CAP3_MAX_AMPDU_LEN_EXP_EXT_3: PRINT("MAX-AMPDU-LEN-EXP-VHT-EXT-3"); break; } PFLAG(MAC, 3, AMSDU_FRAG, "AMSDU-FRAG"); PFLAG(MAC, 3, FLEX_TWT_SCHED, "FLEX-TWT-SCHED"); PFLAG(MAC, 3, RX_CTRL_FRAME_TO_MULTIBSS, "RX-CTRL-FRAME-TO-MULTIBSS"); PFLAG(MAC, 4, BSRP_BQRP_A_MPDU_AGG, "BSRP-BQRP-A-MPDU-AGG"); PFLAG(MAC, 4, QTP, "QTP"); PFLAG(MAC, 4, BQR, "BQR"); PFLAG(MAC, 4, PSR_RESP, "PSR-RESP"); PFLAG(MAC, 4, NDP_FB_REP, "NDP-FB-REP"); PFLAG(MAC, 4, OPS, "OPS"); PFLAG(MAC, 4, AMSDU_IN_AMPDU, "AMSDU-IN-AMPDU"); PRINT("MULTI-TID-AGG-TX-QOS-%d", ((cap[5] << 1) | (cap[4] >> 7)) & 0x7); PFLAG(MAC, 5, SUBCHAN_SELECTIVE_TRANSMISSION, "SUBCHAN-SELECTIVE-TRANSMISSION"); PFLAG(MAC, 5, UL_2x996_TONE_RU, "UL-2x996-TONE-RU"); PFLAG(MAC, 5, OM_CTRL_UL_MU_DATA_DIS_RX, "OM-CTRL-UL-MU-DATA-DIS-RX"); PFLAG(MAC, 5, HE_DYNAMIC_SM_PS, "HE-DYNAMIC-SM-PS"); PFLAG(MAC, 5, PUNCTURED_SOUNDING, "PUNCTURED-SOUNDING"); PFLAG(MAC, 5, HT_VHT_TRIG_FRAME_RX, "HT-VHT-TRIG-FRAME-RX"); cap = hec->he_cap_elem.phy_cap_info; p += scnprintf(p, buf_sz + buf - p, "PHY CAP: %#.2x %#.2x %#.2x %#.2x %#.2x %#.2x %#.2x %#.2x %#.2x %#.2x %#.2x\n", 
cap[0], cap[1], cap[2], cap[3], cap[4], cap[5], cap[6], cap[7], cap[8], cap[9], cap[10]); PFLAG(PHY, 0, CHANNEL_WIDTH_SET_40MHZ_IN_2G, "CHANNEL-WIDTH-SET-40MHZ-IN-2G"); PFLAG(PHY, 0, CHANNEL_WIDTH_SET_40MHZ_80MHZ_IN_5G, "CHANNEL-WIDTH-SET-40MHZ-80MHZ-IN-5G"); PFLAG(PHY, 0, CHANNEL_WIDTH_SET_160MHZ_IN_5G, "CHANNEL-WIDTH-SET-160MHZ-IN-5G"); PFLAG(PHY, 0, CHANNEL_WIDTH_SET_80PLUS80_MHZ_IN_5G, "CHANNEL-WIDTH-SET-80PLUS80-MHZ-IN-5G"); PFLAG(PHY, 0, CHANNEL_WIDTH_SET_RU_MAPPING_IN_2G, "CHANNEL-WIDTH-SET-RU-MAPPING-IN-2G"); PFLAG(PHY, 0, CHANNEL_WIDTH_SET_RU_MAPPING_IN_5G, "CHANNEL-WIDTH-SET-RU-MAPPING-IN-5G"); switch (cap[1] & IEEE80211_HE_PHY_CAP1_PREAMBLE_PUNC_RX_MASK) { case IEEE80211_HE_PHY_CAP1_PREAMBLE_PUNC_RX_80MHZ_ONLY_SECOND_20MHZ: PRINT("PREAMBLE-PUNC-RX-80MHZ-ONLY-SECOND-20MHZ"); break; case IEEE80211_HE_PHY_CAP1_PREAMBLE_PUNC_RX_80MHZ_ONLY_SECOND_40MHZ: PRINT("PREAMBLE-PUNC-RX-80MHZ-ONLY-SECOND-40MHZ"); break; case IEEE80211_HE_PHY_CAP1_PREAMBLE_PUNC_RX_160MHZ_ONLY_SECOND_20MHZ: PRINT("PREAMBLE-PUNC-RX-160MHZ-ONLY-SECOND-20MHZ"); break; case IEEE80211_HE_PHY_CAP1_PREAMBLE_PUNC_RX_160MHZ_ONLY_SECOND_40MHZ: PRINT("PREAMBLE-PUNC-RX-160MHZ-ONLY-SECOND-40MHZ"); break; } PFLAG(PHY, 1, DEVICE_CLASS_A, "IEEE80211-HE-PHY-CAP1-DEVICE-CLASS-A"); PFLAG(PHY, 1, LDPC_CODING_IN_PAYLOAD, "LDPC-CODING-IN-PAYLOAD"); PFLAG(PHY, 1, HE_LTF_AND_GI_FOR_HE_PPDUS_0_8US, "HY-CAP1-HE-LTF-AND-GI-FOR-HE-PPDUS-0-8US"); PRINT("MIDAMBLE-RX-MAX-NSTS-%d", ((cap[2] << 1) | (cap[1] >> 7)) & 0x3); PFLAG(PHY, 2, NDP_4x_LTF_AND_3_2US, "NDP-4X-LTF-AND-3-2US"); PFLAG(PHY, 2, STBC_TX_UNDER_80MHZ, "STBC-TX-UNDER-80MHZ"); PFLAG(PHY, 2, STBC_RX_UNDER_80MHZ, "STBC-RX-UNDER-80MHZ"); PFLAG(PHY, 2, DOPPLER_TX, "DOPPLER-TX"); PFLAG(PHY, 2, DOPPLER_RX, "DOPPLER-RX"); PFLAG(PHY, 2, UL_MU_FULL_MU_MIMO, "UL-MU-FULL-MU-MIMO"); PFLAG(PHY, 2, UL_MU_PARTIAL_MU_MIMO, "UL-MU-PARTIAL-MU-MIMO"); switch (cap[3] & IEEE80211_HE_PHY_CAP3_DCM_MAX_CONST_TX_MASK) { case IEEE80211_HE_PHY_CAP3_DCM_MAX_CONST_TX_NO_DCM: PRINT("DCM-MAX-CONST-TX-NO-DCM"); break; case IEEE80211_HE_PHY_CAP3_DCM_MAX_CONST_TX_BPSK: PRINT("DCM-MAX-CONST-TX-BPSK"); break; case IEEE80211_HE_PHY_CAP3_DCM_MAX_CONST_TX_QPSK: PRINT("DCM-MAX-CONST-TX-QPSK"); break; case IEEE80211_HE_PHY_CAP3_DCM_MAX_CONST_TX_16_QAM: PRINT("DCM-MAX-CONST-TX-16-QAM"); break; } PFLAG(PHY, 3, DCM_MAX_TX_NSS_1, "DCM-MAX-TX-NSS-1"); PFLAG(PHY, 3, DCM_MAX_TX_NSS_2, "DCM-MAX-TX-NSS-2"); switch (cap[3] & IEEE80211_HE_PHY_CAP3_DCM_MAX_CONST_RX_MASK) { case IEEE80211_HE_PHY_CAP3_DCM_MAX_CONST_RX_NO_DCM: PRINT("DCM-MAX-CONST-RX-NO-DCM"); break; case IEEE80211_HE_PHY_CAP3_DCM_MAX_CONST_RX_BPSK: PRINT("DCM-MAX-CONST-RX-BPSK"); break; case IEEE80211_HE_PHY_CAP3_DCM_MAX_CONST_RX_QPSK: PRINT("DCM-MAX-CONST-RX-QPSK"); break; case IEEE80211_HE_PHY_CAP3_DCM_MAX_CONST_RX_16_QAM: PRINT("DCM-MAX-CONST-RX-16-QAM"); break; } PFLAG(PHY, 3, DCM_MAX_RX_NSS_1, "DCM-MAX-RX-NSS-1"); PFLAG(PHY, 3, DCM_MAX_RX_NSS_2, "DCM-MAX-RX-NSS-2"); PFLAG(PHY, 3, RX_PARTIAL_BW_SU_IN_20MHZ_MU, "RX-PARTIAL-BW-SU-IN-20MHZ-MU"); PFLAG(PHY, 3, SU_BEAMFORMER, "SU-BEAMFORMER"); PFLAG(PHY, 4, SU_BEAMFORMEE, "SU-BEAMFORMEE"); PFLAG(PHY, 4, MU_BEAMFORMER, "MU-BEAMFORMER"); PFLAG_RANGE(PHY, 4, BEAMFORMEE_MAX_STS_UNDER_80MHZ, 0, 1, 4, "BEAMFORMEE-MAX-STS-UNDER-%d"); PFLAG_RANGE(PHY, 4, BEAMFORMEE_MAX_STS_ABOVE_80MHZ, 0, 1, 4, "BEAMFORMEE-MAX-STS-ABOVE-%d"); PFLAG_RANGE(PHY, 5, BEAMFORMEE_NUM_SND_DIM_UNDER_80MHZ, 0, 1, 1, "NUM-SND-DIM-UNDER-80MHZ-%d"); PFLAG_RANGE(PHY, 5, BEAMFORMEE_NUM_SND_DIM_ABOVE_80MHZ, 0, 1, 1, "NUM-SND-DIM-ABOVE-80MHZ-%d"); PFLAG(PHY, 5, 
NG16_SU_FEEDBACK, "NG16-SU-FEEDBACK"); PFLAG(PHY, 5, NG16_MU_FEEDBACK, "NG16-MU-FEEDBACK"); PFLAG(PHY, 6, CODEBOOK_SIZE_42_SU, "CODEBOOK-SIZE-42-SU"); PFLAG(PHY, 6, CODEBOOK_SIZE_75_MU, "CODEBOOK-SIZE-75-MU"); PFLAG(PHY, 6, TRIG_SU_BEAMFORMING_FB, "TRIG-SU-BEAMFORMING-FB"); PFLAG(PHY, 6, TRIG_MU_BEAMFORMING_PARTIAL_BW_FB, "MU-BEAMFORMING-PARTIAL-BW-FB"); PFLAG(PHY, 6, TRIG_CQI_FB, "TRIG-CQI-FB"); PFLAG(PHY, 6, PARTIAL_BW_EXT_RANGE, "PARTIAL-BW-EXT-RANGE"); PFLAG(PHY, 6, PARTIAL_BANDWIDTH_DL_MUMIMO, "PARTIAL-BANDWIDTH-DL-MUMIMO"); PFLAG(PHY, 6, PPE_THRESHOLD_PRESENT, "PPE-THRESHOLD-PRESENT"); PFLAG(PHY, 7, PSR_BASED_SR, "PSR-BASED-SR"); PFLAG(PHY, 7, POWER_BOOST_FACTOR_SUPP, "POWER-BOOST-FACTOR-SUPP"); PFLAG(PHY, 7, HE_SU_MU_PPDU_4XLTF_AND_08_US_GI, "HE-SU-MU-PPDU-4XLTF-AND-08-US-GI"); PFLAG_RANGE(PHY, 7, MAX_NC, 0, 1, 1, "MAX-NC-%d"); PFLAG(PHY, 7, STBC_TX_ABOVE_80MHZ, "STBC-TX-ABOVE-80MHZ"); PFLAG(PHY, 7, STBC_RX_ABOVE_80MHZ, "STBC-RX-ABOVE-80MHZ"); PFLAG(PHY, 8, HE_ER_SU_PPDU_4XLTF_AND_08_US_GI, "HE-ER-SU-PPDU-4XLTF-AND-08-US-GI"); PFLAG(PHY, 8, 20MHZ_IN_40MHZ_HE_PPDU_IN_2G, "20MHZ-IN-40MHZ-HE-PPDU-IN-2G"); PFLAG(PHY, 8, 20MHZ_IN_160MHZ_HE_PPDU, "20MHZ-IN-160MHZ-HE-PPDU"); PFLAG(PHY, 8, 80MHZ_IN_160MHZ_HE_PPDU, "80MHZ-IN-160MHZ-HE-PPDU"); PFLAG(PHY, 8, HE_ER_SU_1XLTF_AND_08_US_GI, "HE-ER-SU-1XLTF-AND-08-US-GI"); PFLAG(PHY, 8, MIDAMBLE_RX_TX_2X_AND_1XLTF, "MIDAMBLE-RX-TX-2X-AND-1XLTF"); switch (cap[8] & IEEE80211_HE_PHY_CAP8_DCM_MAX_RU_MASK) { case IEEE80211_HE_PHY_CAP8_DCM_MAX_RU_242: PRINT("DCM-MAX-RU-242"); break; case IEEE80211_HE_PHY_CAP8_DCM_MAX_RU_484: PRINT("DCM-MAX-RU-484"); break; case IEEE80211_HE_PHY_CAP8_DCM_MAX_RU_996: PRINT("DCM-MAX-RU-996"); break; case IEEE80211_HE_PHY_CAP8_DCM_MAX_RU_2x996: PRINT("DCM-MAX-RU-2x996"); break; } PFLAG(PHY, 9, LONGER_THAN_16_SIGB_OFDM_SYM, "LONGER-THAN-16-SIGB-OFDM-SYM"); PFLAG(PHY, 9, NON_TRIGGERED_CQI_FEEDBACK, "NON-TRIGGERED-CQI-FEEDBACK"); PFLAG(PHY, 9, TX_1024_QAM_LESS_THAN_242_TONE_RU, "TX-1024-QAM-LESS-THAN-242-TONE-RU"); PFLAG(PHY, 9, RX_1024_QAM_LESS_THAN_242_TONE_RU, "RX-1024-QAM-LESS-THAN-242-TONE-RU"); PFLAG(PHY, 9, RX_FULL_BW_SU_USING_MU_WITH_COMP_SIGB, "RX-FULL-BW-SU-USING-MU-WITH-COMP-SIGB"); PFLAG(PHY, 9, RX_FULL_BW_SU_USING_MU_WITH_NON_COMP_SIGB, "RX-FULL-BW-SU-USING-MU-WITH-NON-COMP-SIGB"); switch (u8_get_bits(cap[9], IEEE80211_HE_PHY_CAP9_NOMINAL_PKT_PADDING_MASK)) { case IEEE80211_HE_PHY_CAP9_NOMINAL_PKT_PADDING_0US: PRINT("NOMINAL-PACKET-PADDING-0US"); break; case IEEE80211_HE_PHY_CAP9_NOMINAL_PKT_PADDING_8US: PRINT("NOMINAL-PACKET-PADDING-8US"); break; case IEEE80211_HE_PHY_CAP9_NOMINAL_PKT_PADDING_16US: PRINT("NOMINAL-PACKET-PADDING-16US"); break; } #undef PFLAG_RANGE_DEFAULT #undef PFLAG_RANGE #undef PFLAG #define PRINT_NSS_SUPP(f, n) \ do { \ int _i; \ u16 v = le16_to_cpu(nss->f); \ p += scnprintf(p, buf_sz + buf - p, n ": %#.4x\n", v); \ for (_i = 0; _i < 8; _i += 2) { \ switch ((v >> _i) & 0x3) { \ case 0: \ PRINT(n "-%d-SUPPORT-0-7", _i / 2); \ break; \ case 1: \ PRINT(n "-%d-SUPPORT-0-9", _i / 2); \ break; \ case 2: \ PRINT(n "-%d-SUPPORT-0-11", _i / 2); \ break; \ case 3: \ PRINT(n "-%d-NOT-SUPPORTED", _i / 2); \ break; \ } \ } \ } while (0) PRINT_NSS_SUPP(rx_mcs_80, "RX-MCS-80"); PRINT_NSS_SUPP(tx_mcs_80, "TX-MCS-80"); if (cap[0] & IEEE80211_HE_PHY_CAP0_CHANNEL_WIDTH_SET_160MHZ_IN_5G) { PRINT_NSS_SUPP(rx_mcs_160, "RX-MCS-160"); PRINT_NSS_SUPP(tx_mcs_160, "TX-MCS-160"); } if (cap[0] & IEEE80211_HE_PHY_CAP0_CHANNEL_WIDTH_SET_80PLUS80_MHZ_IN_5G) { PRINT_NSS_SUPP(rx_mcs_80p80, "RX-MCS-80P80"); 
PRINT_NSS_SUPP(tx_mcs_80p80, "TX-MCS-80P80"); } #undef PRINT_NSS_SUPP #undef PRINT if (!(cap[6] & IEEE80211_HE_PHY_CAP6_PPE_THRESHOLD_PRESENT)) goto out; p += scnprintf(p, buf_sz + buf - p, "PPE-THRESHOLDS: %#.2x", hec->ppe_thres[0]); ppe_size = ieee80211_he_ppe_size(hec->ppe_thres[0], cap); for (i = 1; i < ppe_size; i++) { p += scnprintf(p, buf_sz + buf - p, " %#.2x", hec->ppe_thres[i]); } p += scnprintf(p, buf_sz + buf - p, "\n"); out: ret = simple_read_from_buffer(userbuf, count, ppos, buf, p - buf); kfree(buf); return ret; } LINK_STA_OPS(he_capa); static ssize_t link_sta_eht_capa_read(struct file *file, char __user *userbuf, size_t count, loff_t *ppos) { char *buf, *p; size_t buf_sz = PAGE_SIZE; struct link_sta_info *link_sta = file->private_data; struct ieee80211_sta_eht_cap *bec = &link_sta->pub->eht_cap; struct ieee80211_eht_cap_elem_fixed *fixed = &bec->eht_cap_elem; struct ieee80211_eht_mcs_nss_supp *nss = &bec->eht_mcs_nss_supp; u8 *cap; int i; ssize_t ret; static const char *mcs_desc[] = { "0-7", "8-9", "10-11", "12-13"}; buf = kmalloc(buf_sz, GFP_KERNEL); if (!buf) return -ENOMEM; p = buf; p += scnprintf(p, buf_sz + buf - p, "EHT %ssupported\n", bec->has_eht ? "" : "not "); if (!bec->has_eht) goto out; p += scnprintf(p, buf_sz + buf - p, "MAC-CAP: %#.2x %#.2x\n", fixed->mac_cap_info[0], fixed->mac_cap_info[1]); p += scnprintf(p, buf_sz + buf - p, "PHY-CAP: %#.2x %#.2x %#.2x %#.2x %#.2x %#.2x %#.2x %#.2x %#.2x\n", fixed->phy_cap_info[0], fixed->phy_cap_info[1], fixed->phy_cap_info[2], fixed->phy_cap_info[3], fixed->phy_cap_info[4], fixed->phy_cap_info[5], fixed->phy_cap_info[6], fixed->phy_cap_info[7], fixed->phy_cap_info[8]); #define PRINT(fmt, ...) \ p += scnprintf(p, buf_sz + buf - p, "\t\t" fmt "\n", \ ##__VA_ARGS__) #define PFLAG(t, n, a, b) \ do { \ if (cap[n] & IEEE80211_EHT_##t##_CAP##n##_##a) \ PRINT("%s", b); \ } while (0) cap = fixed->mac_cap_info; PFLAG(MAC, 0, EPCS_PRIO_ACCESS, "EPCS-PRIO-ACCESS"); PFLAG(MAC, 0, OM_CONTROL, "OM-CONTROL"); PFLAG(MAC, 0, TRIG_TXOP_SHARING_MODE1, "TRIG-TXOP-SHARING-MODE1"); PFLAG(MAC, 0, TRIG_TXOP_SHARING_MODE2, "TRIG-TXOP-SHARING-MODE2"); PFLAG(MAC, 0, RESTRICTED_TWT, "RESTRICTED-TWT"); PFLAG(MAC, 0, SCS_TRAFFIC_DESC, "SCS-TRAFFIC-DESC"); switch ((cap[0] & 0xc0) >> 6) { case IEEE80211_EHT_MAC_CAP0_MAX_MPDU_LEN_3895: PRINT("MAX-MPDU-LEN: 3985"); break; case IEEE80211_EHT_MAC_CAP0_MAX_MPDU_LEN_7991: PRINT("MAX-MPDU-LEN: 7991"); break; case IEEE80211_EHT_MAC_CAP0_MAX_MPDU_LEN_11454: PRINT("MAX-MPDU-LEN: 11454"); break; } cap = fixed->phy_cap_info; PFLAG(PHY, 0, 320MHZ_IN_6GHZ, "320MHZ-IN-6GHZ"); PFLAG(PHY, 0, 242_TONE_RU_GT20MHZ, "242-TONE-RU-GT20MHZ"); PFLAG(PHY, 0, NDP_4_EHT_LFT_32_GI, "NDP-4-EHT-LFT-32-GI"); PFLAG(PHY, 0, PARTIAL_BW_UL_MU_MIMO, "PARTIAL-BW-UL-MU-MIMO"); PFLAG(PHY, 0, SU_BEAMFORMER, "SU-BEAMFORMER"); PFLAG(PHY, 0, SU_BEAMFORMEE, "SU-BEAMFORMEE"); i = cap[0] >> 7; i |= (cap[1] & 0x3) << 1; PRINT("BEAMFORMEE-80-NSS: %i", i); PRINT("BEAMFORMEE-160-NSS: %i", (cap[1] >> 2) & 0x7); PRINT("BEAMFORMEE-320-NSS: %i", (cap[1] >> 5) & 0x7); PRINT("SOUNDING-DIM-80-NSS: %i", (cap[2] & 0x7)); PRINT("SOUNDING-DIM-160-NSS: %i", (cap[2] >> 3) & 0x7); i = cap[2] >> 6; i |= (cap[3] & 0x1) << 3; PRINT("SOUNDING-DIM-320-NSS: %i", i); PFLAG(PHY, 3, NG_16_SU_FEEDBACK, "NG-16-SU-FEEDBACK"); PFLAG(PHY, 3, NG_16_MU_FEEDBACK, "NG-16-MU-FEEDBACK"); PFLAG(PHY, 3, CODEBOOK_4_2_SU_FDBK, "CODEBOOK-4-2-SU-FDBK"); PFLAG(PHY, 3, CODEBOOK_7_5_MU_FDBK, "CODEBOOK-7-5-MU-FDBK"); PFLAG(PHY, 3, TRIG_SU_BF_FDBK, "TRIG-SU-BF-FDBK"); PFLAG(PHY, 3, 
TRIG_MU_BF_PART_BW_FDBK, "TRIG-MU-BF-PART-BW-FDBK"); PFLAG(PHY, 3, TRIG_CQI_FDBK, "TRIG-CQI-FDBK"); PFLAG(PHY, 4, PART_BW_DL_MU_MIMO, "PART-BW-DL-MU-MIMO"); PFLAG(PHY, 4, PSR_SR_SUPP, "PSR-SR-SUPP"); PFLAG(PHY, 4, POWER_BOOST_FACT_SUPP, "POWER-BOOST-FACT-SUPP"); PFLAG(PHY, 4, EHT_MU_PPDU_4_EHT_LTF_08_GI, "EHT-MU-PPDU-4-EHT-LTF-08-GI"); PRINT("MAX_NC: %i", cap[4] >> 4); PFLAG(PHY, 5, NON_TRIG_CQI_FEEDBACK, "NON-TRIG-CQI-FEEDBACK"); PFLAG(PHY, 5, TX_LESS_242_TONE_RU_SUPP, "TX-LESS-242-TONE-RU-SUPP"); PFLAG(PHY, 5, RX_LESS_242_TONE_RU_SUPP, "RX-LESS-242-TONE-RU-SUPP"); PFLAG(PHY, 5, PPE_THRESHOLD_PRESENT, "PPE_THRESHOLD_PRESENT"); switch (cap[5] >> 4 & 0x3) { case IEEE80211_EHT_PHY_CAP5_COMMON_NOMINAL_PKT_PAD_0US: PRINT("NOMINAL_PKT_PAD: 0us"); break; case IEEE80211_EHT_PHY_CAP5_COMMON_NOMINAL_PKT_PAD_8US: PRINT("NOMINAL_PKT_PAD: 8us"); break; case IEEE80211_EHT_PHY_CAP5_COMMON_NOMINAL_PKT_PAD_16US: PRINT("NOMINAL_PKT_PAD: 16us"); break; case IEEE80211_EHT_PHY_CAP5_COMMON_NOMINAL_PKT_PAD_20US: PRINT("NOMINAL_PKT_PAD: 20us"); break; } i = cap[5] >> 6; i |= cap[6] & 0x7; PRINT("MAX-NUM-SUPP-EHT-LTF: %i", i); PFLAG(PHY, 5, SUPP_EXTRA_EHT_LTF, "SUPP-EXTRA-EHT-LTF"); i = (cap[6] >> 3) & 0xf; PRINT("MCS15-SUPP-MASK: %i", i); PFLAG(PHY, 6, EHT_DUP_6GHZ_SUPP, "EHT-DUP-6GHZ-SUPP"); PFLAG(PHY, 7, 20MHZ_STA_RX_NDP_WIDER_BW, "20MHZ-STA-RX-NDP-WIDER-BW"); PFLAG(PHY, 7, NON_OFDMA_UL_MU_MIMO_80MHZ, "NON-OFDMA-UL-MU-MIMO-80MHZ"); PFLAG(PHY, 7, NON_OFDMA_UL_MU_MIMO_160MHZ, "NON-OFDMA-UL-MU-MIMO-160MHZ"); PFLAG(PHY, 7, NON_OFDMA_UL_MU_MIMO_320MHZ, "NON-OFDMA-UL-MU-MIMO-320MHZ"); PFLAG(PHY, 7, MU_BEAMFORMER_80MHZ, "MU-BEAMFORMER-80MHZ"); PFLAG(PHY, 7, MU_BEAMFORMER_160MHZ, "MU-BEAMFORMER-160MHZ"); PFLAG(PHY, 7, MU_BEAMFORMER_320MHZ, "MU-BEAMFORMER-320MHZ"); PFLAG(PHY, 7, TB_SOUNDING_FDBK_RATE_LIMIT, "TB-SOUNDING-FDBK-RATE-LIMIT"); PFLAG(PHY, 8, RX_1024QAM_WIDER_BW_DL_OFDMA, "RX-1024QAM-WIDER-BW-DL-OFDMA"); PFLAG(PHY, 8, RX_4096QAM_WIDER_BW_DL_OFDMA, "RX-4096QAM-WIDER-BW-DL-OFDMA"); #undef PFLAG PRINT(""); /* newline */ if (!(link_sta->pub->he_cap.he_cap_elem.phy_cap_info[0] & IEEE80211_HE_PHY_CAP0_CHANNEL_WIDTH_SET_MASK_ALL)) { u8 *mcs_vals = (u8 *)(&nss->only_20mhz); for (i = 0; i < 4; i++) PRINT("EHT bw=20 MHz, max NSS for MCS %s: Rx=%u, Tx=%u", mcs_desc[i], mcs_vals[i] & 0xf, mcs_vals[i] >> 4); } else { u8 *mcs_vals = (u8 *)(&nss->bw._80); for (i = 0; i < 3; i++) PRINT("EHT bw <= 80 MHz, max NSS for MCS %s: Rx=%u, Tx=%u", mcs_desc[i + 1], mcs_vals[i] & 0xf, mcs_vals[i] >> 4); mcs_vals = (u8 *)(&nss->bw._160); for (i = 0; i < 3; i++) PRINT("EHT bw <= 160 MHz, max NSS for MCS %s: Rx=%u, Tx=%u", mcs_desc[i + 1], mcs_vals[i] & 0xf, mcs_vals[i] >> 4); mcs_vals = (u8 *)(&nss->bw._320); for (i = 0; i < 3; i++) PRINT("EHT bw <= 320 MHz, max NSS for MCS %s: Rx=%u, Tx=%u", mcs_desc[i + 1], mcs_vals[i] & 0xf, mcs_vals[i] >> 4); } if (cap[5] & IEEE80211_EHT_PHY_CAP5_PPE_THRESHOLD_PRESENT) { u8 ppe_size = ieee80211_eht_ppe_size(bec->eht_ppe_thres[0], cap); p += scnprintf(p, buf_sz + buf - p, "EHT PPE Thresholds: "); for (i = 0; i < ppe_size; i++) p += scnprintf(p, buf_sz + buf - p, "0x%02x ", bec->eht_ppe_thres[i]); PRINT(""); /* newline */ } out: ret = simple_read_from_buffer(userbuf, count, ppos, buf, p - buf); kfree(buf); return ret; } LINK_STA_OPS(eht_capa); #define DEBUGFS_ADD(name) \ debugfs_create_file(#name, 0400, \ sta->debugfs_dir, sta, &sta_ ##name## _ops) #define DEBUGFS_ADD_COUNTER(name, field) \ debugfs_create_ulong(#name, 0400, sta->debugfs_dir, &sta->field); void ieee80211_sta_debugfs_add(struct sta_info 
*sta) { struct ieee80211_local *local = sta->local; struct ieee80211_sub_if_data *sdata = sta->sdata; struct dentry *stations_dir = sta->sdata->debugfs.subdir_stations; u8 mac[MAC_ADDR_STR_LEN + 1]; if (!stations_dir) return; snprintf(mac, sizeof(mac), "%pM", sta->sta.addr); /* * This might fail due to a race condition: * When mac80211 unlinks a station, the debugfs entries * remain, but it is already possible to link a new * station with the same address which triggers adding * it to debugfs; therefore, if the old station isn't * destroyed quickly enough the old station's debugfs * dir might still be around. */ sta->debugfs_dir = debugfs_create_dir(mac, stations_dir); DEBUGFS_ADD(flags); DEBUGFS_ADD(aid); DEBUGFS_ADD(num_ps_buf_frames); DEBUGFS_ADD(last_seq_ctrl); DEBUGFS_ADD(agg_status); /* FIXME: Kept here as the statistics are only done on the deflink */ DEBUGFS_ADD_COUNTER(tx_filtered, deflink.status_stats.filtered); DEBUGFS_ADD(aqm); DEBUGFS_ADD(airtime); if (wiphy_ext_feature_isset(local->hw.wiphy, NL80211_EXT_FEATURE_AQL)) DEBUGFS_ADD(aql); debugfs_create_xul("driver_buffered_tids", 0400, sta->debugfs_dir, &sta->driver_buffered_tids); drv_sta_add_debugfs(local, sdata, &sta->sta, sta->debugfs_dir); } void ieee80211_sta_debugfs_remove(struct sta_info *sta) { debugfs_remove_recursive(sta->debugfs_dir); sta->debugfs_dir = NULL; } #undef DEBUGFS_ADD #undef DEBUGFS_ADD_COUNTER #define DEBUGFS_ADD(name) \ debugfs_create_file(#name, 0400, \ link_sta->debugfs_dir, link_sta, &link_sta_ ##name## _ops) #define DEBUGFS_ADD_COUNTER(name, field) \ debugfs_create_ulong(#name, 0400, link_sta->debugfs_dir, &link_sta->field) void ieee80211_link_sta_debugfs_add(struct link_sta_info *link_sta) { if (WARN_ON(!link_sta->sta->debugfs_dir)) return; /* For non-MLO, leave the files in the main directory. */ if (link_sta->sta->sta.valid_links) { char link_dir_name[10]; snprintf(link_dir_name, sizeof(link_dir_name), "link-%d", link_sta->link_id); link_sta->debugfs_dir = debugfs_create_dir(link_dir_name, link_sta->sta->debugfs_dir); DEBUGFS_ADD(addr); } else { if (WARN_ON(link_sta != &link_sta->sta->deflink)) return; link_sta->debugfs_dir = link_sta->sta->debugfs_dir; } DEBUGFS_ADD(ht_capa); DEBUGFS_ADD(vht_capa); DEBUGFS_ADD(he_capa); DEBUGFS_ADD(eht_capa); DEBUGFS_ADD_COUNTER(rx_duplicates, rx_stats.num_duplicates); DEBUGFS_ADD_COUNTER(rx_fragments, rx_stats.fragments); } void ieee80211_link_sta_debugfs_remove(struct link_sta_info *link_sta) { if (!link_sta->debugfs_dir || !link_sta->sta->debugfs_dir) { link_sta->debugfs_dir = NULL; return; } if (link_sta->debugfs_dir == link_sta->sta->debugfs_dir) { WARN_ON(link_sta != &link_sta->sta->deflink); link_sta->sta->debugfs_dir = NULL; return; } debugfs_remove_recursive(link_sta->debugfs_dir); link_sta->debugfs_dir = NULL; } void ieee80211_link_sta_debugfs_drv_add(struct link_sta_info *link_sta) { if (WARN_ON(!link_sta->debugfs_dir)) return; drv_link_sta_add_debugfs(link_sta->sta->local, link_sta->sta->sdata, link_sta->pub, link_sta->debugfs_dir); } void ieee80211_link_sta_debugfs_drv_remove(struct link_sta_info *link_sta) { if (!link_sta->debugfs_dir) return; if (WARN_ON(link_sta->debugfs_dir == link_sta->sta->debugfs_dir)) return; /* Recreate the directory excluding the driver data */ debugfs_remove_recursive(link_sta->debugfs_dir); link_sta->debugfs_dir = NULL; ieee80211_link_sta_debugfs_add(link_sta); } |
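/*
 * Editor's note: every read handler above builds its output with the same
 * append idiom: p += scnprintf(p, bufsz + buf - p, ...), where
 * (buf + bufsz) - p is exactly the room left in the buffer and the kernel's
 * scnprintf() returns the number of characters actually stored, so p can
 * never step past the end of buf even once the buffer fills up. The
 * stand-alone user-space sketch below (not mac80211 code; names are
 * illustrative) mimics that behaviour with snprintf() plus an explicit
 * clamp, since plain snprintf() returns the would-be length rather than
 * the stored one.
 */
#include <stddef.h>
#include <stdio.h>

int main(void)
{
	char buf[64], *p = buf;
	const size_t bufsz = sizeof(buf);
	int ac;

	for (ac = 0; ac < 4; ac++) {
		ptrdiff_t room = buf + bufsz - p;	/* bytes left, incl. NUL */
		int n = snprintf(p, room, "AC%d: %d\n", ac, ac * 10);

		/* clamp like scnprintf(): count only what was actually stored */
		if (n >= room)
			n = room - 1;
		p += n;
	}
	/* p - buf is the formatted length, as passed to simple_read_from_buffer() */
	fwrite(buf, 1, p - buf, stdout);
	return 0;
}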
/* SPDX-License-Identifier: GPL-2.0-or-later */ #ifndef __SOUND_PCM_H #define __SOUND_PCM_H /* * Digital Audio (PCM) abstract layer * Copyright (c) by Jaroslav Kysela <perex@perex.cz> 
* Abramo Bagnara <abramo@alsa-project.org> */ #include <sound/asound.h> #include <sound/memalloc.h> #include <sound/minors.h> #include <linux/poll.h> #include <linux/mm.h> #include <linux/bitops.h> #include <linux/pm_qos.h> #include <linux/refcount.h> #include <linux/uio.h> #define snd_pcm_substream_chip(substream) ((substream)->private_data) #define snd_pcm_chip(pcm) ((pcm)->private_data) #if IS_ENABLED(CONFIG_SND_PCM_OSS) #include <sound/pcm_oss.h> #endif /* * Hardware (lowlevel) section */ struct snd_pcm_hardware { unsigned int info; /* SNDRV_PCM_INFO_* */ u64 formats; /* SNDRV_PCM_FMTBIT_* */ u32 subformats; /* for S32_LE, SNDRV_PCM_SUBFMTBIT_* */ unsigned int rates; /* SNDRV_PCM_RATE_* */ unsigned int rate_min; /* min rate */ unsigned int rate_max; /* max rate */ unsigned int channels_min; /* min channels */ unsigned int channels_max; /* max channels */ size_t buffer_bytes_max; /* max buffer size */ size_t period_bytes_min; /* min period size */ size_t period_bytes_max; /* max period size */ unsigned int periods_min; /* min # of periods */ unsigned int periods_max; /* max # of periods */ size_t fifo_size; /* fifo size in bytes */ }; struct snd_pcm_status64; struct snd_pcm_substream; struct snd_pcm_audio_tstamp_config; /* definitions further down */ struct snd_pcm_audio_tstamp_report; struct snd_pcm_ops { int (*open)(struct snd_pcm_substream *substream); int (*close)(struct snd_pcm_substream *substream); int (*ioctl)(struct snd_pcm_substream * substream, unsigned int cmd, void *arg); int (*hw_params)(struct snd_pcm_substream *substream, struct snd_pcm_hw_params *params); int (*hw_free)(struct snd_pcm_substream *substream); int (*prepare)(struct snd_pcm_substream *substream); int (*trigger)(struct snd_pcm_substream *substream, int cmd); int (*sync_stop)(struct snd_pcm_substream *substream); snd_pcm_uframes_t (*pointer)(struct snd_pcm_substream *substream); int (*get_time_info)(struct snd_pcm_substream *substream, struct timespec64 *system_ts, struct timespec64 *audio_ts, struct snd_pcm_audio_tstamp_config *audio_tstamp_config, struct snd_pcm_audio_tstamp_report *audio_tstamp_report); int (*fill_silence)(struct snd_pcm_substream *substream, int channel, unsigned long pos, unsigned long bytes); int (*copy)(struct snd_pcm_substream *substream, int channel, unsigned long pos, struct iov_iter *iter, unsigned long bytes); struct page *(*page)(struct snd_pcm_substream *substream, unsigned long offset); int (*mmap)(struct snd_pcm_substream *substream, struct vm_area_struct *vma); int (*ack)(struct snd_pcm_substream *substream); }; /* * */ #if defined(CONFIG_SND_DYNAMIC_MINORS) #define SNDRV_PCM_DEVICES (SNDRV_OS_MINORS-2) #else #define SNDRV_PCM_DEVICES 8 #endif #define SNDRV_PCM_IOCTL1_RESET 0 /* 1 is absent slot. */ #define SNDRV_PCM_IOCTL1_CHANNEL_INFO 2 /* 3 is absent slot. 
*/ #define SNDRV_PCM_IOCTL1_FIFO_SIZE 4 #define SNDRV_PCM_IOCTL1_SYNC_ID 5 #define SNDRV_PCM_TRIGGER_STOP 0 #define SNDRV_PCM_TRIGGER_START 1 #define SNDRV_PCM_TRIGGER_PAUSE_PUSH 2 #define SNDRV_PCM_TRIGGER_PAUSE_RELEASE 3 #define SNDRV_PCM_TRIGGER_SUSPEND 4 #define SNDRV_PCM_TRIGGER_RESUME 5 #define SNDRV_PCM_TRIGGER_DRAIN 6 #define SNDRV_PCM_POS_XRUN ((snd_pcm_uframes_t)-1) /* If you change this don't forget to change rates[] table in pcm_native.c */ #define SNDRV_PCM_RATE_5512 (1U<<0) /* 5512Hz */ #define SNDRV_PCM_RATE_8000 (1U<<1) /* 8000Hz */ #define SNDRV_PCM_RATE_11025 (1U<<2) /* 11025Hz */ #define SNDRV_PCM_RATE_16000 (1U<<3) /* 16000Hz */ #define SNDRV_PCM_RATE_22050 (1U<<4) /* 22050Hz */ #define SNDRV_PCM_RATE_32000 (1U<<5) /* 32000Hz */ #define SNDRV_PCM_RATE_44100 (1U<<6) /* 44100Hz */ #define SNDRV_PCM_RATE_48000 (1U<<7) /* 48000Hz */ #define SNDRV_PCM_RATE_64000 (1U<<8) /* 64000Hz */ #define SNDRV_PCM_RATE_88200 (1U<<9) /* 88200Hz */ #define SNDRV_PCM_RATE_96000 (1U<<10) /* 96000Hz */ #define SNDRV_PCM_RATE_176400 (1U<<11) /* 176400Hz */ #define SNDRV_PCM_RATE_192000 (1U<<12) /* 192000Hz */ #define SNDRV_PCM_RATE_352800 (1U<<13) /* 352800Hz */ #define SNDRV_PCM_RATE_384000 (1U<<14) /* 384000Hz */ #define SNDRV_PCM_RATE_705600 (1U<<15) /* 705600Hz */ #define SNDRV_PCM_RATE_768000 (1U<<16) /* 768000Hz */ /* extended rates since 6.12 */ #define SNDRV_PCM_RATE_12000 (1U<<17) /* 12000Hz */ #define SNDRV_PCM_RATE_24000 (1U<<18) /* 24000Hz */ #define SNDRV_PCM_RATE_128000 (1U<<19) /* 128000Hz */ #define SNDRV_PCM_RATE_CONTINUOUS (1U<<30) /* continuous range */ #define SNDRV_PCM_RATE_KNOT (1U<<31) /* supports more non-continuous rates */ #define SNDRV_PCM_RATE_8000_44100 (SNDRV_PCM_RATE_8000|SNDRV_PCM_RATE_11025|\ SNDRV_PCM_RATE_16000|SNDRV_PCM_RATE_22050|\ SNDRV_PCM_RATE_32000|SNDRV_PCM_RATE_44100) #define SNDRV_PCM_RATE_8000_48000 (SNDRV_PCM_RATE_8000_44100|SNDRV_PCM_RATE_48000) #define SNDRV_PCM_RATE_8000_96000 (SNDRV_PCM_RATE_8000_48000|SNDRV_PCM_RATE_64000|\ SNDRV_PCM_RATE_88200|SNDRV_PCM_RATE_96000) #define SNDRV_PCM_RATE_8000_192000 (SNDRV_PCM_RATE_8000_96000|SNDRV_PCM_RATE_176400|\ SNDRV_PCM_RATE_192000) #define SNDRV_PCM_RATE_8000_384000 (SNDRV_PCM_RATE_8000_192000|\ SNDRV_PCM_RATE_352800|\ SNDRV_PCM_RATE_384000) #define SNDRV_PCM_RATE_8000_768000 (SNDRV_PCM_RATE_8000_384000|\ SNDRV_PCM_RATE_705600|\ SNDRV_PCM_RATE_768000) #define _SNDRV_PCM_FMTBIT(fmt) (1ULL << (__force int)SNDRV_PCM_FORMAT_##fmt) #define SNDRV_PCM_FMTBIT_S8 _SNDRV_PCM_FMTBIT(S8) #define SNDRV_PCM_FMTBIT_U8 _SNDRV_PCM_FMTBIT(U8) #define SNDRV_PCM_FMTBIT_S16_LE _SNDRV_PCM_FMTBIT(S16_LE) #define SNDRV_PCM_FMTBIT_S16_BE _SNDRV_PCM_FMTBIT(S16_BE) #define SNDRV_PCM_FMTBIT_U16_LE _SNDRV_PCM_FMTBIT(U16_LE) #define SNDRV_PCM_FMTBIT_U16_BE _SNDRV_PCM_FMTBIT(U16_BE) #define SNDRV_PCM_FMTBIT_S24_LE _SNDRV_PCM_FMTBIT(S24_LE) #define SNDRV_PCM_FMTBIT_S24_BE _SNDRV_PCM_FMTBIT(S24_BE) #define SNDRV_PCM_FMTBIT_U24_LE _SNDRV_PCM_FMTBIT(U24_LE) #define SNDRV_PCM_FMTBIT_U24_BE _SNDRV_PCM_FMTBIT(U24_BE) // For S32/U32 formats, 'msbits' hardware parameter is often used to deliver information about the // available bit count in most significant bit. It's for the case of so-called 'left-justified' or // `right-padding` sample which has less width than 32 bit. 
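/*
 * Illustrative sketch (assumption, not part of this header): a driver whose
 * converter takes 32-bit samples but only resolves the top 24 bits would
 * typically express that from its .open callback with the msbits constraint
 * declared further down in this header. The "foo_" name is hypothetical.
 */
static int foo_open_limit_msbits(struct snd_pcm_substream *substream)
{
	/* 32-bit container, 24 significant bits ("left-justified" samples) */
	return snd_pcm_hw_constraint_msbits(substream->runtime, 0, 32, 24);
}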
#define SNDRV_PCM_FMTBIT_S32_LE _SNDRV_PCM_FMTBIT(S32_LE) #define SNDRV_PCM_FMTBIT_S32_BE _SNDRV_PCM_FMTBIT(S32_BE) #define SNDRV_PCM_FMTBIT_U32_LE _SNDRV_PCM_FMTBIT(U32_LE) #define SNDRV_PCM_FMTBIT_U32_BE _SNDRV_PCM_FMTBIT(U32_BE) #define SNDRV_PCM_FMTBIT_FLOAT_LE _SNDRV_PCM_FMTBIT(FLOAT_LE) #define SNDRV_PCM_FMTBIT_FLOAT_BE _SNDRV_PCM_FMTBIT(FLOAT_BE) #define SNDRV_PCM_FMTBIT_FLOAT64_LE _SNDRV_PCM_FMTBIT(FLOAT64_LE) #define SNDRV_PCM_FMTBIT_FLOAT64_BE _SNDRV_PCM_FMTBIT(FLOAT64_BE) #define SNDRV_PCM_FMTBIT_IEC958_SUBFRAME_LE _SNDRV_PCM_FMTBIT(IEC958_SUBFRAME_LE) #define SNDRV_PCM_FMTBIT_IEC958_SUBFRAME_BE _SNDRV_PCM_FMTBIT(IEC958_SUBFRAME_BE) #define SNDRV_PCM_FMTBIT_MU_LAW _SNDRV_PCM_FMTBIT(MU_LAW) #define SNDRV_PCM_FMTBIT_A_LAW _SNDRV_PCM_FMTBIT(A_LAW) #define SNDRV_PCM_FMTBIT_IMA_ADPCM _SNDRV_PCM_FMTBIT(IMA_ADPCM) #define SNDRV_PCM_FMTBIT_MPEG _SNDRV_PCM_FMTBIT(MPEG) #define SNDRV_PCM_FMTBIT_GSM _SNDRV_PCM_FMTBIT(GSM) #define SNDRV_PCM_FMTBIT_S20_LE _SNDRV_PCM_FMTBIT(S20_LE) #define SNDRV_PCM_FMTBIT_U20_LE _SNDRV_PCM_FMTBIT(U20_LE) #define SNDRV_PCM_FMTBIT_S20_BE _SNDRV_PCM_FMTBIT(S20_BE) #define SNDRV_PCM_FMTBIT_U20_BE _SNDRV_PCM_FMTBIT(U20_BE) #define SNDRV_PCM_FMTBIT_SPECIAL _SNDRV_PCM_FMTBIT(SPECIAL) #define SNDRV_PCM_FMTBIT_S24_3LE _SNDRV_PCM_FMTBIT(S24_3LE) #define SNDRV_PCM_FMTBIT_U24_3LE _SNDRV_PCM_FMTBIT(U24_3LE) #define SNDRV_PCM_FMTBIT_S24_3BE _SNDRV_PCM_FMTBIT(S24_3BE) #define SNDRV_PCM_FMTBIT_U24_3BE _SNDRV_PCM_FMTBIT(U24_3BE) #define SNDRV_PCM_FMTBIT_S20_3LE _SNDRV_PCM_FMTBIT(S20_3LE) #define SNDRV_PCM_FMTBIT_U20_3LE _SNDRV_PCM_FMTBIT(U20_3LE) #define SNDRV_PCM_FMTBIT_S20_3BE _SNDRV_PCM_FMTBIT(S20_3BE) #define SNDRV_PCM_FMTBIT_U20_3BE _SNDRV_PCM_FMTBIT(U20_3BE) #define SNDRV_PCM_FMTBIT_S18_3LE _SNDRV_PCM_FMTBIT(S18_3LE) #define SNDRV_PCM_FMTBIT_U18_3LE _SNDRV_PCM_FMTBIT(U18_3LE) #define SNDRV_PCM_FMTBIT_S18_3BE _SNDRV_PCM_FMTBIT(S18_3BE) #define SNDRV_PCM_FMTBIT_U18_3BE _SNDRV_PCM_FMTBIT(U18_3BE) #define SNDRV_PCM_FMTBIT_G723_24 _SNDRV_PCM_FMTBIT(G723_24) #define SNDRV_PCM_FMTBIT_G723_24_1B _SNDRV_PCM_FMTBIT(G723_24_1B) #define SNDRV_PCM_FMTBIT_G723_40 _SNDRV_PCM_FMTBIT(G723_40) #define SNDRV_PCM_FMTBIT_G723_40_1B _SNDRV_PCM_FMTBIT(G723_40_1B) #define SNDRV_PCM_FMTBIT_DSD_U8 _SNDRV_PCM_FMTBIT(DSD_U8) #define SNDRV_PCM_FMTBIT_DSD_U16_LE _SNDRV_PCM_FMTBIT(DSD_U16_LE) #define SNDRV_PCM_FMTBIT_DSD_U32_LE _SNDRV_PCM_FMTBIT(DSD_U32_LE) #define SNDRV_PCM_FMTBIT_DSD_U16_BE _SNDRV_PCM_FMTBIT(DSD_U16_BE) #define SNDRV_PCM_FMTBIT_DSD_U32_BE _SNDRV_PCM_FMTBIT(DSD_U32_BE) #ifdef SNDRV_LITTLE_ENDIAN #define SNDRV_PCM_FMTBIT_S16 SNDRV_PCM_FMTBIT_S16_LE #define SNDRV_PCM_FMTBIT_U16 SNDRV_PCM_FMTBIT_U16_LE #define SNDRV_PCM_FMTBIT_S24 SNDRV_PCM_FMTBIT_S24_LE #define SNDRV_PCM_FMTBIT_U24 SNDRV_PCM_FMTBIT_U24_LE #define SNDRV_PCM_FMTBIT_S32 SNDRV_PCM_FMTBIT_S32_LE #define SNDRV_PCM_FMTBIT_U32 SNDRV_PCM_FMTBIT_U32_LE #define SNDRV_PCM_FMTBIT_FLOAT SNDRV_PCM_FMTBIT_FLOAT_LE #define SNDRV_PCM_FMTBIT_FLOAT64 SNDRV_PCM_FMTBIT_FLOAT64_LE #define SNDRV_PCM_FMTBIT_IEC958_SUBFRAME SNDRV_PCM_FMTBIT_IEC958_SUBFRAME_LE #define SNDRV_PCM_FMTBIT_S20 SNDRV_PCM_FMTBIT_S20_LE #define SNDRV_PCM_FMTBIT_U20 SNDRV_PCM_FMTBIT_U20_LE #endif #ifdef SNDRV_BIG_ENDIAN #define SNDRV_PCM_FMTBIT_S16 SNDRV_PCM_FMTBIT_S16_BE #define SNDRV_PCM_FMTBIT_U16 SNDRV_PCM_FMTBIT_U16_BE #define SNDRV_PCM_FMTBIT_S24 SNDRV_PCM_FMTBIT_S24_BE #define SNDRV_PCM_FMTBIT_U24 SNDRV_PCM_FMTBIT_U24_BE #define SNDRV_PCM_FMTBIT_S32 SNDRV_PCM_FMTBIT_S32_BE #define SNDRV_PCM_FMTBIT_U32 SNDRV_PCM_FMTBIT_U32_BE #define SNDRV_PCM_FMTBIT_FLOAT 
SNDRV_PCM_FMTBIT_FLOAT_BE #define SNDRV_PCM_FMTBIT_FLOAT64 SNDRV_PCM_FMTBIT_FLOAT64_BE #define SNDRV_PCM_FMTBIT_IEC958_SUBFRAME SNDRV_PCM_FMTBIT_IEC958_SUBFRAME_BE #define SNDRV_PCM_FMTBIT_S20 SNDRV_PCM_FMTBIT_S20_BE #define SNDRV_PCM_FMTBIT_U20 SNDRV_PCM_FMTBIT_U20_BE #endif #define _SNDRV_PCM_SUBFMTBIT(fmt) BIT((__force int)SNDRV_PCM_SUBFORMAT_##fmt) #define SNDRV_PCM_SUBFMTBIT_STD _SNDRV_PCM_SUBFMTBIT(STD) #define SNDRV_PCM_SUBFMTBIT_MSBITS_MAX _SNDRV_PCM_SUBFMTBIT(MSBITS_MAX) #define SNDRV_PCM_SUBFMTBIT_MSBITS_20 _SNDRV_PCM_SUBFMTBIT(MSBITS_20) #define SNDRV_PCM_SUBFMTBIT_MSBITS_24 _SNDRV_PCM_SUBFMTBIT(MSBITS_24) struct snd_pcm_file { struct snd_pcm_substream *substream; int no_compat_mmap; unsigned int user_pversion; /* supported protocol version */ }; struct snd_pcm_hw_rule; typedef int (*snd_pcm_hw_rule_func_t)(struct snd_pcm_hw_params *params, struct snd_pcm_hw_rule *rule); struct snd_pcm_hw_rule { unsigned int cond; int var; int deps[5]; snd_pcm_hw_rule_func_t func; void *private; }; struct snd_pcm_hw_constraints { struct snd_mask masks[SNDRV_PCM_HW_PARAM_LAST_MASK - SNDRV_PCM_HW_PARAM_FIRST_MASK + 1]; struct snd_interval intervals[SNDRV_PCM_HW_PARAM_LAST_INTERVAL - SNDRV_PCM_HW_PARAM_FIRST_INTERVAL + 1]; unsigned int rules_num; unsigned int rules_all; struct snd_pcm_hw_rule *rules; }; static inline struct snd_mask *constrs_mask(struct snd_pcm_hw_constraints *constrs, snd_pcm_hw_param_t var) { return &constrs->masks[var - SNDRV_PCM_HW_PARAM_FIRST_MASK]; } static inline struct snd_interval *constrs_interval(struct snd_pcm_hw_constraints *constrs, snd_pcm_hw_param_t var) { return &constrs->intervals[var - SNDRV_PCM_HW_PARAM_FIRST_INTERVAL]; } struct snd_ratnum { unsigned int num; unsigned int den_min, den_max, den_step; }; struct snd_ratden { unsigned int num_min, num_max, num_step; unsigned int den; }; struct snd_pcm_hw_constraint_ratnums { int nrats; const struct snd_ratnum *rats; }; struct snd_pcm_hw_constraint_ratdens { int nrats; const struct snd_ratden *rats; }; struct snd_pcm_hw_constraint_list { const unsigned int *list; unsigned int count; unsigned int mask; }; struct snd_pcm_hw_constraint_ranges { unsigned int count; const struct snd_interval *ranges; unsigned int mask; }; /* * userspace-provided audio timestamp config to kernel, * structure is for internal use only and filled with dedicated unpack routine */ struct snd_pcm_audio_tstamp_config { /* 5 of max 16 bits used */ u32 type_requested:4; u32 report_delay:1; /* add total delay to A/D or D/A */ }; static inline void snd_pcm_unpack_audio_tstamp_config(__u32 data, struct snd_pcm_audio_tstamp_config *config) { config->type_requested = data & 0xF; config->report_delay = (data >> 4) & 1; } /* * kernel-provided audio timestamp report to user-space * structure is for internal use only and read by dedicated pack routine */ struct snd_pcm_audio_tstamp_report { /* 6 of max 16 bits used for bit-fields */ /* for backwards compatibility */ u32 valid:1; /* actual type if hardware could not support requested timestamp */ u32 actual_type:4; /* accuracy represented in ns units */ u32 accuracy_report:1; /* 0 if accuracy unknown, 1 if accuracy field is valid */ u32 accuracy; /* up to 4.29s, will be packed in separate field */ }; static inline void snd_pcm_pack_audio_tstamp_report(__u32 *data, __u32 *accuracy, const struct snd_pcm_audio_tstamp_report *report) { u32 tmp; tmp = report->accuracy_report; tmp <<= 4; tmp |= report->actual_type; tmp <<= 1; tmp |= report->valid; *data &= 0xffff; /* zero-clear MSBs */ *data |= (tmp << 16); 
*accuracy = report->accuracy; } struct snd_pcm_runtime { /* -- Status -- */ snd_pcm_state_t state; /* stream state */ snd_pcm_state_t suspended_state; /* suspended stream state */ struct snd_pcm_substream *trigger_master; struct timespec64 trigger_tstamp; /* trigger timestamp */ bool trigger_tstamp_latched; /* trigger timestamp latched in low-level driver/hardware */ int overrange; snd_pcm_uframes_t avail_max; snd_pcm_uframes_t hw_ptr_base; /* Position at buffer restart */ snd_pcm_uframes_t hw_ptr_interrupt; /* Position at interrupt time */ unsigned long hw_ptr_jiffies; /* Time when hw_ptr is updated */ unsigned long hw_ptr_buffer_jiffies; /* buffer time in jiffies */ snd_pcm_sframes_t delay; /* extra delay; typically FIFO size */ u64 hw_ptr_wrap; /* offset for hw_ptr due to boundary wrap-around */ /* -- HW params -- */ snd_pcm_access_t access; /* access mode */ snd_pcm_format_t format; /* SNDRV_PCM_FORMAT_* */ snd_pcm_subformat_t subformat; /* subformat */ unsigned int rate; /* rate in Hz */ unsigned int channels; /* channels */ snd_pcm_uframes_t period_size; /* period size */ unsigned int periods; /* periods */ snd_pcm_uframes_t buffer_size; /* buffer size */ snd_pcm_uframes_t min_align; /* Min alignment for the format */ size_t byte_align; unsigned int frame_bits; unsigned int sample_bits; unsigned int info; unsigned int rate_num; unsigned int rate_den; unsigned int no_period_wakeup: 1; /* -- SW params; see struct snd_pcm_sw_params for comments -- */ int tstamp_mode; unsigned int period_step; snd_pcm_uframes_t start_threshold; snd_pcm_uframes_t stop_threshold; snd_pcm_uframes_t silence_threshold; snd_pcm_uframes_t silence_size; snd_pcm_uframes_t boundary; /* internal data of auto-silencer */ snd_pcm_uframes_t silence_start; /* starting pointer to silence area */ snd_pcm_uframes_t silence_filled; /* already filled part of silence area */ bool std_sync_id; /* hardware synchronization - standard per card ID */ /* -- mmap -- */ struct snd_pcm_mmap_status *status; struct snd_pcm_mmap_control *control; /* -- locking / scheduling -- */ snd_pcm_uframes_t twake; /* do transfer (!poll) wakeup if non-zero */ wait_queue_head_t sleep; /* poll sleep */ wait_queue_head_t tsleep; /* transfer sleep */ struct snd_fasync *fasync; bool stop_operating; /* sync_stop will be called */ struct mutex buffer_mutex; /* protect for buffer changes */ atomic_t buffer_accessing; /* >0: in r/w operation, <0: blocked */ /* -- private section -- */ void *private_data; void (*private_free)(struct snd_pcm_runtime *runtime); /* -- hardware description -- */ struct snd_pcm_hardware hw; struct snd_pcm_hw_constraints hw_constraints; /* -- timer -- */ unsigned int timer_resolution; /* timer resolution */ int tstamp_type; /* timestamp type */ /* -- DMA -- */ unsigned char *dma_area; /* DMA area */ dma_addr_t dma_addr; /* physical bus address (not accessible from main CPU) */ size_t dma_bytes; /* size of DMA area */ struct snd_dma_buffer *dma_buffer_p; /* allocated buffer */ unsigned int buffer_changed:1; /* buffer allocation changed; set only in managed mode */ /* -- audio timestamp config -- */ struct snd_pcm_audio_tstamp_config audio_tstamp_config; struct snd_pcm_audio_tstamp_report audio_tstamp_report; struct timespec64 driver_tstamp; #if IS_ENABLED(CONFIG_SND_PCM_OSS) /* -- OSS things -- */ struct snd_pcm_oss_runtime oss; #endif }; struct snd_pcm_group { /* keep linked substreams */ spinlock_t lock; struct mutex mutex; struct list_head substreams; refcount_t refs; }; struct pid; struct snd_pcm_substream { struct snd_pcm 
*pcm; struct snd_pcm_str *pstr; void *private_data; /* copied from pcm->private_data */ int number; char name[32]; /* substream name */ int stream; /* stream (direction) */ struct pm_qos_request latency_pm_qos_req; /* pm_qos request */ size_t buffer_bytes_max; /* limit ring buffer size */ struct snd_dma_buffer dma_buffer; size_t dma_max; /* -- hardware operations -- */ const struct snd_pcm_ops *ops; /* -- runtime information -- */ struct snd_pcm_runtime *runtime; /* -- timer section -- */ struct snd_timer *timer; /* timer */ unsigned timer_running: 1; /* time is running */ long wait_time; /* time in ms for R/W to wait for avail */ /* -- next substream -- */ struct snd_pcm_substream *next; /* -- linked substreams -- */ struct list_head link_list; /* linked list member */ struct snd_pcm_group self_group; /* fake group for non linked substream (with substream lock inside) */ struct snd_pcm_group *group; /* pointer to current group */ /* -- assigned files -- */ int ref_count; atomic_t mmap_count; unsigned int f_flags; void (*pcm_release)(struct snd_pcm_substream *); struct pid *pid; #if IS_ENABLED(CONFIG_SND_PCM_OSS) /* -- OSS things -- */ struct snd_pcm_oss_substream oss; #endif #ifdef CONFIG_SND_VERBOSE_PROCFS struct snd_info_entry *proc_root; #endif /* CONFIG_SND_VERBOSE_PROCFS */ /* misc flags */ unsigned int hw_opened: 1; unsigned int managed_buffer_alloc:1; #ifdef CONFIG_SND_PCM_XRUN_DEBUG unsigned int xrun_counter; /* number of times xrun happens */ #endif /* CONFIG_SND_PCM_XRUN_DEBUG */ }; #define SUBSTREAM_BUSY(substream) ((substream)->ref_count > 0) struct snd_pcm_str { int stream; /* stream (direction) */ struct snd_pcm *pcm; /* -- substreams -- */ unsigned int substream_count; unsigned int substream_opened; struct snd_pcm_substream *substream; #if IS_ENABLED(CONFIG_SND_PCM_OSS) /* -- OSS things -- */ struct snd_pcm_oss_stream oss; #endif #ifdef CONFIG_SND_VERBOSE_PROCFS struct snd_info_entry *proc_root; #ifdef CONFIG_SND_PCM_XRUN_DEBUG unsigned int xrun_debug; /* 0 = disabled, 1 = verbose, 2 = stacktrace */ #endif #endif struct snd_kcontrol *chmap_kctl; /* channel-mapping controls */ struct device *dev; }; struct snd_pcm { struct snd_card *card; struct list_head list; int device; /* device number */ unsigned int info_flags; unsigned short dev_class; unsigned short dev_subclass; char id[64]; char name[80]; struct snd_pcm_str streams[2]; struct mutex open_mutex; wait_queue_head_t open_wait; void *private_data; void (*private_free) (struct snd_pcm *pcm); bool internal; /* pcm is for internal use only */ bool nonatomic; /* whole PCM operations are in non-atomic context */ bool no_device_suspend; /* don't invoke device PM suspend */ #if IS_ENABLED(CONFIG_SND_PCM_OSS) struct snd_pcm_oss oss; #endif }; /* * Registering */ extern const struct file_operations snd_pcm_f_ops[2]; int snd_pcm_new(struct snd_card *card, const char *id, int device, int playback_count, int capture_count, struct snd_pcm **rpcm); int snd_pcm_new_internal(struct snd_card *card, const char *id, int device, int playback_count, int capture_count, struct snd_pcm **rpcm); int snd_pcm_new_stream(struct snd_pcm *pcm, int stream, int substream_count); #if IS_ENABLED(CONFIG_SND_PCM_OSS) struct snd_pcm_notify { int (*n_register) (struct snd_pcm * pcm); int (*n_disconnect) (struct snd_pcm * pcm); int (*n_unregister) (struct snd_pcm * pcm); struct list_head list; }; int snd_pcm_notify(struct snd_pcm_notify *notify, int nfree); #endif /* * Native I/O */ int snd_pcm_info(struct snd_pcm_substream *substream, struct snd_pcm_info 
*info); int snd_pcm_info_user(struct snd_pcm_substream *substream, struct snd_pcm_info __user *info); int snd_pcm_status64(struct snd_pcm_substream *substream, struct snd_pcm_status64 *status); int snd_pcm_start(struct snd_pcm_substream *substream); int snd_pcm_stop(struct snd_pcm_substream *substream, snd_pcm_state_t status); int snd_pcm_drain_done(struct snd_pcm_substream *substream); int snd_pcm_stop_xrun(struct snd_pcm_substream *substream); #ifdef CONFIG_PM int snd_pcm_suspend_all(struct snd_pcm *pcm); #else static inline int snd_pcm_suspend_all(struct snd_pcm *pcm) { return 0; } #endif int snd_pcm_kernel_ioctl(struct snd_pcm_substream *substream, unsigned int cmd, void *arg); int snd_pcm_open_substream(struct snd_pcm *pcm, int stream, struct file *file, struct snd_pcm_substream **rsubstream); void snd_pcm_release_substream(struct snd_pcm_substream *substream); int snd_pcm_attach_substream(struct snd_pcm *pcm, int stream, struct file *file, struct snd_pcm_substream **rsubstream); void snd_pcm_detach_substream(struct snd_pcm_substream *substream); int snd_pcm_mmap_data(struct snd_pcm_substream *substream, struct file *file, struct vm_area_struct *area); #ifdef CONFIG_SND_DEBUG void snd_pcm_debug_name(struct snd_pcm_substream *substream, char *name, size_t len); #else static inline void snd_pcm_debug_name(struct snd_pcm_substream *substream, char *buf, size_t size) { *buf = 0; } #endif /* * PCM library */ /** * snd_pcm_stream_linked - Check whether the substream is linked with others * @substream: substream to check * * Return: true if the given substream is being linked with others */ static inline int snd_pcm_stream_linked(struct snd_pcm_substream *substream) { return substream->group != &substream->self_group; } void snd_pcm_stream_lock(struct snd_pcm_substream *substream); void snd_pcm_stream_unlock(struct snd_pcm_substream *substream); void snd_pcm_stream_lock_irq(struct snd_pcm_substream *substream); void snd_pcm_stream_unlock_irq(struct snd_pcm_substream *substream); unsigned long _snd_pcm_stream_lock_irqsave(struct snd_pcm_substream *substream); unsigned long _snd_pcm_stream_lock_irqsave_nested(struct snd_pcm_substream *substream); /** * snd_pcm_stream_lock_irqsave - Lock the PCM stream * @substream: PCM substream * @flags: irq flags * * This locks the PCM stream like snd_pcm_stream_lock() but with the local * IRQ (only when nonatomic is false). In nonatomic case, this is identical * as snd_pcm_stream_lock(). */ #define snd_pcm_stream_lock_irqsave(substream, flags) \ do { \ typecheck(unsigned long, flags); \ flags = _snd_pcm_stream_lock_irqsave(substream); \ } while (0) void snd_pcm_stream_unlock_irqrestore(struct snd_pcm_substream *substream, unsigned long flags); /** * snd_pcm_stream_lock_irqsave_nested - Single-nested PCM stream locking * @substream: PCM substream * @flags: irq flags * * This locks the PCM stream like snd_pcm_stream_lock_irqsave() but with * the single-depth lockdep subclass. 
*/ #define snd_pcm_stream_lock_irqsave_nested(substream, flags) \ do { \ typecheck(unsigned long, flags); \ flags = _snd_pcm_stream_lock_irqsave_nested(substream); \ } while (0) /* definitions for guard(); use like guard(pcm_stream_lock) */ DEFINE_LOCK_GUARD_1(pcm_stream_lock, struct snd_pcm_substream, snd_pcm_stream_lock(_T->lock), snd_pcm_stream_unlock(_T->lock)) DEFINE_LOCK_GUARD_1(pcm_stream_lock_irq, struct snd_pcm_substream, snd_pcm_stream_lock_irq(_T->lock), snd_pcm_stream_unlock_irq(_T->lock)) DEFINE_LOCK_GUARD_1(pcm_stream_lock_irqsave, struct snd_pcm_substream, snd_pcm_stream_lock_irqsave(_T->lock, _T->flags), snd_pcm_stream_unlock_irqrestore(_T->lock, _T->flags), unsigned long flags) /** * snd_pcm_group_for_each_entry - iterate over the linked substreams * @s: the iterator * @substream: the substream * * Iterate over the all linked substreams to the given @substream. * When @substream isn't linked with any others, this gives returns @substream * itself once. */ #define snd_pcm_group_for_each_entry(s, substream) \ list_for_each_entry(s, &substream->group->substreams, link_list) #define for_each_pcm_streams(stream) \ for (stream = SNDRV_PCM_STREAM_PLAYBACK; \ stream <= SNDRV_PCM_STREAM_LAST; \ stream++) /** * snd_pcm_running - Check whether the substream is in a running state * @substream: substream to check * * Return: true if the given substream is in the state RUNNING, or in the * state DRAINING for playback. */ static inline int snd_pcm_running(struct snd_pcm_substream *substream) { return (substream->runtime->state == SNDRV_PCM_STATE_RUNNING || (substream->runtime->state == SNDRV_PCM_STATE_DRAINING && substream->stream == SNDRV_PCM_STREAM_PLAYBACK)); } /** * __snd_pcm_set_state - Change the current PCM state * @runtime: PCM runtime to set * @state: the current state to set * * Call within the stream lock */ static inline void __snd_pcm_set_state(struct snd_pcm_runtime *runtime, snd_pcm_state_t state) { runtime->state = state; runtime->status->state = state; /* copy for mmap */ } /** * bytes_to_samples - Unit conversion of the size from bytes to samples * @runtime: PCM runtime instance * @size: size in bytes * * Return: the size in samples */ static inline ssize_t bytes_to_samples(struct snd_pcm_runtime *runtime, ssize_t size) { return size * 8 / runtime->sample_bits; } /** * bytes_to_frames - Unit conversion of the size from bytes to frames * @runtime: PCM runtime instance * @size: size in bytes * * Return: the size in frames */ static inline snd_pcm_sframes_t bytes_to_frames(struct snd_pcm_runtime *runtime, ssize_t size) { return size * 8 / runtime->frame_bits; } /** * samples_to_bytes - Unit conversion of the size from samples to bytes * @runtime: PCM runtime instance * @size: size in samples * * Return: the byte size */ static inline ssize_t samples_to_bytes(struct snd_pcm_runtime *runtime, ssize_t size) { return size * runtime->sample_bits / 8; } /** * frames_to_bytes - Unit conversion of the size from frames to bytes * @runtime: PCM runtime instance * @size: size in frames * * Return: the byte size */ static inline ssize_t frames_to_bytes(struct snd_pcm_runtime *runtime, snd_pcm_sframes_t size) { return size * runtime->frame_bits / 8; } /** * frame_aligned - Check whether the byte size is aligned to frames * @runtime: PCM runtime instance * @bytes: size in bytes * * Return: true if aligned, or false if not */ static inline int frame_aligned(struct snd_pcm_runtime *runtime, ssize_t bytes) { return bytes % runtime->byte_align == 0; } /** * snd_pcm_lib_buffer_bytes - Get 
the buffer size of the current PCM in bytes * @substream: PCM substream * * Return: buffer byte size */ static inline size_t snd_pcm_lib_buffer_bytes(struct snd_pcm_substream *substream) { struct snd_pcm_runtime *runtime = substream->runtime; return frames_to_bytes(runtime, runtime->buffer_size); } /** * snd_pcm_lib_period_bytes - Get the period size of the current PCM in bytes * @substream: PCM substream * * Return: period byte size */ static inline size_t snd_pcm_lib_period_bytes(struct snd_pcm_substream *substream) { struct snd_pcm_runtime *runtime = substream->runtime; return frames_to_bytes(runtime, runtime->period_size); } /** * snd_pcm_playback_avail - Get the available (writable) space for playback * @runtime: PCM runtime instance * * Result is between 0 ... (boundary - 1) * * Return: available frame size */ static inline snd_pcm_uframes_t snd_pcm_playback_avail(struct snd_pcm_runtime *runtime) { snd_pcm_sframes_t avail = runtime->status->hw_ptr + runtime->buffer_size - runtime->control->appl_ptr; if (avail < 0) avail += runtime->boundary; else if ((snd_pcm_uframes_t) avail >= runtime->boundary) avail -= runtime->boundary; return avail; } /** * snd_pcm_capture_avail - Get the available (readable) space for capture * @runtime: PCM runtime instance * * Result is between 0 ... (boundary - 1) * * Return: available frame size */ static inline snd_pcm_uframes_t snd_pcm_capture_avail(struct snd_pcm_runtime *runtime) { snd_pcm_sframes_t avail = runtime->status->hw_ptr - runtime->control->appl_ptr; if (avail < 0) avail += runtime->boundary; return avail; } /** * snd_pcm_playback_hw_avail - Get the queued space for playback * @runtime: PCM runtime instance * * Return: available frame size */ static inline snd_pcm_sframes_t snd_pcm_playback_hw_avail(struct snd_pcm_runtime *runtime) { return runtime->buffer_size - snd_pcm_playback_avail(runtime); } /** * snd_pcm_capture_hw_avail - Get the free space for capture * @runtime: PCM runtime instance * * Return: available frame size */ static inline snd_pcm_sframes_t snd_pcm_capture_hw_avail(struct snd_pcm_runtime *runtime) { return runtime->buffer_size - snd_pcm_capture_avail(runtime); } /** * snd_pcm_playback_ready - check whether the playback buffer is available * @substream: the pcm substream instance * * Checks whether enough free space is available on the playback buffer. * * Return: Non-zero if available, or zero if not. */ static inline int snd_pcm_playback_ready(struct snd_pcm_substream *substream) { struct snd_pcm_runtime *runtime = substream->runtime; return snd_pcm_playback_avail(runtime) >= runtime->control->avail_min; } /** * snd_pcm_capture_ready - check whether the capture buffer is available * @substream: the pcm substream instance * * Checks whether enough capture data is available on the capture buffer. * * Return: Non-zero if available, or zero if not. */ static inline int snd_pcm_capture_ready(struct snd_pcm_substream *substream) { struct snd_pcm_runtime *runtime = substream->runtime; return snd_pcm_capture_avail(runtime) >= runtime->control->avail_min; } /** * snd_pcm_playback_data - check whether any data exists on the playback buffer * @substream: the pcm substream instance * * Checks whether any data exists on the playback buffer. * * Return: Non-zero if any data exists, or zero if not. If stop_threshold * is bigger or equal to boundary, then this function returns always non-zero. 
*/ static inline int snd_pcm_playback_data(struct snd_pcm_substream *substream) { struct snd_pcm_runtime *runtime = substream->runtime; if (runtime->stop_threshold >= runtime->boundary) return 1; return snd_pcm_playback_avail(runtime) < runtime->buffer_size; } /** * snd_pcm_playback_empty - check whether the playback buffer is empty * @substream: the pcm substream instance * * Checks whether the playback buffer is empty. * * Return: Non-zero if empty, or zero if not. */ static inline int snd_pcm_playback_empty(struct snd_pcm_substream *substream) { struct snd_pcm_runtime *runtime = substream->runtime; return snd_pcm_playback_avail(runtime) >= runtime->buffer_size; } /** * snd_pcm_capture_empty - check whether the capture buffer is empty * @substream: the pcm substream instance * * Checks whether the capture buffer is empty. * * Return: Non-zero if empty, or zero if not. */ static inline int snd_pcm_capture_empty(struct snd_pcm_substream *substream) { struct snd_pcm_runtime *runtime = substream->runtime; return snd_pcm_capture_avail(runtime) == 0; } /** * snd_pcm_trigger_done - Mark the master substream * @substream: the pcm substream instance * @master: the linked master substream * * When multiple substreams of the same card are linked and the hardware * supports the single-shot operation, the driver calls this in the loop * in snd_pcm_group_for_each_entry() for marking the substream as "done". * Then most of trigger operations are performed only to the given master * substream. * * The trigger_master mark is cleared at timestamp updates at the end * of trigger operations. */ static inline void snd_pcm_trigger_done(struct snd_pcm_substream *substream, struct snd_pcm_substream *master) { substream->runtime->trigger_master = master; } static inline int hw_is_mask(int var) { return var >= SNDRV_PCM_HW_PARAM_FIRST_MASK && var <= SNDRV_PCM_HW_PARAM_LAST_MASK; } static inline int hw_is_interval(int var) { return var >= SNDRV_PCM_HW_PARAM_FIRST_INTERVAL && var <= SNDRV_PCM_HW_PARAM_LAST_INTERVAL; } static inline struct snd_mask *hw_param_mask(struct snd_pcm_hw_params *params, snd_pcm_hw_param_t var) { return ¶ms->masks[var - SNDRV_PCM_HW_PARAM_FIRST_MASK]; } static inline struct snd_interval *hw_param_interval(struct snd_pcm_hw_params *params, snd_pcm_hw_param_t var) { return ¶ms->intervals[var - SNDRV_PCM_HW_PARAM_FIRST_INTERVAL]; } static inline const struct snd_mask *hw_param_mask_c(const struct snd_pcm_hw_params *params, snd_pcm_hw_param_t var) { return ¶ms->masks[var - SNDRV_PCM_HW_PARAM_FIRST_MASK]; } static inline const struct snd_interval *hw_param_interval_c(const struct snd_pcm_hw_params *params, snd_pcm_hw_param_t var) { return ¶ms->intervals[var - SNDRV_PCM_HW_PARAM_FIRST_INTERVAL]; } /** * params_channels - Get the number of channels from the hw params * @p: hw params * * Return: the number of channels */ static inline unsigned int params_channels(const struct snd_pcm_hw_params *p) { return hw_param_interval_c(p, SNDRV_PCM_HW_PARAM_CHANNELS)->min; } /** * params_rate - Get the sample rate from the hw params * @p: hw params * * Return: the sample rate */ static inline unsigned int params_rate(const struct snd_pcm_hw_params *p) { return hw_param_interval_c(p, SNDRV_PCM_HW_PARAM_RATE)->min; } /** * params_period_size - Get the period size (in frames) from the hw params * @p: hw params * * Return: the period size in frames */ static inline unsigned int params_period_size(const struct snd_pcm_hw_params *p) { return hw_param_interval_c(p, SNDRV_PCM_HW_PARAM_PERIOD_SIZE)->min; } /** * 
params_periods - Get the number of periods from the hw params * @p: hw params * * Return: the number of periods */ static inline unsigned int params_periods(const struct snd_pcm_hw_params *p) { return hw_param_interval_c(p, SNDRV_PCM_HW_PARAM_PERIODS)->min; } /** * params_buffer_size - Get the buffer size (in frames) from the hw params * @p: hw params * * Return: the buffer size in frames */ static inline unsigned int params_buffer_size(const struct snd_pcm_hw_params *p) { return hw_param_interval_c(p, SNDRV_PCM_HW_PARAM_BUFFER_SIZE)->min; } /** * params_buffer_bytes - Get the buffer size (in bytes) from the hw params * @p: hw params * * Return: the buffer size in bytes */ static inline unsigned int params_buffer_bytes(const struct snd_pcm_hw_params *p) { return hw_param_interval_c(p, SNDRV_PCM_HW_PARAM_BUFFER_BYTES)->min; } int snd_interval_refine(struct snd_interval *i, const struct snd_interval *v); int snd_interval_list(struct snd_interval *i, unsigned int count, const unsigned int *list, unsigned int mask); int snd_interval_ranges(struct snd_interval *i, unsigned int count, const struct snd_interval *list, unsigned int mask); int snd_interval_ratnum(struct snd_interval *i, unsigned int rats_count, const struct snd_ratnum *rats, unsigned int *nump, unsigned int *denp); void _snd_pcm_hw_params_any(struct snd_pcm_hw_params *params); void _snd_pcm_hw_param_setempty(struct snd_pcm_hw_params *params, snd_pcm_hw_param_t var); int snd_pcm_hw_refine(struct snd_pcm_substream *substream, struct snd_pcm_hw_params *params); int snd_pcm_hw_constraint_mask64(struct snd_pcm_runtime *runtime, snd_pcm_hw_param_t var, u_int64_t mask); int snd_pcm_hw_constraint_minmax(struct snd_pcm_runtime *runtime, snd_pcm_hw_param_t var, unsigned int min, unsigned int max); int snd_pcm_hw_constraint_integer(struct snd_pcm_runtime *runtime, snd_pcm_hw_param_t var); int snd_pcm_hw_constraint_list(struct snd_pcm_runtime *runtime, unsigned int cond, snd_pcm_hw_param_t var, const struct snd_pcm_hw_constraint_list *l); int snd_pcm_hw_constraint_ranges(struct snd_pcm_runtime *runtime, unsigned int cond, snd_pcm_hw_param_t var, const struct snd_pcm_hw_constraint_ranges *r); int snd_pcm_hw_constraint_ratnums(struct snd_pcm_runtime *runtime, unsigned int cond, snd_pcm_hw_param_t var, const struct snd_pcm_hw_constraint_ratnums *r); int snd_pcm_hw_constraint_ratdens(struct snd_pcm_runtime *runtime, unsigned int cond, snd_pcm_hw_param_t var, const struct snd_pcm_hw_constraint_ratdens *r); int snd_pcm_hw_constraint_msbits(struct snd_pcm_runtime *runtime, unsigned int cond, unsigned int width, unsigned int msbits); int snd_pcm_hw_constraint_step(struct snd_pcm_runtime *runtime, unsigned int cond, snd_pcm_hw_param_t var, unsigned long step); int snd_pcm_hw_constraint_pow2(struct snd_pcm_runtime *runtime, unsigned int cond, snd_pcm_hw_param_t var); int snd_pcm_hw_rule_noresample(struct snd_pcm_runtime *runtime, unsigned int base_rate); int snd_pcm_hw_rule_add(struct snd_pcm_runtime *runtime, unsigned int cond, int var, snd_pcm_hw_rule_func_t func, void *private, int dep, ...); /** * snd_pcm_hw_constraint_single() - Constrain parameter to a single value * @runtime: PCM runtime instance * @var: The hw_params variable to constrain * @val: The value to constrain to * * Return: Positive if the value is changed, zero if it's not changed, or a * negative error code. 
*/ static inline int snd_pcm_hw_constraint_single( struct snd_pcm_runtime *runtime, snd_pcm_hw_param_t var, unsigned int val) { return snd_pcm_hw_constraint_minmax(runtime, var, val, val); } int snd_pcm_format_signed(snd_pcm_format_t format); int snd_pcm_format_unsigned(snd_pcm_format_t format); int snd_pcm_format_linear(snd_pcm_format_t format); int snd_pcm_format_little_endian(snd_pcm_format_t format); int snd_pcm_format_big_endian(snd_pcm_format_t format); #if 0 /* just for kernel-doc */ /** * snd_pcm_format_cpu_endian - Check the PCM format is CPU-endian * @format: the format to check * * Return: 1 if the given PCM format is CPU-endian, 0 if * opposite, or a negative error code if endian not specified. */ int snd_pcm_format_cpu_endian(snd_pcm_format_t format); #endif /* DocBook */ #ifdef SNDRV_LITTLE_ENDIAN #define snd_pcm_format_cpu_endian(format) snd_pcm_format_little_endian(format) #else #define snd_pcm_format_cpu_endian(format) snd_pcm_format_big_endian(format) #endif int snd_pcm_format_width(snd_pcm_format_t format); /* in bits */ int snd_pcm_format_physical_width(snd_pcm_format_t format); /* in bits */ ssize_t snd_pcm_format_size(snd_pcm_format_t format, size_t samples); const unsigned char *snd_pcm_format_silence_64(snd_pcm_format_t format); int snd_pcm_format_set_silence(snd_pcm_format_t format, void *buf, unsigned int frames); void snd_pcm_set_ops(struct snd_pcm * pcm, int direction, const struct snd_pcm_ops *ops); void snd_pcm_set_sync_per_card(struct snd_pcm_substream *substream, struct snd_pcm_hw_params *params, const unsigned char *id, unsigned int len); /** * snd_pcm_set_sync - set the PCM sync id * @substream: the pcm substream * * Use the default PCM sync identifier for the specific card. */ static inline void snd_pcm_set_sync(struct snd_pcm_substream *substream) { substream->runtime->std_sync_id = true; } int snd_pcm_lib_ioctl(struct snd_pcm_substream *substream, unsigned int cmd, void *arg); void snd_pcm_period_elapsed_under_stream_lock(struct snd_pcm_substream *substream); void snd_pcm_period_elapsed(struct snd_pcm_substream *substream); snd_pcm_sframes_t __snd_pcm_lib_xfer(struct snd_pcm_substream *substream, void *buf, bool interleaved, snd_pcm_uframes_t frames, bool in_kernel); static inline snd_pcm_sframes_t snd_pcm_lib_write(struct snd_pcm_substream *substream, const void __user *buf, snd_pcm_uframes_t frames) { return __snd_pcm_lib_xfer(substream, (void __force *)buf, true, frames, false); } static inline snd_pcm_sframes_t snd_pcm_lib_read(struct snd_pcm_substream *substream, void __user *buf, snd_pcm_uframes_t frames) { return __snd_pcm_lib_xfer(substream, (void __force *)buf, true, frames, false); } static inline snd_pcm_sframes_t snd_pcm_lib_writev(struct snd_pcm_substream *substream, void __user **bufs, snd_pcm_uframes_t frames) { return __snd_pcm_lib_xfer(substream, (void *)bufs, false, frames, false); } static inline snd_pcm_sframes_t snd_pcm_lib_readv(struct snd_pcm_substream *substream, void __user **bufs, snd_pcm_uframes_t frames) { return __snd_pcm_lib_xfer(substream, (void *)bufs, false, frames, false); } static inline snd_pcm_sframes_t snd_pcm_kernel_write(struct snd_pcm_substream *substream, const void *buf, snd_pcm_uframes_t frames) { return __snd_pcm_lib_xfer(substream, (void *)buf, true, frames, true); } static inline snd_pcm_sframes_t snd_pcm_kernel_read(struct snd_pcm_substream *substream, void *buf, snd_pcm_uframes_t frames) { return __snd_pcm_lib_xfer(substream, buf, true, frames, true); } static inline snd_pcm_sframes_t 
snd_pcm_kernel_writev(struct snd_pcm_substream *substream, void **bufs, snd_pcm_uframes_t frames) { return __snd_pcm_lib_xfer(substream, bufs, false, frames, true); } static inline snd_pcm_sframes_t snd_pcm_kernel_readv(struct snd_pcm_substream *substream, void **bufs, snd_pcm_uframes_t frames) { return __snd_pcm_lib_xfer(substream, bufs, false, frames, true); } int snd_pcm_hw_limit_rates(struct snd_pcm_hardware *hw); static inline int snd_pcm_limit_hw_rates(struct snd_pcm_runtime *runtime) { return snd_pcm_hw_limit_rates(&runtime->hw); } unsigned int snd_pcm_rate_to_rate_bit(unsigned int rate); unsigned int snd_pcm_rate_bit_to_rate(unsigned int rate_bit); unsigned int snd_pcm_rate_mask_intersect(unsigned int rates_a, unsigned int rates_b); /** * snd_pcm_set_runtime_buffer - Set the PCM runtime buffer * @substream: PCM substream to set * @bufp: the buffer information, NULL to clear * * Copy the buffer information to runtime->dma_buffer when @bufp is non-NULL. * Otherwise it clears the current buffer information. */ static inline void snd_pcm_set_runtime_buffer(struct snd_pcm_substream *substream, struct snd_dma_buffer *bufp) { struct snd_pcm_runtime *runtime = substream->runtime; if (bufp) { runtime->dma_buffer_p = bufp; runtime->dma_area = bufp->area; runtime->dma_addr = bufp->addr; runtime->dma_bytes = bufp->bytes; } else { runtime->dma_buffer_p = NULL; runtime->dma_area = NULL; runtime->dma_addr = 0; runtime->dma_bytes = 0; } } /** * snd_pcm_gettime - Fill the timespec64 depending on the timestamp mode * @runtime: PCM runtime instance * @tv: timespec64 to fill */ static inline void snd_pcm_gettime(struct snd_pcm_runtime *runtime, struct timespec64 *tv) { switch (runtime->tstamp_type) { case SNDRV_PCM_TSTAMP_TYPE_MONOTONIC: ktime_get_ts64(tv); break; case SNDRV_PCM_TSTAMP_TYPE_MONOTONIC_RAW: ktime_get_raw_ts64(tv); break; default: ktime_get_real_ts64(tv); break; } } /* * Memory */ void snd_pcm_lib_preallocate_free(struct snd_pcm_substream *substream); void snd_pcm_lib_preallocate_free_for_all(struct snd_pcm *pcm); void snd_pcm_lib_preallocate_pages(struct snd_pcm_substream *substream, int type, struct device *data, size_t size, size_t max); void snd_pcm_lib_preallocate_pages_for_all(struct snd_pcm *pcm, int type, void *data, size_t size, size_t max); int snd_pcm_lib_malloc_pages(struct snd_pcm_substream *substream, size_t size); int snd_pcm_lib_free_pages(struct snd_pcm_substream *substream); int snd_pcm_set_managed_buffer(struct snd_pcm_substream *substream, int type, struct device *data, size_t size, size_t max); int snd_pcm_set_managed_buffer_all(struct snd_pcm *pcm, int type, struct device *data, size_t size, size_t max); /** * snd_pcm_set_fixed_buffer - Preallocate and set up the fixed size PCM buffer * @substream: the pcm substream instance * @type: DMA type (SNDRV_DMA_TYPE_*) * @data: DMA type dependent data * @size: the requested pre-allocation size in bytes * * This is a variant of snd_pcm_set_managed_buffer(), but this pre-allocates * only the given sized buffer and doesn't allow re-allocation nor dynamic * allocation of a larger buffer unlike the standard one. * The function may return -ENOMEM error, hence the caller must check it. 
* * Return: zero if successful, or a negative error code */ static inline int __must_check snd_pcm_set_fixed_buffer(struct snd_pcm_substream *substream, int type, struct device *data, size_t size) { return snd_pcm_set_managed_buffer(substream, type, data, size, 0); } /** * snd_pcm_set_fixed_buffer_all - Preallocate and set up the fixed size PCM buffer * @pcm: the pcm instance * @type: DMA type (SNDRV_DMA_TYPE_*) * @data: DMA type dependent data * @size: the requested pre-allocation size in bytes * * Apply the set up of the fixed buffer via snd_pcm_set_fixed_buffer() for * all substream. If any of allocation fails, it returns -ENOMEM, hence the * caller must check the return value. * * Return: zero if successful, or a negative error code */ static inline int __must_check snd_pcm_set_fixed_buffer_all(struct snd_pcm *pcm, int type, struct device *data, size_t size) { return snd_pcm_set_managed_buffer_all(pcm, type, data, size, 0); } #define snd_pcm_get_dma_buf(substream) ((substream)->runtime->dma_buffer_p) /** * snd_pcm_sgbuf_get_addr - Get the DMA address at the corresponding offset * @substream: PCM substream * @ofs: byte offset * * Return: DMA address */ static inline dma_addr_t snd_pcm_sgbuf_get_addr(struct snd_pcm_substream *substream, unsigned int ofs) { return snd_sgbuf_get_addr(snd_pcm_get_dma_buf(substream), ofs); } /** * snd_pcm_sgbuf_get_chunk_size - Compute the max size that fits within the * contig. page from the given size * @substream: PCM substream * @ofs: byte offset * @size: byte size to examine * * Return: chunk size */ static inline unsigned int snd_pcm_sgbuf_get_chunk_size(struct snd_pcm_substream *substream, unsigned int ofs, unsigned int size) { return snd_sgbuf_get_chunk_size(snd_pcm_get_dma_buf(substream), ofs, size); } int snd_pcm_lib_default_mmap(struct snd_pcm_substream *substream, struct vm_area_struct *area); /* mmap for io-memory area */ #if defined(CONFIG_X86) || defined(CONFIG_PPC) || defined(CONFIG_ALPHA) #define SNDRV_PCM_INFO_MMAP_IOMEM SNDRV_PCM_INFO_MMAP int snd_pcm_lib_mmap_iomem(struct snd_pcm_substream *substream, struct vm_area_struct *area); #else #define SNDRV_PCM_INFO_MMAP_IOMEM 0 #define snd_pcm_lib_mmap_iomem NULL #endif void snd_pcm_runtime_buffer_set_silence(struct snd_pcm_runtime *runtime); /** * snd_pcm_limit_isa_dma_size - Get the max size fitting with ISA DMA transfer * @dma: DMA number * @max: pointer to store the max size */ static inline void snd_pcm_limit_isa_dma_size(int dma, size_t *max) { *max = dma < 4 ? 64 * 1024 : 128 * 1024; } /* * Misc */ #define SNDRV_PCM_DEFAULT_CON_SPDIF (IEC958_AES0_CON_EMPHASIS_NONE|\ (IEC958_AES1_CON_ORIGINAL<<8)|\ (IEC958_AES1_CON_PCM_CODER<<8)|\ (IEC958_AES3_CON_FS_48000<<24)) const char *snd_pcm_format_name(snd_pcm_format_t format); /** * snd_pcm_direction_name - Get a string naming the direction of a stream * @direction: Stream's direction, one of SNDRV_PCM_STREAM_XXX * * Returns a string naming the direction of the stream. */ static inline const char *snd_pcm_direction_name(int direction) { if (direction == SNDRV_PCM_STREAM_PLAYBACK) return "Playback"; else return "Capture"; } /** * snd_pcm_stream_str - Get a string naming the direction of a stream * @substream: the pcm substream instance * * Return: A string naming the direction of the stream. 
*/ static inline const char *snd_pcm_stream_str(struct snd_pcm_substream *substream) { return snd_pcm_direction_name(substream->stream); } /* * PCM channel-mapping control API */ /* array element of channel maps */ struct snd_pcm_chmap_elem { unsigned char channels; unsigned char map[15]; }; /* channel map information; retrieved via snd_kcontrol_chip() */ struct snd_pcm_chmap { struct snd_pcm *pcm; /* assigned PCM instance */ int stream; /* PLAYBACK or CAPTURE */ struct snd_kcontrol *kctl; const struct snd_pcm_chmap_elem *chmap; unsigned int max_channels; unsigned int channel_mask; /* optional: active channels bitmask */ void *private_data; /* optional: private data pointer */ }; /** * snd_pcm_chmap_substream - get the PCM substream assigned to the given chmap info * @info: chmap information * @idx: the substream number index * * Return: the matched PCM substream, or NULL if not found */ static inline struct snd_pcm_substream * snd_pcm_chmap_substream(struct snd_pcm_chmap *info, unsigned int idx) { struct snd_pcm_substream *s; for (s = info->pcm->streams[info->stream].substream; s; s = s->next) if (s->number == idx) return s; return NULL; } /* ALSA-standard channel maps (RL/RR prior to C/LFE) */ extern const struct snd_pcm_chmap_elem snd_pcm_std_chmaps[]; /* Other world's standard channel maps (C/LFE prior to RL/RR) */ extern const struct snd_pcm_chmap_elem snd_pcm_alt_chmaps[]; /* bit masks to be passed to snd_pcm_chmap.channel_mask field */ #define SND_PCM_CHMAP_MASK_24 ((1U << 2) | (1U << 4)) #define SND_PCM_CHMAP_MASK_246 (SND_PCM_CHMAP_MASK_24 | (1U << 6)) #define SND_PCM_CHMAP_MASK_2468 (SND_PCM_CHMAP_MASK_246 | (1U << 8)) int snd_pcm_add_chmap_ctls(struct snd_pcm *pcm, int stream, const struct snd_pcm_chmap_elem *chmap, int max_channels, unsigned long private_value, struct snd_pcm_chmap **info_ret); /** * pcm_format_to_bits - Strong-typed conversion of pcm_format to bitwise * @pcm_format: PCM format * * Return: 64bit mask corresponding to the given PCM format */ static inline u64 pcm_format_to_bits(snd_pcm_format_t pcm_format) { return 1ULL << (__force int) pcm_format; } /** * pcm_for_each_format - helper to iterate for each format type * @f: the iterator variable in snd_pcm_format_t type */ #define pcm_for_each_format(f) \ for ((f) = SNDRV_PCM_FORMAT_FIRST; \ (__force int)(f) <= (__force int)SNDRV_PCM_FORMAT_LAST; \ (f) = (__force snd_pcm_format_t)((__force int)(f) + 1)) /* printk helpers */ #define pcm_err(pcm, fmt, args...) \ dev_err((pcm)->card->dev, fmt, ##args) #define pcm_warn(pcm, fmt, args...) \ dev_warn((pcm)->card->dev, fmt, ##args) #define pcm_dbg(pcm, fmt, args...) 
\ dev_dbg((pcm)->card->dev, fmt, ##args) /* helpers for copying between iov_iter and iomem */ size_t copy_to_iter_fromio(const void __iomem *src, size_t bytes, struct iov_iter *iter) __must_check; size_t copy_from_iter_toio(void __iomem *dst, size_t bytes, struct iov_iter *iter) __must_check; struct snd_pcm_status64 { snd_pcm_state_t state; /* stream state */ u8 rsvd[4]; s64 trigger_tstamp_sec; /* time when stream was started/stopped/paused */ s64 trigger_tstamp_nsec; s64 tstamp_sec; /* reference timestamp */ s64 tstamp_nsec; snd_pcm_uframes_t appl_ptr; /* appl ptr */ snd_pcm_uframes_t hw_ptr; /* hw ptr */ snd_pcm_sframes_t delay; /* current delay in frames */ snd_pcm_uframes_t avail; /* number of frames available */ snd_pcm_uframes_t avail_max; /* max frames available on hw since last status */ snd_pcm_uframes_t overrange; /* count of ADC (capture) overrange detections from last status */ snd_pcm_state_t suspended_state; /* suspended stream state */ __u32 audio_tstamp_data; /* needed for 64-bit alignment, used for configs/report to/from userspace */ s64 audio_tstamp_sec; /* sample counter, wall clock, PHC or on-demand sync'ed */ s64 audio_tstamp_nsec; s64 driver_tstamp_sec; /* useful in case reference system tstamp is reported with delay */ s64 driver_tstamp_nsec; __u32 audio_tstamp_accuracy; /* in ns units, only valid if indicated in audio_tstamp_data */ unsigned char reserved[52-4*sizeof(s64)]; /* must be filled with zero */ }; #define SNDRV_PCM_IOCTL_STATUS64 _IOR('A', 0x20, struct snd_pcm_status64) #define SNDRV_PCM_IOCTL_STATUS_EXT64 _IOWR('A', 0x24, struct snd_pcm_status64) struct snd_pcm_status32 { snd_pcm_state_t state; /* stream state */ s32 trigger_tstamp_sec; /* time when stream was started/stopped/paused */ s32 trigger_tstamp_nsec; s32 tstamp_sec; /* reference timestamp */ s32 tstamp_nsec; u32 appl_ptr; /* appl ptr */ u32 hw_ptr; /* hw ptr */ s32 delay; /* current delay in frames */ u32 avail; /* number of frames available */ u32 avail_max; /* max frames available on hw since last status */ u32 overrange; /* count of ADC (capture) overrange detections from last status */ snd_pcm_state_t suspended_state; /* suspended stream state */ u32 audio_tstamp_data; /* needed for 64-bit alignment, used for configs/report to/from userspace */ s32 audio_tstamp_sec; /* sample counter, wall clock, PHC or on-demand sync'ed */ s32 audio_tstamp_nsec; s32 driver_tstamp_sec; /* useful in case reference system tstamp is reported with delay */ s32 driver_tstamp_nsec; u32 audio_tstamp_accuracy; /* in ns units, only valid if indicated in audio_tstamp_data */ unsigned char reserved[52-4*sizeof(s32)]; /* must be filled with zero */ }; #define SNDRV_PCM_IOCTL_STATUS32 _IOR('A', 0x20, struct snd_pcm_status32) #define SNDRV_PCM_IOCTL_STATUS_EXT32 _IOWR('A', 0x24, struct snd_pcm_status32) #endif /* __SOUND_PCM_H */ |
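/*
 * Illustrative sketch (not part of the original header): how a driver's
 * PCM construction path might combine the managed-buffer and channel-map
 * helpers declared above. "my_chip" and its fields, the PCM id string and
 * the 64 KiB buffer size are hypothetical placeholders.
 */
#include <sound/pcm.h>

struct my_chip {			/* hypothetical driver context */
	struct snd_card *card;
	struct device *dev;
};

static int my_chip_build_pcm(struct my_chip *chip)
{
	struct snd_pcm_chmap *chmap;
	struct snd_pcm *pcm;
	int err;

	err = snd_pcm_new(chip->card, "My Chip", 0, 1, 1, &pcm);
	if (err < 0)
		return err;

	/*
	 * Preallocate a fixed 64 KiB DMA buffer for every substream; the
	 * managed buffer is released automatically together with the PCM.
	 */
	err = snd_pcm_set_fixed_buffer_all(pcm, SNDRV_DMA_TYPE_DEV,
					   chip->dev, 64 * 1024);
	if (err < 0)
		return err;

	/* expose the ALSA-standard channel maps for stereo playback */
	return snd_pcm_add_chmap_ctls(pcm, SNDRV_PCM_STREAM_PLAYBACK,
				      snd_pcm_std_chmaps, 2, 0, &chmap);
}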
/* * Generic process-grouping system. * * Based originally on the cpuset system, extracted by Paul Menage * Copyright (C) 2006 Google, Inc * * Notifications support * Copyright (C) 2009 Nokia Corporation * Author: Kirill A. Shutemov * * Copyright notices from the original cpuset code: * -------------------------------------------------- * Copyright (C) 2003 BULL SA. * Copyright (C) 2004-2006 Silicon Graphics, Inc. * * Portions derived from Patrick Mochel's sysfs code. * sysfs is Copyright (c) 2001-3 Patrick Mochel * * 2003-10-10 Written by Simon Derr. * 2003-10-22 Updates by Stephen Hemminger. * 2004 May-July Rework by Paul Jackson. * --------------------------------------------------- * * This file is subject to the terms and conditions of the GNU General Public * License. See the file COPYING in the main directory of the Linux * distribution for more details.
*/ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include "cgroup-internal.h" #include <linux/bpf-cgroup.h> #include <linux/cred.h> #include <linux/errno.h> #include <linux/init_task.h> #include <linux/kernel.h> #include <linux/magic.h> #include <linux/mutex.h> #include <linux/mount.h> #include <linux/pagemap.h> #include <linux/proc_fs.h> #include <linux/rcupdate.h> #include <linux/sched.h> #include <linux/sched/task.h> #include <linux/slab.h> #include <linux/spinlock.h> #include <linux/percpu-rwsem.h> #include <linux/string.h> #include <linux/hashtable.h> #include <linux/idr.h> #include <linux/kthread.h> #include <linux/atomic.h> #include <linux/cpuset.h> #include <linux/proc_ns.h> #include <linux/nsproxy.h> #include <linux/file.h> #include <linux/fs_parser.h> #include <linux/sched/cputime.h> #include <linux/sched/deadline.h> #include <linux/psi.h> #include <linux/nstree.h> #include <net/sock.h> #define CREATE_TRACE_POINTS #include <trace/events/cgroup.h> #define CGROUP_FILE_NAME_MAX (MAX_CGROUP_TYPE_NAMELEN + \ MAX_CFTYPE_NAME + 2) /* let's not notify more than 100 times per second */ #define CGROUP_FILE_NOTIFY_MIN_INTV DIV_ROUND_UP(HZ, 100) /* * To avoid confusing the compiler (and generating warnings) with code * that attempts to access what would be a 0-element array (i.e. sized * to a potentially empty array when CGROUP_SUBSYS_COUNT == 0), this * constant expression can be added. */ #define CGROUP_HAS_SUBSYS_CONFIG (CGROUP_SUBSYS_COUNT > 0) /* * cgroup_mutex is the master lock. Any modification to cgroup or its * hierarchy must be performed while holding it. * * css_set_lock protects task->cgroups pointer, the list of css_set * objects, and the chain of tasks off each css_set. * * These locks are exported if CONFIG_PROVE_RCU so that accessors in * cgroup.h can use them for lockdep annotations. */ DEFINE_MUTEX(cgroup_mutex); DEFINE_SPINLOCK(css_set_lock); #if (defined CONFIG_PROVE_RCU || defined CONFIG_LOCKDEP) EXPORT_SYMBOL_GPL(cgroup_mutex); EXPORT_SYMBOL_GPL(css_set_lock); #endif struct blocking_notifier_head cgroup_lifetime_notifier = BLOCKING_NOTIFIER_INIT(cgroup_lifetime_notifier); DEFINE_SPINLOCK(trace_cgroup_path_lock); char trace_cgroup_path[TRACE_CGROUP_PATH_LEN]; static bool cgroup_debug __read_mostly; /* * Protects cgroup_idr and css_idr so that IDs can be released without * grabbing cgroup_mutex. */ static DEFINE_SPINLOCK(cgroup_idr_lock); /* * Protects cgroup_file->kn for !self csses. It synchronizes notifications * against file removal/re-creation across css hiding. */ static DEFINE_SPINLOCK(cgroup_file_kn_lock); DEFINE_PERCPU_RWSEM(cgroup_threadgroup_rwsem); #define cgroup_assert_mutex_or_rcu_locked() \ RCU_LOCKDEP_WARN(!rcu_read_lock_held() && \ !lockdep_is_held(&cgroup_mutex), \ "cgroup_mutex or RCU read lock required"); /* * cgroup destruction makes heavy use of work items and there can be a lot * of concurrent destructions. Use a separate workqueue so that cgroup * destruction work items don't end up filling up max_active of system_percpu_wq * which may lead to deadlock. * * A cgroup destruction should enqueue work sequentially to: * cgroup_offline_wq: use for css offline work * cgroup_release_wq: use for css release work * cgroup_free_wq: use for free work * * Rationale for using separate workqueues: * The cgroup root free work may depend on completion of other css offline * operations. If all tasks were enqueued to a single workqueue, this could * create a deadlock scenario where: * - Free work waits for other css offline work to complete. 
* - But other css offline work is queued after free work in the same queue. * * Example deadlock scenario with single workqueue (cgroup_destroy_wq): * 1. umount net_prio * 2. net_prio root destruction enqueues work to cgroup_destroy_wq (CPUx) * 3. perf_event CSS A offline enqueues work to same cgroup_destroy_wq (CPUx) * 4. net_prio cgroup_destroy_root->cgroup_lock_and_drain_offline. * 5. net_prio root destruction blocks waiting for perf_event CSS A offline, * which can never complete as it's behind in the same queue and * workqueue's max_active is 1. */ static struct workqueue_struct *cgroup_offline_wq; static struct workqueue_struct *cgroup_release_wq; static struct workqueue_struct *cgroup_free_wq; /* generate an array of cgroup subsystem pointers */ #define SUBSYS(_x) [_x ## _cgrp_id] = &_x ## _cgrp_subsys, struct cgroup_subsys *cgroup_subsys[] = { #include <linux/cgroup_subsys.h> }; #undef SUBSYS /* array of cgroup subsystem names */ #define SUBSYS(_x) [_x ## _cgrp_id] = #_x, static const char *cgroup_subsys_name[] = { #include <linux/cgroup_subsys.h> }; #undef SUBSYS /* array of static_keys for cgroup_subsys_enabled() and cgroup_subsys_on_dfl() */ #define SUBSYS(_x) \ DEFINE_STATIC_KEY_TRUE(_x ## _cgrp_subsys_enabled_key); \ DEFINE_STATIC_KEY_TRUE(_x ## _cgrp_subsys_on_dfl_key); \ EXPORT_SYMBOL_GPL(_x ## _cgrp_subsys_enabled_key); \ EXPORT_SYMBOL_GPL(_x ## _cgrp_subsys_on_dfl_key); #include <linux/cgroup_subsys.h> #undef SUBSYS #define SUBSYS(_x) [_x ## _cgrp_id] = &_x ## _cgrp_subsys_enabled_key, static struct static_key_true *cgroup_subsys_enabled_key[] = { #include <linux/cgroup_subsys.h> }; #undef SUBSYS #define SUBSYS(_x) [_x ## _cgrp_id] = &_x ## _cgrp_subsys_on_dfl_key, static struct static_key_true *cgroup_subsys_on_dfl_key[] = { #include <linux/cgroup_subsys.h> }; #undef SUBSYS static DEFINE_PER_CPU(struct css_rstat_cpu, root_rstat_cpu); static DEFINE_PER_CPU(struct cgroup_rstat_base_cpu, root_rstat_base_cpu); /* the default hierarchy */ struct cgroup_root cgrp_dfl_root = { .cgrp.self.rstat_cpu = &root_rstat_cpu, .cgrp.rstat_base_cpu = &root_rstat_base_cpu, }; EXPORT_SYMBOL_GPL(cgrp_dfl_root); /* * The default hierarchy always exists but is hidden until mounted for the * first time. This is for backward compatibility. */ bool cgrp_dfl_visible; /* some controllers are not supported in the default hierarchy */ static u16 cgrp_dfl_inhibit_ss_mask; /* some controllers are implicitly enabled on the default hierarchy */ static u16 cgrp_dfl_implicit_ss_mask; /* some controllers can be threaded on the default hierarchy */ static u16 cgrp_dfl_threaded_ss_mask; /* The list of hierarchy roots */ LIST_HEAD(cgroup_roots); static int cgroup_root_count; /* hierarchy ID allocation and mapping, protected by cgroup_mutex */ static DEFINE_IDR(cgroup_hierarchy_idr); /* * Assign a monotonically increasing serial number to csses. It guarantees * cgroups with bigger numbers are newer than those with smaller numbers. * Also, as csses are always appended to the parent's ->children list, it * guarantees that sibling csses are always sorted in the ascending serial * number order on the list. Protected by cgroup_mutex. */ static u64 css_serial_nr_next = 1; /* * These bitmasks identify subsystems with specific features to avoid * having to do iterative checks repeatedly. 
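 *
 * For instance, a path that only cares about the subsystems providing an
 * exit callback can walk the relevant mask with the do_each_subsys_mask()
 * helper defined further below (illustrative sketch; @tsk stands for the
 * exiting task):
 *
 *	struct cgroup_subsys *ss;
 *	int ssid;
 *
 *	do_each_subsys_mask(ss, ssid, have_exit_callback) {
 *		ss->exit(tsk);
 *	} while_each_subsys_mask();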
*/ static u16 have_fork_callback __read_mostly; static u16 have_exit_callback __read_mostly; static u16 have_release_callback __read_mostly; static u16 have_canfork_callback __read_mostly; static bool have_favordynmods __ro_after_init = IS_ENABLED(CONFIG_CGROUP_FAVOR_DYNMODS); /* * Write protected by cgroup_mutex and write-lock of cgroup_threadgroup_rwsem, * read protected by either. * * Can only be turned on, but not turned off. */ bool cgroup_enable_per_threadgroup_rwsem __read_mostly; /* cgroup namespace for init task */ struct cgroup_namespace init_cgroup_ns = { .ns.__ns_ref = REFCOUNT_INIT(2), .user_ns = &init_user_ns, .ns.ops = &cgroupns_operations, .ns.inum = ns_init_inum(&init_cgroup_ns), .root_cset = &init_css_set, .ns.ns_type = ns_common_type(&init_cgroup_ns), }; static struct file_system_type cgroup2_fs_type; static struct cftype cgroup_base_files[]; static struct cftype cgroup_psi_files[]; /* cgroup optional features */ enum cgroup_opt_features { #ifdef CONFIG_PSI OPT_FEATURE_PRESSURE, #endif OPT_FEATURE_COUNT }; static const char *cgroup_opt_feature_names[OPT_FEATURE_COUNT] = { #ifdef CONFIG_PSI "pressure", #endif }; static u16 cgroup_feature_disable_mask __read_mostly; static int cgroup_apply_control(struct cgroup *cgrp); static void cgroup_finalize_control(struct cgroup *cgrp, int ret); static void css_task_iter_skip(struct css_task_iter *it, struct task_struct *task); static int cgroup_destroy_locked(struct cgroup *cgrp); static struct cgroup_subsys_state *css_create(struct cgroup *cgrp, struct cgroup_subsys *ss); static void css_release(struct percpu_ref *ref); static void kill_css(struct cgroup_subsys_state *css); static int cgroup_addrm_files(struct cgroup_subsys_state *css, struct cgroup *cgrp, struct cftype cfts[], bool is_add); #ifdef CONFIG_DEBUG_CGROUP_REF #define CGROUP_REF_FN_ATTRS noinline #define CGROUP_REF_EXPORT(fn) EXPORT_SYMBOL_GPL(fn); #include <linux/cgroup_refcnt.h> #endif /** * cgroup_ssid_enabled - cgroup subsys enabled test by subsys ID * @ssid: subsys ID of interest * * cgroup_subsys_enabled() can only be used with literal subsys names which * is fine for individual subsystems but unsuitable for cgroup core. This * is slower static_key_enabled() based test indexed by @ssid. */ bool cgroup_ssid_enabled(int ssid) { if (!CGROUP_HAS_SUBSYS_CONFIG) return false; return static_key_enabled(cgroup_subsys_enabled_key[ssid]); } /** * cgroup_on_dfl - test whether a cgroup is on the default hierarchy * @cgrp: the cgroup of interest * * The default hierarchy is the v2 interface of cgroup and this function * can be used to test whether a cgroup is on the default hierarchy for * cases where a subsystem should behave differently depending on the * interface version. * * List of changed behaviors: * * - Mount options "noprefix", "xattr", "clone_children", "release_agent" * and "name" are disallowed. * * - When mounting an existing superblock, mount options should match. * * - rename(2) is disallowed. * * - "tasks" is removed. Everything should be at process granularity. Use * "cgroup.procs" instead. * * - "cgroup.procs" is not sorted. pids will be unique unless they got * recycled in-between reads. * * - "release_agent" and "notify_on_release" are removed. Replacement * notification mechanism will be implemented. * * - "cgroup.clone_children" is removed. * * - "cgroup.subtree_populated" is available. Its value is 0 if the cgroup * and its descendants contain no task; otherwise, 1. 
The file also * generates kernfs notification which can be monitored through poll and * [di]notify when the value of the file changes. * * - cpuset: tasks will be kept in empty cpusets when hotplug happens and * take masks of ancestors with non-empty cpus/mems, instead of being * moved to an ancestor. * * - cpuset: a task can be moved into an empty cpuset, and again it takes * masks of ancestors. * * - blkcg: blk-throttle becomes properly hierarchical. */ bool cgroup_on_dfl(const struct cgroup *cgrp) { return cgrp->root == &cgrp_dfl_root; } /* IDR wrappers which synchronize using cgroup_idr_lock */ static int cgroup_idr_alloc(struct idr *idr, void *ptr, int start, int end, gfp_t gfp_mask) { int ret; idr_preload(gfp_mask); spin_lock_bh(&cgroup_idr_lock); ret = idr_alloc(idr, ptr, start, end, gfp_mask & ~__GFP_DIRECT_RECLAIM); spin_unlock_bh(&cgroup_idr_lock); idr_preload_end(); return ret; } static void *cgroup_idr_replace(struct idr *idr, void *ptr, int id) { void *ret; spin_lock_bh(&cgroup_idr_lock); ret = idr_replace(idr, ptr, id); spin_unlock_bh(&cgroup_idr_lock); return ret; } static void cgroup_idr_remove(struct idr *idr, int id) { spin_lock_bh(&cgroup_idr_lock); idr_remove(idr, id); spin_unlock_bh(&cgroup_idr_lock); } static bool cgroup_has_tasks(struct cgroup *cgrp) { return cgrp->nr_populated_csets; } static bool cgroup_is_threaded(struct cgroup *cgrp) { return cgrp->dom_cgrp != cgrp; } /* can @cgrp host both domain and threaded children? */ static bool cgroup_is_mixable(struct cgroup *cgrp) { /* * Root isn't under domain level resource control exempting it from * the no-internal-process constraint, so it can serve as a thread * root and a parent of resource domains at the same time. */ return !cgroup_parent(cgrp); } /* can @cgrp become a thread root? Should always be true for a thread root */ static bool cgroup_can_be_thread_root(struct cgroup *cgrp) { /* mixables don't care */ if (cgroup_is_mixable(cgrp)) return true; /* domain roots can't be nested under threaded */ if (cgroup_is_threaded(cgrp)) return false; /* can only have either domain or threaded children */ if (cgrp->nr_populated_domain_children) return false; /* and no domain controllers can be enabled */ if (cgrp->subtree_control & ~cgrp_dfl_threaded_ss_mask) return false; return true; } /* is @cgrp root of a threaded subtree? */ static bool cgroup_is_thread_root(struct cgroup *cgrp) { /* thread root should be a domain */ if (cgroup_is_threaded(cgrp)) return false; /* a domain w/ threaded children is a thread root */ if (cgrp->nr_threaded_children) return true; /* * A domain which has tasks and explicit threaded controllers * enabled is a thread root. 
*/ if (cgroup_has_tasks(cgrp) && (cgrp->subtree_control & cgrp_dfl_threaded_ss_mask)) return true; return false; } /* a domain which isn't connected to the root w/o brekage can't be used */ static bool cgroup_is_valid_domain(struct cgroup *cgrp) { /* the cgroup itself can be a thread root */ if (cgroup_is_threaded(cgrp)) return false; /* but the ancestors can't be unless mixable */ while ((cgrp = cgroup_parent(cgrp))) { if (!cgroup_is_mixable(cgrp) && cgroup_is_thread_root(cgrp)) return false; if (cgroup_is_threaded(cgrp)) return false; } return true; } /* subsystems visibly enabled on a cgroup */ static u16 cgroup_control(struct cgroup *cgrp) { struct cgroup *parent = cgroup_parent(cgrp); u16 root_ss_mask = cgrp->root->subsys_mask; if (parent) { u16 ss_mask = parent->subtree_control; /* threaded cgroups can only have threaded controllers */ if (cgroup_is_threaded(cgrp)) ss_mask &= cgrp_dfl_threaded_ss_mask; return ss_mask; } if (cgroup_on_dfl(cgrp)) root_ss_mask &= ~(cgrp_dfl_inhibit_ss_mask | cgrp_dfl_implicit_ss_mask); return root_ss_mask; } /* subsystems enabled on a cgroup */ static u16 cgroup_ss_mask(struct cgroup *cgrp) { struct cgroup *parent = cgroup_parent(cgrp); if (parent) { u16 ss_mask = parent->subtree_ss_mask; /* threaded cgroups can only have threaded controllers */ if (cgroup_is_threaded(cgrp)) ss_mask &= cgrp_dfl_threaded_ss_mask; return ss_mask; } return cgrp->root->subsys_mask; } /** * cgroup_css - obtain a cgroup's css for the specified subsystem * @cgrp: the cgroup of interest * @ss: the subsystem of interest (%NULL returns @cgrp->self) * * Return @cgrp's css (cgroup_subsys_state) associated with @ss. This * function must be called either under cgroup_mutex or rcu_read_lock() and * the caller is responsible for pinning the returned css if it wants to * keep accessing it outside the said locks. This function may return * %NULL if @cgrp doesn't have @subsys_id enabled. */ static struct cgroup_subsys_state *cgroup_css(struct cgroup *cgrp, struct cgroup_subsys *ss) { if (CGROUP_HAS_SUBSYS_CONFIG && ss) return rcu_dereference_check(cgrp->subsys[ss->id], lockdep_is_held(&cgroup_mutex)); else return &cgrp->self; } /** * cgroup_e_css_by_mask - obtain a cgroup's effective css for the specified ss * @cgrp: the cgroup of interest * @ss: the subsystem of interest (%NULL returns @cgrp->self) * * Similar to cgroup_css() but returns the effective css, which is defined * as the matching css of the nearest ancestor including self which has @ss * enabled. If @ss is associated with the hierarchy @cgrp is on, this * function is guaranteed to return non-NULL css. */ static struct cgroup_subsys_state *cgroup_e_css_by_mask(struct cgroup *cgrp, struct cgroup_subsys *ss) { lockdep_assert_held(&cgroup_mutex); if (!ss) return &cgrp->self; /* * This function is used while updating css associations and thus * can't test the csses directly. Test ss_mask. */ while (!(cgroup_ss_mask(cgrp) & (1 << ss->id))) { cgrp = cgroup_parent(cgrp); if (!cgrp) return NULL; } return cgroup_css(cgrp, ss); } /** * cgroup_e_css - obtain a cgroup's effective css for the specified subsystem * @cgrp: the cgroup of interest * @ss: the subsystem of interest * * Find and get the effective css of @cgrp for @ss. The effective css is * defined as the matching css of the nearest ancestor including self which * has @ss enabled. If @ss is not mounted on the hierarchy @cgrp is on, * the root css is returned, so this function always returns a valid css. 
* * The returned css is not guaranteed to be online, and therefore it is the * callers responsibility to try get a reference for it. */ struct cgroup_subsys_state *cgroup_e_css(struct cgroup *cgrp, struct cgroup_subsys *ss) { struct cgroup_subsys_state *css; if (!CGROUP_HAS_SUBSYS_CONFIG) return NULL; do { css = cgroup_css(cgrp, ss); if (css) return css; cgrp = cgroup_parent(cgrp); } while (cgrp); return init_css_set.subsys[ss->id]; } /** * cgroup_get_e_css - get a cgroup's effective css for the specified subsystem * @cgrp: the cgroup of interest * @ss: the subsystem of interest * * Find and get the effective css of @cgrp for @ss. The effective css is * defined as the matching css of the nearest ancestor including self which * has @ss enabled. If @ss is not mounted on the hierarchy @cgrp is on, * the root css is returned, so this function always returns a valid css. * The returned css must be put using css_put(). */ struct cgroup_subsys_state *cgroup_get_e_css(struct cgroup *cgrp, struct cgroup_subsys *ss) { struct cgroup_subsys_state *css; if (!CGROUP_HAS_SUBSYS_CONFIG) return NULL; rcu_read_lock(); do { css = cgroup_css(cgrp, ss); if (css && css_tryget_online(css)) goto out_unlock; cgrp = cgroup_parent(cgrp); } while (cgrp); css = init_css_set.subsys[ss->id]; css_get(css); out_unlock: rcu_read_unlock(); return css; } EXPORT_SYMBOL_GPL(cgroup_get_e_css); static void cgroup_get_live(struct cgroup *cgrp) { WARN_ON_ONCE(cgroup_is_dead(cgrp)); cgroup_get(cgrp); } /** * __cgroup_task_count - count the number of tasks in a cgroup. The caller * is responsible for taking the css_set_lock. * @cgrp: the cgroup in question */ int __cgroup_task_count(const struct cgroup *cgrp) { int count = 0; struct cgrp_cset_link *link; lockdep_assert_held(&css_set_lock); list_for_each_entry(link, &cgrp->cset_links, cset_link) count += link->cset->nr_tasks; return count; } /** * cgroup_task_count - count the number of tasks in a cgroup. * @cgrp: the cgroup in question */ int cgroup_task_count(const struct cgroup *cgrp) { int count; spin_lock_irq(&css_set_lock); count = __cgroup_task_count(cgrp); spin_unlock_irq(&css_set_lock); return count; } static struct cgroup *kn_priv(struct kernfs_node *kn) { struct kernfs_node *parent; /* * The parent can not be replaced due to KERNFS_ROOT_INVARIANT_PARENT. * Therefore it is always safe to dereference this pointer outside of a * RCU section. */ parent = rcu_dereference_check(kn->__parent, kernfs_root_flags(kn) & KERNFS_ROOT_INVARIANT_PARENT); return parent->priv; } struct cgroup_subsys_state *of_css(struct kernfs_open_file *of) { struct cgroup *cgrp = kn_priv(of->kn); struct cftype *cft = of_cft(of); /* * This is open and unprotected implementation of cgroup_css(). * seq_css() is only called from a kernfs file operation which has * an active reference on the file. Because all the subsystem * files are drained before a css is disassociated with a cgroup, * the matching css from the cgroup's subsys table is guaranteed to * be and stay valid until the enclosing operation is complete. */ if (CGROUP_HAS_SUBSYS_CONFIG && cft->ss) return rcu_dereference_raw(cgrp->subsys[cft->ss->id]); else return &cgrp->self; } EXPORT_SYMBOL_GPL(of_css); /** * for_each_css - iterate all css's of a cgroup * @css: the iteration cursor * @ssid: the index of the subsystem, CGROUP_SUBSYS_COUNT after reaching the end * @cgrp: the target cgroup to iterate css's of * * Should be called under cgroup_mutex. 
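 *
 * An illustrative sketch only (assumes @cgrp is a live cgroup and that the
 * caller already holds cgroup_mutex); it reports which subsystems have a
 * css instantiated on @cgrp:
 *
 *	struct cgroup_subsys_state *css;
 *	int ssid;
 *
 *	for_each_css(css, ssid, cgrp)
 *		pr_info("css present for subsys %d\n", ssid);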
*/ #define for_each_css(css, ssid, cgrp) \ for ((ssid) = 0; (ssid) < CGROUP_SUBSYS_COUNT; (ssid)++) \ if (!((css) = rcu_dereference_check( \ (cgrp)->subsys[(ssid)], \ lockdep_is_held(&cgroup_mutex)))) { } \ else /** * do_each_subsys_mask - filter for_each_subsys with a bitmask * @ss: the iteration cursor * @ssid: the index of @ss, CGROUP_SUBSYS_COUNT after reaching the end * @ss_mask: the bitmask * * The block will only run for cases where the ssid-th bit (1 << ssid) of * @ss_mask is set. */ #define do_each_subsys_mask(ss, ssid, ss_mask) do { \ unsigned long __ss_mask = (ss_mask); \ if (!CGROUP_HAS_SUBSYS_CONFIG) { \ (ssid) = 0; \ break; \ } \ for_each_set_bit(ssid, &__ss_mask, CGROUP_SUBSYS_COUNT) { \ (ss) = cgroup_subsys[ssid]; \ { #define while_each_subsys_mask() \ } \ } \ } while (false) /* iterate over child cgrps, lock should be held throughout iteration */ #define cgroup_for_each_live_child(child, cgrp) \ list_for_each_entry((child), &(cgrp)->self.children, self.sibling) \ if (({ lockdep_assert_held(&cgroup_mutex); \ cgroup_is_dead(child); })) \ ; \ else /* walk live descendants in pre order */ #define cgroup_for_each_live_descendant_pre(dsct, d_css, cgrp) \ css_for_each_descendant_pre((d_css), cgroup_css((cgrp), NULL)) \ if (({ lockdep_assert_held(&cgroup_mutex); \ (dsct) = (d_css)->cgroup; \ cgroup_is_dead(dsct); })) \ ; \ else /* walk live descendants in postorder */ #define cgroup_for_each_live_descendant_post(dsct, d_css, cgrp) \ css_for_each_descendant_post((d_css), cgroup_css((cgrp), NULL)) \ if (({ lockdep_assert_held(&cgroup_mutex); \ (dsct) = (d_css)->cgroup; \ cgroup_is_dead(dsct); })) \ ; \ else /* * The default css_set - used by init and its children prior to any * hierarchies being mounted. It contains a pointer to the root state * for each subsystem. Also used to anchor the list of css_sets. Not * reference-counted, to improve performance when child cgroups * haven't been created. */ struct css_set init_css_set = { .refcount = REFCOUNT_INIT(1), .dom_cset = &init_css_set, .tasks = LIST_HEAD_INIT(init_css_set.tasks), .mg_tasks = LIST_HEAD_INIT(init_css_set.mg_tasks), .dying_tasks = LIST_HEAD_INIT(init_css_set.dying_tasks), .task_iters = LIST_HEAD_INIT(init_css_set.task_iters), .threaded_csets = LIST_HEAD_INIT(init_css_set.threaded_csets), .cgrp_links = LIST_HEAD_INIT(init_css_set.cgrp_links), .mg_src_preload_node = LIST_HEAD_INIT(init_css_set.mg_src_preload_node), .mg_dst_preload_node = LIST_HEAD_INIT(init_css_set.mg_dst_preload_node), .mg_node = LIST_HEAD_INIT(init_css_set.mg_node), /* * The following field is re-initialized when this cset gets linked * in cgroup_init(). However, let's initialize the field * statically too so that the default cgroup can be accessed safely * early during boot. */ .dfl_cgrp = &cgrp_dfl_root.cgrp, }; static int css_set_count = 1; /* 1 for init_css_set */ static bool css_set_threaded(struct css_set *cset) { return cset->dom_cset != cset; } /** * css_set_populated - does a css_set contain any tasks? * @cset: target css_set * * css_set_populated() should be the same as !!cset->nr_tasks at steady * state. However, css_set_populated() can be called while a task is being * added to or removed from the linked list before the nr_tasks is * properly updated. Hence, we can't just look at ->nr_tasks here. 
*/ static bool css_set_populated(struct css_set *cset) { lockdep_assert_held(&css_set_lock); return !list_empty(&cset->tasks) || !list_empty(&cset->mg_tasks); } /** * cgroup_update_populated - update the populated count of a cgroup * @cgrp: the target cgroup * @populated: inc or dec populated count * * One of the css_sets associated with @cgrp is either getting its first * task or losing the last. Update @cgrp->nr_populated_* accordingly. The * count is propagated towards root so that a given cgroup's * nr_populated_children is zero iff none of its descendants contain any * tasks. * * @cgrp's interface file "cgroup.populated" is zero if both * @cgrp->nr_populated_csets and @cgrp->nr_populated_children are zero and * 1 otherwise. When the sum changes from or to zero, userland is notified * that the content of the interface file has changed. This can be used to * detect when @cgrp and its descendants become populated or empty. */ static void cgroup_update_populated(struct cgroup *cgrp, bool populated) { struct cgroup *child = NULL; int adj = populated ? 1 : -1; lockdep_assert_held(&css_set_lock); do { bool was_populated = cgroup_is_populated(cgrp); if (!child) { cgrp->nr_populated_csets += adj; } else { if (cgroup_is_threaded(child)) cgrp->nr_populated_threaded_children += adj; else cgrp->nr_populated_domain_children += adj; } if (was_populated == cgroup_is_populated(cgrp)) break; cgroup1_check_for_release(cgrp); TRACE_CGROUP_PATH(notify_populated, cgrp, cgroup_is_populated(cgrp)); cgroup_file_notify(&cgrp->events_file); child = cgrp; cgrp = cgroup_parent(cgrp); } while (cgrp); } /** * css_set_update_populated - update populated state of a css_set * @cset: target css_set * @populated: whether @cset is populated or depopulated * * @cset is either getting the first task or losing the last. Update the * populated counters of all associated cgroups accordingly. */ static void css_set_update_populated(struct css_set *cset, bool populated) { struct cgrp_cset_link *link; lockdep_assert_held(&css_set_lock); list_for_each_entry(link, &cset->cgrp_links, cgrp_link) cgroup_update_populated(link->cgrp, populated); } /* * @task is leaving, advance task iterators which are pointing to it so * that they can resume at the next position. Advancing an iterator might * remove it from the list, use safe walk. See css_task_iter_skip() for * details. */ static void css_set_skip_task_iters(struct css_set *cset, struct task_struct *task) { struct css_task_iter *it, *pos; list_for_each_entry_safe(it, pos, &cset->task_iters, iters_node) css_task_iter_skip(it, task); } /** * css_set_move_task - move a task from one css_set to another * @task: task being moved * @from_cset: css_set @task currently belongs to (may be NULL) * @to_cset: new css_set @task is being moved to (may be NULL) * @use_mg_tasks: move to @to_cset->mg_tasks instead of ->tasks * * Move @task from @from_cset to @to_cset. If @task didn't belong to any * css_set, @from_cset can be NULL. If @task is being disassociated * instead of moved, @to_cset can be NULL. * * This function automatically handles populated counter updates and * css_task_iter adjustments but the caller is responsible for managing * @from_cset and @to_cset's reference counts. 
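 *
 * A rough sketch of the expected calling pattern, mirroring what
 * cgroup_migrate_execute() below does (css_set_lock is assumed to be held
 * and error handling is omitted):
 *
 *	get_css_set(to_cset);
 *	to_cset->nr_tasks++;
 *	css_set_move_task(task, from_cset, to_cset, true);
 *	from_cset->nr_tasks--;
 *	put_css_set_locked(from_cset);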
*/ static void css_set_move_task(struct task_struct *task, struct css_set *from_cset, struct css_set *to_cset, bool use_mg_tasks) { lockdep_assert_held(&css_set_lock); if (to_cset && !css_set_populated(to_cset)) css_set_update_populated(to_cset, true); if (from_cset) { WARN_ON_ONCE(list_empty(&task->cg_list)); css_set_skip_task_iters(from_cset, task); list_del_init(&task->cg_list); if (!css_set_populated(from_cset)) css_set_update_populated(from_cset, false); } else { WARN_ON_ONCE(!list_empty(&task->cg_list)); } if (to_cset) { /* * We are synchronized through cgroup_threadgroup_rwsem * against PF_EXITING setting such that we can't race * against cgroup_exit()/cgroup_free() dropping the css_set. */ WARN_ON_ONCE(task->flags & PF_EXITING); cgroup_move_task(task, to_cset); list_add_tail(&task->cg_list, use_mg_tasks ? &to_cset->mg_tasks : &to_cset->tasks); } } /* * hash table for cgroup groups. This improves the performance to find * an existing css_set. This hash doesn't (currently) take into * account cgroups in empty hierarchies. */ #define CSS_SET_HASH_BITS 7 static DEFINE_HASHTABLE(css_set_table, CSS_SET_HASH_BITS); static unsigned long css_set_hash(struct cgroup_subsys_state **css) { unsigned long key = 0UL; struct cgroup_subsys *ss; int i; for_each_subsys(ss, i) key += (unsigned long)css[i]; key = (key >> 16) ^ key; return key; } void put_css_set_locked(struct css_set *cset) { struct cgrp_cset_link *link, *tmp_link; struct cgroup_subsys *ss; int ssid; lockdep_assert_held(&css_set_lock); if (!refcount_dec_and_test(&cset->refcount)) return; WARN_ON_ONCE(!list_empty(&cset->threaded_csets)); /* This css_set is dead. Unlink it and release cgroup and css refs */ for_each_subsys(ss, ssid) { list_del(&cset->e_cset_node[ssid]); css_put(cset->subsys[ssid]); } hash_del(&cset->hlist); css_set_count--; list_for_each_entry_safe(link, tmp_link, &cset->cgrp_links, cgrp_link) { list_del(&link->cset_link); list_del(&link->cgrp_link); if (cgroup_parent(link->cgrp)) cgroup_put(link->cgrp); kfree(link); } if (css_set_threaded(cset)) { list_del(&cset->threaded_csets_node); put_css_set_locked(cset->dom_cset); } kfree_rcu(cset, rcu_head); } /** * compare_css_sets - helper function for find_existing_css_set(). * @cset: candidate css_set being tested * @old_cset: existing css_set for a task * @new_cgrp: cgroup that's being entered by the task * @template: desired set of css pointers in css_set (pre-calculated) * * Returns true if "cset" matches "old_cset" except for the hierarchy * which "new_cgrp" belongs to, for which it should match "new_cgrp". */ static bool compare_css_sets(struct css_set *cset, struct css_set *old_cset, struct cgroup *new_cgrp, struct cgroup_subsys_state *template[]) { struct cgroup *new_dfl_cgrp; struct list_head *l1, *l2; /* * On the default hierarchy, there can be csets which are * associated with the same set of cgroups but different csses. * Let's first ensure that csses match. */ if (memcmp(template, cset->subsys, sizeof(cset->subsys))) return false; /* @cset's domain should match the default cgroup's */ if (cgroup_on_dfl(new_cgrp)) new_dfl_cgrp = new_cgrp; else new_dfl_cgrp = old_cset->dfl_cgrp; if (new_dfl_cgrp->dom_cgrp != cset->dom_cset->dfl_cgrp) return false; /* * Compare cgroup pointers in order to distinguish between * different cgroups in hierarchies. As different cgroups may * share the same effective css, this comparison is always * necessary. 
*/ l1 = &cset->cgrp_links; l2 = &old_cset->cgrp_links; while (1) { struct cgrp_cset_link *link1, *link2; struct cgroup *cgrp1, *cgrp2; l1 = l1->next; l2 = l2->next; /* See if we reached the end - both lists are equal length. */ if (l1 == &cset->cgrp_links) { BUG_ON(l2 != &old_cset->cgrp_links); break; } else { BUG_ON(l2 == &old_cset->cgrp_links); } /* Locate the cgroups associated with these links. */ link1 = list_entry(l1, struct cgrp_cset_link, cgrp_link); link2 = list_entry(l2, struct cgrp_cset_link, cgrp_link); cgrp1 = link1->cgrp; cgrp2 = link2->cgrp; /* Hierarchies should be linked in the same order. */ BUG_ON(cgrp1->root != cgrp2->root); /* * If this hierarchy is the hierarchy of the cgroup * that's changing, then we need to check that this * css_set points to the new cgroup; if it's any other * hierarchy, then this css_set should point to the * same cgroup as the old css_set. */ if (cgrp1->root == new_cgrp->root) { if (cgrp1 != new_cgrp) return false; } else { if (cgrp1 != cgrp2) return false; } } return true; } /** * find_existing_css_set - init css array and find the matching css_set * @old_cset: the css_set that we're using before the cgroup transition * @cgrp: the cgroup that we're moving into * @template: out param for the new set of csses, should be clear on entry */ static struct css_set *find_existing_css_set(struct css_set *old_cset, struct cgroup *cgrp, struct cgroup_subsys_state **template) { struct cgroup_root *root = cgrp->root; struct cgroup_subsys *ss; struct css_set *cset; unsigned long key; int i; /* * Build the set of subsystem state objects that we want to see in the * new css_set. While subsystems can change globally, the entries here * won't change, so no need for locking. */ for_each_subsys(ss, i) { if (root->subsys_mask & (1UL << i)) { /* * @ss is in this hierarchy, so we want the * effective css from @cgrp. */ template[i] = cgroup_e_css_by_mask(cgrp, ss); } else { /* * @ss is not in this hierarchy, so we don't want * to change the css. */ template[i] = old_cset->subsys[i]; } } key = css_set_hash(template); hash_for_each_possible(css_set_table, cset, hlist, key) { if (!compare_css_sets(cset, old_cset, cgrp, template)) continue; /* This css_set matches what we need */ return cset; } /* No existing cgroup group matched */ return NULL; } static void free_cgrp_cset_links(struct list_head *links_to_free) { struct cgrp_cset_link *link, *tmp_link; list_for_each_entry_safe(link, tmp_link, links_to_free, cset_link) { list_del(&link->cset_link); kfree(link); } } /** * allocate_cgrp_cset_links - allocate cgrp_cset_links * @count: the number of links to allocate * @tmp_links: list_head the allocated links are put on * * Allocate @count cgrp_cset_link structures and chain them on @tmp_links * through ->cset_link. Returns 0 on success or -errno. 
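 *
 * A minimal usage sketch of the allocate/link/free pattern used by
 * find_css_set() and cgroup_setup_root() below (@cset and @cgrp are
 * assumed to be set up by the caller):
 *
 *	struct list_head tmp_links;
 *
 *	if (allocate_cgrp_cset_links(1, &tmp_links) < 0)
 *		return -ENOMEM;
 *	spin_lock_irq(&css_set_lock);
 *	link_css_set(&tmp_links, cset, cgrp);
 *	spin_unlock_irq(&css_set_lock);
 *	free_cgrp_cset_links(&tmp_links);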
*/ static int allocate_cgrp_cset_links(int count, struct list_head *tmp_links) { struct cgrp_cset_link *link; int i; INIT_LIST_HEAD(tmp_links); for (i = 0; i < count; i++) { link = kzalloc(sizeof(*link), GFP_KERNEL); if (!link) { free_cgrp_cset_links(tmp_links); return -ENOMEM; } list_add(&link->cset_link, tmp_links); } return 0; } /** * link_css_set - a helper function to link a css_set to a cgroup * @tmp_links: cgrp_cset_link objects allocated by allocate_cgrp_cset_links() * @cset: the css_set to be linked * @cgrp: the destination cgroup */ static void link_css_set(struct list_head *tmp_links, struct css_set *cset, struct cgroup *cgrp) { struct cgrp_cset_link *link; BUG_ON(list_empty(tmp_links)); if (cgroup_on_dfl(cgrp)) cset->dfl_cgrp = cgrp; link = list_first_entry(tmp_links, struct cgrp_cset_link, cset_link); link->cset = cset; link->cgrp = cgrp; /* * Always add links to the tail of the lists so that the lists are * in chronological order. */ list_move_tail(&link->cset_link, &cgrp->cset_links); list_add_tail(&link->cgrp_link, &cset->cgrp_links); if (cgroup_parent(cgrp)) cgroup_get_live(cgrp); } /** * find_css_set - return a new css_set with one cgroup updated * @old_cset: the baseline css_set * @cgrp: the cgroup to be updated * * Return a new css_set that's equivalent to @old_cset, but with @cgrp * substituted into the appropriate hierarchy. */ static struct css_set *find_css_set(struct css_set *old_cset, struct cgroup *cgrp) { struct cgroup_subsys_state *template[CGROUP_SUBSYS_COUNT] = { }; struct css_set *cset; struct list_head tmp_links; struct cgrp_cset_link *link; struct cgroup_subsys *ss; unsigned long key; int ssid; lockdep_assert_held(&cgroup_mutex); /* First see if we already have a cgroup group that matches * the desired set */ spin_lock_irq(&css_set_lock); cset = find_existing_css_set(old_cset, cgrp, template); if (cset) get_css_set(cset); spin_unlock_irq(&css_set_lock); if (cset) return cset; cset = kzalloc(sizeof(*cset), GFP_KERNEL); if (!cset) return NULL; /* Allocate all the cgrp_cset_link objects that we'll need */ if (allocate_cgrp_cset_links(cgroup_root_count, &tmp_links) < 0) { kfree(cset); return NULL; } refcount_set(&cset->refcount, 1); cset->dom_cset = cset; INIT_LIST_HEAD(&cset->tasks); INIT_LIST_HEAD(&cset->mg_tasks); INIT_LIST_HEAD(&cset->dying_tasks); INIT_LIST_HEAD(&cset->task_iters); INIT_LIST_HEAD(&cset->threaded_csets); INIT_HLIST_NODE(&cset->hlist); INIT_LIST_HEAD(&cset->cgrp_links); INIT_LIST_HEAD(&cset->mg_src_preload_node); INIT_LIST_HEAD(&cset->mg_dst_preload_node); INIT_LIST_HEAD(&cset->mg_node); /* Copy the set of subsystem state objects generated in * find_existing_css_set() */ memcpy(cset->subsys, template, sizeof(cset->subsys)); spin_lock_irq(&css_set_lock); /* Add reference counts and links from the new css_set. */ list_for_each_entry(link, &old_cset->cgrp_links, cgrp_link) { struct cgroup *c = link->cgrp; if (c->root == cgrp->root) c = cgrp; link_css_set(&tmp_links, cset, c); } BUG_ON(!list_empty(&tmp_links)); css_set_count++; /* Add @cset to the hash table */ key = css_set_hash(cset->subsys); hash_add(css_set_table, &cset->hlist, key); for_each_subsys(ss, ssid) { struct cgroup_subsys_state *css = cset->subsys[ssid]; list_add_tail(&cset->e_cset_node[ssid], &css->cgroup->e_csets[ssid]); css_get(css); } spin_unlock_irq(&css_set_lock); /* * If @cset should be threaded, look up the matching dom_cset and * link them up. We first fully initialize @cset then look for the * dom_cset. 
It's simpler this way and safe as @cset is guaranteed * to stay empty until we return. */ if (cgroup_is_threaded(cset->dfl_cgrp)) { struct css_set *dcset; dcset = find_css_set(cset, cset->dfl_cgrp->dom_cgrp); if (!dcset) { put_css_set(cset); return NULL; } spin_lock_irq(&css_set_lock); cset->dom_cset = dcset; list_add_tail(&cset->threaded_csets_node, &dcset->threaded_csets); spin_unlock_irq(&css_set_lock); } return cset; } struct cgroup_root *cgroup_root_from_kf(struct kernfs_root *kf_root) { struct cgroup *root_cgrp = kernfs_root_to_node(kf_root)->priv; return root_cgrp->root; } void cgroup_favor_dynmods(struct cgroup_root *root, bool favor) { bool favoring = root->flags & CGRP_ROOT_FAVOR_DYNMODS; /* * see the comment above CGRP_ROOT_FAVOR_DYNMODS definition. * favordynmods can flip while task is between * cgroup_threadgroup_change_begin() and end(), so down_write global * cgroup_threadgroup_rwsem to synchronize them. * * Once cgroup_enable_per_threadgroup_rwsem is enabled, holding * cgroup_threadgroup_rwsem doesn't exlude tasks between * cgroup_thread_group_change_begin() and end() and thus it's unsafe to * turn off. As the scenario is unlikely, simply disallow disabling once * enabled and print out a warning. */ percpu_down_write(&cgroup_threadgroup_rwsem); if (favor && !favoring) { cgroup_enable_per_threadgroup_rwsem = true; rcu_sync_enter(&cgroup_threadgroup_rwsem.rss); root->flags |= CGRP_ROOT_FAVOR_DYNMODS; } else if (!favor && favoring) { if (cgroup_enable_per_threadgroup_rwsem) pr_warn_once("cgroup favordynmods: per threadgroup rwsem mechanism can't be disabled\n"); rcu_sync_exit(&cgroup_threadgroup_rwsem.rss); root->flags &= ~CGRP_ROOT_FAVOR_DYNMODS; } percpu_up_write(&cgroup_threadgroup_rwsem); } static int cgroup_init_root_id(struct cgroup_root *root) { int id; lockdep_assert_held(&cgroup_mutex); id = idr_alloc_cyclic(&cgroup_hierarchy_idr, root, 0, 0, GFP_KERNEL); if (id < 0) return id; root->hierarchy_id = id; return 0; } static void cgroup_exit_root_id(struct cgroup_root *root) { lockdep_assert_held(&cgroup_mutex); idr_remove(&cgroup_hierarchy_idr, root->hierarchy_id); } void cgroup_free_root(struct cgroup_root *root) { kfree_rcu(root, rcu); } static void cgroup_destroy_root(struct cgroup_root *root) { struct cgroup *cgrp = &root->cgrp; struct cgrp_cset_link *link, *tmp_link; int ret; trace_cgroup_destroy_root(root); cgroup_lock_and_drain_offline(&cgrp_dfl_root.cgrp); BUG_ON(atomic_read(&root->nr_cgrps)); BUG_ON(!list_empty(&cgrp->self.children)); ret = blocking_notifier_call_chain(&cgroup_lifetime_notifier, CGROUP_LIFETIME_OFFLINE, cgrp); WARN_ON_ONCE(notifier_to_errno(ret)); /* Rebind all subsystems back to the default hierarchy */ WARN_ON(rebind_subsystems(&cgrp_dfl_root, root->subsys_mask)); /* * Release all the links from cset_links to this hierarchy's * root cgroup */ spin_lock_irq(&css_set_lock); list_for_each_entry_safe(link, tmp_link, &cgrp->cset_links, cset_link) { list_del(&link->cset_link); list_del(&link->cgrp_link); kfree(link); } spin_unlock_irq(&css_set_lock); WARN_ON_ONCE(list_empty(&root->root_list)); list_del_rcu(&root->root_list); cgroup_root_count--; if (!have_favordynmods) cgroup_favor_dynmods(root, false); cgroup_exit_root_id(root); cgroup_unlock(); kernfs_destroy_root(root->kf_root); cgroup_free_root(root); } /* * Returned cgroup is without refcount but it's valid as long as cset pins it. 
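 * Callers that do not hold cgroup_mutex must be prepared for a NULL
 * result; see the comment at the end of the function below.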
*/ static inline struct cgroup *__cset_cgroup_from_root(struct css_set *cset, struct cgroup_root *root) { struct cgroup *res_cgroup = NULL; if (cset == &init_css_set) { res_cgroup = &root->cgrp; } else if (root == &cgrp_dfl_root) { res_cgroup = cset->dfl_cgrp; } else { struct cgrp_cset_link *link; lockdep_assert_held(&css_set_lock); list_for_each_entry(link, &cset->cgrp_links, cgrp_link) { struct cgroup *c = link->cgrp; if (c->root == root) { res_cgroup = c; break; } } } /* * If cgroup_mutex is not held, the cgrp_cset_link will be freed * before we remove the cgroup root from the root_list. Consequently, * when accessing a cgroup root, the cset_link may have already been * freed, resulting in a NULL res_cgroup. However, by holding the * cgroup_mutex, we ensure that res_cgroup can't be NULL. * If we don't hold cgroup_mutex in the caller, we must do the NULL * check. */ return res_cgroup; } /* * look up cgroup associated with current task's cgroup namespace on the * specified hierarchy */ static struct cgroup * current_cgns_cgroup_from_root(struct cgroup_root *root) { struct cgroup *res = NULL; struct css_set *cset; lockdep_assert_held(&css_set_lock); rcu_read_lock(); cset = current->nsproxy->cgroup_ns->root_cset; res = __cset_cgroup_from_root(cset, root); rcu_read_unlock(); /* * The namespace_sem is held by current, so the root cgroup can't * be umounted. Therefore, we can ensure that the res is non-NULL. */ WARN_ON_ONCE(!res); return res; } /* * Look up cgroup associated with current task's cgroup namespace on the default * hierarchy. * * Unlike current_cgns_cgroup_from_root(), this doesn't need locks: * - Internal rcu_read_lock is unnecessary because we don't dereference any rcu * pointers. * - css_set_lock is not needed because we just read cset->dfl_cgrp. * - As a bonus returned cgrp is pinned with the current because it cannot * switch cgroup_ns asynchronously. */ static struct cgroup *current_cgns_cgroup_dfl(void) { struct css_set *cset; if (current->nsproxy) { cset = current->nsproxy->cgroup_ns->root_cset; return __cset_cgroup_from_root(cset, &cgrp_dfl_root); } else { /* * NOTE: This function may be called from bpf_cgroup_from_id() * on a task which has already passed exit_task_namespaces() and * nsproxy == NULL. Fall back to cgrp_dfl_root which will make all * cgroups visible for lookups. */ return &cgrp_dfl_root.cgrp; } } /* look up cgroup associated with given css_set on the specified hierarchy */ static struct cgroup *cset_cgroup_from_root(struct css_set *cset, struct cgroup_root *root) { lockdep_assert_held(&css_set_lock); return __cset_cgroup_from_root(cset, root); } /* * Return the cgroup for "task" from the given hierarchy. Must be * called with css_set_lock held to prevent task's groups from being modified. * Must be called with either cgroup_mutex or rcu read lock to prevent the * cgroup root from being destroyed. */ struct cgroup *task_cgroup_from_root(struct task_struct *task, struct cgroup_root *root) { /* * No need to lock the task - since we hold css_set_lock the * task can't change groups. */ return cset_cgroup_from_root(task_css_set(task), root); } /* * A task must hold cgroup_mutex to modify cgroups. * * Any task can increment and decrement the count field without lock. * So in general, code holding cgroup_mutex can't rely on the count * field not changing. However, if the count goes to zero, then only * cgroup_attach_task() can increment it again. 
Because a count of zero * means that no tasks are currently attached, therefore there is no * way a task attached to that cgroup can fork (the other way to * increment the count). So code holding cgroup_mutex can safely * assume that if the count is zero, it will stay zero. Similarly, if * a task holds cgroup_mutex on a cgroup with zero count, it * knows that the cgroup won't be removed, as cgroup_rmdir() * needs that mutex. * * A cgroup can only be deleted if both its 'count' of using tasks * is zero, and its list of 'children' cgroups is empty. Since all * tasks in the system use _some_ cgroup, and since there is always at * least one task in the system (init, pid == 1), therefore, root cgroup * always has either children cgroups and/or using tasks. So we don't * need a special hack to ensure that root cgroup cannot be deleted. * * P.S. One more locking exception. RCU is used to guard the * update of a tasks cgroup pointer by cgroup_attach_task() */ static struct kernfs_syscall_ops cgroup_kf_syscall_ops; static char *cgroup_file_name(struct cgroup *cgrp, const struct cftype *cft, char *buf) { struct cgroup_subsys *ss = cft->ss; if (cft->ss && !(cft->flags & CFTYPE_NO_PREFIX) && !(cgrp->root->flags & CGRP_ROOT_NOPREFIX)) { const char *dbg = (cft->flags & CFTYPE_DEBUG) ? ".__DEBUG__." : ""; snprintf(buf, CGROUP_FILE_NAME_MAX, "%s%s.%s", dbg, cgroup_on_dfl(cgrp) ? ss->name : ss->legacy_name, cft->name); } else { strscpy(buf, cft->name, CGROUP_FILE_NAME_MAX); } return buf; } /** * cgroup_file_mode - deduce file mode of a control file * @cft: the control file in question * * S_IRUGO for read, S_IWUSR for write. */ static umode_t cgroup_file_mode(const struct cftype *cft) { umode_t mode = 0; if (cft->read_u64 || cft->read_s64 || cft->seq_show) mode |= S_IRUGO; if (cft->write_u64 || cft->write_s64 || cft->write) { if (cft->flags & CFTYPE_WORLD_WRITABLE) mode |= S_IWUGO; else mode |= S_IWUSR; } return mode; } /** * cgroup_calc_subtree_ss_mask - calculate subtree_ss_mask * @subtree_control: the new subtree_control mask to consider * @this_ss_mask: available subsystems * * On the default hierarchy, a subsystem may request other subsystems to be * enabled together through its ->depends_on mask. In such cases, more * subsystems than specified in "cgroup.subtree_control" may be enabled. * * This function calculates which subsystems need to be enabled if * @subtree_control is to be applied while restricted to @this_ss_mask. */ static u16 cgroup_calc_subtree_ss_mask(u16 subtree_control, u16 this_ss_mask) { u16 cur_ss_mask = subtree_control; struct cgroup_subsys *ss; int ssid; lockdep_assert_held(&cgroup_mutex); cur_ss_mask |= cgrp_dfl_implicit_ss_mask; while (true) { u16 new_ss_mask = cur_ss_mask; do_each_subsys_mask(ss, ssid, cur_ss_mask) { new_ss_mask |= ss->depends_on; } while_each_subsys_mask(); /* * Mask out subsystems which aren't available. This can * happen only if some depended-upon subsystems were bound * to non-default hierarchies. */ new_ss_mask &= this_ss_mask; if (new_ss_mask == cur_ss_mask) break; cur_ss_mask = new_ss_mask; } return cur_ss_mask; } /** * cgroup_kn_unlock - unlocking helper for cgroup kernfs methods * @kn: the kernfs_node being serviced * * This helper undoes cgroup_kn_lock_live() and should be invoked before * the method finishes if locking succeeded. Note that once this function * returns the cgroup returned by cgroup_kn_lock_live() may become * inaccessible any time. 
If the caller intends to continue to access the * cgroup, it should pin it before invoking this function. */ void cgroup_kn_unlock(struct kernfs_node *kn) { struct cgroup *cgrp; if (kernfs_type(kn) == KERNFS_DIR) cgrp = kn->priv; else cgrp = kn_priv(kn); cgroup_unlock(); kernfs_unbreak_active_protection(kn); cgroup_put(cgrp); } /** * cgroup_kn_lock_live - locking helper for cgroup kernfs methods * @kn: the kernfs_node being serviced * @drain_offline: perform offline draining on the cgroup * * This helper is to be used by a cgroup kernfs method currently servicing * @kn. It breaks the active protection, performs cgroup locking and * verifies that the associated cgroup is alive. Returns the cgroup if * alive; otherwise, %NULL. A successful return should be undone by a * matching cgroup_kn_unlock() invocation. If @drain_offline is %true, the * cgroup is drained of offlining csses before return. * * Any cgroup kernfs method implementation which requires locking the * associated cgroup should use this helper. It avoids nesting cgroup * locking under kernfs active protection and allows all kernfs operations * including self-removal. */ struct cgroup *cgroup_kn_lock_live(struct kernfs_node *kn, bool drain_offline) { struct cgroup *cgrp; if (kernfs_type(kn) == KERNFS_DIR) cgrp = kn->priv; else cgrp = kn_priv(kn); /* * We're gonna grab cgroup_mutex which nests outside kernfs * active_ref. cgroup liveliness check alone provides enough * protection against removal. Ensure @cgrp stays accessible and * break the active_ref protection. */ if (!cgroup_tryget(cgrp)) return NULL; kernfs_break_active_protection(kn); if (drain_offline) cgroup_lock_and_drain_offline(cgrp); else cgroup_lock(); if (!cgroup_is_dead(cgrp)) return cgrp; cgroup_kn_unlock(kn); return NULL; } static void cgroup_rm_file(struct cgroup *cgrp, const struct cftype *cft) { char name[CGROUP_FILE_NAME_MAX]; lockdep_assert_held(&cgroup_mutex); if (cft->file_offset) { struct cgroup_subsys_state *css = cgroup_css(cgrp, cft->ss); struct cgroup_file *cfile = (void *)css + cft->file_offset; spin_lock_irq(&cgroup_file_kn_lock); cfile->kn = NULL; spin_unlock_irq(&cgroup_file_kn_lock); timer_delete_sync(&cfile->notify_timer); } kernfs_remove_by_name(cgrp->kn, cgroup_file_name(cgrp, cft, name)); } /** * css_clear_dir - remove subsys files in a cgroup directory * @css: target css */ static void css_clear_dir(struct cgroup_subsys_state *css) { struct cgroup *cgrp = css->cgroup; struct cftype *cfts; if (!(css->flags & CSS_VISIBLE)) return; css->flags &= ~CSS_VISIBLE; if (css_is_self(css)) { if (cgroup_on_dfl(cgrp)) { cgroup_addrm_files(css, cgrp, cgroup_base_files, false); if (cgroup_psi_enabled()) cgroup_addrm_files(css, cgrp, cgroup_psi_files, false); } else { cgroup_addrm_files(css, cgrp, cgroup1_base_files, false); } } else { list_for_each_entry(cfts, &css->ss->cfts, node) cgroup_addrm_files(css, cgrp, cfts, false); } } /** * css_populate_dir - create subsys files in a cgroup directory * @css: target css * * On failure, no file is added. 
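 * css_clear_dir() above removes these files again; the two are paired
 * through the CSS_VISIBLE flag.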
*/ static int css_populate_dir(struct cgroup_subsys_state *css) { struct cgroup *cgrp = css->cgroup; struct cftype *cfts, *failed_cfts; int ret; if (css->flags & CSS_VISIBLE) return 0; if (css_is_self(css)) { if (cgroup_on_dfl(cgrp)) { ret = cgroup_addrm_files(css, cgrp, cgroup_base_files, true); if (ret < 0) return ret; if (cgroup_psi_enabled()) { ret = cgroup_addrm_files(css, cgrp, cgroup_psi_files, true); if (ret < 0) { cgroup_addrm_files(css, cgrp, cgroup_base_files, false); return ret; } } } else { ret = cgroup_addrm_files(css, cgrp, cgroup1_base_files, true); if (ret < 0) return ret; } } else { list_for_each_entry(cfts, &css->ss->cfts, node) { ret = cgroup_addrm_files(css, cgrp, cfts, true); if (ret < 0) { failed_cfts = cfts; goto err; } } } css->flags |= CSS_VISIBLE; return 0; err: list_for_each_entry(cfts, &css->ss->cfts, node) { if (cfts == failed_cfts) break; cgroup_addrm_files(css, cgrp, cfts, false); } return ret; } int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask) { struct cgroup *dcgrp = &dst_root->cgrp; struct cgroup_subsys *ss; int ssid, ret; u16 dfl_disable_ss_mask = 0; lockdep_assert_held(&cgroup_mutex); do_each_subsys_mask(ss, ssid, ss_mask) { /* * If @ss has non-root csses attached to it, can't move. * If @ss is an implicit controller, it is exempt from this * rule and can be stolen. */ if (css_next_child(NULL, cgroup_css(&ss->root->cgrp, ss)) && !ss->implicit_on_dfl) return -EBUSY; /* can't move between two non-dummy roots either */ if (ss->root != &cgrp_dfl_root && dst_root != &cgrp_dfl_root) return -EBUSY; /* * Collect ssid's that need to be disabled from default * hierarchy. */ if (ss->root == &cgrp_dfl_root) dfl_disable_ss_mask |= 1 << ssid; } while_each_subsys_mask(); if (dfl_disable_ss_mask) { struct cgroup *scgrp = &cgrp_dfl_root.cgrp; /* * Controllers from default hierarchy that need to be rebound * are all disabled together in one go. */ cgrp_dfl_root.subsys_mask &= ~dfl_disable_ss_mask; WARN_ON(cgroup_apply_control(scgrp)); cgroup_finalize_control(scgrp, 0); } do_each_subsys_mask(ss, ssid, ss_mask) { struct cgroup_root *src_root = ss->root; struct cgroup *scgrp = &src_root->cgrp; struct cgroup_subsys_state *css = cgroup_css(scgrp, ss); struct css_set *cset, *cset_pos; struct css_task_iter *it; WARN_ON(!css || cgroup_css(dcgrp, ss)); if (src_root != &cgrp_dfl_root) { /* disable from the source */ src_root->subsys_mask &= ~(1 << ssid); WARN_ON(cgroup_apply_control(scgrp)); cgroup_finalize_control(scgrp, 0); } /* rebind */ RCU_INIT_POINTER(scgrp->subsys[ssid], NULL); rcu_assign_pointer(dcgrp->subsys[ssid], css); ss->root = dst_root; spin_lock_irq(&css_set_lock); css->cgroup = dcgrp; WARN_ON(!list_empty(&dcgrp->e_csets[ss->id])); list_for_each_entry_safe(cset, cset_pos, &scgrp->e_csets[ss->id], e_cset_node[ss->id]) { list_move_tail(&cset->e_cset_node[ss->id], &dcgrp->e_csets[ss->id]); /* * all css_sets of scgrp together in same order to dcgrp, * patch in-flight iterators to preserve correct iteration. * since the iterator is always advanced right away and * finished when it->cset_pos meets it->cset_head, so only * update it->cset_head is enough here. 
*/ list_for_each_entry(it, &cset->task_iters, iters_node) if (it->cset_head == &scgrp->e_csets[ss->id]) it->cset_head = &dcgrp->e_csets[ss->id]; } spin_unlock_irq(&css_set_lock); /* default hierarchy doesn't enable controllers by default */ dst_root->subsys_mask |= 1 << ssid; if (dst_root == &cgrp_dfl_root) { static_branch_enable(cgroup_subsys_on_dfl_key[ssid]); } else { dcgrp->subtree_control |= 1 << ssid; static_branch_disable(cgroup_subsys_on_dfl_key[ssid]); } ret = cgroup_apply_control(dcgrp); if (ret) pr_warn("partial failure to rebind %s controller (err=%d)\n", ss->name, ret); if (ss->bind) ss->bind(css); } while_each_subsys_mask(); kernfs_activate(dcgrp->kn); return 0; } int cgroup_show_path(struct seq_file *sf, struct kernfs_node *kf_node, struct kernfs_root *kf_root) { int len = 0; char *buf = NULL; struct cgroup_root *kf_cgroot = cgroup_root_from_kf(kf_root); struct cgroup *ns_cgroup; buf = kmalloc(PATH_MAX, GFP_KERNEL); if (!buf) return -ENOMEM; spin_lock_irq(&css_set_lock); ns_cgroup = current_cgns_cgroup_from_root(kf_cgroot); len = kernfs_path_from_node(kf_node, ns_cgroup->kn, buf, PATH_MAX); spin_unlock_irq(&css_set_lock); if (len == -E2BIG) len = -ERANGE; else if (len > 0) { seq_escape(sf, buf, " \t\n\\"); len = 0; } kfree(buf); return len; } enum cgroup2_param { Opt_nsdelegate, Opt_favordynmods, Opt_memory_localevents, Opt_memory_recursiveprot, Opt_memory_hugetlb_accounting, Opt_pids_localevents, nr__cgroup2_params }; static const struct fs_parameter_spec cgroup2_fs_parameters[] = { fsparam_flag("nsdelegate", Opt_nsdelegate), fsparam_flag("favordynmods", Opt_favordynmods), fsparam_flag("memory_localevents", Opt_memory_localevents), fsparam_flag("memory_recursiveprot", Opt_memory_recursiveprot), fsparam_flag("memory_hugetlb_accounting", Opt_memory_hugetlb_accounting), fsparam_flag("pids_localevents", Opt_pids_localevents), {} }; static int cgroup2_parse_param(struct fs_context *fc, struct fs_parameter *param) { struct cgroup_fs_context *ctx = cgroup_fc2context(fc); struct fs_parse_result result; int opt; opt = fs_parse(fc, cgroup2_fs_parameters, param, &result); if (opt < 0) return opt; switch (opt) { case Opt_nsdelegate: ctx->flags |= CGRP_ROOT_NS_DELEGATE; return 0; case Opt_favordynmods: ctx->flags |= CGRP_ROOT_FAVOR_DYNMODS; return 0; case Opt_memory_localevents: ctx->flags |= CGRP_ROOT_MEMORY_LOCAL_EVENTS; return 0; case Opt_memory_recursiveprot: ctx->flags |= CGRP_ROOT_MEMORY_RECURSIVE_PROT; return 0; case Opt_memory_hugetlb_accounting: ctx->flags |= CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING; return 0; case Opt_pids_localevents: ctx->flags |= CGRP_ROOT_PIDS_LOCAL_EVENTS; return 0; } return -EINVAL; } struct cgroup_of_peak *of_peak(struct kernfs_open_file *of) { struct cgroup_file_ctx *ctx = of->priv; return &ctx->peak; } static void apply_cgroup_root_flags(unsigned int root_flags) { if (current->nsproxy->cgroup_ns == &init_cgroup_ns) { if (root_flags & CGRP_ROOT_NS_DELEGATE) cgrp_dfl_root.flags |= CGRP_ROOT_NS_DELEGATE; else cgrp_dfl_root.flags &= ~CGRP_ROOT_NS_DELEGATE; cgroup_favor_dynmods(&cgrp_dfl_root, root_flags & CGRP_ROOT_FAVOR_DYNMODS); if (root_flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS) cgrp_dfl_root.flags |= CGRP_ROOT_MEMORY_LOCAL_EVENTS; else cgrp_dfl_root.flags &= ~CGRP_ROOT_MEMORY_LOCAL_EVENTS; if (root_flags & CGRP_ROOT_MEMORY_RECURSIVE_PROT) cgrp_dfl_root.flags |= CGRP_ROOT_MEMORY_RECURSIVE_PROT; else cgrp_dfl_root.flags &= ~CGRP_ROOT_MEMORY_RECURSIVE_PROT; if (root_flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING) cgrp_dfl_root.flags |= 
CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING; else cgrp_dfl_root.flags &= ~CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING; if (root_flags & CGRP_ROOT_PIDS_LOCAL_EVENTS) cgrp_dfl_root.flags |= CGRP_ROOT_PIDS_LOCAL_EVENTS; else cgrp_dfl_root.flags &= ~CGRP_ROOT_PIDS_LOCAL_EVENTS; } } static int cgroup_show_options(struct seq_file *seq, struct kernfs_root *kf_root) { if (cgrp_dfl_root.flags & CGRP_ROOT_NS_DELEGATE) seq_puts(seq, ",nsdelegate"); if (cgrp_dfl_root.flags & CGRP_ROOT_FAVOR_DYNMODS) seq_puts(seq, ",favordynmods"); if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS) seq_puts(seq, ",memory_localevents"); if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_RECURSIVE_PROT) seq_puts(seq, ",memory_recursiveprot"); if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING) seq_puts(seq, ",memory_hugetlb_accounting"); if (cgrp_dfl_root.flags & CGRP_ROOT_PIDS_LOCAL_EVENTS) seq_puts(seq, ",pids_localevents"); return 0; } static int cgroup_reconfigure(struct fs_context *fc) { struct cgroup_fs_context *ctx = cgroup_fc2context(fc); apply_cgroup_root_flags(ctx->flags); return 0; } static void init_cgroup_housekeeping(struct cgroup *cgrp) { struct cgroup_subsys *ss; int ssid; INIT_LIST_HEAD(&cgrp->self.sibling); INIT_LIST_HEAD(&cgrp->self.children); INIT_LIST_HEAD(&cgrp->cset_links); INIT_LIST_HEAD(&cgrp->pidlists); mutex_init(&cgrp->pidlist_mutex); cgrp->self.cgroup = cgrp; cgrp->self.flags |= CSS_ONLINE; cgrp->dom_cgrp = cgrp; cgrp->max_descendants = INT_MAX; cgrp->max_depth = INT_MAX; prev_cputime_init(&cgrp->prev_cputime); for_each_subsys(ss, ssid) INIT_LIST_HEAD(&cgrp->e_csets[ssid]); #ifdef CONFIG_CGROUP_BPF for (int i = 0; i < ARRAY_SIZE(cgrp->bpf.revisions); i++) cgrp->bpf.revisions[i] = 1; #endif init_waitqueue_head(&cgrp->offline_waitq); INIT_WORK(&cgrp->release_agent_work, cgroup1_release_agent); } void init_cgroup_root(struct cgroup_fs_context *ctx) { struct cgroup_root *root = ctx->root; struct cgroup *cgrp = &root->cgrp; INIT_LIST_HEAD_RCU(&root->root_list); atomic_set(&root->nr_cgrps, 1); cgrp->root = root; init_cgroup_housekeeping(cgrp); /* DYNMODS must be modified through cgroup_favor_dynmods() */ root->flags = ctx->flags & ~CGRP_ROOT_FAVOR_DYNMODS; if (ctx->release_agent) strscpy(root->release_agent_path, ctx->release_agent, PATH_MAX); if (ctx->name) strscpy(root->name, ctx->name, MAX_CGROUP_ROOT_NAMELEN); if (ctx->cpuset_clone_children) set_bit(CGRP_CPUSET_CLONE_CHILDREN, &root->cgrp.flags); } int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask) { LIST_HEAD(tmp_links); struct cgroup *root_cgrp = &root->cgrp; struct kernfs_syscall_ops *kf_sops; struct css_set *cset; int i, ret; lockdep_assert_held(&cgroup_mutex); ret = percpu_ref_init(&root_cgrp->self.refcnt, css_release, 0, GFP_KERNEL); if (ret) goto out; /* * We're accessing css_set_count without locking css_set_lock here, * but that's OK - it can only be increased by someone holding * cgroup_lock, and that's us. Later rebinding may disable * controllers on the default hierarchy and thus create new csets, * which can't be more than the existing ones. Allocate 2x. */ ret = allocate_cgrp_cset_links(2 * css_set_count, &tmp_links); if (ret) goto cancel_ref; ret = cgroup_init_root_id(root); if (ret) goto cancel_ref; kf_sops = root == &cgrp_dfl_root ? 
&cgroup_kf_syscall_ops : &cgroup1_kf_syscall_ops; root->kf_root = kernfs_create_root(kf_sops, KERNFS_ROOT_CREATE_DEACTIVATED | KERNFS_ROOT_SUPPORT_EXPORTOP | KERNFS_ROOT_SUPPORT_USER_XATTR | KERNFS_ROOT_INVARIANT_PARENT, root_cgrp); if (IS_ERR(root->kf_root)) { ret = PTR_ERR(root->kf_root); goto exit_root_id; } root_cgrp->kn = kernfs_root_to_node(root->kf_root); WARN_ON_ONCE(cgroup_ino(root_cgrp) != 1); root_cgrp->ancestors[0] = root_cgrp; ret = css_populate_dir(&root_cgrp->self); if (ret) goto destroy_root; ret = css_rstat_init(&root_cgrp->self); if (ret) goto destroy_root; ret = rebind_subsystems(root, ss_mask); if (ret) goto exit_stats; ret = blocking_notifier_call_chain(&cgroup_lifetime_notifier, CGROUP_LIFETIME_ONLINE, root_cgrp); WARN_ON_ONCE(notifier_to_errno(ret)); trace_cgroup_setup_root(root); /* * There must be no failure case after here, since rebinding takes * care of subsystems' refcounts, which are explicitly dropped in * the failure exit path. */ list_add_rcu(&root->root_list, &cgroup_roots); cgroup_root_count++; /* * Link the root cgroup in this hierarchy into all the css_set * objects. */ spin_lock_irq(&css_set_lock); hash_for_each(css_set_table, i, cset, hlist) { link_css_set(&tmp_links, cset, root_cgrp); if (css_set_populated(cset)) cgroup_update_populated(root_cgrp, true); } spin_unlock_irq(&css_set_lock); BUG_ON(!list_empty(&root_cgrp->self.children)); BUG_ON(atomic_read(&root->nr_cgrps) != 1); ret = 0; goto out; exit_stats: css_rstat_exit(&root_cgrp->self); destroy_root: kernfs_destroy_root(root->kf_root); root->kf_root = NULL; exit_root_id: cgroup_exit_root_id(root); cancel_ref: percpu_ref_exit(&root_cgrp->self.refcnt); out: free_cgrp_cset_links(&tmp_links); return ret; } int cgroup_do_get_tree(struct fs_context *fc) { struct cgroup_fs_context *ctx = cgroup_fc2context(fc); int ret; ctx->kfc.root = ctx->root->kf_root; if (fc->fs_type == &cgroup2_fs_type) ctx->kfc.magic = CGROUP2_SUPER_MAGIC; else ctx->kfc.magic = CGROUP_SUPER_MAGIC; ret = kernfs_get_tree(fc); /* * In non-init cgroup namespace, instead of root cgroup's dentry, * we return the dentry corresponding to the cgroupns->root_cgrp. */ if (!ret && ctx->ns != &init_cgroup_ns) { struct dentry *nsdentry; struct super_block *sb = fc->root->d_sb; struct cgroup *cgrp; cgroup_lock(); spin_lock_irq(&css_set_lock); cgrp = cset_cgroup_from_root(ctx->ns->root_cset, ctx->root); spin_unlock_irq(&css_set_lock); cgroup_unlock(); nsdentry = kernfs_node_dentry(cgrp->kn, sb); dput(fc->root); if (IS_ERR(nsdentry)) { deactivate_locked_super(sb); ret = PTR_ERR(nsdentry); nsdentry = NULL; } fc->root = nsdentry; } if (!ctx->kfc.new_sb_created) cgroup_put(&ctx->root->cgrp); return ret; } /* * Destroy a cgroup filesystem context. 
*/ static void cgroup_fs_context_free(struct fs_context *fc) { struct cgroup_fs_context *ctx = cgroup_fc2context(fc); kfree(ctx->name); kfree(ctx->release_agent); put_cgroup_ns(ctx->ns); kernfs_free_fs_context(fc); kfree(ctx); } static int cgroup_get_tree(struct fs_context *fc) { struct cgroup_fs_context *ctx = cgroup_fc2context(fc); int ret; WRITE_ONCE(cgrp_dfl_visible, true); cgroup_get_live(&cgrp_dfl_root.cgrp); ctx->root = &cgrp_dfl_root; ret = cgroup_do_get_tree(fc); if (!ret) apply_cgroup_root_flags(ctx->flags); return ret; } static const struct fs_context_operations cgroup_fs_context_ops = { .free = cgroup_fs_context_free, .parse_param = cgroup2_parse_param, .get_tree = cgroup_get_tree, .reconfigure = cgroup_reconfigure, }; static const struct fs_context_operations cgroup1_fs_context_ops = { .free = cgroup_fs_context_free, .parse_param = cgroup1_parse_param, .get_tree = cgroup1_get_tree, .reconfigure = cgroup1_reconfigure, }; /* * Initialise the cgroup filesystem creation/reconfiguration context. Notably, * we select the namespace we're going to use. */ static int cgroup_init_fs_context(struct fs_context *fc) { struct cgroup_fs_context *ctx; ctx = kzalloc(sizeof(struct cgroup_fs_context), GFP_KERNEL); if (!ctx) return -ENOMEM; ctx->ns = current->nsproxy->cgroup_ns; get_cgroup_ns(ctx->ns); fc->fs_private = &ctx->kfc; if (fc->fs_type == &cgroup2_fs_type) fc->ops = &cgroup_fs_context_ops; else fc->ops = &cgroup1_fs_context_ops; put_user_ns(fc->user_ns); fc->user_ns = get_user_ns(ctx->ns->user_ns); fc->global = true; if (have_favordynmods) ctx->flags |= CGRP_ROOT_FAVOR_DYNMODS; return 0; } static void cgroup_kill_sb(struct super_block *sb) { struct kernfs_root *kf_root = kernfs_root_from_sb(sb); struct cgroup_root *root = cgroup_root_from_kf(kf_root); /* * If @root doesn't have any children, start killing it. * This prevents new mounts by disabling percpu_ref_tryget_live(). * * And don't kill the default root. */ if (list_empty(&root->cgrp.self.children) && root != &cgrp_dfl_root && !percpu_ref_is_dying(&root->cgrp.self.refcnt)) percpu_ref_kill(&root->cgrp.self.refcnt); cgroup_put(&root->cgrp); kernfs_kill_sb(sb); } struct file_system_type cgroup_fs_type = { .name = "cgroup", .init_fs_context = cgroup_init_fs_context, .parameters = cgroup1_fs_parameters, .kill_sb = cgroup_kill_sb, .fs_flags = FS_USERNS_MOUNT, }; static struct file_system_type cgroup2_fs_type = { .name = "cgroup2", .init_fs_context = cgroup_init_fs_context, .parameters = cgroup2_fs_parameters, .kill_sb = cgroup_kill_sb, .fs_flags = FS_USERNS_MOUNT, }; #ifdef CONFIG_CPUSETS_V1 enum cpuset_param { Opt_cpuset_v2_mode, }; static const struct fs_parameter_spec cpuset_fs_parameters[] = { fsparam_flag ("cpuset_v2_mode", Opt_cpuset_v2_mode), {} }; static int cpuset_parse_param(struct fs_context *fc, struct fs_parameter *param) { struct cgroup_fs_context *ctx = cgroup_fc2context(fc); struct fs_parse_result result; int opt; opt = fs_parse(fc, cpuset_fs_parameters, param, &result); if (opt < 0) return opt; switch (opt) { case Opt_cpuset_v2_mode: ctx->flags |= CGRP_ROOT_CPUSET_V2_MODE; return 0; } return -EINVAL; } static const struct fs_context_operations cpuset_fs_context_ops = { .get_tree = cgroup1_get_tree, .free = cgroup_fs_context_free, .parse_param = cpuset_parse_param, }; /* * This is ugly, but preserves the userspace API for existing cpuset * users. 
If someone tries to mount the "cpuset" filesystem, we * silently switch it to mount "cgroup" instead */ static int cpuset_init_fs_context(struct fs_context *fc) { char *agent = kstrdup("/sbin/cpuset_release_agent", GFP_USER); struct cgroup_fs_context *ctx; int err; err = cgroup_init_fs_context(fc); if (err) { kfree(agent); return err; } fc->ops = &cpuset_fs_context_ops; ctx = cgroup_fc2context(fc); ctx->subsys_mask = 1 << cpuset_cgrp_id; ctx->flags |= CGRP_ROOT_NOPREFIX; ctx->release_agent = agent; get_filesystem(&cgroup_fs_type); put_filesystem(fc->fs_type); fc->fs_type = &cgroup_fs_type; return 0; } static struct file_system_type cpuset_fs_type = { .name = "cpuset", .init_fs_context = cpuset_init_fs_context, .parameters = cpuset_fs_parameters, .fs_flags = FS_USERNS_MOUNT, }; #endif int cgroup_path_ns_locked(struct cgroup *cgrp, char *buf, size_t buflen, struct cgroup_namespace *ns) { struct cgroup *root = cset_cgroup_from_root(ns->root_cset, cgrp->root); return kernfs_path_from_node(cgrp->kn, root->kn, buf, buflen); } int cgroup_path_ns(struct cgroup *cgrp, char *buf, size_t buflen, struct cgroup_namespace *ns) { int ret; cgroup_lock(); spin_lock_irq(&css_set_lock); ret = cgroup_path_ns_locked(cgrp, buf, buflen, ns); spin_unlock_irq(&css_set_lock); cgroup_unlock(); return ret; } EXPORT_SYMBOL_GPL(cgroup_path_ns); /** * cgroup_attach_lock - Lock for ->attach() * @lock_mode: whether acquire and acquire which rwsem * @tsk: thread group to lock * * cgroup migration sometimes needs to stabilize threadgroups against forks and * exits by write-locking cgroup_threadgroup_rwsem. However, some ->attach() * implementations (e.g. cpuset), also need to disable CPU hotplug. * Unfortunately, letting ->attach() operations acquire cpus_read_lock() can * lead to deadlocks. * * Bringing up a CPU may involve creating and destroying tasks which requires * read-locking threadgroup_rwsem, so threadgroup_rwsem nests inside * cpus_read_lock(). If we call an ->attach() which acquires the cpus lock while * write-locking threadgroup_rwsem, the locking order is reversed and we end up * waiting for an on-going CPU hotplug operation which in turn is waiting for * the threadgroup_rwsem to be released to create new tasks. For more details: * * http://lkml.kernel.org/r/20220711174629.uehfmqegcwn2lqzu@wubuntu * * Resolve the situation by always acquiring cpus_read_lock() before optionally * write-locking cgroup_threadgroup_rwsem. This allows ->attach() to assume that * CPU hotplug is disabled on entry. * * When favordynmods is enabled, take per threadgroup rwsem to reduce overhead * on dynamic cgroup modifications. see the comment above * CGRP_ROOT_FAVOR_DYNMODS definition. * * tsk is not NULL only when writing to cgroup.procs. 
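 *
 * A minimal pairing sketch (selection of @lock_mode and the migration
 * itself are assumed to be handled by the caller):
 *
 *	cgroup_attach_lock(lock_mode, tsk);
 *	... migrate the threadgroup ...
 *	cgroup_attach_unlock(lock_mode, tsk);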
*/ void cgroup_attach_lock(enum cgroup_attach_lock_mode lock_mode, struct task_struct *tsk) { cpus_read_lock(); switch (lock_mode) { case CGRP_ATTACH_LOCK_NONE: break; case CGRP_ATTACH_LOCK_GLOBAL: percpu_down_write(&cgroup_threadgroup_rwsem); break; case CGRP_ATTACH_LOCK_PER_THREADGROUP: down_write(&tsk->signal->cgroup_threadgroup_rwsem); break; default: pr_warn("cgroup: Unexpected attach lock mode."); break; } } /** * cgroup_attach_unlock - Undo cgroup_attach_lock() * @lock_mode: whether release and release which rwsem * @tsk: thread group to lock */ void cgroup_attach_unlock(enum cgroup_attach_lock_mode lock_mode, struct task_struct *tsk) { switch (lock_mode) { case CGRP_ATTACH_LOCK_NONE: break; case CGRP_ATTACH_LOCK_GLOBAL: percpu_up_write(&cgroup_threadgroup_rwsem); break; case CGRP_ATTACH_LOCK_PER_THREADGROUP: up_write(&tsk->signal->cgroup_threadgroup_rwsem); break; default: pr_warn("cgroup: Unexpected attach lock mode."); break; } cpus_read_unlock(); } /** * cgroup_migrate_add_task - add a migration target task to a migration context * @task: target task * @mgctx: target migration context * * Add @task, which is a migration target, to @mgctx->tset. This function * becomes noop if @task doesn't need to be migrated. @task's css_set * should have been added as a migration source and @task->cg_list will be * moved from the css_set's tasks list to mg_tasks one. */ static void cgroup_migrate_add_task(struct task_struct *task, struct cgroup_mgctx *mgctx) { struct css_set *cset; lockdep_assert_held(&css_set_lock); /* @task either already exited or can't exit until the end */ if (task->flags & PF_EXITING) return; /* cgroup_threadgroup_rwsem protects racing against forks */ WARN_ON_ONCE(list_empty(&task->cg_list)); cset = task_css_set(task); if (!cset->mg_src_cgrp) return; mgctx->tset.nr_tasks++; list_move_tail(&task->cg_list, &cset->mg_tasks); if (list_empty(&cset->mg_node)) list_add_tail(&cset->mg_node, &mgctx->tset.src_csets); if (list_empty(&cset->mg_dst_cset->mg_node)) list_add_tail(&cset->mg_dst_cset->mg_node, &mgctx->tset.dst_csets); } /** * cgroup_taskset_first - reset taskset and return the first task * @tset: taskset of interest * @dst_cssp: output variable for the destination css * * @tset iteration is initialized and the first task is returned. */ struct task_struct *cgroup_taskset_first(struct cgroup_taskset *tset, struct cgroup_subsys_state **dst_cssp) { tset->cur_cset = list_first_entry(tset->csets, struct css_set, mg_node); tset->cur_task = NULL; return cgroup_taskset_next(tset, dst_cssp); } /** * cgroup_taskset_next - iterate to the next task in taskset * @tset: taskset of interest * @dst_cssp: output variable for the destination css * * Return the next task in @tset. Iteration must have been initialized * with cgroup_taskset_first(). */ struct task_struct *cgroup_taskset_next(struct cgroup_taskset *tset, struct cgroup_subsys_state **dst_cssp) { struct css_set *cset = tset->cur_cset; struct task_struct *task = tset->cur_task; while (CGROUP_HAS_SUBSYS_CONFIG && &cset->mg_node != tset->csets) { if (!task) task = list_first_entry(&cset->mg_tasks, struct task_struct, cg_list); else task = list_next_entry(task, cg_list); if (&task->cg_list != &cset->mg_tasks) { tset->cur_cset = cset; tset->cur_task = task; /* * This function may be called both before and * after cgroup_migrate_execute(). The two cases * can be distinguished by looking at whether @cset * has its ->mg_dst_cset set. 
*/ if (cset->mg_dst_cset) *dst_cssp = cset->mg_dst_cset->subsys[tset->ssid]; else *dst_cssp = cset->subsys[tset->ssid]; return task; } cset = list_next_entry(cset, mg_node); task = NULL; } return NULL; } /** * cgroup_migrate_execute - migrate a taskset * @mgctx: migration context * * Migrate tasks in @mgctx as setup by migration preparation functions. * This function fails iff one of the ->can_attach callbacks fails and * guarantees that either all or none of the tasks in @mgctx are migrated. * @mgctx is consumed regardless of success. */ static int cgroup_migrate_execute(struct cgroup_mgctx *mgctx) { struct cgroup_taskset *tset = &mgctx->tset; struct cgroup_subsys *ss; struct task_struct *task, *tmp_task; struct css_set *cset, *tmp_cset; int ssid, failed_ssid, ret; /* check that we can legitimately attach to the cgroup */ if (tset->nr_tasks) { do_each_subsys_mask(ss, ssid, mgctx->ss_mask) { if (ss->can_attach) { tset->ssid = ssid; ret = ss->can_attach(tset); if (ret) { failed_ssid = ssid; goto out_cancel_attach; } } } while_each_subsys_mask(); } /* * Now that we're guaranteed success, proceed to move all tasks to * the new cgroup. There are no failure cases after here, so this * is the commit point. */ spin_lock_irq(&css_set_lock); list_for_each_entry(cset, &tset->src_csets, mg_node) { list_for_each_entry_safe(task, tmp_task, &cset->mg_tasks, cg_list) { struct css_set *from_cset = task_css_set(task); struct css_set *to_cset = cset->mg_dst_cset; get_css_set(to_cset); to_cset->nr_tasks++; css_set_move_task(task, from_cset, to_cset, true); from_cset->nr_tasks--; /* * If the source or destination cgroup is frozen, * the task might require to change its state. */ cgroup_freezer_migrate_task(task, from_cset->dfl_cgrp, to_cset->dfl_cgrp); put_css_set_locked(from_cset); } } spin_unlock_irq(&css_set_lock); /* * Migration is committed, all target tasks are now on dst_csets. * Nothing is sensitive to fork() after this point. Notify * controllers that migration is complete. */ tset->csets = &tset->dst_csets; if (tset->nr_tasks) { do_each_subsys_mask(ss, ssid, mgctx->ss_mask) { if (ss->attach) { tset->ssid = ssid; ss->attach(tset); } } while_each_subsys_mask(); } ret = 0; goto out_release_tset; out_cancel_attach: if (tset->nr_tasks) { do_each_subsys_mask(ss, ssid, mgctx->ss_mask) { if (ssid == failed_ssid) break; if (ss->cancel_attach) { tset->ssid = ssid; ss->cancel_attach(tset); } } while_each_subsys_mask(); } out_release_tset: spin_lock_irq(&css_set_lock); list_splice_init(&tset->dst_csets, &tset->src_csets); list_for_each_entry_safe(cset, tmp_cset, &tset->src_csets, mg_node) { list_splice_tail_init(&cset->mg_tasks, &cset->tasks); list_del_init(&cset->mg_node); } spin_unlock_irq(&css_set_lock); /* * Re-initialize the cgroup_taskset structure in case it is reused * again in another cgroup_migrate_add_task()/cgroup_migrate_execute() * iteration. */ tset->nr_tasks = 0; tset->csets = &tset->src_csets; return ret; } /** * cgroup_migrate_vet_dst - verify whether a cgroup can be migration destination * @dst_cgrp: destination cgroup to test * * On the default hierarchy, except for the mixable, (possible) thread root * and threaded cgroups, subtree_control must be zero for migration * destination cgroups with tasks so that child cgroups don't compete * against tasks. 
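 *
 * Concretely, the checks below reduce to: the destination's domain must be
 * a valid domain, and a plain domain cgroup (neither a possible thread
 * root nor threaded) that already delegates controllers through
 * "cgroup.subtree_control" is rejected with -EBUSY.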
*/ int cgroup_migrate_vet_dst(struct cgroup *dst_cgrp) { /* v1 doesn't have any restriction */ if (!cgroup_on_dfl(dst_cgrp)) return 0; /* verify @dst_cgrp can host resources */ if (!cgroup_is_valid_domain(dst_cgrp->dom_cgrp)) return -EOPNOTSUPP; /* * If @dst_cgrp is already or can become a thread root or is * threaded, it doesn't matter. */ if (cgroup_can_be_thread_root(dst_cgrp) || cgroup_is_threaded(dst_cgrp)) return 0; /* apply no-internal-process constraint */ if (dst_cgrp->subtree_control) return -EBUSY; return 0; } /** * cgroup_migrate_finish - cleanup after attach * @mgctx: migration context * * Undo cgroup_migrate_add_src() and cgroup_migrate_prepare_dst(). See * those functions for details. */ void cgroup_migrate_finish(struct cgroup_mgctx *mgctx) { struct css_set *cset, *tmp_cset; lockdep_assert_held(&cgroup_mutex); spin_lock_irq(&css_set_lock); list_for_each_entry_safe(cset, tmp_cset, &mgctx->preloaded_src_csets, mg_src_preload_node) { cset->mg_src_cgrp = NULL; cset->mg_dst_cgrp = NULL; cset->mg_dst_cset = NULL; list_del_init(&cset->mg_src_preload_node); put_css_set_locked(cset); } list_for_each_entry_safe(cset, tmp_cset, &mgctx->preloaded_dst_csets, mg_dst_preload_node) { cset->mg_src_cgrp = NULL; cset->mg_dst_cgrp = NULL; cset->mg_dst_cset = NULL; list_del_init(&cset->mg_dst_preload_node); put_css_set_locked(cset); } spin_unlock_irq(&css_set_lock); } /** * cgroup_migrate_add_src - add a migration source css_set * @src_cset: the source css_set to add * @dst_cgrp: the destination cgroup * @mgctx: migration context * * Tasks belonging to @src_cset are about to be migrated to @dst_cgrp. Pin * @src_cset and add it to @mgctx->src_csets, which should later be cleaned * up by cgroup_migrate_finish(). * * This function may be called without holding cgroup_threadgroup_rwsem * even if the target is a process. Threads may be created and destroyed * but as long as cgroup_mutex is not dropped, no new css_set can be put * into play and the preloaded css_sets are guaranteed to cover all * migrations. */ void cgroup_migrate_add_src(struct css_set *src_cset, struct cgroup *dst_cgrp, struct cgroup_mgctx *mgctx) { struct cgroup *src_cgrp; lockdep_assert_held(&cgroup_mutex); lockdep_assert_held(&css_set_lock); /* * If ->dead, @src_set is associated with one or more dead cgroups * and doesn't contain any migratable tasks. Ignore it early so * that the rest of migration path doesn't get confused by it. */ if (src_cset->dead) return; if (!list_empty(&src_cset->mg_src_preload_node)) return; src_cgrp = cset_cgroup_from_root(src_cset, dst_cgrp->root); WARN_ON(src_cset->mg_src_cgrp); WARN_ON(src_cset->mg_dst_cgrp); WARN_ON(!list_empty(&src_cset->mg_tasks)); WARN_ON(!list_empty(&src_cset->mg_node)); src_cset->mg_src_cgrp = src_cgrp; src_cset->mg_dst_cgrp = dst_cgrp; get_css_set(src_cset); list_add_tail(&src_cset->mg_src_preload_node, &mgctx->preloaded_src_csets); } /** * cgroup_migrate_prepare_dst - prepare destination css_sets for migration * @mgctx: migration context * * Tasks are about to be moved and all the source css_sets have been * preloaded to @mgctx->preloaded_src_csets. This function looks up and * pins all destination css_sets, links each to its source, and append them * to @mgctx->preloaded_dst_csets. * * This function must be called after cgroup_migrate_add_src() has been * called on each migration source css_set. After migration is performed * using cgroup_migrate(), cgroup_migrate_finish() must be called on * @mgctx. 
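 *
 * Minimal sketch of the expected call sequence, with cgroup_mutex held
 * throughout (illustrative only; it mirrors the real in-tree user,
 * cgroup_attach_task(), further below):
 *
 *	DEFINE_CGROUP_MGCTX(mgctx);
 *
 *	spin_lock_irq(&css_set_lock);
 *	cgroup_migrate_add_src(task_css_set(task), dst_cgrp, &mgctx);
 *	spin_unlock_irq(&css_set_lock);
 *
 *	ret = cgroup_migrate_prepare_dst(&mgctx);
 *	if (!ret)
 *		ret = cgroup_migrate(task, false, &mgctx);
 *	cgroup_migrate_finish(&mgctx);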
 */
int cgroup_migrate_prepare_dst(struct cgroup_mgctx *mgctx)
{
	struct css_set *src_cset, *tmp_cset;

	lockdep_assert_held(&cgroup_mutex);

	/* look up the dst cset for each src cset and link it to src */
	list_for_each_entry_safe(src_cset, tmp_cset,
				 &mgctx->preloaded_src_csets,
				 mg_src_preload_node) {
		struct css_set *dst_cset;
		struct cgroup_subsys *ss;
		int ssid;

		dst_cset = find_css_set(src_cset, src_cset->mg_dst_cgrp);
		if (!dst_cset)
			return -ENOMEM;

		WARN_ON_ONCE(src_cset->mg_dst_cset || dst_cset->mg_dst_cset);

		/*
		 * If src cset equals dst, it's a noop. Drop the src.
		 * cgroup_migrate() will skip the cset too. Note that we
		 * can't handle src == dst as some nodes are used by both.
		 */
		if (src_cset == dst_cset) {
			src_cset->mg_src_cgrp = NULL;
			src_cset->mg_dst_cgrp = NULL;
			list_del_init(&src_cset->mg_src_preload_node);
			put_css_set(src_cset);
			put_css_set(dst_cset);
			continue;
		}

		src_cset->mg_dst_cset = dst_cset;

		if (list_empty(&dst_cset->mg_dst_preload_node))
			list_add_tail(&dst_cset->mg_dst_preload_node,
				      &mgctx->preloaded_dst_csets);
		else
			put_css_set(dst_cset);

		for_each_subsys(ss, ssid)
			if (src_cset->subsys[ssid] != dst_cset->subsys[ssid])
				mgctx->ss_mask |= 1 << ssid;
	}

	return 0;
}

/**
 * cgroup_migrate - migrate a process or task to a cgroup
 * @leader: the leader of the process or the task to migrate
 * @threadgroup: whether @leader points to the whole process or a single task
 * @mgctx: migration context
 *
 * Migrate a process or task denoted by @leader. If migrating a process,
 * the caller must be holding cgroup_threadgroup_rwsem. The caller is also
 * responsible for invoking cgroup_migrate_add_src() and
 * cgroup_migrate_prepare_dst() on the targets before invoking this
 * function and following up with cgroup_migrate_finish().
 *
 * As long as a controller's ->can_attach() doesn't fail, this function is
 * guaranteed to succeed. This means that, excluding ->can_attach()
 * failure, when migrating multiple targets, the success or failure can be
 * decided for all targets by invoking cgroup_migrate_prepare_dst() before
 * actually starting the migration.
 */
int cgroup_migrate(struct task_struct *leader, bool threadgroup,
		   struct cgroup_mgctx *mgctx)
{
	struct task_struct *task;

	/*
	 * The following thread iteration should be inside an RCU critical
	 * section to prevent tasks from being freed while taking the
	 * snapshot. spin_lock_irq() implies RCU critical section here.
	 */
	spin_lock_irq(&css_set_lock);
	task = leader;
	do {
		cgroup_migrate_add_task(task, mgctx);
		if (!threadgroup)
			break;
	} while_each_thread(leader, task);
	spin_unlock_irq(&css_set_lock);

	return cgroup_migrate_execute(mgctx);
}

/**
 * cgroup_attach_task - attach a task or a whole threadgroup to a cgroup
 * @dst_cgrp: the cgroup to attach to
 * @leader: the task or the leader of the threadgroup to be attached
 * @threadgroup: attach the whole threadgroup?
 *
 * Call holding cgroup_mutex and cgroup_threadgroup_rwsem.
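 *
 * Hedged sketch of a typical caller, condensed from the procs/threads
 * write path in this file (cgroup_mutex is assumed to already be held and
 * error handling is trimmed):
 *
 *	task = cgroup_procs_write_start(buf, true, &lock_mode);
 *	if (!IS_ERR(task)) {
 *		ret = cgroup_attach_task(dst_cgrp, task, true);
 *		cgroup_procs_write_finish(task, lock_mode);
 *	}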
*/ int cgroup_attach_task(struct cgroup *dst_cgrp, struct task_struct *leader, bool threadgroup) { DEFINE_CGROUP_MGCTX(mgctx); struct task_struct *task; int ret = 0; /* look up all src csets */ spin_lock_irq(&css_set_lock); task = leader; do { cgroup_migrate_add_src(task_css_set(task), dst_cgrp, &mgctx); if (!threadgroup) break; } while_each_thread(leader, task); spin_unlock_irq(&css_set_lock); /* prepare dst csets and commit */ ret = cgroup_migrate_prepare_dst(&mgctx); if (!ret) ret = cgroup_migrate(leader, threadgroup, &mgctx); cgroup_migrate_finish(&mgctx); if (!ret) TRACE_CGROUP_PATH(attach_task, dst_cgrp, leader, threadgroup); return ret; } struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup, enum cgroup_attach_lock_mode *lock_mode) { struct task_struct *tsk; pid_t pid; if (kstrtoint(strstrip(buf), 0, &pid) || pid < 0) return ERR_PTR(-EINVAL); retry_find_task: rcu_read_lock(); if (pid) { tsk = find_task_by_vpid(pid); if (!tsk) { tsk = ERR_PTR(-ESRCH); goto out_unlock_rcu; } } else { tsk = current; } if (threadgroup) tsk = tsk->group_leader; /* * kthreads may acquire PF_NO_SETAFFINITY during initialization. * If userland migrates such a kthread to a non-root cgroup, it can * become trapped in a cpuset, or RT kthread may be born in a * cgroup with no rt_runtime allocated. Just say no. */ if (tsk->no_cgroup_migration || (tsk->flags & PF_NO_SETAFFINITY)) { tsk = ERR_PTR(-EINVAL); goto out_unlock_rcu; } get_task_struct(tsk); rcu_read_unlock(); /* * If we migrate a single thread, we don't care about threadgroup * stability. If the thread is `current`, it won't exit(2) under our * hands or change PID through exec(2). We exclude * cgroup_update_dfl_csses and other cgroup_{proc,thread}s_write callers * by cgroup_mutex. Therefore, we can skip the global lock. */ lockdep_assert_held(&cgroup_mutex); if (pid || threadgroup) { if (cgroup_enable_per_threadgroup_rwsem) *lock_mode = CGRP_ATTACH_LOCK_PER_THREADGROUP; else *lock_mode = CGRP_ATTACH_LOCK_GLOBAL; } else { *lock_mode = CGRP_ATTACH_LOCK_NONE; } cgroup_attach_lock(*lock_mode, tsk); if (threadgroup) { if (!thread_group_leader(tsk)) { /* * A race with de_thread from another thread's exec() * may strip us of our leadership. If this happens, * throw this task away and try again. 
			 */
			cgroup_attach_unlock(*lock_mode, tsk);
			put_task_struct(tsk);
			goto retry_find_task;
		}
	}

	return tsk;

out_unlock_rcu:
	rcu_read_unlock();
	return tsk;
}

void cgroup_procs_write_finish(struct task_struct *task,
			       enum cgroup_attach_lock_mode lock_mode)
{
	cgroup_attach_unlock(lock_mode, task);

	/* release reference from cgroup_procs_write_start() */
	put_task_struct(task);
}

static void cgroup_print_ss_mask(struct seq_file *seq, u16 ss_mask)
{
	struct cgroup_subsys *ss;
	bool printed = false;
	int ssid;

	do_each_subsys_mask(ss, ssid, ss_mask) {
		if (printed)
			seq_putc(seq, ' ');
		seq_puts(seq, ss->name);
		printed = true;
	} while_each_subsys_mask();
	if (printed)
		seq_putc(seq, '\n');
}

/* show controllers which are enabled from the parent */
static int cgroup_controllers_show(struct seq_file *seq, void *v)
{
	struct cgroup *cgrp = seq_css(seq)->cgroup;

	cgroup_print_ss_mask(seq, cgroup_control(cgrp));
	return 0;
}

/* show controllers which are enabled for a given cgroup's children */
static int cgroup_subtree_control_show(struct seq_file *seq, void *v)
{
	struct cgroup *cgrp = seq_css(seq)->cgroup;

	cgroup_print_ss_mask(seq, cgrp->subtree_control);
	return 0;
}

/**
 * cgroup_update_dfl_csses - update css assoc of a subtree in default hierarchy
 * @cgrp: root of the subtree to update csses for
 *
 * @cgrp's control masks have changed and its subtree's css associations
 * need to be updated accordingly. This function looks up all css_sets
 * which are attached to the subtree, creates the matching updated css_sets
 * and migrates the tasks to the new ones.
 */
static int cgroup_update_dfl_csses(struct cgroup *cgrp)
{
	DEFINE_CGROUP_MGCTX(mgctx);
	struct cgroup_subsys_state *d_css;
	struct cgroup *dsct;
	struct css_set *src_cset;
	enum cgroup_attach_lock_mode lock_mode;
	bool has_tasks;
	int ret;

	lockdep_assert_held(&cgroup_mutex);

	/* look up all csses currently attached to @cgrp's subtree */
	spin_lock_irq(&css_set_lock);
	cgroup_for_each_live_descendant_pre(dsct, d_css, cgrp) {
		struct cgrp_cset_link *link;

		/*
		 * As cgroup_update_dfl_csses() is only called by
		 * cgroup_apply_control(), the csses associated with @cgrp
		 * itself are not affected by changes to its
		 * subtree_control file and can be skipped here.
		 */
		if (dsct == cgrp)
			continue;

		list_for_each_entry(link, &dsct->cset_links, cset_link)
			cgroup_migrate_add_src(link->cset, dsct, &mgctx);
	}
	spin_unlock_irq(&css_set_lock);

	/*
	 * We need to write-lock threadgroup_rwsem while migrating tasks.
	 * However, if there are no source csets for @cgrp, changing its
	 * controllers isn't gonna produce any task migrations and the
	 * write-locking can be skipped safely.
	 */
	has_tasks = !list_empty(&mgctx.preloaded_src_csets);
	if (has_tasks)
		lock_mode = CGRP_ATTACH_LOCK_GLOBAL;
	else
		lock_mode = CGRP_ATTACH_LOCK_NONE;

	/* the NONE and GLOBAL lock modes don't use a threadgroup leader */
	cgroup_attach_lock(lock_mode, NULL);

	ret = cgroup_migrate_prepare_dst(&mgctx);
	if (ret)
		goto out_finish;

	spin_lock_irq(&css_set_lock);
	list_for_each_entry(src_cset, &mgctx.preloaded_src_csets,
			    mg_src_preload_node) {
		struct task_struct *task, *ntask;

		/* all tasks in src_csets need to be migrated */
		list_for_each_entry_safe(task, ntask, &src_cset->tasks, cg_list)
			cgroup_migrate_add_task(task, &mgctx);
	}
	spin_unlock_irq(&css_set_lock);

	ret = cgroup_migrate_execute(&mgctx);
out_finish:
	cgroup_migrate_finish(&mgctx);
	cgroup_attach_unlock(lock_mode, NULL);
	return ret;
}

/**
 * cgroup_lock_and_drain_offline - lock cgroup_mutex and drain offlined csses
 * @cgrp: root of the target subtree
 *
 * Because css offlining is asynchronous, userland may try to re-enable a
 * controller while the previous css is still around. This function grabs
 * cgroup_mutex and drains the previous css instances of @cgrp's subtree.
 */
void cgroup_lock_and_drain_offline(struct cgroup *cgrp)
	__acquires(&cgroup_mutex)
{
	struct cgroup *dsct;
	struct cgroup_subsys_state *d_css;
	struct cgroup_subsys *ss;
	int ssid;

restart:
	cgroup_lock();

	cgroup_for_each_live_descendant_post(dsct, d_css, cgrp) {
		for_each_subsys(ss, ssid) {
			struct cgroup_subsys_state *css = cgroup_css(dsct, ss);
			DEFINE_WAIT(wait);

			if (!css || !percpu_ref_is_dying(&css->refcnt))
				continue;

			cgroup_get_live(dsct);
			prepare_to_wait(&dsct->offline_waitq, &wait,
					TASK_UNINTERRUPTIBLE);

			cgroup_unlock();
			schedule();
			finish_wait(&dsct->offline_waitq, &wait);

			cgroup_put(dsct);
			goto restart;
		}
	}
}

/**
 * cgroup_save_control - save control masks and dom_cgrp of a subtree
 * @cgrp: root of the target subtree
 *
 * Save ->subtree_control, ->subtree_ss_mask and ->dom_cgrp to the
 * respective old_ prefixed fields for @cgrp's subtree including @cgrp
 * itself.
 */
static void cgroup_save_control(struct cgroup *cgrp)
{
	struct cgroup *dsct;
	struct cgroup_subsys_state *d_css;

	cgroup_for_each_live_descendant_pre(dsct, d_css, cgrp) {
		dsct->old_subtree_control = dsct->subtree_control;
		dsct->old_subtree_ss_mask = dsct->subtree_ss_mask;
		dsct->old_dom_cgrp = dsct->dom_cgrp;
	}
}

/**
 * cgroup_propagate_control - refresh control masks of a subtree
 * @cgrp: root of the target subtree
 *
 * For @cgrp and its subtree, ensure ->subtree_ss_mask matches
 * ->subtree_control and propagate controller availability through the
 * subtree so that descendants don't have unavailable controllers enabled.
 */
static void cgroup_propagate_control(struct cgroup *cgrp)
{
	struct cgroup *dsct;
	struct cgroup_subsys_state *d_css;

	cgroup_for_each_live_descendant_pre(dsct, d_css, cgrp) {
		dsct->subtree_control &= cgroup_control(dsct);
		dsct->subtree_ss_mask =
			cgroup_calc_subtree_ss_mask(dsct->subtree_control,
						    cgroup_ss_mask(dsct));
	}
}

/**
 * cgroup_restore_control - restore control masks and dom_cgrp of a subtree
 * @cgrp: root of the target subtree
 *
 * Restore ->subtree_control, ->subtree_ss_mask and ->dom_cgrp from the
 * respective old_ prefixed fields for @cgrp's subtree including @cgrp
 * itself.
*/ static void cgroup_restore_control(struct cgroup *cgrp) { struct cgroup *dsct; struct cgroup_subsys_state *d_css; cgroup_for_each_live_descendant_post(dsct, d_css, cgrp) { dsct->subtree_control = dsct->old_subtree_control; dsct->subtree_ss_mask = dsct->old_subtree_ss_mask; dsct->dom_cgrp = dsct->old_dom_cgrp; } } static bool css_visible(struct cgroup_subsys_state *css) { struct cgroup_subsys *ss = css->ss; struct cgroup *cgrp = css->cgroup; if (cgroup_control(cgrp) & (1 << ss->id)) return true; if (!(cgroup_ss_mask(cgrp) & (1 << ss->id))) return false; return cgroup_on_dfl(cgrp) && ss->implicit_on_dfl; } /** * cgroup_apply_control_enable - enable or show csses according to control * @cgrp: root of the target subtree * * Walk @cgrp's subtree and create new csses or make the existing ones * visible. A css is created invisible if it's being implicitly enabled * through dependency. An invisible css is made visible when the userland * explicitly enables it. * * Returns 0 on success, -errno on failure. On failure, csses which have * been processed already aren't cleaned up. The caller is responsible for * cleaning up with cgroup_apply_control_disable(). */ static int cgroup_apply_control_enable(struct cgroup *cgrp) { struct cgroup *dsct; struct cgroup_subsys_state *d_css; struct cgroup_subsys *ss; int ssid, ret; cgroup_for_each_live_descendant_pre(dsct, d_css, cgrp) { for_each_subsys(ss, ssid) { struct cgroup_subsys_state *css = cgroup_css(dsct, ss); if (!(cgroup_ss_mask(dsct) & (1 << ss->id))) continue; if (!css) { css = css_create(dsct, ss); if (IS_ERR(css)) return PTR_ERR(css); } WARN_ON_ONCE(percpu_ref_is_dying(&css->refcnt)); if (css_visible(css)) { ret = css_populate_dir(css); if (ret) return ret; } } } return 0; } /** * cgroup_apply_control_disable - kill or hide csses according to control * @cgrp: root of the target subtree * * Walk @cgrp's subtree and kill and hide csses so that they match * cgroup_ss_mask() and cgroup_visible_mask(). * * A css is hidden when the userland requests it to be disabled while other * subsystems are still depending on it. The css must not actively control * resources and be in the vanilla state if it's made visible again later. * Controllers which may be depended upon should provide ->css_reset() for * this purpose. */ static void cgroup_apply_control_disable(struct cgroup *cgrp) { struct cgroup *dsct; struct cgroup_subsys_state *d_css; struct cgroup_subsys *ss; int ssid; cgroup_for_each_live_descendant_post(dsct, d_css, cgrp) { for_each_subsys(ss, ssid) { struct cgroup_subsys_state *css = cgroup_css(dsct, ss); if (!css) continue; WARN_ON_ONCE(percpu_ref_is_dying(&css->refcnt)); if (css->parent && !(cgroup_ss_mask(dsct) & (1 << ss->id))) { kill_css(css); } else if (!css_visible(css)) { css_clear_dir(css); if (ss->css_reset) ss->css_reset(css); } } } } /** * cgroup_apply_control - apply control mask updates to the subtree * @cgrp: root of the target subtree * * subsystems can be enabled and disabled in a subtree using the following * steps. * * 1. Call cgroup_save_control() to stash the current state. * 2. Update ->subtree_control masks in the subtree as desired. * 3. Call cgroup_apply_control() to apply the changes. * 4. Optionally perform other related operations. * 5. Call cgroup_finalize_control() to finish up. * * This function implements step 3 and propagates the mask changes * throughout @cgrp's subtree, updates csses accordingly and perform * process migrations. 
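 *
 * Condensed sketch of the five steps above (illustrative; it follows the
 * pattern used by the subtree_control write handler in this file, with
 * enable/disable standing in for the parsed control masks):
 *
 *	cgroup_save_control(cgrp);			// step 1
 *	cgrp->subtree_control |= enable;		// step 2
 *	cgrp->subtree_control &= ~disable;
 *	ret = cgroup_apply_control(cgrp);		// step 3
 *	// ... other related operations ...		// step 4
 *	cgroup_finalize_control(cgrp, ret);		// step 5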
*/ static int cgroup_apply_control(struct cgroup *cgrp) { int ret; cgroup_propagate_control(cgrp); ret = cgroup_apply_control_enable(cgrp); if (ret) return ret; /* * At this point, cgroup_e_css_by_mask() results reflect the new csses * making the following cgroup_update_dfl_csses() properly update * css associations of all tasks in the subtree. */ return cgroup_update_dfl_csses(cgrp); } /** * cgroup_finalize_control - finalize control mask update * @cgrp: root of the target subtree * @ret: the result of the update * * Finalize control mask update. See cgroup_apply_control() for more info. */ static void cgroup_finalize_control(struct cgroup *cgrp, int ret) { if (ret) { cgroup_restore_control(cgrp); cgroup_propagate_control(cgrp); } cgroup_apply_control_disable(cgrp); } static int cgroup_vet_subtree_control_enable(struct cgroup *cgrp, u16 enable) { u16 domain_enable = enable & ~cgrp_dfl_threaded_ss_mask; /* if nothing is getting enabled, nothing to worry about */ if (!enable) return 0; /* can @cgrp host any resources? */ if (!cgroup_is_valid_domain(cgrp->dom_cgrp)) return -EOPNOTSUPP; /* mixables don't care */ if (cgroup_is_mixable(cgrp)) return 0; if (domain_enable) { /* can't enable domain controllers inside a thread subtree */ if (cgroup_is_thread_root(cgrp) || cgroup_is_threaded(cgrp)) return -EOPNOTSUPP; } else { /* * Threaded controllers can handle internal competitions * and are always allowed inside a (prospective) thread * subtree. */ if (cgroup_can_be_thread_root(cgrp) || cgroup_is_threaded(cgrp)) return 0; } /* * Controllers can't be enabled for a cgroup with tasks to avoid * child cgroups competing against tasks. */ if (cgroup_has_tasks(cgrp)) return -EBUSY; return 0; } /* change the enabled child controllers for a cgroup in the default hierarchy */ static ssize_t cgroup_subtree_co |