1"""Read/Write Videos (and images) using PyAV.
2
3.. note::
4 To use this plugin you need to have `PyAV <https://pyav.org/docs/stable/>`_
5 installed::
6
7 pip install av
8
9This plugin wraps pyAV, a pythonic binding for the FFMPEG library. It is similar
10to our FFMPEG plugin, has improved performance, features a robust interface, and
11aims to supersede the FFMPEG plugin in the future.
12
13
14Methods
15-------
16.. note::
17 Check the respective function for a list of supported kwargs and detailed
18 documentation.
19
20.. autosummary::
21 :toctree:
22
23 PyAVPlugin.read
24 PyAVPlugin.iter
25 PyAVPlugin.write
26 PyAVPlugin.properties
27 PyAVPlugin.metadata
28
29Additional methods available inside the :func:`imopen <imageio.v3.imopen>`
30context:
31
32.. autosummary::
33 :toctree:
34
35 PyAVPlugin.init_video_stream
36 PyAVPlugin.write_frame
37 PyAVPlugin.set_video_filter
38 PyAVPlugin.container_metadata
39 PyAVPlugin.video_stream_metadata
40
41Advanced API
42------------
43
44In addition to the default ImageIO v3 API this plugin exposes custom functions
45that are specific to reading/writing video and its metadata. These are available
46inside the :func:`imopen <imageio.v3.imopen>` context and allow fine-grained
47control over how the video is processed. The functions are documented above and
48below you can find a usage example::
49
50 import imageio.v3 as iio
51
52 with iio.imopen("test.mp4", "w", plugin="pyav") as file:
53 file.init_video_stream("libx264")
54 file.container_metadata["comment"] = "This video was created using ImageIO."
55
56 for _ in range(5):
57 for frame in iio.imiter("imageio:newtonscradle.gif"):
58 file.write_frame(frame)
59
60 meta = iio.immeta("test.mp4", plugin="pyav")
61 assert meta["comment"] == "This video was created using ImageIO."
62
63
64
Pixel Formats (Colorspaces)
---------------------------

By default, this plugin converts the video into 8-bit RGB (called ``rgb24`` in
ffmpeg). This is a useful behavior for many use-cases, but sometimes you may
want to use the video's native colorspace or you may wish to convert the video
into an entirely different colorspace. This is controlled using the ``format``
kwarg. You can use ``format=None`` to leave the image in its native colorspace
or specify any colorspace supported by FFMPEG as long as it is stridable, i.e.,
as long as it can be represented by a single numpy array. Some useful choices
include:

- rgb24 (default; 8-bit RGB)
- rgb48le (16-bit little-endian RGB)
- bgr24 (8-bit BGR; OpenCV's default colorspace)
- gray (8-bit grayscale)
- yuv444p (8-bit channel-first YUV)

Further, FFMPEG maintains a list of available formats, albeit not as part of the
narrative docs. It can be `found here
<https://ffmpeg.org/doxygen/trunk/pixfmt_8h_source.html>`_ (warning: C source
code).
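
For example, to read a single frame in grayscale rather than the default
``rgb24`` (a minimal sketch; ``imageio:cockatoo.mp4`` is one of ImageIO's
bundled sample videos)::

    import imageio.v3 as iio

    # decode the first frame into 8-bit grayscale instead of rgb24
    frame = iio.imread(
        "imageio:cockatoo.mp4", index=0, plugin="pyav", format="gray"
    )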

Filters
-------

On top of providing basic read/write functionality, this plugin allows you to
use the full collection of `video filters available in FFMPEG
<https://ffmpeg.org/ffmpeg-filters.html#Video-Filters>`_. This means that you
can apply extensive preprocessing to your video before retrieving it as a numpy
array or apply extensive post-processing before you encode your data.

Filters come in two forms: sequences or graphs. Filter sequences are, as the
name suggests, sequences of filters that are applied one after the other. They
are specified using the ``filter_sequence`` kwarg. Filter graphs, on the other
hand, come in the form of a directed graph and are specified using the
``filter_graph`` kwarg.

.. note::
    All filters are either sequences or graphs. If all you want is to apply a
    single filter, you can do this by specifying a filter sequence with a single
    entry.

A ``filter_sequence`` is a list of filters, each defined through a 2-element
tuple of the form ``(filter_name, filter_parameters)``. The first element of the
tuple is the name of the filter. The second element contains the filter
parameters, which can be given either as a string or a dict. The string matches
the format that you would use when specifying the filter using the ffmpeg
command-line tool, and the dict has entries of the form ``parameter:value``. For
example::

    import imageio.v3 as iio

    # using a filter_parameters str
    img1 = iio.imread(
        "imageio:cockatoo.mp4",
        plugin="pyav",
        filter_sequence=[
            ("rotate", "45*PI/180")
        ]
    )

    # using a filter_parameters dict
    img2 = iio.imread(
        "imageio:cockatoo.mp4",
        plugin="pyav",
        filter_sequence=[
            ("rotate", {"angle": "45*PI/180", "fillcolor": "AliceBlue"})
        ]
    )

A ``filter_graph``, on the other hand, is specified using a ``(nodes, edges)``
tuple. It is best explained using an example::

    img = iio.imread(
        "imageio:cockatoo.mp4",
        plugin="pyav",
        filter_graph=(
            {
                "split": ("split", ""),
                "scale_overlay": ("scale", "512:-1"),
                "overlay": ("overlay", "x=25:y=25:enable='between(t,1,8)'"),
            },
            [
                ("video_in", "split", 0, 0),
                ("split", "overlay", 0, 0),
                ("split", "scale_overlay", 1, 0),
                ("scale_overlay", "overlay", 0, 1),
                ("overlay", "video_out", 0, 0),
            ]
        )
    )

The above transforms the video to have a picture-in-picture of itself in the
top left corner. As you can see, nodes are specified using a dict which has
names as its keys and filter tuples as values; the same tuples as the ones used
when defining a filter sequence. Edges are a list of 4-tuples of the form
``(node_out, node_in, output_idx, input_idx)`` that specify which two filters
are connected and which of their inputs/outputs should be used.

Further, there are two special nodes in a filter graph: ``video_in`` and
``video_out``, which represent the graph's input and output respectively. These
names cannot be used for other nodes (such nodes would simply be overwritten),
and for a graph to be valid there must be a path from the input to the output
and all nodes in the graph must be connected.

While most graphs are quite simple, they can become very complex, and we
recommend that you read through the `FFMPEG documentation
<https://ffmpeg.org/ffmpeg-filters.html#Filtergraph-description>`_ and their
examples to better understand how to use them.

"""

from fractions import Fraction
from math import ceil
from typing import Any, Dict, List, Optional, Tuple, Union

import av
import av.filter
import numpy as np
from av.codec.context import Flags
from numpy.lib.stride_tricks import as_strided

from ..core import Request
from ..core.request import URI_BYTES, URI_HTTP, InitializationError, IOMode
from ..core.v3_plugin_api import ImageProperties, PluginV3


def _format_to_dtype(format: av.VideoFormat) -> np.dtype:
    """Convert a pyAV video format into a numpy dtype"""

    if len(format.components) == 0:
        # fake format
        raise ValueError(
            f"Can't determine dtype from format `{format.name}`. It has no channels."
        )

    endian = ">" if format.is_big_endian else "<"
    dtype = "f" if "f32" in format.name else "u"
    bits_per_channel = [x.bits for x in format.components]
    n_bytes = str(int(ceil(bits_per_channel[0] / 8)))

    return np.dtype(endian + dtype + n_bytes)


def _get_frame_shape(frame: av.VideoFrame) -> Tuple[int, ...]:
    """Compute the frame's array shape

    Parameters
    ----------
    frame : av.VideoFrame
        A frame for which the resulting shape should be computed.

    Returns
    -------
    shape : Tuple[int, ...]
        A tuple describing the shape of the image data in the frame.

    """

    widths = [component.width for component in frame.format.components]
    heights = [component.height for component in frame.format.components]
    bits = np.array([component.bits for component in frame.format.components])
    line_sizes = [plane.line_size for plane in frame.planes]

    subsampled_width = widths[:-1] != widths[1:]
    subsampled_height = heights[:-1] != heights[1:]
    unaligned_components = np.any(bits % 8 != 0) or (line_sizes[:-1] != line_sizes[1:])
    if subsampled_width or subsampled_height or unaligned_components:
        raise IOError(
            f"{frame.format.name} can't be expressed as a strided array. "
            "Use `format=` to select a format to convert into."
        )

    shape = [frame.height, frame.width]

    # ffmpeg doesn't have a notion of channel-first or channel-last formats.
    # Instead, it stores frames in one or more planes which contain individual
    # components of a pixel depending on the pixel format. For channel-first
    # formats each component lives on a separate plane (n_planes) and for
    # channel-last formats all components are packed on a single plane
    # (n_channels).
    n_planes = max([component.plane for component in frame.format.components]) + 1
    if n_planes > 1:
        shape = [n_planes] + shape

    channels_per_plane = [0] * n_planes
    for component in frame.format.components:
        channels_per_plane[component.plane] += 1
    n_channels = max(channels_per_plane)

    if n_channels > 1:
        shape = shape + [n_channels]

    return tuple(shape)


def _get_frame_type(picture_type: int) -> str:
    """Return a human-readable name for the provided picture type

    Parameters
    ----------
    picture_type : int
        The picture type extracted from Frame.pict_type

    Returns
    -------
    picture_name : str
        A human-readable name of the picture type

    """

    if not isinstance(picture_type, int):
        # old pyAV versions send an enum, not an int
        return picture_type.name

    picture_types = [
        "NONE",
        "I",
        "P",
        "B",
        "S",
        "SI",
        "SP",
        "BI",
    ]

    return picture_types[picture_type]


class PyAVPlugin(PluginV3):
    """Support for pyAV as backend.

    Parameters
    ----------
    request : iio.Request
        A request object that represents the user's intent. It provides a
        standard interface to access the various ImageResources and serves
        them to the plugin as a file object (or file). Check the docs for
        details.
    container : str
        Only used when ``iio_mode="w"``! If not None, overwrite the default
        container format chosen by pyav.
    kwargs : Any
        Additional kwargs are forwarded to PyAV's constructor.

    """

    def __init__(self, request: Request, *, container: str = None, **kwargs) -> None:
        """Initialize a new Plugin Instance.

        See Plugin's docstring for detailed documentation.

        Notes
        -----
        The implementation here stores the request as a local variable that is
        exposed using a @property below. If you inherit from PluginV3, remember
        to call ``super().__init__(request)``.

        """

        super().__init__(request)

        self._container = None
        self._video_stream = None
        self._video_filter = None

        if request.mode.io_mode == IOMode.read:
            self._next_idx = 0
            try:
                if request._uri_type == URI_HTTP:
                    # pyav should read from HTTP by itself. This enables reading
                    # HTTP-based streams like DASH. Note that solving streams
                    # like this is temporary until the new request object gets
                    # implemented.
                    self._container = av.open(request.raw_uri, **kwargs)
                else:
                    self._container = av.open(request.get_file(), **kwargs)
                self._video_stream = self._container.streams.video[0]
                self._decoder = self._container.decode(video=0)
            except av.FFmpegError:
                if isinstance(request.raw_uri, bytes):
                    msg = "PyAV does not support these `<bytes>`"
                else:
                    msg = f"PyAV does not support `{request.raw_uri}`"
                raise InitializationError(msg) from None
        else:
            self.frames_written = 0
            file_handle = self.request.get_file()
            filename = getattr(file_handle, "name", None)
            extension = self.request.extension or self.request.format_hint
            if extension is None:
                raise InitializationError("Can't determine output container to use.")

            # hacky, but beats running our own format selection logic
            # (since av_guess_format is not exposed)
            try:
                setattr(file_handle, "name", filename or "tmp" + extension)
            except AttributeError:
                pass  # read-only, nothing we can do

            try:
                self._container = av.open(
                    file_handle, mode="w", format=container, **kwargs
                )
            except ValueError:
                raise InitializationError(
                    f"PyAV cannot write to `{self.request.raw_uri}`"
                )

    # ---------------------
    # Standard V3 Interface
    # ---------------------

    def read(
        self,
        *,
        index: int = ...,
        format: str = "rgb24",
        filter_sequence: List[Tuple[str, Union[str, dict]]] = None,
        filter_graph: Tuple[dict, List] = None,
        constant_framerate: bool = None,
        thread_count: int = 0,
        thread_type: str = None,
    ) -> np.ndarray:
        """Read frames from the video.

        If ``index`` is an integer, this function reads the index-th frame from
        the file. If ``index`` is ... (Ellipsis), this function reads all frames
        from the video, stacks them along the first dimension, and returns a
        batch of frames.

        Parameters
        ----------
        index : int
            The index of the frame to read, e.g. ``index=5`` reads the 5th
            frame. If ``...``, read all the frames in the video and stack them
            along a new, prepended, batch dimension.
        format : str
            Set the returned colorspace. If not None (default: rgb24), convert
            the data into the given format before returning it. If ``None``,
            return the data in the encoded format if it can be expressed as a
            strided array; otherwise raise an Exception.
        filter_sequence : List[Tuple[str, Union[str, dict]]]
            If not None, apply the given sequence of FFmpeg filters to each
            ndimage. Check the (module-level) plugin docs for details and
            examples.
        filter_graph : Tuple[dict, List]
            If not None, apply the given graph of FFmpeg filters to each
            ndimage. The graph is given as a tuple of a dict and a list. The
            dict contains a (named) set of nodes, and the list contains a set
            of edges between nodes of the previous dict. Check the
            (module-level) plugin docs for details and examples.
        constant_framerate : bool
            If True, assume the video's framerate is constant. This allows for
            faster seeking inside the file. If False, the video is reset before
            each read and searched from the beginning. If None (default), this
            value will be read from the container format.
        thread_count : int
            How many threads to use when decoding a frame. The default is 0,
            which will set the number using ffmpeg's default, which is based on
            the codec, number of available cores, threading model, and other
            considerations.
        thread_type : str
            The threading model to be used. One of

            - `"SLICE"`: threads assemble parts of the current frame
            - `"FRAME"`: threads may assemble future frames
            - None (default): Uses ``"FRAME"`` if ``index=...`` and ffmpeg's
              default otherwise.


        Returns
        -------
        frame : np.ndarray
            A numpy array containing loaded frame data.

        Notes
        -----
        Accessing random frames repeatedly is costly (O(k), where k is the
        average distance between two keyframes). You should do so only sparingly
        if possible. In some cases, it can be faster to bulk-read the video (if
        it fits into memory) and to then access the returned ndarray randomly.

        The current implementation may cause problems for b-frames, i.e.,
        bidirectionally predicted pictures. We lack test videos to write unit
        tests for this case.

        Reading from an index other than ``...``, i.e. reading a single frame,
        currently doesn't support filters that introduce delays.

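        Examples
        --------
        A minimal usage sketch (``imageio:cockatoo.mp4`` is one of ImageIO's
        bundled sample videos)::

            import imageio.v3 as iio

            # read a single frame
            frame = iio.imread("imageio:cockatoo.mp4", index=5, plugin="pyav")

            # bulk-read all frames
            frames = iio.imread("imageio:cockatoo.mp4", plugin="pyav")
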
        """

        if index is ...:
            props = self.properties(format=format)
            uses_filter = (
                self._video_filter is not None
                or filter_graph is not None
                or filter_sequence is not None
            )

            self._container.seek(0)
            if not uses_filter and props.shape[0] != 0:
                frames = np.empty(props.shape, dtype=props.dtype)
                for idx, frame in enumerate(
                    self.iter(
                        format=format,
                        filter_sequence=filter_sequence,
                        filter_graph=filter_graph,
                        thread_count=thread_count,
                        thread_type=thread_type or "FRAME",
                    )
                ):
                    frames[idx] = frame
            else:
                frames = np.stack(
                    [
                        x
                        for x in self.iter(
                            format=format,
                            filter_sequence=filter_sequence,
                            filter_graph=filter_graph,
                            thread_count=thread_count,
                            thread_type=thread_type or "FRAME",
                        )
                    ]
                )

            # reset the stream, because the threading model can't change after
            # the first access
            self._video_stream = self._container.streams.video[0]

            return frames

        if thread_type is not None and not (
            self._video_stream.thread_type == thread_type
            or self._video_stream.thread_type.name == thread_type
        ):
            self._video_stream.thread_type = thread_type

        if (
            thread_count != 0
            and thread_count != self._video_stream.codec_context.thread_count
        ):
            # in FFMPEG thread_count == 0 means use the default count, which we
            # change to mean don't change the thread count.
            self._video_stream.codec_context.thread_count = thread_count

        if constant_framerate is None:
            # "variable_fps" is now a flag (the handle got removed); 0x400 is
            # AVFMT_VARIABLE_FPS. Full list at
            # https://pyav.org/docs/stable/api/container.html#module-av.format
            variable_fps = bool(self._container.format.flags & 0x400)
            constant_framerate = not variable_fps

        # note: cheap for contiguous incremental reads
        self._seek(index, constant_framerate=constant_framerate)
        desired_frame = next(self._decoder)
        self._next_idx += 1

        self.set_video_filter(filter_sequence, filter_graph)
        if self._video_filter is not None:
            desired_frame = self._video_filter.send(desired_frame)

        return self._unpack_frame(desired_frame, format=format)

    def iter(
        self,
        *,
        format: str = "rgb24",
        filter_sequence: List[Tuple[str, Union[str, dict]]] = None,
        filter_graph: Tuple[dict, List] = None,
        thread_count: int = 0,
        thread_type: str = None,
    ) -> np.ndarray:
        """Yield frames from the video.

        Parameters
        ----------
        format : str
            Set the returned colorspace. If not None (default: rgb24), convert
            the data into the given format before returning it. If ``None``,
            return the data in the encoded format if it can be expressed as a
            strided array; otherwise raise an Exception.
        filter_sequence : List[Tuple[str, Union[str, dict]]]
            If not None, apply the given sequence of FFmpeg filters to each
            ndimage. Check the (module-level) plugin docs for details and
            examples.
        filter_graph : Tuple[dict, List]
            If not None, apply the given graph of FFmpeg filters to each
            ndimage. The graph is given as a tuple of a dict and a list. The
            dict contains a (named) set of nodes, and the list contains a set
            of edges between nodes of the previous dict. Check the
            (module-level) plugin docs for details and examples.
        thread_count : int
            How many threads to use when decoding a frame. The default is 0,
            which will set the number using ffmpeg's default, which is based on
            the codec, number of available cores, threading model, and other
            considerations.
        thread_type : str
            The threading model to be used. One of

            - `"SLICE"` (default): threads assemble parts of the current frame
            - `"FRAME"`: threads may assemble future frames (faster for bulk
              reading)


        Yields
        ------
        frame : np.ndarray
            A (decoded) video frame.

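        Examples
        --------
        A minimal sketch iterating over a sample video's frames::

            import imageio.v3 as iio

            for frame in iio.imiter("imageio:cockatoo.mp4", plugin="pyav"):
                print(frame.shape)  # e.g. (720, 1280, 3)
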
        """

        self._video_stream.thread_type = thread_type or "SLICE"
        self._video_stream.codec_context.thread_count = thread_count

        self.set_video_filter(filter_sequence, filter_graph)

        for frame in self._decoder:
            self._next_idx += 1

            if self._video_filter is not None:
                try:
                    frame = self._video_filter.send(frame)
                except StopIteration:
                    break

            if frame is None:
                continue

            yield self._unpack_frame(frame, format=format)

        if self._video_filter is not None:
            for frame in self._video_filter:
                yield self._unpack_frame(frame, format=format)

    def write(
        self,
        ndimage: Union[np.ndarray, List[np.ndarray]],
        *,
        codec: str = None,
        is_batch: bool = True,
        fps: int = 24,
        in_pixel_format: str = "rgb24",
        out_pixel_format: str = None,
        filter_sequence: List[Tuple[str, Union[str, dict]]] = None,
        filter_graph: Tuple[dict, List] = None,
    ) -> Optional[bytes]:
        """Save a ndimage as a video.

        Given a batch of frames (stacked along the first axis) or a list of
        frames, encode them and add the result to the ImageResource.

        Parameters
        ----------
        ndimage : ArrayLike, List[ArrayLike]
            The ndimage to encode and write to the ImageResource.
        codec : str
            The codec to use when encoding frames. Only needed on the first
            write and ignored on subsequent writes.
        is_batch : bool
            If True (default), the ndimage is a batch of images, otherwise it is
            a single image. This parameter has no effect on lists of ndimages.
        fps : int
            The resulting video's frames per second.
        in_pixel_format : str
            The pixel format of the incoming ndarray. Defaults to "rgb24" and can
            be any stridable pix_fmt supported by FFmpeg.
        out_pixel_format : str
            The pixel format to use while encoding frames. If None (default),
            use the codec's default.
        filter_sequence : List[Tuple[str, Union[str, dict]]]
            If not None, apply the given sequence of FFmpeg filters to each
            ndimage. Check the (module-level) plugin docs for details and
            examples.
        filter_graph : Tuple[dict, List]
            If not None, apply the given graph of FFmpeg filters to each
            ndimage. The graph is given as a tuple of a dict and a list. The
            dict contains a (named) set of nodes, and the list contains a set
            of edges between nodes of the previous dict. Check the
            (module-level) plugin docs for details and examples.

        Returns
        -------
        encoded_image : bytes or None
            If the chosen ImageResource is the special target ``"<bytes>"`` then
            write will return a byte string containing the encoded image data.
            Otherwise, it returns None.

        Notes
        -----
        When writing ``<bytes>``, the video is finalized immediately after the
        first write call and calling write multiple times to append frames is
        not possible.

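        Examples
        --------
        A minimal sketch writing random frames to an MP4 (``test.mp4`` is a
        hypothetical output path)::

            import numpy as np
            import imageio.v3 as iio

            frames = np.random.randint(0, 255, (20, 64, 64, 3), dtype=np.uint8)
            iio.imwrite("test.mp4", frames, plugin="pyav", codec="libx264")
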
        """

        if isinstance(ndimage, list):
            # frame shapes must agree for video
            if any(f.shape != ndimage[0].shape for f in ndimage):
                raise ValueError("All frames should have the same shape")
        elif not is_batch:
            ndimage = np.asarray(ndimage)[None, ...]
        else:
            ndimage = np.asarray(ndimage)

        if self._video_stream is None:
            self.init_video_stream(codec, fps=fps, pixel_format=out_pixel_format)

        self.set_video_filter(filter_sequence, filter_graph)

        for img in ndimage:
            self.write_frame(img, pixel_format=in_pixel_format)

        if self.request._uri_type == URI_BYTES:
            # bytes are immutable, so we have to flush immediately
            # and can't support appending
            self._flush_writer()
            self._container.close()

            return self.request.get_file().getvalue()

    def properties(self, index: int = ..., *, format: str = "rgb24") -> ImageProperties:
        """Standardized ndimage metadata.

        Parameters
        ----------
        index : int
            The index of the ndimage for which to return properties. If ``...``
            (Ellipsis, default), return the properties for the resulting batch
            of frames.
        format : str
            If not None (default: rgb24), convert the data into the given format
            before returning it. If None, return the data in the encoded format
            if that can be expressed as a strided array; otherwise raise an
            Exception.

        Returns
        -------
        properties : ImageProperties
            A dataclass filled with standardized image metadata.

        Notes
        -----
        This function is efficient and won't process any pixel data.

        The provided metadata does not include modifications by any filters
        (through ``filter_sequence`` or ``filter_graph``).

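        Examples
        --------
        A minimal sketch inspecting a sample video's shape and dtype::

            import imageio.v3 as iio

            props = iio.improps("imageio:cockatoo.mp4", plugin="pyav")
            print(props.shape, props.dtype)
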
        """

        video_width = self._video_stream.codec_context.width
        video_height = self._video_stream.codec_context.height
        pix_format = format or self._video_stream.codec_context.pix_fmt
        frame_template = av.VideoFrame(video_width, video_height, pix_format)

        shape = _get_frame_shape(frame_template)
        if index is ...:
            n_frames = self._video_stream.frames
            shape = (n_frames,) + shape

        return ImageProperties(
            shape=tuple(shape),
            dtype=_format_to_dtype(frame_template.format),
            n_images=shape[0] if index is ... else None,
            is_batch=index is ...,
        )

    def metadata(
        self,
        index: int = ...,
        exclude_applied: bool = True,
        constant_framerate: bool = None,
    ) -> Dict[str, Any]:
        """Format-specific metadata.

        Returns a dictionary filled with metadata that is either stored in the
        container, the video stream, or the frame's side-data.

        Parameters
        ----------
        index : int
            If ... (Ellipsis, default), return global metadata (the metadata
            stored in the container and video stream). If not ..., return the
            side data stored in the frame at the given index.
        exclude_applied : bool
            Currently, this parameter has no effect. It exists for compliance
            with the ImageIO v3 API.
        constant_framerate : bool
            If True, assume the video's framerate is constant. This allows for
            faster seeking inside the file. If False, the video is reset before
            each read and searched from the beginning. If None (default), this
            value will be read from the container format.

        Returns
        -------
        metadata : dict
            A dictionary filled with format-specific metadata fields and their
            values.

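        Examples
        --------
        A minimal sketch reading a sample video's global metadata::

            import imageio.v3 as iio

            meta = iio.immeta("imageio:cockatoo.mp4", plugin="pyav")
            print(meta["fps"], meta["codec"])
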
        """

        metadata = dict()

        if index is ...:
            # useful flags defined on the container and/or video stream
            metadata.update(
                {
                    "video_format": self._video_stream.codec_context.pix_fmt,
                    "codec": self._video_stream.codec.name,
                    "long_codec": self._video_stream.codec.long_name,
                    "profile": self._video_stream.profile,
                    "fps": float(self._video_stream.guessed_rate),
                }
            )
            if self._video_stream.duration is not None:
                duration = float(
                    self._video_stream.duration * self._video_stream.time_base
                )
                metadata.update({"duration": duration})

            metadata.update(self.container_metadata)
            metadata.update(self.video_stream_metadata)
            return metadata

        if constant_framerate is None:
            # "variable_fps" is now a flag (the handle got removed); 0x400 is
            # AVFMT_VARIABLE_FPS. Full list at
            # https://pyav.org/docs/stable/api/container.html#module-av.format
            variable_fps = bool(self._container.format.flags & 0x400)
            constant_framerate = not variable_fps

        self._seek(index, constant_framerate=constant_framerate)
        desired_frame = next(self._decoder)
        self._next_idx += 1

        # useful flags defined on the frame
        metadata.update(
            {
                "key_frame": bool(desired_frame.key_frame),
                "time": desired_frame.time,
                "interlaced_frame": bool(desired_frame.interlaced_frame),
                "frame_type": _get_frame_type(desired_frame.pict_type),
            }
        )

        # side data
        metadata.update(
            {item.type.name: bytes(item) for item in desired_frame.side_data}
        )

        return metadata

    def close(self) -> None:
        """Close the Video."""

        is_write = self.request.mode.io_mode == IOMode.write
        if is_write and self._video_stream is not None:
            self._flush_writer()

        if self._video_stream is not None:
            self._video_stream = None

        if self._container is not None:
            self._container.close()

        self.request.finish()

    def __enter__(self) -> "PyAVPlugin":
        return super().__enter__()

    # ------------------------------
    # Add-on Interface inside imopen
    # ------------------------------

    def init_video_stream(
        self,
        codec: str,
        *,
        fps: float = 24,
        pixel_format: str = None,
        max_keyframe_interval: int = None,
        force_keyframes: bool = None,
    ) -> None:
        """Initialize a new video stream.

        This function adds a new video stream to the ImageResource using the
        selected encoder (codec), framerate, and colorspace.

        Parameters
        ----------
        codec : str
            The codec to use, e.g. ``"h264"`` or ``"vp9"``.
        fps : float
            The desired framerate of the video stream (frames per second).
        pixel_format : str
            The pixel format to use while encoding frames. If None (default),
            use the codec's default.
        max_keyframe_interval : int
            The maximum distance between two intra frames (I-frames). Also known
            as GOP size. If unspecified, use the codec's default. Note that not
            every I-frame is a keyframe; see the notes for details.
        force_keyframes : bool
            If True, limit inter-frame dependency to frames within the current
            keyframe interval (GOP), i.e., force every I-frame to be a keyframe.
            If unspecified, use the codec's default.

        Notes
        -----
        You can usually leave ``max_keyframe_interval`` and ``force_keyframes``
        at their default values, unless you try to generate seek-optimized video
        or have a similar specialist use-case. In this case, ``force_keyframes``
        controls the ability to seek to *every* I-frame, and
        ``max_keyframe_interval`` controls how close to a random frame you can
        seek. Low values allow more fine-grained seek at the expense of
        file-size (and thus I/O performance).

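        Examples
        --------
        A minimal sketch creating a seek-optimized stream (the parameter values
        are illustrative)::

            import imageio.v3 as iio

            with iio.imopen("test.mp4", "w", plugin="pyav") as file:
                file.init_video_stream(
                    "libx264",
                    fps=30,
                    max_keyframe_interval=4,
                    force_keyframes=True,
                )
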
        """

        fps = Fraction.from_float(fps)
        stream = self._container.add_stream(codec, fps)
        stream.time_base = Fraction(1 / fps).limit_denominator(int(2**16 - 1))
        if pixel_format is not None:
            stream.pix_fmt = pixel_format
        if max_keyframe_interval is not None:
            stream.gop_size = max_keyframe_interval
        if force_keyframes is not None:
            if force_keyframes:
                stream.codec_context.flags |= Flags.closed_gop
            else:
                stream.codec_context.flags &= ~Flags.closed_gop

        self._video_stream = stream

    def write_frame(self, frame: np.ndarray, *, pixel_format: str = "rgb24") -> None:
        """Add a frame to the video stream.

        This function appends a new frame to the video. It assumes that the
        stream has previously been initialized, i.e., ``init_video_stream`` has
        to be called before calling this function for the write to succeed.

        Parameters
        ----------
        frame : np.ndarray
            The image to be appended/written to the video stream.
        pixel_format : str
            The colorspace (pixel format) of the incoming frame.

        Notes
        -----
        Frames may be held in a buffer, e.g., by the filter pipeline used during
        writing or by FFMPEG to batch them prior to encoding. Make sure to
        ``.close()`` the plugin or to use a context manager to ensure that all
        frames are written to the ImageResource.

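        Examples
        --------
        A minimal sketch inside the :func:`imopen <imageio.v3.imopen>` context
        (any ``(height, width, 3)`` uint8 array works as an rgb24 frame)::

            import numpy as np
            import imageio.v3 as iio

            with iio.imopen("test.mp4", "w", plugin="pyav") as file:
                file.init_video_stream("libx264")
                for _ in range(10):
                    file.write_frame(np.zeros((64, 64, 3), dtype=np.uint8))
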
        """

        # manual packing of ndarray into frame
        # (this should live in pyAV, but it doesn't support all the formats we
        # want and PRs there are slow)
        pixel_format = av.VideoFormat(pixel_format)
        img_dtype = _format_to_dtype(pixel_format)
        width = frame.shape[2 if pixel_format.is_planar else 1]
        height = frame.shape[1 if pixel_format.is_planar else 0]
        av_frame = av.VideoFrame(width, height, pixel_format.name)
        if pixel_format.is_planar:
            for idx, plane in enumerate(av_frame.planes):
                plane_array = np.frombuffer(plane, dtype=img_dtype)
                plane_array = as_strided(
                    plane_array,
                    shape=(plane.height, plane.width),
                    strides=(plane.line_size, img_dtype.itemsize),
                )
                plane_array[...] = frame[idx]
        else:
            if pixel_format.name.startswith("bayer_"):
                # ffmpeg doesn't describe bayer formats correctly
                # see https://github.com/imageio/imageio/issues/761#issuecomment-1059318851
                # and following for details.
                n_channels = 1
            else:
                n_channels = len(pixel_format.components)

            plane = av_frame.planes[0]
            plane_shape = (plane.height, plane.width)
            plane_strides = (plane.line_size, n_channels * img_dtype.itemsize)
            if n_channels > 1:
                plane_shape += (n_channels,)
                plane_strides += (img_dtype.itemsize,)

            plane_array = as_strided(
                np.frombuffer(plane, dtype=img_dtype),
                shape=plane_shape,
                strides=plane_strides,
            )
            plane_array[...] = frame

        stream = self._video_stream
        av_frame.time_base = stream.codec_context.time_base
        av_frame.pts = self.frames_written
        self.frames_written += 1

        if self._video_filter is not None:
            av_frame = self._video_filter.send(av_frame)
            if av_frame is None:
                return

        if stream.frames == 0:
            stream.width = av_frame.width
            stream.height = av_frame.height

        for packet in stream.encode(av_frame):
            self._container.mux(packet)

    def set_video_filter(
        self,
        filter_sequence: List[Tuple[str, Union[str, dict]]] = None,
        filter_graph: Tuple[dict, List] = None,
    ) -> None:
        """Set the filter(s) to use.

        This function creates a new FFMPEG filter graph to use when reading or
        writing video. In the case of reading, frames are passed through the
        filter graph before being returned and, in the case of writing, frames
        are passed through the filter before being written to the video.

        Parameters
        ----------
        filter_sequence : List[Tuple[str, Union[str, dict]]]
            If not None, apply the given sequence of FFmpeg filters to each
            ndimage. Check the (module-level) plugin docs for details and
            examples.
        filter_graph : Tuple[dict, List]
            If not None, apply the given graph of FFmpeg filters to each
            ndimage. The graph is given as a tuple of a dict and a list. The
            dict contains a (named) set of nodes, and the list contains a set
            of edges between nodes of the previous dict. Check the
            (module-level) plugin docs for details and examples.

        Notes
        -----
        Changing a filter graph with lag during reading or writing will
        currently cause frames in the filter queue to be lost.

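        Examples
        --------
        A minimal sketch inside the :func:`imopen <imageio.v3.imopen>` context,
        mirroring what ``write`` does internally when given a
        ``filter_sequence`` (the output path and frame data are illustrative)::

            import numpy as np
            import imageio.v3 as iio

            with iio.imopen("test.mp4", "w", plugin="pyav") as file:
                file.init_video_stream("libx264")
                # flip each frame horizontally before it is encoded
                file.set_video_filter(filter_sequence=[("hflip", "")])
                for _ in range(10):
                    file.write_frame(np.zeros((64, 64, 3), dtype=np.uint8))
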
        """

        if filter_sequence is None and filter_graph is None:
            self._video_filter = None
            return

        if filter_sequence is None:
            filter_sequence = list()

        node_descriptors: Dict[str, Tuple[str, Union[str, Dict]]]
        edges: List[Tuple[str, str, int, int]]
        if filter_graph is None:
            node_descriptors, edges = dict(), [("video_in", "video_out", 0, 0)]
        else:
            node_descriptors, edges = filter_graph

        graph = av.filter.Graph()

        previous_node = graph.add_buffer(template=self._video_stream)
        for filter_name, argument in filter_sequence:
            if isinstance(argument, str):
                current_node = graph.add(filter_name, argument)
            else:
                current_node = graph.add(filter_name, **argument)
            previous_node.link_to(current_node)
            previous_node = current_node

        nodes = dict()
        nodes["video_in"] = previous_node
        nodes["video_out"] = graph.add("buffersink")
        for name, (filter_name, arguments) in node_descriptors.items():
            if isinstance(arguments, str):
                nodes[name] = graph.add(filter_name, arguments)
            else:
                nodes[name] = graph.add(filter_name, **arguments)

        for from_node, to_node, out_idx, in_idx in edges:
            nodes[from_node].link_to(nodes[to_node], out_idx, in_idx)

        graph.configure()

        def video_filter():
            # this starts a co-routine
            # send frames using graph.send()
            frame = yield None

            # send and receive frames in "parallel"
            while frame is not None:
                graph.push(frame)
                try:
                    frame = yield graph.pull()
                except av.error.BlockingIOError:
                    # filter has lag and needs more frames
                    frame = yield None
                except av.error.EOFError:
                    break

            try:
                # send EOF in av>=9.0
                graph.push(None)
            except ValueError:  # pragma: no cover
                # handle av<9.0
                pass

            # all frames have been sent, empty the filter
            while True:
                try:
                    yield graph.pull()
                except av.error.EOFError:
                    break  # EOF
                except av.error.BlockingIOError:  # pragma: no cover
                    # handle av<9.0
                    break

        self._video_filter = video_filter()
        self._video_filter.send(None)

    @property
    def container_metadata(self):
        """Container-specific metadata.

        A dictionary containing metadata stored at the container level.

        """
        return self._container.metadata

    @property
    def video_stream_metadata(self):
        """Stream-specific metadata.

        A dictionary containing metadata stored at the stream level.

        """
        return self._video_stream.metadata

    # -------------------------------
    # Internals and private functions
    # -------------------------------

    def _unpack_frame(self, frame: av.VideoFrame, *, format: str = None) -> np.ndarray:
        """Convert an av.VideoFrame into a ndarray

        Parameters
        ----------
        frame : av.VideoFrame
            The frame to unpack.
        format : str
            If not None, convert the frame to the given format before unpacking.

        """

        if format is not None:
            frame = frame.reformat(format=format)

        dtype = _format_to_dtype(frame.format)
        shape = _get_frame_shape(frame)

        planes = list()
        for idx in range(len(frame.planes)):
            n_channels = sum(
                [
                    x.bits // (dtype.itemsize * 8)
                    for x in frame.format.components
                    if x.plane == idx
                ]
            )
            av_plane = frame.planes[idx]
            plane_shape = (av_plane.height, av_plane.width)
            plane_strides = (av_plane.line_size, n_channels * dtype.itemsize)
            if n_channels > 1:
                plane_shape += (n_channels,)
                plane_strides += (dtype.itemsize,)

            np_plane = as_strided(
                np.frombuffer(av_plane, dtype=dtype),
                shape=plane_shape,
                strides=plane_strides,
            )
            planes.append(np_plane)

        if len(planes) > 1:
            # Note: the planes *should* exist inside a contiguous memory block
            # somewhere inside av.VideoFrame; however, pyAV does not appear to
            # expose this, so we are forced to copy the planes individually
            # instead of wrapping them :(
            out = np.concatenate(planes).reshape(shape)
        else:
            out = planes[0]

        return out

    def _seek(self, index, *, constant_framerate: bool = True) -> None:
        """Seek to the frame at the given index."""

        if index == self._next_idx:
            return  # fast path :)

        # we must decode at least once before we seek; otherwise the
        # returned frames become corrupt.
        if self._next_idx == 0:
            next(self._decoder)
            self._next_idx += 1

            if index == self._next_idx:
                return  # fast path :)

        # remove this branch until I find a way to efficiently find the next
        # keyframe. keeping this as a reminder
        # if self._next_idx < index and index < self._next_keyframe_idx:
        #     frames_to_yield = index - self._next_idx
        if not constant_framerate and index > self._next_idx:
            frames_to_yield = index - self._next_idx
        elif not constant_framerate:
            # seek backwards and can't link idx and pts
            self._container.seek(0)
            self._decoder = self._container.decode(video=0)
            self._next_idx = 0

            frames_to_yield = index
        else:
            # we know that the time between consecutive frames is constant,
            # hence we can link index and pts

            # how many pts lie between two frames
            sec_delta = 1 / self._video_stream.guessed_rate
            pts_delta = sec_delta / self._video_stream.time_base

            index_pts = int(index * pts_delta)

            # this only seeks to the closest (preceding) keyframe
            self._container.seek(index_pts, stream=self._video_stream)
            self._decoder = self._container.decode(video=0)

            # this may be made faster if we could get the keyframe's time
            # without decoding it
            keyframe = next(self._decoder)
            keyframe_time = keyframe.pts * keyframe.time_base
            keyframe_pts = int(keyframe_time / self._video_stream.time_base)
            keyframe_index = keyframe_pts // pts_delta

            self._container.seek(index_pts, stream=self._video_stream)
            self._next_idx = keyframe_index

            frames_to_yield = index - keyframe_index

        for _ in range(frames_to_yield):
            next(self._decoder)
            self._next_idx += 1

    def _flush_writer(self):
        """Flush the filter and encoder

        This will reset the filter to `None` and send EOF to the encoder,
        i.e., after calling, no more frames may be written.

        """

        stream = self._video_stream

        if self._video_filter is not None:
            # flush the filter
            for av_frame in self._video_filter:
                if stream.frames == 0:
                    stream.width = av_frame.width
                    stream.height = av_frame.height
                for packet in stream.encode(av_frame):
                    self._container.mux(packet)
            self._video_filter = None

        # flush the encoder
        for packet in stream.encode():
            self._container.mux(packet)
        self._video_stream = None