1"""Read/Write Videos (and images) using PyAV.
2
3.. note::
4 To use this plugin you need to have `PyAV <https://pyav.org/docs/stable/>`_
5 installed::
6
7 pip install av
8
9This plugin wraps pyAV, a pythonic binding for the FFMPEG library. It is similar
10to our FFMPEG plugin, has improved performance, features a robust interface, and
11aims to supersede the FFMPEG plugin in the future.


Methods
-------
.. note::
    Check the respective function for a list of supported kwargs and detailed
    documentation.

.. autosummary::
    :toctree:

    PyAVPlugin.read
    PyAVPlugin.iter
    PyAVPlugin.write
    PyAVPlugin.properties
    PyAVPlugin.metadata

Additional methods available inside the :func:`imopen <imageio.v3.imopen>`
context:

.. autosummary::
    :toctree:

    PyAVPlugin.init_video_stream
    PyAVPlugin.write_frame
    PyAVPlugin.set_video_filter
    PyAVPlugin.container_metadata
    PyAVPlugin.video_stream_metadata

Advanced API
------------

In addition to the default ImageIO v3 API, this plugin exposes custom functions
that are specific to reading/writing video and its metadata. These are available
inside the :func:`imopen <imageio.v3.imopen>` context and allow fine-grained
control over how the video is processed. The functions are documented above,
and a usage example is shown below::

    import imageio.v3 as iio

    with iio.imopen("test.mp4", "w", plugin="pyav") as file:
        file.init_video_stream("libx264")
        file.container_metadata["comment"] = "This video was created using ImageIO."

        for _ in range(5):
            for frame in iio.imiter("imageio:newtonscradle.gif"):
                file.write_frame(frame)

    meta = iio.immeta("test.mp4", plugin="pyav")
    assert meta["comment"] == "This video was created using ImageIO."


Pixel Formats (Colorspaces)
---------------------------

By default, this plugin converts the video into 8-bit RGB (called ``rgb24`` in
ffmpeg). This is a useful behavior for many use-cases, but sometimes you may
want to use the video's native colorspace or you may wish to convert the video
into an entirely different colorspace. This is controlled using the ``format``
kwarg. You can use ``format=None`` to leave the image in its native colorspace
or specify any colorspace supported by FFMPEG as long as it is stridable, i.e.,
as long as it can be represented by a single numpy array. Some useful choices
include:

- rgb24 (default; 8-bit RGB)
- rgb48le (16-bit little-endian RGB)
- bgr24 (8-bit BGR; OpenCV's default colorspace)
- gray (8-bit grayscale)
- yuv444p (8-bit channel-first YUV)

Further, FFMPEG maintains a list of available formats, albeit not as part of the
narrative docs. It can be `found here
<https://ffmpeg.org/doxygen/trunk/pixfmt_8h_source.html>`_ (warning: C source
code).
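
For example, the snippet below reads a video in grayscale and in its native
colorspace (a minimal sketch using one of ImageIO's standard images)::

    import imageio.v3 as iio

    # decode into 8-bit grayscale instead of the rgb24 default
    gray_frames = iio.imread("imageio:cockatoo.mp4", plugin="pyav", format="gray")

    # keep the native (encoded) colorspace, assuming it is stridable
    native_frames = iio.imread("imageio:cockatoo.mp4", plugin="pyav", format=None)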

Filters
-------

On top of providing basic read/write functionality, this plugin allows you to
use the full collection of `video filters available in FFMPEG
<https://ffmpeg.org/ffmpeg-filters.html#Video-Filters>`_. This means that you
can apply extensive preprocessing to your video before retrieving it as a numpy
array, or apply extensive post-processing before you encode your data.

Filters come in two forms: sequences or graphs. Filter sequences are, as the
name suggests, sequences of filters that are applied one after the other. They
are specified using the ``filter_sequence`` kwarg. Filter graphs, on the other
hand, come in the form of a directed graph and are specified using the
``filter_graph`` kwarg.

.. note::
    All filters are either sequences or graphs. If all you want is to apply a
    single filter, you can do this by specifying a filter sequence with a single
    entry.

A ``filter_sequence`` is a list of filters, each defined through a 2-element
tuple of the form ``(filter_name, filter_parameters)``. The first element of the
tuple is the name of the filter. The second element is the filter's parameters,
which can be given either as a string or a dict. The string matches the same
format that you would use when specifying the filter using the ffmpeg
command-line tool, and the dict has entries of the form ``parameter:value``. For
example::

    import imageio.v3 as iio

    # using a filter_parameters str
    img1 = iio.imread(
        "imageio:cockatoo.mp4",
        plugin="pyav",
        filter_sequence=[
            ("rotate", "45*PI/180")
        ]
    )

    # using a filter_parameters dict
    img2 = iio.imread(
        "imageio:cockatoo.mp4",
        plugin="pyav",
        filter_sequence=[
            ("rotate", {"angle": "45*PI/180", "fillcolor": "AliceBlue"})
        ]
    )

A ``filter_graph``, on the other hand, is specified using a ``(nodes, edges)``
tuple. It is best explained using an example::

    img = iio.imread(
        "imageio:cockatoo.mp4",
        plugin="pyav",
        filter_graph=(
            {
                "split": ("split", ""),
                "scale_overlay": ("scale", "512:-1"),
                "overlay": ("overlay", "x=25:y=25:enable='between(t,1,8)'"),
            },
            [
                ("video_in", "split", 0, 0),
                ("split", "overlay", 0, 0),
                ("split", "scale_overlay", 1, 0),
                ("scale_overlay", "overlay", 0, 1),
                ("overlay", "video_out", 0, 0),
            ]
        )
    )

The above transforms the video to have a picture-in-picture of itself in the
top left corner. As you can see, nodes are specified using a dict that has names
as its keys and filter tuples as values; these are the same tuples as the ones
used when defining a filter sequence. Edges are a list of 4-tuples of the form
``(node_out, node_in, output_idx, input_idx)`` and specify which two filters are
connected and which inputs/outputs should be used for this.

Further, there are two special nodes in a filter graph: ``video_in`` and
``video_out``, which represent the graph's input and output respectively. These
names cannot be used for other nodes (such nodes would simply be overwritten),
and for a graph to be valid there must be a path from the input to the output
and all nodes in the graph must be connected.

While most graphs are quite simple, they can become very complex, and we
recommend that you read through the `FFMPEG documentation
<https://ffmpeg.org/ffmpeg-filters.html#Filtergraph-description>`_ and its
examples to better understand how to use them.
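
Filters can also be used while writing. As a minimal sketch (assuming your
FFMPEG build ships with libx264), the following scales frames to a width of 256
pixels while encoding::

    import imageio.v3 as iio

    frames = iio.imread("imageio:cockatoo.mp4", plugin="pyav")
    iio.imwrite(
        "scaled.mp4",
        frames,
        plugin="pyav",
        codec="libx264",
        filter_sequence=[("scale", "256:-1")],
    )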

"""

from fractions import Fraction
from math import ceil
from typing import Any, Dict, List, Optional, Tuple, Union, Generator

import av
import av.filter
import numpy as np
from numpy.lib.stride_tricks import as_strided

from ..core import Request
from ..core.request import URI_BYTES, URI_HTTP, InitializationError, IOMode
from ..core.v3_plugin_api import ImageProperties, PluginV3


def _format_to_dtype(format: av.VideoFormat) -> np.dtype:
    """Convert a pyAV video format into a numpy dtype"""

    if len(format.components) == 0:
        # fake format
        raise ValueError(
            f"Can't determine dtype from format `{format.name}`. It has no channels."
        )

    endian = ">" if format.is_big_endian else "<"
    dtype = "f" if "f32" in format.name else "u"
    bits_per_channel = [x.bits for x in format.components]
    n_bytes = str(int(ceil(bits_per_channel[0] / 8)))

    return np.dtype(endian + dtype + n_bytes)


def _get_frame_shape(frame: av.VideoFrame) -> Tuple[int, ...]:
    """Compute the frame's array shape

    Parameters
    ----------
    frame : av.VideoFrame
        A frame for which the resulting shape should be computed.

    Returns
    -------
    shape : Tuple[int, ...]
        A tuple describing the shape of the image data in the frame.

    """

    widths = [component.width for component in frame.format.components]
    heights = [component.height for component in frame.format.components]
    bits = np.array([component.bits for component in frame.format.components])
    line_sizes = [plane.line_size for plane in frame.planes]

    subsampled_width = widths[:-1] != widths[1:]
    subsampled_height = heights[:-1] != heights[1:]
    unaligned_components = np.any(bits % 8 != 0) or (line_sizes[:-1] != line_sizes[1:])
    if subsampled_width or subsampled_height or unaligned_components:
        raise IOError(
            f"{frame.format.name} can't be expressed as a strided array. "
            "Use `format=` to select a format to convert into."
        )

    shape = [frame.height, frame.width]

    # ffmpeg doesn't have a notion of channel-first or channel-last formats.
    # Instead, it stores frames in one or more planes which contain individual
    # components of a pixel, depending on the pixel format. For channel-first
    # formats each component lives on a separate plane (n_planes) and for
    # channel-last formats all components are packed on a single plane
    # (n_channels).
    n_planes = max([component.plane for component in frame.format.components]) + 1
    if n_planes > 1:
        shape = [n_planes] + shape

    channels_per_plane = [0] * n_planes
    for component in frame.format.components:
        channels_per_plane[component.plane] += 1
    n_channels = max(channels_per_plane)

    if n_channels > 1:
        shape = shape + [n_channels]

    return tuple(shape)


class PyAVPlugin(PluginV3):
    """Support for pyAV as backend.

    Parameters
    ----------
    request : iio.Request
        A request object that represents the user's intent. It provides a
        standard interface to access the various ImageResources and serves them
        to the plugin as a file object (or file). Check the docs for details.
    container : str
        Only used in write mode (``iio_mode="w"``). If not None, overwrite the
        default container format chosen by pyav.
    kwargs : Any
        Additional kwargs are forwarded to PyAV's constructor.

    """

    def __init__(self, request: Request, *, container: str = None, **kwargs) -> None:
        """Initialize a new Plugin Instance.

        See Plugin's docstring for detailed documentation.

        Notes
        -----
        The implementation here stores the request as a local variable that is
        exposed using a @property below. If you inherit from PluginV3, remember
        to call ``super().__init__(request)``.

        """

        super().__init__(request)

        self._container = None
        self._video_stream = None
        self._video_filter = None

        if request.mode.io_mode == IOMode.read:
            self._next_idx = 0
            try:
                if request._uri_type == URI_HTTP:
                    # let pyav read from HTTP by itself. This enables reading
                    # HTTP-based streams like DASH. Note that handling streams
                    # like this is temporary until the new request object gets
                    # implemented.
                    self._container = av.open(request.raw_uri, **kwargs)
                else:
                    self._container = av.open(request.get_file(), **kwargs)
                self._video_stream = self._container.streams.video[0]
                self._decoder = self._container.decode(video=0)
            except av.AVError:
                if isinstance(request.raw_uri, bytes):
                    msg = "PyAV does not support these `<bytes>`"
                else:
                    msg = f"PyAV does not support `{request.raw_uri}`"
                raise InitializationError(msg) from None
        else:
            self.frames_written = 0
            file_handle = self.request.get_file()
            filename = getattr(file_handle, "name", None)
            extension = self.request.extension or self.request.format_hint
            if extension is None:
                raise InitializationError("Can't determine output container to use.")

            # hacky, but beats running our own format selection logic
            # (since av_guess_format is not exposed)
            try:
                setattr(file_handle, "name", filename or "tmp" + extension)
            except AttributeError:
                pass  # read-only, nothing we can do

            try:
                self._container = av.open(
                    file_handle, mode="w", format=container, **kwargs
                )
            except ValueError:
                raise InitializationError(
                    f"PyAV cannot write to `{self.request.raw_uri}`"
                )

    # ---------------------
    # Standard V3 Interface
    # ---------------------

    def read(
        self,
        *,
        index: int = ...,
        format: str = "rgb24",
        filter_sequence: List[Tuple[str, Union[str, dict]]] = None,
        filter_graph: Tuple[dict, List] = None,
        constant_framerate: bool = None,
        thread_count: int = 0,
        thread_type: str = None,
    ) -> np.ndarray:
        """Read frames from the video.

        If ``index`` is an integer, this function reads the index-th frame from
        the file. If ``index`` is ... (Ellipsis), this function reads all frames
        from the video, stacks them along the first dimension, and returns a
        batch of frames.

        Parameters
        ----------
        index : int
            The index of the frame to read, e.g. ``index=5`` reads the 5th
            frame. If ``...``, read all the frames in the video and stack them
            along a new, prepended, batch dimension.
        format : str
            Set the returned colorspace. If not None (default: rgb24), convert
            the data into the given format before returning it. If ``None``,
            return the data in the encoded format if it can be expressed as a
            strided array; otherwise raise an Exception.
        filter_sequence : List[Tuple[str, Union[str, dict]]]
            If not None, apply the given sequence of FFmpeg filters to each
            ndimage. Check the (module-level) plugin docs for details and
            examples.
        filter_graph : Tuple[dict, List]
            If not None, apply the given graph of FFmpeg filters to each
            ndimage. The graph is given as a tuple of two dicts. The first dict
            contains a (named) set of nodes, and the second dict contains a set
            of edges between nodes of the previous dict. Check the
            (module-level) plugin docs for details and examples.
        constant_framerate : bool
            If True, assume the video's framerate is constant. This allows for
            faster seeking inside the file. If False, the video is reset before
            each read and searched from the beginning. If None (default), this
            value will be read from the container format.
        thread_count : int
            How many threads to use when decoding a frame. The default is 0,
            which will set the number using ffmpeg's default, which is based on
            the codec, number of available cores, threading model, and other
            considerations.
        thread_type : str
            The threading model to be used. One of

            - `"SLICE"`: threads assemble parts of the current frame
            - `"FRAME"`: threads may assemble future frames
            - None (default): Uses ``"FRAME"`` if ``index=...`` and ffmpeg's
              default otherwise.


        Returns
        -------
        frame : np.ndarray
            A numpy array containing loaded frame data.

        Notes
        -----
        Accessing random frames repeatedly is costly (O(k), where k is the
        average distance between two keyframes). You should do so only sparingly
        if possible. In some cases, it can be faster to bulk-read the video (if
        it fits into memory) and to then access the returned ndarray randomly.

        The current implementation may cause problems for b-frames, i.e.,
        bidirectionally predicted pictures. We lack test videos to write unit
        tests for this case.

        Reading from an index other than ``...``, i.e. reading a single frame,
        currently doesn't support filters that introduce delays.

        """

        if index is ...:
            props = self.properties(format=format)
            uses_filter = (
                self._video_filter is not None
                or filter_graph is not None
                or filter_sequence is not None
            )

            self._container.seek(0)
            if not uses_filter and props.shape[0] != 0:
                frames = np.empty(props.shape, dtype=props.dtype)
                for idx, frame in enumerate(
                    self.iter(
                        format=format,
                        filter_sequence=filter_sequence,
                        filter_graph=filter_graph,
                        thread_count=thread_count,
                        thread_type=thread_type or "FRAME",
                    )
                ):
                    frames[idx] = frame
            else:
                frames = np.stack(
                    [
                        x
                        for x in self.iter(
                            format=format,
                            filter_sequence=filter_sequence,
                            filter_graph=filter_graph,
                            thread_count=thread_count,
                            thread_type=thread_type or "FRAME",
                        )
                    ]
                )

            # reset the stream, because the threading model can't change after
            # the first access
            self._video_stream.close()
            self._video_stream = self._container.streams.video[0]

            return frames

        if thread_type is not None and thread_type != self._video_stream.thread_type:
            self._video_stream.thread_type = thread_type
        if (
            thread_count != 0
            and thread_count != self._video_stream.codec_context.thread_count
        ):
            # in FFMPEG thread_count == 0 means "use the default count", which
            # we change to mean "don't change the thread count".
            self._video_stream.codec_context.thread_count = thread_count

        if constant_framerate is None:
            constant_framerate = not self._container.format.variable_fps

        # note: cheap for contiguous incremental reads
        self._seek(index, constant_framerate=constant_framerate)
        desired_frame = next(self._decoder)
        self._next_idx += 1

        self.set_video_filter(filter_sequence, filter_graph)
        if self._video_filter is not None:
            desired_frame = self._video_filter.send(desired_frame)

        return self._unpack_frame(desired_frame, format=format)

    def iter(
        self,
        *,
        format: str = "rgb24",
        filter_sequence: List[Tuple[str, Union[str, dict]]] = None,
        filter_graph: Tuple[dict, List] = None,
        thread_count: int = 0,
        thread_type: str = None,
    ) -> Generator[np.ndarray, None, None]:
        """Yield frames from the video.

        Parameters
        ----------
        format : str
            Convert the data into the given format before returning it. If
            None, return the data in the encoded format if it can be expressed
            as a strided array; otherwise raise an Exception.
        filter_sequence : List[Tuple[str, Union[str, dict]]]
            If not None, apply the given sequence of FFmpeg filters to each
            ndimage. Check the (module-level) plugin docs for details and
            examples.
        filter_graph : Tuple[dict, List]
            If not None, apply the given graph of FFmpeg filters to each
            ndimage. The graph is given as a tuple of two dicts. The first dict
            contains a (named) set of nodes, and the second dict contains a set
            of edges between nodes of the previous dict. Check the
            (module-level) plugin docs for details and examples.
        thread_count : int
            How many threads to use when decoding a frame. The default is 0,
            which will set the number using ffmpeg's default, which is based on
            the codec, number of available cores, threading model, and other
            considerations.
        thread_type : str
            The threading model to be used. One of

            - `"SLICE"` (default): threads assemble parts of the current frame
            - `"FRAME"`: threads may assemble future frames (faster for bulk
              reading)


        Yields
        ------
        frame : np.ndarray
            A (decoded) video frame.


        """

        self._video_stream.thread_type = thread_type or "SLICE"
        self._video_stream.codec_context.thread_count = thread_count

        self.set_video_filter(filter_sequence, filter_graph)

        for frame in self._decoder:
            self._next_idx += 1

            if self._video_filter is not None:
                try:
                    frame = self._video_filter.send(frame)
                except StopIteration:
                    break

            if frame is None:
                continue

            yield self._unpack_frame(frame, format=format)

        if self._video_filter is not None:
            for frame in self._video_filter:
                yield self._unpack_frame(frame, format=format)

    def write(
        self,
        ndimage: Union[np.ndarray, List[np.ndarray]],
        *,
        codec: str = None,
        is_batch: bool = True,
        fps: int = 24,
        in_pixel_format: str = "rgb24",
        out_pixel_format: str = None,
        filter_sequence: List[Tuple[str, Union[str, dict]]] = None,
        filter_graph: Tuple[dict, List] = None,
    ) -> Optional[bytes]:
        """Save a ndimage as a video.

        Given a batch of frames (stacked along the first axis) or a list of
        frames, encode them and add the result to the ImageResource.

        Parameters
        ----------
        ndimage : ArrayLike, List[ArrayLike]
            The ndimage to encode and write to the ImageResource.
        codec : str
            The codec to use when encoding frames. Only needed on the first
            write and ignored on subsequent writes.
        is_batch : bool
            If True (default), the ndimage is a batch of images, otherwise it
            is a single image. This parameter has no effect on lists of
            ndimages.
        fps : int
            The resulting video's frames per second.
        in_pixel_format : str
            The pixel format of the incoming ndarray. Defaults to "rgb24" and
            can be any stridable pix_fmt supported by FFmpeg.
        out_pixel_format : str
            The pixel format to use while encoding frames. If None (default)
            use the codec's default.
        filter_sequence : List[Tuple[str, Union[str, dict]]]
            If not None, apply the given sequence of FFmpeg filters to each
            ndimage. Check the (module-level) plugin docs for details and
            examples.
        filter_graph : Tuple[dict, List]
            If not None, apply the given graph of FFmpeg filters to each
            ndimage. The graph is given as a tuple of two dicts. The first dict
            contains a (named) set of nodes, and the second dict contains a set
            of edges between nodes of the previous dict. Check the
            (module-level) plugin docs for details and examples.

        Returns
        -------
        encoded_image : bytes or None
            If the chosen ImageResource is the special target ``"<bytes>"``,
            then write will return a byte string containing the encoded image
            data. Otherwise, it returns None.

        Notes
        -----
        When writing ``<bytes>``, the video is finalized immediately after the
        first write call, and calling write multiple times to append frames is
        not possible.

        """

        if isinstance(ndimage, list):
            # frame shapes must agree for video
            if any(f.shape != ndimage[0].shape for f in ndimage):
                raise ValueError("All frames should have the same shape")
        elif not is_batch:
            ndimage = np.asarray(ndimage)[None, ...]
        else:
            ndimage = np.asarray(ndimage)

        if self._video_stream is None:
            self.init_video_stream(codec, fps=fps, pixel_format=out_pixel_format)

        self.set_video_filter(filter_sequence, filter_graph)

        for img in ndimage:
            self.write_frame(img, pixel_format=in_pixel_format)

        if self.request._uri_type == URI_BYTES:
            # bytes are immutable, so we have to flush immediately
            # and can't support appending
            self._flush_writer()
            self._container.close()

            return self.request.get_file().getvalue()

    def properties(self, index: int = ..., *, format: str = "rgb24") -> ImageProperties:
        """Standardized ndimage metadata.

        Parameters
        ----------
        index : int
            The index of the ndimage for which to return properties. If ``...``
            (Ellipsis, default), return the properties for the resulting batch
            of frames.
        format : str
            If not None (default: rgb24), convert the data into the given
            format before returning it. If None, return the data in the encoded
            format if that can be expressed as a strided array; otherwise raise
            an Exception.

        Returns
        -------
        properties : ImageProperties
            A dataclass filled with standardized image metadata.

        Notes
        -----
        This function is efficient and won't process any pixel data.

        The provided metadata does not include modifications by any filters
        (through ``filter_sequence`` or ``filter_graph``).

        """

        video_width = self._video_stream.codec_context.width
        video_height = self._video_stream.codec_context.height
        pix_format = format or self._video_stream.codec_context.pix_fmt
        frame_template = av.VideoFrame(video_width, video_height, pix_format)

        shape = _get_frame_shape(frame_template)
        if index is ...:
            n_frames = self._video_stream.frames
            shape = (n_frames,) + shape

        return ImageProperties(
            shape=tuple(shape),
            dtype=_format_to_dtype(frame_template.format),
            n_images=shape[0] if index is ... else None,
            is_batch=index is ...,
        )

    def metadata(
        self,
        index: int = ...,
        exclude_applied: bool = True,
        constant_framerate: bool = None,
    ) -> Dict[str, Any]:
        """Format-specific metadata.

        Returns a dictionary filled with metadata that is either stored in the
        container, the video stream, or the frame's side-data.

        Parameters
        ----------
        index : int
            If ... (Ellipsis, default), return global metadata (the metadata
            stored in the container and video stream). If not ..., return the
            side data stored in the frame at the given index.
        exclude_applied : bool
            Currently, this parameter has no effect. It exists for compliance
            with the ImageIO v3 API.
        constant_framerate : bool
            If True, assume the video's framerate is constant. This allows for
            faster seeking inside the file. If False, the video is reset before
            each read and searched from the beginning. If None (default), this
            value will be read from the container format.

        Returns
        -------
        metadata : dict
            A dictionary filled with format-specific metadata fields and their
            values.

        """

        metadata = dict()

        if index is ...:
            # useful flags defined on the container and/or video stream
            metadata.update(
                {
                    "video_format": self._video_stream.codec_context.pix_fmt,
                    "codec": self._video_stream.codec.name,
                    "long_codec": self._video_stream.codec.long_name,
                    "profile": self._video_stream.profile,
                    "fps": float(self._video_stream.guessed_rate),
                }
            )
            if self._video_stream.duration is not None:
                duration = float(
                    self._video_stream.duration * self._video_stream.time_base
                )
                metadata.update({"duration": duration})

            metadata.update(self.container_metadata)
            metadata.update(self.video_stream_metadata)
            return metadata

        if constant_framerate is None:
            constant_framerate = not self._container.format.variable_fps

        self._seek(index, constant_framerate=constant_framerate)
        desired_frame = next(self._decoder)
        self._next_idx += 1

        # useful flags defined on the frame
        metadata.update(
            {
                "key_frame": bool(desired_frame.key_frame),
                "time": desired_frame.time,
                "interlaced_frame": bool(desired_frame.interlaced_frame),
                "frame_type": desired_frame.pict_type.name,
            }
        )

        # side data
        metadata.update(
            {item.type.name: item.to_bytes() for item in desired_frame.side_data}
        )

        return metadata

    def close(self) -> None:
        """Close the Video."""

        is_write = self.request.mode.io_mode == IOMode.write
        if is_write and self._video_stream is not None:
            self._flush_writer()

        if self._video_stream is not None:
            try:
                self._video_stream.close()
            except ValueError:
                pass  # stream already closed

        if self._container is not None:
            self._container.close()

        self.request.finish()

    def __enter__(self) -> "PyAVPlugin":
        return super().__enter__()

    # ------------------------------
    # Add-on Interface inside imopen
    # ------------------------------

    def init_video_stream(
        self,
        codec: str,
        *,
        fps: float = 24,
        pixel_format: str = None,
        max_keyframe_interval: int = None,
        force_keyframes: bool = None,
    ) -> None:
        """Initialize a new video stream.

        This function adds a new video stream to the ImageResource using the
        selected encoder (codec), framerate, and colorspace.

        Parameters
        ----------
        codec : str
            The codec to use, e.g. ``"libx264"`` or ``"vp9"``.
        fps : float
            The desired framerate of the video stream (frames per second).
        pixel_format : str
            The pixel format to use while encoding frames. If None (default)
            use the codec's default.
        max_keyframe_interval : int
            The maximum distance between two intra frames (I-frames), also
            known as the GOP size. If unspecified, use the codec's default.
            Note that not every I-frame is a keyframe; see the notes for
            details.
        force_keyframes : bool
            If True, limit inter-frame dependencies to frames within the
            current keyframe interval (GOP), i.e., force every I-frame to be a
            keyframe. If unspecified, use the codec's default.

        Notes
        -----
        You can usually leave ``max_keyframe_interval`` and ``force_keyframes``
        at their default values, unless you are trying to generate a
        seek-optimized video or have a similar specialist use-case. In that
        case, ``force_keyframes`` controls the ability to seek to *every*
        I-frame, and ``max_keyframe_interval`` controls how close to a random
        frame you can seek. Low values allow more fine-grained seeking at the
        expense of file size (and thus I/O performance).

        """

        stream = self._container.add_stream(codec, fps)
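        # use (approximately) 1/fps as the stream's time base so that a frame's
        # index can double as its presentation timestamp (see ``write_frame``)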
        stream.time_base = Fraction(1 / fps).limit_denominator(int(2**16 - 1))
        if pixel_format is not None:
            stream.pix_fmt = pixel_format
        if max_keyframe_interval is not None:
            stream.gop_size = max_keyframe_interval
        if force_keyframes is not None:
            stream.closed_gop = force_keyframes

        self._video_stream = stream

    def write_frame(self, frame: np.ndarray, *, pixel_format: str = "rgb24") -> None:
        """Add a frame to the video stream.

        This function appends a new frame to the video. It assumes that the
        stream has previously been initialized, i.e., ``init_video_stream`` has
        to be called before calling this function for the write to succeed.

        Parameters
        ----------
        frame : np.ndarray
            The image to be appended/written to the video stream.
        pixel_format : str
            The colorspace (pixel format) of the incoming frame.

        Notes
        -----
        Frames may be held in a buffer, e.g., by the filter pipeline used
        during writing or by FFMPEG to batch them prior to encoding. Make sure
        to ``.close()`` the plugin or to use a context manager to ensure that
        all frames are written to the ImageResource.

        """

        # manual packing of ndarray into frame
        # (this should live in pyAV, but it doesn't support all the formats we
        # want and PRs there are slow)
        pixel_format = av.VideoFormat(pixel_format)
        img_dtype = _format_to_dtype(pixel_format)
        width = frame.shape[2 if pixel_format.is_planar else 1]
        height = frame.shape[1 if pixel_format.is_planar else 0]
        av_frame = av.VideoFrame(width, height, pixel_format.name)
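        # copy the ndarray into the frame plane by plane; the strided views
        # below account for any per-row padding (line_size) FFMPEG may add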
        if pixel_format.is_planar:
            for idx, plane in enumerate(av_frame.planes):
                plane_array = np.frombuffer(plane, dtype=img_dtype)
                plane_array = as_strided(
                    plane_array,
                    shape=(plane.height, plane.width),
                    strides=(plane.line_size, img_dtype.itemsize),
                )
                plane_array[...] = frame[idx]
        else:
            if pixel_format.name.startswith("bayer_"):
                # ffmpeg doesn't describe bayer formats correctly
                # see https://github.com/imageio/imageio/issues/761#issuecomment-1059318851
                # and following for details.
                n_channels = 1
            else:
                n_channels = len(pixel_format.components)

            plane = av_frame.planes[0]
            plane_shape = (plane.height, plane.width)
            plane_strides = (plane.line_size, n_channels * img_dtype.itemsize)
            if n_channels > 1:
                plane_shape += (n_channels,)
                plane_strides += (img_dtype.itemsize,)

            plane_array = as_strided(
                np.frombuffer(plane, dtype=img_dtype),
                shape=plane_shape,
                strides=plane_strides,
            )
            plane_array[...] = frame

        stream = self._video_stream
        av_frame.time_base = stream.codec_context.time_base
        av_frame.pts = self.frames_written
        self.frames_written += 1

        if self._video_filter is not None:
            av_frame = self._video_filter.send(av_frame)
            if av_frame is None:
                return

        if stream.frames == 0:
            stream.width = av_frame.width
            stream.height = av_frame.height

        for packet in stream.encode(av_frame):
            self._container.mux(packet)

    def set_video_filter(
        self,
        filter_sequence: List[Tuple[str, Union[str, dict]]] = None,
        filter_graph: Tuple[dict, List] = None,
    ) -> None:
        """Set the filter(s) to use.

        This function creates a new FFMPEG filter graph to use when reading or
        writing video. In the case of reading, frames are passed through the
        filter graph before being returned and, in the case of writing, frames
        are passed through the filter before being written to the video.

        Parameters
        ----------
        filter_sequence : List[Tuple[str, Union[str, dict]]]
            If not None, apply the given sequence of FFmpeg filters to each
            ndimage. Check the (module-level) plugin docs for details and
            examples.
        filter_graph : Tuple[dict, List]
            If not None, apply the given graph of FFmpeg filters to each
            ndimage. The graph is given as a tuple of two dicts. The first dict
            contains a (named) set of nodes, and the second dict contains a set
            of edges between nodes of the previous dict. Check the
            (module-level) plugin docs for details and examples.

        Notes
        -----
        Changing a filter graph with lag during reading or writing will
        currently cause frames in the filter queue to be lost.

        """

        if filter_sequence is None and filter_graph is None:
            self._video_filter = None
            return

        if filter_sequence is None:
            filter_sequence = list()

        node_descriptors: Dict[str, Tuple[str, Union[str, Dict]]]
        edges: List[Tuple[str, str, int, int]]
        if filter_graph is None:
            node_descriptors, edges = dict(), [("video_in", "video_out", 0, 0)]
        else:
            node_descriptors, edges = filter_graph

        graph = av.filter.Graph()

        previous_node = graph.add_buffer(template=self._video_stream)
        for filter_name, argument in filter_sequence:
            if isinstance(argument, str):
                current_node = graph.add(filter_name, argument)
            else:
                current_node = graph.add(filter_name, **argument)
            previous_node.link_to(current_node)
            previous_node = current_node

        nodes = dict()
        nodes["video_in"] = previous_node
        nodes["video_out"] = graph.add("buffersink")
        for name, (filter_name, arguments) in node_descriptors.items():
            if isinstance(arguments, str):
                nodes[name] = graph.add(filter_name, arguments)
            else:
                nodes[name] = graph.add(filter_name, **arguments)

        for from_node, to_node, out_idx, in_idx in edges:
            nodes[from_node].link_to(nodes[to_node], out_idx, in_idx)

        graph.configure()

        def video_filter():
            # this starts a co-routine
            # send frames using graph.send()
            frame = yield None

            # send and receive frames in "parallel"
            while frame is not None:
                graph.push(frame)
                try:
                    frame = yield graph.pull()
                except av.error.BlockingIOError:
                    # filter has lag and needs more frames
                    frame = yield None
                except av.error.EOFError:
                    break

            try:
                # send EOF in av>=9.0
                graph.push(None)
            except ValueError:  # pragma: no cover
                # handle av<9.0
                pass

            # all frames have been sent, empty the filter
            while True:
                try:
                    yield graph.pull()
                except av.error.EOFError:
                    break  # EOF
                except av.error.BlockingIOError:  # pragma: no cover
                    # handle av<9.0
                    break

        self._video_filter = video_filter()
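        # prime the coroutine so that it runs up to its first ``yield`` and is
        # ready to accept frames via ``send()``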
        self._video_filter.send(None)

    @property
    def container_metadata(self):
        """Container-specific metadata.

        A dictionary containing metadata stored at the container level.

        """
        return self._container.metadata

    @property
    def video_stream_metadata(self):
        """Stream-specific metadata.

        A dictionary containing metadata stored at the stream level.

        """
        return self._video_stream.metadata

    # -------------------------------
    # Internals and private functions
    # -------------------------------

    def _unpack_frame(self, frame: av.VideoFrame, *, format: str = None) -> np.ndarray:
        """Convert an av.VideoFrame into a ndarray

        Parameters
        ----------
        frame : av.VideoFrame
            The frame to unpack.
        format : str
            If not None, convert the frame to the given format before
            unpacking.

        """

        if format is not None:
            frame = frame.reformat(format=format)

        dtype = _format_to_dtype(frame.format)
        shape = _get_frame_shape(frame)

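        # wrap each plane in a strided (zero-copy) view of the frame's buffer,
        # accounting for per-row padding (line_size) and per-plane channels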
        planes = list()
        for idx in range(len(frame.planes)):
            n_channels = sum(
                [
                    x.bits // (dtype.itemsize * 8)
                    for x in frame.format.components
                    if x.plane == idx
                ]
            )
            av_plane = frame.planes[idx]
            plane_shape = (av_plane.height, av_plane.width)
            plane_strides = (av_plane.line_size, n_channels * dtype.itemsize)
            if n_channels > 1:
                plane_shape += (n_channels,)
                plane_strides += (dtype.itemsize,)

            np_plane = as_strided(
                np.frombuffer(av_plane, dtype=dtype),
                shape=plane_shape,
                strides=plane_strides,
            )
            planes.append(np_plane)

        if len(planes) > 1:
            # Note: the planes *should* exist inside a contiguous memory block
            # somewhere inside av.VideoFrame; however, pyAV does not appear to
            # expose this, so we are forced to copy the planes individually
            # instead of wrapping them :(
            out = np.concatenate(planes).reshape(shape)
        else:
            out = planes[0]

        return out

    def _seek(self, index, *, constant_framerate: bool = True) -> None:
        """Seek to the frame at the given index."""

        if index == self._next_idx:
            return  # fast path :)

        # we must decode at least once before we seek; otherwise the
        # returned frames become corrupt.
        if self._next_idx == 0:
            next(self._decoder)
            self._next_idx += 1

        if index == self._next_idx:
            return  # fast path :)

        # remove this branch until we find a way to efficiently find the next
        # keyframe. keeping this as a reminder
        # if self._next_idx < index and index < self._next_keyframe_idx:
        #     frames_to_yield = index - self._next_idx
        if not constant_framerate and index > self._next_idx:
            frames_to_yield = index - self._next_idx
        elif not constant_framerate:
            # seeking backwards, and we can't link index and pts
            self._container.seek(0)
            self._decoder = self._container.decode(video=0)
            self._next_idx = 0

            frames_to_yield = index
        else:
            # we know that the time between consecutive frames is constant,
            # hence we can link index and pts

            # how many pts lie between two frames
            sec_delta = 1 / self._video_stream.guessed_rate
            pts_delta = sec_delta / self._video_stream.time_base

            index_pts = int(index * pts_delta)

            # this only seeks to the closest (preceding) keyframe
            self._container.seek(index_pts, stream=self._video_stream)
            self._decoder = self._container.decode(video=0)

            # this may be made faster if we could get the keyframe's time
            # without decoding it
            keyframe = next(self._decoder)
            keyframe_time = keyframe.pts * keyframe.time_base
            keyframe_pts = int(keyframe_time / self._video_stream.time_base)
            keyframe_index = keyframe_pts // pts_delta

            self._container.seek(index_pts, stream=self._video_stream)
            self._next_idx = keyframe_index

            frames_to_yield = index - keyframe_index

        for _ in range(frames_to_yield):
            next(self._decoder)
            self._next_idx += 1

    def _flush_writer(self):
        """Flush the filter and encoder

        This will reset the filter to `None` and send EOF to the encoder,
        i.e., after calling, no more frames may be written.

        """

        stream = self._video_stream

        if self._video_filter is not None:
            # flush the filter pipeline and encode any remaining frames
            for av_frame in self._video_filter:
                if stream.frames == 0:
                    stream.width = av_frame.width
                    stream.height = av_frame.height
                for packet in stream.encode(av_frame):
                    self._container.mux(packet)
            self._video_filter = None

        # flush stream
        for packet in stream.encode():
            self._container.mux(packet)
        self._video_stream = None