1"""Read/Write Videos (and images) using PyAV.
2
3.. note::
4 To use this plugin you need to have `PyAV <https://pyav.org/docs/stable/>`_
5 installed::
6
7 pip install av
8
9This plugin wraps pyAV, a pythonic binding for the FFMPEG library. It is similar
10to our FFMPEG plugin, has improved performance, features a robust interface, and
11aims to supersede the FFMPEG plugin in the future.


Methods
-------
.. note::
    Check the respective function for a list of supported kwargs and detailed
    documentation.

.. autosummary::
    :toctree:

    PyAVPlugin.read
    PyAVPlugin.iter
    PyAVPlugin.write
    PyAVPlugin.properties
    PyAVPlugin.metadata

Additional methods available inside the :func:`imopen <imageio.v3.imopen>`
context:

.. autosummary::
    :toctree:

    PyAVPlugin.init_video_stream
    PyAVPlugin.write_frame
    PyAVPlugin.set_video_filter
    PyAVPlugin.container_metadata
    PyAVPlugin.video_stream_metadata

Advanced API
------------

In addition to the default ImageIO v3 API this plugin exposes custom functions
that are specific to reading/writing video and its metadata. These are available
inside the :func:`imopen <imageio.v3.imopen>` context and allow fine-grained
control over how the video is processed. The functions are documented above; a
usage example follows below::

    import imageio.v3 as iio

    with iio.imopen("test.mp4", "w", plugin="pyav") as file:
        file.init_video_stream("libx264")
        file.container_metadata["comment"] = "This video was created using ImageIO."

        for _ in range(5):
            for frame in iio.imiter("imageio:newtonscradle.gif"):
                file.write_frame(frame)

    meta = iio.immeta("test.mp4", plugin="pyav")
    assert meta["comment"] == "This video was created using ImageIO."


Pixel Formats (Colorspaces)
---------------------------

By default, this plugin converts the video into 8-bit RGB (called ``rgb24`` in
ffmpeg). This is a useful behavior for many use-cases, but sometimes you may
want to use the video's native colorspace or you may wish to convert the video
into an entirely different colorspace. This is controlled using the ``format``
kwarg. You can use ``format=None`` to leave the image in its native colorspace
or specify any colorspace supported by FFMPEG as long as it is stridable, i.e.,
as long as it can be represented by a single numpy array. Some useful choices
include:

- rgb24 (default; 8-bit RGB)
- rgb48le (16-bit little-endian RGB)
- bgr24 (8-bit BGR; OpenCV's default colorspace)
- gray (8-bit grayscale)
- yuv444p (8-bit channel-first YUV)

Further, FFMPEG maintains a list of available formats, albeit not as part of the
narrative docs. It can be `found here
<https://ffmpeg.org/doxygen/trunk/pixfmt_8h_source.html>`_ (warning: C source
code).
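
For example, to read a video in 8-bit grayscale instead of the default
``rgb24``::

    import imageio.v3 as iio

    gray_frames = iio.imread("imageio:cockatoo.mp4", plugin="pyav", format="gray")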

Filters
-------

On top of providing basic read/write functionality, this plugin allows you to
use the full collection of `video filters available in FFMPEG
<https://ffmpeg.org/ffmpeg-filters.html#Video-Filters>`_. This means that you
can apply extensive preprocessing to your video before retrieving it as a numpy
array, or apply extensive post-processing before you encode your data.

Filters come in two forms: sequences or graphs. Filter sequences are, as the
name suggests, sequences of filters that are applied one after the other. They
are specified using the ``filter_sequence`` kwarg. Filter graphs, on the other
hand, come in the form of a directed graph and are specified using the
``filter_graph`` kwarg.

.. note::
    All filters are either sequences or graphs. If all you want is to apply a
    single filter, you can do this by specifying a filter sequence with a single
    entry.

A ``filter_sequence`` is a list of filters, each defined through a 2-element
tuple of the form ``(filter_name, filter_parameters)``. The first element of the
tuple is the name of the filter. The second element is the filter's parameters,
which can be given either as a string or a dict. The string matches the format
that you would use when specifying the filter via the ffmpeg command-line tool,
and the dict has entries of the form ``parameter: value``. For example::

    import imageio.v3 as iio

    # using a filter_parameters str
    img1 = iio.imread(
        "imageio:cockatoo.mp4",
        plugin="pyav",
        filter_sequence=[
            ("rotate", "45*PI/180")
        ]
    )

    # using a filter_parameters dict
    img2 = iio.imread(
        "imageio:cockatoo.mp4",
        plugin="pyav",
        filter_sequence=[
            ("rotate", {"angle": "45*PI/180", "fillcolor": "AliceBlue"})
        ]
    )

A ``filter_graph``, on the other hand, is specified using a ``(nodes, edges)``
tuple. It is best explained using an example::

    img = iio.imread(
        "imageio:cockatoo.mp4",
        plugin="pyav",
        filter_graph=(
            {
                "split": ("split", ""),
                "scale_overlay": ("scale", "512:-1"),
                "overlay": ("overlay", "x=25:y=25:enable='between(t,1,8)'"),
            },
            [
                ("video_in", "split", 0, 0),
                ("split", "overlay", 0, 0),
                ("split", "scale_overlay", 1, 0),
                ("scale_overlay", "overlay", 0, 1),
                ("overlay", "video_out", 0, 0),
            ]
        )
    )

The above transforms the video to have a picture-in-picture of itself in the
top left corner. As you can see, nodes are specified using a dict which has
names as its keys and filter tuples as values; these are the same tuples as the
ones used when defining a filter sequence. Edges are a list of 4-tuples of the
form ``(node_out, node_in, output_idx, input_idx)`` and specify which two
filters are connected and which inputs/outputs should be used for this.

Further, there are two special nodes in a filter graph: ``video_in`` and
``video_out``, which represent the graph's input and output respectively. These
names cannot be used for other nodes (such nodes would simply be overwritten).
For a graph to be valid, there must be a path from the input to the output, and
all nodes in the graph must be connected.
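
The simplest valid graph passes the input straight to the output. Incidentally,
this is also the graph the plugin constructs internally when only a
``filter_sequence`` is given::

    img = iio.imread(
        "imageio:cockatoo.mp4",
        plugin="pyav",
        filter_graph=({}, [("video_in", "video_out", 0, 0)])
    )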

While most graphs are quite simple, they can become very complex, and we
recommend that you read through the `FFMPEG documentation
<https://ffmpeg.org/ffmpeg-filters.html#Filtergraph-description>`_ and its
examples to better understand how to use them.

"""

from fractions import Fraction
from math import ceil
from typing import Any, Dict, List, Optional, Tuple, Union

import av
import av.filter
import numpy as np
from numpy.lib.stride_tricks import as_strided

from ..core import Request
from ..core.request import URI_BYTES, InitializationError, IOMode
from ..core.v3_plugin_api import ImageProperties, PluginV3


def _format_to_dtype(format: av.VideoFormat) -> np.dtype:
    """Convert a pyAV video format into a numpy dtype"""

    if len(format.components) == 0:
        # fake format
        raise ValueError(
            f"Can't determine dtype from format `{format.name}`. It has no channels."
        )

    endian = ">" if format.is_big_endian else "<"
    dtype = "f" if "f32" in format.name else "u"
    bits_per_channel = [x.bits for x in format.components]
    n_bytes = str(int(ceil(bits_per_channel[0] / 8)))

    return np.dtype(endian + dtype + n_bytes)


def _get_frame_shape(frame: av.VideoFrame) -> Tuple[int, ...]:
    """Compute the frame's array shape

    Parameters
    ----------
    frame : av.VideoFrame
        A frame for which the resulting shape should be computed.

    Returns
    -------
    shape : Tuple[int, ...]
        A tuple describing the shape of the image data in the frame.

    """

    widths = [component.width for component in frame.format.components]
    heights = [component.height for component in frame.format.components]
    bits = np.array([component.bits for component in frame.format.components])
    line_sizes = [plane.line_size for plane in frame.planes]

    subsampled_width = widths[:-1] != widths[1:]
    subsampled_height = heights[:-1] != heights[1:]
    unaligned_components = np.any(bits % 8 != 0) or (line_sizes[:-1] != line_sizes[1:])
    if subsampled_width or subsampled_height or unaligned_components:
        raise IOError(
            f"{frame.format.name} can't be expressed as a strided array. "
            "Use `format=` to select a format to convert into."
        )

    shape = [frame.height, frame.width]

    # ffmpeg doesn't have a notion of channel-first or channel-last formats
    # instead it stores frames in one or more planes which contain individual
    # components of a pixel depending on the pixel format. For channel-first
    # formats each component lives on a separate plane (n_planes) and for
    # channel-last formats all components are packed on a single plane
    # (n_channels)
    n_planes = max([component.plane for component in frame.format.components]) + 1
    if n_planes > 1:
        shape = [n_planes] + shape

    channels_per_plane = [0] * n_planes
    for component in frame.format.components:
        channels_per_plane[component.plane] += 1
    n_channels = max(channels_per_plane)

    if n_channels > 1:
        shape = shape + [n_channels]

    return tuple(shape)


class PyAVPlugin(PluginV3):
    """Support for pyAV as backend.

    Parameters
    ----------
    request : iio.Request
        A request object that represents the user's intent. It provides a
        standard interface to access the various ImageResources and serves
        them to the plugin as a file object (or file). Check the docs for
        details.
    container : str
        Only used during ``iio_mode="w"``! If not None, overwrite the default
        container format chosen by pyav.
    kwargs : Any
        Additional kwargs are forwarded to PyAV's constructor.

    """

    def __init__(self, request: Request, *, container: str = None, **kwargs) -> None:
        """Initialize a new Plugin Instance.

        See Plugin's docstring for detailed documentation.

        Notes
        -----
        The implementation here stores the request as a local variable that is
        exposed using a @property below. If you inherit from PluginV3, remember
        to call ``super().__init__(request)``.

        """

        super().__init__(request)

        self._container = None
        self._video_stream = None
        self._video_filter = None

        if request.mode.io_mode == IOMode.read:
            self._next_idx = 0
            try:
                if request._uri_type == 5:  # 5 is the value of URI_HTTP
                    # pyav should read from HTTP by itself. This enables reading
                    # HTTP-based streams like DASH. Note that solving streams
                    # like this is temporary until the new request object gets
                    # implemented.
                    self._container = av.open(request.raw_uri, **kwargs)
                else:
                    self._container = av.open(request.get_file(), **kwargs)
                self._video_stream = self._container.streams.video[0]
                self._decoder = self._container.decode(video=0)
            except av.AVError:
                if isinstance(request.raw_uri, bytes):
                    msg = "PyAV does not support these `<bytes>`"
                else:
                    msg = f"PyAV does not support `{request.raw_uri}`"
                raise InitializationError(msg) from None
        else:
            self.frames_written = 0
            file_handle = self.request.get_file()
            filename = getattr(file_handle, "name", None)
            extension = self.request.extension or self.request.format_hint
            if extension is None:
                raise InitializationError("Can't determine output container to use.")

            # hacky, but beats running our own format selection logic
            # (since av_guess_format is not exposed)
            try:
                setattr(file_handle, "name", filename or "tmp" + extension)
            except AttributeError:
                pass  # read-only, nothing we can do

            try:
                self._container = av.open(
                    file_handle, mode="w", format=container, **kwargs
                )
            except ValueError:
                raise InitializationError(
                    f"PyAV can not write to `{self.request.raw_uri}`"
                )

    # ---------------------
    # Standard V3 Interface
    # ---------------------

    def read(
        self,
        *,
        index: int = ...,
        format: str = "rgb24",
        filter_sequence: List[Tuple[str, Union[str, dict]]] = None,
        filter_graph: Tuple[dict, List] = None,
        constant_framerate: bool = None,
        thread_count: int = 0,
        thread_type: str = None,
    ) -> np.ndarray:
356 """Read frames from the video.
357
358 If ``index`` is an integer, this function reads the index-th frame from
359 the file. If ``index`` is ... (Ellipsis), this function reads all frames
360 from the video, stacks them along the first dimension, and returns a
361 batch of frames.
362
363 Parameters
364 ----------
365 index : int
366 The index of the frame to read, e.g. ``index=5`` reads the 5th
367 frame. If ``...``, read all the frames in the video and stack them
368 along a new, prepended, batch dimension.
369 format : str
370 Set the returned colorspace. If not None (default: rgb24), convert
371 the data into the given format before returning it. If ``None``
372 return the data in the encoded format if it can be expressed as a
373 strided array; otherwise raise an Exception.
374 filter_sequence : List[str, str, dict]
375 If not None, apply the given sequence of FFmpeg filters to each
376 ndimage. Check the (module-level) plugin docs for details and
377 examples.
378 filter_graph : (dict, List)
379 If not None, apply the given graph of FFmpeg filters to each
380 ndimage. The graph is given as a tuple of two dicts. The first dict
381 contains a (named) set of nodes, and the second dict contains a set
382 of edges between nodes of the previous dict. Check the (module-level)
383 plugin docs for details and examples.
384 constant_framerate : bool
385 If True assume the video's framerate is constant. This allows for
386 faster seeking inside the file. If False, the video is reset before
387 each read and searched from the beginning. If None (default), this
388 value will be read from the container format.
389 thread_count : int
390 How many threads to use when decoding a frame. The default is 0,
391 which will set the number using ffmpeg's default, which is based on
392 the codec, number of available cores, threadding model, and other
393 considerations.
394 thread_type : str
395 The threading model to be used. One of
396
397 - `"SLICE"`: threads assemble parts of the current frame
398 - `"FRAME"`: threads may assemble future frames
399 - None (default): Uses ``"FRAME"`` if ``index=...`` and ffmpeg's
400 default otherwise.
401
402
403 Returns
404 -------
405 frame : np.ndarray
406 A numpy array containing loaded frame data.
407
408 Notes
409 -----
410 Accessing random frames repeatedly is costly (O(k), where k is the
411 average distance between two keyframes). You should do so only sparingly
412 if possible. In some cases, it can be faster to bulk-read the video (if
413 it fits into memory) and to then access the returned ndarray randomly.
414
415 The current implementation may cause problems for b-frames, i.e.,
416 bidirectionaly predicted pictures. I lack test videos to write unit
417 tests for this case.
418
419 Reading from an index other than ``...``, i.e. reading a single frame,
420 currently doesn't support filters that introduce delays.
421
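        Examples
        --------
        Read a single frame or the full video (a minimal sketch)::

            import imageio.v3 as iio

            frame = iio.imread("imageio:cockatoo.mp4", plugin="pyav", index=5)
            all_frames = iio.imread("imageio:cockatoo.mp4", plugin="pyav", index=...)
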
422 """
423
424 if index is ...:
425 props = self.properties(format=format)
426 uses_filter = (
427 self._video_filter is not None
428 or filter_graph is not None
429 or filter_sequence is not None
430 )
431
432 self._container.seek(0)
433 if not uses_filter and props.shape[0] != 0:
434 frames = np.empty(props.shape, dtype=props.dtype)
435 for idx, frame in enumerate(
436 self.iter(
437 format=format,
438 filter_sequence=filter_sequence,
439 filter_graph=filter_graph,
440 thread_count=thread_count,
441 thread_type=thread_type or "FRAME",
442 )
443 ):
444 frames[idx] = frame
445 else:
446 frames = np.stack(
447 [
448 x
449 for x in self.iter(
450 format=format,
451 filter_sequence=filter_sequence,
452 filter_graph=filter_graph,
453 thread_count=thread_count,
454 thread_type=thread_type or "FRAME",
455 )
456 ]
457 )
458
459 # reset stream container, because threading model can't change after
460 # first access
461 self._video_stream.close()
462 self._video_stream = self._container.streams.video[0]
463
464 return frames
465
466 if thread_type is not None and thread_type != self._video_stream.thread_type:
467 self._video_stream.thread_type = thread_type
468 if (
469 thread_count != 0
470 and thread_count != self._video_stream.codec_context.thread_count
471 ):
472 # in FFMPEG thread_count == 0 means use the default count, which we
473 # change to mean don't change the thread count.
474 self._video_stream.codec_context.thread_count = thread_count
475
476 if constant_framerate is None:
477 constant_framerate = not self._container.format.variable_fps
478
479 # note: cheap for contigous incremental reads
480 self._seek(index, constant_framerate=constant_framerate)
481 desired_frame = next(self._decoder)
482 self._next_idx += 1
483
484 self.set_video_filter(filter_sequence, filter_graph)
485 if self._video_filter is not None:
486 desired_frame = self._video_filter.send(desired_frame)
487
488 return self._unpack_frame(desired_frame, format=format)

    def iter(
        self,
        *,
        format: str = "rgb24",
        filter_sequence: List[Tuple[str, Union[str, dict]]] = None,
        filter_graph: Tuple[dict, List] = None,
        thread_count: int = 0,
        thread_type: str = None,
    ) -> np.ndarray:
499 """Yield frames from the video.
500
501 Parameters
502 ----------
503 frame : np.ndarray
504 A numpy array containing loaded frame data.
505 format : str
506 Convert the data into the given format before returning it. If None,
507 return the data in the encoded format if it can be expressed as a
508 strided array; otherwise raise an Exception.
509 filter_sequence : List[str, str, dict]
510 Set the returned colorspace. If not None (default: rgb24), convert
511 the data into the given format before returning it. If ``None``
512 return the data in the encoded format if it can be expressed as a
513 strided array; otherwise raise an Exception.
514 filter_graph : (dict, List)
515 If not None, apply the given graph of FFmpeg filters to each
516 ndimage. The graph is given as a tuple of two dicts. The first dict
517 contains a (named) set of nodes, and the second dict contains a set
518 of edges between nodes of the previous dict. Check the (module-level)
519 plugin docs for details and examples.
520 thread_count : int
521 How many threads to use when decoding a frame. The default is 0,
522 which will set the number using ffmpeg's default, which is based on
523 the codec, number of available cores, threadding model, and other
524 considerations.
525 thread_type : str
526 The threading model to be used. One of
527
528 - `"SLICE"` (default): threads assemble parts of the current frame
529 - `"FRAME"`: threads may assemble future frames (faster for bulk reading)
530
531
532 Yields
533 ------
534 frame : np.ndarray
535 A (decoded) video frame.
536
537
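        Examples
        --------
        Iterate over all frames of a video (a minimal sketch)::

            import imageio.v3 as iio

            for frame in iio.imiter("imageio:cockatoo.mp4", plugin="pyav"):
                print(frame.shape)
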
538 """
539
540 self._video_stream.thread_type = thread_type or "SLICE"
541 self._video_stream.codec_context.thread_count = thread_count
542
543 self.set_video_filter(filter_sequence, filter_graph)
544
545 for frame in self._decoder:
546 self._next_idx += 1
547
548 if self._video_filter is not None:
549 try:
550 frame = self._video_filter.send(frame)
551 except StopIteration:
552 break
553
554 if frame is None:
555 continue
556
557 yield self._unpack_frame(frame, format=format)
558
559 if self._video_filter is not None:
560 for frame in self._video_filter:
561 yield self._unpack_frame(frame, format=format)

    def write(
        self,
        ndimage: Union[np.ndarray, List[np.ndarray]],
        *,
        codec: str = None,
        is_batch: bool = True,
        fps: int = 24,
        in_pixel_format: str = "rgb24",
        out_pixel_format: str = None,
        filter_sequence: List[Tuple[str, Union[str, dict]]] = None,
        filter_graph: Tuple[dict, List] = None,
    ) -> Optional[bytes]:
575 """Save a ndimage as a video.
576
577 Given a batch of frames (stacked along the first axis) or a list of
578 frames, encode them and add the result to the ImageResource.
579
580 Parameters
581 ----------
582 ndimage : ArrayLike, List[ArrayLike]
583 The ndimage to encode and write to the ImageResource.
584 codec : str
585 The codec to use when encoding frames. Only needed on first write
586 and ignored on subsequent writes.
587 is_batch : bool
588 If True (default), the ndimage is a batch of images, otherwise it is
589 a single image. This parameter has no effect on lists of ndimages.
590 fps : str
591 The resulting videos frames per second.
592 in_pixel_format : str
593 The pixel format of the incoming ndarray. Defaults to "rgb24" and can
594 be any stridable pix_fmt supported by FFmpeg.
595 out_pixel_format : str
596 The pixel format to use while encoding frames. If None (default)
597 use the codec's default.
598 filter_sequence : List[str, str, dict]
599 If not None, apply the given sequence of FFmpeg filters to each
600 ndimage. Check the (module-level) plugin docs for details and
601 examples.
602 filter_graph : (dict, List)
603 If not None, apply the given graph of FFmpeg filters to each
604 ndimage. The graph is given as a tuple of two dicts. The first dict
605 contains a (named) set of nodes, and the second dict contains a set
606 of edges between nodes of the previous dict. Check the (module-level)
607 plugin docs for details and examples.
608
609 Returns
610 -------
611 encoded_image : bytes or None
612 If the chosen ImageResource is the special target ``"<bytes>"`` then
613 write will return a byte string containing the encoded image data.
614 Otherwise, it returns None.
615
616 Notes
617 -----
618 When writing ``<bytes>``, the video is finalized immediately after the
619 first write call and calling write multiple times to append frames is
620 not possible.
621
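        Examples
        --------
        Write a stack of random frames as an MP4 (a minimal sketch; assumes
        your FFmpeg build provides the ``libx264`` encoder)::

            import imageio.v3 as iio
            import numpy as np

            frames = np.random.randint(0, 255, (10, 64, 64, 3), dtype=np.uint8)
            iio.imwrite("test.mp4", frames, plugin="pyav", codec="libx264")
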
622 """
623
624 if isinstance(ndimage, list):
625 # frames shapes must agree for video
626 if any(f.shape != ndimage[0].shape for f in ndimage):
627 raise ValueError("All frames should have the same shape")
628 elif not is_batch:
629 ndimage = np.asarray(ndimage)[None, ...]
630 else:
631 ndimage = np.asarray(ndimage)
632
633 if self._video_stream is None:
634 self.init_video_stream(codec, fps=fps, pixel_format=out_pixel_format)
635
636 self.set_video_filter(filter_sequence, filter_graph)
637
638 for img in ndimage:
639 self.write_frame(img, pixel_format=in_pixel_format)
640
641 if self.request._uri_type == URI_BYTES:
642 # bytes are immutuable, so we have to flush immediately
643 # and can't support appending
644 self._flush_writer()
645 self._container.close()
646
647 return self.request.get_file().getvalue()

    def properties(self, index: int = ..., *, format: str = "rgb24") -> ImageProperties:
        """Standardized ndimage metadata.

        Parameters
        ----------
        index : int
            The index of the ndimage for which to return properties. If ``...``
            (Ellipsis, default), return the properties for the resulting batch
            of frames.
        format : str
            If not None (default: rgb24), convert the data into the given format
            before returning it. If None, return the data in the encoded format
            if that can be expressed as a strided array; otherwise raise an
            Exception.

        Returns
        -------
        properties : ImageProperties
            A dataclass filled with standardized image metadata.

        Notes
        -----
        This function is efficient and won't process any pixel data.

        The provided metadata does not include modifications by any filters
        (through ``filter_sequence`` or ``filter_graph``).

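        Examples
        --------
        Inspect the shape and dtype of a video without decoding its pixel data
        (a minimal sketch using :func:`improps <imageio.v3.improps>`)::

            import imageio.v3 as iio

            props = iio.improps("imageio:cockatoo.mp4", plugin="pyav")
            print(props.shape, props.dtype)
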
676 """
677
678 video_width = self._video_stream.codec_context.width
679 video_height = self._video_stream.codec_context.height
680 pix_format = format or self._video_stream.codec_context.pix_fmt
681 frame_template = av.VideoFrame(video_width, video_height, pix_format)
682
683 shape = _get_frame_shape(frame_template)
684 if index is ...:
685 n_frames = self._video_stream.frames
686 shape = (n_frames,) + shape
687
688 return ImageProperties(
689 shape=tuple(shape),
690 dtype=_format_to_dtype(frame_template.format),
691 n_images=shape[0] if index is ... else None,
692 is_batch=index is ...,
693 )

    def metadata(
        self,
        index: int = ...,
        exclude_applied: bool = True,
        constant_framerate: bool = None,
    ) -> Dict[str, Any]:
        """Format-specific metadata.

        Returns a dictionary filled with metadata that is either stored in the
        container, the video stream, or the frame's side-data.

        Parameters
        ----------
        index : int
            If ... (Ellipsis, default) return global metadata (the metadata
            stored in the container and video stream). If not ..., return the
            side data stored in the frame at the given index.
        exclude_applied : bool
            Currently, this parameter has no effect. It exists for compliance
            with the ImageIO v3 API.
        constant_framerate : bool
            If True assume the video's framerate is constant. This allows for
            faster seeking inside the file. If False, the video is reset before
            each read and searched from the beginning. If None (default), this
            value will be read from the container format.

        Returns
        -------
        metadata : dict
            A dictionary filled with format-specific metadata fields and their
            values.

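        Examples
        --------
        Read the global metadata (a minimal sketch using
        :func:`immeta <imageio.v3.immeta>`)::

            import imageio.v3 as iio

            meta = iio.immeta("imageio:cockatoo.mp4", plugin="pyav")
            print(meta["codec"], meta["fps"])
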
727 """
728
729 metadata = dict()
730
731 if index is ...:
732 # useful flags defined on the container and/or video stream
733 metadata.update(
734 {
735 "video_format": self._video_stream.codec_context.pix_fmt,
736 "codec": self._video_stream.codec.name,
737 "long_codec": self._video_stream.codec.long_name,
738 "profile": self._video_stream.profile,
739 "fps": float(self._video_stream.guessed_rate),
740 }
741 )
742 if self._video_stream.duration is not None:
743 duration = float(
744 self._video_stream.duration * self._video_stream.time_base
745 )
746 metadata.update({"duration": duration})
747
748 metadata.update(self.container_metadata)
749 metadata.update(self.video_stream_metadata)
750 return metadata
751
752 if constant_framerate is None:
753 constant_framerate = not self._container.format.variable_fps
754
755 self._seek(index, constant_framerate=constant_framerate)
756 desired_frame = next(self._decoder)
757 self._next_idx += 1
758
759 # useful flags defined on the frame
760 metadata.update(
761 {
762 "key_frame": bool(desired_frame.key_frame),
763 "time": desired_frame.time,
764 "interlaced_frame": bool(desired_frame.interlaced_frame),
765 "frame_type": desired_frame.pict_type.name,
766 }
767 )
768
769 # side data
770 metadata.update(
771 {item.type.name: bytes(item) for item in desired_frame.side_data}
772 )
773
774 return metadata

    def close(self) -> None:
        """Close the Video."""

        is_write = self.request.mode.io_mode == IOMode.write
        if is_write and self._video_stream is not None:
            self._flush_writer()

        if self._video_stream is not None:
            try:
                self._video_stream.close()
            except ValueError:
                pass  # stream already closed

        if self._container is not None:
            self._container.close()

        self.request.finish()

    def __enter__(self) -> "PyAVPlugin":
        return super().__enter__()

    # ------------------------------
    # Add-on Interface inside imopen
    # ------------------------------

    def init_video_stream(
        self,
        codec: str,
        *,
        fps: float = 24,
        pixel_format: str = None,
        max_keyframe_interval: int = None,
        force_keyframes: bool = None,
    ) -> None:
        """Initialize a new video stream.

        This function adds a new video stream to the ImageResource using the
        selected encoder (codec), framerate, and colorspace.

        Parameters
        ----------
        codec : str
            The codec to use, e.g. ``"libx264"`` or ``"vp9"``.
        fps : float
            The desired framerate of the video stream (frames per second).
        pixel_format : str
            The pixel format to use while encoding frames. If None (default) use
            the codec's default.
        max_keyframe_interval : int
            The maximum distance between two intra frames (I-frames). Also known
            as GOP size. If unspecified use the codec's default. Note that not
            every I-frame is a keyframe; see the notes for details.
        force_keyframes : bool
            If True, limit inter frames dependency to frames within the current
            keyframe interval (GOP), i.e., force every I-frame to be a keyframe.
            If unspecified, use the codec's default.

        Notes
        -----
        You can usually leave ``max_keyframe_interval`` and ``force_keyframes``
        at their default values, unless you try to generate seek-optimized video
        or have a similar specialist use-case. In this case, ``force_keyframes``
        controls the ability to seek to *every* I-frame, and
        ``max_keyframe_interval`` controls how close to a random frame you can
        seek. Low values allow more fine-grained seeking at the expense of
        file-size (and thus I/O performance).

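        Examples
        --------
        A minimal sketch; the chosen fps and pixel format are assumptions for
        illustration and may be omitted::

            import imageio.v3 as iio

            with iio.imopen("test.mp4", "w", plugin="pyav") as file:
                file.init_video_stream("libx264", fps=30, pixel_format="yuv420p")
                for frame in iio.imiter("imageio:cockatoo.mp4", plugin="pyav"):
                    file.write_frame(frame)
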
843 """
844
845 fps = Fraction.from_float(fps)
846 stream = self._container.add_stream(codec, fps)
847 stream.time_base = Fraction(1 / fps).limit_denominator(int(2**16 - 1))
848 if pixel_format is not None:
849 stream.pix_fmt = pixel_format
850 if max_keyframe_interval is not None:
851 stream.gop_size = max_keyframe_interval
852 if force_keyframes is not None:
853 stream.closed_gop = force_keyframes
854
855 self._video_stream = stream

    def write_frame(self, frame: np.ndarray, *, pixel_format: str = "rgb24") -> None:
        """Add a frame to the video stream.

        This function appends a new frame to the video. It assumes that the
        stream has been initialized previously, i.e., ``init_video_stream`` has
        to be called before calling this function for the write to succeed.

        Parameters
        ----------
        frame : np.ndarray
            The image to be appended/written to the video stream.
        pixel_format : str
            The colorspace (pixel format) of the incoming frame.

        Notes
        -----
        Frames may be held in a buffer, e.g., by the filter pipeline used during
        writing or by FFMPEG to batch them prior to encoding. Make sure to
        ``.close()`` the plugin or to use a context manager to ensure that all
        frames are written to the ImageResource.

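        Examples
        --------
        A minimal sketch; assumes ``file`` is an open writer whose stream was
        initialized via ``init_video_stream`` (see above)::

            import numpy as np

            black_frame = np.zeros((480, 640, 3), dtype=np.uint8)
            file.write_frame(black_frame, pixel_format="rgb24")
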
878 """
879
880 # manual packing of ndarray into frame
881 # (this should live in pyAV, but it doesn't support all the formats we
882 # want and PRs there are slow)
883 pixel_format = av.VideoFormat(pixel_format)
884 img_dtype = _format_to_dtype(pixel_format)
885 width = frame.shape[2 if pixel_format.is_planar else 1]
886 height = frame.shape[1 if pixel_format.is_planar else 0]
887 av_frame = av.VideoFrame(width, height, pixel_format.name)
888 if pixel_format.is_planar:
889 for idx, plane in enumerate(av_frame.planes):
890 plane_array = np.frombuffer(plane, dtype=img_dtype)
891 plane_array = as_strided(
892 plane_array,
893 shape=(plane.height, plane.width),
894 strides=(plane.line_size, img_dtype.itemsize),
895 )
896 plane_array[...] = frame[idx]
897 else:
898 if pixel_format.name.startswith("bayer_"):
899 # ffmpeg doesn't describe bayer formats correctly
900 # see https://github.com/imageio/imageio/issues/761#issuecomment-1059318851
901 # and following for details.
902 n_channels = 1
903 else:
904 n_channels = len(pixel_format.components)
905
906 plane = av_frame.planes[0]
907 plane_shape = (plane.height, plane.width)
908 plane_strides = (plane.line_size, n_channels * img_dtype.itemsize)
909 if n_channels > 1:
910 plane_shape += (n_channels,)
911 plane_strides += (img_dtype.itemsize,)
912
913 plane_array = as_strided(
914 np.frombuffer(plane, dtype=img_dtype),
915 shape=plane_shape,
916 strides=plane_strides,
917 )
918 plane_array[...] = frame
919
920 stream = self._video_stream
921 av_frame.time_base = stream.codec_context.time_base
922 av_frame.pts = self.frames_written
923 self.frames_written += 1
924
925 if self._video_filter is not None:
926 av_frame = self._video_filter.send(av_frame)
927 if av_frame is None:
928 return
929
930 if stream.frames == 0:
931 stream.width = av_frame.width
932 stream.height = av_frame.height
933
934 for packet in stream.encode(av_frame):
935 self._container.mux(packet)

    def set_video_filter(
        self,
        filter_sequence: List[Tuple[str, Union[str, dict]]] = None,
        filter_graph: Tuple[dict, List] = None,
    ) -> None:
        """Set the filter(s) to use.

        This function creates a new FFMPEG filter graph to use when reading or
        writing video. In the case of reading, frames are passed through the
        filter graph before being returned and, in the case of writing, frames
        are passed through the filter before being written to the video.

        Parameters
        ----------
        filter_sequence : List[Tuple[str, Union[str, dict]]]
            If not None, apply the given sequence of FFmpeg filters to each
            ndimage. Check the (module-level) plugin docs for details and
            examples.
        filter_graph : (dict, List)
            If not None, apply the given graph of FFmpeg filters to each
            ndimage. The graph is given as a tuple of a dict and a list. The
            dict contains the (named) nodes, and the list contains the edges
            between those nodes. Check the (module-level) plugin docs for
            details and examples.

        Notes
        -----
        Changing a filter graph that has lag during reading or writing will
        currently cause the frames in the filter queue to be lost.

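        Examples
        --------
        A minimal sketch; downscale frames before encoding them::

            import imageio.v3 as iio

            with iio.imopen("test.mp4", "w", plugin="pyav") as file:
                file.init_video_stream("libx264")
                file.set_video_filter(filter_sequence=[("scale", "320:-1")])
                for frame in iio.imiter("imageio:cockatoo.mp4", plugin="pyav"):
                    file.write_frame(frame)
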
967 """
968
969 if filter_sequence is None and filter_graph is None:
970 self._video_filter = None
971 return
972
973 if filter_sequence is None:
974 filter_sequence = list()
975
976 node_descriptors: Dict[str, Tuple[str, Union[str, Dict]]]
977 edges: List[Tuple[str, str, int, int]]
978 if filter_graph is None:
979 node_descriptors, edges = dict(), [("video_in", "video_out", 0, 0)]
980 else:
981 node_descriptors, edges = filter_graph
982
983 graph = av.filter.Graph()
984
985 previous_node = graph.add_buffer(template=self._video_stream)
986 for filter_name, argument in filter_sequence:
987 if isinstance(argument, str):
988 current_node = graph.add(filter_name, argument)
989 else:
990 current_node = graph.add(filter_name, **argument)
991 previous_node.link_to(current_node)
992 previous_node = current_node
993
994 nodes = dict()
995 nodes["video_in"] = previous_node
996 nodes["video_out"] = graph.add("buffersink")
997 for name, (filter_name, arguments) in node_descriptors.items():
998 if isinstance(arguments, str):
999 nodes[name] = graph.add(filter_name, arguments)
1000 else:
1001 nodes[name] = graph.add(filter_name, **arguments)
1002
1003 for from_note, to_node, out_idx, in_idx in edges:
1004 nodes[from_note].link_to(nodes[to_node], out_idx, in_idx)
1005
1006 graph.configure()
1007
1008 def video_filter():
1009 # this starts a co-routine
1010 # send frames using graph.send()
1011 frame = yield None
1012
1013 # send and receive frames in "parallel"
1014 while frame is not None:
1015 graph.push(frame)
1016 try:
1017 frame = yield graph.pull()
1018 except av.error.BlockingIOError:
1019 # filter has lag and needs more frames
1020 frame = yield None
1021 except av.error.EOFError:
1022 break
1023
1024 try:
1025 # send EOF in av>=9.0
1026 graph.push(None)
1027 except ValueError: # pragma: no cover
1028 # handle av<9.0
1029 pass
1030
1031 # all frames have been sent, empty the filter
1032 while True:
1033 try:
1034 yield graph.pull()
1035 except av.error.EOFError:
1036 break # EOF
1037 except av.error.BlockingIOError: # pragma: no cover
1038 # handle av<9.0
1039 break
1040
1041 self._video_filter = video_filter()
1042 self._video_filter.send(None)

    @property
    def container_metadata(self):
        """Container-specific metadata.

        A dictionary containing metadata stored at the container level.

        """
        return self._container.metadata

    @property
    def video_stream_metadata(self):
        """Stream-specific metadata.

        A dictionary containing metadata stored at the stream level.

        """
        return self._video_stream.metadata

    # -------------------------------
    # Internals and private functions
    # -------------------------------

    def _unpack_frame(self, frame: av.VideoFrame, *, format: str = None) -> np.ndarray:
        """Convert an av.VideoFrame into a ndarray

        Parameters
        ----------
        frame : av.VideoFrame
            The frame to unpack.
        format : str
            If not None, convert the frame to the given format before unpacking.

        """

        if format is not None:
            frame = frame.reformat(format=format)

        dtype = _format_to_dtype(frame.format)
        shape = _get_frame_shape(frame)

        planes = list()
        for idx in range(len(frame.planes)):
            n_channels = sum(
                [
                    x.bits // (dtype.itemsize * 8)
                    for x in frame.format.components
                    if x.plane == idx
                ]
            )
            av_plane = frame.planes[idx]
            plane_shape = (av_plane.height, av_plane.width)
            plane_strides = (av_plane.line_size, n_channels * dtype.itemsize)
            if n_channels > 1:
                plane_shape += (n_channels,)
                plane_strides += (dtype.itemsize,)

            np_plane = as_strided(
                np.frombuffer(av_plane, dtype=dtype),
                shape=plane_shape,
                strides=plane_strides,
            )
            planes.append(np_plane)

        if len(planes) > 1:
            # Note: the planes *should* exist inside a contiguous memory block
            # somewhere inside av.Frame; however, pyAV does not appear to expose
            # this, so we are forced to copy the planes individually instead of
            # wrapping them :(
            out = np.concatenate(planes).reshape(shape)
        else:
            out = planes[0]

        return out

    def _seek(self, index, *, constant_framerate: bool = True) -> None:
        """Seek to the frame at the given index."""

        if index == self._next_idx:
            return  # fast path :)

        # we must decode at least once before we seek; otherwise the
        # returned frames become corrupt.
        if self._next_idx == 0:
            next(self._decoder)
            self._next_idx += 1

            if index == self._next_idx:
                return  # fast path :)

        # remove this branch until I find a way to efficiently find the next
        # keyframe. keeping this as a reminder
        # if self._next_idx < index and index < self._next_keyframe_idx:
        #     frames_to_yield = index - self._next_idx
        if not constant_framerate and index > self._next_idx:
            frames_to_yield = index - self._next_idx
        elif not constant_framerate:
            # seeking backwards and we can't link idx and pts
            self._container.seek(0)
            self._decoder = self._container.decode(video=0)
            self._next_idx = 0

            frames_to_yield = index
        else:
            # we know that the time between consecutive frames is constant,
            # hence we can link index and pts

            # how many pts lie between two frames
            sec_delta = 1 / self._video_stream.guessed_rate
            pts_delta = sec_delta / self._video_stream.time_base

            index_pts = int(index * pts_delta)

            # this only seeks to the closest (preceding) keyframe
            self._container.seek(index_pts, stream=self._video_stream)
            self._decoder = self._container.decode(video=0)

            # this may be made faster if we could get the keyframe's time
            # without decoding it
            keyframe = next(self._decoder)
            keyframe_time = keyframe.pts * keyframe.time_base
            keyframe_pts = int(keyframe_time / self._video_stream.time_base)
            keyframe_index = keyframe_pts // pts_delta

            self._container.seek(index_pts, stream=self._video_stream)
            self._next_idx = keyframe_index

            frames_to_yield = index - keyframe_index

        for _ in range(frames_to_yield):
            next(self._decoder)
            self._next_idx += 1

    def _flush_writer(self):
        """Flush the filter and encoder

        This will reset the filter to ``None`` and send EoF to the encoder,
        i.e., after calling this, no more frames may be written.

        """

        stream = self._video_stream

        if self._video_filter is not None:
            # flush the filter and encode any remaining frames
            for av_frame in self._video_filter:
                if stream.frames == 0:
                    stream.width = av_frame.width
                    stream.height = av_frame.height
                for packet in stream.encode(av_frame):
                    self._container.mux(packet)
            self._video_filter = None

        # flush the encoder
        for packet in stream.encode():
            self._container.mux(packet)
        self._video_stream = None