##### Copyright 2022 The TensorFlow Authors.

In [1]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

<table class="tfo-notebook-buttons" align="left">
  <td>     <a target="_blank" href="https://tensorflow.google.cn/tutorials/video/transfer_learning_with_movinet"><img src="https://tensorflow.google.cn/images/tf_logo_32px.png">View on TensorFlow.org</a>
</td>
  <td>     <a target="_blank" href="https://colab.research.google.com/github/tensorflow/docs-l10n/blob/master/site/zh-cn/tutorials/video/transfer_learning_with_movinet.ipynb"><img src="https://tensorflow.google.cn/images/colab_logo_32px.png">Run in Google Colab</a>
</td>
  <td>     <a target="_blank" href="https://github.com/tensorflow/docs-l10n/blob/master/site/zh-cn/tutorials/video/transfer_learning_with_movinet.ipynb"><img src="https://tensorflow.google.cn/images/GitHub-Mark-32px.png">View source on GitHub</a>
</td>
  <td>     <a href="https://storage.googleapis.com/tensorflow_docs/docs-l10n/site/zh-cn/tutorials/video/transfer_learning_with_movinet.ipynb"><img src="https://tensorflow.google.cn/images/download_logo_32px.png">Download notebook</a>
</td>
</table>

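# Transfer learning for video classification with MoViNet

MoViNets (Mobile Video Networks) provide a family of efficient video classification models that support inference on streaming video. In this tutorial, you will use a pre-trained MoViNet model to classify videos from the [UCF101 dataset](https://www.crcv.ucf.edu/data/UCF101.php), specifically for an action recognition task. A pre-trained model is a saved network that was previously trained on a larger dataset. You can find more details about MoViNets in the paper [MoViNets: Mobile Video Networks for Efficient Video Recognition](https://arxiv.org/abs/2103.11511) by Kondratyuk, D. et al. (2021). In this tutorial, you will:

- Learn how to download a pre-trained MoViNet model
- Create a new model from the pre-trained model with a new classifier by freezing the convolutional base of the MoViNet model
- Replace the classifier head so that its output size matches the number of labels in the new dataset
- Perform transfer learning on the [UCF101 dataset](https://www.crcv.ucf.edu/data/UCF101.php)

The model downloaded in this tutorial is from [official/projects/movinet](https://github.com/tensorflow/models/tree/master/official/projects/movinet). This repository contains a collection of MoViNet models that TF Hub uses in the TensorFlow 2 SavedModel format.

This transfer learning tutorial is part of a series of TensorFlow video tutorials. Here are the other three tutorials:

- [Load video data](https://tensorflow.google.cn/tutorials/load_data/video): This tutorial explains much of the code used in this document; in particular, it explains in more detail how to preprocess and load data through the `FrameGenerator` class.
- [Build a 3D CNN model for video classification](https://tensorflow.google.cn/tutorials/video/video_classification). Note that the linked tutorial uses a (2+1)D CNN that decomposes the spatial and temporal aspects of 3D data; if you are using volumetric data such as an MRI scan, consider using a 3D CNN instead of a (2+1)D CNN.
- [MoViNet for streaming action recognition](https://tensorflow.google.cn/hub/tutorials/movinet): Get familiar with the MoViNet models that are available on TF Hub.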

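## Setup

Begin by installing and importing some necessary libraries, including: [remotezip](https://github.com/gtsystem/python-remotezip) to inspect the contents of a ZIP file, [tqdm](https://github.com/tqdm/tqdm) to display a progress bar, [OpenCV](https://opencv.org/) to process video files (make sure that `opencv-python` and `opencv-python-headless` are the same version), and TensorFlow models ([`tf-models-official`](https://github.com/tensorflow/models/tree/master/official)) to download the pre-trained MoViNet model. The TensorFlow models package is a collection of models that use TensorFlow's high-level APIs.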

In [2]:
!pip install remotezip tqdm opencv-python==4.5.2.52 opencv-python-headless==4.5.2.52 tf-models-official

Successfully installed Cython-3.0.5 colorama-0.4.6 gin-config-0.5.0 google-api-core-2.12.0 google-api-python-client-2.107.0 google-auth-httplib2-0.1.1 google-auth-oauthlib-1.0.0 httplib2-0.22.0 immutabledict-3.0.0 kaggle-1.5.16 keras-2.14.0 lxml-4.9.3 oauth2client-4.1.3 opencv-python-4.5.2.52 opencv-python-headless-4.5.2.52 portalocker-2.8.2 py-cpuinfo-9.0.0 pycocotools-2.0.7 python-slugify-8.0.1 regex-2023.10.3 remotezip-0.12.1 sacrebleu-2.3.2 sentencepiece-0.1.99 seqeval-1.2.2 tabulate-0.9.0 tensorboard-2.14.1 tensorflow-2.14.0 tensorflow-estimator-2.14.0 tensorflow-model-optimization-0.7.5 tensorflow-text-2.14.0 text-unidecode-1.3 tf-models-official-2.14.2 tf-slim-1.1.0 uritemplate-4.1.1
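
As a quick sanity check on the version requirement mentioned in this section, a sketch like the following could compare the installed versions of the two OpenCV packages. It relies only on the standard-library `importlib.metadata` module; the helper names `installed_version` and `versions_match` are hypothetical and are not part of this tutorial's code.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


def versions_match(first, second):
    """Return True only when both packages are installed at the same version."""
    v1 = installed_version(first)
    v2 = installed_version(second)
    return v1 is not None and v1 == v2


# Report what is installed in the current environment.
print(installed_version("opencv-python"),
      installed_version("opencv-python-headless"))
print(versions_match("opencv-python", "opencv-python-headless"))
```

If the two versions differ, reinstalling both packages with the same pinned version, as in the `pip install` command of this section, is one way to bring them back in line.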


In [3]:
import tqdm
import random
import pathlib
import itertools
import collections

import cv2
import numpy as np
import remotezip as rz
import seaborn as sns
import matplotlib.pyplot as plt

import keras
import tensorflow as tf
import tensorflow_hub as hub
from tensorflow.keras import layers
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import SparseCategoricalCrossentropy

# Import the MoViNet modules from TensorFlow Models (tf-models-official)
from official.projects.movinet.modeling import movinet
from official.projects.movinet.modeling import movinet_model

2023-11-07 17:29:26.513969: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-07 17:29:26.514019: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-07 17:29:26.514050: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


## Load data

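The hidden cell below defines helper functions that download a slice of data from the UCF-101 dataset and load it into a `tf.data.Dataset`. The [Loading video data tutorial](https://tensorflow.google.cn/tutorials/load_data/video) walks through this code in detail.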
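
To make the download helpers easier to follow, here is a minimal sketch of the class grouping and split logic, run on made-up file names. It assumes that archive entries follow the UCF101 naming pattern `v_<Class>_g<group>_c<clip>.avi` inside a `UCF101/` folder; the file names are invented for illustration and nothing is downloaded.

```python
import collections


def get_class(fname):
    # 'UCF101/v_Archery_g01_c01.avi' -> 'Archery'
    return fname.split('_')[-3]


def split_class_lists(files_for_class, count):
    # Take the first `count` files of every class and
    # return the remaining files for later splits.
    split_files, remainder = [], {}
    for cls, names in files_for_class.items():
        split_files.extend(names[:count])
        remainder[cls] = names[count:]
    return split_files, remainder


# Made-up file names: 2 classes with 5 clips each (illustration only).
files = [f"UCF101/v_{cls}_g{g:02d}_c01.avi"
         for cls in ("Archery", "Biking") for g in range(1, 6)]

# Group the file names by the class encoded in each name.
files_for_class = collections.defaultdict(list)
for fname in files:
    files_for_class[get_class(fname)].append(fname)

# Each split consumes files from what the previous split left over,
# so no file can land in both the training and the test split.
train_files, rest = split_class_lists(files_for_class, 3)
test_files, rest = split_class_lists(rest, 2)
print(len(train_files), len(test_files))
```

Because every split draws from the remainder of the previous one, the splits stay disjoint by construction.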

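The `FrameGenerator` class at the end of the hidden block is the most important utility here. It creates an iterable object that can feed data into the TensorFlow data pipeline. Specifically, this class contains a Python generator that loads the video frames along with their encoded label. The generator (`__call__`) function yields the frame array produced by `frames_from_video_file` and the integer-encoded label associated with that set of frames.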
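
The callable-generator pattern described above can be sketched without any video files. The class below, `ToyFrameGenerator`, is a hypothetical stand-in that keeps its "videos" in memory as plain lists; only the label encoding and the `__call__` protocol mirror `FrameGenerator`.

```python
import random


class ToyFrameGenerator:
    """Hypothetical in-memory stand-in for FrameGenerator (illustration only)."""

    def __init__(self, videos_by_class, training=False):
        # videos_by_class maps a class name to a list of "videos";
        # each video is just a list of frame placeholders here.
        self.videos_by_class = videos_by_class
        self.training = training
        # Sorted class names give every class a stable integer id.
        self.class_names = sorted(videos_by_class)
        self.class_ids_for_name = {
            name: idx for idx, name in enumerate(self.class_names)}

    def __call__(self):
        pairs = [(video, name)
                 for name, videos in self.videos_by_class.items()
                 for video in videos]
        if self.training:
            random.shuffle(pairs)  # Shuffle only when building training data
        for video, name in pairs:
            yield video, self.class_ids_for_name[name]


toy = ToyFrameGenerator({
    "Archery": [["f0", "f1"]],
    "ApplyLipstick": [["f0", "f1"], ["f2", "f3"]],
})
labels = [label for _, label in toy()]
print(toy.class_ids_for_name, labels)
```

Each call to the instance returns a fresh generator, which is what allows an object like this to be passed to `tf.data.Dataset.from_generator` and iterated once per epoch.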


In [4]:
#@title 

def list_files_per_class(zip_url):
  """
    List the files in the dataset zip archive at the given URL.

    Args:
      zip_url: URL of the remote zip archive.

    Return:
      files: List of the file names contained in the archive.
  """
  files = []
  with rz.RemoteZip(zip_url) as zip:
    for zip_info in zip.infolist():
      files.append(zip_info.filename)
  return files

def get_class(fname):
  """
    Retrieve the name of the class given a filename.

    Args:
      fname: Name of the file in the UCF101 dataset.

    Return:
      Class that the file belongs to.
  """
  return fname.split('_')[-3]

def get_files_per_class(files):
  """
    Retrieve the files that belong to each class. 

    Args:
      files: List of files in the dataset.

    Return:
      Dictionary of class names (key) and files (values).
  """
  files_for_class = collections.defaultdict(list)
  for fname in files:
    class_name = get_class(fname)
    files_for_class[class_name].append(fname)
  return files_for_class

def download_from_zip(zip_url, to_dir, file_names):
  """
    Download the contents of the zip file from the zip URL.

    Args:
      zip_url: Zip URL containing data.
      to_dir: Directory to download data to.
      file_names: Names of files to download.
  """
  with rz.RemoteZip(zip_url) as zip:
    for fn in tqdm.tqdm(file_names):
      class_name = get_class(fn)
      zip.extract(fn, str(to_dir / class_name))
      unzipped_file = to_dir / class_name / fn

      fn = pathlib.Path(fn).parts[-1]
      output_file = to_dir / class_name / fn
      unzipped_file.rename(output_file)

def split_class_lists(files_for_class, count):
  """
    Returns the list of files belonging to a subset of data as well as the remainder of
    files that need to be downloaded.

    Args:
      files_for_class: Files belonging to a particular class of data.
      count: Number of files to download.

    Return:
      split_files: Files belonging to the subset of data.
      remainder: Dictionary of the remainder of files that need to be downloaded.
  """
  split_files = []
  remainder = {}
  for cls in files_for_class:
    split_files.extend(files_for_class[cls][:count])
    remainder[cls] = files_for_class[cls][count:]
  return split_files, remainder

def download_ufc_101_subset(zip_url, num_classes, splits, download_dir):
  """
    Download a subset of the UCF101 dataset and split it into various parts, such as
    training, validation, and test. 

    Args:
      zip_url: Zip URL containing data.
      num_classes: Number of labels.
      splits: Dictionary specifying the training, validation, test, etc. (key) division of data 
              (value is number of files per split).
      download_dir: Directory to download data to.

    Return:
      dirs: Dictionary mapping each split name to the path of the directory containing that split's data.
  """
  files = list_files_per_class(zip_url)
  # Keep only the video files; this drops directory entries that have no filename.
  # (Filtering into a new list avoids removing items from a list while iterating over it.)
  files = [f for f in files if f.endswith('.avi')]

  files_for_class = get_files_per_class(files)

  classes = list(files_for_class.keys())[:num_classes]

  for cls in classes:
    new_files_for_class = files_for_class[cls]
    random.shuffle(new_files_for_class)
    files_for_class[cls] = new_files_for_class

  # Only use the number of classes you want in the dictionary
  files_for_class = {x: files_for_class[x] for x in list(files_for_class)[:num_classes]}

  dirs = {}
  for split_name, split_count in splits.items():
    print(split_name, ":")
    split_dir = download_dir / split_name
    split_files, files_for_class = split_class_lists(files_for_class, split_count)
    download_from_zip(zip_url, split_dir, split_files)
    dirs[split_name] = split_dir

  return dirs

def format_frames(frame, output_size):
  """
    Pad and resize an image from a video.

    Args:
      frame: Image that needs to resized and padded. 
      output_size: Pixel size of the output frame image.

    Return:
      Formatted frame with padding of specified output size.
  """
  frame = tf.image.convert_image_dtype(frame, tf.float32)
  frame = tf.image.resize_with_pad(frame, *output_size)
  return frame

def frames_from_video_file(video_path, n_frames, output_size = (224,224), frame_step = 15):
  """
    Creates frames from each video file present for each category.

    Args:
      video_path: File path to the video.
      n_frames: Number of frames to be created per video file.
      output_size: Pixel size of the output frame image.
      frame_step: Number of source frames to advance between sampled frames.

    Return:
      A NumPy array of frames in the shape of (n_frames, height, width, channels).
  """
  # Read each video frame by frame
  result = []
  src = cv2.VideoCapture(str(video_path))  

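  # OpenCV reports the frame count as a float, so cast it to an int
  video_length = int(src.get(cv2.CAP_PROP_FRAME_COUNT))

  # Number of source frames spanned by n_frames samples taken every frame_step frames
  need_length = 1 + (n_frames - 1) * frame_step

  if need_length > video_length:
    start = 0
  else:
    # random.randint is inclusive on both ends, so max_start is the last valid start
    max_start = video_length - need_length
    start = random.randint(0, max_start)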

  src.set(cv2.CAP_PROP_POS_FRAMES, start)
  # ret is a boolean indicating whether read was successful, frame is the image itself
  ret, frame = src.read()
  result.append(format_frames(frame, output_size))

  for _ in range(n_frames - 1):
    for _ in range(frame_step):
      ret, frame = src.read()
    if ret:
      frame = format_frames(frame, output_size)
      result.append(frame)
    else:
      result.append(np.zeros_like(result[0]))
  src.release()
  # OpenCV decodes frames as BGR; reorder the channels to RGB
  result = np.array(result)[..., [2, 1, 0]]

  return result

class FrameGenerator:
  def __init__(self, path, n_frames, training = False):
    """ Returns a set of frames with their associated label. 

      Args:
        path: Video file paths.
        n_frames: Number of frames. 
        training: Boolean to determine if training dataset is being created.
    """
    self.path = path
    self.n_frames = n_frames
    self.training = training
    self.class_names = sorted(set(p.name for p in self.path.iterdir() if p.is_dir()))
    self.class_ids_for_name = dict((name, idx) for idx, name in enumerate(self.class_names))

  def get_files_and_class_names(self):
    video_paths = list(self.path.glob('*/*.avi'))
    classes = [p.parent.name for p in video_paths] 
    return video_paths, classes

  def __call__(self):
    video_paths, classes = self.get_files_and_class_names()

    pairs = list(zip(video_paths, classes))

    if self.training:
      random.shuffle(pairs)

    for path, name in pairs:
      video_frames = frames_from_video_file(path, self.n_frames) 
      label = self.class_ids_for_name[name] # Encode labels
      yield video_frames, label
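
The frame sampling arithmetic in `frames_from_video_file` can be checked in isolation. The sketch below is a simplified model, not the tutorial's code: `sampled_frame_indices` and `max_valid_start` are hypothetical helpers that work on plain integers instead of a video file, and `None` stands for a frame that falls past the end of the video (the real function appends an all-zero frame in that case).

```python
def sampled_frame_indices(video_length, n_frames, frame_step=15, start=0):
    """Return the source frame index of each sampled frame.

    An entry is None when the index falls past the end of the video.
    """
    indices = []
    for i in range(n_frames):
        idx = start + i * frame_step
        indices.append(idx if idx < video_length else None)
    return indices


def max_valid_start(video_length, n_frames, frame_step=15):
    """Return the last start index that keeps every sample inside the video."""
    need_length = 1 + (n_frames - 1) * frame_step
    return max(0, video_length - need_length)


# A 100-frame video, 4 samples, 15 frames apart: 46 source frames are needed.
last_start = max_valid_start(100, 4, 15)
print(last_start, sampled_frame_indices(100, 4, 15, start=last_start))

# Asking for 8 samples needs 106 source frames, more than the video has,
# so sampling starts at frame 0 and the final sample is missing.
print(sampled_frame_indices(100, 8, 15, start=0))
```

The first case shows why `need_length` is defined as `1 + (n_frames - 1) * frame_step`: starting at `video_length - need_length` places the final sample exactly on the last frame of the video.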

In [5]:
URL = 'https://storage.googleapis.com/thumos14_files/UCF101_videos.zip'
download_dir = pathlib.Path('./UCF101_subset/')
subset_paths = download_ufc_101_subset(URL, 
                        num_classes = 10, 
                        splits = {"train": 30, "test": 20}, 
                        download_dir = download_dir)

train :


100%|██████████| 300/300 [00:29<00:00, 10.20it/s]




test :


 61%|██████    | 122/200 [00:13<00:08,  8.98it/s]

 62%|██████▏   | 123/200 [00:13<00:09,  8.20it/s]

 62%|██████▏   | 124/200 [00:14<00:09,  7.97it/s]

 62%|██████▎   | 125/200 [00:14<00:09,  7.53it/s]

 63%|██████▎   | 126/200 [00:14<00:11,  6.72it/s]

 64%|██████▎   | 127/200 [00:14<00:10,  7.00it/s]

 64%|██████▍   | 128/200 [00:14<00:09,  7.58it/s]

 64%|██████▍   | 129/200 [00:14<00:10,  6.85it/s]

 66%|██████▌   | 131/200 [00:15<00:09,  7.36it/s]

 66%|██████▌   | 132/200 [00:15<00:08,  7.62it/s]

 67%|██████▋   | 134/200 [00:15<00:07,  8.81it/s]

 68%|██████▊   | 136/200 [00:15<00:07,  8.70it/s]

 69%|██████▉   | 138/200 [00:15<00:06,  9.69it/s]

 70%|███████   | 140/200 [00:15<00:05, 10.43it/s]

 71%|███████   | 142/200 [00:16<00:05,  9.88it/s]

 72%|███████▏  | 144/200 [00:16<00:06,  8.71it/s]

 72%|███████▎  | 145/200 [00:16<00:06,  8.82it/s]

 73%|███████▎  | 146/200 [00:16<00:06,  8.82it/s]

 74%|███████▎  | 147/200 [00:16<00:06,  8.17it/s]

 74%|███████▍  | 149/200 [00:16<00:05,  8.85it/s]

 75%|███████▌  | 150/200 [00:17<00:06,  7.95it/s]

 76%|███████▌  | 151/200 [00:17<00:06,  7.71it/s]

 76%|███████▋  | 153/200 [00:17<00:05,  8.76it/s]

 77%|███████▋  | 154/200 [00:17<00:05,  8.91it/s]

 78%|███████▊  | 155/200 [00:17<00:05,  8.37it/s]

 78%|███████▊  | 157/200 [00:17<00:04,  9.74it/s]

 80%|███████▉  | 159/200 [00:17<00:03, 11.38it/s]

 80%|████████  | 161/200 [00:18<00:03, 10.70it/s]

 82%|████████▏ | 163/200 [00:18<00:03,  9.36it/s]

 82%|████████▏ | 164/200 [00:18<00:03,  9.24it/s]

 82%|████████▎ | 165/200 [00:18<00:04,  8.36it/s]

 84%|████████▎ | 167/200 [00:18<00:03,  9.78it/s]

 84%|████████▍ | 169/200 [00:19<00:03,  9.23it/s]

 86%|████████▌ | 171/200 [00:19<00:03,  9.30it/s]

 86%|████████▌ | 172/200 [00:19<00:02,  9.39it/s]

 86%|████████▋ | 173/200 [00:19<00:03,  8.32it/s]

 87%|████████▋ | 174/200 [00:19<00:03,  8.50it/s]

 88%|████████▊ | 175/200 [00:19<00:03,  8.03it/s]

 88%|████████▊ | 176/200 [00:20<00:03,  7.69it/s]

 89%|████████▉ | 178/200 [00:20<00:02,  9.02it/s]

 90%|█████████ | 180/200 [00:20<00:01, 10.34it/s]

 91%|█████████ | 182/200 [00:20<00:02,  8.42it/s]

 92%|█████████▏| 183/200 [00:20<00:02,  8.10it/s]

 92%|█████████▏| 184/200 [00:20<00:02,  7.97it/s]

 92%|█████████▎| 185/200 [00:21<00:01,  8.27it/s]

 94%|█████████▎| 187/200 [00:21<00:01,  8.90it/s]

 94%|█████████▍| 188/200 [00:21<00:01,  7.71it/s]

 94%|█████████▍| 189/200 [00:21<00:01,  7.74it/s]

 95%|█████████▌| 190/200 [00:21<00:01,  7.26it/s]

 96%|█████████▌| 191/200 [00:21<00:01,  6.59it/s]

 96%|█████████▌| 192/200 [00:22<00:01,  6.83it/s]

 96%|█████████▋| 193/200 [00:22<00:01,  6.40it/s]

 98%|█████████▊| 195/200 [00:22<00:00,  7.61it/s]

 98%|█████████▊| 196/200 [00:22<00:00,  7.89it/s]

 98%|█████████▊| 197/200 [00:22<00:00,  7.54it/s]

100%|█████████▉| 199/200 [00:22<00:00,  8.13it/s]

100%|██████████| 200/200 [00:23<00:00,  8.49it/s]

100%|██████████| 200/200 [00:23<00:00,  8.69it/s]




Create the training and test datasets:

In [6]:
batch_size = 8
num_frames = 8

output_signature = (tf.TensorSpec(shape = (None, None, None, 3), dtype = tf.float32),
                    tf.TensorSpec(shape = (), dtype = tf.int16))

train_ds = tf.data.Dataset.from_generator(FrameGenerator(subset_paths['train'], num_frames, training = True),
                                          output_signature = output_signature)
train_ds = train_ds.batch(batch_size)

test_ds = tf.data.Dataset.from_generator(FrameGenerator(subset_paths['test'], num_frames),
                                         output_signature = output_signature)
test_ds = test_ds.batch(batch_size)
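
The step counts that show up later in the training and evaluation logs follow from this batching. As a rough illustration in plain Python (not the `tf.data` implementation), the sketch below groups a generator's items into batches of `batch_size` and keeps a final partial batch, which is what `Dataset.batch` does by default. The counts of 300 training and 200 test videos are taken from the download progress bars above; `batched` is a hypothetical helper name.

```python
import itertools
import math

def batched(generator, batch_size):
    """Yield lists of up to batch_size items; the last batch may be smaller."""
    iterator = iter(generator)
    while True:
        batch = list(itertools.islice(iterator, batch_size))
        if not batch:
            return
        yield batch

batch_size = 8
train_batches = list(batched(range(300), batch_size))  # stand-in for 300 training videos
test_batches = list(batched(range(200), batch_size))   # stand-in for 200 test videos

print(len(train_batches), len(train_batches[-1]))  # 38 batches, the last holds 4 items
print(len(test_batches), len(test_batches[-1]))    # 25 batches, the last holds 8 items
assert len(train_batches) == math.ceil(300 / batch_size)
```

A generator-backed dataset has no known length up front, which is most likely why Keras reports the step count as `Unknown` until it has completed one full pass.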



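The labels generated here represent the encoding of the classes. For instance, 'ApplyEyeMakeup' is mapped to an integer. Take a look at the labels of the training data to make sure that the dataset has been sufficiently shuffled.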

In [7]:
for frames, labels in train_ds.take(10):
  print(labels)

tf.Tensor([2 9 2 5 4 9 7 9], shape=(8,), dtype=int16)


tf.Tensor([8 8 2 4 9 2 9 2], shape=(8,), dtype=int16)


tf.Tensor([7 1 8 1 2 6 0 6], shape=(8,), dtype=int16)


tf.Tensor([0 5 2 1 4 7 1 2], shape=(8,), dtype=int16)


tf.Tensor([8 3 1 4 7 1 4 0], shape=(8,), dtype=int16)


tf.Tensor([2 4 5 8 6 2 0 4], shape=(8,), dtype=int16)


tf.Tensor([2 2 1 5 6 1 1 8], shape=(8,), dtype=int16)


tf.Tensor([1 1 8 0 5 8 0 2], shape=(8,), dtype=int16)


tf.Tensor([0 4 6 6 2 9 6 4], shape=(8,), dtype=int16)


tf.Tensor([7 3 5 3 6 7 9 0], shape=(8,), dtype=int16)
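
The integers printed above can be mapped back to class names. The sketch below shows one plausible encoding scheme, assuming that class names are sorted alphabetically and numbered in that order. The four names are an illustrative subset, and `name_for_class_id` and `batch_labels` are hypothetical names introduced here.

```python
# Illustrative class names only; the real list comes from the dataset folders.
class_names = sorted(["Archery", "ApplyEyeMakeup", "BabyCrawling", "ApplyLipstick"])

# Assumption: each class receives the index of its position in the sorted list.
class_ids_for_name = {name: idx for idx, name in enumerate(class_names)}
name_for_class_id = {idx: name for name, idx in class_ids_for_name.items()}

batch_labels = [2, 0, 3, 1]  # a hypothetical batch of integer labels
print(class_ids_for_name["ApplyEyeMakeup"])          # 0
print([name_for_class_id[i] for i in batch_labels])
# ['Archery', 'ApplyEyeMakeup', 'BabyCrawling', 'ApplyLipstick']
```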


Take a look at the shape of the data.

In [8]:
print(f"Shape: {frames.shape}")
print(f"Label: {labels.shape}")

Shape: (8, 8, 224, 224, 3)
Label: (8,)
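
The five axes of the frames tensor are batch, frames, height, width, and channels. As a quick back-of-the-envelope check, the sketch below estimates how much memory one such batch occupies, assuming the `float32` dtype declared in `output_signature`.

```python
import math

# Shape printed above: (batch, frames, height, width, channels)
shape = (8, 8, 224, 224, 3)
bytes_per_value = 4  # float32

num_values = math.prod(shape)
size_mib = num_values * bytes_per_value / 2**20

print(num_values)  # 9633792
print(size_mib)    # 36.75
```

Doubling either `batch_size` or `num_frames` doubles this figure, which is worth keeping in mind when adapting the pipeline to longer clips.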


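## What are MoViNets?

As mentioned previously, [MoViNets](https://arxiv.org/abs/2103.11511) are video classification models used for streaming video or for online inference in tasks such as action recognition. Consider using MoViNets to classify your video data for action recognition.

A classifier based on 2D frames is efficient and simple to run over whole videos, or in streaming mode one frame at a time. Because such classifiers cannot take temporal context into account, their accuracy is limited and they may give inconsistent outputs from frame to frame.

A simple 3D CNN uses bidirectional temporal context, which can increase accuracy and temporal consistency. These networks may require more resources, and because they look into the future they cannot be used for streaming data.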

![Standard convolution](https://tensorflow.google.cn/images/tutorials/video/standard_convolution.png)

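The MoViNet architecture uses 3D convolutions that are "causal" along the time axis (like `layers.Conv1D` with `padding="causal"`). This gives some of the advantages of both approaches, mainly that it allows for efficient streaming.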

![Causal convolution](https://tensorflow.google.cn/images/tutorials/video/causal_convolution.png)
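
To make "causal" concrete, here is a minimal pure-Python sketch of a one-dimensional causal convolution. It is a simplified stand-in for the 3D case, not MoViNet code, and `causal_conv1d`, the kernel, and the signal values are all hypothetical. Padding only on the left means that each output sees the current and earlier samples, never later ones.

```python
def causal_conv1d(inputs, kernel):
    """1D causal convolution: output[t] uses only inputs[t-k+1 .. t]."""
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(inputs)  # pad on the left only
    return [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(inputs))
    ]

kernel = [0.5, 0.3, 0.2]
signal = [1.0, 2.0, 3.0, 4.0, 5.0]
changed = signal[:3] + [40.0, 50.0]  # alter only the "future" samples

out_a = causal_conv1d(signal, kernel)
out_b = causal_conv1d(changed, kernel)
print(out_a[:3] == out_b[:3])  # True: earlier outputs ignore later inputs
print(out_a[3:] == out_b[3:])  # False: later outputs do change
```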

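Causal convolution ensures that the output at time *t* is computed using only inputs up to time *t*. To demonstrate how this can make streaming more efficient, start with a simpler example you may be more familiar with: an RNN. The RNN passes state forward through time: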

![RNN model](https://tensorflow.google.cn/images/tutorials/video/rnn_comparison.png)

In [9]:
gru = layers.GRU(units=4, return_sequences=True, return_state=True)

inputs = tf.random.normal(shape=[1, 10, 8]) # (batch, sequence, channels)

result, state = gru(inputs) # Run it all at once

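By setting the RNN's `return_state=True` argument, you ask it to return the state at the end of the computation. This allows you to pause and then continue where you left off, to get exactly the same result: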

![States passing in RNNs](https://tensorflow.google.cn/images/tutorials/video/rnn_state_passing.png)

In [10]:
first_half, state = gru(inputs[:, :5, :])   # run the first half, and capture the state
second_half, _ = gru(inputs[:,5:, :], initial_state=state)  # Use the state to continue where you left off.

print(np.allclose(result[:, :5,:], first_half))
print(np.allclose(result[:, 5:,:], second_half))

True
True


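Causal convolutions can be used the same way, if handled with care. This technique was used in the [Fast Wavenet Generation Algorithm](https://arxiv.org/abs/1611.09482) by Le Paine et al. In the [MoViNet paper](https://arxiv.org/abs/2103.11511), the `state` is referred to as the "Stream Buffer".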

![States passed in causal convolution](https://tensorflow.google.cn/images/tutorials/video/causal_conv_states.png)

By passing this small amount of state forward, you can avoid recalculating the whole receptive field shown above.
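
The same pause-and-resume idea can be sketched for a single causal convolution in plain Python. This is a simplified single-layer analogue under stated assumptions, not the MoViNet stream buffer implementation: `causal_conv_chunk` is a hypothetical helper, and the buffer here simply holds the last `k - 1` inputs.

```python
def causal_conv_chunk(chunk, kernel, buffer):
    """Convolve one chunk causally, given the last k-1 inputs seen so far."""
    k = len(kernel)
    padded = list(buffer) + list(chunk)
    outputs = [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(chunk))
    ]
    new_buffer = padded[len(padded) - (k - 1):]  # carry the most recent k-1 inputs
    return outputs, new_buffer

kernel = [0.5, 0.3, 0.2]
signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
empty = [0.0] * (len(kernel) - 1)  # zero history, equivalent to causal padding

# Process the whole sequence at once.
full, _ = causal_conv_chunk(signal, kernel, empty)

# Process it in two halves, passing the buffer forward.
first, buffer = causal_conv_chunk(signal[:3], kernel, empty)
second, _ = causal_conv_chunk(signal[3:], kernel, buffer)

print(first + second == full)  # True
```

Only `k - 1` values cross the chunk boundary, so the cost of resuming does not grow with the length of the video already processed.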

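## Download a pre-trained MoViNet model

In this section, you will:

1. Create a MoViNet model using the open source code provided in [`official/projects/movinet`](https://github.com/tensorflow/models/tree/master/official/projects/movinet) from TensorFlow models.
2. Load the pre-trained weights.
3. Freeze the convolutional base, that is, all layers except the final classifier head, to speed up fine-tuning.

To build the model, you can start with the `a0` configuration because it is the fastest to train when benchmarked against the other models. Check out the [available MoViNet models on TensorFlow Model Garden](https://github.com/tensorflow/models/blob/master/official/projects/movinet/configs/movinet.py) to find what might work for your use case.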

In [11]:
model_id = 'a0'
resolution = 224

tf.keras.backend.clear_session()

backbone = movinet.Movinet(model_id=model_id)
backbone.trainable = False

# Set num_classes=600 to load the pre-trained weights from the original model
model = movinet_model.MovinetClassifier(backbone=backbone, num_classes=600)
model.build([None, None, None, None, 3])

# Load pre-trained weights
!wget https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a0_base.tar.gz -O movinet_a0_base.tar.gz -q
!tar -xvf movinet_a0_base.tar.gz

checkpoint_dir = f'movinet_{model_id}_base'
checkpoint_path = tf.train.latest_checkpoint(checkpoint_dir)
checkpoint = tf.train.Checkpoint(model=model)
status = checkpoint.restore(checkpoint_path)
status.assert_existing_objects_matched()

movinet_a0_base/
movinet_a0_base/checkpoint
movinet_a0_base/ckpt-1.data-00000-of-00001
movinet_a0_base/ckpt-1.index


<tensorflow.python.checkpoint.checkpoint.CheckpointLoadStatus at 0x7f120079dfd0>

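To build a classifier, create a function, `build_classifier`, that takes the backbone and the number of classes in the dataset, along with the input dimensions. In this case, the new classifier produces `num_classes` outputs (10 classes for this subset of UCF101).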

In [12]:
def build_classifier(batch_size, num_frames, resolution, backbone, num_classes):
  """Builds a classifier on top of a backbone model."""
  model = movinet_model.MovinetClassifier(
      backbone=backbone,
      num_classes=num_classes)
  model.build([batch_size, num_frames, resolution, resolution, 3])

  return model

In [13]:
model = build_classifier(batch_size, num_frames, resolution, backbone, 10)

For this tutorial, choose the `tf.keras.optimizers.Adam` optimizer and the `tf.keras.losses.SparseCategoricalCrossentropy` loss function. Use the `metrics` argument to view the accuracy of the model at every step.

In [14]:
num_epochs = 2

loss_obj = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

optimizer = tf.keras.optimizers.Adam(learning_rate = 0.001)

model.compile(loss=loss_obj, optimizer=optimizer, metrics=['accuracy'])
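
To illustrate what this loss measures, the sketch below computes sparse categorical cross-entropy from raw logits for a single example in plain Python. It is a simplified stand-in for the Keras loss (one example, no batching or reduction), and the function and variable names are chosen here for illustration.

```python
import math

def sparse_categorical_crossentropy(logits, label):
    """Cross-entropy for one example: -log(softmax(logits)[label])."""
    shift = max(logits)  # subtract the max for numerical stability
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits))
    return log_sum_exp - logits[label]

num_classes = 10
uninformed = [0.0] * num_classes  # equal logits: a uniform guess
confident = [0.0] * num_classes
confident[3] = 8.0                # strongly favors class 3

print(round(sparse_categorical_crossentropy(uninformed, 3), 4))  # 2.3026, i.e. ln(10)
print(sparse_categorical_crossentropy(confident, 3) < 0.01)      # True
```

A classifier head that has not yet learned anything behaves much like the uniform guess, so a loss near ln(10) ≈ 2.3 at the very start of training on 10 classes is unsurprising.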

Train the model. After two epochs, observe a low loss with high accuracy for both the training and test sets.

In [15]:
results = model.fit(train_ds,
                    validation_data=test_ds,
                    epochs=num_epochs,
                    validation_freq=1,
                    verbose=1)

Epoch 1/2

      1/Unknown - 24s 24s/step - loss: 2.2871 - accuracy: 0.1250

     38/Unknown - 479s 12s/step - loss: 0.8940 - accuracy: 0.8000

Epoch 2/2

 8/38 [=====>........................] - ETA: 6:08 - loss: 0.1098 - accuracy: 0.9688

## Evaluate the model

The model achieved high accuracy on the training dataset. Next, use Keras `Model.evaluate` to evaluate it on the test set.

In [16]:
model.evaluate(test_ds, return_dict=True)

     25/Unknown - 301s 12s/step - loss: 0.0820 - accuracy: 0.9800



{'loss': 0.08196067810058594, 'accuracy': 0.9800000190734863}

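To visualize model performance further, use a [confusion matrix](https://tensorflow.google.cn/api_docs/python/tf/math/confusion_matrix). A confusion matrix lets you assess the performance of a classification model beyond accuracy alone. To build the confusion matrix for this multi-class classification problem, get the actual values and the predicted values for the test set.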
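
Before computing it on real predictions, a toy example may help show what the matrix contains. The sketch below builds one in plain Python for a hypothetical 3-class problem, with rows as actual classes and columns as predicted classes (the same orientation `tf.math.confusion_matrix` uses), and then derives per-class recall from it. All labels here are made up for illustration.

```python
def confusion_matrix(actual, predicted, num_classes):
    """Rows are actual classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

# Hypothetical labels for a 3-class problem.
actual = [0, 0, 1, 1, 2, 2, 2]
predicted = [0, 1, 1, 1, 2, 0, 2]

cm = confusion_matrix(actual, predicted, num_classes=3)
for row in cm:
    print(row)
# [1, 1, 0]
# [0, 2, 0]
# [1, 0, 2]

accuracy = sum(cm[i][i] for i in range(3)) / len(actual)
recall = [cm[i][i] / sum(cm[i]) for i in range(3)]
print(round(accuracy, 3))  # 0.714
```

The overall accuracy hides the fact that class 0 is recognized only half of the time while class 1 is always recognized; the off-diagonal cells show which classes are being confused with which.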

In [17]:
def get_actual_predicted_labels(dataset):
  """
    Create a list of actual ground truth values and the predictions from the model.

    Args:
      dataset: An iterable data structure, such as a TensorFlow Dataset, with features and labels.

    Return:
      Ground truth and predicted values for a particular dataset.
  """
  actual = [labels for _, labels in dataset.unbatch()]
  predicted = model.predict(dataset)

  actual = tf.stack(actual, axis=0)
  predicted = tf.concat(predicted, axis=0)
  predicted = tf.argmax(predicted, axis=1)

  return actual, predicted

In [18]:
def plot_confusion_matrix(actual, predicted, labels, ds_type):
  """Plots a confusion-matrix heatmap with class names on both axes."""
  # Apply figure size and font scale before drawing so they affect this plot.
  sns.set(rc={'figure.figsize': (12, 12)}, font_scale=1.4)
  cm = tf.math.confusion_matrix(actual, predicted)
  ax = sns.heatmap(cm, annot=True, fmt='g')
  ax.set_title('Confusion matrix of action recognition for ' + ds_type)
  ax.set_xlabel('Predicted Action')
  ax.set_ylabel('Actual Action')
  plt.xticks(rotation=90)
  plt.yticks(rotation=0)
  ax.xaxis.set_ticklabels(labels)
  ax.yaxis.set_ticklabels(labels)

In [19]:
fg = FrameGenerator(subset_paths['train'], num_frames, training = True)
label_names = list(fg.class_ids_for_name.keys())

In [20]:
actual, predicted = get_actual_predicted_labels(test_ds)
plot_confusion_matrix(actual, predicted, label_names, 'test')

     25/Unknown - 305s 12s/step



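## Next steps

Now that you have some familiarity with the MoViNet model and with how to leverage various TensorFlow APIs (for example, for transfer learning), try using the code in this tutorial with your own dataset. The data does not have to be limited to video. Volumetric data, such as MRI scans, can also be used with 3D CNNs. The NUSDAT and IMH datasets mentioned in [Brain MRI-based 3D Convolutional Neural Networks for Classification of Schizophrenia and Controls](https://arxiv.org/pdf/2003.08818.pdf) could be two such sources of MRI data.

In particular, the `FrameGenerator` class used in this tutorial and in the other video data and classification tutorials can help you load data into your models.

To learn more about working with video data in TensorFlow, check out the following tutorials:

- [Load video data](https://tensorflow.google.cn/tutorials/load_data/video)
- [Build a 3D CNN model for video classification](https://tensorflow.google.cn/tutorials/video/video_classification)
- [MoViNet for streaming action recognition](https://tensorflow.google.cn/hub/tutorials/movinet)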