{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "oL9KopJirB2g" }, "source": [ "##### Copyright 2018 The TensorFlow Authors." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "cellView": "form", "execution": { "iopub.execute_input": "2023-11-07T23:56:43.555565Z", "iopub.status.busy": "2023-11-07T23:56:43.554954Z", "iopub.status.idle": "2023-11-07T23:56:43.559302Z", "shell.execute_reply": "2023-11-07T23:56:43.558625Z" }, "id": "SKaX3Hd3ra6C" }, "outputs": [], "source": [ "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "AAK88XQ9Pm9N" }, "source": [ "# Unicode strings" ] }, { "cell_type": "markdown", "metadata": { "id": "0TD5ZrvEMbhZ" }, "source": [ "
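Each Unicode character is encoded using a unique integer [code point](https://en.wikipedia.org/wiki/Code_point) between `0` and `0x10FFFF`. A *Unicode string* is a sequence of zero or more code points.\n",
"\n",
"This tutorial shows how to represent Unicode strings in TensorFlow and manipulate them using Unicode equivalents of standard string ops. It separates Unicode strings into tokens based on script detection."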
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"execution": {
"iopub.execute_input": "2023-11-07T23:56:43.563317Z",
"iopub.status.busy": "2023-11-07T23:56:43.562744Z",
"iopub.status.idle": "2023-11-07T23:56:46.215361Z",
"shell.execute_reply": "2023-11-07T23:56:46.214317Z"
},
"id": "OIKHl5Lvn4gh"
},
"outputs": [
],
"source": [
"import tensorflow as tf"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "n-LkcI-vtWNj"
},
"source": [
"## The `tf.string` data type\n",
"\n",
"The basic TensorFlow `tf.string` `dtype` allows you to build tensors of byte strings. Unicode strings are UTF-8 encoded by default."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"execution": {
"iopub.execute_input": "2023-11-07T23:56:46.219810Z",
"iopub.status.busy": "2023-11-07T23:56:46.219339Z",
"iopub.status.idle": "2023-11-07T23:56:48.563325Z",
"shell.execute_reply": "2023-11-07T23:56:48.562576Z"
},
"id": "3yo-Qv6ntaFr"
},
"outputs": [
{
"data": {
"text/plain": [
"