**Copyright 2021 The TensorFlow Hub Authors.**

Licensed under the Apache License, Version 2.0 (the "License");

In [1]:
# Copyright 2021 The TensorFlow Hub Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================


# Universal Sentence Encoder SentEval demo
This colab demonstrates the [Universal Sentence Encoder CMLM model](https://tfhub.dev/google/universal-sentence-encoder-cmlm/en-base/1) using the [SentEval](https://github.com/facebookresearch/SentEval) toolkit, a library for measuring the quality of sentence embeddings. SentEval includes a diverse set of downstream tasks that evaluate both the generalization power of an embedding model and the linguistic properties it encodes.

Run the first two code blocks to set up the environment. In the third code block, pick a SentEval task to evaluate the model. A GPU runtime is recommended for this Colab.

To learn more about the Universal Sentence Encoder CMLM model, see https://openreview.net/forum?id=WDVD4lUCTzU.

In [2]:
#@title Install dependencies
!pip install --quiet "tensorflow-text==2.11.*"
!pip install --quiet torch==1.8.1

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow-datasets 4.9.2 requires protobuf>=3.20, but you have protobuf 3.19.6 which is incompatible.
tensorflow-metadata 1.13.1 requires protobuf<5,>=3.20.3, but you have protobuf 3.19.6 which is incompatible.

## Download SentEval and task data
This step downloads SentEval from GitHub and runs the data script to download the task data. It may take up to 5 minutes to complete.

In [3]:
#@title Install SentEval and download task data
!rm -rf ./SentEval
!git clone https://github.com/facebookresearch/SentEval.git
!cd $PWD/SentEval/data/downstream && bash get_transfer_data.bash > /dev/null 2>&1

Cloning into 'SentEval'...


remote: Enumerating objects: 691, done.
remote: Counting objects: 100% (2/2), done.
remote: Compressing objects: 100% (2/2), done.
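
Because the download cell discards the data script's output, a failed download can go unnoticed until evaluation starts. The following is a minimal sketch of a sanity check, not part of the original notebook. The helper `missing_task_dirs` is hypothetical, and the directory names in `EXPECTED_DIRS` are an assumption about how the SentEval data script lays out `data/downstream`; adjust them if your checkout differs.

```python
from pathlib import Path

# Assumption: names of the task directories created under
# SentEval/data/downstream by the data script. Adjust if they differ.
EXPECTED_DIRS = ("CR", "MR", "MPQA", "SUBJ", "SST", "TREC", "MRPC", "SICK")


def missing_task_dirs(downstream_root, expected=EXPECTED_DIRS):
    """Return the expected task directories that are absent or empty."""
    root = Path(downstream_root)
    missing = []
    for name in expected:
        task_dir = root / name
        # A directory that exists but holds no files also counts as missing.
        if not task_dir.is_dir() or not any(task_dir.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    downstream = Path.cwd() / "SentEval" / "data" / "downstream"
    print("Missing task data:", missing_task_dirs(downstream) or "none")
```

If the list is not empty, rerunning the download cell without the `> /dev/null 2>&1` redirect should show what went wrong.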

# Execute a SentEval evaluation task
The following code block executes a SentEval task and outputs the results. Choose one of the following tasks to evaluate the USE CMLM model:

```
CR  MR  MPQA  MRPC  SICKEntailment  SNLI  SST2  SUBJ  TREC
```

Select a model, params, and task to run. The rapid prototyping params reduce computation time, at some cost in result quality.

A task typically takes 5-15 minutes with the **'rapid prototyping'** params and up to an hour with the **'slower, best performance'** params.

```
params = {'task_path': PATH_TO_DATA, 'usepytorch': True, 'kfold': 5}
params['classifier'] = {'nhid': 0, 'optim': 'rmsprop', 'batch_size': 128,
                                 'tenacity': 3, 'epoch_size': 2}
```

For better results, use the **'slower, best performance'** params. Computation may take up to 1 hour:

```
params = {'task_path': PATH_TO_DATA, 'usepytorch': True, 'kfold': 10}
params['classifier'] = {'nhid': 0, 'optim': 'adam', 'batch_size': 16,
                                 'tenacity': 5, 'epoch_size': 6}
```
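
One way to keep the two configurations side by side is a small lookup keyed by preset name. The sketch below is an illustration rather than the notebook's own code: `PRESETS` and `select_params` are hypothetical names, and the values are copied from the two configurations above. Unlike a plain if/else, it raises an error when the preset name is misspelled instead of silently falling back to one of the configurations.

```python
import copy

# Values copied from the two configurations shown above.
PRESETS = {
    "rapid prototyping": {
        "kfold": 5,
        "classifier": {"nhid": 0, "optim": "rmsprop", "batch_size": 128,
                       "tenacity": 3, "epoch_size": 2},
    },
    "slower, best performance": {
        "kfold": 10,
        "classifier": {"nhid": 0, "optim": "adam", "batch_size": 16,
                       "tenacity": 5, "epoch_size": 6},
    },
}


def select_params(preset, task_path, usepytorch=True):
    """Build a SentEval params dict for a named preset (hypothetical helper)."""
    if preset not in PRESETS:
        raise ValueError(
            f"Unknown preset {preset!r}; choose from {sorted(PRESETS)}")
    # Deep-copy so later tweaks do not modify the shared preset table.
    params = copy.deepcopy(PRESETS[preset])
    params["task_path"] = task_path
    params["usepytorch"] = usepytorch
    return params
```

For example, `select_params('rapid prototyping', PATH_TO_DATA)` would return a dict equivalent to the first configuration above.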



In [4]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

import sys
sys.path.append(f'{os.getcwd()}/SentEval')

import tensorflow as tf

# Prevent TF from claiming all GPU memory so there is some left for pytorch.
gpus = tf.config.list_physical_devices('GPU')
if gpus:
  # Memory growth needs to be the same across GPUs.
  for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

import tensorflow_hub as hub
import tensorflow_text  # Imported for its side effect: registers ops used by the preprocessor.
import senteval
import time

PATH_TO_DATA = f'{os.getcwd()}/SentEval/data'
MODEL = 'https://tfhub.dev/google/universal-sentence-encoder-cmlm/en-base/1' #@param ['https://tfhub.dev/google/universal-sentence-encoder-cmlm/en-base/1', 'https://tfhub.dev/google/universal-sentence-encoder-cmlm/en-large/1']
PARAMS = 'rapid prototyping' #@param ['slower, best performance', 'rapid prototyping']
TASK = 'CR' #@param ['CR','MR', 'MPQA', 'MRPC', 'SICKEntailment', 'SNLI', 'SST2', 'SUBJ', 'TREC']

params_prototyping = {'task_path': PATH_TO_DATA, 'usepytorch': True, 'kfold': 5}
params_prototyping['classifier'] = {'nhid': 0, 'optim': 'rmsprop', 'batch_size': 128,
                                 'tenacity': 3, 'epoch_size': 2}

params_best = {'task_path': PATH_TO_DATA, 'usepytorch': True, 'kfold': 10}
params_best['classifier'] = {'nhid': 0, 'optim': 'adam', 'batch_size': 16,
                                 'tenacity': 5, 'epoch_size': 6}

params = params_best if PARAMS == 'slower, best performance' else params_prototyping

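# Text preprocessing (tokenization) layer used with the CMLM encoder.
preprocessor = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
# Load the encoder chosen in MODEL above rather than a hard-coded URL.
encoder = hub.KerasLayer(MODEL)

# Each model input is a scalar string (one sentence per example).
inputs = tf.keras.Input(shape=(), dtype=tf.string)
outputs = encoder(preprocessor(inputs))

model = tf.keras.Model(inputs=inputs, outputs=outputs)

def prepare(params, samples):
    # SentEval calls this once per task; no extra setup is needed here.
    return

def batcher(_, batch):
    # SentEval passes each sentence as a list of tokens; rejoin the tokens and
    # replace empty sentences with '.' so the encoder never sees an empty string.
    batch = [' '.join(sent) if sent else '.' for sent in batch]
    return model.predict(tf.constant(batch))["default"]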


se = senteval.engine.SE(params, batcher, prepare)
print("Evaluating task %s with %s parameters" % (TASK, PARAMS))
start = time.time()
results = se.eval(TASK)
end = time.time()
print('Time took on task %s : %.1f. seconds' % (TASK, end - start))
print(results)


Instructions for updating:
Lambda fuctions will be no more assumed to be used in the statement where they are used, or at least in the same block. https://github.com/tensorflow/tensorflow/issues/56089




Evaluating task CR with rapid prototyping parameters
Time took on task CR : 49.9. seconds
{'devacc': 90.42, 'acc': 88.98, 'ndev': 3775, 'ntest': 3775}
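
The raw dictionary is easy to misread once several tasks have been run. As a rough sketch, the hypothetical helper `summarize` below formats a mapping of task name to result dict as aligned rows. It assumes each result carries the `devacc`, `acc`, and `ntest` keys seen in the CR output above; tasks that report other metrics would need extra handling.

```python
def summarize(results_by_task):
    """Format SentEval classification results as aligned text rows.

    Assumption: every result dict has 'devacc', 'acc' and 'ntest' keys,
    as in the CR output above.
    """
    lines = [f"{'task':<16}{'dev acc':>9}{'test acc':>10}{'n test':>8}"]
    for task, scores in results_by_task.items():
        lines.append(
            f"{task:<16}{scores['devacc']:>9.2f}"
            f"{scores['acc']:>10.2f}{scores['ntest']:>8d}"
        )
    return "\n".join(lines)


# Values copied from the CR run shown above.
print(summarize({
    "CR": {"devacc": 90.42, "acc": 88.98, "ndev": 3775, "ntest": 3775},
}))
```

With the variables from the evaluation cell, the call would be `summarize({TASK: results})`.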


# Learn More

*   Find more text embedding models on [TensorFlow Hub](https://tfhub.dev)
*   See also the [Multilingual Universal Sentence Encoder CMLM model](https://tfhub.dev/google/universal-sentence-encoder-cmlm/multilingual-base-br/1)
*   Check out other [Universal Sentence Encoder models](https://tfhub.dev/google/collections/universal-sentence-encoder/1)

## Reference

*   Ziyi Yang, Yinfei Yang, Daniel Cer, Jax Law, Eric Darve. [Universal Sentence Representations Learning with Conditional Masked Language Model](https://openreview.net/forum?id=WDVD4lUCTzU). November 2020.
