# Set up a RAG framework within UbiOps, with a front-end

This notebook shows how you can set up a Retrieval-Augmented Generation (RAG) framework within your UbiOps environment.
In this framework, [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) uses
embeddings created from the [Git documentation](https://git-scm.com/docs) to provide a user's input with additional context.
We will also create a front-end for the framework using Streamlit. Requests can be sent from the front-end to
the framework through the cloud-based inference API endpoint, which UbiOps creates automatically.

We will set up this framework using the UbiOps Training functionality. We create an experiment in which we
initiate a training run that downloads an embedding model (and its tokenizer) and converts the text into embeddings, which are
numerical vector representations of the data. The embeddings are then uploaded to UbiOps and downloaded into the Mistral deployment.
When a user sends an input, additional context is retrieved and concatenated to the user's input. The user's input
and the additional context are then fed to the Mistral LLM, which generates a response.
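
The retrieve-then-augment flow described above can be sketched in a few lines. The toy two-dimensional embeddings and the `cosine` and `retrieve` helpers are illustrative assumptions, not part of the notebook's code; the real embeddings are created later with an embedding model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, doc_vecs, docs, k=1):
    """Return the k documents whose embeddings are most similar to the query."""
    ranked = sorted(
        zip(docs, doc_vecs),
        key=lambda pair: cosine(query_vec, pair[1]),
        reverse=True,
    )
    return [doc for doc, _ in ranked[:k]]

# Toy data: hand-made 2-d "embeddings" (assumption, for illustration only)
docs = ["git commit records changes", "git branch lists branches"]
doc_vecs = [[1.0, 0.1], [0.1, 1.0]]
query_vec = [0.9, 0.2]  # pretend embedding of the question below

context = retrieve(query_vec, doc_vecs, docs, k=1)
augmented_prompt = (
    "Context:\n" + "\n".join(context) + "\n\nQuestion: How do I save my changes?"
)
print(augmented_prompt)
```

The augmented prompt, not the bare question, is what would be sent to the LLM.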

To set up the framework we'll need to execute the following steps:

1. Establish a connection with your UbiOps environment
2. Download & preprocess the data
3. Set up the environment
4. Create an experiment & training run
5. Create the Mistral deployment
6. Create the Streamlit dashboard

**Note:** you need a Hugging Face access token with the correct permissions to download Mistral to successfully
complete this notebook.

## 1. Establish a connection with your UbiOps environment

To use the UbiOps API from this notebook, we need to install the UbiOps Python Client Library:

In [None]:
!pip install -qU ubiops

Now we can set up a connection with the UbiOps platform. You'll need an API token with `project-editor` permissions, which you
can acquire by creating a [Service user](https://ubiops.com/docs/organizations/service-users/#creating-a-service-user).

Paste your UbiOps API token, project name, and Hugging Face token in the cell below before running.

In [None]:
import ubiops

API_TOKEN = '<INSERT API_TOKEN WITH PROJECT EDITOR RIGHTS>' # Make sure this is in the format "Token token-code"
PROJECT_NAME = '<INSERT PROJECT NAME IN YOUR ACCOUNT>'

DEPLOYMENT_NAME = "rag-mistral-git"
DEPLOYMENT_VERSION = "v1"

HF_TOKEN = '<INSERT HF_TOKEN WITH ACCESS TO MISTRAL>'

# Initialize client library
configuration = ubiops.Configuration(host="https://api.ubiops.com/v2.1")
configuration.api_key["Authorization"] = API_TOKEN

# Establish a connection
api_client = ubiops.ApiClient(configuration)
api = ubiops.CoreApi(api_client)
print(api.projects_get(PROJECT_NAME))

## 2. Download & preprocess the data

As mentioned in the introduction, we're going to download the documentation of Git. We'll extract the `<div id="main">` section
of each page, because it contains only the content that is unique to that page.
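
As a small illustration of the clean-up step used in the preprocessing cell below, the regular expression collapses any run of three or more newlines into a single blank line. The sample string is made up for this sketch.

```python
import re

# Made-up text with the kind of excess blank lines that HTML-to-text extraction leaves behind
raw = "NAME\n\n\n\ngit-add - Add file contents to the index\n\n\nSYNOPSIS\n"

# Replace 3+ consecutive newlines with exactly two (one blank line)
cleaned = re.sub(r"\n{3,}", "\n\n", raw)
print(repr(cleaned))
```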

In [None]:
!mkdir -p data/texts

In [None]:
# `wget` is a simple Linux tool to download web pages and files. Run `wget -h` for more info about the flags used here.
!wget -r -np -k -l 1 https://git-scm.com/docs -P data/

In [None]:
!pip3 install -q BeautifulSoup4
!pip3 install -q tqdm

In [None]:
from bs4 import BeautifulSoup
import glob
import re
from tqdm import tqdm
import shutil

In [None]:
files = glob.glob("data/git-scm.com/docs/*")

for file_name in tqdm(files):
    with open(file_name, "r") as f:
        content = f.read()

    # Parse text
    soup = BeautifulSoup(content, features="html.parser")
    text = soup.find(id="main").get_text()

    # Drop duplicated end-of-line symbols
    text = re.sub(r"\n{3,}", "\n\n", text)

    # Save the processed file
    with open(f"data/texts/{file_name.split('/')[-1]}.txt", "w") as f:
        f.write(text)

# Zip the processed files once, after all of them have been written
shutil.make_archive("data/texts", "zip", "data/", "texts/")

We then use the `upload_file` function to upload the zip file to the `default` bucket of our UbiOps storage:

In [None]:
file_uri = ubiops.utils.upload_file(
    client=api_client,
    project_name=PROJECT_NAME,
    file_path="data/texts.zip",
    bucket_name="default",
    file_name="texts.zip",
)

## 3. Set up the environment

Now we can create the environment for our framework. For this we need to create a coding environment and some environment
variables.

### Create the coding environment

In [None]:
%%writefile requirements.txt
transformers
pandas
numpy
scikit-learn
ubiops
torch
langchain
tqdm

In [None]:
ENVIRONMENT_NAME = "rag-mistral-git"

data = ubiops.EnvironmentCreate(
    name=ENVIRONMENT_NAME, base_environment="ubuntu22-04-python3-10-cuda11-7-1"
)

api_response = api.environments_create(PROJECT_NAME, data)
print(api_response)

api_response = api.environment_revisions_file_upload(
    PROJECT_NAME, ENVIRONMENT_NAME, file="requirements.txt"
)
print(api_response)

### Create the environment variables

We need to create three environment variables:

 - `API_TOKEN` and `PROJECT_NAME`, which hold our UbiOps API token and project name, to establish a connection with UbiOps from within the training run and deployment.
 - `HF_TOKEN`, which holds your Hugging Face token, to get access to the gated repository for Mistral.
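
Inside a training run or deployment, these variables are read as ordinary process environment variables through `os.environ`. The helper below is a hypothetical sketch of reading them with a clear error message when one is missing; `require_env` is not part of the UbiOps client library.

```python
import os

def require_env(name):
    """Return the value of an environment variable or raise a descriptive error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value

# Simulate the variable that would be set for the run (assumption, for local testing only)
os.environ["PROJECT_NAME"] = "my-project"
project_name = require_env("PROJECT_NAME")
print(project_name)
```

Failing early with a named variable tends to be easier to debug than a bare `KeyError` deep inside the run.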

In [None]:
# The first argument of project_environment_variables_create is the project name
data_api = ubiops.EnvironmentVariableCreate(
    name="API_TOKEN", value=configuration.api_key["Authorization"], secret=True
)

api.project_environment_variables_create(PROJECT_NAME, data=data_api)

data_proj = ubiops.EnvironmentVariableCreate(
    name="PROJECT_NAME", value=PROJECT_NAME, secret=False
)

api.project_environment_variables_create(PROJECT_NAME, data=data_proj)

data_hf = ubiops.EnvironmentVariableCreate(
    name="HF_TOKEN", value=HF_TOKEN, secret=True
)

api.project_environment_variables_create(PROJECT_NAME, data=data_hf)

## 4. Create the embeddings

We'll create the embeddings by initiating a training run in UbiOps. We first create an experiment, which defines
the training set-up. Then we initiate a training run inside this experiment, which is the actual code execution. In this
run we download the data we uploaded to UbiOps earlier. Then we use the [`bge-large-en-v1.5`](https://huggingface.co/BAAI/bge-large-en-v1.5) model to convert the data into embeddings, which are uploaded back to UbiOps so we can use them later.
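
Before the embeddings are computed, each document is split into chunks so that every piece fits into the embedding model. The training code below does this with LangChain's `RecursiveCharacterTextSplitter`; the function here is only a simplified stand-in that splits on whitespace-separated words instead of tokens, to illustrate the idea. `chunk_words` and its parameters are hypothetical.

```python
def chunk_words(text, chunk_size=5, overlap=0):
    """Split text into chunks of at most chunk_size words, with optional overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]

doc = "git add updates the index using the current content found in the working tree"
chunks = chunk_words(doc, chunk_size=5)
for n, chunk in enumerate(chunks):
    # Mirrors the context{n}.txt naming used in the training code
    print(f"context{n}.txt -> {chunk}")
```

Each chunk gets its own embedding, so retrieval later returns focused passages rather than whole pages.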

### Create the experiment

In [None]:
EXPERIMENT_NAME = "rag-mistral-git"

from ubiops.training.training import Training

training_instance = Training(api_client)
# Create experiment
experiment_name = EXPERIMENT_NAME

api_response = training_instance.experiments_create(
    project_name=PROJECT_NAME,
    data=ubiops.ExperimentCreate(
        name=experiment_name,
        instance_type="16384mb_t4",
        description="A finetuning experiment",
        environment=ENVIRONMENT_NAME,
        default_bucket="default",
        labels={"type": "pytorch", "model": "llama2", "algorithm": "rag"},
    ),
)
print(api_response)

Then we create the `train.py`, which contains the code that is executed during the training run:

In [None]:
!mkdir train-run

In [None]:
%%writefile train-run/train.py
import glob
import os
import shutil

import numpy as np
import pandas as pd
import torch
import ubiops
from langchain.text_splitter import RecursiveCharacterTextSplitter
from tqdm import tqdm
from transformers import AutoModel, AutoTokenizer, DataCollatorWithPadding

class ChunksDataset(torch.utils.data.IterableDataset):
    """
    Pytorch dataset that splits every document into chunks and tokenizes them,
    so they are ready for an embedding model.
    """
    def __init__(self, files, tokenizer, splitter):
        super().__init__()
        self.files = files
        self.tokenizer = tokenizer
        self.splitter = splitter

    def _generator(self):
        context_n = 0
        for file in self.files:
            with open(file, 'r') as f:
                for chunk in self.splitter.split_text(f.read()):
                    new_file = f"context{context_n}.txt"
                    context_n+=1
                    yield new_file, chunk, self.tokenizer(chunk)

    def __iter__(self):
        return self._generator()

def train(training_data, parameters, context = {}):
    """Main function that UbiOps is going to execute."""

    # Connect to UbiOps with the API token stored as an environment variable
    configuration = ubiops.Configuration(host="https://api.ubiops.com/v2.1")
    configuration.api_key["Authorization"] = os.environ["API_TOKEN"]
    client = ubiops.ApiClient(configuration)

    # Download the texts archive from the UbiOps bucket
    ubiops.utils.download_file(
      client = client,
      file_name="texts.zip",
      project_name=os.environ["PROJECT_NAME"],
      output_path=".",
      bucket_name="default"
    )

    shutil.unpack_archive("texts.zip")
    files = glob.glob("texts/*")
    print(files)

    # Get the model and tokenizer that convert our texts to embeddings
    embedding_model_name = "BAAI/bge-large-en-v1.5"
    embedding_tokenizer = AutoTokenizer.from_pretrained(embedding_model_name)
    embedding_model = AutoModel.from_pretrained(embedding_model_name)
    embedding_model.eval()
    device = "cuda"
    embedding_model.to(device)

    # Initialize the dataset itself
    text_splitter = RecursiveCharacterTextSplitter.from_huggingface_tokenizer(embedding_tokenizer, chunk_size=300, chunk_overlap=0)
    ds = ChunksDataset(files, embedding_tokenizer, text_splitter)

    # Create the data collator and initialize the dataloader. The data collator combines text chunks into batches
    # so the embedding model can process multiple chunks at a time. For a large document database this can
    # significantly reduce the time of the embedding calculation step.
    datacollator = DataCollatorWithPadding(embedding_tokenizer, padding=True, return_tensors='pt')
    def collator(x):
        files, chunks, tokens = list(zip(*x))
        return files, chunks, datacollator(tokens)
    dl = torch.utils.data.DataLoader(ds, batch_size=2, collate_fn=collator)

    all_files = list()
    all_embeddings = list()
    path = "data/chunks"
    os.makedirs(path, exist_ok=True)

    # Run the embedding model on all documents and save the results
    for batch in tqdm(dl):
        files, chunks, tokens = batch
        all_files.extend(files)
        tokens = tokens.to(device)
        # Use the hidden state of the first token of each chunk as its embedding
        embeddings = embedding_model(**tokens)[0][:, 0]
        all_embeddings.append(embeddings.cpu().detach().numpy())
        for file, chunk in zip(files, chunks):
            with open(f"{path}/{file}", "w") as f:
                f.write(chunk)

    np.save("data/embeddings.npy", np.concatenate(all_embeddings))
    pd.Series(all_files).to_csv("data/file_names.csv")
    shutil.make_archive("context", "zip", ".", "data")

    # Upload the archive with all results back to the UbiOps bucket
    ubiops.utils.upload_file(
      client = client,
      project_name = os.environ["PROJECT_NAME"],
      file_path = "context.zip",
      bucket_name = "default",
      file_name = "context.zip"
    )
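
The batching step in `train.py` relies on padding: chunks in a batch have different lengths, so shorter ones are filled up with a padding token and an attention mask marks the real tokens. The pure-Python sketch below illustrates that idea only; it is not how `DataCollatorWithPadding` is implemented, and `pad_batch` and the token ids are made up.

```python
def pad_batch(sequences, pad_id=0):
    """Pad lists of token ids to the same length and build attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    # 1 marks a real token, 0 marks padding that the model should ignore
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch = pad_batch([[101, 2023, 102], [101, 102]])
print(batch["input_ids"])
print(batch["attention_mask"])
```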

Now we can initiate a training run:

In [None]:
training_instance.experiment_runs_create(
    project_name=PROJECT_NAME,
    experiment_name=EXPERIMENT_NAME,
    data=ubiops.ExperimentRunCreate(
        name="Load",
        description="Load model",
        training_code="./train-run/train.py",  # the file written in the cell above
        parameters={},
        timeout=14400,
    ),
)

## 5. Create the LLM deployment

The final thing we need to do to set up our RAG framework is to create a deployment that hosts an LLM. The LLM
we'll use for this example is [`Mistral-7B-Instruct-v0.2`](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2).

We first create a deployment, for which we specify the input & output of the model. Each deployment (& version) also gets
a unique REST API endpoint, which we'll use later on to establish a connection between our front-end and RAG framework.
Then we create a version for the deployment. Versions of a deployment can differ in the instance type,
environment, deployed code and other settings. We will upload a `deployment.py` to our version,
with a `Deployment` class containing two methods:

- `__init__`: runs when the deployment starts up. This is where we will download the model and context file.
- `request`: runs every time a request is made to the API of the deployment. This is where we will define how the
data should be processed.
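
To make that structure concrete, here is a minimal sketch of a `Deployment` class with the same two methods. It only echoes its input and loads no model; the `greeting` attribute and the sample call are illustrative assumptions, and the `user_input` and `response` keys follow the `deployment.py` further down.

```python
class Deployment:
    def __init__(self, base_directory, context):
        # Runs once at start-up: load models and other heavy resources here
        self.greeting = "You asked: "

    def request(self, data, context):
        # Runs on every request: `data` holds the input fields of the deployment
        return {"response": self.greeting + data["user_input"]}

# Local smoke test; on the platform the class is instantiated and request() is called for you
deployment = Deployment(base_directory=".", context={})
result = deployment.request({"user_input": "How do I undo a commit?"}, context={})
print(result)
```

Keeping expensive work in `__init__` means it is paid once per instance start instead of on every request.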

In [None]:
deployment_template = ubiops.DeploymentCreate(
    name=DEPLOYMENT_NAME,
    description="A deployment to process request prompts with the Mistral LLM",
    input_type="structured",
    output_type="structured",
    input_fields=[
        # The field name must match the key that deployment.py reads from the request data
        {"name": "user_input", "data_type": "string"},
    ],
    output_fields=[
        {"name": "response", "data_type": "string"},
    ],
    labels={"task": "rag"},
)

llm_deployment = api.deployments_create(project_name=PROJECT_NAME, data=deployment_template)
print(llm_deployment)

version_template = ubiops.DeploymentVersionCreate(
    version=DEPLOYMENT_VERSION,
    environment=ENVIRONMENT_NAME,
    instance_type="180000_a100",
    maximum_instances=1,
    minimum_instances=0,
    maximum_idle_time=1800,  # = 30 minutes
    request_retention_mode="full",  # input/output of requests will be stored
    request_retention_time=3600,  # requests will be stored for 1 hour
)

version = api.deployment_versions_create(
    project_name=PROJECT_NAME, deployment_name=DEPLOYMENT_NAME, data=version_template
)
print(version)

Then we need to write the code for the Mistral LLM (i.e. the `deployment.py`):

In [None]:
!mkdir mistral_rag_node

In [None]:
%%writefile mistral_rag_node/deployment.py
import os
import shutil
from typing import List

import numpy as np
import pandas as pd
import torch
import ubiops
from sklearn.metrics.pairwise import cosine_similarity
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoModel

class VectorStore:
    """Simple implementation of a vector store. Finds the texts that are most related to the user prompt."""

    def __init__(self, files: pd.DataFrame, embeddings: np.ndarray, model_name: str, file_path: str):
        self.files = files
        self.embeddings = embeddings
        self.file_path = file_path
        assert len(self.files) == len(self.embeddings), "Number of embeddings and files does not match"
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModel.from_pretrained(model_name)

    def find_related(self, prompt: str, n_texts: int) -> List[str]:
        """Finds the n_texts closest to a prompt."""

        # First we tokenize the user prompt and compute its embedding the same way we did for all documents.
        tokens = self.tokenizer(prompt, return_tensors='pt')
        embedding = self.model(**tokens)[0][:,0].detach().numpy()

        # Then we use the cosine similarity metric from sklearn
        # (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.cosine_similarity.html).
        # It compares vectors and returns a value in [-1, 1] for each pair; a higher value means the vectors
        # are closer to each other because their texts contain similar information.
        similarities = cosine_similarity(embedding, self.embeddings)[0]

        # We get the n_texts closest to our prompt and return them as a list of context chunks.
        matches = similarities.argsort()[-n_texts:]
        res = list()
        for idx, file in self.files.loc[matches].iterrows():
            with open(f"{self.file_path}/{file.iloc[0]}", "r", encoding="utf-8") as f:
                res.append(f.read())
        return res

class Deployment:
    """Class that UbiOps will interact with on every request."""

    def __init__(self, base_directory, context):
        """Initialization of the LLM and the VectorStore."""
        model_id = "mistralai/Mistral-7B-Instruct-v0.2"

        self.llm_tokenizer = AutoTokenizer.from_pretrained(model_id)
        self.llm_model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

        self.llm_model.to("cuda")

        # Connect to UbiOps with the API token stored as an environment variable
        configuration = ubiops.Configuration(host="https://api.ubiops.com/v2.1")
        configuration.api_key["Authorization"] = os.environ["API_TOKEN"]

        # Download the archive with the embeddings and chunks created in the training run
        ubiops.utils.download_file(
          client = ubiops.ApiClient(configuration),
          file_name="context.zip",
          project_name=os.environ["PROJECT_NAME"],
          output_path=".",
          bucket_name="default"
        )
        shutil.unpack_archive("context.zip", ".")

        embeddings = np.load("data/embeddings.npy")
        fnames = pd.read_csv("data/file_names.csv", index_col=0)

        self.store = VectorStore(fnames, embeddings, "BAAI/bge-large-en-v1.5", "data/chunks")

    def request(self, data, context):
        """User request processing."""

        # Get the documents related to the user request
        related = "".join(self.store.find_related(data["user_input"], 4))
        instruction = data["user_input"]

        # Master prompt construction. It explains the goal of the LLM and separates the user question from the retrieved context.
        # Experiment with different LLM personalities to suit your project.
        prompt = f"""
        You are a developer and expert on Git. Your goal is to help a person new to Git with their tasks and provide explanations.
        Below is an instruction that describes a task, paired with an input that provides further context. The input is the source of truth, so always try to find an answer to the task in it.
        Write a response that appropriately completes the request.
        
        ### Instruction:
        {instruction}
        
        ### Input:
        {related}
        
        ### Response:
        """

        # The last step is to feed the whole prompt to Mistral and return everything after the response marker.
        model_inputs = self.llm_tokenizer([prompt], return_tensors="pt").to("cuda")
        generated_ids = self.llm_model.generate(**model_inputs, max_new_tokens=1000, do_sample=True)
        decoded = self.llm_tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
        return {"response": decoded.split("### Response:")[1]}
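
Because a causal LLM returns the prompt followed by its continuation, the deployment cuts the answer out by splitting on the `### Response:` marker. The sketch below illustrates this with a hard-coded string standing in for the model output; `build_prompt`, `extract_response`, and the fake generation are assumptions made for this example.

```python
MARKER = "### Response:"

def build_prompt(instruction, related):
    """Combine the user instruction and the retrieved context into one prompt."""
    return (
        f"### Instruction:\n{instruction}\n\n"
        f"### Input:\n{related}\n\n"
        f"{MARKER}\n"
    )

def extract_response(generated_text):
    """Return only the text that follows the response marker."""
    return generated_text.split(MARKER, 1)[1].strip()

prompt = build_prompt(
    "How do I stage a file?",
    "git add <pathspec> adds file contents to the index.",
)
# Stand-in for the decoded model output: the prompt plus a continuation
fake_generation = prompt + "Run `git add <file>` to stage it."
answer = extract_response(fake_generation)
print(answer)
```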

Now all that's left to do is to zip the `mistral_rag_node` directory and upload it to the version we created above:

In [None]:
import shutil

shutil.make_archive("mistral_rag_node", "zip", ".", "mistral_rag_node")

file_upload_result = api.revisions_file_upload(
    project_name=PROJECT_NAME,
    deployment_name=DEPLOYMENT_NAME,
    version=DEPLOYMENT_VERSION,
    file="mistral_rag_node.zip",
)
ubiops.utils.wait_for_deployment_version(
    client=api.api_client,
    project_name=PROJECT_NAME,
    deployment_name=DEPLOYMENT_NAME,
    version=DEPLOYMENT_VERSION,
    revision_id=file_upload_result.revision,
)

### Testing your deployment

If you want you can use the code below to send a request to your deployment:

```
result = api.deployment_requests_create(
    project_name=PROJECT_NAME,
    deployment_name=DEPLOYMENT_NAME,
    # The key must match the field that deployment.py reads from the request data
    data={"user_input": "How can I prank my friend Sam, who just created his first git project?"},
)
print(result.result["response"])
```

## 6. Create the Streamlit Front-end

Now that our RAG framework is up and running, we can create a front-end for it. As mentioned before, we use Streamlit for this.
We need to create a `streamlit.py` file, in which we'll set up a connection between our dashboard and the model's API endpoint.

For the dashboard to work you'll need to enter some parameters in the `secrets.toml` file below. The dashboard reads them through `st.secrets` to connect to your deployment.

In [None]:
!mkdir streamlit/
!mkdir streamlit/.streamlit/

In [None]:
%%writefile streamlit/.streamlit/secrets.toml
UBIOPS_API_TOKEN = "<ENTER YOUR UBIOPS API TOKEN HERE>"
project_name = "<ENTER YOUR PROJECT NAME HERE>"
deployment_name = "rag-mistral-git"
version = "v1"

In [None]:
%%writefile streamlit/requirements.txt
streamlit
ubiops

In [None]:
%%writefile streamlit/streamlit.py

import streamlit as st
import ubiops
import os

# App title
st.set_page_config(page_title="💬 Git Chatbot Assistant")

# UbiOps credentials
with st.sidebar:
    st.title('💬 Git Chatbot Assistant')

    if 'UBIOPS_API_TOKEN' in st.secrets:
        st.success('API key already provided!', icon='✅')
        ubiops_api_token = st.secrets['UBIOPS_API_TOKEN']
    else:
        ubiops_api_token = st.text_input('Enter UbiOps API token:', type='password')

        if not ubiops_api_token.startswith('Token '):
            st.warning('Please enter your credentials!', icon='⚠️')
        else:
            st.success('Proceed to entering your prompt message!', icon='👉')

    st.markdown('📖 Learn how to build this app in this [blog](#link-to-blog)!')

# Make the token available as an environment variable as well
os.environ['UBIOPS_API_TOKEN'] = ubiops_api_token

# Store LLM generated responses
if "messages" not in st.session_state.keys():
    st.session_state.messages = [{"role": "assistant", "content": "How may I assist you today?"}]

# Display or clear chat messages
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.write(message["content"])

def clear_chat_history():
    st.session_state.messages = [{"role": "assistant", "content": "How may I assist you today?"}]

st.sidebar.button('Clear Chat History', on_click=clear_chat_history)

# Function for generating Mistral response
def generate_mistral_response(prompt_input):
    # Connect to UbiOps with the API token provided through the secrets file or the sidebar
    configuration = ubiops.Configuration(host="https://api.ubiops.com/v2.1")
    configuration.api_key["Authorization"] = ubiops_api_token
    api_client = ubiops.ApiClient(configuration)
    api = ubiops.CoreApi(api_client)

    # Send only the latest user prompt; the deployment retrieves and adds the context itself
    response = api.deployment_version_requests_create(
        project_name=st.secrets["project_name"],
        deployment_name=st.secrets["deployment_name"],
        version=st.secrets["version"],
        data={"user_input": prompt_input},
    )
    api_client.close()

    return response.result['response']

# User-provided prompt
if prompt := st.chat_input(disabled=not ubiops_api_token):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.write(prompt)

# Generate a new response if last message is not from assistant
if st.session_state.messages[-1]["role"] != "assistant":
    with st.chat_message("assistant"):
        with st.spinner("Thinking..."):
            response = generate_mistral_response(prompt)
            placeholder = st.empty()
            full_response = ''
            for item in response:
                full_response += item
                placeholder.markdown(full_response)
            placeholder.markdown(full_response)
    message = {"role": "assistant", "content": full_response}
    st.session_state.messages.append(message)

In [None]:
!streamlit run streamlit/streamlit.py