# Dieting Network

## Problem statement

The neural network below wants to go on a diet and lose weight.

```
NN: I want to slim down
You: ... what?
NN: I want to have less weight! I think the fewer numbers I store, the better.
You: (trying to think)
You: I suppose you can try sparsity? The more of your weights that are zero, the fewer numbers you have to save to disk. But you must save your weights as a sparse tensor. By default, network weights are saved as dense tensors, so even a zero still takes up space.
NN: (scrolling on phone)
NN: I'm reading this TikTok that says Python stores integers as objects, so when I have one million copies of the same integer, I'm actually only using one number.
You: (thinking)
You: Uhhh
You: I feel like many other things need to be done properly for your statement to stand! But most importantly:
You: How are you going to function if you practically have no weights?
NN: Well, here's where you come in to help me out!
You: Hm, but I don't know how!
NN: Hahaha
NN: You don't know how **yet!**
You: ...
You: This is so strange.
Narrator: _This is not even the strangest problem in Selection 2_
```

Do what you can to make the network below perform well when all of its weights are frozen to unity with no biases (i.e. all weights = 1 and all biases = 0). You may only adjust the activation functions. Implement whatever activation function you want!

You are given:
- a baseline network below where you are to modify `self.act1`, `self.act2` and `self.act3`
- tensors `X_train`, `y_train`, `X_val`, `y_val`, `X_test` to act as your training data, validation data and test data
- helper code to generate predictions on `X_test` for scoring in a separate notebook

The following restrictions apply:

- Each activation function may contain a small number of parameters: at most 5 per activation function.
- Each activation function must be stateless during inference, i.e. it must return exactly the same output when given exactly the same input. Inputs from a previous iteration must not affect the outputs of the current iteration.
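
One possible reading of these restrictions, as a minimal sketch: an `nn.Module` whose only state is a handful of learnable scalars. `ScaledTanh` and its four parameters `a`, `b`, `c`, `d` are hypothetical names chosen for illustration, and nothing here is claimed to score well on this task.

```python
import torch
import torch.nn as nn


class ScaledTanh(nn.Module):
    """Hypothetical activation: a * tanh(b * x + c) + d (four learnable scalars)."""

    def __init__(self):
        super().__init__()
        # Four scalar parameters, which stays within the limit of 5.
        self.a = nn.Parameter(torch.tensor(1.0))
        self.b = nn.Parameter(torch.tensor(1.0))
        self.c = nn.Parameter(torch.tensor(0.0))
        self.d = nn.Parameter(torch.tensor(0.0))

    def forward(self, x):
        # The output depends only on x and the parameters; no state is kept
        # between calls.
        return self.a * torch.tanh(self.b * x + self.c) + self.d


act = ScaledTanh()

# Count the parameters to check the "at most 5" restriction.
n_params = sum(p.numel() for p in act.parameters())

# Check statelessness: the same input gives the same output twice in a row.
x = torch.randn(4, 5)
with torch.no_grad():
    same = torch.equal(act(x), act(x))
```

A module like this could be assigned to `self.act1`, `self.act2` or `self.act3`; whether it helps is left to experiment.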

Scores shall be awarded as follows:
- 1 pt for explaining the reasoning of your approach in this notebook
- 1 pt for scoring R2 >= 0.25 on `X_test` while using activations that fulfill the restrictions above. You will be scored by `sklearn.metrics.r2_score`. See example below.
- Additional 0 - 3 pts to be assigned based on this formula: `(Your R2 score - baseline score) / (Benchmark score - baseline score) x 3 pts`, where:
    - Benchmark score is the highest scoring R2 achieved by all participants in this problem
    - Baseline score is 0.25 R2 by default. If the lowest R2 achieved by any participant exceeds 0.25, the baseline score is set to that value instead
    - For example, if the highest R2 achieved is 0.5 and the lowest is 0.3, a score of 0.4 earns (0.4 - 0.3) / (0.5 - 0.3) x 3 = 1.5 pts
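
The formula above can be sketched as a small helper. `bonus_points` is a hypothetical function, not the official grader, and clamping the result to the 0 to 3 range is an assumption based on the "0 - 3 pts" wording.

```python
def bonus_points(r2, benchmark_r2, lowest_r2, default_baseline=0.25, max_points=3.0):
    """Bonus points for a given R2 (hypothetical helper, not the official grader)."""
    # The baseline is 0.25 unless every participant scored above it.
    baseline = max(default_baseline, lowest_r2)
    if benchmark_r2 <= baseline:
        raise ValueError("benchmark score must exceed the baseline score")
    raw = (r2 - baseline) / (benchmark_r2 - baseline) * max_points
    # Assumption: points are clamped to the stated 0 to 3 range.
    return min(max(raw, 0.0), max_points)


# The worked example from the text: max 0.5, min 0.3, your score 0.4.
example = bonus_points(0.4, benchmark_r2=0.5, lowest_r2=0.3)
print(f"{example:.2f} pts")
```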

## Datasets

In [None]:
!curl https://storage.googleapis.com/aiolympiadmy_public/dieting_network/X_train.pt -o X_train.pt
!curl https://storage.googleapis.com/aiolympiadmy_public/dieting_network/X_val.pt -o X_val.pt
!curl https://storage.googleapis.com/aiolympiadmy_public/dieting_network/X_test.pt -o X_test.pt
!curl https://storage.googleapis.com/aiolympiadmy_public/dieting_network/y_train.pt -o y_train.pt
!curl https://storage.googleapis.com/aiolympiadmy_public/dieting_network/y_val.pt -o y_val.pt

In [1]:
import torch
import torch.nn as nn
from sklearn.metrics import r2_score

In [2]:
with open("X_train.pt", "rb") as f:
    X_train = torch.load(f)

with open("X_val.pt", "rb") as f:
    X_val = torch.load(f)

with open("X_test.pt", "rb") as f:
    X_test = torch.load(f)


In [3]:
with open("y_train.pt", "rb") as f:
    y_train = torch.load(f)

with open("y_val.pt", "rb") as f:
    y_val = torch.load(f)


## Baseline network

In [4]:
class FixedLinear(nn.Module):
    """
    Similar to a nn.Linear layer, just not trainable
    """

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = torch.ones(out_features, in_features)

    def forward(self, x):
        return torch.mm(x, self.weight.t())
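
Because every weight is 1, each output unit of this layer receives the same value: the sum of the input features. A small sketch with made-up numbers shows this (the tensor `x` is illustrative and not taken from the dataset; the weight is built the same way as in `FixedLinear`).

```python
import torch

# Two illustrative samples with three features each.
x = torch.tensor([[1.0, 2.0, 3.0],
                  [0.5, -0.5, 4.0]])

# Same construction as FixedLinear(3, 2): an all-ones (out, in) weight matrix.
weight = torch.ones(2, 3)
out = torch.mm(x, weight.t())

# Every output column equals the per-row sum of the inputs.
row_sums = x.sum(dim=1, keepdim=True)
print(out)
print(row_sums)
```

With `nn.Identity` activations, the baseline network therefore reduces to a constant multiple of the input row sum (5 x 5 x 5 = 125 times the sum), so any useful behaviour has to come from the activation functions.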

In [5]:
class DietNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = FixedLinear(8, 5)
        self.act1 = nn.Identity() # <- Replace me!
        self.layer2 = FixedLinear(5, 5)
        self.act2 = nn.Identity() # <- Replace me!
        self.layer3 = FixedLinear(5, 5)
        self.act3 = nn.Identity() # <- Replace me!
        self.layer4 = FixedLinear(5, 1)

    def forward(self, x):
        x = self.layer1(x)
        x = self.act1(x)
        x = self.layer2(x)
        x = self.act2(x)
        x = self.layer3(x)
        x = self.act3(x)
        x = self.layer4(x)
        return x

In [6]:
model = DietNetwork()
model.eval();

In [7]:
criterion = nn.MSELoss()

In [8]:
with torch.no_grad():
    y_train_pred = model(X_train)
    y_val_pred = model(X_val)
    
    train_r2 = r2_score(y_train, y_train_pred)
    val_r2 = r2_score(y_val, y_val_pred)
    
    print(
        f"train / val R2: {train_r2:.4f} / {val_r2:.4f}"
    )

train / val R2: -116767.6885 / -113389.9052
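
For reference, R2 compares the squared error of the predictions with the squared error of always predicting the mean of the targets, so predictions on a very different scale from the targets can push it far below zero. Below is a small pure-Python sketch of the standard definition. The helper `r2` and the numbers are illustrative; the notebook itself uses `sklearn.metrics.r2_score`.

```python
def r2(y_true, y_pred):
    """R2 = 1 - SS_res / SS_tot for two equal-length lists of floats."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0]

perfect = r2(y_true, [1.0, 2.0, 3.0])            # exact predictions
mean_only = r2(y_true, [2.0, 2.0, 2.0])          # always predict the mean
wrong_scale = r2(y_true, [100.0, 200.0, 300.0])  # right trend, wrong scale

print(perfect, mean_only, wrong_scale)
```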


## Your work below

In [None]:
# Read everything clearly before you start!

## Saving for grading

In [None]:
with torch.no_grad():
    y_test_pred = model(X_test)

In [None]:
with open("y_test_pred.pt", "wb") as f:
    torch.save(y_test_pred, f)