| | gpt3-small | gpt3-175b | 636-GPT3_XL_filterexperiment_0 |
|---|---|---|---|
| Model shape | | | |
| git_commit | | | |
| n_head | 12 | 96 | 32 |
| n_vocab | 50257 | 50257 | 50257 |
| n_layer | 12 | 96 | 24 |
| n_embd | 768 | 12288 | 2048 |
| n_ctx | 2048 | 2048 | 2048 |
| approx_model_params | 123.53 M | 174.56 G | 1.31 G |
| Training size | | | |
| train_batch_size | 250 | 1600 | 256 |
| train_steps | 585938 | 91553 | 25000 |
| total_train_tokens | 300.00 G | 300.00 G | 13.11 G |
| total_approx_ops | 2.22e+20 | 3.14e+23 | 1.03e+20 |
| total_pflops_days | 2.57 | 3636.75 | 1.19 |
| TPU | | | |
| tpu_name | | | chell_4 |
| n_cores | 2048 | 2048 | 256 |
| total_flops | 107.52 P | 107.52 P | 13.44 P |
| theo_train_days | 0.02 | 33.82 | 0.09 |
| Training progress | | | |
| tb_url | | | vm.eleuther.ai:8005 |
| sacred_id | | | 636 |
| status | | | RUNNING |
| start_time | | | 2021-05-03 20:14:56 UTC |
| n_updates | 0 | 0 | 0 |
| last_update_time | |||
| wall_time_secs | |||
| latest_batch* | |||
| latest_loss* | |||
| fraction_done* | |||
| train_tokens_elapsed* | |||
| approx_ops_elapsed* | |||
| pflops_days_elapsed* | |||
| secs_per_batch | |||
| tokens_per_sec | |||
| theo_eff | |||
| wall_remaining_secs | |||
| est_finish_time |
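The derived rows in the table appear to follow standard scaling-law arithmetic (compute C ≈ 6·N·D, parameters ≈ 12·L·d² plus embeddings). The sketch below reproduces the gpt3-small column under that assumption; the function names are illustrative, not taken from the codebase.

```python
def approx_model_params(n_layer: int, n_embd: int, n_vocab: int) -> int:
    # Assumed formula: 12 * L * d^2 transformer weights plus V * d token
    # embeddings. Matches the table's approx_model_params for all three columns.
    return 12 * n_layer * n_embd**2 + n_vocab * n_embd

def total_train_tokens(train_batch_size: int, n_ctx: int, train_steps: int) -> int:
    # Tokens seen = sequences per batch * tokens per sequence * steps.
    return train_batch_size * n_ctx * train_steps

def total_approx_ops(params: int, tokens: int) -> float:
    # C ~= 6 * N * D (forward plus backward pass), per the usual convention.
    return 6 * params * tokens

def pflops_days(ops: float) -> float:
    # One PFLOP/s-day = 1e15 FLOP/s sustained for 86400 seconds.
    return ops / (1e15 * 86400)

def theo_train_days(ops: float, total_flops: float) -> float:
    # total_flops: peak FLOP/s of the TPU slice (the table's total_flops row).
    return ops / total_flops / 86400

# Check against the gpt3-small column.
params = approx_model_params(n_layer=12, n_embd=768, n_vocab=50257)   # ~123.53 M
tokens = total_train_tokens(train_batch_size=250, n_ctx=2048,
                            train_steps=585938)                       # ~300.00 G
ops = total_approx_ops(params, tokens)                                # ~2.22e+20
print(round(pflops_days(ops), 2))                                     # ~2.57
print(round(theo_train_days(ops, 107.52e15), 2))                      # ~0.02
```

The same functions reproduce the other two columns, e.g. 6 × 1.31 G × 13.11 G ≈ 1.03e+20 ops for the XL run, so the derived rows seem internally consistent with the shape and batch rows above them.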