# Configuring TQL
This guide will show you how to configure and customize TQL to fit your needs.

## The TQL Configuration File
TQL is configured by a [YAML](https://yaml.org/) file.  To see where the configuration file
is located, run the command `tql status` from TQL's command line tool:
```
(tql-venv) jshmoe$ tql status
------------------------------------------------------------------
           Version: 20.1.16
 Version Timestamp: 2021-07-21 17:34:23
       Version Age: 6 days, 17 hours, 56 minutes, 6 seconds
   Filesystem Root: /home/jshmoe/.tql/files
 Working Directory: /home/jshmoe/.tql
Configuration File: /home/jshmoe/.tql/tql_conf.yml
       Api Gateway: http://localhost:9000
    Service Status: Icarus: OFFLINE, Daedalus: OFFLINE
    Service Uptime: 
------------------------------------------------------------------
```


You can also see the current configuration by running the command `tql conf`:
```
(tql-venv) jshmoe$ tql conf
conf_path: /home/jshmoe/.tql/tql_conf.yml
database:
  nanml.standalone.db.h2-disk.directory: /home/jshmoe/.tql/db
  nanml.standalone.db.instance_type: h2-disk
filesystem:
  root: /home/jshmoe/.tql/files
icarus:
...
```

<div class="alert alert-block alert-info">
    <b>NOTE:</b> The TQL configuration file is generated the first time the TQL backend
     starts.  If you have just installed TQL but have not yet used it, first
     run the command <b>tql start</b> to generate a configuration file to edit.
</div>

<br>

## Modifying the Configuration
Edit the configuration file (`/home/jshmoe/.tql/tql_conf.yml` in the example above) using
 your editor of choice.  Afterwards, restart TQL using the command `tql restart` to apply
 your configuration changes.

## Configuration Sections 
The configuration file is broken up into four sections: `filesystem`, `pyspark`,
 `icarus`, and `database`.
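
Putting the four sections together, a complete file might look like the sketch below. The values are the defaults shown elsewhere in this guide and in the sample `tql conf` output above; the file TQL actually generates may contain additional keys that are not shown here.

```yaml
# Sketch of a tql_conf.yml assembled from the defaults in this guide.
# Paths are taken from the sample `tql conf` output and will differ per user.
filesystem:
  root: /home/jshmoe/.tql/files
pyspark:
  conf:
    spark.master: local[*]
    spark.driver.memory: 1024m
    spark.executor.memory: 1024m
icarus:
  configuration:
    server:
      adminConnectors:
      - port: 9001
        type: http
      applicationConnectors:
      - port: 9000
        type: http
  memory: 512m
database:
  nanml.standalone.db.instance_type: h2-disk
  nanml.standalone.db.h2-disk.directory: /home/jshmoe/.tql/db
```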

### FileSystem 
The backend reads and writes files as part of its normal operation. To set the root directory
 it uses, add the following section to your configuration file:
```
filesystem:
  root: /path/to/filesystem/root
```
By default, the backend reads and writes to the local filesystem, in the directory `~/.tql/files/`.
  However, if you deploy the backend to a cluster environment, such as an AWS EMR Cluster or GCP
  Dataproc, all processing nodes must have access to the filesystem.  In that case, you should
  configure the backend to write to cloud storage, such as S3 or GCS.

#### Amazon S3 Filesystem
To read/write to Amazon S3, use the following configuration:
```
filesystem:
  root: s3://your-bucket/sub-folder/
  s3:
    s3_access_key: YOUR_ACCESS_KEY
    s3_secret_key: YOUR_SECRET_KEY
    s3_region: YOUR_REGION
```

#### Google Cloud Storage (GCS) Filesystem
To read/write to GCS, use the following configuration:
```
filesystem:
  root: gs://your-bucket/sub-folder/
  gcs:
    type: "service_account"
    project_id: YOUR_PROJECT_ID
    private_key_id: YOUR_PRIVATE_KEY_ID
    private_key: "YOUR_PRIVATE_KEY"
    client_email: YOUR_PROJECT_ID@appspot.gserviceaccount.com
    client_id: YOUR_CLIENT_ID
    auth_uri: "https://accounts.google.com/o/oauth2/auth"
    token_uri: "https://oauth2.googleapis.com/token"
    auth_provider_x509_cert_url: "https://www.googleapis.com/oauth2/v1/certs"
    client_x509_cert_url: "https://www.googleapis.com..."
```

### PySpark
TQL installs with a basic default configuration of PySpark.  By default, TQL uses a local,
 embedded instance of Spark, denoted by `local[*]` with `1024m` of memory for both the
 driver and executor. You can customize the Spark instance using the following configuration,
  changing the values from their defaults shown here:
```
pyspark:
  conf:
    spark.master: local[*] 
    spark.driver.memory: 1024m
    spark.executor.memory: 1024m
    ...
```
You may add any [Spark application properties](https://spark.apache.org/docs/latest/configuration.html#application-properties)
 you wish under `pyspark -> conf`.  In particular, you may wish to connect to a different type of Spark
 master, such as `yarn`.  This configuration change is required in order to use TQL on an
 [Amazon EMR](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark.html) or
 [Google Dataproc](https://cloud.google.com/dataproc) Cluster.
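
As a rough illustration of the change described above, the sketch below points TQL at a YARN master. The `4g` memory values are arbitrary placeholders rather than recommendations, and the right settings depend on your cluster.

```yaml
# Sketch: run against YARN instead of the embedded local[*] instance.
# Memory sizes are illustrative placeholders; tune them for your cluster.
pyspark:
  conf:
    spark.master: yarn
    spark.driver.memory: 4g
    spark.executor.memory: 4g
```

As with any other change, restart TQL with `tql restart` for the new Spark settings to take effect.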

### Icarus
TQL's REST API server, named **Icarus**, acts as a gateway between the Python user interface
 and the Java query execution engine.  By default, Icarus listens on port `9000` (application)
 and port `9001` (admin) with `512m` of memory. You can customize the web service's memory and
 ports using the following configuration, changing the values from their defaults shown here:
```
icarus:
  configuration:
    server:
      adminConnectors:
      - port: 9001
        type: http
      applicationConnectors:
      - port: 9000
        type: http
  memory: 512m
```

Under the hood, Icarus uses the Dropwizard web framework, and you can further customize
its configuration in this section of the config file.  Read more in the
[Dropwizard configuration reference](https://www.dropwizard.io/en/latest/manual/configuration.html).
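
For example, if the default ports are already in use on your machine, a configuration along these lines would move Icarus elsewhere and give it more memory. The ports `8080` and `8081` and the `1024m` value are arbitrary placeholders chosen for illustration.

```yaml
# Sketch: same structure as the defaults, with illustrative values swapped in.
icarus:
  configuration:
    server:
      adminConnectors:
      - port: 8081        # admin port (default 9001)
        type: http
      applicationConnectors:
      - port: 8080        # application port (default 9000)
        type: http
  memory: 1024m           # default 512m
```

After running `tql restart`, the `Api Gateway` line reported by `tql status` should presumably reflect the new application port.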

### Database
TQL uses a SQL database to store its metadata about Projects, Timelines, Queries,
 and Resultsets.  By default, TQL uses a file-based [H2 Database](http://www.h2database.com/html/main.html)
 with the following configuration:
```
database:
  nanml.standalone.db.instance_type: h2-disk
  nanml.standalone.db.h2-disk.directory: ~/.tql/db
```

However, you may wish to store this metadata in a MySQL database instead.  To do so,
use the following configuration:
```
database:
  nanml.db.instance_type: mysql
  nanml.mysql.db.host: 172.25.0.99
```