# Configuring TQL

This guide shows you how to configure and customize TQL to fit your needs.

## The TQL Configuration File

TQL is configured by a [YAML](https://yaml.org/) file. To find the location of this configuration file at any time, run `tql status` with TQL's command line tool:

```
(tql-venv) jshmoe$ tql status
------------------------------------------------------------------
Version: 20.1.16
Version Timestamp: 2021-07-21 17:34:23
Version Age: 6 days, 17 hours, 56 minutes, 6 seconds
Filesystem Root: /home/jshmoe/.tql/files
Working Directory: /home/jshmoe/.tql
Configuration File: /home/jshmoe/.tql/tql_conf.yml
Api Gateway: http://localhost:9000
Service Status: Icarus: OFFLINE, Daedalus: OFFLINE
Service Uptime:
------------------------------------------------------------------
```

You can also print the current configuration by running `tql conf`:

```
(tql-venv) jshmoe$ tql conf
conf_path: /home/jshmoe/.tql/tql_conf.yml
database:
  nanml.standalone.db.h2-disk.directory: /home/jshmoe/.tql/db
  nanml.standalone.db.instance_type: h2-disk
filesystem:
  root: /home/jshmoe/.tql/files
icarus:
  ...
```
NOTE: The TQL configuration file is generated only on the first startup of the TQL backend. If you have just installed TQL and have not used it yet, run `tql start` once to generate a configuration file that you can edit.
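If you script your setup, you can read the path out of `tql status` instead of copying it by hand. The following is a minimal Python sketch. It assumes the `Configuration File:` label appears exactly as in the sample output above (the format may differ between TQL versions), and `parse_config_path` and `find_tql_config` are hypothetical helper names, not part of TQL. It also applies the note above: if the reported file does not exist yet, it reminds you to run `tql start`.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def parse_config_path(status_output: str) -> Optional[Path]:
    """Return the path from the 'Configuration File:' line of `tql status` output."""
    for line in status_output.splitlines():
        # Split on the first colon only, so paths and URLs in the value stay intact.
        label, sep, value = line.partition(":")
        if sep and label.strip() == "Configuration File":
            value = value.strip()
            return Path(value) if value else None
    return None


def find_tql_config() -> Optional[Path]:
    """Run `tql status` and return the reported configuration path, if any."""
    if shutil.which("tql") is None:
        # The CLI is not on PATH, e.g. because the TQL virtualenv is not active.
        return None
    result = subprocess.run(
        ["tql", "status"], capture_output=True, text=True, check=False
    )
    return parse_config_path(result.stdout)


if __name__ == "__main__":
    conf_path = find_tql_config()
    if conf_path is None:
        print("Could not determine the TQL configuration file location.")
    elif not conf_path.exists():
        print(f"{conf_path} does not exist yet; run `tql start` once to generate it.")
    else:
        print(f"TQL configuration file: {conf_path}")
```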

## Modifying the Configuration

Edit the configuration file (in the example above, `/home/jshmoe/.tql/tql_conf.yml`) using your editor of choice. Afterwards, restart TQL with `tql restart` to apply your changes.

## Configuration Sections

The configuration file is broken up into four sections: `filesystem`, `pyspark`, `icarus`, and `database`.

### FileSystem

The backend reads and writes files as part of its normal operation. Configure the filesystem root by adding the following section to your configuration file:

```
filesystem:
  root: /path/to/filesystem/root
```

By default, the backend reads and writes to the local filesystem, in the directory `~/.tql/files/`. However, if you deploy the backend to a cluster environment, such as an Amazon EMR cluster or a Google Cloud Dataproc cluster, every processing node must be able to access the filesystem. In that case, configure the backend to use cloud storage such as Amazon S3 or Google Cloud Storage (GCS).

#### Amazon S3 Filesystem

To read from and write to Amazon S3, use the following configuration:

```
filesystem:
  root: s3://your-bucket/sub-folder/
  s3:
    s3_access_key: YOUR_ACCESS_KEY
    s3_secret_key: YOUR_SECRET_KEY
    s3_region: YOUR_REGION
```

#### Google GCS Filesystem

To read from and write to GCS, use the following configuration. The fields under `gcs` correspond to those in a Google Cloud service account JSON key file:

```
filesystem:
  root: gs://your-bucket/sub-folder/
  gcs:
    type: "service_account"
    project_id: YOUR_PROJECT_ID
    private_key_id: YOUR_PRIVATE_KEY_ID
    private_key: "YOUR_PRIVATE_KEY"
    client_email: YOUR_PROJECT_ID@appspot.gserviceaccount.com
    client_id: YOUR_CLIENT_ID
    auth_uri: "https://accounts.google.com/o/oauth2/auth"
    token_uri: "https://oauth2.googleapis.com/token"
    auth_provider_x509_cert_url: "https://www.googleapis.com/oauth2/v1/certs"
    client_x509_cert_url: "https://www.googleapis.com..."
```

### PySpark

TQL installs with a basic default PySpark configuration. By default, TQL uses a local, embedded instance of Spark, denoted by the master URL `local[*]`, with `1024m` of memory for both the driver and the executor.
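Whether a local `filesystem.root` is safe depends on the Spark master: with the default `local[*]` master everything runs on one machine, while a cluster master such as `yarn` needs the shared storage described in the FileSystem section. The sketch below encodes that rule as a rough pre-flight check. It assumes you have already read `spark.master` and `filesystem.root` from your configuration file, treats only `s3://` and `gs://` roots as shared (other shared storage would need to be added), and its function names are hypothetical rather than part of TQL.

```python
from typing import List
from urllib.parse import urlparse

# Storage schemes this sketch treats as reachable from every cluster node
# (the S3 and GCS roots shown above); extend it for other shared storage.
SHARED_SCHEMES = {"s3", "gs"}


def is_local_master(spark_master: str) -> bool:
    """True for embedded Spark masters such as 'local', 'local[4]' or 'local[*]'."""
    return spark_master == "local" or spark_master.startswith("local[")


def check_filesystem_root(spark_master: str, filesystem_root: str) -> List[str]:
    """Return warnings when a cluster master is paired with a non-shared root."""
    if is_local_master(spark_master):
        return []  # single machine: a local directory such as ~/.tql/files is fine
    scheme = urlparse(filesystem_root).scheme
    if scheme in SHARED_SCHEMES:
        return []
    return [
        f"spark.master is {spark_master!r}, but filesystem.root "
        f"{filesystem_root!r} is not on S3 or GCS; executors on other "
        "nodes may not be able to read or write it."
    ]


if __name__ == "__main__":
    # Example values as they might appear under pyspark -> conf and filesystem.
    for warning in check_filesystem_root("yarn", "/home/jshmoe/.tql/files"):
        print("WARNING:", warning)
```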
You can customize the Spark instance with the following configuration; the values shown are the defaults:

```
pyspark:
  conf:
    spark.master: local[*]
    spark.driver.memory: 1024m
    spark.executor.memory: 1024m
    ...
```

You may add any [Spark application properties](https://spark.apache.org/docs/latest/configuration.html#application-properties) you wish under `pyspark -> conf`. In particular, you may want to connect to a different type of Spark master, such as `yarn`. This change is required to use TQL on an [Amazon EMR](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark.html) or [Google Dataproc](https://cloud.google.com/dataproc) cluster.

### Icarus

TQL's REST API server, named **Icarus**, acts as a gateway between the Python user interface and the Java query execution engine. By default, Icarus listens on ports `9000` (application) and `9001` (admin) with `512m` of memory. You can customize the web service's memory and ports with the following configuration; the values shown are the defaults:

```
icarus:
  configuration:
    server:
      adminConnectors:
        - port: 9001
          type: http
      applicationConnectors:
        - port: 9000
          type: http
  memory: 512m
```

Under the hood, Icarus uses the Dropwizard web framework, and you can further customize its configuration in this section of the config file. Read more about Dropwizard configuration [here](https://www.dropwizard.io/en/latest/manual/configuration.html).

### Database

TQL uses a SQL database to store its metadata about Projects, Timelines, Queries, and Resultsets. By default, TQL uses a file-based [H2 Database](http://www.h2database.com/html/main.html) with the following configuration:

```
database:
  nanml.standalone.db.instance_type: h2-disk
  nanml.standalone.db.h2-disk.directory: ~/.tql/db
```

However, you may wish to store this metadata in an external MySQL server instead. To do so, use the following configuration, replacing `172.25.0.99` with the address of your MySQL server:

```
database:
  nanml.db.instance_type: mysql
  nanml.mysql.db.host: 172.25.0.99
```
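Before restarting TQL against MySQL, it can help to confirm that the database host accepts TCP connections from the machine running TQL. The sketch below does this with the Python standard library only. It assumes MySQL listens on its default port, 3306, because the configuration above does not show a port key; `port_is_open` is a hypothetical helper, and a successful connection says nothing about credentials or schema.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, unreachable hosts and timeouts all raise OSError.
        return False


if __name__ == "__main__":
    # Host taken from nanml.mysql.db.host above. 3306 is MySQL's default port;
    # the TQL configuration shown does not set a port, so adjust if needed.
    print("MySQL reachable:", port_is_open("172.25.0.99", 3306))
```

The same helper can also confirm, after `tql restart`, that Icarus is accepting connections on its application port, for example `port_is_open("localhost", 9000)`.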