Basic ETL Template README

Instructions for using the basic ETL template and recommendations for modifying it to fit your project needs.

Written by: Hannah Calkins
Written on: 2020-03-17

The basic format of the ETL template assumes you are taking data from a tabular format (such as a relational database) and transforming it into data that will also be in a tabular format. Recommendations for adapting this template to other data formats appear later in this README.

This template can be used as a tool to outline an ETL process in advance of writing code. Filling out this template achieves the following goals:

* Produces a reference document that easily and quickly identifies the source of data in case of questions or troubleshooting
* Creates a resource that can be used to onboard new staff to an existing process
* Documents an established workflow so it can be reproduced in the future
* Identifies areas where the data source or destination is unclear and needs additional consideration or decision making

This template should NOT serve as a replacement for accurately commenting code, if that is applicable to your process.

Columns:

Source Table: this column should hold the name of the table from which you are pulling the source data.

Source Field: this column should hold the name of the field from which you are pulling the source data.

Destination Table: this column should hold the name of the table into which you are placing the data.

Destination Field: this column should hold the name of the field into which you are placing the data.

Transformation Description: this column should hold a description of any transformations you are performing on the data, such as converting days to years, modifying a date format, or mapping one value to another (for example, mapping "white" to "caucasian"). A short code sketch illustrating this kind of transformation appears at the end of this README.

Note: this column is a place to record any additional notes.

Modifications

You may want or need to modify this template in various ways to document your particular process. Here are some ideas that may be applicable.

Source and destination database: if you are pulling data from multiple databases, or pushing data into multiple databases, it may be necessary to document additional levels above "table". You can add columns for source database and destination database to address this.

Many source fields to one destination field: if you are pulling data from many source fields and pushing it into one destination field, you can use the template as-is. It is OK to list the same destination field on multiple lines.

One source field to many destination fields: if you are pulling data from one source field and "splitting" it into many destination fields, use the template as-is and use the Transformation Description and Note fields as appropriate to denote the particular filtering or "splitting" criteria you have determined. Remember that this should provide only a basic summary; it is not a replacement for the code that does the actual work, which can also serve as a reference in this case.

Starting from or pushing to non-tabular data: if you are pulling data from a non-tabular dataset or pushing to a non-tabular dataset, you can still use this method to document your source and destination. Instead of table and field, use a location reference appropriate to your data format, such as a JSONPath for JSON data or an XPath for XML data. A short sketch of this appears at the end of this README.
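
As an illustration of the Transformation Description column, the sketch below shows what documented transformations such as "convert days to years" or "modify a date format" might look like in code. This is a minimal Python sketch, not part of the template; the field names (age_days, age_years, visit_date) and the example values are hypothetical.

```python
# Minimal sketch (not part of the template) of transformations that the
# Transformation Description column might document. Field names such as
# age_days, age_years, and visit_date are hypothetical.
from datetime import datetime

def days_to_years(age_days: int) -> int:
    """Convert an age in days to whole years ("convert days to years")."""
    return age_days // 365

def reformat_date(date_str: str) -> str:
    """Reformat a date from MM/DD/YYYY to YYYY-MM-DD ("modify a date format")."""
    return datetime.strptime(date_str, "%m/%d/%Y").strftime("%Y-%m-%d")

# A hypothetical source row and the destination row it becomes
source_row = {"age_days": 12045, "visit_date": "03/17/2020"}
destination_row = {
    "age_years": days_to_years(source_row["age_days"]),
    "visit_date": reformat_date(source_row["visit_date"]),
}
print(destination_row)  # {'age_years': 33, 'visit_date': '2020-03-17'}
```

The template row describing this work would simply name the source and destination fields and summarize the conversion in the Transformation Description column; the code above remains the authoritative reference for the details.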
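
For a non-tabular source, the sketch below shows how a location reference such as a JSONPath expression could stand in for the Source Table and Source Field columns. This is a minimal Python sketch under assumed names: the file patients.json, the JSONPath, and the field names are all hypothetical. The same idea applies to XML sources, where an XPath expression would take the place of the JSONPath.

```python
# Minimal sketch (not part of the template) of documenting a non-tabular
# (JSON) source. The JSONPath string stands in for the "Source Table"/
# "Source Field" columns; the file name and field names are hypothetical.
import json

SOURCE_LOCATION = "$.patient.demographics.birth_date"  # what you would record in the template

with open("patients.json") as f:  # hypothetical JSON source file
    record = json.load(f)

# Navigate to the location described by the JSONPath above
birth_date = record["patient"]["demographics"]["birth_date"]

# Destination field as documented in the template
destination_row = {"birth_date": birth_date}
```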