# Create a pipeline
Follow the steps below to create a pipeline from the WeatherAPI.com API to DuckDB from scratch. The same steps can be repeated for any source and destination of your choice: use `dlt init <source> <destination>` and then build the pipeline for that API instead.
Please make sure you have installed `dlt` before following the steps below.
## 1. Initialize project
Create a new empty directory for your `dlt` project by running:

```sh
mkdir weatherapi_duckdb && cd weatherapi_duckdb
```
Start a `dlt` project with a pipeline template that loads data to DuckDB by running:

```sh
dlt init weatherapi duckdb
```
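This scaffolds a small project in the directory; the layout should look roughly like the sketch below (exact files may vary between `dlt` versions):

```text
weatherapi_duckdb/
├── .dlt/
│   ├── config.toml
│   └── secrets.toml      # credentials go here, outside version control
├── weatherapi.py         # the pipeline script you will edit
└── requirements.txt      # dependencies for the duckdb destination
```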
Install the dependencies necessary for DuckDB:

```sh
pip install -r requirements.txt
```
## 2. Add WeatherAPI.com API credentials
You will need to sign up for the WeatherAPI.com API. Once you do this, you should see your API key at the top of your user page.

Copy the value of the API key into `.dlt/secrets.toml`:

```toml
[sources]
api_secret_key = '<api key value>'
```
The secret name must correspond to the argument name in the source, i.e. `api_secret_key=dlt.secrets.value` in `def weatherapi_source(api_secret_key=dlt.secrets.value):`.
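As an aside, `dlt` resolves secrets from environment variables as well as from `secrets.toml`, using a double-underscore naming convention, so if you prefer not to keep the key in a file you could instead run something like:

```sh
# equivalent to setting api_secret_key under [sources] in .dlt/secrets.toml
export SOURCES__API_SECRET_KEY='<api key value>'
```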
Run the `weatherapi.py` pipeline script to test that the authentication headers look fine:

```sh
python3 weatherapi.py
```
Your API key should be printed out to stdout along with some test data.
## 3. Request data from the WeatherAPI.com API
Replace the definition of the `weatherapi_resource` function in the `weatherapi.py` pipeline script with a call to the WeatherAPI.com API:
```py
@dlt.resource(write_disposition="append")
def weatherapi_resource(api_secret_key=dlt.secrets.value):
    url = "https://api.weatherapi.com/v1/current.json"
    params = {
        "q": "NYC",
        "key": api_secret_key,
    }
    response = requests.get(url, params=params)
    response.raise_for_status()
    yield response.json()
```
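The `write_disposition="append"` hint tells `dlt` to add new rows on every run, which suits periodic weather snapshots; if you would rather have each run overwrite the table, `"replace"` is the usual alternative, e.g.:

```py
# each run replaces the table contents instead of appending to them
@dlt.resource(write_disposition="replace")
def weatherapi_resource(api_secret_key=dlt.secrets.value):
    ...
```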
Run the `weatherapi.py` pipeline script to test that the API call works:

```sh
python3 weatherapi.py
```
This should print out the weather in New York City right now.
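The location is hard-coded to NYC above. If you want to fetch other places, one option is to make the query a resource argument; a minimal sketch (the `query` parameter and its default are illustrative additions, not part of the template):

```py
@dlt.resource(write_disposition="append")
def weatherapi_resource(query="NYC", api_secret_key=dlt.secrets.value):
    url = "https://api.weatherapi.com/v1/current.json"
    params = {"q": query, "key": api_secret_key}
    response = requests.get(url, params=params)
    response.raise_for_status()
    yield response.json()
```

If you call the resource via `weatherapi_source`, remember to pass the argument through there as well.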
## 4. Load the data
Remove the `exit()` call from the `main` function in `weatherapi.py`, so that running the `python3 weatherapi.py` command will now also run the pipeline:
```py
if __name__ == '__main__':
    # configure the pipeline with your destination details
    pipeline = dlt.pipeline(
        pipeline_name='weatherapi',
        destination='duckdb',
        dataset_name='weatherapi_data'
    )

    # run the resource and print the data it yields
    data = list(weatherapi_resource())
    print(data)

    # run the pipeline with your parameters
    load_info = pipeline.run(weatherapi_source())

    # pretty print the information on data that was loaded
    print(load_info)
```
Run the `weatherapi.py` pipeline script to load data into DuckDB:

```sh
python3 weatherapi.py
```
Then run this command to see that the data loaded:

```sh
dlt pipeline weatherapi show
```

This will open a Streamlit app that gives you an overview of the data loaded.
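If you prefer SQL over the Streamlit view, you can also inspect the data directly with the `duckdb` client. A minimal sketch, assuming the default destination settings, under which the database file is named after the pipeline, the dataset name becomes the schema, and each resource becomes a table:

```py
import duckdb

# the duckdb destination writes <pipeline_name>.duckdb in the working
# directory by default; adjust the path if your setup differs
conn = duckdb.connect("weatherapi.duckdb")

# dataset_name is the schema; the resource name is the table
rows = conn.execute(
    "SELECT * FROM weatherapi_data.weatherapi_resource LIMIT 5"
).fetchall()
print(rows)
```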
## 5. Next steps
Now that you have a working pipeline, you have options for what to learn next:
- Deploy this pipeline with GitHub Actions, so that the data is automatically loaded on a schedule.
- Transform the loaded data with dbt or in Pandas DataFrames.
- Learn how to run, monitor, and alert when you put your pipeline in production.
- Try loading data to a different destination like Google BigQuery, Amazon Redshift, or Postgres.