Version: 0.3.25

Shopify


Shopify is a user-friendly e-commerce solution that enables anyone to easily create and manage their own online store.

This Shopify dlt verified source and pipeline example loads data using the Shopify API to the destination of your choice.

The resources that this verified source supports are:

Name | Description
customers | Individuals or entities who have created accounts on a Shopify-powered online store
orders | Transactions made by customers on an online store
products | The individual items or goods that are available for sale

Setup Guide

Grab credentials

Grab API token

  1. Log in to Shopify.
  2. Click the settings icon⚙️ at the bottom left.
  3. Choose “Apps and sales channels.”
  4. Select the “Develop apps” tab.
  5. Click “Create an app” and enter app details.
  6. Go to “Configuration” and choose “Configure” under “Admin API integration."
  7. Grant read access in “Admin API access scopes.”
  8. Save the configuration.
  9. Hit “Install app” and confirm.
  10. Reveal and copy the Admin API access token. Store it safely; it's shown only once.

Note: The Shopify UI, which is described here, might change. The full guide is available at this link.

Initialize the verified source

To get started with your data pipeline, follow these steps:

  1. Enter the following command:

    dlt init shopify_dlt duckdb

    This command will initialize the pipeline example with Shopify as the source and duckdb as the destination.

  2. If you'd like to use a different destination, simply replace duckdb with the name of your preferred destination.

  3. After running this command, a new directory will be created with the necessary files and configuration settings to get started.

For more information, read the Walkthrough: Add a verified source.

Add credentials

  1. Inside the .dlt folder, you'll find a file called secrets.toml, which is where you can securely store your access tokens and other sensitive information. It's important to handle this file with care and keep it safe.

    Here's what the file looks like:

    # Shopify
    [sources.shopify_dlt]
    private_app_password = "Please set me up!" # Admin API access token copied above
  2. Update private_app_password with the API access token that you copied above.

  3. Next, store your pipeline configuration details in the .dlt/config.toml.

    Here's what the config.toml looks like:

    [sources.shopify_dlt]
    shop_url = "Please set me up!" # The URL of your Shopify store
  4. Update shop_url with the URL of your Shopify store. For example, "https://shop-123.myshopify.com".

  5. Next, follow the destination documentation instructions to add credentials for your chosen destination, ensuring proper routing of your data to the final destination.

For more information, read the General Usage: Credentials.
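Alternatively, you can pass these values straight to the source in your pipeline script instead of editing the TOML files. Below is a minimal sketch; SHOPIFY_TOKEN and SHOPIFY_SHOP_URL are hypothetical environment variable names that you would export yourself:

import os

from shopify_dlt import shopify_source

# A sketch: supply credentials in code rather than via .dlt/secrets.toml
# and .dlt/config.toml. SHOPIFY_TOKEN and SHOPIFY_SHOP_URL are hypothetical
# names; export them before running the script.
source = shopify_source(
    private_app_password=os.environ["SHOPIFY_TOKEN"],
    shop_url=os.environ["SHOPIFY_SHOP_URL"],  # e.g. "https://shop-123.myshopify.com"
)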

Run the pipeline

  1. Before running the pipeline, ensure that you have installed all the necessary dependencies by running the command:
    pip install -r requirements.txt
  2. You're now ready to run the pipeline! To get started, run the following command:
    python shopify_dlt_pipeline.py
  3. Once the pipeline has finished running, you can verify that everything loaded correctly by using the following command:
    dlt pipeline <pipeline_name> show
    For example, the pipeline_name for the above pipeline example is shopify; you may also use any custom name instead.

For more information, read the Walkthrough: Run a pipeline.

Sources and resources

dlt works on the principle of sources and resources.

Source shopify_source:

This function returns a list of resources to load products, orders, and customers data from Shopify API.

def shopify_source(
    private_app_password: str = dlt.secrets.value,
    api_version: str = DEFAULT_API_VERSION,
    shop_url: str = dlt.config.value,
    start_date: TAnyDateTime = FIRST_DAY_OF_MILLENNIUM,
    end_date: Optional[TAnyDateTime] = None,
    created_at_min: TAnyDateTime = FIRST_DAY_OF_MILLENNIUM,
    items_per_page: int = DEFAULT_ITEMS_PER_PAGE,
    order_status: TOrderStatus = "any",
) -> Iterable[DltResource]:

private_app_password: App's password for your shop.

api_version: API version (e.g., 2023-01).

shop_url: Your shop's URL (e.g., https://my-shop.myshopify.com).

items_per_page: Max items fetched per page (Default: 250).

start_date: Imports items updated since this date (Default: 2000-01-01). Used for incremental loading if end_date isn't specified. Accepts ISO 8601 date/datetime formats.

end_date: End of the data load time range. Paired with start_date to load a fixed window; when unspecified, the source loads incrementally from start_date.

created_at_min: Load items created since this date (Default: 2000-01-01).

order_status: Filter for order status: 'open', 'closed', 'cancelled', 'any' (Default: 'any').
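Putting these parameters together, a source that loads only open orders updated during the first quarter of 2024 could look like the sketch below; credentials are assumed to come from the TOML files set up earlier:

from shopify_dlt import shopify_source

# A sketch: restrict the load to open orders updated in Q1 2024.
# private_app_password and shop_url are read from .dlt/secrets.toml
# and .dlt/config.toml as configured in the setup guide.
source = shopify_source(
    start_date="2024-01-01",  # ISO 8601 strings are accepted
    end_date="2024-04-01",
    order_status="open",
    items_per_page=100,
)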

Resource products:

This resource loads products from your Shopify shop into the destination. It supports incremental loading and pagination.

@dlt.resource(primary_key="id", write_disposition="merge")
def products(
    updated_at: dlt.sources.incremental[pendulum.DateTime] = dlt.sources.incremental(
        "updated_at",
        initial_value=start_date_obj,
        end_value=end_date_obj,
        allow_external_schedulers=True,
    ),
    created_at_min: pendulum.DateTime = created_at_min_obj,
    items_per_page: int = items_per_page,
) -> Iterable[TDataItem]:

updated_at: The saved state of the last 'updated_at' value.

In addition to products, the source provides two more resources, "orders" and "customers"; both support incremental loading and pagination.
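Because these resources are declared with write_disposition="merge" and an incremental cursor on updated_at, re-running the same pipeline fetches only records changed since the previous run and merges them by primary key. A minimal sketch of two consecutive runs:

import dlt

from shopify_dlt import shopify_source

pipeline = dlt.pipeline(
    pipeline_name="shopify", destination="duckdb", dataset_name="shopify_data"
)

# First run: loads all orders updated since start_date.
print(pipeline.run(shopify_source(start_date="2023-01-01").with_resources("orders")))

# Second run: the 'updated_at' cursor saved in pipeline state takes over,
# so only orders updated since the first run are fetched and merged.
print(pipeline.run(shopify_source(start_date="2023-01-01").with_resources("orders")))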

Customization

Create your own pipeline

If you wish to create your own pipelines, you can leverage source and resource methods from this verified source.

  1. Configure the pipeline by specifying the pipeline name, destination, and dataset as follows:

    pipeline = dlt.pipeline(
        pipeline_name="shopify",  # Use a custom name if desired
        destination="duckdb",  # Choose the appropriate destination (e.g., duckdb, redshift, postgres)
        dataset_name="shopify_data",  # Use a custom name if desired
    )

    To read more about pipeline configuration, please refer to our documentation.

  2. To load data from "products", "orders", and "customers", starting from January 1, 2023:

    # Add your desired resources to the list...
    resources = ["products", "orders", "customers"]
    start_date="2023-01-01"

    load_data = shopify_source(start_date=start_date).with_resources(*resources)
    load_info = pipeline.run(load_data)
    print(load_info)
  3. To load past Shopify orders in weekly chunks using the start_date and end_date parameters, which minimizes the risk of failure during large data loads. Chunks and incremental loads can also run in parallel to accelerate the initial load:

    # Load all orders from 2023-01-01 to now
    min_start_date = current_start_date = pendulum.datetime(2023, 1, 1)
    max_end_date = pendulum.now()

    # Create a list of time ranges of 1 week each; we'll use this to load the data in chunks
    ranges: List[Tuple[pendulum.DateTime, pendulum.DateTime]] = []
    while current_start_date < max_end_date:
        end_date = min(current_start_date.add(weeks=1), max_end_date)
        ranges.append((current_start_date, end_date))
        current_start_date = end_date

    for start_date, end_date in ranges:
        print(f"Load orders between {start_date} and {end_date}")
        # Create the source with start and end date set according to the current time range to filter;
        # created_at_min lets us set a cutoff to exclude orders created before the initial date (2023-01-01),
        # even if they were updated after that date
        load_data = shopify_source(
            start_date=start_date, end_date=end_date, created_at_min=min_start_date
        ).with_resources("orders")

        load_info = pipeline.run(load_data)
        print(load_info)

    # Continue loading new data incrementally starting at the end of the last range;
    # created_at_min still filters out items created before 2023-01-01
    load_info = pipeline.run(
        shopify_source(
            start_date=max_end_date, created_at_min=min_start_date
        ).with_resources("orders")
    )
    print(load_info)
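Once a load finishes, you can sanity-check the result directly from Python. Below is a minimal sketch using the pipeline's SQL client, assuming the duckdb destination and the "shopify_data" dataset configured above:

# A sketch: count the loaded orders via the pipeline's SQL client.
with pipeline.sql_client() as client:
    rows = client.execute_sql("SELECT count(*) FROM orders")
    print(rows)  # a list of row tuples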

