Open Source Python Package

Synthetic table data that behaves like the real world

Define your schema in YAML once, then generate realistic multi-table data with Faker, relationships, deterministic seeds, and export targets for analytics and testing pipelines.

pip install tablefaker
Image: bank data generated from a YAML configuration

Features

Everything needed for realistic synthetic datasets

YAML-first schema design

Define columns, types, and generation logic in a readable config that works well in repos and CI.

Cross-table relationships

Use foreign_key(...) and copy_from_fk(...) for realistic parent-child data.
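
To make the parent-child idea concrete, here is a minimal sketch in plain Python of what the two helpers appear to do, judging by their names: pick a key from an existing parent row, then copy another attribute from that same row. The tables, column names, and seed are hypothetical, and this is not tablefaker's implementation.

```python
import random

# Illustration only: plain Python, not tablefaker internals.
rng = random.Random(7)

# Hypothetical parent table with a primary key and one attribute.
customers = [
    {"id": 1, "email": "ana@example.com"},
    {"id": 2, "email": "ben@example.com"},
    {"id": 3, "email": "cy@example.com"},
]
customers_by_id = {row["id"]: row for row in customers}

orders = []
for order_id in range(1, 6):
    # "Foreign key": reuse the key of a parent row that really exists.
    customer_id = rng.choice(customers)["id"]
    # "Copy from FK": read another attribute from that same parent row,
    # so the child row never disagrees with its parent.
    customer_email = customers_by_id[customer_id]["email"]
    orders.append(
        {"id": order_id, "customer_id": customer_id, "customer_email": customer_email}
    )
```

Every order ends up pointing at a real customer, and the copied email always matches that customer's row, which is the property that makes joins on the generated data behave sensibly.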

Deterministic generation

Set config.seed for reproducible datasets across local runs, tests, and pipelines.
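
The general mechanism behind a seed can be sketched with Python's standard library: the same seed yields the same pseudo-random sequence on every run. The function name and value range below are hypothetical, and tablefaker's own seeding may work differently under the hood.

```python
import random


def generate_ages(seed, count=5):
    """Return `count` pseudo-random ages that depend only on `seed`."""
    # A private Random instance keeps the sequence independent of any
    # other code that touches the global random state.
    rng = random.Random(seed)
    return [rng.randint(18, 90) for _ in range(count)]


first_run = generate_ages(seed=42)
second_run = generate_ages(seed=42)
```

Because both calls use seed 42, first_run and second_run hold identical values, which is what lets a test assert against generated data.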

Flexible distributions

Control how child rows pick their parents with uniform, zipf, or weighted foreign-key sampling, so skewed real-world behavior can be modeled.
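
The difference between uniform and Zipf-style sampling can be illustrated with the standard library. This is a sketch under assumptions: the parent ids, the exponent 1.2, and the helper names are hypothetical, and the exact weighting tablefaker applies is not shown in this document.

```python
import random
from collections import Counter

rng = random.Random(0)
parent_ids = [1, 2, 3, 4, 5]  # hypothetical parent keys


def zipf_weights(n, s=1.2):
    """Zipf-like weights: the parent at rank r gets weight 1 / r**s."""
    return [1 / (rank ** s) for rank in range(1, n + 1)]


weights = zipf_weights(len(parent_ids))

# Uniform: every parent is equally likely to be referenced.
uniform_counts = Counter(rng.choice(parent_ids) for _ in range(10_000))

# Zipf-like: a few parents receive most of the child rows.
zipf_counts = Counter(
    rng.choices(parent_ids, weights=weights, k=1)[0] for _ in range(10_000)
)
```

Under the Zipf-like weights the first parent collects far more child rows than the last one, similar to a few customers placing most of the orders. Passing any explicit list as weights gives the general weighted case.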

Custom logic support

Use Python expressions, imports, and community Faker providers when built-in generators are not enough.

CLI and Python API

Run from the terminal, or import the package in notebooks, scripts, and app-level test tooling.

Output Formats

One schema, many targets

Export in the format your workflow already expects.

Pandas DataFrame, CSV, JSON, Parquet, Excel, SQL inserts, Delta Lake

Examples

Start in minutes with YAML and CLI

Minimal YAML

tables:
  - table_name: person
    columns:
      - column_name: id
        data: row_id
      - column_name: first_name
        data: fake.first_name()
      - column_name: last_name
        data: fake.last_name()

CLI command

tablefaker \
  --config tests/test_table.yaml \
  --file_type csv \
  --target ./exports \
  --seed 42

Pass --infer-attrs true to override the config's attribute-inference setting from the command line.

Advanced Features

YAML and code snippets for power users


Sample Domains

Production-style examples included

Quick Start

Up and running in three steps

1. Install package

Install from PyPI with pip install tablefaker.

2. Create YAML schema

Define tables, columns, and generation expressions in a single config file.

3. Generate data

Use the CLI or the Python API to export data to CSV, JSON, Parquet, SQL, Excel, or Delta Lake.
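
For the Python API route, the steps above might look like the sketch below. It writes the minimal person schema to a file and loads it as a DataFrame. The call tablefaker.to_pandas and its return shape (a dict of DataFrames keyed by table name) are assumptions based on the package README, so check them against the installed version. The helper names and the person.yaml filename are hypothetical.

```python
from pathlib import Path

# The minimal single-table schema, kept inline so the example is self-contained.
MINIMAL_CONFIG = """\
tables:
  - table_name: person
    columns:
      - column_name: id
        data: row_id
      - column_name: first_name
        data: fake.first_name()
      - column_name: last_name
        data: fake.last_name()
"""


def write_config(directory):
    """Write the schema to <directory>/person.yaml and return its path."""
    path = Path(directory) / "person.yaml"
    path.write_text(MINIMAL_CONFIG, encoding="utf-8")
    return path


def load_person_frame(config_path):
    """Generate the person table and return it as a DataFrame."""
    # Imported lazily so this module loads even where tablefaker is missing.
    import tablefaker

    # Assumption: to_pandas returns a dict of DataFrames keyed by table name.
    frames = tablefaker.to_pandas(str(config_path))
    return frames["person"]
```

With tablefaker installed, load_person_frame(write_config(".")) should return the generated person table, ready for use in a notebook or a test fixture.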

FAQ

Common questions

Can I generate deterministic data?

Yes. Set config.seed or pass --seed to generate reproducible output.

Can I model table relationships?

Yes. Define primary keys and use foreign_key() and copy_from_fk() in child tables.

Does it support custom Faker providers?

Yes. Add providers through config.community_providers or Python API options.

Where can I report issues?

Use the GitHub repository issues page for bug reports and feature requests.

Build better test data pipelines with tablefaker

Open source, scriptable, and ready for local development, CI, and analytics workflows.