(Blueprint)
Event-Driven Bronze → Silver → Gold Architecture
with Automated Data Quality & QuickSight Dashboards
Demo using synthetic E-Commerce data (Mockaroo)
Why many teams modernize toward automated cloud pipelines
Many organizations work with fragmented data sources and rely on significant manual effort before reporting and analytics become reliable.
This serverless AWS blueprint automates ingestion, transformation, and quality monitoring in an event-driven fashion, and prepares analytics-ready data for BI and dashboards.
Three-Layer Medallion Architecture on AWS
Raw data ingestion from source systems
CSV/JSON files stored in S3, cataloged via Glue Crawlers
Cleaned and standardized Parquet format
Type casting, partitioning, data standardization
Star schema with fact and dimension tables
Business-ready analytics tables optimized for querying
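One way the three layers could be laid out in S3 is sketched below. The bucket name and prefix scheme are illustrative assumptions, not part of the blueprint; the point is that Bronze keeps raw files by ingestion date while Silver and Gold use year/month/day partitions that Athena can prune.

```python
from datetime import date

BUCKET = "ecommerce-lake"  # hypothetical bucket name

def layer_path(layer: str, dataset: str, d: date) -> str:
    """Build a partitioned S3 key for a given medallion layer and dataset."""
    if layer == "bronze":
        # Raw CSV/JSON files, grouped by ingestion date
        return f"s3://{BUCKET}/bronze/{dataset}/ingest_date={d.isoformat()}/"
    # Silver/Gold hold Parquet, partitioned for Athena partition pruning
    return (f"s3://{BUCKET}/{layer}/{dataset}/"
            f"year={d.year}/month={d.month:02d}/day={d.day:02d}/")

print(layer_path("silver", "orders", date(2024, 5, 1)))
# s3://ecommerce-lake/silver/orders/year=2024/month=05/day=01/
```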
Modern Cloud & Data Engineering with AWS Serverless Services
S3 upload triggers EventBridge rule that launches Step Functions orchestration
Athena Saved Queries check data completeness, null values, and plausibility, with SNS alerts on failures
Fact tables (orders, transactions) joined with dimension tables (customers, products, date)
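The S3-to-Step-Functions trigger could use an EventBridge event pattern like the sketch below, matching "Object Created" events for CSV uploads to the Bronze bucket. The bucket name is a placeholder; the suffix filter mirrors the file-extension match described above.

```python
import json

# Hypothetical EventBridge event pattern for S3 EventBridge notifications:
# match "Object Created" events from the Bronze bucket for .csv keys.
event_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {
        "bucket": {"name": ["ecommerce-lake-bronze"]},   # assumed bucket name
        "object": {"key": [{"suffix": ".csv"}]},         # only CSV uploads
    },
}

print(json.dumps(event_pattern, indent=2))
```

A rule with this pattern would have the Step Functions state machine as its target, so every matching upload starts one pipeline execution.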
Event-Driven Workflow on AWS
CSV file uploaded to S3 Bronze layer
S3 bucket with EventBridge notifications enabled
EventBridge triggers Step Functions state machine
Event pattern matches bucket and file extension
Glue ETL Job transforms CSV to Parquet
PySpark job applies schema and writes to Silver layer
Athena registers new partition
ALTER TABLE ADD PARTITION for Silver table
Athena inserts data into fact tables
INSERT INTO with type casting and transformations
QuickSight Refresh – optional
Datasets can be refreshed after pipeline completion to reflect the latest data
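The workflow steps above could be expressed as a Step Functions state machine roughly like this Amazon States Language sketch (shown as a Python dict). Job, database, and workgroup names are illustrative, the partition value would in practice be derived from the triggering event, and the INSERT statement is abbreviated.

```python
import json

# Hedged sketch of the orchestration: Glue ETL -> register partition -> load Gold.
# The .sync integration suffixes make each state wait for the task to finish.
state_machine = {
    "Comment": "Bronze -> Silver -> Gold pipeline (illustrative)",
    "StartAt": "RunGlueEtl",
    "States": {
        "RunGlueEtl": {
            "Type": "Task",
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "csv-to-parquet-silver"},  # assumed job name
            "Next": "AddSilverPartition",
        },
        "AddSilverPartition": {
            "Type": "Task",
            "Resource": "arn:aws:states:::athena:startQueryExecution.sync",
            "Parameters": {
                # Partition value hard-coded here; a real machine would template it
                "QueryString": ("ALTER TABLE silver.orders ADD IF NOT EXISTS "
                                "PARTITION (ingest_date='2024-05-01')"),
                "WorkGroup": "primary",
            },
            "Next": "LoadFactTables",
        },
        "LoadFactTables": {
            "Type": "Task",
            "Resource": "arn:aws:states:::athena:startQueryExecution.sync",
            "Parameters": {
                "QueryString": "INSERT INTO gold.fact_orders SELECT ... FROM silver.orders",
                "WorkGroup": "primary",
            },
            "End": True,
        },
    },
}

print(json.dumps(state_machine, indent=2))
```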
Fully automated — from file upload to visualization
Key Advantages
Automated Canary Checks
Query Example
SELECT COUNT(*) FROM fact_orders
Validation
Ensures data exists in fact tables
Prevents reports from being built on missing data.
Trigger
Daily schedule via EventBridge
Query Example
SELECT COUNT(*) FROM fact_orders WHERE order_id IS NULL
Validation
Validates primary key completeness
Avoids broken joins and incorrect KPIs.
Trigger
After data load completion
Query Example
SELECT COUNT(*) FROM fact_transactions WHERE amount <= 0
Validation
Identifies unreasonable values
Catches data errors before dashboards.
Trigger
Scheduled and on-demand
Athena Saved Queries executed by Lambda, results evaluated, SNS alerts on failures
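The evaluation step could be sketched as below. Check names are hypothetical; in the real Lambda function the counts would come from the Athena query results, and each failed check would trigger an SNS publish.

```python
# Each canary check maps a saved-query name (illustrative) to a rule
# applied to the COUNT(*) that query returns.
CHECKS = {
    "fact_orders_not_empty": lambda n: n > 0,   # data must exist
    "order_id_null_count":   lambda n: n == 0,  # primary key completeness
    "non_positive_amounts":  lambda n: n == 0,  # plausibility
}

def evaluate(results: dict) -> list:
    """Return the names of failed checks; one SNS alert would be sent per failure."""
    return [name for name, rule in CHECKS.items()
            if name in results and not rule(results[name])]

failures = evaluate({"fact_orders_not_empty": 0,
                     "order_id_null_count": 3,
                     "non_positive_amounts": 0})
print(failures)  # checks that would trigger an SNS alert
```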
AWS Cloud-Native, Managed Services
Serverless architecture built entirely on AWS managed services
Star Schema Implementation
fact_orders
Keys: order_id, customer_key, order_date_key
Metrics & attributes: total_amount, order_status
fact_order_items
Keys: order_item_id, order_id, product_key
Metrics: quantity, unit_price
fact_transactions
Keys: transaction_id, order_id
Metrics & attributes: amount, payment_method
dim_customers
Attributes: customer_id, name, email, address, registration_date
dim_products
Attributes: product_id, name, category, brand, price
dim_date
Attributes: date_key, date, day, month, year, week
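A date dimension is usually generated rather than loaded from a source system. This minimal sketch produces rows with the dim_date columns listed above; the YYYYMMDD integer format for date_key is an assumption.

```python
from datetime import date, timedelta

def dim_date_rows(start: date, end: date):
    """Yield one dim_date row per calendar day in [start, end]."""
    d = start
    while d <= end:
        yield {
            "date_key": d.year * 10000 + d.month * 100 + d.day,  # e.g. 20240101
            "date": d.isoformat(),
            "day": d.day,
            "month": d.month,
            "year": d.year,
            "week": d.isocalendar()[1],  # ISO week number
        }
        d += timedelta(days=1)

rows = list(dim_date_rows(date(2024, 1, 1), date(2024, 1, 3)))
print(rows[0])
```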
Technical Skills Demonstrated in This Project
Cloud & Data Engineering Expertise on AWS
Designing scalable, automated data solutions using modern cloud architecture patterns.
Interested in exploring a similar solution for your organization?
Contact Me