Slide 01 — Project Overview

AWS Smart Forecast

End-to-end retail forecasting and business analytics on AWS

Cloud-based forecasting demo built on the Rossmann Store Sales dataset
Designed to showcase production-style data engineering, ML forecasting, and BI delivery
Covers the full workflow from raw CSV ingestion to dashboard-ready forecasts
Built for both technical teams and business stakeholders

Slide 02 — Business Problem & Value

Business challenge and client value

Why retail forecasting needs more than a model

Daily store sales fluctuate due to promotions, holidays, seasonality, and store-specific factors
Retail data often arrives fragmented, inconsistent, and not directly usable for forecasting
Business teams need reliable forecasts for inventory, staffing, and promotion planning
Leaders need fast insights via dashboards and simple interfaces, not manual SQL workflows
The solution combines automation, forecasting, and analytics in one pipeline

Slide 03 — End-to-End Architecture

End-to-end AWS architecture for forecasting

From raw data to business-facing insights

Amazon S3 data lake with layered design (Bronze → Silver → Gold)
AWS Glue Crawler + Data Catalog for schema discovery and metadata management
Amazon Athena for validation and analytical SQL queries
AWS Glue Jobs (PySpark) for ETL, cleaning, joins, and feature engineering
Amazon SageMaker (XGBoost) for training and forecast generation
Amazon QuickSight for dashboards, KPIs, and monitoring
Optional chat analytics layer: Lambda + Athena + Streamlit + OpenAI

Slide 04 — Architecture Diagram

Architecture diagram: End-to-end AWS pipeline

Data flow from ingestion to BI/chat — modular, scalable, and traceable

End-to-end overview

Main flow: Raw data → S3 (Bronze/Silver/Gold) → Glue (ETL & features) → SageMaker (train & forecast) → QuickSight (dashboards)
Secondary path: S3 → Athena (SQL & validation)
Optional: Chat (NLQ) uses Athena via validated query templates

Slide 05 — Data Engineering Pipeline

Data engineering pipeline (Bronze → Silver → Gold)

Reliable, queryable, and ML-ready data preparation

Bronze (Raw): CSV files stored in S3 as received for reproducibility
Catalog & validation: Glue Crawler registers datasets; Athena checks schema, completeness, and anomalies
Silver (Cleaned): PySpark ETL joins sales and store metadata, fixes types, handles nulls, removes invalid rows
Storage optimization: Parquet output partitioned by store / year / month so Athena can prune partitions and scan less data
Gold (Engineered): feature-enriched dataset for forecasting, dashboards, and downstream analytics
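
Illustrative sketch of the Silver step in plain Python (the project itself runs this as a PySpark Glue job, which is not shown here). The column names follow the public Rossmann CSVs; the function names and the "negative sales is invalid" rule are assumptions made for the example. The partition prefix mirrors the key=value directory layout that Spark's partitionBy writes.

```python
from datetime import date


def clean_and_join(sales_rows, stores_by_id):
    """Join sales rows to store metadata, fix types, and drop invalid rows."""
    cleaned = []
    for row in sales_rows:
        store_id = int(row["Store"])
        meta = stores_by_id.get(store_id)
        if meta is None:
            continue  # no store metadata to join against
        sales = float(row["Sales"])
        if sales < 0:
            continue  # assumed validity rule: sales cannot be negative
        # Null handling: an empty competition distance becomes None.
        raw_distance = meta.get("CompetitionDistance", "")
        cleaned.append({
            "store": store_id,
            "date": date.fromisoformat(row["Date"]),
            "sales": sales,
            "store_type": meta["StoreType"],
            "competition_distance": float(raw_distance) if raw_distance else None,
        })
    return cleaned


def partition_prefix(row):
    """Hive-style partition path (store / year / month) for one cleaned row."""
    d = row["date"]
    return f"store={row['store']}/year={d.year}/month={d.month}/"
```

Rows that share a prefix land in the same partition directory, which is what lets a query filtered on store, year, or month skip the rest of the dataset.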

Slide 06 — Feature Engineering

Feature engineering for real-world retail demand

Transforming raw sales into predictive business signals

Lag features (e.g., lag_1, lag_7, lag_14) capture the most recent demand (lag_1) and weekly patterns (lag_7, lag_14)
Moving averages (e.g., ma_7, ma_30) smooth volatility and represent trends
Promo, holiday, and operational flags model demand shocks and store availability
Temporal features (weekday, month, week_of_year, year) encode seasonality
Store metadata (assortment, competition distance, store type) adds business context
Result: a richer feature matrix designed to improve forecast quality and interpretability
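
A minimal sketch of the lag and moving-average features listed above, for a single store's daily sales in date order. The function name is hypothetical, and computing each moving average over the preceding days only (excluding the current day) is an assumption chosen to keep the target out of its own features; the project's exact window definition is not stated in the deck.

```python
def add_lag_features(sales, lags=(1, 7, 14), windows=(7, 30)):
    """Build lag_N and ma_N features for one store's daily sales.

    `sales` is a list ordered by date. Features for day i use only days
    before i; positions without enough history are set to None.
    """
    rows = []
    for i, value in enumerate(sales):
        row = {"sales": value}
        for lag in lags:
            row[f"lag_{lag}"] = sales[i - lag] if i >= lag else None
        for window in windows:
            if i >= window:
                past = sales[i - window:i]  # the `window` days before day i
                row[f"ma_{window}"] = sum(past) / window
            else:
                row[f"ma_{window}"] = None
        rows.append(row)
    return rows
```

The None values at the start of each series have to be dropped or imputed before training; which of the two the project does is not covered here.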

Slide 07 — Model Training & Validation

Forecasting with SageMaker (XGBoost) and time-aware validation

Forecast quality measured on future periods, not random samples

XGBoost regression model trained in Amazon SageMaker on the Gold dataset
Chronological split to avoid leakage (no future information in training; 70% train / 15% validation / 15% test)
Hyperparameter tuning and early stopping to improve generalization
One consistent ML workflow supports multiple prediction scenarios
Forecast outputs stored in S3 for dashboards and query-based reporting
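
The chronological 70% / 15% / 15% split can be sketched as follows. This is an assumed implementation, not the project's code: it cuts on distinct dates rather than on rows, so every store's data for a given day falls into the same partition. The function name and the "date" key are hypothetical.

```python
def chronological_split(rows, train_frac=0.70, val_frac=0.15):
    """Split rows into train / validation / test by date order.

    Each row is a dict with a sortable "date" value (for example an ISO
    date string). Earlier dates go to train, the latest dates to test.
    """
    dates = sorted({row["date"] for row in rows})
    n = len(dates)
    train_cut = round(n * train_frac)
    val_cut = round(n * (train_frac + val_frac))
    train_dates = set(dates[:train_cut])
    val_dates = set(dates[train_cut:val_cut])
    train = [r for r in rows if r["date"] in train_dates]
    val = [r for r in rows if r["date"] in val_dates]
    test = [
        r for r in rows
        if r["date"] not in train_dates and r["date"] not in val_dates
    ]
    return train, val, test
```

Because the cut points are dates, no training row is later than any validation or test row, which is the leakage guarantee the slide describes.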

Project metrics: RMSE 559.23, MAPE 5.25%, Bias -236.90 (Accuracy ~94.75%, taken as 100% - MAPE).
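
For reference, one common way to compute these metrics is sketched below. The definitions are assumptions, since the deck does not state them: bias is taken as the mean of (predicted - actual), so a negative value means under-forecasting on average; days with zero actual sales are excluded from MAPE because the ratio is undefined there; accuracy is 100 minus MAPE.

```python
import math


def forecast_metrics(actual, predicted):
    """Return RMSE, MAPE (%), bias, and accuracy (%) for paired series."""
    pairs = list(zip(actual, predicted))
    errors = [p - a for a, p in pairs]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    bias = sum(errors) / len(errors)
    # MAPE is undefined where actual == 0 (e.g. closed-store days): skip them.
    nonzero = [(a, p) for a, p in pairs if a != 0]
    mape = 100 * sum(abs(p - a) / abs(a) for a, p in nonzero) / len(nonzero)
    return {"rmse": rmse, "mape": mape, "bias": bias, "accuracy": 100 - mape}
```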

Slide 08 — Forecast Delivery

From forecasts to decisions

Business-facing delivery through dashboards and natural-language analytics

QuickSight dashboards show Forecast vs Actual trends, KPI cards, and error hotspots
Store-level filters support operational review and management reporting
Promotion impact and high-deviation stores are highlighted in dedicated views
Streamlit chat UI lets business users ask questions in natural language
Lambda + Athena backend executes controlled, validated query templates
OpenAI supports question interpretation and structured query mapping (no direct raw SQL access)
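
A minimal sketch of the "controlled, validated query templates" idea: the language model only picks an intent and parameters, and the backend fills a fixed SQL template after validating every value. The template names, table and column names, and validation rules below are hypothetical, and no AWS call is made.

```python
import re

# Hypothetical catalog: intent name -> fixed SQL with named placeholders.
QUERY_TEMPLATES = {
    "store_forecast": (
        "SELECT forecast_date, predicted_sales FROM forecasts "
        "WHERE store = {store} "
        "AND forecast_date BETWEEN DATE '{start}' AND DATE '{end}'"
    ),
    "top_error_stores": (
        "SELECT store, AVG(ABS(predicted_sales - actual_sales)) AS mae "
        "FROM forecasts GROUP BY store ORDER BY mae DESC LIMIT {limit}"
    ),
}


def _is_iso_date(value):
    return isinstance(value, str) and re.fullmatch(r"\d{4}-\d{2}-\d{2}", value) is not None


# One validator per placeholder; anything that fails is rejected outright.
VALIDATORS = {
    "store": lambda v: type(v) is int and v > 0,
    "limit": lambda v: type(v) is int and 1 <= v <= 100,
    "start": _is_iso_date,
    "end": _is_iso_date,
}


def build_query(intent, params):
    """Return SQL for a known intent, or raise ValueError."""
    if intent not in QUERY_TEMPLATES:
        raise ValueError(f"unknown intent: {intent!r}")
    template = QUERY_TEMPLATES[intent]
    expected = set(re.findall(r"\{(\w+)\}", template))
    if set(params) != expected:
        raise ValueError(f"expected parameters {sorted(expected)}")
    for name, value in params.items():
        if not VALIDATORS[name](value):
            raise ValueError(f"invalid value for {name!r}")
    return template.format(**params)
```

For example, build_query("top_error_stores", {"limit": 10}) yields a runnable statement, while a store value such as "1; DROP TABLE forecasts" is rejected before any SQL is built.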

Slide 09 — Summary

Technical strength and business relevance

Why this project matters for real client work

End-to-end AWS data + ML architecture (S3, Glue, Athena, SageMaker, QuickSight)
Strong data engineering execution (ETL, cleaning, joins, partitioning, feature pipelines)
Practical forecasting workflow design with leakage-aware validation
Business-ready outputs: dashboards, KPI monitoring, and conversational analytics
Modular design adaptable to retail, e-commerce, demand planning, and operations