Car Recommendation System

An AI-powered recommendation engine combining semantic search, vector databases, and natural language understanding to match users with the right vehicles

The Challenge

The Core Problem:
Users think in use cases. They start with why (how they will use the car), not what (specs). Marketplaces do the opposite and ask for specs first.

The Solution

1. Input: Natural Language or Form

Users either describe their situation in natural language (e.g., "family needing space for kids and luggage") or fill in a structured form with key features such as budget, seats, fuel, and usage.

2. Interpretation & Structuring

For natural language input, GPT interprets intent and asks clarifying questions; for form input, the structured features are used directly without additional AI interpretation.

3. Smart Matching

The Car Recommendation System (CRS) returns matching vehicles plus plausible alternatives via semantic retrieval.

Result:
Users can choose between a conversational, guided search via natural language or a fast expert form. In both cases, preferences are transformed into structured data for precise matching instead of relying on rigid keyword filters.

How It Works

Input → Structure

Users either describe their needs in natural language or fill in a structured form. Both paths produce a unified preferences JSON (seats, budget, fuel, gearbox, usage, …).
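As a minimal sketch of the form path, the snippet below normalizes a form submission into the preference structure. The helper name `form_to_preferences`, the set of numeric fields, and the lowercasing rule are assumptions for illustration; the field names follow the preference JSON example shown on this page.

```python
# Fields expected to hold integers (assumed; based on the example JSON on this page).
NUMERIC_FIELDS = {"numberOfSeats", "price_max", "mileage_max"}


def form_to_preferences(form: dict) -> dict:
    """Hypothetical helper: turn raw form values into a clean preference dict."""
    prefs = {}
    for key, value in form.items():
        if value in (None, ""):
            continue  # the user left this field open
        if key in NUMERIC_FIELDS:
            prefs[key] = int(value)
        else:
            prefs[key] = str(value).strip().lower()
    return prefs


form = {"numberOfSeats": "7", "gearbox": " Automatic ", "price_max": 18000, "fuel": ""}
print(form_to_preferences(form))
```

Empty fields are dropped rather than filled with defaults, so later stages can tell "not specified" apart from an explicit choice.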

Vector Search

A k-nearest-neighbor (kNN) vector search in OpenSearch finds similar vehicles, even when the input is incomplete.

Hybrid Filtering

Hard filters, ranking, and an explanation step together produce the final results.
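One way to combine hard filters with kNN retrieval is to place a filter inside the kNN clause of an OpenSearch query. The sketch below only builds the request body. The field names `embedding`, `price`, `mileage`, and `gearbox` are assumed index fields, and whether CRS filters inside the kNN clause or in a separate step is not stated here. Filter support inside `knn` also depends on the k-NN engine and OpenSearch version.

```python
def build_knn_query(query_vector, prefs, k=10):
    """Build an OpenSearch kNN request body with optional hard filters (sketch)."""
    filters = []
    if "price_max" in prefs:
        filters.append({"range": {"price": {"lte": prefs["price_max"]}}})
    if "mileage_max" in prefs:
        filters.append({"range": {"mileage": {"lte": prefs["mileage_max"]}}})
    if "gearbox" in prefs:
        filters.append({"term": {"gearbox": prefs["gearbox"]}})

    knn_clause = {"vector": list(query_vector), "k": k}
    if filters:
        # Only vehicles that pass every hard filter are eligible neighbors.
        knn_clause["filter"] = {"bool": {"must": filters}}

    return {"size": k, "query": {"knn": {"embedding": knn_clause}}}


body = build_knn_query([0.12, -0.40, 0.33], {"price_max": 18000, "gearbox": "automatic"}, k=5)
print(body)
```

The resulting dictionary could be passed as the `body` argument of a client search call; the index name and client setup are left out because they are deployment-specific.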

Guided Experience

Users experience a competent, human-like consultation instead of tedious manual filtering

Approach Example: Text → Structured Query

User Input

  • "Family of 5, city commuting"
  • "Need 7 seats, automatic"
  • "Budget ≤ €18k, max 120k km"

CRS Preference JSON

{ "numberOfSeats": 7, "gearbox": "automatic", "price_max": 18000, "mileage_max": 120000, "usage": "city", "missing": ["fuel", "bodyType"] }
→ Follow-up: "SUV or van? Fuel preference?"
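The `missing` list and the follow-up question can be illustrated with a small rule-based stand-in. In CRS this step is handled by GPT, so the field list, the question templates, and both function names below are hypothetical.

```python
# Assumed set of preference fields, in the order they are checked.
EXPECTED_FIELDS = ["numberOfSeats", "gearbox", "price_max", "mileage_max", "usage", "fuel", "bodyType"]

# Hypothetical question templates; the real follow-ups are generated by the LLM.
QUESTIONS = {
    "fuel": "Fuel preference?",
    "bodyType": "SUV or van?",
}


def find_missing(prefs: dict) -> list:
    """Return the expected fields that the user has not specified yet."""
    return [field for field in EXPECTED_FIELDS if field not in prefs]


def follow_up(missing: list) -> str:
    """Join one question per missing field into a single follow-up prompt."""
    return " ".join(QUESTIONS.get(field, f"Any preference for {field}?") for field in missing)


prefs = {"numberOfSeats": 7, "gearbox": "automatic", "price_max": 18000,
         "mileage_max": 120000, "usage": "city"}
missing = find_missing(prefs)
print(missing)
print(follow_up(missing))
```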

Technical Deep Dive: Prompt & Dialog Strategy

Technical Deep Dive: Semantic Vector Search

Semantic Vector (embedding space): embeddings of car descriptions and a matching user query vector in the same space
→
OpenSearch: kNN vector search
→
Top-k Ranked Cars: semantic similarity scoring + metadata filters (price, mileage, year)
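The ranking idea can be shown without any infrastructure. The sketch below scores toy two-dimensional vectors by cosine similarity and applies a price filter before ranking. The vectors, prices, and the choice of cosine similarity are illustrative assumptions; the actual embedding model and distance metric are not specified here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, cars, k=3, price_max=None):
    """Filter by metadata first, then rank the remaining cars by similarity."""
    candidates = [c for c in cars if price_max is None or c["price"] <= price_max]
    ranked = sorted(candidates, key=lambda c: cosine(query_vec, c["vector"]), reverse=True)
    return [c["id"] for c in ranked[:k]]


# Toy data for illustration only.
cars = [
    {"id": "van-a", "vector": [0.9, 0.1], "price": 17000},
    {"id": "suv-b", "vector": [0.8, 0.3], "price": 21000},
    {"id": "coupe-c", "vector": [0.1, 0.9], "price": 15000},
]
print(top_k([1.0, 0.0], cars, k=2, price_max=18000))
```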

End-to-End Architecture (Numbered Flow)

1. Ingestion & Enrichment
  • Marketplaces: mobile.de / AutoScout24
  • S3 Bucket: Raw Ads (JSON)
  • EventBridge: Daily Schedule
  ↓
  • Lambda: Orchestrator
  • AWS Glue ETL: Normalize + Enrich
  • Embedding Generator: Vector Creation
2. Storage & Indexing
  • DynamoDB: Structured Features
  • OpenSearch: kNN Index (Vectors)
  • S3: Raw Archive
3. Query & User Experience
  • Streamlit UI: Form + Chat Mode
  • GPT-4o: Intent → JSON
  • Query Engine: Vector + Filters

Key Architecture Features:
  • AWS-managed, largely serverless architecture
  • Daily automated data refresh pipeline
  • Optimized for low-latency retrieval (vector similarity + metadata filters)
  • Modular components (easy to swap GPT/embeddings/index providers)

Evaluation: Test Design & Data Basis

  • Total dataset: 62k vehicle listings
  • Test set: 1,000 randomly selected vehicles
  • Test scenarios: 3 feature combinations

Reproducible Test Procedure

  • For each test vehicle, exactly one search query is generated
  • Search queries are based exclusively on technical attributes (no make/model)
  • Independent search for each of 1,000 vehicles, defined by objective characteristics
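A reproducible random selection can be sketched with a seeded generator. The seed value, the listing ID format, and the function name are assumptions; only the dataset size (about 62k) and the test set size (1,000) come from this page.

```python
import random


def sample_test_set(listing_ids, n=1000, seed=42):
    """Draw n distinct listings; the fixed seed makes the draw repeatable."""
    rng = random.Random(seed)  # local generator, independent of global random state
    return rng.sample(listing_ids, n)


listing_ids = [f"ad-{i}" for i in range(62000)]  # placeholder IDs
test_set = sample_test_set(listing_ids)
print(len(test_set), len(set(test_set)))
```

Using a local `random.Random` instance keeps the selection stable even if other code draws random numbers in between.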

What is Measured?

Evaluation: Definition of Test Cases

Test 1 – 9 Features

Complete data scenario

BodyType, NumberOfDoors, FirstRegistration, GearBox, NumberOfSeats, Fuel, Power, DriveType, CubicCapacity

Test 2 – 6 Features

Realistic partial data

BodyType, FirstRegistration, GearBox, NumberOfSeats, Fuel, Power

Test 3 – 3 Features

Sparse data scenario

BodyType, GearBox, Fuel

Design Rationale: The tests simulate realistic listing completeness across three levels: core-feature complete, partially complete, and sparse listings (within the selected feature subset).
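The three scenarios can be expressed as feature subsets applied to the same test vehicle. In the sketch below the feature names are taken from the test definitions above, while the sample vehicle values, the scenario keys, and the `build_query` helper are hypothetical.

```python
# Feature subsets per scenario, as listed in the test definitions.
TEST_FEATURES = {
    "test_1": ["BodyType", "NumberOfDoors", "FirstRegistration", "GearBox",
               "NumberOfSeats", "Fuel", "Power", "DriveType", "CubicCapacity"],
    "test_2": ["BodyType", "FirstRegistration", "GearBox", "NumberOfSeats", "Fuel", "Power"],
    "test_3": ["BodyType", "GearBox", "Fuel"],
}


def build_query(vehicle: dict, scenario: str) -> dict:
    """Keep only the scenario's technical attributes; make and model never enter the query."""
    return {name: vehicle[name] for name in TEST_FEATURES[scenario] if name in vehicle}


# Made-up vehicle for illustration.
vehicle = {
    "Make": "ExampleMake", "Model": "ExampleModel",
    "BodyType": "Van", "NumberOfDoors": 5, "FirstRegistration": 2017,
    "GearBox": "Automatic", "NumberOfSeats": 7, "Fuel": "Diesel",
    "Power": 110, "DriveType": "Front", "CubicCapacity": 1968,
}
for scenario in TEST_FEATURES:
    print(scenario, build_query(vehicle, scenario))
```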

Evaluation: Results Overview

| Test Scenario | Rank 1 (Hit@1) | Top 3 (Hit@3) | Top 5 (Hit@5) | Top 10 (Hit@10) |
| --- | --- | --- | --- | --- |
| Test 1 – 9 features | 71.4% | 90.7% | 94.5% | 97.9% |
| Test 2 – 6 features | 59.3% | 83.6% | 89.7% | 95.0% |
| Test 3 – 3 features | 19.1% | 39.1% | 51.3% | 68.4% |

Interpretation: For each test scenario, the table shows the percentage of the 1,000 searches in which the target vehicle appears at Rank 1 or within the Top 3, Top 5, or Top 10 of the result list. Each row corresponds to 1,000 searches, one per test vehicle.
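Hit@k is the share of searches in which the target vehicle appears at rank k or better. A minimal sketch of that calculation follows; the rank list is toy data and does not reproduce the project's results.

```python
def hit_at_k(ranks, k):
    """Fraction of searches whose target ranks within the top k.

    ranks holds the 1-based rank of the target vehicle per search,
    or None when the target was not returned at all.
    """
    hits = sum(1 for rank in ranks if rank is not None and rank <= k)
    return hits / len(ranks)


# Toy ranks for ten searches (illustration only).
ranks = [1, 2, 1, 7, None, 4, 1, 12, 3, 1]
for k in (1, 3, 5, 10):
    print(f"Hit@{k}: {hit_at_k(ranks, k):.1%}")
```

Because each top-k list contains the smaller ones, Hit@k can only stay equal or grow as k increases, which matches the pattern in every row of the table.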

Evaluation: Hit Rates by Rank

Hit@1 (Rank 1)

  • Test 1 (9 features): 71.4%
  • Test 2 (6 features): 59.3%
  • Test 3 (3 features): 19.1%

Hit@10 (Top 10)

  • Test 1 (9 features): 97.9%
  • Test 2 (6 features): 95.0%
  • Test 3 (3 features): 68.4%

Evaluation: Top 10 Hit Rate Analysis

  • Test 1: 979 of 1,000 vehicles in the Top 10
  • Test 2: 950 of 1,000 vehicles in the Top 10
  • Test 3: 684 of 1,000 vehicles in the Top 10

Performance Characteristics

Core Metric: Hit@10 = 97.9%
In the most complete scenario (9 features), 979 of 1,000 target vehicles appear within the first 10 results, so the target is almost always retrievable when the listing data is complete.

Evaluation: Technical Interpretation

Vector Search Performance

The system reliably re-identifies vehicles using vector search and similarity metrics, making it much more flexible than purely static filter logic.

End-to-End Validation

Evaluation covers the entire technical chain: DynamoDB storage → OpenSearch kNN indexing → result quality metrics.

Key Technical Insights

Practical Takeaway: The dual vector strategy enables the system to handle real-world scenarios where data is incomplete, while still delivering highly relevant results. This is a significant advantage over traditional keyword-based or pure filter systems.

Impact & Business Value

User Experience

  • More relevant recommendations
  • Reduced friction in search
  • Guided consultation experience

Business Metrics

  • Increased conversion likelihood
  • Higher user engagement
  • Reduced drop-off rates

Foundation for Scale

  • White-label integration friendly: Can be integrated into existing marketplaces via API or embeddable widget
  • Multi-domain potential: Architecture extends beyond cars (real estate, e-commerce, job matching)
  • Production-ready: AWS-managed infrastructure designed for reliability and scale

Core Achievement

CRS demonstrates how intelligent data architecture and generative AI can transform traditional filter-based search into a precise, user-centered recommendation experience.

Next: White-Label + Multi-Domain Growth

1. White-Label Integration

API + embeddable widget for seamless marketplace integration

2. Partner Rollout

KPI optimization: conversion rate, CTR, drop-off reduction

3. Multi-Domain Expansion

Extend to real estate, jobs, e-commerce via configurable schema

Technical Extensibility

  • Modular architecture: Easy to swap LLM providers (GPT → Claude/Llama), embedding models, or vector databases
  • Configurable schema: JSON-based preference definitions adapt to different product domains
  • A/B testing ready: Built-in analytics hooks for measuring recommendation quality improvements
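Swappable components are often modeled as a small interface that the rest of the system depends on. The sketch below uses `typing.Protocol` for an embedding provider. The names `Embedder`, `CharCountEmbedder`, and `QueryEngine` are hypothetical and do not describe the actual CRS code; the toy embedder stands in for a real model.

```python
from typing import Protocol, Sequence


class Embedder(Protocol):
    """Anything with an embed(text) method can serve as the embedding provider."""

    def embed(self, text: str) -> Sequence[float]: ...


class CharCountEmbedder:
    """Toy stand-in for a real embedding model (illustration only)."""

    def embed(self, text: str) -> Sequence[float]:
        return [float(len(text)), float(text.count(" "))]


class QueryEngine:
    """Depends only on the Embedder interface, not on a concrete provider."""

    def __init__(self, embedder: Embedder) -> None:
        self.embedder = embedder

    def query_vector(self, prefs: dict) -> list:
        # Serialize preferences in a stable key order before embedding.
        text = ", ".join(f"{key}: {value}" for key, value in sorted(prefs.items()))
        return list(self.embedder.embed(text))


engine = QueryEngine(CharCountEmbedder())
print(engine.query_vector({"fuel": "diesel"}))
```

Replacing the provider then means passing a different object to `QueryEngine`; no other code changes as long as the new class offers the same `embed` method.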