System Architecture Overview
ReptiDex is built as a modern, cloud-native platform designed for scalability, reliability, and maintainability. The architecture follows domain-driven design principles, with microservices that can scale independently while maintaining data consistency and business-rule integrity.
System Purpose and Vision
Primary Goal
To streamline reptile lineage tracking by creating a standardized, trusted, and transparent process that works across reptile species, establishing a shared standard that makes lineage data interoperable and reliable for breeders, buyers, and organizations.
Core Value Proposition
- Lineage-First System: Ensures accurate tracking of genetics, ancestry, and health across generations
- Verified Transparency: Builds breeder and buyer confidence through verified, transparent records
- Foundation-Focused: Positions lineage tracking as the foundation, with additional features as value-add bonuses
Guiding Principle
Every feature in ReptiDex supports or enhances the core lineage tracking experience.
System Context (C4 Level 1)
Primary Actors
- Breeder: Manages vivariums, animals, pairings/clutches, lineage, media; sets up profiles/listings; purchases ads; receives notifications
- Buyer/Visitor: Browses public profiles/listings, views pedigrees and lineage; interacts with embedded widgets
- ReptiDex Admin: Handles moderation, abuse, feature flags, entitlements, and support tooling
External Systems
- Payment Processors: Stripe/PayPal for subscriptions, invoices, dunning, refunds
- Communication: Email/SMS providers for transactional and marketing notifications
- Authentication: OAuth IdPs (Google/GitHub) for optional SSO
- Content Delivery: CDN/Edge for public pages, media, embed assets, export downloads
- Future Integrations: Genetics labs, partner marketplaces, reptile registries
High-Level Architecture
Architecture Principles
- Domain-Driven Design: Services organized around business domains with clear bounded contexts
- Event-Driven Architecture: Real-time consistency through synchronous events and eventual consistency through async messaging
- Consolidated Microservices: 6 logical services, with related domains sharing each service's database, optimized for solo-developer operations
- Multi-Tenant by Design: Row-level security and organization-scoped data from day 1
- API-First: All functionality exposed through well-defined OpenAPI contracts
- Cloud-Native: Built for AWS with a simple, cost-effective deployment strategy
System Boundaries (6 Consolidated Services)
The six services map one-to-one to the databases listed under Data Architecture: repti-core (auth, config, billing), repti-animal (animals, lineage, genetics), repti-commerce (marketplace, sales), repti-media (files, rendering, embeds), repti-community (search, notifications), and repti-ops (admin, audit, integrations).
Technology Stack
Backend Services
- Runtime: Python with FastAPI
- Authentication: JWT with RS256 signatures
- Validation: Pydantic schemas with comprehensive validation
- Testing: pytest with comprehensive test coverage
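As a minimal sketch of how this stack fits together, the endpoint below combines FastAPI routing, Pydantic validation, and RS256 JWT verification via PyJWT. The AnimalCreate model, key path, org_id claim, and route are illustrative assumptions, not the actual ReptiDex API.

```python
# Sketch only: FastAPI + Pydantic + RS256 JWT. The model, route, key path,
# and "org_id" claim are illustrative assumptions, not the real ReptiDex API.
import jwt  # PyJWT
from fastapi import Depends, FastAPI, HTTPException
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer
from pydantic import BaseModel, Field

app = FastAPI(title="repti-animal")
bearer = HTTPBearer()
PUBLIC_KEY = open("jwt_public.pem").read()  # RS256 verification key

class AnimalCreate(BaseModel):
    name: str = Field(min_length=1, max_length=120)
    species: str
    sire_id: str | None = None  # Base58 global ID of the sire, if known
    dam_id: str | None = None   # Base58 global ID of the dam, if known

def current_claims(creds: HTTPAuthorizationCredentials = Depends(bearer)) -> dict:
    try:
        return jwt.decode(creds.credentials, PUBLIC_KEY, algorithms=["RS256"])
    except jwt.PyJWTError:
        raise HTTPException(status_code=401, detail="Invalid token")

@app.post("/animals", status_code=201)
def create_animal(body: AnimalCreate, claims: dict = Depends(current_claims)):
    org_id = claims["org_id"]  # assumed tenant claim carried in the token
    # ... persist under org_id, then emit an animal.created event ...
    return {"status": "created", "org_id": org_id, "name": body.name}
```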
Frontend Applications
- Applications: 4 core apps (web-public, web-breeder, web-admin, web-embed)
- Framework: Vite + React 19
- Language: TypeScript
- UI Components: Radix UI with Tailwind CSS via @reptidex/ui
- State Management: Zustand + React Query via @reptidex/core
- Shared Packages: A simplified two-package approach (@reptidex/ui and @reptidex/core) for maintainability
Infrastructure & Platform
- Cloud Provider: AWS
- Deployment: EC2 instances with Docker Compose (simple and cost-effective)
- Load Balancing: Application Load Balancer
- DNS & CDN: CloudFront with Route 53
- Monitoring: Grafana + Prometheus + Loki
Data Storage
- Primary Database: PostgreSQL (RDS) with multi-schema design
- Caching: Redis (ElastiCache) for session and computed data
- Search Engine: PostgreSQL full-text search (simple, no OpenSearch overhead)
- Object Storage: S3 with lifecycle policies
- Message Queuing: SNS/SQS for event-driven communication
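A sketch of how a service might publish a domain event to SNS for SQS subscribers, carrying the organization scope as a message attribute so consumers can filter per tenant. The topic ARN, envelope shape, and IDs are placeholders, not the real event schema.

```python
# Sketch: publishing a domain event to SNS with the organization scope as a
# message attribute. The topic ARN, envelope shape, and IDs are placeholders.
import json
import uuid
from datetime import datetime, timezone

import boto3

sns = boto3.client("sns")
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:reptidex-events"  # placeholder

def publish_event(event_type: str, org_id: str, payload: dict) -> None:
    envelope = {
        "event_id": str(uuid.uuid4()),
        "event_type": event_type,  # e.g. "animal.created"
        "org_id": org_id,          # organization scope on every event
        "occurred_at": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    }
    sns.publish(
        TopicArn=TOPIC_ARN,
        Message=json.dumps(envelope),
        MessageAttributes={
            "event_type": {"DataType": "String", "StringValue": event_type},
            "org_id": {"DataType": "String", "StringValue": org_id},
        },
    )

publish_event("animal.created", "org_8XkQ2", {"animal_id": "an_3fT9"})
```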
Event-Driven Architecture
Real-Time Consistency Design
ReptiDex implements a hybrid event architecture: real-time consistency where it is critical, eventual consistency where it is acceptable.
Event Infrastructure
- Primary: AWS SNS → SQS with dead letter queues for reliable delivery
- Real-time: EventBridge for synchronous cross-service events
- Event Store: PostgreSQL with event sourcing for audit trails and replay
- Saga Orchestration: Distributed transaction coordination for complex workflows
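A minimal sketch of the event-store append mentioned above, assuming a simple events table in PostgreSQL; the schema, DSN, and identifiers are illustrative, not the actual design.

```python
# Sketch: appending domain events to a PostgreSQL event store for audit
# trails and replay. The table schema and DSN are assumptions.
import psycopg2
from psycopg2.extras import Json

DDL = """
CREATE TABLE IF NOT EXISTS events (
    id           BIGSERIAL PRIMARY KEY,
    org_id       TEXT        NOT NULL,
    aggregate_id TEXT        NOT NULL,
    event_type   TEXT        NOT NULL,
    payload      JSONB       NOT NULL,
    occurred_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);
"""

def append_event(conn, org_id, aggregate_id, event_type, payload) -> None:
    with conn, conn.cursor() as cur:  # commits on success, rolls back on error
        cur.execute(
            "INSERT INTO events (org_id, aggregate_id, event_type, payload) "
            "VALUES (%s, %s, %s, %s)",
            (org_id, aggregate_id, event_type, Json(payload)),
        )

conn = psycopg2.connect("dbname=repti_animal_db")  # placeholder DSN
with conn, conn.cursor() as cur:
    cur.execute(DDL)
append_event(conn, "org_8XkQ2", "an_3fT9", "animal.created", {"name": "Ziggy"})
```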
Event Categories & Processing
Critical Events (Synchronous Processing)
- animal.created, animal.updated, animal.transferred: Immediate lineage updates
- lineage.updated, lineage.validated: Real-time ancestry graph consistency
- breeding.paired, breeding.clutch.created: Genetics analysis triggers
- transaction.initiated, transaction.completed: Financial consistency
- inventory.reserved, inventory.transferred: Availability state changes
Standard Events (Asynchronous Processing)
- genetics.prediction.updated: AI model results propagation
- commerce.listing.created: Marketplace visibility updates
- sales.inquiry.received: Customer engagement notifications
- profile.updated: Search index and cache invalidation
Background Events (Asynchronous Processing)
- search.index.updated: Search engine synchronization
- analytics.event.tracked: Business intelligence aggregation
- notification.queued: Email/SMS delivery processing
- audit.logged: Compliance and monitoring data
Event Processing Patterns
Saga Pattern for Complex Transactions (sketch after this list)
- Animal ownership history (immutable audit trail)
- Lineage relationships (ancestry validation)
- Financial transactions (regulatory compliance)
- Breeding records (genetic history preservation)
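A minimal, in-process sketch of the saga pattern: each step pairs an action with a compensating action, and a failure rolls completed steps back in reverse order. The step functions are hypothetical placeholders, not real ReptiDex operations; a production coordinator would persist saga state and drive steps via events.

```python
# Sketch: a minimal saga coordinator. Each step pairs an action with a
# compensating action; on failure, completed steps are undone in reverse.
# The step functions are hypothetical placeholders, not ReptiDex calls.
from dataclasses import dataclass
from typing import Callable

@dataclass
class SagaStep:
    name: str
    action: Callable[[], None]
    compensate: Callable[[], None]

def run_saga(steps: list[SagaStep]) -> bool:
    done: list[SagaStep] = []
    for step in steps:
        try:
            step.action()
            done.append(step)
        except Exception:
            for completed in reversed(done):  # compensate in reverse order
                completed.compensate()
            return False
    return True

# Hypothetical animal-transfer workflow, named after the events listed above.
transfer = [
    SagaStep("reserve", lambda: print("inventory.reserved"),
             lambda: print("inventory released")),
    SagaStep("charge", lambda: print("transaction.completed"),
             lambda: print("transaction refunded")),
    SagaStep("transfer", lambda: print("animal.transferred"),
             lambda: print("transfer reverted")),
]
run_saga(transfer)
```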
Service Communication
Synchronous (Real-time)
- API calls for immediate consistency requirements
- Health checks and service discovery
- User-facing operations requiring instant feedback
Asynchronous (Event-Driven)
- Cross-domain data synchronization
- Business process orchestration
- Analytics and reporting pipeline
- Notification delivery system
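A sketch of how an asynchronous consumer might drain one of these queues with long polling. The queue URL and handler are placeholders; with default SNS-to-SQS delivery the original message is wrapped, so the inner envelope is parsed out of the Body's "Message" field.

```python
# Sketch: a long-polling SQS consumer for asynchronous events. The queue URL
# and handler are placeholders, not real ReptiDex components.
import json

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/repti-community"

def handle(envelope: dict) -> None:
    print("processing", envelope["event_type"], "for org", envelope["org_id"])

while True:
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20
    )
    for msg in resp.get("Messages", []):
        # SNS->SQS wraps the published message; unwrap the inner envelope.
        envelope = json.loads(json.loads(msg["Body"])["Message"])
        handle(envelope)
        # Delete only after successful handling; otherwise the message
        # becomes visible again and eventually lands in the dead letter queue.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```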
Security Architecture
Multi-Tenant Security Framework
ReptiDex implements defense-in-depth security with organization-level isolation from day 1.
Security Layers
- Network Security: VPC with private subnets, security groups, WAF protection
- Application Security: JWT with RS256, RBAC with vivarium roles, API rate limiting
- Data Security: Row-level security (RLS), encryption at rest/transit, field-level privacy
- Service Security: Service-to-service authentication, circuit breakers, input validation
Multi-Tenant Data Isolation
Row-Level Security (RLS)
- Database-per-service with dedicated connection pools
- API gateway with tenant context validation
- Cross-service calls include tenant verification
- Event messages include organization scope
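A sketch of what organization-scoped RLS can look like in practice, assuming an org_id column on tenant-scoped tables and an app.current_org_id session setting; the policy name, setting name, and animals table are conventions chosen for illustration, not the actual schema.

```python
# Sketch: organization isolation via RLS. The policy, the app.current_org_id
# setting name, and the animals table are illustrative conventions.
import psycopg2

RLS_SETUP = """
ALTER TABLE animals ENABLE ROW LEVEL SECURITY;
CREATE POLICY org_isolation ON animals
    USING (org_id = current_setting('app.current_org_id'));
"""

def query_animals(conn, org_id: str) -> list:
    with conn, conn.cursor() as cur:
        # set_config(..., is_local=True) scopes the tenant context to this
        # transaction, so every query below is filtered by the RLS policy.
        cur.execute("SELECT set_config('app.current_org_id', %s, true)", (org_id,))
        cur.execute("SELECT id, name FROM animals")
        return cur.fetchall()
```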
Privacy and Compliance
- Granular Privacy Controls: Public, organization-only, private, members-only visibility
- Immutable Audit Trails: Complete audit logging via the repti-ops service
- Data Ownership: Clear data ownership with transfer controls and two-party handshake
- Compliance Ready: GDPR/CCPA compliance framework with data export/deletion
- Secure Identifiers: Base58 global IDs to prevent enumeration attacks
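A minimal sketch of generating non-enumerable Base58 global IDs from cryptographically random bytes; the an_ prefix convention and 12-byte length are assumptions for illustration.

```python
# Sketch: non-enumerable Base58 global IDs from cryptographically random
# bytes. The "an_" prefix convention and 12-byte length are assumptions.
import secrets

# Bitcoin-style Base58 alphabet: no 0, O, I, or l, to avoid misreading.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58_id(prefix: str, nbytes: int = 12) -> str:
    n = int.from_bytes(secrets.token_bytes(nbytes), "big")
    chars = []
    while n:
        n, rem = divmod(n, 58)
        chars.append(ALPHABET[rem])
    return prefix + "".join(reversed(chars or [ALPHABET[0]]))

print(base58_id("an_"))  # e.g. an_3QZr7kWx9TpL2mNd
```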
Data Architecture
Database-Per-Service Strategy
Each of the 6 consolidated services owns its database completely, ensuring proper isolation and service autonomy.
Service Database Ownership
- repti_core_db: Auth, config, billing, events, telemetry data
- repti_animal_db: Animals, lineage, genetics, taxonomy, breeding data
- repti_commerce_db: Marketplace, sales, transactions, inventory data
- repti_media_db: File metadata, rendering jobs, embed configurations
- repti_community_db: Search indexes, notifications, community data
- repti_ops_db: Admin tools, audit logs, integrations, system logs
Multi-Tenant Data Patterns
- Row-Level Security: All tenant-scoped tables use RLS policies for organization isolation
- Tenant Context: Each service connection sets tenant context for automatic filtering
- Database Isolation: Services can only access their own database
- Cross-Service Data Access: Only via service APIs and events, never direct database access
Database Features
- Service Database Isolation: Complete data ownership and schema control per service
- Independent Scaling: Each database can be scaled based on service-specific needs
- Service-Specific Backups: Tailored backup strategies per database (critical vs supporting)
- Fault Isolation: Database issues in one service don’t affect others
- Technology Flexibility: Can use specialized databases per service if needed (e.g., time-series for telemetry)
Caching Strategy
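The Data Storage section above designates Redis (ElastiCache) for session and computed data. Below is one plausible cache-aside sketch for an expensive computed structure such as a pedigree tree; the key scheme, TTL, and event-driven invalidation hook are assumptions, not confirmed specifics.

```python
# Sketch: cache-aside for computed data (e.g., a pedigree tree) in Redis.
# Key naming, TTL, and the load_from_db callable are illustrative.
import json

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
PEDIGREE_TTL = 300  # seconds; also invalidated early on lineage.updated

def get_pedigree(org_id: str, animal_id: str, load_from_db) -> dict:
    key = f"pedigree:{org_id}:{animal_id}"  # tenant-scoped cache key
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)          # cache hit
    pedigree = load_from_db(animal_id)     # miss: recompute from PostgreSQL
    r.setex(key, PEDIGREE_TTL, json.dumps(pedigree))
    return pedigree

def on_lineage_updated(org_id: str, animal_id: str) -> None:
    r.delete(f"pedigree:{org_id}:{animal_id}")  # event-driven invalidation
```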
Performance and Scalability
Performance Targets
- API Response Time: p95 < 500ms for read operations
- Database Performance: Query optimization with proper indexing
- Cache Hit Rates: > 90% for frequently accessed data
- CDN Optimization: Global edge locations for static content
Scaling Strategy
- Service-Level Auto-Scaling: Independent scaling for each of the 6 consolidated services based on demand
- Database Scaling: Service-specific read replicas with connection pooling per database
- Event-Driven Scaling: SNS/SQS scaling based on queue depth and processing lag
- Geographic Scaling: Multi-region deployment with cross-region data replication
Deployment Architecture
Simple Container Deployment
Docker Compose on EC2
- Single EC2 Instance: t3.large or t3.xlarge for development/staging
- Production Setup: 2-3 EC2 instances behind ALB for redundancy
- Service Containers: 6 consolidated services in Docker containers
- Shared Networking: Docker Compose networking for inter-service communication
- Application Load Balancer with target groups for each service
- Simple service discovery via environment variables and internal DNS
- Health checks and automatic failover
- Rolling deployments with zero downtime
Monitoring and Observability
Streamlined Observability (6 Services)
Centralized Monitoring via repti-ops
- Logs: Structured JSON logging aggregated from all 6 services (see the instrumentation sketch after this list)
- Metrics: Prometheus instrumentation with Grafana dashboards and alerting
- Health Checks: Simple health monitoring with service dependency tracking
- Alerts: Grafana intelligent alerting with CloudWatch fallback for infrastructure
- APM Monitoring: Application performance monitoring for all 6 services
- Custom Dashboards: Business metrics and service health visualization
- Distributed Tracing: End-to-end request tracing across service boundaries
- Error Tracking: Automatic error detection and notification
- Infrastructure Monitoring: EC2, RDS, and Redis performance metrics
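A minimal sketch combining the two instrumentation concerns above in one FastAPI middleware: structured JSON logs (one object per line, suitable for Loki aggregation) and Prometheus request metrics. Metric names and labels are illustrative.

```python
# Sketch: structured JSON logs (one object per line, for Loki) plus
# Prometheus metrics in a FastAPI middleware. Metric names are illustrative.
import json
import logging
import time

from fastapi import FastAPI, Request
from prometheus_client import Counter, Histogram, make_asgi_app

app = FastAPI()
app.mount("/metrics", make_asgi_app())  # scraped by Prometheus

REQUESTS = Counter("http_requests_total", "Requests", ["method", "path", "status"])
LATENCY = Histogram("http_request_seconds", "Request latency", ["path"])
log = logging.getLogger("repti")
logging.basicConfig(level=logging.INFO, format="%(message)s")

@app.middleware("http")
async def observe(request: Request, call_next):
    start = time.perf_counter()
    response = await call_next(request)
    elapsed = time.perf_counter() - start
    path = request.url.path
    REQUESTS.labels(request.method, path, str(response.status_code)).inc()
    LATENCY.labels(path).observe(elapsed)
    log.info(json.dumps({  # one JSON object per line for log aggregation
        "method": request.method, "path": path,
        "status": response.status_code, "duration_ms": round(elapsed * 1000, 2),
    }))
    return response
```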
Key Metrics by Category
Business Metrics
- User acquisition: Signups, activations, conversions by organization
- Engagement: Animal creation, lineage updates, breeding events
- Commerce: Listings, sales, transaction volume, payment success rates
- Retention: Active users, subscription renewals, feature adoption
Technical Metrics
- Service Performance: API latency (p50, p95, p99), error rates, throughput per service
- Event Processing: Event publish/consume rates, queue depths, processing lag
- Database Performance: Query performance, connection pool usage, replication lag
- Infrastructure: CPU, memory, and network per EC2 host and service
Future Architecture Considerations
Simplified Evolution Roadmap (18-Month Plan)
Phase 1 (Months 1-6): Foundation & Core
- Deploy 6 consolidated services with Docker Compose
- Establish multi-tenant security and basic event-driven patterns
- Single PostgreSQL database with schema isolation
- Basic monitoring and logging with CloudWatch
Phase 2 (Months 7-12)
- Complete all business domain features including genetics AI
- Full marketplace and breeder workspace functionality
- Advanced search with PostgreSQL full-text search
- Production deployment with 2-3 EC2 instances
Phase 3 (Months 13-18)
- Performance optimization and caching strategies
- Enhanced monitoring and operational dashboards
- Consider service splitting if needed (based on actual usage patterns)
- Geographic expansion preparation
Technology Evolution Pathways
Database Strategy Evolution
- Current: 6 separate PostgreSQL databases with RLS per service
- Scale (10K+ users): Read replicas per database and connection pooling optimization
- Advanced: Specialized databases per service (e.g., time-series for telemetry, graph DB for lineage)
- Enterprise (100K+ users): Database per tenant for largest customers
- Global (1M+ users): Multi-region database replication per service
Event Architecture Evolution
- Current: SNS/SQS with EventBridge for real-time events
- Scale: Event schema registry and versioning
- Advanced: Event sourcing expansion to more domains
- Global: Multi-region event replication and CQRS optimization
Deployment Evolution
- Current: Docker Compose on EC2 with 6 consolidated services
- Scale: ECS Fargate migration when operational complexity justifies it
- Advanced: Service splitting based on actual scaling needs and bottlenecks
- Enterprise: Kubernetes evaluation only if team size and complexity demand it

