System Architecture Overview

ReptiDex is built as a modern, cloud-native platform designed for scalability, reliability, and maintainability. The architecture follows domain-driven design principles with microservices that can scale independently while maintaining data consistency and business rule integrity.

System Purpose and Vision

Primary Goal

To streamline reptile lineage tracking through a standardized, trusted, and transparent process that spans reptile species, establishing a shared standard that makes lineage data interoperable and reliable for breeders, buyers, and organizations.

Core Value Proposition

  • Lineage-First System: Ensures accurate tracking of genetics, ancestry, and health across generations
  • Verified Transparency: Builds breeder and buyer confidence through verified, transparent records
  • Foundation-Focused: Positions lineage tracking as the foundation, with additional features as value-add bonuses

Guiding Principle

Every feature in ReptiDex supports or enhances the core lineage tracking experience.

System Context (C4 Level 1)

Primary Actors

  • Breeder: Manages vivariums, animals, pairings/clutches, lineage, media; sets up profiles/listings; purchases ads; receives notifications
  • Buyer/Visitor: Browses public profiles/listings, views pedigrees and lineage; interacts with embedded widgets
  • ReptiDex Admin: Handles moderation, abuse, feature flags, entitlements, and support tooling

External Systems

  • Payment Processors: Stripe/PayPal for subscriptions, invoices, dunning, refunds
  • Communication: Email/SMS providers for transactional and marketing notifications
  • Authentication: OAuth IdPs (Google/GitHub) for optional SSO
  • Content Delivery: CDN/Edge for public pages, media, embed assets, export downloads
  • Future Integrations: Genetics labs, partner marketplaces, reptile registries

High-Level Architecture

Architecture Principles

  1. Domain-Driven Design: Services organized around business domains with clear bounded contexts
  2. Event-Driven Architecture: Synchronous event handling where real-time consistency is critical, asynchronous messaging where eventual consistency is acceptable
  3. Consolidated Microservices: 6 logical services, each consolidating several domains behind a shared per-service database, optimized for solo developer operations
  4. Multi-Tenant by Design: Row-level security and organization-scoped data from day 1
  5. API-First: All functionality exposed through well-defined OpenAPI contracts
  6. Cloud-Native: Built for AWS with simple, cost-effective deployment strategy

System Boundaries (6 Consolidated Services)

The six services mirror the service databases listed under Data Architecture: repti-core (auth, config, billing, events, telemetry), repti-animal (animals, lineage, genetics, taxonomy, breeding), repti-commerce (marketplace, sales, transactions, inventory), repti-media (file metadata, rendering, embeds), repti-community (search, notifications, community), and repti-ops (admin tooling, audit, integrations).

Technology Stack

Backend Services

  • Runtime: Python with FastAPI
  • Authentication: JWT with RS256 signatures
  • Validation: Pydantic schemas with comprehensive validation
  • Testing: pytest with comprehensive test coverage
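
As a hedged illustration of the JWT layer: a token is three base64url segments, and its claims can be inspected with the standard library alone. Trusting those claims requires verifying the RS256 signature with a crypto library (e.g. PyJWT) against the issuer's public key, which this sketch deliberately skips; the demo token and claim names are made up for illustration.

```python
import base64
import json

def decode_unverified_claims(token):
    """Decode a JWT's payload WITHOUT verifying the signature.

    Illustration only: production code must verify the RS256
    signature against the issuer's public key before trusting claims.
    """
    header_b64, payload_b64, _signature = token.split(".")
    # base64url segments may lack padding; restore it before decoding
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

def _b64url(obj):
    """Encode a dict as an unpadded base64url JSON segment."""
    raw = base64.urlsafe_b64encode(json.dumps(obj).encode()).decode()
    return raw.rstrip("=")

# Token-shaped demo string (signature segment is a placeholder)
demo = ".".join([_b64url({"alg": "RS256", "typ": "JWT"}),
                 _b64url({"sub": "user-123", "org": "org-456"}),
                 "sig"])
claims = decode_unverified_claims(demo)
```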

Frontend Applications

  • Applications: 4 core apps (web-public, web-breeder, web-admin, web-embed)
  • Framework: Vite + React 19
  • Language: TypeScript
  • UI Components: Radix UI with Tailwind CSS via @reptidex/ui
  • State Management: Zustand + React Query via @reptidex/core
  • Shared Packages: Simplified 2-package approach for maintainability

Infrastructure & Platform

  • Cloud Provider: AWS
  • Deployment: EC2 instances with Docker Compose (simple and cost-effective)
  • Load Balancing: Application Load Balancer
  • DNS & CDN: CloudFront with Route 53
  • Monitoring: Grafana + Prometheus + Loki

Data Storage

  • Primary Database: PostgreSQL (RDS) with multi-schema design
  • Caching: Redis (ElastiCache) for session and computed data
  • Search Engine: PostgreSQL full-text search (simple, no OpenSearch overhead)
  • Object Storage: S3 with lifecycle policies
  • Message Queuing: SNS/SQS for event-driven communication

Event-Driven Architecture

Real-Time Consistency Design

ReptiDex implements a hybrid event architecture: synchronous processing where real-time consistency is critical, asynchronous messaging where eventual consistency is acceptable.

Event Infrastructure:
  • Primary: AWS SNS → SQS with dead letter queues for reliable delivery
  • Real-time: EventBridge for synchronous cross-service events
  • Event Store: PostgreSQL with event sourcing for audit trails and replay
  • Saga Orchestration: Distributed transaction coordination for complex workflows

Event Categories & Processing

Critical Events (Synchronous Processing):
  • animal.created, animal.updated, animal.transferred - Immediate lineage updates
  • lineage.updated, lineage.validated - Real-time ancestry graph consistency
  • breeding.paired, breeding.clutch.created - Genetics analysis triggers
  • transaction.initiated, transaction.completed - Financial consistency
  • inventory.reserved, inventory.transferred - Availability state changes
High-Priority Events (Near Real-Time < 5 seconds):
  • genetics.prediction.updated - AI model results propagation
  • commerce.listing.created - Marketplace visibility updates
  • sales.inquiry.received - Customer engagement notifications
  • profile.updated - Search index and cache invalidation
Background Events (Asynchronous):
  • search.index.updated - Search engine synchronization
  • analytics.event.tracked - Business intelligence aggregation
  • notification.queued - Email/SMS delivery processing
  • audit.logged - Compliance and monitoring data
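
The three-tier split above can be sketched as an event envelope plus a routing table. The function names and field layout here are illustrative, not the production schema; the routing sets show a subset of the event types listed above.

```python
import json
import uuid
from datetime import datetime, timezone

# Routing table mirroring the categories above (subset shown)
CRITICAL = {"animal.created", "animal.updated", "animal.transferred",
            "lineage.updated", "lineage.validated",
            "transaction.initiated", "transaction.completed"}
HIGH_PRIORITY = {"genetics.prediction.updated", "commerce.listing.created",
                 "sales.inquiry.received", "profile.updated"}

def make_envelope(event_type, organization_id, payload):
    """Every event carries its organization scope for tenant isolation."""
    return {
        "id": str(uuid.uuid4()),
        "type": event_type,
        "organization_id": organization_id,
        "occurred_at": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    }

def route(event_type):
    """Pick a processing lane from the event type."""
    if event_type in CRITICAL:
        return "synchronous"
    if event_type in HIGH_PRIORITY:
        return "near-real-time"
    return "background"

evt = make_envelope("animal.created", "org-123", {"animal_id": "a-1"})
lane = route(evt["type"])
body = json.dumps(evt)  # what would be published to SNS
```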

Event Processing Patterns

Saga Pattern for Complex Transactions:
# Animal sale orchestration with compensation
@saga_orchestrator("animal.sale")
async def animal_sale_saga(sale_request):
    steps = [
        ("inventory", "reserve_animal"),
        ("transaction", "process_payment"),
        ("animal", "transfer_ownership"),
        ("lineage", "update_ancestry"),
        ("commerce", "complete_sale")
    ]
    # Automatic compensation on failure
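
The compensation the comment alludes to can be made concrete in a few lines: each step pairs an action with a compensating action, and a failure unwinds the completed steps in reverse order. Step names follow the example above; this orchestrator is a sketch, not the production `saga_orchestrator`.

```python
class SagaError(Exception):
    pass

def run_saga(steps):
    """Execute (action, compensate) pairs; on failure, undo completed steps in reverse."""
    completed = []
    for action, compensate in steps:
        try:
            action()
        except Exception as exc:
            for comp in reversed(completed):
                comp()
            raise SagaError(f"saga aborted: {exc}") from exc
        completed.append(compensate)

log = []

def fail_transfer():
    raise RuntimeError("transfer_ownership failed")

steps = [
    (lambda: log.append("reserve_animal"),  lambda: log.append("release_animal")),
    (lambda: log.append("process_payment"), lambda: log.append("refund_payment")),
    (fail_transfer,                         lambda: log.append("undo_transfer")),
]
try:
    run_saga(steps)
except SagaError:
    log.append("aborted")
```

Because `transfer_ownership` fails, the payment is refunded and the reservation released, in that order, before the saga surfaces the error.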
Event Sourcing for Critical Domains:
  • Animal ownership history (immutable audit trail)
  • Lineage relationships (ancestry validation)
  • Financial transactions (regulatory compliance)
  • Breeding records (genetic history preservation)

Service Communication

Synchronous (Real-time):
  • API calls for immediate consistency requirements
  • Health checks and service discovery
  • User-facing operations requiring instant feedback
Asynchronous (Event-driven):
  • Cross-domain data synchronization
  • Business process orchestration
  • Analytics and reporting pipeline
  • Notification delivery system

Security Architecture

Multi-Tenant Security Framework

ReptiDex implements defense-in-depth security with organization-level isolation from day 1.

Security Layers:
  1. Network Security: VPC with private subnets, security groups, WAF protection
  2. Application Security: JWT with RS256, RBAC with vivarium roles, API rate limiting
  3. Data Security: Row-level security (RLS), encryption at rest/transit, field-level privacy
  4. Service Security: Service-to-service authentication, circuit breakers, input validation

Multi-Tenant Data Isolation

Row-Level Security (RLS):
-- Enable RLS on the table (policies have no effect until this is set)
ALTER TABLE animals ENABLE ROW LEVEL SECURITY;

-- Organization-scoped data access
CREATE POLICY org_isolation ON animals
FOR ALL TO app_role
USING (organization_id = current_setting('app.current_org_id')::uuid);

-- Service-level tenant context, set per connection or transaction
SET app.current_org_id = '${tenant_id}';
Service Isolation:
  • Database-per-service with dedicated connection pools
  • API gateway with tenant context validation
  • Cross-service calls include tenant verification
  • Event messages include organization scope
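
One way a service can guarantee the tenant context from the SQL above is always set is a transaction-scoped helper. `set_config(..., true)` scopes the value to the transaction, so a pooled connection cannot leak another tenant's context. The stub connection below merely records statements and stands in for a real DB-API driver; it is illustrative, not the production data layer.

```python
from contextlib import contextmanager

@contextmanager
def tenant_scope(conn, org_id):
    """Run a transaction with the RLS tenant context set.

    set_config(..., true) confines the setting to the transaction,
    so connections returned to the pool carry no tenant state.
    """
    cur = conn.cursor()
    cur.execute("BEGIN")
    cur.execute("SELECT set_config('app.current_org_id', %s, true)", (org_id,))
    try:
        yield cur
        cur.execute("COMMIT")
    except Exception:
        cur.execute("ROLLBACK")
        raise

# Stub connection that records statements, standing in for a real driver
class _StubCursor:
    def __init__(self, log):
        self.log = log
    def execute(self, sql, params=None):
        self.log.append(sql)

class _StubConn:
    def __init__(self):
        self.log = []
    def cursor(self):
        return _StubCursor(self.log)

conn = _StubConn()
with tenant_scope(conn, "org-123") as cur:
    cur.execute("SELECT * FROM animals")
```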

Privacy and Compliance

  • Granular Privacy Controls: Public, organization-only, private, members-only visibility
  • Immutable Audit Trails: Complete audit logging via repti-audit service
  • Data Ownership: Clear data ownership with transfer controls and two-party handshake
  • Compliance Ready: GDPR/CCPA compliance framework with data export/deletion
  • Secure Identifiers: Base58 global IDs to prevent enumeration attacks
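
As an illustration of the Base58 identifiers mentioned above, here is a minimal encoder over the Bitcoin-style alphabet (which omits the confusable characters 0, O, I, and l). Generating IDs from random bits, as sketched here, is what defeats enumeration; the exact ID width is an assumption.

```python
import secrets

# Bitcoin-style Base58 alphabet: omits 0, O, I, and l so IDs are
# unambiguous to read aloud or transcribe
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def b58encode(data):
    """Base58-encode bytes, preserving leading zero bytes as '1's."""
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = ALPHABET[rem] + out
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + out

def new_global_id():
    """128 random bits, Base58-encoded: opaque and non-enumerable."""
    return b58encode(secrets.token_bytes(16))

gid = new_global_id()
```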

Data Architecture

Database-Per-Service Strategy

Each of the 6 consolidated services owns its database completely, ensuring proper isolation and service autonomy.

Service Database Ownership:
  • repti_core_db: Auth, config, billing, events, telemetry data
  • repti_animal_db: Animals, lineage, genetics, taxonomy, breeding data
  • repti_commerce_db: Marketplace, sales, transactions, inventory data
  • repti_media_db: File metadata, rendering jobs, embed configurations
  • repti_community_db: Search indexes, notifications, community data
  • repti_ops_db: Admin tools, audit logs, integrations, system logs

Multi-Tenant Data Patterns

  • Row-Level Security: All tenant-scoped tables use RLS policies for organization isolation
  • Tenant Context: Each service connection sets tenant context for automatic filtering
  • Database Isolation: Services can only access their own database
  • Cross-Service Data Access: Only via service APIs and events, never direct database access

Database Features

  • Service Database Isolation: Complete data ownership and schema control per service
  • Independent Scaling: Each database can be scaled based on service-specific needs
  • Service-Specific Backups: Tailored backup strategies per database (critical vs supporting)
  • Fault Isolation: Database issues in one service don’t affect others
  • Technology Flexibility: Can use specialized databases per service if needed (e.g., time-series for telemetry)

Caching Strategy
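
Details for this section are not filled in above. As a sketch under stated assumptions, the Redis usage described in the technology stack (sessions and computed data) typically follows a cache-aside pattern; the dict below stands in for a Redis client, and the 300-second TTL is illustrative.

```python
import time

class CacheAside:
    """Cache-aside with TTL: read from cache, fall back to a loader.

    A plain dict stands in for Redis; swapping `self.store` for a
    Redis client (get/setex) would not change the call sites.
    """
    def __init__(self, loader, ttl_seconds=300, clock=time.monotonic):
        self.loader = loader
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}          # key -> (value, expires_at)
        self.hits = self.misses = 0

    def get(self, key):
        entry = self.store.get(key)
        if entry and entry[1] > self.clock():
            self.hits += 1
            return entry[0]
        self.misses += 1
        value = self.loader(key)                   # e.g. a DB query
        self.store[key] = (value, self.clock() + self.ttl)
        return value

cache = CacheAside(loader=lambda k: f"pedigree-for-{k}")
first = cache.get("animal-1")    # miss: loads and caches
second = cache.get("animal-1")   # hit: served from cache
```

The hit/miss counters are what would feed the > 90% cache-hit target under Performance Targets below.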

Performance and Scalability

Performance Targets

  • API Response Time: p95 < 500ms for read operations
  • Database Performance: Query optimization with proper indexing
  • Cache Hit Rates: > 90% for frequently accessed data
  • CDN Optimization: Global edge locations for static content

Scaling Strategy

  • Service-Level Auto-Scaling: Independent scaling for each of the 6 consolidated services based on demand
  • Database Scaling: Service-specific read replicas with connection pooling per database
  • Event-Driven Scaling: SNS/SQS scaling based on queue depth and processing lag
  • Geographic Scaling: Multi-region deployment with cross-region data replication

Deployment Architecture

Simple Container Deployment

Docker Compose on EC2:
  • Single EC2 Instance: t3.large or t3.xlarge for development/staging
  • Production Setup: 2-3 EC2 instances behind ALB for redundancy
  • Service Containers: 6 consolidated services in Docker containers
  • Shared Networking: Docker Compose networking for inter-service communication
Load Balancing & Discovery:
  • Application Load Balancer with target groups for each service
  • Simple service discovery via environment variables and internal DNS
  • Health checks and automatic failover
  • Rolling deployments with zero downtime
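
The health checks the ALB polls can be modeled as an aggregator over per-dependency probes. The probe names and report shape here are illustrative, not the ALB's actual contract.

```python
def aggregate_health(checks):
    """Aggregate per-dependency probes into one health report.

    `checks` maps a dependency name to a zero-arg callable that
    raises (or returns a falsy value) when the dependency is down.
    """
    results = {}
    for name, probe in checks.items():
        try:
            outcome = probe()
            results[name] = outcome is None or bool(outcome)
        except Exception:
            results[name] = False
    status = "healthy" if all(results.values()) else "unhealthy"
    return {"status": status, "checks": results}

# All dependencies up: the ALB keeps the instance in rotation
report = aggregate_health({
    "postgres": lambda: True,   # e.g. SELECT 1 succeeded
    "redis": lambda: True,      # e.g. PING returned PONG
})

# A failing probe marks the whole service unhealthy
degraded = aggregate_health({
    "postgres": lambda: True,
    "redis": lambda: 1 / 0,     # simulated connection failure
})
```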

Monitoring and Observability

Streamlined Observability (6 Services)

Centralized Monitoring via repti-ops:
  • Logs: Structured JSON logging aggregated from all 6 services
  • Metrics: Prometheus instrumentation with Grafana dashboards and alerting
  • Health Checks: Simple health monitoring with service dependency tracking
  • Alerts: Grafana intelligent alerting with CloudWatch fallback for infrastructure
Grafana Integration:
  • APM Monitoring: Application performance monitoring for all 6 services
  • Custom Dashboards: Business metrics and service health visualization
  • Distributed Tracing: End-to-end request tracing across service boundaries
  • Error Tracking: Automatic error detection and notification
  • Infrastructure Monitoring: EC2, RDS, and Redis performance metrics

Key Metrics by Category

Business Metrics:
  • User acquisition: Signups, activations, conversions by organization
  • Engagement: Animal creation, lineage updates, breeding events
  • Commerce: Listings, sales, transaction volume, payment success rates
  • Retention: Active users, subscription renewals, feature adoption
Technical Metrics:
  • Service Performance: API latency (p50, p95, p99), error rates, throughput per service
  • Event Processing: Event publish/consume rates, queue depths, processing lag
  • Database Performance: Query performance, connection pool usage, replication lag
  • Infrastructure: CPU, memory, network per EC2 instance and service
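
The p50/p95/p99 latencies above can be computed from raw samples with the nearest-rank method; a minimal sketch with simulated data:

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample covering pct% of the data."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank - 1, 0)]

# Simulated API latencies in milliseconds
latencies = list(range(1, 101))   # 1..100 ms
p50 = percentile(latencies, 50)
p95 = percentile(latencies, 95)
p99 = percentile(latencies, 99)
```

Against the p95 < 500 ms read target stated earlier, an alert would fire when the 95th-percentile value of a recent window exceeds 500.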

Future Architecture Considerations

Simplified Evolution Roadmap (18-Month Plan)

Phase 1 (Months 1-6): Foundation & Core
  • Deploy 6 consolidated services with Docker Compose
  • Establish multi-tenant security and basic event-driven patterns
  • Single PostgreSQL database with schema isolation
  • Basic monitoring and logging with CloudWatch
Phase 2 (Months 7-12): Feature Completion
  • Complete all business domain features including genetics AI
  • Full marketplace and breeder workspace functionality
  • Advanced search with PostgreSQL full-text search
  • Production deployment with 2-3 EC2 instances
Phase 3 (Months 13-18): Scale & Optimization
  • Performance optimization and caching strategies
  • Enhanced monitoring and operational dashboards
  • Consider service splitting if needed (based on actual usage patterns)
  • Geographic expansion preparation

Technology Evolution Pathways

Database Strategy Evolution:
  • Current: 6 separate PostgreSQL databases with RLS per service
  • Scale (10K+ users): Read replicas per database and connection pooling optimization
  • Advanced: Specialized databases per service (e.g., time-series for telemetry, graph DB for lineage)
  • Enterprise (100K+ users): Database per tenant for largest customers
  • Global (1M+ users): Multi-region database replication per service
Event Architecture Evolution:
  • Current: SNS/SQS with EventBridge for real-time events
  • Scale: Event schema registry and versioning
  • Advanced: Event sourcing expansion to more domains
  • Global: Multi-region event replication and CQRS optimization
Deployment Strategy Evolution:
  • Current: Docker Compose on EC2 with 6 consolidated services
  • Scale: ECS Fargate migration when operational complexity justifies it
  • Advanced: Service splitting based on actual scaling needs and bottlenecks
  • Enterprise: Kubernetes evaluation only if team size and complexity demand it