System Architecture Overview
ReptiDex is built as a modern, cloud-native platform designed for scalability, reliability, and maintainability. The architecture follows domain-driven design principles, with microservices that can scale independently while maintaining data consistency and business-rule integrity.
System Purpose and Vision
Primary Goal
To streamline reptile lineage tracking by creating a standardized, trusted, and transparent process that works across reptile species, establishing a shared standard that makes lineage data interoperable and reliable for breeders, buyers, and organizations.
Core Value Proposition
- Lineage-First System: Ensures accurate tracking of genetics, ancestry, and health across generations
- Verified Transparency: Builds breeder and buyer confidence through verified, transparent records
- Foundation-Focused: Positions lineage tracking as the foundation, with additional features as value-add bonuses
Guiding Principle
Every feature in ReptiDex supports or enhances the core lineage tracking experience.
System Context (C4 Level 1)
Primary Actors
- Breeder: Manages vivariums, animals, pairings/clutches, lineage, media; sets up profiles/listings; purchases ads; receives notifications
- Buyer/Visitor: Browses public profiles/listings, views pedigrees and lineage; interacts with embedded widgets
- ReptiDex Admin: Handles moderation, abuse, feature flags, entitlements, and support tooling
External Systems
- Payment Processors: Stripe/PayPal for subscriptions, invoices, dunning, refunds
- Communication: Email/SMS providers for transactional and marketing notifications
- Authentication: OAuth IdPs (Google/GitHub) for optional SSO
- Content Delivery: CDN/Edge for public pages, media, embed assets, export downloads
- Future Integrations: Genetics labs, partner marketplaces, reptile registries
High-Level Architecture
Architecture Principles
- Domain-Driven Design: Services organized around business domains with clear bounded contexts
- Event-Driven Architecture: Real-time consistency through synchronous events and eventual consistency through async messaging
- Consolidated Microservices: 6 logical services, with related domains sharing each service's database, optimized for solo-developer operations
- Multi-Tenant by Design: Row-level security and organization-scoped data from day 1
- API-First: All functionality exposed through well-defined OpenAPI contracts
- Cloud-Native: Built for AWS with a simple, cost-effective deployment strategy
System Boundaries (6 Consolidated Services)
The six services map one-to-one to the databases listed under Data Architecture: repti-core (auth, config, billing), repti-animal (animals, lineage, genetics), repti-commerce (marketplace, sales), repti-media (files, rendering, embeds), repti-community (search, notifications), and repti-ops (admin, audit, integrations).
Technology Stack
Backend Services
- Runtime: Python with FastAPI
- Authentication: JWT with RS256 signatures
- Validation: Pydantic schemas with comprehensive validation
- Testing: pytest with comprehensive test coverage
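As a minimal sketch of how this stack fits together, the endpoint below combines FastAPI routing, Pydantic validation, and RS256 JWT verification via PyJWT. The AnimalCreate model, key path, org_id claim, and route are illustrative assumptions, not the actual ReptiDex API.

```python
# Sketch only: FastAPI + Pydantic + RS256 JWT. The model, route, key path,
# and "org_id" claim are illustrative assumptions, not the real ReptiDex API.
import jwt  # PyJWT
from fastapi import Depends, FastAPI, HTTPException
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer
from pydantic import BaseModel, Field

app = FastAPI(title="repti-animal")
bearer = HTTPBearer()
PUBLIC_KEY = open("jwt_public.pem").read()  # RS256 verification key

class AnimalCreate(BaseModel):
    name: str = Field(min_length=1, max_length=120)
    species: str
    sire_id: str | None = None  # Base58 global ID of the sire, if known
    dam_id: str | None = None   # Base58 global ID of the dam, if known

def current_claims(creds: HTTPAuthorizationCredentials = Depends(bearer)) -> dict:
    try:
        return jwt.decode(creds.credentials, PUBLIC_KEY, algorithms=["RS256"])
    except jwt.PyJWTError:
        raise HTTPException(status_code=401, detail="Invalid token")

@app.post("/animals", status_code=201)
def create_animal(body: AnimalCreate, claims: dict = Depends(current_claims)):
    org_id = claims["org_id"]  # assumed tenant claim carried in the token
    # ... persist under org_id, then emit an animal.created event ...
    return {"status": "created", "org_id": org_id, "name": body.name}
```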
Frontend Applications
- Applications: 4 core apps (web-public, web-breeder, web-admin, web-embed)
- Framework: Vite + React 19
- Language: TypeScript
- UI Components: Radix UI with Tailwind CSS via @reptidex/ui
- State Management: Zustand + React Query via @reptidex/core
- Shared Packages: A simplified two-package approach (@reptidex/ui and @reptidex/core) for maintainability
Infrastructure & Platform
- Cloud Provider: AWS
- Deployment: EC2 instances with Docker Compose (simple and cost-effective)
- Load Balancing: Application Load Balancer
- DNS & CDN: CloudFront with Route 53
- Monitoring: Grafana + Prometheus + Loki
Data Storage
- Primary Database: PostgreSQL (RDS) with multi-schema design
- Caching: Redis (ElastiCache) for session and computed data
- Search Engine: PostgreSQL full-text search (simple, no OpenSearch overhead)
- Object Storage: S3 with lifecycle policies
- Message Queuing: SNS/SQS for event-driven communication
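A sketch of how a service might publish a domain event to SNS for SQS subscribers, carrying the organization scope as a message attribute so consumers can filter per tenant. The topic ARN, envelope shape, and IDs are placeholders, not the real event schema.

```python
# Sketch: publishing a domain event to SNS with the organization scope as a
# message attribute. The topic ARN, envelope shape, and IDs are placeholders.
import json
import uuid
from datetime import datetime, timezone

import boto3

sns = boto3.client("sns")
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:reptidex-events"  # placeholder

def publish_event(event_type: str, org_id: str, payload: dict) -> None:
    envelope = {
        "event_id": str(uuid.uuid4()),
        "event_type": event_type,  # e.g. "animal.created"
        "org_id": org_id,          # organization scope on every event
        "occurred_at": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    }
    sns.publish(
        TopicArn=TOPIC_ARN,
        Message=json.dumps(envelope),
        MessageAttributes={
            "event_type": {"DataType": "String", "StringValue": event_type},
            "org_id": {"DataType": "String", "StringValue": org_id},
        },
    )

publish_event("animal.created", "org_8XkQ2", {"animal_id": "an_3fT9"})
```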
Event-Driven Architecture
Real-Time Consistency Design
ReptiDex implements a hybrid event architecture: real-time consistency where it is critical, eventual consistency where it is acceptable.
Event Infrastructure
- Primary: AWS SNS → SQS with dead letter queues for reliable delivery
- Real-time: EventBridge for synchronous cross-service events
- Event Store: PostgreSQL with event sourcing for audit trails and replay
- Saga Orchestration: Distributed transaction coordination for complex workflows
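A minimal sketch of the event-store append mentioned above, assuming a simple events table in PostgreSQL; the schema, DSN, and identifiers are illustrative, not the actual design.

```python
# Sketch: appending domain events to a PostgreSQL event store for audit
# trails and replay. The table schema and DSN are assumptions.
import psycopg2
from psycopg2.extras import Json

DDL = """
CREATE TABLE IF NOT EXISTS events (
    id           BIGSERIAL PRIMARY KEY,
    org_id       TEXT        NOT NULL,
    aggregate_id TEXT        NOT NULL,
    event_type   TEXT        NOT NULL,
    payload      JSONB       NOT NULL,
    occurred_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);
"""

def append_event(conn, org_id, aggregate_id, event_type, payload) -> None:
    with conn, conn.cursor() as cur:  # commits on success, rolls back on error
        cur.execute(
            "INSERT INTO events (org_id, aggregate_id, event_type, payload) "
            "VALUES (%s, %s, %s, %s)",
            (org_id, aggregate_id, event_type, Json(payload)),
        )

conn = psycopg2.connect("dbname=repti_animal_db")  # placeholder DSN
with conn, conn.cursor() as cur:
    cur.execute(DDL)
append_event(conn, "org_8XkQ2", "an_3fT9", "animal.created", {"name": "Ziggy"})
```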
Event Categories & Processing
Critical Events (Synchronous Processing)
- animal.created, animal.updated, animal.transferred: Immediate lineage updates
- lineage.updated, lineage.validated: Real-time ancestry graph consistency
- breeding.paired, breeding.clutch.created: Genetics analysis triggers
- transaction.initiated, transaction.completed: Financial consistency
- inventory.reserved, inventory.transferred: Availability state changes
Standard Events (Asynchronous Processing)
- genetics.prediction.updated: AI model results propagation
- commerce.listing.created: Marketplace visibility updates
- sales.inquiry.received: Customer engagement notifications
- profile.updated: Search index and cache invalidation
Background Events (Asynchronous Processing)
- search.index.updated: Search engine synchronization
- analytics.event.tracked: Business intelligence aggregation
- notification.queued: Email/SMS delivery processing
- audit.logged: Compliance and monitoring data
Event Processing Patterns
Saga Pattern for Complex Transactions (sketch after this list)
- Animal ownership history (immutable audit trail)
- Lineage relationships (ancestry validation)
- Financial transactions (regulatory compliance)
- Breeding records (genetic history preservation)
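A minimal, in-process sketch of the saga pattern: each step pairs an action with a compensating action, and a failure rolls completed steps back in reverse order. The step functions are hypothetical placeholders, not real ReptiDex operations; a production coordinator would persist saga state and drive steps via events.

```python
# Sketch: a minimal saga coordinator. Each step pairs an action with a
# compensating action; on failure, completed steps are undone in reverse.
# The step functions are hypothetical placeholders, not ReptiDex calls.
from dataclasses import dataclass
from typing import Callable

@dataclass
class SagaStep:
    name: str
    action: Callable[[], None]
    compensate: Callable[[], None]

def run_saga(steps: list[SagaStep]) -> bool:
    done: list[SagaStep] = []
    for step in steps:
        try:
            step.action()
            done.append(step)
        except Exception:
            for completed in reversed(done):  # compensate in reverse order
                completed.compensate()
            return False
    return True

# Hypothetical animal-transfer workflow, named after the events listed above.
transfer = [
    SagaStep("reserve", lambda: print("inventory.reserved"),
             lambda: print("inventory released")),
    SagaStep("charge", lambda: print("transaction.completed"),
             lambda: print("transaction refunded")),
    SagaStep("transfer", lambda: print("animal.transferred"),
             lambda: print("transfer reverted")),
]
run_saga(transfer)
```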
Service Communication
Synchronous (Real-time)
- API calls for immediate consistency requirements
- Health checks and service discovery
- User-facing operations requiring instant feedback
Asynchronous (Event-Driven)
- Cross-domain data synchronization
- Business process orchestration
- Analytics and reporting pipeline
- Notification delivery system
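A sketch of how an asynchronous consumer might drain one of these queues with long polling. The queue URL and handler are placeholders; with default SNS-to-SQS delivery the original message is wrapped, so the inner envelope is parsed out of the Body's "Message" field.

```python
# Sketch: a long-polling SQS consumer for asynchronous events. The queue URL
# and handler are placeholders, not real ReptiDex components.
import json

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/repti-community"

def handle(envelope: dict) -> None:
    print("processing", envelope["event_type"], "for org", envelope["org_id"])

while True:
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20
    )
    for msg in resp.get("Messages", []):
        # SNS->SQS wraps the published message; unwrap the inner envelope.
        envelope = json.loads(json.loads(msg["Body"])["Message"])
        handle(envelope)
        # Delete only after successful handling; otherwise the message
        # becomes visible again and eventually lands in the dead letter queue.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```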
Security Architecture
Multi-Tenant Security Framework
ReptiDex implements defense-in-depth security with organization-level isolation from day 1.
Security Layers
- Network Security: VPC with private subnets, security groups, WAF protection
- Application Security: JWT with RS256, RBAC with vivarium roles, API rate limiting
- Data Security: Row-level security (RLS), encryption at rest/transit, field-level privacy
- Service Security: Service-to-service authentication, circuit breakers, input validation
Multi-Tenant Data Isolation
Row-Level Security (RLS)
- Database-per-service with dedicated connection pools
- API gateway with tenant context validation
- Cross-service calls include tenant verification
- Event messages include organization scope
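A sketch of what organization-scoped RLS can look like in practice, assuming an org_id column on tenant-scoped tables and an app.current_org_id session setting; the policy name, setting name, and animals table are conventions chosen for illustration, not the actual schema.

```python
# Sketch: organization isolation via RLS. The policy, the app.current_org_id
# setting name, and the animals table are illustrative conventions.
import psycopg2

RLS_SETUP = """
ALTER TABLE animals ENABLE ROW LEVEL SECURITY;
CREATE POLICY org_isolation ON animals
    USING (org_id = current_setting('app.current_org_id'));
"""

def query_animals(conn, org_id: str) -> list:
    with conn, conn.cursor() as cur:
        # set_config(..., is_local=True) scopes the tenant context to this
        # transaction, so every query below is filtered by the RLS policy.
        cur.execute("SELECT set_config('app.current_org_id', %s, true)", (org_id,))
        cur.execute("SELECT id, name FROM animals")
        return cur.fetchall()
```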
Privacy and Compliance
- Granular Privacy Controls: Public, organization-only, private, members-only visibility
- Immutable Audit Trails: Complete audit logging via the repti-ops service
- Data Ownership: Clear data ownership with transfer controls and two-party handshake
- Compliance Ready: GDPR/CCPA compliance framework with data export/deletion
- Secure Identifiers: Base58 global IDs to prevent enumeration attacks
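A minimal sketch of generating non-enumerable Base58 global IDs from cryptographically random bytes; the an_ prefix convention and 12-byte length are assumptions for illustration.

```python
# Sketch: non-enumerable Base58 global IDs from cryptographically random
# bytes. The "an_" prefix convention and 12-byte length are assumptions.
import secrets

# Bitcoin-style Base58 alphabet: no 0, O, I, or l, to avoid misreading.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58_id(prefix: str, nbytes: int = 12) -> str:
    n = int.from_bytes(secrets.token_bytes(nbytes), "big")
    chars = []
    while n:
        n, rem = divmod(n, 58)
        chars.append(ALPHABET[rem])
    return prefix + "".join(reversed(chars or [ALPHABET[0]]))

print(base58_id("an_"))  # e.g. an_3QZr7kWx9TpL2mNd
```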
Data Architecture
Database-Per-Service Strategy
Each of the 6 consolidated services owns its database completely, ensuring proper isolation and service autonomy.
Service Database Ownership
- repti_core_db: Auth, config, billing, events, telemetry data
- repti_animal_db: Animals, lineage, genetics, taxonomy, breeding data
- repti_commerce_db: Marketplace, sales, transactions, inventory data
- repti_media_db: File metadata, rendering jobs, embed configurations
- repti_community_db: Search indexes, notifications, community data
- repti_ops_db: Admin tools, audit logs, integrations, system logs
Multi-Tenant Data Patterns
- Row-Level Security: All tenant-scoped tables use RLS policies for organization isolation
- Tenant Context: Each service connection sets tenant context for automatic filtering
- Database Isolation: Services can only access their own database
- Cross-Service Data Access: Only via service APIs and events, never direct database access
Database Features
- Service Database Isolation: Complete data ownership and schema control per service
- Independent Scaling: Each database can be scaled based on service-specific needs
- Service-Specific Backups: Tailored backup strategies per database (critical vs supporting)
- Fault Isolation: Database issues in one service don’t affect others
- Technology Flexibility: Can use specialized databases per service if needed (e.g., time-series for telemetry)
Caching Strategy
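The Data Storage section above designates Redis (ElastiCache) for session and computed data. Below is one plausible cache-aside sketch for an expensive computed structure such as a pedigree tree; the key scheme, TTL, and event-driven invalidation hook are assumptions, not confirmed specifics.

```python
# Sketch: cache-aside for computed data (e.g., a pedigree tree) in Redis.
# Key naming, TTL, and the load_from_db callable are illustrative.
import json

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
PEDIGREE_TTL = 300  # seconds; also invalidated early on lineage.updated

def get_pedigree(org_id: str, animal_id: str, load_from_db) -> dict:
    key = f"pedigree:{org_id}:{animal_id}"  # tenant-scoped cache key
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)          # cache hit
    pedigree = load_from_db(animal_id)     # miss: recompute from PostgreSQL
    r.setex(key, PEDIGREE_TTL, json.dumps(pedigree))
    return pedigree

def on_lineage_updated(org_id: str, animal_id: str) -> None:
    r.delete(f"pedigree:{org_id}:{animal_id}")  # event-driven invalidation
```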
Performance and Scalability
Performance Targets
- API Response Time: p95 < 500ms for read operations
- Database Performance: Query optimization with proper indexing
- Cache Hit Rates: > 90% for frequently accessed data
- CDN Optimization: Global edge locations for static content
Scaling Strategy
- Service-Level Auto-Scaling: Independent scaling for each of the 6 consolidated services based on demand
- Database Scaling: Service-specific read replicas with connection pooling per database
- Event-Driven Scaling: SNS/SQS scaling based on queue depth and processing lag
- Geographic Scaling: Multi-region deployment with cross-region data replication
Deployment Architecture
Simple Container Deployment
Docker Compose on EC2
- Single EC2 Instance: t3.large or t3.xlarge for development/staging
- Production Setup: 2-3 EC2 instances behind ALB for redundancy
- Service Containers: 6 consolidated services in Docker containers
- Shared Networking: Docker Compose networking for inter-service communication
- Application Load Balancer with target groups for each service
- Simple service discovery via environment variables and internal DNS
- Health checks and automatic failover
- Rolling deployments with zero downtime
Monitoring and Observability
Streamlined Observability (6 Services)
Centralized Monitoring via repti-ops
- Logs: Structured JSON logging aggregated from all 6 services (see the instrumentation sketch after this list)
- Metrics: Prometheus instrumentation with Grafana dashboards and alerting
- Health Checks: Simple health monitoring with service dependency tracking
- Alerts: Grafana intelligent alerting with CloudWatch fallback for infrastructure
- APM Monitoring: Application performance monitoring for all 6 services
- Custom Dashboards: Business metrics and service health visualization
- Distributed Tracing: End-to-end request tracing across service boundaries
- Error Tracking: Automatic error detection and notification
- Infrastructure Monitoring: EC2, RDS, and Redis performance metrics
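A minimal sketch combining the two instrumentation concerns above in one FastAPI middleware: structured JSON logs (one object per line, suitable for Loki aggregation) and Prometheus request metrics. Metric names and labels are illustrative.

```python
# Sketch: structured JSON logs (one object per line, for Loki) plus
# Prometheus metrics in a FastAPI middleware. Metric names are illustrative.
import json
import logging
import time

from fastapi import FastAPI, Request
from prometheus_client import Counter, Histogram, make_asgi_app

app = FastAPI()
app.mount("/metrics", make_asgi_app())  # scraped by Prometheus

REQUESTS = Counter("http_requests_total", "Requests", ["method", "path", "status"])
LATENCY = Histogram("http_request_seconds", "Request latency", ["path"])
log = logging.getLogger("repti")
logging.basicConfig(level=logging.INFO, format="%(message)s")

@app.middleware("http")
async def observe(request: Request, call_next):
    start = time.perf_counter()
    response = await call_next(request)
    elapsed = time.perf_counter() - start
    path = request.url.path
    REQUESTS.labels(request.method, path, str(response.status_code)).inc()
    LATENCY.labels(path).observe(elapsed)
    log.info(json.dumps({  # one JSON object per line for log aggregation
        "method": request.method, "path": path,
        "status": response.status_code, "duration_ms": round(elapsed * 1000, 2),
    }))
    return response
```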
Key Metrics by Category
Business Metrics
- User acquisition: Signups, activations, conversions by organization
- Engagement: Animal creation, lineage updates, breeding events
- Commerce: Listings, sales, transaction volume, payment success rates
- Retention: Active users, subscription renewals, feature adoption
Technical Metrics
- Service Performance: API latency (p50, p95, p99), error rates, throughput per service
- Event Processing: Event publish/consume rates, queue depths, processing lag
- Database Performance: Query performance, connection pool usage, replication lag
- Infrastructure: CPU, memory, and network per EC2 host and service
Future Architecture Considerations
Simplified Evolution Roadmap (18-Month Plan)
Phase 1 (Months 1-6): Foundation & Core
- Deploy 6 consolidated services with Docker Compose
- Establish multi-tenant security and basic event-driven patterns
- Single PostgreSQL database with schema isolation
- Basic monitoring and logging with CloudWatch
Phase 2 (Months 7-12)
- Complete all business domain features including genetics AI
- Full marketplace and breeder workspace functionality
- Advanced search with PostgreSQL full-text search
- Production deployment with 2-3 EC2 instances
Phase 3 (Months 13-18)
- Performance optimization and caching strategies
- Enhanced monitoring and operational dashboards
- Consider service splitting if needed (based on actual usage patterns)
- Geographic expansion preparation
Technology Evolution Pathways
Database Strategy Evolution
- Current: 6 separate PostgreSQL databases with RLS per service
- Scale (10K+ users): Read replicas per database and connection pooling optimization
- Advanced: Specialized databases per service (e.g., time-series for telemetry, graph DB for lineage)
- Enterprise (100K+ users): Database per tenant for largest customers
- Global (1M+ users): Multi-region database replication per service
Event Architecture Evolution
- Current: SNS/SQS with EventBridge for real-time events
- Scale: Event schema registry and versioning
- Advanced: Event sourcing expansion to more domains
- Global: Multi-region event replication and CQRS optimization
Deployment Evolution
- Current: Docker Compose on EC2 with 6 consolidated services
- Scale: ECS Fargate migration when operational complexity justifies it
- Advanced: Service splitting based on actual scaling needs and bottlenecks
- Enterprise: Kubernetes evaluation only if team size and complexity demand it

