
Database Architecture & Management

Detailed database architecture and management strategy for ReptiDex, covering PostgreSQL primary storage, Redis caching, schema design, performance optimization, backup strategies, and scaling approaches.

Database Architecture Overview

Multi-Database Strategy

PostgreSQL Primary

Relational Data Storage
  • User accounts and authentication
  • Animal records and breeding data
  • Vivarium organizations and memberships
  • Transactional commerce data
  • Taxonomy and reference data

Redis Cache Layer

High-Performance Caching
  • Session storage and management
  • Frequently accessed data caching
  • Real-time features and pub/sub
  • Rate limiting and temporary data

Database Technology Decisions

  • PostgreSQL Choice
  • Redis Choice
  • Alternative Considerations

Why PostgreSQL for Primary Storage

Technical Advantages:
  • ACID Compliance: Full transactional integrity for commerce and breeding records
  • JSON Support: Native JSON/JSONB for flexible animal attribute storage
  • Advanced Indexing: GIN, GiST indexes for complex queries and search
  • Full-Text Search: Built-in search capabilities for animal and breeder discovery
Operational Benefits:
  • AWS RDS Integration: Managed service with Multi-AZ, automated backups
  • Mature Ecosystem: Extensive Python/SQLAlchemy support
  • Performance: Excellent performance for read-heavy workloads
  • Scaling Options: Read replicas, connection pooling, partitioning
ReptiDex Specific Features:
  • Hierarchical Data: Support for nested taxonomies and lineage trees
  • Geographic Queries: PostGIS extension for location-based features
  • Array Support: Native array types for tag and morph listings
  • Custom Types: Enum types for species, morphs, and breeding statuses
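A minimal sketch of how these features might map onto a SQLAlchemy model; the animals table, its columns, and the enum values are illustrative assumptions rather than the actual ReptiDex schema:
# Illustrative only: table, column, and enum names are assumptions, not the real schema.
import uuid
from sqlalchemy import Column, Enum, Index, String
from sqlalchemy.dialects.postgresql import ARRAY, JSONB, UUID
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Animal(Base):
    __tablename__ = "animals"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    species = Column(String, nullable=False)
    # Custom enum type for breeding status
    breeding_status = Column(Enum("holdback", "breeder", "pet", name="breeding_status"))
    # Native array type for tag and morph listings
    morphs = Column(ARRAY(String), default=list)
    # Flexible animal attributes stored as JSONB
    attributes = Column(JSONB, default=dict)

    __table_args__ = (
        # GIN index supports containment queries against the JSONB column
        Index("ix_animals_attributes", "attributes", postgresql_using="gin"),
    )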

PostgreSQL Configuration

Database Schema Architecture

Schema Organization by Service

  • repti_core: Users, authentication, profiles, vivariums (orgs)
  • repti_animal: Animals, records, breeding, lineage
  • repti_commerce: Listings, transactions, payments
  • repti_community: Messages, forums, notifications
  • repti_media: File storage, image metadata, processing
  • repti_ops: System logs, metrics, configuration
  • taxonomy: Species, morphs, traits (shared reference)
  • audit: Change tracking, compliance, security logs

Data Relationships and Schema Organization

The database schemas are organized by service boundaries with clear data relationships:

Key Entity Relationships

  • Users → Animals (ownership relationship)
  • Users → Vivariums (organization membership)
  • Vivariums → Animals (organizational grouping)
  • Animals → Breeding Events (genetic tracking)
  • Animals → Listings (marketplace integration)
  • Species/Morphs → Animals (taxonomic classification)
  • Foreign keys enforce referential integrity
  • Shared taxonomy tables across all schemas
  • Audit trails for all transactional data
  • Media references linked to primary entities
  • Geographic data for location-based features
Detailed schema designs are maintained within their respective component documentation to ensure single-source maintenance and prevent synchronization issues. Refer to individual service documentation for complete table structures and relationships.
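As a hedged illustration of the schema-per-service layout and cross-schema foreign keys described above, the core ownership and taxonomy relationships might be declared roughly as follows; table and column names are assumptions, and the authoritative definitions live in the service documentation:
# Illustrative sketch only; actual table definitions live in each service's docs.
from sqlalchemy import Column, ForeignKey, MetaData, String, Table
from sqlalchemy.dialects.postgresql import UUID

metadata = MetaData()

# Users live in the repti_core schema
users = Table(
    "users", metadata,
    Column("id", UUID(as_uuid=True), primary_key=True),
    Column("username", String, nullable=False),
    schema="repti_core",
)

# Shared reference data lives in the taxonomy schema
species = Table(
    "species", metadata,
    Column("id", UUID(as_uuid=True), primary_key=True),
    Column("scientific_name", String, nullable=False),
    schema="taxonomy",
)

# Animals live in repti_animal but reference repti_core.users and taxonomy.species,
# so the database enforces ownership and classification across schema boundaries.
animals = Table(
    "animals", metadata,
    Column("id", UUID(as_uuid=True), primary_key=True),
    Column("owner_id", UUID(as_uuid=True), ForeignKey("repti_core.users.id"), nullable=False),
    Column("species_id", UUID(as_uuid=True), ForeignKey("taxonomy.species.id")),
    schema="repti_animal",
)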

Database Performance Optimization

Performance Tuning Strategy

Indexing Strategy:
  • Primary Keys: Time-ordered (sequentially generated) UUIDs to limit index fragmentation
  • Foreign Keys: All relationships indexed
  • Query Patterns: Composite indexes for common filters
  • JSON Fields: GIN indexes for JSONB columns
  • Full-Text Search: GIN indexes on tsvector columns
Query Optimization:
  • Connection Pooling: PgBouncer for connection management
  • Query Analysis: Regular EXPLAIN ANALYZE reviews
  • Pagination: Cursor-based for large result sets (see the sketch after this list)
  • Aggregations: Materialized views for complex reports
  • Bulk Operations: COPY and batch inserts
Monitoring & Maintenance:
  • Slow Query Log: Track queries >1 second
  • Index Usage: Monitor unused and duplicate indexes
  • VACUUM: Automated maintenance scheduling
  • Statistics: Regular ANALYZE for query planner
  • Bloat Monitoring: Table and index bloat tracking
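The cursor-based pagination item above might be implemented along these lines; the animals table, its owner_id and created_at columns, and the page size are assumptions for illustration:
# Sketch of keyset (cursor-based) pagination; table and column names are assumptions.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://user:pass@primary-endpoint:5432/reptidex")

def fetch_animals_page(owner_id, cursor=None, page_size=50):
    """Return one page of rows plus the cursor for the next page.

    The cursor is the (created_at, id) pair of the last row seen, so each page
    is an indexed range scan instead of an increasingly expensive OFFSET.
    """
    query = text("""
        SELECT id, name, created_at
        FROM animals
        WHERE owner_id = :owner_id
          AND (:cursor_created_at IS NULL
               OR (created_at, id) > (:cursor_created_at, :cursor_id))
        ORDER BY created_at, id
        LIMIT :page_size
    """)
    params = {
        "owner_id": owner_id,
        "cursor_created_at": cursor[0] if cursor else None,
        "cursor_id": cursor[1] if cursor else None,
        "page_size": page_size,
    }
    with engine.connect() as conn:
        rows = conn.execute(query, params).fetchall()
    next_cursor = (rows[-1].created_at, rows[-1].id) if rows else None
    return rows, next_cursor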

Redis Cache Configuration

Cache Architecture Strategy

  • Session Management
  • Query Result Caching
  • Real-time Features
  • Rate Limiting

User Session Storage

Session Data Structure:
// Key: session:{session_token}
// TTL: 2 weeks (configurable)
{
  "user_id": "uuid",
  "username": "string",
  "email": "string",
  "subscription_tier": "free|hobbyist|breeder|professional",
  "permissions": ["read", "write", "admin"],
  "preferences": {},
  "last_activity": "2025-01-15T10:30:00Z",
  "device_info": {
    "user_agent": "string",
    "ip_address": "string",
    "device_type": "desktop|mobile|tablet"
  }
}
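A minimal sketch of reading and writing this session structure with redis-py, assuming the key pattern and configurable two-week TTL shown above:
# Sketch only: assumes a redis-py client and the session key pattern shown above.
import json
from datetime import timedelta

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
SESSION_TTL = timedelta(weeks=2)  # configurable default from the structure above

def store_session(session_token: str, session_data: dict) -> None:
    # Write the serialized session and its TTL in a single call
    r.set(f"session:{session_token}", json.dumps(session_data), ex=SESSION_TTL)

def load_session(session_token: str) -> dict | None:
    raw = r.get(f"session:{session_token}")
    return json.loads(raw) if raw else None

def touch_session(session_token: str) -> None:
    # Sliding expiration: refresh the TTL on user activity
    r.expire(f"session:{session_token}", SESSION_TTL)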
Authentication Token Cache:
// Key: auth:token:{token_hash}
// TTL: 1 hour (access tokens), 30 days (refresh tokens)
{
  "user_id": "uuid",
  "token_type": "access|refresh",
  "expires_at": "2025-01-15T11:30:00Z",
  "scopes": ["read", "write"],
  "issued_at": "2025-01-15T10:30:00Z"
}
Login Attempt Tracking:
// Key: login_attempts:{ip_address}
// TTL: 1 hour
{
  "attempts": 3,
  "first_attempt": "2025-01-15T10:25:00Z",
  "last_attempt": "2025-01-15T10:30:00Z",
  "blocked_until": "2025-01-15T11:30:00Z"
}
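A simplified sketch of the same idea using an atomic counter instead of the full JSON structure above; the attempt threshold and one-hour window are assumptions, not established policy:
# Illustrative fixed-window limiter; thresholds are assumptions, not product policy.
import redis

r = redis.Redis(decode_responses=True)
MAX_ATTEMPTS = 5          # assumed lockout threshold
WINDOW_SECONDS = 3600     # 1 hour window, matching the key TTL above

def record_login_attempt(ip_address: str) -> bool:
    """Record a failed attempt and return True if the IP is now blocked."""
    key = f"login_attempts:{ip_address}"
    # INCR is atomic, so concurrent login failures cannot lose counts
    attempts = r.incr(key)
    if attempts == 1:
        # Start the window on the first failure
        r.expire(key, WINDOW_SECONDS)
    return attempts >= MAX_ATTEMPTS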

Redis Performance Configuration

Redis Optimization Settings

Memory Management:
  • Max Memory: 80% of available RAM
  • Eviction Policy: allkeys-lru for cache data
  • Memory Sampling: 5 samples for LRU decisions
  • Compression: Hash and list compression enabled
  • Key Expiration: Active expiry of 10% keys per cycle
Persistence & Availability:
  • RDB Snapshots: Every 15 minutes if at least one key changed
  • AOF: Enabled with everysec fsync policy
  • AOF Rewrite: Auto-rewrite when the AOF has grown 100% since the last rewrite
  • Backup: Daily snapshots to S3
  • Multi-AZ: ElastiCache automatic failover

Backup & Recovery Strategy

Comprehensive Data Protection

  • PostgreSQL Backups
  • Redis Persistence
  • Disaster Recovery

RDS Automated Backup Strategy

Automated Daily Backups:
  • Backup Window: 3:00-4:00 AM EST (low traffic period)
  • Retention Period: 30 days for production, 7 days for development
  • Point-in-Time Recovery: Available within retention period
  • Cross-Region Backup: Automated replication to secondary region
Manual Backup Procedures:
-- Pre-deployment backup (manual RDS snapshot via the AWS CLI)
aws rds create-db-snapshot --db-instance-identifier {db_instance_id} --db-snapshot-identifier reptidex-pre-deploy-{timestamp}

-- Monthly full export for archival
pg_dump -h {rds_endpoint} -U {username} -f reptidex_full_{date}.sql reptidex

-- Schema-only backup for development
pg_dump --schema-only -h {rds_endpoint} -U {username} reptidex > schema_{date}.sql
Critical Data Export:
  • User Data: Weekly export of user accounts and profiles
  • Transaction Data: Daily export of commerce transactions
  • Animal Records: Weekly export of breeding and lineage data
  • System Configuration: Monthly export of settings and configurations
Backup Verification:
  • Automated Testing: Weekly restore tests to staging environment
  • Data Integrity: Checksums and row count validation
  • Recovery Time Testing: Monthly full recovery time measurement
  • Cross-Region Validation: Quarterly disaster recovery drills

Backup Monitoring and Alerting

Backup Health Monitoring

Backup Execution:
  • Backup Completion: Success/failure alerts
  • Size Validation: Backup size consistency checks
  • Time Tracking: Backup duration monitoring
  • Storage Usage: Backup storage capacity alerts
Data Integrity:
  • Checksum Validation: Automatic data integrity verification
  • Row Count Verification: Table count consistency checks
  • Foreign Key Validation: Referential integrity testing
  • Sample Data Testing: Random data sampling verification
Recovery Readiness:
  • Restore Testing: Weekly automated restore validation
  • Performance Baseline: Recovery time benchmarking
  • Dependency Verification: External service availability
  • Documentation Updates: Automated procedure updates

Database Scaling Strategy

Horizontal and Vertical Scaling

  • PostgreSQL Scaling
  • Redis Scaling
  • Database Sharding

Read Replica Strategy

Read Replica Configuration:
  • Geographic Distribution: Read replicas in user-dense regions
  • Query Routing: Automatic read/write splitting at application level
  • Lag Monitoring: < 5 second replication lag tolerance
  • Failover Capability: Promote replica to primary if needed
Connection Pooling:
# PgBouncer configuration for connection management
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 100
reserve_pool_size = 10
max_db_connections = 100

# Application connection configuration
DATABASE_CONFIG = {
    'WRITE_DB': 'postgresql://user:pass@primary-endpoint:5432/reptidex',
    'READ_DB': 'postgresql://user:pass@replica-endpoint:5432/reptidex',
    'POOL_SIZE': 20,
    'MAX_OVERFLOW': 50,
    'POOL_TIMEOUT': 30,
    'POOL_RECYCLE': 3600
}
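A hedged sketch of the application-level read/write splitting mentioned above, reusing the DATABASE_CONFIG dictionary from the configuration block; the routing rule (read-only work to the replica, everything else to the primary) is an assumption about the application layer:
# Sketch of application-level read/write splitting using the endpoints above.
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# DATABASE_CONFIG as defined in the configuration block above
write_engine = create_engine(
    DATABASE_CONFIG["WRITE_DB"],
    pool_size=DATABASE_CONFIG["POOL_SIZE"],
    max_overflow=DATABASE_CONFIG["MAX_OVERFLOW"],
    pool_timeout=DATABASE_CONFIG["POOL_TIMEOUT"],
    pool_recycle=DATABASE_CONFIG["POOL_RECYCLE"],
)
read_engine = create_engine(DATABASE_CONFIG["READ_DB"], pool_size=DATABASE_CONFIG["POOL_SIZE"])

WriteSession = sessionmaker(bind=write_engine)
ReadSession = sessionmaker(bind=read_engine)

def get_session(readonly: bool = False):
    """Route read-only units of work to the replica, everything else to the primary."""
    return ReadSession() if readonly else WriteSession()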
Partitioning Strategy:
  • Time-based Partitioning: Care logs and system logs by month
  • Hash Partitioning: Large tables by user_id hash
  • Range Partitioning: Transactions by date ranges
  • Automated Maintenance: Partition creation and cleanup scripts
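The partition-maintenance scripts mentioned above might look roughly like this sketch, which creates next month's partition for a hypothetical monthly-partitioned care_logs table:
# Sketch of monthly partition creation; care_logs and its layout are assumptions.
from datetime import date

from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://user:pass@primary-endpoint:5432/reptidex")

def create_next_month_partition(today: date | None = None) -> None:
    today = today or date.today()
    # First day of next month, and of the month after (the partition's upper bound)
    start = date(today.year + (today.month == 12), today.month % 12 + 1, 1)
    end = date(start.year + (start.month == 12), start.month % 12 + 1, 1)
    partition = f"care_logs_{start:%Y_%m}"
    ddl = text(f"""
        CREATE TABLE IF NOT EXISTS {partition}
        PARTITION OF care_logs
        FOR VALUES FROM ('{start}') TO ('{end}')
    """)
    with engine.begin() as conn:
        conn.execute(ddl)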
Vertical Scaling Thresholds:
  • CPU Utilization: Scale up when >80% for 10 minutes
  • Memory Usage: Scale up when >85% for 5 minutes
  • Connection Count: Scale up when >80% of max connections
  • IOPS Utilization: Scale up when >70% of provisioned IOPS
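As an illustration, the CPU threshold above could be expressed as a CloudWatch alarm created with boto3 that notifies an operations topic (RDS vertical scaling itself remains a manual or scripted instance-class change); the instance identifier and SNS topic ARN are placeholders:
# Sketch: alert when RDS CPU stays above 80% for 10 minutes (two 5-minute periods).
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="reptidex-rds-cpu-high",
    Namespace="AWS/RDS",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "reptidex-primary"}],  # placeholder
    Statistic="Average",
    Period=300,                 # 5-minute datapoints
    EvaluationPeriods=2,        # sustained for 10 minutes
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],  # placeholder topic
)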

Database Maintenance Automation

Automated Maintenance Tasks

PostgreSQL Maintenance:
  • VACUUM: Weekly full vacuum, daily auto-vacuum
  • ANALYZE: Statistics update after significant data changes
  • REINDEX: Monthly index rebuilding for large tables
  • Log Rotation: Daily cleanup of old log files
  • Connection Monitoring: Idle connection cleanup
  • Bloat Detection: Weekly table and index bloat reports
Redis Maintenance:
  • Memory Defragmentation: Weekly active defragmentation
  • Key Expiration: Automated cleanup of expired keys
  • AOF Rewrite: Automatic AOF compaction when needed
  • Snapshot Management: Automated snapshot cleanup
  • Performance Monitoring: Key distribution analysis
  • Cache Hit Rate: Monitoring and optimization alerts
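A minimal sketch of the cache hit-rate check listed above, computed from the counters Redis exposes through INFO stats; the alert threshold is an assumption:
# Sketch: compute cache hit rate from Redis INFO counters and flag low values.
import redis

r = redis.Redis(decode_responses=True)
HIT_RATE_ALERT_THRESHOLD = 0.90  # assumed target, not an established SLO

def cache_hit_rate() -> float:
    stats = r.info("stats")
    hits = stats["keyspace_hits"]
    misses = stats["keyspace_misses"]
    total = hits + misses
    return hits / total if total else 1.0

if cache_hit_rate() < HIT_RATE_ALERT_THRESHOLD:
    print("Cache hit rate below target; review key TTLs and eviction policy")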

This comprehensive database architecture provides ReptiDex with a robust, scalable foundation for data storage and retrieval. The combination of PostgreSQL for transactional data and Redis for high-performance caching ensures optimal performance while maintaining data integrity and providing clear scaling paths for future growth.