
Database Architecture & Management

Detailed database architecture and management strategy for ReptiDex, covering PostgreSQL primary storage, Redis caching, schema design, performance optimization, backup strategies, and scaling approaches.

Database Architecture Overview

Multi-Database Strategy

PostgreSQL Primary

Relational Data Storage
  • User accounts and authentication
  • Animal records and breeding data
  • Vivarium organizations and memberships
  • Transactional commerce data
  • Taxonomy and reference data

Redis Cache Layer

High-Performance Caching
  • Session storage and management
  • Frequently accessed data caching
  • Real-time features and pub/sub
  • Rate limiting and temporary data

Database Technology Decisions

  • PostgreSQL Choice
  • Redis Choice
  • Alternative Considerations

Why PostgreSQL for Primary Storage

Technical Advantages:
  • ACID Compliance: Full transactional integrity for commerce and breeding records
  • JSON Support: Native JSON/JSONB for flexible animal attribute storage
  • Advanced Indexing: GIN, GiST indexes for complex queries and search
  • Full-Text Search: Built-in search capabilities for animal and breeder discovery
Operational Benefits:
  • AWS RDS Integration: Managed service with Multi-AZ, automated backups
  • Mature Ecosystem: Extensive Python/SQLAlchemy support
  • Performance: Excellent performance for read-heavy workloads
  • Scaling Options: Read replicas, connection pooling, partitioning
ReptiDex Specific Features:
  • Hierarchical Data: Support for nested taxonomies and lineage trees
  • Geographic Queries: PostGIS extension for location-based features
  • Array Support: Native array types for tag and morph listings
  • Custom Types: Enum types for species, morphs, and breeding statuses
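A minimal sketch of how these features might map onto a SQLAlchemy model; the animals table, its columns, and the enum values are illustrative assumptions rather than the actual ReptiDex schema:
# Illustrative only: table, column, and enum names are assumptions, not the real schema.
import uuid
from sqlalchemy import Column, Enum, Index, String
from sqlalchemy.dialects.postgresql import ARRAY, JSONB, UUID
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Animal(Base):
    __tablename__ = "animals"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    species = Column(String, nullable=False)
    # Custom enum type for breeding status
    breeding_status = Column(Enum("holdback", "breeder", "pet", name="breeding_status"))
    # Native array type for tag and morph listings
    morphs = Column(ARRAY(String), default=list)
    # Flexible animal attributes stored as JSONB
    attributes = Column(JSONB, default=dict)

    __table_args__ = (
        # GIN index supports containment queries against the JSONB column
        Index("ix_animals_attributes", "attributes", postgresql_using="gin"),
    )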

PostgreSQL Configuration

Database Schema Architecture

Schema Organization by Service

  • repti_core: Users, authentication, profiles, vivariums (orgs)
  • repti_animal: Animals, records, breeding, lineage
  • repti_commerce: Listings, transactions, payments
  • repti_community: Messages, forums, notifications
  • repti_media: File storage, image metadata, processing
  • repti_ops: System logs, metrics, configuration
  • taxonomy: Species, morphs, traits (shared reference)
  • audit: Change tracking, compliance, security logs

Data Relationships and Schema Organization

The database schemas are organized by service boundaries with clear data relationships:

Key Entity Relationships

  • Users → Animals (ownership relationship)
  • Users → Vivariums (organization membership)
  • Vivariums → Animals (organizational grouping)
  • Animals → Breeding Events (genetic tracking)
  • Animals → Listings (marketplace integration)
  • Species/Morphs → Animals (taxonomic classification)
  • Foreign keys enforce referential integrity
  • Shared taxonomy tables across all schemas
  • Audit trails for all transactional data
  • Media references linked to primary entities
  • Geographic data for location-based features
Detailed schema designs are maintained within their respective component documentation to ensure single-source maintenance and prevent synchronization issues. Refer to individual service documentation for complete table structures and relationships.
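As a hedged illustration of the schema-per-service layout and cross-schema foreign keys described above, the core ownership and taxonomy relationships might be declared roughly as follows; table and column names are assumptions, and the authoritative definitions live in the service documentation:
# Illustrative sketch only; actual table definitions live in each service's docs.
from sqlalchemy import Column, ForeignKey, MetaData, String, Table
from sqlalchemy.dialects.postgresql import UUID

metadata = MetaData()

# Users live in the repti_core schema
users = Table(
    "users", metadata,
    Column("id", UUID(as_uuid=True), primary_key=True),
    Column("username", String, nullable=False),
    schema="repti_core",
)

# Shared reference data lives in the taxonomy schema
species = Table(
    "species", metadata,
    Column("id", UUID(as_uuid=True), primary_key=True),
    Column("scientific_name", String, nullable=False),
    schema="taxonomy",
)

# Animals live in repti_animal but reference repti_core.users and taxonomy.species,
# so the database enforces ownership and classification across schema boundaries.
animals = Table(
    "animals", metadata,
    Column("id", UUID(as_uuid=True), primary_key=True),
    Column("owner_id", UUID(as_uuid=True), ForeignKey("repti_core.users.id"), nullable=False),
    Column("species_id", UUID(as_uuid=True), ForeignKey("taxonomy.species.id")),
    schema="repti_animal",
)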

Database Performance Optimization

Performance Tuning Strategy

Indexing Strategy:
  • Primary Keys: Time-ordered (sequentially generated) UUIDs to limit index fragmentation
  • Foreign Keys: All relationships indexed
  • Query Patterns: Composite indexes for common filters
  • JSON Fields: GIN indexes for JSONB columns
  • Full-Text Search: GIN indexes on tsvector columns
Query Optimization:
  • Connection Pooling: PgBouncer for connection management
  • Query Analysis: Regular EXPLAIN ANALYZE reviews
  • Pagination: Cursor-based for large result sets (see the sketch after this list)
  • Aggregations: Materialized views for complex reports
  • Bulk Operations: COPY and batch inserts
Monitoring & Maintenance:
  • Slow Query Log: Track queries >1 second
  • Index Usage: Monitor unused and duplicate indexes
  • VACUUM: Automated maintenance scheduling
  • Statistics: Regular ANALYZE for query planner
  • Bloat Monitoring: Table and index bloat tracking
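The cursor-based pagination item above might be implemented along these lines; the animals table, its owner_id and created_at columns, and the page size are assumptions for illustration:
# Sketch of keyset (cursor-based) pagination; table and column names are assumptions.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://user:pass@primary-endpoint:5432/reptidex")

def fetch_animals_page(owner_id, cursor=None, page_size=50):
    """Return one page of rows plus the cursor for the next page.

    The cursor is the (created_at, id) pair of the last row seen, so each page
    is an indexed range scan instead of an increasingly expensive OFFSET.
    """
    query = text("""
        SELECT id, name, created_at
        FROM animals
        WHERE owner_id = :owner_id
          AND (:cursor_created_at IS NULL
               OR (created_at, id) > (:cursor_created_at, :cursor_id))
        ORDER BY created_at, id
        LIMIT :page_size
    """)
    params = {
        "owner_id": owner_id,
        "cursor_created_at": cursor[0] if cursor else None,
        "cursor_id": cursor[1] if cursor else None,
        "page_size": page_size,
    }
    with engine.connect() as conn:
        rows = conn.execute(query, params).fetchall()
    next_cursor = (rows[-1].created_at, rows[-1].id) if rows else None
    return rows, next_cursor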

Redis Cache Configuration

Cache Architecture Strategy

  • Session Management
  • Query Result Caching
  • Real-time Features
  • Rate Limiting

User Session Storage

Session Data Structure:
// Key: session:{session_token}
// TTL: 2 weeks (configurable)
{
  "user_id": "uuid",
  "username": "string",
  "email": "string",
  "subscription_tier": "free|hobbyist|breeder|professional",
  "permissions": ["read", "write", "admin"],
  "preferences": {},
  "last_activity": "2025-01-15T10:30:00Z",
  "device_info": {
    "user_agent": "string",
    "ip_address": "string",
    "device_type": "desktop|mobile|tablet"
  }
}
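A minimal sketch of reading and writing this session structure with redis-py, assuming the key pattern and configurable two-week TTL shown above:
# Sketch only: assumes a redis-py client and the session key pattern shown above.
import json
from datetime import timedelta

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
SESSION_TTL = timedelta(weeks=2)  # configurable default from the structure above

def store_session(session_token: str, session_data: dict) -> None:
    # Write the serialized session and its TTL in a single call
    r.set(f"session:{session_token}", json.dumps(session_data), ex=SESSION_TTL)

def load_session(session_token: str) -> dict | None:
    raw = r.get(f"session:{session_token}")
    return json.loads(raw) if raw else None

def touch_session(session_token: str) -> None:
    # Sliding expiration: refresh the TTL on user activity
    r.expire(f"session:{session_token}", SESSION_TTL)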
Authentication Token Cache:
// Key: auth:token:{token_hash}
// TTL: 1 hour (access tokens), 30 days (refresh tokens)
{
  "user_id": "uuid",
  "token_type": "access|refresh",
  "expires_at": "2025-01-15T11:30:00Z",
  "scopes": ["read", "write"],
  "issued_at": "2025-01-15T10:30:00Z"
}
Login Attempt Tracking:
// Key: login_attempts:{ip_address}
// TTL: 1 hour
{
  "attempts": 3,
  "first_attempt": "2025-01-15T10:25:00Z",
  "last_attempt": "2025-01-15T10:30:00Z",
  "blocked_until": "2025-01-15T11:30:00Z"
}
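A simplified sketch of the same idea using an atomic counter instead of the full JSON structure above; the attempt threshold and one-hour window are assumptions, not established policy:
# Illustrative fixed-window limiter; thresholds are assumptions, not product policy.
import redis

r = redis.Redis(decode_responses=True)
MAX_ATTEMPTS = 5          # assumed lockout threshold
WINDOW_SECONDS = 3600     # 1 hour window, matching the key TTL above

def record_login_attempt(ip_address: str) -> bool:
    """Record a failed attempt and return True if the IP is now blocked."""
    key = f"login_attempts:{ip_address}"
    # INCR is atomic, so concurrent login failures cannot lose counts
    attempts = r.incr(key)
    if attempts == 1:
        # Start the window on the first failure
        r.expire(key, WINDOW_SECONDS)
    return attempts >= MAX_ATTEMPTS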

Redis Performance Configuration

Redis Optimization Settings

Memory Management:
  • Max Memory: 80% of available RAM
  • Eviction Policy: allkeys-lru for cache data
  • Memory Sampling: 5 samples for LRU decisions
  • Compression: Hash and list compression enabled
  • Key Expiration: Active expiry of 10% keys per cycle
Persistence & Availability:
  • RDB Snapshots: Every 15 minutes if at least one key changed
  • AOF: Enabled with everysec fsync policy
  • AOF Rewrite: Auto-rewrite when the AOF has grown 100% since the last rewrite
  • Backup: Daily snapshots to S3
  • Multi-AZ: ElastiCache automatic failover

Backup & Recovery Strategy

Comprehensive Data Protection

  • PostgreSQL Backups
  • Redis Persistence
  • Disaster Recovery

RDS Automated Backup Strategy

Automated Daily Backups:
  • Backup Window: 3:00-4:00 AM EST (low traffic period)
  • Retention Period: 30 days for production, 7 days for development
  • Point-in-Time Recovery: Available within retention period
  • Cross-Region Backup: Automated replication to secondary region
Manual Backup Procedures:
-- Pre-deployment backup (manual RDS snapshot via the AWS CLI)
aws rds create-db-snapshot --db-instance-identifier {db_instance_id} --db-snapshot-identifier reptidex-pre-deploy-{timestamp}

-- Monthly full export for archival
pg_dump -h {rds_endpoint} -U {username} -f reptidex_full_{date}.sql reptidex

-- Schema-only backup for development
pg_dump --schema-only -h {rds_endpoint} -U {username} reptidex > schema_{date}.sql
Critical Data Export:
  • User Data: Weekly export of user accounts and profiles
  • Transaction Data: Daily export of commerce transactions
  • Animal Records: Weekly export of breeding and lineage data
  • System Configuration: Monthly export of settings and configurations
Backup Verification:
  • Automated Testing: Weekly restore tests to staging environment
  • Data Integrity: Checksums and row count validation
  • Recovery Time Testing: Monthly full recovery time measurement
  • Cross-Region Validation: Quarterly disaster recovery drills

Backup Monitoring and Alerting

Backup Health Monitoring

Backup Execution:
  • Backup Completion: Success/failure alerts
  • Size Validation: Backup size consistency checks
  • Time Tracking: Backup duration monitoring
  • Storage Usage: Backup storage capacity alerts
Data Integrity:
  • Checksum Validation: Automatic data integrity verification
  • Row Count Verification: Table count consistency checks
  • Foreign Key Validation: Referential integrity testing
  • Sample Data Testing: Random data sampling verification
Recovery Readiness:
  • Restore Testing: Weekly automated restore validation
  • Performance Baseline: Recovery time benchmarking
  • Dependency Verification: External service availability
  • Documentation Updates: Automated procedure updates

Database Scaling Strategy

Horizontal and Vertical Scaling

  • PostgreSQL Scaling
  • Redis Scaling
  • Database Sharding

Read Replica Strategy

Read Replica Configuration:
  • Geographic Distribution: Read replicas in user-dense regions
  • Query Routing: Automatic read/write splitting at application level
  • Lag Monitoring: < 5 second replication lag tolerance
  • Failover Capability: Promote replica to primary if needed
Connection Pooling:
# PgBouncer configuration for connection management
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 100
reserve_pool_size = 10
max_db_connections = 100

# Application connection configuration
DATABASE_CONFIG = {
    'WRITE_DB': 'postgresql://user:pass@primary-endpoint:5432/reptidex',
    'READ_DB': 'postgresql://user:pass@replica-endpoint:5432/reptidex',
    'POOL_SIZE': 20,
    'MAX_OVERFLOW': 50,
    'POOL_TIMEOUT': 30,
    'POOL_RECYCLE': 3600
}
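A hedged sketch of the application-level read/write splitting mentioned above, reusing the DATABASE_CONFIG dictionary from the configuration block; the routing rule (read-only work to the replica, everything else to the primary) is an assumption about the application layer:
# Sketch of application-level read/write splitting using the endpoints above.
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# DATABASE_CONFIG as defined in the configuration block above
write_engine = create_engine(
    DATABASE_CONFIG["WRITE_DB"],
    pool_size=DATABASE_CONFIG["POOL_SIZE"],
    max_overflow=DATABASE_CONFIG["MAX_OVERFLOW"],
    pool_timeout=DATABASE_CONFIG["POOL_TIMEOUT"],
    pool_recycle=DATABASE_CONFIG["POOL_RECYCLE"],
)
read_engine = create_engine(DATABASE_CONFIG["READ_DB"], pool_size=DATABASE_CONFIG["POOL_SIZE"])

WriteSession = sessionmaker(bind=write_engine)
ReadSession = sessionmaker(bind=read_engine)

def get_session(readonly: bool = False):
    """Route read-only units of work to the replica, everything else to the primary."""
    return ReadSession() if readonly else WriteSession()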
Partitioning Strategy:
  • Time-based Partitioning: Care logs and system logs by month
  • Hash Partitioning: Large tables by user_id hash
  • Range Partitioning: Transactions by date ranges
  • Automated Maintenance: Partition creation and cleanup scripts
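The partition-maintenance scripts mentioned above might look roughly like this sketch, which creates next month's partition for a hypothetical monthly-partitioned care_logs table:
# Sketch of monthly partition creation; care_logs and its layout are assumptions.
from datetime import date

from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://user:pass@primary-endpoint:5432/reptidex")

def create_next_month_partition(today: date | None = None) -> None:
    today = today or date.today()
    # First day of next month, and of the month after (the partition's upper bound)
    start = date(today.year + (today.month == 12), today.month % 12 + 1, 1)
    end = date(start.year + (start.month == 12), start.month % 12 + 1, 1)
    partition = f"care_logs_{start:%Y_%m}"
    ddl = text(f"""
        CREATE TABLE IF NOT EXISTS {partition}
        PARTITION OF care_logs
        FOR VALUES FROM ('{start}') TO ('{end}')
    """)
    with engine.begin() as conn:
        conn.execute(ddl)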
Vertical Scaling Thresholds:
  • CPU Utilization: Scale up when >80% for 10 minutes
  • Memory Usage: Scale up when >85% for 5 minutes
  • Connection Count: Scale up when >80% of max connections
  • IOPS Utilization: Scale up when >70% of provisioned IOPS
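As an illustration, the CPU threshold above could be expressed as a CloudWatch alarm created with boto3 that notifies an operations topic (RDS vertical scaling itself remains a manual or scripted instance-class change); the instance identifier and SNS topic ARN are placeholders:
# Sketch: alert when RDS CPU stays above 80% for 10 minutes (two 5-minute periods).
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="reptidex-rds-cpu-high",
    Namespace="AWS/RDS",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "reptidex-primary"}],  # placeholder
    Statistic="Average",
    Period=300,                 # 5-minute datapoints
    EvaluationPeriods=2,        # sustained for 10 minutes
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],  # placeholder topic
)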

Database Maintenance Automation

Automated Maintenance Tasks

PostgreSQL Maintenance:
  • VACUUM: Weekly full vacuum, daily auto-vacuum
  • ANALYZE: Statistics update after significant data changes
  • REINDEX: Monthly index rebuilding for large tables
  • Log Rotation: Daily cleanup of old log files
  • Connection Monitoring: Idle connection cleanup
  • Bloat Detection: Weekly table and index bloat reports
Redis Maintenance:
  • Memory Defragmentation: Weekly active defragmentation
  • Key Expiration: Automated cleanup of expired keys
  • AOF Rewrite: Automatic AOF compaction when needed
  • Snapshot Management: Automated snapshot cleanup
  • Performance Monitoring: Key distribution analysis
  • Cache Hit Rate: Monitoring and optimization alerts
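A minimal sketch of the cache hit-rate check listed above, computed from the counters Redis exposes through INFO stats; the alert threshold is an assumption:
# Sketch: compute cache hit rate from Redis INFO counters and flag low values.
import redis

r = redis.Redis(decode_responses=True)
HIT_RATE_ALERT_THRESHOLD = 0.90  # assumed target, not an established SLO

def cache_hit_rate() -> float:
    stats = r.info("stats")
    hits = stats["keyspace_hits"]
    misses = stats["keyspace_misses"]
    total = hits + misses
    return hits / total if total else 1.0

if cache_hit_rate() < HIT_RATE_ALERT_THRESHOLD:
    print("Cache hit rate below target; review key TTLs and eviction policy")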

This comprehensive database architecture provides ReptiDex with a robust, scalable foundation for data storage and retrieval. The combination of PostgreSQL for transactional data and Redis for high-performance caching ensures optimal performance while maintaining data integrity and providing clear scaling paths for future growth.