
Saved LogQL Queries

This document contains commonly used LogQL queries for ReptiDex services. These queries are also available as saved queries in Grafana.

Error Analysis

All Errors from a Service

{service="repti-core"} | json | level="ERROR"

Errors with Grouping by Type

{service="repti-core"}
| json
| level="ERROR"
| line_format "{{.error_type}}: {{.message}}"

Top 10 Error Types

topk(10,
  sum by (error_type) (
    count_over_time(
      {service="repti-core"}
      | json
      | level="ERROR" [24h]
    )
  )
)

Error Rate Over Time

rate(
  {service="repti-core"}
  | json
  | level="ERROR" [5m]
)

Errors by Endpoint

{service=~"repti-.*"}
| json
| level="ERROR"
| endpoint != ""
| line_format "{{.endpoint}}: {{.error_type}}"

New Error Types (Last 24h)

{service=~"repti-.*"}
| json
| level="ERROR"
| __error__=""
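| line_format "{{.timestamp}} [{{.service}}] {{.error_type}}: {{.message}}"

This lists error lines that parsed cleanly over the selected range (set the time picker to the last 24h). To spot types that are actually new, compare the error_type values against an earlier time range.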

Performance Analysis

Slow Requests (>1 second)

{service=~"repti-.*"}
| json
| duration_ms > 1000
| line_format "{{.duration_ms}}ms - {{.method}} {{.endpoint}} ({{.status_code}})"

Top 10 Slowest Endpoints

topk(10,
  avg by (endpoint) (
    avg_over_time(
      {service=~"repti-.*"}
      | json
      | unwrap duration_ms [1h]
    )
  )
)

Request Duration Percentiles

# P50
quantile_over_time(0.50,
  {service="repti-core"}
  | json
  | unwrap duration_ms [5m]
)

# P95
quantile_over_time(0.95,
  {service="repti-core"}
  | json
  | unwrap duration_ms [5m]
)

# P99
quantile_over_time(0.99,
  {service="repti-core"}
  | json
  | unwrap duration_ms [5m]
)
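The percentile queries above return one series per extracted label set. A minimal sketch of a per-endpoint variant follows, assuming request logs carry the `endpoint` field used elsewhere on this page. The `__error__=""` filter drops lines where `duration_ms` could not be converted to a number.

```logql
# P95 latency per endpoint (sketch; grouping label is an assumption)
quantile_over_time(0.95,
  {service="repti-core"}
  | json
  | unwrap duration_ms
  | __error__="" [5m]
) by (endpoint)
```

Grouping with `by (endpoint)` keeps the result to one series per endpoint, which is usually easier to chart than one series per label combination.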

Slow Database Queries

{service=~"repti-.*"}
| json
| message =~ "(?i).*(database|postgres|query).*"
| duration_ms > 1000
| line_format "{{.duration_ms}}ms - {{.message}}"

Database Connection Pool Exhaustion

{service=~"repti-.*"}
| json
| message =~ "(?i).*(connection pool|max connections|pool exhausted).*"
| level =~ "WARN|ERROR"

Cache Miss Patterns

{service=~"repti-.*"}
| json
| message =~ "(?i).*(cache miss|cache not found).*"
| line_format "{{.timestamp}} [{{.service}}] {{.endpoint}}"

Memory Warnings

{service=~"repti-.*"}
| json
| message =~ "(?i).*(memory|oom|out of memory).*"
| level =~ "WARN|ERROR"

Security & Authentication

Failed Authentication Attempts

{service="repti-auth"}
| json
| message =~ "(?i).*(authentication failed|login failed|invalid credentials).*"
| line_format "{{.timestamp}} - {{.user_id}} - {{.source_ip}}"

Authentication Failure Rate

rate(
  {service="repti-auth"}
  | json
  | message =~ "(?i).*authentication failed.*" [5m]
)

Unauthorized Access Attempts (401/403)

{service=~"repti-.*"}
| json
| status_code =~ "401|403"
| line_format "{{.timestamp}} [{{.service}}] {{.method}} {{.endpoint}} - {{.user_id}} - {{.source_ip}}"

Top IPs with Auth Failures

topk(10,
  sum by (source_ip) (
    count_over_time(
      {service="repti-auth"}
      | json
      | message =~ "(?i).*authentication failed.*" [1h]
    )
  )
)

Suspicious Activity (Multiple Failed Attempts)

sum by (source_ip, user_id) (
  count_over_time(
    {service="repti-auth"}
    | json
    | message =~ "(?i).*authentication failed.*" [5m]
  )
) > 5

API Rate Limits Hit

{service="repti-gateway"}
| json
| status_code="429"
| line_format "{{.timestamp}} - {{.endpoint}} - {{.user_id}} - {{.source_ip}}"

Rate Limit Hits by Endpoint

sum by (endpoint) (
  count_over_time(
    {service="repti-gateway"}
    | json
    | status_code="429" [1h]
  )
)

User Activity

User Activity Timeline

{service=~"repti-.*"}
| json
| user_id="<USER_ID_HERE>"
| line_format "{{.timestamp}} [{{.service}}] {{.method}} {{.endpoint}} ({{.status_code}})"

User Actions by Service

sum by (service) (
  count_over_time(
    {service=~"repti-.*"}
    | json
    | user_id="<USER_ID_HERE>" [24h]
  )
)

Active Users in Last Hour

sum by (user_id) (
  count_over_time(
    {service=~"repti-.*"}
    | json
    | user_id != "" [1h]
  )
)

User Session Analysis

{service=~"repti-.*"}
| json
| session_id="<SESSION_ID_HERE>"
| line_format "{{.timestamp}} [{{.service}}] {{.endpoint}} - Duration: {{.duration_ms}}ms"

Database Operations

Database Errors

{service=~"repti-.*"}
| json
| message =~ "(?i).*(database|postgres|connection|query error).*"
| level="ERROR"
| line_format "{{.timestamp}} [{{.service}}] {{.error_type}}: {{.message}}"

Database Connection Errors

{service=~"repti-.*"}
| json
| error_type =~ ".*Connection.*|.*Database.*"
| level="ERROR"

SQL Injection Attempts

{service=~"repti-.*"}
| json
| message =~ "(?i).*(sql injection|drop table|union select|exec|script).*"
| level="WARN"

Transaction Rollbacks

{service=~"repti-.*"}
| json
| message =~ "(?i).*(rollback|transaction failed).*"
| line_format "{{.timestamp}} [{{.service}}] {{.endpoint}}: {{.message}}"

API Monitoring

5xx Server Errors

{service=~"repti-.*"}
| json
| status_code >= 500
| line_format "{{.timestamp}} [{{.service}}] {{.method}} {{.endpoint}} ({{.status_code}})"

4xx Client Errors

{service=~"repti-.*"}
| json
| status_code >= 400 and status_code < 500
| line_format "{{.timestamp}} [{{.service}}] {{.method}} {{.endpoint}} ({{.status_code}})"

Error Rate by Status Code

sum by (status_code) (
  rate(
    {service=~"repti-.*"}
    | json
    | status_code >= 400 [5m]
  )
)

Top Error Endpoints

topk(10,
  sum by (endpoint) (
    count_over_time(
      {service=~"repti-.*"}
      | json
      | status_code >= 500 [1h]
    )
  )
)

Request Volume by Service

sum by (service) (
  rate({service=~"repti-.*"} [5m])
)

Service Health

Service Startup/Shutdown Events

{service=~"repti-.*"}
| json
| message =~ "(?i).*(starting|started|stopping|stopped|shutting down).*"
| line_format "{{.timestamp}} [{{.service}}] {{.message}}"

Unhealthy Service Checks

{service=~"repti-.*"}
| json
| message =~ "(?i).*(health check failed|unhealthy).*"
| level =~ "WARN|ERROR"

Background Job Failures

{service=~"repti-.*"}
| json
| message =~ "(?i).*(job failed|task failed|worker error).*"
| level="ERROR"

Resource Warnings

{service=~"repti-.*"}
| json
| message =~ "(?i).*(high cpu|high memory|disk full|low disk space).*"
| level="WARN"

Debugging Queries

Context View (Logs Around a Specific Time)

{service="repti-core"}
| json
| line_format "{{.timestamp}} [{{.level}}] {{.message}}"

Then, in Grafana, click a log line and select “Show context” to see the logs before and after it.

Live Tail (Real-time Logs)

{service="repti-core"} | json

Click the “Live” button in Grafana Explore to stream logs in real time.

Full Request/Response Debug

{service="repti-core"}
| json
| request_id="<REQUEST_ID>"
| line_format "{{.timestamp}} [{{.level}}] {{.message}}\nDetails: {{.details}}"

Unique Error Messages

sum by (message) (
  count_over_time(
    {service="repti-core"}
    | json
    | level="ERROR" [24h]
  )
)

Correlation Queries

Errors with Slow Requests

{service="repti-core"}
| json
| duration_ms > 2000 or level="ERROR"
| line_format "{{.duration_ms}}ms [{{.level}}] {{.method}} {{.endpoint}} - {{.message}}"

Failed Requests with User Context

{service=~"repti-.*"}
| json
| status_code >= 400
| user_id != ""
| line_format "User: {{.user_id}} - {{.method}} {{.endpoint}} ({{.status_code}}) - {{.message}}"

Variables for Dashboards

When using these queries in Grafana dashboards, use these variables:

$service     # Service name (e.g., repti-core)
$environment # Environment (dev, staging, prod)
$user_id     # User ID for filtering
$endpoint    # API endpoint
$level       # Log level (DEBUG, INFO, WARN, ERROR)
$timerange   # Time range (e.g., [5m], [1h], [24h])

Example with variables:

{service="$service", environment="$environment"}
| json
| level="$level"
| user_id="$user_id"
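The same variables carry over to metric queries. The sketch below counts matching log lines per endpoint and assumes, as in the example above, that streams carry an `environment` label. It uses Grafana's built-in `$__interval` variable for the range rather than the custom `$timerange`, so the window follows the panel's resolution.

```logql
# Lines per endpoint for the selected service, environment and level (sketch)
sum by (endpoint) (
  count_over_time(
    {service="$service", environment="$environment"}
    | json
    | level="$level" [$__interval]
  )
)
```

If a fixed window is preferred, replace `$__interval` with a literal range such as `5m`.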

Query Tips

  1. Start broad, filter narrow: Begin with {service="repti-core"} then add filters
  2. Parse before filtering on fields: | json extracts fields for filtering; put plain line filters such as |= "text" ahead of it to reduce parsing work
  3. Format output for readability: | line_format "{{.field1}}: {{.field2}}"
  4. Limit results: LogQL has no limit stage; cap large result sets with the Line limit query option in Grafana (or --limit in logcli)
  5. Use aggregation functions: count_over_time(), rate(), sum(), avg()
  6. Regex tips:
    • =~ for regex match (on extracted fields the pattern is fully anchored; wrap it in .* to match a substring)
    • !~ for regex not match
    • (?i) for case-insensitive
  7. Time ranges: Metric queries need a range such as [5m] or [1h]; log queries take their range from the Grafana time picker
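
The tips above can be combined in a single log query. This is a minimal sketch: the `/api/` endpoint prefix is a hypothetical value for illustration, and the line filter assumes the raw JSON line contains the text ERROR.

```logql
{service="repti-core"}          # start broad with the stream selector
  |= "ERROR"                    # cheap line filter before parsing
  | json                        # extract fields
  | level="ERROR"               # exact match on an extracted field
  | endpoint =~ "/api/.*"       # anchored regex, hence the trailing .*
  | line_format "{{.timestamp}} {{.endpoint}}: {{.message}}"
```

Placing the line filter ahead of `| json` means only lines that already contain the text are parsed, which tends to matter most on high-volume streams.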

Next Steps