Docs are not available for mobile use. Please use a desktop computer to view the documentation.
Ingestion Runs
Data ingestion job tracking and processing metadata
Purpose
The ingestion_runs
table tracks data ingestion operations for individual signals. It provides detailed metadata about data loading processes, including timing, volumes, file paths, and error handling for monitoring and debugging data pipeline operations.
Schema Definition
Field | Type | Description | Constraints |
---|---|---|---|
id | uuid | Unique ingestion run identifier | Primary key, auto-generated |
signalId | uuid | Signal being ingested | Foreign key to signals.id |
status | ingestion_status | Current ingestion status | Default: ‘pending’ |
startedAt | timestamptz | When ingestion began | Optional |
completedAt | timestamptz | When ingestion finished | Optional |
recordsAdded | integer | Number of records successfully added | Default: 0 |
minioPath | varchar | Path to source data file in storage | Optional |
errorMessage | text | Error details if ingestion failed | Optional |
createdAt | timestamptz | Record creation timestamp | Auto-generated |
updatedAt | timestamptz | Last update timestamp | Auto-updated |
Relationships
References:
signals.id
viasignalId
- Signal being ingested (cascade delete)
Indexes
ingestion_runs_signal_id_idx
- B-tree index on signalId for signal-specific queriesingestion_runs_status_idx
- B-tree index on status for filtering by status
Enumeration Values
Ingestion Status
pending
- Ingestion is scheduled but not startedrunning
- Ingestion is currently in progresscompleted
- Ingestion finished successfullyfailed
- Ingestion encountered errors and failedcancelled
- Ingestion was cancelled before completion
Usage Examples
Get recent ingestion runs for a signal
SELECT status, startedAt, completedAt, recordsAdded, errorMessage
FROM ingestion_runs
WHERE signalId = 'signal-uuid'
ORDER BY createdAt DESC
LIMIT 10;
Monitor active ingestions
SELECT
ir.id,
s.signalName,
s.displayName,
ir.status,
ir.startedAt,
EXTRACT(EPOCH FROM (NOW() - ir.startedAt))/60 as elapsed_minutes,
ir.recordsAdded
FROM ingestion_runs ir
JOIN signals s ON ir.signalId = s.id
WHERE ir.status = 'running'
ORDER BY ir.startedAt;
Analyze ingestion performance
SELECT
s.signalName,
COUNT(*) as total_runs,
COUNT(*) FILTER (WHERE ir.status = 'completed') as successful_runs,
COUNT(*) FILTER (WHERE ir.status = 'failed') as failed_runs,
AVG(ir.recordsAdded) FILTER (WHERE ir.status = 'completed') as avg_records,
AVG(EXTRACT(EPOCH FROM (ir.completedAt - ir.startedAt))/60) as avg_duration_minutes
FROM ingestion_runs ir
JOIN signals s ON ir.signalId = s.id
WHERE ir.createdAt >= NOW() - INTERVAL '30 days'
GROUP BY s.signalName
ORDER BY total_runs DESC;
Find failed ingestions for troubleshooting
SELECT
s.signalName,
s.displayName,
ir.startedAt,
ir.recordsAdded,
ir.errorMessage,
ir.minioPath
FROM ingestion_runs ir
JOIN signals s ON ir.signalId = s.id
WHERE ir.status = 'failed'
AND ir.createdAt >= NOW() - INTERVAL '7 days'
ORDER BY ir.startedAt DESC;
Get ingestion volume statistics
SELECT
DATE_TRUNC('day', ir.completedAt) as day,
COUNT(*) as completed_runs,
SUM(ir.recordsAdded) as total_records,
AVG(ir.recordsAdded) as avg_records_per_run
FROM ingestion_runs ir
WHERE ir.status = 'completed'
AND ir.completedAt >= NOW() - INTERVAL '30 days'
GROUP BY day
ORDER BY day;
Find long-running ingestions
SELECT
s.signalName,
ir.startedAt,
ir.status,
EXTRACT(EPOCH FROM (COALESCE(ir.completedAt, NOW()) - ir.startedAt))/60 as duration_minutes,
ir.recordsAdded
FROM ingestion_runs ir
JOIN signals s ON ir.signalId = s.id
WHERE ir.startedAt IS NOT NULL
AND (
ir.status = 'running' OR
EXTRACT(EPOCH FROM (ir.completedAt - ir.startedAt)) > 3600 -- > 1 hour
)
ORDER BY duration_minutes DESC;
Check ingestion health for a user’s signals
SELECT
s.signalName,
s.displayName,
s.lastSuccessfulIngestionAt,
COUNT(ir.id) as recent_runs,
COUNT(ir.id) FILTER (WHERE ir.status = 'completed') as successful_runs,
MAX(ir.completedAt) as last_completed
FROM signals s
LEFT JOIN ingestion_runs ir ON s.id = ir.signalId
AND ir.createdAt >= NOW() - INTERVAL '24 hours'
WHERE s.userId = 'user-uuid'
AND s.status = 'active'
GROUP BY s.id, s.signalName, s.displayName, s.lastSuccessfulIngestionAt
ORDER BY s.signalName;