# Telomere Documentation

> Lifetime as a Service

Telomere is a developer-focused platform that provides lifecycle management as a service. It helps you track and manage any process, resource, or object that has a defined lifetime.

Telomere is organized around two core concepts: lifecycles and runs. You define lifecycles as templates for the things you'd like to track. A run is an instance of a lifecycle that will be monitored by Telomere. All phases of a run can trigger webhooks to integrate with your infrastructure, and to implement things like automated handling of failures. Email alerts are also available.

Every run must be marked completed or failed via an API call; otherwise Telomere automatically marks it as failed due to timeout once its defined lifetime has elapsed.

Whether you're tracking cron jobs, monitoring workflow completion, managing API timeouts, handling resource cleanup, or tracking session lifecycles, Telomere provides reliable infrastructure for managing lifecycles without building custom timeout logic into every service.

This document is the canonical, machine-readable version of the Telomere docs. It is served as plain markdown at `https://telomere.modulecollective.com/llms.txt` and rendered for humans at `https://telomere.modulecollective.com/docs`.

## Quick Start

This example shows you how to monitor a database backup process with Telomere. The workflow involves creating a lifecycle (a template for monitoring), starting a run when your process begins, and reporting completion when it finishes. If your process doesn't complete within the configured timeout (1 hour in this example), Telomere can trigger webhooks or email alerts to notify you of potential issues.

### 1. Sign up for an account

Visit [Telomere](https://telomere.modulecollective.com) to create your account.

### 2. Create an API key

Navigate to Settings → API Keys in the web UI to create your first API key.

### 3. Set up your environment

```bash
# Export your API key as an environment variable
export TELOMERE_API_KEY="your_api_key_here"
```

### 4. Create a lifecycle for the backup process

This is a one-time setup step that defines a lifecycle for the backup process. It can also be done via the web UI.

```bash
curl -X POST https://telomere.modulecollective.com/api/lifecycles \
  -H "Authorization: Bearer $TELOMERE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "daily-backup",
    "description": "Daily database backup process",
    "defaultTimeoutSeconds": 3600
  }'
```

### 5. Report a new run of the lifecycle

This call belongs at the top of the backup process script.

```bash
RUN_ID=$(curl -X POST https://telomere.modulecollective.com/api/lifecycles/daily-backup/runs \
  -H "Authorization: Bearer $TELOMERE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"tags": {"environment": "production"}}' | jq -r '.id')
```

### 6. Report completion of this lifecycle run

This call belongs at the end of the backup process script. If it doesn't happen within 3600 seconds of the start, Telomere automatically marks the run as failed due to timeout.

```bash
curl -X POST https://telomere.modulecollective.com/api/runs/$RUN_ID/end \
  -H "Authorization: Bearer $TELOMERE_API_KEY"
```

## Authentication

### Creating API Keys

API keys can only be created through the web UI:

1. Log in to Telomere
2. Navigate to Settings → API Keys
3. Click "Create API Key"
4. Give your key a descriptive name
5. Copy the key immediately - it won't be shown again

### Required Header

All API requests require authentication using an API key. Include your API key in the Authorization header of every request:

```http
Authorization: Bearer YOUR_API_KEY
```

## Core Concepts

Telomere is built around two simple building blocks: lifecycles and runs. These can be used in a variety of ways to model many different lifecycle types and scenarios.

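For scripts that are not shell-based, the Quick Start calls and the `Authorization` header described above can be sketched in Python. This is a minimal illustration using only the standard library; the helper names (`build_request`, `send`, `start_run`, `end_run`) are hypothetical and not part of any Telomere SDK, while the endpoints, header, and `id` response field are the ones shown in the curl examples.

```python
import json
import urllib.request

BASE_URL = "https://telomere.modulecollective.com/api"


def build_request(path, api_key, body=None):
    """Build (but do not send) an authenticated POST request."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(BASE_URL + path, data=data, method="POST")
    request.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


def send(request):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def start_run(lifecycle, api_key, tags=None):
    """Start a run for a lifecycle and return the new run's ID."""
    body = {"tags": tags} if tags else {}
    return send(build_request(f"/lifecycles/{lifecycle}/runs", api_key, body))["id"]


def end_run(run_id, api_key):
    """Report successful completion of a run."""
    return send(build_request(f"/runs/{run_id}/end", api_key))
```

A backup script would call `start_run("daily-backup", key, {"environment": "production"})` before the backup and `end_run(run_id, key)` after it, with `key` read from the `TELOMERE_API_KEY` environment variable.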
### Lifecycles

A lifecycle is a template for a specific kind of process or operation you want to monitor. Examples include database migrations, deployments, batch jobs, or scheduled maintenance tasks.

Each lifecycle has:

- **Name:** A unique identifier for the lifecycle (e.g., "database-migration", "data-sync")
- **Description:** Optional human-readable description of what this lifecycle represents
- **Default Timeout:** How long runs can execute before automatic timeout (in seconds)
- **Default Tags:** Key-value pairs for categorization and filtering

### Runs

A run is an execution instance of a lifecycle. Each run tracks:

- **Run ID:** Automatically-generated unique ID for this instance
- **Status:** Current state - running, completed, failed, or timeout
- **Timestamps:** When the run started and ended
- **URL:** Optional link to related resources or dashboards
- **Timeout:** How long runs can execute before automatic timeout (in seconds, inherits from lifecycle)
- **Tags:** Metadata for categorization (inherits from lifecycle)
- **Message & source:** An optional message about the run's outcome, plus a `messageSource` field noting who set it: `user` when the message came from the API caller (in an end or fail request), or `system` when Telomere generated it (for example, on timeout)

Telomere allows you to define teams where admins can add or remove members. Every lifecycle and run belongs to a team, and can be accessed by any member of that team. Billing and usage quotas are also defined per team. A user can only be an admin of a limited number of teams, and every user has at least one "Default" team.

## API Reference

The Telomere API provides RESTful endpoints for managing lifecycles and runs. All endpoints require authentication via API key and are prefixed with `/api`.

Base URL: `https://telomere.modulecollective.com/api`

### Lifecycle Endpoints

#### List Lifecycles

`GET /api/lifecycles`

Retrieve a paginated list of all lifecycles for your team.

Query parameters:

| Name     | Type   | Required | Default | Description                        |
| -------- | ------ | -------- | ------- | ---------------------------------- |
| page     | number | No       | 1       | Page number for pagination         |
| pageSize | number | No       | 10      | Number of items per page (max 100) |

Response example:

```json
{
  "data": [
    {
      "id": "123e4567-e89b-12d3-a456-426614174000",
      "name": "daily-backup",
      "description": "Daily database backup process",
      "defaultTimeoutSeconds": 3600,
      "defaultTags": { "team": "backend" },
      "createdAt": "2024-01-15T10:00:00Z",
      "updatedAt": "2024-01-15T10:00:00Z"
    }
  ],
  "pagination": {
    "page": 1,
    "pageSize": 10,
    "total": 42,
    "totalPages": 5
  }
}
```

#### Create Lifecycle

`POST /api/lifecycles`

Create a new lifecycle to monitor a specific process or operation.

Request body fields:

| Name                  | Type   | Required | Default | Description                                         |
| --------------------- | ------ | -------- | ------- | --------------------------------------------------- |
| name                  | string | Yes      |         | Unique name for the lifecycle                       |
| description           | string | No       |         | Human-readable description of the lifecycle purpose |
| defaultTimeoutSeconds | number | No       | 60      | Default timeout in seconds for all runs             |
| defaultTags           | object | No       |         | Default tags to apply to all runs                   |

Request example:

```json
{
  "name": "database-migration",
  "description": "Track database schema migrations",
  "defaultTimeoutSeconds": 300,
  "defaultTags": { "team": "backend", "critical": "true" }
}
```

Response example:

```json
{
  "id": "123e4567-e89b-12d3-a456-426614174000",
  "name": "database-migration",
  "description": "Track database schema migrations",
  "defaultTimeoutSeconds": 300,
  "defaultTags": { "team": "backend", "critical": "true" },
  "teamId": "987e6543-e89b-12d3-a456-426614174000",
  "createdAt": "2024-01-15T10:00:00Z",
  "updatedAt": "2024-01-15T10:00:00Z"
}
```

#### Get Lifecycle

`GET /api/lifecycles/:idOrName`

Retrieve details of a specific lifecycle by its ID or name.

Path parameters:

| Name     | Type   | Required | Description                   |
| -------- | ------ | -------- | ----------------------------- |
| idOrName | string | Yes      | Lifecycle UUID or unique name |

Response example:

```json
{
  "id": "123e4567-e89b-12d3-a456-426614174000",
  "name": "database-migration",
  "description": "Track database schema migrations",
  "defaultTimeoutSeconds": 300,
  "defaultTags": { "team": "backend" },
  "teamId": "987e6543-e89b-12d3-a456-426614174000",
  "createdAt": "2024-01-15T10:00:00Z",
  "updatedAt": "2024-01-15T10:00:00Z",
  "totalRuns": 29,
  "runningRuns": 0,
  "completedRuns": 6,
  "failedRuns": 1,
  "timeoutRuns": 22,
  "lastRunAt": "2025-07-01T18:18:37.405Z",
  "lastRunId": "009e5d97-8716-4d1a-8a77-eb37d8adec32"
}
```

#### Update Lifecycle

`PATCH /api/lifecycles/:idOrName`

Update an existing lifecycle's configuration. Returns the full GET lifecycle response reflecting the changes.

Path parameters:

| Name     | Type   | Required | Description                   |
| -------- | ------ | -------- | ----------------------------- |
| idOrName | string | Yes      | Lifecycle UUID or unique name |

Request body fields:

| Name                  | Type   | Required | Description             |
| --------------------- | ------ | -------- | ----------------------- |
| name                  | string | No       | Updated name            |
| description           | string | No       | Updated description     |
| defaultTimeoutSeconds | number | No       | Updated default timeout |
| defaultTags           | object | No       | Updated default tags    |

Request example:

```json
{
  "name": "Updated name",
  "description": "Updated description",
  "defaultTimeoutSeconds": 600,
  "defaultTags": { "team": "backend", "priority": "high" }
}
```

#### Delete Lifecycle

`DELETE /api/lifecycles/:idOrName`

Delete a lifecycle and all its associated runs. This action cannot be undone.

Path parameters:

| Name     | Type   | Required | Description                   |
| -------- | ------ | -------- | ----------------------------- |
| idOrName | string | Yes      | Lifecycle UUID or unique name |

Response example:

```json
{
  "message": "Lifecycle deleted successfully"
}
```

#### Start Lifecycle

`POST /api/lifecycles/:idOrName/runs`

Start a new run for a specific lifecycle. The run will be created in "running" status.

Path parameters:

| Name     | Type   | Required | Description                   |
| -------- | ------ | -------- | ----------------------------- |
| idOrName | string | Yes      | Lifecycle UUID or unique name |

Request body fields (all optional):

| Name           | Type   | Required | Description                                       |
| -------------- | ------ | -------- | ------------------------------------------------- |
| timeoutSeconds | number | No       | Override the default timeout for this run         |
| url            | string | No       | URL to related resources (logs, dashboards, etc.) |
| tags           | object | No       | Additional tags for this specific run             |

Request example:

```json
{
  "timeoutSeconds": 300,
  "url": "https://dashboard.example.com/job/123",
  "tags": { "environment": "production", "version": "v1.2.3" }
}
```

Response example:

```json
{
  "id": "abc123-e89b-12d3-a456-426614174000",
  "lifecycleId": "123e4567-e89b-12d3-a456-426614174000",
  "status": "running",
  "startedAt": "2024-01-15T10:00:00Z",
  "endedAt": null,
  "timeoutSeconds": 300,
  "url": "https://dashboard.example.com/job/123",
  "tags": { "environment": "production", "version": "v1.2.3" }
}
```

#### Respawn Lifecycle

`POST /api/lifecycles/:idOrName/respawn`

Atomically completes any running runs and starts a new run. This is ideal for cron jobs and scheduled tasks where you want to ensure only one instance is running at a time, and you want to avoid having to track state for the previous run.

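The respawn semantics can be pictured with a small in-memory model. This is only an illustration of the behaviour described here (resolve any running runs, then start exactly one new run); the `Lifecycle` and `Run` classes are hypothetical and say nothing about how Telomere implements the endpoint. The verb-to-status mapping follows the `previousRunResolution` field documented for this endpoint.

```python
import itertools
from dataclasses import dataclass, field

# Resolution verb -> resulting run status (per the previousRunResolution field).
RESOLUTION_STATUS = {"complete": "completed", "fail": "failed", "timeout": "timeout"}


@dataclass
class Run:
    id: int
    status: str = "running"


@dataclass
class Lifecycle:
    name: str
    runs: list = field(default_factory=list)
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def respawn(self, previous_run_resolution="complete"):
        """Resolve every running run, then start a single new run."""
        status = RESOLUTION_STATUS[previous_run_resolution]
        resolved = [run for run in self.runs if run.status == "running"]
        for run in resolved:
            run.status = status
        new_run = Run(id=next(self._ids))
        self.runs.append(new_run)
        return resolved, new_run
```

Calling `respawn()` repeatedly on the same `Lifecycle` leaves exactly one run in `running` status after each call, which is the property that makes the endpoint convenient for cron-style heartbeats.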
Path parameters:

| Name     | Type   | Required | Description                   |
| -------- | ------ | -------- | ----------------------------- |
| idOrName | string | Yes      | Lifecycle UUID or unique name |

Request body fields (all optional):

| Name                  | Type   | Required | Default  | Description                                                            |
| --------------------- | ------ | -------- | -------- | ---------------------------------------------------------------------- |
| timeoutSeconds        | number | No       |          | Override the default timeout for this run                              |
| url                   | string | No       |          | URL to related resources (logs, dashboards, etc.)                      |
| tags                  | object | No       |          | Override the default tags for this run                                 |
| previousRunResolution | string | No       | complete | How to resolve previous running runs: "complete", "fail", or "timeout" |

The resolution verb sets the resolved run's status: `complete` → `completed`, `fail` → `failed`, `timeout` → `timeout`.

Request example:

```json
{
  "timeoutSeconds": 300,
  "url": "https://dashboard.example.com/job/123",
  "tags": { "environment": "production", "version": "v1.2.3" },
  "previousRunResolution": "complete"
}
```

Response example:

```json
{
  "previousRun": {
    "id": "789xyz-e89b-12d3-a456-426614174000",
    "status": "completed",
    "endedAt": "2024-01-15T10:00:00Z",
    "duration": 300
  },
  "newRun": {
    "id": "abc123-e89b-12d3-a456-426614174000",
    "status": "running",
    "startedAt": "2024-01-15T10:00:01Z",
    "timeoutAt": "2024-01-15T10:05:01Z"
  }
}
```

Simple respawn using all defaults from the lifecycle:

```bash
curl -X POST https://telomere.modulecollective.com/api/lifecycles/daily-backup/respawn \
  -H "Authorization: Bearer $TELOMERE_API_KEY"
```

Complex respawn with custom timeout and tags:

```bash
curl -X POST https://telomere.modulecollective.com/api/lifecycles/daily-backup/respawn \
  -H "Authorization: Bearer $TELOMERE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "timeoutSeconds": 7200,
    "tags": {"date": "'$(date +%Y-%m-%d)'"}
  }'
```

#### Unspawn Lifecycle

`POST /api/lifecycles/:idOrName/unspawn`

Complete all running runs without starting a new one. This is useful for graceful shutdown scenarios where you want to stop all active runs.

Path parameters:

| Name     | Type   | Required | Description                   |
| -------- | ------ | -------- | ----------------------------- |
| idOrName | string | Yes      | Lifecycle UUID or unique name |

Request body fields (all optional):

| Name       | Type   | Required | Default  | Description                                                   |
| ---------- | ------ | -------- | -------- | ------------------------------------------------------------- |
| resolution | string | No       | complete | How to resolve running runs: "complete", "fail", or "timeout" |

The resolution verb sets each ended run's status: `complete` → `completed`, `fail` → `failed`, `timeout` → `timeout`.

Request example:

```json
{
  "resolution": "complete"
}
```

Response example:

```json
{
  "endedRuns": [
    {
      "id": "789xyz-e89b-12d3-a456-426614174000",
      "status": "completed",
      "endedAt": "2024-01-15T10:00:00Z",
      "duration": 300
    },
    {
      "id": "def456-e89b-12d3-a456-426614174000",
      "status": "completed",
      "endedAt": "2024-01-15T10:00:00Z",
      "duration": 150
    }
  ]
}
```

Simple unspawn, marking existing runs complete:

```bash
curl -X POST https://telomere.modulecollective.com/api/lifecycles/worker-heartbeat/unspawn \
  -H "Authorization: Bearer $TELOMERE_API_KEY"
```

Unspawn with failure resolution:

```bash
curl -X POST https://telomere.modulecollective.com/api/lifecycles/worker-heartbeat/unspawn \
  -H "Authorization: Bearer $TELOMERE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"resolution": "fail"}'
```

### Run Endpoints

#### List Runs

`GET /api/runs`

Retrieve a paginated list of all runs across all lifecycles with optional filters.

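As a rough illustration, the filtered list URL can be assembled in Python from the query parameters documented for this endpoint (`page`, `pageSize`, `status`, `lifecycleIdOrName`). The function name `list_runs_url` and its client-side validation are assumptions made for this sketch; the API may validate differently.

```python
from urllib.parse import urlencode

BASE_URL = "https://telomere.modulecollective.com/api"
VALID_STATUSES = {"running", "completed", "failed", "timeout"}


def list_runs_url(status=None, lifecycle=None, page=1, page_size=10):
    """Build the GET /api/runs URL with optional filters."""
    if status is not None and status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    if not 1 <= page_size <= 100:
        raise ValueError("pageSize must be between 1 and 100")
    params = {"page": page, "pageSize": page_size}
    if status is not None:
        params["status"] = status
    if lifecycle is not None:
        params["lifecycleIdOrName"] = lifecycle
    return f"{BASE_URL}/runs?{urlencode(params)}"
```

For example, `list_runs_url(status="timeout", lifecycle="daily-backup")` produces a URL that lists only the timed-out runs of the `daily-backup` lifecycle; the request still needs the `Authorization` header.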
Query parameters:

| Name              | Type   | Required | Default | Description                                           |
| ----------------- | ------ | -------- | ------- | ----------------------------------------------------- |
| page              | number | No       | 1       | Page number for pagination                            |
| pageSize          | number | No       | 10      | Number of items per page (max 100)                    |
| status            | string | No       |         | Filter by status: running, completed, failed, timeout |
| lifecycleIdOrName | string | No       |         | Filter by lifecycle name or ID                        |

Response example:

```json
{
  "data": [
    {
      "id": "abc123-e89b-12d3-a456-426614174000",
      "lifecycleId": "123e4567-e89b-12d3-a456-426614174000",
      "status": "completed",
      "startedAt": "2024-01-15T10:00:00Z",
      "endedAt": "2024-01-15T10:05:00Z",
      "timeoutSeconds": 300,
      "url": "https://ci.example.com/job/123",
      "tags": { "environment": "production", "version": "v1.2.3" },
      "lifecycleName": "database-migration",
      "message": "All migrations complete!",
      "messageSource": "user"
    }
  ],
  "pagination": {
    "page": 1,
    "pageSize": 10,
    "total": 142,
    "totalPages": 15
  }
}
```

#### Get Run

`GET /api/runs/:id`

Retrieve details of a specific run by its ID.

Path parameters:

| Name | Type   | Required | Description |
| ---- | ------ | -------- | ----------- |
| id   | string | Yes      | Run UUID    |

Response example:

```json
{
  "id": "abc123-e89b-12d3-a456-426614174000",
  "lifecycleId": "123e4567-e89b-12d3-a456-426614174000",
  "status": "completed",
  "startedAt": "2024-01-15T10:00:00Z",
  "endedAt": "2024-01-15T10:05:00Z",
  "timeoutSeconds": 300,
  "url": "https://ci.example.com/job/123",
  "tags": { "environment": "production", "version": "v1.2.3" },
  "message": "All migrations complete!",
  "messageSource": "user"
}
```

#### End Run

`POST /api/runs/:id/end`

Mark an in-progress run as completed successfully. Only runs in "running" status can be ended.

Path parameters:

| Name | Type   | Required | Description |
| ---- | ------ | -------- | ----------- |
| id   | string | Yes      | Run UUID    |

Request body fields (all optional):

| Name    | Type   | Required | Description                           |
| ------- | ------ | -------- | ------------------------------------- |
| message | string | No       | Brief message about the completed run |

Request example:

```json
{
  "message": "Run completed successfully!"
}
```

Response example:

```json
{
  "id": "abc123-e89b-12d3-a456-426614174000",
  "lifecycleId": "123e4567-e89b-12d3-a456-426614174000",
  "status": "completed",
  "startedAt": "2024-01-15T10:00:00Z",
  "endedAt": "2024-01-15T10:05:23Z",
  "timeoutSeconds": 300,
  "url": "https://ci.example.com/job/123",
  "tags": { "environment": "production", "version": "v1.2.3" },
  "message": "Run completed successfully!",
  "messageSource": "user"
}
```

#### Fail Run

`POST /api/runs/:id/fail`

Mark an in-progress run as failed. Only runs in "running" status can be failed.

Path parameters:

| Name | Type   | Required | Description |
| ---- | ------ | -------- | ----------- |
| id   | string | Yes      | Run UUID    |

Request body fields (all optional):

| Name    | Type   | Required | Description                        |
| ------- | ------ | -------- | ---------------------------------- |
| message | string | No       | Brief message about the failed run |

Request example:

```json
{
  "message": "Run failed due to error 123!"
}
```

Response example:

```json
{
  "id": "abc123-e89b-12d3-a456-426614174000",
  "lifecycleId": "123e4567-e89b-12d3-a456-426614174000",
  "status": "failed",
  "startedAt": "2024-01-15T10:00:00Z",
  "endedAt": "2024-01-15T10:02:15Z",
  "timeoutSeconds": 300,
  "url": "https://ci.example.com/job/123",
  "tags": { "environment": "production", "version": "v1.2.3" },
  "message": "Run failed due to error 123!",
  "messageSource": "user"
}
```

## Webhooks

Webhooks allow you to receive real-time notifications when events occur in your Telomere lifecycles.
Configure a webhook endpoint to automatically trigger workflows, send alerts, or integrate with other services when runs start, complete, fail, or time out.

### Webhook Configuration

Each team can configure one webhook endpoint through the Telomere web interface. When you create or update a webhook, Telomere generates a unique secret key for request validation.

To configure webhooks:

1. Navigate to your team's Settings page in the Telomere dashboard
2. Click on the "Webhooks" tab
3. Enter your webhook endpoint URL (must use HTTPS)
4. Select which event types you want to receive (or leave empty for all events)
5. Click "Save" to activate the webhook
6. Copy and securely store the generated webhook secret for request validation

Security requirements:

- Webhook URLs must use HTTPS
- URLs cannot point to private IP addresses or localhost
- Webhooks have a 10 second timeout for delivery
- Failed webhooks are retried with exponential backoff

### Webhook Events

You can subscribe to specific events or receive all events by not specifying the `eventTypes` parameter.

Available events:

| Event         | Description                                    | Trigger                                                   |
| ------------- | ---------------------------------------------- | --------------------------------------------------------- |
| run.created   | Fired when a new run is started                | POST /lifecycles/:id/runs or POST /lifecycles/:id/respawn |
| run.completed | Fired when a run completes successfully        | POST /runs/:id/end                                        |
| run.failed    | Fired when a run fails explicitly              | POST /runs/:id/fail                                       |
| run.timeout   | Fired when a run times out                     | Automatic when timeout_seconds expires                    |
| webhook.test  | Test event for verifying webhook configuration | Send Test Webhook button in Settings                      |

### Webhook Payload

All webhook events (except test events) follow the same payload structure:

```json
{
  "id": "evt_ln8x3q9p_k2h7f5md",
  "event": "run.timeout",
  "timestamp": "2024-01-15T10:30:00.000Z",
  "lifecycle": {
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "name": "daily-backup",
    "description": "Daily database backup job",
    "timeout_seconds": 3600
  },
  "run": {
    "id": "660e8400-e29b-41d4-a716-446655440001",
    "status": "timeout",
    "started_at": "2024-01-15T09:30:00.000Z",
    "ended_at": "2024-01-15T10:30:00.000Z",
    "duration_seconds": 3600,
    "url": "https://app.example.com/jobs/123",
    "tags": { "environment": "production", "job_id": "backup-20240115" },
    "message": null
  },
  "team": {
    "id": "770e8400-e29b-41d4-a716-446655440002",
    "name": "Acme Corp"
  }
}
```

Test events have a simpler structure:

```json
{
  "id": "evt_test_k2h7f5md",
  "event": "webhook.test",
  "timestamp": "2024-01-15T10:30:00.000Z",
  "data": {
    "message": "This is a test webhook from Telomere"
  },
  "team": {
    "id": "770e8400-e29b-41d4-a716-446655440002",
    "name": "Acme Corp"
  }
}
```

### Webhook Headers

Telomere includes several headers with each webhook request:

- `X-Telomere-Signature` - HMAC-SHA256 signature for request validation
- `X-Telomere-Event` - The event type (e.g., "run.timeout")
- `X-Telomere-Event-ID` - Unique identifier for this event
- `Content-Type: application/json` - Always JSON payload
- `User-Agent: Telomere/0.0.1` - Telomere webhook delivery agent

### Webhook Validation

Always validate the webhook signature to ensure requests are coming from Telomere. The signature is computed using HMAC-SHA256 with your webhook secret.

```typescript
import { createHmac, timingSafeEqual } from 'crypto';
import express from 'express';

const app = express();

interface WebhookHeaders {
  'x-telomere-signature': string;
  'x-telomere-event': string;
  'x-telomere-event-id': string;
}

/**
 * Validate a webhook request from Telomere
 * @param payload - The raw request body as a string
 * @param signature - The X-Telomere-Signature header value
 * @param secret - Your webhook secret from Telomere
 * @returns true if the signature is valid
 */
function validateWebhook(
  payload: string,
  signature: string,
  secret: string
): boolean {
  // Compute the expected signature
  const expectedSignature = 'sha256=' + createHmac('sha256', secret)
    .update(payload)
    .digest('hex');

  const received = Buffer.from(signature);
  const expected = Buffer.from(expectedSignature);

  // timingSafeEqual throws if the buffers differ in length, so reject those first
  if (received.length !== expected.length) {
    return false;
  }

  // Use timing-safe comparison to prevent timing attacks
  return timingSafeEqual(received, expected);
}

// Express.js example
app.post('/webhooks/telomere', express.raw({ type: 'application/json' }), (req, res) => {
  const payload = req.body.toString();
  const signature = req.headers['x-telomere-signature'];
  const secret = process.env.TELOMERE_WEBHOOK_SECRET!;

  // Reject requests with a missing or invalid signature header
  if (typeof signature !== 'string' || !validateWebhook(payload, signature, secret)) {
    return res.status(401).send('Invalid signature');
  }

  // Parse and process the webhook
  const event = JSON.parse(payload);
  // Test events carry no `run` object, so access it defensively
  console.log(`Received ${event.event} event for run ${event.run?.id ?? 'n/a'}`);

  // Handle different event types
  switch (event.event) {
    case 'run.timeout':
      // Handle timeout - maybe restart the job or alert someone
      break;
    case 'run.failed':
      // Handle failure - maybe send an alert
      break;
    case 'run.completed':
      // Handle success - maybe trigger next step in workflow
      break;
  }

  // Always return 200 OK quickly
  res.status(200).send('OK');
});
```

Important: always validate the webhook signature before processing the payload.
Never trust the request without validation.

### Webhook Reliability

Telomere ensures reliable webhook delivery with the following features:

- **Automatic retries** - Failed webhooks are retried with exponential backoff (up to 3 attempts with a 1 minute base delay)
- **Timeout handling** - Webhooks time out after 10 seconds to prevent hanging deliveries
- **At-least-once delivery** - In rare cases, you may receive the same event multiple times. Use the event ID for deduplication
- **Automatic disabling** - Endpoints are disabled after 10 consecutive failures and you'll receive an email notification
- **Event ordering** - Events are delivered in the order they occur, but retries may cause out-of-order delivery

Your webhook endpoint should:

- Return a 2xx status code quickly (within 10 seconds)
- Process events asynchronously if needed
- Be idempotent - handle duplicate events gracefully
- Not rely on event ordering

### Webhook Testing

After configuring your webhook endpoint, you can test it directly from the Telomere dashboard:

1. Navigate to your team's Settings page
2. Click on the "Webhooks" tab
3. Click the "Send Test Webhook" button
4. Check your webhook endpoint logs to confirm receipt

The test webhook is sent with the `webhook.test` event type and includes a simple message payload to verify connectivity and signature validation.

For local development, you can use tools like [ngrok](https://ngrok.com) to expose your local webhook endpoint with HTTPS, which is required by Telomere.

## Use Cases

Telomere provides universal lifecycle management - from trial periods and ephemeral resources to cron jobs and service health. These examples show real-world patterns you can adapt for any process with a defined lifetime.

### Cron Job Monitoring

Traditional cron monitoring services excel at heartbeat checks but typically can't track whether your job actually succeeded or how long it took.
Telomere solves this by using two lifecycles: one for the heartbeat and another to track the actual job execution with proper timeout and failure handling.

Complete cron monitoring pattern:

```bash
#!/bin/bash
# Add to your crontab: 0 2 * * * /path/to/daily-backup.sh

# 1. Respawn the heartbeat lifecycle (proves cron is running)
# This should have a timeout of 25 hours (24h + 1h grace period)
curl -s -X POST https://telomere.modulecollective.com/api/lifecycles/daily-backup-heartbeat/respawn \
  -H "Authorization: Bearer $TELOMERE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"timeoutSeconds": 90000}' > /dev/null

# 2. Start a run for the actual job
JOB_RUN_ID=$(curl -s -X POST https://telomere.modulecollective.com/api/lifecycles/daily-backup-job/runs \
  -H "Authorization: Bearer $TELOMERE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "timeoutSeconds": 3600,
    "tags": {"date": "'$(date +%Y-%m-%d)'"}
  }' | jq -r '.id')

# 3. Execute the backup
if mysqldump --all-databases > /backups/mysql-$(date +%Y%m%d).sql; then
  # Success - end the job run
  curl -X POST https://telomere.modulecollective.com/api/runs/$JOB_RUN_ID/end \
    -H "Authorization: Bearer $TELOMERE_API_KEY"
  echo "Backup completed successfully"
else
  # Failure - mark the job as failed
  curl -X POST https://telomere.modulecollective.com/api/runs/$JOB_RUN_ID/fail \
    -H "Authorization: Bearer $TELOMERE_API_KEY"
  echo "Backup failed"
  exit 1
fi
```

### Service Health Monitoring

Use Telomere as a dead man's switch for critical services. If a service stops sending heartbeats, you'll be alerted as soon as the heartbeat run times out.

```typescript
// Methods of your worker manager class; `TelomereClient` and `logger` are not defined in this snippet.
// Arrange for these to get called when your service starts/stops:
async startWorkerProcessTracking(): Promise<void> {
  const telomereClient = TelomereClient.getInstance();
  const lifecycleName = 'worker-heartbeat';
  const intervalSeconds = 60; // Respawn every minute
  const timeoutSeconds = intervalSeconds * 2; // 2 minute timeout (allows for some delay)

  // Ensure lifecycle exists
  await telomereClient.ensureLifecycle({
    name: lifecycleName,
    description: 'Heartbeat for worker process to detect crashes',
    defaultTimeoutSeconds: timeoutSeconds,
  });

  // Initial respawn
  try {
    await telomereClient.respawn({
      lifecycleName,
      timeoutSeconds,
      tags: {
        processId: process.pid.toString(),
        workerCount: this.workers.size.toString(),
      },
    });
    logger.info('Worker process heartbeat started');
  } catch (error) {
    logger.error({ error }, 'Failed to start worker process heartbeat');
  }

  // Set up periodic respawn
  this.workerProcessInterval = setInterval(async () => {
    try {
      await telomereClient.respawn({
        lifecycleName,
        timeoutSeconds,
        tags: {
          processId: process.pid.toString(),
          workerCount: this.workers.size.toString(),
          runningCount: this.getRunningCount().toString(),
        },
      });
      logger.debug('Worker process heartbeat respawned');
    } catch (error) {
      logger.error({ error }, 'Failed to respawn worker process heartbeat');
    }
  }, intervalSeconds * 1000);
}

async stopWorkerProcessTracking(): Promise<void> {
  if (this.workerProcessInterval) {
    clearInterval(this.workerProcessInterval);
    this.workerProcessInterval = undefined;

    // Unspawn the worker process heartbeat, completing any running instances
    const telomereClient = TelomereClient.getInstance();
    try {
      await telomereClient.unspawn('worker-heartbeat');
      logger.info('Worker process heartbeat unspawned');
    } catch (error) {
      logger.warn({ error }, 'Failed to unspawn worker process heartbeat');
    }

    logger.info('Worker process heartbeat stopped');
  }
}
```

Set up alert policies on timeout status to get notified when services fail.
This pattern works great for monitoring any process that should remain up and running.

### Trial Period Management

Automatically manage trial periods for your SaaS product. When a user signs up, start a run whose timeout matches your trial duration. Use webhooks to trigger conversion flows, send reminder emails, or downgrade features automatically.

```python
import os
import requests
from datetime import datetime, timezone

API_KEY = os.environ["TELOMERE_API_KEY"]

class TrialManager:
    def __init__(self, api_key, api_url):
        self.api_key = api_key
        self.api_url = api_url
        self.headers = {"Authorization": f"Bearer {api_key}"}

    def start_trial(self, user_id, email, plan="pro", trial_days=14):
        """Start a trial period for a new user"""
        lifecycle_name = f"trial-{user_id}"
        timeout_seconds = trial_days * 24 * 60 * 60  # Convert days to seconds

        # Ensure the trial lifecycle exists first - starting a run against a
        # lifecycle that doesn't exist returns 404. A 409 means it was already
        # created (e.g. the user restarted their trial), which is fine.
        create_response = requests.post(
            f"{self.api_url}/lifecycles",
            headers=self.headers,
            json={
                "name": lifecycle_name,
                "description": f"Trial period for {email}",
                "defaultTimeoutSeconds": timeout_seconds,
            }
        )
        if create_response.status_code not in (200, 409):
            raise Exception(f"Failed to create trial lifecycle: {create_response.text}")

        # Start the trial run against that lifecycle
        response = requests.post(
            f"{self.api_url}/lifecycles/{lifecycle_name}/runs",
            headers=self.headers,
            json={
                "timeoutSeconds": timeout_seconds,
                "tags": {
                    "user_id": user_id,
                    "email": email,
                    "plan": plan,
                    # Timezone-aware UTC timestamp, so the tag is unambiguous
                    "trial_start": datetime.now(timezone.utc).isoformat()
                }
            }
        )

        if response.status_code == 200:
            run_data = response.json()
            # Store the run ID with the user record for later use
            return run_data["id"]
        else:
            raise Exception(f"Failed to start trial: {response.text}")

    def convert_to_paid(self, user_id, run_id):
        """Mark trial as completed when user converts to paid"""
        response = requests.post(
            f"{self.api_url}/runs/{run_id}/end",
            headers=self.headers
        )

        if response.status_code == 200:
            print(f"Trial converted for user {user_id}")
        else:
            print(f"Failed to end trial: {response.text}")

    def handle_trial_expiry_webhook(self, webhook_data):
        """Handle the webhook when a trial expires"""
        tags = webhook_data["run"]["tags"]
        user_id = tags["user_id"]
        email = tags["email"]
        plan = tags["plan"]

        # Trigger your conversion flow
        self.send_trial_expired_email(email)
        self.downgrade_to_free_plan(user_id)
        self.notify_sales_team(user_id, email, plan)

    # Application-specific hooks; replace these stubs with your own logic.
    def send_trial_expired_email(self, email): ...
    def downgrade_to_free_plan(self, user_id): ...
    def notify_sales_team(self, user_id, email, plan): ...

# Usage example
trial_manager = TrialManager(API_KEY, "https://telomere.modulecollective.com/api")

# When a user signs up
run_id = trial_manager.start_trial(
    user_id="user123",
    email="user@example.com",
    plan="pro",
    trial_days=14
)

# When they convert to paid (before trial expires)
trial_manager.convert_to_paid("user123", run_id)
```

### Ephemeral Resource Cleanup

Automatically clean up temporary resources like preview environments, development databases, or temporary file storage. Start a run when a resource is provisioned and let Telomere trigger a cleanup webhook when the run times out. Since webhooks must respond within 10 seconds, queue long cleanup tasks for async processing.

```javascript
const API_URL = 'https://telomere.modulecollective.com/api';
const API_KEY = process.env.TELOMERE_API_KEY;

// provisionEnvironment, queueJob, terminateInstances, deleteDatabase,
// deleteBucket, and addPRComment are application-specific helpers (not shown).

// When creating a preview environment for a pull request.
// Assumes a "preview-env" lifecycle has already been created.
async function createPreviewEnvironment(pullRequest) {
  // 1. Provision the actual resources
  const env = await provisionEnvironment({
    name: `preview-pr-${pullRequest.number}`,
    branch: pullRequest.head.ref,
  });

  // 2. Start a Telomere lifecycle run that will trigger cleanup after 48 hours
  const response = await fetch(`${API_URL}/lifecycles/preview-env/runs`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      timeoutSeconds: 172800, // 48 hours
      tags: {
        env_id: env.id,
        pr_number: pullRequest.number.toString(),
        resources: JSON.stringify({
          instances: env.instances,
          database: env.database,
          bucket: env.bucket,
        }),
      }
    })
  });

  if (!response.ok) {
    throw new Error(`Failed to start lifecycle run: ${response.status}`);
  }

  const run = await response.json();
  return { environment: env, runId: run.id };
}

// Webhook handler
async function handleCleanupWebhook(webhookData) {
  const { env_id, pr_number, resources } = webhookData.run.tags;

  // Queue the cleanup job for async processing
  await queueJob('cleanup-environment', {
    env_id,
    pr_number,
    resources: JSON.parse(resources),
    timestamp: new Date().toISOString()
  });

  // Return immediately for webhook
  return { status: 'cleanup_queued' };
}

// Background job processor handles the actual cleanup
async function processCleanupJob(job) {
  const { env_id, pr_number, resources } = job.data;

  console.log(`Processing cleanup for PR #${pr_number} (environment ${env_id})`);

  try {
    // Clean up all resources
    await Promise.all([
      terminateInstances(resources.instances),
      deleteDatabase(resources.database),
      deleteBucket(resources.bucket),
    ]);

    // Notify on the PR
    await addPRComment(pr_number,
      "✅ Preview environment cleaned up after 48 hours."
    );
  } catch (error) {
    await addPRComment(pr_number,
      "❌ Failed to clean up preview environment. Manual cleanup required."
    );
    throw error;
  }
}

// When PR is merged/closed, end the lifecycle run early
async function cleanupOnPRClose(runId) {
  await fetch(`${API_URL}/runs/${runId}/end`, {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${API_KEY}` }
  });
}
```

This pattern helps control cloud costs by ensuring temporary resources are always cleaned up after the specified timeout.
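The "acknowledge fast, clean up later" idea above can also be sketched in Python with nothing but the standard library. This is a minimal illustration, not an official handler: the `run.tags` payload shape is taken from the example above, while `handle_cleanup_webhook`, `run_worker`, and the in-process `cleanup_jobs` queue are hypothetical names chosen for this sketch. An in-memory queue loses pending jobs on restart, so a production setup would more likely use a durable job queue.

```python
import json
import queue

# Hypothetical in-process job queue; swap for a durable queue in production.
cleanup_jobs = queue.Queue()

def handle_cleanup_webhook(payload, jobs=cleanup_jobs):
    """Enqueue the cleanup job and return at once, well inside the 10-second limit."""
    tags = payload["run"]["tags"]  # payload shape assumed from the example above
    jobs.put({
        "env_id": tags["env_id"],
        "pr_number": tags["pr_number"],
        "resources": json.loads(tags["resources"]),
    })
    return {"status": "cleanup_queued"}

def run_worker(jobs, cleanup):
    """Process queued jobs one at a time until a None sentinel arrives."""
    while True:
        job = jobs.get()
        try:
            if job is None:
                return
            cleanup(job)  # the slow part runs here, outside the webhook request
        except Exception as exc:
            print(f"cleanup failed: {exc}")
        finally:
            jobs.task_done()
```

To use it, start `run_worker` on a background thread (for example `threading.Thread(target=run_worker, args=(cleanup_jobs, my_cleanup), daemon=True).start()`, where `my_cleanup` is your own function) and call `handle_cleanup_webhook` from whatever web framework receives the webhook.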
## Comparison

While many tools handle specific monitoring scenarios, Telomere provides universal lifecycle management that works across all your use cases. Here's how we compare to popular alternatives:

| Product | Approach | Strengths | Primary Focus | How Telomere Differs |
| --- | --- | --- | --- | --- |
| Cronitor | Full-stack monitoring platform (cron jobs, uptime, RUM, infrastructure) | Comprehensive monitoring suite, detailed analytics, extensive integrations | All-in-one monitoring solution | Telomere specializes in lifecycle management with a simpler API focused on timeout-based workflows |
| Healthchecks.io | Ping-based monitoring with timeout alerts and webhooks | Simple setup, generous free tier, mature cron monitoring | Cron job and scheduled task monitoring | Telomere is built for universal process lifecycle management (API calls, user sessions, trials, resources) with automation-first design and unlimited usage at $49/mo vs per-check pricing |
| Temporal | Durable workflow execution with event sourcing | Extremely reliable, supports complex workflows, strong consistency guarantees | Enterprise workflow orchestration | Telomere offers lightweight lifecycle management without infrastructure requirements |

### Key Differentiators

**Track Any Lifecycle:** Unlike monitoring tools focused on infrastructure, Telomere manages lifecycles for anything: trial periods, ephemeral resources, user sessions, data retention, temporary access grants, or traditional cron jobs. One API for all timeouts.
**Proactive Timeout Handling:** While traditional monitoring tools alert after failures, Telomere's timeout-based approach enables proactive automation and remediation through webhooks before problems cascade.

**Simple Pricing:** At $49/month for unlimited usage, Telomere eliminates the complex pricing tiers and per-job costs that make other solutions expensive at scale.

**Developer-First Design:** With a clean REST API and minimal concepts (lifecycles and runs), Telomere integrates into any stack without requiring infrastructure changes or complex SDKs.

### When to Choose Telomere

Choose Telomere when you need:

- Unified lifecycle management across different types of processes
- Proactive timeout handling that triggers automation, not just alerts
- Simple, predictable pricing without per-job or per-check costs
- Quick integration without infrastructure changes
- Flexible timeout management (different timeouts per run)
- Team-based workspaces for organizing monitoring by project or service

## Integrations

Telomere integrates with popular tools and platforms to help you monitor and manage lifecycles across your entire stack. Our growing catalog of integrations makes it easy to incorporate Telomere into your existing workflows.

### Apache Airflow Provider

The Telomere Airflow Provider integrates Apache Airflow with Telomere, allowing you to monitor your DAG executions and task runs with automatic lifecycle tracking.

Source: [telomere-airflow-provider on GitHub](https://github.com/modulecollective/telomere-airflow-provider)

Key features:

- Automatic DAG lifecycle tracking with decorators
- Task-level monitoring for granular visibility
- Custom timeout configuration per DAG or task
- Rich metadata support with tags and descriptions
- Zero-configuration setup with sensible defaults

Installation:

```bash
pip install telomere-airflow-provider
```

### Building Your Own Integration

Telomere's simple API makes it easy to build custom integrations. Check out the [API Reference](#api-reference) to get started, or explore the [Airflow Provider source code](https://github.com/modulecollective/telomere-airflow-provider) as a reference implementation.

Have a specific integration request? [Let us know](mailto:hello@modulecollective.com) what you'd like to see next! If you've built an integration yourself, we'd be happy to add it to the list.
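As a starting point for a custom integration, the start-run and end-run calls from the Quick Start can be wrapped in a small Python context manager. This is a rough sketch under stated assumptions rather than an official client: `tracked_run` and `post_json` are hypothetical names, only the two endpoints shown in the Quick Start are used, and a block that raises simply leaves the run open so that Telomere marks it as failed when the timeout elapses.

```python
import json
import urllib.request
from contextlib import contextmanager

API_URL = "https://telomere.modulecollective.com/api"

def post_json(url, api_key, body=None):
    """POST with the bearer token; return decoded JSON, or None if there is none."""
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if body is not None:
        data = json.dumps(body).encode()
        headers["Content-Type"] = "application/json"
    req = urllib.request.Request(url, data=data, method="POST", headers=headers)
    with urllib.request.urlopen(req, timeout=10) as resp:
        raw = resp.read()
    try:
        return json.loads(raw)
    except ValueError:
        return None

@contextmanager
def tracked_run(lifecycle, api_key, tags=None, post=post_json):
    """Start a run on entry and end it on a clean exit (hypothetical helper)."""
    run = post(f"{API_URL}/lifecycles/{lifecycle}/runs", api_key, {"tags": tags or {}})
    yield run["id"]
    # Reached only when the wrapped block did not raise; otherwise the run
    # stays open and times out on the Telomere side.
    post(f"{API_URL}/runs/{run['id']}/end", api_key)
```

A backup script could then wrap its work in `with tracked_run("daily-backup", api_key, {"environment": "production"}):`. The `post` parameter exists so the transport can be replaced in tests or swapped for another HTTP client.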