
Cognition: AI-Powered Observability That Doesn't Just Watch — It Acts

Skytells announces the general availability of Cognition, an AI-powered production observability SDK for Node.js. Cognition captures every error with full context, monitors runtime health in real time, detects security threats before they escalate, and uses AI to correlate signals into automated actions — with one line of code. Manage everything from the Skytells Console.


April 13, 2026 — Most monitoring tools collect data. They store your logs, aggregate your metrics, and send you alerts when thresholds are crossed. Then they wait for you to figure out what happened, why it happened, and what to do about it.

We built something different.

Cognition is an AI-powered production observability SDK for Node.js that goes beyond data collection. It captures every uncaught exception and unhandled rejection with full stack traces and execution context. It monitors your runtime health — event loop lag, memory pressure, CPU usage, garbage collection behavior — as it happens, not in five-minute aggregates. It scans incoming request payloads for SQL injection, XSS, path traversal, and command injection before they reach your application logic.

And then it does something that no traditional monitoring tool does: it uses AI to correlate all of those signals together, identify root causes, and take action — automatically, with human-in-the-loop approval.

Install it in one line:

npm install @skytells/cognition

Initialize it in three:

import { Cognition } from "@skytells/cognition";

Cognition.init({
  apiKey: process.env.SKYTELLS_API_KEY!,
  projectId: process.env.SKYTELLS_PROJECT_ID!,
});

That's it. AI-powered cognition is live in your production system.

Why Traditional Monitoring Falls Short

There's a fundamental gap in how most monitoring works today.

A traditional APM tool captures an error. It shows you the stack trace. Maybe it groups similar errors together and gives you a count. If you're lucky, it attaches some request metadata. Then it's up to you — the on-call engineer at 2 AM — to correlate that error with your deployment history, your infrastructure metrics, your recent configuration changes, and whatever else might be relevant.

That correlation work is where the real time is spent during incidents. Not in the fix itself — usually the fix is a rollback, a config change, or a restart. The expensive part is the investigation: opening four different tools, cross-referencing timestamps, building a mental model of what happened across multiple subsystems.

Cognition eliminates that gap. It doesn't just collect signals from different layers — it understands how they relate to each other. When error rates spike, Cognition already knows whether the event loop is lagging, whether memory pressure is building, whether a suspicious request pattern preceded the errors, and whether a recent deployment correlates with the change. That context arrives with the alert, not after thirty minutes of manual investigation.

Three Layers Working Together

Cognition's architecture has three distinct layers, each valuable on its own, but designed to operate as a single system.

Layer 1: Error Capture

Every uncaught exception and unhandled promise rejection is captured automatically. No manual try-catch wrapping. No SDK calls scattered through your codebase. Cognition hooks into Node.js process-level handlers and captures:

  • Full stack traces with source-mapped file paths and line numbers
  • Execution context including the request that triggered the error, active middleware, and route information
  • Breadcrumbs — a timeline of console output, HTTP requests, and custom events that preceded the error, giving you the story of what happened before the crash

Console output is automatically captured as breadcrumbs and attached to error events. You can also add custom breadcrumbs with category, message, and arbitrary metadata:

Cognition.addBreadcrumb({
  category: "payment",
  message: "Stripe charge initiated",
  metadata: { amount: 4999, currency: "usd" },
});

When an error occurs, the breadcrumb timeline tells you what the application was doing in the moments leading up to the failure — without you having to reproduce the issue.
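Internally, a breadcrumb store is usually just a bounded buffer that discards the oldest entries as new ones arrive, so memory stays fixed no matter how chatty the application is. Here is a minimal sketch of that pattern (the class and capacity below are illustrative, not Cognition's actual internals):

```typescript
// Minimal breadcrumb ring buffer: keeps only the most recent N entries.
interface Breadcrumb {
  category: string;
  message: string;
  timestamp: number;
  metadata?: Record<string, unknown>;
}

class BreadcrumbBuffer {
  private items: Breadcrumb[] = [];
  constructor(private capacity: number = 100) {}

  add(crumb: Omit<Breadcrumb, "timestamp">): void {
    this.items.push({ ...crumb, timestamp: Date.now() });
    // Drop the oldest entry once the buffer exceeds capacity.
    if (this.items.length > this.capacity) this.items.shift();
  }

  // The snapshot attached to an error event, oldest first.
  snapshot(): Breadcrumb[] {
    return [...this.items];
  }
}

const buffer = new BreadcrumbBuffer(3);
["a", "b", "c", "d"].forEach((m) =>
  buffer.add({ category: "test", message: m })
);
// Only the 3 most recent breadcrumbs remain.
const messages = buffer.snapshot().map((b) => b.message);
```

The bounded size is the important design choice: breadcrumbs are cheap to record precisely because the buffer can never grow without limit.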

Layer 2: Runtime Observer

The runtime observer tracks the health of your Node.js process continuously:

  • Event loop lag — how long callbacks wait before execution. A healthy Node.js process has sub-millisecond lag. When this number climbs, your application is struggling.
  • Memory usage — heap used, heap total, RSS, external memory, array buffer allocations. Track these over time and you'll see memory leaks forming before they cause OOM kills.
  • CPU utilization — process-level CPU consumption, sampled continuously.
  • Garbage collection pressure — frequency and duration of GC pauses. Frequent, long GC pauses indicate memory allocation patterns that are silently degrading your response times.
  • Handle leaks — active handles (timers, sockets, file descriptors) that aren't being cleaned up. These are the kind of slow-burn issues that don't trigger errors but eventually exhaust system resources.
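Several of these metrics are observable with Node's built-in perf_hooks module alone, which is consistent with a zero-dependency SDK. As one example, event loop delay can be sampled like this (this is the standard Node API; the reporting helper is a sketch, not the SDK's implementation):

```typescript
import { monitorEventLoopDelay } from "node:perf_hooks";

// High-resolution event loop delay histogram, sampled every 20 ms.
// All histogram values are reported in nanoseconds.
const histogram = monitorEventLoopDelay({ resolution: 20 });
histogram.enable();

// Read on a reporting interval; converts nanoseconds to milliseconds.
function snapshotLagMs() {
  return {
    mean: histogram.mean / 1e6,
    p99: histogram.percentile(99) / 1e6,
    max: histogram.max / 1e6,
  };
}
```

On an idle process these values stay well under a millisecond; sustained growth in the p99 is the "application is struggling" signal described above.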

This data streams to the Skytells Console in real time. On the console dashboard, you'll see a live telemetry view:

Metric        Example Value
Heap Used     256 MB
Event Loop    4.2 ms
CPU           12.4%
Errors        0
Status        Streaming to dsn.skytells.ai

The runtime observer doesn't just collect numbers. It establishes baselines for your application's normal behavior and flags deviations. If your event loop lag is typically under 5ms and suddenly jumps to 50ms, that's not just a metric change — it's a signal that something has gone wrong, and it gets treated as such.
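Baseline-and-deviation logic of this kind can be approximated with a rolling window: flag a sample when it strays too far from the recent mean. A simplified sketch of the idea (the window size and threshold are illustrative, not Cognition's tuning):

```typescript
// Rolling baseline: a sample is anomalous when it deviates from the
// recent mean by more than `factor` standard deviations.
class Baseline {
  private samples: number[] = [];
  constructor(private window = 60, private factor = 3) {}

  observe(value: number): boolean {
    const anomalous = this.isAnomalous(value);
    this.samples.push(value);
    if (this.samples.length > this.window) this.samples.shift();
    return anomalous;
  }

  private isAnomalous(value: number): boolean {
    if (this.samples.length < 10) return false; // not enough history yet
    const mean = this.samples.reduce((a, b) => a + b, 0) / this.samples.length;
    const variance =
      this.samples.reduce((a, b) => a + (b - mean) ** 2, 0) / this.samples.length;
    const std = Math.sqrt(variance);
    // Floor the std so a perfectly flat baseline doesn't flag tiny jitter.
    return Math.abs(value - mean) > this.factor * Math.max(std, 0.1);
  }
}

// Steady ~4 ms event-loop-lag samples, then a 50 ms spike:
const baseline = new Baseline();
for (let i = 0; i < 30; i++) baseline.observe(4 + (i % 3) * 0.1);
const flagged = baseline.observe(50);
```

The point of learning a baseline rather than using a fixed threshold is exactly the 5 ms-to-50 ms scenario above: 50 ms might be normal for one application and a five-alarm signal for another.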

Layer 3: Security Scanner

Cognition includes a built-in security scanner that inspects incoming request payloads for common attack patterns:

  • SQL injection — parameterized query bypasses, UNION-based injections, boolean-based blind injection patterns
  • Cross-site scripting (XSS) — script tag injection, event handler injection, encoded XSS payloads
  • Path traversal — directory traversal sequences (../, encoded variants) targeting file system operations
  • Command injection — shell metacharacters, command chaining operators, and subshell execution patterns
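Conceptually, a payload scanner reduces to matching request fields against signature patterns. The sketch below illustrates the idea with a handful of deliberately simplified expressions (these are not Cognition's rule set; a production scanner adds decoding passes, richer signatures, and context-aware matching):

```typescript
// Illustrative signatures only, far from exhaustive.
const signatures: { type: string; pattern: RegExp }[] = [
  { type: "sql-injection", pattern: /\bunion\b.+\bselect\b|'\s*or\s+'?1'?\s*=\s*'?1/i },
  { type: "xss", pattern: /<script\b|on\w+\s*=\s*["']/i },
  { type: "path-traversal", pattern: /\.\.[\/\\]|%2e%2e%2f/i },
  { type: "command-injection", pattern: /[;&|]\s*(cat|ls|rm|curl|wget)\b|\$\(/ },
];

// Returns the list of attack types whose signature matched the input.
function scan(input: string): string[] {
  return signatures
    .filter(({ pattern }) => pattern.test(input))
    .map(({ type }) => type);
}

const hits = scan("id=1' OR '1'='1");
const clean = scan("id=42&name=alice");
```

A middleware wrapper then simply runs this over the query string, body, and headers of each request and emits a security event for every hit.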

The scanner works as Express middleware or as a standalone function you can call on any input:

import { securityMiddleware } from "@skytells/cognition";

// As Express middleware — scans every incoming request
app.use(securityMiddleware());

Threats are caught before they reach your application logic. Each detection is logged as a security event with the full request context — source IP, headers, payload, matched attack pattern — and streamed to the console for review.

This isn't a replacement for a WAF or a dedicated security product. It's a defense-in-depth layer that runs inside your application process, close to the code that would be affected. It catches the attacks that reach your app despite your perimeter defenses.

The AI Layer: From Signals to Actions

The three layers above each produce valuable signals independently. Cognition's AI layer is what connects them.

Traditional monitoring systems present you with separate dashboards: errors here, metrics there, security events over there. You're the correlation engine. You're the one who has to notice that the spike in 500 errors started two minutes after a deployment, coincides with a 10x increase in event loop lag, and was preceded by a suspicious burst of requests that look like automated scanning.

Cognition's AI layer does that correlation for you. It receives every error event, every runtime snapshot, and every security detection. It builds a model of normal behavior for your application. And when things deviate from normal, it doesn't just alert — it explains.

An AI-correlated insight looks like this: "Error rate increased 340% at 14:23 UTC. Event loop lag rose from 3ms to 47ms at 14:21 UTC. Deployment dep-abc123 was released at 14:20 UTC. Memory allocation rate doubled after deployment. Probable cause: new code path is creating excessive object allocations under load."

That's not a raw alert. That's a diagnosis.
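The backbone of a diagnosis like that is temporal correlation: line up timestamped signals and ask which ones fall inside a causal window before the anomaly. A toy sketch of the idea (the signal shape and window size are illustrative, not the AI layer's actual model):

```typescript
interface Signal {
  kind: "error-spike" | "deployment" | "lag-spike" | "security-burst";
  at: number; // epoch ms
  detail: string;
}

// Collect every signal that occurred within `windowMs` before the
// anomaly: the candidate contributing causes, in time order.
function correlate(anomaly: Signal, signals: Signal[], windowMs: number): Signal[] {
  return signals
    .filter((s) => s !== anomaly && s.at <= anomaly.at && anomaly.at - s.at <= windowMs)
    .sort((a, b) => a.at - b.at);
}

const t = Date.parse("2026-04-13T14:23:00Z");
const signals: Signal[] = [
  { kind: "deployment", at: t - 180_000, detail: "dep-abc123 released" },
  { kind: "lag-spike", at: t - 120_000, detail: "event loop 3ms to 47ms" },
  { kind: "error-spike", at: t, detail: "error rate +340%" },
  { kind: "deployment", at: t - 3_600_000, detail: "an hour earlier; outside the window" },
];
const spike = signals[2];
const causes = correlate(spike, signals, 5 * 60_000);
```

Real correlation weighs these candidates rather than just listing them, but the time-window filter is the step that turns four dashboards into one ordered story.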

Automated Response with Human-in-the-Loop

Cognition doesn't stop at diagnosis. It can act.

When the AI layer identifies a situation with high confidence — a deployment that's degrading performance, a security threat that's escalating, a resource leak that will eventually crash the process — it can trigger predefined automated actions. Rollback a deployment. Block a suspicious IP range. Restart a process. Scale up capacity.

Every automated action follows a human-in-the-loop model. The system proposes an action, explains its reasoning, and waits for approval unless the severity and confidence thresholds have been configured for automatic execution. You stay in control. The AI handles the analysis and preparation; the human makes the final call.
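In its simplest form, that gate is a policy check over severity and confidence. The sketch below illustrates the decision logic (the field names and thresholds are hypothetical, not Cognition's configuration schema):

```typescript
type Decision = "auto-execute" | "await-approval";

interface ProposedAction {
  action: string;     // e.g. "rollback dep-abc123"
  severity: number;   // 0..1
  confidence: number; // 0..1
}

interface Policy {
  autoSeverity: number;   // minimum severity for automatic execution
  autoConfidence: number; // minimum confidence for automatic execution
}

// Execute automatically only when BOTH thresholds are met;
// everything else is proposed to a human with the reasoning attached.
function decide(proposal: ProposedAction, policy: Policy): Decision {
  return proposal.severity >= policy.autoSeverity &&
    proposal.confidence >= policy.autoConfidence
    ? "auto-execute"
    : "await-approval";
}

const policy: Policy = { autoSeverity: 0.9, autoConfidence: 0.95 };
const routine = decide({ action: "restart worker", severity: 0.6, confidence: 0.8 }, policy);
const critical = decide({ action: "failover region", severity: 0.95, confidence: 0.97 }, policy);
```

Requiring both thresholds, rather than either one, is the conservative default: a high-severity guess and a high-confidence triviality both go to a human.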

This is the difference between monitoring and cognition. Monitoring watches. Cognition understands, reasons, and acts.

Zero Dependencies, Maximum Compatibility

Cognition is built entirely on Node.js built-in modules. No external runtime dependencies.

This matters for several practical reasons:

  • No supply chain risk. Every dependency is a potential vulnerability. Cognition adds zero third-party packages to your dependency tree.
  • No version conflicts. No chance of conflicting with other packages in your application over shared transitive dependencies.
  • Small footprint. The SDK adds minimal overhead to your application's memory and startup time.
  • Full TypeScript support. Ships with complete type declarations — ESM and CJS builds included.

The SDK streams telemetry data to dsn.skytells.ai using lightweight, batched HTTP requests. It's designed to add negligible latency to your application's hot paths.
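The usual shape of such a transport is a buffer that flushes when a batch fills or a timer fires, whichever comes first, so hot paths only pay for an array push. A minimal sketch with the network call injected as a function (the batch size and interval are illustrative, not the SDK's defaults):

```typescript
type SendFn = (batch: object[]) => void;

// Buffer events; flush when the batch reaches `maxSize` or when
// `flushMs` elapses. I/O is injected so the batching logic is testable.
class TelemetryBatcher {
  private queue: object[] = [];
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(
    private send: SendFn,
    private maxSize = 20,
    private flushMs = 5000
  ) {}

  enqueue(event: object): void {
    this.queue.push(event);
    if (this.queue.length >= this.maxSize) {
      this.flush();
    } else if (!this.timer) {
      this.timer = setTimeout(() => this.flush(), this.flushMs);
    }
  }

  flush(): void {
    if (this.timer) { clearTimeout(this.timer); this.timer = null; }
    if (this.queue.length === 0) return;
    const batch = this.queue;
    this.queue = [];
    this.send(batch); // one HTTP request per batch, not per event
  }
}

const sent: object[][] = [];
const batcher = new TelemetryBatcher((b) => sent.push(b), 3, 60_000);
for (let i = 0; i < 7; i++) batcher.enqueue({ seq: i });
batcher.flush(); // drain the remainder, e.g. on process shutdown
```

The explicit final flush matters in practice: without it, events buffered at shutdown would be lost.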

Monitoring from the Skytells Console

Every signal Cognition produces — errors, runtime snapshots, security events, AI-correlated insights — surfaces in the Skytells Console. The console is where you see the complete picture: your applications, your infrastructure, your inference workloads, and your production health — in one place.

Here's what Cognition's console integration looks like in practice:

Real-Time Dashboard

The Cognition dashboard provides a live view of every connected application. Each application card shows current runtime metrics (heap, event loop, CPU), active error count, security threat status, and the AI assessment of overall application health.

When something is wrong, you don't need to click through pages to find it. The dashboard highlights degraded services immediately, with the AI layer's explanation of what's happening and what's likely causing it.

Error Explorer

The error explorer groups errors by type, frequency, and impact. Each error entry includes the full stack trace, breadcrumb timeline, request context, and the AI layer's root cause analysis. You can filter by time range, error type, severity, and deployment.

For recurring errors, Cognition tracks the first occurrence, the most recent occurrence, the affected deployments, and whether the error correlates with specific runtime conditions (high memory, elevated CPU, active security threats).

Security Event Feed

Every security detection is logged with the full request payload, the matched attack pattern, the scanner's confidence level, and the action taken (blocked, logged, or escalated). The feed is filterable by attack type, source, severity, and time range.

For teams operating under compliance requirements (SOC 2, HIPAA, PCI-DSS), the security event feed provides the evidence trail that auditors need — every threat detected, every action taken, timestamped and attributable.

Runtime Health Timeline

The runtime health view displays historical metrics for event loop lag, memory usage, CPU utilization, and GC activity. Overlay deployment markers on the timeline and you can visually identify which deployments affected performance — and by how much.

This view is where you find the slow-burn issues: the memory leak that grows 10MB per hour, the gradually increasing event loop lag that corresponds to database connection pool saturation, the CPU creep from a caching layer that's not expiring entries.
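One simple way to quantify a slow-burn leak is the least-squares slope of heap samples over time: a persistently positive slope is the leak signature. A sketch of that calculation (illustrative, not the console's actual algorithm):

```typescript
// Least-squares slope of (timeMs, heapBytes) samples, in bytes per ms.
function heapGrowthSlope(samples: { timeMs: number; heapBytes: number }[]): number {
  const n = samples.length;
  const meanT = samples.reduce((a, s) => a + s.timeMs, 0) / n;
  const meanH = samples.reduce((a, s) => a + s.heapBytes, 0) / n;
  let num = 0;
  let den = 0;
  for (const s of samples) {
    num += (s.timeMs - meanT) * (s.heapBytes - meanH);
    den += (s.timeMs - meanT) ** 2;
  }
  return num / den;
}

// Simulated leak: one sample every 5 minutes, growing ~10 MB per hour.
const perHour = 10 * 1024 * 1024;
const samples = Array.from({ length: 12 }, (_, i) => ({
  timeMs: (i * 3_600_000) / 12,
  heapBytes: 200 * 1024 * 1024 + (perHour * i) / 12,
}));
const bytesPerMs = heapGrowthSlope(samples);
const mbPerHour = (bytesPerMs * 3_600_000) / (1024 * 1024);
```

Fitting a slope instead of comparing two endpoints makes the estimate robust to the sawtooth that garbage collection puts on raw heap readings.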

AI Insights Panel

The AI insights panel consolidates every correlation and recommendation Cognition has produced for your project. Each insight includes the triggering signals, the analysis, the proposed action, and the outcome (if an action was approved and executed).

Over time, this panel becomes an operational knowledge base — a record of what went wrong, why, and what was done about it. New team members can review past insights to understand the operational characteristics of the system without reading through months of incident reports.

Getting Started: From Zero to Production Monitoring in Five Minutes

Setting up Cognition requires three steps. The entire process takes less than five minutes.

Step 1: Create a Project in the Skytells Console

Go to console.skytells.ai/projects/new and create a new project. Give it a name that matches your application or service.

Once the project is created, you'll land on the project dashboard. Note your Project ID — you'll need it for the SDK initialization.

Step 2: Get Your API Key

Navigate to Project Settings → API Keys in the console sidebar. Generate a new API key. This key authenticates the SDK's telemetry stream.

Keep this key private. It should be stored in your environment variables or a secrets manager — never committed to source control. The API key can be scoped to specific permissions, so a key used for Cognition telemetry doesn't need access to deployment or infrastructure operations.

Step 3: Install and Initialize the SDK

In your Node.js project, install the Cognition SDK:

npm install @skytells/cognition

Then, at the very top of your application's entry file — before any other imports that execute application logic — add the initialization:

import { Cognition } from "@skytells/cognition";

Cognition.init({
  apiKey: process.env.SKYTELLS_API_KEY!,
  projectId: process.env.SKYTELLS_PROJECT_ID!,
});

That's it. From this point forward:

  • Every uncaught exception and unhandled rejection is captured with full context
  • Runtime health metrics stream continuously to the console
  • The AI layer begins building a baseline of your application's normal behavior
  • Breadcrumbs are collected automatically from console output

Within seconds, you'll see your application appear on the Cognition dashboard in the Skytells Console. The real-time telemetry view will show heap usage, event loop latency, CPU utilization, and error count — all live.

Optional: Add the Security Scanner

If your application handles HTTP requests, add the security middleware:

import express from "express";
import { Cognition, securityMiddleware } from "@skytells/cognition";

Cognition.init({
  apiKey: process.env.SKYTELLS_API_KEY!,
  projectId: process.env.SKYTELLS_PROJECT_ID!,
});

const app = express();
app.use(securityMiddleware());

Every incoming request is now scanned for SQL injection, XSS, path traversal, and command injection. Detections stream to the console's security event feed in real time.

Optional: Enable OpenTelemetry Bridge

If your application already uses OpenTelemetry for distributed tracing, Cognition can act as a transport layer — shipping OTel spans through the Cognition pipeline with a single configuration flag. No new instrumentation. No extra plumbing. Your existing tracing setup flows through the same console where errors, runtime metrics, and security events live.

Optional: Event Filtering and PII Redaction

Cognition provides a beforeSend hook that lets you inspect, transform, or drop events before they leave your application:

Cognition.init({
  apiKey: process.env.SKYTELLS_API_KEY!,
  projectId: process.env.SKYTELLS_PROJECT_ID!,
  beforeSend: (event) => {
    // Redact sensitive fields
    if (event.context?.request?.headers?.authorization) {
      event.context.request.headers.authorization = "[REDACTED]";
    }
    // Drop events from health check endpoints
    if (event.context?.request?.url?.includes("/health")) {
      return null;
    }
    return event;
  },
});

This gives you fine-grained control over what data leaves your infrastructure — essential for applications handling PII, financial data, or health records.

Proven in Production: The Gulf Region Incident

On March 5, 2026, a regional connectivity disruption affected multiple cloud providers across the Gulf region. Several major platforms experienced prolonged outages. Skytells' infrastructure maintained continuity for all affected clients.

Cognition was central to that response. Here's what happened:

  • T+0 min — Cognition's runtime observer detected elevated error rates and event loop latency spikes across multiple applications in the affected region. Alerts triggered within seconds — before external status pages updated.
  • T+10 min — The AI layer correlated the signals: error patterns matched regional connectivity failure, not application-level bugs. It isolated the root cause and identified which backend paths were affected.
  • T+15 min — Automated failover triggered. Cognition's analysis powered multi-vendor route failover with zero manual intervention. Traffic rerouted to unaffected paths.
  • T+55 min — Full service restored across all affected applications. Zero data loss. 100% client continuity maintained throughout the incident.

The total detection-to-recovery time was under 60 minutes. More importantly, Cognition detected the anomaly in under 5 minutes — while the teams at other providers were still figuring out that something was wrong.

That's the difference between monitoring that collects data and monitoring that understands what's happening. Cognition's AI layer didn't just flag the error spike. It determined that the errors were infrastructure-related (not application-related), identified the affected network paths, and triggered the appropriate failover response. The humans involved approved the action; the system did the analysis and execution.

CLI Access for Terminal-Native Workflows

For engineers who prefer the terminal, every Cognition capability is accessible through the Skytells CLI:

# Application health overview
skytells cognition overview --project proj-id --hours 24

# List recent errors with full context
skytells cognition errors --limit 20 --json

# Review security events
skytells cognition security --project proj-id

# Detect anomalies
skytells cognition anomalies --project proj-id --limit 10

# Stream real-time events for alerting integrations
skytells cognition events --project proj-id --since last-event-id --json

# Time-series metrics for capacity planning
skytells cognition timeseries --project proj-id --hours 48 --json

Combined with skytells logs --follow in an adjacent terminal pane, an on-call engineer has a complete incident response environment without opening a browser.

What Cognition Replaces

Teams adopting Cognition typically retire or simplify several parts of their existing monitoring stack:

  • Error tracking (Sentry, Bugsnag, Rollbar) — Cognition's error capture with breadcrumbs and AI analysis covers this entirely
  • APM runtime monitoring (New Relic, Datadog APM, Dynatrace) — the runtime observer tracks the same metrics with zero dependencies
  • Security scanning (rate limiters, WAF add-ons) — the built-in scanner adds a defense-in-depth layer at the application level
  • Manual incident correlation — the AI layer does the cross-signal analysis that on-call engineers currently do by hand

This isn't about consolidating for the sake of fewer tools. It's about the operational advantage of having error data, runtime metrics, and security signals processed by a single AI system that understands the relationships between them. Three separate tools can each tell you that something is wrong. One integrated system can tell you why it's wrong and what to do about it.

Pricing and Availability

Cognition follows a freemium model with monthly billing. The free tier includes enough event volume for development and early-stage production workloads. Pricing details are available at skytells.ai/pricing.

The SDK is available on npm:

npm install @skytells/cognition

About Skytells

Skytells has been building AI systems, foundation models, and production infrastructure since 2012. The platform serves over 15,000 organizations across technology, healthcare, finance, and government sectors. Cognition is the latest addition to a product suite that includes the Skytells Console, the Skytells CLI, managed GPU infrastructure, edge computing, and enterprise AI solutions. Learn more at skytells.ai.

Sarah Burton

Product and strategy lead at Skytells, with a background in AI product management and development.
