Features

Media Processing

Cortex can process a broad range of media file types, including images, videos, audio, and documents. The specific analyses performed depend on which nodes are active in a pipeline.

Offline Mode

In offline mode, Cortex requires neither a network connection nor a Loom server. Extracted metadata is stored using extended file attributes (xattr) or in the local meta-path directory (~/.cache/metaloom/cortex/meta). This makes Cortex suitable for air-gapped environments and for pre-processing large archives before connecting to a Loom server.
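The xattr-or-meta-directory strategy can be sketched as below. This is an illustrative sketch only, not Cortex's actual implementation: the `MetaStore` class name, the `cortex.` attribute prefix, and the sidecar-file layout are all assumptions.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.UserDefinedFileAttributeView;

/** Illustrative sketch: store metadata in xattrs, falling back to a sidecar file. */
public class MetaStore {

    private final Path metaDir; // e.g. ~/.cache/metaloom/cortex/meta

    public MetaStore(Path metaDir) {
        this.metaDir = metaDir;
    }

    public void write(Path file, String key, String value) throws IOException {
        UserDefinedFileAttributeView view =
            Files.getFileAttributeView(file, UserDefinedFileAttributeView.class);
        if (view != null) {
            try {
                // Stored under the "user." xattr namespace; the name is illustrative.
                view.write("cortex." + key,
                    ByteBuffer.wrap(value.getBytes(StandardCharsets.UTF_8)));
                return;
            } catch (IOException e) {
                // Filesystem rejected the xattr (e.g. mounted without user_xattr);
                // fall through to the sidecar file in the meta directory.
            }
        }
        Path sidecar = sidecarFor(file, key);
        Files.createDirectories(sidecar.getParent());
        Files.writeString(sidecar, value);
    }

    public String read(Path file, String key) throws IOException {
        UserDefinedFileAttributeView view =
            Files.getFileAttributeView(file, UserDefinedFileAttributeView.class);
        if (view != null) {
            try {
                ByteBuffer buf = ByteBuffer.allocate(view.size("cortex." + key));
                view.read("cortex." + key, buf);
                buf.flip();
                return StandardCharsets.UTF_8.decode(buf).toString();
            } catch (IOException e) {
                // Attribute missing or unsupported; try the sidecar file.
            }
        }
        Path sidecar = sidecarFor(file, key);
        return Files.exists(sidecar) ? Files.readString(sidecar) : null;
    }

    private Path sidecarFor(Path file, String key) {
        // One sidecar file per (asset, key); the layout is illustrative.
        return metaDir.resolve(file.getFileName() + "." + key);
    }
}
```

The fallback keeps offline processing working even on filesystems that reject user xattrs.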

Online Mode

When connected to Loom, Cortex:

  1. Authenticates against the Loom REST API.

  2. Loads active pipeline definitions from Loom.

  3. Processes media assets found in configured source paths.

  4. Sends extracted results (hashes, thumbnails, embeddings, captions, …) back to Loom via the REST API.

  5. Publishes real-time processing events over the pipeline WebSocket.

Asset Hashing

Multiple hash algorithms can be computed in parallel:

  • SHA-256

  • SHA-512

  • MD5

  • Chunk-based content hash

Hashes enable exact deduplication and fast lookup of already-processed assets.
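Computing several digests in parallel can be done in a single pass over the file. The following is a minimal sketch using the JDK's MessageDigest, not Cortex's actual hashing node (and it omits the chunk-based content hash, whose scheme is implementation-specific):

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch: compute several digests over a single read of the stream. */
public class MultiHash {

    public static Map<String, String> hash(InputStream in)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest[] digests = {
            MessageDigest.getInstance("SHA-256"),
            MessageDigest.getInstance("SHA-512"),
            MessageDigest.getInstance("MD5"),
        };
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            for (MessageDigest d : digests) {
                d.update(buf, 0, n); // every algorithm sees the same single read
            }
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (MessageDigest d : digests) {
            result.put(d.getAlgorithm(), hex(d.digest()));
        }
        return result;
    }

    private static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Reading the asset once and feeding every digest keeps I/O cost constant no matter how many algorithms are enabled.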

Video Fingerprinting

The fingerprint node computes a perceptual multi-sector fingerprint for video files using the video4j-fingerprint library. The fingerprint is robust to transcoding, resolution changes, and minor edits. It enables near-duplicate detection at scale via vector similarity search.
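The fingerprint format itself is specific to video4j-fingerprint, but the final similarity-search step typically reduces to comparing fingerprint vectors, for example by cosine similarity. A generic sketch (the class name and the threshold mentioned below are assumptions, not part of the library):

```java
/** Illustrative: cosine similarity between two fingerprint vectors. */
public class Similarity {

    public static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector length mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        // 1.0 for identical direction, 0.0 for orthogonal vectors
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Two videos whose fingerprints score above a tuned threshold (e.g. 0.95, purely illustrative) would be treated as near-duplicates.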

Face Detection

Cortex detects faces in both images and video using the InspireFace library (or dlib/InsightFace in alternative builds). For each detected face it can:

  • Return bounding box coordinates

  • Extract facial embeddings

  • Compute the optimal focal point for cropped thumbnails
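The focal-point step can be illustrated with plain geometry: center a square crop on the detected face and clamp it to the image bounds. This sketch is an assumption about the general technique, not Cortex's actual cropping logic:

```java
import java.awt.Rectangle;

/** Illustrative: center a square thumbnail crop on a detected face. */
public class FocalCrop {

    /** Returns a cropSize x cropSize rectangle centered on the face, clamped to the image. */
    public static Rectangle cropAround(int imageW, int imageH, Rectangle face, int cropSize) {
        int cx = face.x + face.width / 2;  // focal point: the face center
        int cy = face.y + face.height / 2;
        // Clamp so the crop rectangle stays fully inside the image.
        int x = Math.max(0, Math.min(cx - cropSize / 2, imageW - cropSize));
        int y = Math.max(0, Math.min(cy - cropSize / 2, imageH - cropSize));
        return new Rectangle(x, y, cropSize, cropSize);
    }
}
```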

Thumbnail Generation

Cortex generates preview images from video frames and source images. Thumbnails are stored locally and, in online mode, uploaded to Loom as asset binaries.
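Downscaling a frame or source image to a preview can be sketched with the JDK's Graphics2D; the class name and sizing policy below are illustrative, not Cortex's actual thumbnail node:

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

/** Illustrative: scale an image down so its longest side fits maxSize. */
public class Thumbnailer {

    public static BufferedImage scale(BufferedImage src, int maxSize) {
        double ratio = (double) maxSize / Math.max(src.getWidth(), src.getHeight());
        int w = Math.max(1, (int) Math.round(src.getWidth() * ratio));
        int h = Math.max(1, (int) Math.round(src.getHeight() * ratio));
        BufferedImage thumb = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = thumb.createGraphics();
        // Bilinear interpolation gives acceptable quality for previews.
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
            RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.drawImage(src, 0, 0, w, h, null);
        g.dispose();
        return thumb;
    }
}
```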

Metadata Extraction (Tika)

Apache Tika extracts structured and unstructured metadata from any supported file format. This includes EXIF data from images, codec information from videos, and full-text content from documents.
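A minimal Tika extraction looks like the sketch below (it uses Tika's standard AutoDetectParser API and requires the Tika core and parser artifacts on the classpath; the wrapper class is illustrative, not Cortex's actual node):

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;

/** Illustrative: extract metadata and text content with Apache Tika. */
public class TikaExtract {

    public static Metadata extract(Path path, StringBuilder textOut) throws Exception {
        AutoDetectParser parser = new AutoDetectParser();
        BodyContentHandler handler = new BodyContentHandler(); // default 100 KB text limit
        Metadata metadata = new Metadata();
        try (InputStream in = Files.newInputStream(path)) {
            parser.parse(in, handler, metadata, new ParseContext());
        }
        textOut.append(handler.toString());
        return metadata; // e.g. Content-Type, plus EXIF fields for images
    }
}
```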

Scene Detection

The optical-flow scene detector splits videos into scenes, producing a list of cut timestamps. These timestamps can drive downstream workflows such as generating per-scene thumbnails or clips.
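Turning a list of cut timestamps into per-scene intervals is a small transformation, sketched below (class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative: turn a list of cut timestamps into [start, end) scene intervals. */
public class Scenes {

    public static List<double[]> toIntervals(List<Double> cuts, double duration) {
        List<double[]> scenes = new ArrayList<>();
        double start = 0.0;
        for (double cut : cuts) {                     // cuts assumed sorted, in seconds
            scenes.add(new double[] { start, cut });
            start = cut;
        }
        scenes.add(new double[] { start, duration }); // trailing scene up to end of file
        return scenes;
    }
}
```

Each interval can then drive a downstream node, e.g. grabbing one thumbnail per scene.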

OCR

The ocr node uses Tesseract to extract text from images and document pages.

AI Captioning & LLM Enrichment

Cortex integrates with SmolVLM (for captioning) and Ollama (for LLM enrichment) to automatically generate natural-language descriptions of media assets.
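Enrichment against a local Ollama instance could look like the sketch below. The POST /api/generate endpoint and its model/prompt/stream fields come from Ollama's documented REST API, but the surrounding class, the base URL, and how Cortex actually wires this up are assumptions:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Illustrative: ask a local Ollama instance to enrich an asset description. */
public class OllamaEnrich {

    // Builds the JSON body for Ollama's POST /api/generate endpoint.
    public static String buildRequestBody(String model, String prompt) {
        return "{\"model\":\"" + escape(model) + "\","
             + "\"prompt\":\"" + escape(prompt) + "\","
             + "\"stream\":false}";
    }

    public static String generate(String baseUrl, String model, String prompt)
            throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(baseUrl + "/api/generate")) // e.g. http://localhost:11434
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(buildRequestBody(model, prompt)))
            .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body(); // JSON containing the generated text
    }

    // Minimal JSON string escaping; a real implementation would use a JSON library.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```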

Speech-to-Text

The whisper node transcribes audio and video soundtracks using the Whisper model.

Consistency Checking

Cortex can validate that stored metadata (hashes, dimensions, codec info) still matches the file on disk. Inconsistencies are flagged for review or automatic remediation.

Extensibility

Cortex is designed to be extended. Custom nodes can be built by:

  1. Extending AbstractMediaNode<YourOptions>.

  2. Implementing name(), isProcessable(), and compute().

  3. Registering the node via a Dagger module.

See Cortex Examples for a complete working example.

Deployment Flexibility

Cortex can run as:

  • A long-running daemon server (cortex server start)

  • A one-shot batch processor (cortex process run <path>)

  • A Kubernetes CronJob or Job workload

  • A standalone Docker container