Pipeline Nodes

Cortex ships with a set of built-in processing nodes. Each node has a unique name that is used to reference it in CLI --actions flags and pipeline definitions.

Results are stored in the local metadata cache (--meta-path) and optionally forwarded to Loom via the loom node.

Hashing Nodes

sha256

Computes the SHA-256 hash of the media file.

  • Output key: sha256 (string)

  • In online mode, reads the hash from Loom when it is already stored there, which avoids recomputation.
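
The lookup-before-compute behaviour of this node can be sketched roughly as follows. This is an illustration under assumptions, not Cortex's implementation: the `lookup` callable is a hypothetical stand-in for a query against Loom, and both function names are invented for the example.

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional


def sha256_of(path: Path, block_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large media never sits fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        while block := handle.read(block_size):
            digest.update(block)
    return digest.hexdigest()


def sha256_cached(path: Path, lookup: Callable[[Path], Optional[str]]) -> str:
    """Return the stored hash if `lookup` finds one, otherwise compute it.

    `lookup` is a hypothetical stand-in for a query against Loom.
    """
    stored = lookup(path)
    return stored if stored is not None else sha256_of(path)
```

Passing `lambda _: None` as `lookup` corresponds to running without a Loom connection: the hash is always computed locally.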

sha512

Computes the SHA-512 hash of the media file.

  • Output key: sha512 (string)

md5

Computes the MD5 hash of the media file.

  • Output key: md5 (string)
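
When sha256, sha512, and md5 are all requested for the same file, the three digests can in principle be produced in a single read. The sketch below shows that idea with Python's standard `hashlib`; the function name is hypothetical, and whether Cortex shares one read between hashing nodes is not stated here.

```python
import hashlib
from pathlib import Path


def file_digests(path: Path, algorithms=("sha256", "sha512", "md5"),
                 block_size: int = 1 << 20) -> dict:
    """Read the file once and feed every block to each hash object."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with path.open("rb") as handle:
        while block := handle.read(block_size):
            for hasher in hashers.values():
                hasher.update(block)
    # Keys mirror the output keys listed above: sha256, sha512, md5.
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Each value is a lowercase hex string, which matches the "(string)" type given for the output keys.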

chunk-hash

Computes a content-based chunked hash, which is used for near-duplicate detection on partially modified files.

  • Output key: chunk_hash (string)
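
To show why content-based chunking helps with partially modified files, here is a minimal sketch of content-defined chunking. The algorithm Cortex actually uses is not described here, so everything below is an assumption: the rolling byte sum, the `window` and `mask` parameters, and both function names are illustrative only.

```python
import hashlib


def chunk_digests(data: bytes, window: int = 16, mask: int = 0x3F) -> list:
    """Split `data` at content-defined boundaries and hash each chunk.

    A boundary is declared wherever the sum of the last `window` bytes has
    all bits of `mask` set, so boundaries depend on local content rather
    than on absolute offsets.
    """
    digests = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]  # drop the byte that left the window
        if (rolling & mask) == mask:
            digests.append(hashlib.sha256(data[start:i + 1]).hexdigest())
            start = i + 1
    if start < len(data):
        digests.append(hashlib.sha256(data[start:]).hexdigest())
    return digests


def shared_fraction(a: bytes, b: bytes) -> float:
    """Fraction of a's distinct chunks that also occur in b."""
    chunks_a, chunks_b = set(chunk_digests(a)), set(chunk_digests(b))
    return len(chunks_a & chunks_b) / len(chunks_a) if chunks_a else 1.0
```

Because boundaries follow the content, inserting a byte near the start of a file changes only the chunks around the edit. With fixed-size blocks, every later block would shift and hash differently.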

Fingerprinting

fingerprint

Generates a perceptual video fingerprint using the multi-sector algorithm from video4j-fingerprint. The fingerprint enables similarity search and near-duplicate detection for video content.

  • Applicable to: video files

  • Output key: fingerprint (string)

  • Depends on: video4j native library (OpenCV)

Media Metadata

tika

Extracts rich metadata from any file format using Apache Tika. Supports images, video, audio, PDFs, Office documents, and more.

  • Applicable to: all media types

  • Output keys:

    • tika_flags — processing flags

    • tika_content — full extracted text content

Thumbnail Generation

thumbnail

Generates a preview thumbnail image from a video or image file.

  • Applicable to: video, image

  • Output keys:

    • thumbnail_flag — processing status flag

    • thumbnail_path — path to the generated thumbnail file

Scene Detection

scene-detection

Detects scene changes in video files using optical-flow analysis. Returns a list of timestamps where scene cuts occur.

  • Applicable to: video files

  • Output key: scene_detection (string/JSON)

  • Algorithm: OpticalFlowSceneDetector
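
A list of cut timestamps is often easier to consume as scene intervals. The sketch below assumes the scene_detection value is a JSON array of cut times in seconds; the actual schema is not specified here, so treat that format and the function name as hypothetical.

```python
import json


def scenes_from_cuts(cuts_json: str, duration: float) -> list:
    """Turn cut timestamps into (start, end) scene intervals covering the video."""
    # Keep only cuts strictly inside the video and drop repeated timestamps.
    cuts = sorted({t for t in json.loads(cuts_json) if 0.0 < t < duration})
    edges = [0.0, *cuts, duration]
    return list(zip(edges, edges[1:]))
```

For cuts at 4.0 s and 9.5 s in a 12-second video, this yields three scenes: 0.0 to 4.0, 4.0 to 9.5, and 9.5 to 12.0.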

Face Detection

facedetect

Detects faces in images and video frames. Extracts face regions and (optionally) computes face embeddings for recognition.

  • Applicable to: image, video

  • Output: face bounding boxes and, when computed, embedding vectors

  • Supports scanning a full video via VideoFaceScanner

face-description

Generates a textual description of detected faces using a vision-language model.

  • Depends on: facedetect node (upstream)
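
Several nodes on this page declare an upstream dependency (face-description on facedetect, hash-dedup on sha512, fingerprint-dedup on fingerprint). Whether Cortex resolves these automatically is not stated here. The sketch below only shows one way to derive a valid execution order from those declarations, using Python's standard `graphlib`; the `DEPENDS_ON` table and the function name are assumptions made for the example.

```python
from graphlib import TopologicalSorter

# Upstream requirements as stated in the "Depends on" entries of this document.
DEPENDS_ON = {
    "face-description": {"facedetect"},
    "hash-dedup": {"sha512"},
    "fingerprint-dedup": {"fingerprint"},
}


def execution_order(requested: list) -> list:
    """Return the requested nodes plus their upstream nodes, dependencies first."""
    graph = {}
    pending = list(requested)
    while pending:
        node = pending.pop()
        if node not in graph:
            graph[node] = DEPENDS_ON.get(node, set())
            pending.extend(graph[node])
    # static_order() yields every node after all of its predecessors.
    return list(TopologicalSorter(graph).static_order())
```

Requesting only `face-description` returns `facedetect` first, which matches the upstream relationship described above.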

Deduplication

hash-dedup

Detects exact duplicates by comparing SHA-512 hashes against the Loom asset database.

  • Depends on: sha512 node (upstream)
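
Exact-duplicate detection by hash comes down to grouping assets that share a digest. In the minimal sketch below, an in-memory dict stands in for the Loom asset database; that stand-in and the function name are assumptions, not part of Cortex.

```python
import hashlib
from collections import defaultdict


def find_exact_duplicates(assets: dict) -> list:
    """Group asset names whose content has the same SHA-512 digest.

    `assets` maps a name to its raw bytes; an in-memory dict stands in for
    the Loom asset database here.
    """
    by_hash = defaultdict(list)
    for name, content in assets.items():
        by_hash[hashlib.sha512(content).hexdigest()].append(name)
    # Only digests seen more than once indicate duplicates.
    return [sorted(names) for names in by_hash.values() if len(names) > 1]
```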

fingerprint-dedup

Detects near-duplicate videos by comparing fingerprints via vector similarity search.

  • Depends on: fingerprint node (upstream)
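
The idea behind vector similarity search can be sketched with a plain cosine-similarity scan. This is an illustration under assumptions: how the fingerprint string is turned into a vector is not described here, the 0.95 threshold is arbitrary, and a real deployment would likely use a vector index rather than a linear scan.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def near_duplicates(query, index: dict, threshold: float = 0.95) -> list:
    """Return (asset_id, score) pairs at or above `threshold`, best match first."""
    scored = ((asset_id, cosine_similarity(query, vec)) for asset_id, vec in index.items())
    hits = [hit for hit in scored if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```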

OCR

ocr

Extracts text from images and document scans using Tesseract (via Tess4J).

  • Applicable to: image, PDF

  • Output key: ocr_text (string)

Captioning

captioning

Generates a natural-language caption for an image or video frame using the SmolVLM vision-language model.

  • Applicable to: image, video

  • Output key: caption_result (string)

  • Requires: SmolVLM HTTP service (configurable host/port via CaptioningNodeOptions)

LLM Enrichment

llm

Sends asset metadata to a large language model (LLM) via Ollama for enrichment. The default prompt asks the model to classify and describe the asset.

  • Output: LLM-generated JSON response

  • Default model: gemma2:27b

  • Requires: Ollama service
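
For orientation, the sketch below shows what a request to Ollama's `/api/generate` endpoint could look like for this kind of enrichment. The prompt text, the function names, and the use of `localhost:11434` (Ollama's default port) are assumptions for the example; Cortex's actual default prompt and client code are not reproduced here.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint


def build_request(metadata: dict, model: str = "gemma2:27b") -> urllib.request.Request:
    """Build a non-streaming generate request that asks for JSON output."""
    # Illustrative prompt only; not the default prompt shipped with Cortex.
    prompt = (
        "Classify and describe the asset with this metadata. Answer as JSON.\n"
        + json.dumps(metadata, sort_keys=True)
    )
    payload = {"model": model, "prompt": prompt, "stream": False, "format": "json"}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def enrich(metadata: dict) -> dict:
    """Send the request and parse the model's JSON answer (needs a running Ollama)."""
    with urllib.request.urlopen(build_request(metadata)) as reply:
        body = json.load(reply)
    return json.loads(body["response"])
```

Setting `"stream": False` makes Ollama return a single JSON object whose `response` field holds the generated text, which keeps parsing simple.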

Consistency

consistency

Checks that stored metadata is consistent with the actual file on disk. Detects corruption, truncation, and format mismatches.

  • Applicable to: all media types
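
A consistency check of this kind can be sketched as a comparison between stored values and values recomputed from disk. The metadata keys used below ("size" and "sha256") and the function name are hypothetical; the checks Cortex actually performs are not listed here.

```python
import hashlib
from pathlib import Path


def check_consistency(path: Path, stored: dict) -> list:
    """Compare a file on disk with stored metadata and list any mismatches.

    `stored` uses hypothetical keys: "size" (bytes) and "sha256" (hex digest).
    """
    problems = []
    data = path.read_bytes()  # fine for a sketch; stream large files in practice
    if "size" in stored and len(data) != stored["size"]:
        problems.append(f"size mismatch: expected {stored['size']}, found {len(data)}")
    if "sha256" in stored and hashlib.sha256(data).hexdigest() != stored["sha256"]:
        problems.append("sha256 mismatch: content differs from stored hash")
    return problems
```

A truncated file triggers both findings, while silent corruption that preserves the length triggers only the hash mismatch.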

Loom Sync

loom

Forwards all results from upstream nodes to the Loom server via the REST API. This node is what enables "online mode"; without it, Cortex writes only to local xattr/meta-path storage.

  • Requires: Loom connection (--hostname, --port)

Speech-to-Text

whisper

Transcribes audio tracks using the Whisper speech-to-text model.

  • Applicable to: audio, video (audio track)

  • Requires: whisper.cpp or compatible Whisper HTTP service

Quality Assessment

quality

Assesses the perceptual quality of an image or video frame (blur detection, exposure, noise level).

  • Applicable to: image, video

  • Output: quality score
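
As an illustration of blur detection, the sketch below computes the variance of a Laplacian filter response, a widely used sharpness heuristic. It is an assumption that this resembles what the quality node does; the metric Cortex uses is not documented here, and the function name is hypothetical. The input is a plain 2D list of grayscale values so that the example needs no image library.

```python
from statistics import pvariance


def laplacian_variance(gray: list) -> float:
    """Variance of the 4-neighbour Laplacian over the interior pixels.

    `gray` is a 2D list of grayscale intensities; low values suggest blur.
    """
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1] - 4 * gray[y][x]
        for y in range(1, len(gray) - 1)
        for x in range(1, len(gray[0]) - 1)
    ]
    return float(pvariance(responses)) if responses else 0.0
```

Sharp edges produce large positive and negative filter responses and therefore a high variance. A flat or smoothly shaded region produces responses near zero.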