Vector Functions (Similarity, Distance, Norms)
LoraDB has a first-class VECTOR value type
with a compact set of built-in functions for measuring similarity,
computing signed distances under standard metrics, and inspecting
shape. Vector values are constructed with casts. Every similarity / distance
computation is exhaustive when called directly in a query. For a
cataloged vector search surface, use
CREATE VECTOR INDEX with
db.index.vector.queryNodes or queryRelationships; those procedures
currently use flat scan execution over the indexed label/type scope.
All similarity / distance math uses f32 internally: coordinates
are converted to f32 before accumulation, and the scalar result is
then widened back to f64. Every coordinate type takes this same
arithmetic path, so results are consistent across coordinate types
up to the precision that f32 can represent.
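To make the f32 behaviour concrete, here is a minimal host-side sketch in Python that rounds values to f32 with the standard `struct` module. The helper names `to_f32` and `dot_f32` are illustrative only, and the use of an f32 accumulator is an assumption about the engine rather than something stated above.

```python
import struct

def to_f32(x):
    """Round a number to the nearest f32 value, returned as a Python float (f64)."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]

def dot_f32(a, b):
    """Dot product with f32 coordinates and an f32 accumulator (assumed)."""
    acc = 0.0
    for x, y in zip(a, b):
        # Round after every operation to mimic f32 arithmetic.
        acc = to_f32(acc + to_f32(to_f32(x) * to_f32(y)))
    return acc  # a Python float is f64, so this is the widened result

print(to_f32(16777217))               # 16777216.0: 2**24 + 1 does not fit in f32
print(to_f32(0.1) == 0.1)             # False: 0.1 rounds differently in f32 and f64
print(dot_f32([1, 2, 3], [4, 6, 8]))  # 40.0
```

Small integers and short sums come out exact, while large integer coordinates or values such as 0.1 pick up f32 rounding before any accumulation happens.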
Overview
| Goal | Function |
|---|---|
| Construct a vector | [1, 2, 3]::VECTOR<INTEGER>(3) |
| Bounded similarity (higher = closer, in [0, 1]) | vector.similarity(a, b), vector.similarity(a, b, 'euclidean') |
| Signed distance under a named metric (smaller = closer) | vector.distance(a, b, METRIC) |
| Magnitude of a vector | vector.norm(v, METRIC) |
| Dimension | vector.dimension(v), value.size(v) |
| Runtime type tag | type.of(v) |
| Back to a LIST | vector.coordinates(v, INTEGER), vector.coordinates(v, FLOAT) |
vector.similarity also accepts a plain LIST<NUMBER> on either
side — the list is coerced to a FLOAT32 vector of the same length.
vector.distance and vector.norm require real VECTOR values.
Construction
Construct vectors with value::VECTOR<COORD>(DIM) or
CAST(value AS VECTOR<COORD>(DIM)). See Data Types → Vectors →
Construction for the full rules;
the examples below cover the common shapes.
// Integer-backed
RETURN [1, 2, 3]::VECTOR<INTEGER>(3) AS v; // VECTOR<INTEGER>(3)
RETURN [1, 2, 3]::VECTOR<INT8>(3) AS v; // VECTOR<INTEGER8>(3)
RETURN [1, 2, 3]::VECTOR<INT16>(3) AS v; // VECTOR<INTEGER16>(3)
RETURN [1, 2, 3]::VECTOR<INT32>(3) AS v; // VECTOR<INTEGER32>(3)
// Float-backed
RETURN [0.1, 0.2, 0.3]::VECTOR<FLOAT32>(3) AS v; // VECTOR<FLOAT32>(3)
RETURN [0.1, 0.2, 0.3]::VECTOR<FLOAT64>(3) AS v; // VECTOR<FLOAT64>(3)
RETURN [0.1, 0.2, 0.3]::VECTOR<FLOAT>(3) AS v; // FLOAT is an alias for FLOAT64
// CAST(...) form, useful when the value is already parenthesized
RETURN CAST('[1.05, 0.123, 5]' AS VECTOR<FLOAT64>(3)) AS v;
RETURN CAST('[1e-2, 2e-2, 3e-2]' AS VECTOR<FLOAT32>(3)) AS v
Coordinate-type tag forms
The coordinate tag appears inside the VECTOR<...>(...) type. Matching
is case-insensitive and accepts aliases such as FLOAT for FLOAT64
and INT8 for INTEGER8.
RETURN [1, 2, 3]::VECTOR<INTEGER>(3);
RETURN [1, 2, 3]::VECTOR<INT8>(3);
RETURN [1, 2, 3]::VECTOR<FLOAT>(3) // FLOAT aliases FLOAT64
From a parameter
RETURN $embedding::VECTOR<FLOAT32>(384) AS query_vec;
RETURN CAST($embedding AS VECTOR<FLOAT32>(384)) AS query_vec
// Node / TypeScript
await db.execute(
'RETURN $embedding::VECTOR<FLOAT32>(384) AS q',
{ embedding: myFloat32Array },
);
# Python
db.execute(
"RETURN $embedding::VECTOR<FLOAT32>(384) AS q",
{"embedding": embedding_list},
)
Coercion quick reference
// Integers promoted to float-backed vectors (exact for small magnitudes)
RETURN [1, 2, 3]::VECTOR<FLOAT32>(3); // [1.0, 2.0, 3.0]
// Floats truncate toward zero into integer-backed vectors
RETURN [1.9, -1.9, 0.999, -0.999]::VECTOR<INTEGER>(4); // [1, -1, 0, 0]
// Out-of-range errors loudly (no silent saturation)
RETURN [128]::VECTOR<INT8>(1); // error: value 128 overflows INTEGER8
RETURN [2e39]::VECTOR<FLOAT32>(1); // error: value overflows FLOAT32
// NaN / Infinity / mixed types / nested lists all error
RETURN [1, 'two', 3]::VECTOR<FLOAT32>(3) // error: non-numeric coordinate
Null propagation
RETURN null::VECTOR<FLOAT32>(3); // null
RETURN CAST(null AS VECTOR<FLOAT32>(3)) // null
Bounded similarity
vector.similarity(a, b) defaults to cosine. Pass 'euclidean' as the
third argument for bounded Euclidean similarity. Both forms return a
scalar in [0, 1] where higher = more similar and accept a VECTOR
or a LIST<NUMBER> on either side.
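For checking scores outside the database, the two bounded forms can be reproduced host-side. The following is a minimal Python sketch of the formulas documented in this section, not the engine's implementation: it works in f64 rather than f32, so the last digits may differ, and the function name `similarity` is illustrative.

```python
import math

def similarity(a, b, metric="cosine"):
    """Bounded similarity in [0, 1]; higher means closer."""
    if a is None or b is None:
        return None                       # null on either side gives null
    if len(a) == 0 or len(b) == 0:
        raise ValueError("empty vector")
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    if metric == "cosine":
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        if norm_a == 0.0 or norm_b == 0.0:
            return None                   # cosine is undefined for a zero vector
        raw = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
        raw = max(-1.0, min(1.0, raw))    # clamp floating-point rounding noise
        return (1.0 + raw) / 2.0          # map [-1, 1] onto [0, 1]
    if metric == "euclidean":
        d2 = sum((x - y) ** 2 for x, y in zip(a, b))
        return 1.0 / (1.0 + d2)           # 1 / (1 + squared distance)
    raise ValueError(f"unknown metric: {metric}")

print(similarity([1, 0, 0], [0, 1, 0]))               # 0.5
print(similarity([4, 5, 6], [2, 8, 3], "euclidean"))  # 1/23, about 0.04348
```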
// Cosine: (1 + raw_cosine) / 2
RETURN vector.similarity([1, 0, 0], [1, 0, 0]); // 1.0 (identical direction)
RETURN vector.similarity([1, 0, 0], [0, 1, 0]); // 0.5 (orthogonal)
RETURN vector.similarity([1, 0, 0], [-1, 0, 0]); // 0.0 (opposite)
RETURN vector.similarity([1, 2, 3], [2, 4, 6]); // 1.0 (colinear)
// Euclidean: 1 / (1 + d²)
RETURN vector.similarity([4, 5, 6], [2, 8, 3], 'euclidean'); // ≈ 0.04348
RETURN vector.similarity([0, 0, 0], [0, 0, 0], 'euclidean') // 1.0
Mixing lists and vectors
// Pure VECTOR on both sides
WITH [0.1, 0.2, 0.3]::VECTOR<FLOAT32>(3) AS a,
[0.2, 0.2, 0.2]::VECTOR<FLOAT32>(3) AS b
RETURN vector.similarity(a, b) AS score;
// VECTOR vs LIST: list is coerced to FLOAT32
MATCH (d:Doc)
RETURN d.id,
vector.similarity(d.embedding, [0.1, 0.2, 0.3]) AS score
ORDER BY score DESC
LIMIT 10;
// LIST vs LIST: both coerced, useful for ad-hoc debugging
RETURN vector.similarity([1, 2, 3], [1, 2, 3]) AS score
Null / error semantics
// null on either side → null
RETURN vector.similarity(null, [1, 2, 3]); // null
RETURN vector.similarity([1,2]::VECTOR<FLOAT32>(2), null); // null
// zero-norm argument to cosine → null (cosine is undefined)
RETURN vector.similarity([0, 0, 0], [1, 2, 3]); // null
// dimension mismatch → error
RETURN vector.similarity([1, 2, 3], [1, 2]); // error
// empty list → error
RETURN vector.similarity([], [1, 2, 3]) // error
Signed distance
vector.distance(a, b, METRIC) — smaller = more similar. Both
arguments must be real VECTOR values with matching dimensions; a
plain list is rejected here (unlike the bounded-similarity functions).
| Metric | Formula | Range |
|---|---|---|
| EUCLIDEAN | sqrt(Σ (aᵢ - bᵢ)²) | [0, ∞) |
| EUCLIDEAN_SQUARED | Σ (aᵢ - bᵢ)² | [0, ∞); skip the sqrt for pure ranking |
| MANHATTAN | Σ \|aᵢ - bᵢ\| | [0, ∞) |
| COSINE | 1 - raw_cosine(a, b) (raw, not bounded) | [0, 2]; identical = 0, opposite = 2 |
| DOT | -(a · b), negated so smaller = closer | (-∞, ∞) |
| HAMMING | count of positions where aᵢ ≠ bᵢ (f32 compare) | [0, dim] |
Metric names are case-insensitive and may be passed as bare identifiers or quoted strings.
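As a cross-check for the table above, the six metrics can be written out host-side. This is a sketch in Python using f64 arithmetic, so it illustrates the formulas rather than the engine's f32 code path. The function name `distance` is illustrative, and the zero-vector handling for COSINE is an assumption, since the table does not specify it.

```python
import math

def distance(a, b, metric):
    """Distance under a named metric; smaller means closer."""
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    name = metric.upper()                      # metric names are case-insensitive
    diffs = [x - y for x, y in zip(a, b)]
    if name == "EUCLIDEAN":
        return math.sqrt(sum(d * d for d in diffs))
    if name == "EUCLIDEAN_SQUARED":
        return float(sum(d * d for d in diffs))
    if name == "MANHATTAN":
        return float(sum(abs(d) for d in diffs))
    if name == "DOT":
        return -float(sum(x * y for x, y in zip(a, b)))  # negated inner product
    if name == "HAMMING":
        return sum(1 for d in diffs if d != 0)           # count differing positions
    if name == "COSINE":
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        if norm == 0.0:
            # Assumption: this sketch refuses zero vectors; the engine may differ.
            raise ValueError("cosine distance is undefined for a zero vector")
        return 1.0 - sum(x * y for x, y in zip(a, b)) / norm
    raise ValueError(f"unknown metric: {metric}")

a, b = [1, 2, 3], [4, 6, 8]
for m in ("EUCLIDEAN", "EUCLIDEAN_SQUARED", "MANHATTAN", "COSINE", "DOT", "HAMMING"):
    print(m, distance(a, b, m))
```

For these inputs the differences are 3, 4 and 5, so the squared distance is 50, the Manhattan distance is 12, the negated dot product is -40, and all three positions differ.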
WITH [1, 2, 3]::VECTOR<FLOAT32>(3) AS a,
[4, 6, 8]::VECTOR<FLOAT32>(3) AS b
RETURN vector.distance(a, b, EUCLIDEAN) AS l2, // ≈ 7.0711
vector.distance(a, b, EUCLIDEAN_SQUARED) AS l2_squared, // 50.0
vector.distance(a, b, MANHATTAN) AS l1, // 12.0
vector.distance(a, b, COSINE) AS cos_dist, // ≈ 0.0074
vector.distance(a, b, DOT) AS neg_dot, // -40.0
vector.distance(a, b, HAMMING) AS hamming // 3 (all positions differ)
Pick the right metric
// L2 / Euclidean: generic "closeness", respects magnitude.
WITH [1, 0]::VECTOR<FLOAT32>(2) AS a, [3, 0]::VECTOR<FLOAT32>(2) AS b
RETURN vector.distance(a, b, EUCLIDEAN); // 2.0
// Squared L2: same ranking as L2, cheaper (no sqrt). Use for ORDER BY.
WITH [1, 0]::VECTOR<FLOAT32>(2) AS a, [3, 0]::VECTOR<FLOAT32>(2) AS b
RETURN vector.distance(a, b, EUCLIDEAN_SQUARED); // 4.0
// Cosine: magnitude-invariant; parallel vectors are "the same".
WITH [1, 2, 3]::VECTOR<FLOAT32>(3) AS a,
[2, 4, 6]::VECTOR<FLOAT32>(3) AS b
RETURN vector.distance(a, b, COSINE); // ≈ 0.0 (colinear)
// Dot: raw inner product, negated so "smaller is closer".
// Useful when embeddings are already unit-normalised.
WITH [1, 0]::VECTOR<FLOAT32>(2) AS a, [1, 0]::VECTOR<FLOAT32>(2) AS b
RETURN vector.distance(a, b, DOT); // -1.0
// Hamming: positionwise difference count. Handy for binary / quantised vectors.
WITH [1, 0, 1, 1]::VECTOR<INT8>(4) AS a,
[1, 1, 1, 0]::VECTOR<INT8>(4) AS b
RETURN vector.distance(a, b, HAMMING) // 2
Null / error semantics
// null vectors or null metric → null
RETURN vector.distance(null, [1,2,3]::VECTOR<FLOAT32>(3), EUCLIDEAN); // null
RETURN vector.distance([1,2,3]::VECTOR<FLOAT32>(3), null, EUCLIDEAN); // null
RETURN vector.distance([1,2,3]::VECTOR<FLOAT32>(3),
[4,5,6]::VECTOR<FLOAT32>(3), null); // null
// Plain list → error (unlike vector.similarity)
RETURN vector.distance([1,2,3], [4,5,6]::VECTOR<FLOAT32>(3), EUCLIDEAN); // error
// Dimension mismatch → error
RETURN vector.distance([1,2]::VECTOR<FLOAT32>(2),
[1,2,3]::VECTOR<FLOAT32>(3), EUCLIDEAN); // error
// Unknown metric → error
RETURN vector.distance([1,2,3]::VECTOR<FLOAT32>(3),
[4,5,6]::VECTOR<FLOAT32>(3), 'MAHALANOBIS') // error
Norms
vector.norm(v, METRIC) — magnitude of a single vector.
| Metric | Formula |
|---|---|
| EUCLIDEAN | sqrt(Σ xᵢ²), the L2 length |
| MANHATTAN | Σ \|xᵢ\|, the L1 length |
WITH [3, 4]::VECTOR<FLOAT32>(2) AS v
RETURN vector.norm(v, EUCLIDEAN); // 5.0 (sqrt(3² + 4²) = sqrt(25))
WITH [3, 4]::VECTOR<FLOAT32>(2) AS v
RETURN vector.norm(v, MANHATTAN); // 7.0
WITH [1, -2, 2]::VECTOR<FLOAT32>(3) AS v
RETURN vector.norm(v, EUCLIDEAN); // 3.0 (sqrt(9))
// null propagates
RETURN vector.norm(null, EUCLIDEAN); // null
RETURN vector.norm([1,2,3]::VECTOR<FLOAT32>(3), null); // null
// unknown metric errors
RETURN vector.norm([1,2,3]::VECTOR<FLOAT32>(3), 'MAHALANOBIS') // error
Unit-normalisation pattern
There's no built-in vector.normalize. Compose one from the norm and the
coordinate list, then cast the rebuilt coordinates:
WITH [3, 0, 4]::VECTOR<FLOAT32>(3) AS v
WITH v, vector.norm(v, EUCLIDEAN) AS n, vector.coordinates(v, FLOAT) AS coords
RETURN [coords[0] / n, coords[1] / n, coords[2] / n]::VECTOR<FLOAT32>(3) AS unit
// [0.6, 0.0, 0.8]
For variable-dimension unit-normalisation, this is easier to keep host-side: the client languages all ship vector parameter helpers.
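A host-side version works for any dimension. The sketch below is plain Python with no LoraDB dependency; `unit_normalise` is an illustrative name, and rejecting zero vectors is a design choice made here, not documented engine behaviour.

```python
import math

def unit_normalise(coords):
    """Scale a list of numbers to unit L2 length."""
    norm = math.sqrt(sum(x * x for x in coords))
    if norm == 0.0:
        # A zero vector has no direction, so there is nothing to normalise.
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in coords]

print(unit_normalise([3.0, 0.0, 4.0]))  # [0.6, 0.0, 0.8]
```

The resulting list can then be passed as a parameter and cast with `$embedding::VECTOR<FLOAT32>(DIM)` as shown under "From a parameter".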
Introspection
| Expression | Returns | Notes |
|---|---|---|
| type.of(v) | String: "VECTOR<TYPE>(DIM)" | Only type whose tag encodes structure |
| value.size(v) | Int: dimension | Same as vector.dimension |
| vector.dimension(v) | Int: dimension | Explicit name |
WITH [1, 2, 3, 4]::VECTOR<FLOAT32>(4) AS v
RETURN type.of(v) AS t, // 'VECTOR<FLOAT32>(4)'
value.size(v) AS s, // 4
vector.dimension(v) AS d; // 4
// Coordinate type is part of the type tag
RETURN type.of([1,2,3]::VECTOR<INTEGER>(3)); // 'VECTOR<INTEGER>(3)'
RETURN type.of([1,2,3]::VECTOR<INT8>(3)) // 'VECTOR<INTEGER8>(3)'
Guarding by shape
MATCH (d:Doc)
WHERE type.of(d.embedding) = 'VECTOR<FLOAT32>(384)'
RETURN d.id AS id;
MATCH (d:Doc)
WHERE vector.dimension(d.embedding) = 384
RETURN count(*) AS docs_with_384d
List conversion
| Function | From | To |
|---|---|---|
| vector.coordinates(v, INTEGER) | any VECTOR | LIST<INTEGER>; truncates toward zero |
| vector.coordinates(v, FLOAT) | any VECTOR | LIST<FLOAT>; widens exactly |
RETURN vector.coordinates([1.9, -1.9, 3.0]::VECTOR<FLOAT32>(3), INTEGER); // [1, -1, 3]
RETURN vector.coordinates([1, 2, 3]::VECTOR<INT8>(3), FLOAT); // [1.0, 2.0, 3.0]
// null propagates
RETURN vector.coordinates(null, INTEGER); // null
RETURN vector.coordinates(null, FLOAT); // null
// non-VECTOR input errors
RETURN vector.coordinates([1, 2, 3], FLOAT) // error
Both converters round-trip cleanly through the binding layer. Use them when you need to hand off to a caller that wants a plain array.
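Truncation toward zero is easy to confuse with flooring when the same conversion is reproduced in client code. The short Python sketch below shows the difference; the helper names are illustrative and only mirror the documented rounding rule.

```python
import math

def coordinates_as_integer(coords):
    """Truncate each coordinate toward zero, as the INTEGER conversion does."""
    return [math.trunc(x) for x in coords]

def coordinates_as_float(coords):
    """Widen each coordinate to a float."""
    return [float(x) for x in coords]

print(coordinates_as_integer([1.9, -1.9, 3.0]))  # [1, -1, 3]
print([math.floor(x) for x in [1.9, -1.9, 3.0]]) # [1, -2, 3]: floor differs for negatives
print(coordinates_as_float([1, 2, 3]))           # [1.0, 2.0, 3.0]
```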
kNN and retrieval patterns
Top-k by cosine similarity
MATCH (d:Doc)
RETURN d.id AS id, d.title AS title
ORDER BY vector.similarity(d.embedding, $query) DESC
LIMIT 10
Top-k carrying the score forward
MATCH (d:Doc)
WITH d, vector.similarity(d.embedding, $query) AS score
RETURN d.id AS id, d.title AS title, score
ORDER BY score DESC
LIMIT 10
Top-k by signed distance (smaller = closer)
MATCH (d:Doc)
WITH d, vector.distance(d.embedding, $query, EUCLIDEAN) AS dist
RETURN d.id AS id, dist
ORDER BY dist ASC
LIMIT 10
Cheaper ranking with EUCLIDEAN_SQUARED
The rankings are identical; EUCLIDEAN_SQUARED skips the sqrt.
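The rankings match because the square root is strictly increasing on non-negative numbers, so it never changes the order of two distances. A small Python check with made-up sample vectors illustrates this:

```python
import math

def l2_squared(a, b):
    """Sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical sample data: document id -> embedding.
docs = {"a": [1.0, 0.0], "b": [3.0, 0.0], "c": [0.0, 5.0]}
query = [0.0, 0.0]

by_squared = sorted(docs, key=lambda k: l2_squared(docs[k], query))
by_l2 = sorted(docs, key=lambda k: math.sqrt(l2_squared(docs[k], query)))

print(by_squared)           # ['a', 'b', 'c']
print(by_squared == by_l2)  # True
```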
MATCH (d:Doc)
WITH d, vector.distance(d.embedding, $query, EUCLIDEAN_SQUARED) AS d2
RETURN d.id AS id
ORDER BY d2 ASC
LIMIT 20
Narrow the candidate set first
Similarity is O(n) over matched nodes — push filters into MATCH
and WHERE before scoring.
MATCH (d:Doc {tenant: $tenant})
WHERE d.language = 'en' AND d.published_at >= '2026-01-01'::DATE
WITH d, vector.similarity(d.embedding, $query) AS score
RETURN d.id, score
ORDER BY score DESC
LIMIT 10
Score threshold
MATCH (d:Doc)
WITH d, vector.similarity(d.embedding, $query) AS score
WHERE score >= 0.75
RETURN d.id, score
ORDER BY score DESC
Graph-filtered retrieval
This is the reason VECTOR lives next to the graph: score first, then use
relationships to explain or filter.
MATCH (d:Doc)
WITH d, vector.similarity(d.embedding, $query) AS score
MATCH (d)-[:MENTIONS]->(e:Entity)
WHERE e.type = $entity_type
RETURN d.id, d.title, score, collect(e.name) AS entities
ORDER BY score DESC
LIMIT 5
Neighbour-expansion after retrieval
Pull the local graph context around each hit:
MATCH (d:Doc)
WITH d, vector.similarity(d.embedding, $query) AS score
ORDER BY score DESC
LIMIT 10
MATCH (d)-[:CITED_BY]->(citing:Doc)
RETURN d.id AS hit, score, collect(citing.id) AS citations
Per-category nearest
MATCH (d:Doc)
WITH d, d.category AS category,
vector.similarity(d.embedding, $query) AS score
ORDER BY score DESC
WITH category, collect({d: d, score: score})[0] AS top
RETURN category, top.d.id AS id, top.score AS score
ORDER BY score DESC
Hybrid: keyword + vector
Blend a lexical boost into a vector score — straight arithmetic, no special function required:
MATCH (d:Doc)
WHERE string.lower(d.title) CONTAINS string.lower($q)
OR string.lower(d.body) CONTAINS string.lower($q)
WITH d,
vector.similarity(d.embedding, $query) AS vec_score,
CASE WHEN string.lower(d.title) CONTAINS string.lower($q) THEN 0.2 ELSE 0.0 END AS title_boost
RETURN d.id AS id,
vec_score + title_boost AS score
ORDER BY score DESC
LIMIT 10
Expand candidates via relationships, then rank
Start from a seed node, hop to candidates through the graph, then rank by similarity to the query vector:
MATCH (seed:Doc {id: $seed_id})-[:SIMILAR_TO*1..2]-(candidate:Doc)
WHERE candidate.id <> $seed_id
WITH DISTINCT candidate,
vector.similarity(candidate.embedding, $query) AS score
RETURN candidate.id, score
ORDER BY score DESC
LIMIT 10
Multi-vector query (best-of / max-sim)
Score each document against several query vectors and keep the best:
UNWIND $queries AS q
MATCH (d:Doc)
WITH d, max(vector.similarity(d.embedding, q)) AS best
RETURN d.id, best
ORDER BY best DESC
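LIMIT 10
Average query (query centroid)
If you have several positive examples, a host-side average is usually clearer than a Cypher fold. For a small fixed number, you can also stay in Cypher. Note that the query below averages the per-example similarity scores, which is not the same as scoring against the averaged vector: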
WITH [$q1, $q2, $q3] AS qs
MATCH (d:Doc)
WITH d, reduce(acc = 0.0,
q IN qs |
acc + vector.similarity(d.embedding, q) / value.size(qs)) AS score
RETURN d.id, score
ORDER BY score DESC
LIMIT 10
Metric comparison side-by-side
Useful during debugging — show every metric for one candidate:
WITH [0.10, 0.20, 0.30]::VECTOR<FLOAT32>(3) AS q
MATCH (d:Doc {id: $id})
RETURN d.id,
vector.similarity(d.embedding, q) AS cos_bounded,
vector.similarity(d.embedding, q, 'euclidean') AS euc_bounded,
vector.distance(d.embedding, q, EUCLIDEAN) AS l2,
vector.distance(d.embedding, q, EUCLIDEAN_SQUARED) AS l2_sq,
vector.distance(d.embedding, q, MANHATTAN) AS l1,
vector.distance(d.embedding, q, COSINE) AS cos_dist,
vector.distance(d.embedding, q, DOT) AS neg_dot
Count above threshold
MATCH (d:Doc)
WITH d, vector.similarity(d.embedding, $query) AS score
WHERE score >= $threshold
RETURN count(*) AS hits
Bucket by similarity band
MATCH (d:Doc)
WITH vector.similarity(d.embedding, $query) AS score
WITH CASE
WHEN score >= 0.9 THEN 'very-close'
WHEN score >= 0.7 THEN 'close'
WHEN score >= 0.5 THEN 'related'
ELSE 'distant'
END AS band
RETURN band, count(*) AS n
ORDER BY n DESC
Dedup by identity and keep the best match
MATCH (d:Doc)
WITH d, vector.similarity(d.embedding, $query) AS score
ORDER BY score DESC
WITH d.fingerprint AS fp, collect({d: d, score: score})[0] AS best
RETURN best.d.id AS id, best.score AS score
ORDER BY score DESC
LIMIT 10
Bulk insert
Vectors load efficiently through a single UNWIND over a parameter
list. Each row becomes a standalone CREATE, so each vector flows
through property conversion as a top-level property.
UNWIND $batch AS row
CREATE (:Doc {id: row.id, title: row.title, embedding: row.embedding})
// Node / TypeScript
import { vector } from "@loradb/lora-node";
await db.execute(
`UNWIND $batch AS row
CREATE (:Doc {id: row.id, title: row.title, embedding: row.embedding})`,
{ batch: docs.map(d => ({
id: d.id,
title: d.title,
embedding: vector(d.embedding, 384, "FLOAT32"),
})) },
);
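For Python callers, a comparable loader might look like the sketch below. Several things here are assumptions rather than documented behaviour: splitting the load into fixed-size batches, the `bulk_insert` and `batches` helper names, and casting `row.embedding` inside the query with the `CAST(... AS VECTOR<FLOAT32>(384))` form so that plain lists can be sent. The `execute` argument stands in for a `db.execute(query, params)` style callable.

```python
from itertools import islice

# Assumption: plain lists are cast to vectors inside the query.
INSERT_QUERY = (
    "UNWIND $batch AS row "
    "CREATE (:Doc {id: row.id, title: row.title, "
    "embedding: CAST(row.embedding AS VECTOR<FLOAT32>(384))})"
)

def batches(rows, size):
    """Yield consecutive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def bulk_insert(execute, docs, size=1000):
    """Send docs through `execute(query, params)` one batch at a time."""
    sent = 0
    for chunk in batches(docs, size):
        execute(INSERT_QUERY, {"batch": chunk})
        sent += len(chunk)
    return sent  # number of rows handed to the database
```

Passing `db.execute` as the first argument would run the inserts; any callable with the same two-argument shape works, which also makes the helper easy to test without a database.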
Edge cases
DISTINCT keys coordinate type
DISTINCT collapses duplicates by coordinate type + dimension +
values; vectors of different coord types never dedup to each other
even with numerically identical values:
UNWIND [
[1, 2, 3]::VECTOR<INTEGER>(3),
[1, 2, 3]::VECTOR<INTEGER>(3),
[1, 2, 3]::VECTOR<INTEGER8>(3)
] AS v
RETURN DISTINCT v
// returns two rows: one INTEGER, one INTEGER8
Ordering by a vector column
ORDER BY some_vector_column is accepted and stable, but the order
is implementation-defined. Order by a scalar score when intent
matters:
// Works, but meaningless as a primary sort.
MATCH (d:Doc) RETURN d ORDER BY d.embedding LIMIT 5;
// Use this instead.
MATCH (d:Doc)
RETURN d
ORDER BY vector.similarity(d.embedding, $query) DESC
LIMIT 5
Zero vectors and cosine
Cosine on a zero-norm vector is undefined, so
vector.similarity([0,…], anything) returns null. Filter
or coalesce explicitly:
MATCH (d:Doc)
WITH d, coalesce(vector.similarity(d.embedding, $query), 0.0) AS score
RETURN d.id, score
ORDER BY score DESC
LIMIT 10
Integer-backed vectors in similarity
Integer coordinates widen to f32 before accumulation — identical
ranking behaviour to a FLOAT32 vector with the same values, modulo
precision loss for magnitudes that don't fit in the mantissa.
RETURN vector.similarity([1,2,3]::VECTOR<INT8>(3),
[2,4,6]::VECTOR<INT8>(3))
// 1.0 (colinear, same result as FLOAT32)
HTTP and parameters
POST /query does not yet accept a params field, so a vector cannot
ride in as a parameter over HTTP. There are two workarounds. The first
is to embed the vector literally in the query (the string form makes
this practical):
curl -s http://127.0.0.1:4747/query \
-H 'content-type: application/json' \
-d '{"query":"RETURN [0.1,0.2,0.3]::VECTOR<FLOAT32>(3) AS v"}'
The second is to use one of the in-process bindings, which all support parameters.
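When the literal-embedding route is scripted, the query text has to be built from numbers only, since there is no parameter binding to protect against malformed input. The following Python sketch builds the same JSON body as the curl example; `vector_literal`, `query_body` and `post_query` are illustrative helpers, not part of any LoraDB client, and `post_query` is defined but not called here.

```python
import json
import math
import urllib.request

def vector_literal(coords, coord_type="FLOAT32"):
    """Render numbers as a cast list literal, e.g. [0.1,0.2]::VECTOR<FLOAT32>(2)."""
    for x in coords:
        if isinstance(x, bool) or not isinstance(x, (int, float)):
            raise TypeError(f"non-numeric coordinate: {x!r}")
        if not math.isfinite(x):
            raise ValueError("NaN and Infinity are rejected at construction")
    body = ",".join(repr(float(x)) for x in coords)
    return f"[{body}]::VECTOR<{coord_type}>({len(coords)})"

def query_body(coords):
    """JSON body for POST /query that returns the vector as column v."""
    return json.dumps({"query": f"RETURN {vector_literal(coords)} AS v"})

def post_query(body, url="http://127.0.0.1:4747/query"):
    """Send the body to the endpoint used in the curl example."""
    req = urllib.request.Request(
        url, data=body.encode("utf-8"),
        headers={"content-type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

print(query_body([0.1, 0.2, 0.3]))
```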
Index-backed retrieval
For the supported index procedure surface, create a vector index and
query it with CALL:
CREATE VECTOR INDEX doc_embedding
FOR (d:Doc)
ON (d.embedding)
OPTIONS {indexConfig: {
`vector.dimensions`: 384,
`vector.similarity_function`: 'cosine'
}};
CALL db.index.vector.queryNodes('doc_embedding', 10, $query)
YIELD node, score;
This returns the top k rows by descending score. k must be
positive, and the query vector dimension must match the index
configuration. See Queries → Indexes → Vector indexes
for relationship indexes and option details.
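Conceptually, a flat scan with these rules behaves like the Python sketch below: validate k and the query dimension, score every indexed vector, and keep the k best. This is an illustration of the described behaviour under a cosine configuration, not the engine's code; `query_nodes`, `cosine_score` and the dict-based index are hypothetical.

```python
import heapq
import math

def cosine_score(a, b):
    """Bounded cosine similarity, (1 + raw_cosine) / 2, or None for zero vectors."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return None
    raw = sum(x * y for x, y in zip(a, b)) / norm
    return (1.0 + max(-1.0, min(1.0, raw))) / 2.0

def query_nodes(index, k, query, dimensions):
    """Linear scan over {node_id: vector}; returns the top k (node_id, score) pairs."""
    if k <= 0:
        raise ValueError("k must be positive")
    if len(query) != dimensions:
        raise ValueError("query dimension does not match the index configuration")
    scored = ((cosine_score(vec, query), node_id) for node_id, vec in index.items())
    scored = (pair for pair in scored if pair[0] is not None)
    top = heapq.nlargest(k, scored)          # descending by score
    return [(node_id, score) for score, node_id in top]

index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [-1.0, 0.0]}
print(query_nodes(index, 2, [1.0, 0.0], dimensions=2))  # [('a', 1.0), ('b', 0.5)]
```

Every call touches all indexed vectors, so the cost grows linearly with the size of the indexed label or type scope.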
Limitations
- No ANN structure yet: vector index procedures are supported, but currently scan the indexed label/type scope linearly.
- Direct vector function calls are exhaustive: keep MATCH filters tight when using ORDER BY vector.similarity(...) LIMIT k.
- No embedding generation: LoraDB has no plugin surface. Produce embeddings host-side and pass them in.
- No list-of-vectors as a property: store each vector on its own node or relationship. Lists of vectors inside a query are fine.
- No parameters over HTTP: see the note above.
- Dimension ≤ 4096: enforced at construction.
- Ordering by a vector column is unspecified: order by a scalar score instead.
See also the Cypher support matrix (§13b) for the engine-side behaviour grid.
See also
- Vectors (data type): full reference for the VECTOR value type (storage, coercion, parameter binding).
- Cookbook → Vector-retrieval patterns: top-k and graph-filtered retrieval recipes.
- Queries → Parameters: passing vectors as parameters.
- Math: scalar arithmetic used alongside vector scores.
- Aggregation: max, min, collect, used after ranking.
Background reading
- Vectors belong next to relationships: why similarity lives as a value type instead of in a sidecar store.
- LoraDB v0.2: vector values for connected AI context, the release that introduced VECTOR.