name: lance-format
description: Reference for Lance v7 - the open columnar lakehouse format for multimodal AI - and its Rust crate workspace (lance, lance-table, lance-file, lance-encoding, lance-index, lance-io, lance-namespace, and more). Use when building directly on the Lance crates - creating or reading .lance datasets, manifests, fragments, deletion files, the 2.x file format and structural encodings, vector / scalar / full-text / geo indexes, MemWAL streaming writes, optimistic-concurrency commits and commit handlers, schema evolution, versioning, time-travel, tags, branches, stable row IDs, namespaces, or object-store config. Triggers on lance crate, .lance file, lance dataset, lance file format, structural encoding, IVFPQ, IVFHNSW, IVF_RQ, RaBitQ, lance FTS, zonemap, MemWAL, OCC retry, lance schema evolution, lance namespace, pylance. This is the Lance format and engine (the lance-format/lance repo), not LanceDB the database product - but also the right reference for what LanceDB builds on.
metadata:
version: "0.4.0"
upstream: "lance-format/lance@v7.1.0-beta.2"
openclaw:
homepage: https://github.com/tenequm/skills/tree/main/skills/lance-format
emoji: "🗄️"
Lance is an open columnar format for multimodal AI - "a columnar data format that is 100x
faster than Parquet for random access." It is not one format but a stack of interoperating
specs: a file format, a table format, index formats, catalog specs, and a
namespace client spec. The Rust workspace at lance-format/lance implements all of them
plus Python (pylance) and Java bindings.
This skill tracks v7.1.0-beta.2 (the lance-format/lance git tag). Pin against tags,
not main - Lance ships beta tags every few days and next-format encodings can change.
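As a minimal sketch of the pinning rule above, the check below splits a `metadata.upstream`-style string into repo and ref and tests whether the ref looks like a tag rather than a branch. The helper names and the tag pattern are assumptions inferred from the single tag this file tracks; they are not part of Lance.

```python
import re

# Assumed tag shape, inferred from v7.1.0-beta.2; adjust if upstream
# starts using other pre-release suffixes.
TAG_PATTERN = re.compile(r"^v\d+\.\d+\.\d+(-beta\.\d+)?$")


def parse_upstream(upstream: str) -> tuple[str, str]:
    """Split 'owner/repo@ref' into (repo, ref)."""
    repo, sep, ref = upstream.partition("@")
    if not sep or not repo or not ref:
        raise ValueError(f"expected 'owner/repo@ref', got {upstream!r}")
    return repo, ref


def is_pinned_tag(ref: str) -> bool:
    """True when ref looks like a release or beta tag, not a branch."""
    return TAG_PATTERN.match(ref) is not None


repo, ref = parse_upstream("lance-format/lance@v7.1.0-beta.2")
print(repo, ref, is_pinned_tag(ref))
print("main pinned?", is_pinned_tag("main"))
```

A check like this could run in CI to reject a dependency that silently tracks main.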
The deep reference is references/lance-reference.md. Load it for any concrete schema, parameter,
proto, or constraint. This file is the orientation: read it first, then jump into the
reference section you need.
Lance and LanceDB are two different things, and conflating them produces wrong answers.
- Lance is the format and engine: the lance-format/lance repo, the lance / lance-* Rust crates, and pylance. It gives you datasets, the file/table format, indexes, commits, and
scans. It is consumed directly by DuckDB, Polars, Ray, Spark, PyTorch, DataFusion, or your own
Rust/Python code. This skill is about Lance.
- LanceDB is the database product (lancedb/lancedb) built on top of Lance. It adds a query-builder API, an embedding registry, rerankers-as-API, multi-language SDK
parity, and managed Cloud / Enterprise tiers. It is not covered here.
If you are linking the lance crate in Cargo.toml, you are using Lance directly - use this
skill. If a question is about LanceDB internals, the storage layer underneath it is still
Lance, so this skill remains the authority for the format itself.
The workspace has 24 crate directories under rust/. lance is the public entry point; the rest are layers
beneath it. The full table, with descriptions and citations, is in references/lance-reference.md section 2.
| Crate | Role |
|-------|------|
| lance | Public entry point - Dataset, scanner, indexes, commits |
| lance-table | Table format - manifest, feature flags, commit handlers, row IDs |
| lance-file | File format - file reader/writer |
| lance-encoding | Structural encodings, compression (internal, not for external use) |
| lance-index | Scalar / vector / FTS / system indexes |
| lance-io | Object store, I/O schedulers |
| lance-core | Shared Error/Result, cache, datatypes |
| lance-datafusion | DataFusion glue (exec, expr, planner, UDFs) |
| lance-linalg | SIMD L2 / dot / cosine / hamming kernels |
| lance-select | Row-selection primitives - RowAddrMask, RowIdMask, IndexExprResult (extracted from lance-core/lance-index in v7.1.0-beta.2) |
| lance-tokenizer | FTS tokenizer stack (simple, ngram, jieba, lindera, stemmers) |
| lance-geo | Geospatial UDFs (feature-gated geo) |
| lance-namespace / -impls / -datafusion | Namespace trait, Directory/REST impls, DataFusion catalog bridge |
| lance-arrow, lance-tools, fsst, lance-bitpacking, ... | Arrow extensions, CLI, compression sub-crates |
All share version = "7.1.0-beta.2" except lance-arrow-scalar, which is pinned at 58.0.0 to track Arrow. Workspace: edition 2024, rust-version = 1.91.0, resolver = "3".
The file format carries a single major.minor version. It is selected per-dataset at creation via data_storage_version and fixed once the dataset exists (to change it, rewrite the
dataset).
| Version | Status | Notes |
|---------|--------|-------|
| 0.1 (legacy) | read-only | Original format; no longer writable |
| 2.0 | stable | Removed row groups; null support for lists/FSL/primitives |
| 2.1 | current default (stable) | Adaptive structural encodings; better integer/string compression; nulls in struct fields; better nested random access. Default since Lance 5.0.0 |
| 2.2 | next (unstable) | Map type, Blob v2, VariablePackedStruct, larger mini-blocks. Required for Map and Blob v2; encodings may still change |
stable and next are aliases resolved by the running Lance release - pin an explicit
number for deterministic behavior.
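The alias behavior can be sketched as a small lookup. The table below is copied from the version table in this file and only holds for the tracked tag; `resolve_storage_version` and `minimum_version_for` are hypothetical helpers for illustration, not Lance APIs.

```python
# Alias targets as of the tracked tag; a later release may move them,
# which is why an explicit number is the deterministic choice.
ALIASES = {"stable": "2.1", "next": "2.2"}
KNOWN_VERSIONS = {"0.1", "2.0", "2.1", "2.2"}
READ_ONLY_VERSIONS = {"0.1"}

# Features that the table lists as requiring 2.2.
REQUIRES_2_2 = {"map", "blob_v2"}


def resolve_storage_version(requested: str) -> str:
    """Turn an alias or explicit number into a writable version string."""
    version = ALIASES.get(requested, requested)
    if version not in KNOWN_VERSIONS:
        raise ValueError(f"unknown file format version: {requested!r}")
    if version in READ_ONLY_VERSIONS:
        raise ValueError(f"version {version} is read-only")
    return version


def minimum_version_for(features: set[str]) -> str:
    """Pick 2.2 only when a feature needs it, otherwise the 2.1 default."""
    return "2.2" if features & REQUIRES_2_2 else "2.1"


print(resolve_storage_version("stable"))
print(resolve_storage_version("next"))
print(minimum_version_for({"map"}))
```

Recording the resolved number next to the dataset, rather than the alias, keeps the choice reproducible across Lance upgrades.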
The v6 -> v7 boundary is one breaking change: feat!: make dataset object store access base-aware (#6647) - object-store access is now scoped to a dataset base rather than a
flat global path, which underpins multi-base storage (hot/cold tiering, shallow clones).
The dominant theme across the v7 betas is MemWAL - an experimental LSM / write-ahead-log
architecture for high-throughput streaming writes (WAL appender/tailer primitives, shard
writers, a Lance-native in-memory HNSW index, the shared-memory:// object-store scheme).
Also landing in the v7 era: branches (Git-like, alongside tags), segmented and
distributed index builds (FTS, bitmap, btree), newer scalar indexes (zonemap, bloom
filter, ngram), the geo / RTree index and lance-geo crate, manifest version hints
for fast latest-version lookup, and a formal split of the catalog / namespace / table /
index specifications. The v7.1.0-beta.1 tag opens the v7.1 line and adds a
materialized-view namespace API. v7.1.0-beta.2 then extracts mask code into a new lance-select crate (#6879) and lands two MemWAL correctness fixes - flushed
memtables now build their secondary indexes so vector rows are visible to fast_search
(#6901), and a per-source PK-hash block-list post-filter suppresses stale LSM vector reads
when the fresh row falls out of its source's top-k (#6899). Details in references/lance-reference.md section 14.
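To make the stale-read problem concrete, here is a toy model of a block-list post-filter over per-source top-k results. It is an illustration of the general idea only, under the assumption that sources are searched newest first and that any primary key present in a newer source invalidates the same key in older ones. The data layout, function names, and use of Python's built-in `hash` are all hypothetical and do not reflect the Lance implementation.

```python
import heapq
import math


def top_k(rows, query, k):
    """Return the k nearest (distance, pk) pairs from one source."""
    scored = ((math.dist(vec, query), pk) for pk, vec in rows.items())
    return heapq.nsmallest(k, scored)


def search(sources, query, k, use_block_list=True):
    """Merge per-source top-k hits; sources are ordered newest first."""
    newer_pk_hashes = set()  # block list applied to the source being read
    merged = []
    for rows in sources:
        hits = top_k(rows, query, k)
        if use_block_list:
            # A key that also exists in a newer source is stale here,
            # even if the newer row did not make that source's top-k.
            hits = [h for h in hits if hash(h[1]) not in newer_pk_hashes]
        merged.extend(hits)
        newer_pk_hashes.update(hash(pk) for pk in rows)
    return [pk for _, pk in heapq.nsmallest(k, merged)]


# pk 1 was updated: its fresh vector is far from the query, so it falls
# out of the memtable's top-1, while the stale copy in base is a perfect hit.
memtable = {1: (10.0, 10.0), 2: (1.0, 0.0)}
base = {1: (0.0, 0.0), 3: (5.0, 5.0)}
query = (0.0, 0.0)

print(search([memtable, base], query, k=1, use_block_list=False))
print(search([memtable, base], query, k=1))
```

Without the block list, the merge returns the stale copy of pk 1; with it, the older hit is dropped and pk 2 wins. Because this is a post-filter, an older source can contribute fewer than k hits after filtering.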
references/lance-reference.md is the full v7 reference, regrounded against the v7.1.0-beta.2
source. Load the section for your task:
constant / blob page types), compression schemes, blob encoding
extension arrays (bfloat16, image types)
files, base paths
handlers (conditional-put, DynamoDB), conflict resolution matrix
zonemap), full-text (BM25, tokenizers), geo/RTree
Citations in references/lance-reference.md are path:line relative to the lance-format/lance repo;
build a permalink as https://github.com/lance-format/lance/blob/v7.1.0-beta.2/<path>.
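A small helper can turn a path:line citation into that permalink. This is a sketch: the `#L<line>` fragment is GitHub's line anchor and goes beyond the URL shape given above, and the example path is hypothetical rather than a real citation from the reference.

```python
BLOB_BASE = "https://github.com/lance-format/lance/blob"
PINNED_TAG = "v7.1.0-beta.2"


def permalink(citation: str, tag: str = PINNED_TAG) -> str:
    """Build a tag-pinned GitHub URL from 'path' or 'path:line'."""
    path, sep, line = citation.rpartition(":")
    if not sep or not line.isdigit():
        # No trailing line number: link to the file only.
        return f"{BLOB_BASE}/{tag}/{citation}"
    return f"{BLOB_BASE}/{tag}/{path}#L{line}"


# Hypothetical citation, used only to show the output shape.
print(permalink("rust/example/src/lib.rs:42"))
```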
To refresh: git -C ~/pjv/lance-format/lance fetch --tags, check out the newest v7* tag,
re-read the format spec under docs/src/format/ and the user guide under docs/src/guide/,
re-verify the crate workspace, and bump metadata.upstream plus every v7.1.0-beta.2
reference. Line numbers in citations drift between tags - treat them as approximate.
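Picking "the newest v7* tag" needs an ordering in which a final release sorts after its own betas. The sketch below assumes tags follow the `vMAJOR.MINOR.PATCH[-beta.N]` shape seen in this file; the function names are illustrative, and tags in any other shape are ignored.

```python
import re

TAG_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)(?:-beta\.(\d+))?$")


def sort_key(tag):
    """Return a comparable key for a tag, or None if it does not match."""
    m = TAG_RE.match(tag)
    if m is None:
        return None
    major, minor, patch, beta = m.groups()
    # A final release ranks above every beta of the same version.
    beta_rank = (1, 0) if beta is None else (0, int(beta))
    return (int(major), int(minor), int(patch), beta_rank)


def newest_tag(tags, major=7):
    """Pick the newest tag within one major version."""
    keyed = [(sort_key(t), t) for t in tags]
    keyed = [(k, t) for k, t in keyed if k is not None and k[0] == major]
    if not keyed:
        raise ValueError(f"no v{major} tags found")
    return max(keyed)[1]


tags = ["v6.0.0", "v7.0.0", "v7.0.1-beta.3", "v7.1.0-beta.1", "v7.1.0-beta.2", "main"]
print(newest_tag(tags))
```

The input list could come from the output of `git tag --list 'v7*'` in the local checkout, one tag per line.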