文件预览
storage-properties.md

查看 Alibabacloud Emr Starrocks Assistant 技能包中的文件内容。
返回技能详情下载技能包打开来源页
文件内容
references/schema/storage-properties.md

# Detailed Storage Properties Guide

## Table of Contents

1. [Compression Algorithms](#compression-algorithms)
2. [Replicas and Write Strategy](#replicas-and-write-strategy)
3. [Storage Media and Hot/Cold Storage Tiering](#storage-media-and-hotcold-storage-tiering)
4. [Fast Schema Evolution](#fast-schema-evolution)
5. [Compaction Control](#compaction-control)
6. [Hybrid Row-Column Storage](#hybrid-row-column-storage)
7. [Other Useful Properties](#other-useful-properties)

---

## Compression Algorithms

**Configuration:**
```sql
PROPERTIES ("compression" = "algorithm name")
```

| Algorithm | Compression Ratio | Speed | Applicable Scenario |
|------|-------|------|---------|
| **LZ4** (default) | Medium | Fastest | Most scenarios; prioritizes query performance |
| **ZSTD** | High | Fairly fast | Storage-space sensitive, cold data, log tables |
| **zlib** | Highest | Slowest | Extreme storage optimization (impacts query performance) |
| **Snappy** | Low | Fast | Compatibility needs |

**ZSTD level configuration (v3.1.8+):**
```sql
-- Levels 1-22; higher numbers give higher compression but slower speed; default 3
PROPERTIES ("compression" = "zstd(3)")
```

**Selection recommendations:**
- Online query tables → **LZ4** (default is fine)
- Log/archive tables with infrequent queries → **ZSTD**
- Extremely storage-cost sensitive → **ZSTD(9)** or **zlib**
- Generally no change is needed; LZ4 is a safe default

---

## Replicas and Write Strategy

### Replica Count

```sql
PROPERTIES ("replication_num" = "3")  -- Default 3
```

| Configuration | Applicable Scenario |
|------|---------|
| `3` (default) | Production environments, ensures high availability |
| `1` | Dev/test environments; shared-data clusters (data is redundant in object storage) |
| `2` | Rarely used |

### Write Acknowledgement Strategy

```sql
PROPERTIES ("write_quorum" = "MAJORITY")
```

| Strategy | Meaning | Applicable Scenario |
|------|------|---------|
| `MAJORITY` (default) | Return success once the majority of replicas have written | Production environments (recommended) |
| `ONE` | Return success once any replica succeeds | Write performance priority (risk of data loss) |
| `ALL` | Return success only after all replicas succeed | Extremely strict data consistency |

### Replica Sync Mode

```sql
PROPERTIES ("replicated_storage" = "true")
```

| Mode | Meaning | Default by Version |
|------|------|---------|
| `true` | Single-Leader write, synced to Followers (low CPU usage) | v3.0+ (recommended) |
| `false` | All replicas write independently (high CPU usage) | v2.5 |

---

## Storage Media and Hot/Cold Storage Tiering

### Storage Media

```sql
PROPERTIES (
    "storage_medium" = "SSD",           -- Initially stored on SSD
    "storage_cooldown_ttl" = "7 DAY"    -- Migrate to HDD after 7 days
)
```

### Cooldown Strategy

| Property | Description |
|------|------|
| `storage_medium` | `SSD` or `HDD` |
| `storage_cooldown_ttl` | Cool down after partition upper-bound time + TTL (e.g., `"30 DAY"`) |
| `storage_cooldown_time` | Cool down at an absolute time (e.g., `"2025-06-01 00:00:00"`) |

**Limitations:**
- Only applies to Range / List partitioned tables
- The partition column must be a date type
- Primary Key tables do not support hot/cold storage tiering
- **Silently ignored in shared-data** — `PropertyAnalyzer` strips both `storage_cooldown_time` and `storage_cooldown_ttl` when `RunMode.isSharedDataMode()` is true. Use `datacache.partition_duration` (see *Shared-data properties* below) for the equivalent effect.

**Full example:**
```sql
CREATE TABLE hot_cold_table (
    dt DATE NOT NULL,
    user_id BIGINT,
    data VARCHAR(256)
)
DUPLICATE KEY(dt, user_id)
PARTITION BY date_trunc('day', dt)
DISTRIBUTED BY HASH(user_id) BUCKETS 16
PROPERTIES (
    "storage_medium" = "SSD",
    "storage_cooldown_ttl" = "7 DAY",
    "replication_num" = "3"
);
```

---

## Shared-data (存算分离) properties

Shared-data clusters keep data files in object storage and cache hot data on CN local disk. This changes which `PROPERTIES` are meaningful at `CREATE TABLE`.

### Properties to **set**

| Property | Values | Default | Version | Purpose |
|---|---|---|---|---|
| `storage_volume` | volume name (string) | the cluster's default volume | v3.1+ | Which object-storage backend the table writes to; omit to use the cluster default |
| `datacache.enable` | `true` / `false` | `true` | v3.1+ | Local-disk cache for hot reads; leave on unless intentionally bypassing |
| `datacache.partition_duration` | e.g. `"7 DAY"`, `"3 MONTH"`, `"12 HOUR"` | unset (cache evicts purely by LRU) | v3.1+ | Treat partitions whose upper bound is within this window as hot — they stay cached and are reloaded if evicted. **Set this to match the user's stated query window** (e.g. "queries the last 90 days" → `"90 DAY"`). Setting it smaller than the query window means in-window queries miss the cache; do not default to a generic "hot window" like 30 days when the user has given a concrete number. |
| `enable_persistent_index` | `true` / `false` | `true` (PK tables) | — | Required for Primary Key tables; persists the PK index across restarts |
| `persistent_index_type` | `LOCAL` / `CLOUD_NATIVE` | v3.2: `LOCAL`; v3.3+: `CLOUD_NATIVE` on shared-data | v3.1.4+ (LOCAL), v3.3+ (CLOUD_NATIVE on object storage) | `CLOUD_NATIVE` stores the PK index on object storage — survives CN rebalance without rebuild; the right default for shared-data |

### Properties to **omit** (and why)

| Property | What happens |
|---|---|
| `enable_async_write_back` | **Disabled since v3.1.4** — setting it (any value) throws `AnalysisException: enable_async_write_back is disabled since version 3.1.4` |
| `storage_cooldown_time` / `storage_cooldown_ttl` | Silently stripped in shared-data; use `datacache.partition_duration` instead |
| `storage_medium` | Meaningless on shared-data — there's no SSD/HDD tier choice; the cache disk is what it is |
| `replicated_storage` | Single-leader replication is a shared-nothing concept; not applicable |

### Properties that **still apply but behave differently**

- `replication_num`: not ignored — `PropertyAnalyzer` validates it against the **CN node count** (not BE). Defaults to 3 cluster-wide; lower it if you have fewer CNs than 3.
- `fast_schema_evolution`: defaults to `true` on shared-data v3.3+; you do not need to set it.

### Minimal Primary-Key-on-shared-data checklist (v3.3+)

```sql
CREATE TABLE orders (
    order_id  BIGINT NOT NULL,
    user_id   BIGINT,
    create_time DATETIME NOT NULL,
    amount    DECIMAL(18,2),
    status    VARCHAR(32)
)
PRIMARY KEY (order_id, create_time)
PARTITION BY date_trunc('day', create_time)
DISTRIBUTED BY HASH(order_id) BUCKETS 12
ORDER BY (create_time, order_id)
PROPERTIES (
    -- "storage_volume"             = "my_oss_vol",   -- omit to use cluster default
    "datacache.enable"              = "true",
    "datacache.partition_duration"  = "90 DAY",       -- last 90 days stays in cache
    "enable_persistent_index"       = "true",
    "persistent_index_type"         = "CLOUD_NATIVE", -- v3.3+ default; explicit for clarity
    "compression"                   = "LZ4"
);
```

---

## Fast Schema Evolution

**Function:** Accelerates ADD/DROP COLUMN operations and reduces resource usage.

```sql
PROPERTIES ("fast_schema_evolution" = "true")
```

| Version | Default |
|------|-------|
| v3.2+ (shared-nothing) | `false` (must be enabled manually) |
| v3.3+ (shared-data) | `true` (enabled by default) |

**Effects when enabled:**
- ADD COLUMN: Completes in seconds, no need to rewrite data files
- DROP COLUMN: Marked for deletion, cleaned up asynchronously in the background

**Recommendation:** Enable for tables whose columns change frequently.

---

## Compaction Control

### Forbidden Compaction Windows

```sql
PROPERTIES (
    "base_compaction_forbidden_time_ranges" = "* 8-20 * * *"
)
```

**Format:** Quartz cron (minute hour day month day-of-week)

| Configuration | Meaning |
|------|------|
| `"* 8-20 * * *"` | Forbid Base Compaction daily from 8:00 to 20:59 |
| `"* 0-6 * * 1-5"` | Forbid weekdays from midnight to 6 a.m. |

**Applicable scenarios:**
- Avoid Compaction competing for IO during daytime business peaks
- Note: During the forbidden window, at most 500 ingestion operations are allowed (Cumulative Compaction still runs)

### Compaction-Related Properties

```sql
-- Partition TTL (v3.1.7+): automatically delete expired partitions
PROPERTIES ("partition_ttl" = "90 DAY")

-- Equivalent to the start parameter of Dynamic Partition, but more concise
```

---

## Hybrid Row-Column Storage

**Version:** v3.2.3+ (Preview feature)

**Purpose:** Support efficient columnar analytical queries and row-based point lookups simultaneously.

**Prerequisites:**
```sql
-- FE configuration
ADMIN SET FRONTEND CONFIG ("enable_experimental_rowstore" = "true");
```

**Syntax:**
```sql
CREATE TABLE user_profile (
    user_id BIGINT NOT NULL,
    name VARCHAR(64),
    email VARCHAR(128),
    age INT,
    balance DECIMAL(12,2)
)
PRIMARY KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 16
PROPERTIES ("store_type" = "column_with_row");
```

**Applicable scenarios:**
- Tables with both primary-key point lookups (SELECT * WHERE id = ?) and analytical aggregation queries
- API services and reporting analytics sharing a single table

**Limitations:**
- Only supports Primary Key tables
- Row data is stored in the hidden column `__row`, adding storage overhead
- Maximum 1 MB per row
- v3.2.4+ supports BITMAP, HLL, JSON, ARRAY, MAP, STRUCT types

---

## Other Useful Properties

### StarRocks-specific column types

| Type | When to use |
|------|---------|
| `HLL` | Approximate `COUNT(DISTINCT)` with O(1) memory; combine with `hll_union_agg()` |
| `BITMAP` | Exact `COUNT(DISTINCT)` on integer / dictionary-encoded keys; combine with `bitmap_union_count()` |
| `PERCENTILE` | Pre-aggregated percentile state for Aggregate tables; query via `percentile_approx_raw()` |
| `JSON` | Native binary JSON; faster than `VARCHAR` for `->`/`json_query()` extraction |
| `ARRAY<T>` / `MAP<K,V>` / `STRUCT` | Multi-value / nested fields; query with `[idx]`, `element_at()`, `unnest()` |
| `agg_state(<func>)` | (v3.1+) Pre-aggregated state column on Aggregate tables; finalize via `<func>_union` / `<func>_merge` |

### Table-Level Comment

```sql
CREATE TABLE my_table (...)
COMMENT "Business purpose description"
PROPERTIES (...);
```

### Per-Partition Bucket Override

```sql
-- Different partitions can have different bucket counts (only when manually creating Range Partitions)
PARTITION BY RANGE(dt) (
    PARTITION p_old VALUES LESS THAN ("2024-01-01") DISTRIBUTED BY HASH(id) BUCKETS 8,
    PARTITION p_new VALUES LESS THAN ("2025-01-01") DISTRIBUTED BY HASH(id) BUCKETS 32
)
```

### Tablet Size Inspection

```sql
-- View the table's Tablet distribution and sizes
SHOW TABLET FROM table_name;

-- View statistics
SHOW TABLE STATS table_name;

-- View partition information
SHOW PARTITIONS FROM table_name;
```