references/cli-options.md

# CLI Options

Use these defaults unless the user requests something more specific.

## Input Selection

- `ppx parse <file> -o <dir>`: Parse one PDF or image.
- `ppx parse <dir> -o <dir>`: Batch-parse a directory.
- `--pages "1-5,10,15-20"`: Restrict parsing to selected pages.
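The three forms above can be sketched as a dry run that only prints the commands. The paths `report.pdf`, `scans`, and `out` are hypothetical placeholders, and combining `--pages` with a single-file parse is an assumption not stated in this reference.

```shell
# Dry run: build the command strings and print them; nothing is executed.
# report.pdf, scans, and out are hypothetical paths chosen for illustration.
input_file="report.pdf"
input_dir="scans"
out_dir="out"

# Parse one PDF or image.
single_cmd="ppx parse $input_file -o $out_dir"

# Batch-parse every supported file in a directory.
batch_cmd="ppx parse $input_dir -o $out_dir"

# Restrict parsing to selected pages (assumed to combine with a single file).
pages_cmd="ppx parse $input_file -o $out_dir --pages \"1-5,10,15-20\""

printf '%s\n' "$single_cmd" "$batch_cmd" "$pages_cmd"
```

Quoting the page list keeps the commas and ranges together as one argument when the command is eventually run.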

## OCR

- `--ocr auto`: Default. Works for mixed native/scanned documents.
- `--ocr yes`: Force OCR for scanned PDFs and raster-heavy pages.
- `--ocr no`: Skip OCR for native PDFs with selectable text.
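One way to apply the OCR guidance above is a small selector that maps a document kind to an `--ocr` value. This is a minimal sketch: the function name `choose_ocr` and the kind labels `scanned`, `native`, and `mixed` are hypothetical and not part of the `ppx` CLI.

```shell
# choose_ocr KIND: print the --ocr value suggested by the list above.
# KIND labels are hypothetical; anything unrecognised falls back to "auto".
choose_ocr() {
  case "$1" in
    scanned) echo "yes" ;;   # scanned PDFs and raster-heavy pages
    native)  echo "no" ;;    # native PDFs with selectable text
    *)       echo "auto" ;;  # default, including mixed documents
  esac
}

# Dry run: print the command that would be used for a scanned document.
echo "ppx parse report.pdf -o out --ocr $(choose_ocr scanned)"
```

Falling back to `auto` mirrors the documented default, so an unknown document kind never forces or skips OCR.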

## Tables

- `--table auto`: Default.
- `--table wbk`: White-background tables in ordinary documents.
- `--table ybk`: Colored or dark-background tables.
- `--table llm`: Highest table fidelity when an LLM backend is configured.
- `--table no`: Skip table extraction.
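When a wrapper script accepts the table mode from a user, it may help to check the value against the five modes listed above before invoking the tool. The helper name `is_valid_table_mode` is hypothetical; only the mode names come from this reference.

```shell
# is_valid_table_mode MODE: succeed only for the modes listed above.
is_valid_table_mode() {
  case "$1" in
    auto|wbk|ybk|llm|no) return 0 ;;
    *)                   return 1 ;;
  esac
}

# Dry run: fall back to the documented default when the request is invalid.
requested="ybk"
if is_valid_table_mode "$requested"; then
  table_mode="$requested"
else
  table_mode="auto"
fi
echo "ppx parse report.pdf -o out --table $table_mode"
```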

## Other Useful Flags

- `--mode page|tree|ppt`: Select parse mode. Use `page` unless a different downstream structure is required.
- `--workers N`: Parallelize directory parsing.
- `--json`: Emit JSON-focused output.
- `--cpu`: Force CPU mode.
- `--debug` or `-x`: Keep extra debug output.
- `--dev`: Save intermediate results for investigation.
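As an illustration of how these flags might be combined for a batch run, the sketch below assembles and prints a single command. The directory names and the worker count of 4 are hypothetical, and the assumption that `--mode`, `--workers`, and `--json` can be used together is not confirmed by this reference.

```shell
# Dry run: assemble a batch command and print it; nothing is executed.
mode="page"   # recommended unless a different downstream structure is required
workers=4     # hypothetical value; --workers parallelizes directory parsing

cmd="ppx parse scans -o out --mode $mode --workers $workers --json"
echo "$cmd"
```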