Caveman Compression

Caveman Compression is a semantic compression technique designed to optimize text for Large Language Models (LLMs) by removing predictable grammatical structures while preserving critical information^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md]. The method leverages the LLM's ability to reliably reconstruct syntax and filler words, allowing users to reduce token usage by approximately 15% to 58% without losing semantic meaning^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].

This approach is particularly effective for managing limited context windows, optimizing [[RAG systems]], and reducing costs associated with long prompts or agent reasoning chains^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].

Core Principles

The technique is based on the insight that LLMs are adept at inferring language gaps. Therefore, compression focuses on retaining "unpredictable" content—such as facts, data, and constraints—while stripping "predictable" elements^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].

  • Removed (Predictable): Articles ("a", "the"), auxiliary verbs ("is", "are"), connectives ("therefore", "however"), passive voice, and filler words ("very", "quite")^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].
  • Retained (Unpredictable): Specific data points (numbers, dates), technical terms, constraints, and named entities^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].
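
To make the split concrete, here is a deliberately naive word-level filter; the word list and function name are illustrative only, since the real methods (next section) score predictability in context or with linguistic rules rather than a fixed vocabulary:

```python
# Illustrative only: a fixed-vocabulary filter showing the removed/retained
# split. Real implementations score predictability in context.
PREDICTABLE = {
    "a", "an", "the",                           # articles
    "is", "are", "was", "were", "be", "been",   # auxiliary verbs
    "therefore", "however", "thus",             # connectives
    "very", "quite", "really",                  # filler
}

def naive_caveman(text: str) -> str:
    """Drop predictable words; keep facts, numbers, and entities untouched."""
    return " ".join(
        w for w in text.split() if w.lower().strip(".,") not in PREDICTABLE
    )

print(naive_caveman("The cache is very slow, therefore we should add an index."))
# -> "cache slow, we should add index."
```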

Compression Methods

There are three primary methods to implement Caveman Compression, varying in cost, speed, and compression rate^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md]:

| Method | Basis | Compression Rate | Cost | Speed |
| --- | --- | --- | --- | --- |
| LLM-based | Context-aware (OpenAI API) | 40–58% | Paid | ~2s/request |
| MLM-based | Token predictability (RoBERTa) | 20–30% | Free (Offline) | ~1–5s/doc |
| NLP-based | Linguistic rules (spaCy) | 15–30% | Free (Offline) | <100ms |

1. LLM-based

Utilizes the OpenAI API to achieve the highest compression rates by understanding context^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md]. It requires an API key and is suitable for scenarios where token cost is a primary concern.

```bash
python caveman_compress.py compress "Your verbose text here"
```
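
The script's internals are not reproduced in the source, but the approach can be sketched with the official `openai` Python client; the prompt wording and model name below are assumptions, not the repository's actual implementation:

```python
# Minimal sketch of LLM-based Caveman Compression via the OpenAI API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CAVEMAN_PROMPT = (
    "Compress the following text. Remove articles, auxiliary verbs, "
    "connectives, and filler words. Keep every number, name, technical "
    "term, and constraint exactly. Use short, active-voice sentences."
)

def llm_compress(text: str, model: str = "gpt-4o-mini") -> str:
    response = client.chat.completions.create(
        model=model,          # illustrative choice, not the script's default
        temperature=0,        # deterministic output for reproducible results
        messages=[
            {"role": "system", "content": CAVEMAN_PROMPT},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content
```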

2. MLM-based

Uses a Masked Language Model (RoBERTa) to remove the top-k most predictable tokens^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md]. This method runs locally (approx. 500MB model size) and balances cost-free operation with good compression quality.

```bash
python caveman_compress_mlm.py compress "Your verbose text here"
```
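
The core idea can be sketched with Hugging Face `transformers`: mask each token in turn, score how confidently `roberta-base` predicts the original token, and drop the most predictable ones. This is a sketch under stated assumptions, not the repository's `caveman_compress_mlm.py`; in particular, `drop_ratio` and the one-token-at-a-time loop are illustrative:

```python
# Sketch of MLM-based compression: drop the top-k most predictable tokens.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")
model.eval()

def mlm_compress(text: str, drop_ratio: float = 0.25) -> str:
    ids = tok(text, return_tensors="pt")["input_ids"][0]
    scores = []
    # Mask one position at a time; the probability the model assigns to the
    # original token measures how predictable (droppable) it is.
    for i in range(1, len(ids) - 1):                 # skip <s> and </s>
        masked = ids.clone()
        masked[i] = tok.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        scores.append((torch.softmax(logits, -1)[ids[i]].item(), i))
    # Remove the top-k most predictable tokens, keeping original order.
    k = int(len(scores) * drop_ratio)
    drop = {i for _, i in sorted(scores, reverse=True)[:k]}
    kept = [int(ids[i]) for i in range(1, len(ids) - 1) if i not in drop]
    return tok.decode(kept).strip()
```

A loop of one forward pass per token is also consistent with the ~1–5s/doc figure in the table above.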

3. NLP-based

Relies on spaCy rule-based matching^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md]. It is the fastest method, supports 15+ languages (including Chinese and Japanese), and is fully offline.

```bash
python caveman_compress_nlp.py compress "Your verbose text here"
```
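
A minimal sketch of rule-based stripping with spaCy follows; the part-of-speech set stands in for the repository's actual matcher rules, which the source does not reproduce, and `en_core_web_sm` is an assumed model choice:

```python
# Sketch of NLP-based compression: drop tokens whose POS tags mark them
# as grammatically predictable.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this model is installed

# Determiners, auxiliaries, coordinating conjunctions, and particles.
DROP_POS = {"DET", "AUX", "CCONJ", "PART"}

def nlp_compress(text: str) -> str:
    doc = nlp(text)
    # Keep a token if its POS is unpredictable, or if it belongs to a
    # named entity (names must always be preserved).
    return "".join(
        t.text_with_ws for t in doc
        if t.pos_ not in DROP_POS or t.ent_type_
    ).strip()

print(nlp_compress("The index should be added to the most frequently used columns."))
# -> "index added to most frequently used columns."
```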

Core Compression Rules

Regardless of the specific method used, Caveman Compression follows a set of semantic rules to ensure the output remains usable and human-readable^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md]:

  1. Remove Connectives: Eliminate words like "therefore," "however," and "in order to."
  2. Shorten Sentences: Limit sentences to 2–5 words to represent atomic thoughts.
  3. Use Simple Verbs: Prefer direct action verbs like "do," "make," "fix," or "check" over abstract nouns like "facilitate" or "optimize."
  4. Be Specific: Use explicit lists (e.g., "test five, test six") rather than ranges ("test values 5-6").
  5. Active Voice: Use "calculate value" instead of "value is calculated."
  6. Preserve Meaning: Always retain numbers, sizes, names, and constraints.

Example Comparison

The following example demonstrates the transformation and resulting token savings^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md]:

  • Original (70 tokens): "In order to optimize the database query performance, we should consider implementing an index on the frequently accessed columns..."
  • Compressed (50 tokens): "Need fast queries. Check which columns used most. Add index to those columns..."
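
Savings like these can be measured with any tokenizer; the sketch below uses OpenAI's `tiktoken` with an assumed encoding, and since the snippets above are truncated, the counts it prints will differ from the full-text 70/50 figures:

```python
# Count tokens before and after compression to compute the reduction.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # assumed encoding choice

original = ("In order to optimize the database query performance, we should "
            "consider implementing an index on the frequently accessed columns...")
compressed = ("Need fast queries. Check which columns used most. "
              "Add index to those columns...")

o, c = len(enc.encode(original)), len(enc.encode(compressed))
print(f"{o} -> {c} tokens ({1 - c / o:.0%} reduction)")
```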

Use Cases

Caveman Compression is highly effective for technical and internal contexts where exact phrasing is less critical than information density^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].

  • Best Suited:
    • System prompts and reasoning/thinking blocks.
    • Internal documentation and knowledge bases.
    • [[RAG system]] retrieval chunks.
    • Agent chain-of-thought processes.
  • Not Suitable:
    • User-facing content.
    • Marketing or legal documents.
    • Communication relying on emotional nuance.

Benchmark Results

Testing indicates an average compression rate of 40%, with specific performance varying by text type^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].

| Scenario | Original Tokens | Compressed Tokens | Reduction |
| --- | --- | --- | --- |
| System Prompt | 171 | 72 | 58% |
| API Documentation | 137 | 79 | 42% |
| Resume | 201 | 156 | 22% |

In controlled fact-retention tests, the method preserved 13/13 facts (100%), verifying that the compression is semantically lossless regarding critical information^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].
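
A fact-retention check in this spirit can be as simple as asserting that each critical fact still appears after compression; the fact list and verbatim-match rule below are illustrative, not the benchmark's actual protocol:

```python
# Verify that critical facts survive compression.
def facts_retained(compressed: str, facts: list[str]) -> float:
    """Return the fraction of facts found verbatim in the compressed text."""
    hits = sum(1 for f in facts if f.lower() in compressed.lower())
    return hits / len(facts)

facts = ["500MB", "RoBERTa", "spaCy", "15+ languages"]
compressed = "MLM method: RoBERTa, 500MB, offline. NLP method: spaCy, 15+ languages."
print(facts_retained(compressed, facts))  # -> 1.0
```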

Related

  • [[Token Optimization]]
  • [[Prompt Engineering]]
  • [[RAG System]]

Sources

  • 001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md