LLM-based compression

LLM-based compression is a semantic text reduction technique designed to optimize input for Large Language Models (LLMs) by removing predictable syntactic elements while retaining factual content^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].

The core insight behind this method is that LLMs are adept at inferring grammatical structure and filling linguistic gaps. Therefore, input text can be aggressively compressed by stripping away "reconstructable" components—such as articles, conjunctions, and passive voice constructions—without losing the underlying meaning or the model's ability to process the information^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].

Mechanism

This approach operates on the principle that only the "unpredictable" parts of a text—the entities, constraints, and specific facts—are strictly necessary for the model to perform reasoning tasks^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md]. By treating the LLM as a semantic decoder, the compression process functions as an encoder that strips syntax before the tokens ever reach the context window^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].
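
A minimal sketch of this encoder step, assuming the OpenAI Python SDK; the model name and the prompt wording are illustrative assumptions, not prescribed by the source:

```python
# Minimal sketch of the encoder step: an LLM strips predictable syntax
# before the tokens reach the downstream context window.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

COMPRESS_PROMPT = (
    "Compress the following text for an LLM reader. Remove articles, "
    "conjunctions, filler words, and passive constructions. Keep every "
    "number, date, name, technical term, and constraint verbatim. "
    "Return only the compressed text."
)

def compress(text: str) -> str:
    """Encoder step: strip reconstructable syntax, keep the facts."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice, not from the source
        messages=[
            {"role": "system", "content": COMPRESS_PROMPT},
            {"role": "user", "content": text},
        ],
        temperature=0,  # deterministic output suits a compression pass
    )
    return response.choices[0].message.content
```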

What is Removed

The method targets high-frequency, predictable linguistic features:

  • Function Words: Articles and copulas such as "a", "the", and "is".
  • Connectives: Words used for flow like "therefore", "however", and "because".
  • Filler Phrases: Qualifiers such as "very", "quite", or "essentially".
  • Passive Structures: Phrases like "is calculated by"^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].

What is Retained

To keep the compression semantically lossless, the following elements are preserved (see the worked example after this list):

  • Factual Data: Numbers, dates, and specific names.
  • Technical Terms: Domain-specific jargon or constraints (e.g., "O(log n)", "99.9% uptime").
  • Logic and Constraints: Critical parameters and specific conditions^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].
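
For illustration, a hedged before-and-after pair (the sentence is invented for this note, not taken from the source):

  • Before: "The throughput was measured to be approximately 1,200 requests per second, and therefore the service is essentially compliant with the 99.9% uptime requirement."
  • After: "Throughput measured: ~1,200 req/s. Service compliant with 99.9% uptime requirement."

The articles, the connective "therefore", the filler "essentially", and the passive frame "was measured to be" are collapsed, while the figures and the uptime constraint survive verbatim.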

Compression Characteristics

Compared with other optimization strategies such as NLP-based rule engines or Masked Language Models (MLMs), the LLM-based variant offers the highest compression ratio and the strongest context awareness^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].

  • Compression Rate: Achieves a 40–58% reduction in token count^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md] (see the measurement sketch after this list).
  • Cost: Typically requires an API key (e.g., OpenAI) to perform the compression, incurring a small upfront cost to save downstream tokens^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].
  • Quality: Delivers the best quality of the three approaches, because the LLM understands context and nuance better than rule-based systems^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].
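
The reduction is easy to check locally. A sketch using the tiktoken tokenizer; the sample strings echo the worked example above, and the 40–58% range is the source's figure, while this code merely measures one sample:

```python
# Measure the token savings of one compressed sample with tiktoken.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

original = ("The throughput was measured to be approximately 1,200 requests "
            "per second, and therefore the service is essentially compliant "
            "with the 99.9% uptime requirement.")
compressed = ("Throughput measured: ~1,200 req/s. "
              "Service compliant with 99.9% uptime requirement.")

orig_n = len(enc.encode(original))
comp_n = len(enc.encode(compressed))
print(f"{orig_n} -> {comp_n} tokens ({1 - comp_n / orig_n:.0%} saved)")
```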

Applications

LLM-based compression is particularly effective for token-constrained scenarios where preserving semantic accuracy is critical^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md]:

  • System Prompts: Reducing the overhead of instructional text.
  • Retrieval-Augmented Generation (RAG): Compressing documents before storing them in vector databases or before injection into the context window (sketched after this list)^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].
  • Chain-of-Thought (CoT): Compressing intermediate reasoning steps or long agent memory chains.
  • Internal Documentation: Optimizing technical documentation or API references for LLM consumption^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].
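
A hedged sketch of the RAG ingestion path, reusing the `compress` encoder from the Mechanism sketch; the embedding model and the vector-store interface are assumptions, not part of the source:

```python
# Compress each document before embedding and storing it, so every chunk
# costs fewer tokens both at rest and when injected at query time.
from openai import OpenAI

client = OpenAI()

def ingest(documents: list[str], store) -> None:
    """Compress, embed, and store each document.

    `store` stands in for any vector database exposing an
    `add(vector, text)` method (a hypothetical interface);
    `compress` is the encoder sketched under Mechanism.
    """
    for doc in documents:
        small = compress(doc)  # fewer tokens stored per chunk
        vector = client.embeddings.create(
            model="text-embedding-3-small",  # illustrative model choice
            input=small,
        ).data[0].embedding
        store.add(vector, small)  # the compressed text is what gets retrieved
```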

Limitations

While effective for machine processing, the output (often described as "Caveman" speech because of its telegraphic nature) is not suitable for user-facing content, marketing copy, or legal documents where tone and full syntax are required^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].

Related

  • [[Prompt Engineering]]
  • [[Token Optimization]]
  • [[RAG Systems]]

Sources

  • 001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md