NLP-based compression

NLP-based compression is a text optimization strategy designed to reduce token usage in Large Language Model (LLM) prompts by removing predictable linguistic structures while preserving semantic content^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].

This approach operates on the principle that LLMs are proficient at inferring grammar, syntax, and functional words (such as articles and conjunctions). By stripping away these "reconstructable" elements and retaining only the core entities, facts, and data, developers can achieve significant reductions in token costs—typically ranging from 15% to 30%—without degrading the quality of the model's output^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].

Characteristics

NLP-based compression is distinguished by its speed, cost-efficiency, and multilingual support compared to other methods^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].

  • Speed: It is the fastest of the compared methods, typically processing text in under 100 milliseconds^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].
  • Cost: It is free to use and operates completely offline^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].
  • Language Support: Unlike LLM-based methods that may rely on specific API training data, NLP-based rule systems generally support a wide range of languages, including English, Chinese, Japanese, Russian, and others (15+ languages total)^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].
  • Compression Rate: It typically achieves a compression rate of 15-30%^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].

Technical Implementation

This method relies on rule-based natural language processing, typically using libraries such as spaCy^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md]. It identifies and removes specific categories of "predictable" words while keeping "unpredictable" information carriers^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].

Removal Targets

The algorithm targets elements of the text that add structure but carry little independent factual weight:

  • Grammar words: Articles (e.g., "a", "the") and auxiliary or linking verbs (e.g., "is", "are")^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].
  • Connectors: Transition words used for flow, such as "therefore", "however", and "because"^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].
  • Passive voice: Constructions like "is calculated by" are often simplified to active forms or removed^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].
  • Filler words: Qualifiers that do not add precision, such as "very", "quite", or "essentially"^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].

Retention Targets

Conversely, the system preserves data that is essential for the LLM to perform the specific task:

  • Factual data: Numbers, dates, and proper names^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].
  • Technical terms: Specific domain language, such as "O(log n)" or "binary search"^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].
  • Constraints: Critical modifiers like "medium-large" or "frequently accessed"^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].
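
The removal and retention targets above can be sketched as a token filter. The source attributes the real implementation to spaCy's part-of-speech tagging; the hard-coded word lists below are illustrative assumptions standing in for that tagger, so this is a minimal dependency-free sketch rather than the actual rule set.

```python
import re

# Hypothetical word lists standing in for POS-based detection.
# A production version would tag tokens (e.g., with spaCy) and drop
# determiners, auxiliaries, connectors, and filler qualifiers.
GRAMMAR_WORDS = {"a", "an", "the", "is", "are", "was", "were", "be", "been"}
CONNECTORS = {"therefore", "however", "because", "thus", "moreover"}
FILLERS = {"very", "quite", "essentially", "basically", "really"}
REMOVABLE = GRAMMAR_WORDS | CONNECTORS | FILLERS

def compress(text: str) -> str:
    """Drop predictable words; keep entities, numbers, and technical terms."""
    kept = []
    for token in text.split():
        # Strip punctuation for the lookup, but emit the original token,
        # so terms like "O(log n)" survive intact.
        bare = re.sub(r"[^\w-]", "", token).lower()
        if bare in REMOVABLE:
            continue
        kept.append(token)
    return " ".join(kept)

original = "The binary search is very efficient because it runs in O(log n) time."
print(compress(original))
# → "binary search efficient it runs in O(log n) time."
```

Note that the technical term "O(log n)" and the factual content pass through untouched, while articles, fillers, and connectors are dropped, in line with the retention targets above.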

Core Rules

To ensure the compressed text remains effective for inference, NLP-based compression often adheres to specific formatting rules^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md]:

  1. Remove connectors: Strip out words like "therefore" or "in order to"^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].
  2. Atomic sentences: Break complex thoughts into short sentences containing 2-5 words^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].
  3. Use simple verbs: Prefer direct action verbs like "do", "make", "fix", or "check" over abstract verbs like "facilitate" or "optimize"^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].
  4. Active voice: Use phrasing like "calculate value" instead of "value is calculated"^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].
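
Rules 1 and 2 can be approximated with plain regular expressions. The connector list and the clause-splitting heuristic below are assumptions for illustration only; rules 3 and 4 (verb choice and active voice) require part-of-speech information and are omitted from this sketch.

```python
import re

# Hypothetical connector list; a fuller implementation would use a POS tagger.
CONNECTOR_PATTERN = re.compile(
    r"\b(therefore|however|because|in order to|so that)\b,?\s*", re.IGNORECASE
)

def atomize(text: str) -> list:
    """Rules 1 and 2: strip connectors, then split into short standalone clauses."""
    stripped = CONNECTOR_PATTERN.sub("", text)
    # Split on sentence/clause boundaries to approximate atomic sentences.
    parts = re.split(r"[.;,]\s*", stripped)
    return [p.strip() for p in parts if p.strip()]

verbose = "The value is calculated by the engine; therefore, caching is enabled."
print(atomize(verbose))
# → ['The value is calculated by the engine', 'caching is enabled']
```

Each resulting clause is then a candidate for the word-level removal pass described under Technical Implementation.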

Comparison with Alternatives

Within the context of semantic compression frameworks (such as Caveman Compression), NLP-based compression occupies a specific niche alongside LLM-based and MLM-based approaches^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].

Feature            NLP-based       LLM-based           MLM-based
Mechanism          Rules (spaCy)   Context-aware API   Masked language models
Compression rate   15-30%          40-58%              20-30%
Speed              < 100 ms        ~2 s                ~1-5 s
Cost               Free            Paid API            Free (local model)
Offline support    Yes             No                  Yes

Use Cases

NLP-based compression is ideal for high-volume or latency-sensitive applications where some compression is beneficial, but the highest possible compression rate is not critical^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].

  • Internal Documentation: Compressing knowledge bases or technical manuals before feeding them into an LLM^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].
  • Real-time Systems: Applications where the ~2s latency of an LLM-based compression call is unacceptable^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].
  • Cost-Sensitive Operations: Scenarios where avoiding API costs for compression is a priority^[001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md].

Sources

  • 001-TODO__Caveman_Compression_-_LLM_语义压缩方法.md