Revolutionizing LLM Inference: PyramidInfer's Efficient KV Cache Compression
By Harper Montgomery
24 May, 2024
Tags: LLMs, KV Cache Compression, PyramidInfer, Efficient Inference, GPU Memory Usage