Markovian Thinker: Slash LLM Costs with Linear AI Reasoning (2025)

The Markovian Thinker marks a significant step toward reducing the compute and inference costs of reasoning with large language models (LLMs). Reinforcement learning can markedly improve LLM reasoning, but the standard recipe, generating an ever-longer chain of thought (LongCoT) and scaling up the number of 'thinking tokens', is expensive: because the model attends over its entire growing context, computational cost rises quadratically with the length of the thinking process.

To address this, researchers from Mila and Microsoft Research introduced the Markovian Thinker, a policy that reasons over a fixed-size state, decoupling thinking length from context size. This shift has profound implications: computational cost becomes linear in thinking length, and memory usage stays constant no matter how long the model thinks. Delethink, the paradigm's key component, organizes reasoning into fixed-size chunks and forces the policy to learn to carry its progress across chunk boundaries.

The results are striking. A DeepSeek R1-Distill 1.5B model trained with Delethink outperforms LongCoT-RL training in both accuracy and efficiency: it can think up to 24K tokens and reaches 49% accuracy on AIME'24 with reasoning traces of 36K tokens. The linear-compute property also cuts training cost by roughly a factor of 4, from 27 H100-months to 7 H100-months at an average thinking length of 94K tokens. Finally, the approach transfers to state-of-the-art models: GPT-OSS 120B already exhibits robust Markovian thinking across a range of domains.
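The chunked, fixed-state loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `generate` callable stands in for any LLM decoding call, and the chunk size, carryover length, and `FINAL ANSWER` stop marker are all illustrative assumptions.

```python
def delethink_reason(generate, prompt, chunk_size=8192,
                     carryover=512, max_chunks=16):
    """Reason in fixed-size chunks, Delethink-style (sketch).

    After each chunk, the context is reset to the prompt plus a short
    tail of the previous chunk, so memory stays constant and per-chunk
    compute is bounded regardless of total thinking length.
    """
    trace = ""
    state = prompt  # the fixed-size Markovian state
    for _ in range(max_chunks):
        chunk = generate(state, chunk_size)  # decode up to chunk_size tokens
        trace += chunk
        if "FINAL ANSWER" in chunk:  # assumed stop marker
            break
        # Reset: keep only the prompt and the tail of the last chunk,
        # forcing the policy to carry progress across the boundary.
        state = prompt + chunk[-carryover:]
    return trace
```

Because `state` never grows beyond the prompt plus a fixed carryover, each decoding call sees a bounded context, which is what turns quadratic attention cost into linear cost in total thinking length.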
This breakthrough not only reduces the computational burden but also opens up new possibilities for the development of reasoning models, suggesting that non-quadratic sequence architectures may be particularly beneficial for enhancing reasoning capabilities.
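The quadratic-versus-linear distinction can be made concrete with a back-of-envelope count of token-pair attention interactions. This idealized sketch ignores the prompt and carryover tokens; the function names and the 8K chunk size are illustrative assumptions, not figures from the paper.

```python
def longcot_pairs(n: int) -> int:
    """Attention interactions for one pass over n tokens: ~ n^2 / 2."""
    return n * (n - 1) // 2

def delethink_pairs(n: int, chunk: int) -> int:
    """Same count when reasoning is split into fixed-size chunks.

    Each chunk attends only within itself, so the total grows
    linearly in n (roughly n * chunk / 2) instead of quadratically.
    """
    full, rem = divmod(n, chunk)
    return full * longcot_pairs(chunk) + longcot_pairs(rem)
```

For a 94K-token trace with 8K chunks, the chunked count is roughly an order of magnitude smaller than the full-attention count, and the gap widens linearly as traces get longer.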

Article information

Author: Prof. Nancy Dach
