<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Sri Balaji Muruganandam's Blog]]></title><description><![CDATA[I share the latest news in AI and data, explained simply. Excited about tech that makes life better - one breakthrough at a time. No jargon, just clear, practic]]></description><link>https://blog.sribalaji.io</link><generator>RSS for Node</generator><lastBuildDate>Wed, 06 May 2026 10:21:22 GMT</lastBuildDate><atom:link href="https://blog.sribalaji.io/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Microsoft BitNet Explained: My Learnings, Insights, and Why It Matters in the LLM Race]]></title><description><![CDATA[Try it Live: BitNet
Microsoft launched a language model called BitNet with 2 billion parameters trained on 4 trillion tokens. The specialty of this model is how the parameter values are stored. BitNet b1.58 uses only about 1.58 bits per parameter
Pa...]]></description><link>https://blog.sribalaji.io/microsoft-bitnet</link><guid isPermaLink="true">https://blog.sribalaji.io/microsoft-bitnet</guid><category><![CDATA[BitNet]]></category><category><![CDATA[Microsoft]]></category><category><![CDATA[llm]]></category><category><![CDATA[Artificial Intelligence]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[transformers]]></category><category><![CDATA[nlp]]></category><category><![CDATA[Deep Learning]]></category><category><![CDATA[Open Source]]></category><dc:creator><![CDATA[Sri Balaji Muruganandam]]></dc:creator><pubDate>Thu, 17 Apr 2025 02:31:28 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1744859160493/ef94afbd-4059-4c4e-9d56-011797124d61.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Try it Live: <a target="_blank" href="https://bitnet-demo.azurewebsites.net/">BitNet</a></p>
<p>Microsoft launched a language model called BitNet with 2 billion parameters trained on 4 trillion tokens. The specialty of this model is how the parameter values are stored: BitNet b1.58 uses only about 1.58 bits per parameter.</p>
<h2 id="heading-parameters">Parameters</h2>
<p>As we know, most LLMs use 16 or 32 bits to store each parameter (parameters are like the knobs on an antenna that you tune and adjust to get a clearer TV channel signal).</p>
<p>In BitNet, each parameter in the model takes one of just three values: +1, 0, or -1.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744856206831/2eeefa70-1417-4eac-bd41-e34affc9b479.png" alt class="image--center mx-auto" /></p>
<p>This is where the value of 1.58 bits per parameter comes from: three possible values carry log<sub>2</sub>(3) ≈ 1.58 bits of information each. With parameters restricted to these three values, the usual complex matrix and tensor multiplications reduce to simple additions and subtractions, which are much faster and easier for computers.</p>
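<p>As a rough illustration (my own sketch, not Microsoft's code), here is how a dot product with ternary weights avoids multiplication entirely, and where the 1.58 figure comes from:</p>

```python
import math

# Three possible weight values carry log2(3) bits of information each,
# which is where the "1.58-bit" name comes from.
bits_per_parameter = math.log2(3)
print(round(bits_per_parameter, 2))  # 1.58

def ternary_dot(weights, activations):
    """Dot product with ternary weights: only addition and subtraction."""
    total = 0.0
    for w, x in zip(weights, activations):
        if w == 1:
            total += x   # +1 -> add the activation
        elif w == -1:
            total -= x   # -1 -> subtract it
        # 0 -> skip entirely, no work at all
    return total

weights = [1, 0, -1, 1]
activations = [0.5, 2.0, 1.5, -0.25]
print(ternary_dot(weights, activations))  # 0.5 - 1.5 - 0.25 = -1.25
```

<p>Notice that a weight of 0 means the activation is skipped completely, so sparsity comes for free on top of the cheap arithmetic.</p>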
<h2 id="heading-why-and-what-is-the-big-deal">Why and What is the big deal?</h2>
<p>By using only three values per parameter, the model is made much smaller and faster. It also uses less memory and energy than traditional models. More detailed performance info is on their GitHub repo: <a target="_blank" href="https://github.com/microsoft/BitNet">Microsoft BitNet</a></p>
<p>They also claim that, even though it uses fewer bits, BitNet can match and in many cases outperform traditional models of similar size that use the usual float-based parameters.</p>
<p>In their Hugging Face model description, I noticed that the memory usage and latency are significantly lower than other state-of-the-art models! They also report energy usage in joules, which is likewise lower, making the model more accessible and sustainable.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744856741343/de6e92a0-12db-4474-a817-9d74f0adc2be.png" alt class="image--center mx-auto" /></p>
<p>For other performance evaluation metrics, please refer to their Hugging Face page: <a target="_blank" href="https://huggingface.co/microsoft/bitnet-b1.58-2B-4T">Hugging Face Link</a></p>
<p>This basically makes it easy to run the model on local computers, on both CPU and GPU. On their Hugging Face model page I noticed that <strong>to get the most efficient performance, you should run it with bitnet.cpp</strong>.</p>
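<p>From what I recall of the bitnet.cpp README, the local setup looks roughly like this. The script names and flags below are my best recollection and may have changed, so treat this as a sketch and follow the repo's README for the authoritative commands:</p>

```shell
# Sketch of local bitnet.cpp setup (verify against the repo's README).
git clone --recursive https://github.com/microsoft/BitNet.git
cd BitNet
pip install -r requirements.txt

# Download the 2B model weights and build the environment
# (the exact --hf-repo name and -q quantization flag may differ).
python setup_env.py --hf-repo microsoft/BitNet-b1.58-2B-4T-gguf -q i2_s

# Run inference locally on CPU.
python run_inference.py -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf -p "Explain BitNet in one sentence"
```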
<h2 id="heading-formats-of-the-model">Formats of the model</h2>
<p>On the page, they introduced the model in three formats:</p>
<ul>
<li><p>1.58bit format</p>
</li>
<li><p>BF16 - Bfloat16 format</p>
</li>
<li><p>GGUF - GPT Generated Unified Format</p>
</li>
</ul>
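<p>A quick back-of-the-envelope comparison (my own arithmetic, not official numbers) shows why the 1.58-bit format matters for a 2-billion-parameter model. GGUF is a container format, so its size depends on the quantization stored inside it:</p>

```python
params = 2_000_000_000  # ~2B parameters

def weight_gigabytes(bits_per_param):
    """Approximate weight storage in GB (weights only, ignoring overhead)."""
    return params * bits_per_param / 8 / 1e9

print(weight_gigabytes(16))               # BF16: 4.0 GB
print(round(weight_gigabytes(1.58), 2))   # 1.58-bit ternary: roughly 0.4 GB
```

<p>That is roughly a 10x reduction in weight storage, which is what makes running the model on an ordinary laptop plausible.</p>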
]]></content:encoded></item></channel></rss>