<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://rpallas92.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://rpallas92.github.io/" rel="alternate" type="text/html" /><updated>2026-04-17T13:55:37+00:00</updated><id>https://rpallas92.github.io/feed.xml</id><title type="html">Ricardo Pallás</title><subtitle>Software engineering blog</subtitle><entry><title type="html">Fast Data Persistence: GrausDB + Zero-Copy Serialization</title><link href="https://rpallas92.github.io/zero-copy-serde/" rel="alternate" type="text/html" title="Fast Data Persistence: GrausDB + Zero-Copy Serialization" /><published>2025-09-27T08:00:00+00:00</published><updated>2025-09-27T08:00:00+00:00</updated><id>https://rpallas92.github.io/zero-copy-serde</id><content type="html" xml:base="https://rpallas92.github.io/zero-copy-serde/"><![CDATA[<h1 id="fast-data-persistence-grausdb--zero-copy-serialization">Fast Data Persistence: GrausDB + Zero-Copy Serialization</h1>

<h2 id="table-of-contents">Table of Contents</h2>

<ul>
  <li><a href="#introduction">Introduction</a></li>
  <li><a href="#grausdb-summary">GrausDB Summary</a></li>
  <li><a href="#the-problem-with-traditional-serialization">The Problem with Traditional Serialization</a></li>
  <li><a href="#zero-copy-serialization">Zero-Copy Serialization</a></li>
  <li><a href="#example-the-product-struct">Example: The Product Struct</a></li>
  <li><a href="#product-stock-update">Product Stock Update</a></li>
  <li><a href="#results">Results</a></li>
  <li><a href="#zero-copy-vs-traditional-json-serialization">Zero-Copy vs Traditional JSON Serialization</a></li>
  <li><a href="#important-caveats-and-pitfalls">Important Caveats and Pitfalls</a></li>
  <li><a href="#conclusion">Conclusion</a></li>
</ul>

<h2 id="introduction">Introduction</h2>

<p>Experimenting with short pieces of code to make them fast is a fun hobby. A while ago, I wrote about <a href="https://rpallas.xyz/math-parser/">Optimizing a simple math parser</a>, and the obsession hasn’t faded. Today, I want to talk about a different kind of speed: the speed of saving and loading your data using <a href="https://github.com/RPallas92/GrausDB">GrausDB</a>.</p>

<p>We often overlook it, but serialization (the process of converting our data structures into a format that can be stored or transmitted) has a high impact on performance. We write our beautiful code, with structs and objects, and then, when it’s time to save them to a database, we just grab any serialization library (even <code class="language-plaintext highlighter-rouge">JSON.stringify</code> in JavaScript) and hope for the best. But under the hood, a lot of copying, allocating, and processing is happening. It all counts for perf.</p>

<p>What if we could just… not do that? <strong>What if we could take our in-memory data and persist it directly</strong>, with no copies, no intermediate formats, no fuss? This is the promise of zero-copy serialization, and when you combine it with a high-performance embedded database like <a href="https://github.com/grausdb/grausdb">GrausDB</a>, the results are great!</p>

<p><strong>This blog post is about showing you how to persist structs in GrausDB easily and with incredible performance.</strong></p>

<h2 id="grausdb-summary">GrausDB Summary</h2>

<p>I created GrausDB with a simple idea in mind: provide a very simple, low-level API that’s easy to use. An API so straightforward that you don’t need to constantly check the documentation. It has just 4 methods:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">get(key)</code> - Retrieve a value</li>
  <li><code class="language-plaintext highlighter-rouge">set(key, value)</code> - Store a value</li>
  <li><code class="language-plaintext highlighter-rouge">delete(key)</code> - Remove a value</li>
  <li><code class="language-plaintext highlighter-rouge">update_if(key, update_fn, value_if_missing, predicate)</code> - Atomically update a value if a certain predicate is satisfied</li>
</ul>

<p>That’s it. You work directly with slices of bytes (<code class="language-plaintext highlighter-rouge">&amp;[u8]</code>), giving you complete control. And with <code class="language-plaintext highlighter-rouge">update_if</code>, you can perform atomic updates without worrying about transactions, commits, or flush methods (I didn’t include those because I don’t want you to think about these details when using the lib).</p>

<h2 id="the-problem-with-traditional-serialization">The Problem with Traditional Serialization</h2>

<p>Imagine you have a book. Every time you want to save its contents, you get a fresh stack of paper and meticulously copy the entire book, word for word. When you want to read it back, you take your copy and read from it. This is how traditional serialization often works.</p>

<p>When you serialize an object (like a <code class="language-plaintext highlighter-rouge">struct</code> in Rust), the library typically allocates a new buffer and then walks through your object, copying its data field by field into that buffer. Deserialization is the reverse: it reads the buffer, allocates memory for a new object, and then copies the data from the buffer into this new object.</p>

<p>This “photocopying” has a cost:</p>

<ul>
  <li><strong>CPU cycles:</strong> All that data copying keeps your CPU busy.</li>
  <li><strong>Memory allocation:</strong> Constantly creating new objects and buffers.</li>
  <li><strong>Cache misses:</strong> When data is scattered around in memory, the CPU can’t efficiently use its cache, which slows things down.</li>
</ul>
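
<p>To make that cost concrete, here is a minimal sketch of a copy-based encoder and decoder. The <code class="language-plaintext highlighter-rouge">OwnedProduct</code> type, the <code class="language-plaintext highlighter-rouge">encode</code> and <code class="language-plaintext highlighter-rouge">decode</code> functions, and the byte format (little-endian <code class="language-plaintext highlighter-rouge">u16</code> stock, <code class="language-plaintext highlighter-rouge">u16</code> name length, then the name bytes) are all assumptions made up for this illustration; they are not what any particular library does.</p>

```rust
// Hypothetical owned type: decoding has to allocate a new String for `name`.
struct OwnedProduct {
    stock: u16,
    name: String,
}

// Copy-based encoding: allocate a fresh buffer and copy every field into it.
// Assumes the name is shorter than 65,536 bytes so its length fits in a u16.
fn encode(p: &OwnedProduct) -> Vec<u8> {
    let mut out = Vec::with_capacity(4 + p.name.len());
    out.extend_from_slice(&p.stock.to_le_bytes());
    out.extend_from_slice(&(p.name.len() as u16).to_le_bytes());
    out.extend_from_slice(p.name.as_bytes());
    out
}

// Copy-based decoding: build a new object and copy the data out of the buffer.
fn decode(bytes: &[u8]) -> Option<OwnedProduct> {
    if bytes.len() < 4 {
        return None;
    }
    let stock = u16::from_le_bytes([bytes[0], bytes[1]]);
    let len = u16::from_le_bytes([bytes[2], bytes[3]]) as usize;
    let raw = bytes.get(4..4 + len)?;
    // `to_vec` allocates on the heap and copies the name bytes.
    let name = String::from_utf8(raw.to_vec()).ok()?;
    Some(OwnedProduct { stock, name })
}
```

<p>Every round trip through <code class="language-plaintext highlighter-rouge">encode</code> and <code class="language-plaintext highlighter-rouge">decode</code> pays for one buffer allocation, one <code class="language-plaintext highlighter-rouge">String</code> allocation, and two copies of the name.</p>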

<p>For most applications, this is fine. But when you’re building a high-throughput system, this overhead becomes a bottleneck.</p>

<h2 id="zero-copy-serialization">Zero-Copy Serialization</h2>

<p>Now, imagine instead of photocopying the book, you could just place a bookmark in it. This bookmark tells you exactly where the book is and how it’s laid out. To “read” it, you just look at the original. No copies needed.</p>

<p>This is the core idea of zero-copy serialization. We ensure that the way our data is organized in memory is <em>identical</em> to how we want to store it on disk. If the memory layout and the disk layout match, <strong>we can just take a raw slice of memory and write it to our database</strong>. To read it back, we can take the raw bytes from the database and interpret them directly as our data structure, without creating any new objects or allocating extra memory in the heap.</p>
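
<p>As a rough, standard-library-only sketch of the idea, the function below interprets a byte slice in place and hands back a view whose <code class="language-plaintext highlighter-rouge">name</code> borrows from the input instead of owning a copy. The <code class="language-plaintext highlighter-rouge">ProductView</code> type, the <code class="language-plaintext highlighter-rouge">view</code> function, and the byte format (little-endian <code class="language-plaintext highlighter-rouge">u16</code> stock, <code class="language-plaintext highlighter-rouge">u16</code> name length, then the name bytes) are assumptions for illustration only, not the format GrausDB or any library uses.</p>

```rust
// Hypothetical borrowed view: `name` points into the input buffer.
struct ProductView<'a> {
    stock: u16,
    name: &'a str,
}

// Interpret the bytes in place. Nothing is allocated on the heap and the
// name bytes are never copied; only the two-byte stock is read by value.
fn view(bytes: &[u8]) -> Option<ProductView<'_>> {
    let header = bytes.get(..4)?;
    let stock = u16::from_le_bytes([header[0], header[1]]);
    let len = u16::from_le_bytes([header[2], header[3]]) as usize;
    // `from_utf8` validates the bytes and returns a &str over the same memory.
    let name = std::str::from_utf8(bytes.get(4..4 + len)?).ok()?;
    Some(ProductView { stock, name })
}
```

<p>The returned <code class="language-plaintext highlighter-rouge">name</code> starts at the same address as byte 4 of the input, which is what “no copies” means in practice. The price is that the view cannot outlive the buffer it borrows from.</p>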

<p>In Rust, we can achieve this with a couple of key tools:</p>

<ol>
  <li><code class="language-plaintext highlighter-rouge">#[repr(C)]</code>: This attribute tells the Rust compiler to lay out a struct’s fields in memory the same way a C compiler would: in the order they are defined, with padding inserted only where alignment requires it. This gives us a predictable and stable memory layout.</li>
  <li><strong>A zero-copy library:</strong> While you could do this manually with unsafe code, libraries like <code class="language-plaintext highlighter-rouge">musli-zerocopy</code> make it safe and easy.</li>
</ol>
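
<p>To see what <code class="language-plaintext highlighter-rouge">#[repr(C)]</code> guarantees, the sketch below measures field offsets on a stand-in struct. <code class="language-plaintext highlighter-rouge">ProductLayout</code> and its <code class="language-plaintext highlighter-rouge">offset</code> and <code class="language-plaintext highlighter-rouge">len</code> fields are hypothetical: they imitate a small integer followed by a reference-like field, and the real layout of a <code class="language-plaintext highlighter-rouge">musli-zerocopy</code> reference may differ.</p>

```rust
use std::mem::size_of;

// Stand-in struct: a u16 followed by two u32 fields (assumed, for illustration).
#[repr(C)]
struct ProductLayout {
    stock: u16,
    offset: u32,
    len: u32,
}

// Byte offset of each field, measured from the start of the struct.
fn field_offsets(v: &ProductLayout) -> (usize, usize, usize) {
    let base = v as *const ProductLayout as usize;
    (
        &v.stock as *const u16 as usize - base,
        &v.offset as *const u32 as usize - base,
        &v.len as *const u32 as usize - base,
    )
}

// Total size of the struct in bytes, padding included.
fn layout_size() -> usize {
    size_of::<ProductLayout>()
}
```

<p>On common targets where <code class="language-plaintext highlighter-rouge">u32</code> is 4-byte aligned, the offsets come out as 0, 4, and 8 with a total size of 12 bytes: the first field always sits at offset 0, and two padding bytes follow it so the next field is aligned.</p>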

<p>Let’s see how this works in a real example.</p>

<h2 id="example-the-product-struct">Example: The Product Struct</h2>

<p>Here is the code we’ll be benchmarking. It’s a simple program that creates a <code class="language-plaintext highlighter-rouge">Product</code>, saves it to GrausDB, and then reads and modifies it 20,000 times. I added many comments to the code to over-explain it.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">graus_db</span><span class="p">::</span><span class="n">GrausDb</span><span class="p">;</span>
<span class="k">use</span> <span class="nn">musli_zerocopy</span><span class="p">::{</span><span class="n">endian</span><span class="p">,</span> <span class="n">Buf</span><span class="p">,</span> <span class="n">OwnedBuf</span><span class="p">,</span> <span class="n">Ref</span><span class="p">,</span> <span class="n">ZeroCopy</span><span class="p">};</span>
<span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">error</span><span class="p">::</span><span class="n">Error</span><span class="p">;</span>
<span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="n">fs</span><span class="p">;</span>
<span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="n">mem</span><span class="p">;</span>
<span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">time</span><span class="p">::</span><span class="n">Instant</span><span class="p">;</span>

<span class="cd">/// Represents a product with a stock count and a name.</span>
<span class="cd">/// `#[derive(ZeroCopy)]` enables zero-copy serialization/deserialization with `musli-zerocopy`.</span>
<span class="cd">/// `#[repr(C)]` ensures a C-compatible memory layout, which is required for zero-copy.</span>
<span class="nd">#[derive(ZeroCopy)]</span>
<span class="nd">#[repr(C)]</span>
<span class="k">struct</span> <span class="n">Product</span> <span class="p">{</span>
    <span class="n">stock</span><span class="p">:</span> <span class="nb">u16</span><span class="p">,</span>
    <span class="n">name</span><span class="p">:</span> <span class="n">Ref</span><span class="o">&lt;</span><span class="nb">str</span><span class="o">&gt;</span><span class="p">,</span> <span class="c1">// `Ref&lt;str&gt;` allows zero-copy referencing of string data within the buffer.</span>
<span class="p">}</span>

<span class="k">impl</span> <span class="n">Product</span> <span class="p">{</span>
    <span class="cd">/// Serializes a `Product` into an `OwnedBuf` using zero-copy principles.</span>
    <span class="cd">/// The `stock` is stored directly, and the `name_str` is stored as a `Ref&lt;str&gt;`.</span>
    <span class="cd">/// This avoids copying the string data during serialization.</span>
    <span class="k">fn</span> <span class="nf">to_bytes</span><span class="p">(</span><span class="n">stock</span><span class="p">:</span> <span class="nb">u16</span><span class="p">,</span> <span class="n">name_str</span><span class="p">:</span> <span class="o">&amp;</span><span class="nb">str</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="n">OwnedBuf</span> <span class="p">{</span>
        <span class="c1">// Create a new owned buffer, configured for little-endian byte order.</span>
        <span class="k">let</span> <span class="k">mut</span> <span class="n">buf</span> <span class="o">=</span> <span class="nn">OwnedBuf</span><span class="p">::</span><span class="nf">new</span><span class="p">()</span><span class="py">.with_byte_order</span><span class="p">::</span><span class="o">&lt;</span><span class="nn">endian</span><span class="p">::</span><span class="n">Little</span><span class="o">&gt;</span><span class="p">();</span>
        <span class="c1">// Reserve space for the `Product` struct without initializing it.</span>
        <span class="k">let</span> <span class="n">product_ref</span> <span class="o">=</span> <span class="n">buf</span><span class="py">.store_uninit</span><span class="p">::</span><span class="o">&lt;</span><span class="n">Product</span><span class="o">&gt;</span><span class="p">();</span>
        <span class="c1">// Store the string data for the name, returning a `Ref&lt;str&gt;` to it within the buffer.</span>
        <span class="k">let</span> <span class="n">name</span> <span class="o">=</span> <span class="n">buf</span><span class="nf">.store_unsized</span><span class="p">(</span><span class="n">name_str</span><span class="p">);</span>
        <span class="c1">// Load the uninitialized `Product` reference and write the actual `Product` data into it.</span>
        <span class="n">buf</span><span class="nf">.load_uninit_mut</span><span class="p">(</span><span class="n">product_ref</span><span class="p">)</span>
            <span class="nf">.write</span><span class="p">(</span><span class="o">&amp;</span><span class="n">Product</span> <span class="p">{</span> <span class="n">stock</span><span class="p">,</span> <span class="n">name</span> <span class="p">});</span>
        <span class="n">buf</span>
    <span class="p">}</span>

    <span class="cd">/// Deserializes a `Product` reference from a byte slice using zero-copy.</span>
    <span class="cd">/// This function returns a reference to the `Product` directly from the input bytes,</span>
    <span class="cd">/// without allocating new memory for the struct itself.</span>
    <span class="k">fn</span> <span class="n">from_bytes</span><span class="o">&lt;</span><span class="nv">'a</span><span class="o">&gt;</span><span class="p">(</span><span class="n">bytes</span><span class="p">:</span> <span class="o">&amp;</span><span class="nv">'a</span> <span class="p">[</span><span class="nb">u8</span><span class="p">])</span> <span class="k">-&gt;</span> <span class="o">&amp;</span><span class="nv">'a</span> <span class="n">Product</span> <span class="p">{</span>
        <span class="c1">// Create a `Buf` from the input byte slice.</span>
        <span class="k">let</span> <span class="n">loaded_buf</span> <span class="o">=</span> <span class="nn">Buf</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="n">bytes</span><span class="p">);</span>
        <span class="c1">// Create a `Ref` to the `Product` at the beginning of the buffer (offset 0).</span>
        <span class="c1">// This assumes the `Product` struct is at the start of the serialized data.</span>
        <span class="k">let</span> <span class="n">loaded_product_ref</span> <span class="o">=</span> <span class="nn">Ref</span><span class="p">::</span><span class="o">&lt;</span><span class="n">Product</span><span class="p">,</span> <span class="nn">endian</span><span class="p">::</span><span class="n">Little</span><span class="o">&gt;</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="mi">0</span> <span class="k">as</span> <span class="nb">usize</span><span class="p">);</span>
        <span class="c1">// Load the `Product` reference from the buffer. `unwrap()` is used here for simplicity,</span>
        <span class="c1">// but in a real application, error handling would be necessary.</span>
        <span class="n">loaded_buf</span><span class="nf">.load</span><span class="p">(</span><span class="n">loaded_product_ref</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">()</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="k">const</span> <span class="n">ITERATIONS</span><span class="p">:</span> <span class="nb">usize</span> <span class="o">=</span> <span class="mi">20_000</span><span class="p">;</span>

<span class="cd">/// Main function to demonstrate GrausDB usage with zero-copy serialization/deserialization structs.</span>
<span class="cd">/// It opens a database, sets a product, retrieves it, decreases its stock, and retrieves it again (20k times).</span>
<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="p">(),</span> <span class="nb">Box</span><span class="o">&lt;</span><span class="k">dyn</span> <span class="n">Error</span><span class="o">&gt;&gt;</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">db_path</span> <span class="o">=</span> <span class="s">"./grausdb_data"</span><span class="p">;</span>
    <span class="k">let</span> <span class="n">_</span> <span class="o">=</span> <span class="nn">fs</span><span class="p">::</span><span class="nf">remove_dir_all</span><span class="p">(</span><span class="n">db_path</span><span class="p">);</span> <span class="c1">// Just to clean up previous database data (if it exists)</span>
    <span class="k">let</span> <span class="n">db</span> <span class="o">=</span> <span class="nn">GrausDb</span><span class="p">::</span><span class="nf">open</span><span class="p">(</span><span class="n">db_path</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>

    <span class="nd">println!</span><span class="p">(</span><span class="s">"GrausDB opened at ='{:?}'"</span><span class="p">,</span> <span class="n">db_path</span><span class="p">);</span>

    <span class="c1">// Create a Product and serialize it into an OwnedBuf using zero-copy.</span>
    <span class="k">let</span> <span class="n">product_buf</span> <span class="o">=</span> <span class="nn">Product</span><span class="p">::</span><span class="nf">to_bytes</span><span class="p">(</span><span class="n">ITERATIONS</span> <span class="k">as</span> <span class="nb">u16</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="s">"Yeezy Boost 350 V2"</span><span class="p">);</span>

    <span class="c1">// Store it in the database.</span>
    <span class="k">let</span> <span class="n">key</span> <span class="o">=</span> <span class="s">b"yeezy"</span><span class="nf">.to_vec</span><span class="p">();</span>
    <span class="n">db</span><span class="nf">.set</span><span class="p">(</span><span class="n">key</span><span class="nf">.clone</span><span class="p">(),</span> <span class="o">&amp;</span><span class="n">product_buf</span><span class="p">[</span><span class="o">..</span><span class="p">])</span><span class="o">?</span><span class="p">;</span>

    <span class="k">let</span> <span class="n">start_time</span> <span class="o">=</span> <span class="nn">Instant</span><span class="p">::</span><span class="nf">now</span><span class="p">();</span>

    <span class="k">for</span> <span class="n">_i</span> <span class="k">in</span> <span class="mi">0</span><span class="o">..</span><span class="n">ITERATIONS</span> <span class="p">{</span>
        <span class="c1">// Retrieve the product bytes from the database.</span>
        <span class="k">let</span> <span class="n">loaded_bytes</span> <span class="o">=</span> <span class="n">db</span><span class="nf">.get</span><span class="p">(</span><span class="o">&amp;</span><span class="n">key</span><span class="p">)</span><span class="o">?</span><span class="nf">.expect</span><span class="p">(</span><span class="s">"Value not found"</span><span class="p">);</span>
        <span class="c1">// Deserialize the bytes back into a `Product` reference using zero-copy.</span>
        <span class="k">let</span> <span class="n">loaded_product</span> <span class="o">=</span> <span class="nn">Product</span><span class="p">::</span><span class="nf">from_bytes</span><span class="p">(</span><span class="o">&amp;</span><span class="n">loaded_bytes</span><span class="p">);</span>

        <span class="c1">// To access the name, which is a `Ref&lt;str&gt;`, we still need the original buffer.</span>
        <span class="c1">// This is a limitation of `Ref&lt;str&gt;` and zero-copy deserialization:</span>
        <span class="c1">// the `loaded_product` itself contains a `Ref&lt;str&gt;`, which needs a `Buf` to resolve</span>
        <span class="c1">// the actual string slice from the underlying byte buffer.</span>
        <span class="k">let</span> <span class="n">loaded_buf_for_name</span> <span class="o">=</span> <span class="nn">Buf</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="o">&amp;</span><span class="n">loaded_bytes</span><span class="p">);</span>

        <span class="c1">// In a real benchmark, you might want to assert values or perform some operation</span>
        <span class="c1">// to ensure the data is correctly loaded, but for a simple performance test,</span>
        <span class="c1">// just loading and accessing is sufficient to test deserialization.</span>
        <span class="k">let</span> <span class="n">_</span> <span class="o">=</span> <span class="n">loaded_buf_for_name</span><span class="nf">.load</span><span class="p">(</span><span class="n">loaded_product</span><span class="py">.name</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>

        <span class="c1">// This function decreases the stock by 1, and stores the updated product in the db again.</span>
        <span class="nf">decrease_stock</span><span class="p">(</span><span class="n">key</span><span class="nf">.clone</span><span class="p">(),</span> <span class="o">&amp;</span><span class="n">db</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="k">let</span> <span class="n">duration</span> <span class="o">=</span> <span class="n">start_time</span><span class="nf">.elapsed</span><span class="p">();</span>
    <span class="nd">println!</span><span class="p">(</span><span class="s">"Benchmark completed in {:?}"</span><span class="p">,</span> <span class="n">duration</span><span class="p">);</span>

    <span class="c1">// We retrieve the product and print it to check that the stock is 1.</span>
    <span class="k">let</span> <span class="n">loaded_bytes_final</span> <span class="o">=</span> <span class="n">db</span><span class="nf">.get</span><span class="p">(</span><span class="o">&amp;</span><span class="n">key</span><span class="p">)</span><span class="o">?</span><span class="nf">.expect</span><span class="p">(</span><span class="s">"Value not found after benchmark"</span><span class="p">);</span>
    <span class="k">let</span> <span class="n">loaded_product_final</span> <span class="o">=</span> <span class="nn">Product</span><span class="p">::</span><span class="nf">from_bytes</span><span class="p">(</span><span class="o">&amp;</span><span class="n">loaded_bytes_final</span><span class="p">);</span>
    <span class="k">let</span> <span class="n">loaded_buf_for_name_final</span> <span class="o">=</span> <span class="nn">Buf</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="o">&amp;</span><span class="n">loaded_bytes_final</span><span class="p">);</span>

    <span class="nd">println!</span><span class="p">(</span>
        <span class="s">"Final Product state: stock = {}, name = {}"</span><span class="p">,</span>
        <span class="n">loaded_product_final</span><span class="py">.stock</span><span class="p">,</span>
        <span class="n">loaded_buf_for_name_final</span><span class="nf">.load</span><span class="p">(</span><span class="n">loaded_product_final</span><span class="py">.name</span><span class="p">)</span><span class="o">?</span>
    <span class="p">);</span>

    <span class="nf">Ok</span><span class="p">(())</span>
<span class="p">}</span>

<span class="cd">/// Decreases the stock of a product identified by `key` in the `GrausDb`.</span>
<span class="cd">/// This function performs an in-place, zero-copy update for max performance.</span>
<span class="cd">/// It directly modifies the `stock` field within the stored byte buffer.</span>
<span class="k">fn</span> <span class="nf">decrease_stock</span><span class="p">(</span><span class="n">key</span><span class="p">:</span> <span class="nb">Vec</span><span class="o">&lt;</span><span class="nb">u8</span><span class="o">&gt;</span><span class="p">,</span> <span class="n">db</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">GrausDb</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="p">(),</span> <span class="nb">Box</span><span class="o">&lt;</span><span class="k">dyn</span> <span class="n">Error</span><span class="o">&gt;&gt;</span> <span class="p">{</span>
    <span class="c1">// The `update_fn` closure is executed by `db.update_if` with mutable access</span>
    <span class="c1">// to the raw byte vector (`&amp;mut Vec&lt;u8&gt;`) representing the stored value.</span>
    <span class="k">let</span> <span class="n">update_fn</span> <span class="o">=</span> <span class="p">|</span><span class="n">value</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="nb">Vec</span><span class="o">&lt;</span><span class="nb">u8</span><span class="o">&gt;</span><span class="p">|</span> <span class="p">{</span>
        <span class="c1">// Ensure the buffer is large enough to contain at least the stock field (u16).</span>
        <span class="c1">// The `stock` field is at the beginning of the `Product` struct due to `#[repr(C)]`.</span>
        <span class="k">if</span> <span class="n">value</span><span class="nf">.len</span><span class="p">()</span> <span class="o">&lt;</span> <span class="nn">mem</span><span class="p">::</span><span class="nn">size_of</span><span class="p">::</span><span class="o">&lt;</span><span class="nb">u16</span><span class="o">&gt;</span><span class="p">()</span> <span class="p">{</span>
            <span class="nd">panic!</span><span class="p">(</span><span class="s">"Buffer too small to contain stock for key: {:?}"</span><span class="p">,</span> <span class="n">key</span><span class="p">);</span>
        <span class="p">}</span>

        <span class="c1">// Read the current stock value (u16) from the first two bytes of the buffer.</span>
        <span class="k">let</span> <span class="n">current_stock</span> <span class="o">=</span> <span class="nn">u16</span><span class="p">::</span><span class="nf">from_le_bytes</span><span class="p">([</span><span class="n">value</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">value</span><span class="p">[</span><span class="mi">1</span><span class="p">]]);</span>

        <span class="c1">// Decrement the stock, using `saturating_sub(1)` to prevent underflow (stock won't go below 0).</span>
        <span class="k">let</span> <span class="n">new_stock</span> <span class="o">=</span> <span class="n">current_stock</span><span class="nf">.saturating_sub</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>

        <span class="c1">// Convert the new stock value back into its little-endian byte representation.</span>
        <span class="k">let</span> <span class="n">new_stock_bytes</span> <span class="o">=</span> <span class="n">new_stock</span><span class="nf">.to_le_bytes</span><span class="p">();</span>
        <span class="c1">// Write the new stock bytes directly back into the first two bytes of the buffer.</span>
        <span class="n">value</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">new_stock_bytes</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
        <span class="n">value</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">new_stock_bytes</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span>
    <span class="p">};</span>

    <span class="c1">// Call `db.update_if` to atomically update the value associated with the key.</span>
    <span class="c1">//</span>
    <span class="c1">//</span>
    <span class="c1">// Note: The `None as Option&lt;fn(&amp;[u8]) -&gt; bool&gt;` cast is required so the Rust compiler can</span>
    <span class="c1">// infer the generic type `P` for the `predicate` parameter when `None` is provided,</span>
    <span class="c1">// resolving type inference ambiguity.</span>
    <span class="n">db</span><span class="nf">.update_if</span><span class="p">(</span>
        <span class="n">key</span><span class="nf">.clone</span><span class="p">(),</span>
        <span class="n">update_fn</span><span class="p">,</span>
        <span class="nb">None</span><span class="p">,</span>
        <span class="nb">None</span> <span class="k">as</span> <span class="nb">Option</span><span class="o">&lt;</span><span class="k">fn</span><span class="p">(</span><span class="o">&amp;</span><span class="p">[</span><span class="nb">u8</span><span class="p">])</span> <span class="k">-&gt;</span> <span class="nb">bool</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="p">)</span>
    <span class="nf">.map_err</span><span class="p">(|</span><span class="n">e</span><span class="p">|</span> <span class="n">e</span><span class="nf">.into</span><span class="p">())</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Our <code class="language-plaintext highlighter-rouge">Product</code> struct looks like this:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[derive(ZeroCopy)]</span>
<span class="nd">#[repr(C)]</span>
<span class="k">struct</span> <span class="n">Product</span> <span class="p">{</span>
    <span class="n">stock</span><span class="p">:</span> <span class="nb">u16</span><span class="p">,</span>
    <span class="n">name</span><span class="p">:</span> <span class="n">Ref</span><span class="o">&lt;</span><span class="nb">str</span><span class="o">&gt;</span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>

<ul>
  <li><code class="language-plaintext highlighter-rouge">#[derive(ZeroCopy)]</code>: This comes from <code class="language-plaintext highlighter-rouge">musli-zerocopy</code> and automatically implements the necessary traits for zero-copy operations.</li>
  <li><code class="language-plaintext highlighter-rouge">#[repr(C)]</code>: As we discussed, this ensures the <code class="language-plaintext highlighter-rouge">stock</code> and <code class="language-plaintext highlighter-rouge">name</code> fields are laid out sequentially in memory.</li>
  <li><code class="language-plaintext highlighter-rouge">Ref&lt;str&gt;</code>: This is the clever part. A <code class="language-plaintext highlighter-rouge">String</code> or <code class="language-plaintext highlighter-rouge">&amp;str</code> can have any length, which complicates a fixed memory layout. <code class="language-plaintext highlighter-rouge">Ref&lt;str&gt;</code> solves this. It doesn’t store the string data itself, but acts like a “pointer” or an offset to where the string data is located within the same byte buffer.</li>
</ul>
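
<p>The offset idea can be sketched by hand in a few lines. The <code class="language-plaintext highlighter-rouge">StrRef</code> type below is a hypothetical analogue written for this post, not the actual <code class="language-plaintext highlighter-rouge">Ref&lt;str&gt;</code> from <code class="language-plaintext highlighter-rouge">musli-zerocopy</code>; it only shows why a reference of this kind has a fixed size and why it needs the buffer to be resolved.</p>

```rust
// Hypothetical, hand-rolled offset-based string reference.
// Its size is fixed no matter how long the string it describes is.
#[derive(Clone, Copy)]
struct StrRef {
    offset: u32, // where the string starts inside the buffer
    len: u32,    // how many bytes it spans
}

impl StrRef {
    // Append `s` to `buf` and return a reference describing where it landed.
    fn store(buf: &mut Vec<u8>, s: &str) -> StrRef {
        let offset = buf.len() as u32;
        buf.extend_from_slice(s.as_bytes());
        StrRef { offset, len: s.len() as u32 }
    }

    // Resolve the reference against a buffer, borrowing the string in place.
    // Returns None if the range is out of bounds or not valid UTF-8.
    fn resolve<'a>(&self, buf: &'a [u8]) -> Option<&'a str> {
        let start = self.offset as usize;
        let end = start.checked_add(self.len as usize)?;
        std::str::from_utf8(buf.get(start..end)?).ok()
    }
}
```

<p>Because the reference holds only an offset and a length, it is meaningless without the buffer it was created for. That is the same reason the benchmark builds a <code class="language-plaintext highlighter-rouge">Buf</code> over the loaded bytes before reading the name.</p>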

<p>The <code class="language-plaintext highlighter-rouge">Product::from_bytes()</code> function is where the magic happens. It takes a slice of bytes (<code class="language-plaintext highlighter-rouge">&amp;[u8]</code>) and, with a bit of pointer casting, gives us back a <code class="language-plaintext highlighter-rouge">&amp;Product</code>. It’s almost instantaneous because there’s no real “work” to do. It’s just re-interpreting the existing data. <strong>This is key: we just read the bytes from the database and tell Rust to treat them as a <code class="language-plaintext highlighter-rouge">Product</code> (no other serialization operations)</strong>.</p>

<h2 id="product-stock-update">Product Stock Update</h2>

<p>The benchmark doesn’t just read data; it performs a read-modify-write cycle. Look at the <code class="language-plaintext highlighter-rouge">decrease_stock()</code> function. It uses <code class="language-plaintext highlighter-rouge">db.update_if</code>, which gives us direct, mutable access to the raw bytes of the value <em>as they exist in the database</em>.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">update_fn</span> <span class="o">=</span> <span class="p">|</span><span class="n">value</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="nb">Vec</span><span class="o">&lt;</span><span class="nb">u8</span><span class="o">&gt;</span><span class="p">|</span> <span class="p">{</span>
    <span class="c1">// Read the current stock from the first two bytes</span>
    <span class="k">let</span> <span class="n">current_stock</span> <span class="o">=</span> <span class="nn">u16</span><span class="p">::</span><span class="nf">from_le_bytes</span><span class="p">([</span><span class="n">value</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">value</span><span class="p">[</span><span class="mi">1</span><span class="p">]]);</span>
    <span class="c1">// Decrement it</span>
    <span class="k">let</span> <span class="n">new_stock</span> <span class="o">=</span> <span class="n">current_stock</span><span class="nf">.saturating_sub</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
    <span class="c1">// Write the new value back to the first two bytes</span>
    <span class="k">let</span> <span class="n">new_stock_bytes</span> <span class="o">=</span> <span class="n">new_stock</span><span class="nf">.to_le_bytes</span><span class="p">();</span>
    <span class="n">value</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">new_stock_bytes</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
    <span class="n">value</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">new_stock_bytes</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span>
<span class="p">};</span>
</code></pre></div></div>

<p>Because we used <code class="language-plaintext highlighter-rouge">#[repr(C)]</code>, we know <em>exactly</em> where the <code class="language-plaintext highlighter-rouge">stock</code> value is: it’s the first two bytes of our data. So, to decrease the stock, we don’t need to deserialize the whole object. We can just read those two bytes, calculate the new value, and write them back.</p>
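
<p>The closure above indexes <code class="language-plaintext highlighter-rouge">value[0]</code> and <code class="language-plaintext highlighter-rouge">value[1]</code> directly, which panics if the stored value is shorter than two bytes. A slightly more defensive variant of the same idea could look like the sketch below. The function name is hypothetical, and it works on a plain byte slice rather than through <code class="language-plaintext highlighter-rouge">db.update_if</code>.</p>

```rust
/// Decrements a little-endian `u16` stored in the first two bytes of `value`.
/// Returns the new stock, or `None` if the buffer is too short to hold one.
/// (Hypothetical helper for illustration; not part of GrausDB.)
fn decrease_stock_in_place(value: &mut [u8]) -> Option<u16> {
    if value.len() < 2 {
        return None;
    }
    // Read the current stock from the first two bytes.
    let current_stock = u16::from_le_bytes([value[0], value[1]]);
    // Never go below zero.
    let new_stock = current_stock.saturating_sub(1);
    // Overwrite only those two bytes; the rest of the record is untouched.
    value[..2].copy_from_slice(&new_stock.to_le_bytes());
    Some(new_stock)
}
```

<p>Everything after the first two bytes is left exactly as it was, which is the whole point of updating a field at a known offset.</p>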

<h2 id="results">Results</h2>

<p>So, this is the result of combining zero-copy deserialization with GrausDB on a cheap laptop:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rpallas@rpallas-Surface-Laptop-Go-2:~/workspace/GrausDB/examples/zero_copy_struct_serde<span class="nv">$ </span>cargo run <span class="nt">--release</span>
    Finished <span class="sb">`</span>release<span class="sb">`</span> profile <span class="o">[</span>optimized] target<span class="o">(</span>s<span class="o">)</span> <span class="k">in </span>0.02s
     Running <span class="sb">`</span>/home/rpallas/workspace/GrausDB/target/release/grausdb_example<span class="sb">`</span>
GrausDB opened at <span class="o">=</span><span class="s1">'"./grausdb_data"'</span>
Benchmark completed <span class="k">in </span>54.992673ms
Final Product state: stock <span class="o">=</span> 1, name <span class="o">=</span> Yeezy Boost 350 V2
</code></pre></div></div>

<p><strong>55 milliseconds for 20,000 full read-modify-write cycles.</strong></p>

<p>Let me break down what’s happening in each iteration:</p>

<ol>
  <li><code class="language-plaintext highlighter-rouge">db.get(&amp;key)</code> - Reading the product bytes from the database</li>
  <li><code class="language-plaintext highlighter-rouge">Product::from_bytes(&amp;loaded_bytes)</code> - Zero-copy deserialization (basically free!)</li>
  <li><code class="language-plaintext highlighter-rouge">loaded_buf_for_name.load(loaded_product.name)</code> - Loading the <code class="language-plaintext highlighter-rouge">name</code> field, so the benchmark actually reads the string and checks that the data is valid</li>
  <li><code class="language-plaintext highlighter-rouge">decrease_stock(key.clone(), &amp;db)</code> - Another database call with <code class="language-plaintext highlighter-rouge">db.update_if</code> to modify the stock atomically</li>
</ol>

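<p>So each iteration involves multiple database operations: two reads (the <code class="language-plaintext highlighter-rouge">db.get</code> call plus the read inside <code class="language-plaintext highlighter-rouge">db.update_if</code>) and one write. That means we’re doing <strong>60,000 database operations</strong> in 55 milliseconds, which is over <strong>1 million operations per second</strong>. On a Surface Laptop Go 2!</p>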
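
<p>The throughput figure is simple arithmetic on the numbers above. As a small sketch (the helper name is made up for this example):</p>

```rust
use std::time::Duration;

/// Operations per second for `operations` completed in `elapsed`.
/// (Hypothetical helper, only to show the arithmetic.)
fn ops_per_second(operations: u64, elapsed: Duration) -> f64 {
    operations as f64 / elapsed.as_secs_f64()
}
```

<p>With 20,000 iterations of 3 operations each and the measured 54.992673 ms, <code class="language-plaintext highlighter-rouge">ops_per_second(20_000 * 3, Duration::from_nanos(54_992_673))</code> comes out at roughly 1.09 million operations per second.</p>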

<p>By avoiding data copies and using zero-copy deserialization, we can build systems that are fast, even on modest hardware.</p>

<h2 id="zero-copy-vs-traditional-json-serialization">Zero-Copy vs Traditional JSON Serialization</h2>

<p>To show the real-world impact of zero-copy serialization, I ran a comparison benchmark against traditional JSON serialization using <code class="language-plaintext highlighter-rouge">serde_json</code>. This time with a more realistic, complex struct that better represents real-world data.</p>

<p>Here’s the struct we’re testing with:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[derive(ZeroCopy)]</span>
<span class="nd">#[repr(C)]</span>
<span class="k">struct</span> <span class="n">Product</span> <span class="p">{</span>
    <span class="n">product_id</span><span class="p">:</span> <span class="nb">u64</span><span class="p">,</span>
    <span class="n">stock</span><span class="p">:</span> <span class="nb">u16</span><span class="p">,</span>
    <span class="n">price</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>
    <span class="n">weight</span><span class="p">:</span> <span class="nb">f32</span><span class="p">,</span>
    <span class="n">is_available</span><span class="p">:</span> <span class="nb">bool</span><span class="p">,</span>
    <span class="n">category</span><span class="p">:</span> <span class="n">Ref</span><span class="o">&lt;</span><span class="nb">str</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="n">manufacturer</span><span class="p">:</span> <span class="n">Ref</span><span class="o">&lt;</span><span class="nb">str</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="n">dimensions</span><span class="p">:</span> <span class="n">Dimensions</span><span class="p">,</span>
    <span class="n">rating</span><span class="p">:</span> <span class="nb">f32</span><span class="p">,</span>
    <span class="n">name</span><span class="p">:</span> <span class="n">Ref</span><span class="o">&lt;</span><span class="nb">str</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="n">description</span><span class="p">:</span> <span class="n">Ref</span><span class="o">&lt;</span><span class="nb">str</span><span class="o">&gt;</span><span class="p">,</span>
<span class="p">}</span>

<span class="nd">#[derive(ZeroCopy)]</span>
<span class="nd">#[repr(C)]</span>
<span class="k">struct</span> <span class="n">Dimensions</span> <span class="p">{</span>
    <span class="n">length</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>
    <span class="n">width</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>
    <span class="n">height</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>

<p>And the JSON equivalent for comparison:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[derive(Serialize,</span> <span class="nd">Deserialize)]</span>
<span class="k">struct</span> <span class="n">ProductJson</span><span class="o">&lt;</span><span class="nv">'a</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="n">product_id</span><span class="p">:</span> <span class="nb">u64</span><span class="p">,</span>
    <span class="n">stock</span><span class="p">:</span> <span class="nb">u16</span><span class="p">,</span>
    <span class="n">price</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>
    <span class="n">weight</span><span class="p">:</span> <span class="nb">f32</span><span class="p">,</span>
    <span class="n">is_available</span><span class="p">:</span> <span class="nb">bool</span><span class="p">,</span>
    <span class="n">category</span><span class="p">:</span> <span class="o">&amp;</span><span class="nv">'a</span> <span class="nb">str</span><span class="p">,</span>
    <span class="n">manufacturer</span><span class="p">:</span> <span class="o">&amp;</span><span class="nv">'a</span> <span class="nb">str</span><span class="p">,</span>
    <span class="n">dimensions</span><span class="p">:</span> <span class="n">DimensionsJson</span><span class="p">,</span>
    <span class="n">rating</span><span class="p">:</span> <span class="nb">f32</span><span class="p">,</span>
    <span class="n">name</span><span class="p">:</span> <span class="o">&amp;</span><span class="nv">'a</span> <span class="nb">str</span><span class="p">,</span>
    <span class="n">description</span><span class="p">:</span> <span class="o">&amp;</span><span class="nv">'a</span> <span class="nb">str</span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The benchmark runs 1 million read operations (with deserialization) for both approaches:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rpallas@rpallas-Surface-Laptop-Go-2:~/workspace/GrausDB/examples/zero_copy_struct_serde<span class="nv">$ </span>cargo run <span class="nt">--release</span> <span class="nt">--example</span> zero_copy_vs_traditional_serde
GrausDB opened at <span class="o">=</span><span class="s1">'"./grausdb_data"'</span>
Zero-copy benchmark completed <span class="k">in </span>563.205706ms

Starting JSON benchmark...
JSON benchmark completed <span class="k">in </span>1.072244937s
</code></pre></div></div>

<p><strong>Zero-copy delivers ~90% more throughput</strong> than JSON deserialization (563 ms vs 1,072 ms). That’s about <strong>1.9x the reads per second</strong>, or roughly 47% less time for the same work.</p>

<p>With a more complex struct (11 fields including a nested struct), the advantage of zero-copy becomes clear. JSON deserialization typically needs to:</p>
<ul>
  <li>Parse the JSON structure</li>
  <li>Validate UTF-8 for every string</li>
  <li>Allocate memory for the deserialized struct</li>
  <li>Copy all the data into the new allocations</li>
</ul>

<p>Zero-copy just casts the bytes and you’re done. No parsing, no validation, no allocations.</p>

<p>For systems doing millions of operations per second, this difference is huge. The full comparison code is available at: <a href="https://github.com/RPallas92/GrausDB/blob/main/examples/zero_copy_struct_serde/src/zero_copy_vs_traditional_serde.rs">zero_copy_vs_traditional_serde.rs</a></p>

<h2 id="important-caveats-and-pitfalls">Important Caveats and Pitfalls</h2>

<p>Zero-copy is powerful, but it has some caveats.</p>

<h3 id="1-endianness-and-portability">1. Endianness and Portability</h3>

<p>The example uses <code class="language-plaintext highlighter-rouge">endian::Little</code>. If you write data on one architecture and read on another with different endianness, or if you change the byte order, things will break.</p>
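
<p>The standard library makes the byte-order choice explicit for plain integers. The sketch below uses hypothetical helper names and only shows what happens when writer and reader disagree on the order:</p>

```rust
/// Encodes a stock count with a fixed little-endian byte order.
/// (Hypothetical helper for illustration.)
fn encode_stock_le(stock: u16) -> [u8; 2] {
    stock.to_le_bytes()
}

/// Decodes a stock count that was written in little-endian order.
fn decode_stock_le(bytes: [u8; 2]) -> u16 {
    u16::from_le_bytes(bytes)
}

/// Decodes the same bytes as if they were big-endian.
/// This is the mistake a reader with the wrong byte order would make.
fn decode_stock_be(bytes: [u8; 2]) -> u16 {
    u16::from_be_bytes(bytes)
}
```

<p>Encoding <code class="language-plaintext highlighter-rouge">0x0102</code> produces the bytes <code class="language-plaintext highlighter-rouge">[0x02, 0x01]</code>. Decoding them as little-endian gives the original value back, while decoding them as big-endian yields <code class="language-plaintext highlighter-rouge">0x0201</code>: no error, just a silently wrong number.</p>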

<h3 id="2-alignment-and-padding">2. Alignment and Padding</h3>

<p><code class="language-plaintext highlighter-rouge">#[repr(C)]</code> helps, but be mindful of alignment and padding. When you add fields, check <code class="language-plaintext highlighter-rouge">mem::size_of</code> and <code class="language-plaintext highlighter-rouge">mem::align_of</code> to make sure offsets stay where you expect.</p>
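
<p>One way to keep an eye on this is to turn the expected layout into compile-time checks. The sketch below uses a hypothetical <code class="language-plaintext highlighter-rouge">LayoutDemo</code> struct (not the benchmark’s <code class="language-plaintext highlighter-rouge">Product</code>) and assumes Rust 1.77 or newer for <code class="language-plaintext highlighter-rouge">offset_of!</code>:</p>

```rust
use std::mem::{align_of, offset_of, size_of};

/// Hypothetical struct used only to illustrate padding under `#[repr(C)]`.
#[repr(C)]
struct LayoutDemo {
    stock: u16,      // offset 0
    // 2 bytes of padding are inserted here so `price` is 4-byte aligned
    price: u32,      // offset 4
    product_id: u64, // offset 8
}

// These fail the build if a field is added or reordered in a way that
// moves the offsets the in-place update code relies on.
const _: () = assert!(offset_of!(LayoutDemo, stock) == 0);
const _: () = assert!(offset_of!(LayoutDemo, price) == 4);
const _: () = assert!(offset_of!(LayoutDemo, product_id) == 8);
const _: () = assert!(size_of::<LayoutDemo>() == 16);
const _: () = assert!(align_of::<LayoutDemo>() >= 4);
```

<p>Note that the fields add up to 14 bytes, but the struct occupies 16 because of the padding after <code class="language-plaintext highlighter-rouge">stock</code>.</p>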

<h3 id="3-partial-writes-and-crash-consistency">3. Partial Writes and Crash Consistency</h3>

<p>Writing a <code class="language-plaintext highlighter-rouge">u16</code> in place changes two bytes. On many systems, updating those two bytes is fast, but it is <em>not guaranteed</em> to be atomic with respect to power loss. If the process or machine crashes while the write is in flight, you might end up with a partially updated value (corruption). GrausDB does not protect against this yet, which is why it is not production ready.</p>

<h3 id="4-durability-vs-performance-fsync">4. Durability vs Performance (fsync)</h3>

<p><strong>⚠️ Warning:</strong> GrausDB does <strong>not</strong> call <code class="language-plaintext highlighter-rouge">fsync</code> after every write by default. This is an explicit trade-off for speed.</p>

<p><strong>What is fsync?</strong></p>

<p>The operating system typically caches file writes in memory (page cache) and flushes them to disk later. <code class="language-plaintext highlighter-rouge">fsync</code> forces the OS to flush those buffers to the physical storage, ensuring the data is durable on disk before the call returns.</p>

<p><strong>Why GrausDB skips it:</strong> Calling <code class="language-plaintext highlighter-rouge">fsync</code> is slow. Skipping <code class="language-plaintext highlighter-rouge">fsync</code> lets databases reach very high throughput, but at the cost that recently-written data may be lost if the machine crashes before the OS flushes the pages to disk.</p>
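
<p>In Rust’s standard library, the trade-off comes down to whether you call <code class="language-plaintext highlighter-rouge">File::sync_all</code> after writing. The sketch below is not GrausDB’s write path; <code class="language-plaintext highlighter-rouge">write_value</code> and its <code class="language-plaintext highlighter-rouge">durable</code> flag are hypothetical names used to show the two options side by side:</p>

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::Path;

/// Writes `bytes` to `path`. When `durable` is true, also asks the OS to
/// flush the file to the storage device (fsync on Unix) before returning.
/// (Hypothetical helper for illustration; not GrausDB's API.)
fn write_value(path: &Path, bytes: &[u8], durable: bool) -> io::Result<()> {
    let mut file = File::create(path)?;
    file.write_all(bytes)?;
    if durable {
        // Slow but safe: returns only once the data has been handed to the disk.
        file.sync_all()?;
    }
    // Without the sync, the data may still sit only in the OS page cache.
    Ok(())
}
```

<p>Both variants leave the same bytes in the file as long as nothing crashes. The difference only shows up if the machine loses power before the OS flushes its cache.</p>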

<p><strong>When this trade-off makes sense:</strong> For many use cases, this is perfectly acceptable. Think about a server handling a product drop where all the stock is sold in 2 seconds. You need extreme performance during those 2 seconds. In fact, GrausDB can be better than Redis for this scenario because you don’t have the network overhead of sending data back and forth. And if you need high availability, you can create a cluster of servers with local GrausDB instances and sync them asynchronously for maximum performance.</p>

<h2 id="conclusion">Conclusion</h2>

<p>We’ve seen how combining an embedded database like GrausDB with zero-copy serialization in Rust can lead to great performance. My key takeaway is to avoid copying, because it is a hidden performance cost. Zero-copy eliminates it.</p>

<p>This approach requires a bit more thought than just using a standard serialization library, but the performance gains are worth it.</p>

<p>If you want to dive deeper into the mechanics of how libraries like <code class="language-plaintext highlighter-rouge">musli-zerocopy</code> work their magic, I highly recommend reading this excellent article:</p>

<ul>
  <li><strong>Understanding Musli Zero-Copy:</strong> <a href="https://udoprog.github.io/rust/2023-10-19/musli-zerocopy.html">https://udoprog.github.io/rust/2023-10-19/musli-zerocopy.html</a></li>
</ul>

<p><strong>Also, you can find the full code example here:</strong> <a href="https://github.com/RPallas92/GrausDB/blob/main/examples/zero_copy_struct_serde/src/main.rs">https://github.com/RPallas92/GrausDB/blob/main/examples/zero_copy_struct_serde/src/main.rs</a></p>

<p>Thanks for reading, and happy coding!</p>]]></content><author><name>Ricardo Pallas</name></author><category term="blog" /><category term="rust" /><category term="performance" /><category term="optimization" /><category term="zero copy" /><category term="GrausDB" /><summary type="html"><![CDATA[How to persist structs with great performance using zero-copy serialization and GrausDB. No copies, no allocations—just direct memory-to-disk writes reaching over 1 million operations per second.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://rpallas92.github.io/assets/images/zero-copy/header.png" /><media:content medium="image" url="https://rpallas92.github.io/assets/images/zero-copy/header.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Optimizing a Math Expression Parser in Rust</title><link href="https://rpallas92.github.io/math-parser/" rel="alternate" type="text/html" title="Optimizing a Math Expression Parser in Rust" /><published>2025-06-30T08:00:00+00:00</published><updated>2025-06-30T08:00:00+00:00</updated><id>https://rpallas92.github.io/math-parser</id><content type="html" xml:base="https://rpallas92.github.io/math-parser/"><![CDATA[<h1 id="optimizing-a-math-expression-parser-in-rust">Optimizing a Math Expression Parser in Rust</h1>

<h2 id="table-of-contents">Table of contents</h2>

<ol>
  <li><a href="#baseline-implementation-431s">Baseline implementation (43.1 s)</a>
    <ol>
      <li><a href="#how-it-works">How it works</a></li>
      <li><a href="#parser-example-1--2---3">Parser Example: (1 + 2) - 3</a></li>
      <li><a href="#it-works-but-we-can-do-better">It works! But we can do better</a></li>
    </ol>
  </li>
  <li><a href="#optimizations-for-speed-and-memory">Optimizations for speed and memory</a>
    <ol>
      <li><a href="#optimization-1-do-not-allocate-a-vector-when-tokenizing-431s--645s--85-improvement">Optimization 1: Do not allocate a Vector when tokenizing (43.1 s → 6.45 s, –85% improvement)</a></li>
      <li><a href="#optimization-2-zero-allocations--parse-directly-from-the-input-bytes-645s--368s--43-improvement">Optimization 2: Zero allocations — parse directly from the input bytes (6.45 s → 3.68 s, –43% improvement)</a></li>
      <li><a href="#optimization-3-do-not-use-peekable-368s--321s--13-improvement">Optimization 3: Do not use Peekable (3.68 s → 3.21 s, –13% improvement)</a></li>
      <li><a href="#optimization-4-multithreading-and-simd-321s--221s--31-improvement">Optimization 4: Multithreading and SIMD (3.21 s → 2.21 s, –31% improvement)</a></li>
      <li><a href="#optimization-5-memory-mapped-io-221s--098s--56-improvement">Optimization 5: Memory‑mapped I/O (2.21 s → 0.98 s, –56% improvement)</a></li>
    </ol>
  </li>
  <li><a href="#conclusion">Conclusion</a></li>
</ol>

<hr />

<p>In a <a href="https://rpallas.xyz/1brc/">previous post</a> I explored how to optimize file parsing for max speed. This time, we’ll look at a different, self-contained problem: writing a math expression parser in Rust, and making it as fast and memory-efficient as possible.</p>

<p>Let’s say we want to parse simple math expressions with addition, subtraction, and parentheses. For example:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>4 + 5 + 2 - 1       =&gt; 10
(4 + 5) - (2 + 1)   =&gt; 6
(1 + (2 + 3)) - 4   =&gt; 2
</code></pre></div></div>

<p>We’ll start with a straightforward implementation and optimize it step by step.</p>

<p><strong>You can find the full code at:</strong> <a href="https://github.com/RPallas92/math_parser">https://github.com/RPallas92/math_parser</a></p>

<hr />

<h2 id="baseline-implementation-431s">Baseline implementation (43.1 s)</h2>

<p>Here’s the first version of our parser:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="n">fs</span><span class="p">;</span>
<span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">io</span><span class="p">::</span><span class="nb">Result</span><span class="p">;</span>
<span class="k">use</span> <span class="nn">std</span><span class="p">::{</span><span class="nn">iter</span><span class="p">::</span><span class="n">Peekable</span><span class="p">,</span> <span class="nn">time</span><span class="p">::</span><span class="n">Instant</span><span class="p">};</span>

<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="p">()</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">total_start</span> <span class="o">=</span> <span class="nn">Instant</span><span class="p">::</span><span class="nf">now</span><span class="p">();</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">step_start</span> <span class="o">=</span> <span class="nn">Instant</span><span class="p">::</span><span class="nf">now</span><span class="p">();</span>

    <span class="k">let</span> <span class="n">input</span> <span class="o">=</span> <span class="nf">read_input_file</span><span class="p">()</span><span class="o">?</span><span class="p">;</span>
   
    <span class="nd">println!</span><span class="p">(</span><span class="s">"Step 1: Input file read in {:?}"</span><span class="p">,</span> <span class="n">step_start</span><span class="nf">.elapsed</span><span class="p">());</span>

    <span class="n">step_start</span> <span class="o">=</span> <span class="nn">Instant</span><span class="p">::</span><span class="nf">now</span><span class="p">();</span>
    
    <span class="k">let</span> <span class="n">result</span> <span class="o">=</span> <span class="nf">eval</span><span class="p">(</span><span class="o">&amp;</span><span class="n">input</span><span class="p">);</span>
    
    <span class="nd">println!</span><span class="p">(</span>
        <span class="s">"Step 2: Calculation completed in {:?}"</span><span class="p">,</span>
        <span class="n">step_start</span><span class="nf">.elapsed</span><span class="p">()</span>
    <span class="p">);</span>

    <span class="k">let</span> <span class="n">total_duration</span> <span class="o">=</span> <span class="n">total_start</span><span class="nf">.elapsed</span><span class="p">();</span>
    
    <span class="nd">println!</span><span class="p">(</span><span class="s">"--- Summary ---"</span><span class="p">);</span>
    <span class="nd">println!</span><span class="p">(</span><span class="s">"Result: {}"</span><span class="p">,</span> <span class="n">result</span><span class="p">);</span>
    <span class="nd">println!</span><span class="p">(</span><span class="s">"Total time: {:?}"</span><span class="p">,</span> <span class="n">total_duration</span><span class="p">);</span>

    <span class="nf">Ok</span><span class="p">(())</span>
<span class="p">}</span>

<span class="k">fn</span> <span class="nf">read_input_file</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="nb">String</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="nn">fs</span><span class="p">::</span><span class="nf">read_to_string</span><span class="p">(</span><span class="s">"data/input.txt"</span><span class="p">)</span>
<span class="p">}</span>

<span class="k">fn</span> <span class="nf">eval</span><span class="p">(</span><span class="n">input</span><span class="p">:</span> <span class="o">&amp;</span><span class="nb">str</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">u32</span> <span class="p">{</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">tokens</span> <span class="o">=</span> <span class="nf">tokenize</span><span class="p">(</span><span class="n">input</span><span class="p">)</span><span class="nf">.into_iter</span><span class="p">()</span><span class="nf">.peekable</span><span class="p">();</span>
    <span class="nf">parse_expression</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="n">tokens</span><span class="p">)</span>
<span class="p">}</span>

<span class="k">fn</span> <span class="nf">tokenize</span><span class="p">(</span><span class="n">input</span><span class="p">:</span> <span class="o">&amp;</span><span class="nb">str</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Vec</span><span class="o">&lt;</span><span class="n">Token</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="n">input</span>
        <span class="nf">.split_whitespace</span><span class="p">()</span>
        <span class="nf">.map</span><span class="p">(|</span><span class="n">s</span><span class="p">|</span> <span class="k">match</span> <span class="n">s</span> <span class="p">{</span>
            <span class="s">"+"</span> <span class="k">=&gt;</span> <span class="nn">Token</span><span class="p">::</span><span class="n">Plus</span><span class="p">,</span>
            <span class="s">"-"</span> <span class="k">=&gt;</span> <span class="nn">Token</span><span class="p">::</span><span class="n">Minus</span><span class="p">,</span>
            <span class="s">"("</span> <span class="k">=&gt;</span> <span class="nn">Token</span><span class="p">::</span><span class="n">OpeningParenthesis</span><span class="p">,</span>
            <span class="s">")"</span> <span class="k">=&gt;</span> <span class="nn">Token</span><span class="p">::</span><span class="n">ClosingParenthesis</span><span class="p">,</span>
            <span class="n">n</span> <span class="k">=&gt;</span> <span class="nn">Token</span><span class="p">::</span><span class="nf">Operand</span><span class="p">(</span><span class="n">n</span><span class="nf">.parse</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">()),</span>
        <span class="p">})</span>
        <span class="nf">.collect</span><span class="p">()</span>
<span class="p">}</span>

<span class="k">fn</span> <span class="nf">parse_expression</span><span class="p">(</span><span class="n">tokens</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="n">Peekable</span><span class="o">&lt;</span><span class="k">impl</span> <span class="nb">Iterator</span><span class="o">&lt;</span><span class="n">Item</span> <span class="o">=</span> <span class="n">Token</span><span class="o">&gt;&gt;</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">u32</span> <span class="p">{</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">left</span> <span class="o">=</span> <span class="nf">parse_primary</span><span class="p">(</span><span class="n">tokens</span><span class="p">);</span>

    <span class="k">while</span> <span class="k">let</span> <span class="nf">Some</span><span class="p">(</span><span class="nn">Token</span><span class="p">::</span><span class="n">Plus</span><span class="p">)</span> <span class="p">|</span> <span class="nf">Some</span><span class="p">(</span><span class="nn">Token</span><span class="p">::</span><span class="n">Minus</span><span class="p">)</span> <span class="o">=</span> <span class="n">tokens</span><span class="nf">.peek</span><span class="p">()</span> <span class="p">{</span>
        <span class="k">let</span> <span class="n">operator</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;</span><span class="n">Token</span><span class="o">&gt;</span> <span class="o">=</span> <span class="n">tokens</span><span class="nf">.next</span><span class="p">();</span>
        <span class="k">let</span> <span class="n">right</span> <span class="o">=</span> <span class="nf">parse_primary</span><span class="p">(</span><span class="n">tokens</span><span class="p">);</span>
        <span class="n">left</span> <span class="o">=</span> <span class="k">match</span> <span class="n">operator</span> <span class="p">{</span>
            <span class="nf">Some</span><span class="p">(</span><span class="nn">Token</span><span class="p">::</span><span class="n">Plus</span><span class="p">)</span> <span class="k">=&gt;</span> <span class="n">left</span> <span class="o">+</span> <span class="n">right</span><span class="p">,</span>
            <span class="nf">Some</span><span class="p">(</span><span class="nn">Token</span><span class="p">::</span><span class="n">Minus</span><span class="p">)</span> <span class="k">=&gt;</span> <span class="n">left</span> <span class="o">-</span> <span class="n">right</span><span class="p">,</span>
            <span class="n">other</span> <span class="k">=&gt;</span> <span class="nd">panic!</span><span class="p">(</span><span class="s">"Expected operator, got {:?}"</span><span class="p">,</span> <span class="n">other</span><span class="p">),</span>
        <span class="p">}</span>
    <span class="p">}</span>

    <span class="k">return</span> <span class="n">left</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">fn</span> <span class="nf">parse_primary</span><span class="p">(</span><span class="n">tokens</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="n">Peekable</span><span class="o">&lt;</span><span class="k">impl</span> <span class="nb">Iterator</span><span class="o">&lt;</span><span class="n">Item</span> <span class="o">=</span> <span class="n">Token</span><span class="o">&gt;&gt;</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">u32</span> <span class="p">{</span>
    <span class="k">match</span> <span class="n">tokens</span><span class="nf">.peek</span><span class="p">()</span> <span class="p">{</span>
        <span class="nf">Some</span><span class="p">(</span><span class="nn">Token</span><span class="p">::</span><span class="n">OpeningParenthesis</span><span class="p">)</span> <span class="k">=&gt;</span> <span class="p">{</span>
            <span class="n">tokens</span><span class="nf">.next</span><span class="p">();</span> <span class="c1">// consume '('</span>
            <span class="k">let</span> <span class="n">val</span> <span class="o">=</span> <span class="nf">parse_expression</span><span class="p">(</span><span class="n">tokens</span><span class="p">);</span>
            <span class="n">tokens</span><span class="nf">.next</span><span class="p">();</span> <span class="c1">// consume ')'</span>
            <span class="n">val</span>
        <span class="p">}</span>
        <span class="n">_</span> <span class="k">=&gt;</span> <span class="nf">parse_operand</span><span class="p">(</span><span class="n">tokens</span><span class="p">),</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="k">fn</span> <span class="nf">parse_operand</span><span class="p">(</span><span class="n">tokens</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="n">Peekable</span><span class="o">&lt;</span><span class="k">impl</span> <span class="nb">Iterator</span><span class="o">&lt;</span><span class="n">Item</span> <span class="o">=</span> <span class="n">Token</span><span class="o">&gt;&gt;</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">u32</span> <span class="p">{</span>
    <span class="k">match</span> <span class="n">tokens</span><span class="nf">.next</span><span class="p">()</span> <span class="p">{</span>
        <span class="nf">Some</span><span class="p">(</span><span class="nn">Token</span><span class="p">::</span><span class="nf">Operand</span><span class="p">(</span><span class="n">n</span><span class="p">))</span> <span class="k">=&gt;</span> <span class="n">n</span><span class="p">,</span>
        <span class="n">other</span> <span class="k">=&gt;</span> <span class="nd">panic!</span><span class="p">(</span><span class="s">"Expected number, got {:?}"</span><span class="p">,</span> <span class="n">other</span><span class="p">),</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="nd">#[derive(Debug,</span> <span class="nd">Clone,</span> <span class="nd">PartialEq)]</span>
<span class="k">enum</span> <span class="n">Token</span> <span class="p">{</span>
    <span class="nf">Operand</span><span class="p">(</span><span class="nb">u32</span><span class="p">),</span>
    <span class="n">Plus</span><span class="p">,</span>
    <span class="n">Minus</span><span class="p">,</span>
    <span class="n">OpeningParenthesis</span><span class="p">,</span>
    <span class="n">ClosingParenthesis</span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>

<hr />

<h3 id="how-it-works">How it works</h3>

<p>Let’s break it down.</p>

<p>The program reads from a file called <code class="language-plaintext highlighter-rouge">input.txt</code>, which contains a math expression in a single line. That expression is passed to the <code class="language-plaintext highlighter-rouge">eval()</code> function.</p>

<p>The <code class="language-plaintext highlighter-rouge">tokenize()</code> function processes the input string, splitting it by whitespace and converting each segment into a token. For example, this input:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>7 - 3 + 1
</code></pre></div></div>

<p>…is turned into this list of tokens:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[Operand(7), Minus, Operand(3), Plus, Operand(1)]
</code></pre></div></div>
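<p>As a minimal sketch of that step, the snippet below tokenizes the same input into a vector. The function name <code class="language-plaintext highlighter-rouge">tokenize_to_vec</code> is a hypothetical stand-in for illustration and may differ from the baseline code. Because it splits on whitespace, every token (parentheses included) has to be separated by a space.</p>

```rust
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Operand(u32),
    Plus,
    Minus,
    OpeningParenthesis,
    ClosingParenthesis,
}

// Hypothetical helper (name is an assumption): split the input on whitespace
// and map every segment to a token, collecting the result eagerly.
fn tokenize_to_vec(input: &str) -> Vec<Token> {
    input
        .split_whitespace()
        .map(|segment| match segment {
            "+" => Token::Plus,
            "-" => Token::Minus,
            "(" => Token::OpeningParenthesis,
            ")" => Token::ClosingParenthesis,
            // Anything else must be an unsigned integer.
            number => Token::Operand(number.parse().expect("expected an unsigned integer")),
        })
        .collect()
}

fn main() {
    let tokens = tokenize_to_vec("7 - 3 + 1");
    println!("{:?}", tokens);
}
```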

<p>The parser then uses a simple recursive strategy:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">parse_expression()</code> handles sequences of additions and subtractions.</li>
  <li><code class="language-plaintext highlighter-rouge">parse_primary()</code> handles numbers and expressions inside parentheses.</li>
  <li><code class="language-plaintext highlighter-rouge">parse_operand()</code> handles the actual integer values.</li>
</ul>

<p>The recursive call to <code class="language-plaintext highlighter-rouge">parse_expression()</code> inside <code class="language-plaintext highlighter-rouge">parse_primary()</code> allows us to evaluate nested expressions (parentheses).</p>

<hr />

<h3 id="parser-example-1--2---3">Parser Example: (1 + 2) - 3</h3>

<p>Let’s walk through parsing the expression <code class="language-plaintext highlighter-rouge">(1 + 2) - 3</code> using our functions:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">eval</span><span class="p">(</span><span class="n">input</span><span class="p">:</span> <span class="o">&amp;</span><span class="nb">str</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">u32</span> <span class="p">{</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">tokens</span> <span class="o">=</span> <span class="nf">tokenize</span><span class="p">(</span><span class="n">input</span><span class="p">)</span><span class="nf">.peekable</span><span class="p">();</span>
    <span class="nf">parse_expression</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="n">tokens</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>Input string:</strong> <code class="language-plaintext highlighter-rouge">(1 + 2) - 3</code></p>

<p><strong>Tokens with index:</strong></p>

<table>
  <thead>
    <tr>
      <th>Index</th>
      <th>Token</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>0</td>
      <td>OpeningParenthesis <code class="language-plaintext highlighter-rouge">(</code></td>
    </tr>
    <tr>
      <td>1</td>
      <td>Operand(1)</td>
    </tr>
    <tr>
      <td>2</td>
      <td>Plus <code class="language-plaintext highlighter-rouge">+</code></td>
    </tr>
    <tr>
      <td>3</td>
      <td>Operand(2)</td>
    </tr>
    <tr>
      <td>4</td>
      <td>ClosingParenthesis <code class="language-plaintext highlighter-rouge">)</code></td>
    </tr>
    <tr>
      <td>5</td>
      <td>Minus <code class="language-plaintext highlighter-rouge">-</code></td>
    </tr>
    <tr>
      <td>6</td>
      <td>Operand(3)</td>
    </tr>
  </tbody>
</table>

<p>We begin by calling <code class="language-plaintext highlighter-rouge">parse_expression</code> at <strong>depth 1</strong>:</p>

<ol>
  <li><strong><code class="language-plaintext highlighter-rouge">parse_expression</code> (depth 1)</strong>
    <ul>
      <li>Calls <code class="language-plaintext highlighter-rouge">parse_primary</code> to get the first value.</li>
    </ul>
  </li>
  <li><strong><code class="language-plaintext highlighter-rouge">parse_primary</code> (depth 2)</strong>
    <ul>
      <li>Sees <code class="language-plaintext highlighter-rouge">OpeningParenthesis</code> at index 0.</li>
      <li>Consumes <code class="language-plaintext highlighter-rouge">(</code> and calls <strong><code class="language-plaintext highlighter-rouge">parse_expression</code> (depth 3)</strong> for the parenthesized subexpression.</li>
    </ul>
  </li>
  <li><strong><code class="language-plaintext highlighter-rouge">parse_expression</code> (depth 3)</strong>
    <ul>
      <li>Calls <code class="language-plaintext highlighter-rouge">parse_primary</code> (depth 4).</li>
    </ul>
  </li>
  <li><strong><code class="language-plaintext highlighter-rouge">parse_primary</code> (depth 4)</strong>
    <ul>
      <li>Sees <code class="language-plaintext highlighter-rouge">Operand(1)</code> at index 1.</li>
      <li>Calls <code class="language-plaintext highlighter-rouge">parse_operand</code> (depth 5), which consumes index 1 and returns <code class="language-plaintext highlighter-rouge">1</code>.</li>
    </ul>
  </li>
  <li><strong><code class="language-plaintext highlighter-rouge">parse_expression</code> (depth 3)</strong> (resuming)
    <ul>
      <li>Sees <code class="language-plaintext highlighter-rouge">Plus</code> at index 2.</li>
      <li>Consumes <code class="language-plaintext highlighter-rouge">+</code> and calls <code class="language-plaintext highlighter-rouge">parse_primary</code> (depth 4) again.</li>
    </ul>
  </li>
  <li><strong><code class="language-plaintext highlighter-rouge">parse_primary</code> (depth 4)</strong>
    <ul>
      <li>Sees <code class="language-plaintext highlighter-rouge">Operand(2)</code> at index 3.</li>
      <li>Calls <code class="language-plaintext highlighter-rouge">parse_operand</code> (depth 5), which consumes index 3 and returns <code class="language-plaintext highlighter-rouge">2</code>.</li>
    </ul>
  </li>
  <li><strong><code class="language-plaintext highlighter-rouge">parse_expression</code> (depth 3)</strong>
    <ul>
      <li>Combines <code class="language-plaintext highlighter-rouge">1 + 2 = 3</code>.</li>
      <li>Returns <code class="language-plaintext highlighter-rouge">3</code> to the caller at depth 2.</li>
    </ul>
  </li>
  <li><strong><code class="language-plaintext highlighter-rouge">parse_primary</code> (depth 2)</strong> (resuming)
    <ul>
      <li>Now at index 4 sees <code class="language-plaintext highlighter-rouge">ClosingParenthesis</code>.</li>
      <li>Consumes <code class="language-plaintext highlighter-rouge">)</code> and returns the inner value <code class="language-plaintext highlighter-rouge">3</code>.</li>
    </ul>
  </li>
  <li><strong><code class="language-plaintext highlighter-rouge">parse_expression</code> (depth 1)</strong> (resuming)
    <ul>
      <li>Left side is <code class="language-plaintext highlighter-rouge">3</code>.</li>
      <li>Sees <code class="language-plaintext highlighter-rouge">Minus</code> at index 5.</li>
      <li>Consumes <code class="language-plaintext highlighter-rouge">-</code> and calls <code class="language-plaintext highlighter-rouge">parse_primary</code> (depth 2).</li>
    </ul>
  </li>
  <li><strong><code class="language-plaintext highlighter-rouge">parse_primary</code> (depth 2)</strong>
    <ul>
      <li>Sees <code class="language-plaintext highlighter-rouge">Operand(3)</code> at index 6.</li>
      <li>Calls <code class="language-plaintext highlighter-rouge">parse_operand</code> (depth 3), consumes it, and returns <code class="language-plaintext highlighter-rouge">3</code>.</li>
    </ul>
  </li>
  <li><strong><code class="language-plaintext highlighter-rouge">parse_expression</code> (depth 1)</strong>
    <ul>
      <li>Computes <code class="language-plaintext highlighter-rouge">3 - 3 = 0</code>.</li>
      <li>No more operators, so it returns <code class="language-plaintext highlighter-rouge">0</code> as the final result.</li>
    </ul>
  </li>
</ol>
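<p>The walkthrough can be checked with a small self-contained sketch. The body of <code class="language-plaintext highlighter-rouge">parse_primary</code> here is an assumption reconstructed from the steps in the list, and the tokenizer splits on whitespace, so the expression is written with spaces around the parentheses.</p>

```rust
use std::iter::Peekable;

#[derive(Debug, Clone, PartialEq)]
enum Token {
    Operand(u32),
    Plus,
    Minus,
    OpeningParenthesis,
    ClosingParenthesis,
}

fn tokenize(input: &str) -> impl Iterator<Item = Token> + '_ {
    input.split_whitespace().map(|s| match s {
        "+" => Token::Plus,
        "-" => Token::Minus,
        "(" => Token::OpeningParenthesis,
        ")" => Token::ClosingParenthesis,
        n => Token::Operand(n.parse().unwrap()),
    })
}

// Handles a chain of additions and subtractions, left to right.
fn parse_expression(tokens: &mut Peekable<impl Iterator<Item = Token>>) -> u32 {
    let mut left = parse_primary(tokens);
    while let Some(Token::Plus) | Some(Token::Minus) = tokens.peek() {
        let operator = tokens.next();
        let right = parse_primary(tokens);
        left = match operator {
            Some(Token::Plus) => left + right,
            Some(Token::Minus) => left - right,
            other => panic!("Expected operator, got {:?}", other),
        };
    }
    left
}

// Assumed shape: a primary is either a parenthesized expression or a number.
fn parse_primary(tokens: &mut Peekable<impl Iterator<Item = Token>>) -> u32 {
    match tokens.peek() {
        Some(Token::OpeningParenthesis) => {
            tokens.next(); // consume '('
            let value = parse_expression(tokens);
            tokens.next(); // consume ')'
            value
        }
        _ => parse_operand(tokens),
    }
}

fn parse_operand(tokens: &mut Peekable<impl Iterator<Item = Token>>) -> u32 {
    match tokens.next() {
        Some(Token::Operand(n)) => n,
        other => panic!("Expected number, got {:?}", other),
    }
}

fn eval(input: &str) -> u32 {
    let mut tokens = tokenize(input).peekable();
    parse_expression(&mut tokens)
}

fn main() {
    println!("{}", eval("( 1 + 2 ) - 3"));
}
```

<p>Note that the result type is <code class="language-plaintext highlighter-rouge">u32</code>, so an intermediate value below zero would overflow; the example stays at or above zero at every step.</p>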

<hr />

<h3 id="it-works-but-we-can-do-better">It works! But we can do better</h3>

<p>This baseline parser works well, but it’s not optimized.</p>

<p>If we compile it in release mode and run it against the <strong>1.5GB</strong> test file, it takes <strong>43.07 seconds</strong> on my laptop:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Step 1: Input file read in 1.189915008s
Step 2: Calculation completed in 41.876205675s

--- Summary ---
Result: 2652
Total time: 43.06795088s

</code></pre></div></div>

<p>While the current parser is correct, its 43-second runtime shows there is room for improvement. Our goal is to make it faster and more memory-efficient.</p>

<p>We will improve the parser’s performance through several key optimizations:</p>

<ul>
  <li><strong>Eliminate unnecessary allocations:</strong> First, we’ll change the tokenizer to avoid creating a list of tokens in memory.</li>
  <li><strong>Process bytes directly:</strong> We’ll modify the parser to read raw bytes instead of string slices, reducing overhead.</li>
  <li><strong>Parallelize the work:</strong> We’ll use multithreading and SIMD to perform calculations in parallel.</li>
  <li><strong>Optimize file I/O:</strong> Finally, we’ll use memory-mapped files to speed up file reading.</li>
</ul>

<p>Let’s get started.</p>

<hr />

<h2 id="optimizations-for-speed-and-memory">Optimizations for speed and memory</h2>

<h3 id="optimization-1-do-not-allocate-a-vector-when-tokenizing-431s--645s-85-improvement">Optimization 1: Do not allocate a Vector when tokenizing (43.1 s → 6.45 s, –85% improvement)</h3>

<p>Let’s use <a href="https://github.com/brendangregg/FlameGraph">cargo flamegraph</a> to visualize the call stack of the current solution and identify areas for optimization.</p>

<p><code class="language-plaintext highlighter-rouge">cargo flamegraph --dev --bin parser</code></p>

<p>We get the following flame graph:</p>

<p><img src="../assets/images/math_parser/flamegraph_1_naive_solution.png" alt="First flamegraph" /></p>

<p>We can see that the majority of the time is spent in the tokenizer function, which reads the input string and allocates a vector of tokens.</p>

<p>To profile memory usage, we can use <a href="https://github.com/nnethercote/dhat-rs">dhat</a> to generate a profile JSON file and view it in the <a href="https://nnethercote.github.io/dh_view/dh_view.html">DHAT viewer</a>:</p>

<p><img src="../assets/images/math_parser/memory_1_naive_solution.png" alt="First memory profiling" /></p>

<p>Notice how <strong>4 GB of RAM</strong> is used just to allocate the token vector!</p>
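<p>A rough way to see why a fully collected token vector gets so large is to multiply the token count by the size of one <code class="language-plaintext highlighter-rouge">Token</code>. The sketch below does that arithmetic; the token count is a hypothetical figure for illustration, not a number measured from the test file, and the helper <code class="language-plaintext highlighter-rouge">token_vec_bytes</code> is an assumption of this example.</p>

```rust
use std::mem::size_of;

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Operand(u32),
    Plus,
    Minus,
    OpeningParenthesis,
    ClosingParenthesis,
}

// Hypothetical helper: lower bound on the heap bytes needed to hold
// `token_count` tokens in a Vec (ignores spare capacity from growth).
fn token_vec_bytes(token_count: u64) -> u64 {
    token_count * size_of::<Token>() as u64
}

fn main() {
    // The enum carries a u32 payload plus a discriminant, so it is
    // larger than the one or two input bytes most tokens come from.
    println!("size_of::<Token>() = {} bytes", size_of::<Token>());

    // Hypothetical token count, chosen only to illustrate the scale.
    let token_count: u64 = 500_000_000;
    println!(
        "{} tokens need at least {} bytes",
        token_count,
        token_vec_bytes(token_count)
    );
}
```

<p>A growing <code class="language-plaintext highlighter-rouge">Vec</code> also reallocates as it fills up, so peak usage can be higher than this lower bound.</p>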

<hr />

<p>I made a mistake in my initial implementation. Why does the <code class="language-plaintext highlighter-rouge">tokenize</code> function return a vector if we’re converting it into an iterator later anyway? Let’s just return a lazy iterator directly instead of allocating a vector:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">eval</span><span class="p">(</span><span class="n">input</span><span class="p">:</span> <span class="o">&amp;</span><span class="nb">str</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">u32</span> <span class="p">{</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">tokens</span> <span class="o">=</span> <span class="nf">tokenize</span><span class="p">(</span><span class="n">input</span><span class="p">)</span><span class="nf">.peekable</span><span class="p">();</span>
    <span class="nf">parse_expression</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="n">tokens</span><span class="p">)</span>
<span class="p">}</span>

<span class="k">fn</span> <span class="nf">tokenize</span><span class="p">(</span><span class="n">input</span><span class="p">:</span> <span class="o">&amp;</span><span class="nb">str</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="k">impl</span> <span class="nb">Iterator</span><span class="o">&lt;</span><span class="n">Item</span> <span class="o">=</span> <span class="n">Token</span><span class="o">&gt;</span> <span class="o">+</span> <span class="nv">'_</span> <span class="p">{</span>
    <span class="n">input</span><span class="nf">.split_whitespace</span><span class="p">()</span><span class="nf">.map</span><span class="p">(|</span><span class="n">s</span><span class="p">|</span> <span class="k">match</span> <span class="n">s</span> <span class="p">{</span>
        <span class="s">"+"</span> <span class="k">=&gt;</span> <span class="nn">Token</span><span class="p">::</span><span class="n">Plus</span><span class="p">,</span>
        <span class="s">"-"</span> <span class="k">=&gt;</span> <span class="nn">Token</span><span class="p">::</span><span class="n">Minus</span><span class="p">,</span>
        <span class="s">"("</span> <span class="k">=&gt;</span> <span class="nn">Token</span><span class="p">::</span><span class="n">OpeningParenthesis</span><span class="p">,</span>
        <span class="s">")"</span> <span class="k">=&gt;</span> <span class="nn">Token</span><span class="p">::</span><span class="n">ClosingParenthesis</span><span class="p">,</span>
        <span class="n">n</span> <span class="k">=&gt;</span> <span class="nn">Token</span><span class="p">::</span><span class="nf">Operand</span><span class="p">(</span><span class="n">n</span><span class="nf">.parse</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">()),</span>
    <span class="p">})</span>
<span class="p">}</span>
</code></pre></div></div>

<p>If we run the parser again after this small change, the speed improves significantly:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Step 1: Input file read in 1.249408413s
Step 2: Calculation completed in 5.204344393s

--- Summary ---
Result: 2652
Total time: 6.45377661s
</code></pre></div></div>

<p><strong>Wow! From 43 seconds down to just 6.45</strong>. What an improvement. A small mistake can have a huge impact on performance. Fortunately, the flamegraph pointed us straight to the bottleneck!</p>
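<p>The difference comes from laziness: the iterator only produces a token when the parser asks for one. A small sketch makes this visible. The helper <code class="language-plaintext highlighter-rouge">first_tokens</code> is a hypothetical name used only for this example; it takes the first two tokens of an input whose third segment is not a valid number, and the invalid segment is never parsed.</p>

```rust
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Operand(u32),
    Plus,
    Minus,
    OpeningParenthesis,
    ClosingParenthesis,
}

fn tokenize(input: &str) -> impl Iterator<Item = Token> + '_ {
    input.split_whitespace().map(|s| match s {
        "+" => Token::Plus,
        "-" => Token::Minus,
        "(" => Token::OpeningParenthesis,
        ")" => Token::ClosingParenthesis,
        // Panics on anything that is not an unsigned integer.
        n => Token::Operand(n.parse().unwrap()),
    })
}

// Hypothetical helper: pull at most `n` tokens from the lazy iterator.
fn first_tokens(input: &str, n: usize) -> Vec<Token> {
    tokenize(input).take(n).collect()
}

fn main() {
    // "oops" would panic in `parse().unwrap()`, but `take(2)` stops
    // before the iterator ever reaches it.
    let tokens = first_tokens("7 - oops", 2);
    println!("{:?}", tokens);
}
```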

<hr />

<h3 id="optimization-2-zero-allocations--parse-directly-from-the-input-bytes-645s--368s-43-improvement">Optimization 2: Zero allocations — parse directly from the input bytes (6.45 s → 3.68 s, –43% improvement)</h3>

<p>After removing the initial <code class="language-plaintext highlighter-rouge">Vec&lt;Token&gt;</code> allocation, performance improved significantly. But we can still do better.</p>

<p>If we analyze the flamegraph again, we notice that although we no longer allocate a vector of tokens, we’re still splitting the input string by whitespace. This iterator-based approach is a huge improvement, but there is still overhead in processing the string and creating <code class="language-plaintext highlighter-rouge">&amp;str</code> slices for each token:</p>

<p><img src="../assets/images/math_parser/flamegraph_2.png" alt="Second flamegraph" /></p>

<p>The pink/violet boxes correspond to the <code class="language-plaintext highlighter-rouge">split_whitespace</code> function used by our tokenizer:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">tokenize</span><span class="p">(</span><span class="n">input</span><span class="p">:</span> <span class="o">&amp;</span><span class="nb">str</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="k">impl</span> <span class="nb">Iterator</span><span class="o">&lt;</span><span class="n">Item</span> <span class="o">=</span> <span class="n">Token</span><span class="o">&gt;</span> <span class="o">+</span> <span class="nv">'_</span> <span class="p">{</span>
    <span class="n">input</span><span class="nf">.split_whitespace</span><span class="p">()</span><span class="nf">.map</span><span class="p">(|</span><span class="n">s</span><span class="p">|</span> <span class="k">match</span> <span class="n">s</span> <span class="p">{</span>
        <span class="s">"+"</span> <span class="k">=&gt;</span> <span class="nn">Token</span><span class="p">::</span><span class="n">Plus</span><span class="p">,</span>
        <span class="s">"-"</span> <span class="k">=&gt;</span> <span class="nn">Token</span><span class="p">::</span><span class="n">Minus</span><span class="p">,</span>
        <span class="s">"("</span> <span class="k">=&gt;</span> <span class="nn">Token</span><span class="p">::</span><span class="n">OpeningParenthesis</span><span class="p">,</span>
        <span class="s">")"</span> <span class="k">=&gt;</span> <span class="nn">Token</span><span class="p">::</span><span class="n">ClosingParenthesis</span><span class="p">,</span>
        <span class="n">n</span> <span class="k">=&gt;</span> <span class="nn">Token</span><span class="p">::</span><span class="nf">Operand</span><span class="p">(</span><span class="n">n</span><span class="nf">.parse</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">()),</span>
    <span class="p">})</span>
<span class="p">}</span>
</code></pre></div></div>

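<p>We’re paying a cost on every step of the <code class="language-plaintext highlighter-rouge">split_whitespace</code> iterator. It does not allocate on the heap, but it decodes UTF-8 characters, checks each one against the Unicode whitespace rules, and builds a <code class="language-plaintext highlighter-rouge">&amp;str</code> slice that <code class="language-plaintext highlighter-rouge">parse()</code> then has to scan a second time. That work burns CPU cycles on every token.</p>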

<p>Let’s dive deeper.</p>

<h4 id="the-idea-use-u8">The idea: Use &amp;[u8]</h4>

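<p>Instead of working with UTF-8 strings and <code class="language-plaintext highlighter-rouge">&amp;str</code>, we can use raw bytes (<code class="language-plaintext highlighter-rouge">&amp;[u8]</code>) and manually scan for digits and operators, which skips the intermediate <code class="language-plaintext highlighter-rouge">&amp;str</code> slices and the separate <code class="language-plaintext highlighter-rouge">parse()</code> call.</p>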

<p>Here is our new zero-allocation tokenizer:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">read_input_file</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="nb">Vec</span><span class="o">&lt;</span><span class="nb">u8</span><span class="o">&gt;&gt;</span> <span class="p">{</span>
    <span class="nn">fs</span><span class="p">::</span><span class="nf">read</span><span class="p">(</span><span class="s">"data/input.txt"</span><span class="p">)</span>
<span class="p">}</span>

<span class="k">struct</span> <span class="n">Tokenizer</span><span class="o">&lt;</span><span class="nv">'a</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="n">input</span><span class="p">:</span> <span class="o">&amp;</span><span class="nv">'a</span> <span class="p">[</span><span class="nb">u8</span><span class="p">],</span>
    <span class="n">pos</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span>
<span class="p">}</span>

<span class="k">impl</span><span class="o">&lt;</span><span class="nv">'a</span><span class="o">&gt;</span> <span class="nb">Iterator</span> <span class="k">for</span> <span class="n">Tokenizer</span><span class="o">&lt;</span><span class="nv">'a</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="k">type</span> <span class="n">Item</span> <span class="o">=</span> <span class="n">Token</span><span class="p">;</span>

    <span class="k">fn</span> <span class="nf">next</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Option</span><span class="o">&lt;</span><span class="k">Self</span><span class="p">::</span><span class="n">Item</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="k">if</span> <span class="k">self</span><span class="py">.pos</span> <span class="o">&gt;=</span> <span class="k">self</span><span class="py">.input</span><span class="nf">.len</span><span class="p">()</span> <span class="p">{</span>
            <span class="k">return</span> <span class="nb">None</span><span class="p">;</span>
        <span class="p">}</span>

        <span class="k">let</span> <span class="n">byte</span> <span class="o">=</span> <span class="k">self</span><span class="py">.input</span><span class="p">[</span><span class="k">self</span><span class="py">.pos</span><span class="p">];</span>

        <span class="k">self</span><span class="py">.pos</span> <span class="o">+=</span> <span class="mi">1</span><span class="p">;</span>

        <span class="k">let</span> <span class="n">token</span> <span class="o">=</span> <span class="k">match</span> <span class="n">byte</span> <span class="p">{</span>
            <span class="sc">b'+'</span> <span class="k">=&gt;</span> <span class="nf">Some</span><span class="p">(</span><span class="nn">Token</span><span class="p">::</span><span class="n">Plus</span><span class="p">),</span>
            <span class="sc">b'-'</span> <span class="k">=&gt;</span> <span class="nf">Some</span><span class="p">(</span><span class="nn">Token</span><span class="p">::</span><span class="n">Minus</span><span class="p">),</span>
            <span class="sc">b'('</span> <span class="k">=&gt;</span> <span class="nf">Some</span><span class="p">(</span><span class="nn">Token</span><span class="p">::</span><span class="n">OpeningParenthesis</span><span class="p">),</span>
            <span class="sc">b')'</span> <span class="k">=&gt;</span> <span class="nf">Some</span><span class="p">(</span><span class="nn">Token</span><span class="p">::</span><span class="n">ClosingParenthesis</span><span class="p">),</span>
            <span class="sc">b'0'</span><span class="o">..=</span><span class="sc">b'9'</span> <span class="k">=&gt;</span> <span class="p">{</span>
                <span class="k">let</span> <span class="k">mut</span> <span class="n">value</span> <span class="o">=</span> <span class="p">(</span><span class="n">byte</span> <span class="o">-</span> <span class="sc">b'0'</span><span class="p">)</span> <span class="k">as</span> <span class="nb">u32</span><span class="p">;</span>
                <span class="k">while</span> <span class="k">self</span><span class="py">.pos</span> <span class="o">&lt;</span> <span class="k">self</span><span class="py">.input</span><span class="nf">.len</span><span class="p">()</span> <span class="o">&amp;&amp;</span> <span class="k">self</span><span class="py">.input</span><span class="p">[</span><span class="k">self</span><span class="py">.pos</span><span class="p">]</span><span class="nf">.is_ascii_digit</span><span class="p">()</span> <span class="p">{</span>
                    <span class="n">value</span> <span class="o">=</span> <span class="mi">10</span> <span class="o">*</span> <span class="n">value</span> <span class="o">+</span> <span class="p">(</span><span class="k">self</span><span class="py">.input</span><span class="p">[</span><span class="k">self</span><span class="py">.pos</span><span class="p">]</span> <span class="o">-</span> <span class="sc">b'0'</span><span class="p">)</span> <span class="k">as</span> <span class="nb">u32</span><span class="p">;</span>
                    <span class="k">self</span><span class="py">.pos</span> <span class="o">+=</span> <span class="mi">1</span><span class="p">;</span>
                <span class="p">}</span>

                <span class="nf">Some</span><span class="p">(</span><span class="nn">Token</span><span class="p">::</span><span class="nf">Operand</span><span class="p">(</span><span class="n">value</span><span class="p">))</span>
            <span class="p">}</span>
            <span class="n">other</span> <span class="k">=&gt;</span> <span class="nd">panic!</span><span class="p">(</span><span class="s">"Unexpected byte: '{}'"</span><span class="p">,</span> <span class="n">other</span> <span class="k">as</span> <span class="nb">char</span><span class="p">),</span>
        <span class="p">};</span>

        <span class="k">self</span><span class="py">.pos</span> <span class="o">+=</span> <span class="mi">1</span><span class="p">;</span> <span class="c1">// skip whitespace</span>

        <span class="k">return</span> <span class="n">token</span><span class="p">;</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The only heap allocation occurs when the file is read into a vector. The tokenizer operates on references to that vector of bytes and does not perform any intermediate allocations.</p>
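<p>Here is a minimal usage sketch of a byte-based tokenizer of this kind. It accumulates the operand as a <code class="language-plaintext highlighter-rouge">u32</code>, and the helper <code class="language-plaintext highlighter-rouge">tokens</code> is a hypothetical name added only for this example. Like the version above, it assumes the input is well formed, with exactly one separator byte after every token.</p>

```rust
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Operand(u32),
    Plus,
    Minus,
    OpeningParenthesis,
    ClosingParenthesis,
}

struct Tokenizer<'a> {
    input: &'a [u8],
    pos: usize,
}

impl<'a> Iterator for Tokenizer<'a> {
    type Item = Token;

    fn next(&mut self) -> Option<Token> {
        if self.pos >= self.input.len() {
            return None;
        }

        let byte = self.input[self.pos];
        self.pos += 1;

        let token = match byte {
            b'+' => Token::Plus,
            b'-' => Token::Minus,
            b'(' => Token::OpeningParenthesis,
            b')' => Token::ClosingParenthesis,
            b'0'..=b'9' => {
                // Accumulate the decimal digits into a u32, byte by byte.
                let mut value = u32::from(byte - b'0');
                while self.pos < self.input.len() && self.input[self.pos].is_ascii_digit() {
                    value = 10 * value + u32::from(self.input[self.pos] - b'0');
                    self.pos += 1;
                }
                Token::Operand(value)
            }
            other => panic!("Unexpected byte: '{}'", other as char),
        };

        // Assumes exactly one separator byte follows every token.
        self.pos += 1;

        Some(token)
    }
}

// Hypothetical helper: collect all tokens, only to inspect them here.
fn tokens(input: &[u8]) -> Vec<Token> {
    Tokenizer { input, pos: 0 }.collect()
}

fn main() {
    println!("{:?}", tokens(b"( 12 + 30 ) - 2"));
}
```

<p>Collecting into a <code class="language-plaintext highlighter-rouge">Vec</code> is only done here to print the tokens; the parser itself would pull them one at a time.</p>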

<p>If we execute the program again, we get:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Step 1: Input file read in 1.212080967s
Step 2: Calculation completed in 2.471639289s

--- Summary ---
Result: 2652
Total time: 3.683753465s
</code></pre></div></div>

<p>A great improvement! From <strong>6.45 to 3.68 seconds</strong>, almost 2.8 seconds faster!</p>

<hr />

<h3 id="optimization-3-do-not-use-peekable-368s--321s-13-improvement">Optimization 3: Do not use Peekable (3.68 s → 3.21 s, –13% improvement)</h3>

<p>The new flamegraph shows several samples related to <code class="language-plaintext highlighter-rouge">Peekable</code>:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">core::iter::adapters::peekable::Peekable::peek::_</code></li>
  <li><code class="language-plaintext highlighter-rouge">core::iter::adapters::peekable::Peekable::peek</code></li>
</ul>

<p><img src="../assets/images/math_parser/flamegraph_2.png" alt="Third flamegraph" /></p>

<p>This is because we wrap our tokenizer in Rust’s <code class="language-plaintext highlighter-rouge">Peekable</code> adapter, which allows us to inspect the next token without consuming it. We initially used it for lookahead when parsing expressions like <code class="language-plaintext highlighter-rouge">1 + (2 - 3)</code>, to decide whether to keep parsing or return early.</p>

<p>However, in our use case, <code class="language-plaintext highlighter-rouge">peek()</code> isn’t necessary. We can restructure the algorithm to work directly with a plain iterator.</p>

<p>Here’s the old version:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">parse_expression</span><span class="p">(</span><span class="n">tokens</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="n">Peekable</span><span class="o">&lt;</span><span class="k">impl</span> <span class="nb">Iterator</span><span class="o">&lt;</span><span class="n">Item</span> <span class="o">=</span> <span class="n">Token</span><span class="o">&gt;&gt;</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">u32</span> <span class="p">{</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">left</span> <span class="o">=</span> <span class="nf">parse_primary</span><span class="p">(</span><span class="n">tokens</span><span class="p">);</span>

    <span class="k">while</span> <span class="k">let</span> <span class="nf">Some</span><span class="p">(</span><span class="nn">Token</span><span class="p">::</span><span class="n">Plus</span><span class="p">)</span> <span class="p">|</span> <span class="nf">Some</span><span class="p">(</span><span class="nn">Token</span><span class="p">::</span><span class="n">Minus</span><span class="p">)</span> <span class="o">=</span> <span class="n">tokens</span><span class="nf">.peek</span><span class="p">()</span> <span class="p">{</span>
        <span class="k">let</span> <span class="n">operator</span> <span class="o">=</span> <span class="n">tokens</span><span class="nf">.next</span><span class="p">();</span>
        <span class="k">let</span> <span class="n">right</span> <span class="o">=</span> <span class="nf">parse_primary</span><span class="p">(</span><span class="n">tokens</span><span class="p">);</span>
        <span class="n">left</span> <span class="o">=</span> <span class="k">match</span> <span class="n">operator</span> <span class="p">{</span>
            <span class="nf">Some</span><span class="p">(</span><span class="nn">Token</span><span class="p">::</span><span class="n">Plus</span><span class="p">)</span> <span class="k">=&gt;</span> <span class="n">left</span> <span class="o">+</span> <span class="n">right</span><span class="p">,</span>
            <span class="nf">Some</span><span class="p">(</span><span class="nn">Token</span><span class="p">::</span><span class="n">Minus</span><span class="p">)</span> <span class="k">=&gt;</span> <span class="n">left</span> <span class="o">-</span> <span class="n">right</span><span class="p">,</span>
            <span class="n">other</span> <span class="k">=&gt;</span> <span class="nd">panic!</span><span class="p">(</span><span class="s">"Expected operator, got {:?}"</span><span class="p">,</span> <span class="n">other</span><span class="p">),</span>
        <span class="p">};</span>
    <span class="p">}</span>

    <span class="n">left</span>
<span class="p">}</span>
</code></pre></div></div>

<p>And here’s the new version that eliminates <code class="language-plaintext highlighter-rouge">Peekable</code>:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">parse_expression</span><span class="p">(</span><span class="n">tokens</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="k">impl</span> <span class="nb">Iterator</span><span class="o">&lt;</span><span class="n">Item</span> <span class="o">=</span> <span class="n">Token</span><span class="o">&gt;</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">u32</span> <span class="p">{</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">left</span> <span class="o">=</span> <span class="nf">parse_primary</span><span class="p">(</span><span class="n">tokens</span><span class="p">);</span>

    <span class="k">while</span> <span class="k">let</span> <span class="nf">Some</span><span class="p">(</span><span class="n">token</span><span class="p">)</span> <span class="o">=</span> <span class="n">tokens</span><span class="nf">.next</span><span class="p">()</span> <span class="p">{</span>
        <span class="k">if</span> <span class="n">token</span> <span class="o">==</span> <span class="nn">Token</span><span class="p">::</span><span class="n">ClosingParenthesis</span> <span class="p">{</span>
            <span class="k">break</span><span class="p">;</span>
        <span class="p">}</span>

        <span class="k">let</span> <span class="n">right</span> <span class="o">=</span> <span class="nf">parse_primary</span><span class="p">(</span><span class="n">tokens</span><span class="p">);</span>
        <span class="n">left</span> <span class="o">=</span> <span class="k">match</span> <span class="n">token</span> <span class="p">{</span>
            <span class="nn">Token</span><span class="p">::</span><span class="n">Plus</span> <span class="k">=&gt;</span> <span class="n">left</span> <span class="o">+</span> <span class="n">right</span><span class="p">,</span>
            <span class="nn">Token</span><span class="p">::</span><span class="n">Minus</span> <span class="k">=&gt;</span> <span class="n">left</span> <span class="o">-</span> <span class="n">right</span><span class="p">,</span>
            <span class="n">other</span> <span class="k">=&gt;</span> <span class="nd">panic!</span><span class="p">(</span><span class="s">"Expected operator, got {:?}"</span><span class="p">,</span> <span class="n">other</span><span class="p">),</span>
        <span class="p">};</span>
    <span class="p">}</span>

    <span class="n">left</span>
<span class="p">}</span>
</code></pre></div></div>

<p>We replaced the <code class="language-plaintext highlighter-rouge">peek()</code> logic with a <code class="language-plaintext highlighter-rouge">match</code> on the current token. If it’s a <code class="language-plaintext highlighter-rouge">+</code> or <code class="language-plaintext highlighter-rouge">-</code>, we consume the right-hand operand and compute the result. If it’s a closing parenthesis, we <code class="language-plaintext highlighter-rouge">break</code> (this is an important point: <strong>we no longer manually skip the closing parenthesis after parsing a sub-expression</strong>).</p>

<p>Previously, with <code class="language-plaintext highlighter-rouge">peekable</code>, we consumed the <code class="language-plaintext highlighter-rouge">(</code>, parsed the sub-expression, and then had to explicitly <code class="language-plaintext highlighter-rouge">next()</code> again to discard the <code class="language-plaintext highlighter-rouge">)</code> after the recursive call. Now, since we’re using a flat iterator, we simply let the closing <code class="language-plaintext highlighter-rouge">)</code> token be returned by <code class="language-plaintext highlighter-rouge">next()</code>, and our <code class="language-plaintext highlighter-rouge">while let Some(token)</code> loop handles it. If the token is a <code class="language-plaintext highlighter-rouge">)</code>, we break out of the loop, and the recursive call returns.</p>

<p>We also simplified <code class="language-plaintext highlighter-rouge">parse_primary</code> in a similar way:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">parse_primary</span><span class="p">(</span><span class="n">tokens</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="k">impl</span> <span class="nb">Iterator</span><span class="o">&lt;</span><span class="n">Item</span> <span class="o">=</span> <span class="n">Token</span><span class="o">&gt;</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">u32</span> <span class="p">{</span>
    <span class="k">match</span> <span class="n">tokens</span><span class="nf">.next</span><span class="p">()</span> <span class="p">{</span>
        <span class="nf">Some</span><span class="p">(</span><span class="nn">Token</span><span class="p">::</span><span class="n">OpeningParenthesis</span><span class="p">)</span> <span class="k">=&gt;</span> <span class="p">{</span>
            <span class="k">let</span> <span class="n">val</span> <span class="o">=</span> <span class="nf">parse_expression</span><span class="p">(</span><span class="n">tokens</span><span class="p">);</span>
            <span class="n">val</span>
        <span class="p">}</span>
        <span class="nf">Some</span><span class="p">(</span><span class="nn">Token</span><span class="p">::</span><span class="nf">Operand</span><span class="p">(</span><span class="n">n</span><span class="p">))</span> <span class="k">=&gt;</span> <span class="n">n</span> <span class="k">as</span> <span class="nb">u32</span><span class="p">,</span>
        <span class="n">other</span> <span class="k">=&gt;</span> <span class="nd">panic!</span><span class="p">(</span><span class="s">"Expected number, got {:?}"</span><span class="p">,</span> <span class="n">other</span><span class="p">),</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>By avoiding <code class="language-plaintext highlighter-rouge">peek()</code> and handling the tokens linearly, we improve the performance:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Step 1: Input file read in 1.116952011s
Step 2: Calculation completed in 2.094806113s

--- Summary ---
Result: 2652
Total time: 3.21178544s
</code></pre></div></div>

<p>From <strong>3.68 to 3.21 seconds</strong>. We are getting faster. Let’s continue optimizing!</p>

<hr />

<h3 id="optimization-4-multithreading-and-simd-321s--221s-31-improvement">Optimization 4: Multithreading and SIMD (3.21 s → 2.21 s, –31% improvement)</h3>

<p>The next logical step is to parallelize the computation. Ideally, if we have a CPU with 8 cores, we want to split the input file into 8 equal chunks and have each core work on one chunk simultaneously. This should, in theory, make our program up to 8 times faster.</p>

<p>However, this is not as simple as just splitting the file into 8 equal chunks. We are bound by the rules of math and syntax, which introduce two restrictions:</p>

<ol>
  <li><strong>We cannot split inside parentheses.</strong> A split can only happen at the “top level” of the expression. For example, splitting <code class="language-plaintext highlighter-rouge">((2 + 1)| - 2)</code> is invalid, and this applies to nested parentheses as well.</li>
  <li><strong>We cannot split at a <code class="language-plaintext highlighter-rouge">-</code> operator.</strong> Addition is <em>associative</em>, meaning <code class="language-plaintext highlighter-rouge">(a + b) + c</code> is equivalent to <code class="language-plaintext highlighter-rouge">a + (b + c)</code>. This property allows us to group additions freely. Subtraction, however, is <em>not</em> associative: <code class="language-plaintext highlighter-rouge">(a - b) - c</code> is not the same as <code class="language-plaintext highlighter-rouge">a - (b - c)</code>. Splitting on a <code class="language-plaintext highlighter-rouge">-</code> would alter the order of operations and lead to an incorrect result.</li>
</ol>

<p>These restrictions mean we cannot simply split the file at <code class="language-plaintext highlighter-rouge">(total_size / 8)</code>. We need a way to find the <em>closest valid split point</em> (a <code class="language-plaintext highlighter-rouge">+</code> sign at the top level) to that ideal boundary.</p>
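<p>To see both restrictions with concrete numbers, here is a minimal sketch. The helper <code class="language-plaintext highlighter-rouge">eval_flat</code> is hypothetical (it is not part of the parser above) and only handles flat expressions without parentheses, which is enough to compare a split at <code class="language-plaintext highlighter-rouge">+</code> with a split at <code class="language-plaintext highlighter-rouge">-</code>:</p>

```rust
/// Hypothetical helper for illustration only: evaluates a flat expression of
/// whitespace-separated integers joined by '+' or '-' strictly left to right.
fn eval_flat(expr: &str) -> i64 {
    let mut tokens = expr.split_whitespace();
    let mut acc: i64 = tokens
        .next()
        .expect("empty expression")
        .parse()
        .expect("expected a number");
    while let Some(op) = tokens.next() {
        let rhs: i64 = tokens
            .next()
            .expect("expected an operand after the operator")
            .parse()
            .expect("expected a number");
        match op {
            "+" => acc += rhs,
            "-" => acc -= rhs,
            other => panic!("unexpected operator {other}"),
        }
    }
    acc
}

fn main() {
    // Evaluated as a whole: ((10 - 3) + 4) - 2
    let whole = eval_flat("10 - 3 + 4 - 2");

    // Split at the top-level '+': both halves are evaluated on their own
    // and then added, which preserves the result.
    let at_plus = eval_flat("10 - 3") + eval_flat("4 - 2");
    assert_eq!(whole, at_plus);

    // Split at the first '-': the right half becomes 10 - (3 + 4 - 2),
    // which regroups the expression and changes the result.
    let at_minus = eval_flat("10") - eval_flat("3 + 4 - 2");
    assert_ne!(whole, at_minus);

    println!("whole = {whole}, at '+' = {at_plus}, at '-' = {at_minus}");
}
```

<p>Splitting at a top-level <code class="language-plaintext highlighter-rouge">+</code> works even when the chunks themselves contain subtractions, because each chunk is still evaluated left to right before the partial results are summed.</p>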

<p>To find these points, we need to scan the entire input and track where every open parenthesis has been closed again. A naive byte-by-byte scan would be slow, since it requires a full pass over the data before the actual work begins. So, is this two-pass solution slower than a single pass? Not necessarily. We can make the first pass very fast by using <strong>SIMD</strong>.</p>

<h4 id="the-algorithm-at-a-high-level">The algorithm at a high level</h4>

<p>Before diving into the code, let’s look at the high-level plan. The entire process is started by our <code class="language-plaintext highlighter-rouge">parallel_eval</code> function, which follows this data flow:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[ Input File ]
      |
      v
.-----------------------.
|     parallel_eval     |
'-----------------------'
      |
      | 1. Find Splits
      v
.----------------------------------.
| find_best_split_indices_simd     |------&gt; [ Split Indices ]
'----------------------------------'             |
      |                                          |
      | 2. Create Chunks                         |
      v                                          |
[ Chunk 1 ] [ Chunk 2 ] ... [ Chunk N ] &lt;--------+
      |         |               |
      |         |               | 3. Process in Parallel
      v         v               v
.------------------------------------.
|          Thread Pool               |
|                                    |
|  eval(c1)  eval(c2) ...  eval(cN)  |
'------------------------------------'
      |
      | 4. Collect Results
      v
[ Result 1, Result 2, ... Result N ]
      |
      | 5. Sum Results
      v
[ Final Answer ]
</code></pre></div></div>

<h4 id="what-is-simd">What is SIMD?</h4>

<p><strong>SIMD</strong> stands for <strong>S</strong>ingle <strong>I</strong>nstruction, <strong>M</strong>ultiple <strong>D</strong>ata. It’s a powerful feature built into modern CPUs. At its core, SIMD allows the CPU to perform the same operation on multiple pieces of data <em>at the same time</em>, with a single instruction.</p>

<p>Consider a cashier at a grocery store. A traditional CPU core operates like a cashier scanning items one by one. This is a <strong>scalar</strong> operation, where one instruction processes one piece of data.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Scalar Operation (One by one)
Instruction: Is this byte '+'?
      |
      V
[ H | e | l | l | o |   | + |   | W | o | r | l | d ]
  ^--- Processed sequentially ---&gt;
</code></pre></div></div>

<p>A SIMD-enabled CPU is like a cashier with a wide scanner that can read the barcodes of an entire row of items in the cart simultaneously. This is a <strong>vector</strong> operation.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SIMD Operation (All at once)
Instruction: For all 64 of these bytes, tell me which ones are '+'?
      |
      V
[ H | e | l | l | o |   | + |   | W | o | r | l | d | ... (up to 64 bytes) ]
[ 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... (result mask)  ]
\_______________________________________________________________________/
                    Compared by a single instruction
</code></pre></div></div>

<p>For repetitive tasks, such as searching for a specific character in a long string, this means far fewer instructions for the same amount of data.</p>

<h5 id="simd-example-finding-">SIMD example: Finding +</h5>

<p>In our project, we need to locate all <code class="language-plaintext highlighter-rouge">+</code> characters.</p>

<p><strong>The Scalar Way:</strong>
Without SIMD, we would need a simple <code class="language-plaintext highlighter-rouge">for</code> loop to check every single byte:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="k">mut</span> <span class="n">positions</span> <span class="o">=</span> <span class="nn">Vec</span><span class="p">::</span><span class="nf">new</span><span class="p">();</span>
<span class="k">for</span> <span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">byte</span><span class="p">)</span> <span class="k">in</span> <span class="n">input</span><span class="nf">.iter</span><span class="p">()</span><span class="nf">.enumerate</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">if</span> <span class="n">byte</span> <span class="o">==</span> <span class="sc">b'+'</span> <span class="p">{</span>
        <span class="n">positions</span><span class="nf">.push</span><span class="p">(</span><span class="n">i</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>This approach is simple and correct, but for a 1.5 GB file like ours, the loop body would execute roughly 1.5 billion times.</p>

<p><strong>The SIMD Way:</strong>
With SIMD (specifically, using AVX-512 instructions), the process is different:</p>

<ol>
  <li><strong>Load:</strong> We load a big chunk of our input string (64 bytes at a time) into a wide 512-bit CPU register.</li>
  <li><strong>Compare:</strong> We use a single instruction (<code class="language-plaintext highlighter-rouge">_mm512_cmpeq_epi8_mask</code>) to compare all 64 bytes in our register against a template register that contains 64 copies of the <code class="language-plaintext highlighter-rouge">+</code> character.</li>
  <li><strong>Get Mask:</strong> The CPU returns a single 64-bit integer (<code class="language-plaintext highlighter-rouge">u64</code>) as a result. This is a <strong>bitmask</strong>. If the 5th bit of this integer is <code class="language-plaintext highlighter-rouge">1</code>, it indicates that the 5th byte of our input chunk was a <code class="language-plaintext highlighter-rouge">+</code>.</li>
</ol>

<p>In a single instruction, we have done the work of 64 loop iterations. While SIMD requires more complex code, the performance gains are worth it.</p>
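<p>To make the shape of that bitmask concrete, here is a portable scalar sketch that computes the same kind of result for one 64-byte chunk. It is not SIMD and not the code used in this post; <code class="language-plaintext highlighter-rouge">byte_eq_mask</code> is a hypothetical helper, and bit positions are counted from zero:</p>

```rust
/// Hypothetical scalar stand-in for a 64-byte compare-to-mask operation:
/// bit `n` of the result is 1 when byte `n` of the chunk equals `needle`.
fn byte_eq_mask(chunk: &[u8; 64], needle: u8) -> u64 {
    let mut mask = 0u64;
    for (n, &byte) in chunk.iter().enumerate() {
        if byte == needle {
            mask |= 1u64 << n;
        }
    }
    mask
}

fn main() {
    // Pad a short expression with spaces up to 64 bytes.
    let mut chunk = [b' '; 64];
    let text = b"(1 + 2) + 3";
    chunk[..text.len()].copy_from_slice(text);

    let mask = byte_eq_mask(&chunk, b'+');

    // The two '+' characters sit at byte offsets 3 and 8, so exactly
    // bits 3 and 8 of the mask are set.
    assert_eq!(mask, (1 << 3) | (1 << 8));
    assert_eq!(mask.count_ones(), 2);

    println!("{mask:#b}");
}
```

<p>The SIMD version produces this mask without a per-byte loop, which is where the speedup comes from.</p>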

<h4 id="the-code">The Code</h4>

<p>Here are the two key functions that implement our parallel strategy: <code class="language-plaintext highlighter-rouge">parallel_eval</code> orchestrates the process, and <code class="language-plaintext highlighter-rouge">find_best_split_indices_simd</code> uses SIMD to find the valid split points.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">parallel_eval</span><span class="p">(</span><span class="n">input</span><span class="p">:</span> <span class="o">&amp;</span><span class="p">[</span><span class="nb">u8</span><span class="p">],</span> <span class="n">num_threads</span><span class="p">:</span> <span class="nb">usize</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">i64</span> <span class="p">{</span>
    <span class="k">if</span> <span class="n">num_threads</span> <span class="o">&lt;=</span> <span class="mi">1</span> <span class="p">||</span> <span class="n">input</span><span class="nf">.len</span><span class="p">()</span> <span class="o">&lt;</span> <span class="mi">1000</span> <span class="p">{</span>
        <span class="k">return</span> <span class="nf">eval</span><span class="p">(</span><span class="n">input</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// 1. Find the best places to split the input.</span>
    <span class="c1">// SAFETY: the callee uses AVX-512 intrinsics, so this is only sound on CPUs that support</span>
    <span class="c1">// avx512f and avx512bw (check with is_x86_feature_detected! if that is not guaranteed).</span>
    <span class="k">let</span> <span class="n">split_indices</span> <span class="o">=</span> <span class="k">unsafe</span> <span class="p">{</span> <span class="nf">find_best_split_indices_simd</span><span class="p">(</span><span class="n">input</span><span class="p">,</span> <span class="n">num_threads</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="p">};</span>

    <span class="k">if</span> <span class="n">split_indices</span><span class="nf">.is_empty</span><span class="p">()</span> <span class="p">{</span>
        <span class="k">return</span> <span class="nf">eval</span><span class="p">(</span><span class="n">input</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// 2. Create the chunks based on the indices.</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">chunks</span> <span class="o">=</span> <span class="nn">Vec</span><span class="p">::</span><span class="nf">with_capacity</span><span class="p">(</span><span class="n">num_threads</span><span class="p">);</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">last_idx</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">for</span> <span class="o">&amp;</span><span class="n">idx</span> <span class="k">in</span> <span class="o">&amp;</span><span class="n">split_indices</span> <span class="p">{</span>
        <span class="c1">// Slice from the last index to just before the operator's space.</span>
        <span class="n">chunks</span><span class="nf">.push</span><span class="p">(</span><span class="o">&amp;</span><span class="n">input</span><span class="p">[</span><span class="n">last_idx</span><span class="o">..</span><span class="n">idx</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]);</span>
        <span class="c1">// The next chunk starts after the operator and its space.</span>
        <span class="n">last_idx</span> <span class="o">=</span> <span class="n">idx</span> <span class="o">+</span> <span class="mi">2</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="n">chunks</span><span class="nf">.push</span><span class="p">(</span><span class="o">&amp;</span><span class="n">input</span><span class="p">[</span><span class="n">last_idx</span><span class="o">..</span><span class="p">]);</span>

    <span class="c1">// 3. Process all chunks in parallel with Rayon.</span>
    <span class="k">let</span> <span class="n">chunk_results</span><span class="p">:</span> <span class="nb">Vec</span><span class="o">&lt;</span><span class="nb">i64</span><span class="o">&gt;</span> <span class="o">=</span> <span class="n">chunks</span><span class="nf">.par_iter</span><span class="p">()</span><span class="nf">.map</span><span class="p">(|</span><span class="o">&amp;</span><span class="n">chunk</span><span class="p">|</span> <span class="nf">eval</span><span class="p">(</span><span class="n">chunk</span><span class="p">))</span><span class="nf">.collect</span><span class="p">();</span>

    <span class="c1">// 4. Since we only split on '+', the final result is the sum of all parts.</span>
    <span class="n">chunk_results</span><span class="nf">.into_iter</span><span class="p">()</span><span class="nf">.sum</span><span class="p">()</span>
<span class="p">}</span>

<span class="nd">#[cfg(target_arch</span> <span class="nd">=</span> <span class="s">"x86_64"</span><span class="nd">)]</span>
<span class="nd">#[target_feature(enable</span> <span class="nd">=</span> <span class="s">"avx512f"</span><span class="nd">)]</span>
<span class="nd">#[target_feature(enable</span> <span class="nd">=</span> <span class="s">"avx512bw"</span><span class="nd">)]</span>
<span class="k">unsafe</span> <span class="k">fn</span> <span class="nf">find_best_split_indices_simd</span><span class="p">(</span><span class="n">input</span><span class="p">:</span> <span class="o">&amp;</span><span class="p">[</span><span class="nb">u8</span><span class="p">],</span> <span class="n">num_splits</span><span class="p">:</span> <span class="nb">usize</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Vec</span><span class="o">&lt;</span><span class="nb">usize</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="c1">// Explanation of this function in the next section.</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">final_indices</span> <span class="o">=</span> <span class="nn">Vec</span><span class="p">::</span><span class="nf">with_capacity</span><span class="p">(</span><span class="n">num_splits</span><span class="p">);</span>
    <span class="k">if</span> <span class="n">num_splits</span> <span class="o">==</span> <span class="mi">0</span> <span class="p">{</span>
        <span class="k">return</span> <span class="n">final_indices</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="k">let</span> <span class="n">chunk_size</span> <span class="o">=</span> <span class="n">input</span><span class="nf">.len</span><span class="p">()</span> <span class="o">/</span> <span class="p">(</span><span class="n">num_splits</span> <span class="o">+</span> <span class="mi">1</span><span class="p">);</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">target_idx</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">last_op_at_depth_zero</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">depth</span><span class="p">:</span> <span class="nb">i32</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">let</span> <span class="n">len</span> <span class="o">=</span> <span class="n">input</span><span class="nf">.len</span><span class="p">();</span>

    <span class="k">let</span> <span class="n">open_parens</span> <span class="o">=</span> <span class="nf">_mm512_set1_epi8</span><span class="p">(</span><span class="sc">b'('</span> <span class="k">as</span> <span class="nb">i8</span><span class="p">);</span>
    <span class="k">let</span> <span class="n">close_parens</span> <span class="o">=</span> <span class="nf">_mm512_set1_epi8</span><span class="p">(</span><span class="sc">b')'</span> <span class="k">as</span> <span class="nb">i8</span><span class="p">);</span>
    <span class="k">let</span> <span class="n">pluses</span> <span class="o">=</span> <span class="nf">_mm512_set1_epi8</span><span class="p">(</span><span class="sc">b'+'</span> <span class="k">as</span> <span class="nb">i8</span><span class="p">);</span>

    <span class="nv">'outer</span><span class="p">:</span> <span class="k">while</span> <span class="n">i</span> <span class="o">+</span> <span class="mi">64</span> <span class="o">&lt;=</span> <span class="n">len</span> <span class="p">{</span>
        <span class="k">if</span> <span class="n">final_indices</span><span class="nf">.len</span><span class="p">()</span> <span class="o">&gt;=</span> <span class="n">num_splits</span> <span class="p">{</span>
            <span class="k">break</span><span class="p">;</span>
        <span class="p">}</span>
        <span class="k">let</span> <span class="n">chunk</span> <span class="o">=</span> <span class="nf">_mm512_loadu_si512</span><span class="p">(</span><span class="n">input</span><span class="nf">.as_ptr</span><span class="p">()</span><span class="nf">.add</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="k">as</span> <span class="o">*</span><span class="k">const</span> <span class="n">_</span><span class="p">);</span>
        <span class="k">let</span> <span class="n">open_mask</span> <span class="o">=</span> <span class="nf">_mm512_cmpeq_epi8_mask</span><span class="p">(</span><span class="n">chunk</span><span class="p">,</span> <span class="n">open_parens</span><span class="p">);</span>
        <span class="k">let</span> <span class="n">close_mask</span> <span class="o">=</span> <span class="nf">_mm512_cmpeq_epi8_mask</span><span class="p">(</span><span class="n">chunk</span><span class="p">,</span> <span class="n">close_parens</span><span class="p">);</span>
        <span class="k">let</span> <span class="n">plus_mask</span> <span class="o">=</span> <span class="nf">_mm512_cmpeq_epi8_mask</span><span class="p">(</span><span class="n">chunk</span><span class="p">,</span> <span class="n">pluses</span><span class="p">);</span>

        <span class="k">let</span> <span class="k">mut</span> <span class="n">all_interesting_mask</span> <span class="o">=</span> <span class="n">open_mask</span> <span class="p">|</span> <span class="n">close_mask</span> <span class="p">|</span> <span class="n">plus_mask</span><span class="p">;</span>

        <span class="k">while</span> <span class="n">all_interesting_mask</span> <span class="o">!=</span> <span class="mi">0</span> <span class="p">{</span>
            <span class="k">let</span> <span class="n">j</span> <span class="o">=</span> <span class="n">all_interesting_mask</span><span class="nf">.trailing_zeros</span><span class="p">()</span> <span class="k">as</span> <span class="nb">usize</span><span class="p">;</span>
            <span class="k">let</span> <span class="n">current_idx</span> <span class="o">=</span> <span class="n">i</span> <span class="o">+</span> <span class="n">j</span><span class="p">;</span>
            <span class="k">if</span> <span class="p">(</span><span class="n">open_mask</span> <span class="o">&gt;&gt;</span> <span class="n">j</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mi">1</span> <span class="o">==</span> <span class="mi">1</span> <span class="p">{</span>
                <span class="n">depth</span> <span class="o">+=</span> <span class="mi">1</span><span class="p">;</span>
            <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">close_mask</span> <span class="o">&gt;&gt;</span> <span class="n">j</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mi">1</span> <span class="o">==</span> <span class="mi">1</span> <span class="p">{</span>
                <span class="n">depth</span> <span class="o">-=</span> <span class="mi">1</span><span class="p">;</span>
            <span class="p">}</span> <span class="k">else</span> <span class="p">{</span> <span class="c1">// Is a '+' operator</span>
                <span class="k">if</span> <span class="n">depth</span> <span class="o">==</span> <span class="mi">0</span> <span class="p">{</span>
                    <span class="n">last_op_at_depth_zero</span> <span class="o">=</span> <span class="n">current_idx</span><span class="p">;</span>
                    <span class="k">let</span> <span class="n">ideal_pos</span> <span class="o">=</span> <span class="n">target_idx</span> <span class="o">*</span> <span class="n">chunk_size</span><span class="p">;</span>
                    <span class="k">if</span> <span class="n">current_idx</span> <span class="o">&gt;=</span> <span class="n">ideal_pos</span> <span class="p">{</span>
                        <span class="n">final_indices</span><span class="nf">.push</span><span class="p">(</span><span class="n">current_idx</span><span class="p">);</span>
                        <span class="n">target_idx</span> <span class="o">+=</span> <span class="mi">1</span><span class="p">;</span>
                        <span class="k">if</span> <span class="n">final_indices</span><span class="nf">.len</span><span class="p">()</span> <span class="o">&gt;=</span> <span class="n">num_splits</span> <span class="p">{</span>
                            <span class="k">break</span> <span class="nv">'outer</span><span class="p">;</span>
                        <span class="p">}</span>
                    <span class="p">}</span>
                <span class="p">}</span>
            <span class="p">}</span>
            <span class="n">all_interesting_mask</span> <span class="o">&amp;=</span> <span class="n">all_interesting_mask</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span>
        <span class="p">}</span>
        <span class="n">i</span> <span class="o">+=</span> <span class="mi">64</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="c1">// ... scalar remainder and fill logic ...</span>
    <span class="k">while</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">len</span> <span class="o">&amp;&amp;</span> <span class="n">final_indices</span><span class="nf">.len</span><span class="p">()</span> <span class="o">&lt;</span> <span class="n">num_splits</span> <span class="p">{</span>
        <span class="k">let</span> <span class="n">char_byte</span> <span class="o">=</span> <span class="o">*</span><span class="n">input</span><span class="nf">.get_unchecked</span><span class="p">(</span><span class="n">i</span><span class="p">);</span>
        <span class="k">if</span> <span class="n">char_byte</span> <span class="o">==</span> <span class="sc">b'('</span> <span class="p">{</span> <span class="n">depth</span> <span class="o">+=</span> <span class="mi">1</span><span class="p">;</span> <span class="p">}</span>
        <span class="k">else</span> <span class="k">if</span> <span class="n">char_byte</span> <span class="o">==</span> <span class="sc">b')'</span> <span class="p">{</span> <span class="n">depth</span> <span class="o">-=</span> <span class="mi">1</span><span class="p">;</span> <span class="p">}</span>
        <span class="k">else</span> <span class="k">if</span> <span class="n">char_byte</span> <span class="o">==</span> <span class="sc">b'+'</span> <span class="o">&amp;&amp;</span> <span class="n">depth</span> <span class="o">==</span> <span class="mi">0</span> <span class="p">{</span>
            <span class="n">last_op_at_depth_zero</span> <span class="o">=</span> <span class="n">i</span><span class="p">;</span>
            <span class="k">let</span> <span class="n">ideal_pos</span> <span class="o">=</span> <span class="n">target_idx</span> <span class="o">*</span> <span class="n">chunk_size</span><span class="p">;</span>
            <span class="k">if</span> <span class="n">i</span> <span class="o">&gt;=</span> <span class="n">ideal_pos</span> <span class="p">{</span>
                <span class="n">final_indices</span><span class="nf">.push</span><span class="p">(</span><span class="n">i</span><span class="p">);</span>
                <span class="n">target_idx</span> <span class="o">+=</span> <span class="mi">1</span><span class="p">;</span>
            <span class="p">}</span>
        <span class="p">}</span>
        <span class="n">i</span> <span class="o">+=</span> <span class="mi">1</span><span class="p">;</span>
    <span class="p">}</span>
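    <span class="c1">// Fallback: if too few splits were found, reuse the last top-level '+' once.</span>
    <span class="c1">// A duplicate index would make parallel_eval slice with start &gt; end and panic.</span>
    <span class="k">if</span> <span class="n">final_indices</span><span class="nf">.len</span><span class="p">()</span> <span class="o">&lt;</span> <span class="n">num_splits</span> <span class="o">&amp;&amp;</span> <span class="n">last_op_at_depth_zero</span> <span class="o">&gt;</span> <span class="mi">0</span> <span class="o">&amp;&amp;</span> <span class="n">final_indices</span><span class="nf">.last</span><span class="p">()</span> <span class="o">!=</span> <span class="nf">Some</span><span class="p">(</span><span class="o">&amp;</span><span class="n">last_op_at_depth_zero</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">final_indices</span><span class="nf">.push</span><span class="p">(</span><span class="n">last_op_at_depth_zero</span><span class="p">);</span>
    <span class="p">}</span>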
    <span class="n">final_indices</span>
<span class="p">}</span>
</code></pre></div></div>
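<p>As a cross-check on a machine without AVX-512, the same search can be sketched in plain scalar Rust. This is a simplified reference rather than the implementation above: <code class="language-plaintext highlighter-rouge">find_split_indices_scalar</code> and <code class="language-plaintext highlighter-rouge">split_chunks</code> are hypothetical names, there is no fallback when too few splits are found, and the input is assumed to put exactly one space on each side of every operator.</p>

```rust
/// Scalar reference sketch: returns the index of the first top-level '+'
/// at or after each ideal boundary (input length / number of chunks).
fn find_split_indices_scalar(input: &[u8], num_splits: usize) -> Vec<usize> {
    let mut indices = Vec::with_capacity(num_splits);
    if num_splits == 0 {
        return indices;
    }
    let chunk_size = input.len() / (num_splits + 1);
    let mut depth: i32 = 0;
    for (i, &byte) in input.iter().enumerate() {
        if indices.len() == num_splits {
            break;
        }
        match byte {
            b'(' => depth += 1,
            b')' => depth -= 1,
            // Only a '+' outside all parentheses is a valid split point.
            b'+' if depth == 0 => {
                let ideal_pos = (indices.len() + 1) * chunk_size;
                if i >= ideal_pos {
                    indices.push(i);
                }
            }
            _ => {}
        }
    }
    indices
}

/// Slices the input around each split index, dropping the " + " separator.
/// Assumes one space on each side of the operator.
fn split_chunks<'a>(input: &'a [u8], indices: &[usize]) -> Vec<&'a [u8]> {
    let mut chunks = Vec::with_capacity(indices.len() + 1);
    let mut start = 0;
    for &idx in indices {
        chunks.push(&input[start..idx - 1]);
        start = idx + 2;
    }
    chunks.push(&input[start..]);
    chunks
}

fn main() {
    let input = b"1 + (2 + 3) + 4 - (5 + 6) + 7";
    let indices = find_split_indices_scalar(input, 2);
    let chunks = split_chunks(input, &indices);
    for chunk in &chunks {
        println!("{}", String::from_utf8_lossy(chunk));
    }
}
```

<p>For this input the ideal boundaries fall at bytes 9 and 18, so the first <code class="language-plaintext highlighter-rouge">+</code> (too early) and the ones inside parentheses are skipped, and each chunk ends up as a complete expression on its own.</p>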

<h4 id="algorithm-breakdown-find_best_split_indices_simd">Algorithm Breakdown: <code class="language-plaintext highlighter-rouge">find_best_split_indices_simd</code></h4>

<p>This function’s purpose is to identify the optimal <code class="language-plaintext highlighter-rouge">+</code> signs for splitting.</p>

<h5 id="step-1-the-simd-scan">Step 1: The SIMD Scan</h5>

<p>The code enters a main loop, processing the input in 64-byte chunks. Within this loop, it uses <code class="language-plaintext highlighter-rouge">_mm512_cmpeq_epi8_mask</code> to generate bitmasks. This instruction compares all 64 bytes of the current chunk against a target character and returns a 64-bit integer (<code class="language-plaintext highlighter-rouge">u64</code>) where the N-th bit is <code class="language-plaintext highlighter-rouge">1</code> if the N-th byte was a match.</p>

<h5 id="step-2-the-serial-scan">Step 2: The serial scan</h5>

<p>Next, we combine these masks and iterate only through the “interesting” bits. This is a key step:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="k">mut</span> <span class="n">all_interesting_mask</span> <span class="o">=</span> <span class="n">open_mask</span> <span class="p">|</span> <span class="n">close_mask</span> <span class="p">|</span> <span class="n">plus_mask</span><span class="p">;</span> <span class="c1">// This means we are looking for '(', ')' and '+' characters.</span>

<span class="k">while</span> <span class="n">all_interesting_mask</span> <span class="o">!=</span> <span class="mi">0</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">j</span> <span class="o">=</span> <span class="n">all_interesting_mask</span><span class="nf">.trailing_zeros</span><span class="p">()</span> <span class="k">as</span> <span class="nb">usize</span><span class="p">;</span> <span class="c1">// j is the index of the next found interesting character.</span>
    <span class="k">let</span> <span class="n">current_idx</span> <span class="o">=</span> <span class="n">i</span> <span class="o">+</span> <span class="n">j</span><span class="p">;</span> <span class="c1">// Bit 0 of the mask maps to the first byte of the chunk, so the trailing-zero count j is the byte offset within it.</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">open_mask</span> <span class="o">&gt;&gt;</span> <span class="n">j</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mi">1</span> <span class="o">==</span> <span class="mi">1</span> <span class="p">{</span> <span class="c1">// If that char is a '(' we increase the depth (we enter in a sub expression)</span>
        <span class="n">depth</span> <span class="o">+=</span> <span class="mi">1</span><span class="p">;</span>
    <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">close_mask</span> <span class="o">&gt;&gt;</span> <span class="n">j</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mi">1</span> <span class="o">==</span> <span class="mi">1</span> <span class="p">{</span> <span class="c1">// If that char is a ')' we decrease the depth (we exit from a sub expression)</span>
        <span class="n">depth</span> <span class="o">-=</span> <span class="mi">1</span><span class="p">;</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
        <span class="k">if</span> <span class="n">depth</span> <span class="o">==</span> <span class="mi">0</span> <span class="p">{</span> <span class="c1">// If the depth is 0, we are at a top level, outside of parentheses. And it is a '+' sign.</span>
            <span class="n">last_op_at_depth_zero</span> <span class="o">=</span> <span class="n">current_idx</span><span class="p">;</span>
            <span class="k">if</span> <span class="n">current_idx</span> <span class="o">&gt;=</span> <span class="n">ideal_pos</span> <span class="p">{</span> <span class="c1">// Once we have reached the ideal position (chunk / NUM_THREADS), we add this '+' sign to the splitting indices.</span>
                <span class="n">final_indices</span><span class="nf">.push</span><span class="p">(</span><span class="n">current_idx</span><span class="p">);</span>
                <span class="n">target_idx</span> <span class="o">+=</span> <span class="mi">1</span><span class="p">;</span>
                <span class="k">if</span> <span class="n">final_indices</span><span class="nf">.len</span><span class="p">()</span> <span class="o">&gt;=</span> <span class="n">num_splits</span> <span class="p">{</span>
                    <span class="k">break</span> <span class="nv">'outer</span><span class="p">;</span>
                <span class="p">}</span>
            <span class="p">}</span>
        <span class="p">}</span>
    <span class="p">}</span>
    <span class="n">all_interesting_mask</span> <span class="o">&amp;=</span> <span class="n">all_interesting_mask</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span> <span class="c1">// Clears the lowest set 1 bit from the mask, as we have processed it already</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This loop does <strong>not</strong> run 64 times. It only runs once per set bit in <code class="language-plaintext highlighter-rouge">all_interesting_mask</code>. To understand how it processes characters from left to right, we need to look at three key details:</p>

<ol>
  <li>
    <p><strong><code class="language-plaintext highlighter-rouge">trailing_zeros()</code> and Little-Endian:</strong> While you might assume <code class="language-plaintext highlighter-rouge">trailing_zeros</code> starts from the <em>end</em> of the string, it’s actually the opposite. Modern x86-64 CPUs are <strong>little-endian</strong>: when a block of memory is loaded into a large integer register, the first byte in memory (e.g., <code class="language-plaintext highlighter-rouge">chunk[0]</code>) becomes the least significant byte (LSB) of the integer. Our bitmask follows the same convention, with one bit per byte of the chunk and bit 0 corresponding to <code class="language-plaintext highlighter-rouge">chunk[0]</code>. <code class="language-plaintext highlighter-rouge">trailing_zeros()</code> counts from this least significant end, meaning it always finds the set bit corresponding to the character with the <strong>lowest index</strong> in our chunk.</p>

    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Memory (Bytes in a chunk):
  Byte Index:   0   1   2   3   ...   63
  Content:     '(' '1' '+' '2'  ...   'X'

      |
      |  Load into a 64-bit integer
      v

Resulting u64 Bitmask:
  Bit Position:  63  ...   3   2   1   0   &lt;-- LSB (The "trailing" end)
  Corresponds to: 'X' ...  '2' '+' '1' '('
</code></pre></div>    </div>
    <p>As you can see, <code class="language-plaintext highlighter-rouge">trailing_zeros</code> starts from the right of the integer, which corresponds to the left of our string chunk.</p>
  </li>
  <li>
    <p><strong><code class="language-plaintext highlighter-rouge">if (open_mask &gt;&gt; j) &amp; 1 == 1 {</code></strong>: This is just to check if there is an open parenthesis at position <code class="language-plaintext highlighter-rouge">j</code>. If so, we increment our counter <code class="language-plaintext highlighter-rouge">depth</code>.</p>
  </li>
  <li>
    <p><strong><code class="language-plaintext highlighter-rouge">all_interesting_mask &amp;= all_interesting_mask - 1</code></strong>: This is a trick that clears the lowest set 1 bit we just found. On the next iteration, <code class="language-plaintext highlighter-rouge">trailing_zeros</code> finds the <em>new</em> lowest set bit, which corresponds to the character at the <em>next lowest index</em>.</p>
  </li>
</ol>

<p>This combination allows us to visit every interesting character in the correct, forward order, but without a slow byte-by-byte scan. Inside the loop, we just update our <code class="language-plaintext highlighter-rouge">depth</code> counter to know if we are at a top level position, and if so, we check if we can add a splitting point.</p>
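<p>The bit-iteration pattern can be tried in isolation. The following is a minimal sketch with the depth tracking and the split logic left out; <code class="language-plaintext highlighter-rouge">set_bit_indices</code> is a hypothetical helper name, not a function from the parser.</p>

```rust
/// Returns the positions of all set bits in `mask`, lowest position first.
/// Hypothetical helper for illustration; the parser inlines this loop.
fn set_bit_indices(mut mask: u64) -> Vec<usize> {
    let mut indices = Vec::with_capacity(mask.count_ones() as usize);
    while mask != 0 {
        // Position of the lowest set bit, i.e. the next interesting byte.
        indices.push(mask.trailing_zeros() as usize);
        // Clear that bit. `mask` is non-zero here, so `mask - 1` cannot underflow.
        mask &= mask - 1;
    }
    indices
}
```

<p>For a mask with bits 0, 2 and 5 set (<code class="language-plaintext highlighter-rouge">0b10_0101</code>), the loop body runs three times and visits the positions in ascending order.</p>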

<h4 id="full-example">Full example</h4>

<p>Let’s trace the entire flow with a small, concrete example:</p>

<ul>
  <li><strong>Input String:</strong> <code class="language-plaintext highlighter-rouge">(1-2) + (3-4) + (5-6)</code> (Length is 21 bytes)</li>
  <li><strong>Goal:</strong> Find 1 split point (<code class="language-plaintext highlighter-rouge">num_splits = 1</code>) to create 2 chunks.</li>
  <li><strong>Ideal Split Position:</strong> <code class="language-plaintext highlighter-rouge">1 * (21 / 2) = 10</code> (integer division). We are looking for the first <code class="language-plaintext highlighter-rouge">+</code> at depth 0 at or after byte 10.</li>
</ul>

<h5 id="part-1-find_best_split_indices_simd-runs">Part 1: <code class="language-plaintext highlighter-rouge">find_best_split_indices_simd</code> runs</h5>

<p>The function will scan the input to find the best split point.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Input:        ( 1 - 2 )   +   ( 3 - 4 )   +   ( 5 - 6 )
Index:        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
                                  1 1 1 1 1 1 1 1 1 1 2
Depth:        1 1 1 1 1 0 0 0 1 1 1 1 1 0 0 0 1 1 1 1 1
Ideal Split -&gt;                    ^
</code></pre></div></div>

<ol>
  <li>The code starts scanning. It finds the first <code class="language-plaintext highlighter-rouge">+</code> at index <code class="language-plaintext highlighter-rouge">6</code>.</li>
  <li>It checks the depth. The <code class="language-plaintext highlighter-rouge">(</code> at index 0 increased depth to 1, and the <code class="language-plaintext highlighter-rouge">)</code> at index 4 decreased it back to 0. So, at index 6, <code class="language-plaintext highlighter-rouge">depth == 0</code>.</li>
  <li>It checks the splitting logic: <code class="language-plaintext highlighter-rouge">is current_idx (6) &gt;= ideal_pos (10)?</code> The answer is <strong>No</strong>. The code continues scanning.</li>
  <li>The code finds the next <code class="language-plaintext highlighter-rouge">+</code> at index <code class="language-plaintext highlighter-rouge">14</code>.</li>
  <li>It checks the depth. The <code class="language-plaintext highlighter-rouge">(</code> at index 8 and the <code class="language-plaintext highlighter-rouge">)</code> at index 12 have brought the depth back to 0.</li>
  <li>It checks the splitting logic: <code class="language-plaintext highlighter-rouge">is current_idx (14) &gt;= ideal_pos (10)?</code> The answer is <strong>Yes!</strong></li>
  <li><strong>Action:</strong> The code pushes <code class="language-plaintext highlighter-rouge">14</code> into <code class="language-plaintext highlighter-rouge">final_indices</code> and immediately breaks out of all loops because it has found the 1 split it was looking for.</li>
  <li>The function returns <code class="language-plaintext highlighter-rouge">[14]</code>.</li>
</ol>

<h5 id="part-2-parallel_eval-receives-the-result">Part 2: <code class="language-plaintext highlighter-rouge">parallel_eval</code> receives the result</h5>

<ol>
  <li><code class="language-plaintext highlighter-rouge">split_indices</code> is now <code class="language-plaintext highlighter-rouge">[14]</code>.</li>
  <li>The <code class="language-plaintext highlighter-rouge">for</code> loop runs once for the index <code class="language-plaintext highlighter-rouge">14</code>.
    <ul>
      <li>It creates the first chunk by slicing from <code class="language-plaintext highlighter-rouge">0</code> up to (but not including) <code class="language-plaintext highlighter-rouge">14 - 1 = 13</code>. The chunk is <code class="language-plaintext highlighter-rouge">(1-2) + (3-4)</code>.</li>
      <li>It updates <code class="language-plaintext highlighter-rouge">last_idx</code> to <code class="language-plaintext highlighter-rouge">14 + 2 = 16</code>.</li>
    </ul>
  </li>
  <li>
    <p>The loop finishes. It creates the final chunk by slicing from <code class="language-plaintext highlighter-rouge">16</code> to the end. The chunk is <code class="language-plaintext highlighter-rouge">(5-6)</code>.</p>

    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Original:     (1-2) + (3-4)   +   (5-6)
              &lt;-- chunk 1 --&gt;   &lt;-- chunk 2 --&gt;

Split Index:                  ^ (14)
</code></pre></div>    </div>
  </li>
  <li>The two chunks, <code class="language-plaintext highlighter-rouge">(1-2) + (3-4)</code> and <code class="language-plaintext highlighter-rouge">(5-6)</code>, are sent to the Rayon thread pool.</li>
  <li>Thread 1 gets <code class="language-plaintext highlighter-rouge">(1-2) + (3-4)</code>, calls <code class="language-plaintext highlighter-rouge">eval</code>, and gets the result <code class="language-plaintext highlighter-rouge">-2</code>.</li>
  <li>Thread 2 gets <code class="language-plaintext highlighter-rouge">(5-6)</code>, calls <code class="language-plaintext highlighter-rouge">eval</code>, and gets the result <code class="language-plaintext highlighter-rouge">-1</code>.</li>
  <li><code class="language-plaintext highlighter-rouge">collect()</code> gathers the results into a vector: <code class="language-plaintext highlighter-rouge">[-2, -1]</code>.</li>
  <li>Finally, <code class="language-plaintext highlighter-rouge">sum()</code> adds them together: <code class="language-plaintext highlighter-rouge">-2 + -1 = -3</code>.</li>
</ol>

<p>The final answer is <strong>-3</strong>, which is correct. The entire process worked perfectly.</p>
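<p>The same walk-through can be reproduced with a scalar stand-in for the SIMD scan. This is only a sketch under stated assumptions: <code class="language-plaintext highlighter-rouge">find_split</code> and <code class="language-plaintext highlighter-rouge">split_in_two</code> are hypothetical names, the scan goes byte by byte instead of using masks, and it handles a single split point with a <code class="language-plaintext highlighter-rouge">+</code> surrounded by one space on each side.</p>

```rust
/// Scalar stand-in for the SIMD scan (assumption: one split point only).
/// Returns the index of the first '+' at depth 0 at or after `ideal_pos`.
fn find_split(input: &[u8], ideal_pos: usize) -> Option<usize> {
    let mut depth: i32 = 0;
    for (idx, &byte) in input.iter().enumerate() {
        match byte {
            b'(' => depth += 1,
            b')' => depth -= 1,
            b'+' if depth == 0 && idx >= ideal_pos => return Some(idx),
            _ => {}
        }
    }
    None
}

/// Splits around " + ": the left chunk ends before the space that precedes
/// the '+', and the right chunk starts after the space that follows it.
fn split_in_two(input: &str) -> Option<(&str, &str)> {
    let split = find_split(input.as_bytes(), input.len() / 2)?;
    // `get` returns None instead of panicking if the '+' sits at either edge.
    let left = input.get(..split.checked_sub(1)?)?;
    let right = input.get(split + 2..)?;
    Some((left, right))
}
```

<p>Calling <code class="language-plaintext highlighter-rouge">split_in_two("(1-2) + (3-4) + (5-6)")</code> yields the two chunks from the trace, <code class="language-plaintext highlighter-rouge">(1-2) + (3-4)</code> and <code class="language-plaintext highlighter-rouge">(5-6)</code>, which can then be evaluated independently and summed.</p>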

<h4 id="result">Result</h4>

<p>In summary, by using SIMD, we can perform an initial, extremely fast pass to identify optimal split points in the input, and then process each chunk in parallel. I believe a similar technique is employed by the popular <a href="https://github.com/simdjson/simdjson">simdjson</a> library.</p>

<p>I executed the code on my Surface laptop, and here are the results:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Step 1: Input file read in 1.199915008s

Step 2: Calculation completed in 1.010507822s

--- Summary ---
Result: 2652
Total time: 2.210422830s
</code></pre></div></div>

<p>From <strong>3.21 to 2.21 seconds</strong>. 1 second faster, for an already optimized program. Good!</p>

<hr />

<h3 id="optimization-5-memory-mapped-io-221s--098s-56-improvement">Optimization 5: Memory-Mapped I/O (2.21 s → 0.98 s, –56% improvement)</h3>

<p>After profiling the memory usage of our parallel solution, we can see that we’re still allocating a very large buffer on the heap to hold the entire file’s contents.</p>

<p><code class="language-plaintext highlighter-rouge">mmap</code> (memory-mapped files) can be more efficient than standard file I/O because it avoids extra copying from kernel to user space, and allows the operating system to manage memory for us.</p>

<p>When I initially tried <code class="language-plaintext highlighter-rouge">mmap</code> with the single-threaded version of the code, the performance gain was negligible. However, now that our program is multithreaded, let’s re-evaluate its impact.</p>

<h4 id="kernel-space-vs-user-space">Kernel Space vs. User Space</h4>

<ul>
  <li><strong>Kernel space</strong>: The privileged area where the operating system runs, managing hardware, I/O, and the page cache.</li>
  <li><strong>User space</strong>: The unprivileged area where our application code executes, including its heap buffers, stacks, and other program data.</li>
  <li><strong>Page cache</strong>: A kernel-managed buffer that temporarily stores file data in memory to speed up subsequent access.</li>
</ul>

<h4 id="cost-of-fsread">Cost of fs::read</h4>

<ol>
  <li><strong>Double Memory Footprint</strong>
    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[ Disk ] → [ Page Cache ] (1.5 GB)  
          → [ Heap Vec&lt;u8&gt; ] (1.5 GB)  
</code></pre></div>    </div>
    <p>This process involves loading the file into kernel space and then copying it to user space.</p>
  </li>
  <li><strong>False Sharing Contention</strong>
Modern CPUs transfer data between main memory and CPU caches in 64-byte blocks called “cache lines.” False sharing occurs when multiple threads access <em>different</em> variables that happen to reside on the same cache line. If one thread modifies its variable, the entire cache line is invalidated for all other threads, forcing them to re-fetch it from memory even though their own data hasn’t changed.
    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Thread 1 writes to data at byte 8
// Thread 2 writes to data at byte 40
// Both bytes are in the same 64-byte cache line (0-63).
// The cache line "bounces" between the cores, causing delays.
</code></pre></div>    </div>
    <p>With a single large <code class="language-plaintext highlighter-rouge">Vec&lt;u8&gt;</code>, the boundaries of the chunks processed by each thread could easily fall in a way that causes this contention.</p>
  </li>
</ol>
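<p>To make false sharing more concrete, here is a minimal sketch, not taken from the parser, in which each thread <em>writes</em> to its own counter. <code class="language-plaintext highlighter-rouge">PaddedCounter</code> and <code class="language-plaintext highlighter-rouge">count_in_parallel</code> are hypothetical names, and the 64-byte alignment assumes the cache-line size mentioned above. Without the alignment attribute, neighbouring counters would share a cache line and bounce between cores.</p>

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Aligning each counter to 64 bytes (assumed cache-line size) gives every
/// counter its own cache line, so writers do not invalidate each other.
#[repr(align(64))]
struct PaddedCounter(AtomicU64);

/// Spawns one thread per counter; each thread only touches its own counter.
fn count_in_parallel(num_threads: usize, increments: u64) -> u64 {
    let counters: Vec<PaddedCounter> = (0..num_threads)
        .map(|_| PaddedCounter(AtomicU64::new(0)))
        .collect();

    // Scoped threads may borrow `counters` and are joined when the scope ends.
    thread::scope(|scope| {
        for counter in &counters {
            scope.spawn(move || {
                for _ in 0..increments {
                    counter.0.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });

    counters.iter().map(|c| c.0.load(Ordering::Relaxed)).sum()
}
```

<p>The result is the same with or without the padding; only the timing differs, and by how much depends on the hardware, so it is worth measuring rather than assuming.</p>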

<h4 id="mmap-improvement">mmap improvement</h4>

<p>Instead of reading the entire file into a <code class="language-plaintext highlighter-rouge">Vec&lt;u8&gt;</code>, we can map it directly into memory with <code class="language-plaintext highlighter-rouge">mmap</code>. This gives us:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">fs</span><span class="p">::</span><span class="n">File</span><span class="p">;</span>

<span class="k">use</span> <span class="nn">memmap2</span><span class="p">::</span><span class="n">Mmap</span><span class="p">;</span>

<span class="k">fn</span> <span class="nf">read_input_file</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nn">std</span><span class="p">::</span><span class="nn">io</span><span class="p">::</span><span class="nb">Result</span><span class="o">&lt;</span><span class="n">Mmap</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">file</span> <span class="o">=</span> <span class="nn">File</span><span class="p">::</span><span class="nf">open</span><span class="p">(</span><span class="s">"data/input.txt"</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
    <span class="c1">// SAFETY: the mapping is only sound if nobody modifies or truncates the file while it is mapped.</span>
    <span class="k">unsafe</span> <span class="p">{</span> <span class="nn">Mmap</span><span class="p">::</span><span class="nf">map</span><span class="p">(</span><span class="o">&amp;</span><span class="n">file</span><span class="p">)</span> <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This approach avoids the extra copy performed by <code class="language-plaintext highlighter-rouge">fs::read</code> and does not allocate the file’s content in user space memory.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[ Disk ] → [ Page Cache (1.5 GB) ] ↔ [ mmap view in user space ]
</code></pre></div></div>

<p>I also think it is faster because we don’t have false sharing with mmap. It hands us the file in 4 KB pages. Threads get whole pages:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Thread 1 works on data starting at Page 0 (byte 0)
Thread 2 works on data starting at Page N (byte N*4096)
</code></pre></div></div>
<p>Since pages (4 KB) are much larger than cache lines (64 B), threads operate on memory regions that are far apart, preventing them from contending over the same cache lines. I’m not entirely certain about this explanation: false sharing only hurts when at least one thread writes to the shared line, and our threads only read the input. Still, it’s my conclusion after reading a lot about the topic and consulting various LLMs.</p>

<h4 id="code-changes">Code Changes</h4>

<p>The change is minimal. The <code class="language-plaintext highlighter-rouge">read_input_file</code> function now returns an <code class="language-plaintext highlighter-rouge">Mmap</code> object, which is passed directly to the <code class="language-plaintext highlighter-rouge">parallel_eval</code> function. This allows the operating system to efficiently map the file directly into our process memory on demand:</p>

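<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">fs</span><span class="p">::</span><span class="n">File</span><span class="p">;</span>
<span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">io</span><span class="p">::</span><span class="nb">Result</span><span class="p">;</span>

<span class="k">use</span> <span class="nn">memmap2</span><span class="p">::</span><span class="n">Mmap</span><span class="p">;</span>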

<span class="k">fn</span> <span class="nf">read_input_file</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="n">Mmap</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">file</span> <span class="o">=</span> <span class="nn">File</span><span class="p">::</span><span class="nf">open</span><span class="p">(</span><span class="s">"data/input.txt"</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
    <span class="k">unsafe</span> <span class="p">{</span> <span class="nn">Mmap</span><span class="p">::</span><span class="nf">map</span><span class="p">(</span><span class="o">&amp;</span><span class="n">file</span><span class="p">)</span> <span class="p">}</span>
<span class="p">}</span>

<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="p">()</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">mmap</span> <span class="o">=</span> <span class="nf">read_input_file</span><span class="p">()</span><span class="o">?</span><span class="p">;</span>
    <span class="k">let</span> <span class="n">result</span> <span class="o">=</span> <span class="nf">parallel_eval</span><span class="p">(</span><span class="o">&amp;</span><span class="n">mmap</span><span class="p">,</span> <span class="n">NUM_THREADS</span><span class="p">);</span>
    <span class="nd">println!</span><span class="p">(</span><span class="s">"Result: {}"</span><span class="p">,</span> <span class="n">result</span><span class="p">);</span>
    <span class="nf">Ok</span><span class="p">(())</span>
<span class="p">}</span>
</code></pre></div></div>

<h4 id="performance-results">Performance Results</h4>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Step 1: Input file read in 18.8 µs
Step 2: Calculation completed in 981.2 ms
Total time: 981.3 ms
</code></pre></div></div>

<p>From <strong>2.21s to 981ms</strong>. Less than a second!!</p>

<hr />

<h2 id="conclusion">Conclusion</h2>

<p><strong>YOU CAN FIND THE FULL CODE ON:</strong> <a href="https://github.com/RPallas92/math_parser">https://github.com/RPallas92/math_parser</a></p>

<p>We started with a simple math parser that took 43 seconds to run. By making a series of changes, we made it run in under one second. Here is a summary of what we did:</p>

<ol>
  <li><strong>Stopped creating a list of all tokens at once.</strong> Instead of reading the whole file and creating a big list of tokens, we processed them one by one. This was the biggest improvement, bringing the time down from 43 to 6.4 seconds. (To be honest, I made this mistake on purpose just to see the difference.)</li>
  <li><strong>Worked with bytes instead of text.</strong> Instead of treating the input as text, we worked with the raw bytes. This avoided extra work and brought the time down to 3.7 seconds.</li>
  <li><strong>Simplified the code by removing <code class="language-plaintext highlighter-rouge">Peekable</code>.</strong> We changed the logic to avoid peeking at the next token, which made the code faster, reducing the time to 3.2 seconds.</li>
  <li><strong>Used multiple threads and modern CPU features.</strong> We used Rayon to run calculations in parallel and SIMD to find split points faster. This brought the time down to 2.2 seconds.</li>
  <li><strong>Used memory-mapped files.</strong> Instead of reading the file into memory ourselves, we let the operating system handle it. This was the final optimization, bringing the time down to just 0.98 seconds.</li>
</ol>

<p><strong>If you have any corrections or comments, please contact me on LinkedIn or via email. Thank you very much for reading!</strong></p>]]></content><author><name>Ricardo Pallas</name></author><category term="blog" /><category term="algorithms" /><category term="data structures" /><category term="rust" /><category term="performance" /><category term="multithreading" /><category term="optimization" /><category term="bitwise" /><category term="hash maps" /><category term="SIMD" /><category term="mmap" /><summary type="html"><![CDATA[Optimizing a math expression parser for speed and memory.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://rpallas92.github.io/assets/images/1brc/flamegraph1.jpg" /><media:content medium="image" url="https://rpallas92.github.io/assets/images/1brc/flamegraph1.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Pingora async runtime and threading model</title><link href="https://rpallas92.github.io/pingora-internals-1-async-runtime/" rel="alternate" type="text/html" title="Pingora async runtime and threading model" /><published>2024-09-07T08:00:00+00:00</published><updated>2024-09-07T08:00:00+00:00</updated><id>https://rpallas92.github.io/pingora-internals-1-async-runtime</id><content type="html" xml:base="https://rpallas92.github.io/pingora-internals-1-async-runtime/"><![CDATA[<h1 id="pingora-async-runtime-and-threading-model">Pingora async runtime and threading model</h1>

<h2 id="introduction">Introduction</h2>

<p>Cloudflare <a href="https://blog.cloudflare.com/pingora-open-source/">open-sourced</a> their <strong>Rust</strong> framework for building programmable <strong>network services</strong> called <strong><a href="https://github.com/cloudflare/pingora">Pingora</a></strong>. They developed it to replace NGINX to overcome its limitations, as explained in <a href="https://blog.cloudflare.com/how-we-built-pingora-the-proxy-that-connects-cloudflare-to-the-internet/">their blog</a>:</p>

<blockquote>
  <p>Today we are excited to talk about Pingora, a new HTTP proxy we’ve built in-house using Rust that serves over 1 trillion requests a day, boosts our performance, and enables many new features for Cloudflare customers, all while requiring only a third of the CPU and memory resources of our previous proxy infrastructure.</p>

  <p>…</p>

  <p>Over the years, our usage of NGINX has run up against limitations. For some limitations, we optimized or worked around them. But others were much harder to overcome.</p>
</blockquote>

<p>I am starting a <strong>series of posts where I will dive deep into the internals of Pingora</strong>. Each post will cover a different aspect of Pingora’s architecture. In this post, we will discuss its <strong>runtime</strong> for running async Rust and its <strong>threading model</strong>.</p>

<h2 id="pingoras-async-runtime">Pingora’s async runtime</h2>

<p>Pingora’s async runtime is based on <a href="https://tokio.rs/">Tokio</a>. Tokio is the <em>de facto</em> runtime for writing asynchronous, non-blocking code in Rust:</p>

<blockquote>
  <p>Tokio is scalable, built on top of the async/await language feature, which itself is scalable. When dealing with networking, there’s a limit to how fast you can handle a connection due to latency, so the only way to scale is to handle many connections at once. With the async/await language feature, increasing the number of concurrent operations becomes incredibly cheap, allowing you to scale to a large number of concurrent tasks.</p>
</blockquote>

<p>Pingora offers <strong>two multi-threaded runtimes</strong> (with and without work stealing) that we will discuss later. But first, let’s explain how Tokio’s runtime works, as it forms the basis of Pingora.</p>

<h3 id="tokios-runtime">Tokio’s runtime</h3>

<p>The Tokio <a href="https://docs.rs/tokio/1.40.0/tokio/runtime/index.html">runtime</a> is composed of three main components:</p>
<ol>
  <li>An <strong>I/O event loop</strong>, called the driver, which drives I/O resources and dispatches I/O events to tasks that depend on them.</li>
  <li>A <strong>scheduler</strong> to execute tasks that use these I/O resources.</li>
  <li>A <strong>timer</strong> for scheduling work to run after a set period of time.</li>
</ol>

<p>The <code class="language-plaintext highlighter-rouge">tokio::main</code> macro provides a default-configured runtime:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">futures</span><span class="p">::</span><span class="nn">future</span><span class="p">::</span><span class="n">join_all</span><span class="p">;</span>

<span class="nd">#[tokio::main]</span>
<span class="k">async</span> <span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="p">(),</span> <span class="nb">Box</span><span class="o">&lt;</span><span class="k">dyn</span> <span class="nn">std</span><span class="p">::</span><span class="nn">error</span><span class="p">::</span><span class="n">Error</span><span class="o">&gt;&gt;</span> <span class="p">{</span>
    <span class="nd">println!</span><span class="p">(</span><span class="s">"Executing main"</span><span class="p">);</span>

    <span class="k">let</span> <span class="k">mut</span> <span class="n">handles</span> <span class="o">=</span> <span class="nn">Vec</span><span class="p">::</span><span class="nf">new</span><span class="p">();</span>

    <span class="k">for</span> <span class="n">i</span> <span class="k">in</span> <span class="mi">0</span><span class="o">..</span><span class="mi">5</span> <span class="p">{</span>
        <span class="k">let</span> <span class="n">handle</span> <span class="o">=</span> <span class="nn">tokio</span><span class="p">::</span><span class="nf">spawn</span><span class="p">(</span><span class="k">async</span> <span class="k">move</span> <span class="p">{</span>
            <span class="nd">println!</span><span class="p">(</span><span class="s">"Executing task {}"</span><span class="p">,</span> <span class="n">i</span><span class="p">);</span>
        <span class="p">});</span>
        <span class="n">handles</span><span class="nf">.push</span><span class="p">(</span><span class="n">handle</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="nf">join_all</span><span class="p">(</span><span class="n">handles</span><span class="p">)</span><span class="k">.await</span><span class="p">;</span>
    <span class="nf">Ok</span><span class="p">(())</span>
<span class="p">}</span>
</code></pre></div></div>

<p>In the code above, the <code class="language-plaintext highlighter-rouge">main</code> async function is scheduled on the Tokio runtime. The function also schedules five additional async tasks on the same runtime by calling <code class="language-plaintext highlighter-rouge">tokio::spawn</code>.</p>

<p>Tokio’s runtime provides two main task scheduling strategies:</p>
<ol>
  <li>Multi-Thread Scheduler</li>
  <li>Current-Thread Scheduler</li>
</ol>

<h4 id="multi-thread-scheduler">Multi-Thread Scheduler</h4>

<p>The multi-thread scheduler executes futures on a thread pool, using a work-stealing strategy. By default, it will start a worker thread for each CPU core available on the system.</p>

<p>At its most basic level, a runtime has a collection of tasks that need to be scheduled. It will repeatedly remove a task from that collection and schedule it (by calling <code class="language-plaintext highlighter-rouge">poll</code>). When the collection is empty, the thread will go to sleep until a task is added to the collection.</p>

<p>The multi-thread runtime maintains one global queue, and a local queue for each worker thread. The runtime will prefer to choose the next task to schedule from the local queue, and will only pick a task from the global queue if the local queue is empty.</p>

<p>If both the local queue and global queue are empty, the worker thread <strong>will attempt to steal tasks from the local queue of another worker thread</strong>. Stealing is done by moving half of the tasks in one local queue to another local queue.</p>
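<p>The order in which a worker looks for work can be sketched with two plain queues. This is a toy model under stated assumptions, not Tokio’s implementation: tasks are bare <code class="language-plaintext highlighter-rouge">u32</code> ids, and <code class="language-plaintext highlighter-rouge">Next</code> and <code class="language-plaintext highlighter-rouge">pick_next</code> are hypothetical names. The real scheduler has further details that this ignores.</p>

```rust
use std::collections::VecDeque;

/// Outcome of looking for work in this toy model.
#[derive(Debug, PartialEq)]
enum Next {
    /// Run this task (tasks are plain ids here).
    Run(u32),
    /// Nothing queued locally or globally: try to steal from another worker.
    TrySteal,
}

/// Prefers the worker's local queue and falls back to the global queue.
fn pick_next(local: &mut VecDeque<u32>, global: &mut VecDeque<u32>) -> Next {
    local
        .pop_front()
        .or_else(|| global.pop_front())
        .map_or(Next::TrySteal, Next::Run)
}
```

<p>With tasks 1 and 2 in the local queue and task 9 in the global queue, repeated calls return 1, 2, 9 and then <code class="language-plaintext highlighter-rouge">Next::TrySteal</code>.</p>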

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> Multi-Thread Scheduler with 2 threads                        
┌────────────────────────────────────────────────────────────┐
│                                                            │
│   Tokio runtime                                            │
│                                                            │
│   ┌────────────────────────────────────────────────────┐   │
│   │ Global queue                                       │   │
│   │  (empty)                                           │   │
│   └──────────┬─────────────────────────────┬───────────┘   │
│              │                             │               │
│              │                             │               │
│              ▼                             ▼               │
│   ┌──────────────────────┐      ┌──────────────────────┐   │
│   │ Thread 1             │      │ Thread 2             │   │
│   │                      │      │                      │   │
│   │ ┌──────────────────┐ │      │ ┌──────────────────┐ │   │
│   │ │ Local queue      │ │      │ │ Local queue      │ │   │
│   │ │ Task 1           │ │      │ │ (empty)          │ │   │
│   │ │ Task 2           │ │      │ │                  │ │   │
│   │ │ Task 3           │ │      │ │                  │ │   │
│   │ │ Task 4           │ │      │ │                  │ │   │
│   │ └──────────────────┘ │      │ └──────────────────┘ │   │
│   │                      │      │                      │   │
│   │  Executing Task 1    │      │  Idle                │   │
│   │                      │      │                      │   │
│   └──────────────────────┘      └──────────────────────┘   │
│                                                            │
└────────────────────────────────────────────────────────────┘
</code></pre></div></div>

<p>In the diagram above, the Tokio runtime has two threads. Thread 1 has four tasks in its local queue, while Thread 2 has no tasks. Since the global queue is also empty, Thread 2 will steal half of the tasks from Thread 1. This ensures that tasks are balanced across CPU cores, maximizing the use of available hardware resources and increasing throughput.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> Multi-Thread Scheduler with 2 threads                        
┌────────────────────────────────────────────────────────────┐
│                                                            │
│   Tokio runtime                                            │
│                                                            │
│   ┌────────────────────────────────────────────────────┐   │
│   │ Global queue                                       │   │
│   │  (empty)                                           │   │
│   └──────────┬─────────────────────────────┬───────────┘   │
│              │                             │               │
│              │                             │               │
│              ▼                             ▼               │
│   ┌──────────────────────┐      ┌──────────────────────┐   │
│   │ Thread 1             │      │ Thread 2             │   │
│   │                      │      │                      │   │
│   │ ┌──────────────────┐ │      │ ┌──────────────────┐ │   │
│   │ │ Local queue      │ │      │ │ Local queue      │ │   │
│   │ │ Task 1           │ │      │ │ Task 3           │ │   │
│   │ │ Task 2           │ │      │ │ Task 4           │ │   │
│   │ └──────────────────┘ │      │ └──────────────────┘ │   │
│   │                      │      │                      │   │
│   │  Executing Task 1    │      │  Executing Task 3    │   │
│   │                      │      │                      │   │
│   └──────────────────────┘      └──────────────────────┘   │
│                                                            │
└────────────────────────────────────────────────────────────┘
</code></pre></div></div>
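
<p>The stealing step shown in the two diagrams above can be sketched with plain <code class="language-plaintext highlighter-rouge">VecDeque</code>s. This is only an illustration of the idea, not Tokio’s actual implementation: the <code class="language-plaintext highlighter-rouge">steal_half</code> helper, the <code class="language-plaintext highlighter-rouge">u32</code> task ids, and the choice to take tasks from the back of the queue (to match the diagram) are all assumptions made for this example.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use std::collections::VecDeque;

/// Hypothetical helper: moves half of the victim's queued tasks
/// (rounded down) into the thief's queue and returns how many moved.
fn steal_half(victim: &amp;mut VecDeque&lt;u32&gt;, thief: &amp;mut VecDeque&lt;u32&gt;) -&gt; usize {
    let n = victim.len() / 2;
    // Take the back half, mirroring the diagram where Tasks 3 and 4 move.
    let stolen = victim.split_off(victim.len() - n);
    thief.extend(stolen);
    n
}
</code></pre></div></div>

<p>Starting from a queue holding tasks 1 to 4 and an empty thief queue, one call leaves tasks 1 and 2 with the victim and moves tasks 3 and 4 to the thief, as in the second diagram.</p>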

<h4 id="single-thread-scheudler">Single-Thread Scheduler</h4>

<p>The single-thread scheduler in Tokio operates with a single thread and uses a global queue and a local queue to manage tasks. This scheduler is designed to handle all asynchronous tasks on a single core. In this model, there is no need for task migration or work stealing, as all tasks are handled by the single thread:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> Single-Thread Tokio Scheduler                        
┌──────────────────────────────┐
│                              │
│   Tokio Runtime              │
│                              │
│   ┌──────────────────────┐   │
│   │ Thread               │   │
│   │                      │   │
│   │ ┌──────────────────┐ │   │
│   │ │ Global Queue     │ │   │
│   │ │ (empty)          │ │   │
│   │ └──────────────────┘ │   │
│   │                      │   │
│   │ ┌──────────────────┐ │   │
│   │ │ Local Queue      │ │   │
│   │ │ Task 1           │ │   │
│   │ │ Task 2           │ │   │
│   │ │ Task 3           │ │   │
│   │ │ Task 4           │ │   │
│   │ └──────────────────┘ │   │
│   │                      │   │
│   │  Executing Task 1    │   │
│   │                      │   │
│   └──────────────────────┘   │
│                              │
└──────────────────────────────┘
</code></pre></div></div>

<h2 id="pingora-threading-model">Pingora’s threading model</h2>

<p>As mentioned earlier, Pingora’s threading model is based on Tokio’s asynchronous runtime, and it provides two main runtime flavours: stealing (multi-threaded with work stealing) and no-steal (multi-threaded without work stealing).</p>

<p>The stealing flavour is <strong>just the standard Tokio multi-thread runtime without any customizations</strong> that we already discussed.</p>

<p>On the other hand, the no-steal flavour is a <strong>set of OS threads</strong> each with <strong>its own single-threaded Tokio runtime</strong>.</p>

<h2 id="pingoras-multi-thread-runtime-without-work-stealing">Pingora’s multi-thread runtime without work stealing</h2>

<p>The no-steal runtime model in Pingora is an alternative where each thread runs its own independent Tokio runtime with no task migration between threads. This means each thread operates as a single-threaded runtime, but the overall system can still utilize multiple cores by spawning multiple such runtimes:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> Multi-Threaded No Stealing Pingora Runtime with 2 Threads                            
┌────────────────────────────────────────────────────────────┐
│                                                            │
│   ┌────────────────────────────────────────────────────┐   │
│   │   Single-thread Tokio Runtime (Thread 1)           │   │
│   └────────────────────────────────────────────────────┘   │
│                                                            │
│   ┌────────────────────────────────────────────────────┐   │
│   │   Single-thread Tokio Runtime (Thread 2)           │   │
│   └────────────────────────────────────────────────────┘   │
│                                                            │
│   ┌──────────────────────┐     ┌──────────────────────┐    │
│   │ Thread 1             │     │ Thread 2             │    │
│   │                      │     │                      │    │
│   │ ┌──────────────────┐ │     │ ┌──────────────────┐ │    │
│   │ │ Global Queue     │ │     │ │ Global Queue     │ │    │
│   │ │ (Tasks 1, 2, 3)  │ │     │ │ (Tasks 4, 5, 6)  │ │    │
│   │ └──────────────────┘ │     │ └──────────────────┘ │    │
│   │                      │     │                      │    │
│   │ ┌──────────────────┐ │     │ ┌──────────────────┐ │    │
│   │ │ Local Queue      │ │     │ │ Local Queue      │ │    │
│   │ │ (empty)          │ │     │ │ (empty)          │ │    │
│   │ └──────────────────┘ │     │ └──────────────────┘ │    │
│   │                      │     │                      │    │
│   │  Executing Task 1    │     │  Executing Task 5    │    │
│   │                      │     │                      │    │
│   └──────────────────────┘     └──────────────────────┘    │
│                                                            │
│   ┌────────────────────────────────────────────────────┐   │
│   │               Multi-Core Utilization               │   │
│   └────────────────────────────────────────────────────┘   │
│                                                            │
└────────────────────────────────────────────────────────────┘

</code></pre></div></div>

<p>As we can see in the diagram above, each thread has its own Tokio runtime and task queues, and tasks scheduled on one thread remain on that thread throughout their lifetime. There is no work stealing: each thread is responsible for its own work, and no tasks are stolen or migrated across threads. Pingora assigns any new task spawned in this runtime to one of the threads at random.</p>

<p>This runtime flavour allows a thread-per-core model: it spawns multiple OS threads, with one thread typically mapped to each available CPU core.</p>
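
<p>The no-steal idea can be modelled with the standard library alone. The sketch below is a simplified stand-in, not Pingora’s code: plain closures replace futures, an <code class="language-plaintext highlighter-rouge">mpsc</code> channel per worker replaces each single-threaded Tokio runtime, and round-robin assignment replaces the random assignment described above. The names <code class="language-plaintext highlighter-rouge">Task</code> and <code class="language-plaintext highlighter-rouge">run_no_steal</code> are hypothetical.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use std::sync::mpsc;
use std::thread;

// Hypothetical task type: a closure that returns its own id.
type Task = Box&lt;dyn FnOnce() -&gt; u32 + Send&gt;;

/// Runs `tasks` on `n_workers` threads without work stealing.
/// Returns sorted `(task result, worker index)` pairs.
fn run_no_steal(n_workers: usize, tasks: Vec&lt;Task&gt;) -&gt; Vec&lt;(u32, usize)&gt; {
    let (done_tx, done_rx) = mpsc::channel();
    let mut senders = Vec::new();
    let mut handles = Vec::new();

    for worker in 0..n_workers {
        // Each worker owns exactly one queue and never looks at another.
        let (tx, rx) = mpsc::channel::&lt;Task&gt;();
        let done_tx = done_tx.clone();
        senders.push(tx);
        handles.push(thread::spawn(move || {
            for task in rx {
                done_tx.send((task(), worker)).unwrap();
            }
        }));
    }
    drop(done_tx);

    // Assumption: round-robin stands in for random assignment.
    for (i, task) in tasks.into_iter().enumerate() {
        senders[i % n_workers].send(task).unwrap();
    }
    drop(senders); // Closing the queues lets the workers exit.

    for handle in handles {
        handle.join().unwrap();
    }
    let mut results: Vec&lt;(u32, usize)&gt; = done_rx.iter().collect();
    results.sort();
    results
}
</code></pre></div></div>

<p>Once a task has been sent to a worker’s queue, it can only ever run on that worker, which is the property the diagram illustrates. If one worker receives slower tasks, the others cannot help it.</p>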

<h4 id="alternative-thread-per-core-runtimes">Alternative thread-per-core runtimes</h4>

<p>One improvement that Pingora could add to its non-stealing runtime is the use of <code class="language-plaintext highlighter-rouge">LocalSet</code> (<a href="https://docs.rs/tokio/1.40.0/tokio/runtime/struct.Builder.html#method.new_current_thread">see docs</a>). Since it is composed of single-threaded Tokio runtimes, the futures shouldn’t need to be <code class="language-plaintext highlighter-rouge">Send</code> and <code class="language-plaintext highlighter-rouge">Sync</code>, as they always run on the same thread. <code class="language-plaintext highlighter-rouge">LocalSet</code> provides a way to spawn and manage non-Send tasks within the context of a single-threaded runtime.</p>

<p>There are also other existing alternative runtimes that lend themselves to thread-per-core architectures: <a href="https://crates.io/crates/glommio">glommio</a> from DataDog and <a href="https://crates.io/crates/monoio">monoio</a> from ByteDance.</p>

<p>I recommend reading <a href="https://emschwartz.me/async-rust-can-be-a-pleasure-to-work-with-without-send-sync-static/">this post</a> about async Rust without <code class="language-plaintext highlighter-rouge">Send + Sync + 'static</code>.</p>

<h3 id="which-pingora-runtime-should-i-use">Which Pingora runtime should I use?</h3>

<p>Let’s discuss the pros and cons of each alternative:</p>

<h4 id="work-stealing">Work stealing</h4>

<p><strong>Pros:</strong></p>
<ul>
  <li>It dynamically balances the load across threads. If one thread is overwhelmed with tasks, other threads can steal work from the busy thread’s queue.</li>
  <li>By distributing tasks, it helps ensure that all CPU cores are utilized efficiently.</li>
</ul>

<p><strong>Cons:</strong></p>
<ul>
  <li>Some overhead is introduced due to the need for coordination and synchronization when threads steal tasks from each other. This can cause additional latency in certain contexts.</li>
</ul>

<h4 id="thread-per-core">Thread-Per-Core</h4>

<p><strong>Pros:</strong></p>
<ul>
  <li>Since threads do not need to steal work from one another, there is less synchronization overhead.</li>
  <li>Tasks are bound to specific threads, which can be beneficial for tasks that are stateful or have significant initialization costs.</li>
</ul>

<p><strong>Cons:</strong></p>
<ul>
  <li>If some tasks require more time to execute than others, this could lead to some CPU cores being busy while others remain idle.</li>
</ul>

<p>In my opinion, in the particular case of an HTTP proxy like Pingora, if the incoming traffic is predictable and each request requires a similar amount of time to be processed, a thread-per-core model might provide consistent performance with reduced thread contention. I opened a discussion <a href="https://github.com/cloudflare/pingora/discussions/376">here</a>.</p>

<h2 id="conclusion">Conclusion</h2>

<p>In summary, Pingora offers two async runtimes: one with work stealing and one without.</p>

<ul>
  <li>The work-stealing runtime is good for handling varying loads by balancing tasks across threads, but it introduces some overhead.</li>
  <li>The thread-per-core model, where each thread runs its own Tokio runtime, can provide more consistent performance with less contention, especially if workloads are predictable.</li>
</ul>

<p>In my opinion, Pingora should consider improving its non-stealing policy by allowing the use of non-Send and non-Sync futures with a <code class="language-plaintext highlighter-rouge">LocalSet</code>. On top of that, they could even explore supporting other runtimes like <a href="https://crates.io/crates/glommio">glommio</a> and <a href="https://crates.io/crates/monoio">monoio</a>.</p>]]></content><author><name>Ricardo Pallas</name></author><category term="blog" /><category term="pingora" /><category term="nginx" /><category term="rust" /><category term="data structures" /><category term="http proxy" /><category term="performance" /><category term="multithreading" /><category term="tokio" /><category term="thread-per-core" /><category term="async" /><summary type="html"><![CDATA[Dive deep into Pingora async runtime and threading model.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://cf-assets.www.cloudflare.com/slt3lc6tev37/f24aeLVQUjlbXRxDzdWm9/4f560a3a3e7da5d10093728548029ed7/pingora-open-source.png" /><media:content medium="image" url="https://cf-assets.www.cloudflare.com/slt3lc6tev37/f24aeLVQUjlbXRxDzdWm9/4f560a3a3e7da5d10093728548029ed7/pingora-open-source.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Rust 1 Billion Row Challenge without Dependencies</title><link href="https://rpallas92.github.io/1brc/" rel="alternate" type="text/html" title="Rust 1 Billion Row Challenge without Dependencies" /><published>2024-06-28T08:00:00+00:00</published><updated>2024-06-28T08:00:00+00:00</updated><id>https://rpallas92.github.io/1brc</id><content type="html" xml:base="https://rpallas92.github.io/1brc/"><![CDATA[<h1 id="rust-1-billion-row-challenge-without-dependencies">Rust 1 Billion Row Challenge without Dependencies</h1>

<h2 id="table-of-contents">Table of Contents</h2>

<ol>
  <li><a href="#introduction">Introduction</a></li>
  <li><a href="#base-naive-implementation-90-seconds">Base Naive Implementation (90 seconds)</a></li>
  <li><a href="#multithreading-solution-1796-secs---80-improvement">Multithreading Solution (17.96 secs - 80% improvement)</a></li>
  <li><a href="#custom-number-parsing-81-seconds---549-improvement">Custom Number Parsing (8.1 seconds - 54.9% improvement)</a></li>
  <li><a href="#custom-key-parsing-676-seconds---165-improvement">Custom Key Parsing (6.76 seconds - 16.5% improvement)</a></li>
  <li><a href="#custom-hash-function-585-seconds---135-improvement">Custom Hash Function (5.85 seconds - 13.5% improvement)</a></li>
  <li><a href="#unsafe-string-parsing-516-seconds---118-improvement">Unsafe String Parsing (5.16 seconds - 11.8% improvement)</a></li>
  <li><a href="#edit-1-custom-line-splitting-482-seconds---659-improvement">Edit 1: Custom Line Splitting (4.82 seconds - 6.59% improvement)</a></li>
  <li><a href="#conclusion">Conclusion</a></li>
</ol>

<h2 id="introduction">Introduction</h2>

<p>On January 1st, 2024, <a href="https://www.morling.dev/blog/one-billion-row-challenge/">Gunnar Morling announced</a> the 1 Billion Row Challenge (1BRC). The challenge is to write a Java program to read temperature data from a text file and find the minimum, average, and maximum temperatures for each weather station. The file has 1,000,000,000 rows.</p>

<p>The text file has a simple structure with one measurement value per row:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Graus;12.0
Zaragoza;8.9
Madrid;38.8
Paris;15.2
London;12.6
...
</code></pre></div></div>

<p>The program should print out the min, mean, and max values per station, ordered alphabetically like so:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{Graus=5.0/18.0/27.4, Madrid=15.7/26.0/34.1, New York=12.1/29.4/35.6, ...}
</code></pre></div></div>

<p>I was curious and read several implementations in Rust. They were really good and optimized, and I learned a lot from them. However, they did not follow one of the rules of the 1BRC: no external dependencies may be used.</p>

<p>I tried running the <a href="https://1brc.dev/">official Rust solution</a> with my 1 billion rows test file on my <a href="https://www.bee-link.com/products/beelink-ser5-max-5800h?variant=46189745766642">SER5 MAX mini PC</a>. It took <strong>5.7 seconds</strong> to execute.</p>

<p>I decided to write my own solution in Rust without using any external crates. My goal was to achieve similar performance to the official solution while keeping the code simple and short.</p>

<p>The code is <a href="https://github.com/RPallas92/one-billion-row">available on this GitHub repository</a>.</p>

<h2 id="base-naive-implementation-90-seconds">Base Naive Implementation (90 seconds)</h2>

<p>I started by writing a simple, naive, and unoptimized first version to use as a baseline for further improvements.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">std</span><span class="p">::{</span>
    <span class="nn">collections</span><span class="p">::</span><span class="n">BTreeMap</span><span class="p">,</span>
    <span class="nn">fmt</span><span class="p">::</span><span class="n">Display</span><span class="p">,</span>
    <span class="nn">fs</span><span class="p">::</span><span class="n">File</span><span class="p">,</span>
    <span class="nn">io</span><span class="p">::{</span><span class="n">BufRead</span><span class="p">,</span> <span class="n">BufReader</span><span class="p">,</span> <span class="nb">Result</span><span class="p">},</span>
    <span class="nn">time</span><span class="p">::</span><span class="n">Instant</span><span class="p">,</span>
<span class="p">};</span>

<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="cm">/*
    The release build is executed in around 90 seconds on SER5 MAX:
       - CPU: AMD Ryzen 7 5800H with Radeon Graphics (16) @ 3.200GHz
       - GPU: AMD ATI Radeon Vega Series / Radeon Vega Mobile Series
       - Memory: 28993MiB
    */</span>
    <span class="k">let</span> <span class="n">start</span> <span class="o">=</span> <span class="nn">Instant</span><span class="p">::</span><span class="nf">now</span><span class="p">();</span>

    <span class="k">let</span> <span class="n">reader</span> <span class="o">=</span> <span class="nf">get_file_reader</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">();</span>
    <span class="k">let</span> <span class="n">station_to_metrics</span> <span class="o">=</span> <span class="nf">build_map</span><span class="p">(</span><span class="n">reader</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">();</span>
    <span class="nf">print_metrics</span><span class="p">(</span><span class="n">station_to_metrics</span><span class="p">);</span>

    <span class="k">let</span> <span class="n">duration</span> <span class="o">=</span> <span class="n">start</span><span class="nf">.elapsed</span><span class="p">();</span>
    <span class="nd">println!</span><span class="p">(</span><span class="s">"</span><span class="se">\n</span><span class="s"> Execution time: {:?}"</span><span class="p">,</span> <span class="n">duration</span><span class="p">);</span>
<span class="p">}</span>

<span class="k">fn</span> <span class="nf">get_file_reader</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="n">BufReader</span><span class="o">&lt;</span><span class="n">File</span><span class="o">&gt;&gt;</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">file</span><span class="p">:</span> <span class="n">File</span> <span class="o">=</span> <span class="nn">File</span><span class="p">::</span><span class="nf">open</span><span class="p">(</span><span class="s">"./data/weather_stations.csv"</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
    <span class="nf">Ok</span><span class="p">(</span><span class="nn">BufReader</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="n">file</span><span class="p">))</span>
<span class="p">}</span>

<span class="k">fn</span> <span class="nf">build_map</span><span class="p">(</span><span class="n">file_reader</span><span class="p">:</span> <span class="n">BufReader</span><span class="o">&lt;</span><span class="n">File</span><span class="o">&gt;</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="n">BTreeMap</span><span class="o">&lt;</span><span class="nb">String</span><span class="p">,</span> <span class="n">StationMetrics</span><span class="o">&gt;&gt;</span> <span class="p">{</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">station_to_metrics</span> <span class="o">=</span> <span class="nn">BTreeMap</span><span class="p">::</span><span class="o">&lt;</span><span class="nb">String</span><span class="p">,</span> <span class="n">StationMetrics</span><span class="o">&gt;</span><span class="p">::</span><span class="nf">new</span><span class="p">();</span>
    <span class="k">for</span> <span class="n">line</span> <span class="k">in</span> <span class="n">file_reader</span><span class="nf">.lines</span><span class="p">()</span> <span class="p">{</span>
        <span class="k">let</span> <span class="n">line</span> <span class="o">=</span> <span class="n">line</span><span class="o">?</span><span class="p">;</span>
        <span class="k">let</span> <span class="p">(</span><span class="n">city</span><span class="p">,</span> <span class="n">temperature</span><span class="p">)</span> <span class="o">=</span> <span class="n">line</span><span class="nf">.split_once</span><span class="p">(</span><span class="sc">';'</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">();</span>
        <span class="k">let</span> <span class="n">temperature</span><span class="p">:</span> <span class="nb">f32</span> <span class="o">=</span> <span class="n">temperature</span><span class="nf">.parse</span><span class="p">()</span><span class="nf">.expect</span><span class="p">(</span><span class="s">"Incorrect temperature"</span><span class="p">);</span>
        <span class="n">station_to_metrics</span>
            <span class="nf">.entry</span><span class="p">(</span><span class="n">city</span><span class="nf">.to_string</span><span class="p">())</span>
            <span class="nf">.or_default</span><span class="p">()</span>
            <span class="nf">.update</span><span class="p">(</span><span class="n">temperature</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="nf">Ok</span><span class="p">(</span><span class="n">station_to_metrics</span><span class="p">)</span>
<span class="p">}</span>

<span class="c1">// BTreeMap already sorts keys in ascending order.</span>
<span class="k">fn</span> <span class="nf">print_metrics</span><span class="p">(</span><span class="n">station_to_metrics</span><span class="p">:</span> <span class="n">BTreeMap</span><span class="o">&lt;</span><span class="nb">String</span><span class="p">,</span> <span class="n">StationMetrics</span><span class="o">&gt;</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">for</span> <span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">state</span><span class="p">))</span> <span class="k">in</span> <span class="n">station_to_metrics</span><span class="nf">.into_iter</span><span class="p">()</span><span class="nf">.enumerate</span><span class="p">()</span> <span class="p">{</span>
        <span class="k">if</span> <span class="n">i</span> <span class="o">==</span> <span class="mi">0</span> <span class="p">{</span>
            <span class="nd">print!</span><span class="p">(</span><span class="s">"{name}={state}"</span><span class="p">);</span>
        <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
            <span class="nd">print!</span><span class="p">(</span><span class="s">", {name}={state}"</span><span class="p">);</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="nd">#[derive(Debug)]</span>
<span class="k">struct</span> <span class="n">StationMetrics</span> <span class="p">{</span>
    <span class="n">sum_temperature</span><span class="p">:</span> <span class="nb">f64</span><span class="p">,</span>
    <span class="n">num_records</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>
    <span class="n">min_temperature</span><span class="p">:</span> <span class="nb">f32</span><span class="p">,</span>
    <span class="n">max_temperature</span><span class="p">:</span> <span class="nb">f32</span><span class="p">,</span>
<span class="p">}</span>

<span class="k">impl</span> <span class="n">StationMetrics</span> <span class="p">{</span>
    <span class="k">fn</span> <span class="nf">update</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="p">,</span> <span class="n">temperature</span><span class="p">:</span> <span class="nb">f32</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">self</span><span class="py">.max_temperature</span> <span class="o">=</span> <span class="k">self</span><span class="py">.max_temperature</span><span class="nf">.max</span><span class="p">(</span><span class="n">temperature</span><span class="p">);</span>
        <span class="k">self</span><span class="py">.min_temperature</span> <span class="o">=</span> <span class="k">self</span><span class="py">.min_temperature</span><span class="nf">.min</span><span class="p">(</span><span class="n">temperature</span><span class="p">);</span>
        <span class="k">self</span><span class="py">.num_records</span> <span class="o">+=</span> <span class="mi">1</span><span class="p">;</span>
        <span class="k">self</span><span class="py">.sum_temperature</span> <span class="o">+=</span> <span class="n">temperature</span> <span class="k">as</span> <span class="nb">f64</span><span class="p">;</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="k">impl</span> <span class="nb">Default</span> <span class="k">for</span> <span class="n">StationMetrics</span> <span class="p">{</span>
    <span class="k">fn</span> <span class="nf">default</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="k">Self</span> <span class="p">{</span>
        <span class="n">StationMetrics</span> <span class="p">{</span>
            <span class="n">sum_temperature</span><span class="p">:</span> <span class="mf">0.0</span><span class="p">,</span>
            <span class="n">num_records</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
            <span class="n">min_temperature</span><span class="p">:</span> <span class="nn">f32</span><span class="p">::</span><span class="n">MAX</span><span class="p">,</span>
            <span class="n">max_temperature</span><span class="p">:</span> <span class="nn">f32</span><span class="p">::</span><span class="n">MIN</span><span class="p">,</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="k">impl</span> <span class="n">Display</span> <span class="k">for</span> <span class="n">StationMetrics</span> <span class="p">{</span>
    <span class="k">fn</span> <span class="nf">fmt</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">,</span> <span class="n">f</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="nn">std</span><span class="p">::</span><span class="nn">fmt</span><span class="p">::</span><span class="n">Formatter</span><span class="o">&lt;</span><span class="nv">'_</span><span class="o">&gt;</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nn">std</span><span class="p">::</span><span class="nn">fmt</span><span class="p">::</span><span class="nb">Result</span> <span class="p">{</span>
        <span class="k">let</span> <span class="n">avg_temperature</span> <span class="o">=</span> <span class="k">self</span><span class="py">.sum_temperature</span> <span class="o">/</span> <span class="p">(</span><span class="k">self</span><span class="py">.num_records</span> <span class="k">as</span> <span class="nb">f64</span><span class="p">);</span>
        <span class="nd">write!</span><span class="p">(</span>
            <span class="n">f</span><span class="p">,</span>
            <span class="s">"{:.1}/{avg_temperature:.1}/{:.1}"</span><span class="p">,</span>
            <span class="k">self</span><span class="py">.min_temperature</span><span class="p">,</span> <span class="k">self</span><span class="py">.max_temperature</span>
        <span class="p">)</span>
    <span class="p">}</span>
<span class="p">}</span>

</code></pre></div></div>

<p>As simple as this:</p>
<ol>
  <li>It opens the CSV file <code class="language-plaintext highlighter-rouge">./data/weather_stations.csv</code> and creates a buffered reader for efficient file reading.</li>
  <li>It reads each line of the file, splitting each line into a city name and a temperature value. It uses a BTreeMap to store and update temperature statistics (min, mean, and max) for each city. The BTreeMap automatically keeps the city names sorted.</li>
  <li>For each city, it has a StationMetrics struct that tracks the sum, count, min, and max temperatures. The implementation updates these metrics as it processes each line of the file.</li>
  <li>Once all data is processed, we print the temperature statistics for each city.</li>
</ol>

<p>It runs in <strong>90 seconds</strong> on my mini PC. That is a long way from the 5.7 seconds of the official implementation. Let’s get started!</p>

<p><strong><a href="https://github.com/RPallas92/one-billion-row/commit/2a1c2f75a59ce7e4f03b60846a1f46b36f003c48">Link to commit</a></strong></p>

<blockquote>
  <p>I also wrote this script to create a sample test file to try against. In the repository, you can execute it by running <code class="language-plaintext highlighter-rouge">cargo run --bin create_data_file</code>.</p>
</blockquote>

<h2 id="multithreading-solution-1796-secs---80-improvement">Multithreading Solution (17.96 secs - 80% improvement)</h2>

<p>The first thing that came to mind to improve performance was to introduce multithreading, as my mini PC has a CPU with 8 cores and 16 threads. We can follow this strategy: create as many threads as the number of CPU threads (N). Then, each thread will process 1/N of the file in parallel. Finally, we will merge the results from all threads to calculate the final result.</p>
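
<p>Before looking at the real changes, here is a minimal, standalone sketch of that split-process-merge strategy using only the standard library. It is not the code from the repository: it works on an in-memory slice of lines instead of byte ranges of the file, and the names <code class="language-plaintext highlighter-rouge">Stats</code>, <code class="language-plaintext highlighter-rouge">process_chunk</code> and <code class="language-plaintext highlighter-rouge">process_in_parallel</code> are made up for this example.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use std::collections::BTreeMap;
use std::thread;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Stats {
    min: f32,
    max: f32,
    sum: f64,
    count: u32,
}

impl Stats {
    fn single(t: f32) -&gt; Self {
        Stats { min: t, max: t, sum: t as f64, count: 1 }
    }

    // Combines two partial results for the same station.
    fn merge(&amp;mut self, other: &amp;Stats) {
        self.min = self.min.min(other.min);
        self.max = self.max.max(other.max);
        self.sum += other.sum;
        self.count += other.count;
    }
}

// Aggregates one chunk of "city;temperature" lines.
fn process_chunk(lines: &amp;[&amp;str]) -&gt; BTreeMap&lt;String, Stats&gt; {
    let mut map: BTreeMap&lt;String, Stats&gt; = BTreeMap::new();
    for line in lines {
        let (city, temp) = line.split_once(';').expect("missing ';'");
        let one = Stats::single(temp.parse::&lt;f32&gt;().expect("bad temperature"));
        map.entry(city.to_string())
            .and_modify(|s| s.merge(&amp;one))
            .or_insert(one);
    }
    map
}

// Splits the lines into at most `n_threads` chunks, processes each chunk
// on its own scoped thread, then merges the partial maps.
fn process_in_parallel(lines: &amp;[&amp;str], n_threads: usize) -&gt; BTreeMap&lt;String, Stats&gt; {
    let n = n_threads.max(1);
    let chunk_len = ((lines.len() + n - 1) / n).max(1);
    let partials: Vec&lt;BTreeMap&lt;String, Stats&gt;&gt; = thread::scope(|scope| {
        let handles: Vec&lt;_&gt; = lines
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || process_chunk(chunk)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    let mut merged: BTreeMap&lt;String, Stats&gt; = BTreeMap::new();
    for partial in partials {
        for (city, stats) in partial {
            merged
                .entry(city)
                .and_modify(|s| s.merge(&amp;stats))
                .or_insert(stats);
        }
    }
    merged
}
</code></pre></div></div>

<p>The merge works because min, max, sum, and count can all be combined from partial results. The mean is derived from the sum and the count only at the end, since averaging per-thread means directly would give the wrong answer when chunks hold different numbers of rows for a station.</p>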

<p>Here are the changes:</p>

<p>The main function determines the number of available CPU cores to decide how many threads to spawn.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="cm">/*
    The release build is executed in around 17.96 seconds on SER5 PRO MAX:
       - CPU: AMD Ryzen 7 5800H with Radeon Graphics (16) @ 3.200GHz
       - GPU: AMD ATI Radeon Vega Series / Radeon Vega Mobile Series
       - Memory: 28993MiB
    */</span>
    <span class="k">let</span> <span class="n">start</span> <span class="o">=</span> <span class="nn">Instant</span><span class="p">::</span><span class="nf">now</span><span class="p">();</span>

    <span class="k">let</span> <span class="n">n_threads</span><span class="p">:</span> <span class="nb">usize</span> <span class="o">=</span> <span class="nn">std</span><span class="p">::</span><span class="nn">thread</span><span class="p">::</span><span class="nf">available_parallelism</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">()</span><span class="nf">.into</span><span class="p">();</span>

    <span class="o">...</span>
</code></pre></div></div>

<p>Then it divides the file into intervals based on <code class="language-plaintext highlighter-rouge">n_threads</code>. Each interval represents a chunk of the file to be processed by a thread in parallel.</p>

<p><strong>Note that the function that calculates the intervals ensures that no lines are split between chunks by adjusting the end positions to the end of the line.</strong></p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cd">/// Splits the file into intervals based on the number of CPUs.</span>
<span class="cd">/// Each interval is determined by dividing the file size by the number of CPUs</span>
<span class="cd">/// and adjusting the intervals to ensure lines are not split between chunks.</span>
<span class="cd">///</span>
<span class="cd">/// Example:</span>
<span class="cd">///</span>
<span class="cd">/// Suppose the file size is 1000 bytes and `cpus` is 4.</span>
<span class="cd">/// The file will be divided into 4 chunks, and the intervals might be as follows:</span>
<span class="cd">/// ```text</span>
<span class="cd">/// Interval { start: 0, end: 249 }</span>
<span class="cd">/// Interval { start: 250, end: 499 }</span>
<span class="cd">/// Interval { start: 500, end: 749 }</span>
<span class="cd">/// Interval { start: 750, end: 999 }</span>
<span class="cd">/// ```</span>
<span class="k">fn</span> <span class="nf">get_file_intervals_for_cpus</span><span class="p">(</span>
    <span class="n">cpus</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span>
    <span class="n">file_size</span><span class="p">:</span> <span class="nb">u64</span><span class="p">,</span>
    <span class="n">reader</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="n">BufReader</span><span class="o">&lt;</span><span class="n">File</span><span class="o">&gt;</span><span class="p">,</span>
<span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Vec</span><span class="o">&lt;</span><span class="n">Interval</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">chunk_size</span> <span class="o">=</span> <span class="n">file_size</span> <span class="o">/</span> <span class="p">(</span><span class="n">cpus</span> <span class="k">as</span> <span class="nb">u64</span><span class="p">);</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">intervals</span> <span class="o">=</span> <span class="nn">Vec</span><span class="p">::</span><span class="nf">new</span><span class="p">();</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">start</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">buf</span> <span class="o">=</span> <span class="nn">String</span><span class="p">::</span><span class="nf">new</span><span class="p">();</span>

    <span class="k">for</span> <span class="n">_</span> <span class="k">in</span> <span class="mi">0</span><span class="o">..</span><span class="n">cpus</span> <span class="p">{</span>
        <span class="k">let</span> <span class="k">mut</span> <span class="n">end</span><span class="p">:</span> <span class="nb">u64</span> <span class="o">=</span> <span class="p">(</span><span class="n">start</span> <span class="o">+</span> <span class="n">chunk_size</span><span class="p">)</span><span class="nf">.min</span><span class="p">(</span><span class="n">file_size</span><span class="p">);</span>
        <span class="n">_</span> <span class="o">=</span> <span class="n">reader</span><span class="nf">.seek</span><span class="p">(</span><span class="nn">SeekFrom</span><span class="p">::</span><span class="nf">Start</span><span class="p">(</span><span class="n">end</span><span class="p">));</span>
        <span class="k">let</span> <span class="n">bytes_until_end_of_line</span> <span class="o">=</span> <span class="n">reader</span><span class="nf">.read_line</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="n">buf</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">();</span>
        <span class="n">end</span> <span class="o">=</span> <span class="n">end</span> <span class="o">+</span> <span class="p">(</span><span class="n">bytes_until_end_of_line</span> <span class="k">as</span> <span class="nb">u64</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span> <span class="c1">// -1 because read_line() also reads the trailing \n</span>

        <span class="n">intervals</span><span class="nf">.push</span><span class="p">(</span><span class="n">Interval</span> <span class="p">{</span> <span class="n">start</span><span class="p">,</span> <span class="n">end</span> <span class="p">});</span>

        <span class="n">start</span> <span class="o">=</span> <span class="n">end</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>
        <span class="n">buf</span><span class="nf">.clear</span><span class="p">();</span>
    <span class="p">}</span>
    <span class="n">intervals</span>
<span class="p">}</span>
</code></pre></div></div>

<p>For each interval, a new thread is spawned to process the corresponding file chunk. The <code class="language-plaintext highlighter-rouge">process_chunk</code> function reads the assigned file chunk and builds its own <code class="language-plaintext highlighter-rouge">StationsMap</code> for that chunk.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="k">fn</span> <span class="nf">process_chunk</span><span class="p">(</span><span class="n">file_path</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">Path</span><span class="p">,</span> <span class="n">interval</span><span class="p">:</span> <span class="n">Interval</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="n">StationsMap</span> <span class="p">{</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">reader</span> <span class="o">=</span> <span class="nf">get_file_reader</span><span class="p">(</span><span class="n">file_path</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">();</span>
    <span class="c1">// Starts from the interval start.</span>
    <span class="n">_</span> <span class="o">=</span> <span class="n">reader</span><span class="nf">.seek</span><span class="p">(</span><span class="nn">SeekFrom</span><span class="p">::</span><span class="nf">Start</span><span class="p">(</span><span class="n">interval</span><span class="py">.start</span><span class="p">));</span>
    <span class="c1">// The reader only reads the number of bytes for that interval.</span>
    <span class="k">let</span> <span class="n">chunk_reader</span> <span class="o">=</span> <span class="n">reader</span><span class="nf">.take</span><span class="p">(</span><span class="n">interval</span><span class="py">.end</span> <span class="o">-</span> <span class="n">interval</span><span class="py">.start</span><span class="p">);</span>
    <span class="nf">build_map</span><span class="p">(</span><span class="n">chunk_reader</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">()</span>
<span class="p">}</span>
</code></pre></div></div>

<p>After all threads complete, their results are merged into a single map using the <code class="language-plaintext highlighter-rouge">merge_maps</code> function.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">merge_maps</span><span class="p">(</span><span class="n">a</span><span class="p">:</span> <span class="n">StationsMap</span><span class="p">,</span> <span class="n">b</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">StationsMap</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="n">StationsMap</span> <span class="p">{</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">merged_map</span> <span class="o">=</span> <span class="n">a</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(</span><span class="n">k</span><span class="p">,</span> <span class="n">v</span><span class="p">)</span> <span class="k">in</span> <span class="n">b</span> <span class="p">{</span>
        <span class="n">merged_map</span><span class="nf">.entry</span><span class="p">(</span><span class="n">k</span><span class="nf">.into</span><span class="p">())</span><span class="nf">.or_default</span><span class="p">()</span><span class="nf">.merge</span><span class="p">(</span><span class="n">v</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="n">merged_map</span>
<span class="p">}</span>
</code></pre></div></div>
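<p>The <code class="language-plaintext highlighter-rouge">merge</code> method called above is not shown in this section. As a minimal sketch, assuming <code class="language-plaintext highlighter-rouge">StationMetrics</code> tracks a minimum, maximum, sum, and count (these field names are assumptions for illustration, not necessarily the ones used in the repository), merging two partial results could look like this:</p>

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for the post's StationMetrics; the field names are assumptions.
#[derive(Debug, Clone, Copy, PartialEq)]
struct StationMetrics {
    min: f32,
    max: f32,
    sum: f32,
    count: u32,
}

impl Default for StationMetrics {
    fn default() -> Self {
        // Identity element: merging anything into the default yields that thing.
        StationMetrics {
            min: f32::INFINITY,
            max: f32::NEG_INFINITY,
            sum: 0.0,
            count: 0,
        }
    }
}

impl StationMetrics {
    // Folds the partial metrics of another chunk into this one.
    fn merge(&mut self, other: &StationMetrics) {
        self.min = self.min.min(other.min);
        self.max = self.max.max(other.max);
        self.sum += other.sum;
        self.count += other.count;
    }
}

type StationsMap = BTreeMap<String, StationMetrics>;

fn merge_maps(a: StationsMap, b: &StationsMap) -> StationsMap {
    let mut merged = a;
    for (city, metrics) in b {
        merged.entry(city.clone()).or_default().merge(metrics);
    }
    merged
}
```

<p>Because the default value acts as an identity element, a city that appears in only one chunk keeps its metrics unchanged after the merge.</p>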

<p>Here is the final main function to get a better overview of all parts.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">start</span> <span class="o">=</span> <span class="nn">Instant</span><span class="p">::</span><span class="nf">now</span><span class="p">();</span>

    <span class="c1">// Number of threads available.</span>
    <span class="k">let</span> <span class="n">n_threads</span><span class="p">:</span> <span class="nb">usize</span> <span class="o">=</span> <span class="nn">std</span><span class="p">::</span><span class="nn">thread</span><span class="p">::</span><span class="nf">available_parallelism</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">()</span><span class="nf">.into</span><span class="p">();</span>

    <span class="k">let</span> <span class="n">file_path</span> <span class="o">=</span> <span class="nn">Path</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="s">"./data/weather_stations.csv"</span><span class="p">);</span>
    <span class="k">let</span> <span class="n">file_size</span> <span class="o">=</span> <span class="nn">fs</span><span class="p">::</span><span class="nf">metadata</span><span class="p">(</span><span class="o">&amp;</span><span class="n">file_path</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">()</span><span class="nf">.size</span><span class="p">();</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">reader</span> <span class="o">=</span> <span class="nf">get_file_reader</span><span class="p">(</span><span class="n">file_path</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">();</span>

    <span class="c1">// Divide the file into n_threads intervals.</span>
    <span class="k">let</span> <span class="n">intervals</span> <span class="o">=</span> <span class="nf">get_file_intervals_for_cpus</span><span class="p">(</span><span class="n">n_threads</span><span class="p">,</span> <span class="n">file_size</span><span class="p">,</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="n">reader</span><span class="p">);</span>

    <span class="c1">// Vector that contains all partial results of each thread.</span>
    <span class="k">let</span> <span class="n">results</span> <span class="o">=</span> <span class="nn">Arc</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="nn">Mutex</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="nn">Vec</span><span class="p">::</span><span class="nf">new</span><span class="p">()));</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">handles</span> <span class="o">=</span> <span class="nn">Vec</span><span class="p">::</span><span class="nf">new</span><span class="p">();</span>

    <span class="k">for</span> <span class="n">interval</span> <span class="k">in</span> <span class="n">intervals</span> <span class="p">{</span>
        <span class="k">let</span> <span class="n">results</span> <span class="o">=</span> <span class="nn">Arc</span><span class="p">::</span><span class="nf">clone</span><span class="p">(</span><span class="o">&amp;</span><span class="n">results</span><span class="p">);</span>

        <span class="c1">// Each thread processes a chunk in parallel.</span>
        <span class="k">let</span> <span class="n">handle</span> <span class="o">=</span> <span class="nn">thread</span><span class="p">::</span><span class="nf">spawn</span><span class="p">(</span><span class="k">move</span> <span class="p">||</span> <span class="p">{</span>
            <span class="k">let</span> <span class="n">station_to_metrics</span> <span class="o">=</span> <span class="nf">process_chunk</span><span class="p">(</span><span class="n">file_path</span><span class="p">,</span> <span class="n">interval</span><span class="p">);</span>
            <span class="n">results</span><span class="nf">.lock</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">()</span><span class="nf">.push</span><span class="p">(</span><span class="n">station_to_metrics</span><span class="p">);</span>
        <span class="p">});</span>
        <span class="n">handles</span><span class="nf">.push</span><span class="p">(</span><span class="n">handle</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="k">for</span> <span class="n">handle</span> <span class="k">in</span> <span class="n">handles</span> <span class="p">{</span>
        <span class="n">handle</span><span class="nf">.join</span><span class="p">()</span><span class="nf">.expect</span><span class="p">(</span><span class="s">"Thread panicked"</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Combines all partial results into the final result.</span>
    <span class="k">let</span> <span class="n">result</span> <span class="o">=</span> <span class="n">results</span>
        <span class="nf">.lock</span><span class="p">()</span>
        <span class="nf">.unwrap</span><span class="p">()</span>
        <span class="nf">.iter</span><span class="p">()</span>
        <span class="nf">.fold</span><span class="p">(</span><span class="nn">StationsMap</span><span class="p">::</span><span class="nf">default</span><span class="p">(),</span> <span class="p">|</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">|</span> <span class="nf">merge_maps</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">b</span><span class="p">));</span>

    <span class="nf">print_metrics</span><span class="p">(</span><span class="o">&amp;</span><span class="n">result</span><span class="p">);</span>
    <span class="nd">println!</span><span class="p">(</span><span class="s">"</span><span class="se">\n</span><span class="s"> Execution time: {:?}"</span><span class="p">,</span> <span class="n">start</span><span class="nf">.elapsed</span><span class="p">());</span>
<span class="p">}</span>
</code></pre></div></div>
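<p>As a side note, the shared <code class="language-plaintext highlighter-rouge">Arc&lt;Mutex&lt;Vec&gt;&gt;</code> is not the only way to collect the partial results. A thread’s return value can also be read from its <code class="language-plaintext highlighter-rouge">JoinHandle</code>. The sketch below is an alternative, not the code from the repository: <code class="language-plaintext highlighter-rouge">process_chunk</code> here is a hypothetical stand-in that sums numbers, and <code class="language-plaintext highlighter-rouge">parallel_sum</code> is an illustrative name.</p>

```rust
use std::thread;

// Hypothetical stand-in for the real process_chunk: sums one chunk of numbers.
fn process_chunk(chunk: Vec<u64>) -> u64 {
    chunk.iter().sum()
}

fn parallel_sum(chunks: Vec<Vec<u64>>) -> u64 {
    // Spawn one thread per chunk; each handle carries the thread's return value.
    let handles: Vec<thread::JoinHandle<u64>> = chunks
        .into_iter()
        .map(|chunk| thread::spawn(move || process_chunk(chunk)))
        .collect();

    // join() yields each partial result, so no shared Vec or Mutex is needed.
    handles
        .into_iter()
        .map(|handle| handle.join().expect("Thread panicked"))
        .sum()
}
```

<p>The same shape should work with a map per thread: each closure would return its <code class="language-plaintext highlighter-rouge">StationsMap</code>, and the final step would fold the joined results with <code class="language-plaintext highlighter-rouge">merge_maps</code>.</p>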

<p>This solution improves the execution time from 90 seconds to 17.96 seconds. Great achievement! But we need to do better to get closer to the 5.7 seconds of the official solution. Let’s continue optimizing!</p>

<p><strong><a href="https://github.com/RPallas92/one-billion-row/commit/a8072a50829070a74e44cf1144f9aee940169a98">Link to commit</a></strong></p>

<h2 id="custom-number-parsing-81-seconds---549-improvement">Custom number parsing (8.1 seconds - 54.9% improvement)</h2>

<p>Let’s use <a href="https://github.com/brendangregg/FlameGraph">cargo flamegraph</a> to see where the current solution spends its time and decide what to optimize first. Since I am using Fedora, it uses <a href="https://perf.wiki.kernel.org/index.php/Main_Page">perf</a> under the hood:</p>

<p><code class="language-plaintext highlighter-rouge">cargo flamegraph -b one-billion-row</code></p>

<p>We get the following flame graph:</p>

<p><a href="../assets/images/1brc/flamegraph1.png"><img src="../assets/images/1brc/flamegraph1.png" alt="Flamegraph of the first multithreading solution" /></a></p>

<p>If we zoom in on the right side of the image, we will see that almost 10% of the samples are for parsing the temperature into an <code class="language-plaintext highlighter-rouge">f32</code> number:</p>

<p><a href="../assets/images/1brc/flamegraph2.png"><img src="../assets/images/1brc/flamegraph2.png" alt="Zoomed in flamegraph of the first multithreading solution" /></a></p>

<p>The image corresponds to this part of the code:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">temperature</span><span class="p">:</span> <span class="nb">f32</span> <span class="o">=</span> <span class="n">temperature</span><span class="nf">.parse</span><span class="p">()</span><span class="nf">.expect</span><span class="p">(</span><span class="s">"Incorrect temperature"</span><span class="p">);</span>
</code></pre></div></div>

<p>We know that our test file contains all temperatures in one of the following two formats:</p>
<ol>
  <li><code class="language-plaintext highlighter-rouge">ab.c</code> (e.g., 12.5)</li>
  <li><code class="language-plaintext highlighter-rouge">b.c</code>  (e.g., 5.4)</li>
</ol>

<p>Negative numbers are prefixed with a <code class="language-plaintext highlighter-rouge">-</code>.</p>

<p>Knowing this, we update our code to read each line of the file chunk as bytes instead of strings, and write a function that manually parses the bytes corresponding to the temperature into a fixed-precision <code class="language-plaintext highlighter-rouge">i32</code> signed integer. This should be faster than converting the file bytes to a string and then parsing that string into an <code class="language-plaintext highlighter-rouge">f32</code>.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Assumes the temperature always has 1-2 integer digits and exactly 1 decimal digit</span>
<span class="k">fn</span> <span class="nf">parse_temperature</span><span class="p">(</span><span class="k">mut</span> <span class="n">s</span><span class="p">:</span> <span class="o">&amp;</span><span class="p">[</span><span class="nb">u8</span><span class="p">])</span> <span class="k">-&gt;</span> <span class="n">V</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">neg</span> <span class="o">=</span> <span class="k">if</span> <span class="n">s</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="sc">b'-'</span> <span class="p">{</span>
        <span class="n">s</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">s</span><span class="p">[</span><span class="mi">1</span><span class="o">..</span><span class="p">];</span>
        <span class="k">true</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
        <span class="k">false</span>
    <span class="p">};</span>

    <span class="k">let</span> <span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">)</span> <span class="o">=</span> <span class="k">match</span> <span class="n">s</span> <span class="p">{</span>
        <span class="p">[</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="sc">b'.'</span><span class="p">,</span> <span class="n">c</span><span class="p">]</span> <span class="k">=&gt;</span> <span class="p">(</span><span class="n">a</span> <span class="o">-</span> <span class="sc">b'0'</span><span class="p">,</span> <span class="n">b</span> <span class="o">-</span> <span class="sc">b'0'</span><span class="p">,</span> <span class="n">c</span> <span class="o">-</span> <span class="sc">b'0'</span><span class="p">),</span>
        <span class="p">[</span><span class="n">b</span><span class="p">,</span> <span class="sc">b'.'</span><span class="p">,</span> <span class="n">c</span><span class="p">]</span> <span class="k">=&gt;</span> <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">b</span> <span class="o">-</span> <span class="sc">b'0'</span><span class="p">,</span> <span class="n">c</span> <span class="o">-</span> <span class="sc">b'0'</span><span class="p">),</span>
        <span class="n">_</span> <span class="k">=&gt;</span> <span class="nd">panic!</span><span class="p">(</span><span class="s">"Unknown pattern {:?}"</span><span class="p">,</span> <span class="nn">std</span><span class="p">::</span><span class="nn">str</span><span class="p">::</span><span class="nf">from_utf8</span><span class="p">(</span><span class="n">s</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">()),</span>
    <span class="p">};</span>

    <span class="k">let</span> <span class="n">v</span> <span class="o">=</span> <span class="p">(</span><span class="n">a</span> <span class="k">as</span> <span class="n">V</span><span class="p">)</span> <span class="o">*</span> <span class="mi">100</span> <span class="o">+</span> <span class="p">(</span><span class="n">b</span> <span class="k">as</span> <span class="n">V</span><span class="p">)</span> <span class="o">*</span> <span class="mi">10</span> <span class="o">+</span> <span class="p">(</span><span class="n">c</span> <span class="k">as</span> <span class="n">V</span><span class="p">);</span>

    <span class="k">if</span> <span class="n">neg</span> <span class="p">{</span>
        <span class="o">-</span><span class="n">v</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
        <span class="n">v</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This is how the function works:</p>
<ol>
  <li>It first checks if the first byte (character) in the input slice s is a minus sign (-). If so, it removes it.</li>
  <li>It uses pattern matching to handle the two formats of the temperature:
    <ul>
      <li><code class="language-plaintext highlighter-rouge">[a, b, b'.', c]</code>: This pattern matches when the input has two digits before the decimal point and one digit after it. For example, 23.4.</li>
      <li><code class="language-plaintext highlighter-rouge">[b, b'.', c]</code>: This pattern matches when the input has one digit before the decimal point and one digit after it. For example, 3.4.</li>
    </ul>
  </li>
  <li>For both patterns it extracts the numeric values of the digits by subtracting the ASCII value of 0 from each byte. This converts the ASCII byte representation of a digit to its actual integer value.</li>
  <li>It calculates the temperature by combining the digits:
    <ul>
      <li><code class="language-plaintext highlighter-rouge">a</code> is multiplied by 100.</li>
      <li><code class="language-plaintext highlighter-rouge">b</code> is multiplied by 10.</li>
      <li><code class="language-plaintext highlighter-rouge">c</code> is used as is.</li>
    </ul>
  </li>
  <li>The sum of these products gives the temperature in tenths of a degree (for example, 23.4 becomes 234), so we need to divide it by 10 at the printing step.</li>
  <li>If the number is negative it returns <code class="language-plaintext highlighter-rouge">-v</code>.</li>
</ol>

<blockquote>
  <p>Note that if we want to support other formats, we need to update the function by adding a new branch to the <code class="language-plaintext highlighter-rouge">match</code> statement.</p>
</blockquote>
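<p>To make the fixed-precision representation concrete, here is a self-contained sketch of the parser together with a printing helper. It assumes <code class="language-plaintext highlighter-rouge">V</code> is an alias for <code class="language-plaintext highlighter-rouge">i32</code>, as the text above suggests. The <code class="language-plaintext highlighter-rouge">format_tenths</code> helper is a hypothetical name introduced for illustration; it is not part of the original solution.</p>

```rust
// Assumption: V is the fixed-precision integer type described in the post.
type V = i32;

// Same parsing logic as above: returns the temperature in tenths of a degree.
fn parse_temperature(mut s: &[u8]) -> V {
    let neg = if s[0] == b'-' {
        s = &s[1..];
        true
    } else {
        false
    };

    let (a, b, c) = match s {
        [a, b, b'.', c] => (a - b'0', b - b'0', c - b'0'),
        [b, b'.', c] => (0, b - b'0', c - b'0'),
        _ => panic!("Unknown pattern {:?}", std::str::from_utf8(s).unwrap()),
    };

    let v = (a as V) * 100 + (b as V) * 10 + (c as V);
    if neg { -v } else { v }
}

// Hypothetical helper: turns tenths of a degree back into text without using floats.
fn format_tenths(v: V) -> String {
    let sign = if v < 0 { "-" } else { "" };
    let abs = v.abs();
    format!("{}{}.{}", sign, abs / 10, abs % 10)
}
```

<p>With this sketch, <code class="language-plaintext highlighter-rouge">parse_temperature(b"12.5")</code> yields <code class="language-plaintext highlighter-rouge">125</code> and <code class="language-plaintext highlighter-rouge">format_tenths(-54)</code> yields <code class="language-plaintext highlighter-rouge">"-5.4"</code>. Handling the sign separately keeps values between -1.0 and 0.0 (such as -0.3) from losing their minus sign when printed.</p>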

<p>After this change, we have improved the execution time from 17.96 seconds to 8.1 seconds. We are getting closer!</p>

<p><strong><a href="https://github.com/RPallas92/one-billion-row/commit/f67e2d752bde29c51427cc24c0e817a00efe277a">Link to commit</a></strong></p>

<h2 id="custom-key-parsing-676-seconds---165-improvement">Custom key parsing (6.76 seconds - 16.5% improvement)</h2>

<p>Let’s now generate another flame graph of the current solution to see what our next improvement can be:</p>

<p><a href="../assets/images/1brc/flamegraph3.png"><img src="../assets/images/1brc/flamegraph3.png" alt="Flamegraph of the second multithreading solution" /></a></p>

<p>If we zoom in on the right side of the image, we will see that more than 10% of the samples are for parsing the city from a bytes slice to a string:</p>

<p><a href="../assets/images/1brc/flamegraph4.png"><img src="../assets/images/1brc/flamegraph4.png" alt="Zoomed in flamegraph of the second multithreading solution" /></a></p>

<p>Let’s do the same thing we did for parsing the temperature. We are going to write a custom parser to make the program faster.</p>

<p>Our current <code class="language-plaintext highlighter-rouge">StationsMap</code> maps from city names (String) to <code class="language-plaintext highlighter-rouge">StationMetrics</code>. What if we change it to <code class="language-plaintext highlighter-rouge">BTreeMap&lt;u64, StationMetrics&gt;</code>? We can write a fast function that parses from a bytes slice to a u64, and u64 (8 bytes) should be enough to identify a city (e.g. the first 8 characters of its name).</p>

<p>Having a <code class="language-plaintext highlighter-rouge">u64</code> instead of a string as key also comes with the advantage of having the hash map keys inlined.</p>

<h3 id="how-hash-maps-work">How hash maps work</h3>

<p>When you use <code class="language-plaintext highlighter-rouge">String</code> as keys in a HashMap, each key is a heap-allocated, dynamically sized string. This can have implications for performance, especially in terms of hashing and memory usage:</p>

<p><a href="../assets/images/1brc/hashmap1.png"><img src="../assets/images/1brc/hashmap1.png" alt="Hash map 1" /></a></p>

<p>When you use <code class="language-plaintext highlighter-rouge">u64</code> as keys in a HashMap, each key is a fixed-size integer stored inline in the map’s table. This is typically more efficient in terms of both hashing and memory usage.</p>

<p><a href="../assets/images/1brc/hashmap2.png"><img src="../assets/images/1brc/hashmap2.png" alt="Hash map 2" /></a></p>

<p>The difference is that a <code class="language-plaintext highlighter-rouge">u64</code> key has a fixed size and is stored inline in the hash table itself, so reaching it requires no extra pointer dereference and inserting it requires no separate heap allocation. A <code class="language-plaintext highlighter-rouge">String</code> key, in contrast, keeps its bytes in its own heap allocation. That is why by using <code class="language-plaintext highlighter-rouge">u64</code> keys, we can achieve better performance for lookups, insertions, and deletions in the hash map.</p>

<p>See how a value is retrieved in both cases.</p>

<p>HashMap with <code class="language-plaintext highlighter-rouge">String</code> keys:</p>

<p><a href="../assets/images/1brc/hashmap3.png"><img src="../assets/images/1brc/hashmap3.png" alt="Hash map with string keys" /></a></p>

<p>HashMap with <code class="language-plaintext highlighter-rouge">u64</code> keys (inlined):</p>

<p><a href="../assets/images/1brc/hashmap4.png"><img src="../assets/images/1brc/hashmap4.png" alt="Hash map with u64 keys" /></a></p>

<p>As we can see in the diagrams, the differences are that:</p>
<ol>
  <li><code class="language-plaintext highlighter-rouge">u64</code> keys are stored inline in the table, so they don’t need the additional step of heap-allocating the key or following a pointer to compare it.</li>
  <li>Hashing a fixed-size integer is faster.</li>
</ol>

<h3 id="code-changes">Code changes</h3>

<p>Let’s then change <code class="language-plaintext highlighter-rouge">StationsMap</code> to <code class="language-plaintext highlighter-rouge">BTreeMap&lt;u64, StationMetrics&gt;</code> and write a function to parse the city as a <code class="language-plaintext highlighter-rouge">u64</code>. The inconvenience is that we will need to add a string field (city) to <code class="language-plaintext highlighter-rouge">StationMetrics</code> to store the actual name of the city. Note that this string is computed only once per city, not once for each of the one billion rows.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">to_key</span><span class="p">(</span><span class="n">data</span><span class="p">:</span> <span class="o">&amp;</span><span class="p">[</span><span class="nb">u8</span><span class="p">])</span> <span class="k">-&gt;</span> <span class="nb">u64</span> <span class="p">{</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">hash</span> <span class="o">=</span> <span class="mi">0u64</span><span class="p">;</span>
    <span class="k">let</span> <span class="n">len</span> <span class="o">=</span> <span class="n">data</span><span class="nf">.len</span><span class="p">();</span>
    <span class="k">unsafe</span> <span class="p">{</span>
        <span class="k">if</span> <span class="n">len</span> <span class="o">&gt;=</span> <span class="mi">8</span> <span class="p">{</span>
            <span class="c1">// The slice may not be 8-byte aligned, so use an unaligned read.</span>
            <span class="n">hash</span> <span class="o">=</span> <span class="p">(</span><span class="n">data</span><span class="nf">.as_ptr</span><span class="p">()</span> <span class="k">as</span> <span class="o">*</span><span class="k">const</span> <span class="nb">u64</span><span class="p">)</span><span class="nf">.read_unaligned</span><span class="p">();</span>
        <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
            <span class="k">for</span> <span class="n">i</span> <span class="k">in</span> <span class="mi">0</span><span class="o">..</span><span class="n">len</span> <span class="p">{</span>
                <span class="n">hash</span> <span class="p">|</span><span class="o">=</span> <span class="p">(</span><span class="o">*</span><span class="n">data</span><span class="nf">.get_unchecked</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="k">as</span> <span class="nb">u64</span><span class="p">)</span> <span class="o">&lt;&lt;</span> <span class="p">(</span><span class="n">i</span> <span class="o">*</span> <span class="mi">8</span><span class="p">);</span>
            <span class="p">}</span>
        <span class="p">}</span>
    <span class="p">}</span>

    <span class="n">hash</span> <span class="o">^=</span> <span class="n">len</span> <span class="k">as</span> <span class="nb">u64</span><span class="p">;</span>
    <span class="n">hash</span>
<span class="p">}</span>

</code></pre></div></div>

<p>As we already said, the <code class="language-plaintext highlighter-rouge">to_key</code> function converts a slice of bytes into a <code class="language-plaintext highlighter-rouge">u64</code> integer that is used to identify cities in our <code class="language-plaintext highlighter-rouge">StationsMap</code>. This is how it works:</p>

<ol>
  <li>It starts by creating a variable called <code class="language-plaintext highlighter-rouge">hash</code> and sets it to 0.</li>
  <li>It then gets the length of the input data (number of bytes).</li>
  <li>If the data length is 8 or more bytes:
    <ul>
      <li>It directly reads the first 8 bytes and interprets them as a <code class="language-plaintext highlighter-rouge">u64</code> integer. This is a fast way to create a key.</li>
    </ul>
  </li>
  <li>If the data length is less than 8 bytes:
    <ul>
      <li>It processes each byte individually in a loop.</li>
      <li>For each byte, it shifts the byte’s value by its position (multiplied by 8) and combines it with the hash using the bitwise OR operator.</li>
    </ul>
  </li>
  <li>Finally, it XORs the hash with the length of the data. This ensures that cities that start with the same 8 bytes get different keys, as long as their lengths differ.</li>
</ol>
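<p>The steps above can also be sketched without <code class="language-plaintext highlighter-rouge">unsafe</code>. The version below is an illustrative alternative, not the code from the commit: <code class="language-plaintext highlighter-rouge">to_key_safe</code> is a hypothetical name, and it assumes little-endian byte order, which matches the pointer read on x86-64 and most ARM machines.</p>

```rust
// Hypothetical safe variant of to_key: copies up to 8 bytes into a zeroed
// buffer, reads it as a little-endian u64, then mixes in the length.
fn to_key_safe(data: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    let n = data.len().min(8);
    buf[..n].copy_from_slice(&data[..n]);
    u64::from_le_bytes(buf) ^ (data.len() as u64)
}
```

<p>This also makes the limitation of the scheme easy to see. <code class="language-plaintext highlighter-rouge">San Fran</code> and <code class="language-plaintext highlighter-rouge">San Francisco</code> share their first 8 bytes but differ in length, so they get different keys. Two names that share both the first 8 bytes and the length, such as <code class="language-plaintext highlighter-rouge">Santiago de Cuba</code> and <code class="language-plaintext highlighter-rouge">Santiago de Cali</code>, would produce the same key, so the approach relies on the input file not containing such pairs.</p>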

<p>Running this solution now improves the execution time from 8.1 seconds to 6.76 seconds. We still have some work to do!</p>

<p><strong><a href="https://github.com/RPallas92/one-billion-row/commit/f5d8ec3239297097359e3f01ba0f96572e91771b">Link to commit</a></strong></p>

<h2 id="custom-hash-function-585-seconds---135-improvement">Custom hash function (5.85 seconds - 13.5% improvement)</h2>

<p>In the previous section, we explained how hash maps work. In the diagrams, we saw that retrieving a value for a given key from the hash map requires first hashing the key, then looking up that hash in the hash table, and finally retrieving the data for that hash.</p>

<p>A question that naturally arises now is: why do we need to hash our keys if we are already using a <code class="language-plaintext highlighter-rouge">u64</code> key that identifies the city? Why not use that <code class="language-plaintext highlighter-rouge">u64</code> key directly as the hash so we avoid that extra step?</p>

<p>Let’s use a custom hasher that just returns the <code class="language-plaintext highlighter-rouge">u64</code> without applying any hash function.</p>

<p>One inconvenience is that <code class="language-plaintext highlighter-rouge">BTreeMap</code> does not support custom hashers, so we will have to use a <code class="language-plaintext highlighter-rouge">HashMap</code> instead. This means we will need to sort the values before printing them.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[derive(Default)]</span>
<span class="k">struct</span> <span class="n">NoOpHasher</span> <span class="p">{</span>
    <span class="n">hash</span><span class="p">:</span> <span class="nb">u64</span><span class="p">,</span>
<span class="p">}</span>

<span class="k">impl</span> <span class="n">Hasher</span> <span class="k">for</span> <span class="n">NoOpHasher</span> <span class="p">{</span>
    <span class="k">fn</span> <span class="nf">write</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="p">,</span> <span class="n">_bytes</span><span class="p">:</span> <span class="o">&amp;</span><span class="p">[</span><span class="nb">u8</span><span class="p">])</span> <span class="p">{</span>
        <span class="nd">panic!</span><span class="p">(</span><span class="s">"NoOpHasher only supports u64 values"</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="k">fn</span> <span class="nf">write_u64</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="p">,</span> <span class="n">i</span><span class="p">:</span> <span class="nb">u64</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">self</span><span class="py">.hash</span> <span class="o">=</span> <span class="n">i</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="k">fn</span> <span class="nf">finish</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">u64</span> <span class="p">{</span>
        <span class="k">self</span><span class="py">.hash</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="k">struct</span> <span class="n">NoOpBuildHasher</span><span class="p">;</span>

<span class="k">impl</span> <span class="n">BuildHasher</span> <span class="k">for</span> <span class="n">NoOpBuildHasher</span> <span class="p">{</span>
    <span class="k">type</span> <span class="n">Hasher</span> <span class="o">=</span> <span class="n">NoOpHasher</span><span class="p">;</span>

    <span class="k">fn</span> <span class="nf">build_hasher</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="n">NoOpHasher</span> <span class="p">{</span>
        <span class="nn">NoOpHasher</span><span class="p">::</span><span class="nf">default</span><span class="p">()</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="k">type</span> <span class="n">StationsMap</span> <span class="o">=</span> <span class="n">HashMap</span><span class="o">&lt;</span><span class="nb">u64</span><span class="p">,</span> <span class="n">StationMetrics</span><span class="p">,</span> <span class="n">NoOpBuildHasher</span><span class="o">&gt;</span><span class="p">;</span>
</code></pre></div></div>

<p>There is not much to say about this piece of code. It does what we already mentioned: it uses the <code class="language-plaintext highlighter-rouge">u64</code> value as the hash without applying any hash function.</p>
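<p>To see how the pieces fit together, here is a minimal, self-contained sketch that builds the map with the no-op hasher and sorts the city names for output. The simplified <code class="language-plaintext highlighter-rouge">StationMetrics</code> struct and the <code class="language-plaintext highlighter-rouge">record</code> and <code class="language-plaintext highlighter-rouge">sorted_cities</code> helpers are hypothetical names used for illustration; they are not the code from the repository.</p>

```rust
use std::collections::HashMap;
use std::hash::{BuildHasher, Hasher};

/// Hasher that returns the `u64` key unchanged (same idea as in the post).
#[derive(Default)]
struct NoOpHasher {
    hash: u64,
}

impl Hasher for NoOpHasher {
    fn write(&mut self, _bytes: &[u8]) {
        panic!("NoOpHasher only supports u64 values");
    }

    fn write_u64(&mut self, i: u64) {
        self.hash = i;
    }

    fn finish(&self) -> u64 {
        self.hash
    }
}

struct NoOpBuildHasher;

impl BuildHasher for NoOpBuildHasher {
    type Hasher = NoOpHasher;

    fn build_hasher(&self) -> NoOpHasher {
        NoOpHasher::default()
    }
}

/// Hypothetical, simplified stand-in for the post's `StationMetrics`.
struct StationMetrics {
    city: String,
    count: u32,
}

type StationsMap = HashMap<u64, StationMetrics, NoOpBuildHasher>;

/// Records one measurement for `city` under its precomputed `u64` key.
fn record(map: &mut StationsMap, key: u64, city: &str) {
    map.entry(key)
        .or_insert_with(|| StationMetrics { city: city.to_string(), count: 0 })
        .count += 1;
}

/// Returns the city names in alphabetical order, as needed for printing,
/// because a `HashMap` iterates in arbitrary order.
fn sorted_cities(map: &StationsMap) -> Vec<&str> {
    let mut cities: Vec<&str> = map.values().map(|m| m.city.as_str()).collect();
    cities.sort_unstable();
    cities
}
```

<p>The map is created with <code class="language-plaintext highlighter-rouge">StationsMap::with_hasher(NoOpBuildHasher)</code>. The standard library's <code class="language-plaintext highlighter-rouge">Hash</code> implementation for <code class="language-plaintext highlighter-rouge">u64</code> calls <code class="language-plaintext highlighter-rouge">write_u64</code>, so the panicking <code class="language-plaintext highlighter-rouge">write</code> method is never reached for this key type.</p>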

<p>After applying these changes and executing the code, we see an improvement from 6.76 seconds to 5.85 seconds. This is almost the same as the official Rust solution (5.7 seconds)!</p>

<p><strong><a href="https://github.com/RPallas92/one-billion-row/commit/f424a88727b284dcdacbb4977a381e894e42b53f">Link to commit</a></strong></p>

<h2 id="unsafe-string-parsing-516-seconds---118-improvement">Unsafe string parsing (5.16 seconds - 11.8% improvement)</h2>

<p>Our current solution is already optimized and performs similarly to the official Rust solution. But can we make one final optimization? Let’s see what the flamegraph has to say:</p>

<p><a href="../assets/images/1brc/flamegraph5.png"><img src="../assets/images/1brc/flamegraph5.png" alt="Flamegraph of the custom hash solution" /></a></p>

<p>If we zoom in, we see that 11% of the samples are calls to <code class="language-plaintext highlighter-rouge">core::str::converts::from_utf8</code>:</p>

<p><a href="../assets/images/1brc/flamegraph6.png"><img src="../assets/images/1brc/flamegraph6.png" alt="Zoomed in flamegraph of the custom hash solution" /></a></p>

<p>How is this possible if we are parsing the city names as <code class="language-plaintext highlighter-rouge">u64</code>? It is because we still parse the names as strings to store them in the <code class="language-plaintext highlighter-rouge">StationMetrics</code> struct for printing at the end. Even though we only parse them once per city and per thread, this represents 11% of the samples.</p>

<p>How can we improve this? We know that we are calling <code class="language-plaintext highlighter-rouge">std::str::from_utf8(city).unwrap().to_string()</code>, which is safe and checks whether the byte slice is valid UTF-8. Since we know that our file contains valid UTF-8 strings, we can replace it with:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">city</span><span class="p">:</span> <span class="k">unsafe</span> <span class="p">{</span> <span class="nn">std</span><span class="p">::</span><span class="nn">str</span><span class="p">::</span><span class="nf">from_utf8_unchecked</span><span class="p">(</span><span class="n">city</span><span class="p">)</span><span class="nf">.to_string</span><span class="p">()</span> <span class="p">},</span>

</code></pre></div></div>

<p>This way, we skip the UTF-8 validation. Note that it should only be used when you are certain that the byte slice is valid UTF-8; passing invalid bytes violates the function’s safety contract.</p>
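<p>One way to keep the unchecked conversion contained is to wrap it in a small <code class="language-plaintext highlighter-rouge">unsafe fn</code>, as in the sketch below. The function name <code class="language-plaintext highlighter-rouge">city_to_string_unchecked</code> and the <code class="language-plaintext highlighter-rouge">debug_assert!</code> are my own additions for illustration and are not part of the original commit; the assertion validates the bytes in debug builds only, so release builds keep the speed-up.</p>

```rust
/// Converts a city-name byte slice into an owned `String` without
/// validating that the bytes are UTF-8.
///
/// # Safety
/// The caller must guarantee that `city` contains valid UTF-8.
unsafe fn city_to_string_unchecked(city: &[u8]) -> String {
    // Extra safeguard (an assumption of this sketch): validate in debug
    // builds only; this check is compiled out in release builds.
    debug_assert!(std::str::from_utf8(city).is_ok());

    // SAFETY: upheld by the caller, see the function's safety section.
    unsafe { std::str::from_utf8_unchecked(city).to_string() }
}
```

<p>Marking the function itself as <code class="language-plaintext highlighter-rouge">unsafe</code> makes every call site state explicitly that it trusts the input file.</p>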

<p>Now the program takes 5.16 seconds. This time, it is faster than the official Rust solution!</p>

<p><strong><a href="https://github.com/RPallas92/one-billion-row/commit/b7c799cad367173c901b4e253ec4ccb41eac1fdc">Link to commit</a></strong></p>

<h3 id="edit-1-custom-line-splitting-482-seconds---659-improvement">Edit 1: Custom line splitting (4.82 seconds - 6.59% improvement)</h3>

<p>After reading this post, my friend <a href="https://kkyr.io/">Kyriacos</a> suggested an optimization to me. He said:</p>

<blockquote>
  <p>Why are you using <code class="language-plaintext highlighter-rouge">line.split_once(|&amp;c| c == b';')</code> to split each line if you know that the separator is almost always at the same position?</p>
</blockquote>

<p>He was right. I was using the <code class="language-plaintext highlighter-rouge">split_once</code> function, which performs a linear search over the line. This means it scans through the byte slice from the beginning to the end until it finds the separator character. This can clearly be optimized given the file format.</p>

<p>We know that the separator can be at one of these three positions:</p>
<ol>
  <li><code class="language-plaintext highlighter-rouge">line_length - 4</code>: for lines like “city_name;2.3”</li>
  <li><code class="language-plaintext highlighter-rouge">line_length - 5</code>: for lines like “city_name;12.3” or “city_name;-2.3”</li>
  <li><code class="language-plaintext highlighter-rouge">line_length - 6</code>: for lines like “city_name;-12.3”</li>
</ol>

<p>Instead of scanning the whole line (linear time complexity), we only need to check these three positions (constant time complexity). Let’s update the code:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">line_length</span> <span class="o">=</span> <span class="n">line</span><span class="nf">.len</span><span class="p">();</span>
<span class="k">let</span> <span class="n">separator_pos</span> <span class="o">=</span> <span class="k">if</span> <span class="n">line</span><span class="p">[</span><span class="n">line_length</span> <span class="o">-</span> <span class="mi">4</span><span class="p">]</span> <span class="o">==</span> <span class="sc">b';'</span> <span class="p">{</span>
    <span class="n">line_length</span> <span class="o">-</span> <span class="mi">4</span>
<span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="n">line</span><span class="p">[</span><span class="n">line_length</span> <span class="o">-</span> <span class="mi">5</span><span class="p">]</span> <span class="o">==</span> <span class="sc">b';'</span> <span class="p">{</span>
    <span class="n">line_length</span> <span class="o">-</span> <span class="mi">5</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
    <span class="n">line_length</span> <span class="o">-</span> <span class="mi">6</span>
<span class="p">};</span>

<span class="k">let</span> <span class="p">(</span><span class="n">city</span><span class="p">,</span> <span class="n">temperature</span><span class="p">)</span> <span class="o">=</span> <span class="n">line</span><span class="nf">.split_at</span><span class="p">(</span><span class="n">separator_pos</span><span class="p">);</span>
</code></pre></div></div>
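<p>For reference, here is a self-contained sketch of the same idea wrapped in a function. The name <code class="language-plaintext highlighter-rouge">split_line</code> is hypothetical. Note that <code class="language-plaintext highlighter-rouge">split_at</code> leaves the separator at the start of the second slice; this sketch drops it, on the assumption that the temperature parser expects only the number. It also assumes a well-formed line, since a shorter line would make the index arithmetic panic.</p>

```rust
/// Splits a `city;temperature` line by probing the only three positions
/// where the separator can be, instead of scanning the whole line.
///
/// Assumes a well-formed line whose value has an optional sign, one or two
/// integer digits, a dot, and one decimal digit (e.g. `2.3` or `-12.3`).
fn split_line(line: &[u8]) -> (&[u8], &[u8]) {
    let line_length = line.len();
    let separator_pos = if line[line_length - 4] == b';' {
        line_length - 4 // "city;2.3"
    } else if line[line_length - 5] == b';' {
        line_length - 5 // "city;12.3" or "city;-2.3"
    } else {
        line_length - 6 // "city;-12.3"
    };

    // `split_at` keeps the `;` as the first byte of `rest`, so skip it.
    let (city, rest) = line.split_at(separator_pos);
    (city, &rest[1..])
}
```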

<p>After executing the program again, the time decreased from 5.16 seconds to 4.82 seconds. Thanks, Kyriacos!</p>

<p><strong><a href="https://github.com/RPallas92/one-billion-row/commit/d361957a8f75d9f4d0db6dc56abe66b45c34fcfb">Link to commit</a></strong></p>

<h2 id="conclusion">Conclusion</h2>

<p>In this blog post, we explored various optimizations for the 1 Billion Row Challenge in Rust, aiming to improve performance without external dependencies. We started from a basic solution that took 90 seconds and implemented several enhancements:</p>

<ul>
  <li>Multithreading: Reduced execution time to 17.96 seconds.</li>
  <li>Custom Number Parsing: Improved performance to 8.1 seconds.</li>
  <li>Custom Key Parsing: Further optimized to 6.76 seconds.</li>
  <li>Custom Hash Function: Achieved a time of 5.85 seconds.</li>
  <li>Unsafe String Parsing: Reached 5.16 seconds, which is faster than the official Rust solution!</li>
  <li>Custom Line Splitting (Edit 1): Brought the final time down to 4.82 seconds.</li>
</ul>

<p>While our final solution is faster than the official Rust implementation, other approaches using libraries like memmap or hashmap might be even more efficient for real-world scenarios.</p>

<p>Keep in mind that these results are specific to my machine and test file, so performance may vary for others. I invite you to find more optimizations and share them in the comments!</p>

<p>Thanks for reading, and happy optimizing!</p>]]></content><author><name>Ricardo Pallas</name></author><category term="blog" /><category term="1brc" /><category term="algorithms" /><category term="data structures" /><category term="1 billion row challenge" /><category term="rust" /><category term="performance" /><category term="multithreading" /><category term="optimization" /><category term="bitwise" /><category term="hash maps" /><summary type="html"><![CDATA[Fast implementation of the 1BRC in Rust with 0 crates.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://rpallas92.github.io/assets/images/1brc/flamegraph1.jpg" /><media:content medium="image" url="https://rpallas92.github.io/assets/images/1brc/flamegraph1.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Gas stations finder in Golang</title><link href="https://rpallas92.github.io/gas-stations-finder-in-go/" rel="alternate" type="text/html" title="Gas stations finder in Golang" /><published>2023-11-30T08:00:00+00:00</published><updated>2023-11-30T08:00:00+00:00</updated><id>https://rpallas92.github.io/gas-stations-finder-in-go</id><content type="html" xml:base="https://rpallas92.github.io/gas-stations-finder-in-go/"><![CDATA[<p>Hi all,</p>

<p>I wanted to share a project I created a couple of years ago, during winter holidays, with the purpose of learning Go.</p>

<p>In December 2021, gas prices in Spain were higher than ever, and I drove a lot. That’s why I wrote the <a href="https://github.com/RPallas92/GasPrices">Gas Station Finder</a> in Go: I wanted to find all gas stations between two cities and sort them by price. This way, I could fill the tank at a cheaper gas station without deviating from the route.</p>

<p><img src="https://raw.githubusercontent.com/RPallas92/GasPrices/main/gas_prices_ui.png" alt="Sample UI for Gas Stations service" /></p>

<p>Let me break down how it all works.</p>

<p>The project relies on the OpenRouteService API to find the route between two coordinates. It also uses its geocoding service to convert city names into coordinates. This lets the app expose an API that, given two city names, returns both the route and all nearby gas stations with their prices.</p>

<p>The REST API is built with the Gin web framework. I liked it as it was straightforward and simple to use.</p>

<p>The app is split into different components: one for figuring out where you are, one for planning routes, another for getting directions, one for getting gas station prices, and another to retrieve gas stations nearby.</p>

<p>In order to improve performance, I did two things:</p>
<ul>
  <li>Use a KDTree to store gas stations. For a given route, the service queries the KDTree to find nearby gas stations quickly (O(log n) per lookup on average).</li>
  <li>Keep all prices in memory and refresh them in the background every 4 hours. Since the prices don’t change that often, caching them improves performance, because retrieving them from the Spanish government website takes quite a while.</li>
</ul>

<p>As an improvement, instead of keeping all the gas station prices in memory, we could use an embedded database like <a href="https://github.com/RPallas92/GrausDB">GrausDB</a>. It could also store routes between cities instead of calling OpenRouteService to calculate them each time. These routes will not change often, so we could refresh them every 2 weeks, for example.</p>

<p>I enjoyed coding it since it only took a few hours, and it proved to be a useful app for myself. Also, I learned a little bit of Go!</p>

<p>You can find the code <a href="https://github.com/RPallas92/GasPrices">on GitHub: GasPrices</a>.</p>

<p>Thanks for reading!</p>

<p>Ricardo.</p>]]></content><author><name>Ricardo Pallas</name></author><category term="blog" /><category term="go" /><category term="algorithms" /><category term="data structures" /><category term="faang" /><category term="golang" /><category term="kd-tree" /><category term="gas stations" /><summary type="html"><![CDATA[Gas Station Finder, crafted with Go, simplifies gas station searches with smart features and efficient architecture.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://raw.githubusercontent.com/RPallas92/GasPrices/main/gas_prices_ui.png" /><media:content medium="image" url="https://raw.githubusercontent.com/RPallas92/GasPrices/main/gas_prices_ui.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Essential algorithms for the coding interview</title><link href="https://rpallas92.github.io/essential-algorithms/" rel="alternate" type="text/html" title="Essential algorithms for the coding interview" /><published>2023-01-04T08:00:00+00:00</published><updated>2023-01-04T08:00:00+00:00</updated><id>https://rpallas92.github.io/essential-algorithms</id><content type="html" xml:base="https://rpallas92.github.io/essential-algorithms/"><![CDATA[<p>Hello everyone,</p>

<p>I am excited to announce the release of my first book, “Essential algorithms for the coding interview”, which has been self-published through Amazon KDP.</p>

<p><strong>Who is this book for?</strong></p>

<p>This book is designed to help software engineers and computer science students prepare for technical interviews by providing a comprehensive guide to essential algorithms and data structures.</p>

<p><strong>What is the structure of the book?</strong></p>

<p>This book is a concise, informative resource that contains 11 main chapters, each covering a different type of coding problem. For each problem, it provides detailed explanations of the algorithms and data structures needed to solve it, followed by a detailed solution with expositions, graphs, and step-by-step executions to help readers understand and apply the concepts.</p>

<p>Here is the table of contents:</p>

<ol>
  <li>Introduction.</li>
  <li>First Bad Version (Binary Search).</li>
  <li>Valid Parentheses (Stack).</li>
  <li>Valid Palindrome (2 pointers).</li>
  <li>House Robber (Recursion).</li>
  <li>Combination Sum (Backtracking).</li>
  <li>Lowest Common Ancestor (Binary Trees).</li>
  <li>Binary Tree Level Order Traversal (Tree and BFS).</li>
  <li>Course Schedule (Graphs).</li>
  <li>Minimum Window Substring (Sliding Window).</li>
  <li>Merge k Sorted Lists (Heaps).</li>
  <li>Sudoku Solver (Backtracking).</li>
  <li>Afterword.</li>
</ol>

<p><strong>Where to find the book?</strong></p>

<p>You can find it on all Amazon marketplaces like:</p>

<ul>
  <li>US: <a href="https://www.amazon.com/dp/B0BRJPFT54">https://www.amazon.com/dp/B0BRJPFT54</a>.</li>
  <li>ES: <a href="https://www.amazon.es/dp/B0BRJPFT54">https://www.amazon.es/dp/B0BRJPFT54</a>.</li>
  <li>DE: <a href="https://www.amazon.de/dp/B0BRJPFT54">https://www.amazon.de/dp/B0BRJPFT54</a>.</li>
  <li>UK: <a href="https://www.amazon.co.uk/dp/B0BRJPFT54">https://www.amazon.co.uk/dp/B0BRJPFT54</a>.</li>
  <li>FR: <a href="https://www.amazon.fr/dp/B0BRJPFT54">https://www.amazon.fr/dp/B0BRJPFT54</a>.</li>
  <li>IT: <a href="https://www.amazon.it/dp/B0BRJPFT54">https://www.amazon.it/dp/B0BRJPFT54</a>.</li>
  <li>JP: <a href="https://www.amazon.co.jp/dp/B0BRJPFT54">https://www.amazon.co.jp/dp/B0BRJPFT54</a>.</li>
  <li>CA: <a href="https://www.amazon.ca/dp/B0BRJPFT54">https://www.amazon.ca/dp/B0BRJPFT54</a>.</li>
</ul>

<p>Thank you for your interest, and I hope you find this book helpful!</p>

<p>Ricardo.</p>]]></content><author><name>Ricardo Pallas</name></author><category term="blog" /><category term="coding interview" /><category term="algorithms" /><category term="data structures" /><category term="faang" /><category term="backtracking" /><category term="coding problems" /><category term="binary trees" /><category term="recursion" /><category term="graphs" /><category term="heaps" /><category term="arrays" /><category term="hash maps" /><category term="sliding windows" /><summary type="html"><![CDATA[Paradigmatic coding problems and templates for preparing the coding interview]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://rpallas92.github.io/assets/images/markdown.jpg" /><media:content medium="image" url="https://rpallas92.github.io/assets/images/markdown.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Promised Architecture Kit V2</title><link href="https://rpallas92.github.io/promised-architecture-kit/" rel="alternate" type="text/html" title="Promised Architecture Kit V2" /><published>2018-12-26T14:00:00+00:00</published><updated>2018-12-26T14:00:00+00:00</updated><id>https://rpallas92.github.io/promised-architecture-kit</id><content type="html" xml:base="https://rpallas92.github.io/promised-architecture-kit/"><![CDATA[<h1 id="promisedarchitecturekit-v2">PromisedArchitectureKit V2</h1>

<p>The simplest architecture for <a href="https://github.com/mxcl/PromiseKit">PromiseKit</a>, now V2, even simpler and easier to reason about.</p>

<p>I have published a new version of <a href="https://github.com/RPallas92/PromisedArchitectureKit">PromisedArchitectureKit</a>, fully redesigned and simplified.</p>

<h3 id="v2-goal">V2 Goal</h3>

<blockquote>
  <p>PromisedArchitectureKit V2 has been designed to impose constraints that enforce correctness and simplicity.</p>
</blockquote>

<h2 id="introduction">Introduction</h2>

<p>PromisedArchitectureKit is a library that tries to enforce correctness and simplify the state management of applications and systems. It helps you write applications that behave consistently, and are easy to test. It’s inspired by Redux and RxFeedback.</p>

<h2 id="motivation">Motivation</h2>

<p>I have been looking for an architecture that simplifies managing the state of mobile applications and is also easy to test.</p>

<p>I started with <strong>Model-View-Controller (MVC)</strong>, then <strong>Model-View-ViewModel (MVVM)</strong> and also Model-View-Presenter (MVP) along with Clean architecture. MVC is not as easy to test as MVVM and MVP. MVVM and MVP are easy to test, but the UI state can become a mess: there is no centralized way to update it, and many methods scattered across the code can change the state.</p>

<p>Then came <strong>Elm</strong>, <strong>Redux</strong>, and other Redux-like architectures such as Redux-Observable, RxFeedback, Cycle.js, ReSwift, etc. The main difference between these architectures (including PromisedArchitectureKit) and MVP is that they introduce constraints on how the UI state can be updated, in order to enforce correctness and make apps easier to reason about.</p>

<p>What makes PromisedArchitectureKit different from these Redux-like architectures is that it uses
async reducers (using PromiseKit) to wrap the effects; it then runs the side effects for you and calls the UI with the result.</p>

<p><strong>PromisedArchitectureKit runs side effects for you. Your code stays 100% pure.</strong></p>

<h2 id="quick-start">Quick start</h2>

<h3 id="installation">Installation</h3>

<p>PromisedArchitectureKit is available through <a href="https://cocoapods.org">CocoaPods</a>. To install
it, simply add the following line to your Podfile:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">pod</span> <span class="s1">'PromisedArchitectureKit'</span>
</code></pre></div></div>

<h2 id="promisedarchitecturekit">PromisedArchitectureKit</h2>
<p>PromisedArchitectureKit itself is very simple. Here is how it looks:</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">self</span><span class="o">.</span><span class="n">system</span> <span class="o">=</span> <span class="kt">System</span><span class="o">.</span><span class="nf">pure</span><span class="p">(</span>
	<span class="nv">initialState</span><span class="p">:</span> <span class="kt">State</span><span class="o">.</span><span class="n">start</span><span class="p">,</span>
   	<span class="nv">reducer</span><span class="p">:</span> <span class="kt">State</span><span class="o">.</span><span class="n">reduce</span><span class="p">,</span>
   	<span class="nv">uiBindings</span><span class="p">:</span> <span class="p">[</span><span class="n">view</span><span class="p">?</span><span class="o">.</span><span class="n">updateUI</span><span class="p">]</span>
<span class="p">)</span>
</code></pre></div></div>

<h3 id="the-core-concept">The core concept</h3>
<p>Each screen of your app (and the app as a whole) has its own state. In PromisedArchitectureKit, this state is represented as an enum. For example, the state of an e-commerce app’s <strong>Product detail page (PDP)</strong> might look like this:</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">enum</span> <span class="kt">State</span> <span class="p">{</span>
    <span class="k">case</span> <span class="n">start</span>
    <span class="k">case</span> <span class="n">loading</span>
    <span class="k">case</span> <span class="nf">productLoaded</span><span class="p">(</span><span class="kt">Product</span><span class="p">)</span>
    <span class="k">case</span> <span class="nf">addedToCart</span><span class="p">(</span><span class="kt">Product</span><span class="p">,</span> <span class="kt">CartResponse</span><span class="p">)</span>
    <span class="k">case</span> <span class="nf">error</span><span class="p">(</span><span class="kt">Error</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>
<p>In this screen, the app loads the product, then it can show the product or an error. After the product is loaded, the user can add it to the basket.</p>

<p>This State enum represents the state of the <strong>“PDP screen”</strong> in the e-commerce app.
Because the enum fully represents the state of a screen, views are a direct mapping of state:</p>

<p><code class="language-plaintext highlighter-rouge">view = f(state)</code>.</p>

<p>That “f” function will be the UI binding function that we will see later on.</p>

<p><strong>To change something in the state, you need to dispatch an Event.</strong> An event is an enum that describes what happened. Here are a few example events:</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">enum</span> <span class="kt">Event</span> <span class="p">{</span>
    <span class="k">case</span> <span class="n">loadProduct</span>
    <span class="k">case</span> <span class="n">addToCart</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Enforcing that every change is described as an event lets us have a clear understanding of what’s going on in the app. If something changed, we know why it changed.</p>

<p>Events are like breadcrumbs of what has happened. Finally, to tie state and events together, we write a function called a <strong>reducer</strong>. A reducer is just a function that <strong>takes a state and an event as arguments, and returns the next state of the app (asynchronously)</strong>:</p>

<p><code class="language-plaintext highlighter-rouge">(State, Event) -&gt; AsyncResult&lt;State&gt;</code></p>

<p>AsyncResult is just a wrapper of Promise.</p>

<p>We write a reducer function for every state of every screen. For the PDP screen:</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="kd">static</span> <span class="kd">func</span> <span class="nf">reduce</span><span class="p">(</span><span class="nv">state</span><span class="p">:</span> <span class="kt">State</span><span class="p">,</span> <span class="nv">event</span><span class="p">:</span> <span class="kt">Event</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="kt">AsyncResult</span><span class="o">&lt;</span><span class="kt">State</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="k">switch</span> <span class="n">event</span> <span class="p">{</span>

        <span class="k">case</span> <span class="o">.</span><span class="nv">loadProduct</span><span class="p">:</span>
            <span class="k">let</span> <span class="nv">productResult</span> <span class="o">=</span> <span class="nf">getProduct</span><span class="p">(</span><span class="nv">cached</span><span class="p">:</span> <span class="kc">false</span><span class="p">)</span>
            
            <span class="k">return</span> <span class="n">productResult</span>
                <span class="o">.</span><span class="n">map</span> <span class="p">{</span> <span class="kt">State</span><span class="o">.</span><span class="nf">productLoaded</span><span class="p">(</span><span class="nv">$0</span><span class="p">)</span> <span class="p">}</span>
                <span class="o">.</span><span class="nf">stateWhenLoading</span><span class="p">(</span><span class="kt">State</span><span class="o">.</span><span class="n">loading</span><span class="p">)</span>
                <span class="o">.</span><span class="n">mapErrorRecover</span> <span class="p">{</span> <span class="kt">State</span><span class="o">.</span><span class="nf">error</span><span class="p">(</span><span class="nv">$0</span><span class="p">)</span> <span class="p">}</span>
            
        <span class="k">case</span> <span class="o">.</span><span class="nv">addToCart</span><span class="p">:</span>
            <span class="k">let</span> <span class="nv">productResult</span> <span class="o">=</span> <span class="nf">getProduct</span><span class="p">(</span><span class="nv">cached</span><span class="p">:</span> <span class="kc">true</span><span class="p">)</span>
            <span class="k">let</span> <span class="nv">userResult</span> <span class="o">=</span> <span class="nf">getUser</span><span class="p">()</span>
            
            <span class="k">return</span> <span class="kt">AsyncResult</span><span class="o">&lt;</span><span class="p">(</span><span class="kt">Product</span><span class="p">,</span> <span class="kt">User</span><span class="p">)</span><span class="o">&gt;.</span><span class="nf">zip</span><span class="p">(</span><span class="n">productResult</span><span class="p">,</span> <span class="n">userResult</span><span class="p">)</span><span class="o">.</span><span class="n">flatMap</span> <span class="p">{</span> <span class="n">pair</span> <span class="o">-&gt;</span> <span class="kt">AsyncResult</span><span class="o">&lt;</span><span class="kt">State</span><span class="o">&gt;</span> <span class="k">in</span>
                <span class="k">let</span> <span class="p">(</span><span class="nv">product</span><span class="p">,</span> <span class="nv">user</span><span class="p">)</span> <span class="o">=</span> <span class="n">pair</span>
                
                <span class="k">return</span> <span class="nf">addToCart</span><span class="p">(</span><span class="nv">product</span><span class="p">:</span> <span class="n">product</span><span class="p">,</span> <span class="nv">user</span><span class="p">:</span> <span class="n">user</span><span class="p">)</span>
                    <span class="o">.</span><span class="n">map</span> <span class="p">{</span> <span class="kt">State</span><span class="o">.</span><span class="nf">addedToCart</span><span class="p">(</span><span class="n">product</span><span class="p">,</span> <span class="nv">$0</span><span class="p">)</span> <span class="p">}</span>
                    <span class="o">.</span><span class="n">mapErrorRecover</span><span class="p">{</span> <span class="kt">State</span><span class="o">.</span><span class="nf">error</span><span class="p">(</span><span class="nv">$0</span><span class="p">)</span> <span class="p">}</span>
            <span class="p">}</span>
            <span class="o">.</span><span class="nf">stateWhenLoading</span><span class="p">(</span><span class="kt">State</span><span class="o">.</span><span class="n">loading</span><span class="p">)</span>
        <span class="p">}</span>
    <span class="p">}</span>
</code></pre></div></div>

<p>Notice that the reducer is a pure function in the sense of referential transparency: for a given state S and event E, it always returns the same state description and has no side effects (it only returns descriptions of the effects; the library runs them for you).</p>

<p><strong>This is basically the whole idea of PromisedArchitectureKit</strong>. Note that we haven’t used any PromisedArchitectureKit APIs. It comes with a few utilities to facilitate this pattern, but the main idea is that you describe how your state is updated over time in response to events, and 90% of the code you write is just plain Swift, so the UI logic can be tested with ease.</p>

<p>But what about asynchronous code and side effects such as API calls, DB calls, logging, and reading and writing files?</p>

<h3 id="using-promisekit-as-a-time-abstraction">Using PromiseKit as a time abstraction</h3>

<p>A Promise is used for handling asynchronous operations. PromisedArchitectureKit uses them in order to trigger reactions to some states. Example of Promise:</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
    <span class="kd">func</span> <span class="nf">getProduct</span><span class="p">()</span> <span class="o">-&gt;</span> <span class="kt">Promise</span><span class="o">&lt;</span><span class="kt">Product</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="k">return</span> <span class="kt">Promise</span> <span class="p">{</span> <span class="n">seal</span> <span class="k">in</span>
            <span class="kt">DispatchQueue</span><span class="o">.</span><span class="n">main</span><span class="o">.</span><span class="nf">asyncAfter</span><span class="p">(</span><span class="nv">deadline</span><span class="p">:</span> <span class="o">.</span><span class="nf">now</span><span class="p">()</span> <span class="o">+</span> <span class="mi">5</span><span class="p">)</span> <span class="p">{</span>
                <span class="n">seal</span><span class="o">.</span><span class="nf">fulfill</span><span class="p">(</span><span class="s">"Yeezy 500"</span><span class="p">)</span>
            <span class="p">}</span>
        <span class="p">}</span>
    <span class="p">}</span>

</code></pre></div></div>

<p>That function returns a Promise that resolves to a product. It waits for 5 seconds before fulfilling, which simulates a network call.</p>

<h3 id="dont-fear-the-asyncresult">Don’t fear the AsyncResult</h3>
<p>AsyncResult is just a wrapper over Promise that gives it more power. Think of it as a Promise on steroids.</p>

<p>But don’t worry. If your whole app uses Promises, that is OK. You can keep using Promises and convert them to AsyncResults in the reducer function with ease.</p>

<p>How do you get an AsyncResult from a Promise?</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let asyncResult = AsyncResult(promise)
</code></pre></div></div>

<p>And that’s it!</p>

<h3 id="what-if-i-want-to-make-network-calls-db-calls-and-so-on">What if I want to make network calls, DB calls, and so on?</h3>

<p>If we want to load the product from the backend, we need a network call, which is both a side effect and asynchronous.</p>

<p>To achieve this, we use Promises to handle the asynchronous code. As the reducer function returns the new state asynchronously, we can map Promises to new states.</p>

<p>For example, suppose we are in the start state and, when a loadProduct event is triggered, we want to load a product and move to the productLoaded state. In the reducer we do:</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="kd">static</span> <span class="kd">func</span> <span class="nf">reduce</span><span class="p">(</span><span class="nv">state</span><span class="p">:</span> <span class="kt">State</span><span class="p">,</span> <span class="nv">event</span><span class="p">:</span> <span class="kt">Event</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="kt">AsyncResult</span><span class="o">&lt;</span><span class="kt">State</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="k">switch</span> <span class="n">event</span> <span class="p">{</span>

        <span class="k">case</span> <span class="o">.</span><span class="nv">loadProduct</span><span class="p">:</span>
            <span class="k">let</span> <span class="nv">productResult</span> <span class="o">=</span> <span class="nf">getProduct</span><span class="p">(</span><span class="nv">cached</span><span class="p">:</span> <span class="kc">false</span><span class="p">)</span>
            
            <span class="k">return</span> <span class="n">productResult</span>
                <span class="o">.</span><span class="n">map</span> <span class="p">{</span> <span class="kt">State</span><span class="o">.</span><span class="nf">productLoaded</span><span class="p">(</span><span class="nv">$0</span><span class="p">)</span> <span class="p">}</span>
                <span class="o">.</span><span class="nf">stateWhenLoading</span><span class="p">(</span><span class="kt">State</span><span class="o">.</span><span class="n">loading</span><span class="p">)</span>
                <span class="o">.</span><span class="n">mapErrorRecover</span> <span class="p">{</span> <span class="kt">State</span><span class="o">.</span><span class="nf">error</span><span class="p">(</span><span class="nv">$0</span><span class="p">)</span> <span class="p">}</span>
                
        <span class="p">(</span><span class="o">...</span><span class="p">)</span>

</code></pre></div></div>

<p>What is this doing? Step by step:</p>

<ul>
  <li>When a loadProduct event is triggered</li>
</ul>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code>	<span class="k">switch</span> <span class="n">event</span> <span class="p">{</span>
   		<span class="k">case</span> <span class="o">.</span><span class="nv">loadProduct</span><span class="p">:</span>
</code></pre></div></div>
<ul>
  <li>We get the product (an <code class="language-plaintext highlighter-rouge">AsyncResult&lt;Product&gt;</code>)</li>
</ul>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code>	<span class="k">let</span> <span class="nv">productResult</span> <span class="o">=</span> <span class="nf">getProduct</span><span class="p">(</span><span class="nv">cached</span><span class="p">:</span> <span class="kc">false</span><span class="p">)</span>
</code></pre></div></div>

<ul>
  <li>If the product is retrieved successfully, we return a productLoaded state:</li>
</ul>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">return</span> <span class="n">productResult</span>
 	<span class="o">.</span><span class="n">map</span> <span class="p">{</span> <span class="kt">State</span><span class="o">.</span><span class="nf">productLoaded</span><span class="p">(</span><span class="nv">$0</span><span class="p">)</span> <span class="p">}</span>

</code></pre></div></div>

<ul>
  <li>We want to send the UI a loading state while the Promise is pending, so the UI can show a loading indicator until it resolves:</li>
</ul>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">.</span><span class="nf">stateWhenLoading</span><span class="p">(</span><span class="kt">State</span><span class="o">.</span><span class="n">loading</span><span class="p">)</span>
</code></pre></div></div>

<ul>
  <li>If the product is <strong>not</strong> retrieved successfully, we return an error state:</li>
</ul>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code>	<span class="o">.</span><span class="n">mapErrorRecover</span> <span class="p">{</span> <span class="kt">State</span><span class="o">.</span><span class="nf">error</span><span class="p">(</span><span class="nv">$0</span><span class="p">)</span> <span class="p">}</span>

</code></pre></div></div>

<p>Pretty easy and neat.</p>

<p><strong>There is no side effect here: there is only a description of it. Actually, the side effect will be executed by the library.</strong></p>
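
<p>To illustrate the difference between describing a side effect and executing it, here is a minimal sketch in plain Swift. <code class="language-plaintext highlighter-rouge">Effect</code> and <code class="language-plaintext highlighter-rouge">EffectLog</code> are hypothetical names invented for this illustration; they are not part of PromisedArchitectureKit or PromiseKit, and the real library works with AsyncResult instead.</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical stand-in for a described side effect (not a library type):
// the work is stored in a closure and nothing happens until `run` is called.
struct Effect&lt;Value&gt; {
    let run: () -&gt; Value
}

// Records which effects have actually been executed.
final class EffectLog {
    var entries: [String] = []
}

let effectLog = EffectLog()

// Building the description does not perform the "network call"...
let loadProduct = Effect&lt;String&gt;(run: {
    effectLog.entries.append("network call")
    return "Yeezy 500"
})
let entriesBeforeRun = effectLog.entries

// ...it only happens when an interpreter (here, this line) runs it.
let loadedTitle = loadProduct.run()
let entriesAfterRun = effectLog.entries
</code></pre></div></div>

<p>In this sketch the log is still empty after the description is built and only gets an entry once <code class="language-plaintext highlighter-rouge">run</code> is called. The reducer plays the role of the code that builds the description, and the library plays the role of the line that runs it.</p>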

<h3 id="update-the-view">Update the view</h3>

<p>After each state change, the View’s updateUI function is called with the new state. The view is then in charge of updating its UI components.</p>

<p>Example:</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="kd">func</span> <span class="nf">updateUI</span><span class="p">(</span><span class="nv">state</span><span class="p">:</span> <span class="kt">State</span><span class="p">)</span> <span class="p">{</span>
        <span class="nf">showLoading</span><span class="p">()</span>
        <span class="n">addToCartButton</span><span class="o">.</span><span class="n">isEnabled</span> <span class="o">=</span> <span class="kc">false</span>
        <span class="n">refreshButton</span><span class="o">.</span><span class="n">isHidden</span> <span class="o">=</span> <span class="kc">false</span>

    
        <span class="k">switch</span> <span class="n">state</span> <span class="p">{</span>
        <span class="k">case</span> <span class="o">.</span><span class="nv">start</span><span class="p">:</span>
            <span class="n">productTitleLabel</span><span class="o">.</span><span class="n">text</span> <span class="o">=</span> <span class="s">""</span>
            <span class="n">descriptionLabel</span><span class="o">.</span><span class="n">text</span> <span class="o">=</span> <span class="s">""</span>
            <span class="n">imageView</span><span class="o">.</span><span class="n">image</span> <span class="o">=</span> <span class="kc">nil</span>
        <span class="k">case</span> <span class="o">.</span><span class="nv">loading</span><span class="p">:</span>
            <span class="n">refreshButton</span><span class="o">.</span><span class="n">isHidden</span> <span class="o">=</span> <span class="kc">true</span>
            <span class="nf">showLoading</span><span class="p">()</span>
            
        <span class="k">case</span> <span class="o">.</span><span class="nf">productLoaded</span><span class="p">(</span><span class="k">let</span> <span class="nv">product</span><span class="p">):</span>
            <span class="n">productTitleLabel</span><span class="o">.</span><span class="n">text</span> <span class="o">=</span> <span class="n">product</span><span class="o">.</span><span class="n">title</span>
            <span class="n">descriptionLabel</span><span class="o">.</span><span class="n">text</span> <span class="o">=</span> <span class="n">product</span><span class="o">.</span><span class="n">description</span>
            <span class="nf">updateImage</span><span class="p">(</span><span class="nv">with</span><span class="p">:</span> <span class="n">product</span><span class="o">.</span><span class="n">imageUrl</span><span class="p">)</span>
            <span class="n">addToCartButton</span><span class="o">.</span><span class="n">isEnabled</span> <span class="o">=</span> <span class="kc">true</span>
            <span class="nf">hideLoading</span><span class="p">()</span>
            
        <span class="k">case</span> <span class="o">.</span><span class="nf">error</span><span class="p">(</span><span class="k">let</span> <span class="nv">error</span><span class="p">):</span>
            <span class="n">descriptionLabel</span><span class="o">.</span><span class="n">text</span> <span class="o">=</span> <span class="n">error</span><span class="o">.</span><span class="n">localizedDescription</span>
            <span class="nf">hideLoading</span><span class="p">()</span>
            
        <span class="k">case</span> <span class="o">.</span><span class="nf">addedToCart</span><span class="p">(</span><span class="n">_</span><span class="p">,</span> <span class="k">let</span> <span class="nv">cartResponse</span><span class="p">):</span>
            <span class="nf">hideLoading</span><span class="p">()</span>
            <span class="n">addToCartButton</span><span class="o">.</span><span class="n">isEnabled</span> <span class="o">=</span> <span class="kc">true</span>
            <span class="nf">showAddedToCartAlert</span><span class="p">(</span><span class="n">cartResponse</span><span class="p">)</span>
        <span class="p">}</span>

        <span class="nf">print</span><span class="p">(</span><span class="n">state</span><span class="p">)</span>
    <span class="p">}</span>
</code></pre></div></div>

<p>So, the presenter will compute the next state, and will send it to the view. The view will draw itself accordingly.</p>

<h2 id="what-the-library-does-under-the-hood">What does the library do under the hood?</h2>
<p>The library’s core is small enough to be pasted here:</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">//</span>
<span class="c1">//  System.swift</span>
<span class="c1">//  PromisedArchitectureKit</span>
<span class="c1">//</span>
<span class="c1">//  Created by Pallas, Ricardo on 7/3/18.</span>
<span class="c1">//</span>

<span class="kd">import</span> <span class="kt">Foundation</span>
<span class="kd">import</span> <span class="kt">PromiseKit</span>

<span class="kd">public</span> <span class="kd">final</span> <span class="kd">class</span> <span class="kt">System</span><span class="o">&lt;</span><span class="kt">State</span><span class="p">,</span> <span class="kt">Event</span><span class="o">&gt;</span> <span class="p">{</span>

    <span class="kd">internal</span> <span class="k">var</span> <span class="nv">eventQueue</span> <span class="o">=</span> <span class="p">[</span><span class="kt">Event</span><span class="p">]()</span>
    <span class="kd">internal</span> <span class="k">var</span> <span class="nv">callback</span><span class="p">:</span> <span class="p">((</span><span class="kt">State</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="p">())?</span> <span class="o">=</span> <span class="kc">nil</span>

    <span class="kd">internal</span> <span class="k">var</span> <span class="nv">initialState</span><span class="p">:</span> <span class="kt">State</span>
    <span class="kd">internal</span> <span class="k">var</span> <span class="nv">reducer</span><span class="p">:</span> <span class="p">(</span><span class="kt">State</span><span class="p">,</span> <span class="kt">Event</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="kt">AsyncResult</span><span class="o">&lt;</span><span class="kt">State</span><span class="o">&gt;</span>
    <span class="kd">internal</span> <span class="k">var</span> <span class="nv">uiBindings</span><span class="p">:</span> <span class="p">[((</span><span class="kt">State</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="p">())?]</span>
    <span class="kd">internal</span> <span class="k">var</span> <span class="nv">currentState</span><span class="p">:</span> <span class="kt">State</span>

    <span class="kd">private</span> <span class="nf">init</span><span class="p">(</span>
        <span class="nv">initialState</span><span class="p">:</span> <span class="kt">State</span><span class="p">,</span>
        <span class="nv">reducer</span><span class="p">:</span> <span class="kd">@escaping</span> <span class="p">(</span><span class="kt">State</span><span class="p">,</span> <span class="kt">Event</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="kt">AsyncResult</span><span class="o">&lt;</span><span class="kt">State</span><span class="o">&gt;</span><span class="p">,</span>
        <span class="nv">uiBindings</span><span class="p">:</span> <span class="p">[((</span><span class="kt">State</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="p">())?]</span>
        <span class="p">)</span> <span class="p">{</span>
        <span class="k">self</span><span class="o">.</span><span class="n">initialState</span> <span class="o">=</span> <span class="n">initialState</span>
        <span class="k">self</span><span class="o">.</span><span class="n">reducer</span> <span class="o">=</span> <span class="n">reducer</span>
        <span class="k">self</span><span class="o">.</span><span class="n">uiBindings</span> <span class="o">=</span> <span class="n">uiBindings</span>
        <span class="k">self</span><span class="o">.</span><span class="n">currentState</span> <span class="o">=</span> <span class="n">initialState</span>
    <span class="p">}</span>

    <span class="kd">public</span> <span class="kd">static</span> <span class="kd">func</span> <span class="nf">pure</span><span class="p">(</span>
        <span class="nv">initialState</span><span class="p">:</span> <span class="kt">State</span><span class="p">,</span>
        <span class="nv">reducer</span><span class="p">:</span> <span class="kd">@escaping</span> <span class="p">(</span><span class="kt">State</span><span class="p">,</span> <span class="kt">Event</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="kt">AsyncResult</span><span class="o">&lt;</span><span class="kt">State</span><span class="o">&gt;</span><span class="p">,</span>
        <span class="nv">uiBindings</span><span class="p">:</span> <span class="p">[((</span><span class="kt">State</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="p">())?]</span>
        <span class="p">)</span> <span class="o">-&gt;</span> <span class="kt">System</span> <span class="p">{</span>
        
        <span class="k">let</span> <span class="nv">system</span> <span class="o">=</span> <span class="kt">System</span><span class="o">&lt;</span><span class="kt">State</span><span class="p">,</span><span class="kt">Event</span><span class="o">&gt;</span><span class="p">(</span><span class="nv">initialState</span><span class="p">:</span> <span class="n">initialState</span><span class="p">,</span> <span class="nv">reducer</span><span class="p">:</span> <span class="n">reducer</span><span class="p">,</span> <span class="nv">uiBindings</span><span class="p">:</span> <span class="n">uiBindings</span><span class="p">)</span>
        <span class="n">system</span><span class="o">.</span><span class="nf">bindUI</span><span class="p">(</span><span class="n">initialState</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">system</span>
    <span class="p">}</span>

    <span class="kd">public</span> <span class="kd">func</span> <span class="nf">addLoopCallback</span><span class="p">(</span><span class="nv">callback</span><span class="p">:</span> <span class="kd">@escaping</span> <span class="p">(</span><span class="kt">State</span><span class="p">)</span><span class="o">-&gt;</span><span class="p">()){</span>
        <span class="k">self</span><span class="o">.</span><span class="n">callback</span> <span class="o">=</span> <span class="n">callback</span>
    <span class="p">}</span>

    <span class="k">var</span> <span class="nv">actionExecuting</span> <span class="o">=</span> <span class="kc">false</span>

    <span class="kd">public</span> <span class="kd">func</span> <span class="nf">sendEvent</span><span class="p">(</span><span class="n">_</span> <span class="nv">action</span><span class="p">:</span> <span class="kt">Event</span><span class="p">)</span> <span class="p">{</span>
        <span class="nf">assert</span><span class="p">(</span><span class="kt">Thread</span><span class="o">.</span><span class="n">isMainThread</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">actionExecuting</span> <span class="p">{</span>
            <span class="k">self</span><span class="o">.</span><span class="n">eventQueue</span><span class="o">.</span><span class="nf">append</span><span class="p">(</span><span class="n">action</span><span class="p">)</span>
        <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
            <span class="n">actionExecuting</span> <span class="o">=</span> <span class="kc">true</span>
            <span class="k">let</span> <span class="nv">_</span> <span class="o">=</span> <span class="nf">doLoop</span><span class="p">(</span><span class="n">action</span><span class="p">)</span><span class="o">.</span><span class="n">done</span> <span class="p">{</span> <span class="n">state</span> <span class="k">in</span>
                <span class="nf">assert</span><span class="p">(</span><span class="kt">Thread</span><span class="o">.</span><span class="n">isMainThread</span><span class="p">,</span> <span class="s">"PromisedArchitectureKit: Final callback must be run on main thread"</span><span class="p">)</span>
                <span class="k">if</span> <span class="k">let</span> <span class="nv">callback</span> <span class="o">=</span> <span class="k">self</span><span class="o">.</span><span class="n">callback</span> <span class="p">{</span>
                    <span class="nf">callback</span><span class="p">(</span><span class="n">state</span><span class="p">)</span>
                <span class="p">}</span>
                <span class="k">self</span><span class="o">.</span><span class="n">actionExecuting</span> <span class="o">=</span> <span class="kc">false</span>
                <span class="k">if</span> <span class="k">let</span> <span class="nv">nextEvent</span> <span class="o">=</span> <span class="k">self</span><span class="o">.</span><span class="n">eventQueue</span><span class="o">.</span><span class="n">first</span> <span class="p">{</span>
                    <span class="k">self</span><span class="o">.</span><span class="n">eventQueue</span><span class="o">.</span><span class="nf">removeFirst</span><span class="p">()</span>
                    <span class="k">self</span><span class="o">.</span><span class="nf">sendEvent</span><span class="p">(</span><span class="n">nextEvent</span><span class="p">)</span>
                <span class="p">}</span>
            <span class="p">}</span>
        <span class="p">}</span>
    <span class="p">}</span>

    <span class="kd">private</span> <span class="kd">func</span> <span class="nf">doLoop</span><span class="p">(</span><span class="n">_</span> <span class="nv">event</span><span class="p">:</span> <span class="kt">Event</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="kt">Promise</span><span class="o">&lt;</span><span class="kt">State</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="k">return</span> <span class="kt">Promise</span><span class="o">.</span><span class="nf">value</span><span class="p">(</span><span class="n">event</span><span class="p">)</span>
            <span class="o">.</span><span class="n">then</span> <span class="p">{</span> <span class="n">event</span> <span class="o">-&gt;</span> <span class="kt">Promise</span><span class="o">&lt;</span><span class="kt">State</span><span class="o">&gt;</span> <span class="k">in</span>

                <span class="k">let</span> <span class="nv">asyncResultState</span> <span class="o">=</span> <span class="k">self</span><span class="o">.</span><span class="nf">reducer</span><span class="p">(</span><span class="k">self</span><span class="o">.</span><span class="n">currentState</span><span class="p">,</span> <span class="n">event</span><span class="p">)</span>

                <span class="k">if</span> <span class="k">let</span> <span class="nv">stateWhenLoading</span> <span class="o">=</span> <span class="n">asyncResultState</span><span class="o">.</span><span class="n">loadingResult</span> <span class="p">{</span>
                    <span class="k">self</span><span class="o">.</span><span class="nf">bindUI</span><span class="p">(</span><span class="n">stateWhenLoading</span><span class="p">)</span>
                <span class="p">}</span>

                <span class="k">return</span> <span class="n">asyncResultState</span><span class="o">.</span><span class="n">promise</span>
            <span class="p">}</span>
            <span class="o">.</span><span class="n">map</span> <span class="p">{</span> <span class="n">state</span> <span class="k">in</span>
                <span class="k">self</span><span class="o">.</span><span class="n">currentState</span> <span class="o">=</span> <span class="n">state</span>
                <span class="k">self</span><span class="o">.</span><span class="nf">bindUI</span><span class="p">(</span><span class="n">state</span><span class="p">)</span>
                <span class="k">return</span> <span class="n">state</span>
            <span class="p">}</span>
    <span class="p">}</span>

    <span class="kd">private</span> <span class="kd">func</span> <span class="nf">bindUI</span><span class="p">(</span><span class="n">_</span> <span class="nv">state</span><span class="p">:</span> <span class="kt">State</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">self</span><span class="o">.</span><span class="n">uiBindings</span><span class="o">.</span><span class="n">forEach</span> <span class="p">{</span> <span class="n">uiBinding</span> <span class="k">in</span>
            <span class="nf">uiBinding</span><span class="p">?(</span><span class="n">state</span><span class="p">)</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>

</code></pre></div></div>

<p>It executes loops in the <code class="language-plaintext highlighter-rouge">doLoop</code> function. What is a loop?
It is the whole cycle in which an event is triggered, a new state is calculated, and the UI is updated accordingly.</p>

<p>Following the load product example:</p>

<ol>
  <li>
    <p>A <code class="language-plaintext highlighter-rouge">loadProduct</code> event is sent by the view. The <code class="language-plaintext highlighter-rouge">sendEvent</code> function is called, which in turn calls the <code class="language-plaintext highlighter-rouge">doLoop</code> function.</p>
  </li>
  <li>
    <p>The <code class="language-plaintext highlighter-rouge">doLoop</code> function executes the side effects described by the reducer and obtains the new state asynchronously. If a loading state was specified, it notifies the UI before the side effects complete. After that, it updates the current state and calls the UI with the new state.</p>
  </li>
</ol>

<p><strong>To sum up: The system listens to events, runs side effects to get the new state and notifies the UI that the state has changed.</strong></p>
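
<p>The loop described above can be sketched without PromiseKit. This is a simplified illustration, not the library’s implementation: <code class="language-plaintext highlighter-rouge">Described</code> is a hypothetical stand-in for AsyncResult, <code class="language-plaintext highlighter-rouge">MiniSystem</code> is a hypothetical stand-in for System, and the event queue and main-thread assertions are left out on purpose.</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code>enum State: Equatable { case start, loading, productLoaded(String) }
enum Event { case loadProduct }

// Hypothetical stand-in for AsyncResult: an optional loading state plus
// deferred work that hands the final state to a completion handler.
struct Described&lt;S&gt; {
    let loading: S?
    let run: (@escaping (S) -&gt; Void) -&gt; Void
}

final class MiniSystem {
    private(set) var currentState: State
    private let reducer: (State, Event) -&gt; Described&lt;State&gt;
    private let render: (State) -&gt; Void

    init(initial: State,
         reducer: @escaping (State, Event) -&gt; Described&lt;State&gt;,
         render: @escaping (State) -&gt; Void) {
        self.currentState = initial
        self.reducer = reducer
        self.render = render
        render(initial) // the UI is bound to the initial state first
    }

    func sendEvent(_ event: Event) {
        let described = reducer(currentState, event)
        // 1. Notify the UI of the loading state, if one was specified.
        if let loading = described.loading { render(loading) }
        // 2. Run the effect, then 3. store and render the resulting state.
        described.run { [weak self] newState in
            self?.currentState = newState
            self?.render(newState)
        }
    }
}

// Usage: record every state the "UI" receives.
var rendered: [State] = []
let system = MiniSystem(
    initial: State.start,
    reducer: { _, event in
        switch event {
        case .loadProduct:
            return Described&lt;State&gt;(
                loading: State.loading,
                run: { done in done(State.productLoaded("Yeezy 500")) }
            )
        }
    },
    render: { rendered.append($0) }
)
system.sendEvent(.loadProduct)
</code></pre></div></div>

<p>In this sketch the recorded states are the initial state, then the loading state, then the loaded product, which mirrors the order in which <code class="language-plaintext highlighter-rouge">doLoop</code> notifies the UI.</p>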

<h2 id="why-should-i-use-promisedarchiteruekit-v2-">Why should I use PromisedArchitectureKit V2?</h2>

<p>As said before, the goal of the library is to impose constraints that enforce correctness and make the architecture easier to read and reason about. These constraints are: there is a finite number of states for each screen, there is a finite number of events that can change the state, and the library decides when to update the UI.</p>

<p>Those restrictions come with advantages, and the trade-off is worth it.
The main advantages the library provides are:</p>

<ul>
  <li>The library executes <strong>all side effects for you</strong> so your code stays pure.</li>
  <li>It updates the view when needed, so you don’t have to take care of it.</li>
  <li>You can tell what the screen is about by reading the State enum.</li>
  <li>You know at compile time that your view handles all states.</li>
  <li>You know what actions can be performed on the screen by reading the Event enum.</li>
  <li>You know at compile time that all events are handled by the presenter.</li>
  <li>A single function will be called on every state change. That can be useful to have good analytics, for example.</li>
</ul>
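
<p>The compile-time guarantees in the list above come from Swift’s exhaustive <code class="language-plaintext highlighter-rouge">switch</code> over enums rather than from anything specific to the library. A minimal sketch, using a simplified State enum and a hypothetical <code class="language-plaintext highlighter-rouge">screenDescription</code> function invented for this example:</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Simplified state for illustration; the example app's State carries richer payloads.
enum State {
    case start
    case loading
    case productLoaded(String)
    case error(String)
}

// No `default` branch: if a new case is added to State, this function
// stops compiling until the new case is handled.
func screenDescription(_ state: State) -&gt; String {
    switch state {
    case .start:
        return "empty screen"
    case .loading:
        return "spinner"
    case .productLoaded(let title):
        return "product: \(title)"
    case .error(let message):
        return "error: \(message)"
    }
}
</code></pre></div></div>

<p>The same reasoning applies to the presenter’s reducer and the Event enum, as long as neither switch uses a <code class="language-plaintext highlighter-rouge">default</code> branch.</p>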

<h2 id="example">Example</h2>

<p>To run the example project, clone the repo, and run <code class="language-plaintext highlighter-rouge">pod install</code> from the Example directory first.</p>

<p>ViewController’s code:</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">import</span> <span class="kt">UIKit</span>
<span class="kd">import</span> <span class="kt">PromisedArchitectureKit</span>

<span class="kd">class</span> <span class="kt">ViewController</span><span class="p">:</span> <span class="kt">UIViewController</span><span class="p">,</span> <span class="kt">View</span> <span class="p">{</span>
    
    <span class="kd">@IBOutlet</span> <span class="k">weak</span> <span class="k">var</span> <span class="nv">productTitleLabel</span><span class="p">:</span> <span class="kt">UILabel</span><span class="o">!</span>
    <span class="kd">@IBOutlet</span> <span class="k">weak</span> <span class="k">var</span> <span class="nv">imageView</span><span class="p">:</span> <span class="kt">UIImageView</span><span class="o">!</span>
    <span class="kd">@IBOutlet</span> <span class="k">weak</span> <span class="k">var</span> <span class="nv">descriptionLabel</span><span class="p">:</span> <span class="kt">UILabel</span><span class="o">!</span>
    <span class="kd">@IBOutlet</span> <span class="k">weak</span> <span class="k">var</span> <span class="nv">addToCartButton</span><span class="p">:</span> <span class="kt">UIButton</span><span class="o">!</span>
    <span class="kd">@IBOutlet</span> <span class="k">weak</span> <span class="k">var</span> <span class="nv">refreshButton</span><span class="p">:</span> <span class="kt">UIButton</span><span class="o">!</span>
    
    <span class="k">var</span> <span class="nv">presenter</span><span class="p">:</span> <span class="kt">Presenter</span><span class="o">!</span> <span class="o">=</span> <span class="kc">nil</span>
    <span class="k">var</span> <span class="nv">indicator</span><span class="p">:</span> <span class="kt">UIActivityIndicatorView</span><span class="o">!</span> <span class="o">=</span> <span class="kc">nil</span>
    
    <span class="k">override</span> <span class="kd">func</span> <span class="nf">viewDidLoad</span><span class="p">()</span> <span class="p">{</span>
        <span class="k">super</span><span class="o">.</span><span class="nf">viewDidLoad</span><span class="p">()</span>
        <span class="nf">addLoadingIndicator</span><span class="p">()</span>
        
        <span class="n">presenter</span> <span class="o">=</span> <span class="kt">Presenter</span><span class="p">(</span><span class="nv">view</span><span class="p">:</span> <span class="k">self</span><span class="p">)</span>
        <span class="n">presenter</span><span class="o">.</span><span class="nf">controllerLoaded</span><span class="p">()</span>
    <span class="p">}</span>
    
    <span class="k">override</span> <span class="kd">func</span> <span class="nf">viewWillAppear</span><span class="p">(</span><span class="n">_</span> <span class="nv">animated</span><span class="p">:</span> <span class="kt">Bool</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">super</span><span class="o">.</span><span class="nf">viewWillAppear</span><span class="p">(</span><span class="n">animated</span><span class="p">)</span>
        <span class="n">presenter</span><span class="o">.</span><span class="nf">sendEvent</span><span class="p">(</span><span class="kt">Event</span><span class="o">.</span><span class="n">loadProduct</span><span class="p">)</span>
    <span class="p">}</span>
    
    <span class="kd">private</span> <span class="kd">func</span> <span class="nf">addLoadingIndicator</span><span class="p">()</span> <span class="p">{</span>
        <span class="n">indicator</span> <span class="o">=</span> <span class="kt">UIActivityIndicatorView</span><span class="p">(</span><span class="nv">style</span><span class="p">:</span> <span class="kt">UIActivityIndicatorView</span><span class="o">.</span><span class="kt">Style</span><span class="o">.</span><span class="n">gray</span><span class="p">)</span>
        <span class="n">indicator</span><span class="o">.</span><span class="n">frame</span> <span class="o">=</span> <span class="kt">CGRect</span><span class="p">(</span><span class="nv">x</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="nv">y</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="nv">width</span><span class="p">:</span> <span class="n">view</span><span class="o">.</span><span class="n">frame</span><span class="o">.</span><span class="n">width</span><span class="p">,</span> <span class="nv">height</span><span class="p">:</span> <span class="n">view</span><span class="o">.</span><span class="n">frame</span><span class="o">.</span><span class="n">height</span><span class="p">)</span>
        <span class="n">indicator</span><span class="o">.</span><span class="n">center</span> <span class="o">=</span> <span class="n">view</span><span class="o">.</span><span class="n">center</span>
        <span class="n">view</span><span class="o">.</span><span class="nf">addSubview</span><span class="p">(</span><span class="n">indicator</span><span class="p">)</span>
        <span class="n">view</span><span class="o">.</span><span class="nf">bringSubviewToFront</span><span class="p">(</span><span class="n">indicator</span><span class="p">)</span>
        <span class="kt">UIApplication</span><span class="o">.</span><span class="n">shared</span><span class="o">.</span><span class="n">isNetworkActivityIndicatorVisible</span> <span class="o">=</span> <span class="kc">true</span>
    <span class="p">}</span>
    
    <span class="c1">// MARK: - User Actions</span>
    <span class="kd">@IBAction</span> <span class="kd">func</span> <span class="nf">didTapRefresh</span><span class="p">(</span><span class="n">_</span> <span class="nv">sender</span><span class="p">:</span> <span class="kt">Any</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">presenter</span><span class="o">.</span><span class="nf">sendEvent</span><span class="p">(</span><span class="kt">Event</span><span class="o">.</span><span class="n">loadProduct</span><span class="p">)</span>
    <span class="p">}</span>
    
    <span class="kd">@IBAction</span> <span class="kd">func</span> <span class="nf">didTapAddToCart</span><span class="p">(</span><span class="n">_</span> <span class="nv">sender</span><span class="p">:</span> <span class="kt">Any</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">presenter</span><span class="o">.</span><span class="nf">sendEvent</span><span class="p">(</span><span class="kt">Event</span><span class="o">.</span><span class="n">addToCart</span><span class="p">)</span>
    <span class="p">}</span>

    <span class="c1">// MARK: - User Outputs</span>
    <span class="kd">func</span> <span class="nf">updateUI</span><span class="p">(</span><span class="nv">state</span><span class="p">:</span> <span class="kt">State</span><span class="p">)</span> <span class="p">{</span>
        <span class="nf">showLoading</span><span class="p">()</span>
        <span class="n">addToCartButton</span><span class="o">.</span><span class="n">isEnabled</span> <span class="o">=</span> <span class="kc">false</span>
        <span class="n">refreshButton</span><span class="o">.</span><span class="n">isHidden</span> <span class="o">=</span> <span class="kc">false</span>

    
        <span class="k">switch</span> <span class="n">state</span> <span class="p">{</span>
        <span class="k">case</span> <span class="o">.</span><span class="nv">start</span><span class="p">:</span>
            <span class="n">productTitleLabel</span><span class="o">.</span><span class="n">text</span> <span class="o">=</span> <span class="s">""</span>
            <span class="n">descriptionLabel</span><span class="o">.</span><span class="n">text</span> <span class="o">=</span> <span class="s">""</span>
            <span class="n">imageView</span><span class="o">.</span><span class="n">image</span> <span class="o">=</span> <span class="kc">nil</span>
        <span class="k">case</span> <span class="o">.</span><span class="nv">loading</span><span class="p">:</span>
            <span class="n">refreshButton</span><span class="o">.</span><span class="n">isHidden</span> <span class="o">=</span> <span class="kc">true</span>
            <span class="nf">showLoading</span><span class="p">()</span>
            
        <span class="k">case</span> <span class="o">.</span><span class="nf">productLoaded</span><span class="p">(</span><span class="k">let</span> <span class="nv">product</span><span class="p">):</span>
            <span class="n">productTitleLabel</span><span class="o">.</span><span class="n">text</span> <span class="o">=</span> <span class="n">product</span><span class="o">.</span><span class="n">title</span>
            <span class="n">descriptionLabel</span><span class="o">.</span><span class="n">text</span> <span class="o">=</span> <span class="n">product</span><span class="o">.</span><span class="n">description</span>
            <span class="nf">updateImage</span><span class="p">(</span><span class="nv">with</span><span class="p">:</span> <span class="n">product</span><span class="o">.</span><span class="n">imageUrl</span><span class="p">)</span>
            <span class="n">addToCartButton</span><span class="o">.</span><span class="n">isEnabled</span> <span class="o">=</span> <span class="kc">true</span>
            <span class="nf">hideLoading</span><span class="p">()</span>
            
        <span class="k">case</span> <span class="o">.</span><span class="nf">error</span><span class="p">(</span><span class="k">let</span> <span class="nv">error</span><span class="p">):</span>
            <span class="n">descriptionLabel</span><span class="o">.</span><span class="n">text</span> <span class="o">=</span> <span class="n">error</span><span class="o">.</span><span class="n">localizedDescription</span>
            <span class="nf">hideLoading</span><span class="p">()</span>
            
        <span class="k">case</span> <span class="o">.</span><span class="nf">addedToCart</span><span class="p">(</span><span class="n">_</span><span class="p">,</span> <span class="k">let</span> <span class="nv">cartResponse</span><span class="p">):</span>
            <span class="nf">hideLoading</span><span class="p">()</span>
            <span class="n">addToCartButton</span><span class="o">.</span><span class="n">isEnabled</span> <span class="o">=</span> <span class="kc">true</span>
            <span class="nf">showAddedToCartAlert</span><span class="p">(</span><span class="n">cartResponse</span><span class="p">)</span>
        <span class="p">}</span>

        <span class="nf">print</span><span class="p">(</span><span class="n">state</span><span class="p">)</span>
    <span class="p">}</span>
    
    <span class="kd">private</span> <span class="kd">func</span> <span class="nf">showLoading</span><span class="p">()</span> <span class="p">{</span>
        <span class="n">indicator</span><span class="o">.</span><span class="nf">startAnimating</span><span class="p">()</span>
    <span class="p">}</span>
    
    <span class="kd">private</span> <span class="kd">func</span> <span class="nf">hideLoading</span><span class="p">()</span> <span class="p">{</span>
        <span class="n">indicator</span><span class="o">.</span><span class="nf">stopAnimating</span><span class="p">()</span>
    <span class="p">}</span>
    
    <span class="kd">private</span> <span class="kd">func</span> <span class="nf">showAddedToCartAlert</span><span class="p">(</span><span class="n">_</span> <span class="nv">message</span><span class="p">:</span> <span class="kt">String</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">let</span> <span class="nv">alertController</span> <span class="o">=</span> <span class="kt">UIAlertController</span><span class="p">(</span><span class="nv">title</span><span class="p">:</span> <span class="s">"Added to cart"</span><span class="p">,</span> <span class="nv">message</span><span class="p">:</span>
            <span class="n">message</span><span class="p">,</span> <span class="nv">preferredStyle</span><span class="p">:</span> <span class="kt">UIAlertController</span><span class="o">.</span><span class="kt">Style</span><span class="o">.</span><span class="n">alert</span><span class="p">)</span>
        <span class="n">alertController</span><span class="o">.</span><span class="nf">addAction</span><span class="p">(</span><span class="kt">UIAlertAction</span><span class="p">(</span><span class="nv">title</span><span class="p">:</span> <span class="s">"Dismiss"</span><span class="p">,</span> <span class="nv">style</span><span class="p">:</span> <span class="kt">UIAlertAction</span><span class="o">.</span><span class="kt">Style</span><span class="o">.</span><span class="k">default</span><span class="p">,</span><span class="nv">handler</span><span class="p">:</span> <span class="kc">nil</span><span class="p">))</span>
        <span class="k">self</span><span class="o">.</span><span class="nf">present</span><span class="p">(</span><span class="n">alertController</span><span class="p">,</span> <span class="nv">animated</span><span class="p">:</span> <span class="kc">true</span><span class="p">,</span> <span class="nv">completion</span><span class="p">:</span> <span class="kc">nil</span><span class="p">)</span>
    <span class="p">}</span>
    
    <span class="kd">private</span> <span class="kd">func</span> <span class="nf">updateImage</span><span class="p">(</span><span class="n">with</span> <span class="nv">urlPath</span><span class="p">:</span> <span class="kt">String</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">if</span> <span class="k">let</span> <span class="nv">url</span> <span class="o">=</span> <span class="kt">URL</span><span class="p">(</span><span class="nv">string</span><span class="p">:</span> <span class="n">urlPath</span><span class="p">),</span> <span class="k">let</span> <span class="nv">data</span> <span class="o">=</span> <span class="k">try</span><span class="p">?</span> <span class="kt">Data</span><span class="p">(</span><span class="nv">contentsOf</span><span class="p">:</span> <span class="n">url</span><span class="p">)</span> <span class="p">{</span>
            <span class="k">let</span> <span class="nv">image</span> <span class="o">=</span> <span class="kt">UIImage</span><span class="p">(</span><span class="nv">data</span><span class="p">:</span> <span class="n">data</span><span class="p">)</span>
            <span class="n">imageView</span><span class="o">.</span><span class="n">image</span> <span class="o">=</span> <span class="n">image</span>
        <span class="p">}</span>
    <span class="p">}</span>

<span class="p">}</span>
</code></pre></div></div>

<p>Presenter’s code:</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">import</span> <span class="kt">Foundation</span>
<span class="kd">import</span> <span class="kt">PromisedArchitectureKit</span>
<span class="kd">import</span> <span class="kt">PromiseKit</span>

<span class="kd">typealias</span> <span class="kt">CartResponse</span> <span class="o">=</span> <span class="kt">String</span>
<span class="kd">typealias</span> <span class="kt">User</span> <span class="o">=</span> <span class="kt">String</span>

<span class="kd">struct</span> <span class="kt">Product</span><span class="p">:</span> <span class="kt">Equatable</span> <span class="p">{</span>
    <span class="k">let</span> <span class="nv">title</span><span class="p">:</span> <span class="kt">String</span>
    <span class="k">let</span> <span class="nv">description</span><span class="p">:</span> <span class="kt">String</span>
    <span class="k">let</span> <span class="nv">imageUrl</span><span class="p">:</span> <span class="kt">String</span>
<span class="p">}</span>

<span class="kd">protocol</span> <span class="kt">View</span><span class="p">:</span> <span class="kd">class</span> <span class="p">{</span>
    <span class="kd">func</span> <span class="nf">updateUI</span><span class="p">(</span><span class="nv">state</span><span class="p">:</span> <span class="kt">State</span><span class="p">)</span>
<span class="p">}</span>

<span class="c1">// MARK: - Events</span>
<span class="kd">enum</span> <span class="kt">Event</span> <span class="p">{</span>
    <span class="k">case</span> <span class="n">loadProduct</span>
    <span class="k">case</span> <span class="n">addToCart</span>
<span class="p">}</span>

<span class="c1">// MARK: - State</span>
<span class="kd">enum</span> <span class="kt">State</span> <span class="p">{</span>
    <span class="k">case</span> <span class="n">start</span>
    <span class="k">case</span> <span class="n">loading</span>
    <span class="k">case</span> <span class="nf">productLoaded</span><span class="p">(</span><span class="kt">Product</span><span class="p">)</span>
    <span class="k">case</span> <span class="nf">addedToCart</span><span class="p">(</span><span class="kt">Product</span><span class="p">,</span> <span class="kt">CartResponse</span><span class="p">)</span>
    <span class="k">case</span> <span class="nf">error</span><span class="p">(</span><span class="kt">Error</span><span class="p">)</span>
    
    <span class="kd">static</span> <span class="kd">func</span> <span class="nf">reduce</span><span class="p">(</span><span class="nv">state</span><span class="p">:</span> <span class="kt">State</span><span class="p">,</span> <span class="nv">event</span><span class="p">:</span> <span class="kt">Event</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="kt">AsyncResult</span><span class="o">&lt;</span><span class="kt">State</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="k">switch</span> <span class="n">event</span> <span class="p">{</span>

        <span class="k">case</span> <span class="o">.</span><span class="nv">loadProduct</span><span class="p">:</span>
            <span class="k">let</span> <span class="nv">productResult</span> <span class="o">=</span> <span class="nf">getProduct</span><span class="p">(</span><span class="nv">cached</span><span class="p">:</span> <span class="kc">false</span><span class="p">)</span>
            
            <span class="k">return</span> <span class="n">productResult</span>
                <span class="o">.</span><span class="n">map</span> <span class="p">{</span> <span class="kt">State</span><span class="o">.</span><span class="nf">productLoaded</span><span class="p">(</span><span class="nv">$0</span><span class="p">)</span> <span class="p">}</span>
                <span class="o">.</span><span class="nf">stateWhenLoading</span><span class="p">(</span><span class="kt">State</span><span class="o">.</span><span class="n">loading</span><span class="p">)</span>
                <span class="o">.</span><span class="n">mapErrorRecover</span> <span class="p">{</span> <span class="kt">State</span><span class="o">.</span><span class="nf">error</span><span class="p">(</span><span class="nv">$0</span><span class="p">)</span> <span class="p">}</span>
            
        <span class="k">case</span> <span class="o">.</span><span class="nv">addToCart</span><span class="p">:</span>
            <span class="k">let</span> <span class="nv">productResult</span> <span class="o">=</span> <span class="nf">getProduct</span><span class="p">(</span><span class="nv">cached</span><span class="p">:</span> <span class="kc">true</span><span class="p">)</span>
            <span class="k">let</span> <span class="nv">userResult</span> <span class="o">=</span> <span class="nf">getUser</span><span class="p">()</span>
            
            <span class="k">return</span> <span class="kt">AsyncResult</span><span class="o">&lt;</span><span class="p">(</span><span class="kt">Product</span><span class="p">,</span> <span class="kt">User</span><span class="p">)</span><span class="o">&gt;.</span><span class="nf">zip</span><span class="p">(</span><span class="n">productResult</span><span class="p">,</span> <span class="n">userResult</span><span class="p">)</span><span class="o">.</span><span class="n">flatMap</span> <span class="p">{</span> <span class="n">pair</span> <span class="o">-&gt;</span> <span class="kt">AsyncResult</span><span class="o">&lt;</span><span class="kt">State</span><span class="o">&gt;</span> <span class="k">in</span>
                <span class="k">let</span> <span class="p">(</span><span class="nv">product</span><span class="p">,</span> <span class="nv">user</span><span class="p">)</span> <span class="o">=</span> <span class="n">pair</span>
                
                <span class="k">return</span> <span class="nf">addToCart</span><span class="p">(</span><span class="nv">product</span><span class="p">:</span> <span class="n">product</span><span class="p">,</span> <span class="nv">user</span><span class="p">:</span> <span class="n">user</span><span class="p">)</span>
                    <span class="o">.</span><span class="n">map</span> <span class="p">{</span> <span class="kt">State</span><span class="o">.</span><span class="nf">addedToCart</span><span class="p">(</span><span class="n">product</span><span class="p">,</span> <span class="nv">$0</span><span class="p">)</span> <span class="p">}</span>
                    <span class="o">.</span><span class="n">mapErrorRecover</span><span class="p">{</span> <span class="kt">State</span><span class="o">.</span><span class="nf">error</span><span class="p">(</span><span class="nv">$0</span><span class="p">)</span> <span class="p">}</span>
            <span class="p">}</span>
            <span class="o">.</span><span class="nf">stateWhenLoading</span><span class="p">(</span><span class="kt">State</span><span class="o">.</span><span class="n">loading</span><span class="p">)</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="kd">fileprivate</span> <span class="kd">func</span> <span class="nf">getProduct</span><span class="p">(</span><span class="nv">cached</span><span class="p">:</span> <span class="kt">Bool</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="kt">AsyncResult</span><span class="o">&lt;</span><span class="kt">Product</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="k">let</span> <span class="nv">delay</span><span class="p">:</span> <span class="kt">DispatchTime</span> <span class="o">=</span> <span class="n">cached</span> <span class="p">?</span> <span class="o">.</span><span class="nf">now</span><span class="p">()</span> <span class="p">:</span> <span class="o">.</span><span class="nf">now</span><span class="p">()</span> <span class="o">+</span> <span class="mi">3</span>
    <span class="k">let</span> <span class="nv">product</span> <span class="o">=</span> <span class="kt">Product</span><span class="p">(</span>
        <span class="nv">title</span><span class="p">:</span> <span class="s">"Yeezy Triple White"</span><span class="p">,</span>
        <span class="nv">description</span><span class="p">:</span> <span class="s">"YEEZY Boost 350 V2 “Triple White,” aka “Cream”. </span><span class="se">\n</span><span class="s"> adidas Originals has officially announced its largest-ever YEEZY Boost 350 V2 release. The “Triple White” iteration of one of Kanye West’s most popular silhouettes will drop again on September 21 for a retail price of $220. The sneaker previously dropped under the “Cream” alias."</span><span class="p">,</span>
        <span class="nv">imageUrl</span><span class="p">:</span> <span class="s">"https://static.highsnobiety.com/wp-content/uploads/2018/08/20172554/adidas-originals-yeezy-boost-350-v2-triple-white-release-date-price-02.jpg"</span><span class="p">)</span>
    
    <span class="k">let</span> <span class="nv">promise</span> <span class="o">=</span> <span class="kt">Promise</span> <span class="p">{</span> <span class="n">seal</span> <span class="k">in</span>
        <span class="kt">DispatchQueue</span><span class="o">.</span><span class="n">main</span><span class="o">.</span><span class="nf">asyncAfter</span><span class="p">(</span><span class="nv">deadline</span><span class="p">:</span> <span class="n">delay</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">seal</span><span class="o">.</span><span class="nf">fulfill</span><span class="p">(</span><span class="n">product</span><span class="p">)</span>
        <span class="p">}</span>
    <span class="p">}</span>

    <span class="k">return</span> <span class="kt">AsyncResult</span><span class="o">&lt;</span><span class="kt">Product</span><span class="o">&gt;</span><span class="p">(</span><span class="n">promise</span><span class="p">)</span>
<span class="p">}</span>

<span class="kd">fileprivate</span> <span class="kd">func</span> <span class="nf">addToCart</span><span class="p">(</span><span class="nv">product</span><span class="p">:</span> <span class="kt">Product</span><span class="p">,</span> <span class="nv">user</span><span class="p">:</span> <span class="kt">User</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="kt">AsyncResult</span><span class="o">&lt;</span><span class="kt">CartResponse</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="k">let</span> <span class="nv">randomNumber</span> <span class="o">=</span> <span class="kt">Int</span><span class="o">.</span><span class="nf">random</span><span class="p">(</span><span class="nv">in</span><span class="p">:</span> <span class="mi">1</span><span class="o">..&lt;</span><span class="mi">10</span><span class="p">)</span>

    <span class="k">let</span> <span class="nv">failedPromise</span> <span class="o">=</span> <span class="kt">Promise</span><span class="o">&lt;</span><span class="kt">CartResponse</span><span class="o">&gt;</span><span class="p">(</span><span class="nv">error</span><span class="p">:</span> <span class="kt">NSError</span><span class="p">(</span><span class="nv">domain</span><span class="p">:</span> <span class="s">"Error adding to cart"</span><span class="p">,</span><span class="nv">code</span><span class="p">:</span> <span class="mi">15</span><span class="p">,</span> <span class="nv">userInfo</span><span class="p">:</span> <span class="kc">nil</span><span class="p">))</span>
    <span class="k">let</span> <span class="nv">promise</span> <span class="o">=</span> <span class="kt">Promise</span><span class="o">&lt;</span><span class="kt">CartResponse</span><span class="o">&gt;.</span><span class="nf">value</span><span class="p">(</span><span class="s">"Product: </span><span class="se">\(</span><span class="n">product</span><span class="o">.</span><span class="n">title</span><span class="se">)</span><span class="s"> added to cart for user: </span><span class="se">\(</span><span class="n">user</span><span class="se">)</span><span class="s">"</span><span class="p">)</span>

    <span class="k">if</span> <span class="n">randomNumber</span> <span class="o">&lt;</span> <span class="mi">5</span> <span class="p">{</span>
        <span class="k">return</span> <span class="kt">AsyncResult</span><span class="o">&lt;</span><span class="kt">CartResponse</span><span class="o">&gt;</span><span class="p">(</span><span class="n">failedPromise</span><span class="p">)</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
        <span class="k">return</span> <span class="kt">AsyncResult</span><span class="o">&lt;</span><span class="kt">CartResponse</span><span class="o">&gt;</span><span class="p">(</span><span class="n">promise</span><span class="p">)</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="kd">fileprivate</span> <span class="kd">func</span> <span class="nf">getUser</span><span class="p">()</span> <span class="o">-&gt;</span> <span class="kt">AsyncResult</span><span class="o">&lt;</span><span class="kt">User</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="k">let</span> <span class="nv">promise</span> <span class="o">=</span> <span class="kt">Promise</span> <span class="p">{</span> <span class="n">seal</span> <span class="k">in</span>
        <span class="kt">DispatchQueue</span><span class="o">.</span><span class="n">main</span><span class="o">.</span><span class="nf">asyncAfter</span><span class="p">(</span><span class="nv">deadline</span><span class="p">:</span> <span class="o">.</span><span class="nf">now</span><span class="p">()</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">seal</span><span class="o">.</span><span class="nf">fulfill</span><span class="p">(</span><span class="s">"Richi"</span><span class="p">)</span>
        <span class="p">}</span>
    <span class="p">}</span>

    <span class="k">return</span> <span class="kt">AsyncResult</span><span class="o">&lt;</span><span class="kt">User</span><span class="o">&gt;</span><span class="p">(</span><span class="n">promise</span><span class="p">)</span>
<span class="p">}</span>

<span class="c1">// MARK: - Presenter</span>
<span class="kd">class</span> <span class="kt">Presenter</span> <span class="p">{</span>
    
    <span class="k">var</span> <span class="nv">system</span><span class="p">:</span> <span class="kt">System</span><span class="o">&lt;</span><span class="kt">State</span><span class="p">,</span> <span class="kt">Event</span><span class="o">&gt;</span><span class="p">?</span>
    <span class="k">weak</span> <span class="k">var</span> <span class="nv">view</span><span class="p">:</span> <span class="kt">View</span><span class="p">?</span>
    
    <span class="nf">init</span><span class="p">(</span><span class="nv">view</span><span class="p">:</span> <span class="kt">View</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">self</span><span class="o">.</span><span class="n">view</span> <span class="o">=</span> <span class="n">view</span>
    <span class="p">}</span>
    
    <span class="kd">func</span> <span class="nf">sendEvent</span><span class="p">(</span><span class="n">_</span> <span class="nv">event</span><span class="p">:</span> <span class="kt">Event</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">system</span><span class="p">?</span><span class="o">.</span><span class="nf">sendEvent</span><span class="p">(</span><span class="n">event</span><span class="p">)</span>
    <span class="p">}</span>
    
    <span class="kd">func</span> <span class="nf">controllerLoaded</span><span class="p">()</span> <span class="p">{</span>
        <span class="n">system</span> <span class="o">=</span> <span class="kt">System</span><span class="o">.</span><span class="nf">pure</span><span class="p">(</span>
            <span class="nv">initialState</span><span class="p">:</span> <span class="kt">State</span><span class="o">.</span><span class="n">start</span><span class="p">,</span>
            <span class="nv">reducer</span><span class="p">:</span> <span class="kt">State</span><span class="o">.</span><span class="n">reduce</span><span class="p">,</span>
            <span class="nv">uiBindings</span><span class="p">:</span> <span class="p">[</span><span class="n">view</span><span class="p">?</span><span class="o">.</span><span class="n">updateUI</span><span class="p">]</span>
        <span class="p">)</span>
    <span class="p">}</span>
<span class="p">}</span>

</code></pre></div></div>

<h2 id="bonus-analytics">Bonus: analytics</h2>
<p>If you add analytics to your app, you usually end up with calls to some <code class="language-plaintext highlighter-rouge">TrackingService.trackEvent</code> method scattered throughout the code, which can become a mess.</p>

<p>Luckily, PromisedArchitectureKit includes the <code class="language-plaintext highlighter-rouge">addLoopCallback(callback: @escaping (State)-&gt;())</code> function. The callback you register is called every time a state change occurs and receives the new state as a parameter, which can be used for analytics.</p>

<h3 id="analytics-example">Analytics Example</h3>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">func</span> <span class="nf">handleAnalytics</span><span class="p">(</span><span class="nv">state</span><span class="p">:</span> <span class="kt">State</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">switch</span> <span class="n">state</span> <span class="p">{</span>
    <span class="k">case</span> <span class="o">.</span><span class="nv">start</span><span class="p">:</span>
        <span class="kt">EventTracker</span><span class="o">.</span><span class="nf">trackEvent</span><span class="p">(</span><span class="nv">event</span><span class="p">:</span> <span class="o">.</span><span class="n">pdpShown</span><span class="p">)</span>
        
    <span class="k">case</span> <span class="o">.</span><span class="nv">loading</span><span class="p">:</span>
        <span class="kt">EventTracker</span><span class="o">.</span><span class="nf">trackEvent</span><span class="p">(</span><span class="nv">event</span><span class="p">:</span> <span class="o">.</span><span class="n">pdpLoading</span><span class="p">)</span>

    <span class="k">case</span> <span class="o">.</span><span class="nf">productLoaded</span><span class="p">(</span><span class="k">let</span> <span class="nv">product</span><span class="p">):</span>
        <span class="kt">EventTracker</span><span class="o">.</span><span class="nf">trackEvent</span><span class="p">(</span><span class="nv">event</span><span class="p">:</span> <span class="o">.</span><span class="n">productLoaded</span><span class="p">,</span> <span class="nv">attr</span><span class="p">:</span> <span class="n">product</span><span class="p">)</span>

    <span class="k">case</span> <span class="o">.</span><span class="nf">error</span><span class="p">(</span><span class="k">let</span> <span class="nv">error</span><span class="p">):</span>
        <span class="kt">EventTracker</span><span class="o">.</span><span class="nf">trackEvent</span><span class="p">(</span><span class="nv">event</span><span class="p">:</span> <span class="o">.</span><span class="n">pdpError</span><span class="p">,</span> <span class="nv">attr</span><span class="p">:</span> <span class="n">error</span><span class="p">)</span>

        
    <span class="k">case</span> <span class="o">.</span><span class="nf">addedToCart</span><span class="p">(</span><span class="k">let</span> <span class="nv">product</span><span class="p">,</span> <span class="n">_</span><span class="p">):</span>
        <span class="kt">EventTracker</span><span class="o">.</span><span class="nf">trackEvent</span><span class="p">(</span><span class="nv">event</span><span class="p">:</span> <span class="o">.</span><span class="n">pdpAddedToCart</span><span class="p">,</span> <span class="nv">attr</span><span class="p">:</span> <span class="n">product</span><span class="p">)</span>

    <span class="p">}</span>
<span class="p">}</span>


<span class="kd">func</span> <span class="nf">controllerLoaded</span><span class="p">()</span> <span class="p">{</span>
    <span class="n">system</span> <span class="o">=</span> <span class="kt">System</span><span class="o">.</span><span class="nf">pure</span><span class="p">(</span>
        <span class="nv">initialState</span><span class="p">:</span> <span class="kt">State</span><span class="o">.</span><span class="n">start</span><span class="p">,</span>
        <span class="nv">reducer</span><span class="p">:</span> <span class="kt">State</span><span class="o">.</span><span class="n">reduce</span><span class="p">,</span>
        <span class="nv">uiBindings</span><span class="p">:</span> <span class="p">[</span><span class="n">view</span><span class="p">?</span><span class="o">.</span><span class="n">updateUI</span><span class="p">]</span>
    <span class="p">)</span>
        
    <span class="n">system</span><span class="p">?</span><span class="o">.</span><span class="nf">addLoopCallback</span><span class="p">(</span><span class="nv">callback</span><span class="p">:</span> <span class="n">handleAnalytics</span><span class="p">)</span>
<span class="p">}</span>
    
</code></pre></div></div>

<p>By registering the <code class="language-plaintext highlighter-rouge">handleAnalytics</code> method as a loop callback on the system, all analytics live in one centralized place.</p>

<p>Disclaimer: this only works for analytics tied to state changes. To track pure UI interactions such as “User did scroll”, you still need to track them directly, the same way as without the library.</p>
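
<p>To make the loop-callback idea concrete, here is a minimal sketch that does not use the library at all. <code class="language-plaintext highlighter-rouge">MiniSystem</code>, <code class="language-plaintext highlighter-rouge">DemoState</code> and <code class="language-plaintext highlighter-rouge">runDemo</code> are hypothetical names invented for this illustration, and the real <code class="language-plaintext highlighter-rouge">System</code> in PromisedArchitectureKit may well be implemented differently. It only shows why a single callback is enough to observe every state change.</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical stand-in for the library's System; names are illustrative only.
enum DemoState: Equatable {
    case start
    case loading
    case loaded(String)
}

final class MiniSystem {
    private(set) var state: DemoState
    private var callbacks: [(DemoState) -&gt; Void] = []

    init(initialState: DemoState) {
        self.state = initialState
    }

    // Register a function to run after every state change.
    func addLoopCallback(callback: @escaping (DemoState) -&gt; Void) {
        callbacks.append(callback)
    }

    // Move to a new state and notify every registered callback.
    func transition(to newState: DemoState) {
        state = newState
        callbacks.forEach { $0(newState) }
    }
}

// All tracking lives in one closure instead of being scattered around.
func runDemo() -&gt; [String] {
    var trackedEvents: [String] = []
    let system = MiniSystem(initialState: .start)

    system.addLoopCallback { state in
        switch state {
        case .start:
            trackedEvents.append("pdpShown")
        case .loading:
            trackedEvents.append("pdpLoading")
        case .loaded(let title):
            trackedEvents.append("productLoaded: \(title)")
        }
    }

    system.transition(to: .loading)
    system.transition(to: .loaded("Yeezy Triple White"))
    return trackedEvents
}

print(runDemo())
</code></pre></div></div>

<p>In this sketch the callback only fires on transitions, so the initial state is not reported. Whether the real library also reports the initial state is something to check in its source.</p>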

<h2 id="author">Author</h2>

<p>Ricardo Pallás</p>

<h2 id="license">License</h2>

<p>PromisedArchitectureKit is available under the MIT license. See the LICENSE file for more info.</p>]]></content><author><name>Ricardo Pallas</name></author><category term="blog" /><category term="functional programming" /><category term="swift" /><category term="ArchitectureKit" /><category term="PromisedArchitectureKit" /><category term="PromiseKit" /><category term="FunctionalKit" /><category term="Swiftz" /><summary type="html"><![CDATA[Simplest architecture for PromiseKit]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://rpallas92.github.io/assets/images/markdown.jpg" /><media:content medium="image" url="https://rpallas92.github.io/assets/images/markdown.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Functional architecture for Swift</title><link href="https://rpallas92.github.io/functional-architecture-for-swift/" rel="alternate" type="text/html" title="Functional architecture for Swift" /><published>2018-01-07T23:48:00+00:00</published><updated>2018-01-07T23:48:00+00:00</updated><id>https://rpallas92.github.io/functional-architecture-for-swift</id><content type="html" xml:base="https://rpallas92.github.io/functional-architecture-for-swift/"><![CDATA[<p><img src="https://cdn-images-1.medium.com/max/1600/1*6rHrtpuEEnQj19mOUpS-pQ.png" alt="" /></p>

<p>In this article I am going to introduce a library for architecting iOS apps,
called ArchitectureKit:</p>

<blockquote>
  <p>“Simplest architecture for FunctionalKit”</p>
</blockquote>

<h3 id="table-of-contents">Table of contents</h3>

<ol>
  <li>Introduction</li>
  <li>Motivation</li>
  <li>ArchitectureKit</li>
  <li>Dependency Injection</li>
  <li>Full example</li>
  <li>Conclusion</li>
</ol>

<h3 id="1-introduction">1. Introduction</h3>

<p><a href="https://github.com/RPallas92/ArchitectureKit">ArchitectureKit</a> is a library
that tries to enforce correctness and simplify the state management of
applications and systems. It helps you write applications that behave
consistently, and are easy to test. It’s strongly inspired by
<a href="https://redux.js.org">Redux</a> and
<a href="https://github.com/NoTests/RxFeedback.swift">RxFeedback</a>.</p>

<h3 id="2-motivation">2. Motivation</h3>

<p>I have been looking for an architecture that simplifies managing and
handling the state of mobile applications and that is also easy to test.</p>

<p>I started with
<a href="https://developer.apple.com/library/content/documentation/General/Conceptual/DevPedia-CocoaCore/MVC.html">Model-View-Controller</a>
(MVC), then
<a href="https://msdn.microsoft.com/en-us/library/hh848246.aspx">Model-View-ViewModel</a>
(MVVM) and also
<a href="https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93presenter">Model-View-Presenter</a>
(MVP) along with <a href="https://fernandocejas.com/2014/09/03/architecting-android-the-clean-way/">Clean
architecture</a>.
MVC is not as easy to test as MVVM and MVP. <strong>MVVM and MVP are easy to
test</strong>, but the issue is that the <strong>UI state can become a mess</strong>, since there is no
centralized way to update it, and many methods scattered across the code can
change the state.</p>

<p>Then came <a href="https://guide.elm-lang.org/architecture/">Elm</a>,
<a href="https://redux.js.org">Redux</a>, and other Redux-like architectures such as
<a href="https://github.com/redux-observable/redux-observable">Redux-Observable</a>,
<a href="https://github.com/NoTests/RxFeedback.swift">RxFeedback</a>,
<a href="https://cycle.js.org">Cycle.js</a>, <a href="https://github.com/ReSwift/ReSwift">ReSwift</a>,
etc. The main difference between these architectures (including ArchitectureKit)
and MVP is that they <strong>introduce constraints on how the UI state can be updated,
in order to enforce correctness and make apps easier to reason about.</strong></p>

<p>What makes ArchitectureKit different from these Redux-like architectures is that it
uses feedback loops to run effects, encodes them as part of the state (we will
see this in section 3), and uses monads from FunctionalKit to wrap the effects.</p>

<p><strong>ArchitectureKit runs <a href="https://en.wikipedia.org/wiki/Side_effect_(computer_science)">side
effects</a> for
you. Your code stays 100% pure.</strong></p>

<h3 id="3-architecturekit">3. ArchitectureKit</h3>

<p>ArchitectureKit itself is very simple.</p>

<h4 id="the-core-concept">The core concept</h4>

<p>Each screen of your app (and the app as a whole) has its own state. In
ArchitectureKit, this state is represented as an object (i.e. a struct). For
example, the state of a TO-DO app might look like this:</p>
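
<p>A minimal sketch of such a state in plain Swift. The <code class="language-plaintext highlighter-rouge">Todo</code> and <code class="language-plaintext highlighter-rouge">VisibilityFilter</code> types and their fields are assumptions made for illustration; only the <code class="language-plaintext highlighter-rouge">todos</code> and <code class="language-plaintext highlighter-rouge">visibilityFilter</code> names come from the text.</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical model types, introduced only for this sketch.
struct Todo: Equatable {
    var text: String
    var completed: Bool
}

enum VisibilityFilter: Equatable {
    case showAll
    case showCompleted
    case showActive
}

// State of the "List of TO-DOs" screen.
struct State: Equatable {
    var todos: [Todo]
    var visibilityFilter: VisibilityFilter
}

let initialState = State(
    todos: [Todo(text: "Buy milk", completed: false)],
    visibilityFilter: .showAll
)
</code></pre></div></div>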

<p>This State object represents the state of the “List of TO-DOs” screen in a
TO-DO app. The “todos” var contains all the to-dos that might be drawn on the
screen, and “visibilityFilter” tells which to-dos should appear in the list.</p>

<p>With this approach of having an object that actually represents the state of a
screen, <strong>views are a direct mapping of state</strong>:</p>

<blockquote>
  <p><code class="language-plaintext highlighter-rouge">view = f(state)</code></p>
</blockquote>

<p>That “f” function will be the UI binding function that we will see later on.</p>

<p>To change something in the state, you need to dispatch an <strong>Event</strong>. An event is
an enum that describes what happened. Here are a few example events:</p>
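
<p>A possible set of events for the TO-DO list screen, as a sketch. The case names and payloads are assumptions chosen for illustration, not the library’s own.</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical filter type, repeated here so the sketch is self-contained.
enum VisibilityFilter: Equatable {
    case showAll, showCompleted, showActive
}

// Each case describes something that happened, with the data it carries.
enum Event: Equatable {
    case addTodo(text: String)
    case toggleTodo(index: Int)
    case setVisibilityFilter(VisibilityFilter)
}

let events: [Event] = [
    .addTodo(text: "Go to the swimming pool"),
    .toggleTodo(index: 0),
    .setVisibilityFilter(.showCompleted),
]
</code></pre></div></div>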

<p>Enforcing that every change is described as an event lets us have a clear
understanding of what’s going on in the app. If something changed, we know why
it changed. Events are like breadcrumbs of what has happened. Finally, to tie
state and events together, we write a function called a reducer. A reducer is
just a function that takes the state and an event as arguments, and returns the next
state of the app:</p>

<blockquote>
  <p><strong>(State, Event) -&gt; State</strong></p>
</blockquote>

<p>We write a reducer for every state of every screen. For the list of todos’
screen:</p>
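
<p>A sketch of what such a reducer could look like in plain Swift. All type and case names below are assumptions made for illustration; the only part taken from the text is the <code class="language-plaintext highlighter-rouge">(State, Event) -&gt; State</code> shape.</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical types, kept minimal so the sketch is self-contained.
struct Todo: Equatable {
    var text: String
    var completed: Bool
}

enum VisibilityFilter: Equatable {
    case showAll, showCompleted, showActive
}

struct State: Equatable {
    var todos: [Todo]
    var visibilityFilter: VisibilityFilter
}

enum Event {
    case addTodo(text: String)
    case toggleTodo(index: Int)
    case setVisibilityFilter(VisibilityFilter)
}

// (State, Event) -&gt; State: no I/O, no globals, no mutation of the input.
func reducer(state: State, event: Event) -&gt; State {
    var next = state // value-type copy; the caller's state is untouched
    switch event {
    case .addTodo(let text):
        next.todos.append(Todo(text: text, completed: false))
    case .toggleTodo(let index):
        // Ignore out-of-range indices instead of crashing.
        guard next.todos.indices.contains(index) else { return state }
        next.todos[index].completed.toggle()
    case .setVisibilityFilter(let filter):
        next.visibilityFilter = filter
    }
    return next
}

let empty = State(todos: [], visibilityFilter: .showAll)
let afterAdd = reducer(state: empty, event: .addTodo(text: "Buy milk"))
let afterToggle = reducer(state: afterAdd, event: .toggleTodo(index: 0))
</code></pre></div></div>

<p>Because <code class="language-plaintext highlighter-rouge">State</code> is a struct, each call returns a new value and the previous states (<code class="language-plaintext highlighter-rouge">empty</code>, <code class="language-plaintext highlighter-rouge">afterAdd</code>) remain available for comparison in tests.</p>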

<p>Notice that the reducer is a pure function, in terms of <a href="https://en.wikipedia.org/wiki/Referential_transparency">referential
transparency</a>: for a given
state S and event E, it always returns the same state and has no side effects.</p>

<p>This is basically the whole idea of ArchitectureKit. Note that we haven’t used
any ArchitectureKit APIs. It comes with a few utilities to facilitate this
pattern, but the main idea is that you describe how your state is updated over
time in response to events, and 90% of the code you write is just plain Swift,
so the UI logic can be tested with ease.</p>

<p>But what about asynchronous code and side effects such as API calls, DB calls,
logging, reading and writing files?</p>

<h4 id="asyncresult-and-functionalkit">AsyncResult and FunctionalKit</h4>

<p>The <code class="language-plaintext highlighter-rouge">AsyncResult</code> data structure is used for handling asynchronous
operations. AsyncResult is just a <code class="language-plaintext highlighter-rouge">typealias</code> to a <code class="language-plaintext highlighter-rouge">Reader&lt;Future&lt;Result&gt;&gt;</code>
monad stack. These monads (and their monad transformers) are available in
<a href="https://github.com/facile-it/FunctionalKit">FunctionalKit</a>, which is the only
dependency of ArchitectureKit.</p>

<p>FunctionalKit provides basic functions and combinators for functional
programming in Swift, and it can be considered an extension to <code class="language-plaintext highlighter-rouge">Foundation</code>. We
mainly use the Reader monad along with the Future and Result.</p>

<ul>
  <li><a href="https://github.com/facile-it/FunctionalKit/blob/master/Sources/FunctionalKit/ReaderType.swift">Reader
monad:</a>
it is used at the top of the monad stack to provide a way to inject
dependencies. We will see it in depth later.</li>
  <li><a href="https://github.com/facile-it/FunctionalKit/blob/master/Sources/FunctionalKit/FutureType.swift">Future
monad:</a>
it is used to represent async values.</li>
  <li><a href="https://github.com/facile-it/FunctionalKit/blob/master/Sources/FunctionalKit/ResultType.swift">Result
monad:</a>
it represents if a computation was successful or there was an error.</li>
</ul>

<p>We use monad transformers to create AsyncResult as a stack of these three
monads. <strong>An AsyncResult is a monad that represents an asynchronous operation
that returns either a successful value or an error, and also provides a
mechanism for dependency injection.</strong></p>

<p>We can see in the following snippet an example of Facebook login using
AsyncResult:</p>
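
<p>The sketch below does not use FunctionalKit. <code class="language-plaintext highlighter-rouge">AsyncResultSketch</code> is a hypothetical stand-in that only mirrors the shape of the stack (an environment goes in, a <code class="language-plaintext highlighter-rouge">Result</code> comes out through a continuation), and <code class="language-plaintext highlighter-rouge">FacebookLoginService</code>, <code class="language-plaintext highlighter-rouge">AppEnvironment</code>, <code class="language-plaintext highlighter-rouge">LoginError</code> and <code class="language-plaintext highlighter-rouge">FakeLoginService</code> are assumptions made for illustration. It uses Swift’s standard <code class="language-plaintext highlighter-rouge">Result</code> type.</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Stand-in for the Reader&lt;Future&lt;Result&gt;&gt; stack. This is NOT FunctionalKit's
// real type; it is a plain function of (environment, continuation).
struct AsyncResultSketch&lt;Environment, Value, Failure: Error&gt; {
    let run: (Environment, @escaping (Result&lt;Value, Failure&gt;) -&gt; Void) -&gt; Void
}

// Hypothetical error and dependency types for the login example.
enum LoginError: Error, Equatable {
    case cancelled
}

protocol FacebookLoginService {
    func logIn(completion: @escaping (String?) -&gt; Void)
}

struct AppEnvironment {
    let loginService: FacebookLoginService
}

// The login effect: it reads its dependency from the environment and reports
// the outcome through the continuation.
func facebookLogin() -&gt; AsyncResultSketch&lt;AppEnvironment, String, LoginError&gt; {
    return AsyncResultSketch { environment, continuation in
        environment.loginService.logIn { token in
            if let token = token {
                continuation(.success(token))
            } else {
                continuation(.failure(.cancelled))
            }
        }
    }
}

// A fake service, so the sketch runs without the Facebook SDK.
struct FakeLoginService: FacebookLoginService {
    let token: String?
    func logIn(completion: @escaping (String?) -&gt; Void) { completion(token) }
}

var received: Result&lt;String, LoginError&gt;?
let environment = AppEnvironment(loginService: FakeLoginService(token: "abc123"))
facebookLogin().run(environment) { result in
    received = result
}
</code></pre></div></div>

<p>Nothing happens until <code class="language-plaintext highlighter-rouge">run</code> is called with an environment, which is what makes it possible to swap the real service for a fake one in tests.</p>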

<p>To create an AsyncResult we use its static method <code class="language-plaintext highlighter-rouge">unfoldTT</code> (the TT stands for
transformer, since it is a monad transformer). It expects a function as a
parameter which has two inputs: an environment or context and a continuation or
callback. The environment parameter comes from the Reader monad and it is an
object that contains the injected dependencies. The continuation parameter is a
callback function that must be called with the <code class="language-plaintext highlighter-rouge">Result</code> value returned from the
async operation. In the example the Result returns a <code class="language-plaintext highlighter-rouge">String</code> when it
succeeds. When the login succeeds, we call the continuation method with a
successful <code class="language-plaintext highlighter-rouge">Result</code> with the token from Facebook. If the login fails, we call the
continuation method with a failure <code class="language-plaintext highlighter-rouge">Result</code> that contains the error.</p>

<p>The AsyncResult must be parameterized with three types. The first one is the
Environment type (which contains the dependencies), the second one is the actual
value expected from the async operation (in the example we use <code class="language-plaintext highlighter-rouge">String</code>
because we expect the Facebook login to return the login token), and the last
one is the error type the <code class="language-plaintext highlighter-rouge">Result</code> will return if something goes wrong.</p>

<p>Every asynchronous operation and side effect must be performed using the
AsyncResult monad, and we will use Feedbacks from ArchitectureKit to execute
their side effects. We will also see how to work with AsyncResults in the full
example.</p>

<h4 id="design-feedback-loops">Design feedback loops</h4>

<p>Let’s add a new feature to our previous TO-DO app! We want to let users save
their TO-DOs to the cloud. That requires a network call, which is an asynchronous
side effect, so to achieve it we will use feedback loops. The
way of dealing with effects in ArchitectureKit is to encode them as part of the state
and then design the feedback loop.</p>

<p>A feedback loop is just a computation that is triggered in some cases, depending
on the current state of the system, that launches a new event, and produces a
new state.</p>

<p>A whole ArchitectureKit loop begins from a
<a href="https://github.com/RPallas92/ArchitectureKit/blob/master/ArchitectureKit/UserAction.swift">UserAction</a>
that triggers an event. Then the reducer function computes a new state from the
event and previous state. ArchitectureKit checks if any feedback loop must be
triggered from the new state. If so, the feedback produces a new event
asynchronously (by executing side effects) and a new state is computed from the
feedback’s event.</p>

<p>So, we can see a whole ArchitectureKit loop as the following sequence:</p>

<ol>
  <li>UserAction produces an event.</li>
  <li>reducer(Current state, event) -&gt; new state.</li>
  <li>Query new state to check if feedback loop must be triggered.</li>
  <li>if so, new event triggered (side effects executed).</li>
  <li>reducer(new state, new event) -&gt; newer state.</li>
  <li>Repeat from step 3 until no more feedback (or maximum of 5 feedback loops)</li>
</ol>
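
<p>The sequence above can be sketched as a small driver function. This is a simplified, synchronous model and not ArchitectureKit’s implementation: <code class="language-plaintext highlighter-rouge">FeedbackSketch</code>, <code class="language-plaintext highlighter-rouge">runLoop</code> and the counter example are hypothetical names, and in the library the effect is asynchronous.</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// A feedback pairs a condition on the state with an effect that yields an event.
// The effect is synchronous here only to keep the sketch short.
struct FeedbackSketch&lt;State, Event&gt; {
    let condition: (State) -&gt; Bool
    let effect: (State) -&gt; Event
}

func runLoop&lt;State, Event&gt;(
    state: State,
    event: Event,
    reducer: (State, Event) -&gt; State,
    feedbacks: [FeedbackSketch&lt;State, Event&gt;],
    maxFeedbackLoops: Int = 5
) -&gt; State {
    // Steps 1-2: the user action's event produces a new state.
    var current = reducer(state, event)
    var loops = 0
    // Steps 3-6: keep running feedbacks until none applies or the cap is hit.
    while loops &lt; maxFeedbackLoops,
          let feedback = feedbacks.first(where: { $0.condition(current) }) {
        let newEvent = feedback.effect(current) // side effects would run here
        current = reducer(current, newEvent)
        loops += 1
    }
    return current
}

// Tiny example: a counter whose feedback keeps incrementing while it is below 3.
enum CounterEvent { case increment }

let counterReducer: (Int, CounterEvent) -&gt; Int = { state, _ in state + 1 }
let untilThree = FeedbackSketch&lt;Int, CounterEvent&gt;(
    condition: { $0 &lt; 3 },
    effect: { _ in .increment }
)
let settled = runLoop(
    state: 0,
    event: CounterEvent.increment,
    reducer: counterReducer,
    feedbacks: [untilThree]
)
</code></pre></div></div>

<p>The cap on feedback iterations is what keeps a feedback whose condition never becomes false from looping forever.</p>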

<p><img src="https://cdn-images-1.medium.com/max/1600/1*HHbqRbOi9HHBrbwFahoCAQ.png" alt="" />
<span class="figcaption_hack">ArchitectureKit whole loop</span></p>

<p>In the following code snippet, we can see a Feedback example of how to store the
user’s TO-DOs in the cloud:</p>
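
<p>A rough sketch of that feedback, written without the library. <code class="language-plaintext highlighter-rouge">FeedbackSketch</code> is a hypothetical stand-in for ArchitectureKit’s Feedback, and the effect reports back through a plain callback instead of an AsyncResult; the network call itself is replaced by an immediate success.</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code>struct Todo: Equatable {
    var text: String
}

struct State: Equatable {
    var todos: [Todo]
    var mustStoreTodos: Bool
}

enum Event: Equatable {
    case storeTodos
    case todosStored(Bool)
}

func reducer(state: State, event: Event) -&gt; State {
    var next = state
    switch event {
    case .storeTodos:
        next.mustStoreTodos = true  // asks the feedback loop to run
    case .todosStored:
        next.mustStoreTodos = false // the effect has finished
    }
    return next
}

// Hypothetical effect: a real app would perform the network call here and
// invoke the callback when it finishes.
func storeUserTodos(todos: [Todo], completion: @escaping (Event) -&gt; Void) {
    completion(.todosStored(true))
}

// Stand-in for a Feedback: the effect to run plus the condition that triggers it.
struct FeedbackSketch {
    let effect: (State, @escaping (Event) -&gt; Void) -&gt; Void
    let condition: (State) -&gt; Bool
}

let storeTodosFeedback = FeedbackSketch(
    effect: { state, dispatch in storeUserTodos(todos: state.todos, completion: dispatch) },
    condition: { $0.mustStoreTodos }
)

var state = State(todos: [Todo(text: "Buy milk")], mustStoreTodos: false)
state = reducer(state: state, event: .storeTodos)
let wasRequested = state.mustStoreTodos

if storeTodosFeedback.condition(state) {
    storeTodosFeedback.effect(state) { event in
        state = reducer(state: state, event: event)
    }
}
</code></pre></div></div>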

<p>To implement this feature, two new events have been added, <code class="language-plaintext highlighter-rouge">storeTodos()</code>
and <code class="language-plaintext highlighter-rouge">todosStored(Bool)</code>, and there is a new Bool variable in the state:
<code class="language-plaintext highlighter-rouge">mustStoreTodos</code>. The <code class="language-plaintext highlighter-rouge">storeUserTodos(todos:[Todo])</code> function is the function
executed in the feedback loop. It returns an AsyncResult monad that yields
the <code class="language-plaintext highlighter-rouge">todosStored(Bool)</code> event once its side effects have been executed. This function is
in charge of storing the user’s TO-DOs.</p>

<p>A Feedback object is composed of two functions that receive the current state as a
parameter. The first function is the actual AsyncResult to be executed, and the
second one decides when the feedback loop must be executed, depending on the
state. In the example, the user’s TO-DOs feedback will be executed when the
<code class="language-plaintext highlighter-rouge">mustStoreTodos</code> variable is true.</p>

<p>In the new reducer, the <code class="language-plaintext highlighter-rouge">storeTodos()</code> event sets <code class="language-plaintext highlighter-rouge">mustStoreTodos</code> to
true, and <code class="language-plaintext highlighter-rouge">todosStored(Bool)</code> sets it back to false. The <code class="language-plaintext highlighter-rouge">storeTodos()</code>
event will be triggered by a UserAction, like tapping a button.</p>

<p>The following diagram illustrates the steps for storing the user’s TO-DOs:</p>

<p><img src="https://cdn-images-1.medium.com/max/1600/1*gs6P6pEHxK6_iSNNaV4WbQ.png" alt="" />
<span class="figcaption_hack">How the Feedback loop is executed after a UserAction</span></p>

<h4 id="who-dispatches-events-useractions">Who dispatches events? UserActions</h4>

<p><a href="https://github.com/RPallas92/ArchitectureKit/blob/master/ArchitectureKit/UserAction.swift">UserAction</a>
is the object from ArchitectureKit that represents any action from the User or
the iOS framework that triggers an event that changes the state (and from that
state change it could trigger a feedback loop).</p>

<p>It has two methods:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">init</code>: creates the UserAction and specifies which event will be triggered when
the user action is executed.</li>
  <li><code class="language-plaintext highlighter-rouge">execute</code>: executes the user action.</li>
</ul>
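
<p>To make the two methods concrete, here is a sketch with a hypothetical <code class="language-plaintext highlighter-rouge">UserActionSketch</code> class. It is not the library’s <code class="language-plaintext highlighter-rouge">UserAction</code>; the <code class="language-plaintext highlighter-rouge">trigger</code> label and the <code class="language-plaintext highlighter-rouge">addListener</code> hook are assumptions made so the example can run on its own.</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Stand-in for a user action: it remembers one event and hands it to
// whoever is listening when it is executed.
final class UserActionSketch&lt;Event&gt; {
    private let event: Event
    private var listeners: [(Event) -&gt; Void] = []

    // Creates the action and fixes the event it will trigger.
    init(trigger event: Event) {
        self.event = event
    }

    // Hypothetical hook so a system can subscribe to the action.
    func addListener(_ listener: @escaping (Event) -&gt; Void) {
        listeners.append(listener)
    }

    // Executes the action: every listener receives the event.
    func execute() {
        listeners.forEach { $0(event) }
    }
}

enum Event: Equatable {
    case storeTodos
}

let saveTapped = UserActionSketch(trigger: Event.storeTodos)
var dispatched: [Event] = []
saveTapped.addListener { dispatched.append($0) }
saveTapped.execute() // e.g. called from a button's tap handler
</code></pre></div></div>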

<h4 id="simple-example">Simple example</h4>

<p>Here is a simple example of what ArchitectureKit code looks like:</p>
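
<p>The sketch below models only the core of such a counter in plain Swift, without the library: state, events, a reducer and a <code class="language-plaintext highlighter-rouge">view = f(state)</code> binding. The <code class="language-plaintext highlighter-rouge">render</code> and <code class="language-plaintext highlighter-rouge">dispatch</code> functions are hypothetical, and a string stands in for the label a real screen would update.</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code>typealias State = Int

enum Event {
    case increment
    case decrement
}

func reducer(state: State, event: Event) -&gt; State {
    switch event {
    case .increment: return state + 1
    case .decrement: return state - 1
    }
}

// view = f(state): the only place where state becomes UI.
func render(_ state: State) -&gt; String {
    return "Count: \(state)"
}

var state: State = 0
var labelText = render(state)

// Hypothetical dispatch function; the button handlers would call it.
func dispatch(_ event: Event) {
    state = reducer(state: state, event: event)
    labelText = render(state)
}

dispatch(.increment)
dispatch(.increment)
dispatch(.decrement)
</code></pre></div></div>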

<p>It’s a simple counter with increment and decrement buttons. The State is just
an integer that contains the current count.</p>

<p><img src="https://cdn-images-1.medium.com/max/1600/1*Mk6XqMNScFYorwUbdrHEuQ.gif" alt="" /></p>

<h3 id="dependency-injection">Dependency injection</h3>

<p>See Jorge Castillo’s article in Kotlin using Kategory.</p>

<h3 id="full-example">Full example</h3>

<p>also add a diagram</p>

<p>key: view = f(state) direct mapping between state and view</p>

<p>ArchitectureKit runs side effects for you. So your code stays 100% pure</p>

<h3 id="conclusion">Conclusion</h3>

<p>(when I would use ArchitectureKit)</p>

<p>The difference: ArchitectureKit uses feedback loops and runs the side effects. I recommend using it with
a functional Clean Architecture.</p>

<p>By functional Clean Architecture I mean functions and no objects, except objects for
dependency injection and protocols.</p>

<p>next steps: create user actions for every UIKit control</p>]]></content><author><name>Ricardo Pallas</name></author><category term="blog" /><category term="functional programming" /><category term="swift" /><category term="ArchitectureKit" /><category term="FunctionalKit" /><category term="Swiftz" /><summary type="html"><![CDATA[Simplest architecture for FunctionalKit]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://rpallas92.github.io/assets/images/markdown.jpg" /><media:content medium="image" url="https://rpallas92.github.io/assets/images/markdown.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Functional data validation in Swift</title><link href="https://rpallas92.github.io/switz-validation/" rel="alternate" type="text/html" title="Functional data validation in Swift" /><published>2017-09-24T22:48:00+00:00</published><updated>2017-09-24T22:48:00+00:00</updated><id>https://rpallas92.github.io/switz-validation</id><content type="html" xml:base="https://rpallas92.github.io/switz-validation/"><![CDATA[<p>I am going to talk about a little library I created in Swift to be used either
standalone or with the <a href="https://github.com/typelift/Swiftz">Swiftz</a> library. It is called
<a href="https://github.com/RPallas92/Swiftz-Validation">Swiftz-Validation.</a></p>

<p><img src="https://cdn-images-1.medium.com/max/1600/1*gfmaVQKjba0As4J_Bdo5vQ.png" alt="" /></p>

<h4 id="what-is-swiftz-validation">What is Swiftz-Validation?</h4>

<p>It’s a data structure that typically models form validations, and other
scenarios where you want to aggregate all failures, rather than short-circuit if
an error happens (for which Swiftx’s Either is better suited). A Validation may
either be a Success(value), which contains a successful value, or a
Failure(value), which contains an error.</p>

<p>A Validation is a data structure that implements the Applicative interface
(<code class="language-plaintext highlighter-rouge">.ap</code>), and does so in a way that if a failure is applied to another failure,
then it results in a new validation that contains the failures of both
validations. In other words, Validation is a data structure made for errors that
can be aggregated, and it makes sense in the contexts of things like form
validations, where you want to display to the user all of the fields that failed
the validation rather than just stopping at the first failure.</p>
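
<p>A minimal sketch of that accumulating behaviour. <code class="language-plaintext highlighter-rouge">ValidationSketch</code> and the free function <code class="language-plaintext highlighter-rouge">ap</code> are hypothetical stand-ins written for this example, not the library’s own definitions, and the name and age checks are invented inputs.</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Stand-in for a Validation type; case names are lowercased here to follow
// current Swift conventions.
enum ValidationSketch&lt;E, A&gt; {
    case failure([E])
    case success(A)
}

// Applicative apply: `f` holds a function, `x` holds its argument.
// Two failures are concatenated instead of short-circuiting.
func ap&lt;E, A, B&gt;(
    _ f: ValidationSketch&lt;E, (A) -&gt; B&gt;,
    _ x: ValidationSketch&lt;E, A&gt;
) -&gt; ValidationSketch&lt;E, B&gt; {
    switch (f, x) {
    case (.success(let function), .success(let value)):
        return .success(function(value))
    case (.failure(let left), .failure(let right)):
        return .failure(left + right)
    case (.failure(let errors), .success(_)):
        return .failure(errors)
    case (.success(_), .failure(let errors)):
        return .failure(errors)
    }
}

// A curried function, so it can be applied to one validated argument at a time.
let describe: (String) -&gt; (Int) -&gt; String = { name in { age in "\(name), \(age)" } }

let nameCheck: ValidationSketch&lt;String, String&gt; = .failure(["Name is empty"])
let ageCheck: ValidationSketch&lt;String, Int&gt; = .failure(["Age is negative"])
let lifted: ValidationSketch&lt;String, (String) -&gt; (Int) -&gt; String&gt; = .success(describe)

// Both checks have already run; `ap` only combines their outcomes.
let combined = ap(ap(lifted, nameCheck), ageCheck)
</code></pre></div></div>

<p>Here <code class="language-plaintext highlighter-rouge">combined</code> is a failure carrying both messages, in the order the checks were applied.</p>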

<p>Validations can’t be as easily used for sequencing operations because the <code class="language-plaintext highlighter-rouge">.ap</code>
method takes two validations, so the operations that create them must have been
executed already. While it is possible to use Validations in a sequential
manner, it’s better to leave the job to
<a href="https://github.com/typelift/Swiftx/blob/master/Sources/Either.swift">Either</a>, a
data structure made for that.</p>

<h4 id="validating-data-example">Validating data example</h4>

<p>In the following example we are going to validate a password: it should contain
at least 8 characters, it should contain a special character and it has to be
different from the user name.</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
    
    <span class="c1">//Check if the password is long enough</span>
    <span class="kd">func</span> <span class="nf">isPasswordLongEnough</span><span class="p">(</span><span class="n">_</span> <span class="nv">password</span><span class="p">:</span><span class="kt">String</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="kt">Validation</span><span class="o">&lt;</span><span class="p">[</span><span class="kt">String</span><span class="p">],</span> <span class="kt">String</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="k">if</span> <span class="n">password</span><span class="o">.</span><span class="n">count</span> <span class="o">&lt;</span> <span class="mi">8</span> <span class="p">{</span>
            <span class="k">return</span> <span class="kt">Validation</span><span class="o">.</span><span class="kt">Failure</span><span class="p">([</span><span class="s">"Password must have more than 8 characters."</span><span class="p">])</span>
        <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
            <span class="k">return</span> <span class="kt">Validation</span><span class="o">.</span><span class="kt">Success</span><span class="p">(</span><span class="n">password</span><span class="p">)</span>
        <span class="p">}</span>
    <span class="p">}</span>
    
    <span class="c1">//Check if the password contains a special character</span>
    <span class="kd">func</span> <span class="nf">isPasswordStrongEnough</span><span class="p">(</span><span class="n">_</span> <span class="nv">password</span><span class="p">:</span><span class="kt">String</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="kt">Validation</span><span class="o">&lt;</span><span class="p">[</span><span class="kt">String</span><span class="p">],</span> <span class="kt">String</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">password</span><span class="o">.</span><span class="nf">range</span><span class="p">(</span><span class="nv">of</span><span class="p">:</span><span class="s">"[</span><span class="se">\\</span><span class="s">W]"</span><span class="p">,</span> <span class="nv">options</span><span class="p">:</span> <span class="o">.</span><span class="n">regularExpression</span><span class="p">)</span> <span class="o">!=</span> <span class="kc">nil</span><span class="p">){</span>
            <span class="k">return</span> <span class="kt">Validation</span><span class="o">.</span><span class="kt">Success</span><span class="p">(</span><span class="n">password</span><span class="p">)</span>
        <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
            <span class="k">return</span> <span class="kt">Validation</span><span class="o">.</span><span class="kt">Failure</span><span class="p">([</span><span class="s">"Password must contain a special character."</span><span class="p">])</span>
        <span class="p">}</span>
    <span class="p">}</span>
    
    <span class="c1">//Check if the user is different from password, by Jlopez</span>
    <span class="kd">func</span> <span class="nf">isDifferentUserPass</span><span class="p">(</span><span class="n">_</span> <span class="nv">user</span><span class="p">:</span><span class="kt">String</span><span class="p">,</span> <span class="n">_</span> <span class="nv">password</span><span class="p">:</span><span class="kt">String</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="kt">Validation</span><span class="o">&lt;</span><span class="p">[</span><span class="kt">String</span><span class="p">],</span> <span class="kt">String</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">user</span> <span class="o">==</span> <span class="n">password</span><span class="p">){</span>
            <span class="k">return</span> <span class="kt">Validation</span><span class="o">.</span><span class="kt">Failure</span><span class="p">([</span><span class="s">"Username and password MUST be different."</span><span class="p">])</span>
        <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
            <span class="k">return</span> <span class="kt">Validation</span><span class="o">.</span><span class="kt">Success</span><span class="p">(</span><span class="n">password</span><span class="p">)</span>
        <span class="p">}</span>
    <span class="p">}</span>
    

    <span class="c1">//Concatenating all validations into one that checks all rules</span>
    <span class="kd">func</span> <span class="nf">isPasswordValid</span><span class="p">(</span><span class="nv">user</span><span class="p">:</span> <span class="kt">String</span><span class="p">,</span> <span class="nv">password</span><span class="p">:</span><span class="kt">String</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="kt">Validation</span><span class="o">&lt;</span><span class="p">[</span><span class="kt">String</span><span class="p">],</span> <span class="kt">String</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="k">return</span> <span class="nf">isPasswordLongEnough</span><span class="p">(</span><span class="n">password</span><span class="p">)</span>
            <span class="o">.</span><span class="nf">sconcat</span><span class="p">(</span><span class="nf">isPasswordStrongEnough</span><span class="p">(</span><span class="n">password</span><span class="p">))</span>
            <span class="o">.</span><span class="nf">sconcat</span><span class="p">(</span><span class="nf">isDifferentUserPass</span><span class="p">(</span><span class="n">user</span><span class="p">,</span> <span class="n">password</span><span class="p">))</span>
    <span class="p">}</span>


    <span class="c1">//Examples with invalid password</span>
    <span class="k">let</span> <span class="nv">result</span> <span class="o">=</span> <span class="nf">isPasswordValid</span><span class="p">(</span><span class="nv">user</span><span class="p">:</span> <span class="s">"Richi"</span><span class="p">,</span> <span class="nv">password</span><span class="p">:</span> <span class="s">"Richi"</span><span class="p">)</span>
    <span class="cm">/* ▿ Validation&lt;Array&lt;String&gt;, String&gt;
           ▿ Failure : 3 elements
                - 0 : "Password must have more than 8 characters."
                - 1 : "Password must contain a special character."
                - 2 : "Username and password MUST be different."
    */</span>

    <span class="c1">//Example with valid password</span>
    <span class="k">let</span> <span class="nv">validResult</span> <span class="o">=</span> <span class="nf">isPasswordValid</span><span class="p">(</span><span class="nv">user</span><span class="p">:</span><span class="s">"Richi"</span><span class="p">,</span> <span class="nv">password</span><span class="p">:</span> <span class="s">"Ricardo$"</span><span class="p">)</span>
    <span class="cm">/*
       ▿ Validation&lt;Array&lt;String&gt;, String&gt;
           - Success : "Ricardo$"
    */</span>
</code></pre></div></div>

<h3 id="advantages-of-using-swiftz-validation">Advantages of using Swiftz-Validation</h3>

<p>Things like form and schema validation are pretty common in programming, but we
end up either using branching or designing very specific solutions for each
case.</p>

<p>By branching, I mean using if-else conditions. With branching, things quickly get out of hand:
it doesn’t scale, because it’s difficult to abstract over it, and it’s hard to
reason about each rule. Let’s see an example of the same validation as before,
using branching:</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="kd">func</span> <span class="nf">validatePassword</span><span class="p">(</span><span class="nv">username</span><span class="p">:</span> <span class="kt">String</span><span class="p">,</span> <span class="nv">password</span><span class="p">:</span><span class="kt">String</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="p">[</span><span class="kt">String</span><span class="p">]{</span>
        <span class="k">var</span> <span class="nv">errors</span><span class="p">:[</span><span class="kt">String</span><span class="p">]</span> <span class="o">=</span> <span class="p">[]</span>
        
        <span class="k">if</span> <span class="n">password</span><span class="o">.</span><span class="n">count</span> <span class="o">&lt;</span> <span class="mi">8</span> <span class="p">{</span>
            <span class="n">errors</span><span class="o">.</span><span class="nf">append</span><span class="p">(</span><span class="s">"Password must have more than 8 characters."</span><span class="p">)</span>
        <span class="p">}</span>
        
        <span class="k">if</span> <span class="p">(</span><span class="n">password</span><span class="o">.</span><span class="nf">range</span><span class="p">(</span><span class="nv">of</span><span class="p">:</span><span class="s">"[</span><span class="se">\\</span><span class="s">W]"</span><span class="p">,</span> <span class="nv">options</span><span class="p">:</span> <span class="o">.</span><span class="n">regularExpression</span><span class="p">)</span> <span class="o">==</span> <span class="kc">nil</span><span class="p">){</span>
            <span class="n">errors</span><span class="o">.</span><span class="nf">append</span><span class="p">(</span><span class="s">"Password must contain a special character."</span><span class="p">)</span>
        <span class="p">}</span>
        
        <span class="k">if</span> <span class="p">(</span><span class="n">username</span> <span class="o">==</span> <span class="n">password</span><span class="p">){</span>
            <span class="n">errors</span><span class="o">.</span><span class="nf">append</span><span class="p">(</span><span class="s">"Username and password MUST be different."</span><span class="p">)</span>
        <span class="p">}</span>
        
        <span class="k">return</span> <span class="n">errors</span>
    <span class="p">}</span>
    
    <span class="nf">validatePassword</span><span class="p">(</span><span class="nv">username</span><span class="p">:</span> <span class="s">"Richi"</span><span class="p">,</span> <span class="nv">password</span><span class="p">:</span> <span class="s">"Richi"</span><span class="p">)</span>
    <span class="cm">/*
     * Array&lt;String&gt; 3 elements:
     - 0: "Password must have more than 8 characters."
     - 1: "Password must contain a special character."
     - 2: "Username and password MUST be different."
     */</span>
    
    <span class="nf">validatePassword</span><span class="p">(</span><span class="nv">username</span><span class="p">:</span> <span class="s">"Richi"</span><span class="p">,</span> <span class="nv">password</span><span class="p">:</span> <span class="s">"Ricardo$"</span><span class="p">)</span>
    <span class="cm">/*
     * Array&lt;String&gt; 0 elements
     */</span>

</code></pre></div></div>

<p>Because this function uses <code class="language-plaintext highlighter-rouge">if</code> conditions and modifies a local variable it’s
not very modular. This means it’s not possible to split these checks in smaller
pieces that can be entirely understood by themselves — they modify something,
and so you have to understand how they modify that thing, in which context, etc.
For very simple things it’s not too bad, but as complexity grows it becomes
unmanageable.</p>

<h4 id="advantages">Advantages</h4>

<p>The main advantages of Swiftz-Validation are:</p>

<ul>
  <li>Easy to understand and reason about each validation on its own</li>
  <li>Easy to compose validation rules</li>
  <li>Easy to reuse validation rules and compose more complex validations <a href="https://en.wikipedia.org/wiki/Don't_repeat_yourself">(DRY
principle)</a></li>
  <li>It has a well-known interface or abstraction to work with (it is a functor,
pointed, applicative and a semigroup). So you can combine validations with
<strong>sconcat (Semigroup)</strong>, apply functions with <strong>ap (Applicative)</strong>, transform
results with <strong>fmap (Functor)</strong> and react to results with some kind of pattern
matching with a <strong>switch</strong> statement.</li>
</ul>
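
<p>As a rough illustration of the last point, the sketch below combines two checks with <code class="language-plaintext highlighter-rouge">sconcat</code>, transforms the result with <code class="language-plaintext highlighter-rouge">fmap</code> and reacts to it with a <code class="language-plaintext highlighter-rouge">switch</code>. <code class="language-plaintext highlighter-rouge">ValidationSketch</code> is a hypothetical stand-in, not the library’s type, and keeping the last success in <code class="language-plaintext highlighter-rouge">sconcat</code> is an assumption of this sketch.</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code>enum ValidationSketch&lt;E, A&gt; {
    case failure([E])
    case success(A)

    // Semigroup: failures accumulate; if nothing failed, keep the last success.
    func sconcat(_ other: ValidationSketch&lt;E, A&gt;) -&gt; ValidationSketch&lt;E, A&gt; {
        switch (self, other) {
        case (.failure(let left), .failure(let right)):
            return .failure(left + right)
        case (.failure(_), .success(_)):
            return self
        case (.success(_), .failure(_)):
            return other
        case (.success(_), .success(_)):
            return other
        }
    }

    // Functor: transform the success value, leave failures untouched.
    func fmap&lt;B&gt;(_ transform: (A) -&gt; B) -&gt; ValidationSketch&lt;E, B&gt; {
        switch self {
        case .failure(let errors):
            return .failure(errors)
        case .success(let value):
            return .success(transform(value))
        }
    }
}

// Pattern matching on the outcome with a switch statement.
func summarize(_ validation: ValidationSketch&lt;String, Int&gt;) -&gt; String {
    switch validation {
    case .failure(let errors):
        return errors.joined(separator: "; ")
    case .success(let length):
        return "Valid, \(length) characters"
    }
}

let lengthCheck: ValidationSketch&lt;String, String&gt; = .failure(["Too short"])
let strengthCheck: ValidationSketch&lt;String, String&gt; = .failure(["No special character"])
let summary = summarize(lengthCheck.sconcat(strengthCheck).fmap { $0.count })
</code></pre></div></div>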

<p>In the following example, you can see how the Validation structure gives you a
foundation for building validation libraries and functions in a way that’s reusable
(DRY) and composable:</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="c1">//Validate min length</span>
    <span class="kd">func</span> <span class="nf">minLength</span><span class="p">(</span><span class="n">_</span> <span class="nv">value</span><span class="p">:</span><span class="kt">String</span><span class="p">,</span> <span class="nv">min</span><span class="p">:</span><span class="kt">Int</span><span class="p">,</span> <span class="nv">fieldName</span><span class="p">:</span><span class="kt">String</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="kt">Validation</span><span class="o">&lt;</span><span class="p">[</span><span class="kt">String</span><span class="p">],</span> <span class="kt">String</span><span class="o">&gt;</span><span class="p">{</span>
        <span class="k">if</span><span class="p">(</span><span class="n">value</span><span class="o">.</span><span class="n">count</span> <span class="o">&lt;</span> <span class="n">min</span><span class="p">){</span>
            <span class="k">return</span> <span class="kt">Validation</span><span class="o">.</span><span class="kt">Failure</span><span class="p">([</span><span class="s">"</span><span class="se">\(</span><span class="n">fieldName</span><span class="se">)</span><span class="s"> must have more than </span><span class="se">\(</span><span class="n">min</span><span class="se">)</span><span class="s"> characters"</span><span class="p">])</span>
        <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
            <span class="k">return</span> <span class="kt">Validation</span><span class="o">.</span><span class="kt">Success</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
        <span class="p">}</span>
    <span class="p">}</span>
    
    <span class="c1">//Validate match a regular expression</span>
    <span class="kd">func</span> <span class="nf">matches</span><span class="p">(</span><span class="n">_</span> <span class="nv">value</span><span class="p">:</span><span class="kt">String</span><span class="p">,</span> <span class="nv">regex</span><span class="p">:</span><span class="kt">String</span><span class="p">,</span> <span class="nv">errorMessage</span><span class="p">:</span><span class="kt">String</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="kt">Validation</span><span class="o">&lt;</span><span class="p">[</span><span class="kt">String</span><span class="p">],</span> <span class="kt">String</span><span class="o">&gt;</span><span class="p">{</span>
        <span class="k">if</span><span class="p">(</span><span class="n">value</span><span class="o">.</span><span class="nf">range</span><span class="p">(</span><span class="nv">of</span><span class="p">:</span><span class="n">regex</span><span class="p">,</span> <span class="nv">options</span><span class="p">:</span> <span class="o">.</span><span class="n">regularExpression</span><span class="p">)</span> <span class="o">==</span> <span class="kc">nil</span><span class="p">){</span>
            <span class="k">return</span> <span class="kt">Validation</span><span class="o">.</span><span class="kt">Failure</span><span class="p">([</span><span class="n">errorMessage</span><span class="p">])</span>
        <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
            <span class="k">return</span> <span class="kt">Validation</span><span class="o">.</span><span class="kt">Success</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
        <span class="p">}</span>
    <span class="p">}</span>
    
    <span class="c1">//Validate password: concatenation of matches and minLength</span>
    <span class="kd">func</span> <span class="nf">isPasswordValid</span><span class="p">(</span><span class="n">_</span> <span class="nv">password</span><span class="p">:</span><span class="kt">String</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="kt">Validation</span><span class="o">&lt;</span><span class="p">[</span><span class="kt">String</span><span class="p">],</span> <span class="kt">String</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="k">return</span> <span class="nf">matches</span><span class="p">(</span><span class="n">password</span><span class="p">,</span> <span class="nv">regex</span><span class="p">:</span> <span class="s">"[</span><span class="se">\\</span><span class="s">W]"</span><span class="p">,</span> <span class="nv">errorMessage</span><span class="p">:</span> <span class="s">"Password must contain an special character"</span><span class="p">)</span>
            <span class="o">.</span><span class="nf">sconcat</span><span class="p">(</span><span class="nf">minLength</span><span class="p">(</span><span class="n">password</span><span class="p">,</span> <span class="nv">min</span><span class="p">:</span> <span class="mi">8</span><span class="p">,</span> <span class="nv">fieldName</span><span class="p">:</span> <span class="s">"Password"</span><span class="p">))</span>
    <span class="p">}</span>
    
    <span class="c1">//Validate name: minLength</span>
    <span class="kd">func</span> <span class="nf">isNameValid</span><span class="p">(</span><span class="n">_</span> <span class="nv">name</span><span class="p">:</span> <span class="kt">String</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="kt">Validation</span><span class="o">&lt;</span><span class="p">[</span><span class="kt">String</span><span class="p">],</span> <span class="kt">String</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="k">return</span> <span class="nf">minLength</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="nv">min</span><span class="p">:</span> <span class="mi">3</span><span class="p">,</span> <span class="nv">fieldName</span><span class="p">:</span> <span class="s">"Name"</span><span class="p">)</span>
    <span class="p">}</span>
    
    <span class="c1">//Validate form: concatenation of isPasswordValid and isNameValid</span>
    <span class="kd">func</span> <span class="nf">validateForm</span><span class="p">(</span><span class="nv">name</span><span class="p">:</span> <span class="kt">String</span><span class="p">,</span> <span class="nv">password</span><span class="p">:</span> <span class="kt">String</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="kt">Validation</span><span class="o">&lt;</span><span class="p">[</span><span class="kt">String</span><span class="p">],</span> <span class="kt">String</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="k">return</span> <span class="nf">isNameValid</span><span class="p">(</span><span class="n">name</span><span class="p">)</span>
            <span class="o">.</span><span class="nf">sconcat</span><span class="p">(</span><span class="nf">isPasswordValid</span><span class="p">(</span><span class="n">password</span><span class="p">))</span>
    <span class="p">}</span>
    
    
    <span class="c1">//Examples</span>
    <span class="k">let</span> <span class="nv">result</span> <span class="o">=</span> <span class="nf">validateForm</span><span class="p">(</span><span class="nv">name</span><span class="p">:</span> <span class="s">"FP"</span><span class="p">,</span> <span class="nv">password</span><span class="p">:</span> <span class="s">"Ricardo$"</span><span class="p">)</span>
    <span class="cm">/*▿ Validation&lt;Array&lt;String&gt;, String&gt;
      ▿ Failure : 1 element
        - 0 : "Name must have more than 3 characters"
    */</span>
    <span class="k">let</span> <span class="nv">result1</span> <span class="o">=</span> <span class="nf">validateForm</span><span class="p">(</span><span class="nv">name</span><span class="p">:</span> <span class="s">"FP"</span><span class="p">,</span> <span class="nv">password</span><span class="p">:</span> <span class="s">"A"</span><span class="p">)</span>
    <span class="cm">/*  Validation&lt;Array&lt;String&gt;, String&gt;
      ▿ Failure : 3 elements
        - 0 : "Name must have more than 3 characters"
        - 1 : "Password must contain an special character"
        - 2 : "Password must have more than 8 characters"
    */</span>
    <span class="k">let</span> <span class="nv">result2</span> <span class="o">=</span> <span class="nf">validateForm</span><span class="p">(</span><span class="nv">name</span><span class="p">:</span> <span class="s">"FPZ"</span><span class="p">,</span> <span class="nv">password</span><span class="p">:</span> <span class="s">"A"</span><span class="p">)</span>
    <span class="cm">/* ▿ Validation&lt;Array&lt;String&gt;, String&gt;
       ▿ Failure : 2 elements
        - 0 : "Password must contain an special character"
        - 1 : "Password must have more than 8 characters"
    */</span>
    <span class="k">let</span> <span class="nv">result3</span> <span class="o">=</span> <span class="nf">validateForm</span><span class="p">(</span><span class="nv">name</span><span class="p">:</span> <span class="s">"FPZ"</span><span class="p">,</span> <span class="nv">password</span><span class="p">:</span> <span class="s">"A$k34k21!!"</span><span class="p">)</span>
    <span class="cm">/* ▿ Validation&lt;Array&lt;String&gt;, String&gt;
         - Success : "A$k34k21!!"
    */</span>

</code></pre></div></div>

<h3 id="how-to-use-the-library">How to use the library</h3>

<p>The Validation lib is implemented as an <strong>enum</strong> with two cases:</p>

<ul>
  <li><strong>Success</strong>(successValue) — represents a successful value.</li>
  <li><strong>Failure</strong>(failureValue) — represents an unsuccessful value.</li>
</ul>

<p>Validation functions simply return one of these two cases instead of throwing
errors or mutating other variables. The keys to working with Validations are:</p>

<ul>
  <li><strong>Combining validations:</strong> sometimes we want to create very complex validation
rules. The key is to create simple validations that are reusable and composable on their
own, and then combine them into a more complex validation structure.</li>
  <li><strong>Transforming validation values:</strong> sometimes we get a Validation value that is
not quite what we are looking for. We don't want to change anything about the
status of the validation (whether it passed or failed), but we'd like to tweak
the <em>value</em> a little. This is the equivalent of applying functions in an
expression.</li>
  <li><strong>Reacting to validation results:</strong> once we have the validation results, we
need a way to react depending on whether the value is a success or a failure.</li>
</ul>

<p>Now, we are going to see some examples:</p>

<p><strong>Combining validations</strong></p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="c1">//Validate min length</span>
    <span class="kd">func</span> <span class="nf">minLength</span><span class="p">(</span><span class="n">_</span> <span class="nv">value</span><span class="p">:</span><span class="kt">String</span><span class="p">,</span> <span class="nv">min</span><span class="p">:</span><span class="kt">Int</span><span class="p">,</span> <span class="nv">fieldName</span><span class="p">:</span><span class="kt">String</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="kt">Validation</span><span class="o">&lt;</span><span class="p">[</span><span class="kt">String</span><span class="p">],</span> <span class="kt">String</span><span class="o">&gt;</span><span class="p">{</span>
        <span class="k">if</span><span class="p">(</span><span class="n">value</span><span class="o">.</span><span class="n">characters</span><span class="o">.</span><span class="n">count</span> <span class="o">&lt;</span> <span class="n">min</span><span class="p">){</span>
            <span class="k">return</span> <span class="kt">Validation</span><span class="o">.</span><span class="kt">Failure</span><span class="p">([</span><span class="s">"</span><span class="se">\(</span><span class="n">fieldName</span><span class="se">)</span><span class="s"> must have more than </span><span class="se">\(</span><span class="n">min</span><span class="se">)</span><span class="s"> characters"</span><span class="p">])</span>
        <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
            <span class="k">return</span> <span class="kt">Validation</span><span class="o">.</span><span class="kt">Success</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
        <span class="p">}</span>
    <span class="p">}</span>
    
    <span class="c1">//Validate match a regular expression</span>
    <span class="kd">func</span> <span class="nf">matches</span><span class="p">(</span><span class="n">_</span> <span class="nv">value</span><span class="p">:</span><span class="kt">String</span><span class="p">,</span> <span class="nv">regex</span><span class="p">:</span><span class="kt">String</span><span class="p">,</span> <span class="nv">errorMessage</span><span class="p">:</span><span class="kt">String</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="kt">Validation</span><span class="o">&lt;</span><span class="p">[</span><span class="kt">String</span><span class="p">],</span> <span class="kt">String</span><span class="o">&gt;</span><span class="p">{</span>
        <span class="k">if</span><span class="p">(</span><span class="n">value</span><span class="o">.</span><span class="nf">range</span><span class="p">(</span><span class="nv">of</span><span class="p">:</span><span class="n">regex</span><span class="p">,</span> <span class="nv">options</span><span class="p">:</span> <span class="o">.</span><span class="n">regularExpression</span><span class="p">)</span> <span class="o">==</span> <span class="kc">nil</span><span class="p">){</span>
            <span class="k">return</span> <span class="kt">Validation</span><span class="o">.</span><span class="kt">Failure</span><span class="p">([</span><span class="n">errorMessage</span><span class="p">])</span>
        <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
            <span class="k">return</span> <span class="kt">Validation</span><span class="o">.</span><span class="kt">Success</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
        <span class="p">}</span>
    <span class="p">}</span>
    
    <span class="c1">//Validate password: concatenation of matches and minLength</span>
    <span class="kd">func</span> <span class="nf">isPasswordValid</span><span class="p">(</span><span class="n">_</span> <span class="nv">password</span><span class="p">:</span><span class="kt">String</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="kt">Validation</span><span class="o">&lt;</span><span class="p">[</span><span class="kt">String</span><span class="p">],</span> <span class="kt">String</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="k">return</span> <span class="nf">matches</span><span class="p">(</span><span class="n">password</span><span class="p">,</span> <span class="nv">regex</span><span class="p">:</span> <span class="s">"[</span><span class="se">\\</span><span class="s">W]"</span><span class="p">,</span> <span class="nv">errorMessage</span><span class="p">:</span> <span class="s">"Password must contain an special character"</span><span class="p">)</span>
            <span class="o">.</span><span class="nf">sconcat</span><span class="p">(</span><span class="nf">minLength</span><span class="p">(</span><span class="n">password</span><span class="p">,</span> <span class="nv">min</span><span class="p">:</span> <span class="mi">8</span><span class="p">,</span> <span class="nv">fieldName</span><span class="p">:</span> <span class="s">"Password"</span><span class="p">))</span>
    <span class="p">}</span>
    
 
    
    <span class="c1">//isPasswordValid is a more complex validation created by combining the minLength and matches validations</span>
    <span class="k">let</span> <span class="nv">result</span> <span class="o">=</span> <span class="nf">isPasswordValid</span><span class="p">(</span><span class="s">"A"</span><span class="p">)</span>
    <span class="cm">/*  Validation&lt;Array&lt;String&gt;, String&gt;
      ▿ Failure : 2 elements
        - 0 : "Password must contain an special character"
        - 1 : "Password must have more than 8 characters"
    */</span>
   

</code></pre></div></div>

<p><strong>Transforming validation values</strong></p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    
    <span class="c1">//The fmap function is only applied on Success values.</span>
    
    <span class="k">let</span> <span class="nv">success</span><span class="p">:</span> <span class="kt">Validation</span><span class="o">&lt;</span><span class="kt">String</span><span class="p">,</span> <span class="kt">Int</span><span class="o">&gt;</span> <span class="o">=</span> <span class="kt">Validation</span><span class="o">.</span><span class="kt">Success</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
    <span class="n">success</span><span class="o">.</span><span class="n">fmap</span><span class="p">{</span> <span class="nv">$0</span> <span class="o">+</span> <span class="mi">1</span> <span class="p">}</span>
    <span class="c1">// ==&gt; Validation.Success(2)</span>
    
    <span class="k">let</span> <span class="nv">failure</span><span class="p">:</span> <span class="kt">Validation</span><span class="o">&lt;</span><span class="kt">String</span><span class="p">,</span> <span class="kt">Int</span><span class="o">&gt;</span> <span class="o">=</span> <span class="kt">Validation</span><span class="o">.</span><span class="kt">Failure</span><span class="p">(</span><span class="s">"error"</span><span class="p">)</span>
    <span class="n">failure</span><span class="o">.</span><span class="n">fmap</span><span class="p">{</span><span class="nv">$0</span> <span class="o">+</span> <span class="mi">1</span><span class="p">}</span>
    <span class="c1">// ==&gt; Validation.Failure("error")</span>

</code></pre></div></div>

<p><strong>Reacting to validation results</strong></p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code>        <span class="c1">//You can react to the validation result value, whether it's a success or a failure</span>
        
        <span class="k">let</span> <span class="nv">success</span><span class="p">:</span> <span class="kt">Validation</span><span class="o">&lt;</span><span class="kt">String</span><span class="p">,</span> <span class="kt">Int</span><span class="o">&gt;</span> <span class="o">=</span> <span class="kt">Validation</span><span class="o">.</span><span class="kt">Success</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
        <span class="k">switch</span><span class="p">(</span><span class="n">success</span><span class="p">){</span>
        <span class="k">case</span> <span class="o">.</span><span class="kt">Success</span><span class="p">(</span><span class="k">let</span> <span class="nv">value</span><span class="p">):</span>
            <span class="nf">print</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
        <span class="k">case</span> <span class="o">.</span><span class="kt">Failure</span><span class="p">(</span><span class="k">let</span> <span class="nv">error</span><span class="p">):</span>
            <span class="nf">print</span><span class="p">(</span><span class="n">error</span><span class="p">)</span>
        <span class="p">}</span>
        <span class="c1">// ==&gt; Print(1)</span>
        
        <span class="k">let</span> <span class="nv">failure</span><span class="p">:</span> <span class="kt">Validation</span><span class="o">&lt;</span><span class="kt">String</span><span class="p">,</span> <span class="kt">Int</span><span class="o">&gt;</span> <span class="o">=</span> <span class="kt">Validation</span><span class="o">.</span><span class="kt">Failure</span><span class="p">(</span><span class="s">"error"</span><span class="p">)</span>
        <span class="k">switch</span><span class="p">(</span><span class="n">failure</span><span class="p">){</span>
        <span class="k">case</span> <span class="o">.</span><span class="kt">Success</span><span class="p">(</span><span class="k">let</span> <span class="nv">value</span><span class="p">):</span>
            <span class="nf">print</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
        <span class="k">case</span> <span class="o">.</span><span class="kt">Failure</span><span class="p">(</span><span class="k">let</span> <span class="nv">error</span><span class="p">):</span>
            <span class="nf">print</span><span class="p">(</span><span class="n">error</span><span class="p">)</span>
        <span class="p">}</span>
        <span class="c1">// ==&gt; Print("error")</span>

</code></pre></div></div>

<h3 id="conclusion">Conclusion</h3>

<p>I wrote this lib as a personal experiment because the core SwiftZ library doesn't
include a similar data structure, and I think it is a very important one, since
validation is common in almost every software program. The lib is still a work in
progress, but it can be used with SwiftZ or standalone. I would like to add more
operations, such as liftA3 and similar.</p>

<p>Feel free to open a pull request against the repo and improve it. Thanks!</p>

<p>The lib is inspired by the Validation package for Haskell:
<a href="https://hackage.haskell.org/package/Validation">https://hackage.haskell.org/package/Validation</a></p>

<p><strong>Acknowledgements</strong></p>

<ul>
  <li>Thanks to <a href="https://medium.com/@joseluisalcala">Jose Luis Alcala</a> for helping me
with Swift and SwiftZ.</li>
  <li>Thanks to @jlopez_rz for helping me with test cases.</li>
  <li>Thanks to <a href="https://medium.com/@jorgeatgu">Jorge Aznar</a> for helping me write
this article.</li>
</ul>]]></content><author><name>Ricardo Pallas</name></author><category term="blog" /><category term="functional programming" /><category term="swift" /><category term="swiftz" /><category term="applicative functor" /><summary type="html"><![CDATA[Functional validation with Swift]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://rpallas92.github.io/assets/images/markdown.jpg" /><media:content medium="image" url="https://rpallas92.github.io/assets/images/markdown.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Awesome functional programming en JavaScript — Spanish version</title><link href="https://rpallas92.github.io/awesome-fp-js/" rel="alternate" type="text/html" title="Awesome functional programming en JavaScript — Spanish version" /><published>2017-08-31T23:48:00+00:00</published><updated>2017-08-31T23:48:00+00:00</updated><id>https://rpallas92.github.io/awesome-fp-js</id><content type="html" xml:base="https://rpallas92.github.io/awesome-fp-js/"><![CDATA[<p>JavaScript is a multi-paradigm programming language, almost always
used in an object-oriented style, although given its huge popularity, you could
argue that it is the most widely used functional programming (<strong>FP</strong>) language.</p>

<p><strong>Disclaimer:</strong> <em>This article does not aim to teach or introduce functional
programming in depth. It is a guide to libraries and resources for using
most of the tools and capabilities that the functional programming style
offers in JavaScript.</em></p>

<p><img src="https://cdn-images-1.medium.com/max/1600/1*5Ma0Z2oOwOsyJFAwT43kMg.png" alt="" />
<span class="figcaption_hack">Lambda: represents the <a href="https://en.wikipedia.org/wiki/Lambda_calculus">lambda
calculus</a>, the foundation of
functional programming.</span></p>

<h4 id="qué-es-la-programación-funcional">What is functional programming?</h4>

<p>Let's start with a brief introduction to functional programming to set the
context.</p>

<p>Functional programming is a programming paradigm based on
functions modeled after mathematical functions. The essence of functional
programming is that programs are a combination of expressions.
Expressions can be concrete values, variables, or functions. Functions
can be defined more specifically: they are expressions that are applied
to an argument or input and, once applied, can be <strong>reduced</strong> or
<strong>evaluated</strong>. In functional languages and in modern languages, functions
are first-class citizens: they can be used as values or
passed as arguments, or inputs, to other functions.</p>
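
<p>As a minimal sketch of what "first-class" means in practice, the snippet below
stores functions in variables, passes them to other functions, and returns a new
function from <code>compose</code>. All the names here (<code>double</code>,
<code>increment</code>, <code>compose</code>) are illustrative and not taken from any library.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Functions are values: they can be stored, passed around, and returned.
const double = (n) => n * 2;
const increment = (n) => n + 1;

// compose takes two functions and returns a new one (applied right to left).
const compose = (f, g) => (x) => f(g(x));

const doubleThenIncrement = compose(increment, double);

// map receives a function as its argument.
const results = [1, 2, 3].map(doubleThenIncrement);
console.log(results); // [ 3, 5, 7 ]
</code></pre></div></div>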

<p>It is worth noting that purely functional languages are all based on
the <a href="https://en.wikipedia.org/wiki/Lambda_calculus">lambda calculus</a>.</p>

<h4 id="por-qué-javascript">Why JavaScript?</h4>

<p>Admittedly, it is not the best language for FP, given that mainstream
JavaScript is usually written in an imperative style. On top of that, it is not a
<a href="https://www.fpcomplete.com/blog/2017/04/pure-functional-programming">pure</a>
functional programming language, and
it is weakly and dynamically typed 😥.</p>

<p>However, it has several points in its favor:</p>

<ul>
  <li>It is one of the most widely used languages in the industry, and you probably
already work with it.</li>
  <li>You most likely already know how to program in JavaScript, so you don't have to
learn a new language.</li>
  <li>With the help of libraries, you can use many of FP's tools.</li>
</ul>

<h3 id="bibliotecas-que-facilitan-la-programación-funcional">Libraries (that make functional programming easier)</h3>

<p>In this section I will cover a few libraries that I have chosen and
used for FP in JavaScript.</p>

<h4 id="sanctuary">Sanctuary</h4>

<p>The first one (and my favorite) is <a href="https://sanctuary.js.org/">Sanctuary</a>,
whose motto is <em>“Refuge from unsafe JavaScript”</em>, meaning that it helps
eliminate many runtime errors, especially those caused by
null values.</p>

<p>Sanctuary is inspired by the Haskell and PureScript programming languages.
It provides a set of functions similar to those in Ramda and lodash-fp, but many of
them are safe and work directly with <em>data types</em>, such as the
Maybe type.</p>

<p>It provides two basic <em>data types</em>, <a href="https://sanctuary.js.org/#maybe-type">Maybe</a> and
<a href="https://sanctuary.js.org/#either-type">Either</a>, which comply with the
<a href="https://github.com/fantasyland/fantasy-land/tree/v3.3.0">Fantasy Land</a> specification,
the de facto specification for algebraic structures in JavaScript.</p>

<p>It promotes a safer programming style, free of tedious null checks, and
reduces possible runtime errors.</p>
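
<p>To illustrate the idea of replacing null checks with a Maybe, here is a
hand-rolled sketch. It is deliberately <strong>not</strong> Sanctuary's implementation or API:
<code>Just</code>, <code>Nothing</code>, <code>fromNullable</code>, <code>prop</code>
and <code>cityOf</code> are hypothetical names defined only for this example.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hand-rolled Maybe, for illustration only (not Sanctuary's implementation).
const Just = (value) => ({
  map: (f) => Just(f(value)),
  chain: (f) => f(value),
  getOrElse: () => value,
});

const Nothing = {
  map: () => Nothing,
  chain: () => Nothing,
  getOrElse: (fallback) => fallback,
};

// Hypothetical helper: wrap a value that may be null or undefined.
const fromNullable = (value) =>
  value === null || value === undefined ? Nothing : Just(value);

// Safe property lookup: always returns a Maybe, never null or undefined.
const prop = (key) => (obj) => fromNullable(obj[key]);

const user = { name: "Ada", address: { city: "London" } };
const guest = { name: "Guest" };

// No explicit null checks: a missing step short-circuits to Nothing.
const cityOf = (person) =>
  prop("address")(person)
    .chain(prop("city"))
    .map((city) => city.toUpperCase())
    .getOrElse("UNKNOWN");

console.log(cityOf(user)); // LONDON
console.log(cityOf(guest)); // UNKNOWN
</code></pre></div></div>

<p>The caller decides what to do with the missing case once, at the end of the
chain, instead of guarding every intermediate access.</p>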

<p>A big advantage is its ad hoc runtime <strong>type system</strong>,
defined in
<a href="https://github.com/sanctuary-js/sanctuary-def/tree/v0.12.1">sanctuary-def</a>. With
this system, errors caused by types are detected immediately,
which spares us the classic JavaScript surprises.</p>
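
<p>The general idea of runtime type checking can be sketched in a few lines of
plain JavaScript. This is only a rough approximation under simple assumptions
(it checks <code>typeof</code> and nothing else); <code>typed</code> is a hypothetical
helper and does not reflect how sanctuary-def is actually used or implemented.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical helper: wrap a function with runtime argument checks.
const typed = (name, expectedTypes, fn) => (...args) => {
  args.forEach((arg, i) => {
    if (typeof arg !== expectedTypes[i]) {
      throw new TypeError(
        name + ": argument " + (i + 1) + " must be a " + expectedTypes[i] +
        ", got " + typeof arg
      );
    }
  });
  return fn(...args);
};

const add = typed("add", ["number", "number"], (a, b) => a + b);

console.log(add(2, 3)); // 5

try {
  add(2, "3"); // unchecked JavaScript would silently return "23"
} catch (error) {
  console.log(error.message);
  // add: argument 2 must be a number, got string
}
</code></pre></div></div>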

<p>It is worth reading the <a href="https://survivejs.com/blog/sanctuary-interview/">interview</a>
with its creator, which explains the motivation behind the library.</p>

<h4 id="fluture">Fluture</h4>

<p><a href="https://github.com/fluture-js/Fluture">Fluture</a> is a library that provides
a monad for running asynchronous code, similar to a promise. As such,
it represents a <em>success</em> or <em>failure</em> value resulting from an
asynchronous <em>I/O</em> operation. The difference is that a Fluture is a monad (it satisfies the
interface and its laws) and is evaluated lazily, so no
side effects run when it is created.</p>

<p>The similarities between a Promise and a monad can cause confusion. You
could say that <em>.then</em> is a <em>bind</em>, that <em>resolve</em> is a <em>pure</em>, and so on.
But don't forget that a promise does not offer the interface
specified by Fantasy Land, much less obey the monad laws. Promises
also run the asynchronous operation (side effects) as soon as they are created. This
<a href="https://glebbahmutov.com/blog/difference-between-promise-and-task/">article</a>
explains the difference more clearly.</p>
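
<p>The eager versus lazy difference can be observed directly. The sketch below
compares a native Promise with a tiny hand-rolled lazy <code>Task</code>. The
<code>Task</code> here is a hypothetical stand-in written for this example, not
Fluture's implementation, and it skips concerns such as cancellation and error
mapping.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>const log = [];

// A Promise runs its executor as soon as it is constructed.
const eager = new Promise((resolve) => {
  log.push("promise executor ran");
  resolve(42);
});

// Hypothetical lazy Task: it only stores the computation.
const Task = (computation) => ({
  map: (f) =>
    Task((reject, resolve) => computation(reject, (x) => resolve(f(x)))),
  fork: (reject, resolve) => computation(reject, resolve),
});

// Building and transforming the task runs nothing yet.
const lazy = Task((reject, resolve) => {
  log.push("task computation ran");
  resolve(41);
}).map((n) => n + 1);

console.log(log); // [ 'promise executor ran' ]

// The side effect happens only when fork is called explicitly.
lazy.fork(
  (error) => log.push("failed: " + error),
  (value) => log.push("task resolved with " + value)
);

console.log(log);
// [ 'promise executor ran', 'task computation ran', 'task resolved with 42' ]
</code></pre></div></div>

<p>Because the task is just a description until it is forked, it can be built,
transformed, and passed around without triggering any I/O.</p>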

<h4 id="daggy">Daggy</h4>

<p><a href="https://github.com/fantasyland/daggy">Daggy</a> is a small but very useful
library for creating
<a href="https://en.wikipedia.org/wiki/Algebraic_data_type">ADTs</a> (also called
<a href="https://guide.elm-lang.org/types/union_types.html">Union Types</a> by the
JS/Elm community). They let you represent complex data naturally and
even emulate <a href="https://gist.github.com/yang-wei/4f563fbf81ff843e8b1e">pattern
matching</a>.</p>
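
<p>As a rough sketch of the concept, a tagged union with a pattern-matching style
<code>match</code> can be hand-rolled as follows. This is not Daggy's
implementation or API: <code>variant</code>, <code>match</code> and the
<code>Loading</code>/<code>Failure</code>/<code>Success</code> cases are hypothetical
names chosen for illustration.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hand-rolled tagged union, for illustration only (not Daggy's implementation).
// Each constructor tags its data; match picks the handler for that tag.
const variant = (tag, fields) => ({
  tag,
  ...fields,
  match: (handlers) => handlers[tag](fields),
});

// A hypothetical remote-data type with three cases.
const Loading = () => variant("Loading", {});
const Failure = (error) => variant("Failure", { error });
const Success = (items) => variant("Success", { items });

// Every case is handled in one place, similar to pattern matching.
const render = (state) =>
  state.match({
    Loading: () => "Loading...",
    Failure: ({ error }) => "Error: " + error,
    Success: ({ items }) => items.length + " items loaded",
  });

console.log(render(Loading())); // Loading...
console.log(render(Failure("timeout"))); // Error: timeout
console.log(render(Success(["a", "b"]))); // 2 items loaded
</code></pre></div></div>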

<p>The following <a href="https://codesandbox.io/s/3x47m2x0x6">example</a> shows
a <strong>brilliant way</strong> to use
<a href="https://en.wikipedia.org/wiki/Algebraic_data_type">ADTs</a> created with <strong>Daggy</strong>
in <strong>React</strong> components.</p>

<h4 id="otras">Others</h4>

<p>The <a href="http://ramdajs.com/">RamdaJS</a> library is also worth mentioning; I used
it alongside Sanctuary for a while. It is a utility library similar to
Underscore or lodash, but Sanctuary is now much more mature than it was at first
and has incorporated most of the functions Ramda provides, so I stopped
using it. The main differences between the two are how they handle invalid
inputs and the runtime type system. Ramda is less safe (its functions
can throw exceptions) because its creators do not want to use data types
(they believe it would cost them users), and it only provides typical
functional programming functions, such as map, without exploiting the full potential of this
programming paradigm.</p>

<p>Then there is FolktaleJS, one of the pioneers. Its 1.0 version has always been well respected,
and the team is now doing great work on 2.0, rewriting it from scratch.
It follows the same concept as Sanctuary: functions such as map, curry, and chain, and
data types such as Maybe, Either, and Validation. You could say that Folktale
leans toward a Java style and Sanctuary toward Haskell. Folktale also
includes the <em>Task</em> type for asynchronous work, very similar to what the
Fluture library offers.</p>

<h3 id="libros-y-artículos">Books and articles</h3>

<ul>
  <li><a href="https://github.com/MostlyAdequate/mostly-adequate-guide">Professor Frisby’s Mostly Adequate Guide to Functional
Programming</a>: the
go-to book on FP in JS, written by <a href="https://twitter.com/drboolean">Brian
Lonsdorf</a>. It introduces the functional programming
paradigm <strong>in general</strong> using JavaScript. It is a practical introduction
that builds up real examples from intuition. An essential
book if you have no previous FP experience.</li>
  <li><a href="https://github.com/getify/functional-light-js">Functional-Light JavaScript</a>:
this book explores the basic FP principles that can be applied in
JS. What sets it apart is its practical approach, without all the terminology that
puts many people off.</li>
  <li><a href="https://hughfdjackson.com/javascript/why-curry-helps/">Why Curry Helps</a>: an
overview of how currying helps you write more reusable and
declarative code.</li>
  <li><a href="http://blog.jenkster.com/2016/06/functional-mumbo-jumbo-adts.html">Functional Mumbo Jumbo -
ADTs</a>: an
introduction to algebraic data types for beginners.</li>
</ul>
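
<p>As a small illustration of why currying helps with reuse, the sketch below
curries two functions by hand and builds a new function purely by partial
application. The helper names (<code>prop</code>, <code>map</code>,
<code>getName</code>, <code>names</code>) are defined here for the example and
are not imported from any library.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Manual currying: each call takes one argument and returns the next function.
const prop = (key) => (obj) => obj[key];
const map = (f) => (list) => list.map(f);

// Partially applying curried functions yields small reusable pieces.
const getName = prop("name");
const names = map(getName);

const users = [{ name: "Ada" }, { name: "Alan" }];

console.log(names(users)); // [ 'Ada', 'Alan' ]
</code></pre></div></div>

<p>Note that <code>names</code> never mentions the data it operates on, which is
what makes this style read declaratively.</p>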

<h3 id="ejemplos">Examples</h3>

<ul>
  <li>In the <a href="https://github.com/RPallas92/congreso-web-2016">repository</a> for the workshop
I gave at Congreso Web 2016 on functional programming in JS,
you will find the
<a href="https://github.com/RPallas92/congreso-web-2016/blob/master/slides.pdf">slides</a>
and the <a href="https://github.com/RPallas92/congreso-web-2016/tree/master/app_videos/app_finished">example
project</a>,
a <strong>YouTube video search app (built with React and FP).</strong></li>
  <li><a href="https://github.com/justsml/escape-from-callback-mountain">Escape from Callback
Mountain</a>:
design and refactoring tips for Promise-based functional JavaScript. Key benefits
include better readability, testability, and reusability.</li>
</ul>

<h3 id="otros-recursos"><strong>Other resources</strong></h3>

<p>The <a href="https://github.com/stoeffel/awesome-fp-js">Awesome FP JS</a> repository
provides many more resources beyond those covered here.</p>

<h3 id="bola-extra-typescript">Bonus: TypeScript</h3>

<p>It is worth mentioning that <a href="https://github.com/gcanti">Giulio Canti</a> is currently
building the <a href="https://github.com/gcanti/fp-ts">FP-TS</a> library, which
will enable <strong>functional programming in TypeScript</strong>. Its big
advantage over JavaScript is static types, which help you write more correct
code. Thanks to them, a certain degree of <strong>pattern matching in
TypeScript</strong> can be emulated without additional libraries, as explained
<a href="https://pattern-matching-with-typescript.alabor.me/">here.</a></p>
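
<p>As a rough sketch of what that emulation can look like, the snippet below models a small algebraic data type as a discriminated union and matches on it with a plain <code>switch</code>. The <code>RemoteData</code> type and the <code>render</code> function are hypothetical names chosen for this example; this is one common way to do it with the language alone, not necessarily the approach the linked guide takes.</p>

```typescript
// A sum type modelled as a discriminated union: the `kind` field is the tag.
// All names here are hypothetical and used only for illustration.
type RemoteData =
  | { kind: "loading" }
  | { kind: "failure"; error: string }
  | { kind: "success"; videos: string[] };

const render = (state: RemoteData): string => {
  switch (state.kind) {
    case "loading":
      return "Loading...";
    case "failure":
      // The compiler narrows `state`, so `error` is only accessible here.
      return `Error: ${state.error}`;
    case "success":
      return `${state.videos.length} videos found`;
    default: {
      // Exhaustiveness check: if a new variant is added and not handled,
      // this assignment stops compiling.
      const unreachable: never = state;
      return unreachable;
    }
  }
};

console.log(render({ kind: "success", videos: ["a", "b"] })); // 2 videos found
console.log(render({ kind: "failure", error: "timeout" })); // Error: timeout
```

<p>The <code>never</code> assignment in the <code>default</code> branch is what gives the compile-time guarantee that every case is covered, which is the main benefit pattern matching offers in typed functional languages.</p>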

<h3 id="conclusión">Conclusion</h3>

<p>By combining the <strong>Sanctuary, Fluture, and Daggy</strong> libraries, we can
program in the functional paradigm in much the same way as we would in any
other functional language, allowing for the obvious differences.</p>

<p>With this article I wanted to show that it is possible to put the concepts and
tools of FP into practice without resorting to a specialised
language.</p>

<p>That said, doing FP will not by itself make your programs perfect: you are
exposed to the same problems as when you are not doing FP. Learning FP will,
however, teach you new concepts that make you a better programmer, as
discussed
<a href="https://medium.com/@bfil/just-enough-functional-programming-a0c4fd09c8f7">here</a>.</p>

<p>I also recommend considering other typed languages for frontend development,
such as <a href="http://www.purescript.org/">PureScript</a>, Elm, and ReasonML, which help
you write more correct programs, especially on larger and more complex
projects.</p>

<p>Before wrapping up, note that the links included throughout the article offer
further reading on each topic.</p>

<p>Finally, the next post will walk through a simple example of safely parsing
JSON from a server, comparing imperative programming with functional
programming.</p>]]></content><author><name>Ricardo Pallas</name></author><category term="blog" /><category term="functional programming" /><category term="javascript" /><category term="typescript" /><category term="ramda" /><category term="sanctuary" /><summary type="html"><![CDATA[FP state of the art in JS]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://rpallas92.github.io/assets/images/markdown.jpg" /><media:content medium="image" url="https://rpallas92.github.io/assets/images/markdown.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>